1. Probability Theory
1.1 Set Theory
  • Sample space $S$ is the set of all possible outcomes of a particular experiment.
  • Two events $A$ and $B$ are disjoint (mutually exclusive) if $A\cap B=\emptyset$.
  • The events $A_1,A_2,\cdots$ are pairwise disjoint if $A_i \cap A_j=\emptyset$ for all $i \neq j$.
  • If $A_1,A_2,\cdots$ are pairwise disjoint and $\cup_{i=1}^{\infty} A_i=S$, then $\{ A_i\}$ forms a partition of $S$.


1.2 Basics of Probability Theory
  • A collection of subsets of $S$ is called a sigma algebra (or Borel field), denoted by $\mathcal{B}$, if it contains the empty set and is closed under complementation and countable unions. For a finite sample space, the collection of all subsets (the power set) is a sigma algebra; see the sketch after this list. $$\begin{align*}S&=\{1,2,3\}\\\mathcal{B}&=\{ \emptyset ,\{ 1\} ,\{ 2\} ,\{ 3\} ,\{ 1,2\} ,\{ 1,3\} ,\{ 2,3\} ,\{ 1,2,3\}\}\end{align*}$$
  • Bonferroni's Inequality is useful when it is difficult to calculate the intersection probability. $$p(A\cap B) \geq p(A) + p(B) - 1$$
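The following is a minimal sketch (not from the source) that enumerates the power set of the example $S=\{1,2,3\}$ above and checks the sigma-algebra closure properties directly.

```python
from itertools import combinations

S = frozenset({1, 2, 3})

# Power set of S: every subset, including the empty set and S itself.
B = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Sigma-algebra checks: contains the empty set, closed under complementation,
# and closed under unions (finite unions suffice for a finite collection).
assert frozenset() in B
assert all(S - A in B for A in B)
assert all(A | C in B for A in B for C in B)
print(f"{len(B)} subsets of S form a sigma algebra")  # 8 subsets
```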


1.3 Conditional Probability and Independence
  • Bayes' Rule $$p(A|B)=\frac{p(B|A)p(A)}{p(B)}$$
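As a quick numerical illustration of Bayes' Rule (the numbers below are hypothetical, chosen only for the example): let $A$ be a 1%-prevalence condition and $B$ a positive test result.

```python
# Hypothetical values for illustration only.
p_A = 0.01             # p(A): prior probability
p_B_given_A = 0.99     # p(B|A)
p_B_given_notA = 0.05  # p(B|A^c)

# p(B) by the law of total probability, then Bayes' Rule for p(A|B).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"p(A|B) = {p_A_given_B:.3f}")  # ≈ 0.167
```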


1.4 Random Variable
  • A random variable is a function from a sample space $S$ into the real numbers.


1.5 Distribution Functions

The cumulative distribution function (cdf) of a random variable $X$, denoted by $F_X(x)$, is defined by $$F_X(x)=p_X(X\leq x), \text{ for all }x$$ The function $F(x)$ is a cdf if and only if the following three conditions hold:

  1. $\lim_{x \rightarrow -\infty} F_X(x)=0$, and $\lim_{x\rightarrow \infty}F_X(x)=1$.
  2. $F_X(x)$ is a nondecreasing function of $x$.
  3. $F_X(x)$ is right-continuous; that is, $\lim_{\epsilon \rightarrow 0^+} F_X(x+\epsilon)=F_X(x)$, for every number $x \in \mathbb{R}$.


1.6 Density and Mass Functions

The probability mass function (pmf) of a discrete random variable $X$ is given by $$\begin{matrix}f_X(x)=p(X=x)=F_X(x)-F_X(x^-),& x\in S\end{matrix}$$ The probability density function (pdf) of a continuous random variable $X$ is the function $f_X(x)$ that satisfies

$$F_X(x) = \int_{-\infty}^x f_X(t)\,dt, \text{ for all }x$$

Wherever the derivative exists, the pdf can be recovered by differentiating the cdf:

$$f_X(x) = F^{'}_X(x)$$
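A small numerical sanity check of this relationship, using the standard normal as an assumed example: a central-difference approximation to $F^{'}_X(x)$ should agree with $f_X(x)$.

```python
import math

def f(x):  # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def F(x):  # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# F'(x) approximated by a central difference should agree with f(x).
eps = 1e-5
for x in (-1.0, 0.0, 2.0):
    deriv = (F(x + eps) - F(x - eps)) / (2 * eps)
    assert abs(deriv - f(x)) < 1e-6
    print(f"x = {x:+.1f}:  F'(x) ≈ {deriv:.6f},  f(x) = {f(x):.6f}")
```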


2. Transformations and Expectations
2.1 Distributions of Functions of a Random Variable

Formally, if we write $y=g(x)$, the function $g(x)$ defines a mapping from the original sample space of $X$, $\mathcal{X}$, to a new sample space $\mathcal{Y}$.

$$g(x): \mathcal{X} \rightarrow \mathcal{Y}$$


We associate with $g$ an inverse mapping, denoted by $g^{-1}$, which is a mapping from subsets of $\mathcal{Y}$ to subsets of $\mathcal{X}$, and is defined by

$$g^{-1}(A)=\left \{ x\in \mathcal{X}: g(x) \in A \right \}$$


If the random variable $Y$ is now defined by $Y=g(X)$, we can write, for any subset $A \subseteq \mathcal{Y}$,

$$\begin{align*}p(Y\in A) &= p(g(X) \in A)\\&=p(\left \{x\in \mathcal{X} : g(x)\in A \right \})\\&=p(X\in g^{-1}(A))\end{align*}$$


If $X$ is a discrete random variable, then $\mathcal{X}$ is countable. The sample space for $Y=g(X)$ is also a countable set. Thus, $Y$ is also a discrete random variable. The pmf of $Y$ is

$$\begin{align*}f_Y(y)&=p(Y=y)\\&=p(g(X)=y)\\&=\sum_{x\in g^{-1}(y)} p(X=x)\\&=\sum_{x\in g^{-1}(y)} p_X(x)\end{align*}$$
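A minimal sketch of this pmf formula, under an assumed example: $X$ uniform on $\{-2,-1,0,1,2\}$ and $Y=g(X)=X^2$, so $f_Y(y)$ is obtained by summing $f_X(x)$ over the preimage $g^{-1}(\{y\})$.

```python
from collections import defaultdict

# Assumed example: X uniform on {-2, ..., 2}, Y = g(X) = X^2.
f_X = {x: 0.2 for x in (-2, -1, 0, 1, 2)}
g = lambda x: x * x

# f_Y(y) = sum of f_X(x) over all x in the preimage g^{-1}({y}).
f_Y = defaultdict(float)
for x, p in f_X.items():
    f_Y[g(x)] += p

print(dict(f_Y))  # {4: 0.4, 1: 0.4, 0: 0.2}
```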


If $X$ and $Y$ are continuous random variables with $X \sim f_X(x)$, the cdf of $Y$ is

$$\begin{align*}F_Y(y)&=p(Y\leq y)\\&=p(g(X)\leq y)\\&=p(\left \{x\in \mathcal{X} : g(x)\leq y\right \})\\&=\int_{\left \{x\in \mathcal{X} : g(x) \leq y\right \}} f_X(x)dx\end{align*}$$


Binomial transformation

A discrete random variable $X$ has a binomial distribution if its pmf is of the form

$$\begin{matrix}f_X(x)=p(X=x)=\binom{n}{x}p^x(1-p)^{n-x},& (x=0,1,\cdots,n)\end{matrix}$$

For the transformation $Y=n-X$,

$$\begin{align*}f_Y(y)&=p(Y=y)\\&=p(n-X=y)\\&=p(X=n-y)\\&=f_X(n-y)\\&=\binom{n}{n-y} p^{n-y}(1-p)^y\end{align*}$$

Since $\binom{n}{n-y}=\binom{n}{y}$, this is the $Bin(n,1-p)$ pmf, so $Y \sim Bin(n,1-p)$.
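A quick check of this identity with assumed values $n=5$ and $p=0.3$: the pmf of $Y=n-X$ computed as $f_X(n-y)$ coincides with the $Bin(n,1-p)$ pmf.

```python
from math import comb

n, p = 5, 0.3  # assumed example values

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for y in range(n + 1):
    f_Y = binom_pmf(n - y, n, p)     # f_X(n - y), with X ~ Bin(n, p)
    target = binom_pmf(y, n, 1 - p)  # Bin(n, 1 - p) pmf evaluated at y
    assert abs(f_Y - target) < 1e-12
print("Y = n - X ~ Bin(n, 1 - p)")
```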


Uniform transformation

Suppose $X$ has a uniform distribution on the interval $(0,2\pi)$ and $Y=g(X)=\sin^2(X)$; that is,

$$f_X(x)=\begin{cases}\frac{1}{2\pi} & 0<x<2\pi \\ 0 & \text{otherwise}\end{cases}$$

For $0<y<1$, the set $\{x\in(0,2\pi) : \sin^2(x)\leq y\}$ is a union of three intervals whose endpoints $x_1<x_2<x_3<x_4$ are the four solutions of $\sin^2(x)=y$ in $(0,2\pi)$, so

$$\begin{align*}F_Y(y)&=p(Y\leq y)\\&=p(g(X)\leq y)\\&=p(X\leq x_1)+p(x_2\leq X \leq x_3)+p(x_4\leq X \leq 2\pi)\\&=F_X(x_1)+(F_X(x_3)-F_X(x_2))+(F_X(2\pi)-F_X(x_4))\end{align*}$$
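The following Monte Carlo sanity check of this decomposition is my own sketch, not from the source: for a fixed $y$, the four roots are $x_1=\arcsin\sqrt{y}$, $x_2=\pi-x_1$, $x_3=\pi+x_1$, $x_4=2\pi-x_1$, and the resulting $F_Y(y)$ should match the empirical proportion of simulated $Y=\sin^2(X)$ values at or below $y$.

```python
import math
import random

random.seed(0)
y = 0.5  # evaluate F_Y at one point in (0, 1)

# The four roots of sin^2(x) = y in (0, 2*pi).
a = math.asin(math.sqrt(y))
x1, x2, x3, x4 = a, math.pi - a, math.pi + a, 2 * math.pi - a

F_X = lambda x: x / (2 * math.pi)  # cdf of X ~ Uniform(0, 2*pi)
F_Y = F_X(x1) + (F_X(x3) - F_X(x2)) + (F_X(2 * math.pi) - F_X(x4))

# Monte Carlo estimate of p(sin^2(X) <= y).
n = 200_000
hits = sum(math.sin(random.uniform(0, 2 * math.pi)) ** 2 <= y for _ in range(n))
print(f"formula: {F_Y:.4f}   simulation: {hits / n:.4f}")  # both ≈ 0.5
```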


If $y=g(x)$ is a monotone function, then $g^{-1}$ is single-valued; that is, $g^{-1}(y)=x$ if and only if $y=g(x)$. If $g(x)$ is increasing,

$$F_Y(y)=\int_{\left \{x\in \mathcal{X} : x \leq g^{-1}(y)\right \}} f_X(x)dx=\int_{-\infty}^{g^{-1}(y)}f_X(x)dx=\color{red}{F_X(g^{-1}(y))}$$

If $g(x)$ is decreasing, we have

$$F_Y(y)=\int_{\left \{x\in \mathcal{X} : x \geq g^{-1}(y)\right \}} f_X(x)dx=\int^{\infty}_{g^{-1}(y)}f_X(x)dx=\color{blue}{1-F_X(g^{-1}(y))}$$


Let $X$ have pdf $f_X(x)$ and let $Y=g(X)$, where $g$ is a monotone function. Suppose that $f_X(x)$ is continuous on $\mathcal{X}$ and that $g^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$. Then the pdf of $Y$ is given by

$$f_Y(y)=\begin{cases}f_X(g^{-1}(y))\left | \frac{d}{dy}g^{-1}(y)\right | & y \in \mathcal{Y} \\ 0 & \text{otherwise}\end{cases}$$
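A small sketch of this change-of-variable formula under an assumed example: $X\sim Uniform(0,1)$ and the decreasing map $Y=g(X)=-\log X$, for which $g^{-1}(y)=e^{-y}$ and the formula gives $f_Y(y)=e^{-y}$, the Exponential(1) pdf.

```python
import math

# Assumed example: X ~ Uniform(0, 1), Y = g(X) = -log(X), monotone decreasing.
f_X = lambda x: 1.0 if 0 < x < 1 else 0.0
g_inv = lambda y: math.exp(-y)    # g^{-1}(y) = e^{-y}
dg_inv = lambda y: -math.exp(-y)  # (d/dy) g^{-1}(y)

def f_Y(y):
    # Change-of-variable formula: f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|.
    return f_X(g_inv(y)) * abs(dg_inv(y))

for y in (0.5, 1.0, 3.0):
    assert abs(f_Y(y) - math.exp(-y)) < 1e-12  # matches the Exp(1) pdf
print("f_Y(y) = exp(-y) for y > 0")
```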


Probability integral transformation

Let $X$ have continuous cdf $F_X(x)$ and $Y=F_X(X)$. Then $Y$ is uniformly distributed on $(0,1)$.
$(p(Y\leq y)=y$, $0<y<1)$

$$Y=F_X(X) \sim Uniform(0,1)$$

$$\begin{align*}F_Y(y)&=p(Y\leq y)\\&=p(F_X(X)\leq y)\\&=p(F_X^{-1}[F_X(X)] \leq F_X^{-1}(y))\\&=p(X\leq F_X^{-1}(y))\\&=F_X(F_X^{-1}(y))\\&=y\end{align*}$$

One application is in the generation of random samples from a particular distribution. For many distributions there are many other methods of generating observations that take less computing time, but this method is still useful because of its general applicability.
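Running the transformation in reverse is what makes this useful for simulation: if $U\sim Uniform(0,1)$, then $F_X^{-1}(U)$ has cdf $F_X$. Below is a minimal sketch for the Exponential(1) distribution (an assumed example), where $F_X(x)=1-e^{-x}$ and hence $F_X^{-1}(u)=-\log(1-u)$.

```python
import math
import random

random.seed(1)

# Inverse-cdf sampling for Exp(1): F_X(x) = 1 - e^{-x}, F_X^{-1}(u) = -log(1 - u).
samples = [-math.log(1 - random.random()) for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(f"sample mean ≈ {mean:.3f} (Exp(1) has mean 1)")
```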


2.2 Expected Values

The expected value or mean of a function $g(X)$ of a random variable $X$, denoted by $Eg(X)$, is

$$Eg(X)=\begin{cases}\sum_x g(x)f_X(x) & \text{if }X\text{ is discrete} \\\int g(x)f_X(x)dx & \text{if }X\text{ is continuous} \end{cases}$$

If $E|g(X)|=\infty$, we say that $Eg(X)$ does not exist.
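Two small computations of $Eg(X)$ under assumed examples: the discrete sum for a fair die with $g(x)=x^2$, and a simple numerical integral for $X\sim Uniform(0,1)$ with the same $g$.

```python
# Discrete case: fair six-sided die, g(x) = x^2, so Eg(X) = 91/6 ≈ 15.167.
f_X = {x: 1 / 6 for x in range(1, 7)}
E_discrete = sum(x**2 * p for x, p in f_X.items())

# Continuous case: X ~ Uniform(0, 1), g(x) = x^2, so Eg(X) = 1/3,
# approximated here by a midpoint Riemann sum.
n = 100_000
E_continuous = sum(((i + 0.5) / n) ** 2 for i in range(n)) / n

print(f"discrete: {E_discrete:.4f}   continuous: {E_continuous:.4f}")
```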


Cauchy random variable

An example of a random variable whose expected value does not exist.
$\left (\int_{-\infty}^{\infty}f_X(x)dx=1 \text{, but } E|X|=\infty \right )$ $$\begin{matrix}f_X(x)=\frac{1}{\pi}\cdot \frac{1}{1+x^2}, & (-\infty  < x < \infty)\end{matrix}$$


2.3 Moments and Moment Generating Functions
  • The $n^\text{th}$ moment of $X$ (or of $F_X(x)$), denoted by ${\mu_n}'$ $${\mu_n}'=EX^n$$
  • The $n^\text{th}$ central moment of $X$ $$\begin{matrix}\mu_n=E(X-\mu)^n,& \mu={\mu_1}'=EX\end{matrix}$$
  • Moment Generating Function (MGF) $$M_X(t)=Ee^{tX}$$
  • The $n^\text{th}$ moment is equal to the $n^\text{th}$ derivative of $M_X(t)$ evaluated at $t=0$ (see the numerical check after this list).$$EX^n=\left . \frac{d^n}{dt^n}M_X(t)\right |_{t=0}$$
  • The moments do not uniquely determine a distribution function.
    (That is, there may be two distinct random variables having the same moments of all orders.)
  • If $X$ and $Y$ have bounded support, then $F_X(u) = F_Y(u)$ for all $u$ if and only if $EX^r=EY^r$ for all integers $r=0,1,2,\cdots$.
  • If the MGFs exist and $M_X(t)=M_Y(t)$ for all $t$ in some neighborhood of $0$, then $F_X(u)=F_Y(u)$ for all $u$.
  • Convergence of MGFs: if $\lim_{i\rightarrow\infty} M_{X_i}(t)=M_X(t)$ for all $t$ in a neighborhood of $0$, and $M_X(t)$ is an MGF, then $$\lim_{i\rightarrow \infty} F_{X_i}(x) = F_X(x)$$ for all $x$ at which $F_X(x)$ is continuous.
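A numerical check of the derivative property, assuming $X\sim N(0,1)$ with $M_X(t)=e^{t^2/2}$: finite differences of $M_X$ at $t=0$ recover $EX=0$ and $EX^2=1$.

```python
import math

M = lambda t: math.exp(t * t / 2)  # MGF of N(0, 1), assumed example

h = 1e-3
EX = (M(h) - M(-h)) / (2 * h)              # first derivative at t = 0
EX2 = (M(h) - 2 * M(0) + M(-h)) / (h * h)  # second derivative at t = 0

print(f"EX ≈ {EX:.6f}, EX^2 ≈ {EX2:.6f}")  # ≈ 0 and ≈ 1
```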


Poisson approximation

Binomial probabilities can be approximated by Poisson probabilities when $n$ is large and $p$ is small. Suppose that $X\sim Binomial(n,p)$ and $Y\sim Poisson(\lambda)$, with $\lambda=np$. $$\begin{align*}M_X(t)&=\left [ pe^t + (1-p)\right ]^n\\M_Y(t)&=e^{\lambda(e^t-1)}\end{align*}$$ $$\begin{align*}\lim_{n\rightarrow \infty}M_X(t)&=\lim_{n\rightarrow \infty}\left [pe^t+(1-p)\right ]^n\\&=\lim_{n\rightarrow \infty}\left [\frac{\lambda}{n}e^t+ \left ( 1-\frac{\lambda}{n} \right ) \right ]^n\\&=\lim_{n\rightarrow \infty}\left [1+\frac{\lambda(e^t-1)}{n}\right ]^n=e^{\lambda(e^t-1)}=M_Y(t)\end{align*}$$
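A direct numerical illustration of this limit, with assumed values $n=1000$ and $p=0.003$: the binomial and Poisson pmfs are compared term by term rather than through their MGFs.

```python
from math import comb, exp, factorial

n, p = 1000, 0.003  # large n, small p (assumed values)
lam = n * p

for x in range(6):
    binomial = comb(n, x) * p**x * (1 - p) ** (n - x)
    poisson = exp(-lam) * lam**x / factorial(x)
    print(f"x = {x}:  binomial = {binomial:.5f}   poisson = {poisson:.5f}")
```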


2.4 Differentiating Under an Integral Sign

Leibnitz's Rule

If $f(x,\theta)$ is differentiable with respect to $\theta$ and the limits of integration $a$ and $b$ do not depend on $\theta$, then $$\frac{d}{d\theta}\int_a^b f(x,\theta)dx=\int_a^b \frac{\partial}{\partial\theta}f(x,\theta)dx$$
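A numerical check of this interchange under an assumed example, $f(x,\theta)=x^\theta$ on $[0,1]$: the left side is $\frac{d}{d\theta}\frac{1}{\theta+1}=-\frac{1}{(\theta+1)^2}$ and the right side is $\int_0^1 x^\theta \log x\,dx$.

```python
import math

theta = 2.0
n = 200_000  # midpoint-rule grid on (0, 1)

# Right side: integral of the partial derivative d/dtheta x^theta = x^theta * log(x).
rhs = sum(((i + 0.5) / n) ** theta * math.log((i + 0.5) / n) for i in range(n)) / n

# Left side: d/dtheta of the integral of x^theta, i.e. d/dtheta 1/(theta + 1).
lhs = -1 / (theta + 1) ** 2

print(f"lhs = {lhs:.6f}   rhs ≈ {rhs:.6f}")  # both ≈ -0.111111
```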


Lebesgue's Dominated Convergence Theorem

Suppose that $h(x,y)$ is continuous at $y_0$ for each $x$, and that there exists a function $g(x)$ satisfying

  1. $|h(x,y)| \leq g(x)$ for all $x$ and $y$,
  2. $\int g(x)dx < \infty$.

Then $$\lim_{y\rightarrow y_0}\int h(x,y)dx=\int \lim_{y \rightarrow y_0} h(x,y)dx$$


Suppose that $f(x,\theta)$ is differentiable in $\theta$ and there exists a function $g(x,\theta)$ such that

  1. $\left | \left . \frac{\partial}{\partial \theta} f(x,\theta) \right | _{\theta={\theta}'} \right | \leq g(x,\theta)$ for all ${\theta}'$ such that $|{\theta}'-\theta|\leq\delta_0$ (for some $\delta_0>0$),
  2. $\int_{-\infty}^{\infty} g(x,\theta)dx < \infty$.

Then $$\frac{d}{d\theta}\int_{-\infty}^{\infty} f(x,\theta)dx=\int_{-\infty}^{\infty} \frac{\partial}{\partial\theta}f(x,\theta)dx$$
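A numerical check of this version under an assumed example, $f(x,\theta)=e^{-\theta x}$ on $(0,\infty)$ with $\theta=1$: here $\left|\frac{\partial}{\partial\theta}f(x,\theta')\right| = x e^{-\theta' x}$ is dominated by the integrable function $g(x)=x e^{-x/2}$ for $|\theta'-1|\leq \tfrac{1}{2}$, and both sides of the interchange equal $-1/\theta^2$.

```python
import math

theta = 1.0

# Right side: truncated midpoint-rule integral of d/dtheta e^{-theta*x} = -x e^{-theta*x}.
n, upper = 400_000, 40.0
h = upper / n
rhs = sum(-((i + 0.5) * h) * math.exp(-theta * (i + 0.5) * h) for i in range(n)) * h

# Left side in closed form: d/dtheta of the integral of e^{-theta*x}, i.e. d/dtheta 1/theta.
lhs = -1 / theta**2

print(f"lhs = {lhs:.6f}   rhs ≈ {rhs:.6f}")  # both ≈ -1.0
```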


