Random Distribution

Random variable $X$ is a function, it maps a set of random events to a set of numerical values. The random variable can either be discrete or continuous. A continuous random variable uses probability density function $p(x)$ to describe.
Probability density function, pdf
Probability density function describes the random distribution of a continues random variable, and its integral over the entire space $[a,b]$ is equal to 1.
$$\int_{a}^{b}p(x)dx = 1$$
For $i \le x \le j$, the probability is
$$P = \int_{i}^{j}p(x)dx$$
Cumulative probability distribution function, cdf
$X$ is the random variable defines on $[a,b]$, its pdf is $p(x)$. Define its cdf
$$P(x) = \int p(x)dx$$
and $P(a)=0$, $P(b)=1$. So for $i \le x \le j$, the probability is $P(j)-P(i)$.
Uniform distribution
If a random variable $X$ defined over the space $[a,b]$ follows uniform distribution, its probability density function is a constant function:
$$p(x) = \frac{1}{b-a}$$
import numpy as np
from matplotlib import pyplot as plt
xvalues = np.random.uniform(0, 1, 10000)
yvalues = np.random.uniform(0, 1, 10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()

Generate pdf from uniform distribution
Inverse function method
Let $X$ be a random variable, and its pdf is $p(x)$, and its cdf is $P(x)$. Let $Y$ be a random variable follows uniform distribution, the random variable $P^{-1}(Y)$ has the same distribution of $X$.
For example:
Random variable $X$ with pdf $p(x)=2x$, then its cdf is $P(x)=x^2$. The inverse function of cpf is $P^{-1}(x)=\sqrt{x}$.
vsqrt = np.vectorize(lambda x: x**0.5)
xvalues = np.random.uniform(0, 1, 10000)
xvalues = vsqrt(xvalues)
yvalues = np.random.uniform(0, 1, 10000)
yvalues = vsqrt(yvalues)
plt.scatter(xvalues, yvalues, s=1)
plt.show()

Acceptance-rejection method
Sometimes it is difficult to calculate cdf or the inverse function of cdf. von Neumann proposed a new algorithm to generate pdf from uniform distribution.
Let $X$ be a random variable, and its pdf is $p(x)$, i.e. $X \sim p(x)$.
- Generate a random variable $Y \sim U(a,b)$
- Generate a random variable $Z \sim U(0,c)$, $c$ is the max value of $p(x)$
- If $Z \le p(Y)$, then accpet $Y$, reject otherwise
Let's use $p(x)=2x$ as the example
def pdf2x(count):
xvalues = list()
while len(xvalues) < count:
y = np.random.uniform(0, 1, count) # Y ~ U(a, b)
z = np.random.uniform(0, 2, count) # Z ~ U(0, c)
for i in range(0, count):
if z[i] <= 2*y[i]: # accept
xvalues.append(y[i])
if len(xvalues) == count:
break
return xvalues
xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()

The acceptance-rejection method is very slow because it needs to sample the pdf function. To improve the speed, usually we don't use the max value $c$, instead, we consider a simple function $g(x)$ and make sure $g(x) \ge p(x)$. Then take random variable $Z \sim U(0, g(Y)$. By choosing a good $g(x)$, it can improve the sampling speed a lot.
- Generate a random variable $Y \sim g(x)$
- Generate a random variable $Z \sim U(0,g(Y))$, $c$ is the max value of $p(x)$
- If $Z \le p(Y)$, then accpet $Y$, reject otherwise
def g(x):
return x**0.5
def pdf2x(count):
xvalues = list()
while len(xvalues) < count:
y = g(np.random.uniform(0, 1, 1)[0]) # Y ~ g(x)
z = np.random.uniform(0, g(y), 1)[0] # Z ~ U(0, g(Y))
if z <= 2*y: # accept
xvalues.append(y)
return xvalues
xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()
