# Random Distribution

Random variable $X$ is a function, it maps a set of random events to a set of numerical values. The random variable can either be discrete or continuous. A continuous random variable uses probability density function $p(x)$ to describe.

## Probability density function, pdf

Probability density function describes the random distribution of a continues random variable, and its integral over the entire space $[a,b]$ is equal to 1.

$$\int_{a}^{b}p(x)dx = 1$$

For $i \le x \le j$, the probability is

$$P = \int_{i}^{j}p(x)dx$$

## Cumulative probability distribution function, cdf

$X$ is the random variable defines on $[a,b]$, its pdf is $p(x)$. Define its cdf

$$P(x) = \int p(x)dx$$

and $P(a)=0$, $P(b)=1$. So for $i \le x \le j$, the probability is $P(j)-P(i)$.

## Uniform distribution

If a random variable $X$ defined over the space $[a,b]$ follows uniform distribution, its probability density function is a constant function:

$$p(x) = \frac{1}{b-a}$$

import numpy as np
from matplotlib import pyplot as plt

xvalues = np.random.uniform(0, 1, 10000)
yvalues = np.random.uniform(0, 1, 10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()

## Generate pdf from uniform distribution

### Inverse function method

Let $X$ be a random variable, and its pdf is $p(x)$, and its cdf is $P(x)$. Let $Y$ be a random variable follows uniform distribution, the random variable $P^{-1}(Y)$ has the same distribution of $X$.

For example:

Random variable $X$ with pdf $p(x)=2x$, then its cdf is $P(x)=x^2$. The inverse function of cpf is $P^{-1}(x)=\sqrt{x}$.

vsqrt = np.vectorize(lambda x: x**0.5)

xvalues = np.random.uniform(0, 1, 10000)
xvalues = vsqrt(xvalues)
yvalues = np.random.uniform(0, 1, 10000)
yvalues = vsqrt(yvalues)

plt.scatter(xvalues, yvalues, s=1)
plt.show()

### Acceptance-rejection method

Sometimes it is difficult to calculate cdf or the inverse function of cdf. von Neumann proposed a new algorithm to generate pdf from uniform distribution.

Let $X$ be a random variable, and its pdf is $p(x)$, i.e. $X \sim p(x)$.

1. Generate a random variable $Y \sim U(a,b)$
2. Generate a random variable $Z \sim U(0,c)$, $c$ is the max value of $p(x)$
3. If $Z \le p(Y)$, then accpet $Y$, reject otherwise

Let's use $p(x)=2x$ as the example

def pdf2x(count):
xvalues = list()
while len(xvalues) < count:
y = np.random.uniform(0, 1, count) # Y ~ U(a, b)
z = np.random.uniform(0, 2, count) # Z ~ U(0, c)
for i in range(0, count):
if z[i] <= 2*y[i]: # accept
xvalues.append(y[i])
if len(xvalues) == count:
break
return xvalues

xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()

The acceptance-rejection method is very slow because it needs to sample the pdf function. To improve the speed, usually we don't use the max value $c$, instead, we consider a simple function $g(x)$ and make sure $g(x) \ge p(x)$. Then take random variable $Z \sim U(0, g(Y)$. By choosing a good $g(x)$, it can improve the sampling speed a lot.

1. Generate a random variable $Y \sim g(x)$
2. Generate a random variable $Z \sim U(0,g(Y))$, $c$ is the max value of $p(x)$
3. If $Z \le p(Y)$, then accpet $Y$, reject otherwise
def g(x):
return x**0.5

def pdf2x(count):
xvalues = list()
while len(xvalues) < count:
y = g(np.random.uniform(0, 1, 1)) # Y ~ g(x)
z = np.random.uniform(0, g(y), 1) # Z ~ U(0, g(Y))
if z <= 2*y: # accept
xvalues.append(y)
return xvalues

xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()