Random Distribution

A random variable $X$ is a function that maps a set of random events to a set of numerical values. A random variable can be either discrete or continuous; a continuous random variable is described by a probability density function $p(x)$.

Probability density function, pdf

The probability density function describes the distribution of a continuous random variable, and its integral over the entire domain $[a,b]$ equals 1.

$$\int_{a}^{b}p(x)dx = 1$$

For $a \le i \le j \le b$, the probability that $X$ falls in $[i,j]$ is

$$P = \int_{i}^{j}p(x)dx$$
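As a quick numerical check, here is a minimal sketch that approximates these integrals with a midpoint Riemann sum, using the pdf $p(x)=2x$ on $[0,1]$ that serves as the running example later in this note:

import numpy as np

def integrate(f, lo, hi, n=100000):
    # Midpoint Riemann sum approximation of the integral of f over [lo, hi]
    x = np.linspace(lo, hi, n, endpoint=False) + (hi - lo) / (2 * n)
    return np.sum(f(x)) * (hi - lo) / n

p = lambda x: 2 * x                 # example pdf on [0, 1]
print(integrate(p, 0, 1))           # total probability, ~1.0
print(integrate(p, 0.25, 0.5))      # P(0.25 <= X <= 0.5), ~0.1875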

Cumulative distribution function, cdf

Let $X$ be a random variable defined on $[a,b]$ with pdf $p(x)$. Define its cdf as

$$P(x) = \int_{a}^{x}p(t)dt$$

By this definition $P(a)=0$ and $P(b)=1$, so for $i \le x \le j$ the probability is $P(j)-P(i)$.
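For example, with $p(x)=2x$ on $[0,1]$ (the running example used later), $P(x)=x^2$ and

$$P(0.5)-P(0.25) = 0.25 - 0.0625 = 0.1875,$$

matching the numerical estimate above.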

Uniform distribution

If a random variable $X$ defined over the interval $[a,b]$ follows the uniform distribution, its probability density function is a constant function:

$$p(x) = \frac{1}{b-a}$$

import numpy as np
from matplotlib import pyplot as plt

xvalues = np.random.uniform(0, 1, 10000)  # 10,000 samples from U(0, 1)
yvalues = np.random.uniform(0, 1, 10000)
plt.scatter(xvalues, yvalues, s=1)  # points spread evenly over the unit square
plt.show()
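The scatter plot only shows that the samples spread out evenly; a density histogram makes the constant pdf visible. A minimal check, assuming $a=0$, $b=1$ so that $p(x)=1$:

# Histogram of U(0, 1) samples; with density=True the bars hover around 1/(b-a) = 1
samples = np.random.uniform(0, 1, 100000)
plt.hist(samples, bins=50, density=True)
plt.axhline(1.0, color='red')  # theoretical pdf p(x) = 1
plt.show()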

Sampling from a pdf using the uniform distribution

Inverse function method

Let $X$ be a random variable with pdf $p(x)$ and cdf $P(x)$. Let $Y$ be a random variable following the uniform distribution $U(0,1)$; then the random variable $P^{-1}(Y)$ has the same distribution as $X$.
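A short justification, assuming $P$ is strictly increasing: for $Y \sim U(0,1)$,

$$\Pr\left(P^{-1}(Y) \le x\right) = \Pr\left(Y \le P(x)\right) = P(x)$$

which is exactly the cdf of $X$.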

For example:

Consider a random variable $X$ on $[0,1]$ with pdf $p(x)=2x$; its cdf is $P(x)=x^2$. The inverse function of the cdf is $P^{-1}(x)=\sqrt{x}$.

# Apply the inverse cdf to uniform samples: sqrt(U) follows the pdf p(x) = 2x
xvalues = np.sqrt(np.random.uniform(0, 1, 10000))
yvalues = np.sqrt(np.random.uniform(0, 1, 10000))

plt.scatter(xvalues, yvalues, s=1)  # points cluster toward larger values
plt.show()
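To verify that the transformed samples really follow $p(x)=2x$, one simple check is to overlay a density histogram with the target pdf:

# Histogram of sqrt(U) samples compared against the target pdf p(x) = 2x
samples = np.sqrt(np.random.uniform(0, 1, 100000))
plt.hist(samples, bins=50, density=True)
xs = np.linspace(0, 1, 100)
plt.plot(xs, 2 * xs, color='red')  # target pdf
plt.show()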

Acceptance-rejection method

Sometimes it is difficult to compute the cdf or its inverse. Von Neumann proposed an algorithm that generates samples from a pdf using only the uniform distribution.

Let $X$ be a random variable with pdf $p(x)$, i.e. $X \sim p(x)$. To generate samples of $X$:

  1. Generate a random variable $Y \sim U(a,b)$
  2. Generate a random variable $Z \sim U(0,c)$, where $c$ is the maximum value of $p(x)$ on $[a,b]$
  3. If $Z \le p(Y)$, then accept $Y$; reject otherwise

Let's use $p(x)=2x$ on $[0,1]$ as the example, so $a=0$, $b=1$, and $c=2$:

def pdf2x(count):
    xvalues = list()
    while len(xvalues) < count:
        y = np.random.uniform(0, 1, count) # Y ~ U(a, b)
        z = np.random.uniform(0, 2, count) # Z ~ U(0, c)
        for i in range(0, count):
            if z[i] <= 2*y[i]: # accept
                xvalues.append(y[i])
            if len(xvalues) == count:
                break
    return xvalues

xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()
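With the constant bound $c=2$, the expected acceptance rate is the area under $p(x)$ divided by the area of the sampling rectangle, $1/(c \cdot (b-a)) = 0.5$, so roughly half of the candidates are wasted. A quick empirical check of that rate:

# Estimate the acceptance rate for p(x) = 2x with the constant bound c = 2
y = np.random.uniform(0, 1, 100000)  # candidates Y ~ U(0, 1)
z = np.random.uniform(0, 2, 100000)  # Z ~ U(0, 2)
print(np.mean(z <= 2 * y))           # ~0.5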

The acceptance-rejection method can be slow: with a constant bound $c$, many candidates are rejected wherever $p(x)$ is far below $c$. To improve the speed, instead of the constant $c$ we choose a simple envelope function $g(x)$ with $g(x) \ge p(x)$, draw $Y$ from the distribution proportional to $g(x)$, and take $Z \sim U(0, g(Y))$. A $g(x)$ that hugs $p(x)$ closely can improve the sampling speed a lot.

  1. Generate a random variable $Y$ from the distribution proportional to $g(x)$
  2. Generate a random variable $Z \sim U(0,g(Y))$
  3. If $Z \le p(Y)$, then accept $Y$; reject otherwise

For $p(x)=2x$ on $[0,1]$, one valid envelope is $g(x)=2\sqrt{x}$, since $2\sqrt{x} \ge 2x$ on this interval. Samples from the distribution proportional to $g(x)$ can be drawn with the inverse function method: the normalized pdf is $\frac{3}{2}\sqrt{x}$, its cdf is $x^{3/2}$, so $Y = U^{2/3}$ with $U \sim U(0,1)$.

def g(x):
    return 2 * x**0.5  # envelope: g(x) = 2*sqrt(x) >= p(x) = 2x on [0, 1]

def pdf2x(count):
    xvalues = list()
    while len(xvalues) < count:
        u = np.random.uniform(0, 1, 1)[0]
        y = u ** (2.0 / 3.0)                 # Y ~ g(x) (inverse function method)
        z = np.random.uniform(0, g(y), 1)[0] # Z ~ U(0, g(Y))
        if z <= 2*y:                         # accept if Z <= p(Y)
            xvalues.append(y)
    return xvalues

xvalues = pdf2x(10000)
yvalues = pdf2x(10000)
plt.scatter(xvalues, yvalues, s=1)
plt.show()
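With the envelope $g(x)=2\sqrt{x}$ assumed above, the expected acceptance rate rises to $\int p \,/ \int g = 1/(4/3) = 0.75$, which can also be checked empirically:

# Estimate the acceptance rate with the envelope g(x) = 2*sqrt(x)
u = np.random.uniform(0, 1, 100000)
y = u ** (2.0 / 3.0)                    # Y ~ g(x)
z = np.random.uniform(0, g(y), 100000)  # Z ~ U(0, g(Y)), elementwise
print(np.mean(z <= 2 * y))              # ~0.75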