Sampling from a probability distribution

A common problem in applications of random variables is to draw samples from a given distribution. It's easy to find computer codes for generating pseudorandom numbers that are distributed uniformly or normally, and these usually must be converted to simulate a different target distribution. The key steps are integration and function inversion, which Chebfun can do with great accuracy.

LW = 'linewidth'; FS = 'fontsize'; MS = 'markersize';
splitting off

Warning: The syntax 'splitting on' is deprecated.
Please see CHEBFUNPREF documentation for further details.

von Mises distribution

The von Mises distribution is a periodic variant of the normal distribution. While the density is easily defined, it's otherwise not simple to work with analytically.

We start with the density function, normalized to give total probability 1.

kappa = 1.5;
f = chebfun(@(x) exp(kappa*cos(x)),[-pi pi]);
density = f/sum(f);

Now we integrate to get the cumulative distribution function.

cdf = cumsum(density);
plot([density,cdf],LW,1.6), axis([-pi pi 0 1])
title('von Mises distribution',FS,12)
legend('density','distribution','Location','northwest')

Sampling from this distribution involves applying its inverse to uniformly sampled points. We could do this one-by-one using roots, but for a large number of points it is more efficient to find a chebfun for the inverse function with inv:

cdfinv = inv(cdf);
plot(cdfinv,LW,1.6)
title('Inverse of von Mises distribution',FS,12)

Now the resampling is easy. We compare the resulting histogram to the original von Mises density.

u = rand(1e4,1);                           % uniform
x = cdfinv(u);                             % von Mises
[count,bin] = hist(x,36);
count = count/sum(count*(bin(2)-bin(1)));  % renormalize, total area = 1
cla, bar(bin,count), hold on
plot(density,'r',LW,1.6), axis tight
title('Sampled points and the orignal density',FS,12)

Logit-normal distribution

A more exotic and troublesome distribution is the logit-normal distribution. Its density and cdf are easy enough to define:

sig = 1.11;
f = @(x) exp( -(log(x./(1-x))).^2/(2*sig^2))./(x.*(1-x));
density = chebfun(f,[0 1]);
density = density/sum(density);
cdf = cumsum(density);
clf, plot([density,cdf],LW,1.6)
title('logit-normal distribution',FS,12)
legend('density','distribution','Location','northwest')

However, because $F'=f=0$ at the ends, the inverse function has infinite slope at the ends, and a straightforward inversion will fail. To cope with this, we'll take some shortcuts. First, we'll use symmetry to restrict attention to $x> 1/2$. Second, we'll put Chebfun into splitting mode to help cope with the endpoint slope. Finally, we'll truncate the domain of the cdf slightly.

splitting on
cdfinv = inv( cdf{0.5,1-1e-3} );
clf, plot(cdfinv,LW,1.6)
title('Inverse of the logit-normal distribution',FS,12)

To apply the result for resampling, we have to reflect uniform values less than $1/2$ back into $[1/2,1]$, and reflect the results back.

u = rand(1e4,1);
flag = (u < 0.5);  u(flag) = 1-u(flag);
x = cdfinv( u );  x(flag) = 1-x(flag);
[count,bin] = hist(x,36);
count = count/sum(count*(bin(2)-bin(1)));  % renormalize, total area = 1
clf, bar(bin,count), hold on
plot(density,'r',LW,1.6), axis tight
title('Sampled points and the orignal density',FS,12)

We can see what our truncation of the original random variable costs us by looking at the domain of the inverse cdf:

cdfinv.ends.'
missing = 1 - ans(end)

ans =
   0.500000000000073
   0.998046874756072
   0.999992370360587
   0.999999880545825
   0.999999999755115
missing =
     2.448851121883422e-10

Thus, a uniform variable that takes a value closer to 1 than this number won't be mapped accurately back to the logit-normal variable we want, unless we take further steps. Clearly, such events will be extremely rare.

Sampling from a probability distribution

Toby Driscoll, December 2011 in statsdownload·view on GitHub

von Mises distribution

Logit-normal distribution