plotopt = {'linewidth',2,'markersize',12};
Mercer's theorem is a continuous analog of the singular-value or eigenvalue decomposition of a symmetric positive definite matrix. One of its main applications is to find convenient ways to express stochastic processes, via the Karhunen-Loeve expansion [1].
Mercer's theorem
Suppose $K(s,t)$ is a symmetric (that is, $K(t,s)=K(s,t)$), continuous, and nonnegative definite kernel function on $[a,b]\times [a,b]$. Mercer's theorem asserts that there is an orthonormal set of eigenfunctions $\psi_j(x)$ and eigenvalues $\lambda_j$ such that
$$ K(s,t) = \sum_j^\infty \lambda_j \psi_j(s) \psi_j(t), $$
where the values and functions satisfy the integral eigenvalue equation
$$ \lambda_j \psi_j(s) = \int_a^b K(s,t) \psi_j(t). $$
For example, suppose we have an exponentially decaying kernel:
K = @(s,t) exp(-abs(s-t));
We can create the integral operator and find the leading terms of its Mercer decomposition numerically.
F = chebop(@(u) fred(K, u)); [Psi,Lambda] = eigs(F,20,'lm'); Psi = chebfun(Psi); [lambda,idx] = sort(diag(Lambda),'descend'); Psi = Psi(:,idx);
plot(Psi(:,[1 2 5 10]),plotopt{:}) title('First four Mercer eigenfunctions') xlabel('x') ylabel('\Psi(x)')
The eigenfunctions returned by eigs
are orthonormal.
format short Psi(:,1:6)'*Psi(:,1:6)
ans = 1.0000 0.0000 -0.0000 -0.0000 0.0000 0.0000 0.0000 1.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 1.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 1.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 -0.0000 0.0000 1.0000
The truncation of the Mercer sum does lead to an underestimate of the values of the kernel $K(s,t)$. For our example, we should get $K(s,s)=1$, but we get noticeably less.
Psi(0,:)*diag(lambda)*Psi(0,:)' Psi(0.95,:)*diag(lambda)*Psi(0.95,:)'
ans = 0.9799 ans = 0.9825
In fact, the eigenvalues decrease only like $O(n^{-2})$, which makes the pointwise convergence in the number of terms rather slow.
loglog(lambda,'.',plotopt{:}), axis tight xlabel('n') ylabel('| \lambda_n |')
Karhunen-Loeve expansion
Now suppose that $X(t,\omega)$ is a stochastic process for $t$ in some interval $[a,b]$ and $\omega$ in some probability space. The process is often characterized by its mean, $\mu(t)$, and its covariance, $K(s,t)$, the expected value of $(X(s)-\mu(s))(X(t)-\mu(t))$. Using Mercer's theorem on $K$, we can express the process by the K-L expansion
$$ X(t,\omega) = \mu(t) + \sum_j^\infty \sqrt(\lambda_j) \psi_j(t) Z_j(\omega), $$
where $\lambda_j$ and $\psi_j$ are Mercer eigenmodes for $K$, and the $Z_j$ are uncorrelated and of unit variance.
K-L is a generalization of the singular value decomposition of a matrix, which can be written as a sum of outer products of vectors. The covariance $K$ plays the role of the Gram matrix inner products (in probability) of "columns" of the process for different values of $s$ and $t$. A number of SVD results have K-L analogs, most notably that the best approximation of the process results from truncating the expansion, if the eigenvalues are arranged in nonincreasing order.
Because the $Z_j$ in the expansion are uncorrelated, the variance of $X$ is just the sum of the eigenvalues. This is the trace of $K$, which is the integral of $K(s,s)$; in this case, the result is $2$. But we can also calculate the variance in a truncation of the expansion by summing only some of the eigenvalues. For example, suppose the process $X$ has the exponential covariance in $K$ above. The eigenvalues show that $95\%$ of the variance in the process is captured by the first $10$ K-L modes:
captured = sum(lambda(1:10)) / 2
captured = 0.9579
We can find realizations of $X$ by selecting the random parameters $Z_j$ in the expansion.
Z = randn(10,400); L = diag( sqrt(lambda(1:10)) ); X = Psi(:,1:10)*(L*Z); plot(X(:,1:40)) mu = sum(X,2)/400; hold on, plot(mu,'k',plotopt{:}) title('Random realizations, and the mean')
We should get roughly the original covariance function back. (We'll discretize the computation for speed.)
points = (-1:.05:1)'; [S,T] = meshgrid(points); C = cov( X(points,:)' ); % covariance at discrete locations clf, mesh(S,T,C) hold on, plot3(S,T,K(S,T),'k.',plotopt{:})
If we shorten the correlation length of the process relative to the domain (i.e., more randomness), the amount of variance captured by the first $10$ modes will decrease.
K = @(s,t) exp(-4*abs(s-t)); % decrease correlation faster, then... F = chebop(@(u) fred(K, u) ); lambdaShort = sort( eigs(F,24,'lm'), 'descend' );
clf loglog(lambda,'b.',plotopt{:}) hold on loglog(lambdaShort,'r.',plotopt{:}), axis tight xlabel('n') ylabel('| \lambda_n |') captured = sum(lambdaShort(1:10)) / 2 % ... a smaller fraction is captured
captured = 0.6744
References
- D. Xu, Numerical Methods for Stochastic Computations, Princeton University Press, 2010.