%% 16. Best and near-best
ATAPformats
%%
% Traditionally, approximation theory has given a great deal of attention
% to best approximations, by which we continue to mean best approximations
% in the $\infty$-norm, and rather less to alternatives such as Chebyshev
% interpolants. One might think that this is because best approximations
% are much better than the alternatives. However, this is not true.
%%
% In a moment we shall continue with Lebesgue constants to shed some light
% on this matter, but first, let us do some experiments. We start with the
% extreme case of a very smooth function, $\exp(x)$, and compare
% convergence of its Chebyshev interpolants $p$ and best approximants
% $p^*$. (The difference between $n$ and $n+1$ in this code is
% intentional, since |chebfun| takes as argument the number of
% interpolation points whereas |remez| takes the degree of the polynomial.)
%%
% \vskip -2em
x = chebfun('x'); f = exp(x); nn = 0:15;
errbest = []; errcheb = []; i = 0;
for n = nn
i = i+1;
[p,err] = remez(f,n);
errbest(i) = err;
errcheb(i) = norm(f-chebfun(f,n+1),inf);
end
hold off, semilogy(nn,errcheb,'.-r')
hold on, semilogy(nn,errbest,'h-b','markersize',6)
FS = 'fontsize';
text(7,3e-12,'||f-p_n^*||',FS,12)
text(9,2e-7,'||f-p_n||',FS,12)
ylim([1e-16 10])
xlabel n, ylabel error
title(['Convergence of best approximation '...
'vs. Chebyshev interpolation: exp(x)'],FS,9)
%%
% \vskip 1pt
%%
% Clearly the stars for $p^*$ aren't much better than the dots for $p$. The
% ratio of the two converges toward 2 until the rounding errors set in for
% larger degrees:
%%
% \vskip -2em
format short
ratio = errcheb./errbest;
disp(' n ratio')
fprintf('%8d %12.5f\n',[nn; ratio])
%%
% At the other extreme of smoothness, consider $|x|$:
%%
% \vskip -2em
f = abs(x); nn = [0 2 4 10 20 40 100 200];
errbest = []; errcheb = []; i = 0;
for n = nn
i = i+1;
[p,err] = remez(f,n);
errbest(i) = err;
errcheb(i) = norm(f-chebfun(f,n+1),inf);
end
hold off, loglog(nn+1,errbest,'h-b','markersize',6)
hold on, loglog(nn+1,errcheb,'.-r')
axis([1 300 .001 2])
text(5,.01,'||f-p_n^*||',FS,12)
text(26,.06,'||f-p_n||',FS,12)
xlabel n, ylabel error
title(['Convergence of best approximation '...
'vs. Chebyshev interpolation: |x|'],FS,9)
%%
% \vskip 1pt
%%
% Again the stars are only a little bit better than the dots, by a constant
% factor of about 2.13060:
%%
% \vskip -2em
ratio = errcheb./errbest;
disp(' n ratio')
fprintf('%8d %12.5f\n',[nn; ratio])
%%
% (For odd values of $n$ the ratio is somewhat larger, approaching a
% constant of about 3.57.)
%%
% So for these examples at least, you don't buy much with best
% approximations. And the cost in computing time is considerable. Here is
% the time for computing a Chebyshev interpolant $p$ of degree 200 and
% evaluating it at 100 points:
%%
% \vskip -2em
xx = rand(100,1);
tic, p = chebfun(f,201); p(xx); toc
%%
% Here is the time for finding the best approximation $p^*$ and
% evaluating it at the same points:
%%
% \vskip -2em
tic, p = remez(f,200); p(xx); toc
%%
% The reason computing $p^*$ is more difficult is that the mapping from $f$
% to $p^*$ is nonlinear, hence requiring iteration in a numerical
% implementation, whereas the mapping from $f$ to $p$ is linear (Exercise
% 10.5). It is perfectly feasible to compute $p$ for degrees in the
% millions, whereas for $p^*$ we would rarely attempt degrees higher than
% hundreds.
%%
% Why has $p^*$ received so much more attention than $p$ over the years?
% One reason is that in the days before fast computers, the degrees were
% low, so small differences in accuracy were more important. Another is
% that the theory of best approximations is so beautiful! Indeed, their
% very nonlinearity makes best approximations seemingly a richer field for
% research than the simpler Chebyshev interpolants. Everybody remembers
% Theorem 10.1, the equioscillation theorem, from the moment they first
% see it.
%%
% Yet in actual computation, true best approximations are not so often
% used, as we have mentioned earlier (Chapter 10). This is a clue that the
% world of practice may have its own wisdom, independent of the theorists.
%%
%
% Now let us see what theoretical results might tell us about the
% difference between $p$ and $p^*$. The first such results pertain to
% Theorems 7.2 and 8.2 given earlier. Those theorems concerned convergence
% rates of $p_n$ to $f$, depending on the smoothness of $f$. What about
% analogous theorems for $p_n^*$? Apart from constant factors, they turn
% out to be the same! For example, exactly the same bound (8.3) was
% published by de la $\hbox{Vall\'ee}$ Poussin [1919, $\hbox{pp.\
% 123--124],}$ except with the Chebyshev interpolant $p_n$ replaced by the
% best approximation $p_n^*$. So within the two classes of functions
% considered in Chapters 7 and 8---$f$ having a $k$th derivative of
% bounded variation, or $f$ being analytic---there is no clear reason to
% expect $p_n^*$ to be much better than $p_n$.
%
%%
% An observation for arbitrary functions $f$ is the following consequence
% of Theorems 15.1--15.3:
%%
% \em
% {\bf Theorem 16.1. Chebyshev projections and interpolants are near-best.}
% Let $f\in C([-1,1])$ have degree $n$
% Chebyshev projection $f_n$, Chebyshev interpolant
% $p_n$, and best approximant $p_n^*$, $n\ge 1$. Then
% $$ \| f-f_n\| \le \left(4 + {4\over \pi^2}\log (n+1)\right)\|f-p_n^*\|
% \eqno (16.1) $$
% and
% $$ \| f-p_n\| \le \left(2 + {2\over \pi} \log (n+1)\right)\|f-p_n^*\|.
% \eqno (16.2) $$
% \vspace{-1em}
%%
% _Proof._ Follows from Theorems 15.1, 15.2, and 15.3.
% $~\hbox{\vrule width 2.5pt depth 2.5 pt height 3.5 pt}$
%%
% So the loss of accuracy in going from $p_n^*$ to $p_n$, say, can never be
% larger than a factor of $2 + (2/\pi)\log (n+1)$. It is interesting to
% examine the size of this quantity for various values of $n$. For
% $n=10^5$, for example:
%%
% \vskip -2em
2 + (2/pi)*log(100001)
%%
% Since this number is less than 10, we see that in dealing with
% polynomials of degree up to $n=100000$, the non-optimality of Chebyshev
% interpolation can never cost us more than one digit of accuracy. Here is
% the computation for $n = 10^{66}$:
%%
% \vskip -2em
2 + (2/pi)*log(1e66)
%%
% So we never lose more than 2 digits for degrees all the way up to
% $10^{66}$---which might as well be $\infty$ for practical purposes.
% (For British audiences, one
% can give a talk on these matters with the title `` $\kern -3pt 10^{66}$
% and All That''.)
%%
% In fact, one might question whether best approximations are really better
% than near-best ones at all. Of course they are better in a literal
% sense, as measured in the $\infty$-norm. However, consider the following
% error curves, which are quite typical for high degree approximation
% of a function that is smoother in some regions than others.
%%
% \vskip -2em
f = abs(x-0.8);
tic, pbest = remez(f,100); toc
hold off, plot(f-pbest,'r')
tic, pcheb = chebfun(f,101); toc
hold on, plot(f-pcheb)
axis([-1 1 -.008 .008]), grid on
title('Best approximation (equiripple) vs. Chebyshev interpolation (spike)',FS,9)
%%
% \vskip 1pt
%%
% We see that |pbest| is worse than |pcheb| for almost all values of
% $x$, because the damage done by the singularity at $x=0.8$ is global. By
% contrast, the effect of the singularity on |pcheb| decays with distance.
% Of course, |pbest| is better in the $\infty$-norm:
%%
% \vskip -2em
errcheb = norm(f-pcheb,inf)
errbest = norm(f-pbest,inf)
%%
% In the 2-norm, however, it is a good deal worse:
%%
% \vskip -2em
errcheb2 = norm(f-pcheb,2)
errbest2 = norm(f-pbest,2)
%%
% One might question how many applications there might be in which
% |pbest| was truly better than |pcheb| as an approximation to this
% function $f$. To echo a title of Corless and Watt [2004], minimax
% approximations are optimal, but Chebyshev interpolants may sometimes
% be better!
%%
% Li [2004] takes another angle on the near-optimality of
% Chebyshev interpolants, pointing out that for applications
% to elementary functions, bounds on certain derivatives usually
% hold that imply that the error in interpolation in Chebyshev points
% of the first kind exceeds that of the best approximation
% by less than a factor of 2, or as he calls it, ``a fractional bit.''
%%
% From a more theoretical point of view, we return to a notion
% mentioned in Theorem 12.1. Given
% $f\in C([-1,1])$, let $\rho$ $(1\le \rho \le \infty)$ be the parameter
% of the largest Bernstein ellipse $E_\rho$ to which $f$ can be analytically
% continued, and let $\{\kern .5pt p_n\}$ be any sequence of approximations
% to $f$ with $p_n\in {\cal P}_n$. Then
% $$ \limsup_{n\to\infty} \|f-p_n\|^{1/n} \ge \rho^{-1}, $$
% and if equality holds, $\{\kern .5pt p_n\}$ is said to be
% _maximally convergent._ It follows from Theorem 15.1 that
% if $\{\kern .5pt p_n\}$ come from a linear projection with Lebesgue
% constants $\Lambda_n$ that grow more slowly than exponentially as
% $n\to\infty$, i.e., with $\limsup_{n\to\infty} \Lambda_n^{1/n} = 1$, then
% $\{\kern .5pt p_n\}$ is maximally convergent for every $f\in C([-1,1])$.
% In particular, Chebyshev projections and interpolants are maximally
% convergent. This is a precise sense in which such approximations
% are ``near-best''.
%%
% Finally, we mention another kind of optimality that has received
% attention in the approximation theory literature [Bernstein 1931,
% $\hbox{Erd\H os}$ 1961, Kilgore 1978, de Boor & Pinkus 1978]: _optimal
% interpolation points_ (Exercise 15.5). Chebyshev points are very good,
% but they do not quite minimize the Lebesgue constant. Optimal points
% minimize the Lebesgue constant (by definition), and they level out the
% peaks of the Lebesgue function exactly (it has been proved)---but the
% improvement is negligible. The first statement of Theorem 15.2
% establishes that, like Chebyshev points, they lead to Lebesgue constants
% that are asymptotic to $(2/\pi)\log n$ as $n\to\infty$, which means they
% do not even improve upon Chebyshev points by a constant factor.
%%
%
% \begin{displaymath}
% \framebox[4.7in][c]{\parbox{4.5in}{\vspace{2pt}\sl
% {\sc Summary of Chapter 16.}
% The $\infty$-norm error in degree $n$ Chebyshev interpolation is never
% greater than $2+(2/\pi)\log(n+1)$ times the $\infty$-norm error in
% degree $n$ best approximation, and in practice, the ratio of errors
% rarely exceeds even a factor of\/ $2$.\ \ In the $2$-norm, the
% interpolant is often much better than the best approximation.
% \vspace{2pt}}}
% \end{displaymath}
%
%%
% \small\smallskip\parskip=2pt
% {\bf Exercise 16.1. Computing times for interpolation
% and best approximation.}
% (a) Repeat the experiment of this chapter involving $|x-0.8|$ but for all
% the values $n = 100, 200, 300, \dots ,1000$. In each case measure the
% computing times for Chebyshev interpolation and best approximation as
% calculated by the Chebfun {\tt remez} command, the
% $L^2$ errors of both approximants, and the $L^\infty$ errors. Plot these
% results and comment on what you find. (b) In particular, produce a plot
% of error curves like that in the text. You may find it helpful to use a
% flag like {\tt 'numpts',10000} in your Chebfun plotting command.
% \par
% {\bf Exercise 16.2. Approximation of a wiggly function.}
% Define $f(x) = T_{200}(x)+T_{201}(x) + \cdots + T_{220}(x)$.
% Construct the Chebyshev interpolant $p$ and best approximation
% $p^*$ of degree 199. Plot the errors and measure the
% $\infty$- and $2$-norms.
% \par
% {\bf Exercise 16.3. Rounding errors on a grid of \boldmath$10^{66}$ points.}
% Suppose we had a computer with 16-digit precision capable of applying the
% barycentric formula (5.13) to evaluate a polynomial interpolant $p(x)$ for
% data on a Chebyshev
% grid of $10^{66}$ points. (For the sake of this thought experiment, imagine
% that the differences $x-x_j$ can be evaluated correctly to $16$-digit precision
% rather than coming out as $0$ and thereby invoking the
% $x=x_j$ clause of Theorem 5.2.)
% The evaluation would require adding up about $10^{66}$ numbers, entailing
% about $10^{66}$ rounding errors. Even if these errors only accumulated
% in the square root fashion of a random walk, it would still
% seem we must end up with errors on the order of $10^{33}$ times
% $10^{-16}$, destroying all accuracy. Yet in fact, the computation would
% be highly accurate. What is the flaw in this $10^{33}$ reasoning?
% \par