Due to the large dimensionality of the parameter space, it is important to
understand the performance of the algorithm when the optimization
problem (4) is approached.
For the conjugate gradients scheme, it is possible to get rigorous
upper bounds[3] to the error
, where
is the energy computed with the trial wave functions at
iteration step k:
In Eq. (5) the condition number c of the
Hessian matrix associated with (4) appears. To get an
efficient scheme and a high rate of convergence
, we want the condition number c to
be close to 1, in other words the spread of the Hessian matrix should
be as small as possible. In this section, we will extend the
convergence analysis by Mauri et al [1], and discuss the
effects of preconditioning.
To simplify matters, we first assume the so-called non-selfconsistent
case, where the gradient
does not depend on the charge
density. This corresponds to minimizing the simplified unconstrained
energy functional:
The operator
is still the same as in (3), but now
does not depend on the wave functions. It turns
out that
is minimized by the eigenfunctions
of the matrix
H which have the smallest eigenvalues
. We will
henceforth call those
the occupied eigenfunctions, since the corresponding electronic levels
are occupied. From a more mathematical point of view,
(6) provides a convenient means of finding the smallest
eigenpairs of a large hermitian matrix H. The relationship of the
conjugate gradient approach (6) with popular Krylov
subspace methods such as the Davidson and Lanczos schemes will be
subject of future work.
Since the minimum of
is given by eigenfunctions of H, we
can find the Hessian matrix of
by performing an
a-posteriori analysis. Following Mauri et al[1], we expand
the occupied wave functions around the minimum in terms of the
complete basis of all N eigenfunctions of H, which we
label in order of increasing eigenvalues:
We further restrict ourselves to real
wave functions, and therefore the expansion coefficients
in
(7) are real. By substituting (7) into
(6) and using
, we obtain to second order in the expansion
coefficients:
The first term on the r.h.s. of (8) is determined only by the
spectrum of H, and does not contain
. The smallest and largest
eigenvalues it contributes to the Hessian matrix
of
(6) are given by
and
, respectively. In other words,
the separation between the occupied and the unoccupied, and the spread
of H determine the eigenvalues entering
from the
first term. The second term on the r.h.s of (8) is directly
related to the implicit normalization constraints, whereas the third term is
due to the implicit orthogonalization constraints between different wave
functions. To see what new eigenvalues enter
due to
the orthonormality constraints, it is sufficient to just analyze the
second term, as it also captures the extreme eigenvalues brought about by
the third term.
First of all,
must be chosen such that
in
order to get the Hessian matrix to be positive definite.
To get good performance,
should be picked such that the
eigenvalues of
stemming from the second term fall
within the range of eigenvalues already present from the first term:
With this choice of shift parameter, the condition number of the Hessian matrix is just given by the spectrum of H, and the conjugate gradient scheme should perform optimally according to (5).
We point out that equation (8) is in disagreement with Mauri
et al[1], who have an additional factor of two in front of
the second and third
term. We also note that there is
another set of modes corresponding to
, but
those have zero eigenvalues, and therefore do not appear in
Eq. (8).
We now briefly discuss the effects of preconditioning on the
performance of the algorithm. Preconditioning requires a matrix
which, when applied from the left to the Hessian matrix
,
brings the condition number of
as close as possible to
one. Preferably,
the application of
to a set of trial wave functions should not
increase the operation
count significantly. For the problem we are considering here, indeed
such a preconditioner exists[2]. It is based on the
fact that the matrix H is diagonally dominant in the lower right
corner, and therefore an approximate inverse exists for this part of
the matrix. Since H enters
via the first term of
Eq. (8), it is possible to construct[2] a
which will reduce the large
's in the first term of
(8), and
thereby the maximum eigenvalues of
. The downside is that
reducing the spread in
eigenvalues arising from the first term renders it more difficult to
pick an optimal
using equation (9), as we will show
in the next section.