next up previous
Next: A numerical example Up: Conjugate-Gradient Based Electronic Structure Previous: Unconstrained Energy Functional

Convergence Analysis

Due to the large dimensionality of the parameter space, it is important to understand the performance of the algorithm when the optimization problem (4) is approached. For the conjugate gradients scheme, it is possible to get rigorous upper bounds[3] to the error , where is the energy computed with the trial wave functions at iteration step k:

 

In Eq. (5) the condition number c of the Hessian matrix associated with (4) appears. To get an efficient scheme and a high rate of convergence , we want the condition number c to be close to 1, in other words the spread of the Hessian matrix should be as small as possible. In this section, we will extend the convergence analysis by Mauri et al [1], and discuss the effects of preconditioning.

To simplify matters, we first assume the so-called non-selfconsistent case, where the gradient does not depend on the charge density. This corresponds to minimizing the simplified unconstrained energy functional:

 

The operator is still the same as in (3), but now does not depend on the wave functions. It turns out that is minimized by the eigenfunctions of the matrix H which have the smallest eigenvalues . We will henceforth call those the occupied eigenfunctions, since the corresponding electronic levels are occupied. From a more mathematical point of view, (6) provides a convenient means of finding the smallest eigenpairs of a large hermitian matrix H. The relationship of the conjugate gradient approach (6) with popular Krylov subspace methods such as the Davidson and Lanczos schemes will be subject of future work.

Since the minimum of is given by eigenfunctions of H, we can find the Hessian matrix of by performing an a-posteriori analysis. Following Mauri et al[1], we expand the occupied wave functions around the minimum in terms of the complete basis of all N eigenfunctions of H, which we label in order of increasing eigenvalues:

 

We further restrict ourselves to real wave functions, and therefore the expansion coefficients in (7) are real. By substituting (7) into (6) and using , we obtain to second order in the expansion coefficients:

 

The first term on the r.h.s. of (8) is determined only by the spectrum of H, and does not contain . The smallest and largest eigenvalues it contributes to the Hessian matrix of (6) are given by and , respectively. In other words, the separation between the occupied and the unoccupied, and the spread of H determine the eigenvalues entering from the first term. The second term on the r.h.s of (8) is directly related to the implicit normalization constraints, whereas the third term is due to the implicit orthogonalization constraints between different wave functions. To see what new eigenvalues enter due to the orthonormality constraints, it is sufficient to just analyze the second term, as it also captures the extreme eigenvalues brought about by the third term.

First of all, must be chosen such that in order to get the Hessian matrix to be positive definite. To get good performance, should be picked such that the eigenvalues of stemming from the second term fall within the range of eigenvalues already present from the first term:

 

With this choice of shift parameter, the condition number of the Hessian matrix is just given by the spectrum of H, and the conjugate gradient scheme should perform optimally according to (5).

We point out that equation (8) is in disagreement with Mauri et al[1], who have an additional factor of two in front of the second and third term. We also note that there is another set of modes corresponding to , but those have zero eigenvalues, and therefore do not appear in Eq. (8).

We now briefly discuss the effects of preconditioning on the performance of the algorithm. Preconditioning requires a matrix which, when applied from the left to the Hessian matrix , brings the condition number of as close as possible to one. Preferably, the application of to a set of trial wave functions should not increase the operation count significantly. For the problem we are considering here, indeed such a preconditioner exists[2]. It is based on the fact that the matrix H is diagonally dominant in the lower right corner, and therefore an approximate inverse exists for this part of the matrix. Since H enters via the first term of Eq. (8), it is possible to construct[2] a which will reduce the large 's in the first term of (8), and thereby the maximum eigenvalues of . The downside is that reducing the spread in eigenvalues arising from the first term renders it more difficult to pick an optimal using equation (9), as we will show in the next section.



next up previous
Next: A numerical example Up: Conjugate-Gradient Based Electronic Structure Previous: Unconstrained Energy Functional



Bernd Pfrommer
Tue Jul 22 17:09:54 PDT 1997