next up previous
Next: Unconstrained Energy Functional Up: Conjugate-Gradient Based Electronic Structure Previous: Conjugate-Gradient Based Electronic Structure

Introduction

The enormous increase in available computing power is certainly one of the key factors for the recent advances in computational condensed matter theory. Using density functional theory (DFT) in the local density approximation (LDA) or combined with generalized gradient approximations (GGAs), it is now possible to compute with sufficient accuracy the ground state properties of quantum-mechanical systems up to hundreds of atoms large. A wealth of information can be extracted from these CPU-intensive calculations, e.g. binding energies, stable configurations, magnetic and dielectric properties, and insight into electronic properties such as the charge density, band structure, and the quantum-mechanical wave functions.

To reduce the computational effort, we use pseudopotentials to represent the atomic nucleus and the core electrons, leaving only the set of m valence wave functions to be dealt with. Those are determined by the requirement that they minimize the DFT/LDA energy functional

 

subject to the constraint that the be mutually orthogonal and normalized. With a suitable construction of the pseudopotentials, the wave functions in (1) can be represented in a plane-wave basis of size N, thus turning (1) into a minimization problem of dimension Nm, with N often as large as , but m being substantially smaller, typically 1/1000 to 1/100 of N.

Fortunately, the derivative

 

(not including the constraints!) is available to aid the search for the minimum. In a plane-wave basis, the Hamiltonian operator turns into the () matrix H, which can easily become too large to fit into memory. However, most iterative schemes require only a rule describing how to apply to a given wave function, thus removing the need for explicit storage of H. The separable Kleinman-Bylander pseudopotentials allow for a particularly efficient representation of , namely a decomposition of the form:

 

In Eq. (3), the first term is the kinetic energy operator, which is proportional to the Laplacian and therefore diagonal in a plane wave basis . However, it grows like with the wave number , and gives the matrix H its characteristic diagonally-dominant shape. It also motivates our convention for arranging the plane wave basis according to increasing G. That way, the lower right corner of H will be dominated by the diagonal elements arising from . The second so-called nonlocal term in (3) consists of a sum of projectors onto one-dimensional subspaces, and requires only operations to be applied to a set of trial wave functions. Finally, the local potential term in (3) is diagonal in a realspace basis. Therefore, it is applied to a trial wave function by first inverse Fourier transforming (operation count ), then multiplying with a diagonal operator in realspace (operation count ), and finally Fourier transforming back to the plane wave basis. Since , and m and N are proportional to the number of atoms , the second term applied to a set of wave functions requires operationsgif. The local term only scales like , but has a large prefactor in front, such that it dominates the computational cost for systems with less than 50 atoms.

Given the gradient (2), the minimization problem (1) can be tackled with a standard optimization scheme. The use of the popular conjugate gradients method is impeded by the need to include the orthonormality constraints[2]. We follow the work of Mauri et al [1] and use a functional with implicit orthonormality constraints, which allows us to use a textbook conjugate gradients scheme. In section 3, we exploit this to assess the convergence properties of the algorithm for a representative, but small model problem. A numerical example verifies our analysis and demonstrates the effect of preconditioning.

Even with the efficient representation (3), the task of minimizing E in (1) is formidable. Only the use of parallel computers will allow to explore the very large, interesting systems such as surfaces, interfaces, alloys, amorphous solids, liquids, and large molecules. There is agreement that the best way to parallelize the minimization problem is by partitioning all vector quantities (i.e. wave functions) according to their plane wave basis indices[4]. Then, dot products between two vectors are completely local, and only require a global summation at the end. Concerning the parallel performance, the challenge lies in the Fourier transform. We will show that even with an MPI explicit message passing approach, a high-bandwidth low-latency interconnect like the one of the Cray T3E is necessary to give good application performance with a larger number of processors. We also compare the performance of the T3E to the one of the SGI PowerChallenge as a prototypical symmetric multiprocessor.



next up previous
Next: Unconstrained Energy Functional Up: Conjugate-Gradient Based Electronic Structure Previous: Conjugate-Gradient Based Electronic Structure



Bernd Pfrommer
Tue Jul 22 17:09:54 PDT 1997