Fast hybrid density-functional computations using plane-wave basis - - PowerPoint PPT Presentation
Fast hybrid density-functional computations using plane-wave basis - - PowerPoint PPT Presentation
Fast hybrid density-functional computations using plane-wave basis sets Paolo Giannozzi, Universit` a di Udine, Italy Workshop A fresco of contemporary condensed-matter physics in honour of Pino Grosso Pisa, 27-28 September 2018 Work mostly
Background and motivations
- In many fields of science, atomistic simulations have become very successful ...
- ...
in particular, first-principle simulations based on density-functional theory (DFT), pseudopotentials (PP) and a plane-wave (PW) basis sets
- Hybrid functionals, containing some admixture of exact (Hartee-Fock) exchange,
currently offer the best performances in terms of accuracy of results ...
- ... but they are so slooow with PWs!
The goal of this work: to produce a fast, reliable and robust method to compute the exact-exchange potential in a PW-PP framework
Density-Functional Theory: Kohn-Sham approach
Let us introduce the orbitals ψi for an auxiliary set of non-interacting electrons whose charge density is the same as that of the true system: n(r) =
- i
|ψi(r)|2, ψi|ψj = δij Let us rewrite the energy functional in a more manageable way as: E = Ts[n(r)] + EH[n(r)] + Exc[n(r)] +
- n(r)V (r)dr
where Ts[n(r)] is the kinetic energy of the non-interacting electrons: Ts[n(r)] = − ¯ h2 2m
- i
- ψ∗
i (r)∇2ψi(r)dr,
EH[n(r)] is the Hartree energy, due to electrostatic interactions: EH[n(r)] = e2 2 n(r)n(r′) |r − r′| drdr′,
Exc[n(r)] is called exchange-correlation energy (a reminiscence from the Hartree-Fock theory) and includes all the remaining (unknown!) energy terms. Minimization of the energy with respect to ψi yields the Kohn-Sham (KS) equations:
- − ¯
h2 2m∇2 + V (r) + VH(r) + Vxc(r)
- HKS
ψi(r) = ǫiψi(r), where the Hartree and exchange-correlation potentials: VH(r) = δEH[n(r)] δn(r) = e2
- n(r′)
|r − r′|dr′, Vxc(r) = δExc[n(r)] δn(r) depend self-consistently upon the ψi via the charge density. Self-consistency is achieved by looping over the charge density, using some suitable mixing algorithm, such as e.g. simple mixing. At iteration l + 1: n(l+1)
in
= βn(l)
- ut + (1 − β)n(l)
in,
0 < β < 1
Hybrid functionals
The “standard” functionals, LDA (local density approximation) or GGA (generalized gradient approximation), are functions of the local density n(r) and for GGA of gradient |∇n(r)|: Exc =
- n(r)ǫGGA (n(r), |∇n(r)|) dr
Better-performing hybrid functionals, such as B3LYP or PBE0, contain some amount
- f exact exchange, as in Hartree-Fock theory (spin-restricted):
E = Ts +
- n(r)V (r)dr + EH −e2
2
- i,j
ψ∗
i (r)ψj(r)ψ∗ j(r′)ψi(r′)
|r − r′| drdr′
- EHF
x
In hybrid DFT: E = Ts +
- n(r)V (r)dr + EH +
Ehyb
xc
- αxEHF
x
+ (1 − αx)EDF T
x
+ EDF T
c
, with αx = 0.2 ÷ 0.3.
Exchange potential
The exchange potential Vx is, unlike Vxc in GGA, nonlocal and depends upon the density matrix, γ(r, r′) =
- j
ψj(r)ψ∗
j(r′):
(Vxψi)(r) ≡ δEx δψ∗
i (r) = −e2 j
ψj(r)ψ∗
j(r′)ψi(r′)
|r − r′| dr′ More generally, it may be defined it via its action on a generic orbital: (Vxφi)(r) ≡ −e2
j
ψj(r)ψ∗
j(r′)φi(r′)
|r − r′| dr′ Let us assume that our orbitals are expanded into PWs: ψj(r) =
- G
cj,k+G 1 √ NΩ ei(k+G)·r, ¯ h2 2m|k + G|2 ≤ Ecut k = Bloch vector, G = reciprocal lattice vectors, Ecut = kinetic energy cutoff, Ω = unit cell volume, NΩ = crystal volume.
Exchange potential and plane waves
With PW, orbitals are represented in Fourier space as {cj,k+G}, j = 1, ..., M, where M is the number of orbitals. Vx can be easily computed using fast Fourier transforms (FFT) (in the following, simplified case, k = 0, orbitals are real):
- φi(G)
F F T
− → φi(r) (FFT brings from a discrete grid of G into a discrete grid of r)
- ρij(r) = ψj(r) · φi(r)
- ρij(r)
F F T
− → ρij(G)
- vij(G) = ρij(G)/G2
- vij(G)
F F T
− → vij(r)
- (VXφi)(r) =
j ψj(r) · vij(r)
- (VXφi)(r)
F F T
− → (VXφi)(G) Each operation is very quick ... but O(M 2) such operations are needed!
Hybrid functionals for plane waves
Double-loop algorithm
Pure DFT Outer loop: Do n = 1, ... (VX ϕ(n)
k )(
r) = −
N
- i
ϕ(n)
i
( r)
- d
r ′ ϕ(n)
i
( r ′)ϕ(n)
k (
r ′) | r − r ′| SLOW!! {ϕ(n)
i
, ψ(n,0)
i
} Inner loop: Do m = 1, ... (VX ψ(n,m)
k
)( r) = −
N
- i
ϕ(n)
i
( r)
- d
r ′ ϕ(n)
i
( r ′)ψ(n,m)
k
( r ′) | r − r ′| SLOW!! Iterative diagonalization Check Convergence on {ψ(n,m)
i
} at fixed {ϕ(n)
i
} End Inner loop Check Convergence on {ϕ(n)
i
} End Outer loop
The exchange integrals have to be evaluated at every iteration, even though the {ϕ(n)
i
} functions are not varying in the inner
- nes.
Giannozzi et al., JPCM, 21, 395502 (2009)
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 4 / 18
Exchange with localized orbitals: inner projection
We project the exchange operator over the set of occupied orbitals ˆ WX =
- ij
ˆ VX |ϕi ϕi| ˆ VX|ϕj
−1 ϕj| ˆ
VX =
- i
|ξi ξi| |ξk =
- i
ˆ VX |ϕi L−T
ik
Adaptively Compressed Exchange (ACE) Inner projection methods. The application of ˆ WX during the SCF is equivalent to ˆ VX within the subspace of the projection ˆ WX |ψk =
- i
|ξi ξi|ψk
Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015); L owdin, IJQC, 4, 231 (1971).
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 6 / 18
Hybrid functionals for plane waves
Double-loop algorithm
Pure DFT Outer loop: Do n = 1, ... (VX ϕ(n)
k )(
r) = −
N
- i
ϕ(n)
i
( r)
- d
r ′ ϕ(n)
i
( r ′)ϕ(n)
k (
r ′) | r − r ′| SLOW!! WX =
- i
|ξ(n) ξ(n)| = VX |ϕ(n) ϕ(n)|VX |ϕ(n)
−1 ϕ(n)| VX
WX |ϕ(n)
k = −
- i
|ξ(n)
i
ξ(n)
i
|ϕ(n)
k
{ξ(n)
i
, ψ(n,0)
i
} Inner loop: Do m = 1, ... WX |ψ(n,m)
k
= −
- i
|ξ(n)
i
ξ(n)
i
|ψ(n,m)
k
- Iterative diagonalization
Check Convergence on {ψ(n,m)
i
} at fixed {ξ(n)
i
} End Inner loop Check Convergence on {ϕ(n)
i
} End Outer loop
With the localization step the exchange integrals in the
- uter loop now
involve two localized functions.
Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015).
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 7 / 18
Hybrid functionals for plane waves
2 4 6 8 10 12 14 16
#Iteration
200 400 600 800 1000
Wall time (sec) ACE Full Wall time along SCF iterations Exchange step
With the ACE method the cost of the inner iterations is analogous to the cost of a pure DFT method. Example: Ethylene, 1 CPU, 100 Ry, converged to 10−7 Ry Pure DFT: 6 Outer loop: 5 Inner loop: 2, 3, 3, 2
- Tot. calls: 32/5
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 8 / 18
Localizing orbitals
The key to achieve better performances is orbital localization. There are various ways to localize orbitals in real space, the most famous being the transformation to Wannier
- rbitals.
The density matrix is naturally localized: for insulators, γ(r, r′) ∼ exp(−α|r − r′|)
- asymptotically. This means that w(r) =
j cjψj(r), with cj = ψ∗ j(r′) are localized!
The set of w’s above is redundant: we need a way to select the best manifold among the many possibles, without storing the density matrix. This can be achieved using the selected columns of the density matrix (SCDM)
- algorithm. This is an algebraic procedure that requires a standard QR decomposition
(actually, only the pivoting part): Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015). Once orbitals are localized, exchange integrals between non-overlapping orbitals can be discarded.
Hybrid functionals for plane waves
( ˆ VXwk)( r) = −
N
- i
wi( r)
- d
r ′ wi( r ′)wk( r ′) | r − r ′| The products between two canonical orbitals are much more delocalized than the product of two localized
- rbitals and this can be exploited in order to reduce the
cost of the exchange integrals evaluations.
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 10 / 18
Hybrid functionals for plane waves
The SCDM method has been used for the localization and a threshold has been introduced in order to skip the smallest exchange integrals Sij =
- d
r|wi( r)| · |wj( r)| ≤ threshold Spatial extent of Canonical and Localized (SCDM) orbitals.
Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 11 / 18
Performances: amorphous Silica
432-atom amorphous silica cell Elapsed time (10y min) as a function of the size of the unit cell (10x atoms) a) old algorithm; b) ACE; c) and d) ACE+localization (threshold 0.002 and 0.004); e) simple DFT (no hybrid). Calculations performed on 150 processor cores at CINECA.
Performances: a molecule in water
CO3G molecule in water (565-atom cell)
Method ∆Ehyb ∆ES ∆ǫgas
HL
∆ǫwater
HL
Time ACE
- 24.9794
- 161.3846
2.0521 2.6091 249 (100%) L-ACE (0.001)
- 24.9790
- 161.3457
2.0521 2.6089 92 (36%) L-ACE (0.002)
- 24.9780
- 161.2540
2.0521 2.6085 71 (26%) L-ACE (0.003)
- 24.9767
- 161.1331
2.0521 2.6080 62 (21%) L-ACE (0.004)
- 24.9750
- 161.0085
2.0521 2.6075 56 (18%) L-ACE (0.005)
- 24.9730
- 160.8220
2.0521 2.6071 52 (16%) pure DFT
- 169.4025
1.1085 1.5666 5 (0%)
∆Ehyb: hybrid-pure DFT energy difference/atom ∆ES: solvation energy (kcal/mol) ∆ǫgas
HL: HOMO-LUMO gap (eV) in gas phase
∆ǫwater
HL
: HOMO-LUMO gap (eV) in water In parenthesis the fraction of included integral
Performances: TiO2 nanoparticle
465-atom model of a TiO2 nanoparticle in the anatase structure Method ∆Ehyb ∆ǫHL Time ACE 14.8840 4.3692 1877 (100%) L-ACE (0.001) 14.8844 4.3728 838 (38%) L-ACE (0.002) 14.8859 4.3751 754 (31%) L-ACE (0.003) 14.8879 4.3772 691 (27%) L-ACE (0.004) 14.8900 4.3783 643 (25%) L-ACE (0.005) 14.8929 4.3792 651 (23%) pure DFT 2.9434 112 (0%) ∆Ehyb: hybrid-pure DFT energy difference per atom (kcal/mol). ∆ǫHL: HOMO-LUMO gap (eV). Wall time (min) for one SCF with 288 CPUs. In parenthesis the fraction of included integral.
Conclusion
- Combination of ACE projection and SCDM localization leads to a stable and robust
algorithm allowing one order of magnitude improvements for medium- to large-size cells
- Preliminary results (not shown) for truly periodic systems (with k-points) also show
good performances, although less good than for the Γ−only cases shown here
- Further improvements possible if real-space localization is exploited to get rid of