Fast hybrid density-functional computations using plane-wave basis - - PowerPoint PPT Presentation

fast hybrid density functional computations using plane
SMART_READER_LITE
LIVE PREVIEW

Fast hybrid density-functional computations using plane-wave basis - - PowerPoint PPT Presentation

Fast hybrid density-functional computations using plane-wave basis sets Paolo Giannozzi, Universit` a di Udine, Italy Workshop A fresco of contemporary condensed-matter physics in honour of Pino Grosso Pisa, 27-28 September 2018 Work mostly


slide-1
SLIDE 1

Fast hybrid density-functional computations using plane-wave basis sets

Paolo Giannozzi, Universit` a di Udine, Italy Workshop A fresco of contemporary condensed-matter physics in honour of Pino Grosso – Pisa, 27-28 September 2018 Work mostly done by Ivan Carnimeo, in collaboration with Stefano Baroni (SISSA), in the framework of MaX - Materials at the eXascale Centre of Excellence

– Typeset by FoilT EX –

slide-2
SLIDE 2

Background and motivations

  • In many fields of science, atomistic simulations have become very successful ...
  • ...

in particular, first-principle simulations based on density-functional theory (DFT), pseudopotentials (PP) and a plane-wave (PW) basis sets

  • Hybrid functionals, containing some admixture of exact (Hartee-Fock) exchange,

currently offer the best performances in terms of accuracy of results ...

  • ... but they are so slooow with PWs!

The goal of this work: to produce a fast, reliable and robust method to compute the exact-exchange potential in a PW-PP framework

slide-3
SLIDE 3

Density-Functional Theory: Kohn-Sham approach

Let us introduce the orbitals ψi for an auxiliary set of non-interacting electrons whose charge density is the same as that of the true system: n(r) =

  • i

|ψi(r)|2, ψi|ψj = δij Let us rewrite the energy functional in a more manageable way as: E = Ts[n(r)] + EH[n(r)] + Exc[n(r)] +

  • n(r)V (r)dr

where Ts[n(r)] is the kinetic energy of the non-interacting electrons: Ts[n(r)] = − ¯ h2 2m

  • i
  • ψ∗

i (r)∇2ψi(r)dr,

EH[n(r)] is the Hartree energy, due to electrostatic interactions: EH[n(r)] = e2 2 n(r)n(r′) |r − r′| drdr′,

slide-4
SLIDE 4

Exc[n(r)] is called exchange-correlation energy (a reminiscence from the Hartree-Fock theory) and includes all the remaining (unknown!) energy terms. Minimization of the energy with respect to ψi yields the Kohn-Sham (KS) equations:

  • − ¯

h2 2m∇2 + V (r) + VH(r) + Vxc(r)

  • HKS

ψi(r) = ǫiψi(r), where the Hartree and exchange-correlation potentials: VH(r) = δEH[n(r)] δn(r) = e2

  • n(r′)

|r − r′|dr′, Vxc(r) = δExc[n(r)] δn(r) depend self-consistently upon the ψi via the charge density. Self-consistency is achieved by looping over the charge density, using some suitable mixing algorithm, such as e.g. simple mixing. At iteration l + 1: n(l+1)

in

= βn(l)

  • ut + (1 − β)n(l)

in,

0 < β < 1

slide-5
SLIDE 5

Hybrid functionals

The “standard” functionals, LDA (local density approximation) or GGA (generalized gradient approximation), are functions of the local density n(r) and for GGA of gradient |∇n(r)|: Exc =

  • n(r)ǫGGA (n(r), |∇n(r)|) dr

Better-performing hybrid functionals, such as B3LYP or PBE0, contain some amount

  • f exact exchange, as in Hartree-Fock theory (spin-restricted):

E = Ts +

  • n(r)V (r)dr + EH −e2

2

  • i,j

ψ∗

i (r)ψj(r)ψ∗ j(r′)ψi(r′)

|r − r′| drdr′

  • EHF

x

In hybrid DFT: E = Ts +

  • n(r)V (r)dr + EH +

Ehyb

xc

  • αxEHF

x

+ (1 − αx)EDF T

x

+ EDF T

c

, with αx = 0.2 ÷ 0.3.

slide-6
SLIDE 6

Exchange potential

The exchange potential Vx is, unlike Vxc in GGA, nonlocal and depends upon the density matrix, γ(r, r′) =

  • j

ψj(r)ψ∗

j(r′):

(Vxψi)(r) ≡ δEx δψ∗

i (r) = −e2 j

ψj(r)ψ∗

j(r′)ψi(r′)

|r − r′| dr′ More generally, it may be defined it via its action on a generic orbital: (Vxφi)(r) ≡ −e2

j

ψj(r)ψ∗

j(r′)φi(r′)

|r − r′| dr′ Let us assume that our orbitals are expanded into PWs: ψj(r) =

  • G

cj,k+G 1 √ NΩ ei(k+G)·r, ¯ h2 2m|k + G|2 ≤ Ecut k = Bloch vector, G = reciprocal lattice vectors, Ecut = kinetic energy cutoff, Ω = unit cell volume, NΩ = crystal volume.

slide-7
SLIDE 7

Exchange potential and plane waves

With PW, orbitals are represented in Fourier space as {cj,k+G}, j = 1, ..., M, where M is the number of orbitals. Vx can be easily computed using fast Fourier transforms (FFT) (in the following, simplified case, k = 0, orbitals are real):

  • φi(G)

F F T

− → φi(r) (FFT brings from a discrete grid of G into a discrete grid of r)

  • ρij(r) = ψj(r) · φi(r)
  • ρij(r)

F F T

− → ρij(G)

  • vij(G) = ρij(G)/G2
  • vij(G)

F F T

− → vij(r)

  • (VXφi)(r) =

j ψj(r) · vij(r)

  • (VXφi)(r)

F F T

− → (VXφi)(G) Each operation is very quick ... but O(M 2) such operations are needed!

slide-8
SLIDE 8

Hybrid functionals for plane waves

Double-loop algorithm

Pure DFT Outer loop: Do n = 1, ... (VX ϕ(n)

k )(

r) = −

N

  • i

ϕ(n)

i

( r)

  • d

r ′ ϕ(n)

i

( r ′)ϕ(n)

k (

r ′) | r − r ′| SLOW!! {ϕ(n)

i

, ψ(n,0)

i

} Inner loop: Do m = 1, ... (VX ψ(n,m)

k

)( r) = −

N

  • i

ϕ(n)

i

( r)

  • d

r ′ ϕ(n)

i

( r ′)ψ(n,m)

k

( r ′) | r − r ′| SLOW!! Iterative diagonalization Check Convergence on {ψ(n,m)

i

} at fixed {ϕ(n)

i

} End Inner loop Check Convergence on {ϕ(n)

i

} End Outer loop

The exchange integrals have to be evaluated at every iteration, even though the {ϕ(n)

i

} functions are not varying in the inner

  • nes.

Giannozzi et al., JPCM, 21, 395502 (2009)

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 4 / 18

slide-9
SLIDE 9

Exchange with localized orbitals: inner projection

We project the exchange operator over the set of occupied orbitals ˆ WX =

  • ij

ˆ VX |ϕi ϕi| ˆ VX|ϕj

−1 ϕj| ˆ

VX =

  • i

|ξi ξi| |ξk =

  • i

ˆ VX |ϕi L−T

ik

Adaptively Compressed Exchange (ACE) Inner projection methods. The application of ˆ WX during the SCF is equivalent to ˆ VX within the subspace of the projection ˆ WX |ψk =

  • i

|ξi ξi|ψk

Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015); L owdin, IJQC, 4, 231 (1971).

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 6 / 18

slide-10
SLIDE 10

Hybrid functionals for plane waves

Double-loop algorithm

Pure DFT Outer loop: Do n = 1, ... (VX ϕ(n)

k )(

r) = −

N

  • i

ϕ(n)

i

( r)

  • d

r ′ ϕ(n)

i

( r ′)ϕ(n)

k (

r ′) | r − r ′| SLOW!! WX =

  • i

|ξ(n) ξ(n)| = VX |ϕ(n) ϕ(n)|VX |ϕ(n)

−1 ϕ(n)| VX

WX |ϕ(n)

k = −

  • i

|ξ(n)

i

ξ(n)

i

|ϕ(n)

k

{ξ(n)

i

, ψ(n,0)

i

} Inner loop: Do m = 1, ... WX |ψ(n,m)

k

= −

  • i

|ξ(n)

i

ξ(n)

i

|ψ(n,m)

k

  • Iterative diagonalization

Check Convergence on {ψ(n,m)

i

} at fixed {ξ(n)

i

} End Inner loop Check Convergence on {ϕ(n)

i

} End Outer loop

With the localization step the exchange integrals in the

  • uter loop now

involve two localized functions.

Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015).

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 7 / 18

slide-11
SLIDE 11

Hybrid functionals for plane waves

2 4 6 8 10 12 14 16

#Iteration

200 400 600 800 1000

Wall time (sec) ACE Full Wall time along SCF iterations Exchange step

With the ACE method the cost of the inner iterations is analogous to the cost of a pure DFT method. Example: Ethylene, 1 CPU, 100 Ry, converged to 10−7 Ry Pure DFT: 6 Outer loop: 5 Inner loop: 2, 3, 3, 2

  • Tot. calls: 32/5

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 8 / 18

slide-12
SLIDE 12

Localizing orbitals

The key to achieve better performances is orbital localization. There are various ways to localize orbitals in real space, the most famous being the transformation to Wannier

  • rbitals.

The density matrix is naturally localized: for insulators, γ(r, r′) ∼ exp(−α|r − r′|)

  • asymptotically. This means that w(r) =

j cjψj(r), with cj = ψ∗ j(r′) are localized!

The set of w’s above is redundant: we need a way to select the best manifold among the many possibles, without storing the density matrix. This can be achieved using the selected columns of the density matrix (SCDM)

  • algorithm. This is an algebraic procedure that requires a standard QR decomposition

(actually, only the pivoting part): Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015). Once orbitals are localized, exchange integrals between non-overlapping orbitals can be discarded.

slide-13
SLIDE 13

Hybrid functionals for plane waves

( ˆ VXwk)( r) = −

N

  • i

wi( r)

  • d

r ′ wi( r ′)wk( r ′) | r − r ′| The products between two canonical orbitals are much more delocalized than the product of two localized

  • rbitals and this can be exploited in order to reduce the

cost of the exchange integrals evaluations.

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 10 / 18

slide-14
SLIDE 14

Hybrid functionals for plane waves

The SCDM method has been used for the localization and a threshold has been introduced in order to skip the smallest exchange integrals Sij =

  • d

r|wi( r)| · |wj( r)| ≤ threshold Spatial extent of Canonical and Localized (SCDM) orbitals.

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 11 / 18

slide-15
SLIDE 15

Performances: amorphous Silica

432-atom amorphous silica cell Elapsed time (10y min) as a function of the size of the unit cell (10x atoms) a) old algorithm; b) ACE; c) and d) ACE+localization (threshold 0.002 and 0.004); e) simple DFT (no hybrid). Calculations performed on 150 processor cores at CINECA.

slide-16
SLIDE 16

Performances: a molecule in water

CO3G molecule in water (565-atom cell)

Method ∆Ehyb ∆ES ∆ǫgas

HL

∆ǫwater

HL

Time ACE

  • 24.9794
  • 161.3846

2.0521 2.6091 249 (100%) L-ACE (0.001)

  • 24.9790
  • 161.3457

2.0521 2.6089 92 (36%) L-ACE (0.002)

  • 24.9780
  • 161.2540

2.0521 2.6085 71 (26%) L-ACE (0.003)

  • 24.9767
  • 161.1331

2.0521 2.6080 62 (21%) L-ACE (0.004)

  • 24.9750
  • 161.0085

2.0521 2.6075 56 (18%) L-ACE (0.005)

  • 24.9730
  • 160.8220

2.0521 2.6071 52 (16%) pure DFT

  • 169.4025

1.1085 1.5666 5 (0%)

∆Ehyb: hybrid-pure DFT energy difference/atom ∆ES: solvation energy (kcal/mol) ∆ǫgas

HL: HOMO-LUMO gap (eV) in gas phase

∆ǫwater

HL

: HOMO-LUMO gap (eV) in water In parenthesis the fraction of included integral

slide-17
SLIDE 17

Performances: TiO2 nanoparticle

465-atom model of a TiO2 nanoparticle in the anatase structure Method ∆Ehyb ∆ǫHL Time ACE 14.8840 4.3692 1877 (100%) L-ACE (0.001) 14.8844 4.3728 838 (38%) L-ACE (0.002) 14.8859 4.3751 754 (31%) L-ACE (0.003) 14.8879 4.3772 691 (27%) L-ACE (0.004) 14.8900 4.3783 643 (25%) L-ACE (0.005) 14.8929 4.3792 651 (23%) pure DFT 2.9434 112 (0%) ∆Ehyb: hybrid-pure DFT energy difference per atom (kcal/mol). ∆ǫHL: HOMO-LUMO gap (eV). Wall time (min) for one SCF with 288 CPUs. In parenthesis the fraction of included integral.

slide-18
SLIDE 18

Conclusion

  • Combination of ACE projection and SCDM localization leads to a stable and robust

algorithm allowing one order of magnitude improvements for medium- to large-size cells

  • Preliminary results (not shown) for truly periodic systems (with k-points) also show

good performances, although less good than for the Γ−only cases shown here

  • Further improvements possible if real-space localization is exploited to get rid of

FFTs (work in progress)