SLIDE 1

An FPGA Implementation of Reciprocal Sums for SPME

Sam Lee and Paul Chow

Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto

SLIDE 2

Objectives

Accelerate part of the Molecular Dynamics simulation:

Smooth Particle Mesh Ewald (SPME)

Implementation:

FPGA-based; try it and learn

Investigation:

Acceleration bottleneck
Precision requirement
Parallelization strategy

SLIDE 3

Presentation Outline

Molecular Dynamics
SPME
The Reciprocal Sum Compute Engine
Speedup and Parallelization
Precision
Future Work

SLIDE 4

Molecular Dynamics Simulation

SLIDE 5

1. Calculate the interatomic forces.
2. Calculate the net force.
3. Integrate Newton's equations of motion.

Molecular Dynamics

Combines empirical force calculations with Newton's equations of motion.

Predicts the time trajectory of small atomic systems.

Computationally demanding.

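Net force and acceleration:

$$\vec{a} = m^{-1} \sum \vec{F}$$

Integration ($\int$) of the equations of motion:

$$\vec{r}(t+\delta t) = \vec{r}(t) + \vec{v}(t)\,\delta t + 0.5\,\vec{a}(t)\,\delta t^{2}$$

$$\vec{v}(t+\delta t) = \vec{v}(t) + 0.5\left[\vec{a}(t) + \vec{a}(t+\delta t)\right]\delta t$$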
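The position and velocity updates on this slide are the velocity Verlet integration scheme. The following is a minimal NumPy sketch of one time step, not the authors' code; `velocity_verlet_step` and the `force_fn` callback (standing in for steps 1 and 2, the force calculation) are hypothetical names.

```python
import numpy as np

def velocity_verlet_step(r, v, a, dt, mass, force_fn):
    """One velocity Verlet step: returns r(t+dt), v(t+dt), a(t+dt).

    force_fn(r) is a hypothetical callback that must return the net force
    on every atom at positions r (steps 1 and 2 of the MD loop).
    """
    r_new = r + v * dt + 0.5 * a * dt**2      # r(t+dt) = r + v dt + 0.5 a dt^2
    a_new = force_fn(r_new) / mass            # a(t+dt) = F(t+dt) / m
    v_new = v + 0.5 * (a + a_new) * dt        # v(t+dt) = v + 0.5 [a(t) + a(t+dt)] dt
    return r_new, v_new, a_new
```

For a unit-mass harmonic oscillator (`force_fn = lambda x: -x`) started at r = 1, v = 0, repeated steps of dt = 0.01 track the exact solution cos(t) closely, which is a quick sanity check of the update order (positions first, then the new force, then velocities).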

SLIDE 6

Molecular Dynamics

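$$U = \sum_{\text{All Bonds}} k_b (l - l_0)^2 + \sum_{\text{All Angles}} k_\Theta (\Theta - \Theta_0)^2 + \sum_{\text{All Torsions}} A\left[1 + \cos(n\tau - \phi)\right] + \sum_{\text{All Pairs}} 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{r}$$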
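The last two sums in U (Lennard-Jones and Coulomb) run over all pairs of atoms. The sketch below evaluates them naively, and its double loop over pairs is where the O(N^2) cost comes from. It assumes a single ε and σ for every pair, a Coulomb constant of 1 (matching the q1 q2 / r term), and no periodic images or cutoff; `nonbonded_energy` is a hypothetical name.

```python
import itertools
import numpy as np

def nonbonded_energy(pos, q, epsilon, sigma):
    """Lennard-Jones + Coulomb energy summed over all unique atom pairs: O(N^2) work."""
    pos = np.asarray(pos, dtype=float)
    energy = 0.0
    for i, j in itertools.combinations(range(len(q)), 2):   # N(N-1)/2 pairs
        r = np.linalg.norm(pos[i] - pos[j])
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)           # 4 eps [(s/r)^12 - (s/r)^6]
        energy += q[i] * q[j] / r                              # q1 q2 / r (Coulomb constant 1)
    return float(energy)
```

Doubling N roughly quadruples the number of pair evaluations, which is the scaling problem stated on the next slide.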

SLIDE 7

MD Simulation

Problem scientists are facing:

SLOW! O(N^2) complexity.

30 CPU years

SLIDE 8

Solutions

Parallelize across more compute engines.

Accelerate with FPGAs, especially the non-bonded calculations.

More specifically, this paper addresses the electrostatic interaction (reciprocal space) using the Smooth Particle Mesh Ewald algorithm.

SLIDE 9

Previous Work

Software SPME Implementations:

Original PME package written by Toukmaji; used in NAMD2.

Hardware Implementations:

No previous hardware implementation of the reciprocal sums calculation.

MD-Grape and MD-Engine use Ewald Summation. Ewald Summation is O(N^2); SPME is O(N log N)!

SLIDE 10

Smooth Particle Mesh Ewald

SLIDE 11

Electrostatic Interaction

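Coulombic equation:

$$v_{\text{coulomb}} = \frac{q_1 q_2}{4\pi\varepsilon_0 r}$$

Under the Periodic Boundary Condition, the summation to calculate the electrostatic energy,

$$U = \frac{1}{2} \sideset{}{'}\sum_{\mathbf{n}} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,\mathbf{n}}},$$

is only conditionally convergent. Here n runs over the periodic image boxes, and the prime means that the i = j term is omitted when n = 0.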

SLIDE 12

Periodic Boundary Condition

[Figure: the simulation box containing atoms 1 to 5, replicated as the image boxes A to I.]

To combat surface effects, the simulation box is replicated periodically.
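Two bookkeeping operations commonly used with periodic replication are wrapping coordinates back into the primary box and taking the minimum-image displacement between two atoms. The sketch below assumes a cubic box of side L, and the function names are hypothetical. The minimum-image rule only picks the nearest image; the Ewald-based methods on the following slides instead sum over all images.

```python
import numpy as np

def wrap_into_box(pos, L):
    """Map coordinates back into the primary box [0, L) of a cubic cell of side L."""
    return np.mod(pos, L)

def minimum_image(dr, L):
    """Replace a displacement vector by the displacement to the nearest periodic image."""
    return dr - L * np.round(dr / L)
```

For example, in a box of side 10 the displacement (9, -6, 1) maps to (-1, 4, 1): the image one box over is closer than the original atom.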

SLIDE 13

Ewald Summation Used For PBC

[Figure: point charges q at positions r; the interaction is split into a Direct Sum and a Reciprocal Sum.]

To calculate the Coulombic interactions: O(N^2) Direct Sum + O(N^2) Reciprocal Sum.
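A brute-force sketch of standard Ewald summation may make the split concrete. It assumes Gaussian units (Coulomb constant 1), a cubic box, a neutral system and tin-foil boundary conditions; `ewald_energy`, the splitting parameter `alpha` and the image and k-vector cutoffs are illustrative choices, not values from this work. Both sums are evaluated directly, which is what makes plain Ewald expensive.

```python
import itertools
import math
import numpy as np

def ewald_energy(pos, q, L, alpha, n_images=1, kmax=8):
    """Total Coulomb energy of a neutral periodic cubic system by plain Ewald summation."""
    pos, q = np.asarray(pos, float), np.asarray(q, float)
    N, V = len(q), L ** 3
    # Direct sum: erfc-screened pair interactions over nearby image boxes.
    e_dir = 0.0
    for n in itertools.product(range(-n_images, n_images + 1), repeat=3):
        shift = L * np.array(n)
        for i in range(N):
            for j in range(N):
                if i == j and n == (0, 0, 0):
                    continue                     # primed sum: skip self-interaction
                r = np.linalg.norm(pos[i] - pos[j] + shift)
                e_dir += 0.5 * q[i] * q[j] * math.erfc(alpha * r) / r
    # Reciprocal sum: smooth part, using the structure factor S(k) at each k != 0.
    e_rec = 0.0
    for m in itertools.product(range(-kmax, kmax + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = 2.0 * math.pi * np.array(m) / L
        k2 = k @ k
        S = np.sum(q * np.exp(1j * (pos @ k)))
        e_rec += (2.0 * math.pi / V) * math.exp(-k2 / (4 * alpha ** 2)) / k2 * abs(S) ** 2
    # Self term: removes each charge's interaction with its own screening charge.
    e_self = -alpha / math.sqrt(math.pi) * np.sum(q ** 2)
    return e_dir + e_rec + e_self
```

As a check, the 8-ion rock-salt cell (edge 2, nearest-neighbour distance 1, charges +1 and -1) gives approximately -4 x 1.747565, i.e. four ion pairs times the NaCl Madelung constant.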

SLIDE 14

Smooth Particle Mesh Ewald

Shift the workload to the Reciprocal Sum.

Use the Fast Fourier Transform.

O(N) real-space (direct) sum + O(N log N) reciprocal sum.

The Reciprocal Sum Compute Engine (RSCE) calculates the reciprocal sums using the SPME algorithm.

SLIDE 15

SPME Reciprocal Contribution

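Energy:

$$\tilde{E}_{\text{rec}} = \frac{1}{2\pi V} \sum_{\mathbf{m} \neq 0} \frac{\exp(-\pi^2 \mathbf{m}^2 / \beta^2)}{\mathbf{m}^2}\, B(m_1, m_2, m_3)\, F(Q)(m_1, m_2, m_3)\, F(Q)(-m_1, -m_2, -m_3)$$

$$\tilde{E}_{\text{rec}} = \frac{1}{2} \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} Q(m_1, m_2, m_3)\,(\theta_{\text{rec}} \ast Q)(m_1, m_2, m_3)$$

Force:

$$\frac{\partial \tilde{E}_{\text{rec}}}{\partial r_{\alpha i}} = \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} \frac{\partial Q}{\partial r_{\alpha i}}(m_1, m_2, m_3)\,(\theta_{\text{rec}} \ast Q)(m_1, m_2, m_3)$$

and the force on atom i along direction α is $F_{\alpha i} = -\partial \tilde{E}_{\text{rec}} / \partial r_{\alpha i}$.

with

$$B(m_1, m_2, m_3) = |b_1(m_1)|^2\,|b_2(m_2)|^2\,|b_3(m_3)|^2$$

$$b_i(m_i) = \exp\!\left(\frac{2\pi i (n-1) m_i}{K_i}\right) \left[\sum_{k=0}^{n-2} M_n(k+1) \exp\!\left(\frac{2\pi i m_i k}{K_i}\right)\right]^{-1}$$

$$C(m_1, m_2, m_3) = \frac{1}{\pi V}\,\frac{\exp(-\pi^2 \mathbf{m}^2 / \beta^2)}{\mathbf{m}^2} \quad (\mathbf{m} \neq 0), \qquad C(0, 0, 0) = 0$$

Here V is the box volume, β the Ewald coefficient, n the B-spline interpolation order, K_i the number of mesh points in dimension i, M_n the cardinal B-spline of order n, Q the charge mesh obtained by interpolation, **m** = m_1 a_1* + m_2 a_2* + m_3 a_3* the reciprocal-lattice vector, and F the discrete Fourier transform (computed with the FFT). θ_rec is the Fourier transform of B·C, and ∗ denotes convolution.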
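A compact NumPy sketch of the first energy formula on this slide, for a cubic box, may help connect the symbols to an implementation. It assumes Gaussian units and an even B-spline order n (for odd n the denominator of b_i(m_i) can vanish at m_i = K/2); all function names are hypothetical, and forces are not computed. The pipeline is: interpolate charges onto Q, take the 3-D FFT, and weight |F(Q)|^2 by B·C. A brute-force reciprocal Ewald sum is included only as a reference for checking.

```python
import itertools
import numpy as np

def bspline(u, n):
    """Cardinal B-spline M_n(u), nonzero only for 0 < u < n (standard recursion)."""
    if n == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    return (u * bspline(u, n - 1) + (n - u) * bspline(u - 1.0, n - 1)) / (n - 1)

def charge_mesh(pos, q, L, K, n):
    """Q(k1, k2, k3): point charges interpolated onto a K^3 mesh with order-n B-splines."""
    Q = np.zeros((K, K, K))
    for r, qi in zip(pos, q):
        u = K * ((np.asarray(r) / L) % 1.0)          # scaled fractional coordinates
        k0 = np.floor(u).astype(int)
        # w[d][j] = M_n(u_d - k) for mesh point k = k0_d - j, j = 0..n-1
        w = [[bspline(u[d] - k0[d] + j, n) for j in range(n)] for d in range(3)]
        for j1, j2, j3 in itertools.product(range(n), repeat=3):
            Q[(k0[0] - j1) % K, (k0[1] - j2) % K, (k0[2] - j3) % K] += (
                qi * w[0][j1] * w[1][j2] * w[2][j3])
    return Q

def spme_reciprocal_energy(pos, q, L, K, n, beta):
    """E_rec = 1/(2 pi V) sum_{m != 0} exp(-pi^2 m^2/beta^2)/m^2 B(m) |F(Q)(m)|^2."""
    V = L ** 3
    FQ = np.fft.fftn(charge_mesh(pos, q, L, K, n))   # |F(Q)|^2 is independent of FFT sign
    m = np.fft.fftfreq(K, d=1.0 / K)                 # integers 0..K/2-1, -K/2..-1
    Mk = np.array([bspline(k + 1.0, n) for k in range(n - 1)])
    # |b(m)|^2 = 1 / |sum_{k=0}^{n-2} M_n(k+1) exp(2 pi i m k / K)|^2
    b2 = 1.0 / np.abs(np.exp(2j * np.pi * np.outer(m, np.arange(n - 1)) / K) @ Mk) ** 2
    B = b2[:, None, None] * b2[None, :, None] * b2[None, None, :]
    m1, m2, m3 = np.meshgrid(m, m, m, indexing="ij")
    msq = (m1 ** 2 + m2 ** 2 + m3 ** 2) / L ** 2     # |m|^2 for a cubic box of side L
    msq[0, 0, 0] = 1.0                               # placeholder; the m = 0 term is dropped
    C = np.exp(-np.pi ** 2 * msq / beta ** 2) / msq
    C[0, 0, 0] = 0.0
    return float(np.sum(B * C * np.abs(FQ) ** 2) / (2.0 * np.pi * V))

def ewald_reciprocal_energy(pos, q, L, beta, mmax):
    """Reference: the same reciprocal sum with exact structure factors S(m)."""
    pos, q = np.asarray(pos, float), np.asarray(q, float)
    total = 0.0
    for mv in itertools.product(range(-mmax, mmax + 1), repeat=3):
        if mv == (0, 0, 0):
            continue
        mvec = np.array(mv) / L
        msq = mvec @ mvec
        S = np.sum(q * np.exp(2j * np.pi * (pos @ mvec)))
        total += np.exp(-np.pi ** 2 * msq / beta ** 2) / msq * abs(S) ** 2
    return float(total / (2.0 * np.pi * L ** 3))
```

The mesh version replaces the per-k structure-factor loop with one FFT of the K^3 mesh, which is the O(N log N) step the RSCE accelerates.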

SLIDE 16

Charge Interpolation

[Figure, panels A to F: interpolation of the point charges onto the mesh.]
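Charge interpolation assigns each charge to the n nearest mesh points per dimension with B-spline weights; in 3-D the weight of a mesh point is the product of three 1-D weights. The sketch below shows the 1-D case. It fills all n weights for a fractional offset w with the cardinal B-spline recursion, a common software approach that is assumed here and not necessarily how the RSCE's B-spline unit works; `bspline_weights` and `spread_1d` are hypothetical names.

```python
import numpy as np

def bspline_weights(w, n):
    """Weights M_n(w + n - 1 - i), i = 0..n-1, for a fractional offset 0 <= w < 1 (n >= 2)."""
    arr = np.zeros(n)
    arr[0], arr[1] = 1.0 - w, w                   # order-2 (linear) weights
    for k in range(3, n + 1):                     # raise the order one step at a time
        div = 1.0 / (k - 1)
        arr[k - 1] = div * w * arr[k - 2]
        for j in range(1, k - 1):                 # descending index: reads old values only
            arr[k - j - 1] = div * ((w + j) * arr[k - j - 2] + (k - j - w) * arr[k - j - 1])
        arr[0] = div * (1.0 - w) * arr[0]
    return arr

def spread_1d(u, q, K, n):
    """Interpolate charges q at scaled coordinates u (0 <= u < K) onto a periodic 1-D mesh."""
    mesh = np.zeros(K)
    for ui, qi in zip(u, q):
        k0 = int(np.floor(ui))
        for i, wt in enumerate(bspline_weights(ui - k0, n)):
            mesh[(k0 - n + 1 + i) % K] += qi * wt   # weight i belongs to point k0-n+1+i
    return mesh
```

Because the weights of each charge sum to one, the total charge on the mesh equals the total point charge, which is a cheap consistency check for an interpolation unit.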

SLIDE 17

Reciprocal Sum Compute Engine

SLIDE 18

RSCE Architecture

SLIDE 19

RSCE Verification Testbench

SLIDE 20

RSCE Validation Environment

SLIDE 21

Speedup Estimate

RSCE vs. Software Implementation

SLIDE 22

RSCE Speedup

RSCE at 100 MHz vs. Intel Pentium 4 at 2.4 GHz.

Speedup: 3x to 14x

Why is the speedup so modest?

The reciprocal sums calculations are not easily parallelizable.

QMM memory bandwidth limitation.

Improvement:

Using more QMM memories can improve the speedup.

Slight design modifications are required.

SLIDE 23

Parallelization Strategy

Multiple RSCEs

SLIDE 24

RSCE Parallelization Strategy

Assume a 2-D simulation system with P = 2 (interpolation order), K = 8 (mesh points per dimension), N = 6 (particles), and NumP = 4 (RSCEs). The 8x8 mesh is divided into four 4x4 mini meshes.
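Reading the example as a 2-D 8 x 8 mesh shared by four RSCEs, the sketch below splits the mesh into four 4 x 4 mini meshes and reports which mini meshes a particle's P x P interpolation stencil touches. It is an illustration under those assumptions, not the RSCE's actual partitioning logic; `split_mesh`, `owners_of_stencil` and the row-major block numbering are hypothetical choices.

```python
import numpy as np

def split_mesh(mesh, blocks_per_dim):
    """Split a square K x K mesh into blocks_per_dim**2 equal mini meshes (row-major order)."""
    K = mesh.shape[0]
    b = K // blocks_per_dim                       # mini-mesh edge length
    return [mesh[i * b:(i + 1) * b, j * b:(j + 1) * b]
            for i in range(blocks_per_dim) for j in range(blocks_per_dim)]

def owners_of_stencil(kx0, ky0, P, K, blocks_per_dim):
    """Indices of the mini meshes touched by a P x P stencil starting at mesh point (kx0, ky0)."""
    b = K // blocks_per_dim
    return {((kx0 + i) % K // b) * blocks_per_dim + ((ky0 + j) % K // b)
            for i in range(P) for j in range(P)}
```

A particle whose stencil straddles a mini-mesh boundary contributes charge to more than one mini mesh; with P = 2 and K = 8, a stencil starting at (3, 3) touches all four.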

SLIDE 25

RSCE Parallelization Strategy

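[Figure: the Kx by Ky mesh divided among P1 to P4, shown for the 1-D FFT in the Y direction and for the 1-D FFT in the X direction.]

After the mini meshes are composed, a 2-D IFFT is performed. A 2-D IFFT consists of two passes of 1-D FFTs: one in the X direction and one in the Y direction.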
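The two-pass structure of the 2-D IFFT can be checked in a few lines of NumPy. In this sketch the work of each pass is split among `num_p` workers, simulated serially, and each worker transforms a strip of complete lines, since a 1-D FFT needs a whole row or column. Treating axis 0 as X and the strip-based split are assumptions for illustration, and `ifft2_in_passes` is a hypothetical name.

```python
import numpy as np

def ifft2_in_passes(mesh, num_p):
    """2-D inverse FFT as two passes of 1-D inverse FFTs, each pass split among num_p workers."""
    out = np.asarray(mesh, dtype=complex).copy()
    K0, K1 = out.shape
    # Pass 1: 1-D IFFT along axis 0 (taken here as X); columns are split into strips.
    for cols in np.array_split(np.arange(K1), num_p):
        out[:, cols] = np.fft.ifft(out[:, cols], axis=0)
    # Pass 2: 1-D IFFT along axis 1 (taken here as Y); rows are split into strips.
    for rows in np.array_split(np.arange(K0), num_p):
        out[rows, :] = np.fft.ifft(out[rows, :], axis=1)
    return out
```

Because the 2-D transform is separable, the result matches `np.fft.ifft2` regardless of which direction is done first; the cost is that data must be redistributed between the passes so that each worker holds complete lines in the new direction.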

SLIDE 26

Parallelization Strategy

$$E_{\text{Total}} = \sum_{P} E_P$$

The total energy is the sum of the partial energies E_P computed by the individual RSCEs.

2D-IFFT -> Energy Calculation -> 2D-FFT -> Force Calculation

SLIDE 27

MD Simulations RSCE + NAMD2

SLIDE 28

RSCE Precision

Precision goal: relative error bound < 10^-5.

Two major calculation steps:

B-Spline calculation.
3D-FFT/IFFT calculation.

Due to the limited logic resources and the limited-precision FFT LogiCore, the precision goal cannot be achieved.

SLIDE 29

RSCE Precision

To achieve the relative error bound of < 10^-5, the minimum calculation precision is:

FFT {14.30}, B-Spline {1.27}
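A small sketch of what a {i.f} fixed-point format means for a single value, assuming that {i.f} denotes i integer bits and f fractional bits and that values are rounded to the nearest representable number. Neither the notation's exact meaning, the sign handling nor the rounding mode is stated on the slide, and `to_fixed` and `relative_error` are hypothetical helpers.

```python
def to_fixed(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits; reject values that need more integer bits.

    Assumed format {int_bits.frac_bits}; sign handling and rounding mode of the
    real hardware are not modelled.
    """
    if abs(x) >= 2 ** int_bits:
        raise OverflowError(f"{x} does not fit in {int_bits} integer bits")
    scale = 1 << frac_bits
    return round(x * scale) / scale


def relative_error(x: float, int_bits: int, frac_bits: int) -> float:
    """Relative quantisation error of a single nonzero value."""
    return abs(to_fixed(x, int_bits, frac_bits) - x) / abs(x)
```

With 27 fractional bits, a B-spline weight such as 2/3 is off by at most 2^-28 (about 3.7e-9). This per-value error is far below 10^-5; the accumulated error through the B-spline and 3-D FFT/IFFT pipeline is what the precision analysis has to bound, and this sketch does not model it.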

SLIDE 30

MD Simulation with RSCE

RMS Energy Error Fluctuation:

$$\text{RMS Energy Fluctuation} = \sqrt{\langle E^{2} \rangle - \langle E \rangle^{2}}$$
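The RMS energy fluctuation can be computed from a series of energy samples taken during the simulation. In this sketch, the choice of samples (for example the total energy at every time step) is an assumption, and `rms_energy_fluctuation` is a hypothetical name.

```python
import numpy as np

def rms_energy_fluctuation(energies):
    """sqrt(<E^2> - <E>^2) over a series of energy samples.

    Computed as the RMS deviation from the mean, which is algebraically identical
    but avoids cancellation when the fluctuations are tiny compared with E itself.
    """
    e = np.asarray(energies, dtype=float)
    return float(np.sqrt(np.mean((e - e.mean()) ** 2)))
```

The rearranged form matters in practice: total energies in MD are often large while their fluctuations are small, so subtracting <E>^2 from <E^2> directly can lose most of the significant digits.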

SLIDE 31

FFT Precision Vs. Energy Fluctuation

SLIDE 32

Summary

Implementation of the FPGA-based Reciprocal Sum Compute Engine (RSCE) and its SystemC model.

Integration of the RSCE into NAMD2, a widely used Molecular Dynamics program, for verification.

RSCE Speedup Estimate

3x to 14x

Precision Requirement

B-Spline {1.27} and FFT {14.30} => < 10^-5 relative error

Parallelization Strategy

SLIDE 33

Future Work

More in-depth precision analysis.

Investigation of how to further speed up the SPME algorithm with FPGAs.

SLIDE 34

Questions