Jeremy Sweezy
Scientist, Monte Carlo Methods, Codes and Applications Group
3/28/2018
LA-UR-18-XXXX
Breaking Through the Barriers to GPU Accelerated Monte Carlo Particle Transport
GTC 2018
Operated by Los Alamos National Security, LLC for the U.S.
What is Monte Carlo Particle Transport?
3/23/18 | 2 Los Alamos National Laboratory
– Follows the path of individual particles through a system
– Uses pseudo-random numbers to sample processes
– Randomly samples physical and non-physical processes
– Attributed to Stanislaw Ulam and Enrico Fermi
– Named because Ulam had an uncle who would borrow money from relatives because he “just had to go to Monte Carlo”
FERMIAC
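To make the random-walk idea concrete, here is a minimal Python sketch, not taken from the talk. It assumes a one-dimensional slab, a single energy group, and scattering that simply picks forward or backward with equal probability. The function name `transmitted_fraction` and all of its parameters are hypothetical.

```python
import math
import random

def transmitted_fraction(sigma_t, scatter_prob, thickness, n_particles, seed=1):
    """Follow particles one at a time through a 1D slab (illustrative only)."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_particles):
        x, direction = 0.0, 1.0  # start on the front face, moving forward
        while True:
            # Sample the distance to the next collision (exponential law).
            distance = -math.log(1.0 - rng.random()) / sigma_t
            x += direction * distance
            if x >= thickness:
                transmitted += 1  # escaped through the back face
                break
            if x < 0.0:
                break  # leaked back out of the front face
            if rng.random() >= scatter_prob:
                break  # absorbed at the collision site
            # Scattered: assumed 1D "isotropic" choice of direction.
            direction = 1.0 if rng.random() < 0.5 else -1.0
    return transmitted / n_particles
```

With `scatter_prob=0.0` the result should approach `exp(-sigma_t * thickness)`. Even this toy loop is dominated by data-dependent branches, which is the kind of control flow that is hard to run efficiently on GPU hardware.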
Porting to Specialized Hardware is Prohibitively Expensive
– The world’s production Monte Carlo codes have decades of development
– LANL’s MCNP code has been in development since 1977
– Equally extensive amount of V&V effort
– Codes have to run on desktop machines and supercomputers
– DOE HPC platforms have been in a state of flux for the last 10 years
Barrier #1: Limited Resources (Money, People, Time)
Monte Carlo Random Walk on GPU Hardware has reached a Performance Wall
hardware for neutron transport
worse performance.
– Conditional branching
– Random data access
– No small, computationally intensive kernel to accelerate
Barrier #2: Performance of random walk on GPUs
[Figure labels: 4.5x, 3.0x]
How do You Define Performance?
speed.
$$P = \frac{T_{CPU}}{T_{GPU}}$$
between speed and statistical variance using a Figure-of-Merit
To date, almost all GPU implementations of Monte Carlo particle transport
$$\mathrm{FOM} = \frac{\sigma_{CPU}^2 \, T_{CPU}}{\sigma_{GPU}^2 \, T_{GPU}}$$
Example: $$\mathrm{FOM} = \frac{0.1^2 \times 1\ \text{min}}{0.05^2 \times 2\ \text{min}} = 2$$
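The two measures on this slide can be compared with a few lines of Python. This sketch assumes the speedup is the ratio of run times, $T_{CPU}/T_{GPU}$, and the relative figure of merit is $\sigma_{CPU}^2 T_{CPU} / (\sigma_{GPU}^2 T_{GPU})$. The function names are hypothetical.

```python
def speedup(t_cpu, t_gpu):
    """Ratio of run times only; ignores statistical quality."""
    return t_cpu / t_gpu

def relative_fom(sigma_cpu, t_cpu, sigma_gpu, t_gpu):
    """Figure-of-merit ratio: balances run time against variance."""
    return (sigma_cpu ** 2 * t_cpu) / (sigma_gpu ** 2 * t_gpu)

# Values from the slide's example: the GPU run takes twice as long (2 min
# vs 1 min) but halves the relative uncertainty (0.05 vs 0.1).
example_speedup = speedup(1.0, 2.0)
example_fom = relative_fom(0.1, 1.0, 0.05, 2.0)
```

For the example values the speedup is below one while the figure-of-merit ratio is about two, which is why speed alone can be a misleading performance measure.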
Next Event Estimator
probability that a particle from a source or collision event reaches a point without interaction
[Figure labels: A, Cell 1, Cell 2, μ, Image Plane, B]
$$S(R, E) = \frac{w}{2\pi R^2} \times \sum_{i=1}^{N} \frac{\sigma_i(R, E)}{\sigma_T}\, p_i(\mu, E \to E') \exp\left(-\int_0^R \Sigma_T(s, E')\, ds\right)$$
Ray-cast: One to two orders of magnitude faster on GPU hardware
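As a rough illustration of why this estimator reduces to a ray cast, the sketch below scores a single source or collision event at a detector point. It assumes the usual point-detector form, $w\,p(\mu)\,e^{-\tau}/(2\pi R^2)$, with the nuclide sum already folded into one angular density `pdf_mu`, and a ray that crosses piecewise-constant cells. The function name `next_event_score` and the `segments` layout are hypothetical, not MonteRay's API.

```python
import math

def next_event_score(weight, pdf_mu, segments):
    """Expected contribution of one event to a point detector.

    segments: list of (sigma_t, length) pairs for the cells crossed by the
    straight ray from the event to the detector point, in order.
    """
    distance = sum(length for _, length in segments)              # R
    optical_depth = sum(sig * length for sig, length in segments)  # ray cast
    attenuation = math.exp(-optical_depth)
    return weight * pdf_mu * attenuation / (2.0 * math.pi * distance ** 2)
```

The only geometry-dependent work is the sum over crossed cells, a regular operation with no sampling and few branches.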
Traditional Track-Length Estimator
[Figure labels: Cell 1, B, Cell 2, Cell 3]
Computing has changed; we need to change our algorithms too!
Volumetric-Ray-Casting Estimator
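[Figure labels: Cell 1, B, Cell 2, Cell 3]
$$F(i, E') = \frac{w \left(1 - \exp\left(-\Sigma_{T,i}(E')\, l_i\right)\right)}{N\, \Sigma_{T,i}(E')} \exp\left(-\int_0^{|r' - r|} \Sigma_T(r + \Omega' s', E')\, ds'\right)$$
Ray-cast
A neutron dance for a neutron fan. – P.M. Dawn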
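A minimal sketch of how one ray could be scored under this estimator follows. It assumes each of the $N$ rays cast from a collision contributes $w\,(1 - e^{-\Sigma_{T,i} l_i})/(N\,\Sigma_{T,i})$ to cell $i$, attenuated by the optical depth accumulated before the ray enters that cell. The function name `vrc_scores` and its argument layout are hypothetical, and the division by cell volume is left out.

```python
import math

def vrc_scores(weight, n_rays, sigma_t, lengths):
    """Score one ray into every cell it crosses.

    sigma_t[i] and lengths[i] are the total cross section and chord length
    of the i-th cell along the ray, ordered from the collision site outward.
    """
    scores = []
    optical_depth = 0.0  # attenuation accumulated before entering cell i
    for sig, length in zip(sigma_t, lengths):
        if sig > 0.0:
            in_cell = (1.0 - math.exp(-sig * length)) / sig
        else:
            in_cell = length  # limit of (1 - exp(-sig*l))/sig as sig -> 0
        scores.append(weight * in_cell * math.exp(-optical_depth) / n_rays)
        optical_depth += sig * length
    return scores
```

Unlike a track-length tally, which scores only the cells a sampled track actually reaches, every cell along the ray receives an expected-value score. The per-cell work is a short, regular loop.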
MonteRay - Accelerating Monte Carlo Transport with GPU Ray Tracing
– Next-Event estimator
– Volumetric-Ray-Casting estimator, a new estimator designed for GPUs
– Supports neutron and photon tallies
Reduces cost of accelerating an existing Monte Carlo code with GPUs
MonteRay - Testing
– GeForce GTX TitanX GPU with NVIDIA Maxwell architecture
– 2 CPUs (Intel Haswell E5-2660 v3 at 2.60 GHz), with 10 cores each
shared memory
estimator
Track-length estimator performance on the CPU
Testing the Next-Event Estimator on GPU Hardware: Two Radiography Tests
MonteRay – Medical X-Ray Imaging Simulation
contribution calculated separately
relatively easy to calculate
important for collimator design
performance 15-18x
14.5x 15.3x
MonteRay – Industrial Radiography
Hydrodynamic Test Facility
collimators and experiment, but too computationally expensive
I'm a peeping-tom techie with x-ray eyes – Patrick Lee MacDonald
MonteRay – Industrial Radiography
GPU Performance vs Number of CPU Cores
[Figure: relative performance vs. number of CPU cores per GPU; series: Source, Collided; labeled values: 28.5x, 24.2x]
Collided calculation performance 15-32x!
Volumetric-Ray-Casting Estimator on GPU Hardware vs Track-Length Estimator on CPU Hardware
Cancer Treatment Simulation
[Figure labels: Tumor, 2-MeV Photon Beam]
What is the dose to healthy tissue?
GPU Performance vs 8 CPU Cores
14x performance improvement in healthy tissue
Cancer Treatment Simulation
GPU Performance vs Number of CPU Cores in Healthy Tissue
[Figure labels: 14.3x, 10.2x]
Performance is 14x vs 8 CPU cores or 10x vs 12 CPU cores
Pressurized Water Reactor Assembly Simulation
GPU Performance vs 8 CPU Cores
[Figure labels: Control Rod, Fuel Pin]
Pressurized Water Reactor Assembly Simulation
GPU Performance vs Number of CPU Cores
[Figure labels: 7.2x, 5.4x, 6.0x, 4.4x]
Compared to 8 CPU cores, performance is 7.2x in the control rod and 6.0x in the fuel
Criticality Accident Simulation
GPU Performance vs 8 CPU Cores
[Figure label: Uranium Sphere]
Performance increase of 14-16x in the center of the room
Criticality Accident Simulation – Smoother Fluence Estimate
[Figure panels: Track-Length Estimator, Volumetric-Ray-Casting Estimator]
Criticality Accident Simulation
GPU Performance vs Number of CPU Cores
[Figure labels: 15x, 10.5x]
Things are going great, and they’re only getting better – Patrick Lee MacDonald
Reflected Godiva Criticality Experiment Simulation
– 2.5x in the core
– 1.0x in the water
GPU Performance vs 8 CPU Cores
Reflected Godiva Criticality Experiment Simulation
estimator approaches that of the Track-Length estimator in strongly scattering material.
[Figure: Variance Ratio vs Num. Collisions; y-axis: variance ratio ($\sigma_{TL}^2 / \sigma_{VRC}^2$); x-axis: number of samples per collision (N)]
[Figure: GPU Performance vs. Num. CPU Cores; labeled values: 2.2x, 2.2x]
Performance is limited by the estimator variance, not the GPU speed
Conclusions
Monte Carlo particle transport
– Can be incorporated into legacy codes at low cost
– Works with standard variance reduction methods
– Up to 32 times for the Next-Event estimator as compared to 8 CPU cores
– Up to 14 times for the Volumetric-Ray-Casting estimator as compared to the Track-Length estimator on 8 CPU cores
MonteRay provides a method of breaking through the barriers of limited resources and limited performance
Jeremy Sweezy jsweezy@lanl.gov
Uncertainty - Pressurized Water Reactor Assembly Simulation
Volumetric-Ray-Casting Estimator: 600 sec., 8 CPU Cores and 1 GPU; 93 cycles, 40000 Particles/Cycle; 8 rays/collision
Track-Length Estimator: 600 sec., 8 CPU Cores; 124 cycles, 40000 Particles/Cycle
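The run parameters above imply how many particle histories each estimator completed in the same 600 seconds. The short calculation below uses only the cycle and particle counts from this slide. The helper name `histories` is hypothetical.

```python
def histories(cycles, particles_per_cycle):
    """Total particle histories completed in a run."""
    return cycles * particles_per_cycle

vrc_histories = histories(93, 40000)   # Volumetric-Ray-Casting run
tl_histories = histories(124, 40000)   # Track-Length run
history_ratio = vrc_histories / tl_histories
```

The Volumetric-Ray-Casting run completes three quarters as many histories in the same wall time, so any gain in figure of merit has to come from lower variance per history rather than from more histories.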