QUINOA, April 18, 2017 - PowerPoint PPT Presentation




SLIDE 1

Quinoa: Adaptive Computational Fluid Dynamics

  • J. Bakosi, R. Bird, C. Junghans, R. Pavel, J. Waltz

Los Alamos National Laboratory

  • F. Gonzalez
  • B. Rogers

University of Illinois Urbana-Champaign and University of Tennessee

April 18, 2017


https://github.com/quinoacomputing/quinoa

Goal: hardware-adaptive large-scale multiphysics

◮ Fluid dynamics, turbulence, particle transport, chemistry, plasma physics of non-ideal multiple mixing materials
◮ Automatic dynamic computational load redistribution for real-world problems
◮ Preserving the domain scientist's sanity

Agenda:

◮ Philosophy
◮ Infrastructure
◮ Two tools: particle solver, unstructured-grid PDE solver
◮ Future plan

LA-UR-17-22931

SLIDE 2

Philosophy

◮ Partition everything
◮ Be asynchronous everywhere
◮ Automate everything
◮ Remember that everything fails

Strategy

◮ Most physics codes start with capability; software engineering is an afterthought
◮ We start with a state-of-the-art production code, then put in the physics
◮ From scratch: not based on an existing code
◮ C++11 & Charm++ (fully asynchronous, distributed-memory parallel)

Funding & history

◮ Started as a hobby project in 2013 (weekends and nights)
◮ First funding: Oct 2016

Work in progress

SLIDE 3

Infrastructure

◮ 46K lines of code
◮ 20+ third-party libraries, 3 compilers
◮ Unit and regression tests
◮ Open source: https://github.com/quinoacomputing/quinoa
◮ Continuous integration (build & test matrix) with Travis & TeamCity
◮ Continuous quantified test code coverage with Gcov & CodeCov.io
◮ Continuous quantified documentation coverage with CodeCov.io
◮ Continuous static analysis with CppCheck & SonarQube
◮ Continuous deployment (of binary releases) to DockerHub

Ported to Linux, Mac, Cray (LANL, NERSC), Blue Gene/Q (ANL)

SLIDE 4

Current tools

1. walker – Random walker for stochastic differential equations
2. inciter – Partial differential equations solver on 3D unstructured grids
3. rngtest – Random number generator test suite
4. unittest – Unit test suite
5. meshconv – Mesh file converter
SLIDE 5

Quinoa::Walker

◮ Particle solver
◮ Numerical integrator for stochastic differential equations
◮ Used to analyze and design the evolution of fluctuating variables and their statistics
◮ Used in production for the design of statistical moment approximations required for modeling mixing materials in turbulence
◮ Future plan: Predict the probability density function in turbulent flows

Fokker-Planck equation governing the joint PDF F(Y, t), and the equivalent system of stochastic differential equations:

$$\frac{\partial F(Y,t)}{\partial t} = -\sum_{\alpha=1}^{N-1}\frac{\partial}{\partial Y_\alpha}\bigl[A_\alpha(Y,t)\,F(Y,t)\bigr] + \frac{1}{2}\sum_{\alpha=1}^{N-1}\sum_{\beta=1}^{N-1}\frac{\partial^2}{\partial Y_\alpha\,\partial Y_\beta}\bigl[B_{\alpha\beta}(Y,t)\,F(Y,t)\bigr]$$

$$\mathrm{d}Y_\alpha(t) = A_\alpha(Y,t)\,\mathrm{d}t + \sum_{\beta=1}^{N} b_{\alpha\beta}(Y,t)\,\mathrm{d}W_\beta(t), \quad \alpha = 1,\ldots,N, \qquad B_{\alpha\beta} = b_{\alpha\gamma}b_{\gamma\beta}$$

SLIDE 6

Walker SDAG for each PE

[Figure: SDAG task graph with nodes AdvP, OrdM, OrdP, EvT, NoSt, CenM, CenP, OutS, OutP]

◮ AdvP – advance particles
◮ OrdM – estimate ordinary moments
◮ CenM – estimate central moments, e.g., ⟨(y − ⟨Y⟩)²⟩
◮ OutS – output statistical moments
◮ EvT – evaluate time step
◮ OrdP – estimate ordinary PDFs
◮ CenP – estimate central PDFs, e.g., F(y − ⟨Y⟩)
◮ OutP – output PDFs
◮ NoSt – no stats, nor PDFs

src/Walker/distributor.ci

SLIDE 7

[Figure: Walker weak scaling with up to 3×10⁹ particles. X axis: number of CPU cores (24/node), 240 to 24,000; Y axis: wall clock time (sec), 200 to 1000; ideal scaling shown for reference.]

SLIDE 8

Quinoa::Walker future plan

[Figure: turbulent kinetic energy vs. time, DNS vs. PDF method at Atwood numbers A = 0.05, 0.25, 0.5; the flow passes through laminar-turbulent transition and non-equilibrium flow (no models, very difficult to predict) before reaching equilibrium, fully developed turbulence (models exist)]

◮ Goal: Predict the probability density function in turbulent flows
◮ Why: Because it requires fewer approximations
◮ How: Integrate a large particle ensemble governed by stochastic differential equations
◮ The ensemble represents the fluid itself
◮ Statistics and the discrete PDF are extracted from the ensemble in cells

[Figure: density PDFs at A = 0.5, PDF method vs. DNS at t = 0, 1.7, 2.4, 2.5, 3.0, 3.8]

SLIDE 9

Quinoa::Inciter

◮ PDE solver for 3D unstructured (tet-only) grids
◮ Native Charm++ code using MPI-only libs: hypre, Zoltan2
◮ Simple Navier-Stokes solver for compressible flows
◮ Finite elements
◮ Flux-corrected transport
◮ Asynchronous linear system assembly
◮ File/PE I/O
◮ Current work: adaptive mesh refinement, V&V
◮ Future plan: use AMR to explore scalability with large load-imbalances

SLIDE 10

Flux-corrected transport

◮ Used when stuff (e.g., energy) moves from A to B (i.e., all the time)
◮ Godunov theorem: No linear scheme of order greater than one will yield monotonic (wiggle-free) numerical solutions
◮ A solution: Use a nonlinear scheme
◮ Combine a low-order (guaranteed to be monotonic) scheme with a high-order (more accurate) scheme in a nonlinear fashion

[Figure: exact vs. low-order vs. high-order vs. FCT solutions]

SLIDE 11

Matrix assembly

[Figure: matrix distributed across PEs (Charm++ group), with LinSysMerger group elements L1, L2, L3 and Carrier worker array elements C1-C9]

◮ L1, L2, … – LinSysMerger Charm++ group elements: do not migrate; interact with the MPI-only linear system solver lib
◮ C1, C2, … – Carrier worker Charm++ array elements: migrate (not yet, but will); perform the heavy lifting of the physics

SLIDE 12

Inciter SDAG for each PE

◮ ChRow – chares contribute their global row IDs
◮ ChBC – chares contribute their BC node IDs
◮ RowComplete – all groups have finished their row IDs
◮ Init – chares initialize
◮ dt – chares compute their next ∆t
◮ Aux – low-order solution
◮ Solve – call hypre to solve the linear system
◮ Asm* – assemble RHS/LHS/UNK
◮ Hypre* – convert RHS/LHS/UNK to hypre data structures

src/LinSys/linsysmerger.ci

SLIDE 13

SLIDE 14

[Figure: Compressible Navier-Stokes scaling, 794M elements (setup, 100 time steps, no I/O), ~50K elements/PE. X axis: number of CPU cores (36/node), 900 to 36,000; Y axis: wall clock time (sec), 10¹ to 10⁴; curves for RCB and MJ partitioning vs. ideal scaling.]

SLIDE 15

Quinoa::Inciter future plan

◮ Now: Distributed-memory-parallel asynchronous AMR
◮ Next: Explore scalability with large load-imbalances (migration)
◮ Future:
  ◮ Asynchronous I/O
  ◮ Explore various threading and SIMD abstractions
  ◮ Explore CERN's ROOT framework for data storage, statistical analysis, and visualization
  ◮ Fault tolerance

Waltz, Int. J. Numer. Meth. Fluids, 2004.

SLIDE 16

Acknowledgments

TPLs: Charm++, Parsing Expression Grammar Template Library, C++ Template Unit Test Framework, Boost, Cartesian product, PStreams, HDF5, NetCDF, Trilinos (SEACAS, Zoltan2), Hypre, RNGSSE2, TestU01, PugiXML, BLAS, LAPACK, Adaptive Entropy Coding library, libc++, libstdc++, MUSL libc, OpenMPI, Intel Math Kernel Library, H5Part, Random123

Compilers: Clang, GCC, Intel

Tools: Git, CMake, Doxygen, Ninja, Gold, Gcov, Lcov, NumDiff