Improving Virtually Guided Product Certification with Implicit Finite Element Analysis at Scale



slide-1
SLIDE 1

Improving Virtually Guided Product Certification with Implicit Finite Element Analysis at Scale

Seid Koric¹, Robert F. Lucas², Erman Guleryuz¹

¹ National Center for Supercomputing Applications (NCSA)
² Livermore Software Technology Corporation (LSTC)

Blue Waters Symposium 2018, June 6th

slide-2
SLIDE 2

NCSA Private Sector Partners

slide-3
SLIDE 3

Seid Koric, Erman Guleryuz
Todd Simons, James Ong
Jef Dawson, Ting-Ting Zhu
Robert Lucas, Roger Grimes, Francois-Henry Rouet

slide-4
SLIDE 4

Project overview

  • Long-term vision: Fully virtual product development and certification with digital twins
  • Pressing need for high-performance simulations to reduce development risks and costs
  • Finite Element Method widely used for product design
  • Challenge: Large-scale system-level models with wide spectrum of characteristic lengths
  • Parallel performance is key for impact
  • Parallel performance = f (code, input, platform)
  • Measure-analyze-improve cycle with large-scale real-life models
slide-5
SLIDE 5

Simulation model: Gas turbine engine

$$[K]_{i-1}^{t+\Delta t}\,\{\Delta u\}_{i-1}^{t+\Delta t} = \{R\}_{i-1}^{t+\Delta t}$$

  • Implicit FEM, direct method based on factorization
  • Linear system solved at each Newton-Raphson (NR) iteration
  • Solving the linear system takes a large portion of the run time
  • Stiffness matrix [K] is sparse: most coefficients are zero
  • Triangular systems are easy to solve
  • O(100 M) DOF
  • PDE → linear system
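The NR loop with a factorization-based solve can be sketched on a toy problem. This is a minimal illustration, not the production solver: it assumes a hypothetical two-DOF chain of hardening springs (the constants `K_LIN` and `C_CUBIC`, the load value, and all function names are invented for the example) and uses a small dense Cholesky factorization in place of a sparse direct solver. Each iteration assembles the tangent stiffness K and residual R, factors K = L Lᵀ, and obtains Δu from two triangular solves.

```python
import math

K_LIN, C_CUBIC = 1.0, 1.0  # hypothetical spring constants (illustration only)


def spring_force(e):
    return K_LIN * e + C_CUBIC * e ** 3


def spring_stiffness(e):
    return K_LIN + 3.0 * C_CUBIC * e ** 2


def tangent_and_residual(u, load):
    """Tangent stiffness K and residual R = F_ext - F_int for two springs in series."""
    ea, eb = u[0], u[1] - u[0]
    fa, fb = spring_force(ea), spring_force(eb)
    ka, kb = spring_stiffness(ea), spring_stiffness(eb)
    K = [[ka + kb, -kb], [-kb, kb]]
    R = [-(fa - fb), load - fb]
    return K, R


def cholesky(K):
    """Dense Cholesky factor L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(K[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def newton_raphson(load, tol=1e-10, max_iter=25):
    u = [0.0, 0.0]
    for iteration in range(max_iter + 1):
        K, R = tangent_and_residual(u, load)
        if max(abs(r) for r in R) < tol:
            return u, iteration
        L = cholesky(K)                          # factorization (expensive phase)
        du = backward_sub(L, forward_sub(L, R))  # two cheap triangular solves
        u = [ui + dui for ui, dui in zip(u, du)]
    raise RuntimeError("Newton-Raphson did not converge")


solution, iterations = newton_raphson(load=2.0)
print(solution, iterations)
```

In a large sparse solver the same split applies: the factorization dominates the cost, while the triangular solves that follow are comparatively cheap.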

slide-6
SLIDE 6

Parallel efficiency of phases in linear solver

  • 105 M DOF, one implicit time step, 8 threads per MPI rank
  • Parallel efficiency: E(p) = T(n,1) / (p · T(n,p))
  • Symbolic factorization is a sequential bottleneck
  • Triangular solution wall-clock time < 10 s
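The efficiency formula is simple enough to spell out. The sketch below assumes the usual reading, in which T(n,p) is the wall-clock time of a phase for problem size n on p processing units; the function name and the timing values are made up for illustration and are not measurements from the slides. It also shows why a sequential phase stands out: if T(n,p) does not shrink with p, then E(p) = 1/p.

```python
def parallel_efficiency(t_serial, t_parallel, p):
    """E(p) = T(n,1) / (p * T(n,p)); 1.0 means ideal scaling."""
    return t_serial / (p * t_parallel)


# Hypothetical timings in seconds (not from the slides).
ideal = parallel_efficiency(t_serial=100.0, t_parallel=25.0, p=4)
half = parallel_efficiency(t_serial=100.0, t_parallel=50.0, p=4)

# A phase that stays sequential takes the same time regardless of p,
# so its efficiency decays as 1/p.
sequential_phase = parallel_efficiency(t_serial=30.0, t_parallel=30.0, p=8)

print(ideal, half, sequential_phase)
```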

slide-7
SLIDE 7

Sparse matrix reordering - Highlights

  • LS-GPart: Non-multilevel parallel nested dissection based on half-level sets
  • Graph theory leveraged, goal is to find a vertex separator of the adjacency graph of K
  • Recursive partitioning of the graph defined by a tree of vertex separators
  • Optimal vertex separators → Optimal fill-in and factor FLOPS
  • Popular fill-reducing reordering tools: METIS (default), Scotch, ParMETIS, PT-Scotch
  • Ordering quality = f (factor non-zeros, factor flops)
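The link between vertex separators and fill-in can be illustrated on a small model problem. The sketch below is not LS-GPart (which the slide describes as based on half-level sets) and not METIS: it assumes a regular 2D grid graph and picks the middle row or column as a geometric separator, recursing on the two halves and ordering each separator last. A symbolic elimination then counts the fill edges created by that ordering and by a plain row-by-row ordering. All function names and the grid size are choices made for this example.

```python
def grid_graph(k):
    """Adjacency sets of a k x k grid (5-point stencil pattern)."""
    adj = {(r, c): set() for r in range(k) for c in range(k)}
    for r in range(k):
        for c in range(k):
            if r + 1 < k:
                adj[(r, c)].add((r + 1, c))
                adj[(r + 1, c)].add((r, c))
            if c + 1 < k:
                adj[(r, c)].add((r, c + 1))
                adj[(r, c + 1)].add((r, c))
    return adj


def fill_in(adj, order):
    """Count fill edges created by eliminating vertices in the given order."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = 0
    for v in order:
        nbrs = list(g.pop(v))
        for w in nbrs:
            g[w].discard(v)
        # Eliminating v turns its remaining neighbours into a clique.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


def nested_dissection(r0, r1, c0, c1):
    """Order block rows [r0, r1) x cols [c0, c1): both halves first, separator last."""
    if r0 >= r1 or c0 >= c1:
        return []
    if r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        separator = [(mid, c) for c in range(c0, c1)]
        return (nested_dissection(r0, mid, c0, c1)
                + nested_dissection(mid + 1, r1, c0, c1)
                + separator)
    mid = (c0 + c1) // 2
    separator = [(r, mid) for r in range(r0, r1)]
    return (nested_dissection(r0, r1, c0, mid)
            + nested_dissection(r0, r1, mid + 1, c1)
            + separator)


k = 15
graph = grid_graph(k)
natural_order = [(r, c) for r in range(k) for c in range(k)]
nd_order = nested_dissection(0, k, 0, k)

fill_natural = fill_in(graph, natural_order)
fill_nd = fill_in(graph, nd_order)
print("fill (row-by-row):", fill_natural)
print("fill (nested dissection):", fill_nd)
```

Fewer fill edges mean fewer non-zeros in the factor, which is one of the two ingredients of the ordering-quality measure listed above; production tools work on general graphs, where finding good separators is the hard part.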
slide-8
SLIDE 8

Reordering performance and quality

[Figure: trace of available memory recorded with Ovis. Axes: free memory (GB) and time (minutes). Run: 4096 threads, 2 MPI ranks/node, 8 threads per MPI rank.]

slide-9
SLIDE 9

Numeric factorization - Highlights

  • Multifrontal block low-rank factorization
  • Sequence of dense matrix operations using the elimination tree
  • Factorization consists of a bottom-up traversal of the tree
  • Each node of the tree corresponds to a dense matrix, processed with BLAS
  • Natural parallelization: Dependencies between tasks are captured by the elimination tree
  • Columns in different tree branches can be factorized in parallel
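The dependency structure described above can be sketched in a few lines. This example computes the elimination tree of a symmetric sparsity pattern with the standard ancestor-shortcut (path compression) algorithm, then groups columns into "waves" by their height above the leaves: columns in the same wave are never ancestor and descendant of each other, so they could be processed concurrently in a bottom-up traversal. It works column by column rather than with the dense frontal matrices of a real multifrontal code, and the 7-column pattern and function names are invented for illustration.

```python
def elimination_tree(n, upper_cols):
    """Return parent[j] for each column j of the elimination tree (-1 marks a root).

    upper_cols[k] lists the row indices i < k with A[i][k] != 0 for a
    symmetric matrix A; only the sparsity pattern matters.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # shortcut towards the current root of each subtree
    for k in range(n):
        for i in upper_cols.get(k, []):
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k
                i = nxt
    return parent


def parallel_waves(parent):
    """Group columns by height above the leaves; each wave has no internal dependencies."""
    n = len(parent)
    height = [0] * n
    for j in range(n):  # parent[j] > j, so all children of j are visited before j
        p = parent[j]
        if p != -1:
            height[p] = max(height[p], height[j] + 1)
    waves = [[] for _ in range(max(height) + 1)]
    for j in range(n):
        waves[height[j]].append(j)
    return waves


# Hypothetical pattern: columns 0,1 couple to 2; columns 3,4 couple to 5;
# columns 2 and 5 couple to 6 (a two-level separator structure).
pattern = {2: [0, 1], 5: [3, 4], 6: [2, 5]}
tree = elimination_tree(7, pattern)
waves = parallel_waves(tree)
print(tree)
print(waves)
```

Grouping by height is only one simple scheduling choice; a task-based runtime can start a node as soon as its own children finish rather than waiting for a whole wave.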
slide-10
SLIDE 10

[Figure panels: numeric factorization; triangular solution]

slide-11
SLIDE 11

Memory trace for 200 M DOF model

  • Input processing and domain decomposition on an xe_himem node
  • Combination of xe (64 GB) and xe_himem (128 GB) nodes

slide-12
SLIDE 12

Work-in-progress and long-term vision

  • Parallel symbolic factorization
  • Unknown bottleneck: constraint processing
  • Load balance, communication improvements
  • Relevant scale for full impact?
  • Dr. Yoon Ho, Rolls-Royce, ISC14
slide-13
SLIDE 13

Concluding remarks

  • Current state: Scaling up to 16,000 threads with hybrid parallelization, 30 Tflop/s, 200 M DOF
  • Collaboration model matters: All stakeholders on board
  • Software development challenges: Access to HPC, lack of portable tools, hardware complexity
  • Additional challenges: Multiple development teams for large codes, evolve from present code
  • Beyond FLOPS: Workflow problem on path towards fully virtual product development
  • 2018 ASCR Leadership Computing Challenge (ALCC) award
  • Special thanks to Blue Waters SEAS team for great technical support!