Sparse Matrix Algorithms and Advanced Topics in FEM
MATH 9830
Timo Heister (heister@clemson.edu)
http://www.math.clemson.edu/~heister/math9830-spring2015/
Tasks Lab 1
- Grab class repo
– git clone https://github.com/tjhei/spring15_9830
– Compile the example: g++ main.cc
– Run it: ./a.out
– Do question 2 from homework 1
– Work on the other tasks listed in the .cc file
- Install deal.II
- Run deal.II step 2
– Print the sparsity pattern, change the refinement
– Use cuthill_mckee; try another ordering?
– Change to dim=3, etc.
Tasks Lab 2
- Deal.II intro
- Work on hw2
- Bonus: step 2
– Print the sparsity pattern, change the refinement
– Use cuthill_mckee; try another ordering?
– Change to dim=3, etc.
deal.II
- “A Finite Element Differential Equations Analysis Library”
- Open source C++ library
- I am one of the four maintainers
- One of the most widely used libraries:
– ~600 papers using and citing deal.II
– ~600 downloads/month
– 100+ people have contributed in the past 10 years
– ~600,000 lines of code
– 10,000+ pages of documentation
- Website: www.dealii.org
Features
- 1d, 2d, 3d computations, adaptive mesh refinement (on quads/hexas only)
- Finite element types:
– Continuous and DG Lagrangian elements
– Higher order elements, hp adaptivity
– Raviart-Thomas, Nedelec, …
– And arbitrary combinations
Features, part II
- Linear Algebra
– Own sparse and dense linear algebra classes
– Interfaces to PETSc, Trilinos, UMFPACK, BLAS, …
- Parallelization
– Multi-threading on multi-core machines
– MPI: 16,000+ processors
- Output in many visualization file formats
Development of deal.II
- Professional-level development style
- Development in the open, open repository
- Mailing lists for users and developers
- Test suite with 6,000+ tests, run after every change
- Platform support:
– Linux/Unix
– Mac
– Work in progress: Windows
Installation
- How to install from source code: configure, compile, test, run "step-1"
- Ubuntu (or any other linux) or Mac OSX
- Steps:
– Detect compilers/dependencies/etc. (cmake)
– Compile & install deal.II (make)
Prerequisites on Linux
- Compiler: GNU g++
- Recommended:
$ sudo apt-get install subversion openmpi1.6-bin openmpi1.6-common \
    g++ gfortran libopenblas-dev liblapack-dev zlib1g-dev git emacs gnuplot
- manually: cmake (in a minute)
- Later: eclipse, paraview
- Optional manually: visit, p4est, PETSc, Trilinos, hdf5
On Mac OS
- If OSX 10.9 follow instructions from wiki:
https://github.com/dealii/dealii/wiki/MacOSX
- Later manually: eclipse, paraview
cmake
- Ubuntu 12.04 has a version that is too old
- On a newer Ubuntu, do:
  $ sudo apt-get install cmake
  … and you are done
- Otherwise: install cmake from source or
download the 32bit binary
cmake from binary
- Do:
export CMAKEVER=2.8.12.1
wget http://www.cmake.org/files/v2.8/cmake-$CMAKEVER-Linux-i386.sh
chmod u+x cmake-$CMAKEVER-Linux-i386.sh
./cmake-$CMAKEVER-Linux-i386.sh
- Answer “q”, yes and yes
- Add the bin directory to your path (.bashrc)
- You might need:
  sudo apt-get install ia32-libs
cmake from source
wget http://www.cmake.org/files/v2.8/cmake-2.8.12.1.tar.gz
tar xf cmake-2.8.12.1.tar.gz
./configure
make install
Install deal.II
- http://www.dealii.org/8.1.0/readme.html
- Extract:
tar xf deal.II-8.1.0.tar.gz
- Build directory:
cd deal.II; mkdir build; cd build
- Configuration:
cmake -DCMAKE_INSTALL_PREFIX=/?/? ..
(where /?/? is your installation directory)
- Compile (5-60 minutes):
make -jX install
(where X is the number of cores you have)
- Test:
make test (in the build directory)
- Test part two:
cd examples/step-1
cmake -DDEAL_II_DIR=/?/? .
make run
- Recommended layout:
deal.II/
  build      <- build files
  installed  <- your installation dir
  examples   <- all examples!
  include
  source
  ...
Running examples
- In short:
cd examples/step-1
cmake .
make run
- cmake:
– Detects the configuration; only needs to be run once
– Input: CMakeLists.txt
– Output: Makefile (and other files like CMakeCache.txt)
- make:
– Tool that executes the commands in the Makefile; run it every time you change your code
– Input: step-1.cc, Makefile
– Output: step-1 (the binary executable)
- Run your program with
./step-1
- Or (compile and run):
make run
How to create an eclipse project
- Run this once in your project:
cmake -G "Eclipse CDT4 - Unix Makefiles" .
- Now create a new project in eclipse (“File->Import->Existing Project” and select your folder for the project above)
- Eclipse intro:
– http://www.math.tamu.edu/~bangerth/videos.676.7.html
– http://www.math.tamu.edu/~bangerth/videos.676.8.html
Templates in C++
- “blueprints” to generate functions and/or classes
- Template arguments are either numbers or types
- No performance penalty!
- Very powerful feature of C++: difficult syntax, ugly
error messages, slow compilation
- More info:
http://www.cplusplus.com/doc/tutorial/templates/
http://www.math.tamu.edu/~bangerth/videos.676.12.html
Why used in deal.II?
- Write your program once and run in 1d, 2d, 3d:
- Also: large parts of the library independent of dimension
DoFHandler<dim>::active_cell_iterator
  cell = dof_handler.begin_active(),
  endc = dof_handler.end();
for (; cell!=endc; ++cell)
  {
    ...
    cell_matrix(i,j) += (fe_values.shape_grad (i, q_point) *
                         fe_values.shape_grad (j, q_point) *
                         fe_values.JxW (q_point));
  }
Function Templates
- Blueprint for a function
- One type called “number”
- You can use “typename” or “class”
- Sometimes you need to state which function you want to call:
template <typename number>
number square(const number x)
{
  return x * x;
}

int x = 3;
int y = square<int>(x);

template <typename T>
void yell()
{
  T test;
  test.shout("HI!");
}

// cat is a class that has shout()
yell<cat>();
Value Templates
- Template arguments can also be values (like int) instead of types:
- Of course this would have worked here too:
template <int dim>
void make_grid(Triangulation<dim> &triangulation)
{ … }

Triangulation<2> tria;
make_grid<2>(tria);

// the alternative with a type parameter:
template <typename T>
void make_grid(T &triangulation)
{ … } // now we can not access “dim” though
Class templates
- Whole classes from a blueprint
- Same idea:
template <int dim>
class Point
{
  double elements[dim];
  // ...
};

Point<2> a_point;
Point<5> different_point;

namespace std {
  template <typename number> class vector;
}
std::vector<int> list_of_ints;
std::vector<cat> cats;
Example
- Value of N known at compile time
- Compiler can optimize (unroll loop)
- Fixed size arrays faster than dynamic
(dealii::Point<dim> vs dealii::Vector<double>)
template <unsigned int N>
double norm(const Point<N> &p)
{
  double tmp = 0;
  for (unsigned int i = 0; i < N; ++i)
    tmp += square(p.elements[i]);
  return sqrt(tmp);
}
Examples in deal.II
- Step-4:
template <int dim>
void make_grid(Triangulation<dim> &triangulation)
{ ... }
- So that we can use Vector<double> and Vector<float>:
template <typename number>
class Vector
{
  number *elements;
  // ...
};
- Default values (embed dim-dimensional object in spacedim):
template <int dim, int spacedim = dim>
class Triangulation
{ ... };
- Already familiar:
template <int dim, int spacedim>
void GridGenerator::hyper_cube(Triangulation<dim, spacedim> &tria,
                               const double left,
                               const double right)
{ ... }
Explicit Specialization
- different blueprint for a specific type T or value
// store some information
// about a Triangulation:
template <int dim>
struct NumberCache
{};

template <>
struct NumberCache<1>
{
  unsigned int n_levels;
  unsigned int n_lines;
};

template <>
struct NumberCache<2>
{
  unsigned int n_levels;
  unsigned int n_lines;
  unsigned int n_quads;
};

// more clever (instead of the above):
template <>
struct NumberCache<2> : public NumberCache<1>
{
  unsigned int n_quads;
};
step-4
- Dimension independent Laplace problem
- Triangulation<2>, DoFHandler<2>, …
replaced by Triangulation<dim>, DoFHandler<dim>, …
- Template class:
template <int dim>
class Step4
{};
Parallel Computing: Introduction
A modern CPU: Intel Core i7
Basics
- Single cores are not getting (much) faster
- “The free lunch is over”:
  http://www.gotw.ca/publications/concurrency-ddj.htm
- Concurrency is the only option:
  – SIMD/vector instructions
  – Several cores
  – Several chips in one node
  – Combine nodes into a supercomputer
Hierarchy of memory
- Latency: time until the CPU receives data after requesting it
- Bandwidth: how much data per second?
- Prefetching of data; “cache misses” are expensive
- Caches are managed automatically by the processor
https://gist.github.com/hellerbarde/2843375
Amdahl's Law
- Task: serial fraction s, parallel fraction p=1-s
- N workers (whatever that means)
- Runtime: T(N) = (1-s)T(1)/N + sT(1)
- Speedup T(1)/T(N); for N to infinity:
  max_speedup = 1/s
- http://en.wikipedia.org/wiki/Amdahl%27s_law
- Reality: T(N) = (1-s)T(1)/N + sT(1) + aN + bN^2
Summary
- Computing much faster than memory access
- Parallel computing required: no free lunch!
- Communication is a serial fraction (or worse when it increases with N!)
- Communication in Amdahl's law is the main challenge in parallel computing
Multithreading
- Idea: call functions in a separate background thread and continue immediately
- All threads can access the same memory (dangerous, but easy)
- Can spawn arbitrarily many threads, scheduled by the operating system (pthreads, CreateThread, …)
- C++ wrappers: std::thread, boost::thread
– See demos
- Higher level libraries (later):
– OpenMP (parallelize loops, tasks, ...)
– TBB = Intel Threading Building Blocks (tasks, algorithms)
– deal.II wraps TBB in a very easy task-based interface
Multithreading limits
- One machine only (no cluster) => MPI
- Wrong abstraction:
– Creating threads is expensive
– How many to create?
– How to split the work?
– Reuse them?
– Synchronization is difficult
- Better: task based library based on threads (later)
MPI
- Need MPI library and compile with compiler wrappers (mpicxx
instead of g++) or use a cmake script that does that
- Need to run with (for 4 processes):
mpirun -n 4 ./main
- Reference http://mpi.deino.net/mpi_functions/
- Most basic program:
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Finalize();
}
MPI_Send and MPI_Recv
- Send and receive a message from <source> to <dest> of
<count> elements
- <comm> is always MPI_COMM_WORLD for us
- <tag> can be any integer but needs to be the same on sender and receiver
- <status> can be MPI_STATUS_IGNORE if you don't care
- Some examples for <datatype>: MPI_INT, MPI_DOUBLE
- <source> can be MPI_ANY_SOURCE if you want to receive
any message
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status);

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);
MPI_Barrier
- Code:
MPI_Barrier(MPI_Comm comm)
- Waits until all ranks have arrived at this line, then all continue
MPI_Probe
- Wait until a matching message arrives and return info about it in <status>
int MPI_Probe(int source, int tag, MPI_Comm comm, MPI_Status *status);
- <source> source rank or MPI_ANY_SOURCE
- <tag> tag value or MPI_ANY_TAG
- <status> contains:
– .MPI_SOURCE: the source rank
– .MPI_TAG: the tag
- You can get information about the size of the message using MPI_Get_count()
MPI_Bcast
- Send the same data from <root> to all ranks
- Result: rank[j].buffer[i] = rank[root].buffer[i]

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm);
MPI_Reduce
- Combine elements from <sendbuf> using <op> from all ranks and store in <recvbuf> on <root>
- MPI_Op examples: MPI_SUM, MPI_MIN, MPI_MAX, ...
- Result: rank[root].recvbuf[i] = op(rank[0].sendbuf[i], …, rank[n-1].sendbuf[i])

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
               MPI_Op op, int root, MPI_Comm comm);
MPI_Allreduce
- Combine elements from <sendbuf> using <op> from all
ranks and store in <recvbuf> on all ranks
- Like reduce, but the result is available on all ranks:
  rank[j].recvbuf[i] = op(rank[0].sendbuf[i], …, rank[n-1].sendbuf[i])

int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);
MPI_Gather
- Collect data from all ranks at <root>
int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
               void *recvbuf, int recvcnt, MPI_Datatype recvtype,
               int root, MPI_Comm comm);
MPI_Allgather
- Collect data from all ranks at every rank
- (like Gather but copied to everyone)
int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm);
MPI_Scatter
- Send different data from <root> to all
- <sendbuf> is only used on <root>
int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                int root, MPI_Comm comm);
Weak/Strong Scaling
- Weak:
  – Fixed problem size per CPU
  – Increase CPUs
  – Optimal: constant runtime
- Strong:
  – Fixed total problem size
  – Increase CPUs
  – Optimal: runtime decreases linearly
The following slides are for the future....
MPI vs. Multithreading
Boundary conditions
Adaptive Mesh Refinement
- Typical loop:
– Solve – Estimate – Mark – Refine/coarsen
- Estimate is problem dependent:
– Approximate gradient jumps: KellyErrorEstimator class
– Approximate local norm of the gradient: DerivativeApproximation class
– Or something else
- Mark:
  GridRefinement::refine_and_coarsen_fixed_number(...)
  or GridRefinement::refine_and_coarsen_fixed_fraction(...)
- Refine/coarsen:
  – triangulation.execute_coarsening_and_refinement()
  – Transferring the solution: SolutionTransfer class (maybe discussed later)
Constraints
- Used for hanging nodes (and other things!)
- Have the form: x_i = ∑_j α_ij x_j + c_i
- Represented by the class ConstraintMatrix
- Created using DoFTools::make_hanging_node_constraints()
- Will also use for boundary values from now on:
  VectorTools::interpolate_boundary_values(..., constraints);
- Need a different SparsityPattern (see step-6):
  DoFTools::make_sparsity_pattern(..., constraints, ...)
Constraints II
- Old approach (explained in the video):
  – Assemble the global matrix
  – Then eliminate rows/columns: ConstraintMatrix::condense(...)
    (similar to MatrixTools::apply_boundary_values() in step-3)
  – Solve, then set all constrained values correctly: ConstraintMatrix::distribute(...)
- New approach (step-6):
  – Assemble the local matrix as normal
  – Eliminate while transferring into the global matrix:
    constraints.distribute_local_to_global(cell_matrix, cell_rhs,
                                           local_dof_indices,
                                           system_matrix, system_rhs);
  – Solve, then set all constrained values correctly: ConstraintMatrix::distribute(...)
Vector-Valued Problems
- (video 19&20)
- FESystem: list of FEs (can be nested!)
- Will give one FE with N shape functions
- Use FEValuesExtractors to do
  fe_values[velocities].divergence(i, q), ...
- Ordering of DoFs in system matrix is independent
- See module “handling vector valued problems”
- Non-primitive elements (see fe.is_primitive()):
shape functions have more than one non-zero component, example: RT
Computing Errors
- Important for verification!
- See step-7 for an example
- Set up problem with analytical solution and implement it as a Function<dim>
- Quantities of interest:
- Break it down as one operation per cell and the “summation” (local and global error)
- Need quadrature to compute integrals
e = u − u_h
∥e∥_0 = ∥e∥_{L2} = ( ∑_K ∥e∥_{0,K}^2 )^{1/2}
∣e∣_1 = ∣e∣_{H1} = ∥∇e∥_0 = ( ∑_K ∥∇e∥_{0,K}^2 )^{1/2}
∥e∥_1 = ∥e∥_{H1} = ( ∣e∣_1^2 + ∥e∥_0^2 )^{1/2} = ( ∑_K ∥e∥_{1,K}^2 )^{1/2}
∥e∥_{0,K} = ( ∫_K ∣e∣^2 )^{1/2}
Computing Errors
- Example:
Vector<float> difference_per_cell(triangulation.n_active_cells());
VectorTools::integrate_difference(dof_handler,
                                  solution,              // solution vector
                                  Solution<dim>(),       // reference solution
                                  difference_per_cell,
                                  QGauss<dim>(3),        // quadrature
                                  VectorTools::L2_norm); // local norm
const double L2_error = difference_per_cell.l2_norm(); // global norm

- Local norms: mean, L1_norm, L2_norm, Linfty_norm, H1_seminorm, H1_norm, ...
- Global norms are vector norms: l1_norm(), l2_norm(), linfty_norm(), ...
ParameterHandler
- Control program at runtime without recompilation
- You can put in:
– ints (e.g. number of refinements)
– doubles (e.g. coefficients, time step size)
– strings (e.g. choice of algorithm/mesh/problem/etc.)
– functions (e.g. right-hand side, reference solution)
- Stuff can be grouped in sections
- See class-repository: prm/
ParameterHandler
# order of the finite element to use.
set fe order = 1

# Refinement method. Choice between 'global' and 'adaptive'.
set refinement = global

subsection equation
  # expression for the reference solution and boundary values. Function of x,y (and z)
  set reference = sin(pi*x)*cos(pi*y)

  # expression for the gradient of the reference solution. Function of x,y (and z)
  set gradient = pi*cos(pi*x)*cos(pi*y); -pi*sin(pi*x)*sin(pi*y)

  # expression for the right-hand side. Function of x,y (and z)
  set rhs = 2*pi*pi*sin(pi*x)*cos(pi*y) + sin(pi*x)*cos(pi*y)
end