
SLIDE 1

2110412 Parallel Comp Arch Parallel Programming Paradigm

Natawut Nupairoj, Ph.D. Department of Computer Engineering, Chulalongkorn University

Outline

!"

What are the factors for parallel programming paradigm?

#$ % &

#'()*+

!"

#'()*+

Generic Parallel Architecture

[Figure: generic parallel architecture; the only surviving labels are four memory modules (M)]

,$$$-

./ $


SLIDE 2

Flynn’s Taxonomy

0$& 1

2& 2& (#.#+

(#.+

(#.+ (#.#+ (.+

SISD

' &

$

[Figure: SISD, one processor (P) and one memory (M) exchanging instructions and data (I, D)]

3 45

"

SIMD

(+

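[Figure: SIMD, a control unit (Ctrl) issues instructions (I) to several processors (P), each exchanging data (D) with its own memory (M)]

45

1")"''67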

MISD

&.# #$$$

[Figure: MISD, several processors (P) labeled with instruction (I) and data (D) streams]

SLIDE 3

MIMD

5 )

$

[Figure: MIMD, processors (P) with memories (M), each with its own instruction and data streams (I, D), connected by a network]

45

.8#

#6.* "5"$

" '""7

Parallelism

1$

9

1:$&

;

"22

Data Dependence Graph

$&& 5

45/

45/

4&8

1/22&/8 1/8/

1/&

2&$

Parallelism Structure

SLIDE 4

Example

,/$<

// "//:/ &&$$& 2/2&

Example: Dependency Graph

,$&=-


Functional Parallelism

$&&

&&(+

0$&&
0$&&

2

*=

Data Parallelism

$

&&

"2$

4$

4$

">>>


4$

4$

SLIDE 5

Sample Algorithm

for i := 0 to 99 do
    a[i] := b[i] + c[i]
endfor

for i := 1 to 99 do
    a[i] := a[i-1] + c[i]
endfor

for i := 1 to 99 do
    for j := 0 to 99 do
        a[i,j] := a[i-1,j] + c[i,j]
    endfor
endfor
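The three loops differ in how much parallelism they expose. The sketch below is one way to see this in Python; the function names `loop1`, `loop2`, and `loop3` and the choice of plain lists are assumptions made for illustration, not part of the slides. The first loop has independent iterations, the second carries a dependence from iteration i-1 to i, and the third is sequential across rows but independent across the columns of each row.

```python
def loop1(b, c):
    # a[i] depends only on b[i] and c[i]: iterations are independent,
    # so any order (here: reversed) gives the same result.
    a = [0] * 100
    for i in reversed(range(100)):
        a[i] = b[i] + c[i]
    return a


def loop2(a, c):
    # a[i] needs a[i-1] from the previous iteration (a loop-carried
    # dependence), so the iterations must run in increasing order.
    a = list(a)
    for i in range(1, 100):
        a[i] = a[i - 1] + c[i]
    return a


def loop3(a, c):
    # Row i depends on row i-1, but within a row every column j is
    # independent: the inner loop could be executed in parallel.
    a = [row[:] for row in a]
    for i in range(1, 100):
        a[i] = [a[i - 1][j] + c[i][j] for j in range(100)]
    return a
```

Running `loop1` in reverse order still produces the same array, which is a quick check that its iterations do not depend on one another; doing the same to `loop2` would change the result.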

Pipelining

.5 /?@ 45$$ /(+ >>> >>>

(&3+

SLIDE 6

Pipelining Lessons

3

&/

.!!&

/ /

/

$ &&

A

#

Example

/&>>> 2&

<& 42

;

Pipelining in Modern Processor

1?'@

12& $>>>

Stages: Fetch, Decode, Operand Fetch, Execute, Store

SLIDE 7

Pipelining Execution

Cycle:          1  2  3  4  5  6  7
Instruction 1:  F  D  O  E  S
Instruction 2:     F  D  O  E  S
Instruction 3:        F  D  O  E  S
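The timing diagram above can be reproduced with a few lines of Python. This is a minimal sketch that assumes every stage takes exactly one cycle and that nothing stalls; the names `STAGES`, `pipeline_schedule`, and `total_cycles` are hypothetical helpers, not taken from the slides.

```python
# Stage letters as used on the slide: Fetch, Decode, Operand fetch, Execute, Store.
STAGES = ["F", "D", "O", "E", "S"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    # The first instruction needs n_stages cycles; each later one
    # finishes one cycle after its predecessor (ideal, stall-free case).
    return n_stages + n_instructions - 1


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage active in cycle c+1."""
    width = total_cycles(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        row = [" "] * width
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipe in cycle i+1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_schedule(3):
        print(" ".join(row))
```

Under these assumptions three instructions complete in 5 + 3 - 1 = 7 cycles, compared with 3 × 5 = 15 cycles if each instruction had to finish before the next one started.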

Performance of Pipeline

,- #5 '

"7

"/AC *9(/A+ '"7B '"7B

A)$5C$)5

A

&"7B

A)$5($)5 D!

$+ A =

Nothing is perfect !!!

22 3/&5

[Figure: pipeline timing diagram over cycles 1 to 8; three instructions each pass through the stages F, D, O, E, S]

Stalled Pipe

,?@ 8-

#2 $ '$5 '$5 . "5

12

SLIDE 8

Vector Processing

G

&&(/

?@+

45B#FH(FH+2

for i := 0 to 63 do
    Y[i] := a*X[i] + Y[i]
endfor

Vector Processing

LV     V1,R1      ; R1 contains the base address of “X[]”
LV     V2,R2      ; R2 contains the base address of “Y[]”
MULTSV V3,R3,V1   ; aX -- R3 contains the value of “a”
ADDV   V1,V3,V2   ; aX + Y
SV     R2,V1      ; write back to “Y[*]”
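To connect the vector instruction sequence to the scalar loop, the following Python sketch computes Y = a*X + Y both ways. It is only an illustration: Python lists stand in for the 64-element vector registers, and the names `VLEN`, `axpy_scalar`, and `axpy_vector` are assumptions introduced here.

```python
VLEN = 64  # vector length used in the slide's loop (i = 0 .. 63)


def axpy_scalar(a, x, y):
    # Element-by-element version: one multiply and one add per iteration.
    y = list(y)
    for i in range(VLEN):
        y[i] = a * x[i] + y[i]
    return y


def axpy_vector(a, x, y):
    # Each statement below models ONE vector instruction that operates
    # on all 64 elements at once.
    v1 = list(x[:VLEN])                     # load X into a vector register
    v2 = list(y[:VLEN])                     # load Y into a vector register
    v3 = [a * e for e in v1]                # scalar-times-vector step: a*X
    v1 = [p + q for p, q in zip(v3, v2)]    # vector add: a*X + Y
    return v1                               # store the result back as Y
```

The scalar version executes its loop body 64 times, while the vector version expresses the same work as five whole-vector operations, which is the source of the instruction-count savings in vector processing.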

;$2/2

Level of Parallelism

<&&2$9(

$+

0$'&'('.<+ ;'('+ '('+

"'(/'+

7$2&&

2$9

Level of Parallelism


SLIDE 9

Parallel Programming Models

#.#' #.' ' .' &)' #' "2&.#.


Parallel Programming Models

*( *-

"

*(&$9


Example

62

$\sum_{k=0}^{n-1} f(A[k])$

∑

− + 1

]) [ (

m j

k A f

)2&

4&(I/J+& 2

∑

=

]) [ (

j k

k A f


i i X

Y

Model 1: Message Passing

[Figure: processes P0 ... Pn, each with its own private variables (i, res, s); the labels "send P0,X" and "recv Pn,Y" show X being transferred into Y]

No shared data
Explicit data transfer (both sender and receiver must call the send/recv functions)


SLIDE 10

Global Sum in Message Passing

partial_sum = 0
for each data A[k]
    partial_sum += f(A[k]);
end for
if my_id == 0 then
    global_sum = partial_sum
    for each proc j (excluding 0)
        recv(j, psum);
        global_sum += psum
    end for
else
    send(proc 0, partial_sum);
end if
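The message-passing pattern above can be tried out in Python. This is a rough sketch under several assumptions: threads stand in for separate processes, a `queue.Queue` plays the role of the send/recv channel into process 0, and `f` is a placeholder function (squaring) because the slides leave it unspecified. Each worker touches only its own chunk of the data, so the only communication is the explicit message.

```python
import queue
import threading


def f(x):
    # Hypothetical per-element function; the slides do not define f.
    return x * x


def worker(my_id, chunk, nprocs, inbox0, result):
    # Every process computes a partial sum over its own private data.
    partial_sum = 0
    for a in chunk:
        partial_sum += f(a)
    if my_id == 0:
        global_sum = partial_sum
        for _ in range(nprocs - 1):        # one message from each other proc
            _sender, psum = inbox0.get()   # recv
            global_sum += psum
        result.put(global_sum)
    else:
        inbox0.put((my_id, partial_sum))   # send to process 0


def global_sum_message_passing(data, nprocs=4):
    inbox0 = queue.Queue()   # channel into process 0
    result = queue.Queue()   # where process 0 reports the final sum
    chunks = [data[p::nprocs] for p in range(nprocs)]
    threads = [
        threading.Thread(target=worker, args=(p, chunks[p], nprocs, inbox0, result))
        for p in range(nprocs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()
```

A real message-passing program would typically use a library such as MPI with one process per rank; the queue-based version only mirrors the control flow of the pseudocode.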


Model 2: Shared Memory

i res s i res s

. . .

x = ... y = ..x ... Address:

Shared

K2 "K$92

(/

#'

P P P

. . .

Private


Global Sum in Shared Memory

Thread 1 [s = 0 initially]
    local_s1 = 0
    for i = 0, n/2-1
        local_s1 = local_s1 + f(A[i])
    s = s + local_s1

Thread 2 [s = 0 initially]
    local_s2 = 0
    for i = n/2, n-1
        local_s2 = local_s2 + f(A[i])
    s = s + local_s2

What could go wrong?

RACE CONDITION!

Solution? Mutual exclusion with locks
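The race arises because `s = s + local_s` is a read-modify-write on shared data: if both threads read `s` before either writes it back, one update is lost. A minimal Python sketch of the lock-based fix follows; the function name `global_sum_shared`, the use of `threading.Lock`, and the placeholder `f` (squaring) are assumptions for illustration.

```python
import threading


def f(x):
    # Hypothetical per-element function; the slides do not define f.
    return x * x


def global_sum_shared(A, n_threads=2):
    s = 0                      # shared variable, visible to all threads
    lock = threading.Lock()    # protects every update of s

    def work(lo, hi):
        nonlocal s
        local_s = 0            # private to this thread, no lock needed
        for i in range(lo, hi):
            local_s += f(A[i])
        with lock:             # mutual exclusion around the shared update
            s = s + local_s

    n = len(A)
    bounds = [(t * n // n_threads, (t + 1) * n // n_threads)
              for t in range(n_threads)]
    threads = [threading.Thread(target=work, args=b) for b in bounds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return s
```

Accumulating into a private `local_s` first keeps the critical section short: each thread takes the lock once, not once per element.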


Model 3: Data Parallel

#.$

#& #& B$ "B22 "B22

[Figure: array A is mapped through f to fA, which is reduced by sum to s]

A = array of all data
fA = f(A)
s = sum(fA)
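In the data-parallel style the program is written as whole-array operations and the runtime decides how to spread the elements over processors. The sketch below approximates this in Python with `concurrent.futures.ThreadPoolExecutor`; the function name `data_parallel_sum` and the placeholder `f` (squaring) are assumptions. In standard CPython builds threads do not speed up CPU-bound work, so this shows the program structure, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # Hypothetical per-element function; the slides do not define f.
    return x * x


def data_parallel_sum(A, workers=4):
    # fA = f(A): apply f to every element; the pool decides which
    # worker handles which element, and map preserves element order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fA = list(pool.map(f, A))
    # s = sum(fA): a single reduction over the mapped array.
    return sum(fA)
```

Note that the code contains no explicit send/recv calls and no locks: the map and the reduction are the only two operations the programmer writes.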


SLIDE 11

Message Passing vs. Shared Memory

2 5 "5 #$9

#$ #$

#$92$2
