Outline - - PowerPoint PPT Presentation

SLIDE 1

2110412 Parallel Comp Arch Parallel Programming Paradigm

Natawut Nupairoj, Ph.D. Department of Computer Engineering, Chulalongkorn University

Outline

  • !"

What are the factors for parallel programming paradigm?

#$ % &

#'()*+

  • !"

#'()*+

Generic Parallel Architecture

[Figure: generic parallel architecture with several memory modules (M)]

,$$$-

./ $

SLIDE 2

Flynn’s Taxonomy

0$& 1

2& 2& (#.#+

(#.+

(#.+ (#.#+ (.+

SISD

' &

$

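[Figure: a single processor (P) and memory (M) exchanging one instruction stream and one data stream (I, D)]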

3 45

"

SIMD

(+

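[Figure: a control unit (Ctrl) issues one instruction stream (I) to several processor/memory pairs (P, M), each with its own data stream (D)]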

45

1")"''67


MISD

&.# #$$$

[Figure: several processors (P), each with its own instruction stream (I), operating on a data stream (D)]

SLIDE 3

MIMD

5 )

$

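[Figure: several processor/memory pairs (P, M), each with its own instruction and data streams (I, D), connected by a network]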

  • 45

.8#

#6.* "5"$

" '""7


Parallelism

1$

9

1:$&

;

"22

Data Dependence Graph

$&& 5

45/

  • 45/

4&8

1/22&/8 1/8/

1/&

2&$

  • Parallelism Structure
SLIDE 4

Example

,/$<

// "//:/ &&$$& 2/2&

Example: Dependency Graph

  • ,$&=-

!""

  • Functional Parallelism

$&&

&&(+

  • 0$&&
  • 0$&&

2

*=

  • !""
  • Data Parallelism

$

&&

"2$

  • 4$

4$

  • ">>>

!""

  • 4$

4$

SLIDE 5

Sample Algorithm

for i := 0 to 99 do
    a[i] := b[i] + c[i]
endfor

for i := 1 to 99 do
    a[i] := a[i-1] + c[i]
endfor

for i := 1 to 99 do
    for j := 0 to 99 do
        a[i,j] := a[i-1,j] + c[i,j]
    endfor
endfor
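The three loops differ in whether one iteration needs a result produced by another. The Python sketch below is an illustration under assumed inputs (the array size, the values, and the function names `loop1`, `loop2`, `loop3` are hypothetical, not from the slides). It runs each loop in forward and in reverse order: if both orders give the same result, the iterations are independent and could run in parallel.

```python
# Loop 1: a[i] = b[i] + c[i]. No iteration reads another iteration's result.
def loop1(b, c, order):
    a = [0] * len(b)
    for i in order:
        a[i] = b[i] + c[i]
    return a

# Loop 2: a[i] = a[i-1] + c[i]. Iteration i needs the result of iteration i-1.
def loop2(a0, c, order):
    a = [a0] + [0] * (len(c) - 1)
    for i in order:
        a[i] = a[i - 1] + c[i]
    return a

# Loop 3: a[i][j] = a[i-1][j] + c[i][j]. Each row depends on the previous row,
# but the columns j inside one row do not depend on each other.
def loop3(first_row, c, col_order):
    rows, cols = len(c), len(first_row)
    a = [list(first_row)] + [[0] * cols for _ in range(rows - 1)]
    for i in range(1, rows):      # the i loop must stay sequential
        for j in col_order:       # the j loop may run in any order
            a[i][j] = a[i - 1][j] + c[i][j]
    return a

n = 8  # assumed small size; the slide uses 100
b = list(range(n))
c = [2 * k + 1 for k in range(n)]
forward = list(range(n))
backward = forward[::-1]

loop1_same = loop1(b, c, forward) == loop1(b, c, backward)
loop2_same = loop2(5, c, forward[1:]) == loop2(5, c, backward[:-1])
grid = [[i + j for j in range(n)] for i in range(n)]
loop3_same = loop3(b, grid, forward) == loop3(b, grid, backward)
print(loop1_same, loop2_same, loop3_same)
```

Only the second loop changes its result when reordered, which is the loop-carried dependence on a[i-1].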

Pipelining

.5 /?@ 45$$ /(+ >>> >>>

(&3+

SLIDE 6

Pipelining Lessons

3

&/

.!!&

/ /

/

$ &&

A

#

Example

/&>>> 2&

<& 42

;

;

;

Pipelining in Modern Processor

1?'@

12& $>>>

Stages: Fetch, Decode, Operand Fetch, Execute, Store

SLIDE 7

Pipelining Execution

Cycle:    1 2 3 4 5 6 7
Instr 1:  F D O E S
Instr 2:    F D O E S
Instr 3:      F D O E S

(F = Fetch, D = Decode, O = Operand Fetch, E = Execute, S = Store)
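The timing diagram can be reproduced with a short Python sketch. It assumes an ideal pipeline in which every stage takes exactly one cycle and no instruction ever stalls; the function names are hypothetical. Under those assumptions, n instructions on a k-stage pipeline finish in k + n - 1 cycles, compared with n * k cycles when each instruction must complete before the next one starts.

```python
STAGES = ["F", "D", "O", "E", "S"]  # Fetch, Decode, Operand fetch, Execute, Store

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[t] is the stage occupied in cycle t + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [" "] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i occupies stage s in cycle i + s + 1
        rows.append(row)
    return rows

def pipelined_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline (no stalls)."""
    return k + n - 1

def unpipelined_cycles(n, k):
    """Cycles when every instruction runs all k stages before the next starts."""
    return n * k

rows = pipeline_schedule(3)
print("cycle  " + " ".join(str(t + 1) for t in range(len(rows[0]))))
for i, row in enumerate(rows):
    print(f"instr{i + 1} " + " ".join(row))
print(pipelined_cycles(3, 5), "cycles pipelined,", unpipelined_cycles(3, 5), "unpipelined")
```

As n grows, the ratio n * k / (k + n - 1) approaches k, so an ideal pipeline gains at most a factor equal to its number of stages.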

Performance of Pipeline

,- #5 '

"7

"/AC *9(/A+ '"7B '"7B

A)$5C$)5

A

&"7B

A)$5($)5 D!

$+ A =

Nothing is perfect !!!

22 3/&5

[Pipeline timing diagram: cycles 1 to 8; three instructions each pass through stages F, D, O, E, S]

  • Stalled Pipe

,?@ 8-

#2 $ '$5 '$5 . "5

12

SLIDE 8

Vector Processing

G

&&(/

?@+

45B#FH(FH+2

for i := 0 to 63 do
    Y[i] := a*X[i] + Y[i]
endfor

Vector Processing

LV    V1,R1     ; R1 contains the base address of “X[*]”
LV    V2,R2     ; R2 contains the base address of “Y[*]”
MULSV V3,R3,V1  ; a*X -- R3 contains the value of “a”
ADDV  V1,V3,V2  ; a*X + Y
SV    R2,V1     ; write back to “Y[*]”
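To make the correspondence between the scalar loop and the vector instruction sequence concrete, here is a Python sketch that models each vector instruction as one whole-vector operation on 64 elements. The helper names (`lv`, `mulsv`, `addv`, `sv`), the flat `memory` list, and the sample values are assumptions for illustration only; they do not describe a real instruction set.

```python
VLEN = 64  # vector length, matching the 0..63 loop bounds

def lv(memory, base):
    """Load VLEN consecutive elements starting at address base."""
    return memory[base:base + VLEN]

def mulsv(scalar, v):
    """Multiply every element of vector v by a scalar."""
    return [scalar * x for x in v]

def addv(v1, v2):
    """Add two vectors element by element."""
    return [x + y for x, y in zip(v1, v2)]

def sv(memory, base, v):
    """Store vector v to memory starting at address base."""
    memory[base:base + VLEN] = v

a = 3
memory = [float(i) for i in range(VLEN)] + [1.0] * VLEN  # X at 0, Y at 64
r1, r2 = 0, VLEN  # base addresses of X and Y

# Reference result from the element-by-element loop.
expected = [a * memory[r1 + i] + memory[r2 + i] for i in range(VLEN)]

# The same computation as five whole-vector steps.
v1 = lv(memory, r1)   # load X
v2 = lv(memory, r2)   # load Y
v3 = mulsv(a, v1)     # a*X
v1 = addv(v3, v2)     # a*X + Y
sv(memory, r2, v1)    # write back to Y
result = memory[r2:r2 + VLEN]
print(result == expected)
```

The loop issues 64 separate multiply-add iterations, while the vector form expresses the same work as five instructions, each acting on all 64 elements.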

;$2/2

;$2/2

Level of Parallelism

<&&2$9(

$+

0$'&'('.<+ ;'('+ '('+

"'(/'+

"'(/'+

7$2&&

2$9

Level of Parallelism

SLIDE 9

Parallel Programming Models

#.#' #.' ' .' &)' #' "2&.#.


Parallel Programming Models

*( *-

"

"

*(&$9


Example

62

$$\sum_{k=0}^{n-1} f(A[k])$$

$$\sum_{k=j}^{j+m-1} f(A[k])$$


Model 1: Message Passing

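[Figure: processors P0 ... Pn, each with a private memory holding its own i, res and s; one processor executes “send P0,X” and another executes “recv Pn,Y”]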

  • No shared data
  • Explicit data transfer (both sender and receiver must call the send/recv functions)

SLIDE 10

Global Sum in Message Passing

partial_sum = 0
for each data A[k]
    partial_sum += f(A[k]);
end for
if my_id == 0 then
    global_sum = partial_sum
    for each proc j (excluding proc 0)
        recv(j, psum);
        global_sum += psum
    end for
else
    send(0, partial_sum);
end if
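The pattern above can be imitated in Python. This is a simulation under stated assumptions, not a real message-passing library: threads stand in for processes, one `queue.Queue` per process stands in for its mailbox, and the workers exchange data only through `put` (send) and `get` (recv). The names `worker`, `message_passing_sum`, the function `f`, and the round-robin data split are all hypothetical. Unlike `recv(j, psum)`, process 0 here accepts partial sums in whatever order they arrive.

```python
import queue
import threading

def f(x):
    return x * x  # placeholder for the per-element function f

def worker(my_id, chunk, mailboxes, result):
    # Every process first sums its own share of the data.
    partial_sum = 0
    for value in chunk:
        partial_sum += f(value)

    if my_id == 0:
        # Process 0 collects one partial sum from each other process.
        global_sum = partial_sum
        for _ in range(len(mailboxes) - 1):
            sender, psum = mailboxes[0].get()  # recv: blocks until a message arrives
            global_sum += psum
        result.put(global_sum)
    else:
        mailboxes[0].put((my_id, partial_sum))  # send to process 0

def message_passing_sum(data, num_procs):
    mailboxes = [queue.Queue() for _ in range(num_procs)]
    result = queue.Queue()
    chunks = [data[p::num_procs] for p in range(num_procs)]  # assumed data split
    threads = [
        threading.Thread(target=worker, args=(p, chunks[p], mailboxes, result))
        for p in range(num_procs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()

total = message_passing_sum(list(range(10)), 4)
print(total)
```

Because each worker touches only its own chunk and its own local variables, no lock is needed; all coordination happens through the explicit send and receive calls.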


Model 2: Shared Memory

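[Figure: threads, each with private variables i, res and s, access one shared address space; one thread executes “x = ...” and another executes “y = ..x ...”]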

K2 "K$92

(/

#'

[Figure: processors P ... P, each with its own private memory region]


Global Sum in Shared Memory

Thread 1 [s = 0 initially]
    local_s1 = 0
    for i = 0, n/2-1
        local_s1 = local_s1 + f(A[i])
    s = s + local_s1

Thread 2 [s = 0 initially]
    local_s2 = 0
    for i = n/2, n-1
        local_s2 = local_s2 + f(A[i])
    s = s + local_s2

What could go wrong?

RACE CONDITION!

Solution? Mutual exclusion with locks
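The race arises because `s = s + local_s` is a read followed by a write: both threads can read the same old value of s, and one update is then lost. A minimal Python sketch of the lock-based fix follows, using the standard `threading.Lock`. The function `f`, the name `shared_memory_sum`, and the way the index range is split are assumptions for illustration.

```python
import threading

def f(x):
    return x * x  # placeholder for the per-element function f

def shared_memory_sum(A, num_threads=2):
    s = [0]                  # shared variable, boxed in a list so threads can update it
    lock = threading.Lock()  # protects the update of s

    def work(lo, hi):
        # Each thread accumulates privately first, so the lock is taken only once.
        local_s = 0
        for i in range(lo, hi):
            local_s += f(A[i])
        with lock:           # mutual exclusion around the read-modify-write of s
            s[0] = s[0] + local_s

    n = len(A)
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    threads = [
        threading.Thread(target=work, args=(bounds[t], bounds[t + 1]))
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return s[0]

total = shared_memory_sum(list(range(10)))
print(total)
```

Keeping the per-element work in a private `local_s` and locking only the final update keeps the critical section short, so the threads rarely wait on each other.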


Model 3: Data Parallel

#.$

#& #& B$ "B22 "B22

A = array of all data
fA = f(A)
s = sum(fA)

[Figure: f maps array A to array fA; sum reduces fA to the scalar s]
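In the data-parallel style the program states what happens to the whole array and leaves the distribution of work to the runtime. A rough Python sketch, assuming a standard-library thread pool as the runtime: the function `f` and the name `data_parallel_sum` are hypothetical, and in CPython this shows the structure of the model, not a speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x  # placeholder for the per-element function f

def data_parallel_sum(A, workers=4):
    # fA = f(A): apply f to every element; the pool decides which worker runs which element.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fA = list(pool.map(f, A))
    # s = sum(fA): reduce the mapped array to a single value.
    return sum(fA)

s = data_parallel_sum(list(range(10)))
print(s)
```

There are no explicit sends, receives, or locks in the program text: the only operations are a map over A and a reduction of the result.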

SLIDE 11

Message Passing vs. Shared Memory

2 5 "5 #$9

#$ #$

#$92$2
