

slide-1
SLIDE 1

iThreads: A Threading Library for Parallel Incremental Computation

Paper Reading Group
Pramod Bhatotia, Pedro Fonseca, Umut A. Acar, Björn B. Brandenburg, Rodrigo Rodrigues
Presenter: Maksym Planeta
09.07.2015

slide-2
SLIDE 2

Table of Contents

- Motivation
- Details
- Evaluation
- Conclusion

slide-3
SLIDE 3
Goals

Make incremental computations easy to use:

- Convenient for the user
- Legacy support
- Language independent
- No programmer intervention
- Multithreaded environment
- Use existing OS facilities
- Generic program model
- Low overhead

slide-4
SLIDE 4

Workflow

1. Initial run
2. Build the Concurrent Dynamic Dependence Graph (CDDG)
3. Specify input changes
4. Incremental run uses change propagation
5. Update the CDDG

slide-5
SLIDE 5

Workflow

$ LD_PRELOAD=iThreads.so                  // preload iThreads
$ ./<program executable> <input-file>     // initial run
$ emacs <input-file>                      // input modified
$ echo "<off> <len>" >> changes.txt       // specify changes
$ ./<program executable> <input-file>     // incremental run

Figure 1. How to run an executable using iThreads
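
The `<off> <len>` entries from Figure 1 could also be produced by a small script rather than by hand. The sketch below assumes that `<off>` is a byte offset into the input file and `<len>` the length in bytes of the changed region, and it only handles in-place modifications where the old and new versions have the same length. The function names are hypothetical and not part of iThreads.

```python
def changed_ranges(old: bytes, new: bytes):
    """Return (offset, length) pairs covering every run of differing bytes.

    Assumes in-place edits only: both versions must have the same length.
    """
    if len(old) != len(new):
        raise ValueError("only same-length (in-place) modifications are handled")
    ranges = []
    start = None  # offset where the current run of differing bytes began
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i
        elif a == b and start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:  # the last run extends to the end of the data
        ranges.append((start, len(old) - start))
    return ranges


def format_changes(ranges):
    """Render ranges as '<off> <len>' lines, one per changed region."""
    return "".join(f"{off} {length}\n" for off, length in ranges)


if __name__ == "__main__":
    old = b"hello world, hello threads"
    new = b"hellO world, HELLO threads"
    # The resulting text could be appended to changes.txt.
    print(format_changes(changed_ranges(old, new)), end="")
```

Whether iThreads expects byte offsets or some other unit in `changes.txt` should be checked against the paper before relying on a script like this.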


slide-6
SLIDE 6

Table of Contents

- Motivation
- Details
- Evaluation
- Conclusion

slide-7
SLIDE 7

System model

- Memory model
  - Release consistency
- Synchronization model
  - pthreads API
- Deterministic behavior

slide-8
SLIDE 8

Thunk

- Unit of sequential execution
- Surrounded by synchronization operations
- State
- Read and write sets
- Causally ordered (vector clocks)
- Thunk recomputed ⇒ all subsequent thunks in the thread are recomputed
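
To make the definition concrete, the sketch below splits one thread's trace into thunks at synchronization operations and collects each thunk's read and write sets. The trace format (tuples such as `("read", "x")` or `("sync", "lock")`) and all names are assumptions made for illustration. A real system would not see such a trace; the slides list page read/write protection as the tracking mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Thunk:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def split_into_thunks(thread_id, trace):
    """Cut a per-thread trace into thunks at every synchronization operation."""
    thunks = []
    current = None
    for kind, arg in trace:
        if kind == "sync":
            # A synchronization operation closes the running thunk, if any.
            if current is not None:
                thunks.append(current)
                current = None
            continue
        if current is None:
            # Thunks are named T<thread>.a, T<thread>.b, ... in program order.
            current = Thunk(name=f"T{thread_id}.{chr(ord('a') + len(thunks))}")
        if kind == "read":
            current.reads.add(arg)
        elif kind == "write":
            current.writes.add(arg)
        else:
            raise ValueError(f"unknown operation: {kind}")
    if current is not None:  # code after the last synchronization operation
        thunks.append(current)
    return thunks


# Hypothetical trace of a thread that runs `x++` and then `y = 2*x + z`,
# each inside its own lock()/unlock() pair.
T2_TRACE = [
    ("sync", "lock"), ("read", "x"), ("write", "x"), ("sync", "unlock"),
    ("sync", "lock"), ("read", "x"), ("read", "z"), ("write", "y"),
    ("sync", "unlock"),
]

if __name__ == "__main__":
    for thunk in split_into_thunks(2, T2_TRACE):
        print(thunk.name, "read =", sorted(thunk.reads), "write =", sorted(thunk.writes))
```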

[State diagram. States: Pending, Enabled, Invalid, Resolved valid, Resolved invalid, grouped into Unresolved and Resolved. Transition labels: "Reused and applied memoized effects" and "Re-executed and modified dirty set". Transitions are numbered 1 to 5.]

Figure 4. State transition for thunks during incremental run

slide-9
SLIDE 9

Example

                                              Sub-computations
Case  Input      Thread schedule              Reused              Recomputed
A     x, y*, z   T1.a → T2.a → T2.b           T2.a                T1.a, T2.b
B     x, y, z    (T2.a → T2.b → T1.a)*        T2.a                T1.a, T2.b
C     x, y, z    T1.a → T2.a → T2.b           T1.a, T1.b, T2.a    none

Figure 3. For the incremental run, some cases with changed input or thread schedule (changes are marked with *)

Thread 1 (T1)                        Thread 2 (T2)

/* T1.a */
lock();
z = ++y;       // read={y}, write={y, z}
unlock();
          ↘
                                     /* T2.a */
                                     lock();
                                     x++;            // read={x}, write={x}
                                     unlock();
                                       ↓
                                     /* T2.b */
                                     lock();
                                     y = 2*x + z;    // read={x, z}, write={y}
                                     unlock();

Figure 2. An example of shared-memory multithreading
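
Case A of Figure 3 can be replayed with a small model of change propagation. The sketch below is a simplification and not the iThreads algorithm itself: it walks the thunks in the recorded schedule order, re-executes a thunk when its read set intersects the dirty set or when an earlier thunk of the same thread was already re-executed, and adds the write set of every re-executed thunk to the dirty set. It assumes an unchanged thread schedule and conservatively treats everything a re-executed thunk writes as changed. All names are illustrative.

```python
# (thread, thunk name, read set, write set) in recorded schedule order,
# taken from Figure 2.
SCHEDULE = [
    ("T1", "T1.a", {"y"}, {"y", "z"}),
    ("T2", "T2.a", {"x"}, {"x"}),
    ("T2", "T2.b", {"x", "z"}, {"y"}),
]


def propagate(schedule, changed_input):
    """Classify thunks as reused or recomputed for a set of changed inputs."""
    dirty = set(changed_input)
    invalid_threads = set()  # threads that already re-executed a thunk
    reused, recomputed = [], []
    for thread, name, reads, writes in schedule:
        if thread in invalid_threads or reads & dirty:
            recomputed.append(name)
            invalid_threads.add(thread)
            dirty |= writes  # conservatively assume all written data changed
        else:
            reused.append(name)
    return reused, recomputed


if __name__ == "__main__":
    reused, recomputed = propagate(SCHEDULE, {"y"})  # Case A: y is modified
    print("reused:", reused)
    print("recomputed:", recomputed)
```

With `y` marked as changed, the model reuses T2.a and recomputes T1.a and T2.b, which matches Case A. Case B (a changed schedule) is outside what this model covers.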


slide-10
SLIDE 10

Architecture

[Block diagram. Components shown: Application, iThreads library, Recorder / Replayer, Memory subsystem, OS support, CDDG, Memoizer, OS.]

Figure 5. iThreads implementation architecture. Shaded boxes represent the main components of the system.

slide-11
SLIDE 11

Implementation

- Dthreads
- Separate address spaces for threads
- Page read/write protection
- Byte-level delta

[Diagram: Thread-1 and Thread-2 each execute thunks in a private address space; at every synchronization point their writes are committed to the shared address space ("shared memory commit").]

Figure 6. Overview of the RC model implementation
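
The byte-level delta can be illustrated with a twin-page comparison. The sketch below assumes that a thread keeps an unmodified copy (a "twin") of each page it writes to and that only the bytes differing from the twin are committed to the shared copy, so that writes by different threads to disjoint bytes of the same page do not overwrite each other. The function names and the 8-byte page are illustrative only.

```python
def byte_delta(twin: bytes, page: bytes):
    """Return {offset: new_byte} for every byte that differs from the twin."""
    if len(twin) != len(page):
        raise ValueError("twin and page must have the same size")
    return {i: b for i, (a, b) in enumerate(zip(twin, page)) if a != b}


def commit(shared: bytearray, delta):
    """Apply a byte-level delta to the shared copy of the page, in place."""
    for offset, value in delta.items():
        shared[offset] = value


if __name__ == "__main__":
    shared = bytearray(b"........")   # shared page (8 bytes for brevity)
    twin = bytes(shared)              # snapshot taken before the thunks run

    page_t1 = bytearray(twin)         # thread 1 private copy
    page_t1[0:2] = b"AA"
    page_t2 = bytearray(twin)         # thread 2 private copy
    page_t2[6:8] = b"BB"

    # Each thread commits only the bytes it changed.
    commit(shared, byte_delta(twin, bytes(page_t1)))
    commit(shared, byte_delta(twin, bytes(page_t2)))
    print(shared.decode())
```

Committing whole pages instead would let the second commit undo the first thread's writes, which is the reason for comparing at byte granularity.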

slide-12
SLIDE 12

Table of Contents

- Motivation
- Details
- Evaluation
- Conclusion

slide-13
SLIDE 13

Metrics

- Time: runtime of the slowest thread
- Work: sum of the runtimes of all threads
- Benchmarks: PARSEC and Phoenix
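
Both metrics follow directly from per-thread runtimes. The sketch below assumes that a speedup is the baseline value divided by the incremental-run value of the same metric; the runtimes are made up for illustration and are not measurements from the paper.

```python
def time_metric(thread_runtimes):
    """Time: runtime of the slowest thread."""
    return max(thread_runtimes)


def work_metric(thread_runtimes):
    """Work: sum of the runtimes of all threads."""
    return sum(thread_runtimes)


def speedup(baseline, incremental):
    """Ratio of the baseline value to the incremental-run value of one metric."""
    return baseline / incremental


if __name__ == "__main__":
    baseline = [4.0, 5.0, 4.0, 5.0]      # hypothetical pthreads runtimes (s)
    incremental = [0.5, 2.5, 0.5, 0.5]   # hypothetical incremental-run runtimes (s)
    print("time speedup:", speedup(time_metric(baseline), time_metric(incremental)))
    print("work speedup:", speedup(work_metric(baseline), work_metric(incremental)))
```

In this made-up case one thread still has to recompute most of its share, so the time speedup (2.0) stays well below the work speedup (4.5).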

slide-14
SLIDE 14

[Two log-scale charts, work speedup and time speedup, per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads. One value in the time-speedup chart is clipped at <0.1.]

Figure 7. Performance gains of iThreads with respect to pthreads for the incremental run

slide-15
SLIDE 15

Single modified page

[Two log-scale charts, work speedup and time speedup, per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads.]

Figure 8. Performance gains of iThreads with respect to Dthreads for the incremental run

slide-16
SLIDE 16

Single modified page, different input sizes

[Two charts for Histogram, Linear-reg., and String-match, each with inputs S, M, and L: work speedup (log scale, 1 to 100) and time speedup (0.5 to 4.5), both plotted together with the normalized input size (1 to 13).]

Figure 9. Scalability with data (work and time speedups)

slide-17
SLIDE 17

Single modified page, different work amount

[Chart: normalized total work (2 to 16) against normalized computation size (1X, 2X, 4X, 8X, 16X) for Blackscholes and Swaptions, each under pthreads and iThreads.]

Figure 10. Scalability with work

slide-18
SLIDE 18

Several modified pages

[Two log-scale charts, work speedup and time speedup, per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 2, 4, 8, 16, 32, and 64 dirty pages. Some values are clipped at <0.01 (work) and <0.1 (time).]

Figure 11. Scalability with input change compared to pthreads for 64 threads

slide-19
SLIDE 19

Overhead of iThreads system data

Application    Input size   Memoized state        CDDG
               (pages)      pages (% of input)    pages (% of input)
Histogram      230400       347 (0.15%)           57 (0.02%)
Linear-reg.    132436       192 (0.14%)           33 (0.02%)
Kmeans         586          1145 (195.39%)        27 (4.61%)
Matrix-mul.    41609        4162 (10.00%)         64 (0.15%)
Swaptions      143          1473 (1030.07%)       1 (0.70%)
Blackscholes   155          201 (129.68%)         1 (0.65%)
String match   132436       128 (0.10%)           33 (0.02%)
PCA            140625       3777 (2.69%)          43 (0.03%)
Canneal        9            15381 (170900.00%)    4 (44.44%)
Word count     12811        10191 (79.55%)        24 (0.19%)
Rev-index      359          260679 (72612.53%)    64 (17.83%)

Table 1. Space overheads in pages and input percentage
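
The percentages in Table 1 are the overhead in pages divided by the input size in pages. The short check below recomputes a few of them from the page counts in the table; the helper name is illustrative.

```python
def percent_of_input(pages, input_pages):
    """Space overhead as a percentage of the input size (both in pages)."""
    return round(100.0 * pages / input_pages, 2)


# application: (input size, memoized state, CDDG), all in pages, from Table 1
ROWS = {
    "Histogram": (230400, 347, 57),
    "Kmeans": (586, 1145, 27),
    "Canneal": (9, 15381, 4),
}

if __name__ == "__main__":
    for app, (input_pages, memoized, cddg) in ROWS.items():
        print(app,
              percent_of_input(memoized, input_pages),
              percent_of_input(cddg, input_pages))
```

The very large percentages (Canneal, Rev-index) come from benchmarks whose input occupies only a few pages, so even a moderate amount of memoized state dwarfs it.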


slide-20
SLIDE 20

Initial run overhead

[Two log-scale charts (0.1 to 1000), work overhead and time overhead, per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads.]

Figure 12. Performance overheads of iThreads with respect to pthreads for the initial run

slide-21
SLIDE 21

Initial run overhead

[Two charts (scale 1 to 1.3), work overhead and time overhead, per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads. Histogram exceeds the scale: 3.58X maximum work overhead and 3.13X maximum time overhead.]

Figure 13. Performance overheads of iThreads with respect to Dthreads for the initial run

slide-22
SLIDE 22

Case-study applications

[Chart: speedup (log scale, 1 to 100) against number of threads (12, 24, 48, 64). Series: work and time speedups for Parallel Gzip and for Monte-carlo.]

Figure 15. Work & time speedups for case-studies

slide-23
SLIDE 23

Table of Contents

- Motivation
- Details
- Evaluation
- Conclusion

slide-24
SLIDE 24

Limitations

- No support for ad-hoc synchronization
  - No C++ atomics
- No support for small localized insertions
- Assumes a constant number of threads
- May have significant overhead
- Narrow application area

slide-25
SLIDE 25

Outcome

- Nice idea
- Practical
- Transparent
- Efficient
- Works for some applications
- May significantly decrease required work

slide-26
SLIDE 26

Discussion

- Units for scales are not specified: sometimes a percentage, sometimes a factor
- Interactive applications
- A vector clock for each thunk: is that not too much?
- I/O memory: can anything be done about it? For instance, a frame buffer.
- Can it be combined with dynamic algorithms?

slide-27
SLIDE 27

Explanation of Dthreads high overhead

[Chart: per-benchmark percentage breakdown (log scale, 0.01 to 100) of the work overhead into read faults and memoization, shown together with the work overhead itself (scale 1 to 1.3; Histogram = 3.58X). Benchmarks: Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index.]

Figure 14. Work overheads breakdown w.r.t. Dthreads

slide-28
SLIDE 28

Release consistency

- Objects are acquired and released
- Critical section between acquire and release
- Guaranteed correctness and liveness for data-race-free programs
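
A toy model may help to see what release consistency permits. In the sketch below, which is an illustration and not how any real memory system is implemented, a thread's writes stay in a private buffer until it releases, and another thread observes them only after its next acquire. A data-race-free program accesses shared data only inside such critical sections, so it cannot observe the delay. The class and method names are hypothetical, and mutual exclusion itself is not modelled.

```python
class RCMemory:
    """Toy release-consistent memory with per-thread private views."""

    def __init__(self):
        self.shared = {}    # globally committed values
        self.private = {}   # thread -> local view of memory
        self.pending = {}   # thread -> writes not yet released

    def acquire(self, thread):
        # Acquire: refresh the local view, keeping our own unreleased writes.
        view = dict(self.shared)
        view.update(self.pending.setdefault(thread, {}))
        self.private[thread] = view

    def write(self, thread, var, value):
        # Writes go to the local view and are buffered until release.
        self.private.setdefault(thread, {})[var] = value
        self.pending.setdefault(thread, {})[var] = value

    def read(self, thread, var):
        # Reads only ever see the thread's local view.
        return self.private.get(thread, {}).get(var)

    def release(self, thread):
        # Release: publish buffered writes to shared memory.
        self.shared.update(self.pending.get(thread, {}))
        self.pending[thread] = {}


if __name__ == "__main__":
    mem = RCMemory()
    mem.acquire("T1")
    mem.write("T1", "y", 1)
    mem.acquire("T2")
    print(mem.read("T2", "y"))   # T1 has not released yet
    mem.release("T1")
    mem.acquire("T2")
    print(mem.read("T2", "y"))   # visible after release followed by acquire
```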

slide-29
SLIDE 29

Vector clocks

- Used for invalidation propagation
- Maintained for:
  - Objects
  - Threads
  - Thunks
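
For reference, the standard vector-clock operations look as follows. This is the textbook construction, shown as a sketch; how iThreads stores and updates the clocks of objects, threads, and thunks is not covered in the slides. Clocks are modelled here as dicts from thread id to counter, and the usage at the bottom is a hypothetical lock hand-over between two threads.

```python
def tick(clock, thread):
    """Return a copy of the clock with `thread`'s own component advanced."""
    new = dict(clock)
    new[thread] = new.get(thread, 0) + 1
    return new


def merge(a, b):
    """Component-wise maximum, used when one clock learns about another."""
    return {t: max(a.get(t, 0), b.get(t, 0)) for t in a.keys() | b.keys()}


def happens_before(a, b):
    """True if clock a is causally before clock b (a <= b and a != b)."""
    threads = a.keys() | b.keys()
    return (all(a.get(t, 0) <= b.get(t, 0) for t in threads)
            and any(a.get(t, 0) < b.get(t, 0) for t in threads))


def concurrent(a, b):
    """True if neither clock is causally before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


if __name__ == "__main__":
    t1_a = tick({}, "T1")                 # clock of a thunk in thread T1
    lock_clock = merge({}, t1_a)          # T1 releases a lock object
    t2_a = tick(merge({}, lock_clock), "T2")  # T2 acquires it, starts a thunk
    print(happens_before(t1_a, t2_a))
    print(concurrent(tick({}, "T1"), tick({}, "T2")))
```

In this usage the first thunk is ordered before the second because the lock carried T1's clock to T2, while two thunks that never synchronized remain concurrent.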