Table of Contents
- Motivation
- Details
- Evaluation
- Conclusion
Goals
Make incremental computations easy to use:
- Convenient for user
- Legacy support
- Language independent
- No programmer intervention
- Multithreaded environment
- Use existing OS facilities
- Generic program model
- Low overhead
Workflow
1. Initial run
2. Build Concurrent Dynamic Dependence Graph (CDDG)
3. Specify input changes
4. Incremental run uses change propagation
5. Update CDDG
    $ LD_PRELOAD=iThreads.so                  // preload iThreads
    $ ./<program executable> <input-file>     // initial run
    $ emacs <input-file>                      // input modified
    $ echo "<off> <len>" >> changes.txt       // specify changes
    $ ./<program executable> <input-file>     // incremental run
Figure 1. How to run an executable using iThreads
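Each line appended to changes.txt names a modified byte range as an offset and a length. As a rough illustration of how such ranges could be turned into the set of input pages they touch, here is a minimal sketch. The 4096-byte page size, the function names `parse_changes` and `dirty_pages`, and the parsing details are assumptions made for this example and are not taken from iThreads.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not specified on the slides)


def parse_changes(text):
    """Parse lines of the form '<off> <len>' into (offset, length) pairs."""
    changes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        off, length = line.split()
        changes.append((int(off), int(length)))
    return changes


def dirty_pages(changes, page_size=PAGE_SIZE):
    """Return the sorted page indices touched by the given byte ranges."""
    pages = set()
    for off, length in changes:
        if length <= 0:
            continue  # an empty range touches no page
        first = off // page_size
        last = (off + length - 1) // page_size
        pages.update(range(first, last + 1))
    return sorted(pages)
```

With these assumptions, a 10-byte change at offset 4090 straddles a page boundary, so `dirty_pages([(4090, 10)])` reports pages 0 and 1.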
System model
- Memory model
- Release consistency
- Synchronization model
- pthreads API
- Deterministic behavior
Thunk
- Unit of sequential execution
- Surrounded by synchronization operations
- State
- Read and write sets
- Causally ordered (vector clocks)
- Thunk recomputed ⇒ all subsequent thunks in the same thread are recomputed
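The thunk properties listed above can be pictured as a small record. The sketch below is illustrative only: the field names, the `Thunk` class, and `invalidate_from` are hypothetical, and it reads the last bullet as applying to the thunks that follow the recomputed one in the same thread, which is an interpretation rather than something the slide spells out.

```python
from dataclasses import dataclass, field


@dataclass
class Thunk:
    thread: str                                     # owning thread, e.g. "T1"
    index: int                                      # position within that thread
    read_set: set = field(default_factory=set)      # locations read
    write_set: set = field(default_factory=set)     # locations written
    clock: dict = field(default_factory=dict)       # vector clock: thread -> counter
    valid: bool = True                              # False once it must be recomputed


def invalidate_from(thunks, thread, index):
    """Mark thunk (thread, index) and every later thunk of that thread invalid.

    Returns the list of thunks that are invalid afterwards.
    """
    for t in thunks:
        if t.thread == thread and t.index >= index:
            t.valid = False
    return [t for t in thunks if not t.valid]
```

Thunks of other threads are left untouched here; whether they must be recomputed depends on their read sets, which is a separate check.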
[State diagram. States: Pending, Enabled, Invalid, Resolved-invalid, Resolved-valid, grouped into Unresolved and Resolved. Edge labels: "Reused and applied memoized effects" and "Re-executed and modified dirty set". Transitions are numbered 1 to 5.]
Figure 4. State transition for thunks during incremental run
Example

                                                  Sub-computations
    Case  Input     Thread schedule          Reused              Recomputed
    A     x, y*, z  T1.a → T2.a → T2.b       T2.a                T1.a, T2.b
    B     x, y, z   (T2.a → T2.b → T1.a)*    T2.a                T1.a, T2.b
    C     x, y, z   T1.a → T2.a → T2.b       T1.a, T2.a, T2.b    (none)
Figure 3. For the incremental run, some cases with changed input or thread schedule (changes are marked with *)
    Thread 1 (T1)                      Thread 2 (T2)

    /* T1.a */
    lock();
    z = ++y;        read={y}
    unlock();       write={y, z}
            ↘
                                       /* T2.a */
                                       lock();
                                       x++;            read={x}
                                       unlock();       write={x}
                                          ↓
                                       /* T2.b */
                                       lock();
                                       y = 2*x + z;    read={x, z}
                                       unlock();       write={y}
Figure 2. An example of shared-memory multithreading
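To make the link between the read and write sets of Figure 2 and the reuse decisions of Figure 3 concrete, the following sketch walks a fixed schedule and keeps a dirty set of changed locations. It is a simplified model written for this example, not the iThreads algorithm: the function name `propagate` and the tuple layout are hypothetical, and the model handles only input changes under an unchanged schedule (cases A and C), not a changed schedule (case B).

```python
def propagate(schedule, changed_inputs):
    """Classify each thunk in `schedule` as reused or recomputed.

    schedule: list of (name, read_set, write_set) in execution order,
              with names of the form "<thread>.<label>".
    changed_inputs: set of locations modified before the incremental run.
    """
    dirty = set(changed_inputs)
    invalid_threads = set()        # threads that already recomputed a thunk
    reused, recomputed = [], []
    for name, reads, writes in schedule:
        thread = name.split(".")[0]
        if thread in invalid_threads or reads & dirty:
            recomputed.append(name)
            dirty |= writes        # anything it writes may now differ
            invalid_threads.add(thread)
        else:
            reused.append(name)    # memoized effects would be applied here
    return reused, recomputed


# Read and write sets as given in Figure 2.
FIGURE_2 = [
    ("T1.a", {"y"}, {"y", "z"}),
    ("T2.a", {"x"}, {"x"}),
    ("T2.b", {"x", "z"}, {"y"}),
]
```

Calling `propagate(FIGURE_2, {"y"})` models case A: T1.a reads the changed `y` and is recomputed, its write to `z` then invalidates T2.b, and T2.a is reused. With an empty change set (case C) every thunk is reused.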
Architecture
[Architecture diagram. Labeled boxes: Application, iThreads library, OS, Recorder / Replayer, Memory subsystem, OS support, Memoizer, CDDG.]
Figure 5. iThreads implementation architecture. Shaded boxes represent the main components of the system.
Implementation
- Dthreads
- Separate address spaces for threads
- Page read/write protection
- Byte-level delta
[Diagram. A shared address space sits between the Thread-1 and Thread-2 private address spaces. In each thread, thunk execution (with writes) alternates with sync points, and each sync point is followed by a shared memory commit.]
Figure 6. Overview of the RC model implementation
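The "byte-level delta" bullet can be illustrated with a small sketch: a thread compares its private copy of a page against a pristine snapshot and commits only the bytes that differ, so two threads that touched different bytes of the same page do not overwrite each other. This is a generic illustration of the idea under that assumption. The function names are hypothetical and the code does not reproduce the Dthreads or iThreads implementation.

```python
def byte_delta(before, after):
    """Return [(offset, new_byte)] for every byte that differs between two snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return [(i, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


def apply_delta(page, delta):
    """Apply a delta to a shared page (a bytearray) in place and return it."""
    for offset, value in delta:
        page[offset] = value
    return page
```

For example, if one thread changes byte 1 and another changes byte 5 of the same 8-byte page, applying both deltas to the shared copy preserves both writes.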
Metrics
- Time: runtime of the slowest thread
- Work: sum of the total runtime of all threads
- Benchmarks: PARSEC and Phoenix
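The two metrics differ only in how per-thread runtimes are aggregated. A minimal sketch, with made-up runtimes and hypothetical function names, shows how a work speedup and a time speedup could be computed from them:

```python
def work_metric(thread_runtimes):
    """Work: sum of the runtimes of all threads."""
    return sum(thread_runtimes)


def time_metric(thread_runtimes):
    """Time: runtime of the slowest thread."""
    return max(thread_runtimes)


def speedup(baseline_runtimes, incremental_runtimes, metric):
    """Ratio of the baseline metric to the incremental-run metric."""
    return metric(baseline_runtimes) / metric(incremental_runtimes)
```

With illustrative runtimes of `[10, 8, 12, 10]` for the baseline and `[1, 2, 6, 1]` for the incremental run, the work speedup is 40 / 10 = 4.0 while the time speedup is only 12 / 6 = 2.0, because a single slow thread bounds the time metric.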
[Two bar charts, log scale: work speedup and time speedup per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads. One time-speedup bar falls below the axis minimum (<0.1).]
Figure 7. Performance gains of iThreads with respect to pthreads for the incremental run
Single modified page
[Two bar charts, log scale: work speedup and time speedup per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads.]
Figure 8. Performance gains of iThreads with respect to Dthreads for the incremental run
Single modified page, different input sizes
[Two charts for Histogram, Linear-reg., and String-match at small (S), medium (M), and large (L) inputs: work speedup (log scale, 1 to 100) and time speedup (0.5 to 4.5), each shown together with the normalized input size (1 to 13).]
Figure 9. Scalability with data (work and time speedups)
Single modified page, different work amount
[Line chart: normalized total work (2 to 16) against normalized computation size (1X, 2X, 4X, 8X, 16X) for Blackscholes and Swaptions, each under pthreads and iThreads.]
Figure 10. Scalability with work
Several modified pages
[Two bar charts, log scale: work speedup and time speedup per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 2, 4, 8, 16, 32, and 64 dirty pages. Some bars fall below the axis minimum (<0.01 for work, <0.1 for time).]
Figure 11. Scalability with input change compared to pthreads for 64 threads
Overhead of iThreads system data

    Application    Input size   Memoized state          CDDG
                   (pages)      (pages, % of input)     (pages, % of input)
    Histogram      230400       347 (0.15%)             57 (0.02%)
    Linear-reg.    132436       192 (0.14%)             33 (0.02%)
    Kmeans         586          1145 (195.39%)          27 (4.61%)
    Matrix-mul.    41609        4162 (10.00%)           64 (0.15%)
    Swaptions      143          1473 (1030.07%)         1 (0.70%)
    Blackscholes   155          201 (129.68%)           1 (0.65%)
    String match   132436       128 (0.10%)             33 (0.02%)
    PCA            140625       3777 (2.69%)            43 (0.03%)
    Canneal        9            15381 (170900.00%)      4 (44.44%)
    Word count     12811        10191 (79.55%)          24 (0.19%)
    Rev-index      359          260679 (72612.53%)      64 (17.83%)
Table 1. Space overheads in pages and input percentage
Initial run overhead
[Two bar charts, log scale (0.1 to 1000): work overhead and time overhead per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads.]
Figure 12. Performance overheads of iThreads with respect to pthreads for the initial run
Initial run overhead
[Two bar charts, linear scale (1 to 1.3): work overhead and time overhead per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index) for 12, 24, 48, and 64 threads. Histogram exceeds the axis range: 3.58X maximum work overhead and 3.13X maximum time overhead.]
Figure 13. Performance overheads of iThreads with respect to Dthreads for the initial run
Case-study applications
[Line chart, log scale (1 to 100): work and time speedups against number of threads (12, 24, 48, 64) for Parallel Gzip and Monte-carlo.]
Figure 15. Work & time speedups for case-studies
Limitations
- No support for ad-hoc synchronization
- No C++ atomics
- No support for small localized insertions
- Assumes a constant number of threads
- May have significant overhead
- Narrow application area
Outcome
- Nice idea
- Practical
- Transparent
- Efficient
- Works for some applications
- Can significantly decrease the required work
Discussion
- Units for the scales are not specified: sometimes percentages, sometimes multiples
- Interactive applications
- A vector clock for each thunk: is that not too much?
- I/O memory: can something be done about it? For instance, a frame buffer.
- Can it be combined with dynamic algorithms?
Explanation of Dthreads high overhead
[Bar chart per benchmark (Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index): percentage breakdown (log scale, 0.01 to 100) into read fault and memoization, plotted together with the work overhead (1 to 1.3). Histogram exceeds the work overhead axis at 3.58X.]
Figure 14. Work overheads breakdown w.r.t Dthreads
Release consistency
- Objects are acquired and released
- Critical section between acquire and release
- Guaranteed correctness and liveness for data-race-free programs
Vector clocks
- Used for invalidation propagation
- Maintained for:
  - Objects
  - Threads
  - Thunks
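As a reminder of how vector clocks order events, here is a minimal sketch of the textbook operations: advancing a thread's own component, merging clocks when a synchronization object is released and later acquired, and testing the happens-before relation. The dictionary representation and the function names are assumptions for illustration; the slides do not show how iThreads stores or updates its clocks.

```python
def vc_tick(clock, thread):
    """Return a copy of `clock` with the thread's own component advanced by one."""
    out = dict(clock)
    out[thread] = out.get(thread, 0) + 1
    return out


def vc_merge(a, b):
    """Component-wise maximum of two vector clocks (dicts: thread -> counter)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def happens_before(a, b):
    """True if clock `a` is causally before clock `b`."""
    keys = a.keys() | b.keys()
    no_component_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_component_smaller = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_component_greater and some_component_smaller
```

In a lock hand-off such as the one between T1.a and T2.a in Figure 2, the lock object would carry T1's clock at release, T2 would merge it at acquire and then tick its own component, so the clock of T1.a compares as before the clock of T2.a. Two clocks where neither is before the other denote concurrent thunks.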