Coping with the Memory Hierarchy the Cache-Oblivious Way
Rolf Fagerberg University of Aarhus
Imada, SDU, February 18, 2004
Overview
The memory hierarchy
The I/O-model
The cache-oblivious model
Examples of cache-oblivious
Fagerberg: The Cache-Oblivious Way
[Figure: the memory hierarchy: CPU, Cache2, Cache3, RAM, Disk, Tertiary Storage]
            Access time          Volume
Registers   1 cycle              1 Kb
Cache       10 cycles            512 Kb
RAM         100 cycles           512 Mb
Disk        20,000,000 cycles    80 Gb

Gap increases over time.
Real problems of Gigabyte, Terabyte, and even Petabyte size:
Databases (finance, phone companies, banks, weather, geology, geography, astron- ... graphics.
[Figure: CPU and RAM]
Model two layers
[Figure: CPU, Memory, External Memory, I/O]
Aggarwal and Vitter 1988
$\frac{N}{B}\log_2 N$ I/Os
$\frac{N}{B}\log_{M/B}\frac{N}{M}$ I/Os
$\frac{N}{B}\log_{M/B}\frac{N}{M}$
$\frac{M}{B}$-way merge-sort.
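A small worked comparison of the two I/O bounds above may help. This is only a sketch: constant factors are ignored, the function names `ios_log2` and `ios_multiway` and the example sizes are hypothetical, and N, M and B are all counted in elements.

```python
import math

def ios_log2(n, b):
    """(N/B) * log2(N): the first bound, with constant factors ignored."""
    return (n / b) * math.log2(n)

def ios_multiway(n, m, b):
    """(N/B) * log_{M/B}(N/M): the second bound, with constant factors ignored."""
    return (n / b) * (math.log2(n / m) / math.log2(m / b))

# Hypothetical sizes, counted in elements: N = 2^30, M = 2^20, B = 2^10.
N, M, B = 2**30, 2**20, 2**10
print(ios_log2(N, B))         # 30 * 2^20 = 31457280.0
print(ios_multiway(N, M, B))  # 1 * 2^20  = 1048576.0
```

For these sizes the second bound is a factor 30 below the first, because the base of the logarithm grows from 2 to M/B.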
[Figure: the RAM model (CPU, RAM), the I/O model (CPU, cache, memory, I/O, parameters B and M) and multi-level models (CPU, L1 cache, L2 cache, RAM, disk, with increasing access time), together with the new model: cache-obliviousness]
[Figure: CPU, cache, memory and I/O, with parameters B and M]
Frigo, Leiserson, Prokop, Ramachandran, FOCS’99
[Figure: multi-level memory hierarchy: CPU, L1 cache, L2 cache, RAM, disk, with increasing access time]
FOCS’99
FOCS’99, ICALP’02, ALENEX’04
Prokop 99, FOCS’00, WAE’01, SODA’02 × 2, ESA’02, FOCS’03
STOC’02, ISAAC’02
STOC’02, BRICS-04-2
2 × ICALP’02, SCG’03
ESA’02
STOC’03
[Figure: matrices X and Y with indices i and j]
[Figure: matrices X and Y with blocks labelled M]
[Figure: matrices X and Y, each split into four quadrants of side n/2]
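The quadrant figure can be read as a recursive divide-and-conquer over the two matrices. The slide text does not name the operation, so the sketch below assumes it is writing the transpose of X into Y; the function name `transpose_into`, the `base` cutoff, and the restriction to n being a power of two are all assumptions made for illustration.

```python
def transpose_into(x, y, row=0, col=0, size=None, base=2):
    """Write the transpose of the size-by-size block of x at (row, col) into y.

    Assumes x and y are n-by-n lists of lists with n a power of two.
    No block size B or memory size M appears anywhere in the code.
    """
    if size is None:
        size = len(x)
    if size <= base:
        # Small block: transpose it directly.
        for i in range(row, row + size):
            for j in range(col, col + size):
                y[j][i] = x[i][j]
        return
    half = size // 2
    # Recurse on the four quadrants of side size/2.
    for dr in (0, half):
        for dc in (0, half):
            transpose_into(x, y, row + dr, col + dc, half, base)

n = 8
x = [[i * n + j for j in range(n)] for i in range(n)]
y = [[0] * n for _ in range(n)]
transpose_into(x, y)
```

Python lists of lists do not expose the memory behaviour, so this only illustrates the shape of the recursion, not its running time.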
[Plot: time (seconds) versus log2 of array size (bytes); curves: plain, cache-aware (L1), cache-aware (L2), cache-oblivious]
366 MHz Pentium II, 128 MB RAM, 256 KB Cache, gcc -O3, Linux
[Plot: time (seconds) versus log2 of array size (bytes); curves: plain, cache-aware (L2), cache-aware (RAM), cache-oblivious]
366 MHz Pentium II, 128 MB RAM, 256 KB Cache, gcc -O3, Linux
Prokop 1999
[Figure: a tree of height h cut into a top tree A and bottom trees B1, ..., Bk, with heights ⌈h/2⌉ and ⌊h/2⌋, stored in memory in the order A, B1, ..., Bk]
Bender, Demaine, Farach-Colton, FOCS’00
Rahman, Cole, Raman, WAE’01
Bender, Duan, Iacono, Wu, SODA’02
Brodal, Fagerberg, Jacob, SODA’02
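The recursive layout in the figure (top tree A first, then the bottom trees B1, ..., Bk) can be sketched as follows. This is a minimal illustration for a complete binary tree whose nodes are numbered in BFS order with the root as 1; the name `veb_order` is hypothetical, and giving the top tree height ⌈h/2⌉ is an assumed convention.

```python
def veb_order(root, height):
    """Return the BFS indices of a complete binary tree in recursive layout order.

    `root` is the BFS index of the subtree root (whole tree: 1) and
    `height` is the number of levels in the subtree.
    """
    if height == 1:
        return [root]
    top_h = (height + 1) // 2      # ceil(h/2): assumed height of the top tree A
    bottom_h = height - top_h      # floor(h/2): height of each bottom tree
    order = veb_order(root, top_h)             # lay out A first
    first = root << top_h                      # leftmost bottom-tree root
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(r, bottom_h))   # then B1, ..., Bk, left to right
    return order

print(veb_order(1, 4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

In the printed order, each three-node subtree of height 2 occupies consecutive positions, which is the property the layout is after.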
[Figure: a tree holding 6, 4, 1, 3, 5, 8, 7, 11, 10, 13 and the new element 2; after the insertion the tree holds 6, 3, 1, 2, 4, 8, 7, 11, 10, 13, 5]
Itai, Konheim, Rodeh, 1981
Andersson, Lai, 1990
[Figure: a tree holding 6, 4, 1, 3, 5, 8, 7, 11, 10, 13]
[Figure: the tree holding 6, 4, 1, 3, 5, 8, 7, 11, 10, 13, with two parts labelled B]
Brodal, Fagerberg, Jacob, SODA’02
[Plot: average search time in seconds versus log2 of number of elements stored; labels: cache, veb:pointer, bfs:pointer, dfs:pointer, rin:pointer]
1 GHz Pentium III, 1GB RAM, 256 KB Cache, gcc -O3 / linux
[Plot: average search time in seconds versus log2 of number of elements stored; labels: cache, veb:implicit, bfs:implicit, high008:implicit, high016:implicit, inorder:implicit]
1 GHz Pentium III, 1GB RAM, 256 KB Cache, gcc -O3 / linux
[Plot: average search time in seconds versus log2 of number of elements stored; curves: veb:implicit, veb:pointer, bfs:implicit, bfs:pointer]
1 GHz Pentium III, 1GB RAM, 256 KB Cache, gcc -O3 / linux
[Plot: average search time in seconds versus log2 of number of elements stored; curves: bfs, veb, high1024]
1 GHz Pentium III, 32MB RAM, 256 KB Cache, gcc -O3 / linux
Divide input in $N^{1/3}$ segments of size $N^{2/3}$
Recursively Funnelsort each segment
Merge sorted segments by an $N^{1/3}$-merger
[Figure: merger sizes $k$ in the recursion: $N^{1/3}$, $N^{2/9}$, $N^{4/27}$, ..., 2]
Frigo, Leiserson, Prokop, Ramachandran, 1999
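The three steps above can be sketched as a recursion. In this illustration the standard-library `heapq.merge` stands in for the $N^{1/3}$-merger, so it shows only the recursive structure and not the merger of the slides; the name `funnelsort_outline` and the base-case cutoff of 8 are assumptions.

```python
import heapq

def funnelsort_outline(items):
    """Sort a list with the funnelsort recursion, using heapq.merge as the merger."""
    n = len(items)
    if n <= 8:                                # arbitrary small base case
        return sorted(items)
    seg_size = max(1, round(n ** (2 / 3)))    # segments of size about N^(2/3)
    # About N^(1/3) segments, each sorted recursively.
    segments = [funnelsort_outline(items[i:i + seg_size])
                for i in range(0, n, seg_size)]
    # Stand-in for the N^(1/3)-merger: a heap-based multiway merge.
    return list(heapq.merge(*segments))

data = [(i * 7919) % 1000 for i in range(1000)]
assert funnelsort_outline(data) == sorted(data)
```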
[Figure: a k-merger built from a top merger Mtop, buffers B1, ..., B√k, and bottom mergers M1, ..., M√k]
Brodal, Fagerberg 2002
Fill(v):
  while out-buffer not full
    if left in-buffer empty
      Fill(left child)
    if right in-buffer empty
      Fill(right child)
    perform one merge step
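A runnable sketch of the Fill procedure on a binary tree of mergers follows. It is a simplification under stated assumptions: every buffer has the same `capacity` (real funnels size buffers by level), the tree is a plain balanced binary tree, and the names `Leaf`, `Merger`, `build` and `lazy_merge` are hypothetical. The `exhausted` flag is added so that Fill stops once both inputs have run dry.

```python
from collections import deque

class Leaf:
    """A sorted input run; its buffer is the run itself."""
    def __init__(self, run):
        self.buf = deque(run)
        self.exhausted = True      # nothing more can ever arrive

    def fill(self):
        pass

class Merger:
    """A binary merger node with an output buffer of fixed capacity."""
    def __init__(self, left, right, capacity):
        self.left, self.right = left, right
        self.buf = deque()
        self.capacity = capacity
        self.exhausted = False

    def fill(self):
        while len(self.buf) < self.capacity:       # out-buffer not full
            for child in (self.left, self.right):
                if not child.buf and not child.exhausted:
                    child.fill()                   # refill an empty in-buffer
            l, r = self.left.buf, self.right.buf
            if not l and not r:                    # both inputs ran dry
                self.exhausted = True
                return
            # One merge step: move the smaller head to the out-buffer.
            src = l if l and (not r or l[0] <= r[0]) else r
            self.buf.append(src.popleft())

def build(runs, capacity):
    """Build a balanced binary merger tree over a non-empty list of runs."""
    if len(runs) == 1:
        return Leaf(runs[0])
    mid = len(runs) // 2
    return Merger(build(runs[:mid], capacity), build(runs[mid:], capacity), capacity)

def lazy_merge(runs, capacity=4):
    """Merge sorted runs by repeatedly filling and draining the root buffer."""
    if not runs:
        return []
    root = build(runs, capacity)
    out = []
    while True:
        root.fill()
        if not root.buf:
            return out
        out.extend(root.buf)
        root.buf.clear()

print(lazy_merge([[1, 4, 9], [2, 3, 10], [0, 5], [6, 7, 8]]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```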
Brodal, Fagerberg, Vinther, ALENEX’04
[Plot: Uniform pairs, Pentium III. Walltime/(n*log n) versus log n; curves: Funnelsort2, Funnelsort4, Mix, msort-c, msort-m, Rmerge, GCC, TPIE]
[Plot: Uniform pairs, AMD Athlon. Walltime/(n*log n) versus log n; curves: Funnelsort2, Funnelsort4, Mix, msort-c, msort-m, Rmerge, GCC, TPIE]
[Plot: Uniform pairs, Pentium 4. Walltime/(n*log n) versus log n; curves: Funnelsort2, Funnelsort4, Mix, msort-c, msort-m, Rmerge, GCC, TPIE]
[Plot: Uniform pairs, Pentium III. Walltime/(n*log n) versus log n; curves: Funnelsort2, msort-c, msort-m, Rmerge, GCC, TPIE]
[Plot: Uniform pairs, Pentium 4. Walltime/(n*log n) versus log n; curves: Funnelsort2, msort-c, msort-m, Rmerge, GCC, TPIE]
Frigo, Leiserson, Prokop, Ramachandran, FOCS’99
Brodal, Fagerberg, STOC’03
Bender, Brodal, Fagerberg, Ge, He, Hu, Iacono, López-Ortiz, FOCS’03
N M I/Os
B M:
$\frac{N}{\epsilon B}\log_M\frac{N}{M}$
$\frac{N}{B}\log_2\frac{N}{M}$
[Figure: axis for B marked at 1, $M^{1/2}$, $M^{1-\epsilon}$ and $M$]
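[Figure: penalty, with the value $\frac{1}{\epsilon}$ marked, plotted against B; the B axis is marked at 1, $M^{1/2}$, $M^{1-\epsilon}$ and $M$]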
|X| comparisons.
M: B2:
One problem: Online choice