Cache Oblivious Sorting
Gerth Stølting Brodal
University of Aarhus
Algorithms and Data Structures, Bertinoro, Forlì, Italy, June 22-28, 2003
Gerth S. Brodal: Cache Oblivious Sorting
Frigo, Leiserson, Prokop, Ramachandran, FOCS’99
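[Figure: the cache-oblivious model. A CPU works on a cache of size M; data moves between cache and memory in blocks of size B, and each block transfer counts as one I/O.]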
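The cost measure in the figure can be made concrete with a small simulation. The following is a minimal sketch, not part of the talk: it counts block transfers for an address trace, using LRU replacement as a stand-in for the replacement policy of the model. The function name `count_ios` and the sample values of `N`, `M` and `B` are assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_ios(addresses, M, B):
    """Count block transfers for an address trace under LRU replacement.

    M is the cache size and B the block size, both measured in elements
    (hypothetical parameter names mirroring the labels in the figure).
    """
    capacity = M // B          # number of blocks that fit in the cache
    cache = OrderedDict()      # block id -> None, ordered by recency
    ios = 0
    for a in addresses:
        block = a // B
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recent
        else:
            ios += 1                       # miss: one I/O loads the block
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return ios


N, M, B = 1024, 64, 8   # illustrative sizes, not taken from the talk

# Sequential scan: every block is loaded exactly once.
scan = count_ios(range(N), M, B)

# Strided order: element r of every block, for r = 0, 1, ..., B - 1.
strided = count_ios([j * B + r for r in range(B) for j in range(N // B)], M, B)
```

With these sample sizes the scan costs N/B = 128 I/Os, while the strided order touches 128 distinct blocks per round with room for only 8 in the cache, so all 1024 accesses miss.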
[Figure: memory hierarchy CPU, L1, L2, RAM, disk; access time and space increase along the hierarchy.]
[Figure: binary merge sort. Sorted runs are merged pairwise in successive merging passes, from the input to the sorted output.]
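The merging passes in the figure can be sketched in a few lines. This is a minimal bottom-up binary merge sort under the assumption that every element starts as its own run; the function names and the sample input are illustrative and not taken from the slides. It also returns the number of passes, which is what the logarithmic factor in the I/O bound counts.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])   # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def binary_merge_sort(items):
    """Bottom-up merge sort; returns (sorted list, number of merging passes)."""
    runs = [[x] for x in items]     # every element starts as a sorted run
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge(runs[i], runs[i + 1]))
            else:
                merged.append(runs[i])   # odd run out is carried over
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


result, passes = binary_merge_sort([2, 8, 4, 3, 6, 4, 4, 8])
```

For the eight sample elements the runs shrink from 8 to 4 to 2 to 1, so three passes are needed.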
Binary merge sort: $O\left(\frac{N}{B}\log_2\frac{N}{M}\right)$ I/Os
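Merge sort I/O bounds by merge degree:
  Degree 2: $O\left(\frac{N}{B}\log_2\frac{N}{M}\right)$
  Degree $d$ (with $d \le \frac{M}{B} - 1$): $O\left(\frac{N}{B}\log_d\frac{N}{M}\right)$
  Degree $\Theta\left(\frac{M}{B}\right)$: $O\left(\frac{N}{B}\log_{M/B}\frac{N}{M}\right) = O(\mathrm{Sort}_{M,B}(N))$ (Aggarwal and Vitter 1988)
  Funnelsort: $O\left(\frac{1}{\varepsilon}\mathrm{Sort}_{M,B}(N)\right)$ (Frigo, Leiserson, Prokop and Ramachandran 1999; Brodal and Fagerberg 2002)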
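To get a feeling for how much the merge degree matters, the expression $\frac{N}{B}\log_d\frac{N}{M}$ can simply be evaluated. The sketch below ignores the constants hidden by the O-notation; the helper name `merge_sort_ios` and the values of `N`, `M` and `B` are assumptions picked so that the logarithms come out as whole numbers.

```python
import math


def merge_sort_ios(N, M, B, d):
    """Evaluate (N / B) * log_d(N / M), ignoring constant factors."""
    return (N / B) * (math.log2(N / M) / math.log2(d))


# Illustrative sizes (assumptions): 2^30 elements, a cache of 2^20
# elements and blocks of 2^10 elements.
N, M, B = 2**30, 2**20, 2**10

binary = merge_sort_ios(N, M, B, 2)        # degree 2
multi = merge_sort_ios(N, M, B, M // B)    # degree about M/B

ratio = binary / multi
```

With these numbers the binary merge sort estimate is ten times the estimate for degree M/B, since $\log_2(N/M) = 10$ while $\log_{M/B}(N/M) = 1$.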
k-merger (Frigo et al., FOCS’99)
[Figure: a k-merger M reads k sorted input streams and produces one sorted output stream.]
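Recursive def.: a k-merger consists of a top $\sqrt{k}$-merger $M_0$ whose inputs are buffers $B_1, \dots, B_{\sqrt{k}}$ of size $k^{3/2}$, each filled by a bottom $\sqrt{k}$-merger $M_1, \dots, M_{\sqrt{k}}$.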
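The recursive definition makes it easy to count how much buffer space a k-merger needs. The sketch below is an illustration under two assumptions that are not stated on the slide: k is of the form $2^{2^i}$ so that every square root is an integer, and a 2-merger has no internal buffers. The function name `buffer_space` is hypothetical.

```python
import math


def buffer_space(k):
    """Total size of all internal buffers of a k-merger.

    Assumes k = 2^(2^i) and that a 2-merger has no internal buffers.
    """
    if k <= 2:
        return 0
    r = math.isqrt(k)
    assert r * r == k, "k must be a perfect square at every level"
    middle = r * r**3                         # sqrt(k) buffers of size k^(3/2)
    return middle + (r + 1) * buffer_space(r) # plus sqrt(k) + 1 sub-mergers


sizes = {k: buffer_space(k) for k in (4, 16, 256)}
```

The middle buffers alone take $\sqrt{k} \cdot k^{3/2} = k^2$ elements, and the recursive terms add only a lower-order amount, so the total stays close to $k^2$ (for example 71248 against $256^2 = 65536$).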
[Figure: memory layout of the k-merger: $M_0, B_1, M_1, B_2, M_2, \dots, B_{\sqrt{k}}, M_{\sqrt{k}}$.]
Brodal and Fagerberg 2002
[Figure: k-merger with top merger $M_0$, buffers $B_1, \dots, B_{\sqrt{k}}$ and bottom mergers $M_1, \dots, M_{\sqrt{k}}$.]
Procedure Fill(v)
  while out-buffer not full
    if left in-buffer empty
      Fill(left child)
    if right in-buffer empty
      Fill(right child)
    perform one merge step
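The Fill procedure can be tried out on a small tree of binary mergers. The following is a minimal sketch rather than the authors' implementation: it gives every buffer the same hypothetical `capacity` instead of the recursively chosen sizes, and it adds an `exhausted` flag, which the pseudocode leaves implicit, so that Fill terminates when the input runs out. The class and function names are assumptions.

```python
from collections import deque


class Leaf:
    """An input stream; its buffer holds the remaining sorted input."""
    def __init__(self, items):
        self.buf = deque(items)
        self.exhausted = True    # nothing can ever be added to a leaf


class Merger:
    """A binary merger node with a bounded output buffer."""
    def __init__(self, left, right, capacity):
        self.left, self.right = left, right
        self.capacity = capacity
        self.buf = deque()
        self.exhausted = False


def fill(v):
    """Fill v's output buffer, refilling empty input buffers on demand."""
    while len(v.buf) < v.capacity:
        for child in (v.left, v.right):
            if not child.buf and not child.exhausted:
                fill(child)
        l, r = v.left.buf, v.right.buf
        if not l and not r:      # both inputs are drained for good
            v.exhausted = True
            return
        # One merge step: move the smaller head to the output buffer.
        src = l if l and (not r or l[0] <= r[0]) else r
        v.buf.append(src.popleft())


def build(streams, capacity):
    """Build a balanced tree of binary mergers over the input streams."""
    nodes = [Leaf(s) for s in streams]
    while len(nodes) > 1:
        nodes = [Merger(nodes[i], nodes[i + 1], capacity)
                 if i + 1 < len(nodes) else nodes[i]
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def drain(root):
    """Repeatedly fill the root and collect its output."""
    if isinstance(root, Leaf):
        return list(root.buf)
    out = []
    while not root.exhausted:
        fill(root)
        out.extend(root.buf)
        root.buf.clear()
    return out


merged = drain(build([[1, 4, 9], [2, 3, 10], [0, 5], [6, 7, 8]], capacity=2))
```

Because a child is only refilled when its buffer is empty, each call to `fill` works on one node for a whole buffer's worth of elements before control moves elsewhere in the tree.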
I/O bound: $O\left(\frac{k^3}{B}\log_M(k^3) + k\right)$ I/Os
Brodal and Fagerberg 2002; Frigo, Leiserson, Prokop and Ramachandran 1999
Divide the input into $N^{1/3}$ segments of size $N^{2/3}$
Recursively MergeSort each segment
Merge the sorted segments by an $N^{1/3}$-merger
Merger sizes $k$ over the recursion levels: $N^{1/3}, N^{2/9}, N^{4/27}, \dots, 2$
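The three steps translate directly into a recursive function. The sketch below only mirrors the control structure: Python's `heapq.merge` stands in for the $N^{1/3}$-merger, so it does not reproduce the cache behaviour of a real funnel. The function name `funnelsort_sketch` and the cutoff `base` below which the built-in sort is used are assumptions.

```python
import heapq


def funnelsort_sketch(items, base=8):
    """Sort by splitting into about N^(1/3) segments of size about N^(2/3).

    heapq.merge is a stand-in for the N^(1/3)-merger; `base` is a
    hypothetical cutoff below which the built-in sort is used.
    """
    n = len(items)
    if n <= base:
        return sorted(items)
    seg = max(1, round(n ** (2 / 3)))          # segment size about N^(2/3)
    segments = [funnelsort_sketch(items[i:i + seg], base)
                for i in range(0, n, seg)]     # about N^(1/3) segments
    return list(heapq.merge(*segments))        # multiway merge of the segments


data = list(range(100, 0, -1)) + [50, 50, 1]
sorted_data = funnelsort_sketch(data)
```

Each recursive call works on a segment of size $N^{2/3}$ and therefore uses a merger of size $(N^{2/3})^{1/3} = N^{2/9}$, which is where the sequence of merger sizes comes from.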
Brodal and Fagerberg 2003
|X| comparisons
M: B2:
One problem: online choice
Processor type            Pentium 4                        Pentium 3                        MIPS 10000
Workstation               Dell PC                          Delta PC                         SGI Octane
Operating system          GNU/Linux Kernel version 2.4.18  GNU/Linux Kernel version 2.4.18  IRIX version 6.5
Clock rate                2400 MHz                         800 MHz                          175 MHz
Address space             32 bit                           32 bit                           64 bit
Integer pipeline stages   20                               12                               6
L1 data cache size        8 KB                             16 KB                            32 KB
L1 line size              128 Bytes                        32 Bytes                         32 Bytes
L1 associativity          4 way                            4 way                            2 way
L2 cache size             512 KB                           256 KB                           1024 KB
L2 line size              128 Bytes                        32 Bytes                         32 Bytes
L2 associativity          8 way                            4 way                            2 way
TLB entries               128                              64                               64
TLB associativity         Full                             4 way                            64 way
TLB miss handler          Hardware                         Hardware                         Software
Main memory               512 MB                           256 MB                           128 MB
[Figure: Pentium 4, 512/512. Wall clock time per element (axis from 0.1µs to 100.0µs) against number of elements (1,000,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, ami_sort, msort-c and msort-m.]
Kristoffer Vinther 2003
[Figure: Pentium 4, 512/512. Page faults per block of elements (axis from 0.0 to 30.0) against number of elements (1,000,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c and msort-m.]
Kristoffer Vinther 2003
[Figure: MIPS 10000, 1024/128. L2 cache misses per line of elements (axis from 0.0 to 30.0) against number of elements (100,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c and msort-m.]
Kristoffer Vinther 2003
[Figure: MIPS 10000, 1024/128. TLB misses per block of elements (axis from 1.0 to 10.0) against number of elements (100,000 to 1,000,000,000) for ffunnelsort, funnelsort, lowscosa, stdsort, msort-c and msort-m.]
Kristoffer Vinther 2003