Physical‐Aware System‐Level Design for Tiled Hierarchical Chip Multiprocessors
Jordi Cortadella, Javier de San Pedro, Nikita Nikitin and Jordi Petit
Universitat Politècnica de Catalunya (Barcelona)
Project funded by Intel Corp.
ISPD 2013 Tiled CMPs 2
Data Mining
Bioinformatics
[Figure: tiled CMP. Labels: memory controllers (MC), cores (C1, C2) with L2 caches, bus, L3, network interface (NI), router (R).]
‐ 6x4 mesh, 24 clusters
‐ total 144 cores
‐ 6 cores/cluster
‐ 1 C1, 128K L1, 256K L2
‐ 5 C2, 64K L1, 96K L2
‐ 146 Mb total shared L3
Throughput = 85.71 IPC

‐ 5x5 mesh, 25 clusters
‐ total 100 cores
‐ 4 cores/cluster
‐ 3 C1, 128K L1, 1M L2
‐ 1 C2, 128K L1, 1M L2
‐ 130 Mb total shared L3
Throughput = 103.26 IPC
CMP configuration

Parameter            Value
Chip area            350 mm2
Mesh dimensions      2x2 to 16x16
(unlabeled)          200 cycles
Interconnect         Bus, uni‐ / bi‐ring
Link width           256 – 1024 bits
Workload MPI
Workload MLP

Core library
[Figure: IPC vs. area (mm2) for cores C1 (IO), C2 (OoO), C3 (OoO)]

Miss rates for SPEC CPU2006
[Figure: miss ratio vs. cache size (Mb)]

Cache models

Cache Size   Area (mm2)   Latency (cycles)
64Kb         0.063        2
128Kb        0.125        3
256Kb        0.25         4
…            …            …
8Mb          8.0          9
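The four cache-model rows quoted on this slide (64Kb, 128Kb, 256Kb, 8Mb) happen to fit a simple rule: about 1 mm2 per Mb of capacity, and one extra cycle of latency per doubling of size. The sketch below checks that fit. The function names are hypothetical, and the assumption that the elided rows follow the same pattern is not confirmed by the slides.

```python
import math

# Rows visible on the slide: size in Kb -> (area in mm2, latency in cycles).
VISIBLE_ROWS = {64: (0.063, 2), 128: (0.125, 3), 256: (0.25, 4), 8192: (8.0, 9)}


def cache_area_mm2(size_kb):
    """Assumed fit: roughly 1 mm2 per Mb (1024 Kb) of cache."""
    return size_kb / 1024.0


def cache_latency_cycles(size_kb):
    """Assumed fit: 2 cycles at 64 Kb, plus one cycle per doubling of size."""
    return 2 + int(round(math.log2(size_kb / 64.0)))


for size_kb, (area, latency) in VISIBLE_ROWS.items():
    fitted_area = cache_area_mm2(size_kb)
    fitted_latency = cache_latency_cycles(size_kb)
    print(size_kb, area, round(fitted_area, 3), latency, fitted_latency)
```

A closed-form rule like this is convenient inside a search loop, where cache sizes change at every step and a table lookup would need interpolation.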
Exploration runtime (sec)
Design space: 10^9 configurations

‐ Simulation (full system): 300 centuries
‐ Simulation (probabilistic): 300 years
‐ Analytical (exhaustive): 100 days
‐ Analytical (metaheuristic): 100 seconds
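To put these runtimes in perspective, the arithmetic below derives the per-configuration evaluation time each exhaustive approach implies over a design space of 10^9 configurations. It assumes that the four runtimes (300 centuries, 300 years, 100 days, 100 seconds) line up in order with the four methods and that a year has 365 days; both are assumptions, not statements from the slides.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
CONFIGS = 10**9  # design-space size quoted on the slide

# Total exploration time in seconds for the three exhaustive approaches.
totals = {
    "Simulation (full system)": 300 * 100 * SECONDS_PER_YEAR,  # 300 centuries
    "Simulation (probabilistic)": 300 * SECONDS_PER_YEAR,      # 300 years
    "Analytical (exhaustive)": 100 * SECONDS_PER_DAY,          # 100 days
}

# Implied cost of evaluating one configuration, in seconds.
per_config = {name: total / CONFIGS for name, total in totals.items()}

# The metaheuristic does not visit every configuration, so only its total
# runtime (100 s) is compared against the exhaustive analytical sweep.
metaheuristic_speedup = totals["Analytical (exhaustive)"] / 100

for name, seconds in per_config.items():
    print(f"{name}: {seconds:.5g} s per configuration")
print(f"Metaheuristic vs. exhaustive analytical: {metaheuristic_speedup:.0f}x")
```

Under these assumptions a full-system simulation costs on the order of a quarter of an hour per configuration, while an analytical evaluation costs under ten milliseconds.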
Simulation
Analytical Modeling
Models (performance/power)
‐ Cores
‐ On‐chip caches
‐ Off‐chip memories
‐ Interconnect fabrics
‐ Cache protocol
‐ Workloads

Architectural configuration
‐ Number of cores
‐ Cluster size
‐ L2/L3 size
‐ Intra‐cluster interconnect
‐ Inter‐cluster interconnect
‐ Memory controllers

Constraints
‐ Area
‐ Throughput
‐ Power

Physical info
‐ Cores
‐ Caches
‐ Interconnects
– Simulated Annealing (Kirkpatrick et al., 1983)
– Extremal Optimization (Boettcher et al., 1999)
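As an illustration of the first metaheuristic, here is a minimal simulated-annealing sketch over a toy design space (mesh width, mesh height, cores per cluster). Everything numeric in it is hypothetical: `toy_throughput` is a made-up stand-in for the analytical models, the per-cluster area formula is invented, and only the 350 mm2 area budget and the 2 to 16 mesh range are taken from the slides. It is not the authors' implementation.

```python
import math
import random


def toy_throughput(cfg):
    """Hypothetical model: return IPC, or None if the area budget is exceeded."""
    x, y, cores = cfg
    area = x * y * (2.0 + 1.5 * cores)  # made-up mm2 per cluster
    if area > 350.0:                    # chip-area constraint
        return None
    n = x * y * cores
    return n / (1.0 + 0.002 * n * math.sqrt(x * y))  # made-up contention term


def neighbor(cfg, rng):
    """Move one parameter by +/-1, clamped to its allowed range."""
    vals = list(cfg)
    i = rng.randrange(3)
    low = 1 if i == 2 else 2            # mesh sides 2..16, cores 1..16
    vals[i] = min(16, max(low, vals[i] + rng.choice((-1, 1))))
    return tuple(vals)


def anneal(start, steps=5000, t0=5.0, alpha=0.999, seed=0):
    """Maximize toy_throughput; `start` must satisfy the area constraint."""
    rng = random.Random(seed)
    cur, cur_val = start, toy_throughput(start)
    best, best_val = cur, cur_val
    temp = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        val = toy_throughput(cand)
        if val is not None:             # infeasible moves are rejected outright
            delta = val - cur_val
            # Always accept improvements; accept losses with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                cur, cur_val = cand, val
                if cur_val > best_val:
                    best, best_val = cur, cur_val
        temp *= alpha                   # geometric cooling schedule
    return best, best_val


best_cfg, best_ipc = anneal((2, 2, 1))
print(best_cfg, round(best_ipc, 2))
```

Tracking the best configuration separately from the current one means the result can never be worse than the starting point, even though the walk itself accepts downhill moves.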
[Figure: metaheuristic search loop. Labels: Models, Constraints, search direction, Best configuration.]
Increase_X
[Figure: cores (Core … Core), each with latency Li and traffic rate λi, attached to the memory subsystem. Labels: λ, L, Throughput.]

Queueing model: Umit Ogras et al., IEEE TCAD, Dec 2010

[Figure: L, average latency (cycles), vs. λ, average traffic rate (flits/cycle)]
‐ L(λ): characteristic of the IC
‐ λ(L): characteristic of the cores/workload
‐ Hop‐count latency

Nonlinear analytical models:
‐ Throughput model
‐ Traffic model
‐ Latency model
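The operating point of the system is where the two characteristics meet: the interconnect curve L(λ) and the core/workload curve λ(L). The sketch below finds that intersection by bisection. The functional forms are assumptions chosen only for illustration (an M/M/1-style queueing term on top of the hop-count latency, and a CPI model driven by MPI and MLP); they are not the equations from the slides or from the cited queueing model, and all constants are hypothetical.

```python
import math

HOP_LATENCY = 10.0  # zero-load latency in cycles (hypothetical)
CAPACITY = 0.2      # saturation traffic rate (hypothetical)
N_CORES = 16        # hypothetical core count
CPI0 = 1.0          # cycles per instruction with a perfect memory (hypothetical)
MPI = 0.01          # misses per instruction (hypothetical)
MLP = 1.5           # memory-level parallelism (hypothetical)


def latency(lam):
    """Interconnect characteristic L(lam): hop latency plus queueing delay."""
    if lam >= CAPACITY:
        return math.inf
    return HOP_LATENCY + (lam / CAPACITY) / (CAPACITY - lam)


def traffic(lat):
    """Core/workload characteristic lam(L): slower memory means fewer requests."""
    ipc_per_core = 1.0 / (CPI0 + MPI * lat / MLP)
    return N_CORES * MPI * ipc_per_core


def solve(iterations=200):
    """Bisect on lam until traffic(latency(lam)) == lam."""
    lo, hi = 0.0, CAPACITY
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if traffic(latency(mid)) > mid:
            lo = mid  # cores would inject more than mid: root lies above
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, latency(lam), lam / MPI  # traffic, latency, total IPC


lam, lat, ipc = solve()
print(f"lambda = {lam:.4f}, L = {lat:.2f} cycles, throughput = {ipc:.2f} IPC")
```

Bisection is used instead of naive fixed-point iteration because L(λ) rises steeply near saturation, where alternating between the two curves can oscillate; the composite function here is monotonic, so the bracket always contains exactly one root.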
[Figure: simulation and analytical modeling compared. Throughput (IPC) for configurations sorted in descending order of throughput; series: Modeling, Simulation.]
[Figure: Throughput (IPC) vs. Power (W). Labeled configurations:]
‐ 6x5, Bi‐Ring, 4C2
‐ 5x4, Bi‐Ring, 6C2
‐ 5x3, Bi‐Ring, 8C2
‐ 4x3, Bi‐Ring, 10C2
‐ 4x2, Bi‐Ring, 15C2
‐ 3x2, Bi‐Ring, 20C2
‐ 6x5, Bus, 4C2
‐ 7x4, Bus, 3C2+1C3
‐ 7x4, Bus, 4C2
‐ 6x5, Bus, 2C1+2C2
‐ 6x4, Bus, 3C1+2C2
(Search space: 1.5 · 10^9 configurations)
[Figure: diagram with N, S, W, E labels]
Link fields: Cntrl | Addr (64) | Cache line (512)
[Figure labels: m1, m2, m3, m4, m5, m6; Memory, Router, Core]
In systems where memory bandwidth is the bottleneck, the physical resources providing the bandwidth are critical.
[Figure: design flow. Architectural exploration (Models, Constraints, search direction, Best configuration) and Validation.]
[Figure: design flow extended with physical planning. Architectural exploration, Physical planning and Validation (Models, Constraints, search direction, Best configuration).]
Simulation
Analytical Modeling
[Figure labels: L3, R, Local IC, L2, C]
Physical Planning
[Figure: slicing tree with V and H cut nodes]
D.F. Wong and C.L. Liu, “A New Algorithm for Floorplan Design”, DAC, 1986, pages 101-107.
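A slicing floorplan can be written as a postfix (Polish) expression over block names and cut operators, and its bounding box follows from a single stack pass. The sketch below assumes that V means a vertical cut (blocks side by side) and H a horizontal cut (blocks stacked); the block names and dimensions are hypothetical, and the cited paper uses its own operator symbols.

```python
def slicing_bbox(postfix, sizes):
    """Return (width, height) of the slicing floorplan given in postfix form."""
    stack = []
    for tok in postfix:
        if tok in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "V":
                # Vertical cut: children sit side by side.
                stack.append((w1 + w2, max(h1, h2)))
            else:
                # Horizontal cut: children are stacked.
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[tok])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Hypothetical blocks: name -> (width, height).
sizes = {"a": (2, 3), "b": (2, 1), "c": (4, 2)}
width, height = slicing_bbox("a b V c H".split(), sizes)
print(width, height, width * height)  # prints "4 5 20"
```

Because the cost of a candidate floorplan is a linear pass over the expression, a search procedure can afford to evaluate many perturbed expressions per second.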
Optimal Orientation of Cells in Slicing Floorplan Designs
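The orientation problem asks which blocks to rotate by 90 degrees so that a given slicing tree occupies the least area. A minimal sketch of the idea: keep, for every subtree, the list of non-dominated (width, height) shapes, and combine child lists at each cut. This version combines all pairs and then prunes, which is simpler but slower than the linear merge of the classical algorithm. The tree encoding, block names and sizes are hypothetical.

```python
def prune(shapes):
    """Keep only shapes not dominated in both width and height."""
    kept = []
    for w, h in sorted(set(shapes)):  # width ascending, then height ascending
        if not kept or h < kept[-1][1]:
            kept.append((w, h))
    return kept


def shapes_of(node, sizes):
    """Non-dominated shapes of a slicing tree.

    A node is either a block name or a tuple (op, left, right), where op is
    "V" (side by side) or "H" (stacked).
    """
    if isinstance(node, str):
        w, h = sizes[node]
        return prune([(w, h), (h, w)])  # both orientations of the block
    op, left, right = node
    combos = []
    for w1, h1 in shapes_of(left, sizes):
        for w2, h2 in shapes_of(right, sizes):
            if op == "V":
                combos.append((w1 + w2, max(h1, h2)))
            else:
                combos.append((max(w1, w2), h1 + h2))
    return prune(combos)


# Hypothetical blocks: name -> (width, height).
sizes = {"a": (2, 3), "b": (2, 1), "c": (4, 2)}
tree = ("H", ("V", "a", "b"), "c")

candidates = shapes_of(tree, sizes)
best = min(candidates, key=lambda s: s[0] * s[1])
print(candidates, best)  # prints "[(3, 7), (4, 4)] (4, 4)"
```

With the orientations fixed as given, this tree has a 4 x 5 bounding box (area 20); allowing rotations brings it down to 4 x 4 (area 16).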
[Figures: plots with axis labels Wire length [10^6 μm] and Area [mm2]]
[Figures: tile diagrams with N, S, W, E labels and L2 blocks]
Regularity:
Exploration: