Physical‐Aware System‐Level Design for Tiled Hierarchical Chip Multiprocessors

SLIDE 1

Physical‐Aware System‐Level Design for Tiled Hierarchical Chip Multiprocessors

Jordi Cortadella, Javier de San Pedro, Nikita Nikitin and Jordi Petit Universitat Politècnica de Catalunya (Barcelona)

Project funded by Intel Corp.

SLIDE 2

Designing a Chip Multiprocessor

ISPD 2013 Tiled CMPs

[Figure: a CMP attached to off‐chip memory, with the labels DSP, Graphics, Data Mining and Bioinformatics]

  • How many cores?
  • How much L2/L3 on-chip cache?
  • Interconnect: mesh/ring/bus?
  • How many memory controllers?

SLIDE 3

What is architectural exploration?

[Figure: tiled CMP sketch with memory controllers (MC), C1 and C2 cores, L2 and L3 caches, buses, network interfaces (NI) and routers (R)]

Example configuration 1:

  • 6x4 mesh, 24 clusters
  • total 144 cores
  • 6 cores/cluster
  • 1 C1, 128K L1, 256K L2
  • 5 C2, 64K L1, 96K L2
  • 146 Mb total shared L3

Throughput = 85.71 IPC

Example configuration 2:

  • 5x5 mesh, 25 clusters
  • total 100 cores
  • 4 cores/cluster
  • 3 C1, 128K L1, 1M L2
  • 1 C2, 128K L1, 1M L2
  • 130 Mb total shared L3

Throughput = 103.26 IPC

SLIDE 4

Library of models

CMP configuration:

Parameter             Value
Chip area             350 mm2
Mesh dimensions       2x2 to 16x16
Mem. Cntrl. latency   200 cycles
Interconnect          Bus, uni‐ / bi‐ring
Link width            256 – 1024 bits
Workload MPI
Workload MLP

Core library:

[Plot: IPC versus area (mm2) for cores C1 (IO), C2 (OoO) and C3 (OoO)]

Miss rates for SPEC CPU2006:

[Plot: miss ratio versus cache size (Mb)]

Cache models:

Cache Size   Area (mm2)   Latency (cycles)
64Kb         0.063        2
128Kb        0.125        3
256Kb        0.25         4
…            …            …
8Mb          8.0          9
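
A small sketch of how the listed cache rows could be encoded and sanity-checked. The closed-form fits used here (about 1 mm2 per Mb, one extra cycle per doubling of size) are only an observation about the four rows shown on the slide, not a model stated in the talk, and the function names are hypothetical.

```python
import math

# Rows listed on the slide: size (Kb) -> (area in mm2, latency in cycles).
CACHE_ROWS = {
    64: (0.063, 2),
    128: (0.125, 3),
    256: (0.25, 4),
    8192: (8.0, 9),
}

def fitted_area_mm2(size_kb):
    """Assumed fit: area grows linearly, 1 mm2 per Mb."""
    return size_kb / 1024.0

def fitted_latency_cycles(size_kb):
    """Assumed fit: 2 cycles at 64Kb, plus one cycle per doubling."""
    return int(math.log2(size_kb / 64)) + 2

# Check the fits against the listed rows only (elided rows are not guessed).
for size_kb, (area, latency) in CACHE_ROWS.items():
    assert abs(fitted_area_mm2(size_kb) - area) < 1e-3
    assert fitted_latency_cycles(size_kb) == latency
```

A lookup table like this is what an exploration tool would query when it evaluates the area constraint of a candidate configuration.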

SLIDE 5

Physical planning for tiled CMPs

[Figure: a tile with its N, S, E and W sides labeled]

SLIDE 6

Outline

  • Architectural exploration

– The cost of exploration
– Exploring with metaheuristics
– Analytical models

  • Physical planning for tiled CMPs
  • Current work: regular floorplanning

SLIDE 7

Exploration engines

Design space: 10^9 configurations

[Bar chart: exploration runtime (sec) on a logarithmic axis from 1 to 1E+13]

Engine                        Exploration runtime
Simulation (full system)      300 centuries
Simulation (probabilistic)    300 years
Analytical (exhaustive)       100 days
Analytical (metaheuristic)    100 seconds
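
A back-of-the-envelope reading of these runtimes, assuming the four runtimes map to the four engines in the order listed and that the exhaustive engines visit all 10^9 configurations. The last step further assumes that the metaheuristic pays the same per-evaluation cost as the exhaustive analytical engine; the talk does not state this.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
DESIGN_SPACE = 10**9  # configurations, from the slide

# Total runtimes quoted on the slide, converted to seconds.
exhaustive_runtime_sec = {
    "simulation (full system)": 300 * 100 * SECONDS_PER_YEAR,  # 300 centuries
    "simulation (probabilistic)": 300 * SECONDS_PER_YEAR,      # 300 years
    "analytical (exhaustive)": 100 * SECONDS_PER_DAY,          # 100 days
}

# Implied average cost of evaluating one configuration.
per_config_sec = {
    engine: total / DESIGN_SPACE
    for engine, total in exhaustive_runtime_sec.items()
}

# Assumption: same per-evaluation cost as the exhaustive analytical engine.
METAHEURISTIC_BUDGET_SEC = 100
evaluations_in_budget = (
    METAHEURISTIC_BUDGET_SEC / per_config_sec["analytical (exhaustive)"]
)

for engine, seconds in per_config_sec.items():
    print(f"{engine}: {seconds:.4g} s per configuration")
print(f"evaluations in 100 s: {evaluations_in_budget:.0f}")
```

Under these assumptions the metaheuristic evaluates on the order of 10^4 of the 10^9 points, which is why the quality of the search direction matters.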

SLIDE 8

Scalable exploration

[Figure: analytical modeling narrows the architectural configurations down to promising configurations, which are then passed to simulation]

SLIDE 9

Automated exploration

Models (performance/power):

  • Cores
  • On‐chip caches
  • Off‐chip memories
  • Interconnect fabrics
  • Cache protocol
  • Workloads

Constraints:

  • Area
  • Throughput
  • Power

Physical info:

  • Cores
  • Caches
  • Interconnects

Exploration tool

Architectural configuration:

  • Number of cores
  • Cluster size
  • L2/L3 size
  • Intra‐cluster interconnect
  • Inter‐cluster interconnect
  • Memory controllers

SLIDE 10

Exploration engine: metaheuristics

  • Explore huge design spaces efficiently
  • Our proposal:

– Simulated Annealing (Kirkpatrick et al., 1983)
– Extremal Optimization (Boettcher et al., 1999)

[Figure: exploration tool. Models and constraints feed a loop of partial generation of configurations and analytical modeling, which sets the search direction; the best configuration is passed to simulation]
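
A minimal, generic sketch of simulated annealing, to make the loop of generation, evaluation and search direction concrete. It is not the authors' tool: the cooling schedule, step count and the toy cost function (which stands in for the analytical model) are all hypothetical choices.

```python
import math
import random

def simulated_annealing(initial, neighbors, cost,
                        steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Minimize cost() by random local moves, accepting some uphill moves."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= alpha  # geometric cooling
    return best, best_cost

# Toy stand-in for "configuration" and "analytical model".
def toy_cost(cfg):
    x, y = cfg
    return (x - 7) ** 2 + (y - 3) ** 2

def toy_neighbors(cfg):
    x, y = cfg
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

best, best_cost = simulated_annealing((0, 0), toy_neighbors, toy_cost)
```

In an exploration setting the cost would combine throughput, area and power from the models, and `neighbors` would apply architectural transformations.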

SLIDE 11

Generation of configurations

  • Generate neighbors by applying transformations

– Increase/Decrease
    • mesh dimensions
    • core count per cluster
    • L1, L2 size
– Change interconnect type (bus/uni‐ring/bi‐ring)
– Complex updates (increase mesh/decrease core count)

  • Example: Increase_X(mesh 4x4) => mesh 5x4
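
The transformations can be sketched as functions that map one configuration to a neighbor. Only `Increase_X` and the interconnect types come from the slides; the `Config` fields, the other function names and the bounds check (using the 2x2 to 16x16 mesh range quoted earlier) are assumptions for illustration. Y-direction and cache-size moves are omitted for brevity.

```python
from dataclasses import dataclass, replace

MESH_MIN, MESH_MAX = 2, 16  # mesh range quoted in the library of models
INTERCONNECTS = ("bus", "uni-ring", "bi-ring")

@dataclass(frozen=True)
class Config:
    mesh_x: int
    mesh_y: int
    cores_per_cluster: int
    interconnect: str

def increase_x(cfg):
    """Increase_X: add one mesh column, or None at the upper bound."""
    if cfg.mesh_x >= MESH_MAX:
        return None
    return replace(cfg, mesh_x=cfg.mesh_x + 1)

def decrease_x(cfg):
    """Remove one mesh column, or None at the lower bound."""
    if cfg.mesh_x <= MESH_MIN:
        return None
    return replace(cfg, mesh_x=cfg.mesh_x - 1)

def change_interconnect(cfg):
    """One variant per alternative interconnect type."""
    return [replace(cfg, interconnect=ic)
            for ic in INTERCONNECTS if ic != cfg.interconnect]

def grow_mesh_shrink_cluster(cfg):
    """Complex update: larger mesh with fewer cores per cluster."""
    grown = increase_x(cfg)
    if grown is None or cfg.cores_per_cluster <= 1:
        return None
    return replace(grown, cores_per_cluster=cfg.cores_per_cluster - 1)

def neighbors(cfg):
    """All legal neighbors reachable by one transformation."""
    simple = [increase_x(cfg), decrease_x(cfg), grow_mesh_shrink_cluster(cfg)]
    return [c for c in simple if c is not None] + change_interconnect(cfg)

start = Config(mesh_x=4, mesh_y=4, cores_per_cluster=6, interconnect="bus")
print(increase_x(start))  # mesh 4x4 becomes mesh 5x4
```

Returning `None` for out-of-range moves keeps every generated neighbor inside the design space, so the search never has to repair a configuration.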

SLIDE 12

Analytical performance model for CMPs

[Figure: each core i injects traffic at rate λi into the memory subsystem and observes latency Li; throughput follows from λ and L]

Queueing model: Umit Ogras et al., IEEE TCAD, Dec 2010

[Plot: L, average latency (cycles), versus λ, average traffic rate (flits/cycle), with the hop‐count latency marked and two curves:]

  • L(λ): characteristic of the IC
  • λ(L): characteristic of the cores/workload

Nonlinear analytical models:

  • Throughput model
  • Traffic model
  • Latency model
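
The operating point is where the two curves L(λ) and λ(L) intersect. The sketch below finds that intersection by bisection. Both curve shapes and all constants are invented for illustration; they are not the models of the talk or of Ogras et al.

```python
HOP_LATENCY = 10.0      # zero-load latency in cycles (hypothetical)
SATURATION_RATE = 0.25  # flits/cycle where latency diverges (hypothetical)

def latency(rate):
    """L(lambda): interconnect side, a simple queueing-style curve."""
    return HOP_LATENCY / (1.0 - rate / SATURATION_RATE)

def traffic(lat):
    """lambda(L): core/workload side, less traffic when memory is slower."""
    return 4.0 / (10.0 + lat)

def operating_point(iterations=100):
    """Bisection on traffic(latency(rate)) - rate, which decreases with rate."""
    low, high = 0.0, SATURATION_RATE * (1.0 - 1e-9)
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if traffic(latency(mid)) > mid:
            low = mid   # cores would inject more than mid: move right
        else:
            high = mid  # cores would inject less than mid: move left
    rate = 0.5 * (low + high)
    return rate, latency(rate)

rate, lat = operating_point()
print(f"rate = {rate:.4f} flits/cycle, latency = {lat:.2f} cycles")
```

Because one curve rises with λ and the other falls with L, the crossing is unique here, which is what makes a closed-loop analytical evaluation cheap compared with simulation.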

SLIDE 13

Analytical model vs. simulation

[Plot: throughput (IPC) for roughly 2100 configurations sorted in descending order of throughput, comparing analytical modeling with simulation]

SLIDE 14

Case Study: Power‐performance exploration

Power‐performance trade‐off (search space: 1.5 · 10^9 configurations)

[Plot: throughput (IPC) versus power (W), with the following configurations labeled:]

  • 6x5, Bi‐Ring, 4C2
  • 5x4, Bi‐Ring, 6C2
  • 5x3, Bi‐Ring, 8C2
  • 4x3, Bi‐Ring, 10C2
  • 4x2, Bi‐Ring, 15C2
  • 3x2, Bi‐Ring, 20C2
  • 6x5, Bus, 4C2
  • 7x4, Bus, 3C2+1C3
  • 7x4, Bus, 4C2
  • 6x5, Bus, 2C1+2C2
  • 6x4, Bus, 3C1+2C2
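
A power‐performance trade‐off curve is a Pareto front: the configurations for which no other configuration has both lower power and higher throughput. A minimal sketch follows; the sample points are hypothetical and are not read from the plot.

```python
def pareto_front(points):
    """Return the non-dominated (name, power_w, ipc) points, sorted by power.

    A point is dominated if another point has power <= and IPC >= its own,
    with at least one of the two strictly better.
    """
    front = []
    for name, power, ipc in points:
        dominated = any(
            p2 <= power and i2 >= ipc and (p2 < power or i2 > ipc)
            for _, p2, i2 in points
        )
        if not dominated:
            front.append((name, power, ipc))
    return sorted(front, key=lambda point: point[1])

# Hypothetical (name, power in W, throughput in IPC) samples.
samples = [
    ("A", 120, 75),
    ("B", 150, 95),
    ("C", 160, 90),   # dominated by B
    ("D", 200, 120),
    ("E", 240, 118),  # dominated by D
]

front = pareto_front(samples)
print([name for name, _, _ in front])
```

The quadratic scan is fine for a few thousand points; a sort-based sweep would be the usual choice for larger sets.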

SLIDE 15

Outline

  • Architectural exploration
  • Physical planning for tiled CMPs

– Impact of physical planning
– Floorplanning
– Wire planning

  • Current work: regular floorplanning

SLIDE 16

Physical planning

[Figure: one cluster with four cores (C), four L2 caches, six blocks labeled r, an L3 and a router (R) with N, S, W and E links]

SLIDE 17

The impact of physical planning

[Figure: one cluster with four cores (C), four L2 caches, six blocks labeled r, an L3 and a router (R) with N, S, W and E links]

SLIDE 18

Physical planning for tiles

[Figure: a tile with its N, S, E and W sides labeled]

SLIDE 19

Link width: how many wires?

[Figure: two routers joined by a link in each direction; each direction carries Cntrl, Addr and Cache line fields (widths shown: 64 and 512)]

> 1K wires, 100 μm
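
The wire count can be reproduced with simple arithmetic. The sketch assumes that 64 is the address width and 512 the cache‐line width, and that the link consists of two one‐way channels; the control width is not given on the slide, so it is left as a parameter.

```python
ADDR_BITS = 64         # assumed to be the address field width
CACHE_LINE_BITS = 512  # assumed to be the cache-line field width

def link_wires(ctrl_bits=0, directions=2):
    """Wires between two routers: one channel per direction."""
    return directions * (ctrl_bits + ADDR_BITS + CACHE_LINE_BITS)

# Even with no control wires at all, the link exceeds 1K wires.
print(link_wires())              # 1152
print(link_wires(ctrl_bits=16))  # 16 control bits is a hypothetical value
```

This is why the link width, rather than the router logic, tends to dominate the routing resources of a tile.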

SLIDE 20

3D Wire Planning

[Figure: cross‐section of the metal stack, layers m1 to m6 above the FEOL, over the Memory, Router and Core blocks]

In systems where memory bandwidth is the bottleneck, the physical resources providing the bandwidth are critical.

SLIDE 21

Exploration without physical planning

[Figure: two‐stage flow. Architectural exploration: models and constraints drive the generation of configurations, which analytical modeling evaluates to set the search direction. Validation: the best configuration is simulated]

SLIDE 22

Exploration with physical planning

[Figure: three‐stage flow. Architectural exploration: models, constraints and physical info drive the generation of configurations, which analytical modeling evaluates to set the search direction. Physical planning: floorplanning and wire planning. Validation: the best configuration is simulated]

SLIDE 23

Physical planning

[Figure: physical planning shown alongside analytical modeling and simulation; the example cluster contains cores (C), L2 caches, L3, a local IC and a router (R)]

Estimations:

  • Area
  • Wirelength
  • Routability

SLIDE 24

Physical planning technology

  • Floorplanning

– Slicing structures & Simulated Annealing
– Lightweight 3D maze router
– Constraints:
    • Adjacency (Core and L2)
    • Balanced links (rings)

  • Wire planning

– SAT‐based 3D global routing
– Boolean constraints

SLIDE 25

Slicing structures

[Figure: two slicing floorplans of modules 1 to 5 with their slicing trees, whose internal nodes are vertical (V) and horizontal (H) cuts]

D.F. Wong and C.L. Liu, “A New Algorithm for Floorplan Design”, DAC, 1986, pages 101-107.
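
A slicing tree can be written as a postfix (Polish) expression and evaluated with a stack to obtain the bounding box of the floorplan. The sketch below assumes that V places its two operands side by side and H stacks them; the module sizes are hypothetical, and orientation choices are ignored.

```python
def slicing_bbox(expression, sizes):
    """Evaluate a postfix slicing expression; return (width, height).

    expression: sequence of module ids and cut operators "H" / "V".
    sizes: module id -> (width, height).
    """
    stack = []
    for token in expression:
        if token in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "V":   # vertical cut: operands side by side
                stack.append((w1 + w2, max(h1, h2)))
            else:              # horizontal cut: operands stacked
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[token])
    if len(stack) != 1:
        raise ValueError("malformed slicing expression")
    return stack[0]

# Hypothetical module sizes (width, height) for modules 1 to 5.
sizes = {1: (2, 3), 2: (2, 1), 3: (1, 2), 4: (3, 1), 5: (1, 1)}

# ((1 | 2) over 3) over (4 | 5)
expression = [1, 2, "V", 3, "H", 4, 5, "V", "H"]
width, height = slicing_bbox(expression, sizes)
print(width, height, width * height)
```

Simulated annealing over slicing structures perturbs such an expression (for example by swapping two operands) and re-evaluates the area after each move.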

SLIDE 26

Bounding curves

[Figure: bounding curves; one block is labeled Memory]

  • L. Stockmeyer, 1983, “Optimal Orientation of Cells in Slicing Floorplan Designs”
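
A bounding curve lists the non‐dominated (width, height) options of a block. Two curves can be combined across a cut by pairing their points and pruning dominated results. The sketch uses a plain all‐pairs combination rather than Stockmeyer's linear‐time merge, and the example shapes are hypothetical.

```python
def prune(shapes):
    """Keep non-dominated (width, height) pairs, sorted by width."""
    kept = []
    for w, h in sorted(set(shapes)):
        # Widths are non-decreasing, so a shape is useful only if it is
        # strictly lower than the last shape kept.
        if not kept or h < kept[-1][1]:
            kept.append((w, h))
    return kept

def combine(curve_a, curve_b, cut):
    """Bounding curve of two blocks joined by a "V" or "H" cut."""
    candidates = []
    for wa, ha in curve_a:
        for wb, hb in curve_b:
            if cut == "V":   # side by side
                candidates.append((wa + wb, max(ha, hb)))
            else:            # stacked
                candidates.append((max(wa, wb), ha + hb))
    return prune(candidates)

# Each block may be placed in either orientation (hypothetical sizes).
block_a = prune([(1, 3), (3, 1)])
block_b = prune([(2, 4), (4, 2)])

side_by_side = combine(block_a, block_b, "V")
print(side_by_side)
```

Keeping whole curves instead of single shapes lets the floorplanner defer the orientation choice until the root of the slicing tree.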

SLIDE 27

Wire planner

  • SAT‐based approach for gridded routing
  • Grid unit: link width (500 ‐ 1000 wires)
  • Support for floating terminals
  • Customizable for any type of Boolean‐encoded constraints (symmetry, 1D/2D routing, …)

[Figure: routed links in top view and cross‐section view]

SLIDE 28

Wire planner

  • Concurrent routing: all nets simultaneously
  • Using Euler’s theory to find legal routes
  • SAT: a route is always found if it exists
  • ILP‐based route optimization

[Figure: a router with W and E links]
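
One way to read the Euler‐based legality rule is as a degree constraint on a grid graph: each terminal touches exactly one selected edge, and every other used cell touches exactly two. This reading is an assumption, not the authors' encoding. The sketch checks that rule and, on a tiny grid, uses exhaustive search as a stand‐in for the SAT solver to show completeness: it returns a route whenever one exists.

```python
from itertools import combinations, product

def grid_edges(cols, rows):
    """Edges between horizontally and vertically adjacent cells."""
    edges = []
    for x, y in product(range(cols), range(rows)):
        if x + 1 < cols:
            edges.append(((x, y), (x + 1, y)))
        if y + 1 < rows:
            edges.append(((x, y), (x, y + 1)))
    return edges

def satisfies_degree_rule(chosen, terminals):
    """Terminals have degree 1; every other used cell has degree 2."""
    degree = {}
    for a, b in chosen:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    if any(degree.get(t, 0) != 1 for t in terminals):
        return False
    return all(d == 2 for cell, d in degree.items() if cell not in terminals)

def shortest_route(cols, rows, source, target, blocked=()):
    """Smallest edge set meeting the degree rule, or None if unroutable.

    Trying sizes in increasing order means the first hit is a simple path:
    any solution containing an extra cycle has a smaller solution without it.
    """
    edges = [e for e in grid_edges(cols, rows)
             if e[0] not in blocked and e[1] not in blocked]
    for size in range(1, len(edges) + 1):
        for chosen in combinations(edges, size):
            if satisfies_degree_rule(chosen, {source, target}):
                return chosen
    return None

route = shortest_route(3, 2, (0, 0), (2, 1), blocked={(1, 0)})
no_route = shortest_route(3, 2, (0, 0), (2, 1), blocked={(1, 0), (1, 1)})
print(route, no_route)
```

A SAT formulation would express the same degree rule as Boolean clauses over one variable per grid edge, which scales far beyond what this enumeration can handle.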

SLIDE 29

Design space

SLIDE 30

Design space

[Plot axis: wire length [10^6 μm]]

SLIDE 31

Filtering floorplans

[Plot axis: area [mm2]]

SLIDE 32

Filtering floorplans

[Plot axis: area [mm2]]

SLIDE 33

After physical planning

SLIDE 34

After physical planning

SLIDE 35

After physical planning

SLIDE 36

Outline

  • Architectural exploration
  • Physical planning for tiled CMPs
  • Current work: regular floorplanning

– Memory floorplanning
– Regularity extraction

SLIDE 37

Min‐area floorplan

[Figure: the cluster (four cores C, four L2 caches, six blocks labeled r, an L3 and a router R with N, S, W and E links) next to its minimum‐area floorplan]

SLIDE 38

Integrated memory floorplanner

[Figure: memories assembled from several modules (1Mb, 512Kb and 256Kb blocks), including L‐shape and T‐shape arrangements, placed around the router (R)]

SLIDE 39

Regular floorplan

[Figure: the cluster (four cores C, four L2 caches, six blocks labeled r, an L3 and a router R with N, S, W and E links) next to a regular floorplan in which the cores and L2 caches are aligned in rows]

SLIDE 40

Regular floorplan

[Figure: the regular floorplan compared with the min‐area floorplan of the same cluster]
Regularity:

  • Smaller design effort
  • Efficient timing closure
  • Choppability

SLIDE 41

Regular floorplan

[Figure: the regular floorplan compared with the min‐area floorplan of the same cluster]
Regularity:

  • Smaller design effort
  • Efficient timing closure
  • Choppability

Exploration:

  • Graph based knowledge discovery
  • Hierarchical slicing structures
  • Simulated Annealing

SLIDE 42

Conclusions

  • Physical information is essential for the architectural exploration of tiled CMPs
  • Approach: divide & conquer

– 1‐cluster floorplan + abutment constraints
– Build tiled CMPs (scalable and choppable)

  • Ongoing work:

– Multi‐module memory floorplanning
– Regularity