Network-on-Chip (NoC): Switching Strategies and Routing Algorithms - PowerPoint PPT Presentation




Advanced Digital IC Design

Network-on-Chip (NoC)

Chenxin Zhang & Xiaodong Liu

Agenda

  • Introduction
  • NoC Concept
  • NoC topology
  • Switching strategies
  • Routing algorithms
  • Flow control schemes
  • NoC Architecture Examples
  • Emerging NoC technologies

Introduction

Evolution of on-chip communication architectures

Network-on-Chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology. NoC is a communication-centric design paradigm for System-on-Chip (SoC).

Introduction

NoCs use packets to route data from the source processing element (PE) to the destination PE via a network fabric that consists of:

  • Network interfaces/adapters (NI)
  • Routers (a.k.a. switches)
  • Interconnection links (channels, wire bundles)

(Figure: a PE tile with registers, ALU, memory, and its NI.)


Building Blocks: NI

The NI front end provides a session-layer (P2P) interface to the node; the back end manages the interface with the switches, providing decoupling logic and synchronization. The front end speaks a standard P2P node protocol, the back end a proprietary link protocol.

Standardized node interface @ session layer:

  • 1. Supported transactions (e.g. QoS read, …)
  • 2. Degree of parallelism
  • 3. Session protocol control flow & negotiation

NoC-specific back end (layers 1-4):

  • 1. Physical channel interface
  • 2. Link-level protocol
  • 3. Network layer (packetization)
  • 4. Transport layer (routing)

Building Blocks: Router (Switch)

A router (or switch) receives and forwards packets. Its buffers have a dual function: synchronization and queuing.

(Figure: router datapath with input buffers & flow control, crossbar, allocator/arbiter, QoS & routing logic, output buffers & flow control, and data ports with flow-control wires.)

Building Blocks: Links

A link connects two routers in both directions over a number of wires (e.g., 32 bits). Control wires are part of the link as well. Links can be pipelined, and include handshaking when asynchronous.

NoC Concept

Topology

How the nodes are connected together

Switching

Allocation of network resources (bandwidth, buffer capacity, … ) to information flows

Routing

Path selection between a source and a destination node in a particular topology

Flow control

How the downstream node communicates forwarding availability to the upstream node


NoC Topology

  • Direct
  • Indirect
  • Irregular

Direct Topologies

Each node has direct point-to-point links to a subset of other nodes in the system, called neighboring nodes. As the number of nodes in the system increases, the total available communication bandwidth also increases. The fundamental trade-off is between connectivity and cost.

Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space

e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon

2D-mesh

The 2D mesh is the most popular topology. All links have the same length, which eases physical design. Area grows linearly with the number of nodes. The mesh must be designed so as to avoid traffic accumulating in its center.

Torus

The torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension. A k-ary 1-cube (1-D torus) is essentially a ring network with k nodes; it has limited scalability, as performance decreases when more nodes are added.

The k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular mesh, except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels. The long end-around connections can, however, lead to excessive delays.
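The benefit of the wrap-around channels can be made concrete with a short sketch (ours, not from the slides): per dimension, a torus route may take the shorter of the direct path and the wrap-around path, halving the worst-case distance.

```python
# Hop distance between two nodes in a k-ary 2-D mesh vs. a 2-D torus.
# Nodes are (x, y) coordinate tuples; k is the nodes per dimension.

def mesh_hops(src, dst):
    """Manhattan distance in a 2-D mesh."""
    return sum(abs(s - d) for s, d in zip(src, dst))

def torus_hops(src, dst, k):
    """Per dimension, take the shorter of direct and wrap-around paths."""
    return sum(min(abs(s - d), k - abs(s - d)) for s, d in zip(src, dst))

# Corner-to-corner in an 8x8 network:
print(mesh_hops((0, 0), (7, 7)))      # 14
print(torus_hops((0, 0), (7, 7), 8))  # 2
```

The torus halves the network diameter, but pays for it with the long end-around wires the slide warns about (which the folded torus then removes).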


Folded Torus

The folded torus topology overcomes the long-link limitation of a 2-D torus. Meshes and tori can be extended by adding bypass links to increase performance, at the cost of higher area.

Octagon

Octagon topology is another example of a direct network

Messages sent between any two nodes require at most two hops. More octagons can be tiled together to accommodate larger designs, using one of the nodes as a bridge node.

Indirect Topologies

Each node is connected to an external switch, and switches have point-to-point links to other switches. Examples: the fat-tree topology and the butterfly topology.

Irregular or ad hoc network topologies

Customized for an application; usually a mix of shared-bus, direct, and indirect network topologies, e.g. a reduced mesh or a cluster-based hybrid topology.


Switching techniques

Circuit switching

+ Dedicated links; simple, low overhead, full bandwidth
- Inflexible, low utilization

Packet switching

+ Shared links, flexible, variable bit rate (payload length)

- Packet overhead

Switching mode:

  • Datagram switching: packet oriented
  • Virtual circuit switching: connection oriented

Switching scheme:

  • Store-and-Forward (SAF) switching
  • Virtual Cut-Through (VCT), e.g. Ethernet: low latency, decreased reliability
  • Wormhole (WH) switching, e.g. NoC: few buffers, lower latency, decreased reliability

Packet Switching (Store and Forward)

Packets are completely stored at each switch before any portion is forwarded. Requirement: buffers must be sized to hold an entire packet.

(Figure: packet buffers along the path from source end node to destination end node; store, then forward.)

Packet Switching (Virtual Cut Through)

Portions of a packet may be forwarded ("cut-through") to the next switch before the entire packet is stored at the current switch. (Figure: routing from source end node to destination end node.)


Virtual Cut Through vs. Wormhole

Virtual Cut-Through operates at the packet level: buffers hold whole data packets and must be sized to hold an entire packet.

Wormhole operates at the FLIT level: buffers hold flits, so packets can be larger than the buffers.

When a downstream link is busy:

  • Virtual Cut-Through: the blocked packet is completely stored at the current switch (buffers must be sized to hold an entire packet).
  • Wormhole: the blocked packet is stored along the path, occupying flit buffers in several switches.
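The latency difference between the two families can be sketched with a simple first-order model (our assumption, not from the slides): a packet of L flits crossing H routers, with one flit crossing one link per cycle and no contention.

```python
# First-order, contention-free latency model for packet switching.
# hops = number of links traversed; flits = packet length in flits.

def saf_latency(hops, flits):
    # Store-and-forward: each router waits for the full packet before
    # forwarding, so the whole packet is serialized at every hop.
    return hops * flits

def wormhole_latency(hops, flits):
    # Cut-through/wormhole: the header flit pipelines through the
    # routers; the body follows one flit per cycle behind it.
    return hops + flits - 1

print(saf_latency(4, 16))       # 64 cycles
print(wormhole_latency(4, 16))  # 19 cycles
```

Under this model virtual cut-through and wormhole have the same zero-load latency; they differ in buffering requirements and in where a blocked packet waits, as the comparison above shows.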

Routing Algorithms

Routing is responsible for correctly and efficiently routing packets or circuits from the source to the destination. A good routing algorithm ensures load balancing, minimizes latency, and is deadlock- and livelock-free.

(Figures: deadlock and livelock examples on a path from source S to destination D.)


Static vs. Dynamic Routing

Static routing

+ Simple logic, low overhead
+ Guaranteed in-order packet delivery
- Does not take into account the current state of the network

Dynamic routing

+ Distributes traffic dynamically according to the current state of the network
- Complex; must monitor the state of the network and dynamically change routing paths

Turn model based routing algorithm

Basis (mainly for mesh NoCs):

  • Analyze the directions in which packets can turn in the network
  • Determine the cycles that such turns can form
  • Prohibit just enough turns to break all cycles

The resulting routing algorithms are:

  • Deadlock- and livelock-free
  • Minimal or non-minimal
  • Highly adaptive, based on the network load

Turn model

What is a turn?

  • From one dimension to another: 90-degree turn
  • To another virtual channel in the same direction: 0-degree turn
  • To the reverse direction: 180-degree turn

Turns combine to form cycles.
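A small sketch makes the "prohibit just enough turns" idea concrete (the set notation is ours, not from the slides): the eight 90-degree turns of a 2-D mesh form two abstract cycles, and west-first breaks both by forbidding any turn that ends in the west direction.

```python
# The eight 90-degree turns of a 2-D mesh, written as
# (incoming direction, outgoing direction) pairs.
ALL_TURNS = {
    ("N", "E"), ("E", "S"), ("S", "W"), ("W", "N"),   # clockwise cycle
    ("N", "W"), ("W", "S"), ("S", "E"), ("E", "N"),   # counter-clockwise cycle
}

# West-first: a packet travels west first if at all, so no turn may
# end in the west direction. One turn is removed from each cycle,
# which is exactly enough to break all cycles.
WEST_FIRST_PROHIBITED = {t for t in ALL_TURNS if t[1] == "W"}

print(sorted(WEST_FIRST_PROHIBITED))  # [('N', 'W'), ('S', 'W')]
```

Each of the two cycles loses exactly one turn, so no cyclic channel dependency (and hence no deadlock) can form, while the remaining six turns keep the algorithm adaptive.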


X-Y routing algorithm (Deterministic)

A packet is routed first along the x dimension (+x or -x) and then along the y dimension (+y or -y); once a packet turns into the y dimension it never turns back to x.
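A minimal sketch of deterministic X-Y (dimension-ordered) routing; the coordinate and direction conventions here are our own, not from the slides.

```python
# X-Y routing in a 2-D mesh: resolve the x offset completely, then
# the y offset. Nodes are (x, y) tuples; hops are direction strings.

def xy_route(src, dst):
    (sx, sy), (dx, dy) = src, dst
    hops = []
    while sx != dx:                      # x dimension first
        step = 1 if dx > sx else -1
        sx += step
        hops.append("+x" if step > 0 else "-x")
    while sy != dy:                      # then y dimension
        step = 1 if dy > sy else -1
        sy += step
        hops.append("+y" if step > 0 else "-y")
    return hops

print(xy_route((0, 0), (2, 1)))  # ['+x', '+x', '+y']
```

Because every packet takes x hops strictly before y hops, the y-to-x turns never occur, no turn cycle can close, and the algorithm is deadlock-free and minimal, at the cost of zero adaptivity.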

West-First routing algorithm

A packet travels west first, if it needs to go west at all; no turn into the west direction is allowed afterwards.

(Figures: example west-first routes from source S to destination D.)


(Figure: a west-first route where the destination D lies west of the source S, so all west hops are taken first.)

North-Last routing algorithm

A packet turns north only as its last direction; no turn out of the north direction is allowed.

(Figure: example north-last route from source S to destination D.)

Source routing

Flow Control

Flow control is required in non-circuit-switched networks to deal with congestion and to recover from transmission errors. Commonly used schemes:

  • ACK-NACK flow control
  • Credit-based flow control
  • Xon/Xoff (STALL-GO) flow control

(Figure: backpressure: node C's buffer fills, C tells B "buffer full, don't send", and B in turn blocks A.)


Flow Control Schemes

Credit-based flow control

The sender may inject a packet whenever its credit counter is not zero; each injection decrements the counter. If the receiver queue is not serviced, the credits run out and the sender stalls. As buffer slots become available again, the receiver sends credits back (e.g. +5) and the sender resumes injection.

(Figures: pipelined transfer with the credit counter counting down from 10; sender stalled at zero; credits returned, counter replenished.)
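The credit mechanism can be sketched in a few lines (class and parameter names are ours, for illustration only): the sender holds one credit per free receiver buffer slot, spends one per flit, and gets credits back as the receiver drains its queue.

```python
# Toy model of credit-based flow control on one link.

class CreditLink:
    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # one credit per free buffer slot
        self.queue = []               # receiver-side buffer

    def try_send(self, flit):
        """Sender side: inject only while the credit counter is non-zero."""
        if self.credits == 0:         # buffer full: sender must stall
            return False
        self.credits -= 1
        self.queue.append(flit)
        return True

    def drain(self, n):
        """Receiver side: service n flits and return n credits."""
        n = min(n, len(self.queue))
        del self.queue[:n]
        self.credits += n

link = CreditLink(buffer_slots=4)
sent = sum(link.try_send(i) for i in range(10))
print(sent)            # 4 -- sender stalls once the credits run out
link.drain(2)          # receiver frees 2 slots, credits come back
print(link.credits)    # 2
```

Because the sender never injects without a credit, the receiver buffer can never overflow, regardless of its size or of the link's round-trip latency.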

Flow Control Schemes

Xon/Xoff flow control

The receiver maintains a control bit at the sender. A packet is injected only while the bit is in the Xon state. When the receiver's Xoff threshold is reached, an Xoff notification is sent; while in Xoff, the sender cannot inject packets. Once the queue drains below the Xon threshold, the receiver switches the bit back to Xon.

(Figures: pipelined transfer; queue not serviced, control bit flips from Xon to Xoff.)
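A matching sketch for Xon/Xoff (class names and threshold values are our assumptions): instead of a counter, a single control bit is flipped at a high-water mark (Xoff) and restored at a low-water mark (Xon).

```python
# Toy model of Xon/Xoff (STALL-GO) flow control on one link.

class XonXoffLink:
    def __init__(self, xoff_at, xon_at):
        self.buf = []                     # receiver-side buffer
        self.xoff_at = xoff_at            # high-water mark -> send Xoff
        self.xon_at = xon_at              # low-water mark  -> send Xon
        self.go = True                    # control bit: True = Xon

    def try_send(self, flit):
        """Sender side: inject only while the control bit reads Xon."""
        if not self.go:
            return False                  # sender stalled by Xoff
        self.buf.append(flit)
        if len(self.buf) >= self.xoff_at:
            self.go = False               # notify sender: stop
        return True

    def drain(self, n):
        """Receiver side: service n flits; restore Xon below the mark."""
        del self.buf[:min(n, len(self.buf))]
        if len(self.buf) <= self.xon_at:
            self.go = True                # notify sender: resume

link = XonXoffLink(xoff_at=6, xon_at=2)
sent = sum(link.try_send(i) for i in range(10))
print(sent)      # 6 -- Xoff raised at the threshold
link.drain(4)    # occupancy drops to the Xon mark, bit flips back
print(link.go)   # True
```

This model ignores notification delay; in a real link the buffer needs headroom above the Xoff mark for the flits still in flight while the Xoff travels upstream, which is why buffer sizing depends on round-trip latency here but not in the credit scheme.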


Credit-Based vs. Xon/ Xoff Flow Control

  • Both schemes can fully utilize the buffers
  • Restart latency is lower for credit-based schemes; therefore:
    - Credit-based flow control has higher average buffer occupancy at high loads
    - Credit-based flow control leads to higher throughput at high loads (smaller inter-packet gap)
  • Control traffic is higher for credit schemes
  • Block credits can be used to tune link behavior
  • Buffer sizes are independent of round-trip latency for credit schemes (at the expense of performance)
  • Credit schemes carry more information content, useful for QoS schemes

NoC Architecture Examples

Intel’s Teraflops Research Processor

Deliver Tera-scale performance:

  • Single-precision TFLOP at desktop power
  • Frequency target: 5 GHz
  • Bi-section bandwidth on the order of Terabits/s
  • Link bandwidth in hundreds of GB/s

Prototype two key technologies:

  • On-die interconnect fabric
  • 3D stacked memory

Develop a scalable design methodology:

  • Tiled design approach
  • Mesochronous clocking
  • Power-aware capability

Chip summary:

  • Technology: 65 nm, 1 poly, 8 metal (Cu)
  • Transistors: 100 million (full chip), 1.2 million (tile)
  • Die area: 275 mm2 (full chip, 12.64 mm x 21.72 mm), 3 mm2 (tile, 1.5 mm x 2.0 mm)
  • C4 bumps: 8390

(Die photo: an array of identical tiles, with I/O areas, PLL, and TAP at the periphery.)

[Vangal08]

Main Building Blocks

  • Special-purpose cores
  • 2D mesh interconnect: mesochronous interface (MSINT), crossbar router, 39-bit links at 40 GB/s
  • Mesochronous clocking
  • Workload-aware power management

(Figure: tile block diagram. The Processing Engine (PE) contains a 3 KB instruction memory (IMEM), a 2 KB data memory (DMEM), a 6-read, 4-write 32-entry register file, and two FPMAC units with normalization; the tile connects to the 2D mesh through MSINT interfaces and the crossbar router.)


Emerging NoC technologies

Multi-Band RF-Interconnect

A CMP network-on-chip overlaid with multi-band RF-interconnect achieves an area overhead of 0.13%, an average 13% (max 18%) boost in application performance, and an average 22% (max 24%) reduction in packet latency. (Source: M. Chang, CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect.)

3D optical NoC

32 Gbps optical link bandwidth; 70% power reduction compared to a matched 2D electronic NoC. (Source: Y. Ye, 3D Optical Networks-on-Chip (NoC) for Multiprocessor Systems-on-Chip (MPSoC).)


References

[1] S. Pasricha, On-Chip Communication: Networks on Chip (NoCs), 2011.
[2] C. J. Glass, The Turn Model for Adaptive Routing.
[3] U. M. Mirza, Network on Chip, 2011.
[4] M. F. Chang et al., "CMP network-on-chip overlaid with multi-band RF-interconnect," IEEE 14th International Symposium on High Performance Computer Architecture (HPCA), pp. 191-202, Feb. 2008.
[5] Y. Ye et al., "3D optical networks-on-chip (NoC) for multiprocessor systems-on-chip (MPSoC)," IEEE International Conference on 3D System Integration (3DIC), pp. 1-6, Sept. 2009.