Systems on Chip and Networks on Chip: Bridging the Gap with QoS

SLIDE 1

Kees Goossens 09-07-2003 MPSOC

Systems on Chip and Networks on Chip: Bridging the Gap with QoS

Kees Goossens Philips Research The Netherlands

SLIDE 2

sources of unpredictability

  • but we still want to build predictable systems!
  • the quality of service concept helps

[figure: applications, architectures, and physical effects as sources of unpredictability]

SLIDE 3

  • overview
  • 1. application & user views
  • leads to quality of service (QoS) concept
  • 2. system on chip (SoC) design view
  • leads to networks on chip (NoC)
  • 3. NoCs and QoS: a synthesis to recuperate predictability
  • types of QoS commitment and their costs
  • 4. the Æthereal approach and architecture

SLIDE 4

  • 1. application & user views

application-induced unpredictability

SLIDE 5

  • 1. future applications
  • convergence of application domains

– increased functionality and heterogeneity
– higher semantic content/entropy

more dynamism

[figure: load of a VBR MPEG DVD stream over time, showing worst-case load, structural load, running average, and instantaneous load; VBR: variable bit rate]

SLIDE 6

  • 1. future applications
  • embedded and pervasive applications ("ambient intelligence")

– real time
– safety critical

  • users expect predictable behaviour

– e.g. PC, mobile phone, TV, heating system, air bag [scale: low to high]

  • quality of service is resource management for predictability

SLIDE 7

  • 1. consumer-electronics requirements
  • consumer-electronics media processing is challenging

– signal processing: hard real time, very regular load, high quality, worst case, typically on DSPs
– media processing: hard real time, irregular load, high quality, average case, SoC/media processors
– multi-media: soft real time, irregular load, "sloppy" quality, average case, PC/desktop

SLIDE 8

  • 2. SoCs and NoCs

architecture-induced unpredictability

SLIDE 9

  • 2. systems on chip
  • Moore’s law predicts exponential growth of resources
  • but.. someone has to do the work to make it come true
  • 1. deep submicron problems (DSM)

– wire vs. transistor speed, power, signal integrity

  • 2. design productivity gap

– IP re-use, platforms, NoCs
– verification

[figure: you, caught between the DSM nightmare and integration hell]

SLIDE 10

  • 2. example SoC

Philips's advanced set-top box and digital TV SoC Viper (pnx8500)

  • 0.18 µm / 8M
  • 1.8V / 4.5 W
  • 35 M transistors
  • 82 clock domains
  • more than 50 IP blocks

[figure: Viper chip with labelled blocks: MPEG, MBS + VIP, MMI+AICP, 1394, MSP, M-PI, MIPS, TriMedia VLIW, T-PI, Conditional access, CAB]

Viper @ 0.09 µm

SLIDE 11

  • 2. composing local solutions

to solve both DSM and productivity gap issues:

  • global approaches won’t work with exponential problem
  • so
  • 1. break up problem (modularity)
  • 2. then compose sub-solutions
  • 3. in a scalable fashion

– hierarchy helps
– abstraction helps

SLIDE 12

  • 2. composing local solutions - examples
  • for timing (closure):

– globally asynchronous, locally synchronous (GALS)

  • for lay-out:

– IP-level Mead & Conway, e.g. wiring strategies, tiling

  • for architectures:

– chip multiprocessing (CMP), tiling, systolic/cell, …

  • for programming:

– e.g. Kahn process networks

  • locally sequential, globally concurrent
  • locally shared memory, globally message passing

[figure: four cpus, each with its own cache ($), connected through a network to memory]

SLIDE 13

  • 2. networks on chip
  • have to connect many local solutions

– heterogeneity, scalability

  • through the decoupling of communication & computation

[figure: many IP blocks connected by a network on chip (NoC), between the DSM nightmare and integration hell]

  • networks on chip address this challenge

– from above (protocol stack, IP re-use)
– from below (DSM)

SLIDE 14

  • 2. networks on chip
  • two-pronged approach

– deal with communication dynamism
– protocol stacks enable differentiated services
– scalable, compositional IP composition
– structure interconnect (wires, lay-out, timing)

[figure: application demands and hardware technology meet at services; the router-based network offers services; the protocol stack is based on services]

SLIDE 15

  • 2. networks on chip: examples

two types of component:

  • routers

– transport data in packets

  • network interfaces

– convert IP view (transactions, e.g. Amba, OCP) to network view (packets)
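
As a rough illustration of the network-interface role described above, the sketch below splits one write transaction into packets and rebuilds it at the far end. The `Transaction` and `Packet` types, their fields, and the payload limit are hypothetical; they are not the Amba, OCP, or Æthereal formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """IP view: one write transaction (hypothetical format)."""
    address: int
    data: List[int]  # 32-bit data words


@dataclass
class Packet:
    """Network view: routing information plus a bounded payload (hypothetical format)."""
    path: List[int]     # output port to take at each router
    payload: List[int]  # words carried by this packet


def packetise(tx: Transaction, path: List[int], max_payload_words: int) -> List[Packet]:
    """Split a transaction into packets of at most max_payload_words words each."""
    words = [tx.address] + tx.data  # the address travels as the first word
    return [
        Packet(path=list(path), payload=words[i:i + max_payload_words])
        for i in range(0, len(words), max_payload_words)
    ]


def depacketise(packets: List[Packet]) -> Transaction:
    """Rebuild the transaction; assumes in-order, lossless delivery."""
    words = [w for p in packets for w in p.payload]
    return Transaction(address=words[0], data=words[1:])
```

The receiving side only works this simply if packets arrive complete and in order, which is why the ordering and completion properties of a connection matter to the network interface.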

[figure: example topologies (fat tree, tree, mesh) built from routers (R) and network interfaces (NI)]

SLIDE 16

  • 2. networks on chip: advantages
  • differentiated services

– offer different kinds of communication with one network

  • scalable

– add routers and network interfaces for extra bandwidth (at the cost of additional latency)

  • compositional

– add routers/NIs without changing existing components e.g. timing, buffers

  • efficient use of wires

– statistical multiplexing/sharing (average vs. worst-case): fewer wires, less wire congestion
– point-to-point wires at high speed

  • communication becomes re-usable, configurable IP

SLIDE 17

  • 3. NoCs and QoS: a synthesis

managing unpredictability with quality of service

SLIDE 18

  • 3. GULP?

[figure: IP blocks and cpus with caches ($), joined by several interconnects, with a memory arbiter in front of the external memory]

local schedulers

  • (RT)OS

– task switching
– interrupts

  • cache strategy

– cache pollution

  • interconnect

– busses, bridges
– networks

  • memory controllers

– external memory

e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, …

what is the global behaviour, composed of interacting local solutions?

SLIDE 19

  • 3. GULP?
  • example

– CPU @ 225MHz, 64KB I$
– ~70 cycle latency to external memory

  • single task switch ≈ 13K cycles

– RTOS overhead + task switch [1]: ~6K cycles
– cache reload due to pollution (10%) [2]: ~7K cycles

  • with 20 hard real-time video tasks @ 60Hz

– 1200 switches/s × 13K cycles ≈ 7% CPU load
– what about effective task throughput, latency?

  • have to guarantee

– throughput and latency (for hard real-time IP)
– throughput (for soft real-time IP)
– minimal latency (for CPU control tasks)

[figure: cycles/instr over time for task X, the switch [1], and task Y with its cache reload [2]]
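
The 7% CPU load on this slide can be reproduced with a short calculation. The sketch assumes each of the 20 tasks is switched in once per 60 Hz period, which is how the 1200 switches appear to be derived; the constant and function names are illustrative.

```python
CPU_HZ = 225e6               # CPU clock from the slide
RTOS_SWITCH_CYCLES = 6_000   # [1] RTOS overhead + task switch
CACHE_RELOAD_CYCLES = 7_000  # [2] cache reload due to pollution
TASKS = 20                   # hard real-time video tasks
TASK_RATE_HZ = 60            # each task runs at 60 Hz


def switch_overhead(tasks: int = TASKS, rate_hz: int = TASK_RATE_HZ):
    """Return (switches per second, fraction of CPU cycles spent on switching).

    Assumes one task switch per task per period.
    """
    switches_per_second = tasks * rate_hz
    cycles_per_switch = RTOS_SWITCH_CYCLES + CACHE_RELOAD_CYCLES
    load = switches_per_second * cycles_per_switch / CPU_HZ
    return switches_per_second, load


switches, load = switch_overhead()
print(f"{switches} switches/s, {load:.1%} of CPU cycles")
```

The calculation covers only the switching overhead; it says nothing about the throughput or latency each task actually sees, which is the question the slide raises.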

SLIDE 20

  • 3. GULP?
  • so, now we can make SoCs with NoCs, using our decoupled recomposed solutions
  • get locally predictable, globally unpredictable behaviour (GULP)

– GALS: multiple clock domains lead to uncertainty in time or data
– power management: combining local autonomous probabilistic managers
– NUMA: local vs. remote shared memory, dynamic (cache/mem) paging
– Kahn process networks: how are sequential processes scheduled?
– interacting schedulers/resource managers

  • but the user wanted (global) predictable behaviour

SLIDE 21

  • 3. QoS & GULP
  • our tenet is that a quality of service approach is essential to recuperate global predictability

– the user and application require it
– it fits well with NoC protocol stack

  • quality of service is nothing more than
  • 1. stating what service you want (negotiation)
  • 2. having the provider either commit to or reject your request
  • 3. renegotiate when your requirements change
  • create a series of steady states that are predictable
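
The three steps above (state the request, get a commitment or a rejection, renegotiate on change) can be sketched as a small admission-control loop. The `Provider` class, its single scalar resource, and the method names are hypothetical, chosen only to illustrate the idea.

```python
class Provider:
    """Hypothetical service provider managing one resource, e.g. link bandwidth."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.reserved = {}  # user -> amount currently committed

    def free(self) -> int:
        """Capacity not yet committed to any user."""
        return self.capacity - sum(self.reserved.values())

    def negotiate(self, user: str, amount: int) -> bool:
        """Commit to the request or reject it.

        A rejected (re)negotiation leaves the previous commitment in force,
        so the user stays in its old steady state.
        """
        current = self.reserved.get(user, 0)
        if amount - current > self.free():
            return False  # reject
        self.reserved[user] = amount
        return True  # commit: a new steady state begins

    def release(self, user: str) -> None:
        """Give back a commitment that is no longer needed."""
        self.reserved.pop(user, None)
```

For example, with `Provider(100)`, a request of 60 for one user is committed, a following request of 50 for another user is rejected, and a request of 30 is committed. Between calls to `negotiate` nothing changes, so each period between negotiations is a predictable steady state.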

[figure: alternating (re)negotiate phases and steady states]

SLIDE 22

  • 3. example: QoS & VBR

[figure: load of a VBR stream over time (worst-case load, structural load, running average, instantaneous load), divided into (re)negotiate phases and steady states; VBR: variable bit rate]

SLIDE 23

  • 3. quality of service
  • QoS means reducing uncertainty to negotiation phase

– for both user and provider
– requires & enables resource management

  • notion of commitment

– guaranteed versus best-effort service

  • types of commitment
  • 1. correctness

e.g. uncorrupted data

  • 2. completion

e.g. no packet loss

  • 3. bounds

e.g. maximum latency

SLIDE 24

  • 3. some remarks
  • the types of commitment are dependent

– e.g. cannot offer latency bound without completion
– this has repercussions for protocol stack & architecture

  • data retransmission on unreliable low-swing wires immediately excludes guaranteed latency & jitter
  • quality of service must be done at all levels

– physical: power manager of IP blocks
– link & network: network and communication links
– task level: CPU scheduler (RTOS), application software

  • QoS is pervasive, it cannot be bolted on afterwards

[figure: service providers offer services to service users]

SLIDE 25

  • 3. some remarks
  • the “statistical guarantees” oxymoron

– e.g. guaranteeing >0% packet arrival implies QoS

  • have to keep track of percentage lost

– post hoc analysis of behaviour of architecture is no guarantee

  • unless boundary conditions of analysis are enforced (and then resources have to be managed, i.e. QoS)

SLIDE 26

  • 3. the cost of QoS

a) guaranteed bounds require worst-case resource dimensioning
b) completion requires at least average-case resources

SLIDE 27

  • 3. the cost of QoS

best-effort services can have better average resource utilisation at the cost of unpredictable/unbounded worst-case behaviour

the combination of best-effort & guaranteed services (c) is useful!
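
A small numeric sketch of this trade-off follows. The load samples are made up for illustration and are not taken from the VBR figure; the function names are likewise hypothetical.

```python
def dimension(loads):
    """Return (worst-case, average-case) resource needs for a list of load samples."""
    return max(loads), sum(loads) / len(loads)


def utilisation(loads, capacity):
    """Average fraction of a reserved capacity that is actually used."""
    return (sum(loads) / len(loads)) / capacity


# Made-up load samples in arbitrary units.
loads = [16, 18, 29, 20, 17, 22, 18, 20]

worst_case, average = dimension(loads)
guaranteed_util = utilisation(loads, worst_case)  # capacity reserved for the worst case
leftover = 1.0 - guaranteed_util                  # share that best-effort traffic could use
```

With these samples, dimensioning for a guaranteed bound reserves 29 units while the average need is 20, so roughly 31% of the reservation sits idle on average. Letting best-effort traffic use that idle share is the point of combining the two service classes.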

SLIDE 28

  • 3. quality of service
  • IP integration is the problem
  • SoC design becomes communication centric
  • the NoC is the focus of the architecture
  • to make SoCs predictable, NoCs must offer QoS

[figure: IP blocks composed around a network on chip: predictable composition with QoS]

SLIDE 29

  • 4. NoCs and QoS: the Æthereal approach

SLIDE 30

  • 4. Æthereal context
  • consumer electronics

– reliability & predictability are essential
– low cost is crucial
– time to market must be reduced

  • NoCs and QoS help on all counts
  • NoCs are the focal point of SoCs
  • QoS is essential for NoCs
  • hence the Æthereal NoC offers differentiated services

– to manage (and hence reduce) resources
– to ease integration (and hence decrease TTM)

SLIDE 31

  • 4. NoC services
  • request communication services using connections

– opening & closing affect resource reservations

  • with properties

– data integrity (uncorrupted data transfer)
– transaction ordering: un/ordered per slave/connection
– transaction completion
– flow control: data loss or not
– delivery bounds: throughput, latency, jitter

[figure: the properties grouped by type of commitment (correctness, completion, bounds); a master IP and a slave IP exchanging LD, ST and data]

SLIDE 32

  • 4. architecture decisions
  • we expect lossless, (partially) ordered connections with or without throughput guarantees to be most popular
  • hence, to reduce costs, we implement this natively, i.e.

– don't drop data in the network(*): no retransmissions & filtering of duplicates
– don't reorder data in the network: no reorder buffers

  • not dropping data complicates congestion & deadlock issues

(*) excluding network interfaces

SLIDE 33

  • 4. architecture decisions
  • conceptually, two disjoint networks

– a network with throughput+latency guarantees (GT)
– a network without those guarantees (best-effort, BE)

  • we have several types of commitment in the network

– combine guaranteed worst-case behaviour with good average resource usage

[figure: a guaranteed router and a best-effort router, connected by priority/arbitration and programming]
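
One way to picture the combination of the two conceptual networks is a per-slot decision at a router output: guaranteed traffic goes first, and best-effort traffic takes what is left. The sketch below is an assumption-laden illustration, not the Æthereal arbiter. In particular, letting best-effort traffic use reserved-but-idle slots and the round-robin starting point are choices made here, and all names are hypothetical.

```python
def arbitrate(slot, slot_table, gt_queues, be_queues):
    """Decide which input may use one output port during one time slot.

    slot_table[slot % len(slot_table)] is the input reserved for guaranteed
    traffic in that slot, or None if the slot is unreserved.
    gt_queues / be_queues are per-input lists of waiting flits.
    Returns ("GT", input), ("BE", input), or None if nothing is sent.
    """
    reserved = slot_table[slot % len(slot_table)]
    if reserved is not None and gt_queues[reserved]:
        return ("GT", reserved)  # guaranteed traffic has priority

    # Assumption: best-effort traffic may use unreserved or idle slots,
    # scanned round-robin starting from an input derived from the slot number.
    n = len(be_queues)
    for k in range(n):
        i = (slot + k) % n
        if be_queues[i]:
            return ("BE", i)
    return None
```

Because the guaranteed decision never depends on the best-effort queues, the worst-case behaviour of guaranteed traffic is unaffected by best-effort load, while idle capacity is still put to use.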

SLIDE 34

  • 4. best-effort router architecture
  • worm-hole routing
  • input queueing
  • source routing

– source decides on path to follow through network

  • other options (all feasible, area-wise)

– input queuing with virtual-cut-through routing
– virtual output queuing & iSLIP with worm-hole routing
– output queuing with worm-hole routing
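
Source routing, as listed above, means the packet header carries the whole path and each router only consumes the next entry. A minimal sketch, with a hypothetical `links` map and node names invented for the example:

```python
def deliver(source, header, links):
    """Follow a source-routed packet and return the list of nodes it visits.

    header: output port to take at each router, decided by the source.
    links: dict mapping (node, output_port) -> next node.
    Each router consumes the first remaining entry of the header.
    """
    node = source
    visited = [node]
    while header:
        port, header = header[0], header[1:]  # router pops its own entry
        node = links[(node, port)]
        visited.append(node)
    return visited


# Hypothetical three-router path ending at a network interface.
links = {("R1", 1): "R2", ("R2", 0): "R3", ("R3", 2): "NI_b"}
route = deliver("R1", [1, 0, 2], links)
```

Routers need no routing tables in this scheme, which keeps them small; the cost is that the source has to know the path when it builds the header.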

SLIDE 35

  • 4. guaranteed-throughput router
  • contention-free routing

– synchronous, using slot tables
– time-division multiplexed circuits

  • store-and-forward routing
  • headerless packets

– information is present in slot table

SLIDE 36

  • 4. architecture decisions
  • to offer guaranteed latency or bandwidth over finite interval

– cannot drop data
– must bound contention and congestion

  • rate-based scheduling

– has high buffer costs (deep fifos)

  • deadline-based scheduling

– even higher buffer costs (deep priority queues)

  • contention-free routing

– low buffer costs (shallow fifos)

  • NB, all require some notion of time

SLIDE 37

  • 4. contention-free routing
  • latency guarantees are easy in circuit switching
  • emulate circuits with packet switching
  • schedule packet injection in network such that packets never contend for same link at same time

– in space: disjoint paths
– in time: time-division multiplexing
– or a combination
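
The time-division part of this scheme can be sketched as a slot search along a path. The sketch assumes a packet advances one link per slot, so a connection that uses slot s on its first link needs slot s+1 on the next, and so on; that timing model, the per-link tables, and all names are assumptions made for illustration.

```python
def find_slot(path, tables, num_slots):
    """Return the first injection slot for which every link on the path is free.

    Link k of the path is needed in slot (s + k) % num_slots.
    Returns None if no such slot exists.
    """
    for s in range(num_slots):
        if all(tables[link][(s + k) % num_slots] is None
               for k, link in enumerate(path)):
            return s
    return None


def open_connection(conn, path, tables, num_slots):
    """Reserve slots for a connection; return the injection slot, or None if rejected."""
    s = find_slot(path, tables, num_slots)
    if s is None:
        return None  # reject: existing reservations are left untouched
    for k, link in enumerate(path):
        tables[link][(s + k) % num_slots] = conn
    return s  # commit
```

Since every (link, slot) pair holds at most one connection, two guaranteed packets can never meet on the same link at the same time, which is what keeps the required buffers shallow.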

SLIDE 38

[figure: contention-free routing example with three routers and three network interfaces; the slot table of each router gives, for every output (o1 to o4), the input (e.g. i1, i3, i4) routed to that output at this slot; input 2 for router 1 is output 1 for router 2]

use slots to

  • avoid contention
  • divide up bandwidth

SLIDE 39

  • 4. programming model
  • use best-effort packets to set up connections

– set-up & tear-down packets like in ATM (asynchronous transfer mode)

  • distributed, concurrent, pipelined
  • safe: always consistent
  • compute slot assignment at compile time, at run time, or a combination
  • connection opening is guaranteed to complete (but without a latency guarantee) with commitment or rejection

SLIDE 40

  • 4. architecture insights
  • memories (for packet storage)

– register-based fifos are expensive
– RAM-based fifos are as expensive (80% of router is memory)
– special hardware fifos are very useful (20% of router is memory)

  • speed of memories

– registers are fast enough
– RAMs may be too slow
– hardware fifos are fast enough

[figure: layouts of routers based on register-file and hardware fifos, drawn to approximately same scale (1 mm², 0.26 mm²); blocks labelled iqu, switch, msu, stu]

SLIDE 41

  • 4. architecture insights
  • router must be scalable, but up to a point

– in terms of area (switch, input or output buffers)
– in terms of speed (arbiter, memories)
– don't go beyond 8x8 routers (for fat tree)?

  • switch is less of a problem than expected
  • latency of router is essential

– increase data rate
– increase arbitration rate

  • minimise number of hops in network

– topology choice

  • think about trade-off hops versus router latency

SLIDE 42

  • 4. router results

a prototype router:

  • 5 input and 5 output ports (arity 5)
  • 0.25 mm² in CMOS12
  • 500 MHz data path, 166 MHz control path
  • flit size of 3 words of 32 bits
  • 500 MHz × 32 bits = 16 Gb/s throughput per link, in each direction
  • 256 slots & 5x1 flit fifos for guaranteed-throughput traffic
  • 6x8 flit fifos for best-effort traffic
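
The link throughput quoted above follows directly from the data-path clock and word width. The sketch below also derives two figures the slide does not state: the bandwidth of a single slot, assuming the 256 slots share the link equally, and the flit rate, which happens to match the 166 MHz control path if one control decision is made per 3-word flit. Both are inferences, and the constant names are illustrative.

```python
DATA_CLOCK_HZ = 500e6   # data path clock
WORD_BITS = 32          # one word per data-path cycle
FLIT_WORDS = 3          # flit size of 3 words
SLOTS = 256             # slot table size for guaranteed-throughput traffic

# Stated on the slide: 500 MHz x 32 bits = 16 Gb/s per link, per direction.
link_bps = DATA_CLOCK_HZ * WORD_BITS

# Inference: bandwidth granularity if every slot carries an equal share of the link.
per_slot_bps = link_bps / SLOTS

# Inference: flits per second, to compare with the 166 MHz control path.
flit_rate_hz = DATA_CLOCK_HZ / FLIT_WORDS

print(f"link: {link_bps / 1e9:.0f} Gb/s, "
      f"per slot: {per_slot_bps / 1e6:.1f} Mb/s, "
      f"flit rate: {flit_rate_hz / 1e6:.1f} MHz")
```

Under the equal-share assumption, the slot table size sets the smallest amount of bandwidth that a guaranteed-throughput connection can reserve.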

SLIDE 43

  • 4. router architecture

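[figure: router architecture with a switch (X), BQ and GQ blocks per port, a slot table, an arbiter, and reconfiguration logic; programming packets, data packets, and flow control are shown as separate flows]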

SLIDE 44

  • 5. conclusions

SLIDE 45

  • 5. conclusions
  • future applications are more dynamic and embedded
  • users want predictable and reliable behaviour
  • QoS bridges this apparent contradiction
  • future SoC design relies on NoCs to

– solve DSM issues
– close the design productivity gap

  • QoS can be provided by the NoC protocol stack (services)
  • but, the NoC architecture must take this into account
  • there is an increasing awareness of the need for predictable system design and the role NoCs can play

[figure: with a NoC: DSM heaven? integration bliss?]

SLIDE 46

  • 5. conclusions
  • the Æthereal NoC

– offers differentiated services
– with different types of commitment

  • the Æthereal architecture

– aims to marry guaranteed worst-case behaviour with good average resource usage

  • Æthereal's prototype routers show feasibility of NoCs