Models of Architecture. Maxime Pelcat, INSA Rennes, IETR, Institut Pascal



slide-1
SLIDE 1

Models of Architecture

Maxime Pelcat, INSA Rennes, IETR, Institut Pascal

Nokia Bell Labs 2018

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732105: CERBERO.

slide-2
SLIDE 2

INSA Rennes – IETR VAADER

  • INSA Rennes
  • IETR VAADER
  • Institut Pascal


slide-3
SLIDE 3
Models of Architecture

  • Abstracting computational architecture to
    – Predict performance
    – Support current hardware evolutions

slide-4
SLIDE 4
Motivation: architecture evolution

  • Hardware Architectures are becoming
    – More complex
    – More heterogeneous
    – More High Performance embedded Computing (HPeC)
  • Embedded deep learning, near-sensor computing, fog computing, edge computing, many-cores, etc.
  • Real-time constraints, stream processing applications

slide-5
SLIDE 5
Motivation: HPeC architectures

  • Let’s look at ARM-based HPeC
    – Let us consider 4 heterogeneous solutions
  • ARM = control path + some of the data path
  • in red: data path

[Figure: four platforms: Multi-ARM; Multi-ARM + GPGPU; Multi-ARM + DSP; Multi-ARM + FPGA]


slide-7
SLIDE 7
Motivation: HPeC architectures

  • ARM big.LITTLE: Samsung Exynos 5422
    – High Performance cores: 4 × A15, 2GHz, SCU, 2MB
    – Low energy cores: 4 × A7, 1.4GHz, SCU, 0.5MB
    – ACE, 2GB DDR (PoP)
    – Easy to program: Linux SMP, thread migration
    – 12Gflops, <10W


slide-9
SLIDE 9
Motivation: HPeC architectures

  • Multi-ARM + GPGPU: Nvidia Jetson TX1 module
    – Control path: 4 × A57, 1.6GHz, SCU
    – Data path: 256-core Maxwell GPGPU, 32 cores/warp
    – 4GB external DDR on module
    – H.264 4K 60Hz
    – Less easy to program: Linux SMP + CUDA/OpenCL


slide-11
SLIDE 11
Motivation: HPeC architectures

  • Multi-ARM + DSP: Texas Instruments Keystone II TCI6638K2K
    – Control path: 4 × A15, 1.4GHz, SCU, 4MB
    – Data path: 8 × C66, 1.2GHz, 1MB each; FFTC
    – Teranet, 6MB MSMC
    – Difficult to program (well): Linux SMP + Open Event Machine
    – 160 Gflops, <15W


slide-13
SLIDE 13
Motivation: HPeC architectures

  • Multi-ARM + FPGA: Xilinx Zynq UltraScale+
    – Control path: 4 × A53, 1.5GHz, SCU, 1MB; 2 × R5
    – Data path: FPGA, up to 4MB, 1MFF, 0.5MLUT, 600MHz
    – GPU (not GPGPU), switch fabric
    – More difficult to program (well): Linux SMP + HLS or HDL

slide-14
SLIDE 14

Motivation: HPeC architectures

  • Current trends
    – FPGAs are gaining importance: what about flops?
    – Adding video/image accelerators
      • Video Compression: H.264/AVC, H.265/HEVC, etc.
      • AI: for tensor applications → reach 1Tops/W
    – RISC-V as an open HW competitor to ARM

slide-15
SLIDE 15
Motivation: architecture evolution

  • Towards more complexity
    – More cores, hierarchies of clusters
    – Heterogeneity, interconnect complexity
  • Reminiscent of the intra-core modifications of the 20th century

[Figure: a single clocked ALU, then SIMD (+ + + +) and VLIW (Ld × + Str), each driven by clk]

slide-16
SLIDE 16
Motivation: architecture evolution

  • But there are some differences between intra-core and inter-core parallelism
    – At coarse grain, PEs communicate asynchronously
    – There is no (or less) centralized processing decision
    – There is no performance portability (nothing equivalent to C-to-VLIW compilers)
  • How can/should we manage this HW complexity?
    – Can we predict performance at design time? How?

slide-17
SLIDE 17

System Objectives

[Figure: system objectives: T°C, Energy, Reliability, Memory, Unit Cost ($), Security, Maintenance Cost ($), Performance, Peak Power]

slide-18
SLIDE 18

System Design: Y-Chart

[Figure: Y-chart with labels Architecture Design, Algorithm, Application, System Prototype, and two Redesign feedback arrows]

slide-19
SLIDE 19

Model-Based Design

[Figure: Y-chart for model-based design. The Algorithm Model conforms to a Model of Computation (MoC); the Architecture Model conforms to a Model of Architecture (MoA); a KPI Evaluation takes both models and produces a KPI, with Redesign feedback arrows to each model]

slide-20
SLIDE 20

On MoC Side: Many Results

  • #EdwardALee, #ProgrammingParadigms
  • Discrete Event MoCs
  • Finite State Machines → imperative languages
  • Functional MoCs
  • Petri Nets
  • Dataflow MoCs: SDF, CSDF, IDF, IBSDF, PSDF, SPDF, PiSDF, etc.

[Figure label: PREESM]

slide-21
SLIDE 21

Dataflow MoCs Case

And they are not all here…

Feature (columns: SDF, ADF, IBSDF, DSSF, PSDF, PiSDF, SADF, SPDF, DPN, KPN)

  • Expressivity: Low, Med., Turing complete
  • Hierarchical: X X X X
  • Compositional: X X X
  • Reconfigurable: X X X X X X
  • Statically schedulable: X X X X
  • Decidable: X X X X (X) (X) X (X)
  • Variable rates: X X X X X X X
  • Non-determinism: X X X

  • SDF: Synchronous Dataflow
  • ADF: Affine Dataflow
  • IBSDF: Interface-Based Dataflow
  • DSSF: Deterministic SDF with Shared Fifos
  • PSDF: Parameterized SDF
  • PiSDF: Parameterized and Interfaced SDF
  • SADF: Scenario-Aware Dataflow
  • SPDF: Schedulable Parametric Dataflow
  • DPN: Dataflow Process Network
  • KPN: Kahn Process Network
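SDF is listed above as statically schedulable and decidable. A minimal sketch of why: fixed production and consumption rates give balance equations whose smallest integer solution (the repetition vector) can be computed before execution. The function name `repetition_vector` and the edge tuple layout are hypothetical, chosen for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(actors, edges):
    """Smallest integer firing counts of a connected SDF graph.

    edges: tuples (src, production_rate, dst, consumption_rate).
    Returns None when the balance equations have no solution.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, p, dst, c in edges:
            # Balance equation on each edge: q[src] * p == q[dst] * c
            if src == a:
                other, r = dst, rate[a] * p / c
            elif dst == a:
                other, r = src, rate[a] * c / p
            else:
                continue
            if other not in rate:
                rate[other] = r
                pending.append(other)
            elif rate[other] != r:
                return None  # inconsistent rates: no bounded schedule
    scale = lcm(*(f.denominator for f in rate.values()))
    return {a: int(f * scale) for a, f in rate.items()}


# A produces 2 tokens per firing, B consumes 3 per firing.
q = repetition_vector(["A", "B"], [("A", 2, "B", 3)])
print(q)
```

With these rates, A fires 3 times and B fires 2 times per graph iteration, so that 6 tokens are produced and 6 consumed.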

slide-22
SLIDE 22

But Still a Lot to Do

  • on Real-Time Multicore systems especially
  • Usually, RT application specification =
    – Multiple tasks sharing resources
    – Activation periods or triggering events
  • Objective = keeping resources busy

[Figure: tasks T1, T2, T3]

slide-23
SLIDE 23

MoCs are not sufficient

[Figure: the Algorithm Model conforms to a Model of Computation (MoC) and feeds an Energy Evaluation that should produce Energy; the architecture side is missing]

slide-24
SLIDE 24

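Models of Architecture

[Figure: Y-chart for model-based design. The Algorithm Model conforms to a MoC; the Architecture Model conforms to a Model of Architecture (MoA); a KPI Evaluation takes both models and produces a KPI, with Redesign feedback arrows to each model]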

slide-25
SLIDE 25

Problem: Predict System Quality

  • How to predict a system "quality"?
    – Efficiently (simple procedure)
    – Early (from abstract models)
    – Accurately (with a good fidelity)
    – With reproducibility (same models = same prediction)

slide-26
SLIDE 26

Model of Architecture

  • Definition
    – Model of a system Non-Functional Property (NFP)
    – Application-independent
    – Abstract
    – Reproducible

Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.

slide-27
SLIDE 27

Model of Architecture

Model | Reproducible | Application-independent | Abstract

AADL

  

MCA SHIM

  

UML MARTE

 / 

AAA

  

CHARMED

  

S-LAM

  

MAPS

  

LSLA

  

slide-28
SLIDE 28

Model of Architecture

  • NFP = MoA(activity(MoC(application)))
  • MoA depends on MoC
  • One and always the same quality evaluation

[Figure: Model H conforms to MoA; Model G conforms to MoC; the Activity links G to H. NFPs: Performance, Power, Energy, Memory, T°C, Reliability, Security, Cost]

slide-29
SLIDE 29

Model of Architecture

[Figure: KPI, MoA, MoC, Act]

slide-30
SLIDE 30

LSLA: First MoA

  • LSLA = Linear System-Level Architecture Model
  • Motivated by the additive nature of energy consumption

slide-31
SLIDE 31

System Objectives

[Figure: system objectives: T°C, Energy, Reliability, Memory, Unit Cost ($), Security, Maintenance Cost ($), Performance, Peak Power]

slide-32
SLIDE 32

Energy/Power Define Architecture

[Figure: power scale with levels 2W, 7W, 20W, 20kW, 20MW; annotations "Need a dissipator" and "Need a fan"; system classes: Embedded system, Dedicated system or conventional system, HPC; HPeC influence]

slide-33
SLIDE 33

LSLA Model of Architecture

[Figure: SDF graph of Task1 to Task5 with signal edges and unit rates, evaluated on PE1 and PE2 connected by a CN; linear costs 10x+1, 2x+0, 3x+0; total cost 16+12+22=50; labels: token, quantum, Compositional]

slide-34
SLIDE 34

LSLA Model of Architecture

[Figure: SDF graph of Task1 to Task5 with signal edges and unit rates, evaluated on PE1 and PE2 connected by a CN; linear costs 10x+1, 2x+0, 3x+0; total cost 16+12+22=50]

  • SDF: Model of Computation
  • Activity
  • LSLA: Model of Architecture
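The additive cost on this slide can be sketched as follows. Each component carries a linear cost a·x + b per token (on a PE) or per quantum (on a CN), and the total is the sum over the activity. Which coefficient belongs to which component, the activity counts, and the weight `lam` are assumptions: the figure's mapping is not recoverable from the text, so the values below are simply one assignment that reproduces the slide's 16+12+22=50.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """Cost a*x + b of one token or quantum of size x."""
    a: float
    b: float

    def __call__(self, x: float) -> float:
        return self.a * x + self.b


def lsla_cost(components, activity, lam=1.0):
    """Add up the linear costs of all tokens and quanta.

    components: name -> (LinearCost, is_communication_node)
    activity:   name -> list of token/quantum sizes handled by that component
    lam:        assumed weight of communication relative to computation
    """
    total = 0.0
    for name, sizes in activity.items():
        cost, is_comm = components[name]
        weight = lam if is_comm else 1.0
        total += weight * sum(cost(x) for x in sizes)
    return total


# Assumed assignment of the three cost functions shown on the slide.
components = {
    "PE1": (LinearCost(10, 1), False),
    "PE2": (LinearCost(3, 0), False),
    "CN": (LinearCost(2, 0), True),
}
# Hypothetical activity: unit-size tokens and quanta.
activity = {"PE1": [1, 1], "PE2": [1, 1, 1, 1], "CN": [1] * 8}

total = lsla_cost(components, activity)  # 22 (PE1) + 12 (PE2) + 16 (CN)
print(total)
```

Because the cost is a plain sum, changing the mapping only changes the activity dictionary; the architecture model itself stays untouched.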

slide-35
SLIDE 35

LSLA MoA for Energy Prediction

  • 86% fidelity on an octo-core ARM
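Fidelity measures whether the model ranks design points in the same order as the measurements, rather than how close the predicted values are. A minimal sketch, assuming the usual pairwise definition (the fraction of pairs of design points whose ordering agrees between measurement and prediction); the sample values are hypothetical.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fidelity(measured, predicted):
    """Fraction of pairs (i, j) ordered the same way by both series.

    Needs at least two design points.
    """
    pairs = list(combinations(range(len(measured)), 2))
    agree = sum(
        sign(measured[i] - measured[j]) == sign(predicted[i] - predicted[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical energies of four design points (measured vs. model output).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [10.0, 30.0, 20.0, 40.0]
f = fidelity(measured, predicted)  # 5 of the 6 pairs agree
print(f)
```

The predictions here are off by an order of magnitude yet the fidelity is high, which is the point: a model used to compare design alternatives needs correct ordering more than correct absolute values.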

slide-36
SLIDE 36

LSLA MoA for Energy Prediction

  • The model is learnt from energy measurements

[Figure: LSLA model with eight PEs and three CNs]

slide-37
SLIDE 37

LSLA MoA for Energy Prediction

  • The model is learnt from energy measurements

[Figure: LSLA model with eight PEs and three CNs, annotated with learnt parameters: four PEs at 1.5W, four PEs at 0.3W, and CNs labelled α, β, γ]
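Learning the parameters of a linear model from measurements amounts to a least-squares fit. The sketch below is a simplified illustration, not the procedure used in the work: it assumes a two-parameter model (one cost per core type, no intercept, no communication term) and uses synthetic measurements generated from the 1.5W and 0.3W values shown in the figure. Reading those labels as per-core contributions is itself an assumption.

```python
def fit_two_costs(samples):
    """Least-squares fit of p = c_big * n_big + c_little * n_little.

    samples: tuples (n_big, n_little, measured_power).
    Solves the 2x2 normal equations directly.
    """
    sbb = sum(nb * nb for nb, nl, p in samples)
    sll = sum(nl * nl for nb, nl, p in samples)
    sbl = sum(nb * nl for nb, nl, p in samples)
    sbp = sum(nb * p for nb, nl, p in samples)
    slp = sum(nl * p for nb, nl, p in samples)
    det = sbb * sll - sbl * sbl
    if det == 0:
        raise ValueError("measurements do not separate the two core types")
    c_big = (sbp * sll - slp * sbl) / det
    c_little = (slp * sbb - sbp * sbl) / det
    return c_big, c_little


# Synthetic measurements: (active big cores, active little cores, watts).
samples = [(4, 0, 6.0), (0, 4, 1.2), (2, 2, 3.6), (1, 3, 2.4)]
c_big, c_little = fit_two_costs(samples)
print(round(c_big, 3), round(c_little, 3))
```

This is the top-down direction: parameters come from observing the whole platform under known activity, not from a bottom-up description of the hardware.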

slide-38
SLIDE 38

LSLA: MoA, not MoHW

  • LSLA models HW + communication libraries + scheduler + OSs + …
  • LSLA models the service the platform offers to the applications
  • Top-down approach
    – Learning parameters from experiments

slide-39
SLIDE 39

System Objectives

[Figure: system objectives: T°C, Energy, Reliability, Memory, Unit Cost ($), Security, Maintenance Cost ($), Latency, Peak Power]

slide-40
SLIDE 40

MoAs: Limits of LSLA

  • Energy → Linear model OK
  • Latency
    – Latency does not have an additive nature!

[Figure: Task1 and Task2 with unit rates in two configurations: executed in sequence, Latency = sum; executed in parallel, Latency = max]

slide-41
SLIDE 41

Activity & MoA for Latency

[Figure: SDF graph of Task1 to Task5 with signal edges and unit rates, split into parts a), b) and c)]

slide-42
SLIDE 42

Activity & MoA for Latency

[Figure: PE1 and PE2 connected by a CN, with linear costs 10x+1, 2x+0, 3x+0]

  • MaxPlus
  • a), b): Σ 12+12+11=35 and Σ 8+6+11=25
  • max(35,25)=35
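The max-plus evaluation above can be sketched as a sum along each branch followed by a max across concurrent branches. The function names are hypothetical; the numbers are the ones shown on the slide.

```python
def path_latency(costs):
    """Costs along one branch add up (the 'plus' of max-plus)."""
    return sum(costs)


def maxplus_latency(parallel_paths):
    """Concurrent branches finish when the slowest one does (the 'max')."""
    return max(path_latency(path) for path in parallel_paths)


branch_a = [12, 12, 11]  # sums to 35
branch_b = [8, 6, 11]    # sums to 25
latency = maxplus_latency([branch_a, branch_b])
print(latency)
```

Compared with the energy case, only the combining operator changes: sum over everything for energy, sum then max for latency.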

slide-43
SLIDE 43

Activity & MoA for Latency

[Figure: part c) evaluated on PE1, PE2 and the CN with linear costs 10x+1, 2x+0, 3x+0: Σ 24]

slide-44
SLIDE 44

Accuracy? No, Fidelity!

[Figure: Y-chart with labels Architecture Design, Algorithm, Application, System Prototype, and two Redesign feedback arrows]

slide-45
SLIDE 45

Current Activities


slide-46
SLIDE 46
H2020 CERBERO

  • Cross-layer Design of Reconfigurable Cyber-Physical Systems

slide-47
SLIDE 47

CERBERO System Adaptation


slide-48
SLIDE 48

H2020 CERBERO Toolchain

[Figure: toolchain layers: End-user interaction; System Model; Application / Architecture; Intermediate Representation; Runtime Support; Low-Level Implementation (Hardware Abstraction). Tools and labels: VT, AOW, DynAA, PAPI, SPIDER, JADE, ARTICO3, MDC, PREESM, MECA, C++]

slide-49
SLIDE 49

GdR SOC2

  • Groupement de recherche (research group) SOC2
    – Systems on a Chip, Connected Systems
    – Industrial partner club

slide-50
SLIDE 50

GdR SOC2


slide-51
SLIDE 51

SAMOS XIX

  • 19th edition of SAMOS Conference
  • July 7-11, submission in March


slide-52
SLIDE 52

Takeaway Message

  • MoAs can predict performance/quality early
    – Especially for HPeC systems
  • MoAs are not HW Models
    – They model HW + protocols + OS + …
  • MoAs are built/learnt top-down
    – They can and should be simple
  • The need for MoAs may rise
    – Due to Fog/Edge Computing complexity

[Figure: KPI, MoA, MoC, Act]

slide-53
SLIDE 53

Questions?


Pelcat, M; Mercat, A; Desnos, K; Maggiani, L; Liu, Y; Heulot, J; Nezan, J-F; Hamidouche, W; Ménard, D; Bhattacharyya, S (2017) "Reproducible Evaluation of System Efficiency with a Model of Architecture: From Theory to Practice", IEEE TCAD.

www.cerbero-h2020.eu http://preesm.org