PARADES
An Overview of (Electronic) System Level Design: beyond hardware-software co-design
Alberto Ferrari, Deputy Director
Source: Edward A. Lee
Gen2/GEM
GeN2-JIS
Source: Fabio Romeo, Magneti-Marelli, DAC, Las Vegas, June 20th, 2001

Unit                Memory  Lines of code  Changing rate  Effort      Validation time  Time to market  Productivity   Residual defect rate @ end of dev.
Instrument cluster  256 Kb  50,000         3 years        40 man-yr   5 months         24 months       6 lines/day    3000 ppm
PWT unit            128 Kb  30,000         2 years        12 man-yr   1 month          18 months       10 lines/day   2500 ppm
Body gateway        184 Kb  45,000         1 year         30 man-yr   2 months         12 months       6 lines/day    2000 ppm
Telematic unit      8 Mb    300,000        < 1 year       200 man-yr  2 months         < 12 months     10 lines/day*  1000 ppm
* C++ code
[Figure: automotive electronic domains (information systems/telematics, body electronics, system electronics for driving and vehicle dynamic functions) with example functions (mobile communications, navigation, firewall, access to WWW, DAB, gateway, theft warning, door module, light module, air conditioning, shift by wire, engine management, ABS, steer by wire, brake by wire), in-vehicle networks (MOST, Firewire, CAN, LIN, TTCAN, FlexRay) and criticality classes (fail stop, fail safe, fault functional; real time, soft real time, hard real time).]
Some aspects are not considered at the beginning of the development:
Node and Network
Processes and Processors
SoC: Software and Hardware
The designer wants to explore different possible implementations in order to maximize (or minimize) a cost function
Mechanical partition
Hardware partition: analog and digital
Software partition: low, middle and application level
[Figure: development V-model: System Development; Sub-System(s) Integration, Test, and Validation; Distributed System Sign-Off!]
What: Functionality
How: Architecture
Trade-offs (ES):
Computation (hw/sw)
Communication (hw/sw)
Time trigger/Event trigger
Abstractions ?
Cost evaluation ?
[Figure: development V-model: System Development; Development of Sub-System; Development of Mechanical Part(s); ECU Development; ECU/Sensors/Actuators/Mechanical Part(s) Integration, Calibration, and Test; Sub-System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Distributed System Sign-Off!]
What: Functionality
How: Architecture
Trade-offs (ES):
Algorithm complexity (hw/sw)
Sensors/Actuators
Abstractions ?
Cost evaluation ?
[Figure: development V-model: System Development; Development of Sub-System; Development of Mechanical Part(s); ECU Development; ECU SW Development; ECU HW Development; ECU SW Implementation; ECU SW Integration and Test; ECU HW Sign-Off!; ECU HW/SW Integration and Test; ECU Sign-Off!; ECU/Sensors/Actuators/Mechanical Part(s) Integration, Calibration, and Test; Sub-System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Distributed System Sign-Off!]
What: Functionality
How: Architecture
Trade-offs (ES):
Hardware
Software
Abstractions ?
Cost evaluation ?
formal/semi-formal/natural language
MoC
Language
Manual/automatic/semi-automatic
[Figure: development process (specification, analysis, development, implementation, calibration, after sales service) and function/architecture mapping: the system behavior (behavior components f1, f2, f3, e.g. from Matlab or ASCET) is mapped onto the system platform (virtual architectural components: CPUs, buses, operating systems, IPs; e.g. ECU-1, ECU-2, ECU-3 on a bus); performance analysis and refinement (down to C code) support the evaluation of architectural and partitioning alternatives.]
Define a set of abstraction layers
From specifications at a given level, select a solution (controls, components) in terms of components (platforms) of the following layer and propagate constraints
Platform components (e.g., microcontroller, RTOS, communication primitives) at a given level are abstracted to a higher level by their functionality and a set of parameters that help guide the solution selection process. The selection process is equivalent to a covering problem if a common semantic domain is used.
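The covering formulation can be illustrated with a brute-force sketch. It assumes that each required function is one bit of a mask, that each candidate platform component (the names and costs used below are hypothetical) implements a subset of those functions at a given cost, and that a solution must cover every required function at minimum total cost. Exhaustive search is only viable for a handful of components; larger instances are usually handed to ILP solvers or heuristics.

```c
#include <stdint.h>

/* Hypothetical platform component: the functions it can implement
   (one bit per function) and its cost. */
typedef struct {
    const char *name;
    uint32_t provides;
    double cost;
} component_t;

/* Returns the subset (bit i set = component i selected) of minimum total
   cost whose union covers `required`, or -1 if no subset covers it.
   Exhaustive search over 2^n subsets, so n must stay small (n <= 20). */
long min_cost_cover(const component_t *c, int n, uint32_t required,
                    double *best_cost)
{
    long best = -1;
    for (long s = 0; s < (1L << n); s++) {
        uint32_t covered = 0;
        double cost = 0.0;
        for (int i = 0; i < n; i++) {
            if (s & (1L << i)) {
                covered |= c[i].provides;
                cost += c[i].cost;
            }
        }
        if ((covered & required) == required &&
            (best < 0 || cost < *best_cost)) {
            best = s;
            *best_cost = cost;
        }
    }
    return best;
}
```

For example, with components {"mcu_a", 0x3, 5.0}, {"mcu_b", 0x7, 9.0} and {"dsp", 0x4, 3.0} and required mask 0x7, the cheapest cover is mcu_a plus dsp (selection mask 5, cost 8.0), cheaper than mcu_b alone.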
[Figure: model-based development of a distributed system: distributed system requirements; distributed system partitioning; sub-systems requirements and network protocol requirements; sub-systems model based development; sub-system(s) implementation models sign-off; network communication protocol sign-off; virtual integration of sub-system(s) with network protocol, test, and validation; sub-system(s) sign-off; sub-system(s) integration, test, and validation; distributed system sign-off.]
Platform Abstraction
Design Exploration
Partitioning
Scheduling
Estimation
Interface Synthesis (or configuration)
Component Synthesis (or configuration)
WHAT? HOW?
Specification of the system (top-down)
Architecture export (bottom-up)
Abstraction of processor, of communication infrastructure, interface between hardware and software, etc.
Partitioning
Partitioning objectives
Minimize network load, latency, jitter
Maximize speedup, extensibility, flexibility
Minimize size, cost, etc.
Partitioning strategies
Partitioning by hand
Automated partitioning using various techniques, etc.
Scheduling
Computation
Communication
Different levels:
Transaction/Packet scheduling in communication
Process scheduling in operating systems
Instruction scheduling in compilers
Operation scheduling in hardware
Modeling the partitioned system during the design process
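A minimal sketch of automated partitioning by exhaustive search. It assumes a deliberately simple model that is not taken from the slides: tasks run one after another; a task moved to hardware takes its hardware time plus a fixed communication overhead and costs its area; a task left in software costs no area. The search returns the hardware/software assignment with the smallest hardware area that still meets the deadline.

```c
/* Hypothetical per-task characterization. */
typedef struct {
    double sw_time;  /* execution time on the CPU */
    double hw_time;  /* execution time as a dedicated hardware block */
    double hw_area;  /* area cost if implemented in hardware */
} task_t;

/* Returns the mask of tasks mapped to hardware (bit i set = task i in HW)
   that meets `deadline` with minimum area, or -1 if no mapping does.
   Exhaustive over 2^n mappings: only for small n. */
long partition(const task_t *t, int n, double comm_overhead,
               double deadline, double *best_area)
{
    long best = -1;
    for (long hw = 0; hw < (1L << n); hw++) {
        double time = 0.0, area = 0.0;
        for (int i = 0; i < n; i++) {
            if (hw & (1L << i)) {
                time += t[i].hw_time + comm_overhead;  /* HW plus transfer */
                area += t[i].hw_area;
            } else {
                time += t[i].sw_time;                  /* stays on the CPU */
            }
        }
        if (time <= deadline && (best < 0 || area < *best_area)) {
            best = hw;
            *best_area = area;
        }
    }
    return best;
}
```

With three hypothetical tasks (sw_time, hw_time, hw_area) = (10, 2, 4), (8, 1, 6), (3, 1, 1), an overhead of 1 and a deadline of 15, only task 0 goes to hardware (mask 1, area 4); with a deadline of 5 no mapping is feasible.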
hide unnecessary details
expose only relevant parameters for the next step
Intercom Platform (BWRC, 2001)
[Figure: Intercom platform: Tensilica Xtensa RISC CPU, wireless protocol processor, baseband processor, Xilinx FPGA, ASICs, Flash, SRAM, ADC, DAC, RF front end, speech samples interface, UART interface and external bus interface, connected through a Sonics Silicon Backplane bus.]
[Figure: platform-based design: the application space (application instance) and the architectural space (platform instance) meet at the system (software + hardware) platform, through platform mapping and platform design-space export.]
Function Space Architecture Platform
Semantic Platform Function Space
[Figure: successive platform refinement between the application space and the implementation space through Platform 4, Platform 3, Platform 2 and Platform 1: an instance of platform i+1 is mapped onto platform i (platform mapping) and platform i exports its design space upward (platform design-space export), refining the application instance through the plat.3 and plat.2 instances into the implementation instance.]
Figure legend: 1 Transmission ECU; 2 Actuation group; 3 Engine ECU; 4 DBW; 5 Active shift display; 6/7 Up/Down buttons; 8 City mode button; 9 Up/Down lever; 10 Accelerator pedal position sensor; 11 Brake switch.
Phases: Subsystem Partitioning; Subsystem Integration; Software Design: Control Algorithms, Data Processing; Physical Implementation and Production.
[Figure: powertrain design flow (stages A2 to A5): from the powertrain system specifications and behavior, functional decomposition into a functional network and capture of the system architecture; capture of the electrical/mechanical architecture with partitioning and optimization and design of mechanical components; capture of the electronic architecture with HW/SW partitioning; electronic system mapping (operations and macro architecture) onto an operational architecture (ES), with operation refinement and performance back-annotation; HW and SW components implementation; verification of functions, components and performance. Note in the figure: only SW components.]
A rigorous design of functions requires a mathematical framework
The functional description must be an invariant of the design
The mathematical model should be expressive enough to capture the functions easily
Functions of a different nature might be better captured by heterogeneous models of computation (e.g. finite state machines, data flow)
The functional design requires the abstraction of
Time (i.e. untimed model)
Time appears only in constraints that involve interactions with the environment
Data type (i.e. infinite precision)
Any implementation MUST be a refinement of this abstraction (i.e. functionality is “guaranteed”):
E.g. untimed -> logic time -> time
E.g. infinite precision -> float -> fixed point
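The data-type refinement chain can be made concrete by computing the same weighted sum at each level and comparing it with the reference. This sketch assumes double as a stand-in for infinite precision, a Q15 format (16-bit signed, 15 fractional bits) for the fixed-point level and a 32-bit accumulator; these choices are illustrative, not prescribed by the slides.

```c
#include <stdint.h>

/* Reference level: "infinite precision", approximated by double. */
double dot_double(const double *a, const double *x, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) acc += a[i] * x[i];
    return acc;
}

/* First refinement: single-precision float. */
float dot_float(const double *a, const double *x, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) acc += (float)a[i] * (float)x[i];
    return acc;
}

/* Convert a value in [-1, 1) to Q15, rounding half away from zero and
   saturating at the representable limits. */
int16_t to_q15(double v)
{
    double s = v * 32768.0;
    s = (s >= 0.0) ? s + 0.5 : s - 0.5;
    if (s > 32767.0) s = 32767.0;
    if (s < -32768.0) s = -32768.0;
    return (int16_t)s;               /* cast truncates toward zero */
}

/* Second refinement: Q15 operands, Q30 products summed in 32 bits,
   result converted back to real units for comparison. Assumes the sum of
   |a[i] * x[i]| stays below 2, so the accumulator cannot overflow. */
double dot_q15(const double *a, const double *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)to_q15(a[i]) * (int32_t)to_q15(x[i]);
    return (double)acc / (32768.0 * 32768.0);
}
```

Values that are exact in Q15 (such as 0.5, 0.25, -0.125) give identical results at all three levels, while a coefficient like 0.1 introduces a small refinement error that the designer must check against the functional tolerance.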
FSMs
Discrete Event Systems
CFSMs
Data Flow Models
Petri Nets
The Tagged Signal Model
Synchronous Languages and Desynchronization
Heterogeneous Composition: Hybrid Systems and Languages
Interface Synthesis and Verification
Trace Algebra, Trace Structure Algebra and Agent Algebra
Definition (model of computation): a mathematical description that has a syntax and rules for computation of the behavior described by the syntax (semantics). Used to specify the semantics.
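As an instance of this definition for the simplest model in the list, consider a finite state machine: its syntax is the transition table, and its semantics is the step function that, at each reaction, consumes one input and produces the next state and an output. The gear-selection example and all names in it are hypothetical.

```c
/* Hypothetical gear-selection FSM driven by up/down buttons. */
typedef enum { GEAR_1, GEAR_2, GEAR_3, N_STATES } state_t;
typedef enum { IN_NONE, IN_UP, IN_DOWN, N_INPUTS } input_t;

/* Syntax: the transition table next_state[state][input]. */
static const state_t next_state[N_STATES][N_INPUTS] = {
    /*             NONE    UP      DOWN   */
    /* GEAR_1 */ { GEAR_1, GEAR_2, GEAR_1 },
    /* GEAR_2 */ { GEAR_2, GEAR_3, GEAR_1 },
    /* GEAR_3 */ { GEAR_3, GEAR_3, GEAR_2 },
};

/* Semantics: one reaction consumes one input symbol, updates the state
   and emits an output (1 if a gear change occurred, 0 otherwise). */
int fsm_step(state_t *s, input_t in)
{
    state_t old = *s;
    *s = next_state[old][in];
    return *s != old;
}
```

Because the table is total (every state has a successor for every input), each reaction is deterministic and always defined, which is what makes FSMs easy to analyze and to synthesize into software or hardware.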
One way to get all of these is to mix diverse, simple models of computation, while keeping compilation, synthesis, and verification separate for each MoC. To do that, we need to understand these MoCs relative to one another, and understand their interaction when combined in a single system design.
time/synchronization
concurrency
heterogeneity
how to specify behavior
how to specify communication
implementability
composability
availability of tools for validation and synthesis
Main MoCs:
Communicating Finite State Machines
Dataflow Process Networks
Petri Nets
Discrete Event
Codesign Finite State Machines
Synchronous Reactive
Task Programming Model
Main languages:
StateCharts
Esterel
Dataflow networks
Simulink
UML
* is dealing with
*A. Benveniste and G. Berry: The synchronous approach to reactive and real-time systems, Proc IEEE, 1991
+ G
(*) S. A. Edwards and E. A. Lee, “The semantics and execution of a synchronous block-diagram language”, Science of Computer Programming, 48(1):21–42, July 2003.
Priorities can be static or dynamic
Locally: via shared variables
Globally: via communication network
Functions
Functional Networks
bus
Resources
Topologies
Solution Patterns
Mapping
Solution n+1 Evaluation and Iteration
From:
a model of the functionality (e.g. TPM or SPM)
a model of the platform (abstraction of topology, network protocol, CPU, Hw/Sw, etc.)
Allocate:
The tasks to the nodes
The communication signals to the network segments
Schedule:
The task sets in each node
The packets (mapping signals) in each network segment
Such that:
The system is schedulable and the cost is minimized
Design solutions:
Architectural constraints
Analytical approaches
Simulation models
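For the "system is schedulable" condition on one node, one classical analytical approach (chosen here for illustration; the slides do not prescribe it) is response-time analysis for fixed-priority preemptive scheduling: R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, iterated to a fixed point and compared with the deadline. The sketch assumes independent periodic tasks, deadlines equal to periods, integer time units, and an array already sorted by decreasing priority.

```c
/* Periodic task: worst-case execution time C and period T (deadline = T),
   in integer time units. Array index 0 is the highest priority. */
typedef struct { long C, T; } rt_task_t;

/* Worst-case response time of task i, or -1 if it exceeds its deadline. */
long response_time(const rt_task_t *t, int i)
{
    long r = t[i].C, prev = 0;
    while (r != prev) {
        prev = r;
        r = t[i].C;
        for (int j = 0; j < i; j++)      /* interference of higher priorities */
            r += ((prev + t[j].T - 1) / t[j].T) * t[j].C;  /* ceil(prev/Tj)*Cj */
        if (r > t[i].T) return -1;       /* deadline miss */
    }
    return r;
}

/* The task set is schedulable if every task meets its deadline. */
int schedulable(const rt_task_t *t, int n)
{
    for (int i = 0; i < n; i++)
        if (response_time(t, i) < 0) return 0;
    return 1;
}
```

For (C, T) = (1, 4), (2, 6), (3, 12) the response times are 1, 3 and 10, so the set is schedulable; in an allocation loop, such a check would be evaluated for each node of each candidate mapping.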
Communication and computation are synchronized and MUST HAPPEN AND COMPLETE in a given cyclic time-division schema
Time-Triggered Architecture (TTA), C. Scheidler, G. Heiner, R. Sasse, E. Fuchs, H. Kopetz
Find the optimal allocation and scheduling of a time-triggered TPM
An Improved Scheduling Technique for Time-Triggered Embedded Systems, Paul Pop, Petru Eles, and Zebo Peng
Extensible and Scalable Time Triggered Scheduling, Wei Zheng, Jike Chong, Claudio Pinello, Sri Kanajan, Alberto L. Sangiovanni-Vincentelli
Models of bus/network speed and topology (Hw) and WCET (Hw/Sw) are needed
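In a time-triggered system, the result of allocation and scheduling is a static table that each node replays every cycle, driven by the synchronized global time. A minimal dispatcher sketch, assuming a hypothetical 10-slot cycle and made-up task names; computing a good table is the optimization problem addressed by the works cited here and is not shown.

```c
#include <string.h>

#define SLOTS_PER_CYCLE 10   /* hypothetical: 10 slots of 1 ms each */

/* Static schedule table produced offline: the task (or message) that
   owns each slot, NULL for an idle slot. */
static const char *const schedule[SLOTS_PER_CYCLE] = {
    "read_sensors", "control_brake", NULL, "send_frame", NULL,
    "read_sensors", "control_steer", NULL, "send_frame", NULL,
};

/* Dispatcher: given the global time in slots, return the entry owning the
   current slot. The same table repeats every cycle. */
const char *tt_dispatch(unsigned long global_slot)
{
    return schedule[global_slot % SLOTS_PER_CYCLE];
}

/* Count how many times `task` runs per cycle (e.g. to check its rate). */
int activations_per_cycle(const char *task)
{
    int count = 0;
    for (int s = 0; s < SLOTS_PER_CYCLE; s++)
        if (schedule[s] != NULL && strcmp(schedule[s], task) == 0) count++;
    return count;
}
```

Since the table is fixed at design time, timing behavior can be verified offline; the price is that any change in tasks or messages requires regenerating the table.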
The worst-case execution time of tasks and the communication time of each message are known. Construct a correct static schedule for the TT tasks and ST messages (a
Holistic Scheduling and Analysis of Mixed Time/Event-Triggered Distributed Embedded Systems (2002) Traian Pop, Petru Eles, Zebo Peng
“Network calculus”, J.-Y. Le Boudec and P. Thiran, Lecture Notes in Computer Science vol. 2050, Springer Verlag
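A hedged illustration of the basic network-calculus bounds: for a flow with token-bucket arrival curve alpha(t) = b + r*t served by a node with rate-latency service curve beta(t) = R * max(0, t - T), and r <= R, the delay is bounded by T + b/R and the backlog by b + r*T. The struct and function names are illustrative; units only need to be consistent (bits and seconds in the example).

```c
/* Token-bucket arrival curve: burst b (bits), sustained rate r (bit/s). */
typedef struct { double b, r; } arrival_t;
/* Rate-latency service curve: rate R (bit/s), latency T (s). */
typedef struct { double R, T; } service_t;

/* Worst-case delay bound (horizontal deviation between the curves).
   Returns a negative value if r > R: the backlog then grows unbounded. */
double nc_delay_bound(arrival_t a, service_t s)
{
    if (a.r > s.R) return -1.0;
    return s.T + a.b / s.R;
}

/* Worst-case backlog bound (vertical deviation); negative if r > R. */
double nc_backlog_bound(arrival_t a, service_t s)
{
    if (a.r > s.R) return -1.0;
    return a.b + a.r * s.T;
}
```

For instance, a 1000-bit burst at a sustained 100 kbit/s through a 500 kbit/s segment with 2 ms latency gives a delay bound of 4 ms and a backlog bound of 1200 bits, which sizes the buffer in the bus adapter.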
Px transformation based on:
Output event dependency
WCET
BCET
Provides:
Schedulability check
Output stream models
Other strategies to search for solutions (allocation and scheduling)
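One common way to realize such a transformation (used here for illustration; the slides do not say which event model is used) is the periodic-with-jitter event model: a task activated by events of period P and jitter J, whose response time lies between a best case (at least the BCET) and a worst case (from WCET-based analysis), emits an output stream with the same period and jitter J + (R_max - R_min). The bound ceil((dt + J) / P) on the number of events in any window of length dt then feeds the analysis of the next task or message.

```c
/* Periodic-with-jitter event model: period P, jitter J (same time unit). */
typedef struct { double P, J; } event_model_t;

/* Propagate an event model through a task whose response time lies in
   [r_min, r_max]. The period is preserved; the jitter grows by the width
   of the response-time interval. */
event_model_t propagate(event_model_t in, double r_min, double r_max)
{
    event_model_t out = { in.P, in.J + (r_max - r_min) };
    return out;
}

/* Upper bound on the number of events in any window of length dt > 0:
   ceil((dt + J) / P). */
long max_events(event_model_t m, double dt)
{
    double x = (dt + m.J) / m.P;
    long k = (long)x;                 /* truncation, x is positive */
    return (x > (double)k) ? k + 1 : k;
}
```

A strictly periodic stream with P = 10 passing through a task with response times between 2 and 5 leaves with jitter 3, so a 10-unit window can now contain two events instead of one.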
[Figure: communication pattern between Task_A and Task_B through port "in": Post() from Task_A, Value()/Enabled() from Task_B. Sender and receiver each run on an RTOS with a C library (CLib) on a CPU that reaches memory through a CPU port, bus adapter, local bus and bus arbiter (slave adapter, memory); inter-node communication goes through a network layer (NetwLayer), device driver, LLC/MAC bus adapter, controller network and network bus.]
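The communication pattern separates the primitives used by the application from their implementation on the platform. A minimal single-node sketch, assuming the semantics of a one-place overwriting buffer: Post() from the sender writes a new value, Enabled() tells the receiver whether an unread value is present, and Value() returns the latest value. Only the names Post, Value and Enabled come from the figure; the signatures and semantics here are assumptions, and a real mapping would go through the RTOS, bus adapters or network stack shown in the figure.

```c
/* One-place, overwriting buffer connecting Task_A to port "in" of Task_B.
   Not thread-safe: on a real platform the RTOS or the communication layer
   would protect or double-buffer these accesses. */
typedef struct {
    int value;
    int fresh;   /* 1 if a value was posted and not yet read */
} port_t;

/* Sender side (Task_A): publish a value, overwriting any unread one. */
void Post(port_t *p, int v)
{
    p->value = v;
    p->fresh = 1;
}

/* Receiver side (Task_B): is there an unread value? */
int Enabled(const port_t *p)
{
    return p->fresh;
}

/* Receiver side (Task_B): read the latest value and mark it as consumed. */
int Value(port_t *p)
{
    p->fresh = 0;
    return p->value;
}
```

With these semantics, two consecutive Post() calls lose the first value, which suits sampled control signals; event-like data would instead need a queue, a different pattern with a different cost on the platform.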
Cadence SYSDESIGN
[Figure: Cadence SYSDESIGN model of My_Vehicle_Application: projects Project_Car_v06, Project_Brake_Control_v06, Project_Steer_Control_v06 and Project_Driver, with functions f1 to f20 grouped into tasks (Init, Task_1ms, Task_2ms, Task_10ms, SW_IRQ1) on processors P1 to P3, with elements M1 to M9; blocks include Car_brake, Car_steer, Plant_brake, Plant_steer, Control_brake, Control_steer, Vote_brake, Vote_steer, Interrupt_counter, Prc_count and Driver; injected faults: corrupt data, single disconnect, double disconnect.]
Requires a model of the functionality and performance models of CPUs and network protocols. It is trace based!
Specifications given at a high level of abstraction:
known input/output relation (or properties) and constraints
Control algorithms design
Mapping to different architectures using performance estimation techniques and automatic code generation from models
Mechanical/Electronic architecture selected among a set of candidates
M different hw/sw implementation architectures; for each hw/sw implementation architecture $m \in \{1,\dots,M\}$, a set of implementation parameters $z \in X_Z$,
e.g. CPU clock, task priorities, hardware frequency, etc.
[Figure: ECU software components: μControllers library; OSEK RTOS; OSEK COM; I/O drivers & handlers (> 20 configurable modules); application programming interface; boot loader; transport (KWP 2000, CCP); application-specific software (speedometer, tachometer, water temp., odometer); libraries and customer libraries.]
controller structure and parameters ($r \in R$, $c \in X_C$) are selected in order to satisfy system specifications
implementation architecture and parameters ($m \in \{1,\dots,M\}$, $z \in X_Z$) are selected in order to minimize implementation cost
if system specifications are not met, the design cycle is repeated
both controller and architecture options ($r, c, m, z$) are selected at the same time to
minimize implementation cost
satisfy system specifications
too complex!!
which exposes ONLY the implementation non-idealities that affect the performance of the controlled plant, e.g.
S different implementation architectures
for each implementation architecture $s \in \{1,\dots,S\}$,
a set of implementation parameters $p$
an admissible set $X_P$ of values for $p$
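[Figure: closed control loop (controller and plant, signals r, u, w, y, disturbance d) in which the implementation adds delays $\Delta_u, \Delta_r, \Delta_w$ and additive errors $n_u, n_r, n_w$ to the corresponding signals.]
$\Delta_u, \Delta_r, \Delta_w$: time non-idealities (control loop delays, sample & hold, etc.)
$n_u, n_r, n_w$: value non-idealities (quantization error, computation imprecision, etc.)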
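The effect of these time and value non-idealities on the controlled plant can be explored by simulation before committing to an architecture. The sketch assumes a scalar discrete-time plant x[k+1] = a*x[k] + b*u[k], a proportional controller u = -K*x, a control delay $\Delta_u$ of a whole number of samples, and uniform quantization of u with step q as a simple model of $n_u$; the plant, the gain and the cost (sum of x^2) are illustrative choices, not taken from the slides.

```c
#define MAX_DELAY 8

/* Quantize v to the nearest multiple of q (q <= 0 means no quantization). */
static double quantize(double v, double q)
{
    if (q <= 0.0) return v;
    double k = v / q;
    long n = (long)(k >= 0.0 ? k + 0.5 : k - 0.5);
    return (double)n * q;
}

/* Simulate `steps` samples from x0 and return the final state; the
   accumulated cost sum(x^2) is written to *cost. `delay` models Delta_u in
   samples (clamped to 0..MAX_DELAY) and `q` models n_u. Until the first
   computed command arrives, the actuator holds 0. */
double simulate(double a, double b, double K, double x0,
                int delay, double q, int steps, double *cost)
{
    double pipe[MAX_DELAY + 1] = {0};   /* commands in flight */
    double x = x0;
    if (delay < 0) delay = 0;
    if (delay > MAX_DELAY) delay = MAX_DELAY;
    *cost = 0.0;
    for (int k = 0; k < steps; k++) {
        *cost += x * x;
        pipe[delay] = quantize(-K * x, q);  /* newest command enters */
        double u = pipe[0];                 /* oldest command is applied */
        for (int i = 0; i < delay; i++) pipe[i] = pipe[i + 1];
        x = a * x + b * u;
    }
    return x;
}
```

With a = b = 1, K = 0.5 and x0 = 1, three ideal steps end at x = 0.125 with cost 1.3125, while a one-sample delay raises the cost to 2.25; sweeping delay and q over the admissible parameter set gives a first ranking of candidate architectures.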
Model and Simulation files
Time History Simulation Results Calibration data Simulink Model
Define ECU Hardware/Software Partitioning
Platform instance structure selection
Software Implementation Hardware (SoC) Design and Implementation
Software generation is possible today from several MoCs and languages:
StateCharts, Dataflow, SR, …
Implement the same MoC as the specification or guarantee the equivalence
Fit into the chosen software architecture to maximize reuse at component level
E.g. AUTOSAR for automotive
Define application and platform software architecture
[Figure: ECU software architecture: application software and sensor/actuator layer on top of the software platform (API services), which covers the RTOS, BIOS, device drivers and network over the CPUs and the ECU input/output devices.]
The software platform is cross-application and cross-HW-platform, and is composed of parameterized software components (sources).
The software application is composed of model-based and hand-written application-dependent software components (sources).
control-logic
data-flow computational blocks
Mapping a functional model to software platform:
Data refinement
Software platform services mapping (communication and computation)
Time refinement (scheduling)
Data refinement
Float to Fixed Point Translation.
Range, scaling and size setting (by the designer).
Worst case analysis for internal variable ranges and scaling.
Signals and parameters to C-variables mapping.
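The range, scaling and size setting can be sketched with a binary-point-only rule: from the worst-case range [min, max] of a variable and a signed word length, pick the largest number of fractional bits that still avoids overflow, then convert values with rounding. Richer scalings (e.g. slope and bias) and negative fraction lengths are not modelled, and the function names are hypothetical.

```c
#include <stdint.h>

/* Largest number of fractional bits f in [0, bits-1] such that every value
   in [min_v, max_v] fits a signed `bits`-wide word scaled by 2^-f, i.e.
   max_v * 2^f <= 2^(bits-1) - 1 and min_v * 2^f >= -2^(bits-1).
   Returns -1 if even f = 0 cannot hold the range. Assumes 2 <= bits <= 32. */
int frac_bits_for_range(double min_v, double max_v, int bits)
{
    double max_int = (double)((1LL << (bits - 1)) - 1);
    double min_int = -(double)(1LL << (bits - 1));
    for (int f = bits - 1; f >= 0; f--) {
        double scale = (double)(1LL << f);
        if (max_v * scale <= max_int && min_v * scale >= min_int)
            return f;
    }
    return -1;
}

/* Convert a real value to its fixed-point integer with f fractional bits,
   rounding half away from zero. No saturation: the range check above is
   assumed to have been done. */
int32_t to_fixed(double v, int f)
{
    double s = v * (double)(1LL << f);
    return (int32_t)(s >= 0.0 ? s + 0.5 : s - 0.5);
}
```

For a worst-case range of [-3, 5] in a 16-bit word, the sketch selects 12 fractional bits, so 1.5 is stored as 6144 and the resolution is 2^-12.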
Software platform model:
variables and services (naming).
Variable access methods are mapped to variable classes.
execution model:
Multi-rate subsystems are implemented as multi-task software components scheduled by an OSEK/VDX standard RTOS
Time refinement
Task scheduling
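A minimal sketch of how multi-rate subsystems map to multi-task software: each rate becomes a task whose period is a multiple of a base tick, and priorities follow the rates (faster rate, higher priority, i.e. rate-monotonic). The 1, 2 and 10 ms periods are hypothetical, and the function only computes which tasks are released at a given tick; on an OSEK/VDX system the release would typically be done with alarms or ActivateTask(), which are not modelled here.

```c
#define N_RATES 3

/* Hypothetical rate groups, sorted rate-monotonically:
   index 0 = fastest rate = highest priority. 1 tick = 1 ms. */
static const unsigned period_ticks[N_RATES] = { 1, 2, 10 };

/* Fill `ready` with the indices of the tasks released at `tick`,
   highest priority first; return how many were released. */
int released_tasks(unsigned long tick, int ready[N_RATES])
{
    int n = 0;
    for (int i = 0; i < N_RATES; i++)
        if (tick % period_ticks[i] == 0)
            ready[n++] = i;
    return n;
}
```

At tick 0 all three tasks are released and the 1 ms task runs first; data exchanged between tasks of different rates still needs protection (e.g. double buffering), which is part of the time refinement.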
From high level models:
computation
services
Flow examples: ASCET, Simulink/eRTW/TargetLink, UML
Components              Modelled components           SLOC    % of model-compiled SLOC
Platform components     26 hand-coded                 26500   0%
Application components  86 auto-coded, 13 hand-coded  93600   90%

Memory occupation (% of total)  ROM %  RAM %
Platform                        17.9   2.9
Application                     82.1   97.1
[Figure: platform-based design of the ECU: the application space (features, application instances) and the architectural space (performance) meet at the system platform (no ISA), defined by the platform API of the software platform (RTOS, BIOS, device drivers, network communication); platform design-space exploration leads from the platform specification to a platform instance. Example hardware platforms (output/input devices, I/O, hardware network) based on a dual-core, a Hitachi and an ST10 microcontroller run the same application software.]
[Figure: tool flow: models in different languages and MoCs (Simulink, UML, ASCET, StateMate) are exported to a unique C/C++/SystemC representation; platform models (platform export, platform non-idealities) are built in C/C++/SystemC; mapping and C/C++/SystemC generators produce a simulator for simulation and performance estimation (performance traces), supporting integration, algorithm analysis and code generation (synthesis).]
[Figure: abstraction levels, trading abstraction against accuracy.
Computation: Untimed Functional (UTF, function); Time Functional (TF, +computation time); Register Transfer (RT, +clock cycle).
Communication (I/F): Untimed Functional (token); Programmers View (PV, +address); Programmers View + Time (PVT, +transaction time); Bus Cycle Accurate (BCA, +clock cycle); Pin Cycle Accurate (PCA, +pin/clock).
Communication layers and what each abstraction removes: Message (L-3): time, resource sharing; Transaction (L-2): clocks, protocols; Transfer (L-1): wire registers; RTL (L-0): gates.]
[Figure: four mappings ("zero", "uno", "due", "tre") compared by CPU load (%), IRQ/s, task switching (activations/s) and number of tasks.]
Higher reliability
Allow slower hardware, smaller size, lower power consumption
Static estimation
Determination of runtime properties at compile time
Most of the (interesting) properties are undecidable => use approximations
An approximate program analysis is safe if its results can always be depended on.
E.g. WCET, BCET
Quality of the results (precision) should be as good as possible
Dynamic estimation
Determination of properties at runtime
DSP Processors
relatively data independent
most time spent in hand-coded kernels
static data-flow consumes most cycles
small number of threads, simple interrupts
Regular processors
arbitrary C, highly data dependent
commercial RTOS, many threads
complex interrupts, priorities
The structure of the code (program path analysis)
E.g. loops and false paths
The system on which the software will run (micro-architecture modeling)
CPU (ISA, interrupts, etc.), HW (cache, etc.), OS, Compiler
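Program path analysis can be sketched as a longest-path computation: once each loop is collapsed into a single node whose cost is the body cost times a known loop bound, the control-flow graph is a DAG, and the WCET bound is the heaviest entry-to-exit path (the BCET bound is the lightest). The block costs are hypothetical cycle counts that would come from micro-architecture modeling; false paths are not excluded, so the bound is safe but possibly pessimistic. Industrial tools typically use implicit path enumeration (ILP) instead of this explicit traversal.

```c
#define MAX_BLOCKS 16

/* Control-flow DAG with blocks numbered in topological order (0 = entry,
   n-1 = exit). cost[i] is the cycle count of block i; a collapsed loop has
   cost = loop_bound * body_cost. succ[i][j] != 0 means an edge i -> j. */
typedef struct {
    int n;
    long cost[MAX_BLOCKS];
    int succ[MAX_BLOCKS][MAX_BLOCKS];
} cfg_t;

/* Longest (worst != 0) or shortest (worst == 0) entry-to-exit path cost,
   or -1 if the exit is unreachable. */
long path_bound(const cfg_t *g, int worst)
{
    long best[MAX_BLOCKS];
    int reached[MAX_BLOCKS] = {0};
    best[0] = g->cost[0];
    reached[0] = 1;
    for (int i = 0; i < g->n; i++) {          /* visit in topological order */
        if (!reached[i]) continue;
        for (int j = i + 1; j < g->n; j++) {
            if (!g->succ[i][j]) continue;
            long c = best[i] + g->cost[j];
            if (!reached[j] || (worst ? c > best[j] : c < best[j])) {
                best[j] = c;
                reached[j] = 1;
            }
        }
    }
    return reached[g->n - 1] ? best[g->n - 1] : -1;
}
```

For an if-then-else (20 or 8 cycles) followed by a loop bounded at 10 iterations of 4 cycles, with a 5-cycle entry and 2-cycle exit, the bounds are WCET = 67 and BCET = 55 cycles.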
Low-level
e.g. gate-level, assembly-language level
Easy and accurate, but long design iteration time
High/system-level
Fast: reduces the exploration time of the design space
Accurate “enough”: approximations are required
Processor model must be cheap
“what if” my processor did X; future processors not yet developed; evaluation of processors not currently used
Must be convenient to use
no need to compile with cross-compilers and debug on my desktop
Pros:
does not require target software development chain (uses host compiler)
fast simulation model generation and execution
simple and cheap generation of a new processor model
Needed when target processor and compiler not available
Cons:
hard to model target compiler optimizations (requires a “best in class” virtual compiler that can also perform C-to-C optimization for the target compiler)
low precision, especially for data memory accesses
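The host-based approach can be sketched as source annotation: the application C code is instrumented so that each basic block adds its estimated target cycle count to a counter, and the annotated code is compiled with the host compiler and run natively. The per-block costs below are invented placeholders; obtaining them from a processor model is the hard part and is not shown, which is also why target compiler optimizations and data-memory accesses are modeled poorly.

```c
/* Estimated target cycles accumulated while the code runs on the host. */
static unsigned long est_cycles = 0;

/* Annotation macro inserted at the start of each basic block. */
#define CYCLES(n) (est_cycles += (n))

/* Application function (upper-saturating sum of an array), annotated per
   basic block with hypothetical target cycle counts. */
int sat_sum(const int *v, int n, int limit)
{
    CYCLES(4);                      /* entry block: prologue, init */
    int acc = 0;
    for (int i = 0; i < n; i++) {
        CYCLES(3);                  /* loop body: load, add, compare */
        acc += v[i];
        if (acc > limit) {
            CYCLES(2);              /* saturation branch */
            acc = limit;
        }
    }
    CYCLES(2);                      /* exit block: epilogue, return */
    return acc;
}

unsigned long estimated_cycles(void) { return est_cycles; }
void reset_estimate(void) { est_cycles = 0; }
```

Because the estimate follows the actual execution path, it is data dependent: summing {1, 2, 3} costs 15 estimated cycles, while {50, 60} with a limit of 100 takes the saturation branch and costs 14.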
generally available from processor IP provider
considers target compiler optimizations and real data and code addresses
requires target software development chain and full application (boot, RTOS, interrupt handling, etc.)
different integration problem for every vendor (and often for every CPU)
may be difficult to support communication models that require waiting to complete an I/O or synchronization operation
Support successive refinement for both processors and bus models
Depending on the abstraction level, simulation performance of 100 to 200 Kcycles/s
[Figure: validation steps from model based to code based: model level (untimed, host data types); UF platform-in-the-loop, C code on an untimed functional platform model (untimed, target data types); TF/RT platform-in-the-loop, C code on a timed platform model (timed, target data types); real target.]
Control Specification
Platform Abstraction
void integratutto4_initializer(void)
{
    /* Initialize machine's broadcast event variable */
    _sfEvent_ = CALL_EVENT;
    _integratutto4MachineNumber_ =
        sf_debug_initialize_machine("integratutto4", "sfun", 0, 3, 0, 0, 0);
    sf_debug_set_machine_event_thresholds(_integratutto4MachineNumber_, 0, 0);
    sf_debug_set_machine_data_thresholds(_integratutto4MachineNumber_, 0);
}
Hardware, computation:
Cores:
Core selection
Core instantiation
Coprocessors:
Selection (Peripherals)
Configuration/Synthesis
Instructions:
ISA definition (VLIW)
ISA Extension Flow
Hardware, communication:
Buses
Networks
Software, granularity:
Set of Processes
Process/Thread
Instruction sequences
Instructions
Software, layers:
RTOS
HAL
Middle layers
Hardware implementation of RTOS
Parameterization of a configurable processor
Custom extension of the ISA
Define a new ISA (e.g. VLIW)
[Figure: traditional SoC with RAM, ROM, hardwired logic, a general-purpose 32b CPU, A/D, I/O and PHY.]
General-purpose CPUs used in traditional SoCs are not fast enough for data-intensive applications, don’t have enough I/O or compute bandwidth, and lack efficiency.
Hardwired Logic
to parallelism
in/out of the block
familiar to many
But …
Courtesy of Grant Martin, Chief Scientist, Tensilica
Figure: hardware/software co-design flow. High-level models (TLM/Simulink, SystemC/C++ models) are translated into an intermediate representation (IR: control/data flow graph). Chunk identification and system partitioning split the design into hardware and software. Hardware side: high-level synthesis, hardware implementations, hardware cost estimation, hardware refinement. Software side: software extraction, software compilation, software cost estimation, software refinement. HW/SW integration, performance estimation and cost function evaluation close the loop, followed by HW/SW co-verification.
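The partitioning and cost-function-evaluation loop can be illustrated with a minimal C sketch. It assumes that each chunk comes with a software execution time, a hardware execution time and a hardware area (hypothetical estimates in arbitrary units), that chunks run sequentially so times add up, and it exhaustively evaluates every hardware/software assignment, keeping the fastest one that fits an area budget. Real flows rely on estimators and heuristics rather than exhaustive search; `chunk_t` and `best_partition` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk estimates (units are arbitrary). */
typedef struct {
    double sw_time;  /* estimated time if compiled to software   */
    double hw_time;  /* estimated time if synthesized to hardware */
    double hw_area;  /* estimated hardware cost                   */
} chunk_t;

/* Evaluates all 2^n partitions (bit i set = chunk i in hardware) and
   returns the mask with minimum total time whose area fits the budget. */
unsigned best_partition(const chunk_t *c, size_t n, double area_budget,
                        double *best_time)
{
    assert(n < 8 * sizeof(unsigned));
    unsigned best_mask = 0;
    double best = -1.0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        double time = 0.0, area = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (mask & (1u << i)) {
                time += c[i].hw_time;
                area += c[i].hw_area;
            } else {
                time += c[i].sw_time;
            }
        }
        /* cost function: total time, subject to the area constraint */
        if (area <= area_budget && (best < 0.0 || time < best)) {
            best = time;
            best_mask = mask;
        }
    }
    *best_time = best;
    return best_mask;
}
```

With three chunks {10, 2, 5}, {8, 4, 3}, {6, 1, 4} and an area budget of 6, only the first chunk moves to hardware (total time 16); raising the budget to 7 makes the second and third chunks the better pair (total time 15). The search grows as 2^n, which is why practical flows use heuristics.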
DK Design Suite, Cynthesizer
Loosely coupled coprocessor that:
delivers the parallel processing resources of a custom processor;
is automatically synthesized as a programmable coprocessor (hardware and software) from the software executable;
maximizes system performance through memory-access and bus-communication optimizations.
Bottleneck identification:
Analyze the profiling results of the application software running on the main microprocessor.
Manually identify the specific tasks to be migrated to the coprocessor.
Architecture synthesis and performance estimation:
User-defined constraints such as gate count, clock cycle count, and bus utilization.
Analysis of the instruction code, and architecting of the coprocessor to deploy the maximum parallelism consistent with the input constraints.
Estimation of gate count and performance, including estimates of the communication overhead with the main processor.
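Before any hardware exists, the benefit of migrating a task can be approximated with an Amdahl's-law-style formula extended by a communication term. This is a generic back-of-the-envelope sketch, not the tool's estimator: `f` is the fraction of the original run time spent in the migrated task, `s` the coprocessor speedup on that task, and `c` the handoff overhead expressed as a fraction of the original run time (all assumed inputs).

```c
#include <assert.h>

/* Overall speedup when a fraction f of the original run time is moved to a
   coprocessor that runs it s times faster, plus a communication overhead c
   (parameter/result transfer, as a fraction of the original run time). */
double coproc_speedup(double f, double s, double c)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0 && c >= 0.0);
    double new_time = (1.0 - f) + f / s + c;   /* original run time = 1 */
    return 1.0 / new_time;
}
```

Accelerating 75% of the run time by 3x with no overhead halves the run time (speedup 2.0), while accelerating half of it by 2x with an overhead of a quarter of the original run time gains nothing (speedup 1.0). This is why the communication overhead is estimated explicitly.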
Coprocessor performance and “what-if” analysis:
Generation of an instruction- and bit-accurate C model of the coprocessor architecture, used in conjunction with the main processor's instruction-set simulator (ISS).
Typical analyses: performance profiling, memory-access activity, and activation trace data.
The model is also used to validate the coprocessor within a standard C or SystemC simulation environment.
Hardware synthesis and microcode generation:
Generation of the coprocessor hardware, delivering synthesizable RTL code in either VHDL or Verilog, and of the circuitry needed to enable the coprocessor to communicate with the main processor's bus interface.
Generation of the coprocessor microcode.
The original executable code is automatically modified so that function calls are directed to a communications library.
This library manages the coprocessor handoff. It also communicates parameters and results between the main processor and the coprocessor.
Microcode can be generated independently of the coprocessor hardware, allowing new microcode to be targeted at an existing coprocessor design.
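The call redirection can be pictured with a small C sketch. Everything in it is hypothetical: the coprocessor is simulated by an ordinary C function behind a register-like structure, and `coproc_regs_t`, `coproc_call` and `dot_product_hw` are illustrative names, not the tool's library. It only shows the handoff pattern: write the parameters, start, wait for completion, read the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block used by the communications library. */
typedef struct {
    const int32_t *arg_a;      /* parameter: first vector  */
    const int32_t *arg_b;      /* parameter: second vector */
    uint32_t len;              /* parameter: vector length */
    volatile uint32_t start;
    volatile uint32_t done;
    int64_t result;
} coproc_regs_t;

/* Stand-in for the synthesized coprocessor: works when 'start' is set. */
static void coproc_model_step(coproc_regs_t *r)
{
    if (!r->start) return;
    int64_t acc = 0;
    for (uint32_t i = 0; i < r->len; i++)
        acc += (int64_t)r->arg_a[i] * r->arg_b[i];
    r->result = acc;
    r->start = 0;
    r->done = 1;
}

/* Library routine: pass parameters, hand off, collect the result. */
static int64_t coproc_call(coproc_regs_t *r, const int32_t *a,
                           const int32_t *b, uint32_t len)
{
    r->arg_a = a;
    r->arg_b = b;
    r->len = len;
    r->done = 0;
    r->start = 1;
    while (!r->done)
        coproc_model_step(r);  /* real hardware: poll a status bit or wait for an interrupt */
    return r->result;
}

/* Original software function, kept for comparison. */
int64_t dot_product_sw(const int32_t *a, const int32_t *b, uint32_t len)
{
    int64_t acc = 0;
    for (uint32_t i = 0; i < len; i++)
        acc += (int64_t)a[i] * b[i];
    return acc;
}

/* Redirected entry point: same signature, served by the coprocessor. */
int64_t dot_product_hw(const int32_t *a, const int32_t *b, uint32_t len)
{
    static coproc_regs_t regs;
    assert(a != 0 && b != 0);
    return coproc_call(&regs, a, b, len);
}
```

Because the redirected function keeps the original signature, the rest of the executable is unaffected; only the call target changes.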
Figure: Xtensa processor block diagram. Legend: base ISA feature, configurable functions, optional function, optional & configurable, designer-defined features (TIE). Blocks: instruction fetch/decode, base ALU, register file, data load/store unit, optional load/store unit #2, optional execution units, user-defined execution units, register files and interfaces, Vectra LX DSP engine, designer-defined FLIX parallel execution pipelines ("N" wide) next to the base ISA execution pipeline, processor controls (interrupts, breakpoints, timers), trace/JTAG/OCD, local instruction memories, local data memories, Xtensa local memory interface, processor interface (PIF) to system bus, external bus interface, user-defined queues/ports (up to 1M pins).
Courtesy of Grant Martin, Chief Scientist, Tensilica
{ assign z = {16'b0, m[23:8]}; }
The operation statement describes an entire new instruction, including:
the instruction name
the instruction format and arguments
the functional behavior
From this single statement, Tensilica's technology generates processor hardware, simulation, and software development tool support for the new instruction.
Courtesy of Grant Martin, Chief Scientist, Tensilica
{
  wire [31:0] m = TIEmul(a[15:0], b[15:0], 1);
  assign z = {16'b0, m[31] ? ((m[31:23] == 9'h1ff) ? m[23:8] : 16'h8000)
                           : ((m[31:23] == 9'b0)   ? m[23:8] : 16'h7fff)};
}
schedule ms {MUL_SAT_16} {def z 2;}
Figure: MUL_SAT_16 in the pipeline. OPERAND1 (a) and OPERAND2 (b) are read from the core 32-bit register file (AR), go through MUL and SAT in pipeline stages E1 and E2, and the RESULT z is written back to AR.
Courtesy of Grant Martin, Chief Scientist, Tensilica
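A bit-accurate C reference model of MUL_SAT_16 is handy for checking the TIE description against software. The sketch below assumes that `TIEmul(x, y, 1)` is a signed 16x16 multiply producing a 32-bit product, and it treats a product as non-overflowing when bits 31..23 are all zeros or all ones. `mul_sat_16` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends the low 16 bits of x without implementation-defined casts. */
static int32_t sext16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Reference model of MUL_SAT_16: signed product of a[15:0] and b[15:0],
   then bits [23:8] saturated to 16 bits, i.e. z = {16'b0, sat(m[23:8])}. */
uint32_t mul_sat_16(uint32_t a, uint32_t b)
{
    int32_t m = sext16(a) * sext16(b);   /* |m| <= 2^30, fits in 32 bits */
    uint16_t r;
    if (m >= (1 << 23))                  /* m >= 0 but m[31:23] != 0          */
        r = 0x7FFF;
    else if (m < -(1 << 23))             /* m < 0 but m[31:23] not all ones   */
        r = 0x8000;
    else
        r = (uint16_t)(((uint32_t)m >> 8) & 0xFFFFu);   /* m[23:8] */
    return r;                            /* upper 16 bits are zero */
}
```

For example, 0x0100 x 0x0100 gives 0x0100 (65536 shifted right by 8), 0xFF00 x 0x0100 gives 0xFF00 (-256), and 0x7FFF x 0x7FFF saturates to 0x7FFF. The upper halves of the register operands are ignored, as in the TIE code.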
{
  wire [31:0] m1 = TIEmul(a[31:16], b[31:16], 1);
  wire [31:0] m0 = TIEmul(a[15:0], b[15:0], 1);
  assign z = {m1[31] ? ((m1[31:23] == 9'h1ff) ? m1[23:8] : 16'h8000)
                     : ((m1[31:23] == 9'b0)   ? m1[23:8] : 16'h7fff),
              m0[31] ? ((m0[31:23] == 9'h1ff) ? m0[23:8] : 16'h8000)
                     : ((m0[31:23] == 9'b0)   ? m0[23:8] : 16'h7fff)};
}
schedule ms {MUL_SAT_2x16} {def z 2;}
Figure: MUL_SAT_2x16. The halves a1/a0 and b1/b0 of two registers from the core 32-bit register file (AR) feed two MUL + SAT units in parallel.
Courtesy of Grant Martin, Chief Scientist, Tensilica
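The SIMD variant applies the same lane operation to both 16-bit halves and packs the results, with lane 1 in bits 31..16 and lane 0 in bits 15..0. The self-contained C sketch below makes the same assumptions as a scalar model would: a signed `TIEmul`, and saturation when bits 31..23 of a lane product are not all equal. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One lane: signed 16x16 product, bits [23:8] saturated to 16 bits. */
static uint16_t sat_lane(uint16_t x, uint16_t y)
{
    int32_t xs = (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
    int32_t ys = (y & 0x8000u) ? (int32_t)y - 0x10000 : (int32_t)y;
    int32_t m = xs * ys;
    if (m >= (1 << 23)) return 0x7FFF;
    if (m < -(1 << 23)) return 0x8000;
    return (uint16_t)(((uint32_t)m >> 8) & 0xFFFFu);
}

/* MUL_SAT_2x16: z = {lane(a[31:16], b[31:16]), lane(a[15:0], b[15:0])}. */
uint32_t mul_sat_2x16(uint32_t a, uint32_t b)
{
    uint16_t hi = sat_lane((uint16_t)(a >> 16), (uint16_t)(b >> 16));
    uint16_t lo = sat_lane((uint16_t)(a & 0xFFFFu), (uint16_t)(b & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}
```

The two lanes are independent, which is exactly what lets the hardware instantiate two MUL + SAT units side by side and issue both in one instruction.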
FLIX™ Architecture
Figure: designer-defined FLIX instruction formats with a designer-defined number of operations, e.g. a 3-operation 64b instruction format, a 4-operation 32b instruction format and a 5-operation 64b instruction format (bit fields: operations 1 to N plus the format bits 1110).
FLIX™: Flexible Length Instruction Xtensions
Multiple, concurrent, independent, compound operations per instruction
Modeless intermixing of 16-, 24-, and 32- or 64-bit instructions
Fast and concurrent code (concurrent execution) when needed
Compact code when concurrency/parallelism isn't needed
Full code compatibility with the base 16/24-bit Xtensa ISA
Minimal overhead: no VLIW-style code bloat, ~2000 gates of added control logic
Courtesy of Grant Martin, Chief Scientist, Tensilica
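The "no VLIW-style code bloat" point comes down to arithmetic. The C sketch below compares code size for a sequence of issue steps under hypothetical widths: a fixed-width VLIW always spends 64 bits per instruction and pads unused slots with no-ops, while a FLIX-style encoding spends 24 bits on a single-operation instruction and 64 bits only when several operations are bundled. Real formats and widths depend on the processor configuration.

```c
#include <assert.h>
#include <stddef.h>

enum { VLIW_BITS = 64, FLIX_BUNDLE_BITS = 64, SINGLE_OP_BITS = 24 };

/* ops[i] = number of operations that can issue together at step i. */
size_t vliw_code_bits(const int *ops, size_t n)
{
    size_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] >= 1);
        bits += VLIW_BITS;                 /* fixed width, NOPs fill empty slots */
    }
    return bits;
}

size_t flix_code_bits(const int *ops, size_t n)
{
    size_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] >= 1);
        /* modeless intermixing: use the wide format only when it pays off */
        bits += (ops[i] > 1) ? FLIX_BUNDLE_BITS : SINGLE_OP_BITS;
    }
    return bits;
}
```

For a mostly sequential sequence such as {1, 1, 3, 1}, this gives 256 bits for the fixed-width encoding versus 136 bits for the mixed one, while the single parallel step still issues three operations at once.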
Three forms of instruction-set parallelism:
Multi-issue instruction: L operations packed in one long instruction
SIMD operation: M copies of storage and function
Fused operation: N dependent operations implemented as a single fused operation, with register and constant inputs
Courtesy of Grant Martin, Chief Scientist, Tensilica
Synthesizable RTL
Synopsys/Cadence flows
Scheduling assembler
Xtensa C/C++ Compiler: vectorizing C/C++ compiler
Xtensa Instruction Set Simulator: pipeline accurate
Debuggers
XTMP: System Modeling API
Bus Functional Model for HW/SW co-simulation
RTOS: VxWorks, Nucleus, XTOS
Integrated Development Environment
TIE Development tools
C Development tools
Profiling & visualization tools
Courtesy of Grant Martin, Chief Scientist, Tensilica
Figure: processor generation flow.
Electronic specification: configuration selection and custom-instruction description.
Xtensa Processor Generator* (* US Patent 6,477,697); iterate in hours.
Output: complete hardware design (source RTL, EDA scripts, test suite); use standard ASIC/COT design techniques and libraries for any IC fabrication process.
Output: customized software tools (C/C++ compiler, debuggers, simulators, RTOSes).
Optional step, runs in minutes: ANSI C/C++ source code (e.g. int main() { int i; short c[100]; for (i=0;i<N/2;i++) { …) goes through the XPRES Compiler, which produces processor extensions.
Courtesy of Grant Martin, Chief Scientist, Tensilica
Figure: from a System-On-Chip (SOC) with a general control RISC, hardwired logic blocks (image, video, audio, video, security, packet and DSP logic), RAM, A/D, I/O and PHY, to an Advanced System-On-Chip (SOC) in which each hardwired logic block becomes a processor: general control, image, video, audio, video, security, packet and DSP processors.
Courtesy of Grant Martin, Chief Scientist, Tensilica
Figure: on-chip interconnect options. Shared bus and cross-bar: processor masters with memory, input device and output device slaves. On-chip routing network: routing nodes connecting many processor masters with global memory and global I/O slaves. Application-specific interconnect: processor masters, an I/O processor and a data-crunching processor linked by queues, a dual-port memory and local buses, plus global memory, input device and output device slaves.
Courtesy of Grant Martin, Chief Scientist, Tensilica
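One rough way to compare the shared bus with the cross-bar is to count cycles for a batch of transfers. The C sketch assumes an idealized model in which every transfer takes one cycle and arbitration is free. A shared bus serializes all transfers. A cross-bar can serve one transfer per master and per slave in each cycle, so its minimum schedule length is the largest number of transfers touching any single master or slave; this is a bipartite edge-coloring bound (König's theorem), which is achievable. Names and port limits are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 16

typedef struct { int master; int slave; } transfer_t;

/* Shared bus: one transfer per cycle in total. */
size_t shared_bus_cycles(const transfer_t *t, size_t n)
{
    (void)t;
    return n;
}

/* Cross-bar: per cycle, each master issues and each slave accepts at most
   one transfer; minimum schedule length = max transfers on any port. */
size_t crossbar_cycles(const transfer_t *t, size_t n)
{
    size_t per_master[MAX_PORTS] = {0}, per_slave[MAX_PORTS] = {0};
    size_t worst = 0;
    for (size_t i = 0; i < n; i++) {
        assert(t[i].master >= 0 && t[i].master < MAX_PORTS);
        assert(t[i].slave >= 0 && t[i].slave < MAX_PORTS);
        if (++per_master[t[i].master] > worst) worst = per_master[t[i].master];
        if (++per_slave[t[i].slave] > worst) worst = per_slave[t[i].slave];
    }
    return worst;
}
```

Four masters split evenly over two memories need 4 cycles on the shared bus but 2 on the cross-bar. If every master targets the same memory slave, the cross-bar falls back to the shared-bus figure, which is one motivation for distributed memories and application-specific queues.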
Figure: multiprocessor design flow.
Conceptual model of the application (spec, Matlab, C/C++, SystemC).
Partition the application into tasks.
High-level architecture: processors µP1, µP2, µP3 and a shared memory.
Add communication channels between tasks.
Refine the architecture: add TIE, memories, queues (Q1, Q2, shared memory SM).
Map tasks to processors, and communication channels to queues and shared memories.
Outputs: C/C++ simulation model, top-level RTL, component RTL, sample test bench. Simulate, profile, analyze, iterate.
Iteration options: remap tasks or communications, change communication channels, repartition the application, change the processor configuration, change the system architecture.
Courtesy of Grant Martin, Chief Scientist, Tensilica
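The "map tasks to processors" step can be automated in many ways. One simple illustration is the longest-processing-time-first greedy heuristic sketched below, which balances estimated task loads across processors. It is a generic load-balancing technique, not the flow's own mapping algorithm, and it ignores communication cost; the task loads in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 32

/* Takes tasks in decreasing order of load and assigns each one to the
   currently least-loaded processor. proc_of[i] receives the processor of
   task i; the return value is the resulting makespan (max processor load). */
int map_tasks_lpt(const int *load, size_t n_tasks, int *proc_of, size_t n_procs)
{
    assert(n_tasks <= MAX_TASKS && n_procs >= 1 && n_procs <= MAX_TASKS);
    size_t order[MAX_TASKS];
    int busy[MAX_TASKS] = {0};
    for (size_t i = 0; i < n_tasks; i++) order[i] = i;
    /* insertion sort of task indices by decreasing load */
    for (size_t i = 1; i < n_tasks; i++) {
        size_t k = order[i], j = i;
        while (j > 0 && load[order[j - 1]] < load[k]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = k;
    }
    int makespan = 0;
    for (size_t i = 0; i < n_tasks; i++) {
        size_t p_min = 0;
        for (size_t p = 1; p < n_procs; p++)
            if (busy[p] < busy[p_min]) p_min = p;
        busy[p_min] += load[order[i]];
        proc_of[order[i]] = (int)p_min;
        if (busy[p_min] > makespan) makespan = busy[p_min];
    }
    return makespan;
}
```

For loads {7, 5, 4, 3, 3} on three processors, this yields a makespan of 8, which here matches the lower bound ceil(22/3). When communication matters, the "remap tasks or comms" iteration in the flow above is where such a first guess gets corrected.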
Courtesy of SONICS
SONICS architecture: delivers the communication framework plus peripherals, and limits the modeling effort.
Figure: the patented SiliconBackplane™ (SiliconBackplane Agent™, Open Core Protocol™, MultiChip Backplane™) connecting DSP, MPEG, CPU, DMA, C, MEM and I/O cores.
Behavioral models
Trace generation
Monitors
Disassemblers
Protocol checkers
Performance analysis
SystemC models
Timing constraint propagation
Synthesis script generation
Floorplanner interface
Courtesy of SONICS
Motivation
Design complexity, the need for verification, and time-to-market constraints are increasing.
A semantic link between specification and implementation is necessary.
Platform-Based Design
Meet-in-the-middle approach
Separation of concerns:
function vs. architecture
capability vs. performance
computation vs. communication
Metropolis Framework
Extensible framework providing simulation, verification, and synthesis capabilities
Easily extract relevant design information and interface to external tools
Released Sept. 15th, 2004
Target: Embedded System Design
Set-top boxes, cellular phones, automotive controllers, …
Heterogeneity:
computation: analog, ASICs, programmable logic, DSPs, ASIPs, processors
communication: buses, cross-bars, cache, DMAs, SDRAM, …
coordination: synchronous, asynchronous (event driven, time driven)
Goals:
Design methodologies:
abstraction levels: design capture, mathematics for the semantics
design tasks: cache size, address map, SW code generation, RTL generation, …
Tool set:
synthesis: data transfer scheduling, memory sizing, interface logic, SW/HW generation, …
verification: property checking, static analysis of performance, equivalence checking, …
Participants:
Figure: Metropolis framework. Inputs: design constraints & assertions, function specification, architecture (platform) specification. Tools: synthesis/refinement and analysis/verification.
Figure: models of computation (Kahn process networks, dataflow, discrete events, synchronous/reactive, hybrid systems, continuous time).
Metropolis provides an abstract semantics based on process networks, and emphasizes the formal description of constraints, communication refinement, and the joint modeling of applications and architectures.
Metropolis elements adhere to a “separation of concerns” point of view:
Processes (active objects): sequential executing thread
Media (passive objects): implement interface services
Quantity managers: schedule access to resources and quantities
package producers_consumer;

process P {
    port IntWriter port_wr;
    public P(String name) {}
    void thread() {
        int w = 0;
        while (w < 30) {
            port_wr.writeInt(w);
            w = w + 1;
        }
    }
}

package producers_consumer;

interface IntWriter extends Port {
    update void writeInt(int i);
    eval int nspace();
}
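For readers more at home in C than in the meta-model, the sketch below mimics the same producer: a process writes the integers 0..29 through a port whose interface offers `writeInt` (an update method) and `nspace` (an eval method). The bounded FIFO medium, its capacity of 4, the consumer, and the round-robin loop that interleaves the two processes are all assumptions made for illustration. In Metropolis the processes would be scheduled by the framework, and a write into a full medium would block instead of skipping a turn.

```c
#include <assert.h>

#define FIFO_CAP 4

/* Bounded FIFO medium offering an IntWriter-like interface. */
typedef struct { int buf[FIFO_CAP]; int head, count; } int_fifo_t;

static int nspace(const int_fifo_t *f) { return FIFO_CAP - f->count; }   /* eval */

static void write_int(int_fifo_t *f, int v)                              /* update */
{
    assert(nspace(f) > 0);            /* never write into a full FIFO */
    f->buf[(f->head + f->count) % FIFO_CAP] = v;
    f->count++;
}

static int read_int(int_fifo_t *f)
{
    assert(f->count > 0);
    int v = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return v;
}

/* Producer P (w = 0..29) and a consumer, interleaved round-robin;
   a process that cannot proceed simply skips its turn. */
int run_producer_consumer(int *received, int max_items)
{
    int_fifo_t fifo = {{0}, 0, 0};
    int w = 0, n = 0;
    while (w < 30 || fifo.count > 0) {
        if (w < 30 && nspace(&fifo) > 0) {   /* producer step */
            write_int(&fifo, w);
            w = w + 1;
        }
        if (fifo.count > 0) {                 /* consumer step */
            int v = read_int(&fifo);
            if (n < max_items) received[n] = v;
            n++;
        }
    }
    return n;
}
```

The split between `nspace` (read-only query) and `write_int` (state change) mirrors the eval/update distinction in the interface above.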
Figure: scheduled netlist and scheduling netlist, with global time.
Metropolis architectures are created via two netlists: a scheduled netlist and a scheduling netlist.
events with quantities.
Event¹: represents a transition in the action automata and can be annotated with any number of quantities. This allows performance estimation.
An event represents a transition in the action automata of an object.
An event is owned by the object that exports it.
During simulation, generated events are termed event instances.
Events can be annotated with any number of quantities.
Events can partially expose the state around them; constraints can then reference or influence this state.
All elements in the set have a common begin event and a common end event.
A service may be parameterized with arguments.
1. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 12, pp. 1217-1229, December 1998.
Processes take actions:
statements and some expressions, e.g. y = z+port.f();, z+port.f(), port.f(), i < 10, …
An execution of a given netlist is a sequence of vectors of events.
event: the beginning of an action, e.g. B(port.f()), the end of an action, e.g. E(port.f()), or null N
the i-th component of a vector is an event of the i-th process
An execution is legal if
it satisfies all coordination constraints, and
it is accepted by all action automata.
An action automaton:
defines the set of sequences of events that can happen in executing the action;
may update shared memory variables: process and media member variables, values of action-expressions;
may have guards that depend on the states of other action automata and on memory variables.
Transitions in different automata must be taken together if they correspond to the same event.
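Figure: action automata for y = x+1. The automaton of y=x+1 starts with B y=x+1, contains the automaton of x+1 (B x+1, then E x+1 with Vx+1 := x+1, or Vx+1 := any if x is written in between), and ends with E y=x+1 and y := Vx+1 (or y := any if y is written in between). A sample execution shows the vectors B y=x+1, B x+1, E x+1, N, N, N, E y=x+1 with the values of Vx+1, y and x.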
Processes run sequential code concurrently, each at its own arbitrary pace.
Read-Write and Write-Write hazards may cause unpredictable results:
atomicity has to be explicitly specified.
Progress may block at synchronization points:
awaits
function calls and labels to which awaits or constraints refer.
The legal behavior of a netlist is given by a set of sequences of event vectors.
Multiple sequences reflect the non-determinism of the semantics: concurrency, synchronization (awaits and constraints).
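The Write-Write hazard and the resulting set of legal sequences can be made concrete with a small C sketch (an illustration, not Metropolis code). Two processes each execute `y = y + 1` as two separate actions, a read into a local and a write back, with no atomicity specified. Enumerating every interleaving of the four actions shows that the final value of `y` depends on the sequence chosen.

```c
#include <assert.h>

/* Runs one interleaving of two processes, each doing: tmp = y; y = tmp + 1;
   Bit k of 'order' (k = 0..3) selects the process that takes step k
   (0 = process A, 1 = process B). Starts from y = 0, returns the final y. */
static int run_interleaving(unsigned order)
{
    int y = 0, tmp[2] = {0, 0}, pc[2] = {0, 0};
    for (int k = 0; k < 4; k++) {
        int p = (int)((order >> k) & 1u);
        if (pc[p] == 0) tmp[p] = y;          /* read action  */
        else            y = tmp[p] + 1;      /* write action */
        pc[p]++;
    }
    return y;
}

/* Counts how many legal interleavings (each process takes exactly two
   steps) end with y == value. */
int count_outcomes(int value)
{
    int count = 0;
    for (unsigned order = 0; order < 16u; order++) {
        int steps_b = 0;
        for (int k = 0; k < 4; k++) steps_b += (int)((order >> k) & 1u);
        if (steps_b != 2) continue;          /* not a legal interleaving */
        if (run_interleaving(order) == value) count++;
    }
    return count;
}
```

Of the six legal interleavings, only the two in which one process finishes before the other reads (AABB and BBAA) end with y == 2; the other four lose an update and end with y == 1. Making the read-modify-write atomic would remove those four sequences from the legal set.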
“if process P starts to execute a statement s1, no other process can start the statement until P reaches a statement s2.”
“any successive actions of starting a statement s1 by process P must take place with at most 10ms interval.”
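Checking such constraints on a recorded trace is straightforward. The C sketch below assumes a hypothetical trace format in which each entry carries a process id, a timestamp in ms, and whether it marks the start of s1 or the arrival at s2. It checks the mutual-exclusion constraint and the 10 ms rate constraint for one process. It is a post-simulation monitor for illustration, not the way Metropolis enforces constraints during execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { START_S1, REACH_S2 } mark_t;
typedef struct { int proc; double time_ms; mark_t mark; } trace_ev_t;

/* "If process P starts s1, no other process can start s1 until P reaches s2." */
bool mutex_ok(const trace_ev_t *t, size_t n)
{
    int owner = -1;                           /* process between s1 and s2 */
    for (size_t i = 0; i < n; i++) {
        if (t[i].mark == START_S1) {
            if (owner != -1 && owner != t[i].proc) return false;
            owner = t[i].proc;
        } else if (t[i].proc == owner) {      /* owner reaches s2 */
            owner = -1;
        }
    }
    return true;
}

/* "Successive starts of s1 by process p are at most max_gap_ms apart." */
bool rate_ok(const trace_ev_t *t, size_t n, int p, double max_gap_ms)
{
    bool seen = false;
    double last = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].proc != p || t[i].mark != START_S1) continue;
        if (seen && t[i].time_ms - last > max_gap_ms) return false;
        last = t[i].time_ms;
        seen = true;
    }
    return true;
}
```

A trace in which process 1 starts s1 while process 0 is still between s1 and s2 fails `mutex_ok`; starts at 0, 8 and 20 ms fail `rate_ok` with a 10 ms bound because of the 12 ms gap.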
Example: FIR filter from the SystemC 2.0 distribution (Stimuli, FIR with FSM and datapath, Display).
FIR trace:
Stimuli : 0 at time 9
Display : 0 at time 13
Stimuli : 1 at time 19
Display : -6 at time 23
Stimuli : 2 at time 29
Display : -16 at time 33
Stimuli : 3 at time 39
Display : -13 at time 43
Stimuli : 4 at time 49
Display : 6 at time 53
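The trace can be reproduced with a direct-form FIR filter. The coefficients are not given here. The four taps used below, -6, -4, 13 and 16, are the values implied by the first five trace outputs for the ramp input 0, 1, 2, 3, 4; the distribution example may use more taps, which this short trace cannot reveal. The sketch ignores the FSM/datapath split and the clocked timing.

```c
#include <assert.h>
#include <stddef.h>

#define NTAPS 4

/* Direct-form FIR: y[n] = sum_k c[k] * x[n-k], with x[n] = 0 for n < 0. */
void fir(const int *x, int *y, size_t n)
{
    static const int c[NTAPS] = {-6, -4, 13, 16};   /* inferred from the trace */
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        int acc = 0;
        for (size_t k = 0; k < NTAPS && k <= i; k++)
            acc += c[k] * x[i - k];
        y[i] = acc;
    }
}
```

For instance, the fourth output is 3(-6) + 2(-4) + 1(13) = -13, matching "Display : -13 at time 43".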
An architecture component specifies:
services, i.e. interfaces, methods, coordination (awaits, constraints), netlists;
quantities, annotated with events and related over a set of events.

medium Bus implements BusMasterService … {
    port BusArbiterService Arb;
    port MemService Mem;
    …
    update void busRead(String dest, int size) {
        if (dest == …) Mem.memRead(size);
    }
    …
}

interface BusMasterService extends Port {
    update void busRead(String dest, int size);
    update void busWrite(String dest, int size);
}
The global time is non-decreasing in a sequence of vectors of any feasible execution.

class GTime extends Quantity {
    double t;
    double sub(double t2, double t1) { ... }
    double add(double t1, double t2) { ... }
    boolean equal(double t1, double t2) { ... }
    boolean less(double t1, double t2) { ... }
    double A(event e, int i) { ... }
    constraints {
        forall(event e1, event e2, int i, int j):
            GXI.A(e1, i) == GXI.A(e2, j) -> equal(A(e1, i), A(e2, j)) &&
            GXI.A(e1, i) <  GXI.A(e2, j) -> (less(A(e1, i), A(e2, j)) || equal(A(e1, i), A(e2, j)));
    }
}
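The GTime constraint says that events in the same vector of an execution (same execution index GXI.A) carry equal time stamps, and that an event in a later vector carries a time that is not smaller. A trace-level check of that axiom can be sketched in C. The layout, one optional time stamp per process per vector with an absent entry standing for the null event N, is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROC 2

/* One vector of an execution: the event of each process (absent = N)
   and its annotated global time. */
typedef struct { bool present[NPROC]; double t[NPROC]; } event_vec_t;

/* Checks the GTime axiom on a sequence of vectors:
   same vector -> equal times; later vector -> time not smaller. */
bool gtime_ok(const event_vec_t *v, size_t n)
{
    bool have_prev = false;
    double prev = 0.0;
    for (size_t i = 0; i < n; i++) {
        bool have_cur = false;
        double cur = 0.0;
        for (int p = 0; p < NPROC; p++) {
            if (!v[i].present[p]) continue;
            if (have_cur && v[i].t[p] != cur) return false;  /* equal(...) */
            cur = v[i].t[p];
            have_cur = true;
        }
        if (!have_cur) continue;                              /* all-null vector */
        if (have_prev && cur < prev) return false;            /* less || equal */
        prev = cur;
        have_prev = true;
    }
    return true;
}
```

Comparing each vector only with the previous non-empty one is enough here, because non-decreasing consecutive times imply the pairwise condition over all vectors.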
This modeling mechanism is generic, independent of the services and cost specified.
Which levels of abstraction, what kinds of quantities, and what kinds of cost constraints should be used to capture architecture components?
It depends on the application: ongoing research.
Transaction:
Services:
Quantities:
Physical:
Services: full characterization
Quantities: time
Figure: CPU, ASIC1 and ASIC2 running Sw1, Sw2 and Hw, with SW I/F, channel I/F, wrappers, HW bus I/F, C-Ctl, channel Ctl, B-I/F, CPU-IOs, RTOS, and buses such as PIBus 32b and OtherBus 64b.
Virtual BUS:
Services:
Quantities: same as above, different weights
The 2-step approach to resolve quantities at each state of a netlist being executed:
1. Quantity requests: for each process Pi, for each event e that Pi can take, find all the quantity constraints on e. In the meta-model, this is done by explicitly requesting quantity annotations at the relevant events, i.e. Quantity.request(event, requested quantities).
2. Quantity resolution: find a vector made of the candidate events and a set of quantities annotated with each of the events, such that the annotated quantities satisfy:
all the quantity requests, and
all the axioms of the Quantity types.
In the meta-model, this is done by letting each Quantity type implement a resolve() method; the methods of the relevant Quantity types are called iteratively (theory of fixed-point computation).
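For the simplest quantity, global time, the request/resolve cycle can be sketched in C. The assumptions are illustrative: each process requests a single time stamp for its next event, and resolution picks the earliest requested time (the GTime axiom forbids going backwards), enables every event requested at that time as one vector, and advances global time. The cycle then repeats for the next state. `request_time` and `resolve_time` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROC 3

typedef struct {
    bool   pending[NPROC];   /* process has an outstanding request      */
    double req[NPROC];       /* requested global time for its next event */
    double now;              /* current global time                      */
} gtime_mgr_t;

/* Step 1: a process requests a time annotation for its next event. */
void request_time(gtime_mgr_t *g, int p, double t)
{
    assert(p >= 0 && p < NPROC && t >= g->now);   /* time is non-decreasing */
    g->pending[p] = true;
    g->req[p] = t;
}

/* Step 2: resolve: pick the earliest requested time, advance global time,
   and mark which processes' events form the next vector.
   Returns the number of enabled events (0 if nothing is pending). */
int resolve_time(gtime_mgr_t *g, bool enabled[NPROC])
{
    bool any = false;
    double t_min = 0.0;
    for (int p = 0; p < NPROC; p++)
        if (g->pending[p] && (!any || g->req[p] < t_min)) {
            t_min = g->req[p];
            any = true;
        }
    int count = 0;
    for (int p = 0; p < NPROC; p++) {
        enabled[p] = any && g->pending[p] && g->req[p] == t_min;
        if (enabled[p]) {
            g->pending[p] = false;   /* request satisfied */
            count++;
        }
    }
    if (any) g->now = t_min;
    return count;
}
```

If processes 0, 1 and 2 request times 5, 3 and 3, the first resolution enables processes 1 and 2 together at time 3, and the next one enables process 0 at time 5.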
The 2-step approach is the same as the way schedulers work, e.g. OS schedulers, BUS schedulers, BUS bridge controllers.
Semantically, a scheduler can be considered as one that resolves a quantity called execution index.
Two ways to model schedulers:
explicitly model the scheduling protocols using the meta-model building blocks: a good reflection of actual implementations;
use the built-in request/resolve approach for modeling the scheduling protocols: more focus on resolution (scheduling) algorithms than on protocols, suitable for higher-level abstraction models.
Figure: mapping process onto an architecture with PPC405 and MicroBlaze processors, SynthSlave and SynthMaster blocks, BRAM, Processor Local Bus (PLB), On-Chip Peripheral Bus (OPB) and an OPB/PLB bridge.
Computation services: Read(addr, offset, cnt, size), Write(addr, offset, cnt, size), Execute(operation, complexity)
Task before mapping: Read(addr, offset, cnt, size)
Task after mapping: Read(0x34, 8, 10, 4)
Communication services: addrTransfer(target, master), addrReq(base, offset, transType, device), addrAck(device), dataTransfer(device, readSeq, writeSeq), dataAck(device)
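The mapped call Read(0x34, 8, 10, 4) still has to be turned into bus transactions by the communication services. The C sketch below shows one possible decomposition under hypothetical assumptions: the effective start address is addr + offset, each of the cnt items is `size` bytes, the bus moves 4 bytes per data beat, and a burst carries at most 8 beats. Each resulting transaction would then go through the addrReq/addrAck/dataTransfer/dataAck sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BUS_BYTES = 4, MAX_BURST_BEATS = 8 };   /* hypothetical bus parameters */

typedef struct { uint32_t addr; uint32_t beats; } bus_txn_t;

/* Splits Read(addr, offset, cnt, size) into bursts; returns the number of
   transactions written to txn (at most max_txn). */
size_t split_read(uint32_t addr, uint32_t offset, uint32_t cnt, uint32_t size,
                  bus_txn_t *txn, size_t max_txn)
{
    assert(size > 0 && size % BUS_BYTES == 0);
    uint32_t a = addr + offset;
    uint32_t beats_left = cnt * size / BUS_BYTES;
    size_t n = 0;
    while (beats_left > 0) {
        assert(n < max_txn);
        uint32_t b = beats_left < MAX_BURST_BEATS ? beats_left : MAX_BURST_BEATS;
        txn[n].addr = a;
        txn[n].beats = b;
        a += b * BUS_BYTES;
        beats_left -= b;
        n++;
    }
    return n;
}
```

Under these assumptions, Read(0x34, 8, 10, 4) becomes an 8-beat burst at 0x3C followed by a 2-beat burst at 0x5C. Keeping this split in the communication services is what lets the same task-level Read be retargeted to a bus with different parameters.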
Figure: schedulers PPC Sched, OPB Sched, PLB Sched, MicroBlaze Sched, BRAM Sched and General Sched, with the GTime quantity.
Request(event e): adds e to the queue of requested events
Resolve(): selects an event from the pending queue
PostCond(): performs the post-resolution step (annotation). This is typically the interaction with the quantity manager.
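A scheduler with the Request/Resolve/PostCond structure can be sketched as a FIFO arbiter, for example for a bus. In this illustrative C version, Request appends an event to the pending queue, Resolve selects the oldest pending event, and PostCond annotates the selected event with a global time, standing in for the interaction with the GTime quantity manager. The fixed per-event latency and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 8

typedef struct { int id; double t_done; } sched_event_t;

typedef struct {
    sched_event_t q[QCAP];
    size_t head, count;
    double gtime;      /* stand-in for the GTime quantity manager */
    double latency;    /* hypothetical cost of serving one event  */
} fifo_sched_t;

/* Request(event e): add e to the queue of requested events. */
void sched_request(fifo_sched_t *s, int id)
{
    assert(s->count < QCAP);
    s->q[(s->head + s->count) % QCAP].id = id;
    s->count++;
}

/* Resolve(): pick the event to enable, here the oldest pending one. */
sched_event_t *sched_resolve(fifo_sched_t *s)
{
    return s->count > 0 ? &s->q[s->head] : NULL;
}

/* PostCond(): annotate the chosen event with its completion time and
   remove it from the pending queue. */
sched_event_t sched_postcond(fifo_sched_t *s)
{
    assert(s->count > 0);
    sched_event_t e = s->q[s->head];
    s->gtime += s->latency;
    e.t_done = s->gtime;
    s->head = (s->head + 1) % QCAP;
    s->count--;
    return e;
}
```

Swapping the FIFO policy in `sched_resolve` for a priority or round-robin rule changes the arbitration without touching how requests are collected or how time is annotated.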
Figure labels: from the characterization flow shown; from the Metropolis model design; from the ISS for the PPC.
Create the database ONCE prior to simulation and populate it with independent (modular) information:
performance based on physical implementation;
composition of communication transactions;
processing elements computation.
Work with Xilinx Research Labs.
1. Douglas Densmore, Adam Donlin, A. Sangiovanni-Vincentelli, FPGA Architecture Characterization in System Level Design, submitted to CODES 2005.
2. Adam Donlin and Douglas Densmore, Method and Apparatus for Precharacterizing Systems for Use in System Level Design of Integrated Circuits, patent pending.
Figure: scheduled netlist with Task1-Task4, PPC, dedicated HW, BRAM and PLB; scheduling netlist with PPC Sched, DedHW Sched, PLB Sched, BRAM Sched, Global Time and a characterizer. Legend: media (scheduled), process, quantity manager, quantity, enabled event, disabled event.
Map a functional network onto an architectural network without changing either of the two:
supports design reuse.
Specify the mapping between the two in a formal way:
supports analysis techniques;
makes future automation easier.
Use declarative synchronization constraints between events:
one of the unique aspects.
Figure: functional network and architectural network joined by a mapping network of synch(…), synch(…), … constraints.
ltl synch(e1, e2)
e1 and e2 occur simultaneously or not at all during simulation
ltl synch(e1, e2: var1@e1 == var2@e2)
The value of var1 in the scope of e1 is equal to the value of var2 when e1 and e2 occur.
Can be useful for “passing” values between functional and architectural models.
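On a recorded execution, both forms of synch can be checked vector by vector. The C sketch below assumes a simplified trace in which each vector records whether e1 and e2 occurred and the values of var1 and var2 in their scopes. It is an offline monitor for illustration; in Metropolis these constraints restrict which executions are legal in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool e1, e2;       /* did e1 / e2 occur in this vector?              */
    int var1, var2;    /* var1@e1 and var2@e2 (meaningful if they occur) */
} synch_vec_t;

/* ltl synch(e1, e2): in every vector both events occur or neither does.
   With check_values, also require var1@e1 == var2@e2 when they occur. */
bool synch_ok(const synch_vec_t *v, size_t n, bool check_values)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i].e1 != v[i].e2) return false;
        if (check_values && v[i].e1 && v[i].var1 != v[i].var2) return false;
    }
    return true;
}
```

The value-passing form is what lets an argument such as items in the functional model reach the corresponding parameter of the architectural task.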
e1 = beg(P1, M1.read);
e2 = beg(T1, T1.read);
ltl synch(e1, e2: items@e1 == i@e2);
e3 = end(P1, M1.read);
e4 = end(T1, T1.read);
ltl synch(e3, e4);
Figure: process P1 uses medium M1 (void read(int items) { … }); task T1 on the CPU runs await { (true;;) read(int i); (true;;) write(int i); }; Global Time.
Figure: MyMapNetlist combines MyArchNetlist (Cpu, Bus, Mem, Bus Arbiter, OsSched, mapping processes mP1 and mP2) and MyFncNetlist (processes P1 and P2, environments Env1 and Env2, medium M) through the constraints:
B(P1, M.write) <=> B(mP1, mP1.writeCpu);   E(P1, M.write) <=> E(mP1, mP1.writeCpu);
B(P1, P1.f) <=> B(mP1, mP1.mapf);          E(P1, P1.f) <=> E(mP1, mP1.mapf);
B(P2, M.read) <=> B(mP2, mP2.readCpu);     E(P2, M.read) <=> E(mP2, mP2.readCpu);
B(P2, P2.f) <=> B(mP2, mP2.mapf);          E(P2, P2.f) <=> E(mP2, mP2.mapf);
interface MyService extends Port { int myService(int d); }
medium AbsM implements MyService { int myService(int d) { … } }
B(thisthread, AbsM.myService) <=> B(P1, M.read);
E(thisthread, AbsM.myService) <=> E(P2, M.write);
refine(AbsM, MyMapNetlist);
Figure: the same abstract medium AbsM can be refined by different mapping netlists, MyMapNetlist1 and MyMapNetlist2, each combining MyArchNetlist and MyFncNetlist (P1, P2, M) with its own constraints B(…) <=> B(…); E(…) <=> E(…); and selected with refine(AbsM, MyMapNetlist1) or refine(AbsM, MyMapNetlist2).
A set of mapping netlists, together with constraints on event relations to a given interface implementation, constitutes a platform of the interface.
Figure: a netlist N containing a medium S is refined into N', in which S is implemented by a mapping netlist (MyMapNetlist1, combining MyArchNetlist and MyFncNetlist with P1, P2, M); the processes of N, e.g. Q2, are in turn mapped through constraints of the same form, e.g. B(Q2, Q2.f) <=> B(mQ2, mQ2.mapf).
Evaluate the methodology with formal techniques applied.
Input: a transport stream for multi-channel video images
Output: a PiP video stream
dynamically changeable
Figure: DEMUX, PARSER, JUGGLER, MPEG (two instances), RESIZE, PIP, USRCONTROL.
60 processes with 200 channels
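Figure: the unbounded (∞) channels of the functional model are implemented on an architecture with CPU, DSP, DMA, HW blocks, caches ($) and memories (RAMs, RAMd, MemF, MemS).
memories with finer primitives (called TTL API): allocate/release space, move data, probe space/data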
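Primitives of this kind (probe space/data, move data, release space) can be illustrated on a circular buffer in memory. The C sketch below is a hypothetical rendering of that style of interface: the function names and the split into producer-side and consumer-side calls are assumptions, not the TTL specification, and allocation is modeled simply as probing for space and then writing in place. Unlike an unbounded FIFO, the producer must check for space before moving data, and the data becomes visible to the consumer only once it is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_WORDS 8

typedef struct {
    int mem[BUF_WORDS];
    size_t wr, rd;        /* free-running write / read positions */
} chan_t;

/* Producer: true if n words of free space are available. */
bool probe_space(const chan_t *c, size_t n) { return BUF_WORDS - (c->wr - c->rd) >= n; }

/* Consumer: true if n words of data are available. */
bool probe_data(const chan_t *c, size_t n) { return c->wr - c->rd >= n; }

/* Producer: move n words into the free space at the write position. */
void move_in(chan_t *c, const int *src, size_t n)
{
    assert(probe_space(c, n));
    for (size_t i = 0; i < n; i++) c->mem[(c->wr + i) % BUF_WORDS] = src[i];
}

/* Producer: release the written space as data for the consumer. */
void release_data(chan_t *c, size_t n) { assert(probe_space(c, n)); c->wr += n; }

/* Consumer: move n words out from the read position. */
void move_out(const chan_t *c, int *dst, size_t n)
{
    assert(probe_data(c, n));
    for (size_t i = 0; i < n; i++) dst[i] = c->mem[(c->rd + i) % BUF_WORDS];
}

/* Consumer: release consumed data as free space for the producer. */
void release_space(chan_t *c, size_t n) { assert(probe_data(c, n)); c->rd += n; }
```

Separating data movement from release lets producer and consumer work in place on shared memory, which is the kind of finer control that an unbounded-FIFO model hides.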
Figure: Metropolis infrastructure. The functional spec, communication spec, constraints and architecture are described in the meta model language. The meta model compiler front end builds abstract syntax trees, which back ends 1…N translate for the simulator tool, verification tools, synthesis tool, …. The Metropolis interactive shell drives operations such as refine, map, etc.
Quasi-static scheduling
Scheduler synthesis from constraint formulas
Interface synthesis
Refinement (mapping) synthesis
Architecture-specific synthesis from concurrent processes for:
Hardware (with known architecture)
Dynamically reconfigurable logic
Static timing analysis for reactive processes
Invariant analysis of sequential programs
Refinement verification
Formal verification for software
Figure: SoC design trade-offs.
Starting from the system specification:
Functionality, i.e., WHAT the system is required to do
Constraints, i.e., the set of requirements that restrict the design space by taking into consideration non-functional aspects of the design such as cost, power consumption, performance, fault tolerance and physical dimensions
Architecture, i.e., the set of available components from which the designer can decide HOW she can implement the functionality satisfying the constraints
The PBD methodology progresses towards the implementation of the design by “mapping” the functionality of the design onto the available components.
The library of available components (they can be already fully designed or they can be considered virtual components) is called a platform.
Mapping implies the selection of the components, of their interconnection scheme, and of the allocation of the functionality to each component.
Several models and methods are applied to achieve the final implementation.
A lot of material from his course at the University of California at Berkeley
L. Mangeruca, M. Baleani, M. Carloni, A. Balluchi, L. Benvenuti, T. Villa and others
Who provided all the slides on configurable and extendible cores
Yoshi Watanabe, Felice Balarin
Clas Jacobson, and others
slides on communication-centric flow