Hardware Modeling 3 Timing Anomalies
Peter Puschner
slides credits: P. Puschner, R. Kirner, B. Huber
VU 2.0 182.101 SS 2015
Obstacles to building models that allow for a safe and tight WCET analysis
§ Modeling processor timing ⇒ State explosion
(TRDPS)
§ What is needed:
Reduction of modeled state space
Def: TRDPS (timing-relevant dynamic processor state) contains all memory elements in the target hardware whose content influences the timing and may be modified during program execution.
§ Complex hardware: TRDPS space of a program may be huge
⇒ use simplified models to compute WCET
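To get a feeling for the size of the TRDPS space, the following rough count may help. It is only a sketch under stated assumptions: the hardware is reduced to a direct-mapped cache plus a simple pipeline, and the function name and all parameters are hypothetical, not taken from the slides.

```python
def trdps_state_count(cache_lines, tags_per_line, pipeline_stages, states_per_stage):
    """Rough count of timing-relevant states for an assumed toy processor.

    Assumption: every cache line is either empty or holds one of
    `tags_per_line` tags, and every pipeline stage is in one of
    `states_per_stage` states, all independent of each other.
    """
    cache_states = (tags_per_line + 1) ** cache_lines
    pipeline_states = states_per_stage ** pipeline_stages
    return cache_states * pipeline_states


# Tiny configuration: 2 cache lines, 3 tags, 2 pipeline stages with 2 states each.
print(trdps_state_count(2, 3, 2, 2))

# A still modest configuration: 256 lines, 16 tags, 5 stages with 4 states each.
print(len(str(trdps_state_count(256, 16, 5, 4))), "decimal digits")
```

Even the second, modest configuration yields a count with more than 300 decimal digits, which is why the state space has to be reduced before analysis.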
Safe abstraction: over-approximation by reducing granularity
concrete behaviors, including the real one ⇒ safe by construction, but pessimistic
Simplification: approximation by eliminating execution scenarios that are considered irrelevant or non-existent ⇒ needs proof of soundness, otherwise dangerous!
same HW state as a long execution path from the further analysis
Decomposition: Decompose the state space into two partitions, A and B. First, solve the local problem for partition A. Second, solve the global problem for A and B by building on the solution to the local problem (instead of modeling the state space of partition A). ⇒ needs "continuity properties", otherwise dangerous!
analyze cache behavior, then use the cache results to analyze the overall processor including the pipeline.
If the prerequisites of neither Simplification nor Decomposition are fulfilled, we must fall back on pessimistic strategies ⇐ Anomalies
concrete TRDPS
“the” solution: abstract TRDPS
§ Concrete domain (deterministic computation):
initial TRDPS
§ Abstract domain (non-deterministic state transfer):
both join and split operations along the traces initial abstract TRDPS
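The difference between the two domains can be sketched on a toy example. The sketch assumes a one-entry cache as the only timing-relevant state, assumed latencies of 1 cycle (hit) and 10 cycles (miss), and an abstract state that maps each possible cache content to the worst accumulated time with which it can be reached. All names are hypothetical.

```python
HIT, MISS = 1, 10  # assumed latencies in cycles


def access(cache, addr):
    """Concrete, deterministic transfer: returns (new cache content, cost)."""
    return addr, (HIT if cache == addr else MISS)


def abstract_access(astate, addr):
    """Abstract transfer: apply the concrete transfer to every possible state."""
    out = {}
    for cache, t in astate.items():
        nxt, cost = access(cache, addr)
        out[nxt] = max(out.get(nxt, 0), t + cost)
    return out


def join(a, b):
    """Join at a control-flow merge: keep the worst time per hardware state."""
    out = dict(a)
    for cache, t in b.items():
        out[cache] = max(out.get(cache, 0), t)
    return out


# Program: if (c) access x; else access y;  then access x.
initial = {None: 0}                       # empty cache, time 0
then_branch = abstract_access(initial, "x")   # split: both branches start from `initial`
else_branch = abstract_access(initial, "y")
merged = join(then_branch, else_branch)
final = abstract_access(merged, "x")
print(final)
```

In this sketch the join only merges entries that describe the same hardware state, so the shorter of two paths ending in the same state is dropped. That is safe here because the one-entry cache is assumed to be the complete timing-relevant state.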
concrete TRDPS
a limited solution … abstract TRDPS
§ alternative to abstraction:
reduce the complexity by decomposition.
§ Analysis on control-flow graphs instead of on the set of execution traces
§ The execution time T(I,s) of an instruction sequence I
depends on the TRDPS s:
Dublin, ECRTS'09
Variant 2: Delta Composition: choose a ∈ A such that the absolute delay of HWA is minimal and compensate by the maximal variation |a' − a''|; a', a'' ∈ A
There are two types of series timing anomalies:
Amplification TA-S-A: ∃ s,s' ∈ IN_M . 0 < Δ(M,s,s') < Δ(M∘N,s,s')
Inversion TA-S-I: ∃ s,s' ∈ IN_M . Δ(M,s,s') > 0 ∧ Δ(M∘N,s,s') < 0
Auxiliary definitions
Δ(M,s,s') Δ(M∘N,s,s')
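The two conditions for series timing anomalies can be checked mechanically once the two timing differences are known. The sketch below assumes that Δ(M,s,s') and Δ(M∘N,s,s') have already been computed for one pair of states according to the auxiliary definitions; the function name and its return values are hypothetical.

```python
def classify_series_anomaly(delta_m, delta_mn):
    """Classify one pair of states (s, s') from its two timing differences.

    delta_m  -- value of Δ(M, s, s')
    delta_mn -- value of Δ(M∘N, s, s')
    Returns "TA-S-A", "TA-S-I", or None if neither condition holds.
    """
    if 0 < delta_m < delta_mn:
        return "TA-S-A"   # amplification: the difference grows over N
    if delta_m > 0 and delta_mn < 0:
        return "TA-S-I"   # inversion: the difference changes its sign over N
    return None


print(classify_series_anomaly(2, 5))    # difference grows
print(classify_series_anomaly(2, -1))   # difference flips sign
print(classify_series_anomaly(2, 2))    # difference unchanged
```

A sequence M followed by N exhibits the respective anomaly if at least one pair of states is classified accordingly, which matches the existential quantifier in both definitions.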
[Figure: schedules of instructions A to E on the units LSU, IU and MCIU over cycles 1 to 14, once with instruction A as a cache hit and once as a cache miss; the two issue orders shown are A B C D E and A C B D E; legend: in-order resource, out-of-order resource]
Instructions:
A LD r4, 0(r3)
B ADD r5, r4, r4
C ADD r11, r10, r10
D MUL r12, r11, r11
E MUL r13, r12, r12
The latency of instruction A varies between the cache-hit and the cache-miss case.
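The effect of a latency variation of instruction A on the resource allocation can be replayed with a small scheduler. This is only a sketch: the latencies (load hit 2, load miss 10, ADD 1, MUL 4 cycles), the dispatch of one instruction per cycle, and the rule that a free unit starts its oldest ready instruction are assumptions for illustration, not values taken from the slides. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str       # functional unit that executes the instruction
    latency: int    # assumed execution latency in cycles
    deps: tuple     # names of instructions whose results are needed
    dispatch: int   # assumed cycle in which the instruction is dispatched


def simulate(program):
    """Greedy schedule: each free unit starts its oldest ready instruction."""
    finish = {}      # instruction name -> completion cycle
    unit_free = {}   # unit name -> first cycle in which the unit is free again
    pending = list(program)  # kept in program order
    t = 0
    while pending:
        for ins in list(pending):
            operands_ready = all(finish.get(d, float("inf")) <= t for d in ins.deps)
            if ins.dispatch <= t and operands_ready and unit_free.get(ins.unit, 0) <= t:
                finish[ins.name] = t + ins.latency
                unit_free[ins.unit] = t + ins.latency
                pending.remove(ins)
        t += 1
    return max(finish.values())


def program(latency_a):
    """Instructions A to E with their register dependences (B on A, D on C, E on D)."""
    return [
        Instr("A", "LSU", latency_a, (), 1),
        Instr("B", "IU", 1, ("A",), 2),
        Instr("C", "IU", 1, (), 3),
        Instr("D", "MCIU", 4, ("C",), 4),
        Instr("E", "MCIU", 4, ("D",), 5),
    ]


hit_total = simulate(program(2))    # A hits the cache
miss_total = simulate(program(10))  # A misses the cache
print(hit_total, miss_total)
```

Under these assumptions the cache hit lets B occupy the IU before C, which delays D and E, so the run with the faster instruction A finishes later than the run with the cache miss.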
[Figure: schedules of instructions A to E on the units IU, LSU and MCIU over cycles 1 to 38, shown for an initially empty pipeline and for the case where the first instruction is delayed by one cycle; legend: in-order resource, out-of-order resource]
Instructions:
A ADD r4, r3, r3
B SW r4, 0x0
C MUL r10, r4, r4
D LW r3, 0x8
E ADD r11, r10, r10
Extra delay of 1 cycle in each iteration!
Common to the shown patterns is a changed resource allocation sequence caused by a latency variation.
Consequence: Hardware without resource allocation decisions does not allow timing anomalies to occur.
Note: The occurrence of timing anomalies depends on hardware features as well as on code structure.
Resource Allocation Criterion: A possible resource allocation decision for a hardware model is a necessary - but not sufficient - condition for the occurrence of timing anomalies.
State Analysis Technique   no TA-P-x   TA-P-I    TA-P-A    TA-P-I & TA-P-A (same b∈B)
Delta Composition          OK          OK        unsound   unsound
Max Composition            OK          unsound   OK        unsound
max (DC, MC)               OK          OK        OK        unsound
Full State                 OK          OK        OK        OK
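For tool support, the table can be encoded as a simple lookup. The sketch below copies the entries of the table unchanged; the dictionary layout, the scenario keys and the function name are hypothetical.

```python
# Columns of the table, in order. "both" stands for TA-P-I & TA-P-A (same b∈B).
SCENARIOS = ("no TA-P-x", "TA-P-I", "TA-P-A", "both")

# True = OK (sound), False = unsound; one row per analysis technique.
SOUNDNESS = {
    "Delta Composition": (True, True, False, False),
    "Max Composition":   (True, False, True, False),
    "max (DC, MC)":      (True, True, True, False),
    "Full State":        (True, True, True, True),
}


def sound_techniques(scenario):
    """Return the techniques that the table marks as OK for the given scenario."""
    column = SCENARIOS.index(scenario)
    return [name for name, row in SOUNDNESS.items() if row[column]]


print(sound_techniques("TA-P-I"))
print(sound_techniques("both"))
```

The lookup makes the trade-off visible: only the full-state analysis remains sound in every column, while the cheaper compositions are restricted to scenarios without the corresponding anomaly type.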
Knowledge of the execution history is required to tightly bound the execution time
Without knowledge of the execution history (e.g., because it is too complex to analyze):
Eliminate the need for considering long execution histories:
Change code structure to ensure that timing anomalies cannot take place:
So far, no feasible check for timing anomalies is known
Extend code generators to produce SW patterns that avoid timing anomalies
Develop more predictable systems
(e.g., scratchpad instead of caches, decisions by compiler instead of processor)
(e.g., time-triggered (static) actions)
Henrik Theiling, Christian Ferdinand, and Reinhard [...] Cache and Path Analyses, Real-Time Systems 18(2/3), Kluwer, 2000.
Raimund Kirner and Martin Schöberl. Modeling the Function Cache for Worst-Case Execution Time Analysis. In Proc. 44th ACM Design Automation Conference, 2007.
Raimund Kirner, Albrecht Kadlec, and Peter Puschner. Precise Worst-Case Execution Time Analysis for Processors with Timing Anomalies. In Proc. 21st Euromicro Conf. on Real-Time Systems, 2009.