Hardware Modeling 3: Timing Anomalies (Peter Puschner)

SLIDE 1

Hardware Modeling 3 Timing Anomalies

Peter Puschner

slides credits: P. Puschner, R. Kirner, B. Huber

VU 2.0 182.101 SS 2015

SLIDE 2

Timing Anomalies

Obstacles to building models that allow for a safe and tight WCET analysis

SLIDE 3

The State Explosion Problem

• Modeling processor timing ⇒ state explosion

  • instruction timing depends on context (execution history)
  • caches, pipelines, etc.
  • even when using only the timing-relevant dynamic processor state (TRDPS)

• What is needed: reduction of the modeled state space

SLIDE 4

The Non-Locality of Instruction Timing

Def: TRDPS (timing-relevant dynamic processor state) contains all memory elements in the target hardware whose content influences the timing and may be modified during program execution.

• Complex hardware: the TRDPS space of a program may be huge
⇒ use simplified models to compute the WCET

SLIDE 5

Complexity Reduction

Safe abstraction: over-approximation by reducing the granularity of the model
→ the behavior of the abstract model subsumes multiple concrete behaviors, including the real one
→ safe by construction, but pessimistic

Simplification: approximation by eliminating execution scenarios that are considered irrelevant or non-existent
→ needs a proof of soundness, otherwise dangerous!

  • Example: exclude from further analysis short execution paths that lead to the same HW state as a long execution path

SLIDE 6

Complexity Reduction (2)

Decomposition: decompose the state space into two partitions, A and B. First, solve the local problem for partition A. Second, solve the global problem for A and B by building on the solution to the local problem (instead of modeling the state space of partition A).
→ needs “continuity properties”, otherwise dangerous!

  • Example: for a processor with pipeline and cache, first analyze the cache behavior, then use the cache results to analyze the overall processor including the pipeline.

If the prerequisites of neither simplification nor decomposition are fulfilled, we must resort to pessimistic strategies ⇐ anomalies

SLIDE 7

The State Explosion Problem

“The” solution: abstraction

[Figure: ABSTRACTION maps the concrete TRDPS to the abstract TRDPS]

SLIDE 8

The Price of Abstraction

• Concrete domain (deterministic computation): only join operations along the traces

[Figure: traces starting from the initial TRDPS]

SLIDE 9

The Price of Abstraction (2)

• Abstract domain (non-deterministic state transfer): both join and split operations along the traces

[Figure: traces starting from the initial abstract TRDPS]

SLIDE 10

The Price of Abstraction (3)

A limited solution …

[Figure: ABSTRACTION maps the concrete TRDPS to the abstract TRDPS]

SLIDE 11

Reducing Complexity

• Alternative to abstraction: reduce the complexity by decomposition (“divide and conquer”)

SLIDE 12

Series Decomposition

Instruction sequence M∘N (M followed by N)
⇒ decompose into two sequences: M and N

SLIDE 13

Series Decomposition

• Analysis on control-flow graphs instead of on the set of execution traces

SLIDE 14

Parallel Decomposition

Timing depends on the TRDPS s
⇒ decompose s along the hardware components of the machine used:

s = 〈a,b〉, with a (cache) and b (pipeline)

SLIDE 15

Parallel Composition

• The execution time T(I,s) of an instruction sequence I depends on the TRDPS s.

SLIDE 16

Dublin, ECRTS'09

Parallel Composition (TRDPS Partitioning)

Partitioning the TRDPS between HW component A and HW component B: TRDPS: A × B

SLIDE 17

Parallel Composition (TRDPS Partitioning)

Variant 1: Max Composition: choose a ∈ A such that the absolute delay of HW component A is maximal

Variant 2: Delta Composition: choose a ∈ A such that the absolute delay of HW component A is minimal, and compensate by the maximal variation |a′ − a″|; a′, a″ ∈ A

SLIDE 18

Series Timing Anomalies

SLIDE 19

Series Timing Anomalies

There are two types of series timing anomalies:

Amplification TA-S-A: ∃ s, s′ ∈ IN_M . 0 < Δ(M, s, s′) < Δ(M∘N, s, s′)

Inversion TA-S-I: ∃ s, s′ ∈ IN_M . Δ(M, s, s′) > 0 ∧ Δ(M∘N, s, s′) < 0

Auxiliary definitions

  • Δ(I, s, s′) … change of execution time of instruction sequence I
  • IN_M … reachable states at the start of instruction sequence M

[Figure: Δ(M, s, s′) compared with Δ(M∘N, s, s′)]
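The two conditions can be read as predicates on a pair of execution-time changes. The following is a minimal sketch only: the function name `classify_series_anomaly` is hypothetical, and it assumes that Δ denotes the execution time observed with start state s′ minus the one observed with start state s, which the slides do not spell out.

```python
def classify_series_anomaly(delta_m, delta_mn):
    """Classify one state pair (s, s') by the two series-anomaly conditions.

    delta_m  -- assumed to be Delta(M, s, s'): time change of sequence M
    delta_mn -- assumed to be Delta(M.N, s, s'): time change of M followed by N
    """
    if 0 < delta_m < delta_mn:
        return "TA-S-A"   # amplification: the change grows along the sequence
    if delta_m > 0 and delta_mn < 0:
        return "TA-S-I"   # inversion: a local slowdown shortens the whole
    return None           # this state pair shows no series timing anomaly


for pair in [(2, 5), (7, -1), (3, 3)]:
    print(pair, classify_series_anomaly(*pair))
```

A program exhibits the anomaly if at least one reachable state pair is classified accordingly, so a full check would have to enumerate IN_M, which this sketch does not attempt.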

SLIDE 20

Parallel Timing Anomalies

SLIDE 21

Parallel Timing Anomalies

SLIDE 22

Parallel Timing Anomalies

SLIDE 23

Parallel Timing Anomalies

SLIDE 24

Example of Amplification Anomaly

  • out-of-order pipeline + cache + data dependencies

SLIDE 25

Example of Inversion Anomaly

  • out-of-order pipeline + cache + data dependencies

SLIDE 26

Concrete Example for Inversion:

Instructions:

  A  LD  r4, 0(r3)
  B  ADD r5, r4, r4
  C  ADD r11, r10, r10
  D  MUL r12, r11, r11
  E  MUL r13, r12, r12

[Figure: schedules of A to E on the LSU, IU, and MCIU over cycles 1 to 14, once with instruction A as a cache hit and once with instruction A as a cache miss; legend: in-order resource, out-of-order resource]

Latency of instruction A varies by Δt = −7 cycles.
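The inversion can be reproduced with a small scheduling sketch. Everything numeric here is an assumption for illustration and is not taken from the slide: instruction i is dispatched in cycle i, each unit executes one instruction at a time and starts the oldest dispatched instruction whose operands are ready, the IU needs 1 cycle, the MCIU 4 cycles, and the load takes 2 cycles on a hit and 9 on a miss. The names `Instr`, `simulate`, and `example` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str          # functional unit the instruction executes on
    latency: int       # cycles the unit is occupied (assumed values)
    deps: tuple = ()   # names of instructions whose results are needed


def simulate(program):
    """Return the cycle in which the last instruction finishes.

    Assumed model: instruction i is dispatched in cycle i; every unit runs one
    instruction at a time and starts the oldest dispatched instruction whose
    operands are available (a greedy resource allocation decision).
    """
    finish = {}       # name -> cycle in which the result is ready (set at start)
    busy_until = {}   # unit -> cycle in which the unit becomes free
    cycle = 0
    while len(finish) < len(program):
        for index, ins in enumerate(program):      # program order = age order
            if ins.name in finish or index > cycle:
                continue                            # started or not dispatched
            unit_free = busy_until.get(ins.unit, 0) <= cycle
            ready = all(finish.get(d, float("inf")) <= cycle for d in ins.deps)
            if unit_free and ready:
                finish[ins.name] = cycle + ins.latency
                busy_until[ins.unit] = finish[ins.name]
        cycle += 1
    return max(finish.values())


def example(load_latency):
    """Instructions A to E from the slide with the assumed unit latencies."""
    return [
        Instr("A", "LSU", load_latency),
        Instr("B", "IU", 1, ("A",)),
        Instr("C", "IU", 1),
        Instr("D", "MCIU", 4, ("C",)),
        Instr("E", "MCIU", 4, ("D",)),
    ]


HIT, MISS = 2, 9   # assumed load latencies in cycles

t_hit = simulate(example(HIT))
t_miss = simulate(example(MISS))
print("cache hit:", t_hit, "cycles; cache miss:", t_miss, "cycles")
```

Under these assumptions the hit takes 12 cycles and the miss 11: with the hit, B is ready first and occupies the IU ahead of C, which delays the long MUL chain D, E; with the miss, C overtakes B and the chain starts one cycle earlier.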

SLIDE 27

Domino Effect

Instructions:

  A  ADD r4, r3, r3
  B  SW  r4, 0x0
  C  MUL r10, r4, r4
  D  LW  r3, 0x8
  E  ADD r11, r10, r10

[Figure: schedules of repeated iterations of A to E on the IU, LSU, and MCIU over cycles 1 to 38, once starting with an initially empty pipeline and once with the first instruction delayed by one cycle (x); legend: in-order resource, out-of-order resource]

Extra delay of 1 cycle in each iteration!

SLIDE 28

Precondition for Timing Anomalies

Common to the patterns shown is a changed resource allocation sequence caused by a latency variation.

Consequence: hardware without resource allocation decisions does not allow timing anomalies to occur.

Note: the occurrence of timing anomalies depends on hardware features as well as on code structure.

Resource Allocation Criterion: the possibility of a resource allocation decision in a hardware model is a necessary, but not sufficient, condition for the occurrence of timing anomalies.

SLIDE 29

Soundness of WCET Analysis with Parallel Timing Anomalies

Analysis technique \ State   no TA-P-x   TA-P-I    TA-P-A    TA-P-I & TA-P-A (same b∈B)
Delta Composition            OK          OK        unsound   unsound
Max Composition              OK          unsound   OK        unsound
max (DC, MC)                 OK          OK        OK        unsound
Full State                   OK          OK        OK        OK
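The verdicts in this table can be replayed on toy numbers. The sketch below is an interpretation, not the lecture's formal definition: it assumes a local delay `delay_a[a]` for each state of HW component A, reads Max Composition as the largest overall time T(I, 〈a, b〉) with the maximal-delay a, and reads Delta Composition as the largest overall time with the minimal-delay a plus the maximal variation of the local delay. It also assumes that TA-P-I means a larger local delay shortens the overall time and that TA-P-A means the overall time varies by more than the local delay. All names and values are hypothetical.

```python
def full_state_bound(T):
    """Exact worst case over all modeled states (a, b)."""
    return max(T.values())


def max_composition_bound(T, delay_a):
    """Fix a to the state with maximal local delay, maximize over b."""
    a_max = max(delay_a, key=delay_a.get)
    return max(t for (a, _b), t in T.items() if a == a_max)


def delta_composition_bound(T, delay_a):
    """Fix a to the state with minimal local delay, add the maximal variation."""
    a_min = min(delay_a, key=delay_a.get)
    variation = max(delay_a.values()) - min(delay_a.values())
    return max(t for (a, _b), t in T.items() if a == a_min) + variation


def verdicts(delay_a, T):
    """Compare each compositional bound with the full-state result."""
    exact = full_state_bound(T)
    dc = delta_composition_bound(T, delay_a)
    mc = max_composition_bound(T, delay_a)

    def check(bound):
        return "OK" if bound >= exact else "unsound"

    return {"Delta": check(dc), "Max": check(mc), "max(DC, MC)": check(max(dc, mc))}


# Hypothetical scenarios: (local delays of A, overall times T[(a, b)]).
SCENARIOS = {
    "no TA-P-x": ({"a0": 1, "a1": 10}, {("a0", "b"): 20, ("a1", "b"): 27}),
    "TA-P-I": ({"a0": 1, "a1": 10}, {("a0", "b"): 20, ("a1", "b"): 15}),
    "TA-P-A": ({"a0": 1, "a1": 10}, {("a0", "b"): 10, ("a1", "b"): 25}),
    "TA-P-I & TA-P-A": (
        {"a0": 1, "a1": 5, "a2": 10},
        {("a0", "b"): 20, ("a1", "b"): 40, ("a2", "b"): 15},
    ),
}

for name, (delay_a, T) in SCENARIOS.items():
    print(name, verdicts(delay_a, T))
```

On these numbers the sketch agrees with the rows for Delta Composition, Max Composition, and max (DC, MC); it illustrates single instances and proves nothing about soundness in general.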

SLIDE 30

Consequences of Timing Anomalies

Knowledge of the execution history is required to tightly bound the execution time.

Without knowledge of the execution history (e.g., because it is too complex to analyze):

  • pessimistic overestimations … abstractions
  • potentially unsafe approximations … simplifications

SLIDE 31

How can we avoid Timing Anomalies?

Eliminate the need for considering long execution histories:

  • deactivate caches
  • use synchronization points
  • choose more predictable HW platform

Change code structure to ensure that timing anomalies cannot take place:

  • e.g., code reordering, instruction insertion
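As an illustration of code reordering, the sketch below schedules five instructions A to E (a load, two independent single-cycle ALU operations B and C, and a chain of two multiplications D and E fed by C) in two program orders. The machine model is an assumption made for this sketch: instruction i is dispatched in cycle i, each unit starts the oldest dispatched instruction whose operands are ready, the IU needs 1 cycle, the MCIU 4 cycles, and the load takes 2 cycles (fast) or 9 cycles (slow). The names `total_cycles` and `build` are hypothetical.

```python
def total_cycles(program):
    """program: list of (name, unit, latency, deps) tuples in program order.

    Assumed model: instruction i is dispatched in cycle i; each unit runs one
    instruction at a time and starts the oldest dispatched, ready instruction.
    """
    finish, busy_until, cycle = {}, {}, 0
    while len(finish) < len(program):
        for index, (name, unit, latency, deps) in enumerate(program):
            if name in finish or index > cycle:
                continue   # already started, or not yet dispatched
            unit_free = busy_until.get(unit, 0) <= cycle
            ready = all(finish.get(d, float("inf")) <= cycle for d in deps)
            if unit_free and ready:
                finish[name] = busy_until[unit] = cycle + latency
        cycle += 1
    return max(finish.values())


def build(order, load_latency):
    """Arrange the five instructions in the given program order."""
    table = {
        "A": ("LSU", load_latency, ()),   # load
        "B": ("IU", 1, ("A",)),           # uses the loaded value
        "C": ("IU", 1, ()),               # independent of A and B
        "D": ("MCIU", 4, ("C",)),
        "E": ("MCIU", 4, ("D",)),
    }
    return [(name, *table[name]) for name in order]


results = {}
for order in ("ABCDE", "ACBDE"):
    fast = total_cycles(build(order, 2))   # assumed fast load latency
    slow = total_cycles(build(order, 9))   # assumed slow load latency
    results[order] = (fast, slow)
    print(order, "fast load:", fast, "slow load:", slow)
```

In the order A B C D E the faster load yields the longer schedule (12 versus 11 cycles), whereas moving the independent instruction C ahead of B gives 11 cycles in both cases. This holds for this toy model only; in general, the absence of anomalies after a reordering would have to be shown for the actual hardware.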

SLIDE 32

Summary

• So far, no feasible check for timing anomalies is known
• Extend code generators to produce SW patterns that avoid timing anomalies
• Develop more predictable systems

  • hardware components (e.g., scratchpads instead of caches, decisions by the compiler instead of the processor)
  • adequate software design patterns (e.g., time-triggered (static) actions)

SLIDE 33

Further Reading

  • Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and Precise WCET Prediction by Separated Cache and Path Analyses. Real-Time Systems 18(2/3), Kluwer, 2000.
  • Raimund Kirner and Martin Schöberl. Modeling the Function Cache for Worst-Case Execution Time Analysis. In Proc. 44th ACM Design Automation Conference, 2007.
  • Raimund Kirner, Albrecht Kadlec, and Peter Puschner. Precise Worst-Case Execution Time Analysis for Processors with Timing Anomalies. In Proc. 21st Euromicro Conf. on Real-Time Systems, 2009.