Conditions for and effects of CARD cache implementations
Gustaf Räntilä and Mikael Wånggren, {e99_gra,e99_mwa}@e.kth.se


SLIDE 1

Conditions for and effects of CARD cache implementations
Gustaf Räntilä and Mikael Wånggren {e99_gra,e99_mwa}@e.kth.se

SLIDE 2

Agenda

  • Problem formulation
  • Hypothesis
  • Our approach and methods
  • Results (and their reliability)
  • Questions

SLIDE 3

Problem formulation

  • Context switches degrade performance

    – Interactive systems (with short timeslices) are extra sensitive
    – Overhead: saving and loading registers & processor state, scheduling
    – Flushing caches, TLB, prediction buffers etc. → need to rebuild them every new timeslice

SLIDE 4

Hypothesis

  • We can decrease the negative effects of context switches by “caching the cache”
  • How? On a context switch, activate a CARD cache

    – Save process-specific data (cache, buffers etc.)
    – Load ditto for the next process

  • CARD: Context switch Active – Run-time Drowsy

    – Sleeps while “programs run”
    – Awakens on context switch
    – Hardware implementation not discussed in this project
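The save-and-restore idea on this slide can be sketched as a toy software model (an illustrative Python sketch, not the hardware design; the cache is simplified to a dict of line address → data, and `CardCache` and its method names are hypothetical):

```python
class CardCache:
    """Toy model of the CARD idea: on a context switch, the live cache
    contents are stashed per PID and restored for the incoming process.
    Illustrative sketch only, not the hardware implementation."""

    def __init__(self):
        self.card = {}           # PID -> saved cache contents
        self.live = {}           # the "live" cache: line address -> data
        self.current_pid = None

    def context_switch(self, next_pid):
        # Save process-specific data for the outgoing process ...
        if self.current_pid is not None:
            self.card[self.current_pid] = dict(self.live)
        # ... and load ditto for the next process.
        self.live = dict(self.card.get(next_pid, {}))
        self.current_pid = next_pid

card = CardCache()
card.context_switch(42)      # process 42 starts running
card.live[0x1000] = "data"   # it warms up its cache
card.context_switch(7)       # switch away: 42's cache is stashed
card.context_switch(42)      # switch back: the cache is warm again
print(card.live)             # {4096: 'data'}
```

In this model a process resumes with exactly the cache contents it had when it was switched out, which is the effect the hypothesis predicts will reduce the rebuild cost.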

SLIDE 5

Issues not discussed in this project

  • Many processes → a huge CARD cache

    – The scheduler can prioritize the most suitable processes

  • Kernel–CPU interaction

    – New instructions required

SLIDE 6

Our approach and methods

  • We only save and restore the cache (not registers etc.)
  • Simics 2.0 for full-system simulation

    – g-cache as the cache model

  • x86 20 MHz hardware model
  • Red Hat Enterprise 7.3 with a Linux 2.6 kernel

SLIDE 7

Our approach and methods contd.

  • Cache setup (we mimic an XScale)

    – 32 kB L1 i-cache and 32 kB L1 d-cache

      ∗ 32-way, virtually indexed, physically tagged
      ∗ i-cache policy: LRU; d-cache: random
      ∗ 1-cycle penalty for a hit
      ∗ 50-cycle penalty for a miss
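Given these latencies, the cost of running on a cold cache after a context switch can be quantified with the standard average-memory-access-time formula (the miss rates below are illustrative, not measured numbers; we assume the 50-cycle penalty is paid on top of the 1-cycle hit latency):

```python
def amat(miss_rate, hit_cycles=1, miss_penalty=50):
    # Average memory access time: every access pays the hit latency,
    # and misses additionally pay the miss penalty (slide values).
    return hit_cycles + miss_rate * miss_penalty

print(amat(0.25))   # 25% misses (warming up)   -> 13.5 cycles/access
print(amat(0.50))   # 50% misses (cold cache)   -> 26.0 cycles/access
```

The gap between the two numbers is the kind of per-access overhead that restoring a warm cache on a context switch is meant to avoid.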

SLIDE 8

Implementation

  • Requirements

    – Identifying context switches in Simics

      ∗ Break on execution of __switch_to
      ∗ Re-build the kernel with magic instructions

    – Grab the PID to use as a key into the CARD

      ∗ Currently requires magic instructions

SLIDE 9

Magic instructions in Linux

  • Magic instructions do no harm
  • Our procedure in Linux

    – Before the context switch

      ∗ Set eax to 0 and call the magic instruction

    – After the context switch

      ∗ Copy the PID to eax and call the magic instruction

SLIDE 10

Magic instructions in Simics (Python)

  • Simics has native support for magic instructions
  • Our procedure in Python

    – Break on the MI and read eax
    – Load or save the current cache to the CARD
    – Start a temporal breakpoint chain

      ∗ For every temporal breakpoint, store statistics
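The breakpoint chain in the last step can be sketched in plain Python (the Simics machinery is replaced by stand-ins: a loop over simulated cycles instead of re-armed time breakpoints, and a hypothetical `sample` callback instead of querying the g-cache counters):

```python
def collect_stats(total_cycles, interval, sample):
    """Sample cache statistics at every temporal breakpoint.
    `sample` is a hypothetical stand-in for reading g-cache counters."""
    samples = []
    for cycle in range(interval, total_cycles + 1, interval):
        samples.append((cycle, sample(cycle)))  # store statistics
    return samples

# Example: a fake hit counter, sampled every 1000 cycles for 5000 cycles.
stats = collect_stats(5000, 1000, lambda cycle: cycle // 10)
print(stats[0], stats[-1])   # (1000, 100) (5000, 500)
```

The point of the chain is that statistics are recorded at fixed intervals of simulated time rather than only at context switches, so warm-up behaviour within a timeslice is visible.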

SLIDE 11

Experimentation

  • We simulate applications of different behaviour
  • From MiBench

    – lame, calculation heavy
    – dijkstra, both calculation and data heavy
    – crc32, a very common sequential application

  • Home-made

– string search, data heavy

SLIDE 12

Experimentation contd.

  • Simulations run on a pre-emptive kernel

    – But it’s not easy to force “pre-emption”

  • We want context switches!

    – We loop “ps >> file” in the background to force context switches
    – Thereby we also get the program’s PID


SLIDES 13–29

(Results: graph-only slides; no text captured in the source.)

SLIDE 30

Reliability of the results

  • Longer runs would eliminate start-up slowdown
  • Do we use a decent cache setup?
  • L2 Cache?
  • Is our clock frequency fair?

SLIDE 31
  • Questions?
