SLIDE 1

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Presented by Hadeel Alabandi

2/18/2014

SLIDE 2

Introduction and Motivation

  • Cache partitioning and sharing is a serious issue for the effective utilization of multicore processors
  • Existing studies evaluated cache partitioning through simulation, which has several limitations:
  • Excessive simulation time
  • Absence of OS activities
  • Proneness to simulation inaccuracy

SLIDE 3

Introduction and Motivation (cont.)

  • In this paper, a software approach is used
  • It supports static and dynamic cache partitioning through memory address mapping
  • It emulates a hardware partitioning mechanism and examines cache partitioning policies on real systems
  • Three metrics were used in the evaluation for optimization purposes:
  • Performance
  • Fairness
  • QoS

SLIDE 4

Cache Partitioning for Multicore Processors

  • It has two interdependent parts:
  • Mechanism
  • Enforces cache partitioning
  • Provides input to the partitioning policy
  • Policy
  • Decides how much cache resource is allocated to each program, given an optimization objective

SLIDE 5

Adopted Evaluation Metrics in The Study

  • Performance metrics
  • Throughput (IPC)
  • Sum of the absolute IPCs of all programs
  • Combined miss rate
  • Aggregate miss rate over all programs
  • Combined misses
  • Total number of cache misses over all programs
  • QoS metrics
  • It is assumed that QoS constraints are never violated in this case
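
The three performance metrics above can be sketched directly from per-program hardware counters. This is a minimal illustration, assuming counter values (instructions, cycles, L2 accesses, L2 misses) collected per program, e.g. with a tool such as pfmon; the dictionary layout and numbers are made up.

```python
# Sketch of the three performance metrics from per-program counters.

def throughput(stats):
    """Throughput: sum of per-program IPCs (instructions per cycle)."""
    return sum(s["instructions"] / s["cycles"] for s in stats)

def combined_miss_rate(stats):
    """Combined miss rate: total L2 misses over total L2 accesses."""
    total_misses = sum(s["l2_misses"] for s in stats)
    total_accesses = sum(s["l2_accesses"] for s in stats)
    return total_misses / total_accesses

def combined_misses(stats):
    """Combined misses: total number of L2 misses."""
    return sum(s["l2_misses"] for s in stats)

# Hypothetical two-program workload.
workload = [
    {"instructions": 8e9, "cycles": 1e10, "l2_accesses": 2e8, "l2_misses": 1e7},
    {"instructions": 5e9, "cycles": 1e10, "l2_accesses": 3e8, "l2_misses": 6e7},
]
print(throughput(workload))          # 0.8 + 0.5
print(combined_miss_rate(workload))  # 7e7 / 5e8
print(combined_misses(workload))
```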

SLIDE 6

Adopted Evaluation Metrics in The Study (cont.)

  • Fairness metrics
  • Miss rates
  • The number of misses
  • The slowdown of each co-scheduled program should be identical after cache partitioning
  • In the study, fairness metrics are measured relative to single-core execution with a dedicated L2 cache
  • Data required for the policy metric and the evaluation metric were acquired by running a workload with different cache partitionings
  • The resulting correlation value lies in the range (-1 to 1)
  • If the result is 1, the correlation between the two metrics is perfect
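
A correlation in (-1, 1) of this kind is typically a Pearson coefficient. The sketch below computes it between a policy metric and an evaluation metric sampled over several partitionings; the metric values are invented for illustration and the helper name is not from the paper.

```python
# Pearson correlation between a policy metric and an evaluation metric,
# measured over several cache partitionings (values here are made up).
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

policy_metric = [0.90, 0.85, 0.70, 0.60]   # e.g. miss-rate fairness per partitioning
eval_metric   = [0.92, 0.88, 0.75, 0.58]   # e.g. slowdown fairness per partitioning
r = pearson(policy_metric, eval_metric)
print(r)   # close to 1 => the policy metric tracks the evaluation metric well
```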

SLIDE 7

Static OS-based Cache Partitioning

  • A static cache partitioning policy predetermines the amount of cache blocks allocated to each program at the beginning of its execution
  • Page coloring is used as the partitioning mechanism
  • Several bits of the physical address are shared between the cache set index and the physical page number
  • These bits are used as the page color
  • The physically addressed cache is divided into non-intersecting regions by page color
  • Pages with the same color are mapped to the same cache region
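
The color bits can be sketched numerically. Assuming the platform described later in the deck (4 MB, 16-way L2 cache with 64-byte lines and standard 4 KB pages), the set-index bits that lie above the page offset form the page color; the helper name below is illustrative.

```python
# Deriving the page color from a physical address for a 4 MB, 16-way,
# 64-byte-line L2 cache with 4 KB pages.

PAGE_SIZE  = 4096                 # 4 KB pages: physical page number = addr >> 12
LINE_SIZE  = 64                   # cache line size in bytes
CACHE_SIZE = 4 * 1024 * 1024
WAYS       = 16

SETS          = CACHE_SIZE // (LINE_SIZE * WAYS)   # 4096 sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE             # 64 sets indexed within one page
NUM_COLORS    = SETS // SETS_PER_PAGE              # 64 colors

def page_color(phys_addr):
    """The color bits are the set-index bits above the page offset:
    they belong to both the cache set index and the physical page number."""
    set_index = (phys_addr // LINE_SIZE) % SETS
    return set_index // SETS_PER_PAGE  # equivalently (phys_addr >> 12) % NUM_COLORS

print(NUM_COLORS)              # the cache splits into 64 non-intersecting regions
print(page_color(0x12345000))  # pages with equal color map to the same region
```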

SLIDE 8

Cache Partitioning – Page Coloring

SLIDE 9

Cache Partitioning – Page Coloring

SLIDE 10

Dynamic OS-based Cache Partitioning

  • Adjusts cache quotas among processes dynamically
  • Page recoloring procedure
  • Increases a process's cache resources (i.e., the number of colors used by the process)
  • The kernel rearranges the virtual memory mapping of the process by
  • Allocating physical pages of the new color
  • Copying the memory contents
  • Freeing the old pages
  • Remapping virtual pages causes performance overhead
  • The overall overhead is reduced by lowering the frequency of cache allocation adjustments
  • Another option is a lazy method of page migration, in which the content of a recolored page is moved only when it is accessed
  • The average overhead of dynamic partitioning was reduced to 2%
  • The highest migration overhead observed was 7%
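
The recoloring steps (allocate new-color pages, copy contents, free old pages, remap) can be sketched as below. This models the kernel's page table and physical memory as plain Python dictionaries rather than real kernel structures; all names are illustrative.

```python
# Toy model of the page recoloring procedure described above.

def recolor(page_table, phys_mem, free_pages_by_color, new_color):
    """Remap every virtual page of a process onto pages of new_color."""
    for vpage, old_ppage in list(page_table.items()):
        new_ppage = free_pages_by_color[new_color].pop()  # allocate new-color page
        phys_mem[new_ppage] = phys_mem[old_ppage]         # copy the memory contents
        del phys_mem[old_ppage]                           # free the old page
        page_table[vpage] = new_ppage                     # remap the virtual page

page_table = {0: "p_c3_0", 1: "p_c3_1"}                   # virtual -> physical page
phys_mem = {"p_c3_0": b"data0", "p_c3_1": b"data1"}
free_pages_by_color = {7: ["p_c7_0", "p_c7_1"]}           # free list for color 7

recolor(page_table, phys_mem, free_pages_by_color, 7)
print(page_table)  # both virtual pages now map to color-7 physical pages
```

A lazy variant would defer the copy until the page is first accessed, trading a page fault per touched page for not copying cold pages at all.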

SLIDE 11

Page Recoloring

SLIDE 12

Dynamic Cache Partitioning Policies

  • The policies adjust cache partitioning periodically, at the end of each epoch
  • Dynamic cache partitioning policy for performance
  • Adjusts cache partitioning dynamically
  • Metrics
  • Throughput (IPC)
  • Combined miss rate
  • Combined misses
  • Fair speedup
  • Dynamic cache partitioning policy for fairness
  • Two dynamic policies were implemented, based on FM0 and FM4
  • FM0 is the evaluation metric (i.e., the ratio of the current cumulative IPC over the baseline IPC)
  • FM4 is based on cache miss rates
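
The epoch-driven adjustment can be sketched as a simple loop: at each epoch boundary, try shifting one color between the two programs and keep the move if the chosen metric improves. This is a hand-rolled hill-climbing sketch, not the paper's exact policy; `measure` stands in for reading hardware counters, and the toy metric is invented.

```python
# Epoch-driven cache quota adjustment, sketched as one-color hill climbing
# between two co-scheduled programs sharing `total_colors` page colors.

def run_epochs(measure, total_colors=64, epochs=10):
    colors_a = total_colors // 2                 # start from an even split
    best = measure(colors_a)
    for _ in range(epochs):                      # one adjustment per epoch
        for delta in (+1, -1):                   # try giving/taking one color
            trial = colors_a + delta
            if 1 <= trial <= total_colors - 1:   # each program keeps >= 1 color
                score = measure(trial)
                if score > best:                 # keep the move if metric improves
                    best, colors_a = score, trial
                    break
    return colors_a, best

# Toy metric that peaks when program A holds 40 of the 64 colors.
peak = lambda c: -(c - 40) ** 2
final_colors, final_score = run_epochs(peak)
print(final_colors)  # climbs from 32 toward the optimum at 40
```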

SLIDE 13

Dynamic Cache Partitioning Policies (cont.)

  • Dynamic cache partitioning policy for QoS consideration
  • A two-core workload of two programs
  • The first is the target program
  • The second is the partner program
  • QoS guarantee
  • Ensure the target program's performance is larger than or equal to X% of a baseline execution of a homogeneous workload on a dual-core processor, with half of the cache capacity allocated to each program
  • Increase the performance of the partner program
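
The QoS guarantee reduces to a simple check: the target program must retain at least X% of its half-cache baseline performance before the partner may take more cache. A minimal sketch, with made-up IPC numbers and an illustrative helper name:

```python
# QoS guarantee check: target IPC must stay >= X% of its baseline IPC
# (baseline = homogeneous run with half the cache per program).

def qos_ok(target_ipc, baseline_ipc, x_percent=95.0):
    """True if the target keeps at least X% of its half-cache baseline IPC."""
    return target_ipc >= baseline_ipc * x_percent / 100.0

baseline_ipc = 1.20                 # hypothetical half-cache baseline IPC
print(qos_ok(1.18, baseline_ipc))   # 1.18 >= 1.14 -> partition is acceptable
print(qos_ok(1.05, baseline_ipc))   # 1.05 <  1.14 -> violates the QoS guarantee
```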

SLIDE 14

Experimental Methodology

  • Hardware and software platform
  • Dell PowerEdge 1950
  • Two dual-core, 3.0 GHz Intel Xeon 5160 processors and 8 GB Fully Buffered DIMM (FB-DIMM) main memory
  • Shared, 4 MB, 16-way set-associative L2 cache
  • Each core has a private 32 KB instruction cache and a private 32 KB data cache
  • Red Hat Enterprise Linux 4.0
  • Kernel linux-2.6.20.3
  • Performance data collected using pfmon

SLIDE 15

Evaluation Results


  • Shows the improvement of the best static partitioning of each workload over a shared cache

SLIDE 16

The Performance – Static & Dynamic

SLIDE 17

Fairness – Correlation between Evaluation Metrics and Policy Metrics

SLIDE 18

QoS – Static & Dynamic

SLIDE 19

Related Work

  • Cache partitioning for multicore processors
  • Page Coloring

SLIDE 20

Summary

  • An OS-based cache partitioning mechanism for multicore processors was designed and implemented
  • It was used to study different cache partitioning policies
  • Some findings of simulation-based studies were confirmed; however, this approach also revealed new insights that had not been shown by simulation
  • Future work
  • Reduce the cache partitioning overhead
  • Add an easy user interface
  • Conduct partitioning research at the compiler level for both multiprogramming and multithreaded applications

SLIDE 21

Discussion

  • Did the OS-based approach provide new insights and observations that simulation could not show?

SLIDE 22

References

  • Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
  • http://www.contrib.andrew.cmu.edu/~hyoseunk/pdf/ecrts13-hyos-slides.pdf
  • http://ftp.cs.rochester.edu/~xiao/eurosys09/euro061-zhang.pdf
