An Imitation Learning Approach for Cache Replacement
Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn
The Need for Faster Compute
(https://openai.com/blog/ai-and-compute/)
Small cache improvements can make large differences! (Beckman, 2019)
- E.g., 1% cache hit rate improvement → 35% decrease in latency (Cidon et al., 2016)
Caches are everywhere:
- CPU chips
- Operating Systems
- Databases
- Web applications
Our goal: Faster applications via better cache replacement policies
TL;DR:
I. We approximate the optimal cache replacement policy by (implicitly) predicting the future
II. Caching is an attractive benchmark for the general reinforcement learning / imitation learning communities
Cache Replacement
[Slide animation: a cache holding lines such as {A, B, C} serves an access stream, e.g., B A C D A B A D C B A D. Each access is a hit if the line is already cached (~100x faster than a miss); on a miss, a cached line must be evicted. Evicting a line that is reused soon (e.g., C) is a mistake; the optimal decision evicts the line reused furthest in the future.]
Goal: Evict the cache lines to maximize cache hits
Reuse distance d_t(line): the number of accesses from access t until the line is next reused
Example: d_0(A) = 1, d_0(B) > 2, d_0(C) = 2
Optimal Policy (Belady's): Evict the line with the greatest reuse distance (Belady, 1966)
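A minimal sketch of Belady's policy, assuming (unrealistically) that the full future access sequence is available; the function name and trace are illustrative, not from the authors' code:

```python
def belady_evict(cache, future_accesses):
    """Evict the cached line with the greatest reuse distance."""
    def reuse_distance(line):
        # Number of accesses until `line` is next used; infinite if never.
        for dist, access in enumerate(future_accesses, start=1):
            if access == line:
                return dist
        return float("inf")
    return max(cache, key=reuse_distance)

# Matches the example above: d(A) = 1, d(C) = 2, d(B) > 2, so evict B.
print(belady_evict({"A", "B", "C"}, ["A", "C", "A", "D"]))  # -> B
```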
Belady’s Requires Future Information
Reuse distance d_t(line): the number of accesses from access t until the line is reused
Problem: Computing the reuse distance requires knowing the future
So in practice, we use heuristics, e.g.:
- Least-recently used (LRU)
- Most-recently used (MRU)
… but these perform poorly on complex access patterns (a minimal LRU sketch follows below)
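For concreteness, a sketch of the standard LRU heuristic (textbook LRU, not code from the talk):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least-recently used line on a miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered least -> most recently used

    def access(self, line):
        if line in self.lines:               # hit: mark as most recent
            self.lines.move_to_end(line)
            return "hit"
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict least-recently used
        self.lines[line] = None
        return "miss"
```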
Leveraging Belady’s
Idea: approximate Belady’s from past accesses
[Diagram: training setup. Belady's sees the past accesses, the current access, and the future accesses and produces the optimal decision; the learned model sees only the past accesses and the current access and is trained so its predicted decision matches Belady's.]
Prior Work
[Diagram: Hawkeye / Glider, the current state-of-the-art (Shi et al., '19; Jain et al., '18), train a binary classifier on Belady's to predict whether the current line is cache-friendly or cache-averse from the past accesses, current access, and current cache state; a traditional algorithm then picks the line to evict.]
+ binary classification is relatively easy to learn
- the traditional algorithm can't express the optimal policy
Our Approach
Our contribution: Directly approximate Belady's via imitation learning
[Diagram: our model maps the past accesses, current access, and current cache state directly to the eviction decision ("Evict line X"), trained on Belady's; in contrast, Hawkeye / Glider only predict a binary label and delegate the eviction decision to a traditional algorithm.]
Cache Replacement Markov Decision Process
[Slide animation: as the access stream B A C D B D C B A D unfolds, each access is a hit or a miss, and on each miss the policy chooses a cached line to evict.]
State: the current cache contents, the past accesses, and the current access
Action: which cached line to evict on a miss
Similar to Wang et al., 2019 (a minimal environment sketch follows below)
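A minimal sketch of what a Gym-style environment for this MDP might look like; the class and method names are hypothetical assumptions, and the authors' open-source environment may differ:

```python
class CacheReplacementEnv:
    """Gym-style interface: reward 1 per cache hit, 0 per miss."""
    def __init__(self, accesses, cache_size):
        self.accesses = accesses          # full trace of line addresses
        self.cache_size = cache_size

    def reset(self):
        self.t, self.cache = 0, []
        return self._state()

    def _state(self):
        # State: current cache contents, past accesses, current access.
        return (tuple(self.cache), tuple(self.accesses[:self.t]),
                self.accesses[self.t])

    def step(self, evict_idx):
        # evict_idx indexes the cache; it only matters on a miss with a
        # full cache (on hits the action is ignored).
        line, reward = self.accesses[self.t], 0
        if line in self.cache:
            reward = 1                                # hit
        elif len(self.cache) >= self.cache_size:
            self.cache.pop(evict_idx)                 # miss: evict a line
        if line not in self.cache:
            self.cache.append(line)
        self.t += 1
        done = self.t >= len(self.accesses)
        return (None if done else self._state()), reward, done

env = CacheReplacementEnv(list("BACDBDCBAD"), cache_size=3)
state, done = env.reset(), False
```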
Leveraging the Optimal Policy
Typical imitation learning setting
(Pomerleau, 1991; Ross et al., 2011; Kim et al., 2013)
[Diagram: the learned policy maps a state to a predicted action, approximating the optimal policy; training optimizes, e.g., a loss between the predicted action and the optimal action (a generic sketch follows below).]
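A generic sketch of such an imitation objective (cross-entropy behavior cloning is one common choice, not necessarily the paper's loss; names are illustrative):

```python
import torch.nn.functional as F

def imitation_loss(policy_logits, optimal_action):
    # policy_logits: (batch, num_lines) scores over evictable lines;
    # optimal_action: (batch,) index of the line Belady's would evict.
    return F.cross_entropy(policy_logits, optimal_action)
```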
Observation: Not all errors are equally bad
- Learning from the optimal policy yields greater training signal
Concretely: minimize a ranking loss over reuse distances (see the sketch below)
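One plausible instantiation of such a ranking loss (a sketch under assumptions; the paper's exact formulation may differ): a pairwise hinge loss encouraging lines with larger reuse distance to receive higher eviction scores.

```python
import torch

def ranking_loss(scores, reuse, margin=1.0):
    """scores[i]: the model's eviction score for cached line i;
    reuse[i]: Belady's reuse distance for line i."""
    # Pairwise hinge: for each pair with reuse[i] > reuse[j], require
    # scores[i] >= scores[j] + margin (evict later-reused lines first).
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)          # scores[i] - scores[j]
    rank = (reuse.unsqueeze(1) > reuse.unsqueeze(0)).float()  # 1 if i should outrank j
    return (rank * torch.clamp(margin - diff, min=0)).mean()
```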
Reuse Distance as an Auxiliary Task
Observation: Predicting reuse distance is correlated with cache replacement
- Cast this as an auxiliary task (Jaderberg et al., 2016)
[Architecture: the state s_t is mapped to a state embedding, which feeds both the policy and an auxiliary reuse-distance prediction; their losses are combined (see the sketch below).]
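A minimal sketch of the shared-embedding architecture with an auxiliary reuse-distance head; layer names and sizes are illustrative assumptions:

```python
import torch.nn as nn

class CachePolicyWithAux(nn.Module):
    """Shared state embedding feeding a policy head and an auxiliary
    reuse-distance head (illustrative sizes)."""
    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.policy_head = nn.Linear(hidden_dim, 1)  # eviction score per line
        self.reuse_head = nn.Linear(hidden_dim, 1)   # predicted reuse distance

    def forward(self, line_states):  # (num_lines, state_dim)
        h = self.embed(line_states)
        return self.policy_head(h).squeeze(-1), self.reuse_head(h).squeeze(-1)

# Combined objective (aux_weight is an assumed hyperparameter):
#   loss = ranking_loss(scores, reuse) + aux_weight * mse(pred_reuse, reuse)
```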
Results
[Chart: cache-hit rates shown relative to the LRU cache-hit rate and the optimal cache-hit rate.]
~19% cache-hit rate increase over Glider (Shi et al., 2019) on memory-intensive SPEC2006 applications (Jaleel et al., 2009)
~64% cache-hit rate increase over LRU on Google Web Search
A Note on Practicality
This work: Establish a proof-of-concept
[Diagram: an address such as 0x...C5A1 is split into bytes; each byte is embedded separately, and a linear layer combines the per-byte embeddings into the address embedding.]
Per-byte address embedding (see the sketch below):
- Reduces the embedding size from 100MB to <10KB
- ~6% cache-hit rate increase on SPEC2006 vs. Glider
- ~59% cache-hit rate increase on Google Web Search vs. LRU
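A minimal sketch of a per-byte address embedding; all names and dimensions are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class PerByteAddressEmbedder(nn.Module):
    """Embed each byte of an address with a tiny 256-entry table,
    then mix the per-byte embeddings with a linear layer."""
    def __init__(self, num_bytes=8, byte_dim=16, out_dim=64):
        super().__init__()
        # 8 tables * 256 entries * 16 floats: far smaller than one
        # table with an entry per distinct address.
        self.byte_tables = nn.ModuleList(
            nn.Embedding(256, byte_dim) for _ in range(num_bytes))
        self.proj = nn.Linear(num_bytes * byte_dim, out_dim)

    def forward(self, addresses):  # (batch,) int64 addresses
        embs = [table((addresses >> (8 * i)) & 0xFF)
                for i, table in enumerate(self.byte_tables)]
        return self.proj(torch.cat(embs, dim=-1))

print(PerByteAddressEmbedder()(torch.tensor([0xC5A1])).shape)  # (1, 64)
```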
Future work: Production-ready learned policies
- Smaller models via distillation (Hinton et al., 2015), pruning (Janowsky, 1989; Han et al., 2015; Sze et al., 2017), or quantization
- Target domains with longer latency and larger caches (e.g., software caches)
A New Imitation / Reinforcement Learning Benchmark
- Games (Bellemare et al., 2012; Silver et al., 2017; OpenAI, 2019; Vinyals et al., 2019): + plentiful data, - delayed real-world utility
- Robotics (Levine et al., 2016; Lillicrap et al., 2015): + immediate real-world impact, - limited / expensive data
- Cache replacement: + plentiful data, + immediate real-world impact
Open-source cache replacement Gym environment coming soon!
Takeaways
- A new state-of-the-art approach for cache replacement by imitating the oracle policy
○ Future work: making this production ready
- A new benchmark for imitation learning / reinforcement learning research