CS6354: Snooping Cache Coherency, 7 October 2016



slide-1
SLIDE 1

CS6354: Snooping Cache Coherency

7 October 2016

1

slide-2
SLIDE 2

To read more…

Today’s papers:

Goodman, “Using cache memory to reduce processor-memory traffic”
Archibald and Baer, “Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model”

Supplementary readings:

Hennessy and Patterson, section 5.3

1

slide-3
SLIDE 3

caching shared memories

CPU1 and CPU2 each have a cache; both share MEM1.

CPU1 cache:
  address  value
  0xA300   100
  0xC400   200
  0xE500   300

CPU2 cache:
  address  value
  0x9300   172
  0xA300   100
  0xC500   200

CPU1 writes 101 to 0xA300?

When does this change? When does this change?

2

slide-4
SLIDE 4

caching shared memories

CPU1 and CPU2 each have a cache; both share MEM1.

CPU1 cache:
  address  value
  0xA300   101 (was 100)
  0xC400   200
  0xE500   300

CPU2 cache:
  address  value
  0x9300   172
  0xA300   100
  0xC500   200

CPU1 writes 101 to 0xA300?

When does this change? When does this change?
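A minimal sketch of the problem, assuming two private write-back caches with no coherence protocol at all. The `Cache` class and `demo` function are hypothetical names used only for illustration.

```python
class Cache:
    """Private write-back cache with no coherence protocol (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                      # address -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: memory not updated yet


def demo():
    memory = {0xA300: 100}
    cpu1, cpu2 = Cache(memory), Cache(memory)
    cpu1.read(0xA300)
    cpu2.read(0xA300)                        # both caches now hold 100
    cpu1.write(0xA300, 101)                  # only CPU1's copy changes
    return cpu1.read(0xA300), cpu2.read(0xA300), memory[0xA300]


print(demo())  # (101, 100, 100): CPU2 and memory still see the old value
```

Nothing in this model ever tells CPU2 or memory about the write, which is the gap the coherency states are meant to close.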

2

slide-5
SLIDE 5

cache coherency states

extra information for each cache block

overlaps with valid, dirty bits

stored in each cache
different caches may have different states for same block

3

slide-6
SLIDE 6

scheme 1: MSI

[State diagram: Invalid (start), Shared, Modified. Edge labels: read, write, read or write, hear read, hear write, hear read or writeback.]

Annotations:
Modified state: required to write; leaving it updates memory
transitions to Invalid: triggered by others writing

dashed: overheard on bus; blue: message sent on bus

4

slide-10
SLIDE 10

scheme 1: MSI

State      hear read   hear write   read     write
Invalid    -           -            Shared   Modified
Shared     -           Invalid      -        Modified
Modified   Shared      Invalid      -        -

(- means the state does not change)

blue: transition sends bus signal
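The table can be written down directly as a lookup. This is a sketch of one cache's view of one block; the names `MSI_TRANSITIONS`, `step`, and `run` are made up for this example, and which transitions send a bus signal is not modeled.

```python
# (state, event) -> next state; pairs that are missing leave the state unchanged
MSI_TRANSITIONS = {
    ("Invalid", "read"): "Shared",
    ("Invalid", "write"): "Modified",
    ("Shared", "write"): "Modified",
    ("Shared", "hear write"): "Invalid",
    ("Modified", "hear read"): "Shared",
    ("Modified", "hear write"): "Invalid",
}


def step(state, event):
    """Return the next MSI state of a block after one event."""
    return MSI_TRANSITIONS.get((state, event), state)


def run(events, state="Invalid"):
    """Apply a sequence of events and return the state after each one."""
    history = []
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


print(run(["read", "write", "hear read", "hear write"]))
# ['Shared', 'Modified', 'Shared', 'Invalid']
```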

5

slide-11
SLIDE 11

MSI example

Initial state (CPU1 and CPU2 caches, shared MEM1):

CPU1 cache:
  address  value  state
  0xA300   100    Shared
  0xC400   200    Shared
  0xE500   300    Shared

CPU2 cache:
  address  value  state
  0x9300   172    Shared
  0xA300   100    Shared
  0xC500   200    Shared

Step 1: CPU1 writes 101 to 0xA300
  bus: “CPU1 is writing 0xA300”
  CPU2’s cache sees the write: invalidate 0xA300
  Memory updated*
  CPU1: 0xA300 = 101, Modified; CPU2: 0xA300 = 100, Invalid

Step 2: CPU1 writes 102 to 0xA300
  Modified state: nothing communicated!
  Nothing changed yet (writeback)
  CPU1: 0xA300 = 102, Modified; CPU2: 0xA300 = 100, Invalid

Step 3: CPU2 reads 0xA300
  bus: “What is 0xA300?”
  CPU1 is in Modified state: must update for CPU2!
  bus: “Write 102 into 0xA300”
  Written back to memory early
  CPU1: 0xA300 = 102, Shared (could also become Invalid at CPU1)
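The same trace can be replayed in a small simulation. This is a sketch under simplifying assumptions: each block holds a single value, the first write only invalidates other copies (it does not update memory), and `Bus`, `MSICache`, and `run_example` are hypothetical names.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory      # address -> value
        self.caches = []
        self.log = []             # bus messages, in order


class MSICache:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.lines = {}           # address -> [value, state]
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def _state(self, addr):
        return self.lines[addr][1] if addr in self.lines else "Invalid"

    def read(self, addr):
        if self._state(addr) == "Invalid":
            self.bus.log.append(f"{self.name}: What is 0x{addr:X}?")
            for cache in self._others():
                cache.hear_read(addr)            # a Modified holder writes back
            self.lines[addr] = [self.bus.memory[addr], "Shared"]
        return self.lines[addr][0]

    def write(self, addr, value):
        if self._state(addr) != "Modified":
            self.bus.log.append(f"{self.name} is writing 0x{addr:X}")
            for cache in self._others():
                cache.hear_write(addr)
        self.lines[addr] = [value, "Modified"]   # repeat writes stay silent

    def hear_read(self, addr):
        if self._state(addr) == "Modified":
            value = self.lines[addr][0]
            self.bus.log.append(f"{self.name}: Write {value} into 0x{addr:X}")
            self.bus.memory[addr] = value
            self.lines[addr][1] = "Shared"

    def hear_write(self, addr):
        if self._state(addr) == "Modified":
            self.bus.memory[addr] = self.lines[addr][0]
        if addr in self.lines:
            self.lines[addr][1] = "Invalid"


def run_example():
    bus = Bus({0xA300: 100})
    cpu1, cpu2 = MSICache("CPU1", bus), MSICache("CPU2", bus)
    cpu1.read(0xA300)
    cpu2.read(0xA300)             # both Shared, as in the initial tables
    del bus.log[:]                # only log the example itself
    cpu1.write(0xA300, 101)
    cpu1.write(0xA300, 102)
    cpu2.read(0xA300)
    return cpu1.lines[0xA300], cpu2.lines[0xA300], bus.memory[0xA300], bus.log
```

In this model the second write produces no bus message, and memory only sees 102 once CPU2's read forces the writeback.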

6

slide-16
SLIDE 16

update memory

to write value (enter Modified state), only need to invalidate others
more efficient: shorter bus message

7

slide-17
SLIDE 17
on cache replacement/writeback

still happens, e.g. want to store something else
changes state to Invalid
requires writeback if Modified (= dirty bit)
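As a sketch, replacement only has to check the state before dropping the block. The `evict` helper and the `(value, state)` tuple layout are assumptions made for this example.

```python
def evict(cache_lines, memory, addr):
    """Replace the block at `addr`: write back if Modified, then mark Invalid.

    cache_lines maps address -> (value, state); returns True if a writeback happened.
    """
    value, state = cache_lines[addr]
    wrote_back = state == "Modified"
    if wrote_back:
        memory[addr] = value          # Modified plays the role of the dirty bit
    cache_lines[addr] = (value, "Invalid")
    return wrote_back
```

A Shared block is simply dropped, since memory already holds the same value.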

8

slide-18
SLIDE 18

scheme 1: MSI

Modified: value is different than memory and I am the only one who has it
Shared: value is the same as memory
Invalid: I don’t have the value; I will need to ask for it

9

slide-19
SLIDE 19

MSI complaints

modifying a value (read, then write, then write) often takes three messages:
  initial read from memory
  invalidate other caches (and maybe write to memory) on initial write
  final writeback

10

slide-20
SLIDE 20

scheme 2: MESI

Modified: value is different than memory and I am the only one who has it
Exclusive: value is same as memory and I am the only one who has it
Shared: value is the same as memory
Invalid: I don’t have the value; I will need to ask for it
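To see what the Exclusive state buys, the sketch below counts the bus messages one cache sends for a single block under MSI and under MESI. It is a simplified model: `bus_messages` and its `others_have_copy` flag are hypothetical, and a write from Invalid is counted as one message.

```python
def bus_messages(protocol, ops, others_have_copy=False):
    """Count bus messages one cache sends for `ops` on a single block.

    protocol is "MSI" or "MESI"; ops is a list of "read", "write", "evict".
    """
    state, count = "Invalid", 0
    for op in ops:
        if op == "read":
            if state == "Invalid":
                count += 1                      # fetch the block
                if protocol == "MESI" and not others_have_copy:
                    state = "Exclusive"
                else:
                    state = "Shared"
        elif op == "write":
            if state in ("Invalid", "Shared"):
                count += 1                      # tell other caches to invalidate
            state = "Modified"                  # Exclusive -> Modified is silent
        elif op == "evict":
            if state == "Modified":
                count += 1                      # final writeback
            state = "Invalid"
    return count


ops = ["read", "write", "write", "evict"]
print(bus_messages("MSI", ops), bus_messages("MESI", ops))  # 3 2
```

When another cache already holds the block, the read lands in Shared and MESI sends the same three messages as MSI.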

11

slide-21
SLIDE 21

scheme 2: MESI

[State diagram: Invalid (start), Exclusive, Shared, Modified. Edge labels: read from memory, read from cache, read, write, read or write, hear read, hear write.]

blue = message sent

Annotations:
caches must respond if they have a copy
change state and return unchanged value
need to write value to memory, otherwise no one will

12

slide-24
SLIDE 24

read for ownership

reading to modify a value soon?
read into Exclusive state even if reading from cache
invalidate and read
second way to enter Exclusive state

13

slide-25
SLIDE 25

MESI complaints

have to update memory to share a modified value
… even though caches read from other caches
read from which cache?

14

slide-27
SLIDE 27

scheme 3: MOESI

Modified: value is different than memory and I am the only one who has it
Owned: value is different than memory and I must update memory
Exclusive: value is same as memory and I am the only one who has it
Shared: value is same as memory or cache in Owned state
Invalid: I don’t have the value

16

slide-28
SLIDE 28

scheme 3: MOESI

[State diagram: Invalid, Exclusive, Shared, Modified, Owned. Edge labels: read memory, read cache, read, write, read or write, hear read, hear write, hear any.]

blue = message sent

Annotations:
send value to caches, but not memory
writing notifies other caches (unlike Modified state)
invalidate only due to cache replacement

17

slide-32
SLIDE 32

MOESI example

CPU1 and CPU2 caches start empty; MEM1 holds 0xA300 = 100.

Step 1: CPU1: read 0xA300
  bus: CPU1: “What is 0xA300”, Memory: “0xA300 = 100”
  CPU1: 0xA300 = 100, Exclusive

Step 2: CPU1: write 0xA300
  (no bus message)
  CPU1: 0xA300 = 101, Modified

Step 3: CPU1: read 0xA300
  (no bus message)
  CPU1: 0xA300 = 101, Modified

Step 4: CPU2: read 0xA300
  bus: CPU2: “What is 0xA300”, CPU1: “0xA300 = 101”
  CPU1: 0xA300 = 101, Owned; CPU2: 0xA300 = 101, Shared

Step 5: CPU2: write 0xA300
  bus: CPU2: “I’m changing 0xA300”
  CPU1: 0xA300 = 101, Invalid; CPU2: 0xA300 = 102, Modified
18

slide-39
SLIDE 39

MSI versus MESI versus MOESI

CPU1: read 0xA300
CPU1: write 0xA300     MSI: invalidate
CPU1: read 0xA300
CPU2: read 0xA300      MSI/MESI: memory write
CPU2: write 0xA300     MSI: invalidate

19

slide-40
SLIDE 40

Other cache coherency options

can invalidate instead of updating other caches on write
invalidation message faster to send than new value
tradeoff: faster if other cache won’t use value

20

slide-41
SLIDE 41

Dropping states from MOESI

Modified: value is different than memory and I am the only one who has it
Owned: value is different than memory and I must update memory
Exclusive: value is same as memory and I am the only one who has it
Shared: value is same as memory or cache in Owned state
Invalid: I don’t have the value

21

slide-43
SLIDE 43

Mapping to the paper

MSI + reread to get in Modified: Synapse
MESI + full-write-to-invalidate: write-once
MOSI + forward-on-write: Berkeley
MESI + forward-on-write: Illinois
MESI + invalidate-on-write: Firefly
MOESI + forward-on-write: Dragon

22

slide-44
SLIDE 44

“System Power”

sum of processor utilizations
how much time are CPUs spending waiting for bus?
what about overlapping cache accesses and computation??

23

slide-45
SLIDE 45

overhead if almost no shared data

24

slide-46
SLIDE 46

overheads without sharing data

sending invalidation signals no other cache needs
reloading value from memory no cache needs (Synapse)

25

slide-47
SLIDE 47

simulation caveats

workloads?
variation in hardware?

26

slide-48
SLIDE 48

false sharing

cache blocks are shared even if you are accessing different parts
huge performance problem with writes
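A rough way to see the cost is to count how often a write lands in a block that another CPU wrote last. This is a sketch only: `count_invalidations` is a hypothetical helper, the 64-byte block size is an assumed typical value, and ownership changes stand in for invalidation messages.

```python
BLOCK_SIZE = 64  # bytes per cache block (assumed, typical size)


def count_invalidations(writes, block_size=BLOCK_SIZE):
    """writes is a sequence of (cpu, byte_address) pairs.

    Counts writes that hit a block last written by a different CPU,
    i.e. writes that would force that CPU's copy to be invalidated.
    """
    last_writer = {}                      # block number -> CPU that wrote it last
    invalidations = 0
    for cpu, addr in writes:
        block = addr // block_size
        if block in last_writer and last_writer[block] != cpu:
            invalidations += 1
        last_writer[block] = cpu
    return invalidations


same_block = [(0, 0), (1, 8)] * 4         # different words, same block
separate_blocks = [(0, 0), (1, 64)] * 4   # each CPU has its own block
print(count_invalidations(same_block), count_invalidations(separate_blocks))  # 7 0
```

The two CPUs never touch the same word in either trace; only the block layout differs.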

27

slide-49
SLIDE 49

Present-day snooping cache coherency

AMD processors use MOESI
Intel uses something called MESIF
plus some techniques we’ll talk about next time

28

slide-50
SLIDE 50

MESIF states

Modified: value is different than memory and I am the only one who has it
Exclusive: value is same as memory and I am the only one who has it
Shared: value is same as memory
Invalid: I don’t have the value
Forwarding: value is same as memory and I should provide it if requested

29

slide-51
SLIDE 51

Forwarding state: lower traffic

Image from Kanter, “The Common System Interface: Intel’s Future Interconnect” http://www.realworldtech.com/common-system-interface/5/

30

slide-52
SLIDE 52

Non-bus topologies

necessary to connect large numbers of caches
higher bandwidth, if you don’t broadcast everything
next time: avoiding broadcast

31

slide-53
SLIDE 53

timing trickiness

CPU1 CPU2 CPU3 CPU4

CPU1 is changing X
CPU4 is changing X

32

slide-54
SLIDE 54

compare-and-swap

compare-and-swap(address, expect-old-value, new-value) {
    atomically {
        if (expect-old-value == memory[address]) {
            memory[address] = new-value
        }
    }
}
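A typical use of compare-and-swap is a retry loop. The sketch below emulates the atomic step with a lock, since Python has no hardware compare-and-swap; the `Cell` class, `atomic_increment`, and `run` are hypothetical, and returning a success flag is an addition to the pseudocode above.

```python
import threading


class Cell:
    """One memory word with compare-and-swap (atomicity emulated by a lock)."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:                      # stands in for "atomically"
            if self.value == expected:
                self.value = new
                return True
            return False


def atomic_increment(cell):
    """Retry until no other thread changed the value between read and swap."""
    while True:
        old = cell.value
        if cell.compare_and_swap(old, old + 1):
            return old + 1


def run(threads=4, per_thread=1000):
    cell = Cell(0)

    def worker():
        for _ in range(per_thread):
            atomic_increment(cell)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.value
```

No increment is lost, because a swap only succeeds when the value is still the one that was read.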

33

slide-55
SLIDE 55

Implementing compare-and-swap

get block into Exclusive or Modified state

  read from memory/cache if necessary
  invalidate other caches if necessary

compare, if value matches, do write (Modified state)

34

slide-56
SLIDE 56

Coherency

common property: single ‘responsible’ cache for possibly changed values

Owned, Exclusive, Modified states

responsible cache must reply to reads of address

variation:

when is responsibility acquired? (only on write?)
when is it relinquished? (only on other’s write?)
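The common property can be stated as a check over the states of one block across all caches. This is an illustrative sketch; `responsible_cache` and `RESPONSIBLE_STATES` are names made up for the example.

```python
RESPONSIBLE_STATES = {"Modified", "Owned", "Exclusive"}


def responsible_cache(states):
    """states maps cache name -> state of one block.

    Returns the single responsible cache, or None if memory must answer reads.
    """
    holders = [name for name, state in states.items() if state in RESPONSIBLE_STATES]
    if len(holders) > 1:
        raise ValueError(f"more than one responsible cache: {holders}")
    return holders[0] if holders else None


print(responsible_cache({"CPU1": "Owned", "CPU2": "Shared"}))   # CPU1
print(responsible_cache({"CPU1": "Shared", "CPU2": "Shared"}))  # None
```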

35