
CSE 120

July 18, 2006, Day 5: Memory
Instructor: Neil Rhodes

Translation Lookaside Buffer (TLB): Implemented in Hardware

Cache to map virtual page numbers to page frame

Associative memory: HW looks up in all cache entries simultaneously

– Usually not big: 64-128 entries

TLB entry:

– Page number
– Valid
– Modified
– Protection
– Page frame
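As a rough sketch of what one such entry holds (the field names and widths here are illustrative, not taken from any particular MMU):

```c
#include <stdint.h>

/* Illustrative TLB entry: one cached virtual-to-physical mapping.
   Field names and widths are hypothetical, not from real hardware. */
typedef struct {
    uint64_t vpn;          /* virtual page number: the tag that is matched */
    uint64_t pfn;          /* page frame number this page maps to */
    unsigned valid    : 1; /* entry contains a usable translation */
    unsigned modified : 1; /* page has been written (dirty) */
    unsigned prot     : 3; /* protection bits, e.g. read/write/execute */
} tlb_entry_t;
```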

If not present, do an ordinary lookup, then evict an entry from the TLB and add the new one

– Evict which entry?

Serial/Parallel lookup

– Serial: first look in the TLB; if not found, then look in the page table
– Parallel: look in the TLB and in the page table in parallel; if not found in the TLB, the page-table lookup is already in progress


Software TLB Management

MMU doesn’t handle page tables; software does

On a TLB miss, generate a TLB fault and let the OS deal with it

Search a larger cache kept in memory

– Page containing the cache must be in the TLB for speed

If not in the cache, search the page table

Once the page frame, etc. is found, update the TLB
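A minimal sketch of that miss path, assuming a toy flat page table and a random eviction policy; the names (tlb, page_table, TLB_SIZE) and structure are invented for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

#define TLB_SIZE   64
#define PT_ENTRIES 1024                     /* toy flat page table */

typedef struct { uint64_t vpn, pfn; int valid; } tlb_entry_t;
typedef struct { uint64_t pfn; int present; } pte_t;

static tlb_entry_t tlb[TLB_SIZE];
static pte_t page_table[PT_ENTRIES];

/* Called by the OS on a TLB fault: look the page up in the larger
   in-memory page table, then evict a TLB entry and install the new mapping. */
int handle_tlb_miss(uint64_t vpn)
{
    if (vpn >= PT_ENTRIES || !page_table[vpn].present)
        return -1;                          /* genuine page fault: hand off to the pager */

    int victim = rand() % TLB_SIZE;         /* "evict which entry?" -- random, for simplicity */
    tlb[victim].vpn   = vpn;
    tlb[victim].pfn   = page_table[vpn].pfn;
    tlb[victim].valid = 1;
    return 0;
}
```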

Why not use hardware?

Logic to search the page table takes space on the die

Spend the die area on something else instead:

– Increase the memory cache
– Reduce cost/power consumption


TLB Summary

Cost Example

Direct memory access: 100ns

Without TLB: 200ns (lookup in page table first)

With TLB:

– Assume cost of TLB lookup is 10ns
– Assume TLB hit rate is 90%
– Serial lookup: average cost = .9*110ns + .1*200ns = 119ns
– Parallel lookup: average cost = .9*110ns + .1*(200ns-10ns) = 118ns
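The two averages can be checked directly; this small program just encodes the slide's accounting (a miss pays the 100ns page-table lookup plus the 100ns memory access, with the 10ns TLB probe overlapped in the parallel case):

```c
#include <stdio.h>

int main(void)
{
    double mem = 100.0, pt = 100.0, tlb = 10.0, hit = 0.90;

    /* Serial: probe the TLB; on a miss, pay the page-table lookup plus the memory access. */
    double serial   = hit * (tlb + mem) + (1.0 - hit) * (pt + mem);
    /* Parallel: the page-table lookup starts alongside the TLB probe,
       so a miss does not pay the 10ns probe on top. */
    double parallel = hit * (tlb + mem) + (1.0 - hit) * (pt + mem - tlb);

    printf("serial   = %.0f ns\n", serial);    /* 119 ns */
    printf("parallel = %.0f ns\n", parallel);  /* 118 ns */
    return 0;
}
```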

Caches are very sensitive to:

– Hit rate
– Cost of a cache miss

Note that TLB must be flushed on context switch

Unless TLB entries include process ID



Inverted Page Tables

Traditional page tables: 1 entry per virtual page

Inverted page tables: 1 entry per physical frame of memory

Why? Size

– With 64-bit virtual addresses, 4KB pages, and 256MB of RAM per process, the inverted page table needs only 65536 entries

Page Table Entry:

– Process ID
– Virtual page number
– Additional PTE info

Slow to search through a table with 65536 entries

Solution: hash table. The key is the virtual page number; each entry contains the virtual page number, process ID, and page frame
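A minimal sketch of such a lookup, assuming chained hashing; the table shape, hash function, and names (ipt_lookup, buckets) are illustrative only:

```c
#include <stdint.h>
#include <stddef.h>

#define NFRAMES 65536                 /* one bucket per physical frame, as on the slide */

typedef struct ipt_entry {
    uint32_t pid;                     /* process that owns this mapping */
    uint64_t vpn;                     /* virtual page number */
    uint64_t frame;                   /* physical frame it occupies */
    struct ipt_entry *next;           /* chain for hash collisions */
} ipt_entry_t;

static ipt_entry_t *buckets[NFRAMES];

static size_t hash(uint32_t pid, uint64_t vpn)
{
    return (size_t)((vpn ^ (pid * 2654435761u)) % NFRAMES);  /* any reasonable mix works */
}

/* Returns the frame holding (pid, vpn), or -1 if the page is not resident. */
long ipt_lookup(uint32_t pid, uint64_t vpn)
{
    for (ipt_entry_t *e = buckets[hash(pid, vpn)]; e != NULL; e = e->next)
        if (e->pid == pid && e->vpn == vpn)
            return (long)e->frame;
    return -1;
}
```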

Advantage:

Page table memory is proportional to physical memory

– Not to the logical address space
– Not to the number of processes

Disadvantage

Hard to share memory between processes


Inverted Page Tables

Hash table

Space: proportional to number of allocated memory frames

– 1 entry in hash table for each allocated page


[Figure: inverted page table lookup. The virtual address (p, offset) is hashed with hash(p) to index the table; the matching entry holds (pid, p) and gives the page frame f; the physical address is (f, offset), with the offset passed through unchanged.]

Segmentation vs. Paging


– Need the programmer be aware the technique is being used? Paging: no; Segmentation: yes
– How many linear address spaces are there? Paging: 1; Segmentation: many
– Can the total address space exceed the size of physical memory? Paging: yes; Segmentation: yes
– Can procedures and data be distinguished and separately protected? Paging: no; Segmentation: yes
– Can tables whose size fluctuates be accommodated easily? Paging: no; Segmentation: yes
– Is sharing of procedures between users facilitated? Paging: no; Segmentation: yes

Page Fault Handling for Paging

MMU generates a page fault (protection violation or page not present). The page fault handler must:

Save registers

Figure out the virtual address that caused the fault

– Often in a hardware register

If a protection problem, signal or kill the process

If writing to the page just beyond the currently-allocated stack:

– Allocate a free page to add to the stack
– Update the page table
– Restart the instruction for the faulting process
  • Must undo any partial effects

Else, signal or kill the process
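The stack-growth check above might look roughly like this; the one-page window below the current stack limit and the struct fields are illustrative assumptions, not how any particular kernel decides:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Toy per-process record; a real kernel keeps this in its VM structures. */
struct proc {
    uintptr_t stack_limit;   /* lowest currently-allocated stack address */
};

/* Decide whether a faulting write should grow the stack or be treated as an error. */
bool is_stack_growth(const struct proc *p, uintptr_t fault_addr, bool is_write)
{
    return is_write &&
           fault_addr <  p->stack_limit &&
           fault_addr >= p->stack_limit - PAGE_SIZE;   /* just below the allocated stack */
}
```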



Virtual Memory

Idea: use fast (small, expensive) memory as a cache for slow (large, cheap) disk

90/10 rule: processes spend 90% of their time in 10% of the code

Not all of a process’s address space need be in memory at a time

Illusion of near-infinite memory

More processes in memory (higher degree of multiprogramming)

Locality

Spatial: the likelihood of accessing a resource is higher if a resource close to it was just referenced

Temporal: the likelihood of accessing a resource is higher if it was recently accessed


Page Fault Handling for Virtual Memory

MMU generates a page fault (protection violation or page not present)

Save registers

Figure out the virtual address that caused the fault

– Often in a hardware register

If a protection problem, signal or kill the process

If no free frame, evict a page from memory (which one?)

– If modified, write it to the backing store (dedicated paging space or a normal file)
– Keep the disk location of this page (not in the page table, but in some other data structure)
  • The MMU doesn’t need to know the disk location
– Suspend the faulting process (resume when the write is complete)

Read data from the backing store for the faulting page

– From the backing store, or application code, or fill-with-zero
– Suspend the faulting process (resume when the read is complete)

Update the page table

Restart the instruction for the faulting process

– Must undo any partial effects
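As a toy, self-contained simulation of those steps (evict if necessary, write back only if dirty, read the page in, update the page table); everything here, from the array sizes to the FIFO-style victim sweep, is an illustrative assumption rather than the handler of any real OS:

```c
#include <stdio.h>
#include <string.h>

#define NPAGES  8            /* toy virtual pages */
#define NFRAMES 4            /* toy physical frames */

typedef struct { int present, dirty, frame; } pte_t;

static pte_t pt[NPAGES];
static int   frame_to_page[NFRAMES];      /* -1 while the frame is free */
static char  memory[NFRAMES][16];         /* toy frame contents */
static char  backing_store[NPAGES][16];   /* toy disk */
static int   hand;                        /* simple FIFO-ish victim sweep */

/* Handle "page not present": find or free a frame, read the page in, update the page table. */
static void page_fault(int vpn)
{
    int frame = -1;
    for (int f = 0; f < NFRAMES; f++)              /* look for a free frame first */
        if (frame_to_page[f] == -1) { frame = f; break; }

    if (frame == -1) {                             /* no free frame: evict one */
        frame = hand;
        hand = (hand + 1) % NFRAMES;
        int old = frame_to_page[frame];
        if (pt[old].dirty)                         /* write back only if modified */
            memcpy(backing_store[old], memory[frame], sizeof memory[frame]);
        pt[old].present = 0;
    }

    memcpy(memory[frame], backing_store[vpn], sizeof memory[frame]);   /* read the page in */
    pt[vpn] = (pte_t){ .present = 1, .dirty = 0, .frame = frame };
    frame_to_page[frame] = vpn;
    /* a real kernel would also update the TLB and restart the faulting instruction */
}

int main(void)
{
    memset(frame_to_page, -1, sizeof frame_to_page);
    for (int p = 0; p < NPAGES; p++)
        snprintf(backing_store[p], sizeof backing_store[p], "page%d", p);

    for (int p = 0; p < NPAGES; p++)
        page_fault(p);                             /* touch every page; forces evictions */

    printf("page 7 lives in frame %d: %s\n", pt[7].frame, memory[pt[7].frame]);
    return 0;
}
```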


Paging and Translation Lookaside Buffer


[Flowchart: paging with a TLB. The CPU checks the TLB; if the PTE is in the TLB, the CPU generates the physical address directly. Otherwise the page table is accessed; if the page is in main memory, the TLB is updated. If it is not, and no page frame is free, the OS first has a victim page transferred from main memory to disk; it then has the faulting page transferred from disk to main memory, updates the page table, and returns to the failed instruction.]

Page Replacement Policy

Resident Set Management

How many page frames are allocated to each active process?

– Fixed
– Variable

What existing pages can be considered for replacement?

– Local: only the process that caused the page fault
– Global: all processes

Cleaning policy

– Pre-cleaning: write dirty pages out prospectively
– Demand-cleaning: write dirty pages out only as needed

Fetch policy

– Demand paging
– Prepaging: load extra pages speculatively while you’re loading others
– Copy-on-write
  • Lazy duplication of pages. For example, on fork, don’t copy a data page until a write occurs.

Replacement Policy

Which page, among those eligible, should be replaced?

– All policies want to replace pages that won’t be needed for a long time
– Since most processes exhibit locality, recent behavior helps predict future behavior

Eligibility may be limited based on locked frames

– Kernel pages
– I/O buffers in kernel space



Page References

The assumption is that the sequence of page references exhibits locality

A reference string is the list of page numbers used by a program

For example, <0 1 2 3 0 1 4 0 1 2 3 4>

Consecutive references to the same page are removed

– That page better still be in memory!

Reference means read or write
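Collapsing consecutive repeats is a one-pass filter; a small illustrative helper (the names are arbitrary):

```c
#include <stdio.h>

/* Collapse consecutive references to the same page, per the reference-string
   convention on the slide. Compacts in place and returns the new length. */
int compress_refs(int *refs, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++)
        if (out == 0 || refs[i] != refs[out - 1])
            refs[out++] = refs[i];
    return out;
}

int main(void)
{
    int r[] = {0, 0, 1, 1, 1, 2, 3, 3, 0};
    int n = compress_refs(r, (int)(sizeof r / sizeof r[0]));
    for (int i = 0; i < n; i++)
        printf("%d ", r[i]);                 /* prints: 0 1 2 3 0 */
    printf("\n");
    return 0;
}
```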


Opt: the Optimal Page Replacement Policy

Swap out the page that will be used farthest in the future

Difficult to implement :) (it requires knowing future references)

Example reference string: <0 1 2 3 0 1 4 0 1 2 3 4>

Three page frames


FIFO: First-In First-Out

Swap out the page that’s been in memory the longest

Works well for swapping out initialization code

Not so good for often-used code


FIFO: Belady’s Anomaly

For FIFO, adding extra page frames can cause more page faults

Example reference string: <0 1 2 3 0 1 4 0 1 2 3 4>

Three page frames

Four page frames


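Running FIFO over that reference string makes the anomaly concrete; a small simulation written for this note (not from the slides), which reports 9 faults with three frames and 10 with four:

```c
#include <stdio.h>

/* Simulate FIFO page replacement over a reference string and count the faults. */
static int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[16];                  /* supports up to 16 frames for this demo */
    int head = 0, used = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;

        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];            /* a frame is still free */
        } else {
            frames[head] = refs[i];              /* overwrite the longest-resident page */
            head = (head + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};
    int n = (int)(sizeof refs / sizeof refs[0]);
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));   /* 9 faults  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));   /* 10 faults */
    return 0;
}
```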


Least Recently Used (LRU)

Remove the page that has been unused the longest

Hardware

Keep a counter in each PTE; on every use, copy the current value of a global reference counter into it; evict the PTE with the lowest value

Or, keep a linked list ordered by usage

Example reference string: <0 1 2 3 0 1 4 0 1 2 3 4>
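A sketch of that counter scheme, with the "hardware" counter modeled as a global variable (the structure and names are illustrative):

```c
#include <stdint.h>

#define NFRAMES 4

typedef struct { int page; uint64_t last_use; } frame_t;

static frame_t  frames[NFRAMES];
static uint64_t use_clock;               /* bumped on every memory reference */

/* What the hardware counter would do: stamp the frame on each reference. */
void touch(int f)
{
    frames[f].last_use = ++use_clock;
}

/* Evict the frame with the smallest stamp, i.e. the least recently used page. */
int lru_victim(void)
{
    int victim = 0;
    for (int f = 1; f < NFRAMES; f++)
        if (frames[f].last_use < frames[victim].last_use)
            victim = f;
    return victim;
}
```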

Clock (or Second-chance)

Choose the oldest page that hasn’t been referenced

Implementation:

– Pages are kept in a circular list
– R bit maintained by hardware in the PTE
  • HW: whenever a PTE is accessed (read or write of that page), the R bit is set to 1
  • SW: can set the R bit to 0 or 1
– When a page is loaded, set its R bit to 1
– The hand points to a particular page. When a page frame is needed, check the R bit of that page

  • If set, clear and move to next page
  • If not set, this is the page to free
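A minimal sketch of the sweep (data layout and names invented for illustration). Note the loop cannot run forever: after at most one full pass every R bit has been cleared, so the next frame examined is taken.

```c
#define NFRAMES 8

typedef struct { int page; int referenced; } frame_t;

static frame_t frames[NFRAMES];
static int hand;                                  /* the clock hand */

/* Sweep until a frame with R == 0 is found, clearing R bits along the way. */
int clock_victim(void)
{
    for (;;) {
        if (frames[hand].referenced) {
            frames[hand].referenced = 0;          /* give it a second chance */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;                    /* oldest page not recently referenced */
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```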


Clock

Two levels of pages:

  • Old pages (those not referenced in the last clock sweep)
  • New pages (those referenced in the last clock sweep)

Algorithm picks one of the old pages

Not necessarily the oldest one, as LRU would pick

Another way to look at it:

FIFO with a second chance (if the page at the front of the list has been referenced, clear its reference bit and put it at the back of the list)

Can it loop infinitely?


Nth Chance

Clock gives a second chance, so it can distinguish only 2 ages

Give n chances instead

Don’t evict a page unless the hand has swept by it n times

Need a counter in the PTE

The higher we make n, the closer it approximates LRU
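A sketch of the per-PTE counter idea (again with invented structures); a frame is only reclaimed after the hand has passed it unreferenced N_CHANCES times:

```c
#define NFRAMES   8
#define N_CHANCES 4              /* larger values approximate LRU more closely */

typedef struct { int page; int referenced; int sweeps; } frame_t;

static frame_t frames[NFRAMES];
static int hand;

/* Like clock, but a page must survive N_CHANCES unreferenced sweeps before eviction. */
int nth_chance_victim(void)
{
    for (;;) {
        frame_t *f = &frames[hand];
        if (f->referenced) {
            f->referenced = 0;               /* used since the last sweep: restart its count */
            f->sweeps = 0;
        } else if (++f->sweeps >= N_CHANCES) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        hand = (hand + 1) % NFRAMES;
    }
}
```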



Working-Set Model

W(Δ, t)

Developed by Denning

The set of pages a process has accessed from time t−Δ to time t

– t is virtual time (measured in memory accesses)
– Δ is the size of the window (a larger window means a possibly larger set of pages)

Working set can grow and shrink over time

Idea for algorithm

Monitor the working set of each process

Shrink/grow the page frames allocated to a process down/up to the size of its working set

If there is not enough space for the working set, swap this process to disk

Difficulties

What size of Δ to use?

Keeping track of the working set is very difficult

Approximation

Monitor page fault frequency of process.

– Exceed the upper threshold: add a page frame
– Below the lower threshold: remove a page frame
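A sketch of that page-fault-frequency adjustment; the thresholds, field names, and the choice to adjust one frame at a time are all illustrative assumptions:

```c
/* Page-fault-frequency approximation to working-set management:
   grow or shrink a process's allocation based on its recent fault rate. */
#define PFF_HIGH 0.05    /* faults per reference above this: give it another frame */
#define PFF_LOW  0.001   /* below this: it can get by with one frame fewer */

struct proc {
    long faults;         /* page faults since the last adjustment */
    long references;     /* memory references since the last adjustment */
    int  frames;         /* current frame allocation */
};

void adjust_allocation(struct proc *p)
{
    if (p->references == 0)
        return;

    double rate = (double)p->faults / (double)p->references;
    if (rate > PFF_HIGH)
        p->frames++;                         /* exceeding the upper threshold */
    else if (rate < PFF_LOW && p->frames > 1)
        p->frames--;                         /* below the lower threshold */

    p->faults = 0;                           /* start a fresh measurement window */
    p->references = 0;
}
```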


Keeping Free Pages

Keeping some clean free pages makes page faults faster

Don’t need to run the page replacement algorithm: just go to the free list

Only need to wait for the page to be brought in (instead of first waiting for a dirty page to be written out)

Retain contents of freed page frames

If requested again, reuse page frame without I/O

Write modified page frames lazily

Save them in a modified-page list

Write them out in groups (based on disk locality)


Thrashing

What it is:

Spending more time paging than doing real work

Why it happens:

If the degree of multiprogramming gets too high, each process’s working set is not resident

With local replacement, the number of frames allocated to this process isn’t enough (fighting within a process)

With global replacement, one process causes pages from other processes’ working sets to be evicted (fighting among processes)

Solution

Reduce the degree of multiprogramming. Swap processes out to disk

How to determine good degree of multiprogramming

Look at the utilization of the paging device (50% utilization is optimal)

Look at the mean time between faults versus the mean time to service a fault (equal values maximize CPU utilization)

For clock algorithm, look at rate that hand scans through the clock

– Too low

  • Few page faults: not many requests to move the pointer
  • Not scanning many pages per request: most pages not referenced

– Too high

  • High fault rate
  • Scanning many pages per request: most pages are referenced


Memory-Mapped Files

A file can be mapped into an address space

The pager must read/write from the file, similar to the way it pages in from an executable

Processes can read and write using memory access rather than file read/file write

Written data is cached in page frames

Difficult to change EOF

Can be shared between processes
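On POSIX systems this is what mmap provides; a small, hedged example (the file name is a placeholder, the file must already exist and be non-empty, and error handling is minimal):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);           /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file. MAP_SHARED means stores go back to the file and are
       visible to other processes that map it. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'X';                                   /* write with a plain store, not write() */

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}
```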



Page Sizes

Advantages of smaller page size

Less internal fragmentation

– On average, each process’s address space wastes P/2 bytes

Advantages of larger page size

TLB covers more bytes (TLB size × P), so a better TLB hit rate

Smaller page tables (need address space / P PTEs)
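Plugging in some illustrative numbers (a 64-entry TLB and a 32-bit address space, neither of which is from the slide) shows how strongly both quantities depend on P:

```c
#include <stdio.h>

int main(void)
{
    unsigned long long tlb_entries = 64;                      /* illustrative TLB size */
    unsigned long long addr_space  = 1ULL << 32;              /* 4 GB address space */
    unsigned long long page_sizes[] = { 4ULL << 10, 4ULL << 20 };  /* 4 KB and 4 MB */

    for (int i = 0; i < 2; i++) {
        unsigned long long P = page_sizes[i];
        printf("P = %llu bytes: TLB covers %llu KB, flat page table needs %llu PTEs\n",
               P, tlb_entries * P / 1024, addr_space / P);
    }
    return 0;
}
```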

As memory has become cheaper and address space has become larger, page sizes have increased

1970s: VAX: 512 bytes

1990s: PowerPC: 4KB

1990s: Pentium: 4KB or 4MB (defined per secondary page table)

1990s: MIPS: 16KB


Summary

Some page replacement algorithms are better than others

– OPT, LRU, Clock, FIFO (listed roughly best to worst)

Locality is what makes VM (or any caching) work

Physical memory is a cache for logical memory

Keep working set in memory

Otherwise, thrashing
