Virtual Memory, CS 351: Systems Programming, Michael Saelee (PowerPoint presentation)

SLIDE 1

Virtual Memory

CS 351: Systems Programming Michael Saelee <lee@iit.edu>

SLIDE 2

Computer Science

memory hierarchy: registers → cache (SRAM) → main memory (DRAM) → local hard disk drive (HDD/SSD) → remote storage (networked drive / cloud)

previously: SRAM ⇔ DRAM

SLIDE 3

memory hierarchy: registers → cache (SRAM) → main memory (DRAM) → local hard disk drive (HDD/SSD) → remote storage (networked drive / cloud)

next: DRAM ⇔ HDD, SSD, etc. i.e., memory as a “cache” for disk

SLIDE 4

main goals:

  1. maximize memory throughput
  2. maximize memory utilization
  3. provide address space consistency & memory protection to processes

SLIDE 5

throughput = # bytes per second

  • depends on access latencies (DRAM, HDD) and “hit rate”

SLIDE 6

utilization = fraction of allocated memory that contains “user” data (aka payload)

  • vs. metadata and other overhead required for memory management

SLIDE 7

address space consistency → provide a uniform “view” of memory to each process

SLIDE 8

address space consistency → provide a uniform “view” of memory to each process

[Figure: 32-bit Linux process address space: read-only segment (.init, .text, .rodata) and read/write segment (.data, .bss) loaded from the executable starting at 0x08048000; run-time heap (created by malloc, top marked by brk); memory-mapped region for shared libraries at 0x40000000; user stack (created at runtime, top marked by %esp) growing down; kernel virtual memory (code, data, heap, stack) from 0xc0000000 to 0xffffffff, invisible to user code]

SLIDE 9

memory protection → prevent processes from directly accessing each other’s address space

SLIDE 10

memory protection → prevent processes from directly accessing each other’s address space

[Figure: three processes P0, P1, P2, each with its own identical virtual address space layout (as on slide 8), isolated from one another]

SLIDE 11

i.e., every process should be provided with a managed, virtualized address space

SLIDE 12

“memory addresses”: what are they, really?

SLIDE 13

“physical” address: (byte) index into DRAM

[Figure: CPU issues address N directly to main memory, which returns data; cache not shown]

SLIDE 14

int glob = 0xDEADBEEE;

main() {
  fork();
  glob += 1;
}

(gdb) set detach-on-fork off
(gdb) break main
Breakpoint 1 at 0x400508: file memtest.c, line 7.
(gdb) run
Breakpoint 1, main () at memtest.c:7
7         fork();
(gdb) next
[New process 7450]
8         glob += 1;
(gdb) print &glob
$1 = (int *) 0x6008d4
(gdb) next
9       }
(gdb) print /x glob
$2 = 0xdeadbeef
(gdb) inferior 2
[Switching to inferior 2 [process 7450]]
#0  0x000000310acac49d in __libc_fork ()
131       pid = ARCH_FORK ();
(gdb) finish
Run till exit from #0 in __libc_fork ()
8         glob += 1;
(gdb) print /x glob
$4 = 0xdeadbeee
(gdb) print &glob
$5 = (int *) 0x6008d4

parent vs. child: same address &glob, different contents!

SLIDE 15

[Figure: CPU issues address N directly to main memory; cache not shown]

instructions executed by the CPU do not refer directly to physical addresses!

SLIDE 16

processes reference virtual addresses; the CPU relays virtual address requests to the memory management unit (MMU), which translates them to physical addresses

SLIDE 17

[Figure: CPU sends a virtual address to the MMU (address translation unit), which issues the translated physical address to main memory; main memory is backed by “swap” space on disk; cache not shown]

SLIDE 18

essential problem: translate request for a virtual address → physical address … this must be FAST, as every memory access from the CPU must be translated

SLIDE 19

both hardware and software are involved:

  • the MMU (hw) handles simple and fast operations (e.g., table lookups)
  • the kernel (sw) handles complex tasks (e.g., eviction policy)

SLIDE 20

§Virtual Memory Implementations

SLIDE 21

keep in mind goals:

  1. maximize memory throughput
  2. maximize memory utilization
  3. provide address space consistency & memory protection to processes

SLIDE 22

1. simple relocation

[Figure: process P0’s address space placed in main memory at base B; virtual address N maps to physical address N+B]

SLIDE 23

1. simple relocation

  • per-process relocation address is loaded by kernel on every context switch

[Figure: the MMU adds relocation register B to virtual address N from the CPU, issuing physical address N+B to main memory]

SLIDE 24

1. simple relocation

  • problem: processes may easily overextend their bounds and trample on each other

[Figure: same MMU relocation datapath as before]

SLIDE 25

1. simple relocation

  • incorporate a limit register to provide memory protection

[Figure: the MMU now holds relocation register B and limit register L; it asserts 0 ≤ N ≤ L, sandboxing the process to physical range B through B+L]

SLIDE 26

1. simple relocation

  • assertion failure triggers a fault, which summons the kernel (which signals the process)

[Figure: same relocation + limit datapath; assert (0 ≤ N ≤ L) guards the B through B+L process sandbox]

SLIDE 27

pros:

  • simple & fast!
  • provides protection
SLIDE 28

but: available memory for mapping depends on value of base address i.e., address spaces are not consistent!

[Figure: two processes under simple relocation: where each process’s code, data, heap, and stack land depends on its base address B, so the virtual views of memory differ between processes]

SLIDE 29

also: all of a process below the address limit must be loaded in memory i.e., memory may be vastly under-utilized

[Figure: the entire virtual address space up to limit L, including the possibly unused space between code and stack, must be resident in main memory starting at base B]

SLIDE 30

2. segmentation

  • partition virtual address space into multiple logical segments
  • individually map them onto physical memory with relocation registers

SLIDE 31

[Figure: segmented virtual address space with Seg #0: code, #1: data, #2: heap, #3: stack; the MMU segment table holds (base, limit) pairs (B0, L0) through (B3, L3), mapping each segment to its own region of main memory]

virtual address has form seg#:offset

SLIDE 32

[Figure: translation of VA seg#:offset (here seg# = 2): the MMU indexes the segment table, asserts offset ≤ L2, and issues PA = offset + B2 to main memory]

SLIDE 33

the segment table is:

  • implemented as MMU registers
  • part of kernel-maintained, per-process metadata (aka the “process control block”)
  • re-populated on each context switch

SLIDE 34

pros:

  • still very fast
  • translation = register access & addition
  • memory protection via limits
  • segmented addresses improve consistency
SLIDE 35

simple relocation vs. segmentation:

[Figure: under simple relocation, everything below limit L, including possibly unused space, occupies main memory; under segmentation, only the code and stack segments are actually mapped: better!]

SLIDE 36

[Figure: segments from two virtual address spaces interleaved in main memory; after some segments are freed (x), the remaining holes may be too small to hold a new address space’s segments]

  • variable segment sizes → memory fragmentation
  • fragmentation potentially lowers utilization
  • can fix through compaction, but expensive!
SLIDE 37

3. paging

  • partition virtual and physical address spaces into uniformly sized pages
  • virtual pages map onto physical pages
SLIDE 38

[Figure: pages of the stack, heap, data, and code segments scattered across physical memory]

SLIDE 39

[Figure: stack, heap, data, and code pages; only some are present in physical memory]

  • minimum mapping granularity = page
  • not all of a given segment need be mapped

SLIDE 40

modified mapping problem:

  • a virtual address is broken down into virtual page number & page offset
  • determine which physical page (if any) a given virtual page is loaded into
  • if a physical page is found, use the page offset to access data
SLIDE 41

Given page size = 2^p bytes:

VA: virtual page number | virtual page offset (low p bits)
PA: physical page number | physical page offset (low p bits)

SLIDE 42

address translation: VPN → PPN; the page offset is copied through unchanged

SLIDE 43

translation structure: the page table

[Figure: an array of 2^n entries (for an n-bit VPN), indexed by VPN; each entry holds a valid bit and a PPN; if invalid, the page is not mapped]

SLIDE 44

page table entries (PTEs) typically contain additional metadata, e.g.:

  • dirty (modified) bit
  • access bits (shared or kernel-owned pages may be read-only or inaccessible)

SLIDE 45

e.g., 32-bit virtual address, 4KB (2^12) page size, 4-byte PTE size;

  • size of page table?
SLIDE 46

e.g., 32-bit virtual address, 4KB (2^12) pages, 4-byte PTEs;

  • # pages = 2^32 ÷ 2^12 = 2^20 = 1M
  • page table size = 1M × 4 bytes = 4MB
SLIDE 47

4MB is much too large to fit in the MMU — insufficient registers and SRAM! The page table instead resides in main memory.

SLIDE 48

The translation process (aka page table walk) is performed by hardware (MMU). The kernel must initially populate, then continue to manage, a process’s page table. The kernel also populates a page table base register on context switches.

SLIDE 49

translation: hit

  ➊ CPU issues VA: N to the address translator (part of the MMU)
  ➋ MMU performs a page table walk (page table in main memory)
  ➌ MMU issues PA: N' to main memory
  ➍ data is returned to the CPU

SLIDE 50

translation: miss

  ➊ CPU issues VA: N to the address translator (part of the MMU)
  ➋ MMU performs a page table walk: PTE is invalid
  ➌ page fault
  ➍ transfer control to kernel
  ➎ kernel transfers the page from disk (swap space) to main memory
  ➏ kernel updates the PTE
  ➐ CPU retries VA: N
  ➑ MMU repeats the page table walk: hit
  ➒ MMU issues PA: N' to main memory
  ➓ data is returned to the CPU

SLIDE 51

kernel decides where to place page, and what to evict (if memory is full)

  • e.g., using LRU replacement policy
SLIDE 52

this system enables on-demand paging; i.e., an active process need only be partly in memory (the rest is loaded from disk dynamically)

SLIDE 53

but if working set (of active processes) exceeds available memory, we may have swap thrashing

SLIDE 54

integration with caches?

SLIDE 55

Q: do caches use physical or virtual addresses for lookups?

SLIDE 56

[Figure: a virtually addressed cache is ambiguous: processes A and B both use virtual address M, but M should map to different data (X for A, Z for B); a single cache entry for address M cannot serve both]

SLIDE 57

[Figure: a physically addressed cache avoids the ambiguity: the processes’ virtual pages map to distinct physical locations S, Q, R, so cache entries keyed by physical address (S→X, Q→Y, R→Z) are unambiguous]

SLIDE 58

Q: do caches use physical or virtual addresses for lookups? A: caches typically use physical addresses

SLIDE 59

[Figure: with a physically addressed cache, every VA from the CPU must first pass through the MMU, whose page table walk consults the process page table in main memory before the cache can even be probed with the PA; on a cache miss, main memory is accessed again and the cache updated. %*@$&#!!! — every access pays for a walk]

SLIDE 60

saved by hardware: the Translation Lookaside Buffer (TLB) — a cache used solely for VPN→PPN lookups

SLIDE 61

TLB + page table

[Figure: the MMU now contains a TLB (VPN→PPN cache) in front of the address translation unit; the page table walk happens only if the TLB misses; on a hit, the PA goes straight to the cache / main memory]

(exercise for reader: revise earlier translation diagrams!)

SLIDE 62

[Figure: combined TLB + cache lookup: the n-bit virtual address splits into VPN and page offset (low p bits); the TLB is looked up by VPN (valid bit + tag + PPN; tag match ⇒ TLB hit); the resulting physical address (PPN + offset) then indexes the cache (valid bit + tag + data + byte offset; tag match ⇒ cache hit, data returned)]

SLIDE 63

TLB mappings are process specific — requires flush & reload on context switch

  • some architectures store a PID (aka “virtual space” ID) in the TLB

SLIDE 64

Familiar caching problem:

  • TLB caches a few thousand mappings
  • vs. millions of virtual pages per process!
SLIDE 65

we can improve TLB hit rate by reducing the number of pages … by increasing the size of each page

SLIDE 66

compute # pages for 32-bit memory for 1KB, 512KB, and 4MB pages:

  • 1KB: 2^32 ÷ 2^10 = 2^22 = 4M pages
  • 512KB: 2^32 ÷ 2^19 = 2^13 = 8K pages
  • 4MB: 2^32 ÷ 2^22 = 2^10 = 1K pages (not bad!)

SLIDE 67

[Figure: processes A and B with large pages: each partially used virtual page still occupies a full physical page, so there is lots of wasted space!]

SLIDE 68

[Figure: processes A and B’s virtual pages mapped onto physical memory]

SLIDE 69

increasing page size results in increased internal fragmentation and lower utilization

SLIDE 70

i.e., TLB effectiveness needs to be balanced against memory utilization

SLIDE 71

so what about 64-bit systems? 2^64 = 16 exabytes of address space ≈ 4 billion × 4GB

SLIDE 72

most modern implementations support a max of 2^48 (256TB) of addressable space

SLIDE 73

page table size (assuming 4KB page size)?

  • # pages = 2^48 ÷ 2^12 = 2^36
  • PTE size = 8 bytes (64 bits)
  • PT size = 2^36 × 8 = 2^39 bytes = 512GB

SLIDE 74

512GB (just for the virtual memory mapping structure) (and we need one per process)

SLIDE 75

(these things aren’t going to fit in memory)

SLIDE 76

instead, use multi-level page tables:

  • split an address translation into two (or more) separate table lookups
  • unused parts of the table don’t need to be in memory!

SLIDE 77

“toy” memory system: 8-bit addresses, 32-byte pages (5-bit page offset, 3-bit VPN)

[Figure: single-level page table with 8 PTEs, most marked unmapped; all 8 PTEs must be in memory at all times]

SLIDE 78

“toy” memory system, two-level: the VPN is split so its top bits index a small page “directory”, whose entries point to second-level tables

[Figure: a directory entry covering only unmapped pages is itself marked unmapped, so its second-level table is all unmapped; don’t need it in memory!]

SLIDE 79

[Figure: the two-level toy layout (8-bit addresses, 32-byte pages) showing only the page directory and the single resident second-level table]

SLIDE 80

Intel Architecture Memory Management

http://www.intel.com/products/processor/manuals/ (Software Developer’s Manual Volume 3A)

SLIDE 81

[Figure: IA-32 address translation overview: a logical address (segment selector + offset, aka far pointer) is mapped through a segment descriptor in the Global Descriptor Table (GDT) to a linear address (segmentation); paging then maps the linear address (dir, table, offset) through the page directory and page table to a physical address]

SLIDE 82

Segmented → Linear Address

[Figure: the 15:0 segment selector of the logical address indexes a descriptor table; the segment descriptor supplies a base address, which is added to the 31(63):0 offset (effective address) to form the linear address]

SLIDE 83

Segment registers

[Figure: segment registers CS, SS, DS, ES, FS, GS each hold a visible part (the segment selector) plus a hidden part caching the base address, limit, and access information]

SLIDE 84

Segment descriptor

[Figure 3-8: segment descriptor layout across two 32-bit words: base address (bits 31:24, 23:16, 15:00), segment limit (19:16, 15:00), and the G, D/B, L, AVL, P, DPL, S, and Type flag fields]

G — Granularity; LIMIT — Segment Limit; P — Segment present; S — Descriptor type (0 = system; 1 = code or data); TYPE — Segment type; DPL — Descriptor privilege level; AVL — Available for use by system software; BASE — Segment base address; D/B — Default operation size (0 = 16-bit segment; 1 = 32-bit segment); L — 64-bit code segment (IA-32e mode only)

SLIDE 85

Segmented address space

[Figure: each segment register (CS, SS, DS, ES, FS, GS) points to its own segment descriptor (base address, limit, access), carving the linear address space (or physical memory) into separate code, stack, and data segments]

SLIDE 86

“Flat” address space

[Figure: all segment registers (CS, SS, DS, ES, FS, GS) use code- and data-segment descriptors spanning the entire linear address space up to FFFFFFFFH, so segmentation is effectively a no-op]

SLIDE 87

Paging modes

Mode   | CR0.PG | CR4.PAE | IA32_EFER.LME | Linear addr width | Physical addr width | Page size(s)  | Execute-disable?
None   | 0      | n/a     | n/a           | 32                | 32                  | n/a           | no
32-bit | 1      | 0       | 0             | 32                | up to 40            | 4KB, 4MB      | no
PAE    | 1      | 1       | 0             | 32                | up to 52            | 4KB, 2MB      | yes
IA-32e | 1      | 1       | 1             | 48                | up to 52            | 4KB, 2MB, 1GB | yes

SLIDE 88

IA-32 paging (4KB pages)

[Figure: the 32-bit linear address splits 10/10/12: bits 31:22 (Directory) index the page directory (base in CR3), whose PDE with PS=0 points to a page table; bits 21:12 (Table) index the PTE, which supplies the 20-bit physical page number; bits 11:0 (Offset) complete the physical address]

SLIDE 89

IA-32 paging (4MB pages)

[Figure: the linear address splits 10/22: bits 31:22 (Directory) index the page directory (base in CR3); a PDE with PS=1 supplies the 4MB page frame directly, and bits 21:0 are the offset within it]

SLIDE 90

PTE formats (32-bit paging)

[Figure 4-4: formats of CR3 and paging-structure entries with 32-bit paging: CR3 holds the page directory address plus PCD/PWT bits; a PDE for a 4MB page (PS=1) holds the frame address plus PAT, G, D, A, PCD, PWT, U/S, R/W bits; a PDE for a page table holds the table address plus A, PCD, PWT, U/S, R/W; a PTE for a 4KB page holds the frame address plus G, PAT, D, A, PCD, PWT, U/S, R/W; in a not-present entry (bit 0 = 0) the remaining bits are ignored]

SLIDE 91

PAE paging (4KB pages)

[Figure: the 32-bit linear address splits 2/9/9/12: bits 31:30 (Directory Pointer) select one of four PDPTE registers, bits 29:21 index the page directory, bits 20:12 index the page table (PDE with PS=0), and bits 11:0 are the offset; entries hold 40-bit frame addresses]

SLIDE 92

IA-32e paging (4KB pages)

[Figure: the 48-bit linear address splits 9/9/9/9/12: bits 47:39 index the PML4 (base in CR3), bits 38:30 the page-directory-pointer table, bits 29:21 the page directory (PDE with PS=0), bits 20:12 the page table, and bits 11:0 are the offset; entries hold 40-bit frame addresses]

SLIDE 93

IA-32e paging (1GB pages)

[Figure: bits 47:39 of the linear address index the PML4 (base in CR3), bits 38:30 the page-directory-pointer table; a PDPTE with PS=1 maps a 1GB page directly, with bits 29:0 as the offset]