SLIDE 1 Slides for Lecture 12
ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng
Electrical & Computer Engineering Schulich School of Engineering University of Calgary
25 February, 2014
SLIDE 2 ENCM 501 W14 Slides for Lecture 12
slide 2/19
Previous Lecture
◮ more about multi-level caches
◮ classifying cache misses: the 3 C’s
◮ introduction to virtual memory
SLIDE 3
Today’s Lecture
◮ Continued explanation of virtual memory.
Related reading in Hennessy & Patterson: Sections B.4–B.5
SLIDE 4
Quick review of address translation
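[Figure: a virtual address is split into a virtual page number (VPN) and a page offset. The VPN goes through translation to become the physical page number (PPN) of the physical address; the page offset is a straight copy (no translation!).]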
The master list of VPN-to-PPN translations for a single process is maintained by the O/S kernel in a data structure called a page table. TLBs are circuits capable of doing some of these translations very quickly.
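As a rough illustration of the translation reviewed above, here is a minimal Python sketch. It assumes 4 KB pages (a 12-bit page offset) and models a page table as a plain dictionary from VPN to PPN. The names `translate` and `page_table` are hypothetical, and the mapping is only an example.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4 KB pages


def translate(page_table, virtual_address):
    """Translate a virtual address using a dict that maps VPN -> PPN."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]  # a KeyError here means no translation exists
    # The PPN replaces the VPN; the page offset is copied unchanged.
    return (ppn << PAGE_OFFSET_BITS) | offset


page_table = {0x7fffff567: 0x13579bd}  # illustrative mapping only
print(hex(translate(page_table, 0x7fffff567abc)))
```

A real page table is not a Python dictionary, of course; the point is only that the VPN is looked up and the offset passes through untouched.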
SLIDE 5
A couple of questions about address translation (1)
Processes 98 and 99 are running at the same time. Suppose that 0x7fffff567 is the VPN for a page of process 98’s stack, and the corresponding PPN is 0x13579bd. Suppose that 0x7fffff567 is also the VPN for a page of process 99’s stack. What can we conclude about the VPN-to-PPN translation for VPN 0x7fffff567 in process 99?
SLIDE 6
A couple of questions about address translation (2)
As on the previous slide, processes 98 and 99 are running at the same time. Suppose that 0x000000400 is the VPN for a page of process 98’s instructions, and the corresponding PPN is 0x1234567. Suppose that 0x000000400 is also the VPN for a page of process 99’s instructions. What can we conclude about the VPN-to-PPN translation for VPN 0x000000400 in process 99?
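One way to approach the two questions is to remember that each process has its own page table, so the same VPN may map to different PPNs in different processes, or to the same PPN if the kernel decides to share a page. The sketch below is only an illustration: the PPN given to process 99’s stack page (0x2468ace) and the choice to share the instruction page are assumptions, not facts from the slides, and `page_tables` and `lookup` are hypothetical names.

```python
# One page table (VPN -> PPN) per process, keyed by process ID.
page_tables = {
    98: {0x7fffff567: 0x13579bd, 0x000000400: 0x1234567},
    # Hypothetical entries for process 99 (assumed for illustration):
    99: {0x7fffff567: 0x2468ace, 0x000000400: 0x1234567},
}


def lookup(pid, vpn):
    """Return the PPN that process `pid` sees for virtual page `vpn`."""
    return page_tables[pid][vpn]


# Same VPN, private stack pages: different physical pages.
print(lookup(98, 0x7fffff567) != lookup(99, 0x7fffff567))
# Same VPN, instruction page that happens to be shared: same physical page.
print(lookup(98, 0x000000400) == lookup(99, 0x000000400))
```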
SLIDE 7
Linux / Mac OS X virtual address spaces on x86-64
Pointers are 64 bits wide, but only the least significant 48 bits are used in a virtual address.
byte address
0xffff ffff ffff ffff
0xffff ffff ffff fffe
. . .                    virtual address space for O/S kernel
0xffff 8000 0000 0000
. . .                    HUGE range of invalid addresses
0x0000 7fff ffff ffff
0x0000 7fff ffff fffe
. . .                    virtual address space for user processes
0x0000 0000 0000 0000
(For 64-bit Microsoft Windows, the picture is either identical, or not quite the same but very similar.)
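The three regions of the address space can be sketched as a small Python check. The boundary constants come from the picture on this slide; the function name `classify` and the region labels are hypothetical.

```python
USER_TOP = 0x0000_7fff_ffff_ffff       # highest user-space byte address
KERNEL_BOTTOM = 0xffff_8000_0000_0000  # lowest kernel-space byte address
MAX_ADDRESS = 0xffff_ffff_ffff_ffff    # highest 64-bit byte address


def classify(address):
    """Say which region of the 64-bit address space an address falls in."""
    if 0 <= address <= USER_TOP:
        return "user"
    if KERNEL_BOTTOM <= address <= MAX_ADDRESS:
        return "kernel"
    return "invalid"


print(classify(0x0000_7fff_ffff_f000))
print(classify(0x0000_8000_0000_0000))
print(classify(0xffff_8000_0000_0000))
```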
SLIDE 8
A page table for an x86-64 Linux process
The normal page size is 4 KB. So bits 11–0 of an address are page offset, and bits 46–12 of a virtual address are VPN (virtual page number). Conceptually, a page table is just an array of PTEs (page table entries), where the indexes are VPNs:
VPN              page table entry
0x7fff ffff f    64-bit PTE
0x7fff ffff e    64-bit PTE
. . .            . . .
0x0000 0000 1    64-bit PTE
0x0000 0000 0    64-bit PTE
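To make the indexing concrete, here is a minimal sketch that pulls the page offset (bits 11–0) and the VPN (bits 46–12) out of a user-space virtual address, and then computes where the matching PTE would sit in the conceptual array of 64-bit entries. The function names are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # bits 11-0: page offset (4 KB pages)
VPN_BITS = 35          # bits 46-12: virtual page number
PTE_BYTES = 8          # each PTE is 64 bits wide


def split_user_address(virtual_address):
    """Return (vpn, offset) for a user-space virtual address."""
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = (virtual_address >> PAGE_OFFSET_BITS) & ((1 << VPN_BITS) - 1)
    return vpn, offset


def pte_byte_offset(vpn):
    """Byte offset of the PTE for `vpn` within a flat array of PTEs."""
    return vpn * PTE_BYTES


vpn, offset = split_user_address(0x0000_7fff_ffff_ffff)
print(hex(vpn), hex(offset), pte_byte_offset(vpn))
```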
SLIDE 9
Suppose that a page table really is just a big array, as shown
How much space would such a page table occupy?

The answer to the above question is a totally unreasonable number, so we’ll need to use more complex and much more space-efficient data structures for page tables. Let’s worry about the data structures later, and continue for a while with the simple model that a page table is just a big array of PTEs.
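The arithmetic behind the "totally unreasonable number" can be checked in a few lines of Python, using the figures from slide 8: a 35-bit VPN (bits 46–12) and 64-bit PTEs. The function name is hypothetical.

```python
def flat_page_table_bytes(vpn_bits=35, pte_bytes=8):
    """Size in bytes of a flat array holding one PTE per possible VPN."""
    return (1 << vpn_bits) * pte_bytes


size = flat_page_table_bytes()
print(size // 2**30, "GiB per process")  # 2**35 entries * 8 bytes = 2**38 bytes
```

That works out to 256 GiB of page table for every process, which is why the flat array is only a conceptual model.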
SLIDE 10
What information is in a PTE?
A PTE answers several different questions about a virtual page. Here is an incomplete list:
◮ First, does the virtual page even exist? (For a typical x86-64 Linux process, the vast majority of VPNs in the range from 0x0000 0000 0 to 0x7fff ffff f correspond to non-existent virtual pages.)
◮ If the page exists, is it present in physical memory?
◮ If the page is present, what is the PPN (physical page number)?
◮ What are the permissions for the page: can the process write to the page, and can it fetch instructions from the page?
SLIDE 11
PTE formats in x86-64 Linux (1)
First, let’s look at a PTE for a page that does not exist. I haven’t found documentation to confirm this, but I’m pretty sure that 64 zeros indicate that there is no page corresponding to a VPN:
[Figure: a 64-bit PTE with bit numbers 63 down to 0; every bit is 0.]
SLIDE 12
PTE formats in x86-64 Linux (2)
Now let’s look at a PTE for a page that does exist, and is present in physical memory.

How can a page exist but NOT be present in physical memory?

Okay, back to the PTE format for a page that is present . . .
[Figure: a 64-bit PTE with bit numbers 63 down to 0. Bit 63: XD. Bits 51–12: up to 40 bits for PPN. Bits 8–2: more page status bits. Bit 1: R/W. Bit 0: P. The remaining bits are unused.]
Let’s make some notes about the P, R/W and XD bits.
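As a hedged sketch of how these fields could be pulled out of a 64-bit PTE, the Python below uses the bit positions from the figure (P in bit 0, R/W in bit 1, PPN in bits 51–12, XD in bit 63) and the usual x86-64 meanings: P = 1 means present, R/W = 1 means writes are allowed, and XD = 1 means instruction fetches are not allowed. The function and field names are hypothetical, and the example PTE value is made up.

```python
P_BIT = 0                  # present in physical memory
RW_BIT = 1                 # 1 = writes allowed
XD_BIT = 63                # 1 = execute disabled (no instruction fetches)
PPN_SHIFT = 12             # PPN occupies bits 51-12
PPN_MASK = (1 << 40) - 1   # up to 40 bits of PPN


def decode_present_pte(pte):
    """Extract the fields discussed on this slide from a 64-bit PTE."""
    return {
        "present": bool((pte >> P_BIT) & 1),
        "writable": bool((pte >> RW_BIT) & 1),
        "execute_disabled": bool((pte >> XD_BIT) & 1),
        "ppn": (pte >> PPN_SHIFT) & PPN_MASK,
    }


# Made-up example: a present, writable, non-executable page at PPN 0x13579bd.
example_pte = (1 << XD_BIT) | (0x13579bd << PPN_SHIFT) | (1 << RW_BIT) | (1 << P_BIT)
print(decode_present_pte(example_pte))
```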
SLIDE 13
PTE formats in x86-64 Linux (3)
And here is a PTE for a page that exists, but is not present in physical memory.
[Figure: a 64-bit PTE. Bit 0 is P, which is 0 for a page that is not present. Bits 63–1: page location on disk, other info about page.]
We won’t go into detail about bits 63–1, but if the assumption on slide 11 is correct, they must not all be zero.
Source for information on this slide and slide 12: Bryant, R. E. and O’Hallaron, D. R., Computer Systems: A Programmer’s Perspective, 2nd ed., published by Prentice Hall.
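Putting slides 11 to 13 together, the three kinds of PTE can be told apart with a very small function. This is a sketch only: it relies on the unconfirmed assumption from slide 11 that an all-zero PTE means "no such page", and the function name and returned labels are hypothetical.

```python
def classify_pte(pte):
    """Classify a 64-bit PTE into one of the three cases on slides 11-13."""
    if pte == 0:
        # Assumption from slide 11 (not confirmed by documentation):
        # 64 zeros mean there is no page for this VPN.
        return "nonexistent"
    if pte & 1:
        # P bit (bit 0) is 1: the page is in physical memory.
        return "present"
    # P bit is 0 but bits 63-1 are not all zero: the page exists
    # but is not in physical memory.
    return "not present"


print(classify_pte(0))
print(classify_pte((0x13579bd << 12) | 0b11))
print(classify_pte(0xabcdef00))
```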
SLIDE 14
Review of P3/P4 memory system structure
[Figure: block diagram showing the CORE, the I-TLB with the L1 I-CACHE, the D-TLB with the L1 D-CACHE, the UNIFIED L2 CACHE, the DRAM CONTROLLER, and the DRAM MODULES.]
On every instruction fetch, the I-TLB must attempt to translate a virtual instruction address into a physical instruction address. On every data read or write, the D-TLB must attempt to translate a virtual data address into a physical data address.
SLIDE 15
TLB structure
A TLB is essentially a cache for page table information. A page table is a complete list of the statuses of all of the virtual pages belonging to a process. A TLB contains some of the most recently accessed information in a page table.
SLIDE 16
TLB hits
Let’s outline:
◮ how a TLB hit is detected;
◮ what happens as a result of a TLB hit.
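The two points above can be sketched in software, with the caveat that this is a simplified model and not a description of any particular TLB. It assumes a fully-associative TLB whose entries each hold a valid bit, a VPN tag and a PPN; real hardware compares all the tags in parallel instead of looping, and also keeps permission bits. The names `TLBEntry` and `tlb_lookup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    valid: bool
    vpn: int   # tag: which virtual page this entry translates
    ppn: int   # the cached translation


def tlb_lookup(entries, vpn):
    """Return the PPN on a TLB hit, or None on a TLB miss."""
    for entry in entries:
        # A hit needs a valid entry whose VPN tag matches the requested VPN.
        if entry.valid and entry.vpn == vpn:
            return entry.ppn
    return None


tlb = [TLBEntry(True, 0x7fffff567, 0x13579bd), TLBEntry(False, 0x400, 0x1234567)]
ppn = tlb_lookup(tlb, 0x7fffff567)
# On a hit, the PPN is joined with the untranslated 12-bit page offset.
print(hex((ppn << 12) | 0xabc))
```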
SLIDE 17
Simple TLB misses
The simplest form of a TLB miss occurs when there is a valid VPN-to-PPN translation, which is in the page table, but not in the TLB. Let’s describe how such a TLB miss is handled.
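A rough software model of a simple TLB miss is shown below: the translation is fetched from the page table, installed in the TLB, and then used. Everything specific here is an assumption made for illustration, including the class name `TinyTLB`, the dictionary page table, and the first-in-first-out replacement policy; the slide does not say who walks the page table or which entry gets replaced.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: a small VPN -> PPN cache in front of a page table."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order: oldest entry first
        self.misses = 0

    def translate(self, page_table, vpn):
        if vpn in self.entries:           # TLB hit: use the cached PPN
            return self.entries[vpn]
        self.misses += 1                  # simple TLB miss
        ppn = page_table[vpn]             # get the translation from the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict oldest entry (FIFO, an assumption)
        self.entries[vpn] = ppn           # install the translation in the TLB
        return ppn


page_table = {1: 10, 2: 20, 3: 30}
tlb = TinyTLB(capacity=2)
for vpn in (1, 1, 2, 3, 1):
    tlb.translate(page_table, vpn)
print(tlb.misses)
```

In this run, the second access to VPN 1 hits, while the last access to VPN 1 misses again because that entry was evicted to make room for VPN 3.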
SLIDE 18
DRAM, disk storage and flash memory
Here’s a story that is simple, easy to understand, but not actually true . . .
◮ Instructions and data belonging to the kernel and to processes are in DRAM.
◮ I-caches and D-caches allow processor cores to access instructions and data much faster than if all such accesses really had to go to DRAM.
◮ Non-volatile storage, such as magnetic disks and flash memory arrays, is used for file storage.

That’s actually a good model to start with, but it’s wrong! What is a more accurate model?
SLIDE 19
Upcoming Topics
Short-term:
◮ Completion of material on virtual memory.
◮ Simple pipelining.
Related reading in Hennessy & Patterson: Sections B.4–B.5, Appendix C.

Big topics for the second half of the course:
◮ Instruction-level parallelism.
◮ Thread-level parallelism.
Related reading in Hennessy & Patterson: Appendix C, Chapters 3 and 5.