Computer Organization & Assembly Language Programming (CSE 2312)
Lecture 20: Memory Hierarchies (Registers, Caches, Main Memory, Storage) Taylor Johnson
Announcements and Outline
Programming assignment 1 assigned, due 11/4
Review
October 30, 2014 CSE2312, Fall 2014 2
FETCH[PC]: IR := MEM[PC] (Get instruction from memory at address PC)
DECODE(IR) (Decode fetched instruction, find operands)
EXECUTE (Execute instruction fetched from memory)
Interrupt? If yes: Handle Interrupt (Input/Output Event)
PC := PC + 4 (Increment the Program Counter)
Executed instruction is at address PC-8; decoded instruction is at address PC-4
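The cycle above can be sketched as a small simulation. This is only an illustrative toy, not the ARM instruction set: the tuple-based instruction format, the `run` function, and the `interrupts` argument are all hypothetical, and the pipeline offsets (PC-8, PC-4) are not modeled.

```python
def run(memory, interrupts=(), max_steps=100):
    """Toy fetch-decode-execute loop over a dict of word-aligned addresses."""
    regs = {"r0": 0, "r1": 0}
    pending = set(interrupts)   # hypothetical: addresses after which an I/O event is pending
    handled = []
    pc = 0
    for _ in range(max_steps):
        ir = memory[pc]                  # FETCH: IR := MEM[PC]
        op, *operands = ir               # DECODE: find operation and operands
        if op == "halt":
            break
        if op == "mov":                  # EXECUTE
            dst, imm = operands
            regs[dst] = imm
        elif op == "add":
            dst, a, b = operands
            regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc}")
        if pc in pending:                # Interrupt? If yes, handle it
            pending.discard(pc)
            handled.append(pc)
        pc += 4                          # PC := PC + 4
    return regs, pc, handled


# Four 4-byte "instructions" at addresses 0, 4, 8, 12 (made-up encoding).
program = {
    0: ("mov", "r0", 2),
    4: ("mov", "r1", 3),
    8: ("add", "r0", "r0", "r1"),
    12: ("halt",),
}
regs, final_pc, handled = run(program, interrupts=[4])
print(regs, final_pc, handled)
```

The interrupt check sits after execute and before the PC increment, matching the order of the steps listed above.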
stored in memory
be “re-programmed”
through CPU
representative of modern systems (direct memory access [DMA])
representative of modern systems
[Figure: Memory (Data + Program [Instructions]), CPU, and I/O, with DMA]
    ldr r0,=ADDR_UART0   @ r0 := 0x101f1000
    mov r2,#'a'          @ r2 := 'a'
    str r2,[r0]          @ MEM[r0] := r2
controllers, etc.) are addressable in same address space as main memory, and their values are mapped (i.e., readable / writeable at certain addresses)
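As a rough illustration of memory-mapped I/O, the sketch below models a single address space in which one address belongs to a device instead of RAM. The `Bus` class and its methods are hypothetical names made up for this example; only the UART data address 0x101f1000 comes from the slides.

```python
UART0_DATA = 0x101F1000  # UART data register address used in the slides


class Bus:
    """Toy address space: one address is a UART, everything else is RAM."""

    def __init__(self):
        self.ram = {}
        self.uart_output = []

    def store(self, addr, value):
        if addr == UART0_DATA:
            # A store to the mapped address reaches the device, not RAM.
            self.uart_output.append(chr(value & 0xFF))
        else:
            self.ram[addr] = value

    def load(self, addr):
        return self.ram.get(addr, 0)


bus = Bus()
bus.store(UART0_DATA, ord("a"))  # same effect as: str r2,[r0] with r0 = UART address
bus.store(0x1000, 42)            # an ordinary store to main memory
print(bus.uart_output, bus.load(0x1000))
```

The same store operation is used in both cases; only the address decides whether the value lands in memory or at the device.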
http://infocenter.arm.com/help/topic/com.arm.doc.dui0224i/DUI0224I_realview_platform_baseboard_for_arm926ej_s_ug.pdf
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183f/DDI0183.pdf
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0183f/DDI0183.pdf
.equ IO_ADDRESS, 0x101f1000     @ uart memory-map address
.equ OFFSET_FR,  0x018          @ flag register offset from uart
.equ RXFE,       0x10           @ receive status bit
.equ TXFF,       0x20           @ transmit status bit

get_char:
    push {r2, r3, r4, lr}       @ preamble
    ldr  r4,=IO_ADDRESS         @ r4 := 0x101f1000
get_char_wait:
    ldr  r2,[r4,#OFFSET_FR]     @ load IO flag register to r2
    and  r3,r2,#RXFE            @ mask non receive fifo empty bits
    cmp  r3, #0                 @ check if r3 == 0
    bne  get_char_wait          @ wait if not ready (if r3 != 0)
    ldr  r0, [r4]               @ read character
    str  r0, [r4]               @ echo character to screen
    pop  {r2, r3, r4, lr}       @ wrap up
    bx   lr
(gdb) x /16x 0x101f1000            <- View all registers
0x101f1000: 0x00000000 0x00000000 0x00000000 0x00000000
0x101f1010: 0x00000000 0x00000000 0x00000090 0x00000000
0x101f1020: 0x00000000 0x00000000 0x00000000 0x00000000
0x101f1030: 0x00000300 0x00000012 0x00000000 0x00000020
(gdb) x /1x 0x101f1000+0x018       <- View Flag Register
0x101f1018: 0x00000090
(gdb) x /1t 0x101f1000+0x018       <- View Flag Register
0x101f1018: 00000000000000000000000010010000
(gdb) x /1t 0x101f1000+0x018       <- Character entered
0x101f1018: 00000000000000000000000011000000
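To make the two flag register values in the gdb session easier to read, the sketch below decodes them bit by bit. The RXFE and TXFF masks come from the slides; the RXFF (0x40) and TXFE (0x80) masks are an assumption taken from the PL011 flag register layout, and `decode_flags` is a hypothetical helper name.

```python
FLAGS = [
    ("RXFE", 0x10),  # receive FIFO empty (from the slides)
    ("TXFF", 0x20),  # transmit FIFO full (from the slides)
    ("RXFF", 0x40),  # receive FIFO full (assumed from the PL011 layout)
    ("TXFE", 0x80),  # transmit FIFO empty (assumed from the PL011 layout)
]


def decode_flags(fr):
    """Return the names of the flag bits that are set in fr."""
    return [name for name, mask in FLAGS if fr & mask]


before = decode_flags(0x90)  # value shown before a character is entered
after = decode_flags(0xC0)   # value shown after a character is entered
print(before, after)
```

Under these assumptions, RXFE is set before the key press and clear afterwards, which is the condition the get_char polling loop waits for.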
.equ IO_ADDRESS, 0x101f1000     @ uart memory-map address
.equ OFFSET_FR,  0x018          @ flag register offset from uart
.equ RXFE,       0x10           @ receive status bit

@ assumes r0 contains uart data register address
@ r1 should contain the address of the first character of the string to display
print_string:
    push {r1,r2,lr}
str_out:
    ldrb r2,[r1]                @ load one byte (character) of the string
    cmp  r2,#0x00               @ '\0' = 0x00: null character?
    beq  str_done               @ if yes, quit
    str  r2,[r0]                @ otherwise, write char of string
    add  r1,r1,#1               @ go to next character
    b    str_out                @ repeat
str_done:
    pop  {r1,r2,lr}
    bx   lr
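The loop in print_string can be mirrored in a few lines of Python to check the control flow. This is a sketch only: `memory`, `uart_write`, and the Python `print_string` function are hypothetical stand-ins for main memory, the store to the UART data register, and the assembly routine.

```python
def print_string(memory, addr, uart_write):
    """Send bytes starting at addr until a null byte; return how many were sent."""
    count = 0
    while True:
        byte = memory[addr]      # ldrb r2,[r1]
        if byte == 0:            # cmp r2,#0x00 / beq str_done
            break
        uart_write(byte)         # str r2,[r0]
        addr += 1                # add r1,r1,#1
        count += 1
    return count


memory = bytearray(b"hi!\0junk")  # null-terminated string followed by unrelated bytes
sent = []
n = print_string(memory, 0, sent.append)
print(n, bytes(sent))
```

The bytes after the null terminator are never read, which is why the string must end in 0x00 for the routine to stop.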
procedure B
executed now?
registers.
[Figure: Trac and Tcac by year, '80 to '07]

Year   Capacity   $/GB
1980   64Kbit     $1500000
1983   256Kbit    $500000
1985   1Mbit      $200000
1989   4Mbit      $50000
1992   16Mbit     $15000
1996   64Mbit     $10000
1998   128Mbit    $4000
2000   256Mbit    $1000
2004   512Mbit    $250
2007   1Gbit      $50
4-word wide memory
Miss penalty = 1 + 15 + 1 = 17 bus cycles
Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
4-bank interleaved memory
Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
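The two miss penalties can be reproduced with a short calculation. The interpretation of the terms is an assumption, not stated on the slide: 1 bus cycle to send the address, 15 cycles for the DRAM access, and 1 bus cycle per data transfer, for a 16-byte (4-word) block. The function name `miss_penalty` is hypothetical.

```python
BLOCK_BYTES = 16  # 4 words of 4 bytes, as in the bandwidth figures above


def miss_penalty(address_cycles, dram_cycles, transfers, transfer_cycles):
    """Bus cycles to fetch one block, under the assumed cost breakdown."""
    return address_cycles + dram_cycles + transfers * transfer_cycles


# 4-word wide memory: the whole block moves in one transfer.
wide = miss_penalty(1, 15, 1, 1)
# 4-bank interleaved memory: banks are accessed in parallel,
# then the four words cross the bus one at a time.
interleaved = miss_penalty(1, 15, 4, 1)

wide_bw = BLOCK_BYTES / wide
interleaved_bw = BLOCK_BYTES / interleaved
print(wide, round(wide_bw, 2), interleaved, interleaved_bw)
```

The only difference between the two cases is the number of bus transfers, which is why interleaving costs just three extra cycles here.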
level
lower level
= 1 – hit ratio
from upper level
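A small numeric sketch of the hit ratio and miss ratio relationship follows. The access counts, hit time, and miss penalty are made-up illustrative numbers, and the average access time formula (hit time plus miss ratio times miss penalty) is the standard textbook expression rather than something taken from these slides.

```python
hits = 3        # hypothetical: accesses satisfied by the upper level
accesses = 8    # hypothetical: total accesses

hit_ratio = hits / accesses
miss_ratio = 1 - hit_ratio          # miss ratio = 1 - hit ratio

hit_time = 1        # cycles to access the upper level (assumed)
miss_penalty = 17   # extra cycles to bring a block from the lower level (assumed)

average_access_time = hit_time + miss_ratio * miss_penalty
print(hit_ratio, miss_ratio, average_access_time)
```

With these numbers most accesses miss, so the average is dominated by the miss penalty; raising the hit ratio is what makes the hierarchy pay off.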
word CPU)
from memory, which takes time M.
determine that.
at a time). Then, striping cannot be used.
bytes/sec
~=650 MB
How do we know if
Where do we look?
#Blocks is a
Use low-order
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010
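The sequence of accesses in the tables above (22, 26, 22, 26, 16, 3, 16, 18) can be replayed with a minimal direct-mapped cache model. This is a sketch under the assumptions visible in the tables: 8 one-word blocks, the low-order 3 bits of the word address as the index, and the remaining high-order bits as the tag. The function name `simulate_direct_mapped` is hypothetical.

```python
def simulate_direct_mapped(addresses, num_blocks=8):
    """Replay word addresses through a direct-mapped cache of one-word blocks."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks   # low-order bits select the cache block
        tag = addr // num_blocks    # remaining high-order bits identify the word
        if valid[index] and tags[index] == tag:
            results.append((addr, "Hit", index))
        else:
            # Miss: fetch the word and replace whatever was in this block.
            valid[index] = True
            tags[index] = tag
            results.append((addr, "Miss", index))
    return results, valid, tags


trace = [22, 26, 22, 26, 16, 3, 16, 18]
results, valid, tags = simulate_direct_mapped(trace)
for addr, outcome, index in results:
    print(f"{addr:2d}  {addr:05b}  {outcome:4s}  {index:03b}")
```

The last access (18) maps to the same block as 26 but carries a different tag, so it misses and evicts Mem[11010], as in the final table.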
[Figure: cache organization with Index, Valid, Tag, and Data fields]
Address bits 31-10: 22 bits | bits 9-4: 6 bits | bits 3-0: 4 bits
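The field widths above can be applied to a concrete address with shifts and masks. The field names are an assumption based on the conventional layout (22-bit tag, 6-bit index, 4-bit byte offset), since the labels did not survive on this slide; the address 1200 and the function name `split_address` are hypothetical choices for illustration.

```python
OFFSET_BITS = 4   # bits 3-0
INDEX_BITS = 6    # bits 9-4
TAG_BITS = 22     # bits 31-10


def split_address(addr):
    """Split a 32-bit byte address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


tag, index, offset = split_address(1200)
print(tag, index, offset)
```

A 4-bit offset implies 16-byte blocks and a 6-bit index implies 64 blocks, so byte address 1200 falls in block address 75, which maps to cache block 75 mod 64 = 11.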
memory takes 100 cycles
initialization)