Operating Systems
Julian Bradfield jcb@inf.ed.ac.uk IF–4.07
1 / 184
◮ general understanding of structure of modern computers ◮ purpose, structure and functions of operating systems ◮ illustration of key OS aspects by example
2 / 184
◮ describe the general architecture of computers
◮ describe, contrast and compare differing structures for operating systems
◮ understand and analyse theory and implementation of: processes, scheduling, concurrency, memory management, I/O
◮ become familiar (if not already) with the C language, gcc compiler, and Unix
◮ understand the high-level structure of the Linux kernel both in concept and in source code
◮ acquire a detailed understanding of one aspect (the scheduler) of the Linux kernel
3 / 184
◮ Introduction; history of computers; overview of OS (this lecture)
◮ Computer architecture (high-level view); machines viewed at different abstraction levels
◮ Basic OS functions and the historical development of OSes
◮ Processes (1)
◮ Processes (2) – threads and SMP
◮ Scheduling (1) – cpu utilization and task scheduling
◮ Concurrency (1) – mutual exclusion, synchronization
◮ Concurrency (2) – deadlock, starvation, analysis of concurrency
4 / 184
◮ Memory (1) – physical memory, early paging and segmentation techniques
◮ Memory (2) – modern virtual memory concepts and techniques
◮ Memory (3) – paging policies
◮ I/O (1) – low level I/O functions
◮ I/O (2) – high level I/O functions and filesystems
◮ Case studies: one or both of: the Windows NT family; IBM’s MVS (OS/390)
◮ Other topics to be determined, e.g. security.
5 / 184
6 / 184
7 / 184
8 / 184
9 / 184
10 / 184
11 / 184
12 / 184
13 / 184
◮ 30 tons, 1000 sq feet, 140 kW ◮ 18k vacuum tubes, 20 10-digit accumulators ◮ 100 kHz, around 300 M(ult)PS ◮ in 1946 added blinking lights for the Press!
14 / 184
Accumulator
15 / 184
◮ 3k vacuum tubes, 300 sq ft, 12 kW
◮ 500kHz, ca 650 IPS
◮ 1K 17-bit words of memory (Hg ultrasonic delay lines)
◮ operating system of 31 words
◮ see http://www.dcs.warwick.ac.uk/~edsac/ for a simulator
16 / 184
[Figure: program translation – C/C++ source is compiled to assembly source, assembled into an object file, linked with other object files (“libraries”) into an executable file (“machine code”), and executed; ML/Java bytecode is instead interpreted. The stages correspond to levels 5 down to 1.]
17 / 184
[Figure: hierarchy of virtual machines, from Virtual Machine M5 (Language L5, Meta-Language Level), through Virtual Machine M4 (Compiled Language Level), M3 (Assembly Language Level), M2 (Operating System Level) and M1 (Conventional Machine Level), down to the Actual Machine M0 (Language L0, Digital Logic Level).]
18 / 184
[Figure: computer organization – Execution Unit, Control Unit and Register File (including PC), connected via Address/Data/Control buses (plus a Reset line) to memory (e.g. 64 MByte: 2^26 × 8 = 536,870,912 bits) and to devices: Sound Card, Framebuffer, Hard Disk, and Super I/O (Mouse, Keyboard, Serial).]
19 / 184
20 / 184
Intel x86:
◮ eight 32-bit general purpose registers
◮ six 16-bit segment registers (for address space management)
◮ two 32-bit control registers, including Program Counter (called EIP on x86)
IBM S/390:
◮ sixteen 64-bit general registers
◮ sixteen 64-bit floating point registers
◮ one 32-bit floating point control register
◮ sixteen 64-bit control registers
◮ sixteen 32-bit access registers (for address space management)
◮ one Program Status Word (includes the PC)
21 / 184
[Figure: CPU with caches – Execution Unit and Control Unit backed by Instruction Cache and Data Cache, connected through a Bus Interface Unit via Address/Data/Control buses to 32K ROM and 64MB DRAM.]
22 / 184
23 / 184
[Figure: instruction pipeline – instruction buffer and decode stage, Register File, PC.]
24 / 184
◮ I/O devices typically connected to CPU via a bus (or via a chain of buses and bridges)
◮ wide range of devices, e.g.: hard disk, CD, graphics card, sound card, ethernet card, modem
◮ often with several stages and layers
◮ all of which are very slow compared to the CPU.
25 / 184
[Figure: simple bus – Processor, Memory and Other Devices attached to shared ADDRESS, DATA and CONTROL lines.]
26 / 184
[Figure: bus hierarchy – Processor (with Caches) on a Processor Bus; a Bridge to the Memory Bus (100 MHz) carrying 64MByte DIMMs; a further Bridge to the PCI Bus (33 MHz) with Framebuffer and SCSI Controller; and an ISA Bus (8 MHz) with e.g. Sound Card.]
27 / 184
28 / 184
29 / 184
30 / 184
31 / 184
32 / 184
[Figure: simple batch system memory layout – resident monitor (Interrupt Processing, Device Drivers, Job Sequencing, Control Language Interpreter) below the monitor boundary; User Program Area above.]
◮ monitor is simple resident OS: reads jobs one at a time and transfers control to them
◮ batches of jobs can be put onto one tape and processed in turn
◮ monitor permanently resident: user programs loaded one at a time into the user area
33 / 184
◮ memory protection: user programs should not be able to . . . write to monitor memory,
◮ timer control: . . . or run for ever,
◮ privileged instructions: . . . or directly access I/O (e.g. might read next job by accident)
◮ interrupts: . . . or delay the monitor’s response to external events
34 / 184
◮ manage memory among the various tasks ◮ schedule execution of the tasks
35 / 184
36 / 184
37 / 184
38 / 184
39 / 184
[Figure: virtual memory addressing – Processor issues Virtual Address to the Memory Management Unit, which produces a Real Address into Main Memory or a Disk Address into Secondary Memory.]
40 / 184
41 / 184
◮ S/370 family has supervisor and problem states ◮ Intel x86 has rings 0,1,2,3.
42 / 184
◮ A frame or page may be read or write accessible only to a processor in supervisor state;
◮ In S/370, each frame of memory has a 4-bit storage key, and each task runs with a key: mismatched access is denied;
◮ the virtual memory mechanism may be extended with read/write permission bits on pages;
◮ combination of all the above may be used.
43 / 184
[Figure: kernel-based OS – applications run unprivileged; the privileged kernel contains the Scheduler, Device Drivers, System Calls, File System and Protocol Code, directly above the hardware.]
44 / 184
[Figure: microkernel OS – applications and servers (device driver servers, file servers, etc.) run unprivileged; the small privileged kernel contains little more than the Scheduler and essential device driver support, above the hardware.]
45 / 184
◮ increase modularity ◮ increase extensibility
◮ have more overhead (due to IPC) ◮ can be difficult to implement (synchronization) ◮ often keep multiple copies of OS data structures
◮ Linux is monolithic, but has modules that are dynamically loaded and unloaded
◮ Windows NT was orig. microkernel-ish, but for performance has put much back into the kernel
46 / 184
◮ its memory, including stack and heap ◮ the contents of registers ◮ program counter ◮ its state
47 / 184
◮ New: process being created ◮ Running: process executing on CPU ◮ Ready: not on CPU, but ready to run ◮ Blocked: waiting for an event (and so not runnable) ◮ Exit: process finished, awaiting cleanup
48 / 184
[Figure: five-state transition diagram – New→Ready (admit), Ready→Running (dispatch), Running→Ready (timeout), Running→Blocked (event-wait), Blocked→Ready (event), Running→Exit (release).]
◮ admit: process control set up, move to run queue
◮ dispatch: scheduler gives CPU to runnable process
◮ timeout/yield: running process forced to/volunteers to give up CPU
◮ event-wait: process needs to wait for e.g. I/O
◮ event: event occurs – wake up process and tell it
◮ release: process terminates, release resources
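The transitions above can be sketched as a small state machine. This is a minimal illustration; the enum and function names are hypothetical, not taken from any real kernel:

```c
#include <assert.h>
#include <stdbool.h>

/* Five-state process model from the slide. */
typedef enum { NEW, READY, RUNNING, BLOCKED, EXIT } pstate;
typedef enum { ADMIT, DISPATCH, TIMEOUT, EVENT_WAIT, EVENT, RELEASE } ptrans;

/* Apply a transition; returns true and updates *s only if it is legal. */
bool transition(pstate *s, ptrans t) {
    switch (t) {
    case ADMIT:      if (*s != NEW)     return false; *s = READY;   return true;
    case DISPATCH:   if (*s != READY)   return false; *s = RUNNING; return true;
    case TIMEOUT:    if (*s != RUNNING) return false; *s = READY;   return true;
    case EVENT_WAIT: if (*s != RUNNING) return false; *s = BLOCKED; return true;
    case EVENT:      if (*s != BLOCKED) return false; *s = READY;   return true;
    case RELEASE:    if (*s != RUNNING) return false; *s = EXIT;    return true;
    }
    return false;
}
```

Note that a Blocked process cannot be dispatched directly: it must first be woken (event) and then chosen by the scheduler.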
49 / 184
[Figure: Process Control Block, containing: Process Number (or Process ID); Current Process State; Program Counter and Other CPU Registers; Memory Management Information; CPU Scheduling Information; Other Information (e.g. list of open files, name of executable, identity of owner, CPU time used so far, devices owned); Refs to previous and next PCBs.]
So the PCB records:
◮ unique process ID
◮ process state
◮ PC and other registers
◮ memory management info
◮ scheduling and accounting info
◮ . . .
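In C, a PCB can be sketched as a struct; the field names here are illustrative, not those of any real OS:

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 16

/* Illustrative Process Control Block, following the fields listed above. */
struct pcb {
    int        pid;              /* unique process ID */
    int        state;            /* New/Ready/Running/Blocked/Exit */
    uintptr_t  pc;               /* saved program counter */
    uintptr_t  regs[NREGS];      /* other saved CPU registers */
    uintptr_t  pgtable_base;     /* memory management information */
    int        priority;         /* CPU scheduling information */
    long       cpu_time_used;    /* accounting */
    int        open_files[16];   /* other information: open file handles */
    struct pcb *prev, *next;     /* refs to previous and next PCBs */
};
```

The prev/next pointers let the OS keep PCBs on linked lists (run queue, blocked queues) without copying them.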
50 / 184
[Figure: context switch – while Process A executes, Process B is idle; the OS saves state into PCB A and restores state from PCB B, after which B executes; later the OS saves state into PCB B and restores state from PCB A, resuming A. Both processes are idle while the OS switches.]
51 / 184
◮ in older OSes, kernel is seen as single program in real memory ◮ in modern OSes, kernel may execute in context of user process ◮ parts of OS may be processes (in some sense)
52 / 184
53 / 184
◮ By the OS when a job is submitted or a user logs on. ◮ By the OS to perform background service for user (e.g. printing). ◮ By explicit request from user program (spawn, fork).
54 / 184
◮ assign unique identifier
◮ allocate memory space: both kernel memory for control structures, and user memory for the program
◮ initialize PCB and (maybe) memory management tables
◮ link PCB into OS data structures
◮ initialize remaining control structures
◮ for WinNT, OS/390: load program
◮ for Unix: make child process a copy of parent
55 / 184
A process may:
◮ terminate voluntarily (Unix exit())
◮ perform illegal operation (privileged instruction, access non-existent memory)
◮ be killed by user (Unix kill()) or OS because
  ◮ allocated resources exceeded
  ◮ task functionality no longer needed
  ◮ parent terminating (in some OSes) ...
On termination, the OS must:
◮ deal with pending output etc.
◮ release all system resources held by process
◮ unlink PCB from OS data structures
◮ reclaim all user and kernel memory
56 / 184
Processes
◮ own resources such as address space, i/o devices, files
◮ are units of scheduling and execution
Threads (lightweight processes)
◮ creating threads is quick (ca. 10 times quicker than processes)
◮ ending threads is quick
◮ switching threads within one process is quick
◮ inter-thread communication is quick and easy (have shared memory)
57 / 184
◮ create: thread spawns new thread, specifying instruction pointer or routine to run
◮ block: thread waits for event. Other threads may execute.
◮ unblock: event occurs, thread becomes ready.
◮ finish: thread completes; context reclaimed.
58 / 184
◮ thread library implements mini-process scheduler (entirely in user space)
◮ context of thread is PC, registers, stacks etc., saved in a thread control block (stored in user process’s memory)
◮ switching between threads can happen voluntarily, or on timeout (of a user-level timer)
59 / 184
◮ context switching very fast – no OS involvement ◮ scheduling can be tailored to application ◮ thread library can be OS-independent
◮ if thread makes blocking system call, entire process is blocked.
◮ user-space threads don’t execute concurrently on multiprocessor machines
60 / 184
◮ Single Instruction Single Data stream (SISD): normal setup, one processor executing one instruction stream
◮ Single Instruction Multiple Data stream (SIMD): a single program executes in lockstep on several sets of data (e.g. vector processors)
◮ Multiple Instruction Single Data stream (MISD): not used.
◮ Multiple Instruction Multiple Data stream (MIMD): many processors each executing their own programs
61 / 184
62 / 184
◮ cache coherence: several CPUs, one shared memory. Each CPU has its own cache, so updates must be propagated to the other caches
◮ re-entrancy: several CPUs may call kernel simultaneously. Kernel code and data structures must be protected
◮ scheduling: genuine concurrency between threads. Also between kernel threads.
◮ memory: must maintain virtual memory consistency between processors
◮ fault tolerance: single CPU failure should not be catastrophic.
63 / 184
◮ Batch scheduling, long-term: which jobs should be started?
◮ medium term: some OSes suspend or swap out processes to relieve pressure on memory
◮ process scheduling, short-term: which process gets the CPU next?
64 / 184
◮ good utilization: minimize the amount of CPU idle time
◮ good utilization: job throughput
◮ fairness: jobs should all get a ‘fair’ share of CPU . . .
◮ priority: . . . unless they’re high priority
◮ response time: fast (in human terms) response to interactive input
◮ real-time: hard deadlines, e.g. chemical plant control
◮ predictability: avoid wild variations in user-visible performance
65 / 184
◮ first-come-first-served: (FCFS, FIFO, queue) – what it says. Favours long and processor-bound processes
◮ shortest process next: (SPN) – dispatch process with shortest expected processing time. Needs estimates; penalizes long processes
◮ and others . . .
66 / 184
◮ round-robin: when the quantum expires, running process is sent to the back of the ready queue
◮ shortest remaining time: (SRT) – preemptive version of SPN. On each preemption, dispatch the process with the shortest expected remaining time
67 / 184
◮ feedback: use dynamically assigned priorities:
  ◮ new process starts in queue of priority 0 (highest);
  ◮ each time it’s pre-empted, goes to back of next lower priority queue;
  ◮ dispatch first process in highest occupied queue.
This penalizes long processes; to avoid starvation, may
◮ increase quantum for lower priority processes
◮ raise priority for processes that are starved
68 / 184
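The basic feedback discipline can be sketched with a few fixed-size queues. This is an illustration only (real schedulers track far more state); the names and sizes are made up:

```c
#include <assert.h>

#define NQUEUES 3
#define QCAP    8

typedef struct { int pid[QCAP]; int head, tail; } queue;

static queue rq[NQUEUES];               /* rq[0] = priority 0, the highest */

static void enqueue(int prio, int pid) { rq[prio].pid[rq[prio].tail++ % QCAP] = pid; }
static int  empty(int prio)            { return rq[prio].head == rq[prio].tail; }

/* Dispatch: first process in the highest occupied queue; -1 if none.
   Remembers the chosen queue in *prio so preemption knows where it came from. */
static int dispatch(int *prio) {
    for (int q = 0; q < NQUEUES; q++)
        if (!empty(q)) { *prio = q; return rq[q].pid[rq[q].head++ % QCAP]; }
    return -1;
}

/* Preemption: demote to the back of the next lower priority queue. */
static void preempt(int pid, int prio) {
    enqueue(prio + 1 < NQUEUES ? prio + 1 : prio, pid);
}
```

A new process enters via `enqueue(0, pid)`; each preemption pushes it one level down, so CPU-bound processes sink while interactive ones stay near the top.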
69 / 184
◮ assigning processes to processors ◮ deciding on multiprogramming on each processor ◮ actually dispatching processes
70 / 184
◮ load sharing: idle processor selects ready thread from whole pool
◮ gang scheduling: a gang of related threads are simultaneously dispatched to a set of CPUs
◮ dedicated CPUs: static assignment of threads (within program) to CPUs
◮ dynamic scheduling: involve the application in changing the number of threads it uses; OS shares CPUs out accordingly
71 / 184
◮ the single pool of TCBs must be accessed with mutual exclusion – may become a bottleneck
◮ preempted threads are unlikely to be rescheduled to same CPU; this loses the benefit of the CPU cache
◮ program wanting all its threads running together is unlikely to get it
72 / 184
◮ determinism: need to acknowledge events (e.g. interrupt) within predetermined time
◮ responsiveness: and take appropriate action quickly enough
◮ user control: hardness of deadlines and relative priorities is (almost always) for the user to decide
◮ reliability: systems must ‘fail soft’. panic() is not an option!
73 / 184
74 / 184
◮ resource control: if one resource, e.g. global variable, is accessed by several processes, accesses must be controlled
◮ resource allocation: processes can acquire resources and block, possibly deadlocking
◮ debugging: execution becomes non-deterministic (for all practical purposes)
75 / 184
76 / 184
◮ mutual exclusion must be enforced!
◮ processes blocking in noncritical section must not interfere with others
◮ processes wishing to enter critical section must eventually be allowed to do so
◮ entry to critical section should not be delayed without cause
◮ there can be no assumptions about speed or number of processors
◮ processes remain in their critical section for finite time
77 / 184
◮ via hardware: special machine instructions ◮ via OS support: OS provides primitives via system call ◮ via software: entirely by user code
78 / 184
◮ processes busy-wait
◮ the processes must take strict turns
79 / 184
80 / 184
81 / 184
82 / 184
83 / 184
84 / 184
85 / 184
◮ init(s,n): create the semaphore and initialize it to the non-negative value n
◮ wait(s): the semaphore value is decremented. If the value is now negative, the calling process is blocked
◮ signal(s): the semaphore is incremented. If the value is non-positive, one process blocked on wait(s) is unblocked
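These semantics can be sketched in user space with POSIX threads. This is an illustration, not any OS's real implementation; with this convention a negative value counts the blocked callers, and a `wakeups` counter guards against spurious condition-variable wakeups:

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;                 /* semaphore value; negative = #blocked */
    int wakeups;               /* signals not yet consumed by waiters */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} sema;

void sema_init(sema *s, int n) {       /* init(s,n) */
    s->value = n; s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void sema_wait(sema *s) {              /* wait(s) */
    pthread_mutex_lock(&s->lock);
    if (--s->value < 0) {              /* must block until a signal arrives */
        while (s->wakeups == 0)
            pthread_cond_wait(&s->cond, &s->lock);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void sema_signal(sema *s) {            /* signal(s) */
    pthread_mutex_lock(&s->lock);
    if (++s->value <= 0) {             /* someone is blocked: release one */
        s->wakeups++;
        pthread_cond_signal(&s->cond);
    }
    pthread_mutex_unlock(&s->lock);
}
```

A kernel implementation would instead move the blocked process's PCB onto a per-semaphore queue and call the scheduler.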
86 / 184
87 / 184
88 / 184
89 / 184
90 / 184
◮ cwait(c) where c is a condition variable confined to the monitor: the calling process is suspended, releasing the monitor
◮ csignal(c): some process suspended on c is released and takes the monitor
91 / 184
◮ Unix file locks: many Unices provide read/write locking on files. See man fcntl.
◮ The OS/390 ENQ system call provides general purpose read/write locks.
◮ The Linux kernel uses ‘read/write semaphores’ internally. See lib/rwsem.c in the kernel source.
92 / 184
◮ simple mutex by using a single message as a token
◮ producer/consumer: producer sends data as messages to consumer; consumer sends empty messages back as permission to send more
93 / 184
◮ A is a disk file, B is a tape drive. ◮ A is an I/O port, B is a memory page.
94 / 184
Deadlock can arise when:
◮ resources are held by only one process at a time (mutual exclusion)
◮ a resource can be held while waiting for another (hold and wait)
◮ processes do not unwillingly lose resources (no preemption)
◮ a circular dependency arises between resource requests (circular wait)
95 / 184
96 / 184
◮ kill all deadlocked processes (!)
◮ selectively kill deadlocked processes
◮ forcibly remove resources from some processes (what does the victim do then?)
◮ if checkpoint-restart is available, roll back to pre-deadlock point, and hope it doesn’t recur
97 / 184
◮ code (instructions, text): the program itself
◮ static data: data compiled into the program
◮ dynamic data: heap, stack
◮ relocation: moving programs in memory
◮ allocation: assigning memory for processes
◮ protection: preventing access to other processes’ memory. . .
◮ sharing: . . . except when appropriate
◮ logical organization: how memory is seen by process
◮ physical organization: and how it is arranged in hardware
98 / 184
99 / 184
100 / 184
[Figure: relocation – a process's partition delimited by base and limit registers.]
101 / 184
◮ fixed partitioning: divide memory into fixed chunks. Disadvantage: wasted space (internal fragmentation)
◮ dynamic partitioning: load process into suitable chunk; when it exits, free the chunk, leaving a hole (external fragmentation)
Placement algorithms:
◮ first fit: choose first big enough chunk
◮ next fit: choose first big enough chunk after last allocated chunk
◮ best fit: choose chunk with least waste
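First fit and best fit can be sketched over a table of free chunks. The representation and helper names are hypothetical:

```c
#include <assert.h>

struct chunk { int start, size; };     /* a free region of memory */

/* Return index of the first chunk big enough for req, or -1. */
int first_fit(const struct chunk *c, int n, int req) {
    for (int i = 0; i < n; i++)
        if (c[i].size >= req) return i;
    return -1;
}

/* Return index of the big-enough chunk with least waste, or -1. */
int best_fit(const struct chunk *c, int n, int req) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (c[i].size >= req && (best < 0 || c[i].size < c[best].size))
            best = i;
    return best;
}
```

Note that best fit must scan the whole list, while first fit stops at the first match; next fit would be first fit starting from a remembered position.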
102 / 184
◮ Memory is maintained as a binary tree of blocks of sizes 2^k for suitable k
◮ When process of size s, 2^(i−1) < s ≤ 2^i, comes in, look for a free block of size 2^i, splitting larger blocks as needed
◮ When blocks are freed, merge free sibling nodes (‘buddies’) to re-create larger blocks
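Two small calculations underlie the buddy system: the order i of a request (the i with 2^(i−1) < s ≤ 2^i), and the address of a block's buddy, found by toggling bit i of its (2^i-aligned) address. A minimal sketch:

```c
#include <assert.h>

/* Smallest i with 2^i >= s (s >= 1), i.e. the order of a request. */
int order(unsigned s) {
    int i = 0;
    while ((1u << i) < s) i++;
    return i;
}

/* Buddy of a 2^i-aligned block: its sibling in the binary tree. */
unsigned buddy_of(unsigned addr, int i) {
    return addr ^ (1u << i);
}
```

On free, if `buddy_of(addr, i)` is also free, the two merge into a block of order i+1, and the check repeats one level up.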
103 / 184
◮ hardware/OS provide different segments for different types of data, e.g. text, data and stack; or
◮ hardware/OS provides multiple segments at user request.
◮ logical memory address viewed as pair (s, o)
◮ process has segment table: look up entry s in table to get base b_s and limit l_s
◮ translate as normal to o + b_s, or raise fault if o + b_s > l_s
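The lookup can be sketched in a few lines. This illustration checks the offset against a per-segment length (one common convention; names are made up):

```c
#include <assert.h>

struct seg { unsigned base, len; };   /* one segment table entry */

/* Translate logical (s, o) to a physical address, or -1 on a
   segmentation fault (bad segment, or offset beyond its length). */
long translate(const struct seg *table, int nsegs, int s, unsigned o) {
    if (s >= nsegs || o >= table[s].len) return -1;
    return (long)table[s].base + o;
}
```

In hardware this table lookup and bounds check happen on every memory access, which is why the segment table (or its active entries) must be held in fast registers or caches.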
104 / 184
◮ may correspond to user view of memory.
◮ importantly, protection can be done per segment: each segment can have its own permissions
◮ makes sharing of code/data easy. (But better to have a single list of shared segment descriptors, rather than a copy in each process’s table.)
◮ variable size segments leads to external fragmentation again;
◮ may need to compact memory to reduce fragmentation;
◮ small segments tend to minimize fragmentation, but annoy the programmer
105 / 184
106 / 184
107 / 184
[Figure: paging – the logical address is split into page number and offset; the page table maps pages p1, p2, p3, p4 to frames f1, f2, f3, f4, giving the physical address.]
108 / 184
109 / 184
[Figure: two-level page table – the top bits of the virtual address index the L1 page table, located via a base register; the selected entry gives the address of an L2 page table, which the next bits index to reach the leaf PTE.]
110 / 184
◮ mark the pages read-only in each process (using protection bits in the page table entries);
◮ when process writes, generates protection exception;
◮ OS handles exception by allocating a new frame, copying the shared page, and resuming with the private copy mapped writable
111 / 184
◮ initialize process’s page table with invalid entries;
◮ on first reference to page, get exception: handle it, allocate frame, load the page;
◮ when real memory gets tight, choose some pages, write them to disk if modified, and invalidate them;
◮ when process refers to page on disk, get exception; handle by loading the page back into some frame
112 / 184
◮ modified bit for page: no need to write out page if not changed since it was last written out
◮ referenced bit or counter: unreferenced pages are first candidates for replacement
◮ On Intel, modified and reference bits are part of page table entry.
◮ On S/390, they are part of storage key associated with each real frame.
113 / 184
◮ Logical address is 31 bits:
  ◮ first 11 bits index into current segment table
  ◮ next 8 bits index into page table;
  ◮ remaining 12 bits are offset.
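Extracting those fields is just shifting and masking (11 + 8 + 12 = 31 bits); a small sketch with made-up macro names:

```c
#include <assert.h>

/* 31-bit S/390 logical address: 11-bit segment | 8-bit page | 12-bit offset */
#define SEG(a)  (((a) >> 20) & 0x7ff)   /* top 11 of the 31 bits */
#define PAGE(a) (((a) >> 12) & 0xff)    /* next 8 bits */
#define OFF(a)  ((a) & 0xfff)           /* remaining 12 bits */
```

So each segment covers 2^8 pages of 2^12 bytes = 1 MB, and there are up to 2^11 segments per address space.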
114 / 184
◮ Logical address is 16-bit segment id and 32-bit offset.
◮ Segment id indexes into segment table; but
◮ segment id portion of logical address is found via a segment register;
◮ which is usually implicit in access type (CS register for instructions, DS for data, SS for stack), though it can be overridden
◮ Segment registers are part of task context. (Task context is stored in a special task state segment.)
◮ May be single global segment table; may also have task-specific segment tables
115 / 184
◮ Segment related info (e.g. segment tables) can be paged out; so can page tables.
◮ There is no link between pages and segments: segments need not lie on page boundaries.
◮ Pages can be 4KB, or 4MB.
◮ Page table register is part of task context, stored in the task state segment (!).
116 / 184
◮ minimize number of page faults: avoid paging out pages that will soon be referenced again
◮ minimize disk i/o: avoid reclaiming dirty (modified) pages where possible
117 / 184
◮ demand paging: when referenced. The locality principle suggests that after an initial burst of faults, the fault rate should drop
◮ prepaging: try to bring in pages ahead of demand, exploiting the efficiency of reading several contiguous disk blocks at once
118 / 184
◮ LRU – least recently used: choose the page with longest time since last reference. Performs well, but impractical to implement exactly
◮ FIFO – first in, first out: simple, but pages out heavily used pages.
◮ clock policy: attempts to get some of the performance of LRU at FIFO-like cost
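The clock policy can be sketched in a few lines: frames sit in a circle, each with a use bit; the hand clears the use bit of frames it passes ("second chance") and evicts the first frame whose bit is already clear. An illustrative toy version:

```c
#include <assert.h>

#define NFRAMES 4

static int use_bit[NFRAMES];   /* set by the MMU on reference (simulated here) */
static int hand;               /* the clock hand */

/* Return the frame to evict, advancing the hand. */
int clock_evict(void) {
    for (;;) {
        if (!use_bit[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        use_bit[hand] = 0;              /* second chance: clear and move on */
        hand = (hand + 1) % NFRAMES;
    }
}
```

The loop always terminates: after at most one full sweep every use bit has been cleared, so some frame must be chosen.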
119 / 184
120 / 184
◮ When a page is replaced, it’s added to the end of the free page list if clean, or the modified page list if dirty.
◮ The actual frame used for the paged-in page is the head of the free page list.
◮ If no free pages, or when modified list gets beyond certain size, write out the modified pages and move them to the free list.
This means:
◮ pages in the caches can be instantly restored if referenced again;
◮ I/O is batched, and therefore more efficient.
121 / 184
◮ allocate a certain number of frames to each process (on what criteria?)
◮ after a process reaches its allocation, if it page faults, one of its own pages is chosen for replacement
◮ re-evaluate resident set size (RSS) from time to time
122 / 184
◮ page fault frequency: choose threshold frequency f. On page fault:
  ◮ if (virtual) time since last fault is < 1/f, add one page to RSS; otherwise
  ◮ discard unreferenced pages, and shrink RSS; clear use bits on other pages
◮ variable-interval sampled working set: at intervals,
  ◮ evaluate working set (clear use bits at start, check at end)
  ◮ make this the initial resident set for next interval
  ◮ add any faulted-in pages (i.e. shrink RS only between intervals)
  ◮ the interval is every Q page faults (for some Q), subject to upper and lower virtual time bounds U and L
◮ Tune Q, U, L according to experience. . .
123 / 184
◮ dealing with wildly disparate hardware
◮ with speeds from 10^2 to 10^9 bps
◮ and applications from human communication to data storage
◮ varying complexity of device interface (e.g. line printer vs disk)
◮ data transfer sizes from 1 byte to megabytes
◮ in many different representations and encodings
◮ and giving many idiosyncratic error conditions
124 / 184
◮ direct control: CPU controls device by reading/writing data lines directly
◮ polled I/O: CPU communicates with hardware via built-in controller; busy-waits for completion
◮ interrupt-driven I/O: CPU issues command to device, gets interrupt when it completes
◮ direct memory access: CPU commands device, which transfers data to/from memory itself (via a DMA controller)
◮ I/O channels: device has specialized processor, interpreting special command set. CPU asks channel to execute the I/O program
125 / 184
◮ CPU places data in data register
◮ CPU puts write command in command register
◮ CPU busy-waits reading status register until ready flag is set
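Those three steps can be sketched as below. Real device registers would be memory-mapped and declared volatile; here a toy software 'device' stands in so the busy-wait loop terminates (all names are made up):

```c
#include <assert.h>

#define READY 0x1

static unsigned char data_reg, cmd_reg, status_reg = READY;
static unsigned char device_output;     /* what the toy device 'wrote' */

/* Toy device: consumes the command, then raises READY again. */
static void device_step(void) {
    if (cmd_reg == 'W') {
        device_output = data_reg;
        cmd_reg = 0;
        status_reg |= READY;
    }
}

void polled_write(unsigned char byte) {
    data_reg = byte;                    /* CPU places data in data register */
    status_reg &= ~READY;
    cmd_reg = 'W';                      /* CPU puts write command in command register */
    while (!(status_reg & READY))       /* CPU busy-waits on the ready flag */
        device_step();                  /* (in hardware, the device runs by itself) */
}
```

The wasted CPU time in the busy-wait loop is exactly what interrupt-driven I/O eliminates.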
126 / 184
127 / 184
128 / 184
129 / 184
[Figure: I/O software layers – unprivileged Application-I/O Interface; privileged Virtual Device Layer (I/O Scheduling, I/O Buffering, Common I/O Functions) and Device Driver Layer (one driver per device); hardware Device Layer (Keyboard, Hard Disk, Network).]
130 / 184
◮ character: terminals, printers, keyboards, mice, . . . typically transfer data byte by byte
◮ block: disk, CD-ROM, tape, . . . transfer data in blocks (fixed or variable size)
◮ network: ethernet etc, tend to have mixed characteristics and need special treatment
◮ other: clocks etc.
131 / 184
◮ A typical modern disk drive comprises several platters, each a thin disk coated with magnetic material
◮ A comb of heads is on a movable arm, with one head per surface.
◮ If the heads stay still, they access circles on the spinning platters: a track on each surface, together forming a cylinder.
◮ Often, tracks are divided into fixed length sectors.
Access time comprises:
◮ move head assembly to right cylinder (around 4 ms on modern disks)
◮ wait for right sector to rotate beneath head (around 5 ms on modern disks)
132 / 184
◮ SSTF (shortest service time first): do request with shortest seek time. Good performance, but can starve outlying requests
◮ SCAN: move the head assembly from out to in and back again, servicing requests along the way. Avoids starvation
◮ C-SCAN: scan in one direction only, then flip back. Avoids bias towards middle tracks
◮ FSCAN, N-step-SCAN: avoid long delays by servicing only a bounded batch of requests per sweep
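SSTF's selection step is a simple nearest-neighbour search over the pending requests; a minimal sketch (names are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Return index of the pending request with the shortest seek distance
   from the current head position, or -1 if there are none. */
int sstf_pick(const int *track, int n, int head) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || abs(track[i] - head) < abs(track[best] - head))
            best = i;
    return best;
}
```

The starvation problem is visible here: a request far from the head loses to every nearer newcomer, indefinitely; SCAN avoids this by committing to a sweep direction.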
133 / 184
◮ Level 0: data are striped across n disks. Data is divided into strips; consecutive logical strips go on different disks. No redundancy, but higher transfer rates
◮ Level 1: data are mirrored (duplicated) on each disk. Protects against disk failure; reads can go to either copy
134 / 184
◮ Level 2: data are striped in small (byte or word) strips across some disks, with an error-correcting code across the rest
◮ Level 3: same, but using only parity bits, stored on one other disk. If a disk fails, data can be reconstructed from the parity
◮ Level 4: large data strips, as for level 0, with extra parity strip on one disk
◮ Level 5: as level 4, but distribute parity strip across disks, avoiding a parity-disk bottleneck
◮ Level 6: data striping across n disks, with two different checksums on two extra disks – survives a double disk failure
135 / 184
◮ field: basic element of data. May be typed (string, integer, etc.). May be fixed or variable length
◮ record: collection of related fields, relating to one entity. May be of fixed or variable length and structure
◮ file: collection of records forming a single object at OS and user level
◮ database: collection of related data, often in multiple files, with relationships among the items
136 / 184
◮ device drivers: already covered
◮ physical I/O: reading/writing blocks on disk. Already covered.
◮ basic I/O system: connects file-oriented I/O to physical I/O. Handles scheduling and buffering
◮ logical I/O: presents the application programmer with a (hopefully uniform) view of files and records
◮ access methods: provide application programmer with routines for indexed and other structured access to files
137 / 184
◮ byte stream: unstructured stream of bytes. Only native Unix type.
◮ pile: unstructured sequence of variable length records. Records and fields must be self-describing; search is exhaustive
◮ (fixed) sequential: sequence of fixed-length records. Can store only the data, with the field layout defined externally; a key field may identify records
◮ indexed sequential: add an index file, indexing a key field
◮ indexed: drop the sequential ordering; access only via (possibly multiple) indexes
◮ hashed / direct: hash key value directly into a position within the file
138 / 184
◮ directories list files, including other directories.
◮ file is located by path through directory tree, e.g. …
◮ directory entry may contain file metadata (owner, permissions, dates, etc.), or this may be stored with the file itself
◮ (usually) directories can only be accessed via system calls, not by normal file I/O
139 / 184
◮ files are unstructured byte sequences
◮ metadata (including pointers to data) is stored in an inode
◮ directories link names to inodes (and that’s all)
◮ hence file permissions are entirely unrelated to directory permissions
◮ inodes may be listed in multiple directories
◮ inodes (and file data) are automatically freed when no directory references them
◮ the root directory of a filesystem is found in a fixed inode (number 2 on Linux)
140 / 184
◮ files may have any of the formats mentioned above, and others ◮ files live on a disk volume, which has a VTOC giving names and
◮ files from many volumes can be put in catalogs ◮ and a filename prefix can be associated with a catalog via the
◮ catalogs also contain additional metadata (security etc.), depending
◮ the master catalog is defined at system boot time from the VTOC
141 / 184
◮ knowledge of existence (e.g. seeing directory entry)
◮ execute (for programs)
◮ read
◮ write
◮ write append-only
◮ change access rights
◮ delete
142 / 184
◮ predefined permission bits, e.g. Unix read/write/execute for owner/group/other
◮ access control lists giving specific rights to specific users or groups
◮ capabilities granted to users over files (see Computer Security)
143 / 184
◮ fixed blocking: pack constant number of fixed-length records into each block. Wastes the space left over at the end of the block
◮ variable, spanning: variable-length records, packed without regard to block boundaries; records may span blocks
◮ variable, non-spanning: records don’t span blocks; just waste space at the end of each block
144 / 184
◮ contiguous allocation: makes file I/O easy and quick. But: external fragmentation, and file size must be known in advance (or files moved as they grow)
◮ chained allocation: allocate blocks as and when needed, and chain them together in a list
◮ indexed allocation: file has index of blocks or sequences of blocks allocated to it
145 / 184
◮ Started 1989. New codebase; microkernel based architecture.
◮ NT3.1 released 1993; poor quality.
◮ NT3.5 released 1994 (3.51 in 1995); more or less usable.
◮ NT4.0 released 1996; matched W95 look’n’feel. For performance, moved some system functions (e.g. graphics) into the kernel
◮ Windows 2000 (NT 5.0); adds features for distributed processing; Active Directory
◮ Windows XP: no really significant OS-side changes. Terminal servers allow multiple simultaneous users
◮ Windows Vista: still NT, but many components extensively reworked
◮ Windows 7: trying to cut down on kernel bloat. Changes to memory management, scheduling etc.
146 / 184
◮ portability – not just Intel;
◮ security – for commercial and military use;
◮ POSIX compliance – to ‘ease transition from Unix’. . .
◮ SMP support;
◮ extensibility;
◮ internationalization and localization;
◮ backwards compatibility (to Windows 9x, 3.1, even MS-DOS)
147 / 184
[Figure: Windows NT architecture –
User Mode: Logon Process and OS/2, Win32, POSIX, MS-DOS and Win16 Applications, each served by the corresponding Subsystem (OS/2, Win32, POSIX, MS-DOS, Win16), plus the Security Subsystem, all calling the Native NT Interface (System Calls).
Kernel Mode: the Executive (Object Manager, Process Manager, VM Manager, I/O Manager, File System Drivers, Cache Manager, Security Manager, LPC Facility), the Kernel, Device Drivers, and the Hardware Abstraction Layer (HAL) above the Hardware.]
148 / 184
149 / 184
150 / 184
151 / 184
◮ hierarchical namespace for named objects (via directory objects);
◮ access control lists
◮ naming domains – mapping existing namespaces to the object namespace
◮ symbolic links
◮ handles to objects (used by processes etc.)
152 / 184
◮ asynchronous request/response model – application queues request, and later collects or is notified of the result
◮ device drivers and file system drivers can be stacked (cf. Solaris streams)
◮ cache manager provides general caching services
◮ network drivers include distributed system support (W2K and XP)
NTFS:
◮ an NTFS volume occupies a partition, a disk, or multiple disks
◮ an NTFS file is structured: a file has attributes, including (possibly multiple) data attributes and security attributes
◮ files located via MFT (master file table)
◮ NTFS is a journalling file system
153 / 184
◮ performance ◮ reliability ◮ availability ◮ compatibility
154 / 184
◮ supervisor: main OS functions
◮ Master Scheduler: system start and control, communication with operator
◮ Job Entry Subsystem (JES2): entry of batch jobs, handling of their output
◮ System Management Facility (SMF): accounting, performance analysis
◮ Resource Measurement Facility: records data about system events and resource usage
◮ Workload Manager: manages workload according to installation goals
◮ Timesharing Option (TSO/E): provides interactive timesharing sessions
◮ TCAM, VTAM, TCP/IP: telecoms and networking
◮ Global Resource Serialization: resource control across clusters
155 / 184
◮ Dispatcher: main scheduler (in OS sense)
◮ Real Storage Manager: manages real memory, decides on page in/out
◮ Auxiliary Storage Manager: handles page/swap in/out
◮ Virtual Storage Manager: address space management and virtual memory allocation
◮ System Resources Manager: supervisor component of the Workload Manager
156 / 184
◮ read from ‘card reader’ or TSO SUBMIT command ◮ convert JCL to internal form ◮ start execution ◮ spool output; hold for later inspection, and/or print
157 / 184
◮ nucleus: important control blocks (in particular the CVT); most OS routines
◮ Fixed Link Pack Area: non-pageable (e.g. for performance) shared libraries
◮ private area: includes address-space-local system data, and user programs and data
◮ Common Service Area: data shared between all tasks
◮ System Queue Area: page tables etc.
◮ Pageable Link Pack Area: shared libraries permanently resident in virtual memory
158 / 184
◮ Memory is paged on demand, with a two-level paging system. ◮ The task is the basic dispatchable unit. One address space may have
◮ There is a general resource control mechanism ENQ/DEQ. ◮ SMP is supported in hardware. ◮ I/O is highly sophisticated, offloaded from CPU. Throughput and
159 / 184
◮ VM provides each user with a (configurable) virtual S/390 machine
◮ The VM CP (control program) gives each user a virtual console with
◮ they may start CMS (Conversational Monitor System), a single-user
◮ or they may load a S/390 operating system: MVS, Linux/390, or
◮ VM is a fully paging virtual memory OS, so ◮ IBM OSes can be adjusted to allow communication with VM
160 / 184
◮ Confidentiality: data (or even its existence) should be protected from unauthorized access
◮ Integrity: data should not be modified by unauthorized entities.
◮ Availability: data should be available to authorized entities
◮ Authenticity: all entities should be identified, so that all operations are attributable
◮ Non-repudiation: no entity should be able to deny any action it did perform
161 / 184
162 / 184
◮ no protection: fine, if the system is contained by physical security.
◮ isolation of tasks: different tasks have separate address spaces, file stores, etc.
◮ public/private: allow object owners to make them public (accessible to all) or private
◮ sharing via access lists: OS enforces user-specified access restrictions on each object
◮ . . . via capabilities: or with dynamically created access capabilities, which can be passed around
◮ limit uses: constrain detailed use: printing, viewing, copying etc.
163 / 184
◮ User identification
  ◮ Passwords
  ◮ One-time passwords
  ◮ Biometrics
◮ Confidentiality
  ◮ OS facilities
  ◮ Encryption
◮ Authenticity and non-repudiation
  ◮ Cryptographic signing
164 / 184
◮ exploiting bugs in system software, e.g. buffer overflow attacks ◮ exploiting users, e.g. most email viruses
◮ rigorous access control on need-to-know basis ◮ reviews of potentially exploitable code ◮ user education
◮ signature scanning ◮ sandboxed execution ◮ performance and system behaviour analysis
165 / 184
◮ break directly into a privileged network server, suborn an operator, etc.
◮ break a user account, then exploit weakness in OS to get root
◮ break into a trusted but more vulnerable machine, use as relay
166 / 184
◮ educate users ◮ scan mail for known viruses ◮ modify local mail programs etc. to stop them executing attachments ◮ prohibit (by modifying OS if necessary) execution of any program
167 / 184
◮ guessing (or brute force searching) passwords.
◮ faking login screens (this would be very easy on DICE).
168 / 184
◮ modify the C compiler (cc) so that it recognizes when it is compiling login, and inserts a back door;
◮ and if it recognizes it is compiling itself, it inserts these two routines.
169 / 184
◮ It was caught by clash detection in version control. ◮ The source tree they modified was not actually the master (although
170 / 184
171 / 184
◮ buffer overflow problem in the Unix finger service
◮ exploiting intentional ‘trapdoor’ in Unix mail servers compiled in debug mode (sendmail DEBUG)
◮ password guessing
◮ Unix remote execution allowed easy spread
◮ changes its name to sh
◮ forks() to change process id frequently
◮ avoids leaving files around
◮ obfuscates data in memory to hinder analysis
172 / 184
◮ MVS has a notion of authorized program which can do privileged operations
◮ one MVS system had a home-grown user command processor (= shell) which was authorized
◮ and which ran in storage key 8 (normal user storage key) . . .
◮ MVS I/O standard access methods allow user to intercept READ operations
◮ user could install read hook on files being used by the shell
173 / 184
174 / 184
175 / 184
◮ No Read Up (simple security property): a process running at one security level may not read data at a higher level
◮ No Write Down (*-property): a process running at one level may not write data at a lower level
176 / 184
177 / 184
178 / 184
179 / 184
180 / 184
181 / 184
182 / 184
183 / 184
184 / 184