Interconnection Structures
Patrick Happ Raul Queiroz Feitosa
Objective
To present key issues that affect interconnection design.
Outline
Introduction
Computer Busses
Bus Types
PCI
PCI Express
Introduction
All the units must be connected.
Different types of connection for different types of unit:
Memory
Input/Output
CPU
Unit Types
Computer Busses
A bus is a common electrical pathway between multiple devices
Functional groups of bus lines
Data Bus
Carries data
Remember that there is no difference between
“data” and “instruction” at this level.
Data bus width determines the amount of data moved in a single access.
8, 16, 32, 64 bit
Address bus
Identifies the source or destination of data
e.g. CPU needs to read an instruction (data) from a
given location in memory.
Address bus width determines maximum memory capacity of system
e.g. 8080 has 16 bit address bus giving 64k address
space
Control bus
Carries control and timing information.
Typical control lines:
Memory and I/O read/write signal
Interrupt request/acknowledge
Bus grant/request
Clock signals
Reset
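The effect of the data and address bus widths described above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that every bus access moves one full bus width of data.

```python
def address_space(address_bits: int) -> int:
    """Number of distinct locations addressable with this address bus width."""
    return 2 ** address_bits


def accesses_needed(item_bits: int, data_bus_bits: int) -> int:
    """Bus accesses needed to move one item, assuming full-width transfers."""
    return -(-item_bits // data_bus_bits)  # ceiling division


# 16-bit address bus, as in the 8080 example
print(address_space(16))
# A 32-bit item moved over an 8-bit data bus
print(accesses_needed(32, 8))
```

With a 16-bit address bus the first call gives 65536 locations (64k), matching the 8080 example.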
Physical Realization of Bus Architecture
Single Bus Problems
The more devices attached to the bus,
Most systems use multiple buses to overcome
these problems
Traditional bus architecture
Example: ISA bus
High Performance Bus
Example: PCI Bus
Bus Types
Dedicated
Separate data & address lines
Multiplexed
Shared lines
Address valid or data valid control line
Advantage: fewer lines
Disadvantages:
More complex control
Lower performance (address and data cannot be on the lines at the same time)
Bus Arbitration
There may be more than one potential bus master, e.g.
CPU and DMA controller
Multiple CPUs in a parallel shared bus system
An arbitration mechanism is required to guarantee that no more than one master controls the bus at a time.
Bus Arbitration
Why?
digital output model
Output stage with inputs IN and OE, output OUT, and output transistors Q3 (towards Vcc) and Q4 (towards ground):
OE=low; IN=low → Q3 off; Q4 on; OUT=low
OE=low; IN=high → Q3 on; Q4 off; OUT=high
OE=high → Q3 off; Q4 off; OUT=highZ
If one device drives a bus line high (Q3 on) while another drives it low (Q4 on), the bus line becomes a short circuit.
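The need for arbitration can be illustrated by modelling the output stage in software. This is a minimal sketch, not a circuit simulation: the function names are hypothetical, "Z" stands for the high-impedance state, and OE is treated as active low as in the table above.

```python
def tristate_output(inp: int, oe: int):
    """Return 0, 1 or "Z" for one driver; OE is active low."""
    if oe == 1:
        return "Z"          # both output transistors off
    return 1 if inp else 0  # driver enabled: OUT follows IN


def resolve_bus_line(outputs):
    """Combine all drivers on one bus line; conflicting drivers are an error."""
    driven = {o for o in outputs if o != "Z"}
    if not driven:
        return "Z"          # nobody drives the line
    if len(driven) > 1:
        raise ValueError("bus contention: line driven both high and low")
    return driven.pop()


# One enabled driver, two disabled ones: the line takes the driver's value.
line = resolve_bus_line([tristate_output(1, 0),
                         tristate_output(0, 1),
                         tristate_output(1, 1)])
print(line)
```

Enabling two drivers with different inputs raises the contention error, which is the software analogue of the short circuit.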
Methods of Arbitration
Centralised
a single arbiter grants bus access.
Arbiter
Inputs: bus requests R0, R1, R2, ..., Rn-1
Outputs: bus grants G0, G1, G2, ..., Gn-1
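A centralised arbiter can be sketched as a function from request lines to grant lines. The slides do not state a priority policy, so the fixed priority used here (lowest index wins) is an assumption, and the function name is hypothetical.

```python
def arbitrate(requests):
    """Map request lines R0..Rn-1 to grant lines G0..Gn-1.

    Assumed policy: fixed priority, the lowest-numbered requester wins.
    At most one grant is ever asserted.
    """
    grants = [False] * len(requests)
    for i, requesting in enumerate(requests):
        if requesting:
            grants[i] = True
            break
    return grants


print(arbitrate([False, True, True]))
```

Other policies (round robin, for example) only change how the winner is picked; the property that matters is that no more than one grant is active at a time.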
Methods of Arbitration
Distributed
Each module may claim the bus. Control logic on all modules.
Methods of Arbitration
Distributed
Bus request and Busy lines are open collector.
Requesting devices set Bus request=0.
Requesting devices make Out=0; non-requesting devices make Out=In.
The requesting device with In=1 gets the bus.
It waits until Busy=1, sets Busy=0 and takes the bus.
Upon relinquishing the bus, the device sets Busy=1.
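The In/Out chain rule above can be traced with a short simulation. This is a sketch under stated assumptions: the In of the first device is taken to be tied to 1, the Busy handshake is not modelled, and the function name is hypothetical.

```python
def daisy_chain_winner(requesting):
    """Return the index of the device that wins the bus, or None.

    requesting[i] is True if device i wants the bus.
    Assumption: the In of the first device in the chain is tied to 1.
    """
    chain_in = 1
    winner = None
    for i, wants_bus in enumerate(requesting):
        if wants_bus and chain_in == 1:
            winner = i                        # requesting device that sees In=1
        chain_in = 0 if wants_bus else chain_in  # Out=0 if requesting, else Out=In
    return winner


print(daisy_chain_winner([False, True, True]))
```

Because every requesting device forces Out=0, all devices further down the chain see In=0, so the requester closest to the head of the chain wins.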
Timing
Synchronous
Events determined by clock signals.
Control Bus includes clock line.
Usually sync on leading edge.
Usually a single cycle for an event.
A READY/WAIT line signals when the slave is expected to have completed the access.
Advantage: simple implementation (due to the clock signal).
Synchronous Timing Diagram
Synchronous Timing Diagram
Timing
Asynchronous
No clock signal (due to distortion).
Events determined by completion of earlier events.
Status lines signal when the slave completes the access.
Advantages:
allows for “fractional” cycles
no minimal access time is imposed
longer busses are possible (clock distortion)
Asynchronous Timing Diagram
Data Transfer Type
Timing diagrams (time → on the horizontal axis) for the following bus operations:
Write (multiplexed) operation
Write (non-multiplexed) operation
Read (multiplexed) operation
Read (non-multiplexed) operation
Read-modify-write operation
Read after write operation
Block Data Transfer
The PCI Bus
The bus structure of a Pentium 4.
The PCI Bus
Characteristics
Synchronous
Parallel
32 or 64 bit transfers
Up to 528 MB/s
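The 528 MB/s figure is consistent with one 64-bit transfer on every cycle of a 66 MHz clock. The slides do not give the clock rates, so the 33 MHz and 66 MHz values below are assumptions based on the standard PCI clock options; the function name is hypothetical and 1 MB is taken as 10^6 bytes.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Peak transfer rate in MB/s, assuming one transfer per clock cycle."""
    return (width_bits // 8) * clock_mhz


for width_bits in (32, 64):
    for clock_mhz in (33, 66):  # assumed PCI clock rates
        rate = peak_bandwidth_mb_s(width_bits, clock_mhz)
        print(f"{width_bits}-bit @ {clock_mhz} MHz: {rate} MB/s")
```

This is a peak value: address phases, wait states and arbitration reduce the rate actually achieved.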
Three 64-bit PCI slots and a single 32-bit PCI slot.
PCI Bus Signals(1)
Mandatory PCI bus signals.
PCI Bus Signals(2)
Optional PCI bus signals.
PCI Bus Transactions
Examples of 32-bit PCI bus transactions.
Annotations on the timing diagram:
multiplexed address and data lines
bus command / bit map for byte enables
AD and C/BE are enabled
read: master will accept; write: data present
read: data present; write: slave will accept
Legend: signal driven by master, slave, or both
Problem with parallel busses
Clock skew
a phenomenon in synchronous circuits in which the clock signal (sent from the clock circuit) arrives at different components at different times.
Problem with parallel busses
Clock skew may be caused by many factors:
wire length, variation in intermediate devices, capacitive coupling, material imperfections, …
As the clock rate increases, less variation can be tolerated if the circuit is to function properly. This imposes an upper limit on the clock rate of a parallel bus.
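To see why a fixed amount of skew matters more at higher clock rates, compare the skew with the clock period. The sketch below is illustrative only: the 1 ns skew and the clock rates are made-up values, and the function name is hypothetical.

```python
def skew_fraction(skew_ns: float, clock_mhz: float) -> float:
    """Fraction of one clock period taken up by the given skew."""
    period_ns = 1000.0 / clock_mhz
    return skew_ns / period_ns


# Hypothetical 1 ns skew at increasing clock rates
for clock_mhz in (33, 100, 500):
    print(f"{clock_mhz} MHz: {skew_fraction(1.0, clock_mhz):.1%} of the period")
```

The same skew that is negligible at a low clock rate consumes a large share of the period at a high one, leaving less time for the data lines to settle.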
PCIe
A typical PCI Express system.
PCIe
Switch: manages multiple PCIe streams.
PCIe endpoint: an I/O device or controller that implements PCIe, e.g. a Gigabit Ethernet switch, a graphics …
PCIe/PCI bridge: allows older PCI devices to be connected to PCIe-based systems.
PCIe Characteristics
Communication flows through one or more pairs of unidirectional serial links, called lanes.
Each direction of a lane uses a differential pair of wires, which provides high noise immunity.
PCIe devices communicate through a link, which is built up from a collection of one or more lanes.
Links with 1, 2, 4, 8, 16 and 32 lanes are allowed.
PCIe Configuration
paired serial links
PCIe Evolution
Evolution
2004: PCIe 1.1 → 2.5 GT/s (per lane).
2007: PCIe 2.0 → 5.0 GT/s (per lane).
2010: PCIe 3.0 → 8.0 GT/s (per lane).
2017: PCIe 4.0 → 16.0 GT/s (per lane).
2019: PCIe 5.0 → 32.0 GT/s (per lane).
2021: PCIe 6.0 → 64.0 GT/s (per lane).
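The per-lane figures above are best read as raw signalling rates in gigatransfers per second (GT/s); the usable data rate is lower because of line encoding. The sketch below estimates it under stated assumptions: PCIe 1.x and 2.0 use 8b/10b encoding and PCIe 3.0 to 5.0 use 128b/130b (facts not given in the slides), protocol overhead above the encoding is ignored, and the function names are hypothetical.

```python
# Fraction of transmitted bits that carry data for each line encoding
ENCODING_EFFICIENCY = {"8b/10b": 8 / 10, "128b/130b": 128 / 130}


def lane_bandwidth_gb_s(rate_gt_s: float, encoding: str) -> float:
    """Usable GB/s per lane, per direction, after encoding overhead."""
    return rate_gt_s * ENCODING_EFFICIENCY[encoding] / 8  # 8 bits per byte


def link_bandwidth_gb_s(rate_gt_s: float, encoding: str, lanes: int) -> float:
    """Usable GB/s per direction for a link made of several lanes."""
    return lanes * lane_bandwidth_gb_s(rate_gt_s, encoding)


print(lane_bandwidth_gb_s(2.5, "8b/10b"))              # PCIe 1.x, one lane
print(round(link_bandwidth_gb_s(8.0, "128b/130b", 16), 2))  # PCIe 3.0 x16
```

PCIe 6.0 is left out because it uses a different signalling and framing scheme, so the simple encoding ratio does not apply.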
PCIe Evolution
Source: Wikipedia
PCIe Protocol Layers
A protocol is a set of rules governing the
conversation between two parties.
A protocol stack is a hierarchy of protocols that deals
with different issues at different layers.
The PCI Express protocol stack has 3 layers:
PCI Express – physical layer
It deals with moving bits from a sender to a receiver.
Recall that each point-to-point connection consists of one or more pairs of simplex (unidirectional) links, called lanes. 1, 2, 4, 8, 16 or 32 pairs are allowed.
No master clock: 128b/130b encoding ensures enough clock transitions to keep synchronization.
A PCI Express x16 slot. A PCI Express x1 slot.
PCIe Multilane Distribution
PCIe Link Layer
It deals with packet transmission.
It adds a sequence number and an error-detecting code (CRC) to the header + payload.
If the CRC check succeeds, the receiver sends back an acknowledgement packet; otherwise it asks for retransmission.
This greatly improves data integrity.
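The sequence number, CRC and acknowledgement scheme can be sketched as follows. This is not the PCIe packet format: the 2-byte sequence number, the field order, the use of `zlib.crc32` as the checksum and the function names are all assumptions made for illustration.

```python
import zlib


def make_frame(seq: int, packet: bytes) -> bytes:
    """Prepend a sequence number and append a CRC over both."""
    body = seq.to_bytes(2, "big") + packet
    return body + zlib.crc32(body).to_bytes(4, "big")


def receive(frame: bytes):
    """Check the CRC; answer ACK with the packet, or NAK to request a resend."""
    body, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(body) != crc:
        return ("NAK", None, None)
    return ("ACK", int.from_bytes(body[:2], "big"), body[2:])


frame = make_frame(7, b"header+payload")
print(receive(frame)[0])           # intact frame

damaged = bytearray(frame)
damaged[3] ^= 0x01                 # flip one bit in transit
print(receive(bytes(damaged))[0])  # corrupted frame
```

The sender would keep each frame until it is acknowledged, and resend it when a NAK arrives.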
Figure: packet contents at each layer. Transaction layer: header, payload. Link layer: seq #, header, payload, CRC. Physical layer: frame, seq #, header, payload, CRC, frame.
PCI Express – transaction layer
It handles bus actions.
It splits transactions into a request and a response, separated in time.
It may divide each lane into up to eight virtual circuits, each handling a different class of traffic.
Flow control: guarantees that the transmitter stops sending data until there is free space in the receiver buffer.
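One common way to implement this kind of flow control is with credits: the receiver advertises how many buffer slots it has, and the transmitter sends only while it holds credits. The sketch below is a simplified illustration; the class and method names are hypothetical and one credit is assumed to cover one packet.

```python
class CreditedTransmitter:
    """Transmitter that may only send while it holds receiver credits."""

    def __init__(self, initial_credits: int):
        self.credits = initial_credits  # free slots advertised by the receiver
        self.sent = []

    def try_send(self, packet: str) -> bool:
        """Send if a credit is available; otherwise hold the packet back."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.sent.append(packet)
        return True

    def credit_returned(self, count: int = 1) -> None:
        """Called when the receiver has freed buffer space."""
        self.credits += count


tx = CreditedTransmitter(initial_credits=2)
print([tx.try_send(p) for p in ("a", "b", "c")])  # third packet must wait
tx.credit_returned()
print(tx.try_send("c"))
```

Because the transmitter never sends without a credit, the receiver buffer cannot overflow and no packet has to be dropped.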
Exercise 1
Consider two microprocessors having 8- and 16-bit-wide external data buses, respectively. The two processors are identical otherwise and their bus cycles take just as long.
a) Suppose all instructions and operands are two bytes long. By what factor do the maximum data transfer rates differ?
b) Repeat assuming that half of the operands and instructions are one byte long.
Exercise 2
For a synchronous read operation (slides 23 and 24), the memory module must place the data on the bus sufficiently ahead of the falling edge of the Read signal to allow for signal settling. Assume a microprocessor bus is clocked at 10 MHz.
a) When, at the latest, should memory data be placed on the bus after the Read signal is asserted?
b) How many wait states (clock cycles) need to be inserted for proper read operation if the read-to-data available time of a memory chip is equal to 350 ns?
Exercise 3
A microprocessor has an increment memory direct instruction, which adds 1 to the value in a memory location. The instruction has five stages: fetch opcode (four bus clock cycles), fetch operand address (three cycles), fetch operand (three cycles), add 1 to operand (three cycles), and store
a) By what amount (in percent) will the duration of the instruction increase if we have to insert two bus wait states in each memory read and memory write operation?
b) Repeat assuming that the increment operation takes 13 cycles instead of 3 cycles.
Exercise 4
Consider a 32 bit microprocessor whose bus cycle is the same duration as that of a 16 bit microprocessor. Assume that, on average, 20% of the operands and instructions are 32 bits long, 40% are 16 bits long, and 40% are only 8 bits long. Calculate the improvement achieved when fetching instruction and