

SLIDE 1

Photo: Hughes Leglise-Bataille, some rights reserved

Digital Technology and Computer Architecture (Digitalteknik och Datorarkitektur), 5 hp

Chapter 4: Performance

24 April 2008

karl.marklund@it.uu.se

How do we define (speed) performance? Response time (aka execution time): the time between the start and the completion of a task. This is important to individual users. Thus, to maximize performance, we need to minimize execution time.

$$\text{performance} = \frac{1}{\text{execution\_time}}$$

Nisse and Klasse: Klasse says, "I am n times faster than Nisse."

$$\frac{\text{performance}_{\text{Klasse}}}{\text{performance}_{\text{Nisse}}} = \frac{\text{execution\_time}_{\text{Nisse}}}{\text{execution\_time}_{\text{Klasse}}} = n$$

Stockholm to Göteborg ≈ 270 km:

| Vehicle | km/h | Passengers | Time to Göteborg | Passengers × km/h |
|---|---|---|---|---|
| Bus | 100 | 60 | 2 h 43 min | 6000 |
| Sports car | 230 | 2 | 1 h 10 min | 460 |

Time to perform a task from start to finish: execution time, response time, latency.
Amount of useful work per unit of time: throughput, bandwidth.
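To make the two metrics concrete, here is a minimal Python sketch using the bus and sports-car figures from the table. The helper names are hypothetical, and the distance is the slide's approximate 270 km, so a computed time may differ from the table by a minute of rounding.

```python
# Hypothetical helpers; speeds, passenger counts and distance come from the slide.
DISTANCE_KM = 270  # Stockholm to Göteborg, approximate

vehicles = {
    "Bus": {"speed_kmh": 100, "passengers": 60},
    "Sports car": {"speed_kmh": 230, "passengers": 2},
}


def execution_time_h(speed_kmh: float, distance_km: float = DISTANCE_KM) -> float:
    """Response time: hours for one trip from start to finish."""
    return distance_km / speed_kmh


def throughput(speed_kmh: float, passengers: int) -> float:
    """Useful work per unit of time, measured here as passengers * km/h."""
    return passengers * speed_kmh


def speedup(time_slow: float, time_fast: float) -> float:
    """n in 'I am n times faster': performance_fast / performance_slow."""
    return time_slow / time_fast


for name, v in vehicles.items():
    t = execution_time_h(v["speed_kmh"])
    h, m = divmod(round(t * 60), 60)
    print(f"{name}: {h} h {m} min, "
          f"{throughput(v['speed_kmh'], v['passengers'])} passengers*km/h")

n = speedup(execution_time_h(100), execution_time_h(230))
print(f"The sports car has {n:.1f}x shorter response time; the bus has higher throughput.")
```

The sports car wins on response time while the bus wins on throughput, which is why the two metrics must be kept apart.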

SLIDE 2

It takes 4 months to grow a tomato… …but that does not mean we can only grow 3 tomatoes in a year.

Throughput: tasks per unit of time.
Execution time: units of time per task.
Only in cases where we cannot perform several tasks in parallel does Execution Time = 1/Throughput.

Response time:
  • How long does it take for my job to run?
  • How long does it take to execute a job?
  • How long must I wait for the database query?

Throughput:
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
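The tomato example can be sketched in a few lines of Python. The model is an assumption for illustration: each plant yields one tomato per 4-month growing period, and plants can grow side by side. Response time stays fixed while throughput scales with parallelism.

```python
# Hypothetical model: one tomato per plant per growing period; plants run in parallel.
LATENCY_MONTHS = 4  # execution (response) time for one tomato, from the slide


def throughput_per_year(parallel_plants: int, latency_months: float = LATENCY_MONTHS) -> float:
    """Tomatoes per year when `parallel_plants` grow side by side, back to back."""
    return parallel_plants * 12 / latency_months


for plants in (1, 10):
    # The response time per tomato is LATENCY_MONTHS regardless of plant count.
    print(f"{plants} plant(s): {throughput_per_year(plants)} tomatoes/year, "
          f"{LATENCY_MONTHS} months per tomato")
```

Only the single-plant case satisfies throughput = 1 / execution time (3 tomatoes per year = 1 per 4 months).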

If we upgrade a machine with a new processor, what do we increase? If we add a new machine to the lab, what do we increase?

What determines whether a program runs fast or slowly?

Elapsed time: counts everything (disk and memory accesses, I/O, etc.). A useful number, but often not good for comparison purposes.

CPU time: does not count I/O or time spent running other programs. Can be broken up into system time and user time.

Our focus is user CPU time: the time spent executing the lines of code that are "in" our program.

  • How large the program is, i.e. the number of lines of code (LOC)... the number of instructions. The number of instructions also depends on the compiler.
  • How often the processor can perform a task: clock cycle time... clock cycles per instruction (CPI). Depends on the hardware!
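The difference between elapsed time and CPU time can be observed directly. This minimal sketch uses Python's standard `time.perf_counter()` (wall clock), `time.process_time()` (user + system CPU time of the process) and `os.times()` (user/system split); the `busy_work` function and the 0.2 s sleep are arbitrary choices for illustration.

```python
import os
import time


def busy_work(n: int) -> int:
    """Pure computation: this consumes user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()   # elapsed (wall-clock) time
cpu_start = time.process_time()    # CPU time of this process (user + system)
t0 = os.times()                    # snapshot with separate user/system fields

busy_work(2_000_000)
time.sleep(0.2)                    # waiting: elapsed time grows, CPU time barely does

elapsed = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
t1 = os.times()
user = t1.user - t0.user
system = t1.system - t0.system

print(f"elapsed {elapsed:.3f} s, CPU {cpu:.3f} s "
      f"(user {user:.3f} s, system {system:.3f} s)")
```

The sleep counts toward elapsed time but not CPU time, which is one reason elapsed time is a poor basis for comparing processors.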

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

cycle time = seconds per cycle

Instead of reporting execution time in seconds, we often use cycles. Clock “ticks” indicate when to start activities (one abstraction). The clock rate (frequency) is the number of cycles per second (1 Hz = 1 cycle/s). A 4 GHz clock has a cycle time of

$$\frac{1}{4 \times 10^{9}}\ \text{s} \times \frac{10^{12}\ \text{ps}}{1\ \text{s}} = 250\ \text{ps (picoseconds)}$$
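The cycle-time arithmetic and the seconds/program relation can be checked with a small Python sketch. The function names and the 8×10⁹-cycle program are hypothetical.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """cycle time = 1 / clock rate (seconds per cycle)."""
    return 1.0 / clock_rate_hz


def cpu_time_s(cycles: float, clock_rate_hz: float) -> float:
    """seconds/program = cycles/program * seconds/cycle."""
    return cycles * cycle_time_s(clock_rate_hz)


ps = cycle_time_s(4e9) * 1e12           # 4 GHz clock, converted to picoseconds
print(f"4 GHz -> {ps:.0f} ps per cycle")
print(f"{cpu_time_s(8e9, 4e9):.1f} s for a hypothetical 8e9-cycle program")
```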

SLIDE 3

So, to improve performance, everything else being equal, you can either decrease the number of cycles a program requires, or decrease the clock cycle time or, said another way, increase the clock rate:

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle\_time} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$

These equations separate the three key factors that affect performance: instruction count, CPI (clock cycles per instruction) and clock cycle time, where cycle_time = 1/clock_rate and clock_rate = 1/cycle_time.

  • The CPU execution time can be measured by running the program.
  • The clock rate is usually given in the documentation.
  • The instruction count can be measured using profilers/simulators, without knowing all of the implementation details.
  • CPI varies by instruction type and ISA implementation, for which we must know the implementation details…
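A short Python sketch applies both forms of the CPU time equation. The two machines, their CPIs and cycle times, and the instruction count are hypothetical values chosen for illustration; both machines are assumed to implement the same ISA and run the same program, so the instruction count is equal.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = Instruction_count x CPI x clock_cycle_time."""
    return instruction_count * cpi * clock_cycle_time_s


def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """The same quantity written as Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


IC = 1e9  # hypothetical instruction count, identical on both machines (same ISA)
time_a = cpu_time(IC, cpi=2.0, clock_cycle_time_s=250e-12)  # hypothetical machine A, 4 GHz
time_b = cpu_time(IC, cpi=1.2, clock_cycle_time_s=500e-12)  # hypothetical machine B, 2 GHz

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, A is {time_b / time_a:.2f}x faster")
```

Neither clock rate nor CPI alone decides the winner; only their product with the instruction count does.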

How many cycles are required for a program?

We could assume that the number of cycles equals the number of instructions, with the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instruction each taking one cycle along the time axis. Is this assumption correct?

Different numbers of cycles for different instructions


  • Multiplication takes more time than addition
  • Floating point operations take longer than integer ones
  • Accessing memory takes more time than accessing registers

Changing the cycle time often changes the number of cycles required for various instructions…
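When instructions take different numbers of cycles, the total cycle count is a weighted sum over instruction classes, and the average CPI follows from it. The classes, counts and per-class CPIs below are hypothetical.

```python
# Hypothetical instruction mix: class -> (instructions executed, cycles per instruction)
mix = {
    "integer add": (500_000, 1),
    "integer multiply": (100_000, 4),
    "floating point": (200_000, 6),
    "memory access": (200_000, 3),
}


def total_cycles(mix: dict) -> int:
    """Sum over classes of count_i * CPI_i."""
    return sum(count * cpi for count, cpi in mix.values())


def average_cpi(mix: dict) -> float:
    """Total cycles divided by total instructions executed."""
    return total_cycles(mix) / sum(count for count, _ in mix.values())


print(total_cycles(mix), "cycles, average CPI", average_cpi(mix))
```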

Memory-reference instructions: lw, lb, sw, sb
Arithmetic-logical instructions: add, sub, and, or, slt
Control flow instructions: beq, j

  • Fetch: use the program counter (PC) to supply the instruction address, fetch the instruction from memory, and update the PC (PC = PC + 4).
  • Decode: decode the instruction and read registers.
  • Execute: execute the instruction (possibly writing registers).
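The fetch, decode and execute steps can be illustrated with a toy Python interpreter for a subset of the instructions listed above. This is a hypothetical sketch, not a real MIPS simulator: instructions are tuples instead of 32-bit encodings, memory is a dictionary of words keyed by byte address, `lb`/`sb` are omitted, and `j` takes an instruction index.

```python
def run(program, regs=None, mem=None, max_steps=10_000):
    """Execute a list of (op, *operands) tuples; return (registers, memory)."""
    regs = list(regs) if regs is not None else [0] * 32
    mem = dict(mem) if mem is not None else {}
    pc = 0      # byte address of the next instruction; each instruction is 4 bytes
    steps = 0
    while 0 <= pc // 4 < len(program) and steps < max_steps:
        # Fetch: use the PC to select the instruction, then PC = PC + 4.
        instr = program[pc // 4]
        pc += 4
        # Decode: split the opcode from its register/immediate operands.
        op, *args = instr
        # Execute: compute, and possibly write a register or memory.
        if op in ("add", "sub", "and", "or", "slt"):   # op rd, rs, rt
            rd, rs, rt = args
            a, b = regs[rs], regs[rt]
            regs[rd] = {"add": a + b, "sub": a - b, "and": a & b,
                        "or": a | b, "slt": int(a < b)}[op]
        elif op == "lw":                                # lw rt, offset(rs)
            rt, offset, rs = args
            regs[rt] = mem.get(regs[rs] + offset, 0)
        elif op == "sw":                                # sw rt, offset(rs)
            rt, offset, rs = args
            mem[regs[rs] + offset] = regs[rt]
        elif op == "beq":                               # beq rs, rt, offset (in instructions)
            rs, rt, offset = args
            if regs[rs] == regs[rt]:
                pc += 4 * offset                        # relative to PC + 4
        elif op == "j":                                 # j target (instruction index here)
            (target,) = args
            pc = 4 * target
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs[0] = 0                                     # register $0 is always zero
        steps += 1
    return regs, mem
```

For example, `run([("lw", 8, 0, 0), ("lw", 9, 4, 0), ("add", 10, 8, 9), ("sw", 10, 8, 0)], mem={0: 5, 4: 7})` loads two words, adds them, and stores the sum at address 8.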

[Figure: state element (1) → combinational logic (2) → state element (3), driven by the clock; one clock cycle]

When can signals be read and written? An edge-triggered methodology:

  1. Read the contents of state elements.
  2. Send the values through combinational logic.
  3. Write the results to one or more state elements.

Aha! A state element can be read and written in the same clock cycle! How long does it take to reach a stable state?
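The edge-triggered idea can be modeled in software by computing every next value from a snapshot of the current state and committing all writes together at the clock edge. This is a simplified, hypothetical model (`clock_edge` and `swap_and_count` are illustrative names) that ignores propagation delays entirely.

```python
from typing import Callable, Dict

State = Dict[str, int]


def clock_edge(state: State, logic: Callable[[State], State]) -> State:
    """One clock cycle in a simplified edge-triggered model.

    1. read: the combinational logic sees only the current register values;
    2. compute: `logic` produces the next values from that snapshot;
    3. write: all registers are updated together at the clock edge.
    """
    next_values = logic(dict(state))  # logic reads a copy of the old values
    new_state = dict(state)
    new_state.update(next_values)     # commit every write at once
    return new_state


def swap_and_count(s: State) -> State:
    # a and b are both read and written in the same cycle; reads see old values.
    return {"a": s["b"], "b": s["a"], "count": s["count"] + 1}


state = {"a": 1, "b": 2, "count": 0}
state = clock_edge(state, swap_and_count)
print(state)
```

Because reads see the values from before the edge, two registers can swap contents in a single cycle without a temporary register.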

SLIDE 4

A given program will require:

  • some number of instructions (machine instructions)
  • some number of cycles
  • some number of seconds

We have a vocabulary that relates these quantities:

  • cycle time (seconds per cycle)
  • clock rate (cycles per second)
  • CPI (cycles per instruction)

A floating-point-intensive application might have a higher CPI.

MIPS (millions of instructions per second): this would be higher for a program using simple instructions.
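A small Python sketch shows why a higher MIPS rating does not imply a faster program. The 1 GHz clock, the three instruction classes with CPIs 1, 2 and 3, and the two instruction mixes are all hypothetical.

```python
CLOCK_RATE_HZ = 1e9                       # hypothetical 1 GHz machine
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}   # hypothetical instruction classes


def run_stats(counts: dict) -> tuple:
    """Return (execution time in seconds, MIPS) for a dynamic instruction mix."""
    instructions = sum(counts.values())
    cycles = sum(n * CPI_BY_CLASS[c] for c, n in counts.items())
    seconds = cycles / CLOCK_RATE_HZ
    mips = instructions / (seconds * 1e6)  # millions of instructions per second
    return seconds, mips


time_x, mips_x = run_stats({"A": 5e9, "B": 1e9, "C": 1e9})
time_y, mips_y = run_stats({"A": 10e9, "B": 1e9, "C": 1e9})  # more simple instructions
print(f"X: {time_x:.1f} s at {mips_x:.0f} MIPS; Y: {time_y:.1f} s at {mips_y:.0f} MIPS")
```

Program Y executes more simple instructions, so its MIPS rating is higher, yet it takes longer to run.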

Determinants of CPU Performance

$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle\_time}$$

| | Instruction_count | CPI | clock_cycle |
|---|---|---|---|
| Algorithm | X | X | |
| Programming language | X | X | |
| Compiler | X | X | |
| ISA | X | X | X |
| Processor organization | | X | X |
| Technology | | | X |

The Instruction Set Architecture, or ISA, of a computer is the interface between the software and the hardware. It is the portion of the computer visible to the programmer/compiler, such as registers, instructions and memory access. In some sense, the instruction set architecture is defined by the set of assembly instructions that can be used and by what they do.

Summary: Evaluating ISAs

Design-time metrics:

Can it be implemented, in how long, at what cost? Can it be programmed? Ease of compilation?

Static Metrics:

How many bytes does the program occupy in memory?

Dynamic Metrics:

How many instructions are executed? How many bytes does the processor fetch to execute the program?

How many clocks are required per instruction? How "lean" a clock is practical?

Best Metric: Time to execute the program!

CPI × Inst. Count × Cycle Time depends on the instruction set, the processor organization, and compilation techniques.

Most “popular” instructions?

  • Load: 22%
  • Conditional branch: 20%
  • Compare: 16%
  • Store: 12%
  • Add: 8%
  • And: 6%
  • Sub: 5%
  • Move register-register: 4%
  • Call: 1%
  • Return: 1%
  • Other: 4%

Ten simple instructions account for 96% of all instructions, so we should make sure they go fast: they are the common case. It is doubtful that it is worth implementing many other sophisticated functions.

In this 1984 photograph, Stanford computer scientists, left to right, John Shott, John Hennessy and James D. Meindl brainstorm about the MIPS project, which simplified computing with RISC architecture. Photo: Chuck Painter.

Reduced Instruction Set Computer (RISC). A type of microprocessor architecture that utilizes a small, highly-optimized set of instructions, rather than a more specialized set of instructions often found in other types of architectures. By reducing the number of transistors and instructions to only those most frequently used, the computer would get more done in a shorter amount of time.

SLIDE 5

CISC processors generally feature variable-length instructions and multiple addressing formats, and have a small number of general-purpose registers. Intel's 80x86 family is the quintessential example of CISC. CISC (Complex Instruction Set Computer) is a retroactive definition that was introduced to distinguish the design from RISC microprocessors. In contrast to RISC, CISC chips have a large number of different and complex instructions.

| RISC | CISC |
|---|---|
| Single-clock, reduced instructions only | Includes multi-clock complex instructions |
| Register to register: "LOAD" and "STORE" are independent instructions | Memory-to-memory: "LOAD" and "STORE" incorporated in instructions |
| Low cycles per instruction, large code sizes | Small code sizes, high cycles per instruction |
| Spends more transistors on memory registers | Transistors used for storing complex instructions |
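The code-size versus CPI trade-off in the table can be expressed with the CPU time equation. In this hypothetical sketch, `c = a + b` with all operands in memory is one memory-to-memory instruction on a CISC-style machine (an invented `ADD_MEM` mnemonic, assumed to take 6 cycles) versus a load/load/add/store sequence on a RISC-style machine (idealized at 1 cycle each). Which design wins depends entirely on these assumed CPIs and cycle times.

```python
# Hypothetical sequences for c = a + b with all operands in memory: (mnemonic, cycles)
CISC_SEQUENCE = [("ADD_MEM", 6)]                                        # one multi-clock instruction
RISC_SEQUENCE = [("LOAD", 1), ("LOAD", 1), ("ADD", 1), ("STORE", 1)]   # single-clock each (idealized)


def summarize(sequence, clock_cycle_time_s: float):
    """Return (instruction count, CPI, CPU time) for an instruction sequence."""
    instruction_count = len(sequence)
    cycles = sum(c for _, c in sequence)
    cpi = cycles / instruction_count
    return instruction_count, cpi, instruction_count * cpi * clock_cycle_time_s


for name, seq in (("CISC", CISC_SEQUENCE), ("RISC", RISC_SEQUENCE)):
    ic, cpi, t = summarize(seq, clock_cycle_time_s=1e-9)  # assumed 1 ns cycle for both
    print(f"{name}: {ic} instruction(s), CPI {cpi:.1f}, {t * 1e9:.1f} ns")
```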