Performance, Power CS301 Prof Szajda Performance Metrics (How do - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Performance, Power

CS301 Prof Szajda

slide-2
SLIDE 2

Performance Metrics

(How do we compare two machines?)

slide-3
SLIDE 3

What to Measure?

Which airplane has the best performance?

slide-4
SLIDE 4

Performance

  • One size does not fit all
  • Depends on application domain
      Scientific computing
      Graphics
      Databases
      General-purpose desktop
      Beware of designing to the benchmark!
  • Depends on technology characteristics
      DRAM speed and capacity, chip size, etc.

slide-5
SLIDE 5

Which Metric Do We Use?

  • Response or execution time
      Difference between start and end time
      Individual user cares most about this
  • Throughput
      Total amount of work done in a given time
      Frequently used for servers and clusters
  • How are these affected by
      Replacing the processor with a faster version?
      Adding more processors?

slide-6
SLIDE 6

Execution Time

  • Shorter execution time is better
  • Allows comparison between 2 machines

slide-7
SLIDE 7

Relative Performance

  • “X is n times faster than Y”
  • Example:

      Machine A takes 10 s to run a program
      Machine B takes 15 s to run the same program
      What is the performance ratio?
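A minimal sketch of the ratio asked for above, assuming performance is defined as 1 / execution time (the usual convention; the function name is illustrative):

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    With performance = 1 / execution time,
    n = performance_x / performance_y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Machine A: 10 s, Machine B: 15 s (numbers from the slide)
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")
```

Note that the slower machine's time goes in the numerator, so a ratio above 1 means X is the faster machine.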

slide-8
SLIDE 8

Different Time Values

  • Execution time
      Wall-clock, response, or elapsed time
      Includes everything (processing, I/O, OS overhead, etc.)!
      Determines system performance
  • CPU time
      Time spent executing code for this task only
      Does not include I/O or time-sharing
      Comprises user CPU time and system CPU time

Different programs are affected differently by CPU and system performance

man time

  • 90.7u 12.9s 2:39 65%
      User: 90.7 sec
      System: 12.9 sec
      Elapsed time: 2 min 39 sec
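The trailing 65% in the sample output appears to be the share of elapsed time spent on the CPU, i.e. (user + system) / elapsed. A small sketch under that assumption (the function name is illustrative):

```python
def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent executing this task on the CPU."""
    return (user_s + system_s) / elapsed_s


elapsed_s = 2 * 60 + 39  # 2:39 -> 159 seconds
share = cpu_utilization(user_s=90.7, system_s=12.9, elapsed_s=elapsed_s)
print(f"CPU time: {90.7 + 12.9:.1f} s of {elapsed_s} s elapsed ({share:.0%})")
```

The remaining third of the elapsed time went to I/O, waiting, or other processes, which is why execution time and CPU time are reported separately.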

slide-9
SLIDE 9

Clock Cycles

  • Instead of expressing time in seconds, use clock cycles
  • Clock
      Determines when events take place
      Runs at constant rate (e.g., 1 GHz)
      Easy to convert between clock rate and seconds

Clock rate = 1 / Clock cycle time
500 MHz = 1 / (2 ns)
1 ns = 10^-9 s
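The conversion is a reciprocal in both directions. A short sketch (helper names are illustrative) that checks the 500 MHz / 2 ns pair from the slide:

```python
import math


def cycle_time_from_rate(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz


def rate_from_cycle_time(cycle_time_s: float) -> float:
    """Clock rate in Hz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_s


t = cycle_time_from_rate(500e6)   # 500 MHz
f = rate_from_cycle_time(2e-9)    # 2 ns
print(f"500 MHz -> {t * 1e9:.2f} ns per cycle")
print(f"2 ns    -> {f / 1e6:.0f} MHz")
assert math.isclose(t, 2e-9) and math.isclose(f, 500e6)
```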

slide-10
SLIDE 10

Chapter 1 — Computer Abstractions and Technology —

CPU Clocking

Operation of digital hardware governed by a constant-rate clock

[Figure: clock timing diagram showing clock cycles, the clock period, data transfer and computation, and state update]

Clock period: duration of a clock cycle
    e.g., 250 ps = 0.25 ns = 250×10^-12 s

Clock frequency (rate): cycles per second
    e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz

slide-11
SLIDE 11

An Aside

slide-12
SLIDE 12

CPU Time

Performance improved by
    Reducing number of clock cycles
    Increasing clock rate

Hardware designer must often trade off clock rate against cycle count
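The slide's equation did not survive extraction. The standard relation between CPU time, cycle count, and clock rate, given here as the likely intended formula rather than a verbatim copy, is:

```latex
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}
                = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
```

Both levers listed above show up directly: fewer cycles shrink the numerator, a faster clock grows the denominator.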

slide-13
SLIDE 13

CPU Time Example

Computer A: 2 GHz clock, 10 s CPU time

Designing Computer B
    Aim for 6 s CPU time
    Can do faster clock, but causes 1.2 × clock cycles

How fast must Computer B clock be?
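One way to work the example, assuming CPU time = clock cycles / clock rate (variable names are illustrative):

```python
# Computer A, from the slide
rate_a_hz = 2e9      # 2 GHz
time_a_s = 10.0      # CPU time on A

# Cycles the program needs on A: time x rate
cycles_a = time_a_s * rate_a_hz

# Computer B runs 1.2x as many cycles and must finish in 6 s
cycles_b = 1.2 * cycles_a
target_time_b_s = 6.0

# Required clock rate: cycles / time
rate_b_hz = cycles_b / target_time_b_s
print(f"Computer B needs a clock of about {rate_b_hz / 1e9:.1f} GHz")
```

Under these assumptions B needs roughly twice A's clock rate: the 10 s to 6 s target alone would need a 1.67x faster clock, and the extra 20% of cycles pushes it to 2x.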

slide-14
SLIDE 14

Instruction Count and CPI

Instruction Count for a program
    Determined by program, ISA and compiler

Average cycles per instruction
    Determined by CPU hardware
    If different instructions have different CPI
        Average CPI affected by instruction mix

slide-15
SLIDE 15

CPI Example

Computer A: Cycle Time = 250 ps, CPI = 2.0
Computer B: Cycle Time = 500 ps, CPI = 1.2
Same ISA
Which is faster, and by how much?

A is faster, by a factor of 1.2
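A sketch of the comparison, assuming both machines run the same program with the same instruction count (reasonable since the ISA is the same), so only CPI × cycle time matters:

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x clock cycle time."""
    return cpi * cycle_time_ps


a_ps = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)  # Computer A
b_ps = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)  # Computer B

# Same instruction count on both machines, so it cancels in the ratio.
ratio = b_ps / a_ps
print(f"A: {a_ps:.0f} ps/instr, B: {b_ps:.0f} ps/instr, A is {ratio:.1f}x faster")
```

B has the lower CPI, but its cycle is twice as long, so A still wins.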

slide-16
SLIDE 16

Application Characteristics

  • Determine the mix of different instruction types
      Integer arithmetic
      Logical operations
      Floating point arithmetic
      Loads and stores
  • Different applications have different CPI because of different instruction mixes

slide-17
SLIDE 17

CPI in More Detail

If different instruction classes take different numbers of cycles

Weighted average CPI

Relative frequency
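The equations these two labels annotated were lost in extraction. The standard forms, offered as the likely intent, are below; CPI_i and IC_i are the cycles per instruction and instruction count of class i, and IC is the total instruction count:

```latex
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{IC}_i \right)
\qquad
\text{CPI} = \frac{\text{Clock Cycles}}{\text{IC}}
           = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \right)
```

The ratio IC_i / IC is the relative frequency of class i, which makes the overall CPI a weighted average of the per-class CPIs.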

slide-18
SLIDE 18

CPI Example

Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

Sequence 1: IC = 5
    Clock Cycles = 2×1 + 1×2 + 2×3 = 10
    Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
    Clock Cycles = 4×1 + 1×2 + 1×3 = 9
    Avg. CPI = 9/6 = 1.5
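The same arithmetic as a short sketch; the dictionary layout and function name are illustrative, the numbers are from the slide:

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def summarize(counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for one code sequence."""
    ic = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return ic, cycles, cycles / ic


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    ic, cycles, cpi = summarize(seq)
    print(f"{name}: IC = {ic}, cycles = {cycles}, avg CPI = {cpi:.1f}")
```

Sequence 2 executes more instructions yet needs fewer cycles, so instruction count alone is a poor predictor of run time.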
slide-19
SLIDE 19

Performance Summary

Performance depends on
    Algorithm: affects IC, possibly CPI
    Programming language: affects IC, CPI
    Compiler: affects IC, CPI
    Instruction set architecture: affects IC, CPI, Tc

The BIG Picture
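The equation under this heading is missing from the extracted text. The usual three-factor form, given here as the probable original, is:

```latex
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}}
           \times \frac{\text{Clock cycles}}{\text{Instruction}}
           \times \frac{\text{Seconds}}{\text{Clock cycle}}
```

The three factors are IC, CPI, and Tc (clock cycle time), the quantities each item in the list above is said to affect.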

slide-20
SLIDE 20

Amdahl’s Law

  • How much speedup do you get from an enhancement?
  • Based on
      Fraction of time enhancement used
      Improvement in enhanced mode

$$\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$$

$$\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$$

slide-21
SLIDE 21

Pitfall: Amdahl’s Law

Improving an aspect of a computer and expecting a proportional improvement in overall performance

§1.10 Fallacies and Pitfalls

Example: multiply accounts for 80s/100s
    How much improvement in multiply performance to get 5× overall?
    Can’t be done!

Corollary: make the common case fast
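A sketch of why the 5× target is out of reach, assuming improved time = affected time / n + unaffected time (the function name and the 4× comparison case are illustrative additions):

```python
def required_speedup(total_s: float, affected_s: float, target_overall: float):
    """Speedup needed on the affected part to hit an overall speedup target.

    Returns None when the target cannot be met however fast the affected
    part becomes.
    """
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_overall
    budget_s = target_time_s - unaffected_s  # time left for the improved part
    if budget_s <= 0:
        return None
    return affected_s / budget_s


print(required_speedup(100, 80, 5))  # None: the other 20 s already use the whole 20 s budget
print(required_speedup(100, 80, 4))  # 16.0: multiply must get 16x faster for 4x overall
```

With 20 s untouched by the enhancement, 5× overall would require multiply to take zero time; even 4× overall demands a 16× faster multiplier.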

slide-22
SLIDE 22

Review Question

  • Your machine has a clock rate of 2.4 GHz. How long is the clock cycle?

slide-23
SLIDE 23

Review Questions

  • Suppose you are given the following:
      Machine A
          1 GHz
          Average CPI = 1.6
          Instructions = 1.7 billion
      Machine B
          3.3 GHz
          Average CPI = 6.1
          Instructions = 2 billion
  • Which machine is faster? By how much?
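One way to check an answer, assuming CPU time = instructions × CPI / clock rate (the function name is illustrative):

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x average CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


time_a = cpu_time_seconds(instruction_count=1.7e9, cpi=1.6, clock_rate_hz=1.0e9)
time_b = cpu_time_seconds(instruction_count=2.0e9, cpi=6.1, clock_rate_hz=3.3e9)

faster, slower = ("A", "B") if time_a < time_b else ("B", "A")
ratio = max(time_a, time_b) / min(time_a, time_b)
print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
print(f"Machine {faster} is about {ratio:.2f}x faster than machine {slower}")
```

Machine B's much higher clock rate does not make up for its higher CPI and instruction count.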

slide-24
SLIDE 24

Review Questions

  • What is the average CPI for a machine with the following CPIs on an application with the following instruction frequency?

Type          Frequency   CPI
Arithmetic    0.45        1
Memory        0.3         8
Control       0.2         3
Mult/Div      0.05        5
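A sketch of the weighted average, assuming the frequencies are fractions of all executed instructions (the data layout and function name are illustrative):

```python
# (relative frequency, CPI) per instruction type, from the table
MIX = {
    "Arithmetic": (0.45, 1),
    "Memory": (0.30, 8),
    "Control": (0.20, 3),
    "Mult/Div": (0.05, 5),
}


def average_cpi(mix: dict) -> float:
    """Weighted average CPI: sum of frequency x CPI over all instruction types."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("frequencies should sum to 1")
    return sum(freq * cpi for freq, cpi in mix.values())


print(f"Average CPI = {average_cpi(MIX):.2f}")
```

Memory instructions are under a third of the mix but contribute 2.4 of the total, so they dominate the average.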

slide-25
SLIDE 25

Review Questions

  • What factors must be included when comparing the relative performance of two machines?

slide-26
SLIDE 26

Amdahl’s Law

  • Suppose you have an enhancement that makes a functional unit 10x faster.
  • Speedup if used 5% of the time?
  • Speedup if used 40% of the time?

$$\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$$
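A minimal sketch of Amdahl's Law applied to the two cases, taking overall speedup as Exec_old / Exec_new (the function name is illustrative):

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup when a fraction of execution time is sped up by speedup_enh."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must be between 0 and 1")
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


for fraction in (0.05, 0.40):
    overall = amdahl_speedup(fraction, speedup_enh=10.0)
    print(f"used {fraction:.0%} of the time -> overall speedup {overall:.2f}x")
```

A 10x faster unit yields only about 1.05x overall at 5% usage and about 1.56x at 40% usage, because the unenhanced portion dominates.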

slide-27
SLIDE 27

Review Questions

  • What is the equation for execution time?

  • What does Amdahl’s Law say?
slide-28
SLIDE 28

Benchmarks

  • Programs specifically used to measure performance
  • Hope is that it is representative of how computer will be used
  • Examples
      SPEC Integer and Floating Point
      MediaBench
      MineBench
      TPC

slide-29
SLIDE 29

SPEC CPU Benchmark

Programs used to measure performance
    Supposedly typical of actual workload

Standard Performance Evaluation Corp (SPEC)
    Develops benchmarks for CPU, I/O, Web, …

SPEC CPU2006
    Elapsed time to execute a selection of programs
        Negligible I/O, so focuses on CPU performance
    Normalize relative to reference machine
    Summarize as geometric mean of performance ratios
        CINT2006 (integer) and CFP2006 (floating-point)
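A sketch of the normalize-then-summarize step. The timings below are hypothetical placeholders, not SPEC data, and the helper names are illustrative:

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Performance ratio relative to the reference machine (higher is better)."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds
timings = [(1000.0, 500.0), (1200.0, 150.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]
print(f"ratios = {ratios}, geometric mean = {geometric_mean(ratios):.2f}")
```

The geometric mean is used because the ranking of two machines then does not depend on which machine is chosen as the reference.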

slide-30
SLIDE 30

CINT2006 for Intel Core i7 920

slide-31
SLIDE 31

Recent Concern: Power

In CMOS IC technology

§1.7 The Power Wall

[Equation annotations from the slide: ×1000, ×40, 5V → 1V]
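The equation itself is missing from the extracted slide. The standard dynamic-power relation for CMOS is power ∝ capacitive load × voltage² × frequency. A sketch under that relation, with the added assumption (not stated on the slide) that capacitive load stays constant:

```python
def relative_dynamic_power(cap_scale: float, voltage_scale: float, freq_scale: float) -> float:
    """Scaling of dynamic power when load, voltage and frequency each scale.

    Based on: power proportional to capacitive load x voltage^2 x frequency.
    """
    return cap_scale * voltage_scale ** 2 * freq_scale


# Assumption: capacitive load unchanged; voltage 5 V -> 1 V; frequency x1000
scale = relative_dynamic_power(cap_scale=1.0, voltage_scale=1.0 / 5.0, freq_scale=1000.0)
print(f"Power scales by about x{scale:.0f}")
```

Under that assumption the result is consistent with the ×40 annotation: dropping the supply from 5 V to 1 V cuts power by 25×, which is what allowed a 1000× frequency increase at only about 40× the power. Since voltage cannot fall much further, this route is largely exhausted.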

slide-32
SLIDE 32

Tricks to Increase Power

  • Attach large cooling devices
  • Turn off parts of chips not used in a given clock cycle

Can increase power to 300 watts...
...but these and other ways are all prohibitively expensive for desktop computers. So...

slide-33
SLIDE 33

More Recent Approaches:
 Chip Multiprocessors

  • Reasons for change
      Limited opportunities to improve single thread performance
      Power
      On-chip communication latencies

slide-34
SLIDE 34

Uniprocessor Performance

§1.8 The Sea Change: The Switch to Multiprocessors

Constrained by power, instruction-level parallelism, memory latency

slide-35
SLIDE 35

Multiprocessors

Multicore microprocessors
    More than one processor per chip

Requires explicitly parallel programming
    Compare with instruction level parallelism
        Hardware executes multiple instructions at once
        Hidden from the programmer
    Hard to do
        Programming for performance
        Load balancing
        Optimizing communication and synchronization

slide-36
SLIDE 36

Concluding Remarks

Cost/performance is improving

Due to underlying technology development

Hierarchical layers of abstraction

In both hardware and software

Instruction set architecture

The hardware/software interface

Execution time: the best performance measure

Power is a limiting factor

Use parallelism to improve performance

§1.9 Concluding Remarks