Application-controlled Frequency Scaling
Jons-Tobias Wamhoff Stephan Diestelhorst Christof Fetzer Technische Universität Dresden, Germany Patrick Marlier Pascal Felber Université de Neuchâtel, Switzerland Dave Dice Oracle Labs, USA
bottlenecks/serial peak loads
properties of applications
(MSRs, privileged rdmsr/wrmsr)
savings
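The MSR access mentioned above can be illustrated with a minimal sketch. It assumes a Linux system where the `msr` kernel module is loaded and exposes each core's registers as `/dev/cpu/<n>/msr`, and it assumes the process runs with root privileges. The helper name `read_msr` and its parameters are hypothetical and are not taken from the slides. Which register holds the P-state settings is vendor-specific and is deliberately not filled in here.

```python
import os
import struct


def read_msr(reg, cpu=0, dev_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit model-specific register (hypothetical helper).

    Assumes the Linux msr driver: the register address is used as the
    file offset, and each read returns 8 bytes in little-endian order.
    """
    path = dev_template.format(cpu=cpu)
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, reg)  # 8 bytes at offset == register address
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path} at offset {reg:#x}")
    return struct.unpack("<Q", raw)[0]
```

A call such as `read_msr(0x10, cpu=0)` would read register 0x10 on core 0 when run as root. Writing goes through the same device file with `os.pwrite`, which corresponds to the privileged `wrmsr` instruction.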
P-states (frequency/voltage): Pturbo, Pbase, …, Pslow
C-states: C0, C1-Cn (halted)
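[Figure: example per-core configurations. All cores at Pbase; cores at Pturbo while the others are in ≥C1; cores at Pturbo while the others are at Pslow. Component labels: HT, x86, FPU.]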
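[Figure: timeline of a lock executed at f_Pbase, marking Acquire (entry), Acquire (exit) and Release, with the intervals t_CS and t_wait along the time axis.]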
$$f_{CS} = f_{base} \cdot \frac{t_{CS}}{t_{A+CS+R}}$$

$$E_{NORM} = E_{sample} \cdot \frac{t_{A+CS+R}}{t_{CS}}$$
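The two definitions above can be checked with a small worked example. This is a sketch under assumptions: t_A+CS+R is read as the total time for acquire, critical section and release, following the timeline labels, and the function names and input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def effective_cs_frequency(f_base, t_cs, t_total):
    """f_CS = f_base * t_CS / t_(A+CS+R): the base frequency scaled by the
    fraction of time actually spent inside the critical section."""
    return f_base * t_cs / t_total


def normalized_energy(e_sample, t_cs, t_total):
    """E_NORM = E_sample * t_(A+CS+R) / t_CS: the sampled energy scaled up
    by the locking overhead relative to the useful critical-section time."""
    return e_sample * t_total / t_cs


# Hypothetical inputs: the critical section takes half of the total time.
f_cs = effective_cs_frequency(f_base=3.0, t_cs=1000, t_total=2000)
e_norm = normalized_energy(e_sample=0.25, t_cs=1000, t_total=2000)
print(f_cs, e_norm)
```

With these inputs the effective frequency is half the base frequency and the normalized energy is twice the sampled energy, so any overhead in acquire and release lowers f_CS and raises E_NORM by the same factor.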
[Figure: timeline of a blocking lock between f_Pbase and f_Pturbo, with the intervals t_wait, t_CS, t_Pbase→Chalt, t_Chalt→Pbase, t_Pturbo→Pbase and t_ramp. OS halt: entry, wakeup. CPU: deeper C-state, boosted P-state.]
[Figure: four panels, "Frequency AMD", "Frequency Intel", "Energy AMD" and "Energy Intel", plotting f_CS (GHz) and E_NORM (kWh) against Size_CS (cycles, log scale). Series: spin, futex. Frequency-axis ticks: 0.0, 1.4, 3.1, 4.0 GHz (AMD); 0.0, 0.8, 3.4, 3.9 GHz (Intel).]
Plot annotations: 1.5M; 4M; 1M, t_wait = 7M; 10k; t_wait = 70k
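[Figure: timeline of a lock with manual P-state switching between f_Pslow, f_Pbase and f_Pturbo, with the intervals t_wait, t_CS, t_Pbase→Pslow, t_Pslow→Pturbo, t_Pturbo→Pbase and t_ramp.]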
ioctl: 1k, 1k, 1k
wrmsr: 28k, 2k, 23k
transition: 2k, 225k, 1k
[Figure: two panels, "Frequency AMD" and "Energy AMD", plotting f_CS (GHz) and E_NORM (kWh) against Size_CS (cycles, log scale). Series: spin, dlgt, mgrt. Frequency-axis ticks: 0.0, 1.4, 3.1, 4.0 GHz.]
Plot annotations: 600k; 200k; 400k; futex: 1.5M
https://bitbucket.org/donjonsn/turbo
Linux kernel and hardware interfaces
Hardware abstraction: Topology, PCI-Configuration, MSR-Interface, PerfEvent
Performance configuration: Thread, P-States, PerformanceMonitor
Execution control: ThreadRegistry, ThreadControl
up to 50% speedup with only 2% more energy
9% speedup but 22% higher frequency