A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC

PATMOS 2010: 7-10 September 2010, Grenoble, France. A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC. T. Sassolas, N. Ventroux, G. Blanc. CEA LIST, Embedded Computing Laboratory. contact:


SLIDE 1

A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC

PATMOS 2010: 7-10 September 2010, Grenoble, France

  • T. Sassolas, N. Ventroux, G. Blanc

CEA LIST, Embedded Computing Laboratory

Contact: tanguy.sassolas@cea.fr

SLIDE 2

Table of contents

  • Context
  • Previous works
  • Proposed solution
  • Implementation
  • Results
  • Conclusion

SLIDE 4

Context of the Study

  • Embedded systems must support

      - Various application domains
      - More computation-intensive applications
      - Applications that become more and more dynamic

  • Move to multiprocessor Architectures

[Figure: computing requirements of embedded applications on a scale from 0.1 GOPS to 1 TOPS. Multimedia: HD Audio, MPEG2, H264, Digital TV, Mobile multimedia, OpenGL 1.1, OpenGL 2.0, 3D Graphics. Telecom: GSM, GPRS, EDGE, UMTS, WIMAX, 3GPP-LTE, DVB-S2, SDR.]

SLIDE 5

Multiprocessor issues

  • Need to maximize resource usage
      - Increase task parallelism
      - Streaming applications: set of tasks with data dependencies
      - Scheduling of dependent tasks
      - Execution speed determined by the slowest task
  • Need to reduce power consumption
      - Worst-case execution: some energy savings can be performed
      - Real-case execution: dynamism implies a loss of energy
      - Need for dynamic control

[Figure: pipeline in -> T1 -> T2 -> T3 -> out and its schedule over time on processors P0, P1 and P2 for data items D1 to D5, with slack periods between task executions.]

SLIDE 6

DVFS vs DPM

  • Dynamic Voltage and Frequency Scaling (DVFS)
      - Low mode-switching penalty
      - Mainly reduces dynamic power consumption
  • Dynamic Power Management (DPM)
      - High energy and time switching penalty
      - Reduces both static and dynamic power consumption
  • Optimal operating points are highly dependent on the technological process

[Figure: two plots of power versus time for tasks T1 and T2.]
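
The high switching penalty of DPM means that shutting a resource down only pays off when the idle interval is long enough. A minimal sketch of the usual break-even computation follows; the function names and all numeric parameters are hypothetical and are not taken from the slides.

```python
def dpm_break_even_ms(p_on_w, p_sleep_w, e_switch_mj, t_switch_ms):
    """Shortest idle interval (ms) for which shutting down saves energy.

    p_on_w, p_sleep_w: power (W) when the PE stays on or is shut down.
    e_switch_mj, t_switch_ms: energy (mJ) and time (ms) of a full
    shut-down plus wake-up cycle.
    """
    # Staying on costs p_on * t. Shutting down costs
    # e_switch + p_sleep * (t - t_switch). Equate both and solve for t.
    t_energy = (e_switch_mj - p_sleep_w * t_switch_ms) / (p_on_w - p_sleep_w)
    # The idle interval must also be long enough to complete the switch.
    return max(t_switch_ms, t_energy)


def should_shut_down(idle_ms, **params):
    """True when the expected idle interval exceeds the break-even time."""
    return idle_ms > dpm_break_even_ms(**params)


# Hypothetical parameters, chosen only to illustrate the formula.
params = dict(p_on_w=2.0, p_sleep_w=0.5, e_switch_mj=3.5, t_switch_ms=1.0)
break_even = dpm_break_even_ms(**params)
print(break_even)                        # 2.0
print(should_shut_down(5.0, **params))   # True
print(should_shut_down(1.5, **params))   # False
```

DVFS needs no such test in this simplified view, since its mode-switching penalty is low.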

SLIDE 8

Streaming scheduling: offline solutions

  • Scheduling on a multiprocessor is an NP-complete problem [1]
  • Adding power optimization adds complexity
  • Monoprocessor solutions [2] [3] …
      - Find the minimum power consumption given the data production rate and communication buffer sizes
      - With DPM or DVFS functionalities
      - Variable production rate following a probability distribution
  • Multiprocessor solutions
      - Minimize energy consumption by finding the optimum number of resources and their speed to meet QoS requirements [4]
      - Various models: communication costs, consumption model, optimization techniques… [5]
  • But a regular workload was assumed: application dynamism imposes online solutions

SLIDE 9

Streaming scheduling: Online solutions

  • Monoprocessor
      - Slack time reclamation: GSR [6]
      - Offline and online partitioning
  • Multiprocessor
      - Many solutions for independent tasks -> do not apply
      - Partitioning -> apply a monoprocessor solution to every processor [7] [8]

[Figure: task graph from in to out with tasks T0 to T3, and Gantt charts of task executions on data D2 over time on processors P0 and P1. Annotations: "Buffer added"; "Resulting execution: no slack time!"]

SLIDE 11

Power-aware streaming application scheduling

  • Properties

      - Throughput constrained by the slowest task
          » Other tasks can be slowed down to reach the same throughput -> DVFS
      - Tasks deeper in the pipeline can be blocked waiting for available data
          » Preemption mechanisms are required for a higher resource usage rate
          » Unused resources can be shut down -> DPM
  • Objective: keep the throughput while making substantial energy savings
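
The first property can be illustrated with a small sketch: every task except the slowest one may run at a reduced speed as long as it still finishes within the pipeline period. The execution times and the set of available speed factors below are hypothetical, and the sketch assumes one task per PE and that full speed (1.0) is always available.

```python
def pick_speeds(exec_times, speeds=(1.0, 0.5)):
    """Choose, per task, the lowest speed factor that keeps the throughput.

    exec_times: full-speed execution time per data item (hypothetical, ms).
    speeds: available speed factors; must contain 1.0 (assumption).
    """
    period = max(exec_times.values())  # the slowest task sets the period
    chosen = {}
    for task, t in exec_times.items():
        # Running at factor s stretches the execution time to t / s.
        feasible = [s for s in speeds if t / s <= period]
        chosen[task] = min(feasible)
    return chosen


# Hypothetical three-stage pipeline.
chosen = pick_speeds({"T1": 4, "T2": 10, "T3": 6})
print(chosen)
```

Here only T1 can be slowed down: at half speed it takes 8 ms, still below the 10 ms period set by T2, whereas T3 would take 12 ms.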

SLIDE 12

Static Priorities

  • If there are fewer PEs than tasks, a static priority must be specified
      - Describes the position in the pipeline
      - Allows the oldest data to be executed first
      - Prevents buffering data instead of executing critical tasks

[Figure: task graph from in to out with tasks T0 to T4, annotated with static priorities Prio = 0 to Prio = 3.]
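
As a rough sketch of how such static priorities could drive task selection when PEs are scarce: the code below assumes that the priority value grows with pipeline depth and that ready tasks with the largest value are served first, so that the oldest data leaves the pipeline first. Both the convention and the task-to-priority mapping are assumptions, not taken from the slides.

```python
def select_tasks(ready_tasks, static_prio, num_pe):
    """Pick at most num_pe ready tasks, deepest pipeline stage first."""
    ordered = sorted(ready_tasks, key=lambda t: static_prio[t], reverse=True)
    return ordered[:num_pe]


# Hypothetical mapping of five tasks onto four priority levels.
static_prio = {"T0": 0, "T1": 1, "T2": 1, "T3": 2, "T4": 3}

# Four tasks are ready but only two PEs are available.
selected = select_tasks(["T0", "T1", "T3", "T4"], static_prio, num_pe=2)
print(selected)  # ['T4', 'T3']
```

With this convention the producer T0 is not scheduled merely to fill buffers while downstream tasks still have data to process.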

SLIDE 13

Buffer monitors

Buffer full threshold: preempt writer
Buffer empty threshold: preempt reader(s)
Buffer filling threshold: reduce DVFS couple of writer
Buffer emptying threshold: increase DVFS couple of writer

[Figure: buffer with its monitoring thresholds; two annotations read "Change QoS".]

  • Priority impact

[Figure labels: "Task is blocked", "Task executes at fastest speed", "Application priority", "Task priority".]
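
The four buffer-monitor thresholds listed on this slide map naturally onto a small decision function. The sketch below is only an illustration: the threshold values, the function name and the action names are hypothetical, and the fill level is counted in buffered data items.

```python
def buffer_action(fill, full=8, filling=6, emptying=2, empty=0):
    """Map a buffer fill level to the scheduler demand it triggers.

    All threshold values are hypothetical placeholders.
    """
    if fill >= full:
        return "preempt_writer"        # no room left for the writer
    if fill <= empty:
        return "preempt_readers"       # nothing left to read
    if fill >= filling:
        return "reduce_writer_dvfs"    # writer runs ahead: slow it down
    if fill <= emptying:
        return "increase_writer_dvfs"  # readers catch up: speed writer up
    return "no_change"


for level in (8, 6, 4, 2, 0):
    print(level, buffer_action(level))
```

The extreme thresholds are tested first so that a full or empty buffer always results in preemption rather than a DVFS change.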

SLIDE 15

Consumption model

  • SESAM [9] simulation environment
      - SystemC AT-TLM IP: NoC, caches, memories…
      - Processors: ArchC ISS [10]
      - Statistics
  • Modified ArchC models
      - MIPS32 ISS annotated with PXA270 [11] PSM mode power consumption
      - Execution speed variation
      - Mode switching penalties
          » Energy
          » Time

Mode        Setting     Consumption
Turbo       A=1, B=1    923 mW
Half-Turbo  A=1, B=2    390 mW
Deep Idle   A=0, B=1    64 mW

[Figure: transitions between the three modes, labeled with switching penalties of 1 µs, 2 µs and 3 µs.]
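
To see what these mode consumptions imply, the sketch below compares two ways of filling one pipeline period: run at Turbo and then drop to Deep Idle, or stretch the task at Half-Turbo. The power figures come from the slide; the assumption that Half-Turbo runs at half the Turbo speed, the workload values, and the omission of switching penalties are simplifications made here.

```python
POWER_MW = {"turbo": 923, "half_turbo": 390, "deep_idle": 64}  # from the slide
SPEED = {"turbo": 1.0, "half_turbo": 0.5}  # assumption: B=2 halves the speed


def period_energy_uj(work_ms, period_ms, mode):
    """Energy (µJ) to run the task in `mode`, then deep-idle until the period ends.

    work_ms: execution time at Turbo speed. Switching penalties are ignored.
    """
    busy_ms = work_ms / SPEED[mode]
    if busy_ms > period_ms:
        raise ValueError("task does not fit in the period at this speed")
    idle_ms = period_ms - busy_ms
    # mW * ms = µJ
    return POWER_MW[mode] * busy_ms + POWER_MW["deep_idle"] * idle_ms


# Hypothetical workload: 5 ms of Turbo-speed work in a 10 ms period.
e_turbo = period_energy_uj(5, 10, "turbo")
e_half = period_energy_uj(5, 10, "half_turbo")
print(e_turbo, e_half)  # 4935.0 3900.0
```

Under these assumptions, slowing the task down to fill the period costs less than racing at Turbo and idling, which is the effect the DVFS part of the scheduler relies on.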

SLIDE 16

Implemented platform

[Figure: implemented platform. Labels: central memory, CPU controller, scheduling algorithm, processing elements, shared memory banks, Task 1, Task 2, pipeline in -> T1 -> T2 with data D1, "Threshold reached".]

SLIDE 17

The scheduling loop

  • Inputs: buffer statuses, task statuses
  • Update dynamic task priorities
  • Order tasks by priority
  • Keep already allocated tasks on the same PE
  • Allocate remaining tasks to the remaining PEs
  • Update PE consumption modes according to buffer status
  • Outputs: execution / preemption demands, mode switching demands
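
The allocation part of the loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that a larger priority value is served first, that PEs are numbered from 0, and it leaves out the priority update and the consumption-mode update.

```python
def schedule_step(ready, priority, allocation, num_pe):
    """One iteration of the loop: return the new {pe: task} allocation.

    ready: runnable task names.
    priority: {task: value}, larger value served first (assumption).
    allocation: current {pe: task} mapping, with pe in range(num_pe).
    """
    # Order tasks by priority and keep as many as there are PEs.
    selected = sorted(ready, key=lambda t: priority[t], reverse=True)[:num_pe]
    # Keep already allocated tasks on the same PE.
    new_alloc = {pe: t for pe, t in allocation.items() if t in selected}
    # Allocate the remaining tasks on the remaining PEs.
    placed = set(new_alloc.values())
    waiting = [t for t in selected if t not in placed]
    free_pes = [pe for pe in range(num_pe) if pe not in new_alloc]
    for task, pe in zip(waiting, free_pes):
        new_alloc[pe] = task
    return new_alloc


# Hypothetical state: T3 becomes ready while T1 and T2 occupy both PEs.
priority = {"T1": 0, "T2": 1, "T3": 2}
new_alloc = schedule_step(["T1", "T2", "T3"], priority, {0: "T1", 1: "T2"}, 2)
print(new_alloc)
```

In this example T2 stays on PE 1, which avoids a needless migration, while the lower-priority T1 is preempted so that T3 can take PE 0.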

SLIDE 19

The WCDMA test case

  • Wideband code division multiple access application [12]

      - 13 tasks
      - Variable workload: pilot frame once every 10 frames
      - Irregular pipeline task lengths

SLIDE 20

Results – Energy saving

  • 3 scheduling solutions: Standard, DPM only and DVFS + DPM
  • Substantial energy saving

[Figure: energy saving (%) and PE effective occupancy (%) versus number of PEs (1, 2, 4, 8, 13, 16). Series: standard, DPM only, DPM + DVFS, energy saving DPM only, energy saving DPM + DVFS.]

SLIDE 21

Results – Execution time

  • No deviation in execution time

SLIDE 22

Results – pipeline balancing

  • Blocked states are reduced by the use of DVFS
  • More could be achieved with other DVFS couples

SLIDE 24

Conclusion

  • Power reduction for variable pipelines
      - Substantial power savings when PE load drops: 45% on 13 processors
      - No performance loss
      - Light execution to reduce control overhead
  • This work was partly funded by the SCALOPES project (ARTEMIS)
  • Upcoming work
      - Implementation on a hardware multiprocessor platform
      - Evaluation with other applications from various domains
      - Evaluation of optimal buffer sizes

SLIDE 25

References

  • [1] M. L. Dertouzos and A. K. Mok. Multiprocessor Online Scheduling of Hard-Real-Time Tasks. IEEE Transactions on Software Engineering, 15(12):1497-1506, 1989.
  • [2] Y.-H. Lu, L. Benini, and G. De Micheli. Dynamic Frequency Scaling with Buffer Insertion for Mixed Workloads. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(5):1284-1305, 2002.
  • [3] N. Pettis, L. Cai, and Y.-H. Lu. Statistically Optimal Dynamic Power Management for Streaming Data. IEEE Transactions on Computers, 55(7):800-814, 2006.
  • [4] R. Xu, R. Melhem, and D. Mosse. Energy-Aware Scheduling for Streaming Applications on Chip Multiprocessors. In Proceedings of the 28th IEEE International Real-Time Systems Symposium (RTSS), pages 25-38, 2007.
  • [5] L. Benini, D. Bertozzi, A. Guerri, and M. Milano. Allocation, Scheduling and Voltage Scaling on Energy Aware MPSoCs. In Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR), pages 44-58, 2006.
  • [6] D. Mosse, H. Aydin, B. Childers, and R. Melhem. Compiler-Assisted Dynamic Power-Aware Scheduling for Real-Time Applications. In Workshop on Compilers and Operating Systems for Low Power, 2000.

SLIDE 26

References

  • [7] P. Choudhury, P. P. Chakrabarti, and R. Kumar. Online Dynamic Voltage Scaling using Task Graph Mapping Analysis for Multiprocessors. In International Conference on VLSI Design (VLSID), pages 89-94, 2007.
  • [8] S. Hua, G. Qu, and S. S. Bhattacharyya. Energy-Efficient Embedded Software Implementation on Multiprocessor System-on-Chip with Multiple Voltages. ACM Transactions on Embedded Computing Systems (TECS), 5(2):321-341, 2006.
  • [9] N. Ventroux, A. Guerre, T. Sassolas, L. Moutaoukil, C. Bechara, and R. David. SESAM: an MPSoC Simulation Environment for Dynamic Application Processing. In IEEE International Conference on Embedded Software and Systems (ICESS), 2010.
  • [10] R. Azevedo, S. Rigo, M. Bartholomeu, G. Araujo, C. Araujo, and E. Barros. The ArchC Architecture Description Language and Tools. International Journal of Parallel Programming, 33(5):453-484, 2005.
  • [11] Intel PXA27x Processor Family, Electrical, Mechanical, and Thermal Specification, 2005.
  • [12] A. Richardson. WCDMA Design Handbook. 2006.

SLIDE 27

Thank you for your attention

We value your opinion and questions