Dynamic Compilation for Reducing Energy Consumption of I/O-Intensive Applications

Seung Woo Son, Guangyu Chen, Mahmut Kandemir, Alok Choudhary


slide-1
SLIDE 1

Dynamic Compilation for Reducing Energy Consumption of I/O-Intensive Applications

Seung Woo Son, Guangyu Chen, Mahmut Kandemir
Dept. of CSE, Pennsylvania State University
{sson,gchen,kandemir}@cse.psu.edu

Alok Choudhary
Dept. of ECE, Northwestern University
choudhar@ece.northwestern.edu

The 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC 05), October 20~22, 2005

slide-2
SLIDE 2

10/22/2005 LCPC 2005

Outline

Motivation
Dynamic Compilation
Our Dynamic Compilation Framework

  – Dynamic compiler/linker
  – Metadata manager
  – Layout manager
  – High-level I/O library

Experimental Results
Conclusion

slide-3
SLIDE 3

Motivation

Tera-scale high-performance computing has enabled scientists to tackle very large and computationally challenging problems

  – Data-intensive, I/O-intensive, and energy consuming

To cope with larger problems and data sizes, models and applications need to be dynamic in nature

slide-4
SLIDE 4

I/O is bottleneck

*Source: Terascale Data Management, LLNL.

slide-5
SLIDE 5

Energy Consumption?

I/O        26%
Memory      7%
Processor  17%
Others     10%
Cooling    40%

*Source: Mike Rosenfield, ACEED, February 2003.

slide-6
SLIDE 6

Related Work

Academic/industry-based dynamic compilers

  – Dynamo, DAISY, PIN, DyC, …

All efforts focused on enhancing performance, i.e., their goal is to reduce execution cycles
Recently, a dynamic voltage/frequency scaling technique was proposed using dynamic compilation

  -> focused on reducing the processor's energy consumption [MICRO-38]

slide-7
SLIDE 7

Our Goal

To capture high-level dynamic behaviors in I/O-intensive applications using dynamic compilers
Propose a dynamic compilation framework for I/O-intensive applications

  – Dynamic compiler/linker, metadata manager, high-level I/O library, and layout manager

slide-8
SLIDE 8

Why Dynamic Compilation?

Dynamic compilation exploits run-time state to generate code that is specific to run-time behavior
Large-scale scientific applications exhibit changes in data access patterns

  – Simulation runs, post-processing, and analysis
  – Large quantities of data are generated and frequent data layout changes occur
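
As a rough illustration of the run-time behavior this slide refers to, the sketch below counts how often the observed access pattern changes in a trace. The trace format, the pattern labels, and the function name `count_phase_changes` are assumptions made for this example; the slides do not say how phase changes are detected.

```python
from typing import Sequence


def count_phase_changes(patterns: Sequence[str]) -> int:
    """Count transitions between consecutive, differing access-pattern labels."""
    changes = 0
    for prev, cur in zip(patterns, patterns[1:]):
        if cur != prev:
            changes += 1
    return changes


# Hypothetical trace: one access-pattern label per observation window.
trace = ["row-wise", "row-wise", "column-wise", "column-wise", "row-wise"]
print(count_phase_changes(trace))  # prints 2
```

Each counted transition is a point where code specialized for the previous pattern may no longer fit the current one.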

slide-9
SLIDE 9

Application Codes

Application Name   Description                     Data Size   Number of Phase Changes   Energy consumed (J)
AST                Astrophysics                    153.3GB     38                        57,322
FFT                Fast Fourier Transform          96.6GB      19                        39,451
Cholesky           Sparse Cholesky Factorization   87.4GB      27                        36,076
Visuo              3D Visualization                95.5GB      31                        42,905
SCF 3.0            Quantum Chemistry               106.1GB     11                        49,518
RSense 2.0         Remote Sensing Database         104.0GB     46                        51,114

slide-10
SLIDE 10

Framework overview

[Figure: Application, Dynamic Compiler/Linker, HLL, Mini Database (Metadata Manager), Layout Manager, and Parallel, Hierarchical Storage System]

slide-11
SLIDE 11

Dynamic Compiler/linker

[Figure: Steering Unit, Performance Tracer, Dynamic Compiler, and Dynamic Linker, with labels Data Access Pattern, Performance Statistics, Compilation Request, Linking Request, and Suggestions to Layout Manager]

slide-12
SLIDE 12

Optimization Rules

Opt rule   Optimization
MCIO       Multi-collective I/O
SP         Sequential Prefetching
STD        Strided Prefetching
POL        Replacement Policy Selection
SSU        Setting Striping Unit
DM         Data Migration
DP         Data Purging
CIO        Collective I/O
SUB        Subfiling
PRE        Prestaging

slide-13
SLIDE 13

Optimization Rules

Collective I/O (CIO)

  – Invoked if the access pattern of the data differs from its storage pattern and multiple processors are used to access the data

Subfiling (SUB)

  – Invoked if a small subregion of a file is accessed with high temporal locality
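
The two rules above can be sketched as simple predicates over per-file statistics. This is a minimal sketch, not the framework's implementation: the `FileAccessStats` fields, the function `applicable_rules`, and both thresholds are hypothetical, since the slides give no numeric definition of "small" or "high temporal locality".

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the slides give no numeric values.
SMALL_REGION_FRACTION = 0.10
HIGH_REUSE_COUNT = 8


@dataclass
class FileAccessStats:
    access_order: str           # "row" or "column": order in which the data is accessed
    storage_order: str          # "row" or "column": order in which the data is stored
    num_processors: int         # processors accessing the data
    hot_region_fraction: float  # share of the file covered by the most-accessed subregion
    hot_region_reuses: int      # repeat accesses to that subregion in the observed window


def applicable_rules(stats: FileAccessStats) -> List[str]:
    """Return the optimization rules whose conditions hold for these statistics."""
    rules = []
    # CIO: access pattern differs from storage pattern, and several processors are involved.
    if stats.access_order != stats.storage_order and stats.num_processors > 1:
        rules.append("CIO")
    # SUB: a small subregion is accessed with high temporal locality.
    if (stats.hot_region_fraction <= SMALL_REGION_FRACTION
            and stats.hot_region_reuses >= HIGH_REUSE_COUNT):
        rules.append("SUB")
    return rules


print(applicable_rules(FileAccessStats("column", "row", 16, 0.05, 20)))  # prints ['CIO', 'SUB']
```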

slide-14
SLIDE 14

Example: CIO

[Figure: two panels. Parallel Independent I/O: column-wise access pattern and column-major storage layout. Collective I/O: column-wise access pattern and row-major storage layout.]
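
To make the mismatch between access pattern and storage layout concrete, the sketch below counts how many contiguous file extents a processor touches when it reads a block of columns from an n x n array. The helper names `file_offsets` and `count_runs`, the array size, and the column assignment are assumptions chosen for illustration; offsets are in elements, not bytes.

```python
from typing import Iterable, List


def file_offsets(n: int, cols: Iterable[int], layout: str) -> List[int]:
    """Sorted element offsets touched when reading whole columns of an n x n array."""
    cols = list(cols)
    if layout == "row-major":
        return sorted(r * n + c for r in range(n) for c in cols)
    if layout == "column-major":
        return sorted(c * n + r for c in cols for r in range(n))
    raise ValueError(f"unknown layout: {layout}")


def count_runs(offsets: List[int]) -> int:
    """Number of maximal contiguous runs in a sorted list of offsets."""
    runs = 0
    prev = None
    for off in offsets:
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


n = 8
my_cols = range(0, 2)  # hypothetical: this processor owns columns 0 and 1

print(count_runs(file_offsets(n, my_cols, "column-major")))  # prints 1
print(count_runs(file_offsets(n, my_cols, "row-major")))     # prints 8
print(count_runs(file_offsets(n, range(n), "row-major")))    # prints 1
```

With a row-major layout each processor's column block splits into one small extent per row, while the union of all processors' requests is a single contiguous extent. That is the situation in which reading the data collectively and redistributing it among processors can replace many small requests with a few large ones.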

slide-15
SLIDE 15

Experiment – application codes

Application Name   Description                     Data Size   Number of Phase Changes   Energy consumed (J)
AST                Astrophysics                    153.3GB     38                        57,322
FFT                Fast Fourier Transform          96.6GB      19                        39,451
Cholesky           Sparse Cholesky Factorization   87.4GB      27                        36,076
Visuo              3D Visualization                95.5GB      31                        42,905
SCF 3.0            Quantum Chemistry               106.1GB     11                        49,518
RSense 2.0         Remote Sensing Database         104.0GB     46                        51,114

slide-16
SLIDE 16

Simulation parameters

Parallel processors: total 16

  – 1.8 GHz with 2MB 4-way set-associative cache, 1GB main memory
  – Energy consumption measured using Wattch [ISCA'00]

Parallel disks

  – 8*18GB disks with low-power mode (spin-down)
  – TPM disk power model [ISCA'03]

Interconnect

  – 2D mesh
  – InfiniBand switch/link power model [ISLPED'03]
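
The effect of a spin-down low-power mode on disk energy can be sketched with a simple fixed-threshold model. This is an illustration under stated assumptions, not the cited disk power model: the function `idle_gap_energy`, the threshold, and all power and energy values are placeholders, and spin-up latency is ignored.

```python
def idle_gap_energy(gap_s: float, threshold_s: float, idle_power_w: float,
                    standby_power_w: float, transition_energy_j: float) -> float:
    """Energy (J) spent over one idle gap under a fixed-threshold spin-down policy."""
    if gap_s <= threshold_s:
        # Gap too short: the disk stays spinning for the whole gap.
        return gap_s * idle_power_w
    # Disk idles until the threshold, pays the spin-down/spin-up cost once,
    # then sits in standby for the rest of the gap.
    return (threshold_s * idle_power_w
            + transition_energy_j
            + (gap_s - threshold_s) * standby_power_w)


# Placeholder parameters, chosen only to make the arithmetic easy to follow.
params = dict(threshold_s=10.0, idle_power_w=10.0,
              standby_power_w=2.0, transition_energy_j=100.0)

print(idle_gap_energy(5.0, **params))   # prints 50.0
print(idle_gap_energy(60.0, **params))  # prints 300.0
```

In this model a 60-second gap costs 300 J with spin-down, against 600 J if the disk stayed idle at 10 W, so longer idle gaps leave more room for savings.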

slide-17
SLIDE 17

Architecture Considered

[Figure: four P/M nodes, Interprocessor Communication Network, I/O Network, Disk Subsystem, and Tape Subsystem]

slide-18
SLIDE 18

Normalized Energy Consumption

[Figure: Normalized Energy (%) for AST under CIO, MCIO, SP, STD, POL, SSS, DM, DP, PRE, SUB, MCIO+POL, MCIO+POL+STD, and BEST]

Average: Hand-Optimized: 19.8%; Our Approach: 16.1%

slide-19
SLIDE 19

Breakdown of Dynamic Compilation Energy

[Figure: energy breakdown across Dynamic Compiler, Dynamic Linker, Performance Tracer, and Steering Unit]

slide-20
SLIDE 20

Sensitivity Analysis – # of processors

[Figure: Normalized Energy (%) versus Number of Processors (2, 4, 8, 16, 32, 64, 128) for AST, FFT, Cholesky, Visuo, SCF 3.0, and RSense 2.0]
slide-21
SLIDE 21

Conclusion

Proposed a dynamic compilation framework for I/O-intensive applications

  – Composed of four components
  – Employs a set of I/O optimizations

Reduces energy consumption of I/O-intensive applications

slide-22
SLIDE 22

Thank you!

sson@cse.psu.edu