Slide 1

PDSW'11

Pattern-Aware File Reorganization in MPI-IO

Jun He¹, Huaiming Song¹, Xian-He Sun¹, Yanlong Yin¹, Rajeev Thakur²

1: Illinois Institute of Technology, Chicago, Illinois
2: Argonne National Laboratory, Argonne, Illinois

Slide 2

Outline

  • Motivation
    • Examples
    • Basic idea
  • Design
    • System overview
    • Trace collecting
    • Pattern classification
    • I/O trace analyzer
    • Remapping table
    • MPI-IO remapping layer
  • Evaluation
    • Remapping overhead
    • Pattern variation
    • Benchmarks
  • Conclusion & Future Work
Slide 3

Motivation

Slide 4

Parallel File Systems

  • Important Factors
  • Number of requests
  • Contiguousness of accesses

[Figure: a typical parallel file system — network overhead, IOPS, locality, …]

Slide 5

Mismatch

  • Logical data
    • The developer's understanding of the data, organized for programmability and runtime performance
    • -> Logical organization -> Access pattern
  • Physical data
    • Where the data blocks are actually stored
    • -> Physical data organization

A good logical organization != a good physical organization for I/O performance.

Slide 6

A Tiny Example for Irregular Data

Logical view (programmer's view, also the file system's view): 1 2 3 4 5 6 7 8 9

Physical order on disk: 3 5 8 7 4 2 1 0 9 6

Potential benefits of reorganization: better spatial locality, easier for some optimizations to take effect, fewer disk-head movements, …

Slide 7

An Example for a Regular 2-D Array

Default Organization

A 2-D array

Slide 8

Read a Subarray

A 2-D array

Slide 9

After Re-organizing
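The contrast between the default row-major layout and the reorganized one can be sketched numerically. A minimal Python illustration, assuming an 8×8 array of 8-byte elements and a 4×4 subarray (none of these sizes come from the slides):

```python
COLS, ELEM = 8, 8   # assumed: 8x8 row-major array, 8-byte elements

def subarray_extents(r0, c0, h, w):
    """File extents (offset, length) touched when reading an h x w
    subarray at (r0, c0) from the default row-major organization."""
    return [((r * COLS + c0) * ELEM, w * ELEM) for r in range(r0, r0 + h)]

before = subarray_extents(2, 2, 4, 4)
print(len(before))   # 4 — one non-contiguous extent per row of the subarray

# After pattern-aware reorganization the same logical bytes sit
# back-to-back, so the whole subarray is a single contiguous request.
after = [(0, 4 * 4 * ELEM)]
print(len(after))    # 1
```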

Slide 10

A Messier One

  • Irregular data
  • Very complex data model
  • Computation which involves multiple data fields
Slide 11

Pattern-Aware Reorganization

  • Be aware of repeating non-contiguous access patterns
    • n-d strided and irregular
  • Try to reorganize the data so that accesses become contiguous
    • Less network overhead
    • Fewer I/O operations
    • Better locality
    • Beneficial for other optimizations, e.g., data sieving
  • Motivating scenarios
    • Application start-up
    • Data analysis, visualization
  • Where it does not apply
    • Patterns that do not repeat from run to run
Slide 12

Design

Slide 13

System Overview

[System diagram: Application, MPI-IO Remapping Layer, I/O Client, I/O Traces, I/O Trace Analyzer, Remapping Table]

Slide 14

Trace Collecting

  • Wrap the original function call
    • Add a recording function
    • Call the original function inside
  • Recorded fields: process ID, MPI rank, file path, type of operation, offset, length, data type, time stamp, and file view
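The wrapping idea can be sketched in a few lines. This is a hedged Python stand-in for what the slides describe as a wrapper around the real (C) MPI-IO calls; `traced` and `read_at` are hypothetical names, and only a subset of the recorded fields is shown:

```python
import functools
import time

def traced(op_name, trace_log):
    """Wrap an I/O call: record the fields the trace analyzer needs,
    then forward to the original function."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(path, offset, length, *args, **kwargs):
            trace_log.append({
                "op": op_name, "file": path, "offset": offset,
                "length": length, "timestamp": time.time(),
            })
            return fn(path, offset, length, *args, **kwargs)
        return inner
    return wrap

trace_log = []

@traced("MPI_READ", trace_log)
def read_at(path, offset, length):
    """Stand-in for the real MPI-IO read call."""
    return b"\0" * length

read_at("data.bin", 0, 64)
read_at("data.bin", 128, 64)
print(len(trace_log))   # 2 trace records collected
```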


Slide 15

Pattern Classification

Spatial Pattern
  • Contiguous
  • Non-contiguous
    • Fixed strided
    • 2d-strided
    • Negative strided
    • Random strided
    • kd-strided
  • Combination of contiguous and non-contiguous patterns

Repetition
  • Single occurrence
  • Repeating
    • Fixed
    • Variable

Temporal Intervals
  • Fixed
  • Random

Request Size
  • Small
  • Medium
  • Large

I/O Operation
  • Read only
  • Write only
  • Read/write

Slide 16

I/O Trace Analyzer

  • Pattern matching
    • Sort traces by time
    • Separate by process
    • Find the patterns
  • I/O Signature:

{I/O operation, initial position, dimension, ([{offset pattern}, {request size pattern}, {pattern of number of repetitions}, {temporal pattern}], [...]), # of repetitions}
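A concrete instance may make the signature format easier to read. The Python dict below is purely illustrative: the field names mirror the signature's components, but every value is an assumption, not taken from the slides:

```python
# Hypothetical I/O signature instance for a 1-d strided read.
signature = {
    "operation": "MPI_READ",
    "initial_position": 1024,    # starting offset in the file (bytes)
    "dimension": 1,              # 1-d strided pattern
    "patterns": [{
        "offset_pattern": 4096,  # gap between successive request starts
        "request_size": 512,     # bytes per request
        "repetitions_pattern": 1,
        "temporal_pattern": "fixed",
    }],
    "repetitions": 1000,         # how many times the pattern repeats
}

# One compact record like this stands in for a long run of trace lines;
# total data covered = request size x repetitions:
print(signature["patterns"][0]["request_size"] * signature["repetitions"])
```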


Slide 17

I/O-signature-based Remapping Table

Old                                                           New
File, {MPI_READ, offset0, 1, ([(hole size, 1), LEN, 1]), 4}   Offset0'

[Figure: 1-d strided example — four segments of length LEN at Offset 0..3, separated by holes, remapped to contiguous Offset 0'..3']

Slide 18

MPI-IO Remapping Layer

  • Convert old offsets to new ones.

Example: read m bytes from offset f. Does this access fall in a 1-d strided pattern with starting offset off, request size rsz, hole size hsz, and n accesses? It does if:

  (1) (f - off) / (rsz + hsz) < n
  (2) (f - off) % (rsz + hsz) == 0
  (3) m == rsz

If so, the new offset is:

  newoff = off + rsz * (f - off) / (rsz + hsz)
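Conditions (1)–(3) and the newoff formula translate directly into a lookup routine. A minimal Python sketch (the real layer sits inside MPI-IO; the pattern parameters in the usage lines are assumed values):

```python
def remap(f, m, off, rsz, hsz, n):
    """Map a request (offset f, length m) onto the reorganized file,
    given a 1-d strided pattern: start off, request size rsz,
    hole size hsz, n accesses. Returns the new offset, or None
    if the request does not match the pattern."""
    stride = rsz + hsz
    k, rem = divmod(f - off, stride)        # k = index of this access
    if f >= off and k < n and rem == 0 and m == rsz:   # conditions (1)-(3)
        # In the reorganized file the holes are gone, so exactly
        # k requests of rsz bytes precede this one.
        return off + rsz * k
    return None

# Pattern: 512-byte requests every 512+4096 bytes, starting at 0, 100 times.
print(remap(4608, 512, 0, 512, 4096, 100))  # 2nd access -> new offset 512
print(remap(4609, 512, 0, 512, 4096, 100))  # misaligned -> None
```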


Slide 19

Evaluation

Slide 20

Experiment Environment

  • Dual 2.3 GHz Opteron quad-core processors
  • 8 GB memory
  • 250 GB 7200 RPM SATA hard drive
  • 100 GB PCI-E OCZ RevoDrive X2 SSD (read: up to 740 MB/s, write: up to 690 MB/s)
  • Ethernet/InfiniBand
  • Ubuntu 9.04 (Linux kernel 2.6.28-11-server)
  • PVFS2 2.8.1: stripe size 64 KB
  • MPICH2 1.3.1
Slide 21

Remapping Overhead

Table Type      Size (bytes)   Building time (sec)   Time of 1,000,000 lookups (sec)
1-to-1          64,000,000     0.780287              0.489902
I/O Signature   28             0.000000269           0.024771

1-D Strided Remapping Table Performance (1,000,000 accesses)

Who uses 1-to-1 tables: PLFS uses a 1-to-1 mapping table in its index file, and most OS file systems use a similar table to track free blocks on disk.

Slide 22

Request Size Variation

  • X axis: difference in request size. For example, 5% means the actual request size is 5% smaller than the size assumed by the recorded pattern.

Slide 23

Variation of Starting Offset

  • X axis: difference in starting offset. 5% means the starting offset moved to the 5% point of the whole access range.

Slide 24

R/W Performance – on IOR

  • 4 I/O clients, 4 I/O servers; 64 processes with HDD and InfiniBand.
Slide 25

Performance on MPI-TILE-IO

  • 4 I/O clients, 4 I/O servers; 64 processes with HDD and InfiniBand. Elements in a tile: 1024×1024.

Slide 26

Performance on MPI-TILE-IO with SSD

  • 4 I/O clients, 4 I/O servers; 64 processes with SSD and InfiniBand. Elements in a tile: 1024×1024.

Slide 27

Conclusion & Future Work

Conclusion

  • Different file organizations lead to very different performance.
  • Bridging logical data and physical data:
    access pattern -> better organization -> better performance

Future Work

  • Multiple replicas with different organizations.
  • More complicated access patterns; patterns with hints.
  • File reorganization for emerging storage media, such as SSD.

Slide 28

Acknowledgement

  • Hui Jin and Spenser Gilliland (Illinois Institute of Technology)
  • Ce Yu (Tianjin University, China)
  • Samuel Lang (Argonne National Laboratory)
  • NSF grants CCF-0621435 and CCF-0937877
  • Office of Advanced Scientific Computing Research, Office of Science, U.S. DOE, under Contract DE-AC02-06CH11357

Thanks!