Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage

SLIDE 1

Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage

Andromachi Hatzieleftheriou, Stergios V. Anastasiadis

Department of Computer Science University of Ioannina, Greece

1

  • A. Hatzieleftheriou

University of Ioannina

slide-2
SLIDE 2

Outline

  • Motivation
  • Design
  • Implementation
  • Evaluation
  • Conclusions

SLIDE 3

Motivation

  • Synchronous small writes
      - critical for system and application reliability
  • Multistream concurrency
      - effectively random I/O
  • In page-sized disk accesses
      - async writes have good performance due to batching in memory
      - sync writes result in wasteful traffic due to excessive full-page I/Os

[Figure: Write Traffic (Linux ext3), page size = 4KB. Total journal volume (MB) vs. request size (KB) for data journaling (data & metadata) and ordered mode (metadata only).]
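To make the full-page waste concrete, here is a minimal sketch that compares the data volume journaled when each synchronous write logs whole 4KB pages with the volume when only the modified bytes are logged. The function names are hypothetical, writes are assumed to be page-aligned, and metadata and descriptor overheads are ignored, so the numbers are only a rough illustration and not measurements from the slides.

```python
import math

PAGE_SIZE = 4096  # bytes; the 4KB page size stated on the slide


def full_page_journal_bytes(request_size: int, num_requests: int) -> int:
    """Bytes journaled when every sync write logs whole pages.

    Assumes page-aligned requests (an illustrative simplification).
    """
    pages_per_request = math.ceil(request_size / PAGE_SIZE)
    return pages_per_request * PAGE_SIZE * num_requests


def subpage_journal_bytes(request_size: int, num_requests: int) -> int:
    """Bytes journaled when only the modified bytes are logged.

    Per-update tag overhead is ignored in this sketch.
    """
    return request_size * num_requests
```

Under these assumptions, 1000 synchronous writes of 1KB each put 4,096,000 bytes of data in the journal at page granularity, against 1,024,000 bytes at subpage granularity.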

SLIDE 4

Design Goals

  1. Reliable storage
      - keep data on disk
  2. Inexpensive synchronous small writes
      - sequential disk throughput
  3. Reduce disk bandwidth waste due to:
      - writes with high positioning overhead
      - unnecessary writes of unmodified data

  • Proposed approach:
      - batch random small writes in memory
      - journal data updates at subpage granularity

SLIDE 5

Wasteless Journaling

  • Idea:
      1. Synchronously transfer data deltas from memory to journal
      2. Occasionally move data blocks from memory to final location
  • Still wasteful!
      - large writes
      - disk traffic duplication

[Diagram labels: MEMORY (Pages), DISK (Journal, Filesystem), data deltas]
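The two steps of wasteless journaling can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the ext3 prototype: the class name `WastelessJournalModel`, the dictionary-based "disk", and the choice to drop journal records at checkpoint time are all hypothetical simplifications.

```python
PAGE_SIZE = 4096  # bytes


class WastelessJournalModel:
    """Toy model of delta journaling; names and layout are illustrative only."""

    def __init__(self):
        self.pages = {}     # block number -> bytearray (stands in for the page cache)
        self.dirty = set()  # blocks modified since the last checkpoint
        self.journal = []   # appended (block, offset, delta bytes) records
        self.disk = {}      # block number -> bytes (stands in for the final location)

    def sync_write(self, block: int, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        # Load the page from its final location (or zeros) on first touch.
        page = self.pages.setdefault(
            block, bytearray(self.disk.get(block, bytes(PAGE_SIZE)))
        )
        page[offset:offset + len(data)] = data
        self.dirty.add(block)
        # Step 1: only the modified bytes go to the journal.
        self.journal.append((block, offset, bytes(data)))

    def checkpoint(self) -> None:
        # Step 2: occasionally copy whole dirty pages to their final location.
        for block in sorted(self.dirty):
            self.disk[block] = bytes(self.pages[block])
        self.dirty.clear()
        # In this model the journaled deltas are no longer needed afterwards.
        self.journal.clear()
```

A 3-byte `sync_write` adds a 3-byte record to the journal, while the full 4KB page reaches `disk` only when `checkpoint` runs.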

SLIDE 6

Selective Journaling

  • Definition:
      - write threshold differentiates requests by size
  • Idea:
      1. Transfer large requests to final location without journaling of data
      2. Treat small requests according to wasteless journaling

[Diagram labels: MEMORY (Pages), DISK (Journal, Filesystem), data deltas]
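The size-based split can be sketched as a single dispatch function. The threshold value and the names `WRITE_THRESHOLD` and `choose_path` are hypothetical, since the slides do not state a concrete threshold. The strict comparison follows the wording that only updates below the write threshold are journaled.

```python
WRITE_THRESHOLD = 4096  # bytes; hypothetical value, not given in the slides


def choose_path(request_size: int, threshold: int = WRITE_THRESHOLD) -> str:
    """Return the path taken by the data of a write of request_size bytes."""
    if request_size < 0:
        raise ValueError("request size must be non-negative")
    if request_size < threshold:
        # Small request: journal the data delta (wasteless journaling).
        return "journal-delta"
    # Large request: data goes straight to its final location, unjournaled.
    return "final-location"
```

For example, with this assumed threshold a 512-byte write takes the journal path, while a 64KB write goes directly to its final location.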

SLIDE 7

Consistency

  • Wasteless Journaling:
      - atomic updates of both data and metadata
  • Selective Journaling:
      - data updates either journaled or not, depending on request size
      - consistency at least as strict as the default ext3 journaling mode (ordered)

SLIDE 8

Prototype Implementation

[Diagram labels: Page Cache, Block Buffer, Data Copies, Journal Descriptor Block (Header, Tags), Multiwrite Journal Block (Data Deltas); legend: Modified Data, Original Data]

  • Tag fields:
      - block num of final location
      - offset in page
      - length in bytes
  • Multiwrite journal block
      - accumulates multiple subpage data updates
  • During recovery
      - apply data deltas to corresponding final disk blocks

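As a rough illustration of how tags and a multiwrite journal block could fit together, the sketch below packs several subpage updates back to back into one block and replays them during recovery. The `Tag` fields mirror the three items listed on the slide. Everything else is an assumption made for the example: the names `pack_deltas` and `replay`, the back-to-back layout, the dictionary used as a disk, and the absence of headers, commit records, and checksums.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # bytes


@dataclass
class Tag:
    block_num: int  # block number of the final location
    offset: int     # offset in page
    length: int     # length in bytes


def pack_deltas(updates):
    """Pack (block_num, offset, data) updates into tags plus one multiwrite block.

    Deltas are laid out back to back (an assumed layout for this sketch).
    """
    tags, payload = [], bytearray()
    for block_num, offset, data in updates:
        if len(payload) + len(data) > BLOCK_SIZE:
            raise ValueError("updates do not fit in one multiwrite block")
        tags.append(Tag(block_num, offset, len(data)))
        payload += data
    return tags, bytes(payload)


def replay(tags, payload, disk):
    """Recovery: apply each delta to its final disk block, in journal order."""
    pos = 0
    for tag in tags:
        block = bytearray(disk.get(tag.block_num, bytes(BLOCK_SIZE)))
        block[tag.offset:tag.offset + tag.length] = payload[pos:pos + tag.length]
        disk[tag.block_num] = bytes(block)
        pos += tag.length
    return disk
```

Replaying in journal order means that a later delta to the same block overwrites an earlier one where they overlap.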
SLIDE 9

Experiments

  • Implemented in Linux kernel 2.6.18 ext3
  • Experimentation environment:
      - x86-based servers
      - quad-core 2.66GHz processor
      - 3GB RAM
      - Seagate Cheetah SAS 300GB 15KRPM disks
  • Workloads:
      - Microbenchmarks
      - Postmark
      - MPI-IO over PVFS2

SLIDE 10

Latency

  • Data and wasteless journaling achieve substantially lower write latency
      - similar to NILFS (stable Linux port of LFS)
  • NILFS read latency is significantly higher due to poor storage locality!

[Figure: Write latency (ms) and read latency (µs) vs. number of streams at 1 Mbps/stream, for Selective, Ordered, Wasteless, Data, and NILFS.]

SLIDE 11

Disk Traffic

  • Data journaling is expensive in terms of journal traffic
  • Ordered journaling incurs increased filesystem traffic
  • Wasteless & selective substantially reduce journal and filesystem traffic

[Figure: Journal throughput (MB/s) and file system throughput (MB/s) vs. number of streams at 1 Kbps/stream, for Data, Wasteless, Selective, and Ordered. Lower is better!]

SLIDE 12

Application-Level Workloads

  • Small files workload
      - wasteless increases transaction throughput
  • Parallel I/O workload
      - 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)
      - wasteless doubles the throughput of parallel application checkpointing

[Figure: Postmark, transactions/s vs. request size (KB); MPI-IO over PVFS2 (write size 1KB), throughput (MB/s) vs. threads per client; series in both panels: Wasteless, Data, Selective, Ordered.]

SLIDE 13

Conclusions & Future Work

  • Key concept:
      - apply subpage journaling of data updates to ensure reliability
  • Wasteless Journaling
      - merges subpage writes into page-sized journal blocks
  • Selective Journaling
      - journals only updates below a write threshold
  • Performance benefits demonstrated over ext3:
      - reduced write latency
      - improved transaction throughput
      - avoided bandwidth waste
  • Future Work
      - extend to virtualization environments and flash memory systems
