SLIDE 1

Pufferbench: Evaluating and Optimizing Malleability of Distributed Storage

Nathanaël Cheriere, Matthieu Dorier, Gabriel Antoniu PDSW-DISCS 2018, Dallas

SLIDE 2

Data is everywhere

High variety of applications
High variety of needs

SLIDE 3

Resource requirements vary in time

Day/night cycles
Weekly cycles
Workflows

SLIDE 4

Why?

  • Satisfy resource requirements (peaks and lows)
  • Avoid idle nodes

✓ Save money
✓ Save energy

Dynamically adjust the amount of resources?

✓ Computing resources malleability
? Storage system malleability

Problem: what about task/data colocation?

  • Local data access
  • Easy scalability

SLIDE 5

Two operations:

  • Commission
  • Decommission

Constraints:

  • No data loss
  • Maintain fault tolerance
  • Balance data

Problems:

  • Long data transfers
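A toy illustration of these constraints (a Python sketch, not Pufferbench's actual scheduler): move every replica held by a leaving node to the least-loaded staying node that does not already hold a copy of that chunk, so that no data is lost and the replication factor is preserved.

```python
def plan_decommission(placement, leaving, nodes):
    """Toy decommission planner (illustrative, not Pufferbench's real
    scheduler): reassign every replica hosted on a leaving node to the
    least-loaded staying node that does not already hold that chunk."""
    staying = [n for n in nodes if n not in leaving]
    # current load of each staying node, in number of replicas
    load = {n: sum(1 for r in placement.values() if n in r) for n in staying}
    transfers = []  # list of (chunk, source, destination)
    for chunk, replicas in placement.items():
        for src in [n for n in replicas if n in leaving]:
            # least-loaded staying node without a copy of this chunk
            dst = min((n for n in staying if n not in replicas),
                      key=lambda n: load[n])
            replicas.discard(src)
            replicas.add(dst)
            load[dst] += 1
            transfers.append((chunk, src, dst))
    return transfers
```

Even this naive plan shows why rescaling is dominated by data transfers: every replica on a leaving node translates into one network transfer.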
SLIDE 6

What is the duration of storage rescaling on a given platform?

  • Previous works established lower bounds
  • Useful, but unrealistic: many simplifications
  • Need a tool to measure it on real hardware

How fast can one scale down a distributed file system? N. Cheriere, G. Antoniu. BigData 2017.
A Lower Bound for the Commission Times in Replication-Based Distributed Storage Systems. N. Cheriere, M. Dorier, G. Antoniu. [Research Report, submitted to JPDC] 2018.
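As a back-of-envelope illustration of where such bounds come from (a deliberately simplified, network-only model; the cited papers derive tighter and more general bounds): during a decommission, each of the x leaving nodes must push its data out through its own uplink, while the n - x staying nodes must together absorb all of it.

```python
def decommission_time_lower_bound(n, x, data_per_node, net_bw):
    """First-order, network-only estimate (a simplification for
    illustration, NOT the exact bounds of the cited papers).
    n: cluster size, x: nodes leaving,
    data_per_node: data hosted per node (GB),
    net_bw: per-node network bandwidth (GB/s)."""
    send_limited = data_per_node / net_bw                  # each leaving node's uplink
    recv_limited = x * data_per_node / ((n - x) * net_bw)  # staying nodes' ingress
    return max(send_limited, recv_limited)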
SLIDE 7

A benchmark: Pufferbench

Goals:

  • Measure the duration of rescaling on a platform
  • Serve as a quick prototyping testbed for rescaling mechanisms

How:

  • Do all the I/O that a rescaling requires
SLIDE 8

Main steps

  1. Migration Planning
  2. Data Generation
  3. Execution
  4. Statistics Aggregation
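The four steps can be sketched as a driver (illustrative Python; the real Pufferbench components appear on the following slides, and their interfaces differ). Only the execution phase is timed, since that is the rescaling duration being measured:

```python
import time

def run_rescaling_benchmark(plan_migration, generate_data, execute, aggregate):
    """Skeleton of the four benchmark phases (illustrative sketch, not
    Pufferbench's actual driver). The four arguments are hypothetical
    callables standing in for the real components."""
    plan = plan_migration()            # 1. decide which chunk moves where
    generate_data(plan)                # 2. materialize the data to move
    start = time.perf_counter()
    execute(plan)                      # 3. perform the storage/network I/O
    duration = time.perf_counter() - start
    return aggregate(plan, duration)   # 4. statistics aggregation
```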
SLIDE 9

Software Architecture

SLIDE 10

MetadataGenerator: Generate information about files on the storage (number, size)


SLIDE 11

DataDistributionGenerator: Assign files to storage nodes
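For instance, uniform random placement with replication can be sketched as follows (a minimal stand-in for this component; the function name and signature are illustrative, not Pufferbench's API):

```python
import random

def place_chunks(num_chunks, nodes, replication=3):
    """Toy DataDistributionGenerator: assign each chunk to `replication`
    distinct nodes, chosen uniformly at random (HDFS-style placement).
    Illustrative sketch, not Pufferbench's actual component."""
    return {chunk: set(random.sample(nodes, replication))
            for chunk in range(num_chunks)}
```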


SLIDE 12

DataTransferScheduler: Compute data transfers needed for rescaling


SLIDE 13

IODispatcher: Assign transfer instructions to storage and network


SLIDE 14

Storage: Interface with the storage devices


SLIDE 15

Network: Exchange data between nodes


SLIDE 16

DataDistributionValidator: Compute statistics about data placement (load, replication)
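Such statistics can be sketched in a few lines (an illustrative stand-in, not Pufferbench's actual component): per-node load, plus a check that every chunk kept its replication factor.

```python
def validate_placement(placement, nodes, replication=3):
    """Toy DataDistributionValidator: count replicas per node and verify
    the replication factor survived the rescaling. Illustrative sketch,
    not Pufferbench's real interface."""
    load = {n: 0 for n in nodes}
    for replicas in placement.values():
        for n in replicas:
            load[n] += 1
    replication_ok = all(len(r) == replication for r in placement.values())
    return load, replication_ok
```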


SLIDE 17

Validation

Hardware

  • Up to 40 nodes
  • 16 cores, 2.4 GHz
  • 128 GB RAM
  • 558 GB disk
  • 10 Gbps Ethernet

Comparison to lower bounds

Matching hypotheses:

  • Load balancing (50 GB per node)
  • Uniform data distribution
  • Data replication

Differences:

  • Hardware is not identical
  • Storage has latency
  • Network has latency and interference
SLIDE 18

Pufferbench is close to lower bounds!

[Figure: decommission times (in-memory storage): Pufferbench vs. theoretical minimum, 1 to 7 decommissioned nodes out of a cluster of 20]
[Figure: decommission times (on-drive storage): Pufferbench vs. theoretical minimum, 1 to 7 decommissioned nodes out of a cluster of 20]
[Figure: commission times (in-memory storage): Pufferbench vs. theoretical minimum, 5 to 30 commissioned nodes added to a cluster of 10]
[Figure: commission times (on-drive storage): Pufferbench vs. theoretical minimum, 5 to 30 commissioned nodes added to a cluster of 10]

Within 16% of lower bounds

Lower bounds are realistic

SLIDE 19

Use case: HDFS

Question: how fast could rescaling in HDFS be, without modifying HDFS?

With Pufferbench:

  • Reproduce the initial conditions
  • Aim for the same final data placement
SLIDE 20

Pufferbench matching HDFS’s rescaling

  • Chunks of 128 MiB
  • Random placement
  • Replicated 3 times
  • Load balanced
  • Mostly random
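These parameters can be summarized as a small configuration sketch (the key names are hypothetical, not Pufferbench's actual configuration format):

```python
# Hypothetical parameter set mimicking HDFS behavior for the experiments
# above (key names are illustrative, not Pufferbench's real schema).
hdfs_like = {
    "chunk_size_bytes": 128 * 2**20,    # 128 MiB chunks
    "replication_factor": 3,            # each chunk replicated 3 times
    "placement": "uniform_random",      # random, load-balanced placement
    "transfer_order": "mostly_random",  # HDFS schedules transfers mostly at random
}
```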
SLIDE 21

HDFS needs better disk I/O

[Figure: decommission times (on-drive storage): measured on HDFS vs. Pufferbench vs. theoretical minimum, 1 to 7 decommissioned nodes out of a cluster of 20]
[Figure: decommission times (in-memory storage): measured on HDFS vs. Pufferbench vs. theoretical minimum, 1 to 7 decommissioned nodes out of a cluster of 20]

Improvement possible on disk access patterns: roughly a 3x gap on drive storage.

SLIDE 22

HDFS is far from optimal performance!

[Figure: commission times (in-memory storage): measured on HDFS vs. Pufferbench vs. theoretical minimum, 5 to 30 commissioned nodes added to a cluster of 10]
[Figure: commission times (on-drive storage): measured on HDFS vs. Pufferbench vs. theoretical minimum, 5 to 30 commissioned nodes added to a cluster of 10]

Improvement possible on algorithms, disk access patterns, and pipelining: up to a 14x gap for the commission.

SLIDE 23

Setup duration

Setup overhead for the in-memory commission experiments:

  • HDFS: 26 h
  • Pufferbench: 53 min

Good for prototyping:

  • Fast evaluation
  • Light setup
SLIDE 24

To conclude

Pufferbench:

  • Evaluate the viability of storage malleability on platforms
  • Quickly prototype and evaluate rescaling mechanisms

Available at https://gitlab.inria.fr/Puffertools/Pufferbench
Can be installed with Spack

SLIDE 25


Thank you! Questions?

nathanael.cheriere@irisa.fr