SLIDE 1

MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds

Jie Zhang, Xiaoyi Lu, Mark Arnold and Dhabaleswar K. Panda

SLIDE 2

Outline

  • Introduction
  • Problem Statement
  • Proposed Design
  • Performance Evaluation

SLIDE 3

Single Root I/O Virtualization (SR-IOV)

  • Single Root I/O Virtualization (SR-IOV) is providing new opportunities to design HPC clouds with very low overhead
  • SR-IOV allows a single physical device, or a Physical Function (PF), to present itself as multiple virtual devices, or Virtual Functions (VFs)
    – Each VF can be dedicated to a single VM through PCI pass-through
    – VFs are designed based on the existing non-virtualized PFs, so no driver change is needed

[Figure: SR-IOV architecture – Guest 1/2/3, each with a Guest OS and VF driver; hypervisor with PF driver and I/O MMU; SR-IOV hardware exposing a Physical Function and three Virtual Functions over PCI Express]
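The slides do not show how VFs are instantiated; as background, the minimal sketch below uses the standard Linux mechanism of writing the desired VF count to the PF's sriov_numvfs sysfs attribute. The PCI address is a hypothetical placeholder, and the write requires root, an SR-IOV capable adapter, and no VFs currently enabled.

```c
/* Hedged sketch: enable Virtual Functions on a Physical Function via sysfs.
 * The PCI address below is a placeholder, not a value from the slides. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical PCI address of the Physical Function. */
    const char *pf = "/sys/bus/pci/devices/0000:03:00.0";
    char path[256];
    int total = 0;

    /* How many VFs can this PF expose at most? */
    snprintf(path, sizeof(path), "%s/sriov_totalvfs", pf);
    FILE *f = fopen(path, "r");
    if (!f || fscanf(f, "%d", &total) != 1) { perror("sriov_totalvfs"); return 1; }
    fclose(f);
    printf("PF supports up to %d Virtual Functions\n", total);

    /* Ask the PF to instantiate 4 VFs; each VF can then be handed to a VM
     * through PCI pass-through. Requires that no VFs are currently enabled. */
    snprintf(path, sizeof(path), "%s/sriov_numvfs", pf);
    f = fopen(path, "w");
    if (!f) { perror("sriov_numvfs"); return 1; }
    fprintf(f, "4\n");
    fclose(f);
    return 0;
}
```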

SLIDE 4

Inter-VM Shared Memory (IVShmem)

  • SR-IOV shows near-to-native performance for inter-node point-to-point communication
  • However, SR-IOV is NOT VM locality aware
  • IVShmem offers zero-copy access to data on shared memory of co-resident VMs

[Figure: Host with hypervisor and PF driver; two guests, each running an MPI process with a PCI device and VF driver; a /dev/shm IVShmem region shared between the guests provides the IVShmem channel, while the InfiniBand adapter's Virtual Functions provide the SR-IOV channel]
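As a minimal, self-contained sketch of the zero-copy idea (not the MVAPICH2 implementation), the snippet below maps a POSIX shared-memory object that stands in for the IVShmem region and writes a message that a co-resident peer mapping the same region can read without extra copies. The object name and size are assumptions; compile with -lrt on older glibc.

```c
/* Minimal sketch: two co-resident processes/VMs exchanging data through a
 * shared window. In a real IVShmem setup the guest maps the BAR of the
 * ivshmem PCI device; a POSIX shm object keeps this sketch runnable. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (1 << 20)   /* 1 MB shared window (assumed size) */

int main(void)
{
    int fd = shm_open("/ivshmem_demo", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, REGION_SIZE) != 0) { perror("ftruncate"); return 1; }

    char *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* "Sender" side: write directly into the shared window. A peer that
     * maps the same region reads it without going through the network stack. */
    strcpy(region, "hello from a co-resident VM");
    printf("shared region now holds: %s\n", region);

    munmap(region, REGION_SIZE);
    close(fd);
    return 0;
}
```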

SLIDE 5

Outline

  • Introduction
  • Problem Statement
  • Proposed Design
  • Performance Evaluation

SLIDE 6

Problem Statement

  • How to design a high-performance MPI library that efficiently takes advantage of SR-IOV and IVShmem to deliver VM-locality-aware communication and optimal performance?
  • How to build an HPC cloud with near-native performance for MPI applications over SR-IOV-enabled InfiniBand clusters?
  • How much performance improvement can be achieved by the proposed design on MPI point-to-point operations, collective operations, and applications in HPC clouds?
  • How much benefit can the proposed approach with InfiniBand provide compared to Amazon EC2?

SLIDE 7

Outline

  • Introduction
  • Problem Statement
  • Proposed Design
  • Performance Evaluation

SLIDE 8

VM Locality Aware MVAPICH2 Design Overview

  • MVAPICH2 library runs in both native and virtualized environments
  • In the virtualized environment:
    – Supports shared-memory channels (SMP, IVShmem) and the SR-IOV channel
    – Locality detection
    – Communication coordination

[Figure: Software stacks side by side – native: Application, MPI Layer, ADI3 Layer, SMP channel (shared memory) and network channel (InfiniBand API) over native hardware; virtualized: Application, MPI Layer, ADI3 Layer with Locality Detector and Communication Coordinator, plus SMP, IVShmem, and SR-IOV channels over virtualized hardware through virtual-machine-aware communication device APIs]

SLIDE 9

Virtual Machine Locality Detection

  • Create a VM List structure in the IVShmem region of each host
  • Each MPI process writes its own membership information into the shared VM List structure according to its global rank
  • One byte per process, lock-free, O(N)

[Figure: Host with hypervisor and PF driver; four co-resident MPI processes (ranks 0, 1, 4, 5), each in its own VM with a PCI device and VF driver, marking their entries in the VM List stored in the /dev/shm IVShmem region]
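A minimal sketch of the detection idea (not MVAPICH2 internals): each rank marks one byte, indexed by its global rank, in a host-wide shared array, then scans the array to learn which ranks are co-resident. The shm object name is illustrative and most error handling is omitted. Launched with several ranks per host, every rank would report the co-resident ranks on its own host.

```c
/* VM locality detection sketch: one byte per global rank in a host-wide
 * shared "VM List"; in the paper's design this array lives in the IVShmem
 * region shared by all VMs on the same physical host. */
#include <fcntl.h>
#include <mpi.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int fd = shm_open("/vm_list_demo", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, size) != 0)
        MPI_Abort(MPI_COMM_WORLD, 1);
    unsigned char *vm_list = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);

    vm_list[rank] = 1;               /* lock-free: each rank owns one byte  */
    MPI_Barrier(MPI_COMM_WORLD);     /* wait until everyone has registered  */

    /* O(N) scan: any rank whose byte is set shares this host with us. */
    for (int r = 0; r < size; r++)
        if (vm_list[r] && r != rank)
            printf("rank %d: rank %d is co-resident\n", rank, r);

    MPI_Barrier(MPI_COMM_WORLD);
    munmap(vm_list, size);
    close(fd);
    if (rank == 0) shm_unlink("/vm_list_demo");
    MPI_Finalize();
    return 0;
}
```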

SLIDE 10

Communication Coordination

  • Retrieve VM locality detection information
  • Schedule communication channels based on VM locality information
  • Fast indexing, light-weight

[Figure: Two guests on one host, each running an MPI process (ranks 1 and 4) with a communication coordinator that consults the locality bitmap and selects either the IVShmem channel through the shared /dev/shm region or the SR-IOV channel through the InfiniBand adapter's Virtual Functions]
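A minimal sketch of the scheduling step (illustrative only, not MVAPICH2 code): the coordinator indexes the locality map produced by the detector and picks the IVShmem channel for co-resident peers, the SR-IOV channel otherwise. The 8-rank map mirrors the slide's example, where ranks 0, 1, 4 and 5 share a host.

```c
/* Channel scheduling sketch driven by the locality map. */
#include <stdio.h>

enum channel { CHANNEL_IVSHMEM, CHANNEL_SRIOV };

/* locality[r] == 1 if global rank r lives on the same physical host. */
static enum channel select_channel(const unsigned char *locality, int peer)
{
    return locality[peer] ? CHANNEL_IVSHMEM : CHANNEL_SRIOV;
}

int main(void)
{
    /* Example map for 8 ranks: ranks 0, 1, 4, 5 are co-resident. */
    unsigned char locality[8] = {1, 1, 0, 0, 1, 1, 0, 0};

    for (int peer = 0; peer < 8; peer++)
        printf("peer %d -> %s\n", peer,
               select_channel(locality, peer) == CHANNEL_IVSHMEM
                   ? "IVShmem channel" : "SR-IOV channel");
    return 0;
}
```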

SLIDE 11

MVAPICH2 with SR-IOV over OpenStack

  • OpenStack is one of the most popular open-source solutions for building a cloud and managing large numbers of virtual machines
  • Deployment with OpenStack
    – Supporting SR-IOV configuration
    – Extending Nova in OpenStack to support IVShmem
    – Virtual-machine-aware design of MVAPICH2 with SR-IOV
  • An efficient approach to build HPC clouds

[Figure: OpenStack services around a VM – Nova provisions instances, Glance provides and stores images, Neutron provides networking, Cinder provides volumes, Swift backs up volumes and stores images, Keystone provides authentication, Horizon provides the UI, Ceilometer monitors, Heat orchestrates the cloud]

SLIDE 12

Experimental HPC Cloud

SLIDE 13

Outline

  • Introduction
  • Problem Statement
  • Proposed Design
  • Performance Evaluation

SLIDE 14

Cloud Testbeds

                Nowlab Cloud                                   Amazon EC2
Instance        4 Core/VM         8 Core/VM                    4 Core/VM (C3.xlarge)    8 Core/VM (C3.2xlarge)
Platform        RHEL 6.5, Qemu+KVM (HVM)                       Amazon Linux (EL6), Xen HVM
CPU             SandyBridge Intel(R) Xeon E5-2670 (2.6 GHz)    IvyBridge Intel(R) Xeon E5-2680v2 (2.8 GHz)
RAM             6 GB              12 GB                        7.5 GB                   15 GB
Interconnect    FDR (56 Gbps) InfiniBand, Mellanox             10 GigE with Intel ixgbevf SR-IOV driver
                ConnectX-3 with SR-IOV

SLIDE 15

Performance Evaluation

  • Performance of MPI Level Point-to-point Operations
    – Inter-node MPI Level Two-sided Operations
    – Intra-node MPI Level Two-sided Operations
    – Intra-node MPI Level One-sided Operations
  • Performance of MPI Level Collective Operations
    – Broadcast, Allreduce, Allgather and Alltoall
  • Performance of Typical MPI Benchmarks and Applications
    – NAS and Graph500

*Amazon EC2 does not currently allow users to explicitly allocate VMs on one physical node. We allocate multiple VMs in one logical group and compare the point-to-point performance for each pair of VMs. The VM pair with the lowest latency is treated as located within one physical node (intra-node); all other pairs are treated as inter-node.
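The point-to-point numbers on the following slides come from two-sided latency and bandwidth tests; as a minimal sketch of what such a test does (in the spirit of osu_latency, not the OSU benchmark code itself), the snippet below times a ping-pong between two ranks. Message size and iteration count are illustrative.

```c
/* Two-sided ping-pong latency sketch; run with two ranks, e.g.
 *   mpirun_rsh -np 2 -hostfile hosts ./pingpong */
#include <mpi.h>
#include <stdio.h>

#define MSG_SIZE 4096
#define ITERS    1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[MSG_SIZE];
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)   /* one-way latency = round-trip time / 2 */
        printf("%d-byte latency: %.2f us\n", MSG_SIZE,
               (t1 - t0) * 1e6 / (2.0 * ITERS));

    MPI_Finalize();
    return 0;
}
```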

SLIDE 16

Inter-node MPI Level Two-sided Point-to-Point Performance

  • EC2 C3.xlarge instances
  • Similar performance to SR-IOV-Def
  • Compared to Native, similar overhead as the basic IB level
  • Compared to EC2, up to 29X and 16X performance speedup on Lat & BW

SLIDE 17

Intra-node MPI Level Two-sided Point-to-Point Performance

  • EC2 C3.xlarge instances
  • Compared to SR-IOV-Def, up to 84% and 158% performance improvement on Lat & BW
  • Compared to Native, 3%-7% overhead for Lat, 3%-8% overhead for BW
  • Compared to EC2, up to 160X and 28X performance speedup on Lat & BW

SLIDE 18

Intra-node MPI Level One-sided Put Performance

  • EC2 C3.xlarge instances
  • Compared to SR-IOV-Def, up to 63% and 42% improvement on Lat & BW
  • Compared to EC2, up to 134X and 33X performance speedup on Lat & BW

SLIDE 19

Intra-node MPI Level One-sided Get Performance

  • EC2 C3.xlarge instances
  • Compared to SR-IOV-Def, up to 70% improvement on both Lat & BW
  • Compared to EC2, up to 121X and 24X performance speedup on Lat & BW

SLIDE 20

MPI Level Collective Operations Performance

(4 cores/VM * 4 VMs)

  • EC2 C3.xlarge instances
  • Compared to SR-IOV-Def, up to 74% and 60% performance improvement on Broadcast & Allreduce
  • Compared to EC2, up to 65X and 22X performance speedup on Bcast & Allreduce

SLIDE 21

MPI Level Collective Operations Performance

(4 cores/VM * 4 VMs)

  • EC2 C3.xlarge instances
  • Compared to SR-IOV-Def, up to 74% and 81% performance improvement on Allgather & Alltoall
  • Compared to EC2, up to 28X and 45X performance speedup on Allgather & Alltoall

SLIDE 22

MPI Level Collective Operations Performance

(4 cores/VM * 16 VMs)

  • Compared to SR-IOV-Def, up to 41% and 45% performance improvement on Bcast & Allreduce

SLIDE 23

MPI Level Collective Operations Performance

(4 cores/VM * 16 VMs)

  • Compared to SR-IOV-Def, up to 40% and 39% performance improvement on Allgather & Alltoall

SLIDE 24

Performance of Typical MPI Benchmarks and Applications

(8 cores/VM * 4 VMs)

  • EC2 C3.2xlarge instances
  • Compared to Native, 2%-9% overhead for NAS, around 6% overhead for Graph500
  • Compared to EC2, up to 4.4X (FT) speedup for NAS, up to 12X (20,10) speedup for Graph500

SLIDE 25

Performance of Typical MPI Benchmarks and Applications

(8 cores/VM * 8 VMs)

  • EC2 C3.2xlarge instances
  • Compared to Native, 6%-9% overhead for NAS, around 8% overhead for Graph500

SLIDE 26

Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds?

Jie Zhang, Xiaoyi Lu and Dhabaleswar K. Panda

SLIDE 27

Outline

  • Introduction
  • Problem Statement
  • Evaluation Methodology
  • Performance Evaluation

SLIDE 28

Virtualization Technology (Hypervisor vs. Container)

  • Provides abstractions of multiple virtual resources by utilizing an intermediate software layer on top of the underlying system

[Figure: Hypervisor-based virtualization – hardware, host OS, hypervisor, and three VMs (Ubuntu, Redhat Linux, Windows guest OSes), each with its own bins/libs and application stack]

  • Hypervisor provides a full abstraction of the VM
  • Full virtualization, different guest OSes, better isolation
  • Larger overhead due to the heavy stack

SLIDE 29

Virtualization Technology (Hypervisor vs. Container)

  • Provides abstractions of multiple virtual resources by utilizing an intermediate software layer on top of the underlying system

[Figure: Container-based virtualization – hardware, host Linux OS, and three containers, each with its own bins/libs and application stack]

  • Containers share the host kernel
  • Allow execution of isolated user-space instances
  • Lightweight, portable
  • Weaker isolation

SLIDE 30

Container Technology (Docker vs. Singularity)

  • Docker inherits the advantages of the container technique
  • Active community contribution
  • Root-owned daemon process
  • Root escalation inside the Docker container
  • Non-negligible performance overhead

SLIDE 31

Singularity Overview

  • Reproducible software stacks
    – Easily verified via checksum or cryptographic signature
  • Mobility of compute
    – Able to transfer (and store) containers via standard data mobility tools
  • Compatibility with complicated architectures
    – Runtime immediately compatible with existing HPC architectures
  • Security model
    – Supports untrusted users running untrusted containers

http://singularity.lbl.gov/about

SLIDE 32

Container Technology (Docker vs. Singularity)

  • Singularity aims to provide reproducible and mobile environments across HPC centers
  • NO root-owned daemon
  • NO root escalation
  • mpirun_rsh -np 2 -hostfile htfiles singularity exec /tmp/Centos-7.img /usr/bin/osu_latency
  • Performance?

SLIDE 33

Outline

  • Introduction
  • Problem Statement
  • Evaluation Methodology
  • Performance Evaluation

SLIDE 34

Problem Statement

  • What is the performance characterization of running Singularity on an HPC cloud?
  • Can Singularity deliver near-native performance for MPI applications on an HPC cloud with different cutting-edge hardware technologies?
  • Is Singularity-based container technology ready for running MPI applications on HPC clouds on top of HPC infrastructure?

SLIDE 35

Outline

  • Introduction
  • Problem Statement
  • Evaluation Methodology
  • Performance Evaluation

SLIDE 36

Evaluation Methodology

  • Virtualization Solution Overhead
    – Singularity vs. Native
  • Three-dimensional evaluation
    – Processor Architecture: Multi-core Processor (Haswell), Many-core Processor (KNL)
    – Memory Access Mode: NUMA, Cache, Flat
    – High Speed Interconnects: InfiniBand, Omni-Path

SLIDE 37

Processor Architecture

Haswell
  • Dual socket (NUMA)
  • Each socket with 12 cores (2.30 GHz)
  • Each core supports 2 threads
  • 4 DDRs, 2 for each socket

KNL
  • Up to 72 cores (1.4 GHz) on 36 active tiles
  • Each tile has a single 1 MB L2 cache shared between two cores
  • Each core supports 4 threads
  • 6 DDRs + 8 Multi-Channel DRAMs (MCDRAM)

SLIDE 38

Memory Access Mode

[Figure: Three memory configurations – Haswell with 2 NUMA nodes; KNL in Cache mode (68 cores, 16 GB MCDRAM as cache in front of 96 GB DDR4 RAM); KNL in Flat mode (68 cores, 16 GB MCDRAM plus 96 GB DDR4 exposed as 112 GB of addressable RAM)]

  • Haswell (NUMA)
    – 2 NUMA nodes
    – QPI channels between sockets
    – Intra-/Inter-socket communication
  • KNL Cache mode
    – MCDRAM acts as an L3 cache
    – OS transparently uses MCDRAM to move data from main memory
  • KNL Flat mode
    – DDR4 and MCDRAM act as two distinct NUMA nodes
    – Need to specify the type of memory (DDR4 or MCDRAM) when allocating

SLIDE 39

Outline

  • Introduction
  • Problem Statement
  • Evaluation Methodology
  • Performance Evaluation

SLIDE 40

Testbeds

Cluster        Chameleon Cloud                             Nowlab Cloud
CPU            Intel Xeon E5-2670 Haswell processors,      Intel Xeon Phi CPU 7250 KNL co-processor,
               24 cores (2.3 GHz)                          68 cores (1.40 GHz)
Memory         2 NUMA nodes, 128 GB                        96 GB host memory and 16 GB MCDRAM
Interconnect   Mellanox ConnectX-3 HCA (FDR 56 Gbps)       Omni-Path HFI Silicon 100 Series fabric
                                                           controller (100 Gbps)
OS             CentOS Linux release 7.1.1503 (Core)        CentOS Linux release 7.3.1611 (Core)

Software: Singularity 2.3, MVAPICH2-2.3a, OSU micro-benchmarks v5.3

SLIDE 41

Evaluation Methodology

  • Virtualization Solution Overhead
    – Singularity vs. Native
  • Dimension 1: Processor Architecture
    – Multi-core Processor (Haswell)
    – Many-core Processor (KNL)

SLIDE 42

Processor Architecture (Haswell & KNL)

  • MPI point-to-point bandwidth
  • On both Haswell and KNL, less than 7% overhead for the Singularity solution
  • KNL shows worse intra-node performance than Haswell because of its lower CPU frequency, complex cluster mode, and the cost of maintaining cache coherence
  • KNL: inter-node performs better than the intra-node case after around 256 Kbytes, as the Omni-Path interconnect outperforms shared-memory-based transfer for large message sizes

[Figures: BW on Haswell and BW on KNL – bandwidth (MB/s) vs. message size (1K-1M bytes) for Singularity vs. Native, intra-node and inter-node; Singularity within 7% of Native]

SLIDE 43

Evaluation Methodology

  • Virtualization Solution Overhead
    – Singularity vs. Native
  • Dimension 2: Memory Access Mode
    – NUMA
    – Cache
    – Flat

SLIDE 44

Memory Access Mode (NUMA, Cache)

  • MPI point-to-point latency
  • NUMA
    – Intra-socket performs better than the inter-socket case, due to the QPI bottleneck between NUMA nodes
    – The performance difference gradually decreases as the message size increases
  • Overall, less than 8% overhead for the Singularity solution in both cases, compared with Native

[Figures: "2 NUMA nodes" and "Cache mode" – latency (us) vs. message size (1 byte-64K) for Singularity vs. Native, intra-/inter-socket and intra-/inter-node; Singularity within 8% of Native]

SLIDE 45

Memory Access Mode (Flat)

  • Explicitly specify DDR or MCDRAM for memory allocation
  • MPI point-to-point BW: no significant performance difference
  • MPI collective Allreduce: clear benefits (up to 67%) with MCDRAM after around 256 KB messages, compared with DDR
  • More parallel processes increase the data being accessed, which can NOT fit in the L2 cache, so MCDRAM yields higher BW
  • Near-native performance for Singularity (less than 8% overhead)

[Figures: "BW w Flat mode" (bandwidth vs. message size) and "MPI_Allreduce w Flat mode" (latency vs. message size) for Singularity(DDR), Native(DDR), Singularity(MCDRAM), Native(MCDRAM); up to 67% benefit from MCDRAM, Singularity within 8% of Native]
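The slides do not state which mechanism was used to place buffers in MCDRAM. One common option in Flat mode is the memkind library's hbw_malloc(), sketched below; another is simply binding the process to the MCDRAM NUMA node with numactl, which needs no code changes.

```c
/* Illustrative sketch: explicitly allocating from KNL MCDRAM in Flat mode
 * with memkind. Build with: gcc alloc_mcdram.c -lmemkind */
#include <hbwmalloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t bytes = 1 << 20;              /* 1 MB example buffer */

    /* hbw_check_available() returns 0 when high-bandwidth memory exists. */
    if (hbw_check_available() != 0) {
        fprintf(stderr, "no MCDRAM/HBM visible, falling back to DDR\n");
        void *ddr = malloc(bytes);
        memset(ddr, 0, bytes);
        free(ddr);
        return 0;
    }

    /* Allocate from the MCDRAM NUMA node instead of DDR4. */
    void *hbm = hbw_malloc(bytes);
    if (!hbm) { perror("hbw_malloc"); return 1; }
    memset(hbm, 0, bytes);               /* touch pages so they are placed */
    printf("buffer of %zu bytes allocated in high-bandwidth memory\n", bytes);
    hbw_free(hbm);
    return 0;
}
```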

SLIDE 46

Evaluation Methodology

  • Virtualization Solution Overhead
    – Singularity vs. Native
  • Dimension 3: High Speed Interconnects
    – InfiniBand
    – Omni-Path

SLIDE 47

High Performance Interconnects (InfiniBand & Omni-Path)

  • MPI_Allreduce
  • InfiniBand: 512 processes across 32 nodes
  • Omni-Path: 128 processes on 2 nodes
  • Near-native performance for Singularity

[Figures: MPI_Allreduce latency (us) vs. message size for Singularity vs. Native, on InfiniBand and on Intel Omni-Path; Singularity within 8% of Native]

SLIDE 48

Put It All Together

  • Virtualization Solution Overhead
    – Singularity vs. Native
  • Combined configurations across the three dimensions (Processor Architecture, Memory Access Mode, High Speed Interconnects):
    – Haswell + InfiniBand
    – KNL + Cache mode + Omni-Path
    – KNL + Flat mode + Omni-Path

SLIDE 49

Application Performance on Haswell with InfiniBand

  • 512 processes across 32 Haswell nodes
  • Singularity delivers near-native performance, less than 7% overhead on Haswell with InfiniBand

[Figures: Class D NAS (CG, EP, FT, IS, LU, MG; execution time in seconds) and Graph500 (problem sizes 22,16 through 26,20; execution time in ms) for Singularity vs. Native; Singularity within 7% of Native]

SLIDE 50

Application Performance on KNL with Omni-Path

  • 128 processes across 2 KNL nodes with Omni-Path
  • Singularity only incurs less than 6% overhead on KNL in both Cache and Flat modes
  • No clear performance difference between DDR and MCDRAM in Flat mode
    – Graph500 heavily utilizes pt2pt communication with 4 Kbyte messages for the BFS search
    – Consistent with the pt2pt performance on KNL with Flat mode

[Figures: Class C NAS with Cache mode (CG, EP, FT, IS, LU, MG; execution time in seconds, Singularity vs. Native) and Graph500 with Flat mode (problem sizes 20,10 through 24,16; Singularity/Native with DDR and MCDRAM); Singularity within 6% of Native]

SLIDE 51

High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand

H. Subramoni, P. Lai, R. Kettimuthu and D. K. Panda

SLIDE 52

Overview

  • GridFTP is a high-performance, secure, reliable extension of the standard FTP, optimized for WAN
  • The Globus XIO framework, used to design GridFTP, offers an easy-to-use interface
  • The framework hides the complications of the communication semantics of the underlying devices (network or disk)

SLIDE 53

Contribution

  • Combining the ease of use of the Globus XIO framework and the high performance achieved through IB
  • Enhancing the disk I/O performance of the existing ADTS library
    – By decoupling the network processing from disk I/O operations
  • Evaluation of the design
    – At the micro-benchmark level
    – With applications like the Community Climate System Model and ultra-scale visualization

SLIDE 54

Problem Statement

  • Most HPC applications require movement of huge amounts of data
    – Needs slower hard disks and RAIDs for storage
    – With the low bandwidth provided by TCP/UDP-based FTP, this was not an issue
    – It will be an issue for Globus ADTS XIO
  • Solution
    – Decoupling of network processing from disk I/O

SLIDE 55

Design of the Globus ADTS XIO Driver

  • Introduction of
    – Multiple threads
      • Read, write and network threads
    – A set of buffers to stage the data
  • The read thread prefetches a set of locations from the disk and keeps them ready for the network
  • Avoid frequent context switches
    – Low and high water marks
    – High water mark: maximum size of the circular buffer
    – Read when the number of available buffers drops below the low water mark
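An illustrative sketch (not the Globus ADTS XIO code) of the staging-buffer idea: a read thread prefetches blocks into a circular buffer and a network thread drains it, with refills triggered only when the count of ready buffers falls below the low water mark, which limits context switches. Buffer counts, sizes, and the fake "disk blocks" are assumptions.

```c
/* Circular staging buffer with low/high water marks (producer-consumer). */
#include <pthread.h>
#include <stdio.h>

#define HIGH_WATER   8        /* max buffers in the circular ring (assumed) */
#define LOW_WATER    2        /* refill trigger                              */
#define BUF_SIZE     4096
#define TOTAL_BLOCKS 32       /* "file" length in blocks (assumed)           */

static char ring[HIGH_WATER][BUF_SIZE];
static int  ready = 0, head = 0, tail = 0, produced = 0, consumed = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  refill = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  drain  = PTHREAD_COND_INITIALIZER;

static void *read_thread(void *arg)    /* stands in for disk prefetching */
{
    (void)arg;
    while (produced < TOTAL_BLOCKS) {
        pthread_mutex_lock(&lock);
        /* Sleep until the consumer empties the ring down to LOW_WATER. */
        while (ready > LOW_WATER)
            pthread_cond_wait(&refill, &lock);
        /* Top the ring back up to the high water mark in one burst. */
        while (ready < HIGH_WATER && produced < TOTAL_BLOCKS) {
            snprintf(ring[head], BUF_SIZE, "disk block %d", produced++);
            head = (head + 1) % HIGH_WATER;
            ready++;
        }
        pthread_cond_signal(&drain);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *network_thread(void *arg) /* stands in for sending over the wire */
{
    (void)arg;
    while (consumed < TOTAL_BLOCKS) {
        pthread_mutex_lock(&lock);
        while (ready == 0)
            pthread_cond_wait(&drain, &lock);
        printf("sending: %s\n", ring[tail]);
        tail = (tail + 1) % HIGH_WATER;
        ready--; consumed++;
        if (ready <= LOW_WATER)        /* wake the read thread to refill */
            pthread_cond_signal(&refill);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t rd, net;
    pthread_create(&rd, NULL, read_thread, NULL);
    pthread_create(&net, NULL, network_thread, NULL);
    pthread_join(rd, NULL);
    pthread_join(net, NULL);
    return 0;
}
```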

SLIDE 56

Evaluation

  • Varying network delays
  • Transmit 128 GB of aggregate data as multiple 256 MB files
  • Legend: staging buffer sizes

SLIDE 57

Evaluation

  • Varying network delays
  • Transmit 128 GB of aggregate data as multiple 256 MB files
  • Legend: staging buffer sizes

SLIDE 58

Evaluation

  • Community Climate System Model (CCSM)
    – National Center for Atmospheric Research (NCAR) and Lawrence Livermore National Laboratory (LLNL)
    – Most files are 256 MB
  • Ultra-Scale Visualization
    – ORNL and UC Davis
    – Most files are 2.6 GB

SLIDE 59

Thank You!

Network-Based Computing Laboratory http://nowlab.cse.ohio-state.edu/