Pilot-Streaming: Design Considerations for a Stream Processing Framework for High-Performance Computing



SLIDE 1

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High-Performance Computing

Andre Luckow, Peter M. Kasson, Shantenu Jha
STREAMING 2016, 03/23/2016
RADICAL, Rutgers, http://radical.rutgers.edu

SLIDE 2

Motivation

There is a need to couple data sources, HPC, and analytics! 20+ applications identified at STREAM16.

Challenges:

  • Data applications and pipelines are complex
  • Scalability and elasticity: dynamic changes in resource demands
  • Scheduling and provisioning of resources: the right amount of resources at the right time
  • Programming models: HPC (MPI, OpenMP, GPU) vs. Big Data (Java, Python, R)
  • Interoperability: data sources and sinks are often in different environments (IoT, cloud, HPC, HPDC) than compute

Current State:

  • Streaming (in sciences) is often implemented at the application level (with limited re-use)
  • Manifold landscape of streaming tools (Apache open source tools, cloud tools)

SLIDE 3

Workload Characteristics

[Figure: simulation and analysis workloads placed on HPC resources. Labels: HPC Resource; HPC Resource 1; HPC Resource 2; Simulation; Analysis]

SLIDE 4

Workload Characteristics

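[Figure: simulation and analysis workloads placed on HPC resources, with a message broker. Labels: HPC Resource 1; HPC Resource 2; HPC Resource 3; Message Broker; Simulation; Analysis 1; Analysis 2]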
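
The labels on this slide (Simulation, Message Broker, Analysis 1, Analysis 2) suggest a simulation whose output is consumed by two independent analyses through a broker. The following is a minimal sketch of that decoupling, using only the Python standard library. The `Broker` class, the message layout, and `run_pipeline` are hypothetical stand-ins invented for illustration; they are not the Pilot-Streaming API or the API of any real broker.

```python
import queue
import threading

STOP = None  # sentinel marking the end of the stream (an assumption of this sketch)


class Broker:
    """Hypothetical in-process stand-in for a message broker."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        # Each consumer gets its own queue, so every consumer sees every message.
        inbox = queue.Queue()
        with self._lock:
            self._subscribers.append(inbox)
        return inbox

    def publish(self, message):
        with self._lock:
            for inbox in self._subscribers:
                inbox.put(message)


def simulation(broker, steps):
    # Producer: knows only the broker, not who consumes its output.
    for step in range(steps):
        broker.publish({"step": step, "value": step * 0.5})
    broker.publish(STOP)


def analysis(inbox, reduce_fn, results, name):
    # Consumer: reads from its own inbox until the stream ends.
    values = []
    while True:
        message = inbox.get()
        if message is STOP:
            break
        values.append(message["value"])
    results[name] = reduce_fn(values)


def run_pipeline(steps=5):
    broker = Broker()
    results = {}
    # Subscriptions are created before the simulation starts, so no message is missed.
    consumers = [
        threading.Thread(target=analysis, args=(broker.subscribe(), sum, results, "analysis_1")),
        threading.Thread(target=analysis, args=(broker.subscribe(), max, results, "analysis_2")),
    ]
    for thread in consumers:
        thread.start()
    simulation(broker, steps)
    for thread in consumers:
        thread.join()
    return results
```

Calling `run_pipeline(5)` runs one producer and two consumers. Neither analysis is referenced by the simulation code, which is the point of routing data through a broker: producers and consumers can in principle be placed on different resources and started or stopped independently.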

SLIDE 5

Introduction: Pilot Abstraction

[Figure: the Pilot abstraction. Labels: User Application; User Space; System Space; Pilot-Job System; Policies; Pilot-Job (three instances); Resource Manager; Resource A; Resource B; Resource C; Resource D]

http://arxiv.org/abs/1207.6644

SLIDE 6

The Convergence of HPC and “Data Intensive” Computing

[Figure: two software stacks side by side. Labels recovered from the figure, grouped by stack; some labels were truncated in extraction and are omitted.]

High-Performance Computing:

  • Applications
  • Advanced Analytics & Machine Learning (Pilot-KMeans, Replica Exchange)
  • MPI Frameworks for Advanced Analytics & Machine Learning (Blas, ScaLAPACK, CompLearn, PetSc, Blast)
  • MapReduce Frameworks (Pilot-MapReduce)
  • Declarative Languages (Swift)
  • Orchestration (Pegasus, Taverna, Dryad, Swift)
  • Workload Management (Pilots, Condor)
  • Cluster Resource Manager (Slurm, Torque, SGE)
  • Storage Management (iRODS, SRM, GFFS)
  • Data Access (Virtual Filesystem, GridFTP, SSH)
  • Compute Resources (Nodes, Cores, VMs)
  • Storage Resources (Lustre, GPFS)
  • MPI, RDMA

Apache Hadoop Big Data:

  • Applications
  • Advanced Analytics & Machine Learning (Mahout, R, MLBase)
  • Orchestration (Oozie, Pig)
  • SQL-Engines (Impala, Hive, Shark, Phoenix)
  • Data Store & Processing (HBase)
  • MapReduce, with Map Reduce Scheduler
  • In-Memory (Spark), with Spark Scheduler
  • Twister MapReduce, with Twister Scheduler
  • Higher-Level Workload Management (TEZ, LLama)
  • Cluster Resource Manager (YARN, Mesos)
  • Compute and Data Resources (Nodes, Cores, HDFS)
  • Hadoop Shuffle/Reduction, HARP Collectives

A Tale of Two Data-Intensive Paradigms: Data Intensive Applications, Abstractions and Architectures. In collaboration with Geoffrey Fox (Indiana), http://arxiv.org/abs/1403.1528

SLIDE 7

Pilot-Abstraction for HPC and Hadoop Interoperability

[Figure: Pilot-Jobs spanning system-level scheduling and application-level scheduling in two modes.]

  • Mode I: Hadoop on HPC
  • Mode II: HPC on Hadoop

Labels: HPC Scheduler (Slurm, Torque, SGE); YARN; YARN/HDFS; Spark; Pilot-Job; Map Reduce; Spark-App; Other YARN App; Hadoop Application Scheduler (e.g. Spark, Tez, LLama); HPC App (e.g. MPI); Hadoop/Spark App; Application

http://arxiv.org/abs/1602.00345

SLIDE 8

Streaming and Batch Computing

[Figure: layered view of batch and streaming components. Labels: Storage and Format (e.g. Lustre, HDFS, …); Compute (e.g. YARN, SLURM, Torque, PBS); Streaming Framework; Stream Processing; ETL; Hadoop; SQL; Machine Learning; Raw Text; Columnar Data; HDF5; Other; Mutable/Random Access; Message Broker Storage; Broker (three instances)]

Questions:

  • How to manage batch and streaming frameworks side-by-side?
  • How to enable interoperability between different programming systems/models/middleware/schedulers?
  • How to enable elasticity?

http://dx.doi.org/10.5281/zenodo.47946

SLIDE 9

Pilot-Streaming

[Figure: Pilot-Streaming architecture, spanning user space and infrastructure.]

  • User space: Distributed Application; Pilot API; Pilot Compute; Pilot Data; SAGA
  • Compute infrastructure: HPC; HTC (OSG/EGI); Cloud (EC2 VM); YARN/Hadoop; each with a Pilot Agent and nodes reached over SSH
  • Data infrastructure: Local/Parallel FS (SSH/GO); Globus Online; SRM (iRODS); S3 (HTTP); HDFS (WebHDFS); Local/EBS (SSH); GFFS; Local (iRODS); iRODS; HDFS; Kafka

SLIDE 10

Conclusion

  1. Pilot-Jobs enable the co-location of HPC/simulations and Big Data tools (Hadoop, Spark, higher-level tools)
  2. Pilot-Streaming will support a message broker as a data source/sink, which enables the decoupling of applications
  3. Dynamic resource management provided by the Pilot-Abstraction is critical for streaming environments
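
To make the third point concrete, here is a minimal sketch of one possible sizing rule for a streaming workload: choose enough pilots to keep up with the incoming message rate and to drain any backlog within a target time. This policy is an assumption made for illustration and does not come from the slides; `pilots_needed` and all of its parameters are hypothetical names, not part of any Pilot API.

```python
import math


def pilots_needed(arrival_rate, per_pilot_rate, backlog, drain_seconds,
                  min_pilots=1, max_pilots=16):
    """Hypothetical sizing rule: pilots required to keep up with a stream.

    arrival_rate   -- incoming messages per second
    per_pilot_rate -- messages per second one pilot can process (assumed known)
    backlog        -- messages already waiting in the broker
    drain_seconds  -- time budget for working off the backlog
    """
    if per_pilot_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_pilot_rate and drain_seconds must be positive")
    # Throughput needed to match arrivals and clear the backlog in time.
    required_rate = arrival_rate + backlog / drain_seconds
    wanted = math.ceil(required_rate / per_pilot_rate)
    # Clamp to the allocation limits of the underlying resource.
    return max(min_pilots, min(max_pilots, wanted))
```

For example, with 100 messages/s arriving, 50 messages/s per pilot, and no backlog, the rule asks for 2 pilots; a backlog of 3000 messages to be drained in 60 s raises that to 3. A real system would re-evaluate such a rule periodically and add or release pilots accordingly.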

SLIDE 11

Thank you!