Andre Luckow, Peter M. Kasson, Shantenu Jha STREAMING 2016, 03/23/2016 RADICAL, Rutgers, http://radical.rutgers.edu
Pilot-Streaming: Design Considerations for a Stream Processing Framework for High-Performance Computing
Motivation
There is a need to couple data sources, HPC, and analytics: 20+ applications were identified at STREAM16.
Challenges:
- Data applications and pipelines are complex
- Scalability and Elasticity: dynamic changes in resource demands
- Scheduling and provisioning of resources: the right amount of resources at the right time
- Programming models: HPC (MPI, OpenMP, GPU) vs. Big Data (Java, Python, R)
- Interoperability: data sources and sinks are often in different environments (IoT, cloud, HPC, HPDC) than compute
Current State:
- Streaming (in the sciences) is often implemented at the application level (with limited re-use)
- Diverse landscape of streaming tools (Apache open-source tools, cloud tools)
Workload Characteristics
[Figure: Simulation and Analysis tasks placed on HPC resources (HPC Resource 1, HPC Resource 2)]
Workload Characteristics
[Figure: Simulation, Analysis 1, and Analysis 2 on HPC Resources 1-3, connected through a Message Broker]
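The brokered coupling between simulation and analysis can be illustrated with a minimal sketch. It is not the Pilot-Streaming implementation: an in-process `queue.Queue` stands in for a real message broker, and the names `simulation`, `analysis`, `run_pipeline`, and `SENTINEL` are hypothetical. The point is only that producer and consumer share nothing except the broker.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)


def simulation(broker, steps):
    """Producer: publishes one message per simulation step."""
    for step in range(steps):
        broker.put({"step": step, "value": step * step})
    broker.put(SENTINEL)


def analysis(broker, results):
    """Consumer: reads messages until the end-of-stream marker arrives."""
    while True:
        msg = broker.get()
        if msg is SENTINEL:
            break
        results.append(msg["value"])


def run_pipeline(steps=5):
    # Bounded queue: the producer blocks when the consumer falls behind.
    broker = queue.Queue(maxsize=2)
    results = []
    consumer = threading.Thread(target=analysis, args=(broker, results))
    consumer.start()
    simulation(broker, steps)
    consumer.join()
    return results
```

Because both sides only talk to the broker, the analysis could in principle run on a different resource than the simulation; a second analysis would need a broker that fans messages out to several consumers, which this single-queue sketch does not model.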
Introduction: Pilot Abstraction
[Figure: a User Application in user space submits to a Pilot-Job System with policies; Pilot-Jobs are placed through the system-space Resource Manager onto Resources A-D]
http://arxiv.org/abs/1207.6644
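A toy sketch of the pilot idea, assuming nothing about the actual Pilot API: a placeholder job first acquires a fixed set of slots, and the application then schedules its own tasks into those slots without going back to the system-level resource manager. The `Pilot` class, `run_tasks`, and `square` are hypothetical names, and a local thread pool stands in for the acquired resource.

```python
from concurrent.futures import ThreadPoolExecutor


class Pilot:
    """Hypothetical placeholder job holding `cores` worker slots."""

    def __init__(self, resource, cores):
        self.resource = resource  # label only, e.g. "Resource A"
        self._pool = ThreadPoolExecutor(max_workers=cores)

    def submit(self, fn, *args):
        # Application-level scheduling: the task runs in an already-held slot.
        return self._pool.submit(fn, *args)

    def cancel(self):
        # Release the slots once the application is done with them.
        self._pool.shutdown(wait=True)


def run_tasks(pilot, fn, inputs):
    """Submit one task per input and collect results in input order."""
    futures = [pilot.submit(fn, x) for x in inputs]
    return [f.result() for f in futures]


def square(x):
    return x * x
```

Usage would look like `pilot = Pilot("Resource A", cores=2)`, then `run_tasks(pilot, square, [1, 2, 3])`, then `pilot.cancel()`. The design choice shown is the two-level split: one allocation request, many application-scheduled tasks.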
The Convergence of HPC and “Data Intensive” Computing
[Figure: side-by-side software stacks of the two ecosystems]
- High-Performance Computing: Compute Resources (Nodes, Cores, VMs); Storage Resources (Lustre, GPFS); Cluster Resource Manager (Slurm, Torque, SGE); Data Access (Virtual Filesystem, GridFTP, SSH); Storage Management (iRODS, SRM, GFFS); Workload Management (Pilots, Condor); Orchestration (Pegasus, Taverna, Dryad, Swift); Declarative Languages (Swift); MPI Frameworks for Advanced Analytics & Machine Learning (Blas, ScaLAPACK, CompLearn, PetSc, Blast); MapReduce Frameworks (Pilot-MapReduce); Advanced Analytics & Machine Learning (Pilot-KMeans, Replica Exchange); Applications
- Apache Hadoop Big Data: Compute and Data Resources (Nodes, Cores, HDFS); Cluster Resource Manager (YARN, Mesos); Higher-Level Workload Management (TEZ, LLama); MapReduce with Map Reduce Scheduler; In-Memory (Spark) with Spark Scheduler; Twister MapReduce with Twister Scheduler; Data Store & Processing (HBase); SQL-Engines (Impala, Hive, Shark, Phoenix); Orchestration (Oozie, Pig); Advanced Analytics & Machine Learning (Mahout, R, MLBase); Applications
- Communication: MPI, RDMA; Hadoop Shuffle/Reduction, HARP Collectives
A Tale of Two Data-Intensive Paradigms: Data Intensive Applications, Abstractions and Architectures
In collaboration with Geoffrey Fox (Indiana), http://arxiv.org/abs/1403.1528
Pilot-Abstraction for HPC and Hadoop Interoperability
[Figure: two integration modes, each split into system-level and application-level scheduling]
- Mode I: Hadoop on HPC. An HPC Scheduler (Slurm, Torque, SGE) hosts a Pilot-Job that runs YARN/HDFS or Spark; MapReduce, Spark apps, and other YARN apps run on top.
- Mode II: HPC on Hadoop. A Hadoop Application Scheduler (e.g. Spark, Tez, LLama) hosts a Pilot-Job that runs HPC apps (e.g. MPI) alongside Hadoop/Spark apps.
http://arxiv.org/abs/1602.00345
Streaming and Batch Computing
[Figure: layered stack. Storage and Format (e.g. Lustre, HDFS, …): raw text, columnar data, HDF5, other, mutable/random access storage, message broker storage. Compute (e.g. YARN, SLURM, Torque, PBS). Frameworks: ETL, Hadoop, SQL, Machine Learning, and a Streaming Framework for Stream Processing fed by Message Brokers.]
Questions:
- How to manage batch and streaming frameworks side-by-side?
- How to enable interoperability between different programming systems/models/middleware/schedulers?
- How to enable elasticity?
http://dx.doi.org/10.5281/zenodo.47946
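One way to make the elasticity question concrete is a scaling rule that maps broker backlog to a pilot count. The rule below is an illustrative assumption, not a policy taken from the slides: `pilots_needed`, its parameters, and the default bounds are all hypothetical.

```python
import math


def pilots_needed(backlog, rate_per_pilot, target_seconds,
                  min_pilots=1, max_pilots=8):
    """Return how many pilots would drain `backlog` messages in `target_seconds`.

    rate_per_pilot: messages one pilot processes per second (assumed constant).
    The result is clamped to [min_pilots, max_pilots].
    """
    if rate_per_pilot <= 0 or target_seconds <= 0:
        raise ValueError("rate_per_pilot and target_seconds must be positive")
    wanted = math.ceil(backlog / (rate_per_pilot * target_seconds))
    return max(min_pilots, min(max_pilots, wanted))
```

For example, a backlog of 2500 messages at 100 messages per second per pilot and a 10-second target gives 3 pilots. A real system would also have to account for pilot startup latency and queue wait times on the HPC resource, which this rule ignores.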
Pilot-Streaming
[Figure: Pilot-Streaming architecture. In user space, a Distributed Application uses the Pilot API (Pilot Compute, Pilot Data) on top of SAGA. On the infrastructure side, Pilot Agents run on HPC nodes (SSH), YARN/Hadoop nodes, HTC resources (OSG/EGI), and cloud EC2 VMs. Data backends: Local/Parallel FS (SSH/GO), Globus Online, SRM, iRODS, GFFS, S3 (HTTP), Local/EBS (SSH), HDFS (WebHDFS), Kafka.]
Conclusion
- 1. Pilot-Jobs enable the co-location of HPC/simulations and Big Data tools (Hadoop, Spark, higher-level tools)
- 2. Pilot-Streaming will support message brokers as data sources/sinks, which enables the decoupling of applications
- 3. Dynamic resource management provided by the Pilot-