Distributed Deep Learning Using Hopsworks
SF Machine Learning Meetup (Mesosphere)
Kim Hammar, kim@logicalclocks.com

Sections: Introduction · Hopsworks · Distributed Deep Learning · Parallel Black-Box Optimization · Summary
DISTRIBUTED COMPUTING + DEEP LEARNING = ?

[Figure: a distributed compute cluster alongside a feed-forward neural network]
Why combine the two?
◮ We like challenging problems
◮ More productive data science
◮ The unreasonable effectiveness of data¹
◮ To achieve state-of-the-art results²
1. Chen Sun et al. "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
2. Jeffrey Dean et al. "Large Scale Distributed Deep Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING

[Figure: training throughput scales predictably as more workers/GPUs are added³]

3. Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL IS NOT A SECRET ANYMORE⁴

[Figure: frameworks for DDL (Distributed TF, TensorflowOnSpark, CaffeOnSpark, …) and companies using DDL]

4. Tal Ben-Nun and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK

[Figure: distributed training (executors e1–e4 exchanging gradients ∇) is just one small box in a much larger stack]

Around distributed training sits the rest of the stack: distributed systems, data validation, feature engineering, data collection, hardware management, hyperparameter tuning, model serving, pipeline management, A/B testing, and monitoring.
OUTLINE

1. Hopsworks: background of the platform
2. Managed distributed deep learning using HopsYARN, HopsML, PySpark, and TensorFlow
3. Black-box optimization using Hopsworks, the metadata store, PySpark, and Maggy⁵

5. Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
HOPSWORKS

The platform is built up in layers:

◮ HopsFS and HopsYARN (GPU/CPU as a resource)
◮ Frameworks (ML/Data)
◮ Distributed metadata (available from a REST API)
◮ ML/AI assets: Feature Store, Pipelines, Experiments, Models
◮ APIs, for example:

    from hops import featurestore
    from hops import experiment

    # Bind the fetched features so they can be passed to training
    features = featurestore.get_features([
        "average_attendance", "average_player_age"])
    experiment.collective_all_reduce(features, model)
INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING

Outer loop: a search method proposes hyperparameters h and receives back an evaluation metric τ.

Inner loop: worker1 … workerN each train a model replica, exchanging data and synchronizing gradients ∇1, …, ∇N.

[Figure: the outer loop (search method ↔ metric τ) wrapped around N workers performing data exchange and gradient synchronization]
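In pseudocode, the two loops interact roughly as follows (a minimal sketch; the `SearchMethod`-style interface and the `train` function are illustrative names, not a Hopsworks API):

    # Outer loop: propose hyperparameters h, observe metric tau.
    # The inner loop is hidden inside train(h), which runs one
    # (possibly distributed) training job and returns its metric.
    def optimize(search_method, train, num_trials):
        for _ in range(num_trials):
            h = search_method.suggest()        # hparams h
            tau = train(h)                     # inner loop -> metric τ
            search_method.observe(h, tau)      # update the search state
        return search_method.best_trial()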
INNER LOOP: DISTRIBUTED DEEP LEARNING

[Figure: executors e1–e4, each computing a gradient ∇ on its own data partition p1–p4]

In data-parallel training, each executor computes a gradient over its partition of the data, and the gradients are combined before the shared model is updated.
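Concretely, with N workers one synchronous data-parallel step applies the averaged gradient, θ ← θ − η · (1/N) · (∇1 + … + ∇N); averaging is the common convention (some implementations sum instead), and performing this synchronization efficiently is exactly what collective all-reduce provides.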
DISTRIBUTED DEEP LEARNING IN PRACTICE

◮ Implementation of distributed algorithms is becoming a commodity (TF, PyTorch, etc.)
◮ The hardest parts of DDL are now:
  ◮ Cluster management
  ◮ Allocating GPUs
  ◮ Data management
  ◮ Operations & performance

[Figure: models, GPUs, data, and distribution as the pieces that have to be fitted together]
HOPSWORKS DDL SOLUTION

    from hops import experiment
    experiment.collective_all_reduce(train_fn)

[Figure: the client API drives a Spark driver (a YARN container with a conda env), which sends resource requests to the HopsYARN RM; each Spark executor runs in a YARN container with a GPU as a resource and a conda env; the executors exchange their IPs (e.g. "Here is my ip: 192.168.1.1"), compute gradients ∇, and read/write the Hops Distributed File System (HopsFS)]

◮ Hide complexity behind a simple API
◮ Allocate resources using PySpark
◮ Allocate GPUs for Spark executors using HopsYARN
◮ Serve sharded training data to workers from HopsFS
◮ Use HopsFS for aggregating logs, checkpoints and results
◮ Store experiment metadata in the metastore
◮ Use dynamic allocation for interactive resource management
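To make the API concrete, a training function handed to collective_all_reduce might look like the following (a minimal sketch: the Keras model and MNIST data are illustrative, and it assumes the platform wires up the inter-worker configuration, as in the IP exchange above):

    from hops import experiment

    def train_fn():
        # Runs inside every Spark executor that Hopsworks allocates.
        import tensorflow as tf

        # Illustrative data and model, not from the slides.
        (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
        x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=1, batch_size=128)

    experiment.collective_all_reduce(train_fn)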
OUTER LOOP: BLACK BOX OPTIMIZATION

From the outer loop's perspective, training is a black box: features x1 … xn and hyperparameters (learning rate η, num_layers, neurons) feed a model θ, which emits a prediction ŷ; the loss L(y, ŷ) and its gradient ∇θL(y, ŷ) drive the inner loop, while the outer loop only observes the resulting metric.
Example use-case from one of our clients:

◮ Goal: train a one-class GAN model for fraud detection
◮ Problem: GANs are extremely sensitive to hyperparameters, and there is a very large space of possible hyperparameters
◮ Example hyperparameters to tune: learning rates η, optimizers, layers, etc.

[Figure: a generator turns random noise z into candidate samples; a discriminator scores them against real input x]
Parallelizing the search raises a set of questions:

[Figure: a 3-D search space over learning rate (0.00–0.10), number of layers (2–12), and neurons per layer (25–45); a shared task queue of configurations η1, …, η5 feeds parallel workers]

◮ Which algorithm to use for search?
◮ How to monitor progress?
◮ How to aggregate results?
◮ Fault tolerance?

This should be managed with platform support!
MAGGY: A FRAMEWORK FOR SYNCHRONOUS/ASYNCHRONOUS HYPERPARAMETER TUNING ON HOPSWORKS⁷

A flexible framework for running different black-box optimization algorithms on Hopsworks:

◮ ASHA, Hyperband, Differential Evolution, Random Search, Grid Search, etc.

7. Authors of Maggy: Moritz Meister and Sina Sheikholeslami. Author of the base framework that Maggy builds on: Robin Andersson.
FRAMEWORK SUPPORT FOR SYNCHRONOUS SEARCH ALGORITHMS

This fits very well with Spark's BSP (bulk-synchronous parallel) model.

[Figure: un-directed search runs N Spark tasks and the driver/parameter server collects N eval metrics and takes the max/min; synchronous directed search repeats this over iterations, with a synchronization barrier between rounds]

◮ Parallel un-directed/synchronous search is trivial using Spark and a distributed file system (see the sketch below)
◮ Examples of un-directed search algorithms: random and grid search
◮ Example of a synchronous search algorithm: differential evolution
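As an illustration, un-directed search maps onto a single Spark stage (a sketch under assumptions: `sc` is an existing SparkContext, and `train_and_eval` is a hypothetical function that trains a model for one configuration and returns its metric):

    # One Spark task per hyperparameter configuration; the driver
    # collects the N eval metrics and takes the max.
    configs = [{'lr': lr, 'layers': n}
               for lr in (0.1, 0.01, 0.001)
               for n in (2, 4, 8)]

    results = (sc.parallelize(configs, len(configs))
                 .map(lambda c: (train_and_eval(c), c))
                 .collect())

    best_metric, best_config = max(results, key=lambda r: r[0])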
PROBLEM WITH THE BULK-SYNCHRONOUS PROCESSING MODEL FOR PARALLEL SEARCH

[Figure: N Spark tasks behind a synchronization barrier; fast tasks sit waiting for stragglers, wasting compute]

◮ Synchronous search is sensitive to stragglers and not suitable for early stopping
◮ For large-scale search problems we need asynchronous search
◮ Problem: asynchronous search is much harder to implement with big data processing tools such as Spark
ENTER MAGGY: A FRAMEWORK FOR RUNNING ASYNCHRONOUS SEARCH ALGORITHMS ON HOPS

[Figure: one Spark task per worker, with many async tasks running inside it; workers asynchronously fetch new tasks from the async task queue/driver/parameter server over an RPC framework and write checkpoints & results to HopsFS]

◮ Robust against stragglers
◮ Supports early stopping
◮ Fault tolerance with checkpointing
◮ Monitoring with Tensorboard
◮ Log aggregation with HopsFS
◮ Simple and extendable API
MAGGY: ASYNCHRONOUS SEARCH WORKFLOW

A coordinator runs a black-box optimizer solving min_x f(x), x ∈ S, and maintains a global task queue of suggested trials. Each worker pulls a trial (hyperparameters λ), trains a model, and reports the resulting metric α. Heartbeats stream intermediate metrics back to the coordinator, which can early-stop unpromising trials, and checkpoints are written so that trials survive failures.

[Figure: coordinator with global task queue and a trial-progress plot (accuracy vs. epochs for different lr/layers settings), exchanging suggested tasks, results, heartbeats, early-stop signals, and checkpoints with the workers]
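The worker side of this protocol can be sketched as follows (illustrative pseudocode only; `coordinator`, `train_one_epoch`, and the method names are assumptions, not Maggy's actual RPC interface):

    def worker_loop(coordinator):
        while True:
            trial = coordinator.get_suggestion()  # pull from the global task queue
            if trial is None:                     # queue drained: experiment finished
                break
            metric = None
            for _ in range(trial.max_epochs):
                metric = train_one_epoch(trial.params)
                coordinator.heartbeat(trial.id, metric)      # report progress
                if coordinator.should_early_stop(trial.id):  # coordinator's decision
                    break
            coordinator.finalize(trial.id, metric)  # final result for the optimizer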
MAGGY: API

Users extend the AbstractOptimizer base class to implement their own algorithms:

    class RandomSearch(AbstractOptimizer):

        def initialize(self):
            # Initialize the search space
            # ..

        def get_suggestion(self, trial=None):
            # Produce suggestions to be evaluated by workers
            # ..

        def finalize_experiment(self, trials):
            # Aggregate results
            # ..

        def early_check(self, to_check, trials, direction):
            # Configure the early-stop policy
            # ..
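For intuition, a minimal get_suggestion for random search could look like this (a sketch under assumptions: a dict-like searchspace of (low, high) bounds and a hypothetical Trial wrapper; Maggy's real classes may differ):

    import random

    def get_suggestion(self, trial=None):
        # Ignore the finished trial and draw a fresh random point.
        params = {name: random.uniform(low, high)
                  for name, (low, high) in self.searchspace.items()}
        return Trial(params)  # Trial is assumed to wrap a parameter dict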
Running a search is then a single call:

    from maggy import experiment
    from maggy.searchspace import Searchspace
    from maggy.randomsearch import RandomSearch

    sp = Searchspace(argument_param=('DOUBLE', [1, 5]))
    rs = RandomSearch(5, sp)

    result = experiment.launch(train_fn, sp, optimizer=rs,
                               num_trials=5, name='test',
                               direction="max")
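A training function compatible with this call might look like the following toy sketch (assumptions: Maggy passes each search-space parameter as a keyword argument and injects a `reporter` for heartbeat metrics, matching the workflow above; the objective itself is made up):

    def train_fn(argument_param, reporter):
        # Toy objective that peaks at argument_param == 3.
        metric = -(argument_param - 3.0) ** 2
        # Heartbeat so the coordinator can monitor/early-stop the trial
        # (reporter API assumed here).
        reporter.broadcast(metric=metric)
        return metric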
SUMMARY

◮ Deep learning is going distributed
◮ Algorithms for DDL are available in several frameworks
◮ Applying DDL in practice brings a lot of operational complexity
◮ Hopsworks is a platform for scale-out deep learning and big data processing
◮ Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments, and much more

@hopshadoop www.hops.io
@logicalclocks www.logicalclocks.com

We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops

Thanks to the Logical Clocks team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, and Rasmus Toivonen. And our interns: Moritz Meister and Sina Sheikholeslami.
REFERENCES

◮ Example notebooks: https://github.com/logicalclocks/hops-examples
◮ HopsML⁸
◮ Hopsworks⁹
◮ Hopsworks' feature store¹⁰
◮ Maggy: https://github.com/logicalclocks/maggy

8. Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
9. Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
10. Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.