DSC 102 Systems for Scalable Analytics Arun Kumar Topic 7: ML - - PowerPoint PPT Presentation



SLIDE 1

Topic 7: ML Deployment (not included for the Final Exam)

Arun Kumar


DSC 102 Systems for Scalable Analytics

Slide Content ACKs: Alkis Polyzotis, Manasi Vartak

SLIDE 2


The Lifecycle of ML-based Analytics

Data acquisition → Data preparation → Feature Engineering → Training & Inference → Model Selection → Model Serving → Monitoring

SLIDE 3


Deployment Stage of Data Science

❖ Data science does not exist in a vacuum. It must interplay with the data-generating process and the prediction application
❖ Deploy Stage: Integrate the trained prediction function(s) with the production environment, e.g., offline inference in a data system, online inference on a Web platform / IoT / etc.
❖ Typically, the data scientist must work with “DevOps” or “MLOps” engineers to achieve this
SLIDE 4


https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

ML in Academia vs Production

What your classes on statistics, ML, AI, etc. cover! ☺

SLIDE 5


Deployment Stage of Data Science

❖ Deployment stage typically involves 5 main activities in sync with other stages:

  • 1. Packaging and Orchestration
  • 2. Prediction Serving
  • 3. Data Validation
  • 4. Prediction Monitoring
  • 5. Versioning
SLIDE 6


  • 1. Packaging and Orchestration

❖ Basic Goal: Bundle up the software to deploy, along with its dependencies, into a lightweight standalone executable that can run almost seamlessly across different OSs and hardware environments
❖ Most common approach today: Containerization
❖ Not specific to ML deployment but highly general
❖ The older-generation approach, “virtual machines,” included the OS too and was bulky and slow
❖ Docker and Kubernetes are the most popular options today
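As a purely illustrative sketch of containerization, a minimal Dockerfile for a hypothetical Python prediction service might look like the following (the file names `model.pkl`, `serve.py`, and `requirements.txt` are made up for this example):

```dockerfile
# Start from a slim official Python base image
FROM python:3.10-slim
WORKDIR /app
# Install the service's pinned dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the trained prediction function and the serving code
COPY model.pkl serve.py ./
EXPOSE 8080
# Launch the prediction service when the container starts
CMD ["python", "serve.py"]
```

The resulting image bundles code, dependencies, and runtime, so the same artifact runs on a laptop, a cluster, or the cloud.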

SLIDE 7


https://medium.com/edureka/kubernetes-vs-docker-45231abeeaf1

  • 1. Packaging and Orchestration
SLIDE 8


❖ Often, one might need to deploy end-to-end pipelines with effectively independent containerized software modules
❖ Workflow orchestration tools help handle such complex pipelines
❖ Can specify time constraints, operational constraints, etc.
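As a toy illustration of what an orchestration tool does (not a real orchestrator's API), the sketch below runs a made-up pipeline of independent steps in dependency order, the way tools like Airflow or Kubeflow Pipelines do at scale; the task names are hypothetical:

```python
# Toy pipeline orchestration: execute tasks in topological order.
from graphlib import TopologicalSorter

# Each task maps to the tasks it depends on.
pipeline = {
    "acquire": [],
    "prepare": ["acquire"],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "deploy": ["train"],
}

def run_pipeline(dag):
    """Execute tasks in a valid dependency order; return that order."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        pass  # in practice: launch the task's container and await completion
    return order

print(run_pipeline(pipeline))
```

A real orchestrator adds scheduling, retries, and monitoring on top of this core dependency-ordering idea.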

  • 1. Packaging and Orchestration
SLIDE 9


❖ Cloud providers are also starting to make it easier to package and deploy prediction software, e.g., Model Endpoint in AWS SageMaker
❖ Data scientists must look out for their organization’s tools and services

  • 1. Packaging and Orchestration
SLIDE 10


❖ Basic Goal: Make ML inference fast and potentially co-optimize with the serving environment/infrastructure
❖ Typically automated tools; so data scientists only need to know what systems are available and how to use them
❖ 3 main kinds of systems:
  • Program optimization of the prediction function to improve hardware utilization, e.g., ONNX Runtime or Apache TVM
  • Batch optimization of many concurrent prediction requests to better balance latency and throughput and improve hardware utilization, e.g., AWS SageMaker
  • New hardware optimized for inference, e.g., TPUs

  • 2. Prediction Serving
SLIDE 11


❖ Basic Goal: Ensure the data fed into the prediction function conforms to its expectations on, say, schema/syntax/shape, integrity constraints (e.g., value ranges or domains), etc.
❖ Needs to be in lock step with the data sourcing stage: acquiring, re-organizing, cleaning, and feature extraction
❖ Industry is starting to build platforms to make this process more rigorous and reusable, e.g., TensorFlow Extended
❖ Data scientists must learn their organization’s data validation practices and tools/APIs
❖ Also covered in Alkis’s guest lecture; further reading: https://mlsys.org/Conferences/2019/doc/2019/167.pdf
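The schema-and-range checks described above can be sketched as a small validator. This is an illustrative example, not a real platform's API; the field names and ranges are made up:

```python
# Hypothetical pre-prediction data validation: check each input record
# against an expected schema and value ranges before inference.
EXPECTED_SCHEMA = {"age": int, "income": float}
VALUE_RANGES = {"age": (0, 120), "income": (0.0, 1e7)}

def validate(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
        else:
            lo, hi = VALUE_RANGES[field]
            if not (lo <= record[field] <= hi):
                errors.append(f"out-of-range {field}: {record[field]}")
    return errors

print(validate({"age": 34, "income": 52000.0}))  # [] (record passes)
print(validate({"age": -5}))  # out-of-range age and missing income
```

Systems like TensorFlow Extended generalize this idea by inferring the expected schema from training data and flagging serving records that deviate from it.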

  • 3. Data Validation
SLIDE 12


❖ Basic Goal: Ensure the prediction functions are working as intended by the data scientist; “silent failures” can happen due to concept drift, i.e., the data distribution has deviated significantly from when the prediction function was built!
❖ Example: A sudden world event changes Web user behavior drastically, e.g., WHO declares a pandemic! ☺
❖ Needs to be in lock step with the model building stage
❖ Industry today uses ad hoc statistical approaches
❖ Data scientists must look out for their organization’s monitoring practices, since these affect the lifecycle loop frequency
❖ Also covered in Alkis’s guest lecture; further reading: https://mlsys.org/Conferences/2019/doc/2019/167.pdf
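As one example of the ad hoc statistical approaches mentioned above, the sketch below flags drift in a single feature when its serving-time mean shifts by more than a few training-time standard deviations. This is an illustrative toy, not a production monitor; real systems use more rigorous tests (e.g., Kolmogorov–Smirnov tests or population stability index):

```python
# Toy drift check: compare a feature's serving statistics against training.
from statistics import mean, stdev

def drifted(train_vals, serve_vals, threshold=3.0):
    """Flag drift if the serving mean is > threshold train-stdevs away."""
    mu, sigma = mean(train_vals), stdev(train_vals)
    return abs(mean(serve_vals) - mu) > threshold * sigma

train = [10.0, 11.0, 9.5, 10.5, 10.0]
print(drifted(train, [10.2, 9.8, 10.4]))   # False: similar distribution
print(drifted(train, [25.0, 26.0, 24.5]))  # True: large shift
```

When a monitor like this fires, the lifecycle loops back to the model building stage for retraining on fresher data.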

  • 4. Prediction Monitoring
SLIDE 13


❖ Basic Goal: Just like regular code, prediction software must be versioned and tracked for teams to ensure consistency across time and employees, as well as for auditing’s sake, the ability to “roll back” to a safer state, etc.
❖ But unlike regular code, prediction software has 3 more dependencies beyond just code: datasets (train/val/test), configuration (e.g., hyper-parameters), and environment (hardware/software, since that can affect accuracy too)
❖ Research and industry are only just starting to figure this out
❖ Data scientists must look out for versioning best practices/tools
❖ Covered in Manasi’s guest lecture; further reading: https://blog.verta.ai/blog/how-to-move-fast-in-ai-without-breaking-things
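One simple way to capture all the dependencies listed above is to hash them together into a single version identifier, so that a change to any one of code, data, configuration, or environment yields a new version. This is a sketch under that assumption, not an established tool's API; all identifiers are hypothetical:

```python
# Sketch: derive a reproducible version id for a model from its
# four dependencies: code, dataset, configuration, and environment.
import hashlib
import json

def model_version_id(code_commit, dataset_hash, config, environment):
    """Hash the four dependencies into one stable version identifier."""
    payload = json.dumps({
        "code": code_commit,        # e.g., a git commit hash
        "data": dataset_hash,       # e.g., a hash of train/val/test splits
        "config": config,           # e.g., hyper-parameters
        "env": environment,         # e.g., library/hardware versions
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

v1 = model_version_id("abc123", "d4e5f6", {"lr": 0.01}, {"sklearn": "1.3"})
v2 = model_version_id("abc123", "d4e5f6", {"lr": 0.02}, {"sklearn": "1.3"})
print(v1 != v2)  # True: changing any dependency changes the version id
```

Tools discussed in the guest lecture (e.g., Verta's ModelDB lineage tracking) build on this idea by storing the dependencies themselves, not just a hash, so older versions can be rolled back to and audited.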

  • 5. Versioning