harish@datatron.com jerry@datatron.com
Lessons Learned Deploying and Monitoring AI Models in Production at Major Tech Companies
datatron
Who are we?
Harish Doddi, CEO
Jerry Xu, CTO
Today’s Enterprise AI life cycle
Development: Discovery, Optimization, Production
“The Playground” “The Battleground”
Lesson 1
Are your models decaying?
Deploy and Done: Model decays over time. Result: Model performance decreases.
Deploy over and over: Model replenishes. Result: Model performance consistent.
[Figure: Performance vs. Time for each approach]
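One way to notice the decay described on this slide is to compare a rolling accuracy against the accuracy measured at deployment time. The sketch below is only an illustration under stated assumptions: the class name `DecayMonitor`, the window size, and the tolerance are hypothetical choices, not something the speakers describe.

```python
from collections import deque


class DecayMonitor:
    """Track rolling accuracy and flag decay against a baseline (hypothetical helper)."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy      # accuracy measured at deployment time
        self.tolerance = tolerance             # allowed drop before flagging decay
        self.outcomes = deque(maxlen=window)   # keeps only the most recent results

    def record(self, prediction, actual):
        """Store whether the latest prediction matched the observed outcome."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_decayed(self):
        """True when rolling accuracy falls below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance
```

A team could treat `has_decayed()` returning true as the trigger to retrain and redeploy, which matches the "deploy over and over" pattern.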
ML model cycle is a continuously optimizing process
Model Building, Model Deployment, Model Monitoring, Model Testing, Model Management
Concept drift: New concept comes up …
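Concept drift often shows up first as a shift in the distribution of an input feature. The sketch below uses the population stability index (PSI), which is a technique chosen here for illustration and is not named on the slide. The function name, the bin count, and the small floor used to avoid taking the log of zero are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger values mean more shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Here `expected` would be the feature values seen at training time and `actual` the values seen in production. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and should be tuned.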
Connecting Machine Learning to the Software world
Before: Software deployment
Now: Software deployment every day
Machine Learning models deploy very slowly, BUT
Future: Machine Learning models will deploy very frequently and fast
Lesson 2
South Park and Alexa
Monitoring Learning: Post mortem is the only option
With Model Monitoring
The problem
Notify ASAP
The team decides what to do
Without Model Monitoring
The problem
The team detects the problem and decides what to do
Monitoring for Machine Learning Models
Model Performance monitoring
Model Timeout monitoring
Infrastructure monitoring
Organization KPI monitoring
Deployment monitoring
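As an illustration of model timeout monitoring, the sketch below wraps a prediction function, counts how often it exceeds a time budget, and returns a fallback answer when it does. The class name `TimeoutMonitor`, the callables, and the budget are hypothetical. It relies only on the standard `concurrent.futures` module, and note that a timed-out call keeps running in its worker thread until it finishes.

```python
import concurrent.futures
import time


class TimeoutMonitor:
    """Count model timeouts and serve a fallback result (hypothetical helper)."""

    def __init__(self, predict_fn, fallback_fn, timeout_s):
        self.predict_fn = predict_fn    # the deployed model's predict call
        self.fallback_fn = fallback_fn  # cheap default used when the model is too slow
        self.timeout_s = timeout_s      # latency budget in seconds
        self.calls = 0
        self.timeouts = 0
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def predict(self, features):
        """Return the model result, or the fallback if the budget is exceeded."""
        self.calls += 1
        future = self._pool.submit(self.predict_fn, features)
        try:
            return future.result(timeout=self.timeout_s)
        except concurrent.futures.TimeoutError:
            self.timeouts += 1
            return self.fallback_fn(features)

    def timeout_rate(self):
        """Fraction of calls that timed out; a candidate metric for alerting."""
        return self.timeouts / self.calls if self.calls else 0.0
```

The `timeout_rate()` value could feed the same alerting channel as the other monitoring categories listed on the slide.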
Lesson 3
Enterprise AI Life Cycle
Exploration, Training, Deploy
Enterprise AI Life Cycle After Deployment
Deploy: Blue Green Deployment, Rollback, Canary
A/B Testing: Split traffic, Shadowing
SLA: Monitor performance, Fall back strategy, Alerting
Model Selection: Model routing, Challenger, KPI based selection
Anomaly detection: Feature distribution, Model result
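Canary rollout, split traffic, and shadowing can all be expressed as a small routing function. The sketch below is a minimal illustration, assuming requests carry a string ID. The names `bucket`, `route`, `champion`, `challenger`, and `shadow_log` are hypothetical, and hashing the request ID is just one way to make the split deterministic.

```python
import hashlib


def bucket(request_id, buckets=100):
    """Map a request ID to a stable bucket number in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def route(request_id, features, champion, challenger,
          canary_percent=10, shadow_log=None):
    """Send a fixed share of traffic to the challenger; optionally shadow the rest."""
    if bucket(request_id) < canary_percent:
        # Canary / split traffic: the challenger's answer is actually served.
        return "challenger", challenger(features)
    result = champion(features)
    if shadow_log is not None:
        # Shadowing: the challenger runs on the same input, but its answer
        # is only recorded for later comparison and never served.
        shadow_log.append((request_id, result, challenger(features)))
    return "champion", result
```

Setting `canary_percent` back to 0 acts as a simple rollback, and the recorded shadow results could feed a KPI based comparison between the two models.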
Lesson 4
Deployment Learning: Rise of new engineering role
Machine Learning Engineer, Deep Learning Engineer
There is a hyper-competitive
that is projected to get much worse
BEST CASE SCENARIO: with a world-class team, 1 model deployed per quarter
Data Science, End User, Machine Learning Engineering, DevOps
Production Model
Teams face cross-functional inefficiencies
Hidden Technical Debt in Machine Learning Systems
Google Paper
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. … it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.
Boundary erosion
Entanglement
Hidden feedback loops
Undeclared consumers
Data dependencies
Changes in the external world
System-level anti-patterns
Lesson 5
Deployment Learning: 1 model vs Multiple models
Cost per model increases significantly if no automation
As the number of models increases, the cost also increases
[Figure: Cost per model vs. # of models]
Lesson 6
Software Development vs ML Model Development
Software Development: Requirements, Design, Implementation, Testing, Evolution
ML Model Development: Requirements, Data Preparation, Training / Testing, Deploy to Production, Monitoring and Optimization
Senior People Senior People
Lesson 7
Build/Bring Your Own Models, Frameworks, Languages
harish@datatron.com