Lessons Learned Deploying and Monitoring AI Models in Production at Major Tech Companies



SLIDE 1

harish@datatron.com jerry@datatron.com

Lessons Learned Deploying and Monitoring AI Models in Production at Major Tech Companies

SLIDE 2

datatron

Who are we?

Harish Doddi, CEO

Jerry Xu, CTO

SLIDE 3

Today’s Enterprise AI life cycle

Development: Discovery Optimization Production

“The Playground” “The Battleground”

SLIDE 4

Lesson 1

You either NEVER deploy a model, or you have to do it over and over again

SLIDE 5

Are your models decaying?

Deploy and Done

[Plot: Performance vs. Time] Model decays over time

Result: Model performance decreases

Deploy over and over…

[Plot: Performance vs. Time] Model replenishes

Result: Model performance consistent
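One way to see which of the two curves a deployed model is following is to track accuracy over a sliding window of labeled predictions and compare it with the accuracy measured at deployment. The sketch below is only an illustration of that idea: the class name `DecayMonitor`, the window size, and the tolerance are assumptions, not something taken from the talk.

```python
from collections import deque


class DecayMonitor:
    """Tracks accuracy over the most recent labeled predictions (hypothetical helper)."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy      # accuracy measured at deployment time
        self.tolerance = tolerance             # allowed drop before flagging (assumed value)
        self.outcomes = deque(maxlen=window)   # 1 = correct prediction, 0 = wrong

    def record(self, predicted, actual):
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        acc = self.rolling_accuracy()
        # Only decide once the window is full, to avoid noisy early signals
        if acc is None or len(self.outcomes) < self.outcomes.maxlen:
            return False
        return acc < self.baseline - self.tolerance
```

A "deploy over and over" workflow could call `needs_retraining()` on a schedule and trigger a new training run when it returns `True`.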

SLIDE 6

ML model cycle is a continuously optimizing process

  • Model Building
  • Model Deployment
  • Model Monitoring
  • Model Testing
  • Model Management

Concept drift, new concept comes up, …
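The talk does not say how drift is detected. A common, if indirect, signal is a shift in the distribution of a feature or model score between training data and live traffic. The sketch below uses the Population Stability Index (PSI) as one such check. It is a swapped-in technique chosen for illustration; the function name `psi`, the bin count, and the epsilon are assumptions.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_frac, live_frac = fractions(reference), fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))
```

A frequently quoted rule of thumb treats values above roughly 0.25 as a large shift, but the right threshold depends on the feature and should be tuned per model.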

SLIDE 7

Connecting Machine Learning to the Software world

Before

Software deployment once every 1 or 2 years

Now

Software deployment every day, BUT Machine Learning models deploy very slowly

Future

Machine Learning models will deploy very frequently and fast

SLIDE 8

Lesson 2

Models may go wrong, you need to monitor them

SLIDE 9

South Park and Alexa

SLIDE 10

Monitoring Learning: Post mortem is the only option

With Model Monitoring

The problem occurs → Notify asap → The team decides what to do

Without Model Monitoring

The problem occurs → The team detects the problem and decides what to do

SLIDE 11

Monitoring for Machine Learning Models

Model Performance monitoring

  • Confusion Matrix
  • Gain and Lift charts
  • Kolmogorov-Smirnov chart
  • Area Under the ROC curve
  • Gini Coefficient
  • Concordant – Discordant ratio
  • Root Mean Squared Error (RMSE)
  • etc.

Model Timeout monitoring

Infrastructure monitoring

Organization KPI monitoring

Deployment monitoring
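Several of the model performance metrics listed on this slide can be computed directly from logged labels and scores. The sketch below shows plain-Python versions of the confusion matrix, area under the ROC curve, Gini coefficient (using the usual relation Gini = 2·AUC − 1), Kolmogorov-Smirnov statistic, and RMSE. The function names are illustrative assumptions, and a production system would more likely rely on an established metrics library.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Returns (tp, fp, fn, tn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def _split_scores(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    return pos, neg


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos, neg = _split_scores(y_true, scores)
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    return 2 * auc(y_true, scores) - 1


def ks_statistic(y_true, scores):
    """Largest gap between the score CDFs of positives and negatives."""
    pos, neg = _split_scores(y_true, scores)
    return max(
        abs(sum(p <= c for p in pos) / len(pos) - sum(n <= c for n in neg) / len(neg))
        for c in sorted(set(scores))
    )


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Computing these on a schedule over recent traffic, and comparing against the values recorded at deployment, gives the performance-monitoring signal the slide describes.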

SLIDE 12

Lesson 3

Your real work starts AFTER you deploy the model to production

SLIDE 13

Enterprise AI Life Cycle

Exploration → Training → Deploy

SLIDE 14

Enterprise AI Life Cycle After Deployment

Deploy

  • Blue Green Deployment
  • Rollback
  • Canary

A/B Testing

  • Split traffic
  • Shadowing

SLA

  • Monitor performance
  • Fall back strategy
  • Alerting

Model Selection

  • Model routing
  • Challenger
  • KPI based selection

Anomaly detection

  • Feature distribution
  • Model result
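A few of the post-deployment practices named on this slide (split traffic, canary, fall back strategy, rollback) can be illustrated with a small router that sends a fraction of requests to a challenger model. This is a minimal sketch under stated assumptions: `ModelRouter`, its parameters, and the "champion" label for the current production model are hypothetical names, and models are represented as plain callables.

```python
import random


class ModelRouter:
    """Splits traffic between a champion and a challenger model (hypothetical sketch)."""

    def __init__(self, champion, challenger, canary_fraction=0.1, rng=None):
        self.champion = champion
        self.challenger = challenger
        self.canary_fraction = canary_fraction  # share of traffic sent to the challenger
        self.rng = rng or random.Random()

    def predict(self, features):
        """Returns (name of the model that answered, prediction)."""
        if self.rng.random() < self.canary_fraction:
            try:
                return "challenger", self.challenger(features)
            except Exception:
                # Fall back strategy: serve the champion if the challenger fails
                pass
        return "champion", self.champion(features)

    def rollback(self):
        """Stops sending any traffic to the challenger."""
        self.canary_fraction = 0.0
```

Returning the name of the model that served each request makes it possible to compare KPIs per model afterwards, which is what KPI based selection would build on.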

SLIDE 15

Lesson 4

Data science is a scarce resource; you need to make sure you organize well

SLIDE 16

Deployment Learning: Rise of a new engineering role

Machine Learning Engineer, Deep Learning Engineer

There is a hyper-competitive WAR FOR TALENT that is projected to get much worse

SLIDE 17

Teams face cross-functional inefficiencies

BEST CASE SCENARIO: with a world-class team, 1 model deployed per quarter

Data Science, End User, Machine Learning Engineering, DevOps

  • Teams operate in silos, don’t speak the same language
  • Errors due to lack of communication
  • Engineering has to write stand-alone scripts

Production Model

SLIDE 18

Hidden Technical Debt in Machine Learning Systems

Google Paper

Machine learning offers a fantastically powerful toolkit for building complex systems quickly. … it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.

  • Boundary erosion
  • Entanglement
  • Hidden feedback loops
  • Undeclared consumers
  • Data dependencies
  • Changes in the external world
  • System-level anti-patterns

SLIDE 19

Lesson 5

Be prepared, your number of models will increase

SLIDE 20

Deployment Learning: 1 model vs Multiple models

SLIDE 21

Cost per model increases significantly if no automation

[Two plots: Cost per model vs. # of models]

As the number of models increases, the cost also increases

SLIDE 22

Lesson 6

Senior people are needed AFTER deploying to production

SLIDE 23

Software Development vs ML Model Development

Software development: Requirements, Design, Implementation, Testing, Evolution

ML model development: Requirements, Data Preparation, Training / Testing, Deploy to Production, Monitoring and Optimization

Senior People (labeled once on each diagram)

SLIDE 24

Lesson 7

Don’t be married to a single framework

SLIDE 25

Build/Bring Your Own Models, Frameworks, Languages
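One way to avoid being tied to a single framework is to have the serving layer depend only on a small, shared prediction interface and wrap each framework's model in an adapter. The sketch below illustrates that idea. It is not the speakers' design: `Model`, `CallableModel`, and `ModelRegistry` are hypothetical names introduced for the example.

```python
from typing import Any, Callable, Dict, Protocol, Sequence


class Model(Protocol):
    """The only contract the serving layer relies on."""

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[Any]: ...


class CallableModel:
    """Adapts any per-row function (from any framework) to the shared interface."""

    def __init__(self, fn: Callable[[Sequence[float]], Any]):
        self.fn = fn

    def predict(self, rows):
        return [self.fn(row) for row in rows]


class ModelRegistry:
    """Looks models up by name, regardless of how they were built."""

    def __init__(self):
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def predict(self, name: str, rows):
        return self._models[name].predict(rows)
```

For example, `registry.register("sum", CallableModel(sum))` makes a trivial "model" available under the same interface that a wrapped deep learning or gradient boosting model would use.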

SLIDE 26

harish@datatron.com

Thank you!

Innovators Pavilion Booth P4