Leveraging AWS and Machine Learning to Power Search at Zocdoc



SLIDE 1

This document and its contents are proprietary and confidential of Zocdoc, Inc. and may not be reproduced or shared, in whole or in part, without the express written authorization of Zocdoc, Inc.

Leveraging AWS and Machine Learning to Power Search at Zocdoc

Pedro Rubio, Head of Search Engineering
Brian d’Alessandro, Head of Data Science

SLIDE 2

Agenda

  • How we’re built - People and Architecture
  • How we’re built - the Data
  • Questions
SLIDE 3

Problem Statements:

  1. Patients need to find and book with a doctor, and
  2. Patients often don’t know what kind of doctor they need.

SLIDE 4

SLIDE 5

How we’re built

And solving “what the patient means”

SLIDE 6

Core Optimization Problems for ZD Search

  • Cross-team collaboration enabling maximum iteration speed
  • Deliver recommendations in < 200 ms
  • Patient satisfaction

(And our architecture plays a big role here!)

SLIDE 7

The Search Team

  • Engineering
  • Product
  • Data Science
  • Design

SLIDE 8

Zocdoc Tech Stack

  • NodeJS, ES6, Babel, React
  • AWS - CloudFormation, Docker, ECR, EC2, ELB

  • Kinesis / Firehose - S3
  • reporting to data-lake
  • Monitoring with Datadog
  • Routes with Express
SLIDE 9

Our Legacy Search

SLIDE 10

Free Text (Patient Powered) Search

SLIDE 11

Types of Intent

  • Name of doctor
  • Medical procedure
  • Specialty
  • Symptom

SLIDE 12

Intent Parsing Architecture Machine Learning Design

SLIDE 13

Search Service Pipeline

[Diagram. Phases: Phase I - Auto-Suggest; Phase II - Backend Search. Components shown: Browser, Service Handler, Doctor Name Retrieval, Specialty Retrieval, Visit Reason Retrieval, Semantic Retrieval, Semantic Service, NLP Corpus Building, Auto-Suggest Ranking, Ranking Models, Logging, Results.]

SLIDE 14

Solving for the Long Tail

The structured queries comprise a reasonable percent of traffic, but are a minority of the total search terms we service.

Specialties = O(10^2), Procedures = O(10^3), Names = O(10^6), Other = O(10^7)

We use Natural Language Processing (NLP) algos to map unstructured terms into our structured search set.
SLIDE 15

Different Representations of Same Concept

Variations

  • Heart beats too fast
  • Heart flutters
  • Pulse rate too high
  • Irregular pulse
  • Heart out of rhythm
  • Irregular heartbeat
  • Heart palpitations

Concept Interpretation

  • Irregular heartbeat
  • Heart palpitations

Medical Term

  • Atrial Fibrillation

SLIDE 16

Zocdoc Semantic Service

f(“presentation anxiety”) = {[{Specialty =“Psychologist”, Relevance = 0.8}, …,{Specialty =“Psychiatrist”, Relevance = 0.7}]}
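
As a rough illustration of the interface above, here is a minimal Python sketch of a function that maps a free-text query to specialties with relevance scores. The vocabulary, the token-overlap scoring, and the names `SPECIALTY_TERMS` and `semantic_lookup` are all assumptions made for this example; the slides do not say how the production service computes relevance.

```python
from typing import Any, Dict, List, Set

# Hypothetical term sets per specialty; a real corpus would be far larger.
SPECIALTY_TERMS: Dict[str, Set[str]] = {
    "Psychologist": {"anxiety", "stress", "presentation", "phobia"},
    "Psychiatrist": {"anxiety", "depression", "medication"},
    "Cardiologist": {"heart", "pulse", "palpitations"},
}


def semantic_lookup(query: str) -> List[Dict[str, Any]]:
    """Return specialties ranked by the fraction of query tokens they cover."""
    tokens = set(query.lower().split())
    if not tokens:
        return []
    results = []
    for specialty, terms in SPECIALTY_TERMS.items():
        relevance = len(tokens & terms) / len(tokens)
        if relevance > 0:
            results.append({"Specialty": specialty, "Relevance": relevance})
    # Highest relevance first; ties broken alphabetically for determinism.
    results.sort(key=lambda r: (-r["Relevance"], r["Specialty"]))
    return results
```

Calling `semantic_lookup("presentation anxiety")` with this toy vocabulary ranks Psychologist ahead of Psychiatrist, which mirrors the shape of the output on the slide but not its actual scores.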

SLIDE 17

Early Results (And Why You Need to Always Experiment)

SLIDE 18

Searches that Lead to “Nephrology”

Many patients don’t know what a Nephrologist is. They don’t need to know to find one now.

SLIDE 19

How We’re Built - The Data

SLIDE 20

Data - Indexing so we can Search

SLIDE 21

Lesson Learned with Indexing Data

[Diagram labels: Monolith, Live Cache, Feed Process, S3, Lambda (ƛ), AWS, Legacy Layer, Elastic.co]

Lambdas act as a mini ETL layer, getting the documents ready for our retrieval stage.

  • Lambda memory max 1500 MB
  • Our data is much larger
  • Manage state in S3 and Elasticsearch
  • Complex “stateless” ETL process that transforms this data into the data that we need in Elasticsearch
  • Load piecemeal into Elasticsearch
  • At the very end, swap alias to use newly uploaded indexes
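
The final alias swap can be sketched as follows. This is a minimal example that only builds the request body for Elasticsearch's `POST /_aliases` endpoint; the helper name `build_alias_swap` and the index and alias names are hypothetical, and the slides do not show how the actual job issues the request.

```python
from typing import Any, Dict, List


def build_alias_swap(
    alias: str, old_indexes: List[str], new_indexes: List[str]
) -> Dict[str, Any]:
    """Build the request body for Elasticsearch's `POST /_aliases` endpoint.

    All actions in one request are applied atomically, so searches against
    the alias never see a half-loaded set of indexes.
    """
    actions: List[Dict[str, Any]] = [
        {"remove": {"index": name, "alias": alias}} for name in old_indexes
    ]
    actions += [{"add": {"index": name, "alias": alias}} for name in new_indexes]
    return {"actions": actions}


# Hypothetical index and alias names.
body = build_alias_swap("providers", ["providers_v1"], ["providers_v2"])
```

Because the new indexes are loaded piecemeal before the swap, readers keep querying the old indexes through the alias until the single request above repoints it.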

SLIDE 22

New ETL - Spark

More Complex Processing

[Diagram labels: data sets 1, 2, and 3; Mapping from 1 -> 2; Mapping from 1 -> 3; Joined Data Set; Business Logic Application]

  • Spark - ETL
  • Get over the 1500 MB limit
  • Get over the 5 minute runtime limit
  • Easily add more data sets
  • Currently in Databricks
  • Plan to migrate to EMR (Elastic MapReduce)
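
The diagram labels on this slide (mappings from data set 1 to data sets 2 and 3, a joined data set, then business logic) can be pictured with a small pure-Python sketch. It is an illustration only: the slides say the real job runs in Spark, and the record fields, the mappings, and the rule in `apply_business_logic` are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-ins for data sets 1, 2, and 3, keyed by their own ids.
providers = {"p1": {"name": "Dr. A"}, "p2": {"name": "Dr. B"}}
specialties = {"s1": {"specialty": "Cardiologist"}}
locations = {"l1": {"city": "New York"}, "l2": {"city": "Boston"}}

# Mappings from data set 1 to data sets 2 and 3.
provider_to_specialty = {"p1": "s1"}
provider_to_location = {"p1": "l1", "p2": "l2"}


def join_records() -> List[Dict[str, str]]:
    """Left-join data set 1 with data sets 2 and 3 through the two mappings."""
    joined = []
    for pid, provider in sorted(providers.items()):
        row = {"id": pid, **provider}
        row.update(specialties.get(provider_to_specialty.get(pid, ""), {}))
        row.update(locations.get(provider_to_location.get(pid, ""), {}))
        joined.append(row)
    return joined


def apply_business_logic(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Example rule: only index providers that have a specialty."""
    return [row for row in rows if "specialty" in row]


documents = apply_business_logic(join_records())
```

In Spark the same shape would typically be expressed as DataFrame joins followed by a filter, which is what lets the job grow past a single function's memory and runtime limits.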

SLIDE 23

Data - Event Data So we can Learn

SLIDE 24

The Marketplace

Goal: Make it as easy as possible to match the user to the right doctor.

Considerations:

  • How to weight distance vs. availability vs. experience vs. reviews?
  • Does the doctor take this type of patient?
  • Are we meeting regulatory requirements?
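
One simple way to picture the weighting question is a linear combination of normalized features. The sketch below is an assumption-laden example, not Zocdoc's ranking function: the `Candidate` fields, the normalizations, and the values in `WEIGHTS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    distance_km: float       # lower is better
    days_until_slot: int     # lower is better
    years_experience: float
    avg_review: float        # 0 to 5 stars


# Hypothetical weights; in practice these would be learned or tuned by experiment.
WEIGHTS = {"distance": 0.4, "availability": 0.3, "experience": 0.1, "reviews": 0.2}


def score(c: Candidate) -> float:
    """Combine normalized features (each in [0, 1]) into one relevance score."""
    features = {
        "distance": 1.0 / (1.0 + c.distance_km),
        "availability": 1.0 / (1.0 + c.days_until_slot),
        "experience": min(c.years_experience, 20.0) / 20.0,
        "reviews": c.avg_review / 5.0,
    }
    return sum(WEIGHTS[k] * v for k, v in features.items())


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

Hard constraints such as whether a doctor takes a given type of patient, or regulatory requirements, would more naturally be filters applied before scoring than terms in the weighted sum.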

SLIDE 25

Organizational Optimization

Optimize: algo iteration speed

Subject to:

  • Org too small to justify full-time data scientists within search
  • Throwing models over the wall to be implemented doesn’t work

SLIDE 26

Agile Machine Learning

[Diagram. Sections: Production, Offline. Ownership labels: Engineering Owned, DS Owned. Components: Zocdoc Prod Service (Search); Model API; Model DB; Transformations + Model Scoring; Logs (S3/Redshift); Research, Analysis, Model Development (Spark/Redshift). Data flow labels: Query, Filtered Results, Scored Results, Ranked Results.]
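
The query flow named in the diagram labels (Query, Filtered Results, Scored Results, Ranked Results) can be sketched with the model passed in as a plain callable, so that the scoring logic can change without touching the service code. Every function name and field below is hypothetical; the slides do not show the real Model API.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]


def filter_results(docs: List[dict], query: dict) -> List[dict]:
    """Service-side step: keep documents that match the hard constraints."""
    return [d for d in docs if d["specialty"] == query["specialty"]]


def transform(doc: dict) -> Features:
    """Turn a document into the numeric features the model expects."""
    return {
        "distance_km": float(doc["distance_km"]),
        "avg_review": float(doc["avg_review"]),
    }


def search(docs: List[dict], query: dict, model: Model) -> List[dict]:
    """Query -> filtered results -> scored results -> ranked results."""
    filtered = filter_results(docs, query)
    scored = [{**d, "score": model(transform(d))} for d in filtered]
    return sorted(scored, key=lambda d: d["score"], reverse=True)


def example_model(features: Features) -> float:
    """Hypothetical model; it can be swapped without changing search()."""
    return features["avg_review"] - 0.1 * features["distance_km"]
```

Keeping the model behind a narrow interface like this is one way to avoid the "over the wall" handoff: a new model is a new callable, not a rewrite of the search path.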

SLIDE 27

Aqueduct: Filling the Data Lake

Some Data Lake principles:

  • Allow producers to easily push data
  • Allow data format changes
  • Smart ETL to make consumption very easy

SLIDE 28

Cistern: Making the Data Lake Drinkable

  • “Raw” data lake is good for exploratory research (we use Spark)
  • “Clean” data lake is better for analytics and quick exploration

SLIDE 29

Data - Insights

SLIDE 30

SLIDE 31

Searches for Therapy/Therapist on Zocdoc

We’ve got our fingers on the pulse of public health trends.

SLIDE 32

How Much is that Smile Worth?

[Chart: Click Conversion by Search Rank and DrIsSmiling]

We’re exploring Amazon Rekognition to research what drives user interest in doctor profiles.
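
A click-conversion breakdown by search rank and smile flag can be computed with a simple group-by. The sketch below assumes each impression has already been logged with a `search_rank`, a `dr_is_smiling` flag derived upstream from the profile photo, and a `clicked` flag; these field names and the sample events are synthetic and are not Zocdoc data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def click_conversion(events: Iterable[dict]) -> Dict[Tuple[int, bool], float]:
    """Return clicks / impressions for each (search_rank, dr_is_smiling) group."""
    impressions: Dict[Tuple[int, bool], int] = defaultdict(int)
    clicks: Dict[Tuple[int, bool], int] = defaultdict(int)
    for e in events:
        key = (e["search_rank"], e["dr_is_smiling"])
        impressions[key] += 1
        clicks[key] += int(e["clicked"])
    return {key: clicks[key] / impressions[key] for key in impressions}


# Synthetic events for illustration only.
events = [
    {"search_rank": 1, "dr_is_smiling": True, "clicked": True},
    {"search_rank": 1, "dr_is_smiling": True, "clicked": False},
    {"search_rank": 1, "dr_is_smiling": False, "clicked": False},
    {"search_rank": 2, "dr_is_smiling": True, "clicked": True},
]
rates = click_conversion(events)
```

Grouping by rank as well as by the smile flag matters because higher-ranked results get more clicks regardless of the photo, so comparing smiling and non-smiling profiles is only meaningful within the same rank.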

SLIDE 33

SLIDE 34

SLIDE 35

Thank you and Questions!