Kaycee Lai, CEO & Founder Presto Summit NYC 2019 WHO WE ARE - - PowerPoint PPT Presentation



SLIDE 1

Kaycee Lai, CEO & Founder Presto Summit NYC 2019

SLIDE 2

WHO WE ARE

EXEC TEAM

$400M+ from successful startup exits

Pedigree from GOOG, VMW, MSFT, ORCL

Kaycee Lai, CEO & Founder

  • GM $120M P&L @EMC
  • President @Waterline Data
  • VP Sales @Virsto (VMware), @Avamar (EMC), @Delphix

Dr. Shuo Yang, VP, Engineering

  • Ph.D. CS from Purdue Univ.
  • Key member of “Borg” @Google
  • Built cloud native analytics @EA

Azary Smotrich, Principal Architect

  • Office of CTO @Oracle
  • Founding Eng. @ModleN
  • Prescriptive Analytics @NASA
  • Founding Eng. @Waterline

TOP INVESTORS

Successful track record nurturing startups to success

  • Jocelyn Goldfein, Board Member, Zetta Ventures
  • Arnold Silverman, Board Member, Discovery Ventures
  • Graham Brooks, Board Member, .406 Ventures
  • Jeff Parks, Investor, Riverwood

DOMAIN EXPERTISE

Data Ops / Big Data / Analytics / Cloud / Data Management / Cluster Management / Data Governance
SLIDE 3

GETTING ANSWERS FOR BI IS EVEN HARDER

[Figure: seven-step process diagram. Step labels: Discover Data, Move Data, Prep Data, SQL Statement, Query Data, Visualize Data, Govern Data. Durations shown: Weeks, Hours, Days, Months, Days, Months.]

Resources Required: Business Analysts / IT / Data Scientists / DBAs / BI Developers / SIs

Resources Required: Data Governance / Office of CDO / Compliance

Not sure if the data is right until step 6!

>4 Months to answer 1 question

SLIDE 4


BI/ANALYTICS SHOULD BE ABOUT ANSWERING QUESTIONS…RIGHT?

SLIDE 5

OUR VISION TO SIMPLIFY BI & ANALYTICS

  • 1. Connect: Any Data Source
  • 2. Reveal: Location / Relationships / Instructions
  • 3. Rationalize: Intent of question / Assembly logic / SQL Statement
  • 4. Execute: Federated Query / BI Integration

Reduce a 4-month process to minutes

SLIDE 6

FAST, SCALABLE, SAAS PLATFORM ON CLOUD (AWS)

RATIONALIZE

Logical Guidance (Reasoner)

REVEAL

Relationships (Data Map)

CONNECT

Data Catalogs Data Sources

EXECUTE

Federated Query

DATA AS A SERVICE WITH PROMETHIUM

VISUALIZE

Self-Service Analytics / Instructions (Directions) / NLP Search (Question Builder) / Location (Data Explorer) / Data Discovery / Auto SQL Statement (SQL AI) / Data “Prep”

SLIDE 7

ARCHITECTURE

SLIDE 8

SCALABLE ARCHITECTURE

  • DATA CONTEXT ENGINE
  • FRONT END
  • QUERY EXECUTION
  • AI/NLP

SLIDE 9

KEY COMPONENTS

  • DATA SOURCES
  • QUERY EXECUTION
  • 3RD PARTY DATA CATALOG

SLIDE 10

HOW IT WORKS - CONNECT

DATA SOURCES / 3rd PARTY DATA CATALOG

SMART BOTS: Cloud / JDBC / HDFS / Data Catalogs

INFO FROM SMARTBOTS:

  • 1. API-Based (e.g. JDBC)
  • 2. Name / Location / Schema
  • 3. No heavy processing & data movement
  • 4. Alt. Names: Tags / Synonyms
  • 5. Data Quality
  • 6. Lineage
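
The kind of metadata listed for the smart bots (name, location, schema) can be collected through a database API without reading any table rows. The sketch below only illustrates that idea under stated assumptions: Python's built-in sqlite3 module stands in for a JDBC-style connection, and the `harvest_metadata` helper, the `orders` table and the location string are hypothetical, not part of Promethium.

```python
import sqlite3


def harvest_metadata(conn, location):
    """Collect table names and column schemas without reading table rows.

    Hypothetical helper: sqlite3 stands in for a JDBC-style API connection.
    """
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info returns one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        catalog.append({
            "name": table_name,
            "location": location,
            "schema": [(col[1], col[2]) for col in columns],
        })
    return catalog


# Assumed sample source: an in-memory database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
)
metadata = harvest_metadata(conn, location="sqlite://:memory:")
print(metadata)
```

Only the system catalog is queried here, which mirrors the "no heavy processing & data movement" point: the data itself stays where it is.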

SLIDE 11

HOW IT WORKS - REVEAL

DATA CONTEXT ENGINE

DATA EXPLORER (FIND DATA)

  • 1. Table/File/Column Name
  • 2. Vendor Name / Data Type
  • 3. Location (IP address / URL)

DIRECTIONS (ASSEMBLE)

  • 1. Tables / Files
  • 2. From what Vendor
  • 3. Select / Join

DATA MAP (VISUALIZE)

  • 1. Topology
  • 2. Relationships
  • 3. Alternate Versions
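
One simple way to surface candidate relationships for a data map is to look for column names that tables share. This is an assumption made purely for illustration; the slides do not say how the Data Context Engine derives relationships. The `build_data_map` helper and all table and column names are hypothetical.

```python
from itertools import combinations


def build_data_map(schemas):
    """Return candidate relationships as {(table_a, table_b): shared_columns}.

    Assumption for illustration: two tables are treated as related when
    they share at least one column name.
    """
    data_map = {}
    for left, right in combinations(sorted(schemas), 2):
        shared = sorted(set(schemas[left]) & set(schemas[right]))
        if shared:
            data_map[(left, right)] = shared
    return data_map


# Hypothetical metadata: table name -> column names.
schemas = {
    "mysql.orders": ["order_id", "customer_id", "total"],
    "hdfs.customers": ["customer_id", "name", "region"],
    "s3.web_logs": ["session_id", "url"],
}
data_map = build_data_map(schemas)
print(data_map)
```

Each key in the result is an edge between two tables, and the value lists the columns that could serve as join keys.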
SLIDE 12

HOW IT WORKS - RATIONALIZE

DATA CONTEXT ENGINE

DATA MAP (VISUALIZE)

  • 1. Delete / Change Tables
  • 2. Find Missing Tables via Catalog

DIRECTIONS (ASSEMBLE)

  • 1. Change Join Types
  • 2. Change Join Operators
  • 3. Auto-Create SQL Statement for Presto
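
The DIRECTIONS steps (pick a join type, pick a join operator, then generate SQL) can be sketched as a small statement builder. This is a minimal illustration, not Promethium's SQL AI: the `build_join_sql` helper and the table names are hypothetical, and the table names simply follow Presto's `catalog.schema.table` naming form.

```python
def build_join_sql(left, right, left_key, right_key,
                   join_type="INNER", operator="=", columns=("*",)):
    """Assemble a SELECT ... JOIN statement from assembly directions.

    Hypothetical helper; inputs are validated against small allow-lists
    so that only known join types and operators reach the SQL text.
    """
    allowed_joins = {"INNER", "LEFT", "RIGHT", "FULL"}
    allowed_operators = {"=", "<", ">", "<=", ">=", "<>"}
    if join_type not in allowed_joins:
        raise ValueError(f"unsupported join type: {join_type}")
    if operator not in allowed_operators:
        raise ValueError(f"unsupported join operator: {operator}")
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {left} AS l "
        f"{join_type} JOIN {right} AS r "
        f"ON l.{left_key} {operator} r.{right_key}"
    )


# Hypothetical directions: join orders in MySQL to customers in Hive.
sql = build_join_sql(
    left="mysql.sales.orders",
    right="hive.crm.customers",
    left_key="customer_id",
    right_key="customer_id",
    join_type="LEFT",
    columns=("l.order_id", "r.name"),
)
print(sql)
```

Changing `join_type` or `operator` corresponds to the "Change Join Types" and "Change Join Operators" steps; the statement is regenerated rather than edited by hand.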
SLIDE 13

HOW IT WORKS - EXECUTE

VIRTUAL VIEW

  • Query Initiated
  • Direct Access to Data – No ETL
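
The idea of querying data in place across separate sources can be illustrated with a small stand-in. The sketch below does not use Presto: it assumes two independent SQLite databases (attached with `ATTACH DATABASE`) play the role of two sources, and all table names and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Source 2 is a separate database attached under the name "crm".
# It is attached before any INSERT so that no transaction is open yet.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Source 1 (main) holds orders; source 2 (crm) holds customers.
conn.execute(
    "CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(10, "Acme"), (11, "Globex")],
)

# One query joins both sources; nothing is copied between them first.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total) AS revenue
    FROM main.orders AS o
    JOIN crm.customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

In a real federated engine the two sides would be different systems reached through connectors, but the shape of the query is the same: a single statement that names tables from more than one source.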

SLIDE 14

THE NO-ETL APPROACH

HOW IT WAS DONE: ~ 4 MONTHS

  • 1. Build complex data pipelines
  • 2. Schedule long-running ETL jobs
  • 3. Copy/Move data to a data warehouse / lake
  • 4. Manually subset, join, write SQL statements
  • 5. Query against data warehouse / lake

HOW IT CAN BE TODAY: ~ 4 MINUTES

  • 1. Select data discovered
  • 2. Run queries directly from source

SLIDE 15

SUPPORT & INTEGRATION

  • DATA SOURCES: RDBMS / HDFS / S3 based / Cloud
  • DATA VIRTUALIZATION
  • DATA VISUALIZATION: SUPERSET
  • DATA LINEAGE
  • DATA CATALOG
  • PLATFORM (PUBLIC CLOUD OR ON-PREM VPC)

* On Roadmap

SLIDE 16

BUSINESS IMPACT

SLIDE 17

PAIN POINTS ADDRESSED

PROBLEM: FINDING DATA IS COMPLEX & TIME CONSUMING.

  • Data is fractured across multiple systems, multiple vendors and multiple locations
  • No single tool can search for data across the entire data estate
  • Loading all of the data into a single repository is expensive & time consuming

SOLUTION:

  • PROMETHIUM’S DATA EXPLORER ™: 1 SINGLE solution to find data without the need to move data. Reveals context of data across all vendors and systems.

PROBLEM: ANSWERING QUESTIONS STILL TAKES HUGE MANUAL EFFORT POST DATA DISCOVERY.

  • Need to know data relationships to know what / how to join
  • SQL statements can take up to 8 hours to create
  • Few people in the organization can write a valid SQL statement

SOLUTION:

  • PROMETHIUM’S QUESTION BUILDER ™: NLP-driven method to transform questions into data
  • PROMETHIUM’S DATA MAP & DIRECTIONS: Instantly generates STEP-BY-STEP assembly directions + DATA MAP
  • PROMETHIUM’S SQL AI: Instantly generates a valid SQL statement

PROBLEM: QUERIES ACROSS DIFFERENT SOURCES / VENDORS = HARD TO EXECUTE

  • Insights are limited as SQL statements can only reflect data from one system
  • Highly manual two-step process of moving data from each separate vendor, then manually joining the data

SOLUTION:

  • PROMETHIUM’S KALEIDOSCOPE ™: 1-STEP federated query execution across various sources, with integration for BI tools such as Tableau

SLIDE 18

TODAY: TIME & EFFORT

                                    TODAY
Time to find Data                   4 weeks
Time to Move Data                   5 days
Time to Subset/Model/Join           2 months
Time to Write 1 SQL Statement       8 hours
Time to Aggregate & Query Data      3 days
# Data Analysts                     4
# Data Engineers                    2
# Business Analysts                 2

                                    TODAY
# of People Involved                8
Amount of Time                      3+ months
# of Questions answered in 1 year   < 4
Cost of Asking 4 Questions          $549,973

  • Data Analyst Cost: $125,000
  • Data Engineer Cost: $250,000
  • Business Analyst Cost: $90,000

SLIDE 19

PROMETHIUM EFFICIENCY

                                    TODAY        PROMETHIUM
Time to find Data                   4 weeks      1 min
Time to Move Data                   5 days       -
Time to Subset/Model/Join           2 months     2 sec
Time to Write 1 SQL Statement       8 hours      1 min
Time to Aggregate & Query Data      3 days       1 min
# Data Analysts                     4            1
# Data Engineers                    2            -
# Business Analysts                 2            -

                                    TODAY        PROMETHIUM
# of People Involved                8            1
Amount of Time                      3+ months    ~3 min
# of Questions answered in 1 year   < 4          40,000
Cost of Asking 4 Questions          $549,973     $7.30

People Efficiency                   7X less
Time Efficiency                     ~ 10,000X less
Cost Efficiency                     ~ 75,000X less

For 7X less resources & 75,000X less cost, Promethium can answer up to 10,000X more questions.

What can a business do if it has a 10,000X increase in efficiency to answer questions & gain insights?

  • Data Analyst Cost: $125,000
  • Data Engineer Cost: $250,000
  • Business Analyst Cost: $90,000
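
The headline multipliers can be checked against the figures stated on this slide. The short calculation below uses only those numbers and makes two assumptions: "< 4" questions per year is treated as 4, so the question ratio is a lower bound, and efficiency is computed as a plain ratio. On that basis the headcount ratio comes out at 8 to 1, while the slide states 7X, which may instead count the seven people no longer needed.

```python
# Figures as stated on the slide ("< 4" questions is taken as 4).
today = {
    "people": 8,
    "questions_per_year": 4,
    "cost_of_4_questions": 549_973.00,
}
promethium = {
    "people": 1,
    "questions_per_year": 40_000,
    "cost_of_4_questions": 7.30,
}

# Plain ratios between the two columns.
people_ratio = today["people"] / promethium["people"]
question_ratio = (
    promethium["questions_per_year"] / today["questions_per_year"]
)
cost_ratio = (
    today["cost_of_4_questions"] / promethium["cost_of_4_questions"]
)

print(f"People:    {people_ratio:.0f}x fewer")
print(f"Questions: {question_ratio:,.0f}x more")
print(f"Cost:      {cost_ratio:,.0f}x lower")
```

The cost and question ratios land close to the slide's rounded "~ 75,000X" and "10,000X" figures.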

SLIDE 20

AI-DRIVEN APPROACH WITH PROMETHIUM

Ask a Question (NLP) / Discover / Prep / Execute

SLIDE 21

Thank you!