Airflow the perfect match in our Analytics Pipeline - Sergio Camilo Fandiño Hernández



SLIDE 1

Airflow the perfect match in our Analytics Pipeline

Sergio Camilo Fandiño Hernández
Senior Business Intelligence Architect @LOVOO
FROM AIRFLOW IMPORT DAG
SLIDE 2 AGENDA
  • 1. Why we met?
  • 2. How we met?
  • 3. The first date!
  • 4. Fun dates!
  • 5. Is there any dynamic in between?
  • 6. Recap and conclusion
SLIDE 3 BEFORE THE ACTION STARTS

About

  • LOVOO is a dating and social app and the place for chatting, live streaming, watching streams and getting to know people.

  • Germany - Dresden & Berlin - 2011
  • Acquired by The Meet Group (NASDAQ:MEET) in 2017
  • Top 3 Dating App in Europe
  • + 280 TB of Data
  • ~ 6 TB Monthly Growth
  • + 3 TB daily total aggregated data
  • + 36 TB Swipes (162,824,303,474)

LOVOO

SLIDE 4 THE TEAM

Analytics

  • 1 Head
  • 6 Data Analysts
  • 2 BI Architects 

  • Product
  • Finance
  • Marketing
  • Talent Management
  • Customer Insights
  • CRM

SLIDE 5 WILL IT BE TOO TECHNICAL? WAIT…

What can you expect?

My main purpose today is to tell you about our journey with Airflow as well as a few different use cases that could also boost the work of your Analytics/BI team on a daily basis.

  • Pieces of code (examples)
  • Way too many screenshots
SLIDE 7 OUR LAST DATE…

On-premise

SLIDE 8 THE COOL KIDS…

We went Cloud

SLIDE 9 THE PROFILE DETAILS…

Data Processing

Backend EU-Bridge Data Loading

Pub-Sub, Google Kubernetes, BigQuery, Cloud Storage, Google Sheets, Google Firebase, Airflow Composer, Payment Providers, Appsumer, Adjust, CRM, etc…
SLIDE 10 WHAT REALLY MATTERS…

Analytics Data-Core

BigQuery, Cloud Storage, Google Sheets, Google Firebase, Airflow Composer, Payment Providers, Appsumer, Adjust, Redshift, etc…
SLIDE 12 LEFT SWIPING…

Orchestration Tool

  • Identify what is out there
  • Costs?
  • Scalability?
  • Data sources compatibility?
  • Knowledge/Human Resources?
SLIDE 13 RIGHT SWIPED…

Airflow

  • Great community
  • Game changer
  • Mobile App
  • Python
  • BigQuery
SLIDE 14 A GOOD FIT…

Google Cloud Composer

  • Fully Managed Airflow
  • Scalable
  • IAP - Secure
  • Focus on building the Analytics data pipeline
  • Ease of implementation
SLIDE 15 NO RISK, NO FUN…

Google Cloud Composer

  • Fully Managed Airflow
  • Scalable
  • IAP - Secure
  • Focus on building the Analytics data pipeline
  • Ease of implementation
SLIDE 17 BREAKING THE ICE…

TODO List

  • SQL Scripts -> Data Modeling
  • DAGs
  • Permissions - Service Accounts
  • Data Importers
  • Create a Composer Environment
  • How do we deploy? -> CI/CD
SLIDE 18 GROWING TOGETHER!

CI/CD

Cloud Storage Cloud Composer Version Control

DAGs.py SQL Importers

Cloud Build

Trigger

YAML

Slack

Checks Passed
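
The slide does not spell out what sits behind "Checks Passed". As a minimal sketch, one check a Cloud Build step could run before DAG files are copied to the Composer bucket is a plain syntax check. The function name `find_broken_dag_files` and the idea of checking syntax only are assumptions for illustration, not the setup shown in the talk.

```python
import ast
from pathlib import Path


def find_broken_dag_files(dag_dir):
    """Return (file name, error message) pairs for .py files that fail to parse."""
    broken = []
    for path in sorted(Path(dag_dir).rglob("*.py")):
        try:
            # ast.parse only checks syntax; it does not import or run the DAG file.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as err:
            broken.append((path.name, err.msg))
    return broken
```

A build step could call this on the repository's DAG folder and exit with a non-zero status when the returned list is not empty, so that a broken file never reaches the environment.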

SLIDE 19 GROWING TOGETHER!

CI/CD

Cloud Storage Cloud Composer Version Control

DAGs.py SQL Importers

Cloud Build

Trigger

YAML

Slack
SLIDE 21 WHAT DOES IT LOOK LIKE?

DAGs

  • 26 DAGs
  • Sub-DAGs
  • Branching
  • Jinja Templating
  • Hooks
  • Pools
  • Trigger rules

Operators

SLIDE 22 PRETTY ON THE OUTSIDE…

The Core

Sub DAGs Analytics - Workflow

SLIDE 23 PRETTY ON THE INSIDE…

The Core

Sub DAG

SLIDE 24 COMMUNICATION IS VITAL…

Reports!

Slack Webhook
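
The report itself is only shown as a screenshot. A minimal sketch of sending a text report through a Slack incoming webhook, using only the standard library; the report title, the metric names, and both function names are hypothetical.

```python
import json
import urllib.request


def build_report_payload(title, metrics):
    """Format a dict of metric name -> value as a Slack message payload."""
    lines = [f"*{title}*"] + [f"• {name}: {value}" for name, value in metrics.items()]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON to a Slack incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside a DAG, `post_to_slack` could be the callable of a PythonOperator that runs after the reporting queries have finished.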
SLIDE 25 COMMUNICATION IS VITAL…

Tableau Extracts

SLIDE 26 COMMUNICATION IS VITAL…

Is Airflow finished?

by the way, this is branching…
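
The DAG on this slide is a screenshot. A minimal sketch of branching, assuming Airflow 2.x import paths: a BranchPythonOperator runs a callable that returns the `task_id` of the branch to follow. The DAG id, the task ids, and the "failed task count" input are hypothetical.

```python
from datetime import datetime


def choose_notification(failed_task_count):
    """Branch callable: return the task_id of the branch that should run."""
    return "notify_finished" if failed_task_count == 0 else "notify_delayed"


def say(message):
    print(message)


def build_dag():
    """Wire the branch into a DAG. Airflow is imported here so the file loads without it."""
    from airflow import DAG
    from airflow.operators.python import BranchPythonOperator, PythonOperator

    with DAG(dag_id="is_airflow_finished", start_date=datetime(2020, 7, 1), catchup=False) as dag:
        branch = BranchPythonOperator(
            task_id="check_run",
            python_callable=choose_notification,
            op_kwargs={"failed_task_count": 0},  # placeholder input for the sketch
        )
        finished = PythonOperator(
            task_id="notify_finished", python_callable=say, op_kwargs={"message": "finished"}
        )
        delayed = PythonOperator(
            task_id="notify_delayed", python_callable=say, op_kwargs={"message": "delayed"}
        )
        # Only the task whose id the callable returned runs; the other one is skipped.
        branch >> [finished, delayed]
    return dag
```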

SLIDE 27 COMMUNICATION IS VITAL…

Is Airflow finished?

by the way, this is branching…

SLIDE 28 BECAUSE SH!] HAPPENS!

Error Alerting
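
The slide shows the alert only as a screenshot. A minimal sketch of a failure callback: Airflow calls `on_failure_callback` with a context dictionary that includes the task instance and the exception. Where the message is sent is not stated in the talk, so the `print` below is a placeholder and both function names are hypothetical.

```python
def format_failure_alert(context):
    """Build an alert text from the context Airflow passes to on_failure_callback."""
    ti = context["task_instance"]
    return (
        f"Task failed: {ti.dag_id}.{ti.task_id}\n"
        f"Error: {context.get('exception')}\n"
        f"Log: {ti.log_url}"
    )


def alert_on_failure(context):
    """Callback to register as on_failure_callback on a task or in default_args."""
    message = format_failure_alert(context)
    print(message)  # placeholder: replace with the team's alert channel
    return message
```

Registering it once through `default_args={"on_failure_callback": alert_on_failure}` applies it to every task in the DAG.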

SLIDE 29 BEING FLEXIBLE IS A BIG FLEX!

Integrating Data Sources

this code belongs to the DAG.py file
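
The code on this slide is a screenshot and did not survive extraction. A minimal sketch of what wiring an importer into a DAG file can look like, assuming Airflow 2.x import paths; the DAG id, the task id, and the `run_import` entry point are hypothetical.

```python
from datetime import datetime


def run_import(run_date):
    """Placeholder for the importer entry point; run_date arrives as YYYY-MM-DD."""
    print(f"importing data for {run_date}")
    return run_date


def build_dag():
    """Airflow is imported here so the sketch loads without it installed."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(dag_id="import_example_source", start_date=datetime(2020, 7, 1), catchup=False) as dag:
        PythonOperator(
            task_id="import_example_source",
            python_callable=run_import,
            # op_kwargs is a templated field, so "{{ ds }}" is rendered to the run date.
            op_kwargs={"run_date": "{{ ds }}"},
        )
    return dag
```

In a real DAG file the result has to be bound at module level, for example `dag = build_dag()`, because Airflow discovers DAG objects through the module's global names.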

SLIDE 30

Integrating Data Sources

this code belongs to the DAG.py file

BEING FLEXIBLE IS A BIG FLEX!
SLIDE 31

Integrating Data Sources

this code belongs to the importer.py file

BEING FLEXIBLE IS A BIG FLEX!
SLIDE 32

Integrating Data Sources

this pseudo-code belongs to the importer.py file
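
The pseudo-code itself is not in the transcript. A minimal sketch of an importer's shape, assuming it extracts one day of records and writes them as newline-delimited JSON, a format BigQuery load jobs accept. The function names, the `example_source/` path, and the injected `fetch_records` and `write_output` callables are all hypothetical.

```python
import json


def to_ndjson(records):
    """Serialize dict records as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def import_day(fetch_records, write_output, run_date):
    """Extract one day of records, serialize them, and hand them to a writer.

    fetch_records(run_date) yields dicts; write_output(path, text) stores the file.
    Returns the number of records written.
    """
    records = list(fetch_records(run_date))
    write_output(f"example_source/{run_date}.json", to_ndjson(records))
    return len(records)
```

Passing the reader and the writer in as arguments keeps the importer testable without network access; in production they would wrap the source API and the Cloud Storage upload.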

BEING FLEXIBLE IS A BIG FLEX!
SLIDE 33

Integrating Data Sources

2 Tables - 2 Days -> ELT in BQ

BEING FLEXIBLE IS A BIG FLEX!
SLIDE 34

Data Importers

  • Redshift
  • Firebase (very dynamic)
  • Google Cloud Storage (Adjust, Merger)
  • Appsumer, Shopify, Paypal, AppStore, Adyen
  • S3 Storage
SCHEDULING CUSTOM CODE
SLIDE 36 YES, VERY DYNAMIC…

Creating Tasks Dynamically

SLIDE 37 YES, VERY DYNAMIC…

Creating Tasks Dynamically

  • 1. Create a plain-text file with a meaningful structure
  • 2. Create a task based on a PythonOperator
  • 3. Define and write your Callable (your custom code)
SLIDE 38 YES, VERY DYNAMIC…

Creating Tasks Dynamically

JSON File
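
The JSON file on the slide is a screenshot. A minimal sketch of what such a file could contain and how to load it with a basic sanity check; the keys `task_id`, `source_table`, and `destination_table` and the example values are assumptions, not the structure used in the talk.

```python
import json

# Hypothetical file content: one entry per task to generate.
EXAMPLE_CONFIG = """
[
  {"task_id": "import_events", "source_table": "events", "destination_table": "raw.events"},
  {"task_id": "import_users", "source_table": "users", "destination_table": "raw.users"}
]
"""

REQUIRED_KEYS = {"task_id", "source_table", "destination_table"}


def load_task_configs(text):
    """Parse the JSON text and check that each entry has the required keys."""
    configs = json.loads(text)
    for entry in configs:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{entry.get('task_id', '<unnamed>')} is missing {sorted(missing)}")
    return configs
```

Failing early on a malformed entry keeps a typo in the file from silently dropping a task from the DAG.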

SLIDE 39 YES, VERY DYNAMIC…

Creating Tasks Dynamically

this code belongs to the DAG.py file
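
The DAG code on this slide is a screenshot. A minimal sketch of creating one PythonOperator per configuration entry, assuming Airflow 2.x import paths; the inline `TASK_CONFIGS` list stands in for the parsed file, and the DAG id, task ids, and `import_table` callable are hypothetical.

```python
from datetime import datetime

# Stand-in for the parsed configuration file.
TASK_CONFIGS = [
    {"task_id": "import_events", "source_table": "events"},
    {"task_id": "import_users", "source_table": "users"},
]


def import_table(source_table):
    """Placeholder callable: the custom import code goes here."""
    print(f"importing {source_table}")
    return source_table


def operator_kwargs(config):
    """Translate one config entry into PythonOperator keyword arguments."""
    return {
        "task_id": config["task_id"],
        "python_callable": import_table,
        "op_kwargs": {"source_table": config["source_table"]},
    }


def build_dag(configs=TASK_CONFIGS):
    """Airflow is imported here so the sketch loads without it installed."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(dag_id="dynamic_importers", start_date=datetime(2020, 7, 1), catchup=False) as dag:
        for config in configs:
            # Operators created inside the with-block attach to the DAG automatically.
            PythonOperator(**operator_kwargs(config))
    return dag
```

Adding a data source then means adding an entry to the configuration rather than writing a new task by hand.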

SLIDE 40 YES, VERY DYNAMIC…

Creating Tasks Dynamically

this code belongs to the DAG.py file

SLIDE 41 YOUR CODE GOES HERE

Creating Tasks Dynamically

this is your custom code (Pseudo-Code)

SLIDE 42 YOUR CODE GOES HERE

Creating Tasks Dynamically

this is your custom code (Pseudo-Code)

SLIDE 44 DON’T OVERDO IT

Recap and Conclusion

SLIDE 45 IT WAS A MATCH…

Recap and Conclusion

  • Using an Alpha version (Google Cloud Composer) in Production was challenging!
  • Focus on what’s important - Google Cloud Composer
  • Airflow ships with many Operators out of the box
  • Always room for improvement
  • No magic recipe to use - stay flexible
SLIDE 46

Feedback and Questions

LinkedIn:

https://www.linkedin.com/in/fandinohernandez/

Email: sergio.fandino@lovoo.com

Gracias.

LOVOO

July 16, 2020 Berlin - Germany