Airflow: the perfect match in our Analytics Pipeline
FROM AIRFLOW IMPORT DAG Airflow the perfect match in our Analytics Pipeline Sergio Camilo Fandiño Hernández Senior Business Intelligence Architect @LOVOO
AGENDA
1. Why we met?
2. How we met?
3. The first date!
4. Fun dates!
5. Is there any dynamic in between?
6. Recap and conclusion
BEFORE THE ACTION STARTS
About
- LOVOO is a dating and social app and the place for chatting, live streaming, watching streams and getting to know people.
LOVOO
- Germany
- Dresden & Berlin
- 2011
- Acquired by The Meet Group (NASDAQ:MEET) in 2017
- Top 3 Dating App in Europe
- +280 TB of Data
- ~6 TB Monthly Growth
- +3 TB daily total aggregated data
- +36 TB Swipes (162,824,303,474)
THE TEAM
Analytics
- 1 Head
- 6 Data Analysts
- 2 BI Architects

- Product
- Finance
- Marketing
- Talent Management
- Customer Insights
- CRM
WAIT… WILL IT BE TOO TECHNICAL?
What can you expect?
My main purpose today is to tell you about our journey with Airflow as well as a few different use cases that could also boost the work of your Analytics/BI team on a daily basis.
• Pieces of code (examples)
• Way too many screenshots
1. Why we met?
OUR LAST DATE…
On-premise

THE COOL KIDS…
We went Cloud

THE PROFILE DETAILS…
[Architecture diagram. Labels: Data Processing; Data Loading; Airflow Composer; Backend; Google Kubernetes; Google - Firebase; Google Sheets; EU-Bridge; Pub-Sub; BigQuery; Cloud Storage; Payment Providers, Appsumer, Adjust, CRM, etc.]

WHAT REALLY MATTERS…
[Architecture diagram. Labels: Analytics; Airflow Composer; Data-Core; Google - Firebase; Google Sheets; BigQuery; Cloud Storage; Payment Providers, Appsumer, Adjust, Redshift, etc.]
2. How we met?
LEFT SWIPING…
Orchestration Tool
- Identify what is out there
- Costs?
- Scalability?
- Data sources compatibility?
- Knowledge/Human Resources?

RIGHT SWIPED…
Airflow
- Great community
- Game changer
- Mobile App
- Python
- BigQuery

A GOOD FIT…
Google Cloud Composer
- Fully Managed Airflow
- Scalable
- IAP
- Secure
- Focus on building the Analytics data pipeline
- Ease of implementation
NO RISK, NO FUN…
Google Cloud Composer
3. The first date!
BREAKING THE ICE…
TODO List
- SQL Scripts -> Data Modeling
- DAGs
- Permissions
- Service Accounts
- Data Importers
- Create a Composer Environment
- How do we deploy? -> CI/CD
GROWING TOGETHER!
CI/CD
[Pipeline diagram. Labels: Version Control; YAML; DAGs.py; Importers; SQL; Cloud Build; Trigger; Checks; Passed; Cloud Storage; Cloud Composer; Slack]
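The diagram does not say which checks Cloud Build runs before the files reach Cloud Storage. As one plausible example, here is a minimal sketch of a pre-deploy check that parses every Python file in a DAG folder and reports syntax errors. The function name `find_broken_dag_files` and the folder layout are assumptions made for illustration, not the LOVOO setup.

```python
from pathlib import Path


def find_broken_dag_files(dag_folder):
    """Return {file path: error message} for every .py file that fails to parse."""
    broken = {}
    for path in sorted(Path(dag_folder).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # compile() only parses the source; it does not execute it,
            # so this check runs without an Airflow installation.
            compile(source, str(path), "exec")
        except SyntaxError as err:
            broken[str(path)] = f"line {err.lineno}: {err.msg}"
    return broken
```

A CI step could call this function on the repository's DAG folder and fail the build when the returned dictionary is not empty. It only catches syntax errors; import errors or invalid DAG definitions would need a check that actually loads the files.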
4. Fun dates!
WHAT DOES IT LOOK LIKE?
Operators
DAGs
• 26 DAGs
• Sub-DAGs
• Branching
• Jinja Templating
• Hooks
• Pools
• Trigger rules
PRETTY ON THE OUTSIDE…
Analytics - Workflow
The Core
Sub DAGs

PRETTY ON THE INSIDE…
The Core Sub DAG

COMMUNICATION IS VITAL…
Reports!
Slack Webhook
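A report sent through a Slack incoming webhook boils down to an HTTP POST with a JSON body containing a `text` field. The sketch below uses only the standard library; the report fields (`dag_id`, `run_date`, `rows_loaded`) and both function names are hypothetical, since the slide does not show what the reports contain.

```python
import json
import urllib.request


def build_report_payload(dag_id, run_date, rows_loaded):
    """Build the JSON body for a Slack incoming webhook (hypothetical report fields)."""
    lines = [f"*{dag_id}* finished for {run_date}"]
    for table, count in sorted(rows_loaded.items()):
        lines.append(f"- {table}: {count:,} rows")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside a DAG, `post_to_slack` could be the callable of a PythonOperator placed at the end of the workflow, with the webhook URL read from an Airflow Variable or Connection rather than hard-coded.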
COMMUNICATION IS VITAL…
Tableau Extracts

COMMUNICATION IS VITAL…
Is Airflow finished?
by the way, this is branching…
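Branching in Airflow means a task decides at run time which downstream task to follow: a `BranchPythonOperator` runs a callable, follows the task whose `task_id` the callable returns, and skips the other branches. The sketch below is an assumption about what such a check could look like; the task names, the `failed_task_count` input and both function names are hypothetical. Airflow is imported inside the builder function so the decision logic can be tried without an installation, and the import paths assume Airflow 2.x.

```python
def choose_next_task(failed_task_count):
    """Return the task_id of the branch to follow."""
    return "notify_finished" if failed_task_count == 0 else "notify_problems"


def build_branching_dag():
    """Wire the decision into a DAG (requires Airflow 2.x)."""
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import BranchPythonOperator, PythonOperator

    with DAG(dag_id="branching_sketch", start_date=datetime(2020, 1, 1), catchup=False) as dag:
        branch = BranchPythonOperator(
            task_id="is_airflow_finished",
            python_callable=choose_next_task,
            op_kwargs={"failed_task_count": 0},  # placeholder input
        )
        finished = PythonOperator(
            task_id="notify_finished", python_callable=print, op_args=["all done"]
        )
        problems = PythonOperator(
            task_id="notify_problems", python_callable=print, op_args=["check the logs"]
        )
        # Only the task whose id the callable returns runs; the other is skipped.
        branch >> [finished, problems]
    return dag
```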
BECAUSE SH!] HAPPENS!
Error Alerting
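Airflow lets you attach an `on_failure_callback` to a task, or to every task of a DAG through `default_args`. The callback receives the task context, which includes the task instance and the exception. The sketch below shows one way to turn that context into an alert message; the message format and the function names are assumptions, and `print` stands in for whatever alerting channel is used.

```python
def format_failure_alert(context):
    """Turn an Airflow task context into a one-line alert message."""
    ti = context["task_instance"]
    error = context.get("exception")
    return (
        f"Task {ti.dag_id}.{ti.task_id} failed "
        f"(try {ti.try_number}): {error!r}. Log: {ti.log_url}"
    )


def alert_on_failure(context):
    """Failure callback; replace print() with the alerting channel of your choice."""
    print(format_failure_alert(context))


# Passing the callback through default_args applies it to every task of a DAG:
# DAG(dag_id="...", default_args=DEFAULT_ARGS, ...)
DEFAULT_ARGS = {"on_failure_callback": alert_on_failure, "retries": 1}
```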
BEING FLEXIBLE IS A BIG FLEX!
Integrating Data Sources
[Code screenshot: this code belongs to the DAG.py file]
[Code screenshot: this code belongs to the importer.py file]
[Code screenshot: this pseudo-code belongs to the importer.py file]
2 Tables - 2 Days -> ELT in BQ
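The split between an importer.py file and a DAG.py file can be sketched as follows: the importer is plain Python with no Airflow dependency, and the DAG file only schedules it. Everything named here (`import_table`, the `shop` source, the `orders` table, the target naming) is hypothetical and not taken from the original code. The import paths assume Airflow 2.x.

```python
# --- importer.py (hypothetical): plain Python, no Airflow dependency ---
def import_table(source, table, ds):
    """Import one day of one table and return the target partition name."""
    # A real importer would call the source API here and load the result
    # into BigQuery; this sketch only derives the destination name.
    return f"raw_{source}.{table}${ds.replace('-', '')}"


# --- DAG.py (hypothetical): schedules the importer ---
def build_import_dag():
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(dag_id="import_sketch", start_date=datetime(2020, 1, 1), catchup=False) as dag:
        PythonOperator(
            task_id="import_orders",
            python_callable=import_table,
            # op_kwargs is a templated field, so "{{ ds }}" is rendered by
            # Jinja to the run's date (YYYY-MM-DD) before the callable runs.
            op_kwargs={"source": "shop", "table": "orders", "ds": "{{ ds }}"},
        )
    return dag
```

Keeping the importer free of Airflow imports means it can be run and tested on its own, which is one reason to separate the two files.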
SCHEDULING CUSTOM CODE
Data Importers
• Redshift
• Firebase (very dynamic)
• Google Cloud Storage (Adjust, Merger)
• Appsumer, Shopify, PayPal, AppStore, Adyen
• S3 Storage
5. Is there any dynamic in between?
YES, VERY DYNAMIC…
Creating Tasks Dynamically
1. Create a plain-text file with a meaningful structure
2. Create a task based on a PythonOperator
3. Define and write your callable (your custom code)
[Screenshot: JSON File]
[Code screenshot: this code belongs to the DAG.py file]

YOUR CODE GOES HERE
Creating Tasks Dynamically
[Code screenshot: this is your custom code (Pseudo-Code)]
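The three steps for creating tasks dynamically can be sketched as below. The JSON structure, the task names and the function names are assumptions for illustration only; the original JSON file and code are not preserved here. The JSON is inlined as a string so the sketch is self-contained, and the import paths assume Airflow 2.x.

```python
import json

# Step 1: plain text with a meaningful structure (hypothetical content).
TASKS_JSON = """
[
  {"task_id": "import_events", "dataset": "firebase", "table": "events"},
  {"task_id": "import_installs", "dataset": "adjust", "table": "installs"}
]
"""


def load_task_specs(raw_json):
    """Parse the JSON text and check that every entry has the required keys."""
    specs = json.loads(raw_json)
    for spec in specs:
        missing = {"task_id", "dataset", "table"} - set(spec)
        if missing:
            raise ValueError(f"{spec} is missing {sorted(missing)}")
    return specs


# Step 3: the callable, i.e. your custom code.
def run_import(dataset, table):
    print(f"importing {dataset}.{table}")
    return f"{dataset}.{table}"


# Step 2: one PythonOperator per entry in the file.
def build_dynamic_dag(specs):
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(dag_id="dynamic_sketch", start_date=datetime(2020, 1, 1), catchup=False) as dag:
        for spec in specs:
            PythonOperator(
                task_id=spec["task_id"],
                python_callable=run_import,
                op_kwargs={"dataset": spec["dataset"], "table": spec["table"]},
            )
    return dag
```

In a DAG file, a module-level line such as `dag = build_dynamic_dag(load_task_specs(TASKS_JSON))` makes the DAG visible to the scheduler. Adding a task then only requires a new entry in the JSON.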
6. Recap and conclusion
DON’T OVERDO IT
Recap and Conclusion

IT WAS A MATCH…
Recap and Conclusion
- Using an Alpha version (Google Composer) in Production was challenging!
- Focus on what’s important - Google Cloud Composer
- Airflow leverages a bunch of Operators OOTB
- Always room for improvement
- No magic recipe to use - stay flexible