Routing Trillions of Events Per Day @Twitter
#ApacheBigData 2017
Lohit VijayaRenu & Gary Steelman @lohitvijayarenu @efsie
In this talk
1. Event Logs at Twitter
2. Log Collection
3. Log Processing
4. Log Replication
5. The Future
6. Questions
Life of an Event
○ Events are logged by clients into a Category
○ Stored on the Hadoop Distributed File System, bucketed every hour into separate directories (see the sketch below)
  ○ /logs/ads_view/2017/05/01/23
  ○ /logs/login_event/2017/05/01/23
[Diagram: client daemons and an HTTP endpoint send events, aggregated by Category, into HDFS storage]
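To make the hourly bucketing concrete, here is a small Java sketch that maps a category and an event timestamp to its hourly directory; the class and method names are illustrative, not Twitter's code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative sketch of the hourly bucketing convention shown above:
// an event's category and timestamp map to a directory like /logs/ads_view/2017/05/01/23.
public class HourlyBucket {
    private static final DateTimeFormatter HOUR_DIR =
        DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

    static String directoryFor(String category, Instant eventTime) {
        return "/logs/" + category + "/" + HOUR_DIR.format(eventTime);
    }

    public static void main(String[] args) {
        // 2017-05-01T23:15:00Z falls into the 23:00 hour bucket
        System.out.println(directoryFor("ads_view", Instant.parse("2017-05-01T23:15:00Z")));
        // -> /logs/ads_view/2017/05/01/23
    }
}
```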
Event Log Stats
○ Trillion events a day, across millions of clients
○ Categories: event groups by category
○ Nodes: collocated with HDFS datanodes
○ Incoming data is uncompressed
Event Log Architecture
[Diagram, inside a data center: Clients (and remote clients over HTTP) → local log collection daemon → aggregators that group log events by Category → Storage (HDFS) → Log Processor → Storage (HDFS) → Log Replicator → Storage (HDFS) and Storage (Streaming)]
Event Log Architecture
[Diagram, across data centers: events land in RT Storage (HDFS) inside DC1 and inside DC2; from there data flows to DW Storage, Prod Storage, and Cold Storage HDFS clusters]
Event Collection Overview
○ Past: Scribe client daemon → Scribe aggregator daemons
○ Present: Scribe client daemon → Flume aggregator daemons
○ Future: Flume client daemon → Flume aggregator daemons
Event Collection: Past
Challenges with Scribe
○ 600 categories × 1500 aggregators × 6 per hour ≈ 5.4M files per hour
Event Collection: Present
Apache Flume
○ Pluggable Source, Channel, and Sink interfaces (see the sketch below)
○ Flume Agent: Client → Source → Channel → Sink → HDFS
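To illustrate the pluggable interfaces, below is a minimal custom Sink written against Flume's public Sink API. It only prints event bodies and stands in for a real sink such as the HDFS sink used in the production path.

```java
import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Sink;
import org.apache.flume.Transaction;
import org.apache.flume.sink.AbstractSink;

// Toy sink showing the Source -> Channel -> Sink contract: take one event
// from the channel inside a transaction and deliver it (here, just print it).
public class LoggingSink extends AbstractSink {
    @Override
    public Sink.Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                tx.commit();
                return Sink.Status.BACKOFF;   // nothing to deliver right now
            }
            System.out.println(new String(event.getBody()));
            tx.commit();
            return Sink.Status.READY;
        } catch (Exception e) {
            tx.rollback();                    // leave the event in the channel for retry
            throw new EventDeliveryException(e);
        } finally {
            tx.close();
        }
    }
}
```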
Event Collection: Present
Category Group
○ Combine multiple categories into a category group
○ Configure properties per group
○ Aggregate a group's events together to generate fewer combined sequence files
[Diagram: Agents 1, 2, and 3 send Category 1, 2, and 3 as one Category Group; category groups map onto Aggregator Group 1 and Aggregator Group 2]
Event Collection: Present
Aggregator Group
○ A group of aggregators hosting the same set of category groups
○ Each group of aggregators hosts a subset of categories
[Diagram: Agents 1 through 8 route their category groups to Aggregator Group 2]
Event Collection: Present
Flume features to support groups
○ One channel per category group
Event Collection: Present
Flume performance improvements
○ Absorb traffic spikes on the memory channel with SpillableMemoryChannel
Log Processor Stats: Processing Trillion Events per Day
○ Wall clock hours: to process one day of data
○ Data per day: output of cleaned, compressed, consolidated, and converted Flume sequence files
○ Disk space: saved by processing Flume sequence files
Log Processor Needs: Processing Trillion Events per Day
○ Multiple pre-processing steps on the same data sets
Log Processor Steps (Datacenter 1)
[Diagram: Category Groups (ads_group/yyyy/mm/dd/hh, login_group/yyyy/mm/dd/hh) → Demux Jobs (ads_group_demuxer, login_group_demuxer) → Categories (ads_click/yyyy/mm/dd/hh, ads_view/yyyy/mm/dd/hh, login_event/yyyy/mm/dd/hh)]
Log Processor Steps
1. Decode: Base64 encoding from logged data
2. Demux: category groups into individual categories for easier consumption by analytics teams
3. Clean: corrupt, empty, or invalid records so data sets are more reliable
4. Compress: logged data to the highest level to save disk space (from LZO level 3 to LZO level 7)
5. Consolidate: small files to reduce pressure on the NameNode (sketched below)
6. Convert: some categories into Parquet for fastest use in ad-hoc exploratory tools
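As one illustration, the "Consolidate" step could be expressed as an identity MapReduce pass whose reducer count bounds the number of output files. This is a hedged sketch, not Twitter's actual consolidation job; the <Text, BytesWritable> record types and the reducer count of 4 are assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Identity pass over one hour of small sequence files: records flow through the
// default (identity) mapper and reducer unchanged, and the number of reducers
// caps the number of output files, easing NameNode pressure.
public class ConsolidateJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "consolidate-hour");
        job.setJarByClass(ConsolidateJob.class);

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // directory of many small files
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // consolidated output directory

        // Assumption: the hourly files are <Text, BytesWritable> sequence files.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(BytesWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setNumReduceTasks(4);                                 // a handful of reducers -> a handful of files

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```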
Why Base64 Decoding? Legacy Choices
○ Newline is the record delimiter
○ One serialized Thrift object per binary blob, so blobs are Base64-encoded to keep binary data from clashing with the delimiter
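A minimal sketch of reading such a record, assuming newline-delimited Base64 text wrapping binary Thrift; the generic helper below is illustrative, and the Thrift-generated class for each category's schema is supplied by the caller.

```java
import java.util.Base64;
import org.apache.thrift.TBase;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;

// Reading a legacy record means Base64-decode, then Thrift-decode.
public final class Base64ThriftRecords {

    // 'empty' is a freshly constructed instance of the Thrift-generated class
    // for the category's schema; it is filled in from the decoded blob.
    public static <T extends TBase<?, ?>> T decode(String line, T empty) throws TException {
        byte[] blob = Base64.getDecoder().decode(line.trim());   // undo the Base64 wrapping
        TDeserializer deserializer = new TDeserializer(new TBinaryProtocol.Factory());
        deserializer.deserialize(empty, blob);                   // binary Thrift -> populated object
        return empty;
    }
}
```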
Log Demux Visual
/raw/ads_group/yyyy/mm/dd/hh/ads_group_1.seq → DEMUX → /logs/ads_click/yyyy/mm/dd/hh/1.lzo, /logs/ads_view/yyyy/mm/dd/hh/1.lzo
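A hedged sketch of the demux idea as a map-only MapReduce job: records from a category-group sequence file are routed to per-category output paths with MultipleOutputs. The assumption that the record key carries the category name is illustrative, not necessarily how Twitter's demuxer identifies categories.

```java
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Routes each <Text, BytesWritable> record of a category-group file to its
// category's own subdirectory of the job output.
public class DemuxMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {

    private MultipleOutputs<Text, BytesWritable> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        String category = key.toString();            // assumption: key is e.g. "ads_click" or "ads_view"
        // Records for one category land under their own output subdirectory.
        outputs.write(key, value, category + "/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        outputs.close();
    }
}
```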
Log Processor Daemon
○ Input: log events that Flume aggregates into sequence files
○ Launches processing jobs (MapReduce, Spark, etc.) from a thread pool
○ Corrupt sequence files that can't be read slow down job completion time
Why Tez? Dynamic Partitioning
○ Allows large partitions to be further partitioned, so multiple tasks process events for a single category instead of one task, improving processing times
○ More info at TEZ-3209
○ Thanks to team member Ming Ma for the contribution!
Typical Partitioning Visual
[Diagram: Input File 1 → Task 1, Task 2, Task 3]
Hash Partitioning Visual
[Diagram: Input File 1 → Task 1, Task 2, Task 3, Task 4, Task 5]
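The sketch below captures the idea behind the hash-partitioning visual, not the TEZ-3209 implementation: instead of sending a whole category to a single task, records are salted so one hot category spreads across several partitions. The fan-out of 4 is an assumption.

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Spreads records of one category across up to SPLITS_PER_CATEGORY partitions
// by salting the category's base partition with a hash of the record contents.
public class CategorySpreadingPartitioner extends Partitioner<Text, BytesWritable> {
    private static final int SPLITS_PER_CATEGORY = 4;    // assumption: fan-out per category

    @Override
    public int getPartition(Text category, BytesWritable record, int numPartitions) {
        int base = (category.hashCode() & Integer.MAX_VALUE) % numPartitions;
        int salt = (record.hashCode() & Integer.MAX_VALUE) % SPLITS_PER_CATEGORY;
        return (base + salt) % numPartitions;             // one category -> up to 4 partitions
    }
}
```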
Log Replication Stats: Replicating Trillion Events per Day
○ Copy jobs per day: across all analytics clusters
○ PB of data per day: replicated to analytics clusters
○ Analytics clusters
Log Replication Needs: Replicating Trillion Events per Day
○ Cross-data center reads are incredibly expensive
○ Cross-rack reads within a data center are still expensive
Log Replication Visual
[Diagram: replication jobs (ads_click_repl, ads_view_repl, login_event_repl) copy hourly category directories (ads_click/yyyy/mm/dd/hh, ads_view/yyyy/mm/dd/hh, login_event/yyyy/mm/dd/hh) from Datacenter 1 through Datacenter N into the target analytics cluster]
Log Replication Steps
1. Copy: logged data from all processing clusters to the target cluster.
2. Merge: copied data into one directory.
3. Present: data atomically by renaming it to an accessible location (sketched below).
4. Publish: metadata to the Data Abstraction Layer to notify analytics teams data is ready for consumption.
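A minimal sketch of the "Present" step, assuming the merged copy first lands in a hidden staging directory and is then exposed with a single HDFS rename; the paths are illustrative, not Twitter's actual layout.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Exposes one hour of merged data atomically: rename() is a single namespace
// operation on HDFS, so readers either see the whole hour or none of it.
public class PresentStep {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path staging = new Path("/logs/.staging/ads_click/2017/05/01/23");  // merged copy lands here
        Path live    = new Path("/logs/ads_click/2017/05/01/23");           // readers only query this path

        // Assumes the live hour directory does not already exist.
        if (!fs.rename(staging, live)) {
            throw new IllegalStateException("Failed to present " + live);
        }
    }
}
```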
Log Replicator Daemon: Distributing Trillion Events per Day
○ Copies data to the analytics clusters where analytics users run queries and jobs
○ Handles source files that can't be read
Future of Log Management
Questions?