How Cloudflare analyzes >1m DNS queries per second (Tom Arnfeld)



SLIDE 1

How Cloudflare analyzes >1m DNS queries per second

Tom Arnfeld (and Marek Vavrusa)

SLIDE 2

100+ Data centers globally

2.5B Monthly unique visitors

>10% Internet requests every day

~3M DNS queries/second

6M+ Websites, apps & APIs in 150 countries

5M+ HTTP requests/second

SLIDE 3

Anatomy of a DNS query

$ dig www.cloudflare.com

; <<>> DiG 9.8.3-P1 <<>> www.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36582
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cloudflare.com.        IN    A

;; ANSWER SECTION:
www.cloudflare.com.    5    IN    A    198.41.215.162
www.cloudflare.com.    5    IN    A    198.41.214.162

;; Query time: 34 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Sep 2 10:48:30 2017
;; MSG SIZE rcvd: 68

30+ Fields
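The dig output above exposes many of those fields, most of which live in the fixed 12-byte DNS message header. As a rough illustration (not Cloudflare's code), here is a minimal sketch that packs and unpacks that header with Python's struct module; the field layout follows RFC 1035.

```python
import struct

def build_dns_header(query_id, rd=1, qdcount=1):
    """Pack the 12-byte DNS header (RFC 1035, section 4.1.1)."""
    flags = rd << 8  # QR=0 (query), OPCODE=0 (standard QUERY), RD bit set
    return struct.pack('!HHHHHH', query_id, flags, qdcount, 0, 0, 0)

def parse_dns_header(data):
    """Unpack the header fields back out of the wire format."""
    query_id, flags, qd, an, ns, ar = struct.unpack('!HHHHHH', data[:12])
    return {
        'id': query_id,
        'qr': (flags >> 15) & 1,        # 0 = query, 1 = response
        'opcode': (flags >> 11) & 0xF,  # 0 = standard QUERY
        'rd': (flags >> 8) & 1,         # recursion desired
        'rcode': flags & 0xF,           # 0 = NOERROR
        'qdcount': qd, 'ancount': an, 'nscount': ns, 'arcount': ar,
    }

# Same query id as the dig output above.
header = build_dns_header(36582)
parsed = parse_dns_header(header)
```

The question and answer sections (names, types, TTLs, addresses) follow the header and add the rest of the 30+ loggable fields.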

SLIDE 4

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics

SLIDE 5

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Log messages are serialized with Cap'n Proto
  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics
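The receive-and-demultiplex step can be sketched in miniature. This is a hypothetical illustration, not the actual log forwarder: the record layout and the 'service' field are invented, and an in-memory dict stands in for a Kafka producer.

```python
from collections import defaultdict

# Stand-in for a Kafka producer: topic name -> list of buffered messages.
topics = defaultdict(list)

def demux(log_record):
    """Route a decoded log record to a per-service topic (sketch)."""
    # 'service' is an assumed field name; real records are Cap'n Proto messages.
    topic = 'logs.%s' % log_record['service']
    topics[topic].append(log_record)

for record in [{'service': 'dns', 'qname': 'www.cloudflare.com'},
               {'service': 'http', 'path': '/'},
               {'service': 'dns', 'qname': 'api.cloudflare.com'}]:
    demux(record)
```

Keeping one topic per edge service lets each downstream consumer subscribe only to the log stream it cares about.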

SLIDE 6

What did we want?

  • Multidimensional query analytics
  • Complex ad-hoc queries
  • Capable of current and expected future scale
  • Gracefully handle late arriving log data
  • Roll-ups/aggregations for long term storage
  • Highly available and replicated architecture

~3M Queries per second

100+ Edge points of presence

20+ Query dimensions

5+ Years of stored aggregation

SLIDE 7

Kafka, Apache Spark and Parquet

Download and filter data from Kafka using Apache Spark; convert into Parquet and write to HDFS

  • Scanning the firehose is slow, and adding filters is time-consuming
  • Offline analysis is difficult with large amounts of data
  • Not a fast or friendly user experience
  • Doesn't work for customers

SLIDE 8

Let’s aggregate everything... with streams

Raw logs:

  Timestamp            QName               QType  RCODE
  2017/01/01 01:00:00  www.cloudflare.com  A      NODATA
  2017/01/01 01:00:01  api.cloudflare.com  AAAA   NOERROR

Aggregated:

  Time Bucket       QName               QType  RCODE    Count  p50 Response Time
  2017/01/01 01:00  www.cloudflare.com  A      NODATA   5      0.4876ms
  2017/01/01 01:00  api.cloudflare.com  AAAA   NOERROR  10     0.5231ms
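The transformation from the first table to the second can be sketched in a few lines of Python. This is an illustrative toy, not the production pipeline; the rows and response times are invented.

```python
from collections import defaultdict
from statistics import median

# Invented raw rows: (timestamp, qname, qtype, rcode, response_time_ms)
raw = [
    ('2017/01/01 01:00:00', 'www.cloudflare.com', 'A',    'NODATA',  0.48),
    ('2017/01/01 01:00:01', 'api.cloudflare.com', 'AAAA', 'NOERROR', 0.52),
    ('2017/01/01 01:00:30', 'www.cloudflare.com', 'A',    'NODATA',  0.50),
]

# Group by (minute bucket, qname, qtype, rcode).
groups = defaultdict(list)
for ts, qname, qtype, rcode, rt in raw:
    bucket = ts[:16]  # truncate seconds: '2017/01/01 01:00:00' -> '2017/01/01 01:00'
    groups[(bucket, qname, qtype, rcode)].append(rt)

# One aggregate row per group: count and p50 response time.
aggregates = {key: (len(times), median(times)) for key, times in groups.items()}
```

In practice the quantile would be computed from a streaming sketch rather than by buffering every response time, but the bucketing and grouping logic is the same.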

SLIDE 9

Let’s aggregate everything... with streams

  • Counters
    • Total number of queries
    • Query types
    • Response codes
  • Top-n query names
  • Top-n query sources
  • Response time/size quantiles
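A toy version of the top-n counters, using exact counting (invented sample data):

```python
from collections import Counter

# Invented sample of observed query names.
qnames = ['www.cloudflare.com', 'api.cloudflare.com', 'www.cloudflare.com',
          'blog.cloudflare.com', 'www.cloudflare.com', 'api.cloudflare.com']

# Exact top-n by count.
top2 = Counter(qnames).most_common(2)
# top2 == [('www.cloudflare.com', 3), ('api.cloudflare.com', 2)]
```

At millions of queries per second, keeping an exact counter per distinct name is impractical, which is why approximate top-n structures such as SpaceSaving (one of the ClickHouse contributions mentioned later in this deck) are used instead.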
SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17

Aggregating with Spark Streaming

Produce low-cardinality aggregates with Spark Streaming

  • Spark experience in-house, though Java/Scala
  • Batch-oriented, and a DB is still needed to serve online queries
  • Difficult to support ad-hoc analysis
  • Low resolution aggregates
  • Scanning raw data is slow
  • Late arriving data

SLIDE 18


SLIDE 19

Spark Streaming + CitusDB

Produce low-cardinality aggregates with Spark Streaming; insert aggregate rows into a CitusDB cluster for reads

  • Distributed time-series DB
  • Existing deployments of CitusDB
  • High cardinality aggregations are tricky due to insert performance
  • Late arriving data
  • SQL API

SLIDE 20

Apache Flink + (CitusDB?)

Produce low-cardinality aggregates with Flink; insert aggregate rows into a CitusDB cluster for reads

  • Dataflow API and support for stream watermarks
  • Checkpoint performance issues
  • High cardinality aggregations are tricky due to insert performance
  • SQL API

SLIDE 21

Druid

Insert into a cluster of Druid nodes

  • Insertion rate couldn't keep up in our initial tests
  • Estimated costs of a suitable cluster were very high
  • Seemed performant for random reads, but not the best we'd seen
  • Operational complexity seemed high

SLIDE 22

Let’s aggregate everything... with streams

  • Raw data isn't easily queried ad-hoc
  • Backfilling new aggregates is impossible, or can be very difficult without custom tools
  • A stream can't serve actual queries
  • Can be costly for high cardinality dimensions

*https://clickhouse.yandex/docs/en/introduction/what_is_clickhouse.html
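The cost of high-cardinality dimensions is easy to quantify: in the worst case, pre-aggregation keeps one row per combination of dimension values per time bucket, i.e. the product of the per-dimension cardinalities. With hypothetical numbers:

```python
# Hypothetical per-dimension cardinalities for a single time bucket.
cardinalities = {
    'qname': 1_000_000,  # distinct query names
    'qtype': 50,         # distinct query types
    'rcode': 20,         # distinct response codes
}

# Worst case: one aggregate row per combination of dimension values.
worst_case_rows = 1
for count in cardinalities.values():
    worst_case_rows *= count
# Up to a billion aggregate rows per bucket -- potentially more
# rows than the raw data the aggregation was meant to shrink.
```

This is why pre-aggregating "everything" with streams breaks down once a dimension like the query name enters the group-by key.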

SLIDE 23

ClickHouse

  • Tabular, column-oriented data store
  • Single binary, clustered architecture
  • Familiar SQL query interface
  • Lots of very useful built-in aggregation functions
  • Raw log data stored for 3 months (~7 trillion rows)
  • Aggregated data stored indefinitely (1m and 1h aggregations across 3 dimensions)
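The 1m/1h roll-ups rely on the aggregates being mergeable: additive values such as counts can simply be re-summed from fine buckets into coarser ones. A minimal sketch with invented data:

```python
from collections import defaultdict

# Invented 1-minute counts keyed by (minute bucket, qname).
minute_counts = {
    ('2017/01/01 01:00', 'www.cloudflare.com'): 5,
    ('2017/01/01 01:30', 'www.cloudflare.com'): 7,
    ('2017/01/01 02:00', 'www.cloudflare.com'): 3,
}

# Roll minute buckets up into hour buckets by re-summing.
hour_counts = defaultdict(int)
for (minute, qname), count in minute_counts.items():
    hour = minute[:13]  # '2017/01/01 01:00' -> '2017/01/01 01'
    hour_counts[(hour, qname)] += count
```

Because only the coarse buckets need to be kept long-term, the hourly aggregates can be stored indefinitely while the raw rows expire after 3 months.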

SLIDE 24

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Log messages are serialized with Cap'n Proto
  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics
  • Go inserters write the data in parallel
  • A multi-tenant ClickHouse cluster stores the data

SLIDE 25

ClickHouse Cluster

Initial table design

  TinyLog              dnslogs_2016_01_01_14_30_pN
  ReplicatedMergeTree  dnslogs_2016_01_01
  ReplicatedMergeTree  dnslogs_2016_01
  ReplicatedMergeTree  dnslogs_2016

  • Raw logs are inserted into sharded tables
  • A sidecar process aggregates data into day/month/year tables

SLIDE 26

ClickHouse Cluster

First attempt in prod.

r{0,2}.dnslogs (ReplicatedMergeTree)

  • Raw logs are inserted into one replicated, sharded table
  • Multiple r{0,2} databases to better pack the cluster with shards and replicas

SLIDE 27

Speeding up typical queries

  • SUM() and COUNT() over a few low-cardinality dimensions
  • Global overview (trends and monitoring)
  • Storing intermediate state for non-additive functions
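The last point matters because functions like a p50 are not additive: you cannot combine per-shard medians and get the right answer, so the intermediate state (the samples, or in practice a compact quantile sketch) has to be stored and merged before the quantile is finished. A small demonstration with invented data:

```python
from statistics import median

# Response times observed on two different shards (invented data).
shard_a = [1.0, 2.0, 9.0]
shard_b = [3.0, 4.0]

# Wrong: combining the per-shard medians directly.
avg_of_medians = (median(shard_a) + median(shard_b)) / 2  # (2.0 + 3.5) / 2 = 2.75

# Right: keep the intermediate state and merge it before
# finishing the quantile.
true_median = median(shard_a + shard_b)  # median of [1, 2, 3, 4, 9] = 3.0
```

ClickHouse's AggregatingMergeTree tables (used later in this deck) persist exactly this kind of partial aggregation state, so non-additive functions can still be merged correctly at read time.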
SLIDE 28

ClickHouse Cluster

Today...

r{0,2}.dnslogs (ReplicatedMergeTree)
dnslogs_rollup_X (ReplicatedAggregatingMergeTree)

  • Raw logs are inserted into one replicated, sharded table
  • Multiple r{0,2} databases to better pack the cluster with shards and replicas
  • Aggregate tables for long-term storage

SLIDE 29

October 2016: Began evaluating technologies and architecture, 1 instance in Docker
November 2016: Prototype ClickHouse cluster with 3 nodes, inserting a sample of data
December 2016: ClickHouse visualisations with Superset and Grafana
Finalized schema, deployed a production ClickHouse cluster of 6 nodes
Spring 2017: TopN, IP prefix matching, Go native driver, Analytics library, pkey in monotonic functions
August 2017: Migrated to a new cluster with multi-tenancy; growing interest among other Cloudflare engineering teams, worked on standard tooling

SLIDE 30

Multi-tenant ClickHouse cluster

8M+ Row insertions/second

4GB+ Insertion throughput/second

2PB+ RAID-0 spinning disks

33 Nodes

SLIDE 31

ClickHouse Today… 12 Trillion Rows

SELECT table, sum(rows) AS total
FROM system.cluster_parts
WHERE database = 'r0'
GROUP BY table
ORDER BY total DESC

┌─table──────────────────────────────┬─────────────total─┐
│ ███████████████                    │ 9,051,633,001,267 │
│ ████████████████████               │ 2,088,851,716,078 │
│ ███████████████████                │   847,768,860,981 │
│ ██████████████████████             │   259,486,159,236 │
│ …                                  │                 … │

SLIDE 32
  • TopK(n) Aggregates

https://github.com/yandex/ClickHouse/pull/754

  • TrieDictionaries (IP Prefix)

https://github.com/yandex/ClickHouse/pull/785

  • SpaceSaving: internal storage for StringRef{}

https://github.com/yandex/ClickHouse/pull/925

  • Bug fixes to the Go native driver

https://github.com/kshvakov/clickhouse

  • sumMap(key, value)

https://github.com/yandex/ClickHouse/pull/1250

Contributions to ClickHouse

SLIDE 33

Other Contributions

  • Grafana Plugin

https://github.com/vavrusa/grafana-sqldb-datasource (see also https://github.com/Vertamedia/clickhouse-grafana)

  • SQLAlchemy (Superset)

https://github.com/cloudflare/sqlalchemy-clickhouse

SLIDE 34

Python w/ Jupyter Notebooks

import requests
import pandas as pd
from timeit import default_timer as timer

def ch(q, host='127.0.0.1', port=9001):
    start = timer()
    r = requests.get(
        'https://%s:%d/' % (host, port),
        params={'user': 'xxx', 'query': q + '\nFORMAT TabSeparatedWithNames'},
        stream=True)
    end = timer()
    if not r.ok:
        raise RuntimeError(r.text)
    print('Query finished in %.02fs' % (end - start))
    return pd.read_csv(r.raw, sep='\t')

SLIDE 35

Python w/ Jupyter Notebooks


SLIDE 36

Python w/ Jupyter Notebooks

SLIDE 37

Python w/ Jupyter Notebooks

SLIDE 38

Check it out: blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second

SLIDE 39

Thanks!

@tarnfeld @vavrusam

https://cloudflare.com/careers/departments/engineering