How Cloudflare analyzes >1m DNS queries per second (Tom Arnfeld)



SLIDE 1

How Cloudflare analyzes >1m DNS queries per second

Tom Arnfeld (and Marek Vavrusa)

SLIDE 2

100+ Data centers globally

2.5B Monthly unique visitors

>10% Internet requests every day

~3M DNS queries/second

6M+ Websites, apps & APIs in 150 countries

5M+ HTTP requests/second

SLIDE 3

Anatomy of a DNS query

$ dig www.cloudflare.com

; <<>> DiG 9.8.3-P1 <<>> www.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36582
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cloudflare.com.        IN    A

;; ANSWER SECTION:
www.cloudflare.com.    5    IN    A    198.41.215.162
www.cloudflare.com.    5    IN    A    198.41.214.162

;; Query time: 34 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Sep 2 10:48:30 2017
;; MSG SIZE rcvd: 68

30+ Fields
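The dig output above exposes many of those fields, most of which live in the fixed 12-byte DNS message header. As a rough illustration (not Cloudflare's code), here is a minimal sketch that packs and unpacks that header with Python's struct module; the field layout follows RFC 1035.

```python
import struct

def build_dns_header(query_id, rd=1, qdcount=1):
    """Pack the 12-byte DNS header (RFC 1035, section 4.1.1)."""
    flags = rd << 8  # QR=0 (query), OPCODE=0 (standard QUERY), RD bit set
    return struct.pack('!HHHHHH', query_id, flags, qdcount, 0, 0, 0)

def parse_dns_header(data):
    """Unpack the header fields back out of the wire format."""
    query_id, flags, qd, an, ns, ar = struct.unpack('!HHHHHH', data[:12])
    return {
        'id': query_id,
        'qr': (flags >> 15) & 1,        # 0 = query, 1 = response
        'opcode': (flags >> 11) & 0xF,  # 0 = standard QUERY
        'rd': (flags >> 8) & 1,         # recursion desired
        'rcode': flags & 0xF,           # 0 = NOERROR
        'qdcount': qd, 'ancount': an, 'nscount': ns, 'arcount': ar,
    }

# Same query id as the dig output above.
header = build_dns_header(36582)
parsed = parse_dns_header(header)
```

The question and answer sections (names, types, TTLs, addresses) follow the header and add the rest of the 30+ loggable fields.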

SLIDE 4

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics

SLIDE 5

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Log messages are serialized with Cap'n Proto
  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics
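The receive-and-demultiplex step can be sketched in miniature. This is a hypothetical illustration, not the actual log forwarder: the record layout and the 'service' field are invented, and an in-memory dict stands in for a Kafka producer.

```python
from collections import defaultdict

# Stand-in for a Kafka producer: topic name -> list of buffered messages.
topics = defaultdict(list)

def demux(log_record):
    """Route a decoded log record to a per-service topic (sketch)."""
    # 'service' is an assumed field name; real records are Cap'n Proto messages.
    topic = 'logs.%s' % log_record['service']
    topics[topic].append(log_record)

for record in [{'service': 'dns', 'qname': 'www.cloudflare.com'},
               {'service': 'http', 'path': '/'},
               {'service': 'dns', 'qname': 'api.cloudflare.com'}]:
    demux(record)
```

Keeping one topic per edge service lets each downstream consumer subscribe only to the log stream it cares about.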

SLIDE 6

What did we want?

  • Multidimensional query analytics
  • Complex ad-hoc queries
  • Capable of current and expected future scale
  • Gracefully handle late arriving log data
  • Roll-ups/aggregations for long term storage
  • Highly available and replicated architecture

~3M Queries per second

100+ Edge points of presence

20+ Query dimensions

5+ Years of stored aggregation

SLIDE 7

Kafka, Apache Spark and Parquet

Download and filter data from Kafka using Apache Spark; convert into Parquet and write to HDFS

  • Scanning the firehose is slow, and adding filters is time-consuming
  • Offline analysis is difficult with large amounts of data
  • Not a fast or friendly user experience
  • Doesn't work for customers

SLIDE 8

Let’s aggregate everything... with streams

Raw logs:

  Timestamp            QName               QType  RCODE
  2017/01/01 01:00:00  www.cloudflare.com  A      NODATA
  2017/01/01 01:00:01  api.cloudflare.com  AAAA   NOERROR

Aggregated:

  Time Bucket       QName               QType  RCODE    Count  p50 Response Time
  2017/01/01 01:00  www.cloudflare.com  A      NODATA   5      0.4876ms
  2017/01/01 01:00  api.cloudflare.com  AAAA   NOERROR  10     0.5231ms
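The transformation from the first table to the second can be sketched in a few lines of Python. This is an illustrative toy, not the production pipeline; the rows and response times are invented.

```python
from collections import defaultdict
from statistics import median

# Invented raw rows: (timestamp, qname, qtype, rcode, response_time_ms)
raw = [
    ('2017/01/01 01:00:00', 'www.cloudflare.com', 'A',    'NODATA',  0.48),
    ('2017/01/01 01:00:01', 'api.cloudflare.com', 'AAAA', 'NOERROR', 0.52),
    ('2017/01/01 01:00:30', 'www.cloudflare.com', 'A',    'NODATA',  0.50),
]

# Group by (minute bucket, qname, qtype, rcode).
groups = defaultdict(list)
for ts, qname, qtype, rcode, rt in raw:
    bucket = ts[:16]  # truncate seconds: '2017/01/01 01:00:00' -> '2017/01/01 01:00'
    groups[(bucket, qname, qtype, rcode)].append(rt)

# One aggregate row per group: count and p50 response time.
aggregates = {key: (len(times), median(times)) for key, times in groups.items()}
```

In practice the quantile would be computed from a streaming sketch rather than by buffering every response time, but the bucketing and grouping logic is the same.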

SLIDE 9

Let’s aggregate everything... with streams

  • Counters
    • Total number of queries
    • Query types
    • Response codes
  • Top-n query names
  • Top-n query sources
  • Response time/size quantiles
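A toy version of the top-n counters, using exact counting (invented sample data):

```python
from collections import Counter

# Invented sample of observed query names.
qnames = ['www.cloudflare.com', 'api.cloudflare.com', 'www.cloudflare.com',
          'blog.cloudflare.com', 'www.cloudflare.com', 'api.cloudflare.com']

# Exact top-n by count.
top2 = Counter(qnames).most_common(2)
# top2 == [('www.cloudflare.com', 3), ('api.cloudflare.com', 2)]
```

At millions of queries per second, keeping an exact counter per distinct name is impractical, which is why approximate top-n structures such as SpaceSaving (one of the ClickHouse contributions mentioned later in this deck) are used instead.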
SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17

Aggregating with Spark Streaming

Produce low-cardinality aggregates with Spark Streaming

  • Spark experience in-house, though Java/Scala
  • Batch-oriented, and a DB is still needed to serve online queries
  • Difficult to support ad-hoc analysis
  • Low resolution aggregates
  • Scanning raw data is slow
  • Late arriving data

SLIDE 18


SLIDE 19

Spark Streaming + CitusDB

Produce low-cardinality aggregates with Spark Streaming; insert aggregate rows into a CitusDB cluster for reads

  • Distributed time-series DB
  • Existing deployments of CitusDB
  • High cardinality aggregations are tricky due to insert performance
  • Late arriving data
  • SQL API

SLIDE 20

Apache Flink + (CitusDB?)

Produce low-cardinality aggregates with Flink; insert aggregate rows into a CitusDB cluster for reads

  • Dataflow API and support for stream watermarks
  • Checkpoint performance issues
  • High cardinality aggregations are tricky due to insert performance
  • SQL API

SLIDE 21

Druid

Insert into a cluster of Druid nodes

  • Insertion rate couldn't keep up in our initial tests
  • Estimated costs of a suitable cluster were very high
  • Seemed performant for random reads, but not the best we'd seen
  • Operational complexity seemed high

SLIDE 22

Let’s aggregate everything... with streams

  • Raw data isn't easily queried ad-hoc
  • Backfilling new aggregates is impossible, or can be very difficult without custom tools
  • A stream can't serve actual queries
  • Can be costly for high cardinality dimensions

*https://clickhouse.yandex/docs/en/introduction/what_is_clickhouse.html
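The cost of high-cardinality dimensions is easy to quantify: in the worst case, pre-aggregation keeps one row per combination of dimension values per time bucket, i.e. the product of the per-dimension cardinalities. With hypothetical numbers:

```python
# Hypothetical per-dimension cardinalities for a single time bucket.
cardinalities = {
    'qname': 1_000_000,  # distinct query names
    'qtype': 50,         # distinct query types
    'rcode': 20,         # distinct response codes
}

# Worst case: one aggregate row per combination of dimension values.
worst_case_rows = 1
for count in cardinalities.values():
    worst_case_rows *= count
# Up to a billion aggregate rows per bucket -- potentially more
# rows than the raw data the aggregation was meant to shrink.
```

This is why pre-aggregating "everything" with streams breaks down once a dimension like the query name enters the group-by key.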

SLIDE 23

ClickHouse

  • Tabular, column-oriented data store
  • Single binary, clustered architecture
  • Familiar SQL query interface
  • Lots of very useful built-in aggregation functions
  • Raw log data stored for 3 months (~7 trillion rows)
  • Aggregated data stored indefinitely (1m and 1h aggregations across 3 dimensions)
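The 1m/1h roll-ups rely on the aggregates being mergeable: additive values such as counts can simply be re-summed from fine buckets into coarser ones. A minimal sketch with invented data:

```python
from collections import defaultdict

# Invented 1-minute counts keyed by (minute bucket, qname).
minute_counts = {
    ('2017/01/01 01:00', 'www.cloudflare.com'): 5,
    ('2017/01/01 01:30', 'www.cloudflare.com'): 7,
    ('2017/01/01 02:00', 'www.cloudflare.com'): 3,
}

# Roll minute buckets up into hour buckets by re-summing.
hour_counts = defaultdict(int)
for (minute, qname), count in minute_counts.items():
    hour = minute[:13]  # '2017/01/01 01:00' -> '2017/01/01 01'
    hour_counts[(hour, qname)] += count
```

Because only the coarse buckets need to be kept long-term, the hourly aggregates can be stored indefinitely while the raw rows expire after 3 months.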

SLIDE 24

Cloudflare DNS Server · Log Forwarder · HTTP & Other Edge Services

Anycast DNS

  • Log messages are serialized with Cap'n Proto
  • Logs from all edge services and all PoPs are shipped over TLS to be processed
  • Logs are received and de-multiplexed
  • Logs are written into various Kafka topics
  • Go inserters write the data in parallel
  • A multi-tenant ClickHouse cluster stores the data

SLIDE 25

ClickHouse Cluster

Initial table design

  TinyLog              dnslogs_2016_01_01_14_30_pN
  ReplicatedMergeTree  dnslogs_2016_01_01
  ReplicatedMergeTree  dnslogs_2016_01
  ReplicatedMergeTree  dnslogs_2016

  • Raw logs are inserted into sharded tables
  • A sidecar process aggregates data into day/month/year tables

SLIDE 26

ClickHouse Cluster

First attempt in prod.

r{0,2}.dnslogs (ReplicatedMergeTree)

  • Raw logs are inserted into one replicated, sharded table
  • Multiple r{0,2} databases to better pack the cluster with shards and replicas

SLIDE 27

Speeding up typical queries

  • SUM() and COUNT() over a few low-cardinality dimensions
  • Global overview (trends and monitoring)
  • Storing intermediate state for non-additive functions
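The last point matters because functions like a p50 are not additive: you cannot combine per-shard medians and get the right answer, so the intermediate state (the samples, or in practice a compact quantile sketch) has to be stored and merged before the quantile is finished. A small demonstration with invented data:

```python
from statistics import median

# Response times observed on two different shards (invented data).
shard_a = [1.0, 2.0, 9.0]
shard_b = [3.0, 4.0]

# Wrong: combining the per-shard medians directly.
avg_of_medians = (median(shard_a) + median(shard_b)) / 2  # (2.0 + 3.5) / 2 = 2.75

# Right: keep the intermediate state and merge it before
# finishing the quantile.
true_median = median(shard_a + shard_b)  # median of [1, 2, 3, 4, 9] = 3.0
```

ClickHouse's AggregatingMergeTree tables (used later in this deck) persist exactly this kind of partial aggregation state, so non-additive functions can still be merged correctly at read time.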
SLIDE 28

ClickHouse Cluster

Today...

r{0,2}.dnslogs (ReplicatedMergeTree)
dnslogs_rollup_X (ReplicatedAggregatingMergeTree)

  • Raw logs are inserted into one replicated, sharded table
  • Multiple r{0,2} databases to better pack the cluster with shards and replicas
  • Aggregate tables for long-term storage

SLIDE 29

October 2016: Began evaluating technologies and architecture, 1 instance in Docker
November 2016: Prototype ClickHouse cluster with 3 nodes, inserting a sample of data
December 2016: ClickHouse visualisations with Superset and Grafana
Finalized schema, deployed a production ClickHouse cluster of 6 nodes
Spring 2017: TopN, IP prefix matching, Go native driver, Analytics library, pkey in monotonic functions
August 2017: Migrated to a new cluster with multi-tenancy; growing interest among other Cloudflare engineering teams, worked on standard tooling

SLIDE 30

Multi-tenant ClickHouse cluster

8M+ Row insertions/second

4GB+ Insertion throughput/second

2PB+ RAID-0 spinning disks

33 Nodes

SLIDE 31

ClickHouse Today… 12 Trillion Rows

SELECT table, sum(rows) AS total
FROM system.cluster_parts
WHERE database = 'r0'
GROUP BY table
ORDER BY total DESC

┌─table──────────────────────────────┬─────────────total─┐
│ ███████████████                    │ 9,051,633,001,267 │
│ ████████████████████               │ 2,088,851,716,078 │
│ ███████████████████                │   847,768,860,981 │
│ ██████████████████████             │   259,486,159,236 │
│ …                                  │                 … │

SLIDE 32
  • TopK(n) Aggregates

https://github.com/yandex/ClickHouse/pull/754

  • TrieDictionaries (IP Prefix)

https://github.com/yandex/ClickHouse/pull/785

  • SpaceSaving: internal storage for StringRef{}

https://github.com/yandex/ClickHouse/pull/925

  • Bug fixes to the Go native driver

https://github.com/kshvakov/clickhouse

  • sumMap(key, value)

https://github.com/yandex/ClickHouse/pull/1250

Contributions to ClickHouse

SLIDE 33

Other Contributions

  • Grafana Plugin

https://github.com/vavrusa/grafana-sqldb-datasource (see also https://github.com/Vertamedia/clickhouse-grafana)

  • SQLAlchemy (Superset)

https://github.com/cloudflare/sqlalchemy-clickhouse

SLIDE 34

Python w/ Jupyter Notebooks

import requests
import pandas as pd
from timeit import default_timer as timer

def ch(q, host='127.0.0.1', port=9001):
    start = timer()
    r = requests.get(
        'https://%s:%d/' % (host, port),
        params={'user': 'xxx', 'query': q + '\nFORMAT TabSeparatedWithNames'},
        stream=True)
    end = timer()
    if not r.ok:
        raise RuntimeError(r.text)
    print('Query finished in %.02fs' % (end - start))
    return pd.read_csv(r.raw, sep='\t')

SLIDE 35

Python w/ Jupyter Notebooks


SLIDE 36

Python w/ Jupyter Notebooks

SLIDE 37

Python w/ Jupyter Notebooks

SLIDE 38

Check it out: blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second

SLIDE 39

Thanks!

@tarnfeld @vavrusam

https://cloudflare.com/careers/departments/engineering