Modern Fast Streaming Data Todd L. Montgomery @toddlmontgomery - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Modern Fast Streaming Data

Todd L. Montgomery @toddlmontgomery

slide-2
SLIDE 2

  • Why Should We Care?
  • Myths & Misconceptions
  • You can’t escape the Math
  • Technologies & Techniques

slide-3
SLIDE 3

Why Should We Care?

slide-4
SLIDE 4

Human Knowledge is now doubling every year*

* by discipline, 12-18 months

slide-5
SLIDE 5

Fueled by Technology

slide-6
SLIDE 6

  • Middle Ages - 1500 yrs
  • Renaissance - 250 yrs
  • Industrial Revolution - 150 yrs
  • WWII - 25 yrs

Buckminster Fuller - Critical Path 1981

slide-7
SLIDE 7

IoT

slide-8
SLIDE 8

IoT Ubiquitous Computing

slide-9
SLIDE 9

In the near future, Human Knowledge could double every 72 hours

slide-10
SLIDE 10

What this could mean for our systems…
slide-11
SLIDE 11
slide-12
SLIDE 12

Updates/Sec =

Devices * Frequency * Market Share

Either ingest or streaming. 2x for Request/Response

slide-13
SLIDE 13

Updates/Sec =

Devices * Frequency * Market Share

  • 9 Billion (Today)
  • 50 Billion by 2020 (Cisco)
  • 26 Billion by 2020 (Smartphone/Tablet - Gartner)
  • 75 Billion by 2020 (Morgan Stanley)

slide-14
SLIDE 14

Updates/Sec =

50 Billion * 6/min * 1% = 50 Million/sec

slide-15
SLIDE 15

Bandwidth =

50 Billion * 6/min * 1% * 200 bytes = 9.3 GB/s (74.5 Gb/s)
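The two estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch using the slide's own inputs; the function and constant names are made up for illustration. Note that the slide's 9.3 GB/s and 74.5 Gb/s match binary units (GiB and Gib); in decimal units the same rate is 10 GB/s, or 80 Gb/s.

```python
# Back-of-envelope estimate using the figures quoted on the slides.
DEVICES = 50_000_000_000   # 50 billion devices (the Cisco 2020 projection)
UPDATES_PER_MINUTE = 6     # update frequency per device
MARKET_SHARE = 0.01        # 1% of all devices
MESSAGE_BYTES = 200        # size of one update


def updates_per_second(devices, per_minute, share):
    """Devices * Frequency * Market Share, with frequency converted to per-second."""
    return devices * (per_minute / 60) * share


def bandwidth_bytes_per_second(updates_per_sec, message_bytes):
    """Raw payload bandwidth, ignoring any protocol overhead."""
    return updates_per_sec * message_bytes


rate = updates_per_second(DEVICES, UPDATES_PER_MINUTE, MARKET_SHARE)
bw = bandwidth_bytes_per_second(rate, MESSAGE_BYTES)

print(f"{rate:,.0f} updates/sec")
print(f"{bw / 2**30:.1f} GiB/s ({bw * 8 / 2**30:.1f} Gib/s)")
print(f"{bw / 1e9:.1f} GB/s ({bw * 8 / 1e9:.1f} Gb/s) in decimal units")
```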

slide-16
SLIDE 16

And… Geographic Distribution

30% 15% 10% 15% 20% 10%

slide-17
SLIDE 17
slide-18
SLIDE 18

Social & Societal demands will require processing an intense stream

  • f data in real-time
slide-19
SLIDE 19

Myths & Misconceptions

slide-20
SLIDE 20

Excuses, Excuses!

slide-21
SLIDE 21

Myth: (CPUs, Storage, Networks) are not capable of processing in real-time*

* for some unknown, unquantified data volume

slide-22
SLIDE 22

[Chart: Accumulated Improvement over Time for Network Bandwidth, Response Time, Storage Capacity, CPU Cores, Memory Capacity]

slide-23
SLIDE 23

http://en.wikipedia.org/wiki/Instructions_per_second

Year  Processor                                 MIPS
1974  Intel 8080                                0.29
1982  Intel 286                                 1.28
1993  PowerPC 601                               157
2003  Pentium 4 Extreme                         9,726
2008  Intel Core i7 920 (Quad)                  82,300
2011  Intel Core i7 2600K (4/8) Sandy Bridge    128,300
2014  Intel Core i7 5960x (8/16) Haswell        298,190

slide-24
SLIDE 24

Raspberry Pi 2 (Quad) 1,186 MIPS!

http://en.wikipedia.org/wiki/Instructions_per_second

slide-25
SLIDE 25

DDRSSD PCIe - 3 100 GbE … OmniPath

slide-26
SLIDE 26

1 thread of awesome > 128 cores of so-so

http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html
http://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/

slide-27
SLIDE 27

Misconception: Data needs to come to rest to be processed

slide-28
SLIDE 28

Data at rest is a liability

slide-29
SLIDE 29

Data at rest is a liability

MDM ILM Warehouse ETL

slide-30
SLIDE 30

Why would you want transient data at rest?

slide-31
SLIDE 31

But what about sort? …REALLY?!

slide-32
SLIDE 32

You can’t escape the Math

slide-33
SLIDE 33

… and you don’t want to

The Math will guide you to a solution

slide-34
SLIDE 34

"AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons

slide-35
SLIDE 35

[Diagram: Setup & Scheduling, then four parallel Work Units, then Post Processing]
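The diagram above splits a job into serial setup, parallel work units, and serial post-processing, which is the situation Amdahl's Law describes: with a parallel fraction p and N processors, speedup is 1 / ((1 - p) + p / N). A minimal sketch follows; the 95% parallel fraction is an illustrative assumption, not a figure from the talk.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's Law: the serial part never shrinks, only the parallel part is divided."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed for illustration: 95% of the work can run in parallel.
for n in (1, 2, 16, 128, 1024):
    print(f"{n:>5} processors -> speedup {amdahl_speedup(0.95, n):.2f}")
```

However many processors are added, the speedup stays below 1 / (1 - p), which is 20 for the assumed 95%.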

slide-36
SLIDE 36

[Diagram: Setup & Scheduling, then four parallel Work Units, then Post Processing, with Contention marked at two points]

slide-37
SLIDE 37

Contention isn’t the biggest enemy

slide-38
SLIDE 38

Coherence is!

slide-39
SLIDE 39

Universal Scalability Law

[Chart: Speedup (2 to 20) versus Processors (1 to 1024), comparing Amdahl with USL]
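The Universal Scalability Law extends Amdahl's contention term with a coherence term: C(N) = N / (1 + α(N - 1) + βN(N - 1)), where α is contention and β is coherence. The sketch below uses made-up coefficients purely to show the shape of the two curves; with β = 0 the formula reduces to Amdahl's Law with a serial fraction of α.

```python
import math


def usl_speedup(n, contention, coherence):
    """Universal Scalability Law; coherence=0 gives the Amdahl curve."""
    return n / (1 + contention * (n - 1) + coherence * n * (n - 1))


# Hypothetical coefficients, chosen only for illustration.
ALPHA = 0.05     # contention (serial fraction)
BETA = 0.0001    # coherence cost per pair of processors

for n in (1, 4, 16, 64, 256, 1024):
    amdahl = usl_speedup(n, ALPHA, 0.0)
    usl = usl_speedup(n, ALPHA, BETA)
    print(f"{n:>5} processors  Amdahl {amdahl:6.2f}  USL {usl:6.2f}")

# With coherence > 0 the curve peaks and then falls: adding processors hurts.
peak = math.sqrt((1 - ALPHA) / BETA)
print(f"USL peak near {peak:.0f} processors")
```

This is why coherence, not contention, is called the bigger enemy: contention only flattens the curve, while coherence bends it back down.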

slide-40
SLIDE 40

[Diagram: Setup & Scheduling, then four parallel Work Units, then Post Processing, with Contention marked at two points and Contention + Coherence marked at two points]

slide-41
SLIDE 41

[Diagram: Up Front Partitioning, then four independent Work Units]

slide-42
SLIDE 42

… and more

  • Queuing Theory ⭐
  • Complexity Theory
  • CAP Theorem

slide-43
SLIDE 43

Technologies & Techniques

slide-44
SLIDE 44

The Essence of Architecture

  • Data Structures
  • Protocols of Interaction
  • Mechanical Sympathy

slide-45
SLIDE 45

Understanding is essential

slide-46
SLIDE 46

[Chart: Accumulated Improvement over Time for Network Bandwidth, Response Time, Storage Capacity, CPU Cores, Memory Capacity]

slide-47
SLIDE 47

[Chart: Accumulated Improvement over Time for Network Bandwidth, Response Time, Storage Capacity, CPU Cores, Memory Capacity]

Batching…

slide-48
SLIDE 48

Technique: Smart Batching (Natural Batching)

http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html

slide-49
SLIDE 49

Resource

slide-50
SLIDE 50

Resource Ring Buffer

slide-51
SLIDE 51

[Diagram: Batching Thread in front of the Resource] Pull off as much waiting data as possible

slide-52
SLIDE 52

  • Single Writer Principle
  • Avoid Resource Contention
  • Batching only when needed
  • Rate Decoupling
  • Back Pressure
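The points above can be sketched with a bounded queue and one writer thread. This is a minimal illustration in Python rather than the ring-buffer implementation the talk has in mind; `drain_batch`, `batching_writer`, and the list standing in for the resource are hypothetical names chosen for this sketch.

```python
import queue
import threading


def drain_batch(q, max_batch=64):
    """Wait for one item, then take whatever else is already waiting."""
    batch = [q.get()]                  # block until there is something to do
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break                      # nothing else waiting: never delay for more
    return batch


def batching_writer(q, resource, stop):
    """Single writer: only this thread ever touches `resource`."""
    while True:
        batch = drain_batch(q)
        live = [item for item in batch if item is not stop]
        if live:
            resource.append(live)      # one write to the resource per batch
        if len(live) != len(batch):    # the stop marker was in this batch
            return


STOP = object()
work = queue.Queue(maxsize=1024)       # bounded: producers block when full (back pressure)
written = []                           # stand-in for an expensive resource

for i in range(10):
    work.put(i)
work.put(STOP)

writer = threading.Thread(target=batching_writer, args=(work, written, STOP))
writer.start()
writer.join()
print(written)
```

When the writer keeps up, batches have size one and nothing waits; when it falls behind, batches grow on their own, which is the "batching only when needed" property.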

slide-53
SLIDE 53

Techniques: Freedom! Lock-Free, Wait-Free

http://en.wikipedia.org/wiki/Non-blocking_algorithm

slide-54
SLIDE 54
slide-55
SLIDE 55

Words Matter

slide-56
SLIDE 56

Obstruction-Freedom

Partially completed operations are aborted and the changes made are rolled back

slide-57
SLIDE 57

Lock-Freedom

An individual thread may starve, but system-wide throughput is guaranteed

Lock-Free is Obstruction-Free

slide-58
SLIDE 58

Wait-Freedom

Starvation-free, with guaranteed system-wide throughput

Wait-Free is Lock-Free

slide-59
SLIDE 59

These properties are awesome! Who wouldn’t want them?

slide-60
SLIDE 60

System-wide properties start at the lowest level

slide-61
SLIDE 61

Essence: Just because we could take an action right now doesn’t mean we should

slide-62
SLIDE 62
slide-63
SLIDE 63

Technology: CRDTs

http://en.wikipedia.org/wiki/Conflict-free_replicated_data_type

slide-64
SLIDE 64

Node 1 2 N Value sum(0,N) = 0 …

slide-65
SLIDE 65

1

Node 1 2 N Value sum(0,N) = 1 …

slide-66
SLIDE 66

1 1

Node 1 2 N Value sum(0,N) = 2 …

slide-67
SLIDE 67

Gossip for visibility

slide-68
SLIDE 68

Shared View: [0] = 4, [1] = 2, [2] = 0, … [N] = 0

[Diagram: two nodes holding 4 and 2, each seeing the same shared view after gossip]
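The counter on these slides behaves like a grow-only counter (G-Counter), one of the simplest CRDTs: each node increments only its own slot, the value is the sum of all slots, and gossip merges views by taking the per-slot maximum. A minimal sketch, with class and method names chosen for illustration:

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by element-wise max."""

    def __init__(self, node_id, node_count):
        self.node_id = node_id
        self.slots = [0] * node_count

    def increment(self, amount=1):
        # A node only ever writes its own slot, so writers never conflict.
        self.slots[self.node_id] += amount

    def value(self):
        return sum(self.slots)

    def merge(self, other):
        # Max is commutative, associative, and idempotent, so gossip messages
        # may arrive in any order and any number of times.
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]


node0 = GCounter(node_id=0, node_count=3)
node1 = GCounter(node_id=1, node_count=3)
for _ in range(4):
    node0.increment()
for _ in range(2):
    node1.increment()

# Gossip for visibility: exchange views in both directions.
node0.merge(node1)
node1.merge(node0)
print(node0.slots, node0.value())
```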

slide-69
SLIDE 69

Technology: Append-only Data Structures

https://github.com/real-logic/Aeron

slide-70
SLIDE 70

Header Message

Log

slide-71
SLIDE 71

Header Message Header Message Header Message

Log

slide-72
SLIDE 72

Efficiently Replicating an Append-only Log

slide-73
SLIDE 73

What If…?

The Data Structure could be sent directly to the “network”?

slide-74
SLIDE 74

Header Message

slide-75
SLIDE 75

Header Message

Position in Log, Length

slide-76
SLIDE 76

Header Message

Position in Log, Length, Version/Flags, Type, etc.

+
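A log of header-plus-message frames can be sketched as below. The header layout here (field order and widths) is an assumption made up for this example and is not Aeron's actual wire format; it only carries the fields the slides name: length, position in log, version, flags, and type.

```python
import struct

# Hypothetical header layout, NOT Aeron's wire format:
# frame length (uint32), position in log (uint64),
# version (uint8), flags (uint8), type (uint16), all little-endian.
HEADER = struct.Struct("<IQBBH")


class AppendOnlyLog:
    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.tail = 0                       # next free byte; only ever grows

    def append(self, message, msg_type=1, version=1, flags=0):
        position = self.tail
        frame_length = HEADER.size + len(message)
        if position + frame_length > len(self.buffer):
            raise BufferError("log is full")
        HEADER.pack_into(self.buffer, position,
                         frame_length, position, version, flags, msg_type)
        self.buffer[position + HEADER.size:position + frame_length] = message
        self.tail = position + frame_length
        return position

    def frames(self):
        """Walk the log from the start, using each header's length to find the next frame."""
        position = 0
        while position < self.tail:
            length, pos, _version, _flags, msg_type = HEADER.unpack_from(self.buffer, position)
            yield pos, msg_type, bytes(self.buffer[position + HEADER.size:position + length])
            position += length


log = AppendOnlyLog(1024)
first = log.append(b"hello")
second = log.append(b"world!")
print(list(log.frames()))
```

Because every frame records its own position and length, a slice of the buffer is already a self-describing message.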

slide-77
SLIDE 77

Header Message

Fragment 0

slide-78
SLIDE 78

Header Message Header Message

Fragment 0

slide-79
SLIDE 79

Header Message Header Message Header Message Header Message

Fragment 0

slide-80
SLIDE 80

Header Message Header Message Header Message Header Message

Fragment 0 Fragment 1

slide-81
SLIDE 81

Header Message Header Message Header Message Header Message Header Message Header Message

Fragment 0 Fragment 1

slide-82
SLIDE 82

Natural for broadcast replication
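One way to see why this structure suits replication: if each frame carries its position in the log, a receiver can copy the frame straight into its own buffer at that position, regardless of arrival order or duplicates. The sketch below assumes a made-up two-field header (not Aeron's wire format) and hypothetical names `make_frame` and `Replica`.

```python
import struct

# Hypothetical frame header (an assumption, not Aeron's wire format):
# frame length (uint32) followed by position in log (uint64).
HEADER = struct.Struct("<IQ")


def make_frame(position, message):
    return HEADER.pack(HEADER.size + len(message), position) + message


class Replica:
    """Receiver that rebuilds the sender's log byte for byte."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)

    def on_frame(self, frame):
        length, position = HEADER.unpack_from(frame, 0)
        # The header says where the frame belongs, so placement is idempotent.
        self.buffer[position:position + length] = frame[:length]


# Build a source log of three frames.
source = bytearray()
frames = []
for message in (b"alpha", b"beta", b"gamma"):
    frame = make_frame(len(source), message)
    frames.append(frame)
    source += frame

in_order = Replica(len(source))
shuffled = Replica(len(source))
for frame in frames:
    in_order.on_frame(frame)
for frame in (frames[2], frames[0], frames[2], frames[1]):   # reordered, with a duplicate
    shuffled.on_frame(frame)

print(bytes(in_order.buffer) == bytes(source), bytes(shuffled.buffer) == bytes(source))
```

The same frames can be sent to any number of receivers, and each one ends up with an identical copy of the log.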

slide-83
SLIDE 83

In Closing…

slide-84
SLIDE 84

A flood of data is coming. Many say it is already here. How will you deal with it?

slide-85
SLIDE 85

Ever seen a Pitbull drink from a Fire Hydrant?

slide-86
SLIDE 86
slide-87
SLIDE 87

It won’t give up… Be the Pitbull at the Fire Hydrant!

slide-88
SLIDE 88

@toddlmontgomery

Questions?

  • Aeron https://github.com/real-logic/Aeron
  • SlideShare http://www.slideshare.com/toddleemontgomery
  • Twitter @toddlmontgomery

Thank You!