

SLIDE 1

Responding in a timely manner

Martin Thompson - @mjpt777

SLIDE 2
SLIDE 3

Hard Real-time

SLIDE 4
SLIDE 5

Soft Real-time

SLIDE 6
SLIDE 7

Squidgy Real-time

SLIDE 8
SLIDE 9

The Unaware

SLIDE 10
SLIDE 11
1. How to Test and Measure
2. A little bit of Theory
3. A little bit of Practice
4. Common Pitfalls
5. Useful Algorithms and Techniques
SLIDE 12

Test & Measure

SLIDE 13

System Under Test

SLIDE 14

Distributed Load Generation Agents System Under Test

SLIDE 15

Distributed Load Generation Agents System Under Test

SLIDE 16

Distributed Load Generation Agents System Under Test

SLIDE 17

Distributed Load Generation Agents System Under Test Observer

SLIDE 18

Pro Tip:

Set up a continuous performance testing environment

SLIDE 19

Pro Tip: Record Everything

SLIDE 20

Latency Histograms

SLIDE 21

Latency Histograms

Mode

SLIDE 22

Latency Histograms

Mode Median

SLIDE 23

Latency Histograms

Mode Median Mean

SLIDE 24

System: 1000 TPS, mean RT 50µs

SLIDE 25

System: 1000 TPS, mean RT 50µs What is the mean if you add in a 25ms GC pause per second?

SLIDE 26

System: 1000 TPS, mean RT 50µs What is the mean if you add in a 25ms GC pause per second?

~300µs
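
The slide's answer can be sanity-checked with a back-of-envelope model. The sketch below is mine, under one simple assumption: arrivals are uniform, so the ~25 requests landing inside a 25ms pause wait 12.5ms each on average. That model lands in the same few-hundred-microsecond range as the slide's figure; the exact number depends on how you model arrivals during the pause.

```java
// Back-of-envelope sketch (assumed model): a 25ms GC pause per second at
// 1000 TPS stalls the requests that arrive mid-pause, and their residual
// wait drags the mean far above the 50µs service time.
public class GcPauseMean {
    static double meanMicros(double tps, double serviceMicros, double pauseMicros) {
        double stalled = tps * (pauseMicros / 1_000_000.0); // requests arriving mid-pause
        double extraPerStalled = pauseMicros / 2.0;         // average residual wait
        return serviceMicros + (stalled * extraPerStalled) / tps;
    }

    public static void main(String[] args) {
        // 1000 TPS, 50µs service time, one 25ms pause per second
        System.out.printf("mean = %.1fµs%n", meanMicros(1000, 50, 25_000));
    }
}
```

A handful of stalled requests per second is enough to multiply the mean several times over, which is the slide's point.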

SLIDE 27
SLIDE 28

Forget averages, it’s all about percentiles
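
To make that concrete, here is a minimal percentile sketch (a hypothetical helper; in practice you would record into a histogram library such as HdrHistogram so that recording stays cheap). Outliers get smeared into the mean, but the upper percentiles report them directly.

```java
import java.util.Arrays;

// Minimal percentile sketch over recorded latencies (microseconds).
public class Percentiles {
    static long percentile(long[] latencies, double p) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil((p / 100.0) * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 98 fast responses and two 25ms outliers
        long[] micros = new long[100];
        Arrays.fill(micros, 50);
        micros[98] = 25_000;
        micros[99] = 25_000;
        System.out.println("p50 = " + percentile(micros, 50)); // 50
        System.out.println("p99 = " + percentile(micros, 99)); // 25000
    }
}
```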

SLIDE 29

Source: Gil Tene (Azul Systems)

Coordinated Omission

SLIDE 30

Pro Tip: Don’t deceive yourself

SLIDE 31

Theory

SLIDE 32

Queuing Theory

[Chart: mean response time vs utilisation (0.0 to 1.0); response time climbs steeply as utilisation approaches 1.0]

SLIDE 33

Queuing Theory

Kendall Notation

M/D/1

SLIDE 34

Queuing Theory

r = s(2 − ρ) / (2(1 − ρ))

r = mean response time
s = service time
ρ = utilisation

SLIDE 35

Queuing Theory

r = s(2 − ρ) / (2(1 − ρ))

r = mean response time
s = service time
ρ = utilisation
Note: ρ = λ × s, i.e. mean arrival rate λ divided by the service rate 1/s
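
A short sketch evaluating the M/D/1 formula at a few utilisation levels (values illustrative) shows why the response-time curve in the chart turns into a hockey stick:

```java
// M/D/1 mean response time: r = s(2 − ρ) / (2(1 − ρ)).
public class MD1 {
    static double responseTime(double serviceTime, double utilisation) {
        return serviceTime * (2.0 - utilisation) / (2.0 * (1.0 - utilisation));
    }

    public static void main(String[] args) {
        // Response time as a multiple of the service time s
        for (double rho : new double[] {0.1, 0.5, 0.9, 0.99}) {
            System.out.printf("utilisation %.2f -> r = %.2f x s%n",
                rho, responseTime(1.0, rho));
        }
    }
}
```

At 50% utilisation the response time is only 1.5× the service time; at 90% it is already 5.5×, which is why the next Pro Tip is about capacity headroom.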

SLIDE 36

Queuing Theory

[Chart: mean response time vs utilisation (0.0 to 1.0); response time climbs steeply as utilisation approaches 1.0]

SLIDE 37

Pro Tip:

Ensure that you have sufficient capacity

SLIDE 38

Queuing Theory

Little’s Law: L = λ * W

L = mean queue length
λ = mean arrival rate
W = mean time in system
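
Little's Law in code (numbers illustrative, not from the talk). It also runs backwards: for a given arrival rate, bounding the queue length bounds the mean time in system, which is the basis of the next Pro Tip.

```java
// Little's Law: L = λ × W, and rearranged, W = L / λ.
public class Littles {
    static double meanInSystem(double arrivalRate, double meanTimeSeconds) {
        return arrivalRate * meanTimeSeconds; // L = λ × W
    }

    static double meanTimeForQueueBound(double arrivalRate, double queueLength) {
        return queueLength / arrivalRate;     // W = L / λ
    }

    public static void main(String[] args) {
        // 1000 requests/s spending 50ms each -> ~50 in flight
        System.out.println(meanInSystem(1000, 0.05));
        // Bounding the queue at 10 with 1000/s arrivals caps W at 10ms
        System.out.println(meanTimeForQueueBound(1000, 10));
    }
}
```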

SLIDE 39

Pro Tip:

Bound queues to meet response time SLAs

SLIDE 40

Can we go parallel to speedup?

SLIDE 41

Amdahl’s Law

[Diagram: a sequential process with stages A and B laid out on a time axis]

SLIDE 42

Amdahl’s Law

[Diagram: the sequential process vs a parallel process where stage A is split across workers while B stays serial]

SLIDE 43

Amdahl’s Law

[Diagram: the sequential process vs parallelising A, then also parallelising B; the remaining serial fraction bounds the achievable speedup]

SLIDE 44

Amdahl's Law

SLIDE 45

Universal Scalability Law

C(N) = N / (1 + α(N − 1) + β N (N − 1))

C = capacity or throughput
N = number of processors
α = contention penalty
β = coherence penalty
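
A sketch comparing the USL with Amdahl's Law (the α and β values here are illustrative, not measurements). With β = 0 the USL reduces to Amdahl's Law; any non-zero coherence penalty makes speedup peak and then fall as N grows, which is what the chart on the next slide shows.

```java
// Universal Scalability Law: C(N) = N / (1 + α(N − 1) + βN(N − 1)).
public class Usl {
    static double capacity(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 16, 64, 256}) {
            System.out.printf("N=%3d  Amdahl=%6.2f  USL=%6.2f%n",
                n, capacity(n, 0.05, 0.0),      // β = 0: pure Amdahl
                   capacity(n, 0.05, 0.0001));  // small coherence penalty
        }
    }
}
```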

SLIDE 46

Universal Scalability Law

[Chart: speedup vs processors (1 to 1024, log scale); the Amdahl curve plateaus while the USL curve peaks and then declines]

SLIDE 47

What about the service time?

SLIDE 48

Order of Algorithms

SLIDE 49

Practice

SLIDE 50
SLIDE 51
SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55

Pitfalls

SLIDE 56

Modern Processors

P & C States???

Hyperthreading? SMIs?

SLIDE 57

Non-Uniform Memory Architecture (NUMA)

[Diagram: dual-socket NUMA topology: cores C1…Cn per socket with private L1/L2 and a shared L3, memory controllers to local DRAM, QPI links between sockets, and PCI-e 3 lanes for IO]

Approximate access costs (assuming a 3GHz processor):
Registers/Buffers: <1ns
L1: ~4 cycles, ~1ns
L2: ~12 cycles, ~3ns
L3: ~40 cycles, ~15ns (~60 cycles, ~20ns for a dirty hit)
DRAM: ~65ns
QPI hop to the remote socket: ~40ns

SLIDE 58

Virtual Memory Management

Transparent Huge Pages
Page Flushing & IO Scheduling
vm.min_free_kbytes
Swap???

SLIDE 59

Safepoints in the JVM

Garbage Collection, De-optimisation, Biased Locking, Stack traces, etc.

SLIDE 60

Virtualization

System Calls

SLIDE 61

Notification

public class SomethingUseful
{
    // Lots of useful stuff

    public void handOffSomeWork()
    {
        // prepare for handoff
        synchronized (this)
        {
            someObject.notify();
        }
    }
}

SLIDE 62

Notification

public class SomethingUseful
{
    // Lots of useful stuff

    public void handOffSomeWork()
    {
        // prepare for handoff
        synchronized (this)
        {
            someObject.notify(); // holds this's monitor, but notify() requires
        }                        // someObject's monitor: unless someObject is
    }                            // this, it throws IllegalMonitorStateException
}
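
One way to repair the hand-off, sketched under the assumption that a consumer thread waits on the same monitor (class and field names are mine): notify on the object whose monitor you actually hold, and guard the wait with a condition so spurious wake-ups are harmless.

```java
// Corrected hand-off sketch: lock, condition flag, and notify all use
// the same monitor object.
public class WorkHandOff {
    private final Object lock = new Object();
    private boolean workReady = false;

    public void handOffSomeWork() {
        synchronized (lock) {    // must own lock's monitor to notify on it
            workReady = true;
            lock.notify();
        }
    }

    public void awaitWork() throws InterruptedException {
        synchronized (lock) {
            while (!workReady) { // guard against spurious wake-ups
                lock.wait();
            }
            workReady = false;
        }
    }
}
```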

SLIDE 63

Law of Leaky Abstractions

“All non-trivial abstractions, to some extent, are leaky.”

  • Joel Spolsky
SLIDE 64

Law of Leaky Abstractions

“The detail of underlying complexity cannot be ignored.”

SLIDE 65

Mechanical Sympathy

SLIDE 66

Responding in the presence of failure

SLIDE 67

Algorithms & Techniques

SLIDE 68

Clean Room Experiments

  • sufficient CPUs
  • intel_idle.max_cstate=0
  • cpufreq
  • isolcpus
  • numactl, cgroups, affinity
  • “Washed” SSDs
  • network buffer sizing
  • jHiccup
  • tune your stack!
  • Mechanical Sympathy
SLIDE 69

Profiling

SLIDE 70

Pro Tip:

Incorporate telemetry and histograms

SLIDE 71

Smart Batching

[Chart: latency vs load; the typical curve rises with load, while smart batching makes a near-flat curve possible]

SLIDE 72

Smart Batching

Producers

SLIDE 73

Smart Batching

[Diagram: producers feeding a batcher, which amortises the expensive costs across each batch]

SLIDE 74

Pro Tip:

Amortise the Expensive Costs
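
A minimal smart-batching sketch (class names and the "expensive op" counter are mine): a single draining consumer takes everything queued since its last pass and pays the expensive cost (a syscall, flush, or network write) once per batch instead of once per item, so the batch size adapts naturally to load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Smart batching: drain all pending items, pay the expensive cost once.
public class SmartBatcher {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private int expensiveOps = 0; // e.g. flushes/syscalls performed

    public void offer(String item) {
        queue.add(item);
    }

    // Called in a loop by the single consumer thread.
    public int drainAndProcess() {
        List<String> batch = new ArrayList<>();
        String item;
        while ((item = queue.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            expensiveOps++; // one expensive operation for the whole batch
        }
        return batch.size();
    }

    public int expensiveOps() {
        return expensiveOps;
    }
}
```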

SLIDE 75

Applying Backpressure

[Diagram: customers connect through gateway services to a transaction service and storage; backpressure is applied at each stage's threads, network stacks, and IO boundaries]
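
A bounded-queue sketch of applying backpressure between two stages (class and method names are mine): when the downstream stage falls behind, offer() fails fast and the caller can push the signal back upstream by slowing the producer, shedding load, or returning "busy".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Backpressure via a bounded hand-off queue between pipeline stages.
public class BackpressureStage {
    private final BlockingQueue<Runnable> inbox;

    public BackpressureStage(int capacity) {
        this.inbox = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false when the stage is saturated: apply backpressure upstream.
    public boolean trySubmit(Runnable task) {
        return inbox.offer(task);
    }

    // Consumer side: take the next task, waiting up to the timeout.
    public Runnable takeNext(long timeoutMs) throws InterruptedException {
        return inbox.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```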

SLIDE 76

Non-Blocking Design

“Get out of your own way!”

  • Don’t hog any resource
  • Always try to make progress
  • Enables Smart Batching
SLIDE 77

Pro Tip:

Beware of hogging resources in synchronous designs

SLIDE 78

Lock-Free Concurrent Algorithms

  • Agree protocols of interaction
  • Don’t get a 3rd party involved, i.e. the OS
  • Keep to user-space
  • Beat the “notify()” problem

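As one example of agreeing a protocol and staying in user-space, here is a single-producer/single-consumer ring buffer sketch (a simplified cousin of a Disruptor-style design; the details are mine). The two threads coordinate purely through sequence counters, with no locks and no notify().

```java
import java.util.concurrent.atomic.AtomicLong;

// SPSC ring buffer: one producer thread, one consumer thread, no OS calls.
public class SpscRing {
    private final Object[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    public SpscRing(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    public boolean offer(Object e) {                // producer thread only
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;                           // full: caller backs off
        }
        buffer[(int) (t & mask)] = e;
        tail.lazySet(t + 1);                        // publish to the consumer
        return true;
    }

    public Object poll() {                          // consumer thread only
        long h = head.get();
        if (h == tail.get()) {
            return null;                            // empty
        }
        Object e = buffer[(int) (h & mask)];
        buffer[(int) (h & mask)] = null;
        head.lazySet(h + 1);                        // free the slot
        return e;
    }
}
```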
SLIDE 79

Observable State Machines

SLIDE 80

Pro Tip:

Observable state machines make monitoring easy
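
A sketch of what "observable" can mean in practice (names are mine): the current state lives in a single volatile field, so a monitoring or telemetry thread can sample it at any time without locks or any coordination with the worker thread.

```java
// Observable state machine: state and counters readable from any thread.
public class Connection {
    public enum State { IDLE, CONNECTING, ESTABLISHED, CLOSED }

    private volatile State state = State.IDLE;
    private volatile long transitions = 0;

    // Single-writer: only the worker thread transitions the machine,
    // so the non-atomic increment on the volatile counter is safe here.
    public void transitionTo(State next) {
        state = next;
        transitions++;
    }

    // Safe to call from a monitoring thread at any time.
    public State currentState() {
        return state;
    }

    public long transitionCount() {
        return transitions;
    }
}
```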

SLIDE 81

Cluster for Response and Resilience

Service A Service A Sequencer

SLIDE 82

Cluster for Response and Resilience

Service A Service A Sequencer

SLIDE 83

Cluster for Response and Resilience

Service A Service A Service N Sequencer

SLIDE 84

Data Structures and O(?) Models

Is there a world beyond maps and lists?

SLIDE 85

In closing…

SLIDE 86
SLIDE 87
SLIDE 88

The Internet of Things (IoT)

“There will be X connected devices by 2020...” (where X is 20 to 75 billion)

SLIDE 89

If you cannot control arrival rates...

SLIDE 90

...you have to think hard about improving service times!

SLIDE 91

...and/or you have to think hard about removing all contention!

SLIDE 92

Questions?

Blog: http://mechanical-sympathy.blogspot.com/
Twitter: @mjpt777

“It does not matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence – then it is still a guess.”

  • Richard Feynman