KAFKA STREAMS: FROM USE CASE TO PRODUCTION



slide-1
SLIDE 1

KAFKA STREAMS

FROM USE CASE TO PRODUCTION

slide-2
SLIDE 2

CLOUD MONITORING

AWS

slide-3
SLIDE 3

CLOUD MONITORING

AWS APP

slide-4
SLIDE 4

CLOUD MONITORING

AWS HTTP APP

slide-5
SLIDE 5

CLOUD MONITORING

AWS NRDB METRICS HTTP APP

slide-6
SLIDE 6

CLOUD MONITORING

WORKER SCHDLR JOBS AWS METRICS HTTP NRDB

slide-7
SLIDE 7

CLOUD MONITORING

AWS SCHDLR JOBS WORKER WORKER WORKER WORKER WORKER WORKER METRICS HTTP NRDB

slide-8
SLIDE 8

CLOUD MONITORING

AWS JOBS WORKER WORKER WORKER WORKER WORKER WORKER SCHDLR SCHDLR SCHDLR METRICS HTTP NRDB

slide-9
SLIDE 9

CLOUD MONITORING

AWS METRICS HTTP SCHDLR JOBS WORKER WORKER WORKER WORKER WORKER WORKER SCHDLR SCHDLR ZK NRDB

slide-10
SLIDE 10

CLOUD MONITORING

▸ horizontally scalable
▸ stateless
▸ failsafe
▸ a few Kafka topics

slide-11
SLIDE 11

AGGREGATION

📋

METRICS?

slide-12
SLIDE 12

WHAT METRICS?

APP METRICS NRDB

slide-13
SLIDE 13

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-14
SLIDE 14

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-15
SLIDE 15

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-16
SLIDE 16

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 0, "max.cpu": 10 }

slide-17
SLIDE 17

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 0, "max.cpu": 10 } { "id": 1, "timestamp": 60, "max.cpu": 20 }

slide-18
SLIDE 18

WHAT METRICS?

APP METRICS NRDB

{ "id": 1, "timestamp": 0, "max.cpu": 10 } { "id": 1, "timestamp": 60, "max.cpu": 20 } { "id": 1, "timestamp": 120, "max.cpu": 5 }

slide-19
SLIDE 19

AGGREGATION

{ "id": 1, "timestamp": 0, "max.cpu": 10 } { "id": 1, "timestamp": 60, "max.cpu": 20 } { "id": 1, "timestamp": 120, "max.cpu": 5 }

Time axis T: 60, 120

slide-20
SLIDE 20

AGGREGATION

{ "id": 1, "timestamp": 3600, "max.cpu": 20 }

Time axis T: 3600
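The roll-up on this slide can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the function name `aggregate_max_cpu` and the choice to label each window by its end timestamp (so that 0, 60 and 120 land in window 3600) are assumptions read off the slide.

```python
WINDOW_SECONDS = 3600  # one-hour tumbling window, as on the slide


def aggregate_max_cpu(records, window=WINDOW_SECONDS):
    """Roll records up to one per (id, window), keeping the largest max.cpu."""
    state = {}  # (id, window_end) -> running maximum; this dict is the "state"
    for rec in records:
        # Assumption: a window is labelled by its end, so 0..3599 -> 3600.
        window_end = (rec["timestamp"] // window + 1) * window
        key = (rec["id"], window_end)
        state[key] = max(state.get(key, rec["max.cpu"]), rec["max.cpu"])
    return [
        {"id": rid, "timestamp": end, "max.cpu": cpu}
        for (rid, end), cpu in sorted(state.items())
    ]


minute_records = [
    {"id": 1, "timestamp": 0, "max.cpu": 10},
    {"id": 1, "timestamp": 60, "max.cpu": 20},
    {"id": 1, "timestamp": 120, "max.cpu": 5},
]
hourly = aggregate_max_cpu(minute_records)
print(hourly)
```

The `state` dictionary is the point of the example: the running maximum has to live somewhere between records, which is what makes the aggregation stateful.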

slide-21
SLIDE 21

AGGREGATION

😲

STATE!

slide-22
SLIDE 22

AGGREGATION

APP METRICS 1M NRDB

slide-23
SLIDE 23

AGGREGATION

APP METRICS 1M NRDB METRICS 1H APP

slide-24
SLIDE 24

AGGREGATION

▸ batch?
▸ local storage?
▸ summon Redis?

slide-25
SLIDE 25

AGGREGATION

slide-26
SLIDE 26

✓ exactly once
✓ stateful (local state with failsafe mechanism)
? own cluster vs managed cluster
? framework vs library

AGGREGATION

slide-27
SLIDE 27

AGGREGATION

NRDB APP METRICS 1H

slide-28
SLIDE 28

AGGREGATION

NRDB APP KAFKA CLIENT LIBRARY METRICS 1H

slide-29
SLIDE 29

AGGREGATION

METRICS 1H NRDB APP + DSL KSTREAMS LIBRARY KAFKA CLIENT LIBRARY

slide-30
SLIDE 30

AGGREGATION

▸ same deployment mechanism
▸ no new external dependencies

slide-31
SLIDE 31

AGGREGATION

STATE!

slide-32
SLIDE 32

AGGREGATION

NRDB METRICS 1H APP METRICS 1M

slide-33
SLIDE 33

AGGREGATION

NRDB METRICS 1H APP METRICS 1M

slide-34
SLIDE 34

AGGREGATION

NRDB METRICS 1H APP METRICS 1M CHANGE-LOG

slide-35
SLIDE 35

AGGREGATION

NRDB METRICS 1H APP METRICS 1M CHANGE-LOG

{ "id": 1, "timestamp": 0, "max.cpu": 10 } { "id": 1, "max.cpu": 10 }

slide-36
SLIDE 36

AGGREGATION

NRDB METRICS 1H APP METRICS 1M CHANGE-LOG

{ "id": 1, "timestamp": 120, "max.cpu": 20 } { "id": 1, "max.cpu": 20 }

slide-37
SLIDE 37

AGGREGATION

NRDB METRICS 1H APP METRICS 1M CHANGE-LOG

{ "id": 1, "timestamp": 3600, "max.cpu": 20 } { "id": 1, "max.cpu": 20 }
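The change-log idea from the last few slides can be sketched as follows. This is a simplified model under stated assumptions: a Python list stands in for the change-log topic, and the class name `ChangelogBackedStore` and the helper `process` are hypothetical names chosen for illustration.

```python
class ChangelogBackedStore:
    """Local key-value state whose every update is also appended to a log."""

    def __init__(self, changelog):
        self.changelog = changelog  # stand-in for the change-log topic
        self.state = {}             # local state, lost if the instance dies

    def put(self, key, value):
        self.state[key] = value
        self.changelog.append((key, value))

    def get(self, key):
        return self.state.get(key)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state on a fresh instance by replaying the log."""
        store = cls(changelog)
        for key, value in changelog:
            store.state[key] = value
        return store


def process(store, record):
    """Keep the running max.cpu per id, as in the aggregation slides."""
    current = store.get(record["id"])
    cpu = record["max.cpu"]
    store.put(record["id"], cpu if current is None else max(current, cpu))


changelog_topic = []
store = ChangelogBackedStore(changelog_topic)
process(store, {"id": 1, "timestamp": 0, "max.cpu": 10})
process(store, {"id": 1, "timestamp": 120, "max.cpu": 20})

# Simulate losing the instance: a new one replays the change-log.
recovered = ChangelogBackedStore.restore(changelog_topic)
print(recovered.get(1))
```

A real change-log topic would typically be compacted, so replay cost depends on the number of keys rather than the number of updates; this sketch does not model that.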

slide-38
SLIDE 38

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP P1 P0 P2 CHANGE-LOG

slide-39
SLIDE 39

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP APP APP P0 P1 P2 P1 P0 P2 CHANGE-LOG

slide-40
SLIDE 40

CHANGE-LOG CHANGE-LOG

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP APP APP P0 P1 P2 P1 P0 P2 CHANGE-LOG P1 P0 P2 P0 P1 P2

slide-41
SLIDE 41

CHANGE-LOG CHANGE-LOG

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP APP APP P0 P1 P2 P1 P0 P2 CHANGE-LOG P1 P0 P2 CONSUMER GROUP P0 P1 P2

slide-42
SLIDE 42

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP APP APP P0 P1 P2 P1 P0 P2 P1 P0 P2 P0 P1 P2 CONSUMER GROUP CHANGE-LOG CHANGE-LOG CHANGE-LOG

slide-43
SLIDE 43

AGGREGATION

NRDB METRICS 1H METRICS 1M METRICS 1M METRICS 1M APP APP P0 P1 P2 P1 P0 P2 P1 P0 P2 P1 CONSUMER GROUP P0 P2 CHANGE-LOG CHANGE-LOG CHANGE-LOG

slide-44
SLIDE 44

AGGREGATION

▸ key-value storage for stateful computations, failsafe
▸ time window calculations
▸ scalable with the number of partitions
▸ a bunch of new Kafka topics

slide-45
SLIDE 45

AGGREGATION

🎊

SUCCESS

slide-46
SLIDE 46

ENRICHMENT

AWS NRDB HTTP APP METRICS

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-47
SLIDE 47

ENRICHMENT

AWS HTTP APP METADATA HTTP APP

{ "id": 1, "name": "host-1", "region": "eu" }

NRDB METRICS

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-48
SLIDE 48

ENRICHMENT

AWS APP APP NRDB HTTP METADATA HTTP

{ "id": 1, "name": "host-1", "region": "eu" }

METRICS

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-49
SLIDE 49

ENRICHMENT

AWS APP APP NRDB HTTP METADATA HTTP

{ "id": 1, "name": "host-1", "region": "eu" }

METRICS

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-50
SLIDE 50

ENRICHMENT

AWS APP APP NRDB HTTP METADATA HTTP

{ "id": 1, "name": "host-1", "region": "eu" }

METRICS

{ "id": 1, "timestamp": 5, "max.cpu": 10 }

slide-51
SLIDE 51

ENRICHMENT

NRDB M&M APP

{ "id": 1, "name": "host-1", "region": "eu" } { "id": 1, "timestamp": 5, "max.cpu": 10 }

METADATA METRICS

slide-52
SLIDE 52

ENRICHMENT

APP M&M

{ "id": 1, "timestamp": 5, "max.cpu": 10, "name": "host-1", "region": "eu" }

NRDB

{ "id": 1, "name": "host-1", "region": "eu" } { "id": 1, "timestamp": 5, "max.cpu": 10 }

METADATA METRICS

slide-53
SLIDE 53

ENRICHMENT

APP M&M

{ "id": 1, "timestamp": 5, "max.cpu": 10, "name": "host-1", "region": "eu" }

NRDB

{ "id": 1, "name": "host-1", "region": "eu" } { "id": 1, "timestamp": 5, "max.cpu": 10 }

METADATA METRICS
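The merge of a metric with its metadata ("M&M") is a lookup by `id` followed by a dictionary merge. A minimal sketch, assuming metadata is held as a table keyed by `id`; the function name `enrich` and the choice to return `None` when no metadata is found are illustrative assumptions.

```python
def enrich(metric, metadata_by_id):
    """Join one metric record with the metadata row that shares its id."""
    meta = metadata_by_id.get(metric["id"])
    if meta is None:
        return None  # no metadata for this id: a "missing join"
    return {**metric, **meta}


# Metadata behaves like a table: latest value per id.
metadata_by_id = {1: {"id": 1, "name": "host-1", "region": "eu"}}

metric = {"id": 1, "timestamp": 5, "max.cpu": 10}
print(enrich(metric, metadata_by_id))
```

The lookup only works if the instance that receives the metric also holds the metadata row for the same `id`.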

slide-54
SLIDE 54

ENRICHMENT

APP M&M METADATA NRDB METRICS METADATA

slide-55
SLIDE 55

ENRICHMENT

METADATA APP METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 M&M NRDB METADATA

slide-56
SLIDE 56

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 METADATA METADATA P0 P1 M&M NRDB METADATA

slide-57
SLIDE 57

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 METADATA METADATA P0 P1 M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-58
SLIDE 58

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P0 P1 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-59
SLIDE 59

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P0 P1 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-60
SLIDE 60

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P0 P1 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-61
SLIDE 61

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

slide-62
SLIDE 62

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P1 P0 P2 P1 P0 P2 P1 P0 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

slide-63
SLIDE 63

CO-PARTITIONING

ENRICHMENT

slide-64
SLIDE 64

ENRICHMENT

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS P1 P0 P2 P1 P0 P2 P2 P1 P0 P2 P1 P0 P2 P1 P0 P2 P0 P1 METADATA METADATA M&M NRDB METADATA

slide-65
SLIDE 65

METADATA

ENRICHMENT

KSTREAMS APP KSTREAMS APP KSTREAMS APP METADATA METADATA METADATA METRICS METRICS METRICS METADATA P2 P2 P0 P1 P0 P1 P1 P0 P2 P1 P0 P2 M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-66
SLIDE 66

METADATA

ENRICHMENT

KSTREAMS APP KSTREAMS APP KSTREAMS APP METADATA METADATA METADATA METRICS METRICS METRICS METADATA P2 P2 P0 P1 P0 P1 P1 P0 P2 P1 P0 P2 M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-67
SLIDE 67

METADATA

ENRICHMENT

KSTREAMS APP KSTREAMS APP KSTREAMS APP METADATA METADATA METADATA METRICS METRICS METRICS METADATA P2 P2 P0 P1 P0 P1 P1 P0 P2 P1 P0 P2 M&M NRDB METADATA

K:1, V:42 K:1, V:eu

slide-68
SLIDE 68

ENRICHMENT

▸ same partitioning key
▸ same number of partitions
▸ same hash function of producer
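The three conditions can be written down as a simple check. This is only a sketch for reasoning about topic pairs: `TopicSpec` and `co_partitioned` are hypothetical names, not part of any Kafka API, and the specs would have to be filled in by hand from what is known about each producer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSpec:
    name: str
    key_field: str    # which field producers use as the record key
    partitions: int   # number of partitions in the topic
    partitioner: str  # hash function the producers use to pick a partition


def co_partitioned(a: TopicSpec, b: TopicSpec) -> bool:
    """True only if key, partition count and hash function all match."""
    return (a.key_field, a.partitions, a.partitioner) == (
        b.key_field, b.partitions, b.partitioner
    )


metrics = TopicSpec("metrics", key_field="id", partitions=3, partitioner="murmur2")
metadata = TopicSpec("metadata", key_field="id", partitions=3, partitioner="murmur2")
metadata_other_hash = TopicSpec("metadata", key_field="id", partitions=3, partitioner="fnv-1a")

print(co_partitioned(metrics, metadata))
print(co_partitioned(metrics, metadata_other_hash))
```

If any one of the three differs, records with the same key can end up in partitions with different numbers, and a partition-local join will silently miss them.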

slide-69
SLIDE 69

ENRICHMENT

METRICS METRICS METRICS JAVA APP GO APP METADATA METADATA METADATA P1 P2 P1 P0 P2 P0

slide-70
SLIDE 70

ENRICHMENT

METRICS JAVA APP GO APP METADATA P2 P0

K:1, V:42 K:1, V:eu

slide-71
SLIDE 71

ENRICHMENT

METRICS JAVA APP GO APP METADATA METADATA KSTREAMS APP P2 P0 P0 P0

K:1, V:42 K:1, V:eu

slide-72
SLIDE 72

ENRICHMENT

METRICS JAVA APP GO APP METADATA METADATA KSTREAMS APP P2 P0 P0 P0

K:1, V:42 K:1, V:eu

slide-73
SLIDE 73

ENRICHMENT

METRICS JAVA APP GO APP METADATA METADATA KSTREAMS APP FNV-1A MURMUR2 P2 P0 P0 P0
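Why two producers can disagree about the partition for the same key can be shown with the two hash functions named on the slide. The sketch assumes, as the slide labels suggest, that the Java producer picks partitions with a murmur2-based hash and the Go producer with FNV-1a; the slide does not name the Go client library, and the way each hash is turned into a partition number below (`java_style_partition`, `go_style_partition`) is my assumption about typical client behaviour rather than something stated in the talk.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash, returned as an unsigned integer."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 with seed 0x9747b28c, returned as unsigned."""
    m = 0x5BD1E995
    length = len(data)
    h = (0x9747B28C ^ length) & 0xFFFFFFFF
    full = length - length % 4
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> 24
        k = (k * m) & 0xFFFFFFFF
        h = (h * m) & 0xFFFFFFFF
        h ^= k
    tail = data[full:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h


def java_style_partition(key: bytes, num_partitions: int) -> int:
    # Assumption: clear the sign bit, then take the remainder.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


def go_style_partition(key: bytes, num_partitions: int) -> int:
    # Assumption: interpret the hash as a signed 32-bit value and use
    # the absolute value of the remainder.
    h = fnv1a_32(key)
    signed = h - (1 << 32) if h >= (1 << 31) else h
    return abs(signed) % num_partitions


for key in (b"1", b"2", b"3"):
    print(key, java_style_partition(key, 3), go_style_partition(key, 3))
```

Same key and same partition count are not enough: whenever the two columns printed above differ for a key, the metric and its metadata sit in different partition numbers and the local join misses.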

slide-74
SLIDE 74

ENRICHMENT

METRICS JAVA APP GO APP METADATA METADATA METADATA METRICS METRICS METADATA KSTREAMS APP P1 P0 P2 P0 P0’ P1 P0 P2

K:1, V:42 K:1, V:eu

P1’ P0’ P2’ METADATA’ REPARTITION METADATA’ METADATA’

K:1, V:eu

slide-75
SLIDE 75

ENRICHMENT

METRICS JAVA APP GO APP METADATA METADATA METADATA METRICS METRICS METADATA KSTREAMS APP P1 P0 P2 P0 P0’ P1 P0 P2 METADATA’ P1’ P0’ P2’ REPARTITION METADATA’ METADATA’

K:1, V:42 K:1, V:eu K:1, V:eu
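Repartitioning amounts to reading every record from the foreign topic and writing it again with the partitioner the streams app itself uses. A minimal sketch, assuming topics are modelled as lists of partitions and using an MD5-based `partition_for` purely as a stand-in hash (it is not what Kafka uses); `repartition` is a hypothetical helper name.

```python
import hashlib


def partition_for(key, num_partitions):
    """Stand-in partitioner (MD5-based), chosen only because it is stable."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def repartition(source_partitions, num_partitions):
    """Rewrite all records into a new topic laid out by partition_for."""
    target = [[] for _ in range(num_partitions)]
    for partition in source_partitions:
        for key, value in partition:
            target[partition_for(key, num_partitions)].append((key, value))
    return target


# METADATA as written by a producer with an unknown or different hash:
metadata = [[(2, "us")], [], [(1, "eu"), (3, "ap")]]

# METADATA' now follows the same layout the app uses for METRICS.
metadata_prime = repartition(metadata, num_partitions=3)
print(metadata_prime)
```

The price is one more topic and one more hop per record, which is why the deck keeps noting the growing number of topics.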

slide-76
SLIDE 76

▸ enrichment at scale thanks to co-partitioning
▸ when in doubt, repartition
▸ a few more Kafka topics

ENRICHMENT

slide-77
SLIDE 77

ENRICHMENT

🏞

VACATIONS?

slide-78
SLIDE 78

PRODUCTION READINESS

APP

2 CPU 4096 MB

slide-79
SLIDE 79

PRODUCTION READINESS

APP APP APP

2 CPU 4096 MB 3x

slide-80
SLIDE 80

PRODUCTION READINESS

APP HEALTH-CHECK APP HEALTH-CHECK APP HEALTH-CHECK

slide-81
SLIDE 81

PRODUCTION READINESS

APP HEALTH-CHECK APP HEALTH-CHECK APP HEALTH-CHECK

slide-82
SLIDE 82

PRODUCTION READINESS

APP HEALTH-CHECK APP HEALTH-CHECK

slide-83
SLIDE 83

PRODUCTION READINESS

APP HEALTH-CHECK APP HEALTH-CHECK APP HEALTH-CHECK

slide-84
SLIDE 84

APP + DSL KSTREAMS LIBRARY HEALTH-CHECK

PRODUCTION READINESS

slide-85
SLIDE 85

APP STREAM THREAD HEALTH-CHECK

PRODUCTION READINESS

slide-86
SLIDE 86

APP STREAM THREAD HEALTH-CHECK

PRODUCTION READINESS

slide-87
SLIDE 87

APP STREAM THREAD HEALTH-CHECK

PRODUCTION READINESS

IT’S FINE

slide-88
SLIDE 88

APP STREAM THREAD HEALTH-CHECK

PRODUCTION READINESS

slide-89
SLIDE 89

ENRICHMENT

👃

LISTENERS

slide-90
SLIDE 90

PRODUCTION READINESS

RUNNING

slide-91
SLIDE 91

PRODUCTION READINESS

PENDING SHUTDOWN RUNNING NOT RUNNING ERROR

slide-92
SLIDE 92

PRODUCTION READINESS

REBALANCING RUNNING

slide-93
SLIDE 93

PRODUCTION READINESS

REBALANCING RUNNING
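A health-check driven by state transitions could look like the sketch below. In Kafka Streams the callback would be registered through `KafkaStreams#setStateListener`; here the states are modelled as a plain enum, `HealthCheck` is a hypothetical class, and treating REBALANCING as healthy (so that an instance is not killed in the middle of a rebalance) is an assumption about the intent of these slides. CREATED is added only as an initial state.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()  # initial state, not shown on the slides
    REBALANCING = auto()
    RUNNING = auto()
    PENDING_SHUTDOWN = auto()
    NOT_RUNNING = auto()
    ERROR = auto()


# Assumption: a rebalancing instance is still alive and should not be restarted.
HEALTHY_STATES = {State.RUNNING, State.REBALANCING}


class HealthCheck:
    """Remembers the latest state reported by the listener callback."""

    def __init__(self):
        self.state = State.CREATED

    def on_change(self, new_state, old_state):
        self.state = new_state

    def is_healthy(self):
        return self.state in HEALTHY_STATES


check = HealthCheck()
check.on_change(State.REBALANCING, State.CREATED)
check.on_change(State.RUNNING, State.REBALANCING)
print(check.is_healthy())

check.on_change(State.ERROR, State.RUNNING)
print(check.is_healthy())
```

The health endpoint then reports what the streams library says about itself, instead of only checking that the process answers HTTP.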

slide-94
SLIDE 94

PRODUCTION READINESS

BATCH RESTORED RESTORE START

slide-95
SLIDE 95

PRODUCTION READINESS

RESTORE END BATCH RESTORED RESTORE START

slide-96
SLIDE 96

PRODUCTION READINESS

BATCH RESTORED RESTORE START
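The three restore events can be used to track how far state recovery has progressed. The sketch is loosely modelled on the callbacks of Kafka Streams' `StateRestoreListener`, but with simplified signatures: `RestoreProgress` and its method parameters are assumptions made for illustration.

```python
class RestoreProgress:
    """Tracks restoration per state store from start/batch/end events."""

    def __init__(self):
        self.pending = {}  # store name -> [records restored, records total]

    def on_restore_start(self, store, start_offset, end_offset):
        self.pending[store] = [0, end_offset - start_offset]

    def on_batch_restored(self, store, num_restored):
        self.pending[store][0] += num_restored

    def on_restore_end(self, store, total_restored):
        self.pending.pop(store, None)

    def restoring(self):
        return bool(self.pending)

    def progress(self, store):
        """Fraction restored for a store; 1.0 if it is not being restored."""
        if store not in self.pending:
            return 1.0
        restored, total = self.pending[store]
        return restored / total if total else 1.0


tracker = RestoreProgress()
tracker.on_restore_start("metrics-1h", start_offset=0, end_offset=100)
tracker.on_batch_restored("metrics-1h", num_restored=40)
print(tracker.restoring(), tracker.progress("metrics-1h"))

tracker.on_batch_restored("metrics-1h", num_restored=60)
tracker.on_restore_end("metrics-1h", total_restored=100)
print(tracker.restoring(), tracker.progress("metrics-1h"))
```

A restore that starts and emits batches but never reaches the end event is worth alerting on.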

slide-97
SLIDE 97

PRODUCTION READINESS

▸ consuming/producing throughput
▸ missing joins
▸ rebalancing loops
▸ storage-specific metrics
▸ business-specific metrics

slide-98
SLIDE 98

PRODUCTION READINESS

slide-99
SLIDE 99

PRODUCTION READINESS

slide-100
SLIDE 100

PRODUCTION READINESS

slide-101
SLIDE 101

PRODUCTION READINESS

slide-102
SLIDE 102

PRODUCTION READINESS

slide-103
SLIDE 103

PRODUCTION READINESS

slide-104
SLIDE 104

PRODUCTION READINESS

slide-105
SLIDE 105

🚁

SCALE!

PRODUCTION READINESS

slide-106
SLIDE 106

SCALE OUT

METRICS METRICS METRICS APP APP METADATA METADATA METADATA METADATA KSTREAMS APP P1 P0 P2 P1 P0 P2

slide-107
SLIDE 107

APP APP METADATA KSTREAMS APP

SCALE OUT

METRICS METRICS METRICS METADATA METADATA METADATA P1 P0 P2 P1 P0 P2 METRICS P3

slide-108
SLIDE 108

APP APP METADATA KSTREAMS APP

SCALE OUT

METRICS METRICS METRICS METADATA METADATA METADATA P1 P0 P2 P1 P0 P2 METRICS P3

slide-109
SLIDE 109

APP APP METADATA KSTREAMS APP

SCALE OUT

METRICS METRICS METRICS METADATA METADATA METADATA P1 P0 P2 P1 P0 P2 METRICS P3 METADATA P3
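Adding a partition changes where keys land, which is why both topics have to grow together. The sketch below counts how many keys move when going from 3 to 4 partitions; the MD5-based `partition_for` is a stand-in hash used only because it is stable across runs, and the key range is made up for illustration.

```python
import hashlib


def partition_for(key, num_partitions):
    """Stand-in partitioner (MD5-based); not the hash Kafka clients use."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


keys = range(100)  # hypothetical host ids
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]

print(f"{len(moved)} of {len(keys)} keys change partition when going from 3 to 4")
```

Every moved key is a key whose metadata and previously accumulated state are no longer where new metrics for it arrive, unless METRICS and METADATA are resized together and state is rebuilt.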

slide-110
SLIDE 110

3 APP APP 3 KSTREAMS APP

SCALE OUT

slide-111
SLIDE 111

APP APP KSTREAMS APP 3 3 4 4 PR #1

SCALE OUT

slide-112
SLIDE 112

APP KSTREAMS APP PR #1 3 3 4 4 APP PR #2

SCALE OUT

slide-113
SLIDE 113

APP KSTREAMS APP PR #1 3 3 4 4 APP PR #2 PR #3

SCALE OUT

slide-114
SLIDE 114

APP KSTREAMS APP PR #1 3 3 4 4 APP PR #2 PR #3 PR #4

SCALE OUT

slide-115
SLIDE 115

RUNBOOKS!

SCALE OUT

slide-116
SLIDE 116

SCALE UP

SOURCE SOURCE SINK LOOKUP

slide-117
SLIDE 117

SCALE UP

SOURCE SOURCE SINK LOOKUP

slide-118
SLIDE 118

SCALE UP

SOURCE SOURCE SINK LOOKUP

slide-119
SLIDE 119

SCALE UP

SOURCE SOURCE SINK LOOKUP TRANSFORMERS MERGE

slide-120
SLIDE 120

SCALE UP

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS

slide-121
SLIDE 121

SCALE UP

APP APP APP METADATA METADATA METADATA METRICS METRICS METRICS

slide-122
SLIDE 122

SCALE UP

  • don’t block
  • if blocking, do it in a separate service
  • distribute the load as uniformly as possible (not by account id)
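
The last point can be illustrated with a made-up workload: one large account with 900 hosts and 100 small accounts with one host each. The numbers, the names and the MD5-based `partition_for` stand-in hash are all assumptions for the sake of the example.

```python
import hashlib
from collections import Counter


def partition_for(key, num_partitions):
    """Stand-in partitioner (MD5-based); not the hash Kafka clients use."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Hypothetical workload: (account id, host id) pairs, one record per host.
hosts = [("big-account", f"host-{i}") for i in range(900)]
hosts += [(f"account-{i}", f"host-{900 + i}") for i in range(100)]


def load_per_partition(key_fn, num_partitions=4):
    """Count records per partition for a given choice of key."""
    return Counter(partition_for(key_fn(h), num_partitions) for h in hosts)


by_account = load_per_partition(lambda h: h[0])
by_host = load_per_partition(lambda h: h[1])

print("keyed by account:", sorted(by_account.values(), reverse=True))
print("keyed by host:   ", sorted(by_host.values(), reverse=True))
```

Keyed by account, all 900 records of the large account hit a single partition and therefore a single instance; keyed by host, the same records spread across all partitions.
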
slide-123
SLIDE 123

TOO GOOD TO BE TRUE…

  • works reliably in production under load
  • you really need to know Kafka, but that’s enough
  • co-partitioning is the only magic dust
  • watch new topics, there will be a lot
slide-124
SLIDE 124

QUESTIONS?

CONCERNS?

PROTESTS?