The Best of Apache Kafka Architecture - Ranganathan Balashanmugam - PowerPoint PPT Presentation





SLIDE 1

The Best of Apache Kafka Architecture

Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

SLIDE 2

Hello, Budapest!

SLIDE 3

About Me

❏ Graduated as a Civil Engineer.
❏ <dev> 10+ years </dev>
❏ <Thoughtworker from="India"/>
❏ Organizer of the Hyderabad Scalability Meetup, with 2000+ members.

SLIDE 4

“Form follows function.”

  • Louis Sullivan
SLIDE 5

Gravity Dam

Indirasagar Dam, India

img src: http://www.montanhydraulik.in

SLIDE 6

Forces on a gravity dam

❏ Dam weight
❏ Head water
❏ Tail water
❏ Uplift

SLIDE 7

❏ Publish-subscribe messaging service
❏ Distributed commit/write-ahead log

“Producers produce, consumers consume, in a large, distributed, reliable way -- in real time.”
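The "distributed commit log" idea can be sketched as a toy in-memory log (this is an illustration, not Kafka's implementation): producers append to the end, and each consumer reads forward from its own offset, so many consumers share one log independently.

```python
# Toy commit log: an append-only list of messages. Producers append;
# consumers read from an offset they each track themselves.
class CommitLog:
    def __init__(self):
        self.messages = []              # append-only message store

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1   # offset of the new message

    def read(self, offset):
        return self.messages[offset:]   # all messages from that offset on

log = CommitLog()
log.append("m0")
log.append("m1")

# Two consumers at different offsets see different slices of the same log.
assert log.read(0) == ["m0", "m1"]
assert log.read(1) == ["m1"]
```

Because the log itself never mutates past entries, adding a consumer is just adding another offset.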

SLIDE 8

Why Kafka?

❏ DBs
❏ Logs
❏ Brokers
❏ HDFS

“For highly distributed messages, Kafka stands out.”

SLIDE 9

Kafka Vs ________

src: https://softwaremill.com/mqperf/

SLIDE 10

Timeline

2011 - Open sourced by LinkedIn, as version 0.6
2012 - Graduated from the Apache Incubator
2014 - Several engineers who built Kafka create Confluent
2015 - Latest stable - 0.8.2.1

SLIDE 11

A Kafka Message

CRC | magic | attributes | key length | key | message length | message content

kafka.message.Message

Change requested: KAFKA-2511

SLIDE 12

Producers - push

Kafka Broker

  • org.apache.kafka.clients.producer.KafkaProducer

Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
Response => [TopicName [Partition ErrorCode Offset]]

SLIDE 13

Topic

Remove messages based on:
❏ time
❏ size
❏ number of messages

kafka.common.Topic
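Time-based retention can be sketched in a few lines (a toy model, not Kafka's retention code; size- and count-based retention prune the same way, just with a different predicate):

```python
import time

# Toy retention: drop every (timestamp, message) pair older than max_age_seconds.
def retain(messages, max_age_seconds, now=None):
    now = time.time() if now is None else now
    return [(ts, m) for ts, m in messages if now - ts <= max_age_seconds]

msgs = [(100.0, "old"), (190.0, "recent")]

# With the clock at t=200 and a 60-second window, only "recent" survives.
assert retain(msgs, max_age_seconds=60, now=200.0) == [(190.0, "recent")]
```

Kafka applies this at segment-file granularity rather than per message, so deletion is cheap.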

SLIDE 14

Partitions

kafka.cluster.Partition

Serves: Horizontal scaling, Parallel consumer reads
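Partition assignment is commonly done by hashing the message key, so all messages for one key stay ordered in one partition while different keys spread across partitions. A minimal sketch (Kafka's default partitioner uses murmur2; Python's built-in `hash` is a stand-in here):

```python
# Toy partitioner: same key always maps to the same partition,
# which preserves per-key ordering while partitions scale out.
def partition_for(key, num_partitions):
    return hash(key) % num_partitions  # stand-in for Kafka's murmur2 hash

p = partition_for("user-42", 4)
assert 0 <= p < 4
assert partition_for("user-42", 4) == p   # deterministic within a process
```

Each partition can then be consumed by a different consumer, which is what enables parallel reads.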

SLIDE 15

Consumers - pull

kafka.consumer.ConsumerConnector, kafka.consumer.SimpleConsumer

SLIDE 16

Consumer offsets

committing and fetching consumer offsets

img src: http://www.reynanprinting.com/photos/undefined/impresion-offset1.jpg
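The commit/fetch cycle can be sketched as a tiny offset store keyed by consumer group, topic, and partition (a toy model of the idea, not Kafka's offset-management API):

```python
# Toy offset store: consumers commit the offset of the last processed
# message and fetch it back after a restart to resume where they left off.
committed = {}   # (group, topic, partition) -> offset

def commit(group, topic, partition, offset):
    committed[(group, topic, partition)] = offset

def fetch(group, topic, partition):
    return committed.get((group, topic, partition), 0)  # new groups start at 0

commit("group-1", "orders", 0, 42)
assert fetch("group-1", "orders", 0) == 42
assert fetch("group-2", "orders", 0) == 0   # a fresh group has no committed offset
```

Because offsets are per group, two groups can each consume the whole topic independently.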

SLIDE 17

kafka:// - protocol

  • Metadata
  • Send
  • Fetch
  • Offsets
  • Offset commit
  • Offset fetch

“Binary protocol over TCP”
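A binary protocol over TCP needs framing so the receiver knows where one request ends and the next begins. A common scheme, sketched here as an illustration (not Kafka's exact wire format), is a 4-byte big-endian length prefix before each payload:

```python
import struct

# Length-prefixed framing: each frame is a 4-byte big-endian size
# followed by exactly that many payload bytes.
def encode(payload: bytes) -> bytes:
    return struct.pack(">i", len(payload)) + payload

def decode(stream: bytes):
    frames, pos = [], 0
    while pos < len(stream):
        (size,) = struct.unpack_from(">i", stream, pos)
        frames.append(stream[pos + 4 : pos + 4 + size])
        pos += 4 + size
    return frames

wire = encode(b"fetch") + encode(b"offsets")
assert decode(wire) == [b"fetch", b"offsets"]
```

The receiver can read the 4-byte size first, then read exactly that many bytes, without any delimiter scanning.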

SLIDE 18

Mechanical Sympathy

"The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Petroski

Image source: http://www.theguide2surrey.com

SLIDE 19

Persistence

“Everything is faster till the disk IO.”

SLIDE 20

Disk faster than RAM (sequential disk access can beat random memory access)

src: http://queue.acm.org/detail.cfm?id=1563874

SLIDE 21

Linear Read & Writes

At a high level there are only two operations:
❏ Append to the end of the log
❏ Fetch messages from a partition, beginning from a particular message id

Both are sequential file I/O.
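Both operations map directly onto ordinary file calls, sketched here on a plain file (a toy segment, not Kafka's log format): appends always go to the end, and fetches seek to a known position and scan forward.

```python
import os
import tempfile

# Toy log segment: writes append to the end; reads seek to a byte
# position and read forward. Both access patterns are sequential.
path = os.path.join(tempfile.mkdtemp(), "segment.log")

with open(path, "ab") as f:            # append to the end of the log
    f.write(b"msg-0\n")
    f.write(b"msg-1\n")

with open(path, "rb") as f:            # fetch from a known position onward
    f.seek(6)                          # byte offset where "msg-1\n" starts
    assert f.read() == b"msg-1\n"
```

Sequential access lets the OS prefetch aggressively, which is the point of the slide.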

SLIDE 22

“Let us play pictionary”

SLIDE 23

Linux Page Cache

“Kafka ate my RAM”

SLIDE 24

ZeroCopy

src: http://www.ibm.com/developerworks/library/j-zerocopy/

SLIDE 25

Batching

Trade a small amount of latency for improved throughput.

img src: https://prashanthpanduranga.files.wordpress.com/2015/05/tirupati.jpg
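The batching trade-off can be sketched as a buffer that flushes once it is full (a toy model; Kafka's producer also flushes on a time limit, `linger.ms`, which is omitted here):

```python
# Toy batcher: accumulate messages and hand them to `send` in groups,
# turning many small sends into a few larger ones.
class Batcher:
    def __init__(self, batch_size, send):
        self.batch_size, self.send, self.buffer = batch_size, send, []

    def add(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(list(self.buffer))   # ship a copy of the batch
            self.buffer.clear()

sent = []
b = Batcher(batch_size=3, send=sent.append)
for i in range(7):
    b.add(i)
b.flush()                                  # drain the partial final batch
assert sent == [[0, 1, 2], [3, 4, 5], [6]]
```

Each message waits at most until its batch fills, which is the "small latency" being spent.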

SLIDE 26

Compression

Bandwidth is more expensive per byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility.

kafka.message.CompressionCodec
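Compression pays off most when applied to a whole batch rather than message by message, because redundancy across similar messages is what the codec exploits. A small illustration with gzip (one of the codecs Kafka supports):

```python
import gzip
import json

# A batch of 100 similar messages compresses far better than any
# single message would, because the codec sees the repetition.
batch = [{"user": "u1", "event": "click"}] * 100
raw = json.dumps(batch).encode()
packed = gzip.compress(raw)

assert len(packed) < len(raw)                        # batch shrinks substantially
assert json.loads(gzip.decompress(packed)) == batch  # lossless round trip
```

The broker can store and forward the compressed batch as-is, so the CPU cost is paid once at the producer.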

SLIDE 27

Log compaction

img src: http://kafka.apache.org/083/documentation.html

kafka.log.LogCleaner, LogCleanerManager
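The compaction rule itself is simple: keep only the latest value for each key, and drop a key entirely when its latest value is a delete marker (a "tombstone"). A toy sketch of that rule (not the LogCleaner's incremental algorithm):

```python
# Toy log compaction: scan the log in order so later values overwrite
# earlier ones, then drop keys whose latest value is a None tombstone.
def compact(log):
    latest = {}
    for key, value in log:
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("k1", "v1"), ("k2", "v2"), ("k1", "v3"), ("k2", None)]
assert compact(log) == [("k1", "v3")]   # k1 keeps its latest value; k2 is deleted
```

The compacted log is a full snapshot of current state, which is why compacted topics can back key-value stores.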

SLIDE 28

Message Delivery

❏ At least once
❏ At most once
❏ Exactly once
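The difference between the first two comes down to whether the consumer commits its offset before or after processing. A toy simulation of a crash mid-stream (an illustration of the semantics, not consumer code):

```python
def at_most_once(messages, crash_midway):
    """Commit the offset first, then process: a crash can LOSE messages."""
    committed, processed = 0, []
    for i, m in enumerate(messages):
        committed = i + 1                 # commit before processing
        if crash_midway and i == 0:
            break                         # crash: message i is never processed
        processed.append(m)
    return committed, processed

def at_least_once(messages, crash_midway):
    """Process first, then commit: a crash can REDELIVER messages."""
    committed, processed = 0, []
    for i, m in enumerate(messages):
        processed.append(m)
        if crash_midway and i == 0:
            break                         # crash before commit: i is re-read later
        committed = i + 1
    return committed, processed

assert at_most_once(["a", "b"], crash_midway=True) == (1, [])       # "a" lost
assert at_least_once(["a", "b"], crash_midway=True) == (0, ["a"])   # "a" redelivered
```

Exactly-once requires making processing and the commit atomic, which is why it is the hard case.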

SLIDE 29

Replication

un-replicated = replication factor of one

SLIDE 30

Quorum based

  • Better latency
  • To tolerate “f” failures, need “2f+1” replicas
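The 2f+1 figure follows from majority overlap: any two majorities of 2f+1 nodes share at least one node, so a write acknowledged by a majority survives f failures. The arithmetic, as a small check:

```python
# Quorum arithmetic: to tolerate f failures, a majority must still be
# alive after f nodes are gone, which forces 2f + 1 replicas.
def replicas_needed(f):
    return 2 * f + 1

def majority(replicas):
    return replicas // 2 + 1

assert replicas_needed(1) == 3 and majority(3) == 2
assert replicas_needed(2) == 5 and majority(5) == 3

# Any two majorities of a 5-node cluster must overlap in >= 1 node:
assert 2 * majority(5) > 5
```

Primary-backup replication (next slide) needs only f+1 replicas to tolerate f failures, at the cost of higher write latency.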
SLIDE 31

Primary-backup replication

[Diagram: Topics 1-3, each replicated across Brokers 1-4]

SLIDE 32

ZooKeeper

cluster coordinator

SLIDE 33

THANK YOU

For questions or suggestions: Ran.ga.na.than B ranganab@thoughtworks.com @ran_than