How to visually spot and analyze slow MongoDB operations, Kay Agahd



SLIDE 1

How to visually spot and analyze slow MongoDB operations

Kay Agahd idealo internet GmbH

SLIDE 2

About me

  • Current location: Berlin/Germany
  • Education: Engineer's degree, Software Engineering
  • Experience:
      ○ Software developer from 1998 - 2009 in Paris/France
      ○ Database engineer since 2012 at idealo in Berlin/Germany
  • Certifications:
      ○ MongoDB certified Java developer (final grade 100%)
      ○ MongoDB certified DBA (final grade 96%)

linkedin.com/in/kayag/

SLIDE 3

About idealo

  • founded in 2000
  • Europe’s leading price comparison website and app
  • Germany, Austria, United Kingdom, France, Italy and Spain
  • > 1 billion offers online (November 2018)
  • fast growing
SLIDE 4

idealo and MongoDB

  • different types of databases: Oracle, MySQL, PostgreSQL, MongoDB
  • MongoDB in production since v1.6 (ca. 2011)
  • sharding in production since MongoDB v1.8
  • MongoDB mainly stores offers for back-end usage
  • > 2 billion docs in offerStore, up to 1 billion reads and 1 billion writes per day
  • > 10 billion docs in offerHistory
SLIDE 5

Motivation

Why we need a better MongoDB profiler

SLIDE 6

Review profiling

  • MongoDB supports profiling of “slow” operations
  • “slow” is a threshold set when turning profiling on (default: 100 ms)
  • the profiler writes the collected data to a capped collection (system.profile) of the profiled database
  • profiling is enabled per database or per instance on a running mongod
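
As a minimal sketch of the switch described above: MongoDB exposes profiling through the `profile` database command, which takes a level (0 = off, 1 = slow operations only, 2 = all operations) and a `slowms` threshold. The helper name `buildProfileCommand` is hypothetical and only builds the command document; the shell calls in the comments show where it would be used.

```javascript
// Build the document for MongoDB's `profile` database command.
// level: 0 = off, 1 = only operations slower than slowms, 2 = all operations.
// buildProfileCommand is a hypothetical helper name, not part of any driver.
function buildProfileCommand(level, slowMs = 100) {
  if (![0, 1, 2].includes(level)) {
    throw new RangeError("profiling level must be 0, 1 or 2");
  }
  if (!Number.isInteger(slowMs) || slowMs < 0) {
    throw new RangeError("slowMs must be a non-negative integer");
  }
  return { profile: level, slowms: slowMs };
}

// In the MongoDB shell, connected to the database to be profiled:
//   db.runCommand(buildProfileCommand(1, 100))   // turn profiling on
//   db.system.profile.find().sort({ ts: -1 })    // newest entries first
//   db.runCommand(buildProfileCommand(0))        // turn profiling off
const enableCmd = buildProfileCommand(1, 100);
console.log(JSON.stringify(enableCmd));
```

Because the command is issued per database, it has to be repeated for every database on every mongod that should be profiled.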
SLIDE 7

Inconveniences

  • each mongod and database needs to be handled separately
  • sharding: shards * repl. factor * databases = #profilers
  • gives only a view on a limited time span due to capped collection
  • profiling/analyzing may add stress to the profiled server
  • different formats of the “query” field make querying more difficult
  • bug: ops through mongos omit the user (JIRA: SERVER-7538)
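
The multiplication in the sharding bullet can be worked through with a small example. The cluster dimensions below (6 shards, 3 replica set members per shard, 4 databases) are made-up numbers for illustration, not idealo's figures.

```javascript
// Number of profilers to handle in a sharded cluster: every database on
// every replica set member of every shard has its own profile collection.
function countProfilers(shards, replicationFactor, databases) {
  return shards * replicationFactor * databases;
}

// Hypothetical cluster: 6 shards, 3 members per replica set, 4 databases.
const profilers = countProfilers(6, 3, 4);
console.log(profilers); // 72
```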
SLIDE 8

idealo requirements

  • easily switch profiling on/off, even when many mongods are involved
  • quick overview of the types of slow ops and their quantity within a time period (“types” means op type, user, server, query shape, etc.)
  • historical view to see how slow ops evolve, in order to extrapolate them
  • discovering spikes in time or in slow-op types
  • filtering by slow-op types and/or time range to drill down
  • usable also by non-database-admins, e.g. software developers
SLIDE 9

What we’ve built

MongoDB slow operations profiler

SLIDE 10

How it works

[Diagram: mongod 1 (DB 1, DB 2, ..., DB n) through mongod n (DB n1, DB n2, ..., DB nm); slow ops app with profiler 1..n and profiler n..m; collector DB]

SLIDE 11

Example of slow op document

{
  "op": "query",
  "ns": "offerStore.offer",
  "query": {
    "find": "offer",
    "filter": {
      "shopId": 292731,
      "opIds": { "$in": [ 29337, 5478 ] },
      "offerStatus": { "$gt": 0 }
    },
    "projection": { "traceId": 1, "bokey": 1, "version": 1, "offerTitle": 1 },
    "batchSize": 1000,
    ...
    "shardVersion": [ Timestamp(14944, 25276), ObjectId("591c6...8fcde") ]
  },
  "keysExamined": 2210852,
  "docsExamined": 232,
  "cursorExhausted": true,
  "keyUpdates": 0,
  "writeConflicts": 0,
  "numYield": 17272,
  "locks": {
    "Global": { "acquireCount": { "r": NumberLong("34546") } },
    "Database": { "acquireCount": { "r": NumberLong("17273") } },
    "Collection": { "acquireCount": { "r": NumberLong("17273") } }
  },
  "nreturned": 185,
  "responseLength": 119954,
  "protocol": "op_command",
  "millis": 4189,
  "execStats": {
    "stage": "PROJECTION",
    "nReturned": 185,
    "executionTimeMillisEstimate": 3941,
    "works": 2210853,
    "advanced": 185,
    ... 124 lines omitted ...
  },
  "ts": ISODate("2018-10-26T07:17:12.747Z"),
  "client": "10.135.128.219",
  "allUsers": [ { "user": "__system", "db": "local" } ],
  "user": "__system@local"
}

SLIDE 12

Condense slow op documents

{
  "_id": ObjectId("5bd3090b68b5c4203f53ce7e"),
  "ts": ISODate("2018-10-26T12:31:07.752Z"),
  "adr": "host523.idealo.de",
  "lbl": "offerStoreDE",
  "rs": "offerStoreDE09",
  "db": "offerStore",
  "col": "offers",
  "op": "getmore",
  "fields": ["shopId", "opIds.$in", "offerStatus.$gt"],
  "sort": ["_id"],
  "nret": 500,
  "reslen": 94656,
  "millis": 5322,
  "user": "__system@local"
}
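
The condensing step can be sketched roughly as follows. This is an illustration, not the project's actual implementation: the function names `queryShape` and `condense`, the `source` argument and its example values are assumptions, and only simple filters are handled (logical operators such as `$and` are not expanded).

```javascript
// Collect the queried field names of a filter document, appending operators
// (e.g. "$in", "$gt") but dropping the concrete values.
function queryShape(filter, prefix = "") {
  const fields = [];
  for (const [key, value] of Object.entries(filter)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      value !== null && typeof value === "object" && !Array.isArray(value);
    const hasOperator =
      isPlainObject && Object.keys(value).some((k) => k.startsWith("$"));
    if (hasOperator) {
      fields.push(...queryShape(value, path)); // descend into { $gt: ... } etc.
    } else {
      fields.push(path); // literal value or array: keep only the field path
    }
  }
  return fields;
}

// Reduce a profile entry to a compact record similar to the condensed document.
// `source` carries values that the profile entry itself does not contain.
function condense(profileEntry, source) {
  const [db, ...rest] = profileEntry.ns.split(".");
  const query = profileEntry.query || {};
  return {
    ts: profileEntry.ts,
    adr: source.adr,
    lbl: source.lbl,
    rs: source.rs,
    db,
    col: rest.join("."), // collection names may themselves contain dots
    op: profileEntry.op,
    fields: queryShape(query.filter || {}),
    sort: Object.keys(query.sort || {}),
    nret: profileEntry.nreturned,
    reslen: profileEntry.responseLength,
    millis: profileEntry.millis,
    user: profileEntry.user,
  };
}

// Abridged version of the slow op document from the previous slide.
const entry = {
  op: "query",
  ns: "offerStore.offer",
  query: {
    find: "offer",
    filter: { shopId: 292731, opIds: { $in: [29337, 5478] }, offerStatus: { $gt: 0 } },
  },
  nreturned: 185,
  responseLength: 119954,
  millis: 4189,
  ts: new Date("2018-10-26T07:17:12.747Z"),
  user: "__system@local",
};
// Hypothetical source description (host, label, replica set name).
const source = { adr: "host1.example.com", lbl: "exampleLabel", rs: "exampleRs" };
const condensed = condense(entry, source);
console.log(condensed.fields); // [ 'shopId', 'opIds.$in', 'offerStatus.$gt' ]
```

Dropping the concrete values is what lets operations with the same query shape be counted together, and it keeps the stored documents small.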

SLIDE 13

Some numbers of the collector DB

  • > 250 million slow ops stored over more than 5 years
  • average doc size = 238 bytes
  • uncompressed data size ca. 55 GB
  • index size < 9 GB
  • total storage size (snappy compression) < 12 GB
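
As a quick plausibility check of the first three figures, assuming exactly 250 million documents (the slide only gives a lower bound) and assuming that "GB" on the slide means binary gigabytes:

```javascript
const docs = 250e6;      // lower bound from the slide, taken as exact here
const avgDocBytes = 238; // average document size from the slide
const dataBytes = docs * avgDocBytes;

const dataGB = dataBytes / 1e9;      // decimal gigabytes
const dataGiB = dataBytes / 2 ** 30; // binary gigabytes (GiB)
console.log(dataGB.toFixed(1), dataGiB.toFixed(1)); // 59.5 55.4
```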
SLIDE 14

Configuration

{
  "collector": {
    "hosts": ["myCollectorHost_member1:27017",
              "myCollectorHost_member2:27017",
              "myCollectorHost_member3:27017"],
    "db": "profiling",
    "collection": "slowops",
    "adminUser": "",
    "adminPw": ""
  },
  ...

  • Collector
SLIDE 15

Configuration

  "profiled": [
    {
      "label": "dbs foo",
      "hosts": ["someHost1:27017", "someHost2:27017", "someHost3:27017"],
      "ns": ["someDB.someCollection", "anotherDB.*"],
      "enabled": false
    },
    {
      "label": "dbs bar",
      "hosts": ["someMongoRouter1:27017", "someMongoRouter2:27017"],
      "ns": ["*.*"],
      "adminUser": "kay",
      "adminPw": "never.tell.it!:-)",
      "enabled": false,
      "slowMS": 500,
      "responseTimeoutInMs": 2000
    }
  ],
  ...

  • databases to be profiled
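
The `ns` entries in this configuration look like `database.collection` patterns in which `*` stands for any database or any collection. Under that assumption, the matching could be sketched like this; `splitNs`, `matchesPattern` and `shouldProfile` are hypothetical names and not taken from the project.

```javascript
// Split "db.collection" at the first dot; collection names may contain dots.
function splitNs(ns) {
  const i = ns.indexOf(".");
  if (i < 0) throw new Error(`not a namespace: ${ns}`);
  return [ns.slice(0, i), ns.slice(i + 1)];
}

// Assumed semantics: "*" matches any database or any collection.
function matchesPattern(pattern, ns) {
  const [patternDb, patternCol] = splitNs(pattern);
  const [db, col] = splitNs(ns);
  return (patternDb === "*" || patternDb === db) &&
         (patternCol === "*" || patternCol === col);
}

// True if at least one configured pattern covers the namespace.
function shouldProfile(patterns, ns) {
  return patterns.some((pattern) => matchesPattern(pattern, ns));
}

const fooPatterns = ["someDB.someCollection", "anotherDB.*"];
console.log(shouldProfile(fooPatterns, "anotherDB.users")); // true
console.log(shouldProfile(fooPatterns, "someDB.other"));    // false
console.log(shouldProfile(["*.*"], "anyDB.anyCollection")); // true
```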
SLIDE 16

How it looks

SLIDE 17

How it looks - part 2

SLIDE 18

Slow ops diagram

2018/10/30 10:04 = Count:95
db=offerStore coll=offers op=query fields=[shopId,mCC.$gt]
Duration: avg:322 max:990 sum:31.682 ms
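
A data point like the one above groups slow ops by minute and by slow-op type, then reports count, average, maximum and sum of the durations. A rough sketch of such a grouping over condensed documents, with hypothetical function names and made-up sample data:

```javascript
// Truncate a timestamp to the minute (UTC), e.g. "2018-10-30T10:04".
function minuteKey(date) {
  return date.toISOString().slice(0, 16);
}

// Group condensed slow ops by minute, namespace, op type and queried fields.
function summarize(slowOps) {
  const groups = new Map();
  for (const op of slowOps) {
    const key = [minuteKey(op.ts), op.db, op.col, op.op, op.fields.join(",")].join("|");
    const group = groups.get(key) || { count: 0, sum: 0, max: 0 };
    group.count += 1;
    group.sum += op.millis;
    group.max = Math.max(group.max, op.millis);
    groups.set(key, group);
  }
  for (const group of groups.values()) {
    group.avg = Math.round(group.sum / group.count);
  }
  return groups;
}

// Made-up sample data: three ops in one minute, one op in the next minute.
const shape = { db: "offerStore", col: "offers", op: "query", fields: ["shopId", "mCC.$gt"] };
const sampleOps = [
  { ...shape, ts: new Date("2018-10-30T10:04:05Z"), millis: 200 },
  { ...shape, ts: new Date("2018-10-30T10:04:30Z"), millis: 300 },
  { ...shape, ts: new Date("2018-10-30T10:04:59Z"), millis: 700 },
  { ...shape, ts: new Date("2018-10-30T10:05:01Z"), millis: 150 },
];
const summary = summarize(sampleOps);
console.log(summary.get("2018-10-30T10:04|offerStore|offers|query|shopId,mCC.$gt"));
// { count: 3, sum: 1200, max: 700, avg: 400 }
```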

SLIDE 19

Slow ops data table

SLIDE 20

Slow ops search form

SLIDE 21

Further benefits

  • a global collector makes it possible to see how slow ops evolve
  • aggregate the slow ops of the last minute, grouped by label
  • send metrics, i.e. count, durations and nReturned, to Graphite
  • build a Grafana dashboard

[
  { $match: { ts: { $gt: from, $lt: to } } },
  { $group: {
      _id: { label: "$lbl" },
      count: { $sum: 1 },
      sumMs: { $sum: "$millis" },
      maxMs: { $max: "$millis" },
      sumNret: { $sum: "$nret" },
      sumResplen: { $sum: "$reslen" }
  } }
]
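
The result of such an aggregation could be turned into Graphite metrics roughly as follows. Graphite's plaintext protocol expects one `path value timestamp` line per metric, with the timestamp in Unix seconds. The function name `toGraphiteLines`, the metric prefix `mongodb.slowops` and the sample result are assumptions for illustration; the slides do not show how the metrics are actually sent.

```javascript
// Convert $group results into Graphite plaintext protocol lines:
//   <metric path> <value> <unix timestamp in seconds>
function toGraphiteLines(groups, epochSeconds, prefix = "mongodb.slowops") {
  const lines = [];
  for (const group of groups) {
    // Dots separate path segments in Graphite, so sanitize the label.
    const label = String(group._id.label).replace(/[^A-Za-z0-9_-]/g, "_");
    for (const [name, value] of Object.entries(group)) {
      if (name === "_id") continue;
      lines.push(`${prefix}.${label}.${name} ${value} ${epochSeconds}`);
    }
  }
  return lines;
}

// Made-up aggregation result for one label.
const aggregated = [
  { _id: { label: "dbs foo" }, count: 3, sumMs: 1200, maxMs: 700, sumNret: 30, sumResplen: 4096 },
];
const graphiteLines = toGraphiteLines(aggregated, 1540893840);
console.log(graphiteLines.join("\n"));
// mongodb.slowops.dbs_foo.count 3 1540893840
// mongodb.slowops.dbs_foo.sumMs 1200 1540893840
// mongodb.slowops.dbs_foo.maxMs 700 1540893840
// mongodb.slowops.dbs_foo.sumNret 30 1540893840
// mongodb.slowops.dbs_foo.sumResplen 4096 1540893840
```

Each line would then be written, newline-terminated, to the Graphite (Carbon) plaintext port.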

SLIDE 22

Historical view per DBS

SLIDE 23

Further documentation

  • Open source project, you are welcome to contribute:
    https://github.com/idealo/mongodb-slow-operations-profiler
  • Blog post 1 of 2:
    https://medium.com/idealo-tech-blog/how-to-visually-spot-and-analyze-slow-mongodb-operations-d91ac819e0de
  • Blog post 2 of 2:
    https://medium.com/idealo-tech-blog/mongodb-slow-operations-analyzer-2-0-24da414fad13

We are hiring: jobs.idealo.de

SLIDE 24

Rate My Session