slide-1
SLIDE 1

Managing Data and Operation Distribution In MongoDB

Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket | linkedin.com/in/antonis/ | linkedin.com/in/jterpko/

1

slide-2
SLIDE 2

Introduction

www.objectrocket.com

2

Antonios Giannopoulos Jason Terpko

slide-3
SLIDE 3

Overview

  • Sharded Cluster
  • Shard Keys Selection
  • Shard Key Operations
  • Chunk Management
  • Data Distribution
  • Orphaned documents
  • Q&A

www.objectrocket.com

3

slide-4
SLIDE 4

Sharded Cluster

  • Cluster Metadata
  • Data Layer
  • Query Routing
  • Cluster Communication

www.objectrocket.com

4

slide-5
SLIDE 5

Cluster Metadata

slide-6
SLIDE 6

Data Layer

[Diagram: shards s1 … sN]

slide-7
SLIDE 7

Replication

Data redundancy relies on an idempotent log of operations.

slide-8
SLIDE 8

Query Routing

[Diagram: shards s1 … sN]

slide-9
SLIDE 9

Sharded Cluster

[Diagram: shards s1 … sN]

slide-10
SLIDE 10

Cluster Communication

How do independent components become a cluster and communicate?

  • Replica Set

○ Replica Set Monitor
○ Replica Set Configuration
○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
○ Misc: replSetName, keyFile, clusterRole

  • Mongos Configuration

○ configDB Parameter
○ Network Interface ASIO Shard Registry
○ Replica Set Monitor
○ Task Executor

  • Post Add Shard

○ Collection config.shards
○ Replica Set Monitor
○ Task Executor Pool
○ config.system.sessions

slide-11
SLIDE 11

Primary Shard

[Diagram: shards s1 … sN; database <foo> resides on its primary shard]

slide-12
SLIDE 12

Collection UUID

With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID. [Diagram: cluster metadata (config.collections) vs. data layer (mongod)]

slide-13
SLIDE 13

Collection UUID

With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID. [Diagram: cluster metadata (config.collections) vs. data layer (mongod)]

Important:

  • UUIDs for a namespace must match
  • Use 4.0+ tools for a sharded cluster restore
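
A quick way to verify the "UUIDs must match" rule, for example after a restore, is to compare the UUID recorded in the cluster metadata with the UUID the shard reports locally. The sketch below is ours, not from the deck; the connection strings are placeholders and it assumes the 3.6+ layout where config.collections carries a uuid field and listCollections returns info.uuid.

```python
from pymongo import MongoClient

# Placeholder hosts: a mongos and one shard's primary.
mongos = MongoClient("mongodb://mongos.example.net:27017")
shard = MongoClient("mongodb://shard1-primary.example.net:27018")

ns = "foo.users"
db_name, coll_name = ns.split(".", 1)

# UUID according to the cluster metadata (config.collections).
meta = mongos["config"]["collections"].find_one({"_id": ns})
meta_uuid = meta.get("uuid") if meta else None

# UUID according to the shard's local catalog (listCollections).
local = next(iter(shard[db_name].list_collections(filter={"name": coll_name})), None)
local_uuid = local["info"].get("uuid") if local else None

print("metadata UUID:   ", meta_uuid)
print("shard-local UUID:", local_uuid)
print("match:", meta_uuid == local_uuid)
```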

slide-14
SLIDE 14

Shard Key - Selection

  • Profiling
  • Identify shard key candidates
  • Pick a shard key
  • Challenges

www.objectrocket.com

14

slide-15
SLIDE 15

Sharding

[Diagram: database <foo>, collection <foo>, spread across shards s1 … sN as chunks]

Shards are physical partitions. Chunks are logical partitions.

slide-16
SLIDE 16

What is a Chunk?

The mission of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster.

  • Maximum size is defined in config.settings

○ Default 64MB

  • Before 3.4.11: hardcoded maximum document count of 250,000
  • Version 3.4.11 and higher: maximum document count of 1.3 × (configured chunk size ÷ average document size)
  • Chunk map is stored in config.chunks

○ Continuous range from MinKey to MaxKey

  • Chunk map is cached at both the mongos and mongod

○ Query Routing ○ Sharding Filter

  • Chunks distributed by the Balancer

○ Using moveChunk ○ Up to maxSize
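
The chunk map can be inspected directly through a mongos by reading config.chunks. A minimal pymongo sketch (namespace and host are placeholders; on newer server versions chunks are keyed by collection UUID rather than by ns):

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"  # placeholder namespace

# The chunk map is a continuous range from MinKey to MaxKey, ordered by min bound.
for chunk in mongos["config"]["chunks"].find({"ns": ns}).sort("min", 1):
    print(chunk["shard"], chunk["min"], "->", chunk["max"])

# Chunk counts per shard: the figure the balancer actually compares.
pipeline = [
    {"$match": {"ns": ns}},
    {"$group": {"_id": "$shard", "chunks": {"$sum": 1}}},
]
for row in mongos["config"]["chunks"].aggregate(pipeline):
    print(row["_id"], row["chunks"])
```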

slide-17
SLIDE 17

Shard Key Selection

www.objectrocket.com

17

Profiling

Helps identify your workload
Requires level 2 – db.setProfilingLevel(2)
May need to increase the profiler size
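
A rough sketch of turning the profiler on and enlarging system.profile with pymongo. The profiler is per mongod, so in a sharded cluster this is typically run against each shard member rather than through a mongos; the host name and capped-collection size below are placeholders, not the presenters' values.

```python
from pymongo import MongoClient

# Connect directly to a shard member, not the mongos (placeholder host).
shard = MongoClient("mongodb://shard1-primary.example.net:27018")
db = shard["foo"]

# Profiling must be off while the capped system.profile collection is resized.
db.command("profile", 0)
db.drop_collection("system.profile")
db.create_collection("system.profile", capped=True, size=512 * 1024 * 1024)

# Level 2 captures all operations (reads and writes).
db.command("profile", 2)
print(db.command("profile", -1))  # report the current profiling status
```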

slide-18
SLIDE 18

Shard Key Selection

www.objectrocket.com

18

Profiling → Candidates

Export statement types with frequency
Export statement patterns with frequency
Produces a list of shard key candidates
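
One way to extract statement patterns from the profiler is to count "query shapes": for each profiled operation, record the namespace, the operation type, and the set of fields it filters on. This is our sketch of the idea, assuming the 3.6+ profiler format where the statement lives under command (older versions used query); fields that show up in most operations are natural shard key candidates.

```python
from collections import Counter
from pymongo import MongoClient

shard = MongoClient("mongodb://shard1-primary.example.net:27018")
profile = shard["foo"]["system.profile"]

patterns = Counter()
for op in profile.find({"ns": "foo.users"}):
    # 3.6+ stores the statement under "command"; older versions used "query".
    cmd = op.get("command") or op.get("query") or {}
    flt = cmd.get("filter") or cmd.get("q") or {}
    shape = tuple(sorted(flt.keys()))
    patterns[(op.get("op"), shape)] += 1

# Most frequent statement patterns: the starting list of shard key candidates.
for (op_type, fields), count in patterns.most_common(10):
    print(count, op_type, fields)
```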

slide-19
SLIDE 19

Shard Key Selection

www.objectrocket.com

19

Profiling → Candidates → Built-in Constraints

Shard key and value are immutable
Must not contain NULLs
Update and findAndModify operations must contain the shard key
Unique constraints must be prefixed by the shard key
A shard key cannot contain special index types (i.e. text)
Potentially reduces the list of candidates

slide-20
SLIDE 20

Shard Key Selection

www.objectrocket.com

20

Profiling → Candidates → Built-in Constraints → Schema Constraints

Cardinality
Monotonically increasing keys
Data hotspots
Operational hotspots
Targeted vs. scatter-gather operations

slide-21
SLIDE 21

Shard Key Selection

www.objectrocket.com

21

Profiling → Candidates → Built-in Constraints → Schema Constraints → Future

Poor cardinality
Growth and data hotspots
Data pruning & TTL indexes
Schema changes
Try to simulate the dataset at 3, 6, and 12 months

slide-22
SLIDE 22

Shard key - Operations

  • Apply a shard key
  • Revert a shard key

www.objectrocket.com

22

slide-23
SLIDE 23

Apply a shard key

www.objectrocket.com

23

1) Create the associated index
2) Make sure the balancer is stopped: sh.stopBalancer(), sh.getBalancerState()
3) Apply the shard key: sh.shardCollection("foo.col", {field1:1, ..., fieldN:1})
4) Allow a burn period
5) Start the balancer
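
The same sequence, sketched with pymongo admin commands; namespace, key, and host are illustrative, and the sh.stopBalancer()/sh.shardCollection() shell helpers wrap the balancerStop and shardCollection commands used here.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns, key = "foo.col", {"field1": 1}  # placeholder namespace and shard key

# 1. Create the index that backs the shard key.
mongos["foo"]["col"].create_index(list(key.items()))

# 2. Make sure the balancer is stopped.
mongos.admin.command("balancerStop")
print(mongos.admin.command("balancerStatus"))

# 3. Apply the shard key (sharding must be enabled on the database first).
mongos.admin.command("enableSharding", "foo")
mongos.admin.command("shardCollection", ns, key=key)

# 4. Allow a burn period, then 5. start the balancer again.
# mongos.admin.command("balancerStart")
```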

slide-24
SLIDE 24

Sharding

[Diagram: database <foo>, collection <foo>; sh.shardCollection("foo.foo", <key>), a burn period, then sh.startBalancer(); chunks spread across shards s1 … sN]

slide-25
SLIDE 25

Revert a shard key

www.objectrocket.com

25

Two categories:

  • Affects functionality (exceptions, inconsistent data,…)
  • Affects performance (operational hotspots…)

Dump/Restore

  • Requires downtime – writes and, in some cases, reads
  • Time-consuming operation
  • You may restore to a sharded or unsharded collection
  • Pre-creating the indexes is preferable
  • The same or a new cluster can be used
  • A streaming dump/restore is an option
  • In special cases, such as time-series data, it can be fast
slide-26
SLIDE 26

Revert a shard key

www.objectrocket.com

26

Dual writes

  • MongoDB-to-MongoDB connector or change streams
  • No downtime
  • Requires extra capacity
  • May increase latency
  • Same or new cluster can be used
  • Adds complexity

Alter the config database

  • Requires downtime – but minimal
  • Easy during burn period
  • Time-consuming if chunks are distributed
  • Has overhead during chunk moves
slide-27
SLIDE 27

Revert a shard key

www.objectrocket.com

27

Process:

1) Disable the balancer – sh.stopBalancer()
2) Move all chunks to the primary shard (skip during the burn period)
3) Stop one secondary from the config server replica set (for rollback)
4) Stop all mongos and all shards
5) On the config server replica set primary, execute:
   db.getSiblingDB('config').chunks.remove({ns: <collection name>})
   db.getSiblingDB('config').collections.remove({_id: <collection name>})
6) Start all mongos and shards
7) Start the secondary from the config server replica set

Rollback:

  • After step 6, stop all mongos and shards
  • Stop the running members of the config server ReplSet and wipe their data directory
  • Start all config server replset members
  • Start all mongos and shards
slide-28
SLIDE 28

Revert a shard key

www.objectrocket.com

28

An online option was requested in SERVER-4000 – it may be supported in 4.2.
Further reading – Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf

Special use cases:

Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})

  • Possible (and easier) if b's max and min (per a) are predefined
  • For example, {year:1, month:1} extended to {year:1, month:1, day:1}

Reduce the fields of a shard key ({a:1, b:1} to {a:1})

  • Possible (and easier) if all distinct "a" values are on the same shard
  • ...and if there aren't chunks with the same min value of "a" (otherwise it adds complexity)
slide-29
SLIDE 29

Revert a shard key

www.objectrocket.com

29

Always perform a dry run
The balancer/autosplit must be disabled
You must take downtime during the change

*There might be a more optimal code path, but the above one worked like a charm

slide-30
SLIDE 30

Chunk Splitting and Merging

  • Pre-splitting
  • Auto Splits
  • Manual Intervention

www.objectrocket.com

30

slide-31
SLIDE 31

Distribution Goal

[Diagram: database <foo>, 200G total, primary shard s1; goal is an even spread of ~25% (50G) per shard]

slide-32
SLIDE 32

Pre-Split – Hashed Keys

32

Shard keys using MongoDB's hashed index allow the use of numInitialChunks.

Hashing mechanism:
  jdoe@gmail.com (value) → 694ea0904ceaf766c6738166ed89bafb (MD5) → NumberLong("7588178963792066406") (64 bits of the MD5 as a 64-bit integer)

Estimation:
  Size  = Collection size (in MB) / 32
  Count = Number of documents / 125,000
  Limit = Number of shards * 8192
  numInitialChunks = Min(Max(Size, Count), Limit)

Worked example:
  1,600  = 51,200 / 32
  800    = 100,000,000 / 125,000
  32,768 = 4 * 8192
  1,600  = Min(Max(1600, 800), 32768)

Command:
  db.runCommand({ shardCollection: "foo.users", key: { "email": "hashed" }, numInitialChunks: 1600 });
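
The estimate is easy to script before running shardCollection. A small sketch of the formula from this slide (the 32 MB and 125,000 figures come straight from it):

```python
def num_initial_chunks(coll_size_mb, doc_count, shard_count):
    """Estimate numInitialChunks for a hashed shard key pre-split."""
    size = coll_size_mb / 32          # one chunk per ~32 MB of data
    count = doc_count / 125_000       # one chunk per ~125k documents
    limit = shard_count * 8192        # hard ceiling from the formula
    return int(min(max(size, count), limit))

# Worked example from the slide: 51,200 MB, 100M documents, 4 shards -> 1600.
print(num_initial_chunks(51_200, 100_000_000, 4))
```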

slide-33
SLIDE 33

Pre-Split – Deterministic

33

Use Case: A collection containing user profiles, with email as the unique key.

Prerequisites:

  1. Shard key analysis complete
  2. Understanding of access patterns
  3. Knowledge of the data
  4. Unique key constraint
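
For a deterministic pre-split on a key like email, the usual approach is to issue split commands at chosen boundary values before loading data, then spread the resulting chunks across the shards. A hedged sketch along those lines (the letter boundaries, namespace, and host are made up; it assumes the collection is already sharded on {"email": 1}):

```python
from string import ascii_lowercase
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"  # already sharded on {"email": 1}

# Split at each letter boundary: a, b, c, ... z.
for letter in ascii_lowercase:
    mongos.admin.command("split", ns, middle={"email": letter})

# Optionally distribute the new (still empty) chunks up front.
shards = [s["_id"] for s in mongos["config"]["shards"].find()]
for i, letter in enumerate(ascii_lowercase):
    mongos.admin.command(
        "moveChunk", ns, find={"email": letter}, to=shards[i % len(shards)]
    )
```
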
slide-34
SLIDE 34

Pre-Split – Deterministic

34

Split Prerequisites

Initial Chunk Splits

slide-35
SLIDE 35

Pre-Split – Deterministic

35

Split Prerequisites Balance

slide-36
SLIDE 36

Pre-Split – Deterministic

36

Split Prerequisites Balance Split

slide-37
SLIDE 37

Automatic Splitting

37

Controlling Auto-Split

  • sh.enableAutoSplit()
  • sh.disableAutoSplit()

Alternatively, the mongos:

  • The component responsible for tracking statistics
  • Bytes-written statistics
  • Multiple mongos servers for HA
slide-38
SLIDE 38

Sub-Optimal Distribution

[Diagram: database <foo>, 200G total, primary shard s1, chunks balanced across shards – yet data is skewed: 40% on one shard, 20% on the others]

slide-39
SLIDE 39

Maintenance – Splitting

39

Five Helpful Resources:

  • collStats
  • config.chunks
  • Profiler
  • Oplog
  • dataSize
slide-40
SLIDE 40

Maintenance – Splitting

40

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile
slide-41
SLIDE 41

Maintenance – Splitting

41

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile

Or:

slide-42
SLIDE 42

Maintenance – Splitting

42

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile

*with setProfilingLevel at 2, analyze both reads and writes

slide-43
SLIDE 43

Maintenance – Splitting

43

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile*

*with setProfilingLevel at 2, analyze both reads and writes
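
Putting a few of these resources together: config.chunks gives the ranges, and dataSize measures how much data each range actually holds, which is how oversized ranges that need manual splitting show up. A sketch under the assumption that dataSize can be issued through the mongos for each range (it can also be run against the owning shard directly); the namespace, shard key, and host are placeholders.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"
key_pattern = {"email": 1}        # the collection's shard key
limit_bytes = 64 * 1024 * 1024    # the configured chunk size

for chunk in mongos["config"]["chunks"].find({"ns": ns}).sort("min", 1):
    stats = mongos["foo"].command(
        "dataSize", ns,
        keyPattern=key_pattern, min=chunk["min"], max=chunk["max"],
        estimate=True,
    )
    if stats["size"] > limit_bytes:
        print("oversized:", chunk["shard"], chunk["min"], chunk["max"],
              int(stats["size"]), int(stats["numObjects"]))
```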

slide-44
SLIDE 44

Sub-Optimal Distribution

[Diagram: database <foo>, 200G total, primary shard s1, chunks balanced across shards – yet data is skewed: 40% on one shard, 20% on the others]

slide-45
SLIDE 45

Maintenance - Merging

45

Analyze

slide-46
SLIDE 46

Maintenance - Merging

46

Move Analyze

slide-47
SLIDE 47

Maintenance - Merging

47

Move Analyze Merge
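
The move/analyze/merge cycle boils down to two admin commands: moveChunk to get adjacent ranges onto the same shard, and mergeChunks to collapse them into one chunk. A minimal sketch with made-up bounds and shard names; mergeChunks requires the ranges to be contiguous and to live on a single shard.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"

# Bounds of two adjacent chunks (illustrative values).
lower = {"email": "a"}
middle = {"email": "b"}
upper = {"email": "c"}

# 1. Make sure both chunks live on the same shard.
mongos.admin.command("moveChunk", ns, find=middle, to="s1")

# 2. Merge the contiguous range [lower, upper) into a single chunk.
mongos.admin.command("mergeChunks", ns, bounds=[lower, upper])
```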

slide-48
SLIDE 48

Balancing

  • Balancer overview
  • Balancing with defaults
  • Create a better distribution
  • Create a better balancing

www.objectrocket.com

48

slide-49
SLIDE 49

Balancer

49

The balancer process is responsible for redistributing the chunks of every sharded collection evenly among the shards. It takes into account the number of chunks, not the amount of data.

Number of chunks   Migration threshold
Fewer than 20      2
20–79              4
80 and greater     8

Jumbo chunks: MongoDB cannot move a chunk if the number of documents in it is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection. Prior to 3.4.11 the maximum was 250,000 documents.
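
That threshold is straightforward to compute: take avgObjSize from collStats and multiply 1.3 × chunkSize ÷ avgObjSize. A small sketch (namespace and host are placeholders):

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")

avg_obj = mongos["foo"].command("collStats", "users")["avgObjSize"]
chunk_bytes = 64 * 1024 * 1024  # the configured chunk size

max_movable_docs = 1.3 * chunk_bytes / avg_obj
print("a chunk with more than %d documents cannot be moved" % max_movable_docs)
# e.g. with avgObjSize = 512 bytes: 1.3 * 64 MiB / 512 ≈ 170k documents
```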

slide-50
SLIDE 50

Balancer

50

Parallel migrations:

  • Before 3.4: one migration at a time
  • After 3.4: parallel migrations, as long as the source and destination shards aren't involved in another migration

Settings:

  • chunkSize: default is 64M – lives in config.settings
  • _waitForDelete: default is false – lives in config.settings
  • _secondaryThrottle: default is true (after 3.4, WiredTiger uses false) – lives in config.settings
  • activeWindow: default is 24h – lives in config.settings
  • maxSize: default is unlimited – lives in config.shards
  • disableBalancing: disables/enables balancing per collection
  • autoSplit: disables/enables splits
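
Most of these knobs are just documents in the config database, so they can be inspected or changed with plain updates through a mongos. A hedged sketch of a few of them, assuming the usual document layouts (_id "chunksize", "balancer", and "autosplit" in config.settings, and the per-collection noBalance flag in config.collections); the sh.* shell helpers wrap equivalent writes.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
settings = mongos["config"]["settings"]

# Chunk size (MB) lives in config.settings under _id "chunksize".
settings.update_one({"_id": "chunksize"}, {"$set": {"value": 64}}, upsert=True)

# Balancer window and migration behaviour live under _id "balancer".
settings.update_one(
    {"_id": "balancer"},
    {"$set": {
        "activeWindow": {"start": "01:00", "stop": "05:00"},
        "_waitForDelete": False,
        "_secondaryThrottle": False,
    }},
    upsert=True,
)

# Auto-split toggle lives under _id "autosplit" (3.4+).
settings.update_one({"_id": "autosplit"}, {"$set": {"enabled": True}}, upsert=True)

# Balancing can be disabled per collection via config.collections.noBalance.
mongos["config"]["collections"].update_one(
    {"_id": "foo.users"}, {"$set": {"noBalance": True}}
)
```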

slide-51
SLIDE 51

Balancing

51

The balancer only cares about the number of chunks per shard. [Diagram: best case vs. our case vs. our goal]

slide-52
SLIDE 52

Balancing

52

The "apple algorithm" we are going to introduce is simple. For a collection, it requires an ordered chunk map with these attributes: chunk size, chunk bounds (min, max), and the shard each chunk belongs to.

  1. Pick the first chunk (current)
  2. Merge current with the next chunk
  3. If the merged size is lower than a configured threshold, go to step 2
  4. Else, merge current with next and set next as current

Let's now see the implementation in Python.
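
The actual implementation is shown on the following slides and lives in the GitHub repo linked later in the deck. As a rough stand-in for the idea, here is a minimal sketch of one greedy pass in that spirit: walk the ordered chunk map and group adjacent chunks on the same shard while the combined size stays under a threshold. The chunk layout and threshold below are illustrative, not the presenters' code.

```python
def plan_merges(chunks, threshold_mb):
    """chunks: ordered list of dicts with 'min', 'max', 'size' (MB), 'shard'."""
    plans, current = [], None
    for chunk in chunks:
        if (current is not None
                and current["shard"] == chunk["shard"]
                and current["size"] + chunk["size"] <= threshold_mb):
            # Keep extending the current merge range while under the threshold.
            current["max"] = chunk["max"]
            current["size"] += chunk["size"]
        else:
            if current is not None:
                plans.append(current)
            current = dict(chunk)  # start a new merge range
    if current is not None:
        plans.append(current)
    return plans

# Toy chunk map: bounds are simplified to integers for readability.
chunk_map = [
    {"min": 0, "max": 10, "size": 5, "shard": "s1"},
    {"min": 10, "max": 20, "size": 12, "shard": "s1"},
    {"min": 20, "max": 30, "size": 40, "shard": "s1"},
    {"min": 30, "max": 40, "size": 8, "shard": "s2"},
]
for plan in plan_merges(chunk_map, threshold_mb=32):
    print(plan)
```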

slide-53
SLIDE 53

Balancing - Variables

53

slide-54
SLIDE 54

Balancing – Basic functions

54

slide-55
SLIDE 55

Balancing – Main function

55

slide-56
SLIDE 56

Balancing – Helper functions

56

slide-57
SLIDE 57

Balancing - Output

57

slide-58
SLIDE 58

Balancing

58

Can the algorithm do better? Can we improve the balancing post running the script?

slide-59
SLIDE 59

Balancing

59

Can the algorithm do better? Can we improve the balancing after running the script? Making the bounds stricter and adding more parameters would improve it.

  • Or: chunk buckets may be the answer.

The script produces chunks between (chunksize/2) and (chunksize). It improves balancing, but it may not achieve a perfect distribution. The idea is to categorize the chunks into buckets between (chunksize/2) and (chunksize), with each shard holding an equal number of chunks from each bucket.

slide-60
SLIDE 60

Balancing - Buckets

60

For example, chunksize=64 we can create the following buckets:

  • Bucket1 for sizes between 32 and 36 MiB
  • Bucket2 for sizes between 36 and 40 MiB
  • Bucket3 for sizes between 40 and 44 MiB
  • Bucket4 for sizes between 44 and 48 MiB
  • Bucket5 for sizes between 48 and 52 MiB
  • Bucket6 for sizes between 52 and 56 MiB
  • Bucket7 for sizes between 56 and 60 MiB
  • Bucket8 for sizes between 60 and 64 MiB

More buckets mean more accuracy, but may cause more chunk moves. The diversity of chunk sizes plays a major role.
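
A small sketch of the bucket idea, ours rather than the presenters' script: assign each chunk to a 4 MiB-wide bucket and compare, per bucket, how many chunks each shard holds against an even target. The chunk data below is illustrative.

```python
from collections import defaultdict

def bucket_of(size_mib, lo=32, width=4):
    """Bucket index for a chunk size between 32 and 64 MiB."""
    return int((size_mib - lo) // width)

chunks = [
    {"shard": "s1", "size": 33}, {"shard": "s1", "size": 61},
    {"shard": "s2", "size": 35}, {"shard": "s2", "size": 45},
    {"shard": "s3", "size": 63}, {"shard": "s3", "size": 34},
]
shards = sorted({c["shard"] for c in chunks})

counts = defaultdict(lambda: defaultdict(int))  # bucket -> shard -> count
for c in chunks:
    counts[bucket_of(c["size"])][c["shard"]] += 1

# Each shard should hold roughly the same number of chunks from each bucket.
for bucket, per_shard in sorted(counts.items()):
    target = sum(per_shard.values()) / len(shards)
    print("bucket", bucket, dict(per_shard), "target per shard ~", round(target, 1))
```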

slide-61
SLIDE 61

Balancing - Buckets

61

slide-62
SLIDE 62

Balancing – Get the code

62

GitHub Repo - https://bit.ly/2M0LnxG

slide-63
SLIDE 63

Orphaned Documents

  • Definition
  • Issues
  • Cleanup

www.objectrocket.com

63

slide-64
SLIDE 64

Definition/Impact

64

Definition: Orphaned documents are documents on a shard that also exist in chunks on other shards.

How can they occur:

  • Failed migration
  • Failed cleanup (RangeDeleter)
  • Direct access to the shards

Impact:

  • Space
  • Performance
  • Application consistency
slide-65
SLIDE 65

Cleanup

65

cleanupOrphaned

  • Must run on every shard
  • Removes the Orphans automatically
  • No dry run / Poor reporting

Drain shard(s)

  • Expensive – storage/performance
  • Locate shards with orphans
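
Before 4.4, cleanupOrphaned is run against each shard's primary and walks the collection one range at a time; looping on the returned stoppedAtKey covers the whole namespace. A sketch of that loop (the shard host is a placeholder, and 4.4+ changes this command's behaviour):

```python
from pymongo import MongoClient

# Connect to the shard's primary directly, not the mongos (placeholder host).
shard = MongoClient("mongodb://shard1-primary.example.net:27018")
ns = "foo.users"

next_key = {}
while True:
    result = shard.admin.command("cleanupOrphaned", ns, startingFromKey=next_key)
    next_key = result.get("stoppedAtKey")
    if not next_key:
        break  # the whole shard key range has been scanned
print("orphan cleanup finished for", ns)
```
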
slide-66
SLIDE 66

Cleanup Cont.

66

There are ways to scan more intelligently:

  • Skip unsharded collections

db.collections.find({"dropped" : false},{_id:1})

  • Skip collections without migrations

db.changelog.distinct("ns",{"what":"moveChunk.start"})

  • Check first event - changelog is a capped collection
slide-67
SLIDE 67

Cleanup Cont.

67

An offline method to clean up orphans:

  • mongodump/mongorestore the shards with orphans, plus the config.chunks collection
  • Remove the documents in all ranges that belong to the shard(s)
  • The "leftovers" are the orphaned documents
  • It's a bit trickier with "hashed" keys:
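
For a range-based shard key, the check can be scripted directly: load the shard's own ranges from config.chunks, then flag any document on that shard whose shard key value falls outside every owned range (the same logic works against a restored dump). This sketch is ours and assumes a single, non-hashed key field whose values compare naturally in Python; hashed keys need the key value converted to its hashed form first (e.g. the shell's convertShardKeyToHashed()), which this sketch does not do.

```python
from pymongo import MongoClient
from bson.min_key import MinKey
from bson.max_key import MaxKey

mongos = MongoClient("mongodb://mongos.example.net:27017")
shard = MongoClient("mongodb://shard1-primary.example.net:27018")

ns, key_field, shard_name = "foo.users", "email", "s1"

def lower_ok(lo, value):
    return isinstance(lo, MinKey) or lo <= value

def upper_ok(hi, value):
    return isinstance(hi, MaxKey) or value < hi

# Ranges this shard is supposed to own, per the cluster metadata.
ranges = [
    (c["min"][key_field], c["max"][key_field])
    for c in mongos["config"]["chunks"].find({"ns": ns, "shard": shard_name})
]

orphans = 0
for doc in shard["foo"]["users"].find({}, {key_field: 1}):
    value = doc[key_field]
    if not any(lower_ok(lo, value) and upper_ok(hi, value) for lo, hi in ranges):
        orphans += 1  # lives on the shard but outside every range it owns
print("orphaned documents on", shard_name, ":", orphans)
```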

slide-68
SLIDE 68

Questions?

www.objectrocket.com

68

slide-69
SLIDE 69

Rate Our Session

www.objectrocket.com

69

slide-70
SLIDE 70

www.objectrocket.com

70

We’re Hiring! Looking to join a dynamic & innovative team?

https://www.objectrocket.com/careers/

  • Or email careers@objectrocket.com
slide-71
SLIDE 71

Thank you!

Address: 401 Congress Ave Suite 1950 Austin, TX 78701 Support: 1-800-961-4454 Sales: 1-888-440-3242 www.objectrocket.com

71