slide-1
SLIDE 1

Managing Data and Operation Distribution In MongoDB

Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket | linkedin.com/in/antonis/ | linkedin.com/in/jterpko/

1

slide-2
SLIDE 2

Introduction

www.objectrocket.com

2

Antonios Giannopoulos Jason Terpko

slide-3
SLIDE 3

Overview

  • Sharded Cluster
  • Shard Keys Selection
  • Shard Key Operations
  • Chunk Management
  • Data Distribution
  • Orphaned documents
  • Q&A

www.objectrocket.com

3

slide-4
SLIDE 4

Sharded Cluster

  • Cluster Metadata
  • Data Layer
  • Query Routing
  • Cluster Communication

www.objectrocket.com

4

slide-5
SLIDE 5

Cluster Metadata

slide-6
SLIDE 6

Data Layer

[Diagram: shards s1 … sN]

slide-7
SLIDE 7

Replication

Data redundancy relies on an idempotent log of operations.

slide-8
SLIDE 8

Query Routing

[Diagram: shards s1 … sN]

slide-9
SLIDE 9

Sharded Cluster

[Diagram: shards s1 … sN]

slide-10
SLIDE 10

Cluster Communication

How do independent components become a cluster and communicate?

  • Replica Set

○ Replica Set Monitor
○ Replica Set Configuration
○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
○ Misc: replSetName, keyFile, clusterRole

  • Mongos Configuration

○ configDB Parameter
○ Network Interface ASIO Shard Registry
○ Replica Set Monitor
○ Task Executor

  • Post Add Shard

○ Collection config.shards
○ Replica Set Monitor
○ Task Executor Pool
○ config.system.sessions

slide-11
SLIDE 11

Primary Shard

[Diagram: shards s1 … sN; database <foo> resides on its primary shard]

slide-12
SLIDE 12

Collection UUID

With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID. [Diagram: cluster metadata (config.collections) vs. data layer (mongod)]

slide-13
SLIDE 13

Collection UUID

With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID. [Diagram: cluster metadata (config.collections) vs. data layer (mongod)]

Important:

  • UUIDs for a namespace must match
  • Use 4.0+ tools for a sharded cluster restore
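
A quick way to verify the "UUIDs must match" rule, for example after a restore, is to compare the UUID recorded in the cluster metadata with the UUID the shard reports locally. The sketch below is ours, not from the deck; the connection strings are placeholders and it assumes the 3.6+ layout where config.collections carries a uuid field and listCollections returns info.uuid.

```python
from pymongo import MongoClient

# Placeholder hosts: a mongos and one shard's primary.
mongos = MongoClient("mongodb://mongos.example.net:27017")
shard = MongoClient("mongodb://shard1-primary.example.net:27018")

ns = "foo.users"
db_name, coll_name = ns.split(".", 1)

# UUID according to the cluster metadata (config.collections).
meta = mongos["config"]["collections"].find_one({"_id": ns})
meta_uuid = meta.get("uuid") if meta else None

# UUID according to the shard's local catalog (listCollections).
local = next(iter(shard[db_name].list_collections(filter={"name": coll_name})), None)
local_uuid = local["info"].get("uuid") if local else None

print("metadata UUID:   ", meta_uuid)
print("shard-local UUID:", local_uuid)
print("match:", meta_uuid == local_uuid)
```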

slide-14
SLIDE 14

Shard Key - Selection

  • Profiling
  • Identify shard key candidates
  • Pick a shard key
  • Challenges

www.objectrocket.com

14

slide-15
SLIDE 15

Sharding

[Diagram: database <foo>, collection <foo>, spread across shards s1 … sN as chunks]

Shards are physical partitions. Chunks are logical partitions.

slide-16
SLIDE 16

What is a Chunk?

The mission of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster.

  • Maximum size is defined in config.settings

○ Default 64MB

  • Before 3.4.11: hardcoded maximum document count of 250,000
  • Version 3.4.11 and higher: maximum document count of 1.3 × (configured chunk size ÷ average document size)
  • Chunk map is stored in config.chunks

○ Continuous range from MinKey to MaxKey

  • Chunk map is cached at both the mongos and mongod

○ Query Routing ○ Sharding Filter

  • Chunks distributed by the Balancer

○ Using moveChunk ○ Up to maxSize
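
The chunk map can be inspected directly through a mongos by reading config.chunks. A minimal pymongo sketch (namespace and host are placeholders; on newer server versions chunks are keyed by collection UUID rather than by ns):

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"  # placeholder namespace

# The chunk map is a continuous range from MinKey to MaxKey, ordered by min bound.
for chunk in mongos["config"]["chunks"].find({"ns": ns}).sort("min", 1):
    print(chunk["shard"], chunk["min"], "->", chunk["max"])

# Chunk counts per shard: the figure the balancer actually compares.
pipeline = [
    {"$match": {"ns": ns}},
    {"$group": {"_id": "$shard", "chunks": {"$sum": 1}}},
]
for row in mongos["config"]["chunks"].aggregate(pipeline):
    print(row["_id"], row["chunks"])
```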

slide-17
SLIDE 17

Shard Key Selection

www.objectrocket.com

17

Profiling

Helps identify your workload
Requires level 2 – db.setProfilingLevel(2)
May need to increase the profiler size
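
A rough sketch of turning the profiler on and enlarging system.profile with pymongo. The profiler is per mongod, so in a sharded cluster this is typically run against each shard member rather than through a mongos; the host name and capped-collection size below are placeholders, not the presenters' values.

```python
from pymongo import MongoClient

# Connect directly to a shard member, not the mongos (placeholder host).
shard = MongoClient("mongodb://shard1-primary.example.net:27018")
db = shard["foo"]

# Profiling must be off while the capped system.profile collection is resized.
db.command("profile", 0)
db.drop_collection("system.profile")
db.create_collection("system.profile", capped=True, size=512 * 1024 * 1024)

# Level 2 captures all operations (reads and writes).
db.command("profile", 2)
print(db.command("profile", -1))  # report the current profiling status
```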

slide-18
SLIDE 18

Shard Key Selection

www.objectrocket.com

18

Profiling → Candidates

Export statement types with frequency
Export statement patterns with frequency
Produces a list of shard key candidates
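
One way to extract statement patterns from the profiler is to count "query shapes": for each profiled operation, record the namespace, the operation type, and the set of fields it filters on. This is our sketch of the idea, assuming the 3.6+ profiler format where the statement lives under command (older versions used query); fields that show up in most operations are natural shard key candidates.

```python
from collections import Counter
from pymongo import MongoClient

shard = MongoClient("mongodb://shard1-primary.example.net:27018")
profile = shard["foo"]["system.profile"]

patterns = Counter()
for op in profile.find({"ns": "foo.users"}):
    # 3.6+ stores the statement under "command"; older versions used "query".
    cmd = op.get("command") or op.get("query") or {}
    flt = cmd.get("filter") or cmd.get("q") or {}
    shape = tuple(sorted(flt.keys()))
    patterns[(op.get("op"), shape)] += 1

# Most frequent statement patterns: the starting list of shard key candidates.
for (op_type, fields), count in patterns.most_common(10):
    print(count, op_type, fields)
```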

slide-19
SLIDE 19

Shard Key Selection

www.objectrocket.com

19

Profiling → Candidates → Built-in Constraints

Shard key and value are immutable
Must not contain NULLs
Update and findAndModify operations must contain the shard key
Unique constraints must be prefixed by the shard key
A shard key cannot contain special index types (i.e. text)
Potentially reduces the list of candidates

slide-20
SLIDE 20

Shard Key Selection

www.objectrocket.com

20

Profiling → Candidates → Built-in Constraints → Schema Constraints

Cardinality
Monotonically increasing keys
Data hotspots
Operational hotspots
Targeted vs. scatter-gather operations

slide-21
SLIDE 21

Shard Key Selection

www.objectrocket.com

21

Profiling → Candidates → Built-in Constraints → Schema Constraints → Future

Poor cardinality
Growth and data hotspots
Data pruning & TTL indexes
Schema changes
Try to simulate the dataset at 3, 6, and 12 months

slide-22
SLIDE 22

Shard key - Operations

  • Apply a shard key
  • Revert a shard key

www.objectrocket.com

22

slide-23
SLIDE 23

Apply a shard key

www.objectrocket.com

23

1) Create the associated index
2) Make sure the balancer is stopped: sh.stopBalancer(), sh.getBalancerState()
3) Apply the shard key: sh.shardCollection("foo.col", {field1:1, ..., fieldN:1})
4) Allow a burn period
5) Start the balancer
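
The same sequence, sketched with pymongo admin commands; namespace, key, and host are illustrative, and the sh.stopBalancer()/sh.shardCollection() shell helpers wrap the balancerStop and shardCollection commands used here.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns, key = "foo.col", {"field1": 1}  # placeholder namespace and shard key

# 1. Create the index that backs the shard key.
mongos["foo"]["col"].create_index(list(key.items()))

# 2. Make sure the balancer is stopped.
mongos.admin.command("balancerStop")
print(mongos.admin.command("balancerStatus"))

# 3. Apply the shard key (sharding must be enabled on the database first).
mongos.admin.command("enableSharding", "foo")
mongos.admin.command("shardCollection", ns, key=key)

# 4. Allow a burn period, then 5. start the balancer again.
# mongos.admin.command("balancerStart")
```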

slide-24
SLIDE 24

Sharding

[Diagram: database <foo>, collection <foo>; sh.shardCollection("foo.foo", <key>), a burn period, then sh.startBalancer(); chunks spread across shards s1 … sN]

slide-25
SLIDE 25

Revert a shard key

www.objectrocket.com

25

Two categories:

  • Affects functionality (exceptions, inconsistent data,…)
  • Affects performance (operational hotspots…)

Dump/Restore

  • Requires downtime – writes and, in some cases, reads
  • Time-consuming operation
  • You may restore to a sharded or unsharded collection
  • Pre-creating the indexes is preferable
  • The same or a new cluster can be used
  • A streaming dump/restore is an option
  • In special cases, such as time-series data, it can be fast
slide-26
SLIDE 26

Revert a shard key

www.objectrocket.com

26

Dual writes

  • MongoDB-to-MongoDB connector or change streams
  • No downtime
  • Requires extra capacity
  • May increase latency
  • Same or new cluster can be used
  • Adds complexity

Alter the config database

  • Requires downtime – but minimal
  • Easy during burn period
  • Time-consuming if chunks are distributed
  • Has overhead during chunk moves
slide-27
SLIDE 27

Revert a shard key

www.objectrocket.com

27

Process:

1) Disable the balancer – sh.stopBalancer()
2) Move all chunks to the primary shard (skip during the burn period)
3) Stop one secondary from the config server replica set (for rollback)
4) Stop all mongos and all shards
5) On the config server replica set primary, execute:
   db.getSiblingDB('config').chunks.remove({ns: <collection name>})
   db.getSiblingDB('config').collections.remove({_id: <collection name>})
6) Start all mongos and shards
7) Start the secondary from the config server replica set

Rollback:

  • After step 6, stop all mongos and shards
  • Stop the running members of the config server ReplSet and wipe their data directory
  • Start all config server replset members
  • Start all mongos and shards
slide-28
SLIDE 28

Revert a shard key

www.objectrocket.com

28

An online option was requested in SERVER-4000 – it may be supported in 4.2.
Further reading – Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf

Special use cases:

Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})

  • Possible (and easier) if b's max and min (per a) are predefined
  • For example, {year:1, month:1} extended to {year:1, month:1, day:1}

Reduce the fields of a shard key ({a:1, b:1} to {a:1})

  • Possible (and easier) if all distinct "a" values are on the same shard
  • ...and if there aren't chunks with the same min value of "a" (otherwise it adds complexity)
slide-29
SLIDE 29

Revert a shard key

www.objectrocket.com

29

Always perform a dry run
The balancer/autosplit must be disabled
You must take downtime during the change

*There might be a more optimal code path, but the above one worked like a charm

slide-30
SLIDE 30

Chunk Splitting and Merging

  • Pre-splitting
  • Auto Splits
  • Manual Intervention

www.objectrocket.com

30

slide-31
SLIDE 31

Distribution Goal

[Diagram: database <foo>, 200G total, primary shard s1; goal is an even spread of ~25% (50G) per shard]

slide-32
SLIDE 32

Pre-Split – Hashed Keys

32

Shard keys using MongoDB's hashed index allow the use of numInitialChunks.

Hashing mechanism:
  jdoe@gmail.com (value) → 694ea0904ceaf766c6738166ed89bafb (MD5) → NumberLong("7588178963792066406") (64 bits of the MD5 as a 64-bit integer)

Estimation:
  Size  = Collection size (in MB) / 32
  Count = Number of documents / 125,000
  Limit = Number of shards * 8192
  numInitialChunks = Min(Max(Size, Count), Limit)

Worked example:
  1,600  = 51,200 / 32
  800    = 100,000,000 / 125,000
  32,768 = 4 * 8192
  1,600  = Min(Max(1600, 800), 32768)

Command:
  db.runCommand({ shardCollection: "foo.users", key: { "email": "hashed" }, numInitialChunks: 1600 });
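
The estimate is easy to script before running shardCollection. A small sketch of the formula from this slide (the 32 MB and 125,000 figures come straight from it):

```python
def num_initial_chunks(coll_size_mb, doc_count, shard_count):
    """Estimate numInitialChunks for a hashed shard key pre-split."""
    size = coll_size_mb / 32          # one chunk per ~32 MB of data
    count = doc_count / 125_000       # one chunk per ~125k documents
    limit = shard_count * 8192        # hard ceiling from the formula
    return int(min(max(size, count), limit))

# Worked example from the slide: 51,200 MB, 100M documents, 4 shards -> 1600.
print(num_initial_chunks(51_200, 100_000_000, 4))
```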

slide-33
SLIDE 33

Pre-Split – Deterministic

33

Use Case: A collection containing user profiles, with email as the unique key.

Prerequisites:

  1. Shard key analysis complete
  2. Understanding of access patterns
  3. Knowledge of the data
  4. Unique key constraint
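
For a deterministic pre-split on a key like email, the usual approach is to issue split commands at chosen boundary values before loading data, then spread the resulting chunks across the shards. A hedged sketch along those lines (the letter boundaries, namespace, and host are made up; it assumes the collection is already sharded on {"email": 1}):

```python
from string import ascii_lowercase
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"  # already sharded on {"email": 1}

# Split at each letter boundary: a, b, c, ... z.
for letter in ascii_lowercase:
    mongos.admin.command("split", ns, middle={"email": letter})

# Optionally distribute the new (still empty) chunks up front.
shards = [s["_id"] for s in mongos["config"]["shards"].find()]
for i, letter in enumerate(ascii_lowercase):
    mongos.admin.command(
        "moveChunk", ns, find={"email": letter}, to=shards[i % len(shards)]
    )
```
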
slide-34
SLIDE 34

Pre-Split – Deterministic

34

Split Prerequisites

Initial Chunk Splits

slide-35
SLIDE 35

Pre-Split – Deterministic

35

Split Prerequisites Balance

slide-36
SLIDE 36

Pre-Split – Deterministic

36

Split Prerequisites Balance Split

slide-37
SLIDE 37

Automatic Splitting

37

Controlling Auto-Split

  • sh.enableAutoSplit()
  • sh.disableAutoSplit()

Alternatively, the mongos:

  • The component responsible for tracking statistics
  • Bytes-written statistics
  • Multiple mongos servers for HA
slide-38
SLIDE 38

Sub-Optimal Distribution

[Diagram: database <foo>, 200G total, primary shard s1, chunks balanced across shards – yet data is skewed: 40% on one shard, 20% on the others]

slide-39
SLIDE 39

Maintenance – Splitting

39

Five Helpful Resources:

  • collStats
  • config.chunks
  • Profiler
  • Oplog
  • dataSize
slide-40
SLIDE 40

Maintenance – Splitting

40

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile
slide-41
SLIDE 41

Maintenance – Splitting

41

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile

Or:

slide-42
SLIDE 42

Maintenance – Splitting

42

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile

*with setProfilingLevel at 2, analyze both reads and writes

slide-43
SLIDE 43

Maintenance – Splitting

43

Five Helpful Resources:

  • collStats
  • config.chunks
  • dataSize
  • oplog.rs
  • system.profile*

*with setProfilingLevel at 2, analyze both reads and writes
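
Putting a few of these resources together: config.chunks gives the ranges, and dataSize measures how much data each range actually holds, which is how oversized ranges that need manual splitting show up. A sketch under the assumption that dataSize can be issued through the mongos for each range (it can also be run against the owning shard directly); the namespace, shard key, and host are placeholders.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"
key_pattern = {"email": 1}        # the collection's shard key
limit_bytes = 64 * 1024 * 1024    # the configured chunk size

for chunk in mongos["config"]["chunks"].find({"ns": ns}).sort("min", 1):
    stats = mongos["foo"].command(
        "dataSize", ns,
        keyPattern=key_pattern, min=chunk["min"], max=chunk["max"],
        estimate=True,
    )
    if stats["size"] > limit_bytes:
        print("oversized:", chunk["shard"], chunk["min"], chunk["max"],
              int(stats["size"]), int(stats["numObjects"]))
```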

slide-44
SLIDE 44

Sub-Optimal Distribution

[Diagram: database <foo>, 200G total, primary shard s1, chunks balanced across shards – yet data is skewed: 40% on one shard, 20% on the others]

slide-45
SLIDE 45

Maintenance - Merging

45

Analyze

slide-46
SLIDE 46

Maintenance - Merging

46

Move Analyze

slide-47
SLIDE 47

Maintenance - Merging

47

Move Analyze Merge
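
The move/analyze/merge cycle boils down to two admin commands: moveChunk to get adjacent ranges onto the same shard, and mergeChunks to collapse them into one chunk. A minimal sketch with made-up bounds and shard names; mergeChunks requires the ranges to be contiguous and to live on a single shard.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
ns = "foo.users"

# Bounds of two adjacent chunks (illustrative values).
lower = {"email": "a"}
middle = {"email": "b"}
upper = {"email": "c"}

# 1. Make sure both chunks live on the same shard.
mongos.admin.command("moveChunk", ns, find=middle, to="s1")

# 2. Merge the contiguous range [lower, upper) into a single chunk.
mongos.admin.command("mergeChunks", ns, bounds=[lower, upper])
```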

slide-48
SLIDE 48

Balancing

  • Balancer overview
  • Balancing with defaults
  • Create a better distribution
  • Create a better balancing

www.objectrocket.com

48

slide-49
SLIDE 49

Balancer

49

The balancer process is responsible for redistributing the chunks of every sharded collection evenly among the shards. It takes into account the number of chunks, not the amount of data.

Number of chunks   Migration threshold
Fewer than 20      2
20–79              4
80 and greater     8

Jumbo chunks: MongoDB cannot move a chunk if the number of documents in it is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection. Prior to 3.4.11 the maximum was 250,000 documents.
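
That threshold is straightforward to compute: take avgObjSize from collStats and multiply 1.3 × chunkSize ÷ avgObjSize. A small sketch (namespace and host are placeholders):

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")

avg_obj = mongos["foo"].command("collStats", "users")["avgObjSize"]
chunk_bytes = 64 * 1024 * 1024  # the configured chunk size

max_movable_docs = 1.3 * chunk_bytes / avg_obj
print("a chunk with more than %d documents cannot be moved" % max_movable_docs)
# e.g. with avgObjSize = 512 bytes: 1.3 * 64 MiB / 512 ≈ 170k documents
```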

slide-50
SLIDE 50

Balancer

50

Parallel migrations:

  • Before 3.4: one migration at a time
  • After 3.4: parallel migrations, as long as the source and destination shards aren't involved in another migration

Settings:

  • chunkSize: default is 64M – lives in config.settings
  • _waitForDelete: default is false – lives in config.settings
  • _secondaryThrottle: default is true (after 3.4, WiredTiger uses false) – lives in config.settings
  • activeWindow: default is 24h – lives in config.settings
  • maxSize: default is unlimited – lives in config.shards
  • disableBalancing: disables/enables balancing per collection
  • autoSplit: disables/enables splits
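
Most of these knobs are just documents in the config database, so they can be inspected or changed with plain updates through a mongos. A hedged sketch of a few of them, assuming the usual document layouts (_id "chunksize", "balancer", and "autosplit" in config.settings, and the per-collection noBalance flag in config.collections); the sh.* shell helpers wrap equivalent writes.

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.net:27017")
settings = mongos["config"]["settings"]

# Chunk size (MB) lives in config.settings under _id "chunksize".
settings.update_one({"_id": "chunksize"}, {"$set": {"value": 64}}, upsert=True)

# Balancer window and migration behaviour live under _id "balancer".
settings.update_one(
    {"_id": "balancer"},
    {"$set": {
        "activeWindow": {"start": "01:00", "stop": "05:00"},
        "_waitForDelete": False,
        "_secondaryThrottle": False,
    }},
    upsert=True,
)

# Auto-split toggle lives under _id "autosplit" (3.4+).
settings.update_one({"_id": "autosplit"}, {"$set": {"enabled": True}}, upsert=True)

# Balancing can be disabled per collection via config.collections.noBalance.
mongos["config"]["collections"].update_one(
    {"_id": "foo.users"}, {"$set": {"noBalance": True}}
)
```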

slide-51
SLIDE 51

Balancing

51

The balancer only cares about the number of chunks per shard. [Diagram: best case vs. our case vs. our goal]

slide-52
SLIDE 52

Balancing

52

The "apple algorithm" we are going to introduce is simple. For a collection, it requires an ordered chunk map with these attributes: chunk size, chunk bounds (min, max), and the shard each chunk belongs to.

  1. Pick the first chunk (current)
  2. Merge current with the next chunk
  3. If the merged size is lower than a configured threshold, go to step 2
  4. Else, merge current with next and set next as current

Let's now see the implementation in Python.
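
The actual implementation is shown on the following slides and lives in the GitHub repo linked later in the deck. As a rough stand-in for the idea, here is a minimal sketch of one greedy pass in that spirit: walk the ordered chunk map and group adjacent chunks on the same shard while the combined size stays under a threshold. The chunk layout and threshold below are illustrative, not the presenters' code.

```python
def plan_merges(chunks, threshold_mb):
    """chunks: ordered list of dicts with 'min', 'max', 'size' (MB), 'shard'."""
    plans, current = [], None
    for chunk in chunks:
        if (current is not None
                and current["shard"] == chunk["shard"]
                and current["size"] + chunk["size"] <= threshold_mb):
            # Keep extending the current merge range while under the threshold.
            current["max"] = chunk["max"]
            current["size"] += chunk["size"]
        else:
            if current is not None:
                plans.append(current)
            current = dict(chunk)  # start a new merge range
    if current is not None:
        plans.append(current)
    return plans

# Toy chunk map: bounds are simplified to integers for readability.
chunk_map = [
    {"min": 0, "max": 10, "size": 5, "shard": "s1"},
    {"min": 10, "max": 20, "size": 12, "shard": "s1"},
    {"min": 20, "max": 30, "size": 40, "shard": "s1"},
    {"min": 30, "max": 40, "size": 8, "shard": "s2"},
]
for plan in plan_merges(chunk_map, threshold_mb=32):
    print(plan)
```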

slide-53
SLIDE 53

Balancing - Variables

53

slide-54
SLIDE 54

Balancing – Basic functions

54

slide-55
SLIDE 55

Balancing – Main function

55

slide-56
SLIDE 56

Balancing – Helper functions

56

slide-57
SLIDE 57

Balancing - Output

57

slide-58
SLIDE 58

Balancing

58

Can the algorithm do better? Can we improve the balancing post running the script?

slide-59
SLIDE 59

Balancing

59

Can the algorithm do better? Can we improve the balancing after running the script? Making the bounds stricter and adding more parameters would improve it.

  • Or: chunk buckets may be the answer.

The script produces chunks between (chunksize/2) and (chunksize). It improves balancing, but it may not achieve a perfect distribution. The idea is to categorize the chunks into buckets between (chunksize/2) and (chunksize), with each shard holding an equal number of chunks from each bucket.

slide-60
SLIDE 60

Balancing - Buckets

60

For example, chunksize=64 we can create the following buckets:

  • Bucket1 for sizes between 32 and 36 MiB
  • Bucket2 for sizes between 36 and 40 MiB
  • Bucket3 for sizes between 40 and 44 MiB
  • Bucket4 for sizes between 44 and 48 MiB
  • Bucket5 for sizes between 48 and 52 MiB
  • Bucket6 for sizes between 52 and 56 MiB
  • Bucket7 for sizes between 56 and 60 MiB
  • Bucket8 for sizes between 60 and 64 MiB

More buckets mean more accuracy, but may cause more chunk moves. The diversity of chunk sizes plays a major role.
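
A small sketch of the bucket idea, ours rather than the presenters' script: assign each chunk to a 4 MiB-wide bucket and compare, per bucket, how many chunks each shard holds against an even target. The chunk data below is illustrative.

```python
from collections import defaultdict

def bucket_of(size_mib, lo=32, width=4):
    """Bucket index for a chunk size between 32 and 64 MiB."""
    return int((size_mib - lo) // width)

chunks = [
    {"shard": "s1", "size": 33}, {"shard": "s1", "size": 61},
    {"shard": "s2", "size": 35}, {"shard": "s2", "size": 45},
    {"shard": "s3", "size": 63}, {"shard": "s3", "size": 34},
]
shards = sorted({c["shard"] for c in chunks})

counts = defaultdict(lambda: defaultdict(int))  # bucket -> shard -> count
for c in chunks:
    counts[bucket_of(c["size"])][c["shard"]] += 1

# Each shard should hold roughly the same number of chunks from each bucket.
for bucket, per_shard in sorted(counts.items()):
    target = sum(per_shard.values()) / len(shards)
    print("bucket", bucket, dict(per_shard), "target per shard ~", round(target, 1))
```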

slide-61
SLIDE 61

Balancing - Buckets

61

slide-62
SLIDE 62

Balancing – Get the code

62

GitHub Repo - https://bit.ly/2M0LnxG

slide-63
SLIDE 63

Orphaned Documents

  • Definition
  • Issues
  • Cleanup

www.objectrocket.com

63

slide-64
SLIDE 64

Definition/Impact

64

Definition: Orphaned documents are documents on a shard that also exist in chunks on other shards.

How can they occur:

  • Failed migration
  • Failed cleanup (RangeDeleter)
  • Direct access to the shards

Impact:

  • Space
  • Performance
  • Application consistency
slide-65
SLIDE 65

Cleanup

65

cleanupOrphaned

  • Must run on every shard
  • Removes the Orphans automatically
  • No dry run / Poor reporting

Drain shard(s)

  • Expensive – storage/performance
  • Locate shards with orphans
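
Before 4.4, cleanupOrphaned is run against each shard's primary and walks the collection one range at a time; looping on the returned stoppedAtKey covers the whole namespace. A sketch of that loop (the shard host is a placeholder, and 4.4+ changes this command's behaviour):

```python
from pymongo import MongoClient

# Connect to the shard's primary directly, not the mongos (placeholder host).
shard = MongoClient("mongodb://shard1-primary.example.net:27018")
ns = "foo.users"

next_key = {}
while True:
    result = shard.admin.command("cleanupOrphaned", ns, startingFromKey=next_key)
    next_key = result.get("stoppedAtKey")
    if not next_key:
        break  # the whole shard key range has been scanned
print("orphan cleanup finished for", ns)
```
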
slide-66
SLIDE 66

Cleanup Cont.

66

There are ways to scan more intelligently:

  • Skip unsharded collections

db.collections.find({"dropped" : false},{_id:1})

  • Skip collections without migrations

db.changelog.distinct("ns",{"what":"moveChunk.start"})

  • Check first event - changelog is a capped collection
slide-67
SLIDE 67

Cleanup Cont.

67

An offline method to clean up orphans:

  • mongodump/mongorestore the shards with orphans, plus the config.chunks collection
  • Remove the documents in all ranges that belong to the shard(s)
  • The "leftovers" are the orphaned documents
  • It's a bit trickier with "hashed" keys:
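
For a range-based shard key, the check can be scripted directly: load the shard's own ranges from config.chunks, then flag any document on that shard whose shard key value falls outside every owned range (the same logic works against a restored dump). This sketch is ours and assumes a single, non-hashed key field whose values compare naturally in Python; hashed keys need the key value converted to its hashed form first (e.g. the shell's convertShardKeyToHashed()), which this sketch does not do.

```python
from pymongo import MongoClient
from bson.min_key import MinKey
from bson.max_key import MaxKey

mongos = MongoClient("mongodb://mongos.example.net:27017")
shard = MongoClient("mongodb://shard1-primary.example.net:27018")

ns, key_field, shard_name = "foo.users", "email", "s1"

def lower_ok(lo, value):
    return isinstance(lo, MinKey) or lo <= value

def upper_ok(hi, value):
    return isinstance(hi, MaxKey) or value < hi

# Ranges this shard is supposed to own, per the cluster metadata.
ranges = [
    (c["min"][key_field], c["max"][key_field])
    for c in mongos["config"]["chunks"].find({"ns": ns, "shard": shard_name})
]

orphans = 0
for doc in shard["foo"]["users"].find({}, {key_field: 1}):
    value = doc[key_field]
    if not any(lower_ok(lo, value) and upper_ok(hi, value) for lo, hi in ranges):
        orphans += 1  # lives on the shard but outside every range it owns
print("orphaned documents on", shard_name, ":", orphans)
```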

slide-68
SLIDE 68

Questions?

www.objectrocket.com

68

slide-69
SLIDE 69

Rate Our Session

www.objectrocket.com

69

slide-70
SLIDE 70

www.objectrocket.com

70

We’re Hiring! Looking to join a dynamic & innovative team?

https://www.objectrocket.com/careers/

  • Or email careers@objectrocket.com
slide-71
SLIDE 71

Thank you!

Address: 401 Congress Ave Suite 1950 Austin, TX 78701 Support: 1-800-961-4454 Sales: 1-888-440-3242 www.objectrocket.com

71