Managing Data and Operation Distribution In MongoDB
Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket
linkedin.com/in/antonis/ | linkedin.com/in/jterpko/
www.objectrocket.com
Antonios Giannopoulos Jason Terpko
[Diagram: a sharded cluster of shards s1, s2, …, sN, each a replica set]
Data redundancy relies on an idempotent log of operations (the oplog).
How do independent components become a cluster and communicate?
○ Replica Set Monitor
○ Replica Set Configuration
○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
○ Misc: replSetName, keyFile, clusterRole

○ configDB Parameter
○ Network Interface ASIO Shard Registry
○ Replica Set Monitor
○ Task Executor

○ Collection config.shards
○ Replica Set Monitor
○ Task Executor Pool
○ config.system.sessions
[Diagram: shards s1, s2, …, sN hosting database <foo>]
With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID. The UUID is recorded both in the cluster metadata (config.collections) and in the data layer (mongod). Important: the two must match, for example after a sharded cluster restore.
[Diagram: database <foo>, collection <foo>, split into chunks across shards s1, s2, …, sN]
Shards are physical partitions. Chunks are logical partitions.
The mission of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster.
○ Default chunk size: 64MB
○ Chunks form a continuous range from MinKey to MaxKey
○ Chunk metadata is used for query routing and the sharding filter
○ Chunks move between shards using moveChunk, up to a shard's maxSize
Profiling
Helps identify your workload
Requires level 2: db.setProfilingLevel(2)
May need to increase the profiler size
Candidates
Export statement types with frequency
Export statement patterns with frequency
Produces a list of shard key candidates
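As a rough sketch of this step, assuming profiler entries shaped like those in system.profile (the field names and sample documents below are illustrative), query filters can be reduced to their field patterns and counted:

```python
from collections import Counter

def query_pattern(filter_doc):
    """Reduce a query filter to its shape: the sorted tuple of top-level fields."""
    return tuple(sorted(filter_doc))

def shard_key_candidates(profile_docs):
    """Count how often each filter shape appears across profiled operations."""
    patterns = Counter()
    for op in profile_docs:
        flt = op.get("query") or op.get("filter") or {}
        if flt:
            patterns[query_pattern(flt)] += 1
    return patterns.most_common()

sample = [
    {"op": "query", "filter": {"user_id": 1, "ts": {"$gt": 0}}},
    {"op": "query", "filter": {"user_id": 2}},
    {"op": "update", "query": {"user_id": 3}},
]
print(shard_key_candidates(sample))
# [(('user_id',), 2), (('ts', 'user_id'), 1)]
```

Fields that dominate the frequency list are natural shard key candidates, subject to the constraints on the following slides.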
Built-in Constraints
The shard key and its value are immutable
Must not contain NULLs
Update and findAndModify operations must contain the shard key
Unique constraints must be maintained by a prefix of the shard key
A shard key cannot contain special index types (i.e. text)
Potentially reduces the list of candidates
Schema Constraints
Cardinality
Monotonically increasing keys
Data hotspots
Operational hotspots
Targeted vs scatter-gather operations
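To see why a monotonically increasing key creates a data hotspot, a small sketch (the split points and key values are hypothetical): every new insert routes to the chunk ending at MaxKey.

```python
import bisect

def route(shard_key, split_points):
    """Route a key to a chunk index, given the sorted split points
    that divide the key range into len(split_points) + 1 chunks."""
    return bisect.bisect_right(split_points, shard_key)

splits = [100, 200, 300]           # 4 chunks: (<100), (100-200), (200-300), (>300)
inserts = range(301, 311)          # monotonically increasing keys
print({route(k, splits) for k in inserts})  # {3}: every insert hits the last chunk
```

With an increasing key such as ObjectId or a timestamp, one shard absorbs all inserts while the others stay idle.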
Future
Poor cardinality
Growth and data hotspots
Data pruning & TTL indexes
Schema changes
Try to simulate the dataset at 3, 6, and 12 months
Create the associated index
Make sure the balancer is stopped: sh.stopBalancer(); verify with sh.getBalancerState()
Apply the shard key: sh.shardCollection("foo.col", {field1: 1, ..., fieldN: 1})
Allow a burn period
Start the balancer
[Diagram: after sh.shardCollection("foo.foo", <key>) and sh.startBalancer(), chunks of collection <foo> spread across shards s1, s2, …, sN during the burn period]
Two categories:
Dump/Restore
Dual writes
Alter the config database
Process:
1) Disable the balancer: sh.stopBalancer()
2) Move all chunks to the primary shard (skip during the burn period)
3) Stop one secondary of the config server replica set (for rollback)
4) Stop all mongos and all shards
5) On the config server replica set primary execute:
   db.getSiblingDB('config').chunks.remove({ns: <collection name>})
   db.getSiblingDB('config').collections.remove({_id: <collection name>})
6) Start all mongos and shards
7) Start the stopped secondary of the config server replica set
Rollback:
An online option was requested in SERVER-4000 and may be supported in 4.2.
Further reading: Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
Special use cases:
Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})
Reduce the fields of a shard key ({a:1, b:1} to {a:1})
Always perform a dry run
The balancer/auto-split must be disabled
You must take downtime during the change
*There might be a more optimal code path but the above one worked like a charm
[Diagram: database <foo>, size 200G, primary shard s1*; shards s1, s2, s4 holding 50G (25%) each]
Shard keys using MongoDB's hashed index allow the use of numInitialChunks.

Hashing mechanism: the value (e.g. jdoe@gmail.com) is hashed with MD5 (694ea0904ceaf766c6738166ed89bafb), and 64 bits of the MD5 become a 64-bit integer (NumberLong("7588178963792066406")).

Estimation:
Size = collection size (in MB) / 32
Count = number of documents / 125,000
Limit = number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)

Example: 1,600 = 51,200 / 32; 800 = 100,000,000 / 125,000; 32,768 = 4 * 8192; 1,600 = Min(Max(1600, 800), 32768)
Command:
db.runCommand({ shardCollection: "foo.users", key: { email: "hashed" }, numInitialChunks: 1600 });
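The estimation above translates directly into a small helper (the function name is ours; the numbers reproduce the slide's 1,600-chunk example):

```python
def estimate_num_initial_chunks(size_mb, doc_count, num_shards):
    """Estimate numInitialChunks for a hashed shard key, following the
    deck's heuristic: min(max(size, count), limit)."""
    size = size_mb // 32            # one chunk per 32 MB of data
    count = doc_count // 125_000    # one chunk per 125k documents
    limit = num_shards * 8192      # cap: 8192 chunks per shard
    return min(max(size, count), limit)

print(estimate_num_initial_chunks(51_200, 100_000_000, 4))  # 1600
```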
Use Case: Collection containing user profiles with email as the unique key.

Prerequisites
Split
Initial Chunk Splits
Balance
Split
Controlling Auto-Split
Alternatively Mongos
[Diagram: database <foo>, size 200G, primary shard s1*; chunks balanced, but data skewed: s1 40%, s2 20%, s4 20%]
Five Helpful Resources:

*with setProfilingLevel at 2, analyze both reads and writes
[Diagram: database <foo>, size 200G, primary shard s1*; chunks balanced, data skewed: s1 40%, s2 20%, s4 20%]
Analyze
Move
Merge
The balancer process is responsible for redistributing the chunks of every sharded collection evenly among the shards. It takes into account the number of chunks, not the amount of data.
Number of Chunks | Migration Threshold
Fewer than 20    | 2
20-79            | 4
80 and greater   | 8
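The threshold table can be expressed as a helper (a sketch of the documented thresholds, not the server's internal code):

```python
def migration_threshold(num_chunks):
    """Chunk-count difference between the most and least loaded shards
    that triggers a balancing round, per the threshold table."""
    if num_chunks < 20:
        return 2
    if num_chunks < 80:
        return 4
    return 8
```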
Jumbo Chunks: MongoDB cannot move a chunk if the number of documents in the chunk is greater than 1.3 times the result of dividing the configured chunk size by the average document size in the collection. Prior to 3.4.11 the maximum was 250,000 documents.
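A sketch of the jumbo test described above (sizes in bytes; the function name is ours, not the server's):

```python
def is_jumbo(doc_count, chunk_size_bytes, avg_doc_size_bytes):
    """A chunk is unmovable (jumbo) when it holds more than
    1.3 * (configured chunk size / average document size) documents."""
    max_docs = 1.3 * (chunk_size_bytes / avg_doc_size_bytes)
    return doc_count > max_docs

# 64 MB chunks, 1 KB average documents -> limit is ~85,197 documents
print(is_jumbo(100_000, 64 * 1024**2, 1024))  # True
print(is_jumbo(80_000, 64 * 1024**2, 1024))   # False
```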
Parallel Migrations:
○ Before 3.4, one migration at a time
○ After 3.4, parallel migrations, as long as the source and destination aren't involved in another migration
Settings:
○ chunkSize: default is 64M; lives in config.settings
○ _waitForDelete: default is false; lives in config.settings
○ _secondaryThrottle: default is true (after 3.4, WiredTiger uses false); lives in config.settings
○ activeWindow: default is 24h; lives in config.settings
○ maxSize: default is unlimited; lives in config.shards
○ disableBalancing: disables/enables balancing per collection
○ autoSplit: disables/enables splits
The balancer only cares about the number of chunks per shard.
[Diagram: chunk-count balance vs. data-size balance: best case, our case, our goal]
The "apple algorithm" we are going to introduce is simple. For a collection, it requires an ordered chunk map with these attributes: chunk size, chunk bounds (min, max), and the shard each chunk belongs to.
1) Pick the first chunk (current)
2) Merge current with the next chunk
3) If the merged size is lower than a configured threshold, go to step 2
4) Else set the next chunk as current and go to step 2
Let's now see the implementation in Python.
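The deck's own Python implementation is not reproduced here; as a minimal stand-in, this sketches the greedy pass over an ordered list of chunk sizes (in MB), ignoring chunk bounds and shard placement:

```python
def plan_merges(chunk_sizes_mb, threshold_mb):
    """Greedily group consecutive chunks so each merged chunk
    stays below the threshold. Returns the merged chunk sizes."""
    merged = []
    current = 0
    for size in chunk_sizes_mb:
        if current and current + size >= threshold_mb:
            merged.append(current)   # close the current merged chunk
            current = size           # start a new one
        else:
            current += size          # keep merging into the current chunk
    if current:
        merged.append(current)
    return merged

print(plan_merges([10, 10, 10, 10, 10], 32))  # [30, 20]
```

In a real run, each merged group would be applied with the mergeChunks command using the min bound of the first chunk and the max bound of the last chunk in the group.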
Can the algorithm do better? Can we improve the balancing after running the script? Making the bounds stricter and adding more parameters will improve it.
The script produces chunks between chunksize/2 and chunksize. It improves balancing but may not achieve a perfect distribution. The idea is to categorize the chunks into buckets between chunksize/2 and chunksize, with each shard holding an equal number of chunks from each bucket.
For example, with chunksize = 64 we can create the following buckets:

More buckets mean more accuracy, but may cause more chunk moves. The diversity of the chunk sizes plays a major role.
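The bucket list itself was shown on the slide; as an illustration, assuming four equal-width buckets spanning chunksize/2 to chunksize (32-40, 40-48, 48-56, 56-64 MB here, hypothetical boundaries):

```python
def bucket_of(chunk_size_mb, chunksize_mb=64, n_buckets=4):
    """Assign a chunk to one of n_buckets equal-width buckets spanning
    [chunksize/2, chunksize). Returns the 0-based bucket index."""
    lo = chunksize_mb / 2
    width = (chunksize_mb - lo) / n_buckets
    if not lo <= chunk_size_mb < chunksize_mb:
        raise ValueError("chunk size outside the bucket range")
    return int((chunk_size_mb - lo) // width)

print(bucket_of(33))  # 0  (the 32-40 MB bucket)
print(bucket_of(57))  # 3  (the 56-64 MB bucket)
```

Balancing then means giving each shard roughly the same number of chunks from each bucket, rather than the same raw chunk count.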
Definition: Orphaned documents are documents on a shard that also exist in chunks on other shards.

How can they occur:
Impact:
cleanupOrphaned
Drain shard(s)
There are ways to scan more intelligently, using the config database:
db.collections.find({"dropped": false}, {_id: 1})
db.changelog.distinct("ns", {"what": "moveChunk.start"})
An offline method to clean up orphans: mongodump/mongorestore the shards with orphans along with the config.chunks collection. Remove the documents in all ranges belonging to the shard(s); the "leftovers" are the orphaned documents. It's a bit more tricky with "hashed" keys:
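A minimal sketch of the range check behind this method, assuming plain comparable shard-key values and owned_ranges built from config.chunks for this shard (for hashed keys, the comparison would be on the hashed value instead):

```python
def is_orphan(shard_key_value, owned_ranges):
    """A document is orphaned on a shard when its shard-key value falls
    outside every chunk range the shard owns (min inclusive, max exclusive)."""
    return not any(lo <= shard_key_value < hi for lo, hi in owned_ranges)

owned = [(0, 10), (20, 30)]   # chunk bounds this shard owns, from config.chunks
print(is_orphan(5, owned))    # False: covered by [0, 10)
print(is_orphan(15, owned))   # True: no owned chunk covers 15
```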
Address: 401 Congress Ave Suite 1950 Austin, TX 78701 Support: 1-800-961-4454 Sales: 1-888-440-3242 www.objectrocket.com