How does sharding work with replication?
Each shard is a logical collection of partitioned data. A shard can consist of a single server or a cluster of replicas. In production, one would typically use a replica set for each shard.

Where do unsharded collections go if sharding is enabled for a database?
In alpha 2, unsharded data goes to the "primary" for the database specified (query config.databases to see details). Future versions will parcel out unsharded collections to different shards (that is, a collection could be on any shard, but will be on only a single shard if unsharded).

When will data be on more than one shard?
MongoDB sharding is range based, so initially all the objects in a collection are placed in a single chunk. Only when there is more than one chunk can data be spread across multiple shards. Right now, the chunk size is 50 MB, so you need at least 50 MB of data for a migration to occur.

What happens if I try to update a document on a chunk that is being migrated?
The update goes through immediately on the old shard, and the change is then replicated to the new shard before ownership transfers.

What if a shard is down or slow and I do a query?
If a shard is down, the query returns an error. If a shard is responding slowly, mongos waits for it. You won't get partial results.

How do queries distribute across shards?
There are a few different cases to consider, depending on the query keys and the sort keys. Suppose there are three distinct attributes, X, Y, and Z, where X is the shard key.

A query that keys on X and sorts on X translates straightforwardly to a series of queries against successive shards in X-order.

A query that keys on X and sorts on Y executes in parallel on the appropriate shards and performs a merge sort, keyed on Y, of the documents found.

A query that keys on Y must run on all shards. If the query sorts by X, it is serialized over the shards in X-order; if it sorts by Z, it is parallelized over the shards and a merge sort keyed on Z is performed on the documents found.

Now that I sharded my collection, how do I <...> (e.g. drop it)?
Even if chunked, your data is still part of a collection, so all the collection commands apply.

If I don't shard on _id, how is it kept unique?
If you don't use _id as the shard key, it is your responsibility to keep _id unique. If you have duplicate _id values in your collection, bad things will happen (as mstearn says).

Why is all my data on one server?
MongoDB sharding breaks data into chunks. By default, these chunks are 200 MB. Sharding keeps chunks balanced across shards, which means you need many chunks to trigger balancing, typically 2 GB of data or so. db.printShardingStatus() tells you how many chunks you have; you typically need 10 chunks before balancing starts.
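
The cases described under "How do queries distribute across shards?" can be illustrated with a small simulation. This is only a sketch in plain Python, not MongoDB or mongos code: the SHARDS layout, the field names x, y, and z, and every function name below are hypothetical and chosen for illustration. It assumes each shard owns a contiguous range of the shard key x, as range-based sharding implies.

```python
import heapq
from operator import itemgetter

# Hypothetical layout: each shard owns a half-open range [low, high) of the
# shard key "x". The list is kept in X-order.
SHARDS = [
    {"range": (0, 100), "docs": [{"x": 5, "y": 9, "z": 3}, {"x": 40, "y": 2, "z": 6}]},
    {"range": (100, 200), "docs": [{"x": 120, "y": 4, "z": 1}, {"x": 150, "y": 7, "z": 5}]},
    {"range": (200, 300), "docs": [{"x": 210, "y": 1, "z": 2}, {"x": 250, "y": 8, "z": 4}]},
]


def target_shards(x_low, x_high):
    """Return the shards whose key range overlaps the queried range [x_low, x_high)."""
    return [s for s in SHARDS if s["range"][0] < x_high and x_low < s["range"][1]]


def query_x_sort_x(x_low, x_high):
    """Key on X, sort on X: visit the matching shards one after another in X-order."""
    results = []
    for shard in target_shards(x_low, x_high):
        hits = [d for d in shard["docs"] if x_low <= d["x"] < x_high]
        results.extend(sorted(hits, key=itemgetter("x")))
    return results


def query_x_sort_y(x_low, x_high):
    """Key on X, sort on Y: each matching shard sorts locally, then merge on Y."""
    per_shard = [
        sorted((d for d in shard["docs"] if x_low <= d["x"] < x_high), key=itemgetter("y"))
        for shard in target_shards(x_low, x_high)
    ]
    return list(heapq.merge(*per_shard, key=itemgetter("y")))


def query_y_sort_z(y_min):
    """Key on Y, sort on Z: every shard is consulted, then results are merged on Z."""
    per_shard = [
        sorted((d for d in shard["docs"] if d["y"] >= y_min), key=itemgetter("z"))
        for shard in SHARDS  # the shard key is not in the query, so no shard can be skipped
    ]
    return list(heapq.merge(*per_shard, key=itemgetter("z")))
```

Calling query_x_sort_y(0, 200) touches only the first two shards, while query_y_sort_z(4) has to consult all three. The per-shard sorts run sequentially here for simplicity; the parallel execution described above is not modeled.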
