Sharding Introduction

MongoDB supports an automated sharding/partitioning architecture, enabling horizontal scaling across multiple nodes. For applications that outgrow the resources of a single database server, MongoDB can convert to a sharded cluster, automatically managing failover and balancing of nodes, with few or no changes to the original application code.

This document explains MongoDB's auto-sharding approach to scalability in detail and provides an architectural overview of the various components that enable it.

Be sure to acquaint yourself with the current limitations.

MongoDB's Auto-Sharding

Sharding in a Nutshell

Sharding is the partitioning of data among multiple machines in an order-preserving manner. To take an example, let's imagine sharding a collection of users by their state of residence. In a simplistic view, if we designate three machines as our shard servers, the users might be divided up by machine as follows:

Machine 1 Machine 2 Machine 3
Alabama → Arizona Colorado → Florida Arkansas → California
Indiana → Kansas Idaho → Illinois Georgia → Hawaii
Maryland → Michigan Kentucky → Maine Minnesota → Missouri
Montana → Montana Nebraska → New Jersey Ohio → Pennsylvania
New Mexico → North Dakota Rhode Island → South Dakota Tennessee → Utah
  Vermont → West Virgina Wisconsin → Wyoming

Note that each machine stores multiple "chunks" of data, based on state of residence. MongoDB distributes these chunks evenly across all of the machines available.

The chunking mechanism would only kick in if the amount of data you were storing reached the threshold where sharding is advantageous.

Our application connects to the sharded cluster through a mongos process, which routes operations to the appropriate shard(s). In this way, the sharded MongoDB cluster looks like a single logical server to our application. If our users collection receives heavy writes, those writes are now distributed across three shard servers. Most queries continue to be efficient, as well, because they too are distributed. And since the documents are organized in an order-preserving manner, any operations specifying the state of residence will be routed only to those nodes containing that state.

Sharding is performed on a per-collection basis. Small collections need not be sharded. For instance, if we were building a service like Twitter, our collection of tweets would likely be several orders of magnitude larger than the next biggest collection. The size and throughput demands of such a collection would be prime for sharding, whereas smaller collections would still live on a single server. Non-sharded collections reside on just one shard.

Balancing

Balancing is necessary when the load on any one shard node grows out of proportion with the remaining nodes. In this situation, the data must be redistributed to equalize load across shards.

Failover

Failover is also quite important since proper system functioning requires that each logical shard be always online. In practice, this means that each shard consists of more than one machine in a configuration known as a replica set. A replica set is a set of n servers, typically two or three, each of which contains a replica of the entire data set for the given shard. One of the n servers in a replica set will always be primary. If the primary fails, the remaining replicas are capable of electing a new master.

Scaling Model

MongoDB's auto-sharding scaling model shares many similarities with Yahoo's PNUTS and Google's BigTable. Readers interested in detailed discussions of distributed databases using order-preserving partitioning are encouraged to look at the PNUTS and BigTable white papers.

Architectural Overview

A MongoDB shard cluster consists of two or more shards, one or more config servers, and any number of routing processes to which the application servers connect. Each of these components is described below in detail.

 

Shards

Each shard consists of one or more servers and stores data using mongod processes (mongod being the core MongoDB database process). In a production situation, each shard will consist of multiple servers to ensure availability and automated failover. The set of servers/mongod process within the shard comprise a replica set

In MongoDB, sharding is the tool for scaling a system, and replication is the tool for data safety, high high availability, and disaster recovery. The two work in tandem yet are are orthogonal concepts in the design.

For testing, you can use sharding with a single mongod instance per shard.  Production databases typically need redundancy, so they use replica sets.

Shard Keys

To partition a collection, we specify a shard key pattern.  This pattern is similar to the key pattern used to define an index; it names one or more fields to define the key upon which we distribute data.  Some example shard key patterns include the following:

{ state : 1 }
{ name : 1 }
{ _id : 1 }
{ lastname : 1, firstname : 1 }
{ tag : 1, timestamp : -1 }

MongoDB's sharding is order-preserving; adjacent data by shard key tend to be on the same server. The config database stores all the metadata indicating the location of data by range:

Chunks

A chunk is a contiguous range of data from a particular collection.  Chunks are described as a triple of collection, minKey, and maxKey. Thus, the shard key K of a given document assigns that document to the chunk where minKey <= K < maxKey.

Chunks grow to a maximum size, usually 64MB.  Once a chunk has reached that approximate size, the chunk splits into two new chunks.  When a particular shard has excess data, chunks will then migrate to other shards in the system.  The addition of a new shard will also influence the migration of chunks.

When choosing a shard key, the values must be of high enough cardinality (granular enough) that data can be broken into many chunks, and thus distribute-able. For instance, in the above example, where we are sharding on name, if a huge number of users had the same name, that could be problematic as all documents involving that name would be in a single undivided chunk. With names that typically does not happen and thus name is a reasonable choice as a shard key.

If it is possible that a single value within the shard key range might grow exceptionally large, it is best to use a compound shard key instead so that further discrimination of the values will be possible.

Config DB Processes

The config servers store the cluster's metadata, which includes basic information on each shard server and the chunks contained therein.

Chunk information is the main data stored by the config servers.  Each config server has a complete copy of all chunk information.  A two-phase commit is used to ensure the consistency of the configuration data among the config servers. Note that config server use their own replication model; they are not run in as a replica set.

If any of the config servers is down, the cluster's meta-data goes read only. However, even in such a failure state, the MongoDB cluster can still be read from and written to.

Routing Processes (mongos)

The mongos process can be thought of as a routing and coordination process that makes the various components of the cluster look like a single system.  When receiving client requests, the mongos process routes the request to the appropriate server(s) and merges any results to be sent back to the client.

mongos processes have no persistent state; rather, they pull their state from the config server on startup. Any changes that occur on the the config servers are propagated to each mongos process.

mongos processes can run on any server desired. They may be run on the shard servers themselves, but are lightweight enough to exist on each application server. There are no limits on the number of mongos processes that can be run simultaneously since these processes do not coordinate between one another.

Operation Types

Operations on a sharded system fall into one of two categories: global and targeted.

For targeted operations, mongos communicates with a very small number of shards -- often a single shard.  Such targeted operations are quite efficient.

Global operations involve the mongos process reaching out to all (or most) shards in the system.

The following table shows various operations and their type.  For the examples below, assume a shard key of { x : 1 }.

Operation Type
Comments
db.foo.find( { x : 300 } )
Targeted
Queries a single shard.
db.foo.find( { x : 300, age : 40 } ) Targeted Queries a single shard.
db.foo.find( { age : 40 } )
Global Queries all shards.
db.foo.find()
Global sequential
db.foo.find(...).count()
Variable Same as the corresponding find() operation
db.foo.find(...).sort( { age : 1 } )
Global parallel
db.foo.find(...).sort( { x : 1 } )
Global sequential
db.foo.count()
Global parallel
db.foo.insert( <object> )
Targeted  
db.foo.update( { x : 100 }, <object> )
db.foo.remove( { x : 100 } )
Targeted  
db.foo.update( { age : 40 }, <object> )
db.foo.remove( { age : 40 } )
Global
 
db.getLastError()
   
db.foo.ensureIndex(...)
Global  

Server Layout

Machines may be organized in a variety of fashions.  For instance, it's possible to have separate machines for each config server process, mongos process, and mongod process.  However, this can be overkill since the load is almost certainly low on the config servers.  Here is an example where some sharing of physical machines is used to lay out a cluster. The outer boxes are machines (or VMs) and the inner boxes are processes.

In the picture about a given connection to the database simply connects to a random mongos. mongos is generally very fast so perfect balancing of those connections is not essential. Additionally the implementation of a driver could be intelligent about balancing these connections (but most are not at the time of this writing).

Yet more configurations are imaginable, especially when it comes to mongos. Alternatively, as suggested earlier, the mongos processes can exists on each application server. There is some potential benefit to this configuration, as the communications between app server and mongos then can occur over the localhost interface.

Exactly three config server processes are used in almost all sharded mongo clusters. This provides sufficient data safety; more instances would increase coordination cost among the config servers.

Next : Configuration

Sharding becomes a bit easier to understand one you've set it up. It's even possible to set up a sharded cluster on a single machine. To try it for yourself, see the configuration doc page.


Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

PLEASE POST QUESTIONS IN THE FORUMS: http://groups.google.com/group/mongodb-user. Post tips and clarifications here.

blog comments powered by Disqus