Configuring Sharding

Introduction

This document describes the steps involved in setting up a basic sharding cluster. A sharding cluster requires, at minimum, three components:

1. Two or more shards.
2. At least one config server.
3. A mongos routing process.

For testing purposes, it's possible to start all the required processes on a single server, whereas in a production situation, a number of server configurations are possible.

Once the shards, config servers, and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as being part of the cluster. Once the cluster has been established, you can begin sharding individual collections.

This document is fairly detailed; if you're the kind of person who prefers a terse, code-only explanation, see the sample shard configuration. If you'd like a quick script to set up a test cluster on a single machine, we have a python sharding script that can do the trick.

1. Sharding Components

First, start the individual shards, config servers, and mongos processes.

Shard Servers

Run mongod on the shard servers.  Use the --shardsvr command line parameter to indicate this mongod is a shard. For auto failover support, replica sets will be required. See replica sets for more info.

Note that replica pairs will never be supported as shards.

To get started with a simple test, we recommend running a single mongod process per shard, as a test configuration doesn't demand automated failover.

Config Servers

Run mongod on the config server(s) with the --configsvr command line parameter.  If the config servers are running on a shared machine, be sure to provide a separate dbpath for the config data (--dbpath command line parameter).

mongos Router

Run mongos on the servers of your choice.  Specify the --configdb parameter to indicate location of the config database(s).

2. Configuring the Shard Cluster

Once the shard components are running, issue the sharding commands. You may want to automate or record your steps below in a .js file for replay in the shell when needed.

Start by connecting to one of the mongos processes, and then switch to the admin database before issuing any commands.

Keep in mind that once these commands are run, the configuration data will be persisted to the config servers. So, regardless of the number of mongos processes you've launched, you'll only need run these commands on one of those processes.

You can connect to the admin database via mongos like so:

./mongo <mongos-hostname>:<mongos-port>/admin
> db
admin
Adding shards

Each shard can consist of more than one server (see replica sets); however, for testing, only a single server with one mongod instance need be used.

You must explicitly add each shard to the cluster's configuration using the addshard command:

> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : ...}

Run this command once for each shard in the cluster.

If the individual shards consist of replica sets, they can be added by specifying replicaSetName/<serverhostname>[:port][,serverhostname2[:port],...], where at least one server in the replica set is given.

> db.runCommand( { addshard : "foo/<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : "foo"}
Optional Parameters

name
Each shard has a name, which can be specified using the name option. If no name is given, one will be assigned automatically.

maxSize
The addshard command accepts an optional maxSize parameter.  This parameter lets you tell the system a maximum amount of disk space in megabytes to use on the specified shard.  If unspecified, the system will use the entire disk.  maxSize is useful when you have machines with different disk capacities or when you want to prevent storage of too much data on a particular shard.

As an example:

> db.runCommand( { addshard : "sf103", maxSize:100000 } );
Listing shards

To see current set of configured shards, run the listshards command:

> db.runCommand( { listshards : 1 } );

This way, you can verify that all the shard have been committed to the system.

Removing a shard

Before a shard can be removed, we have to make sure that all the chunks and databases that once lived there were relocated to other shards. The 'removeshard' command takes care of "draining" the chunks out of a shard for us. To start the shard removal, you can  issue the command

> db.runCommand( { removeshard : "localhost:10000" } );
{ msg : "draining started succesfully" , state: "started" , shard :"localhost:10000" , ok : 1

That will put the shard in "draining mode". Its chunks are going to be moved away slowly over time, so not to cause any disturbance to a running system. The command will return right away but the draining task will continue on the background. If you issue the command again during it, you'll get a progress report instead:

> db.runCommand( { removeshard : "localhost:10000" } );
{ msg: "draining ongoing" ,  state: "ongoing" , remaining : { chunks :23 , dbs : 1 }, ok : 1 }

Whereas the chunks will be removed automatically from that shard, the databases hosted there will need to be moved manually. (This has to do with a current limitation that will go away eventually):

> db.runCommand( { moveprimary : "test", to : "localhost:10001" } );
{ "primary" : "localhost:10001", "ok" : 1 }

When the shard is empty, you could issue the 'removeshard' command again and that will clean up all metadata information:

> db.runCommand( { removeshard : "localhost:10000" } );
{ msg: "remove shard completed succesfully" , stage: "completed", host: "localhost:10000", ok : 1 }

After the 'removeshard' command reported being done with that shard, you can take that process down.

Enabling Sharding on a Database

Once you've added the various shards, you can enable sharding on a database. This is an important step; without it, all collection in the database will be stored on the same shard.

> db.runCommand( { enablesharding : "<dbname>" } );

Once enabled, mongos will place different collections for the database on different shards, with the caveat that each collection will still exists on one shard only. To enable real partitioning of data, we have to shard an individual collection.

3. Sharding a Collection

Use the shardcollection command to shard a collection. When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be create upfront (it speeds up the chunking process) otherwise an index will be automatically created for you.

 > db.runCommand( { shardcollection : "<namespace>",
                    key : <shardkeypatternobject> });

For example, let's assume we want to shard a GridFS chunks collection stored in the test database. We'd want to shard on the files_id key, so we'd invoke the shardcollection command like so:

 > db.runCommand( { shardcollection : "test.fs.chunks", key : { files_id : 1 } } )
{"ok" : 1}

One note: a sharded collection can have only one unique index, which must exist on the shard key. No other unique indexes can exist on the collection.

Of course, a unique shard key wouldn't make sense on the GridFS chunks collection. But it'd be practically a necessity for a users collection sharded on email address:

db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );

Relevant Examples and Docs

Examples

Docs


Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.

blog comments powered by Disqus