|
This document describes the steps involved in setting up a basic sharding cluster. A sharding cluster has three components: 1. Two or more shards. Shards are partitions of data. Each shard consists of one or more mongod processes which store the data for that shard. When multiple mongod's are in a single shard, they are each storing the same data – that is, they are replicating to each other. For testing purposes, it's possible to start all the required processes on a single server, whereas in a production situation, a number of server configurations are possible. Once the shards (mongod's), config servers, and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as being part of the cluster. Once the cluster has been established, you can begin sharding individual collections. This document is fairly detailed; for a terse, code-only explanation, see the sample shard configuration. If you'd like a quick script to set up a test cluster on a single machine, we have a python sharding script that can do the trick. Sharding ComponentsFirst, start the individual shards (mongod's), config servers, and mongos processes. Shard ServersAny mongod can become a shard server. For auto failover support, use a replica set as a shard. See replica sets for more info. Note that replica pairs are deprecated and not supported with sharding. To get started with a simple test, we recommend running a single mongod process per shard, as a test configuration doesn't demand automated failover. Config ServersRun mongod on the config server(s) with the --configsvr command line parameter. If you're only testing, you can use only one config server. For production, you're expected to run three of them. Note: Replicating data to each config server is managed by the router (mongos); they have a synchronous replication protocol optimized for three machines, if you were wondering why that number. Do not run any of the config servers with --replSet; that is not how replication works for the config servers. mongos RouterRun mongos on the servers of your choice. Specify the --configdb parameter to indicate location of the config database(s). Note: use dns names, not ip addresses, for the --configdb parameter's value. Otherwise moving config servers later is difficult. Configuring the Shard ClusterOnce the shard components are running, issue the sharding commands. You may want to automate or record your steps below in a .js file for replay in the shell when needed. Start by connecting to one of the mongos processes, and then switch to the admin database before issuing any commands. The mongos will route commands to the right machine(s) in the cluster and, if commands change metadata, the mongos will update that on the config servers. So, regardless of the number of mongos processes you've launched, you'll only need run these commands on one of those processes. You can connect to the admin database via mongos like so: ./mongo <mongos-hostname>:<mongos-port>/admin > db admin Adding shardsEach shard can consist of more than one server (see replica sets); however, for testing, only a single server with one mongod instance need be used. You must explicitly add each shard to the cluster's configuration using the addshard command: > db.runCommand( { addshard : "<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : ...}
Run this command once for each shard in the cluster. If the individual shards consist of replica sets, they can be added by specifying replicaSetName/<serverhostname>[:port][,serverhostname2[:port],...], where at least one server in the replica set is given. > db.runCommand( { addshard : "foo/<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : "foo"}
Any databases and collections that existed already in the mongod/replica set will be incorporated to the cluster. The databases will have as the "primary" host that mongod/replica set and the collections will not be sharded (but you can do so later by issuing a shardCollection command). Optional Parametersname maxSize As an example: > db.runCommand( { addshard : "sf103", maxSize:100000/*MB*/ } );
Listing shardsTo see current set of configured shards, run the listshards command: > db.runCommand( { listshards : 1 } );
This way, you can verify that all the shard have been committed to the system. Removing a shardSee the removeshard command. Enabling Sharding on a Database
Once you've added one or more shards, you can enable sharding on a database. Unless enabled, all data in the database will be stored on the same shard. After enabling you then need to run shardCollection on the relevant collections (i.e., the big ones). > db.runCommand( { enablesharding : "<dbname>" } );
Once enabled, mongos will place new collections on the primary shard for that database. Existing collections within the database will stay on the original shard. To enable partitioning of data, we have to shard an individual collection. Sharding a Collection
Use the shardcollection command to shard a collection. When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be created upfront (it speeds up the chunking process); otherwise, an index will be automatically created for you.
> db.runCommand( { shardcollection : "<namespace>",
key : <shardkeypatternobject> });
For example, let's assume we want to shard a GridFS chunks collection stored in the test database. We'd want to shard on the files_id key, so we'd invoke the shardcollection command like so: > db.runCommand( { shardcollection : "test.fs.chunks", key : { files_id : 1 } } )
{ "collectionsharded" : "mydb.fs.chunks", "ok" : 1 }
You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key. (note: prior to version 2.0 this worked only if the collection is empty). db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );
If the "unique: true" option is not used, the shard key does not have to be unique. db.runCommand( { shardcollection : "test.products" , key : { category : 1, _id : 1 } } );
You can shard on multiple fields if you are using a compound index. In the end, picking the right shard key for your needs is extremely important for successful sharding. Choosing a Shard Key. Relevant Examples and DocsExamples
Docs |

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.
blog comments powered by Disqus