See also Backups and Import Export Tools for more information on backups, particularly on the step of backing up each individual server. If you are on EC2 you should look at EC2 Backup & Restore for how to take advantage of EBS snapshots.
mongos->mongodump (small clusters only)
 | You do not need to turn off the balancer when running mongodump against mongos. |
If you have a small sharded cluster, you can use mongodump, connected to a mongos, to dump the entire cluster's data, or a subset of it.
This is easy, but only works if one machine can dump all the data in a reasonable period of time. The default dump, when you don't specify a database or collection to dump, will contain both the sharding config metadata (the data on the config servers) and the actual database content.
The mongodump --oplog cannot be used when dumping from a mongos. This option is only available when running directly against a replica member.
Backing up via shards
 | The balancer must be turned off during a backup of the cluster (when done directly from shards) |
In MongoDB, documents move from shard to shard as the system continually rebalances the data and
load. Thus care is needed to assure getting a good backup of a large cluster. The diagram at the bottom of this page illustrates a bit how data is distributed and some of the nuances of a cluster backup.
Snapshotting
The procedures outlined below do not result in a cluster-wide snapshot. Rather, each shard is backed up somewhat independently. Thus new documents written during the backup may or may not be captured in the image (data preceding the backup start time will be present). The unit of isolation (and the rest of the letters in "ACID") in MongoDB is a document, not the entire database. Documents can be large and quite rich, thus for many use cases this is often sufficient.
Large cluster backup (v2.0+)
- Turn off the balancer
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );
>
> > > while( db.locks.findOne({_id: "balancer"}).state ) { print("waiting..."); sleep(1000); }
- mongodump the config db from one of the config servers to back up the cluster's metadata (either through mongos or by direct connection to the config server). You only need to back up one config server, as they all have replicas of the same information.
- Backup each shard. Use the standard practices for backing up a replica set (e.g. fsync+lock then snapshot, or mongodump). Shards can be backed up one at a time or in parallel.
- Turn the balancer back on
>use config
>db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );
Large cluster backup (v1.8 and earlier)
- Turn off the balancer
> use config
> db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );
>
> > > while( db.locks.findOne({_id: "balancer"}).state ) { print("waiting..."); sleep(1000); }
v1.6: in this older release you must check the config.changelog collection rather than config.locks to see if migrates are still in-flight.
- stop one (and only one) config server (of your three config servers). This will make the configuration database read only. The db cluster is still fully readable and writable. Do not lock+fsync config servers; that can would block write operations on the cluster.
- Backup datafiles from the the stopped config server.
- Backup each shard. Use the standard practices to backup a replica sets (e.g. fsync+lock then snapshot, or mongodump). Shards can be backed up one at a time or in parallel.
- Restart the config server that was stopped.
- Turn the balancer back on
>use config
>db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );
Restoring the entire cluster
- Stop all processes. No server should be running mongod, mongos, or a config server mongod processes.
- Restore data files for each server in each shard and also for each config server. Normally each shard is comprised of a replica set. You must restore all the members of the replica set, or use the other standard approaches for restoring a replica set from backup.
- If shard IPs/hosts are different, you have to manually update config.shards to use the new servers:
-
- Start up the three config servers
- Load/restore the config database to each config server
- Start one mongos instance for the rest
- Update the config database collection named "shards" to reflect the new shards' ip addresses; the contents of config.shards are a replica set seed list, so use the replica set name style "replName/seedAddress1,..."
- Restart the mongos instances
- Restart the mongod instances (the replica set members)
- Connect to a mongos from the shell. Run "printShardingStatus()" and "show collections" to make sure all shards are correctly seen.
Restoring a single shard
Clusters are designed to be restored as a whole. Since the last backup, data (chunks) may have moved from shard to shard, making restoration of a single shard difficult.
That said, we can with sufficient work restore a single shard from backup.
- Restore the shard which was lost
- Chunks that were migrated away from this shard are fine. Those documents do not need to be deleted from this shard because they are automatically filtered out from queries by mongos . (See diagram below.)
- Chunks that were migrated to this shard since the last backup are now empty. Those documents must be recovered manually from the other shard backups. You can see a changelog of chunk migrations in the config.changelog collection.
For a "mature" cluster that has been running for a while, if desired we can mitigate the above scenario by:
- keeping the auto-balancer off for significant time periods. If the shards are well balanced this is not a problem. If you do this, occasionally check the balance between the shards via either MMS or the mongo shell.
- if there has been no migrations since the last backup, we can restore the lost shard in a straightforward fashion
- at set times, turn the balancer on let it run for a time to rebalance the cluster if any rebalancing is needed. afterwards, a new backup would be warranted.
The above mitigation is not necessarily recomended. With the balancer off the cluster could become unbalanced unless we are monitoring it. Keeping full cluster backups and using a good degree of replication (perhaps including slaveDelay replication) is often sufficient.
The following illustration shows a five shard cluster and what might happen on a single shard restoration from a backup.

Other Suggestions
Note that unsharded collections are always on shard a single shard (the first shard, or the one marked as the primary shard in the databases collection in the config db) and do not move. Thus you might want to back up that shard more frequently to have fresher bacukps of those small collections.
See Also
PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.
blog comments powered by Disqus