If you're running mongod with master-slave replication, there are certain scenarios where the slave will halt replication because it hasn't kept up with the master's oplog.

The first is when a slave is prevented from replicating for an extended period of time, perhaps because of a network partition or because the slave process itself was killed. The best solution in this case is to resync the slave. To do this, open the mongo shell and point it at the slave:

$ mongo <slave_host_and_port>

Then run the resync command:

> use admin
> db.runCommand({resync: 1})
This will force a full resync of all data, which will be very slow on a large database. The same effect can be achieved by stopping mongod on the slave, deleting all of the slave's datafiles, and restarting it.

Increasing the OpLog Size

Since the oplog is a capped collection, it's allocated to a fixed size. As more data is written, the collection wraps around and overwrites its oldest entries instead of growing beyond its preallocated size. If the slave can't keep up, the master overwrites operations the slave hasn't applied yet, and replication halts. The solution is to increase the size of the master's oplog.
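To see why a slave halts, here is a toy model of a capped log in JavaScript (the mongo shell's language). It is a simplified illustration, not MongoDB's implementation: CappedLog, append, and readAfter are hypothetical names, and integer sequence numbers stand in for the oplog's timestamps.

```javascript
// Toy model of a capped oplog: a fixed-capacity log that discards its
// oldest entry once full. A simplified illustration, not MongoDB's code.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity; // maximum number of entries kept
    this.entries = [];        // oldest entry first
    this.nextSeq = 1;         // sequence number assigned to the next write
  }

  // Append an operation; once over capacity, drop the oldest entry.
  append(op) {
    this.entries.push({ seq: this.nextSeq++, op });
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  // Sequence number of the oldest entry still in the log (0 if empty).
  firstSeq() {
    return this.entries.length ? this.entries[0].seq : 0;
  }

  // Return every entry after `lastApplied`, or null if the reader is stale,
  // i.e. an entry it still needs has already been overwritten.
  readAfter(lastApplied) {
    if (this.entries.length && lastApplied + 1 < this.firstSeq()) {
      return null; // gap: a real slave would need a full resync here
    }
    return this.entries.filter((e) => e.seq > lastApplied);
  }
}
```

For example, with a capacity of 3 and five appends, the log keeps entries 3 through 5. A reader that last applied entry 2 can continue, but one that last applied entry 1 gets null because entry 2 has been overwritten. That is the point at which a real slave halts; a larger oplog widens the window before it is reached.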
There are a couple of ways to do this, depending on how big your oplog will be and how much downtime you can tolerate. But first you need to figure out how big an oplog you need.

If the current oplog size is wrong, how do you figure out what's right? The goal is for the oplog to cover at least the time it takes to clone the database, so that no operations age out of it during the clone. The first step is to print the replication info. On the master node, run this command:

> db.printReplicationInfo();

You'll see output like this:

configured oplog size: 1048.576MB
log length start to end: 7200secs (2hrs)
oplog first event time: Wed Mar 03 2010 16:20:39 GMT-0500 (EST)
oplog last event time: Wed Mar 03 2010 18:20:39 GMT-0500 (EST)
now: Wed Mar 03 2010 18:40:34 GMT-0500 (EST)

This indicates that you're adding data to the database at a rate of about 524MB/hr (1048.576MB over 2 hours). If an initial clone takes 10 hours, then the oplog should be at least 5240MB, so something closer to 8GB would be a safe bet.

The standard way of changing the oplog size involves stopping the mongod master, deleting the local.* datafiles, and then restarting with the oplog size you need, measured in MB:

$ # Stop mongod (killall mongod, kill -2, or Ctrl-C), then:
$ rm /data/db/local.*
$ mongod --oplogSize=8038 --master

Once you've changed the oplog size, restart the slave with --autoresync:

$ mongod --slave --autoresync

This method of oplog creation might pose a problem if you need a large oplog (say, > 10GB), since the time it takes mongod to preallocate the oplog files may mean too much downtime. If this is the case, read on.

Manually Allocating OpLog Files

An alternative approach is to create the oplog files manually before shutting down mongod. Suppose you need a 20GB oplog; here's how you'd go about creating the files:

1. Create a temporary directory, /tmp/local:

$ mkdir -p /tmp/local

2. Create the oplog files. You can allocate them yourself with a shell loop:

cd /tmp/local
for i in {0..9}
do
    echo $i
    head -c 2146435072 /dev/zero > local.$i
done

Note that the datafiles aren't exactly 2GB (2146435072 bytes is 2047MB) because of MongoDB's maximum int size.
If you'd like MongoDB to preallocate them for you instead, you can run:

$ mongod --dbpath /tmp/local --port 27099 --master --oplogSize=20000

Set the port to something different from any other mongod running on the machine. Once this instance has finished allocating the oplog files (watch the log), shut it down. If you are allocating these files for a replica set, remove the local.ns file:

$ rm /tmp/local/local.ns

3. Shut down the mongod master (kill -2) and then replace the oplog files:

$ mv /data/db/local.* /safe/place
$ mv /tmp/local/* /data/db/

4. Restart the master with the new oplog size:

$ mongod --master --oplogSize=20000

5. Finally, resync the slave. This can be done by shutting down the slave, deleting all its datafiles, and restarting it.
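The sizing arithmetic from the printReplicationInfo() output, and the number of 2146435072-byte files the manual allocation loop creates, can be checked with a few lines of JavaScript. This is a minimal sketch: the helper names are hypothetical and not part of MongoDB, and the rate calculation assumes the oplog is already full, so that the reported log length spans the whole configured size.

```javascript
// Hypothetical helpers reproducing the oplog sizing arithmetic.
const MB = 1024 * 1024;

// Write rate in MB/hour. Assumes the oplog is full, so the
// "log length start to end" window covers the whole configured size.
function oplogRateMBPerHour(configuredSizeMB, logLengthSecs) {
  return configuredSizeMB / (logLengthSecs / 3600);
}

// Minimum oplog size (MB) that still holds every write made during a clone.
function minOplogSizeMB(rateMBPerHour, cloneHours) {
  return rateMBPerHour * cloneHours;
}

// Size of each manually created local.N file (2047MB, just under 2GB).
const FILE_BYTES = 2146435072;

// Number of local.N files needed to cover an oplog of the given size in MB.
function filesNeeded(oplogSizeMB) {
  return Math.ceil((oplogSizeMB * MB) / FILE_BYTES);
}
```

With the numbers above, oplogRateMBPerHour(1048.576, 7200) gives about 524MB/hr, minOplogSizeMB at that rate for a 10-hour clone gives about 5240MB, and filesNeeded(20000) gives 10, matching the local.0 through local.9 files created by the loop.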
