Resyncing a Very Stale Replica Set Member

Error RS102

MongoDB records write operations in an oplog. For replica sets, this data is stored in the collection local.oplog.rs. This is a capped collection: when it fills up, it wraps around and overwrites the oldest entries, round-robin ("RRD") style. It is therefore important that the oplog is large enough to buffer a good amount of writes while some members of the replica set are down. If too many writes occur in the meantime, the down nodes cannot catch up when they resume, and a full resync is required.

In v1.8+, you can run db.printReplicationInfo() on both the current primary and the overly stale member to see the status of each oplog. The output shows each oplog's first and last event times, so you can tell whether the two time ranges overlap. If the time ranges don't overlap, there is no way for the stale secondary to recover and catch up (except for a full resync).

There is also an MMS graph of the oplog time length.

> db.printReplicationInfo()
configured oplog size:   47.6837158203125MB
log length start to end: 132secs (0.04hrs)
oplog first event time:  Wed Apr 13 2011 02:58:08 GMT-0400
oplog last event time:   Wed Apr 13 2011 03:00:20 GMT-0400
now:                     Wed Apr 13 2011 14:09:08 GMT-0400
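
The overlap check described above can be sketched in plain JavaScript, the language of the mongo shell. This is a minimal sketch, not a MongoDB API: the function names canCatchUp and windowSeconds and the firstEvent/lastEvent field names are hypothetical, and the timestamps are copied by hand from db.printReplicationInfo() output. It assumes a stale member can catch up only if its last event is no older than the first event still held in the source's oplog. The sample output above is treated as the stale member's, and the primary's values are made up for illustration.

```javascript
// Hypothetical helpers, not part of MongoDB. Each "window" holds the
// first and last oplog event times as ISO 8601 strings.

// Length of an oplog window in seconds.
function windowSeconds(w) {
  return (Date.parse(w.lastEvent) - Date.parse(w.firstEvent)) / 1000;
}

// Assumption: the stale member can resume only if its newest entry is
// still present in the source's oplog, i.e. the two ranges overlap.
function canCatchUp(sourceWindow, staleWindow) {
  return Date.parse(staleWindow.lastEvent) >= Date.parse(sourceWindow.firstEvent);
}

// Times from the sample output above (GMT-0400), assumed to be the stale member.
const stale = {
  firstEvent: "2011-04-13T02:58:08-04:00",
  lastEvent: "2011-04-13T03:00:20-04:00",
};

// Made-up primary window for illustration only.
const primary = {
  firstEvent: "2011-04-13T13:30:00-04:00",
  lastEvent: "2011-04-13T14:09:00-04:00",
};

console.log("stale window (secs):", windowSeconds(stale));
console.log("can catch up:", canCatchUp(primary, stale));
```

With these values the stale window is 132 seconds, matching the "log length start to end" line, and the check reports that the member cannot catch up, because the primary's oldest entry is hours newer than the stale member's last event.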

Sizing the oplog

The --oplogSize command-line parameter sets the oplog size, in megabytes. A good rule of thumb is 5 to 10% of total disk space. On 64-bit builds, the default is large and similar to this percentage. You can check your existing oplog sizes from the mongo shell:

> use local
> db.oplog.rs.stats()
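
The 5 to 10% rule of thumb can be turned into a candidate --oplogSize range with a few lines of JavaScript. This is only a sketch: the function name oplogSizeRangeMB and the 500 GiB disk are hypothetical, and it assumes that --oplogSize is given in megabytes of 1024 * 1024 bytes.

```javascript
const BYTES_PER_MB = 1024 * 1024;

// Hypothetical helper: returns 5% and 10% of the disk, in whole megabytes.
// Integer percentages are used to avoid floating-point rounding surprises.
function oplogSizeRangeMB(totalDiskBytes) {
  return {
    minMB: Math.floor((totalDiskBytes * 5) / 100 / BYTES_PER_MB),
    maxMB: Math.floor((totalDiskBytes * 10) / 100 / BYTES_PER_MB),
  };
}

// Illustrative disk size only: 500 GiB.
const diskBytes = 500 * 1024 * 1024 * 1024;
const range = oplogSizeRangeMB(diskBytes);

console.log("--oplogSize between " + range.minMB + " and " + range.maxMB + " (MB)");
```

For the 500 GiB example this suggests a value between 25600 and 51200. Treat the result as a starting point; a write-heavy deployment or long expected downtimes may justify a larger oplog.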

What to do on an RS102 sync error

If one of your members has been offline and is now too far behind to catch up, you will need to resync. There are a number of ways to do this.

  • Perform a full resync. Stop the failed mongod, delete all data in the dbpath (including subdirectories), and restart it; the member will then resynchronize itself automatically. It is safer to back up the data first: if disk space is adequate, simply move it to a backup location on the same machine. Resyncing may take a long time if the database is huge or the network is slow. Even under ideal conditions, one terabyte of data takes more than two hours to transmit over gigabit Ethernet.

or

  • Copy data from another member: You can copy all the data files from another member of the set if you have a snapshot of that member's data files. This can be done in a number of ways. The simplest is to stop mongod on the source member, copy all its files, and then restart mongod on both nodes. The MongoDB fsync and lock feature is another way to achieve this if you are using EBS or a SAN. On a slow network, snapshotting all the data files from another (inactive) member to a gzipped tarball is a good solution. Similar strategies also work well with SANs and with services such as Amazon Elastic Block Store (EBS) snapshots.

or

  • Find a member with older data: This is only possible (and occurs automatically) in v1.8+. If another member of the replica set has a large enough oplog, or is itself far enough behind, that the stale member can sync from it, the stale member can bootstrap itself from that member.
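
When weighing a full resync against copying data files, it helps to estimate the raw transfer time. The following is a back-of-the-envelope sketch of the lower bound behind the gigabit Ethernet figure in the full resync option. The function name idealTransferHours is hypothetical, and the calculation assumes a terabyte of 10^12 bytes, a fully utilized link, and no protocol, disk, or index-building overhead, so a real resync will take longer.

```javascript
// Hypothetical helper: ideal time to push dataBytes over a link,
// ignoring every kind of overhead.
function idealTransferHours(dataBytes, linkBitsPerSecond) {
  const seconds = (dataBytes * 8) / linkBitsPerSecond;
  return seconds / 3600;
}

const oneTerabyte = 1e12;     // bytes (decimal terabyte, an assumption)
const gigabitEthernet = 1e9;  // bits per second

const hours = idealTransferHours(oneTerabyte, gigabitEthernet);
console.log(hours.toFixed(2) + " hours"); // prints "2.22 hours"
```

Substituting your own data size and link speed gives a quick floor on how long the member will be unavailable.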
