Durability and Repair

MongoDB (specifically, the mongod process) is normally run with journaling enabled, which makes it crash-safe.

Journaling is on by default in v2.0+ for 64-bit builds. Leave it enabled.

Journaling Enabled

If you are running with journaling, you do not need to run a repair to recover to a consistent state. When mongod starts and finds journal files, it replays them automatically to bring the data to a consistent state.

The --dur option was used before 1.8; the option is now --journal, and it is on by default in version 1.9.2+ on 64-bit platforms.

When using journaling, you may see the message:

**************
old lock file: mongod.lock.  probably means unclean shutdown,
but there are no journal files to recover.
this is likely human error or filesystem corruption.
found 23 dbs.
see: http://dochub.mongodb.org/core/repair for more information
*************

You may want to check:

  • Whether someone has moved the journal files
  • The integrity of your disk

Replication without Journaling

If you have a replica set, it is preferable to resync the failed node from scratch or from a backup rather than to run a repair.

Neither Replication nor Journaling

Recent Backup

If you have a recent backup and you are concerned about application data consistency, it makes sense to restore from that backup instead of running a repair.

Repair Command

When not using journaling (--nojournal), after a machine crash or kill -9 termination, run the repairDatabase command. This command will check all data for corruption, remove any corruption found, and compact data files a bit. Repair is analogous to running fsck for a file system.

When journaling is enabled, it should not be necessary to run repair. However, you can still use the repair command to compact a database.
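The guidance so far can be summarized as a small decision helper. This is only an illustrative sketch: recoveryAction, its input fields, and the returned labels are hypothetical names made up for this example and are not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): maps the state of a crashed node
// to the recovery approach suggested on this page.
function recoveryAction(node) {
    if (node.journaling) {
        // Journal files are replayed automatically on startup; no repair needed.
        return "replay-journal";
    }
    if (node.replicaSetMember) {
        // Prefer resyncing from scratch or from a backup over a repair.
        return "resync";
    }
    if (node.recentBackup) {
        // Restoring a backup is the safer choice for data consistency.
        return "restore-backup";
    }
    // No journaling, no replication, no backup: run a repair.
    return "repair";
}
```

The checks are ordered to match the sections above, so a journaled node never reaches the repair branch.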

From the command line:

mongod --repair

From the shell (if you go this route, you must run it for every database, including local):

> db.repairDatabase();
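Running the command for every database by hand is error-prone, so a loop in the shell may help. The following is a minimal sketch, assuming the mongo shell of this era, where db.getMongo().getDBNames() and db.getSiblingDB() are available; the function name repairAllDatabases is made up for this example.

```javascript
// Sketch: run repairDatabase on every database, including "local".
// Assumes `db` is a mongo shell database handle; repairAllDatabases is a
// hypothetical helper name, not a built-in command.
function repairAllDatabases(db) {
    var results = {};
    var names = db.getMongo().getDBNames();
    for (var i = 0; i < names.length; i++) {
        // getSiblingDB returns a handle to another database on the same
        // connection without changing the shell's current database.
        results[names[i]] = db.getSiblingDB(names[i]).repairDatabase();
    }
    return results;
}
```

In the shell, repairAllDatabases(db) returns an object keyed by database name, so a result whose ok field is not 1 is easy to spot. Each repair still inspects its whole database, so the loop can take a long time.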

During a repair operation, mongod must store temporary files to disk. By default, mongod creates temporary directories under the dbpath for this purpose. Alternatively, the --repairpath command line option can be used to specify a base directory for temporary repair files.

Note that repair is a slow operation which inspects the entire database.

After running with --repair, mongod will start up normally.

When running the repairDatabase command on a non-primary server (a replica set secondary), you will get an error stating that the server is not master. To run the repair, restart the server without the --replSet option so that it is in single-server mode, and run the repair. When you restart, use a different port so as not to confuse the other members. Then restart once more with the --replSet option. This may put the replica back in a consistent state, but it is highly recommended to check data validity by comparing a dump of the primary with a dump of the repaired replica. If you suspect that data has been corrupted, it is safer to resync the replica from scratch.

Because mongod rewrites all of the database files during the repair routine, if you do not run --repair under the same user account that mongod usually runs as, you will need to run chown on your database files to correct the permissions before starting mongod again.

mongod.lock

Do not remove the mongod.lock file. If mongod is unable to start, use one of the methods above to correct the situation.

Removing the lock file will allow the database to start when its data may be corrupt. In general, you should never force the database to start with possibly corrupt data. In an emergency situation, you may want to remove the lock file to pull whatever data you can off the server. If you have ever manually removed the lock file and started the server back up, you should not consider that server "healthy."

Checking Data Integrity

You can use the validate command to check whether the contents of a collection are valid.

For example, here we validate the users collection:

> db.users.validate();
{
 "ns" : "test.users",
 "result" : " validate
  details: 0x1243dbbdc ofs:740bdc
  firstExtent:0:178b00 ns:test.users
  lastExtent:0:178b00 ns:test.users
  # extents:1
  datasize?:44 nrecords?:1 lastExtentSize:8192
  padding:1
  first extent:
    loc:0:178b00 xnext:null xprev:null
    nsdiag:test.users
    size:8192 firstRecord:0:178bb0 lastRecord:0:178bb0
  1 objects found, nobj:1
  60 bytes data w/headers
  44 bytes data wout/headers
  deletedList: 0000000010000000000
  deleted: n: 1 size: 7956
  nIndexes:2
    test.users.$_id_ keys:1
    test.users.$username_1 keys:1 ",
 "ok" : 1,
 "valid" : true,
 "lastExtentSize" : 8192
}

This is a slow command, as it has to check every document in a collection.
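To check a whole database rather than a single collection, the same call can be looped over every collection. This is a minimal sketch, assuming the mongo shell helpers db.getCollectionNames() and db.getCollection(), and relying on the valid field shown in the output above; findInvalidCollections is a made-up name.

```javascript
// Sketch: validate every collection in a database and report the ones
// whose result is not marked valid. findInvalidCollections is a
// hypothetical helper name, not a built-in command.
function findInvalidCollections(db) {
    var invalid = [];
    var names = db.getCollectionNames();
    for (var i = 0; i < names.length; i++) {
        var res = db.getCollection(names[i]).validate();
        // The validate result carries a boolean "valid" field.
        if (!res.valid) {
            invalid.push(names[i]);
        }
    }
    return invalid;
}
```

An empty array from findInvalidCollections(db) means every collection reported valid. Because each validate call scans a full collection, the loop is at least as slow as the individual checks combined.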

If journaling is disabled

If the database exited uncleanly and you attempt to restart it, mongod will print:

**************
old lock file: /data/db/mongod.lock.  probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************

Then it will exit.
