Durability and Repair

If mongod exited uncleanly and you attempt to restart it, mongod will print:

**************
old lock file: /data/db/mongod.lock.  probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************

Then it will exit.

Please find your situation on this page and follow the directions.

Journaling Enabled

If you are running with journaling, you should not run a repair to recover to a consistent state. When mongod starts and finds journal files, it replays them automatically to bring the data to a consistent state.

The --dur option was used before 1.8; the option is now --journal, and it is on by default in version 1.9.2+ on 64-bit platforms.

When using journaling, you may see the message:

**************
old lock file: mongod.lock.  probably means unclean shutdown,
but there are no journal files to recover.
this is likely human error or filesystem corruption.
found 23 dbs.
see: http://dochub.mongodb.org/core/repair for more information
*************

You may want to check:

  • Whether someone moved the journal files
  • The integrity of your disk.
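
As a first check, the state of the data directory can be inspected by hand. The following is a minimal sketch, not an official tool: the function name check_dbpath is hypothetical, and it assumes that the journal lives in a journal subdirectory of the dbpath with files named j._<n>, and that a non-empty mongod.lock means the previous mongod did not shut down cleanly.

```shell
# Hypothetical helper: summarize the lock-file and journal state of a dbpath.
# Assumptions: journal files are <dbpath>/journal/j._<n>, and a non-empty
# mongod.lock indicates the previous mongod did not shut down cleanly.
check_dbpath() {
  dbpath="$1"

  if [ -s "$dbpath/mongod.lock" ]; then
    echo "lock: stale"
  else
    echo "lock: clean"
  fi

  # Report "present" as soon as one journal file is found.
  for f in "$dbpath"/journal/j._*; do
    if [ -e "$f" ]; then
      echo "journal: present"
      return 0
    fi
  done
  echo "journal: missing"
}
```

For example, check_dbpath /data/db reporting a stale lock together with a missing journal matches the situation in the message above.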

Replication without Journaling

If you have a replica set, it is preferable to re-sync the failed node from scratch or from a backup rather than to run a repair.

Neither Replication nor Journaling

Recent Backup

If you have a recent backup and you are concerned with application data consistency, it makes sense to restore from that backup instead of running repair.
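
The situations described so far can be summarized as a small decision helper. This is only a sketch of the guidance on this page: the function name recovery_action and its yes/no arguments are hypothetical, and the last branch falls through to the repair command covered in the next section.

```shell
# Hypothetical helper: map a deployment's situation to the recovery path
# suggested on this page. Each argument is "yes" or "no".
recovery_action() {
  journaling="$1"
  replicated="$2"
  recent_backup="$3"

  if [ "$journaling" = "yes" ]; then
    echo "restart mongod; the journal is replayed automatically"
  elif [ "$replicated" = "yes" ]; then
    echo "re-sync the failed node from scratch or from a backup"
  elif [ "$recent_backup" = "yes" ]; then
    echo "restore from the recent backup"
  else
    echo "run mongod --repair"
  fi
}
```

For example, recovery_action no no no selects the repair path.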

Repair Command

When not using journaling (--nojournal), after a machine crash or kill -9 termination, run the repairDatabase command. This command will check all data for corruption, remove any corruption found, and compact data files a bit. Repair is analogous to running fsck for a file system.

When journaling is enabled, it should not be necessary to run repair. However, you can still use the repair command to compact a database.

From the command line:

mongod --repair

From the shell (if you go this route, you have to repeat this for every database, including local):

> db.repairDatabase();
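
To avoid switching databases by hand, the loop can be driven from the command line. This is a minimal sketch that assumes the mongo shell is on the PATH and that the server is already running; the function name repair_all_dbs and the default address are illustrative, not part of MongoDB.

```shell
# Sketch: run repairDatabase on every database of a running mongod,
# including local. The function name and default address are illustrative.
repair_all_dbs() {
  address="${1:-localhost:27017}"

  # The JavaScript below runs inside the mongo shell.
  mongo "$address/admin" --quiet --eval '
    db.getMongo().getDBNames().forEach(function (name) {
      print("repairing " + name);
      printjson(db.getSiblingDB(name).repairDatabase());
    });
  '
}
```

Call it as repair_all_dbs localhost:27017. Each database is repaired in turn, so expect the total run time to grow with the amount of data.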

During a repair operation, mongod must store temporary files to disk. By default, mongod creates temporary directories under the dbpath for this purpose. Alternatively, the --repairpath command line option can be used to specify a base directory for temporary repair files.
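
A sketch of a command-line repair that keeps its temporary files on a separate volume follows. The function name repair_offline and the example paths are illustrative; it assumes mongod is stopped and on the PATH.

```shell
# Sketch: run an offline repair, optionally writing temporary repair files
# to a separate base directory. Function name and paths are illustrative.
repair_offline() {
  dbpath="$1"
  repairpath="$2"

  if [ -n "$repairpath" ]; then
    # Make sure the base directory for temporary repair files exists.
    mkdir -p "$repairpath" || return 1
    mongod --repair --dbpath "$dbpath" --repairpath "$repairpath"
  else
    # Without --repairpath, temporary directories go under the dbpath.
    mongod --repair --dbpath "$dbpath"
  fi
}
```

For example, repair_offline /data/db /mnt/scratch/repair places the temporary files under /mnt/scratch/repair, which helps when the volume holding the dbpath is short on free space.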

Note that repair is a slow operation which inspects the entire database.

After running with --repair, mongod will start up normally.

When running the repair on a slave server (replica set), you will get an error stating that the server is not master. To run the repair, restart the slave without the --replSet option so that the server is in single db mode, and run the repair. When you restart, make sure to use a different port, so as not to confuse the other members. Then restart one more time with the --replSet option on. This may put the slave back in a consistent state, but it is highly recommended to check the data validity by comparing a dump of the master and slave. If you suspect that data is corrupted, it is safer to rebuild the slave from scratch.
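
One way to read the procedure above is sketched here. It is an outline under stated assumptions rather than a definitive script: the function name repair_slave, the set name, the port, and the paths are all illustrative, and the slave's mongod is assumed to be stopped already.

```shell
# Sketch of the slave repair procedure. All names are illustrative.
# Assumes the slave's mongod has already been stopped.
repair_slave() {
  dbpath="$1"      # data directory of the stopped slave
  set_name="$2"    # replica set name normally passed to --replSet
  temp_port="$3"   # a port the other members do not connect to

  # Step 1: repair without --replSet, on a different port.
  mongod --repair --dbpath "$dbpath" --port "$temp_port" || return 1

  # Step 2: restart as a member of the set (this runs in the foreground).
  mongod --dbpath "$dbpath" --replSet "$set_name"
}
```

For example, repair_slave /data/db myset 37017. The dump comparison against the master is still a separate, manual step.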

mongod.lock

Do not remove the mongod.lock file. If mongod is unable to start, use one of the methods above to correct the situation.

Removing the lock file will allow the database to start when its data may be corrupt. In general, you should never force the database to start with possibly corrupt data. In an emergency situation, you may want to remove the lock file to pull whatever data you can off the server. If you have ever manually removed the lock file and started the server back up, you should not consider that server "healthy."

Checking Data Integrity

You can use the validate command to check whether the contents of a collection are valid.

For example, here we validate the users collection:

> db.users.validate();
{
 "ns" : "test.users",
 "result" : " validate
  details: 0x1243dbbdc ofs:740bdc
  firstExtent:0:178b00 ns:test.users
  lastExtent:0:178b00 ns:test.users
  # extents:1
  datasize?:44 nrecords?:1 lastExtentSize:8192
  padding:1
  first extent:
    loc:0:178b00 xnext:null xprev:null
    nsdiag:test.users
    size:8192 firstRecord:0:178bb0 lastRecord:0:178bb0
  1 objects found, nobj:1
  60 bytes data w/headers
  44 bytes data wout/headers
  deletedList: 0000000010000000000
  deleted: n: 1 size: 7956
  nIndexes:2
    test.users.$_id_ keys:1
    test.users.$username_1 keys:1 ",
 "ok" : 1,
 "valid" : true,
 "lastExtentSize" : 8192
}

This is a slow command, as it has to check every document in a collection.
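
To check a whole database, validate can be run on each collection in turn. This is a minimal sketch that assumes the mongo shell is on the PATH and relies on the valid field shown in the output above; the function name validate_all and the default database name are illustrative.

```shell
# Sketch: validate every collection in one database and print a one-line
# verdict per collection. The function name and default are illustrative.
validate_all() {
  database="${1:-test}"

  # The JavaScript below runs inside the mongo shell.
  mongo "$database" --quiet --eval '
    db.getCollectionNames().forEach(function (name) {
      var res = db.getCollection(name).validate();
      print(name + ": " + (res.valid ? "valid" : "INVALID"));
    });
  '
}
```

Call it as validate_all test. Because every document in every collection is checked, this is best run during a quiet period.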
