Understanding Disk Usage
You may notice that, for a given set of data, the MongoDB datafiles in /data/db are larger than the data set inserted into the database. There are several reasons for this.
local.* files and replication
The replication oplog is preallocated as a capped collection in the local database. The default allocation is approximately 5% of disk space (64-bit installations). If you would like a smaller oplog, set its size (in megabytes) with the --oplogSize command-line parameter.
Datafile Preallocation
Each datafile is preallocated to a particular size. (This is done to prevent file system fragmentation, among other reasons.) The first file for a database is <dbname>.0, then <dbname>.1, and so on. <dbname>.0 is 64MB, <dbname>.1 is 128MB, and each subsequent file doubles in size up to 2GB; once the files reach 2GB, each successive file is also 2GB. Thus, if the last datafile present is, say, 1GB, that file might be 90% empty if it was recently created.
Additionally, on Unix, mongod preallocates an additional datafile in the background and initializes it by prefilling it with zero bytes. This initialization can take up to a minute for larger datafiles (less on a fast disk subsystem). Prefilling in the background prevents significant delays the next time a new database file is needed. As soon as a datafile starts to be used, the next one is preallocated. On Windows, additional datafiles are not preallocated: NTFS can allocate large, zero-filled files relatively quickly, which makes preallocation unnecessary.
You can disable preallocation with the --noprealloc command-line parameter. This flag is useful for tests with small datasets where you drop the database after each test; it should not be used on production servers. For large databases (hundreds of GB or more), preallocation is of no significant consequence, as the unallocated space is small relative to the total.
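To make the preallocation overhead concrete, here is a rough sketch in plain JavaScript (runnable in node) that models the file-size sequence described above: 64MB, doubling up to 2GB, then 2GB per file, plus the one extra file preallocated in the background on Unix. The function names are hypothetical, "2GB" is treated as exactly 2048MB, and the model ignores indexes, record headers, padding, deleted space and the oplog, so treat its output as an approximation only.

```javascript
// Hypothetical helper: approximate size (MB) of the i-th datafile (<dbname>.i),
// following the doubling scheme described above: 64MB, 128MB, ... capped at 2GB.
function datafileSizeMB(i) {
  const MAX_MB = 2048; // "2GB", treated as exactly 2048MB for simplicity
  return Math.min(64 * Math.pow(2, i), MAX_MB);
}

// Estimate the datafiles needed to hold `dataMB` of data.
// Assumption: data fills each file completely before the next one is used
// (no fragmentation, headers, padding or indexes). If `preallocNext` is true,
// also count the empty file mongod preallocates in the background on Unix.
function estimateDatafiles(dataMB, preallocNext = true) {
  const sizes = [];
  let capacity = 0;
  // Allocate files until the data fits (always at least <dbname>.0).
  do {
    const size = datafileSizeMB(sizes.length);
    sizes.push(size);
    capacity += size;
  } while (capacity < dataMB);
  if (preallocNext) {
    sizes.push(datafileSizeMB(sizes.length)); // next file, still all zeroes
  }
  const totalMB = sizes.reduce((a, b) => a + b, 0);
  return { files: sizes.length, sizesMB: sizes, totalMB, unusedMB: totalMB - dataMB };
}
```

For instance, estimateDatafiles(1000) reports 6 files totalling 4032MB, so about 1GB of freshly inserted data can sit in roughly 4GB of datafiles. In this model only the partially filled last file and the one preallocated file are unused, so once files reach their 2GB maximum the overhead stays bounded at roughly 4GB, which is why it matters little for databases of hundreds of GB.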
On Linux systems, you can use hdparm to allocate a file and get an idea of how costly allocation might be (the size argument is in kilobytes, so this creates a 1GB file):
time hdparm --fallocate $((1024*1024)) testfile
Recovering Deleted Space
MongoDB maintains lists of deleted blocks within the datafiles when objects or collections are deleted. This space is reused by MongoDB but never freed to the operating system. To shrink the amount of physical space used by the datafiles themselves by reclaiming deleted blocks, you must rebuild the database using db.repairDatabase().
Rather than compacting an entire database, you can compact just a single collection by using db.runCommand({compact: 'collectionname'}). This does not shrink any datafiles, however; it only defragments deleted space so that larger objects might reuse it. The compact command never deletes or shrinks database files, and in general requires extra space to do its work. Thus, it is not a good option when you are running critically low on disk space.
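The behaviour described above, where deleted space is reused but never handed back to the operating system, can be illustrated with a deliberately simplified toy model. The hypothetical ToyDatafile class below is not MongoDB's actual record allocator: it uses a single first-fit list of deleted blocks and a compact() method that simply packs live records together. It only shows why the file stays at its high-water mark and why scattered holes may not fit a larger object until the free space is defragmented.

```javascript
// Toy model (hypothetical, not MongoDB's real allocator): one file holding
// records at byte offsets, plus a list of deleted (free) blocks.
class ToyDatafile {
  constructor() {
    this.fileSize = 0; // bytes "allocated from the OS"; never decreases
    this.records = []; // live records: { id, offset, size }
    this.deleted = []; // free blocks:  { offset, size }
  }

  insert(id, size) {
    // First fit: reuse a deleted block that is large enough.
    const i = this.deleted.findIndex((b) => b.size >= size);
    if (i !== -1) {
      const block = this.deleted[i];
      this.records.push({ id, offset: block.offset, size });
      // Keep any leftover space on the deleted list.
      if (block.size > size) {
        this.deleted[i] = { offset: block.offset + size, size: block.size - size };
      } else {
        this.deleted.splice(i, 1);
      }
      return;
    }
    // No suitable hole: grow the file.
    this.records.push({ id, offset: this.fileSize, size });
    this.fileSize += size;
  }

  remove(id) {
    const i = this.records.findIndex((r) => r.id === id);
    if (i === -1) return;
    const [rec] = this.records.splice(i, 1);
    // Space goes onto the deleted list; fileSize is NOT reduced.
    this.deleted.push({ offset: rec.offset, size: rec.size });
  }

  // Similar in spirit to compact: pack live records together so the free
  // space becomes one contiguous block, but leave fileSize unchanged.
  compact() {
    let offset = 0;
    this.records.sort((a, b) => a.offset - b.offset);
    for (const rec of this.records) {
      rec.offset = offset;
      offset += rec.size;
    }
    this.deleted = offset < this.fileSize ? [{ offset, size: this.fileSize - offset }] : [];
  }
}
```

For example, after inserting four 100-byte records, deleting two, and inserting a 50-byte record, fileSize stays at 400 and 150 bytes are free, split across two holes. A 150-byte insert at that point grows the file to 550, whereas calling compact() first lets it fit and fileSize remains 400. In neither case does the file get smaller; in the real system only rebuilding with db.repairDatabase() does that.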
Running out of disk space
If your server runs out of disk space, you will see something like this in the log:
Thu Aug 11 13:06:09 [FileAllocator] allocating new datafile dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
Thu Aug 11 13:06:19 [FileAllocator] allocating new datafile dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
The server remains in this state indefinitely, retrying every 10 seconds and blocking all writes, including deletes. Reads, however, still work. To delete some data and compact (see above), you must first restart the server.
Checking Size of a Collection
Use the validate command to check the size of a collection; that is, from the shell run:
> db.<collectionname>.validate();
> // these are faster:
> db.<collectionname>.dataSize(); // just data size for collection
> db.<collectionname>.storageSize(); // allocation size including unused space
> db.<collectionname>.totalSize(); // data + index
> db.<collectionname>.totalIndexSize(); // index data size
validate reports on the collection's data only, but space is also allocated for the collection's indexes. These can be checked with validate too, by looking up the index's namespace name in the system.namespaces collection. For example:
> db.system.namespaces.find()
{"name" : "test.foo"}
{"name" : "test.system.indexes"}
{"name" : "test.foo.$_id_"}
> db.foo.$_id_.validate()
{"ns" : "test.foo.$_id_" , "result" : "
validate
details: 0xb3590b68 ofs:83fb68
firstExtent:0:8100 ns:test.foo.$_id_
lastExtent:0:8100 ns:test.foo.$_id_
# extents:1
datasize?:8192 nrecords?:1 lastExtentSize:131072
padding:1
first extent:
loc:0:8100 xnext:null xprev:null
ns:test.foo.$_id_
size:131072 firstRecord:0:81b0 lastRecord:0:81b0
1 objects found, nobj:1
8208 bytes data w/headers
8192 bytes data wout/headers
deletedList: 0000000000001000000
deleted: n: 1 size: 122688
nIndexes:0
" , "ok" : 1 , "valid" : true , "lastExtentSize" : 131072}
Helpful scripts
These one-line scripts print the stats for each database and for each collection, respectively:
db._adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); printjson(mdb.stats())})
db._adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function (c) {var s = mdb[c].stats(); printjson(s)})})
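The scripts above print raw stats documents. If you want a quick per-database view of how much of the allocated file space is actually data or index, a small helper such as the hypothetical summarizeDbStats below can be applied to each mdb.stats() result. It assumes the byte-valued dataSize, indexSize and fileSize fields reported by db.stats(), and it is written in plain ES5 so it can be pasted into an older mongo shell; the sample figures in the usage note are made up.

```javascript
// Hypothetical helper: condense one db.stats() document into MB figures.
// Assumes the byte-valued fields dataSize, indexSize and fileSize exist.
function summarizeDbStats(s) {
  var MB = 1024 * 1024;
  var accounted = s.dataSize + s.indexSize;
  return {
    db: s.db,
    dataMB: +(s.dataSize / MB).toFixed(1),
    indexMB: +(s.indexSize / MB).toFixed(1),
    fileMB: +(s.fileSize / MB).toFixed(1),
    // Share of datafile space not counted as data or index, e.g.
    // preallocated-but-unused space and deleted blocks.
    unusedFraction: s.fileSize > 0 ? +(1 - accounted / s.fileSize).toFixed(3) : 0
  };
}

// Example use in the mongo shell, after pasting the function above:
// db._adminCommand("listDatabases").databases.forEach(function (d) {
//   printjson(summarizeDbStats(db.getSiblingDB(d.name).stats()));
// });
```

With hypothetical figures of 100MB of data, 28MB of indexes and 512MB of datafiles, the helper reports an unusedFraction of 0.75, which is the kind of gap the sections above explain.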
