Data Import and ExportmongoimportThis utility takes a single file that contains 1 JSON/CSV/TSV string per line and inserts it. You have to specify a database and a collection. options: --help produce help message -v [ --verbose ] be more verbose (include multiple times for more verbosity e.g. -vvvvv) -h [ --host ] arg mongo host to connect to ("left,right" for pairs) --port arg server port. (Can also use --host hostname:port) --ipv6 enable IPv6 support (disabled by default) -d [ --db ] arg database to use -c [ --collection ] arg collection to use (some commands) -u [ --username ] arg username -p [ --password ] arg password --dbpath arg directly access mongod data files in the given path, instead of connecting to a mongod instance - needs to lock the data directory, so cannot be used if a mongod is currently accessing the same path --directoryperdb if dbpath specified, each db is in a separate directory -f [ --fields ] arg comma seperated list of field names e.g. -f name,age --fieldFile arg file with fields names - 1 per line --ignoreBlanks if given, empty fields in csv and tsv will be ignored --type arg type of file to import. default: json (json,csv,tsv) --file arg file to import from; if not specified stdin is used --drop drop collection first --headerline CSV,TSV only - use first line as headers --upsert insert or update objects that already exist --upsertFields arg comma-separated fields for the query part of the upsert. You should make sure this is indexed. --stopOnError stop importing at the first error rather than continuing --jsonArray load a json array, not one item per line. Currently limited to 4MB. Note that the following options are only available in 1.5.3+: upsert, upsertFields, stopOnError, jsonArray Example: Import file formatThe import file should contain one document per line (with a few exceptions: if using --jsonArray, if importing a CSV then one document may span multiple lines if it contains multi-line string, if importing a CSV with --headerline then the first line doesn't correspond to a document but instead specifies which fields are being imported). When using the standard JSON import format, each line in input file must be one JSON document which will be inserted directly into the database. For example, if you imported a file that looks like this: { "_id" : { "$oid" : "4e5bb37258200ed9aabc5d65" }, "name" : "Bob", "age" : 28, "address" : "123 fake street" }
by running mongoimport -d test -c foo importfile.json you'd get this imported: > db.foo.find()
{ "_id" : ObjectId("4e5bb37258200ed9aabc5d65"), "name" : "Bob", "age" : 28, "address" : "123 fake street" }
Example: Importing with upsertThe following command will import data from temp.csv into database foo, collection bar on localhost. Additionally it will perform an upsert of the data. By default the upsert will use the field marked as _id as the key for updates. mongoimport --host localhost --db foo --collection bar --type csv --file temp.csv --headerline --upsert If the file does not have an _id field, you can update on alternate fields by using upsertFields. Note that when using this with sharding, the upsertField must be the shardkey.
Example: Importing Interesting TypesMongoDB supports more types that JSON does, so it has a special format for representing some of these types as valid JSON. For example, JSON has no date type. Thus, to import data containing dates, you structure your JSON like: {"somefield" : 123456, "created_at" : {"$date" : 1285679232000}}
Then mongoimport will turn the created_at value into a Date. Note: the $-prefixed types must be enclosed in double quotes to be parsed correctly. mongoexportmongoexport takes a collection and exports to either JSON or CSV. You can specify a filter for the query, or a list of fields to output. See the mongoexport page for more information. mongodump and mongorestoreThe are many ways to do backups and restores. (Here are some other backup strategies) mongodumpThis takes a database and outputs it in a binary representation. This is used for doing (hot) backups of a database.
options: --help produce help message -v [ --verbose ] be more verbose (include multiple times for more verbosity e.g. -vvvvv) -h [ --host ] arg mongo host to connect to ("left,right" for pairs) -d [ --db ] arg database to use -c [ --collection ] arg collection to use (some commands) -u [ --username ] arg username -p [ --password ] arg password --dbpath arg directly access mongod data files in the given path, instead of connecting to a mongod instance - needs to lock the data directory, so cannot be used if a mongod is currently accessing the same path --directoryperdb if dbpath specified, each db is in a separate directory -o [ --out ] arg (=dump) output directory -q [ --query ] arg json query --oplog point in time backup (requires an oplog) --repair repairs documents as it dumps from a corrupt db (requires --dbpath and -d/--db) --forceTableScan force a table scan (do not use $snapshot) Example: Dumping EverythingTo dump all of the collections in all of the databases, run mongodump with just the --host: $ ./mongodump --host prod.example.com
connected to: prod.example.com
all dbs
DATABASE: log to dump/log
log.errors to dump/log/errors.bson
713 objects
log.analytics to dump/log/analytics.bson
234810 objects
DATABASE: blog to dump/blog
blog.posts to dump/log/blog.posts.bson
59 objects
DATABASE: admin to dump/admin
You'll then have a folder called "dump" in your current directory. If you're running mongod locally on the default port, you can just do: $ ./mongodump Example: Dumping a Single CollectionIf we just want to dump a single collection, we can specify it and get a single .bson file. $ ./mongodump --db blog --collection posts
connected to: 127.0.0.1
DATABASE: blog to dump/blog
blog.posts to dump/blog/posts.bson
59 objects
Example: Dumping a Single Collection to StdoutIn version 1.7.0+, you can use stdout instead of a file by specifying --out stdout: $ ./mongodump --db blog --collection posts --out - > blogposts.bson mongodump creates a file for each database collection, so we can only dump one collection at a time to stdout. Example: Dumping a Single Collection with a queryUsing the -q argument, you can specify a JSON query to be passed. The example below dumps out documents where the "created_at" is between 2010-12-01 and 2010-12-31. $ ./mongodump --db blog --collection posts
-q '{"created_at" : { "$gte" : {"$date" : 1293868800000},
"$lt" : {"$date" : 1296460800000}
}
}'
Example: Using --oplog to get a point-in-time backupIf data is changed over the course of a backup then the resulting dump may wind up in an inconsistent state that doesn't correspond to how the data looked in the DB at any one moment. This can be avoided by using --oplog in mongodump and --oplogReplay in mongorestore. If you use --oplog then when the backup is started, mongodump will note the time of the most recent entry in the oplog. When the dump is finished, mongodump will then find all the oplog entries since the dump started and will dump those as well. When you run mongorestore with --oplogReplay, after it finishes restoring the main dump, it will replay the entries in the oplog dump so that the data restored ends up in a consistent state corresponding to the moment that the original dump finished. $ ./mongodump --host localhost --oplog
connected to: localhost
all dbs
DATABASE: admin to dump/admin
admin.system.users to dump/admin/system.users.bson
1 objects
admin.system.indexes to dump/admin/system.indexes.bson
1 objects
DATABASE: test to dump/test
test.foo to dump/test/foo.bson
297110 objects
test.system.indexes to dump/test/system.indexes.bson
1 objects
local.oplog.rs to dump/oplog.bson
11304 objects
It is not valid to use --oplog when dumping from a mongos. You can use it to dump an individual shard though – see the Backing Up Sharded Cluster page. Likewise, --oplog can only be used on the master in a Master Slave configuration because the slave does not store an oplog. Performance TipsThe default dump mode is to do a "snapshot" query. This results in the dump query walking through the _id index and returning results in that order. If you use a custom _id value, not the default ObjectId type, then this could cause much more disk activity to do a dump; it could (dramatically) slow down things. In 1.9.1+ you can force a walk of the data without using an index: $ ./mongodump --forceTableScan ... In earlier versions you can cause this to behavior with a special query (one that cannot use the index): $ ./mongodump -q "{xxxx : { $ne : 0 } }" ...
Note: In some shells (like bash) you must escape the "$" in this command like so "\$". If mongodump seems to skip documents...There is a maximum key size in the indexes, currently approximately 800 bytes. This limit also applies to the default index on _id. Any document with an _id key larger than 800 bytes will not be indexed on _id. By default mongodump walks the _id index and will skip documents with keys too large to index. > use bigid > db.foo.count() 3 $ mongodump -d bigid -c foo connected to: 127.0.0.1 DATABASE: bigid to dump/bigid bigid.foo to dump/bigid/foo.bson 0 objects You can work around this issue with either of the options listed in the previous section:
$ mongodump -d bigid -c foo -q "{xxxx : { \$ne : 0 } }"
connected to: 127.0.0.1
DATABASE: bigid to dump/bigid
bigid.foo to dump/bigid/foo.bson
3 objects
mongorestoremongorestore takes the output from mongodump and restores it. Indexes will be created on a restore. mongorestore just does inserts with the data to restore; if existing data (like with the same _id) is there it will not be replaced. This can be done with an existing database, or mongorestore will create a new one if the database does not exist. Mongorestore is mostly non-blocking (it just calls a series of normal inserts), though if the dump included indexes it may cause the DB to block as the indexes are rebuilt.
usage: ./mongorestore [options] [directory or filename to restore from] options: --help produce help message -v [ --verbose ] be more verbose (include multiple times for more verbosity e.g. -vvvvv) -h [ --host ] arg mongo host to connect to ("left,right" for pairs) -d [ --db ] arg database to use -c [ --collection ] arg collection to use (some commands) -u [ --username ] arg username -p [ --password ] arg password --dbpath arg directly access mongod data files in the given path, instead of connecting to a mongod instance - needs to lock the data directory, so cannot be used if a mongod is currently accessing the same path --directoryperdb if dbpath specified, each db is in a separate directory --drop drop each collection before import --objcheck validate object before inserting --filter arg filter to apply before inserting --indexesLast wait to add indexes (faster if data isn't inserted in index order) --oplogReplay Restores the dump and replays the backed up portion of the oplog. bsondump
This takes a bson file (typically from mongodump) and converts it to json/debug output. Passing type=debug outputs an indented format that shows the type and size for each object. usage: bsondump [options] <bson filename>
options:
--help produce help message
-v [ --verbose ] be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
--version print the program's version and exit
--objcheck validate object before inserting
--filter arg filter to apply before inserting
--type arg (=json) type of output: json,debug
The debug format displays extra debug information for each field, including the type and size, in an indented form. The debug option also tries to validate strings are valid utf-8. See Also |

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.
blog comments powered by Disqus