Import Export Tools

If you just want to copy a database from one mongod server to another, use the copydb command instead of these tools.

These tools work with the raw data (the BSON documents in the collections, both user and system); they do not save or load certain metadata, such as collection properties (for example, whether a collection is capped). You will need to (re)create those collections yourself, in a separate step, before loading the data. Vote for SERVER-808 to change this. (Consider using the copydb command, which does preserve these properties.)

Data Import and Export

mongoimport

This utility takes a single file that contains one JSON, CSV, or TSV string per line and inserts each one. You have to specify a database and a collection.

options:
  --help                  produce help message
  -v [ --verbose ]        be more verbose (include multiple times for more
                          verbosity e.g. -vvvvv)
  -h [ --host ] arg       mongo host to connect to ("left,right" for pairs)
  --port arg              server port. (Can also use --host hostname:port)
  --ipv6                  enable IPv6 support (disabled by default)
  -d [ --db ] arg         database to use
  -c [ --collection ] arg collection to use (some commands)
  -u [ --username ] arg   username
  -p [ --password ] arg   password
  --dbpath arg            directly access mongod data files in the given path,
                          instead of connecting to a mongod instance - needs to
                          lock the data directory, so cannot be used if a
                          mongod is currently accessing the same path
  --directoryperdb        if dbpath specified, each db is in a separate
                          directory
  -f [ --fields ] arg     comma-separated list of field names e.g. -f name,age
  --fieldFile arg         file with field names - 1 per line
  --ignoreBlanks          if given, empty fields in csv and tsv will be ignored
  --type arg              type of file to import.  default: json (json,csv,tsv)
  --file arg              file to import from; if not specified stdin is used
  --drop                  drop collection first
  --headerline            CSV,TSV only - use first line as headers
  --upsert                insert or update objects that already exist
  --upsertFields arg      comma-separated fields for the query part of the
                          upsert. You should make sure this is indexed.
  --stopOnError           stop importing at the first error rather
                          than continuing
  --jsonArray             load a json array, not one item per line.
                          Currently limited to 4MB.

Note that the following options are only available in 1.5.3+: upsert, upsertFields, stopOnError, jsonArray

Example: Import file format

The import file should contain one document per line, with a few exceptions: with --jsonArray, the file holds a single JSON array of documents; when importing CSV, one document may span multiple lines if it contains a multi-line string; and when importing CSV or TSV with --headerline, the first line does not correspond to a document but instead names the fields being imported.

When using the standard JSON import format, each line in the input file must be one JSON document, which is inserted directly into the database.

For example, if you imported a file that looks like this:

{ "_id" : { "$oid" : "4e5bb37258200ed9aabc5d65" }, "name" : "Bob", "age" : 28, "address" : "123 fake street" }

by running

mongoimport -d test -c foo importfile.json

you'd get this imported:

> db.foo.find()
{ "_id" : ObjectId("4e5bb37258200ed9aabc5d65"), "name" : "Bob", "age" : 28, "address" : "123 fake street" }
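If you generate the import file from a script, the one hard rule for the default JSON format is one complete document per line. Here is a minimal Python sketch; the helper names are hypothetical, and the $oid wrapper is simply the same form used in the example above.

```python
import json

def to_import_lines(docs):
    """Serialize documents as one JSON document per line."""
    # json.dumps escapes any newline inside a string value as "\n", so each
    # document is guaranteed to occupy exactly one physical line.
    return "".join(json.dumps(doc) + "\n" for doc in docs)

def write_import_file(path, docs):
    """Write docs to path in the line-per-document format mongoimport reads."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_import_lines(docs))
```

For example, write_import_file("importfile.json", [{"_id": {"$oid": "4e5bb37258200ed9aabc5d65"}, "name": "Bob", "age": 28, "address": "123 fake street"}]) produces the file used in the mongoimport command above.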
Example: Importing with upsert

The following command imports data from temp.csv into database foo, collection bar, on localhost, performing an upsert for each document. By default the upsert uses the _id field as the key for updates.

mongoimport --host localhost --db foo --collection bar --type csv --file temp.csv --headerline --upsert

If the file does not have an _id field, you can update on alternate fields by using --upsertFields. Note that when using this with sharding, the --upsertFields value must be the shard key.

Even though --upsert may result in an update, every document in the input file must be formatted so that it could also be inserted. Therefore, no update modifiers (such as $set) are allowed.
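The following Python sketch is a simplified in-memory model of the behaviour described above, not mongoimport's actual code (the function name is made up for illustration). Each input document is matched on the key fields (_id by default, or the fields you would pass to --upsertFields); in this model a match is replaced by the input document as a whole, anything else is inserted, and documents containing update modifiers are rejected.

```python
def upsert_all(collection, incoming, key_fields=("_id",)):
    """Simplified model of mongoimport --upsert over a list of dicts."""
    for doc in incoming:
        # Update modifiers such as $set are rejected: each input line must be
        # a plain document that could also be inserted as-is.
        modifiers = [k for k in doc if k.startswith("$")]
        if modifiers:
            raise ValueError(f"update modifiers are not allowed: {modifiers}")
        key = tuple(doc.get(f) for f in key_fields)
        for i, existing in enumerate(collection):
            if tuple(existing.get(f) for f in key_fields) == key:
                replacement = dict(doc)
                # A replacement keeps the stored _id if the input has none.
                if "_id" in existing and "_id" not in replacement:
                    replacement["_id"] = existing["_id"]
                collection[i] = replacement   # whole document, not a merge
                break
        else:
            collection.append(dict(doc))      # no match: plain insert
    return collection

people = [{"_id": 1, "name": "Bob", "age": 28}]
upsert_all(people, [{"_id": 1, "name": "Bob"}, {"_id": 2, "name": "Al"}])
print(people)  # [{'_id': 1, 'name': 'Bob'}, {'_id': 2, 'name': 'Al'}]
```

Note that "age" disappears from Bob's document: the input line replaces the stored document rather than being merged into it.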
Example: Importing Interesting Types

MongoDB supports more types than JSON does, so it has a special format for representing some of these types as valid JSON. For example, JSON has no date type, so to import data containing dates, structure your JSON like this:

{"somefield" : 123456, "created_at" : {"$date" : 1285679232000}}

Then mongoimport will turn the created_at value into a Date.

Note: the $-prefixed keys (such as "$date") must be enclosed in double quotes to be parsed correctly.
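If your data comes from Python, a small helper can produce this representation, where the $date value is milliseconds since the Unix epoch. This sketch assumes naive datetimes are in UTC; the helper name is our own.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_import_json(value):
    """Recursively replace datetimes with {"$date": <milliseconds since epoch>}."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are UTC.
            value = value.replace(tzinfo=timezone.utc)
        # Integer division avoids float rounding on large millisecond values.
        return {"$date": (value - EPOCH) // timedelta(milliseconds=1)}
    if isinstance(value, dict):
        return {k: to_import_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_import_json(v) for v in value]
    return value

doc = {"somefield": 123456,
       "created_at": datetime(2010, 9, 28, 13, 7, 12, tzinfo=timezone.utc)}
print(json.dumps(to_import_json(doc)))
# {"somefield": 123456, "created_at": {"$date": 1285679232000}}
```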

mongoexport

mongoexport takes a collection and exports it to either JSON or CSV. You can specify a filter for the query, or a list of fields to output.

See the mongoexport page for more information.

mongodump and mongorestore

There are many ways to do backups and restores. (Other backup strategies are described on a separate page.)

mongodump

mongodump takes a database and writes it out in a binary (BSON) representation. It is used for doing (hot) backups of a database.

If you're using sharding and trying to migrate data this way, note that mongodump will also dump the shard configuration metadata (from the config database), and restoring it will overwrite the configuration on the target. This happens because, without any options, mongodump dumps all databases and collections, including the config database where this information is kept.

options:
  --help                   produce help message
  -v [ --verbose ]         be more verbose (include multiple times for more
                           verbosity e.g. -vvvvv)
  -h [ --host ] arg        mongo host to connect to ("left,right" for pairs)
  -d [ --db ] arg          database to use
  -c [ --collection ] arg  collection to use (some commands)
  -u [ --username ] arg    username
  -p [ --password ] arg    password
  --dbpath arg             directly access mongod data files in the given path,
                           instead of connecting to a mongod instance - needs
                           to lock the data directory, so cannot be used if a
                           mongod is currently accessing the same path
  --directoryperdb         if dbpath specified, each db is in a separate
                           directory
  -o [ --out ] arg (=dump) output directory
  -q [ --query ] arg       json query
  --oplog                  point in time backup (requires an oplog)
  --repair                 repairs documents as it dumps from a corrupt db (requires --dbpath and -d/--db)
  --forceTableScan         force a table scan (do not use $snapshot)

Example: Dumping Everything

To dump all of the collections in all of the databases, run mongodump with just the --host option:

$ ./mongodump --host prod.example.com
connected to: prod.example.com
all dbs
DATABASE: log    to   dump/log
        log.errors to dump/log/errors.bson
                713 objects
        log.analytics to dump/log/analytics.bson
                234810 objects
DATABASE: blog    to    dump/blog
        blog.posts to dump/blog/posts.bson
                59 objects
DATABASE: admin    to    dump/admin

You'll then have a folder called "dump" in your current directory.

If you're running mongod locally on the default port, you can just do:

$ ./mongodump

Example: Dumping a Single Collection

If we just want to dump a single collection, we can specify it and get a single .bson file.

$ ./mongodump --db blog --collection posts
connected to: 127.0.0.1
DATABASE: blog        to     dump/blog
        blog.posts to dump/blog/posts.bson
                 59 objects

Currently, indexes for a single collection are not backed up. Please follow SERVER-808.

Example: Dumping a Single Collection to Stdout

In version 1.7.0+, you can write to stdout instead of a file by specifying --out - (a single dash):

$ ./mongodump --db blog --collection posts --out - > blogposts.bson

mongodump creates a file for each database collection, so we can only dump one collection at a time to stdout.
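A collection file written this way (or any .bson file in a dump directory) is simply BSON documents written one after another, each starting with its own length. As a rough illustration of that layout, the sketch below splits such a file into raw documents and counts them, which you might compare with the "objects" count mongodump prints. It only reads the length prefixes; to actually decode documents, use bsondump or a BSON library. The function names are hypothetical.

```python
import struct

def split_bson(data):
    """Yield each raw document from bytes holding BSON documents back to back."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError(f"truncated length prefix at offset {offset}")
        # Each document begins with its total size (including these 4 bytes)
        # as a little-endian signed 32-bit integer; the smallest is 5 bytes.
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5 or offset + size > len(data):
            raise ValueError(f"invalid or truncated document at offset {offset}")
        yield data[offset:offset + size]
        offset += size

def count_bson_documents(path):
    """Count documents in a .bson file such as blogposts.bson."""
    with open(path, "rb") as f:
        return sum(1 for _ in split_bson(f.read()))
```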

Example: Dumping a Single Collection with a query

Using the -q argument, you can pass a JSON query. The example below dumps the documents whose "created_at" falls in December 2010 (UTC), that is, from 2010-12-01 up to but not including 2011-01-01:

$ ./mongodump --db blog --collection posts \
    -q '{"created_at" : { "$gte" : {"$date" : 1291161600000},
                          "$lt"  : {"$date" : 1293840000000}
                        }
        }'
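The $date values are milliseconds since the Unix epoch, in UTC. Rather than computing them by hand, you could generate the query text with a helper like this Python sketch (the function names are our own):

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ms(dt):
    """Milliseconds since the Unix epoch (UTC), as used inside {"$date": ...}."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def date_range_query(field, start, end):
    """JSON text for mongodump -q selecting documents with start <= field < end."""
    return json.dumps({field: {"$gte": {"$date": epoch_ms(start)},
                               "$lt": {"$date": epoch_ms(end)}}})

print(date_range_query("created_at",
                       datetime(2010, 12, 1, tzinfo=timezone.utc),
                       datetime(2011, 1, 1, tzinfo=timezone.utc)))
# {"created_at": {"$gte": {"$date": 1291161600000}, "$lt": {"$date": 1293840000000}}}
```

Using an exclusive upper bound ($lt the first instant of the next month) includes all of December 31 without worrying about times of day.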
Example: Using --oplog to get a point-in-time backup

If data is changed over the course of a backup then the resulting dump may wind up in an inconsistent state that doesn't correspond to how the data looked in the DB at any one moment. This can be avoided by using --oplog in mongodump and --oplogReplay in mongorestore. If you use --oplog then when the backup is started, mongodump will note the time of the most recent entry in the oplog. When the dump is finished, mongodump will then find all the oplog entries since the dump started and will dump those as well. When you run mongorestore with --oplogReplay, after it finishes restoring the main dump, it will replay the entries in the oplog dump so that the data restored ends up in a consistent state corresponding to the moment that the original dump finished.
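To make the idea concrete, here is a toy Python model of it. This is not how mongodump is implemented: documents are dict entries, each oplog entry is an (_id, value) pair meaning "set" (or "delete" when value is None), and one write lands between consecutive document reads. Like real oplog entries, these operations are idempotent, which is why replaying an entry whose effect the dump already captured does no harm. All names are hypothetical.

```python
def apply_op(data, op):
    """Apply one idempotent op: (_id, value) sets, (_id, None) deletes."""
    _id, value = op
    if value is None:
        data.pop(_id, None)
    else:
        data[_id] = value

def dump_with_oplog(live, writes):
    """Copy `live` one document at a time while `writes` keep arriving.

    Each write is applied to `live` and recorded, mimicking the oplog
    entries collected between the start and the end of the dump.
    """
    dumped, oplog = {}, []
    pending = list(writes)
    for _id in sorted(live):        # the keys present when the dump starts
        if _id in live:             # may have been deleted in the meantime
            dumped[_id] = live[_id]
        if pending:                 # a concurrent write lands between reads
            op = pending.pop(0)
            apply_op(live, op)
            oplog.append(op)
    return dumped, oplog

def restore_with_replay(dumped, oplog):
    """Load the dump, then replay the captured oplog entries in order."""
    restored = dict(dumped)
    for op in oplog:
        apply_op(restored, op)
    return restored

live = {1: "a", 2: "b", 3: "c"}
dumped, oplog = dump_with_oplog(live, [(3, "c2"), (1, "a2"), (2, None)])
print(dumped)                                      # {1: 'a', 2: 'b', 3: 'c2'}
print(restore_with_replay(dumped, oplog) == live)  # True
```

The raw dump mixes an old value for 1 with a new value for 3 and still contains the deleted 2, so it matches no single moment; after the replay it equals the data as of the end of the dump.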

$ ./mongodump --host localhost --oplog
connected to: localhost
all dbs
DATABASE: admin  to     dump/admin
        admin.system.users to dump/admin/system.users.bson
                 1 objects
        admin.system.indexes to dump/admin/system.indexes.bson
                 1 objects
DATABASE: test   to     dump/test
        test.foo to dump/test/foo.bson
                 297110 objects
        test.system.indexes to dump/test/system.indexes.bson
                 1 objects
        local.oplog.rs to dump/oplog.bson
                 11304 objects

It is not valid to use --oplog when dumping from a mongos. You can use it to dump an individual shard, though; see the Backing Up Sharded Cluster page. Likewise, --oplog can only be used on the master in a master/slave configuration, because the slave does not store an oplog.

Performance Tips

The default dump mode is to do a "snapshot" query. This makes the dump walk through the _id index and return results in index order. If you use custom _id values rather than the default ObjectId type, that order may differ greatly from the order in which documents are stored on disk, which can cause much more disk activity during the dump and (dramatically) slow it down.

In 1.9.1+ you can force a walk of the data without using an index:

$ ./mongodump --forceTableScan ... 

In earlier versions you can cause this behavior with a special query (one that cannot use the index):

$ ./mongodump -q "{xxxx : { \$ne : 0 } }" ... 

Note: in shells such as bash, "$" inside double quotes starts a variable expansion, so it must be escaped as "\$" (as above); alternatively, wrap the query in single quotes.
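If you run mongodump from a script, you can avoid shell quoting altogether by passing an argument list instead of a command string. A Python sketch, assuming mongodump is on your PATH and using only options from the help text above; the function names are hypothetical:

```python
import subprocess

def mongodump_args(host=None, db=None, collection=None, query=None,
                   out=None, oplog=False):
    """Build a mongodump argument list from options shown in its help text."""
    args = ["mongodump"]
    for flag, value in (("--host", host), ("--db", db),
                        ("--collection", collection), ("--query", query),
                        ("--out", out)):
        if value is not None:
            args += [flag, value]
    if oplog:
        args.append("--oplog")
    return args

def run_mongodump(**options):
    # No shell is involved, so "$" in the query reaches mongodump unchanged.
    # check=True raises CalledProcessError if mongodump exits non-zero.
    return subprocess.run(mongodump_args(**options), check=True)
```

For example, run_mongodump(db="bigid", collection="foo", query="{xxxx : { $ne : 0 } }") needs no escaping of "$".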

If mongodump seems to skip documents...

There is a maximum key size in the indexes, currently approximately 800 bytes. This limit also applies to the default index on _id. Any document with an _id key larger than 800 bytes will not be indexed on _id. By default mongodump walks the _id index and will skip documents with keys too large to index.

> use bigid
> db.foo.count()
3

$ mongodump -d bigid -c foo
connected to: 127.0.0.1
DATABASE: bigid	 to 	dump/bigid
	bigid.foo to dump/bigid/foo.bson
		 0 objects

You can work around this issue with either of the options listed in the previous section:

$ mongodump -d bigid -c foo -q "{xxxx : { \$ne : 0 } }"
connected to: 127.0.0.1
DATABASE: bigid	 to 	dump/bigid
	bigid.foo to dump/bigid/foo.bson
		 3 objects
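If you have the documents at hand in Python (for example, parsed from an export), a quick check like the one below can flag _id values that may be too large to index. It is only a heuristic: it measures the JSON text of each _id, while the real limit applies to the index key the server builds, and 800 bytes is the approximate figure quoted above. The function name is our own.

```python
import json

APPROX_MAX_KEY_BYTES = 800  # the approximate limit quoted above

def suspiciously_large_ids(docs, limit=APPROX_MAX_KEY_BYTES):
    """Return _id values whose JSON text is longer than `limit` bytes.

    Rough heuristic only: treat values near the limit as candidates to
    inspect, not as a definitive list of skipped documents.
    """
    return [doc["_id"] for doc in docs
            if "_id" in doc
            and len(json.dumps(doc["_id"]).encode("utf-8")) > limit]
```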

mongorestore

mongorestore takes the output from mongodump and restores it. Indexes are created on restore. mongorestore only performs inserts with the data to restore; if a document already exists (for example, one with the same _id), it is not replaced. You can restore into an existing database, or mongorestore will create the database if it does not exist. mongorestore is mostly non-blocking (it just issues a series of normal inserts), though if the dump includes indexes, the database may block while they are rebuilt.
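The insert-only behaviour can be modelled in a few lines of Python. This is an illustration, not mongorestore's code: the collection is a dict keyed by _id, and a document whose _id already exists is skipped, just as the real insert of a duplicate _id would be rejected. The function name is hypothetical.

```python
def restore_inserts(collection, dumped_docs):
    """Insert-only restore into a dict keyed by _id; returns the insert count."""
    inserted = 0
    for doc in dumped_docs:
        if doc["_id"] in collection:
            continue  # the existing document wins; it is not overwritten
        collection[doc["_id"]] = doc
        inserted += 1
    return inserted
```

To have the dump replace existing collections instead, use the --drop option, which drops each collection before importing.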

If you do not wish to create indexes you can remove the system.indexes.bson file from your database(s) dump directory before restoring. (The default _id indexes will always be created.)
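Instead of editing the dump itself, you could restore from a copy that leaves out these files, so the original backup keeps its index definitions. A Python sketch (the function name is hypothetical):

```python
import shutil
import tempfile
from pathlib import Path

def copy_dump_without_indexes(dump_dir):
    """Copy a mongodump directory, leaving out every system.indexes.bson.

    The original backup is untouched; only the default _id indexes will be
    built when the returned copy is restored.
    """
    target = Path(tempfile.mkdtemp(prefix="dump-noindexes-"))
    dest = target / "dump"  # copytree requires a destination that does not exist yet
    # ignore_patterns is applied in every directory, i.e. in each database folder.
    shutil.copytree(dump_dir, dest,
                    ignore=shutil.ignore_patterns("system.indexes.bson"))
    return dest
```

Pass the returned directory to mongorestore as the directory to restore from.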
usage: ./mongorestore [options] [directory or filename to restore from]
options:
  --help                  produce help message
  -v [ --verbose ]        be more verbose (include multiple times for more
                          verbosity e.g. -vvvvv)
  -h [ --host ] arg       mongo host to connect to ("left,right" for pairs)
  -d [ --db ] arg         database to use
  -c [ --collection ] arg collection to use (some commands)
  -u [ --username ] arg   username
  -p [ --password ] arg   password
  --dbpath arg            directly access mongod data files in the given path,
                          instead of connecting to a mongod instance - needs to
                          lock the data directory, so cannot be used if a
                          mongod is currently accessing the same path
  --directoryperdb        if dbpath specified, each db is in a separate
                          directory
  --drop                  drop each collection before import
  --objcheck              validate object before inserting
  --filter arg            filter to apply before inserting
  --indexesLast           wait to add indexes (faster if data isn't inserted in
                          index order)
  --oplogReplay           Restores the dump and replays the backed up portion of the oplog.

bsondump

v1.6+

This takes a BSON file (typically from mongodump) and converts it to JSON or debug output. Passing --type=debug outputs an indented format that shows the type and size of each object.

usage: bsondump [options] <bson filename>
options:
  --help                produce help message
  -v [ --verbose ]      be more verbose (include multiple times for more 
                        verbosity e.g. -vvvvv)
  --version             print the program's version and exit
  --objcheck            validate object before inserting
  --filter arg          filter to apply before inserting
  --type arg (=json)    type of output: json,debug

The debug format displays extra information for each field, including its type and size, in an indented form. The debug option also tries to validate that strings are valid UTF-8.
