Object IDs

Documents in MongoDB required a key, _id, which uniquely identifies them.

The _id Field

Almost every MongoDB document has an _id field as its first attribute (there are a few exceptions for system collections and capped collections).  The _id value can be of any type with type ObjectId being the most common. _id must be unique for each document in a collection. In most cases collections automatically have an _id index which includes a unique key constraint that enforces this.

If a user tries to insert a document without providing an _id field, the database will automatically generate an _object id_ and store it the _id field.

The _id value may be of any type, other than arrays, so long as it is a unique. If your document has a natural primary key that is immutable we recommend you use that in _id instead of the automatically generated ids. Arrays are not allowed _ids because they are Multikeys.

The BSON ObjectId Datatype

Although _id values can be of any type, a special BSON datatype is provided for object ids.  This type is a 12-byte binary value designed to have a reasonably high probability of being unique when allocated.  All of the officially-supported MongoDB drivers use this type by default for _id values.  Also, the Mongo database itself uses this type when assigning _id values on inserts where no _id value is present.

In the MongoDB shell, ObjectId() may be used to create ObjectIds.  ObjectId(string) creates an object ID from the specified hex string.

> x={ name: "joe" }
{ name : "joe" }
> db.people.save(x)
{ name : "joe" , _id : ObjectId( "47cc67093475061e3d95369d" ) }
> x
{ name : "joe" , _id : ObjectId( "47cc67093475061e3d95369d" ) }
> db.people.findOne( { _id: ObjectId( "47cc67093475061e3d95369d" ) } )
{ _id : ObjectId( "47cc67093475061e3d95369d" ) , name : "joe" }
> db.people.findOne( { _id: new ObjectId( "47cc67093475061e3d95369d" ) } )
{ _id : ObjectId( "47cc67093475061e3d95369d" ) , name : "joe" }

BSON ObjectID Specification

A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON. This is because they are compared byte-by-byte and we want to ensure a mostly increasing order. The format:

0123456 7891011
timemachine pidinc

  • TimeStamp. This is a unix style timestamp. It is a signed int representing the number of seconds before or after January 1st 1970 (UTC).
  • Machine. This is the first three bytes of the (md5) hash of the machine host name, or of the mac/network address, or the virtual machine id.
  • Pid. This is 2 bytes of the process id (or thread id) of the process generating the object id.
  • Increment. This is an ever incrementing value, or a random number if a counter can't be used in the language/runtime.

BSON ObjectIds can be any 12 byte binary string that is unique; however, the server itself and almost all drivers use the format above.

Sequence Numbers

Traditional databases often use increasing sequence numbers for primary keys. In MongoDB, the preferred approach is to use Object IDs instead. The concept is that in a very large cluster of machines, it is easier to create an object ID than have global, uniformly increasing sequence numbers.

However, sometimes you may want a sequence number. You can do this by creating "counter" documents and using the findAndModify Command.

function counter(name) {
    var ret = db.counters.findAndModify({query:{_id:name}, update:{$inc : {next:1}}, "new":true, upsert:true});
    // ret == { "_id" : "users", "next" : 1 }
    return ret.next;
}

db.users.insert({_id:counter("users"), name:"Sarah C."}) // _id : 1
db.users.insert({_id:counter("users"), name:"Bob D."}) // _id : 2
//repeat

See also "Insert if Not Present" section of the Atomic Operations page for another method.

UUIDs

The _id field can be of any type; however, it must be unique. Thus you can use UUIDs in the _id field instead of BSON ObjectIds (BSON ObjectIds are slightly smaller; they need not be worldwide unique, just unique for a single db cluster). When using UUIDs, your application must generate the UUID itself. Ideally the UUID is then stored in the [DOCS:BSON] type for efficiency – however you can also insert it as a hex string if you know space and speed will not be an issue for the use case.

See Also

Follow @mongodb

MongoDB Pittsburgh - May 15
MongoNYC - May 23
MongoDB Paris - Jun 14
MongoDB UK - Jun 20
MongoDC - June 26


Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.

blog comments powered by Disqus