BSON

Introduction

BSON is the data storage format for "documents" – or "objects" – in MongoDB.  BSON is a binary-encoded serialization of JSON-like documents, and, like JSON, supports the embedding of objects and arrays within other objects and arrays.

Although BSON stands for "Binary JSON," the format contains extensions for representing data types not included in the JSON specification.  For instance, BSON supports BinData and Date data formats.

BSON at first seems BLOB-like, but there exists an important difference: the Mongo database understands BSON internals. This means that MongoDB can "reach inside" BSON objects, even nested ones.  Among other things, this allows MongoDB to build indexes and match objects against query expressions on both top-level and nested BSON keys.
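
As a rough illustration of what "reaching inside" a nested document means, the sketch below walks a decoded document (a plain Python dict) using a dotted key path, the notation MongoDB query expressions use for nested keys. The function name get_nested is hypothetical, and this is not how the database implements the lookup; it only shows the idea.

```python
def get_nested(document, dotted_key):
    """Follow a dotted key path such as "author.name" through nested dicts.

    Hypothetical helper for illustration only; returns None when any
    part of the path is missing.
    """
    current = document
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


post = {"title": "Intro", "author": {"name": "Joe", "age": 33}}
print(get_nested(post, "author.name"))   # Joe
print(get_nested(post, "author.email"))  # None
```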

A General-Purpose Data Format

MongoDB is the first major application to use BSON; however, BSON was designed as a general-purpose data format, usable for many data-marshalling problems.  Developers requiring a rich, efficient data-exchange format should seriously consider BSON.

BSON libraries are available for most languages.  These libraries are currently bundled with the MongoDB client drivers, but work is under way to make the various BSON modules independent. 

See also: the BSON blog post.

Language-Specific Examples

C

See http://github.com/mongodb/mongo-c-driver/blob/master/src/bson.h

C++
BSONObj p = BSON( "name" << "Joe" << "age" << 33 );

See the BSON section of the C++ Tutorial for more information.

PHP

The PHP driver includes bson_encode and bson_decode functions. bson_encode takes any PHP type and serializes it, returning a string of bytes:

$bson = bson_encode(null);
$bson = bson_encode(true);
$bson = bson_encode(4);
$bson = bson_encode("hello, world");
$bson = bson_encode(array("foo" => "bar"));
$bson = bson_encode(new MongoDate());

Mongo-specific objects (MongoId, MongoDate, MongoRegex, MongoCode) will be encoded in their respective BSON formats. For other objects, bson_encode creates a BSON representation from the key/value pairs you would get by running foreach ($object as $key => $value).

bson_decode takes a string representing a BSON object and parses it into an associative array.

Python
>>> from pymongo.bson import BSON
>>> bson_string = BSON.from_dict({"hello": "world"})
>>> bson_string
'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
>>> bson_string.to_dict()
{u'hello': u'world'}

PyMongo also supports "ordered dictionaries" through the pymongo.son module. The BSON class can handle SON instances using the same methods you would use for regular dictionaries.

Ruby
irb(main):013:0> require 'rubygems'
irb(main):014:0> require 'mongo'
irb(main):015:0> document = {:title => 'Intro', :date => Time.now, :tags => ['MongoDB', 'databases', 'nosql']}
=> {:tags=>["MongoDB", "databases", "nosql"], :date=>Sat Dec 19 10:56:28 -0800 2009, :title=>"Intro"}
irb(main):016:0> sdoc = BSON.serialize(document)
=> #<ByteBuffer:0x101372880 @double_pack_order="E", @cursor=92, @int_pack_order="V", @buf="\\\000\000\000\004tags\0002\000\000\000\0020\000\b\000\000\000MongoDB\000\0021\000\n\000\000\000databases\000\0022\000\006\000\000\000nosql\000\000\tdate\000XEL\250%\001\000\000\002title\000\006\000\000\000Intro\000\000", @order=:little_endian>
irb(main):017:0> BSON.deserialize(sdoc)
=> {"tags"=>["MongoDB", "databases", "nosql"], "date"=>Sat Dec 19 18:56:28 UTC 2009, "title"=>"Intro"}
irb(main):018:0> 

The BSON class also supports ordered hashes. Simply construct your documents using the OrderedHash class, also found in the MongoDB Ruby Driver.  Examples with all of the supported BSON types can be found in the BSON unit tests.

BSON Document Format

BSON is a binary message format in which zero or more key-value pairs are stored as a single entity. We call this entity a "BSON document".

All BSON data must be serialized in little-endian format.
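
A minimal sketch of what little-endian serialization looks like for the primitive types, using Python's standard struct module. The sample values are arbitrary and chosen only for illustration.

```python
import struct

# "<" selects little-endian byte order with standard sizes and no padding.
int32_bytes = struct.pack("<i", 22)     # 4-byte signed integer
int64_bytes = struct.pack("<q", -1)     # 8-byte signed integer
double_bytes = struct.pack("<d", 1.0)   # 8-byte floating-point number

print(int32_bytes)    # b'\x16\x00\x00\x00' (least significant byte first)
print(int64_bytes)    # b'\xff\xff\xff\xff\xff\xff\xff\xff'
print(double_bytes)   # b'\x00\x00\x00\x00\x00\x00\xf0?'
```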

Data types used in the grammar

Type    Size
byte    1 byte (8 bits)
int32   4 bytes, 32-bit signed integer
int64   8 bytes, 64-bit signed integer
double  8 bytes, 64-bit IEEE 754 floating-point number

BSON Grammar

Entity        Definition                              Comment
bson_object   obj_size element* eoo
obj_size      int32                                   Total size in bytes of the BSON object, including the 4 bytes of this field
element       element_type element_name element_data
eoo           NULL
element_type  byte                                    One of the data_... types listed below
element_name  cstring                                 See note on element_name
element_data  (see below)                             Specific data for the type; see table below
cstring       byte* NULL                              Zero or more UTF-8-encoded characters terminated by NULL
NULL          0x00                                    A single byte of value 0
VOID                                                  Nothing. Not 0. Zero bits.

Notes
element_name  Element names available to user documents are constrained. Be sure to understand the limitations described below under Implicit Document Types.
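
Read literally, the grammar implies that the smallest BSON object is five bytes: an obj_size of 5 followed by the eoo byte. The sketch below builds that empty document and checks the outer framing of a buffer. The names EMPTY_DOCUMENT and check_framing are hypothetical, and the check covers only obj_size and eoo, not the elements in between.

```python
import struct

# obj_size (int32, counts itself) followed by eoo (a single NULL byte).
EMPTY_DOCUMENT = struct.pack("<i", 5) + b"\x00"


def check_framing(data: bytes) -> int:
    """Return obj_size if it matches the buffer length and eoo is present."""
    if len(data) < 5:
        raise ValueError("a BSON object is at least 5 bytes long")
    (obj_size,) = struct.unpack_from("<i", data, 0)
    if obj_size != len(data):
        raise ValueError("obj_size does not match the buffer length")
    if data[-1] != 0:
        raise ValueError("missing eoo terminator")
    return obj_size


print(EMPTY_DOCUMENT)                 # b'\x05\x00\x00\x00\x00'
print(check_framing(EMPTY_DOCUMENT))  # 5
```
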
Element Data Types

Note: in the data representations, the semantic meaning of a field is indicated within brackets, after a hyphen, and is described in the note for that type.

Element Type       Binary Value  Data Representation              Comments
data_number        1             double
data_string        2             int32 cstring                    The int32 is the number of bytes that follow (bytes in the string + 1 for the terminating NULL)
data_object        3             bson_object
data_array         4             bson_object                      See note on data_array
data_binary        5             int32 byte byte[]                The first int32 is the number of bytes following the byte subtype. See note on data_binary
data_undefined     6             VOID                             Conceptually equivalent to JavaScript undefined. Deprecated.
data_oid           7             byte[12]                         12-byte object id.
data_boolean       8             byte                             Legal values: 0x00 -> false, 0x01 -> true
data_date          9             int64                            Milliseconds since the epoch (e.g. new Date().getTime())
data_null          10            VOID                             Mapped to Null in programming languages which have a Null value or type. Conceptually equivalent to JavaScript null.
data_regex         11            cstring cstring                  The first cstring is the regular expression; the second cstring holds the regex options. See note on data_regex
data_ref           12            int32 cstring byte[12]           Deprecated. Please use a subobject instead; see page DB Ref. The int32 is the length in bytes of the cstring. The cstring is the namespace: the full collection name. The byte array is a 12-byte object id; see data_oid.
data_code          13            int32 cstring                    The int32 is the number of bytes that follow (bytes in the string + 1 for the terminating NULL), followed by the code as a cstring. data_code should be supported in BSON encoders/decoders, but has been deprecated in favor of data_code_w_scope
data_symbol        14            int32 cstring                    Same as data_string, but for languages with a distinct symbol type
data_code_w_scope  15            int32 int32 cstring bson_object  The first int32 is the total number of bytes (size of cstring + size of bson_object + 8 for the two int32s). The second int32 is the size of the cstring (bytes in the string + 1 for the terminating NULL). The cstring is the code. The bson_object is an object mapping identifiers to values, representing the scope in which the code should be evaluated.
data_int           16            int32
data_timestamp     17            int64                            Special internal type used by MongoDB replication and sharding. The first 4 bytes are a timestamp, the next 4 are an incremented field. Saving a zero value for data_timestamp has special semantics.
data_long          18            int64                            64-bit integer
data_min_key       -1            VOID                             Special type which compares lower than all other possible BSON element values. See Comparing Types and Splitting Shards
data_max_key       127           VOID                             Special type which compares greater than all other possible BSON element values. See Comparing Types and Splitting Shards
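
To make the table concrete, here is a minimal encoder sketch for a handful of the types above (data_number, data_string, data_object, data_boolean, data_null, data_int, data_long). The function names are hypothetical, the mapping from Python types to element types is an assumption made for this example, and the code is not taken from any driver.

```python
import struct


def cstring(text: str) -> bytes:
    """UTF-8 bytes followed by the terminating NULL."""
    encoded = text.encode("utf-8")
    if b"\x00" in encoded:
        raise ValueError("a cstring cannot contain a NULL byte")
    return encoded + b"\x00"


def encode_element(name: str, value) -> bytes:
    """element = element_type element_name element_data"""
    key = cstring(name)
    if value is None:                      # data_null: VOID, no data bytes
        return b"\x0a" + key
    if isinstance(value, bool):            # test before int: bool subclasses int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:        # data_int
            return b"\x10" + key + struct.pack("<i", value)
        return b"\x12" + key + struct.pack("<q", value)   # data_long
    if isinstance(value, float):           # data_number
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):             # data_string: int32 length, then cstring
        body = cstring(value)
        return b"\x02" + key + struct.pack("<i", len(body)) + body
    if isinstance(value, dict):            # data_object: nested bson_object
        return b"\x03" + key + encode_document(value)
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def encode_document(doc: dict) -> bytes:
    """bson_object = obj_size element* eoo"""
    elements = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(elements) + 1) + elements + b"\x00"


print(encode_document({"hello": "world"}))
```

The printed bytes match the output shown in the Python example earlier on this page: b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'.
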
Notes
All strings are UTF-8.
data_array The data for a data_array type is a normal BSON object, with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ["blue", "red", "green"] is expressed as the following BSON object (using JSON notation for this example):
{ "0" : "blue", "1" : "red", "2" : "green" }

The keys in the BSON object must be in ascending numerical order.
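
The array rule can be sketched as follows: map each item to a key holding its index as a string, then encode the result as an ordinary BSON object with element type 4. The function names are hypothetical and the encoder handles arrays of strings only, to keep the example short.

```python
import struct


def array_as_document(items):
    """Map list positions to the string keys "0", "1", "2", ... in ascending order."""
    return {str(index): item for index, item in enumerate(items)}


def encode_string_array(name, items):
    """Encode a list of strings as a data_array element (type 4)."""
    elements = b""
    for key, text in array_as_document(items).items():
        body = text.encode("utf-8") + b"\x00"
        # data_string element: type 2, key as cstring, int32 length, cstring
        elements += b"\x02" + key.encode("ascii") + b"\x00"
        elements += struct.pack("<i", len(body)) + body
    document = struct.pack("<i", 4 + len(elements) + 1) + elements + b"\x00"
    return b"\x04" + name.encode("utf-8") + b"\x00" + document


print(array_as_document(["blue", "red", "green"]))
# {'0': 'blue', '1': 'red', '2': 'green'}
print(encode_string_array("tags", ["MongoDB", "databases", "nosql"]))
```

The second call produces the same bytes as the "tags" element in the Ruby example earlier on this page (a 50-byte embedded object, written there as "2\000\000\000").
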
data_regex Option characters must be stored in alphabetical order. Only the options "i", "m", and "x" are supported by the database. However, users can also use BSON to store their own regular expressions. How drivers handle non-standard options in user documents is left to the drivers. We recommend that such options simply be ignored and, if possible, preserved in the document on future saves. The following table documents the option characters and what they mean, so that regexes can be interpreted correctly across drivers:

Option Character  Meaning
i                 case-insensitive matching
m                 multiline: "^" and "$" match the beginning / end of each line as well as the whole string
x                 verbose / comments: the pattern can contain comments
l (lowercase L)   locale: \w, \W, etc. depend on the current locale
s                 dotall: the "." character matches everything, including newlines
u                 unicode: \w, \W, etc. match unicode
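
A minimal sketch of a data_regex element with its options stored in alphabetical order, as the note requires. The function names are hypothetical; dropping duplicate option characters is an assumption of this sketch, not something the note specifies.

```python
def cstring(text: str) -> bytes:
    """UTF-8 bytes followed by the terminating NULL."""
    return text.encode("utf-8") + b"\x00"


def encode_regex(name: str, pattern: str, options: str = "") -> bytes:
    """data_regex element: type 11, name, pattern cstring, options cstring."""
    ordered = "".join(sorted(set(options)))   # alphabetical order, duplicates dropped
    return b"\x0b" + cstring(name) + cstring(pattern) + cstring(ordered)


print(encode_regex("r", "^a", "xi"))   # options are written as "ix"
```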

data_binary The data_binary type has the following subtypes defined:

Subtype Code  Structure     Comment
0x01          unknown       function
0x02          int32 byte[]  The int32 is the number of bytes in the following byte array.
0x03          unknown       UUID
0x05          unknown       MD5
0x80          user defined  User defined

Note that 0x02 is the commonly used "binary" type for carrying general binary data. 0x80 is user defined, and can be anything.
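
Combining the data_binary row of the element table with the 0x02 subtype structure, a payload of n bytes is written as an outer int32 of n + 4, the subtype byte, an inner int32 of n, and then the payload. The sketch below follows that reading of the tables; the function name is hypothetical.

```python
import struct


def encode_binary_02(name: str, payload: bytes) -> bytes:
    """data_binary element (type 5) carrying a subtype 0x02 payload."""
    # Subtype 0x02 structure: int32 length of the byte array, then the bytes.
    inner = struct.pack("<i", len(payload)) + payload
    # The outer int32 counts the bytes following the subtype byte.
    return (b"\x05" + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(inner)) + b"\x02" + inner)


print(encode_binary_02("bin", b"\x01\x02\x03"))
```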

Implicit Document Types

The mongo wire protocol uses BSON documents for the following:

  1. User Document : This is the regular document that the database stores. These are the BSON documents that are sent to the database via the INSERT operation. User documents have limitations on the "element name" space due to the usage of special characters in the JSON-like query language.
    1. A user document element name cannot begin with "$".
    2. A user document element name cannot have a "." in the name.
    3. The element name "_id" is reserved for use as a primary key id, but you can store anything that is unique in that field. It is probably also prudent to avoid starting any element name with "_".
    4. The element name "query" cannot currently be used.
      The database expects that drivers will prevent users from creating documents that violate these constraints.
  2. "Selector" Documents : Selector documents (or selectors) are BSON documents that are used in QUERY, DELETE and UPDATE operations. They are used by these operations to match against documents. Selector objects have no limitations on the "element name" space, as they must be able to supply special "marker" elements, like "$where" and the special "command" operations.
  3. "Modifier" Documents : Documents that contain 'modifier actions' that modify user documents in the case of an update (see Updating)
  4. Return error messages : TODO
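
Since drivers are expected to enforce the user-document constraints listed above, a driver-side check might look roughly like the sketch below. The function names are hypothetical, and applying the rules to nested documents and arrays is an assumption of this example rather than something stated above.

```python
def name_problem(name: str):
    """Return a description of the first violated rule, or None if the name is allowed."""
    if name.startswith("$"):
        return 'begins with "$"'
    if "." in name:
        return 'contains "."'
    if name == "query":
        return 'the name "query" cannot currently be used'
    return None


def invalid_names(document):
    """Collect (name, problem) pairs, descending into nested documents and lists."""
    found = []
    if isinstance(document, dict):
        for name, value in document.items():
            problem = name_problem(name)
            if problem:
                found.append((name, problem))
            found.extend(invalid_names(value))
    elif isinstance(document, list):
        for value in document:
            found.extend(invalid_names(value))
    return found


user_doc = {"_id": 1, "a.b": 2, "nested": {"$inc": 1}}
print(invalid_names(user_doc))
# [('a.b', 'contains "."'), ('$inc', 'begins with "$"')]
```

A selector or modifier document would skip this check, since those documents need names such as "$where".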


Comments (2)


 
  1. Dec 19

    Anonymous says:

    The ruby example failed with method not found for serialize on @bson.

    It works when sending serialize/deserialize to BSON, examples in the tests: test_bson.rb

     def test_string
        doc = {'doc' => 'hello, world'}
        bson = bson = BSON.serialize(doc)
        assert_equal doc, BSON.deserialize(bson)
      end

    New example

    irb(main):013:0> require 'rubygems'
    irb(main):014:0> require 'mongo'
    irb(main):015:0> document = {:title => 'Intro', :date => Time.now, :tags => ['MongoDB', 'databases', 'nosql']}
    => {:tags=>["MongoDB", "databases", "nosql"], :date=>Sat Dec 19 10:56:28 -0800 2009, :title=>"Intro"}
    irb(main):016:0> sdoc = BSON.serialize(document)
    => #<ByteBuffer:0x101372880 @double_pack_order="E", @cursor=92, @int_pack_order="V", @buf="\\\000\000\000\004tags\0002\000\000\000\0020\000\b\000\000\000MongoDB\000\0021\000\n\000\000\000databases\000\0022\000\006\000\000\000nosql\000\000\tdate\000XEL\250%\001\000\000\002title\000\006\000\000\000Intro\000\000", @order=:little_endian>
    irb(main):017:0> BSON.deserialize(sdoc)
    => {"tags"=>["MongoDB", "databases", "nosql"], "date"=>Sat Dec 19 18:56:28 UTC 2009, "title"=>"Intro"}
    irb(main):018:0> 
    

    Adam

    1. Dec 19

      Mike Dirolf says:

      thanks - updated the example to match yours

