GridFS Specification

Introduction

GridFS is a specification for storing large objects in MongoDB. It splits each large object into chunks of data and stores those chunks alongside a metadata document that describes the object. This document specifies the requirements of a GridFS implementation.

Normally, you need not worry about the details of the format. For information on how to use GridFS, see Storing Files.

Specification

Storage Collections

GridFS uses two collections to store data:

  • files contains the object metadata
  • chunks contains the binary chunks with some additional accounting information

These are "subcollections" of a "root collection". The default root collection is fs, so a default GridFS store is referred to as fs and consists of the two collections fs.files and fs.chunks.
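
The naming rule above can be sketched as a small helper. This is only an illustration: the class and method names (GridFsNames, filesCollection, chunksCollection) are hypothetical and not part of any driver API.

```java
// Hypothetical helper, for illustration only: derives the two GridFS
// collection names from a root collection name.
final class GridFsNames {
    /** Default root collection named by the spec. */
    static final String DEFAULT_ROOT = "fs";

    private GridFsNames() {}

    /** Name of the collection holding object metadata, e.g. "fs.files". */
    static String filesCollection(String root) {
        return resolve(root) + ".files";
    }

    /** Name of the collection holding binary chunks, e.g. "fs.chunks". */
    static String chunksCollection(String root) {
        return resolve(root) + ".chunks";
    }

    // Fall back to the default root when none is given (an assumption of this sketch).
    private static String resolve(String root) {
        return (root == null || root.isEmpty()) ? DEFAULT_ROOT : root;
    }
}
```

With no root given, the helper yields fs.files and fs.chunks; with a root of contracts, it yields contracts.files and contracts.chunks.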

The root collection is allowed to vary so that a user can segment large objects into subsets. For example, one might partition objects by type, such as pdf, contracts, videos, etc.

However, fs is the default root collection for GridFS, and every GridFS implementation must support it in such a way that it need not be specified in order to perform GridFS operations. For example:

/*
 * default root collection usage - must be supported
 */
GridFS myFS = new GridFS(myDatabase);              // returns a default GridFS (i.e. "fs" root collection)
myFS.storeFile(new File("/tmp/largething.mpg"));   // saves the file into the "fs" GridFS store

/*
 * specified root collection usage - optional
 */

GridFS myContracts = new GridFS(myDatabase, "contracts");                    // returns a GridFS where "contracts" is root
myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf"));  // retrieves object whose filename is "smithco"

Note that the above API is for demonstration purposes only - this spec does not (at this time) recommend any API. See individual driver documentation for API specifics.

files

The structure of the object metadata document is as follows:

{
    "_id" : <unspecified>,                  // unique ID for this file
    "filename" : data_string,               // human name for the file
    "contentType" : data_string,            // valid mime type for the object
    "length" : data_number,                 // size of the file in bytes
    "chunkSize" : data_number,              // size of each of the chunks.  Default is 256k
    "uploadDate" : data_date,               // date when object first stored
    "aliases" : data_array of data_string,  // optional array of alias strings
    "metadata" : data_object,               // anything the user wants to store
    "md5" : data_string                     // result of running the "filemd5" command on this file's chunks
}

Note that the _id field can be of any type at the discretion of the spec implementor.
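
As a rough illustration of the structure above, the following sketch builds a files document as a plain map. All names here (GridFsFileDocument, build, md5Hex) are hypothetical. The _id is a UUID string purely as an example of an implementor-chosen type. The md5 is computed locally and stands in for the result of the "filemd5" command, on the assumption that the command yields the MD5 of the chunk data in order. The optional aliases and metadata fields are omitted.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: assembles the fields of a "files" document in memory.
final class GridFsFileDocument {
    /** Default chunk size from the spec: 256k bytes. */
    static final int DEFAULT_CHUNK_SIZE = 256 * 1024;

    private GridFsFileDocument() {}

    static Map<String, Object> build(String filename, String contentType,
                                     byte[] content, int chunkSize) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", UUID.randomUUID().toString()); // any type is allowed; a UUID string is this sketch's choice
        doc.put("filename", filename);
        doc.put("contentType", contentType);
        doc.put("length", (long) content.length);     // size of the file in bytes
        doc.put("chunkSize", chunkSize);
        doc.put("uploadDate", new Date());
        doc.put("md5", md5Hex(content));              // local stand-in for the "filemd5" command (assumption)
        return doc;
    }

    /** Lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A real implementation would hand this document to its driver's insert call on the files collection; that step is driver-specific and is not shown.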

chunks

The structure of the chunk document is as follows:

{
    "_id" : <unspecified>,                  // object id of the chunk in the chunks collection
    "files_id" : <unspecified>,             // _id value of the owning files collection entry
    "n" : data_number,                      // "chunk number" - chunks are numbered in order, starting with 0
    "data" : data_binary (type 0x02)        // binary data for chunk
}

Notes:

  • The _id is whatever type you choose
  • The files_id must contain the value of the _id field for the "owning" files collection entry
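
The chunk layout described above can be sketched as follows. This is a minimal in-memory illustration, not a driver implementation. GridFsChunker and split are hypothetical names, and the chunk _id is left out on the assumption that the database or driver assigns it on insert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits file content into chunk documents.
final class GridFsChunker {
    private GridFsChunker() {}

    static List<Map<String, Object>> split(Object filesId, byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Map<String, Object>> chunks = new ArrayList<>();
        int n = 0;       // chunk numbers start at 0
        int offset = 0;
        while (offset < content.length) {
            // The last chunk may be shorter than chunkSize.
            int len = Math.min(chunkSize, content.length - offset);
            Map<String, Object> chunk = new LinkedHashMap<>();
            chunk.put("files_id", filesId); // _id of the owning files document
            chunk.put("n", n++);
            chunk.put("data", Arrays.copyOfRange(content, offset, offset + len));
            chunks.add(chunk);
            offset += len;
        }
        return chunks;
    }
}
```

For a 10-byte object and a chunk size of 4, this produces three chunks numbered 0, 1, and 2, holding 4, 4, and 2 bytes respectively.
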

Indexing

GridFS implementations should create an index on

{ files_id:1, n:1}

in the chunks collection, and can then rely on being able to retrieve chunks efficiently via

db.fs.chunks.find({files_id: myFileID}).sort({n:1});
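
Once the chunks of a file have been fetched, the object is rebuilt by concatenating their data in order of n. The sketch below does this in memory. GridFsReassembler and reassemble are hypothetical names. It sorts defensively in case the chunks arrive unordered, although a query ordered by n would already return them in sequence.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuilds file content from its chunk documents.
final class GridFsReassembler {
    private GridFsReassembler() {}

    static byte[] reassemble(List<Map<String, Object>> chunks) {
        // Copy before sorting so the caller's list is left untouched.
        List<Map<String, Object>> ordered = new ArrayList<>(chunks);
        ordered.sort(Comparator.comparingInt(c -> (Integer) c.get("n")));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int expected = 0;
        for (Map<String, Object> chunk : ordered) {
            int n = (Integer) chunk.get("n");
            // Chunk numbers must run 0, 1, 2, ... with no gaps or repeats.
            if (n != expected) {
                throw new IllegalStateException("expected chunk " + expected + " but found " + n);
            }
            expected++;
            byte[] data = (byte[]) chunk.get("data");
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }
}
```

The gap check is a design choice of this sketch rather than a requirement stated by the spec; it surfaces a missing chunk early instead of silently returning truncated data.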
