|
The best way to store a tree usually depends on the operations you want to perform; see below for some different options. In practice, most developers find that one of the "Full Tree in Single Document", "Parent Links", and "Array of Ancestors" patterns works best. PatternsFull Tree in Single Document{
comments: [
{by: "mathias", text: "...", replies: []}
{by: "eliot", text: "...", replies: [
{by: "mike", text: "...", replies: []}
]}
]
}
Pros:
Cons:
Parent LinksStoring all nodes in a single collection, with each node having the id of its parent, is a simple solution. The biggest problem with this approach is getting an entire subtree requires several query turnarounds to the database (or use of db.eval). > t = db.tree1;
> t.find()
{ "_id" : 1 }
{ "_id" : 2, "parent" : 1 }
{ "_id" : 3, "parent" : 1 }
{ "_id" : 4, "parent" : 2 }
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }
> // find children of node 4
> t.ensureIndex({parent:1})
> t.find( {parent : 4 } )
{ "_id" : 5, "parent" : 4 }
{ "_id" : 6, "parent" : 4 }
Child LinksAnother option is storing the ids of all of a node's children within each node's document. This approach is fairly limiting, although ok if no operations on entire subtrees are necessary. It may also be good for storing graphs where a node has multiple parents. > t = db.tree2
> t.find()
{ "_id" : 1, "children" : [ 2, 3 ] }
{ "_id" : 2 }
{ "_id" : 3, "children" : [ 4 ] }
{ "_id" : 4 }
> // find immediate children of node 3
> t.findOne({_id:3}).children
[ 4 ]
> // find immediate parent of node 3
> t.ensureIndex({children:1})
> t.find({children:3})
{ "_id" : 1, "children" : [ 2, 3 ] }
Array of AncestorsHere we store all the ancestors of a node in an array. This makes a query like "get all descendents of x" fast and easy. > t = db.mytree;
> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> t.ensureIndex( { ancestors : 1 } )
> // find all descendents of b:
> t.find( { ancestors : 'b' })
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> // get all ancestors of f:
> anc = db.mytree.findOne({_id:'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
ensureIndex and MongoDB's multikey feature makes the above queries efficient. In addition to the ancestors array, we also stored the direct parent in the node to make it easy to find the node's immediate parent when that is necessary. Materialized Paths (Full Path in Each Node)Materialized paths make certain query options on trees easy. We store the full path to the location of a document in the tree within each node. Usually the "array of ancestors" approach above works just as well, and is easier as one doesn't have to deal with string building, regular expressions, and escaping of characters. (Theoretically, materialized paths will be slightly faster.) The best way to do this with MongoDB is to store the path as a string and then use regex queries. Simple regex expressions beginning with "^" can be efficiently executed. As the path is a string, you will need to pick a delimiter character -- we use ',' below. For example: > t = db.tree test.tree > // get entire tree -- we use sort() to make the order nice > t.find().sort({path:1}) { "_id" : "a", "path" : "a," } { "_id" : "b", "path" : "a,b," } { "_id" : "c", "path" : "a,b,c," } { "_id" : "d", "path" : "a,b,d," } { "_id" : "g", "path" : "a,b,g," } { "_id" : "e", "path" : "a,e," } { "_id" : "f", "path" : "a,e,f," } { "_id" : "g", "path" : "a,b,g," } > t.ensureIndex( {path:1} ) > // find the node 'b' and all its descendents: > t.find( { path : /^a,b,/ } ) { "_id" : "b", "path" : "a,b," } { "_id" : "c", "path" : "a,b,c," } { "_id" : "d", "path" : "a,b,d," } { "_id" : "g", "path" : "a,b,g," } > // find the node 'b' and its descendents, where path to 'b' is not already known: > nodeb = t.findOne( { _id : "b" } ) { "_id" : "b", "path" : "a,b," } > t.find( { path : new RegExp("^" + nodeb.path) } ) { "_id" : "b", "path" : "a,b," } { "_id" : "c", "path" : "a,b,c," } { "_id" : "d", "path" : "a,b,d," } { "_id" : "g", "path" : "a,b,g," } Ruby example: http://github.com/banker/newsmonger/blob/master/app/models/comment.rb acts_as_nested_setSee http://api.rubyonrails.org/classes/ActiveRecord/Acts/NestedSet/ClassMethods.html This pattern is best for datasets that rarely change as modifications can require changes to many documents. Use CasesFind Nodes by a Partial PathSuppose we want to find a given node in a tree, given some path through a portion of the tree to that node, and then get back that node, and perhaps also everything below it. With a materialized paths approach we can do the above. The main thing that needs tweaking is to make the operation fast if there is a path "a..b..c..d..e" to a document and we want to find documents with a path "..b..c..d..". If we are starting from the very top it is easy (and described above in the materialized paths section). However here we aren't starting at the top. One approach is to use a combination of materialized path plus an array of the node's ancestors, something like: { path : ",a,b,c,d,e,",
ancestor : ['a','b','c','d','e'] }
We could index on ancestors which will create a multikey index. Then we would do a query like the following to find nodes on path "...b,c,d..." with some efficiency: find({ path : /,b,c,d,/, ancestor : 'd', <more_query_expressions_optionally> })
In the above the index on ancestor would be used and only docs from 'd' down need be inspected. The following could be tried which might be even better depending on how smart the query optimizer is: find( { path : /,b,c,d,/, ancestor : { $all : ['a','d'] }, ... } )
See Also
|

PLEASE POST QUESTIONS IN THE FORUMS: http://groups.google.com/group/mongodb-user. Post tips and clarifications here.
blog comments powered by Disqus