|
Mongo includes utility functions which provide server-side count, distinct, and group by operations. More advanced aggregate functions can be crafted using MapReduce. Aggregation Frameworkv2.2+ This is a new module in the v2.2 release. You can test it via v2.1.0 in the 2.1 development (unstable) branch. Aggregation Framework Countcount() returns the number of objects in a collection or matching a query. If a document selector is provided, only the number of matching documents will be returned. size() is like count() but takes into consideration any limit() or skip() specified for the query. db.collection.count(selector); For example: print( "# of objects: " + db.mycollection.count() ); print( db.mycollection.count( {active:true} ); count is faster if an index exists for the condition in the selector. For example, to make the count on active fast, invoke db.mycollection.ensureIndex( {active:1} );
DistinctThe distinct command returns returns a list of distinct values for the given key across a collection. Command is of the form: { distinct : <collection_name>, key : <key>[, query : <query>] }
although many drivers have a helper function for distinct. > db.addresses.insert({"zip-code": 10010})
> db.addresses.insert({"zip-code": 10010})
> db.addresses.insert({"zip-code": 99701})
> // shell helper:
> db.addresses.distinct("zip-code");
[ 10010, 99701 ]
> // running as a command manually:
> db.runCommand( { distinct: 'addresses', key: 'zip-code' } )
{ "values" : [ 10010, 99701 ], "ok" : 1 }
distinct may also reference a nested key: > db.comments.save({"user": {"points": 25}})
> db.comments.save({"user": {"points": 31}})
> db.comments.save({"user": {"points": 25}})
> db.comments.distinct("user.points");
[ 25, 31 ]
You can add an optional query parameter to distinct as well > db.address.distinct( "zip-code" , { age : 30 } )
Note: the distinct command results are returned as a single BSON object. If the results could be large (> max document size – 4/16MB ), use map/reduce instead.
GroupNote: currently one must use map/reduce instead of group() in sharded MongoDB configurations. group returns an array of grouped items. The command is similar to SQL's group by. The SQL statement select a,b,sum(c) csum from coll where active=1 group by a,b corresponds to the following in MongoDB: db.coll.group(
{key: { a:true, b:true },
cond: { active:1 },
reduce: function(obj,prev) { prev.csum += obj.c; },
initial: { csum: 0 }
});
Note: the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception. For larger grouping operations without limits, please use map/reduce . group takes a single object parameter containing the following fields:
To order the grouped data, simply sort it client-side upon return. The following example is an implementation of count() using group(). function gcount(collection, condition) {
var res =
db[collection].group(
{ key: {},
initial: {count: 0},
reduce: function(obj,prev){ prev.count++;},
cond: condition } );
// group() returns an array of grouped items. here, there will be a single
// item, as key is {}.
return res[0] ? res[0].count : 0;
}
ExamplesThe examples assume data like this: { domain: "www.mongodb.org"
, invoked_at: {d:"2009-11-03", t:"17:14:05"}
, response_time: 0.05
, http_action: "GET /display/DOCS/Aggregation"
}
Show me stats for each http_action in November 2009: db.test.group(
{ cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
, key: {http_action: true}
, initial: {count: 0, total_time:0}
, reduce: function(doc, out){ out.count++; out.total_time+=doc.response_time }
, finalize: function(out){ out.avg_time = out.total_time / out.count }
} );
[
{
"http_action" : "GET /display/DOCS/Aggregation",
"count" : 1,
"total_time" : 0.05,
"avg_time" : 0.05
}
]
Show me stats for each domain for each day in November 2009: db.test.group(
{ cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
, key: {domain: true, invoked_at.d: true}
, initial: {count: 0, total_time:0}
, reduce: function(doc, out){ out.count++; out.total_time+=doc.response_time }
, finalize: function(out){ out.avg_time = out.total_time / out.count }
} );
[
{
"http_action" : "GET /display/DOCS/Aggregation",
"count" : 1,
"total_time" : 0.05,
"avg_time" : 0.05
}
]
Using Group from Various LanguagesSome language drivers provide a group helper function. For those that don't, one can manually issue the db command for group. Here's an example using the Mongo shell syntax: > db.foo.find()
{"_id" : ObjectId( "4a92af2db3d09cb83d985f6f") , "x" : 1}
{"_id" : ObjectId( "4a92af2fb3d09cb83d985f70") , "x" : 3}
{"_id" : ObjectId( "4a92afdab3d09cb83d985f71") , "x" : 3}
> db.$cmd.findOne({group : {
... ns : "foo",
... cond : {},
... key : {x : 1},
... initial : {count : 0},
... $reduce : function(obj,prev){prev.count++;}}})
{"retval" : [{"x" : 1 , "count" : 1},{"x" : 3 , "count" : 2}] , "count" : 3 , "keys" : 2 , "ok" : 1}
If you use the database command with keyf (instead of key) it must be prefixed with a $. For example: db.$cmd.findOne({group : {
... ns : "foo",
... $keyf : function(doc) { return {"x" : doc.x}; },
... initial : {count : 0},
... $reduce : function(obj,prev) { prev.count++; }}})
Map/ReduceMongoDB provides a MapReduce facility for more advanced aggregation needs. CouchDB users: please note that basic queries in MongoDB do not use map/reduce. See Also
|

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.
blog comments powered by Disqus