Tips for troubleshooting map/reduce.
Troubleshooting the map function
We can troubleshoot the map function in the shell by defining a test emit() function that prints out trace information.
For example, suppose we have the following data:
> db.articles.find()
{ "_id" : 123, "author" : "joe", "text" : "hello", "votes" : [
{
"who" : "john",
"vote" : 1
},
{
"who" : "jane",
"vote" : 1
},
{
"who" : "vince",
"vote" : -1
}
] }
{ "_id" : 127, "author" : "sri", "text" : "It was...", "votes" : [
{ "who" : "jane",
"vote" : 2
}
] }
And we have written a map function:
function map() {
this.votes.forEach( function(x){emit(x.who,1);} );
}
It would be nice to see the output of this function. We can do this in the shell by defining a client-side debug version of emit():
function emit(k, v) {
print("emit");
print(" k:" + k + " v:" + tojson(v));
}
For example, we could run the map function against a single document from the collection:
> x = db.articles.findOne();
> map.apply(x);
emit
k:john v:1
emit
k:jane v:1
emit
k:vince v:1
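If you would rather check the emitted pairs in code than read printed output, the debug emit() can record them in an array instead. This is a minimal sketch, not part of any MongoDB API: `emitted` and `runMap` are hypothetical names, and the sample document is copied from the data above. It relies only on `Function.prototype.apply` to bind `this` to the document, just as `map.apply(x)` does in the shell session.

```javascript
// Hypothetical debug emit(): records each (key, value) pair instead of printing it.
var emitted = [];
function emit(k, v) {
  emitted.push({ key: k, value: v });
}

// The map function from above; `this` is the document being mapped.
function map() {
  this.votes.forEach(function (x) { emit(x.who, 1); });
}

// Hypothetical helper: runs map against one document and returns what it emitted.
function runMap(doc) {
  emitted = [];
  map.apply(doc);
  return emitted;
}

// Sample document with _id 127 from the articles collection above.
var pairs = runMap({ _id: 127, author: "sri", text: "It was...",
                     votes: [ { who: "jane", vote: 2 } ] });
// pairs is [ { key: "jane", value: 1 } ]
```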
We can also apply the map function to several documents, printing each document's _id before its emits:
> for( var c = db.articles.find(); c.hasNext(); ) {
... var doc = c.next();
... print("document _id=" + tojson(doc._id));
... map.apply( doc );
... print();
... }
document _id=123
emit
k:john v:1
emit
k:jane v:1
emit
k:vince v:1
document _id=127
emit
k:jane v:1
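Running map over every document also lets us preview what reduce will be handed: the emitted values grouped by key. The sketch below approximates that grouping locally. The `articles` array copies the sample data above, `groups` is a hypothetical name, and keys are assumed to be strings. It is only an approximation: the server may split one key's values into several batches and call reduce on each, so reduce must not rely on seeing all values at once.

```javascript
// Sample documents copied from the articles collection above.
var articles = [
  { _id: 123, author: "joe", text: "hello",
    votes: [ { who: "john", vote: 1 }, { who: "jane", vote: 1 }, { who: "vince", vote: -1 } ] },
  { _id: 127, author: "sri", text: "It was...", votes: [ { who: "jane", vote: 2 } ] }
];

// The map function from above.
function map() {
  this.votes.forEach(function (x) { emit(x.who, 1); });
}

// Hypothetical debug emit(): appends each value to the list for its (string) key.
var groups = {};
function emit(k, v) {
  if (!Object.prototype.hasOwnProperty.call(groups, k)) groups[k] = [];
  groups[k].push(v);
}

articles.forEach(function (doc) { map.apply(doc); });
// groups is { john: [1], jane: [1, 1], vince: [1] }
```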
After verifying that map emits what we expect, we write a reduce function and run the real job:
> function reduce(k, vals) {
... var sum = 0;
... for (var i = 0; i < vals.length; i++) {
... sum += vals[i];
... }
... return sum;
...}
>
> db.articles.mapReduce(map,reduce,"out");
{
"result" : "out",
"timeMillis" : 62,
"counts" : {
"input" : 2,
"emit" : 4,
"output" : 3
},
"ok" : 1
}
>
> db.out.find()
{ "_id" : "jane", "value" : 2 }
{ "_id" : "john", "value" : 1 }
{ "_id" : "vince", "value" : 1 }
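The whole job can also be rehearsed locally on a few documents and compared with what ends up in db.out. This is a simplified, single-pass sketch, not MongoDB's implementation: `simulateMapReduce` is a hypothetical helper, the documents are abbreviated copies of the sample data, and results are sorted by key for easy comparison. It mirrors MongoDB's documented behavior of not calling reduce for a key that has only one value, which is one reason the emitted value and the reduce output need the same format.

```javascript
// map and reduce as defined above.
function map() {
  this.votes.forEach(function (x) { emit(x.who, 1); });
}
function reduce(k, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; i++) {
    sum += vals[i];
  }
  return sum;
}

// Hypothetical debug emit(): collects values by (string) key into `groups`.
var groups = {};
function emit(k, v) {
  if (!Object.prototype.hasOwnProperty.call(groups, k)) groups[k] = [];
  groups[k].push(v);
}

// Hypothetical helper: maps every document, then reduces each key's values.
// A key with a single value skips reduce, as MongoDB does.
function simulateMapReduce(docs) {
  groups = {};
  docs.forEach(function (doc) { map.apply(doc); });
  return Object.keys(groups).sort().map(function (k) {
    var vals = groups[k];
    return { _id: k, value: vals.length === 1 ? vals[0] : reduce(k, vals) };
  });
}

var out = simulateMapReduce([
  { _id: 123, votes: [ { who: "john", vote: 1 }, { who: "jane", vote: 1 }, { who: "vince", vote: -1 } ] },
  { _id: 127, votes: [ { who: "jane", vote: 2 } ] }
]);
// out is [ { _id: "jane", value: 2 }, { _id: "john", value: 1 }, { _id: "vince", value: 1 } ]
```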
Troubleshooting the reduce function
When troubleshooting the reduce function, problems usually crop up in two places:
- emit() outputting values in a different format than reduce returns
- reduce( k, [A, B] ) != reduce( k, [B, A] )
Fortunately, it is easy to test for both of these cases directly from the shell.
Note: when performing a reduce, there is no guarantee about the order of the incoming values.
#1 - Test value format
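Run reduce on a sample key/value pair from emit, wrapping the value in an array. The output of reduce should have the same format as the input; in most cases it should actually be identical. The examples in this section assume a reduce whose values are objects of the form { votes : n }, not the summing reduce shown above.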
> reduce( { name : 'joe' }, [ { votes : 1 } ] )
{ votes : 1 }
The same can also be tested with two values. The format should still be the same.
> reduce( { name : 'joe' }, [ { votes : 1 }, { votes : 3 } ] )
{ votes : 4 }
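The format test can also be scripted rather than checked by eye. This is a sketch under assumptions: `reduce` here is a hypothetical implementation that sums { votes : n } values to match the examples above, and `sameShape` is a hypothetical helper that compares top-level field names, which is only a rough proxy for "same format".

```javascript
// Hypothetical reduce for values shaped like { votes: n }.
function reduce(key, vals) {
  var total = 0;
  for (var i = 0; i < vals.length; i++) {
    total += vals[i].votes;
  }
  return { votes: total };
}

// Hypothetical helper: for objects, true when both have the same top-level field
// names; for anything else (numbers, strings, null), compares type only.
function sameShape(a, b) {
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") {
    return typeof a === typeof b && (a === null) === (b === null);
  }
  var ka = Object.keys(a).sort();
  var kb = Object.keys(b).sort();
  return ka.length === kb.length &&
         ka.every(function (name, i) { return name === kb[i]; });
}

var emitted = { votes: 1 };                                   // a value as map would emit it
var one = reduce({ name: "joe" }, [emitted]);                 // { votes: 1 }
var two = reduce({ name: "joe" }, [emitted, { votes: 3 }]);   // { votes: 4 }
var formatOk = sameShape(emitted, one) && sameShape(emitted, two);  // true
```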
#2 - Test Commutativity / Idempotence
Again, there are two simple tests that should pass.
The order of the values should not matter:
> reduce( { name : 'joe' }, [ { votes : 1 }, { votes : 3 } ] )
{ votes : 4 }
> reduce( { name : 'joe' }, [ { votes : 3 }, { votes : 1 } ] )
{ votes : 4 }
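With more than two values, trying every order by hand gets tedious. The sketch below, again assuming a hypothetical { votes : n } reduce, tries every permutation of a small value array and compares results with `JSON.stringify`. The helpers `permutations` and `isOrderIndependent` are hypothetical, the string comparison assumes reduce builds its result fields in a fixed order, and `keepFirst` is a deliberately order-dependent reduce included only to show the check failing.

```javascript
// Hypothetical reduce for values shaped like { votes: n }.
function reduce(key, vals) {
  var total = 0;
  for (var i = 0; i < vals.length; i++) total += vals[i].votes;
  return { votes: total };
}

// Hypothetical helper: every ordering of arr (only practical for a handful of values).
function permutations(arr) {
  if (arr.length <= 1) return [arr.slice()];
  var result = [];
  arr.forEach(function (item, i) {
    var rest = arr.slice(0, i).concat(arr.slice(i + 1));
    permutations(rest).forEach(function (p) { result.push([item].concat(p)); });
  });
  return result;
}

// Hypothetical helper: true when reduceFn gives the same result for every ordering.
function isOrderIndependent(reduceFn, key, vals) {
  var expected = JSON.stringify(reduceFn(key, vals));
  return permutations(vals).every(function (p) {
    return JSON.stringify(reduceFn(key, p)) === expected;
  });
}

// A deliberately order-dependent reduce, for contrast.
function keepFirst(key, vals) { return vals[0]; }

var good = isOrderIndependent(reduce, { name: "joe" },
                              [ { votes: 1 }, { votes: 3 }, { votes: -2 } ]);      // true
var bad = isOrderIndependent(keepFirst, { name: "joe" }, [ { votes: 1 }, { votes: 3 } ]);  // false
```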
The output of reduce can be passed back into reduce (re-reduced) along with other values:
> reduce( { name : 'joe' }, [
{ votes : 1 },
reduce ( { name : 'joe' }, [ { votes : 3 } ] )
] )
{ votes : 4 }
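The re-reduce test generalizes the same way: for every split point, reduce part of the values first, then feed that partial result back in with the rest and compare against reducing everything at once. This sketch assumes the same hypothetical { votes : n } reduce; `isReReducible` is a hypothetical helper, and the averaging reduce is deliberately broken (an average of averages is not the overall average) to show the check failing.

```javascript
// Hypothetical reduce for values shaped like { votes: n }.
function reduce(key, vals) {
  var total = 0;
  for (var i = 0; i < vals.length; i++) total += vals[i].votes;
  return { votes: total };
}

// Hypothetical helper: for each split point i, reduces vals[i..] first and then
// reduces that partial result together with vals[0..i-1]. Split 0 checks that
// reduce(k, [reduce(k, vals)]) equals reduce(k, vals).
function isReReducible(reduceFn, key, vals) {
  var expected = JSON.stringify(reduceFn(key, vals));
  for (var i = 0; i < vals.length; i++) {
    var partial = reduceFn(key, vals.slice(i));
    var mixed = vals.slice(0, i).concat([partial]);
    if (JSON.stringify(reduceFn(key, mixed)) !== expected) return false;
  }
  return true;
}

// A deliberately broken reduce that averages instead of summing.
function average(key, vals) {
  var total = 0;
  for (var i = 0; i < vals.length; i++) total += vals[i].votes;
  return { votes: total / vals.length };
}

var vals = [ { votes: 1 }, { votes: 2 }, { votes: 3 } ];
var good = isReReducible(reduce, { name: "joe" }, vals);   // true
var bad = isReReducible(average, { name: "joe" }, vals);   // false
```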