Troubleshooting MapReduce

Tips on troubleshooting map/reduce.

Troubleshooting the map function

We can troubleshoot the map function in the shell by defining a test emit() function that prints trace information.

For example, suppose we have some data:

> db.articles.find()
{ "_id" : 123, "author" : "joe", "text" : "hello", "votes" : [
        {
                "who" : "john",
                "vote" : 1
        },
        {
                "who" : "jane",
                "vote" : 1
        },
        {
                "who" : "vince",
                "vote" : -1
        }
] }
{ "_id" : 127, "author" : "sri", "text" : "It was...", "votes" : [ 
        { "who" : "jane", 
          "vote" : 2 
        } 
] }

And we have written a map function:

function map() {
    this.votes.forEach( function(x){emit(x.who,1);} );
}

It would be nice to see what this function emits. We can do this in the shell by defining a client-side debug version of emit():

function emit(k, v) {
    print("emit");
    print("  k:" + k + " v:" + tojson(v));
}

For example, we could run the map function on a single document from the collection:

> x = db.articles.findOne(); // grab an object
> map.apply(x); // call our map function, client side, with x as 'this'
emit
  k:john v:1
emit
  k:jane v:1
emit
  k:vince v:1
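
If the trace gets long, a variation is to have the debug emit() record each pair instead of printing it, so the results can be inspected or compared afterwards. The sketch below is only an illustration: the array named emitted is hypothetical (not part of MongoDB), and a literal copy of the first sample document stands in for findOne().

```javascript
// Literal copy of the first sample document, standing in for db.articles.findOne().
var doc = { _id: 123, author: "joe", text: "hello", votes: [
    { who: "john", vote: 1 },
    { who: "jane", vote: 1 },
    { who: "vince", vote: -1 }
] };

// Hypothetical buffer that collects every emitted key/value pair.
var emitted = [];

// Debug emit(): records the pair instead of printing it.
function emit(k, v) {
    emitted.push({ key: k, value: v });
}

// The map function from above, unchanged.
function map() {
    this.votes.forEach(function (x) { emit(x.who, 1); });
}

// Call map client side with doc as 'this'.
map.apply(doc);
```

After the call, emitted holds one entry per element of votes, each with the value 1.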

We could also apply the map function to several documents:

> for( var c = db.articles.find(); c.hasNext(); ) {
...     var doc = c.next();
...     print("document _id=" + tojson(doc._id));
...     map.apply( doc );
...     print();
... }
document _id=123
emit
  k:john v:1
emit
  k:jane v:1
emit
  k:vince v:1

document _id=127
emit
  k:jane v:1
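
Because reduce receives the values emitted for a key as an array, it can help to group the trace by key before writing reduce. A minimal sketch, assuming a hypothetical groups object and literal copies of the two sample documents in place of a cursor:

```javascript
// Literal copies of the two sample documents, standing in for db.articles.find().
var docs = [
    { _id: 123, author: "joe", text: "hello", votes: [
        { who: "john", vote: 1 },
        { who: "jane", vote: 1 },
        { who: "vince", vote: -1 }
    ] },
    { _id: 127, author: "sri", text: "It was...", votes: [
        { who: "jane", vote: 2 }
    ] }
];

// Hypothetical container: maps each emitted key to the array of its values.
var groups = {};

// Debug emit(): appends the value to the array for its key.
function emit(k, v) {
    if (!groups.hasOwnProperty(k)) {
        groups[k] = [];
    }
    groups[k].push(v);
}

// The map function from above, unchanged.
function map() {
    this.votes.forEach(function (x) { emit(x.who, 1); });
}

// Apply map to every document, client side.
docs.forEach(function (doc) { map.apply(doc); });
```

Here groups ends up with two values for jane and one each for john and vince. In a real job MongoDB may split the values for one key across several reduce calls, so this grouping is only an approximation of what reduce sees.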

After verifying the emits from map are as expected, we write a reduce function and run the real job:

> function reduce(k, vals) {
...    var sum = 0;
...    for (var i = 0; i < vals.length; i++) {
...        sum += vals[i];
...    }
...    return sum;
...}
>
> db.articles.mapReduce(map,reduce,"out");
{
        "result" : "out",
        "timeMillis" : 62,
        "counts" : {
                "input" : 2,
                "emit" : 4,
                "output" : 3
        },
        "ok" : 1,
}
>
> db.out.find()
{ "_id" : "jane", "value" : 2 }
{ "_id" : "john", "value" : 1 }
{ "_id" : "vince", "value" : 1 }
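
The same result can be predicted client side before running the job, by applying reduce to the values emitted for each key. This sketch hard-codes the groups read off the map trace above (jane was emitted twice, john and vince once); the names groups and predicted are illustrative only.

```javascript
// The reduce function from above, written with an indexed loop.
function reduce(k, vals) {
    var sum = 0;
    for (var i = 0; i < vals.length; i++) {
        sum += vals[i];
    }
    return sum;
}

// Values per key, taken from the map trace for the two sample documents.
var groups = { jane: [1, 1], john: [1], vince: [1] };

// Hypothetical result object: key -> reduced value.
var predicted = {};
for (var key in groups) {
    predicted[key] = reduce(key, groups[key]);
}
```

The predicted values (2 for jane, 1 for john, 1 for vince) agree with the contents of the out collection. Note that MongoDB does not call reduce for a key that has only a single value, which is one more reason the output of reduce must look like an emitted value.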

Troubleshooting the reduce function

When troubleshooting the reduce function, problems usually crop up in two places:

  1. emit() outputting values in a different format than reduce returns
  2. reduce( k, [A, B] ) != reduce( k, [B, A] )

Fortunately, it is easy to test for both of these cases directly from the shell.

MongoDB does not guarantee the order in which values arrive at reduce, which is why the second case matters.

#1 - Test value format

Run reduce on a sample key and value from emit, wrapping the value in an array. The output of reduce should have the same format as the input value. For a single value, it should in most cases be identical to that value.

> reduce( { name : 'joe' }, [ { votes : 1 } ] )
{ votes : 1 }

The same can also be tested with two values. The format should still be the same.

> reduce( { name : 'joe' }, [ { votes : 1 }, { votes : 3 } ] )
{ votes : 4 }
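
The reduce being called in these tests is not shown on this page, and it cannot be the one from the previous section, since its values are objects rather than numbers. A minimal sketch of a reduce that would produce the outputs above, assuming emitted values of the form { votes : n }:

```javascript
// Hypothetical reduce: sums the votes field and returns an object
// in the same format as the emitted values.
function reduce(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].votes;
    }
    return { votes: total };
}

// The two format tests from above.
var single = reduce({ name: "joe" }, [ { votes: 1 } ]);
var pair = reduce({ name: "joe" }, [ { votes: 1 }, { votes: 3 } ]);
```

Returning a bare number here instead of { votes : total } would fail the format test, because a later reduce call could then receive a number where it expects an object.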

#2 - Test Commutativity / Idempotence

Again, two simple tests should pass.

Order of the objects should not matter:

> reduce( { name : 'joe' }, [ { votes : 1 }, { votes : 3 } ] )
{ votes : 4 }
> reduce( { name : 'joe' }, [ { votes : 3 }, { votes : 1 } ] )
{ votes : 4 }
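
The order test can be wrapped in a small helper so it is easy to rerun with different sample values. This is a sketch under stated assumptions: isOrderIndependent and firstWins are hypothetical names, the reduce shown is an assumed implementation that sums { votes : n } values, and JSON.stringify is assumed to be available (in older shells, tojson() could be substituted).

```javascript
// Hypothetical reduce under test: sums values shaped like { votes : n }.
function reduce(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].votes;
    }
    return { votes: total };
}

// Deliberately broken reduce, for contrast: the result depends on
// which value happens to come first.
function firstWins(key, values) {
    return values[0];
}

// Hypothetical helper: runs reduceFn on the values in the given order
// and in reverse order, then compares the serialized results.
function isOrderIndependent(reduceFn, key, values) {
    var forward = reduceFn(key, values.slice());
    var backward = reduceFn(key, values.slice().reverse());
    return JSON.stringify(forward) === JSON.stringify(backward);
}

var sample = [ { votes: 1 }, { votes: 3 } ];
var sumResult = isOrderIndependent(reduce, { name: "joe" }, sample);
var firstResult = isOrderIndependent(firstWins, { name: "joe" }, sample);
```

With these samples the summing reduce passes and firstWins fails. Reversing the array checks only one alternative ordering, so a pass is a sanity check rather than a proof.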

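Reduce output can be re-reduced. MongoDB may call reduce more than once for the same key, passing the result of an earlier call back in as one of the values: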

> reduce( { name : 'joe' }, [ 
    { votes : 1 }, 
    reduce ( { name : 'joe' }, [ { votes : 3 } ] )
  ] )
{ votes : 4 }
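
The re-reduce test can be generalized by reducing part of the values first and feeding that partial result back in. The helper below is a sketch, not a MongoDB facility: isReReducible and countValues are hypothetical names, the reduce shown is an assumed implementation that sums { votes : n } values, and JSON.stringify is assumed to be available.

```javascript
// Hypothetical reduce under test: sums values shaped like { votes : n }.
function reduce(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].votes;
    }
    return { votes: total };
}

// Deliberately broken reduce, for contrast: counting the values
// gives a different answer once a partial result is fed back in.
function countValues(key, values) {
    return { votes: values.length };
}

// Hypothetical helper: for every split point, reduces the tail first,
// then reduces the head together with that partial result, and checks
// that the outcome matches a single reduce over all values.
function isReReducible(reduceFn, key, values) {
    var direct = JSON.stringify(reduceFn(key, values));
    for (var split = 1; split < values.length; split++) {
        var partial = reduceFn(key, values.slice(split));
        var combined = reduceFn(key, values.slice(0, split).concat([partial]));
        if (JSON.stringify(combined) !== direct) {
            return false;
        }
    }
    return true;
}

var sample = [ { votes: 1 }, { votes: 3 }, { votes: 5 } ];
var sumResult = isReReducible(reduce, { name: "joe" }, sample);
var countResult = isReReducible(countValues, { name: "joe" }, sample);
```

With these samples the summing reduce passes and countValues fails, since a partial count is later counted as a single value. A counting job would instead emit { votes : 1 } from map and sum in reduce.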
