Query Optimizer

The MongoDB query optimizer generates query plans for each query and then executes the plan to return results. Thus, MongoDB supports ad hoc queries much like say, Oracle or MySQL.

The database uses an interesting approach to query optimization. Traditional approaches (which tend to be cost-based and statistical) are not used, as these approaches have a couple of potential issues.

First, the optimizer might consistently pick a bad query plan.  For example, there might be correlations in the data of which the optimizer is unaware.  In a situation like this, the developer might use a query hint.

Also with the traditional approach, query plans can change in production with negative results.  No one thinks rolling out new code without testing is a good idea.  Yet often in a production system a query plan can change as the statistics in the database change on the underlying data.  The query plan in effect may be a plan that never was invoked in QA.  If it is slower than it should be, the application could experience an outage.

The MongoDB query optimizer is different.  It is not cost based -- it does not model the cost of various queries.  Instead, the optimizer simply tries different query plans and learn which ones work well.  Of course, when the system tries a really bad plan, it may take an extremely long time to run.  To solve this, when testing new plans, MongoDB executes multiple query plans in parallel.  As soon as one finishes, it terminates the other executions, and the system has learned which plan is good. This works particularly well given the system is non-relational, which makes the space of possible query plans much smaller (as there are no joins).

Sometimes a plan which was working well can work poorly -- for example if the data in the database has changed, or if the parameter values to the query are different.  In this case, if the query seems to be taking longer than usual, the database will once again run the query in parallel to try different plans.

This approach adds a little overhead, but has the advantage of being much better at worst-case performance.

Testing of queries repeats after 1,000 operations and also after certain manipulations of a collection occur (such as adding an index).

Query plan selection is based on a "query pattern". For example the query

{ x : 123 } 

is treated as the same pattern as { x : <anothervalue> }.

See Also


Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.

blog comments powered by Disqus