|
We get a lot of questions about indexing. Here we provide answers to a number of these. There are a couple of points to keep in mind, though. First, indexes in MongoDB work quite similarly to indexes in MySQL, and thus many of the techniques for building efficient indexes in MySQL apply to MongoDB. Second, and even more importantly, know that advice on indexing can only take you so far. The best indexes for your application should always be based on a number of important factors, including the kinds of queries you expect, the ratio of reads to writes, and even the amount of free memory on your system. This means that the best strategy for designing indexes will always be to profile a variety of index configurations with data sets similar to the ones you'll be running in production, and see which perform best. There's no substitute for good empirical analyses. Note: if you're brand new to indexing, you may want to read this introductory article first.
Indexing StrategiesHere are some general principles for building smart indexes. Create indexes to match your queries.If you only query on a single key, then a single-key index will do. For instance, maybe you're searching for a blog post's slug: db.posts.find({ slug : 'state-of-mongodb-2010' })
In this case, a unique index on a single key is best:
db.posts.ensureIndex({ slug: 1 }, {unique: true});
However, it's common to query on multiple keys and to sort the results. For these situations, compound indexes are best. Here's an example for querying the latest comments with a 'mongodb' tag: db.comments.find({ tags : 'mongodb'}).sort({ created_at : -1 });
And here's the proper index: db.comments.ensureIndex({tags : 1, created_at : -1});
Note that if we wanted to sort by created_at ascending, this index would be less effective. One index per query.It's sometimes thought that queries on multiple keys can use multiple indexes; this is not the case with MongoDB. If you have a query that selects on multiple keys, and you want that query to use an index efficiently, then a compound-key index is necessary. Make sure your indexes can fit in RAM.The shell provides a command for returning the total index size on a given collection: db.comments.totalIndexSize(); 65443 If your queries seem sluggish, you should verify that your indexes are small enough to fit in RAM. For instance, if you're running on 4GB RAM and you have 3GB of indexes, then your indexes probably aren't fitting in RAM. You may need to add RAM and/or verify that all the indexes you've created are actually being used. Be careful about single-key indexes with low selectivity.Suppose you have a field called 'status' where the possible values are 'new' and 'processeed'. If you add an index on 'status' then you've created a low-selectivity index, meaning that the index isn't going to be very helpful in locating records and might just be be taking up space. A better strategy, depending on your queries, of course, would be to create a compound index that includes the low-selectivity field. For instance, you could have a compound-key index on 'status' and 'created_at.' Another option, again depending on your use case, might be to use separate collections, one for each status. As with all the advice here, experimentation and benchmarks will help you choose the best approach. Use explain.MongoDB includes an explain command for determining how your queries are being processed and, in particular, whether they're using an index. explain can be used from of the drivers and also from the shell:
db.comments.find({ tags : 'mongodb'}).sort({ created_at : -1 }).explain();
This will return lots of useful information, including the number of items scanned, the time the query takes to process in milliseconds, which indexes the query optimizer tried, and the index ultimately used. If you've never used explain, now's the time to start. Understanding explain's output.There are three main fields to look for when examining the explain command's output:
Pay attention to the read/write ratio of your application.This is important because, whenever you add an index, you add overhead to all insert, update, and delete operations on the given collection. If your application is read-heavy, as are most web applications, the additional indexes are usually a good thing. But if your application is write-heavy, then be careful when creating new indexes, since each additional index with impose a small write-performance penalty. In general, don't be cavalier about adding indexes. Indexes should be added to complement your queries. Always have a good reason for adding a new index, and make sure you've benchmarked alternative strategies. Indexing PropertiesHere are a few properties of compound indexes worth keeping in mind (Thanks to Doug Green and Karoly Negyesi for their help on this). These examples assume a compound index of three fields: a, b, c. So our index creation would look like this:
db.foo.ensureIndex({a: 1, b: 1, c: 1})
Here's some advice on using an index like this:
1. The sort column must be the last column used in the index.Good:
Bad:
2. The range query must also be the last column in an index. This is an axiom of 1 above.Good:
Bad:
3. Only use a range query or sort on one column.Good:
Bad:
4. Conserve indexes by re-ordering columns used on equality (non-range) queries.Imagine you have the following two queries:
A single index defined on a, b, c, and d can be used for both queries. 5. MongoDB's $ne or $nin operator's aren't efficient with indexes.
FAQI've started building an index, and the database has stopped responding. What's going on? What do I do?Building an index can be an IO-intensive operation, especially you have a large collection. This is true on any database system that supports secondary indexes, including MySQL. If you'll need to build an index on a large collection in the future, you'll probably want to consider building the index in the background, a feature available beginning with 1.3.2. See the docs on background indexing for more info. As for the long-building index, you only have a few options. You can either wait for the index to finish building or kill the current operation (see killOp()). If you choose the latter, the partial index will be deleted. I'm using $ne or $nin in a query, and while it uses the index, it's still slow. What's happening?The problem with $ne and $nin is that much of an index will match queries like these. If you need to use $nin, it's often best to make sure that an additional, more selective criterion is part of the query. |

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.
blog comments powered by Disqus