Indexing MongoDB collections using the ESR rule
Database indexes are an important piece when building your product. In this article, you will learn the main components to correctly index your MongoDB collection based on the queries executed in your app.
PLAYING WITH INDEXES
When creating indexes to cover queries, our goal is to improve the performance of the queries being executed. Let's suppose that we have a collection called products with 1 million documents with the following schema:
{ name: "Fish", category: "food", price: 10, reviewStars: 5, }
Running a query without any indexes:
db.products .find({ name: 'Fish', reviewStars: { $gt: 3 }, }) .sort({ price: 1 })
The result is bad, as imagined. The query took 199ms, MongoDB needed to examine the whole collection (1M documents/COLLSCAN) and then execute an in-memory sort. Couldn’t be worse.
Creating a naive index:
db.products.createIndex({ reviewStars: 1, name: 1, price: 1 });
When executing the query again, we notice that now the results are better, the query took 61ms and only examined 13964, returning 13960. Note that now we are also examining keys instead of documents, which means that we are doing an index scan instead of a collection one.
EQUALITIES FIRST
Creating a new index placing the equality field before the range one:
db.products.createIndex({ name: 1, reviewStars: 1, price: 1 });
Better than our previous result. The query executed in a similar time, but It gave us a 1:1 return ratio on our query (keys examined = documents returned). This scenario is due to the fact that equalities are more selective than ranges.
An important note is that if you have multiple equalities fields, it won’t matter what you put first due to the disposition of B-Tree indexes. The number of combinations will be the same regardless of the keys order.
There is one problem remaining: We are still doing an in-memory sort.
SORT BEFORE RANGE
On the previous run, we saw that we are still sorting the returned documents in memory. This happens when one of the predicate fields is a range.
Ranges before sort steps lead MongoDB to perform a blocking-sort since we first need to filter the range, to only then be able to sort the results returned from the previous step.
When placing the sort step before the range, we make sure that the results will be in the correct order due to the elements disposition inside the index, and then we execute the range filter. This may cause more keys to be examined, but removing blocking sorts will generally provide more performance.
Creating the correct index:
db.products.createIndex({ name: 1, price: 1, reviewStars: 1 });
As we can see, the number of keys examined is bigger than in the previous run. However, due to the blocking-sort step removal, we have a much better query execution time of 32ms.
If you can only extract one information from this article, it should be: Equalities → Sort → Range.