Database Indexing in MongoDB

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. I will be explaining its use via MongoDB-Compass.

I worked on a game server that had spaghetti code, a lot. One among them had leaderboard service. We tested it with real 50 players. It was fine until our players count reached more than 20,000. And searching or updating data took longer than expected. We faced many lags. The player's info wasn't updated upon game end, simple query stacked up and took a long time to get executed. We viewed logs on the server, dropped down our rank calculating feature and still faced the same issues but to a lesser degree. So we tried adopting indexes in the database and implemented them. And it worked and worked quite well. So let's learn about them.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. - Wikipedia

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. I will be explaining its use via MongoDB-Compass.

Example

let's insert dummy data into MongoDB. In your MongoDB Compass, you can create a database and a collection within the database, then import the following JSON.

[
	{
		"name": "apple",
		"address": "here",
		"age": 26
	},
	{
		"name": "ball",
		"address": "there",
		"age": 20
	},
	{
		"name": "cat",
		"address": "where",
		"age": 26
	},
	{
		"name": "dog",
		"address": "here",
		"age": 20
	},
	{
		"name": "eagle",
		"address": "here",
		"age": 24
	},
	{
		"name": "frog",
		"address": "there",
		"age": 22
	}
]
JSON - example records to work with

An "_id" field is automatically added to your data by MongoDB. If you open the database in Compass and see under section "Indexes", you can see that "_id" is indexed by default. Now, let's move up to section "Explain Plan" and filter by replacing the object Id with one of the values from "_id" of your data. Mine is :

{_id: ObjectId('614a8f8dd2d090e4e9f88ea0')}

You can see that the output shows that 1 document was examined and returned based on index keys.

MongoDB Compass - searching by indexed objectId

It seems good. We had 6 data under the collection but it examined only 1. Now let's filter with "address" and see what happens for the filter

{address: "here"}
MongoDB Compass - searching by not indexed address field

Oops! It examined 6, which means all documents and returned 3 documents that matched our query.

Now let's move over to section "Indexes" and then create an index on the "address" field.

MongoDB Compass - creating an index

And run the same filter by address query from the "Explain Plan" section. That will result in following

MongoDB Compass - searching by indexed address

It examined 3 documents by index keys and then returned those three.

Actual Query Execution Time

When the address was not indexed, the query had to examine all the records in a database. We had only 6 records, so the performance is negligible. But you can populate the JSON with 50,000 records or more and perform the above steps and look for "Actual Query Execution Time" in response. You will notice a huge difference.
Then when we indexed the address, MongoDB internally created a table with values from the address and it's a pointer to the respective record. Thus when we re-ran the query after indexing, it searched for the index table for the "here" values and returned only the records that had the "here" address.

If we had 50,000 records and only those 3 records with address "here" and address was indexed, the search for "here" would search only 3 documents and return them. However, if it was not indexed, it would search through all 50,000 records and return those 3 documents.

More indexing options

  • index other columns
  • index multiple columns at once
  • index asc, desc, text, 2dsphere, etc
  • you can have other indexing options, read the docs for it

Advantages

  • their use in queries usually result in better performance
  • faster execution time for searching with WHERE on an indexed column
  • can be used for sorting, eliminates a post-fetch-sort operation
  • unique indexes guarantee uniquely identifiable records in the database

Disadvantages

  • takes up additional disk and memory space
  • use of too many indexes might reduce the performance
  • INSERT, UPDATE and DELETE becomes slower because on each operation the indexes must also be updated

Conclusion

This article gives a basic idea of how indexes might help you with performance. You should read the docs and more articles on indexes to move on with a large project. This little information saved my day, and it might save yours.

Happy coding!