The bigger and more complex a database gets, the more time it takes for a query to complete. That is – if you don’t implement an index.
An index drastically reduces the number of records and rows that have to be examined. It saves time, streamlines queries, and improves performance for both end users and the back end—everyone’s happy.
Here, we’ll dive into how database indexes work and how to leverage them for silk-smooth queries.
How Does Indexing Work in Database Management?
Before we explain how indexes function, we need to look at how queries generally run.
Let’s imagine a database table that lists people attending a birthday party. The table looks something like this:
Guest_Name | Guest_Id |
Robert | 4 |
Diana | 6 |
George | 5 |
Sanjay | 1 |
Jane | 3 |
Lee | 2 |
Say you’re curious about how many female guests are coming. However, the table isn’t sorted by gender but by alphabetical order of the names. In this case, you would likely need to check each guest’s name and guess their gender.
This isn’t terribly performant—just imagine if there were a thousand guests to sift through!
But we can speed up this effort by creating an index that sorts these guests by gender:
Guest_Name | Guest_Id | Gender |
Robert | 4 | m |
Diana | 6 | f |
George | 5 | m |
Sanjay | 1 | m |
Jane | 3 | f |
Lee | 2 | m |
This table now has a gender column. You can run a query that checks the gender and its associated name. Since the index sorts guests by their gender, the query only needs to go to the rows containing the f value.
This is the gist of indexing: sorting data by specific metrics in specialized tables or columns to make queries execute faster.
How Does DB Indexing Work: Key Terms
The following database index terms come up all the time:
- Search key: the attribute being indexed, like gender in our example above. The keys are copies of the keys in the original table, usually in sorted order to make searching easier.
- Value/pointer: a value that “points'' to the disk block address where the key is stored. In the example we explored, this would be the values in the Guest_Id column in the index.
- Cardinality: the amount of unique, non-repeated values in a column. Cardinality can be:
- High - the index has many non-duplicate values—this is preferable
- Low - the index has few non-duplicate values
- Density: the ratio of repeated values to unique ones, calculated by dividing 1 by the number of unique values. So, a 100-row index with 95 non-repeated values would have a density of 0.2%. (1/5). You want to aim for low density when designing indexes.
- Selectivity: the inverse of density, i.e., the ratio of unique values to repeated ones, derived by dividing cardinality by the number of rows. A 100-row index with 95 unique values—thus, a cardinality of 5—has a selectivity of 5/100 = 0.05. Strive for selectivity that’s as close to 1 as possible.
Types of Indexing in Database Explained
We can categorize how an index works in database management along several criteria: key-pointer ratio, number of records per file, levels of indexing, etc.. Below are the most important database index types.
Dense Vs Sparse Index
Categorized by the number of keys per pointer, indexing tables can be dense or sparse.
Dense indexes have one key for every pointer. Going back to our birthday table, an index sorted by Guest_Id would be dense since every instance in that column correlates to a unique record, thus needing its own pointer.
Sparse indexes, on the other hand, may contain more than one key per pointer. In our hypothetical, an index built around the gender column has pointers to more than one record. (although the table should be sorted in blocks by gender for this to be effective).
Primary, Secondary, and Clustered DB Indexing
Primary Indexing
This is among the more “traditional” types of indexes. Primary indexing is designed around ordered primary key fields of databases. A primary index can be either dense or sparse, though sparse is preferable.
The number of keys equals the number of blocks in the table being indexed, but there can only be one primary index per database file.
Secondary (Non-Clustered) Indexing
In secondary indexing, the ordering of keys doesn’t match that of the rows stored on the disk. It can be created on either the primary or secondary key, and one table can have more than one secondary index. Updates on non-clustered indexes tend to be faster than on clustered ones.
Clustered Indexing
A clustered index basically reorders the table being indexed so that they match. It reorders them in a way that stores the records in sorted order based on keys and values. Indexes are stored in the same table as the records themselves.
Clustered indexing makes a lot of sense when you have groups of records with similar characteristics. Your index can point to these groups with just one pointer, which makes some queries a little quicker.
Inery•
1 year ago
Can You Use Inery In Telecommunications?
Data is at the centerfold of most businesses, especially in telecommunications. However, telecommunications industries are extremely centralized. Inery presents a solution. ...READ MORE
Share
Inery•
1 year ago
The True Cost of Centralized Database Management
The faults of centralized database management harm companies and consumers every day. Click here to see how—and how to avoid the cost. ...READ MORE
Share
Inery•
10 months ago
Are Crypto Regulations Ruinning Web3 Projects and Businesses?
As the crypto industry grapples with the regulatory surge, the destiny of Web3 projects hinges on finding harmony between innovation and regulation, shaping the decentralized ecosystem's future. ...READ MORE
Share
Inery•
2 years ago
Inery: Transcending IPFS with True Decentralization
Moving beyond the IPFS and database storage solution to offer a database management solution in a decentralized architecture. ...READ MORE
Share
Most popular today