To more easily complete queries, large databases use indexes. However, some databases are too large even for regular indexes to be performant. In those cases, hashing or hash indexing can be a more powerful, convenient alternative.

Here, we’ll cover the basics of hashing, its classifications, as well as situations where it’s the most useful.

DBMS Hash Indexing Explained

In a nutshell, hashing in DBMS is a way to quickly find data on disk without using a traditional index. Instead, the database uses a hash function to calculate the location of a given record.

It usually takes the primary key, finds the hash based on a certain mathematical function, and points to the appropriate bucket where the queried data is.

What Are Hash Buckets?

In hashing, buckets are essentially used to apportion data for lookup or sorting, similar to an array index. It’s where the hashes derived from the hashing function are stored, providing fast access to the required data.

Each bucket is identified via an address; so a bucket with one address holds all index entries for a certain search key, provided that function(search key) = address.

When Is Hashing Preferable to Standard Indexing?

In specific cases, hashing is superior to indexes. More precisely, hashing works better for point lookups and primary key-based lookups.

Hashing is only usable in equality/inequality checks, unlike indexes. But hashing performs better in that regard than B-tree indexes: its cost is equal to the number of pages in the bucket, so it’s much cheaper if there’s no overflow (more on overflow later).

Hash Functions Explained

Hash functions are the means through which a key becomes a hash of length-restricted value. These functions should have the following properties:

They remain the same for every hash being made

They’re easy to calculate quickly

No two hashes should be alike—collision happens when two hashes are the same

Every unique input causes a unique output

Common Hash Functions

Division

Mid-Square

Folding

Multiplication

Types of Hashing

Broadly speaking, we divide hashing techniques in two ways: by extendability and collision resolution.

Sorted by extendability, we use the following categorization:

Static

Dynamic

By collision resolution, hashing can be:

Open (separate chaining)

Closed (open addressing)

Static Vs Dynamic Hashing

In static hashing, hash tables have a fixed size. When a table is filled up, an overflow page is allocated to take in the brimming data. But this overflow page can also reach maximum capacity, after which we add another one. This chain is called the overflow chain.

Static hashing is a fast way to complete insert and delete operations, seeing that it only needs two I/Os, those being read and write. However, you do need to know the size of the hash table in advance. If you fail to optimize the hash table, the overflow chain can get quite long, causing a dip in performance and making scaling more difficult.

To compensate for the shortcomings of static hashing, dynamic hashing takes a different approach. Dynamic hash tables aren’t fixed; instead, they expand to accommodate more data. Conversely, if a dynamic hash table is too empty, it can be resized to save memory usage.

The advantage of dynamic hashing lies in its flexibility: you don’t have to plan its size in advance, and it still maintains an O(1) lookup/insertion performance. On the other hand, dynamic tables change data location as they grow or shrink. This can make performance inconsistent as data jumps around the table.

Open Vs Closed Hashing (Separate Chain Vs. Open Address)

For hashing to be effective, we have to reduce collisions (i.e., two hashes for different inputs being the same) as much as possible.

In closed hashing, we store records within the hash table array itself, and available fields for hashes are found via probing. There should be no more than one key per bucket. In contrast, open hashing entails using separate lists to store hashes (hence the term “separate” chains) with an arbitrary possible number of keys per table.

Why Is Closed Hashing Also Called “Open Address?”

The fact that “closed” hashing and “open” addresses are synonyms sounds confusing at first. But here’s why that’s the case: if the location of a record is independent of its hash value—as is the case in closed hashing—then its address is open by definition; it isn’t fixed.

Closed Hashing and Probing

In closed hashing, to find a place for a new hash entry or look for an existing record within a bucket array, a process called probing is necessary. Through probing, we examine the buckets in a given probe sequence (mainly linear, double-hashing, or quadratic) and look for an unoccupied slot.

Probing is an integral part of specific types of collision resolution. For example, Robin Hood hashing displaces duplicate hashes based on PSL (probe sequence length), displacing those farther from the bucket where the item was originally hashed.

Best Use Cases for Hashing

In the following cases, hashing gives you the most bang for your buck:

Graphics

Game boards

Message digests

Compiler operation

Password verification

Linking file names and paths

Hashing algorithms excel in various scenarios where fast data retrieval, security, and efficiency are essential. Graphics processing, where quick access to pixel data or image properties is crucial, benefits from hashing's speed. Similarly, game boards, particularly in complex video games, rely on hashing to quickly determine the state of the game and make efficient moves.

In cybersecurity, hashing plays a pivotal role in generating message digests, ensuring data integrity, and detecting tampering. Moreover, during compiler operation, hashing helps expedite symbol table lookups and optimize code generation. Password verification is another prime application, enhancing security by storing and comparing password hashes instead of the actual passwords.

Lastly, when linking file names and paths in file systems, hashing enables rapid access and reduces the need for exhaustive searches. In these use cases, hashing offers a powerful solution to streamline operations and enhance both performance and security.

Conclusion

In conclusion, understanding the power of hashing in the world of database management is crucial. It provides an efficient means to access data without the traditional index, making it particularly valuable in cases of extensive databases. Hashing technology's ability to distribute and locate data quickly offers substantial advantages for point lookups and primary key-based searches.

As you explore the possibilities of implementing hashing in your database system, consider innovative solutions like IneryDB.

IneryDB leverages the efficiency of hashing to enhance data retrieval and system performance, contributing to the ever-evolving landscape of decentralized and efficient data management. So, as you navigate the world of database management, remember that hashing plays a pivotal role, and solutions like IneryDB can take your data management to new heights.

Inery•

3 years ago

Inery Community Updates Edition #2

This edition of community updates will focus on what has happened since then and some more exciting news. ...READ MORE

Inery•

2 years ago

Databases Unchained: Navigating the Centralized-Decentralized Landscape

Explore the intricate world of database management, weighing the strengths of centralized and decentralized systems, and discover Inery. ...READ MORE

Inery•

2 years ago

The True Cost of Centralized Database Management

The faults of centralized database management harm companies and consumers every day. Click here to see how—and how to avoid the cost. ...READ MORE

Inery•

3 years ago

Web5: What is the buzz all about ?

Here’s decoding web5, what it means, and what the buzz is all about. ...READ MORE

Inery•

Hashing in DBMS: a Helpful Overview

How Decentralization Empowers Digital Freedom: The Data Rebellion

What Is Database Denormalization? Guide + Tips

DBMS Hash Indexing Explained

What Are Hash Buckets?

When Is Hashing Preferable to Standard Indexing?

Hash Functions Explained

Common Hash Functions

Types of Hashing

Static Vs Dynamic Hashing

Open Vs Closed Hashing (Separate Chain Vs. Open Address)

Why Is Closed Hashing Also Called “Open Address?”

Closed Hashing and Probing

Best Use Cases for Hashing

Conclusion

Inery•

Inery Community Updates Edition #2

Inery•

Databases Unchained: Navigating the Centralized-Decentralized Landscape

Inery•

The True Cost of Centralized Database Management

Inery•

Web5: What is the buzz all about ?