Introduction to Hash Tables

Grokking the Engineering Manager Coding Interview

Ask Author

Back to course home

0% completed

Vote For New Content

Introduction to Hash Tables

On this page

A naive Hash Table implementation

ADT class

Defining the hash method

Naive insertion operation

Naive search operation

Naive delete operation

Implementation

When to Use a Hashtable?

A Hashtable (also known as Hash Map) is a data structure that stores key-value pairs and allows fast lookups, insertions, and deletions. It is an essential tool in programming for efficiently managing and retrieving data.

Imagine a locker system where:

Your name is the key.
Your belongings are the values.
A locker number (computed from your name) determines where you store/retrieve items.

In other words, a Hash Table implements an associative array abstract data type (or Dictionary ADT), mapping keys to values. A Hash Table uses a hash function to compute an index into an array of buckets or slots where the desired value is stored.

Hash Table

There are four main elements to any Hash Table: Keys, Values, the Hash Function, and Buckets.

Keys: In our library analogy, think of keys as the unique identifiers for each book. Keys are the inputs we feed into our hash function. They can be any data type - numbers, strings, or even objects. The crucial characteristic of keys is that they should be unique. If two pieces of data share the same key, it might lead to complications, like collisions (we'll discuss this in detail later).
Values or Data: Values are the actual data that we store in our Hash Table. They could be anything from a single number to a complex object or even a function. Using the key, we can quickly retrieve the corresponding value from the Hash Table.
Hash Function - H(x): We've touched on this before, but it's worth emphasizing the importance of the hash function. This is the engine that drives a Hash Table. It's responsible for transforming keys into hash values, which dictate where we store our data in the table.
Buckets: Once the hash function processes our key, it produces a hash value. This value corresponds to a specific location or 'bucket' within the Hash Table. Think of buckets as shelves in our library, each one designated to store a specific book (or piece of data).

Here are the three basic operations that are performed on Hash Tables:

Insert(key, value) operation: Calculates the hash index $H(key)$ and stores the key-value pair in the slot at that index.
Search(key) operation: Computes $H(key)$ to find the slot and returns the value associated with the key, or null if not present.
Delete(key) operation: Removes the key-value pair stored at index $H(key)$ .

A naive Hash Table implementation

Imagine once again a major public library that needs to store basic information, such as Key (ISBN), Title, and Placement Info, for all available books. This system is heavily used, with librarians frequently searching for books. This results in many retrieval requests.

The frequent retrievals require an efficient solution that can quickly perform the search operation (ideally, in constant time). Therefore, the Hash table data structure perfectly suits the scenario.

Let's start by discussing our data model. We will use a dynamic array of the Record type to store the books' information. Here is what our Record class looks like:

Python3

. . . .

ADT class

Now, let's look at the HashTable ADT class definition. The HashTable class has an HT_array pointer to store the address of the dynamically allocated array of records. The max_length property is the maximum number of records the hash table can hold. The length represents the current number of records in the Hash table. It increments and decrements with insertions and deletions, respectively.

Python3

. . . .

Defining the hash method

Let's define our Hash method H() for this naive implementation. We will use the simplest modular Hashing for this scenario:

Python3

. . . .

The above Hash function ensures we always get an index value in the range $[0, max\_length-1]$ . Let's move on to see how the simplified/ naive Insert() method works.

Naive insertion operation

The Insert() method takes a new record as a parameter and checks if the maximum capacity of the HT_array is not reached. If the table still can have more records, the method calculates a proper index/ hash key for placing this item. After that, it puts the item at the computed index.

Python3

. . . .

Point to ponder: What happens if two different keys map to the same array index? This implementation overwrites it. Indeed, it is a flaw we will address in the Solving Collisions section.

Naive search operation

Now, let's explore how and why the Search() method will retrieve records in $O(1)$ for us. Here is a naive implementation:

Python3

. . . .

The Search() method applies the hash function H() on the passed key and checks if the hash table slot is empty or not. If this slot has a valid record, the Search() method assigns the record at this slot to the reference parameter. Also, it returns a true flag to indicate the operation's success.

The above implementation of Search() clearly takes constant time (i.e., $O(1)$ ) time to retrieve/ search any record regardless of the size our table may grow. This is evident by the fact that you only have to apply the hash function only constant number of times to calculate the position of the searched item.

Note: This naive implementation for the search doesn't cover all aspects. We will discuss a more sophisticated method in the next lesson.

Naive delete operation

Like the search operation, this deletion operation first locates the item requiring a delete. Afterward, the deletion operation simply places a null or default object at the table slot.

Here is what the naive implementation looks like:

Python3

. . . .

Again, like the search operation, deletion also takes a constant time to locate and delete an item from the table. But, it is important to note that this naive implementation is just to get an overall idea of how the hash table works. However, it doesn't cater to many exceptional cases; we will discuss those in the subsequent sections.

Here is the complete code for the naive hash table implementation with a sample driver code:

Python3

. . . .

Implementation

Here is how different languages have implemented Hash Tables:

Language	API
Java	java.util.Map Or java.util.HashMap
Python	dict
C++	std::unordered_map
JavaScript	Object or Map

When to Use a Hashtable?

Quick lookups (e.g., finding user data from an ID).
Removing duplicates (e.g., tracking unique users).
Storing and retrieving configuration settings.
Fast caching systems (e.g., database indexing).

Mark as Completed

On this page

A naive Hash Table implementation

ADT class

Defining the hash method

Naive insertion operation

Naive search operation

Naive delete operation

Implementation

When to Use a Hashtable?