0% completed
Hash-based indexing is a technique used to store and retrieve records efficiently by mapping keys to specific locations using a hash function. Unlike tree-based indexing, which organizes data hierarchically, hash-based indexing directly maps keys to buckets or slots, making it particularly efficient for point queries.
What Is Hash-Based Indexing?
Hash-based indexing employs a hash function to compute a hash value for each key, determining the bucket or slot where the corresponding record will be stored. This eliminates the need for sequential searching, offering constant-time complexity O(1) in ideal cases.
Key Features
- Fast and efficient for equality searches (e.g., "Find the record where ID = 102").
- The hash function ensures even distribution of records across buckets.
- Not suitable for range queries (e.g., finding all records with IDs between 100 and 200).
Static Hashing
In static hashing, the number of buckets is fixed when the hash table is created. Each record is mapped to a bucket based on its hash value. This structure works well when the size of the dataset is known and remains relatively constant.
How Static Hashing Works
Consider the image below.
-
Data Mapping:
- Consider the keys
98
,106
,104
, and102
. - The hash function used is h(key) = key \mod 5. The result of the hash function determines the bucket index.
- Consider the keys
-
Hash Calculation:
- 98 \mod 5 = 3 → Bucket 3
- 106 \mod 5 = 1 → Bucket 1
- 104 \mod 5 = 4 → Bucket 4
- 102 \mod 5 = 2 → Bucket 2
-
Bucket Representation:
- Each bucket stores the actual data corresponding to the hashed key. For example:
- Bucket 3 contains data for key
98
(e.g.,John, Delhi
). - Bucket 1 contains data for key
106
. - Bucket 4 contains data for key
104
. - Bucket 2 contains data for key
102
.
- Bucket 3 contains data for key
- Each bucket stores the actual data corresponding to the hashed key. For example:
This setup ensures efficient lookups because the hash function directly maps keys to their corresponding buckets.
Problem: What Happens When a Collision Occurs?
Now, suppose we want to add the key 103
to the hash table:
- 103 \mod 5 = 3
- Bucket 3 is already occupied by key
98
, leading to a collision.
Solution: Chaining
Chaining is a common method to resolve collisions. When two or more keys map to the same bucket, a linked list (or chain) is created to store multiple records in that bucket.
Chaining Example
Consider the image below.
-
Collision Handling:
- Keys
98
and103
both map to bucket 3. Instead of overwriting data, a linked list is created within bucket 3 to store both records.
- Keys
-
How It Works:
- The hash table is updated to include a chain for bucket 3.
- To search for a key (e.g.,
103
), the hash function maps it to bucket 3, and the linked list is traversed to locate the specific record.
Chaining ensures that no data is lost due to collisions, and the hash table remains efficient.
Dynamic Hashing
Dynamic hashing, also known as extendible hashing, overcomes the limitations of static hashing by allowing the number of buckets to grow or shrink dynamically. This ensures better handling of collisions and adapts to changes in the dataset size.
How Dynamic Hashing Works
Consider the image below.
-
Global Depth:
- The global depth k represents the number of bits used from the hash value to determine the directory ID. For example, if ( k = 2 ), the possible directory IDs are
00
,01
,10
, and11
.
- The global depth k represents the number of bits used from the hash value to determine the directory ID. For example, if ( k = 2 ), the possible directory IDs are
-
Mapping Keys to Buckets:
- A hash function generates a binary hash value for each key.
- The last ( k ) bits of the hash value determine the directory ID.
- The directory ID points to the corresponding bucket where the record is stored.
-
Handling Bucket Overflow:
-
If a bucket overflows (e.g., due to multiple keys hashing to the same bucket), bucket splitting occurs:
- The overflowing bucket is split into two new buckets.
- The global depth k increases, and the directory is updated to reflect the new buckets.
-
For example, if ( B_2 ) overflows, the directory IDs are updated to
100
and101
, and the bucket splits to redistribute the records. -
Keys are redistributed between the new buckets based on their updated hash values.
-
Properties of Dynamic Hashing
- Bucket Splitting: When a bucket overflows, it is split into two, and the keys are redistributed.
- Dynamic Directory: The directory grows or shrinks as needed, ensuring efficient data storage and retrieval.
- Flexibility: The hash function dynamically adjusts to prevent collisions and ensure balanced bucket utilization.
Dynamic hashing provides a scalable and efficient solution for managing large and dynamic datasets.
Comparison of Static and Dynamic Hashing
Feature | Static Hashing | Dynamic Hashing |
---|---|---|
Number of Buckets | Fixed | Grows or shrinks dynamically |
Collision Handling | Chaining or open addressing | Bucket splitting with updated directories |
Flexibility | Limited | Highly flexible |
Use Cases | Small, stable datasets | Large, dynamic datasets |
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible