Home | Projects | Notes > Data Structures & Algorithms > Hash Tables
In storing key-value pairs,
Arrays limit the key to integer values which will be used as an index.
xxxxxxxxxx
41a1[1] = 3;
2a2[3] = "three";
3a3[0] = 'a';
4...
Maps can store any pairs of any types.
xxxxxxxxxx
41m1["three"] = 3
2m2[5] = 2
3m3['A'] = 'a'
4...
std::map
(header: <map>
)
Implemented using binary search tree (BST). Due to the nature of the BST, it is ordered map.
Iterator will iterate through the elements based on key order.
The time complexity of all the functions is
std::unordered_map
(header: <unordered_map>
)
Implemented using hash table.
Keys are unique.
It is NOT guaranteed that the iterator will iterate through the elements based on key order.
Time complexity of all the functions is
Functions: insert()
, at()
, size()
, erase()
, etc.
Hash tables support one of the most efficient types of searching: hashing.
A hash table consists of an array (a.k.a. bucket array) in which data is accessed via a special index called key.
The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function.
A hash function accepts a key and returns its hash coding, or hash value.
Keys vary in type, but hash codings are always integers.
Two components of hash functions:
Hash code (converts a key to an integer)
xxxxxxxxxx
21IF key = 'A', hash code = convert character to ascii
2THEN h('A') = 97
Compression function (scales the converted integer to a valid index of the bucket array)
xxxxxxxxxx
31IF bucket array can contain 26 entries
2THEN good compression function can be '% 26' since it scales whatever the hash
3value is to an integer (hash coding, or hash value) ranging 0-25.
A good hash function produces wide-spread hash values as randomly as possible within the valid range of integers. Some well-known hash functions:
Base-26
xxxxxxxxxx
131// e.g., A hash function which gets passed a 4-character string and returns a
2// hash index as an int. Assume the table size is defined as 'ts'.
3
4int hashBase26(char *key, int ts)
5{
6 int index;
7 index = ((key[0] - 'A' + 1) * pow(26,3) +
8 ((key[1] - 'A' + 1) * pow(26,2) +
9 ((key[2] - 'A' + 1) * pow(26,1) +
10 ((key[3] - 'A' + 1) * pow(26,0) +
11
12 return index % ts;
13}
Folding
xxxxxxxxxx
101// e.g., A hash function using Folding
2
3int hashFolding(char *key, int ts)
4{
5 int index;
6 index = ((key[0] - 'A' + 1) * (key[1] - 'A' + 1)) +
7 ((key[2] - 'A' + 1) * (key[3] - 'A' + 1));
8
9 return index % ts;
10}
Middle Squaring
xxxxxxxxxx
91// e.g., A hash function using Middle Squaring
2
3int hashMiddleSquaring(char *key, int ts)
4{
5 int index;
6 index = ((key[1] - 'A' + 1) * (key[2] - 'A' + 1));
7
8 return index % ts;
9}
Division Method
A good general approach:
Select a table size that is a prime number
Translate the key into an integer
Divide Key / M = Q with remainder R
Use R as a hash value
Use max(1, Q) as the probe in/decrement auxiliary hash function (double hashing)
A key is always stored in the bucket it is hashed to.
Collisions are handled by using separate data structure on a per-bucket basis.
Arbitrary number of keys per bucket.
e.g., Chained Hash Table (Separate Chaining)
Consists of an array of linked lists.
Each list forms a bucket in which we place all elements hashing to a specific position in the array.
Performs as good as open addressing techniques.
At most one key per bucket
Collisions are handled by searching for another empty buckets within the hash table array itself. This searching is handled by the hash function
Some well-known open addressing techniques (depending on the choice of
linear probing
Although it is simple to implement, it is prone to primary clustering (cluster developed due to the local popular index).
quadratic probing
This alleviates the primary clustering to some extent.
double hashing
Two auxiliary hash functions should be carefully chosen so that the elements are distributed in a uniform and random manner.
xxxxxxxxxx
491//==============================================================================
2// File : hash_table.h
3// Brief : Interface for Hash Table
4// Author : Kyungjae Lee
5// Date : Jul 31, 2023
6//
7// Note : It is known that when the number of bucket array elements is a
8// prime number, the (key, value) pairs are distributed more
9// randomely (i.e., less collision).
10//==============================================================================
11
12
13
14
15
16
17
18
19using namespace std;
20
21// Class for hash table nodes
22class Node
23{
24public:
25 string key;
26 int value;
27 Node *next;
28
29 Node(string key, int value); // Constructor
30};
31
32// Class for hash table
33class HashTable
34{
35public:
36 HashTable(int size); // Constructor
37 ~HashTable(void); // Destructor
38 void printHashTable(void); // Prints all the nodes in the hash table
39 void insert(string key, int value); // Insert a node into the hash table
40 int lookup(string key); // Looks up a value by key from the hash table
41 vector<string> keys(void); // Returns a vector of keys in the hash table
42 int hash(string key); // Hash function
43
44private:
45 int bucketArrSize; // Bucket array size (prime number recommended)
46 vector<Node *> bucketArr; // Bucket array
47};
48
49// HASH_TABLE_H
xxxxxxxxxx
1641//==============================================================================
2// File : hash_table.cpp
3// Brief : Implementation of Hash Table
4// Author : Kyungjae Lee
5// Date : Jul 31, 2023
6//
7// Note : It is known that when the number of bucket array elements is a
8// prime number, the (key, value) pairs are distributed more
9// randomely (i.e., less collision).
10//==============================================================================
11
12
13
14using namespace std;
15
16//------------------------------------------------------------------------------
17// Implementation of Node class interface
18//------------------------------------------------------------------------------
19
20// Constructor
21// T = O(1)
22Node::Node(string key, int value)
23{
24 this->key = key;
25 this->value = value;
26 next = nullptr;
27} // End of Node class constructor
28
29//------------------------------------------------------------------------------
30// Implementation of Hash Table class interface
31//------------------------------------------------------------------------------
32
33// Constructor
34// T = O(1)
35HashTable::HashTable(int size)
36{
37 bucketArrSize = size;
38 bucketArr.reserve(size); // Size of prime number recommended
39} // End of Hash Table class constructor
40
41// Destructor
42// T = O(n)
43HashTable::~HashTable(void)
44{
45 // Destroy all the nodes
46 Node *delNode, *head;
47
48 for (int i = 0; i < bucketArrSize; i++)
49 {
50 delNode = bucketArr[i];
51 head = bucketArr[i];
52
53 while (head)
54 {
55 head = head->next;
56 delete delNode;
57 delNode = head;
58 }
59 }
60
61 // Destory the bucket array
62 bucketArrSize.~vector();
63} // End of Desctructor
64
65// Prints all the nodes in the hash table
66// T = O(n)
67void HashTable::printHashTable(void)
68{
69 for (int i = 0; i < bucketArrSize; i++)
70 {
71 cout << i << ": ";
72 if (bucketArr[i])
73 {
74 Node *temp = bucketArr[i];
75 while (temp)
76 {
77 cout << "(" << temp->key << ", " << temp->value << "} ";
78 temp = temp->next;
79 }
80 }
81 cout << endl;
82 }
83} // End of printHashTable
84
85// Inserts a node into the hash table (Closed addressing; chaining)
86// T = O(1)
87void HashTable::insert(string key, int value)
88{
89 int index = hash(key);
90 Node *newNode = new Node(key, value);
91
92 if (bucketArr[index] == nullptr)
93 {
94 // No collision detected, so simply insert the new node
95 bucketArr[index] = newNode;
96 }
97 else
98 {
99 // Collision detected
100 Node *temp = bucketArr[index];
101
102 // Find the last node of the chain
103 while (temp->next != nullptr)
104 temp = temp->next;
105
106 // Insert the new node after the last node in the chain
107 temp->next = newNode;
108 }
109} // End of insert
110
111// Looks up a value by key from the hash table
112// T = O(1)
113int HashTable::lookup(string key)
114{
115 int index = hash(key);
116 Node *temp = bucketArr[index];
117
118 // Search the key
119 while (temp)
120 {
121 if (temp->key == key)
122 return temp->value;
123
124 temp = temp->next;
125 }
126
127 // Key is not found, return 0
128 return 0;
129} // End of lookup
130
131// Returns a vector of keys in the hash table
132// T = O(n)
133vector<string> HashTable::keys()
134{
135 vector<string> allKeys;
136 for (int i = 0; i < bucketArrSize; i++)
137 {
138 Node *temp = bucketArr[i];
139
140 while (temp)
141 {
142 allKeys.push_back(temp->key);
143 temp = temp->next;
144 }
145 }
146
147 return allKeys;
148} // End of keys
149
150// Hash function
151int HashTable::hash(string key)
152{
153 int hash = 0;
154 for (int i = 0; i < key.length(); i++)
155 {
156 int ascii = key[i];
157
158 // Multiply by 23, a prime number, to make the result more random
159 // Divide by bucketArrSize to normalize the result
160 hash = (hash + ascii * 23) % bucketArrSize;
161 }
162
163 return hash;
164} // End of hash
xxxxxxxxxx
451//==============================================================================
2// File : main.cpp
3// Brief : Test driver for Hash Table
4// Author : Kyungjae Lee
5// Date : Jul 31, 2023
6//==============================================================================
7
8
9
10
11using namespace std;
12
13int main(int argc, char *argv[])
14{
15 // Create a hash table
16 HashTable *ht = new HashTable(7);
17
18 // Insert elements
19 ht->insert("nails", 100);
20 ht->insert("tile", 50);
21 ht->insert("lumber", 80);
22 ht->insert("bolts", 200);
23 ht->insert("screws", 140);
24
25 // Print hash table
26 ht->printHashTable();
27
28 // Create a vector of keys present in the hash table
29 vector<string> allKeys = ht->keys();
30
31 cout << endl;
32
33 // Print the keys
34 for (auto key : allKeys)
35 cout << key << " ";
36
37 cout << endl;
38
39 cout << "tile: " << ht->lookup("tile") << endl;
40
41 // Destroy the hash table
42 delete ht;
43
44 return 0;
45}
xxxxxxxxxx
1010:
21:
32:
43: (screws, 140}
54: (bolts, 200}
65:
76: (nails, 100} (tile, 50} (lumber, 80}
8
9screws bolts nails tile lumber
10tile: 50
For the ease of implementation, key will be string type, and value will be template.
xxxxxxxxxx
601//==============================================================================
2// File : chtbl.h
3// Brief : Interface for chained hash table (in C++)
4// Author : Kyungjae Lee
5// Date : Dec 23, 2022
6//==============================================================================
7
8
9
10
11
12
13using namespace std;
14
15// Forward declaration
16template <typename T> class CHTbl;
17
18// Chained Hash Table Node class
19template <typename T>
20class CHTblNode
21{
22public:
23 CHTblNode(string key, T value); // parameterized constructor
24 ~CHTblNode(); // parameterized constructor
25
26 // friend declaration
27 template <typename> friend class CHTbl;
28
29private:
30 string key;
31 T value;
32 CHTblNode *next;
33};
34
35// Chained Hash Table
36template <typename T>
37class CHTbl
38{
39public:
40 // function
41 CHTbl(); // default constructor
42 ~CHTbl(); // destructor
43 int size(); // retrieve the total number of entries in the CHTbl
44 void insert(string key, T value); // insert an entry
45 T remove(string key); // remove an entry by key
46 T getValue(string key); // retrieve the value corresponding to the given key
47 double getLoadFactor(); // retrieve the current load factor
48
49private:
50 // variable
51 CHTblNode<T> **bucketArr; // bucket array
52 int buckets; // max number of elements in the bucket array
53 int count; // total number of entries in the CHTbl
54
55 // function
56 int hash(string key); // convert key into the bucketArr index (hash value)
57 void rehash(); // transfer entries to the resized bucket array
58};
59
60// CHTBL_H
xxxxxxxxxx
2471//==============================================================================
2// File : chtbl.c
3// Brief : Implementation of chained hash table (in C++)
4// Author : Kyungjae Lee
5// Date : Dec 23, 2022
6//==============================================================================
7
8
9
10// CHTblNode -------------------------------------------------------------------
11
12// Parameterized constructor
13// T = O(1)
14template <typename T>
15CHTblNode<T>::CHTblNode(string key, T value)
16{
17 this->key = key;
18 this->value = value;
19 next = nullptr;
20} // end parameterized constructor
21
22// destructor
23// T = O(1)
24template <typename T>
25CHTblNode<T>::~CHTblNode()
26{
27 delete next; // recursively destroy the list
28} // end destructor
29
30// CHTbl -----------------------------------------------------------------------
31
32// Default constructor
33// T = O(1)
34template <typename T>
35CHTbl<T>::CHTbl()
36{
37 count = 0;
38 buckets = 5; // set the initial number of buckets
39 bucketArr = new CHTblNode<T>*[buckets];
40} // end default constructor
41
42// destructor
43// T = O(n)
44template <typename T>
45CHTbl<T>::~CHTbl()
46{
47 // traverse each element of the bucketArr and delete lists associated with it
48 // (just deleting the bucketArr won't delete these lists)
49 for (int i = 0; i < buckets; i++)
50 delete bucketArr[i];
51
52 delete[] bucketArr; // delete the bucketArr
53} // end destructor
54
55// Retrieve the total number of entries in the CHTbl
56// T = O(1)
57template <typename T>
58int CHTbl<T>::size()
59{
60 return count;
61} // end size
62
63// Insert an entry
64// T = O(n/b) on average ; // T = O(1)
65// - n: the total number of entries, b: the number of buckets, n/b: load factor
66// - If we could keep n/b < 1, then T = O(1)
67// - If this is the case, getValue and delete function will also run at O(1).
68// - One way is to implement the number of buckets doubles the size every time the
69// load factor exceeds the properly chosen threshold. (e.g., n/b < 0.7)
70// - When this happens, you need to 'rehash' all the exsisting keys because the hash
71// value is dependent on the number of buckets.
72// - Hash function runs at O(l) where l is the length of the key, but it is ignored
73// because l << n.
74// - In the worst case, where the hash function is really bad such that it produces only
75// one hash value for all the given keys, insert function will run at T = O(n). (i.e.,
76// inserting and entry at the end of the list that contains all the n entries)
77// But, this is unlikely to happen with good hash functions that are proven to work.
78template <typename T>
79void CHTbl<T>::insert(string key, T value)
80{
81 int bucketIndex = hash(key);
82 CHTblNode<T> *curr = bucketArr[bucketIndex]; // pointer to traverse the list
83
84 // search the list for the key
85 while (curr)
86 {
87 // if the key was found
88 if (curr->key == key)
89 {
90 curr->value = value; // just update the value
91 return;
92 }
93
94 curr = curr->next;
95 }
96
97 // if key was not found insert a new node
98 CHTblNode<T> *newNode = new CHTblNode(key, value); // create a node
99 newNode->next = bucketArr[bucketIndex]; // insert at the head of the list
100 bucketArr[bucketIndex] = newNode; // update the head
101 count++; // increment count
102 double loadFactor = (1.0 * count) / buckets; // update the load factor
103
104 // if the load factor becomes greater than LOAD_FACTOR_THRESHOLD
105 if (loadFactor > LOAD_FACTOR_THRESHOLD)
106 rehash();
107} // end insert
108
109// Remove an entry by key
110// T = O(1)
111template <typename T>
112T CHTbl<T>::remove(string key)
113{
114 int bucketIndex = hash(key);
115 CHTblNode<T> *curr = bucketArr[bucketIndex]; // pointer to traverse the list
116 CHTblNode<T> *prev = nullptr; // pointer to traverse the list
117
118 // search the list for the key
119 while (curr)
120 {
121 // if the key was found
122 if (curr->key == key)
123 {
124 // remove the head from the list
125 if (prev == nullptr)
126 bucketArr[bucketIndex] = curr->next;
127 // remove the node other than the head from the list
128 else
129 prev->next = curr->next;
130
131 // delete the removed node
132 T value = curr->value; // set aside the value to return
133 curr->next = nullptr; // isolate curr to prevent recursive deletion
134 delete curr; // deallocate memory space pointed at by curr
135 count--; // decrement count
136
137 return value;
138 }
139
140 prev = curr; // advance prev
141 curr = curr->next; // advance curr
142 }
143
144 return 0; // '0' as a cross-type signal for key not found
145} // end remove
146
147// Retrieve the value corresponding to the given key
148// T = O(1)
149template <typename T>
150T CHTbl<T>::getValue(string key)
151{
152 int bucketIndex = hash(key);
153 CHTblNode<T> *curr = bucketArr[bucketIndex]; // pointer to traverse the list
154
155 // search the list for the key
156 while (curr)
157 {
158 // if the key was found
159 if (curr->key == key)
160 return curr->value;
161
162 curr = curr->next; // advance curr
163 }
164
165 return 0; // '0' as a cross-type singal for key not found
166} // end getValue
167
168// Hash function - Convert key into bucketArr index (hash value)
169// T = O(l) where l is the length of the key
170template <typename T>
171int CHTbl<T>::hash(string key)
172{
173 int hashCode = 0;
174 int base = 37;
175 int place = 1; // b^0 (ones') place, b^1s' place, b^2s' place and so on
176
177 // step1: hash code (convert a key to an integer)
178
179 // base-37
180 // "abc" -> ('a' * 37^0) + ('b' * 37^1) + ('c' * 37^2)
181 for (int i = 0; i < key.size(); i++)
182 {
183 hashCode += key[i] * place;
184 place *= base;
185
186 // Since place and hashCode grow fast, let's apply the property of modulo
187 // operation and stay away from int overflow.
188 // e.g., (a*b)%m = [(a%m)*(b%m)]%m
189 hashCode %= buckets;
190 base %= buckets;
191 }
192
193 // setp 2: compression (scale the converted integer to a valid bucketArr index)
194 return hashCode % buckets;
195
196} // end hash
197
198// Transfer entries to the resized bucket array
199// T = O(n)
200template <typename T>
201void CHTbl<T>::rehash()
202{
203 // create a new bucket array with double the size of the original one
204 CHTblNode<T> **oldBucketArr = bucketArr;
205 bucketArr = new CHTblNode<T>*[2 * buckets];
206
207 // initialize the new bucket array
208 for (int i = 0; i < 2 * buckets; i++)
209 bucketArr[i] = nullptr;
210
211 // update the number of buckets
212 int oldBuckets = buckets;
213 buckets *= 2;
214
215 // reset count
216 count = 0;
217
218 // transfer the entries in the old bucket array to the new bucket array
219 for (int i = 0; i < oldBuckets; i++)
220 {
221 CHTblNode<T> *curr = oldBucketArr[i]; // pointer to traverse the old bucket array
222
223 while (curr)
224 {
225 string key = curr->key;
226 T value = curr->value;
227 insert(key, value); // entry gets inserted into the new bucket array
228
229 curr = curr->next; // advance curr
230 }
231 }
232
233 // deallocate the memory used for the lists associated the old bucket array
234 for (int i = 0; i < oldBuckets; i++)
235 delete oldBucketArr[i]; // recursively delete a list
236
237 // deallocate the memory used for the old bucket array
238 delete[] oldBucketArr;
239} // end rehash
240
241// Retrieve the current load factor
242// T = O(1)
243template <typename T>
244double CHTbl<T>::getLoadFactor()
245{
246 return (1.0 * count) / buckets;
247} // end getLoadFactor
xxxxxxxxxx
731//==============================================================================
2// File : chtbl_main.c
3// Brief : Test driver for chained hash table (in C++)
4// Author : Kyungjae Lee
5// Date : Dec 23, 2022
6//==============================================================================
7
8
9
10
11using namespace std;
12
13int main(int argc, char *argv[])
14{
15 CHTbl<int> map;
16
17 cout << "Initial map info" << endl;
18 cout << "size : " << map.size() << endl;
19 cout << endl;
20
21
22 // insert ("abc0", 1), ("abc1", 2), ("abc2", 3) ... , ("abc9", 10) into the map
23 cout << "Insert (\"abc0\", 1), (\"abc1\", 2), ... , (\"abc9\", 10) "
24 << "into the map" << endl;
25
26 for (int i = 0; i < 10; i++)
27 {
28 // construct the key
29 char c = '0' + i;
30 string key = "abc";
31 key = key + c;
32
33 // construct the value
34 int value = i + 1;
35
36 // insert
37 map.insert(key, value);
38
39 // print the insertion status
40 cout << "(" << key << ", " << value << ") inserted - ";
41 cout << "load factor: " << map.getLoadFactor() << endl;
42 }
43
44 cout << "size : " << map.size() << endl;
45 cout << endl;
46
47 // remove entry
48 cout << "Remove (\"abc1\", 2) and (\"abc6\", 7)" << endl;
49 map.remove("abc1");
50 map.remove("abc6");
51
52 cout << "size : " << map.size() << endl;
53 cout << endl;
54
55 // printt the key, value pairs
56 cout << "Print the key, value pairs (value of the removed entry : 0)" << endl;
57
58 for (int i = 0; i < 10; i++)
59 {
60 // construct the key
61 char c = '0' + i;
62 string key = "abc";
63 key = key + c;
64
65 // print the key, value pairs
66 cout << key << ", " << map.getValue(key) << endl;
67 }
68
69 cout << "size : " << map.size() << endl;
70 cout << endl;
71
72 return 0;
73}
xxxxxxxxxx
311Initial map info
2size : 0 <---- initial buckets: 5
3
4Insert ("abc0", 1), ("abc1", 2), ... , ("abc9", 10) into the map
5(abc0, 1) inserted - load factor: 0.2
6(abc1, 2) inserted - load factor: 0.4
7(abc2, 3) inserted - load factor: 0.6
8(abc3, 4) inserted - load factor: 0.4 <---- rehash, buckets: 10
9(abc4, 5) inserted - load factor: 0.5
10(abc5, 6) inserted - load factor: 0.6
11(abc6, 7) inserted - load factor: 0.7
12(abc7, 8) inserted - load factor: 0.4 <---- rehash, buckets: 20
13(abc8, 9) inserted - load factor: 0.45
14(abc9, 10) inserted - load factor: 0.5
15size : 10
16
17Remove ("abc1", 2) and ("abc6", 7)
18size : 8
19
20Print the key, value pairs (value of the removed entry : 0)
21abc0, 1
22abc1, 0 <---- removed entry
23abc2, 3
24abc3, 4
25abc4, 5
26abc5, 6
27abc6, 0 <---- removed entry
28abc7, 8
29abc8, 9
30abc9, 10
31size : 8