The hash table uses a hashing function to efficiently store and retrieve data, It is also known as a *hash map*, *map*, *dictionary*, or *associative array*.

An underlying array is used to store the data, the hashing function will take a key and calculate its index in the array.

The hash function must be:

- consistent
- map different words to different numbers,
- only return valid indexes.

A good hashing function, such as **SHA**, will minimise collisions.

Collisions occur when the function generates the same index for different keys.

In the event of a collision, a linked list is used at the affected index to connect the data. This obviously has performance implications because the hash table has to iterate the linked nodes to identify the desired key, or to add another item.

The *load factor* of the underlying array is a measurement of how occupied it is, for example here we have an array of five elements with two occupied:

If all five elements were occupied we would have a load factor of one, which is bad. If we had ten elements in the same array we would have a load factor of two (if there is a collision a linked list is used at the affected index thus allowing more elements than the array’s capacity).

Generally, hash tables resize when they reach a load factor of 0.7. Resizing is a slow operation.

Following is a very simple example of a hash table, with minimum functionality. It implements just enough to demonstrate the essence of the data structure.

First we have the node used to store our data, I added the index for diagnostic purposes:

```
class Node:
def __init__(self, key=None, value=None, index=None, next_node=None):
self.key = key
self.value = value
self.index = index
self.next_node = next_node
```

To keep things simple, I limit the underlying array to 26 items and populate them with empty nodes. Typically the hash table will only use a linked list at a collision, but for this example I initialise each array item as the head of a linked list. The hashing function is basic, it gets the modulus of the ordinal value of the first character of the supplied key. It assumes the key is a string.

```
class Hashtab:
NODE_COUNT = 26
def __init__(self):
self.items = [Node(index=i) for i in range(self.NODE_COUNT)]
def __getitem__(self, key):
item = self.get_node(key)
return item.value
def __setitem__(self, key, value):
item = self.get_node(key)
item.value = value
def __iter__(self):
nodes = []
for i in range(self.NODE_COUNT):
item = self.items[i]
while item and item.key:
nodes.append(item)
item = item.next_node
return iter(nodes)
def get_node(self, key):
index = self.get_index(key)
item = self.items[index]
if not item.key:
item.key = key
return item
while item.key != key and item.next_node:
item = item.next_node
if item.key == key:
return item
item.next_node = Node(key, None, index)
return item.next_node
def get_index(self, key):
return ord(key[0]) % self.NODE_COUNT
```

And here it is in action:

```
htab = Hashtab()
#
```**let's initialize some values**
htab['a'] = 'john'
htab['c'] = 7
htab['y'] = 141
htab['abcde'] = 'tom'
# **and overwrite some**
htab['y'] = 'sally'
htab['c'] = 'bingo!'
htab['abcde'] = 'fred'
# **list all the items in our hash table**
for node in htab:
next_key = None
if node.next_node:
next_key = node.next_node.key
print(f'[{node.index}] key={node.key} value={node.value} next_node => {next_key}')
# **outputs**:
[17] key=y value=sally next_node => None
[19] key=a value=john next_node => abcde
[19] key=abcde value=fred next_node => None
[21] key=c value=bingo! next_node => None

We can see in the output that there is a collision at index 19. The first item (key=a, value=john) consequently links to the next node at that index (key=abcde, value=fred).

Thankfully most languages have a highly efficient hash table already available so we don’t need to worry about writing our own. Still, it’s a fun exercise 🙂