Hash Table

The hash table uses a hashing function to efficiently store and retrieve data, It is also known as a hash map, map, dictionary, or associative array.

An underlying array is used to store the data, the hashing function will take a key and calculate its index in the array.

The hash function must be:

  • consistent
  • map different words to different numbers,
  • only return valid indexes.

A good hashing function, such as SHA, will minimise collisions.

Collisions occur when the function generates the same index for different keys.

In the event of a collision, a linked list is used at the affected index to connect the data. This obviously has performance implications because the hash table has to iterate the linked nodes to identify the desired key, or to add another item.

The load factor of the underlying array is a measurement of how occupied it is, for example here we have an array of five elements with two occupied:

If all five elements were occupied we would have a load factor of one, which is bad. If we had ten elements in the same array we would have a load factor of two (if there is a collision a linked list is used at the affected index thus allowing more elements than the array’s capacity).

Generally, hash tables resize when they reach a load factor of 0.7. Resizing is a slow operation.

Following is a very simple example of a hash table, with minimum functionality. It implements just enough to demonstrate the essence of the data structure.

First we have the node used to store our data, I added the index for diagnostic purposes:

class Node:
    def __init__(self, key=None, value=None, index=None, next_node=None):
        self.key = key
        self.value = value
        self.index = index
        self.next_node = next_node

To keep things simple, I limit the underlying array to 26 items and populate them with empty nodes. Typically the hash table will only use a linked list at a collision, but for this example I initialise each array item as the head of a linked list. The hashing function is basic, it gets the modulus of the ordinal value of the first character of the supplied key. It assumes the key is a string.

class Hashtab:
    NODE_COUNT = 26

    def __init__(self):
        self.items = [Node(index=i) for i in range(self.NODE_COUNT)]

    def __getitem__(self, key):
        item = self.get_node(key)
        return item.value

    def __setitem__(self, key, value):
        item = self.get_node(key)
        item.value = value

    def __iter__(self):
        nodes = []

        for i in range(self.NODE_COUNT):
            item = self.items[i]

            while item and item.key:
                item = item.next_node

        return iter(nodes)

    def get_node(self, key):
        index = self.get_index(key)
        item = self.items[index]

        if not item.key:
            item.key = key
            return item

        while item.key != key and item.next_node:
            item = item.next_node

        if item.key == key:
            return item

        item.next_node = Node(key, None, index)
        return item.next_node

    def get_index(self, key):
        return ord(key[0]) % self.NODE_COUNT

And here it is in action:

htab = Hashtab()

# let's initialize some values
htab['a'] = 'john'
htab['c'] = 7
htab['y'] = 141
htab['abcde'] = 'tom'

# and overwrite some
htab['y'] = 'sally'
htab['c'] = 'bingo!'
htab['abcde'] = 'fred'

# list all the items in our hash table
for node in htab:
    next_key = None

    if node.next_node:
        next_key = node.next_node.key

    print(f'[{node.index}] key={node.key} value={node.value} next_node => {next_key}')

# outputs:

[17] key=y value=sally next_node => None
[19] key=a value=john next_node => abcde
[19] key=abcde value=fred next_node => None
[21] key=c value=bingo! next_node => None

We can see in the output that there is a collision at index 19. The first item (key=a, value=john) consequently links to the next node at that index (key=abcde, value=fred).

Thankfully most languages have a highly efficient hash table already available so we don’t need to worry about writing our own. Still, it’s a fun exercise 🙂

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s