Recently one of my friends asked me how hash table resize itself. I tried to learn if from net, that ended up in this blog post. This is an accumulation of what I had found in various articles and discussions. Lot of smart people out there already discussed it. I have included reference to some useful sites to the end of the post.

Hash Table

Hash table is a data structure available in .net. It allows you to associate a key and a value.

Hash table usually puts the Key-Value pair in a bucket. The following figure gives you an idea about it,

Given a Key, hash table will use the hash code of the key to determine where to place the key. So that all the keys that are to be placed in a hash table requires a hash code. For lookup process also the hash code is used, which is a O(1) process. There are chances that the two keys may produce the same hash code and they will end up in the same bucket. Such a situation is called collision. So you have to make sure, that you are using a very good hash function for providing the hash codes.

There are many techniques available to avoid the collisions, the most popular are chaining and open addressing.

Chaining.

When a collision occurs in chained hash technique, the bucket will refer to a linked list. Each item will be attached to the list.

Open Addressing

Here only one Key-Value pair is placed in a bucket, when a collision occurs one of the following probing methods is used to determine where to place the item.

· Linear

· Quadratic

· Double hashing.

Another important element in open addressing is the load factor, which determines the proportions of the elements in the array are to be used. In .net the default load factor is 0.72 and in java it's 0.75.

Hash Table In .Net

In .net, the bucket contains three elements

1. Key.

2. Value.

3. Hash Code.

The Hash Code will contain the significant bit of the hash code if a collision happens.

By default, if we create an instance of the hash table. The default capacity is zero and load factor is 1.0. The valid range of load factor in .net is 0.1 to 1.0. If you specify the load factor as 1.0 it will be scaled down to 0.72, as they found that values greater than 0.72 seriously degraded the performance. On resize of hash table, .net considers the smallest prime number, which is larger than the double of the current size. For example, if we specify the size as 100 and a load factor of 1 then,

ActualloadFactor = 0.72 * 1 = 0.72

// This should not exceed the maximum integer size.

ActualSize = 100 / ActualLoadFactor = 100/0.72 = 139

//Which will returns the next smallest prime number which is

// larger than (2 * //ActualSize)

ActualSize = GetPrime(ActualSize) = 278

loadSize = ActualLoadFactor * ActualSize = 0.72 * 278 = 200

if (loadSize >= ActualSize){loadSize = ActualSize - 1}

When we resize an hash table what it actual do is, it will find the load size as per above and then create temporary array copy all the items to it and will resize the original array and will copy items from the original one.

The hash function used by the .net for is as follows

h1(key) = GetHash(key); // default implementation calls key.GetHashCode();

h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1));

The h1 can return any number but h2 function will make sure that the hash code returned will be between 1 and hashsize - 1. This will ensures that each bucket is visited at least once in hashsize probes. The table size is selected as prime number or at least a coprime, in order to avoid the tendency having the same divisors for a large hash code and the hash table size, which other wise produce collisions after the modulus operation.

The notable matter here is that size of the array is limited to the maximum value of the integer. Beyond that it will throw an exception. It’s always good to initialise the array with the enough initial capacity, as it save the resizing of the array.

References

- MBR IT/.NET 247 : System.Collections.Hashtable Class [Rotor]
- C#: Dictionary capacity in VB.Net 2.0
- Hash table - Wikipedia, the free encyclopedia
- Talk:Hash table - Wikipedia, the free encyclopedia
- When you create your own key object in a Hashtable, be careful - By Tony Sintes
- An Extensive Examination of Data Structures - By Scott Mitchell