Expansion mechanism of HashMap and why the default capacity is a power of 2

Keywords: JDK

Put Method of HashMap

The data structure design of HashMap is covered elsewhere. Next, we review the put(K key, V value) process of HashMap:

(1) The hash value of the key is computed from its hashCode() method and used to calculate the index in the hash table array. Therefore, when objects of a custom class are used as keys, the class needs to override both hashCode() and equals().
(2) If there is no collision, the entry is placed directly into the bucket, that is, it becomes the head of the linked list at the corresponding position of the hash table array.
(3) If a collision occurs, the old value is replaced when a node with an equal key already exists; otherwise the new element is appended to the end of the linked list.
(4) If the length of the list reaches the threshold (TREEIFY_THRESHOLD = 8), the list is converted to a red-black tree. In JDK 8 this also requires the table length to be at least MIN_TREEIFY_CAPACITY (64); otherwise the table is resized instead. I am not familiar with red-black trees, so I will not discuss them here.
(5) If the number of entries exceeds the threshold (capacity * load factor), resize is required.
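
Steps (1) and (3) can be illustrated with a small key class. The following is only a minimal sketch: PointKey and PutDemo are hypothetical names invented for this example and do not appear in the JDK. It shows that two separately created but equal keys reach the same entry only because both hashCode() and equals() are overridden.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class for illustration; not part of the JDK.
final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Equal keys must produce the same hash code so they land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // equals() decides whether a node already in the bucket holds the same key.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey other = (PointKey) o;
        return x == other.x && y == other.y;
    }
}

class PutDemo {
    static Map<PointKey, String> build() {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "first");
        // Same logical key: the old value is replaced, as in step (3).
        map.put(new PointKey(1, 2), "second");
        return map;
    }

    public static void main(String[] args) {
        Map<PointKey, String> map = build();
        System.out.println(map.size());                  // prints 1
        System.out.println(map.get(new PointKey(1, 2))); // prints second
    }
}
```

If either override were removed, the second put would most likely create a second entry, because the default Object implementations compare identity rather than field values.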

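Expansion mechanism of HashMap

Assuming that length is the size of the hash table array, the index calculation is conceptually a modulo operation:

static int indexFor(int hash, int length) {
    // Conceptual only: in Java, % can return a negative result for a negative hash.
    return hash % length;
}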

In the resize process, elements on the same chain in the old array may end up in different positions of the new array once their index is recalculated. JDK 8 adds some optimizations here. During resize, the hash table array always grows to the next power of 2 (that is, the length is doubled), which has two advantages.

Benefit 1

In the JDK 1.7 source code of HashMap, the put method calls indexFor(int h, int length), which finds the entry's position in the hash table array based on the hash value of the key. The source code is as follows:

/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

When length is a power of 2, the code above is equivalent to taking the hash modulo length, but it uses a cheaper bit operation. Notice that the value returned is h & (length-1). If length is not a power of 2, say 15, then length-1 is 14, which is 1110 in binary. With h a random number, the result of h & 1110 always has 0 as its last bit, so positions whose last bit is 1, such as 0001, 1001 and 1101, can never be occupied by an entry. This wastes space and makes collisions more likely. When length is a power of 2, every bit of length-1 is 1, so all positions are reachable and the distribution is more even.

Benefit 2
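
The effect of the mask can be checked with a short experiment. This is a sketch only: IndexMaskDemo and reachable are hypothetical names for this example, and the hash values 0..999 are an arbitrary sample. It collects every index that h & (length-1) can produce for a table of length 16 and for a table of length 15.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexMaskDemo {
    // Same computation as indexFor above: h & (length - 1).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects every index produced for the sample hash values 0..999.
    static Set<Integer> reachable(int length) {
        Set<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Length 16: mask is 1111, so all 16 slots are reachable.
        System.out.println(reachable(16));
        // Length 15: mask is 1110, so only the 8 even slots are reachable.
        System.out.println(reachable(15));
    }
}
```

With length 16 the mask also agrees with Math.floorMod(h, 16) for negative hashes, which is why the bit operation can stand in for the modulo.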

Take two keys, key1 and key2, as an example: figure (a) shows how their index positions are determined before expansion, figure (b) shows how they are determined after expansion, and n represents length.

When the index of an element is recalculated, n has become twice as large, so the mask n-1 covers one more high-order bit (marked in red in the figure), and the new index changes as follows:

Unlike the JDK 1.7 implementation, the resize process does not need to recompute the position of every element from scratch. It only needs to check whether the newly covered bit of the original hash value is 1 or 0. If it is 0, the index remains unchanged. If it is 1, the index becomes "original index + oldCap". This saves time in two ways: the bit test is fast, and the comparatively expensive hash function does not have to be run again. The schematic diagram shows a resize from 16 to 32.

The source code is as follows:
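
The "unchanged or original index + oldCap" rule can be verified directly. The sketch below uses hypothetical names (ResizeIndexDemo, predictedIndex) and four sample hash values chosen so that they all share index 5 in a table of length 16.

```java
class ResizeIndexDemo {
    // Index in a table whose length is a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    // Predicts the index after doubling using only the extra bit (hash & oldCap).
    static int predictedIndex(int hash, int oldCap) {
        int oldIndex = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all map to index 5 when the length is 16
        for (int h : hashes) {
            System.out.println(h + ": " + indexFor(h, oldCap)
                    + " -> " + indexFor(h, oldCap * 2)
                    + " (predicted " + predictedIndex(h, oldCap) + ")");
        }
    }
}
```

Here the hashes 5 and 37 stay at index 5, while 21 and 53 move to index 21, which is 5 + 16.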

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at the maximum capacity: stop expanding and accept further collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise, double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move each bucket to the new buckets
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // Split the linked list into two lists, preserving order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Extra bit is 0: stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Extra bit is 1: moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Put the lo list in the bucket at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the hi list in the bucket at original index + oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
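
The linked-list branch of resize() can be isolated into a small standalone sketch. This is not the JDK code: SplitDemo, its Node class, and the helper methods are hypothetical simplifications that keep only the hash and the next pointer. It shows how one bucket is split into a "lo" list and a "hi" list while the relative order of the nodes is preserved.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Simplified stand-in for HashMap.Node: only the hash and the next pointer.
    static final class Node {
        final int hash;
        Node next;
        Node(int hash) { this.hash = hash; }
    }

    // Builds a singly linked list from the given hashes, in order.
    static Node chain(int... hashes) {
        Node head = null, tail = null;
        for (int h : hashes) {
            Node n = new Node(h);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // Splits one bucket into {lo, hi} lists, keeping the original node order.
    static Node[] split(Node e, int oldCap) {
        Node loHead = null, loTail = null, hiHead = null, hiTail = null;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) {       // extra bit is 0: stays in place
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                            // extra bit is 1: moves by oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        // Cut both lists so neither tail still points into the other list.
        if (loTail != null) loTail.next = null;
        if (hiTail != null) hiTail.next = null;
        return new Node[] {loHead, hiHead};
    }

    // Reads the hashes of a list back into a java.util.List for printing.
    static List<Integer> hashes(Node n) {
        List<Integer> out = new ArrayList<>();
        for (; n != null; n = n.next) out.add(n.hash);
        return out;
    }

    public static void main(String[] args) {
        Node[] parts = split(chain(5, 21, 37, 53), 16);
        System.out.println("lo: " + hashes(parts[0])); // prints lo: [5, 37]
        System.out.println("hi: " + hashes(parts[1])); // prints hi: [21, 53]
    }
}
```

In the real method the lo list would be stored at newTab[j] and the hi list at newTab[j + oldCap]; the sketch returns them as a pair only to keep the example self-contained.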


Posted by coderWil on Sun, 12 May 2019 02:22:09 -0700
