HashMap source code analysis

Today, let's look at how HashMap expands its capacity. The mechanism is hard to describe in words alone, so I'll walk through the code and try to make it clear. All of the logic lives in the resize method, which can be divided into two parts.

resize method

This method has two main parts. The first part computes the capacity of the new array (newCap) and its expansion threshold (newThr). The second part migrates the data from the old array to the new array. Let's look at the first part first.

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table; //The array of all current elements is called the old element array
        int oldCap = (oldTab == null) ? 0 : oldTab.length; //Old element array length
        int oldThr = threshold;    // Old capacity expansion threshold setting
        int newCap, newThr = 0;    // The capacity of the new array and the expansion threshold of the new array are initialized to 0
        if (oldCap > 0) {    // If the old array length is greater than 0, the table has already been allocated
            // Case 1
            if (oldCap >= MAXIMUM_CAPACITY) { // If the old capacity has already reached the maximum capacity (2 to the 30th power)
                // The array cannot grow any further (doubling oldCap would overflow int), so the threshold is set to the maximum int value (2 to the 31st power minus 1).
                threshold = Integer.MAX_VALUE;
                return oldTab;    // Return the old array unchanged
            }

            /*
             * If the old capacity is within the normal range, the new capacity is twice the old capacity (shifting left by 1 bit is equivalent to multiplying by 2).
             * If the doubled capacity is still less than the maximum capacity and the old capacity is greater than or equal to the default initial capacity (16), the new threshold is set to twice the old threshold. (An old capacity of at least 16 means the map was created with the default capacity, was given an initial capacity of 16 or more in the constructor, or has been expanded at least once.)
             * Otherwise newThr stays 0 and is computed from the load factor further down.
             */
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }

        // Case 2
        // Reaching this else if means the table has not been allocated yet (oldCap is 0)
        // If the old threshold is greater than 0, the new capacity is set to that threshold
        // This happens when an initial capacity was specified when constructing the map: the constructor stores it (rounded up to a power of 2) in threshold.
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            // If it can run here, it means that the map is created by calling the parameterless constructor and adding elements for the first time
            newCap = DEFAULT_INITIAL_CAPACITY;    // Set the new array capacity to 16
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // Set the new array expansion threshold to 16 * 0.75 = 12. 0.75 is the load factor (when the number of elements reaches 3 / 4 of the capacity, the capacity will be expanded)
        }

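        // If the new threshold is still 0 (case 2 with a specified initial capacity, or case 1 when the threshold was not doubled), compute it as new capacity * load factor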
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);  
        }
        threshold = newThr; // Set the capacity expansion threshold of map to the new threshold
        // ... the second part (data migration) is omitted for now ...
        return newTab; // Returns a new array
    }

The code above handles two cases.

The first case is marked "Case 1" in the code above. If the old array length is greater than 0, the table has already been allocated. If the old capacity has already reached the maximum capacity, the threshold is raised to Integer.MAX_VALUE and the old array is returned unchanged. Otherwise the new capacity is twice the old capacity, and, provided the old capacity is at least 16 and the new capacity is below the maximum, the new threshold is also twice the old threshold.

The second case is marked "Case 2" in the code above. Reaching it means the table has not been allocated yet. If an initial capacity was specified in the constructor, it was stored in threshold, so the new capacity is set to the old threshold and the new threshold is computed by the newThr == 0 block that follows (new capacity * load factor). Otherwise the defaults are used: a capacity of 16 and a threshold of 12.
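To make the two cases concrete, here is a minimal sketch that copies only the sizing branches into a standalone helper so they can be tried with different inputs. The class ResizeSizing and the method nextSize are hypothetical names made up for this illustration; they are not part of java.util, and the real method updates the map's fields instead of returning a pair.

```java
import java.util.Arrays;

// Sketch only: mirrors the sizing branches of resize() shown above.
// ResizeSizing and nextSize are hypothetical names, not JDK API.
final class ResizeSizing {
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** Returns {newCap, newThr} for the given old capacity and old threshold. */
    static int[] nextSize(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {                                   // Case 1: table already allocated
            if (oldCap >= MAXIMUM_CAPACITY) {
                return new int[] { oldCap, Integer.MAX_VALUE }; // cannot grow any further
            } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY
                    && oldCap >= DEFAULT_INITIAL_CAPACITY) {
                newThr = oldThr << 1;                       // double the threshold
            }
        } else if (oldThr > 0) {                            // Case 2: initial capacity kept in threshold
            newCap = oldThr;
        } else {                                            // Case 2: defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {                                  // threshold not set yet
            float ft = (float) newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                    ? (int) ft : Integer.MAX_VALUE;
        }
        return new int[] { newCap, newThr };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextSize(0, 0, 0.75f)));   // [16, 12]
        System.out.println(Arrays.toString(nextSize(0, 32, 0.75f)));  // [32, 24]
        System.out.println(Arrays.toString(nextSize(16, 12, 0.75f))); // [32, 24]
        System.out.println(Arrays.toString(nextSize(8, 6, 0.75f)));   // [16, 12]
    }
}
```

The last call shows the less obvious path: the old capacity (8) is below 16, so the threshold is not doubled and is instead recomputed from the load factor.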

Next, let's look at the process of data migration of old arrays

  final Node<K,V>[] resize() {
        // ... the first part (computing newCap and newThr) is omitted ...
        // Create the new array (on the first insertion this is the initial array; when oldTab exists, it is the enlarged array)
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;    // Point the table attribute of the map to the new array
        if (oldTab != null) {    // If the old array is not empty, it indicates that it is a capacity expansion operation, which involves the transfer of elements
            for (int j = 0; j < oldCap; ++j) { // Traversing the old array
                Node<K,V> e;
                if ((e = oldTab[j]) != null) { // If the current position element is not empty, it needs to be transferred to the new array
                    oldTab[j] = null; // Clear the old array's reference to this bucket (mainly so the old array can be garbage collected)
                    if (e.next == null) // If the element does not have a next node, there is no hash conflict for the element
                        // PS3
                        // Store the element in the new array. Its index is the hash value modulo the array length
                        // [hash value % array length] = [hash value & (array length - 1)]
                        // This identity only holds when the array length is a power of 2, yet the constructor accepts any initial capacity. What if 17 or 15 is specified? That is not a problem: the tableSizeFor method rounds the requested value up to the smallest power of 2 that is greater than or equal to it. 15 -> 16, 17 -> 32
                        newTab[e.hash & (newCap - 1)] = e;

                        // If the element has a next node, there is a linked list at this position (multiple elements whose hashes map to the same index are stored in this bucket of the old array as a linked list)
                        // For example, with an array length of 16, an element with hash 1 (1 % 16 = 1) and an element with hash 17 (17 % 16 = 1) are both stored in the second position of the array (subscript 1). After the array is expanded to 32, the element with hash 1 (1 % 32 = 1) still belongs in the second position of the new array, but the element with hash 17 (17 % 32 = 17) belongs in the 18th position.
                        // Therefore, after the array is expanded, every element's position in the new array must be recalculated.
                    else if (e instanceof TreeNode)  // If the node is of TreeNode type
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);  // Tree bins are split by TreeNode.split, which is a separate topic
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;  // Judging by the names: head and tail of the "low" list
                        Node<K,V> hiHead = null, hiTail = null;  // Judging by the names: head and tail of the "high" list
                        // "Low" refers to indices 0 to oldCap - 1 of the new array, and "high" to indices oldCap to newCap - 1
                        Node<K,V> next;
                        // Traversal linked list
                        do {  
                            next = e.next;
                            // This part divides the new array into a high half and a low half.
                            // For example, if the length of the new array is 32, [0,15] is low and [16,31] is high
                            // (e.hash & oldCap) tests the single extra hash bit that the new, longer mask takes into account
                            // The goal is to spread the linked list from one old bucket over two positions in the new array
                            if ((e.hash & oldCap) == 0) {  
                                // PS4
                                if (loTail == null) // If there is no tail, the linked list is empty
                                    loHead = e; // When the linked list is empty, the header node points to the element
                                else
                                    loTail.next = e; // If there is a tail, the linked list is not empty. Hang the element to the end of the linked list.
                                loTail = e; // Set the tail node as the current element
                            }
                            else { 
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) { // The linked list composed of low-order elements is still placed in the original position
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {  // The position of the linked list composed of high-order elements is only offset by the length of the old array.
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead; // For example, with an old length of 16, hash 17 sits at subscript 1 in the old array and at subscript 17 in the new array; hash 18 sits at subscript 2 in the old array and at subscript 18 in the new array
                        }
                    }
                }
            }
        }
        return newTab; // Returns a new array
    }
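The PS3 comment in the code above relies on two facts: the index is computed with a bit mask instead of a modulo, and the capacity is always rounded up to a power of 2. Here is a small sketch that checks both. IndexMask, indexFor and roundUpToPowerOfTwo are hypothetical names written for this post; in particular, roundUpToPowerOfTwo is a simple loop that only reproduces the result of tableSizeFor for requests between 1 and 2 to the 30th power, not its actual implementation.

```java
// Sketch only: IndexMask and its methods are hypothetical, not JDK API.
final class IndexMask {
    /** Bucket index as computed in resize(): hash & (length - 1). */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    /** Smallest power of two >= cap, assuming 1 <= cap <= 2^30. */
    static int roundUpToPowerOfTwo(int cap) {
        int n = 1;
        while (n < cap) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(1, 16));          // 1
        System.out.println(indexFor(17, 16));         // 1
        System.out.println(indexFor(1, 32));          // 1
        System.out.println(indexFor(17, 32));         // 17
        System.out.println(roundUpToPowerOfTwo(15));  // 16
        System.out.println(roundUpToPowerOfTwo(17));  // 32

        // For a power-of-two length the mask agrees with the mathematical
        // modulo (Math.floorMod), including for negative hash values.
        for (int hash = -100; hash <= 100; hash++) {
            if (indexFor(hash, 16) != Math.floorMod(hash, 16)) {
                throw new AssertionError("mask and modulo differ for " + hash);
            }
        }
    }
}
```

Note that the plain % operator would return a negative index for a negative hash, which is one reason the mask form is the convenient one here.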

The hard part of this code is understanding how the entries of one old bucket are distributed between two positions in the new array. First, look at the following expression:

(e.hash & oldCap) == 0

Suppose the length of the old array is 16 and the length of the new array is 32. Elements with hash values 15 and 31 both sit at subscript 15 of the old array (15 & 15 = 15 and 31 & 15 = 15).

Now apply e.hash & oldCap to each of them. For hash 15, 15 & 16 = 0, so the element stays at the same subscript as in the old array, i.e. 15 in the new array. For hash 31, 31 & 16 = 16, which is not 0, so the element moves to the old subscript plus the length of the old array, i.e. 15 + 16 = 31 in the new array. The next section will look at interview questions about HashMap.
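The split rule described above can be checked with a short sketch. It compares the position chosen by the (hash & oldCap) test with the position you would get by recomputing hash & (newCap - 1) from scratch. SplitDemo and newIndex are hypothetical names used only for this illustration.

```java
// Sketch only: SplitDemo and newIndex are hypothetical, not JDK API.
final class SplitDemo {
    /** New position of an element, decided the way the lo/hi split decides it. */
    static int newIndex(int hash, int oldIndex, int oldCap) {
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;

        // The two hashes from the example: both sit at subscript 15 in the old array.
        for (int hash : new int[] { 15, 31 }) {
            int oldIndex = hash & (oldCap - 1);
            System.out.println(hash + ": " + oldIndex + " -> " + newIndex(hash, oldIndex, oldCap));
        }
        // prints "15: 15 -> 15" and then "31: 15 -> 31"

        // The shortcut always agrees with recomputing the index from scratch.
        for (int hash = 0; hash < 1024; hash++) {
            int oldIndex = hash & (oldCap - 1);
            if (newIndex(hash, oldIndex, oldCap) != (hash & (newCap - 1))) {
                throw new AssertionError("mismatch for hash " + hash);
            }
        }
    }
}
```

The two computations agree because newCap - 1 has exactly one more bit set than oldCap - 1, and that bit is oldCap itself: if the bit is 0 in the hash the index does not change, and if it is 1 the index grows by oldCap.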

Posted by dprichard on Fri, 03 Dec 2021 23:12:13 -0800