HashMap Source Learning

Keywords: Java, JDK

Common methods

put method

Description:

One of the most commonly used methods; it adds a key-value pair to the hash table. This method does not perform the actual work itself. Instead, it delegates to the putVal method.

Code:

public V put(K key, V value) {
    // Pass the spread hash, key and value; onlyIfAbsent=false (always replace an existing value), evict=true
    return putVal(hash(key), key, value, false, true);
}
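The difference the onlyIfAbsent flag makes is visible from the public API: put passes false, while putIfAbsent passes true. A minimal sketch (class and variable names are mine):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        // put -> putVal(..., onlyIfAbsent=false, ...): always replaces
        Integer old = map.put("a", 2);          // returns the old value 1
        // putIfAbsent -> putVal(..., onlyIfAbsent=true, ...): keeps the existing value
        Integer kept = map.putIfAbsent("a", 3); // returns the current value 2
        System.out.println(old + " " + kept + " " + map.get("a")); // 1 2 2
    }
}
```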

The hash method is called here to compute the hash of the key; the hash method itself is described separately below.

putVal method

Description:

The method by which the put operation is actually performed.

Code:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // tab - reference to the current hash table
    // p - the node at the computed index (not necessarily the target node; it merely shares the same (n - 1) & hash index). When non-null it may be the head of a linked list or of a red-black tree
    // n - current table capacity
    // i - the key's index in the table (like p, not necessarily the target)
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Initialize the local variable tab and check whether it is null; initialize n and check whether it is 0
    // PS: this assign-inside-condition style is used heavily throughout the JDK source
    if ((tab = table) == null || (n = tab.length) == 0)
        // tab is null or n is 0: the table has not been initialized yet. Call resize() to initialize it and refresh the local variables tab and n
        n = (tab = resize()).length;
    
    // Initialize p and i
    // The index is computed as (n - 1) & hash; this is explained separately below
    // When p is null
    if ((p = tab[i = (n - 1) & hash]) == null)
        // The slot is empty: create a new node via newNode and store it at that index
        tab[i] = newNode(hash, key, value, null);
    else {
        // p is not null: a hash collision occurred; handle it

        // e - target node
        // k - key of the target node
        Node<K,V> e; K k;

        // Check whether the keys are the same (the hash is compared in addition to the key itself)
        // Note that the local variable k is assigned here too; if the second condition short-circuits, k stays unset and can be ignored
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Same key: set e (the target node) to p
            e = p;
        // Check whether the node is a red-black tree node
        else if (p instanceof TreeNode)
            // If so, delegate directly
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Reaching here means the bin holds an ordinary linked list; traverse it looking for the key
            // binCount is used only to decide whether the treeify threshold has been reached
            for (int binCount = 0; ; ++binCount) {
                
                // Get the next element of the list and assign it to e (an intermediate variable here; not necessarily the target node)
                // On the first iteration p is the bin's head node; thereafter e is always p.next
                if ((e = p.next) == null) {
                    // Reached the end of the linked list

                    // Append the new node to the end of the list
                    p.next = newNode(hash, key, value, null);

                    // Check whether the list length has reached the treeify threshold
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Delegate to treeifyBin
                        treeifyBin(tab, hash);
                    // Break out of the loop
                    // Note that the local variable e == null at this point
                    break;
                }

                // Reaching here means the list is not exhausted; check whether e has the same key (hash comparison plus == / equals)
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Keys are the same
                    break;

                // e is neither null nor the target node; assign it to p in preparation for the next iteration
                p = e;
            }
        }
        
        // Check whether e exists
        if (e != null) { // existing mapping for key
            // e != null means this operation is a replacement

            // Cache the old value
            V oldValue = e.value;
            // Replace unless onlyIfAbsent is set, or when the old value is null
            if (!onlyIfAbsent || oldValue == null)
                // Update the value of node e
                e.value = value;
            // Callback
            afterNodeAccess(e);
            // Return the old value
            // Note that this returns without updating modCount or running the logic below
            return oldValue;
        }
    }

    // Every insertion path ends up here; replacements returned above (whether putTreeVal can also reach here is to be confirmed)
    // modCount + 1
    ++modCount;
    // size + 1 (element count + 1)
    // Check whether the threshold is exceeded
    if (++size > threshold)
        // Expand the table
        resize();
    // Post-insertion callback
    afterNodeInsertion(evict);
    return null;
}
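The index expression (n - 1) & hash used above can be tried in isolation. For a power-of-two table length n, the mask keeps only the low bits of the hash, so for non-negative hashes it equals hash % n, and it always stays in range even for negative hashes. A small standalone demo (names are mine):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap
        int[] hashes = {0, 1, 15, 16, 17, 12345, -7};
        for (int h : hashes) {
            // Bit-mask index computation used by putVal; for non-negative h
            // and power-of-two n this equals h % n, and it never leaves [0, n)
            int idx = (n - 1) & h;
            System.out.println(h + " -> " + idx);
        }
    }
}
```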

resize method

Description:

Initializes or doubles the table, then redistributes the existing entries into their new slots.

Code:

final Node<K,V>[] resize() {
    //-------------------------------------------------- Calculation of new capacity and threshold-------------------------------
    
    // Cache Bucket Reference
    Node<K,V>[] oldTab = table;
    // Cache the length of the old bucket, using 0 when the bucket is null
    // Note that oldTab.length, not size, is used here
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Cache Threshold
    int oldThr = threshold;
    // New Bucket Capacity and Threshold
    int newCap, newThr = 0;
    
    // Old capacity is greater than 0: the table has already been initialized (and possibly resized before)
    if (oldCap > 0) {
        
        // Old capacity already at or beyond MAXIMUM_CAPACITY = 1 << 30 = 1073741824
        // Capacity grows by << 1; shifting again once oldCap >= MAXIMUM_CAPACITY could overflow, and the largest representable value is Integer.MAX_VALUE
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Set threshold to Integer.MAX_VALUE
            threshold = Integer.MAX_VALUE;
            // Direct return. Discard all subsequent processing
            return oldTab;
        }
        // Initialize newCap to oldCap << 1
        // The threshold is doubled only when newCap is still below MAXIMUM_CAPACITY and oldCap is at least DEFAULT_INITIAL_CAPACITY (16)
        // Otherwise newThr stays 0: either newCap has reached MAXIMUM_CAPACITY, or oldCap is small (< 16) and the threshold is recomputed below
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Set newThr to oldThr << 1 (no correctness check is done here; to be verified)
            newThr = oldThr << 1; // double threshold
    }
    // Check whether the old threshold is greater than 0
    // Reaching here means oldCap == 0: the map was constructed with an initialCapacity argument and no elements have been added yet
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Use the old threshold as the new capacity (newCap is 0 at this point)
        // Note: constructors taking an initialCapacity parameter have already rounded the threshold up to a power of two
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // Default path: with the no-argument constructor, oldThr and oldCap are both 0
        // Assign to newCap using default initialization capacity
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Assign to newThr by multiplying the load factor with the default initialization capacity
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    
    // Unified processing of newThr
    if (newThr == 0) {
        // Multiplication of new capacity with load factor
        float ft = (float)newCap * loadFactor;
        // When both newCap and ft are below MAXIMUM_CAPACITY, newThr = (int) ft; otherwise newThr = Integer.MAX_VALUE
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    
    //------------------------------------------------ Element rearrangement-------------------------------
    // Update threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Create the new table array (empty at this point)
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // Determine if the old bucket is empty
    if (oldTab != null) {
        // The old bucket is not empty. Traverse
        for (int j = 0; j < oldCap; ++j) {
            // Current bin element
            Node<K,V> e;
            // Fetch the element at slot j
            // Check whether it exists (empty slots are common given the (n - 1) & hash distribution)
            if ((e = oldTab[j]) != null) {
                // Clear the old table's reference
                oldTab[j] = null;
                // Check whether this element has a next node
                if (e.next == null)
                    // Single node: recompute its index in the new table with the same algorithm and store it directly
                    newTab[e.hash & (newCap - 1)] = e;
                // The bin has more than one node; check whether it is a TreeNode
                else if (e instanceof TreeNode)
                    // Delegate to split
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // For a linked-list bin, split the nodes into a low group and a high group

                    // loHead and loTail are the head and tail of the low group
                    Node<K,V> loHead = null, loTail = null;
                    // hiHead and hiTail are the head and tail of the high group
                    Node<K,V> hiHead = null, hiTail = null;
                    // next
                    Node<K,V> next;
                    // The first element is known to exist, so a do-while is used directly
                    do {
                        // Cache e.next
                        next = e.next;
                        // Determine whether the relevant hash bit of e is set (high group)
                        // The reasoning is as follows:
                        // oldCap is always a power of two, whose binary form is 1000...0
                        // The index is computed as (n - 1) & hash, i.e. 0111...1 & hash
                        // Doubling n adds one more 1 to the mask, so the new index depends on exactly one extra hash bit
                        // Example with oldCap = 8 (old mask 0111): hash 0101 gives index 101 under both the old and the new mask (1111), so it stays put (low group)
                        // Example with hash 1101: the old mask gives 101 but the new mask 1111 gives 1101, so it moves (high group)
                        // Recomputing (n - 1) & hash for every element would work but is wasteful
                        // So the JDK tests only that one extra bit: hash & oldCap (e.g. 1101 & 1000)
                        // When the result is 0 the node belongs to the low group; otherwise it belongs to the high group
                        if ((e.hash & oldCap) == 0) {
                            // Low group
                            // Check whether the low-group tail exists
                            if (loTail == null)
                                // No tail means the group is empty so far; initialize the head
                                loHead = e;
                            else
                                // Otherwise append to the tail's next
                                loTail.next = e;
                            // Update the tail
                            loTail = e;
                        }
                        else {
                            // High group
                            if (hiTail == null)
                                // No tail means the group is empty so far; initialize the head
                                hiHead = e;
                            else
                                // Otherwise append to the tail's next
                                hiTail.next = e;
                            // Update the tail
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    
                    // Finish up
                    // Check whether the low group is non-empty
                    if (loTail != null) {
                        // Not empty
                        // Clear the tail's next. E.g. when loTail was followed in the old list by a high-group element, its stale next reference must be severed
                        loTail.next = null;
                        // The low group keeps its original index
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        // Not empty
                        // Clear the tail's next, for the same reason as above
                        hiTail.next = null;
                        // The high group goes to original index + old capacity
                        newTab[j + oldCap] = hiHead;
                    }
                    }
                }
            }
        }
    }
    // Return newTab
    return newTab;
}
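The low/high split performed by resize can be checked in isolation: testing the single bit hash & oldCap must agree with recomputing the full index against the doubled mask. A small sketch (names and sample hashes are mine):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16; // old table length
        int[] hashes = {5, 21, 37, 53}; // all land in slot 5 while cap is 16
        for (int h : hashes) {
            int oldIdx = (oldCap - 1) & h;
            // The bit resize tests: 0 -> stays at oldIdx, else moves to oldIdx + oldCap
            int newIdx = ((h & oldCap) == 0) ? oldIdx : oldIdx + oldCap;
            // Must match recomputing the index against the doubled table
            System.out.println(h + ": " + oldIdx + " -> " + newIdx
                    + " (check " + ((2 * oldCap - 1) & h) + ")");
        }
    }
}
```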

treeifyBin method

Description:

Called when the linked list in a slot grows past the threshold; replaces the linked list at that slot with a red-black tree.
Note: this method only replaces the plain list nodes with TreeNode objects. At that point the bin is still a linked structure, not yet an assembled red-black tree. The actual assembly is finished at the end by calling TreeNode.treeify with the head of the list.

Code:

final void treeifyBin(Node<K,V>[] tab, int hash) {
    // n - length of the parameter tab
    // index - the index of hash within tab
    // hash - hash of the pending list node
    // e - target node
    int n, index; Node<K,V> e;
    // Check whether tab is null or its length is less than MIN_TREEIFY_CAPACITY = 64
    // That is, even when a single list in a bin has met the length requirement (e.g. binCount >= TREEIFY_THRESHOLD - 1 in putVal), treeification will not happen unless the table capacity is also up to standard
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // The table is null or its capacity is below MIN_TREEIFY_CAPACITY
        // Expand the table instead
        resize();
    // Treeification is allowed; check that the bin's head node exists
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // The head node exists
        
        // Head and tail of the tree-node list
        TreeNode<K,V> hd = null, tl = null;
        // The first element is known to exist, so a do-while is used directly
        do {
            // Construct a TreeNode (no extra logic here; it simply wraps the current e)
            // Note that TreeNode inherits from LinkedHashMap.Entry and therefore carries before/after links for a doubly linked list. However, TreeNode maintains its own prev/next doubly linked list and does not use the inherited one.
            TreeNode<K,V> p = replacementTreeNode(e, null);
            // Check whether the tail is empty
            if (tl == null)
                // Initialize the head
                hd = p;
            else {
                // Tail is not empty
                // Set the new node's prev
                // and the tail's next
                p.prev = tl;
                tl.next = p;
            }
            // Advance the tail
            tl = p;
        } while ((e = e.next) != null);
        
        // Store the head in the slot and check that it is non-null
        if ((tab[index] = hd) != null)
            // Call TreeNode.treeify to assemble the red-black tree
            hd.treeify(tab);
    }
}

Even when called, this method does not guarantee that the linked list in the slot is replaced by a red-black tree: the table capacity must also have reached the threshold MIN_TREEIFY_CAPACITY.
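To see treeification-scale collisions without touching HashMap internals, a hypothetical key class with a deliberately constant hashCode forces every entry into one bin; lookups still succeed because the bin falls back on equals (and, for Comparable keys, compareTo), whether it ends up as a long list or a tree:

```java
import java.util.HashMap;

public class CollisionDemo {
    // Hypothetical key: constant hashCode forces all entries into one bin
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) // well past TREEIFY_THRESHOLD (8)
            map.put(new BadKey(i), i);
        System.out.println(map.size() + " " + map.get(new BadKey(57))); // 100 57
    }
}
```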

TreeNode.treeify method

Description:

The actual worker that converts the linked-list data into a red-black tree (by this point every object in the list is already of type TreeNode).

Code:

final void treeify(Node<K,V>[] tab) {
    // Root node (black node)
    TreeNode<K,V> root = null;
    // Iterate (this here is the TreeNode instance the call started from)
    // x is the node being processed in the current iteration
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        // Cache next
        next = (TreeNode<K,V>)x.next;
        // Ensure that the left and right nodes of the current node are null
        x.left = x.right = null;
        // Check whether a root node exists yet
        if (root == null) {
            // It does not
            // The root has no parent node, so set parent to null
            x.parent = null;
            // In a red-black tree the root is black
            x.red = false;
            // Save to the local variable
            root = x;
        }
        else {
            // A root node already exists
            
            // Cache key
            K k = x.key;
            // Cache hash
            int h = x.hash;
            // key type
            Class<?> kc = null;
            // ----------------------------------------------------------------------------------------------------------------------------------------------------------------
            // p walks down from the root; it ends as the parent of the inserted node
            for (TreeNode<K,V> p = root;;) {
                // dir - indicates the direction: left or right
                // ph - hash of p
                int dir, ph;
                // Key of p
                K pk = p.key;
                
                // ----------------------------------------------------------------------------------------------------------------------------------------------------------
                // Initialize ph
                // Check whether p's hash is greater than the current node's hash
                if ((ph = p.hash) > h)
                    // dir = -1: insert to the left of p
                    dir = -1;
                // Check whether p's hash is less than the current node's hash
                else if (ph < h)
                    // dir = 1: insert to the right of p
                    dir = 1;
                // Equal hashes require extra processing
                // Class-based comparison methods are used here, which ultimately guarantee a usable dir value. TODO to be supplemented
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);


                // --------------------------------------------------------------------------------------------------------------------------------------------------------
                // Cache the prospective parent of the inserted node
                TreeNode<K,V> xp = p;
                // Use dir to descend into p's left or right child; when the child is non-null, continue to the next iteration
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    // The chosen child slot is null: insert here

                    // Set the parent
                    x.parent = xp;
                    // Decide left or right again
                    if (dir <= 0)
                        // Attach the current node as xp's left child
                        xp.left = x;
                    else
                        // Attach the current node as xp's right child
                        xp.right = x;
                    // Rebalance
                    root = balanceInsertion(root, x);
                    // Exit the insertion-point search and move on to the next element
                    break;
                }
            }
        }
    }
    
    // Rotations may have made another node the root, leaving the node stored in the bucket slot as an interior node; moveRootToFront fixes this up
    moveRootToFront(tab, root);
}

hash method

Description:

HashMap's own hash computation. The object's original hashCode is XORed with its upper 16 bits, and the result is returned as the hash used for key comparison and index calculation.

Code:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The raw hashCode is not used directly here in order to cope with poorly distributed hash functions:
when the entropy sits mostly in the high bits, the low-bit index mask distributes keys badly across the buckets, which hurts performance. So the original hash XORed with its upper half is used as the actual hash.
A 16/16 split, rather than 8/8/8/8 or some other scheme, is a compromise the JDK developers settled on after weighing speed, utility, and hash quality.
It also reflects that most hashCode implementations in the current JDK are already reasonably well distributed, so heavier mixing is unnecessary.
The calculation looks like this:
10000000000000000000000000000000
00000000000000001000000000000000
10000000000000001000000000000000
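The same mixing step can be reproduced on raw int hash codes to show why it matters for a small table (the spread helper mirrors the formula above; the sample value is mine):

```java
public class HashSpread {
    // Same bit-mixing as HashMap.hash(Object), applied to a raw hashCode value
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;           // small table: only the low 4 bits select the slot
        int raw = 0x00050000; // all entropy in the high half of the hash
        System.out.println("without spreading: " + ((n - 1) & raw));
        System.out.println("with spreading: " + ((n - 1) & spread(raw)));
    }
}
```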

tableSizeFor method

Description:

Typically used to compute the initial threshold; it returns the smallest power of two greater than or equal to the input value. If the input is already a power of two, it is returned unchanged.

Code:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

Walking through the process (ignore cap - 1 for now; it is covered at the end):
10000  16  initial state
11000  24  n |= n >>> 1, i.e. n = 10000 | 01000 = 11000
11110  30  n |= n >>> 2, i.e. n = 11000 | 00110 = 11110
11111  31  n |= n >>> 4, i.e. n = 11110 | 00001 = 11111
11111  31  (unchanged)
11111  31  (unchanged)
100000 32  n + 1
Finally, the + 1 turns ...111 into 100...0, a power of two

An interesting way to think about it: only the input's highest set bit matters.
By repeatedly copying the highest set bit (1) down into the lower bits, all lower bits become 1, which yields (2^n) - 1,
the largest value sharing that highest bit.
Adding 1 to this value then produces 2^n, the smallest power of two greater than the input.

Role of cap - 1:
If the input is already a power of two, the method should return it unchanged. Subtracting 1 before running the logic above achieves exactly that.
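The method is easy to verify by copying it verbatim and probing a few inputs (MAXIMUM_CAPACITY is reproduced from the HashMap source):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same computation as HashMap.tableSizeFor(int)
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 2, 15, 16, 17, 1000};
        for (int c : inputs)
            System.out.println(c + " -> " + tableSizeFor(c));
    }
}
```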

Internal Class

HashIterator

Description:

HashMap's own iterator base class. Being an inner class lets it reference the enclosing map's fields directly. It implements the necessary machinery such as remove, nextNode, and hasNext. To make subclassing easy, nextNode returns the Node object itself, from which key and value can be obtained directly.

Code:

abstract class HashIterator {
    /**
    * Next Node 
    */
    Node<K,V> next;        // next entry to return
    /**
    * Current Node 
    */
    Node<K,V> current;     // current entry
    /**
    * Modification Count
    */
    int expectedModCount;  // for fast-fail
    /**
    * Current subscript (for parent member variable table, it points to a slot in the bucket)
    */
    int index;             // current slot

    /**
    * Construction method 
    */
    HashIterator() {
        // Cache Modification Count
        expectedModCount = modCount;
        // Cache Bucket 
        Node<K,V>[] t = table;
        // Clear current and next (is this necessary, given the field defaults?)
        current = next = null;
        // Start the index at 0 (why the explicit assignment?)
        index = 0;
        // Check if the bucket has been initialized 
        if (t != null && size > 0) { // advance to first entry
            // Advance to the first non-empty slot and cache its head as next
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        // Reference to the current table
        Node<K,V>[] t;
        // Cache next: it is about to be replaced, and e is also the value that will be returned
        Node<K,V> e = next;
        // fast-fail
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        // next should not be empty
        if (e == null)
            throw new NoSuchElementException();
        // ------------------------ Find the next node -----------------------
        // Set current to e and next to e.next
        // When e.next is null, fetch the table
        // and check that it is non-null
        // (reaching here means a node was obtained from the table earlier, so how could the table be null? presumably a defensive check)
        if ((next = (current = e).next) == null && (t = table) != null) {
            // e.next was null; switch slots to find the next node
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        // fast-fail
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        // Delete the current in the current iteration
        current = null;
        // Get key
        K key = p.key;
        // Call the outer class's delete method
        // The movable flag only matters for TreeNode bins; the effect of passing false is to be verified
        // It is evidently set to false inside iterators
        removeNode(hash(key), key, null, false, false);
        // update count
        expectedModCount = modCount;
    }
}
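The two removal paths discussed above behave very differently: Iterator.remove resynchronizes expectedModCount, while modifying the map directly during iteration trips the fast-fail check on the next call to nextNode. A sketch:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;

public class IteratorDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1); map.put("b", 2); map.put("c", 3);

        // remove() via the iterator keeps expectedModCount in sync: safe
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("b"))
                it.remove();
        }
        System.out.println(map.containsKey("b")); // false

        // Structural modification behind the iterator's back: fast-fail
        try {
            for (String k : map.keySet())
                map.put("d", 4); // changes modCount mid-iteration
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```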

KeyIterator

Description:

Implementation of HashIterator. Wraps the nextNode method and returns the key of Node

Code:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

ValueIterator

Description:

Implementation of HashIterator. Wraps the nextNode method and returns the value of Node

Code:

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

EntryIterator

Description:

Implementation of HashIterator. Wraps the nextNode method and returns Node directly

Code:

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

KeySet

Description:

Inherits AbstractSet; as an inner class it implements its methods by calling through directly to the enclosing HashMap.

Code:

final class KeySet extends AbstractSet<K> {
    /**
    * Returns the member variable size of hashMap
    * @return 
    */
    public final int size()                 { return size; }
    /**
    * Because the method names clash, the outer method must be called as ClassName.this.methodName()
    */
    public final void clear()               { HashMap.this.clear(); }
    /**
    * Returns an internal class key iterator
    * @return 
    */
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    /**
    * Call parent method
    * @param o
    * @return 
    */
    public final boolean contains(Object o) { return containsKey(o); }
    /**
    * Calls the outer removeNode. matchValue=false (no value match required), movable=true
    * @param key
    * @return 
    */
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    /**
    * Returns the inner class KeySpliterator
    * @return 
    */
    public final Spliterator<K> spliterator() {
        return new KeySpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    /**
    * Implement forEach
    * One thing to watch out for here:
    * the fail-fast check is performed only after all elements have been iterated
    * @param action
    */
    public final void forEach(Consumer<? super K> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e.key);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}
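KeySet, like the other collection views, is a live view over the map rather than a snapshot, which is why its methods simply delegate to the enclosing HashMap. A quick demonstration:

```java
import java.util.HashMap;
import java.util.Set;

public class KeySetViewDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1); map.put("b", 2);

        // keySet() is a live view over the map, not a copy
        Set<String> keys = map.keySet();
        keys.remove("a");                         // delegates to removeNode
        System.out.println(map.containsKey("a")); // false
        map.put("c", 3);
        System.out.println(keys.contains("c"));   // true
        System.out.println(keys.size());          // 2, same as map.size()
    }
}
```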

Posted by Jaquio on Tue, 19 Nov 2019 22:42:33 -0800