HashMap 1.8 Core Source Analysis



HashMap constructors

    //Maximum capacity: 2 to the 30th power
    static final int MAXIMUM_CAPACITY = 1 << 30;
    //Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    //Hash bucket array; each slot holds the head of a linked list. Its length is always a power of 2, or 0 before initialization.
    transient Node<K,V>[] table;

    //Load factor, used to compute the resize threshold: threshold = table.length * loadFactor
    final float loadFactor;
    //Element-count threshold: when the number of entries exceeds it, the table is expanded via resize()
    int threshold;

    public HashMap() {
        //Default constructor; only the load factor is set, to the default 0.75f
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
    public HashMap(int initialCapacity) {
        //Constructor specifying the initial capacity
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
    //Constructor specifying both the initial capacity and the load factor; rarely used, since loadFactor is seldom changed
    public HashMap(int initialCapacity, float loadFactor) {
        //Boundary checks
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        //The initial capacity must not exceed 2 to the 30th power
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        //The load factor must be positive and not NaN
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        //Set the threshold to the smallest power of 2 that is >= initialCapacity
        this.threshold = tableSizeFor(initialCapacity);
    }
    //Create a new hash table containing all entries of the given map m
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }
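The `threshold = tableSizeFor(initialCapacity)` line above rounds the requested capacity up to a power of two. The JDK 8 routine, reproduced here standalone so it can be run in isolation, works by "smearing" the highest set bit downward:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 8's tableSizeFor: returns the smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so exact powers of two are not doubled
        n |= n >>> 1;      // smear the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // after these shifts, n is all 1s below its top bit
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

So `new HashMap<>(10)` starts with a threshold of 16; the actual bucket array is only allocated lazily, on the first put.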

 

Hash algorithm of HashMap

The perturbation function reduces hash collisions: it folds the high 16 bits of the hash code into the low 16 bits, so the high bits also take part when the bucket index is computed from the low bits. (Before JDK 8, the perturbation function perturbed the hash four times; JDK 8 simplified this to a single XOR.)

//XOR the high 16 bits of the key's hashCode with its low 16 bits (same bits give 0, different bits give 1) to obtain a more dispersed hash value; this value is later ANDed with (array length - 1) to pick the bucket.
static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
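To see why the XOR matters, take two hash codes that differ only in their high 16 bits (the values below are chosen purely for illustration):

```java
public class HashDemo {
    // Same perturbation as HashMap.hash() above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;            // table length, always a power of two
        int h1 = 1 << 16;      // two hash codes that differ only in the high bits
        int h2 = 2 << 16;
        // Without perturbation, (n - 1) masks out the high bits, so both collide:
        System.out.println(((n - 1) & h1) + " " + ((n - 1) & h2)); // 0 0
        // With perturbation, the high bits influence the index and the collision disappears:
        System.out.println(((n - 1) & hash(h1)) + " " + ((n - 1) & hash(h2))); // 1 2
    }
}
```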

 

put method of HashMap

//1. Check whether the Node array has been initialized; if not, call resize() to create it
if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
//2. Compute the bucket index for this key by ANDing the key's hash with (array length - 1); if that slot is empty, place a new node there directly
if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
//3. If the slot is not empty, a hash collision has occurred
else {
            Node<K,V> e; K k;
            //4. If the first node has the same key, remember it so its value can be overwritten below
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            //5. If p is a red-black tree node, insert the key-value pair into the tree; otherwise fall through to the linked-list case
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            //6. The bucket holds a linked list: traverse it from p, appending a new node at the tail if the key is not found
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        //7. If the list length reaches TREEIFY_THRESHOLD (8), convert it to a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //8. If the key is found during traversal, stop; its value is overwritten below
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
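The `e != null` branch at the end determines put's observable behavior: inserting under an existing key overwrites the value and returns the old one. A quick check with `java.util.HashMap`:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1)); // null: no previous mapping for "a"
        System.out.println(map.put("a", 2)); // 1: old value returned, new value stored
        System.out.println(map.get("a"));    // 2
        map.put(null, 0);                    // a null key is allowed; hash(null) == 0, so it lands in bucket 0
        System.out.println(map.get(null));   // 0
    }
}
```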

 

Expansion method of HashMap (resize)

final Node<K,V>[] resize() {
        //oldTab: the current hash bucket array
        Node<K,V>[] oldTab = table;
        //Current capacity (bucket array length)
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        //Current threshold
        int oldThr = threshold;
        //New capacity and threshold, initialized to 0
        int newCap, newThr = 0;
        //If the current capacity is greater than 0
        if (oldCap > 0) {
            //If the current capacity has already reached the upper limit
            if (oldCap >= MAXIMUM_CAPACITY) {
                //Set the threshold to Integer.MAX_VALUE (2 to the 31st power, minus 1)
                threshold = Integer.MAX_VALUE;
                //and return the current bucket array; no further expansion is possible
                return oldTab;
            }//Otherwise the new capacity is twice the old capacity
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)//If the old capacity is at least the default initial capacity 16
                //then the new threshold is twice the old one
                newThr = oldThr << 1; // double threshold
        }//If the current table is empty but a threshold exists: a capacity (and possibly load factor) was specified at construction
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;//The new table's capacity equals the old threshold
        else {               // zero initial threshold signifies using defaults
            //The table is empty and there is no threshold: the map was constructed without capacity/load-factor arguments
            newCap = DEFAULT_INITIAL_CAPACITY;//The new capacity is the default capacity 16
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);//The new threshold is default capacity 16 * default load factor 0.75f = 12
        }
        if (newThr == 0) {//A new threshold of 0 means the table was empty but had a threshold
            float ft = (float)newCap * loadFactor;//Compute the new threshold from the new capacity and load factor
            //Overflow check
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        //Update the threshold
        threshold = newThr;
        //Build a new bucket array with the new capacity
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        //Update the bucket array reference
        table = newTab;
        //If the old bucket array held elements,
        //move every node into the new bucket array
        if (oldTab != null) {
            //Iterate over the old buckets
            for (int j = 0; j < oldCap; ++j) {
                //Current node e
                Node<K,V> e;
                //If the current bucket is non-empty, assign its head to e
                if ((e = oldTab[j]) != null) {
                    //Clear the old bucket so it can be GCed
                    oldTab[j] = null;
                    //If the bucket holds a single element (no hash collision)
                    if (e.next == null)
                        //place it directly into the new bucket array.
                        //Note the index is hash & (length - 1); since the length is a power of 2, this is equivalent to a modulo operation, but more efficient.
                        newTab[e.hash & (newCap - 1)] = e;
                    //If the bucket was treeified (collisions pushed it past 8 nodes), split the red-black tree across the new buckets (tree details are left for a later article)
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    //Otherwise the bucket holds a linked list of colliding nodes; distribute each node to its new bucket according to its hash
                    else { // preserve order
                        //Because the capacity doubles, each node either stays at its original index (the "low" list) or moves to original index + oldCap (the "high" list)
                        //Head and tail of the low list
                        Node<K,V> loHead = null, loTail = null;
                        //Head and tail of the high list
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;//Temporarily stores e's next node
                        do {
                            next = e.next;
                            //Another efficient bit trick: (hash & oldCap) tests the single bit that decides the new index. Zero means the node stays at the old index (low list); otherwise it moves to old index + oldCap (high list).
                            if ((e.hash & oldCap) == 0) {
                                //Maintain the head and tail pointers
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }//Same logic for the high list
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }//Loop until the end of the list
                        } while ((e = next) != null);
                        //Store the low list at the original index j
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        //Store the high list at index j + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
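The low/high split can be checked in isolation. The helper below (`newIndex` is a name invented for this sketch, not a JDK method) recomputes a node's index the way resize() does and confirms it matches a plain re-index against the doubled capacity:

```java
public class ResizeSplitDemo {
    // Mirrors the resize() logic: a node stays at its old index if
    // (hash & oldCap) == 0, otherwise it moves to old index + oldCap
    static int newIndex(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = oldCap << 1;
        // Hashes 5 and 21 collide in a 16-slot table: 21 & 15 == 5
        System.out.println(newIndex(5, oldCap));  // 5  (tested bit is 0: stays low)
        System.out.println(newIndex(21, oldCap)); // 21 (tested bit is 1: moves high, 5 + 16)
        // Both match a full re-index against the new capacity:
        System.out.println((5 & (newCap - 1)) + " " + (21 & (newCap - 1))); // 5 21
    }
}
```

This is why resize() never has to rehash a key: the one extra bit exposed by doubling the capacity already decides each node's new bucket.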

 

Node class of the linked list

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;//hash value of the key
        final K key;//key
        V value;//value
        Node<K,V> next;//next node in the linked list

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        //A node's hashCode is the key's hashCode XORed with the value's hashCode
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
        //Set a new value and return the old one
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }
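Node.hashCode() is not arbitrary: the Map.Entry contract specifies an entry's hash as key-hash XOR value-hash, so any standard entry implementation agrees with it:

```java
import java.util.AbstractMap;
import java.util.Objects;

public class EntryHashDemo {
    public static void main(String[] args) {
        // Map.Entry specifies hashCode() as keyHash ^ valueHash,
        // which is exactly what Node.hashCode() computes
        AbstractMap.SimpleEntry<String, String> e =
                new AbstractMap.SimpleEntry<>("k", "v");
        System.out.println(
            e.hashCode() == (Objects.hashCode("k") ^ Objects.hashCode("v"))); // true
    }
}
```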
 

 

Differences from Hashtable

  • By contrast, Hashtable is thread-safe and allows neither null keys nor null values.

  • The default capacity of Hashtable is 11.

  • Hashtable uses key.hashCode() directly as the hash value, unlike HashMap, which perturbs the hashCode with its static final int hash(Object key) function.

  • Hashtable computes the bucket index with the modulo operator %. (Because its default capacity is not a power of 2, the bit-mask trick cannot replace the modulo operation.)

  • When expanding, the new capacity is twice the old capacity plus one: int newCapacity = (oldCapacity << 1) + 1;

  • Hashtable is a subclass of Dictionary that implements the Map interface; HashMap is an implementation of the Map interface (extending AbstractMap).
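The null-handling difference in the first bullet is easy to demonstrate:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hm = new HashMap<>();
        hm.put(null, "ok");   // HashMap allows one null key (stored in bucket 0)
        hm.put("k", null);    // and null values
        System.out.println(hm.get(null)); // ok

        Map<String, String> ht = new Hashtable<>();
        try {
            ht.put(null, "boom"); // Hashtable rejects null keys with an NPE
        } catch (NullPointerException e) {
            System.out.println("Hashtable: NullPointerException on null key");
        }
    }
}
```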

 

Summary:

  • Arithmetic operations are replaced by bitwise operations wherever possible, which is more efficient.

  • When expansion requires a new array to store more elements, besides migrating the elements from the old array, remember to null out the old array's references so they can be GCed.

  • The bucket index is computed as i = (n - 1) & hash. Since the bucket length n is a power of 2, this is equivalent to a modulo operation, but more efficient.

  • During expansion, a bucket that still holds a linked list of colliding nodes is redistributed: each node is placed in the new bucket array at the index determined by its hash value.

  • Because the capacity doubles, every node on the original list either stays at its original index (the low position) or moves to the expanded index (the high position). High index = low index + original capacity.

  • The check if ((e.hash & oldCap) == 0) uses the hash value and the old capacity to test the single bit that decides the new index: 0 means the node stays in the low position, otherwise it goes to the high position. This is another place where a bit operation replaces a conventional one for efficiency.

  • If a linked list reaches 8 nodes after an insertion, it is converted to a red-black tree (provided the table capacity is at least MIN_TREEIFY_CAPACITY, 64; otherwise treeifyBin resizes the table instead).

  • The insertion path calls several empty hook methods (e.g. afterNodeAccess, afterNodeInsertion) that exist for LinkedHashMap to override.

Posted by carsonk on Sat, 20 Jul 2019 08:19:59 -0700