One article to thoroughly understand the implementation of HashMap, so you no longer need to fear being grilled about it.

Keywords: JDK less

Article catalog

HashMap
preface

What is a red-black tree
Recoloring
Rotation

Left rotation
Right rotation

brief introduction

Basic elements of HashMap
Node
Construction method

HashMap()
HashMap(int initialCapacity)
HashMap(Map<? extends K, ? extends V> m)

Add method put
Get method get
Expansion method resize

JDK1.8 capacity expansion
JDK 1.7 capacity expansion

follow-up

Author: LSS has a long way to go

HashMap

preface

Before introducing HashMap, we need to cover one prerequisite: the red-black tree.

What is a red-black tree

A red-black tree is a self-balancing binary search tree. The heights of a node's left and right subtrees may differ by more than 1, so strictly speaking a red-black tree is not a perfectly balanced binary tree. That raises further questions: what is a binary search tree? What are its disadvantages? And why were red-black trees introduced?

Here is a link: https://www.cs.usfca.edu/~galles/visualization/Algorithms.html There you can watch, step by step, how various data structures handle insertion, deletion and lookup.

As shown in the figure, this is an example of a binary search tree. Each time a value is inserted, it is first compared with the root node: (current data > root node) ? insert to the right : insert to the left, written here as a ternary expression. The comparison is repeated down the tree until a leaf is reached, and the new value becomes that leaf's left or right child. This has a disadvantage: the height of the tree. Nothing keeps the tree balanced, so the more data there is, the taller the tree can grow (sorted input produces one long chain), and reaching the lowest leaf takes too long. That is why the red-black tree exists.
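To make the height problem concrete, here is a minimal sketch of an unbalanced binary search tree. The class and method names (BstSketch, heightAfterInserting) are made up for this illustration and are not part of the JDK. It inserts the same seven values in two different orders and compares the resulting heights.

```java
// Minimal unbalanced binary search tree, for illustration only.
class BstSketch {
    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    Node root;

    // Walk down from the root: larger values go right, all others go left.
    void insert(int value) {
        Node fresh = new Node(value);
        if (root == null) { root = fresh; return; }
        Node cur = root;
        while (true) {
            if (value > cur.value) {
                if (cur.right == null) { cur.right = fresh; return; }
                cur = cur.right;
            } else {
                if (cur.left == null) { cur.left = fresh; return; }
                cur = cur.left;
            }
        }
    }

    // Number of nodes on the longest path from n down to a leaf.
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(int... values) {
        BstSketch tree = new BstSketch();
        for (int v : values) tree.insert(v);
        return height(tree.root);
    }

    public static void main(String[] args) {
        // Mixed insertion order happens to give a well-shaped tree.
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7)); // 3
        // Sorted insertion order degenerates into a chain.
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7)); // 7
    }
}
```

With sorted input every new value becomes the right child of the previous one, so a lookup has to visit every node, just like in a linked list.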

As shown in the picture, this is a red-black tree. A red-black tree gives every node one of two colors and maintains the following rules:

Each node is either red or black

Root node must be black

Red nodes cannot be continuous (children of red nodes cannot be red)

For each node, any path from that node to null (the end of the tree) contains the same number of black nodes.
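The four rules can be checked mechanically. The following is a rough sketch of such a check, not code from the JDK; RedBlackCheckSketch, its Node type and isValid are names invented here. The first rule holds automatically because the color is stored as a boolean.

```java
// Sketch of a checker for the red-black rules listed above.
class RedBlackCheckSketch {
    static final class Node {
        final boolean red;          // true = red, false = black
        final Node left, right;
        Node(boolean red, Node left, Node right) {
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;
    }

    // Returns the number of black nodes on every path from n down to null
    // (null itself counted as one black), or -1 if a rule is broken below n.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        // A red node must not have a red child.
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        // Both sides must contain the same number of black nodes.
        if (l == -1 || r == -1 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static boolean isValid(Node root) {
        // The root must be black, and the subtree rules must hold everywhere.
        return !isRed(root) && blackHeight(root) != -1;
    }

    public static void main(String[] args) {
        Node ok = new Node(false, new Node(true, null, null), new Node(true, null, null));
        Node redRed = new Node(false, new Node(true, new Node(true, null, null), null), null);
        System.out.println(isValid(ok));     // true
        System.out.println(isValid(redRed)); // false
    }
}
```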

At first glance, insertion into a red-black tree resembles insertion into a binary search tree, with two colors maintained on top. So why is it a balanced binary tree? To keep the tree balanced, some balancing operations are introduced: recoloring and rotation. Rotation is divided into left rotation and right rotation.

Recoloring

Recoloring covers many cases. Only one of them is shown here, because it is not the focus of this article. If you are interested, look up the rest.

The figure shows that B is red and its left child is also red. This clearly violates the red-black tree rules (red nodes cannot be consecutive), so an adjustment is needed (shown on the right of the figure). First change node B to black. That removes the consecutive red nodes, but it creates a new problem: the path through B now contains one more black node, so the black counts on the two sides no longer match.

So we change B's parent node A to red. That restores the black count on the path through B, but it reintroduces the earlier problem of consecutive red nodes, this time between A and its right child. So we also change A's right child to black. As shown in the figure below, both problems are then solved.

Now that we have seen how recoloring helps restore balance, let's see how rotation works.

Rotation

As mentioned above, rotation is divided into left rotation and right rotation. Let's look at them separately.

Left rotation

Left rotation: the node's right child becomes its parent, and the former left child of that right child becomes the node's new right child.

Right rotation

Right rotation: the node's left child becomes its parent, and the former right child of that left child becomes the node's new left child.
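The two rotations can be sketched in a few lines. This is a simplified illustration with invented names (RotationSketch, sample); it has no parent pointers and no colors, unlike the tree nodes inside HashMap. Each method returns the node that becomes the new root of the rotated subtree.

```java
// Sketch of left and right rotation on a plain binary tree node.
class RotationSketch {
    static final class Node {
        final String name;
        Node left, right;
        Node(String name) { this.name = name; }
    }

    // Left rotation around x: x's right child y moves up,
    // and y's former left child becomes x's new right child.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    // Right rotation around x: x's left child y moves up,
    // and y's former right child becomes x's new left child.
    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        return y;
    }

    // In-order traversal; a rotation must not change this order.
    static String inOrder(Node n) {
        return n == null ? "" : inOrder(n.left) + n.name + inOrder(n.right);
    }

    // A is the root, B is its right child, C is B's left child.
    static Node sample() {
        Node a = new Node("A");
        Node b = new Node("B");
        Node c = new Node("C");
        a.right = b;
        b.left = c;
        return a;
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(inOrder(root));      // ACB
        root = rotateLeft(root);
        System.out.println(root.name);          // B
        System.out.println(inOrder(root));      // ACB
        root = rotateRight(root);
        System.out.println(root.name);          // A
    }
}
```

A rotation changes the shape of the tree but not the in-order sequence, which is why it can rebalance a search tree without breaking the ordering.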

If the rotations are still unclear, here is a set of animated diagrams.

That concludes the red-black tree part. The topic is HashMap, yet so far we have only talked about a data structure. There is no way around it: without the material above, the source code below may be hard to read.

brief introduction

After all that, we finally arrive at the main topic. HashMap implements the Map interface. In JDK 1.7 it is implemented with an array plus linked lists; from JDK 1.8 on, with an array plus linked lists plus red-black trees. It is a container that stores key-value pairs, and it allows both a null key and null values. The container is not thread-safe.
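A short usage sketch of the null handling described above. The class name NullKeyDemo and the helper build are made up for the example; only java.util.HashMap and java.util.Map are real JDK types.

```java
import java.util.HashMap;
import java.util.Map;

// HashMap accepts one null key and any number of null values.
class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "stored under the null key");
        map.put("k", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // stored under the null key
        System.out.println(map.containsKey("k")); // true, although the value is null
        System.out.println(map.size());           // 2
    }
}
```

For access from several threads, the JDK offers java.util.concurrent.ConcurrentHashMap and Collections.synchronizedMap as alternatives.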

HashMap structure of JDK 1.7

HashMap structure of JDK 1.8

Basic elements of HashMap

   /**
     *   Default initial capacity: 1 << 4 == 2^4 == 16
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     *    Maximum capacity 2 ^ 30
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     *  Default load factor
     *  The load factor controls how full the table may get before it is resized.
     *  When size > load factor * array length, the table needs to be expanded.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f; 
    /**
     *  JDK 1.8 newly added 
     *     When the linked list in one bucket reaches 8 nodes, it is converted to a red-black tree
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     *  JDK 1.8 newly added
     *     When a red-black tree bucket shrinks to 6 nodes or fewer during a resize, it is converted back to a linked list
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     *  JDK 1.8 newly added
     *  A bucket is converted to a red-black tree only if its list reaches 8 nodes and the array length is >= 64;
     *  with a smaller array, the array is expanded instead
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     *   The bucket array. Its element type changed from the earlier Entry to Node; Node is an implementation class of the Map.Entry interface.
     *   (Entry is an internal interface of the Map interface)
     */
	transient Node<K,V>[] table;


   	// Record the number of key value pairs
    transient int size;

    /**
     *  Record the number of changes to elements in the collection
     */
    transient int modCount;

    /**
     *  Threshold: the number of elements that can be accommodated. When size > threshold, the capacity will be expanded
     * threshold = Load factor * array length
     */
    int threshold;
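The relation between capacity, load factor and threshold is plain arithmetic. The sketch below uses an invented class name (ThresholdSketch) and simply repeats the multiplication from the comment above.

```java
// threshold = load factor * array length, truncated to an int.
class ThresholdSketch {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```

With the defaults (capacity 16, load factor 0.75) the threshold is 12, so the map expands when the 13th key-value pair is added, because the check is size > threshold.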

Node

static class Node<K,V> implements Map.Entry<K,V> {
    	// hash value
        final int hash;
        final K key;
        V value;
        // Pointer to the next node in the same bucket
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        // getKey(), getValue(), setValue(), equals() and hashCode() omitted
}

Construction method

HashMap()

public HashMap() {
    	// The default load factor is 0.75
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

HashMap(int initialCapacity)

 public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
 }

// Specify both the initial capacity and the load factor (the one-argument constructor above passes the default 0.75)
public HashMap(int initialCapacity, float loadFactor) {
        // The initial capacity must not be negative
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
    	// If the initial capacity > 2 ^ 30, the capacity is 2 ^ 30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // The load factor must be greater than 0 and must not be NaN
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
    	// Load factor
        this.loadFactor = loadFactor;
        // The capacity, rounded up to a power of two, is kept in threshold until resize() allocates the table
        this.threshold = tableSizeFor(initialCapacity);
 }
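The constructor relies on tableSizeFor, which the listing does not show. As a rough sketch, modeled on the bit-smearing approach of JDK 8 (later JDK versions may compute the same result differently), it rounds the requested capacity up to the next power of two. The wrapper class TableSizeSketch is an invented name.

```java
// Round a requested capacity up to the next power of two.
class TableSizeSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that exact powers of two are not doubled
        n |= n >>> 1;      // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(tableSizeFor(0));  // 1
    }
}
```

A power-of-two length is what makes the subscript computation (n - 1) & hash in putVal work as a cheap replacement for a modulo operation.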

HashMap(Map<? extends K, ? extends V> m)

// Create a new HashMap based on Map
public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
}

Add method put

 public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
 }

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
    	// Create a new table when table is null
        if ((tab = table) == null || (n = tab.length) == 0)
           // Use resize() to create a new table
            n = (tab = resize()).length;
        // Compute the array subscript from the array length and the hash value. If that position is empty, store the new node there directly
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            // The bucket is not empty
            Node<K,V> e; K k;
            // The first node has the same key: remember it in e
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket is a red-black tree: store the key-value pair through the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // If it is linked list storage
            else {
                // Traverse the list
                for (int binCount = 0; ; ++binCount) {
                    // Reached the end of the list without finding a duplicate key
                    if ((e = p.next) == null) {
                        // Append the new node at the tail
                        p.next = newNode(hash, key, value, null);
                        // The list already held at least 8 nodes before this append (binCount >= 8 - 1)
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // Convert to a red-black tree (or expand the array instead if its length is below 64)
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Stop the loop if the same key is found
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // If there is a duplicate key, the new value will replace the old one
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
    	// Whether expansion is needed
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
  }

If the code is still hard to follow, here is the same process in words:

Call hash to get the hash value of the key.
Check whether the table is null. If it is, create a new table array with resize().
Compute the subscript from the array length and the hash value. If that position holds no data, store the new node there directly.

(1) If there is data, check whether the first node has the same key. If it does, take this node.

(2) Otherwise, if the bucket is stored as a red-black tree, use the tree's own method to store the key-value pair.

(3) Otherwise the bucket is a linked list: traverse it and check whether any node has the same key.

I. If the end of the list is reached without a matching key, append the new node at the tail, then check binCount >= 8 - 1 to decide whether to convert the list to a red-black tree.

II. If a node with the same key is found, stop the loop immediately and take that node.

If a node with the same key was found, replace its old value with the new one and return the old value. Otherwise increase size and expand the table if size exceeds the threshold.
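The first and third steps above (hashing the key and turning the hash into a subscript) can be sketched as follows. The spread method is modeled on the hash function of JDK 8, which the listing does not show; HashIndexSketch, spread and indexFor are names chosen for this example.

```java
// Sketch of how a key is turned into an array subscript.
class HashIndexSketch {
    // XOR the high 16 bits of hashCode() into the low 16 bits, so that
    // keys differing only in their high bits still land in different buckets.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // tableLength must be a power of two; then this equals hash mod tableLength.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        Integer a = 0x10000;
        Integer b = 0x20000;
        // Without spreading, both hash codes have all-zero low bits.
        System.out.println(a.hashCode() & 15); // 0
        System.out.println(b.hashCode() & 15); // 0
        // With spreading, they are separated.
        System.out.println(indexFor(a, 16));   // 1
        System.out.println(indexFor(b, 16));   // 2
        // The null key always maps to subscript 0.
        System.out.println(indexFor(null, 16)); // 0
    }
}
```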

One point to note: if a user-defined type is used as the key of a HashMap, its hashCode and equals methods need to be overridden.

Principle:

Override hashCode: objects with the same content must return the same hash value, and objects with different content should return different hash values as far as possible.

Override equals: objects with the same content return true.
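A minimal sketch of a user-defined key that follows both principles. Point and CustomKeyDemo are hypothetical names for this example; Objects.hash is a standard JDK helper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CustomKeyDemo {
    // A user-defined key type with content-based equals and hashCode.
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Same content gives the same hash value.
            return Objects.hash(x, y);
        }
    }

    // Stores a value under one Point and looks it up with an equal but distinct Point.
    static String lookup() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // found
    }
}
```

Without the two overrides, the second new Point(1, 2) would be a different key as far as HashMap is concerned, and the lookup would return null.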

Finally, some pictures. Code, words and pictures together may make things clearer. Without further ado, here are the figures.

When an entry whose key is handsome is inserted, a subscript is computed from the key's hash and the array length. Suppose the subscript is 0 and that position is empty: the entry is stored there directly. Next, an entry with key shuaizi is inserted. Its hash and the array length give subscript 2, which is also empty, so it is stored directly, as shown in Figure 1. We mentioned a linked list earlier, so where is it? Look at Figure 2. Now we insert an entry with key zishuai. Suppose zishuai maps to the same subscript as shuaizi. This is a hash collision, and it is resolved with separate chaining (the "zipper" method): the colliding entries are linked into a list. One point needs special attention: the figure shows JDK 1.7, which uses head insertion. After zishuai collides with shuaizi, zishuai is placed at the head of the list and its next pointer points to shuaizi.
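A collision is easy to reproduce with real strings. The keys "Aa" and "BB" are chosen here because their hashCode values are equal (both 2112); they are an illustration and have nothing to do with the keys in the figures. CollisionDemo is an invented class name.

```java
import java.util.HashMap;
import java.util.Map;

// Two different keys with the same hash code share one bucket,
// yet both remain retrievable because the bucket chains them.
class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        Map<String, Integer> map = build();
        System.out.println(map.get("Aa"));   // 1
        System.out.println(map.get("BB"));   // 2
        System.out.println(map.size());      // 2
    }
}
```

The equal hash values put both entries into the same bucket; equals is what tells them apart during lookup.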

Why emphasize that JDK 1.7 uses head insertion? Is insertion done differently in later versions?

Yes. JDK 1.8 uses tail insertion instead. Apart from that, the JDK 1.8 picture differs little from the figure above, and as emphasized several times, a list is converted to a red-black tree once it reaches 8 nodes and the array length is at least 64. Picture that part yourself; the point to focus on here is tail insertion.

Consider a question: why switch to tail insertion? Is there a defect in head insertion?

Keep this question in mind for now. It is explained in detail in the resize() section.

Get method get

  final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // The table is not null, its length is > 0, and the bucket at the computed subscript is not null
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // If the key of the first node corresponding to the subscript of the array is the same as the key searched
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                // Return the first node
                return first;
            // Not the first node: check whether the bucket holds further nodes
            if ((e = first.next) != null) {
                // Red-black tree storage
                if (first instanceof TreeNode)
                    // Look up the Node for the key through the red-black tree
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // Linked list storage: traverse all nodes
                do {
                    // Return immediately when the same key is found
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        // Return null if nothing was found
        return null;
    }

If the table is not null and the bucket at the computed subscript is not null:
Check whether the first node has the same key. If it does, return it.
If the first node is not the one you are looking for:
1. If the bucket is stored as a red-black tree, use the tree to find the corresponding Node.
2. If the bucket is a linked list, traverse it and return the Node with the same key.
If the table is null, or no matching node exists, return null.
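The linked-list part of these steps can be sketched on a hand-built table. This is a simplified stand-in, not the JDK code: it has no tree buckets, uses the raw hashCode without spreading the high bits, and all names (BucketLookupSketch, sampleTable) are invented. The keys "Aa" and "BB" are used because they have the same hash code and therefore share a bucket.

```java
// Simplified lookup in an array of singly linked buckets.
class BucketLookupSketch {
    static final class Node {
        final int hash;
        final String key;
        final String value;
        final Node next;
        Node(int hash, String key, String value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    static String get(Node[] table, String key) {
        // An empty or missing table holds nothing.
        if (table == null || table.length == 0) return null;
        int hash = (key == null) ? 0 : key.hashCode(); // simplification: no high-bit spreading
        // Start at the first node of the bucket and follow the next pointers.
        for (Node e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key))))
                return e.value;
        }
        return null;
    }

    // One bucket containing the chain "Aa" -> "BB".
    static Node[] sampleTable() {
        Node[] table = new Node[16];
        Node second = new Node("BB".hashCode(), "BB", "second", null);
        Node first = new Node("Aa".hashCode(), "Aa", "first", second);
        table[(table.length - 1) & "Aa".hashCode()] = first;
        return table;
    }

    public static void main(String[] args) {
        Node[] table = sampleTable();
        System.out.println(get(table, "Aa")); // first
        System.out.println(get(table, "BB")); // second
        System.out.println(get(table, "CC")); // null
    }
}
```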

Expansion method resize

JDK1.8 capacity expansion

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // The table has already been allocated (length > 0)
        if (oldCap > 0) {
            // Cannot expand any further if the table length is >= 2^30
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                // Return the original array
                return oldTab;
            }
            // If the maximum space is not exceeded, it will be doubled
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Store the new threshold
        threshold = newThr;
        // Allocate a new Node array with the new capacity
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            // Traverse the original table to recalculate the position of each element
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // The corresponding subscript element is not null
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    // The bucket holds a single node
                    if (e.next == null)
                        // Store it in the new array
                        newTab[e.hash & (newCap - 1)] = e;
                    // Red-black tree storage
                    else if (e instanceof TreeNode)
                        // Split the tree between subscript j and subscript j + oldCap
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        // Linked list storage
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // Walk the list: nodes with (e.hash & oldCap) == 0 stay at subscript j (lo list),
                        // the others move to subscript j + oldCap (hi list)
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

JDK 1.7 capacity expansion

  void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }


  void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
      	// Move every entry of the old table into the expanded array
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                // The hash of the key is recalculated here if required. JDK 1.8 changed this step
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                // Head insertion: e is linked in front of the current head of bucket i
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
  }

That is a lot of source code. Boring, and perhaps still confusing? No matter. Let's go through it again with figures next to the code. This part also explains why tail insertion was adopted, as promised earlier.

Let's look at capacity expansion under a single thread first.

This is expansion under a single thread. As the code shows, the array grows to twice its original size, and then every node is traversed and assigned to the new array. You may ask: why does C move to another subscript while A and B stay at their original subscript? Because the subscript depends on both the key's hash and the array length, so when the array grows the subscript changes for some keys. Does that mean every key is hashed again? In principle yes, and the JDK 1.7 transfer() does so when the rehash flag is set. JDK 1.8 changed this: it never rehashes during expansion, because hashing every key again costs performance. It reuses the stored hash and only tests (e.hash & oldCap) to decide whether a node stays at subscript j or moves to subscript j + oldCap.
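The JDK 1.8 rule can be checked with a few lines of arithmetic. In this sketch (SplitSketch and newIndex are invented names) the new subscript is derived from the old one and the single bit hash & oldCap, and the result is compared with a full recomputation against the doubled length.

```java
// Where does a node go when the array doubles from oldCap to 2 * oldCap?
class SplitSketch {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // The bit selected by oldCap decides: 0 keeps the subscript, 1 adds oldCap.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // 5 and 21 share subscript 5 in a table of length 16.
        System.out.println(5 & (oldCap - 1));     // 5
        System.out.println(21 & (oldCap - 1));    // 5
        // After doubling, 5 stays and 21 moves to 5 + 16.
        System.out.println(newIndex(5, oldCap));  // 5
        System.out.println(newIndex(21, oldCap)); // 21
        // Same result as recomputing against the new length 32.
        System.out.println(21 & (2 * oldCap - 1)); // 21
    }
}
```

This is why nodes such as A and B can stay in place while C moves: each node ends up in exactly one of two possible buckets.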

Multithreading expansion

First, the meaning of the three figures above:

The first shows thread 1 being suspended by CPU scheduling after it has executed the expansion step, after which thread 2 starts.
The second shows thread 2 performing its expansion and finishing the array assignment, after which thread 1 continues.
The third shows thread 1 continuing its execution.

Let's go through them one by one.

One request up front: while reading this passage, keep the expansion source code and the single-thread and multi-thread expansion figures above in view. Get the overall picture first, then work through the details. The author would rather not repeat the same explanation several times. Thanks for understanding!

Thread 1 is suspended after executing the expansion step (here this means the single line of code that creates the expanded array, not the whole expansion method; check the code above if that is unclear). Thread 2 then runs. After it expands the array and recalculates the subscripts, the state becomes the one shown on the right side of Figure 2. Because thread 2 uses head insertion, the order of A and B in the list is now reversed compared with the original list. Thread 2 is then suspended and thread 1 resumes. Thread 1 still works from the links it saw before it was suspended, while the next pointers have meanwhile been rewritten by thread 2. Head-inserting the same nodes again leaves A pointing to B and B pointing to A. This is a particularly serious problem: the list now contains a cycle, and any later traversal of that bucket loops forever.

That is why head insertion was abandoned in favor of tail insertion. Why doesn't tail insertion cause this problem? You probably have the answer already: tail insertion keeps the nodes in their original order during expansion, so the links are never reversed. If it is still unclear, go back to the "Add method put" section, insert values with tail insertion step by step, and then walk through an expansion on paper. I will not go into more detail here; if you want to know, follow that line of thought and draw it out.
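The difference between the two insertion styles shows up even in a single thread. The sketch below is a simplified illustration with invented names (InsertionOrderSketch and its methods); it assumes that all nodes of the old bucket land in the same new bucket and it does not simulate the two threads.

```java
// Moving a chain into a new bucket with head insertion versus tail insertion.
class InsertionOrderSketch {
    static final class Node {
        final String key;
        Node next;
        Node(String key) { this.key = key; }
    }

    // Builds a chain in the given order.
    static Node chain(String... keys) {
        Node head = null, tail = null;
        for (String k : keys) {
            Node n = new Node(k);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    // JDK 1.7 style: each node is linked in front of the current head.
    static Node headInsertTransfer(Node oldHead) {
        Node newHead = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next;
            e.next = newHead;
            newHead = e;
            e = next;
        }
        return newHead;
    }

    // JDK 1.8 style: each node is appended behind the current tail.
    static Node tailInsertTransfer(Node oldHead) {
        Node head = null, tail = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next;
            if (tail == null) head = e; else tail.next = e;
            tail = e;
            e = next;
        }
        if (tail != null) tail.next = null;
        return head;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.key);
            if (n.next != null) sb.append("->");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(headInsertTransfer(chain("A", "B", "C")))); // C->B->A
        System.out.println(render(tailInsertTransfer(chain("A", "B", "C")))); // A->B->C
    }
}
```

Head insertion reverses the chain on every transfer, and that reversal is what lets two interleaved transfers link two nodes to each other. Tail insertion preserves the order. It does not make HashMap thread-safe, though; concurrent modification can still lose or corrupt data.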

follow-up

That's all for today's article. See you next time.

Posted by NoorAdiga on Thu, 18 Jun 2020 23:25:45 -0700

Programmer Group