Interpretation of the HashMap 1.8 source code and related interview questions

Keywords: Java

Love technology, open source and programming. Technology is open source and knowledge is shared.
Stay hungry, stay foolish.

Preface

As you gain years of experience, interviewers ask increasingly detailed questions about the Map collections. This article walks through the JDK 1.8 HashMap at the source level, covering how elements are added and how the underlying array is resized.

1, Data structure of HashMap

In JDK 1.7, HashMap is implemented as an array plus linked lists. When many keys collide in the same bucket, the linked list grows long and a lookup in that bucket degrades to O(n), which is inefficient.

In JDK 1.8, HashMap is implemented as an array plus linked lists plus red-black trees. The main goal of the change is to keep lookups fast when a bucket's list grows too long. When an element is added to a bucket whose list already holds 8 nodes (TREEIFY_THRESHOLD) and the array length is at least 64 (MIN_TREEIFY_CAPACITY), the list is converted into a red-black tree; if the array is shorter than 64, the array is resized instead. When a resize splits a tree and leaves 6 or fewer nodes (UNTREEIFY_THRESHOLD) in a bucket, the tree is converted back into a linked list (detailed analysis is given later).

2, Interpretation of HashMap source code

Start with the basic fields.
The code is as follows (example):

	/**
     * The default initial capacity - MUST be a power of two.
     * Default capacity: 16
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * The maximum capacity is 2 ^ 30
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * The default load factor is 0.75
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * Threshold at which a bin's linked list is converted into a red-black tree
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * Threshold at which a red-black tree is converted back into a linked list
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * Minimum table capacity required before a bin may be treeified
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
	/**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     * The array of Node elements; linked lists and red-black trees hang off its slots
     */
    transient Node<K,V>[] table;

    /**
     * Holds cached entrySet(). Note that AbstractMap fields are used
     * for keySet() and values().
     * Caches the result of entrySet()
     */
    transient Set<Map.Entry<K,V>> entrySet;

    /**
     * The number of key-value mappings contained in this map.
     * Number of key-value pairs in the map
     */
    transient int size;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     * Counts structural modifications. Iterators use it to fail fast with ConcurrentModificationException; it does not make HashMap thread-safe
     */
    transient int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     * Resize threshold, calculated as capacity * load factor
     */
    int threshold;

    /**
     * The load factor for the hash table.
     * The load factor in use for this map (not necessarily the default)
     * @serial
     */
    final float loadFactor;

With the fields covered, look at the constructors.

    /**
     * Constructor taking an initial capacity and a load factor
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     * One-argument constructor: the given initial capacity with the default load factor (0.75f)
     * @param  initialCapacity the initial capacity
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     * No-argument constructor: the table is allocated lazily on the first put, with a default capacity of 16, a load factor of 0.75 and a resize threshold of 12
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
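
The two-argument constructor stores tableSizeFor(initialCapacity) in threshold, but the listing above does not show that method. In JDK 8 it rounds the requested capacity up to the next power of two. The sketch below re-implements that rounding in a standalone class so the effect can be tried out; the class name TableSizeForDemo is made up for this illustration and is not part of the JDK.

```java
// Standalone sketch of the power-of-two rounding applied to the requested capacity.
// TableSizeForDemo is a hypothetical class name used only for this illustration.
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy the highest set bit of (cap - 1) into every lower bit, then add 1.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 1000};
        for (int cap : requested) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

A request for 10 therefore yields a table of 16 slots, and a request for 17 yields 32. Subtracting 1 first keeps an exact power of two such as 16 from being doubled.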

With the fields and constructors covered, look at the core method put(K key, V value):

	/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

When put calls putVal, the first argument is a hash value computed from the key by the hash method:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The following example shows how the hash method and the array index are calculated.

Index calculation in HashMap

Step 1: calculate the hash value. When key == null the hash value is 0; otherwise it is (h = key.hashCode()) ^ (h >>> 16).

Take the following key as a case study:

    String name = "万泉"; // romanized: Wan Quan
    name.hashCode() == 647074;

Convert the hashCode value 647074 to binary:

    0000 0000 0000 1001 1101 1111 1010 0010

Shift the value right by 16 bits (unsigned):

    0000 0000 0000 0000 0000 0000 0000 1001

XOR the hashCode value with the shifted result (XOR yields 0 where the bits are the same and 1 where they differ):

    0000 0000 0000 1001 1101 1111 1010 0010 ^
    0000 0000 0000 0000 0000 0000 0000 1001 =
    0000 0000 0000 1001 1101 1111 1010 1011

Converted to decimal, the final hash value is 647083.

Step 2: calculate the index as (n - 1) & hash, where n is the length of the array in the HashMap (16 by default):

    (16 - 1) & 647083 = 15 & 647083

AND yields 1 only where both bits are 1, otherwise 0:

    0000 0000 0000 0000 0000 0000 0000 1111 &
    0000 0000 0000 1001 1101 1111 1010 1011 =
    0000 0000 0000 0000 0000 0000 0000 1011

The result is 0000 0000 0000 1011, which is 11 in decimal.
The string "万泉" is therefore placed at index 11 of the HashMap's array.
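
The arithmetic above can be checked with a few lines of Java. This is a minimal sketch: the class name HashIndexDemo and the helper names spread and indexFor are invented for the illustration, and the key is written with Unicode escapes so that it is the two-character string whose hashCode() is 647074.

```java
// Minimal check of the worked example. HashIndexDemo, spread and indexFor
// are hypothetical names; only the formulas come from the HashMap source.
class HashIndexDemo {
    // Same spreading step as HashMap.hash(Object).
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        String name = "\u4E07\u6CC9"; // the key from the example
        int hash = spread(name);
        System.out.println("hashCode = " + name.hashCode());
        System.out.println("hash     = " + hash);
        System.out.println("binary   = " + Integer.toBinaryString(hash));
        System.out.println("index    = " + indexFor(hash, 16));
    }
}
```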

With the hash method covered, move on to the putVal() method.

/**
     * Implements Map.put and related methods
     * @param hash hash value of the key
     * @param key the key
     * @param value the value to store
     * @param onlyIfAbsent if true, an existing value is not overwritten
     * @param evict if false, the table is in creation mode
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // Temporary array tab
        Node<K,V>[] tab;
        // Element at index subscript
        Node<K,V> p;
        // n: The length of the array, i: the subscript of the array
        int n, i;
        // Check whether the table has been initialized (table == null or a length of 0 means it has not)
        if ((tab = table) == null || (n = tab.length) == 0)
            // Runs on the first insertion into the HashMap: the table is allocated lazily.
            // resize() initializes the table and returns it; the same method also handles growth and is analyzed in detail later
            n = (tab = resize()).length;
        // Check whether the bucket at index i is empty. The index calculation is described above
        if ((p = tab[i = (n - 1) & hash]) == null)
        	 // Create a new linked list node when the array (hash table) has no elements at the index subscript
            tab[i] = newNode(hash, key, value, null);
        else {
            // The table has been initialized and the bucket at index i is already occupied
            // e: the existing node for this key, if one is found; k: the key of the node being compared
            Node<K,V> e; K k;
            // p was assigned in the second if statement: it is the first node in the bucket
            // Case 1: the first node has the same hash and an equal key
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                // The keys are equal, so remember the existing node in e
                e = p;
                // Judge whether the current p node is a red black tree
            else if (p instanceof TreeNode)
                // Insert into the red-black tree (tree insertion is outside the scope of this walkthrough and is not analyzed here)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // The bucket holds a linked list
                // Walk the linked list in this bucket
                for (int binCount = 0; ; ++binCount) {
                    // (the first node's key was already compared above)
                    if ((e = p.next) == null) {
                        // Append to the end of the linked list
                        p.next = newNode(hash, key, value, null);
                        // binCount counts the nodes visited so far; the bin is treeified when it already held at least 8 nodes before this append
                        // static final int TREEIFY_THRESHOLD = 8;
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // Convert the linked list to a red-black tree (treeifyBin resizes instead if the table length is below MIN_TREEIFY_CAPACITY)
                            treeifyBin(tab, hash);
                        break;
                    }
                    // If this node has the same hash and an equal key, stop: e is the existing node
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // Traverse the next node
                    p = e;
                }
            }
            // For existing nodes, perform the overwrite operation
            if (e != null) { // existing mapping for key
                // Get old value
                V oldValue = e.value;
                // Update value
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // Return old value
                return oldValue;
            }
        }
        // modCount will increase only when adding a value, but not when overwriting
        ++modCount;
        // Threshold detection
        if (++size > threshold)
            // Capacity expansion
            resize();
        afterNodeInsertion(evict);
        return null;
    }
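
The return value and the onlyIfAbsent flag can be observed from the public API. The following is a small sketch rather than part of the source walkthrough; the class name PutReturnDemo is hypothetical. It relies on put passing onlyIfAbsent = false, as in the listing above, and on putIfAbsent being the public method that passes true.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: shows what putVal returns through the public API.
class PutReturnDemo {
    static String demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping: returns null
        Integer second = map.put("a", 2);         // overwrites and returns the old value 1
        Integer third = map.putIfAbsent("a", 3);  // key exists: returns 2 and keeps it
        return first + " " + second + " " + third + " " + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // null 1 2 2
    }
}
```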

The resize() method initializes or expands the hash table.

/**
     * The first call initializes the table; later calls expand it
     **/
    final Node<K,V>[] resize() {
        // Get original array
        Node<K,V>[] oldTab = table;
        // Get original capacity
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
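        // Get the old threshold: 0 for a map created with the no-argument constructor; otherwise it holds the requested initial capacity until the table is allocated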
        int oldThr = threshold;
        // New capacity and new threshold
        int newCap, newThr = 0;
        // When the old capacity is not 0, expand it
        if (oldCap > 0) {
            // If the old capacity has already reached the maximum capacity
            if (oldCap >= MAXIMUM_CAPACITY) {
                // The threshold is set to the maximum value of Integer
                threshold = Integer.MAX_VALUE;
                // Return to the old tab without making any changes
                return oldTab;
            }
            // Otherwise double the capacity; if the new capacity is below the maximum and the old capacity is at least the default capacity
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
                // Double the threshold
                newThr = oldThr << 1; // double threshold
        }
        // No table yet but the old threshold is > 0: the constructor stored the initial capacity in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {// zero initial threshold signifies using defaults
            // The initialization capacity is 16 and the initialization threshold is 12 when it is created for the first time
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // If the threshold is not initialized in the above process, it is recalculated here
        if (newThr == 0) {
            // The calculation method is the new capacity * loading factor
            float ft = (float)newCap * loadFactor;
            // Recalculate expansion threshold
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
        }
        // Update global threshold
        threshold = newThr;
        // Initialize the underlying array
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Update global table
        table = newTab;
        // The old table exists, so its entries must be moved
        if (oldTab != null) {
            // Traverse old tab
            for (int j = 0; j < oldCap; ++j) {
                // Declare temporary variable e
                Node<K,V> e;
                // Find the non null element in the old tab and assign it to e
                if ((e = oldTab[j]) != null) {
                    // Reset element at old tab[j]
                    oldTab[j] = null;
                    // If e there is no next element
                    if (e.next == null)
                        // Recalculate index subscript in newTab and place element e
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the current e element is of red black tree type
                    else if (e instanceof TreeNode)
                        // Split the current red black tree
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order: the bucket holds a linked list with more than one node
                        // Low linked list
                        Node<K,V> loHead = null, loTail = null;
                        // High order linked list
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            // Keep next element
                            next = e.next;
                            // oldCap is a power of two, so e.hash & oldCap is either 0 or oldCap
                            // When the result is 0, the current node goes into the low list and keeps index j
                            if ((e.hash & oldCap) == 0) {
                                // The node of the low order linked list is empty, which proves that there are no elements in the current low order linked list
                                if (loTail == null)
                                    // The low order chain header node is the current element e
                                    loHead = e;
                                else
                                    // Append element e to the end of the low linked list
                                    loTail.next = e;
                                // Advance the tail of the low list
                                loTail = e;
                            }
                            else {
                                // When the calculation result is not 0, it is put into the high-order linked list
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                            // Move on to the next node
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            // Put the low linked list into the new tab
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            // Put the high linked list at j+oldCap
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        // Returns the newly created table
        return newTab;
    }
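
The low/high split works because doubling a power-of-two capacity adds exactly one bit to the index mask. A node at old index j therefore lands either at j or at j + oldCap, and (hash & oldCap) tells which. The sketch below checks this rule over a range of sample hashes; the class name ResizeSplitDemo, the helper names and the sample generator are assumptions made for the illustration.

```java
// Hypothetical demo class: checks the "j or j + oldCap" rule used by resize().
class ResizeSplitDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // True if every sampled hash moves to the index predicted by (hash & oldCap).
    static boolean splitRuleHolds(int oldCap) {
        int newCap = oldCap << 1;
        for (int i = 0; i < 100_000; i++) {
            int hash = spread(i * 0x9E3779B9); // arbitrary multiplier, only to vary the bits
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            int predicted = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            if (newIndex != predicted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(splitRuleHolds(16));
        System.out.println(splitRuleHolds(1024));
    }
}
```

Because of this rule, resize() never has to recompute a full index for list nodes: one bit test per node is enough.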

3, HashMap usage optimization

Optimization 1: specify generics

When declaring a generic collection on JDK 7 or above, use the diamond syntax or omit the type arguments entirely.
Note: the diamond, written <>, stands for the type arguments already given on the left-hand side.
(Source: Java Development Manual)
Positive example:
// diamond form, i.e. <>
HashMap<String, String> userCache = new HashMap<>(16);
// fully omitted form (a raw type: it compiles, but with an unchecked warning)
ArrayList<User> users = new ArrayList(10);

Optimization 2: specify the initial capacity as (number of elements to store / load factor) + 1, to avoid repeated resizing later because the capacity is too small

Specify the initial size of a collection when you create it.
Note: for HashMap, use the HashMap(int initialCapacity) constructor. If the size cannot be determined yet, use the default value (16).
Positive example: initialCapacity = (number of elements to store / load factor) + 1. The load factor defaults to 0.75. If the initial size cannot be determined yet, set it to 16 (the default value).
Counterexample: a HashMap needs to hold 1024 elements. Because no initial capacity is set, the table is forced to grow 7 times as elements are added, and every resize rebuilds the hash table. With tens of millions of elements, this repeated resizing seriously hurts performance.
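
The formula and the count of 7 expansions can both be reproduced with simple arithmetic. The sketch below assumes the default load factor of 0.75 and the default capacity of 16; the class name InitialCapacityDemo and its helper methods are hypothetical and only model the thresholds, they do not inspect a real HashMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: models the sizing rule, assuming load factor 0.75.
class InitialCapacityDemo {
    static final float LOAD_FACTOR = 0.75f;

    // initialCapacity = (number of elements to store / load factor) + 1
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / LOAD_FACTOR) + 1;
    }

    // How many times a table starting at 16 must double before
    // `elements` entries fit under the capacity * 0.75 threshold.
    static int doublingsFromDefault(int elements) {
        int cap = 16;
        int count = 0;
        while (elements > (int) (cap * LOAD_FACTOR)) {
            cap <<= 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1024));           // 1366
        System.out.println(doublingsFromDefault(1024));  // 7
        // Pre-sized map: no expansion is needed while 1024 entries are added.
        Map<Integer, String> map = new HashMap<>(capacityFor(1024));
        for (int i = 0; i < 1024; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size());                  // 1024
    }
}
```

HashMap rounds the requested 1366 up to a table of 2048 slots, whose threshold of 1536 is above 1024, so the pre-sized map never resizes in this scenario.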

4, Related interview questions

1. Why override hashCode when overriding equals

So that hash tables such as HashMap work correctly. The rules are:
If equals returns true, the hashCode values must be equal.
If equals returns false, the hashCode values may or may not be equal.
If the hashCode values differ, equals must return false.
If the hashCode values are equal, equals does not necessarily return true.

Object.hashCode() is a native method, that is, it is implemented in C or C++. It returns an identity-based integer for the object; this is often described as the object's memory address converted to an integer, although the specification does not require that.
== compares whether two references point to the same object.
equals, as inherited from Object, also compares references by default.

When equals is overridden so that two distinct objects compare as equal, the rules require their hashCode values to be equal too, so hashCode must be overridden accordingly.
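
As an illustration of these rules, the sketch below defines a key class that overrides both methods consistently, so that a second, equal instance finds the entry stored under the first. The names EqualsHashCodeDemo and UserId are hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo: a key type with consistent equals and hashCode.
class EqualsHashCodeDemo {
    static final class UserId {
        private final String value;

        UserId(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UserId)) return false;
            return Objects.equals(value, ((UserId) o).value);
        }

        @Override
        public int hashCode() {
            // Equal objects must produce equal hash codes.
            return Objects.hashCode(value);
        }
    }

    static String lookup() {
        Map<UserId, String> map = new HashMap<>();
        map.put(new UserId("42"), "Alice");
        // A different instance that equals the stored key.
        return map.get(new UserId("42"));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // Alice
    }
}
```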

2. How to avoid memory leaks with HashMap
When a user-defined object is used as a HashMap key without overriding equals and hashCode, every new instance counts as a different key, even if it is logically the same. Entries then accumulate and cannot be found or removed through an equal key; as long as the map is reachable the GC cannot reclaim them, which leaks memory.
Solution:
When using a custom object as a key, override both equals and hashCode.
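
The effect is easy to reproduce. In the sketch below the key class deliberately does not override equals or hashCode, so each new instance is a separate key and the map grows on every put. The names IdentityKeyLeakDemo and SessionKey are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a key type that relies on Object's identity-based
// equals and hashCode, so logically identical keys never match each other.
class IdentityKeyLeakDemo {
    static final class SessionKey {
        final String id;

        SessionKey(String id) {
            this.id = id;
        }
        // equals and hashCode intentionally not overridden
    }

    static int sizeAfterRepeatedPuts(int times) {
        Map<SessionKey, String> cache = new HashMap<>();
        for (int i = 0; i < times; i++) {
            cache.put(new SessionKey("same-id"), "value");
        }
        return cache.size();
    }

    static boolean canRemoveWithEqualLookingKey() {
        Map<SessionKey, String> cache = new HashMap<>();
        cache.put(new SessionKey("same-id"), "value");
        return cache.remove(new SessionKey("same-id")) != null;
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRepeatedPuts(1000));      // 1000, not 1
        System.out.println(canRemoveWithEqualLookingKey());   // false
    }
}
```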

3. How is HashMap 1.7 implemented underneath
As an array plus linked lists; in the worst case, when many keys land in one bucket, a lookup is O(n).

4. Where is an entry stored when the HashMap key is null
The hash of a null key is 0, so the entry is placed at index 0 of the array (hash table).
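
A short sketch shows that a null key behaves like any other single key from the caller's point of view. The class name NullKeyDemo is hypothetical; the bucket index itself is not visible through the public API, so the example only demonstrates that one null-key entry can be stored, overwritten and read back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: HashMap accepts one null key.
class NullKeyDemo {
    static String demo() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        String old = map.put(null, "second"); // same key, so the value is replaced
        return old + "," + map.get(null) + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // first,second,1
    }
}
```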

5. Is the linked list in HashMap singly or doubly linked
The linked list in a HashMap bucket is singly linked.

6. Time complexity O(1), O(n), O(log n)

O(1): the query time does not grow with the amount of data. Put simply, one step finds the result.
O(n): the query time grows in proportion to the amount of data.
O(log n): when the amount of data grows by a factor of n, the query time grows by about log n steps. For example, when the amount of data grows 256 times, a query needs only about 8 more steps (log2 256 = 8).

7. Time complexity of a HashMap lookup by key
It depends on the structure in which the entry for the key is stored:

Directly in the array slot (i.e. it is the head node of the linked list or red-black tree): O(1).
In a linked list: O(n).
In a red-black tree: O(log n).

8. How is array expansion implemented in HashMap
In JDK 1.8, each expansion doubles the capacity and doubles the threshold.

9. Does HashMap store its entries in order?
No. Entries are placed by hash, so iteration order is not guaranteed.

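10. Why not use the key's hashCode as the hash value directly, instead of XORing it with its high 16 bits?
The index is (n - 1) & hash, so for a small table only the low bits of the hash are used. XORing the high 16 bits into the low 16 bits lets the high bits influence the index too, which reduces the probability of hash collisions.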
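
A concrete pair of values makes the benefit visible. In the sketch below, two hash codes that differ only above bit 15 land in the same bucket of a 16-slot table when used directly, but in different buckets once the high bits are mixed in. The class name HighBitsDemo and the sample values 0x10000 and 0x20000 are chosen purely for illustration.

```java
// Hypothetical demo: why the high 16 bits are XORed into the low 16 bits.
class HighBitsDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000; // differs from h2 only in the high bits
        int h2 = 0x20000;
        int n = 16;
        // Without spreading, both use only their low 4 bits, which are all zero.
        System.out.println(index(h1, n) + " " + index(h2, n));                 // 0 0
        // With spreading, the high bits reach the index.
        System.out.println(index(spread(h1), n) + " " + index(spread(h2), n)); // 1 2
    }
}
```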

11. How can a HashMap store 10000 keys most efficiently
Specify the capacity when creating the HashMap: (10000 elements to store / load factor) + 1, which is 13334 with the default load factor of 0.75.

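12. How does HashMap 1.8 avoid the multi-threaded resize infinite loop
In 1.7, a resize moved nodes by head insertion, which reversed the list; two threads resizing at the same time could link nodes into a cycle. In 1.8, the original list is split into a low list and a high list, each built by appending at the tail so the order is preserved, and both are then placed into the expanded array. This removes that infinite loop. HashMap is still not thread-safe, however; use ConcurrentHashMap when several threads modify the map.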

13. Why does HashMap 1.8 introduce red-black trees
Because lookups in a long linked list are slow. A red-black tree reduces the lookup time in a crowded bucket from O(n) to O(log n).

14. Why is a linked list converted to a red-black tree at length 8 but converted back at 6, instead of using 8 for both
In the HashMap implementation the default threshold for converting to a red-black tree is 8 and the threshold for converting back to a linked list is 6. My guess is that with only a few nodes a linked list is more efficient than a red-black tree. The two thresholds differ so that a bucket whose size hovers around 8 does not flip back and forth between linked list and red-black tree, which would waste resources.

15. Under what circumstances is a red-black tree converted back to a linked list?
When a resize splits the tree and one half has 6 or fewer nodes (UNTREEIFY_THRESHOLD), or when removals leave the tree very small.

16. How does HashMap reduce the probability of hash conflicts
It applies a spreading step that distributes keys more uniformly: it takes the key's hashCode and XORs it with its own high 16 bits, h ^ (h >>> 16).

Summary

HashMap is one of the most commonly used collection classes in daily development. Using it well improves the efficiency of a program.

Posted by joey3002 on Mon, 11 Oct 2021 16:13:56 -0700
