007 Java collections 006: HashMap explained

Keywords: Java

Note: This article is based on JDK1.8.

1 Introduction

HashMap does not allow duplicate keys: inserting a key that already exists replaces its value. It permits one null key and any number of null values.

Internally, a HashMap is an array of buckets, and each bucket holds either a linked list or a red-black tree of nodes. Each node stores one key-to-value mapping. Iteration order is not guaranteed to match insertion order.

HashMap is not thread-safe.
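
The key rules above can be checked with a few calls. This is a minimal sketch; the class name HashMapBasicsDemo and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasicsDemo {
    // Builds a small map that exercises duplicate keys, the null key and null values.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);     // same key: the value 1 is replaced by 2
        map.put(null, 0);    // a null key is permitted
        map.put("b", null);  // null values are permitted as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());            // 3
        System.out.println(map.get("a"));          // 2
        System.out.println(map.get(null));         // 0
        System.out.println(map.containsKey("b"));  // true, although get("b") returns null
    }
}
```

Because get() returns null both for a missing key and for a key mapped to null, containsKey() is the way to tell the two cases apart.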

2 Capacity expansion mechanism

The underlying array has a capacity. The default capacity of HashMap is 16 and the default load factor is 0.75, so the array is expanded once the number of key-value pairs exceeds 0.75 times the capacity (12 for the default capacity). Each expansion doubles the capacity.
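
The arithmetic can be sketched as a small model. It does not inspect a real HashMap; the class ResizeThresholdDemo and its methods are hypothetical, and the handling of the maximum capacity is left out.

```java
class ResizeThresholdDemo {
    // Number of entries a table of the given capacity holds before it is expanded.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Capacity a default HashMap (16, 0.75) would have after `entries` insertions,
    // for entries >= 1. Each expansion doubles the capacity.
    static int capacityAfter(int entries) {
        int capacity = 16;
        while (entries > threshold(capacity, 0.75f)) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));  // 12
        System.out.println(capacityAfter(12));     // 16: 12 entries do not exceed the threshold
        System.out.println(capacityAfter(13));     // 32: the 13th entry triggers an expansion
        System.out.println(capacityAfter(100));    // 256
    }
}
```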

When the expected number of entries is known, specify the initial capacity up front to avoid the cost of repeated expansion.
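
One possible way to pick the initial capacity is sketched below. The helper names initialCapacityFor and newPresizedMap are assumptions made for this example, not JDK methods, and the formula assumes the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    // Hypothetical helper: an initial capacity large enough that expectedSize
    // entries fit without an expansion under the default load factor.
    static int initialCapacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    // Hypothetical factory that creates a map sized for expectedSize entries.
    static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
        return new HashMap<>(initialCapacityFor(expectedSize));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100));  // 134, rounded up to 256 by HashMap
        Map<String, Integer> map = newPresizedMap(100);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);                    // stays below the threshold of 192
        }
        System.out.println(map.size());               // 100
    }
}
```

The formula can overshoot slightly (12 expected entries give 17, which HashMap rounds up to 32), which trades a little memory for the guarantee that no expansion happens.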

3 method description

3.1 construction method

// Constructor that specifies the initial capacity and the load factor.
public HashMap(int initialCapacity, float loadFactor);
// Constructor that specifies the initial capacity and uses the default load factor.
public HashMap(int initialCapacity);
// No-argument constructor that uses the default capacity and the default load factor.
public HashMap();
// Constructor that copies the mappings of the given map, using the default load factor.
public HashMap(Map<? extends K, ? extends V> m);

3.2 common methods

// Return the number of key-value pairs.
public int size();
// Check whether the map is empty.
public boolean isEmpty();
// Return the value mapped to the key, or null if there is no mapping.
public V get(Object key);
// Map the key to the value and return the previous value, or null if there was no mapping.
public V put(K key, V value);
// Remove the mapping for the key and return the removed value, or null if there was no mapping.
public V remove(Object key);
// Clear all elements.
public void clear();
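
The return values of these methods are easiest to see in a short run. The sketch below records each return value in a list; the class name HashMapApiDemo and the sample data are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapApiDemo {
    // Calls the common methods in order and records what each one returns.
    static List<Object> returnValues() {
        Map<String, Integer> stock = new HashMap<>();
        List<Object> out = new ArrayList<>();
        out.add(stock.isEmpty());        // true: nothing inserted yet
        out.add(stock.put("apple", 3));  // null: no previous mapping
        out.add(stock.put("apple", 5));  // 3: the previous value is returned
        out.add(stock.get("apple"));     // 5
        out.add(stock.get("pear"));      // null: the key is absent
        out.add(stock.remove("apple"));  // 5: the removed value is returned
        out.add(stock.remove("apple"));  // null: already removed
        out.add(stock.size());           // 0
        stock.put("kiwi", 1);
        stock.clear();
        out.add(stock.isEmpty());        // true again after clear()
        return out;
    }

    public static void main(String[] args) {
        // Prints [true, null, 3, 5, null, 5, null, 0, true]
        System.out.println(returnValues());
    }
}
```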

4 source code analysis

4.1 properties

Static properties:

// The default capacity is 16. It must be a power of 2.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// The maximum capacity is 2^30. A larger requested capacity is replaced by this value.
static final int MAXIMUM_CAPACITY = 1 << 30;
// The default load factor is 0.75.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// The treeify threshold is 8. When an insertion makes the linked list of a bucket longer than 8 nodes, the list is converted to a red-black tree.
static final int TREEIFY_THRESHOLD = 8;
// The untreeify threshold is 6. When a resize leaves a tree bucket with 6 or fewer nodes, the tree is converted back to a linked list.
static final int UNTREEIFY_THRESHOLD = 6;
// The minimum table capacity for treeification is 64. Below it, an overfull bucket triggers a resize instead. To avoid conflicts between resizing and treeification, the value should be at least 4 * TREEIFY_THRESHOLD.
static final int MIN_TREEIFY_CAPACITY = 64;

General properties:

// Bucket array. Each slot holds the head of a linked list or the root of a red-black tree.
transient Node<K,V>[] table;
// Cached view of the key-value pairs, returned by entrySet().
transient Set<Map.Entry<K,V>> entrySet;
// Number of key-value pairs.
transient int size;
// Number of structural modifications, used by iterators for the fail-fast mechanism.
transient int modCount;
// Expansion threshold (capacity * load factor). Before the table is allocated, it holds the initial capacity.
int threshold;
// Load factor.
final float loadFactor;

4.2 utility methods

// Compute the hash used by the map: XOR the hashCode of the key with its own upper 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// Return the smallest power of 2 that is greater than or equal to the given capacity.
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

4.3 construction method

// Constructor that specifies the initial capacity and the load factor.
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    // Round the capacity up to a power of 2 and keep it in threshold until the first resize() allocates the table.
    this.threshold = tableSizeFor(initialCapacity);
}
// Constructor that specifies the initial capacity and uses the default load factor.
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
// No-argument constructor that uses the default load factor. The default capacity is applied when the table is first allocated.
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}
// Constructor that copies the mappings of the given map, using the default load factor.
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

4.4 common methods

// Return the number of key-value pairs.
public int size() {
    return size;
}
// Check whether the map is empty.
public boolean isEmpty() {
    return size == 0;
}
// Return the value mapped to the key, or null if there is no mapping.
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
// Find the node that matches the hash and key, or return null.
final Node<K,V> getNode(int hash, Object key) {
    // tab: bucket array; first: head node of the bucket; e: node being visited; n: array length; k: key of the node being compared.
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Continue only if the array has been initialized and the bucket for this hash is not empty.
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // Check the head node first. If its hash and key match, return it.
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Otherwise, if the bucket holds more nodes, search the red-black tree or the linked list.
        if ((e = first.next) != null) {
            // If the bucket is a red-black tree, search the tree.
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise traverse the linked list.
            do {
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
// Map the key to the value and return the previous value, or null if there was no mapping.
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// Insert or update a mapping. Return the previous value, or null if there was no mapping.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    // tab: bucket array; p: pointer node; n: array length; i: index in the array.
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the array has not been initialized, initialize it through resize().
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty, create a new node for the key and value and make it the head of the bucket.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // If the bucket is not empty, look for a node with the same key.
    else {
        // e: target node; k: key of the node being compared.
        Node<K,V> e; K k;
        // If the hash and key of the head node match, the head node is the target node.
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the head node is a red-black tree node, insert into the tree. An existing node with the same key is returned as the target node.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bucket is a linked list. Traverse it to find the target node.
        else {
            // Traverse the linked list, counting the nodes visited.
            for (int binCount = 0; ; ++binCount) {
                // Take the next node as the candidate. If there is none, the end of the list was reached without finding the key.
                if ((e = p.next) == null) {
                    // Append a new node that stores the key and value.
                    p.next = newNode(hash, key, value, null);
                    // If the list now exceeds the treeify threshold, try to convert the bucket to a tree.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // No existing node matched, so e is null. Leave the loop.
                    break;
                }
                // A node with the same hash and key was found.
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    // The target node exists. Leave the loop.
                    break;
                // Advance the pointer node to the candidate and continue.
                p = e;
            }
        }
        // If the target node exists, the size and capacity do not change. Replace the value and return the previous value.
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Reaching this point means that a new node was added. Increment the modification count.
    ++modCount;
    // Increment the size. If it exceeds the threshold, expand the capacity.
    if (++size > threshold)
        resize();
    // Callback after insertion (used by LinkedHashMap).
    afterNodeInsertion(evict);
    // There was no previous mapping, so return null.
    return null;
}
// Remove the mapping for the key and return the removed value, or null if there was no mapping.
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ? null : e.value;
}
// Remove the node that matches the key (and the value, if matchValue is true) and return it, or return null if none matches.
final Node<K,V> removeNode(int hash, Object key, Object value, boolean matchValue, boolean movable) {
    // tab: bucket array; p: pointer node; n: array length; index: index in the array.
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // Continue only if the array has been initialized and the bucket for this hash is not empty.
    if ((tab = table) != null && (n = tab.length) > 0 && (p = tab[index = (n - 1) & hash]) != null) {
        // node: the node to remove; e: node being visited; k: key of the node being compared; v: value of the node to remove.
        Node<K,V> node = null, e; K k; V v;
        // If the hash and key of the head node match, the head node is the node to remove.
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Otherwise, if the bucket holds more nodes, keep searching.
        else if ((e = p.next) != null) {
            // If the bucket is a red-black tree, search the tree for the node to remove.
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            // Otherwise traverse the linked list.
            else {
                do {
                    // A node with the same hash and key was found.
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        // It is the node to remove.
                        node = e;
                        // Leave the loop. p still points to the predecessor of node.
                        break;
                    }
                    // Advance the pointer node so that it always trails e by one node.
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Continue only if a node was found and, when matchValue is true, its value equals the given value.
        if (node != null && (!matchValue || (v = node.value) == value || (value != null && value.equals(v)))) {
            // If the node is a red-black tree node, remove it from the tree.
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // If the node is the head of the linked list, its successor becomes the new head of the bucket.
            else if (node == p)
                tab[index] = node.next;
            // Otherwise unlink the node by pointing its predecessor at its successor.
            else
                p.next = node.next;
            // A node was removed. Increment the modification count.
            ++modCount;
            // Decrement the size.
            --size;
            afterNodeRemoval(node);
            // Return the removed node.
            return node;
        }
    }
    return null;
}
// Clear all elements.
public void clear() {
    // Declare the bucket array tab.
    Node<K,V>[] tab;
    // Increment the modification count.
    modCount++;
    // If the array has been initialized and the map is not empty, set the size to 0 and clear every bucket.
    if ((tab = table) != null && size > 0) {
        size = 0;
        for (int i = 0; i < tab.length; ++i)
            tab[i] = null;
    }
}

4.5 expansion method

final Node<K,V>[] resize() {
    // Record the original node array.
    Node<K,V>[] oldTab = table;
    // Record the capacity of the original array.
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Record the original threshold.
    int oldThr = threshold;
    // Define new array capacity and new array threshold.
    int newCap, newThr = 0;
    // If the original capacity is greater than 0, the capacity will be expanded.
    if (oldCap > 0) {
        // If the original capacity is greater than or equal to the maximum capacity.
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Set the threshold to Integer.MAX_VALUE so that no further expansion happens, and return the original array.
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the original capacity and assign it to the new capacity. If the new capacity is less than the maximum capacity and the original capacity is greater than or equal to the default capacity.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            // Double the original threshold and assign it to the new threshold.
            newThr = oldThr << 1;
    }
    // The array is not initialized yet, but the threshold is greater than 0.
    else if (oldThr > 0)
        // The constructor stored the initial capacity in threshold, so use it as the new capacity.
        newCap = oldThr;
    // The original capacity is 0, and the original threshold is also 0.
    else {
        // Set the default capacity.
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Set the default threshold.
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Judge whether the new threshold is 0.
    if (newThr == 0) {
        // Calculate a new threshold.
        float ft = (float)newCap * loadFactor;
        // If the new capacity is less than the maximum capacity and the threshold is less than the maximum capacity, the new threshold is used, otherwise the maximum value is used.
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE);
    }
    // Determine the new threshold.
    threshold = newThr;
    // Start constructing a new node array.
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If the original node array is initialized, the original node is placed in the new node array.
    if (oldTab != null) {
        // Traverse the original node array.
        for (int j = 0; j < oldCap; ++j) {
            // Define the original node.
            Node<K,V> e;
            // If the original node is not empty, record the original node and move it.
            if ((e = oldTab[j]) != null) {
                // Empty the original node.
                oldTab[j] = null;
                // If the node has no successor, it is the only node in the bucket. Place it directly into the new array.
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If the node is a red-black tree node, split the tree between the low and high buckets of the new array.
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // If the original node is a linked list node, move the linked list node.
                else {
                    // Head and tail of the low list (nodes that keep index j).
                    Node<K,V> loHead = null, loTail = null;
                    // Head and tail of the high list (nodes that move to index j + oldCap).
                    Node<K,V> hiHead = null, hiTail = null;
                    // Define the next node.
                    Node<K,V> next;
                    // Loop through the node linked list.
                    do {
                        // Assign a value to the next node.
                        next = e.next;
                        // Test the hash bit selected by oldCap. If it is 0, the index is unchanged, so append the node to the low list.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // If it is 1, the index grows by oldCap, so append the node to the high list.
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // If the low list is not empty, place it at the original index j.
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // If the high list is not empty, place it at index j + oldCap.
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    // Returns a new array of nodes.
    return newTab;
}

5 supplementary notes

5.1 array length is a power of 2

When the length is a power of 2, length minus one is a run of 1 bits in binary. ANDing a hash with it can produce every index from 0 to length - 1 and nothing else, so no slot of the array is wasted.
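
A small experiment shows the effect. The class IndexMaskDemo and its helper methods are made up for illustration; only the expression (length - 1) & hash comes from the HashMap source.

```java
class IndexMaskDemo {
    // Bucket index as computed in getNode() and putVal(): (n - 1) & hash.
    static int indexFor(int hash, int length) {
        return (length - 1) & hash;
    }

    // Counts how many distinct slots the mask can produce for a given length
    // (sketch: assumes length <= 1024).
    static int reachableSlots(int length) {
        boolean[] seen = new boolean[length];
        int count = 0;
        for (int hash = 0; hash < 1024; hash++) {
            int index = indexFor(hash, length);
            if (!seen[index]) {
                seen[index] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Length 16: the mask is 0b1111, so the AND behaves like a remainder.
        System.out.println(indexFor(0x12345, 16));       // 5
        System.out.println(Math.floorMod(0x12345, 16));  // 5
        System.out.println(reachableSlots(16));          // 16
        // Length 10: the mask is 0b1001, so only slots 0, 1, 8 and 9 can be hit.
        System.out.println(reachableSlots(10));          // 4
    }
}
```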

5.2 the specified length is not a power of 2

In its last step, the constructor passes the specified length to tableSizeFor(), which uses shift and OR operations to find the smallest power of 2 that is greater than or equal to the specified length. That value becomes the capacity.
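
To try the rounding on a few inputs, the method from section 4.2 can be copied into a standalone class. The wrapper class TableSizeForDemo is hypothetical; the method body is the JDK 1.8 code quoted earlier.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copies the highest 1 bit of (cap - 1) into every lower bit, then adds one.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(0));   // 1
        System.out.println(tableSizeFor(10));  // 16
        System.out.println(tableSizeFor(16));  // 16: a power of 2 is kept as it is
        System.out.println(tableSizeFor(17));  // 32
    }
}
```

Subtracting one first is what keeps an exact power of 2 unchanged instead of doubling it.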

5.3 first use hashCode() method, and then use equals() method

The Object class has a hashCode() method, which returns the hash code of the object.

In the Object class, hashCode() is declared native, which means that its implementation is platform dependent. In most cases, hashCode() returns a value derived from information about the object (storage address, fields, etc.).

When inserting an object into a collection, comparing it with every existing element through equals() works but is inefficient. Comparing hash values first, and calling equals() only when the hash values are the same, is much faster.
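
The saving can be observed by counting equals() calls. The sketch below assumes the JDK 1.8 lookup code shown in section 4.4, where equals() is only reached after the hash values match; the Key class and the counter are invented for the experiment.

```java
import java.util.HashMap;
import java.util.Map;

class HashThenEqualsDemo {
    static int equalsCalls = 0;

    // Hypothetical key type that counts how often equals() runs.
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Returns {equals() calls during 100 inserts, equals() calls during one lookup}.
    static int[] measure() {
        equalsCalls = 0;
        Map<Key, String> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new Key(i), "v" + i);       // all hash values differ, so equals() is skipped
        }
        int duringInserts = equalsCalls;
        equalsCalls = 0;
        String found = map.get(new Key(42));    // a new but equal key object
        if (!"v42".equals(found)) {
            throw new IllegalStateException("lookup failed");
        }
        return new int[] { duringInserts, equalsCalls };
    }

    public static void main(String[] args) {
        int[] calls = measure();
        System.out.println(calls[0]);  // 0
        System.out.println(calls[1]);  // 1
    }
}
```

Comparing the new key with all 100 stored keys would have needed up to 100 equals() calls for the lookup alone.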

5.4 using hash() method to process hashCode

The index of a node in the array is generally the remainder of a value derived from the node divided by the length of the array. When the array length is a power of 2, the remainder operation is equivalent to an AND with the array length minus one.

Write the array length minus one in binary and split it into a high part and a low part: the run of 0 bits on the left is the high part, and the run of 1 bits on the right is the low part. In an AND operation, any bit ANDed with 0 gives 0, so the high bits of the hash are ignored and only the low bits are used.

If the raw hashCode were used in the AND operation, nodes whose hashCodes differ only in the high bits would land in the same bucket, which makes hash collisions more likely. Therefore, the hashCode is processed so that two nodes with different high bits and the same low bits can get different indexes.

The hash() method is called a perturbation function and is used to process the hashCode. After hash() is applied, a change in the high bits of the hashCode also affects the low bits. Using the low bits to calculate the index then distributes elements more evenly and reduces the chance of hash collisions.

In JDK 1.8, the hashCode is perturbed only once, by XORing it with its own upper 16 bits:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
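
The effect of the perturbation can be measured on hash codes that differ only in their upper 16 bits. The class HashSpreadDemo is a made-up test harness; spread() repeats the XOR step of hash() on a raw int.

```java
class HashSpreadDemo {
    // Same perturbation as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts the distinct buckets (table length 16) hit by 16 hash codes
    // that differ only in their upper 16 bits: 0 << 16, 1 << 16, ..., 15 << 16.
    static int distinctBuckets(boolean useSpread) {
        int length = 16;
        boolean[] hit = new boolean[length];
        int distinct = 0;
        for (int k = 0; k < 16; k++) {
            int hashCode = k << 16;
            int hash = useSpread ? spread(hashCode) : hashCode;
            int index = (length - 1) & hash;
            if (!hit[index]) {
                hit[index] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(false));  // 1: every raw hash code lands in bucket 0
        System.out.println(distinctBuckets(true));   // 16: the high bits now reach the index
    }
}
```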

5.5 when to expand capacity

1) When the first node is inserted, the array is initialized through the expansion method.

2) After a node is inserted, the capacity is expanded when the number of key-value pairs exceeds the threshold.

5.6 treatment of linked list structure during capacity expansion

During expansion, the nodes of a linked list are split into a low list and a high list. The hash value of each node is ANDed with the length of the original array: the node goes to the low list if the result is 0 and to the high list otherwise.

The low list is placed at the original index in the new array, and the high list is placed at the original index plus the length of the original array.
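
The split can be replayed on plain hash values. This is a simplified sketch that works on ints instead of nodes; the class ResizeSplitDemo and the sample hashes are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResizeSplitDemo {
    // Splits the hashes of one old bucket the way resize() splits a linked list.
    // Element 0 of the result is the low list, element 1 is the high list.
    static List<List<Integer>> split(int[] hashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int hash : hashes) {
            if ((hash & oldCap) == 0) {
                lo.add(hash);   // index stays the same
            } else {
                hi.add(hash);   // index grows by oldCap
            }
        }
        return Arrays.asList(lo, hi);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] bucket5 = {5, 21, 37, 53};   // all map to index 5 while the length is 16
        List<List<Integer>> parts = split(bucket5, oldCap);
        System.out.println(parts.get(0));  // [5, 37]  -> index 5 in the new array
        System.out.println(parts.get(1));  // [21, 53] -> index 21 = 5 + 16
        for (int hash : bucket5) {
            // Index in the new array of length 32, computed directly for comparison.
            System.out.println(hash + " -> " + (hash & (2 * oldCap - 1)));
        }
    }
}
```

Because only one extra hash bit decides the new index, no hash has to be recomputed and the relative order inside each list is preserved.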

5.7 override the equals method and hashCode method

Whenever equals() is overridden, hashCode() should be overridden as well, so that two objects that are equal according to equals() also return the same hashCode(). Otherwise an equal key may be looked up in the wrong bucket and not be found.
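
A key class that follows this rule might look like the sketch below. The Point class is a hypothetical example, and Objects.hash() is just one convenient way to derive the hash code from the same fields that equals() compares.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EqualsHashCodeDemo {
    // Hypothetical key type: equals() and hashCode() use the same fields.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Stores a value under one Point and reads it back through an equal Point.
    static String lookup() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "A");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup());  // A
    }
}
```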

Posted by matt86 on Tue, 28 Sep 2021 18:50:58 -0700
