Noseparte says: You can't play Java well without HashMap

Keywords: Java JDK Oracle

Brief Introduction

For a Java programmer, HashMap is an unavoidable data type.

Both how often it is used in development and how often it comes up in interviews bear this out.

HashMap's Past and Present

HashMap was introduced in JDK 1.2. As the JDK evolved, and to address the hash collision problem that HashMap still had in JDK 1.7,
the Oracle team delivered JEP 180 in JDK 8: Handle Frequent HashMap Collisions with Balanced Trees (the balanced trees in question are red-black trees). The official JEP document describes the change.

Data structure comparison before and after HashMap optimization

  • HashMap in JDK 1.7

At a high level, HashMap contains an array, and each element of the array is a singly linked list. In the original diagram, each green
entity is an instance of the nested class Entry, which has four fields: key, value, hash, and next (the link to the next entry in the list).

  1. Capacity: the current length of the array. It is always a power of two (2^n) and can be expanded; each expansion doubles it.
  2. Load Factor: the load factor, 0.75 by default.
  3. Threshold: the size at which the table is expanded, equal to capacity * loadFactor.
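
The relationship between these three values can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class name ThresholdDemo and its helper are made up for this example and are not part of the JDK.

```java
// Illustrative only: ThresholdDemo is a hypothetical helper, not JDK code.
final class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        // The capacity doubles on every expansion, so it stays a power of two.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```

With the default values, a table of 16 buckets is expanded once it holds more than 12 entries.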

  • HashMap in JDK 1.8

Java 8 made several changes to HashMap. The biggest is the introduction of red-black trees, so the structure is now array + linked lists + red-black trees.
From the description of the Java 7 HashMap, we know that a lookup can jump straight to the right array index from the hash, but after that it has to walk the linked list node by node to find the entry. The time complexity therefore depends on the length of the list: O(n).
To reduce this cost, in Java 8 a list that grows beyond eight nodes is converted to a red-black tree (in the JDK implementation, only once the table has at least 64 buckets; a smaller table is resized instead), which brings lookups in that bucket down to O(log n).
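
To see the situation this optimization targets, one can force every key into the same bucket. The sketch below uses a hypothetical key class, BadKey, whose hashCode is deliberately constant; it only shows that java.util.HashMap stays correct under heavy collisions and does not measure performance.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    // Hypothetical key type: every instance has the same hashCode,
    // so all entries land in a single bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Comparable keys give a tree bin an ordering to search by
        // when hashes are equal.
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1000);
        System.out.println(map.size());               // 1000
        System.out.println(map.get(new BadKey(777))); // 777
    }
}
```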

Performance comparison before and after HashMap optimization

Talk is cheap. Show me the code: Handwritten HashMap

HashMap.java

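import java.io.Serializable;
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

/**
 * @Author: Noseparte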
 * @Date: 2019/10/18 11:07
 * @Description:
 *
 *          <p>Take a closer look at HashMap</p>
 */
public class HashMap<K, V> extends AbstractMap<K, V> implements Map<K, V>, Cloneable, Serializable {

    // Default initial capacity
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum Container Value
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default Load Factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // Threshold Value for Conversion from Chain List to Red-Black Tree
    static final int TREEIFY_THRESHOLD = 8;

    transient int modCount;

    // Hash bucket array
    transient Node<K, V>[] table;

    // Load factor
    final float loadFactor;

    // Resize threshold: the number of key-value pairs the map may hold before it expands (capacity * loadFactor)
    int threshold;

    // Number of existing key-value pairs in HashMap
    transient int size;

    public HashMap(float loadFactor) {
        this.loadFactor = loadFactor;
    }

    public HashMap(int initialCapacity){
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(int initialCapacity, float loadFactor) {
        if(initialCapacity < 0){
            throw new IllegalArgumentException();
        }
        if(loadFactor <= 0){
            throw new IllegalArgumentException();
        }
        this.loadFactor = loadFactor;
        threshold = tableSizeFor(initialCapacity);
    }

    private static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

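    // Stub: the entry-set view is omitted from this handwritten sketch
    public Set<Entry<K, V>> entrySet() {
        return null;
    }

    // Callback hooks (used by LinkedHashMap in the JDK); no-ops here
    private void afterNodeInsertion(boolean evict) {
    }

    private void afterNodeAccess(Node<K,V> e) {
    }

    // Stub: converting a bucket's list into a red-black tree is omitted here
    private void treeifyBin(Node<K,V>[] newTable, int hash) {
    }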

    /**
     * Spreads the key's hashCode by XORing its high 16 bits into the low 16 bits.
     * The bucket index is later computed from the result as (n - 1) & hash.
     *
     * @param key the key
     * @return the spread hash, or 0 for a null key
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Create a regular (non-tree) node
    Node<K,V> newNode(int hash, K key, V value, Node<K, V> next) {
        return new Node<K, V>(hash, key, value, next);
    }

}
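
The tableSizeFor method in the listing rounds a requested capacity up to the next power of two by smearing the highest set bit into all lower bits. The sketch below copies that routine into a standalone class (TableSizeDemo is a name chosen for this example) so its effect can be observed.

```java
// Standalone copy of the routine from the listing, for experimentation only.
final class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // after these shifts, every bit below the
        n |= n >>> 2;      // highest set bit is also set
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] requested = {0, 1, 10, 16, 17, 1000};
        for (int cap : requested) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // 0 -> 1, 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
    }
}
```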

Node.java Linked-List Node

/**
 * @Author: Noseparte
 * @Date: 2019/10/18 11:13
 * @Description:
 *
 *          <p>A Node is one link in a bucket's singly linked list</p>
 */
static class Node<K, V> implements Map.Entry<K, V>{
    final int hash;
    final K key;
    V value;
    Node<K, V> next;

    public Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey() {
        return key;
    }

    public final V getValue() {
        return value;
    }

    public final V setValue(V newValue) {
        V oldValue = this.value;
        value = newValue;
        return oldValue;
    }


}

TreeNode.java Red-Black Tree

/**
 * @Author: Noseparte
 * @Date: 2019/10/18 15:53
 * @Description:
 *
 *          <p>Red-black tree</p>
 */
static final class TreeNode<K, V> extends Node<K, V> {
    TreeNode<K, V> parent;
    TreeNode<K, V> left;
    TreeNode<K, V> right;
    TreeNode<K, V> prev;
    boolean red;

    TreeNode(int hash, K key, V value, HashMap.Node<K, V> next) {
        super(hash, key, value, next);
    }

    final void split(HashMap<K,V> kvHashMap, Node<K,V>[] newTable, int index, int oldCapacity) {
        // Stub: splitting a tree bin during resize is omitted here; see the JDK source
    }

    /**
     * Finds the tree node for the given key, starting the search from the root.
     *
     * @param hash hash of the key
     * @param key  the key
     * @return the matching tree node, or null if none
     */
    TreeNode<K, V> getTreeNode(int hash, Object key) {
        return ((parent != null) ? root() : this).find(hash, key, null);
    }

    final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
        // Omitted here; see the JDK source if you are interested
        return null;
    }

    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }

    Node<K,V> putTreeVal(HashMap<K, V> kvHashMap, Node<K, V>[] newTable, int hash, K key, V value) {
        // Stub: the red-black tree insertion itself is omitted here; see the JDK source
        return null;
    }
}

Analysis of Common Functions in HashMap

put(K key, V value)

// Excerpt: fields and helpers are those of the HashMap class above
public class HashMap<K, V> {
    
    /**
     * Public entry point: inserts a key-value pair into the map.
     *
     * @param key   the key
     * @param value the value
     * @return the previous value for the key, or null if there was none
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Underlying implementation of put.
     *
     * @param hash         hash of the key
     * @param key          the key
     * @param value        the value
     * @param onlyIfAbsent if true, do not overwrite an existing non-null value
     * @param evict        if false, the table is in creation mode
     * @return the previous value, or null if there was none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K, V>[] newTable;
        Node<K, V> node;
        int n, i;
        // Is the table null or empty?
        if ((newTable = table) == null || (n = newTable.length) == 0)
            // Allocate (or expand) the table
            n = (newTable = resize()).length;
        // Compute bucket index i from the hash; (n - 1) & hash works because n is a power of two
        if ((node = newTable[i = (n - 1) & hash]) == null) {
            // The bucket is empty: insert the new node directly
            newTable[i] = newNode(hash, key, value, null);
        } else {
            Node<K, V> e;
            K k;
            // Does the first node in the bucket have the same hash and an equal key?
            if (node.hash == hash &&
                    ((k = node.key) == key || (key != null && key.equals(k)))) {
                // Key already present: remember the node so its value is replaced below
                e = node;
            } else if (node instanceof TreeNode) {
                // The bucket is a red-black tree: delegate the insertion to the tree
                e = ((TreeNode<K, V>) node).putTreeVal(this, newTable, hash, key, value);
            } else {
                // Walk the linked list, looking for the key or for the tail
                for (int binCount = 0; ; ++binCount) {
                    if ((e = node.next) == null) {
                        // Reached the tail: append the new node
                        node.next = newNode(hash, key, value, null);
                        // Does the list now hold more than TREEIFY_THRESHOLD (8) nodes?
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            // Convert the bucket to a red-black tree
                            treeifyBin(newTable, hash);
                        break;
                    }
                    // Key already present in the list: stop here; its value is replaced below
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    node = e;
                }
            }
            if (e != null) {
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null) {
                    e.value = value;
                }
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Determine if capacity expansion is required
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
}
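
The two small expressions that putVal relies on, the hash spreading in hash() and the index computation (n - 1) & hash, can be tried in isolation. The sketch below assumes a table length of 16; IndexDemo and its method names are chosen for this example only.

```java
// Illustrative only: isolates the two bit tricks used by put.
final class IndexDemo {
    // Same spreading step as hash() in the listing, applied to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000, h2 = 0x20000; // differ only in their high bits

        // Without spreading, both fall into bucket 0.
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2

        // For a power-of-two n, the mask gives the same result as a
        // non-negative modulus, even for negative hashes.
        System.out.println(indexFor(-7, n) == Math.floorMod(-7, n));                 // true
    }
}
```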

get()

// Excerpt: fields and helpers are those of the HashMap class above
public class HashMap<K, V> {
    
    /**
     * Returns the value mapped to the key.
     *
     * @param key the key to look up
     * @return the mapped value, or null if the key is absent (or mapped to null)
     */
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Underlying lookup: finds the node for a key.
     *
     * @param hash hash of the key
     * @param key  the key
     * @return the matching node, or null if none
     */
    private final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                    ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Same hash and an equal key: this is the node we are looking for
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
}
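
One consequence of the get listing is that a null return value is ambiguous: it can mean "no such key" or "key present with a null value". The following sketch, written against java.util.HashMap, shows the difference and how containsKey resolves it; the class name GetDemo is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class GetDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", null); // a key explicitly mapped to null
        map.put(null, 0);   // the null key is allowed; hash(null) is 0
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(map.get("a"));               // 1
        System.out.println(map.get(null));              // 0
        System.out.println(map.get("b"));               // null (present, null value)
        System.out.println(map.get("missing"));         // null (absent)
        System.out.println(map.containsKey("b"));       // true
        System.out.println(map.containsKey("missing")); // false
    }
}
```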

resize()

// Excerpt: fields and helpers are those of the HashMap class above
public class HashMap<K, V> {
    
    /**
     * Hash bucket array expansion.
     * <p>
     *      if the table is null and no initial capacity was given:
     *               allocate DEFAULT_INITIAL_CAPACITY (16) buckets, with threshold
     *               DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY = 12
     *      else:    double the capacity, table.length << 1
     * </p>
     * <p>
     *     Bit shifts, for an integer c (barring overflow):
     *     c << 1  ==> 2c
     *     c << 2  ==> 4c
     * </p>
     *
     * @return the new table
     */
    final Node<K, V>[] resize(){
        // Hash bucket array initialization
        Node<K, V>[] oldTable = this.table;
        int oldCapacity = (oldTable == null) ? 0 : oldTable.length;
        int oldThreshold = this.threshold;
        int newCapacity, newThreshold = 0;
        if(oldCapacity > 0){
            if(oldCapacity >= MAXIMUM_CAPACITY){
                threshold = Integer.MAX_VALUE;
                return oldTable;
            }
            else if ((newCapacity = oldCapacity << 1) < MAXIMUM_CAPACITY &&
                    oldCapacity >= DEFAULT_INITIAL_CAPACITY){
                newThreshold = oldThreshold << 1;
            }
        }
        else if (oldThreshold > 0){ // Initial size is threshold
            newCapacity = oldThreshold;
        }
        else {  // If threshold is 0, use the default value
            newCapacity = DEFAULT_INITIAL_CAPACITY;
            newThreshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        }
        if(newThreshold == 0){
            float currentThreshold = newCapacity * loadFactor;
            newThreshold = newCapacity < MAXIMUM_CAPACITY && currentThreshold < MAXIMUM_CAPACITY ?
                    (int) currentThreshold : Integer.MAX_VALUE;
        }
        threshold = newThreshold;
        @SuppressWarnings("unchecked")
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        table = newTable;
        if(oldTable != null){
            for(int j = 0; j < oldCapacity; ++j){
                Node<K, V> eachNode;
                if((eachNode = oldTable[j]) != null){
                    oldTable[j] = null;
                    if(eachNode.next == null){
                        newTable[eachNode.hash & (newCapacity - 1)] = eachNode;
                    }
                    else if (eachNode instanceof TreeNode){
                        ((TreeNode<K, V>) eachNode).split(this, newTable, j, oldCapacity);
                    }
                    else {
                        Node<K, V> leftHead = null, leftTail = null;
                        Node<K, V> rightHead = null, rightTail = null;
                        Node<K, V> next;
                        do {
                            // Save the successor before eachNode is relinked into a new list
                            next = eachNode.next;
                            if((eachNode.hash & oldCapacity) == 0){
                                if(leftTail == null){
                                    leftHead = eachNode;
                                }else {
                                    leftTail.next = eachNode;
                                }
                                leftTail = eachNode;
                            }
                            else {
                                if (rightTail == null){
                                    rightHead = eachNode;
                                }else {
                                    rightTail.next = eachNode;
                                }
                                rightTail = eachNode;
                            }
                        }while ((eachNode = next) != null);
                        if (leftTail != null){
                            leftTail.next = null;
                            newTable[j] = leftHead;
                        }
                        if (rightTail != null){
                            rightTail.next = null;
                            // Nodes whose (hash & oldCapacity) bit is set move up by oldCapacity
                            newTable[j + oldCapacity] = rightHead;
                        }
                    }
                }
            }
        }
        return newTable;
    }
}
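
The list-splitting step in resize rests on one observation: when the capacity doubles, a node either stays at index j or moves to j + oldCapacity, and the single bit (hash & oldCapacity) decides which. The sketch below checks this against a full recomputation of the index; SplitDemo and newIndex are names invented for the example.

```java
// Illustrative only: verifies the "stay or move by oldCapacity" rule.
final class SplitDemo {
    // Index in the doubled table for a node that sat in a table of oldCapacity.
    static int newIndex(int hash, int oldCapacity) {
        int oldIndex = hash & (oldCapacity - 1);
        return (hash & oldCapacity) == 0 ? oldIndex : oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int[] hashes = {5, 21, 37, 53}; // all map to bucket 5 in a 16-slot table
        for (int hash : hashes) {
            int viaSplit = newIndex(hash, oldCapacity);
            int recomputed = hash & (2 * oldCapacity - 1);
            System.out.println(hash + ": " + viaSplit + " (recomputed " + recomputed + ")");
        }
        // 5 and 37 stay in bucket 5; 21 and 53 move to bucket 21.
    }
}
```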

Derived topics in HashMap

  1. Why does HashMap use a red-black tree as its balanced tree?

A red-black tree inserts faster than a strictly balanced binary tree (such as an AVL tree) and searches faster than an ordinary binary search tree, so it was chosen as a reasonable compromise between the two.

  1. Linked list or red-black tree?

The differences between a hash table and a red-black tree:
The choice weighs several factors: search speed, data volume, memory usage, scalability, and ordering.
A red-black tree keeps its keys ordered and a hash table does not, so choose according to need.
A red-black tree uses less memory (it only allocates memory for the nodes it actually holds), whereas a hash table has to allocate its slot array up front, even if some slots are never used.
Lookup and deletion take O(log n) in a red-black tree and O(1) on average in a hash table.
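
The ordering difference is easy to observe with the JDK's own collections, since java.util.TreeMap is a red-black tree based map. The sketch below is a minimal illustration; OrderDemo and treeOrder are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OrderDemo {
    // Returns the keys in the order a TreeMap iterates them (sorted).
    static List<String> treeOrder(String... keys) {
        Map<String, Integer> tree = new TreeMap<>();
        for (String key : keys) {
            tree.put(key, key.length());
        }
        return new ArrayList<>(tree.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is pear, apple, fig; iteration order is sorted.
        System.out.println(treeOrder("pear", "apple", "fig")); // [apple, fig, pear]
    }
}
```

A HashMap filled with the same keys makes no promise about iteration order, so code that needs sorted keys should not rely on it.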

  1. What is a hash collision and how can it be avoided?

A hash function maps inputs of arbitrary size to a fixed-length value (the hash value).
If different inputs produce the same hash value, a "collision" occurs.
Because there are more possible inputs than hash values, collisions cannot be ruled out entirely; the most effective way to make them less likely is to enlarge the value space of the hash.
Ruan Yifeng: Hash Collision and Birthday Attack
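
Both points can be illustrated briefly. The sketch below shows two different strings with the same String.hashCode(), and uses the standard birthday-bound approximation 1 - exp(-k(k-1)/(2N)) to estimate how the chance of a collision among k values shrinks as the value space N grows. The class and method names are made up for this example, and the formula assumes uniformly distributed hash values.

```java
import java.util.HashMap;
import java.util.Map;

final class HashCollisionDemo {
    // Birthday-bound approximation: chance that at least two of k uniformly
    // random values collide in a space of n possible hash values.
    static double collisionProbability(long k, double n) {
        return 1.0 - Math.exp(-(double) k * (k - 1) / (2.0 * n));
    }

    public static void main(String[] args) {
        // Two different strings with the same hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // HashMap still keeps them apart, because equals() differs.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size()); // 2

        // Same number of keys, larger value space: collisions become far less likely.
        System.out.println(collisionProbability(100_000, Math.pow(2, 32)));
        System.out.println(collisionProbability(100_000, Math.pow(2, 64)));
    }
}
```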

  1. When reading the HashMap source code, why do methods copy fields into local variables?

In short, it is an optimization habit of Doug Lea's.

Stack Overflow discussion
Doug Lea's own words
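
For readers who have not met the idiom, it looks like the sketch below, which mirrors the (tab = table) != null pattern in the listings. The commonly cited rationale, stated here as an assumption rather than a quotation, is that the field is read once and the method then works on a stable local reference. LocalCopyDemo is a made-up class.

```java
// Illustrative only: the field-to-local idiom in miniature.
final class LocalCopyDemo {
    private int[] table = {3, 1, 4, 1, 5};

    int sum() {
        int[] tab = table; // read the field once; use the local from here on
        if (tab == null) {
            return 0;
        }
        int total = 0;
        for (int i = 0, n = tab.length; i < n; i++) {
            total += tab[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(new LocalCopyDemo().sum()); // 14
    }
}
```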

Conclusion

  1. HashMap in JDK 1.8 builds on the JDK 1.7 HashMap (array + linked lists) by adding red-black trees, which greatly improves lookup performance when many keys collide.
  2. HashMap is not thread-safe. Do not modify a HashMap from several threads at the same time; ConcurrentHashMap is recommended in concurrent code.
  3. Resizing is an expensive operation, so when using a HashMap, estimate the expected size of the map and pass a rough initial capacity to the constructor to avoid repeated resizing.
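
One common way to apply the last point, assuming the default load factor of 0.75, is to derive the initial capacity from the expected number of entries so that the threshold is never crossed. The helper below is a sketch; PresizeDemo and initialCapacityFor are hypothetical names, not JDK methods.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Smallest capacity request whose threshold (capacity * 0.75) is at least
    // the expected number of entries.
    static int initialCapacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // The constructor rounds this request up to a power of two (2048 here),
        // whose threshold of 1536 is above the expected 1000 entries.
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(initialCapacityFor(expected)); // 1334
        System.out.println(map.size());                   // 1000
    }
}
```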

Posted by Davo on Thu, 05 Dec 2019 05:14:39 -0800