HashMap Source Code Analysis

Keywords: Attribute

Today let's take a look at HashMap
I. Introduction to Hash Tables
When we studied data structures, we learned a lookup technique called hash lookup.
Let's give an example to illustrate:
For example, there are eight locations in memory that make up a hash table:
0 1 2 3 4 5 6 7
Now I have an object that needs to be stored in one of the above eight locations. This object has an attribute key. If it is stored in an arbitrary location, without using a hash code, then a lookup has to check the eight locations one by one, or use an algorithm such as binary search.
But if we use a hash code, lookups become much more efficient.
We define our hash code as key % 8 and store each object at the location given by the remainder. For example, if our key is 9, 9 divided by 8 leaves a remainder of 1, so we put the object at position 1. If the key is 13, the remainder is 5, so we put the object at position 5. In this way, when we look up the object, we can find its storage location directly by taking the remainder of key divided by 8. This is the principle of hash lookup.
But if two different objects have the same hash code, a hash collision occurs: 10 % 8 is 2, and 18 % 8 is also 2. We must resolve hash collisions, and we do so by putting entries with the same hash value into the same hash "bucket".
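The key % 8 example can be tried out with a small sketch. The class and method names (BucketIndexDemo, slotFor, distribute) are made up for illustration, and keys are assumed to be non-negative:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from the JDK.
class BucketIndexDemo {
    static final int SLOTS = 8;

    // The toy hash function from the text: key % 8 (keys assumed non-negative).
    static int slotFor(int key) {
        return key % SLOTS;
    }

    // Groups the given keys by slot, so colliding keys end up in the same bucket.
    static List<List<Integer>> distribute(int... keys) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < SLOTS; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(slotFor(key)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<List<Integer>> buckets = distribute(9, 13, 10, 18);
        for (int i = 0; i < SLOTS; i++) {
            System.out.println("slot " + i + ": " + buckets.get(i));
        }
    }
}
```

Keys 10 and 18 both land in slot 2, which is exactly the collision case described above.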
II. Data structure of HashMap
Our HashMap data structure is based on the hash table described above.
The underlying implementation of HashMap is an array plus linked lists.
The array is the hash table.
The linked lists are there to resolve hash collisions.
New nodes are added to a linked list by head insertion.
Let's look at the inner class that represents the underlying data structure of HashMap

/** Entry is a node of a singly linked list.
     * It is the linked list behind HashMap's "chained storage" of colliding entries.
     * It implements the Map.Entry interface, i.e. getKey(), getValue(), setValue(V value), equals(Object o), hashCode().
    **/
    static class Entry<K,V> implements Map.Entry<K,V> {    
        final K key;    
        V value;    
        // Point to the next node    
        Entry<K,V> next;    
        final int hash;    

        // Constructor.    
        // Input parameters include "hash value (h)", "key (k)", "value (v)", and "next node (n)".    
        Entry(int h, K k, V v, Entry<K,V> n) {    
            value = v;    
            next = n;    
            key = k;    
            hash = h;    
        }    

        public final K getKey() {    
            return key;    
        }    

        public final V getValue() {    
            return value;    
        }    

        public final V setValue(V newValue) {    
            V oldValue = value;    
            value = newValue;    
            return oldValue;    
        }    
        // Determine whether two Entries are equal    
        // If both Entry's "key" and "value" are equal, return true.    
        // Otherwise, return false    
        public final boolean equals(Object o) {    
            if (!(o instanceof Map.Entry))    
                return false;    
            Map.Entry e = (Map.Entry)o;    
            Object k1 = getKey();    
            Object k2 = e.getKey();    
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {    
                Object v1 = getValue();    
                Object v2 = e.getValue();    
                if (v1 == v2 || (v1 != null && v1.equals(v2)))    
                    return true;    
            }    
            return false;    
        }    

        // Implementing hashCode()    
        public final int hashCode() {    
            return (key==null   ? 0 : key.hashCode()) ^    
                   (value==null ? 0 : value.hashCode());    
        }    

        public final String toString() {    
            return getKey() + "=" + getValue();    
        }    

        // When put() overwrites the value of a key that is already in the HashMap, it calls recordAccess().
        // There's nothing to do here.
        void recordAccess(HashMap<K,V> m) {    
        }    

        // When an entry is removed from the HashMap, recordRemoval() is called.
        // There's nothing to do here.
        void recordRemoval(HashMap<K,V> m) {    
        }    
    }

HashMap is essentially an array of Entry objects. Each Entry holds a key and a value, plus a next field that is itself an Entry.
The next field is what handles hash collisions: entries that fall into the same slot are chained through it into a linked list.
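To make the "array plus linked lists" idea concrete, here is a deliberately simplified sketch. It is not the JDK source: TinyChainedMap and Node are hypothetical names, the table has a fixed size of 8, keys are Strings, and there is no resizing.

```java
// A simplified sketch, not the JDK source: a fixed-size table of singly linked chains.
class TinyChainedMap {
    // Hypothetical node type mirroring the key/value/next fields of HashMap.Entry.
    static final class Node {
        final String key;
        int value;
        Node next;

        Node(String key, int value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[8];

    private int indexOf(String key) {
        // Math.floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Returns the previous value, or null if the key was absent.
    Integer put(String key, int value) {
        int i = indexOf(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                int old = n.value;
                n.value = value;
                return old;
            }
        }
        // Head insertion: the new node points at the old head of the chain.
        table[i] = new Node(key, value, table[i]);
        return null;
    }

    // Returns the stored value, or null if the key is absent.
    Integer get(String key) {
        for (Node n = table[indexOf(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap map = new TinyChainedMap();
        map.put("a", 1);
        map.put("i", 2); // "a" (hash 97) and "i" (hash 105) both map to slot 1 of 8
        map.put("a", 3); // replaces the value stored for "a"
        System.out.println(map.get("a") + " " + map.get("i") + " " + map.get("z")); // 3 2 null
    }
}
```

The keys "a" and "i" share a slot, so the second one is chained in front of the first, and a lookup walks the chain comparing keys with equals.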

Key attributes in HashMap
transient Entry[] table; // the hash table; each slot is the head of a bucket's linked list
transient int size; // number of key-value pairs stored
int threshold; // critical value: when size exceeds it, the table is expanded (threshold = load factor * capacity)
final float loadFactor; // load factor of the hash table
transient int modCount; // number of structural modifications made to the map
The loadFactor (load factor) indicates how full the hash table is allowed to get.

The larger the load factor, the more elements are stored before the table expands. The advantage is higher space utilization, but the chance of collisions increases: the linked lists grow longer and lookups become slower.

Conversely, the smaller the load factor, the fewer elements are stored before the table expands. The advantage is a lower chance of collisions, but more space is wasted: the data in the table is sparse, and the table starts to expand while much of its space is still unused.

The greater the chance of collisions, the higher the cost of a lookup.

Therefore, a balance has to be found between "chance of collisions" and "space utilization". This is essentially the well-known time-space trade-off in data structures.

If the machine has plenty of memory and you want faster lookups, the load factor can be set a little smaller; conversely, if memory is tight and lookup speed is not critical, the load factor can be set a little larger. Generally, though, we do not need to set it and can keep the default value of 0.75.
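As a quick worked example of threshold = load factor * capacity, the sketch below computes the threshold for a table of capacity 16 (the JDK's default initial capacity) under a few load factors. LoadFactorDemo and its threshold method are illustrative names, not part of HashMap.

```java
// Hypothetical helper that reproduces the threshold formula from the text.
class LoadFactorDemo {
    // threshold = capacity * loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: default load factor
        System.out.println(threshold(16, 0.5f));  // 8: expands earlier, fewer collisions
        System.out.println(threshold(16, 1.0f));  // 16: expands later, denser table
    }
}
```

With the default of 0.75, a 16-slot table is expanded once it holds more than about 12 entries.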
III. Storage
Let's take a look at how HashMap stores data

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Let's analyze this code. HashMap allows a null key, and it does not care whether the value is null. A null key is treated as having a hash of 0, so it always goes to the bucket at index 0 of the hash table. Because a HashMap can hold only one null key, putForNullKey first searches that bucket for an entry whose key is null and, if it finds one, replaces the old value directly instead of inserting a new node into the linked list; only when no such entry exists does it add a new one.

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }
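The null-key behaviour can be observed from the outside with the public java.util.HashMap API. In this small sketch, NullKeyDemo and build are illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing how HashMap treats null keys and null values.
class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: the old value is replaced
        map.put("k", null);      // null values are allowed as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());           // 2
        System.out.println(map.get(null));        // second
        System.out.println(map.containsKey("k")); // true
    }
}
```

The second put with a null key does not grow the map; it only overwrites the value, which matches the replace-in-place loop in putForNullKey.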

If the key is not null, the hash method is called with the integer returned by key.hashCode() as its argument

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

The hash method produces a new integer in which the high bits of the hash code are mixed into the low bits. This reduces the probability that different keys end up with the same table index and so improves storage efficiency.
Then indexFor(hash, table.length) is executed.

    static int indexFor(int h, int length) {
        return h & (length-1);
    }
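To see what hash and indexFor do together, the sketch below copies the two methods shown above into a hypothetical HashSpreadDemo class and feeds them hash codes that differ only in their high bits. The table length of 16 is an assumption chosen for the demonstration.

```java
// Hypothetical demo class; hash and indexFor are copied from the listings above.
class HashSpreadDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000; // these two values differ only in bits above the low four
        int b = 0x20000;

        // Without the supplemental hash, both fall into bucket 0.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length)); // 0 0

        // With it, the high bits influence the index and the keys separate.
        System.out.println(indexFor(hash(a), length) + " " + indexFor(hash(b), length)); // 1 2
    }
}
```

Because indexFor only looks at the low bits, hash codes that differ only in the high bits would all collide without the extra mixing step.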

indexFor returns the position in the hash table that corresponds to the key. If that position already holds data, the new entry is added to the linked list (that is, placed in the hash bucket) by head insertion.
If the bucket already contains an entry whose key satisfies key.equals(), the value of that entry is replaced with the new value instead.
Here's a detail we need to pay attention to.
return h & (length-1);
Generally speaking, we would take the hash modulo length, but here the hash is ANDed with length-1. Why?
Looking at the HashMap source code, we see that length is always a power of 2, which means length-1 is an odd number whose low bits are all 1. ANDing the hash with such a number simply keeps the low bits of the hash, which gives the same result as hash % length for a non-negative hash, and the result can be odd or even, so every index from 0 to length-1 can be reached. ANDing with an even number, on the other hand, always yields an even result, because the last bit must be zero, and that means about half of the space would be wasted. This is why the AND is done with length-1 rather than with length.
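The relationship between the mask and the modulo can be checked with a short sketch. IndexMaskDemo and maskMatchesModulo are hypothetical names; Math.floorMod is used as the reference because, unlike %, it never returns a negative result for a positive length.

```java
// Hypothetical demo class comparing h & (length - 1) with a floor modulo.
class IndexMaskDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True when masking and floor-modulo agree for every sampled hash value.
    static boolean maskMatchesModulo(int length, int... hashes) {
        for (int h : hashes) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, -17,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        System.out.println(maskMatchesModulo(16, samples)); // true: 16 is a power of two
        System.out.println(maskMatchesModulo(12, samples)); // false: 12 is not
    }
}
```

For a length of 12 the mask is 11 (binary 1011), so a hash of 15 maps to index 11 rather than 3, and indexes with the third bit set (4 to 7) are never produced. This is why the trick only works when length is a power of 2.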

get(Object) method
Finding an element works the same way. First, the hash bucket that holds the element is located from the key's hashCode value, and then the matching entry is found in that bucket with key.equals(), and its value is returned.

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

IV. Traversal
We generally use the following methods to traverse a HashMap

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class TestHashMap {

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("1", "kobe");
        map.put("2", "james");
        map.put("3", "paul");

        // 1. Explicit iterator over the entry set
        Iterator<Map.Entry<String, String>> itEntry = map.entrySet().iterator();
        while (itEntry.hasNext()) {
            Map.Entry<String, String> entry = itEntry.next();
            System.out.println("KEY:" + entry.getKey() + " VALUE:"
                    + entry.getValue());
        }

        // 2. for-each over the entry set
        for (Map.Entry<String, String> myEntry : map.entrySet()) {
            System.out.println("KEY:" + myEntry.getKey() + " VALUE:"
                    + myEntry.getValue());
        }

        // 3. Explicit iterator over the key set, with a get() per key
        Iterator<String> itKeySet = map.keySet().iterator();
        while (itKeySet.hasNext()) {
            String key = itKeySet.next();
            System.out.println("KEY:" + key + " VALUE:" + map.get(key));
        }

        // 4. for-each over the key set, with a get() per key
        for (String key : map.keySet()) {
            System.out.println("KEY:" + key + " VALUE:" + map.get(key));
        }
    }
}

As we can see from the above code, a HashMap is traversed with an Iterator. But HashMap implements the Map interface, not the Collection interface (Map and Collection are sibling hierarchies; List and Set belong to Collection), and it does not define an iterator() method itself. So why can iterator() be used? Looking at the HashMap source code, we find:

    public Set<K> keySet() {
        Set<K> ks = keySet;
        return (ks != null ? ks : (keySet = new KeySet()));
    }
    private final class KeySet extends AbstractSet<K> {
        public Iterator<K> iterator() {
            return newKeyIterator();
        }
        public int size() {
            return size;
        }
        public boolean contains(Object o) {
            return containsKey(o);
        }
        public boolean remove(Object o) {
            return HashMap.this.removeEntryForKey(o) != null;
        }
        public void clear() {
            HashMap.this.clear();
        }
    }
    public Set<Map.Entry<K,V>> entrySet() {
        return entrySet0();
    }
    private Set<Map.Entry<K,V>> entrySet0() {
        Set<Map.Entry<K,V>> es = entrySet;
        return es != null ? es : (entrySet = new EntrySet());
    }

    private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
        public Iterator<Map.Entry<K,V>> iterator() {
            return newEntryIterator();
        }
        public boolean contains(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<K,V> e = (Map.Entry<K,V>) o;
            Entry<K,V> candidate = getEntry(e.getKey());
            return candidate != null && candidate.equals(e);
        }
        public boolean remove(Object o) {
            return removeMapping(o) != null;
        }
        public int size() {
            return size;
        }
        public void clear() {
            HashMap.this.clear();
        }
    }

It turns out that HashMap has two inner classes that represent Sets, both of which extend the AbstractSet class.
One is KeySet, the set of keys, and the other is EntrySet, the set of entries.
Both Sets implement the iterator() method, so they can be traversed with an Iterator.
This also illustrates one advantage of inner classes: they get around the limitation of single inheritance to a certain extent. I will write a separate blog post to discuss inner classes.
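Because KeySet is an inner class that delegates to the enclosing HashMap (its remove calls HashMap.this.removeEntryForKey in the listing above), the set returned by keySet() is a live view of the map rather than a copy. A small sketch of that behaviour, using the hypothetical names KeySetViewDemo and removeThroughKeySet:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing that keySet() is a view backed by the map.
class KeySetViewDemo {
    // Removes a key through the key-set view; returns true if the key was present.
    static boolean removeThroughKeySet(Map<String, String> map, String key) {
        return map.keySet().remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1", "kobe");
        map.put("2", "james");
        map.put("3", "paul");

        System.out.println(removeThroughKeySet(map, "1")); // true
        System.out.println(map.containsKey("1"));          // false: the map itself changed
        System.out.println(map.size());                    // 2
    }
}
```

Removing a key from the view removes the whole entry from the map, which is what you would expect from the delegation visible in the KeySet source.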

V. Why the hashCode method must be overridden
After this walk through HashMap, I believe you have an idea of why hashCode should be overridden. When looking up an element in a HashMap, we first find the hash bucket through hashCode and then find the value inside that bucket through equals. If a key class overrides equals but not hashCode, the hashCode inherited from the Object class is used, and two keys that are equal according to equals will generally get different hash codes, so the lookup will not be able to find the right hash bucket.
When inserting elements, the same problem means that a key equal to an existing one usually lands in a different bucket, so it is stored as a second entry instead of replacing the old value.
So we must override hashCode!
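The difference is easy to reproduce. In the sketch below, BadKey and GoodKey are hypothetical key classes: both override equals, but only GoodKey overrides hashCode.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class contrasting a key with and without a hashCode override.
class HashCodeDemo {
    // Overrides equals only; hashCode is inherited from Object.
    static final class BadKey {
        final String id;

        BadKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id.equals(id);
        }
    }

    // Overrides both equals and hashCode, so equal keys share a hash code.
    static final class GoodKey {
        final String id;

        GoodKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> bad = new HashMap<>();
        bad.put(new BadKey("a"), "x");
        // Almost always prints null: the two BadKey objects have unrelated
        // identity hash codes, so the lookup does not find the stored entry.
        System.out.println(bad.get(new BadKey("a")));

        Map<GoodKey, String> good = new HashMap<>();
        good.put(new GoodKey("a"), "x");
        System.out.println(good.get(new GoodKey("a"))); // x
    }
}
```

With GoodKey, an equal key always produces the same hash code, so get looks in the right bucket and equals can then confirm the match.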

Posted by doofystyle on Mon, 15 Apr 2019 18:54:33 -0700