HashMap in the JDK Source Code

HashMap source code, based on jdk1.6.43

 

Internally, HashMap is an array of Entry objects whose initial length is fixed at construction time. Collisions are resolved by separate chaining (the "zipper method"): each key's hash selects an array slot, and all entries that land in the same slot form a singly linked list whose head is stored in that slot. As the number of stored entries grows, the array is later replaced by a larger one.

 

The default initial capacity is 16. The source comment states that the capacity must be a power of two.

static final int DEFAULT_INITIAL_CAPACITY = 16;

 

Maximum capacity: 1 << 30 = 1073741824

static final int MAXIMUM_CAPACITY = 1 << 30;

 

Default load factor. It controls how full the table may get before it is resized, trading memory use against lookup cost.

static final float DEFAULT_LOAD_FACTOR = 0.75f;

 

The array that actually stores the data. Entry is a nested class of HashMap.

transient Entry[] table;

 

The number of key-value mappings currently stored in the map (not the length of the array)

transient int size;

 

Resize threshold: once size reaches this value (capacity * load factor), the table is expanded.

int threshold;

 

The actual load factor. It is set at construction time and never changes afterwards (the field is final).

final float loadFactor;

 

HashMap also defines an init method, which is called at the end of the constructor and has an empty implementation in HashMap itself. It is a hook: a subclass can override it to perform custom initialization after construction.

void init() {}

 

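The internal hash function. It mixes the higher bits of the key's hashCode into the lower bits, because indexFor below uses only the low bits of the result.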

static int hash(int h) {
	h ^= (h >>> 20) ^ (h >>> 12);
	return h ^ (h >>> 7) ^ (h >>> 4);
}
 

Given the hash value computed above and the length of the array, indexFor returns the array index (the bucket) where the key is stored.

static int indexFor(int h, int length) {
	return h & (length - 1);
}
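
To see why the extra hash step matters, here is a small sketch that copies the two functions above into a hypothetical class named HashSpreadDemo (the class name and the sample values are assumptions for illustration). Two hash codes that differ only in their high bits land in the same bucket of a 16-slot table when used directly, while the mixed values land in different buckets.

```java
final class HashSpreadDemo {
    // Copied from the JDK 1.6 excerpt above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Copied from the JDK 1.6 excerpt above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // sample hash codes that differ only above bit 15
        int b = 0x20000;
        // Raw hash codes: both keep only their low 4 bits, which are zero.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));             // prints "0 0"
        // Mixed hash codes: the high bits now influence the index.
        System.out.println(indexFor(hash(a), 16) + " " + indexFor(hash(b), 16)); // prints "1 2"
    }
}
```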

To put an element into a HashMap, put first checks whether the key is null, which is handled separately. If the key is not null, it passes the key's hashCode to the hash function above to obtain the hash value HashMap actually uses, and then calls indexFor to find the array index. It then walks the existing linked list at that index to check whether the same key is already present. If it is, the old value is overwritten and returned. If not, a new entry is added to the list and null is returned.

Note: put computes the location from the key alone; the value passed in plays no part in it.

	public V put(K key, V value) {
		if (key == null)
			return putForNullKey(value);
		int hash = hash(key.hashCode());
		int i = indexFor(hash, table.length);
		for (Entry<K, V> e = table[i]; e != null; e = e.next) {
			Object k;
			if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
				V oldValue = e.value;
				e.value = value;
				e.recordAccess(this);
				return oldValue;
			}
		}
		modCount++;
		addEntry(hash, key, value, i);
		return null;
	}
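
The return value described above can be observed through the public java.util.HashMap API. The following is a minimal usage sketch; the class name PutReturnDemo and the sample key are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    // Returns { result of first put, result of second put, value stored afterwards }.
    static Integer[] putTwice() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        Integer first = map.put("a", 1);  // key absent: a new entry is added, null is returned
        Integer second = map.put("a", 2); // key present: value overwritten, old value returned
        return new Integer[] { first, second, map.get("a") };
    }

    public static void main(String[] args) {
        Integer[] r = putTwice();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "null 1 2"
    }
}
```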

Null keys get special treatment in a private method. The difference is that an entry with a null key is always placed in the linked list at array index 0; the remaining steps are the same as in put.

	private V putForNullKey(V value) {
		for (Entry<K, V> e = table[0]; e != null; e = e.next) {
			if (e.key == null) {
				V oldValue = e.value;
				e.value = value;
				e.recordAccess(this);
				return oldValue;
			}
		}
		modCount++;
		addEntry(0, null, value, 0);
		return null;
	}
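
From the caller's side, a null key behaves like any other key. A short usage sketch against java.util.HashMap (the class name NullKeyDemo and the values are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeyDemo {
    // Stores two values under the null key and reports what the map holds afterwards.
    static String describe() {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "x");
        String old = map.put(null, "y"); // same (null) key: overwrites and returns "x"
        return old + " " + map.get(null) + " " + map.size();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "x y 1"
    }
}
```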
 

Both put methods above end by calling addEntry. It creates an Entry object and stores it at the computed array index; the previous head of that bucket becomes the new entry's next, so new entries are inserted at the head of the list. If size had already reached the threshold, the table is then doubled.

	void addEntry(int hash, K key, V value, int bucketIndex) {
		Entry<K, V> e = table[bucketIndex];
		table[bucketIndex] = new Entry<K, V>(hash, key, value, e);
		if (size++ >= threshold)
			resize(2 * table.length);
	}
As the constructor of the Entry inner class below shows, each bucket is a singly linked list: every entry uses next to reference the following Entry object.

	Entry(int h, K k, V v, Entry<K, V> n) {
		value = v;
		next = n;
		key = k;
		hash = h;
	}
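
The head-insertion behaviour of addEntry can be sketched for a single bucket as follows. Node is a hypothetical stand-in for Entry, reduced to a key and a next link; the helper names are assumptions for illustration and are not part of the JDK.

```java
final class HeadInsertDemo {
    // Hypothetical, simplified stand-in for HashMap.Entry.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Mirrors addEntry for one bucket: the old head becomes the new node's next.
    static Node addToHead(Node head, String key) {
        return new Node(key, head);
    }

    // Joins the keys of a chain from head to tail.
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node bucket = null;
        bucket = addToHead(bucket, "a");
        bucket = addToHead(bucket, "b");
        bucket = addToHead(bucket, "c");
        System.out.println(keys(bucket)); // prints "c,b,a": the newest entry comes first
    }
}
```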
One important step in addEntry above is the call to resize, which expands the array that stores the data. addEntry decides whether to expand by checking whether the number of entries has reached the threshold. Because a Java array cannot change its length after creation, expansion has to allocate a new array and replace the old one, and resize implements exactly that. addEntry requests twice the current length; resize allocates the new array, moves the data into it with the transfer method, and recomputes the threshold. If the table is already at MAXIMUM_CAPACITY, resize only raises the threshold to Integer.MAX_VALUE and returns.

	void resize(int newCapacity) {
		Entry[] oldTable = table;
		int oldCapacity = oldTable.length;
		if (oldCapacity == MAXIMUM_CAPACITY) {
			threshold = Integer.MAX_VALUE;
			return;
		}
		Entry[] newTable = new Entry[newCapacity];
		transfer(newTable);
		table = newTable;
		threshold = (int) (newCapacity * loadFactor);
	}

The transfer method walks over every slot of the old array, walks the linked list in each slot, recomputes each entry's index for the new array length, and links the entry into the new array, which completes the data migration.
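
Since the excerpt does not list transfer itself, the following is a minimal sketch of the procedure just described, not the JDK code. Node is a hypothetical stand-in for Entry that keeps only the cached hash and the next link, and the sketch returns the new array instead of filling one passed in.

```java
final class TransferSketch {
    // Hypothetical, simplified stand-in for HashMap.Entry.
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Moves every node of oldTable into a new array of length newCapacity.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (int j = 0; j < oldTable.length; j++) {
            Node e = oldTable[j];
            oldTable[j] = null;                        // release the old slot
            while (e != null) {
                Node next = e.next;                    // remember the rest of the old chain
                int i = indexFor(e.hash, newCapacity); // recompute the index for the new length
                e.next = newTable[i];                  // link in at the head of the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }
}
```

In this sketch, nodes that end up in the same new bucket appear in reverse order relative to the old chain, because each one is linked in at the head.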

 

Compared with put, get is simpler because it never triggers a resize. It first checks whether the key is null; if so, a special path reads the value from the list at array index 0. Otherwise it computes the hash, walks the list at the corresponding index, and returns the value of the entry with the same hash and an equal key. If no such entry is found, it returns null.

	public V get(Object key) {
		if (key == null)
			return getForNullKey();
		int hash = hash(key.hashCode());
		for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
			Object k;
			if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
				return e.value;
		}
		return null;
	}
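
One consequence of get returning null when nothing is found is that a missing key cannot be told apart from a key mapped to null by get alone; containsKey makes the distinction. A short usage sketch against java.util.HashMap (the class name and keys are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

final class GetNullDemo {
    // Builds a map in which "k" is explicitly mapped to null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<String, String>();
        map.put("k", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("k"));               // prints "null" (key present, value null)
        System.out.println(map.get("missing"));         // prints "null" (key absent)
        System.out.println(map.containsKey("k"));       // prints "true"
        System.out.println(map.containsKey("missing")); // prints "false"
    }
}
```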
The remove operation is an ordinary deletion from a singly linked list: the next reference of the preceding entry (or the array slot itself, if the removed entry is the head) is redirected past the removed entry.
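
The excerpt does not show the removal code, so here is a minimal sketch of deleting from one bucket's chain under simplifying assumptions: Node is a hypothetical stand-in for Entry, keys are non-null strings, and the method returns the new head rather than writing to the table.

```java
final class ChainRemoveSketch {
    // Hypothetical, simplified stand-in for HashMap.Entry.
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Removes the first node whose key equals the given key; returns the new head.
    static Node remove(Node head, String key) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key.equals(key)) {
                if (prev == null) {
                    return e.next;  // removing the head: the bucket now starts at its successor
                }
                prev.next = e.next; // unlink by pointing the predecessor past e
                return head;
            }
        }
        return head;                // key not found: chain unchanged
    }
}
```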

 

The following is part of the internal Entry code. Each Entry object stores a key-value pair and caches the hash value of the key so that it does not have to be recomputed.

	static class Entry<K, V> implements Map.Entry<K, V> {
		final K key;
		V value;
		Entry<K, V> next;
		final int hash;

		Entry(int h, K k, V v, Entry<K, V> n) {
			value = v;
			next = n;
			key = k;
			hash = h;
		}
	}
Throughout this code, both at construction time and on every resize, the length of the underlying array is always a power of two.

As shown below, when an initialCapacity is passed in, the actual capacity is the smallest power of two that is greater than or equal to initialCapacity.

	public HashMap(int initialCapacity, float loadFactor) {
		// Find a power of 2 >= initialCapacity
		int capacity = 1;
		while (capacity < initialCapacity)
			capacity <<= 1;

		this.loadFactor = loadFactor;
		threshold = (int) (capacity * loadFactor);
		table = new Entry[capacity];
		init();
	}
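
The rounding loop and the threshold calculation can be tried in isolation. This is a sketch with hypothetical helper names (roundUpToPowerOfTwo, threshold); the clamp to MAXIMUM_CAPACITY is added here as an assumption so that the loop cannot overflow for very large arguments.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= initialCapacity, using the same loop as the constructor.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY; // guard against overflow of the shift below
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Same formula as in the constructor and in resize.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // prints "16"
        System.out.println(roundUpToPowerOfTwo(16)); // prints "16"
        System.out.println(roundUpToPowerOfTwo(17)); // prints "32"
        System.out.println(threshold(16, 0.75f));    // prints "12"
    }
}
```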
Because the length is a power of two, length - 1, the second operand of the & in indexFor, is always a binary number consisting entirely of 1 bits. Without this design the mask would contain 0 bits, so some array indexes could never be produced and those slots would be wasted. An all-ones mask lets every slot of the array be used.
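
A quick way to check this claim is to collect the indexes that h & (length - 1) can produce for a power-of-two length and for another length. The class and method names below are assumptions made for this sketch.

```java
import java.util.TreeSet;

final class MaskDemo {
    // Collects every index produced by h & (length - 1) for h in [0, 1024).
    static TreeSet<Integer> reachableIndexes(int length) {
        TreeSet<Integer> seen = new TreeSet<Integer>();
        for (int h = 0; h < 1024; h++) {
            seen.add(h & (length - 1));
        }
        return seen;
    }

    public static void main(String[] args) {
        // length 16: mask is 1111 in binary, so all 16 slots are reachable.
        System.out.println(reachableIndexes(16).size()); // prints "16"
        // length 10: mask is 1001 in binary, so only 4 of the 10 slots are reachable.
        System.out.println(reachableIndexes(10));        // prints "[0, 1, 8, 9]"
    }
}
```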

 

One of the inner classes is HashIterator, the iterator for HashMap. One of its fields is expectedModCount, which is set to modCount when the iterator is created. As the code above shows, HashMap increments modCount on structural modifications such as adding a new entry; the iterator compares the two counters to detect changes.

	private abstract class HashIterator<E> implements Iterator<E> {
		Entry<K, V> next; // next entry to return
		int expectedModCount; // For fast-fail
		int index; // current slot
		Entry<K, V> current; // current entry

		HashIterator() {
			expectedModCount = modCount;
			if (size > 0) { // advance to first entry
				Entry[] t = table;
				while (index < t.length && (next = t[index++]) == null)
					;
			}
		}
	}

As the following method shows, each call that advances the iterator first checks whether modCount and expectedModCount are equal and, if not, throws a ConcurrentModificationException. This is HashMap's fail-fast mechanism: as soon as it detects that the map was modified during iteration, whether by another thread or by the same thread outside the iterator, it throws rather than continue with an inconsistent view. But we can't rely on this exception for correctness, since it is only a best-effort check. We should still synchronize access to a HashMap ourselves or directly use the thread-safe classes in java.util.concurrent (juc), such as ConcurrentHashMap.

		final Entry<K, V> nextEntry() {
			if (modCount != expectedModCount)
				throw new ConcurrentModificationException();
			Entry<K, V> e = next;
			if (e == null)
				throw new NoSuchElementException();

			if ((next = e.next) == null) {
				Entry[] t = table;
				while (index < t.length && (next = t[index++]) == null)
					;
			}
			current = e;
			return e;
		}
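
The fail-fast check does not need a second thread to trigger. The sketch below uses java.util.HashMap from a single thread; the class name, method names and sample keys are assumptions for illustration. Adding an entry while iterating raises the exception, while removing through the iterator itself does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Returns true if adding an entry during iteration was detected.
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0); // structural modification behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removes "b" through the iterator, which keeps its expected count in sync.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext();) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIteration()); // prints "true"
        System.out.println(removeViaIterator());     // prints "2"
    }
}
```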

Posted by gillms1 on Fri, 12 Jul 2019 15:44:47 -0700