1. HashMap structure
HashMap is a key-value mapping built from an array and linked lists. The array is the main body of the HashMap; the linked lists exist mainly to resolve hash conflicts. If the located array slot holds no linked list (the next reference of the entry there is null), lookups, insertions and similar operations are fast, because a single addressing step is enough. If the slot does hold a linked list, an insertion is O(n) in the length of that list: the list is traversed first, an existing key has its value overwritten, and otherwise a new entry is added. A lookup likewise has to traverse the list and compare keys one by one with the key object's equals method. The following figure gives an intuitive picture of HashMap.
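The array-plus-linked-list layout described above can be sketched in a few dozen lines. This is a deliberately simplified illustration, not the JDK implementation: the class name TinyChainedMap, its fixed capacity of 16 and the head insertion in put are assumptions made to keep the example short.

```java
import java.util.Objects;

/** Simplified chained hash table; a sketch, not the JDK implementation. */
final class TinyChainedMap<K, V> {
    /** One entry in a bucket's singly linked list. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed capacity (a power of two) keeps the sketch short: no resizing.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(Object key) {
        return Objects.hashCode(key) & (table.length - 1);
    }

    /** Returns the previous value for the key, or null if there was none. */
    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // key already present: overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new key: prepend for brevity
        return null;
    }

    /** Walks the bucket's list and compares keys with equals(). */
    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> map = new TinyChainedMap<>();
        // "Aa" and "BB" have the same hashCode, so they share one bucket.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // prints 1
        System.out.println(map.get("BB")); // prints 2
    }
}
```

Both keys end up in the same list, and get distinguishes them only through equals, which is the O(n) traversal the paragraph describes.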
2. put and get operations
- put operation (JDK version 1.8)
1) Calculate the hash value of the key (the calculation is described in detail below).
2) If the array table is null or its length is 0, allocate it (this is what the resize method does in that case).
3) Use the hash value to calculate where the value is to be stored, that is, the index i into the array table (the calculation is described in detail below).
4) If table[i] is null, insert the new node there directly.
5) If table[i] is not null, check whether the node there has the same key; if so, overwrite its value.
6) If the key differs, check whether table[i] is a TreeNode; if so, the bucket is a red-black tree and the key-value pair is inserted into the tree.
7) If it is not a TreeNode, traverse the list. If a node with the same key is found, overwrite its value. Otherwise append the new node at the tail; if the list already held 8 (TREEIFY_THRESHOLD) or more nodes before the append, treeifyBin is then called, which converts the list to a red-black tree (or, while the table is still shorter than 64 slots, expands the table instead).
8) If the list held fewer than 8 nodes, the new node simply stays in the list.
9) Finally, if a new node was added, check whether the array needs to be expanded (size > threshold).
Detailed code:
public V put(K key, V value) {
    // Delegate to putVal; onlyIfAbsent is false, so an existing value is overwritten.
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; // local reference to the underlying array
    Node<K,V> p;     // node currently at bucket i
    int n;           // length of the underlying array
    int i;           // index of the target bucket

    // The table is created lazily: if it is null or empty, resize() allocates it.
    if ((tab = table) == null || (n = tab.length) == 0) {
        n = (tab = resize()).length;
    }
    // (n - 1) & hash is the bucket index; i is assigned whether or not a collision occurs.
    if ((p = tab[i = (n - 1) & hash]) == null) {
        // No collision: the bucket is empty, so the new node (next == null) goes in directly.
        tab[i] = newNode(hash, key, value, null);
    } else {
        // Collision: the bucket already holds at least one node.
        Node<K,V> e; // the existing node for this key, or null if the key is new
        K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
            // The head of the bucket has the same key; its value is updated after this if/else chain.
            e = p;
        } else if (p instanceof TreeNode) {
            // JDK 1.8: the bucket is a red-black tree.
            // putTreeVal returns the existing node, or null if it inserted a new one.
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        } else {
            // The bucket is still a linked list, as in JDK 1.7.
            // JDK 1.7 inserted at the head; JDK 1.8 appends at the tail.
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Reached the tail without finding the key. e stays null,
                    // which is why an insert returns null rather than an old value.
                    p.next = newNode(hash, key, value, null);
                    // binCount == 7 means p was the 8th node, so the bin held
                    // 8 nodes before this append.
                    if (binCount >= TREEIFY_THRESHOLD - 1) { // -1 for 1st; TREEIFY_THRESHOLD == 8
                        treeifyBin(tab, hash);
                    }
                    break;
                }
                // Found the same key while walking the list: stop with e pointing at that node.
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                    break;
                }
                p = e; // advance; the next iteration reads p.next again
            }
        }
        // e is non-null only when the key already existed (the same holds for putTreeVal).
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // put() always passes onlyIfAbsent == false, so the value is always replaced here.
            if (!onlyIfAbsent || oldValue == null) {
                e.value = value;
            }
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold) {
        resize();
    }
    afterNodeInsertion(evict);
    return null;
}
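The return-value behaviour of putVal can be observed from the outside with the public API. The short demo below is only an illustration (the class name PutReturnDemo is made up); it relies on put passing onlyIfAbsent = false and putIfAbsent passing onlyIfAbsent = true.

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // New key: e stays null inside putVal, so null is returned.
        System.out.println(map.put("k", 1));         // prints null

        // Existing key: the old value is returned and replaced.
        System.out.println(map.put("k", 2));         // prints 1

        // onlyIfAbsent == true: the old value is returned but kept.
        System.out.println(map.putIfAbsent("k", 3)); // prints 2
        System.out.println(map.get("k"));            // prints 2
    }
}
```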
3. Addressing process (how to convert key values into array subscripts)
A diagram borrowed from the internet shows the conversion process more intuitively:
Step 1: key -> 32-bit hash value
The key's hashCode() method is called to produce a 32-bit hash value.
Step 2: 32-bit hash value -> mixed hash value
In this step the high 16 bits of the 32-bit hash value are XORed into the low 16 bits. The reason is that the index calculation that follows (indexFor() in JDK 1.7) keeps only the low-order bits, so the information in the high 16 bits would otherwise be lost. After the XOR, the low 16 bits also carry information from the high 16 bits. The high-order information is preserved in mixed form, which increases randomness and reduces the chance of conflicts.
Steps 1 and 2 are merged into a single return statement in the source of the hash() function, which put calls, as follows:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
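A small experiment makes the effect of the mixing step visible. In this sketch the helper spread and the two sample hash values are chosen purely for illustration; the values differ only in their high 16 bits, so without mixing they fall into the same bucket of a 16-slot table.

```java
final class HashSpreadDemo {
    /** The same mixing step as in hash(): fold the high 16 bits into the low 16. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;        // table length
        int h1 = 0x10000;  // only bit 16 set
        int h2 = 0x20000;  // only bit 17 set

        // Without mixing, the low 4 bits are identical: both map to bucket 0.
        System.out.println((h1 & (n - 1)) + " " + (h2 & (n - 1)));                 // prints 0 0

        // With mixing, the high bits influence the index: buckets 1 and 2.
        System.out.println((spread(h1) & (n - 1)) + " " + (spread(h2) & (n - 1))); // prints 1 2
    }
}
```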
Step 3: mixed hash value -> array subscript
The default initial length of the bucket array is 16. The 32-bit mixed hash value obviously cannot be used as an index into such an array directly; it has to be reduced to the range 0 to length - 1. Conceptually this is a modulo operation: index = hash % length. Some readers may object that the indexFor() method in the JDK 1.7 source performs a bitwise AND, hash & (length - 1), rather than a modulo (JDK 1.8 inlines the same expression as (n - 1) & hash in putVal). In fact the AND is the modulo in disguise. A bitwise AND is a single cheap instruction, whereas an integer remainder requires a division, so the AND is faster; it also never yields a negative index, which % does for a negative hash. Note that the substitution is valid only in a specific case: when b = 2^n (and a is non-negative), a % b = a & (b - 1). This is why HashMap keeps its capacity a power of two: the default is 16, and each expansion doubles it.
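The equivalence between the mask and the modulo, and its limits, can be checked directly. In the sketch below, indexFor mirrors the JDK 1.7 helper of the same name, and Math.floorMod is used as the reference because it returns a non-negative remainder even for negative hashes; the sample numbers are arbitrary.

```java
final class IndexMaskDemo {
    /** Bucket index via bit mask; only meaningful when length is a power of two. */
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int hash = -123456789; // hash codes may be negative

        // For power-of-two lengths the mask agrees with the floor modulo.
        for (int n = 16; n <= 64; n <<= 1) {
            System.out.println(indexFor(hash, n) == Math.floorMod(hash, n)); // prints true
        }

        // The plain % operator would give a negative, unusable index.
        System.out.println(hash % 16);          // prints -5
        System.out.println(indexFor(hash, 16)); // prints 11

        // With a length that is not a power of two, the mask is no longer a modulo.
        System.out.println(indexFor(37, 10) + " vs " + (37 % 10)); // prints 1 vs 7
    }
}
```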
4. Thread safety of HashMap
In JDK 1.7, a concurrent expansion (resize) can turn a bucket's linked list into a ring. JDK 1.8 appends at the tail and keeps the order of the list during expansion, which removes this particular failure, but HashMap is still not thread-safe.
Reason:
When a JDK 1.7 HashMap is expanded, each element is moved into the new table by inserting it at the head of its new list (this avoids traversing to the tail), so the order of the elements in a list is reversed. The ring can form at this moment. Adding an element that pushes the size past the threshold triggers an expansion, so if two threads add elements at the same time, both may expand the same table at once. Suppose a bucket holds A -> B -> null and both elements map to the same bucket in the new table. Thread 1 starts to expand, records e = A and next = B, and is then suspended. Thread 2 runs its whole expansion and produces a new table; because of head insertion, A -> B -> null has become B -> A -> null, that is, B.next = A. Thread 1 now resumes with its stale variables. It first copies A into its new table, then copies B to the head of the list (B.next = A). Originally B.next was null, and the copy would end here, exactly as it did for thread 2. But thread 2 has set B.next = A, so thread 1 goes on to copy A a second time and sets A.next = B. The ring is complete: B.next = A and A.next = B. A later lookup in this bucket for a key that is not present never reaches null and loops forever.
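The interleaving can be replayed deterministically in a single thread, which makes the ring easy to verify. The sketch below is a simulation under stated assumptions, not JDK code: Node, transfer and simulate are hypothetical names, only one bucket is modelled, and both nodes are assumed to land in the same bucket of the new table.

```java
final class ResizeCycleDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    /** Head-insertion transfer of one bucket (JDK 1.7 style); returns the new head. */
    static Node transfer(Node e) {
        Node head = null;
        while (e != null) {
            Node next = e.next;
            e.next = head; // insert e at the head of the new list
            head = e;
            e = next;
        }
        return head;
    }

    /** Floyd's cycle detection. */
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    /** Replays the two-thread interleaving step by step; returns true if a ring formed. */
    static boolean simulate() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.next = b; // bucket: A -> B -> null

        // "Thread 1" reads its loop variables and is suspended.
        Node e = a;
        Node next = e.next; // B

        // "Thread 2" completes its transfer: the shared nodes now form B -> A -> null.
        transfer(a);

        // "Thread 1" resumes with the stale e and next.
        Node head = null;
        while (e != null) {
            e.next = head;
            head = e;
            e = next;
            if (e != null) {
                next = e.next; // B.next is now A because of thread 2
            }
        }
        return hasCycle(head);
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints true
    }
}
```

When several threads really do need to share a map, java.util.concurrent.ConcurrentHashMap or external synchronization avoids the problem.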
5. HashSet
Introduction:
HashSet is a collection without duplicate elements.
It is backed by a HashMap, makes no guarantee about the order of its elements, and permits the null element.
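These properties are easy to confirm with java.util.HashSet itself. The class name HashSetBasicsDemo is only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class HashSetBasicsDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();

        System.out.println(set.add("x"));       // prints true: "x" was not present
        System.out.println(set.add("x"));       // prints false: duplicates are ignored
        System.out.println(set.add(null));      // prints true: null is a permitted element
        System.out.println(set.size());         // prints 2
        System.out.println(set.contains(null)); // prints true
    }
}
```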
HashSet is not synchronized. If multiple threads access a hash set at the same time, and at least one of them modifies the set, it must be synchronized externally. This is usually done by synchronizing on an object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" with the Collections.synchronizedSet method. It is best to do this at creation time, to prevent accidental unsynchronized access to the set: Set s = Collections.synchronizedSet(new HashSet(...));
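A minimal sketch of the wrapper in use follows. The class SynchronizedSetDemo, the helper fill and the thread and element counts are assumptions chosen for the example. Note that iterating over the wrapped set still requires synchronizing on it manually, as the Collections.synchronizedSet documentation states.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class SynchronizedSetDemo {
    /** Adds threads * perThread distinct integers concurrently; returns the element count. */
    static int fill(int threads, int perThread) {
        Set<Integer> set = Collections.synchronizedSet(new HashSet<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread adds its own range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i); // individual calls are synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ex);
        }
        // Iteration is not atomic, so hold the set's lock while traversing it.
        synchronized (set) {
            int count = 0;
            for (Integer ignored : set) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 10_000)); // prints 40000
    }
}
```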
Pseudo-code implementation:
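import java.util.HashMap;

public class MyHashSet<E> {
    // Dummy value stored for every key; only the keys matter.
    private static final Object PRESENT = new Object();
    private final HashMap<E,Object> map;

    public MyHashSet() {
        map = new HashMap<E,Object>();
    }

    public int size() {
        return map.size();
    }

    // Adding an element that is already present just overwrites PRESENT.
    public void add(E e) {
        map.put(e, PRESENT);
    }

    public void remove(E e) {
        map.remove(e);
    }
}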