Collection classification
- Collection interface: single-column data; defines the methods for accessing a group of objects
  - List: an ordered, repeatable collection of elements
  - Set: an unordered, non-repeatable collection of elements
- Map interface: double-column data; stores collections with a "key-value pair" mapping relationship
List interface
Arrays have their limitations, so the List interface is usually used in their place.
The elements in a List are ordered and repeatable, and each element has a corresponding index.
The implementation classes of the List interface are commonly used: ArrayList, LinkedList and Vector.
|- ArrayList: the main implementation class of the List interface; stores its elements in `transient Object[] elementData`; recommended for frequent adds and lookups
|- LinkedList: linked storage, implemented as a doubly linked list; frequent insertion and deletion are more efficient than in ArrayList
Static inner class:

```java
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
```
Delete: link the node before the element being deleted directly to the node after it, and link that following node back to the preceding one. Insertion is similar.
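The rewiring described above can be sketched with a minimal stand-alone doubly linked list (a hypothetical `MiniList` class for illustration, not the JDK source):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal doubly linked list sketching how LinkedList unlinks a node:
// the previous node is wired to the next node, and vice versa.
class MiniList<E> {
    static class Node<E> {
        E item;
        Node<E> prev, next;
        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev; this.item = item; this.next = next;
        }
    }

    Node<E> first, last;

    void addLast(E e) {
        Node<E> l = last;
        Node<E> n = new Node<>(l, e, null);
        last = n;
        if (l == null) first = n; else l.next = n;
    }

    // Remove the first node whose item equals e by relinking its neighbors
    void remove(E e) {
        for (Node<E> x = first; x != null; x = x.next) {
            if (x.item.equals(e)) {
                if (x.prev == null) first = x.next; else x.prev.next = x.next;
                if (x.next == null) last = x.prev; else x.next.prev = x.prev;
                return;
            }
        }
    }

    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> x = first; x != null; x = x.next) out.add(x.item);
        return out;
    }
}
```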
|- Vector: predates the List interface (JDK 1.0); inefficient because its methods are synchronized.
ArrayList source code analysis
JDK7:
ArrayList arrayList = new ArrayList(); creates an array with a length of 10.
arrayList.add() expansion: when the capacity is insufficient, the array grows to 1.5 times its original capacity by default, and the original elements are copied into the new array.
The no-argument constructor is not recommended when the eventual size is predictable; pass an initial capacity instead.
JDK8:
ArrayList arrayList=new ArrayList();
```java
// Shared empty array used by the no-argument constructor
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

// The no-argument constructor no longer allocates the array of 10 up front;
// the source comment ("an empty list with an initial capacity of ten")
// was simply never updated
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
```
Only when add() is first called is an array with a length of 10 created and the data added. Like lazy initialization, delaying creation saves memory.
add method
```java
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!! First checks whether the capacity is enough
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    // If the array is still empty, the capacity becomes max(10, minCapacity = size + 1)
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    // Otherwise, fall through to the method below
    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;  // used for fail-fast concurrent-modification detection, not thread safety

    // overflow-conscious code
    // If the required length is greater than the current capacity, grow
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;                // original capacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);  // 1.5x new capacity
    // If the new capacity is still less than minCapacity, use minCapacity instead
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    // If the new capacity overflows that bound, call hugeCapacity(minCapacity)
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    // finally, copy the original elements into the new array
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow: throw if out of bounds
        throw new OutOfMemoryError();
    // Otherwise the final capacity is chosen by a ternary expression
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}
```
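The 1.5x rule can be checked by hand; this sketch only reproduces the arithmetic in `grow()` and does not touch ArrayList internals:

```java
class GrowDemo {
    // Reproduces newCapacity = oldCapacity + (oldCapacity >> 1) from grow()
    static int grow(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        int capacity = 10;  // default initial capacity
        StringBuilder seq = new StringBuilder("10");
        for (int i = 0; i < 4; i++) {
            capacity = grow(capacity);  // 1.5x, rounded down
            seq.append(" -> ").append(capacity);
        }
        System.out.println(seq);  // 10 -> 15 -> 22 -> 33 -> 49
    }
}
```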
LinkedList underlying source code analysis
```java
// No-argument constructor
public LinkedList() {
}

// Collection constructor: first calls the no-argument constructor, then addAll(c)
public LinkedList(Collection<? extends E> c) {
    this();
    addAll(c);
}

/**
 * Adds every element of c by calling add()
 */
public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    for (E e : c)
        if (add(e))
            modified = true;
    return modified;
}
```
add method
```java
public boolean add(E e) {
    linkLast(e);  // delegate to linkLast to append the element
    return true;
}

/**
 * Links e as the tail node.
 */
void linkLast(E e) {
    final Node<E> l = last;  // the current tail (transient Node<E> last)
    final Node<E> newNode = new Node<>(l, e, null);  // create a node for the current data
    last = newNode;  // reassign last
    if (l == null)
        first = newNode;  // first element added: it is also the head node
    else
        l.next = newNode;  // otherwise hang the new node off the old tail
    size++;
    modCount++;
}
```
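Because `linkLast` appends at the tail in O(1) and the list is doubly linked, `LinkedList` is also convenient for head/tail operations; a short usage sketch:

```java
import java.util.LinkedList;

class DequeDemo {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();
        list.add("b");      // add() delegates to linkLast -> appends at the tail
        list.addLast("c");  // explicit tail insert
        list.addFirst("a"); // head insert, also O(1) on a doubly linked list
        System.out.println(list);            // [a, b, c]
        System.out.println(list.getFirst()); // a
        System.out.println(list.getLast());  // c
    }
}
```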
Set interface
|---------- HashSet: thread-unsafe; null values can be stored
- LinkedHashSet: subclass of HashSet; backed by a map (LinkedHashMap) at the bottom
|---------- TreeSet: elements must be of the same comparable type; red-black tree storage
Set features
Unordered: the stored data is not laid out by index in insertion order; its position is determined by the hash value of the data.
Non-repeatable:
A HashSet uses an array at the bottom layer (it is backed by a HashMap). In JDK 7 the array is initialized to length 16; in JDK 8 it is not allocated until add() is called.
To guarantee that data is not repeated, comparisons are required. Comparing new data against every existing element one by one would be slow for large sets, which is why hashCode() is used. When data is added, an index into the array is computed from its hashCode() value. If there is no data at that index, the element is stored there directly. If an element is already at that position, their hash values are compared: if the hashes are equal, equals() decides whether the two are the same; if it returns true, the two pieces of data are considered identical and the new one is rejected.
If the hashes differ (or equals() returns false), the new element is chained into a linked list at that index, so multiple elements share the same array slot. JDK 7 and JDK 8 insert into the chain differently:
jdk7 head insertion: the newly inserted data is placed in the array slot as the head node, with the previous data hanging behind it.
jdk8 tail insertion: the new data is appended as the tail node of the chain.
This scheme pays off when there is a lot of data: the storage location is first computed from the hash, and only if that slot is occupied does a walk through the chain follow, which greatly reduces the comparison cost.
Therefore, a class whose instances are added to a Set must override both equals() and hashCode().
Because the bottom layer of HashSet is a HashMap, the expansion of a Set is covered in the HashMap discussion below.
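As a minimal illustration of that rule, here is a hypothetical `Point` class (not from the notes above) that overrides both methods so `HashSet` can reject duplicates:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashSetDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // Both methods must be overridden together: hashCode() picks the
        // bucket index, equals() settles collisions inside the bucket.
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2));  // duplicate: rejected via hashCode + equals
        System.out.println(set.size());  // 1
    }
}
```

Without the overrides, the two `Point(1, 2)` objects would inherit identity-based `hashCode()` from `Object`, land in different buckets, and both be kept.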
TreeSet
Two sorting methods
1. Natural sorting
```java
public void testSet() {
    TreeSet treeSet1 = new TreeSet();
    treeSet1.add("lis");
    treeSet1.add("ons");
    treeSet1.add("las");
    System.out.println(treeSet1);
    System.out.println("----------------");

    // The User class implements the Comparable interface:
    /*
    public int compareTo(Object o) {
        if (o instanceof User) {
            User user = (User) o;
            int eq = user.usrname.compareTo(this.usrname);
            if (eq == 0)
                return Integer.compare(user.age, this.age);
            else
                return eq;
        } else {
            throw new RuntimeException("Data type mismatch");
        }
    }
    */
    TreeSet treeSet2 = new TreeSet();
    treeSet2.add(new User("lisi", 18));
    treeSet2.add(new User("suli", 20));
    treeSet2.add(new User("liufang", 19));
    treeSet2.add(new User("mumian", 17));
    treeSet2.add(new User("mumian", 18));
    System.out.println(treeSet2);
}
```

Result:

```
[las, lis, ons]
----------------
[User{usrname='suli', age=20}, User{usrname='mumian', age=18}, User{usrname='mumian', age=17}, User{usrname='liufang', age=19}, User{usrname='lisi', age=18}]
```
2. Custom sorting
```java
public void testCo() {
    Comparator comparator = (o1, o2) -> {
        if (o1 instanceof User && o2 instanceof User) {
            User user1 = (User) o1;
            User user2 = (User) o2;
            return user1.getUsrname().compareTo(user2.getUsrname());
        } else
            throw new RuntimeException("Input data type mismatch");
    };
    TreeSet treeSet2 = new TreeSet(comparator);
    treeSet2.add(new User("lisi", 18));
    treeSet2.add(new User("suli", 20));
    treeSet2.add(new User("liufang", 19));
    treeSet2.add(new User("mumian", 17));
    treeSet2.add(new User("mumian", 18));
    System.out.println(treeSet2);
}
```

Result:

```
[User{usrname='lisi', age=18}, User{usrname='liufang', age=19}, User{usrname='mumian', age=17}, User{usrname='suli', age=20}]
```
Map interface (1.2)
- HashMap (1.2): thread-unsafe but efficient; both keys and values may be null
- LinkedHashMap (1.4): well suited to frequent traversal operations
- TreeMap: keeps keys sorted; red-black tree at the bottom
- Hashtable (1.0): thread-safe but inefficient; null keys and values cannot be stored
- Properties: for processing configuration files
HashMap infrastructure:
1.7 and before: array + linked list
1.8 and later: array + linked list + red-black tree
Key value pair
key: non-repeatable, unordered; the keys form a Set
value: repeatable, unordered; the values form a Collection
A key-value pair is encapsulated into an Entry object; entries are non-repeatable and unordered, and together they form a Set.
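Those three views are exposed directly by the `Map` API; a short sketch:

```java
import java.util.HashMap;
import java.util.Map;

class MapViewsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("a", 3);  // same key: the value is replaced, not duplicated

        System.out.println(map.keySet());  // the keys behave like a Set
        System.out.println(map.values());  // the values are a plain Collection
        // each Entry bundles one key-value pair
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```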
Underlying implementation principle of HashMap
jdk1.7:
HashMap hashMap=new HashMap();
After instantiation, a one-dimensional array Entry[] table with a length of 16 is created.
map.put(key, value)
hashCode() of the key's class is called to compute the key's hash value, and the key's position in the Entry array is derived from that value.
If there is no data at that position, insert directly. If there is data, compare hash values: if the hashes differ, the entry is simply added to the chain. If the hashes are the same, call equals() to compare: if it returns false, the entry is added successfully; if it returns true, the new value just replaces the value for that key.
During expansion the capacity defaults to twice the original. Expansion conditions (JDK 7): the size has reached the threshold and the bucket being stored into is not empty.
jdk1.8:
HashMap hashMap = new HashMap(); no capacity is allocated up front (lazy loading)
The backing array is Node[] rather than Entry[]; the two are essentially the same
When a bucket's linked list grows beyond 8 entries and the array length is at least 64, the bucket is converted to red-black tree storage; otherwise the table is expanded instead
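Because the table length `n` is always a power of two, the bucket index expression `(n - 1) & hash` used throughout the HashMap source is equivalent to `hash % n` for non-negative hashes, and unlike `%` it never yields a negative index. A quick check:

```java
class IndexDemo {
    public static void main(String[] args) {
        int n = 16;  // table length, always a power of two
        for (int hash : new int[]{5, 21, 37, -1}) {
            // (n - 1) & hash masks off everything but the low 4 bits
            System.out.println(hash + " -> " + ((n - 1) & hash));
        }
        // prints: 5 -> 5, 21 -> 5, 37 -> 5, -1 -> 15 (one pair per line)
    }
}
```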
```java
// Initial capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
// Maximum capacity: 2^30
static final int MAXIMUM_CAPACITY = 1 << 30;
// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Treeify threshold: a chain of 8 becomes a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Untreeify threshold: a tree with fewer than 6 nodes reverts to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity before treeifying is allowed
static final int MIN_TREEIFY_CAPACITY = 64;

// No-argument constructor: only the default load factor 0.75f is set
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// With an initial capacity
public HashMap(int initialCapacity) {
    // delegates to HashMap(int initialCapacity, float loadFactor)
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    // A negative initial capacity throws an exception
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // A custom capacity above MAXIMUM_CAPACITY (1 << 30, i.e. 2^30)
    // is clamped to the maximum capacity
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // A load factor <= 0 or NaN throws an exception
    // (public static boolean isNaN(float v) { return (v != v); })
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // threshold temporarily holds the table size (the next power of two
    // >= initialCapacity); the real threshold, capacity * loadFactor,
    // is computed in resize(), and expansion fires when size exceeds it
    this.threshold = tableSizeFor(initialCapacity);
}
```
put() and capacity expansion:
```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // First call: the table is initialized by resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the target bucket is empty, store directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Same hash and same key (== or equals): remember the node for replacement
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Bucket already treeified: insert into the red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Walk the chain, comparing node by node
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // When the chain reaches TREEIFY_THRESHOLD (8),
                    // treeifyBin(tab, hash) switches to red-black tree storage
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Same replacement logic as above
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If e != null, an existing mapping was found: replace the value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

/**
 * Converts a bucket's chain to red-black tree storage.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // static final int MIN_TREEIFY_CAPACITY = 64
    // While the array length is below 64, expand instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // At 64 or more, convert the chain to a tree
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // Expansion: double both capacity and threshold
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // The first call ends up here:
        // static final int DEFAULT_INITIAL_CAPACITY = 1 << 4 = 16
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // 0.75 * 16
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr; // assign the new threshold
    // The first table has newCap = 16 slots
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Rehash every old bucket into the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
LinkedHashMap
The bottom layer uses Entry for storage:

```java
static class Entry<K,V> extends HashMap.Node<K,V> {
    // The before/after links chain all entries into a doubly linked list,
    // which enables predictable iteration and makes frequent traversal convenient
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}
```

**Load factor**
① If space utilization is high, the hash algorithm will often compute a storage location that already holds data (a hash conflict);
② If the array capacity is increased to avoid hash conflicts, space utilization drops.
The load factor measures how full the hash table is:
load factor = number of elements filled into the table / length of the hash table
The larger the load factor, the more elements are filled in and the higher the space utilization, but the greater the chance of conflict. The smaller the load factor, the fewer elements are filled in and the lower the chance of conflict, but more space is wasted and the number of rehash operations caused by expansion rises. The greater the chance of conflict, the higher the cost of lookups. A balance and compromise therefore has to be found between "conflict chance" and "space utilization".

**How the load factor value affects HashMap**
1. The load factor determines the data density of a HashMap.
2. The larger the load factor, the greater the density, the higher the probability of collision, and the longer the linked lists in the array; the number of comparisons during lookup or insertion rises and performance drops.
3. The smaller the load factor, the more easily expansion is triggered and the lower the data density; collisions become less likely, the linked lists in the array stay shorter, and lookups and insertions need fewer comparisons, so performance can be higher. But some memory space is wasted, and frequent expansion also hurts performance, so it is recommended to preset a larger initial capacity.
4. Based on experience from other languages and from research, setting the load factor to 0.7~0.75 is considered a good choice; the average retrieval length is then close to a constant.
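The resize trigger can be verified with simple arithmetic (a sketch only, no HashMap internals): with the default capacity 16 and load factor 0.75, the threshold is 12, and both values double on each resize.

```java
class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);  // 12: a resize fires once size exceeds this

        // Each resize doubles both the capacity and the threshold
        capacity <<= 1;
        threshold <<= 1;
        System.out.println(capacity + " / " + threshold);  // 32 / 24
    }
}
```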