[A Series of Articles] Learn about common Java collection classes (with source code)

Keywords: Java

List

1. ArrayList

(1) Key source code

// Default initialization to an empty array
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

// Adding elements
public boolean add(E e) {
    // Ensure capacity for one more element; grows the backing array if necessary
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

// Expansion function
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    // The new capacity is 1.5 times the old capacity; the right shift is cheaper than dividing by 2
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    // Growing copies the old array into a new, larger array
    elementData = Arrays.copyOf(elementData, newCapacity);
}
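
Since every call to grow() copies the entire backing array, it can pay off to size the list up front when the number of elements is roughly known. A minimal sketch of this idea (the class name and element count are illustrative, not part of the original article):

import java.util.ArrayList;
import java.util.List;

public class ArrayListPresizeDemo {
    public static void main(String[] args) {
        int n = 1_000_000; // expected number of elements (arbitrary)

        // Default constructor: the backing array starts empty, becomes 10 on the
        // first add, and then grows by 1.5x (copying each time) as elements arrive.
        List<Integer> grown = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            grown.add(i);
        }

        // Passing the expected size allocates the backing array once,
        // so grow() is never triggered inside the loop.
        List<Integer> presized = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            presized.add(i);
        }
    }
}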

(2) Characteristics

  1. When the first element is added to an empty ArrayList created with the default constructor, the capacity is initialized to the default of 10.
  2. ArrayList is not thread-safe (use it only in single-threaded code; in multi-threaded code choose Vector, whose methods are synchronized, or CopyOnWriteArrayList from the java.util.concurrent package).
  3. Random access (i.e., access by index) is the most efficient way to traverse an ArrayList, while iterators are the least efficient.
  4. fail-fast mechanism

    • The fail-fast mechanism is an error-detection mechanism in the Java collections framework. When multiple threads modify the contents of the same collection, a fail-fast event may occur.
    • In the AbstractList source code, modCount is incremented on every structural modification (add, remove, etc.).
      In the inner Itr class, next() and remove() both call checkForComodification().
      If modCount is not equal to expectedModCount, a ConcurrentModificationException is thrown, which is the fail-fast event.
      So a ConcurrentModificationException is thrown when multiple threads modify the list at the same time, or when the list is structurally modified while it is being iterated, as the sketch below shows.
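
A minimal sketch of the fail-fast behaviour described above; even a single thread triggers it by structurally modifying the list while iterating over it (the class name and values are illustrative):

import java.util.ArrayList;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");

        // Structural modification during iteration: modCount no longer matches
        // the iterator's expectedModCount, so the next call to next() throws
        // ConcurrentModificationException.
        for (String s : list) {
            if ("b".equals(s)) {
                list.add("d");
            }
        }
    }
}

To remove elements safely while iterating in a single thread, use Iterator.remove(), which keeps expectedModCount in sync with modCount.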

2. LinkedList

(1) Key source code

// Initialize empty list
public LinkedList() {
}

// Add a node at the end of the list
public boolean add(E e) {
    linkLast(e);
    return true;
}

// linkLast function
void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        // The list was empty, so the new node is also the first node
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

// Node class
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

(2) Characteristics

  1. LinkedList is essentially a doubly linked list.
  2. LinkedList has two important members: Node<E> and size.
  3. Node is the class of a doubly-linked-list node; it contains the member variables prev, next, and item.
  4. prev is the previous node, next is the next node, and item is the value stored in the node; size is the number of nodes in the list. Because the list keeps references to both ends (first and last), it also works naturally as a stack, queue, or deque, as sketched below.
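
A minimal sketch of using LinkedList as a queue, a stack, and a double-ended queue; all of these operations touch only the first and last references, so they run in constant time (the values are arbitrary):

import java.util.LinkedList;

public class LinkedListAsDequeDemo {
    public static void main(String[] args) {
        LinkedList<Integer> deque = new LinkedList<>();

        // Queue usage: add at the tail, remove from the head.
        deque.offer(1);
        deque.offer(2);
        int head = deque.poll();   // 1

        // Stack usage: push and pop both work on the head.
        deque.push(10);
        int top = deque.pop();     // 10

        // Double-ended usage.
        deque.addFirst(0);
        deque.addLast(99);
        System.out.println(deque); // [0, 2, 99]
    }
}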

3. LinkedList and ArrayList: usage scenarios and performance analysis

1. ArrayList is a dynamic array. Because it is backed by an array, random access is fast, but random insertion and deletion are slow (they involve copying array elements).
2. LinkedList is a doubly linked list. It can also be used as a stack, queue, or double-ended queue. Random access is slow, but random insertion and deletion are fast.

(01) Use LinkedList when elements are frequently inserted and deleted.
(02) Use ArrayList when elements need fast random access.
(03) In a single-threaded environment, or in a multi-threaded environment where the List is only ever touched by one thread, use a non-synchronized class such as ArrayList.
     In a multi-threaded environment where the List may be accessed by several threads at the same time, use a thread-safe class such as CopyOnWriteArrayList, as sketched below.

Map

1. HashMap

(1) Key source code

// Loading factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Initialization capacity
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// The no-argument constructor only sets the default load factor (0.75); the table is not allocated here. resize() initializes the capacity to 16 on the first put.
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}


public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * The hash of the key is computed by XOR-ing the high 16 bits of hashCode() into the low 16 bits:
 * (h = key.hashCode()) ^ (h >>> 16).
 * Because the table length is usually small, only the low bits take part in indexing;
 * spreading the high bits down lets both halves participate in the hash at very little cost.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// Important putVal logic
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is null or has length 0, resize() initializes it (resize() is shown below).
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // The bucket index is (n - 1) & hash. When n is a power of two this is equivalent to hash % n, which is one reason the table length is always a power of two.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
        
    // If a node already occupies index i (a hash collision, or the same index after masking), there are two cases:
    // 1. The key is the same: the existing value is replaced.
    // 2. The key is different: the entry is appended to the linked list at i (which is converted to a red-black tree when its length reaches 8), or inserted into the existing red-black tree.
    else {
        Node<K,V> e; K k;
        // Same key as the first node in the bucket; remember it so its value can be replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is already a red-black tree; putTreeVal inserts into it
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Not a TreeNode, so the bucket is a linked list: traverse it and compare the key with every node
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list without finding the key: append a new node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // binCount tracks the list length; when it reaches TREEIFY_THRESHOLD (8) the list is treeified
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
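
As the putVal code shows, put() returns the previous value mapped to the key (or null if there was none), and onlyIfAbsent is the flag behind putIfAbsent(). A minimal sketch (the keys and values are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class HashMapPutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        // First insertion: no existing mapping, so put() returns null.
        System.out.println(map.put("a", 1));         // null

        // Same key again: the value is replaced and the old value is returned.
        System.out.println(map.put("a", 2));         // 1

        // putIfAbsent() calls putVal with onlyIfAbsent = true,
        // so the existing value is kept and returned.
        System.out.println(map.putIfAbsent("a", 3)); // 2
        System.out.println(map.get("a"));            // 2
    }
}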

final Node<K,V>[] resize() {
    // Save the current table
    Node<K,V>[] oldTab = table;
    // Save the capacity of the current table
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Save the current threshold
    int oldThr = threshold;
    // Initialize new table capacity and threshold 
    int newCap, newThr = 0;
    /*
      1. resize() is called when size > threshold. oldCap > 0 means the old table was already allocated;
         oldCap is the size of the old table and oldThr (threshold) is oldCap * load_factor.
    */
    if (oldCap > 0) {
        // If the old capacity has already reached the maximum, set the threshold to Integer.MAX_VALUE (the maximum int value) so the table is never resized again.
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity with a left shift, which is more efficient than multiplication
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    /*
       2. resize() is called while the table is still empty: oldCap <= 0 and oldThr > 0. This happens when the map
          was created with HashMap(int initialCapacity, float loadFactor), HashMap(int initialCapacity),
          or HashMap(Map<? extends K, ? extends V> m), so the user-specified initial capacity was stored in threshold.
    */
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Before the table is initialized, threshold holds the initial capacity (threshold = tableSizeFor(t))
        newCap = oldThr;
    /*
        3. resize() is called while the table is empty, with oldCap <= 0 and oldThr == 0: the map was created with the
           no-argument HashMap() constructor, so everything still has its default value (table is null, oldCap is 0, oldThr is 0).
    */
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // The new threshold has not been set yet; compute it as newCap * loadFactor.
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Initialize table
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Rehash every node from oldTab into newTab
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // If the node is a single node, relocate it directly in newTab
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If the node is a TreeNode node, the rehash operation of the red-black tree is performed.
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // If it's a linked list, rehash the linked list
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Split the nodes in this bucket into two lists according to whether (e.hash & oldCap) is 0; this completes the rehash.
                    do {
                        next = e.next;
                        // e.hash & oldCap decides whether the node's index changes after the rehash.
                        // The tested bit is 0: the node keeps the same index (the "low" list).
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // The tested bit is 1: the node moves to index j + oldCap (the "high" list).
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // The low list is not empty: it stays at the original index j
                    if (loTail != null) {
                        // The list ends up with a null
                        loTail.next = null;
                        // The head of the low list is placed at the same index (j) in the new table
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        // After the rehash these nodes land at the original index plus oldCap (see the numeric sketch after resize()).
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
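
A minimal numeric sketch of the split performed above (the hash values are chosen for illustration): with oldCap = 16, two keys that share bucket 5 are separated by the bit tested with (hash & oldCap); the one with that bit set moves to index 5 + 16 once the table doubles.

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1; // 32

        int h1 = 5;  // binary ...0_0101 -> (h1 & oldCap) == 0
        int h2 = 21; // binary ...1_0101 -> (h2 & oldCap) != 0

        System.out.println(h1 & (oldCap - 1)); // 5, old index
        System.out.println(h2 & (oldCap - 1)); // 5, old index

        // After doubling, the extra bit (oldCap) decides the new index:
        System.out.println(h1 & (newCap - 1)); // 5            (stays at j)
        System.out.println(h2 & (newCap - 1)); // 21 = 5 + 16  (moves to j + oldCap)
    }
}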

(2) Characteristics

  1. HashMap is not thread-safe. Both its keys and its values may be null.
  2. The default load factor is 0.75, and the capacity (the number of buckets in the hash table) is initialized to 16 on the first put.
  3. When the number of entries in the table exceeds the product of the load factor and the capacity (e.g. 0.75 * 16 = 12), the table is resized to twice its capacity (and rehashed, rebuilding the internal data structure).
  4. HashMap is a hash table implemented with separate chaining; its data structure is essentially array + singly linked list + red-black tree (as shown below).
  5. HashMap resolves hash collisions (keys mapping to the same bucket) by chaining. When a bucket's list grows to TREEIFY_THRESHOLD (default 8, chosen because the probability of a bucket reaching that length is about 0.00000006), the list is converted to a red-black tree. Before treeifying, the table capacity is checked: if it is below 64 (MIN_TREEIFY_CAPACITY), the bucket is not treeified and resize() is called instead. If a tree bucket later shrinks below UNTREEIFY_THRESHOLD (default 6), it is converted back to a linked list, balancing the two structures. The bucket-index computation behind all of this is sketched after Figure 1.


Figure 1 Data structure of HashMap
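
A minimal sketch of how the hash spreading and index masking from the source code above work together (the key string is arbitrary):

public class HashIndexDemo {
    // Same spreading step as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;              // default table length, a power of two
        String key = "example";

        int h = hash(key);
        int index = (capacity - 1) & h; // masks the low bits, i.e. h modulo capacity

        System.out.println("hash  = " + h);
        System.out.println("index = " + index); // always in [0, 15]
    }
}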

2. LinkedHashMap

(1) Key source code

// LinkedHashMap extends HashMap and implements the Map interface
public class LinkedHashMap<K,V>
    extends HashMap<K,V>
    implements Map<K,V>

// The default constructor uses capacity 16 and load factor 0.75, and sets accessOrder = false, which means iteration follows insertion order by default.
public LinkedHashMap() {
    super();
    accessOrder = false;
}

(2) Characteristics

  1. LinkedHashMap is built on top of HashMap, so it has all of the characteristics listed above.
  2. Unlike HashMap, which is unordered, LinkedHashMap has a predictable iteration order.
  3. On top of HashMap, LinkedHashMap maintains a doubly linked list running through all of its entries, which is what guarantees the iteration order. A sketch of the two ordering modes follows this list.
  4. Data structure diagram
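
A minimal sketch of the two ordering modes; the accessOrder = true constructor combined with removeEldestEntry() is the standard way to build a small LRU cache (the cache size of 3 is arbitrary):

import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapOrderDemo {
    public static void main(String[] args) {
        // Default: iteration follows insertion order, even after gets.
        Map<String, Integer> insertion = new LinkedHashMap<>();
        insertion.put("a", 1);
        insertion.put("b", 2);
        insertion.put("c", 3);
        insertion.get("a");
        System.out.println(insertion.keySet()); // [a, b, c]

        // accessOrder = true: each get()/put() moves the entry to the end of the
        // internal doubly linked list; removeEldestEntry() evicts the eldest entry.
        Map<String, Integer> lru = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > 3;
            }
        };
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);
        lru.get("a");    // "a" becomes the most recently used entry
        lru.put("d", 4); // evicts "b", the least recently used entry
        System.out.println(lru.keySet()); // [c, a, d]
    }
}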

Set

1. HashSet

(1) Key source code

// HashSet implements the Cloneable and Serializable interfaces to support cloning and serialization, and also the Set interface, which defines the contract for set types
public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
    
// The content in the HashSet collection is stored through the HashMap data structure
private transient HashMap<E,Object> map;
// Elements added to the HashSet are stored as keys in the map above; the value is always the shared PRESENT object.
private static final Object PRESENT = new Object();

// Initialization is a new HashMap
public HashSet() {
    map = new HashMap<>();
}

// Inserts e as the key and PRESENT as the value. If e was not already present, put() returns null and add() returns true; otherwise add() returns false.
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

(2) Characteristics

  1. HashSet is a collection that contains no duplicate elements.
  2. It is backed by a HashMap, so it does not guarantee element order, and it allows a null element. A sketch of the add() behaviour follows.
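
A minimal sketch of the add() behaviour described above (the values are arbitrary):

import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();

        System.out.println(set.add("a"));  // true: map.put("a", PRESENT) returned null
        System.out.println(set.add("a"));  // false: the key already existed
        System.out.println(set.add(null)); // true: one null element is allowed

        System.out.println(set.size());    // 2 ("a" and null)
    }
}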
