Java ArrayList: detailed source code analysis of the underlying implementation (JDK 8)

Brief introduction

  • ArrayList is a dynamic array backed by a plain array. Its capacity grows automatically, similar to dynamically allocating and growing memory in C.
  • ArrayList is not thread safe and should only be used where a single thread accesses it. In a multi-threaded environment, consider Collections.synchronizedList(List l), which returns a thread-safe wrapper around the list, or the CopyOnWriteArrayList class in the java.util.concurrent package.
  • ArrayList implements the Serializable interface, so it supports serialization and can be transmitted in serialized form. It implements the RandomAccess interface, so it supports fast random access, that is, fast access to elements by index. It also implements the Cloneable interface, so it can be cloned.
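The two thread-safe alternatives mentioned above can be tried out with a small sketch. The class name ThreadSafeListDemo and the helper fill() are hypothetical names chosen for this illustration; only Collections.synchronizedList and CopyOnWriteArrayList come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of the JDK.
class ThreadSafeListDemo {
    // Adds perThread elements from each of `threads` threads and returns the final size.
    static int fill(List<Integer> list, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Wrapper whose methods synchronize on a single lock
        List<Integer> synced = Collections.synchronizedList(new ArrayList<>());
        // List that copies its backing array on every write
        List<Integer> copyOnWrite = new CopyOnWriteArrayList<>();
        System.out.println(fill(synced, 4, 1000));      // 4000
        System.out.println(fill(copyOnWrite, 4, 1000)); // 4000
    }
}
```

Passing a plain ArrayList to fill() instead may lose elements or throw an exception, because add() is not synchronized. CopyOnWriteArrayList is usually a reasonable choice only when reads greatly outnumber writes, since every write copies the array.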

Storage structure

// The array in which the elements are stored. This field does not take part in default serialization.
// The transient keyword means that a field marked with it is skipped by default serialization.
transient Object[] elementData;
  • Object type array.

Fields

    // Serialization version ID
    private static final long serialVersionUID = 8683452581122892189L;
    // Default initial capacity
    private static final int DEFAULT_CAPACITY = 10;
    // Shared empty array, used by the constructors that take an argument and when reading serialized objects.
    private static final Object[] EMPTY_ELEMENTDATA = {};
    /**
     * As the official documentation says, DEFAULTCAPACITY_EMPTY_ELEMENTDATA is kept separate from EMPTY_ELEMENTDATA
     * only to distinguish a lazily initialized list built by the default constructor from one built with an explicit capacity of 0.
     * When the list was constructed with capacity 0, the first add grows the array capacity to 1.
     * When the list was built by the default constructor, the first add grows the capacity directly to DEFAULT_CAPACITY (10).
     */
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
 
    // The array in which the elements are stored. This field does not take part in default serialization.
    // The transient keyword means that a field marked with it is skipped by default serialization.
    transient Object[] elementData; // non-private to simplify nested class access
    // The number of elements the list currently contains
    private int size;
    // Maximum array size that can be allocated
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    // Count of structural modifications to the list (declared in AbstractList and inherited); used by the fail-fast mechanism
    protected transient int modCount = 0;
  • The parameterless constructor of ArrayList does not actually allocate 10 slots during initialization; the array is created lazily on the first add.
  • DEFAULTCAPACITY_EMPTY_ELEMENTDATA differs from EMPTY_ELEMENTDATA only so that a lazily initialized list built by the default constructor can be told apart from one built with an explicit capacity of 0.
  • modCount records the number of times the structure of the ArrayList has changed. It is used by the fail-fast mechanism.

Constructor

    public ArrayList() {
        // Only this constructor assigns DEFAULTCAPACITY_EMPTY_ELEMENTDATA
        this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }
    
    public ArrayList(int initialCapacity) {
        if (initialCapacity > 0) {
            this.elementData = new Object[initialCapacity];
        } else if (initialCapacity == 0) {
            // Use EMPTY_ELEMENTDATA, which is also referenced in several other places
            this.elementData = EMPTY_ELEMENTDATA;
        } else {
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        }
    }
   
    public ArrayList(Collection<? extends E> c) {
        // Convert the incoming collection to an array and store it in elementData
        elementData = c.toArray();
        // Set size to the length of the converted array and check whether it is 0
        if ((size = elementData.length) != 0) {
            // c.toArray() might not return Object[]; see JDK bug 6260652
            if (elementData.getClass() != Object[].class)
                // If the array returned by c.toArray() is not an Object[], use Arrays.copyOf() to build an Object[] of length size.
                // The copy is a new array holding the same element references (a shallow copy), so elementData no longer shares the array returned by the collection.
                elementData = Arrays.copyOf(elementData, size, Object[].class);
        } else {
            // Replace the empty array with the shared EMPTY_ELEMENTDATA instance
            this.elementData = EMPTY_ELEMENTDATA;
        }
    }
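  • The default constructor uses DEFAULTCAPACITY_EMPTY_ELEMENTDATA and a construction with capacity 0 uses EMPTY_ELEMENTDATA; the two empty arrays exist only to tell these lazily initialized cases apart.
  • Note the difference between a deep copy and a shallow copy: Arrays.copyOf() creates a new array but copies only the element references, not the elements themselves.
  • The default constructor and a construction with capacity 0 are both lazily initialized (no backing array is allocated until the first add); a construction with a positive capacity allocates the array immediately.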
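The effect of the two empty sentinel arrays can be modelled with a stripped-down sketch. LazyArraySketch and its capacity() method are hypothetical names invented here to make the capacity observable; this is a simplified model of the idea, not the JDK code, and it ignores overflow handling.

```java
import java.util.Arrays;

// Simplified, hypothetical model of ArrayList's lazy initialization (not the JDK class).
class LazyArraySketch {
    private static final int DEFAULT_CAPACITY = 10;
    // Two distinct empty arrays, told apart by reference comparison
    private static final Object[] EMPTY = {};
    private static final Object[] DEFAULT_EMPTY = {};

    private Object[] data;
    private int size;

    LazyArraySketch() {
        data = DEFAULT_EMPTY; // no allocation yet
    }

    LazyArraySketch(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
        data = (initialCapacity == 0) ? EMPTY : new Object[initialCapacity];
    }

    int capacity() {
        return data.length;
    }

    void add(Object e) {
        int minCapacity = size + 1;
        if (data == DEFAULT_EMPTY) {
            // Only the default-constructed list jumps straight to 10
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        if (minCapacity > data.length) {
            int grown = data.length + (data.length >> 1);
            data = Arrays.copyOf(data, Math.max(grown, minCapacity));
        }
        data[size++] = e;
    }

    public static void main(String[] args) {
        LazyArraySketch byDefault = new LazyArraySketch();
        LazyArraySketch withZero = new LazyArraySketch(0);
        LazyArraySketch withFive = new LazyArraySketch(5);
        System.out.println(byDefault.capacity() + " " + withZero.capacity() + " " + withFive.capacity()); // 0 0 5
        byDefault.add("x");
        withZero.add("x");
        System.out.println(byDefault.capacity() + " " + withZero.capacity()); // 10 1
    }
}
```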

add() source code analysis

public boolean add(E e) {
        // Ensure there is room for size + 1 elements so the next element can be stored
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        // Store the incoming element at the next free index.
        elementData[size++] = e;
        // Always return true.
        return true;
}
private void ensureCapacityInternal(int minCapacity) {
        ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}
private static int calculateCapacity(Object[] elementData, int minCapacity) {
        // This is the most important difference between
        // DEFAULTCAPACITY_EMPTY_ELEMENTDATA and EMPTY_ELEMENTDATA.
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            // With the default constructor, the first add returns 10.
            return Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        // With an explicit capacity of 0, the first add returns 1 (0 + 1).
        return minCapacity;
}
private void ensureExplicitCapacity(int minCapacity) {
        // Increment the structural modification count
        modCount++;

        // overflow-conscious code
        // The current array capacity is less than the required minimum capacity
        if (minCapacity - elementData.length > 0)
            // Prepare to expand array
            grow(minCapacity);
}
private void grow(int minCapacity) {
        // overflow-conscious code
        // Get the current array capacity
        int oldCapacity = elementData.length;
        // The new capacity is about 1.5 times the old capacity
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            // If newCapacity is still smaller than minCapacity, use minCapacity
            newCapacity = minCapacity;
        // Check whether the new capacity exceeds the maximum array size.
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        // Arrays.copyOf() copies the whole original array into the newly allocated, larger array.
        elementData = Arrays.copyOf(elementData, newCapacity);
}
  • To expand the capacity, Arrays.copyOf() must copy the whole original array into the new array, which costs O(N) per expansion. Therefore, it is better to specify the approximate capacity when creating the ArrayList object to reduce the number of expansion operations.
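To get a feel for how often expansion happens, the growth rule from grow() can be replayed on plain integers. GrowthSketch and capacities() are hypothetical names for this illustration; the sketch assumes a default-constructed list and ignores the MAX_ARRAY_SIZE branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that replays ArrayList's growth rule on plain integers.
class GrowthSketch {
    // Returns every capacity reached while adding n elements one by one
    // to a list created with the default constructor.
    static List<Integer> capacities(int n) {
        List<Integer> steps = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < n; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                minCapacity = Math.max(10, minCapacity); // DEFAULT_CAPACITY
            }
            if (minCapacity > capacity) {
                int newCapacity = capacity + (capacity >> 1); // about 1.5x
                if (newCapacity < minCapacity) {
                    newCapacity = minCapacity;
                }
                capacity = newCapacity;
                steps.add(capacity);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Seven array allocations (six of them copying elements) for 100 adds
        System.out.println(capacities(100)); // [10, 15, 22, 33, 49, 73, 109]

        // Pre-sizing avoids all of the intermediate copies
        List<Integer> presized = new ArrayList<>(100);
        for (int i = 0; i < 100; i++) {
            presized.add(i);
        }
        System.out.println(presized.size()); // 100
    }
}
```

For an existing list, ensureCapacity(int minCapacity) serves the same purpose as the sizing constructor before a large batch of adds.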

add(int index, E element) source code analysis

// This is a native method, implemented in native code (C/C++) rather than in Java.
public static native void arraycopy(Object src,  // Source array
                                    int  srcPos, // Starting position in the source array
                                    Object dest, // Destination array (elements are copied from the source array into it)
                                    int destPos, // Starting position in the destination array (the index at which copying begins)
                                    int length   // Number of elements to copy
                                    );

public void add(int index, E element) {
        // Check whether the index is out of range
        rangeCheckForAdd(index);
        // Ensure there is room for size + 1 elements so the new element can be stored
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        // At this point the array capacity is sufficient.
        // Copy size - index elements starting at index (that is, the element at index and everything after it)
        // to the position starting at index + 1.
        // After the copy, the elements at index and index + 1 are the same (when index < size).
        System.arraycopy(elementData, index, elementData, index + 1,
                         size - index);
        // Overwrite the element at index with the new one.
        elementData[index] = element;
        // Increment the number of elements in the list.
        size++;
}
  • System.arraycopy() must copy the element at index and every element after it to the position starting at index + 1. The time complexity of this operation is O(N), so inserting at the head of an ArrayList is very expensive.
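The shift performed by add(int index, E element) can be reproduced on a plain int array. InsertShiftSketch and insertAt() are hypothetical names for this illustration, and the sketch assumes the array already has at least one free slot after the first size elements.

```java
import java.util.Arrays;

// Hypothetical illustration of the shift-right step on a plain array.
class InsertShiftSketch {
    // Inserts value at index among the first `size` slots of data.
    // Assumes data.length > size, i.e. there is room for one more element.
    static void insertAt(int[] data, int size, int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        // Move the element at index and everything after it one slot to the right.
        // System.arraycopy handles the overlapping source and destination correctly.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 0}; // four elements plus one free slot
        insertAt(data, 4, 1, 9);
        System.out.println(Arrays.toString(data)); // [1, 9, 2, 3, 4]
    }
}
```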

remove(int index) source code analysis

public E remove(int index) {
        // Check index 
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            // Same principle as add(int index, E element), but the trailing elements are shifted one position to the left.
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        // Clear the vacated last slot so the array no longer references the element and the GC can reclaim it once it is otherwise unreachable.
        elementData[--size] = null; // clear to let GC do its work
        // Return old element
        return oldValue;
    }
  • System.arraycopy() must copy every element from index + 1 onward to the position starting at index. The time complexity of this operation is O(N), so removing elements from the head of an ArrayList is very expensive.
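The shift-left step and the clearing of the last slot can be sketched the same way on a plain String array. RemoveShiftSketch and removeAt() are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of the shift-left step on a plain array.
class RemoveShiftSketch {
    // Removes the element at index from the first `size` slots of data and returns it.
    static String removeAt(String[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        String oldValue = data[index];
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Move everything after index one slot to the left
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        // Clear the vacated last slot so the array does not keep a stale reference
        data[size - 1] = null;
        return oldValue;
    }

    public static void main(String[] args) {
        String[] data = {"a", "b", "c", "d"};
        System.out.println(removeAt(data, 4, 1));  // b
        System.out.println(Arrays.toString(data)); // [a, c, d, null]
    }
}
```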

Fail fast mechanism

The fail-fast mechanism is an error detection mechanism in the Java collections. When the structure of a collection changes during iteration, the iterator may fail fast, that is, throw a ConcurrentModificationException. The mechanism does not guarantee that an exception will be thrown when an unsynchronized modification occurs; it only throws on a best-effort basis, so it should generally be used only to detect bugs.

  • A structural change is any operation that adds or deletes at least one element, or that explicitly resizes the internal array; merely setting the value of an element is not a structural change.
  • When serializing or iterating, modCount is compared before and after the operation. If it has changed, a ConcurrentModificationException is thrown.
private class Itr implements Iterator<E> {
        int cursor;
        int lastRet = -1;
        // Expected modification count: a snapshot of modCount taken when the iterator is created
        int expectedModCount = modCount;
 
        public boolean hasNext() {
            return cursor != size;
        }
 
        public E next() {
            // Check whether expectedModCount is equal to modCount, if not, throw ConcurrentModificationException
            checkForComodification();
            /** Omit code here */
        }
 
        public void remove() {
            if (this.lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();
            /** Omit code here */
        }
 
        final void checkForComodification() {
            if (ArrayList.this.modCount == this.expectedModCount)
                return;
            throw new ConcurrentModificationException();
        }
    }

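An example of fail-fast in a single-threaded environment

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // FailFastExample is only an illustrative class name
    public class FailFastExample {
        public static void main(String[] args) {
            List<String> list = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                list.add(i + "");
            }
            Iterator<String> iterator = list.iterator();
            int i = 0;
            while (iterator.hasNext()) {
                if (i == 3) {
                    // Structural modification made directly on the list, bypassing the iterator
                    list.remove(3);
                }
                // Prints 0, 1 and 2; the fourth call throws ConcurrentModificationException
                System.out.println(iterator.next());
                i++;
            }
        }
    }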
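For comparison, here is a sketch of two ways to delete elements while traversing without triggering the fail-fast check. SafeRemovalDemo and its helper methods are hypothetical names; Iterator.remove() and Collection.removeIf() are standard JDK 8 methods.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class showing removal that keeps modCount and the iterator in step.
class SafeRemovalDemo {
    // Builds the list "0".."9" used by the examples
    static List<String> digits() {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i + "");
        }
        return list;
    }

    // Iterator.remove() updates the iterator's expectedModCount after removing
    static List<String> removeViaIterator(String target) {
        List<String> list = digits();
        Iterator<String> iterator = list.iterator();
        while (iterator.hasNext()) {
            if (iterator.next().equals(target)) {
                iterator.remove();
            }
        }
        return list;
    }

    // removeIf() performs the whole traversal and removal inside the list itself
    static List<String> removeViaRemoveIf(String target) {
        List<String> list = digits();
        list.removeIf(target::equals);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeViaIterator("3")); // [0, 1, 2, 4, 5, 6, 7, 8, 9]
        System.out.println(removeViaRemoveIf("3")); // [0, 1, 2, 4, 5, 6, 7, 8, 9]
    }
}
```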

Serialization

ArrayList implements the java.io.Serializable interface, but it defines its own serialization and deserialization through writeObject() and readObject(). Because ArrayList is backed by an array that grows dynamically, the array usually has unused capacity, so there is no need to serialize the whole array; only the slots that hold elements are written. Therefore, the elementData array is marked transient to exclude it from default serialization.

private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException{
        // Write out element count, and any hidden stuff
        int expectedModCount = modCount;
        // Write the non-static and non-transient fields of the current class to the stream.
        // The size field is written here as well.
        s.defaultWriteObject();

        // Write out size as capacity for behavioural compatibility with clone()
        // The element count is written again at this position to keep the stream format backward compatible,
        // so size ends up in the stream twice.
        s.writeInt(size);

        // Write out all elements in the proper order.
        // Only the first size slots are written, not the unused capacity of the array.
        for (int i=0; i<size; i++) {
            s.writeObject(elementData[i]);
        }
        // Check whether the list was structurally modified during serialization (fail-fast)
        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
    }
    private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        // Point the array reference at the shared empty array.
        elementData = EMPTY_ELEMENTDATA;

        // Read in size, and any hidden stuff
        // Read the non-static and non-transient fields from the stream into the current object,
        // including size.
        s.defaultReadObject();

        // Read in capacity
        // The value is not used; it is read only because writeObject wrote an int at this position and the stream must be consumed in order.
        s.readInt(); // ignored

        if (size > 0) {
            // be like clone(), allocate array based upon size not capacity
            // Calculate capacity based on size.
            int capacity = calculateCapacity(elementData, size);
            // SharedSecrets is a "shared secrets" repository: a mechanism for
            // calling implementation-specific methods in another package without reflection.
            SharedSecrets.getJavaOISAccess().checkArray(s, Object[].class, capacity);
            // Grow the array if needed
            ensureCapacityInternal(size);

            Object[] a = elementData;
            // Read in all elements in the proper order.
            // Read the elements into the array one by one
            for (int i=0; i<size; i++) {
                a[i] = s.readObject();
            }
        }
    }

Why is size serialized twice in ArrayList?

In the code, size is already serialized by s.defaultWriteObject(). Why does it need to be written again?
This is done for compatibility.
In older JDK versions ArrayList was implemented differently: writeObject wrote the length of the backing array (the capacity) at this position in the stream, and readObject read it back to allocate the array.
In newer JDK versions the array is allocated from size, so the value is no longer needed and is ignored when reading.
However, if s.writeInt(size) were removed, objects serialized by a new JDK could not be read correctly by an old one,
because the int it expects at that position would be missing.
So the extra write looks superfluous, but it is what keeps the two formats compatible.
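That unused capacity is not written to the stream can be checked with a round trip through a byte array. SerializationDemo, serialize() and deserialize() are hypothetical helper names for this sketch; the stream classes are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical demo class: serializes lists to byte arrays and back.
class SerializationDemo {
    static byte[] serialize(ArrayList<String> list) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list); // invokes ArrayList's private writeObject
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static ArrayList<String> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<String>) in.readObject(); // invokes ArrayList's private readObject
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> roomy = new ArrayList<>(10_000); // large unused capacity
        ArrayList<String> compact = new ArrayList<>();
        for (String s : new String[] {"a", "b", "c"}) {
            roomy.add(s);
            compact.add(s);
        }
        // Same number of bytes: the 9,997 unused slots are not serialized
        System.out.println(serialize(roomy).length == serialize(compact).length); // true
        System.out.println(deserialize(serialize(roomy)).equals(roomy));           // true
    }
}
```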

Summary

  • ArrayList is backed by an array that is expanded automatically, so there is no fixed capacity limit other than the maximum array size
  • Adding elements may require expanding the capacity (so it is better to estimate the capacity in advance). Deleting elements does not reduce the capacity (to reduce it, use trimToSize()). When an element is deleted, the vacated slot is set to null, so the garbage collector can reclaim the element once nothing else references it.
  • Not thread safe
  • add(int index, E element): inserting an element at a specified position requires moving the element at that position and all elements after it back by one position
  • get(int index): the element at the specified position is read directly by index (O(1))
  • remove(Object o) needs to traverse the array
  • remove(int index) does not need to search the array; it only checks that the index is valid, so it is more efficient than remove(Object o), although it still has to shift the elements after index
  • contains(Object o) needs to traverse the array

Posted by Zjoske on Sun, 03 Nov 2019 01:02:52 -0700