Java foundation - ArrayList source code analysis and design ideas

Without further ado, let's start!

Introduction

We use ArrayList almost every day, yet in real interviews many candidates turn out not to know the details of its source code, which leaves a poor impression on the interviewer. In this section, we'll walk through the parts of the ArrayList source code that come up in interviews.

1. Overall architecture

The overall architecture of ArrayList is relatively simple: it is backed by an array, as shown in the following figure:

The figure shows an array of length 10. The length is counted from 1, while index is the array subscript and is counted from 0; elementData is the array itself. In addition to these two concepts, the source code has the following three basic concepts:

DEFAULT_CAPACITY is the default initial capacity of the array. The default is 10; remember this number.
size is the number of elements currently stored. Its type is int and it is not declared volatile, so it is not thread safe.
modCount counts the structural modifications of the list. Whenever the structure of the array changes, it is incremented by 1.

Class comment

When reading source code, start with the class comment. Here is what the class comment of ArrayList says:

null values are permitted, and the capacity is expanded automatically;
The size, isEmpty, get and set methods run in constant time, O(1), and add runs in amortized constant time;
When iterating with an enhanced for loop or an iterator, if the list is structurally modified during the iteration, the iteration fails fast and throws an exception.
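
To make the null and fail-fast points concrete, here is a minimal sketch. The class name FailFastDemo and its helper methods are our own illustration and are not part of the JDK; only the ArrayList calls are real API.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of the JDK
class FailFastDemo {
    // null is accepted like any other element
    static boolean acceptsNull() {
        List<String> list = new ArrayList<>();
        list.add(null);
        return list.size() == 1 && list.contains(null);
    }

    // Removing through the list (not through the iterator) inside an enhanced
    // for loop changes the structure, so the next iteration step fails fast
    static boolean failsFast() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                if ("a".equals(s)) {
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null accepted: " + acceptsNull());
        System.out.println("fails fast: " + failsFast());
    }
}
```

Note that the check is best effort: it runs when the iterator moves to the next element, so it should be treated as a bug detector rather than something to rely on for program logic.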

In addition to the points mentioned in the class comment, initialization, the essence of capacity expansion, iterators and related issues are often asked about. Next, we analyze them one by one from the source code.

2. Source code analysis

2.1 Initialization

There are three ways to initialize an ArrayList: with no parameters, with a specified capacity, and with specified initial data. The source code is as follows:

Direct initialization without parameters:

private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

// No-argument constructor: elementData starts as a shared empty array
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

Specify size initialization:

transient Object[] elementData;
// Specify array length initialization
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

Specify initial data initialization:

// Specify initial data initialization
public ArrayList(Collection<? extends E> c) {
    Object[] a = c.toArray();
    if ((size = a.length) != 0) {
        if (c.getClass() == ArrayList.class) {
            elementData = a;
        } else {
            elementData = Arrays.copyOf(a, size, Object[].class);
        }
    } else {
        // The given collection is empty, so use the shared empty array
        elementData = EMPTY_ELEMENTDATA;
    }
}

In addition to the comments in the source code, we add two points:

When an ArrayList is created with the no-argument constructor, the backing array is empty by default, not of size 10 as is often said. 10 is the capacity the array is expanded to on the first add.
In some versions of the constructor that takes initial data, there is a comment like "see 6260652". This refers to a bug in Java: c.toArray() may return an array whose runtime type is not Object[], in which case the constructor copies the elements into an Object[].
Generally, this bug is not triggered. It only shows up in the following scenario: after a list whose elements are not of type Object is initialized, toArray is called on it to get an Object array, and a value is then assigned into that array. The code and the reason are shown in the figure:

See the official bug report for details; the problem is fixed in Java 9.
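
As a rough illustration of what the constructor guarantees, the sketch below copies a list of strings into an ArrayList and checks the runtime type of the array returned by toArray. The class name ToArrayTypeDemo and its methods are hypothetical. We assume the source list comes from Arrays.asList, which is the case the bug report describes; the ArrayList copy should hand back a plain Object[] on any Java version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the JDK
class ToArrayTypeDemo {
    // Copies a String-backed list into an ArrayList and returns its toArray() result
    static Object[] copyAndToArray() {
        List<String> source = Arrays.asList("hello", "world");
        List<String> copy = new ArrayList<>(source);
        return copy.toArray();
    }

    // Because the array really is an Object[], any object can be stored in it
    static boolean canStoreAnyObject() {
        Object[] array = copyAndToArray();
        try {
            array[0] = new Object();
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(copyAndToArray().getClass().getSimpleName());
        System.out.println(canStoreAnyObject());
    }
}
```

Calling toArray directly on the Arrays.asList result is the version-dependent case: before the fix, the returned array could be a String[], and storing a plain Object into it would fail.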

2.2 Adding elements and capacity expansion

Adding means appending an element to the array. It takes two main steps:

Check whether capacity expansion is needed and, if so, expand;
Assign the element directly.

The source code for the two steps is as follows:

public boolean add(E e) {
    // Make sure the array is large enough, expanding it if necessary; size is the current number of elements
    ensureCapacityInternal(size + 1); // Increments modCount!!
    // Direct assignment, not thread safe
    elementData[size++] = e;
    return true;
}

Let's take a look at the source code of ensureCapacityInternal:

private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// Calculate the required capacity
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

// Ensure sufficient capacity
private void ensureExplicitCapacity(int minCapacity) {
    // Record the number of array modifications
    modCount++;

    // If the expected capacity is greater than the length of the current array, the capacity is expanded
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

// Expand the capacity and copy the existing data into the new array
private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    // oldCapacity >> 1 divides oldCapacity by 2
    // The new capacity is the old capacity plus half of it, i.e. about 1.5 times
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    // If the expanded value is still less than the required minimum, use the required minimum
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    // If the expanded value exceeds MAX_ARRAY_SIZE, let hugeCapacity choose the upper bound
    // (it returns either MAX_ARRAY_SIZE or Integer.MAX_VALUE)
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // Copy array
    elementData = Arrays.copyOf(elementData, newCapacity);
}

The comments above are fairly detailed. Note the following points:

Capacity expansion does not double the array. The new capacity is the old capacity plus half of it, that is, about 1.5 times the original capacity;
The maximum length of the array in ArrayList is Integer.MAX_VALUE; beyond that, the JVM will not allocate memory for the array.
When adding, the value is not validated, so ArrayList allows null values.
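
The 1.5x rule is easier to see with numbers. The sketch below replays the capacity calculation for a list created with the no-argument constructor. It is a simplified model: GrowthSketch and its methods are hypothetical names, the upper-bound handling is left out, and a capacity of 0 stands in for the DEFAULTCAPACITY_EMPTY_ELEMENTDATA check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the growth rule; it does not touch ArrayList internals
class GrowthSketch {
    static final int DEFAULT_CAPACITY = 10;

    // Small-size path of grow(): old capacity plus half, but at least minCapacity
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities the backing array passes through while `count` elements
    // are added one by one to a list created with the no-argument constructor
    static List<Integer> capacitiesWhileAdding(int count) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 0; // the no-argument constructor starts with an empty array
        for (int size = 0; size < count; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                // Simplification of the DEFAULTCAPACITY_EMPTY_ELEMENTDATA check
                minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
            }
            if (minCapacity > capacity) {
                capacity = nextCapacity(capacity, minCapacity);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesWhileAdding(34));
    }
}
```

Because the half is computed with an integer shift, the factor is only approximately 1.5: growing from 15 gives 22, not 22.5.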

From the add and expansion source code, the following points are worth learning from:

The expansion code is aware of array size overflow: the array size after expansion can never be less than 0 and can never exceed the maximum value of Integer. This awareness is worth learning.

After the expansion, the assignment is very simple: the element is added directly to the array with elementData[size++] = e. Because this plain assignment has no lock control, the operation is not thread safe.
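
To see the overflow awareness at work, the capacity calculation can be pulled out into a pure function and fed large values. This is a sketch under assumptions: OverflowAwareGrow is a hypothetical class, and MAX_ARRAY_SIZE and hugeCapacity do not appear in the excerpt above, so their definitions here follow our reading of the JDK 8 source.

```java
// Hypothetical pure-function version of the capacity calculation in grow()
class OverflowAwareGrow {
    // Assumed value, as we understand the JDK 8 source
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int newCapacity(int oldCapacity, int minCapacity) {
        // For very large arrays this sum wraps around to a negative number
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        // Comparing the difference with 0 keeps the checks meaningful after wrap-around
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        return newCapacity;
    }

    // Assumed implementation, as we understand the JDK 8 source
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // the requested size itself overflowed
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001));
    }
}
```

With an old capacity of 1,500,000,000, the 1.5x value wraps to a negative int. The second subtraction also wraps and comes out positive, so the result is capped at MAX_ARRAY_SIZE instead of staying negative.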

2.3 Essence of capacity expansion

Capacity expansion is realized through this line of code:

Arrays.copyOf(elementData, newCapacity);

This line shows that capacity expansion is essentially copying between arrays. To expand, a new array with the expected capacity is created first, and then the data of the old array is copied into it. Arrays.copyOf does the copying through System.arraycopy, which is a native method. Its declaration is as follows:

/**
 * @param src the source array
 * @param srcPos starting position in the source array
 * @param dest the target array
 * @param destPos starting position in the target array
 * @param length number of elements to copy
 * This method has no return value; the result is visible through the dest reference
 */
public static native void arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
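
The two calls can be compared side by side. In this small sketch, ArrayCopyDemo and its methods are hypothetical names; Arrays.copyOf and System.arraycopy are the real JDK calls discussed above.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK
class ArrayCopyDemo {
    // What grow() does: allocate a new array and copy the old contents into it
    static Object[] expand(Object[] old, int newCapacity) {
        return Arrays.copyOf(old, newCapacity);
    }

    // The same effect spelled out with System.arraycopy
    static Object[] expandManually(Object[] old, int newCapacity) {
        Object[] bigger = new Object[newCapacity];
        System.arraycopy(old, 0, bigger, 0, Math.min(old.length, newCapacity));
        return bigger;
    }

    public static void main(String[] args) {
        Object[] old = {"a", "b"};
        System.out.println(Arrays.toString(expand(old, 3)));
        System.out.println(Arrays.toString(expandManually(old, 3)));
    }
}
```

Both versions return a new array and leave the old one untouched; the extra slots are filled with null.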

2.4 Delete

ArrayList offers several ways to delete elements, such as deleting by array index, deleting by value, or deleting in batches. The principle and idea are the same, so we use deletion by value to explain the source code:

public boolean remove(Object o) {
    // If the value to delete is null, find the first null in the array and delete it
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        // If the value to delete is not null, find the first element equal to it and delete that element
        for (int index = 0; index < size; index++)
            // Equality is decided by equals; the element is then deleted by its index position
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

We need to pay attention to two points:

null is not checked when adding, so null values can also be deleted;
The index position of the value in the array is found through equals. If the elements are not simple built-in types such as String or Integer, we need to pay attention to how their equals method is implemented.
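
The effect of equals on deletion by value can be sketched as follows. PlainPoint and ValuePoint are hypothetical element classes made up for this illustration; the first inherits equals from Object, the second compares by value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK
class RemoveByValueDemo {
    // No equals override: two instances are equal only if they are the same object
    static class PlainPoint {
        final int x;
        PlainPoint(int x) { this.x = x; }
    }

    // Value-based equals: two instances with the same x are equal
    static class ValuePoint {
        final int x;
        ValuePoint(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof ValuePoint && ((ValuePoint) o).x == x;
        }
        @Override public int hashCode() { return Integer.hashCode(x); }
    }

    static boolean removePlain() {
        List<PlainPoint> list = new ArrayList<>();
        list.add(new PlainPoint(1));
        return list.remove(new PlainPoint(1)); // a different instance, so nothing matches
    }

    static boolean removeValue() {
        List<ValuePoint> list = new ArrayList<>();
        list.add(new ValuePoint(1));
        return list.remove(new ValuePoint(1)); // matches through equals
    }

    static boolean removeNull() {
        List<String> list = new ArrayList<>();
        list.add(null);
        return list.remove((Object) null); // the null branch of remove(Object)
    }

    public static void main(String[] args) {
        System.out.println(removePlain() + " " + removeValue() + " " + removeNull());
    }
}
```

A related pitfall, mentioned here as a side note: on a List<Integer>, remove(1) selects the remove(int index) overload, so deleting the value 1 requires remove(Integer.valueOf(1)).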

The remove(Object) method finds the index position of the element to delete. The fastRemove method then deletes the element at that index position:

private void fastRemove(int index) {
    // The structure of the array is about to change, so increment the modification count
    modCount++;
    // After the deletion, the elements behind index must move forward; compute how many
    int numMoved = size - index - 1;
    if (numMoved > 0)
        // numMoved is the number of elements behind index that must shift forward by one
        // 1 is subtracted because size counts from 1 while index counts from 0
        // The copy reads from index + 1, writes starting at index, and covers numMoved elements
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    // Set the now-unused last slot to null to help GC
    elementData[--size] = null; // clear to let GC do its work
}

From the source code, we can see that after an element is deleted, the elements behind it are moved forward so that the array stays contiguous.

2.5 Iterators

If you want to implement an iterator yourself, just implement the java.util.Iterator interface. ArrayList does the same. Let's take a look at the fields of its iterator:

int cursor; // Position of the next element to return during iteration; starts at 0 by default.
int lastRet = -1; // After next: the index of the element just returned; after a removal: -1.
int expectedModCount = modCount; // expectedModCount is the version number the iterator expects; modCount is the actual version number of the list.

Iterators generally have three methods:

hasNext: whether there is another value to iterate over
next: if there is another value, returns it
remove: deletes the value returned by the current iteration
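
As a rough sketch of what implementing the interface yourself looks like, here is a read-only iterator over a plain array. ArrayIterator is a hypothetical class written for this illustration and is much simpler than the iterator inside ArrayList: it has no modCount check and does not support remove.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only iterator over an array
class ArrayIterator<E> implements Iterator<E> {
    private final E[] elements;
    private int cursor; // position of the next element to return, starts at 0

    ArrayIterator(E[] elements) {
        this.elements = elements;
    }

    @Override
    public boolean hasNext() {
        return cursor != elements.length;
    }

    @Override
    public E next() {
        if (cursor >= elements.length) {
            throw new NoSuchElementException();
        }
        return elements[cursor++];
    }

    // remove() is not overridden; the interface default throws
    // UnsupportedOperationException
}
```

Usage is the usual loop: create new ArrayIterator<>(new String[] {"a", "b"}) and call next while hasNext returns true.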

Let's look at the source code of these three methods:

2.5.1 hasNext

public boolean hasNext() {
  // cursor is the position of the next element and size is the actual number of elements;
  // if they are equal there is nothing left to iterate, otherwise iteration can continue
  return cursor != size;
}

2.5.2 next

public E next() {
  // Check whether the version number changed during iteration; if so, throw ConcurrentModificationException
  checkForComodification();
  // Index position of the element for this iteration
  int i = cursor;
  if (i >= size)
    throw new NoSuchElementException();
  Object[] elementData = ArrayList.this.elementData;
  if (i >= elementData.length)
    throw new ConcurrentModificationException();
  // Advance cursor to the position of the next element, ready for the next iteration
  cursor = i + 1;
  // Return element value
  return (E) elementData[lastRet = i];
}
// Version number comparison
final void checkForComodification() {
  if (modCount != expectedModCount)
    throw new ConcurrentModificationException();
}

As the source code shows, the next method does two things. First, it checks whether the iteration can continue. Second, it finds the value for this iteration and prepares for the next one (cursor + 1).

2.5.3 remove

public void remove() {
  // If lastRet is less than 0, there is no current element to delete:
  // either next has not been called yet or the element has already been deleted
  if (lastRet < 0)
    throw new IllegalStateException();
  // Check whether the version number changed during iteration; if so, throw ConcurrentModificationException
  checkForComodification();

  try {
    ArrayList.this.remove(lastRet);
    cursor = lastRet;
    // -1 marks the element as deleted, which also prevents deleting it twice
    lastRet = -1;
    // Deleting an element changed modCount, so assign the new value to expectedModCount here
    // This keeps the two values consistent for the next iteration
    expectedModCount = modCount;
  } catch (IndexOutOfBoundsException ex) {
    throw new ConcurrentModificationException();
  }
}

Here we need to pay attention to two points:

The purpose of lastRet = -1 is to prevent the same element from being deleted twice
When the element is deleted successfully, the modCount of the list changes. expectedModCount is reassigned here so that the two values match in the next iteration
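
Both points can be observed from the outside. In the sketch below, IteratorRemoveDemo and its methods are hypothetical names; the iterator calls are the standard ones analyzed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of the JDK
class IteratorRemoveDemo {
    // Deleting through the iterator keeps expectedModCount in sync with modCount,
    // so the loop continues without a ConcurrentModificationException
    static List<Integer> removeEvens(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    // A second remove() without another next() is rejected because lastRet is -1
    static boolean secondRemoveRejected() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = list.iterator();
        it.next();
        it.remove();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(Arrays.asList(1, 2, 3, 4)));
        System.out.println(secondRemoveRejected());
    }
}
```

Setting cursor back to lastRet after a deletion is what lets the loop visit the element that shifted into the freed position.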

2.6 Time complexity

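From the source code analysis above, reading or writing an element only needs the array index, so get and set are O(1). Appending with add is also a direct assignment by index, so it is O(1) apart from the occasional capacity expansion. Deleting is different: remove has to find the element and then move the elements behind it forward, so it is O(n) in the worst case.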

2.7 Thread safety

We need to emphasize that ArrayList has a thread safety problem only when it is a shared variable. When an ArrayList is a local variable inside a method, there is no thread safety problem.

The essence of the thread safety problem is that the elementData, size and modCount fields of ArrayList are not protected by locks during the various operations, and none of them is declared volatile. Therefore, if multiple threads operate on these fields at the same time, values may be overwritten.

The class comment recommends using Collections#synchronizedList to ensure thread safety. SynchronizedList works by locking inside each method. This achieves thread safety, but at a cost in performance, because every call has to acquire the lock. The implementation of add, for example, is as follows:

public boolean add(E e) {
    synchronized (mutex) { // Lock on mutex, which by default is the synchronized list wrapper itself
        return c.add(e);
    }
}
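
A minimal usage sketch of the wrapper follows. SynchronizedListDemo is a hypothetical class, and the thread and element counts are arbitrary. Several threads append to one shared list, and because every add runs under the lock, no element is lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the JDK
class SynchronizedListDemo {
    // Starts `threads` workers that each add `perThread` elements to one shared list
    static int addConcurrently(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each call locks on the wrapper's mutex
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addConcurrently(4, 1000));
    }
}
```

With a plain ArrayList in place of the wrapper, the same code may lose elements or throw an exception. Note also that iterating over the synchronized list still requires a manual synchronized block on the list, as its documentation points out.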

Summary

Starting from the overall architecture of ArrayList, this article walked through the core source code for initialization, adding, capacity expansion, deletion and iteration. We find that ArrayList is built entirely around its underlying array, and each API wraps an operation on that array, so that users do not need to be aware of the underlying implementation and only need to know how to use it.


Posted by Helminthophobe on Fri, 17 Sep 2021 03:59:30 -0700
