A Wrong Way to Use Parallel Stream

Keywords: Java

1. Preface

The arrival of Stream in Java 8 greatly simplified the processing of collection data in business code. It is easy to use, but it can produce unexpected results when used improperly. This article records one wrong way of using Parallel Stream.

List<Object> sourceList = ...;
List<Object> list = new ArrayList<>();

sourceList.parallelStream().map(...).forEach(list::add);

As shown above, the pseudocode processes the source data in sourceList and adds each processed item to the result list. At run time, a null element was found in the result.

2. Experiment

Write a simple test case as follows:

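import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class StreamTest {
    public static void main(String[] args) {
        // ArrayList is not thread-safe, yet forEach may call add() from several threads
        List<Integer> list = new ArrayList<>();
        IntStream.range(0, 50).parallel().map(e -> e * 2).forEach(list::add);
        System.out.println("size = " + list.size() + "\n" + list);
    }
}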

Over multiple runs, the number of elements in the result does not always match the expected 50, the list sometimes contains null elements, and occasionally an ArrayIndexOutOfBoundsException is thrown, as in the sample output below.

size = 44
[30, 12, 32, 14, 34, 16, 42, 44, 46, 48, 24, 36, 20, 38, 40, null, 22, 6, 8, 10, 0, 2, 4, 56, 88, 82, 60, 84, 90, 92, 74, 94, 76, null, 50, 52, 98, 54, 62, 64, 66, 68, 70, 72]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
	at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
	at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
	at java.util.stream.ForEachOps$ForEachOp$OfInt.evaluateParallel(ForEachOps.java:189)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.util.stream.IntPipeline.forEach(IntPipeline.java:404)
	at jit.wxs.disruptor.stream.StreamTest.main(StreamTest.java:15)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 15
	at java.util.ArrayList.add(ArrayList.java:463)
	at java.util.stream.ForEachOps$ForEachOp$OfInt.accept(ForEachOps.java:205)
	at java.util.stream.IntPipeline$3$1.accept(IntPipeline.java:233)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.execLocalTasks(ForkJoinPool.java:1040)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1058)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

3. Analysis

The cause is simple. Anyone familiar with Parallel Stream knows that it executes on a ForkJoinPool thread pool internally, so the forEach action may run on several threads at once, while ArrayList is not thread-safe. The following sections analyze the cause of each abnormal result in turn.

3.1 Lost elements

// java.util.ArrayList#add(E)
public boolean add(E e) {
  ensureCapacityInternal(size + 1);  // Increments modCount!!
  elementData[size++] = e;
  return true;
}

The elements are lost because of the line elementData[size++] = e in the add() method of ArrayList. This line is not an atomic operation. It can be broken down as follows:

  1. Read size value
  2. Add e to the location of size, i.e. elementData[size] = e
  3. size++

This is a race on the shared size field, compounded by a memory visibility problem. Thread A reads size from memory, stores e, adds 1 to size, and writes it back. In the meantime, thread B may have read the same size value, so both threads store into the same slot and both write back the same incremented size. One thread's element and one increment are lost, which explains why the result list is shorter than expected.
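
The lost update can be replayed deterministically on a single thread by interleaving the three steps by hand. The sketch below is only an illustration: the class LostUpdateDemo and its fields are hypothetical stand-ins for the internals of ArrayList, and the interleaving shown is just one of many that the scheduler could produce.

```java
import java.util.Arrays;

// Hypothetical single-threaded replay of two racing add() calls.
class LostUpdateDemo {
    // Stand-ins for ArrayList's internal fields (illustrative, not the JDK source).
    static Object[] elementData = new Object[10];
    static int size = 0;

    static int replay() {
        elementData = new Object[10];
        size = 0;

        // Step 1: both "threads" read size before either one writes it back.
        int readByA = size; // 0
        int readByB = size; // 0

        // Step 2: both store into the same slot, so B overwrites A's element.
        elementData[readByA] = "A";
        elementData[readByB] = "B";

        // Step 3: each writes back (value it read + 1), so size ends at 1, not 2.
        size = readByA + 1;
        size = readByB + 1;
        return size;
    }

    public static void main(String[] args) {
        int finalSize = replay();
        System.out.println("size = " + finalSize + ", data = "
                + Arrays.toString(Arrays.copyOf(elementData, finalSize)));
    }
}
```

Two add() calls were replayed, but only one element ("B") and a size of 1 remain.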

3.2 Null elements

Null elements arise in much the same way as lost elements: elementData[size++] = e is not an atomic operation. Suppose there are three threads, thread 1, thread 2, and thread 3, which start executing at the same time with an initial size of 1.

  • Thread 1 runs to completion, and size is updated to 2.

  • Thread 2 reads size = 1 at the start and stores e at that position, then its time slice runs out. When it resumes at the third step, size++ sees thread 1's update, so size goes straight to 3. [Note: the value of e stored by thread 2 is also lost here, because thread 1 overwrites it.]

  • Thread 3 reads size = 1 at the start, then its time slice runs out. When it resumes at the second step, storing e at the size position, it sees thread 2's update, so it writes at position 3. Position 2 is skipped, so elementData[2] is null.
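
The interleaving described in the bullets can be replayed on a single thread. This is a minimal sketch that follows the article's simplified three-step model, in which a resumed thread sees the latest value of size; the class NullSlotDemo and the labels "x", "T1", "T2", "T3" are hypothetical, and a real JVM may interleave the steps differently.

```java
import java.util.Arrays;

// Hypothetical single-threaded replay of the three-thread scenario.
class NullSlotDemo {
    static Object[] replay() {
        Object[] elementData = new Object[10];
        elementData[0] = "x"; // one element is already present
        int size = 1;

        // Thread 2 reads size = 1 and stores its element, then is paused.
        int indexSeenByT2 = size;
        elementData[indexSeenByT2] = "T2";

        // Thread 1 runs a complete add(): it overwrites slot 1 and sets size to 2.
        elementData[size] = "T1";
        size = size + 1;

        // Thread 2 resumes: its increment sees size = 2 and moves it to 3.
        size = size + 1;

        // Thread 3 resumes and stores at the current size (3); slot 2 is never written.
        elementData[size] = "T3";
        size = size + 1;

        return Arrays.copyOf(elementData, size);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay()));
    }
}
```

The replay ends with four slots counted by size, of which slot 1 holds thread 1's value and slot 2 is null.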

3.3 Array index out of bounds

The out-of-bounds exception mainly occurs at the boundary just before the array is expanded. Suppose the backing array has room for exactly one more element, and two threads call ensureCapacityInternal(size + 1) at the same time. Both read the same size value, so for both of them size + 1 fits within the current capacity and no expansion is triggered.

After returning from ensureCapacityInternal, both threads go on to execute elementData[size++] = e. Suppose thread B completes first, including its size++. If thread A sees thread B's update at that moment, the index it uses equals the capacity of the array, which is one past the last valid slot, so an ArrayIndexOutOfBoundsException is thrown.
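
The same boundary case can be replayed on a single thread. This is a sketch under stated assumptions: the class OutOfBoundsDemo is hypothetical, the capacity check is a simplified stand-in for ensureCapacityInternal, and a capacity of 2 is chosen only to keep the example small.

```java
// Hypothetical single-threaded replay of two adds racing at the capacity boundary.
class OutOfBoundsDemo {
    static boolean replay() {
        Object[] elementData = new Object[2]; // capacity 2
        elementData[0] = "x";
        int size = 1;                         // room for exactly one more element

        // Both "threads" run the capacity check against the same size value.
        boolean aNeedsGrow = size + 1 > elementData.length; // false
        boolean bNeedsGrow = size + 1 > elementData.length; // false
        if (aNeedsGrow || bNeedsGrow) {
            throw new IllegalStateException("unexpected: a grow would have been triggered");
        }

        // Thread B completes its store and increment first: size becomes 2.
        elementData[size++] = "B";

        // Thread A sees size = 2 and stores at index 2 of a length-2 array.
        try {
            elementData[size++] = "A";
            return false;
        } catch (ArrayIndexOutOfBoundsException ex) {
            return true; // the failure described in the text above
        }
    }

    public static void main(String[] args) {
        System.out.println("out of bounds reproduced: " + replay());
    }
}
```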

4. Solution

There are two ways to solve the problem. The first is to make the result list thread-safe:

List<Integer> list = new CopyOnWriteArrayList<>();
// or
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
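
As a quick check of the first approach, the test case from section 2 can be rerun with a synchronized list. The class name SynchronizedListDemo is hypothetical; the rest uses only standard JDK calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class SynchronizedListDemo {
    static List<Integer> run() {
        // Every add() now takes the wrapper's lock, so no update is lost.
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, 50).parallel().map(e -> e * 2).forEach(list::add);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = run();
        System.out.println("size = " + list.size()
                + ", contains null = " + list.contains(null));
    }
}
```

The size is now always 50 and no null appears, but the order of the elements still varies from run to run because forEach on a parallel stream does not follow encounter order. Note also that CopyOnWriteArrayList copies its backing array on every write, so for a write-heavy case like this one the synchronized wrapper is usually the cheaper of the two.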

The second is to let the stream build the list with collect() instead of calling add() yourself from forEach():

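import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamTest {
    public static void main(String[] args) {
        // collect() accumulates into per-thread containers and merges them safely
        List<Integer> list = IntStream.range(0, 50).parallel().map(e -> e * 2).boxed().collect(Collectors.toList());
        System.out.println("size = " + list.size() + "\n" + list);
    }
}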
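
To see why collect() is safe with a plain ArrayList, it helps to spell out the three parts of a mutable reduction. The sketch below uses the three-argument Stream.collect(supplier, accumulator, combiner) overload from the JDK; the class name CollectDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class CollectDemo {
    static List<Integer> run() {
        return IntStream.range(0, 50).parallel().map(e -> e * 2).boxed()
                .collect(ArrayList::new,     // supplier: a fresh list per partial result
                         ArrayList::add,     // accumulator: adds to that partial list only
                         ArrayList::addAll); // combiner: merges partial lists in order
    }

    public static void main(String[] args) {
        List<Integer> list = run();
        System.out.println("size = " + list.size() + "\n" + list);
    }
}
```

No ArrayList is ever shared between threads while it is being filled, so no synchronization is needed, and because the source stream is ordered the merged result keeps the encounter order 0, 2, 4, and so on up to 98.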

Posted by cartoonjunkie on Fri, 21 Feb 2020 21:41:28 -0800