Deep Analysis of ForkJoinPool (Java Thread Series)


(The source code is easier to read with your phone in landscape orientation.)

Note: The Java source code analysis is based on Java 8 unless otherwise specified.

Note: This article covers ForkJoinPool, a thread pool class built on the divide-and-conquer idea.

Brief Introduction

With the development and widespread use of multicore processors, concurrent programming has become a skill that programmers must master, and knowledge of concurrency is often tested in interviews.

Today, let's look at an interview question:

How can you make full use of a multicore CPU to calculate the sum of all integers in a large array?


  • Single-threaded addition?

The easiest approach to think of is single-threaded addition: one for loop.

  • Addition with a thread pool?

To optimize further, we naturally think of using a thread pool to sum the array in segments and then add up the results of the segments.

  • Other?

Yes, that is today's protagonist, ForkJoinPool. But how do you implement it? You may never have used it before ^^

Three Implementations

OK, the analysis is done. Without further ado, let's look directly at the three implementations.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Calculate the sum of 100 million integers
 */
public class ForkJoinPoolTest01 {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Construct data
        int length = 100000000;
        long[] arr = new long[length];
        for (int i = 0; i < length; i++) {
            arr[i] = ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE);
        }
        // Single thread
        singleThreadSum(arr);
        // ThreadPoolExecutor thread pool
        multiThreadSum(arr);
        // ForkJoinPool thread pool
        forkJoinSum(arr);
    }

    private static void singleThreadSum(long[] arr) {
        long start = System.currentTimeMillis();

        long sum = 0;
        for (int i = 0; i < arr.length; i++) {
            // Simulate a time-consuming computation
            sum += (arr[i]/3*3/3*3/3*3/3*3/3*3);
        }

        System.out.println("sum: " + sum);
        System.out.println("single thread elapse: " + (System.currentTimeMillis() - start));
    }

    private static void multiThreadSum(long[] arr) throws ExecutionException, InterruptedException {
        long start = System.currentTimeMillis();

        int count = 8;
        ExecutorService threadPool = Executors.newFixedThreadPool(count);
        List<Future<Long>> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int num = i;
            // Submit tasks in segments
            Future<Long> future = threadPool.submit(() -> {
                long sum = 0;
                for (int j = arr.length / count * num; j < (arr.length / count * (num + 1)); j++) {
                    // Simulate a time-consuming computation
                    sum += (arr[j]/3*3/3*3/3*3/3*3/3*3);
                }
                return sum;
            });
            list.add(future);
        }

        // Add up the results of each segment
        long sum = 0;
        for (Future<Long> future : list) {
            sum += future.get();
        }
        // Shut the pool down so its non-daemon threads do not keep the JVM alive
        threadPool.shutdown();

        System.out.println("sum: " + sum);
        System.out.println("multi thread elapse: " + (System.currentTimeMillis() - start));
    }

    private static void forkJoinSum(long[] arr) throws ExecutionException, InterruptedException {
        long start = System.currentTimeMillis();

        ForkJoinPool forkJoinPool = ForkJoinPool.commonPool();
        // Submit the task
        ForkJoinTask<Long> forkJoinTask = forkJoinPool.submit(new SumTask(arr, 0, arr.length));
        // Get the result
        Long sum = forkJoinTask.get();

        System.out.println("sum: " + sum);
        System.out.println("fork join elapse: " + (System.currentTimeMillis() - start));
    }

    private static class SumTask extends RecursiveTask<Long> {
        private long[] arr;
        private int from;
        private int to;

        public SumTask(long[] arr, int from, int to) {
            this.arr = arr;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            // Add directly when there are at most 1000 elements; adjust the threshold as needed
            if (to - from <= 1000) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    // Simulate a time-consuming computation
                    sum += (arr[i]/3*3/3*3/3*3/3*3/3*3);
                }
                return sum;
            }

            // Split into two subtasks
            int middle = (from + to) / 2;
            SumTask left = new SumTask(arr, from, middle);
            SumTask right = new SumTask(arr, middle, to);

            // Submit the left task to the work queue
            left.fork();
            // Compute the right task directly on the current thread, saving overhead
            Long rightResult = right.compute();
            // Wait until the left task is complete
            Long leftResult = left.join();
            // Return the combined result
            return leftResult + rightResult;
        }
    }
}

A little secret: for simply adding up 100 million integers, a single thread is actually the fastest. It takes about 100 ms on my computer, and using a thread pool is slower.

So, to show what ForkJoinPool can do, I apply the operation /3*3/3*3/3*3/3*3/3*3 to each number to simulate a time-consuming calculation.

Here are the results:

sum: 107352457433800662
single thread elapse: 789
sum: 107352457433800662
multi thread elapse: 228
sum: 107352457433800662
fork join elapse: 189

As you can see, in this run ForkJoinPool (189 ms) was faster than the ordinary thread pool (228 ms), and both were several times faster than the single thread (789 ms).

Question: Can an ordinary thread pool do what ForkJoinPool does, that is, split a large task into medium tasks, split medium tasks into small tasks, and finally merge the results?

You can try it yourself.

OK, now let's formally begin the analysis of ForkJoinPool.

Divide and Conquer

  • Basic idea

Divide a large problem into smaller subproblems, solve each of them, and merge the solutions of the subproblems to obtain the solution of the original problem.

  • Steps

(1) Split the original problem;

(2) Solve the subproblems;

(3) Merge the solutions of the subproblems into the solution of the original problem.

In divide and conquer, the subproblems are generally independent of each other, so they are usually solved by calling the algorithm recursively.

  • Typical application scenarios

(1) Binary search

(2) Multiplication of large integers

(3) Strassen matrix multiplication

(4) Chessboard covering

(5) Merge sort

(6) Quicksort

(7) Linear-time selection

(8) Tower of Hanoi
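The split, solve, and merge steps described above can be sketched with merge sort, one of the applications listed. This is only a minimal single-threaded illustration; the class name MergeSortSketch and its method names are made up for this example.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Returns a new sorted array; the input array is left unchanged.
    static int[] sort(int[] arr) {
        // Base case: zero or one element is already sorted
        if (arr.length <= 1) {
            return arr.clone();
        }
        // (1) Split the original problem into two halves
        int middle = arr.length / 2;
        int[] left = Arrays.copyOfRange(arr, 0, middle);
        int[] right = Arrays.copyOfRange(arr, middle, arr.length);
        // (2) Solve the subproblems recursively
        int[] sortedLeft = sort(left);
        int[] sortedRight = sort(right);
        // (3) Merge the sub-solutions into the solution of the original problem
        return merge(sortedLeft, sortedRight);
    }

    // Merges two sorted arrays into one sorted array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) {
            out[k++] = a[i++];
        }
        while (j < b.length) {
            out[k++] = b[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = sort(new int[] {5, 2, 9, 1, 5, 6});
        System.out.println(Arrays.toString(sorted)); // [1, 2, 5, 5, 6, 9]
    }
}
```

Because the two halves are independent, the two recursive calls are exactly the kind of work that could be handed to separate threads.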

ForkJoinPool Inheritance System

ForkJoinPool is a thread pool class added in Java 7. In its inheritance hierarchy, ForkJoinPool and ThreadPoolExecutor both extend the abstract class AbstractExecutorService, so ForkJoinPool is used in much the same way as ThreadPoolExecutor, except that its tasks are ForkJoinTask instances.

Here another important design principle is applied, the open-closed principle: open for extension, closed for modification.

It shows that the interfaces of the thread pool framework were designed well from the start. A new thread pool class does not interfere with the existing code, yet it can still take advantage of the existing features.
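Since both classes extend AbstractExecutorService, code written against the ExecutorService interface should accept either pool. The following is a small sketch of that idea; the class ExecutorServiceViewSketch and the helper runOn are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

class ExecutorServiceViewSketch {
    // Submits an ordinary Callable through the shared ExecutorService interface.
    static int runOn(ExecutorService pool) {
        Future<Integer> future = pool.submit(() -> 6 * 7);
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService forkJoin = new ForkJoinPool(2);
        ExecutorService fixed = Executors.newFixedThreadPool(2);
        try {
            System.out.println(runOn(forkJoin)); // 42
            System.out.println(runOn(fixed));    // 42
        } finally {
            forkJoin.shutdown();
            fixed.shutdown();
        }
    }
}
```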


Two Main Methods of ForkJoinTask

  • fork()

The fork() method is similar to Thread.start(), but it does not actually start a thread; it puts the task into the work queue.

  • join()

The join() method is similar to Thread.join(), but it does not simply block the calling thread; instead, the worker thread is used to run other tasks. When join() is called in a worker thread, that thread processes other tasks until it notices that the target subtask has completed.

Three Subclasses of ForkJoinTask

  • RecursiveAction

A task with no return value.

  • RecursiveTask

A task with a return value.

  • CountedCompleter

A task with no return value that can trigger a callback when it completes.
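The earlier SumTask is a RecursiveTask. For comparison, here is a minimal sketch of a RecursiveAction that uses the same fork() and join() calls but returns nothing and instead modifies the array in place. The class names and the threshold of 1000 are assumptions made for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DoubleInPlaceSketch {
    // Hypothetical cutoff below which the range is processed directly
    static final int THRESHOLD = 1000;

    static class DoubleAction extends RecursiveAction {
        private final long[] arr;
        private final int from;
        private final int to;

        DoubleAction(long[] arr, int from, int to) {
            this.arr = arr;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                for (int i = from; i < to; i++) {
                    arr[i] *= 2;
                }
                return;
            }
            int middle = (from + to) >>> 1;
            DoubleAction left = new DoubleAction(arr, from, middle);
            DoubleAction right = new DoubleAction(arr, middle, to);
            left.fork();     // put the left half into the work queue
            right.compute(); // process the right half on the current thread
            left.join();     // wait for the left half, helping with other tasks meanwhile
        }
    }

    // Doubles every element of arr using the common pool.
    static void doubleAll(long[] arr) {
        ForkJoinPool.commonPool().invoke(new DoubleAction(arr, 0, arr.length));
    }

    public static void main(String[] args) {
        long[] arr = new long[5000];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i;
        }
        doubleAll(arr);
        System.out.println(arr[4999]); // 9998
    }
}
```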

Internal Principles of ForkJoinPool

Internally, ForkJoinPool uses a work-stealing algorithm.

(1) Each worker thread has its own work queue, a WorkQueue;

(2) This is a double-ended queue that is private to the thread;

(3) Subtasks forked from a ForkJoinTask are placed at the head of the work queue of the worker thread running that task, and the worker thread processes the tasks in its work queue in LIFO order;

(4) To maximize CPU utilization, idle threads "steal" tasks from other threads' queues;

(5) Tasks are stolen from the tail of the work queue to reduce contention;

(6) Operations on the double-ended queue: push()/pop() are called only by the owning worker thread, and poll() is called when another thread steals a task;

(7) When only one task remains there is still contention, which is resolved through CAS.
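The access order described in the list above can be illustrated with an ordinary ArrayDeque. This is only a single-threaded sketch of which end each party uses; it is not how ForkJoinPool's WorkQueue is implemented, and the class and task names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorkStealingOrderSketch {
    // Returns {task taken by the owner, task taken by the thief, remaining queue}.
    static String[] demo() {
        Deque<String> workQueue = new ArrayDeque<>();
        // The owner pushes forked subtasks onto the head of its own queue
        workQueue.push("task-1");
        workQueue.push("task-2");
        workQueue.push("task-3");

        // The owner pops from the head: the most recently pushed task (LIFO)
        String ownerNext = workQueue.pop();
        // A thief takes from the tail: the oldest task in the queue
        String stolen = workQueue.pollLast();

        return new String[] {ownerNext, stolen, workQueue.toString()};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("owner runs: " + result[0]); // owner runs: task-3
        System.out.println("thief runs: " + result[1]); // thief runs: task-1
        System.out.println("remaining:  " + result[2]); // remaining:  [task-2]
    }
}
```

Because the owner and the thief work at opposite ends, they only collide when the queue is down to a single task, which is the case point (7) refers to.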

ForkJoinPool Best Practices

(1) Compute-intensive tasks are the best fit;

(2) ManagedBlocker can be used when a worker thread needs to block;

(3) ForkJoinPool.invoke()/invokeAll() should not be used inside a RecursiveTask.
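As an illustration of point (3), the sketch below submits to the pool only once, from outside, and inside compute() relies on the static ForkJoinTask.invokeAll(t1, t2) method (inherited by RecursiveTask) rather than on the pool's own invoke methods. The class MaxTaskSketch, its threshold, and the choice of computing a maximum are assumptions made for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class MaxTaskSketch {
    // Hypothetical cutoff below which the range is scanned directly
    static final int THRESHOLD = 1000;

    static class MaxTask extends RecursiveTask<Long> {
        private final long[] arr;
        private final int from;
        private final int to;

        MaxTask(long[] arr, int from, int to) {
            this.arr = arr;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long max = Long.MIN_VALUE;
                for (int i = from; i < to; i++) {
                    max = Math.max(max, arr[i]);
                }
                return max;
            }
            int middle = (from + to) >>> 1;
            MaxTask left = new MaxTask(arr, from, middle);
            MaxTask right = new MaxTask(arr, middle, to);
            // Static ForkJoinTask.invokeAll: runs both subtasks and returns when both are done
            invokeAll(left, right);
            // Both subtasks are complete here, so join() just returns their results
            return Math.max(left.join(), right.join());
        }
    }

    // The only call on the pool itself happens here, outside of any task.
    static long max(long[] arr) {
        return ForkJoinPool.commonPool().invoke(new MaxTask(arr, 0, arr.length));
    }

    public static void main(String[] args) {
        long[] arr = new long[10000];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = i % 7919;
        }
        System.out.println(max(arr)); // 7918
    }
}
```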


Summary

(1) ForkJoinPool is particularly well suited to implementing divide-and-conquer algorithms;

(2) ForkJoinPool and ThreadPoolExecutor are complementary; neither replaces the other, and they apply to different scenarios;

(3) ForkJoinTask has two core methods, fork() and join(), and three important subclasses: RecursiveAction, RecursiveTask and CountedCompleter;

(4) The internal implementation of ForkJoinPool is based on the "work-stealing" algorithm;

(5) Each thread has its own work queue, which is a double-ended queue; the owner takes tasks from the head of the queue, and other threads steal tasks from the tail;

(6) ForkJoinPool is best suited to compute-intensive tasks, but ManagedBlocker can be used for tasks that block;

(7) Inside a RecursiveTask you can call fork() one fewer time and compute that subtask directly on the current thread, which is a handy trick.


Q: How does ManagedBlocker work?

A: Using ManagedBlocker amounts to explicitly telling the ForkJoinPool framework that the current thread is about to block, so that ForkJoinPool can start another thread to run tasks and keep CPU utilization high.

Look at the example below and think it through yourself.

import java.math.BigInteger;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Fibonacci sequence:
 * each number is the sum of the two preceding numbers
 * 1, 1, 2, 3, 5, 8, 13, 21
 */
public class Fibonacci {

    public static void main(String[] args) {
        long time = System.currentTimeMillis();
        Fibonacci fib = new Fibonacci();
        int result = fib.f(1_000).bitCount();
        time = System.currentTimeMillis() - time;
        System.out.println("result = " + result);
        System.out.println("test1_000() time = " + time);
    }

    public BigInteger f(int n) {
        Map<Integer, BigInteger> cache = new ConcurrentHashMap<>();
        cache.put(0, BigInteger.ZERO);
        cache.put(1, BigInteger.ONE);
        return f(n, cache);
    }

    // Placeholder meaning "another thread is already computing this value"
    private final BigInteger RESERVED = BigInteger.valueOf(-1000);

    public BigInteger f(int n, Map<Integer, BigInteger> cache) {
        BigInteger result = cache.putIfAbsent(n, RESERVED);
        if (result == null) {
            // This thread reserved the slot, so it computes f(n)

            int half = (n + 1) / 2;

            RecursiveTask<BigInteger> f0_task = new RecursiveTask<BigInteger>() {
                @Override
                protected BigInteger compute() {
                    return f(half - 1, cache);
                }
            };
            // Run f(half - 1) asynchronously in the pool
            f0_task.fork();

            BigInteger f1 = f(half, cache);
            BigInteger f0 = f0_task.join();

            long time = n > 10_000 ? System.currentTimeMillis() : 0;
            try {

                if (n % 2 == 1) {
                    result = f0.multiply(f0).add(f1.multiply(f1));
                } else {
                    result = f0.shiftLeft(1).add(f1).multiply(f1);
                }
                synchronized (RESERVED) {
                    cache.put(n, result);
                    // Wake up threads waiting for a reserved value
                    RESERVED.notifyAll();
                }
            } finally {
                time = n > 10_000 ? System.currentTimeMillis() - time : 0;
                if (time > 50)
                    System.out.printf("f(%d) took %d%n", n, time);
            }
        } else if (result == RESERVED) {
            // Another thread is computing f(n): wait for it through a ManagedBlocker
            try {
                ReservedFibonacciBlocker blocker = new ReservedFibonacciBlocker(n, cache);
                ForkJoinPool.managedBlock(blocker);
                result = blocker.result;
            } catch (InterruptedException e) {
                throw new CancellationException("interrupted");
            }
        }
        return result;
        // return f(n - 1).add(f(n - 2));
    }

    private class ReservedFibonacciBlocker implements ForkJoinPool.ManagedBlocker {
        private BigInteger result;
        private final int n;
        private final Map<Integer, BigInteger> cache;

        public ReservedFibonacciBlocker(int n, Map<Integer, BigInteger> cache) {
            this.n = n;
            this.cache = cache;
        }

        @Override
        public boolean block() throws InterruptedException {
            synchronized (RESERVED) {
                while (!isReleasable()) {
                    RESERVED.wait();
                }
            }
            return true;
        }

        @Override
        public boolean isReleasable() {
            return (result = cache.get(n)) != RESERVED;
        }
    }
}

Posted by sharpnova on Fri, 08 Nov 2019 13:47:32 -0800