Java Synchronization Series: Phaser Source Code Analysis


Questions

(1) What is Phaser?

(2) What are the characteristics of Phaser?

(3) What advantages does Phaser have over CyclicBarrier and CountDownLatch?

Introduction

Phaser, as the name suggests, works in phases (stages). It suits scenarios where a large task is completed in multiple stages: the tasks within each stage can be executed concurrently by multiple threads, but all tasks of one stage must finish before any task of the next stage starts.

Although this scenario can also be implemented with CyclicBarrier or CountDownLatch, doing so is much more complex: first, the number of phases needed may change; second, the number of tasks in each phase may also change. Phaser is more flexible and convenient than CyclicBarrier and CountDownLatch.

Usage

Let's look at the simplest use case:

import java.util.concurrent.Phaser;

public class PhaserTest {

    public static final int PARTIES = 3;
    public static final int PHASES = 4;

    public static void main(String[] args) {

        Phaser phaser = new Phaser(PARTIES) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                System.out.println("=======phase: " + phase + " finished=============");
                return super.onAdvance(phase, registeredParties);
            }
        };

        for (int i = 0; i < PARTIES; i++) {
            new Thread(()->{
                for (int j = 0; j < PHASES; j++) {
                    System.out.println(String.format("%s: phase: %d", Thread.currentThread().getName(), j));
                    phaser.arriveAndAwaitAdvance();
                }
            }, "Thread " + i).start();
        }
    }
}

Here we define a large task that takes four stages to complete, with three small tasks in each stage. We start three threads, each of which performs one small task per stage. The output is as follows (the order of the thread lines within a stage may vary between runs):

Thread 0: phase: 0
Thread 2: phase: 0
Thread 1: phase: 0
=======phase: 0 finished=============
Thread 2: phase: 1
Thread 0: phase: 1
Thread 1: phase: 1
=======phase: 1 finished=============
Thread 1: phase: 2
Thread 0: phase: 2
Thread 2: phase: 2
=======phase: 2 finished=============
Thread 0: phase: 3
Thread 2: phase: 3
Thread 1: phase: 3
=======phase: 3 finished=============

As you can see, all three threads must complete a stage before the next stage begins. How is this implemented? Let's find out.

Guessing at the principle

Based on what we know about AQS, we can guess at how Phaser is implemented.

First, it needs to store the current phase, the number of tasks (participants) in the current phase, and the number of participants that have not yet finished. These three values can be packed into a single state variable.

Second, it needs a queue to hold the participants that finish first; when the last participant finishes its task, it must wake up the participants in the queue.

That is about all there is to it.

Applying this to the example above:

Initially, the current phase is 0, the number of participants is 3, and the number of unfinished participants is 3.

When the first thread reaches phaser.arriveAndAwaitAdvance(), it enters the queue.

When the second thread reaches phaser.arriveAndAwaitAdvance(), it enters the queue.

When the third thread reaches phaser.arriveAndAwaitAdvance(), it first runs the phase's wrap-up hook onAdvance(), and then wakes up the first two threads so they can continue with the next phase.

This all seems to hang together. Whether it is actually implemented this way, let's look at the source code.
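Before reading the real source, the guess can be written down as a small toy model. This is only a sketch of the guessed flow under simplifying assumptions: the class name ToyPhaser and its run() helper are hypothetical, the number of parties is fixed, and a plain monitor (synchronized, wait, notifyAll) stands in for the state variable and the queue. It is not how java.util.concurrent.Phaser is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of the guess above, built on a monitor.
// The real Phaser uses CAS on a packed long plus a lock-free queue instead.
class ToyPhaser {
    private final int parties;   // participants per phase (fixed in this toy)
    private int unarrived;       // participants that have not arrived yet
    private int phase;           // current phase number

    ToyPhaser(int parties) {
        this.parties = parties;
        this.unarrived = parties;
    }

    // Hook run by the last arriving thread, in the spirit of Phaser.onAdvance().
    protected void onAdvance(int finishedPhase) { }

    synchronized int arriveAndAwaitAdvance() throws InterruptedException {
        int arrivalPhase = phase;
        if (--unarrived == 0) {
            // Last to arrive: run the hook, reset the count, advance, wake the waiters.
            onAdvance(arrivalPhase);
            unarrived = parties;
            phase++;
            notifyAll();
        } else {
            // Arrived early: wait until the phase number changes.
            while (phase == arrivalPhase) {
                wait();
            }
        }
        return arrivalPhase + 1;
    }

    // Runs `parties` threads through `phases` phases and returns the event log.
    static List<String> run(int parties, int phases) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ToyPhaser phaser = new ToyPhaser(parties) {
            @Override
            protected void onAdvance(int finishedPhase) {
                log.add("finished " + finishedPhase);
            }
        };
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < parties; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (int j = 0; j < phases; j++) {
                        log.add("work " + j);
                        phaser.arriveAndAwaitAdvance();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run(3, 4).forEach(System.out::println);
    }
}
```

In the log, every "work k" entry appears before "finished k", and every "work k+1" entry appears after it, which is the same ordering as in the PhaserTest output.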

Source code analysis

Main inner class

static final class QNode implements ForkJoinPool.ManagedBlocker {
    final Phaser phaser;
    final int phase;
    final boolean interruptible;
    final boolean timed;
    boolean wasInterrupted;
    long nanos;
    final long deadline;
    volatile Thread thread; // nulled to cancel wait
    QNode next;

    QNode(Phaser phaser, int phase, boolean interruptible,
          boolean timed, long nanos) {
        this.phaser = phaser;
        this.phase = phase;
        this.interruptible = interruptible;
        this.nanos = nanos;
        this.timed = timed;
        this.deadline = timed ? System.nanoTime() + nanos : 0L;
        thread = Thread.currentThread();
    }
}

Participants that arrive early are placed in a queue made of QNode nodes. Here we only need to pay attention to the thread and next fields: this is clearly a singly linked list that stores the queued threads.
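As a rough illustration of how a singly linked list with only a next pointer can serve as a concurrent queue of waiters, here is a minimal sketch of a CAS-based push and drain on an AtomicReference head. The class WaiterStack and its Node type are hypothetical stand-ins (a String name replaces the waiting thread); this shows the general technique, not Phaser's actual code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a lock-free singly linked list of waiters.
class WaiterStack {
    static final class Node {
        final String name;  // stands in for the waiting thread
        Node next;
        Node(String name) { this.name = name; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // Link the new node in front of the current head, retrying if the head changed.
    void push(Node node) {
        Node q;
        do {
            q = head.get();
            node.next = q;
        } while (!head.compareAndSet(q, node));
    }

    // Pop every node and return the names in pop order (most recently pushed first).
    String drain() {
        StringBuilder sb = new StringBuilder();
        Node q;
        while ((q = head.get()) != null) {
            if (head.compareAndSet(q, q.next)) {
                if (sb.length() > 0) sb.append(',');
                sb.append(q.name);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WaiterStack stack = new WaiterStack();
        stack.push(new Node("Thread 0"));
        stack.push(new Node("Thread 1"));
        System.out.println(stack.drain()); // Thread 1,Thread 0
    }
}
```

Because new nodes are linked at the head, the structure behaves like a stack: the order in which waiters are released is the reverse of the order in which they were enqueued.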

Main attributes

// State variable that packs the current phase, the number of participants (parties), and the number of participants that have not yet arrived (unarrived)
private volatile long state;
// Maximum number of participants, i.e. the maximum number of tasks per phase
private static final int  MAX_PARTIES     = 0xffff;
// Maximum phase number
private static final int  MAX_PHASE       = Integer.MAX_VALUE;
// Bit offset of the number of participants
private static final int  PARTIES_SHIFT   = 16;
// Bit offset of the current phase
private static final int  PHASE_SHIFT     = 32;
// Mask for the number of unarrived participants (low 16 bits)
private static final int  UNARRIVED_MASK  = 0xffff;      // to mask ints
// Mask for the number of participants (middle 16 bits)
private static final long PARTIES_MASK    = 0xffff0000L; // to mask longs
// Mask for both counts, i.e. PARTIES_MASK | UNARRIVED_MASK
private static final long COUNTS_MASK     = 0xffffffffL;
// Highest bit, set once the phaser has terminated
private static final long TERMINATION_BIT = 1L << 63;

// Subtracted from state when one participant arrives
private static final int  ONE_ARRIVAL     = 1;
// Used to add or remove one participant
private static final int  ONE_PARTY       = 1 << PARTIES_SHIFT;
// Subtracted from state when a participant arrives and deregisters
private static final int  ONE_DEREGISTER  = ONE_ARRIVAL|ONE_PARTY;
// Special value of the low 32 bits when there are no participants
private static final int  EMPTY           = 1;

// Extracts the number of unarrived participants (low 16 bits)
private static int unarrivedOf(long s) {
    int counts = (int)s;
    return (counts == EMPTY) ? 0 : (counts & UNARRIVED_MASK);
}
// Extracts the number of participants (middle 16 bits); note that s is cast to int before shifting
private static int partiesOf(long s) {
    return (int)s >>> PARTIES_SHIFT;
}
// Extracts the current phase (high 32 bits); note that here the cast to int happens after shifting
private static int phaseOf(long s) {
    return (int)(s >>> PHASE_SHIFT);
}
// Number of participants that have already arrived
private static int arrivedOf(long s) {
    int counts = (int)s; // low 32 bits
    return (counts == EMPTY) ? 0 :
        (counts >>> PARTIES_SHIFT) - (counts & UNARRIVED_MASK);
}
// Queues holding the threads of participants that have arrived and are waiting; which queue is used depends on the parity of the current phase
private final AtomicReference<QNode> evenQ;
private final AtomicReference<QNode> oddQ;

The main attributes are state, evenQ and oddQ:

(1) state is the state variable. Its high 32 bits store the current phase (the topmost bit doubles as the termination flag, so a terminated Phaser reports a negative phase), the middle 16 bits store the number of participants, and the low 16 bits store the number of participants that have not yet arrived.

(2) evenQ and oddQ are the queues in which participants that have already arrived wait. After the last participant completes its task, it wakes up the participants in the queue so that they can continue with the next phase or finish.
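To make the three-segment layout concrete, the following sketch packs and unpacks a state value using the same shift and mask values as the source. The class StateLayout and its pack() helper are hypothetical names introduced for illustration, and the special EMPTY case is deliberately left out.

```java
// Hypothetical helper mirroring the bit layout of Phaser's state field.
class StateLayout {
    static final int PARTIES_SHIFT = 16;
    static final int PHASE_SHIFT = 32;
    static final int UNARRIVED_MASK = 0xffff;
    static final long TERMINATION_BIT = 1L << 63;

    // phase in the high 32 bits, parties in the middle 16, unarrived in the low 16
    static long pack(int phase, int parties, int unarrived) {
        return ((long) phase << PHASE_SHIFT)
             | ((long) parties << PARTIES_SHIFT)
             | (long) unarrived;
    }

    static int phaseOf(long s)     { return (int) (s >>> PHASE_SHIFT); }
    static int partiesOf(long s)   { return (int) s >>> PARTIES_SHIFT; }
    static int unarrivedOf(long s) { return (int) s & UNARRIVED_MASK; }

    public static void main(String[] args) {
        long s = pack(2, 3, 1);
        System.out.println(Long.toHexString(s));   // 200030001
        System.out.println(phaseOf(s));            // 2
        System.out.println(partiesOf(s));          // 3
        System.out.println(unarrivedOf(s));        // 1
        // One arrival is just "state - 1": only the low 16 bits change.
        System.out.println(unarrivedOf(s - 1));    // 0
        // With the termination bit set, the extracted phase becomes negative.
        System.out.println(phaseOf(s | TERMINATION_BIT) < 0); // true
    }
}
```

Packing all three values into one long is what allows a single compare-and-swap to update the phase and both counts atomically.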

Constructors

public Phaser() {
    this(null, 0);
}

public Phaser(int parties) {
    this(null, parties);
}

public Phaser(Phaser parent) {
    this(parent, 0);
}

public Phaser(Phaser parent, int parties) {
    if (parties >>> PARTIES_SHIFT != 0)
        throw new IllegalArgumentException("Illegal number of parties");
    int phase = 0;
    this.parent = parent;
    if (parent != null) {
        final Phaser root = parent.root;
        this.root = root;
        this.evenQ = root.evenQ;
        this.oddQ = root.oddQ;
        if (parties != 0)
            phase = parent.doRegister(1);
    }
    else {
        this.root = this;
        this.evenQ = new AtomicReference<QNode>();
        this.oddQ = new AtomicReference<QNode>();
    }
    // state is stored in three segments: phase, parties and unarrived
    this.state = (parties == 0) ? (long)EMPTY :
        ((long)phase << PHASE_SHIFT) |
        ((long)parties << PARTIES_SHIFT) |
        ((long)parties);
}

The constructor also sets parent and root, which are used to build tiered (multi-level) Phasers and are not covered in this article.

The key point is still the assignment of state: the high 32 bits store the current phase, the middle 16 bits store the number of participants, and the low 16 bits store the number of participants that have not yet arrived.
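The effect of the constructor can be observed through Phaser's public accessors. The small check below is only an illustration; the class name ConstructorDemo and the describe() helper are hypothetical.

```java
import java.util.concurrent.Phaser;

// Hypothetical demo: inspect a freshly constructed Phaser through its public API.
class ConstructorDemo {
    static String describe(Phaser p) {
        return "phase=" + p.getPhase()
             + " parties=" + p.getRegisteredParties()
             + " unarrived=" + p.getUnarrivedParties();
    }

    // Returns true if the constructor rejects the given number of parties.
    static boolean rejects(int parties) {
        try {
            new Phaser(parties);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Phaser(3))); // phase=0 parties=3 unarrived=3
        System.out.println(describe(new Phaser()));  // phase=0 parties=0 unarrived=0
        System.out.println(rejects(0x10000));        // true: more than 16 bits of parties
    }
}
```

The second line corresponds to the EMPTY case: with no participants, both counts are reported as 0 even though the low bits of state hold the special value 1.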

Let's look at the source code of several main methods:

register() method

Registers one participant. If onAdvance() is executing when this method is called, the method waits for it to finish.

public int register() {
    return doRegister(1);
}
private int doRegister(int registrations) {
    // The amount to add to state; this increments parties and unarrived at the same time
    long adjust = ((long)registrations << PARTIES_SHIFT) | registrations;
    final Phaser parent = this.parent;
    int phase;
    for (;;) {
        // The value of state
        long s = (parent == null) ? state : reconcileState();
        // The lower 32 bits of state are the values of parties and unarrived
        int counts = (int)s;
        // Value of parties
        int parties = counts >>> PARTIES_SHIFT;
        // The value of unarrived
        int unarrived = counts & UNARRIVED_MASK;
        // Check for overflow
        if (registrations > MAX_PARTIES - parties)
            throw new IllegalStateException(badRegister(s));
        // Current phase
        phase = (int)(s >>> PHASE_SHIFT);
        if (phase < 0)
            break;
        // Not the first participant
        if (counts != EMPTY) {                  // not 1st registration
            if (parent == null || reconcileState() == s) {
                // unarrived == 0 means onAdvance() is running for the current phase; wait for it to finish
                if (unarrived == 0)             // wait out advance
                    root.internalAwaitAdvance(phase, null);
                // Otherwise, CAS state to s + adjust and exit the loop on success
                else if (UNSAFE.compareAndSwapLong(this, stateOffset,
                                                   s, s + adjust))
                    break;
            }
        }
        // This is the first participant
        else if (parent == null) {              // 1st root registration
            // Calculate the value of state
            long next = ((long)phase << PHASE_SHIFT) | adjust;
            // CAS state to the new value and exit the loop on success
            if (UNSAFE.compareAndSwapLong(this, stateOffset, s, next))
                break;
        }
        else {
            // Tiered Phaser handling (first registration on a child Phaser)
            synchronized (this) {               // 1st sub registration
                if (state == s) {               // recheck under lock
                    phase = parent.doRegister(1);
                    if (phase < 0)
                        break;
                    // finish registration whenever parent registration
                    // succeeded, even when racing with termination,
                    // since these are part of the same "transaction".
                    while (!UNSAFE.compareAndSwapLong
                           (this, stateOffset, s,
                            ((long)phase << PHASE_SHIFT) | adjust)) {
                        s = state;
                        phase = (int)(root.state >>> PHASE_SHIFT);
                        // assert (int)s == EMPTY;
                    }
                    break;
                }
            }
        }
    }
    return phase;
}
// Waits for onAdvance() to complete, that is, for the phase to advance
// It first spins a limited number of times; if the next phase is reached in the meantime, it returns directly.
// If the phase has still not advanced after spinning, the current thread enqueues itself and waits to be woken up once onAdvance() has finished.
private int internalAwaitAdvance(int phase, QNode node) {
    // Make sure the queue of the previous phase has been cleaned up
    releaseWaiters(phase-1);          // ensure old queue clean
    boolean queued = false;           // true when node is enqueued
    int lastUnarrived = 0;            // to increase spins upon change
    // Number of spins
    int spins = SPINS_PER_ARRIVAL;
    long s;
    int p;
    // Loop while the phase is unchanged; once it changes, the next phase has begun and there is no need to keep spinning or waiting
    while ((p = (int)((s = state) >>> PHASE_SHIFT)) == phase) {
        // node is null (register() passes null), so spin first
        if (node == null) {           // spinning in noninterruptible mode
            // Number of unfinished participants
            int unarrived = (int)s & UNARRIVED_MASK;
            // unarrived changes, increasing the number of spins
            if (unarrived != lastUnarrived &&
                (lastUnarrived = unarrived) < NCPU)
                spins += SPINS_PER_ARRIVAL;
            boolean interrupted = Thread.interrupted();
            // Spins are used up (or the thread was interrupted): create a node
            if (interrupted || --spins < 0) { // need node to record intr
                node = new QNode(this, phase, false, false, 0L);
                node.wasInterrupted = interrupted;
            }
        }
        else if (node.isReleasable()) // done or aborted
            break;
        else if (!queued) {           // push onto queue
            // Nodes are queued
            AtomicReference<QNode> head = (phase & 1) == 0 ? evenQ : oddQ;
            QNode q = node.next = head.get();
            if ((q == null || q.phase == phase) &&
                (int)(state >>> PHASE_SHIFT) == phase) // avoid stale enq
                queued = head.compareAndSet(q, node);
        }
        else {
            try {
                // Block the current thread until it is woken up, similar to calling LockSupport.park()
                ForkJoinPool.managedBlock(node);
            } catch (InterruptedException ie) {
                node.wasInterrupted = true;
            }
        }
    }
    
    // A non-null node means this thread created a node while waiting and has now left the loop
    if (node != null) {
        // Clear the thread reference held by the node
        if (node.thread != null)
            node.thread = null;       // avoid need for unpark()
        if (node.wasInterrupted && !node.interruptible)
            Thread.currentThread().interrupt();
        if (p == phase && (p = (int)(state >>> PHASE_SHIFT)) == phase)
            return abortWait(phase); // possibly clean up on abort
    }
    // Wake up the threads still waiting on this phase
    releaseWaiters(phase);
    return p;
}

The overall logic of adding a participant is as follows:

(1) Adding a participant increments both parties and unarrived, i.e. the middle 16 bits and the low 16 bits of state, at the same time.

(2) If it is the first participant, try to update state atomically and exit on success.

(3) If it is not the first participant, check whether onAdvance() is being executed. If so, wait for onAdvance() to finish; if not, try to update state atomically, retrying until the update succeeds.

(4) Waiting for onAdvance() to finish means spinning first and then queuing, which reduces thread context switches.
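Steps (1) to (3) can be sketched as a CAS retry loop on an AtomicLong. This is a simplified illustration, not the JDK code: the class RegisterSketch is hypothetical, it handles only a root Phaser, and it omits the wait for onAdvance() when unarrived is 0.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of doRegister(1) for a root Phaser.
class RegisterSketch {
    static final int PARTIES_SHIFT = 16;
    static final int MAX_PARTIES = 0xffff;
    static final int UNARRIVED_MASK = 0xffff;
    static final int EMPTY = 1;

    final AtomicLong state = new AtomicLong(EMPTY);

    int register() {
        // Adding this value bumps parties (middle 16 bits) and unarrived (low 16 bits) together.
        long adjust = (1L << PARTIES_SHIFT) | 1L;
        for (;;) {
            long s = state.get();
            int counts = (int) s;
            int parties = counts >>> PARTIES_SHIFT;
            if (parties >= MAX_PARTIES)
                throw new IllegalStateException("too many parties");
            long phaseBits = s & 0xffffffff00000000L;
            // First registration replaces the EMPTY marker; later ones simply add.
            long next = (counts == EMPTY) ? (phaseBits | adjust) : (s + adjust);
            if (state.compareAndSet(s, next))
                return (int) (s >>> 32);   // the phase this registration applies to
            // CAS failed because another thread changed state: loop and retry.
        }
    }

    int parties() { return (int) state.get() >>> PARTIES_SHIFT; }

    int unarrived() {
        int counts = (int) state.get();
        return counts == EMPTY ? 0 : counts & UNARRIVED_MASK;
    }

    // Registers from several threads at once and returns the final party count.
    static int registerConcurrently(int threads, int perThread) {
        RegisterSketch sketch = new RegisterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) sketch.register();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sketch.parties();
    }

    public static void main(String[] args) {
        System.out.println(registerConcurrently(4, 1000)); // 4000
    }
}
```

No registration is lost under contention because a failed compareAndSet simply re-reads state and tries again.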

arriveAndAwaitAdvance() method

The current thread has finished its work for the current phase and waits for the other threads to finish the current phase.

If the current thread is the last to arrive in this phase, it executes the onAdvance() method and wakes up the other threads so that they can enter the next phase.

public int arriveAndAwaitAdvance() {
    // Specialization of doArrive+awaitAdvance eliminating some reads/paths
    final Phaser root = this.root;
    for (;;) {
        // The value of state
        long s = (root == this) ? state : reconcileState();
        // Current stage
        int phase = (int)(s >>> PHASE_SHIFT);
        if (phase < 0)
            return phase;
        // Values of parties and unarrived
        int counts = (int)s;
        // unarrived value (low 16 bits of state)
        int unarrived = (counts == EMPTY) ? 0 : (counts & UNARRIVED_MASK);
        if (unarrived <= 0)
            throw new IllegalStateException(badArrive(s));
        // Modify the value of state
        if (UNSAFE.compareAndSwapLong(this, stateOffset, s,
                                      s -= ONE_ARRIVAL)) {
            // Not the last to arrive: call internalAwaitAdvance() to spin or enqueue and wait
            if (unarrived > 1)
                // Return directly here; internalAwaitAdvance() was analyzed in the register() section.
                return root.internalAwaitAdvance(phase, null);
            
            // Reaching here means this is the last participant to arrive
            if (root != this)
                return parent.arriveAndAwaitAdvance();
            // n keeps only the parties part of state, that is, the middle 16 bits
            long n = s & PARTIES_MASK;  // base of next state
            // The value of parties, i.e. the number of participants to arrive next time
            int nextUnarrived = (int)n >>> PARTIES_SHIFT;
            // Run onAdvance(); a return value of true means the Phaser should terminate (by default, when no participants remain registered)
            if (onAdvance(phase, nextUnarrived))
                n |= TERMINATION_BIT;
            else if (nextUnarrived == 0)
                n |= EMPTY;
            else
                // n plus unarrived
                n |= nextUnarrived;
            // The next phase is the current phase plus 1
            int nextPhase = (phase + 1) & MAX_PHASE;
            // n plus the value of the next stage
            n |= (long)nextPhase << PHASE_SHIFT;
            // Modify the value of state to n
            if (!UNSAFE.compareAndSwapLong(this, stateOffset, s, n))
                return (int)(state >>> PHASE_SHIFT); // terminated
            // Wake up other participants and move on to the next stage
            releaseWaiters(phase);
            // Returns the value of the next phase
            return nextPhase;
        }
    }
}

The general logic of arriveAndAwaitAdvance() is:

(1) Decrement the unarrived part of state by 1.

(2) If this is not the last participant to arrive, call internalAwaitAdvance() to spin or queue and wait.

(3) If this is the last participant to arrive, call onAdvance(), then update state to the value for the next phase and wake up the other waiting threads.

(4) Return the number of the next phase.
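The return values can be seen with a single-party Phaser, where every call is the last arrival and therefore never blocks. The class ArriveDemo is a hypothetical example; the onAdvance() override that terminates after phase 1 is an arbitrary choice made for illustration.

```java
import java.util.concurrent.Phaser;

// Hypothetical demo of arriveAndAwaitAdvance() return values and termination.
class ArriveDemo {
    // Returns { first return value, 1 if terminated else 0, return value after termination }.
    static int[] run() {
        Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                return phase >= 1;  // ask the Phaser to terminate once phase 1 completes
            }
        };
        int first = phaser.arriveAndAwaitAdvance();   // completes phase 0, moves to phase 1
        phaser.arriveAndAwaitAdvance();               // completes phase 1, onAdvance returns true
        boolean terminated = phaser.isTerminated();
        int afterTermination = phaser.arriveAndAwaitAdvance(); // returns immediately
        return new int[] { first, terminated ? 1 : 0, afterTermination };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("first call returned phase " + r[0]);          // 1
        System.out.println("terminated: " + (r[1] == 1));                 // true
        System.out.println("negative after termination: " + (r[2] < 0));  // true
    }
}
```

After termination the phase < 0 check at the top of the method returns at once, which is why threads looping on arriveAndAwaitAdvance() do not hang on a terminated Phaser.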

Summary

(1) Phaser is suitable for multi-stage, multi-task scenarios, and the tasks in each stage can be controlled at a fine-grained level.

(2) Phaser implements its entire logic with a state variable and queues.

(3) The high 32 bits of state store the current phase, the middle 16 bits store the number of participants (tasks) in the current phase, and the low 16 bits store the number of participants that have not yet arrived (unarrived).

(4) One of two queues is selected according to the parity of the current phase.

(5) When a participant that is not the last one arrives, it spins or enters the queue to wait for all participants to complete their tasks.

(6) When the last participant completes its task, it wakes up the threads in the queue and the Phaser moves on to the next phase.

Bonus

What advantages does Phaser have over CyclicBarrier and CountDownLatch?

Answer: There are two main advantages:

(1) Phaser handles multi-stage tasks naturally, while CyclicBarrier and CountDownLatch are generally used to control only one or two stages of tasks (a CountDownLatch cannot be reset at all).

(2) The number of tasks in each phase of a Phaser can be adjusted, whereas the count of a CyclicBarrier or CountDownLatch cannot be modified once it has been set.
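The second advantage can be illustrated with the public register() and arriveAndDeregister() methods. The single-threaded walk-through below is only a sketch; the class name DynamicPartiesDemo is hypothetical, and it relies on the default onAdvance(), which terminates the Phaser when no parties remain registered.

```java
import java.util.concurrent.Phaser;

// Hypothetical demo: the number of parties changes while the Phaser is in use.
class DynamicPartiesDemo {
    // Returns { parties after register, parties after deregister, next phase, 1 if terminated }.
    static int[] run() {
        Phaser phaser = new Phaser(1);            // start with one party
        phaser.register();                        // a second party joins
        int afterRegister = phaser.getRegisteredParties();

        phaser.arriveAndDeregister();             // one party arrives and leaves for good
        int afterDeregister = phaser.getRegisteredParties();

        // Only one party is left, so this call is the last arrival and does not block.
        int nextPhase = phaser.arriveAndAwaitAdvance();

        phaser.arriveAndDeregister();             // the last party leaves: the Phaser terminates
        return new int[] { afterRegister, afterDeregister, nextPhase,
                           phaser.isTerminated() ? 1 : 0 };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("parties after register:   " + r[0]); // 2
        System.out.println("parties after deregister: " + r[1]); // 1
        System.out.println("next phase:               " + r[2]); // 1
        System.out.println("terminated:               " + (r[3] == 1)); // true
    }
}
```

A CyclicBarrier or CountDownLatch offers no counterpart to these calls, because its count is fixed by the constructor.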
