Source analysis of sync.Mutex in Go

Keywords: Go, Programming

The Go language treats concurrency as one of its defining features, and concurrency inevitably brings competition for resources. That's where the sync.Mutex provided by Go comes in: it ensures that access to critical resources is mutually exclusive.

Since this lock is used so often, understanding its internal implementation helps you see which scenarios it suits and what characteristics it has.
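
Before digging in, here is a minimal, self-contained usage example (my own sketch, not from the analyzed source) showing the lock protecting a shared counter:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var (
        mu      sync.Mutex
        counter int
        wg      sync.WaitGroup
    )

    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock() // enter the critical section
            counter++ // the protected critical resource
            mu.Unlock()
        }()
    }
    wg.Wait()
    fmt.Println(counter) // always 1000 with the mutex; racy without it
}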

Introduction

When I first read this code, I was genuinely amazed. The whole Mutex uses only two private fields, and the locking fast path is a single CAS. The design and programming ideas behind it truly humbled me.

When reading sync.Mutex's code, it's important to remember that multiple goroutines may be asking for the lock at the same time, so the lock's state may change at any moment.

The Nature of Locks

Let's start with the conclusion: sync.Mutex is a fair lock.

In the source code, there is a comment:

// Mutex fairness.
//
// Mutex can be in 2 modes of operations: normal and starvation.
// In normal mode waiters are queued in FIFO order, but a woken up waiter
// does not own the mutex and competes with new arriving goroutines over
// the ownership. New arriving goroutines have an advantage -- they are
// already running on CPU and there can be lots of them, so a woken up
// waiter has good chances of losing. In such case it is queued at front
// of the wait queue. If a waiter fails to acquire the mutex for more than 1ms,
// it switches mutex to the starvation mode.
//
// In starvation mode ownership of the mutex is directly handed off from
// the unlocking goroutine to the waiter at the front of the queue.
// New arriving goroutines don't try to acquire the mutex even if it appears
// to be unlocked, and don't try to spin. Instead they queue themselves at
// the tail of the wait queue.
//
// If a waiter receives ownership of the mutex and sees that either
// (1) it is the last waiter in the queue, or (2) it waited for less than 1 ms,
// it switches mutex back to normal operation mode.
//
// Normal mode has considerably better performance as a goroutine can acquire
// a mutex several times in a row even if there are blocked waiters.
// Starvation mode is important to prevent pathological cases of tail latency.

Understanding this comment is very helpful for understanding the mutex, because it describes the lock's design philosophy. A rough translation follows:

// Fair lock
//
// The lock has two modes: normal mode and starvation mode.
// In normal mode, goroutines waiting for the lock form a first-in-first-out queue (woken in turn),
// but a woken goroutine does not get the lock directly; it still has to compete with newly
// arriving goroutines. This is actually unfair, because the newly arriving goroutines have an
// advantage: they are already running on the CPU, and there may be many of them, so a woken
// goroutine's chance of winning the lock is small. In that case, the woken goroutine is put at
// the head of the queue. If a waiting goroutine fails to get the lock for more than 1ms
// (hard-coded in the source), the lock switches to starvation mode.
//
// In starvation mode, ownership of the lock is handed directly from the unlocking goroutine to
// the goroutine at the head of the queue. Newly arriving goroutines do not try to acquire the
// lock even if it looks free, and do not try to spin; they simply queue at the tail.
//
// When a goroutine receives the lock, it checks two conditions:
// 1. it is the last goroutine in the queue, or
// 2. it waited less than 1ms for the lock.
// If either holds, it switches the lock back to normal mode.

// Normal mode gives much better performance, because a goroutine can acquire the lock several
// times in a row even while blocked goroutines are waiting for it.
// Starvation mode is important for preventing pathological tail latency.
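
To see the two modes in action, here's a small illustrative program of my own (not from the article's source): one goroutine hot-loops Lock/Unlock while another waits. In normal mode the hot-looping goroutine, already on the CPU, tends to keep winning; once the waiter has waited longer than 1ms, the mutex switches to starvation mode and hands the lock over directly, which bounds the observed wait:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var mu sync.Mutex
    done := make(chan struct{})

    go func() {
        start := time.Now()
        mu.Lock() // competes with the hot loop below
        fmt.Println("waiter got the lock after", time.Since(start))
        mu.Unlock()
        close(done)
    }()

    // Hot loop: release and immediately re-acquire the lock. In normal mode
    // this goroutine often wins again because it is already on the CPU; once
    // the waiter has waited >1ms, starvation mode hands the lock straight over.
    for i := 0; ; i++ {
        mu.Lock()
        time.Sleep(100 * time.Microsecond) // hold the lock briefly
        mu.Unlock()
        select {
        case <-done:
            fmt.Println("hot-loop iterations before handoff:", i)
            return
        default:
        }
    }
}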

Before we really dig into the source, understand that when a goroutine tries to acquire the lock, it may face no competitors or many. So we need to reason, from each goroutine's perspective, about the state of the lock it has observed, the lock's actual state, the state it wants, and the transitions between these states.

Field Definition

sync.Mutex contains only two fields:

// A Mutex is a mutual exclusion lock.
// The zero value for a Mutex is an unlocked mutex.
//
// A Mutex must not be copied after first use.
type Mutex struct {
    state int32
    sema  uint32
}

const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexStarving
    mutexWaiterShift = iota

    starvationThresholdNs = 1e6
)

state is the field that represents the lock's state, and it is shared by multiple goroutines (atomic.CAS is used to keep updates atomic). Bit 0 (value 1) means the lock is held, i.e. locked and owned by some goroutine; bit 1 (value 2) means a goroutine has been woken up to try to acquire the lock; bit 2 (value 4) marks whether the lock is in starvation mode. The remaining bits (state >> mutexWaiterShift) count the goroutines waiting for the lock.

The sema field is the semaphore used to put waiting goroutines to sleep and wake them up.
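
To make the bit layout concrete, here is a small hypothetical helper of mine (not part of sync) that decodes a state word using the same constants:

package main

import "fmt"

const (
    mutexLocked      = 1 << iota // bit 0: lock is held
    mutexWoken                   // bit 1: a goroutine has been woken
    mutexStarving                // bit 2: starvation mode
    mutexWaiterShift = iota      // remaining bits: waiter count
)

func describe(state int32) string {
    return fmt.Sprintf("locked=%v woken=%v starving=%v waiters=%d",
        state&mutexLocked != 0,
        state&mutexWoken != 0,
        state&mutexStarving != 0,
        state>>mutexWaiterShift)
}

func main() {
    // state = locked + 2 waiters: 1 | (2 << 3) = 17
    fmt.Println(describe(17)) // locked=true woken=false starving=false waiters=2
}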

Lock

Before looking at the code, one concept to hold on to: each goroutine also has its own state, kept in local variables (that is, on the function's stack). A goroutine may be new or woken, and normal or starving.

atomic.CAS

First, look at the CAS operation up front, which acquires the lock in one breathtaking line of code.

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }
    ...
}

This first piece of code calls CompareAndSwapInt32 from the atomic package to try to acquire the lock quickly. The method's signature is as follows:

// CompareAndSwapInt32 executes the compare-and-swap operation for an int32 value.
func CompareAndSwapInt32(addr *int32, old, new int32) (swapped bool)

That is: if the value at the address addr equals old, set it to new and return true; otherwise do nothing and return false. Because it is a function in the atomic package, atomicity is guaranteed.

Let's look at the implementation of CAS (src/runtime/internal/atomic/asm_amd64.s):

// bool Cas(int32 *val, int32 old, int32 new)
// Atomically:
//    if(*val == old){
//        *val = new;
//        return 1;
//    } else
//        return 0;
// The frame size is 17 bytes: the pointer is 8 bytes on amd64,
// the two int32 arguments take 4 bytes each, and the bool return value takes 1 byte.
TEXT runtime∕internal∕atomic·Cas(SB),NOSPLIT,$0-17
    // Why isn't the *val pointer put in AX? Because AX has a special role:
    // the CMPXCHGL below reads one of the values to compare from AX.
    MOVQ    ptr+0(FP), BX
    // So AX is used to hold the old argument.
    MOVL    old+8(FP), AX
    // Store new in register CX.
    MOVL    new+12(FP), CX
    // Note the LOCK prefix, which makes the operation atomic.
    LOCK
    // 0(BX) can be read as *val.
    // Compare AX with the second operand 0(BX) -- the value at the address held in BX.
    // If they are equal, store the first operand (the CX register) at the address
    // held in BX and set the zero flag ZF to 1.
    // Otherwise, clear ZF to 0.
    CMPXCHGL    CX, 0(BX)
    // SETEQ (SETE in Intel syntax) works as follows:
    // if ZF is 1, set the operand to 1; otherwise set it to 0.
    // That is: return true if the comparison above was equal, false otherwise.
    // ret+16(FP) is the address of the return value.
    SETEQ    ret+16(FP)
    RET

It doesn't matter if you can't follow the assembly, as long as you know what the function does and that it is atomic.

So what this code means is: first check whether the lock is free, and if it is, atomically flip the state to acquired. How concise (though the code that follows is not...)!
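
As a sketch of the same fast-path idea (hypothetical names; not the standard library's implementation), here is a toy try-lock built on a single CAS:

package main

import (
    "fmt"
    "sync/atomic"
)

// spinMutex is a toy lock: 0 = unlocked, 1 = locked.
type spinMutex struct{ state int32 }

// tryLock succeeds only when the lock is currently free --
// exactly the check sync.Mutex's fast path performs.
func (m *spinMutex) tryLock() bool {
    return atomic.CompareAndSwapInt32(&m.state, 0, 1)
}

func (m *spinMutex) unlock() {
    atomic.StoreInt32(&m.state, 0)
}

func main() {
    var m spinMutex
    fmt.Println(m.tryLock()) // true: the lock was free
    fmt.Println(m.tryLock()) // false: already held
    m.unlock()
    fmt.Println(m.tryLock()) // true again
}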

Main flow

Next, look at the code for the main flow. Some of the bit operations can make your head spin, so I'll try to annotate them with pseudocode.

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
        if race.Enabled {
            race.Acquire(unsafe.Pointer(m))
        }
        return
    }

    // When the current goroutine started waiting
    var waitStartTime int64
    // Whether the current goroutine is starving
    starving := false
    // Whether the current goroutine has been woken
    awoke := false
    // The current goroutine's spin count (imagine a goroutine spinning two billion times...)
    iter := 0
    // Snapshot the lock's current state
    old := m.state
    // Spin
    for {
        // If the lock is in starvation mode, don't spin, because ownership is handed
        // directly to the goroutine at the head of the queue.
        // If the lock is held and the spin conditions are met (canSpin is analyzed
        // below), spin while waiting for the lock.
        // Pseudocode: if isLocked() and isNotStarving() and canSpin()
        if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
            // Mark ourselves and the lock as woken, so that Unlock won't wake
            // other blocked goroutines.
            if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
                atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
                awoke = true
            }
            // Spin (analyzed below)
            runtime_doSpin()
            iter++
            // Refresh the lock state (other goroutines may have changed it while we were spinning)
            old = m.state
            continue
        }

        // Reaching this point, the lock may be in one of four states:
        // 1. held + starving
        // 2. held + normal
        // 3. free + starving
        // 4. free + normal

        // And the current goroutine may be woken or not woken.

        // Copy the current state so we can compute the desired state from it; the
        // desired state lives in new, and we'll try to install it with CAS.
        // old holds the lock's current state.
        new := old

        // If the lock is not starving, set the locked bit in the desired state.
        // In other words, if the lock is starving, don't try to mark it acquired:
        // new goroutines must queue up instead.
        // Pseudocode: if isNotStarving()
        if old&mutexStarving == 0 {
            // Pseudocode: newState = locked
            new |= mutexLocked
        }
        // If the lock is held or starving, add 1 to the waiter count in the
        // desired state (in practice, new += 8).
        // (Could there really be hundreds of millions of goroutines waiting for one lock...)
        if old&(mutexLocked|mutexStarving) != 0 {
            new += 1 << mutexWaiterShift
        }
        // If the current goroutine is starving and the lock is held by another
        // goroutine, set the desired state to starvation mode.
        // If the lock has been released, there's no need to switch:
        // Unlock expects a starving lock to have goroutines waiting on it,
        // which would not be true in that case.
        if starving && old&mutexLocked != 0 {
            // Desired state: starving
            new |= mutexStarving
        }
        // If the current goroutine has been woken, we need to reset the woken bit,
        // because this goroutine will either get the lock or go back to sleep.
        if awoke {
            // If the desired state doesn't have the woken bit set, something is
            // badly wrong. Don't worry if this is unclear here; the wake-up
            // logic is in Unlock below.
            if new&mutexWoken == 0 {
                throw("sync: inconsistent mutex state")
            }
            // Clear the woken bit in new.
            // &^ is Go's "and not" (bit clear) operator.
            new &^= mutexWoken
        }
        // Try to install the desired state with CAS.
        // This might acquire the lock, or it might merely record starvation and waiting.
        if atomic.CompareAndSwapInt32(&m.state, old, new) {
            // If the old state was neither starving nor held, the current
            // goroutine has successfully acquired the lock via CAS.
            // (Entering this branch means the state changed from free to held.)
            if old&(mutexLocked|mutexStarving) == 0 {
                break // locked the mutex with CAS
            }
            // If we have already waited before, queue at the front.
            queueLifo := waitStartTime != 0
            // If we haven't waited before, record when the wait started.
            if waitStartTime == 0 {
                waitStartTime = runtime_nanotime()
            }
            // Acquiring the lock failed, so block the current goroutine with
            // the sleep primitive: queue on the semaphore.
            // A brand-new goroutine goes to the tail of the queue;
            // a woken goroutine that was already waiting goes to the head.
            runtime_SemacquireMutex(&m.sema, queueLifo)

            // At this point the sleep is over and we've been woken up.

            // If the current goroutine was already starving, or it has now
            // waited longer than 1ms (the constant defined above),
            // mark the current goroutine as starving.
            starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
            // Re-read the lock's state.
            old = m.state
            // If the lock is in starvation mode, it was released and the current
            // goroutine was woken via the semaphore; in other words, the lock
            // was handed directly to the current goroutine.
            if old&mutexStarving != 0 {
                // If the lock is marked woken or held, or the wait queue is empty,
                // the state is inconsistent: during a starvation handoff there must
                // be waiters, and the lock must be released and not woken.
                if old&(mutexLocked|mutexWoken) != 0 || old>>mutexWaiterShift == 0 {
                    throw("sync: inconsistent mutex state")
                }
                // The current goroutine takes the lock, so the wait queue shrinks by 1.
                delta := int32(mutexLocked - 1<<mutexWaiterShift)
                // If the current goroutine is not starving, or it is the last
                // goroutine in the queue, exit starvation mode and switch the
                // state back to normal.
                if !starving || old>>mutexWaiterShift == 1 {
                    // Exit starvation mode.
                    // Critical to do it here and consider wait time.
                    // Starvation mode is so inefficient, that two goroutines
                    // can go lock-step infinitely once they switch mutex
                    // to starvation mode.
                    delta -= mutexStarving
                }
                // Atomically apply the state change.
                atomic.AddInt32(&m.state, delta)
                break
            }
            // The lock is not in starvation mode: mark the current goroutine
            // as woken and reset iter (start spinning again).
            awoke = true
            iter = 0
        } else {
            // The CAS failed, meaning another goroutine changed the state
            // (e.g. acquired the lock, or the lock still hasn't been released).
            // Refresh the state and loop again to retry.
            old = m.state
        }
    }

    if race.Enabled {
        race.Acquire(unsafe.Pointer(m))
    }
}

Why is CAS able to acquire the lock? Because CAS atomically checks whether the state we observed (old) is still the lock's current state; among all the competitors, exactly one goroutine's CAS will succeed, and that goroutine holds the lock.
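
A quick demonstration (my own example) of that "exactly one wins" property: a hundred goroutines all try to CAS the same word from 0 to 1, and precisely one succeeds:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var state int32
    var wins int32
    var wg sync.WaitGroup

    // 100 goroutines all observe state == 0 and race to CAS it to 1.
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if atomic.CompareAndSwapInt32(&state, 0, 1) {
                atomic.AddInt32(&wins, 1)
            }
        }()
    }
    wg.Wait()
    fmt.Println("successful CAS count:", wins) // always 1
}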

canSpin

Next, let's look at the canSpin condition mentioned above.

// Active spinning for sync.Mutex.
//go:linkname sync_runtime_canSpin sync.runtime_canSpin
//go:nosplit
func sync_runtime_canSpin(i int) bool {
    // active_spin is a constant with the value 4.
    // Simply put, sync.Mutex may be contended by many goroutines, so spinning
    // must not burn large amounts of CPU.
    // Spinning is allowed only when all of the following hold:
    // 1. the spin count is less than active_spin (4 here);
    // 2. we are on a multi-core machine;
    // 3. GOMAXPROCS > 1 and at least one other P is running;
    // 4. the current P has no other waiting G (its run queue is empty).
    if i >= active_spin || ncpu <= 1 || gomaxprocs <= int32(sched.npidle+sched.nmspinning)+1 {
        return false
    }
    if p := getg().m.p.ptr(); !runqempty(p) {
        return false
    }
    return true
}

So spinning is not unbounded: once the spin count reaches four, or any of the other conditions fails, the goroutine falls back to waiting on the semaphore.
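
Here is a user-level sketch of the same "spin a little, then yield" idea (hypothetical code of mine; the runtime's real version uses the PAUSE instruction and scheduler internals that aren't accessible from user code):

package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
)

var spinSink int // keeps the busy-wait loop from being optimized away (fine for this single-goroutine demo)

// spinYieldMutex is a toy lock, not the runtime's implementation.
type spinYieldMutex struct{ state int32 }

func (m *spinYieldMutex) lock() {
    const maxSpins = 4 // mirrors active_spin
    spins := 0
    for {
        if atomic.CompareAndSwapInt32(&m.state, 0, 1) {
            return
        }
        if spins < maxSpins && runtime.NumCPU() > 1 {
            spins++
            for i := 0; i < 64; i++ { // crude stand-in for the PAUSE loop
                spinSink++
            }
            continue
        }
        spins = 0
        runtime.Gosched() // stop burning CPU; let other goroutines run
    }
}

func (m *spinYieldMutex) unlock() { atomic.StoreInt32(&m.state, 0) }

func main() {
    var m spinYieldMutex
    m.lock()
    fmt.Println("locked")
    m.unlock()
}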

doSpin

Then let's take a look at the implementation of doSpin (which isn't very interesting at all):

//go:linkname sync_runtime_doSpin sync.runtime_doSpin
//go:nosplit
func sync_runtime_doSpin() {
    procyield(active_spin_cnt)
}

procyield is implemented in assembly. Here is the amd64 implementation:

TEXT runtime·procyield(SB),NOSPLIT,$0-0
    MOVL    cycles+0(FP), AX
again:
    PAUSE
    SUBL    $1, AX
    JNZ    again
    RET

Nothing fancy here: it just executes the PAUSE instruction cycles times. Let's move on.

Unlock

Next, let's look at the implementation of Unlock. It has two key behaviors:

  1. If the lock is not locked, calling Unlock on it triggers a runtime error ("sync: unlock of unlocked mutex").
  2. The lock is not tied to a particular goroutine: you can acquire the lock in goroutine 1 and then call Unlock from goroutine 2 to release it (what a saucy move!), although I don't recommend doing that... A small demo of both behaviors follows the source below.

func (m *Mutex) Unlock() {
    if race.Enabled {
        _ = m.state
        race.Release(unsafe.Pointer(m))
    }

    // Fast path: drop lock bit.
    // Atomically subtract the locked bit from the state; the result is the
    // new (desired) state.
    // AddInt32 is atomic, so there's no need to worry about concurrent goroutines.
    new := atomic.AddInt32(&m.state, -mutexLocked)
    // If adding the locked bit back doesn't produce a locked state,
    // the mutex wasn't locked in the first place, so throw.
    // A question for the reader: why go to the trouble of subtracting and adding
    // back, instead of just checking whether the original state was locked?
    if (new+mutexLocked)&mutexLocked == 0 {
        throw("sync: unlock of unlocked mutex")
    }
    // If the new state (i.e., the lock) is not in starvation mode
    if new&mutexStarving == 0 {
        // Copy the state
        old := new
        for {
            // If there are no goroutines waiting for the lock,
            // or the lock has been acquired again (by another goroutine during this loop),
            // or a goroutine has already been woken (no need to wake another),
            // or the lock is in starvation mode (ownership goes straight to the queue head),
            // then there is nothing to do; return.
            if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
                return
            }
            // At this point the lock is still free, nobody has been woken, and
            // there are goroutines queued waiting for it.
            // Set the woken bit and decrement the wait queue by 1.
            new = (old - 1<<mutexWaiterShift) | mutexWoken
            // The familiar CAS again
            if atomic.CompareAndSwapInt32(&m.state, old, new) {
                // The state was installed, so wake a goroutine via the semaphore
                runtime_Semrelease(&m.sema, false)
                return
            }
            // The CAS failed: the state was modified in the meantime (for
            // example, a Lock switched it to starvation mode), so refresh and retry.
            old = m.state
        }
    } else {
        // Starvation mode: hand ownership of the lock directly to the goroutine
        // at the head of the queue via the semaphore.
        // handoff = true means "give the lock straight to the queue head".
        // Note: the locked bit is NOT set here; the woken goroutine sets it after
        // it wakes up. But while the lock is in starvation mode it is still
        // treated as held (because we have hand-picked the new owner), so new
        // goroutines won't try to grab it (as seen in Lock).
        runtime_Semrelease(&m.sema, true)
    }
}
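
And here is that demo (my own sketch): unlocking from a different goroutine is legal, while unlocking an unlocked mutex is fatal:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex

    mu.Lock() // locked in the main goroutine
    done := make(chan struct{})
    go func() {
        mu.Unlock() // legally released from a different goroutine
        close(done)
    }()
    <-done

    mu.Lock() // succeeds, because the other goroutine released it
    fmt.Println("reacquired")
    mu.Unlock()

    // Unlocking an already-unlocked mutex is fatal:
    // mu.Unlock() // fatal error: sync: unlock of unlocked mutex
}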

Summary

From the analysis above, we can see that sync.Mutex performs well when the critical section is cheap, for example when it only assigns to a variable. But if operations on the critical resource take a long time (especially single operations longer than 1ms), there can be real performance problems; this often shows up as the lock constantly sitting in starvation mode. For such situations, you may need to look for other approaches.
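
One common way out (a sketch of mine, not from the analyzed source) is to shrink the critical section: copy the shared data out under the lock and do the slow work on the private copy:

package main

import (
    "fmt"
    "sync"
)

type store struct {
    mu   sync.Mutex
    data map[string]int
}

// snapshot holds the lock only long enough to copy the map,
// keeping the critical section well under 1ms.
func (s *store) snapshot() map[string]int {
    s.mu.Lock()
    defer s.mu.Unlock()
    cp := make(map[string]int, len(s.data))
    for k, v := range s.data {
        cp[k] = v
    }
    return cp
}

func main() {
    s := &store{data: map[string]int{"a": 1, "b": 2}}
    snap := s.snapshot()
    // Slow processing happens on the private copy, without holding the lock.
    for k, v := range snap {
        fmt.Println(k, v)
    }
}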

That concludes the analysis of sync.Mutex. Although it is only about 200 lines (including roughly 150 lines of comments, so perhaps 50 lines of actual code), the algorithm, the design ideas, and the programming concepts behind it are all worth absorbing. This is probably what "simple but not simplistic" means.
