Understanding Go's sync.WaitGroup: reading the core implementation source code

Keywords: Programming Go

1. Foundations

The implementation logic of sync.WaitGroup is quite simple. If you have already read the earlier pieces on sync.Mutex and sync.RWMutex, it should be easy to follow; the only real twist is the state1 field in sync.WaitGroup.

1.1 Waiting mechanism

sync.WaitGroup is mainly used to wait for a group of goroutines to exit. In essence it is a counter: Add declares how many goroutines we still need to wait for, Done decrements the counter, and once it reaches 0 the waiters can return. A minimal example follows.
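
A minimal usage sketch of that model (the worker loop and names here are just for illustration):

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1) // one more goroutine to wait for
		go func(id int) {
			defer wg.Done() // decrement the counter when this goroutine finishes
			fmt.Println("worker", id, "done")
		}(i)
	}
	wg.Wait() // block until the counter drops back to 0
}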

1.2 Memory alignment

Memory alignment is a big topic. The core mechanism is that the compiler pads a struct according to the sizes of its fields, the target platform, and the compiler's own rules. It plays a real role in sync.WaitGroup and is the most important of WaitGroup's implementation details.

In WaitGroup there is only one field, state1 [3]uint32. A uint32 is 4 bytes, so the array of length 3 is 12 bytes in total. In earlier versions the field was declared as [12]byte; switching to [3]uint32 is how Go guarantees that the compiler aligns the field to at least 4 bytes.
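
For reference, here is roughly how the field is declared in sync/waitgroup.go of that era (around Go 1.13; later releases changed the layout). The noCopy marker field is omitted, and the comment is the upstream one:

type WaitGroup struct {
	// 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
	// 64-bit atomic operations require 64-bit alignment, but 32-bit
	// compilers do not ensure it. So we allocate 12 bytes and then use
	// the aligned 8 bytes in them as state, and the other 4 as storage
	// for the sema.
	state1 [3]uint32
}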

1.3 8 bytes

8 bytes is two 4-byte words, i.e. two uint32 values, which is exactly the size of a uint64. sync.WaitGroup does its counting inside such a uint64.

Here there is a rather hacky question. I have read many articles, but none of them gave an answer that fully convinced me; what follows is my own conjecture.

1.4 The 8-byte conjecture

First of all, Go has to support both 32-bit and 64-bit platforms, but a 64-bit operation on a 32-bit platform may not be atomic: while one word is being read, the other word may already have changed (on 32-bit systems the word size is 4 bytes, while a uint64 is 8 bytes). Atomic 64-bit operations therefore need the uint64 to sit on an 8-byte boundary, and that is why sync.WaitGroup carries an extra 4 bytes: whichever 8 of the 12 bytes happen to be aligned are the ones used for counting. A small sketch of the pitfall follows.
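
To make the pitfall concrete, here is a small sketch (the struct and field names are mine, purely for illustration): a uint64 placed after a uint32. On 64-bit platforms the compiler pads the struct so the uint64 sits on an 8-byte boundary, while on 386/ARM it typically gets only 4-byte alignment, which is exactly what 64-bit atomic operations cannot tolerate:

package main

import (
	"fmt"
	"unsafe"
)

// counter is a hypothetical struct used only for illustration.
type counter struct {
	flag uint32
	n    uint64 // the 64-bit value we would like to operate on atomically
}

func main() {
	var c counter
	// On amd64 this typically prints "8 8" (the compiler inserts padding);
	// with GOARCH=386 it typically prints "4 4", i.e. n is not 8-byte aligned.
	fmt.Println(unsafe.Offsetof(c.n), unsafe.Alignof(c.n))
}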

There is also the CPU cache line, which is typically 64 bytes long. On Intel CPUs an operation that stays within a single cache line can be performed atomically, but an unaligned 8-byte value could straddle two cache lines, which would hurt both atomicity and performance.

1.5 Testing 8-byte alignment

Here I construct a simple demonstration. By printing the address of the state1 field and the address of an offset into its underlying array (the second element, index 1), we can verify the conjecture: after the compiler allocates and aligns the memory, if the address of the field itself is not divisible by 8, then that address plus 4 is. (Ideally I would confirm this at the compiler level, but I am not interested in the compiler itself, so a demonstration of the result is enough for me.)

package main

import (
	"unsafe"
)

type a struct {
	b byte
}

type w struct {
	state1 [3]uint32
}

func main() {
	b := a{}
	// size, address, and whether that address is divisible by 8
	println(unsafe.Sizeof(b), uintptr(unsafe.Pointer(&b)), uintptr(unsafe.Pointer(&b))%8 == 0)
	wg := w{}
	// address of state1 itself, then of state1[1], i.e. 4 bytes further on
	println(unsafe.Sizeof(wg), uintptr(unsafe.Pointer(&wg.state1)), uintptr(unsafe.Pointer(&wg.state1))%8 == 0)
	println(unsafe.Sizeof(wg), uintptr(unsafe.Pointer(&wg.state1[1])), uintptr(unsafe.Pointer(&wg.state1[1]))%8 == 0)
}

Output:

1 824633919343 false
12 824633919356 false
12 824633919360 true

1.6 Segmented counting

In sync.WaitGroup the 8-byte uint64 mentioned above is a segmented counter: the high 32 bits record the counter, i.e. how many Done calls are still needed, while the low 32 bits record how many goroutines are currently blocked in Wait. See the sketch below.
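
A tiny sketch of how the two halves of that uint64 are packed and unpacked with shifts, mirroring the expressions used in the source later on:

package main

import "fmt"

func main() {
	var state uint64

	// Add(3): the delta is shifted into the high 32 bits (the counter).
	state += uint64(3) << 32

	// One goroutine enters Wait: the low 32 bits (the waiter count) grow by 1.
	state++

	v := int32(state >> 32) // counter: Done calls still outstanding
	w := uint32(state)      // goroutines currently blocked in Wait
	fmt.Println(v, w)       // prints: 3 1
}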

2. A quick read of the source code

1. The core principle is the 64-bit counter mentioned above: the high 32 bits record how many Done calls are still required, and the low 32 bits record how many goroutines are waiting.
2. If the current counter is greater than 0, a goroutine that calls Wait is queued to sleep.
3. As each task finishes it calls Done; once the counter reaches 0, all goroutines sleeping in Wait are woken up.

2.1 Counter and semaphore

As mentioned in the foundations section, for the 12-byte [3]uint32 the state() method looks at the address of the field to decide which segment is used for counting and which serves as the semaphore for waiting. Based on the alignment it finds, it returns the counting segment as a *uint64 and the semaphore segment as a *uint32.

func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
    if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
    }
}

2.2 Add: adjust the wait counter

func (wg *WaitGroup) Add(delta int) {
    // Get pointers to the counter and the semaphore
    statep, semap := wg.state()
    if race.Enabled {
        _ = *statep // trigger nil deref early
        if delta < 0 {
            // Synchronize decrements with Wait.
            race.ReleaseMerge(unsafe.Pointer(wg))
        }
        race.Disable()
        defer race.Enable()
    }
    // Add delta to the counter stored in the high 32 bits
    state := atomic.AddUint64(statep, uint64(delta)<<32)
    v := int32(state >> 32) // counter: Done calls still outstanding
    w := uint32(state)      // waiter count in the low 32 bits
    if race.Enabled && delta > 0 && v == int32(delta) {
        // The first increment must be synchronized with Wait.
        // Need to model this as a read, because there can be
        // several concurrent wg.counter transitions from 0.
        race.Read(unsafe.Pointer(semap))
    }
    if v < 0 {
        panic("sync: negative WaitGroup counter")
    }
    if w != 0 && delta > 0 && v == int32(delta) {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
    // If v > 0, there are still goroutines that have not called Done yet.
    // If w == 0, no goroutine is waiting for the counter to reach zero.
    // In either case we can return immediately.
    if v > 0 || w == 0 {
        return
    }
    // Reaching this point means that after the Add above, v == 0 and w != 0: the
    // previous round of Done calls has finished, and every goroutine blocked in
    // Wait must now be woken up.
    // If *statep has changed again since it was read, another goroutine has raced
    // an Add (or Wait) against this wake-up, which is WaitGroup misuse.
    if *statep != state {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
    // Reset the state to 0 so the WaitGroup can be reused.
    *statep = 0
    for ; w != 0; w-- {
        // Release all queued waiters.
        runtime_Semrelease(semap, false)
    }
}
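
As a quick illustration of the v < 0 check above, a deliberately broken sketch: calling Done more times than Add trips the panic immediately.

package main

import "sync"

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // panics with "sync: negative WaitGroup counter" (the v < 0 branch)
}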

2.3 Done: complete one unit of work

func (wg *WaitGroup) Done() {
    // Decrement the counter by one.
    wg.Add(-1)
}

2.4 Wait: block until all operations complete

func (wg *WaitGroup) Wait() {
    statep, semap := wg.state()
    if race.Enabled {
        _ = *statep // trigger nil deref early
        race.Disable()
    }
    for {
        // Get state
        state := atomic.LoadUint64(statep)
        v := int32(state >> 32) // Get high 32-bit count
        w := uint32(state) // Get the current number of waiting
        if v == 0 { // Nothing to wait for; return immediately
            // Counter is 0, no need to wait.
            if race.Enabled {
                race.Enable()
                race.Acquire(unsafe.Pointer(wg))
            }
            return
        }
        // CAS the waiter count in the low 32 bits up by one
        if atomic.CompareAndSwapUint64(statep, state, state+1) {
            if race.Enabled && w == 0 {
                // Wait must be synchronized with the first Add.
                // Need to model this is as a write to race with the read in Add.
                // As a consequence, can do the write only for the first waiter,
                // otherwise concurrent Waits will race with each other.
                race.Write(unsafe.Pointer(semap))
            }
            // On success, sleep on the semaphore until woken up
            runtime_Semacquire(semap)
            // If state is not 0 after waking, the WaitGroup was reused before this Wait returned; panic
            if *statep != 0 {
                panic("sync: WaitGroup is reused before previous Wait has returned")
            }
            if race.Enabled {
                race.Enable()
                race.Acquire(unsafe.Pointer(wg))
            }
            return
        }
    }
}
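
The reuse panic at the end of Wait also suggests the safe pattern: a WaitGroup may be reused, but only after the previous Wait has returned. A small sketch:

package main

import "sync"

func main() {
	var wg sync.WaitGroup
	for round := 0; round < 2; round++ {
		wg.Add(1) // start the next round only after the previous Wait has returned
		go func() {
			defer wg.Done()
		}()
		wg.Wait()
	}
}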


More articles www.sreguide.com

