A Brief Analysis of Golang's sync.Pool

Keywords: Go

sync.Pool uses a buffer to store temporary objects, but this buffer is not reliable: every time a GC runs, the buffer is cleared first. So if an object is referenced only by a Pool and nowhere else, it may be collected as garbage.
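A minimal sketch of this behavior (illustrative values; the pre-Go 1.13 implementation analyzed in this post clears everything on a single GC, while newer runtimes keep a victim cache for one extra cycle, hence the second GC call below):

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    pool := sync.Pool{
        // New is called when Get finds the pool empty.
        New: func() interface{} { return "fresh allocation" },
    }

    pool.Put("cached value")
    fmt.Println(pool.Get()) // "cached value": served from the pool

    pool.Put("cached value")
    runtime.GC() // poolCleanup runs at the start of GC and empties the pool
    runtime.GC() // drains the victim cache on newer runtimes as well
    fmt.Println(pool.Get()) // "fresh allocation": New had to be called again
}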

Concept

A Pool is a set of temporary objects that may be individually saved and retrieved.

Any item stored in the Pool may be removed automatically at any time without notification. If the Pool holds the only reference when this happens, the item might be deallocated.

A Pool is safe for use by multiple goroutines simultaneously.

Pool's purpose is to cache allocated but unused items for later reuse, relieving pressure on the garbage collector. That is, it makes it easy to build efficient, thread-safe free lists. However, it is not suitable for all free lists.

An appropriate use of a Pool is to manage a group of temporary items silently shared among and potentially reused by concurrent independent clients of a package. Pool provides a way to amortize allocation overhead across many clients.

An example of good use of a Pool is in the fmt package, which maintains a dynamically-sized store of temporary output buffers. The store scales under load (when many goroutines are actively printing) and shrinks when quiescent.
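In the same spirit, a common pattern (a sketch of the idea, not fmt's actual code; assumes the bytes and sync packages are imported) is to pool output buffers and reset them before reuse:

var bufPool = sync.Pool{
    New: func() interface{} { return new(bytes.Buffer) },
}

func render(s string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset() // a reused buffer may still hold old contents
    defer bufPool.Put(buf)
    buf.WriteString("rendered: ")
    buf.WriteString(s)
    return buf.String() // the string is copied out before Put runs
}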

[Figure: the internal structure of sync.Pool]

The structure of sync.Pool is shown in the figure above.
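For reference, here are the relevant definitions from the source of this era (roughly Go 1.9–1.12, the version this post analyzes):

type Pool struct {
    noCopy noCopy

    local     unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
    localSize uintptr        // size of the local array

    // New optionally specifies a function to generate
    // a value when Get would otherwise return nil.
    New func() interface{}
}

// Local per-P Pool appendix.
type poolLocalInternal struct {
    private interface{}   // Can be used only by the respective P.
    shared  []interface{} // Can be used by any P.
    Mutex                 // Protects shared.
}

type poolLocal struct {
    poolLocalInternal

    // Prevents false sharing on widespread platforms with
    // 128 mod (cache line size) = 0.
    pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}

With these definitions in mind, two questions arise: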

  1. When we instantiate a sync.Pool, why is an array of poolLocal allocated, and how do we decide which element of that array a given piece of data should be stored in?
  2. poolLocalInternal has two members, private and shared. Why are they kept separate?

Source code analysis

Put

Put()

func (p *Pool) Put(x interface{}) {
    if x == nil {
        return
    }
    // Race-detector bookkeeping; it can be ignored for now
    if race.Enabled {
        if fastrand()%4 == 0 {
            // Randomly drop x on floor.
            return
        }
        race.ReleaseMerge(poolRaceAddr(x))
        race.Disable()
    }
    
    // Pin to the current P and fetch the poolLocal that belongs to it; analyzed in detail below
    l := p.pin()
    // If the private field is empty, store x there first
    if l.private == nil {
        l.private = x
        x = nil
    }
    runtime_procUnpin()
    // If private was already occupied, append x to the shared slice; since shared can be accessed by other goroutines, a lock is required here
    if x != nil {
        l.Lock()
        l.shared = append(l.shared, x)
        l.Unlock()
    }
    if race.Enabled {
        race.Enable()
    }
}

pin()

func (p *Pool) pin() *poolLocal {
    // Pin the goroutine to its P and return the P's id; the value ranges over [0, GOMAXPROCS(0))
    pid := runtime_procPin()
    // Length of the poolLocal array
    s := atomic.LoadUintptr(&p.localSize) // load-acquire
    l := p.local                          // load-consume
    // If pid is less than the length of the poolLocal array, return the corresponding poolLocal
    if uintptr(pid) < s {
        return indexLocal(l, pid)
    }
    // Otherwise the array is missing or too small; fall back to pinSlow(), discussed below
    return p.pinSlow()
}

runtime_procPin() pins the current goroutine to its P and returns the P's id. I did not track down its concrete implementation, but the range of values it returns is determined by runtime.GOMAXPROCS(0). For background, see the article "Golang's Scheduling Mechanism and GOMAXPROCS Performance Tuning"; it is not covered in depth here.
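For completeness, indexLocal(), which pin() uses to locate the slot, is plain pointer arithmetic into the poolLocal array in the source of this era:

func indexLocal(l unsafe.Pointer, i int) *poolLocal {
    lp := unsafe.Pointer(uintptr(l) + uintptr(i)*unsafe.Sizeof(poolLocal{}))
    return (*poolLocal)(lp)
}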

pinSlow()

func (p *Pool) pinSlow() *poolLocal {
    // Retry under the mutex.
    // Can not lock the mutex while pinned.
    runtime_procUnpin()
    allPoolsMu.Lock()
    defer allPoolsMu.Unlock()
    pid := runtime_procPin()
    // poolCleanup won't be called while we are pinned.
    // Check again whether the poolLocal array now covers this pid, in case another goroutine re-allocated it in the meantime
    s := p.localSize
    l := p.local
    if uintptr(pid) < s {
        return indexLocal(l, pid)
    }
    // If local is nil, this is a newly constructed Pool; register it in the allPools slice
    if p.local == nil {
        allPools = append(allPools, p)
    }
    // If GOMAXPROCS changes between GCs, we re-allocate the array and lose the old one.
    // Read GOMAXPROCS and size the new poolLocal array accordingly
    size := runtime.GOMAXPROCS(0)
    local := make([]poolLocal, size)
    atomic.StorePointer(&p.local, unsafe.Pointer(&local[0])) // store-release
    atomic.StoreUintptr(&p.localSize, uintptr(size))         // store-release
    // Return the slot belonging to the current P
    return &local[pid]
}

Put logic

To sum up, Put's basic logic is:

  • Get the id of the P that is currently executing the goroutine.
  • Use that id as an index to find the corresponding poolLocal, and work with the poolLocalInternal embedded in it.
  • Store into the private field of poolLocalInternal first; if it is already occupied, append to the shared slice.

Get

Get()

func (p *Pool) Get() interface{} {
    if race.Enabled {
        race.Disable()
    }
    
    // Get LocalPool
    l := p.pin()
    
    // Take the private value and clear the field. If private held data it will be returned below; if it was already nil, clearing it is a harmless no-op.
    x := l.private
    l.private = nil
    runtime_procUnpin()
    // If there is no data in private, look in shared.
    if x == nil {
        l.Lock()
        last := len(l.shared) - 1
        if last >= 0 {
            x = l.shared[last]
            l.shared = l.shared[:last]
        }
        l.Unlock()
        // If this P's poolLocal has no data either, call getSlow() to steal from the shared slices of the other poolLocals; getSlow is analyzed below
        if x == nil {
            x = p.getSlow()
        }
    }
    if race.Enabled {
        race.Enable()
        if x != nil {
            race.Acquire(poolRaceAddr(x))
        }
    }
    // If nothing was found in private, in shared, or in any other poolLocal's shared, and a New function is registered, call it
    if x == nil && p.New != nil {
        x = p.New()
    }
    return x
}
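Note that when the pool is empty and no New function was registered, Get simply returns nil, so the caller must handle that case itself. A minimal sketch (getBuf is an illustrative helper, not part of the package):

var p sync.Pool // no New function registered

func getBuf() []byte {
    if v := p.Get(); v != nil {
        return v.([]byte)
    }
    return make([]byte, 0, 1024) // pool was empty: allocate manually
}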

getSlow()

func (p *Pool) getSlow() (x interface{}) {
    // See the comment in pin regarding ordering of the loads.
    // Read the size of the poolLocal array
    size := atomic.LoadUintptr(&p.localSize) // load-acquire
    local := p.local                         // load-consume
    // Try to steal one element from other procs.
    pid := runtime_procPin()
    runtime_procUnpin()
    // Iterate over the other poolLocals, popping from the first non-empty shared slice
    for i := 0; i < int(size); i++ {
        l := indexLocal(local, (pid+i+1)%int(size))
        l.Lock()
        last := len(l.shared) - 1
        if last >= 0 {
            x = l.shared[last]
            l.shared = l.shared[:last]
            l.Unlock()
            break
        }
        l.Unlock()
    }
    return x
}

From the above logic we can see that data in shared may be taken by other Ps, while data in private may not; this is why accessing shared requires the lock.

poolCleanup

This function, provided in the sync package, cleans up Pools, but the official implementation is fairly blunt.

func poolCleanup() {
    // This function is called with the world stopped, at the beginning of a garbage collection.
    // It must not allocate and probably should not call any runtime functions.
    // Defensively zero out everything, 2 reasons:
    // 1. To prevent false retention of whole Pools.
    // 2. If GC happens while a goroutine works with l.shared in Put/Get,
    //    it will retain whole Pool. So next cycle memory consumption would be doubled.
    // Iterate over all registered Pools
    for i, p := range allPools {
        allPools[i] = nil
        // Traverse each poolLocal inside the Pool and clear its contents
        for i := 0; i < int(p.localSize); i++ {
            l := indexLocal(p.local, i)
            l.private = nil
            for j := range l.shared {
                l.shared[j] = nil
            }
            l.shared = nil
        }
        p.local = nil
        p.localSize = 0
    }
    // Empty allPools
    allPools = []*Pool{}
}

This function is called at the beginning of each GC cycle, which explains the sentence from the official documentation quoted below.

Any item stored in the Pool may be removed automatically at any time without
notification. If the Pool holds the only reference when this happens, the
item might be deallocated.

If a piece of data is referenced only by a Pool, you have to be prepared for it to be reclaimed by the GC.

Problem analysis

Here is a brief analysis of the two questions raised above.

When we instantiate a sync.Pool, why is an array of poolLocal allocated, and how do we decide which element of that array a given piece of data should be stored in?

poolLocals are distinguished by P id so that private data stays thread-safe without locking. At run time, the current P's id is obtained and used as the index into the poolLocal array to find the corresponding slot.

poolLocalInternal has two members, private and shared. Why are they kept separate?

private belongs exclusively to its P and can be accessed without locking; shared can be accessed (and stolen) by other Ps, so it is protected by a mutex.
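None of this machinery is visible to callers: user code never touches private or shared directly, it just calls Get and Put from any number of goroutines. A small usage sketch (assumes bytes, fmt, and sync are imported):

var pool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func worker(id int, wg *sync.WaitGroup) {
    defer wg.Done()
    buf := pool.Get().(*bytes.Buffer) // served from this P's poolLocal when possible
    buf.Reset()
    fmt.Fprintf(buf, "goroutine %d", id)
    pool.Put(buf) // returned to this P's private slot or shared slice
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go worker(i, &wg)
    }
    wg.Wait()
}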
