Deep Understanding of Go Memory Allocation


The Go runtime abandons the traditional approach of calling the system allocator for every allocation and manages heap memory itself. The design was originally based on tcmalloc, although it has since diverged considerably. Managing memory in the runtime enables better usage patterns such as memory pools and pre-allocation, which avoid the performance cost of frequent system calls.

Before digging into Go's allocator, it helps to review the basic strategies that most memory allocators share.

Basic strategies (a toy sketch of this scheme follows the list):

  1. Request a large chunk of memory from the operating system at a time, to reduce system calls.
  2. Pre-cut the large chunk into small blocks of specific sizes and link them into free lists.
  3. To allocate an object, take a block from the free list of the appropriate size.
  4. When an object is destroyed, return its block to the original free list for reuse.
  5. If too much memory sits idle, return part of it to the operating system to reduce overall overhead.
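
To make the strategy concrete, here is a toy fixed-size allocator written for this article (all names are invented for the illustration; it is not runtime code). It grabs one slab up front, cuts it into equal blocks, serves allocations from a free list, and returns freed blocks to that list:

package main

import "fmt"

// blockAllocator carves one big slab into fixed-size blocks and keeps
// the offsets of free blocks in a simple free list.
type blockAllocator struct {
    slab      []byte // one large chunk, acquired "from the OS" once
    blockSize int
    freeList  []int // offsets of free blocks inside the slab
}

func newBlockAllocator(slabBytes, blockSize int) *blockAllocator {
    a := &blockAllocator{slab: make([]byte, slabBytes), blockSize: blockSize}
    for off := 0; off+blockSize <= slabBytes; off += blockSize {
        a.freeList = append(a.freeList, off)
    }
    return a
}

// alloc hands out one block, identified by its offset into the slab.
func (a *blockAllocator) alloc() (off int, ok bool) {
    if len(a.freeList) == 0 {
        return 0, false // a real allocator would grow the slab here
    }
    off = a.freeList[len(a.freeList)-1]
    a.freeList = a.freeList[:len(a.freeList)-1]
    return off, true
}

// free returns a block to the free list so it can be reused.
func (a *blockAllocator) free(off int) {
    a.freeList = append(a.freeList, off)
}

func main() {
    a := newBlockAllocator(64, 16) // a 64-byte slab cut into 16-byte blocks
    off, _ := a.alloc()
    copy(a.slab[off:off+a.blockSize], "hello")
    fmt.Println(string(a.slab[off : off+5])) // hello
    a.free(off)
}

Go's allocator follows the same outline, but with per-size-class free lists, per-thread caches, and a central heap, as we will see below.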

With that in mind, let's look at how Go's allocation strategy resembles and differs from this scheme, from the source code's point of view.

Preliminaries

Before tracing the source code, we need to understand some concepts and structures first.

  • span: a large block of memory made up of pages with consecutive addresses
  • object: a span is cut into small pieces of a specific size, each of which holds one object

Object Classification

  • Tiny objects: size < 16 bytes
  • Small (ordinary) objects: 16 bytes ~ 32 KB
  • Large objects: size > 32 KB

Size conversion

The runtime converts between a requested size, its size class, and page counts through lookup tables (size_to_class8, size_to_class128, class_to_size, class_to_allocnpages), which appear in the allocation code below. For example, a 24-byte request is rounded up to the 32-byte size class.

Data structures

mHeap

mheap represents all of the heap space held by the Go program. The runtime manages the heap through a single global instance, mheap_.

type mheap struct {
    lock      mutex
    free      [_MaxMHeapList]mSpanList // free lists of spans, indexed by page count (spans with fewer than 128 pages)
    freelarge mTreap                   // treap of free large spans (128 pages or more)
    busy      [_MaxMHeapList]mSpanList // lists of in-use spans with fewer than 128 pages
    busylarge mSpanList                // list of in-use large spans (128 pages or more)

    // allspans is a slice of all mspans ever created. Each mspan
    // appears exactly once.
    // slice of all created mspan
    allspans []*mspan // all spans out there

    // arenas is the heap arena map. It points to the metadata for
    // the heap for every arena frame of the entire usable virtual
    // address space.
    //
    // Use arenaIndex to compute indexes into this array.
    //
    // For regions of the address space that are not backed by the
    // Go heap, the arena map contains nil.
    //
    // Modifications are protected by mheap_.lock. Reads can be
    // performed without locking; however, a given entry can
    // transition from nil to non-nil at any time when the lock
    // isn't held. (Entries never transitions back to nil.)
    //
    // In general, this is a two-level mapping consisting of an L1
    // map and possibly many L2 maps. This saves space when there
    // are a huge number of arena frames. However, on many
    // platforms (even 64-bit), arenaL1Bits is 0, making this
    // effectively a single-level map. In this case, arenas[0]
    // will never be nil.
    // The set of heapArenas. Each heapArena covers pagesPerArena pages and holds the per-arena metadata (span map, GC bitmap) that mheap and the garbage collector use. heapArena is described below.
    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

    // heapArenaAlloc is pre-reserved space for allocating heapArena
    // objects. This is only used on 32-bit, where we pre-reserve
    // this space to avoid interleaving it with the heap itself.
    // pre-reserved space for heapArena objects (32-bit only)
    heapArenaAlloc linearAlloc

    // arenaHints is a list of addresses at which to attempt to
    // add more heap arenas. This is initially populated with a
    // set of general hint addresses, and grown with the bounds of
    // actual heap arena ranges.
    arenaHints *arenaHint

    // arena is a pre-reserved space for allocating heap arenas
    // (the actual arenas). This is only used on 32-bit.
    // Only 32-bit usage
    arena linearAlloc

    //_ uint32 // ensure 64-bit alignment of central

    // central free lists for small size classes.
    // the padding makes sure that the MCentrals are
    // spaced CacheLineSize bytes apart, so that each MCentral.lock
    // gets its own cache line.
    // central is indexed by spanClass.
    // The central free lists. When an mcache runs out of memory for a span class, it refills from the matching mcentral here.
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize - unsafe.Sizeof(mcentral{})%sys.CacheLineSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    treapalloc            fixalloc // allocator for treapNodes* used by large objects
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.
    arenaHintAlloc        fixalloc // allocator for arenaHints

    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

mSpanList

A doubly linked list of mspans. The free, busy, and busylarge lists above chain their mspans together through this structure.

type mSpanList struct {
    first *mspan // first span in list, or nil if none
    last  *mspan // last span in list, or nil if none
}

mSpan

mspan is the basic unit of memory management in Go: a block of memory made up of contiguous 8 KB pages. Note that these pages are not the operating system's pages; a Go page is usually a multiple of the OS page size. In one sentence: mspan is a node in a doubly linked list that records the span's start address, its size class, its page count, and so on. A small sketch after the struct shows the address and bitmap arithmetic described in the field comments.

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    // number of pages contained in the span
    npages    uintptr // number of pages in span

    manualFreeList gclinkptr // list of free objects in _MSpanManual spans

    // freeindex is the slot index between 0 and nelems at which to begin scanning
    // for the next free object in this span.
    // Each allocation scans allocBits starting at freeindex until it encounters a 0
    // indicating a free object. freeindex is then adjusted so that subsequent scans begin
    // just past the newly discovered free object.
    //
    // If freeindex == nelem, this span has no free objects.
    //
    // allocBits is a bitmap of objects in this span.
    // If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
    // then object n is free;
    // otherwise, object n is allocated. Bits starting at nelem are
    // undefined and should never be referenced.
    //
    // Object n starts at address n*elemsize + (start << pageShift).
    // used to locate the next free object; ranges from 0 to nelems
    freeindex uintptr
    // TODO: Look up nelems from sizeclass and remove this field if it
    // helps performance.
    // number of objects in the span
    nelems uintptr // number of object in the span.

    // Cache of the allocBits at freeindex. allocCache is shifted
    // such that the lowest bit corresponds to the bit freeindex.
    // allocCache holds the complement of allocBits, thus allowing
    // ctz (count trailing zero) to use it directly.
    // allocCache may contain bits beyond s.nelems; the caller must ignore
    // these.
    // caches the complement of allocBits starting at freeindex, so ctz (count trailing zeros) can quickly compute the index of the next free object
    allocCache uint64

    // allocation bitmap; each bit records whether the corresponding object slot is allocated
    allocBits  *gcBits

    // number of allocated objects
    allocCount  uint16     // number of allocated objects

    elemsize    uintptr    // computed from sizeclass or from npages

}
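
To tie the field comments together, here is a standalone sketch (not runtime code; all names are invented) showing how allocBits and elemsize determine which object addresses in a span are free: a clear bit means the slot is free, and object n lives at base + n*elemsize.

package main

import "fmt"

// freeObjects lists the addresses of free slots in a span, given its base
// address, element size, element count, and allocation bitmap.
func freeObjects(base, elemsize uintptr, nelems int, allocBits []byte) []uintptr {
    var free []uintptr
    for n := 0; n < nelems; n++ {
        if allocBits[n/8]&(1<<(uint(n)%8)) == 0 { // bit clear => object n is free
            free = append(free, base+uintptr(n)*elemsize)
        }
    }
    return free
}

func main() {
    // 8 objects of 32 bytes each; objects 0 and 3 are allocated
    // (allocBits = binary 00001001).
    fmt.Println(freeObjects(0x1000, 32, 8, []byte{0x09}))
}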

spanClass

The span's class ID, which encodes both the size class from the class table and whether the span holds pointer-free (noscan) objects (a sketch of the encoding follows the type definition).

type spanClass uint8
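
A spanClass packs the size class and a noscan flag into one byte. The following is a minimal sketch of that encoding, paraphrasing the runtime's makeSpanClass/sizeclass/noscan helpers (the names here are local to the sketch):

type spanClassSketch uint8

// makeSpanClassSketch packs a size class and the noscan flag into one byte:
// the size class in the high 7 bits, noscan in the low bit.
func makeSpanClassSketch(sizeclass uint8, noscan bool) spanClassSketch {
    sc := spanClassSketch(sizeclass << 1)
    if noscan {
        sc |= 1
    }
    return sc
}

func (sc spanClassSketch) sizeclass() int8 { return int8(sc >> 1) }
func (sc spanClassSketch) noscan() bool    { return sc&1 != 0 }

This is why mallocgc calls makeSpanClass(sizeclass, noscan) below, and why numSpanClasses is twice the number of size classes.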

mTreap

This is a treap (a randomized balanced tree) of mspans, used for freelarge; searching it for a span with enough pages is faster than walking a linked list.

type mTreap struct {
    treap *treapNode
}

mtreapNode

A node of the mTreap. Each node carries an mspan plus its left and right children and other bookkeeping.

type treapNode struct {
    right     *treapNode // all treapNodes > this treap node
    left      *treapNode // all treapNodes < this treap node
    parent    *treapNode // direct parent of this node, nil if root
    npagesKey uintptr    // number of pages in spanKey, used as primary sort key
    spanKey   *mspan     // span of size npagesKey, used as secondary sort key
    priority  uint32     // random number used by treap algorithm to keep tree probabilistically balanced
}

heapArena

heapArena stores an arena's metadata, and mheap.arenas is the collection of all heapArenas. All allocated heap memory lives inside arenas; roughly, arenas[L1][L2] = *heapArena. Given an allocated address, arenaIndex computes L1 and L2, and arenas[L1][L2] yields the corresponding heapArena (a simplified sketch of this mapping follows the struct definition).

type heapArena struct {
    // bitmap stores the pointer/scalar bitmap for the words in
    // this arena. See mbitmap.go for a description. Use the
    // heapBits type to access this.
    bitmap [heapArenaBitmapBytes]byte

    // spans maps from virtual address page ID within this arena to *mspan.
    // For allocated spans, their pages map to the span itself.
    // For free spans, only the lowest and highest pages map to the span itself.
    // Internal pages map to an arbitrary span.
    // For pages that have never been allocated, spans entries are nil.
    //
    // Modifications are protected by mheap.lock. Reads can be
    // performed without locking, but ONLY from indexes that are
    // known to contain in-use or stack spans. This means there
    // must not be a safe-point between establishing that an
    // address is live and looking it up in the spans array.
    spans [pagesPerArena]*mspan
}
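
As a rough illustration of the address-to-arena mapping, here is a simplified sketch only: it assumes 64 MB arenas and a single-level arena map, as on linux/amd64, and ignores the base offset the real runtime applies on some platforms.

const heapArenaBytesSketch = 64 << 20 // assumption: 64 MB per arena (linux/amd64)

// arenaIndexSketch maps an address to its index in a single-level arena map,
// i.e. roughly which arenas[0][i] entry describes the address.
func arenaIndexSketch(p uintptr) uintptr {
    return p / heapArenaBytesSketch
}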

arenaHint

An arenaHint records an address at which the runtime may try to add a new heap arena.

type arenaHint struct {
    addr uintptr
    // if down is true, the arena grows downward from addr
    down bool
    next *arenaHint
}

mcentral

mcentral is a global resource shared by all threads (more precisely, by every mcache). When an mcache runs out of memory it requests more from mcentral, and memory released by an mcache is reclaimed into mcentral.

type mcentral struct {
    lock      mutex
    spanclass spanClass
    // list of spans that still have free objects
    nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
    // list of spans with no free objects
    empty     mSpanList // list of spans with no free objects (or cached in an mcache)

    // nmalloc is the cumulative count of objects allocated from
    // this mcentral, assuming all spans in mcaches are
    // fully-allocated. Written atomically, read under STW.
    nmalloc uint64
}

Structural diagram

Next, a high-level diagram of how the structures above relate to each other also gives a rough preview of the allocation flow. After working through the code below, it is worth coming back to this picture; the whole allocation path should then be much clearer.

Initialization

func mallocinit() {
    // Initialize the heap.
    // Initialize mheap
    mheap_.init()
    _g_ := getg()
    // Get the mcache of the m that the current g is running on and initialize it
    _g_.m.mcache = allocmcache()
    for i := 0x7f; i >= 0; i-- {
        var p uintptr
        switch {
        case GOARCH == "arm64" && GOOS == "darwin":
            p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
        case GOARCH == "arm64":
            p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
        case raceenabled:
            // The TSAN runtime requires the heap
            // to be in the range [0x00c000000000,
            // 0x00e000000000).
            p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
            if p >= uintptrMask&0x00e000000000 {
                continue
            }
        default:
            p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
        }
        // Record the hint address at which the heap may grow an arena
        hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
        hint.addr = p
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
    }
    // ... (remaining initialization omitted)
}

mheap.init

func (h *mheap) init() {
    h.treapalloc.init(unsafe.Sizeof(treapNode{}), nil, nil, &memstats.other_sys)
    h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
    h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
    h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
    h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
    h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)

    // Don't zero mspan allocations. Background sweeping can
    // inspect a span concurrently with allocating it, so it's
    // important that the span's sweepgen survive across freeing
    // and re-allocating a span to prevent background sweeping
    // from improperly cas'ing it from 0.
    //
    // This is safe because mspan contains no heap pointers.
    h.spanalloc.zero = false

    // h->mapcache needs no init
    for i := range h.free {
        h.free[i].init()
        h.busy[i].init()
    }

    h.busylarge.init()
    for i := range h.central {
        h.central[i].mcentral.init(spanClass(i))
    }
}

mcentral.init

Initialize the mcentral for one span class.

// Initialize a single central free list.
func (c *mcentral) init(spc spanClass) {
    c.spanclass = spc
    c.nonempty.init()
    c.empty.init()
}

allocmcache

Initialization of mcache (an abridged view of the mcache fields used later follows the function).

func allocmcache() *mcache {
    lock(&mheap_.lock)
    c := (*mcache)(mheap_.cachealloc.alloc())
    unlock(&mheap_.lock)
    for i := range c.alloc {
        c.alloc[i] = &emptymspan
    }
    c.next_sample = nextSample()
    return c
}
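
mcache itself is not listed among the structures above, but the allocation code relies on a few of its fields (c.alloc, c.tiny, c.tinyoffset). For orientation, here is an abridged view of those fields; most fields are omitted and the exact layout depends on the Go version:

type mcache struct {
    tiny       uintptr                // base address of the current 16-byte tiny block
    tinyoffset uintptr                // next free offset inside the tiny block
    alloc      [numSpanClasses]*mspan // one cached span per spanClass; mallocgc allocates from these
    // ... other fields (stats, stack caches, next_sample, ...) omitted
}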

fixalloc.alloc

fixalloc is a fixed-size allocator. It is used to allocate the runtime's own bookkeeping structures such as mspan and mcache (the memory those structures manage comes from other allocators). The idea is simple: grab a large chunk up front, carve off one fixed-size piece per allocation, and push freed pieces onto a free list. Because the size never changes, there is no memory fragmentation.

func (f *fixalloc) alloc() unsafe.Pointer {
    if f.size == 0 {
        print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
        throw("runtime: internal error")
    }
    
  // If the list is not empty, take it directly
    if f.list != nil {
        v := unsafe.Pointer(f.list)
        f.list = f.list.next
        f.inuse += f.size
        if f.zero {
            memclrNoHeapPointers(v, f.size)
        }
        return v
    }
  // If the current chunk does not have enough space left, allocate a new chunk from the system (persistentalloc)
    if uintptr(f.nchunk) < f.size {
        f.chunk = uintptr(persistentalloc(_FixAllocChunk, 0, f.stat))
        f.nchunk = _FixAllocChunk
    }
    // Allocate a fixed size from the chunk, and when released, it returns to the list.
    v := unsafe.Pointer(f.chunk)
    if f.first != nil {
        f.first(f.arg, v)
    }
    f.chunk = f.chunk + f.size
    f.nchunk -= uint32(f.size)
    f.inuse += f.size
    return v
}

Initialization is simple:

  1. Initialize the heap: the per-page-count free lists, the freelarge treap, and the busy/busylarge lists
  2. Initialize an mcentral for every span class
  3. Initialize the mcache and point every span class slot at the empty placeholder span
  4. Initialize arenaHints with a set of candidate addresses (on linux/amd64 the default case yields 0x00c000000000, 0x01c000000000, ..., 0x7fc000000000; because each hint is pushed onto the front of the list, 0x00c000000000 is tried first), then grow them with the bounds of the actual arenas

Allocation

newObject

func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}
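
As a quick way to see this entry point in action: values the compiler decides must live on the heap go through newobject/mallocgc. Building the snippet below with `go build -gcflags='-m'` reports that the composite literal escapes to the heap (the type and function names here are just an illustration):

package main

type point struct{ x, y int }

// newPoint returns a pointer that outlives the function call, so the
// compiler allocates the value on the heap via newobject/mallocgc.
func newPoint() *point {
    return &point{x: 1, y: 2}
}

func main() {
    _ = newPoint()
}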

mallocgc

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    
    // Set mp.mallocing to keep from being preempted by GC.
    // Locking to prevent being preempted by GC
    mp := acquirem()
    if mp.mallocing != 0 {
        throw("malloc deadlock")
    }
    if mp.gsignal == getg() {
        throw("malloc during signal")
    }
    mp.mallocing = 1

    shouldhelpgc := false
    dataSize := size
    // Get the mcache of the current thread
    c := gomcache()
    var x unsafe.Pointer
    
    // noscan is true when the type is nil or contains no pointers
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    if size <= maxSmallSize {
        if noscan && size < maxTinySize {
            // Tiny-object allocation path (pointer-free objects smaller than 16 bytes)
            
            // Alignment: adjust the offset inside the tiny block (a standalone sketch of this rule follows the function)
            off := c.tinyoffset
            // Align tiny pointer for required (conservative) alignment.
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            // If the memory space of the tiny block bound to the current mcache is sufficient, allocate it directly and return it
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // Allocate a new maxTinySize block.
            // The current mcache's tiny block does not have enough room; allocate a new tiny block
            span := c.alloc[tinySpanClass]
            
            // Fast path: try to take a slot from allocCache; 0 means nothing was found there
            v := nextFreeFast(span)
            if v == 0 {
                // Fall back to nextFree: it fetches a fresh span of the right spanClass from mcentral, swaps out the exhausted span, and allocates from the new one (nextFree is analyzed below)
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the new one
            // based on amount of remaining free space.
            if size < c.tinyoffset || c.tiny == 0 {
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
        } else {
            // Small (ordinary) object allocation path
            
            // First look up the table to determine the sizeclass
            var sizeclass uint8
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            // Find span corresponding to sizeclass
            span := c.alloc[spc]
            // As in the tiny path, first try allocCache; 0 means nothing free was found there
            v := nextFreeFast(span)
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // Large-object allocation path.
        // Large objects are handled differently from tiny and small objects: they are allocated directly from mheap.
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }
    
    // bitmap marking omitted...
    // check whether a GC cycle needs to be started, then return

    return x
}
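
The offset rounding at the top of the tiny path is easy to check in isolation. Below is a standalone sketch of the same rule (round(off, n) in the runtime is (off+n-1) &^ (n-1)); names are local to the sketch:

package main

import "fmt"

// tinyAlign reproduces the alignment rule of the tiny allocator: the offset
// inside the 16-byte tiny block is rounded up to 8, 4 or 2 bytes depending
// on the size being allocated.
func tinyAlign(off, size uintptr) uintptr {
    switch {
    case size&7 == 0:
        return (off + 7) &^ 7
    case size&3 == 0:
        return (off + 3) &^ 3
    case size&1 == 0:
        return (off + 1) &^ 1
    }
    return off
}

func main() {
    // A 4-byte object placed after a 1-byte object starts at offset 4, not 1,
    // so both allocations can still share one 16-byte tiny block.
    fmt.Println(tinyAlign(1, 4)) // 4
}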

Tidying up the basic idea of this code:

  1. First, decide whether the object is tiny, small (ordinary), or large.
  2. If it's a tiny object

    1. Check the tiny block cached in mcache; if the new object fits, bump tinyoffset and return.
    2. Otherwise take the span for tinySpanClass from mcache's alloc array and carve a new 16-byte block out of it (nextFreeFast)
    3. If that span has no free slot, fetch a replacement span of the same class from mcentral and allocate from it (nextFree)
  3. If it's a small object, the logic is roughly the same as for a tiny object.

    1. First look up the size tables to determine the object's size class, and find the mspan for that class in mcache
    2. If the current mspan has enough space, allocate and update the span's bookkeeping (implemented in the nextFreeFast function)
    3. If the current mspan does not have enough space, retrieve an mspan of the same class from mcentral, replace the original mspan, then allocate and update its bookkeeping
  4. If it's a large object, it is allocated directly from mheap. The implementation relies on the largeAlloc function, so let's follow that function first.

largeAlloc

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
    // print("largeAlloc size=", size, "\n")
    
  // Overflow check
    if size+_PageSize < size {
        throw("out of memory")
    }
  
  // Calculate the number of pages the object needs (a quick check of this arithmetic follows the function)
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }

    // Deduct credit for this span allocation and sweep if
    // necessary. mHeap_Alloc will also sweep npages, so this only
    // pays the debt down to npage pages.
    deductSweepCredit(npages*_PageSize, npages)
    
  // The actual allocation: get a span of npages pages from mheap
    s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
    if s == nil {
        throw("out of memory")
    }
    s.limit = s.base() + size
  // Initialize the heap bitmap for the newly allocated span
    heapBitsForAddr(s.base()).initSpan(s)
    return s
}
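
A quick check of the page arithmetic above, with the constants written out (Go pages are 8 KB, so the runtime's _PageShift is 13): a 33 KB object needs ceil(33/8) = 5 pages.

package main

import "fmt"

const (
    pageShift = 13 // 8 KB pages, like the runtime's _PageShift
    pageMask  = (1 << pageShift) - 1
)

// pagesFor mirrors the npages computation in largeAlloc.
func pagesFor(size uintptr) uintptr {
    npages := size >> pageShift
    if size&pageMask != 0 {
        npages++
    }
    return npages
}

func main() {
    fmt.Println(pagesFor(33 << 10)) // 5
}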

mheap.alloc

func (h *mheap) alloc(npage uintptr, spanclass spanClass, large bool, needzero bool) *mspan {
    // Don't do any operations that lock the heap on the G stack.
    // It might trigger stack growth, and the stack growth code needs
    // to be able to allocate heap.
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, spanclass, large)
    })

    if s != nil {
        if needzero && s.needzero != 0 {
            memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
        }
        s.needzero = 0
    }
    return s
}

mheap.alloc_m

Allocates a new span of npage pages from the heap and records its size class so that interior pointers can later be mapped back to the containing span.

func (h *mheap) alloc_m(npage uintptr, spanclass spanClass, large bool) *mspan {
    _g_ := getg()
    if _g_ != _g_.m.g0 {
        throw("_mheap_alloc not on g0 stack")
    }
    lock(&h.lock)

    // Sweeping and memory-accounting code omitted.
    
    // Get the span of the specified number of pages from heap
    s := h.allocSpanLocked(npage, &memstats.heap_inuse)
    if s != nil {
        // Record span info, because gc needs to be
        // able to map interior pointer to containing span.
        atomic.Store(&s.sweepgen, h.sweepgen)
        h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list. //Ignore
        s.state = _MSpanInUse
        s.allocCount = 0
        s.spanclass = spanclass
    // Reset the state of span
        if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
            s.elemsize = s.npages << _PageShift
            s.divShift = 0
            s.divMul = 0
            s.divShift2 = 0
            s.baseMask = 0
        } else {
            s.elemsize = uintptr(class_to_size[sizeclass])
            m := &class_to_divmagic[sizeclass]
            s.divShift = m.shift
            s.divMul = m.mul
            s.divShift2 = m.shift2
            s.baseMask = m.baseMask
        }

        // update stats, sweep lists
        h.pagesInUse += uint64(npage)
        if large {
      // Update the properties of large objects in mheap
            memstats.heap_objects++
            mheap_.largealloc += uint64(s.elemsize)
            mheap_.nlargealloc++
            atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
            // Swept spans are at the end of lists.
      // Judge whether it's a busy or a busylarge list based on the number of pages, and append it to the end
            if s.npages < uintptr(len(h.busy)) {
                h.busy[s.npages].insertBack(s)
            } else {
                h.busylarge.insertBack(s)
            }
        }
    }
    // gc trace tag, omit...
    unlock(&h.lock)
    return s
}

mheap.allocSpanLocked

Allocate a span with the given number of pages and remove it from the free list.

func (h *mheap) allocSpanLocked(npage uintptr, stat *uint64) *mspan {
    var list *mSpanList
    var s *mspan

    // Try in fixed-size lists up to max.
  // First try to get the span of the specified number of pages, and if not, try more pages.
    for i := int(npage); i < len(h.free); i++ {
        list = &h.free[i]
        if !list.isEmpty() {
            s = list.first
            list.remove(s)
            goto HaveSpan
        }
    }
    // Best fit in list of large spans.
  // Find a suitable span node from freelarge to return, and continue to analyze the function below.
    s = h.allocLarge(npage) // allocLarge removed s from h.freelarge for us
    if s == nil {
    // If a suitable span node cannot be found on freelarge s, it will have to be reassigned from the system.
    // We will continue to analyze this function later.
        if !h.grow(npage) {
            return nil
        }
    // After system allocation, go to freelarge again to find the right node
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }

HaveSpan:
  // We now have a span with at least npage pages
    // Mark span in use.
    
    if s.npages > npage {
        // Trim extra and put it back in the heap.
    // Split off a new span of s.npages - npage pages and return it to the heap (a sketch of this step follows the function)
        t := (*mspan)(h.spanalloc.alloc())
        t.init(s.base()+npage<<_PageShift, s.npages-npage)
    // Update the attributes of the acquired span
        s.npages = npage
        h.setSpan(t.base()-1, s)
        h.setSpan(t.base(), t)
        h.setSpan(t.base()+t.npages*pageSize-1, t)
        t.needzero = s.needzero
        s.state = _MSpanManual // prevent coalescing with s
        t.state = _MSpanManual
        h.freeSpanLocked(t, false, false, s.unusedsince)
        s.state = _MSpanFree
    }
    s.unusedsince = 0
    // Place s in spans and arenas arrays
    h.setSpans(s.base(), npage, s)

    *stat += uint64(npage << _PageShift)
    memstats.heap_idle -= uint64(npage << _PageShift)

    //println("spanalloc", hex(s.start<<_PageShift))
    if s.inList() {
        throw("still in list")
    }
    return s
}
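
The trimming step is worth seeing in isolation: when the free span found is larger than requested, the head of the span is handed out and the tail is split off into a new span that goes back into the heap's free structures. A simplified sketch, with plain values standing in for real mspans:

// spanSketch stands in for an mspan: just a base address and a page count.
type spanSketch struct {
    base   uintptr
    npages uintptr
}

const pageSizeSketch = 8 << 10 // 8 KB pages

// trim splits a free span: the first npage pages are allocated, the rest
// becomes a new span that would be returned to the heap's free structures.
func trim(s spanSketch, npage uintptr) (alloc, rest spanSketch) {
    alloc = spanSketch{base: s.base, npages: npage}
    rest = spanSketch{base: s.base + npage*pageSizeSketch, npages: s.npages - npage}
    return
}

For example, asking for 3 pages from a 10-page span at base 0x1000 hands out [0x1000, 0x7000) and returns a 7-page span starting at 0x7000 to the heap.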

mheap.allocLarge

Find a span with the requested number of pages in mheap's freelarge treap, remove it from the tree, and return nil if none is found.

func (h *mheap) allocLarge(npage uintptr) *mspan {
    // Search treap for smallest span with >= npage pages.
    return h.freelarge.remove(npage)
}

// h.freelarge.remove above calls this function.
// A typical binary-search-tree walk that prefers the smallest span with enough pages.
func (root *mTreap) remove(npages uintptr) *mspan {
    t := root.treap
    for t != nil {
        if t.spanKey == nil {
            throw("treap node with nil spanKey found")
        }
        if t.npagesKey < npages {
            t = t.right
        } else if t.left != nil && t.left.npagesKey >= npages {
            t = t.left
        } else {
            result := t.spanKey
            root.removeNode(t)
            return result
        }
    }
    return nil
}

Note: in the runtime version described by "Go Language Learning Notes", this lookup was still a linear scan over a linked list; the treap is a later optimization.

mheap.grow

In mheap.allocSpanLocked, if no suitable span can be found in freelarge, the heap has to grow by asking the operating system for more memory, so let's continue with the implementation of this function.

func (h *mheap) grow(npage uintptr) bool {
    ask := npage << _PageShift
  // Request memory from the system; sysAlloc is traced next
    v, size := h.sysAlloc(ask)
    if v == nil {
        print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
        return false
    }

    // Create a fake "in use" span and free it, so that the
    // right coalescing happens.
  // Create a span to manage the newly acquired memory
    s := (*mspan)(h.spanalloc.alloc())
    s.init(uintptr(v), size/pageSize)
    h.setSpans(s.base(), s.npages, s)
    atomic.Store(&s.sweepgen, h.sweepgen)
    s.state = _MSpanInUse
    h.pagesInUse += uint64(s.npages)
  // Free the fake in-use span so that it ends up in the heap's free structures with proper coalescing
    h.freeSpanLocked(s, false, true, 0)
    return true
}

mheap.sysAlloc

func (h *mheap) sysAlloc(n uintptr) (v unsafe.Pointer, size uintptr) {
   n = round(n, heapArenaBytes)

   // First, try the arena pre-reservation.
 // Try the pre-reserved arena space first; if it cannot satisfy the request, v is nil
   v = h.arena.alloc(n, heapArenaBytes, &memstats.heap_sys)
   if v != nil {
   // Get the required memory from arena and jump to the mapped operation
       size = n
       goto mapped
   }

   // Try to grow the heap at a hint address.
 // Try to grow the heap at one of the hint addresses
   for h.arenaHints != nil {
       hint := h.arenaHints
       p := hint.addr
       if hint.down {
           p -= n
       }
       if p+n < p {
           // We can't use this, so don't ask.
      // The address arithmetic overflowed, so this hint cannot be used
           v = nil
       } else if arenaIndex(p+n-1) >= 1<<arenaBits {
     // Exceeding heap addressable memory address, cannot be used
           // Outside addressable heap. Can't use.
           v = nil
       } else {
      // The hint looks usable; reserve the region from the operating system (sysReserve)
           v = sysReserve(unsafe.Pointer(p), n)
       }
       if p == uintptr(v) {
           // Success. Update the hint.
           if !hint.down {
               p += n
           }
           hint.addr = p
           size = n
           break
       }
       // Failed. Discard this hint and try the next.
       //
       // TODO: This would be cleaner if sysReserve could be
       // told to only return the requested address. In
       // particular, this is already how Windows behaves, so
       // it would simplify things there.
       if v != nil {
           sysFree(v, n, nil)
       }
       h.arenaHints = hint.next
       h.arenaHintAlloc.free(unsafe.Pointer(hint))
   }

   if size == 0 {
       if raceenabled {
           // The race detector assumes the heap lives in
           // [0x00c000000000, 0x00e000000000), but we
           // just ran out of hints in this region. Give
           // a nice failure.
           throw("too many address space collisions for -race mode")
       }

       // All of the hints failed, so we'll take any
       // (sufficiently aligned) address the kernel will give
       // us.
       v, size = sysReserveAligned(nil, n, heapArenaBytes)
       if v == nil {
           return nil, 0
       }

       // Create new hints for extending this region.
       hint := (*arenaHint)(h.arenaHintAlloc.alloc())
       hint.addr, hint.down = uintptr(v), true
       hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
       hint = (*arenaHint)(h.arenaHintAlloc.alloc())
       hint.addr = uintptr(v) + size
       hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
   }

   // Check for bad pointers or pointers we can't use.
   {
       var bad string
       p := uintptr(v)
       if p+size < p {
           bad = "region exceeds uintptr range"
       } else if arenaIndex(p) >= 1<<arenaBits {
           bad = "base outside usable address space"
       } else if arenaIndex(p+size-1) >= 1<<arenaBits {
           bad = "end outside usable address space"
       }
       if bad != "" {
           // This should be impossible on most architectures,
           // but it would be really confusing to debug.
           print("runtime: memory allocated by OS [", hex(p), ", ", hex(p+size), ") not in usable address space: ", bad, "\n")
           throw("memory reservation exceeds address space limit")
       }
   }

   if uintptr(v)&(heapArenaBytes-1) != 0 {
       throw("misrounded allocation in sysAlloc")
   }

   // Back the reservation.
   sysMap(v, size, &memstats.heap_sys)

mapped:
   // Create arena metadata.
 // According to the address of v, the L1 L2 of arenas is calculated.
   for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
       l2 := h.arenas[ri.l1()]
       if l2 == nil {
      // If the L2 map is nil, allocate it and store it in arenas[L1]
           // Allocate an L2 arena map.
           l2 = (*[1 << arenaL2Bits]*heapArena)(persistentalloc(unsafe.Sizeof(*l2), sys.PtrSize, nil))
           if l2 == nil {
               throw("out of memory allocating heap arena map")
           }
           atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
       }
       
    // If arenas[ri.l1()][ri.l2()] is already non-nil, this arena was initialized before, which must not happen
       if l2[ri.l2()] != nil {
           throw("arena already initialized")
       }
       var r *heapArena
    // Allocate the heapArena metadata (preferring the pre-reserved space)
       r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
       if r == nil {
           r = (*heapArena)(persistentalloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
           if r == nil {
               throw("out of memory allocating heap arena metadata")
           }
       }

       // Store atomically just in case an object from the
       // new heap arena becomes visible before the heap lock
       // is released (which shouldn't happen, but there's
       // little downside to this).
       atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
   }
   // Remaining code omitted.
   return
}

That completes the allocation path for large objects. Now let's look at how tiny and small (ordinary) objects are allocated.

Tiny and Small Object Allocation

The snippet below is the heart of the lookup-and-allocate path for tiny and small objects; we already saw it inside mallocgc. Now let's focus on these two functions.

            span := c.alloc[spc]
            v := nextFreeFast(span)
            if v == 0 {
                v, _, shouldhelpgc = c.nextFree(spc)
            }

nextFreeFast

This function returns the address of a free object in the span, or 0 if one cannot be found quickly (a standalone sketch of the bit trick follows the function).

func nextFreeFast(s *mspan) gclinkptr {
  // Count the trailing zero bits of s.allocCache
    theBit := sys.Ctz64(s.allocCache) // 64 here means no free object is cached in allocCache
    if theBit < 64 {
    
        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
      // Advance freeindex and shift allocCache so its low bit again corresponds to freeindex
            s.allocCache >>= uint(theBit + 1)
            s.freeindex = freeidx
            s.allocCount++
      // Returns the address of the memory found
            return gclinkptr(result*s.elemsize + s.base())
        }
    }
    return 0
}
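
The allocCache trick is simple to demonstrate outside the runtime: a set bit in allocCache means "free" (it caches the complement of allocBits), so counting trailing zeros gives the distance from freeindex to the next free slot. A standalone sketch using the standard library's math/bits instead of sys.Ctz64:

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // Bits 4, 5 and 7 (counting from freeindex) are free: binary 10110000.
    var allocCache uint64 = 0xb0
    freeindex := uintptr(16)

    theBit := bits.TrailingZeros64(allocCache) // 4: slots 0-3 after freeindex are taken
    next := freeindex + uintptr(theBit)
    fmt.Println("next free object index:", next) // 20
}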

mcache.nextFree

If nextFreeFast cannot find a free slot, allocation falls into this function.

nextFree returns an unused object from the cached span if there is one. Otherwise it calls refill, which obtains a span of the corresponding spanClass from mcentral, and then finds an unused object in the new span.

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    // Find the span of the corresponding specification in mcache first
  s = c.alloc[spc]
    shouldhelpgc = false
  // Find the appropriate index in the current span
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
    // FreeIndex = nelems indicates that the current span is full
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
    // Call the refill function to get the available span from mcentral and replace the span in the current mcache
        systemstack(func() {
            c.refill(spc)
        })
        shouldhelpgc = true
        s = c.alloc[spc]
        
    // Find the appropriate index in the new span again
        freeIndex = s.nextFreeIndex()
    }

    if freeIndex >= s.nelems {
        throw("freeIndex is not valid")
    }
    
  // Calculate the memory address and update the properties of span
    v = gclinkptr(freeIndex*s.elemsize + s.base())
    s.allocCount++
    if uintptr(s.allocCount) > s.nelems {
        println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
        throw("s.allocCount > s.nelems")
    }
    return
}

mcache.refill

refill obtains a span of the given spanClass from mcentral and installs it as mcache's cached span for that class.

func (c *mcache) refill(spc spanClass) {
    _g_ := getg()

    _g_.m.locks++
    // Return the current cached span to the central lists.
    s := c.alloc[spc]

    if uintptr(s.allocCount) != s.nelems {
        throw("refill of span with free space remaining")
    }
    
  // If s is not the placeholder emptymspan, mark it as no longer cached in this mcache
    if s != &emptymspan {
        s.incache = false
    }
    // Try to get a new span from mcentral instead of the old span
    // Get a new cached span from the central lists.
    s = mheap_.central[spc].mcentral.cacheSpan()
    if s == nil {
        throw("out of memory")
    }

    if uintptr(s.allocCount) == s.nelems {
        throw("span has no free space")
    }
    // Update span of mcache
    c.alloc[spc] = s
    _g_.m.locks--
}

mcentral.cacheSpan

func (c *mcentral) cacheSpan() *mspan {
    // Deduct credit for this span allocation and sweep if necessary.
    spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
    // Sweep-credit bookkeeping omitted.
    lock(&c.lock)

    sg := mheap_.sweepgen
retry:
    var s *mspan
    for s = c.nonempty.first; s != nil; s = s.next {
    // if sweepgen == h->sweepgen - 2, the span needs sweeping
    // if sweepgen == h->sweepgen - 1, the span is currently being swept
    // if sweepgen == h->sweepgen, the span is swept and ready to use
    // h->sweepgen is incremented by 2 after every GC
    // this span still needs sweeping; claim it, sweep it, and use it
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            c.nonempty.remove(s)
            c.empty.insertBack(s)
            unlock(&c.lock)
            s.sweep(true)
            goto havespan
        }
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // we have a nonempty span that does not require sweeping, allocate from it
    // Find the span where the slice has not been cleaned up, assign it, and jump to the havespan tag to continue processing.
        c.nonempty.remove(s)
        c.empty.insertBack(s)
        unlock(&c.lock)
        goto havespan
    }
    
  // Spans on the empty list may still need sweeping; a swept span might turn out to have free slots, so walk the list and check
    for s = c.empty.first; s != nil; s = s.next {
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            // we have an empty span that requires sweeping,
            // sweep it and see if we can free some space in it
            c.empty.remove(s)
            // swept spans are at the end of the list
            c.empty.insertBack(s)
            unlock(&c.lock)
            s.sweep(true)
            freeIndex := s.nextFreeIndex()
            if freeIndex != s.nelems {
                s.freeindex = freeIndex
                goto havespan
            }
            lock(&c.lock)
            // the span is still empty after sweep
            // it is already in the empty list, so just retry
            goto retry
        }
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // already swept empty span,
        // all subsequent ones must also be either swept or in process of sweeping
        break
    }

    unlock(&c.lock)

    // Replenish central list if empty.
  // If no suitable span was found, grow this central list: c.grow calls mheap.alloc (analyzed above) to obtain a fresh span of this class, so it is not repeated here
    s = c.grow()
    if s == nil {
        return nil
    }
    lock(&c.lock)
  // Insert after empty span list
    c.empty.insertBack(s)
    unlock(&c.lock)

    // At this point s is a non-empty span, queued at the end of the empty list,
    // c is unlocked.
havespan:

    cap := int32((s.npages << _PageShift) / s.elemsize)
    n := cap - int32(s.allocCount)
    if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
        throw("span has no free objects")
    }
    // Assume all objects from this span will be allocated in the
    // mcache. If it gets uncached, we'll adjust this.
    atomic.Xadd64(&c.nmalloc, int64(n))
    usedBytes := uintptr(s.allocCount) * s.elemsize
    atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
    // Represents that span is in use
    s.incache = true
    freeByteBase := s.freeindex &^ (64 - 1)
    whichByte := freeByteBase / 8
  // Update bitmap
    // Init alloc bits cache.
    s.refillAllocCache(whichByte)

    // Adjust the allocCache so that s.freeindex corresponds to the low bit in
    // s.allocCache.
    s.allocCache >>= s.freeindex % 64

    return s
}

At this point, if mcentral cannot supply a suitable span either, the allocator starts the heap-growing journey via mheap.alloc, which we analyzed above; the rest of the path is the same.

Allocation Summary

In summary, the general flow of Go's memory allocation is as follows (a small experiment follows the list):

  1. First, decide whether the object is tiny, small (ordinary), or large.
  2. If it's a tiny object

    1. Find the mspan for the tiny span class in mcache's alloc array (after first trying the cached 16-byte tiny block)
    2. If the current mspan has enough space, allocate and update the span's bookkeeping (implemented in the nextFreeFast function)
    3. If the current mspan does not have enough space, retrieve an mspan of the same class from mcentral, replace the original mspan, then allocate and update its bookkeeping
    4. If mcentral has no span of that class with free space either, ask mheap for one
    5. If mheap has no free span of exactly the right size, split a larger free span and allocate from it
    6. If no suitable span exists at all, request memory from the operating system and add it to mheap
  3. If it's a small object, the logic is roughly the same as for a tiny object.

    1. First look up the size tables to determine the object's size class, and find the mspan for that class in mcache
    2. If the current mspan has enough space, allocate and update the span's bookkeeping (implemented in the nextFreeFast function)
    3. If the current mspan does not have enough space, retrieve an mspan of the same class from mcentral, replace the original mspan, then allocate and update its bookkeeping
    4. If mcentral has no span of that class with free space either, ask mheap for one
    5. If mheap has no free span of exactly the right size, split a larger free span and allocate from it
    6. If no suitable span exists at all, request memory from the operating system and add it to mheap
  4. If it's a large object, allocate it directly from mheap

    1. If mheap has no free span with exactly the requested number of pages, split a larger free span and allocate from it
    2. If no suitable span exists at all, request memory from the operating system and add it to mheap
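
To close, a small experiment that exercises both ends of this flow: one allocation below and one above the 32 KB large-object threshold. The exact numbers printed depend on the Go version and on GC timing, so treat the output as illustrative only.

package main

import (
    "fmt"
    "runtime"
)

var sink []byte // keeps the allocations from being optimized away

func heapAlloc() uint64 {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    return ms.HeapAlloc
}

func main() {
    before := heapAlloc()
    sink = make([]byte, 16<<10) // 16 KB: small-object path (mcache/mcentral)
    middle := heapAlloc()
    sink = make([]byte, 64<<10) // 64 KB: large-object path (straight from mheap)
    after := heapAlloc()
    fmt.Println("small alloc grew heap by", middle-before, "bytes")
    fmt.Println("large alloc grew heap by", after-middle, "bytes")
}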

Reference material

Go Language Learning Notes

Graphical Go Language Memory Allocation

Exploring Go Memory Management (Allocation)

Golang Memory Management
