[sduoj] deep understanding of buffer

Keywords: Go, data structures, source code

2021SC@SDUSC

Introduction

In an OJ (online judge) system, the core component is the judging part. Whether the check is done by a real person or by the judging machine, deciding whether the code a user submitted is correct only requires compiling the submitted source file, running the resulting program, and comparing its output with the standard answer.

In most cases, the program needs to read some values from standard input. Let's take "the sum of two numbers" as an example.

package main

import "fmt"

func main() {
	var a, b int
	fmt.Scanln(&a, &b) // read two whitespace-separated integers from standard input
	fmt.Println(a + b) // print their sum
}

Scanln scans text from standard input (the console) and stores successive whitespace-separated values into the arguments passed to it; scanning stops at a newline.

Different problems require the program to read different numbers of values, and for complex problems the amount of input can be very large. We therefore need a structure to hold this data, and the buffer is the choice here.

A buffer is a memory area that sits between an input/output device and the CPU to hold data in transit. It lets low-speed I/O devices and the high-speed CPU work in coordination: the CPU is not tied up waiting on slow devices and can keep working efficiently.
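
In Go this role is usually played by bytes.Buffer. As a minimal sketch (illustrative code of my own, not taken from the sduoj source), a judge-like program could first drain all of its input into a buffer and only then parse or compare it:

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

func main() {
	// Collect an arbitrary amount of input into an in-memory buffer first.
	var buf bytes.Buffer
	if _, err := io.Copy(&buf, os.Stdin); err != nil {
		panic(err)
	}
	// The buffered data can now be parsed or compared with the expected answer.
	fmt.Printf("buffered %d bytes\n", buf.Len())
}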

Source code analysis

Creating a Buffer is very simple: just call the NewBuffer function in the bytes package. The bytes package implements common functions for manipulating []byte, and the NewBuffer function in buffer.go creates and initializes a Buffer.

func NewBuffer(buf []byte) *Buffer { return &Buffer{buf: buf} }
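
As a quick illustration (my own example, using only the standard library): NewBuffer wraps an existing []byte, NewBufferString starts from a string, and the zero value is already an empty, ready-to-use buffer.

package main

import (
	"bytes"
	"fmt"
)

func main() {
	b1 := bytes.NewBuffer([]byte("hello")) // wraps an existing byte slice
	b2 := bytes.NewBufferString("hello")   // same idea, starting from a string
	var b3 bytes.Buffer                    // zero value: empty and ready to use

	fmt.Println(b1.String(), b2.String(), b3.Len()) // hello hello 0
}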

Buffer is a variable-sized buffer of bytes with read and write methods. The unread content is the range from off to the end of buf: new content is appended at the end of buf, and reads consume data starting at off. lastRead records the kind of the last read operation, which is needed so that the Unread* operations (UnreadByte, UnreadRune) can work correctly.

type Buffer struct {
	buf      []byte // contents are the bytes buf[off : len(buf)]
	off      int    // read at &buf[off], write at &buf[len(buf)]
	lastRead readOp // last read operation, so that Unread* can work correctly
}

type readOp int8
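
The effect of off can be observed from outside: reading consumes data from the front of the buffer, so Len only counts what has not been read yet. A small sketch (illustrative, not sduoj code):

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var b bytes.Buffer
	b.WriteString("12 34\n") // 6 bytes written

	// ReadString consumes bytes up to and including the delimiter,
	// conceptually advancing off past them.
	first, _ := b.ReadString(' ')
	fmt.Printf("read %q, %d bytes still unread\n", first, b.Len()) // read "12 ", 3 bytes still unread
}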

If we need to write data to the buffer, we call the WriteString method.

WriteString appends the contents of its string argument to the buffer, growing the buffer if necessary.

func (b *Buffer) WriteString(s string) (n int, err error) {
	b.lastRead = opInvalid
	m, ok := b.tryGrowByReslice(len(s))
	if !ok {
		m = b.grow(len(s))
	}
	return copy(b.buf[m:], s), nil
}
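
Used from the outside it looks like this (a small example of my own): WriteString reports how many bytes were appended, and since *bytes.Buffer also implements io.Writer, anything that writes to an io.Writer can append to the buffer as well.

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var b bytes.Buffer
	n, err := b.WriteString("answer: ")
	fmt.Println(n, err) // 8 <nil>

	fmt.Fprintf(&b, "%d\n", 42) // works because *bytes.Buffer is an io.Writer
	fmt.Print(b.String())       // answer: 42
}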

The following is the tryGrowByReslice method used in WriteString. It tries to "grow" the free space quickly by reslicing the internal buffer. It returns the index where the bytes should be written and whether the reslice succeeded.

func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
	if l := len(b.buf); n <= cap(b.buf)-l {
		b.buf = b.buf[:l+n]
		return l, true
	}
	return 0, false
}
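
The same trick can be seen with an ordinary slice: as long as the extra length still fits within the existing capacity, reslicing reuses the same underlying array and nothing is allocated. A small illustration (not part of the bytes package):

package main

import "fmt"

func main() {
	buf := make([]byte, 3, 8)       // length 3, capacity 8
	fmt.Println(len(buf), cap(buf)) // 3 8

	n := 4
	if len(buf)+n <= cap(buf) {
		buf = buf[:len(buf)+n] // "grow" by reslicing: no new array is allocated
	}
	fmt.Println(len(buf), cap(buf)) // 7 8
}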

The grow method grows the buffer to guarantee space for n more bytes and returns the index where those bytes should be written.

func (b *Buffer) grow(n int) int {
	// The internal code is described in detail later
}

The idea behind the buffer is that buf keeps receiving data from outside while off separates the already-read data from the unread data. As reads happen, off keeps increasing and the already-read data accumulates. Since data in the buffer only needs to be read once, this already-read prefix is garbage that still occupies space, so when space runs short the first thing to consider is reclaiming it.

m := b.Len() // number of unread bytes
if m == 0 && b.off != 0 {
	// everything has already been read, so the whole buffer can be recycled
	b.Reset()
}
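
The same reclaiming is available to callers through the exported Reset method; the snippet above is simply grow applying it automatically once everything has been read. A minimal sketch (my own example):

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var b bytes.Buffer
	b.WriteString("old data")
	b.Next(4) // consume the first 4 bytes; "data" is still unread

	b.Reset()            // discard everything, keeping the underlying storage for reuse
	fmt.Println(b.Len()) // 0
}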

If the buffer has not yet been allocated (buf is nil) and the space to be added does not exceed smallBufferSize (a constant with value 64), we can simply make a slice of length n and capacity smallBufferSize.

if b.buf == nil && n <= smallBufferSize {
	b.buf = make([]byte, n, smallBufferSize)
	return 0
}
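
For reference, this constant is declared in buffer.go roughly as follows (value 64, as noted above):

// smallBufferSize is an initial allocation minimal capacity.
const smallBufferSize = 64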

To store n more bytes in the slice, it is enough for the total capacity to be at least the number of bytes already held plus the number to be written, i.e. c >= m + n. However, if the two sides were exactly equal, the very next write to the buffer would run out of free space again and grow would be called once more. To make the buffer grow less often, we want the capacity to be comfortably larger than m + n, but not so large that space is wasted. The benchmark used here is twice m + n: if m + n <= c/2, the existing capacity is considered sufficient. For example, with c = 64, m = 10 unread bytes and n = 5 bytes to write, m + n = 15 <= 32 = c/2, so no reallocation is needed.

c := cap(b.buf)
if n <= c/2-m {
	// When the capacity is sufficient
} else if c > maxInt-c-n {
	// When the capacity is too large
} else {
	// When capacity expansion is required
}

When the capacity is sufficient, we only need to slide the unread data (everything from off onward) down to the front of buf.

copy(b.buf, b.buf[b.off:])

When the required capacity would be too large, the code panics with ErrTooLarge. maxInt here is the maximum value of int, which in binary is 0111...1 (every bit set except the sign bit); this is more than enough for our needs.

panic(ErrTooLarge)
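
For reference, maxInt is defined in the bytes package along these lines:

// maxInt is the largest value an int can hold: every bit set except the sign bit.
const maxInt = int(^uint(0) >> 1)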

When the capacity is insufficient and the buffer needs to be expanded, we build a new slice with capacity 2*c + n. Why this number? In my opinion, if it were only 2*c, the new slice might still be too small when n is very large; and if it were only c + n, the growth would be so small that further writes to the buffer would easily trigger expansion again and again.

buf := makeSlice(2*c + n)
copy(buf, b.buf[b.off:])
b.buf = buf

Finally, we reset off to 0 and reslice buf to length m+n, so that the n bytes about to be written fall within the slice's length; grow then returns m, the index at which writing should begin.

b.off = 0
b.buf = b.buf[:m+n]
return m // index where the caller should start writing
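
In practice, a caller who knows roughly how much data is coming can trigger this growth once up front with the exported Grow method, instead of paying for several reallocations while writing. A small example (my own, not sduoj code):

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var b bytes.Buffer
	b.Grow(4096) // guarantee space for 4096 bytes before the writes start

	for i := 0; i < 1000; i++ {
		b.WriteString("x") // no further allocation is needed for these writes
	}
	fmt.Println(b.Len(), b.Cap() >= 4096) // 1000 true
}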
