Brief introduction
The Go standard library's bufio.Scanner is, as its name suggests, a scanner. It continuously reads data from a reader into a buffer, and accepts an injected split function so you can customize how the data is tokenized. The library also ships four predefined split functions:
- ScanLines: Splits on newline characters ('\n')
- ScanWords: Returns each space-separated word as a token
- ScanRunes: Returns a single UTF-8 encoded rune as a token
- ScanBytes: Returns a single byte as a token
Usage method
Before looking at how to use it, we first need to look at one function type.
```go
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
```
This function accepts a byte slice together with an atEOF flag indicating whether more data is still available, and returns three values. The first, advance, is the number of bytes to advance the input by (usually the length of the token plus its terminator); the second is the token itself; the third is an error.

The split function checks whether the terminator has been found. If it has not, it can return (0, nil, nil); Scan takes that return value, reads more data, and calls the function again with the accumulated, still-unfinished input. Once the terminator is found, the function returns the corresponding token. Here's a simple example:
```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "abcend234234234"
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(ScanEnd)
	// Set the initial read buffer to 2 bytes; the Scanner doubles
	// the buffer size whenever it is too small.
	buf := make([]byte, 2)
	scanner.Buffer(buf, bufio.MaxScanTokenSize)
	for scanner.Scan() {
		fmt.Println("output:", scanner.Text())
	}
	if scanner.Err() != nil {
		fmt.Printf("error: %s\n", scanner.Err())
	}
}

func ScanEnd(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// All data has been consumed; nothing left to return.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Look for the custom terminator "end".
	if index := strings.Index(string(data), "end"); index >= 0 {
		// First return value: bytes to advance past, including the terminator.
		// Second return value: the bytes before the terminator.
		// Third return value: any error.
		return index + len("end"), data[:index], nil
	}
	// At EOF with leftover data: return it as the final token.
	if atEOF {
		return len(data), data, nil
	}
	// Terminator not found yet: request more data.
	return 0, nil, nil
}
```
In the example above, the input string is "abcend234234234" and the initial buffer size is set to 2 bytes, so the reads proceed as follows:
- First read: buf = ab; "end" is not found, so ScanEnd returns (0, nil, nil)
- Second read: buf = abce (the buffer has doubled); "end" is still not found, so ScanEnd returns (0, nil, nil)
- Third read: buf = abcend23 (the buffer has doubled again); the custom terminator "end" is found, ScanEnd returns (6, "abc", nil), and "abc" is output
- Fourth read: buf = 23423423; the bytes already consumed are discarded, the leftover "23" is shifted to the front of the buffer, and the rest of the 8-byte buffer is filled
- Fifth read: the buffer is full again and doubles once more; the rest of the input is read, and at EOF the remaining data "234234234" is returned as the final token
The result is:

```
output: abc
output: 234234234
```
As you can see, the scanner emits tokens according to the configured buffer size and the custom terminator.
Source code view
```go
type Scanner struct {
	r            io.Reader // The reader provided by the client.
	split        SplitFunc // The split function, injected from outside.
	maxTokenSize int       // Maximum length of a token.
	token        []byte    // The last token returned by split.
	buf          []byte    // Buffer used as argument to split.
	start        int       // First unprocessed byte in buf.
	end          int       // End of data in buf.
	err          error     // Sticky error.
	empties      int       // Count of successive empty tokens.
	scanCalled   bool      // Scan has been called; buffer is in use.
	done         bool      // Is the scan completed?
}

func (s *Scanner) Scan() bool {
	if s.done {
		return false
	}
	s.scanCalled = true
	// Loop until a token is found.
	for {
		if s.end > s.start || s.err != nil {
			// Call the split function to get the number of bytes to
			// advance, the token, and any error.
			advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
			if err != nil {
				if err == ErrFinalToken {
					s.token = token
					s.done = true
					return true
				}
				s.setErr(err)
				return false
			}
			if !s.advance(advance) {
				return false
			}
			s.token = token
			if token != nil {
				if s.err == nil || advance > 0 {
					s.empties = 0
				} else {
					// Returning tokens not advancing input at EOF.
					s.empties++
					if s.empties > 100 {
						panic("bufio.Scan: 100 empty tokens without progressing")
					}
				}
				return true
			}
		}
		// A sticky error means no more tokens: shut it down.
		if s.err != nil {
			s.start = 0
			s.end = 0
			return false
		}
		// Must read more data. First, shift the unread data to the start
		// of the buffer to make room.
		if s.start > 0 && (s.end == len(s.buf) || s.start > len(s.buf)/2) {
			copy(s.buf, s.buf[s.start:s.end])
			s.end -= s.start
			s.start = 0
		}
		// If buf is full, allocate a new buffer twice the original length.
		if s.end == len(s.buf) {
			const maxInt = int(^uint(0) >> 1)
			if len(s.buf) >= s.maxTokenSize || len(s.buf) > maxInt/2 {
				s.setErr(ErrTooLong)
				return false
			}
			newSize := len(s.buf) * 2
			if newSize == 0 {
				newSize = startBufSize
			}
			if newSize > s.maxTokenSize {
				newSize = s.maxTokenSize
			}
			newBuf := make([]byte, newSize)
			copy(newBuf, s.buf[s.start:s.end])
			s.buf = newBuf
			s.end -= s.start
			s.start = 0
		}
		// Finally, read more data into the buffer.
		for loop := 0; ; {
			n, err := s.r.Read(s.buf[s.end:len(s.buf)])
			s.end += n
			if err != nil {
				s.setErr(err)
				break
			}
			if n > 0 {
				s.empties = 0
				break
			}
			loop++
			if loop > maxConsecutiveEmptyReads {
				s.setErr(io.ErrNoProgress)
				break
			}
		}
	}
}
```
Summary
From the source code and examples above, we can see how this scanner works. Of course, in real use it will not just scan a fixed string. Buffered IO provides a staging area for data: writes accumulate in the buffer and are flushed once it reaches a certain capacity, making room for the next batch. This greatly reduces the number of write operations, and ultimately the number of system calls triggered, which saves substantial overhead when IO is frequent. For read operations, buffering means more data can be fetched per operation, which not only reduces system calls but also makes more efficient use of the underlying hardware by reading disk data in blocks.