Go language core 36 (go language practice and application 17) -- learning notes

Keywords: Go

39 | The bytes package and byte string operations (Part 2)

In the previous article, we discussed what the read count in bytes.Buffer does and analyzed the type around that question. Now let's extend the related knowledge.

Knowledge expansion

Question 1: what is the capacity expansion strategy for bytes.Buffer?

A Buffer value can be expanded manually (by calling its Grow method) or automatically, and the strategies of the two approaches are basically the same. Therefore, unless we know exactly how many bytes the subsequent content will need, it is fine to let the Buffer value expand automatically.

During capacity expansion, the corresponding code in the Buffer value (hereinafter, the expansion code) first checks whether the remaining capacity of the content container is enough to meet the caller's request, that is, enough to hold the new content.

If it is, the expansion code simply extends the length of the current content container.

More specifically, if the difference between the capacity of the content container and its length is greater than or equal to the number of additional bytes, the expansion code extends the length of the original content container by reslicing it, as follows:

b.buf = b.buf[:length+need]

Otherwise, if the remaining capacity of the content container is not enough, the expansion code may have to replace the original content container with a new one.

Before doing so, however, it tries one more optimization.

If half of the capacity of the current content container is still greater than or equal to the number of unread bytes in it plus the number of additional bytes required, that is:

cap(b.buf)/2 >= b.Len() + need

Then, the expansion code will reuse the existing content container and copy the unread content in the container to its head.

This also means that all the read content will be overwritten by unread content and subsequent new content.

Such reuse is expected to save at least one memory allocation caused by subsequent capacity expansion and several byte copies.

If this optimization does not apply, that is, if the capacity of the current content container is less than twice the new length (unread bytes plus additional bytes), the expansion code has no choice but to create a new content container, copy the unread content of the original container into it, and finally replace the original container with the new one. In the Go version these notes follow, the capacity of the new container equals twice the original capacity plus the number of additional bytes required; the exact figure is an implementation detail and may differ in other releases.

Capacity of new container = 2 * original capacity + required bytes

With the above steps, the expansion of the content container is basically complete. However, to keep the internal data consistent and to avoid confusion caused by the previously read content, the expansion code also sets the read count to 0 and reslices the content container so that the previously read content is no longer visible.

Incidentally, for the Buffer value in the zero state, if the number of additional bytes required during the first expansion is not greater than 64, the value will create a content container based on a predefined byte array with a length of 64.

In this case, the capacity of the content container is 64. The purpose of this is to make the Buffer value ready quickly when it is actually used.
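
The decision steps described in this section can be summarized in a simplified model. growPlan below is a hypothetical helper written for these notes, not part of the bytes package; it only mirrors the conditions stated in the text and leaves out details such as overflow checks.

```go
package main

import "fmt"

// growPlan models which path the expansion code takes for a content
// container with the given capacity, length and read count when need
// more bytes are requested. zero reports whether the Buffer value is
// still in its zero state (no content container yet).
func growPlan(capacity, length, readCount, need int, zero bool) string {
	unread := length - readCount
	switch {
	case need <= capacity-length:
		return "reslice"
	case zero && need <= 64:
		return "small container (capacity 64)"
	case capacity/2 >= unread+need:
		return "reuse (copy unread bytes to the head)"
	default:
		return "allocate a new container"
	}
}

func main() {
	fmt.Println(growPlan(16, 0, 0, 5, false))    // reslice
	fmt.Println(growPlan(0, 0, 0, 3, true))      // small container (capacity 64)
	fmt.Println(growPlan(64, 50, 45, 20, false)) // reuse (copy unread bytes to the head)
	fmt.Println(growPlan(8, 8, 0, 1, false))     // allocate a new container
}
```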

package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Example 1.
	var contents string
	buffer1 := bytes.NewBufferString(contents)
	fmt.Printf("The length of new buffer with contents %q: %d\n",
		contents, buffer1.Len())
	fmt.Printf("The capacity of new buffer with contents %q: %d\n",
		contents, buffer1.Cap())
	fmt.Println()

	contents = "12345"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer1.WriteString(contents)
	fmt.Printf("The length of buffer: %d\n", buffer1.Len())
	fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())
	fmt.Println()

	contents = "67"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer1.WriteString(contents)
	fmt.Printf("The length of buffer: %d\n", buffer1.Len())
	fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())
	fmt.Println()

	contents = "89"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer1.WriteString(contents)
	fmt.Printf("The length of buffer: %d\n", buffer1.Len())
	fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())
	fmt.Print("\n\n")

	// Example 2.
	contents = "abcdefghijk"
	buffer2 := bytes.NewBufferString(contents)
	fmt.Printf("The length of new buffer with contents %q: %d\n",
		contents, buffer2.Len())
	fmt.Printf("The capacity of new buffer with contents %q: %d\n",
		contents, buffer2.Cap())
	fmt.Println()

	n := 10
	fmt.Printf("Grow the buffer with %d ...\n", n)
	buffer2.Grow(n)
	fmt.Printf("The length of buffer: %d\n", buffer2.Len())
	fmt.Printf("The capacity of buffer: %d\n", buffer2.Cap())
	fmt.Print("\n\n")

	// Example 3.
	var buffer3 bytes.Buffer
	fmt.Printf("The length of new buffer: %d\n", buffer3.Len())
	fmt.Printf("The capacity of new buffer: %d\n", buffer3.Cap())
	fmt.Println()

	contents = "xyz"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer3.WriteString(contents)
	fmt.Printf("The length of buffer: %d\n", buffer3.Len())
	fmt.Printf("The capacity of buffer: %d\n", buffer3.Cap())
}

Question 2: which methods in bytes.Buffer may cause content disclosure?

First of all, what is content disclosure? Here it means that a party using the Buffer value obtains, through a non-standard (or informal) channel, content that it should not have been able to obtain. In other words, the content leaks.

For example, suppose I obtain some unread content by calling one of the Buffer value's read methods. Through the result value of that call, I should get the content that was unread at that moment, and nothing more.

However, once new content has been written to the Buffer value, I can reach that new content directly through the result value I obtained earlier, without calling the method again.

This is a typical non-standard way of reading. It should not exist, and even though it does, we should not use it. Because it is exposed inadvertently, its behavior is likely to be unstable.

In bytes.Buffer, both the Bytes method and the Next method may cause content disclosure. The reason is that both return a slice based on the content container directly to the caller.

We all know that through a slice we can directly access and manipulate its underlying array. This is true whether the slice is based on an array or obtained by slicing another slice.

Here, the byte slices returned by the Bytes method and the Next method are obtained by slicing the content container. That is, they share the same underlying array with the content container, at least for a period of time.

Take the Bytes method as an example. It returns all unread content in its value at the moment of the call. The example code is as follows:

contents := "ab"
buffer1 := bytes.NewBufferString(contents)
fmt.Printf("The capacity of new buffer with contents %q: %d\n",
 contents, buffer1.Cap()) // The capacity of the content container is: 8.
unreadBytes := buffer1.Bytes()
fmt.Printf("The unread bytes of the buffer: %v\n", unreadBytes) // Unread content is: [97 98].

I initialized a Buffer value with the string value "ab", represented by the variable buffer1, and printed some states of the value at that time.

You may wonder why the capacity of this value is 8 when I only put a string of length 2 into it.

Although this has nothing to do with our current topic, I can remind you that you can read a function called stringtoslicebyte in the runtime package, and the answer is in it.

Back to buffer1. I then write the string value "cdefg" to it, and afterwards its capacity is still 8. The result value unreadBytes, which I obtained earlier by calling the Bytes method of buffer1, contains all the content that was unread at that time.

However, since this result value and the content container of buffer1 still share the same underlying array, a simple reslicing operation on the result value is enough to get all the content of buffer1 that is unread now. In this way, the new content of buffer1 is leaked.

buffer1.WriteString("cdefg")
fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap()) // The capacity of the content container remains: 8.
unreadBytes = unreadBytes[:cap(unreadBytes)]
fmt.Printf("The unread bytes of the buffer: %v\n", unreadBytes) // Based on the result value obtained above, the unread content is: [97 98 99 100 101 102 103 0].

If I pass the value of unreadBytes to the outside world, the outside world can even modify the content of buffer1 through it, as follows:

unreadBytes[len(unreadBytes)-2] = byte('X') // The ASCII encoding of 'X' is 88.
fmt.Printf("The unread bytes of the buffer: %v\n", buffer1.Bytes()) // Unread content becomes: [97 98 99 100 101 102 88].

By now you should be able to see how serious the consequences of content disclosure can be.

The same problem exists for the Next method of the Buffer value. However, once the content container of the Buffer value, or its underlying array, has been replaced during capacity expansion, the earlier result value no longer shares memory with the buffer and the disclosure cannot go any further. I wrote a relatively complete example in the demo80.go file. You can take a look and work through it.

package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Example 1.
	contents := "ab"
	buffer1 := bytes.NewBufferString(contents)
	fmt.Printf("The capacity of new buffer with contents %q: %d\n",
		contents, buffer1.Cap())
	fmt.Println()

	unreadBytes := buffer1.Bytes()
	fmt.Printf("The unread bytes of the buffer: %v\n", unreadBytes)
	fmt.Println()

	contents = "cdefg"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer1.WriteString(contents)
	fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())
	fmt.Println()

	// By simply extending the previously obtained slice unreadBytes to its capacity,
	// we can use it to read or even modify the subsequent contents in the buffer.
	unreadBytes = unreadBytes[:cap(unreadBytes)]
	fmt.Printf("The unread bytes of the buffer: %v\n", unreadBytes)
	fmt.Println()

	value := byte('X')
	fmt.Printf("Set a byte in the unread bytes to %v ...\n", value)
	unreadBytes[len(unreadBytes)-2] = value
	fmt.Printf("The unread bytes of the buffer: %v\n", buffer1.Bytes())
	fmt.Println()

	// However, this no longer works once the buffer's content container has actually been replaced.
	contents = "hijklmn"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer1.WriteString(contents)
	fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())
	fmt.Println()

	unreadBytes = unreadBytes[:cap(unreadBytes)]
	fmt.Printf("The unread bytes of the buffer: %v\n", unreadBytes)
	fmt.Print("\n\n")

	// Example 2.
	// The following byte slices returned by the Next method have the same problem.
	contents = "12"
	buffer2 := bytes.NewBufferString(contents)
	fmt.Printf("The capacity of new buffer with contents %q: %d\n",
		contents, buffer2.Cap())
	fmt.Println()

	nextBytes := buffer2.Next(2)
	fmt.Printf("The next bytes of the buffer: %v\n", nextBytes)
	fmt.Println()

	contents = "34567"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer2.WriteString(contents)
	fmt.Printf("The capacity of buffer: %d\n", buffer2.Cap())
	fmt.Println()

	// By simply extending the previously obtained slice nextBytes to its capacity,
	// we can use it to read or even modify the subsequent contents in the buffer.
	nextBytes = nextBytes[:cap(nextBytes)]
	fmt.Printf("The next bytes of the buffer: %v\n", nextBytes)
	fmt.Println()

	value = byte('X')
	fmt.Printf("Set a byte in the next bytes to %v ...\n", value)
	nextBytes[len(nextBytes)-2] = value
	fmt.Printf("The unread bytes of the buffer: %v\n", buffer2.Bytes())
	fmt.Println()

	// However, this no longer works once the buffer's content container has actually been replaced.
	contents = "89101112"
	fmt.Printf("Write contents %q ...\n", contents)
	buffer2.WriteString(contents)
	fmt.Printf("The capacity of buffer: %d\n", buffer2.Cap())
	fmt.Println()

	nextBytes = nextBytes[:cap(nextBytes)]
	fmt.Printf("The next bytes of the buffer: %v\n", nextBytes)
}

Summary

Let's summarize these two articles. Unlike strings.Builder, bytes.Buffer can not only concatenate and truncate byte sequences and export its contents in various forms, but also read subsequences of them in order.

The bytes.Buffer type uses byte slices as its content container, and uses a field to record the count of read bytes in real time.

Although we cannot obtain the read count directly, it is worth understanding because it plays a key role in the Buffer value.

Whether we read, write, truncate, export or reset, the read count is an essential part of how the operation is implemented.

Like the values of strings.Builder, the Buffer value can be expanded manually or automatically. Unless we completely determine the number of bytes required for subsequent content, it is good to let the Buffer value expand automatically.

When expanding, a Buffer value does not necessarily replace the existing content container to obtain more capacity. Following the principle of minimizing memory allocations and content copies, it reuses the current content container first and creates a new one only when the capacity really cannot meet the requirement.

In addition, you may not have expected that some methods of the Buffer value can cause content disclosure. This is mainly because the result values returned by these methods share the same underlying array with the content container of the Buffer value for a period of time.

If we intentionally or unintentionally transfer these result values to the outside world, the outside world may manipulate the content of the associated Buffer value through them.

This is a very serious data security problem. We must avoid this. The most thorough approach is to isolate values such as slices before they are sent out. For example, first make a deep copy of them, and then transfer the copy out.

Thinking questions

Today's question is: compare the String methods of strings.Builder and bytes.Buffer. Which one is more efficient, and why?

Source code for these notes

https://github.com/MingsonZheng/go-core-demo

Posted by rtown on Mon, 29 Nov 2021 16:40:08 -0800