A Bug in Go Caused by Zero Values and the Behavior of the gob Library

Keywords: Go, back-end, bug

0 Origin

In September this year, the platform project my department is responsible for released a new version with a new feature, something loosely similar to scheduled tasks. On the first day everything was fine, but on the second day a small number of tasks behaved abnormally: suspended tasks kept executing, while normal tasks did not execute at all.

My first reaction, and that of another colleague, was that the scheduling logic itself was broken. But after a lot of debugging and testing, we found that the root of the problem was not the feature's logic at all; it was a piece of low-level shared code that had been in production, untouched, for a year. At the heart of that code is gob, the protagonist of this article, and at the heart of the problem is a feature of the Go language: the zero value.

Below, I will describe the bug with a much simplified example.

1 gob and the zero value

Let's briefly introduce gob and the zero value.

1.1 The zero value

The zero value is a feature of the Go language. In short, Go provides a default value for every variable that is declared but never assigned. For example, the following code:

package main

import (
    "fmt"
)

type person struct {
    name   string
    gender int
    age    int
}

func main() {
    p := person{}
    var list []byte
    var f float32
    var s string
    var m map[string]int
    
    fmt.Println(list, f, s, m)
    fmt.Printf("%+v", p)
}

/* Result output
[] 0  map[]
{name: gender:0 age:0}
*/

Zero values are often convenient for developers, but plenty of people dislike them, arguing that they make the language looser at the syntax level and introduce a degree of uncertainty. The problem I am about to describe in detail is one example.

1.2 gob

gob is a package in the Go standard library, encoding/gob. The name is short for "go binary", so you can already guess that gob has something to do with binary data.

Indeed, gob is Go's own format for serializing and deserializing program data in binary form, similar to pickle in Python. Its most common use is to serialize an object (a struct) and store it in a file on disk; when the data is needed again, the file is read back and deserialized, giving a simple form of object persistence.

This article is not a tutorial on gob itself. Readers unfamiliar with it can look at the Example section of the official documentation (https://pkg.go.dev/encoding/gob), or simply read the code I use to describe the problem below.
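
Still, for quick orientation, here is a minimal encode/decode round trip through gob. This is only an illustrative sketch; the point type exists purely for this example and has nothing to do with the project code:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

// point is a throwaway type used only for this demonstration.
type point struct {
    X, Y int // fields must be exported for gob to encode them
}

func main() {
    // Encode a value into an in-memory buffer.
    var buf bytes.Buffer
    if err := gob.NewEncoder(&buf).Encode(point{X: 3, Y: 4}); err != nil {
        panic(err)
    }

    // Decode it back into a fresh variable.
    var p point
    if err := gob.NewDecoder(&buf).Decode(&p); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", p) // {X:3 Y:4}
}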

2 The problem

2.1 The requirement

At the beginning of this article I briefly described where the problem came from; here I will describe it with a simpler model.

First, we define a structure named person:

type person struct {
    // As with encoding/json, fields must be exported (capitalized) to be serialized
    ID     int
    Name   string // full name
    Gender int    // Gender: male 1, female 0
    Age    int    // Age
}

Around this structure we record the information of several people, each person being one person object. For certain reasons we must persist this data to the local disk with gob rather than use a database such as MySQL.

Next comes the requirement:

Traverse the locally stored gob files, deserialize them, and count how many of the people are men and how many are women.

2.2 The code

Given the requirement and background above, the code is as follows (package, import, init() and other boilerplate is omitted to save space):

  • defines.go
// Directory holding the .gob files
const DIR = "./persons"

type person struct {
    // As with encoding/json, fields must be exported (capitalized) to be serialized
    ID     int
    Name   string // full name
    Gender int    // Gender: male 1, female 0
    Age    int    // Age
}

// Objects that need persistence
var persons = []person{
    {0, "Mia", 0, 21},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}
  • serializer.go
// serialize encodes a person object and stores it in a file
// named ./persons/<ID>.gob
func serialize(p person) {
    filename := filepath.Join(DIR, fmt.Sprintf("%d.gob", p.ID))
    buffer := new(bytes.Buffer)
    encoder := gob.NewEncoder(buffer)
    _ = encoder.Encode(p)
    _ = ioutil.WriteFile(filename, buffer.Bytes(), 0644)
}

// unserialize decodes a .gob file into the given person pointer
func unserialize(path string, p *person) {
    raw, _ := ioutil.ReadFile(path)
    buffer := bytes.NewBuffer(raw)
    decoder := gob.NewDecoder(buffer)
    _ = decoder.Decode(p)
}
  • main.go
func main() {
    storePersons()
    countGender()
}

func storePersons() {
    for _, p := range persons {
        serialize(p)
    }
}

func countGender() {
    counter := make(map[int]int)
    // A single temporary pointer is reused as the carrier for the decoded objects,
    // to save the cost of creating a new object on every iteration.
    tmpP := &person{}
    for _, p := range persons {
        // For convenience we iterate over persons directly, but only the ID is used, to build the file path
        id := p.ID
        filename := filepath.Join(DIR, fmt.Sprintf("%d.gob", id))
        // Deserialize objects into tmpP
        unserialize(filename, tmpP)
        // Count by gender
        counter[tmpP.Gender]++
    }
    fmt.Printf("Female: %+v, Male: %+v\n", counter[0], counter[1])
}

After executing the code, we get the following results:

// The preset data again, for comparison
var persons = []person{
    {0, "Mia", 0, 21},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}

// Result output
Female: 1, Male: 4

Huh? One female and four males? There is the bug: the result clearly does not match the data we put in. What went wrong?

2.3 Locating the problem

We add one print statement to the for loop in countGender() to dump the person object read on each iteration, and get the following:

// added line
fmt.Printf("%+v\n", tmpP)

// Result output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:3 Name:Jenny Gender:1 Age:16}
&{ID:4 Name:Marry Gender:1 Age:30}

Well, Jenny and Marry have turned into men! The odd thing is that every other field is correct; only Gender is wrong. Seeing this result, if you, like me, mostly deal with configuration formats such as JSON and YAML, you might take it for granted that the gob files were read correctly and that the problem lies in how they were stored.

However, gob files are binary, so they cannot be checked by eye the way JSON can. Even with a tool such as xxd on Linux, all you get is output as opaque as this:

>$ xxd persons/1.gob 
0000000: 37ff 8103 0101 0670 6572 736f 6e01 ff82  7......person...
0000010: 0001 0401 0249 4401 0400 0104 4e61 6d65  .....ID.....Name
0000020: 010c 0001 0647 656e 6465 7201 0400 0103  .....Gender.....
0000030: 4167 6501 0400 0000 0eff 8201 0201 034a  Age............J
0000040: 696d 0102 0124 00                        im...$.

>$ xxd persons/0.gob 
0000000: 37ff 8103 0101 0670 6572 736f 6e01 ff82  7......person...
0000010: 0001 0401 0249 4401 0400 0104 4e61 6d65  .....ID.....Name
0000020: 010c 0001 0647 656e 6465 7201 0400 0103  .....Gender.....
0000030: 4167 6501 0400 0000 0aff 8202 034d 6961  Age..........Mia
0000040: 022a 00                                  .*.

Perhaps we could have parsed these binary files by hand and compared them, or serialized two objects identical except for Gender and then diffed the resulting gob files. If you are interested, give it a try. At the time, pressed for time, we did not go down that road; instead we modified the data and kept testing.
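
For anyone who does want to try the second approach, a standalone sketch could look like the function below. It reuses the person struct and assumes the same imports as the earlier files; the function name compareEncodings is mine. In hindsight, diffing its two hex dumps would have pointed straight at the answer in section 3.

// compareEncodings encodes two person values that differ only in Gender
// and prints their raw gob bytes for comparison.
func compareEncodings() {
    male := person{ID: 1, Name: "Jim", Gender: 1, Age: 18}
    female := person{ID: 1, Name: "Jim", Gender: 0, Age: 18} // differs only in Gender

    encode := func(p person) []byte {
        var buf bytes.Buffer
        _ = gob.NewEncoder(&buf).Encode(p)
        return buf.Bytes()
    }

    m, f := encode(male), encode(female)
    fmt.Printf("Gender=1 (%d bytes): % x\n", len(m), m)
    fmt.Printf("Gender=0 (%d bytes): % x\n", len(f), f)
}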

2.4 The pattern

Since both of the corrupted records above were women, a programmer's intuition said this might not be a coincidence. So I reordered the data to separate the men and the women completely, and tested again:

// The first group, women first and then men
var persons = []person{
    {0, "Mia", 0, 21},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
}

// Result output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:3 Name:Jenny Gender:0 Age:16}
&{ID:4 Name:Marry Gender:0 Age:30}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}

// The second group, men first and then women
var persons = []person{
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {0, "Mia", 0, 21},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}

// Result output
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:2 Name:Mia Gender:1 Age:21}
&{ID:3 Name:Jenny Gender:1 Age:16}
&{ID:4 Name:Marry Gender:1 Age:30}

Now the strangeness shows itself clearly. With the women first and the men after, everything is normal; with the men first and the women after, the men are fine and the women are wrong. Even Mia's ID, originally 0, has become 2!

After repeated testing and inspection of the results, we arrived at a rough rule: the male records are always correct, and the problems always appear in the female records.

Stated more formally: if a field in an earlier record holds a non-zero number and the same field in a later record is 0, the later 0 ends up overwritten by the earlier non-zero value.

3 The answer

Reading through the code once more, I noticed this line:

// A single temporary pointer is reused as the carrier for the decoded objects,
// to save the cost of creating a new object on every iteration.
tmpP := &person{}

To avoid the extra cost of creating a new object on every iteration, I reused the same variable both to load the data from each file and to check the gender. Combined with the pattern we had just found, the answer seemed close at hand: a later 0 being "overwritten" by an earlier non-zero value is exactly what you would see if the same object were reused to load every file, leaving residue from the previous record behind.

Verifying this is simple: move the shared object into the for loop, so that a fresh object is created for each file and any influence from the previous record is cut off.

Let's modify the code (unchanged parts omitted):

for _, p := range persons {
    // ...
    tmpP := &person{}
    // ...
}

// Result output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:3 Name:Jenny Gender:0 Age:16}
&{ID:4 Name:Marry Gender:0 Age:30}
Female: 3, Male: 2

Sure enough!

Residue from old data was indeed the cause, just as we supposed. But this raises two more questions: why did the original code read everything correctly when the 0 values came before the non-zero ones (women before men)? And why is only 0 affected, while the other numbers (the ages) are untouched?

Every question now seems to point at one special number: 0!

Only at this point did the zero value finally get our attention. I quickly went back to the official documentation of the gob package and found this sentence:

If a field has the zero value for its type (except for arrays; see above), it is omitted from the transmission.

In other words: if a struct field holds the zero value of its type (arrays excepted), gob simply does not encode it at all.

The surrounding text in the documentation is about structs, so "field" here means a struct field, which matches the example in this article.

Putting our earlier observations together with the documentation, we can finally state the complete conclusion:

When encoding, gob omits every field (arrays excepted) whose value is the zero value of its type. Our original code reused one shared object to load every file; because zero-valued fields are never transmitted, decoding never touches them, so what we see in those fields is simply whatever non-zero value was left behind by the previously decoded object.
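
This behavior is easy to reproduce in isolation. The sketch below is my own illustration; it reuses the person struct with the same imports as before, and the function name demonstrateResidue is invented for this example. It encodes a person whose Gender is 0 and decodes it into a carrier that already holds a Gender of 1; the stale 1 survives because the zero-valued field was never transmitted:

// demonstrateResidue shows how a field omitted as a zero value
// is never overwritten in the object we decode into.
func demonstrateResidue() {
    var buf bytes.Buffer
    // Jenny's Gender is 0, so gob omits that field from the encoding entirely.
    _ = gob.NewEncoder(&buf).Encode(person{ID: 3, Name: "Jenny", Gender: 0, Age: 16})

    // The carrier still holds Bob's data, including Gender = 1.
    carrier := &person{ID: 2, Name: "Bob", Gender: 1, Age: 25}
    _ = gob.NewDecoder(&buf).Decode(carrier)

    // ID, Name and Age were transmitted and overwritten; Gender was not.
    fmt.Printf("%+v\n", carrier) // &{ID:3 Name:Jenny Gender:1 Age:16}
}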

The fix is equally simple: as shown above, just stop reusing a shared object to load the data.
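
An alternative, if you want to keep a single carrier object, is to reset it to its zero value before every decode, so that no stale field can survive. A sketch of countGender()'s loop with that change (tmpP still declared once before the loop, as in the original code):

for _, p := range persons {
    filename := filepath.Join(DIR, fmt.Sprintf("%d.gob", p.ID))
    *tmpP = person{} // wipe every field back to its zero value before reuse
    unserialize(filename, tmpP)
    counter[tmpP.Gender]++
}

Either way, the point is the same: the decoder must never be handed an object that still carries non-zero values in fields the encoder may have skipped.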

4 Review

In the project bug I described at the start of this article, I used 0 and 1 to represent the state of a scheduled task (suspended and running). Just like person.Gender above, tasks interfered with one another because of the zero value and were executed abnormally, while fields that never hold a zero value stayed correct. Although this happened in the production environment, fortunately the problem was found early and handled quickly, and it caused no real incident. Still, the whole process and its final answer are burned into my memory.

Afterwards, my colleagues and I briefly discussed why gob chooses to ignore zero values. My guess is that it is done to save space. The code we wrote at the beginning also reused a shared object to save space, and in the end these two space-saving decisions collided into a well-hidden bug.
