Go select statement and related examples [go language Bible notes]

Multiplexing with select

The following program performs the countdown for a rocket launch. The time.Tick function returns a channel on which it sends events periodically, acting like a metronome. The value of each event is a timestamp, but the value is rarely as interesting as the fact of its delivery.

// gopl.io/ch8/countdown1
func main() {
    fmt.Println("Commencing countdown.")
    tick := time.Tick(1*time.Second)  // channel
    for countdown:=10; countdown>0; countdown-- {
        fmt.Println(countdown)
        <-tick
    }
    launch()
}

Now let's add the ability to abort the launch sequence by pressing the return key during the countdown. First, we start a goroutine that tries to read a single byte from the standard input and, if it succeeds, sends a value on a channel called abort.

// gopl.io/ch8/countdown2
abort := make(chan struct{})
go func() {
    os.Stdin.Read(make([]byte, 1))
    abort <- struct{}{}
}()

Now each iteration of the countdown loop needs to wait for an event to arrive on one of two channels: the ticker channel if everything is fine ("nominal" in NASA jargon) or the abort channel if something has gone wrong. We can't just receive from each channel in turn, because whichever operation we try first will block until it completes: if no event arrives on the first channel, the program never gets to see an event on the second. We need to multiplex these operations, and to do that we use a select statement.

select {
case <-ch1:
    // ...
case x := <-ch2:
    // ...use x...
case ch3 <- y:
    // ...
default:
    // ...
}

The general form of a select statement is shown above. Like a switch statement, it has a number of cases and an optional default. Each case specifies a communication (a send or receive operation on some channel) and an associated block of statements. A receive expression may appear on its own, as in the first case (the received value is then discarded and cannot be referenced), or within a short variable declaration, as in the second case; the second form lets you refer to the received value.

A select waits until a communication for some case is ready to proceed. It then performs that communication and executes the case's associated statements; the other communications do not happen. A select with no cases, select {}, waits forever.

Let's return to our rocket launch program. The time.After function immediately returns a channel and starts a new goroutine that sends a single value on that channel after the specified time. The select statement below waits until the first of two events arrives: either an abort event or the event indicating that 10 seconds have elapsed. If 10 seconds go by with no abort, the launch proceeds.

func main() {
    // ...create abort channel...
    
    fmt.Println("Commencing countdown. Press return to abort.")
    select {
    case <-time.After(10 * time.Second):
        // Do nothing: the 10 seconds elapsed without an abort.
    case <-abort:
        fmt.Println("Launch aborted!")
        return  // stop here without launching
    }
    launch()
}

The following example is more subtle. The channel ch has a buffer size of 1, so it is alternately empty and then full, and only one of the cases can proceed on each iteration: the send when i is even, or the receive when i is odd. The program always prints 0 2 4 6 8.

ch := make(chan int, 1)
for i:=0; i<10; i++ {
    select {
    case x := <-ch:
        fmt.Println(x)  // 0 2 4 6 8
    case ch <- i:
    }
}

Author's note: why? Each case in a select statement corresponds to a communication, and a case can be chosen only when its communication can proceed. On the first iteration (i = 0) the buffer is empty, so the receive cannot proceed and the send case stores 0. On the next iteration (i = 1) the buffer is full, so the send cannot proceed and the receive case takes 0 out and prints it. The pattern then repeats: each even i is sent while the buffer is empty, and each odd iteration receives and prints the even value stored just before it.
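
To check this reasoning, the loop can be wrapped in a function that collects what it receives. This is a small sketch; the function name alternate is an assumption made for illustration.

```go
package main

import "fmt"

// alternate runs the loop from the text for n iterations and returns
// the values it received, in order.
func alternate(n int) []int {
	ch := make(chan int, 1)
	var got []int
	for i := 0; i < n; i++ {
		select {
		case x := <-ch: // can proceed only when the buffer is full
			got = append(got, x)
		case ch <- i: // can proceed only when the buffer is empty
		}
	}
	return got
}

func main() {
	fmt.Println(alternate(10)) // prints [0 2 4 6 8]
}
```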

If multiple cases are ready at the same time, select picks one at random, which ensures that every channel has an equal chance of being selected. Increasing the buffer size of the previous example (author's note: to more than 1) makes its output nondeterministic, because when the buffer is neither full nor empty, the select statement figuratively tosses a coin.
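
The random choice can be observed directly. The sketch below is an illustration under an assumption of its own: it uses two closed channels, which are always ready to receive, so both cases are ready on every iteration. The function name pick is hypothetical.

```go
package main

import "fmt"

// pick performs n selects over two channels that are always ready and
// counts how often each case is chosen.
func pick(n int) (first, second int) {
	a := make(chan struct{})
	b := make(chan struct{})
	close(a) // a receive from a closed channel never blocks
	close(b)
	for i := 0; i < n; i++ {
		select {
		case <-a:
			first++
		case <-b:
			second++
		}
	}
	return first, second
}

func main() {
	first, second := pick(1000)
	// The split is roughly even; the exact counts vary from run to run.
	fmt.Println(first, second)
}
```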

Let's make our launch program print the countdown. The select statement below causes each iteration of the loop to wait up to 1 second for an abort, but no longer.

// gopl.io/ch8/countdown3

func main() {
    // ...create abort channel...

    fmt.Println("Commencing countdown. Press return to abort.")
    tick := time.Tick(1*time.Second)
    for countdown := 10; countdown > 0; countdown-- {
        fmt.Println(countdown)
        select {
        case <-tick:
            // Do nothing: continue the countdown.
        case <- abort:
            fmt.Println("Launch aborted")
            return
        }
    }
    launch()
}

The time.Tick function behaves as if it creates a goroutine that calls time.Sleep in a loop, sending an event each time it wakes up. When the countdown function above returns, it stops receiving events from tick, but the ticker goroutine is still there, trying in vain to send on a channel from which no goroutine is receiving. This is a goroutine leak (§ 8.4.4).

The Tick function is convenient, but it is appropriate only when the ticks are needed throughout the lifetime of the application. Otherwise, we should use this pattern:

ticker := time.NewTicker(1 * time.Second)
<- ticker.C    // receive from the ticker's channel
ticker.Stop()  // cause the ticker's goroutine to terminate

Sometimes we want to try to send or receive on a channel but avoid blocking if the channel is not ready; this is a non-blocking communication. A select statement can do this too. A select may have a default case, which specifies what to do when none of the other communications can proceed immediately.

The select statement below receives a value from the abort channel if there is one to receive; otherwise it does nothing. This is a non-blocking receive operation; doing it repeatedly is called polling a channel.

select {
case <- abort:
    fmt.Printf("Launch aborted\n")
    return
default:
    // do nothing      
}
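
The same idea generalizes to a small helper that reports whether a value was available. This is a sketch; tryReceive is a hypothetical name, not part of any library.

```go
package main

import "fmt"

// tryReceive returns a value from ch if one is available right now.
// The second result is false when nothing was received, either because
// the channel is empty or because it has been closed and drained.
func tryReceive(ch <-chan int) (int, bool) {
	select {
	case v, open := <-ch:
		return v, open
	default:
		return 0, false // nothing ready; do not block
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(tryReceive(ch)) // prints 0 false
	ch <- 7
	fmt.Println(tryReceive(ch)) // prints 7 true
}
```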

The zero value for a channel is nil. Perhaps surprisingly, nil channels are sometimes useful. Because send and receive operations on a nil channel block forever, a case in a select statement whose channel is nil is never selected.

This lets us use nil to enable or disable cases that correspond to features such as handling timeouts or cancellation, or responding to other input and output events. We will see an example in the next section.
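
A common use of this property, shown here as a sketch that is not taken from the book, is to switch off a case once its channel has been closed. The names sumBoth and feed are assumptions made for the example.

```go
package main

import "fmt"

// sumBoth adds up every value sent on a and b and returns once both
// channels have been closed.
func sumBoth(a, b <-chan int) int {
	total := 0
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // disable this case: a nil channel is never ready
				continue
			}
			total += v
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			total += v
		}
	}
	return total
}

// feed sends vals on a new channel from a goroutine, then closes it.
func feed(vals ...int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range vals {
			ch <- v
		}
	}()
	return ch
}

func main() {
	fmt.Println(sumBoth(feed(1, 2, 3), feed(10, 20))) // prints 36
}
```

Without the assignment to nil, a closed channel would stay permanently ready and the loop would spin on it.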

Example: concurrent directory traversal

du = disk usage

In this section, we will build a program that reports the disk usage of the directories specified on the command line, like the Unix du command. Most of the work is done by the walkDir function below, which enumerates the entries of a directory using the dirents helper function.

// gopl.io/ch8/du1
// walkDir recursively walks the file tree rooted at dir
// and sends the size of each found file on fileSizes.
func walkDir(dir string, fileSizes chan<- int64) {
    for _, entry := range dirents(dir) { // This function is declared below
        if entry.IsDir() {
            subdir := filepath.Join(dir, entry.Name())
            walkDir(subdir, fileSizes)
        } else {
            fileSizes <- entry.Size()    
        }
    }
}


// dirents returns the entries of directory dir
func dirents(dir string) []os.FileInfo {
    entries, err := ioutil.ReadDir(dir)
    if err != nil {
        fmt.Fprintf(os.Stderr, "du1: %v\n", err)
        return nil
    }
    return entries
}

The ioutil.ReadDir function returns a slice of os.FileInfo, the same information that a call to os.Stat returns for a single file. For each subdirectory, walkDir calls itself recursively, and for each file, walkDir sends a message on the fileSizes channel. The message is the size of the file in bytes.

The main function shown below uses two goroutines. The background goroutine calls walkDir for each directory specified on the command line and finally closes the fileSizes channel. The main goroutine computes the sum of the file sizes it receives from the channel and finally prints the total.

package main

import (
    "flag"
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
)

func main() {
    // Determine the initial directories
    flag.Parse()  // parse the command-line arguments (see the example in the previous chapter)
    roots := flag.Args()
    if len(roots) == 0 {
        roots = []string{"."}
    }
    
    // Traverse the file tree
    fileSizes := make(chan int64)
    go func() {
        for _, root := range roots {
            walkDir(root, fileSizes)
        }
        close(fileSizes)
    }()
    
    
    // Print the results
    var nfiles, nbytes int64
    for size := range fileSizes {
        nfiles++
        nbytes += size
    }
    printDiskUsage(nfiles, nbytes)
}


func printDiskUsage(nfiles, nbytes int64) {
    fmt.Printf("%d files %.1f GB\n", nfiles, float64(nbytes)/1e9)
} 

This program pauses for a long while before printing its result:

$ go build gopl.io/ch8/du1
$ ./du1 $HOME /usr /bin /etc
213201 files  62.7 GB

The variant of du below prints the totals periodically, but only if the -v flag is specified. The background goroutine that loops over roots remains unchanged. The main goroutine now uses a ticker to generate events every 500ms, and a select statement to wait for either a file size message, in which case it updates the totals, or a tick event, in which case it prints the current totals. If the -v flag is not specified, the tick channel remains nil, and its case in the select is effectively disabled.

// gopl.io/ch8/du2
var verbose = flag.Bool("v", false, "show verbose progress messages")

func main() {
    // ...start background goroutine...
    
    // Print the results periodically
    var tick <-chan time.Time  // Unidirectional channel
    if *verbose {
        tick = time.Tick(500*time.Millisecond)
    }
    var nfiles, nbytes int64
loop:  // a label (not a keyword), used by the labeled break below
    for {
        select {
        case size, ok := <-fileSizes:
            if !ok {
                break loop  // fileSizes was closed
            }
            nfiles++
            nbytes += size
        case <-tick:
            printDiskUsage(nfiles, nbytes)
        }
    }
    printDiskUsage(nfiles, nbytes)  // final totals
}

Since the program no longer uses a range loop, the first select case must explicitly test whether the fileSizes channel has been closed, using the two-result form of the receive operation. If the channel has been closed, the program breaks out of the loop. The labeled break statement breaks out of both the select and the for loop; an unlabeled break would break out of only the select, causing the loop to begin its next iteration.
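
The loop can be exercised without touching the file system. The following sketch feeds it a few made-up sizes; the function name tally and the sample values are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// tally accumulates sizes until the channel is closed. A nil tick
// channel disables the progress case entirely.
func tally(sizes <-chan int64, tick <-chan time.Time) (nfiles, nbytes int64) {
loop:
	for {
		select {
		case size, ok := <-sizes:
			if !ok {
				break loop // leaves the for loop, not just the select
			}
			nfiles++
			nbytes += size
		case <-tick:
			fmt.Printf("%d files so far\n", nfiles)
		}
	}
	return nfiles, nbytes
}

func main() {
	sizes := make(chan int64)
	go func() {
		defer close(sizes)
		for _, s := range []int64{100, 200, 300} {
			sizes <- s
		}
	}()
	fmt.Println(tally(sizes, nil)) // prints 3 600
}
```

Replacing break loop with a plain break would make the function spin forever once sizes is closed, because the closed channel stays ready on every iteration.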

Now the program will leisurely print the update stream for us:

$ go build gopl.io/ch8/du2
$ ./du2 -v $HOME /usr /bin /etc
28608 files  8.3 GB
54147 files  10.3 GB
93591 files  15.1 GB
127169 files  52.9 GB
175931 files  62.2 GB
213201 files  62.7 GB

However, it still takes too long to finish. There's no reason why all the calls to walkDir can't be done concurrently, thereby exploiting parallelism in the disk system. The third version of du, below, creates a new goroutine for each call to walkDir. It uses a sync.WaitGroup (§ 8.5) (author's note: a counter used for synchronization, introduced earlier) to count the number of calls to walkDir that are still active, and a closer goroutine to close the fileSizes channel when the counter drops to zero.

// gopl.io/ch8/du3

func main() {
    // ...determine roots...
    // Traverse each root of the file tree in parallel.
    fileSizes := make(chan int64)
    var n sync.WaitGroup
    for _, root := range roots {
        n.Add(1)  // increment before starting the goroutine; otherwise n.Wait below could return early and close the channel
        go walkDir(root, &n, fileSizes)
    }
    go func() {
        n.Wait()
        close(fileSizes)
    }()
    // ...select loop, the same as in du2 (nfiles, nbytes and tick are declared as before)...
loop:
    for {
        select {
        case size, ok := <-fileSizes:
            if !ok {
                break loop  // fileSizes was closed
            }
            nfiles++
            nbytes += size
        case <-tick:
            printDiskUsage(nfiles, nbytes)
        }
    }
    printDiskUsage(nfiles, nbytes)  // final totals
}

func walkDir(dir string, n *sync.WaitGroup, fileSizes chan<- int64) {
    defer n.Done()
    for _, entry := range dirents(dir) {
        if entry.IsDir() {
            n.Add(1)
            subdir := filepath.Join(dir, entry.Name())
            go walkDir(subdir, n, fileSizes)  // Note that the original recursion here becomes a new goroutine
        } else {
            fileSizes <- entry.Size() 
        }
    }
}
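
The WaitGroup bookkeeping is the delicate part of this version, so here is a self-contained sketch of the same structure that runs on an in-memory tree instead of the file system. The node type and the functions walk and total are hypothetical stand-ins, not the book's code.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for a directory entry.
type node struct {
	size     int64   // size of a file; unused for directories
	children []*node // non-nil for directories, nil for files
}

// walk sends the size of every file below n, starting one goroutine
// per subdirectory.
func walk(n *node, wg *sync.WaitGroup, sizes chan<- int64) {
	defer wg.Done()
	for _, c := range n.children {
		if c.children != nil {
			wg.Add(1) // count the new walk before starting it
			go walk(c, wg, sizes)
		} else {
			sizes <- c.size
		}
	}
}

// total returns the number of files under root and their combined size.
func total(root *node) (nfiles, nbytes int64) {
	sizes := make(chan int64)
	var wg sync.WaitGroup
	wg.Add(1)
	go walk(root, &wg, sizes)
	go func() { // closer goroutine
		wg.Wait()
		close(sizes)
	}()
	for s := range sizes {
		nfiles++
		nbytes += s
	}
	return nfiles, nbytes
}

func main() {
	root := &node{children: []*node{
		{size: 10},
		{children: []*node{{size: 20}, {size: 30}}},
	}}
	fmt.Println(total(root)) // prints 3 60
}
```

Each Add happens in a goroutine that has not yet called Done, so the counter cannot reach zero while a walk is still about to start.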

Since this program creates hundreds of goroutines at its peak, we have to change dirents to use a counting semaphore to prevent it from opening too many files at once, just as we did for the concurrent crawler in section 8.6:

// sema is a counting semaphore for limiting concurrency in dirents
var sema = make(chan struct{}, 20)

// dirents returns the entries of directory dir
func dirents(dir string) []os.FileInfo {
    sema <- struct{}{}        // acquire token
    defer func() { <-sema }()  // release token; the trailing () calls the anonymous function when the deferred call runs
    // ...
}

This version runs several times faster than the previous one, though the actual speedup depends on your machine and environment.
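
To see that a buffered channel really does cap concurrency, the sketch below measures the peak number of goroutines holding a token. The function name peakConcurrency, the worker count, and the sleep that stands in for reading a directory are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peakConcurrency starts workers goroutines, lets at most limit of them
// hold a token at once, and returns the highest number seen holding one.
func peakConcurrency(workers, limit int) int {
	sema := make(chan struct{}, limit) // counting semaphore
	var (
		mu           sync.Mutex
		active, peak int
		wg           sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sema <- struct{}{}        // acquire token
			defer func() { <-sema }() // release token

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // stand-in for reading a directory

			mu.Lock()
			active--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(peakConcurrency(50, 5)) // never more than 5
}
```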

Concurrent exit

Sometimes we need to instruct a goroutine to stop what it is doing, for example, in a web server performing a computation on behalf of a client that has disconnected.

There is no way for one goroutine to terminate another directly, since that would leave all its shared variables in undefined states. In the rocket launch program (§ 8.7) we sent a single value on a channel named abort, which the countdown goroutine interpreted as a request to stop itself. But what if we need to cancel two goroutines, or an arbitrary number?

One possibility might be to send as many events on the abort channel as there are goroutines to cancel. If some of the goroutines have already terminated themselves, however, our count will be too large, and our sends will get stuck. On the other hand, if those goroutines have spawned other goroutines, our count will be too small, and some goroutines will remain unaware of the cancellation. In general, it's hard to know how many goroutines are working on our behalf at any given moment. Moreover, when a goroutine receives a value from the abort channel, it consumes that value so that other goroutines won't see it. For cancellation, what we need is a reliable mechanism to broadcast an event over a channel so that many goroutines can see it as it occurs and can later see that it has occurred.

Recall that after a channel has been closed and drained of all sent values, subsequent receive operations proceed immediately, yielding zero values. We can exploit this to create a broadcast mechanism: don't send a value on the channel, close it.
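
A short sketch of the broadcast effect: any number of goroutines blocked on the same channel are all released by a single close. The function name stopAll is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// stopAll starts n goroutines that block on done, closes done once,
// and returns how many goroutines observed the cancellation.
func stopAll(n int) int {
	done := make(chan struct{})
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // proceeds for every receiver once done is closed
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}
	close(done) // one close reaches all n goroutines; no counting needed
	wg.Wait()
	return stopped
}

func main() {
	fmt.Println(stopAll(5)) // prints 5
}
```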

We can add cancellation to the du program from the previous section with a few simple changes. First, we create a cancellation channel on which no values are ever sent, but whose closure indicates that it is time for the program to stop what it is doing. We also define a utility function, cancelled, that checks or polls the cancellation state at the instant it is called.

// gopl.io/ch8/du4
var done = make(chan struct{})

func cancelled() bool {
    select {
    case <-done:
        return true
    default:
        return false    
    }
}

Next, we create a goroutine that reads from the standard input, which is typically connected to the terminal. As soon as any input is read (for instance, the user presses the return key), this goroutine broadcasts the cancellation by closing the done channel.

// Cancel traversal when input is detected
go func() {
    os.Stdin.Read(make([]byte, 1))  // read a single byte
    close(done)
}()

Now we need to make our goroutines respond to the cancellation. In the main goroutine, we add a third case to the select statement that tries to receive from the done channel. The function returns if this case is ever selected, but before it returns it must first drain the fileSizes channel, discarding all values until the channel is closed. It does this to ensure that any active calls to walkDir can run to completion without getting stuck sending to fileSizes.

for {
    select{
    case <- done:
        // Drain fileSizes to allow existing goroutines to finish.
        for range fileSizes {
            // Do nothing.
        }
        return
    case size, ok := <-fileSizes:
        // ...    
    }
}

The walkDir goroutine polls the cancellation status when it begins, and returns without doing anything if the status is set. This turns all goroutines created after cancellation into no-ops.

func walkDir(dir string, n *sync.WaitGroup, fileSizes chan<- int64) {
    defer n.Done()  // runs whether or not the walk was cancelled
    if cancelled() {
        return
    }
    for _, entry := range dirents(dir) {
        // ...
    }
}

It might be profitable to poll the cancellation status again within walkDir's loop, to avoid creating goroutines after the cancellation event. Cancellation involves a trade-off; a quicker response often requires more intrusive changes to program logic. Ensuring that no expensive operations ever occur after the cancellation event may require updating many places in your code, but often most of the benefit can be obtained by checking for cancellation in a few important places.

A little profiling of this program revealed that the bottleneck was the acquisition of a semaphore token in dirents. The select below makes this operation cancellable and reduces the typical cancellation latency of the program from hundreds of milliseconds to tens:

func dirents(dir string) []os.FileInfo {
    select {
    case sema <- struct{}{}:  // acquire token
    case <-done:
        return nil          // cancelled
    }
    defer func() { <-sema }() // release token
    // ...read directory...
}
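
The cancellable acquisition can be isolated into a helper and tested on its own. A minimal sketch, assuming a hypothetical function named acquire that takes the semaphore and the done channel as parameters:

```go
package main

import "fmt"

// acquire tries to take a token from sema, giving up if done is
// closed before a slot becomes free.
func acquire(sema chan<- struct{}, done <-chan struct{}) bool {
	select {
	case sema <- struct{}{}:
		return true // token acquired; the caller must release it later
	case <-done:
		return false // cancelled while waiting
	}
}

func main() {
	sema := make(chan struct{}, 1)
	done := make(chan struct{})
	fmt.Println(acquire(sema, done)) // prints true: the semaphore has room
	close(done)
	fmt.Println(acquire(sema, done)) // prints false: full semaphore, cancelled
}
```

If a slot is free and done is already closed, both cases are ready and select chooses between them at random, so a caller that must never start work after cancellation should also poll the cancellation state first.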

Now, when cancellation occurs, all the background goroutines quickly stop and the main function returns. Of course, when main returns, a program exits, so it can be hard to tell a main function that cleans up after itself from one that does not. There's a handy trick we can use during testing: if instead of returning from main in the event of cancellation, we execute a call to panic, then the runtime will dump the stack of every goroutine in the program. If the main goroutine is the only one left, then it has cleaned up after itself. But if other goroutines remain, they may not have been properly cancelled, or perhaps they have been cancelled but the cancellation takes time; a little investigation may be worthwhile. The panic dump often contains enough information to distinguish these cases.

Example: chat service

We'll finish this chapter with a chat server that lets several users broadcast text messages to each other. There are four kinds of goroutine in this program. There is one instance apiece of the main and broadcaster goroutines, and for each client connection there is one handleConn and one clientWriter goroutine. The broadcaster is a good illustration of how select is used, since it has to respond to three different kinds of messages.

The job of the main goroutine, shown below, is to listen for and accept incoming network connections from clients. For each one, it creates a new handleConn goroutine, just as in the concurrent echo server we saw at the start of this chapter.

// gopl.io/ch8/chat

func main() {
    listener, err := net.Listen("tcp", "localhost:8000")
    if err != nil {
        log.Fatal(err)
    }
    go broadcaster()
    for {  // accept loop: Accept blocks until a client connects
        conn, err := listener.Accept()
        if err != nil {
            log.Print(err)
            continue
        }
        go handleConn(conn)
    }
}

Next is the broadcaster. Its local variable clients records the current set of connected clients. The only information recorded about each client is the identity of its outgoing message channel.

type client chan<- string  // an outgoing message channel

var (
    entering = make(chan client)  // client is itself a channel type, so entering is a channel of channels
    leaving = make(chan client)
    messages = make(chan string)  // all incoming client messages
)

func broadcaster() {
    clients := make(map[client]bool)  // all connected clients
                                      // channels are comparable, so they can serve as map keys
    for {
        select {
        case msg := <-messages:
            // broadcast incoming message to all clients' outgoing message channels
            for cli := range clients {
                cli <- msg
            }
        case cli := <-entering:
            clients[cli] = true
        case cli := <-leaving:
            delete(clients, cli)
            close(cli)
        }
    }

}

The broadcaster listens on the global entering and leaving channels for announcements of arriving and departing clients. When it receives one of these events, it updates the clients set, and if the event was a departure, it closes the client's outgoing message channel. The broadcaster also listens for events on the global messages channel, to which each client sends all its incoming messages. When the broadcaster receives one of these events, it broadcasts the message to every connected client.
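
The broadcaster's select loop can be tried without any networking. The sketch below is an in-process variation under several assumptions of its own: the channels are passed as parameters rather than declared globally, closing messages is used as a shutdown signal, and the two clients are buffered channels read by the demo function rather than by clientWriter goroutines.

```go
package main

import "fmt"

type client chan<- string // an outgoing message channel

// broadcaster mirrors the select loop from the text. Closing messages
// shuts it down (an addition made for this demo).
func broadcaster(entering, leaving <-chan client, messages <-chan string) {
	clients := make(map[client]bool)
	for {
		select {
		case msg, ok := <-messages:
			if !ok {
				return
			}
			for cli := range clients {
				cli <- msg
			}
		case cli := <-entering:
			clients[cli] = true
		case cli := <-leaving:
			delete(clients, cli)
			close(cli)
		}
	}
}

// demo registers two clients, broadcasts one message, and returns what
// each client received.
func demo() []string {
	entering := make(chan client)
	leaving := make(chan client)
	messages := make(chan string)
	go broadcaster(entering, leaving, messages)

	a := make(chan string, 1) // buffered so the broadcaster never blocks here
	b := make(chan string, 1)
	entering <- a
	entering <- b
	messages <- "hello"
	got := []string{<-a, <-b}

	leaving <- a
	leaving <- b
	close(messages)
	return got
}

func main() {
	fmt.Println(demo()) // prints [hello hello]
}
```

Because only the broadcaster goroutine touches the clients map, the demo needs no mutex, just as in the full server.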

Now let's look at the per-client goroutines. The handleConn function creates a new outgoing message channel for its client and announces the arrival of this client to the broadcaster over the entering channel. Then it reads every line of text from the client, sending each line to the broadcaster over the global messages channel, prefixing each message with the identity of its sender. Once there is nothing more to read from the client, handleConn announces the departure of the client over the leaving channel and closes the connection.

func handleConn(conn net.Conn) {
    ch := make(chan string)    // outgoing client messages
    go clientWriter(conn, ch)
    
    who := conn.RemoteAddr().String()
    ch <- "You are " + who
    messages <- who + " has arrived"
    entering <- ch
    
    input := bufio.NewScanner(conn)
    for input.Scan() {
        messages <- who + ": " + input.Text()
    }
    // NOTE: ignoring potential errors from input.Err()
    
    leaving <- ch
    messages <- who + " has left"
    conn.Close()
}

func clientWriter(conn net.Conn, ch <-chan string) {
    for msg := range ch {
        fmt.Fprintln(conn, msg)  // NOTE: ignoring network errors
    }
}

In addition, handleConn creates a clientWriter goroutine for each client that receives messages broadcast to the client's outgoing message channel and writes them to the client's network connection. The client writer's loop terminates when the broadcaster closes the channel after receiving a leaving notification.

Author's note: in this example, a client (a send-only string channel) is what the goroutines pass among themselves to coordinate, rather than plain strings. Closing the channel (cli) is therefore necessary: it is what ends the range loop in clientWriter.

The display below shows the server in action with two clients in separate windows on the same computer, using netcat to chat:

$ go build gopl.io/ch8/chat
$ go build gopl.io/ch8/netcat3
$ ./chat &
$ ./netcat3
You are 127.0.0.1:64208               $ ./netcat3
127.0.0.1:64211 has arrived           You are 127.0.0.1:64211
Hi!
127.0.0.1:64208: Hi!                  127.0.0.1:64208: Hi!
                                      Hi yourself.
127.0.0.1:64211: Hi yourself.         127.0.0.1:64211: Hi yourself.
^C
                                      127.0.0.1:64208 has left
$ ./netcat3
You are 127.0.0.1:64216               127.0.0.1:64216 has arrived
                                      Welcome.
127.0.0.1:64211: Welcome.             127.0.0.1:64211: Welcome.
                                      ^C
127.0.0.1:64211 has left

While hosting n clients, this program runs 2n+2 concurrently communicating goroutines, yet it needs no explicit locking operations (§ 9.2). The clients map is confined to a single goroutine, the broadcaster, so it cannot be accessed concurrently. The only variables that are shared by multiple goroutines are channels and instances of net.Conn, both of which are concurrency safe. We'll talk more about confinement, concurrency safety, and the implications of sharing variables across goroutines in the next chapter.

Posted by BrandonE97 on Mon, 06 Dec 2021 17:42:35 -0800