Golang benchmark - measuring code performance

Keywords: Go, unit testing, benchmarking, software testing


Performance is an important metric when optimizing code or choosing between algorithms. For example, I recently needed a hash algorithm to generate signatures. My first instinct was to use MD5, but my IDE immediately popped up a warning: MD5 has known security problems, and another algorithm is recommended. So I considered switching to SHA256. The question is: would the new algorithm cause a sharp drop in performance?

This is where the Go language's built-in benchmark support comes in handy. Note that benchmark results are strongly affected by the environment, so keep it as stable as possible: avoid running other performance-hungry tasks during the test, and don't enable power-saving mode.

Go benchmark

Go benchmarks build on Go unit tests. A benchmark lives in the same package as the unit tests, in the same xxxx_test.go files.

Function naming convention

Unlike a unit test function, a benchmark function's name starts with Benchmark, and its signature looks like this:

func BenchmarkXxx(b *testing.B) { ... }



The basic idea of benchmarking is that the user writes a loop that calls the code under test, and the framework passes in the number of iterations to run. The framework keeps retrying with larger iteration counts until total time / iteration count stabilizes; that value is then taken as the per-call cost of the measured code in this environment. You can also end early by specifying the iteration count explicitly.

The framework passes the iteration count to us via b.N, so the benchmark should contain a loop like this:

for n := 0; n < b.N; n++ {
	// Algorithm under test
}

If there is time-consuming preparation before the benchmark loop starts, you can restart the timing with b.ResetTimer().

StartTimer() and StopTimer()

If you need time-consuming preparation before or after each iteration, you can combine b.StopTimer() and b.StartTimer() to exclude that preparation time.

Other methods

testing.B also provides the methods available in regular tests, such as Fail to fail the test, Run to run sub-benchmarks, and Helper to mark helper functions.


By default, go test runs only the unit tests and skips benchmarks.
To make go test run benchmarks, we need to pass the -bench flag, whose argument is a regular expression that selects which benchmarks to run. To match everything:

go test -bench=.

This will run all benchmarks under the current path.

If you want to run all benchmarks whose names contain go or lang:

go test -bench="go|lang"

In addition, since unit tests run by default, you can deliberately specify a nonexistent test-function name with -run to keep unit-test output from cluttering the benchmark results:

go test -bench="go|lang" -run=noExist

There are other relevant flags:

-benchmem    Report the benchmark's memory allocation statistics; equivalent to calling b.ReportAllocs() inside the benchmark.
-count n     Run each benchmark n times; the default is 1.
-timeout t   Panic if the test binary runs longer than t; the default is 10 minutes.
-cpu         Specify GOMAXPROCS; a comma-separated list (e.g. -cpu=2,4) runs the benchmarks once per value.
-benchtime   Specify a run duration (e.g. 2s) or an exact iteration count (e.g. 10x).


Back to our story: I need to measure the efficiency difference between generating signatures with MD5 and SHA256. In practice, I generate signatures for a UUID, so the benchmark is written as follows:


package util

import (
	"crypto/md5"
	"crypto/sha256"
	"testing"

	"github.com/google/uuid"
)

func BenchmarkSha256(b *testing.B) {
	target := []byte(uuid.New().String())
	for n := 0; n < b.N; n++ {
		sha256.Sum256(target)
	}
}

func BenchmarkMd5(b *testing.B) {
	target := []byte(uuid.New().String())
	for n := 0; n < b.N; n++ {
		md5.Sum(target)
	}
}

Run test:

admin@......:util$ go test -bench=. -run=none          
goos: darwin
goarch: amd64
pkg: ....../util
BenchmarkSha256-12       6331168               184 ns/op
BenchmarkMd5-12         11321952               103 ns/op
ok      ....../util    4.392s

See the -12 after the function name in the report? That is the value of GOMAXPROCS at run time. The next number, 6331168, is the final N: the iteration count at which the result was considered stable. The last column, 184 ns/op, means each iteration took 184 nanoseconds.

We can see that in this UUID-string scenario, SHA256 takes roughly 1.8 times as long as MD5, which is acceptable.

You can add a few more options to see more detail:

admin@......:util$ go test -bench=. -run=none -count=3 -cpu=2,4 -benchmem
goos: darwin
goarch: amd64
pkg: ....../util
BenchmarkSha256-2        6375644               178 ns/op               0 B/op          0 allocs/op
BenchmarkSha256-2        6575397               180 ns/op               0 B/op          0 allocs/op
BenchmarkSha256-2        6646250               182 ns/op               0 B/op          0 allocs/op
BenchmarkSha256-4        6566167               183 ns/op               0 B/op          0 allocs/op
BenchmarkSha256-4        6476132               190 ns/op               0 B/op          0 allocs/op
BenchmarkSha256-4        6327001               192 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-2          10067620               107 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-2          11456790               104 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-2          11314701               106 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-4          10312569               105 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-4          10565292               102 ns/op               0 B/op          0 allocs/op
BenchmarkMd5-4          11695822               103 ns/op               0 B/op          0 allocs/op
ok      ....../util    17.036s

In the above test, we asked for 3 runs each with 2 and 4 cores, plus memory statistics. The numbers fluctuate a little between runs, and adding cores has no noticeable effect on speed, presumably because both algorithms are called serially. Moreover, neither algorithm allocates memory per operation.


When you can't tell whether an optimization actually helps, or don't know which implementation will perform better, try a benchmark and measure it!

Posted by benphelps on Tue, 14 Sep 2021 12:03:01 -0700