Solving the problem of Go creating too many threads for file IO

Keywords: Go Linux Optimize TCP/IP


github code repository

A remote file performance test service implemented in Go. It records the tuning process and addresses the problem of Go creating too many threads when handling blocking file IO.

1. Configuration

Both the client and the server read a config.json configuration file, described below:

{
 "Port":"9999",             // Service port number
 "UserProtoMsg":true,       // Use the protobuf protocol; otherwise use JSON
 "PageSize":4096,           // Maximum bytes read per connection
 "MaxOpenFiles":102400,     // Maximum number of file handles the service process may use
 "LogCountPerFile":2000000, // Maximum number of log lines per log file
 "DataDir":"xx",            // Parent directory where the server stores files
 "UserPoolIoSched":"G",     // "C": use the C (cgo) thread pool for file IO; "G": use the Go goroutine pool; anything else: no IO pool
 "IoThreads":16,            // Number of threads/goroutines in the IO pool
 "PriorIoThreads":3,        // Number of high-priority cgo IO threads
 "SetCpuAffinity":true,     // Set CPU affinity for the cgo IO threads
 "WaitingQueueLen":1000000  // Length of the IO pool's buffered task queue
}

2. Server

The code is in the fs_server directory. The server provides file upload, download, query, and protocol performance test services.

Usage:

$ make clean
$ make
$ ./FsServer 
{"Port":"9999","UserProtoMsg":true,"PageSize":4096,"MaxOpenFiles":102400,"LogCountPerFile":2000000,"DataDir":"/home/stephen/devcloud/DATADIR","UserPoolIoSched":"C","IoThreads":32,"PriorIoThreads":0,"SetCpuAffinity":true,"WaitingQueueLen":1000000}
current thread id: 140694714423040, CPU 0
current thread id: 140694689244928, CPU 3
current thread id: 140694706030336, CPU 1
current thread id: 140694337353472, CPU 4
current thread id: 140694320568064, CPU 1
current thread id: 140694697637632, CPU 2
...
Use C io thread pool.......
raise pprof http server....

3. Client

The code is in the fs_client directory. The client exists only to benchmark the various interfaces and does not provide normal client functionality: uploads and downloads are served from memory, so no files are read from or written to the client's local disk. This follows the benchmarking approach of fastdfs. If the network bandwidth between client and server is lower than local disk throughput, it is recommended to run both on the same machine to reduce the network's impact.

Usage:

$ make clean
$ make
$ ./FsClient
{"Port":"9999","UserProtoMsg":true,"PageSize":4096,"MaxOpenFiles":102400,"LogCountPerFile":2000000,"DataDir":"","UserPoolIoSched":"","IoThreads":0,"PriorIoThreads":0,"SetCpuAffinity":false,"WaitingQueueLen":0}
[clean/bench/upload/download/exist/delete] corotine_num loop_num [file_size]

Running example:

$ ./FsClient upload 500 2000000 5000
{"Port":"9999","UserProtoMsg":true,"PageSize":4096,"MaxOpenFiles":102400,"LogCountPerFile":2000000,"DataDir":"","UserPoolIoSched":"","IoThreads":0,"PriorIoThreads":0,"SetCpuAffinity":false,"WaitingQueueLen":0}
Parmas:  &{3 500 2000000 5000}
Total success tasks:  2000000 , cost:  252746 ms.
qps:  7913.0826996272945
  • Test interfaces:
    • clean: remove files left over from previous performance tests
    • bench: test protocol performance
    • upload: test upload performance
    • download: test download performance
    • exist: test query performance
    • delete: test deletion performance
  • corotine_num: concurrency, i.e. the number of goroutines used
  • loop_num: total number of requests
  • file_size: size of the request file in bytes (only used by the upload interface; others may omit it)

4. Solving Go's IO thread problem

Anyone familiar with Go's scheduler knows that on Linux, if a file's f_op does not implement the poll function, multiplexing mechanisms such as epoll and poll cannot be used on it; regular disk files fall into this category. Go's approach for such files is: when a blocking IO operation runs long enough to hold up the other goroutines (G) on the same thread (M), the runtime leaves that thread blocked on the IO and creates a new thread to keep the remaining goroutines running. For a remote file service, the bottleneck is disk performance, i.e. file IO, so this ends up creating a large number of threads, essentially one thread per connection. By default the Go runtime allows at most 10000 threads; to exceed 10000 concurrent blocking operations you must raise the limit via debug.SetMaxThreads. Even then, the system's ulimit -u may impose a further restriction.

The core difficulty is that Go's scheduling mechanism is not well suited to workloads with heavy file IO.

This project addresses the problem by combining a thread pool, cgo, system performance analysis tools, and Go's pprof.

Main solutions:

  • Wrap each file IO operation as a task, encapsulating the Go interfaces that would otherwise trigger thread creation;
  • Use a channel to wait for the file IO to complete, yielding the calling goroutine's thread to other goroutines so that a long IO wait does not spawn a new thread;
  • Execute the file IO task either:
    • (1) via cgo, in a thread pool built on glibc, calling glibc's interfaces to perform the disk IO; or
    • (2) in a goroutine pool.

4.1 Data example

  • Test setup: PAGE_SIZE = 4K, request file size 50K, 2,000,000 requests, 500 concurrent

| Implementation | Time (ms) | QPS | OS threads created |
|---|---|---|---|
| 50K, no IO pool | 1699341 | 1176 | 512 |
| Go goroutine pool: 32 | 1731671 | 1154 | 48 |
| C thread pool: 32, with CPU affinity, no priority threads | 1913581 | 1045 | 50 |
| C thread pool: 32, no CPU affinity, no priority threads | 1813891 | 1102 | 50 |

5. Performance test records

  • Virtual machine specifications / log performance
  • fio disk performance test
  • rename optimization
  • Protocol parsing performance
  • Raw Go IO interface performance
  • IO pool performance

6. Conclusion

The idea of using an IO pool comes from the fastdfs project (implemented in C); it is worth a look if you are interested. There is also a Go implementation of fastdfs on github, but I did not find where it solves the problem of creating too many threads — perhaps I did not read the code carefully enough.

The goroutine-pool implementation confines blocking IO to a fixed set of goroutines, so at most that many threads are ever tied up waiting on IO, which achieves the goal. Its advantages: the implementation is simple and the idea is clear.

The cgo-based IO thread pool must additionally account for the cgo call overhead of inserting IO tasks from Go into the C task queue, lock contention, the thread pool implementation itself, notifying the caller when a task completes, and so on. Its advantages: a certain sense of achievement, and full control in C over thread attributes (priority, CPU affinity), life cycle, etc.

I hope this project is of some help to you, and I look forward to your suggestions and feedback.


Posted by sarabjit on Mon, 18 Oct 2021 11:34:54 -0700