Netty: a case where DefaultEventExecutorGroup fails to run handlers concurrently

Keywords: Netty

To improve performance, when a user-implemented ChannelHandler contains complex business logic or may block synchronously, the work is usually moved onto a thread pool to increase concurrency. There are two ways to add a thread pool: run the business logic in a user-defined thread pool, or use Netty's EventExecutorGroup mechanism to execute the ChannelHandler on separate threads.

Reproducing the case
Why the handler cannot run in parallel
Optimization strategy

Reproducing the case

The server uses Netty's built-in DefaultEventExecutorGroup to run the business handler off the I/O threads. The relevant code:

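import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class ConcurrentPerformanceServer {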

    static final EventExecutorGroup executor = new DefaultEventExecutorGroup(100);

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();

        try{
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {

                        @Override
                        protected void initChannel(SocketChannel socketChannel) throws Exception {
                            ChannelPipeline p = socketChannel.pipeline();
                            p.addLast(executor, new ConcurrentPerformanceServerHandler());
                        }
                    });
            ChannelFuture f = b.bind(8888).sync();
            f.channel().closeFuture().sync();
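        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
            // Also stop the business executor group so its threads do not linger
            executor.shutdownGracefully();
        }
    }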

}

During server initialization, a DefaultEventExecutorGroup with 100 threads is created and bound to the business handler. The intent is to isolate the I/O threads from the business-logic threads and to have the handler executed concurrently for better performance.
The business handler simulates a slow business operation with a random sleep, and uses a scheduled thread pool to report the server's throughput once per second. The relevant code:

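import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPerformanceServerHandler extends ChannelInboundHandlerAdapter {

    // Messages handled since the last report
    AtomicInteger counter = new AtomicInteger(0);
    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Print and reset the counter once per second
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            int qps = counter.getAndSet(0);
            System.out.println("The Server QPS is : " + qps);
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ((ByteBuf) msg).release();
        counter.incrementAndGet();
        // Simulate slow business logic: sleep for 0-999 ms
        Random random = new Random();
        TimeUnit.MILLISECONDS.sleep(random.nextInt(1000));
    }
}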

The client opens one long-lived TCP connection to the server and load-tests it at 100 QPS. The code is as follows:

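import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConcurrentPerformanceClientHandler extends ChannelInboundHandlerAdapter {

    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Once per second, send 100 messages of 100 bytes each over this connection
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            for (int i = 0; i < 100; i++) {
                ByteBuf firstMessage = Unpooled.buffer(100);
                for (int k = 0; k < firstMessage.capacity(); k++) {
                    firstMessage.writeByte((byte) i);
                }
                ctx.writeAndFlush(firstMessage);
            }
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }
}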

Test results:

The throughput is in the single digits. Since the handler sleeps for a random time of up to 1000 ms per message, this suggests that the business handler is being executed by a single thread rather than concurrently. The server's thread stack confirms it: only one of the 100 configured business threads is running. Because a single thread executes the handler that contains the business logic, throughput is low.
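
A rough estimate is consistent with this reading. The sketch below is not part of the original test. It assumes a simple capacity model in which each worker thread completes about 1000 / meanServiceMillis messages per second, and it takes the mean of random.nextInt(1000) as roughly 500 ms. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Netty: a crude capacity model.
class ThroughputEstimate {

    /**
     * Estimates sustained QPS as min(offered load, total worker capacity),
     * where one worker finishes 1000 / meanServiceMillis messages per second.
     */
    static double estimateQps(double offeredQps, int workerThreads, double meanServiceMillis) {
        double capacity = workerThreads * (1000.0 / meanServiceMillis);
        return Math.min(offeredQps, capacity);
    }

    public static void main(String[] args) {
        // One connection served by one executor thread, 100 QPS offered, ~500 ms per message.
        System.out.println(estimateQps(100, 1, 500));   // prints 2.0
        // The same load if all 100 threads could work in parallel.
        System.out.println(estimateQps(100, 100, 500)); // prints 100.0
    }
}
```

Under these assumptions a single thread sustains about 2 QPS, which is in line with the single-digit throughput observed, while 100 truly parallel threads would keep up with the offered load.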

Why the handler cannot run in parallel

To see why, look at the source code that binds the DefaultEventExecutorGroup to the business ChannelHandler (in the DefaultChannelPipeline class):

public final ChannelPipeline addLast(EventExecutorGroup group, String name, ChannelHandler handler) {
    AbstractChannelHandlerContext newCtx;
    synchronized (this) {
        checkMultiplicity(handler);
        newCtx = this.newContext(group, this.filterName(name, handler), handler);
        this.addLast0(newCtx);
        // Subsequent code omitted
    }
    // Subsequent code omitted
}

newContext creates and returns a DefaultChannelHandlerContext. During creation it calls childExecutor(group), which selects one EventExecutor from the EventExecutorGroup and binds it to the DefaultChannelHandlerContext. The relevant code is as follows:

private EventExecutor childExecutor(EventExecutorGroup group) {
    // Per-pipeline cache: one EventExecutor remembered for each EventExecutorGroup
    Map<EventExecutorGroup, EventExecutor> childExecutors = this.childExecutors;
    if (childExecutors == null) {
        childExecutors = this.childExecutors = new IdentityHashMap<>(4);
    }

    EventExecutor childExecutor = childExecutors.get(group);
    if (childExecutor == null) {
        // First use of this group in this pipeline: pick one executor and keep it
        childExecutor = group.next();
        childExecutors.put(group, childExecutor);
    }

    return childExecutor;
}

group.next() selects one EventExecutor from the EventExecutorGroup, and the choice is stored in the pipeline's childExecutors map. For a given TCP connection, therefore, the executor bound to the business ChannelHandler instance is a single DefaultEventExecutor, and it is that executor's execute method that gets called. DefaultEventExecutor extends SingleThreadEventExecutor, whose execute method puts the Runnable into a task queue that is drained by one thread.
As a result, no matter how many client threads drive load over one connection, only one DefaultEventExecutor thread on the server executes the business ChannelHandler, so the handler cannot run in parallel.
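
The pinning behaviour described above can be modelled with the JDK alone. The following is a minimal sketch, not Netty code: MiniGroup and MiniPipeline are hypothetical stand-ins for the executor group and the per-connection pipeline, and round-robin selection in next() is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only model of the pinning behaviour; none of these classes are Netty classes.
class PinnedExecutorModel {

    /** Stand-in for an executor group: N single-threaded executors, handed out round-robin. */
    static final class MiniGroup {
        private final List<ExecutorService> children = new ArrayList<>();
        private final AtomicInteger index = new AtomicInteger();

        MiniGroup(int threads) {
            for (int i = 0; i < threads; i++) {
                children.add(Executors.newSingleThreadExecutor());
            }
        }

        ExecutorService next() {
            return children.get(Math.floorMod(index.getAndIncrement(), children.size()));
        }

        void shutdown() {
            children.forEach(ExecutorService::shutdown);
        }
    }

    /** Stand-in for one connection's pipeline: caches the first executor chosen per group. */
    static final class MiniPipeline {
        private final Map<MiniGroup, ExecutorService> childExecutors = new IdentityHashMap<>(4);

        synchronized ExecutorService childExecutor(MiniGroup group) {
            return childExecutors.computeIfAbsent(group, MiniGroup::next);
        }
    }

    /** Dispatches the messages of ONE connection and returns how many distinct threads ran them. */
    static int threadsUsedByOneConnection(int groupThreads, int messages) {
        MiniGroup group = new MiniGroup(groupThreads);
        MiniPipeline pipeline = new MiniPipeline();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(messages);
        try {
            for (int i = 0; i < messages; i++) {
                // Every message of this connection goes to the same cached executor.
                pipeline.childExecutor(group).execute(() -> {
                    threadNames.add(Thread.currentThread().getName());
                    done.countDown();
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            group.shutdown();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        // A group of 100 executors, yet a single connection only ever uses one thread.
        System.out.println(threadsUsedByOneConnection(100, 1000)); // prints 1
    }
}
```

In this model the group size has no effect on a single connection: the cache in MiniPipeline returns the same single-threaded executor for every message.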

Optimization strategy

  1. If the total number of concurrent client connections is smaller than the number of threads the business logic needs, it is recommended to wrap each request message in a task and hand it to a back-end business thread pool for execution. The ChannelHandler then does not need to run complex business logic itself or be bound to an EventExecutorGroup.

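import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPerformanceServerHandler extends ChannelInboundHandlerAdapter {

    AtomicInteger counter = new AtomicInteger(0);
    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
    // Business thread pool shared by all connections
    static ExecutorService executorService = Executors.newFixedThreadPool(100);

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            int qps = counter.getAndSet(0);
            System.out.println("The Server QPS is : " + qps);
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        // The task below does not use the message, so it can be released right away
        ((ByteBuf) msg).release();
        // Hand the slow work to the business pool and return immediately
        executorService.execute(() -> {
            counter.incrementAndGet();
            Random random = new Random();
            try {
                TimeUnit.MILLISECONDS.sleep(random.nextInt(1000));
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing it
                Thread.currentThread().interrupt();
            }
        });
    }
}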

Results:

The Server QPS is : 59
The Server QPS is : 55
The Server QPS is : 61
The Server QPS is : 68
The Server QPS is : 43
The Server QPS is : 78

QPS increases significantly compared with the single-digit throughput seen earlier.
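
The shared pool helps even with a single connection because tasks submitted from one caller can be picked up by different pool threads at the same time. The JDK-only sketch below illustrates this under stated assumptions: no Netty is involved, a single submitting thread stands in for one connection's channelRead calls, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it only uses java.util.concurrent.
class SharedPoolParallelism {

    /**
     * Submits `tasks` jobs from a single caller thread and returns how many of them
     * observed every other job running at the same time.
     */
    static int tasksRunningTogether(int poolThreads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        CountDownLatch allStarted = new CountDownLatch(tasks);
        CountDownLatch allFinished = new CountDownLatch(tasks);
        AtomicInteger overlapped = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        allStarted.countDown();
                        // Returns true only once every task has started, i.e. they overlap.
                        if (allStarted.await(2, TimeUnit.SECONDS)) {
                            overlapped.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        allFinished.countDown();
                    }
                });
            }
            allFinished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return overlapped.get();
    }

    public static void main(String[] args) {
        // Eight jobs from one caller, eight pool threads: all eight run concurrently.
        System.out.println(tasksRunningTogether(8, 8));
    }
}
```

With as many pool threads as tasks, every task sees all the others in flight, which is the behaviour the single pinned executor cannot provide.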

  2. If the total number of concurrent client connections is greater than or equal to the number of threads the business logic needs, you can bind an EventExecutorGroup to the business ChannelHandler and run the business logic directly in that handler. For example, if the client creates 10 TCP connections, each sending 1 request message per second, and the size of the DefaultEventExecutorGroup is also set to 10, the overall QPS is 10.
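
The condition in this rule can be checked with a small calculation. The sketch below assumes that each new connection is assigned the next executor of the group in round-robin order. It is a simplified model with hypothetical names, not Netty code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: counts how many executors of a group actually receive work.
class ExecutorSpread {

    /** Number of distinct executors in use when connections are assigned round-robin. */
    static int busyExecutors(int connections, int groupThreads) {
        Set<Integer> used = new HashSet<>();
        for (int c = 0; c < connections; c++) {
            used.add(c % groupThreads); // each new connection takes the next executor in turn
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(busyExecutors(10, 10));   // prints 10: every thread has a connection
        System.out.println(busyExecutors(1, 100));   // prints 1: the failing case from this article
        System.out.println(busyExecutors(200, 100)); // prints 100: connections share executors
    }
}
```

In this model the number of busy threads is the smaller of the connection count and the group size, which is why binding an EventExecutorGroup only pays off when there are at least as many connections as business threads.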


Posted by jannoy on Thu, 20 Feb 2020 01:12:50 -0800