What is the heartbeat mechanism?
A heartbeat mechanism keeps a connection alive by periodically sending a minimal packet once the client and server have reached the TCP ESTABLISHED state with each other. It also lets each side monitor whether the service on the other side is still available.
The Role of Heartbeat Packets
Keep alive
Q: Why does a heartbeat keep connections alive, and why is it considered the most effective safeguard against network interruption in clusters and long-lived connections?
A: Public IP addresses are a scarce resource. If a connection holds a public mapping for a long time without sending any data, keeping that mapping allocated is a waste of network resources. For this reason, essentially all NAT routers regularly clear mapping table entries that have carried no traffic for a long time, both to recycle IP resources and to free the router's own memory. This is where the problem arises: the connection is cut in the middle, neither endpoint knows that the path to the other party is gone, and both continue to send data. There are two possible outcomes: a) the sender receives an RST packet from the NAT router and learns that the connection is broken; b) the sender receives no response at all, because the NAT simply drops the packet.
In practice the second case is the one we usually observe: the client does not learn that the connection is gone. This is how the heartbeat ties in with NAT. As long as heartbeat packets are sent more often than the NAT's idle timeout, the NAT keeps the mapping table entry instead of removing it, so the connection is not interrupted.
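The interaction between a NAT idle timeout and heartbeats can be sketched with a toy model. This is only an illustration under stated assumptions: the class NatTableSketch, its method names, and the timeout value used below are hypothetical and do not describe any real router.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a NAT mapping table that evicts entries idle for too long. */
final class NatTableSketch {
    private final long idleTimeoutMillis;
    // connection id -> time (ms) at which traffic was last seen
    private final Map<String, Long> lastSeen = new HashMap<>();

    NatTableSketch(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Any packet, including a heartbeat, refreshes the mapping entry. */
    void onPacket(String connection, long nowMillis) {
        lastSeen.put(connection, nowMillis);
    }

    /** Periodic cleanup: drop entries that carried no traffic within the timeout. */
    void evictIdle(long nowMillis) {
        lastSeen.values().removeIf(seen -> nowMillis - seen > idleTimeoutMillis);
    }

    boolean hasMapping(String connection) {
        return lastSeen.containsKey(connection);
    }
}
```

With a 60-second timeout, a connection that sends a heartbeat every 30 seconds keeps its entry, while a silent connection loses it at the next cleanup.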
Detecting whether the service on the other end is available
A TCP disconnection is not always noticed immediately; it may be detected only after a long delay, or not at all. If the client does not close the TCP connection normally, the four-way FIN/ACK close is never initiated and the server cannot know that the client has dropped off. A heartbeat packet is therefore needed to check whether the service on the other end is still alive and available.
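One common way to turn heartbeats into a liveness decision is to count how many heartbeat intervals have passed since the peer was last heard from. The sketch below assumes this approach; the class name, the interval, and the missed-heartbeat threshold are illustrative choices, not something prescribed by TCP or Netty.

```java
/** Tracks when the peer was last heard from and decides whether it still counts as alive. */
final class PeerLivenessSketch {
    private final long heartbeatIntervalMillis;
    private final int maxMissedHeartbeats;
    private long lastHeardMillis;

    PeerLivenessSketch(long heartbeatIntervalMillis, int maxMissedHeartbeats, long nowMillis) {
        this.heartbeatIntervalMillis = heartbeatIntervalMillis;
        this.maxMissedHeartbeats = maxMissedHeartbeats;
        this.lastHeardMillis = nowMillis;
    }

    /** Call for every message received from the peer, heartbeat or payload. */
    void onMessage(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** Number of whole heartbeat intervals that passed without hearing from the peer. */
    int missedHeartbeats(long nowMillis) {
        return (int) ((nowMillis - lastHeardMillis) / heartbeatIntervalMillis);
    }

    /** The peer is considered dead once too many consecutive heartbeats are missing. */
    boolean isPeerAlive(long nowMillis) {
        return missedHeartbeats(nowMillis) < maxMissedHeartbeats;
    }
}
```

Tolerating a few missed heartbeats before closing the connection avoids dropping a peer because of a single delayed packet.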
Implementation of keeping alive mechanism based on TCP
TCP keepalive relies on the operating system's TCP protocol stack to maintain long-lived connections. In Netty, for example, it is enabled by setting the SO_KEEPALIVE option when the channel is created.
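As a minimal sketch of what this option does at the socket level, the following uses a plain JDK java.net.Socket rather than Netty. The class and method names are illustrative. In Netty the same flag is set through ChannelOption.SO_KEEPALIVE on the bootstrap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class KeepAliveSocketSketch {
    /**
     * Creates an unconnected socket, switches SO_KEEPALIVE on,
     * and reports whether the option is now enabled.
     */
    static boolean enableAndCheckKeepAlive() {
        try (Socket socket = new Socket()) {
            // Ask the OS TCP stack to send keepalive probes on this socket when it is idle.
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The option only switches probing on; when the probes are sent is decided by the operating system.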
Problems: Through this option Netty can only switch SO_KEEPALIVE on or off. The other parameters are read from the system's sysctl settings. The key one is tcp_keepalive_time, the idle time before the first keepalive probe is sent, which is 7200 s by default; that is, probing starts only after the connection has been idle for two hours. If the client disconnects within those 2 hours, the server still has to maintain the connection for 2 hours, which wastes server resources. In addition, in scenarios that require real-time data transmission, the server cannot discover for 2 hours that the client is gone. When the server sends a keepalive probe, one of the following happens:
(1) The connection is normal: the client still exists and the network is in good condition. The client returns an ACK. After receiving the ACK, the server resets the timer and sends the next probe 2 hours later. If data is transmitted on the connection within those 2 hours, the timer restarts from the time of that transmission.
(2) The connection is broken: the client was closed abnormally or the network is disconnected. In either case the client does not respond. Because the server receives no response to its probe, it resends the keepalive probe at a fixed interval, up to a fixed number of times (on Linux, tcp_keepalive_intvl and tcp_keepalive_probes, 75 s and 9 by default), and then gives up on the connection.
(3) The client has crashed but has restarted: the server receives a response to its keepalive probe, but the response is a reset (RST), which causes the server to terminate the connection.
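To make the cost of case (2) concrete, the worst-case detection time can be estimated as the idle time before the first probe plus the time spent on unanswered probes. The sketch below assumes the usual Linux defaults (tcp_keepalive_time = 7200 s, tcp_keepalive_intvl = 75 s, tcp_keepalive_probes = 9); the class and method names are illustrative.

```java
final class KeepAliveTimingSketch {
    /**
     * Seconds until a silently dead peer is detected:
     * idle time before the first probe plus one interval per unanswered probe.
     */
    static long worstCaseDetectionSeconds(long keepAliveTimeSec, long probeIntervalSec, int probeCount) {
        return keepAliveTimeSec + probeIntervalSec * probeCount;
    }
}
```

With the assumed defaults this gives 7200 + 75 * 9 = 7875 seconds, a little over two hours and eleven minutes, which is why TCP keepalive alone is usually too slow for real-time services.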
Implementation of IdleStateHandler Based on Netty
What is IdleStateHandler
When a connection has been idle (no reads, no writes, or neither) for too long, an IdleStateEvent is triggered. You can handle the event by overriding the userEventTriggered method in your ChannelInboundHandler.
How to use it?
IdleStateHandler is both an outbound and an inbound handler, because it extends ChannelDuplexHandler. It is usually added to the pipeline in the initChannel method. You then override the userEventTriggered method in your own handler. When an idle event (read or write) occurs, that method is invoked and the specific event is passed in.
At this point you can try to write data to the remote socket through the context object and attach a listener that closes the socket if the write fails (Netty provides ChannelFutureListener.CLOSE_ON_FAILURE for exactly this purpose).
In this way, a simple heartbeat service is realized.
Source code analysis
Construction method
This class has three constructors, which assign the following four fields:
    private final boolean observeOutput;     // Whether to take slow outbound writes into account. Default is false (not considered).
    private final long readerIdleTimeNanos;  // Read idle time; 0 disables the event.
    private final long writerIdleTimeNanos;  // Write idle time; 0 disables the event.
    private final long allIdleTimeNanos;     // Read-or-write idle time; 0 disables the event.
You can set the read, write, and read-or-write idle timeouts separately (in seconds when no TimeUnit is passed). A value of 0 disables the corresponding check, so if all three are 0 the result is equivalent to not adding IdleStateHandler at all, and the connection behaves like an ordinary connection without idle detection.
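The rule that 0 disables a check can be sketched as follows. This is a simplified stand-in written for illustration, not Netty's constructor; the class IdleTimeoutsSketch and its members are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Simplified holder for the three idle timeouts, stored in nanoseconds. */
final class IdleTimeoutsSketch {
    final long readerIdleNanos;
    final long writerIdleNanos;
    final long allIdleNanos;

    IdleTimeoutsSketch(long readerIdle, long writerIdle, long allIdle, TimeUnit unit) {
        readerIdleNanos = toNanosOrDisabled(readerIdle, unit);
        writerIdleNanos = toNanosOrDisabled(writerIdle, unit);
        allIdleNanos = toNanosOrDisabled(allIdle, unit);
    }

    /** Zero or negative input means "disabled" and is stored as 0. */
    static long toNanosOrDisabled(long time, TimeUnit unit) {
        return time <= 0 ? 0 : unit.toNanos(time);
    }

    /** True when no timeout is active, i.e. the handler would never fire an event. */
    boolean isEntirelyDisabled() {
        return readerIdleNanos == 0 && writerIdleNanos == 0 && allIdleNanos == 0;
    }
}
```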
handlerAdded method
When an IdleStateHandler instance is added to a ChannelPipeline, it starts its timed detection by calling initialize(ctx). When it is removed from the ChannelPipeline or the Channel is closed, the timed detection is cancelled in destroy().
    public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
        if (ctx.channel().isActive() && ctx.channel().isRegistered()) {
            this.initialize(ctx);
        }
    }

    public void handlerRemoved(ChannelHandlerContext ctx) throws Exception {
        this.destroy();
    }
initialize
    private void initialize(ChannelHandlerContext ctx) {
        switch (state) {
        case 1:
        case 2:
            return;
        }
        state = 1;
        initOutputChanged(ctx);
        lastReadTime = lastWriteTime = ticksInNanos();
        if (readerIdleTimeNanos > 0) {
            // This schedule method calls the eventLoop's schedule method to add a timed task to the queue.
            readerIdleTimeout = schedule(ctx, new ReaderIdleTimeoutTask(ctx),
                    readerIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
        if (writerIdleTimeNanos > 0) {
            writerIdleTimeout = schedule(ctx, new WriterIdleTimeoutTask(ctx),
                    writerIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
        if (allIdleTimeNanos > 0) {
            allIdleTimeout = schedule(ctx, new AllIdleTimeoutTask(ctx),
                    allIdleTimeNanos, TimeUnit.NANOSECONDS);
        }
    }
For each timeout that is greater than 0, a timed task is created for the corresponding event. The state field is set to 1 to prevent repeated initialization. The fields used to observe outbound data are initialized by calling the initOutputChanged method. The code is as follows:
    private void initOutputChanged(ChannelHandlerContext ctx) {
        if (observeOutput) {
            Channel channel = ctx.channel();
            Unsafe unsafe = channel.unsafe();
            ChannelOutboundBuffer buf = unsafe.outboundBuffer();
            // Record data about the outbound buffer: the identity hash code of the
            // current message and the number of bytes still pending in the buffer.
            if (buf != null) {
                lastMessageHashCode = System.identityHashCode(buf.current());
                lastPendingWriteBytes = buf.totalPendingWriteBytes();
            }
        }
    }
run Method for Reading Events
The code is as follows:
    protected void run(ChannelHandlerContext ctx) {
        long nextDelay = readerIdleTimeNanos;
        if (!reading) {
            nextDelay -= ticksInNanos() - lastReadTime;
        }
        if (nextDelay <= 0) {
            // Reader is idle - set a new timeout and notify the callback.
            // Resubmit the task; the returned future is kept so the task can be cancelled later.
            readerIdleTimeout = schedule(ctx, this, readerIdleTimeNanos, TimeUnit.NANOSECONDS);
            boolean first = firstReaderIdleEvent;
            firstReaderIdleEvent = false;
            try {
                // Create the idle event.
                IdleStateEvent event = newIdleStateEvent(IdleState.READER_IDLE, first);
                // Pass the event on to the user's handler.
                channelIdle(ctx, event);
            } catch (Throwable t) {
                ctx.fireExceptionCaught(t);
            }
        } else {
            // Read occurred before the timeout - set a new timeout with shorter delay.
            readerIdleTimeout = schedule(ctx, this, nextDelay, TimeUnit.NANOSECONDS);
        }
    }
The initial value of nextDelay is readerIdleTimeNanos, the configured timeout. If no read is in progress when the task runs, the time since the last read is subtracted: nextDelay -= current time - last read time. If the result is less than or equal to 0, readerIdleTimeNanos is no larger than the idle time (current time - last read time), so the timeout has occurred.
An IdleStateEvent with the enumerated value READER_IDLE is created, and the channelIdle method is called to pass it to the next ChannelInboundHandler, where it is usually caught and processed by a user-defined ChannelInboundHandler.
In short, each read operation records a timestamp. When the timed task fires, it computes the interval between the current time and the last read time, and if the interval exceeds the configured timeout, the userEventTriggered method is triggered. It's that simple.
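The delay arithmetic described above can be isolated into a small pure function. This is a sketch that mirrors the calculation only; the class ReaderIdleSketch and its parameter names are illustrative, and the time values are passed in explicitly instead of being read from a clock.

```java
final class ReaderIdleSketch {
    /**
     * Remaining time before the reader counts as idle.
     * A result <= 0 means the timeout has expired and an idle event should fire;
     * a positive result is the delay to use when rescheduling the check.
     */
    static long nextDelayNanos(long readerIdleTimeNanos, long nowNanos, long lastReadNanos, boolean reading) {
        long nextDelay = readerIdleTimeNanos;
        if (!reading) {
            // Subtract the time that has already passed since the last read.
            nextDelay -= nowNanos - lastReadNanos;
        }
        return nextDelay;
    }

    static boolean isReaderIdle(long readerIdleTimeNanos, long nowNanos, long lastReadNanos, boolean reading) {
        return nextDelayNanos(readerIdleTimeNanos, nowNanos, lastReadNanos, reading) <= 0;
    }
}
```

For a timeout of 30 units, a last read 35 units ago yields -5 (idle), while a last read 15 units ago yields 15, so the task is rescheduled to run again after 15 units.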
run Method for Writing Events
The logic of the write task is basically the same as that of the read task. The only difference is an extra check for slow outbound data:
if (hasOutputChanged(ctx, first)) { return; }
If this method returns true, no event is triggered even though the timeout has expired. Its implementation is as follows:
    private boolean hasOutputChanged(ChannelHandlerContext ctx, boolean first) {
        if (observeOutput) {
            // If the last write time differs from the last recorded time,
            // a write has completed since the previous check, so update the record.
            if (lastChangeCheckTimeStamp != lastWriteTime) {
                lastChangeCheckTimeStamp = lastWriteTime;
                // The write happened between two calls of this method: trigger no event.
                if (!first) { // #firstWriterIdleEvent or #firstAllIdleEvent
                    return true;
                }
            }
            Channel channel = ctx.channel();
            Unsafe unsafe = channel.unsafe();
            ChannelOutboundBuffer buf = unsafe.outboundBuffer();
            // If there is an outbound buffer
            if (buf != null) {
                // Identity hash code of the current message in the outbound buffer
                int messageHashCode = System.identityHashCode(buf.current());
                // Total number of bytes still pending in the buffer
                long pendingWriteBytes = buf.totalPendingWriteBytes();
                // If either value differs from the previous one, output has changed:
                // refresh the recorded message reference and pending byte count.
                if (messageHashCode != lastMessageHashCode || pendingWriteBytes != lastPendingWriteBytes) {
                    lastMessageHashCode = messageHashCode;
                    lastPendingWriteBytes = pendingWriteBytes;
                    // Data is being written, only slowly: trigger no idle event.
                    if (!first) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
- If the user has not enabled observation of outbound data (observeOutput is false), the method returns false and the idle event is triggered.
- Otherwise, if the last write time differs from the last recorded check time, a write has completed since the previous check. The recorded value is updated, and first is checked: if it is false, this is not the first idle event, the write happened in the gap between two calls of the method, and no event is triggered.
- If that condition does not hold, the outbound buffer is fetched. If there is no buffer, no slow write is in progress and the idle event is triggered. Otherwise the identity hash code of the current message and the number of pending bytes are compared with the previously recorded values. If either differs, data is still being written out, only slowly. Both values are updated and kept for the next check.
- first is then checked again: if it is false, this is not the first idle event, and there is no need to trigger the idle event.
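The decision steps in this list can be modelled without any Netty types. The sketch below assumes observeOutput is enabled and replaces the channel and outbound buffer with plain parameters; the class OutputObserverSketch and its parameter names are hypothetical and exist only to make the branching visible.

```java
/** Plain-Java model of the slow-write check, with the buffer state passed in as arguments. */
final class OutputObserverSketch {
    private long lastChangeCheckTimeStamp;
    private int lastMessageHashCode;
    private long lastPendingWriteBytes;

    /**
     * @param lastWriteTime     time of the most recent completed write
     * @param hasBuffer         whether an outbound buffer exists
     * @param messageHashCode   identity hash of the message at the head of the buffer
     * @param pendingWriteBytes bytes still waiting to be written
     * @param first             whether this would be the first idle event
     * @return true if output changed and the idle event should be suppressed
     */
    boolean hasOutputChanged(long lastWriteTime, boolean hasBuffer, int messageHashCode,
                             long pendingWriteBytes, boolean first) {
        // A write completed since the previous check.
        if (lastChangeCheckTimeStamp != lastWriteTime) {
            lastChangeCheckTimeStamp = lastWriteTime;
            if (!first) {
                return true;
            }
        }
        // The buffer content moved: data is flowing, just slowly.
        if (hasBuffer && (messageHashCode != lastMessageHashCode
                || pendingWriteBytes != lastPendingWriteBytes)) {
            lastMessageHashCode = messageHashCode;
            lastPendingWriteBytes = pendingWriteBytes;
            if (!first) {
                return true;
            }
        }
        return false;
    }
}
```

Note that in this model the first idle event is never suppressed: both early returns are guarded by the check on first.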
run method for all events
This task class is called AllIdleTimeoutTask, meaning that it monitors both reads and writes: whenever a read or a write occurs, its time is recorded. The code logic is basically the same as for the write task, except here:
    long nextDelay = allIdleTimeNanos;
    if (!reading) {
        // Subtract the time elapsed since the most recent read or write;
        // a result <= 0 means the timeout has expired.
        nextDelay -= ticksInNanos() - Math.max(lastReadTime, lastWriteTime);
    }
The time calculation here is based on the later of the last read time and the last write time. Then, as with the write task, it checks whether a slow write is in progress. Finally, the ctx.fireUserEventTriggered(evt) method is called.
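The effect of taking the later of the two timestamps can be shown with a small pure function. As before, this is a sketch of the arithmetic only; the class AllIdleSketch and its parameters are illustrative.

```java
final class AllIdleSketch {
    /** Remaining time before the connection counts as idle in both directions. */
    static long nextDelayNanos(long allIdleTimeNanos, long nowNanos,
                               long lastReadNanos, long lastWriteNanos, boolean reading) {
        long nextDelay = allIdleTimeNanos;
        if (!reading) {
            // Only the most recent activity matters, whether it was a read or a write.
            nextDelay -= nowNanos - Math.max(lastReadNanos, lastWriteNanos);
        }
        return nextDelay;
    }
}
```

A recent write therefore postpones the all-idle event even if nothing has been read for a long time, and vice versa.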
This is usually the most commonly used mode. The constructor call is typically:
pipeline.addLast(new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));
The reader and writer timeouts are both 0, which disables them. The 30 means that an event is triggered if neither a read nor a write occurs within 30 seconds. Note that when more than one timeout is non-zero, the corresponding tasks all run, so their events can overlap.