RocketMQ message retry


RocketMQ uses an ACK mechanism to ensure that messages are consumed. After processing a message, the consumer reports the result, success or failure, to the broker. Failed messages are redelivered after a delay determined by an internal schedule. Is a failed message redelivered indefinitely? How does the mechanism work internally? Let's look at it in detail.

1. Analysis

Message retries occur in the following scenarios:

  • The consumer explicitly returns ConsumeConcurrentlyStatus.RECONSUME_LATER, that is, the business logic explicitly asks for the message to be redelivered.
  • The consumer throws an exception, whether deliberately or by accident.
  • The message is never acknowledged because of network problems.

Note that when the listener throws an exception, whether the business logic throws it deliberately or not, the message is also redelivered. If the business code catches the exception and still returns CONSUME_SUCCESS, the message is not retried. Therefore, where retries are required, the consumer should return ConsumeConcurrentlyStatus.RECONSUME_LATER or null when it catches an exception, and should log the error along with the current retry count. Returning ConsumeConcurrentlyStatus.RECONSUME_LATER is recommended.

The broker retries automatically only when the consumption mode is MessageModel.CLUSTERING (cluster mode). Broadcast messages are not retried.

For a message that keeps failing, RocketMQ delivers it to the dead letter queue once the maximum number of retries (16 by default) is reached. We then need to monitor the dead letter queue and compensate manually for the messages in it.

Retries are driven by delay levels: the interval between retries grows as the number of retries increases.

private String messageDelayLevel = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

The time levels can be customized in the broker configuration, for example messageDelayLevel=1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h.
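To make the level string concrete, here is a minimal sketch of how such a string can be turned into a table of level number to delay. It is not the broker's own parser: the class DelayLevelTable and the method parse are hypothetical names, and the supported unit suffixes (s, m, h, d) are an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of RocketMQ: maps delay level -> delay in milliseconds.
class DelayLevelTable {

    // Parses a string such as "1s 5s 10s" into a table whose keys start at level 1.
    static Map<Integer, Long> parse(String levelString) {
        Map<Integer, Long> table = new LinkedHashMap<>();
        String[] parts = levelString.trim().split("\\s+");
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            char unit = part.charAt(part.length() - 1);
            long value = Long.parseLong(part.substring(0, part.length() - 1));
            long unitMillis;
            switch (unit) {
                case 's': unitMillis = 1000L; break;
                case 'm': unitMillis = 60_000L; break;
                case 'h': unitMillis = 3_600_000L; break;
                case 'd': unitMillis = 86_400_000L; break;
                default: throw new IllegalArgumentException("Unknown time unit in: " + part);
            }
            // Level numbers are 1-based: the first entry is level 1.
            table.put(i + 1, value * unitMillis);
        }
        return table;
    }

    public static void main(String[] args) {
        String defaults = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";
        parse(defaults).forEach((level, ms) ->
            System.out.println("level " + level + " -> " + ms + " ms"));
    }
}
```

With the default string, the table has 18 levels, and level 3 corresponds to 10 seconds.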

2. Code Implementation

2.1. Producers

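// Imports assume the Apache RocketMQ 4.x client package names.
import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.remoting.common.RemotingHelper;

public class Producer {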
    public static void main(String[] args) throws MQClientException, InterruptedException {

        DefaultMQProducer producer = new DefaultMQProducer("gumx_test_delay");
        producer.setNamesrvAddr("10.10.15.205:9876;10.10.15.206:9876");
        producer.start();
        for (int i = 0; i < 1; i++) {
            try {
                Message msg = new Message("TopicDelayTest" /* Topic */,
                    "TagA" /* Tag */,
                    ("Test latency messages==Hello RocketMQ ").getBytes(RemotingHelper.DEFAULT_CHARSET) /* Message body */
                );
                SendResult sendResult = producer.send(msg);
                System.out.printf("%s%n", sendResult);
            } catch (Exception e) {
                e.printStackTrace();
                Thread.sleep(1000);
            }
        }
        producer.shutdown();
    }
}

2.2. Consumers

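// Imports assume the Apache RocketMQ 4.x client package names.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.common.consumer.ConsumeFromWhere;
import org.apache.rocketmq.common.message.MessageExt;

public class Consumer {

    public static void main(String[] args) throws InterruptedException, MQClientException {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("gumx_test_delay_1");
        consumer.setNamesrvAddr("10.10.15.205:9876;10.10.15.206:9876");
        consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_FIRST_OFFSET);
        consumer.subscribe("TopicDelayTest", "*");
        consumer.registerMessageListener(new MessageListenerConcurrently() {
            @Override
            public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                ConsumeConcurrentlyContext context) {
                try {
                    MessageExt msg = msgs.get(0);
                    // "yyyy" is the calendar year; "YYYY" would be the week-based year
                    SimpleDateFormat sf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                    System.out.printf("Current time:%s Delay level:%s Retries:%s Topic:%s Real topic:%s Message content:%s %n",
                        sf.format(new Date()), msg.getDelayTimeLevel(), msg.getReconsumeTimes(), msg.getTopic(),
                        msg.getProperties().get("REAL_TOPIC"), new String(msg.getBody(), "UTF-8"));
                    int i = 1 / 0; // Deliberately fail so that the message is retried
                    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
                } catch (Exception e) {
                    // Ask the broker to redeliver the message later
                    return ConsumeConcurrentlyStatus.RECONSUME_LATER;
                }
            }
        });
        consumer.start();
        System.out.printf("Consumer Started.%n");
    }
}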

View the results:

The results show that the retry intervals follow the default delay levels 1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h. They also reveal something odd: the delay level jumps from 0 straight to 3. The default delay level of a normal message is 0, and the second delivery is the first real retry. Why does it start at level 3? Let's explore the source code to find out.

3. Source code analysis

Let's take a look at the process first.

3.1. Client Code Analysis

In RocketMQ's client source code DefaultMQPushConsumerImpl.java, the retry mechanism is explained. The source code is as follows:

private int getMaxReconsumeTimes() {
    // default reconsume times: 16
    if (this.defaultMQPushConsumer.getMaxReconsumeTimes() == -1) {
        return 16;
    } else {
        return this.defaultMQPushConsumer.getMaxReconsumeTimes();
    }
}

Consumers can set the maximum number of retries through maxReconsumeTimes. If it is not set (the value is -1), the default of 16 is used. Next, let's look at the client code that handles the consumption result.

The entry point is the run() method of ConsumeRequest, an inner class of ConsumeMessageConcurrentlyService:

long beginTimestamp = System.currentTimeMillis();
boolean hasException = false;
ConsumeReturnType returnType = ConsumeReturnType.SUCCESS;
try {
    ConsumeMessageConcurrentlyService.this.resetRetryTopic(msgs);
    if (msgs != null && !msgs.isEmpty()) {
        for (MessageExt msg : msgs) {
            MessageAccessor.setConsumeStartTimeStamp(msg, String.valueOf(System.currentTimeMillis()));
        }
    }
    status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);
} catch (Throwable e) {
    log.warn("consumeMessage exception: {} Group: {} Msgs: {} MQ: {}",
        RemotingHelper.exceptionSimpleDesc(e),
        ConsumeMessageConcurrentlyService.this.consumerGroup,
        msgs,
        messageQueue);
    hasException = true;
}

After obtaining the status for this batch of messages, run() calls the core method ConsumeMessageConcurrentlyService.processConsumeResult() to process it.

//ackIndex = Integer.MAX_VALUE
int ackIndex = context.getAckIndex();
if (consumeRequest.getMsgs().isEmpty())
    return;
//Consumption status
switch (status) {
    case CONSUME_SUCCESS:
        // Clamp ackIndex to the index of the last message in the batch
        if (ackIndex >= consumeRequest.getMsgs().size()) {
            ackIndex = consumeRequest.getMsgs().size() - 1;
        }
        int ok = ackIndex + 1;
        int failed = consumeRequest.getMsgs().size() - ok;
        this.getConsumerStatsManager().incConsumeOKTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), ok);
        this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(), failed);
        break;
    case RECONSUME_LATER:
        ackIndex = -1;
        this.getConsumerStatsManager().incConsumeFailedTPS(consumerGroup, consumeRequest.getMessageQueue().getTopic(),
            consumeRequest.getMsgs().size());
        break;
    default:
        break;
}

switch (this.defaultMQPushConsumer.getMessageModel()) {
    case BROADCASTING:
        for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
            MessageExt msg = consumeRequest.getMsgs().get(i);
            log.warn("BROADCASTING, the message consume failed, drop it, {}", msg.toString());
        }
        break;
    case CLUSTERING:
        List<MessageExt> msgBackFailed = new ArrayList<MessageExt>(consumeRequest.getMsgs().size());
        for (int i = ackIndex + 1; i < consumeRequest.getMsgs().size(); i++) {
            MessageExt msg = consumeRequest.getMsgs().get(i);
            // Send the failed message back to the broker
            boolean result = this.sendMessageBack(msg, context);
            if (!result) {
                msg.setReconsumeTimes(msg.getReconsumeTimes() + 1);
                msgBackFailed.add(msg);
            }
        }
        if (!msgBackFailed.isEmpty()) {
            consumeRequest.getMsgs().removeAll(msgBackFailed);

            this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue());
        }
        break;
    default:
        break;
}

If the result is CONSUME_SUCCESS, ackIndex is set to msgs.size() - 1. Now look at the loop that calls sendMessageBack: for (int i = ackIndex + 1; i < msgs.size(); i++). The loop body never runs, so nothing is sent back to the broker when consumption succeeds. If the result is RECONSUME_LATER, ackIndex is set to -1, so every message in the batch is sent back to the broker, that is, every message will be consumed again.
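The effect of ackIndex on the send-back loop can be checked with a small stand-alone sketch. It only mirrors the index arithmetic of the snippet above; AckIndexDemo, Status and indicesToSendBack are hypothetical names introduced for illustration and are not RocketMQ classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the ackIndex arithmetic; not RocketMQ code.
class AckIndexDemo {

    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    // Returns the batch indices that the send-back loop would visit.
    static List<Integer> indicesToSendBack(Status status, int ackIndex, int batchSize) {
        switch (status) {
            case CONSUME_SUCCESS:
                // The default ackIndex is Integer.MAX_VALUE, so it is clamped to the last index.
                if (ackIndex >= batchSize) {
                    ackIndex = batchSize - 1;
                }
                break;
            case RECONSUME_LATER:
                // Everything in the batch counts as failed.
                ackIndex = -1;
                break;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = ackIndex + 1; i < batchSize; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indicesToSendBack(Status.CONSUME_SUCCESS, Integer.MAX_VALUE, 3));
        System.out.println(indicesToSendBack(Status.RECONSUME_LATER, Integer.MAX_VALUE, 3));
        // A listener that lowered ackIndex to 0 acknowledges only the first message.
        System.out.println(indicesToSendBack(Status.CONSUME_SUCCESS, 0, 3));
    }
}
```

The third call shows why the loop starts at ackIndex + 1: a listener that acknowledges only part of a batch leaves the remaining messages to be sent back.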

If sending a message back fails, the message is consumed again on the consumer side after a 5-second delay. In other words, the consumer first sends the failed message back to the broker. If that succeeds, the broker takes over the retry. If it fails, the consumer keeps the message and resubmits the consumption task locally after a fixed 5 seconds.

To summarize, processConsumeResult() does the following:

1) Sets ackIndex according to the consumption result.
2) If consumption failed, handles the messages according to the consumption mode: in broadcast mode they are simply dropped, and in cluster mode they are sent back with sendMessageBack.
3) Updates the consumption progress. Whether consumption succeeded or failed, the consumption offset is advanced as if the messages had been consumed, because a failed message is retried as a newly created message.

Messages that could not be sent back to the broker are collected in msgBackFailed and handed to this.submitConsumeRequestLater(msgBackFailed, consumeRequest.getProcessQueue(), consumeRequest.getMessageQueue()), which schedules them to be consumed again locally after 5 seconds.

private void submitConsumeRequestLater(final List<MessageExt> msgs, 
		final ProcessQueue processQueue,  final MessageQueue messageQueue) {
    this.scheduledExecutorService.schedule(new Runnable() {
        @Override
        public void run() {
            ConsumeMessageConcurrentlyService.this.submitConsumeRequest(msgs, processQueue, messageQueue, true);
        }
    }, 5000, TimeUnit.MILLISECONDS);
}
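The fallback path can be sketched without a broker. In the example below, Msg is a hypothetical stand-in for MessageExt, and the Predicate plays the role of sendMessageBack; both are assumptions for illustration. Messages whose send-back fails get their retry count incremented locally and are rescheduled after 5 seconds, as in the source above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Stand-alone model of the client-side fallback; not RocketMQ code.
class SendBackFallbackDemo {

    // Hypothetical stand-in for MessageExt.
    static class Msg {
        final String id;
        int reconsumeTimes;
        Msg(String id) { this.id = id; }
    }

    // Tries to send each failed message back; returns the ones that must be retried locally.
    static List<Msg> collectSendBackFailures(List<Msg> failed, Predicate<Msg> sendMessageBack) {
        List<Msg> msgBackFailed = new ArrayList<>();
        for (Msg msg : failed) {
            if (!sendMessageBack.test(msg)) {
                // The broker never saw this failure, so the client counts the retry itself.
                msg.reconsumeTimes++;
                msgBackFailed.add(msg);
            }
        }
        return msgBackFailed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Msg> failed = new ArrayList<>();
        failed.add(new Msg("A"));
        failed.add(new Msg("B"));
        // Pretend the send-back fails for message B only.
        List<Msg> localRetry = collectSendBackFailures(failed, m -> !m.id.equals("B"));

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> {
            for (Msg m : localRetry) {
                System.out.println("local retry of " + m.id + ", reconsumeTimes=" + m.reconsumeTimes);
            }
        }, 5000, TimeUnit.MILLISECONDS);
        // Delayed tasks that are already scheduled still run after shutdown() by default.
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```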

3.2. Server Code Analysis

When consumption fails, the client reports it to the broker. On the broker side, the request is handled by the SendMessageProcessor.consumerSendMsgBack() method. Let's look at the core of its source code:

// Retry topic: %RETRY% + consumerGroup
String newTopic = MixAll.getRetryTopic(requestHeader.getGroup());
int queueIdInt = Math.abs(this.random.nextInt() % 99999999) % subscriptionGroupConfig.getRetryQueueNums();
int topicSysFlag = 0;
if (requestHeader.isUnitMode()) {
    topicSysFlag = TopicSysFlag.buildSysFlag(false, true);
}
TopicConfig topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageBackMethod(
    newTopic,
    subscriptionGroupConfig.getRetryQueueNums(),
    PermName.PERM_WRITE | PermName.PERM_READ, topicSysFlag);
if (null == topicConfig) {
    response.setCode(ResponseCode.SYSTEM_ERROR);
    response.setRemark("topic[" + newTopic + "] not exist");
    return response;
}
if (!PermName.isWriteable(topicConfig.getPerm())) {
    response.setCode(ResponseCode.NO_PERMISSION);
    response.setRemark(String.format("the topic[%s] sending message is forbidden", newTopic));
    return response;
}
MessageExt msgExt = this.brokerController.getMessageStore().lookMessageByOffset(requestHeader.getOffset());
if (null == msgExt) {
    response.setCode(ResponseCode.SYSTEM_ERROR);
    response.setRemark("look message by offset failed, " + requestHeader.getOffset());
    return response;
}

final String retryTopic = msgExt.getProperty(MessageConst.PROPERTY_RETRY_TOPIC);
if (null == retryTopic) {
    MessageAccessor.putProperty(msgExt, MessageConst.PROPERTY_RETRY_TOPIC, msgExt.getTopic());
}
msgExt.setWaitStoreMsgOK(false);
//Delay level
int delayLevel = requestHeader.getDelayLevel();

int maxReconsumeTimes = subscriptionGroupConfig.getRetryMaxTimes();
if (request.getVersion() >= MQVersion.Version.V3_4_9.ordinal()) {
    maxReconsumeTimes = requestHeader.getMaxReconsumeTimes();
}
// Once the maximum number of retries is reached, the message goes to the dead-letter queue
if (msgExt.getReconsumeTimes() >= maxReconsumeTimes
    || delayLevel < 0) {
    // Switch the topic to %DLQ% + consumerGroup
    newTopic = MixAll.getDLQTopic(requestHeader.getGroup());
    queueIdInt = Math.abs(this.random.nextInt() % 99999999) % DLQ_NUMS_PER_GROUP;
    // Create the dead-letter topic if it does not exist yet
    topicConfig = this.brokerController.getTopicConfigManager().createTopicInSendMessageBackMethod(newTopic,
        DLQ_NUMS_PER_GROUP,
        PermName.PERM_WRITE, 0
    );
    if (null == topicConfig) {
        response.setCode(ResponseCode.SYSTEM_ERROR);
        response.setRemark("topic[" + newTopic + "] not exist");
        return response;
    }
} else {
    // If no delay level was requested (delayLevel == 0), start from level 3 and add the retry count
    if (0 == delayLevel) {
        delayLevel = 3 + msgExt.getReconsumeTimes();
    }
    msgExt.setDelayTimeLevel(delayLevel);
}

The broker checks whether the message's current retry count has reached the maximum number of retries. If it has, or if the requested delay level is less than 0, the topic is switched to the dead-letter topic, named %DLQ% + consumerGroup, and the message is sent to the dead-letter queue for later handling.
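The routing decision boils down to a single condition. The following sketch restates it outside the broker; SendBackRouting and targetTopic are hypothetical names, and only the %RETRY% and %DLQ% prefixes come from the code above.

```java
// Stand-alone model of the retry/dead-letter routing decision; not RocketMQ code.
class SendBackRouting {

    static final String RETRY_PREFIX = "%RETRY%";
    static final String DLQ_PREFIX = "%DLQ%";

    // Picks the topic a sent-back message is written to.
    static String targetTopic(String consumerGroup, int reconsumeTimes,
                              int maxReconsumeTimes, int delayLevel) {
        if (reconsumeTimes >= maxReconsumeTimes || delayLevel < 0) {
            // Retries exhausted, or the client asked for no further retries.
            return DLQ_PREFIX + consumerGroup;
        }
        return RETRY_PREFIX + consumerGroup;
    }

    public static void main(String[] args) {
        String group = "gumx_test_delay_1";
        System.out.println(targetTopic(group, 0, 16, 0));   // still retrying
        System.out.println(targetTopic(group, 16, 16, 0));  // retries exhausted
        System.out.println(targetTopic(group, 2, 16, -1));  // negative delay level
    }
}
```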

A normal message takes the else branch. For a message being retried for the first time, delayLevel is 0 by default, and RocketMQ sets the level to 3 + reconsumeTimes. In other words, if no delay level is explicitly configured, the first retry uses the third level, so it happens 10 seconds after the first delivery. The retry topic is named %RETRY% + consumerGroup by default.

Once the delay level is set, the broker increments the message's retry count by 1 and persists the message. The logic is as follows:
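Combining the rule delayLevel = 3 + reconsumeTimes with the default level table gives the delay before each retry. The sketch below works this out; RetrySchedule and its methods are hypothetical names, the table is the default messageDelayLevel expressed in seconds, and the clamp to the highest level mirrors what the store does for oversized levels.

```java
// Stand-alone arithmetic for the default retry schedule; not RocketMQ code.
class RetrySchedule {

    // Default messageDelayLevel in seconds; index 0 is level 1.
    static final long[] DELAY_SECONDS = {
        1, 5, 10, 30, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 1200, 1800, 3600, 7200
    };

    // Delay level the broker assigns when the client requested level 0.
    static int effectiveDelayLevel(int requestedLevel, int reconsumeTimes) {
        return requestedLevel == 0 ? 3 + reconsumeTimes : requestedLevel;
    }

    // Delay in seconds before the retry that follows the given number of earlier retries.
    static long delaySeconds(int reconsumeTimes) {
        int level = effectiveDelayLevel(0, reconsumeTimes);
        // Levels above the maximum are clamped to the maximum level.
        level = Math.min(level, DELAY_SECONDS.length);
        return DELAY_SECONDS[level - 1];
    }

    // Total waiting time if every one of maxReconsumeTimes retries fails.
    static long totalSeconds(int maxReconsumeTimes) {
        long total = 0;
        for (int n = 0; n < maxReconsumeTimes; n++) {
            total += delaySeconds(n);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 0; n < 16; n++) {
            System.out.println("retry " + (n + 1) + " after " + delaySeconds(n) + " s");
        }
        System.out.println("total: " + totalSeconds(16) + " s");
    }
}
```

Under these defaults, the 16 retries span levels 3 through 18, and a message that never succeeds waits 17140 seconds in total (4 hours, 45 minutes and 40 seconds) before it reaches the dead-letter queue.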

MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
msgInner.setTopic(newTopic);
msgInner.setBody(msgExt.getBody());
msgInner.setFlag(msgExt.getFlag());
MessageAccessor.setProperties(msgInner, msgExt.getProperties());
msgInner.setPropertiesString(MessageDecoder.messageProperties2String(msgExt.getProperties()));
msgInner.setTagsCode(MessageExtBrokerInner.tagsString2tagsCode(null, msgExt.getTags()));

msgInner.setQueueId(queueIdInt);
msgInner.setSysFlag(msgExt.getSysFlag());
msgInner.setBornTimestamp(msgExt.getBornTimestamp());
msgInner.setBornHost(msgExt.getBornHost());
msgInner.setStoreHost(this.getStoreHost());
// Increment the retry count
msgInner.setReconsumeTimes(msgExt.getReconsumeTimes() + 1);

String originMsgId = MessageAccessor.getOriginMessageId(msgExt);
MessageAccessor.setOriginMessageId(msgInner, UtilAll.isBlank(originMsgId) ? msgExt.getMsgId() : originMsgId);
// Persist the message to the commitlog
PutMessageResult putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);

So what is msgInner? It is a MessageExtBrokerInner: for a retried message, RocketMQ creates a new MessageExtBrokerInner object, a class that extends MessageExt.

Next we follow the persistence logic into the putMessage(msgInner) method. The implementation class is DefaultMessageStore.java, and the core code is:

PutMessageResult result = this.commitLog.putMessage(msg);

Focus on this.commitLog.putMessage(msg): the commitLog is where the message is actually persisted.

We go on to commitLog's putMessage method and see the following core code snippets:

final int tranType = MessageSysFlag.getTransactionValue(msg.getSysFlag());
if (tranType == MessageSysFlag.TRANSACTION_NOT_TYPE
    || tranType == MessageSysFlag.TRANSACTION_COMMIT_TYPE) {
    // Delayed delivery applies when the delay level is greater than 0
    if (msg.getDelayTimeLevel() > 0) {
        // Clamp the delay level to the maximum configured level
        if (msg.getDelayTimeLevel() > this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel()) {
            msg.setDelayTimeLevel(this.defaultMessageStore.getScheduleMessageService().getMaxDelayLevel());
        }
        //Set the message topic to SCHEDULE_TOPIC_XXXX
        topic = ScheduleMessageService.SCHEDULE_TOPIC;
        //Set the message queue to the ID of the delayed message queue
        queueId = ScheduleMessageService.delayLevel2QueueId(msg.getDelayTimeLevel());
        //The original topic and message queue of the message are stored in attributes
        MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
        MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
        msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));
        msg.setTopic(topic);
        msg.setQueueId(queueId);
    }
} 

As you can see, a retried message has a delay level greater than 0, so it enters this branch. From this logic we learn that RocketMQ does not put the retried message back into its original queue. Instead, it stores the message under a different topic, SCHEDULE_TOPIC in the code. Let's see what that is.

public static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";

The topic name is changed to SCHEDULE_TOPIC_XXXX.

Here we can draw a conclusion:

For every message that fails to be consumed, RocketMQ creates a new copy of it (the MessageExtBrokerInner object mentioned above) and delivers it to a queue under the topic SCHEDULE_TOPIC_XXXX. A scheduled task then triggers the retry, and the retry interval follows the delayLevel schedule mentioned above, that is:

private String messageDelayLevel = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

At the same time, so that the message can find its way back, the original topic and queue ID are stored in its properties by the following code:

MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_TOPIC, msg.getTopic());
MessageAccessor.putProperty(msg, MessageConst.PROPERTY_REAL_QUEUE_ID, String.valueOf(msg.getQueueId()));
msg.setPropertiesString(MessageDecoder.messageProperties2String(msg.getProperties()));  

The original topic and queue ID are backed up here.

See the article on RocketMQ delay messages for a detailed analysis. Retried messages and delayed messages are handled the same way: both are written to a queue of the delay topic, and a scheduled task started in the background scans those queues and sends due messages to their target topic and queue for consumption. The difference is that a retried message goes to the retry topic %RETRY% + consumerGroup, which has only one queue (queue 0), whereas a delayed message goes back to its original topic and queue like an ordinary message.
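The rewrite can be modelled in a few lines. In the sketch below, Msg is a hypothetical stand-in for the broker's message object, the property keys "REAL_TOPIC" and "REAL_QID" are assumed to be the string values behind PROPERTY_REAL_TOPIC and PROPERTY_REAL_QUEUE_ID, and mapping a delay level to queue ID level - 1 is an assumption about delayLevel2QueueId.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the delay-topic rewrite; not RocketMQ code.
class ScheduleRewriteDemo {

    static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";

    // Hypothetical stand-in for the broker's internal message object.
    static class Msg {
        String topic;
        int queueId;
        int delayLevel;
        final Map<String, String> properties = new HashMap<>();
    }

    // Redirects a delayed message to the schedule topic and backs up its real destination.
    static void rewriteForDelay(Msg msg, int maxDelayLevel) {
        if (msg.delayLevel <= 0) {
            return; // not a delayed message, nothing to rewrite
        }
        if (msg.delayLevel > maxDelayLevel) {
            msg.delayLevel = maxDelayLevel;
        }
        // Assumed property keys for the original topic and queue ID.
        msg.properties.put("REAL_TOPIC", msg.topic);
        msg.properties.put("REAL_QID", String.valueOf(msg.queueId));
        msg.topic = SCHEDULE_TOPIC;
        // Assumption: each delay level has its own queue, numbered level - 1.
        msg.queueId = msg.delayLevel - 1;
    }

    public static void main(String[] args) {
        Msg msg = new Msg();
        msg.topic = "%RETRY%gumx_test_delay_1";
        msg.queueId = 0;
        msg.delayLevel = 3;
        rewriteForDelay(msg, 18);
        System.out.println(msg.topic + " queue " + msg.queueId + " " + msg.properties);
    }
}
```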

3.3. Dead Letter Business Processing

With the default mechanism, a message that keeps failing enters the dead letter queue after reaching the maximum number of retries.

We can also define our own retry limit according to business needs and check whether the current retry count has reached that threshold.

For example, suppose we decide that if the business still fails after three retries, retrying further is pointless. We can then acknowledge the message by returning ConsumeConcurrentlyStatus.CONSUME_SUCCESS to the broker so that it is not redelivered, and at the same time save it to a dead-letter table defined by our business. The business parameters are stored in the database, and compensation is carried out later by querying the dead-letter table.
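A rough sketch of this business-side policy follows. It does not use the RocketMQ API: Status is a local stand-in for ConsumeConcurrentlyStatus, the in-memory list stands in for a database table, and BusinessRetryPolicy, handle and the limit of 3 are hypothetical choices for illustration. In a real listener, the retry count would come from MessageExt.getReconsumeTimes().

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of a business-defined retry limit; not RocketMQ code.
class BusinessRetryPolicy {

    // Local stand-in for ConsumeConcurrentlyStatus.
    enum Status { CONSUME_SUCCESS, RECONSUME_LATER }

    static final int MAX_BUSINESS_RETRIES = 3;

    // In-memory stand-in for the business dead-letter table.
    static final List<String> deadLetterTable = new ArrayList<>();

    // Runs the business logic and decides what to report for this delivery.
    static Status handle(String body, int reconsumeTimes, Runnable businessLogic) {
        try {
            businessLogic.run();
            return Status.CONSUME_SUCCESS;
        } catch (RuntimeException e) {
            if (reconsumeTimes >= MAX_BUSINESS_RETRIES) {
                // Give up: record the message for manual compensation and stop redelivery.
                deadLetterTable.add(body);
                return Status.CONSUME_SUCCESS;
            }
            System.out.println("consume failed, retries so far: " + reconsumeTimes);
            return Status.RECONSUME_LATER;
        }
    }

    public static void main(String[] args) {
        Runnable alwaysFails = () -> { throw new IllegalStateException("business error"); };
        for (int retries = 0; retries <= MAX_BUSINESS_RETRIES; retries++) {
            System.out.println(handle("order-1", retries, alwaysFails));
        }
        System.out.println("dead-letter table: " + deadLetterTable);
    }
}
```

Returning CONSUME_SUCCESS after recording the message is the key design choice here: the broker stops redelivering, and responsibility for the message moves to the compensation job that reads the table.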

RocketMQ marks messages that reach the maximum number of retries (16) as dead-letter messages and delivers them to the DLQ dead-letter queue, where they need manual intervention. In the consumerSendMsgBack method of SendMessageProcessor, the logic checks whether the retry count has reached the maximum or whether the requested delay level is less than 0. In either case the message is moved to the dead-letter topic, named %DLQ% + consumerGroup.

The figure shows how a message flows across the related topics during the whole retry process.

Posted by kimberlc on Mon, 05 Aug 2019 20:23:26 -0700