Integrating Kafka with Spring -- the consumer side

Keywords: Programming kafka Spring xml Apache

Kafka consumer

Reliability assurance

When consuming data, a consumer needs to guarantee two things:

1. No duplicate consumption

2. No missed (lost) messages

Avoiding duplicates alone gives at-most-once delivery, and avoiding loss alone gives at-least-once; satisfying both amounts to exactly-once semantics.

Partition allocation policy

A consumer group contains multiple consumers, and a topic contains multiple partitions, so partition assignment is unavoidable: Kafka must decide which consumer consumes which partition. Kafka ships with two assignment strategies: RoundRobin (round-robin scheduling) and Range (the default).
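For illustration, the strategy can be selected per consumer through the partition.assignment.strategy property. A minimal sketch with the plain Kafka client, assuming the broker address and group id used later in this article (RangeAssignor is the default, so only RoundRobin needs to be set explicitly):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AssignmentStrategyExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.25.10:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Replace the default Range strategy with round-robin assignment
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                RoundRobinAssignor.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic1"));
        }
    }
}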

Reference: https://blog.csdn.net/u013256816/article/details/81123600 -- Zhu Xiaosi's blog (author of "Deep Understanding of Kafka: Core Design and Practice Principles" and "RabbitMQ Practice Guide")

offset

Consumer-side reliability is guaranteed by the offset. The offset here is not the message offset in the broker's log, but the consumer's consumption position (consumer offset), which is maintained on the broker side. Because a consumer may lose power or crash while consuming, it must record in real time which offset it has consumed, so that it can resume from that position after recovery.

Before Kafka 0.9, the consumer saved offsets in ZooKeeper by default. From version 0.9 onward, the consumer saves offsets in a built-in Kafka topic named __consumer_offsets.
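For illustration, the offsets a group has committed to __consumer_offsets can be read programmatically. A hedged sketch using the AdminClient API (available from kafka-clients 2.0; the group id and broker address are placeholders):

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommittedOffsetsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.25.10:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // The committed offsets that group "group1" stored in __consumer_offsets
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("group1")
                         .partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
        }
    }
}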

It is relatively easy to guarantee that the Consumer does not lose data while consuming, because the data is persisted in Kafka, so there is no need to worry about losing the messages themselves.

The maintenance of the offset is therefore the problem a Consumer must consider when consuming data.

So that we can concentrate on our own business logic, Kafka provides the ability to commit offsets automatically.

Parameters relevant to automatic offset commit:

enable.auto.commit: enables the auto-commit offset function (true)

auto.commit.interval.ms: the interval between automatic offset commits (default 5000 ms = 5 s)

This approach lets the consumer manage offsets itself; the application does not need to do anything explicitly. When enable.auto.commit is set to true, the consumer commits offsets every 5 seconds (as specified by auto.commit.interval.ms) after a poll() call. Like many other operations, automatic commit is driven by poll(): on each call to poll(), the consumer checks whether it is time to commit, and if so, commits the largest offsets returned by the last poll.
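A minimal sketch of an auto-committing poll loop with the plain Kafka client (broker address and topic are placeholders; poll(Duration) assumes kafka-clients 2.0 or later):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.25.10:9092");
        props.put("group.id", "group1");
        props.put("enable.auto.commit", "true");        // auto commit on
        props.put("auto.commit.interval.ms", "5000");   // commit every 5 s, piggybacked on poll()
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic1"));
            while (true) {
                // Each poll() both fetches records and, once the interval has
                // elapsed, commits the offsets returned by the previous poll
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d, value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}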

Note that this approach may lead to messages being consumed repeatedly. Suppose that after a poll the application is still processing the returned messages when, 3 seconds in, Kafka triggers a rebalance; because the offsets were never updated, that batch of messages will be consumed again after the rebalance.

Although committing offsets automatically is very convenient, it is time-driven, so it is hard for a developer to control exactly when a commit happens. Kafka therefore also supports committing offsets manually.

There are two ways to commit an offset manually: commitSync (synchronous) and commitAsync (asynchronous). What they have in common: both commit the highest offset of the batch returned by the current poll. The difference: commitSync blocks the current thread until the commit succeeds and retries automatically on failure (commits can still fail for uncontrollable reasons), while commitAsync has no failure-retry mechanism, so its commits may be lost. Because the synchronous commit has a retry mechanism, it is more reliable.

Parameter relevant to manual offset commit:

enable.auto.commit: set to false to disable the auto-commit offset function
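A minimal manual-commit sketch with the plain Kafka client, showing both styles; the processRecord helper is hypothetical business logic:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.25.10:9092");
        props.put("group.id", "group1");
        props.put("enable.auto.commit", "false");       // manual commit
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    processRecord(record);              // hypothetical business logic
                }
                // Synchronous: blocks and retries until the commit succeeds
                consumer.commitSync();
                // Asynchronous alternative: returns immediately, no retry on failure
                // consumer.commitAsync((offsets, e) -> { if (e != null) e.printStackTrace(); });
            }
        }
    }

    private static void processRecord(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d, value=%s%n", record.offset(), record.value());
    }
}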

Asynchronous commit has a further disadvantage: if the server fails the commit, the asynchronous commit is not retried. By contrast, a synchronous commit retries until it either succeeds or throws an exception to the application. Asynchronous commit does not implement retry because, with several asynchronous commits in flight at once, a retry could overwrite a newer offset. For example, suppose we issue an asynchronous commit of offset 2000 and then a second asynchronous commit of offset 3000. If the first commit fails while the second succeeds, and the first is then retried and succeeds, it effectively rolls the committed offset back from 3000 to 2000, leading to repeated message consumption.

Although the synchronous commit of offsets is more reliable, it blocks the current thread until the commit succeeds, so throughput suffers considerably. For this reason, asynchronous commit is the more common choice.
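A common compromise is to commit asynchronously on the hot path and make one final synchronous commit on shutdown. A fragment reusing consumer and processRecord from the sketch above, with running assumed to be a volatile shutdown flag:

try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record);      // hypothetical business logic
        }
        consumer.commitAsync();         // fast, non-blocking commit
    }
} finally {
    try {
        consumer.commitSync();          // reliable final commit before closing
    } finally {
        consumer.close();
    }
}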

Whether offsets are committed synchronously or asynchronously, messages can still be missed or consumed repeatedly: committing the offset first and consuming afterwards can cause messages to be missed, whereas consuming first and committing afterwards can cause messages to be consumed repeatedly. Therefore, to keep the data complete, we consume first and commit synchronously afterwards, and de-duplicate messages on the consumer side as far as possible.
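One way to de-duplicate on the consumer side is to key each record by its (topic, partition, offset) coordinates and skip anything already processed. A minimal in-memory sketch (a production system would persist this state; the handle method is hypothetical):

import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class Deduplicator {
    // Coordinates of records that have already been processed
    private final Set<String> processed = new HashSet<>();

    public void consume(ConsumerRecords<String, String> records) {
        for (ConsumerRecord<String, String> record : records) {
            String id = record.topic() + "-" + record.partition() + "-" + record.offset();
            if (processed.add(id)) {    // add() returns false for duplicates
                handle(record);
            }
        }
    }

    private void handle(ConsumerRecord<String, String> record) {
        System.out.println("processing " + record.value());
    }
}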

Spring Kafka consumer

spring-consumer.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns="http://www.springframework.org/schema/beans" xmlns:aop="http://www.springframework.org/schema/aop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
	http://www.springframework.org/schema/beans/spring-beans.xsd
	http://www.springframework.org/schema/context
	http://www.springframework.org/schema/context/spring-context.xsd
	http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop.xsd">

    <context:component-scan base-package="listener" />
    <!--<context:component-scan base-package="concurrent" />-->


    <bean id="consumerProperties" class="java.util.HashMap">
        <constructor-arg>
            <map>
                <!--broker cluster-->
                <entry key="bootstrap.servers" value="192.168.25.10:9092,192.168.25.11:9092,192.168.25.12:9092"/>
                <!--groupid-->
                <entry key="group.id" value="group1"/>
                <!--earliest: if a committed offset exists, resume consuming from it; if there is no committed offset, consume from the beginning-->
                <entry key="auto.offset.reset" value="earliest"/>
                <!--Automatic submission-->
                <entry key="enable.auto.commit" value="false"/>
                <!--Auto-commit interval (only takes effect when enable.auto.commit is true)-->
                <entry key="auto.commit.interval.ms" value="1000"/>
                <!--Timeout to detect consumer failure-->
                <entry key="session.timeout.ms" value="30000"/>
                <!--key deserializer-->
                <entry key="key.deserializer" value="org.apache.kafka.common.serialization.IntegerDeserializer"/>
                <!--value deserializer-->
                <entry key="value.deserializer" value="org.apache.kafka.common.serialization.StringDeserializer"/>
            </map>
        </constructor-arg>
    </bean>
    <!--consumer factory-->
    <bean id="consumerFactory" class="org.springframework.kafka.core.DefaultKafkaConsumerFactory">
        <constructor-arg>
            <ref bean="consumerProperties"/>
        </constructor-arg>
    </bean>
    <bean id="containerProperties" class="org.springframework.kafka.listener.config.ContainerProperties">
        <constructor-arg  >
            <list>
                <value>topic1</value>
                <value>topic2</value>
            </list>
        </constructor-arg>
        <property name="messageListener" ref="kafkaConsumerListener"/>
		<property name="pollTimeout" value="1000"/>
		<property name="AckMode" value="MANUAL"/>
    </bean>

    <bean id="messageListenerContainer" class="org.springframework.kafka.listener.KafkaMessageListenerContainer" >
        <constructor-arg ref="consumerFactory"/>
        <constructor-arg ref="containerProperties"/>
    </bean>

    <!-- Concurrent message listener container (concurrency = 3); init-method invokes doStart() -->
<!--    <bean id="messageListenerContainer" class="org.springframework.kafka.listener.ConcurrentMessageListenerContainer" init-method="doStart" >
        <constructor-arg ref="consumerFactory" />
        <constructor-arg ref="containerProperties" />
        <property name="concurrency" value="3" />
    </bean>-->
</beans>
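For reference, roughly the same container can be assembled in plain Java. A sketch against the spring-kafka 1.x API used in this article (note that in spring-kafka 2.x, ContainerProperties moved to the org.springframework.kafka.listener package):

import java.util.HashMap;
import java.util.Map;

import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.config.ContainerProperties;

public class ConsumerContainerConfig {
    public static KafkaMessageListenerContainer<Integer, String> container(
            KafkaConsumerListener listener) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "192.168.25.10:9092,192.168.25.11:9092,192.168.25.12:9092");
        props.put("group.id", "group1");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ContainerProperties containerProps = new ContainerProperties("topic1", "topic2");
        containerProps.setMessageListener(listener);
        containerProps.setPollTimeout(1000);
        containerProps.setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);

        return new KafkaMessageListenerContainer<>(
                new DefaultKafkaConsumerFactory<>(props), containerProps);
    }
}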

The available AckMode values:

RECORD: commit after every record is processed

BATCH (default): commit once per poll; the frequency depends on how often poll() is called

TIME: commit once the ackTime interval has elapsed (what is the difference from the auto-commit interval?)

COUNT: commit once the number of acknowledged records reaches ackCount

COUNT_TIME: commit as soon as either the ackTime or the ackCount condition is met first

MANUAL: the listener is responsible for acknowledging, but commits are still made in batches

MANUAL_IMMEDIATE: the listener is responsible for acknowledging, and every acknowledge() call commits immediately

KafkaConsumerListener class

(synchronous commit)

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.listener.AcknowledgingMessageListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class KafkaConsumerListener implements AcknowledgingMessageListener<Integer, String> {
    // Generic types match the Integer/String deserializers in spring-consumer.xml
    @Override
    public void onMessage(ConsumerRecord<Integer, String> record, Acknowledgment acknowledgment) {
        System.out.printf("offset= %d, key= %s, value= %s,topic= %s,partition= %s\n",
                record.offset(),
                record.key(),
                record.value(),
                record.topic(),
                record.partition());
        // Manual ack: with AckMode.MANUAL the container commits the offset after this call
        acknowledgment.acknowledge();
    }
}

test

    @Test
    public void consumer() throws InterruptedException {
        ApplicationContext context = new ClassPathXmlApplicationContext("spring-consumer.xml");
        System.out.println("listener started");
        // Block forever so the listener container can keep polling
        Thread.currentThread().join();
    }

Result:

offset= 57, key= null, value= 2019-11-19 03:40:45,topic= topic1,partition= 0
offset= 4929, key= null, value= 2019-11-19 03:40:47,topic= topic2,partition= 2
