Kafka, it's not hard

Keywords: Programming, Kafka, ZooKeeper, Apache, Spring

1. Introduction to Kafka

 

1.1. Main functions

According to the official website's introduction, Apache Kafka® is a distributed streaming platform with three main capabilities:

1: It lets you publish and subscribe to streams of records

2: It lets you store streams of records in a fault-tolerant way

3: It lets you process streams of records as they occur

 

1.2. Use cases

1: Building real-time streaming data pipelines that reliably move data between systems or applications

2: Building real-time streaming applications that transform or react to streams of data

 

1.3. Detailed introduction

Kafka is currently used mainly as a distributed publish-subscribe messaging system. Below is a brief introduction to Kafka's basic mechanics.

1.3.1 Message transmission process

[Figure: producers, topics, and consumers in a Kafka cluster]

The producer sends messages to the Kafka cluster. Before a message is sent, it is classified under a topic. In the figure above, two of the producers send messages classified under topic1, and the other sends messages classified under topic2.

A topic classifies messages. By assigning each message a topic, consumers can focus only on the messages in the topics they need.

The consumer maintains a long-lived connection to the Kafka cluster, constantly pulls messages from it, and can then process them.

From the above figure we can also see that the numbers of producers and consumers under the same topic need not correspond.

1.3.2 Kafka server message storage strategy

[Figure: a topic's partitions distributed across the Kafka cluster]

Any discussion of Kafka storage has to mention partitions. When creating a topic, you can specify the number of partitions at the same time: more partitions means higher throughput, but also more resources and a higher risk of unavailability. After receiving messages from producers, Kafka stores them in different partitions according to a balancing strategy.
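
For example, the partition count is just a flag on the topic-creation command introduced later in section 2.5.1; a sketch with three partitions (the topic name here is only an example):

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic test3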

[Figure: ordered message storage within a partition]

Within each partition, messages are stored in the order they arrive and are consumed in that same order, so the most recently received messages are consumed last.

1.3.3 Interaction with producers

[Figure: producers writing to topic partitions]

When a producer sends messages to the Kafka cluster, it can send them to a specific partition by specifying that partition directly

It can also distribute messages across partitions by specifying a balancing strategy (a partitioner)

If neither is specified, messages are stored in different partitions using the default random balancing strategy
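
A short Java sketch of these options (the broker address and topic name are placeholders; the full Producer API is shown in section 3.2):

// Sketch: choosing a partition when sending (placeholder broker and topic)
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<String, String>(props);
// 1) Explicit partition: this record always goes to partition 0
producer.send(new ProducerRecord<String, String>("test", 0, "key", "to partition 0"));
// 2) No partition given: the partitioner decides (the default one, or a
//    custom balancing strategy set via the "partitioner.class" property)
producer.send(new ProducerRecord<String, String>("test", "key", "partitioner decides"));
producer.close();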

1.3.4 Interaction with consumers

[Figure: two consumer groups reading the same topic with independent offsets]

When consumers consume messages, Kafka uses an offset to record the current consumption position

In Kafka's design, several different groups can consume the same topic at the same time. As shown in the figure, two different groups consume messages simultaneously; their recorded offsets differ and do not interfere with each other

For a single group, the number of consumers should not exceed the number of partitions, because within a group each partition can be bound to at most one consumer; that is, one consumer can consume multiple partitions, but one partition can be consumed by only one consumer in the group

Therefore, if the number of consumers in a group is greater than the number of partitions, the extra consumers will not receive any messages.
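
A sketch of this rule in configuration terms (the group names are hypothetical; the full Consumer API appears in section 3.3): consumers sharing a group.id divide the topic's partitions among themselves, while a different group.id keeps its own offsets and sees every message independently.

// Sketch: one topic, two consumer groups with hypothetical names
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("group.id", "group-a");   // members of group-a split the partitions
KafkaConsumer<String, String> consumerA = new KafkaConsumer<String, String>(props);
consumerA.subscribe(Arrays.asList("test"));
props.put("group.id", "group-b");   // group-b consumes independently, with its own offsets
KafkaConsumer<String, String> consumerB = new KafkaConsumer<String, String>(props);
consumerB.subscribe(Arrays.asList("test"));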

 

2. Kafka installation and use

 

2.1. Download

You can download the latest Kafka package from the official site at http://kafka.apache.org/downloads; choose the binary .tgz release (depending on your network, you may need a proxy to reach it). Here we choose version 0.11.0.1, the latest at the time of writing.

 

2.2. Installation

Kafka is written in Scala and runs on the JVM. Although it can also be used on Windows, Kafka mostly runs on Linux servers in practice, so we will use Linux for today's walkthrough.

First of all, make sure the JDK is installed on your machine; Kafka needs a Java runtime. Older Kafka releases also required a separate ZooKeeper installation, but newer releases ship with a built-in ZooKeeper, so we can use that directly.

For the simplest possible setup, installation is just unpacking the archive to any directory. Here we extract the Kafka package to the /home directory, as sketched below.
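
A minimal sketch of those steps (assuming the Scala 2.11 build of 0.11.0.1; adjust the file name to the archive you actually downloaded):

    java -version                              # confirm a JDK is installed first
    tar -xzf kafka_2.11-0.11.0.1.tgz -C /home  # unpack to /home
    cd /home/kafka_2.11-0.11.0.1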

 

2.3. Configuration

Under the Kafka directory there is a config folder containing the configuration files

consumer.properties: consumer configuration, used by the console consumer started in section 2.5. We use the defaults here

producer.properties: producer configuration, used by the console producer started in section 2.5. We use the defaults here

server.properties: Kafka server configuration. Only a few basic settings are introduced here

  1. broker.id declares the unique ID of this Kafka server in the cluster. It must be an integer, and every Kafka server in the cluster must have a distinct ID. We can use the default here
  2. listeners declares the address and port this Kafka server listens on. If Kafka runs on your local machine you may not need to configure it; the localhost address is used by default. If it runs on a remote server, you must configure it, and make sure the port (9092 here) is reachable, for example:

  listeners=PLAINTEXT://192.168.180.128:9092

  3. zookeeper.connect declares the address of the ZooKeeper that Kafka connects to. Because we are using the ZooKeeper bundled with this newer Kafka release, we can keep the default configuration:

  zookeeper.connect=localhost:2181

2.4. Operation

  1. Start ZooKeeper

cd into the Kafka directory and run

bin/zookeeper-server-start.sh config/zookeeper.properties

After ZooKeeper starts successfully, you will see output like the following

[Screenshot: ZooKeeper startup output]

  2. Start Kafka

cd into the Kafka directory and run

bin/kafka-server-start.sh config/server.properties

After Kafka starts successfully, you will see output like the following

[Screenshot: Kafka server startup output]

2.5. Create the first message

2.5.1 Create a topic

Kafka manages data of the same kind through topics; using one topic for one kind of data makes processing more convenient

Open a terminal in the Kafka directory and enter

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

This creates a topic named test

[Screenshot: topic creation output]

After the topic is created, you can enter

bin/kafka-topics.sh --list --zookeeper localhost:2181

to view the topics that have been created
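
To also see a topic's partition and replica details, the same tool offers a --describe option:

    bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test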

2.5.2 Create a message consumer

Open a terminal in the Kafka directory and enter

    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

This creates a consumer subscribed to the topic test

[Screenshot: console consumer waiting for messages]

After the consumer is created, nothing is printed yet, because no data has been sent

Don't worry, and don't close this terminal. Open a new terminal; in it we will create the first message producer

2.5.3 Create a message producer

Open a new terminal in the Kafka directory and enter

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

After execution you get an input prompt; type a message and press Enter to send it

[Screenshot: console producer input]

After sending a message, go back to the consumer terminal; you will see that it has printed the message we just sent

[Screenshot: consumer printing the received message]

3. Using Kafka in a Java program

Following the previous section, we now try to use Kafka from a Java program
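
The examples in this section use the plain kafka-clients 0.11.0.1 API, so make sure that dependency is on the classpath; the pom snippet in section 4.1 imports it.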

3.1 Create a topic

public static void main(String[] args) {
    // Create a topic
    Properties props = new Properties();
    props.put("bootstrap.servers", "192.168.180.128:9092");
    AdminClient adminClient = AdminClient.create(props);
    ArrayList<NewTopic> topics = new ArrayList<NewTopic>();
    NewTopic newTopic = new NewTopic("topic-test", 1, (short) 1);
    topics.add(newTopic);
    CreateTopicsResult result = adminClient.createTopics(topics);
    try {
        result.all().get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
}

 

The AdminClient API can be used to control the configuration of the Kafka server. Here we use

NewTopic(String name, int numPartitions, short replicationFactor)

to create a topic named "topic-test" with one partition and a replication factor of 1

3.2 Producer: sending messages

public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "192.168.180.128:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    Producer<String, String> producer = new KafkaProducer<String, String>(props);
    for (int i = 0; i < 100; i++)
        producer.send(new ProducerRecord<String, String>("topic-test", Integer.toString(i), Integer.toString(i)));
    producer.close();
}

 

After the producer sends its messages, the console consumer from section 2.5 can receive them. You can also consume them with the Java consumer program described next

3.3 Consumer: consuming messages

public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "192.168.12.65:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    final KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
    consumer.subscribe(Arrays.asList("topic-test"), new ConsumerRebalanceListener() {
        public void onPartitionsRevoked(Collection<TopicPartition> collection) {
        }

        public void onPartitionsAssigned(Collection<TopicPartition> collection) {
            // Set the offset back to the beginning of each assigned partition
            consumer.seekToBeginning(collection);
        }
    });
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}

 

Here we use the Consumer API to create an ordinary Java consumer program that listens on the topic named "topic-test". Whenever a producer sends a message to the Kafka server, our consumer receives it.

4. Using spring-kafka

spring-kafka is a Spring subproject, still in incubation at this point, which applies Spring's conveniences to make Kafka easier to use

4.1 Basic configuration

Like other Spring projects, some configuration is always necessary. Here we use Java configuration to set up our Kafka consumers and producers.

  1. Import the dependencies in the pom file
<!--kafka start-->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.11.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.11.0.1</version>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>1.3.0.RELEASE</version>
</dependency>
  2. Create the configuration class

We create a new class named KafkaConfig in the main source directory

@Configuration
@EnableKafka
public class KafkaConfig {
}
  3. Configure the topic

Add the topic configuration to the KafkaConfig class (a sketch of the beans follows below)

@Configuration
@EnableKafka
public class KafkaConfig {
    // topic configuration beans go here
}
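
A minimal sketch of what the topic configuration can look like in spring-kafka 1.3 (the broker address and topic name here are assumptions carried over from the earlier examples): a KafkaAdmin bean plus a NewTopic bean let the container create the topic on startup.

@Bean
public KafkaAdmin kafkaAdmin() {
    Map<String, Object> configs = new HashMap<String, Object>();
    // assumed broker address, the same one used in the other examples
    configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
    return new KafkaAdmin(configs);
}

@Bean
public NewTopic topicTest() {
    // topic name, partition count, and replication factor as in section 3.1
    return new NewTopic("topic-test", 1, (short) 1);
}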
  4. Configure the producer factory and KafkaTemplate

Append to the configuration class above

// producer config start
@Bean
public ProducerFactory<Integer, String> producerFactory() {
    return new DefaultKafkaProducerFactory<Integer, String>(producerConfigs());
}

@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<String, Object>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.180.128:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    return props;
}

@Bean
public KafkaTemplate<Integer, String> kafkaTemplate() {
    return new KafkaTemplate<Integer, String>(producerFactory());
}
// producer config end

  5. Configure the ConsumerFactory

Append to the configuration class above

// consumer config start
@Bean
public ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory = new ConcurrentKafkaListenerContainerFactory<Integer, String>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}

@Bean
public ConsumerFactory<Integer, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<Integer, String>(consumerConfigs());
}

@Bean
public Map<String, Object> consumerConfigs() {
    HashMap<String, Object> props = new HashMap<String, Object>();
    props.put("bootstrap.servers", "192.168.180.128:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    return props;
}
// consumer config end

4.2 Create a message producer

// Use spring-kafka's KafkaTemplate to send a single message; sending
// several messages only requires calling send() in a loop
public static void main(String[] args) throws ExecutionException, InterruptedException {
    AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(KafkaConfig.class);
    KafkaTemplate<Integer, String> kafkaTemplate = (KafkaTemplate<Integer, String>) ctx.getBean("kafkaTemplate");
    String data = "this is a test message";
    ListenableFuture<SendResult<Integer, String>> send = kafkaTemplate.send("topic-test", 1, data);
    send.addCallback(new ListenableFutureCallback<SendResult<Integer, String>>() {
        public void onFailure(Throwable throwable) {
        }

        public void onSuccess(SendResult<Integer, String> integerStringSendResult) {
        }
    });
}

4.3 Create a message consumer

First, we create a class for message listening. When the topic named "topic-test" receives a message, our listen method will be called.

public class SimpleConsumerListener {
    private final static Logger logger = LoggerFactory.getLogger(SimpleConsumerListener.class);
    private final CountDownLatch latch1 = new CountDownLatch(1);

    @KafkaListener(id = "foo", topics = "topic-test")
    public void listen(String message) {
        // do something with the received message here; the value arrives as a
        // String because the consumer factory uses StringDeserializer
        logger.info(message);
        this.latch1.countDown();
    }
}

We also need to register this class as a bean in KafkaConfig

Append to the configuration class

@Bean
public SimpleConsumerListener simpleConsumerListener() {
    return new SimpleConsumerListener();
}

By default, spring-kafka creates one thread for each listener method to pull messages from the Kafka server
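
If one thread per listener is not enough, the ConcurrentKafkaListenerContainerFactory configured in section 4.1 can run several consumers for the same listener. A sketch (the value 3 is an arbitrary example, and extra threads only help if the topic has at least that many partitions):

// add inside kafkaListenerContainerFactory() before returning the factory
factory.setConcurrency(3);  // up to three consumer threads per listener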

 

Finally

Thanks for reading and sharing.

Posted by RobbertvanOs on Fri, 29 Nov 2019 00:58:59 -0800