Apache Kafka Quick Start

Keywords: Kafka, Kafka Streams

1. Obtain Kafka

Download the latest version and extract it:

$ tar -xzf kafka_2.13-3.0.0.tgz
$ cd kafka_2.13-3.0.0

2. Start Kafka environment

Run the following commands in order to start all services:

# Start the ZooKeeper service
# Note: Soon, ZooKeeper will no longer be required by Apache Kafka.
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Then open another terminal session and run:

# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

Once all services have started successfully, you have a basic Kafka environment up and running, ready to use.

3. Create a topic and store your events

Kafka is a distributed event streaming platform. It lets you write, read, store, and process events (also called records or messages in the documentation) across many servers.

Example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements from IoT devices or medical equipment, and much more. These events are organized and stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.

So before you can write your first events, you must create a topic. Open another terminal session and run:

$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092

All of Kafka's command-line tools have additional options: run the kafka-topics.sh command without any arguments to print usage information. For example, it can also show you details such as the partition count of the new topic:

$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic:quickstart-events  PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: quickstart-events Partition: 0    Leader: 0   Replicas: 0 Isr: 0
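Events that carry a key are assigned to a partition by hashing that key, so all events with the same key land in the same partition and keep their relative order. Below is a simplified sketch of that idea in plain Java; it uses String.hashCode rather than the murmur2 hash Kafka's default partitioner actually applies, and the class and method names are illustrative, not part of any Kafka API.

```java
import java.util.List;

public class PartitionSketch {
    // Simplified stand-in for Kafka's default keyed partitioner:
    // hash the record key, force it non-negative, take it modulo the
    // partition count. (Kafka uses murmur2; hashCode keeps this runnable.)
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (String key : List.of("order-1", "order-2", "order-1")) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
        // "order-1" maps to the same partition both times,
        // so events for one key stay in order.
    }
}
```

The important property is determinism: the same key always yields the same partition, which is what gives Kafka per-key ordering.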

4. Write some events into the topic

Kafka clients communicate with the Kafka brokers over the network to write (or read) events. Once received, the brokers will store the events in a durable and fault-tolerant manner for as long as you need, even forever.

Run the console producer client to write a few events into your topic. By default, each line you enter will result in a separate event being written to the topic:

$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event

Press Ctrl-C to stop the producer client at any time.

5. Read events

Open another terminal session and run the console consumer client to read the events from the topic you just created:

$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event

Feel free to experiment: for example, switch back to your producer client to write additional events, and you will see them show up immediately in your consumer client.

Because events are durably stored in Kafka, they can be read as many times and by as many consumers as you want. You can easily verify this by opening yet another terminal session and re-running the previous command.
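This works because a Kafka topic partition behaves like an append-only log: reading does not remove events, and each consumer tracks its own read position (offset). The toy model below sketches that idea in plain Java; it is not real Kafka code, and all names in it are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class LogSketch {
    // Toy model of a topic partition: an append-only list of events.
    static class Partition {
        private final List<String> events = new ArrayList<>();
        void append(String event) { events.add(event); }
        String read(long offset) { return events.get((int) offset); }
        long endOffset() { return events.size(); }
    }

    // Toy consumer: keeps its own offset, so reading never affects
    // the log or any other consumer.
    static class Consumer {
        private long offset = 0;
        List<String> poll(Partition p) {
            List<String> batch = new ArrayList<>();
            while (offset < p.endOffset()) {
                batch.add(p.read(offset++));
            }
            return batch;
        }
    }

    public static void main(String[] args) {
        Partition p = new Partition();
        p.append("This is my first event");
        p.append("This is my second event");

        Consumer a = new Consumer();
        Consumer b = new Consumer();
        System.out.println(a.poll(p));  // both consumers see all events
        System.out.println(b.poll(p));  // reading removed nothing
    }
}
```

Real Kafka adds replication, retention policies, and consumer groups on top, but the independent-offset idea is the reason many consumers can each read the full history.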

6. Import/export your data as streams of events with Kafka Connect

You probably have lots of data in existing systems, such as relational databases or traditional messaging systems, along with many applications that already use those systems. Kafka Connect allows you to continuously ingest data from external systems into Kafka, and vice versa. This makes it very easy to integrate existing systems with Kafka. To make the process even easier, hundreds of such connectors are readily available.

See the Kafka Connect section to learn more about how to continuously import your data into Kafka and export it back out.

7. Process your events with Kafka Streams

Once your data is stored in Kafka as events, you can process it with the Kafka Streams client library for Java/Scala. It allows you to implement mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka topics. Kafka Streams combines the simplicity of writing and deploying standard Java or Scala applications on the client side with the benefits of Kafka's server-side cluster technology, making these applications highly available, scalable, fault-tolerant, and distributed. The library supports exactly-once processing, stateful operations and aggregations, windowing, joins, event-time-based processing, and much more.

To give you a first taste, here is how one would implement the popular WordCount algorithm:

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> textLines = builder.stream("quickstart-events");

KTable<String, Long> wordCounts = textLines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split(" ")))
            .groupBy((keyIgnored, word) -> word)
            .count();

wordCounts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

The Kafka Streams demo and the app development tutorial demonstrate how to code and run such a streaming application from start to finish.
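Setting the Kafka plumbing aside, the per-record logic of the WordCount topology can be sketched in plain Java: split each line into lowercase words (flatMapValues), group by word (groupBy), and keep a running count per word (count), with a HashMap standing in for the KTable. The class and method names here are illustrative, not part of the Kafka Streams API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    // Mirrors the Streams topology without any Kafka dependency:
    // lowercase and split each line, then count occurrences per word.
    static Map<String, Long> wordCounts(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split(" ")) {
                counts.merge(word, 1L, Long::sum);  // count() step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCounts(List.of(
                "This is my first event",
                "This is my second event"));
        System.out.println(counts.get("event"));  // 2
    }
}
```

The key difference in real Kafka Streams is that the input is unbounded and the counts are updated continuously as new events arrive, rather than computed once over a fixed list.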

8. Terminate the Kafka environment

Now that you have reached the end of the quickstart, feel free to tear down the Kafka environment, or continue exploring.

  1. Stop the producer and consumer clients with Ctrl-C, if you have not done so already
  2. Stop the Kafka broker with Ctrl-C
  3. Lastly, stop the ZooKeeper server with Ctrl-C

If you also want to delete all the data of your local Kafka environment, including any events you have created along the way, run:

$ rm -rf /tmp/kafka-logs /tmp/zookeeper


Congratulations! You have successfully completed the quickstart.

To learn more, we suggest the following next steps:

  • To understand Kafka in more detail, read the documentation. You also have your choice of Kafka books and academic papers
  • Browse the use cases to learn how other users in the worldwide community are getting value out of Kafka
  • Join a local Kafka meetup group and attend Kafka Summit, the main conference of the Kafka community

Posted by tcorbeil on Fri, 26 Nov 2021 15:35:08 -0800