Kafka Quick Start (6) - Kafka Cluster Deployment
I. Kafka Cluster Deployment Plan
1. Operating system selection
In general, production environments should deploy Kafka clusters on Linux operating systems for the following reasons:
(1) I/O model. Kafka's network layer is built on the Java Selector. On Linux the Selector is implemented with epoll, while on Windows it is implemented with select, so Kafka achieves more efficient I/O performance on Linux.
(2) Network transmission efficiency. Kafka transfers large amounts of data between disk and network. Deploying Kafka on Linux lets it benefit from the fast data transfer provided by zero-copy technology.
(3) Community support. The Apache Kafka community currently makes no commitment to fix Kafka bugs found on the Windows platform.
2. Disk
(1) Kafka provides high reliability through its own replication mechanism and load balancing through its partitioning mechanism, both at the software level, so Kafka's storage can be built from ordinary disks without RAID.
(2) Mechanical disks are adequate for a Kafka production environment, although SSDs perform noticeably better.
3. Disk capacity
When planning disk capacity, consider the number of new messages, message retention time, average message size, number of backups, whether compression is enabled, and so on.
Suppose a company's business sends 100,000,000 messages to the Kafka cluster every day, each message is stored in two copies to prevent data loss, messages are retained for 7 days (the default), the average message size is 1 KB, and Kafka's data compression ratio is 0.75.
100,000,000 messages of 1 KB per day, saved in two copies with a compression ratio of 0.75, take up 150 GB (100000000*1KB*2/1000/1000*0.75). Another 10% of disk space should be reserved for the Kafka cluster's index data, so the total is 165 GB per day. The data is retained for 7 days, so the planned disk capacity is 1155 GB (165 GB*7).
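The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name plan_disk_gb and its parameter names are made up for illustration, and the 10% index overhead is the figure assumed in the text.

```python
def plan_disk_gb(messages_per_day, avg_message_kb, copies,
                 compression_ratio, retention_days, index_overhead=0.10):
    """Estimate total disk capacity in GB (decimal units: 1 GB = 1,000,000 KB)."""
    # Raw volume written per day across all copies
    raw_gb_per_day = messages_per_day * avg_message_kb * copies / 1000 / 1000
    # Apply the compression ratio, then reserve extra space for index data
    daily_gb = raw_gb_per_day * compression_ratio * (1 + index_overhead)
    # Data is kept for the whole retention period
    return daily_gb * retention_days


# Figures from the example: 100,000,000 messages/day, 1 KB each, 2 copies,
# compression ratio 0.75, 7 days of retention
print(round(plan_disk_gb(100_000_000, 1, 2, 0.75, 7)))  # 1155
```

Changing any single input (for example three copies instead of two) shows immediately how sensitive the plan is to that assumption.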
4. Network bandwidth
Assume the company's data center runs on a gigabit network (1 Gbps) and the business needs to process 1 TB of data per hour. Assume each Kafka Broker machine may use at most 70% of the bandwidth, because exceeding that threshold may cause network packet loss; a single machine can therefore use up to about 700 Mbps. Usually two-thirds of that must additionally be reserved for other services, so the Kafka service itself can be allocated roughly 240 Mbps (700 Mbps / 3, rounded). Processing 1 TB in an hour requires about 2330 Mbps (1024*1024*8/3600); divided by 240, that is about 10 servers. If two additional copies of the data must be kept, multiply the number of servers by 3, giving 30.
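The same bandwidth reasoning, written as a small Python sketch. The function name brokers_needed and its parameters are illustrative only; the 70% usable fraction and the one-third share left for Kafka are the assumptions stated in the text, not fixed rules.

```python
import math


def brokers_needed(data_tb_per_hour, link_mbps=1000, usable_fraction=0.7,
                   kafka_share=1 / 3, copies=1):
    """Estimate how many servers are needed to carry the given traffic."""
    # Bandwidth one server can devote to Kafka traffic (about 233 Mbps here)
    per_broker_mbps = link_mbps * usable_fraction * kafka_share
    # 1 TB = 1024 * 1024 MB, 8 bits per byte, 3600 seconds per hour
    required_mbps = data_tb_per_hour * 1024 * 1024 * 8 / 3600
    # Round up to whole servers, then multiply by the total number of copies
    return math.ceil(required_mbps / per_broker_mbps) * copies


print(brokers_needed(1))            # 10
print(brokers_needed(1, copies=3))  # 30
```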
II. Configuration of Kafka cluster parameters
1. Broker End Parameters
Broker-side parameters are also known as static parameters (Static Configs). They can only be set in Kafka's configuration file server.properties, and the Broker process must be restarted for a change to take effect.
log.dirs: Specifies the file directory paths the Broker uses. There is no default value, so it must be set. In a production environment, log.dirs should be configured with multiple paths and, if conditions permit, each directory should be mounted on a different physical disk. This has two advantages. First, read and write performance improves, because multiple physical disks can read and write at the same time and deliver higher throughput. Second, failover becomes possible: Kafka 1.1 introduced a failover mechanism that automatically moves the data on a damaged disk to the remaining healthy disks so that the Broker keeps working. With this failover mechanism, Kafka can do without RAID.
zookeeper.connect: A comma-separated list, for example zk1:2181,zk2:2181,zk3:2181. To let several Kafka clusters share one ZooKeeper ensemble, give each cluster its own chroot, for example zk1:2181,zk2:2181,zk3:2181/kafka1; the chroot is written only once, at the end of the list.
listeners: Set up a listener for intranet access to the Kafka service.
advertised.listeners: Set up a listener for external network access to Kafka services.
auto.create.topics.enable: Whether Topics may be created automatically.
unclean.leader.election.enable: Whether Unclean Leader election is allowed.
auto.leader.rebalance.enable: Whether periodic Leader re-election is allowed; setting it to false is recommended in a production environment.
log.retention.{hours|minutes|ms}: Controls how long message data is kept. Priority: ms is highest, minutes second, hours lowest.
log.retention.bytes: Specifies the total disk capacity the Broker may use to store messages.
message.max.bytes: Controls the maximum message size the Broker can receive.
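The priority rule for log.retention.{hours|minutes|ms} can be illustrated with a short Python sketch. It only models the precedence described above on a plain dictionary of properties; the function name effective_retention_ms is hypothetical, and special values such as -1 are not handled.

```python
def effective_retention_ms(props):
    """Resolve the retention time in milliseconds: ms > minutes > hours."""
    if "log.retention.ms" in props:
        return int(props["log.retention.ms"])
    if "log.retention.minutes" in props:
        return int(props["log.retention.minutes"]) * 60 * 1000
    # Fall back to hours; 168 hours (7 days) is the default retention
    return int(props.get("log.retention.hours", 168)) * 60 * 60 * 1000


# minutes wins over hours when both are set
print(effective_retention_ms({"log.retention.hours": "24",
                              "log.retention.minutes": "30"}))  # 1800000
# nothing set: 7 days expressed in milliseconds
print(effective_retention_ms({}))  # 604800000
```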
2. Topic level parameters
If both the Topic level parameter and the Global Broker parameter are set, the Topic level parameter overrides the Global Broker parameter, and each Topic can set its own parameter value.
In a production environment, Topics belonging to different departments should be able to set their own retention times according to their own business needs. If only the global Broker parameter could be set, it would have to be the maximum retention time required by any business line, so overriding the Broker parameter with a Topic-level parameter is the better choice.
retention.ms: Specifies how long the Topic's messages are kept. The default is 7 days, meaning only the most recent 7 days of messages are kept. When set, it overrides the global Broker-side value.
retention.bytes: Specifies how much disk space to reserve for the Topic. It is typically used in multi-tenant Kafka clusters. The default is -1, which means disk usage is unlimited.
max.message.bytes: Specifies the maximum message size the Kafka Broker will accept for this Topic.
Topic-level parameters can be set either when the Topic is created or when it is modified. Setting them when modifying the Topic is recommended, because the Apache Kafka community may standardize on the kafka-configs script for setting Topic-level parameters in the future.
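As an illustration of setting Topic-level parameters on an existing Topic, the sketch below assembles a kafka-configs.sh command line without running it. It assumes the ZooKeeper-based tooling that ships with the Kafka 2.4 image used later in this article (newer Kafka releases take --bootstrap-server instead of --zookeeper); the ZooKeeper address, the topic name orders, and the helper name topic_config_command are made up for the example.

```python
import shlex


def topic_config_command(zk_connect, topic, overrides):
    """Build the argument list that alters topic-level configs."""
    # kafka-configs.sh expects overrides as key=value pairs joined by commas
    add_config = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "kafka-configs.sh",
        "--zookeeper", zk_connect,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", add_config,
    ]


# Hypothetical values: keep 3 days of data and accept messages up to 5 MB
cmd = topic_config_command("zk1:2181/kafka1", "orders",
                           {"retention.ms": 259200000,
                            "max.message.bytes": 5242880})
print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to review before running it inside the Kafka container.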
3. JVM parameters
Kafka version 2.0.0 has officially abandoned support for Java 7.
When interacting with clients, the Kafka Broker creates a large number of ByteBuffer instances on the JVM heap, so the JVM heap size must not be too small; 6 GB is the recommended setting.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
Another important JVM-side setting is the garbage collector. For Java 7, the CMS collector is recommended if the Broker machine has plenty of CPU resources; enable it with -XX:+UseConcMarkSweepGC. Otherwise use the throughput collector, enabled with -XX:+UseParallelGC. For Java 9, G1 is the default collector. Without any tuning G1 performs better than CMS, mainly because it produces fewer Full GCs and has fewer parameters to adjust, so G1 is a good choice.
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true"
4. Operating system parameters
File descriptor limit: ulimit -n. Setting it to a very large value is recommended, for example ulimit -n 1000000.
File system type: According to the test report on the official website, XFS performs better than ext4.
Swappiness: A small value such as 1 is recommended rather than 0. Setting swappiness to 0 completely prevents the Kafka Broker process from using swap space, so when physical memory is exhausted the operating system triggers the OOM killer, which picks a process to kill without warning the user. With a small non-zero value, Broker performance drops sharply once swap space starts being used, which gives you time to diagnose and tune the problem.
Flush interval (commit time): A message sent to Kafka is considered successfully written as soon as it reaches the operating system's page cache; it does not have to wait until the data is on disk. The operating system then periodically writes dirty pages from the page cache to the physical disk according to an LRU algorithm. The period is determined by the commit interval, which defaults to 5 seconds and can be increased to reduce physical disk writes. If the machine goes down before the page cache data has been written to disk, that data is lost, but because Kafka already provides multi-replica redundancy at the software level, trading a longer commit interval for performance is reasonable.
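A quick way to see how a machine currently stands against the first and third recommendations is sketched below in Python. It only reads the current values and changes nothing; the thresholds mirror the numbers suggested above, and the function name os_settings_report is invented for this example. It assumes a Linux host.

```python
import resource
from pathlib import Path


def os_settings_report(min_nofile=1_000_000, max_swappiness=1):
    """Report the open-file limit and vm.swappiness of the current host."""
    # Soft and hard limits on open file descriptors for this process
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # vm.swappiness is exposed under /proc on Linux
        swappiness = int(Path("/proc/sys/vm/swappiness").read_text())
    except (OSError, ValueError):
        swappiness = None  # not available on this system
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "nofile_ok": soft == resource.RLIM_INFINITY or soft >= min_nofile,
        "swappiness": swappiness,
        "swappiness_ok": swappiness is not None and swappiness <= max_swappiness,
    }


for name, value in os_settings_report().items():
    print(f"{name}: {value}")
```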
III. Docker Image Selection
1. Install docker
Install Docker: sudo yum install docker
Start Docker: sudo systemctl start docker
Docker version check: docker version
2. docker-compose installation
Docker-compose download: sudo curl -L https://github.com/docker/compose/releases/download/1.23.0-rc3/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
Docker-compose installation: sudo chmod +x /usr/local/bin/docker-compose
Docker-compose version check: docker-compose version
3. Docker image selection
zookeeper image selection:
docker search zookeeper
Choose the image with the most stars: docker.io/zookeeper
Kafka image selection:
docker search kafka
Choose the image with the most stars: docker.io/wurstmeister/kafka
kafka-manager image selection:
docker search kafka-manager
Select image: kafkamanager/kafka-manager
IV. Kafka Single-machine Deployment Scheme
1. Write the docker-compose.yml file
# Single-machine zookeeper + kafka + kafka-manager cluster
version: '2'
services:
  # Define the zookeeper service
  zookeeper-test:
    image: zookeeper                 # zookeeper image
    restart: always
    hostname: zookeeper-test
    ports:
      - "12181:2181"                 # host port:container port
    container_name: zookeeper-test   # container name
  # Define the kafka service
  kafka-test:
    image: wurstmeister/kafka        # kafka image
    restart: always
    hostname: kafka-test
    ports:
      - "9092:9092"                  # exposed Kafka port
      - "9999:9999"                  # exposed JMX_PORT
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 192.168.0.105
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-test:2181   # zookeeper service
      KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 30000   # zookeeper connection timeout
      KAFKA_LOG_CLEANUP_POLICY: "delete"
      KAFKA_LOG_RETENTION_HOURS: 120                 # keep message data for at most 120 hours
      KAFKA_MESSAGE_MAX_BYTES: 10000000              # maximum size of a message body in bytes
      KAFKA_REPLICA_FETCH_MAX_BYTES: 10000000
      KAFKA_GROUP_MAX_SESSION_TIMEOUT_MS: 60000
      KAFKA_NUM_PARTITIONS: 1                        # number of partitions
      KAFKA_DELETE_RETENTION_MS: 10000
      KAFKA_BROKER_ID: 1                             # broker ID
      KAFKA_COMPRESSION_TYPE: lz4
      # JMX options passed to the broker
      KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=192.168.0.105 -Dcom.sun.management.jmxremote.rmi.port=9999"
      JMX_PORT: 9999                                 # JMX port
    depends_on:
      - zookeeper-test               # start after zookeeper
    container_name: kafka-test
  # Define the kafka-manager service
  kafka-manager-test:
    image: kafkamanager/kafka-manager   # kafka-manager image
    restart: always
    container_name: kafka-manager-test
    hostname: kafka-manager-test
    ports:
      - "9000:9000"                  # exposed port for web access
    depends_on:
      - kafka-test                   # start after kafka
    environment:
      ZK_HOSTS: zookeeper-test:2181            # zookeeper address
      KAFKA_BROKERS: kafka-test:9092           # kafka address
      KAFKA_MANAGER_AUTH_ENABLED: "true"       # turn on security authentication
      KAFKA_MANAGER_USERNAME: kafka-manager    # Kafka Manager login user
      KAFKA_MANAGER_PASSWORD: 123456           # Kafka Manager login password
Before starting, confirm that the host ports used above (12181, 9092, 9999 and 9000) are not already occupied.
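One simple way to check whether the host ports are occupied is to try binding to each of them. The Python sketch below does this for the host ports published in the compose file; the helper names port_is_free and occupied_ports are illustrative, and the check only covers TCP on the chosen address.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # already in use, or binding is not permitted
        return True


def occupied_ports(ports, host="0.0.0.0"):
    """Return the subset of ports that cannot be bound right now."""
    return [port for port in ports if not port_is_free(port, host)]


# Host ports published by the compose file: zookeeper, kafka, JMX, kafka-manager
busy = occupied_ports([12181, 9092, 9999, 9000])
print("occupied ports:", busy if busy else "none")
```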
2. Start Services
Create a kafka directory, place the docker-compose.yml file in it, and run the following commands from that directory.
Start:
docker-compose up -d
Close:
docker-compose down
3. Verify the Kafka service
Enter the Kafka container (named kafka-test in docker-compose.yml):
docker exec -it kafka-test /bin/bash
Create a Topic:
kafka-topics.sh --create --zookeeper zookeeper-test:2181 --replication-factor 1 --partitions 3 --topic test
List Topics:
kafka-topics.sh --list --zookeeper zookeeper-test:2181
Produce messages:
kafka-console-producer.sh --broker-list kafka-test:9092 --topic test
Consume messages:
kafka-console-consumer.sh --bootstrap-server kafka-test:9092 --topic test --from-beginning
Open two terminals, run the producer command in one and the consumer command in the other. Each message typed into the producer terminal appears in the consumer terminal, which demonstrates the message queue at work.
4. Kafka Version Query
In the wurstmeister/kafka image, Kafka is installed under the /opt directory. Inside /opt, the kafka_2.12-2.4.0 directory is the Kafka installation directory.
Scala version: 2.12
Kafka version: 2.4.0
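The naming convention of the installation directory (kafka_<Scala version>-<Kafka version>) can be decoded mechanically. A small Python sketch, with a hypothetical helper name parse_kafka_dir:

```python
import re


def parse_kafka_dir(name):
    """Split a directory name such as kafka_2.12-2.4.0 into its versions."""
    match = re.fullmatch(r"kafka_(\d+\.\d+)-(\d+(?:\.\d+)+)", name)
    if match is None:
        raise ValueError(f"not a Kafka installation directory name: {name!r}")
    return {"scala": match.group(1), "kafka": match.group(2)}


print(parse_kafka_dir("kafka_2.12-2.4.0"))  # {'scala': '2.12', 'kafka': '2.4.0'}
```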
5. kafka-manager monitoring
Web access: http://127.0.0.1:9000
V. Error Resolution
1. Container Deletion Failed
Deleting a dead container fails:
docker rm -f $(docker ps -a --filter status=dead -q |head -n 1)
Error message:
ERROR: for f78856fb92e9_zoo1 Driver overlay2 failed to remove root filesystem f78856fb92e97f75ff4c255077de544b39351a4a2a3319737ada2a54df568032: remove /var/lib/docker/overlay2/2c257b8071b6a3d79e216838522f76ba7263d466a470dc92cdbef25c4dd04dc3/merged: device or resource busy
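Solution: find the process that still holds the container's mount (replace containerid with the ID of the container in question), then kill that process (3119 is the PID printed by the first command in this example):
grep docker /proc/*/mountinfo|grep containerid | awk -F ":" '{print $1}' | awk -F "/" '{print $3}'
sudo kill -9 3119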
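The grep/awk pipeline above can also be expressed in Python, which makes it easier to see what is being searched. This is a rough equivalent only: it lists the PIDs whose mountinfo mentions both docker and the given ID, and leaves the decision to kill a process to the operator. The function name pids_holding_mount and the proc_root parameter are made up for this sketch.

```python
from pathlib import Path


def pids_holding_mount(container_id, proc_root="/proc"):
    """Return PIDs whose mountinfo references a docker mount with this ID."""
    pids = []
    for mountinfo in Path(proc_root).glob("[0-9]*/mountinfo"):
        if not mountinfo.parent.name.isdigit():
            continue  # only per-process directories are of interest
        try:
            lines = mountinfo.read_text(errors="replace").splitlines()
        except OSError:
            continue  # the process exited or its files are not readable
        # Same filter as: grep docker ... | grep containerid
        if any("docker" in line and container_id in line for line in lines):
            pids.append(int(mountinfo.parent.name))
    return sorted(pids)


# ID taken from the error message above; usually needs root to see all processes
print(pids_holding_mount("f78856fb92e9"))
```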
2. The kafka service keeps restarting
Error message:
Error response from daemon: Container 9b3f9af8a1196f2ad3cf74fe2b1eeb7ccbd231fe2a93ec09f594d3a0fbb5783c is restarting, wait until the container is running
Error Reason:
The docker-compose.yml file configures restart: always for the kafka service, so if the kafka service fails to start it is restarted over and over. Use docker logs kafka-test to view the startup log of the kafka service and find the cause of the error.
VI. Configuration of Kafka cluster parameters
############################# System ######################
# ID that uniquely identifies this broker in the cluster; must be a non-negative integer.
broker.id = 0
# Service port, default 9092
port = 9092
# Listening address; if not set, the broker listens on all addresses
host.name = debugo01
# Maximum number of threads handling network requests
num.network.threads = 2
# Number of threads processing disk I/O
num.io.threads = 8
# Number of background threads
background.threads = 4
# Maximum number of requests queued for processing by the I/O threads
queued.max.requests = 500
# Socket send buffer (SO_SNDBUF)
socket.send.buffer.bytes = 1048576
# Socket receive buffer (SO_RCVBUF)
socket.receive.buffer.bytes = 1048576
# Maximum size of a socket request in bytes. To prevent memory overflow,
# message.max.bytes must be smaller than this value.
socket.request.max.bytes = 104857600

############################# Topic ########################
# Number of partitions per topic; more partitions produce more segment files
num.partitions = 2
# Whether topics may be created automatically; if false, topics must be created by command
auto.create.topics.enable = true
# Default number of replicas for a topic's partitions; cannot be greater than
# the number of brokers in the cluster.
default.replication.factor = 1
# Maximum size of a message body in bytes
message.max.bytes = 1000000

############################# ZooKeeper ####################
# ZooKeeper quorum; multiple addresses are separated by commas
zookeeper.connect = debugo01:2181,debugo02,debugo03
# Timeout for connecting to ZooKeeper
zookeeper.connection.timeout.ms = 1000000
# Synchronization time between leader and follower in the ZooKeeper cluster
zookeeper.sync.time.ms = 2000

############################# Log #########################
# Log storage directories; multiple directories are separated by commas
log.dirs = /var/log/kafka
# Flush data to the log file once this many messages have accumulated. Default 10000
#log.flush.interval.messages=10000
# Force a flush once this much time (ms) has passed. A flush happens when either
# log.flush.interval.ms or log.flush.interval.messages is reached. Default 3000ms
#log.flush.interval.ms=1000
# Interval at which the need for a log flush is checked
log.flush.scheduler.interval.ms = 3000
# Log cleanup policy (delete|compact)
log.cleanup.policy = delete
# Log retention time (hours|minutes), 7 days (168 hours) by default. Data older than
# this is handled according to the policy. The size limit and the time limit are both
# enforced; whichever is reached first triggers cleanup.
log.retention.hours = 168
# Maximum number of bytes of log data to keep (-1 means no limit). Data beyond this
# size is handled according to the policy.
#log.retention.bytes=1073741824
# Size of a log segment file; once exceeded, a new log segment file is started
log.segment.bytes = 536870912
# Force a new segment to be created after this much time (24 * 7 hours)
log.roll.hours = 168
# Interval at which log segments are checked against the deletion policy
# (log.retention.hours or log.retention.bytes)
log.retention.check.interval.ms = 60000
# Whether log compaction is enabled
log.cleaner.enable = false
# Maximum time to keep compacted logs (1 day)
log.cleaner.delete.retention.ms = 86400000
# Size limit of the index file for a log segment (10 * 1024 * 1024)
log.index.size.max.bytes = 10485760
# Interval in bytes between index entries; does not normally need to be changed.
log.index.interval.bytes = 4096

############################# replica #######################
# Timeout for communication between the partition-management controller and replicas
controller.socket.timeout.ms = 30000
# Size of the controller-to-broker-channels message queue
controller.message.queue.size = 10
# Maximum time a replica may take to respond to the leader; beyond it the replica
# is removed from the set of managed replicas
replica.lag.time.max.ms = 10000
# Whether the controller is allowed to shut down the broker; if true, all leaders
# on this broker are moved to other brokers before shutdown
controlled.shutdown.enable = false
# Number of attempts for a controlled shutdown
controlled.shutdown.max.retries = 3
# Time interval between shutdown attempts
controlled.shutdown.retry.backoff.ms = 5000
# If a replica falls too far behind, it is considered invalid. Because of network
# latency and similar factors, replicas always lag somewhat. If the lag becomes
# severe, the leader assumes the replica has high network latency or limited
# throughput. In environments with few brokers or a constrained network,
# increasing this value is recommended.
replica.lag.max.messages = 4000
# Socket timeout between leader and replicas (30 * 1000)
replica.socket.timeout.ms = 30000
# Socket receive buffer size for replication from the leader (64 * 1024)
replica.socket.receive.buffer.bytes = 65536
# Maximum number of bytes a replica fetches per request (1024 * 1024)
replica.fetch.max.bytes = 1048576
# Maximum wait time for communication between replicas and the leader; retried on failure
replica.fetch.wait.max.ms = 500
# Minimum data size for each fetch; if the leader has less unsynchronized data
# than this, the fetch waits until this size is reached
replica.fetch.min.bytes = 1
# Number of threads replicating from leaders; increasing it increases replica I/O
num.replica.fetchers = 1
# Interval at which each replica checkpoints its high watermark
replica.high.watermark.checkpoint.interval.ms = 5000
# Whether to automatically rebalance leaders across brokers
auto.leader.rebalance.enable = false
# Allowed percentage of leader imbalance per broker; if exceeded, leaders are rebalanced
leader.imbalance.per.broker.percentage = 10
# Interval at which leader imbalance is checked
leader.imbalance.check.interval.seconds = 300
# Maximum space a client may use to store offset metadata
offset.metadata.max.bytes = 1024

############################# Consumer #####################
# The core configuration of the Consumer side is group.id and zookeeper.connect
# Processes that set the same group ID are all part of the same Consumer group.
group.id
# Consumer ID; generated automatically if not set
consumer.id
# An ID used for tracing, preferably the same as group.id
client.id = <group_id>
# ZooKeeper cluster to use; must be the same zk configuration as the broker
zookeeper.connect = debugo01:2182,debugo02:2182,debugo03:2182
# ZooKeeper heartbeat timeout; a consumer that exceeds it is considered invalid
zookeeper.session.timeout.ms = 6000
# ZooKeeper connection wait time
zookeeper.connection.timeout.ms = 6000
# Synchronization time between ZooKeeper follower and leader
zookeeper.sync.time.ms = 2000
# What to do when there is no initial offset in ZooKeeper or the offset is out of range:
# smallest: reset to the smallest offset
# largest: reset to the largest offset
# anything else: throw an exception to the consumer
auto.offset.reset = largest
# Socket timeout; the actual timeout is max.fetch.wait + socket.timeout.ms (30 * 1000)
socket.timeout.ms = 30000
# Socket receive buffer size (64 * 1024)
socket.receive.buffer.bytes = 65536
# Size limit of messages fetched per partition (1024 * 1024)
fetch.message.max.bytes = 1048576
# When true, the consumer commits offsets to ZooKeeper after consuming messages, so
# that when it fails, a new consumer can get the latest offsets from ZooKeeper
auto.commit.enable = true
# Time interval for automatic commits (60 * 1000)
auto.commit.interval.ms = 60000
# Maximum number of message chunks buffered for consumption; each chunk can be
# up to fetch.message.max.bytes in size
queued.max.message.chunks = 10
# When a new consumer joins the group, a rebalance is attempted to migrate
# partitions to the new consumer; this is the number of attempts
rebalance.max.retries = 4
# Time interval between rebalance attempts
rebalance.backoff.ms = 2000
# Wait time before each leader re-election refresh
refresh.leader.backoff.ms
# Minimum amount of data the server sends to the consumer; if less is available,
# the request waits until this size is reached. The default of 1 means immediate delivery.
fetch.min.bytes = 1
# Maximum time a consumer request waits when fetch.min.bytes is not satisfied
fetch.wait.max.ms = 100
# If no new message becomes available within this time, an exception is thrown;
# the default of -1 means no limit
consumer.timeout.ms = -1

############################# Producer ######################
# Core configurations include:
# metadata.broker.list
# request.required.acks
# producer.type
# serializer.class
# Addresses from which the producer obtains metadata (topics, partitions and replicas),
# in the format host1:port1,host2:port2, or an external VIP
metadata.broker.list
# Acknowledgment mode for messages
# 0: send only, with no acknowledgment of arrival; low latency, but messages are lost
#    if the server fails (similar to UDP)
# 1: send a message and wait for the leader to confirm receipt; reasonably reliable
# -1: send a message, wait for the leader to confirm receipt and for replication to
#     finish before returning; maximum reliability
request.required.acks = 0
# Maximum wait time for a message to be sent
request.timeout.ms = 10000
# Socket send buffer size (100 * 1024)
send.buffer.bytes = 102400
# Serializer for keys; if not set, the same as serializer.class
key.serializer.class
# Partitioning policy; the default is modulo
partitioner.class = kafka.producer.DefaultPartitioner
# Compression mode for messages; none by default, gzip and snappy are available
compression.codec = none
# Compression can be restricted to specific topics
compressed.topics = null
# Number of retries after a message fails to send
message.send.max.retries = 3
# Interval after each failure
retry.backoff.ms = 100
# Interval at which the producer periodically refreshes topic metadata; if set to 0,
# the metadata is refreshed after each message is sent (600 * 1000)
topic.metadata.refresh.interval.ms = 600000
# User-specified ID, which should not be duplicated; mainly used for tracing messages in logs
client.id = ""
# Maximum time to buffer data in asynchronous mode. Setting it to 100, for example,
# collects the messages of a 100ms window and sends them together, which improves
# throughput but increases the latency of sending
queue.buffering.max.ms = 5000
# Maximum number of messages buffered in asynchronous mode, same as above
queue.buffering.max.messages = 10000
# Wait time for a message to enter the queue in asynchronous mode. If set to 0, the
# message does not wait; if it cannot enter the queue, it is discarded directly
queue.enqueue.timeout.ms = -1
# In asynchronous mode, the number of messages sent in one batch. The producer triggers
# a send when either this number of messages or queue.buffering.max.ms is reached.
batch.num.messages = 200
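Two constraints stated in the comments of the listing above (message.max.bytes must be smaller than socket.request.max.bytes, and default.replication.factor cannot exceed the number of brokers) are easy to check mechanically before a broker is started. The Python sketch below parses properties text and reports violations; the helper names are invented for this example, and the fallback values are simply the ones shown in the listing.

```python
def parse_properties(text):
    """Parse key = value lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_broker_config(props, broker_count):
    """Return a list of human-readable problems found in the broker settings."""
    problems = []
    message_max = int(props.get("message.max.bytes", 1000000))
    request_max = int(props.get("socket.request.max.bytes", 104857600))
    if message_max >= request_max:
        problems.append("message.max.bytes must be smaller than socket.request.max.bytes")
    replication = int(props.get("default.replication.factor", 1))
    if replication > broker_count:
        problems.append("default.replication.factor exceeds the number of brokers")
    return problems


sample = """
# fragment of server.properties
socket.request.max.bytes = 104857600
message.max.bytes = 1000000
default.replication.factor = 1
"""
print(check_broker_config(parse_properties(sample), broker_count=1))  # []
```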