Building a 100-Million-Scale ELK Log Platform in Practice

Keywords: Kafka, Elasticsearch, Zookeeper, Nginx

This article shares real-world experience from our work on building a log platform that handles 100 million log entries a day, and walks you through setting up such an ELK system yourself. For the platform's development history, see the previous article, "Evolution from ELK to EFK".

Without further ado, let's get started.

Overall architecture

The overall architecture is made up of the following modules, each providing a different function:

Filebeat: a lightweight data collection engine based on the original Logstash-forwarder source code. In other words, Filebeat is the new Logstash-forwarder, and it is the agent of choice in the ELK Stack.

Kafka: a data buffer queue. As a message queue it decouples the processing pipeline and improves scalability. With its peak-handling capacity, it lets critical components withstand sudden traffic spikes instead of crashing under overload.

Logstash: a data collection and processing engine. It supports dynamically collecting data from a variety of sources, then filtering, analyzing, enriching, and normalizing it before storing it for later use.

Elasticsearch: a distributed search engine featuring high scalability, high reliability, and easy management. It can be used for full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is now one of the most popular open-source search engines; Wikipedia, Stack Overflow, GitHub, and others use it to build their own search.

Kibana: a visualization platform. It searches and displays data indexed in Elasticsearch, making it easy to present and analyze data with charts, tables, and maps.
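Kafka's buffering role described above can be illustrated with a tiny in-process queue (an analogy only; Kafka does this durably across machines): the producer can burst faster than the consumer drains, and the queue absorbs the difference.

```python
from queue import Queue

# A burst of 10 "log events" arrives at once...
buffer = Queue()
for i in range(10):
    buffer.put(f"event-{i}")

# ...while the consumer drains them at its own pace. The producer
# never blocks on the consumer, which is the decoupling a message
# queue provides.
processed = []
while not buffer.empty():
    processed.append(buffer.get())

print(len(processed), "events processed")  # → 10 events processed
```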

Version description

Filebeat: 6.2.4
Kafka: 2.11-1.0.0
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4

It is best to download plug-in versions that match the component versions above.

Specific Practice

Let's take a common example: Nginx logs, with the log content in JSON format.

{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
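Since each entry is already valid JSON, every downstream component can parse it without custom grok patterns. As a quick sanity check, a single log line from the sample above parses like this (a minimal sketch, not part of the pipeline):

```python
import json

# One log line as produced by the JSON-formatted Nginx access log above.
line = ('{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11",'
        '"clientip":"192.168.56.11","size":26,"responsetime":0.000,'
        '"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11",'
        '"url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-",'
        '"referer":"-","status":"200"}')

entry = json.loads(line)
print(entry["status"], entry["url"], entry["size"])
# → 200 /nginxweb/index.html 26
```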

Filebeat

Why use Filebeat instead of Logstash?

The reason is simple: Logstash's resource consumption is too high.

Since Logstash runs on the JVM and consumes a lot of resources, its author later wrote a lightweight agent in Go called Logstash-forwarder, with fewer features but a much smaller footprint.

Later, the author joined elastic.co, where the internal Go team continued developing Logstash-forwarder and renamed it Filebeat.

Filebeat needs to be deployed on every application server; it can be pushed out and installed via Salt.

Download

$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz

Decompress

$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
$ mv filebeat-6.2.4-darwin-x86_64 filebeat
$ cd filebeat

Modify Configuration

Modify the Filebeat configuration to collect logs from the local directory and output them to the Kafka cluster:

$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log

output.kafka:
  hosts: ["192.168.0.1:9092", "192.168.0.2:9092", "192.168.0.3:9092"]
  topic: 'nginx'

Some configuration parameters have changed considerably since Filebeat 6.0; for example, document_type is no longer supported and fields is used instead.

Start

$ ./filebeat -e -c filebeat.yml

Kafka

In a production environment the recommended number of nodes in a Kafka cluster is (2N + 1). We use three nodes as an example.
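The (2N + 1) recommendation comes from majority quorum: a cluster of 2N + 1 nodes still has a majority after losing up to N nodes. A quick illustrative check:

```python
def tolerated_failures(nodes: int) -> int:
    """A majority quorum needs floor(nodes/2) + 1 votes, so the
    cluster survives the loss of (nodes - quorum) members."""
    quorum = nodes // 2 + 1
    return nodes - quorum

# With 3 nodes (N = 1) the cluster tolerates 1 failure,
# with 5 nodes (N = 2) it tolerates 2, and so on.
for n in (1, 3, 5, 7):
    print(n, "nodes ->", tolerated_failures(n), "failure(s) tolerated")
```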

Download

Download Kafka directly from the official website

$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz

Decompress

$ tar -zxvf kafka_2.11-1.0.0.tgz
$ mv kafka_2.11-1.0.0 kafka
$ cd kafka

Modify Zookeeper configuration

Modify the Zookeeper configuration to set up the Zookeeper cluster, again with (2N + 1) nodes.

It is recommended to run the ZK cluster with the Zookeeper bundled inside Kafka, to reduce network-related interference.

$ vim config/zookeeper.properties

tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5

server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

Create a myid file under the Zookeeper data directory containing the Zookeeper node id (1, 2, or 3), making sure the ids do not collide.

$ vim /opt/zookeeper/myid
1

Start Zookeeper Node

Start three Zookeeper nodes to ensure high availability of the cluster

$ ./zookeeper-server-start.sh -daemon ./config/zookeeper.properties

Modify Kafka configuration

There are three Kafka nodes in this cluster, so modify the Kafka configuration on each of them. Note that broker.id must be unique (1, 2, 3).

$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true

Start Kafka Cluster

Start three Kafka nodes to ensure high availability of the cluster

$ ./bin/kafka-server-start.sh -daemon ./config/server.properties

Check whether the topic was created successfully:

$ bin/kafka-topics.sh --list --zookeeper localhost:2181

nginx

Monitor Kafka Manager

Kafka Manager is Yahoo's open-source Kafka cluster management tool.

You can download and install it from GitHub: https://github.com/yahoo/kafka-manager

If consumers cannot keep up with Kafka, you can add partitions from the specific cluster's page. Kafka improves concurrent consumption speed through partitioning.
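Partitioning helps because each partition can be consumed in parallel by a different consumer in the group. Conceptually, a keyed message is routed to a partition by hashing its key; the sketch below uses crc32 purely for illustration (Kafka's default partitioner actually uses murmur2):

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Hash the message key and take it modulo the partition count.
    # (Kafka's real client uses murmur2; crc32 here is just a stand-in.)
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition,
# preserving per-key ordering while spreading load across partitions.
p1 = choose_partition(b"192.168.56.11", 3)
p2 = choose_partition(b"192.168.56.11", 3)
assert p1 == p2
print("partition:", p1)
```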

Logstash

Logstash provides three main stages:

  • INPUT: data ingestion
  • FILTER: filtering and transformation
  • OUTPUT: data output

If you use the filter stage, it is highly recommended to pre-test your parsing patterns with the Grok Debugger.

download

$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz

Decompress and Rename

$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash

Modify Logstash configuration

Modify the Logstash configuration so that it acts as an indexer, reading from Kafka and inserting the data into the Elasticsearch cluster:

$ vim nginx.conf

input {
  kafka {
    type => "kafka"
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
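The index => "nginx-%{+YYYY.MM.dd}" setting makes Logstash write each event to a daily index derived from its @timestamp, which keeps retention simple (old days can be dropped as whole indices). An illustrative sketch of the naming:

```python
from datetime import datetime

def daily_index(ts: str) -> str:
    # Rough equivalent of Logstash's "nginx-%{+YYYY.MM.dd}" naming:
    # one Elasticsearch index per day. (Logstash itself formats the
    # date in UTC; this sketch just uses the timestamp's own zone.)
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")
    return dt.strftime("nginx-%Y.%m.%d")

print(daily_index("2017-12-27T16:38:17+0800"))  # → nginx-2017.12.27
```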

Start Logstash

$ ./bin/logstash -f nginx.conf

Elasticsearch

Download

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz

Decompress

$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch

Modify Configuration

$ vim config/elasticsearch.yml

cluster.name: es 
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1"]
discovery.zen.minimum_master_nodes: 1

Start

Start it in the background with -d:

$ ./bin/elasticsearch -d

Open http://192.168.0.1:9200/ in a browser. If the following information appears, the configuration succeeded:

{
    "name": "es-node1",
    "cluster_name": "es",
    "cluster_uuid": "XvoyA_NYTSSV8pJg0Xb23A",
    "version": {
        "number": "6.2.4",
        "build_hash": "ccec39f",
        "build_date": "2018-04-12T20:37:28.497551Z",
        "build_snapshot": false,
        "lucene_version": "7.2.1",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

Console

The name Cerebro may be unfamiliar, but it used to be called kopf! Because Elasticsearch 5.0 no longer supports site plugins, the author of kopf abandoned the original project and started Cerebro, a standalone single-page application that continues to support managing newer versions of Elasticsearch.

Points to note

  1. Separate master and data nodes. When there are more than three data nodes, separating the roles is recommended to relieve pressure.
  2. Do not give a data node more than 32 GB of heap; 31 GB is recommended, for the reason given in the previous article.
  3. Set discovery.zen.minimum_master_nodes to (total / 2 + 1) to avoid split-brain.
  4. Most importantly, do not expose Elasticsearch to the public network. Installing X-Pack is recommended to enhance security.
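The minimum_master_nodes value from point 3 is a strict majority of the master-eligible nodes; a small illustrative helper:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    # discovery.zen.minimum_master_nodes should be a strict majority
    # of master-eligible nodes: (total / 2) + 1 with integer division.
    # Anything lower allows two halves of a partitioned cluster to
    # each elect a master (split-brain).
    return master_eligible // 2 + 1

# The single-node demo above uses 1, consistent with 1 // 2 + 1 = 1.
for n in (1, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```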

Kibana

Download

$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz

Decompress

$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana

Modify Configuration

$ vim config/kibana.yml

server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"

Start Kibana

$ nohup ./bin/kibana &

Interface Display

To create an index pattern, specify the index prefix under Management -> Index Patterns.

Final effect display

Summary

To sum up, deploying the components above with the commands shown gives you the complete ELK pipeline: log collection, filtering, indexing, and visualization, with the log analysis function built on top of this system. Meanwhile, by scaling the Kafka and Elasticsearch clusters horizontally, the platform can process logs at a daily volume of 100 million in real time.

The first principle of good architecture design is not to chase sophistication, but to be reasonable and to match the company's business scale and development trend. Any company, even one as large as BAT is now, should start with a simple and clear system architecture.

However, as the business scope widens, the business scale grows, and the system gradually becomes complex and large, every system eventually faces the problem of high availability. So how can we avoid such problems and build highly available systems?

For this reason, I have written a column, "Take You to Play with High Availability", which distills my years of architecture design and hands-on experience at Baidu and in Shanghai.

The column contains 15 articles, divided into three modules that explain highly available architecture in detail:

Concepts: introduces high-availability architecture theory and its evolution. This part is theoretical, but we still need it to understand the system as a whole.
Engineering: describes how each layer of a typical Internet stack achieves high availability, including DNS, services, caching, data, and so on.
Troubleshooting: describes how to locate and fix common production faults, at the machine level, the application layer, and so on.

The column is updated weekly. Over the next two months, I'll take you through every aspect of highly available architecture and share the problems I've encountered along with their solutions, so that you don't fall into the pits I did. I also look forward to your interesting questions.

Column address: Take You to Play with High Availability


https://blog.51cto.com/13527416/2117141

Posted by Jnerocorp on Wed, 08 May 2019 17:42:39 -0700