Preface: the configuration of the ES cluster was covered in the earlier article "ELK + Kafka + Filebeat analysis of Nginx logs". In later experiments I found that although the cluster starts and processes data normally, as soon as the es1 node goes down the whole cluster stops working and can no longer process data, so true high availability was not achieved. Let's solve this problem.
Let's look at our previous configuration:
ES1 node:
[root@es1 ~]# cat /opt/soft/elasticsearch-7.6.0/elasticsearch.yml
# Cluster name
cluster.name: my-app
# Node name
node.name: es1
# Whether this node is master-eligible
node.master: true
# Whether this node stores data
node.data: true
# Data storage path
path.data: /var/es/data
# Log storage path
path.logs: /var/es/logs
# IP of the current node
network.host: 10.1.1.7
# HTTP port the current node serves on
http.port: 9200
# Transport port for inter-node communication
transport.tcp.port: 9300
# Master-eligible nodes used to bootstrap the cluster; entries must match node.name
cluster.initial_master_nodes: ["es1"]
# IPs and transport ports probed during node discovery
discovery.zen.ping.unicast.hosts: ["10.1.1.7:9300","10.1.1.8:9300", "10.1.1.9:9300"]
# Minimum number of connected master-eligible nodes required to form a cluster,
# i.e. (N/2)+1, which prevents split brain (multiple masters appearing in the cluster)
discovery.zen.minimum_master_nodes: 2
ES2 configuration:
cluster.name: my-app
node.name: es2
node.master: true
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 10.1.1.8
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["10.1.1.7:9300","10.1.1.8:9300", "10.1.1.9:9300"]
discovery.zen.minimum_master_nodes: 2
ES3 configuration:
cluster.name: my-app
node.name: es3
node.master: false
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 10.1.1.9
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["10.1.1.7:9300","10.1.1.8:9300", "10.1.1.9:9300"]
discovery.zen.minimum_master_nodes: 2
After cluster startup:
ip       heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.1.1.7 14           94          0   0.06    0.03    0.05     dilm      *      es1
10.1.1.8 12           97          3   0.14    0.09    0.11     dilm      -      es2
10.1.1.9 12           93          0   0.32    0.08    0.07     dil       -      es3
The current problem is that when the 10.1.1.7 node goes down, the whole cluster stops working.
So what is actually wrong?
First of all, we need enough master-eligible nodes. For a proper production cluster, at least three nodes should be master-eligible, i.e. node.master: true should be set in at least three nodes' configuration files.
With three master-eligible nodes, discovery.zen.minimum_master_nodes can be set to (3 / 2) + 1 = 2, meaning at least two master-eligible nodes must be present before a new master can be elected. This is exactly why the cluster above fails: it has only three nodes in total, only two of them are master-eligible, and discovery.zen.minimum_master_nodes is also set to 2. When we stop one master-eligible node, only one master-eligible node remains in the whole cluster, which cannot satisfy the requirement that at least two master-eligible nodes take part in the election, so the cluster goes down.
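To confirm how many master-eligible nodes a cluster actually has, you can query the _cat/nodes API; the node.role column contains an "m" for master-eligible nodes. A minimal sketch, assuming the original cluster is still reachable on 10.1.1.7:

# list node roles and the current master; "m" in node.role means master-eligible
curl -s 'http://10.1.1.7:9200/_cat/nodes?v&h=ip,node.role,master,name'

In the output shown earlier, only es1 and es2 carry the "m" role, which matches the analysis above.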
OK, now we know the basic requirements: at least three master-eligible nodes, and at least two of them participating in an election. Let's build a new cluster:
ES1:
[root@es1 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: my-app
node.name: es1
node.master: true
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 192.168.1.8
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["192.168.1.8:9300","192.168.1.9:9300", "192.168.1.10:9300","192.168.1.12:9300"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 100s
ES2:
[root@es2 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: my-app
node.name: es2
node.master: true
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 192.168.1.9
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["192.168.1.8:9300","192.168.1.9:9300", "192.168.1.10:9300","192.168.1.12:9300"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 100s
ES3:
[root@es3 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: my-app
node.name: es3
node.master: true
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 192.168.1.10
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["192.168.1.8:9300","192.168.1.9:9300", "192.168.1.10:9300","192.168.1.12:9300"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 100s
ES4:
[root@es4 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: my-app
node.name: es4
node.master: false
node.data: true
path.data: /var/es/data
path.logs: /var/es/logs
network.host: 192.168.1.12
http.port: 9200
transport.tcp.port: 9300
cluster.initial_master_nodes: ["es1"]
discovery.zen.ping.unicast.hosts: ["192.168.1.8:9300","192.168.1.9:9300", "192.168.1.10:9300","192.168.1.12:9300"]
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 100s
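With all four configuration files in place, start Elasticsearch on every node. A minimal sketch, assuming the package/systemd installation implied by the /etc/elasticsearch path (adjust if your install method differs):

# run on each of es1..es4
systemctl start elasticsearch
systemctl status elasticsearch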
After the four nodes are started one by one, you can see the following in the log of the es1 node:
[2020-03-16T13:11:53,469][INFO ][o.e.c.s.MasterService    ] [es1] node-join[{es2}{CgceuyGlQ2GTcCGqegSz3Q}{oxQ-79VaQxulg07SejH0iQ}{192.168.1.9}{192.168.1.9:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 1, version: 20, delta: added {{es2}{CgceuyGlQ2GTcCGqegSz3Q}{oxQ-79VaQxulg07SejH0iQ}{192.168.1.9}{192.168.1.9:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true}}
[2020-03-16T13:11:55,058][INFO ][o.e.c.s.ClusterApplierService] [es1] added {{es2}{CgceuyGlQ2GTcCGqegSz3Q}{oxQ-79VaQxulg07SejH0iQ}{192.168.1.9}{192.168.1.9:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true}}, term: 1, version: 20, reason: Publication{term=1, version=20}
[2020-03-16T13:11:55,084][INFO ][o.e.c.s.MasterService    ] [es1] node-join[{es3}{i-wCmfCESsCNDr6Vw50Aew}{rxtD1oqtQniz62voo7MPUg}{192.168.1.10}{192.168.1.10:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true} join existing leader, {es4}{L221OFajR8-FlaIIYg37Qw}{zvHrSewbQvq3OO14zKHCpg}{192.168.1.12}{192.168.1.12:9300}{dil}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true} join existing leader], term: 1, version: 21, delta: added {{es4}{L221OFajR8-FlaIIYg37Qw}{zvHrSewbQvq3OO14zKHCpg}{192.168.1.12}{192.168.1.12:9300}{dil}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true},{es3}{i-wCmfCESsCNDr6Vw50Aew}{rxtD1oqtQniz62voo7MPUg}{192.168.1.10}{192.168.1.10:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true}}
[2020-03-16T13:11:56,347][INFO ][o.e.c.s.ClusterApplierService] [es1] added {{es4}{L221OFajR8-FlaIIYg37Qw}{zvHrSewbQvq3OO14zKHCpg}{192.168.1.12}{192.168.1.12:9300}{dil}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true},{es3}{i-wCmfCESsCNDr6Vw50Aew}{rxtD1oqtQniz62voo7MPUg}{192.168.1.10}{192.168.1.10:9300}{dilm}{ml.machine_memory=1019797504, ml.max_open_jobs=20, xpack.installed=true}}, term: 1, version: 21, reason: Publication{term=1, version=21}
As can be seen from the log, the remaining three nodes have joined the cluster initialized by es1. You can now view the status of every node in the cluster by visiting http://192.168.1.8:9200/_cat/nodes?v:
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.10 8            93          0   0.00    0.13    0.25     dilm      -      es3
192.168.1.8  8            92          0   0.00    0.12    0.24     dilm      *      es1
192.168.1.12 11           93          0   0.00    0.12    0.22     dil       -      es4
192.168.1.9  7            92          0   0.00    0.12    0.24     dilm      -      es2
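The same information can also be fetched from the command line; the _cluster/health API additionally reports the overall status and node count. A minimal sketch (any of the four node addresses would work):

# node overview and overall cluster health
curl -s 'http://192.168.1.8:9200/_cat/nodes?v'
curl -s 'http://192.168.1.8:9200/_cluster/health?pretty'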
Now we stop the ES1 node:
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.10 9            93          0   0.00    0.08    0.22     dilm      *      es3
192.168.1.12 9            93          0   0.04    0.09    0.20     dil       -      es4
192.168.1.9  11           92          0   0.00    0.08    0.21     dilm      -      es2
You can see that the master role has been transferred to es3.
With that, a basic highly available ES cluster is in place.
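For reference, the failover test above boils down to stopping the service on es1 and then querying any surviving node. A minimal sketch, assuming the systemd service name of a package install (adjust if Elasticsearch was started differently):

# on es1
systemctl stop elasticsearch
# on a surviving node, e.g. es2
curl -s 'http://192.168.1.9:9200/_cat/nodes?v'

Restarting es1 afterwards should let it rejoin the cluster as a regular master-eligible node.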