MHA high availability configuration and failover

Keywords: Database MySQL MHA

Contents

Preface

1, MHA

      1. Overview

      2. Composition of MHA

      3. Characteristics of MHA

2, Build MySQL MHA

      1. Experimental ideas

      2. Experimental environment

      3. Experimental steps

3, Fault simulation

      1. Simulating the fault

      2. How the candidate master is chosen during failover

      3. Troubleshooting steps

Summary

Preface

         In a real production environment, MySQL master-slave replication with read/write splitting alone cannot meet every need: once the master fails, the cluster is left without a leader. MHA solves this problem, which is why a MySQL high-availability cluster is worth considering.

1, MHA

      1. Overview

          MHA (Master High Availability) is a mature software suite for failover and master-slave promotion in MySQL high-availability environments.

         MHA emerged to solve the problem of a MySQL single point of failure.

          When the master fails, MHA can complete the failover automatically within 0-30 seconds.

          During failover, MHA preserves data consistency to the greatest extent possible, achieving high availability in the real sense.

      2. Composition of MHA

         MHA Node (data node)

                The MHA Node runs on every MySQL server.

         MHA Manager (management node)

                The MHA Manager can be deployed on an independent machine to manage multiple master-slave clusters, or on one of the slave nodes.
                The MHA Manager periodically probes the master node of the cluster. When the master fails, it automatically promotes the slave with the latest data to be the new master, then repoints all the other slaves at the new master. The whole failover process is completely transparent to the application.

      3. Characteristics of MHA

          During automatic failover, MHA tries to save the binary logs from the crashed master, ensuring as little data loss as possible;

          Using semi-synchronous replication greatly reduces the risk of data loss: as long as one slave has received the latest binary log, MHA can apply it to all the other slaves, keeping the data of every node consistent;

          At present, MHA requires a master-slave architecture of at least three servers, i.e. one master and two slaves.

In short, MHA provides failover, preserves data as completely as possible, and keeps the logs of all nodes consistent.

2, Build MySQL MHA

      1. Experimental ideas

1. MHA architecture
1) Database installation
2) One master and two slaves
3) MHA construction

2. Fault simulation
1) Main library failure
2) The alternative master library becomes the master library
3) The original fault master database recovers and rejoins the MHA as a slave database

      2. Experimental environment

Host           Host name    Operating system    IP address        Main software
MHA Manager    CentOS 7-1   CentOS 7            192.168.159.11    MHA node component, MHA manager component
Master         CentOS 7-2   CentOS 7            192.168.159.20    mysql-boost-5.7.20.tar.gz, MHA node component
Slave1         CentOS 7-3   CentOS 7            192.168.159.70    mysql-boost-5.7.20.tar.gz, MHA node component
Slave2         CentOS 7-4   CentOS 7            192.168.159.80    mysql-boost-5.7.20.tar.gz, MHA node component

      3. Experimental steps

          3-1. Close the firewall

systemctl stop firewalld
systemctl disable firewalld
setenforce 0

          3-2. Modify the MySQL main configuration file /etc/my.cnf on the Master, Slave1 and Slave2 nodes

master: 
[root@localhost ~]# vim /etc/my.cnf
#Comment out the character_set_server=utf8 line if it is present

[mysqld]
server-id = 1
log_bin = master-bin
log-slave-updates = true


[root@localhost ~]# systemctl restart mysqld

slave1:
[root@www ~]# vim /etc/my.cnf
#Comment out the character_set_server=utf8 line if it is present

server-id = 2
log_bin = master-bin
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index

[root@www ~]# systemctl restart mysqld

slave2:
[root@www ~]# vim /etc/my.cnf
#Comment out the character_set_server=utf8 line if it is present

server-id = 3
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index

[root@www ~]# systemctl restart mysqld

 

          3-3. Create two soft links on the Master, Slave1 and Slave2 nodes

[root@localhost ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin/
[root@localhost ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/

          3-4. Configure mysql with one master and two slaves

                3-4-1. Grant privileges on all database nodes

#Enter the following in the Master, Slave1 and Slave2 servers

mysql> grant replication slave on *.* to 'myslave'@'192.168.159.%' identified by '123456';

mysql> grant all privileges on *.* to 'mha'@'192.168.159.%' identified by 'manager';

mysql> grant all privileges on *.* to 'mha'@'master' identified by 'manager';

#Run the following on the master server as well, so that connections do not fail when MHA reaches the databases by host name instead of IP

mysql> grant all privileges on *.* to 'mha'@'slave1' identified by 'manager';

mysql> grant all privileges on *.* to 'mha'@'slave2' identified by 'manager';

mysql> flush privileges;
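As a quick sanity check after the grants, the accounts can be listed with `select user,host from mysql.user;`. The sketch below only illustrates parsing such output; `sample` is a captured example, not a live query:

```shell
# Illustrative only: "sample" stands in for the output of
#   mysql -e "select user,host from mysql.user;"
sample='user    host
myslave 192.168.159.%
mha     192.168.159.%'

# Confirm both the replication user and the monitoring user exist
status=""
for u in myslave mha; do
  if printf '%s\n' "$sample" | grep -q "^$u"; then
    status="$status $u:ok"
  else
    status="$status $u:missing"
  fi
done
echo "$status"
```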

                3-4-2. View binary files and synchronization points on the Master node

mysql> show master status\G

                3-4-3. Perform synchronization at Slave1 and Slave2 nodes

mysql> change master to master_host='192.168.159.20',master_user='myslave',master_password='123456',master_log_file='master-bin.000001',master_log_pos=1747;

mysql> start slave;
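The file name and position passed to `change master to` come from the `show master status\G` output on the Master. A small sketch of extracting them with awk; the output below is a captured sample, so with a live server you would pipe `mysql -e 'show master status\G'` in instead:

```shell
# "sample" stands in for the output of: mysql -e 'show master status\G'
sample='*************************** 1. row ***************************
             File: master-bin.000001
         Position: 1747'

# Pull out the binlog file name and position, then build the statement
file=$(printf '%s\n' "$sample" | awk '/File:/ {print $2}')
pos=$(printf '%s\n' "$sample" | awk '/Position:/ {print $2}')
stmt="change master to master_host='192.168.159.20',master_user='myslave',master_password='123456',master_log_file='$file',master_log_pos=$pos;"
echo "$stmt"
```

This avoids retyping the coordinates by hand, which is a common source of replication setup errors.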

                 3-4-4. View data synchronization results at Slave1 and Slave2 nodes

mysql> show slave status\G
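In the `show slave status\G` output, the two lines that matter are `Slave_IO_Running` and `Slave_SQL_Running`; both must be `Yes`. A sketch of checking them automatically, using a captured sample rather than a live query:

```shell
# "sample" stands in for the output of: mysql -e 'show slave status\G'
sample='             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes'

# Collect the values of both Running fields; healthy replication yields only "Yes"
running=$(printf '%s\n' "$sample" | awk '/Slave_(IO|SQL)_Running:/ {print $2}' | sort -u)
if [ "$running" = "Yes" ]; then health="replication OK"; else health="replication BROKEN"; fi
echo "$health"
```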


                  3-4-5. The two slave libraries must be set to read-only mode

mysql> set global read_only=1;

                  3-4-6. Insert data and test database synchronization

#Insert a record on the Master to test whether it is synchronized to the slaves
mysql> create database nage;

mysql> use nage;

mysql> create table zz (id int(4),name varchar(20));

mysql> insert into zz values(1,'wangyi');
          3-5. Install MHA software

                3-5-1. Install the MHA dependency environment on all servers; first install the epel source

[root@localhost ~]# yum install epel-release --nogpgcheck -y
[root@localhost ~]# yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

                3-5-2. Install the MHA packages; the node component must be installed on all servers first
The MHA version to use differs by operating system; for CentOS 7 here, version 0.57 is chosen.
The node component must be installed on all servers first, and the manager component last, on the MHA manager node, because the manager depends on the node component.

[root@localhost ~]# cd /opt
[root@localhost opt]# tar zxvf mha4mysql-node-0.57.tar.gz
[root@localhost opt]# cd mha4mysql-node-0.57
[root@localhost mha4mysql-node-0.57]# perl Makefile.PL
[root@localhost mha4mysql-node-0.57]# make && make install

                3-5-3. Install the manager component on the MHA manager node

[root@localhost mha4mysql-node-0.57]# cd /opt   #Switch to the directory holding the installation package
[root@localhost opt]# tar zxvf mha4mysql-manager-0.57.tar.gz
[root@localhost opt]# cd mha4mysql-manager-0.57
[root@localhost mha4mysql-manager-0.57]# perl Makefile.PL
[root@localhost mha4mysql-manager-0.57]# make && make install
#After the manager component is installed, several tools are generated under /usr/local/bin, mainly the following:
masterha_check_ssh          check MHA's SSH configuration
masterha_check_repl         check the MySQL replication status
masterha_manager            script that starts the manager
masterha_check_status       check the current MHA running state
masterha_master_monitor     detect whether the master is down
masterha_master_switch      control failover (automatic or manual)
masterha_conf_host          add or remove configured server information
masterha_stop               stop the manager

#After the node component is installed, several scripts are also generated under /usr/local/bin (they are normally invoked by MHA Manager scripts and need no manual operation), mainly the following:
save_binary_logs            save and copy the master's binary logs
apply_diff_relay_logs       identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog          remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs            purge relay logs (without blocking the SQL thread)

          3-6. Configure password-less authentication on all servers

                3-6-1. Configure password-less authentication to all database nodes on the manager node

[root@localhost mha4mysql-manager-0.57]# ssh-keygen -t rsa  #Press enter all the way

[root@localhost mha4mysql-manager-0.57]# ssh-copy-id 192.168.159.20  #master

[root@localhost mha4mysql-manager-0.57]# ssh-copy-id 192.168.159.70  #slave1

[root@localhost mha4mysql-manager-0.57]# ssh-copy-id 192.168.159.80  #slave2

                3-6-2. Configure password-less authentication to database nodes slave1 and slave2 on the master

[root@localhost mha4mysql-node-0.57]# ssh-keygen -t rsa

[root@localhost mha4mysql-node-0.57]# ssh-copy-id 192.168.159.70

[root@localhost mha4mysql-node-0.57]# ssh-copy-id 192.168.159.80

                3-6-3. Configure password-less authentication to the database nodes master and slave2 on slave1

[root@www mha4mysql-node-0.57]# ssh-keygen -t rsa

[root@www mha4mysql-node-0.57]# ssh-copy-id 192.168.159.20

[root@www mha4mysql-node-0.57]# ssh-copy-id 192.168.159.80

                3-6-4. Configure password-less authentication to the database nodes master and slave1 on slave2

[root@www mha4mysql-node-0.57]# ssh-keygen -t rsa

[root@www mha4mysql-node-0.57]# ssh-copy-id 192.168.159.20

[root@www mha4mysql-node-0.57]# ssh-copy-id 192.168.159.70

          3-7. Configure MHA on the manager node

                3-7-1. Copy the relevant scripts on the manager node to the / usr/local/bin directory

[root@localhost mha4mysql-manager-0.57]# cp -rp /opt/mha4mysql-manager-0.57/samples/scripts /usr/local/bin

//After copying, there will be four executable files
[root@localhost mha4mysql-manager-0.57]# ll /usr/local/bin/scripts/ 
total 32
-rwxr-xr-x. 1 1001 1001  3648 May 31  2015 master_ip_failover       #VIP management script used during automatic failover
-rwxr-xr-x. 1 1001 1001  9870 May 31  2015 master_ip_online_change  #VIP management during online (manual) switching
-rwxr-xr-x. 1 1001 1001 11867 May 31  2015 power_manager            #script to power off the host after a failure
-rwxr-xr-x. 1 1001 1001  1360 May 31  2015 send_report              #script to send an alert after failover

                3-7-2. Copy the VIP-management script used during automatic switching, master_ip_failover, to the /usr/local/bin directory; it will manage the VIP during failover

[root@localhost home]# cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin

                3-7-3. Modify the script as follows (delete the original contents, paste in the script below, and adjust the VIP-related parameters)

[root@localhost home]# cd /usr/local/bin/
[root@localhost bin]# cp master_ip_failover master_ip_failover.bak  #backups
[root@localhost bin]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#############################Add content section#########################################
my $vip = '192.168.159.200';									#Specify the address of the vip
my $brdc = '192.168.159.255';								#Specify the broadcast address of the vip
my $ifdev = 'ens33';										#Specify the network card bound by vip
my $key = '1';												#Specifies the serial number of the virtual network card bound by the vip
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";		#Represents that the value of this variable is ifconfig ens33:1 192.168.159.200
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";		#Represents that the value of this variable is ifconfig ens33:1 192.168.159.200 down
my $exit_code = 0;											#Specifies that the exit status code is 0
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disables the VIP on the old master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

                3-7-4. Create the MHA software directory and copy the configuration file. The app1.cnf configuration file is used to manage the MySQL node servers

[root@localhost bin]# mkdir /etc/masterha
[root@localhost bin]# cp /opt/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha
[root@localhost bin]# vim /etc/masterha/app1.cnf   ##Delete the original content, directly copy and modify the IP address of the node server

[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.159.70 -s 192.168.159.80
shutdown_script=""
ssh_user=root
user=mha

[server1]
hostname=192.168.159.20
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.159.70
port=3306

[server3]
hostname=192.168.159.80
port=3306

----------------------------------------------------------------------------------------------------------
[server default]
manager_log=/var/log/masterha/app1/manager.log      #manager log file
manager_workdir=/var/log/masterha/app1              #manager working directory
master_binlog_dir=/usr/local/mysql/data/            #Where the master stores its binlog; must match the binlog path configured on the master so MHA can find it
master_ip_failover_script=/usr/local/bin/master_ip_failover            #Switching script for automatic failover, i.e. the script above
master_ip_online_change_script=/usr/local/bin/master_ip_online_change  #Switching script for manual (online) switching
password=manager            #Password of the MySQL monitoring user created earlier
ping_interval=1             #Interval in seconds between ping probes to the master (default 3); after three failed probes MHA fails over automatically
remote_workdir=/tmp         #Where the binlog is saved on the remote MySQL server during a switch
repl_password=123456        #Password of the replication user
repl_user=myslave           #Name of the replication user
report_script=/usr/local/send_report     #Script that sends an alert after a switch
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.159.70 -s 192.168.159.80    #IP addresses of the slave servers used for a secondary check
shutdown_script=""          #Script that powers off the failed host after a failure (mainly to prevent split-brain; not used here)
ssh_user=root               #SSH login user name
user=mha                    #Monitoring user

[server1]
hostname=192.168.159.20
port=3306

[server2]
hostname=192.168.159.70
port=3306
candidate_master=1
#Marks this host as the candidate master: after a master-slave switch it will be promoted to master even if it is not the most up-to-date slave in the cluster

check_repl_delay=0
#By default, MHA will not choose a slave that lags the master by more than 100MB of relay logs as the new master, because recovering it would take a long time. With check_repl_delay=0, replication delay is ignored when a new master is selected. This is very useful together with candidate_master=1, since it guarantees the candidate can become the new master during a switch

[server3]
hostname=192.168.159.80
port=3306

          3-8. For the first configuration, you need to manually start the virtual IP on the Master node

[root@localhost mha4mysql-node-0.57]# ifconfig ens33:1 192.168.159.200/24

          3-9. Test SSH password-less authentication on the manager node; a normal run ends with a success message, as shown below

[root@localhost bin]# masterha_check_ssh -conf=/etc/masterha/app1.cnf

          3-10. Test the MySQL master-slave connection on the manager node; the words "MySQL Replication Health is OK" at the end indicate that it is normal

[root@localhost bin]# masterha_check_repl -conf=/etc/masterha/app1.cnf

          3-11. Start MHA on the manager node

[root@localhost bin]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &


--remove_dead_master_conf: after the master-slave switch, the old master's IP is removed from the configuration file.
--manager_log: log storage location.
--ignore_last_failover: by default, if MHA detects consecutive failures less than 8 hours apart, it refuses to fail over again; this restriction avoids a ping-pong effect. After each switch MHA records it in the app1.failover.complete file under the working directory set above, and the next switch is refused while that file exists unless it is deleted first. For convenience, --ignore_last_failover is set so that the file left by the last switch is ignored.
----------------------------------------------------------------------------------------------------------
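Before restarting the manager after a failover, it can be worth checking for that app1.failover.complete file explicitly. A small sketch, using a temporary directory as a stand-in for /var/log/masterha/app1:

```shell
# Stand-in working directory; in production this would be /var/log/masterha/app1
workdir=$(mktemp -d)

# The presence of app1.failover.complete blocks the next failover
if [ -e "$workdir/app1.failover.complete" ]; then
  msg="previous failover recorded: remove the file or start with --ignore_last_failover"
else
  msg="no previous failover recorded"
fi
echo "$msg"
rmdir "$workdir"
```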

          3-12. Check the MHA status; you can see that the current master is the mysql1 node

[root@localhost bin]# masterha_check_status --conf=/etc/masterha/app1.cnf

          3-13. Check the MHA log to see that the current master is 192.168.159.20

[root@localhost bin]# cat /var/log/masterha/app1/manager.log | grep "current master"

          3-14. Check whether the VIP address 192.168.159.200 exists on mysql1. This VIP address will not disappear when the manager node stops the MHA service

ifconfig
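The VIP can also be checked automatically by grepping the interface dump. The sample below is a captured example of `ip addr show ens33` output, not a live interface:

```shell
# "sample" stands in for the output of: ip addr show ens33
sample='2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.159.20/24 brd 192.168.159.255 scope global ens33
    inet 192.168.159.200/24 brd 192.168.159.255 scope global secondary ens33:1'

# The VIP shows up as a secondary address on ens33:1 when bound
if printf '%s\n' "$sample" | grep -q '192\.168\.159\.200'; then
  vip_state="VIP present"
else
  vip_state="VIP missing"
fi
echo "$vip_state"
```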

//To shut down the manager service, use the following command,
masterha_stop --conf=/etc/masterha/app1.cnf
or kill the manager process by its PID directly.

3, Fault simulation

      1. Simulating the fault

        1-1. Monitor and observe log records on the manager node

tail -f /var/log/masterha/app1/manager.log

          1-2. Stop MySQL service on Master node Mysql1

systemctl stop mysqld
 or
pkill -9 mysql

#After a successful automatic failover, the MHA process exits. MHA automatically edits app1.cnf and removes the failed mysql1 node. Check whether mysql2 has taken over the VIP
ifconfig

      2. How the candidate master is chosen during failover

1. The slaves are compared by how up to date they are (binlog position/GTID); the slave whose data is closest to the master's becomes the candidate master.
2. If the data is consistent across slaves, the candidate master is chosen in configuration-file order.
3. A weight can be set (candidate_master=1) to force a particular candidate master.
(1) By default, even a weighted slave is disqualified if it lags the master by more than 100MB of relay logs.
(2) With check_repl_delay=0, the weighted slave is forced to be the candidate master even if it lags far behind.
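These rules can be illustrated with a toy sketch (not MHA's actual code). Each entry below encodes a hypothetical slave as name:relay_delay_MB:candidate_master:check_repl_delay:

```shell
# Toy model of the selection rules; slave2 lags 150MB, slave1 is a weighted candidate
slaves="slave2:150:0:1 slave1:10:1:0"

best=""
for s in $slaves; do
  name=${s%%:*};  rest=${s#*:}
  delay=${rest%%:*}; rest=${rest#*:}
  cand=${rest%%:*};  chk=${rest#*:}
  # rule 3(2): candidate_master=1 with check_repl_delay=0 wins regardless of lag
  if [ "$cand" = "1" ] && [ "$chk" = "0" ]; then best=$name; break; fi
  # rule 3(1): a slave more than 100MB of relay logs behind is skipped
  [ "$delay" -gt 100 ] && continue
  # rule 2: otherwise keep configuration-file order (first acceptable slave wins)
  [ -z "$best" ] && best=$name
done
echo "new master: $best"
```

Here slave2 is skipped for lagging too far, and slave1 wins because it is a weighted candidate with delay checking disabled.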

      3. Troubleshooting steps

          3-1. Repair mysql

systemctl restart mysqld

          3-2. Repair master-slave

#View the binary log file and position on the current master server, mysql2
show master status;

#Perform synchronization on the original master server, mysql1
change master to master_host='192.168.159.70',master_user='myslave',master_password='123456',master_log_file='master-bin.000001',master_log_pos=1747;

start slave;

          3-3. Modify the configuration file app1.cnf on the manager node (add the server1 record back, because MHA deletes it automatically when it detects the failure)

vim /etc/masterha/app1.cnf
......
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.159.70 -s 192.168.159.80
......
[server1]
hostname=192.168.159.20
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.159.70
port=3306

[server3]
hostname=192.168.159.80
port=3306

          3-4. Start MHA on the manager node

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

#Fix errors caused by incompatible character encodings or DOS line endings in the script
dos2unix /usr/local/bin/master_ip_failover 
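The errors in question typically come from the script having been saved with Windows (CRLF) line endings. A quick way to detect that before running dos2unix, demonstrated on a temporary file rather than the real script:

```shell
# Create a temporary file with CRLF endings to demonstrate the check;
# in practice you would test /usr/local/bin/master_ip_failover itself.
f=$(mktemp)
printf 'line1\r\nline2\r\n' > "$f"

# A carriage-return character anywhere in the file means DOS line endings
cr=$(printf '\r')
if grep -q "$cr" "$f"; then
  ending="CRLF found - run dos2unix"
else
  ending="clean"
fi
echo "$ending"
rm -f "$f"
```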

Summary

        1. Save the binary log events from the crashed master

        2. Identify the slave with the most recent data

        3. Apply the differential relay logs to the other slaves

        4. Apply the binary log events saved from the master

        5. Promote one slave to be the new master

        6. Point the other slaves at the new master for replication

        Note: MHA requires SSH key-based password-less authentication between nodes

Posted by luke_barnes on Mon, 08 Nov 2021 03:06:22 -0800