Construction of MHA+keepalive High Availability Environment

Keywords: MySQL ssh yum Database

MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by youshimaton (now working for Facebook) of DeNA Company in Japan. It is an excellent high availability software for failover and master-slave upgrading in MySQL high availability environment. In the process of MySQL fault switching, MHA can automatically complete the database fault switching operation within 0-30 seconds, and in the process of fault switching, MHA can ensure the consistency of data to the greatest extent, in order to achieve high availability in the true sense.

The software consists of two parts: MHA Manager (Management Node) and MHA Node (Data Node).

MHA Node runs on each MySQL server, and MHA Manager detects the master node in the cluster regularly. When the master fails, it can automatically upgrade the slave of the latest data to the new master, and then redirect all other slaves to the new master. The whole failover process is completely transparent to the application.

Experimental environment

10.192.203.201, 10.192.203.101 with keepalive installed, VIP 108

Experimental steps


1 Modify / etc/hosts

Add these server ip and host name correspondences to the server where the mha management node and data node are located.

Such as:

cat /etc/hosts

10.192.203.201 pc2

10.192.203.101 slave1

10.192.203.102 PC

2 Mysql master-slave replication environment

The construction process is simple and can be referred to as follows: http://blog.csdn.net/yabingshi_tech/article/details/45192599.

Make sure that two slave libraries set read_only.

Make sure that master and alternate master are primary and alternate modes, otherwise errors may occur in the subsequent configuration process.

 

Configure host trust relationship

Generate password files at all nodes and then copy them to local computers and other servers. Take 10.192.203.201 as an example:

 # ssh-keygen

# ssh-copy-id  root@10.192.203.201

 # ssh-copy-id  root@10.192.203.101

 # ssh-copy-id  root@10.192.203.102  

Then ssh verifies whether password-free login is possible.

 

4 install MHA

Click here to download:

http://download.csdn.net/download/yabignshi/8974251

http://download.csdn.net/detail/yabignshi/8974265 

Install on all data nodes:

yum install perl-DBD-MySQL -y

rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

When the installation is completed, the following script files are generated in the / usr/bin directory (these tools are usually triggered by MHA Manager scripts without human manipulation):

save_binary_logs // Save and copy master's binary logs
apply_diff_relay_logs // / Recognize differences in relay log events and apply their differences to other slave s
filter_mysqlbinlog// Remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs // / Clear relay logs (no blocking of SQL threads)

Install on the management node:

yum install perl-DBD-MySQL-y (since the experimental environment management node and data node are deployed on the same server, there is no need to re-install here)

yum install perl-Config-Tiny -y

yum install epel-release -y

yum install perl-Log-Dispatch -y

yum install perl-Parallel-ForkManager -y

Rpm-ivh mha4mysql-node-0.56-0.el6.noarch.rpm (because the experimental environment management node and data node are deployed on the same server, so there is no need to re-install here)

rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm 

After installation, the following script files will be generated in the / usr/bin directory:

-rwxr-xr-x. 1 root      root       1995 Apr  1  2014 masterha_check_repl
-rwxr-xr-x. 1 root      root        1779 Apr 1  2014 masterha_check_ssh
-rwxr-xr-x. 1 root      root       1865 Apr  1  2014 masterha_check_status
-rwxr-xr-x. 1 root      root        3201 Apr 1  2014 masterha_conf_host
-rwxr-xr-x. 1 root      root        2517 Apr 1  2014 masterha_manager
-rwxr-xr-x. 1 root      root        2165 Apr 1  2014 masterha_master_monitor
-rwxr-xr-x. 1 root      root        2373 Apr 1  2014 masterha_master_switch
-rwxr-xr-x. 1 root      root        5171 Apr 1  2014 masterha_secondary_check
-rwxr-xr-x. 1 root      root        1739 Apr 1  2014 masterha_stop
-rwxr-xr-x. 1 root      root       4807 Apr  1  2014 filter_mysqlbinlog
-rwxr-xr-x. 1 root      root       7525 Apr  1  2014 save_binary_logs
-rwxr-xr-x. 1 root      root       8261 Apr  1  2014 purge_relay_logs
-rwxr-xr-x. 1 root      root      16367 Apr  1  2014 apply_diff_relay_logs

5 Installation configuration keepalive


5.1 Install keepalive

Install keepalive on master and alternate master:

yum install -y popt-devel
cd /usr/local/src
wgethttp://www.keepalived.org/software/keepalived-1.2.2.tar.gz
tar zxvf keepalived-1.2.2.tar.gz
cd keepalived-1.2.2
./configure --prefix=/
make
make install

5.2 Modify configuration file

vi/etc/keepalived/keepalived.conf

The contents of master and alternate master configuration files are the same.

#ConfigurationFile for keepalived
global_defs {
notification_email {                        ######Define the mailbox to receive mail
 wangjj@hrloo.com
       }
 notification_email_from jiankong@staff.tuge.com    ######Define the mailbox to send mail
 smtp_server mail.tuge.com
 smtp_connect_timeout 10
}
vrrp_instance vrrptest {                 ######Define vrrptest instance
       state BACKUP              ######Server status
 interface eth0                     ######Used interfaces
       virtual_router_id 51               ######Virtual routing flags, a set of lvs virtual routing flags must be the same in order to switch
       priority 150                       ######Service startup priority, the higher the value and priority, BACKUP can not be greater than MASTER
       advert_int 1                        ######Survival check time between servers
authentication {
       auth_type PASS                     ######Authentication type
       auth_pass ufsoft       ######Authentication passwords, a set of lvs server authentication passwords must be consistent
}
virtual_ipaddress {                         ######Virtual IP Address
        10.192.203.108
}
}

Here the master server state is not configured as MASTER, and the priority of the configuration is the same. It is expected that when the master is down and restored, it will not take the initiative to grab the MASTER state to avoid the fluctuation of MySQL service.

vrrp_script is not configured here, which will allow mha to drift vip automatically later.

5.3  vi /etc/sysconfig/iptables

# Note that both machines need to be modified.

Add to:

-A INPUT-d 10.192.203.108/32 -j ACCEPT

- A INPUT-d 224.0.0.18-j ACCEPT# A D D VRRP Communication Support

Note: 10.192.203.108 in the first line needs to be changed to your own vip.

serviceiptables restart

5.4 Start keepalive

service keepalived start

Execute the ip addr command separately, and you can see the virtual IP on one of the machines

5.5 test

Stop the master server keepalived and check whether VIP is switched to the alternate master server (using ip) The addr command is validated.

6 configure Mha


6.1 Added Management Account

# Perform the following actions on the data node

grant all privileges on *.* TO mha@'10.192.%' IDENTIFIED BY 'test'; 

flush privileges;

6.2 Configuration/etc/mha/app1.cnf

# Only on the management side

 mkdir /etc/mha

 mkdir -p /var/log/mha/app1 

vi /etc/mha/app1.cnf

Add to:

[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/mysql/data
master_ip_failover_script= /usr/bin/master_ip_failover     
master_ip_online_change_script=/usr/bin/master_ip_online_change
report_script=/usr/bin/send_report 
user=mha
password=test
ping_interval=2
repl_password=beijing
repl_user=rep_user
ssh_user=root
 
[server1]
hostname=10.192.203.201
port=3306
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.192.203.101
port=3306
 
[server3]
hostname=10.192.203.102
port=3306
Configuration in server default is the common configuration of three data nodes. It can also be customized in a specific server.
/*
Interpretation of parameter concepts:

master_binlog_dir=/data/mysql/data# Sets the location where the master saves the binlog so that MHA can find the master's log

master_ip_failover_script=/usr/bin/master_ip_failover= Sets the handover script when automatic failover occurs

master_ip_online_change_script=/usr/bin/master_ip_online_change# Sets the handover script for manual handover

report_script=/usr/bin/send_report * * * * // Sets the script for alerting when a switch occurs

ping_interval=2 # Sets up the monitor main library, and the time interval for sending ping packets is 3 seconds by default. failover automatically occurs when three attempts are not answered.

Candidate master = 1 # is set as candidate master. If this parameter is set, the slave library will be promoted to the master library after master-slave switching occurs, even if the master library is not the latest slave of events in the cluster.

By default, if a slave lags behind relay logs of master 100M, MHA will not choose the slave as a new master because it takes a long time to recover the slave. By setting check_repl_delay=0, the MHA trigger switch will ignore the replication delay when selecting a new master. This parameter sets candidate for the recovery of the slave._ The master = 1 host is very useful because the candidate master must be a new master in the process of switching.

*/ 

6.3 master_ip_failover script code

To introduce the keepalived service into MHA, we only need to modify the script file master_ip_failover, which triggers the switch. In this script, we add the handling of the keepalived service when the Master goes down.

In the management node edit script / usr/bin/master_ip_failover, the modification is as follows:

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
 
use Getopt::Long;
 
my (
   $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
   $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
 
my $vip = '10.192.203.108';
my $ssh_start_vip ="/etc/init.d/keepalived start";
my $ssh_stop_vip ="/etc/init.d/keepalived stop";
 
GetOptions(
   'command=s'          =>\$command,
   'ssh_user=s'         =>\$ssh_user,
   'orig_master_host=s' => \$orig_master_host,
   'orig_master_ip=s'   =>\$orig_master_ip,
   'orig_master_port=i' => \$orig_master_port,
   'new_master_host=s'  =>\$new_master_host,
   'new_master_ip=s'    =>\$new_master_ip,
   'new_master_port=i'  =>\$new_master_port,
);
 
exit &main();
 
sub main {
 
   print "\n\nIN SCRIPTTEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
 
   if ( $command eq "stop" || $command eq "stopssh" ) {
 
       my $exit_code = 1;
       eval {
           print "Disabling the VIP on old master: $orig_master_host \n";
           &stop_vip();
           $exit_code = 0;
       };
       if ($@) {
           warn "Got Error: $@\n";
           exit $exit_code;
       }
       exit $exit_code;
    }
   elsif ( $command eq "start" ) {
 
       my $exit_code = 10;
       eval {
           print "Enabling the VIP - $vip on the new master - $new_master_host\n";
           &start_vip();
           $exit_code = 0;
       };
       if ($@) {
           warn $@;
           exit $exit_code;
       }
       exit $exit_code;
    }
   elsif ( $command eq "status" ) {
       print "Checking the Status of the script.. OK \n";
       exit 0;
    }
   else {
       &usage();
       exit 1;
    }
}
sub start_vip() {
   `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIPon the old_master
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
 
sub usage {
   print
   "Usage: master_ip_failover --command=start|stop|stopssh|status--orig_master_host=host --orig_master_ip=ip --orig_master_port=port--new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Note: my $vip = 10.192.203.108'; VIP in this line should be changed to its own virtual IP according to the situation.

chmod +x /usr/bin/master_ip_failover

6.4 master_ip_online_change script code

Edit script / usr/bin/master_ip_online_change at management node

#!/usr/bin/env perl
 
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
#  itunder the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
# Foundation, Inc.,
#  51Franklin Street, Fifth Floor, Boston, MA 02110-1301  USA
 
## Note: This is a sample script and is notcomplete. Modify the script based on your environment.
 
use strict;
use warnings FATAL => 'all';
 
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofdaytv_interval );
use Data::Dumper;
 
my $_tstart;
my $_running_interval = 0.1;
my (
 $command,             $orig_master_is_new_slave, $orig_master_host,
 $orig_master_ip,      $orig_master_port,        $orig_master_user,
 $orig_master_password, $orig_master_ssh_user,     $new_master_host,
 $new_master_ip,       $new_master_port,         $new_master_user,
 $new_master_password, $new_master_ssh_user
);
my $vip = '10.192.203.108/32';
my $key = '1';
my $ssh_start_vip = "/etc/init.d/keepalivedstart";
my $ssh_stop_vip = "/etc/init.d/keepalivedstop";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
GetOptions(
 'command=s'                =>\$command,
 'orig_master_is_new_slave' => \$orig_master_is_new_slave,
 'orig_master_host=s'       =>\$orig_master_host,
 'orig_master_ip=s'         =>\$orig_master_ip,
 'orig_master_port=i'       =>\$orig_master_port,
 'orig_master_user=s'       =>\$orig_master_user,
 'orig_master_password=s'   =>\$orig_master_password,
 'orig_master_ssh_user=s'   =>\$orig_master_ssh_user,
 'new_master_host=s'        =>\$new_master_host,
 'new_master_ip=s'          =>\$new_master_ip,
 'new_master_port=i'        =>\$new_master_port,
 'new_master_user=s'        =>\$new_master_user,
 'new_master_password=s'    =>\$new_master_password,
 'new_master_ssh_user=s'    =>\$new_master_ssh_user,
 'orig_master_ssh_port=i'    =>\$orig_master_ssh_port,
 'new_master_ssh_port=i'    =>\$new_master_ssh_port,
);
 
exit &main();
 
sub current_time_us {
  my( $sec, $microsec ) = gettimeofday();
  my$curdate = localtime($sec);
 return $curdate . " " . sprintf( "%06d", $microsec);
}
 
sub sleep_until {
  my$elapsed = tv_interval($_tstart);
  if( $_running_interval > $elapsed ) {
   sleep( $_running_interval - $elapsed );
  }
}
 
sub get_threads_util {
  my$dbh                    = shift;
  my$my_connection_id       = shift;
  my$running_time_threshold = shift;
  my$type                   = shift;
 $running_time_threshold = 0 unless ($running_time_threshold);
 $type                   = 0 unless($type);
  my@threads;
 
  my$sth = $dbh->prepare("SHOW PROCESSLIST");
 $sth->execute();
 
 while ( my $ref = $sth->fetchrow_hashref() ) {
   my $id         = $ref->{Id};
   my $user       = $ref->{User};
   my $host       = $ref->{Host};
   my $command    =$ref->{Command};
   my $state      = $ref->{State};
   my $query_time = $ref->{Time};
   my $info       = $ref->{Info};
   $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
   next if ( $my_connection_id == $id );
   next if ( defined($query_time) && $query_time <$running_time_threshold );
   next if ( defined($command)   && $command eq "Binlog Dump" );
   next if ( defined($user)      && $user eq "system user" );
   next
     if ( defined($command)
     && $command eq "Sleep"
     && defined($query_time)
     && $query_time >= 1 );
 
   if ( $type >= 1 ) {
     next if ( defined($command) && $command eq "Sleep" );
     next if ( defined($command) && $command eq "Connect");
    }
 
   if ( $type >= 2 ) {
     next if ( defined($info) && $info =~ m/^select/i );
     next if ( defined($info) && $info =~ m/^show/i );
    }
 
   push @threads, $ref;
  }
 return @threads;
}
 
sub main {
  if( $command eq "stop" ) {
   ## Gracefully killing connections on the current master
    #1. Set read_only= 1 on the new master
    #2. DROP USER so that no app user can establish new connections
    #3. Set read_only= 1 on the current master
    #4. Kill current queries
    #* Any database access failure will result in script die.
   my $exit_code = 1;
   eval {
     ## Setting read_only=1 on the new master (to avoid accident)
     my $new_master_handler = new MHA::DBHelper();
 
     # args: hostname, port, user, password, raise_error(die_on_error)_or_not
     $new_master_handler->connect( $new_master_ip, $new_master_port,
       $new_master_user, $new_master_password, 1 );
     print current_time_us() . " Set read_only on the new master..";
     $new_master_handler->enable_read_only();
     if ( $new_master_handler->is_read_only() ) {
       print "ok.\n";
     }
      else {
       die "Failed!\n";
     }
     $new_master_handler->disconnect();
 
     # Connecting to the orig master, die if any database error happens
     my $orig_master_handler = new MHA::DBHelper();
     $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
       $orig_master_user, $orig_master_password, 1 );
 
     ## Drop application user so that nobody can connect. Disablingper-session binlog beforehand
     $orig_master_handler->disable_log_bin_local();
      print current_time_us() . " Drpping appuser on the orig master..\n";
     #FIXME_xxx_drop_app_user($orig_master_handler);
 
     ## Waiting for N * 100 milliseconds so that current connections can exit
     my $time_until_read_only = 15;
     $_tstart = [gettimeofday];
     my @threads = get_threads_util( $orig_master_handler->{dbh},
       $orig_master_handler->{connection_id} );
     while ( $time_until_read_only > 0 && $#threads >= 0 ) {
       if ( $time_until_read_only % 5 == 0 ) {
         printf
"%s Waiting all running %d threads aredisconnected.. (max %d milliseconds)\n",
           current_time_us(), $#threads + 1, $time_until_read_only * 100;
         if ( $#threads < 5 ) {
           print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump ."\n"
              foreach (@threads);
         }
       }
       sleep_until();
       $_tstart = [gettimeofday];
       $time_until_read_only--;
       @threads = get_threads_util( $orig_master_handler->{dbh},
         $orig_master_handler->{connection_id} );
     }
 
     ## Setting read_only=1 on the current master so that nobody(exceptSUPER) can write
     print current_time_us() . " Set read_only=1 on the orig master..";
     $orig_master_handler->enable_read_only();
     if ( $orig_master_handler->is_read_only() ) {
       print "ok.\n";
     }
     else {
       die "Failed!\n";
     }
 
     ## Waiting for M * 100 milliseconds so that current update queries cancomplete
     my $time_until_kill_threads = 5;
     @threads = get_threads_util( $orig_master_handler->{dbh},
       $orig_master_handler->{connection_id} );
     while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
       if ( $time_until_kill_threads % 5 == 0 ) {
         printf
"%s Waiting all running %d queries aredisconnected.. (max %d milliseconds)\n",
           current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
         if ( $#threads < 5 ) {
           print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump ."\n"
              foreach (@threads);
         }
       }
       sleep_until();
       $_tstart = [gettimeofday];
       $time_until_kill_threads--;
       @threads = get_threads_util( $orig_master_handler->{dbh},
         $orig_master_handler->{connection_id} );
     }
 
     ## Terminating all threads
     print current_time_us() . " Killing all applicationthreads..\n";
     $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0);
     print current_time_us() . " done.\n";
     $orig_master_handler->enable_log_bin_local();
     $orig_master_handler->disconnect();
 
     ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
     eval {
     `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host\" $ssh_stop_vip \"`;
       };
       if ($@) {
           warn $@;
       }
     $exit_code = 0;
   };
   if ($@) {
     warn "Got Error: $@\n";
     exit $exit_code;
    }
   exit $exit_code;
  }
 elsif ( $command eq "start" ) {
   ## Activating master ip on the new master
    #1. Create app user with write privileges
    #2. Moving backup script if needed
    #3. Register new master's ip to the catalog database
 
# We don't return error even thoughactivating updatable accounts/ip failed so that we don't interrupt slaves'recovery.
# If exit code is 0 or 10, MHA does notabort
   my $exit_code = 10;
   eval {
     my $new_master_handler = new MHA::DBHelper();
 
     # args: hostname, port, user, password, raise_error_or_not
     $new_master_handler->connect( $new_master_ip, $new_master_port,
       $new_master_user, $new_master_password, 1 );
 
     ## Set read_only=0 on the new master
     $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Setread_only=0 on the new master.\n";
     $new_master_handler->disable_read_only();
 
     ## Creating an app user on the new master
     print current_time_us() . " Creating app user on the newmaster..\n";
     #FIXME_xxx_create_app_user($new_master_handler);
     $new_master_handler->enable_log_bin_local();
     $new_master_handler->disconnect();
 
     ## Update master ip on the catalog database, etc
     `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host\" $ssh_start_vip \"`;
     $exit_code = 0;
   };
   if ($@) {
     warn "Got Error: $@\n";
     exit $exit_code;
    }
   exit $exit_code;
  }
 elsif ( $command eq "status" ) {
 
    #do nothing
   exit 0;
  }
 else {
   &usage();
   exit 1;
  }
}
 
sub usage {
 print
"Usage: master_ip_online_change--command=start|stop|status --orig_master_host=host --orig_master_ip=ip--orig_master_port=port --new_master_host=host --new_master_ip=ip--new_master_port=port\n";
 die;
}

Note: You need to change my $vip = 10.192.203.108/32'; to your own vip.

chmod +x /usr/bin/master_ip_online_change

6.5 send_report script code

Edit script / usr/bin/send_report at the management node

#!/usr/bin/perl
 
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
#  itunder the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
# Foundation, Inc.,
#  51Franklin Street, Fifth Floor, Boston, MA 02110-1301  USA
 
## Note: This is a sample script and is notcomplete. Modify the script based on your environment.
 
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts areset only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts,$subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxxxxx@163.com';
my $mail_user='xxxxxxx@163.com';
my $mail_pass='Password';
my$mail_to=['949538827@qq.com','15521xxxx@139.com'];
GetOptions(
 'orig_master_host=s' => \$dead_master_host,
 'new_master_host=s'  =>\$new_master_host,
 'new_slave_hosts=s'  =>\$new_slave_hosts,
 'subject=s'          =>\$subject,
 'body=s'             => \$body,
);
 
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
   my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
   open my $DEBUG, "> /tmp/monitormail.log"
       or die "Can't open the debug     file:$!\n";
   my $sender = new Mail::Sender {
       ctype       => 'text/plain;charset=utf-8',
       encoding    => 'utf-8',
       smtp        => $smtp,
       from        => $mail_from,
       auth        => 'LOGIN',
       TLS_allowed => '0',
       authid      => $user,
       authpwd     => $passwd,
       to          => $mail_to,
       subject     => $subject,
       debug       => $DEBUG
   };
 
   $sender->MailMsg(
       {   msg   => $msg,
           debug => $DEBUG
       }
    )or print $Mail::Sender::Error;
   return 1;
}
 
 
 
# Do whatever you want here
 
exit 0;
 
//Note: The following lines need to be modified:
my $smtp='smtp.163.com';
my $mail_from='xxxxxxx@163.com';
my $mail_user='xxxxxxx@163.com';
my $mail_pass='Password';
my$mail_to=['949538827@qq.com','15521xxxx@139.com'];

# f Gives Execution Authority

chmod +x /usr/bin/send_report 

Note: It is necessary to ensure that the management node server can send mail normally. You can try it first with the sendEmail command.

6.6 Set the way relay log s are cleared (on each slave node)

Make sure that relay_log_purge=0 from the server configuration file, otherwise warning will be missed when masterha_check_repl, relay_log_purge=0 is not set on slave

When the master library is switched to a slave Library in the future, remember to perform the same operation on the original master library.

In the process of switching, MHA relies on relay log information in the recovery process from the library, so the automatic clearance of relay log is set to OFF, and relay log is cleared manually.

By default, the relay log from the server is automatically deleted after the execution of the SQL thread.

However, in MHA environment, these relay logs may be used to recover other slave servers, so automatic deletion of relay logs needs to be disabled.

Regular removal of relay logs requires consideration of replication latency. Under ext3 filesystem, deleting large files takes a certain amount of time, which can lead to serious replication delay.

In order to avoid replication delay, it is necessary to temporarily create hard links for relay logs, because deleting large files through hard links in linux systems can be very fast.

(In mysql databases, hard links are often used to delete large tables)

 

7 Check if mha manage ment is successfully configured?

7.1 Check ssh login

 [root@pc2.ssh]#  masterha_check_ssh--conf=/etc/mha/app1.cnf
Mon Feb 13 15:35:25 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 13 15:35:25 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..
Mon Feb 13 15:35:25 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..
Mon Feb 13 15:35:25 2017 - [info] StartingSSH connection tests..
Mon Feb 13 15:35:26 2017 - [debug]
Mon Feb 13 15:35:25 2017 - [debug]  Connecting via SSH fromroot@10.192.203.201(10.192.203.201:22) toroot@10.192.203.101(10.192.203.101:22)..
Mon Feb 13 15:35:25 2017 - [debug]   ok.
Mon Feb 13 15:35:25 2017 - [debug]  Connecting via SSH fromroot@10.192.203.201(10.192.203.201:22) toroot@10.192.203.102(10.192.203.102:22)..
Mon Feb 13 15:35:26 2017 - [debug]   ok.
Mon Feb 13 15:35:26 2017 - [debug]
Mon Feb 13 15:35:25 2017 - [debug]  Connecting via SSH fromroot@10.192.203.101(10.192.203.101:22) toroot@10.192.203.201(10.192.203.201:22)..
Mon Feb 13 15:35:25 2017 - [debug]   ok.
Mon Feb 13 15:35:25 2017 - [debug]  Connecting via SSH fromroot@10.192.203.101(10.192.203.101:22) toroot@10.192.203.102(10.192.203.102:22)..
Mon Feb 13 15:35:26 2017 - [debug]   ok.
Mon Feb 13 15:35:27 2017 - [debug]
Mon Feb 13 15:35:26 2017 - [debug]  Connecting via SSH fromroot@10.192.203.102(10.192.203.102:22) toroot@10.192.203.201(10.192.203.201:22)..
Mon Feb 13 15:35:26 2017 - [debug]   ok.
Mon Feb 13 15:35:26 2017 - [debug]  Connecting via SSH fromroot@10.192.203.102(10.192.203.102:22) toroot@10.192.203.101(10.192.203.101:22)..
Mon Feb 13 15:35:27 2017 - [debug]   ok.
Mon Feb 13 15:35:27 2017 - [info] All SSHconnection tests passed successfully.

If SSH can log on to several other servers without password, it will report an error when checking ssh:

 [root@PC .ssh]#  masterha_check_ssh--conf=/etc/mha/app1.cnf 
Fri Feb 10 10:04:35 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 10 10:04:35 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..
Fri Feb 10 10:04:35 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..
Fri Feb 10 10:04:35 2017 - [info] StartingSSH connection tests..
Fri Feb 10 10:04:36 2017 -[error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Feb 10 10:04:35 2017 - [debug]  Connecting via SSH fromroot@10.192.203.201(10.192.203.201:22) toroot@10.192.203.101(10.192.203.101:22)..
Warning: Permanently added '10.192.203.201'(RSA) to the list of known hosts.
Permission denied(publickey,gssapi-keyex,gssapi-with-mic,password).
Fri Feb 10 10:04:35 2017 -[error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connectionfrom root@10.192.203.201(10.192.203.201:22) toroot@10.192.203.101(10.192.203.101:22) failed!
Fri Feb 10 10:04:37 2017 - [debug]
Fri Feb 10 10:04:36 2017 - [debug]  Connecting via SSH fromroot@10.192.203.101(10.192.203.101:22) toroot@10.192.203.201(10.192.203.201:22)..
Fri Feb 10 10:04:36 2017 - [debug]   ok.
Fri Feb 10 10:04:36 2017 - [debug]  Connecting via SSH fromroot@10.192.203.101(10.192.203.101:22) toroot@10.192.203.102(10.192.203.102:22)..
Fri Feb 10 10:04:37 2017 - [debug]   ok.
Fri Feb 10 10:04:38 2017 - [debug]
Fri Feb 10 10:04:36 2017 - [debug]  Connecting via SSH fromroot@10.192.203.102(10.192.203.102:22) toroot@10.192.203.201(10.192.203.201:22)..
Fri Feb 10 10:04:37 2017 - [debug]   ok.
Fri Feb 10 10:04:37 2017 - [debug]  Connecting via SSH fromroot@10.192.203.102(10.192.203.102:22) to root@10.192.203.101(10.192.203.101:22)..
Fri Feb 10 10:04:38 2017 - [debug]   ok.
SSH Configuration Check Failed!
 at/usr/bin/masterha_check_ssh line 44

Need to check:

(1) Whether the/etc/hosts file is configured correctly;

(2) Is the password file ssh-copy-id on its own server?

If you encounter this error, correct it, empty the contents under. ssh, and then re-authenticate.

 

7.2 Check whether mysql replication has been successfully configured

[root@pc2 app1]# masterha_check_repl--conf=/etc/mha/app1.cnf

Fri Feb 10 11:25:05 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 10 11:25:05 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..
Fri Feb 10 11:25:05 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..
Fri Feb 10 11:25:05 2017 - [info]MHA::MasterMonitor version 0.56.
Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is noalive slave. We can't do failover
Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Errorhappened on checking configurations.  at/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Errorhappened on monitoring servers.
Fri Feb 10 11:25:06 2017 - [info] Got exitcode 1 (Not master dead).
 
MySQL Replication Health is NOT OK!

Clearly, I checked the replication status is normal, why can't it pass?

Solution:

You need to ensure that master and alternate master are set to double master replication. Let master also point to the alternate master.

 

Verify again if an error is reported:

[root@pc2 mysql]#  masterha_check_repl --conf=/etc/mha/app1.cnf

Mon Feb 13 17:21:52 2017 - [info]   Connecting toroot@10.192.203.101(10.192.203.101:22)..
 Checking slave recovery environment settings..
   Opening /data/mysql/relay-log.info ... ok.
   Relay log found at /data/mysql, up to slave1-relay-bin.000011
   Temporary relay log file is /data/mysql/slave1-relay-bin.000011
   Testing mysql connection and privileges..ERROR 1045 (28000): Accessdenied for user 'mha'@'slave1' (using password: YES)
mysql command failed with rc 1:0!
 at/usr/bin/apply_diff_relay_logs line 375
         main::check()called at /usr/bin/apply_diff_relay_logs line 497
         eval{...} called at /usr/bin/apply_diff_relay_logs line 475
         main::main()called at /usr/bin/apply_diff_relay_logs line 120
Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln205] Slavessettings check failed!
Mon Feb 13 17:21:53 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm,ln413] Slave configuration failed.
Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Errorhappened on checking configurations.  at/usr/bin/masterha_check_repl line 48
Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Errorhappened on monitoring servers.
Mon Feb 13 17:21:53 2017 - [info] Got exitcode 1 (Not master dead).
 
MySQL Replication Health is NOT OK!

Solution:

Create users on slave1:

mysql> grant  ALL PRIVILEGES ON *.* TO 'mha'@'slave1'identified by 'test';

Query OK, 0 rows affected (0.00 sec) 

mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)

Similarly, create users on PC s:

mysql> grant  ALL PRIVILEGES ON *.* TO 'mha'@'PC' identifiedby 'test';

Query OK, 0 rows affected (0.00 sec) 

mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)

  

When re-validating, if an error is reported:

Can't exec "mysqlbinlog": No suchfile or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line106.

mysqlbinlog version command failed with rc1:0, please verify PATH, LD_LIBRARY_PATH, and client options

Solution:

Create a soft connection on all data nodes:

 ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

Verify again and report an error:

Testing mysql connection andprivileges..sh: mysql: command not found

mysql command failed with rc 127:0!

Solution:

Establish soft connections on all data nodes:

 ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql

Finally, everything is normal:

[root@pc2 mysql]#  masterha_check_repl --conf=/etc/mha/app1.cnf

Mon Feb 13 17:32:49 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 13 17:32:49 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..
Mon Feb 13 17:32:49 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..
Mon Feb 13 17:32:49 2017 - [info]MHA::MasterMonitor version 0.56.
Mon Feb 13 17:32:49 2017 - [info]Multi-master configuration is detected. Current primary(writable) master is10.192.203.201(10.192.203.201:3306)
Mon Feb 13 17:32:49 2017 - [info] Masterconfigurations are as below:
Master 10.192.203.101(10.192.203.101:3306),replicating from 10.192.203.201(10.192.203.201:3306), read-only
Master 10.192.203.201(10.192.203.201:3306),replicating from 10.192.203.101(10.192.203.101:3306)
 
Mon Feb 13 17:32:49 2017 - [info] GTID failovermode = 0
Mon Feb 13 17:32:49 2017 - [info] DeadServers:
Mon Feb 13 17:32:49 2017 - [info] AliveServers:
Mon Feb 13 17:32:49 2017 - [info]   10.192.203.201(10.192.203.201:3306)
Mon Feb 13 17:32:49 2017 - [info]   10.192.203.101(10.192.203.101:3306)
Mon Feb 13 17:32:49 2017 - [info]   10.192.203.102(10.192.203.102:3306)
Mon Feb 13 17:32:49 2017 - [info] AliveSlaves:
Mon Feb 13 17:32:49 2017 - [info]   10.192.203.101(10.192.203.101:3306)  Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled
Mon Feb 13 17:32:49 2017 - [info]     Replicating from10.192.203.201(10.192.203.201:3306)
Mon Feb 13 17:32:49 2017 - [info]     Primary candidate for the new Master(candidate_master is set)
Mon Feb 13 17:32:49 2017 - [info]   10.192.203.102(10.192.203.102:3306)  Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled
Mon Feb 13 17:32:49 2017 - [info]     Replicating from10.192.203.201(10.192.203.201:3306)
Mon Feb 13 17:32:49 2017 - [info] CurrentAlive Master: 10.192.203.201(10.192.203.201:3306)
Mon Feb 13 17:32:49 2017 - [info] Checkingslave configurations..
Mon Feb 13 17:32:49 2017 - [info] Checkingreplication filtering settings..
Mon Feb 13 17:32:49 2017 - [info]  binlog_do_db= , binlog_ignore_db=
Mon Feb 13 17:32:49 2017 - [info]  Replication filtering check ok.
Mon Feb 13 17:32:49 2017 - [info] GTID(with auto-pos) is not supported
Mon Feb 13 17:32:49 2017 - [info] StartingSSH connection tests..
Mon Feb 13 17:32:51 2017 - [info] All SSHconnection tests passed successfully.
Mon Feb 13 17:32:51 2017 - [info] CheckingMHA Node version..
Mon Feb 13 17:32:52 2017 - [info]  Version check ok.
Mon Feb 13 17:32:52 2017 - [info] CheckingSSH publickey authentication settings on the current master..
Mon Feb 13 17:32:52 2017 - [info]HealthCheck: SSH to 10.192.203.201 is reachable.
Mon Feb 13 17:32:52 2017 - [info] MasterMHA Node version is 0.56.
Mon Feb 13 17:32:52 2017 - [info] Checkingrecovery script configurations on 10.192.203.201(10.192.203.201:3306)..
Mon Feb 13 17:32:52 2017 - [info]   Executing command: save_binary_logs--command=test --start_pos=4 --binlog_dir=/data/mysql--output_file=/var/tmp/save_binary_logs_test --manager_version=0.56--start_file=single-mysql-bin.000018
Mon Feb 13 17:32:52 2017 - [info]   Connecting toroot@10.192.203.201(10.192.203.201:22)..
 Creating /var/tmp if not exists..   ok.
 Checking output directory is accessible or not..
  ok.
 Binlog found at /data/mysql, up to single-mysql-bin.000018
Mon Feb 13 17:32:52 2017 - [info] Binlogsetting check done.
Mon Feb 13 17:32:52 2017 - [info] CheckingSSH publickey authentication and checking recovery script configurations on allalive slave servers..
Mon Feb 13 17:32:52 2017 - [info]   Executing command : apply_diff_relay_logs--command=test --slave_user='mha' --slave_host=10.192.203.101--slave_ip=10.192.203.101 --slave_port=3306 --workdir=/var/tmp--target_version=5.5.19-log --manager_version=0.56--relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Mon Feb 13 17:32:52 2017 - [info]   Connecting toroot@10.192.203.101(10.192.203.101:22)..
 Checking slave recovery environment settings..
   Opening /data/mysql/relay-log.info ... ok.
   Relay log found at /data/mysql, up to slave1-relay-bin.000011
    Temporaryrelay log file is /data/mysql/slave1-relay-bin.000011
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Mon Feb 13 17:32:52 2017 - [info]   Executing command : apply_diff_relay_logs--command=test --slave_user='mha' --slave_host=10.192.203.102--slave_ip=10.192.203.102 --slave_port=3306 --workdir=/var/tmp--target_version=5.5.19-log --manager_version=0.56--relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Mon Feb 13 17:32:52 2017 - [info]   Connecting toroot@10.192.203.102(10.192.203.102:22)..
 Checking slave recovery environment settings..
   Opening /data/mysql/relay-log.info ... ok.
   Relay log found at /data/mysql, up to PC-relay-bin.000011
   Temporary relay log file is /data/mysql/PC-relay-bin.000011
   Testing mysql connection and privileges.. done.
   Testing mysqlbinlog output.. done.
   Cleaning up test file(s).. done.
Mon Feb 13 17:32:53 2017 - [info] Slavessettings check done.
Mon Feb 13 17:32:53 2017 - [info]
10.192.203.201(10.192.203.201:3306)(current master)
 +--10.192.203.101(10.192.203.101:3306)
 +--10.192.203.102(10.192.203.102:3306)
 
Mon Feb 13 17:32:53 2017 - [info] Checkingreplication health on 10.192.203.101..
Mon Feb 13 17:32:53 2017 - [info]  ok.
Mon Feb 13 17:32:53 2017 - [info] Checkingreplication health on 10.192.203.102..
Mon Feb 13 17:32:53 2017 - [info]  ok.
Mon Feb 13 17:32:53 2017 - [info] Checkingmaster_ip_failover_script status:
Mon Feb 13 17:32:53 2017 - [info]   /usr/bin/master_ip_failover --command=status--ssh_user=root --orig_master_host=10.192.203.201--orig_master_ip=10.192.203.201 --orig_master_port=3306
 
 
IN SCRIPT TEST====/etc/init.d/keepalivedstop==/etc/init.d/keepalived start===
 
Checking the Status of the script.. OK
Mon Feb 13 17:32:53 2017 - [info]  OK.
Mon Feb 13 17:32:53 2017 - [warning]shutdown_script is not defined.
Mon Feb 13 17:32:53 2017 - [info] Got exitcode 0 (Not master dead).
 
MySQL Replication Health is OK.
 

8 Start monitoring on the management side

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 & 

# View status

masterha_check_status --conf=/etc/mha/app1.cnf  

app1 (pid:3349) is running(0:PING_OK),master:10.192.203.201

9 Verify failover

Close Mysql instance on 10.192.203.201:

 [root@pc2mha]# service mysqld stop

Shutting downMySQL... SUCCESS!

View logs on the management node:

tail -f /var/log/mha/app1/manager.log  

You will see:

Started automated(non-interactive)failover.
Invalidated master IP address on10.192.203.201(10.192.203.201:3306)
The latest slave10.192.203.101(10.192.203.101:3306) has all relay logs for recovery.
Selected10.192.203.101(10.192.203.101:3306) as a new master.
10.192.203.101(10.192.203.101:3306): OK:Applying all logs succeeded.
10.192.203.101(10.192.203.101:3306): OK:Activated master IP address.
10.192.203.102(10.192.203.102:3306): Thishost has the latest relay log events.
Generating relay diff files from the latestslave succeeded.
10.192.203.102(10.192.203.102:3306): OK:Applying all logs succeeded. Slave started, replicating from10.192.203.101(10.192.203.101:3306)
10.192.203.101(10.192.203.101:3306):Resetting slave info succeeded.
Master failover to10.192.203.101(10.192.203.101:3306) completed successfully.
Mon Feb 13 17:44:28 2017 - [info] Sendingmail..
Unknown option: conf

Alarm mail was also received:

Vip also drifted over:

[root@slave1 mysql]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd00:00:00:00:00:00
   inet 127.0.0.1/8 scope host lo
   inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000
   link/ether 08:00:27:04:05:16 brd ff:ff:ff:ff:ff:ff
   inet 10.192.203.101/24 brd 10.192.203.255 scope global eth0
   inet 10.192.203.108/32 scope global eth0
   inet6 fe80::a00:27ff:fe04:516/64 scope link tentative dadfailed
      valid_lft forever preferred_lft forever
3: eth1:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000
   link/ether 08:00:27:3a:ec:3c brd ff:ff:ff:ff:ff:ff
   inet 10.0.10.5/24 brd 10.0.10.255 scope global eth1
   inet6 fe80::a00:27ff:fe3a:ec3c/64 scope link tentative dadfailed
      valid_lft forever preferred_lft forever

Check slave status on PC:

mysql> show slave status \G;

*************************** 1. row***************************

               Slave_IO_State: Waiting formaster to send event

                  Master_Host: 10.192.203.101

                  Master_User: rep_user

                  Master_Port: 3306

You can see that 10.192.203.102 replication automatically points to 10.192.203.101.  

Looking at the variable read_only on the current master, we found that it was automatically closed, indicating that the previous slave can now write:

mysql> showvariables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.01 sec) 

When the handover is completed, the manager process automatically hangs up:

[root@pc2 mysql]#  masterha_check_status --conf=/etc/mha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

The mha configuration file was modified by itself (the configuration information of the original master library was deleted):

[root@pc2 mha]# cat app1.cnf

[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/mysql
master_ip_failover_script=/usr/bin/master_ip_failover
master_ip_online_change_script=/usr/bin/master_ip_online_change
password=test
ping_interval=2
repl_password=beijing
repl_user=rep_user
report_script=/usr/bin/send_report
ssh_user=root
user=mha
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.192.203.101
port=3306
 
[server3]
hostname=10.192.203.102
port=3306
 

Now the master library 10.192.203.101 has no original master-slave replication information:

mysql> showslave status \G;

Empty set (0.00sec)

 

ERROR:

No queryspecified

 

The next thing we should do is:

(1) Find out the reason why the main library hangs and fix it.

(2) Make sure that the original repository catches up with the current one and continues to replicate correctly.

(3) On the current main library, change master to points to the original main library and starts to replicate, so as to prepare for future failover.

(4) Check whether the current master-slave replication status is normal (masterha_check_repl--conf=/etc/mha/app1.cnf)

(5) Start the original main library keepalive process

(5) Ensure that the original main library configuration file (read_only set to 1, relay_log_purge=0)

(8) Modify the mha configuration file to add the original master database information

(9) Start the manager process to check whether the mha status is normal

10 Online Switching

In many cases, an existing primary server needs to be migrated to another server. For example, the main server hardware failure, the RAID control card needs to be rebuilt, the main server will be moved to a better performance server, and so on. Maintenance of the primary server results in performance degradation, resulting in downtime at least not being able to write data. In addition, blocking or killing the currently running session can lead to data inconsistencies between hosts. MHA provides fast switching and elegant resistance

Plug writing, the switching process only takes 0.5-2 seconds, during which the data can not be written. In many cases, blocking writes of 0.5-2s are acceptable. Therefore, switching primary servers does not require scheduling maintenance time windows.

The general process of MHA online switching:

(1) Detecting replication settings and determining the current primary server

(2) Identify a new primary server

(3) Blocking Writing to the Current Main Server

(4) Waiting for all slave servers to catch up with replication

(5) Grant write to the new primary server

(6) Reset the slave server

 

If I want to switch back from 10.192.203.101 to 10.192.203.201 now, it will be done online.

10.1 Stop MHA Monitoring

masterha_stop --conf=/etc/mha/app1.cnf

10.2 Online Switching

[root@pc2 bin]# masterha_master_switch--conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.192.203.201--new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

Tue Feb 14 02:44:20 2017 - [info]MHA::MasterRotate version 0.56.
Tue Feb 14 02:44:20 2017 - [info] Startingonline master switch..
Tue Feb 14 02:44:20 2017 - [info]
Tue Feb 14 02:44:20 2017 - [info] * Phase1: Configuration Check Phase..
Tue Feb 14 02:44:20 2017 - [info]
Tue Feb 14 02:44:20 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.
Tue Feb 14 02:44:20 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..
Tue Feb 14 02:44:20 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..
Tue Feb 14 02:44:20 2017 - [info]Multi-master configuration is detected. Current primary(writable) master is10.192.203.101(10.192.203.101:3306)
Tue Feb 14 02:44:20 2017 - [info] Masterconfigurations are as below:
Master 10.192.203.101(10.192.203.101:3306),replicating from 10.192.203.201(10.192.203.201:3306)
Master 10.192.203.201(10.192.203.201:3306),replicating from 10.192.203.101(10.192.203.101:3306), read-only
 
Tue Feb 14 02:44:20 2017 - [info] GTIDfailover mode = 0
Tue Feb 14 02:44:20 2017 - [info] CurrentAlive Master: 10.192.203.101(10.192.203.101:3306)
Tue Feb 14 02:44:20 2017 - [info] AliveSlaves:
Tue Feb 14 02:44:20 2017 - [info]   10.192.203.201(10.192.203.201:3306)  Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled
Tue Feb 14 02:44:20 2017 - [info]     Replicating from10.192.203.101(10.192.203.101:3306)
Tue Feb 14 02:44:20 2017 - [info]   10.192.203.102(10.192.203.102:3306)  Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled
Tue Feb 14 02:44:20 2017 - [info]     Replicating from10.192.203.101(10.192.203.101:3306)
 
It is better to execute FLUSHNO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to executeon 10.192.203.101(10.192.203.101:3306)? (YES/no): yes
Tue Feb 14 02:45:03 2017 - [info] ExecutingFLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Feb 14 02:45:03 2017 - [info]  ok.
Tue Feb 14 02:45:03 2017 - [info] CheckingMHA is not monitoring or doing failover..
Tue Feb 14 02:45:03 2017 - [info] Checkingreplication health on 10.192.203.201..
Tue Feb 14 02:45:03 2017 - [info]  ok.
Tue Feb 14 02:45:03 2017 - [info] Checkingreplication health on 10.192.203.102..
Tue Feb 14 02:45:03 2017 - [info]  ok.
Tue Feb 14 02:45:03 2017 - [info]10.192.203.201 can be new master.
Tue Feb 14 02:45:03 2017 - [info]
From:
10.192.203.101(10.192.203.101:3306)(current master)
 +--10.192.203.201(10.192.203.201:3306)
 +--10.192.203.102(10.192.203.102:3306)
 
To:
10.192.203.201(10.192.203.201:3306) (newmaster)
 +--10.192.203.102(10.192.203.102:3306)
 +--10.192.203.101(10.192.203.101:3306)
 
Starting master switch from10.192.203.101(10.192.203.101:3306) to 10.192.203.201(10.192.203.201:3306)?(yes/NO): yes
Tue Feb 14 02:45:09 2017 - [info] Checkingwhether 10.192.203.201(10.192.203.201:3306) is ok for the new master..
Tue Feb 14 02:45:09 2017 - [info]  ok.
Tue Feb 14 02:45:09 2017 - [info] ** Phase1: Configuration Check Phase completed.
Tue Feb 14 02:45:09 2017 - [info]
Tue Feb 14 02:45:09 2017 - [info] * Phase2: Rejecting updates Phase..
Tue Feb 14 02:45:09 2017 - [info]
Tue Feb 14 02:45:09 2017 - [info] Executingmaster ip online change script to disable write on the current master:
Tue Feb 14 02:45:09 2017 - [info]   /usr/bin/master_ip_online_change--command=stop --orig_master_host=10.192.203.101--orig_master_ip=10.192.203.101 --orig_master_port=3306--orig_master_user='mha' --orig_master_password='test'--new_master_host=10.192.203.201 --new_master_ip=10.192.203.201--new_master_port=3306 --new_master_user='mha' --new_master_password='test' --orig_master_ssh_user=root--new_master_ssh_user=root  --orig_master_is_new_slave
Tue Feb 14 02:45:09 2017 728241 Setread_only on the new master.. ok.
Tue Feb 14 02:45:09 2017 733969 Drpping appuser on the orig master..
Tue Feb 14 02:45:09 2017 735861 Set read_only=1on the orig master.. ok.
Tue Feb 14 02:45:09 2017 739394 Killing allapplication threads..
Tue Feb 14 02:45:09 2017 739431 done.
SIOCSIFFLAGS: Cannot assign requestedaddress
Tue Feb 14 02:45:09 2017 - [info]  ok.
Tue Feb 14 02:45:09 2017 - [info] Lockingall tables on the orig master to reject updates from everybody (includingroot):
Tue Feb 14 02:45:09 2017 - [info] ExecutingFLUSH TABLES WITH READ LOCK..
Tue Feb 14 02:45:09 2017 - [info]  ok.
Tue Feb 14 02:45:09 2017 - [info] Origmaster binlog:pos is single-mysql-bin.000009:997.
Tue Feb 14 02:45:09 2017 - [info]  Waiting to execute all relay logs on10.192.203.201(10.192.203.201:3306)..
Tue Feb 14 02:45:09 2017 - [info]  master_pos_wait(single-mysql-bin.000009:997)completed on 10.192.203.201(10.192.203.201:3306). Executed 0 events.
Tue Feb 14 02:45:09 2017 - [info]   done.
Tue Feb 14 02:45:09 2017 - [info] Gettingnew master's binlog name and position..
Tue Feb 14 02:45:09 2017 - [info]  single-mysql-bin.000022:107
Tue Feb 14 02:45:09 2017 - [info]  All other slaves should start replicationfrom here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.192.203.201',MASTER_PORT=3306, MASTER_LOG_FILE='single-mysql-bin.000022', MASTER_LOG_POS=107,MASTER_USER='rep_user', MASTER_PASSWORD='xxx';
Tue Feb 14 02:45:09 2017 - [info] Executingmaster ip online change script to allow write on the new master:
Tue Feb 14 02:45:09 2017 - [info]   /usr/bin/master_ip_online_change--command=start --orig_master_host=10.192.203.101 --orig_master_ip=10.192.203.101--orig_master_port=3306 --orig_master_user='mha' --orig_master_password='test'--new_master_host=10.192.203.201 --new_master_ip=10.192.203.201--new_master_port=3306 --new_master_user='mha' --new_master_password='test'--orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Tue Feb 14 02:45:09 2017 946501 Setread_only=0 on the new master.
Tue Feb 14 02:45:09 2017 947009 Creatingapp user on the new master..
Tue Feb 14 02:45:10 2017 - [info]  ok.
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info] *Switching slaves in parallel..
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info] -- Slaveswitch on host 10.192.203.102(10.192.203.102:3306) started, pid: 8353
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info] Logmessages from 10.192.203.102 ...
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info]  Waiting to execute all relay logs on10.192.203.102(10.192.203.102:3306)..
Tue Feb 14 02:45:10 2017 - [info]  master_pos_wait(single-mysql-bin.000009:997)completed on 10.192.203.102(10.192.203.102:3306). Executed 0 events.
Tue Feb 14 02:45:10 2017 - [info]   done.
Tue Feb 14 02:45:10 2017 - [info]  Resetting slave10.192.203.102(10.192.203.102:3306) and starting replication from the newmaster 10.192.203.201(10.192.203.201:3306)..
Tue Feb 14 02:45:10 2017 - [info]  Executed CHANGE MASTER.
Tue Feb 14 02:45:10 2017 - [info]  Slave started.
Tue Feb 14 02:45:10 2017 - [info] End oflog messages from 10.192.203.102 ...
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info] -- Slaveswitch on host 10.192.203.102(10.192.203.102:3306) succeeded.
Tue Feb 14 02:45:10 2017 - [info] Unlockingall tables on the orig master:
Tue Feb 14 02:45:10 2017 - [info] ExecutingUNLOCK TABLES..
Tue Feb 14 02:45:10 2017 - [info]  ok.
Tue Feb 14 02:45:10 2017 - [info] Startingorig master as a new slave..
Tue Feb 14 02:45:10 2017 - [info]  Resetting slave10.192.203.101(10.192.203.101:3306) and starting replication from the newmaster 10.192.203.201(10.192.203.201:3306)..
Tue Feb 14 02:45:10 2017 - [info]  Executed CHANGE MASTER.
Tue Feb 14 02:45:10 2017 - [info]  Slave started.
Tue Feb 14 02:45:10 2017 - [info] All newslave servers switched successfully.
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info] * Phase5: New master cleanup phase..
Tue Feb 14 02:45:10 2017 - [info]
Tue Feb 14 02:45:10 2017 - [info]  10.192.203.201: Resetting slave infosucceeded.
Tue Feb 14 02:45:10 2017 - [info] Switchingmaster to 10.192.203.201(10.192.203.201:3306) completed successfully.

You need to enter two yes manually in the middle.

 

10.3 Check if the switch is successful

Execute master ha_check_repl -- conf=/etc/mha/app1.cnf to verify that the current master-slave configuration is correct.

Log in to the slave libraries and verify that the replication now points to the new master library.

Now 10.192.203.101, 10.192.203.102 copies all point to 10.192.203.201.

No replication information was found on 10.192.203.201:

mysql> show slave status \G;

Empty set (0.00 sec)

 

ERROR:

No query specified

 

It is found that there are the following characteristics after online switching:

After the online switch is completed, other library replicates will point to the current main library. The original replicated information on the current main library has been cleared

(2) The information of the original main library in the mha configuration file has not been deleted.

(3) The original main library automatically becomes read-only, and now the main library automatically becomes writable.

(4) The vip of the original main library drifts automatically to the current main library.

10.4 Follow-up

(1) After the online switching is completed, the current main library still needs to change master to the original main library, and then start replication.

(2) Start the keepalive process of the original main library

(3) Start mha monitoring

 

I found that when the standby main library was down, if the main library was also down, then mha could not automatically switch to other slave s. Error log:

Wed Feb 15 21:42:04 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln492]  Server 10.192.203.101(10.192.203.101:3306) isdead, but must be alive! Check server settings.

Wed Feb 15 21:42:04 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line 268

 

This article refers to: http://www.cnblogs.com/xuanzhi201111/p/4231412.html

If there are any mistakes in the article, you are welcome to point out.

Posted by davidkierz on Fri, 29 Mar 2019 21:27:29 -0700