shell script - switch soft link file (nagios monitoring)

Keywords: Linux DNS network sudo

Task: define services in Nagios that monitor the state of three DCs (1. host state, 2. Consul cluster state, 3. Nomad cluster state). As soon as one of these checks fails, the Nagios event handler is triggered to change the link file on the DNS server, as shown in the figure above.

Note: the server addresses in the scripts differ from the real ones.

Script 1: this script checks the service status of the three DCs. Based on the results, it prints the name of the file that DNS should currently link to, which is shown in Nagios. If DNS is not linked to the correct file, Nagios raises an alert and triggers the event handler.

#!/bin/bash
#Detect DC host status, consul cluster status and nomad cluster status
DATE=`date +%Y%m%d%H%M%S`

#DC:US(tier1001 and tier1002)
#DC:EU(tier2001 and tier2002)
#DC:AS(tier3001 and tier3002)

#All DC -> axel-geo_us_eu_as.yml default
#DC-EU down -> axel-geo_us_as.yml  if DC-EU down
#DC-AS down -> axel-geo_us_eu.yml  if DC-AS down
#DC-US down -> axel-geo_eu_as.yml  if DC-US down

#detection dc(US) ping status     #Detect the host status of the three DCs with the Nagios check_ping plugin
PING_1001=`/usr/lib64/nagios/plugins/check_ping -4 -H tier1001 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`
PING_1002=`/usr/lib64/nagios/plugins/check_ping -4 -H tier1002 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`
#detection dc(EU) ping status
PING_2001=`/usr/lib64/nagios/plugins/check_ping -4 -H tier2001 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`
PING_2002=`/usr/lib64/nagios/plugins/check_ping -4 -H tier2002 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`
#detection dc(AS) ping status
PING_3001=`/usr/lib64/nagios/plugins/check_ping -4 -H tier3001 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`
PING_3002=`/usr/lib64/nagios/plugins/check_ping -4 -H tier3002 -w 3000.0,80% -c 5000.0,100% -p 5|awk '{print $2}'`

#detection dc(US) consul          #Detect the consul cluster status of the three DCs by calling a script on the remote host through nrpe
if /usr/lib64/nagios/plugins/check_nrpe -H tier1001.axel.network -c check_consul_cluster &>/dev/null ; then CON_US=0 ; else CON_US=1 ; fi
#detection dc(EU) consul
if /usr/lib64/nagios/plugins/check_nrpe -H tier2001.axel.network -c check_consul_cluster &>/dev/null ; then CON_EU=0 ; else CON_EU=1 ; fi
#detection dc(AS) consul
if /usr/lib64/nagios/plugins/check_nrpe -H tier3001.axel.network -c check_consul_cluster &>/dev/null ; then CON_AS=0 ; else CON_AS=1 ; fi

#detection dc(US) nomad        #Detect the nomad cluster status of the three DCs by calling a script on the remote host through nrpe
if /usr/lib64/nagios/plugins/check_nrpe -H tier1001.axel.network -c check_nomad_cluster &>/dev/null ; then NOM_US=0 ; else NOM_US=1 ; fi
#detection dc(EU) nomad
if /usr/lib64/nagios/plugins/check_nrpe -H tier2001.axel.network -c check_nomad_cluster &>/dev/null ; then NOM_EU=0 ; else NOM_EU=1 ; fi
#detection dc(AS) nomad
if /usr/lib64/nagios/plugins/check_nrpe -H tier3001.axel.network -c check_nomad_cluster &>/dev/null ; then NOM_AS=0 ; else NOM_AS=1 ; fi

#detection current linkfile         #Check the name of the file currently linked on the dns server
FILE=`/usr/lib64/nagios/plugins/check_nrpe -H romeo.zencoo.com -c check_pdns_link`
[ ! -n "$FILE" ] && {
echo '$FILE is NULL'
exit 1
}

#detection service function     #Evaluate the three checks of each DC. A DC's variable is set to 0 (for example, US=0) only if all of its checks are OK.
function service {
#detection ping 
[ "$PING_1001" == "OK" -a "$PING_1002" == "OK" ] && PING_US=0 || PING_US=1
[ "$PING_2001" == "OK" -a "$PING_2002" == "OK" ] && PING_EU=0 || PING_EU=1
[ "$PING_3001" == "OK" -a "$PING_3002" == "OK" ] && PING_AS=0 || PING_AS=1
#detection all status 
[ "$PING_US" -eq 0 ] && [ "$CON_US" -eq 0 ] && [ "$NOM_US" -eq 0 ] && US=0 || US=1
[ "$PING_EU" -eq 0 ] && [ "$CON_EU" -eq 0 ] && [ "$NOM_EU" -eq 0 ] && EU=0 || EU=1
[ "$PING_AS" -eq 0 ] && [ "$CON_AS" -eq 0 ] && [ "$NOM_AS" -eq 0 ] && AS=0 || AS=1
}

service

#Judge whether to switch the linked file. If necessary, the exit status code is 2. nagios will give an alarm and trigger the event handler.
if [ ${US} -eq 0 ] && [ ${EU} -eq 0 ] && [ ${AS} -eq 0 ] && [ "$FILE" == "axel-geo_us_eu_as.yml" ];then
   echo "all-DC-is ok,->already axel-geo_us_eu_as.yml";exit 0
elif [ ${US} -eq 0 ] && [ ${EU} -eq 0 ] && [ ${AS} -eq 0 ] && [ "$FILE" != "axel-geo_us_eu_as.yml" ];then
   echo "axel-geo_us_eu_as.yml";exit 2
elif [ ${US} -eq 1 -a "$FILE" != "axel-geo_eu_as.yml" ];then
   echo "axel-geo_eu_as.yml";exit 2
elif [ ${EU} -eq 1 -a "$FILE" != "axel-geo_us_as.yml" ];then
   echo "axel-geo_us_as.yml";exit 2
elif [ ${AS} -eq 1 -a "$FILE" != "axel-geo_us_eu.yml" ];then
   echo "axel-geo_us_eu.yml";exit 2
else
   echo "link file is ${FILE}"
   exit 0
fi
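
The decision at the end of Script 1 can also be written as a small pure function, which makes the mapping testable without contacting any host. The following is a minimal sketch, not the author's script: the function name `expected_link_file` is hypothetical, it uses the same convention as above (0 = DC healthy, 1 = DC down), and it assumes that only one DC is down at a time, since the post defines no link file for two DCs being down.

```shell
#!/bin/bash
# Hypothetical helper: map the health of US, EU and AS (0 = ok, 1 = down)
# to the link file that DNS should point to.
expected_link_file() {
    local us=$1 eu=$2 as=$3
    case "${us}${eu}${as}" in
        000) echo "axel-geo_us_eu_as.yml" ;;  # all DCs healthy
        100) echo "axel-geo_eu_as.yml" ;;     # US down
        010) echo "axel-geo_us_as.yml" ;;     # EU down
        001) echo "axel-geo_us_eu.yml" ;;     # AS down
        *)   return 1 ;;                      # two or more DCs down: no file defined
    esac
}

# Example: EU is down
expected_link_file 0 1 0
```

Returning an error for the multi-DC case is a design choice of this sketch. In the `elif` chain above, two DCs being down at once matches whichever branch comes first, so the reported file name can alternate between checks.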

Script 2: the event handler script that Nagios triggers

#!/bin/bash
#check_service_status.sh detects the host, consul and nomad status of all DCs.
#It returns a file name ($2, one of the following four)
#All DC -> axel-geo_us_eu_as.yml default
#DC-EU down -> axel-geo_us_as.yml  if DC-EU down
#DC-AS down -> axel-geo_us_eu.yml  if DC-AS down
#DC-US down -> axel-geo_eu_as.yml  if DC-US down

WORKDIR=/usr/lib64/nagios/plugins
DATE=`date +%Y%m%d%H%M%S`
LOG=/tmp/.dns_linkfile
exec &>>${LOG}

case $1 in     #$1 is the service state reported by Nagios (for example OK or CRITICAL)
OK)
   #correct link file
   exit 0
   ;;
CRITICAL)    #$2 is the information displayed on nagios, that is, the file name, and then nrpe calls the script on the dns server to change the linked file.
   #need to switch link file
   case $2 in
     axel-geo_us_eu_as.yml)
          #DC-EU,DC-AS,DC-US state ok,linkfile->axel-geo_us_eu_as.yml
          REMOTE_CMD=update_us_eu_as
       ;;  
     axel-geo_us_as.yml)
          #DC-EU down,linkfile->axel-geo_us_as.yml
          REMOTE_CMD=update_us_as
       ;;  
     axel-geo_us_eu.yml)
          #DC-AS down, linkfile->axel-geo_us_eu.yml
          REMOTE_CMD=update_us_eu
       ;;  
     axel-geo_eu_as.yml)
          #DC-US down, linkfile->axel-geo_eu_as.yml
          REMOTE_CMD=update_eu_as
       ;;
                      *)
          #default output
          echo "${DATE}--warning,no file match"
          exit 1 
       ;;
     esac
          echo "${DATE}--${WORKDIR}/check_nrpe -H {ns1,ns2}.zencoo.com -c ${REMOTE_CMD}"
          ${WORKDIR}/check_nrpe -H DNS1 -c ${REMOTE_CMD}          
          ${WORKDIR}/check_nrpe -H DNS2 -c ${REMOTE_CMD}
   ;;
esac
exit 0
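
The inner `case` in Script 2 maps each file name to an NRPE command whose name follows a fixed pattern. As a sketch only, the same mapping could be derived with bash parameter expansion. The function name `remote_cmd_for` is hypothetical; the pattern `axel-geo_<regions>.yml` to `update_<regions>` is taken from the four cases above.

```shell
#!/bin/bash
# Hypothetical helper: derive the NRPE command name from a link file name,
# e.g. axel-geo_us_eu.yml -> update_us_eu. Unknown names are rejected.
remote_cmd_for() {
    local file=$1 regions
    case "$file" in
        axel-geo_us_eu_as.yml|axel-geo_us_as.yml|axel-geo_us_eu.yml|axel-geo_eu_as.yml)
            regions=${file#axel-geo_}   # strip the prefix -> us_eu.yml
            regions=${regions%.yml}     # strip the suffix -> us_eu
            echo "update_${regions}"
            ;;
        *)
            return 1                    # not one of the four known files
            ;;
    esac
}

# Example: the file for "AS down"
remote_cmd_for "axel-geo_us_eu.yml"
```

The explicit list of accepted names is kept on purpose, so that unexpected service output is never turned into an NRPE command.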

Script 3: change the link file on the DNS server

#!/bin/bash
#The script is called in the check_dc_status and change_dns_linkfile scripts
LOG=/tmp/.dns_linkfile
DATE=`date +%Y%m%d%H%M%S`
DIR=/etc/pdns
LN=axel-geo.yml
FILE="`ls -l ${DIR}/${LN} | sed -n '/^l/p'|sed 's/.*-> //g'`"

#$1 is the parameter passed by check_dc_status and change_dns_linkfile
case $1 in     #Both scripts call this script through nrpe.
check)
   FILE="`ls -l ${DIR}/${LN} | sed -n '/^l/p'|sed 's/.*-> //g'`"
   echo "$FILE" 
   exit 0
   ;;
us_eu_as)
   TAGETFILE="${DIR}/axel-geo_us_eu_as.yml"
   ;;
us_as)
   TAGETFILE="${DIR}/axel-geo_us_as.yml"
   ;;
us_eu)
   TAGETFILE="${DIR}/axel-geo_us_eu.yml"
   ;;
eu_as)
   TAGETFILE="${DIR}/axel-geo_eu_as.yml"
   ;;
*)
   echo '$1 error' >>${LOG}
   exit 1
   ;;
esac

if [ ! -f ${TAGETFILE} ];then
 # double quotes so that the variables are expanded in the log line
 echo "${TAGETFILE} does not exist/${DATE}" >>${LOG}
 exit 1
elif  [ "$FILE" == "$TAGETFILE" ];then
 echo "${DATE}-Link file is correct, no need to switch" >>${LOG}
 exit 0
else
 echo "${HOSTNAME}/${DATE} ln -snf $TAGETFILE ${DIR}/${LN}" >>${LOG}
 sudo /usr/bin/ln -snf $TAGETFILE ${DIR}/${LN}
 sudo /bin/pdns_control reload && echo "${DATE}-reload dns ok" >>${LOG} || echo "${DATE}-reload dns failed" >>${LOG}
 exit 0
fi
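
Script 3 finds the current link target by parsing `ls -l` output with `sed`. A possible alternative, sketched here under the assumption that the coreutils `readlink` command is available on the DNS server, reads the target directly. The function name `current_link_target` and the temporary directory in the example are illustrative and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print the target of a symlink, or fail if the
# path is not a symlink.
current_link_target() {
    local link=$1
    [ -L "$link" ] || return 1
    readlink "$link"
}

# Example with a throwaway directory instead of /etc/pdns
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
touch "${tmpdir}/axel-geo_us_eu.yml"
ln -snf "${tmpdir}/axel-geo_us_eu.yml" "${tmpdir}/axel-geo.yml"
current_link_target "${tmpdir}/axel-geo.yml"
```

Because the link is created from an absolute path, the printed target is absolute as well. If the bare file name is needed, as in the comparisons in Script 1, `basename` can reduce it.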

Nagios configuration: define the check service and the event handler

define service{
        use                             generic-service
        host_name                         xxx
        service_description             check_dc_status
        contact_groups                  admins,admins_jabber
        check_command                   check_nrpe_t60!check_dc_status   ; Call the script that detects service status (script 1)
        event_handler                   change_dns_linkfile              ; Call the event handler command
        }

define command {
        command_name    change_dns_linkfile          ; $SERVICESTATE$ and $SERVICEOUTPUT$ correspond to $1 and $2 in script 2
        command_line    $USER1$/eventhandlers/change_dns_linkfile $SERVICESTATE$ $SERVICEOUTPUT$     
        }
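
Nagios also runs event handlers while a service is still in a SOFT problem state, that is, during check retries. If switching DNS after a single failed check is undesirable, the handler could additionally receive the `$SERVICESTATETYPE$` macro and act only on HARD states. This is a sketch of that idea and not part of the original setup: the function name `should_switch` is hypothetical, and `$SERVICESTATETYPE$` would have to be added to `command_line` as an extra argument.

```shell
#!/bin/bash
# Hypothetical guard: succeed only for a CRITICAL service in a HARD state.
# $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$ (assumed extra argument)
should_switch() {
    [ "$1" = "CRITICAL" ] && [ "$2" = "HARD" ]
}

# Example: a CRITICAL result that is still being retried
if should_switch "CRITICAL" "SOFT"; then
    echo "switch"
else
    echo "wait"
fi
```

The trade-off is a slower reaction: the link file changes only after all retries configured for the service have failed.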

Puppet configuration: scripts 1 and 2 call script 3 through NRPE, so the corresponding commands and parameters must be defined.

<% if @fqdn == 'dns1xxxx' or @fqdn == 'dns2xxxx' -%>
command[check_pdns_link]=<%= @pluginsdir %>/dns_file_check.sh check                    
command[update_us_eu_as]=<%= @pluginsdir %>/dns_file_check.sh us_eu_as
command[update_us_eu]=<%= @pluginsdir %>/dns_file_check.sh us_eu
command[update_us_as]=<%= @pluginsdir %>/dns_file_check.sh us_as
command[update_eu_as]=<%= @pluginsdir %>/dns_file_check.sh eu_as
<% end -%>

This was my first time working with Nagios event handlers, so the setup is rather messy and the scripts still need improvement.

Posted by OsvaldoM on Sat, 02 Nov 2019 04:26:20 -0700