Task: define services in Nagios that monitor the state of three DCs (1. host state, 2. Consul cluster state, 3. Nomad cluster state). As soon as any one of these checks fails, the Nagios event handler is triggered to change the link file on the DNS server, as shown in the figure above.
Scripts: the server addresses in the scripts differ from the actual ones.
Script 1: this script checks the service status of the three DCs. Based on the results, it outputs the name of the file that DNS should currently link to, which is displayed in Nagios. If DNS is not linked to the correct file, the script exits with status 2, so Nagios raises an alert and triggers the event handler.
#!/bin/bash
# Detect DC host status, consul cluster status and nomad cluster status
DATE=$(date +%Y%m%d%H%M%S)
# DC:US (tier1001 and tier1002)
# DC:EU (tier2001 and tier2002)
# DC:AS (tier3001 and tier3002)
# All DCs up -> axel-geo_us_eu_as.yml (default)
# DC-EU down -> axel-geo_us_as.yml
# DC-AS down -> axel-geo_us_eu.yml
# DC-US down -> axel-geo_eu_as.yml

# Detect the host status of the three DCs through the nagios plug-in check_ping
# DC(US) ping status
PING_1001=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier1001 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')
PING_1002=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier1002 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')
# DC(EU) ping status
PING_2001=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier2001 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')
PING_2002=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier2002 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')
# DC(AS) ping status
PING_3001=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier3001 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')
PING_3002=$(/usr/lib64/nagios/plugins/check_ping -4 -H tier3002 -w 3000.0,80% -c 5000.0,100% -p 5 | awk '{print $2}')

# Detect the consul cluster status of the three DCs by calling the script on the remote host through nrpe
# DC(US) consul
if /usr/lib64/nagios/plugins/check_nrpe -H tier1001.axel.network -c check_consul_cluster &>/dev/null; then CON_US=0; else CON_US=1; fi
# DC(EU) consul
if /usr/lib64/nagios/plugins/check_nrpe -H tier2001.axel.network -c check_consul_cluster &>/dev/null; then CON_EU=0; else CON_EU=1; fi
# DC(AS) consul
if /usr/lib64/nagios/plugins/check_nrpe -H tier3001.axel.network -c check_consul_cluster &>/dev/null; then CON_AS=0; else CON_AS=1; fi

# Detect the nomad cluster status of the three DCs by calling the script on the remote host through nrpe
# DC(US) nomad
if /usr/lib64/nagios/plugins/check_nrpe -H tier1001.axel.network -c check_nomad_cluster &>/dev/null; then NOM_US=0; else NOM_US=1; fi
# DC(EU) nomad
if /usr/lib64/nagios/plugins/check_nrpe -H tier2001.axel.network -c check_nomad_cluster &>/dev/null; then NOM_EU=0; else NOM_EU=1; fi
# DC(AS) nomad
if /usr/lib64/nagios/plugins/check_nrpe -H tier3001.axel.network -c check_nomad_cluster &>/dev/null; then NOM_AS=0; else NOM_AS=1; fi

# Check the name of the file currently linked on the dns server
FILE=$(/usr/lib64/nagios/plugins/check_nrpe -H romeo.zencoo.com -c check_pdns_link)
[ -z "$FILE" ] && {
    echo '$FILE is NULL'
    exit 1
}

# Judge the three services of each DC. A DC's variable is set to 0
# (for example, US=0) only if all of its services are healthy.
function service {
    # ping status per DC: both hosts must answer
    [ "$PING_1001" == "OK" -a "$PING_1002" == "OK" ] && PING_US=0 || PING_US=1
    [ "$PING_2001" == "OK" -a "$PING_2002" == "OK" ] && PING_EU=0 || PING_EU=1
    [ "$PING_3001" == "OK" -a "$PING_3002" == "OK" ] && PING_AS=0 || PING_AS=1
    # overall status per DC: ping, consul and nomad must all be healthy
    [ "$PING_US" -eq 0 ] && [ "$CON_US" -eq 0 ] && [ "$NOM_US" -eq 0 ] && US=0 || US=1
    [ "$PING_EU" -eq 0 ] && [ "$CON_EU" -eq 0 ] && [ "$NOM_EU" -eq 0 ] && EU=0 || EU=1
    [ "$PING_AS" -eq 0 ] && [ "$CON_AS" -eq 0 ] && [ "$NOM_AS" -eq 0 ] && AS=0 || AS=1
}
service

# Judge whether the linked file has to be switched. If so, exit with status 2;
# nagios will then raise an alarm and trigger the event handler.
if [ ${US} -eq 0 ] && [ ${EU} -eq 0 ] && [ ${AS} -eq 0 ] && [ "$FILE" == "axel-geo_us_eu_as.yml" ]; then
    echo "all-DC-is ok,->already axel-geo_us_eu_as.yml"; exit 0
elif [ ${US} -eq 0 ] && [ ${EU} -eq 0 ] && [ ${AS} -eq 0 ] && [ "$FILE" != "axel-geo_us_eu_as.yml" ]; then
    echo "axel-geo_us_eu_as.yml"; exit 2
elif [ ${US} -eq 1 -a "$FILE" != "axel-geo_eu_as.yml" ]; then
    echo "axel-geo_eu_as.yml"; exit 2
elif [ ${EU} -eq 1 -a "$FILE" != "axel-geo_us_as.yml" ]; then
    echo "axel-geo_us_as.yml"; exit 2
elif [ ${AS} -eq 1 -a "$FILE" != "axel-geo_us_eu.yml" ]; then
    echo "axel-geo_us_eu.yml"; exit 2
else
    echo "link file is ${FILE}"
    exit 0
fi
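The decision at the end of script 1 can be easier to follow when it is separated from the checks themselves. The following is only a minimal sketch, assuming at most one DC is down at a time; the function names pick_linkfile and link_status are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): map the per-DC
# status flags (0 = healthy, 1 = down) to the link file DNS should use.
# Assumes at most one DC is down; US is checked first, then EU, then AS.
pick_linkfile() {
    local us=$1 eu=$2 as=$3
    if [ "$us" -eq 1 ]; then
        echo "axel-geo_eu_as.yml"
    elif [ "$eu" -eq 1 ]; then
        echo "axel-geo_us_as.yml"
    elif [ "$as" -eq 1 ]; then
        echo "axel-geo_us_eu.yml"
    else
        echo "axel-geo_us_eu_as.yml"
    fi
}

# Hypothetical helper: compare the wanted file with the current link.
# Returns 0 (Nagios OK) if they match, otherwise prints the wanted file
# name and returns 2 (Nagios CRITICAL) so the event handler can act on it.
link_status() {
    local wanted=$1 current=$2
    if [ "$wanted" = "$current" ]; then
        echo "link file is ${current}"
        return 0
    fi
    echo "$wanted"
    return 2
}
```

With these two helpers, the tail of the script would reduce to something like `link_status "$(pick_linkfile "$US" "$EU" "$AS")" "$FILE"; exit $?`.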
Script 2: the event handler script that Nagios triggers
#!/bin/bash
# check_service_status.sh detects the host status, consul status and nomad status of all DCs.
# It returns a file name ($2 is one of the following four):
# All DCs up -> axel-geo_us_eu_as.yml (default)
# DC-EU down -> axel-geo_us_as.yml
# DC-AS down -> axel-geo_us_eu.yml
# DC-US down -> axel-geo_eu_as.yml
WORKDIR=/usr/lib64/nagios/plugins
DATE=$(date +%Y%m%d%H%M%S)
LOG=/tmp/.dns_linkfile
# Append all output of this script to the log file
exec &>>${LOG}

# $1 is the state of the service as reported by nagios
case $1 in
OK)
    # correct link file, nothing to do
    exit 0
    ;;
CRITICAL)
    # The link file needs to be switched. $2 is the output displayed on nagios,
    # that is, the file name; nrpe then calls the script on the dns servers
    # to change the linked file.
    case $2 in
    axel-geo_us_eu_as.yml)
        # DC-EU, DC-AS, DC-US state ok, linkfile -> axel-geo_us_eu_as.yml
        REMOTE_CMD=update_us_eu_as
        ;;
    axel-geo_us_as.yml)
        # DC-EU down, linkfile -> axel-geo_us_as.yml
        REMOTE_CMD=update_us_as
        ;;
    axel-geo_us_eu.yml)
        # DC-AS down, linkfile -> axel-geo_us_eu.yml
        REMOTE_CMD=update_us_eu
        ;;
    axel-geo_eu_as.yml)
        # DC-US down, linkfile -> axel-geo_eu_as.yml
        REMOTE_CMD=update_eu_as
        ;;
    *)
        # default output
        echo "${DATE}--warning,no file match"
        exit 1
        ;;
    esac
    echo "${DATE}--${WORKDIR}/check_nrpe -H {ns1,ns2}.zencoo.com -c ${REMOTE_CMD}"
    ${WORKDIR}/check_nrpe -H DNS1 -c ${REMOTE_CMD}
    ${WORKDIR}/check_nrpe -H DNS2 -c ${REMOTE_CMD}
    ;;
esac
exit 0
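Because the four NRPE command names follow the file names one to one, the inner case statement could also be derived with parameter expansion. This is only a sketch of that idea; the function name remote_cmd_for is hypothetical, and the list of accepted files is taken from the comments in the script.

```shell
#!/bin/bash
# Hypothetical helper: derive the NRPE command name from the link file name,
# e.g. axel-geo_us_as.yml -> update_us_as. Prints nothing and returns 1 for
# any name outside the four known files.
remote_cmd_for() {
    local file=$1 suffix
    case $file in
    axel-geo_us_eu_as.yml|axel-geo_us_as.yml|axel-geo_us_eu.yml|axel-geo_eu_as.yml)
        suffix=${file#axel-geo_}   # strip the common prefix
        suffix=${suffix%.yml}      # strip the extension
        echo "update_${suffix}"
        ;;
    *)
        return 1
        ;;
    esac
}
```

Keeping an explicit whitelist matters here, since the file name comes from the plugin output and is passed on to a remote command.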
Script 3: change the link file on the DNS server
#!/bin/bash
# This script is called by the check_dc_status and change_dns_linkfile scripts
LOG=/tmp/.dns_linkfile
DATE=$(date +%Y%m%d%H%M%S)
DIR=/etc/pdns
LN=axel-geo.yml
# Current target of the symlink (empty if ${DIR}/${LN} is not a symlink)
FILE="$(ls -l ${DIR}/${LN} | sed -n '/^l/p' | sed 's/.*-> //g')"

# The first two scripts call this script through nrpe; $1 is the parameter they pass in.
case $1 in
check)
    echo "$FILE"
    exit 0
    ;;
us_eu_as)
    TAGETFILE="${DIR}/axel-geo_us_eu_as.yml"
    ;;
us_as)
    TAGETFILE="${DIR}/axel-geo_us_as.yml"
    ;;
us_eu)
    TAGETFILE="${DIR}/axel-geo_us_eu.yml"
    ;;
eu_as)
    TAGETFILE="${DIR}/axel-geo_eu_as.yml"
    ;;
*)
    echo "${DATE}-invalid argument: $1" >>${LOG}
    exit 1
    ;;
esac

if [ ! -f "${TAGETFILE}" ]; then
    # double quotes so that the variables are expanded in the log message
    echo "${TAGETFILE} does not exist/${DATE}" >>${LOG}
    exit 1
elif [ "$FILE" == "$TAGETFILE" ]; then
    echo "${DATE}-Link file is correct, no need to switch" >>${LOG}
    exit 0
else
    echo "${HOSTNAME}/${DATE} ln -snf $TAGETFILE ${DIR}/${LN}" >>${LOG}
    sudo /usr/bin/ln -snf "$TAGETFILE" "${DIR}/${LN}"
    sudo /bin/pdns_control reload && echo "${DATE}-reload dns ok" >>${LOG} || echo "${DATE}-reload dns failed" >>${LOG}
    exit 0
fi
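Script 3 reads the link target by parsing the output of ls -l. A possibly more robust alternative is readlink, which prints the target stored in the symlink directly. The sketch below assumes readlink (coreutils) is available on the DNS servers; the function name link_target is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the target of a symlink exactly as it is stored,
# or return 1 without output if the path is not a symlink.
link_target() {
    local path=$1
    [ -L "$path" ] || return 1
    readlink "$path"
}
```

In script 3 this would be used as `FILE="$(link_target "${DIR}/${LN}")"`. Note that the value is the path that was given to ln, so a comparison against a bare file name and a comparison against a full path will not both match the same link.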
Nagios configuration: define the detection service and the event handler
define service{
    use                     generic-service
    host_name               xxx
    service_description     check_dc_status
    contact_groups          admins,admins_jabber
    check_command           check_nrpe_t60!check_dc_status  ; Call script to detect service status (script 1)
    event_handler           change_dns_linkfile             ; Call the event handler command
}
define command {
    command_name    change_dns_linkfile
    ; $SERVICESTATE$ and $SERVICEOUTPUT$ correspond to $1 and $2 in script 2
    command_line    $USER1$/eventhandlers/change_dns_linkfile $SERVICESTATE$ $SERVICEOUTPUT$
}
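Nagios runs event handlers on soft state changes as well as hard ones, so the handler may fire before the retries of the check are exhausted. One option is to also pass the $SERVICESTATETYPE$ macro and act only on hard states. The sketch below assumes the command_line is extended to `$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEOUTPUT$`, which is not how the configuration above is written; the function name should_switch is hypothetical.

```shell
#!/bin/bash
# Hypothetical gate for the event handler: succeed only when the service is
# CRITICAL and the state is HARD, i.e. all check retries have been used up.
# $1 = $SERVICESTATE$      (OK, WARNING, UNKNOWN or CRITICAL)
# $2 = $SERVICESTATETYPE$  (SOFT or HARD)
should_switch() {
    local state=$1 state_type=$2
    [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]
}
```

In the handler this would be used as `should_switch "$1" "$2" || exit 0`, with the file name then arriving in $3 instead of $2.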
Puppet configuration: scripts 1 and 2 call script 3 through NRPE, so the corresponding commands and parameters need to be defined.
<% if @fqdn == 'dns1xxxx' or @fqdn == 'dns2xxxx' -%>
command[check_pdns_link]=<%= @pluginsdir %>/dns_file_check.sh check
command[update_us_eu_as]=<%= @pluginsdir %>/dns_file_check.sh us_eu_as
command[update_us_eu]=<%= @pluginsdir %>/dns_file_check.sh us_eu
command[update_us_as]=<%= @pluginsdir %>/dns_file_check.sh us_as
command[update_eu_as]=<%= @pluginsdir %>/dns_file_check.sh eu_as
<% end -%>
This was my first time working with Nagios event handlers, so the setup is rather messy and the scripts still need improvement.