CentOS 7.3: disk sharing based on iscsi+pacemaker+corosync+gfs2+clvmd

Requirement

One of our services needs to access its data by mounting a data directory that lives on shared storage, so that a multi-node cluster can provide high availability.

In a previous blog post, pacemaker+drbd dual master, we implemented this with drbd+gfs2. Is there a simpler way?

This article introduces iscsi+pacemaker+corosync+gfs2+clvmd as a way to share storage between nodes.

As we know, multiple nodes can attach the same iSCSI storage at the same time. A logical volume created with LVM on one node can be read by the other nodes, but changes are not shared between them correctly; to see them you have to run commands such as:

#Activate the volume
lvchange -ay /dev/vg/lv1
#Refresh the volume so the node sees the latest metadata
lvchange --refresh /dev/vg/lv1

This is unusable in practice. How do we solve it?
Answer: use GFS2 + clvmd.

GFS2 is a native file system that interfaces directly with the VFS layer of the Linux kernel. GFS2 uses distributed metadata and multiple journals to optimize cluster operation. To maintain file-system integrity, GFS2 uses a Distributed Lock Manager (DLM) to coordinate I/O: when a node operates on the file system, it takes the corresponding locks. When a node modifies data on a GFS2 file system, the change is immediately visible to the other cluster nodes using that file system.

CLVM (Cluster Logical Volume Manager) is a set of clustering extensions to LVM. These extensions allow a cluster of computers to manage shared storage (for example, on a SAN) with LVM; when an LVM operation is performed on the shared storage on one node, the other nodes are notified through the high-availability components.
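Once the full stack below is in place, a quick way to confirm that a volume group really carries the clustered attribute is to inspect its attribute flags; a minimal sketch, using the cluster_vg volume group that is created later in this article:

#A trailing 'c' in the Attr column marks a clustered VG
vgs -o vg_name,vg_attr cluster_vg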

Environment preparation

host name   ip             Remarks
node1       192.168.3.54   VMware virtual machine, CentOS 7.3, ap1
node2       192.168.3.55   VMware virtual machine, CentOS 7.3, ap2
iscsi       192.168.3.29   storage

Configuration

I. Cluster initialization
1. Configure the host names and IP addresses according to the table above, and disable SELinux and the firewall.
If the firewall is not disabled, it is recommended to open the relevant ports instead (see the sketch below).
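A minimal sketch of what this could look like, assuming a stock CentOS 7.3 install with firewalld and SELinux enabled; the built-in high-availability firewalld service covers the ports used by pcsd, corosync and pacemaker:

#Disable SELinux permanently (takes effect after reboot) and for the current boot
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0

#Alternatively, keep firewalld running and open the cluster-related ports
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload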
2. Install components on all ap nodes

#Installing pcs will also pull in pacemaker, corosync, etc.
yum install pcs -y

3. Start pcsd on each node

systemctl enable pcsd.service
systemctl start pcsd.service

4. Authentication between nodes in the cluster

#Set the password for the hacluster account on both nodes; the passwords must be the same
passwd hacluster

#Configure authentication between the nodes (the output below shows a successful authentication)
pcs cluster auth node1 node2        #followed by the host names

[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster
Password: 
node1: Authorized
node2: Authorized

5. Create the cluster

#Create a cluster consisting of node1 and node2; this only needs to be executed on one node
pcs cluster setup --name testcluster node1 node2

When the above command runs, pcs generates corosync.conf and modifies the cib.xml file. corosync.conf is the configuration file of corosync and cib.xml is the configuration file of pacemaker. These two files are the core configuration of the cluster; it is recommended to back them up so they can be restored if the system ever has to be reinstalled.
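A minimal backup sketch (the destination directory is only illustrative); pcs cluster cib dumps the live CIB to stdout:

mkdir -p /root/cluster-backup
cp /etc/corosync/corosync.conf /root/cluster-backup/
pcs cluster cib > /root/cluster-backup/cib.xml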

6. Start the cluster

#Only needs to be executed on one node
pcs cluster start --all     #Start the cluster
pcs cluster enable --all    #Enable cluster autostart
pcs cluster status      #View cluster status

II. Configuration of fence
See Chapter 5, Configuring a GFS2 File System in a Cluster.
Although the cluster has been initialized, fencing must be configured before the GFS2 file system. Since our virtual machines run on ESXi 5.5, we use fence-agents-vmware-soap as the fence agent for them.
1. Install components

#On each node
yum install fence-agents-vmware-soap

[root@node1 ~]# fence_vmware_soap -h
Usage:
    fence_vmware_soap [options]
Options:
   -a, --ip=[ip]                  IP address or hostname of fencing device
   -l, --username=[name]          Login name
   -p, --password=[password]      Login password or passphrase
   -z, --ssl                      Use ssl connection
   -t, --notls                    Disable TLS negotiation and force SSL3.0.
                                        This should only be used for devices that do not support TLS1.0 and up.
   -n, --plug=[id]                Physical plug number on device, UUID or
                                        identification of machine
   -u, --ipport=[port]            TCP/UDP port to use
                                        (default 80, 443 if --ssl option is used)
   -4, --inet4-only               Forces agent to use IPv4 addresses only
   -6, --inet6-only               Forces agent to use IPv6 addresses only
   -S, --password-script=[script] Script to run to retrieve password
   --ssl-secure                   Use ssl connection with verifying certificate
   --ssl-insecure                 Use ssl connection without verifying certificate
   -o, --action=[action]          Action: status, reboot (default), off or on
   -v, --verbose                  Verbose mode
   -D, --debug-file=[debugfile]   Debugging to output file
   -V, --version                  Output version information and exit
   -h, --help                     Display this help and exit
   -C, --separator=[char]         Separator for CSV created by 'list' operation
   --power-timeout=[seconds]      Test X seconds for status change after ON/OFF
   --shell-timeout=[seconds]      Wait X seconds for cmd prompt after issuing command
   --login-timeout=[seconds]      Wait X seconds for cmd prompt after login
   --power-wait=[seconds]         Wait X seconds after issuing ON/OFF
   --delay=[seconds]              Wait X seconds before fencing is started
   --retry-on=[attempts]          Count of attempts to retry power on

2. View devices

[root@node1 ~]# fence_vmware_soap -z -l test@vsphere.local -p test -a 192.168.3.216 -o list --ssl-insecure
ap1,4214b3a6-68ed-42e9-f65d-6f932a185a55
ap2,4214d57d-766e-2e12-d372-d3b06e874dda

192.168.3.216 is the ESXi host or vCenter that the two virtual machines run on.
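Before wiring the agent into the cluster, it can be worth checking the power state of a single VM with the same agent; a sketch using the ap1 UUID returned by the list operation above:

fence_vmware_soap -z --ssl-insecure -a 192.168.3.216 -l test@vsphere.local -p test -n 4214b3a6-68ed-42e9-f65d-6f932a185a55 -o status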

3. Configuring fence

#Only needs to be executed on one node
#View the configurable parameters of fence_vmware_soap
pcs stonith describe fence_vmware_soap
#Configure fence
pcs stonith create fence_vmware fence_vmware_soap  ipaddr=192.168.3.216 ipport=443 ssl_insecure=1 inet4_only=1 login="test@vsphere.local" passwd="test"  pcmk_host_map="ap1:4214b3a6-68ed-42e9-f65d-6f932a185a55;ap2:4214d57d-766e-2e12-d372-d3b06e874dda" pcmk_host_list="node1,node2" pcmk_host_check=static-list pcmk_monitor_timeout=60s pcmk_reboot_action=reboot power_wait=3 op monitor interval=60s

4. View fence

[root@node1 ~]# pcs stonith show --full
 Resource: fence_vmware (class=stonith type=fence_vmware_soap)
  Attributes: inet4_only=1 ipaddr=192.168.3.216 ipport=443 login=test@vsphere.local passwd=test pcmk_host_check=static-list pcmk_host_list=node1,node2 pcmk_host_map=ap1:4214b3a6-68ed-42e9-f65d-6f932a185a55;ap2:4214d57d-766e-2e12-d372-d3b06e874dda pcmk_monitor_timeout=60s power_wait=3 ssl_insecure=1
  Operations: monitor interval=60s (fence_vmware-monitor-interval-60s)

#View cluster status
[root@node1 ~]# pcs status
Cluster name: testcluster
Stack: corosync
Current DC: node1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sun Jul 29 00:40:15 2018
Last change: Sun Jul 29 00:37:40 2018 by root via cibadmin on node1

2 nodes configured
7 resources configured

Online: [ node1 node2 ]

Full list of resources:

 fence_vmware   (stonith:fence_vmware_soap):    Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
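Optionally, before moving on to GFS2, you can verify that fencing really works by fencing one node on purpose (the node will be rebooted, so only do this while the cluster carries no workload):

pcs stonith fence node2     #node2 should be power-cycled by the ESXi host and rejoin the cluster after boot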

III. Configuration of gfs
1. Install components

#On each node
yum install lvm2-cluster gfs2-utils

2. Set no-quorum-policy

#Only needs to be executed on one node
pcs property set no-quorum-policy=freeze

By default, no-quorum-policy is set to stop, which means that as soon as quorum is lost, all resources on the remaining partition are stopped immediately. This default is usually the safest and best option, but unlike most resources, GFS2 requires quorum to function. When quorum is lost, neither the applications using the GFS2 mounts nor the GFS2 mounts themselves can be stopped correctly. Any attempt to stop these resources without quorum will fail, which eventually causes the whole cluster to be fenced every time quorum is lost.
To address this, set no-quorum-policy=freeze when using GFS2. This means that when quorum is lost, the remaining partition does nothing until quorum is regained.
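You can confirm that the property took effect by listing the cluster properties:

pcs property list       #no-quorum-policy should now be shown as freeze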
3. Add dlm resources

#Only needs to be executed on one node
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true

4. Enable cluster locking

#On each node
/sbin/lvmconf --enable-cluster
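This switches LVM to cluster-wide locking; a quick sanity check of what it changed in /etc/lvm/lvm.conf:

grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf      #should now be locking_type = 3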

5. Add clvm resources

#Only needs to be executed on one node
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true

Note that the clvmd and cmirrord daemons are started and managed by Pacemaker through the ocf:heartbeat:clvm resource agent, and must not be started via systemd. In addition, as part of its startup, the ocf:heartbeat:clvm resource agent sets the locking_type parameter in /etc/lvm/lvm.conf to 3 and disables the lvmetad daemon.

Caution:
You may see errors like the following:

connect() failed on local socket: No such file or directory
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group VGgfs00

This happens because clvmd failed to start; check whether your fencing setup has a problem. In my case, clvmd did not start because fencing had not been configured, or had been configured incorrectly.
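A few commands that can help narrow this down (a sketch):

pcs status --full           #are dlm-clone and clvmd-clone started on both nodes?
pcs stonith show --full     #is the fence device configured and running?
journalctl -u pacemaker -b  #look for controld/clvmd errors in the Pacemaker logs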

6. Create the dependency and startup order between dlm and clvmd

#Only needs to be executed on one node
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone

7. Attach the iSCSI LUN and create the GFS2 file system

#Only needs to be executed on one node
#The iSCSI LUN shows up as /dev/sdb (see the iscsiadm sketch after this block)
pvcreate /dev/sdb
vgcreate -Ay -cy cluster_vg /dev/sdb
lvcreate -l 100%FREE -n cluster_lv cluster_vg
mkfs.gfs2 -j2 -p lock_dlm -t testcluster:test /dev/cluster_vg/cluster_lv
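The step of actually attaching the iSCSI LUN was glossed over above. A minimal sketch using the open-iscsi initiator, run on each node and assuming the target is exported by the storage host 192.168.3.29 from the table above:

yum install iscsi-initiator-utils -y
#Discover the targets offered by the storage host, then log in
iscsiadm -m discovery -t sendtargets -p 192.168.3.29
iscsiadm -m node --login
lsblk       #the new LUN should now appear, e.g. as /dev/sdb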

8. Add clusterfs resources

#Only needs to be executed on one node
pcs resource create clusterfs Filesystem device="/dev/cluster_vg/cluster_lv" directory="/mnt" fstype="gfs2" "options=noatime" op monitor interval=10s on-fail=fence clone interleave=true

9. Establish the startup order and dependency between GFS2 and clvmd

#Only needs to be executed on one node
pcs constraint order start clvmd-clone then clusterfs-clone
pcs constraint colocation add clusterfs-clone with clvmd-clone
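The two constraint chains (dlm-clone -> clvmd-clone -> clusterfs-clone) can be reviewed at any time with:

pcs constraint show --full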

10. Verify that the GFS2 file system is mounted successfully
Check the mounts and run a write test, as sketched below.
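A minimal check, assuming the file system was mounted on /mnt as configured above:

#On both nodes: confirm the GFS2 mount
mount | grep gfs2
df -h /mnt

#On node1: write a test file
echo "hello from node1" > /mnt/test.txt

#On node2: the change should be visible immediately
cat /mnt/test.txt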
