After the GlusterFS distributed storage system was migrated some time ago, it frequently dropped connections and only worked intermittently, leaving it unusable for long stretches. Observation showed that one of the nodes kept restarting. Further investigation revealed that only one of the four nodes backing the replicated storage was actually serving data; two other nodes were still running, but their brick directories were empty and gluster volume status showed them as not online. The procedure used to repair the GlusterFS distributed storage system is documented below.
1. View Status
Get the volume status and information for GlusterFS:
# Get the runtime state of the volume.
sudo gluster volume status gvzr00
# Get the defined configuration of the volume.
sudo gluster volume info gvzr00
Where the runtime state differs from the defined configuration is where the problem lies.
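To spot the mismatch quickly, the brick lists reported by both commands can be compared side by side; a minimal sketch (the grep patterns are illustrative and may need adjusting to the exact output format):
# Bricks as defined in the volume configuration.
sudo gluster volume info gvzr00 | grep -E '^Brick[0-9]+:'
# Bricks actually reported at runtime (the Online column shows Y/N).
sudo gluster volume status gvzr00 | grep '^Brick'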
Get information about the online node (peer):
sudo gluster peer status
One of the nodes was found to be disconnected. After several restarts and a failed attempt to update the system software, it was decided to remove the node temporarily.
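Before detaching it, it can be worth checking whether the glusterd daemon on the failing node is running at all; a quick check, assuming a systemd-based system and run on the failing node itself:
# Check the management daemon and its recent logs.
sudo systemctl status glusterd
sudo journalctl -u glusterd --since "1 hour ago"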
2. Remove Nodes
Remove the node using the following command:
sudo gluster peer detach 10.1.1.193
However, the command failed with a message indicating that this node still hosts bricks, and that the bricks located on this node must first be removed from all volumes.
First use volume info to identify the bricks on that node, then forcibly remove them:
# Removing a brick from the replicated volume gvzr00 requires specifying the new replica count.
sudo gluster volume remove-brick gvzr00 replica 2 10.1.1.193:/zpool/gvzr00 force
# Remove the brick from the striped volume gvz00 (note: data loss may occur).
sudo gluster volume remove-brick gvz00 10.1.1.193:/zpool/gvz00 force
Then force the peer to be removed (because it is offline, the force parameter must be used):
sudo gluster peer detach 10.1.1.193 force
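To confirm the node is really gone from the trusted pool and from the volume definition, something like the following can be used (the pool list subcommand is available in current GlusterFS releases):
# The detached node should no longer appear in either listing.
sudo gluster pool list
sudo gluster volume info gvzr00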
3. Recover the Node
Re-add peer:
sudo gluster peer probe 10.1.1.193
Get the status of the peer:
sudo gluster peer status
Re-add brick:
sudo gluster volume add-brick gvzr00 replica 2 10.1.1.193:/zpool/gvzr00
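Since the brick rejoins the replicated volume empty, it may help to trigger a self-heal explicitly so the data is copied back from the surviving replicas; a minimal sketch:
# Start a full self-heal on the replicated volume and check its progress.
sudo gluster volume heal gvzr00 full
sudo gluster volume heal gvzr00 info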
Get the status of the volume:
sudo gluster volume info gvzr00
sudo gluster volume status gvzr00
The output is as follows:
sudo gluster volume status gvzr00
Status of volume: gvzr00
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.1.205:/zpool/gvzr00              49153     0          Y       30347
Brick 10.1.1.193:/zpool/gvzr00              49152     0          Y       15144
Brick 10.1.1.150:/zpool/gvzr00              49152     0          Y       11586
NFS Server on localhost                     2049      0          Y       16425
Self-heal Daemon on localhost               N/A       N/A        Y       16499
NFS Server on 10.1.1.150                    N/A       N/A        N       N/A
Self-heal Daemon on 10.1.1.150              N/A       N/A        Y       11661
NFS Server on 10.1.1.205                    N/A       N/A        N       N/A
Self-heal Daemon on 10.1.1.205              N/A       N/A        Y       5848
NFS Server on 10.1.1.203                    2049      0          Y       27732
Self-heal Daemon on 10.1.1.203              N/A       N/A        Y       27770
NFS Server on 10.1.1.167                    2049      0          Y       24585
Self-heal Daemon on 10.1.1.167              N/A       N/A        Y       24619
NFS Server on 10.1.1.202                    2049      0          Y       28924
Self-heal Daemon on 10.1.1.202              N/A       N/A        Y       28941
NFS Server on 10.1.1.234                    2049      0          Y       26891
Self-heal Daemon on 10.1.1.234              N/A       N/A        Y       26917
NFS Server on 10.1.1.193                    2049      0          Y       15689
Self-heal Daemon on 10.1.1.193              N/A       N/A        Y       15724

Task Status of Volume gvzr00
------------------------------------------------------------------------------
There are no active volume tasks
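As a further check, the volume can be mounted from any client with the native FUSE client to verify that files are visible again; a sketch, with /mnt/gvzr00 as a placeholder mount point:
# Mount the repaired volume via the GlusterFS FUSE client and list its contents.
sudo mkdir -p /mnt/gvzr00
sudo mount -t glusterfs 10.1.1.205:/gvzr00 /mnt/gvzr00
ls /mnt/gvzr00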
The basic storage service is back to normal.
4. Restore JupyterHub Service
To use the storage in Kubernetes, see these references for the surrounding setup:
- Deploying highly available Kubernetes 1.17.0 using kubeadm
- Kubernetes 1.17.0 Management Interface Dashboard 2
- Quick Setup JupyterHub for K8s
First, PVs and PVCs backed by the GlusterFS volume need to be created in Kubernetes (see the sketch below).
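A minimal sketch of such manifests, assuming the in-tree glusterfs volume plugin of Kubernetes 1.17 and placeholder names and sizes (glusterfs-cluster, pv-gvzr00, pvc-gvzr00, 10Gi); adjust them to the real cluster:
# Endpoints pointing at the GlusterFS servers, plus a PV/PVC pair backed by volume gvzr00.
# The jupyter namespace is assumed here because the Endpoints object must live in the
# namespace of the pods that mount the claim.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
  namespace: jupyter
subsets:
  - addresses:
      - ip: 10.1.1.205
      - ip: 10.1.1.193
      - ip: 10.1.1.150
    ports:
      - port: 1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-gvzr00
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster
    path: gvzr00
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-gvzr00
  namespace: jupyter
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF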
After that, JupyterHub reported a runtime error and the Notebook Server could not start; the pod log contained a "NoneType" error message. It was fixed as follows:
kubectl patch deploy -n jupyter hub --type json \
  --patch '[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value": ["bash", "-c", "\nmkdir -p ~/hotfix\ncp -r /usr/local/lib/python3.6/dist-packages/kubespawner ~/hotfix\nls -R ~/hotfix\npatch ~/hotfix/kubespawner/spawner.py << EOT\n72c72\n<             key=lambda x: x.last_timestamp,\n---\n>             key=lambda x: x.last_timestamp and x.last_timestamp.timestamp() or 0.,\nEOT\n\nPYTHONPATH=$HOME/hotfix jupyterhub --config /srv/jupyterhub_config.py --upgrade-db\n"]}]'
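After the patch is applied, the hub deployment rolls out a new pod; its log can be checked to confirm the spawner now starts cleanly (the jupyter namespace is assumed, as above):
# Wait for the patched hub deployment to roll out, then inspect its recent log output.
kubectl rollout status deploy/hub -n jupyter
kubectl logs deploy/hub -n jupyter --tail=50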
With that, the JupyterHub service is back to normal.