etcd introduction
etcd is a distributed, consistent key-value (KV) store for shared configuration and service discovery. It is an open source project initiated by CoreOS and licensed under Apache 2.0.
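As a quick illustration of the KV model, the sketch below writes and then reads a key with etcdctl (assuming a local, non-TLS etcd member on 127.0.0.1:2379; the key name is made up for the example):

$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 put /demo/message "hello etcd"
OK
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 get /demo/message
/demo/message
hello etcd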
etcd usage scenarios
etcd has many usage scenarios, including but not limited to:
- Configuration management
- Service registration and discovery
- Leader election
- Application scheduling
- Distributed queues
- Distributed locks (see the etcdctl sketch after this list)
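For the leader election and distributed lock scenarios, etcdctl itself ships with elect and lock helpers. A minimal sketch, again assuming a local, non-TLS member (the lock and election names are made up):

# Block until the lock named "demo-lock" is acquired, then hold it and print the lock key
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 lock demo-lock

# Campaign for leadership under the "demo-election" prefix with the proposal "node-1"
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 elect demo-election node-1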
etcd stores all Kubernetes cluster data
etcd is a critical service in a Kubernetes cluster: it stores all of the cluster's data. If a disaster strikes and the etcd data is lost, the cluster cannot be recovered. This article therefore focuses on how to back up and restore that data.
Common etcd query operations
- View cluster status
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 endpoint health
https://192.168.1.36:2379 is healthy: successfully committed proposal: took = 1.698385ms
https://192.168.1.37:2379 is healthy: successfully committed proposal: took = 1.577913ms
https://192.168.1.38:2379 is healthy: successfully committed proposal: took = 5.616079ms
- Get the information for a single key
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 get /registry/apiregistration.k8s.io/apiservices/v1.apps
- Get etcd version information
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 version
- List all etcd keys
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 get / --prefix --keys-only
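The --prefix flag also supports narrower queries. For example, a sketch that lists only Pod objects, reusing the same TLS flags (/registry/pods is the prefix under which Kubernetes stores Pods):

$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379 get /registry/pods --prefix --keys-only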
Environment
- k8s-master1: 192.168.1.36
- k8s-master2: 192.168.1.37
- k8s-master3: 192.168.1.38
- etcd version: 3.2.12
- Kubernetes version: v1.15.6 (binary installation)
Backup
Note: the etcdctl commands differ between etcd versions, but only slightly. This article uses snapshot save for backups; since every etcd member holds the full data set, only one node needs to be backed up at a time.
Command backup (backup on k8s-master1 machine):
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379 snapshot save /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db
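Before relying on a snapshot, it is worth verifying it. etcdctl snapshot status prints the snapshot's hash, revision, total key count, and size (a sketch against the backup file created above):

$ ETCDCTL_API=3 etcdctl snapshot status /data/etcd_backup_dir/etcd-snapshot-`date +%Y%m%d`.db -w table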
Backup script (backup on k8s-master1 machine):
#!/usr/bin/env bash
date

CACERT="/opt/kubernetes/ssl/ca.pem"
CERT="/opt/kubernetes/ssl/server.pem"
KEY="/opt/kubernetes/ssl/server-key.pem"
ENDPOINTS="192.168.1.36:2379"

ETCDCTL_API=3 etcdctl \
  --cacert="${CACERT}" --cert="${CERT}" --key="${KEY}" \
  --endpoints="${ENDPOINTS}" \
  snapshot save /data/etcd_backup_dir/etcd-snapshot-$(date +%Y%m%d).db

# Keep backups for 30 days
find /data/etcd_backup_dir/ -name "*.db" -mtime +30 -exec rm -f {} \;
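To take backups automatically, the script can be scheduled with cron. A minimal sketch, assuming the script above is saved as /data/scripts/etcd_backup.sh (a made-up path) and should run daily at 02:00:

# root crontab entry (crontab -e): daily etcd snapshot at 02:00
0 2 * * * /data/scripts/etcd_backup.sh >> /var/log/etcd_backup.log 2>&1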
Recovery
Preparation
- Stop the kube-apiserver service on all masters
$ systemctl stop kube-apiserver

# Confirm that the kube-apiserver service has stopped
$ ps -ef | grep kube-apiserver
- Stop the etcd service on all nodes in the cluster
$ systemctl stop etcd
- Move the existing data in the etcd storage directory aside
$ mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
- Copy the etcd backup snapshot to the other master nodes
# Copy the backup from the k8s-master1 machine
$ scp /data/etcd_backup_dir/etcd-snapshot-20191222.db root@k8s-master2:/data/etcd_backup_dir/
$ scp /data/etcd_backup_dir/etcd-snapshot-20191222.db root@k8s-master3:/data/etcd_backup_dir/
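Optionally, compare checksums to confirm the copies arrived intact, since every member must be restored from identical snapshot data (run on each of the three machines; the three hashes must match):

$ sha256sum /data/etcd_backup_dir/etcd-snapshot-20191222.db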
Restore backup
# Operation on the k8s-master1 machine
$ ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20191222.db \
  --name etcd-0 \
  --initial-cluster "etcd-0=https://192.168.1.36:2380,etcd-1=https://192.168.1.37:2380,etcd-2=https://192.168.1.38:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.1.36:2380 \
  --data-dir=/var/lib/etcd/default.etcd

# Operation on the k8s-master2 machine
$ ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20191222.db \
  --name etcd-1 \
  --initial-cluster "etcd-0=https://192.168.1.36:2380,etcd-1=https://192.168.1.37:2380,etcd-2=https://192.168.1.38:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.1.37:2380 \
  --data-dir=/var/lib/etcd/default.etcd

# Operation on the k8s-master3 machine
$ ETCDCTL_API=3 etcdctl snapshot restore /data/etcd_backup_dir/etcd-snapshot-20191222.db \
  --name etcd-2 \
  --initial-cluster "etcd-0=https://192.168.1.36:2380,etcd-1=https://192.168.1.37:2380,etcd-2=https://192.168.1.38:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.1.38:2380 \
  --data-dir=/var/lib/etcd/default.etcd
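snapshot restore rebuilds the data directory from the snapshot without contacting the other members. Before starting etcd, a quick sanity check that the expected layout was created (snap and wal are the subdirectories etcd itself uses):

# Each node should now have a freshly created member directory
$ ls /var/lib/etcd/default.etcd/member
snap  wal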
After all three restores have completed, log in to each of the three machines in turn and start etcd:
$ systemctl start etcd
Once all three etcd services are started, check the health of the etcd cluster:
$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 endpoint health
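Beyond the health check, endpoint status shows each member's database size, leader, and raft index; after a restore, all members should report the same revision and agree on one leader (a sketch reusing the same TLS flags):

$ ETCDCTL_API=3 etcdctl --cacert=/opt/kubernetes/ssl/ca.pem --cert=/opt/kubernetes/ssl/server.pem --key=/opt/kubernetes/ssl/server-key.pem --endpoints=https://192.168.1.36:2379,https://192.168.1.37:2379,https://192.168.1.38:2379 endpoint status -w table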
When all three etcd members report healthy, start kube-apiserver on each master:
$ systemctl start kube-apiserver
Finally, check whether the Kubernetes cluster has returned to normal:
$ kubectl get cs
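On a healthy cluster all component statuses come back Healthy; the output should look roughly like this (exact formatting varies by version):

NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}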
Summary
Backing up a Kubernetes cluster is mainly a matter of backing up the etcd cluster. During recovery, the main thing to get right is the order of operations:
Stop kube-apiserver -> stop etcd -> restore data -> start etcd -> start kube-apiserver
Note: when backing up an etcd cluster, only one member needs to be backed up; when restoring, use that same snapshot on every member.