Backing up and restoring cloud disk data volumes in an ACK cluster

Keywords: Web Server snapshot Nginx Kubernetes github

Stateful services deployed in an Alibaba Cloud ACK cluster usually store their data on cloud disk data volumes. The cloud disk itself provides a backup and recovery mechanism (snapshots). The problem a cloud-native storage service has to solve is how to integrate this underlying capability with K8S and expose it flexibly to applications. K8S provides backup and recovery through the following two features:

Backup of a cloud disk (snapshot creation) is implemented by the VolumeSnapshot object;

Data recovery (restore from a snapshot) is implemented by the dataSource field of a PVC;

Because VolumeSnapshot is still in Alpha in K8S 1.16, ACK clusters do not currently deploy the snapshot function by default, so you need to install the plug-in manually to use it.

K8S snapshot overview:

To implement snapshot functions, Kubernetes defines the following three resource types through CRDs:

VolumeSnapshotContent: describes a snapshot instance on the storage back end; created and maintained by the cluster administrator and not namespaced; analogous to a PV;

VolumeSnapshot: declares a request for a snapshot; created and maintained by the user and belongs to a specific namespace; analogous to a PVC;

VolumeSnapshotClass: defines a class of snapshots, describing the parameters and the controller used to create them; analogous to a StorageClass;

Snapshot resource binding rules:

Like a PV and a PVC, a VolumeSnapshot and a VolumeSnapshotContent must be bound to each other before the snapshot can be used;

If no static VolumeSnapshotContent is available for a VolumeSnapshot to bind to, a VolumeSnapshotContent is created dynamically;

The binding between VolumeSnapshotContent and VolumeSnapshot is one-to-one;

Deleting the VolumeSnapshotContent will delete the back-end snapshot at the same time;
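
The binding rules above can be illustrated with a small toy model. This is only a sketch for reasoning about the rules, not the real snapshot controller; the class and method names (ToyBinder, add_static, bind) are made up for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SnapshotContent:
    name: str
    bound_to: Optional[str] = None  # "namespace/name" of the bound VolumeSnapshot


@dataclass
class ToyBinder:
    """Toy model of the binding rules (hypothetical, for illustration only)."""
    contents: Dict[str, SnapshotContent] = field(default_factory=dict)

    def add_static(self, name: str) -> None:
        # A content object pre-created by the administrator.
        self.contents[name] = SnapshotContent(name)

    def bind(self, namespace: str, snapshot: str,
             static_content: Optional[str] = None) -> SnapshotContent:
        key = f"{namespace}/{snapshot}"
        # A snapshot that is already bound keeps its content.
        for content in self.contents.values():
            if content.bound_to == key:
                return content
        if static_content is not None and static_content in self.contents:
            content = self.contents[static_content]
            if content.bound_to is not None:
                # One-to-one: a content cannot serve two snapshots.
                raise ValueError(f"{static_content} is already bound to {content.bound_to}")
            content.bound_to = key
            return content
        # No static content can be bound: create one dynamically.
        name = f"snapcontent-{uuid.uuid4()}"
        content = SnapshotContent(name, bound_to=key)
        self.contents[name] = content
        return content
```

Binding a second snapshot to an already bound content raises an error in this model, which mirrors the one-to-one rule; the real controller reports such conflicts in its own way.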

1. Volume snapshot template

Here is a VolumeSnapshotClass definition template:

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: default-snapclass
snapshotter: disk-snapshot
parameters:
  forceDelete: "false"

Among them:

snapshotter: the controller (snapshot driver) that handles VolumeSnapshots using this snapshot class;

forceDelete: whether the snapshot may be deleted while a cloud disk still references it. Deletion is not allowed by default: creating a cloud disk from a snapshot takes some time, and force-deleting the snapshot during that window may cause data loss;
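
If you generate manifests from a script, the template above could be produced along these lines. This is a minimal sketch: the helper name snapshot_class is hypothetical, and the snapshotter value in the usage line is the one used later in this article. Note that parameters is a string-to-string map, which is why forceDelete is written as a quoted string rather than a YAML boolean.

```python
import json


def snapshot_class(name: str, snapshotter: str, force_delete: bool = False) -> dict:
    """Build a v1alpha1 VolumeSnapshotClass manifest as a plain dict."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1alpha1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        "snapshotter": snapshotter,
        # Parameter values must be strings, so render the flag as "true"/"false".
        "parameters": {"forceDelete": "true" if force_delete else "false"},
    }


if __name__ == "__main__":
    manifest = snapshot_class("default-snapclass", "diskplugin.csi.alibabacloud.com")
    # kubectl accepts JSON as well as YAML, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```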

Here is a VolumeSnapshot definition template:

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: snapshot-test
spec:
  snapshotClassName: default-snapclass
  source:
    name: pvc-disk
    kind: PersistentVolumeClaim

Among them:

A VolumeSnapshot defines the data source (a PVC name) and the snapshot class used to create the snapshot;

source: the PVC whose cloud disk volume is snapshotted; the disk ID is looked up through the PVC and the PV bound to it;

snapshotClassName: the snapshot class used for the snapshot;

Creating a VolumeSnapshot resource creates a snapshot instance of the cloud disk associated with the PVC;
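
The same template can be generated for any PVC. The sketch below assumes the v1alpha1 field layout shown above; the function name volume_snapshot and the namespace argument are additions for illustration and are not part of the original template.

```python
import json


def volume_snapshot(name: str, pvc_name: str, snapshot_class_name: str,
                    namespace: str = "default") -> dict:
    """Build a v1alpha1 VolumeSnapshot manifest that snapshots the given PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1alpha1",
        "kind": "VolumeSnapshot",
        # The VolumeSnapshot is namespaced, like the PVC it refers to.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "snapshotClassName": snapshot_class_name,
            "source": {"name": pvc_name, "kind": "PersistentVolumeClaim"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(volume_snapshot("snapshot-test", "pvc-disk", "default-snapclass"), indent=2))
```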

2. Recover data through snapshot

Creating a cloud disk from a snapshot is a basic function of Alibaba Cloud disks. In the container service, you specify which snapshot to use by defining dataSource in the PVC; when the cloud disk is provisioned dynamically, it is created from that snapshot:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-snapshot
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: alicloud-disk-ssd
  dataSource:
    name: snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 20Gi

Among them:

storageClassName: the storage class used to create the PV; the disk controller it points to must support the dataSource feature;

dataSource: the snapshot resource whose data is used when the cloud disk is created;
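
A restore PVC differs from an ordinary PVC only by its dataSource block, so it is easy to generate. The following is a sketch under the assumption that the field layout matches the example above; restore_pvc is a hypothetical helper name. Because dataSource is a local object reference, the VolumeSnapshot has to live in the same namespace as the PVC.

```python
import json


def restore_pvc(name: str, snapshot_name: str, storage_class: str,
                size: str, namespace: str = "default") -> dict:
    """Build a PVC manifest whose volume is created from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            # The snapshot is looked up in the PVC's own namespace.
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(restore_pvc("disk-snapshot", "snapshot-test", "alicloud-disk-ssd", "20Gi"), indent=2))
```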

Plug-in deployment:

Before deploying the CSI snapshotter, create an ACK 1.16 cluster and choose the CSI plug-in when creating it;

Download the CSI snapshot template: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/deploy/disk/snapshot/csi-snapshotter.yaml

Deploy the plug-in:

$ kubectl apply -f csi-snapshotter.yaml

After deployment, the csi plug-ins in the cluster are as follows:

# kubectl get pod -nkube-system |grep csi
csi-plugin-25xhh                                    9/9     Running   0          28h
csi-plugin-5xjqh                                    9/9     Running   0          28h
csi-plugin-9p4kd                                    9/9     Running   0          28h
csi-plugin-tmlmg                                    9/9     Running   0          28h
csi-plugin-tw57q                                    9/9     Running   0          28h
csi-provisioner-577d66cbb7-zks24                    8/8     Running   0          161m
csi-provisioner-577d66cbb7-kja32                    8/8     Running   0          161m
csi-snapshotter-859bdf8888-mq4dk                    2/2     Running   0          161m
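
To check this listing in a script rather than by eye, the text output can be parsed. The sketch below assumes the default kubectl column order (NAME, READY, STATUS, ...); the function name not_ready_pods is made up for this example.

```python
from typing import List


def not_ready_pods(kubectl_output: str) -> List[str]:
    """Return the names of pods that are not fully ready or not Running."""
    bad = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its READY column has no "/").
        if len(fields) < 3 or "/" not in fields[1]:
            continue
        ready, total = fields[1].split("/", 1)
        if ready != total or fields[2] != "Running":
            bad.append(fields[0])
    return bad


if __name__ == "__main__":
    sample = (
        "csi-plugin-25xhh                   9/9   Running   0   28h\n"
        "csi-provisioner-577d66cbb7-zks24   8/8   Running   0   161m\n"
        "csi-snapshotter-859bdf8888-mq4dk   2/2   Running   0   161m\n"
    )
    print(not_ready_pods(sample))
```

An empty list means every listed CSI pod reports all of its containers ready.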

Usage:

The example flow chart is divided into three steps:

Step 1: create the original application, create a cloud disk volume and save the data;

Step 2: create a VolumeSnapshot, which will automatically create the VolumeSnapshotContent and the snapshot instance of the storage side;

Step 3: create a new application and configure PVC to reference the snapshot object created in step 2;

Through the above three steps:

Backup: data in Volume1 is backed up to Snapshot1;

Recovery: the data in Snapshot1 (the data of Volume1) is restored to Volume2;

Create the VolumeSnapshotClass:

$ kubectl apply -f volumesnapshotclass.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: default-snapclass
snapshotter: diskplugin.csi.alibabacloud.com
parameters:
  forceDelete: "true"

# kubectl get VolumeSnapshotClass
NAME                AGE
default-snapclass   4h40m

Step 1: create the original application and write the data:

$ kubectl apply -f sts.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "alicloud-disk-snap"
      resources:
        requests:
          storage: 20Gi

Write data in the pod:

# kubectl exec -ti web-0 -- touch /data/test
# kubectl exec -ti web-0 -- ls /data
lost+found  test

Step 2: create a VolumeSnapshot:

$ kubectl apply -f snapshot.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  snapshotClassName: default-snapclass
  source:
    name: disk-ssd-web-0
    kind: PersistentVolumeClaim
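
The source name disk-ssd-web-0 is not arbitrary: a StatefulSet names each PVC after the claim template, the StatefulSet and the pod ordinal. A small helper (the name statefulset_pvc_name is hypothetical) makes the rule explicit:

```python
def statefulset_pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    """Name of the PVC a StatefulSet creates for one replica."""
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    # Pattern: <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>
    return f"{claim_template}-{statefulset}-{ordinal}"


if __name__ == "__main__":
    # Claim template "disk-ssd" and StatefulSet "web" from step 1, first replica.
    print(statefulset_pvc_name("disk-ssd", "web", 0))
```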

Check the cluster status: the VolumeSnapshot and the VolumeSnapshotContent have been created. The ECS console also shows that the snapshot instance has been created:

# kubectl get VolumeSnapshot
NAME                AGE
new-snapshot-test   173m

# kubectl get VolumeSnapshotContent
NAME                                               AGE
snapcontent-b9bcccde-9ea4-41f0-967d-3647b8a5cc29   173m
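
The listing only shows that the objects exist. Whether the snapshot can already be used for a restore could be checked from the object's status, for example on the output of kubectl get volumesnapshot new-snapshot-test -o json. The sketch below assumes that the v1alpha1 status exposes a boolean readyToUse field; verify this against the CRD installed in your cluster before relying on it.

```python
import json


def snapshot_ready(snapshot_json: str) -> bool:
    """Return True if the VolumeSnapshot JSON reports status.readyToUse == true."""
    obj = json.loads(snapshot_json)
    # status may be missing or null right after the object is created.
    status = obj.get("status") or {}
    # Assumption: the field is named readyToUse, as in the v1alpha1 API types.
    return status.get("readyToUse") is True


if __name__ == "__main__":
    sample = '{"kind": "VolumeSnapshot", "status": {"readyToUse": true}}'
    print(snapshot_ready(sample))
```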

Step 3: restore the data:

$ kubectl apply -f sts-snapshot.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-snapshot-restore
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: alicloud-disk-snap
  resources:
    requests:
      storage: 20Gi
  dataSource:
    name: new-snapshot-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-restore
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: pvc-disk
          mountPath: /data
      volumes:
        - name: pvc-disk
          persistentVolumeClaim:
            claimName: disk-snapshot-restore

In the PVC definition, set dataSource to kind VolumeSnapshot and reference the VolumeSnapshot named new-snapshot-test created in step 2.

View the container data and verify that the recovery was successful:

# kubectl exec -ti web-restore-0 -- ls /data
lost+found  test
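
For more than a handful of files, the two ls listings (original and restored volume) can be compared mechanically. This is a rough sketch with a made-up helper name; it assumes file names contain no whitespace and compares names only, not file contents.

```python
from typing import List


def missing_after_restore(original_ls: str, restored_ls: str) -> List[str]:
    """Names present in the original listing but absent from the restored one."""
    original = set(original_ls.split())
    restored = set(restored_ls.split())
    return sorted(original - restored)


if __name__ == "__main__":
    # Listings from web-0 (step 1) and web-restore-0 (step 3).
    print(missing_after_restore("lost+found  test", "lost+found  test"))
```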

The test file written in step 1 is present, so the data has been restored.

This scheme only covers creating a snapshot and restoring from it. A scheme for scheduled snapshots will be provided later.

Posted by Brit on Sat, 25 Apr 2020 02:56:22 -0700