ES backup and recovery

Keywords: Database snapshot ElasticSearch network AWS

1, ES backup

1.1 basic concepts

   Elasticsearch replicas provide high availability: the cluster can tolerate the loss of some nodes without interrupting service. Replicas do not, however, protect against catastrophic failures. For that, you need a real backup of the cluster.
   To back up an Elasticsearch cluster, use the snapshot API. It takes the current state and data in the cluster and saves them to a shared repository. The backup process is "smart": the first snapshot is a full copy of the data, but every subsequent snapshot stores only the difference between the existing snapshots and the new data. As you take snapshots over time, data is added to and deleted from the repository incrementally, so later backups can be quite fast because they transfer only a small amount of data.

   Each snapshot can contain indexes created in various versions of Elasticsearch. When a snapshot is restored, every index in it must be restorable to the target cluster; if any index in the snapshot was created in an incompatible version, the snapshot cannot be restored.

  • Index snapshots created in 6.x can be restored to 7.x.
  • Index snapshots created in 5.x can be restored to 6.x.
  • Index snapshots created in 2.x can be restored to 5.x.
  • Index snapshots created in 1.x can be restored to 2.x.
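
The compatibility list above can be sketched as a small lookup. This is an illustration only: the helper names are hypothetical, and it assumes that an index can always be restored to a cluster of the same major version it was created in.

```python
# Major version an index was created in -> the next major version that can
# still restore it, taken from the list above.
NEXT_RESTORABLE_MAJOR = {6: 7, 5: 6, 2: 5, 1: 2}


def index_restorable(created_major: int, cluster_major: int) -> bool:
    """Return True if an index created in created_major can be restored
    to a cluster running cluster_major."""
    # Assumption: restoring into the same major version always works.
    if created_major == cluster_major:
        return True
    return NEXT_RESTORABLE_MAJOR.get(created_major) == cluster_major


def snapshot_restorable(index_majors, cluster_major: int) -> bool:
    """A snapshot is restorable only if every index in it is restorable."""
    return all(index_restorable(m, cluster_major) for m in index_majors)


if __name__ == "__main__":
    print(snapshot_restorable([6, 7], 7))  # every index is compatible
    print(snapshot_restorable([5, 7], 7))  # the 5.x index blocks the restore
```

A single incompatible index is enough to make the whole snapshot fail the check, which mirrors the rule stated above.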

   To use this feature, you must first create a repository to hold the data. There are several repository types to choose from:

  • GCS
  • AZURE
  • S3
  • NAS
  • HDFS
  • OSS
  • Shared block storage

When you back up data before an upgrade, keep in mind that a snapshot containing indexes created in a version incompatible with the upgraded version cannot be restored after the upgrade. If you ever need to restore an index snapshot that is incompatible with the version of the cluster you are running, you can restore it on the latest compatible version and then use reindex from remote to rebuild the index on the current version.

1.2 create repository

mkdir -p /backups/my_backup
chown -R es:es /backups/my_backup

Deploy a shared file system repository:

#Repository name: my_backup20200603
#"type": "fs" selects a shared file system repository
#"location" is the path of the mounted shared file system
PUT _snapshot/my_backup20200603/
{
    "type": "fs",
    "settings": {
        "location": "/backups/my_backup/20200603"
    }
}
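
As a sketch of the same registration call made from a script rather than from the Kibana console, the following builds the HTTP request with Python's standard library. The base URL http://localhost:9200 and the helper name are assumptions; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_repo_request(base_url, repo, location, **extra_settings):
    """Build (but do not send) the PUT request that registers an fs repository."""
    settings = {"location": location}
    settings.update(extra_settings)  # e.g. max_snapshot_bytes_per_sec="50mb"
    body = {"type": "fs", "settings": settings}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local cluster address; adjust to your environment.
    req = build_repo_request(
        "http://localhost:9200", "my_backup20200603", "/backups/my_backup/20200603"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```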

Note: the shared file system path must be accessible from all nodes in the cluster.
This step creates the repository and the required metadata at the mount point. There are other settings you may want to configure, depending on the performance profile of your nodes, network, and repository location.

  • location: the location of the snapshots. Mandatory.
  • compress: turns on compression of the snapshot files. Compression applies only to metadata files (index mappings and settings); data files are not compressed. The default is true.
  • chunk_size: if needed, large files can be broken into chunks during the snapshot. Specify the chunk size as a value and unit, for example 1GB, 10MB, 5KB, 500B. The default is null (unlimited chunk size).
  • max_restore_bytes_per_sec: throttles the restore rate per node. The default is 40mb per second.
  • max_snapshot_bytes_per_sec: throttles the snapshot rate per node. The default is 40mb per second.
#POST updates the settings of the existing repository; the underlying data is not modified
POST _snapshot/my_backup20200603/ 
{
    "type": "fs",
    "settings": {
        "location": "/backups/my_backup/20200603",
        "max_snapshot_bytes_per_sec" : "50mb", 
        "max_restore_bytes_per_sec" : "50mb"
    }
}
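
To get a feel for what the throttle settings mean, the sketch below converts a size string such as 50mb into bytes and computes a lower bound on how long one node needs to move a given amount of data at that rate. The helper names are hypothetical, and it assumes binary units (1kb = 1024 bytes). Real snapshots will be slower, since the estimate ignores everything except the throttle.

```python
import re

# Binary multipliers, assuming 1kb = 1024 bytes.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text: str) -> int:
    """Convert strings like '50mb', '1GB' or '500B' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]+)\s*", text)
    if not match or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def min_transfer_seconds(total_bytes: int, rate_per_sec: str) -> float:
    """Lower bound on transfer time for one node at the given throttle."""
    return total_bytes / parse_size(rate_per_sec)


if __name__ == "__main__":
    ten_gb = parse_size("10gb")
    print(min_transfer_seconds(ten_gb, "40mb"))  # default throttle
    print(min_transfer_seconds(ten_gb, "50mb"))  # raised throttle from the example
```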

1.3 snapshot all open indexes

   A repository can contain multiple snapshots. Each snapshot is associated with a set of indexes (all indexes, some subset, or a single index). Each snapshot in a repository must have a unique name.

PUT _snapshot/my_backup20200603/snapshot_1
#This backs up all open indexes into a snapshot named snapshot_1 in the my_backup20200603 repository. The call returns immediately, and the snapshot runs in the background.

   Usually you want the snapshot to run as a background process, but sometimes you need to wait for it to finish before the call returns. You can do this by adding the wait_for_completion flag, but watch out for Kibana's request timeout. The wait_for_completion parameter determines whether the request returns as soon as the snapshot is initialized (the default) or waits until the snapshot completes. During initialization, information about all previous snapshots is loaded into memory, so in a large repository the request can take several seconds (or even minutes) to return even when wait_for_completion is set to false.

PUT _snapshot/my_backup20200603/snapshot_1?wait_for_completion=true
#This blocks the call until the snapshot completes. Large snapshots take a long time to return

1.4 snapshot specified index

   The default behavior is to back up all open indexes. If you only need some of them, you can specify which indexes to back up when you take the snapshot:

PUT _snapshot/my_backup20200604/snapshot_1
{
    "indices": "name"
}
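
The snapshot requests shown so far differ only in the optional wait_for_completion query parameter and the optional indices body. Below is a minimal sketch of a helper that assembles both. The function name is hypothetical, and nothing is sent over the network.

```python
import json


def snapshot_request(repo, snapshot, indices=None, wait_for_completion=False):
    """Return (method, path, body) for a create-snapshot call.

    indices: optional list of index names; None backs up all open indexes.
    """
    path = f"_snapshot/{repo}/{snapshot}"
    if wait_for_completion:
        path += "?wait_for_completion=true"
    # Multiple index names are passed as one comma-separated string.
    body = {"indices": ",".join(indices)} if indices else None
    return "PUT", path, body


if __name__ == "__main__":
    method, path, body = snapshot_request(
        "my_backup20200604", "snapshot_1", indices=["name", "age"]
    )
    print(method, path)
    print(json.dumps(body))
```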

1.5 list snapshot related information

   To get information about a single snapshot, issue a GET request against the repository and snapshot name:

GET _snapshot/my_backup20200603/snapshot_1
#Returns various information about the requested snapshot

   To get a complete list of all snapshots in a repository, use the _all placeholder in place of a specific snapshot name:

GET _snapshot/my_backup20200603/_all
#Example response:
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "uuid" : "z0lgNdUaQyy7_p2SPgZ2AQ",
      "version_id" : 7040099,
      "version" : "7.4.0",
      "indices" : [
        ".kibana_task_manager_1",
        ".security-7",
        ".apm-agent-configuration",
        ".monitoring-es-7-2020.06.03",
        ".monitoring-kibana-7-2020.06.03",
        "kibana_sample_data_logs",
        ".kibana_1",
        "name",
        "age"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2020-06-03T14:52:13.247Z",
      "start_time_in_millis" : 1591195933247,
      "end_time" : "2020-06-03T14:52:14.256Z",
      "end_time_in_millis" : 1591195934256,
      "duration_in_millis" : 1009,
      "failures" : [ ],
      "shards" : {
        "total" : 9,
        "failed" : 0,
        "successful" : 9
      }
    },
    {
      "snapshot" : "snapshot_2",
      "uuid" : "54KqGBlPQdqKn6UNzvJjcg",
      "version_id" : 7040099,
      "version" : "7.4.0",
      "indices" : [
        ".kibana_task_manager_1",
        ".security-7",
        ".apm-agent-configuration",
        ".monitoring-es-7-2020.06.03",
        ".monitoring-kibana-7-2020.06.03",
        "kibana_sample_data_logs",
        ".kibana_1",
        "name",
        "age"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2020-06-03T14:52:23.285Z",
      "start_time_in_millis" : 1591195943285,
      "end_time" : "2020-06-03T14:52:23.688Z",
      "end_time_in_millis" : 1591195943688,
      "duration_in_millis" : 403,
      "failures" : [ ],
      "shards" : {
        "total" : 9,
        "failed" : 0,
        "successful" : 9
      }
    },
    {
      "snapshot" : "snapshot_3",
      "uuid" : "wdCKB4WiRPqwY9BO154vHA",
      "version_id" : 7040099,
      "version" : "7.4.0",
      "indices" : [
        ".kibana_task_manager_1",
        ".security-7",
        ".apm-agent-configuration",
        ".monitoring-es-7-2020.06.03",
        ".monitoring-kibana-7-2020.06.03",
        "kibana_sample_data_logs",
        ".kibana_1",
        "name",
        "age"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2020-06-03T14:52:28.704Z",
      "start_time_in_millis" : 1591195948704,
      "end_time" : "2020-06-03T14:52:28.907Z",
      "end_time_in_millis" : 1591195948907,
      "duration_in_millis" : 203,
      "failures" : [ ],
      "shards" : {
        "total" : 9,
        "failed" : 0,
        "successful" : 9
      }
    }
  ]
}
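
A listing like the one above is easier to scan when reduced to a few fields per snapshot. The sketch below does that for a response already parsed into a Python dict. The helper name is hypothetical, and the sample is a trimmed copy of the output above.

```python
def summarize_snapshots(response):
    """Return one (name, state, duration_ms, failed_shards) tuple per snapshot."""
    return [
        (
            snap["snapshot"],
            snap["state"],
            snap["duration_in_millis"],
            snap["shards"]["failed"],
        )
        for snap in response["snapshots"]
    ]


if __name__ == "__main__":
    # Trimmed copy of the response shown above.
    response = {
        "snapshots": [
            {"snapshot": "snapshot_1", "state": "SUCCESS", "duration_in_millis": 1009,
             "shards": {"total": 9, "failed": 0, "successful": 9}},
            {"snapshot": "snapshot_2", "state": "SUCCESS", "duration_in_millis": 403,
             "shards": {"total": 9, "failed": 0, "successful": 9}},
            {"snapshot": "snapshot_3", "state": "SUCCESS", "duration_in_millis": 203,
             "shards": {"total": 9, "failed": 0, "successful": 9}},
        ]
    }
    for row in summarize_snapshots(response):
        print(row)
```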

1.6 monitoring snapshot progress

   The wait_for_completion flag provides a rudimentary form of monitoring, but it is not sufficient when snapshotting or restoring even a medium-sized cluster.
   Two other APIs give more detailed information about the status of a snapshot. First, you can perform a GET on the snapshot ID:

GET _snapshot/my_backup20200603/snapshot_3

   If the snapshot is still in progress when you call this command, you will see information such as when it started and how long it has been running. Note, however, that this API uses the same thread pool as the snapshot mechanism. If the snapshot is very large, the interval between status updates can be long, because the API is competing for the same thread pool resources.

A better option is to poll the _status API:

GET _snapshot/my_backup20200603/snapshot_3/_status

   The _status API returns immediately and gives much more detailed statistics. The response includes the overall status of the snapshot, and also drills down into statistics for each index and each shard, which gives a very detailed view of how the snapshot is progressing. A shard can be in one of several states:

INITIALIZING
 The shard is checking the cluster state to see whether it can be snapshotted. This is usually very fast.
STARTED
 Data is being transferred to the repository.
FINALIZING
 Data transfer is complete; the shard is now sending snapshot metadata.
DONE
 Snapshot complete!
FAILED
 An error was encountered during the snapshot, and this shard/index/snapshot could not be completed. Check the logs for more information.
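
The per-shard states above can be tallied from a parsed _status response to see how far a snapshot has progressed. The sketch below assumes the response nests shards as snapshots → indices → shards → stage. The sample data is hand-written for illustration, not real cluster output, and the helper names are hypothetical.

```python
from collections import Counter


def shard_stage_counts(status_response):
    """Count shards per stage (INITIALIZING, STARTED, ...) across all indexes."""
    counts = Counter()
    for snapshot in status_response["snapshots"]:
        for index in snapshot["indices"].values():
            for shard in index["shards"].values():
                counts[shard["stage"]] += 1
    return counts


def all_shards_finished(counts):
    """True once every shard has reached DONE or FAILED."""
    return sum(counts.values()) == counts["DONE"] + counts["FAILED"]


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape.
    sample = {
        "snapshots": [
            {
                "snapshot": "snapshot_3",
                "indices": {
                    "name": {"shards": {"0": {"stage": "DONE"}}},
                    "kibana_sample_data_logs": {"shards": {"0": {"stage": "STARTED"}}},
                },
            }
        ]
    }
    counts = shard_stage_counts(sample)
    print(dict(counts), all_shards_finished(counts))
```

A polling script could call these helpers in a loop, sleeping between requests until all_shards_finished returns True.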

1.7 deleting a snapshot

   Finally, we need a command to delete old snapshots that are no longer useful. This is a simple DELETE HTTP call to the repository/snapshot name:

DELETE _snapshot/my_backup/snapshot_2

    It is important to delete snapshots with the API, and not by some other mechanism (such as deleting files by hand, or using an automatic cleanup tool on AWS S3). Because snapshots are incremental, many snapshots may depend on segments from earlier ones. The delete API knows which data is still in use by more recent snapshots and deletes only the segments that are no longer used.
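
The reason manual deletion is dangerous can be shown with a toy model in which each snapshot is simply the set of segment files it references. This is only an illustration of the idea, not how Elasticsearch stores snapshots internally; the file names and the helper are hypothetical.

```python
def files_safe_to_delete(snapshots, name):
    """Return the files used by snapshot `name` and by no other snapshot.

    snapshots: dict mapping snapshot name -> set of segment file names.
    """
    still_needed = set()
    for other, files in snapshots.items():
        if other != name:
            still_needed |= files
    return snapshots[name] - still_needed


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    snapshots = {
        "snapshot_1": {"seg_a", "seg_b"},
        "snapshot_2": {"seg_a", "seg_b", "seg_c"},
        "snapshot_3": {"seg_b", "seg_c", "seg_d"},
    }
    print(sorted(files_safe_to_delete(snapshots, "snapshot_2")))
    print(sorted(files_safe_to_delete(snapshots, "snapshot_3")))
```

In this model every file of snapshot_2 is still referenced by another snapshot, so removing its files by hand would corrupt the others.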

1.8 cancel snapshot

   To cancel a snapshot, delete it while it is in progress:

DELETE _snapshot/my_backup20200603/<snapshot_name>

   This halts the snapshot process and then deletes the half-completed snapshot from the repository.

1.9 cleaning up the repository

POST _snapshot/my_backup20200603/_cleanup?pretty

    Most of the cleanup this endpoint performs also happens automatically whenever a snapshot is deleted from the repository. If you delete snapshots regularly, calling it will in most cases free little or no space, so you should call it correspondingly less often.

1.10 delete repository

   When a repository is deleted, Elasticsearch only removes its reference to the location where the repository stores its snapshots. The snapshots themselves are left untouched and in place.

DELETE _snapshot/my_backup20200604

2, ES recovery

2.1 restore snapshot

   Restoring is simple: append _restore to the ID of the snapshot you want to restore into the cluster:

POST _snapshot/my_backup/snapshot_1/_restore

   The default behavior is to restore every index stored in the snapshot. If snapshot_1 contains five indexes, all five will be restored into the cluster. As with the snapshot API, you can also choose which indexes to restore.
   There are additional options for renaming indexes. They let you match index names with a pattern and give the matching indexes new names as they are restored. This is useful if you want to restore old data to verify its contents or do other processing without replacing existing data. Let's restore a single index from the snapshot and give it a replacement name:

#"indices": restore only the name index, ignoring the other indexes in the snapshot
#"rename_pattern": a regular expression matched against the names of the indexes being restored
#"rename_replacement": the new name; $1 refers to the first capture group of the pattern
POST /_snapshot/my_backup/snapshot_1/_restore
{
    "indices": "name",
    "rename_pattern": "(.+)",
    "rename_replacement": "restored_index_$1"
}

This restores the name index into the cluster under the new name restored_index_name.
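
The effect of rename_pattern and rename_replacement can be tried out locally with an ordinary regular-expression substitution before running a restore. The sketch below approximates it with Python's re module. Elasticsearch uses Java regular expressions, so edge cases may differ. The helper name and the index name index_1 are hypothetical.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Approximate the restore rename: replace every match of the pattern.

    Elasticsearch writes capture groups as $1, $2, ...; Python's re module
    writes them as \\g<1>, \\g<2>, ..., so the replacement is converted first.
    """
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, py_replacement, index)


if __name__ == "__main__":
    # The pattern captures the suffix after "index_" and reuses it.
    print(restored_name("index_1", "index_(.+)", "restored_index_$1"))
    # A pattern that captures the whole name prefixes any index.
    print(restored_name("name", "(.+)", "restored_index_$1"))
    # A pattern that does not match leaves the name unchanged.
    print(restored_name("name", "name(.+)", "restored_index_$1"))
```

The last call is the case to watch for: name(.+) requires at least one character after "name", so an index called exactly name would keep its original name.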

   As with snapshots, the restore command returns immediately and the restore runs in the background. If you would rather have the HTTP call block until the restore completes, add the wait_for_completion flag:

POST _snapshot/my_backup/snapshot_1/_restore?wait_for_completion=true

2.2 monitoring recovery operation

   Internally, restoring shards from a repository is equivalent to recovering them from another node.
   If you want to monitor the progress of a restore, you can use the recovery API. This is a general-purpose API that shows the state of shards moving around the cluster.
   The API can be called for specific indexes that are being recovered:

GET restored_index_3/_recovery

    Or for all indexes in the cluster, which may include other shard movements unrelated to the restore:

GET /_recovery/
#Example response:
{
  ".monitoring-kibana-7-2020.06.03" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184352400,
        "stop_time_in_millis" : 1591184354191,
        "total_time_in_millis" : 1790,
        "source" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 6,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 57,
          "total" : 57,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 1692
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184349472,
        "stop_time_in_millis" : 1591184351099,
        "total_time_in_millis" : 1626,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 359017,
            "reused_in_bytes" : 359017,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 21,
            "reused" : 21,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 12,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 57,
          "total" : 57,
          "percent" : "100.0%",
          "total_on_start" : 57,
          "total_time_in_millis" : 1577
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  ".security-7" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184353847,
        "stop_time_in_millis" : 1591184354295,
        "total_time_in_millis" : 447,
        "source" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 2,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 415
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184352286,
        "stop_time_in_millis" : 1591184352942,
        "total_time_in_millis" : 655,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 83272,
            "reused_in_bytes" : 83272,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 38,
            "reused" : 38,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 5,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 588
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  ".kibana_task_manager_1" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184354328,
        "stop_time_in_millis" : 1591184355333,
        "total_time_in_millis" : 1004,
        "source" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 1,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 294
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184349520,
        "stop_time_in_millis" : 1591184350042,
        "total_time_in_millis" : 521,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 12872,
            "reused_in_bytes" : 12872,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 7,
            "reused" : 7,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 0,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 465
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  ".monitoring-es-7-2020.06.03" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184356727,
        "stop_time_in_millis" : 1591184361202,
        "total_time_in_millis" : 4474,
        "source" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 2,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 1697,
          "total" : 1697,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 4108
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184352320,
        "stop_time_in_millis" : 1591184356308,
        "total_time_in_millis" : 3987,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 7584798,
            "reused_in_bytes" : 7584798,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 51,
            "reused" : 51,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 9,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 1696,
          "total" : 1696,
          "percent" : "100.0%",
          "total_on_start" : 1696,
          "total_time_in_millis" : 3858
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  ".apm-agent-configuration" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184353017,
        "stop_time_in_millis" : 1591184353862,
        "total_time_in_millis" : 845,
        "source" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 1,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 674
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184352314,
        "stop_time_in_millis" : 1591184352691,
        "total_time_in_millis" : 377,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 283,
            "reused_in_bytes" : 283,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 1,
            "reused" : 1,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 0,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 337
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  "name" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "EMPTY_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591195879522,
        "stop_time_in_millis" : 1591195879541,
        "total_time_in_millis" : 18,
        "source" : { },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 6,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 5
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591195879565,
        "stop_time_in_millis" : 1591195879725,
        "total_time_in_millis" : 159,
        "source" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 230,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 230,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 1,
            "reused" : 0,
            "recovered" : 1,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 43,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 1,
          "total" : 1,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 98
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  "kibana_sample_data_logs" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "SNAPSHOT",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591186687934,
        "stop_time_in_millis" : 1591186688200,
        "total_time_in_millis" : 265,
        "source" : {
          "repository" : "my_backup",
          "snapshot" : "snapshot_1",
          "version" : "7.4.0",
          "index" : "kibana_sample_data_logs",
          "restoreUUID" : "25k0feqzTzWa9m6qTcGFTw"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 11820808,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 11820808,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 27,
            "reused" : 0,
            "recovered" : 27,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 234,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 13
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591186688243,
        "stop_time_in_millis" : 1591186688639,
        "total_time_in_millis" : 396,
        "source" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 11820808,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 11820808,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 27,
            "reused" : 0,
            "recovered" : 27,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 348,
          "source_throttle_time_in_millis" : 190,
          "target_throttle_time_in_millis" : 102
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 35
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  ".kibana_1" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "EXISTING_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591184349501,
        "stop_time_in_millis" : 1591184350141,
        "total_time_in_millis" : 640,
        "source" : {
          "bootstrap_new_history_uuid" : false
        },
        "target" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 22482,
            "reused_in_bytes" : 22482,
            "recovered_in_bytes" : 0,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 16,
            "reused" : 16,
            "recovered" : 0,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 2,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 579
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591184353648,
        "stop_time_in_millis" : 1591184354331,
        "total_time_in_millis" : 682,
        "source" : {
          "id" : "YeJHST86S6ei3vN2Y6snfQ",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9301",
          "ip" : "192.168.137.11",
          "name" : "node1"
        },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 3,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : -1,
          "total_time_in_millis" : 605
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  },
  "age" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "PEER",
        "stage" : "DONE",
        "primary" : false,
        "start_time_in_millis" : 1591195876756,
        "stop_time_in_millis" : 1591195876875,
        "total_time_in_millis" : 119,
        "source" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "target" : {
          "id" : "o1cz718RT96ahpjXOZz5kg",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9303",
          "ip" : "192.168.137.11",
          "name" : "node3"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 230,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 230,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 1,
            "reused" : 0,
            "recovered" : 1,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 25,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 1,
          "total" : 1,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 80
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      {
        "id" : 0,
        "type" : "EMPTY_STORE",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1591195876703,
        "stop_time_in_millis" : 1591195876724,
        "total_time_in_millis" : 21,
        "source" : { },
        "target" : {
          "id" : "fLaYEiq_TrCKNDWoDHs4uw",
          "host" : "192.168.137.11",
          "transport_address" : "192.168.137.11:9302",
          "ip" : "192.168.137.11",
          "name" : "node2"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 0,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 0,
            "percent" : "0.0%"
          },
          "files" : {
            "total" : 0,
            "reused" : 0,
            "recovered" : 0,
            "percent" : "0.0%"
          },
          "total_time_in_millis" : 8,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 8
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  }
}

The output looks like the example above (depending on the activity of the cluster, it can be very large):

  • The type field tells you the nature of the recovery; a value of SNAPSHOT means the shard was recovered from a snapshot.
  • The source hash describes the specific snapshot and repository that the shard was recovered from.
  • The percent field gives you an idea of how far the recovery has progressed.

    The output lists the indexes that have recovery information (add ?active_only=true to show only recoveries still in progress), and then lists all the shards in those indexes. Each shard has statistics such as start / stop time, duration, recovery percentage, number of bytes transferred, etc.
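
Because the full response is long, it can help to flatten it to one row per shard copy. The following is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict; the function name summarize_recovery and the trimmed sample are illustrative only and are not part of Elasticsearch.

```python
def summarize_recovery(recovery):
    """Flatten a parsed GET /_recovery response into one row per shard copy."""
    rows = []
    for index_name, index_info in recovery.items():
        for shard in index_info.get("shards", []):
            source = shard.get("source", {})
            rows.append({
                "index": index_name,
                "shard": shard["id"],
                "type": shard["type"],          # SNAPSHOT, PEER, EXISTING_STORE, EMPTY_STORE ...
                "stage": shard["stage"],        # DONE once the shard copy is recovered
                "primary": shard["primary"],
                # Snapshot name for SNAPSHOT recoveries, source node name for PEER ones
                "source": source.get("snapshot") or source.get("name"),
                "bytes_percent": shard["index"]["size"]["percent"],
                "millis": shard["total_time_in_millis"],
            })
    return rows


# Trimmed sample based on the kibana_sample_data_logs entry in the output above
sample = {
    "kibana_sample_data_logs": {
        "shards": [
            {"id": 0, "type": "SNAPSHOT", "stage": "DONE", "primary": True,
             "total_time_in_millis": 265,
             "source": {"repository": "my_backup", "snapshot": "snapshot_1"},
             "index": {"size": {"percent": "100.0%"}}},
            {"id": 0, "type": "PEER", "stage": "DONE", "primary": False,
             "total_time_in_millis": 396,
             "source": {"name": "node3"},
             "index": {"size": {"percent": "100.0%"}}},
        ]
    }
}

for row in summarize_recovery(sample):
    print(row)
```

The primary shard appears with type SNAPSHOT, while its replica is then copied from the primary's node with type PEER, which matches what the full output shows.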

2.3 cancel recovery

    To cancel a recovery, delete the index that is being recovered. Because the restore process is really just shard recovery, sending a delete index request changes the cluster state, which in turn stops the recovery. For example:

DELETE /restored_name

If restored_name is still being recovered, this delete command stops the recovery and deletes all data that has already been recovered to the cluster.
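
To decide which indexes a cancel would apply to, the recovery output can be filtered for shards that are still being restored from a snapshot. This is a minimal Python sketch under stated assumptions: the response has already been parsed into a dict, "still restoring" is taken to mean type SNAPSHOT with a stage other than DONE, and the helper names and sample data are hypothetical.

```python
def indices_still_restoring(recovery):
    """Return index names with at least one SNAPSHOT shard that is not DONE."""
    names = []
    for index_name, index_info in recovery.items():
        for shard in index_info.get("shards", []):
            if shard.get("type") == "SNAPSHOT" and shard.get("stage") != "DONE":
                names.append(index_name)
                break  # one unfinished shard is enough for this index
    return names


def cancel_requests(recovery):
    """Build the console requests that would cancel those restores."""
    return ["DELETE /" + name for name in indices_still_restoring(recovery)]


# Hypothetical, trimmed recovery response: one restore in progress, one finished
sample = {
    "restored_name": {
        "shards": [{"id": 0, "type": "SNAPSHOT", "stage": "INDEX", "primary": True}]
    },
    "kibana_sample_data_logs": {
        "shards": [{"id": 0, "type": "SNAPSHOT", "stage": "DONE", "primary": True}]
    },
}

for request in cancel_requests(sample):
    print(request)
```

The sketch only builds the request lines and does not send them, since deleting the index also discards the data that has already been restored.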

3, The impact of cluster blocks on backup and recovery operations

    Many backup and recovery operations are affected by cluster and index blocks. For example, creating and deleting a warehouse (repository) requires write access to the global metadata. The backup operation requires all indexes and their metadata, as well as the global metadata, to be readable. The restore operation requires the global metadata to be writable; index-level blocks, however, are ignored during recovery because the indexes are essentially recreated. The warehouse content is not part of the cluster, so cluster blocks do not affect internal warehouse operations such as listing or deleting snapshots in an already registered warehouse.

4, Appendix

//////////////////////Backup/////////////////////////
//Create / update warehouse configuration my_backup20200603
POST _snapshot/my_backup20200603/ 
{
    "type": "fs",
    "settings": {
        "location": "/backups/my_backup/20200603",
        "max_snapshot_bytes_per_sec" : "50mb", 
        "max_restore_bytes_per_sec" : "50mb"
    }
}

GET _cat/tasks?v
//Generate test data
GET /_cat/indices?v
GET /_cat/indices/name,age?v
POST age/_doc
{
    "age" : 1
}
POST name/_doc
{
    "user" : "Mike"
}
GET name/_search
GET age/_search
GET movies/_search
GET movies/_mappings
GET movies/_settings

//Snapshot all open indexes
PUT _snapshot/my_backup20200603/snapshot_1
PUT _snapshot/my_backup20200603/snapshot_2
PUT _snapshot/my_backup20200603/snapshot_3

//Create / update warehouse configuration my_backup20200604
POST _snapshot/my_backup20200604/
{
    "type": "fs",
    "settings": {
        "location": "/backups/my_backup/20200604",
        "max_snapshot_bytes_per_sec" : "50mb", 
        "max_restore_bytes_per_sec" : "50mb"
    }
}

//Snapshot only the specified index
PUT _snapshot/my_backup20200604/snapshot_1
{
    "indices": "name"
}

//List snapshot related information
GET _snapshot/my_backup20200603/snapshot_1
GET _snapshot/my_backup20200603/_all

// View snapshot progress
GET _snapshot/my_backup20200604/snapshot_1
GET _snapshot/my_backup20200604/snapshot_1/_status


//Delete snapshot / cancel snapshot
DELETE _snapshot/my_backup20200603/snapshot_1

//Delete the test indexes
DELETE name
DELETE age
//Clean up the warehouse (must run while the warehouse is still registered)
POST _snapshot/my_backup20200603/_cleanup?pretty
//Delete warehouse
DELETE _snapshot/my_backup20200603

//////////////////////Recovery/////////////////////////

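//Restore all indexes in the snapshot
POST _snapshot/my_backup20200603/snapshot_1/_restore

//Restore only the specified indexes; rename_pattern / rename_replacement can rename them on restore (here "(.+)" -> "$1" keeps the original names)
POST /_snapshot/my_backup20200603/snapshot_3/_restore
{
    "indices": "name,age",
    "rename_pattern": "(.+)",
    "rename_replacement": "$1"
}
//restored_index_name below is a placeholder for the name of a restored index
GET restored_index_name/_search
//View recovery status
GET restored_index_name/_recovery
GET /_recovery/

//Cancel restore
DELETE /restored_index_name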
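
The appendix registers one warehouse per day (my_backup20200603, my_backup20200604). As a minimal sketch of that naming scheme, the Python helper below builds the request path and body for a given date. The function name dated_repository and its parameters are hypothetical, and the fs location is assumed to lie under a directory listed in path.repo.

```python
import json
from datetime import date


def dated_repository(day, base_dir="/backups/my_backup", rate="50mb"):
    """Build the path and body for registering a date-stamped fs warehouse."""
    stamp = day.strftime("%Y%m%d")
    path = "_snapshot/my_backup" + stamp
    body = {
        "type": "fs",
        "settings": {
            # Assumed to be inside a directory configured in path.repo
            "location": base_dir + "/" + stamp,
            "max_snapshot_bytes_per_sec": rate,
            "max_restore_bytes_per_sec": rate,
        },
    }
    return path, body


path, body = dated_repository(date(2020, 6, 3))
print("POST " + path)
print(json.dumps(body, indent=4))
```

The printed request can be pasted into the Kibana console; the helper itself sends nothing to the cluster.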

Posted by Naez on Mon, 29 Jun 2020 00:38:26 -0700