Function Compute for automated O&M, practice 3: event triggering and automatic snapshot creation

Keywords: JSON snapshot network

Function Compute

Alibaba Cloud Function Compute is an event-driven, fully managed computing service. With Function Compute, you do not need to manage servers or other infrastructure; you only write code and upload it. Function Compute provisions the computing resources, runs your code elastically and reliably, and provides log query, performance monitoring, and alerting. You can use it to build applications and services quickly without taking on management or operations work. You pay only for the resources your code actually consumes while running; code that is not running incurs no cost.

CloudMonitor

Alibaba Cloud CloudMonitor provides cloud users with an out-of-the-box, enterprise-grade, open, one-stop monitoring solution. It covers IT infrastructure monitoring, external network quality probing (dial testing), and business monitoring based on events, custom metrics, and logs, with the aim of being efficient, comprehensive, and cost-effective.

CloudMonitor exposes a rich and still growing set of events (see "Cloud product system event monitoring"). Triggering user-defined functions from these events makes more complete automated O&M possible.

Series index => Function Compute for automated O&M

Example scenario

This article focuses on using Function Compute to handle ECS restart events, because these events currently demand a high-priority response from users. Previously, when an ECS instance restarted because of a system error, a user might have had to get up urgently to verify the instance or create snapshots. In this example, instances that restart because of a system error or an instance error are handled automatically, for example by creating a snapshot after the restart completes successfully.
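
The decision described in this scenario can be sketched as a small pure function. This is a minimal sketch, assuming event names follow the `Instance:<EventType>:<Status>` pattern seen in the sample payload later in this article; the function name `plan_action` and the returned labels are hypothetical and chosen only for illustration.

```python
# Hypothetical helper: decide what to do for an ECS system event name.
# Assumes names look like "Instance:SystemFailure.Reboot:Executed".

REBOOT_EVENT_TYPES = {"systemfailure.reboot", "instancefailure.reboot"}


def plan_action(event_name):
    """Return "snapshot", "wait", or "ignore" for an event name."""
    parts = event_name.lower().split(":")
    if len(parts) != 3 or parts[0] != "instance":
        return "ignore"
    _, event_type, status = parts
    if event_type not in REBOOT_EVENT_TYPES:
        return "ignore"
    if status == "executed":
        return "snapshot"  # restart finished: disks can be snapshotted
    if status == "executing":
        return "wait"      # restart still in progress
    return "ignore"


if __name__ == "__main__":
    for name in ("Instance:SystemFailure.Reboot:Executing",
                 "Instance:InstanceFailure.Reboot:Executed"):
        print(name, "->", plan_action(name))
```

Keeping the decision separate from the API calls makes it possible to check the matching rules locally before the function is deployed.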

ECS system events

Cloud product system event monitoring

Operation steps

Note: Remember to grant the function's service role permission to operate on ECS.

  • Sign in to the CloudMonitor console, create alarm rules, and monitor the start and end of ECS restarts caused by instance errors or system errors.
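
For the permission note above, the service role needs at least the two ECS actions that the function calls. The following is a minimal sketch, assuming the usual RAM policy document layout; the action names `ecs:DescribeDisks` and `ecs:CreateSnapshot` are assumed to mirror the two API requests used in the code in this article, and `"Resource": "*"` is a deliberately broad placeholder that you would normally narrow.

```python
import json

# Assumed minimal RAM policy for the function's service role.
# The two actions correspond to DescribeDisksRequest and CreateSnapshotRequest.
policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecs:DescribeDisks", "ecs:CreateSnapshot"],
            "Resource": "*",  # placeholder: restrict to specific resources if possible
        }
    ],
}

# Serialize so the document can be pasted into a policy editor.
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```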

Code

# -*- coding: utf-8 -*-
import logging
import json
import random
import string
from aliyunsdkcore import client
from aliyunsdkecs.request.v20140526.CreateSnapshotRequest import CreateSnapshotRequest
from aliyunsdkecs.request.v20140526.DescribeDisksRequest import DescribeDisksRequest
from aliyunsdkcore.auth.credentials import StsTokenCredential
LOGGER = logging.getLogger()
clt = None
def handler(event, context):
  creds = context.credentials
  sts_token_credential = StsTokenCredential(creds.access_key_id, creds.access_key_secret, creds.security_token)
  # Sample event delivered by CloudMonitor (for reference only):
  '''
  {
    "product": "ECS",
    "content": {
        "executeFinishTime": "2018-06-08T01:25:37Z",
        "executeStartTime": "2018-06-08T01:23:37Z",
        "ecsInstanceName": "timewarp",
        "eventId": "e-t4nhcpqcu8fqushpn3mm",
        "eventType": "InstanceFailure.Reboot",
        "ecsInstanceId": "i-bp18l0uopocfc98xxxx" 
    },
    "resourceId": "acs:ecs:cn-hangzhou:123456789:instance/i-bp18l0uopocfc98xxxx",
    "level": "CRITICAL",
    "instanceName": "instanceName",
    "status": "Executing",
    "name": "Instance:SystemFailure.Reboot:Executing", 
    "regionId": "cn-hangzhou"
  }
  '''
  evt = json.loads(event)
  content = evt.get("content")
  ecsInstanceId = content.get("ecsInstanceId")
  regionId = evt.get("regionId")
  global clt
  clt = client.AcsClient(region_id=regionId, credential=sts_token_credential)
  # Compare event names case-insensitively; tolerate a missing name
  name = evt.get("name", "").lower()
  if name in ['Instance:SystemFailure.Reboot:Executing'.lower(), "Instance:InstanceFailure.Reboot:Executing".lower()]:
    pass
    # do other things
  
  if name in ['Instance:SystemFailure.Reboot:Executed'.lower(), "Instance:InstanceFailure.Reboot:Executed".lower()]:
    request = DescribeDisksRequest()
    # Query disks in the region reported by the event, not a hardcoded one
    request.add_query_param("RegionId", regionId)
    request.set_InstanceId(ecsInstanceId)
    response = _send_request(request)
    if response is None:
      # The API call failed and was already logged by _send_request
      return
    disks = response.get('Disks', {}).get('Disk', [])
    for disk in disks:
      diskId = disk["DiskId"]
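      snapshot_id = create_ecs_snap_by_id(diskId)
      if snapshot_id:
        LOGGER.info("Create ecs snap success, ecs id = %s , disk id = %s , snapshot id = %s ", ecsInstanceId, diskId, snapshot_id)
      else:
        LOGGER.error("Create ecs snap failed, ecs id = %s , disk id = %s ", ecsInstanceId, diskId)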
    
def create_ecs_snap_by_id(disk_id):
    LOGGER.info("Create ecs snap, disk id is %s ", disk_id)
    request = CreateSnapshotRequest()
    request.set_DiskId(disk_id)
    request.set_SnapshotName("reboot_" + ''.join(random.choice(string.ascii_lowercase) for _ in range(6)))
    response = _send_request(request)
    # _send_request returns None when the API call fails
    return response.get("SnapshotId") if response else None
# send open api request
def _send_request(request):
    request.set_accept_format('json')
    try:
        response_str = clt.do_action_with_exception(request)
        LOGGER.info(response_str)
        response_detail = json.loads(response_str)
        return response_detail
    except Exception as e:
        # Log the failure and return None so callers can detect it
        LOGGER.error(e)
        return None
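
To check the parsing logic without deploying, the field extraction in `handler` can be pulled into a pure function and run locally. The following is a sketch under assumptions: `parse_reboot_event` and the `RebootEvent` tuple are hypothetical names, not part of any SDK, and the payload is a shortened copy of the sample event shown in the code above.

```python
import json
from collections import namedtuple

# Hypothetical container for the fields the handler actually uses.
RebootEvent = namedtuple("RebootEvent", ["instance_id", "region_id", "name"])


def parse_reboot_event(raw):
    """Extract instance id, region and lowercased event name from a payload.

    `raw` may be str or bytes; json.loads accepts both on Python 3.6+.
    Raises ValueError if the payload is invalid or a required field is missing.
    """
    evt = json.loads(raw)
    content = evt.get("content") or {}
    instance_id = content.get("ecsInstanceId")
    region_id = evt.get("regionId")
    name = evt.get("name")
    if not (instance_id and region_id and name):
        raise ValueError("event is missing ecsInstanceId, regionId or name")
    return RebootEvent(instance_id, region_id, name.lower())


# Shortened sample event, encoded as bytes to mimic a raw payload.
SAMPLE = json.dumps({
    "product": "ECS",
    "content": {"eventType": "InstanceFailure.Reboot",
                "ecsInstanceId": "i-bp18l0uopocfc98xxxx"},
    "status": "Executing",
    "name": "Instance:SystemFailure.Reboot:Executing",
    "regionId": "cn-hangzhou",
}).encode("utf-8")

if __name__ == "__main__":
    print(parse_reboot_event(SAMPLE))
```

Failing fast on a missing field gives a clearer error than an `AttributeError` raised later, when `None` is passed to the ECS client.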

Posted by pirri on Thu, 05 Dec 2019 05:17:53 -0800