Function Compute
Alibaba Cloud Function Compute is an event-driven, fully managed compute service. With Function Compute, you do not need to manage servers or other infrastructure; you only write your code and upload it. Function Compute prepares the computing resources for you, runs your code elastically and reliably, and provides log query, performance monitoring, and alerting. You can quickly build any type of application or service without handling management, operations, or maintenance. You pay only for the resources your code actually consumes while it runs; code that is not running incurs no cost.
CloudMonitor
Alibaba Cloud CloudMonitor provides cloud users with an out-of-the-box, enterprise-grade, open, one-stop monitoring solution. It covers IT infrastructure monitoring, dial-test monitoring of external network quality, and business monitoring based on events, custom metrics, and logs, giving you a more efficient, comprehensive, and cost-effective monitoring service.
CloudMonitor provides a rich set of events, and the set is still growing (see "Event monitoring of cloud product system"). Using these events to trigger user-defined processing functions enables more complete automated operations and maintenance.
Topic portal => Function Compute for automated operations and maintenance
Example scenario
This article focuses on using Function Compute to handle ECS restart events, because these events currently require a high-priority response from users. For example, when an ECS instance is restarted because of a system error, the user may have to get up urgently to run checks or create a snapshot. In this example, instances that are restarted because of a system error or an instance error are handled automatically, for instance by creating a snapshot after the restart succeeds.
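The two phases this scenario cares about (restart in progress, restart finished) can be told apart from the event's name field. The following is a minimal sketch, assuming event names follow the pattern used in the function code at the end of this article, such as Instance:SystemFailure.Reboot:Executing. The helper classify_reboot_event is hypothetical and is not part of any Alibaba Cloud SDK.

```python
# Hypothetical helper: map a CloudMonitor event name to a reboot phase.
# Assumption: names look like "Instance:SystemFailure.Reboot:Executing".
REBOOT_EVENT_PREFIXES = (
    "instance:systemfailure.reboot:",
    "instance:instancefailure.reboot:",
)


def classify_reboot_event(name):
    """Return "executing", "executed", or None for unrelated events."""
    lowered = (name or "").lower()
    for prefix in REBOOT_EVENT_PREFIXES:
        if lowered.startswith(prefix):
            status = lowered[len(prefix):]
            if status in ("executing", "executed"):
                return status
    return None
```

A handler could skip snapshot creation unless this helper returns "executed", which keeps unrelated events from triggering any ECS calls.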
Event monitoring of cloud product system
Operation steps
- Create a function (the function code is at the end of the article). For how to create a function, see "Function Compute Hello World".
Note: remember to grant the role of the function's service permission to operate ECS.
- Sign in to the CloudMonitor console and create an alarm rule that monitors the start and end of ECS restarts caused by instance errors or system errors.
- Mock debugging.
- Simulate real ECS events. See "Drill system event handler? So Easy~".
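For the mock debugging step, it can help to generate test payloads instead of editing JSON by hand. The sketch below builds a payload with the same field names as the sample event shown in the function code at the end of this article. The helper build_mock_event and every value passed to it are placeholders chosen for illustration; they do not come from CloudMonitor.

```python
import json


def build_mock_event(instance_id, region_id, status="Executed",
                     failure="SystemFailure"):
    """Build a JSON string shaped like the sample ECS reboot event.

    Only the field names follow the sample event; all values are
    placeholders for local debugging.
    """
    event = {
        "product": "ECS",
        "content": {
            "ecsInstanceId": instance_id,
            "eventType": failure + ".Reboot",
        },
        "regionId": region_id,
        "level": "CRITICAL",
        "status": status,
        "name": "Instance:{}.Reboot:{}".format(failure, status),
    }
    return json.dumps(event)
```

Calling build_mock_event once with status="Executing" and once with status="Executed" gives one payload for each branch of the handler.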
Code
# -*- coding: utf-8 -*-
import json
import logging
import random
import string

from aliyunsdkcore import client
from aliyunsdkcore.auth.credentials import StsTokenCredential
from aliyunsdkecs.request.v20140526.CreateSnapshotRequest import CreateSnapshotRequest
from aliyunsdkecs.request.v20140526.DescribeDisksRequest import DescribeDisksRequest

LOGGER = logging.getLogger()
clt = None

# Event names (lowercased) that mark the start and the end of a restart.
REBOOT_EXECUTING = ['instance:systemfailure.reboot:executing',
                    'instance:instancefailure.reboot:executing']
REBOOT_EXECUTED = ['instance:systemfailure.reboot:executed',
                   'instance:instancefailure.reboot:executed']

# Sample event:
# {
#   "product": "ECS",
#   "content": {
#     "executeFinishTime": "2018-06-08T01:25:37Z",
#     "executeStartTime": "2018-06-08T01:23:37Z",
#     "ecsInstanceName": "timewarp",
#     "eventId": "e-t4nhcpqcu8fqushpn3mm",
#     "eventType": "InstanceFailure.Reboot",
#     "ecsInstanceId": "i-bp18l0uopocfc98xxxx"
#   },
#   "resourceId": "acs:ecs:cn-hangzhou:123456789:instance/i-bp18l0uopocfc98xxxx",
#   "level": "CRITICAL",
#   "instanceName": "instanceName",
#   "status": "Executing",
#   "name": "Instance:SystemFailure.Reboot:Executing",
#   "regionId": "cn-hangzhou"
# }


def handler(event, context):
    # Use the temporary credentials of the role configured on the service.
    creds = context.credentials
    sts_token_credential = StsTokenCredential(
        creds.access_key_id, creds.access_key_secret, creds.security_token)

    evt = json.loads(event)
    content = evt.get("content")
    ecsInstanceId = content.get("ecsInstanceId")
    regionId = evt.get("regionId")

    global clt
    clt = client.AcsClient(region_id=regionId, credential=sts_token_credential)

    name = evt.get("name").lower()

    if name in REBOOT_EXECUTING:
        pass  # the restart is in progress; do other things here

    if name in REBOOT_EXECUTED:
        # The restart has finished: snapshot every disk of the instance.
        request = DescribeDisksRequest()
        # Use the region from the event instead of a hard-coded region.
        request.add_query_param("RegionId", regionId)
        request.set_InstanceId(ecsInstanceId)
        response = _send_request(request)
        if response is None:
            LOGGER.error("Describe disks failed, ecs id = %s", ecsInstanceId)
            return
        disks = response.get('Disks').get('Disk', [])
        for disk in disks:
            diskId = disk["DiskId"]
            snapshotId = create_ecs_snap_by_id(diskId)
            if snapshotId:
                LOGGER.info("Create ecs snap success, ecs id = %s , disk id = %s ",
                            ecsInstanceId, diskId)


def create_ecs_snap_by_id(disk_id):
    LOGGER.info("Create ecs snap, disk id is %s ", disk_id)
    request = CreateSnapshotRequest()
    request.set_DiskId(disk_id)
    # Snapshot name: "reboot_" plus six random lowercase letters.
    request.set_SnapshotName(
        "reboot_" + ''.join(random.choice(string.ascii_lowercase) for _ in range(6)))
    response = _send_request(request)
    return response.get("SnapshotId") if response else None


# send open api request
def _send_request(request):
    request.set_accept_format('json')
    try:
        response_str = clt.do_action_with_exception(request)
        LOGGER.info(response_str)
        return json.loads(response_str)
    except Exception as e:
        # Log the failure; callers receive None and must check for it.
        LOGGER.error(e)
        return None
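Because _send_request returns None when a request fails, any code that reads the DescribeDisks result has to tolerate a missing response. The following is a minimal sketch of one way to isolate that parsing so it can be tested without calling ECS. It assumes the response shape the handler already relies on, a "Disks" object holding a "Disk" list whose items carry a "DiskId"; extract_disk_ids is a hypothetical helper name.

```python
def extract_disk_ids(response):
    """Return the disk IDs from a parsed DescribeDisks response.

    Tolerates None (a failed request) and missing keys by returning [].
    """
    if not isinstance(response, dict):
        return []
    disks = (response.get("Disks") or {}).get("Disk") or []
    return [disk["DiskId"] for disk in disks if disk.get("DiskId")]
```

With this helper, the snapshot loop can iterate over extract_disk_ids(response) and simply do nothing when the query failed.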