How to report errors through Kubernetes events

Keywords: Kubernetes

This article was first published at https://robberphex.com/error-reporting-with-kubernetes-events/

Our team maintains a Kubernetes webhook that intercepts pod creation requests and makes some modifications (such as adding environment variables or init containers).

The business logic itself is very simple, but errors in the process are hard to handle. Either we reject the pod creation, in which case the application may not start, or we skip the business logic, in which case it fails silently and no one knows that an error occurred.

The obvious idea is to hook into an alerting system, but that would couple the component to one specific alerting system.

Kubernetes has an Event mechanism that can record warnings, errors, and other information, and it is a better fit for this scenario.

What is an Event in Kubernetes?

Event is one of many resource objects in Kubernetes. It is usually used to record state changes in the cluster, ranging from cluster node exceptions to Pod startup and successful scheduling.

For example, if we describe a pod, we can see the events that belong to it:

kubectl describe pod sc-b-68867c5dcb-sf9hn

As you can see, everything from scheduling, to startup, to the pod finally failing to pull its image is recorded as events.

Next, let's look at the structure of an Event:

kubectl get events -o json | jq '.items[10]'

{
  "apiVersion": "v1",
  "count": 1,
  "eventTime": null,
  "firstTimestamp": "2021-12-04T17:02:14Z",
  "involvedObject": {
    "apiVersion": "v1",
    "fieldPath": "spec.containers{sc-b}",
    "kind": "Pod",
    "name": "sc-b-68867c5dcb-sf9hn",
    "namespace": "default",
    "resourceVersion": "322554830",
    "uid": "24df4a07-f41e-42c2-ba26-d90940303b00"
  },
  "kind": "Event",
  "lastTimestamp": "2021-12-04T17:02:14Z",
  "message": "Error: ErrImagePull",
  "metadata": {
    "creationTimestamp": "2021-12-04T17:02:14Z",
    "name": "sc-b-68867c5dcb-sf9hn.16bd9bf933d60437",
    "namespace": "default",
    "resourceVersion": "1197082",
    "selfLink": "/api/v1/namespaces/default/events/sc-b-68867c5dcb-sf9hn.16bd9bf933d60437",
    "uid": "f928ff2d-c618-44a6-bf5a-5b0d3d20e95e"
  },
  "reason": "Failed",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {
    "component": "kubelet",
    "host": "eci"
  },
  "type": "Warning"
}

You can see that an event has several important fields:

  • type - the event type, currently either Normal or Warning
  • reason - the cause of the event, such as Failed, Scheduled, Started, Completed, etc.
  • message - a description of the event
  • involvedObject - the resource object this event is about, such as a Pod, a Node, etc.
  • source - the component that reported the event, such as kubelet, kube-apiserver, etc.
  • firstTimestamp, lastTimestamp - the times of the first and last occurrence of this event

Based on this information, we can do cluster-level monitoring and alerting. For example, Alibaba Cloud ACK can send Events to SLS and then raise alerts according to the corresponding rules.
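As a rough illustration of rule-based alerting on these fields, the sketch below decodes the output of kubectl get events -o json and keeps only the Warning events. It uses only the Go standard library. The Event struct is a hypothetical, trimmed-down mirror of the fields listed above, not the SDK type, and warnings is an illustrative helper name.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Event is a hypothetical, trimmed-down mirror of the fields discussed above.
// It is not the SDK type; it only names the JSON keys we want to read.
type Event struct {
    Type           string `json:"type"`
    Reason         string `json:"reason"`
    Message        string `json:"message"`
    InvolvedObject struct {
        Kind      string `json:"kind"`
        Name      string `json:"name"`
        Namespace string `json:"namespace"`
    } `json:"involvedObject"`
}

// eventList matches the top-level shape of `kubectl get events -o json`.
type eventList struct {
    Items []Event `json:"items"`
}

// warnings decodes an event list and returns only events of type Warning.
func warnings(data []byte) ([]Event, error) {
    var list eventList
    if err := json.Unmarshal(data, &list); err != nil {
        return nil, err
    }
    var out []Event
    for _, e := range list.Items {
        if e.Type == "Warning" {
            out = append(out, e)
        }
    }
    return out, nil
}

func main() {
    // Sample input shaped like the Event shown above, plus one Normal event.
    sample := []byte(`{"items":[
      {"type":"Normal","reason":"Scheduled","message":"ok",
       "involvedObject":{"kind":"Pod","name":"sc-b-68867c5dcb-sf9hn","namespace":"default"}},
      {"type":"Warning","reason":"Failed","message":"Error: ErrImagePull",
       "involvedObject":{"kind":"Pod","name":"sc-b-68867c5dcb-sf9hn","namespace":"default"}}
    ]}`)
    ws, err := warnings(sample)
    if err != nil {
        panic(err)
    }
    for _, e := range ws {
        fmt.Printf("%s %s/%s: %s (%s)\n",
            e.InvolvedObject.Kind, e.InvolvedObject.Namespace, e.InvolvedObject.Name,
            e.Message, e.Reason)
    }
}
```

A real alerting rule would probably also match on reason or involvedObject.kind; filtering on type alone is just the simplest case.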

How to report events

So far we have only looked at Events that already exist. To make the Kubernetes cluster aware that something happened in our own component, we have to report the Event ourselves, so that monitoring and alerting can pick it up.

How to access the Kubernetes API

The first step in reporting events is to access the Kubernetes API, which is a RESTful API. Kubernetes also provides an SDK built on top of this API that can be used directly.
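Because the API is RESTful, creating an Event is ultimately an HTTP POST of a JSON object to the events collection of a namespace. The sketch below only builds such a request with the Go standard library and does not send it. The path follows the selfLink shown earlier; the server address, the bearer token, the reason and message values, and the helper name newCreateEventRequest are placeholders assumed for illustration. In practice the SDK takes care of authentication and TLS.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// newCreateEventRequest builds (but does not send) the POST request that
// creates an Event in the given namespace. apiServer and token are
// placeholders; event is any value that marshals to an Event JSON object.
func newCreateEventRequest(apiServer, token, namespace string, event interface{}) (*http.Request, error) {
    body, err := json.Marshal(event)
    if err != nil {
        return nil, err
    }
    endpoint := fmt.Sprintf("%s/api/v1/namespaces/%s/events", apiServer, namespace)
    req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Authorization", "Bearer "+token)
    return req, nil
}

func main() {
    // Placeholder Event body; reason and message are made-up values.
    event := map[string]interface{}{
        "apiVersion": "v1",
        "kind":       "Event",
        "metadata":   map[string]interface{}{"generateName": "example-"},
        "type":       "Warning",
        "reason":     "ExampleReason",
        "message":    "example message",
        "involvedObject": map[string]interface{}{
            "kind": "Pod", "name": "sc-b-68867c5dcb-sf9hn", "namespace": "default",
        },
    }
    // The address and token below are made-up placeholders.
    req, err := newCreateEventRequest("https://kubernetes.example:6443", "TOKEN", "default", event)
    if err != nil {
        panic(err)
    }
    fmt.Println(req.Method, req.URL.Path)
}
```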

There are two ways to connect to the Kubernetes API through the SDK:

The first is through a kubeconfig file (from outside the cluster), and the second is through a service account (from inside a Pod).
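To make the second way a little more concrete: inside a Pod, the API server address is exposed through the environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, and the service account token is mounted under /var/run/secrets/kubernetes.io/serviceaccount. The SDK's rest.InClusterConfig() reads these for you. The standard-library sketch below only shows the discovery step, and the helper name inClusterEndpoint is made up for illustration.

```go
package main

import (
    "errors"
    "fmt"
    "net"
    "os"
)

// tokenPath is where Kubernetes mounts the service account token in a Pod.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// inClusterEndpoint derives the API server URL from the environment.
// getenv is injected so the function can be exercised outside a cluster.
func inClusterEndpoint(getenv func(string) string) (string, error) {
    host := getenv("KUBERNETES_SERVICE_HOST")
    port := getenv("KUBERNETES_SERVICE_PORT")
    if host == "" || port == "" {
        return "", errors.New("not running inside a Pod: service env vars are unset")
    }
    // JoinHostPort also brackets IPv6 addresses correctly.
    return "https://" + net.JoinHostPort(host, port), nil
}

func main() {
    endpoint, err := inClusterEndpoint(os.Getenv)
    if err != nil {
        // Outside a cluster, the kubeconfig file is the way to connect.
        fmt.Println("falling back to kubeconfig:", err)
        return
    }
    fmt.Println("API server:", endpoint, "token file:", tokenPath)
}
```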

For simplicity, we use the first method as an example:

package main

import (
    "flag"
    "fmt"
    "path/filepath"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/homedir"
)

func main() {
    var kubeconfig *string
    if home := homedir.HomeDir(); home != "" {
        kubeconfig = flag.String("kubeconfig", filepath.Join(home, ".kube", "config"), "(optional) absolute path to the kubeconfig file")
    } else {
        kubeconfig = flag.String("kubeconfig", "", "absolute path to the kubeconfig file")
    }
    flag.Parse()

    config, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }
    versionInfo, err := clientset.ServerVersion()
    if err != nil {
        panic(err)
    }
    fmt.Printf("Version: %#v\n", versionInfo)
}

Run this code to connect to the cluster and get the Kubernetes Server version:

Version: &version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"27f24d2", GitTreeState:"", BuildDate:"2021-08-19T10:00:16Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

How to create and report events

The example above gave us a clientset object. We now use this object to create an event in the Kubernetes cluster:

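// Additional imports needed by this snippet:
//   "context", "time"
//   apiv1 "k8s.io/api/core/v1"
//   metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
now := time.Now()
message := "test message at " + now.Format(time.RFC3339)
// Create the Event in the "default" namespace.
// client-go v0.18+ takes a context and CreateOptions; older versions take only the Event.
_, err = clientset.CoreV1().Events("default").Create(context.TODO(), &apiv1.Event{
    ObjectMeta: metav1.ObjectMeta{
        // The API server appends a random suffix to this prefix.
        GenerateName: "test-",
    },
    Type:           "Warning",
    Message:        message,
    Reason:         "OnePilotFail",
    FirstTimestamp: metav1.NewTime(now),
    LastTimestamp:  metav1.NewTime(now),
    // The object this Event is about.
    InvolvedObject: apiv1.ObjectReference{
        Namespace: "default",
        Kind:      "Deployment",
        Name:      "sc-b",
    },
}, metav1.CreateOptions{})
fmt.Printf("create event with err: %v\n", err)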

In the above example, we created an Event of type Warning in the default namespace. Its name is generated by the API server from the prefix test- (for example, test-vvjzp).

We can also take a look at the final Event:

kubectl get events -o json | jq .items[353]

{
  "apiVersion": "v1",
  "eventTime": null,
  "firstTimestamp": "2021-12-04T17:27:06Z",
  "involvedObject": {
    "kind": "Deployment",
    "name": "sc-b",
    "namespace": "default"
  },
  "kind": "Event",
  "lastTimestamp": "2021-12-04T17:27:06Z",
  "message": "test message at 2021-12-05T01:27:06+08:00",
  "metadata": {
    "creationTimestamp": "2021-12-04T17:27:06Z",
    "generateName": "test-",
    "name": "test-vvjzp",
    "namespace": "default",
    "resourceVersion": "1198057",
    "selfLink": "/api/v1/namespaces/default/events/test-vvjzp",
    "uid": "f2bcdd1c-442f-4f61-921a-e18637ee5871"
  },
  "reason": "OnePilotFail",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {},
  "type": "Warning"
}

In this way, people who care about these events, such as operations staff, can build monitoring and alerting on top of this information.
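Coming back to the webhook from the introduction, one possible shape for the error path is to let the Pod through unchanged and report a Warning Event instead of failing silently. The following standard-library sketch is an assumption about how that could be wired, not the actual webhook: ReportFunc, logReport, mutatePod, handle, and the reason MutateFailed are hypothetical names, and a real ReportFunc would create the Event through the clientset as shown above.

```go
package main

import (
    "errors"
    "fmt"
)

// ReportFunc is a hypothetical abstraction over "create a Warning Event".
// A real implementation would call the Events API through the clientset.
type ReportFunc func(namespace, podName, reason, message string) error

// logReport is a stand-in that prints instead of talking to a cluster.
func logReport(namespace, podName, reason, message string) error {
    fmt.Printf("Warning %s %s/%s: %s\n", reason, namespace, podName, message)
    return nil
}

// mutatePod stands in for the webhook's business logic; here it always fails
// so that the error path is exercised.
func mutatePod(podName string) error {
    return errors.New("cannot inject environment variables")
}

// handle applies the mutation. On failure it reports an Event and still
// allows the Pod (fail-open), so the error is visible but not blocking.
func handle(report ReportFunc, namespace, podName string, mutate func(string) error) bool {
    if err := mutate(podName); err != nil {
        if repErr := report(namespace, podName, "MutateFailed", err.Error()); repErr != nil {
            // Reporting is best effort; never block Pod creation on it.
            fmt.Println("could not report event:", repErr)
        }
    }
    return true
}

func main() {
    allowed := handle(logReport, "default", "sc-b-68867c5dcb-sf9hn", mutatePod)
    fmt.Println("allowed:", allowed)
}
```

Passing the reporter in as a function keeps the webhook logic free of any specific alerting system, which is the decoupling the Event mechanism is meant to provide.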

Usage scenario

Unlike business events, Kubernetes events are resources in the cluster, and the people who pay attention to them are mostly the maintainers of the cluster.

Therefore, this event reporting mechanism is best suited to infrastructure components that need to let cluster maintainers know the current state of the cluster.

If more flexible alerting and monitoring is needed, an event and alerting system that is closer to the business and has richer rules can be used instead.

Posted by wobbit on Sun, 05 Dec 2021 19:22:26 -0800