A thorough understanding of Kubernetes CNI

Keywords: Linux, network, JSON, Kubernetes, shell


The CNI interface is actually very simple, so newcomers should not be intimidated by it. This article combines theory and practice; read it carefully and you will come away with a thorough understanding of how CNI works.

Environment

When we install Kubernetes, we deliberately do not install a CNI plugin first. If you are using the sealyun offline package, modify kube/shell/master.sh so that it contains only the following:

[root@helix105 shell]# cat master.sh 
kubeadm init --config ../conf/kubeadm.yaml
mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config

kubectl taint nodes --all node-role.kubernetes.io/master-

Clear the CNI-related directories:

rm -rf /opt/cni/bin/*
rm -rf /etc/cni/net.d/*

Start Kubernetes (if you have already installed it, run kubeadm reset first):

cd kube/shell && sh init.sh && sh master.sh

At this point the node is NotReady, and coredns cannot get an IP address assigned either:

[root@helix105 shell]# kubectl get pod -n kube-system -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP              NODE                    NOMINATED NODE   READINESS GATES
coredns-5c98db65d4-5fh6c                        0/1     Pending   0          54s   <none>          <none>                  <none>           <none>
coredns-5c98db65d4-dbwmq                        0/1     Pending   0          54s   <none>          <none>                  <none>           <none>
kube-controller-manager-helix105.hfa.chenqian   1/1     Running   0          19s   172.16.60.105   helix105.hfa.chenqian   <none>           <none>
kube-proxy-k74ld                                1/1     Running   0          54s   172.16.60.105   helix105.hfa.chenqian   <none>           <none>
kube-scheduler-helix105.hfa.chenqian            1/1     Running   0          14s   172.16.60.105   helix105.hfa.chenqian   <none>           <none>
[root@helix105 shell]# kubectl get node
NAME                    STATUS     ROLES    AGE   VERSION
helix105.hfa.chenqian   NotReady   master   86s   v1.15.0

Installation of CNI

Create the CNI configuration files

$ mkdir -p /etc/cni/net.d
$ cat >/etc/cni/net.d/10-mynet.conf <<EOF
{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
EOF
$ cat >/etc/cni/net.d/99-loopback.conf <<EOF
{
    "cniVersion": "0.2.0",
    "name": "lo",
    "type": "loopback"
}
EOF

Of these two configurations, the first plugs a network interface into the container (one end of a veth pair) and attaches the other end to the bridge; the second takes care of the loopback interface.

Once these are configured, the node becomes Ready:

[root@helix105 shell]# kubectl get node
NAME                    STATUS   ROLES    AGE   VERSION
helix105.hfa.chenqian   Ready    master   15m   v1.15.0

But coredns will stay stuck in ContainerCreating, because the plugin binaries do not exist yet:

failed to find plugin "bridge" in path [/opt/cni/bin]

The plugins repository (containernetworking/plugins) contains many CNI implementations, including the bridge plugin we configured above. Build it and copy the binaries into place:

$ cd $GOPATH/src/github.com/containernetworking/plugins
$ ./build_linux.sh
$ cp bin/* /opt/cni/bin
$ ls bin/
bandwidth  dhcp      flannel      host-local  loopback  portmap  sbr     tuning
bridge     firewall  host-device  ipvlan      macvlan   ptp      static  vlan

There are a lot of binaries here, but while learning we do not need to pay attention to all of them; just look at ptp (which simply creates a veth pair) or bridge.
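
For comparison, a minimal ptp configuration could look roughly like the following (a sketch only; the name and subnet are illustrative). Instead of a bridge it just creates a veth pair between the host and the container, again using host-local IPAM:

{
    "cniVersion": "0.2.0",
    "name": "myptpnet",
    "type": "ptp",
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.23.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}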

Check coredns again: it can now get an address assigned:

[root@helix105 plugins]# kubectl get pod -n kube-system -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP              NODE                    NOMINATED NODE   READINESS GATES
coredns-5c98db65d4-5fh6c                        1/1     Running   0          3h10m   10.22.0.8       helix105.hfa.chenqian   <none>           <none>
coredns-5c98db65d4-dbwmq                        1/1     Running   0          3h10m   10.22.0.9

Look at the bridge: there are now two veth devices attached to cni0, just as we configured. In the CNI configuration above, the type field specifies which plugin binary to run and the bridge field specifies the bridge name:

[root@helix105 plugins]# brctl show
bridge name    bridge id        STP enabled    interfaces
cni0        8000.8ef6ac49c2f7    no        veth1b28b06f
                                        veth1c093940

Principle

To better understand what kubelet is doing, we can walk through a script that does the same thing; the script can also be used to test your CNI setup:

To keep it easy to read, I have removed the parts that are not important; the original script can be found at the link.

# Create a container first, just to get a net namespace
contid=$(docker run -d --net=none golang:1.12.7 /bin/sleep 10000000) 
pid=$(docker inspect -f '{{ .State.Pid }}' $contid)
netnspath=/proc/$pid/ns/net # This is what we need.

When kubelet starts a pod it does much the same thing: it first creates a container (the pause container) just to obtain the pod's network namespace, then passes that namespace to CNI and lets CNI configure it.

./exec-plugins.sh add $contid $netnspath # Pass two arguments to the next script: the container id and the netns path

docker run --net=container:$contid $@

The lines below come from exec-plugins.sh, which is what actually calls the CNI plugin:

NETCONFPATH=${NETCONFPATH-/etc/cni/net.d}

i=0
# Get container id and network ns
contid=$2 
netns=$3

# Several environment variables are set up here, and CNI command line tools can get these parameters.
export CNI_COMMAND=$(echo $1 | tr '[:lower:]' '[:upper:]')
export PATH=$CNI_PATH:$PATH # This specifies the path of the CNI bin file
export CNI_CONTAINERID=$contid 
export CNI_NETNS=$netns

for netconf in $(echo $NETCONFPATH/10-mynet.conf | sort); do
        name=$(jq -r '.name' <$netconf)
        plugin=$(jq -r '.type' <$netconf) # The type field of CNI configuration file corresponds to the binary program name
        export CNI_IFNAME=$(printf eth%d $i) # Name of the network interface inside the container

        # Run the plugin binary; the CNI configuration file is passed to it on standard input.
        res=$($plugin <$netconf)
        if [ $? -ne 0 ]; then
                errmsg=$(echo $res | jq -r '.msg')
                if [ -z "$errmsg" ]; then
                        errmsg=$res
                fi

                echo "${name} : error executing $CNI_COMMAND: $errmsg"
                exit 1
        fi
        # The plugin prints its result (the container address and so on) on standard
        # output; that is how kubelet gets this information back.

        let "i=i+1"
done

To sum up:

         CNI Profile
         Container ID
         Network ns
kubelet -------------->  CNI command
   ^                        |
   |                        |
   +------------------------+
       RESULTS STANDARD OUTPUT
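
So the caller only has to export a handful of environment variables, feed the configuration file to the plugin binary on stdin, and read the JSON result back from stdout. A minimal Go sketch of that handshake (the container id and netns path here are made-up examples, and error handling is reduced to panics) might look like this:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical values; kubelet would take these from the pod sandbox.
	containerID := "0123456789abcdef"
	netnsPath := "/proc/12345/ns/net"

	conf, err := os.Open("/etc/cni/net.d/10-mynet.conf")
	if err != nil {
		panic(err)
	}
	defer conf.Close()

	// The "type" field in 10-mynet.conf is "bridge", so run /opt/cni/bin/bridge.
	cmd := exec.Command("/opt/cni/bin/bridge")
	cmd.Stdin = conf // the CNI config is handed over on standard input
	cmd.Env = append(os.Environ(),
		"CNI_COMMAND=ADD", // ADD puts the container on the network, DEL removes it
		"CNI_CONTAINERID="+containerID,
		"CNI_NETNS="+netnsPath,
		"CNI_IFNAME=eth0",       // interface name to create inside the container
		"CNI_PATH=/opt/cni/bin", // where the plugin can find helpers such as host-local
	)

	out, err := cmd.Output() // the plugin prints its Result as JSON on stdout
	if err != nil {
		panic(err)
	}
	fmt.Printf("CNI result: %s\n", out)
}

kubelet does essentially the same thing, just programmatically (through the libcni package) rather than via a shell script.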

bridge CNI Implementation

Now that we have seen how simple the interface is, we can look at how a real plugin implements it:

bridge CNI code

//cmdAdd is responsible for creating the network
func cmdAdd(args *skel.CmdArgs) error 

// All the inputs end up in this struct: the fields above come from environment variables, and the CNI configuration arrives via stdin (StdinData).
type CmdArgs struct {
    ContainerID string
    Netns       string
    IfName      string
    Args        string
    Path        string
    StdinData   []byte
}

So besides the well-known fields such as name and type, a CNI configuration file can also carry extra fields of your own; the plugin just has to parse them itself, as sketched below.
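
As a rough sketch (the struct below is simplified and MyOption is an invented field; the real bridge plugin embeds the CNI library's base config type instead), parsing those extra fields out of StdinData is just ordinary JSON decoding:

package main

import (
	"encoding/json"
	"fmt"
)

// NetConf mirrors the JSON in /etc/cni/net.d/10-mynet.conf.
// MyOption is purely illustrative, to show that extra fields are possible.
type NetConf struct {
	CNIVersion string `json:"cniVersion"`
	Name       string `json:"name"`
	Type       string `json:"type"`
	Bridge     string `json:"bridge"`
	MyOption   string `json:"myOption"`
}

// loadNetConf parses the configuration that arrives on stdin (args.StdinData).
func loadNetConf(stdin []byte) (*NetConf, error) {
	conf := &NetConf{}
	if err := json.Unmarshal(stdin, conf); err != nil {
		return nil, fmt.Errorf("failed to parse network configuration: %v", err)
	}
	return conf, nil
}

func main() {
	stdin := []byte(`{"cniVersion": "0.2.0", "name": "mynet", "type": "bridge", "bridge": "cni0", "myOption": "hello"}`)
	conf, err := loadNetConf(stdin)
	if err != nil {
		panic(err)
	}
	fmt.Println(conf.Bridge, conf.MyOption) // prints: cni0 hello
}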

After that, everything else is up to the plugin itself.

// The veth pair is created here and the host end is attached to the cni0 bridge.
hostInterface, containerInterface, err := setupVeth(netns, br, args.IfName, n.MTU, n.HairpinMode, n.Vlan)

The attachment is done by calling the netlink library (the same library sealos uses for its kernel-level load balancing).

err := netns.Do(func(hostNS ns.NetNS) error { // Create the veth pair inside the container's network namespace
    hostVeth, containerVeth, err := ip.SetupVeth(ifName, mtu, hostNS)
    ...
    // Record the container-side interface name, MAC address and sandbox (netns path)
    contIface.Name = containerVeth.Name
    contIface.Mac = containerVeth.HardwareAddr.String()
    contIface.Sandbox = netns.Path()
    hostIface.Name = hostVeth.Name
    return nil
})
...

// Look up the host end of the veth pair by name
hostVeth, err := netlink.LinkByName(hostIface.Name)
...
hostIface.Mac = hostVeth.Attrs().HardwareAddr.String()

// Attach the host end of the veth pair to the bridge
if err := netlink.LinkSetMaster(hostVeth, br); err != nil {}

// Setting up hairpin mode
if err = netlink.LinkSetHairpin(hostVeth, hairpinMode); err != nil {
}

// Setting vlanid
if vlanID != 0 {
    err = netlink.BridgeVlanAdd(hostVeth, uint16(vlanID), true, true, false, true)
}

return hostIface, contIface, nil

Finally, the result is returned:

type Result struct {
    CNIVersion string         `json:"cniVersion,omitempty"`
    Interfaces []*Interface   `json:"interfaces,omitempty"`
    IPs        []*IPConfig    `json:"ips,omitempty"`
    Routes     []*types.Route `json:"routes,omitempty"`
    DNS        types.DNS      `json:"dns,omitempty"`
}

// PrintTo writes the result to standard output; that is how kubelet receives the return message.
func (r *Result) PrintTo(writer io.Writer) error {
    data, err := json.MarshalIndent(r, "", "    ")
    if err != nil {
        return err
    }
    _, err = writer.Write(data)
    return err
}

For example:

{
  "cniVersion": "0.4.0",
  "interfaces": [                                            (this key omitted by IPAM plugins)
      {
          "name": "<name>",
          "mac": "<MAC address>",                            (required if L2 addresses are meaningful)
          "sandbox": "<netns path or hypervisor identifier>" (required for container/hypervisor interfaces, empty/omitted for host interfaces)
      }
  ],
  "ips": [
      {
          "version": "<4-or-6>",
          "address": "<ip-and-prefix-in-CIDR>",
          "gateway": "<ip-address-of-the-gateway>",          (optional)
          "interface": <numeric index into 'interfaces' list>
      },
      ...
  ],
  "routes": [                                                (optional)
      {
          "dst": "<ip-and-prefix-in-cidr>",
          "gw": "<ip-of-next-hop>"                           (optional)
      },
      ...
  ],
  "dns": {                                                   (optional)
    "nameservers": <list-of-nameservers>                     (optional)
    "domain": <name-of-local-domain>                         (optional)
    "search": <list-of-additional-search-domains>            (optional)
    "options": <list-of-options>                             (optional)
  }
}

Summary

The CNI interface itself is very simple; most of the work lies in the implementation of the plugin. With what is covered above you could implement a CNI plugin of your own. Pretty cool, right? It will also make you more familiar with networking and more confident when investigating network problems.


To discuss further, you can join the QQ group: 98488045
