Rate Computation for Data Received and Sent by IPVS

Keywords: network Linux

This article describes how IPVS computes the rates of the data it receives and sends.

Enabling the estimator

The ip_vs_start_estimator function adds the estimator embedded in its second parameter, stats, to the estimator list of the ipvs network namespace.

void ip_vs_start_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats)
{   
    struct ip_vs_estimator *est = &stats->est;
    
    INIT_LIST_HEAD(&est->list);
    
    spin_lock_bh(&ipvs->est_lock);
    list_add(&est->list, &ipvs->est_list);
    spin_unlock_bh(&ipvs->est_lock);
}

The following functions call ip_vs_start_estimator to start an estimator. The call in __ip_vs_update_dest starts the estimator for each real server, the call in ip_vs_add_service starts the estimator for the virtual service, and the call in ip_vs_control_net_init_sysctl starts the estimator for the total statistics of the network namespace.

static void __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, struct ip_vs_dest_user_kern *udest, int add)
{
    struct netns_ipvs *ipvs = svc->ipvs;

    if (add) {
        ip_vs_start_estimator(svc->ipvs, &dest->stats);
    }
}
static int ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u, struct ip_vs_service **svc_p)
{
    ip_vs_start_estimator(ipvs, &svc->stats);
}
static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
{
    ip_vs_start_estimator(ipvs, &ipvs->tot_stats);
}

The IPVS rate estimator is initialized by the function ip_vs_estimator_net_init. The key step is setting up an estimation timer that expires after 2 seconds and whose timeout handler is estimation_timer. This setup is done per network namespace, so each namespace has its own ipvs estimator.

int __net_init ip_vs_estimator_net_init(struct netns_ipvs *ipvs)
{
    INIT_LIST_HEAD(&ipvs->est_list);
    spin_lock_init(&ipvs->est_lock);
    timer_setup(&ipvs->est_timer, estimation_timer, 0);
    mod_timer(&ipvs->est_timer, jiffies + 2 * HZ);
    return 0;
}

In the timeout function estimation_timer, five rates are calculated: connection rate, input packet rate, output packet rate, input byte rate and output byte rate. All five are updated every 2 seconds. Each update measures the rate over the last 2-second interval and folds it into a running average with a weight of 1/4, so the average mainly reflects roughly the last 8 seconds of traffic. See the following formula:

avgrate = avgrate*(1-W) + rate*W

where W = 2^(-2)

Take the connection rate cps as an example: e->cps = e->cps + (rate - e->cps) * 1/4 = e->cps * (1 - 1/4) + rate * 1/4. The multiplication by 1/4 is implemented as a right shift by 2 bits. To avoid losing precision in that right shift, the kernel keeps the rates in scaled fixed-point form: the connection and packet deltas are shifted left by 9 bits, and the inbytes and outbytes deltas by 4 bits. Because each delta covers 2 seconds, a left shift by 9 bits is the same as dividing by 2 and scaling by 2^10, so e->cps holds the average number of new connections per second multiplied by 2^10, not a raw count.

static void estimation_timer(struct timer_list *t)
{   
    struct ip_vs_estimator *e;
    struct ip_vs_stats *s;
    u64 rate;
    struct netns_ipvs *ipvs = from_timer(ipvs, t, est_timer);
    
    spin_lock(&ipvs->est_lock);
    list_for_each_entry(e, &ipvs->est_list, list) {
        s = container_of(e, struct ip_vs_stats, est);
        
        spin_lock(&s->lock);
        ip_vs_read_cpu_stats(&s->kstats, s->cpustats);
        
        /* scaled by 2^10, but divided 2 seconds */
        rate = (s->kstats.conns - e->last_conns) << 9;
        e->last_conns = s->kstats.conns;
        e->cps += ((s64)rate - (s64)e->cps) >> 2;
        
        rate = (s->kstats.inpkts - e->last_inpkts) << 9;
        e->last_inpkts = s->kstats.inpkts;
        e->inpps += ((s64)rate - (s64)e->inpps) >> 2;
        
        rate = (s->kstats.outpkts - e->last_outpkts) << 9;
        e->last_outpkts = s->kstats.outpkts;
        e->outpps += ((s64)rate - (s64)e->outpps) >> 2;
        
        /* scaled by 2^5, but divided 2 seconds */
        rate = (s->kstats.inbytes - e->last_inbytes) << 4;
        e->last_inbytes = s->kstats.inbytes;
        e->inbps += ((s64)rate - (s64)e->inbps) >> 2;
        
        rate = (s->kstats.outbytes - e->last_outbytes) << 4;
        e->last_outbytes = s->kstats.outbytes;
        e->outbps += ((s64)rate - (s64)e->outbps) >> 2;
        spin_unlock(&s->lock);
    }
    spin_unlock(&ipvs->est_lock);
    mod_timer(&ipvs->est_timer, jiffies + 2*HZ);
}
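The fixed-point arithmetic of estimation_timer can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: the names est_model, est_model_tick and est_model_read are hypothetical, and it models only the cps field. Like the kernel code, it assumes that right-shifting a negative signed value is an arithmetic shift, which is what GCC and Clang do.

```c
#include <stdint.h>

/* Hypothetical user-space model of a single estimator field (cps). */
struct est_model {
    uint64_t last_count; /* counter value seen at the previous tick */
    uint64_t scaled_avg; /* average per-second rate, scaled by 2^10 */
};

/* One 2-second tick, same arithmetic as the cps lines of estimation_timer. */
static void est_model_tick(struct est_model *e, uint64_t count_now)
{
    /* The delta covers 2 seconds: << 9 equals (delta / 2) << 10. */
    uint64_t rate = (count_now - e->last_count) << 9;

    e->last_count = count_now;
    /* avg += (rate - avg) / 4; signed so the average can also decrease. */
    e->scaled_avg += ((int64_t)rate - (int64_t)e->scaled_avg) >> 2;
}

/* Convert back to connections per second, rounding like ip_vs_read_estimator. */
static uint64_t est_model_read(const struct est_model *e)
{
    return (e->scaled_avg + 0x1FF) >> 10;
}
```

Starting from zero and feeding a counter that grows by 200 on every tick (100 new connections per second), the scaled average is 25600 after the first tick, which reads back as 25, and the value read back approaches 100 over the following ticks.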

The left shift is performed in the timer function estimation_timer, and the matching right shift is performed in the rate reading function ip_vs_read_estimator. Take cps as an example: the delta shifted left by 9 bits in estimation_timer covers 2 seconds, so converting e->cps back to connections per second requires a right shift of 10 bits instead of 9. The constants 0x1FF and 0xF added before the shifts round the result to the nearest integer instead of truncating it.

void ip_vs_read_estimator(struct ip_vs_kstats *dst, struct ip_vs_stats *stats)
{
    struct ip_vs_estimator *e = &stats->est;

    dst->cps = (e->cps + 0x1FF) >> 10;
    dst->inpps = (e->inpps + 0x1FF) >> 10;
    dst->outpps = (e->outpps + 0x1FF) >> 10;
    dst->inbps = (e->inbps + 0xF) >> 5;
    dst->outbps = (e->outbps + 0xF) >> 5;
}
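The rounding done when reading the rates can be checked with a few numbers. The sketch below is a user-space illustration with hypothetical helper names (read_scaled_10 and read_scaled_5); it only repeats the two expressions used in ip_vs_read_estimator.

```c
#include <stdint.h>

/* Rates for connections and packets are stored with 10 fractional bits. */
static uint64_t read_scaled_10(uint64_t v)
{
    return (v + 0x1FF) >> 10; /* 0x1FF = 511, one less than half of 1024 */
}

/* Rates for bytes are stored with 5 fractional bits. */
static uint64_t read_scaled_5(uint64_t v)
{
    return (v + 0xF) >> 5;    /* 0xF = 15, one less than half of 32 */
}
```

With these helpers, a stored value of 1536 (exactly 1.5 when divided by 1024) reads as 1, while 1537 reads as 2. In the same way, a stored byte rate of 48 (exactly 1.5 when divided by 32) reads as 1, while 49 reads as 2. A fraction of exactly one half is therefore rounded down, and anything above it is rounded up.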

ip_vs_read_cpu_stats reads the per-CPU statistics, which are protected by sequential locks (the syncp field), and accumulates the counters of all possible processors into sum.

static void ip_vs_read_cpu_stats(struct ip_vs_kstats *sum, struct ip_vs_cpu_stats __percpu *stats)
{
    int i;
    bool add = false;
    
    for_each_possible_cpu(i) { 
        struct ip_vs_cpu_stats *s = per_cpu_ptr(stats, i);
        unsigned int start;
        u64 conns, inpkts, outpkts, inbytes, outbytes;
        
        if (add) {
            do {
                start = u64_stats_fetch_begin(&s->syncp);
                conns = s->cnt.conns;
                inpkts = s->cnt.inpkts;
                outpkts = s->cnt.outpkts;
                inbytes = s->cnt.inbytes;
                outbytes = s->cnt.outbytes;
            } while (u64_stats_fetch_retry(&s->syncp, start));
            sum->conns += conns;
            sum->inpkts += inpkts;
            sum->outpkts += outpkts;
            sum->inbytes += inbytes;
            sum->outbytes += outbytes;
        } else {
            add = true;
            do {
                start = u64_stats_fetch_begin(&s->syncp);
                sum->conns = s->cnt.conns;
                sum->inpkts = s->cnt.inpkts;
                sum->outpkts = s->cnt.outpkts;
                sum->inbytes = s->cnt.inbytes;
                sum->outbytes = s->cnt.outbytes;
            } while (u64_stats_fetch_retry(&s->syncp, start));
        }
    }
}
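The read-and-retry pattern used above can be sketched in user space with C11 atomics. This is only an illustration under assumptions: the names model_cpu_stats, model_sum, model_count_packet and model_read_cpu_stats and the fixed MODEL_NR_CPUS are hypothetical, only two counters are modelled, and the sum is zeroed up front instead of using the add flag. It is not the kernel's u64_stats_sync implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical fixed CPU count for this sketch */

struct model_cpu_stats {
    atomic_uint seq;          /* even: stable, odd: update in progress */
    _Atomic uint64_t inpkts;
    _Atomic uint64_t inbytes;
};

struct model_sum {
    uint64_t inpkts;
    uint64_t inbytes;
};

/* Writer side: called only by the owning CPU, so there is a single writer. */
static void model_count_packet(struct model_cpu_stats *s, uint64_t len)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    /* Mark the update as in progress (odd sequence number). */
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&s->inpkts, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->inbytes, len, memory_order_relaxed);
    /* Publish the update (even sequence number again). */
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry each snapshot until no writer interfered, then add. */
static void model_read_cpu_stats(struct model_sum *sum,
                                 struct model_cpu_stats *stats)
{
    sum->inpkts = 0;
    sum->inbytes = 0;

    for (int i = 0; i < MODEL_NR_CPUS; i++) {
        struct model_cpu_stats *s = &stats[i];
        unsigned int start, end;
        uint64_t inpkts, inbytes;

        do {
            start = atomic_load_explicit(&s->seq, memory_order_acquire);
            inpkts = atomic_load_explicit(&s->inpkts, memory_order_relaxed);
            inbytes = atomic_load_explicit(&s->inbytes, memory_order_relaxed);
            atomic_thread_fence(memory_order_acquire);
            end = atomic_load_explicit(&s->seq, memory_order_relaxed);
        } while ((start & 1) || start != end);

        sum->inpkts += inpkts;
        sum->inbytes += inbytes;
    }
}
```

The point of the retry loop is that the reader never takes a lock: it simply discards a snapshot if the sequence number was odd or changed while the counters were being copied, so the writers on the packet path are never blocked by the estimator.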

Input statistics

The function ip_vs_in_stats accounts for input traffic. As the code shows, after receiving a packet the kernel increments the input packet count and adds the packet length to the input byte count of the destination real server, then does the same for the virtual service that the real server belongs to, and finally for the total statistics of the ipvs network namespace.

The modification of statistical information is protected by sequential locks.

static inline void ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
{
    struct ip_vs_dest *dest = cp->dest;
    struct netns_ipvs *ipvs = cp->ipvs;

    if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) {
        struct ip_vs_cpu_stats *s;
        struct ip_vs_service *svc;

        s = this_cpu_ptr(dest->stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.inpkts++;
        s->cnt.inbytes += skb->len;
        u64_stats_update_end(&s->syncp);

        svc = rcu_dereference(dest->svc);
        s = this_cpu_ptr(svc->stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.inpkts++;
        s->cnt.inbytes += skb->len;
        u64_stats_update_end(&s->syncp);

        s = this_cpu_ptr(ipvs->tot_stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.inpkts++;
        s->cnt.inbytes += skb->len;
        u64_stats_update_end(&s->syncp);
    }
}
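The three-level accounting can be summarized with a small user-space model. All names below (model_counters, model_ns, model_svc, model_dest, model_in_stats) are hypothetical; the sketch ignores per-CPU storage and locking and only shows which counters a single packet touches.

```c
#include <stddef.h>
#include <stdint.h>

struct model_counters {
    uint64_t inpkts;
    uint64_t inbytes;
};

/* Hypothetical stand-ins for the namespace, virtual service and real server. */
struct model_ns   { struct model_counters tot_stats; };
struct model_svc  { struct model_counters stats; struct model_ns *ns; };
struct model_dest { struct model_counters stats; struct model_svc *svc; int available; };

static void model_account(struct model_counters *c, uint64_t len)
{
    c->inpkts++;
    c->inbytes += len;
}

/* One received packet updates the real server, its service and the totals. */
static void model_in_stats(struct model_dest *dest, uint64_t len)
{
    if (dest == NULL || !dest->available)
        return; /* like the kernel code, unavailable servers are not counted */

    model_account(&dest->stats, len);
    model_account(&dest->svc->stats, len);
    model_account(&dest->svc->ns->tot_stats, len);
}
```

In this model, two real servers under the same virtual service each keep their own counters, while the service and namespace counters show the sum of both. A packet for a real server that is not available changes none of the three levels.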

The above input statistics function is called from ip_vs_in, which in turn is called at both the NF_INET_LOCAL_IN and NF_INET_LOCAL_OUT hook points. The function therefore counts data entering the ipvs system both from outside the host and from local applications. In addition, when scheduling fails, for example in tcp_conn_schedule, and ignored is not set, the ip_vs_leave function may also call ip_vs_in_stats to update the statistics.

static int tcp_conn_schedule(struct netns_ipvs *ipvs, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
          int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph)
{
    if (svc) {
        int ignored;
        /*
         * Let the virtual server select a real server for the incoming connection, and create a connection entry.
         */
        *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
        if (!*cpp && ignored <= 0) {
            if (!ignored)
                *verdict = ip_vs_leave(svc, skb, pd, iph);
            /* ... */
        }
    }
    /* ... */
}

Output statistics

The function ip_vs_out_stats accounts for output traffic. Like the input statistics function ip_vs_in_stats above, it updates the output packet and byte counters of the real server, the virtual service and the ipvs network namespace. The modification of statistical information is protected by sequential locks.

static inline void ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
{
    struct ip_vs_dest *dest = cp->dest;
    struct netns_ipvs *ipvs = cp->ipvs;

    if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) {
        struct ip_vs_cpu_stats *s;
        struct ip_vs_service *svc;

        s = this_cpu_ptr(dest->stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.outpkts++;
        s->cnt.outbytes += skb->len;
        u64_stats_update_end(&s->syncp);

        svc = rcu_dereference(dest->svc);
        s = this_cpu_ptr(svc->stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.outpkts++;
        s->cnt.outbytes += skb->len;
        u64_stats_update_end(&s->syncp);

        s = this_cpu_ptr(ipvs->tot_stats.cpustats);
        u64_stats_update_begin(&s->syncp);
        s->cnt.outpkts++;
        s->cnt.outbytes += skb->len;
        u64_stats_update_end(&s->syncp);
    }
}

The above output statistics function is reached from three hook points: NF_INET_LOCAL_IN, NF_INET_FORWARD and NF_INET_LOCAL_OUT. At each of these hook points, in NAT forwarding mode, a packet that matches an existing connection is treated as a reply packet, and the output statistics are updated.

Connection statistics

The connection statistics function ip_vs_conn_stats increments the connection counters of the real server, the virtual service and the ipvs network namespace. The modification of statistical information is protected by sequential locks.

static inline void ip_vs_conn_stats(struct ip_vs_conn *cp, struct ip_vs_service *svc)
{
    struct netns_ipvs *ipvs = svc->ipvs;
    struct ip_vs_cpu_stats *s;

    s = this_cpu_ptr(cp->dest->stats.cpustats);
    u64_stats_update_begin(&s->syncp);
    s->cnt.conns++;
    u64_stats_update_end(&s->syncp);

    s = this_cpu_ptr(svc->stats.cpustats);
    u64_stats_update_begin(&s->syncp);
    s->cnt.conns++;
    u64_stats_update_end(&s->syncp);

    s = this_cpu_ptr(ipvs->tot_stats.cpustats);
    u64_stats_update_begin(&s->syncp);
    s->cnt.conns++;
    u64_stats_update_end(&s->syncp);
}

The above connection statistics function is called in the ip_vs_sched_persist, ip_vs_new_conn_out and ip_vs_schedule functions. It is important to note that it is called only after a new ipvs connection has been created. The following excerpt from ip_vs_new_conn_out shows this:

struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc, struct ip_vs_dest *dest, 
                      struct sk_buff *skb, const struct ip_vs_iphdr *iph, __be16 dport,  __be16 cport)
{
    cp = ip_vs_conn_new(&param, dest->af, daddr, dport, flags, dest, 0);
    if (!cp) {
        if (ct) ip_vs_conn_put(ct);
        return NULL;
    }
    if (ct) {
        ip_vs_control_add(cp, ct);
        ip_vs_conn_put(ct);
    }
    ip_vs_conn_stats(cp, svc);
}

Linux Kernel Version 4.15

Posted by SpasePeepole on Mon, 12 Aug 2019 05:42:09 -0700