Add routing table entry for IPv4

Keywords: network route

The following ip commands add a routing table entry and display it. By default, routes are added to the main routing table:

# ip route add 19.1.0.0/16 via 192.168.9.1
#
# ip route show table main 
19.1.0.0/16 via 192.168.9.1 dev ens34 

The kernel function inet_rtm_newroute handles route additions. rtm_to_fib_config converts the netlink message into the kernel structure fib_config, and fib_table_insert then adds the routing table entry according to the contents of fib_config.

static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh,
                 struct netlink_ext_ack *extack)
{
    struct net *net = sock_net(skb->sk);
    struct fib_config cfg;
    struct fib_table *tb;
    int err;

    /* Parse the RTM_NEWROUTE message into cfg. */
    err = rtm_to_fib_config(net, skb, nlh, &cfg, extack);
    if (err < 0)
        goto errout;

    /* Get the target table (main by default), creating it if needed. */
    tb = fib_new_table(net, cfg.fc_table);
    if (!tb) {
        err = -ENOBUFS;
        goto errout;
    }

    err = fib_table_insert(net, tb, &cfg, extack);
    if (!err && cfg.fc_type == RTN_LOCAL)
        net->ipv4.fib_has_custom_local_routes = true;

errout:
    return err;
}
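To see where cfg comes from, the sketch below builds, without sending, an RTM_NEWROUTE request similar to what `ip route add 19.1.0.0/16 via 192.168.9.1` sends. It is a user-space illustration under assumptions: route_req, add_attr and build_route_add are hypothetical names, and a real ip request may carry additional attributes. The flags NLM_F_CREATE | NLM_F_EXCL are the ones iproute2 uses for `ip route add`; rtm_to_fib_config copies nlmsg_flags into cfg->fc_nlflags, rtm_dst_len into cfg->fc_dst_len, rtm_table into cfg->fc_table and the RTA_DST attribute into cfg->fc_dst.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request layout: netlink header, route header, attributes. */
struct route_req {
    struct nlmsghdr nh;
    struct rtmsg rtm;
    char attrs[64];
};

/* Append one rtattr (type + payload) at the current end of the message. */
static void add_attr(struct nlmsghdr *nh, size_t maxlen, unsigned short type,
                     const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(nh->nlmsg_len);
    struct rtattr *rta = (struct rtattr *)((char *)nh + off);

    assert(off + RTA_LENGTH(len) <= maxlen);   /* buffer must be big enough */
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    nh->nlmsg_len = off + RTA_ALIGN(rta->rta_len);
}

/* Build (but do not send) an "ip route add DST/PLEN via GW" style request.
 * Returns 0 on success, -1 if an address does not parse. */
static int build_route_add(struct route_req *req, const char *dst,
                           unsigned char plen, const char *gw)
{
    struct in_addr d, g;

    if (inet_pton(AF_INET, dst, &d) != 1 || inet_pton(AF_INET, gw, &g) != 1)
        return -1;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nh.nlmsg_type = RTM_NEWROUTE;
    /* Same flags as "ip route add"; they end up in cfg->fc_nlflags. */
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    req->rtm.rtm_family = AF_INET;
    req->rtm.rtm_dst_len = plen;          /* -> cfg->fc_dst_len */
    req->rtm.rtm_table = RT_TABLE_MAIN;   /* -> cfg->fc_table */
    req->rtm.rtm_protocol = RTPROT_BOOT;
    req->rtm.rtm_scope = RT_SCOPE_UNIVERSE;
    req->rtm.rtm_type = RTN_UNICAST;      /* -> cfg->fc_type */

    add_attr(&req->nh, sizeof(*req), RTA_DST, &d, sizeof(d));     /* -> cfg->fc_dst */
    add_attr(&req->nh, sizeof(*req), RTA_GATEWAY, &g, sizeof(g)); /* gateway */
    return 0;
}
```

For comparison, `ip route replace` sends NLM_F_CREATE | NLM_F_REPLACE and `ip route append` sends NLM_F_CREATE | NLM_F_APPEND; these flags steer the branches of fib_table_insert discussed in the next section.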

Routing table entry addition

The insertion function fib_table_insert first checks that the prefix key and the prefix length plen are valid. In IPv4, plen cannot be greater than 32, and all bits of key beyond the first plen bits must be zero. See the implementation of fib_valid_key_len.

int fib_table_insert(struct net *net, struct fib_table *tb,
             struct fib_config *cfg, struct netlink_ext_ack *extack)
{
    struct trie *t = (struct trie *)tb->tb_data;
    struct fib_alias *fa, *new_fa;
    struct key_vector *l, *tp;
    u16 nlflags = NLM_F_EXCL;
    struct fib_info *fi;
    u8 plen = cfg->fc_dst_len;
    u8 slen = KEYLENGTH - plen;
    u8 tos = cfg->fc_tos;
    u32 key;
    int err;

    key = ntohl(cfg->fc_dst);

    if (!fib_valid_key_len(key, plen, extack))
        return -EINVAL;

    pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
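As a worked example of the lines above, this small user-space sketch (struct trie_key, make_key and format_key are hypothetical names) computes key and slen for 19.1.0.0/16: key is the destination in host byte order, 0x13010000, and slen = 32 - 16 = 16 host bits. It deliberately does not validate plen, since in the kernel that is the job of fib_valid_key_len.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define KEYLENGTH 32   /* bits in an IPv4 trie key, as in fib_trie.c */

/* Hypothetical holder for the values computed at the top of fib_table_insert. */
struct trie_key {
    uint32_t key;   /* destination prefix in host byte order */
    uint8_t plen;   /* prefix length */
    uint8_t slen;   /* suffix (host part) length */
};

/* key = ntohl(cfg->fc_dst); slen = KEYLENGTH - plen. No plen validation. */
static int make_key(const char *dst, uint8_t plen, struct trie_key *out)
{
    struct in_addr a;

    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, dst, &a) != 1)
        return -1;
    out->key = ntohl(a.s_addr);
    out->plen = plen;
    out->slen = (uint8_t)(KEYLENGTH - plen);
    return 0;
}

/* Format the key the way the pr_debug() line does: "%08x/%d". */
static void format_key(const struct trie_key *k, char *buf, size_t len)
{
    snprintf(buf, len, "%08x/%d", (unsigned)k->key, k->plen);
}
```

For the route added at the top of this article, format_key yields "13010000/16", and a default route 0.0.0.0/0 gets slen 32.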

For each routing table entry, the kernel needs a fib_info structure. If the entry references a nexthop object (nhid), a suitable fib_info may already exist; fib_create_info decides whether an existing one can be reused. For example:

# ip nexthop add id 1 via 192.168.2.1 dev ens33
# 
# ip route add 192.2.0.0/16 nhid 1  
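A toy model of fib_info reuse, under simplifying assumptions: as I read the 5.10 source, fib_create_info builds a candidate fib_info and, before linking it, looks for an identical one (fib_find_info); if one exists, the candidate is freed and the existing one is returned with fib_treeref incremented. struct info and get_info are hypothetical names, and the real comparison covers more fields (protocol, scope, metrics and the nexthops, among others).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct fib_info. */
struct info {
    uint32_t nh_id;      /* nexthop object id, 0 if none */
    uint32_t gw;         /* gateway address */
    uint32_t priority;   /* route metric */
    unsigned treeref;    /* number of routes using this info */
    struct info *next;
};

static struct info *info_list;   /* stands in for fib_info_hash */

/* Return an identical existing info (taking another reference) or link a
 * newly allocated one, in the spirit of fib_create_info()/fib_find_info(). */
static struct info *get_info(uint32_t nh_id, uint32_t gw, uint32_t priority)
{
    struct info *fi;

    for (fi = info_list; fi; fi = fi->next) {
        if (fi->nh_id == nh_id && fi->gw == gw && fi->priority == priority) {
            fi->treeref++;
            return fi;
        }
    }

    fi = calloc(1, sizeof(*fi));
    if (!fi)
        return NULL;
    fi->nh_id = nh_id;
    fi->gw = gw;
    fi->priority = priority;
    fi->treeref = 1;
    fi->next = info_list;
    info_list = fi;
    return fi;
}
```

Because reuse hands back the same pointer, the later test fa->fa_info == fi in fib_table_insert can be a plain pointer comparison.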

Next, the kernel needs a fib_alias structure, through which the new route's fib_info is attached to the trie. fib_find_node finds the matching node in the trie according to key, the prefix of the destination network (see IPv4 routing tries tree node addition and lookup).

If l is non-NULL, a matching leaf node was found, and fib_find_alias walks the leaf's fib_alias list to look for a usable fib_alias. A non-NULL fa has the same prefix, suffix length and table ID as the entry being added, but its tos and priority are not necessarily equal, so the code below compares those two fields. When tos and priority also match and the new request sets NLM_F_EXCL, the old entry is kept and processing ends with -EEXIST.

    fi = fib_create_info(cfg, extack);
    if (IS_ERR(fi)) {
        err = PTR_ERR(fi);
        goto err;
    }

    l = fib_find_node(t, &tp, key);
    fa = l ? fib_find_alias(&l->leaf, slen, tos, fi->fib_priority,
                            tb->tb_id, false) : NULL;

    /* Now fa, if non-NULL, points to the first fib alias
     * with the same keys [prefix,tos,priority], if such key already
     * exists or to the node before which we will insert new one.
     * If fa is NULL, we will need to allocate a new one and
     * insert to the tail of the section matching the suffix length of the new alias.
     */
    if (fa && fa->fa_tos == tos && fa->fa_info->fib_priority == fi->fib_priority) {
        struct fib_alias *fa_first, *fa_match;

        err = -EEXIST;
        if (cfg->fc_nlflags & NLM_F_EXCL)
            goto out;
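A leaf's fib_alias list is kept sorted: suffix length ascending, then table ID descending, tos descending and priority ascending. Below is a user-space rendering of fib_find_alias over an array, as I read it in 5.10 (struct alias and find_alias are hypothetical names; the hlist is replaced by an array and fa_info->fib_priority by a plain field). It returns either an alias with the same tos and priority, or the alias in front of which the new one belongs, which is why fib_table_insert compares fa_tos and fib_priority again after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified fib_alias: only the sort keys fib_find_alias() looks at. */
struct alias {
    uint8_t slen;       /* suffix length, ascending in the list */
    uint32_t tb_id;     /* table id, descending within one slen */
    uint8_t tos;        /* tos, descending within one table */
    uint32_t priority;  /* route priority (metric), ascending */
};

/* Model of fib_find_alias(); returns an index into list, or -1 for NULL. */
static int find_alias(const struct alias *list, size_t n, uint8_t slen,
                      uint8_t tos, uint32_t prio, uint32_t tb_id, int find_first)
{
    for (size_t i = 0; i < n; i++) {
        const struct alias *fa = &list[i];

        if (fa->slen < slen)
            continue;
        if (fa->slen != slen)
            break;              /* past this suffix length's section */
        if (fa->tb_id > tb_id)
            continue;
        if (fa->tb_id != tb_id)
            break;              /* past this table's entries */
        if (find_first)
            return (int)i;      /* first alias for slen/table, any tos/prio */
        if (fa->tos > tos)
            continue;
        if (fa->priority >= prio || fa->tos < tos)
            return (int)i;      /* equal keys, or the insertion point */
    }
    return -1;
}
```

With two main-table (254) aliases of priority 100 and 200, looking up priority 150 returns the second one: its priority differs, so the if above fails and the new alias will later be inserted in front of it.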

If NLM_F_EXCL is not set, the walk over the leaf's fib_alias list continues from fa. As soon as the suffix length, table ID, tos or priority of the current fa differs from the new entry, the walk stops (the list is sorted by these fields). If all of them are equal, the route type is also equal, and the fib_info pointed to by this fib_alias is the same as the new route's fib_info, a matching fib_alias (fa_match) has been found.

In general, fib_create_info creates a new fib_info structure; an existing fib_info is returned instead only when an identical one is already registered, which is the typical situation when the route references a nexthop object through nhid. So the comparison fa->fa_info == fi below can only succeed in that latter case. For ordinary next-hop routes that receive a freshly allocated fib_info, fa_match stays NULL.

        nlflags &= ~NLM_F_EXCL;
        /* We have 2 goals:
         * 1. Find exact match for type, scope, fib_info to avoid duplicate routes
         * 2. Find next 'fa' (or head), NLM_F_APPEND inserts before it */
        fa_match = NULL;
        fa_first = fa;
        hlist_for_each_entry_from(fa, fa_list) {
            if ((fa->fa_slen != slen) || (fa->tb_id != tb->tb_id) || (fa->fa_tos != tos))
                break;
            if (fa->fa_info->fib_priority != fi->fib_priority)
                break;
            if (fa->fa_type == cfg->fc_type && fa->fa_info == fi) {
                fa_match = fa;
                break;
            }
        }
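The walk can be modeled the same way. This sketch (hypothetical names; fa_info reduced to an opaque pointer) starts at fa_first, stops at the end of the run of aliases with equal slen, table, tos and priority, and reports fa_match if one of them has the same type and the very same fib_info pointer. *stop records where the walk ended, with n standing for NULL; when there is no match, that is the position NLM_F_APPEND inserts in front of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct alias {
    uint8_t slen;
    uint32_t tb_id;
    uint8_t tos;
    uint32_t priority;
    uint8_t type;       /* route type, e.g. RTN_UNICAST */
    const void *info;   /* stands in for fa_info, compared by pointer */
};

/* Model of the hlist_for_each_entry_from() loop in fib_table_insert().
 * Returns the index of fa_match, or -1 if there is none. */
static int find_match(const struct alias *list, size_t n, size_t first,
                      const struct alias *want, size_t *stop)
{
    size_t i;

    for (i = first; i < n; i++) {
        const struct alias *fa = &list[i];

        if (fa->slen != want->slen || fa->tb_id != want->tb_id ||
            fa->tos != want->tos)
            break;
        if (fa->priority != want->priority)
            break;
        if (fa->type == want->type && fa->info == want->info) {
            *stop = i;
            return (int)i;
        }
    }
    *stop = i;
    return -1;
}
```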

The following block handles replacing an existing route (NLM_F_REPLACE). If the hlist_for_each_entry_from walk above (which starts at fa itself) found a matching fa_match, the entry already exists and -EEXIST is returned; however, if fa_match is the same fa that fib_find_alias returned, the function returns 0. (I don't know the reasoning here: when the matching entry is the first one, the replacement counts as successful, otherwise the entry is reported as already existing. It may be related to fib_table_lookup matching the first fa.) Otherwise, if fa_match is NULL (the route type or fib_info differs), a new fib_alias is allocated, initialized, and substituted for the old fib_alias in the list. The fa_info of the new node points to the current fib_info structure.

        if (cfg->fc_nlflags & NLM_F_REPLACE) {
            struct fib_info *fi_drop;
            u8 state;

            nlflags |= NLM_F_REPLACE;
            fa = fa_first;
            if (fa_match) {
                if (fa == fa_match)
                    err = 0;
                goto out;
            }
            err = -ENOBUFS;
            new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
            if (!new_fa) goto out;

            fi_drop = fa->fa_info;
            new_fa->fa_tos = fa->fa_tos;
            new_fa->fa_info = fi;
            new_fa->fa_type = cfg->fc_type;
            state = fa->fa_state;
            new_fa->fa_state = state & ~FA_S_ACCESSED;
            new_fa->fa_slen = fa->fa_slen;
            new_fa->tb_id = tb->tb_id;
            new_fa->fa_default = -1;
            new_fa->offload = 0;
            new_fa->trap = 0;

            hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
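hlist_replace_rcu swaps the node in place, so the new alias takes exactly the old one's position in the sorted list. Below is a non-RCU model of the pointer updates (hnode, hhead, hadd_head and hreplace are hypothetical names; as far as I can tell, the kernel version additionally publishes the pointer with rcu_assign_pointer and poisons old->pprev). The old node keeps its next pointer, so an RCU reader standing on it can still continue its walk until alias_free_mem_rcu frees it after a grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's hlist: pprev points at whatever pointer
 * currently points at this node (the list head or the previous next). */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;
};

struct hhead {
    struct hnode *first;
};

static void hadd_head(struct hhead *h, struct hnode *n)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Non-RCU model of hlist_replace_rcu(): repl takes old's position.
 * old->next is left untouched so readers on old can keep walking. */
static void hreplace(struct hnode *old, struct hnode *repl)
{
    repl->next = old->next;
    repl->pprev = old->pprev;
    *repl->pprev = repl;
    if (repl->next)
        repl->next->pprev = &repl->next;
}
```

Calling hreplace again with the arguments swapped restores the old node, which is what the error path does when call_fib_entry_notifiers fails.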

fib_find_alias is then called with find_first set: it walks the leaf's fib_alias list and returns the first entry in this table whose suffix length is fa_slen (tos and priority are ignored in this search). If that entry is the newly added new_fa, a FIB_EVENT_ENTRY_REPLACE event is sent to the FIB notifier chain. If a notifier returns an error, the old fa is put back in the list and the new one is freed.

            if (fib_find_alias(&l->leaf, fa->fa_slen, 0, 0, tb->tb_id, true) == new_fa) {
                enum fib_event_type fib_event;

                fib_event = FIB_EVENT_ENTRY_REPLACE;
                err = call_fib_entry_notifiers(net, fib_event, key, plen, new_fa, extack);
                if (err) {
                    hlist_replace_rcu(&new_fa->fa_list, &fa->fa_list);
                    goto out_free_new_fa;
                }
            }

After that, an RTM_NEWROUTE rtnetlink message for the new route is sent to user space, the original fib_alias fa and the fib_info it pointed to (fi_drop) are released, and processing ends. If the old alias had been accessed (FA_S_ACCESSED), the route cache is flushed as well.

            rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id, &cfg->fc_nlinfo, nlflags);

            alias_free_mem_rcu(fa);

            fib_release_info(fi_drop);
            if (state & FA_S_ACCESSED)
                rt_cache_flush(cfg->fc_nlinfo.nl_net);

            goto succeeded;
        }

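If NLM_F_REPLACE is not set, -EEXIST is returned when a perfect match (same scope, type and nexthop information) exists. Otherwise processing continues with creating a new entry: with NLM_F_APPEND the new alias is inserted in front of the fa where the walk stopped, i.e. after the existing aliases with the same keys; without it, fa is reset to fa_first and the new alias goes ahead of them. If NLM_F_CREATE is not set, -ENOENT is returned.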

        /* Error if we find a perfect match which
         * uses the same scope, type, and nexthop information.
         */
        if (fa_match) goto out;

        if (cfg->fc_nlflags & NLM_F_APPEND)
            nlflags |= NLM_F_APPEND;
        else
            fa = fa_first;
    }
    err = -ENOENT;
    if (!(cfg->fc_nlflags & NLM_F_CREATE))
        goto out;
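The flag handling so far can be summarized as a small decision function. This is a hypothetical model, not kernel code: same_keys says whether fib_find_alias returned an fa with equal tos and priority, and m says whether the walk found fa_match at fa_first, further along, or not at all. It uses the real NLM_F_* constants from <linux/netlink.h>.

```c
#include <assert.h>
#include <errno.h>
#include <linux/netlink.h>   /* NLM_F_REPLACE, NLM_F_EXCL, NLM_F_CREATE, NLM_F_APPEND */

enum match { MATCH_NONE, MATCH_FIRST, MATCH_LATER };
enum action { ACT_NONE, ACT_REPLACE, ACT_INSERT_BEFORE_EXISTING,
              ACT_INSERT_AFTER_EXISTING, ACT_INSERT };

/* Returns 0 or a negative errno; *act tells what happens on success. */
static int insert_outcome(int same_keys, enum match m, unsigned flags,
                          enum action *act)
{
    *act = ACT_NONE;
    if (same_keys) {
        if (flags & NLM_F_EXCL)
            return -EEXIST;                /* keep the old entry */
        if (flags & NLM_F_REPLACE) {
            if (m == MATCH_FIRST)
                return 0;                  /* identical route already first */
            if (m == MATCH_LATER)
                return -EEXIST;
            *act = ACT_REPLACE;            /* swap fa_first for a new alias */
            return 0;
        }
        if (m != MATCH_NONE)
            return -EEXIST;                /* perfect duplicate */
        if (!(flags & NLM_F_CREATE))
            return -ENOENT;
        *act = (flags & NLM_F_APPEND) ? ACT_INSERT_AFTER_EXISTING
                                      : ACT_INSERT_BEFORE_EXISTING;
        return 0;
    }
    if (!(flags & NLM_F_CREATE))
        return -ENOENT;
    *act = ACT_INSERT;
    return 0;
}
```

For instance, `ip route change` (NLM_F_REPLACE alone) on a prefix with no existing alias ends with -ENOENT, because the replace branch is never entered and NLM_F_CREATE is missing.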

If no reusable fib_alias was found, a new fib_alias is allocated here and added to the trie.

    nlflags |= NLM_F_CREATE;
    err = -ENOBUFS;
    new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
    if (!new_fa) goto out;

    new_fa->fa_info = fi;
    new_fa->fa_tos = tos;
    new_fa->fa_type = cfg->fc_type;
    new_fa->fa_state = 0;
    new_fa->fa_slen = slen;
    new_fa->tb_id = tb->tb_id;
    new_fa->fa_default = -1;
    new_fa->offload = 0;
    new_fa->trap = 0;

    /* Insert new entry to the list. */
    err = fib_insert_alias(t, tp, l, new_fa, fa, key);
    if (err)
        goto out_free_new_fa;
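When fa is non-NULL, fib_insert_alias puts the new alias in front of it. When fa is NULL, it appends the alias at the tail of its section: as I read it in 5.10, it places the new alias behind the last entry whose suffix length is smaller, or equal with a table ID greater than or equal to its own; if l itself was NULL, a new leaf is created through fib_insert_node instead. The array model below (hypothetical names, leaf-creation case omitted) computes that position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields fib_insert_alias() sorts on. */
struct alias {
    uint8_t slen;
    uint32_t tb_id;
};

/* Index a new alias would take in the leaf list (modeled as an array).
 * before >= 0: insert in front of list[before] (hlist_add_before_rcu).
 * before <  0: fa == NULL, append at the tail of the alias's section. */
static size_t insert_pos(const struct alias *list, size_t n,
                         const struct alias *new_fa, int before)
{
    size_t pos = 0;   /* 0 corresponds to hlist_add_head_rcu */

    if (before >= 0)
        return (size_t)before;

    for (size_t i = 0; i < n; i++) {
        if (new_fa->slen < list[i].slen)
            break;
        if (new_fa->slen == list[i].slen && new_fa->tb_id > list[i].tb_id)
            break;
        pos = i + 1;  /* goes behind list[i] (hlist_add_behind_rcu) */
    }
    return pos;
}
```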

Because the alias was inserted above, the leaf node must exist at this point (if l was NULL, it is looked up again). The code then walks the leaf's fib_alias list for the first entry whose suffix length equals the new alias's fa_slen; if that entry is the newly created fib_alias, a FIB_EVENT_ENTRY_REPLACE event is sent to the notifier chain. If a notifier fails, the new alias is removed again.

    /* The alias was already inserted, so the node must exist. */
    l = l ? l : fib_find_node(t, &tp, key);
    if (WARN_ON_ONCE(!l))
        goto out_free_new_fa;

    if (fib_find_alias(&l->leaf, new_fa->fa_slen, 0, 0, tb->tb_id, true) ==
        new_fa) {
        enum fib_event_type fib_event;

        fib_event = FIB_EVENT_ENTRY_REPLACE;
        err = call_fib_entry_notifiers(net, fib_event, key, plen, new_fa, extack);
        if (err)
            goto out_remove_new_fa;
    }

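Finally, for a default route (plen of 0) the table's tb_num_default counter is incremented, the route cache is flushed, and user space is notified of the new routing table entry with an RTM_NEWROUTE message.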

    if (!plen)
        tb->tb_num_default++;

    rt_cache_flush(cfg->fc_nlinfo.nl_net);
    rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id, &cfg->fc_nlinfo, nlflags);
succeeded:
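    return 0;

out_remove_new_fa:
    fib_remove_alias(t, tp, l, new_fa);
out_free_new_fa:
    kmem_cache_free(fn_alias_kmem, new_fa);
out:
    fib_release_info(fi);
err:
    return err;
}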

Prefix length check

Both of the following ip commands fail. With a 16-bit prefix length, the prefix should be 192.2.0.0:

# ip route add 192.2.1.0/16 via 192.168.1.1
Error: Invalid prefix for given prefix length.
#
# ip route add 192.2.1.0/33 via 192.168.1.1  
Error: any valid prefix is expected rather than "192.2.1.0/33".

The first error message comes from fib_valid_key_len. When the prefix length exceeds KEYLENGTH (32), the ip command itself rejects the argument, as shown above, so the kernel's own message ("Invalid prefix length") is never seen.

static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
{
    if (plen > KEYLENGTH) {
        NL_SET_ERR_MSG(extack, "Invalid prefix length");
        return false;
    }

    if ((plen < KEYLENGTH) && (key << plen)) {
        NL_SET_ERR_MSG(extack, "Invalid prefix for given prefix length");
        return false;
    }

    return true;
}
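The check key << plen works because shifting left by plen discards the network bits, so any remaining non-zero bit is a host bit that should have been zero. A user-space copy of the logic (valid_key_len is a hypothetical name; the extack messages are left out as comments) makes it easy to try the examples above: 192.2.1.0 is 0xC0020100, and 0xC0020100 << 16 leaves 0x01000000, so /16 is rejected. The plen < KEYLENGTH guard also matters in C, because shifting a 32-bit value by 32 is undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEYLENGTH 32

/* User-space copy of fib_valid_key_len() without the netlink extack. */
static bool valid_key_len(uint32_t key, uint8_t plen)
{
    if (plen > KEYLENGTH)
        return false;   /* "Invalid prefix length" */

    /* Test plen < KEYLENGTH first: key << 32 would be undefined. */
    if (plen < KEYLENGTH && (key << plen))
        return false;   /* "Invalid prefix for given prefix length" */

    return true;
}
```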

Kernel version 5.10

Posted by misfits on Fri, 19 Nov 2021 14:03:54 -0800