DPDK Base Library: LPM

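The LPM (Longest Prefix Match) implementation in DPDK uses a variant of the DIR-24-8 algorithm, which essentially trades memory for lookup speed. It consists of one table with 2^24 entries and 256 (RTE_LPM_TBL8_NUM_GROUPS) tables with 2^8 entries each. The former is called tbl24 and is indexed by the first 24 bits of the IP address. The latter are called tbl8 and are indexed by the last 8 bits of the IP address.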

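In theory, 2^24 tbl8 tables would be needed, but to limit memory consumption DPDK sets the default to only 256. In practice, routes with a prefix longer than 24 bits are uncommon.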
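The way an address is split between the two tables, and the memory this layout costs, can be sketched as follows. The helper names (`tbl24_index`, `tbl8_offset`, `tbl8_index`, `lpm_table_bytes`) are illustrative and are not DPDK functions; the 4-byte entry size is an assumption based on the 32-bit `rte_lpm_tbl_entry` shown later in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL24_ENTRIES      (1u << 24)  /* one slot per 24-bit prefix */
#define TBL8_GROUP_ENTRIES (1u << 8)   /* one slot per value of the last byte */

/* Index into tbl24: the upper 24 bits of a host-order IPv4 address. */
static inline uint32_t tbl24_index(uint32_t ip) { return ip >> 8; }

/* Offset inside a tbl8 group: the lowest 8 bits of the address. */
static inline uint32_t tbl8_offset(uint32_t ip) { return ip & 0xFFu; }

/* Flat index into the tbl8 array for a given group and address. */
static inline uint32_t tbl8_index(uint32_t group, uint32_t ip)
{
    assert(group < (1u << 24));  /* the group index must fit in 24 bits */
    return group * TBL8_GROUP_ENTRIES + tbl8_offset(ip);
}

/* Bytes used by tbl24 plus num_groups tbl8 groups (4-byte entries assumed). */
static inline size_t lpm_table_bytes(size_t num_groups)
{
    return ((size_t)TBL24_ENTRIES + num_groups * TBL8_GROUP_ENTRIES)
           * sizeof(uint32_t);
}
```

Under these assumptions, tbl24 alone takes 64 MiB and 256 tbl8 groups add 256 KiB, whereas a full set of 2^24 groups would need 16 GiB.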

LPM Initialization

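The main LPM configuration parameter is the maximum number of rules supported. An LPM rule is uniquely identified by its LPM prefix, which consists of two parts: a 32-bit key and a depth. The l3fwd sample application initializes LPM as follows, setting the maximum number of rules through max_rules.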

void
setup_lpm(const int socketid)
{
    struct rte_lpm_config config_ipv4;
    char s[64];

    /* create the LPM table */
    config_ipv4.max_rules = IPV4_L3FWD_LPM_MAX_RULES;
    config_ipv4.number_tbl8s = IPV4_L3FWD_LPM_NUMBER_TBL8S;
    config_ipv4.flags = 0;
    snprintf(s, sizeof(s), "IPV4_L3FWD_LPM_%d", socketid);
    ipv4_l3fwd_lpm_lookup_struct[socketid] =
            rte_lpm_create(s, socketid, &config_ipv4);
    /* ... remainder of the function omitted ... */
}

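In the LPM implementation, user data (the next hop) can be associated with each LPM prefix. It is passed as a 4-byte value, although the table entry shown later keeps only 24 bits of it. According to the official documentation, this field is meant to be used as an index rather than to store the actual next-hop IP address. The index can be used with a separate routing table to find the corresponding entry.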
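Using the next hop as an index can be sketched like this. Everything here is hypothetical application code rather than part of the LPM library: the `nh_info` record, the `nh_table` contents, and the `nh_resolve` helper are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-next-hop record kept by the application, outside LPM. */
struct nh_info {
    uint32_t gateway_ip;  /* host byte order */
    uint16_t port_id;     /* output port */
};

#define NH_TABLE_SIZE 4u

static const struct nh_info nh_table[NH_TABLE_SIZE] = {
    { 0x0A000001u, 0 },  /* 10.0.0.1 via port 0 */
    { 0x0A000101u, 1 },  /* 10.0.1.1 via port 1 */
    { 0xC0A80001u, 2 },  /* 192.168.0.1 via port 2 */
    { 0xC0A80101u, 3 },  /* 192.168.1.1 via port 3 */
};

/* Turn the index returned by an LPM lookup into the full next-hop record. */
static inline const struct nh_info *nh_resolve(uint32_t next_hop_idx)
{
    assert(next_hop_idx < NH_TABLE_SIZE);
    return &nh_table[next_hop_idx];
}
```

Keeping only an index in the LPM entry leaves room in the 32-bit word for the flags and depth, while the application is free to store as much per-hop state as it needs.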

Adding LPM Rules

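Adding an LPM rule takes roughly two steps: adding the rule itself, and adding the lookup entries. Start with the rule itself. Before insertion, the LPM rule table is searched for the rule: if it is not found, it is inserted; otherwise its next-hop user data is updated. If no space is available, an error is returned.

In the rule-adding function rule_add below, LPM divides rules into 32 groups by depth. rule_info records the usage of each group, namely the index of the group's first rule and the number of rules in it. If the group for the new rule already contains rules, each one is compared with the new rule; on a match, the next_hop field is updated and the rule's index is returned.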

static int32_t
rule_add(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth, uint32_t next_hop)
{
    if (lpm->rule_info[depth - 1].used_rules > 0) {

        rule_gindex = lpm->rule_info[depth - 1].first_rule;
        rule_index = rule_gindex;
        last_rule = rule_gindex + lpm->rule_info[depth - 1].used_rules;

        for (; rule_index < last_rule; rule_index++) {

            /* If rule already exists update its next_hop and return. */
            if (lpm->rules_tbl[rule_index].ip == ip_masked) {
                lpm->rules_tbl[rule_index].next_hop = next_hop;

                return rule_index;
            }
        }

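Otherwise, the group for the new rule is empty. The groups with smaller depth are scanned from deepest to shallowest until a non-empty one is found; the index just past that group's last rule (first_rule + used_rules) becomes the first index (first_rule) of the new rule's group. If no such group exists, the index stays 0.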

    } else {
        /* Calculate the position in which the rule will be stored. */
        rule_index = 0;

        for (i = depth - 1; i > 0; i--) {
            if (lpm->rule_info[i - 1].used_rules > 0) {
                rule_index = lpm->rule_info[i - 1].first_rule
                        + lpm->rule_info[i - 1].used_rules;
                break;
            }
        }
        if (rule_index == lpm->max_rules)
            return -ENOSPC;

        lpm->rule_info[depth - 1].first_rule = rule_index;
    }

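At this point the insertion index (rule_index) is known. To make room, every group deeper than the new rule's group is shifted back by one position. Because the order of rules inside a group does not matter, only one rule per group has to move: the group's first rule is copied to the slot just past its last rule, and first_rule is incremented. The new rule's parameters (ip_masked, next_hop) are then written at the insertion index.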

    /* Make room for the new rule in the array. */
    for (i = RTE_LPM_MAX_DEPTH; i > depth; i--) {
        if (lpm->rule_info[i - 1].first_rule
                + lpm->rule_info[i - 1].used_rules == lpm->max_rules)
            return -ENOSPC;

        if (lpm->rule_info[i - 1].used_rules > 0) {
            lpm->rules_tbl[lpm->rule_info[i - 1].first_rule
                + lpm->rule_info[i - 1].used_rules]
                    = lpm->rules_tbl[lpm->rule_info[i - 1].first_rule];
            lpm->rule_info[i - 1].first_rule++;
        }
    }

    /* Add the new rule. */
    lpm->rules_tbl[rule_index].ip = ip_masked;
    lpm->rules_tbl[rule_index].next_hop = next_hop;

    /* Increment the used rules counter for this rule group. */
    lpm->rule_info[depth - 1].used_rules++;

    return rule_index;
}
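The whole insertion procedure can be tried on a tiny table. The following is a simplified sketch modeled on the excerpts above, not the DPDK source: the names (`toy_rule_add`, `rule_table`) and the 8-rule capacity are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_DEPTH 32
#define MAX_RULES 8   /* tiny on purpose */

struct rule { uint32_t ip; uint32_t next_hop; };
struct rule_info { uint32_t used_rules; uint32_t first_rule; };

struct rule_table {
    struct rule rules[MAX_RULES];      /* groups stored back to back */
    struct rule_info info[MAX_DEPTH];  /* info[d - 1] describes depth d */
};

/* Returns the index of the stored rule, or -ENOSPC if the array is full. */
static inline int32_t toy_rule_add(struct rule_table *t, uint32_t ip_masked,
                                   uint8_t depth, uint32_t next_hop)
{
    uint32_t idx = 0, i;

    assert(depth >= 1 && depth <= MAX_DEPTH);
    struct rule_info *g = &t->info[depth - 1];

    if (g->used_rules > 0) {
        /* Existing group: update in place if the prefix is already there. */
        for (idx = g->first_rule; idx < g->first_rule + g->used_rules; idx++) {
            if (t->rules[idx].ip == ip_masked) {
                t->rules[idx].next_hop = next_hop;
                return (int32_t)idx;
            }
        }
        /* idx now points just past the group's last rule. */
        if (idx == MAX_RULES)
            return -ENOSPC;
    } else {
        /* Empty group: start right after the nearest shallower group in use. */
        for (i = depth - 1; i > 0; i--) {
            if (t->info[i - 1].used_rules > 0) {
                idx = t->info[i - 1].first_rule + t->info[i - 1].used_rules;
                break;
            }
        }
        if (idx == MAX_RULES)
            return -ENOSPC;
        g->first_rule = idx;
    }

    /* Open a slot: each deeper group moves its first rule to its own end. */
    for (i = MAX_DEPTH; i > depth; i--) {
        struct rule_info *d = &t->info[i - 1];

        if (d->first_rule + d->used_rules == MAX_RULES)
            return -ENOSPC;
        if (d->used_rules > 0) {
            t->rules[d->first_rule + d->used_rules] = t->rules[d->first_rule];
            d->first_rule++;
        }
    }

    t->rules[idx].ip = ip_masked;
    t->rules[idx].next_hop = next_hop;
    g->used_rules++;
    return (int32_t)idx;
}
```

Starting from a zeroed table, adding 10.0.0.0/8, then 192.168.0.0/16, then 11.0.0.0/8 stores the second /8 at index 1 and pushes the /16 rule from index 1 to index 2, so the array stays sorted by depth.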

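That completes adding the rule itself. Next comes the lookup-entry part of the library's interface function rte_lpm_add. It is split in two by rule depth (the IP prefix length): depths of 24 bits or less are handled by add_depth_small, and depths greater than 24 bits by the more complex add_depth_big.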

int rte_lpm_add(struct rte_lpm *lpm, uint32_t ip, uint8_t depth, uint32_t next_hop)
{
    ip_masked = ip & depth_to_mask(depth);

    rule_index = rule_add(lpm, ip_masked, depth, next_hop);

    /* If there is no space available for new rule return error. */
    if (rule_index < 0)
        return rule_index;

    if (depth <= MAX_DEPTH_TBL24) {
        status = add_depth_small(lpm, ip_masked, depth, next_hop);
    } else { /* If depth > RTE_LPM_MAX_DEPTH_TBL24 */
        status = add_depth_big(lpm, ip_masked, depth, next_hop);
        /*
         * If add fails due to exhaustion of tbl8 extensions delete rule that was added to rule table.
         */
        if (status < 0) {
            rule_delete(lpm, rule_index, depth);

            return status;
        }
    }

    return 0;
}
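The masking step at the top of rte_lpm_add keeps only the first depth bits of the address. A minimal equivalent of that helper is sketched below; `toy_depth_to_mask` is an illustrative name, and this version deliberately uses an unsigned shift rather than reproducing the library's internal implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Network mask for a prefix length of 1..32, built with an unsigned shift. */
static inline uint32_t toy_depth_to_mask(uint8_t depth)
{
    assert(depth >= 1 && depth <= 32);
    return (uint32_t)(UINT32_MAX << (32 - depth));
}
```

For example, masking 192.168.1.100 (0xC0A80164) with a depth of 24 yields 192.168.1.0 (0xC0A80100).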

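First, the function for rules of depth 24 or less. The first 24 bits of the IP address give the starting index into tbl24. An entry is usable if it is not yet occupied (valid is 0), or if it holds a non-extended entry whose depth is less than or equal to that of the rule being added; following the longest-match principle, such an entry is replaced.

Note that the loop runs from the index given by the first 24 bits of the new rule's IP address through the last index covered by this depth, that is 1 << (24 - depth) consecutive entries, and every entry that meets the conditions above is replaced.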

static __rte_noinline int32_t
add_depth_small(struct rte_lpm *lpm, uint32_t ip, uint8_t depth, uint32_t next_hop)
{
#define group_idx next_hop
    uint32_t tbl24_index, tbl24_range, tbl8_index, tbl8_group_end, i, j;

    tbl24_index = ip >> 8;
    tbl24_range = depth_to_range(depth);

    for (i = tbl24_index; i < (tbl24_index + tbl24_range); i++) {

        if (!lpm->tbl24[i].valid || (lpm->tbl24[i].valid_group == 0 &&
                lpm->tbl24[i].depth <= depth)) {

            struct rte_lpm_tbl_entry new_tbl24_entry = {
                .next_hop = next_hop,
                .valid = VALID,
                .valid_group = 0,
                .depth = depth,
            };

            /* Setting tbl24 entry in one go to avoid race conditions
             */
            __atomic_store(&lpm->tbl24[i], &new_tbl24_entry, __ATOMIC_RELEASE);
            continue;
        }

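An entry with valid_group equal to 1 has been extended: some rule deeper than 24 bits falls under this 24-bit prefix, and the entry's next_hop field (aliased as group_idx) holds a tbl8 group index rather than a next hop. All 256 entries of that tbl8 group are traversed, and each entry that is invalid, or whose depth is less than or equal to the depth of the rule being added, is replaced.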

        if (lpm->tbl24[i].valid_group == 1) {
            /* If tbl24 entry is valid and extended calculate the index into tbl8.
             */
            tbl8_index = lpm->tbl24[i].group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
            tbl8_group_end = tbl8_index + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;

            for (j = tbl8_index; j < tbl8_group_end; j++) {
                if (!lpm->tbl8[j].valid || lpm->tbl8[j].depth <= depth) {
                    struct rte_lpm_tbl_entry
                        new_tbl8_entry = {
                        .valid = VALID,
                        .valid_group = VALID,
                        .depth = depth,
                        .next_hop = next_hop,
                    };
                    /* Setting tbl8 entry in one go to avoid race conditions
                     */
                    __atomic_store(&lpm->tbl8[j], &new_tbl8_entry, __ATOMIC_RELAXED);

                    continue;
                }
            }
        }
    }
#undef group_idx
    return 0;
}
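Two pieces of add_depth_small are easy to check in isolation: how many tbl24 slots a short prefix covers, and when a slot may be overwritten. The sketch below uses a simplified, unpacked slot type (`tbl24_slot`) and illustrative function names; none of them are DPDK identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of consecutive tbl24 slots covered by a prefix of depth 1..24. */
static inline uint32_t tbl24_range(uint8_t depth)
{
    assert(depth >= 1 && depth <= 24);
    return 1u << (24 - depth);
}

/* Simplified stand-in for a table entry, with one member per field. */
struct tbl24_slot {
    uint32_t next_hop;
    uint8_t valid;
    uint8_t valid_group;
    uint8_t depth;
};

/* May a rule of the given depth overwrite this tbl24 slot directly? */
static inline int tbl24_slot_replaceable(const struct tbl24_slot *e,
                                         uint8_t depth)
{
    return !e->valid || (e->valid_group == 0 && e->depth <= depth);
}
```

A /8 rule such as 10.0.0.0/8 therefore touches 65536 slots, from index 0x0A0000 through 0x0AFFFF, while a /24 rule touches exactly one. A slot already holding a /16 route is overwritten by a /24 rule but left alone by a /8 rule.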

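The function add_depth_big below adds rules with a depth greater than 24 bits. Consider the first case: the tbl24 entry selected by the first 24 bits of the IP address is invalid (valid is 0), which means no tbl8 group has been created for it yet. The tbl8 table is divided into 256 groups, and tbl8_alloc returns the index of a free group (tbl8_group_index); the first entry of that group has valid_group set. All entries in the group that fall within the range implied by depth are then filled with the entry generated from the current rule, leaving their valid_group values unchanged.

See the depth_to_range function: for a depth greater than 24 bits, the range is (1 << (32 - depth)); for example, a depth of 30 gives a range of 4. Since the tbl8 group is newly created and its entries are empty, they can be filled directly. Finally, the corresponding tbl24 entry is marked valid, and its next_hop field is set to the index of the tbl8 group.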

static __rte_noinline int32_t      
add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth, uint32_t next_hop)         
{ 
#define group_idx next_hop         
  
    tbl24_index = (ip_masked >> 8);
    tbl8_range = depth_to_range(depth);

    if (!lpm->tbl24[tbl24_index].valid) {    
        /* Search for a free tbl8 group. */      
        tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
        if (tbl8_group_index < 0)
            return tbl8_group_index;

        /* Find index into tbl8 and range. */    
        tbl8_index = (tbl8_group_index * RTE_LPM_TBL8_GROUP_NUM_ENTRIES) + (ip_masked & 0xFF);

        /* Set tbl8 entry. */
        for (i = tbl8_index; i < (tbl8_index + tbl8_range); i++) {
            struct rte_lpm_tbl_entry new_tbl8_entry = {
                .valid = VALID,
                .depth = depth,
                .valid_group = lpm->tbl8[i].valid_group,
                .next_hop = next_hop,
            };
            __atomic_store(&lpm->tbl8[i], &new_tbl8_entry, __ATOMIC_RELAXED);
        }

        /* Update tbl24 entry to point to new tbl8 entry. Note: The ext_flag and
         * tbl8_index need to be updated simultaneously, so assign whole structure in one go.
         */
        struct rte_lpm_tbl_entry new_tbl24_entry = {
            .group_idx = tbl8_group_index,
            .valid = VALID,
            .valid_group = 1,
            .depth = 0,
        };
        __atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry, __ATOMIC_RELEASE);

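The second case: the corresponding tbl24 entry is valid but has not yet been extended (valid_group is 0), so no tbl8 group exists for it. As in the first case, a tbl8 group is allocated first to obtain a group index. The difference is that the old tbl24 entry is first copied into all 256 entries of the tbl8 group, using the depth and next_hop of the tbl24 entry. The remaining steps are the same as in the first case.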

    } /* If valid entry but not extended calculate the index into Table8. */
    else if (lpm->tbl24[tbl24_index].valid_group == 0) {
        /* Search for free tbl8 group. */
        tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);

        if (tbl8_group_index < 0)
            return tbl8_group_index;

        tbl8_group_start = tbl8_group_index * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
        tbl8_group_end = tbl8_group_start + RTE_LPM_TBL8_GROUP_NUM_ENTRIES;

        /* Populate new tbl8 with tbl24 value. */
        for (i = tbl8_group_start; i < tbl8_group_end; i++) {
            struct rte_lpm_tbl_entry new_tbl8_entry = {
                .valid = VALID,
                .depth = lpm->tbl24[tbl24_index].depth,
                .valid_group = lpm->tbl8[i].valid_group,
                .next_hop = lpm->tbl24[tbl24_index].next_hop,
            };
            __atomic_store(&lpm->tbl8[i], &new_tbl8_entry, __ATOMIC_RELAXED);
        }

        tbl8_index = tbl8_group_start + (ip_masked & 0xFF);

        /* Insert new rule into the tbl8 entry. */
        for (i = tbl8_index; i < tbl8_index + tbl8_range; i++) {
            struct rte_lpm_tbl_entry new_tbl8_entry = {
                .valid = VALID,
                .depth = depth,
                .valid_group = lpm->tbl8[i].valid_group,
                .next_hop = next_hop,
            };
            __atomic_store(&lpm->tbl8[i], &new_tbl8_entry,  __ATOMIC_RELAXED);
        }

        /* Update tbl24 entry to point to new tbl8 entry. Note: The ext_flag and 
         * tbl8_index need to be updated simultaneously, so assign whole structure in one go.
         */
        struct rte_lpm_tbl_entry new_tbl24_entry = {
                .group_idx = tbl8_group_index,
                .valid = VALID,
                .valid_group = 1,
                .depth = 0,
        };
        __atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry, __ATOMIC_RELEASE);

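The third case: the tbl8 group has already been allocated. The group index is read from the next_hop field of the tbl24 entry (group_idx), and then every tbl8 entry within the index range implied by depth is visited; entries that are invalid, or whose depth is less than or equal to that of the new rule, are filled.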

    } else { 
        /* If it is valid, extended entry calculate the index into tbl8. */
		 
        tbl8_group_index = lpm->tbl24[tbl24_index].group_idx;
        tbl8_group_start = tbl8_group_index * RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
        tbl8_index = tbl8_group_start + (ip_masked & 0xFF);

        for (i = tbl8_index; i < (tbl8_index + tbl8_range); i++) {

            if (!lpm->tbl8[i].valid || lpm->tbl8[i].depth <= depth) {
                struct rte_lpm_tbl_entry new_tbl8_entry = {
                    .valid = VALID,
                    .depth = depth,
                    .next_hop = next_hop,
                    .valid_group = lpm->tbl8[i].valid_group,
                };

                /* Setting tbl8 entry in one go to avoid race condition
                 */
                __atomic_store(&lpm->tbl8[i], &new_tbl8_entry, __ATOMIC_RELAXED);
                continue;
            }
        }
    }
#undef group_idx
    return 0;
}
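The effect of the second and third cases on a single tbl8 group can be reproduced with a small model. This is a sketch under simplifying assumptions: the slot type `tbl8_slot` is unpacked, group allocation and atomic stores are left out, and the function names are illustrative rather than taken from DPDK.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_ENTRIES 256u

/* Simplified stand-in for a tbl8 entry. */
struct tbl8_slot { uint32_t next_hop; uint8_t valid; uint8_t depth; };

/* Number of tbl8 slots covered by a prefix of depth 25..32. */
static inline uint32_t tbl8_range(uint8_t depth)
{
    assert(depth > 24 && depth <= 32);
    return 1u << (32 - depth);
}

/* Case 2: copy the old tbl24 rule into every slot of a fresh group. */
static inline void group_inherit(struct tbl8_slot group[GROUP_ENTRIES],
                                 uint32_t old_next_hop, uint8_t old_depth)
{
    for (uint32_t i = 0; i < GROUP_ENTRIES; i++)
        group[i] = (struct tbl8_slot){ .next_hop = old_next_hop,
                                       .valid = 1, .depth = old_depth };
}

/* Case 3 (and the final step of cases 1 and 2): overlay the deeper rule.
 * ip_masked must already be masked to depth, so the range stays in bounds. */
static inline void group_overlay(struct tbl8_slot group[GROUP_ENTRIES],
                                 uint32_t ip_masked, uint8_t depth,
                                 uint32_t next_hop)
{
    uint32_t first = ip_masked & 0xFFu;

    for (uint32_t i = first; i < first + tbl8_range(depth); i++) {
        if (!group[i].valid || group[i].depth <= depth)
            group[i] = (struct tbl8_slot){ .next_hop = next_hop,
                                           .valid = 1, .depth = depth };
    }
}
```

If a /16 route with next hop 5 is inherited into the group and 192.168.1.4/30 with next hop 7 is then overlaid, slots 4 through 7 carry next hop 7 and every other slot still carries next hop 5.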

LPM Lookup

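Before the lookup function, look at the LPM table entry structure rte_lpm_tbl_entry. The positions of its valid_group and valid fields are captured by the mask defined by the macro RTE_LPM_VALID_EXT_ENTRY_BITMASK, which is used below. The definition shown is the big-endian variant; on little-endian targets the header declares the same fields in the opposite order, so the masks apply either way.

In addition, the macro RTE_LPM_LOOKUP_SUCCESS marks the position of the valid flag and is used below to check whether an entry is valid.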

struct rte_lpm_tbl_entry {
    uint32_t depth       :6;
    uint32_t valid_group :1;
    uint32_t valid       :1;
    uint32_t next_hop    :24;

}; 

#define RTE_LPM_VALID_EXT_ENTRY_BITMASK 0x03000000

#define RTE_LPM_LOOKUP_SUCCESS          0x01000000
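Viewed as a plain 32-bit word, an entry can be packed and decoded with shifts and masks. The bit positions below are inferred from the two macros above and from the 24-bit next_hop field; the `ENTRY_*` constants and `entry_*` helpers are illustrative names, not DPDK definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_NEXT_HOP_MASK  0x00FFFFFFu  /* bits 0..23 */
#define ENTRY_VALID          0x01000000u  /* bit 24, as RTE_LPM_LOOKUP_SUCCESS */
#define ENTRY_VALID_GROUP    0x02000000u  /* bit 25 */
#define ENTRY_VALID_EXT_MASK 0x03000000u  /* bits 24..25, as the EXT bitmask */
#define ENTRY_DEPTH_SHIFT    26           /* depth occupies bits 26..31 */

/* Pack the four fields into one 32-bit word. */
static inline uint32_t entry_pack(uint32_t next_hop, int valid,
                                  int valid_group, uint8_t depth)
{
    assert(next_hop <= ENTRY_NEXT_HOP_MASK && depth <= 32);
    return next_hop
         | (valid ? ENTRY_VALID : 0u)
         | (valid_group ? ENTRY_VALID_GROUP : 0u)
         | ((uint32_t)depth << ENTRY_DEPTH_SHIFT);
}

static inline int entry_is_valid(uint32_t e)
{
    return (e & ENTRY_VALID) != 0;
}

/* Extended means both valid and valid_group are set. */
static inline int entry_is_extended(uint32_t e)
{
    return (e & ENTRY_VALID_EXT_MASK) == ENTRY_VALID_EXT_MASK;
}

static inline uint32_t entry_next_hop(uint32_t e)
{
    return e & ENTRY_NEXT_HOP_MASK;
}

static inline uint8_t entry_depth(uint32_t e)
{
    return (uint8_t)(e >> ENTRY_DEPTH_SHIFT);
}
```

Testing both flag bits with a single mask is what lets the lookup path decide with one comparison whether a second read from tbl8 is needed.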

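In the lookup function rte_lpm_lookup below, the first 24 bits of the IP address index tbl24 to fetch the entry. If the valid_group flag is not set, the low 24 bits of the entry are the next_hop value being looked up. Otherwise, the tbl8 group index stored in the tbl24 entry is used to locate the corresponding position in tbl8, and that entry is fetched instead. The function returns 0 if the valid flag of the final entry is set, and -ENOENT otherwise.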

static inline int rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)
{   
    unsigned tbl24_index = (ip >> 8);
    uint32_t tbl_entry;
    const uint32_t *ptbl;
    
    /* DEBUG: Check user input arguments. */ 
    RTE_LPM_RETURN_IF_TRUE(((lpm == NULL) || (next_hop == NULL)), -EINVAL);
    
    /* Copy tbl24 entry */ 
    ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
    tbl_entry = *ptbl;
    
    /* Memory ordering is not required in lookup. Because dataflow
     * dependency exists, compiler or HW won't be able to re-order the operations.
     */
    /* Copy tbl8 entry (only if needed) */
    if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
            RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
        
        unsigned tbl8_index = (uint8_t)ip +
                (((uint32_t)tbl_entry & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES);
        
        ptbl = (const uint32_t *)&lpm->tbl8[tbl8_index];
        tbl_entry = *ptbl;
    }
    *next_hop = ((uint32_t)tbl_entry & 0x00FFFFFF); 
    return (tbl_entry & RTE_LPM_LOOKUP_SUCCESS) ? 0 : -ENOENT;
}
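The same two-step lookup can be exercised on plain arrays of 32-bit words. This is a sketch rather than the library function: `toy_lookup` and the `TOY_*` masks are illustrative names, and the caller is assumed to pass a tbl24 array large enough for every index `ip >> 8` it queries, which allows very small tables when testing with low addresses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VALID     0x01000000u  /* valid flag */
#define TOY_VALID_EXT 0x03000000u  /* valid and valid_group flags */
#define TOY_NH_MASK   0x00FFFFFFu  /* next hop or tbl8 group index */

/* Returns 0 and stores the next hop on a hit, -ENOENT on a miss. */
static inline int toy_lookup(const uint32_t *tbl24, const uint32_t *tbl8,
                             uint32_t ip, uint32_t *next_hop)
{
    assert(tbl24 != NULL && tbl8 != NULL && next_hop != NULL);

    /* First read: the slot selected by the upper 24 bits. */
    uint32_t e = tbl24[ip >> 8];

    /* Second read only when the slot points to a tbl8 group. */
    if ((e & TOY_VALID_EXT) == TOY_VALID_EXT)
        e = tbl8[(e & TOY_NH_MASK) * 256u + (ip & 0xFFu)];

    *next_hop = e & TOY_NH_MASK;
    return (e & TOY_VALID) ? 0 : -ENOENT;
}
```

With a two-slot tbl24 in which slot 0 holds next hop 9 and slot 1 points to tbl8 group 0, address 0.0.0.16 resolves from tbl24 alone, while 0.0.1.5 is resolved from tbl8 entry 5. A lookup costs at most two memory reads, which is the speed the DIR-24-8 layout pays for with memory.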

DPDK version: 19.11.3
