DPDK source code: the mbuf structure

References:
http://blog.csdn.net/hejin_some/article/details/72473031
http://blog.csdn.net/bestboyxie/article/details/52984397
http://dpdk.org/doc/guides/prog_guide/mbuf_lib.html#mbuf-library
http://www.cnblogs.com/yhp-smarthome/p/6687175.html
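This post is not really original work: it combines a translation of the official documentation, excerpts from other blog posts, and a little of my own hands-on experience.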

0. Direct and Indirect Buffers: for an introduction, see http://dpdk.org/doc/guides/prog_guide/mbuf_lib.html#mbuf-library

1. The core mbuf structure

struct rte_mbuf {
    MARKER cacheline0;

    void *buf_addr;           /**< Virtual address of segment buffer. */
    phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */

    uint16_t buf_len;         /**< Length of segment buffer. */

    /* next 6 bytes are initialised on RX descriptor rearm */
    MARKER8 rearm_data;
    uint16_t data_off;

    /**
     * 16-bit Reference counter.
     * It should only be accessed using the following functions:
     * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
     * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
     * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
     * config option.
     */
    RTE_STD_C11
    union {
        rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
        uint16_t refcnt;              /**< Non-atomically accessed refcnt */
    };
    uint8_t nb_segs;          /**< Number of segments. */
    uint8_t port;             /**< Input port. */

    uint64_t ol_flags;        /**< Offload features. */

    /* remaining bytes are set on RX when pulling packet from descriptor */
    MARKER rx_descriptor_fields1;

    /*
     * The packet type, which is the combination of outer/inner L2, L3, L4
     * and tunnel types. The packet_type is about data really present in the
     * mbuf. Example: if vlan stripping is enabled, a received vlan packet
     * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
     * vlan is stripped from the data.
     */
    RTE_STD_C11
    union {
        uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
        struct {
            uint32_t l2_type:4; /**< (Outer) L2 type. */
            uint32_t l3_type:4; /**< (Outer) L3 type. */
            uint32_t l4_type:4; /**< (Outer) L4 type. */
            uint32_t tun_type:4; /**< Tunnel type. */
            uint32_t inner_l2_type:4; /**< Inner L2 type. */
            uint32_t inner_l3_type:4; /**< Inner L3 type. */
            uint32_t inner_l4_type:4; /**< Inner L4 type. */
        };
    };

    uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
    uint16_t data_len;        /**< Amount of data in segment buffer. */
    /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
    uint16_t vlan_tci;

    union {
        uint32_t rss;     /**< RSS hash result if RSS enabled */
        struct {
            RTE_STD_C11
            union {
                struct {
                    uint16_t hash;
                    uint16_t id;
                };
                uint32_t lo;
                /**< Second 4 flexible bytes */
            };
            uint32_t hi;
            /**< First 4 flexible bytes or FD ID, dependent on
                 PKT_RX_FDIR_* flag in ol_flags. */
        } fdir;           /**< Filter identifier if FDIR enabled */
        struct {
            uint32_t lo;
            uint32_t hi;
        } sched;          /**< Hierarchical scheduler */
        uint32_t usr;     /**< User defined tags. See rte_distributor_process() */
    } hash;                   /**< hash information */

    uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */

    /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
    uint16_t vlan_tci_outer;

    /* second cache line - fields only used in slow path or on TX */
    MARKER cacheline1 __rte_cache_min_aligned;

    RTE_STD_C11
    union {
        void *userdata;   /**< Can be used for external metadata */
        uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
    };

    struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
    struct rte_mbuf *next;    /**< Next segment of scattered packet. */

    /* fields to support TX offloads */
    RTE_STD_C11
    union {
        uint64_t tx_offload;       /**< combined for easy fetch */
        __extension__
        struct {
            uint64_t l2_len:7;
            /**< L2 (MAC) Header Length for non-tunneling pkt.
             * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
             */
            uint64_t l3_len:9; /**< L3 (IP) Header Length. */
            uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
            uint64_t tso_segsz:16; /**< TCP TSO segment size */

            /* fields for TX offloading of tunnels */
            uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
            uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */

            /* uint64_t unused:8; */
        };
    };

    /** Size of the application private data. In case of an indirect
     * mbuf, it stores the direct mbuf private data size. */
    uint16_t priv_size;

    /** Timesync flags for use with IEEE1588. */
    uint16_t timesync;
} __rte_cache_aligned;

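2. The mbuf structure illustrated

[Figure 1: an mbuf with a single segment]

rte_pktmbuf_mtod(m, t): returns the start address of the data, cast to type t
headroom and tailroom: the free space before and after the data inside the buffer
Length of the data: rte_pktmbuf_pkt_len(m) or rte_pktmbuf_data_len(m)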

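[Figure 2: an mbuf made of three chained segments]
The next field of each mbuf stores the address of the next segment.
The total packet length (pkt_len) of m is the sum of the data in the three segments seg1 + seg2 + seg3.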
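
The relationship between the chain and the total length can be sketched with a reduced model. This is not DPDK code: struct toy_seg and both helper functions are hypothetical names invented for illustration, keeping only a per-segment data_len and a next pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the fields needed here. */
struct toy_seg {
    uint16_t data_len;      /* bytes of data held by this segment */
    struct toy_seg *next;   /* next segment, or NULL for the last one */
};

/* Walk the chain and add up the per-segment data lengths.
 * In the real structure the first segment caches this sum in pkt_len. */
uint32_t toy_chain_len(const struct toy_seg *m)
{
    uint32_t total = 0;
    for (; m != NULL; m = m->next)
        total += m->data_len;
    return total;
}

/* Count the segments, the value the real structure keeps in nb_segs. */
unsigned toy_chain_segs(const struct toy_seg *m)
{
    unsigned n = 0;
    for (; m != NULL; m = m->next)
        n++;
    return n;
}
```

For three segments holding 100, 200 and 300 bytes, the walk yields a total of 600 bytes across 3 segments.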

A newly created mbuf consists of a single segment with length = 0.
Freeing an mbuf returns it to its memory pool.

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.
Each segment is itself a struct rte_mbuf, so every segment in the chain is released back to the pool it was allocated from.
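
The chain-freeing behaviour can be sketched as a loop over the next pointers. This is a toy model only: struct toy_seg and toy_free_chain() are hypothetical, and free() stands in for returning a segment to its mempool.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chained mbuf segment. */
struct toy_seg {
    struct toy_seg *next;   /* next segment, or NULL for the last one */
};

/* Release every segment of the chain and return how many were released.
 * free() is only a placeholder for handing the segment back to its pool. */
unsigned toy_free_chain(struct toy_seg *m)
{
    unsigned released = 0;
    while (m != NULL) {
        struct toy_seg *next = m->next; /* read the link before releasing m */
        free(m);
        m = next;
        released++;
    }
    return released;
}
```

The one detail worth noting is that the next pointer is saved before the current segment is released, because the segment must not be touched afterward.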

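3. mbuf and how zero-copy is implemented

4. When an mbuf is freed

5. Basic mbuf operations

The rte_mbuf structure resembles sk_buff in the Linux kernel network stack: headroom and tailroom are reserved before and after the memory block that holds the packet, so that an application can add or strip headers without moving the data. The headroom is 128 bytes by default and can be changed through the RTE_PKTMBUF_HEADROOM macro.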

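5.1 Computing the lengths of the mbuf regions
With the structure shown in section 1, the headroom length is m->data_off (releases that still had the pkt sub-structure computed it as m->pkt.data - m->buf_addr), and the tailroom length is m->buf_len - m->data_off - m->data_len. Both calculations are implemented by the following functions: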

uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
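
A minimal sketch of both calculations, assuming the field names of the structure in section 1 (buf_len, data_off, data_len). The struct toy_mbuf type and the toy_ functions are hypothetical and only mirror the arithmetic; they are not the DPDK implementation.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf. */
struct toy_mbuf {
    uint16_t buf_len;   /* total size of the segment buffer */
    uint16_t data_off;  /* offset of the data from the buffer start */
    uint16_t data_len;  /* bytes of data in this segment */
};

/* Headroom: everything between the buffer start and the data. */
uint16_t toy_headroom(const struct toy_mbuf *m)
{
    return m->data_off;
}

/* Tailroom: what is left of the buffer after headroom and data. */
uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    return (uint16_t)(m->buf_len - toy_headroom(m) - m->data_len);
}
```

With an illustrative 2176-byte buffer, 128 bytes of headroom and 64 bytes of data, the tailroom comes out as 2176 - 128 - 64 = 1984 bytes.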

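5.2 Stripping a header from the packet
Suppose the data pointer (m->buf_addr + m->data_off) points at the start of the layer-2 header. The following operations strip the 14-byte layer-2 header:

m->data_off += 14;
m->data_len -= 14;
m->pkt_len -= 14;

These operations are already implemented by rte_pktmbuf_adj(), whose prototype is:
char *rte_pktmbuf_adj(struct rte_mbuf *m, uint16_t len)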
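
A hedged sketch of the same operation on a reduced single-segment model, using the field names from section 1 (data_off, data_len, pkt_len). struct toy_mbuf and toy_adj() are hypothetical; the bounds check is an assumption modelled on the documented behaviour that the call fails when asked to remove more than the segment holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf (single segment). */
struct toy_mbuf {
    char *buf_addr;     /* start of the segment buffer */
    uint16_t buf_len;   /* total size of the buffer */
    uint16_t data_off;  /* offset of the data from buf_addr */
    uint16_t data_len;  /* bytes of data in this segment */
    uint32_t pkt_len;   /* total packet length */
};

/* Remove len bytes from the front of the data.
 * Returns the new data start, or NULL if len exceeds the data present. */
char *toy_adj(struct toy_mbuf *m, uint16_t len)
{
    if (len > m->data_len)
        return NULL;                       /* nothing is modified on failure */
    m->data_off = (uint16_t)(m->data_off + len);
    m->data_len = (uint16_t)(m->data_len - len);
    m->pkt_len -= len;
    return m->buf_addr + m->data_off;
}
```

Calling toy_adj(&m, 14) on a 60-byte frame that starts at offset 128 moves the data start to offset 142 and leaves 46 bytes of data.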

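5.3 Adding a header to the packet
The following operations prepend a 14-byte layer-2 header to an IP packet:

m->data_off -= 14;
m->data_len += 14;
m->pkt_len += 14;

These operations are implemented by rte_pktmbuf_prepend(), whose prototype is:
char *rte_pktmbuf_prepend(struct rte_mbuf *m, uint16_t len)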
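
The prepend step can be sketched the same way, again on a hypothetical struct toy_mbuf with the field names from section 1. toy_prepend() is an illustrative name, and the headroom check reflects the assumption that the call must fail rather than move the data start before the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf (single segment). */
struct toy_mbuf {
    char *buf_addr;     /* start of the segment buffer */
    uint16_t buf_len;   /* total size of the buffer */
    uint16_t data_off;  /* offset of the data from buf_addr (the headroom) */
    uint16_t data_len;  /* bytes of data in this segment */
    uint32_t pkt_len;   /* total packet length */
};

/* Grow the data by len bytes at the front, consuming headroom.
 * Returns the new data start, or NULL if the headroom is too small. */
char *toy_prepend(struct toy_mbuf *m, uint16_t len)
{
    if (len > m->data_off)
        return NULL;                       /* not enough headroom */
    m->data_off = (uint16_t)(m->data_off - len);
    m->data_len = (uint16_t)(m->data_len + len);
    m->pkt_len += len;
    return m->buf_addr + m->data_off;
}
```

The returned pointer is where the caller writes the new header; a NULL return has to be handled before any write.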

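5.4 Appending data in the tailroom
To add N bytes of data in the tailroom, the following operations are enough:
tail = (char *)m->buf_addr + m->data_off + m->data_len; // tail is the start address of the tailroom
m->data_len += N;
m->pkt_len += N;

These operations are implemented by rte_pktmbuf_append(), whose prototype is:
char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)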
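
A minimal single-segment sketch of the append step. struct toy_mbuf and toy_append() are hypothetical names; the model ignores chained segments and assumes the call fails when the tailroom is too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf (single segment). */
struct toy_mbuf {
    char *buf_addr;     /* start of the segment buffer */
    uint16_t buf_len;   /* total size of the buffer */
    uint16_t data_off;  /* offset of the data from buf_addr */
    uint16_t data_len;  /* bytes of data in this segment */
    uint32_t pkt_len;   /* total packet length */
};

/* Grow the data by len bytes at the back, consuming tailroom.
 * Returns the old end of data (where the new bytes go), or NULL. */
char *toy_append(struct toy_mbuf *m, uint16_t len)
{
    uint16_t tailroom = (uint16_t)(m->buf_len - m->data_off - m->data_len);
    char *tail = m->buf_addr + m->data_off + m->data_len;

    if (len > tailroom)
        return NULL;                       /* not enough tailroom */
    m->data_len = (uint16_t)(m->data_len + len);
    m->pkt_len += len;
    return tail;
}
```

The pointer handed back is the end of the data before the call, so the caller can copy the new bytes there directly.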

5.5 Removing data from the end
librte_mbuf also provides rte_pktmbuf_trim(), which removes the last N bytes from the data region of the mbuf. In essence it does the following:

m->data_len -= N;
m->pkt_len -= N;

Its prototype is:
int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
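
A hedged single-segment sketch of the trim step. struct toy_mbuf and toy_trim() are hypothetical; the 0 / -1 return convention is an assumption chosen to match the int return type of the prototype above.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf (single segment). */
struct toy_mbuf {
    char *buf_addr;     /* start of the segment buffer */
    uint16_t buf_len;   /* total size of the buffer, never changed here */
    uint16_t data_off;  /* offset of the data from buf_addr */
    uint16_t data_len;  /* bytes of data in this segment */
    uint32_t pkt_len;   /* total packet length */
};

/* Drop len bytes from the end of the data.
 * Returns 0 on success, -1 if len exceeds the data present. */
int toy_trim(struct toy_mbuf *m, uint16_t len)
{
    if (len > m->data_len)
        return -1;                         /* nothing is modified on failure */
    m->data_len = (uint16_t)(m->data_len - len);
    m->pkt_len -= len;
    return 0;
}
```

Only data_len and pkt_len shrink; data_off and buf_len stay as they were, so the freed bytes simply become tailroom.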

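Key points:
(1) The packet data always lives in the data region, which is controlled mainly by data_off and data_len:
buf_addr + data_off = start address of the data
buf_addr + data_off + data_len = end address of the data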

(2) The relationship between pkt_len and data_len

uint32_t pkt_len;  /**< Total pkt len: sum of all segments. */
uint16_t data_len; /**< Amount of data in segment buffer. */

If the packet consists of a single mbuf, pkt_len and data_len have the same value.

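(3) API usage scenarios

rte_pktmbuf_prepend
Moves data_off backward into the headroom (grows the packet at the front). Check the return value: if the requested length exceeds the available headroom, it returns NULL. Typical use: a packet travels down from the application layer and each layer adds its own header.

rte_pktmbuf_append
Increases data_len and returns the tail address from before the change (grows the packet at the back). Typical use: the header is written first and the payload is filled in afterward.

rte_pktmbuf_adj
Increases data_off (shrinks the packet from the front). Typical use: stripping the layer-2 header when a packet is handed from layer 2 to layer 3 for forwarding.

rte_pktmbuf_trim
Decreases data_len and pkt_len (shrinks the packet from the back); buf_len does not change. Typical use: more space was reserved than the data actually needs.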

Summary:
These four APIs are the usual way to resize the data region. Their names and usage are similar to those of the kernel's sk_buff helpers.

rte_pktmbuf_mtod
rte_pktmbuf_mtod_offset

#define rte_pktmbuf_mtod_offset(m, t, o)    \
    ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))

#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)

These two macros simply return buf_addr + data_off + the caller-supplied offset, cast to the requested type.
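
As an illustration of how such macros are typically used, the sketch below applies macros of the same shape to a reduced model and reads the EtherType of a frame. struct toy_mbuf, struct toy_eth_hdr, the toy_ macros and toy_ether_type() are all hypothetical names; the header layout is a plain 14-byte Ethernet header assumed for the example.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf. */
struct toy_mbuf {
    void *buf_addr;     /* start of the segment buffer */
    uint16_t data_off;  /* offset of the data from buf_addr */
};

/* Same shape as the two macros quoted above, applied to the toy struct. */
#define toy_mtod_offset(m, t, o) \
    ((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
#define toy_mtod(m, t) toy_mtod_offset(m, t, 0)

/* Illustrative 14-byte Ethernet header; byte arrays avoid padding. */
struct toy_eth_hdr {
    uint8_t dst[6];
    uint8_t src[6];
    uint8_t ether_type[2];  /* big-endian on the wire */
};

/* Read the EtherType of the frame that starts at the data pointer. */
uint16_t toy_ether_type(const struct toy_mbuf *m)
{
    const struct toy_eth_hdr *eth = toy_mtod(m, const struct toy_eth_hdr *);
    return (uint16_t)((eth->ether_type[0] << 8) | eth->ether_type[1]);
}
```

The offset variant reaches past a known header without touching data_off, for example toy_mtod_offset(m, const uint8_t *, 14) for the first byte after the Ethernet header.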

Learn things thoroughly and build up your own body of knowledge step by step. The DPDK source code is well worth studying.
