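Notes on L2fwd in DPDK
L2fwd performs layer-2 (Ethernet) forwarding, working at the level of MAC addresses. As the official documentation describes, the application needs at least two ports so that it can receive and transmit in both directions, and the destination port is the port adjacent to the receiving port, so the number of enabled ports should be even. For example, with four ports, the first and second ports forward to each other and the third and fourth ports forward to each other (port IDs 0/1 and 2/3 in DPDK's zero-based numbering).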
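The adjacent-port pairing can be sketched in plain C without any DPDK headers. This is a minimal illustration, not the sample's actual code: pair_ports and MAX_PORTS are hypothetical names, and looping an unpaired last port back to itself is an assumption about how an odd port count could be handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 32 /* assumption: one bit per port in a 32-bit portmask */

/*
 * Fill dst[] so that dst[p] is the port that traffic received on p is
 * sent to. Enabled ports are paired in ascending order; disabled ports
 * get -1. Returns the number of enabled ports.
 */
static int pair_ports(uint32_t portmask, int dst[MAX_PORTS])
{
    int last = -1;   /* first port of a pair still waiting for a partner */
    int enabled = 0;

    assert(dst != NULL);
    for (int p = 0; p < MAX_PORTS; p++) {
        dst[p] = -1;
        if ((portmask & (UINT32_C(1) << p)) == 0)
            continue;
        enabled++;
        if (last < 0) {
            last = p;
        } else {
            dst[p] = last;
            dst[last] = p;
            last = -1;
        }
    }
    /* Assumption: an unpaired port simply forwards back to itself. */
    if (last >= 0)
        dst[last] = last;
    return enabled;
}
```

Calling pair_ports(0xf, dst) pairs port 0 with 1 and port 2 with 3, which matches the four-port example above.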
Test procedure:
[l2fwd] mount | grep /mnt/huge
[l2fwd] mkdir -p /mnt/huge
[l2fwd] mount -t hugetlbfs nodev /mnt/huge
[l2fwd] echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
[l2fwd] export RTE_SDK=/root/dpdk-16.07
[l2fwd] export RTE_TARGET=x86_64-native-linuxapp-gcc
[l2fwd] make
[build] ./l2fwd -c 0xf -n 3 -- -p 0x3 -q 1
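-c 0xf  Hexadecimal bitmask of the cores to run on. 0xf has four bits set, so four cores (lcores 0 to 3) are allocated.
-n 3    Number of memory channels per processor.
--      Separates the EAL arguments from the application arguments that follow.
-p 0x3  Hexadecimal bitmask of the ports to enable. 0x3 has its two lowest bits set, so two ports are enabled.
-q 1    Number of queues (ports) per lcore. Here each logical core polls one queue.
The official documentation describes -q as follows: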
The application uses one lcore to poll one or several ports, depending on the -q option, which specifies the number of queues per lcore.
For example, if the user specifies -q 4, the application is able to poll four ports with one lcore. If there are 16 ports on the target (and if the portmask argument is -p ffff), the application will need four lcores to poll all the ports.
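The arithmetic behind the masks and the -q option can be checked with a small sketch. The helper names count_bits and lcores_needed are hypothetical and are not part of DPDK. The code only counts set bits and does a ceiling division.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits set in a mask such as -c 0xf or -p 0x3. */
static unsigned count_bits(uint64_t mask)
{
    unsigned n = 0;

    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* lcores needed to poll every enabled port when each lcore polls up to q ports. */
static unsigned lcores_needed(uint64_t portmask, unsigned q)
{
    assert(q > 0);
    return (count_bits(portmask) + q - 1) / q; /* ceiling division */
}
```

With the values from the quoted example, lcores_needed(0xffff, 4) gives 4, and for the test command (-p 0x3 -q 1) two lcores are needed to poll both ports.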
How the L2fwd code runs:
(1) Argument passing. The command line ./l2fwd -c 0xf -n 3 -- -p 0x3 -q 1 is handed to the program; this happens in main().
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
    rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
argc -= ret;
argv += ret;
/* parse application arguments (after the EAL ones) */
ret = l2fwd_parse_args(argc, argv);
if (ret < 0)
    rte_exit(EXIT_FAILURE, "Invalid L2FWD arguments\n");
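The effect of argc -= ret and argv += ret can be illustrated with a stand-in parser. fake_eal_parse below is hypothetical and is not how rte_eal_init() is implemented. It only assumes that the EAL consumes everything up to and including "--", reuses that slot for the program name, and returns the number of entries to skip.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the EAL parser: consume arguments up to and
 * including "--", put the program name in that slot, and return how many
 * leading entries the caller should skip. Without "--" every argument is
 * treated as an EAL argument.
 */
static int fake_eal_parse(int argc, char **argv)
{
    assert(argc >= 1);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            argv[i] = argv[0]; /* application parser sees a normal argv[0] */
            return i;
        }
    }
    argv[argc - 1] = argv[0];
    return argc - 1;
}
```

For the test command, the stand-in returns 5, so after the shift the application parser sees five entries: the program name followed by -p 0x3 -q 1.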
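(2) Mbuf pool initialization
The mbuf pool holds the buffers that the driver and the application use to store packet data. It is created with rte_mempool_create(), which takes two callbacks. The first is rte_pktmbuf_pool_init(), which is provided by the mbuf API and can be copied or extended by the developer. The second is the per-mbuf initializer; the default, rte_pktmbuf_init(), is used.
(3) Driver initialization
DPDK uses poll mode drivers, which avoid interrupts and greatly improve performance.
(4) RX queue initialization. This step depends on the -q parameter.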
ret = rte_eth_rx_queue_setup((uint8_t)portid, 0, nb_rxd, SOCKET0,
                             &rx_conf, l2fwd_pktmbuf_pool);
if (ret < 0)
    rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup: "
             "err=%d, port=%u\n",
             ret, portid);
The list of queues that a given lcore must poll is stored in the lcore_queue_conf structure:
struct lcore_queue_conf {
    unsigned n_rx_port;
    unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
    struct mbuf_table tx_mbufs[L2FWD_MAX_PORTS];
} rte_cache_aligned;
struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
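How ports end up in rx_port_list can be sketched as a simple fill loop. This is an illustration under stated assumptions, not the sample's code: struct lcore_conf_sketch, assign_ports and MAX_LCORE are hypothetical names, every lcore is assumed to be enabled, and ports are assumed to be numbered 0 to n_ports-1.

```c
#include <assert.h>

#define MAX_RX_QUEUE_PER_LCORE 16
#define MAX_LCORE 8 /* assumption: small table for the sketch */

struct lcore_conf_sketch {
    unsigned n_rx_port;
    unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
};

/*
 * Assign ports 0..n_ports-1 to lcores in order, at most q ports per lcore.
 * Returns the number of lcores used, or -1 if MAX_LCORE is not enough.
 */
static int assign_ports(struct lcore_conf_sketch conf[MAX_LCORE],
                        unsigned n_ports, unsigned q)
{
    unsigned lcore = 0;

    assert(q > 0 && q <= MAX_RX_QUEUE_PER_LCORE);
    for (unsigned i = 0; i < MAX_LCORE; i++)
        conf[i].n_rx_port = 0;

    for (unsigned port = 0; port < n_ports; port++) {
        /* Move on once the current lcore already polls q ports. */
        while (lcore < MAX_LCORE && conf[lcore].n_rx_port == q)
            lcore++;
        if (lcore == MAX_LCORE)
            return -1;
        conf[lcore].rx_port_list[conf[lcore].n_rx_port++] = port;
    }
    return n_ports == 0 ? 0 : (int)lcore + 1;
}
```

With two ports and -q 1, each of the first two lcores polls exactly one port; with 16 ports and -q 4, four lcores each poll four ports.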
Global configuration for the RX queues:
static const struct rte_eth_rxconf rx_conf = {
    .rx_thresh = {
        .pthresh = RX_PTHRESH,
        .hthresh = RX_HTHRESH,
        .wthresh = RX_WTHRESH,
    },
};
(5) TX queue initialization
Each lcore should be able to transmit on any port, so a single TX queue is initialized on every port.
/* init one TX queue on each port */
fflush(stdout);
ret = rte_eth_tx_queue_setup((uint8_t)portid, 0, nb_txd,
                             rte_eth_dev_socket_id(portid), &tx_conf);
if (ret < 0)
    rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
             ret, (unsigned)portid);
Global configuration for the TX queues:
static const struct rte_eth_txconf tx_conf = {
    .tx_thresh = {
        .pthresh = TX_PTHRESH,
        .hthresh = TX_HTHRESH,
        .wthresh = TX_WTHRESH,
    },
    .tx_free_thresh = RTE_TEST_TX_DESC_DEFAULT + 1, /* disable feature */
};
(6) Receiving packets
/*
 * Read packet from RX queues
 */
for (i = 0; i < qconf->n_rx_port; i++) {
    portid = qconf->rx_port_list[i];
    nb_rx = rte_eth_rx_burst((uint8_t)portid, 0, pkts_burst, MAX_PKT_BURST);
    for (j = 0; j < nb_rx; j++) {
        m = pkts_burst[j];
        rte_prefetch0(rte_pktmbuf_mtod(m, void *));
        l2fwd_simple_forward(m, portid);
    }
}
Packets are read in a burst of size MAX_PKT_BURST. The rte_eth_rx_burst() function writes the mbuf pointers in a local table and returns the number of available mbufs in the table.
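(7) Processing packets
Processing is very simple: the TX port is derived from the RX port, and then the source and destination MAC addresses are replaced.
During testing, however, I found that communication stopped working once the destination MAC address was replaced, and I do not know why the destination MAC address is replaced at all. After commenting out the highlighted code (the lines that rewrite the destination MAC address), communication worked normally.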
static void
l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
{
    struct ether_hdr *eth;
    void *tmp;
    unsigned dst_port;

    dst_port = l2fwd_dst_ports[portid];
    eth = rte_pktmbuf_mtod(m, struct ether_hdr *);

    /* 02:00:00:00:00:xx */
    tmp = &eth->d_addr.addr_bytes[0];
    *((uint64_t *)tmp) = 0x000000000002 + ((uint64_t)dst_port << 40);

    /* src addr */
    ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);

    l2fwd_send_packet(m, (uint8_t)dst_port);
}
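The 64-bit store in l2fwd_simple_forward() is easy to misread, so here is a minimal sketch of which bytes it produces, assuming a little-endian CPU such as x86. dst_mac_bytes is a hypothetical helper that models the store with shifts instead of a pointer cast. It may also hint at the failure described above: the frame leaves with destination 02:00:00:00:00:xx, and a peer NIC that filters on its own MAC address would be expected to drop it unless it runs in promiscuous mode. That explanation is an assumption and was not verified against the original test setup.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model the 8-byte little-endian store: byte i of the frame receives
 * bits 8*i..8*i+7 of the value. out[0..5] is the destination MAC;
 * out[6..7] spill into the first two bytes of the source MAC, which the
 * later ether_addr_copy() call overwrites.
 */
static void dst_mac_bytes(unsigned dst_port, uint8_t out[8])
{
    uint64_t v = UINT64_C(0x000000000002) + ((uint64_t)dst_port << 40);

    assert(dst_port <= 0xff);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

For dst_port 1 the first six bytes are 02 00 00 00 00 01, that is, a locally administered address whose last byte is the destination port number.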
(8) Sending packets
/* Send the packet on an output interface */
static int
l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
{
    unsigned lcore_id, len;
    struct lcore_queue_conf *qconf;

    lcore_id = rte_lcore_id();
    qconf = &lcore_queue_conf[lcore_id];
    len = qconf->tx_mbufs[port].len;
    qconf->tx_mbufs[port].m_table[len] = m;
    len++;

    /* enough pkts to be sent */
    if (unlikely(len == MAX_PKT_BURST)) {
        l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
        len = 0;
    }

    qconf->tx_mbufs[port].len = len;
    return 0;
}
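The buffering in l2fwd_send_packet() can be modeled without DPDK. In the sketch below, struct tx_buffer_sketch and buffer_packet are hypothetical names, the burst size of 32 is an assumption, and the flush is replaced by a counter where the real code calls l2fwd_send_burst().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PKT_BURST 32 /* assumption: burst size used for the sketch */

struct tx_buffer_sketch {
    unsigned len;                       /* packets currently queued */
    const void *m_table[MAX_PKT_BURST]; /* queued packet pointers */
    unsigned bursts_sent;               /* number of full bursts flushed */
};

/* Queue one packet; flush (here: just count) when the buffer is full. */
static void buffer_packet(struct tx_buffer_sketch *buf, const void *pkt)
{
    assert(buf != NULL && buf->len < MAX_PKT_BURST);
    buf->m_table[buf->len++] = pkt;
    if (buf->len == MAX_PKT_BURST) {
        buf->bursts_sent++; /* stand-in for l2fwd_send_burst() */
        buf->len = 0;
    }
}
```

One consequence of this design is that a packet can sit in the buffer until the burst fills up, so a forwarding loop built this way also needs some periodic drain of partially filled buffers to bound latency under light traffic.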