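The previous articles covered the technical architecture of DPDK's various modules. Starting with this one, we will walk through the code flow from the beginning. Once DPDK's code architecture and functionality are roughly understood, we will move on to applications built on top of DPDK.
This is a progression from the parts to the whole and from fundamentals to applications, and along the way earlier material will need to be revisited and supplemented.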
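The packet transfer flow was analyzed in detail earlier. Any I/O interface necessarily consists of two parts: receiving data and sending data. At a high level, both directions follow the same path: from the NIC into a queue (ring) and from there to the application. DPDK's defining trait is that it keeps the kernel out of the data path as much as possible: the NIC moves packets into memory by DMA, and the application exchanges data with the NIC directly through these queues.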
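To make the "NIC → ring → application" idea concrete, here is a toy single-producer/single-consumer ring. It is only a sketch under simplifying assumptions: it is single-threaded, and the names (toy_ring, RING_SIZE) are hypothetical. It borrows one idea from DPDK's rte_ring, namely a power-of-two capacity so that wrap-around is a cheap bit mask, but the real rte_ring is lock-free and supports multiple producers and consumers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                  /* assumption: must be a power of two */
#define RING_MASK (RING_SIZE - 1u)
_Static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct toy_ring {
    uint32_t head;                    /* next slot the producer (NIC side) writes */
    uint32_t tail;                    /* next slot the consumer (app side) reads  */
    void *slots[RING_SIZE];
};

/* Enqueue up to n objects; returns how many actually fit. */
static unsigned toy_ring_enqueue_burst(struct toy_ring *r, void **objs, unsigned n)
{
    assert(r != NULL && objs != NULL);
    uint32_t free_slots = RING_SIZE - (r->head - r->tail); /* unsigned wrap is fine */
    if (n > free_slots)
        n = free_slots;
    for (unsigned i = 0; i < n; i++)
        r->slots[(r->head + i) & RING_MASK] = objs[i];
    r->head += n;
    return n;
}

/* Dequeue up to n objects; returns how many were available. */
static unsigned toy_ring_dequeue_burst(struct toy_ring *r, void **objs, unsigned n)
{
    assert(r != NULL && objs != NULL);
    uint32_t used = r->head - r->tail;
    if (n > used)
        n = used;
    for (unsigned i = 0; i < n; i++)
        objs[i] = r->slots[(r->tail + i) & RING_MASK];
    r->tail += n;
    return n;
}
```

Both functions work in bursts and may return fewer objects than requested, which is the same contract rte_eth_rx_burst() and rte_eth_tx_burst() follow later in basicfwd.c.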
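This interaction can run in one of three modes: polling, interrupt-driven, or a hybrid of the two. With that understood, and together with the earlier articles, the overall flow and most of the program details should already be familiar. What matters most now is seeing how the DPDK source code actually implements them.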
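The difference between the three modes can be shown with a small state machine. This is a hypothetical model, not DPDK code: pure polling (what basicfwd.c does) keeps calling the receive function and burns a full core, interrupt mode sleeps until the NIC signals, and the hybrid variant below polls while traffic flows and falls back to waiting for an interrupt after IDLE_THRESHOLD consecutive empty polls. The threshold value is an arbitrary assumption, and the real interrupt plumbing (for example rte_eth_dev_rx_intr_enable()) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

enum rx_mode { RX_POLL, RX_INTERRUPT };

#define IDLE_THRESHOLD 3u             /* assumption: illustrative value only */

struct rx_state {
    enum rx_mode mode;
    unsigned idle_polls;              /* consecutive polls that returned 0 packets */
};

/* Feed the result of one poll (nb_rx packets); returns the mode to use next. */
static enum rx_mode rx_step(struct rx_state *s, unsigned nb_rx)
{
    assert(s != NULL);
    if (nb_rx > 0) {
        /* Traffic present (or an interrupt woke us up): keep polling. */
        s->idle_polls = 0;
        s->mode = RX_POLL;
    } else if (++s->idle_polls >= IDLE_THRESHOLD) {
        /* Queue looks idle: stop spinning and wait for an interrupt. */
        s->mode = RX_INTERRUPT;
    }
    return s->mode;
}
```

The trade-off is latency versus CPU usage: polling gives the lowest latency, interrupts save CPU when the link is quiet, and the hybrid tries to get both at the cost of a wake-up delay on the first packet after an idle period.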
Without further ado, let's get straight to the code (examples\skeleton\basicfwd.c):
Start with the entry function:
/*
 * The main function, which does initialization and calls the per-lcore
 * functions.
 */
int
main(int argc, char *argv[])
{
    struct rte_mempool *mbuf_pool;
    unsigned nb_ports;
    uint16_t portid;

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
    argc -= ret;
    argv += ret;

    /* Check that there is an even number of ports to send/receive on. */
    nb_ports = rte_eth_dev_count_avail();
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /* Creates a new mempool in memory to hold the mbufs. */
    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
        MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");

    /* Initialize all ports. */
    RTE_ETH_FOREACH_DEV(portid)
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
                    portid);

    if (rte_lcore_count() > 1)
        printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");

    /* Call lcore_main on the master core only. */
    lcore_main();

    /* clean up the EAL */
    rte_eal_cleanup();
    return 0;
}
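The two lines `argc -= ret; argv += ret;` in main() rely on rte_eal_init() returning the number of arguments it consumed. The sketch below models that bookkeeping without DPDK. The function name model_eal_parse is hypothetical, and it assumes that everything before "--" is an EAL option (the real parser uses getopt_long and validates each option); like EAL, it writes the program name into the slot just before the first application argument so that argv[0] is still the program name after the shift.

```c
#include <assert.h>
#include <string.h>

/* Returns the number of arguments "consumed", mimicking rte_eal_init(). */
static int model_eal_parse(int argc, char **argv)
{
    assert(argc >= 1);
    int i = 1;
    while (i < argc && strcmp(argv[i], "--") != 0)
        i++;
    /* i is the index of "--", or argc if there is none. */
    int ret = (i < argc) ? i : argc - 1;
    argv[ret] = argv[0];              /* program name lands right before app args */
    return ret;
}
```

For `app -l 0 -n 4 -- -p 3` this returns 5, so after the shift main() sees `argc == 3` and `argv == {"app", "-p", "3"}`, exactly what an application-specific option parser expects.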
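Looking at the code, main() first calls rte_eal_init() to initialize all relevant parameters and settings. rte_eal_init() returns the number of command-line arguments it consumed, which is why main() then subtracts ret from argc and advances argv, leaving only the application-specific arguments: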
/* Launch threads, called at application init(). */
int rte_eal_init(int argc, char **argv)
{
    ......
    /* checks if the machine is adequate */
    if (!rte_cpu_is_supported()) {
        rte_eal_init_alert("unsupported cpu type.");
        rte_errno = ENOTSUP;
        return -1;
    }
    if (!rte_atomic32_test_and_set(&run_once)) {
        rte_eal_init_alert("already called initialization.");
        rte_errno = EALREADY;
        return -1;
    }
    ......
    /* Call each registered callback, if enabled */
    rte_option_init();
    return fctret;
}
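This function has been analyzed several times before, yet many of its details were never covered thoroughly; one tends to read every line carefully only when that particular part is actually needed. The excerpt shows two early guards: the CPU must support the features DPDK was built for, and the function may run only once per process (a second call fails with rte_errno set to EALREADY). Beyond that, the function mostly initializes a long list of things: parameters, configuration, hugepage memory, and so on. Even without knowing exactly what each step sets up, the sheer number of *_init calls makes it clear that its job is initialization.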
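The run_once guard in the excerpt can be reproduced with standard C11 atomics. This is a sketch, not the EAL code: it swaps rte_atomic32_test_and_set() for atomic_flag_test_and_set(), uses errno instead of rte_errno, and the function name toy_init is hypothetical. Note the inverted sense: rte_atomic32_test_and_set() returns non-zero when it succeeds in setting the flag, whereas atomic_flag_test_and_set() returns the previous value, so true means initialization has already run.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_flag run_once = ATOMIC_FLAG_INIT;

static int toy_init(void)
{
    if (atomic_flag_test_and_set(&run_once)) {
        /* Flag was already set: someone called us before. */
        errno = EALREADY;             /* rte_eal_init() stores this in rte_errno */
        return -1;
    }
    /* ... the real function would set up hugepages, lcores, buses, etc. ... */
    return 0;
}
```

Because the test-and-set is atomic, the guard also holds if two threads race to initialize: exactly one of them sees the flag clear.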
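Next, the available ports are counted with rte_eth_dev_count_avail(), and main() rejects any count that is less than two or odd: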
uint16_t
rte_eth_dev_count_avail(void)
{
    uint16_t p;
    uint16_t count;

    count = 0;
    RTE_ETH_FOREACH_DEV(p)
        count++;
    return count;
}
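A simplified model of what this count means, plus the even-port check from main(). The struct toy_port and its fields are hypothetical; the model assumes, roughly as RTE_ETH_FOREACH_DEV does, that a port is counted only if it is in use and has no owner (owner id 0 here).

```c
#include <assert.h>
#include <stdint.h>

struct toy_port {
    int attached;                     /* 1 if the device is attached/in use */
    uint64_t owner_id;                /* 0 means "no owner" in this model   */
};

static uint16_t toy_count_avail(const struct toy_port *ports, uint16_t n)
{
    uint16_t count = 0;
    for (uint16_t p = 0; p < n; p++)
        if (ports[p].attached && ports[p].owner_id == 0)
            count++;
    return count;
}

/* Same condition basicfwd.c uses: at least two ports and an even number. */
static int ports_usable(unsigned nb_ports)
{
    return !(nb_ports < 2 || (nb_ports & 1));
}
```

The even-count requirement exists because ports are forwarded in pairs (0 with 1, 2 with 3, and so on); an odd port would have no partner.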
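Then the mbuf memory pool is created. rte_pktmbuf_pool_create() is a thin wrapper that calls rte_pktmbuf_pool_create_by_ops() with NULL as the ops name, which selects the default mempool handler: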
/* helper to create a mbuf pool */
struct rte_mempool *
rte_pktmbuf_pool_create(const char *name, unsigned int n,
    unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
    int socket_id)
{
    return rte_pktmbuf_pool_create_by_ops(name, n, cache_size, priv_size,
            data_room_size, socket_id, NULL);
}
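A back-of-the-envelope estimate of how much memory the pool in main() needs. The constants are assumptions, labeled as such in the code: NUM_MBUFS = 8191 as defined in basicfwd.c, sizeof(struct rte_mbuf) = 128 bytes on a typical 64-bit build, and RTE_MBUF_DEFAULT_BUF_SIZE = 2048 bytes of data room plus 128 bytes of headroom. Each pool element holds the mbuf header, the private area (priv_size, 0 here) and the data room; mempool object headers and alignment padding are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_NUM_MBUFS      8191u          /* value assumed from basicfwd.c */
#define ASSUMED_MBUF_HDR_SIZE  128u           /* assumed sizeof(struct rte_mbuf) */
#define ASSUMED_DEFAULT_BUF    (2048u + 128u) /* assumed data room + headroom */

/* One element = mbuf header + private area + data room. */
static size_t mbuf_elt_size(unsigned priv_size, unsigned data_room_size)
{
    return (size_t)ASSUMED_MBUF_HDR_SIZE + priv_size + data_room_size;
}

/* Approximate pool payload for NUM_MBUFS * nb_ports elements. */
static size_t pool_bytes(unsigned nb_ports)
{
    size_t n = (size_t)ASSUMED_NUM_MBUFS * nb_ports;
    return n * mbuf_elt_size(0, ASSUMED_DEFAULT_BUF);
}
```

With two ports this gives 16382 elements of 2304 bytes, about 36 MiB. The odd-looking 8191 is 2^13 - 1, the size the mempool documentation recommends as optimal for memory usage.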
Next, each port is initialized by calling port_init():
/* basicfwd.c: Basic DPDK skeleton forwarding example. */
/*
 * Initializes a given port using global settings and with the RX buffers
 * coming from the mbuf_pool passed as a parameter.
 */
static inline int
port_init(uint16_t port, struct rte_mempool *mbuf_pool)
{
    struct rte_eth_conf port_conf = port_conf_default;
    const uint16_t rx_rings = 1, tx_rings = 1;
    ......
    /* Enable RX in promiscuous mode for the Ethernet device. */
    retval = rte_eth_promiscuous_enable(port);
    if (retval != 0)
        return retval;
    return 0;
}
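Then rte_lcore_count() is used to check the number of logical cores. This program only ever uses one core, so enabling more than one brings no benefit; the program just prints a warning.
Finally, lcore_main() is called: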
static __attribute__((noreturn)) void
lcore_main(void)
{
    uint16_t port;

    /*
     * Check that the port is on the same NUMA node as the polling thread
     * for best performance.
     */
    RTE_ETH_FOREACH_DEV(port)
        if (rte_eth_dev_socket_id(port) >= 0 &&
                rte_eth_dev_socket_id(port) !=
                        (int)rte_socket_id())
            printf("WARNING, port %u is on remote NUMA node to "
                    "polling thread.\n\tPerformance will "
                    "not be optimal.\n", port);

    printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
            rte_lcore_id());

    /* Run until the application is quit or killed. */
    for (;;) {
        /*
         * Receive packets on a port and forward them on the paired
         * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
         */
        RTE_ETH_FOREACH_DEV(port) {
            /* Get burst of RX packets, from first port of pair. */
            struct rte_mbuf *bufs[BURST_SIZE];
            const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
                    bufs, BURST_SIZE);

            if (unlikely(nb_rx == 0))
                continue;

            /* Send burst of TX packets, to second port of pair. */
            const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
                    bufs, nb_rx);

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_rx)) {
                uint16_t buf;
                for (buf = nb_tx; buf < nb_rx; buf++)
                    rte_pktmbuf_free(bufs[buf]);
            }
        }
    }
}
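The core of the loop above is: receive a burst, send it to the paired port `port ^ 1`, and free whatever the TX queue did not accept. The DPDK-free simulation below isolates that logic. All names are hypothetical: toy_txq stands in for a TX queue that can take at most `room` more packets, and instead of calling rte_pktmbuf_free() the sketch returns how many packets would have been freed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BURST 32u                 /* plays the role of BURST_SIZE */

struct toy_txq {
    unsigned room;                    /* free TX descriptors left */
    unsigned sent;                    /* packets accepted so far  */
};

/* Accepts as many packets as fit, like a partially full TX ring. */
static uint16_t toy_tx_burst(struct toy_txq *q, uint16_t nb_pkts)
{
    uint16_t n = nb_pkts < q->room ? nb_pkts : (uint16_t)q->room;
    q->room -= n;
    q->sent += n;
    return n;
}

/* One forwarding step for `port`; returns packets the caller must free. */
static unsigned toy_forward(uint16_t port, uint16_t nb_rx, struct toy_txq *txqs)
{
    assert(nb_rx <= TOY_BURST);
    if (nb_rx == 0)
        return 0;                     /* nothing received: skip, like `continue` */
    uint16_t dst = port ^ 1;          /* 0<->1, 2<->3, ... flips the lowest bit */
    uint16_t nb_tx = toy_tx_burst(&txqs[dst], nb_rx);
    return (unsigned)(nb_rx - nb_tx); /* rte_pktmbuf_free() each of these */
}
```

Freeing the unsent mbufs matters: tx_burst only takes ownership of the packets it accepts, so anything left over would otherwise leak from the pool and eventually starve RX.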
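lcore_main() is where packets are actually received and forwarded. On exit, rte_eal_cleanup() should be called to release EAL resources before returning. Note, however, that lcore_main() is declared noreturn and loops forever, so in this sample the rte_eal_cleanup() call in main() is never actually reached; pressing Ctrl+C simply terminates the process. For reference, rte_eal_cleanup() looks like this: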
int
rte_eal_cleanup(void)
{
    /* if we're in a primary process, we need to mark hugepages as freeable
     * so that finalization can release them back to the system.
     */
    if (rte_eal_process_type() == RTE_PROC_PRIMARY)
        rte_memseg_walk(mark_freeable, NULL);
    rte_service_finalize();
#ifdef VFIO_PRESENT
    vfio_mp_sync_cleanup();
#endif
    rte_mp_channel_cleanup();
    eal_cleanup_config(&internal_config);
    return 0;
}
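Since the forwarding loop in basicfwd.c never returns, its cleanup code is unreachable. A common way to make such a loop stoppable, used for example in DPDK's l2fwd sample, is a signal handler that sets a flag the loop checks. The sketch below shows only that pattern, without DPDK; run_until_signal and stop_after are hypothetical names, and to stay self-contained the loop raises SIGINT itself to simulate Ctrl+C.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t force_quit = 0;

static void on_signal(int signum)
{
    (void)signum;
    force_quit = 1;                   /* only async-signal-safe work here */
}

/* Returns the number of loop iterations executed before the "Ctrl+C". */
static unsigned run_until_signal(unsigned stop_after)
{
    assert(stop_after > 0);
    unsigned iters = 0;
    force_quit = 0;
    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
    while (!force_quit) {
        /* ... rx_burst / tx_burst would go here ... */
        if (++iters == stop_after)
            raise(SIGINT);            /* handler runs before raise() returns */
    }
    /* A real program would now stop the ports and call rte_eal_cleanup(). */
    return iters;
}
```

With this structure, lcore_main() could no longer be marked noreturn, and main() would actually reach its cleanup call.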
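The program is fairly simple, and because it forwards traffic from one port to another, it covers both receiving and sending packets, which makes it easy to analyze and understand. Starting from this simple program, we will begin a source-level analysis of how packets flow through DPDK as a whole.
Code analysis is tedious, but it is an unavoidable hurdle. As the old saying goes, scholarship is like crossing mountains: no sooner is one peak behind you than another blocks the way. Keep going, and you will eventually make your way through the whole range.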