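This article follows libpcap as its main thread. An application first calls pcap_open_live to initialize capture (opening the network interface, installing the callbacks used to read packets, and so on) and can then capture packets with pcap_next, pcap_next_ex, pcap_dispatch or pcap_loop. The goal here is to read the source, starting with libpcap and the pfring library in user space and working down to the PF_RING module in the kernel, so that we understand PF_RING in depth and see how it improves libpcap's packet capture performance.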
1) pcap_open_live
We start in user space with libpcap. The first function to look at is pcap_open_live, found in pcap.c. Its source is as follows:
pcap_t *pcap_open_live(const char *source, int snaplen, int promisc, int to_ms, char *errbuf)
{
    pcap_t *p;
    int status;
    p = pcap_create(source, errbuf);
    if (p == NULL)
        return (NULL);
    status = pcap_set_snaplen(p, snaplen);
    if (status < 0)
        goto fail;
    status = pcap_set_promisc(p, promisc);
    if (status < 0)
        goto fail;
    status = pcap_set_timeout(p, to_ms);
    if (status < 0)
        goto fail;
    p->oldstyle = 1;
    status = pcap_activate(p);
    if (status < 0)
        goto fail;
    return (p);
fail:
    if (status == PCAP_ERROR)
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
            p->errbuf);
    else if (status == PCAP_ERROR_NO_SUCH_DEVICE ||
        status == PCAP_ERROR_PERM_DENIED)
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s (%s)", source,
            pcap_statustostr(status), p->errbuf);
    else
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
            pcap_statustostr(status));
    pcap_close(p);
    return (NULL);
}
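As the source shows, pcap_open_live first calls pcap_create, which is analysed below. It then calls pcap_set_snaplen to set the maximum number of bytes captured per packet. A standard Ethernet frame is at most 1518 bytes, and the customary value 65535 is large enough to capture any packet in full. pcap_set_promisc sets the capture mode (1 means promiscuous mode), and pcap_set_timeout sets the read timeout in milliseconds: if the application has read no data within this time, the read returns. Last comes pcap_activate, also covered below. Between pcap_create and pcap_activate an application may additionally call pcap_set_buffer_size to set the size of the kernel buffer. opentest.c shows how it is called, and it is discussed later in this article.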
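Because libpcap supports many operating systems, its code is tangled: search for pcap_create and you will find it defined in several places. We are reading the source on Linux, so we look for pcap_create in pcap-linux.c. Its source is as follows: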
pcap_t *pcap_create(const char *device, char *ebuf)
{   /* device: name of the network interface; ebuf: buffer that receives error messages */
    pcap_t *handle;
    /*
     * A null device name is equivalent to the "any" device.
     */
    if (device == NULL)
        device = "any";
#ifdef HAVE_DAG_API
    if (strstr(device, "dag")) {
        return dag_create(device, ebuf);
    }
#endif /* HAVE_DAG_API */
#ifdef HAVE_SEPTEL_API
    if (strstr(device, "septel")) {
        return septel_create(device, ebuf);
    }
#endif /* HAVE_SEPTEL_API */
#ifdef HAVE_SNF_API
    handle = snf_create(device, ebuf);
    if (strstr(device, "snf") || handle != NULL)
        return handle;
#endif /* HAVE_SNF_API */
#ifdef PCAP_SUPPORT_BT
    if (strstr(device, "bluetooth")) {
        return bt_create(device, ebuf);
    }
#endif
#ifdef PCAP_SUPPORT_CAN
    if (strstr(device, "can") || strstr(device, "vcan")) {
        return can_create(device, ebuf);
    }
#endif
#ifdef PCAP_SUPPORT_USB
    if (strstr(device, "usbmon")) {
        return usb_create(device, ebuf);
    }
#endif
    handle = pcap_create_common(device, ebuf);
    if (handle == NULL)
        return NULL;
    /* pcap_create_common is the initialization routine: it builds a pcap_t * handle from the interface name. The handle's callbacks are then set below. */
    handle->activate_op = pcap_activate_linux;
    handle->can_set_rfmon_op = pcap_can_set_rfmon_linux;   /* checks whether rfmon (monitor) mode can be set */
    return handle;
}
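The selection logic in pcap_create boils down to substring matching on the device name, with the ordinary Linux path as the fallback. The sketch below models only that idea. The table, the labels and the name select_backend are hypothetical and stand in for the real *_create routines; it does not call libpcap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend table: each entry pairs a device-name fragment
 * with a label standing in for a real *_create() routine. */
struct backend {
    const char *fragment;
    const char *label;
};

static const struct backend backends[] = {
    { "dag",       "dag" },
    { "septel",    "septel" },
    { "bluetooth", "bluetooth" },
    { "can",       "can" },        /* also matches "vcan0" */
    { "usbmon",    "usb" },
};

/* Return the label of the first backend whose fragment occurs in the
 * device name, or "linux" when none matches (the ordinary NIC path). */
const char *select_backend(const char *device)
{
    size_t i;

    if (device == NULL)
        device = "any";            /* same convention as pcap_create */
    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
        if (strstr(device, backends[i].fragment) != NULL)
            return backends[i].label;
    }
    return "linux";
}
```

Calling select_backend("eth0") yields "linux", while select_backend("usbmon1") yields "usb". Because strstr matches anywhere in the name, the "can" entry already covers "vcan" names, which suggests the second test in the original condition is redundant.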
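To support different kinds of devices, pcap_create uses #ifdef blocks, so opening every device type is folded into a single function. In our case the device is an ordinary network interface, so the call that matters is pcap_create_common, which is defined in pcap.c. At first this seemed confusing to me: defining it in pcap-linux.c would look more direct, and I had to go to pcap.c to find it while tracing the code. The reason is presumably portability. libpcap has to support other operating systems too, and if the function lived in pcap-linux.c the other platforms could not call it conveniently. Seen that way, the structure the libpcap authors chose is quite good. pcap_create also installs two callbacks, pcap_activate_linux and pcap_can_set_rfmon_linux, and returns a handle of type pcap_t * for the interface. Having covered pcap_create, we have to follow pcap_create_common and the two callbacks. The source of pcap_create_common comes next.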
pcap_t *pcap_create_common(const char *source, char *ebuf)
{
    pcap_t *p;
    p = malloc(sizeof(*p));   /* allocate memory for p */
    if (p == NULL) {
        snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
            pcap_strerror(errno));
        return (NULL);
    }
    memset(p, 0, sizeof(*p));   /* zero the memory of p */
#ifndef WIN32
    p->fd = -1;   /* not opened yet */
    p->selectable_fd = -1;
    p->send_fd = -1;
#endif
    p->opt.source = strdup(source);   /* source is the interface name */
    if (p->opt.source == NULL) {
        snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
            pcap_strerror(errno));
        free(p);
        return (NULL);
    }
    /*
     * Default to "can't set rfmon mode"; if it's supported by
     * a platform, the create routine that called us can set
     * the op to its routine to check whether a particular
     * device supports it.
     */
    p->can_set_rfmon_op = pcap_cant_set_rfmon;
    initialize_ops(p);
    /* put in some defaults */
    pcap_set_timeout(p, 0);
    pcap_set_snaplen(p, 65535);   /* max packet size */
    p->opt.promisc = 0;
    p->opt.buffer_size = 0;
    return (p);
}
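The call that deserves explanation in this function is strdup. It copies a string and returns a pointer to the copy, which is allocated with malloc and must eventually be freed. Using it requires #include <string.h>.
p->can_set_rfmon_op = pcap_cant_set_rfmon; As the comment in the function explains, the default is "can't set rfmon mode". initialize_ops(p); installs the initial set of callbacks.
pcap_set_timeout(p, 0);
pcap_set_snaplen(p, 65535); /* max packet size */
p->opt.promisc = 0;
p->opt.buffer_size = 0;
These lines set the initial timeout to 0, set snaplen to 65535, select non-promiscuous mode, and initialize the kernel buffer size to 0. In short, pcap_create_common is an initialization function.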
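The allocate, zero, copy-the-name, fill-in-defaults sequence described above can be sketched in isolation. Everything below is a hypothetical miniature (struct mini_handle, mini_create and mini_destroy are invented for illustration and are not libpcap names). It mainly shows who owns the string returned by strdup and how a failed second allocation is unwound.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup under strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for pcap_t. */
struct mini_handle {
    int fd;            /* -1 until the handle is activated */
    char *source;      /* heap copy of the device name, owned by the handle */
    int snaplen;
    int promisc;
    int timeout_ms;
    int buffer_size;   /* 0 = keep the default kernel buffer */
};

/* Allocate a handle and fill in defaults; returns NULL on failure. */
struct mini_handle *mini_create(const char *source)
{
    struct mini_handle *h = calloc(1, sizeof(*h));   /* zeroed, like malloc + memset */

    if (h == NULL)
        return NULL;
    h->fd = -1;
    h->source = strdup(source);   /* private copy: the caller's string may change */
    if (h->source == NULL) {
        free(h);                  /* undo the first allocation */
        return NULL;
    }
    h->snaplen = 65535;           /* promisc, timeout and buffer size stay 0 */
    return h;
}

/* Release the handle together with the string it owns. */
void mini_destroy(struct mini_handle *h)
{
    if (h != NULL) {
        free(h->source);
        free(h);
    }
}
```

Because the handle keeps its own copy, the caller may reuse or free the original name buffer right after mini_create returns.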
The source of initialize_ops is as follows:
static void initialize_ops(pcap_t *p)
{
    /*
     * Set operation pointers for operations that only work on
     * an activated pcap_t to point to a routine that returns
     * a "this isn't activated" error.
     */
    p->read_op = (read_op_t)pcap_not_initialized;
    p->inject_op = (inject_op_t)pcap_not_initialized;
    p->setfilter_op = (setfilter_op_t)pcap_not_initialized;
    p->setdirection_op = (setdirection_op_t)pcap_not_initialized;
    p->set_datalink_op = (set_datalink_op_t)pcap_not_initialized;
    p->getnonblock_op = (getnonblock_op_t)pcap_not_initialized;
    p->setnonblock_op = (setnonblock_op_t)pcap_not_initialized;
    p->stats_op = (stats_op_t)pcap_not_initialized;
#ifdef WIN32
    p->setbuff_op = (setbuff_op_t)pcap_not_initialized;
    p->setmode_op = (setmode_op_t)pcap_not_initialized;
    p->setmintocopy_op = (setmintocopy_op_t)pcap_not_initialized;
#endif
    /*
     * Default cleanup operation - implementations can override
     * this, but should call pcap_cleanup_live_common() after
     * doing their own additional cleanup.
     */
    p->cleanup_op = pcap_cleanup_live_common;
    /*
     * In most cases, the standard one-shot callback can
     * be used for pcap_next()/pcap_next_ex().
     */
    p->oneshot_callback = pcap_oneshot;
}
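The pattern in initialize_ops, a table of function pointers that starts out pointing at a "not activated" stub and is overwritten by the platform code later, can be sketched with a single operation. All names here (mini_pcap, ERR_NOT_ACTIVATED, read_linux and so on) are hypothetical. Unlike libpcap, which casts one stub to several pointer types, this sketch gives the stub the exact signature of the slot it fills.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NOT_ACTIVATED (-3)   /* hypothetical error code */

struct mini_pcap;
typedef int (*read_op_t)(struct mini_pcap *, int max_packets);

/* Hypothetical miniature of pcap_t: a state flag and one operation. */
struct mini_pcap {
    int activated;
    read_op_t read_op;
};

/* Default operation: the handle exists but has not been activated. */
static int not_initialized(struct mini_pcap *p, int max_packets)
{
    (void)p;
    (void)max_packets;
    return ERR_NOT_ACTIVATED;
}

/* Platform-specific operation, installed by the activate step. */
static int read_linux(struct mini_pcap *p, int max_packets)
{
    (void)p;
    return max_packets > 0 ? 1 : 0;   /* pretend one packet was delivered */
}

/* Generic initialization: every slot gets the stub. */
void mini_initialize_ops(struct mini_pcap *p)
{
    p->activated = 0;
    p->read_op = not_initialized;
}

/* Platform activation: replace the stub with the real routine. */
void mini_activate_linux(struct mini_pcap *p)
{
    p->read_op = read_linux;
    p->activated = 1;
}

/* Public entry point: always dispatches through the table. */
int mini_dispatch(struct mini_pcap *p, int max_packets)
{
    return p->read_op(p, max_packets);
}
```

With this layout the public entry point never needs to test whether the handle was activated: calling it too early simply reaches the stub and returns the error code.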
That completes pcap_create_common. Next is the other callback installed by pcap_create, pcap_activate_linux. A search finds it in pcap-linux.c. I have to admire the structure the libpcap authors chose: the functions Linux needs are collected in pcap-linux.c, while functions shared by several operating systems, such as pcap_create_common above, live in pcap.c. With that settled, let us look at pcap_activate_linux. Its source shows that pcap_create_common only gives pcap_t *p its initial values, much like an initializer in C++ (a constructor, or OnInitDialog in MFC). That initialization is generic. Each system then applies its own settings, and on Linux pcap_activate_linux replaces the callbacks that pcap_create_common initialized. This is why placing pcap_create_common in pcap.c makes sense.
static int pcap_activate_linux(pcap_t *handle)
{
    const char *device;
    int status = 0;
    device = handle->opt.source;   /* name of the network interface */
    handle->inject_op = pcap_inject_linux;
    handle->setfilter_op = pcap_setfilter_linux;
    handle->setdirection_op = pcap_setdirection_linux;
    handle->set_datalink_op = NULL;   /* can't change data link type */
    handle->getnonblock_op = pcap_getnonblock_fd;
    handle->setnonblock_op = pcap_setnonblock_fd;
    handle->cleanup_op = pcap_cleanup_linux;
    handle->read_op = pcap_read_linux;
    handle->stats_op = pcap_stats_linux;
    /*
     * The "any" device is a special device which causes us not
     * to bind to a particular device and thus to look at all
     * devices.
     */
    if (strcmp(device, "any") == 0) {
        if (handle->opt.promisc) {
            handle->opt.promisc = 0;
            /* Just a warning. */
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "Promiscuous mode not supported on the \"any\" device");
            status = PCAP_WARNING_PROMISC_NOTSUP;
        }
    }
    handle->md.device = strdup(device);
    if (handle->md.device == NULL) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "strdup: %s",
            pcap_strerror(errno));
        return PCAP_ERROR;
    }
#ifdef HAVE_PF_RING   /* is PF_RING support compiled in? */
    if (!getenv("PCAP_NO_PF_RING")) {
        /* Code courtesy of Chris Wakelin */
        char *clusterId;
        handle->ring = pfring_open((char *)device, handle->opt.promisc, handle->snapshot, 1);
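/*
#ifdef HAVE_PF_RING: when PF_RING support is compiled in, this block runs. The code shows that PF_RING defines its own socket type, and pfring_open is the entry point. Its prototype is:
pfring* pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t reentrant);
Purpose: initialize a PF_RING socket and return a pfring structure. To open a device in DNA mode, call pfring_open_dna() instead.
Parameters:
device_name: symbolic name of the PF_RING-aware device (e.g. eth0);
promisc: whether to enable promiscuous mode (1 = promiscuous);
caplen: maximum capture length (also known as snaplen, the same as the snaplen of pcap_open_live; 65535 is the usual value for capturing the largest packets on the network);
reentrant: if non-zero, the device is opened in reentrant mode. This is implemented with a semaphore mechanism and costs a little performance; it is mainly for multithreaded applications.
Return value: a handle on success, NULL otherwise.
The source of pfring_open is:
pfring* pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t _reentrant) {
return(pfring_open_consumer(device_name, promisc, caplen, _reentrant,
0, NULL, 0));
}
pfring_open simply calls pfring_open_consumer, which is analysed later.
*/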
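        if (handle->ring) {
            if (clusterId = getenv("PCAP_PF_RING_CLUSTER_ID"))
/*
getenv is the C library function that reads the current value of an environment variable.
Prototype: char *getenv(const char *name)
Usage: s = getenv("VARIABLE_NAME");
with char *s; declared beforehand.
Behaviour: returns the value of the named environment variable. On Linux the name is case-sensitive. If the variable is not defined in the environment, getenv returns NULL (not an empty string), which is why the result is tested before it is used.
*/
                if (atoi(clusterId) > 0 && atoi(clusterId) < 255)
                    if (getenv("PCAP_PF_RING_USE_CLUSTER_PER_FLOW"))
                        pfring_set_cluster(handle->ring, atoi(clusterId), cluster_per_flow);
                    else
                        pfring_set_cluster(handle->ring, atoi(clusterId), cluster_round_robin);
            pfring_enable_ring(handle->ring);
        } else
            handle->ring = NULL;
    } else
        handle->ring = NULL;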
/*
pfring_set_cluster only sets the cluster id, which it does with a setsockopt call on the PF_RING socket.
The PF_RING documentation describes the function as follows; it is very useful on multi-CPU systems:
This call allows a ring to be added to a cluster that can spawn across address spaces. On a nutshell when two or more sockets are clustered they share incoming packets that are balanced on a per-flow manner. This technique is useful for exploiting multicore systems or for sharing packets in the same address space across multiple threads.
int pfring_set_cluster(pfring *ring, u_int clusterId, cluster_type the_type) {
#ifdef USE_PCAP
return(-1);
#else
if(ring->dna_mapped_device)
return(-1);
else {
struct add_to_cluster cluster;
cluster.clusterId = clusterId, cluster.the_type = the_type;
return(ring ? setsockopt(ring->fd, 0, SO_ADD_TO_CLUSTER,
&cluster, sizeof(cluster)): -1);
}
#endif
}
About setsockopt/getsockopt:
Description:
They get or set an option associated with a socket. Options may exist at several protocol levels, and they are always present at the uppermost socket level. When manipulating a socket option, the level at which the option resides and the name of the option must be given. For options at the socket level, level is SOL_SOCKET. For options at any other level, the protocol number of the protocol that controls the option is supplied; for example, for an option interpreted by TCP, level is the protocol number of TCP. Usage:
#include <sys/types.h>
#include <sys/socket.h>
int getsockopt(int sock, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sock, int level, int optname, const void *optval, socklen_t optlen);
Parameters:
sock: the socket whose option is set or read.
level: the protocol level at which the option resides.
optname: the name of the option (here SO_ADD_TO_CLUSTER).
optval: for getsockopt(), the buffer that receives the option value; for setsockopt(), the buffer holding the new value.
optlen: for getsockopt(), on entry the maximum size of the value and on return its actual size; for setsockopt(), the size of the new value.
When PF_RING is defined, the socket is created through pfring_open. That concludes this part.
*/
    if (handle->ring != NULL) {
        handle->fd = handle->ring->fd;
        handle->bufsize = handle->snapshot;
        handle->linktype = DLT_EN10MB;
        handle->offset = 2;
        /* printf("Open HAVE_PF_RING(%s)\n", device); */
    } else {
        /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */
#endif
    /*
     * If we're in promiscuous mode, then we probably want
     * to see when the interface drops packets too, so get an
     * initial count from /proc/net/dev
     */
    if (handle->opt.promisc)
        handle->md.proc_dropped = linux_if_drops(handle->md.device);
    /*
     * Current Linux kernels use the protocol family PF_PACKET to
     * allow direct access to all packets on the network while
     * older kernels had a special socket type SOCK_PACKET to
     * implement this feature.
     * While this old implementation is kind of obsolete we need
     * to be compatible with older kernels for a while so we are
     * trying both methods with the newer method preferred.
     */
    /* Current kernels use PF_PACKET; older kernels used SOCK_PACKET. */
    if ((status = activate_new(handle)) == 1) {
        /*
         * Try to open a packet socket using the new kernel PF_PACKET interface.
         * Returns 1 on success, 0 on an error that means the new interface isn't
         * present (so the old SOCK_PACKET interface should be tried), and a
         * PCAP_ERROR_ value on an error that means that the old mechanism won't
         * work either (so it shouldn't be tried).
         * Note: when PF_RING is not in use, activate_new creates the socket through the PF_PACKET interface; its source is also in pcap-linux.c. The value of status distinguishes three cases. 1 means the PF_PACKET socket was created. 0 means PF_PACKET is not available, and activate_old is then called: if activate_old returns 1 the socket was created with SOCK_PACKET, and any other value means failure. Any other status from activate_new is a fatal error.
         */
        /*
         * Success.
         * Try to use memory-mapped access.
         */
        switch (activate_mmap(handle)) {
        case 1:
            /* we succeeded; nothing more to do */
            return 0;
        case 0:
            /*
             * Kernel doesn't support it - just continue
             * with non-memory-mapped access.
             */
            status = 0;
            break;
        case -1:
            /*
             * We failed to set up to use it, or kernel
             * supports it, but we failed to enable it;
             * return an error. handle->errbuf contains
             * an error message.
             */
            status = PCAP_ERROR;
            goto fail;
        }
    }
    else if (status == 0) {
        /* Non-fatal error; try old way */
        if ((status = activate_old(handle)) != 1) {
            /*
             * Both methods to open the packet socket failed.
             * Tidy up and report our failure (handle->errbuf
             * is expected to be set by the functions above).
             */
            goto fail;
        }
    } else {
        /*
         * Fatal error with the new way; just fail.
         * status has the error return; if it's PCAP_ERROR,
         * handle->errbuf has been set appropriately.
         */
        goto fail;
    }
    /*
     * We set up the socket, but not with memory-mapped access.
     */
    if (handle->opt.buffer_size != 0) {
        /*
         * Note: as I understand it, opt.buffer_size != 0 means the application called pcap_set_buffer_size instead of accepting the default kernel buffer. setsockopt therefore passes the requested size to the kernel first, and malloc then allocates the packet buffer.
         * Set the socket buffer size to the specified value.
         */
        if (setsockopt(handle->fd, SOL_SOCKET, SO_RCVBUF,
            &handle->opt.buffer_size,
            sizeof(handle->opt.buffer_size)) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "SO_RCVBUF: %s", pcap_strerror(errno));
            status = PCAP_ERROR;
            goto fail;
        }
    }
#ifdef HAVE_PF_RING
    }
#endif
    /* Allocate the buffer */
    handle->buffer = malloc(handle->bufsize + handle->offset);
    if (!handle->buffer) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
            "malloc: %s", pcap_strerror(errno));
        status = PCAP_ERROR;
        goto fail;
    }
    /*
     * "handle->fd" is a socket, so "select()" and "poll()"
     * should work on it.
     */
    handle->selectable_fd = handle->fd;
    return status;
fail:
    pcap_cleanup_linux(handle);
    return status;
}
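The three-way return convention that pcap_activate_linux relies on (1 = socket opened, 0 = mechanism not available so try the older one, negative = fatal) is easy to lose among the comments. The sketch below models only that decision. choose_mechanism, CAP_ERROR and the enum are hypothetical names, and the two integer arguments stand in for the values activate_new and activate_old would return.

```c
#include <assert.h>

enum { CAP_ERROR = -1 };   /* hypothetical stand-in for PCAP_ERROR */

/* Which mechanism ended up providing the socket. */
enum mechanism { MECH_NONE, MECH_PF_PACKET, MECH_SOCK_PACKET };

/*
 * new_rc and old_rc play the role of the values returned by
 * activate_new() and activate_old(): 1 = success, 0 = mechanism not
 * available, negative = fatal error. Returns 0 on success or a
 * negative error code, and reports the chosen mechanism through *out.
 */
int choose_mechanism(int new_rc, int old_rc, enum mechanism *out)
{
    *out = MECH_NONE;
    if (new_rc == 1) {
        *out = MECH_PF_PACKET;
        return 0;
    }
    if (new_rc == 0) {               /* non-fatal: try the old way */
        if (old_rc == 1) {
            *out = MECH_SOCK_PACKET;
            return 0;
        }
        return old_rc < 0 ? old_rc : CAP_ERROR;   /* both methods failed */
    }
    return new_rc;                   /* fatal: the old way is not tried */
}
```

For example, choose_mechanism(0, 1, &m) reports MECH_SOCK_PACKET, while a negative first argument is returned unchanged and the second argument is never consulted.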
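That completes pcap_activate_linux. My expectation was that PF_RING replaces PF_PACKET and SOCK_PACKET. On a first, quick reading, though, the function opens the socket with pfring_open and, as far as I could tell, does not return right away. I expected that with PF_RING defined and the socket created by pfring_open, the function would exit without going on to activate_new and activate_old. I did not understand the authors' intent, so I kept tracing the code: first pfring_open, then activate_new, to see how each is implemented. As noted above, pfring_open calls pfring_open_consumer. It lives in pfring.c, and its source is as follows: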
pfring* pfring_open_consumer(char *device_name, u_int8_t promisc,
                             u_int32_t caplen, u_int8_t _reentrant,
                             u_int8_t consumer_plugin_id,
                             char* consumer_data, u_int consumer_data_len) {
#ifdef USE_PCAP
  char ebuf[256];
  pcap_t *pcapPtr = pcap_open_live(device_name,
                                   caplen,
                                   1 /* promiscuous mode */,
                                   1000 /* ms */,
                                   ebuf);
  return((pfring*)pcapPtr);
#else
  int err = 0;
  pfring *ring = (pfring*)malloc(sizeof(pfring)); /* allocate one pfring structure */
  if(ring == NULL)
    return(NULL);
  else
    memset(ring, 0, sizeof(pfring)); /* zero the structure */
  ring->reentrant = _reentrant;
  ring->fd = socket(PF_RING, SOCK_RAW, htons(ETH_P_ALL)); /* create the socket */
#ifdef RING_DEBUG
  printf("Open RING [fd=%d]\n", ring->fd);
#endif
  if(ring->fd > 0) {
    int rc;
    u_int memSlotsLen;
    if(caplen > MAX_CAPLEN) caplen = MAX_CAPLEN;
    /* MAX_CAPLEN is defined in pfring.h: #define MAX_CAPLEN 16384 */
    setsockopt(ring->fd, 0, SO_RING_BUCKET_LEN, &caplen, sizeof(caplen));
    /* set caplen, the capture length; pfring.h caps it at 16384 */
    /* printf("channel_id=%d\n", channel_id); */
    if(device_name == NULL /* any */) {
      device_name = "any";
      rc = pfring_bind(ring, device_name); /* bind the ring */
    } else if(!strcmp(device_name, "none")) {
      /* No binding yet */
      rc = 0;
    } else
      rc = pfring_bind(ring, device_name);
    if(rc == 0) {
      if(consumer_plugin_id > 0) {
        ring->kernel_packet_consumer = consumer_plugin_id;
        rc = pfring_set_packet_consumer_mode(ring, consumer_plugin_id,
                                             consumer_data, consumer_data_len);
        if(rc < 0) {
          free(ring);
          return(NULL);
        }
      } else
        ring->kernel_packet_consumer = 0;
      ring->buffer = (char *)mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                                  MAP_SHARED, ring->fd, 0);
      /* memory-map the first page; PAGE_SIZE is 4096 */
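/*
Prototype of mmap: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
start: preferred start address of the mapping; usually NULL, which lets the system choose the address. The chosen address is returned on success.
length: how many bytes of the file to map into memory.
prot: protection of the mapped region, a combination of:
PROT_EXEC the region may be executed, PROT_READ the region may be read, PROT_WRITE the region may be written,
PROT_NONE the region may not be accessed.
flags: properties of the mapping. One of MAP_SHARED or MAP_PRIVATE must be given.
MAP_FIXED: if a mapping cannot be set up at the address given by start, fail instead of choosing another address. Use of this flag is generally discouraged.
MAP_SHARED: writes to the region are carried through to the file and are visible to other processes that map the same file.
MAP_PRIVATE: writes to the region go to a private copy-on-write copy; changes to the region are never written back to the file.
MAP_ANONYMOUS: create an anonymous mapping. fd is ignored and no file is involved, so unrelated processes cannot map the same region.
MAP_DENYWRITE: only writes through the mapping are allowed, and direct writes to the file are refused (current Linux kernels ignore this flag).
MAP_LOCKED: lock the region in memory so that it is not swapped out.
fd: file descriptor to map (here ring->fd, the value returned by socket). For an anonymous mapping, that is with MAP_ANONYMOUS in flags, pass -1. On systems without anonymous mappings, opening /dev/zero with open() and mapping that descriptor has the same effect.
offset: offset of the mapping within the file, usually 0 for the start of the file; it must be a multiple of the page size.
Return value:
the start address of the mapping on success, otherwise MAP_FAILED ((void *)-1) with the cause in errno.
*/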
      if(ring->buffer == MAP_FAILED) {
        printf("mmap() failed: try with a smaller snaplen\n");
        free(ring);
        return(NULL);
      }
      ring->slots_info = (FlowSlotInfo *)ring->buffer;
      /* ring->buffer is the memory-mapped region; ring->slots_info points to its start */
      if(ring->slots_info->version != RING_FLOWSLOT_VERSION) {
        printf("Wrong RING version: "
               "kernel is %i, libpfring was compiled with %i\n",
               ring->slots_info->version, RING_FLOWSLOT_VERSION);
        free(ring);
        return(NULL);
      }
      memSlotsLen = ring->slots_info->tot_mem; /* total size of the ring memory */
      munmap(ring->buffer, PAGE_SIZE); /* remove the one-page mapping */
      ring->buffer = (char *)mmap(NULL, memSlotsLen,
                                  PROT_READ|PROT_WRITE,
                                  MAP_SHARED, ring->fd, 0);
      /*
      The first mmap apparently exists only to read memSlotsLen; the mapping is then removed with munmap and the region is mapped again with mmap at its full size.
      */
      if(ring->buffer == MAP_FAILED) {
        printf("mmap() failed");
        free(ring);
        return(NULL);
      }
      ring->slots_info = (FlowSlotInfo *)ring->buffer; /* pointer to the ring buffer header */
      ring->slots = (char *)(ring->buffer+sizeof(FlowSlotInfo));
      /* skip the header structure at the front of the ring; what follows holds received packets */
      /* Set defaults */
      ring->device_name = strdup(device_name ? device_name : "");
#ifdef RING_DEBUG
      printf("RING (%s): tot_mem=%u/min_tot_slots=%u/max_slot_len=%u/"
             "insert_off=%u/remove_off=%u/dropped=%llu\n",
             device_name,
             ring->slots_info->tot_mem,
             ring->slots_info->tot_slots,
             ring->slots_info->slot_len,
             ring->slots_info->insert_off,
             ring->slots_info->remove_off,
             ring->slots_info->tot_lost);
#endif
      if(promisc) {
        if(set_if_promisc(device_name, 1) == 0)
          ring->clear_promisc = 1;
      }
#ifdef ENABLE_HW_TIMESTAMP
      pfring_enable_hw_timestamp(ring, device_name);
#endif
    } else {
      close(ring->fd);
      err = -1;
    }
  } else {
    err = -1;
    free(ring);
  }
  if(err == 0) {
    if(ring->reentrant)
      pthread_spin_init(&ring->spinlock, PTHREAD_PROCESS_PRIVATE);
    return(ring);
  } else
    return(NULL);
#endif
}
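The two-step mapping in pfring_open_consumer (map one page, read the total size from the header, unmap, map again at full size) can be tried without the PF_RING kernel module. The sketch below assumes a POSIX system and uses an ordinary temporary file in place of the PF_RING socket. struct region_header, map_whole_region and make_fake_region are hypothetical names, and the header layout is invented; it is not FlowSlotInfo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header at offset 0 of the shared region. */
struct region_header {
    unsigned int version;
    unsigned int tot_mem;   /* size in bytes of the whole region */
};

/* Map fd in two steps; returns the full mapping or MAP_FAILED. */
void *map_whole_region(int fd, size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    struct region_header *hdr;
    size_t total;
    void *full;

    /* Step 1: map a single page, just enough to read the header. */
    hdr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (hdr == MAP_FAILED)
        return MAP_FAILED;
    total = hdr->tot_mem;
    munmap(hdr, (size_t)page);

    /* Step 2: map again, now with the size the header advertised. */
    full = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (full != MAP_FAILED)
        *len_out = total;
    return full;
}

/* Build a fake region of `total` bytes in a temporary file and return
 * a descriptor for it, or -1 on failure (helper for experiments). */
int make_fake_region(size_t total)
{
    FILE *f = tmpfile();
    struct region_header hdr = { 1, (unsigned int)total };
    int fd;

    if (f == NULL)
        return -1;
    fd = dup(fileno(f));   /* keep a descriptor that outlives the FILE */
    if (fd >= 0 && (ftruncate(fd, (off_t)total) != 0 ||
                    pwrite(fd, &hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr))) {
        close(fd);
        fd = -1;
    }
    fclose(f);
    return fd;
}
```

The first mapping is needed because the consumer does not know the region size in advance: only the producer (the kernel module in PF_RING's case) does, and it publishes the size in the header.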
/* pfring_bind binds the socket by calling bind: rc = bind(ring->fd, (struct sockaddr *)&sa, sizeof(sa)); */
int pfring_bind(pfring *ring, char *device_name) {
  struct sockaddr sa; /* socket address variable */
  char *at;
  int32_t channel_id = -1;
  int rc = 0;
  if((device_name == NULL) || (strcmp(device_name, "none") == 0))
    return(-1);
  at = strchr(device_name, '@');
  if(at != NULL) {
    char *tok, *pos = NULL;
    at[0] = '\0';
    /* Syntax
       ethX@1,5       channel 1 and 5
       ethX@1-5       channel 1,2...5
       ethX@1-3,5-7   channel 1,2,3,5,6,7
    */
    tok = strtok_r(&at[1], ",", &pos);
    channel_id = 0;
    while(tok != NULL) {
      char *dash = strchr(tok, '-');
      int32_t min_val, max_val, i;
      if(dash) {
        dash[0] = '\0';
        min_val = atoi(tok);
        max_val = atoi(&dash[1]);
      } else
        min_val = max_val = atoi(tok);
      for(i = min_val; i <= max_val; i++)
        channel_id |= 1 << i;
      tok = strtok_r(NULL, ",", &pos);
    }
  }
  /* Setup TX */
  ring->sock_tx.sll_family = PF_PACKET;
  ring->sock_tx.sll_protocol = htons(ETH_P_ALL);
  sa.sa_family = PF_RING;
  snprintf(sa.sa_data, sizeof(sa.sa_data), "%s", device_name);
  rc = bind(ring->fd, (struct sockaddr *)&sa, sizeof(sa));
  /*
  bind:
  Header: #include <sys/socket.h>
  Prototype: int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
  Return value: 0 on success, -1 on failure (with the cause in errno).
  */
  if(rc == 0) {
    if(channel_id != -1) {
      int rc = pfring_set_channel_id(ring, channel_id);
      if(rc != 0)
        printf("pfring_set_channel_id() failed: %d\n", rc);
    }
  }
  return(rc);
}
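The channel syntax handled in pfring_bind (ethX@1,5, ethX@1-5, ethX@1-3,5-7) can be exercised on its own. The sketch below is a hypothetical standalone parser, parse_channel_mask, that follows the same tokenizing idea with strtok_r. It differs from the original in two deliberate ways that are my own choices: it works on a private copy, so a string literal can be passed safely, and it ignores channel numbers outside 0..30 to keep the shift well defined.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup and strtok_r */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the part after '@' in names such as "eth0@1-3,5-7" into a bit
 * mask with one bit per channel. Returns -1 when the name has no '@'
 * part or the working copy cannot be allocated.
 */
int32_t parse_channel_mask(const char *device_name)
{
    const char *at = strchr(device_name, '@');
    char *copy, *tok, *pos = NULL;
    uint32_t mask = 0;

    if (at == NULL)
        return -1;
    copy = strdup(at + 1);          /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;
    for (tok = strtok_r(copy, ",", &pos); tok != NULL;
         tok = strtok_r(NULL, ",", &pos)) {
        char *dash = strchr(tok, '-');
        int lo = atoi(tok);         /* atoi stops at the '-' by itself */
        int hi = dash ? atoi(dash + 1) : lo;
        int i;

        for (i = lo; i <= hi; i++) {
            if (i >= 0 && i < 31)   /* keep the shift well defined */
                mask |= (uint32_t)1 << i;
        }
    }
    free(copy);
    return (int32_t)mask;
}
```

With this encoding, "eth0@1,5" sets bits 1 and 5, and a plain "eth0" yields -1, which corresponds to the "no channel selected" value that pfring_bind starts from.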
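That completes pfring_open_consumer, and it works as I expected: it sets up a ring buffer through memory mapping and calls pfring_bind to bind the socket. As said earlier, my understanding is that the PF_RING patch replaces the original PF_PACKET and SOCK_PACKET sockets with a new socket type. Yet when I first read the source I wondered why pcap_activate_linux does not return directly once the PF_RING socket exists. Going back to pcap_activate_linux to see what I had missed, I started with its parameter, pcap_t *handle. Algorithms plus data structures equal programs, as the saying goes, so the data structure matters. The ring member is defined in pcap-int.h as follows: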
#ifdef HAVE_PF_RING
    pfring *ring;
#endif
The next question: given that pfring_open has already created and bound a socket, what is activate_new for? Let us trace it:
static int activate_new(pcap_t *handle)
{
#ifdef HAVE_PF_PACKET_SOCKETS
    /* HAVE_PF_PACKET_SOCKETS: if the build supports PF_PACKET sockets, the code below runs. Otherwise the function effectively returns 0 straight away, and activate_old is called to try SOCK_PACKET. */
    const char *device = handle->opt.source;
    int is_any_device = (strcmp(device, "any") == 0);
    int sock_fd = -1, arptype;
#ifdef HAVE_PACKET_AUXDATA
    int val;
#endif
    int err = 0;
    struct packet_mreq mr;
    /*
     * Open a socket with protocol family packet. If the
     * "any" device was specified, we open a SOCK_DGRAM
     * socket for the cooked interface, otherwise we first
     * try a SOCK_RAW socket for the raw interface.
     */
    sock_fd = is_any_device ?
        socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL)) :
        socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    /* socket() creates the socket. Will a bind call follow? Let us read on. */
    if (sock_fd == -1) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "socket: %s",
            pcap_strerror(errno));
        return 0;   /* try old mechanism */
    }
    /* It seems the kernel supports the new interface. */
    handle->md.sock_packet = 0;
    /*
     * Get the interface index of the loopback device.
     * If the attempt fails, don't fail, just set the
     * "md.lo_ifindex" to -1.
     *
     * XXX - can there be more than one device that loops
     * packets back, i.e. devices other than "lo"? If so,
     * we'd need to find them all, and have an array of
     * indices for them, and check all of them in
     * "pcap_read_packet()".
     */
    handle->md.lo_ifindex = iface_get_id(sock_fd, "lo", handle->errbuf);
    /*
     * Default value for offset to align link-layer payload
     * on a 4-byte boundary.
     */
    handle->offset = 0;
    /*
     * What kind of frames do we have to deal with? Fall back
     * to cooked mode if we have an unknown interface type
     * or a type we know doesn't work well in raw mode.
     */
    if (!is_any_device) {
        /* Assume for now we don't need cooked mode. */
        handle->md.cooked = 0;
        if (handle->opt.rfmon) {
            /*
             * We were asked to turn on monitor mode.
             * Do so before we get the link-layer type,
             * because entering monitor mode could change
             * the link-layer type.
             */
            err = enter_rfmon_mode(handle, sock_fd, device);
            if (err < 0) {
                /* Hard failure */
                close(sock_fd);
                return err;
            }
            if (err == 0) {
                /*
                 * Nothing worked for turning monitor mode
                 * on.
                 */
                close(sock_fd);
                return PCAP_ERROR_RFMON_NOTSUP;
            }
            /*
             * Either monitor mode has been turned on for
             * the device, or we've been given a different
             * device to open for monitor mode. If we've
             * been given a different device, use it.
             */
            if (handle->md.mondevice != NULL)
                device = handle->md.mondevice;
        }
        arptype = iface_get_arptype(sock_fd, device, handle->errbuf);
        if (arptype < 0) {
            close(sock_fd);
            return arptype;
        }
        map_arphrd_to_dlt(handle, arptype, 1);
        if (handle->linktype == -1 ||
            handle->linktype == DLT_LINUX_SLL ||
            handle->linktype == DLT_LINUX_IRDA ||
            handle->linktype == DLT_LINUX_LAPD ||
            (handle->linktype == DLT_EN10MB &&
             (strncmp("isdn", device, 4) == 0 ||
              strncmp("isdY", device, 4) == 0))) {
            if (close(sock_fd) == -1) {
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "close: %s", pcap_strerror(errno));
                return PCAP_ERROR;
            }
            sock_fd = socket(PF_PACKET, SOCK_DGRAM,
                htons(ETH_P_ALL));
            if (sock_fd == -1) {
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "socket: %s", pcap_strerror(errno));
                return PCAP_ERROR;
            }
            handle->md.cooked = 1;
            /*
             * Get rid of any link-layer type list
             * we allocated - this only supports cooked
             * capture.
             */
            if (handle->dlt_list != NULL) {
                free(handle->dlt_list);
                handle->dlt_list = NULL;
                handle->dlt_count = 0;
            }
            if (handle->linktype == -1) {
                /*
                 * Warn that we're falling back on
                 * cooked mode; we may want to
                 * update "map_arphrd_to_dlt()"
                 * to handle the new type.
                 */
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "arptype %d not "
                    "supported by libpcap - "
                    "falling back to cooked "
                    "socket",
                    arptype);
            }
            /*
             * IrDA capture is not a real "cooked" capture,
             * it's IrLAP frames, not IP packets. The
             * same applies to LAPD capture.
             */
            if (handle->linktype != DLT_LINUX_IRDA &&
                handle->linktype != DLT_LINUX_LAPD)
                handle->linktype = DLT_LINUX_SLL;
        }
        handle->md.ifindex = iface_get_id(sock_fd, device,
            handle->errbuf);
        if (handle->md.ifindex == -1) {
            close(sock_fd);
            return PCAP_ERROR;
        }
        /* The bind call we have been waiting for appears here: iface_bind. I guessed that it calls bind internally, and after tracing the source of iface_bind I can confirm that it does. */
        if ((err = iface_bind(sock_fd, handle->md.ifindex,
            handle->errbuf)) != 1) {
            close(sock_fd);
            if (err < 0)
                return err;
            else
                return 0;   /* try old mechanism */
        }
    } else {
        /*
         * The "any" device.
         */
        if (handle->opt.rfmon) {
            /*
             * It doesn't support monitor mode.
             */
            return PCAP_ERROR_RFMON_NOTSUP;
        }
        /*
         * It uses cooked mode.
         */
        handle->md.cooked = 1;
        handle->linktype = DLT_LINUX_SLL;
        /*
         * We're not bound to a device.
         * For now, we're using this as an indication
         * that we can't transmit; stop doing that only
         * if we figure out how to transmit in cooked
         * mode.
         */
        handle->md.ifindex = -1;
    }
    if (!is_any_device && handle->opt.promisc) {
        memset(&mr, 0, sizeof(mr));
        mr.mr_ifindex = handle->md.ifindex;
        mr.mr_type = PACKET_MR_PROMISC;
        if (setsockopt(sock_fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
            &mr, sizeof(mr)) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "setsockopt: %s", pcap_strerror(errno));
            close(sock_fd);
            return PCAP_ERROR;
        }
    }
    /* Enable auxillary data if supported and reserve room for
     * reconstructing VLAN headers. */
#ifdef HAVE_PACKET_AUXDATA
    val = 1;
    if (setsockopt(sock_fd, SOL_PACKET, PACKET_AUXDATA, &val,
        sizeof(val)) == -1 && errno != ENOPROTOOPT) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
            "setsockopt: %s", pcap_strerror(errno));
        close(sock_fd);
        return PCAP_ERROR;
    }
    handle->offset += VLAN_TAG_LEN;
#endif /* HAVE_PACKET_AUXDATA */
    if (handle->md.cooked) {
        if (handle->snapshot < SLL_HDR_LEN + 1)
            handle->snapshot = SLL_HDR_LEN + 1;
    }
    handle->bufsize = handle->snapshot;
    /* Save the socket FD in the pcap structure */
    handle->fd = sock_fd;
    return 1;
#else
    /* without PF_PACKET support the function simply returns 0 */
    strncpy(ebuf, "New packet capturing interface not supported by build "
        "environment", PCAP_ERRBUF_SIZE);
    return 0;
#endif
}
The source of activate_new did not answer my question either: with PF_RING in use, the other two socket types should not be tried at all. So I went back to pcap_activate_linux, read it carefully once more, and this time found the answer. It is the test handle->ring != NULL, which I had overlooked at first. That cost me a long detour through other code, although I learned something along the way.
    if (handle->ring != NULL) {
        handle->fd = handle->ring->fd;
        handle->bufsize = handle->snapshot;
        handle->linktype = DLT_EN10MB;
        handle->offset = 2;
        /* printf("Open HAVE_PF_RING(%s)\n", device); */
    } else {
        /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */
        ......
    }
When handle->ring != NULL, activate_new and the code after it are skipped. In other words, once PF_RING has been set up successfully the other two socket types are never tried, just as I expected. The purpose of pcap_activate_linux is finally clear.
Addendum, 2011-4-18. pfring_open does not always succeed. In one experiment, after patching the kernel with PF_RING, I got the error "Wrong RING version: kernel is 10, libpfring was compiled with 13", yet the program still ran correctly. I then found that ring.h defines the kernel-side PF_RING version as:
#define RING_FLOWSLOT_VERSION 10
while pf_ring.h has:
#define RING_FLOWSLOT_VERSION 13
pfring_open_consumer, which implements pfring_open, reports an error and returns when the versions differ, so pfring_open returns NULL. The program keeps running because execution takes the else branch of the handle->ring != NULL test and then uses libpcap's original receive path. Packets are read through PF_PACKET, so capture still works.
Likewise, when the module has not been loaded with insmod pf_ring.ko, pfring_open returns NULL and the program again falls back to libpcap's original PF_PACKET capture.
One more observation: with PF_RING, CPU usage rose from 37% to 47%. Sending 1514-byte packets at 240 Mbit/s, the original setup lost about 3 packets every 2 minutes; with PF_RING this improved to about 2 packets every 3 minutes.
The callbacks that pcap_activate_linux installs also deserve attention. They are listed here:
    device = handle->opt.source;
    handle->inject_op = pcap_inject_linux;
    handle->setfilter_op = pcap_setfilter_linux;
    handle->setdirection_op = pcap_setdirection_linux;
    handle->set_datalink_op = NULL;   /* can't change data link type */
    handle->getnonblock_op = pcap_getnonblock_fd;
    handle->setnonblock_op = pcap_setnonblock_fd;
    handle->cleanup_op = pcap_cleanup_linux;
    handle->read_op = pcap_read_linux;
    handle->stats_op = pcap_stats_linux;
I will not go through the other callbacks. The one to examine is pcap_read_linux, whose source is:
/*
 * Read at most max_packets from the capture stream and call the callback
 * for each of them. Returns the number of packets handled or -1 if an
 * error occurred.
 */
static int pcap_read_linux(pcap_t *handle, int max_packets, pcap_handler callback, u_char *user)
{
    /*
     * Currently, on Linux only one packet is delivered per read,
     * so we don't loop.
     */
    return pcap_read_packet(handle, callback, user);
}
The body is very simple: a single statement that calls pcap_read_packet to read a packet.
pcap_read_packet is long, so we take it step by step. Having started this analysis, I want to understand the source thoroughly, because that is how to see why plain libpcap drops packets while libpcap with the PF_RING patch avoids the loss. One more question: when is this callback invoked? My guess is that it is called when the application reads packets through pcap_next, pcap_next_ex, pcap_dispatch or pcap_loop. That is only a guess for now, since I have not yet analysed that part of the code. Now to pcap_read_packet:
/*
 * Read a packet from the socket calling the handler provided by
 * the user. Returns the number of packets received or -1 if an
 * error occurred.
*/ staticint pcap_read_packet(pcap_t*handle, pcap_handler callback, u_char *userdata) { u_char *bp; int offset; #ifdef HAVE_PF_PACKET_SOCKETS struct sockaddr_ll from; struct sll_header *hdrp; #else struct sockaddr from; #endif #if defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) struct iovec iov; struct msghdr msg; struct cmsghdr *cmsg; union { structcmsghdr cmsg; char buf[CMSG_SPACE(sizeof(structtpacket_auxdata))]; } cmsg_buf; #else /* defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */ socklen_t fromlen; #endif/* defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */ int packet_len,caplen; #ifdef HAVE_PF_RING structpfring_pkthdr pcap_header; #else struct pcap_pkthdr pcap_header; #endif // 这里必须讲解下,当定义了HAVE_PF_RING时候,pcap_header指向的是pfring_pkthdr结构体,去看看它和pcap_pkthdr结构体有什么不同。Pfring_pkthdr结构体的定义如下: /* struct pfring_pkthdr { /* pcap header */ struct timeval ts; /* timestamp */ u_int32_t caplen; /* length ofportion present */ u_int32_t len; /* lengththis packet (off wire) */ struct pfring_extended_pkthdr extended_hdr; /* PF_RING extended header*/ }; */ /* 而pcap_pkthdr的结构体定义如下: struct pcap_pkthdr { struct timeval ts; /* time stamp */ bpf_u_int32 caplen; /* length of portion present */ bpf_u_int32 len; /* length this packet (off wire) */ }; */ //对比发现它们两个相比,pfring_pkthdr多了一个PF_RING的扩展头。 #ifdefHAVE_PF_RING if(handle->ring) { do { if (handle->break_loop) { /* * Yes - clear the flag that indicates that it * has, and return -2 as an indication that we * were told to break out of the loop. 
* * Patch courtesy of Michael Stiller */ handle->break_loop = 0; return -2; } packet_len = pfring_recv(handle->ring, (char*)handle->buffer, handle->bufsize, &pcap_header, 1 /* wait_for_incoming_packet */); /*如果定义了PF_RING,就采用pfring_recv接收数据包,这个函数后面在进行讲解,如果没有定义PF_RING的话,采用recvmsg或recvfrom来接收数据包了,这两个函数有什么区别呢,大家google一下吧,不讲了。 */ if (packet_len > 0) { bp = handle->buffer; pcap_header.caplen = min(pcap_header.caplen, handle->bufsize); caplen = pcap_header.caplen, packet_len = pcap_header.len; goto pfring_pcap_read_packet; } }while (packet_len == -1 && (errno == EINTR || errno == ENETDOWN)); } #endif #ifdefHAVE_PF_PACKET_SOCKETS /* *If this is a cooked device, leave extra room for a *fake packet header. */ if (handle->md.cooked) offset = SLL_HDR_LEN; else offset = 0; #else /* *This system doesn't have PF_PACKET sockets, so it doesn't *support cooked devices. */ offset = 0; #endif /* * Receive a single packet from the kernel. * We ignore EINTR, as that might just be dueto a signal * being delivered - if the signal shouldinterrupt the * loop, the signal handler should callpcap_breakloop() * to set handle->break_loop (we ignore iton other * platforms as well). * We also ignore ENETDOWN, so that we cancontinue to * capture traffic if the interface goes downand comes * back up again; comments in the kernelindicate that * we'll just block waiting for packets if wetry to * receive from a socket that deliveredENETDOWN, and, * if we're using a memory-mapped buffer, wewon't even * get notified of "network down"events. 
*/ bp = handle->buffer +handle->offset; #ifdefined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) msg.msg_name = &from; msg.msg_namelen = sizeof(from); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = &cmsg_buf; msg.msg_controllen = sizeof(cmsg_buf); msg.msg_flags = 0; iov.iov_len = handle->bufsize - offset; iov.iov_base = bp + offset; #endif /*defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */ do { /* * Has "pcap_breakloop()" beencalled? */ if (handle->break_loop) { /* * Yes - clear the flag that indicates that ithas, * and return PCAP_ERROR_BREAK as an indicationthat * we were told to break out of the loop. */ handle->break_loop = 0; return PCAP_ERROR_BREAK; } #ifdefined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) packet_len = recvmsg(handle->fd, &msg, MSG_TRUNC); #else /*defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */ fromlen = sizeof(from); packet_len = recvfrom( handle->fd, bp + offset, handle->bufsize -offset, MSG_TRUNC, (struct sockaddr *)&from, &fromlen); #endif /* defined(HAVE_PACKET_AUXDATA) &&defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */ } while (packet_len == -1 &&errno == EINTR); /* Check if an error occured */ if (packet_len == -1) { switch (errno) { case EAGAIN: return 0; /* no packet there */ case ENETDOWN: /* * The device on which we're capturing wentaway. * * XXX - we should really return * PCAP_ERROR_IFACE_NOT_UP, but pcap_dispatch() * etc. aren't defined to return that. */ snprintf(handle->errbuf,PCAP_ERRBUF_SIZE, "The interfacewent down"); return PCAP_ERROR; default: snprintf(handle->errbuf,PCAP_ERRBUF_SIZE, "recvfrom: %s",pcap_strerror(errno)); return PCAP_ERROR; } } #ifdefHAVE_PF_PACKET_SOCKETS if (!handle->md.sock_packet) { /* * Unfortunately, there is a window betweensocket() and * bind() where the kernel may queue packetsfrom any * interface. 
         * If we're bound to a particular interface,
         * discard packets not from that interface.
         *
         * (If socket filters are supported, we could do the
         * same thing we do when changing the filter; however,
         * that won't handle packet sockets without socket
         * filter support, and it's a bit more complicated.
         * It would save some instructions per packet, however.)
         */
        if (handle->md.ifindex != -1 &&
            from.sll_ifindex != handle->md.ifindex)
            return 0;

        /*
         * Do checks based on packet direction.
         * We can only do this if we're using PF_PACKET; the
         * address returned for SOCK_PACKET is a "sockaddr_pkt"
         * which lacks the relevant packet type information.
         */
        if (from.sll_pkttype == PACKET_OUTGOING) {
            /*
             * Outgoing packet.
             * If this is from the loopback device, reject it;
             * we'll see the packet as an incoming packet as well,
             * and we don't want to see it twice.
             */
            if (from.sll_ifindex == handle->md.lo_ifindex)
                return 0;

            /*
             * If the user only wants incoming packets, reject it.
             */
            if (handle->direction == PCAP_D_IN)
                return 0;
        } else {
            /*
             * Incoming packet.
             * If the user only wants outgoing packets, reject it.
             */
            if (handle->direction == PCAP_D_OUT)
                return 0;
        }
    }
#endif

#ifdef HAVE_PF_PACKET_SOCKETS
    /*
     * If this is a cooked device, fill in the fake packet header.
     */
    if (handle->md.cooked) {
        /*
         * Add the length of the fake header to the length
         * of packet data we read.
         */
        packet_len += SLL_HDR_LEN;

        hdrp = (struct sll_header *)bp;
        hdrp->sll_pkttype = map_packet_type_to_sll_type(from.sll_pkttype);
        hdrp->sll_hatype = htons(from.sll_hatype);
        hdrp->sll_halen = htons(from.sll_halen);
        memcpy(hdrp->sll_addr, from.sll_addr,
               (from.sll_halen > SLL_ADDRLEN) ?
                   SLL_ADDRLEN : from.sll_halen);
        hdrp->sll_protocol = from.sll_protocol;
    }

#ifdef HAVE_PF_RING
pfring_pcap_read_packet:
#endif

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        struct tpacket_auxdata *aux;
        unsigned int len;
        struct vlan_tag *tag;

        /* Skip control messages that are not a complete PACKET_AUXDATA record. */
        if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct tpacket_auxdata)) ||
            cmsg->cmsg_level != SOL_PACKET ||
            cmsg->cmsg_type != PACKET_AUXDATA)
            continue;

        aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg);
        if (aux->tp_vlan_tci == 0)
            continue;

        len = packet_len > iov.iov_len ? iov.iov_len : packet_len;
        if (len < 2 * ETH_ALEN)
            break;

        /* Re-insert the VLAN tag after the two MAC addresses. */
        bp -= VLAN_TAG_LEN;
        memmove(bp, bp + VLAN_TAG_LEN, 2 * ETH_ALEN);

        tag = (struct vlan_tag *)(bp + 2 * ETH_ALEN);
        tag->vlan_tpid = htons(ETH_P_8021Q);
        tag->vlan_tci = htons(aux->tp_vlan_tci);

        packet_len += VLAN_TAG_LEN;
    }
#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
#endif /* HAVE_PF_PACKET_SOCKETS */

    /*
     * XXX: According to the kernel source we should get the real
     * packet len if calling recvfrom with MSG_TRUNC set. It does
     * not seem to work here :(, but it is supported by this code
     * anyway.
     * To be honest the code RELIES on that feature so this is really
     * broken with 2.2.x kernels.
     * I spend a day to figure out what's going on and I found out
     * that the following is happening:
     *
     * The packet comes from a random interface and the packet_rcv
     * hook is called with a clone of the packet. That code inserts
     * the packet into the receive queue of the packet socket.
     * If a filter is attached to that socket that filter is run
     * first - and there lies the problem. The default filter always
     * cuts the packet at the snaplen:
     *
     * # tcpdump -d
     * (000) ret #68
     *
     * So the packet filter cuts down the packet. The recvfrom call
     * says "hey, it's only 68 bytes, it fits into the buffer" with
     * the result that we don't get the real packet length. This
     * is valid at least until kernel 2.2.17pre6.
     *
     * We currently handle this by making a copy of the filter
     * program, fixing all "ret" instructions with non-zero
     * operands to have an operand of 65535 so that the filter
     * doesn't truncate the packet, and supplying that modified
     * filter to the kernel.
     */

    caplen = packet_len;
    if (caplen > handle->snapshot)
        caplen = handle->snapshot;

    /* Run the packet filter if not using kernel filter */
    if (!handle->md.use_bpf && handle->fcode.bf_insns) {
        if (bpf_filter(handle->fcode.bf_insns, bp,
                       packet_len, caplen) == 0) {
            /* rejected by filter */
            return 0;
        }
    }

    /* Fill in our own header data */

#ifdef HAVE_PF_RING
    if (!handle->ring) {
#endif
        if (ioctl(handle->fd, SIOCGSTAMP, &pcap_header.ts) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                     "SIOCGSTAMP: %s", pcap_strerror(errno));
            return PCAP_ERROR;
        }
        pcap_header.caplen = caplen;
        pcap_header.len = packet_len;
#ifdef HAVE_PF_RING
    }
#endif

    /*
     * Count the packet.
     *
     * Arguably, we should count them before we check the filter,
     * as on many other platforms "ps_recv" counts packets
     * handed to the filter rather than packets that passed
     * the filter, but if filtering is done in the kernel, we
     * can't get a count of packets that passed the filter,
     * and that would mean the meaning of "ps_recv" wouldn't
     * be the same on all Linux systems.
     *
     * XXX - it's not the same on all systems in any case;
     * ideally, we should have a "get the statistics" call
     * that supplies more counts and indicates which of them
     * it supplies, so that we supply a count of packets
     * handed to the filter only on platforms where that
     * information is available.
     *
     * We count them here even if we can get the packet count
     * from the kernel, as we can only determine at run time
     * whether we'll be able to get it from the kernel (if
     * HAVE_TPACKET_STATS isn't defined, we can't get it from
     * the kernel, but if it is defined, the library might
     * have been built with a 2.4 or later kernel, but we
     * might be running on a 2.2[.x] kernel without Alexey
     * Kuznetzov's turbopacket patches, and thus the kernel
     * might not be able to supply those statistics). We
     * could, I guess, try, when opening the socket, to get
     * the statistics, and if we can not increment the count
     * here, but it's not clear that always incrementing
     * the count is more expensive than always testing a flag
     * in memory.
     *
     * We keep the count in "md.packets_read", and use that for
     * "ps_recv" if we can't get the statistics from the kernel.
     * We do that because, if we *can* get the statistics from
     * the kernel, we use "md.stat.ps_recv" and "md.stat.ps_drop"
     * as running counts, as reading the statistics from the
     * kernel resets the kernel statistics, and if we directly
     * increment "md.stat.ps_recv" here, that means it will
     * count packets *twice* on systems where we can get kernel
     * statistics - once here, and once in pcap_stats_linux().
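     */
    handle->md.packets_read++;

    /* Call the user supplied callback function */
#if defined(HAVE_PF_RING)
    {
        struct myts {
            struct timeval ts;
            u_int32_t caplen, len;
            u_int64_t ns;
        };
        struct myts myhdr;

        myhdr.ts.tv_sec = pcap_header.ts.tv_sec, myhdr.ts.tv_usec = pcap_header.ts.tv_usec;
        myhdr.caplen = pcap_header.caplen, myhdr.len = pcap_header.len;
        myhdr.ns = pcap_header.extended_hdr.timestamp_ns;
        callback(userdata, (struct pcap_pkthdr*)&myhdr, bp);
    }
#else
    callback(userdata, &pcap_header, bp);
#endif
    /* The function is long, but reading it top to bottom it is easy enough to
       follow: depending on the kind of socket, a different call is used to
       receive the packet (pfring_recv, recvmsg or recvfrom). At the end, if
       HAVE_PF_RING is defined, the header handed to the user callback is
       different: it is a local struct that additionally carries the
       nanosecond timestamp, cast to struct pcap_pkthdr *. The code above
       makes that quite clear. */
    return 1;
}

After all this explanation we are still not done with pcap_open_live. Dozens of pages so far have covered just one of the functions it calls, namely the pcap_create defined in pcap-linux.c, together with everything reachable from it. libpcap is deep, and with PF_RING added it feels deeper still. Since we are not finished, let's carry on with the other function pcap_open_live calls: pcap_activate. pcap_create allocates the handle and registers callbacks such as activate_op; the socket itself is only created and bound when that callback runs. So what does pcap_activate do? Let the source speak. I love linux, I love open source.

int pcap_activate(pcap_t *p)
{
    int status;

    status = p->activate_op(p);
    /* What is activate_op? Searching for it shows that it is a function
       pointer. Where is it assigned? Search the source again, and in
       pcap-linux.c we find:
           handle->activate_op = pcap_activate_linux;
       So pcap_create stores pcap_activate_linux in the activate_op callback,
       and here that callback is finally invoked. pcap_create only assigns
       the callback; the call happens here. Everything analyzed earlier under
       pcap_activate_linux only runs at this point. */
    if (status >= 0)    /* a return value >= 0 from pcap_activate_linux means success */
        p->activated = 1;
    else {
        if (p->errbuf[0] == '\0') {
            /*
             * No error message supplied by the activate routine;
             * for the benefit of programs that don't specially
             * handle errors other than PCAP_ERROR, return the
             * error message corresponding to the status.
             */
            snprintf(p->errbuf, PCAP_ERRBUF_SIZE, "%s",
                     pcap_statustostr(status));
        }

        /*
         * Undo any operation pointer setting, etc. done by
         * the activate operation.
         */
        initialize_ops(p);
    }
    return (status);
}

That finally wraps up pcap_open_live, and I need to go get dinner. There is still plenty left to analyze, so let's queue it up: first pcap_next and its siblings. The socket has been created and bound, so it is time to capture data. The callback that reads packets has also been set already: it is pcap_read_linux, which is to say pcap_read_packet. My guess is that pcap_next and friends must end up calling this callback; we'll see. Dinner first: you can't work on an empty stomach. See you later...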
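
The create/activate split described above can be sketched in a few lines of C. This is only a minimal illustration, not libpcap code: struct handle, handle_create, handle_activate, activate_ok, activate_fail, read_packet and read_unset are all hypothetical names invented for the sketch, and the error strings are made up. It assumes the same contract as the listing: create only stores the callback, activate calls it, a status >= 0 means success, and on failure a default error message is filled in and the operation pointers set by the back end are undone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERRBUF_SIZE 256

/* Hypothetical stand-in for pcap_t: only the fields this sketch needs. */
struct handle {
    int (*activate_op)(struct handle *); /* stored at create time, called at activate time */
    int (*read_op)(struct handle *);     /* installed by the activate back end */
    int activated;
    char errbuf[ERRBUF_SIZE];
};

/* Placeholder read operation used while the handle is not activated. */
int read_unset(struct handle *h)
{
    snprintf(h->errbuf, sizeof h->errbuf, "handle not activated");
    return -1;
}

/* Hypothetical read operation installed by a back end. */
int read_packet(struct handle *h)
{
    (void)h;
    return 1; /* pretend one packet was delivered */
}

/* Hypothetical back ends playing the role of pcap_activate_linux. */
int activate_ok(struct handle *h)
{
    h->read_op = read_packet;
    return 0;
}

int activate_fail(struct handle *h)
{
    h->read_op = read_packet; /* partially set up ... */
    return -1;                /* ... then fail without writing errbuf */
}

/* "create": remember which back end to use, but do not call it yet. */
void handle_create(struct handle *h, int (*op)(struct handle *))
{
    assert(h != NULL && op != NULL);
    memset(h, 0, sizeof *h);
    h->activate_op = op;
    h->read_op = read_unset;
}

/* "activate": invoke the stored callback and handle its status. */
int handle_activate(struct handle *h)
{
    int status = h->activate_op(h);

    if (status >= 0) {
        h->activated = 1;
    } else {
        /* Supply a default message if the back end gave none. */
        if (h->errbuf[0] == '\0')
            snprintf(h->errbuf, sizeof h->errbuf,
                     "activation failed (status %d)", status);
        /* Undo the operation pointers the back end may have set. */
        h->read_op = read_unset;
    }
    return status;
}
```

A caller would run handle_create(&h, activate_ok) followed by handle_activate(&h), and only afterwards use h.read_op. Storing the back end as a function pointer is what lets one front-end function serve several platform-specific implementations.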