Programming with pcap

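I have recently been working on applications that involve packet sniffing, so I looked up some material. The article below is an introductory classic, and I have translated it as follows. My ability is limited and mistakes are inevitable, so treat the translation as a reference only.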

Programming with pcap


Tim Carstens
timcarst at yahoo dot com


The latest version of this document can be found at http://broker.dhs.org/pcap.htm


Ok, let's begin by defining who this document is written for.  Obviously, some basic knowledge of C is required, unless you only wish to know the basic theory.  You do not need to be a code ninja; for the areas likely to be understood only by more experienced programmers, I'll be sure to describe concepts in greater detail.  Additionally, some basic understanding of networking might help, given that this is a packet sniffer and all.  All of the code examples presented here have been tested on FreeBSD 4.3 with a default kernel.
Getting Started: The format of a pcap application
The first thing to understand is the general layout of a pcap sniffer.  The flow of code is as follows:
1. We begin by determining which interface we want to sniff on.  In Linux this may be something like eth0, in BSD it may be xl1, etc.  We can either define this device in a string, or we can ask pcap to provide us with the name of an interface that will do the job.
2. Initialize pcap.  This is where we actually tell pcap what device we are sniffing on.  We can, if we want to, sniff on multiple devices.  How do we differentiate between them?  Using file handles.  Just like opening a file for reading or writing, we must name our sniffing "session" so we can tell it apart from other such sessions.
3. In the event that we only want to sniff specific traffic (e.g.: only TCP/IP packets, only packets going to port 23, etc) we must create a rule set, "compile" it, and apply it.  This is a three-phase process, and the phases are closely related.  The rule set is kept in a string, and is converted into a format that pcap can read (hence compiling it).  The compilation is actually just done by calling a function within our program; it does not involve the use of an external application.  Then we tell pcap to apply it to whichever session we wish for it to filter.
4. Finally, we tell pcap to enter its primary execution loop.  In this state, pcap waits until it has received however many packets we want it to.  Every time it gets a new packet in, it calls another function that we have already defined.  The function that it calls can do anything we want; it can dissect the packet and print it to the user, it can save it in a file, or it can do nothing at all.
5. After our sniffing needs are satisfied, we close our session and are complete.
This is actually a very simple process.  Five steps total, one of which is optional (step 3, in case you were wondering).  Why don't we take a look at each of the steps and how to implement them?
Setting the device
This is terribly simple.  There are two techniques for setting the device that we wish to sniff on.
The first is that we can simply have the user tell us.  Consider the following program:
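#include <stdio.h>
#include <pcap.h>
int main(int argc, char *argv[])
{
    char *dev;
    /* Make sure the user actually gave us a device name */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <device>\n", argv[0]);
        return(2);
    }
    dev = argv[1];
    printf("Device: %s\n", dev);
    return(0);
}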
The user specifies the device by passing the name of it as the first argument to the program.  Now the string "dev" holds the name of the interface that we will sniff on in a format that pcap can understand (assuming, of course, the user gave us a real interface).
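The other technique is equally simple.  Look at this program:
#include <stdio.h>
#include <pcap.h>
int main()
{
    char *dev, errbuf[PCAP_ERRBUF_SIZE];
    dev = pcap_lookupdev(errbuf);
    /* On failure pcap_lookupdev() returns NULL and fills in errbuf */
    if (dev == NULL) {
        fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
        return(2);
    }
    printf("Device: %s\n", dev);
    return(0);
}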
In this case, pcap just sets the device on its own.  "But wait Tim," you say.  "What is the deal with the errbuf string?"  Most of the pcap commands allow us to pass them a string as an argument.  The purpose of this string?  In the event that the command fails, it will populate the string with a description of the error.  In this case, if pcap_lookupdev() fails, it will store an error message in errbuf.  Nifty, isn't it?  And that's how we set our device.
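The errbuf convention is easy to imitate in plain C, without pcap at all.  The sketch below is only an illustration of the pattern: find_device() and ERRBUF_SIZE are hypothetical names made up for this example and are not part of pcap.  The caller owns the buffer; the callee fills it in on failure and signals the failure through its return value.

```c
#include <stdio.h>
#include <string.h>

#define ERRBUF_SIZE 256   /* hypothetical; pcap itself uses PCAP_ERRBUF_SIZE */

/* Hypothetical lookup: returns the device name on success, or NULL on
 * failure after writing a description of the problem into errbuf.
 * errbuf must point to at least ERRBUF_SIZE bytes. */
const char *find_device(const char *wanted, char *errbuf)
{
    static const char *known[] = { "eth0", "xl1", "rl0" };
    size_t i;

    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcmp(known[i], wanted) == 0)
            return known[i];
    }
    snprintf(errbuf, ERRBUF_SIZE, "no such device: %s", wanted);
    return NULL;
}
```

A caller would declare char errbuf[ERRBUF_SIZE], call find_device("eth9", errbuf), and print errbuf only when the result is NULL.  On success the buffer is left untouched, so its contents mean nothing unless the call failed.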
Opening the device for sniffing
The task of creating a sniffing session is really quite simple.  For this, we use pcap_open_live().  The prototype of this function (from the pcap man page) is as follows:
pcap_t *pcap_open_live(char *device, int snaplen, int promisc, int to_ms, char *ebuf)
The first argument is the device that we specified in the previous section.  snaplen is an integer which defines the maximum number of bytes to be captured by pcap.  promisc, when set to true, brings the interface into promiscuous mode (however, even if it is set to false, it is possible under specific cases for the interface to be in promiscuous mode, anyway).  to_ms is the read timeout in milliseconds (a value of 0 means no timeout, so sniffing continues until an error occurs; the value should not be negative).  Lastly, ebuf is a string we can store any error messages within (as we did above with errbuf).  The function returns our session handle.
To demonstrate, consider this code snippet:
    #include <pcap.h>
    ...
    pcap_t *handle;
    handle = pcap_open_live(somedev, BUFSIZ, 1, 0, errbuf);
This code fragment opens the device named in the string "somedev" and tells it to read up to BUFSIZ bytes per packet (BUFSIZ is a standard constant from stdio.h, which pcap.h includes).  We are telling it to put the device into promiscuous mode, to sniff until an error occurs, and if there is an error, store it in the string errbuf.
A note about promiscuous vs. non-promiscuous sniffing:  The two techniques are very different in style.  In standard, non-promiscuous sniffing, a host is sniffing only traffic that is directly related to it.  Only traffic to, from, or routed through the host will be picked up by the sniffer.  Promiscuous mode, on the other hand, sniffs all traffic on the wire.  In a non-switched environment, this could be all network traffic.  The obvious advantage to this is that it provides more packets for sniffing, which may or may not be helpful depending on the reason you are sniffing the network.  However, there are drawbacks.  First, promiscuous mode sniffing is detectable; a host can test with strong reliability to determine if another host is doing promiscuous sniffing.  Second, it only works in a non-switched environment (such as a hub, or a switch that is being ARP flooded).  Third, on high traffic networks, the host can become quite taxed for system resources.
Filtering traffic
Often our sniffer may only be interested in specific traffic.  For instance, there may be times when all we want is to sniff on port 23 (telnet) in search of passwords.  Or perhaps we want to hijack a file being sent over port 21 (FTP).  Maybe we only want DNS traffic (port 53 UDP).  Whatever the case, rarely do we just want to blindly sniff all network traffic.  Enter pcap_compile() and pcap_setfilter().
The process is quite simple.  After we have already called pcap_open_live() and have a working sniffing session, we can apply our filter.  Why not just use our own if/else if statements?  Two reasons.  First, pcap's filter is far more efficient, because it does it directly with the BPF filter; we eliminate numerous steps by having the BPF driver do it directly.  Second, this is a lot easier :)
Before applying our filter, we must "compile" it.  The filter expression is kept in a regular string (char array).  The syntax is documented quite well in the man page for tcpdump; I leave you to read it on your own.  However, we will use simple test expressions, so perhaps you are sharp enough to figure it out from my examples.
To compile the program we call pcap_compile().  The prototype defines it as:
int pcap_compile(pcap_t *p, struct bpf_program *fp, char *str, int optimize, bpf_u_int32 netmask)
The first argument is our session handle (pcap_t *handle in our previous example).  Following that is a reference to the place we will store the compiled version of our filter.  Then comes the expression itself, in regular string format.  Next is an integer that decides if the expression should be "optimized" or not (0 is false, 1 is true.  Standard stuff.)  Finally, we must specify the net mask of the network the filter applies to.  The function returns -1 on failure; all other values imply success.
After the expression has been compiled, it is time to apply it.  Enter pcap_setfilter().  Following our format of explaining pcap, we shall look at the pcap_setfilter() prototype:
int pcap_setfilter(pcap_t *p, struct bpf_program *fp)
This is very straightforward.  The first argument is our session handle, the second is a reference to the compiled version of the expression (presumably the same variable as the second argument to pcap_compile()).
Perhaps another code sample would help to better understand:
    #include <pcap.h>
    ...
    pcap_t *handle;                   /* Session handle */
    char dev[] = "rl0";               /* Device to sniff on */
    char errbuf[PCAP_ERRBUF_SIZE];    /* Error string */
    struct bpf_program filter;        /* The compiled filter expression */
    char filter_app[] = "port 23";    /* The filter expression */
    bpf_u_int32 mask;                 /* The netmask of our sniffing device */
    bpf_u_int32 net;                  /* The network number of our sniffing device */
    pcap_lookupnet(dev, &net, &mask, errbuf);
    handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
    /* The last argument of pcap_compile() is the netmask, not the network number */
    pcap_compile(handle, &filter, filter_app, 0, mask);
    pcap_setfilter(handle, &filter);
This program preps the sniffer to sniff all traffic coming from or going to port 23, in promiscuous mode, on the device rl0.
You may notice that the previous example contains a function that we have not yet discussed.  pcap_lookupnet() is a function that, given the name of a device, returns its IPv4 network number and net mask.  This was essential because we needed to know the net mask in order to apply the filter.  This function is described in the Miscellaneous section at the end of the document.
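pcap_lookupnet() hands both values back as 32-bit integers in network byte order, which are not very readable when printed as plain numbers.  Here is a minimal sketch of turning such a value into dotted-quad text.  It uses only the standard inet_ntop() call; the helper name format_ipv4() is an assumption made up for this illustration.

```c
#include <arpa/inet.h>
#include <sys/socket.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the dotted-quad form of an IPv4 value that
 * is already in network byte order into buf.
 * Returns buf on success, or NULL if the buffer is too small. */
const char *format_ipv4(uint32_t value_net_order, char *buf, size_t buflen)
{
    struct in_addr addr;

    memset(&addr, 0, sizeof(addr));
    addr.s_addr = value_net_order;
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen);
}
```

With the variables from the example above, format_ipv4(net, buf, sizeof(buf)) and format_ipv4(mask, buf, sizeof(buf)) would give printable strings, where buf is a char array of at least 16 bytes.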
It has been my experience that this filter does not work across all operating systems.  In my test environment, I found that OpenBSD 2.9 with a default kernel does support this type of filter, but FreeBSD 4.3 with a default kernel does not.  Your mileage may vary.
The actual sniffing
At this point we have learned how to define a device, prepare it for sniffing, and apply filters about what we should and should not sniff for.  Now it is time to actually capture some packets.
There are two main techniques for capturing packets.  We can either capture a single packet at a time, or we can enter a loop that waits for n number of packets to be sniffed before being done.  We will begin by looking at how to capture a single packet, then look at methods of using loops.  For this we use pcap_next().
The prototype for pcap_next() is fairly simple:
u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h)
The first argument is our session handle.  The second argument is a pointer to a structure that holds general information about the packet, specifically the time at which it was sniffed, the length of this packet, and the length of this specific portion (in case it is fragmented, for example).  pcap_next() returns a u_char pointer to the packet that is described by this structure.  We'll discuss the technique for actually reading the packet itself later.
Here is a simple demonstration of using pcap_next() to sniff a packet.
    #include <pcap.h>
    #include <stdio.h>
    int main()
    {
        pcap_t *handle;                 /* Session handle */
        char *dev;                      /* The device to sniff on */
        char errbuf[PCAP_ERRBUF_SIZE];  /* Error string */
        struct bpf_program filter;      /* The compiled filter */
        char filter_app[] = "port 23";  /* The filter expression */
        bpf_u_int32 mask;               /* Our netmask */
        bpf_u_int32 net;                /* Our network number */
        struct pcap_pkthdr header;      /* The header that pcap gives us */
        const u_char *packet;           /* The actual packet */
        /* Define the device */
        dev = pcap_lookupdev(errbuf);
        if (dev == NULL) {
            fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
            return(2);
        }
        /* Find the properties for the device */
        if (pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
            net = 0;
            mask = 0;
        }
        /* Open the session in promiscuous mode */
        handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
        if (handle == NULL) {
            fprintf(stderr, "Couldn't open device %s: %s\n", dev, errbuf);
            return(2);
        }
        /* Compile and apply the filter (the last argument is the netmask) */
        if (pcap_compile(handle, &filter, filter_app, 0, mask) == -1 ||
            pcap_setfilter(handle, &filter) == -1) {
            fprintf(stderr, "Couldn't install filter %s: %s\n", filter_app, pcap_geterr(handle));
            return(2);
        }
        /* Grab a packet */
        packet = pcap_next(handle, &header);
        /* Print its length, if we actually got one */
        if (packet != NULL)
            printf("Jacked a packet with length of [%u]\n", header.len);
        /* And close the session */
        pcap_close(handle);
        return(0);
    }
This application sniffs on whatever device is returned by pcap_lookupdev() by putting it into promiscuous mode.  It finds the first packet to come across port 23 (telnet) and tells the user the size of the packet (in bytes).  Again, this program includes a new call, pcap_close(), which we will discuss later (although it really is quite self explanatory).
The other technique we can use is more complicated, and probably more useful.  Few sniffers (if any) actually use pcap_next().  More often than not, they use pcap_loop() or pcap_dispatch() (which then themselves use pcap_loop()).  To understand the use of these two functions, you must understand the idea of a callback function.
Callback functions are not anything new, and are very common in many APIs.  The concept behind a callback function is fairly simple.  Suppose I have a program that is waiting for an event of some sort.  For the purpose of this example, let's pretend that my program wants a user to press a key on the keyboard.  Every time they press a key, I want to call a function which then will determine what to do.  The function I am utilizing is a callback function.  Every time the user presses a key, my program will call the callback function.  Callbacks are used in pcap, but instead of being called when a user presses a key, they are called when pcap sniffs a packet.  The two functions that one can use to define a callback are pcap_loop() and pcap_dispatch().  pcap_loop() and pcap_dispatch() are very similar in their usage of callbacks.  Both of them call a callback function every time a packet is sniffed that meets our filter requirements (if any filter exists, of course; if not, then all packets that are sniffed are sent to the callback).
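The key-press scenario can be written down in a few lines of plain C, independent of pcap.  This is only a sketch: every name in it (key_handler, process_keys, on_key, keys_seen) is made up for illustration, and process_keys() stands in for a library routine that waits for events, here simulated by walking a string of "key presses".

```c
/* The type of function the "library" is willing to call back. */
typedef void (*key_handler)(char key);

/* Application state updated by the callback. */
static int keys_seen = 0;

/* An application-defined callback: counts every key except the space bar. */
void on_key(char key)
{
    if (key != ' ')
        keys_seen++;
}

/* Stand-in for a library routine: for every simulated key press in keys,
 * call the handler supplied by the application.  Returns the number of
 * key presses that were delivered to the handler. */
int process_keys(const char *keys, key_handler handler)
{
    int delivered = 0;

    for (; *keys != '\0'; keys++) {
        handler(*keys);
        delivered++;
    }
    return delivered;
}
```

The application never calls on_key() itself.  It passes the function's name to process_keys(), just as we will pass the name of our packet handler to pcap.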
The prototype for pcap_loop() is below:
int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user)
The first argument is our session handle.  Following that is an integer that tells pcap_loop() how many packets it should sniff for before returning (a negative value means it should sniff until an error occurs).  The third argument is the name of the callback function (just its identifier, no parentheses).  The last argument is useful in some applications, but many times is simply set as NULL.  Suppose we have arguments of our own that we wish to send to our callback function, in addition to the arguments that pcap_loop() sends.  This is where we do it.  Obviously, you must typecast to a u_char pointer to ensure the results make it there correctly; as we will see later, pcap makes use of some very interesting means of passing information in the form of a u_char pointer.  After we show an example of how pcap does it, it should be obvious how to do it here.  If not, consult your local C reference text, as an explanation of pointers is beyond the scope of this document.
pcap_dispatch() is almost identical in usage.  The only difference between pcap_dispatch() and pcap_loop() is in how they handle timeouts (remember how you could set a timeout when you called pcap_open_live()?  This is where it comes into play).  pcap_loop() ignores the timeout while pcap_dispatch() does not.  For a more in-depth discussion of their differences, see the pcap man page.
Before we can provide an example of using pcap_loop(), we must examine the format of our callback function.  We cannot arbitrarily define our callback's prototype; otherwise, pcap_loop() would not know how to use the function.  So we use this format as the prototype for our callback function:
void got_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *packet);
Let's examine this in more detail.  First, you'll notice that the function has a void return type.  This is logical, because pcap_loop() wouldn't know how to handle a return value anyway.  The first argument corresponds to the last argument of pcap_loop().  Whatever value is passed as the last argument to pcap_loop() is passed to the first argument of our callback function every time the function is called.  The second argument is the pcap header, which contains information about when the packet was sniffed, how large it is, etc.  The pcap_pkthdr structure is defined in pcap.h as:
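To see how the user argument travels from the loop call to the callback, here is a sketch that does not touch pcap at all.  fake_loop(), struct fake_pkthdr, struct stats and count_packet() are hypothetical stand-ins invented for this example; only the shape of the callback mirrors the got_packet() prototype.  The application packs its own state into a structure, casts its address to an unsigned char pointer, and casts it back inside the callback.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pcap_pkthdr (time stamp omitted). */
struct fake_pkthdr {
    unsigned int caplen;   /* bytes actually captured */
    unsigned int len;      /* bytes on the wire */
};

/* Same shape as a pcap callback, using the stand-in header type. */
typedef void (*fake_handler)(unsigned char *args,
                             const struct fake_pkthdr *header,
                             const unsigned char *packet);

/* Application state that we want the callback to update. */
struct stats {
    unsigned int packets;
    unsigned int bytes;
};

/* The callback: recover our own structure from the opaque pointer. */
void count_packet(unsigned char *args, const struct fake_pkthdr *header,
                  const unsigned char *packet)
{
    struct stats *st = (struct stats *)args;

    (void)packet;              /* contents are not needed for counting */
    st->packets++;
    st->bytes += header->len;
}

/* Hypothetical stand-in for pcap_loop(): delivers cnt fake packets of
 * pkt_len bytes each and passes user through to the callback untouched. */
int fake_loop(int cnt, unsigned int pkt_len, fake_handler callback,
              unsigned char *user)
{
    static const unsigned char data[64] = { 0 };
    struct fake_pkthdr hdr;
    int i;

    hdr.len = pkt_len;
    hdr.caplen = pkt_len < sizeof(data) ? pkt_len : (unsigned int)sizeof(data);
    for (i = 0; i < cnt; i++)
        callback(user, &hdr, data);
    return 0;
}
```

A call such as fake_loop(3, 100, count_packet, (unsigned char *)&st) leaves the totals in st.  The cast is safe here because the pointer really does refer to a struct stats; the loop itself never looks at what it points to.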
struct pcap_pkthdr {
    struct timeval ts; /* time stamp */
    bpf_u_int32 caplen; /* length of portion present */
    bpf_u_int32 len; /* length this packet (off wire) */
};
These values should be fairly self-explanatory.  The last argument is the most interesting of them all, and the most confusing to the average novice pcap programmer.  It is another pointer to a u_char, and it points to the first byte of the entire packet, as sniffed by pcap_loop().
But how do you make use of this variable (named "packet" in our prototype)?  A packet contains many attributes, so as you can imagine, it is not really a string, but actually a collection of structures (for instance, a TCP/IP packet would have an Ethernet header, an IP header, a TCP header, and lastly, the packet's payload).  This u_char is the serialized version of these structures.  To make any use of it, we must do some interesting typecasting.
First, we must have the actual structures defined before we can typecast to them.  The following are the structure definitions that I use to describe a TCP/IP packet over Ethernet.  All three definitions that I use are taken directly out of the POSIX libraries.  Normally I would have simply used the definitions in those libraries, but it has been my experience that the libraries vary slightly from platform to platform, making it complicated to implement them quickly.  So for demonstration purposes we will just avoid that mess and simply copy the relevant structures.  All of these, incidentally, can be found in include/netinet on your local Unix system.  Here are the structures:
/* Ethernet addresses are 6 bytes long */
#ifndef ETHER_ADDR_LEN
#define ETHER_ADDR_LEN 6
#endif
/* Ethernet header */
struct sniff_ethernet {
    u_char ether_dhost[ETHER_ADDR_LEN]; /* Destination host address */
    u_char ether_shost[ETHER_ADDR_LEN]; /* Source host address */
    u_short ether_type; /* IP? ARP? RARP? etc */
};
/* IP header */
struct sniff_ip {
    #if BYTE_ORDER == LITTLE_ENDIAN
    u_int ip_hl:4, /* header length */
    ip_v:4; /* version */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
    u_int ip_v:4, /* version */
    ip_hl:4; /* header length */
    #endif
    u_char ip_tos; /* type of service */
    u_short ip_len; /* total length */
    u_short ip_id; /* identification */
    u_short ip_off; /* fragment offset field */
    #define IP_RF 0x8000 /* reserved fragment flag */
    #define IP_DF 0x4000 /* dont fragment flag */
    #define IP_MF 0x2000 /* more fragments flag */
    #define IP_OFFMASK 0x1fff /* mask for fragmenting bits */
    u_char ip_ttl; /* time to live */
    u_char ip_p; /* protocol */
    u_short ip_sum; /* checksum */
    struct in_addr ip_src,ip_dst; /* source and dest address */
};
/* TCP header */
typedef u_int tcp_seq; /* sequence number type, as in netinet/tcp.h */
struct sniff_tcp {
    u_short th_sport; /* source port */
    u_short th_dport; /* destination port */
    tcp_seq th_seq; /* sequence number */
    tcp_seq th_ack; /* acknowledgement number */
    #if BYTE_ORDER == LITTLE_ENDIAN
    u_int th_x2:4, /* (unused) */
    th_off:4; /* data offset */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
    u_int th_off:4, /* data offset */
    th_x2:4; /* (unused) */
    #endif
    u_char th_flags;
    #define TH_FIN 0x01
    #define TH_SYN 0x02
    #define TH_RST 0x04
    #define TH_PUSH 0x08
    #define TH_ACK 0x10
    #define TH_URG 0x20
    #define TH_ECE 0x40
    #define TH_CWR 0x80
    #define TH_FLAGS (TH_FIN|TH_SYN|TH_RST|TH_ACK|TH_URG|TH_ECE|TH_CWR)
    u_short th_win; /* window */
    u_short th_sum; /* checksum */
    u_short th_urp; /* urgent pointer */
};
Note: On my Slackware Linux 8 box (stock kernel 2.2.19) I found that code using the above structures would not compile.  The problem, as it turns out, was in include/features.h, which implements a POSIX interface unless _BSD_SOURCE is defined.  If it was not defined, then I had to use a different structure definition for the TCP header.  The more universal solution, that does not prevent the code from working on FreeBSD or OpenBSD (where it had previously worked fine), is simply to do the following:
#define _BSD_SOURCE 1
prior to including any of your header files.  This will ensure that a BSD style API is being used.  Again, if you don't wish to do this, then you can simply use the alternative TCP header structure, which I've linked to here, along with some quick notes about using it.
So how does all of this relate to pcap and our mysterious u_char?  Well, as luck would have it, pcap uses the exact same structures when sniffing packets.  It then simply creates a u_char string and stuffs the structures into it.  So how can we break it apart?  Be prepared to witness one of the most practical uses of pointers (for all of those new C programmers who insist that pointers are useless, I smite you).
Again, we're going to assume that we are dealing with a TCP/IP packet over Ethernet.  This same technique applies to any packet; the only difference is the structure types that you actually use.  So let's begin by declaring the variables we will need to deconstruct the packet u_char.
const struct sniff_ethernet *ethernet; /* The ethernet header */
const struct sniff_ip *ip; /* The IP header */
const struct sniff_tcp *tcp; /* The TCP header */
const char *payload; /* Packet payload */
/* For readability, we'll make variables for the sizes of each of the structures */
int size_ethernet = sizeof(struct sniff_ethernet);
int size_ip = sizeof(struct sniff_ip);
int size_tcp = sizeof(struct sniff_tcp);
And now we do our magical typecasting:
ethernet = (const struct sniff_ethernet *)(packet);
ip = (const struct sniff_ip *)(packet + size_ethernet);
tcp = (const struct sniff_tcp *)(packet + size_ethernet + size_ip);
payload = (const char *)(packet + size_ethernet + size_ip + size_tcp);
How does this work?  Consider the layout of the packet u_char in memory.  Basically, all that has happened when pcap stuffed these structures into a u_char is that all of the data contained within them was put in a string, and that string was sent to our callback.  The convenient thing is that, as long as the IP and TCP headers carry no options, the sizes of these structures always remain the same, regardless of the values stored in them.  On my workstation, for instance, a sniff_ethernet structure has a size of 14 bytes.  A sniff_ip structure is 20 bytes, and likewise a sniff_tcp structure is 20 bytes.  The u_char pointer is really just a variable containing an address in memory.  That's what a pointer is; it points to a location in memory.  For the sake of simplicity, we'll say that the address this pointer is set to is the value X.  Well, if our three structures are just sitting in line, the first of them (sniff_ethernet) being located in memory at the address X, then we can easily find the address of the other structures.  So let's make a chart:
Variable          Location (in bytes)
sniff_ethernet    X
sniff_ip          X + 14
sniff_tcp         X + 14 + 20
payload           X + 14 + 20 + 20
The sniff_ethernet structure, being the first in line, is simply at location X.  sniff_ip, which follows directly after sniff_ethernet, is at the location X, plus however much space sniff_ethernet consumes (14 in this example).  sniff_tcp is after both sniff_ip and sniff_ethernet, so it is located at X plus the sizes of sniff_ethernet and sniff_ip (14 and 20 bytes, respectively).  Lastly, the payload (which isn't really a structure, just a character string) is located after all of them.
Note: It is important that you not assume your variables will have these sizes.  You should always use the sizeof operator to ensure that your sizes are accurate.  This is because the members of each of these structures can have different sizes on different platforms.
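One caveat about the fixed offsets in the chart: the IP and TCP headers are only 20 bytes long when they carry no options.  Each header records its own length (in 32-bit words), so a more careful sniffer reads that field instead of relying on a fixed size.  The sketch below works on a raw byte buffer with hand-computed offsets rather than the structures above, so it needs no pcap or netinet headers.  The names parse_tcp_frame() and struct tcp_offsets are assumptions made up for this illustration, and it handles only IPv4 carrying TCP over plain Ethernet.

```c
#include <stddef.h>

#define ETHERNET_HEADER_LEN 14

/* Hypothetical result type: where each part starts inside the frame. */
struct tcp_offsets {
    size_t ip;              /* offset of the IP header */
    size_t tcp;             /* offset of the TCP header */
    size_t payload;         /* offset of the payload */
    unsigned int src_port;  /* ports, converted to host byte order */
    unsigned int dst_port;
};

/* Read a 16-bit big-endian (network byte order) value. */
static unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if the frame is not IPv4/TCP or is too short.
 * caplen is the number of captured bytes available in frame. */
int parse_tcp_frame(const unsigned char *frame, size_t caplen,
                    struct tcp_offsets *out)
{
    size_t ip_len, tcp_len;

    if (caplen < ETHERNET_HEADER_LEN + 20)
        return -1;
    if (read_be16(frame + 12) != 0x0800)          /* EtherType: IPv4 */
        return -1;

    out->ip = ETHERNET_HEADER_LEN;
    if ((frame[out->ip] >> 4) != 4)               /* IP version */
        return -1;
    ip_len = (size_t)(frame[out->ip] & 0x0f) * 4; /* header length in bytes */
    if (ip_len < 20 || frame[out->ip + 9] != 6)   /* protocol 6 is TCP */
        return -1;

    out->tcp = out->ip + ip_len;
    if (caplen < out->tcp + 20)
        return -1;
    tcp_len = (size_t)(frame[out->tcp + 12] >> 4) * 4; /* data offset in bytes */
    if (tcp_len < 20 || caplen < out->tcp + tcp_len)
        return -1;

    out->payload = out->tcp + tcp_len;
    out->src_port = read_be16(frame + out->tcp);
    out->dst_port = read_be16(frame + out->tcp + 2);
    return 0;
}
```

Inside a pcap callback the call would look like parse_tcp_frame(packet, header->caplen, &off), after which the payload starts at packet + off.payload.  Checking every offset against caplen matters because pcap may hand us fewer bytes than were on the wire.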
So at this point, we know how to set our callback function, call it, and find out the attributes about the packet that has been sniffed.  It's now the time you have been waiting for: writing a useful packet sniffer.  Because of the length of the source code, I'm not going to include it in the body of this document.  Simply download sniffer.c and try it out.
Wrapping Up
At this point you should be able to write a sniffer using pcap.  You have learned the basic concepts behind opening a pcap session, learning general attributes about it, sniffing packets, applying filters, and using callbacks.  Now it's time to get out there and sniff those wires!
Programming with pcap
Tim Carstens
The latest version of this document can be found at http://broker.dhs.org/pcap.htm

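Ok, let's begin by defining who this document is written for. Obviously, some basic knowledge of C is required, unless you only wish to know the basic theory. You do not need to be a coding expert; for the areas likely to be understood only by more experienced programmers, I will describe the concepts in as much detail as I can. Additionally, since this is about a packet sniffer, some basic understanding of networking will help. All of the code examples presented here have been tested on FreeBSD 4.3.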

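Getting started: the format of a pcap application

The first thing to understand is the general layout of a pcap-based sniffer. The flow is as follows:

1. We begin by deciding which interface to sniff on. In Linux this may be eth0, in BSD it may be xl1, and so on. We can either define the device in a string, or ask pcap to give us the name of an interface that will do the job.
2. Initialize pcap. This is where we tell pcap what device to sniff on. If we want to, we can sniff on multiple devices. How do we tell them apart? Using file handles. Just like opening a file for reading or writing, we must name our sniffing "session" so that we can tell it apart from other sessions.
3. If we only want to sniff specific traffic (e.g. only TCP/IP packets, only packets going to port 23, and so on), we must create a rule set, compile it, and apply it. This is a three-phase process, and the phases are closely related. The rule set is kept in a string and is converted into a format that pcap can read (hence compiling it). The compilation is just a function call within our program; it does not involve an external application. Then we tell pcap to apply the filter to whichever session we wish to filter.
4. Finally, we tell pcap to enter its primary execution loop. In this state, pcap waits until it has received however many packets we want. Every time it gets a new packet, it calls another function that we have already defined. That function can do anything we want: it can dissect the packet and print the result to the user, it can save it to a file, or it can do nothing at all.
5. After our sniffing needs are satisfied, we close the session and are done.
This is actually a very simple process. Five steps in total, one of which (step 3) is optional. Why don't we look at how each step is implemented?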

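Setting the device

This is very simple. There are two ways to set the device that we wish to sniff on.
The first is to simply have the user tell us. Consider the following program:
#include <stdio.h>
#include <pcap.h>
int main(int argc, char *argv[])
{
    char *dev;
    /* Make sure the user actually gave us a device name */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <device>\n", argv[0]);
        return(2);
    }
    dev = argv[1];
    printf("Device: %s\n", dev);
    return(0);
}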
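The user specifies the device by passing its name as the first argument to the program. The string "dev" now holds the name of the interface we will sniff on, in a format that pcap can understand (assuming, of course, that the user gave us a real interface).
The other technique is equally simple. Look at this program:
#include <stdio.h>
#include <pcap.h>
int main()
{
    char *dev, errbuf[PCAP_ERRBUF_SIZE];
    dev = pcap_lookupdev(errbuf);
    /* On failure pcap_lookupdev() returns NULL and fills in errbuf */
    if (dev == NULL) {
        fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
        return(2);
    }
    printf("Device: %s\n", dev);
    return(0);
}
In this case, pcap sets the device on its own. "But wait, Tim," you say. "What is the errbuf string for?" Most of the pcap commands allow us to pass them a string as an argument. What is its purpose? If the command fails, it fills the string with a description of the error. So if pcap_lookupdev() fails, it stores an error message in errbuf. Nifty, isn't it? And that is how we set our device.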

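Opening the device for sniffing

Creating a sniffing session is really quite simple. For this we use pcap_open_live(). The prototype of this function (from the pcap man page) is as follows:
pcap_t *pcap_open_live(char *device, int snaplen, int promisc, int to_ms, char *ebuf)
The first argument is the device we specified in the previous section. snaplen is an integer that defines the maximum number of bytes to be captured by pcap. promisc, when set to true, puts the interface into promiscuous mode (although in specific cases the interface may be in promiscuous mode even when it is set to false). to_ms is the read timeout in milliseconds (a value of 0 means no timeout, so sniffing continues until an error occurs; the value should not be negative). Lastly, ebuf is a string in which we can store any error messages (as we did above with errbuf). The function returns our session handle.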

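To demonstrate, consider this code snippet:
    #include <pcap.h>
    ...
    pcap_t *handle;
    handle = pcap_open_live(somedev, BUFSIZ, 1, 0, errbuf);

This code fragment opens the device named in the string somedev and tells it to read up to BUFSIZ bytes per packet (BUFSIZ is a standard constant from stdio.h, which pcap.h includes). We tell it to put the device into promiscuous mode, to sniff until an error occurs, and, if there is an error, to store it in the string errbuf.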

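A note about promiscuous vs. non-promiscuous sniffing: the two techniques are very different. In standard, non-promiscuous sniffing, a host sniffs only traffic that is directly related to it: traffic to it, from it, or routed through it. Promiscuous mode, on the other hand, sniffs all traffic on the wire. In a non-switched environment, this could be all network traffic. The obvious advantage is that more packets are available for sniffing, which may or may not be helpful depending on why you are sniffing the network. However, there are drawbacks. First, promiscuous mode sniffing is detectable; a host can test with strong reliability whether another host is sniffing in promiscuous mode. Second, it only works in a non-switched environment (such as a hub, or a switch that is being ARP flooded). Third, on high traffic networks, the host's system resources can become heavily taxed.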

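Filtering traffic
Often our sniffer is only interested in specific traffic. For instance, there may be times when all we want is to sniff on port 23 (telnet) in search of passwords. Or perhaps we want to intercept a file being sent over port 21 (FTP). Maybe we only want DNS traffic (port 53, UDP). Whatever the case, we rarely want to sniff all network traffic blindly. Enter pcap_compile() and pcap_setfilter().
The process is quite simple. Once we have called pcap_open_live() and have a working sniffing session, we can apply our filter. Why use pcap's filter instead of our own tests? Two reasons. First, pcap's filter is far more efficient, because the filtering is done directly by the BPF driver, which eliminates numerous steps. Second, it is a lot easier.

Before applying our filter, we must compile it. The filter expression is kept in a regular string (a char array). The syntax is documented quite well in the man page for tcpdump; I suggest you read it yourself. However, we will use simple test expressions, so you can probably work it out from my examples.

To compile the filter we call pcap_compile(). Its prototype is defined as:
int pcap_compile(pcap_t *p, struct bpf_program *fp, char *str, int optimize, bpf_u_int32 netmask)
The first argument is our session handle (pcap_t *handle in the previous example). Following that is a reference to the place where we will store the compiled version of our filter. Then comes the expression itself, in regular string format. Next is an integer that decides whether the expression should be optimized (0 is false, 1 is true, as usual). Finally, we must specify the net mask of the network the filter applies to. The function returns -1 on failure; all other values imply success.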

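After the expression has been compiled, it is time to apply it. Enter pcap_setfilter(). Following our usual format for explaining pcap, we first look at the pcap_setfilter() prototype:

int pcap_setfilter(pcap_t *p, struct bpf_program *fp)

This is very straightforward. The first argument is our session handle, and the second is a reference to the compiled version of the expression (presumably the same variable as the second argument to pcap_compile()).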
Perhaps another code sample will help you understand:
    #include <pcap.h>
    ...
    pcap_t *handle;                   /* Session handle */
    char dev[] = "rl0";               /* Device to sniff on */
    char errbuf[PCAP_ERRBUF_SIZE];    /* Error string */
    struct bpf_program filter;        /* The compiled filter expression */
    char filter_app[] = "port 23";    /* The filter expression */
    bpf_u_int32 mask;                 /* The netmask of our sniffing device */
    bpf_u_int32 net;                  /* The network number of our sniffing device */
    pcap_lookupnet(dev, &net, &mask, errbuf);
    handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
    /* The last argument of pcap_compile() is the netmask, not the network number */
    pcap_compile(handle, &filter, filter_app, 0, mask);
    pcap_setfilter(handle, &filter);

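This program prepares the sniffer to sniff all traffic coming from or going to port 23, in promiscuous mode, on the device rl0.

You may notice that the previous example contains a function we have not yet discussed: pcap_lookupnet(). Given the name of a device, it returns its IPv4 network number and net mask. This was essential, because we needed to know the net mask in order to apply the filter. This function is described in the Miscellaneous section at the end of the document.

In my experience, this filter does not work on every operating system. In my test environment, I found that OpenBSD 2.9 with a default kernel supports this type of filter, but FreeBSD 4.3 with a default kernel does not. Your mileage may vary.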

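The actual sniffing

At this point we have learned how to define a device, prepare it for sniffing, and apply filters for what we should and should not sniff. Now it is time to actually capture some packets. There are two main techniques for capturing packets. We can either capture a single packet at a time, or enter a loop that waits for a number of packets to be sniffed before finishing. We will begin by looking at how to capture a single packet, then look at the loop methods. For this we use pcap_next().
The prototype for pcap_next() is very simple:
u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h)
The first argument is our session handle. The second is a pointer to a structure that holds general information about the packet: the time at which it was sniffed, the length of the packet, and the length of this specific portion (in case it is fragmented, for example). pcap_next() returns a u_char pointer to the packet described by this structure. We will discuss the technique for actually reading the packet itself later.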

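Here is a simple demonstration of using pcap_next() to sniff a packet:
    #include <pcap.h>
    #include <stdio.h>
    int main()
    {
        pcap_t *handle;                 /* Session handle */
        char *dev;                      /* The device to sniff on */
        char errbuf[PCAP_ERRBUF_SIZE];  /* Error string */
        struct bpf_program filter;      /* The compiled filter */
        char filter_app[] = "port 23";  /* The filter expression */
        bpf_u_int32 mask;               /* Our netmask */
        bpf_u_int32 net;                /* Our network number */
        struct pcap_pkthdr header;      /* The header that pcap gives us */
        const u_char *packet;           /* The actual packet */
        /* Define the device */
        dev = pcap_lookupdev(errbuf);
        if (dev == NULL) {
            fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
            return(2);
        }
        /* Find the properties for the device */
        if (pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
            net = 0;
            mask = 0;
        }
        /* Open the session in promiscuous mode */
        handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
        if (handle == NULL) {
            fprintf(stderr, "Couldn't open device %s: %s\n", dev, errbuf);
            return(2);
        }
        /* Compile and apply the filter (the last argument is the netmask) */
        if (pcap_compile(handle, &filter, filter_app, 0, mask) == -1 ||
            pcap_setfilter(handle, &filter) == -1) {
            fprintf(stderr, "Couldn't install filter %s: %s\n", filter_app, pcap_geterr(handle));
            return(2);
        }
        /* Grab a packet */
        packet = pcap_next(handle, &header);
        /* Print its length, if we actually got one */
        if (packet != NULL)
            printf("Jacked a packet with length of [%u]\n", header.len);
        /* And close the session */
        pcap_close(handle);
        return(0);
    }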
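This application sniffs on whatever device is returned by pcap_lookupdev(), putting it into promiscuous mode. It finds the first packet to come across port 23 (telnet) and tells the user the size of the packet (in bytes). Again, this program includes a new call, pcap_close(), which we will discuss later (although its name is really quite self-explanatory).
The other technique we can use is more complicated, and probably more useful. Few sniffers (if any) actually use pcap_next(). More often than not, they use pcap_loop() or pcap_dispatch() (which itself uses pcap_loop()). To understand these two functions, you must understand the idea of a callback function.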

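Callback functions are nothing new, and are very common in many APIs. The concept behind a callback function is fairly simple. Suppose I have a program that is waiting for an event of some kind. For the purpose of this example, let's pretend that my program wants the user to press a key on the keyboard. Every time they press a key, I want to call a function that handles it. The function I am using is a callback function. Every time the user presses a key, my program calls the callback function once. Callbacks are used in pcap, but instead of being called when a user presses a key, they are called when pcap sniffs a packet. The two functions that can be used to define a callback are pcap_loop() and pcap_dispatch(). They are very similar in their usage of callbacks. Both call the callback function every time a packet is sniffed that meets our filter requirements (if a filter exists, of course; if not, all sniffed packets are sent to the callback).

The prototype for pcap_loop() is below:

int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user)

The first argument is our session handle. Following that is an integer that tells pcap_loop() how many packets it should sniff before returning (a negative value means it should sniff until an error occurs). The third argument is the name of the callback function (just its identifier, no parentheses). The last argument is useful in some applications, but is often simply set to NULL. Suppose we have arguments of our own that we wish to send to our callback function, in addition to the arguments that pcap_loop() sends. This is where we do it. Obviously, you must typecast to a u_char pointer to ensure the results arrive correctly; as we will see later, pcap makes use of some very interesting means of passing information in the form of a u_char pointer. After we show an example of how pcap does it, it should be obvious how to do it here. If not, consult your local C reference text, as an explanation of pointers is beyond the scope of this document. pcap_dispatch() is almost identical in usage. The only difference is in how the two handle timeouts (remember how you could set a timeout when you called pcap_open_live()? This is where it comes into play). pcap_loop() ignores the timeout while pcap_dispatch() does not. For a more in-depth discussion of their differences, see the pcap man page.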

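Before we can provide an example of the use of pcap_loop(), we must examine the format of our callback function. We cannot arbitrarily define our callback's prototype; otherwise, pcap_loop() would not know how to use the function. So we use this format as the prototype for our callback function: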
void got_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *packet);
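Let's examine this in more detail. First, you'll notice that the function has a void return type. This is logical, because pcap_loop() wouldn't know how to handle a return value anyway. The first argument corresponds to the last argument of pcap_loop(). Whatever value is passed as the last argument to pcap_loop() is passed to the first argument of our callback function every time the function is called. The second argument is the pcap header, which contains information about when the packet was sniffed, how large it is, etc. The pcap_pkthdr structure is defined in pcap.h as: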
struct pcap_pkthdr {
    struct timeval ts; /* time stamp */
    bpf_u_int32 caplen; /* length of the portion that was captured */
    bpf_u_int32 len; /* length of this packet (off wire) */
};
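These values should be fairly self-explanatory. The last argument is the most interesting of them all, and the most confusing to the average novice pcap programmer. It is another pointer to a u_char, and it points to the first byte of the chunk of data containing the entire packet, as sniffed by pcap_loop().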

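But how do you make use of this variable (named packet in our prototype)? A packet contains many attributes, so as you can imagine, it is not really a string, but actually a collection of structures (for instance, a TCP/IP packet would have an Ethernet header, an IP header, a TCP header, and lastly, the packet's payload). This u_char is the serialized version of these structures. To make any use of it, we must do some interesting typecasting.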

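First, we must have the actual structures defined before we can typecast to them. The following are the structure definitions that I use to describe a TCP/IP packet over Ethernet. All of the definitions I use are taken directly out of the POSIX libraries. Normally I would have simply used the definitions in those libraries, but it has been my experience that the libraries vary slightly from platform to platform, making it complicated to use them quickly. So, for demonstration purposes, we will avoid that mess and simply copy the relevant structures. All of them can be found in include/netinet on your local Unix system. Here are the structures: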
/* Ethernet addresses are 6 bytes */
#ifndef ETHER_ADDR_LEN
#define ETHER_ADDR_LEN 6
#endif
/* Ethernet header */
struct sniff_ethernet {
    u_char ether_dhost[ETHER_ADDR_LEN]; /* destination host address */
    u_char ether_shost[ETHER_ADDR_LEN]; /* source host address */
    u_short ether_type; /* IP? ARP? RARP? etc */
};
/* IP header */
struct sniff_ip {
    #if BYTE_ORDER == LITTLE_ENDIAN
    u_int ip_hl:4, /* header length */
    ip_v:4; /* version */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
    u_int ip_v:4, /* version */
    ip_hl:4; /* header length */
    #endif
    u_char ip_tos; /* type of service */
    u_short ip_len; /* total length */
    u_short ip_id; /* identification */
    u_short ip_off; /* fragment offset field */
    #define IP_RF 0x8000 /* reserved fragment flag */
    #define IP_DF 0x4000 /* dont fragment flag */
    #define IP_MF 0x2000 /* more fragments flag */
    #define IP_OFFMASK 0x1fff /* mask for fragmenting bits */
    u_char ip_ttl; /* time to live */
    u_char ip_p; /* protocol */
    u_short ip_sum; /* checksum */
    struct in_addr ip_src,ip_dst; /* source and destination address */
};
/* TCP header */
typedef u_int tcp_seq; /* sequence numbers are 32 bits wide */
struct sniff_tcp {
    u_short th_sport; /* source port */
    u_short th_dport; /* destination port */
    tcp_seq th_seq; /* sequence number */
    tcp_seq th_ack; /* acknowledgement number */
    #if BYTE_ORDER == LITTLE_ENDIAN
    u_int th_x2:4, /* (unused) */
    th_off:4; /* data offset */
    #endif
    #if BYTE_ORDER == BIG_ENDIAN
    u_int th_off:4, /* data offset */
    th_x2:4; /* (unused) */
    #endif
    u_char th_flags;
    #define TH_FIN 0x01
    #define TH_SYN 0x02
    #define TH_RST 0x04
    #define TH_PUSH 0x08
    #define TH_ACK 0x10
    #define TH_URG 0x20
    #define TH_ECE 0x40
    #define TH_CWR 0x80
    #define TH_FLAGS (TH_FIN|TH_SYN|TH_RST|TH_ACK|TH_URG|TH_ECE|TH_CWR)
    u_short th_win; /* window */
    u_short th_sum; /* checksum */
    u_short th_urp; /* urgent pointer */
};

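Note: On my Slackware Linux 8 box (stock kernel 2.2.19) I found that code using the above structures would not compile. The problem, as it turns out, was in include/features.h, which implements only a POSIX interface unless _BSD_SOURCE is defined. If it is not defined, I have to use a different structure definition for the TCP header. The more universal solution, which does not prevent the code from working on FreeBSD or OpenBSD, is simply to do the following: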
#define  _BSD_SOURCE  1
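prior to including any of your header files. This ensures that a BSD-style API is used. If you would rather not do this, you can instead use an alternate TCP header structure (the original article links to a commented version).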

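So how does all of this relate to pcap and our mysterious u_char? Well, as luck would have it, pcap uses exactly these structures when sniffing packets. It then simply creates a u_char string and stuffs the structures into it. So how can we break it apart? Be prepared to witness one of the most practical uses of pointers (for all of those new C programmers who insist that pointers are useless, I smite you).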

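Again, we're going to assume that we are dealing with a TCP/IP packet over Ethernet. The same technique applies to any packet; the only difference is the structure types that you actually use. So let's begin by declaring the variables we will need to deconstruct the packet u_char:
const struct sniff_ethernet *ethernet; /* The Ethernet header */
const struct sniff_ip *ip; /* The IP header */
const struct sniff_tcp *tcp; /* The TCP header */
const char *payload; /* Packet payload */
/* For readability, we'll make variables for the sizes of each of the structures */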
int size_ethernet = sizeof(struct sniff_ethernet);
int size_ip = sizeof(struct sniff_ip);
int size_tcp = sizeof(struct sniff_tcp);
And now we do our magical typecasting:
ethernet = (const struct sniff_ethernet *)(packet);
ip = (const struct sniff_ip *)(packet + size_ethernet);
tcp = (const struct sniff_tcp *)(packet + size_ethernet + size_ip);
payload = (const char *)(packet + size_ethernet + size_ip + size_tcp);

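How does this work? Consider the layout of the packet u_char in memory. Basically, all that happened when pcap stuffed these structures into a u_char is that all of the data contained within them was put in a string, and that string was sent to our callback. The convenient thing is that, regardless of the values set in these structures, their sizes always remain the same. On my workstation, for instance, a sniff_ethernet structure has a size of 14 bytes, a sniff_ip structure is 20 bytes, and a sniff_tcp structure is also 20 bytes. The u_char pointer is really just a variable containing an address in memory. That is what a pointer is: it points to a location in memory. For the sake of simplicity, let's say that the address this pointer is set to is X. If our three structures are sitting in line, with the first of them (sniff_ethernet) located in memory at address X, then we can easily find the addresses of the other structures. So let's make a chart: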

Variable          Location (in bytes)
sniff_ethernet    X
sniff_ip          X + 14
sniff_tcp         X + 14 + 20
payload           X + 14 + 20 + 20

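The sniff_ethernet structure, being the first in line, is simply at location X. sniff_ip, which follows directly after sniff_ethernet, is at location X plus however much space sniff_ethernet consumes (14 bytes in this example). The remaining addresses follow in the same way.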

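Note: It is important that you never assume these structures will always be the same size. You should always use sizeof() to make sure the sizes are right, because the members of each of these structures can have different sizes on different platforms.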

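At this point, we know how to set up the callback function, call it, and find out the attributes of the packet that has been sniffed. You may be expecting a complete, working packet sniffer by now. Because of the length of the code, I am not going to list it in the body of this document; it is available for download with the original article, so you can test it yourself.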

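Wrapping Up
At this point you should be able to write a sniffer using pcap. You have learned the basic concepts: opening a pcap session, learning its general attributes, sniffing packets, applying filters, and using callbacks. Now it is time to go sniff some packets.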
