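Packet capture has to start at the data link layer; the reason was covered earlier in the discussion of raw sockets.
Most sniffers today are built on libpcap. Detailed documentation is available at www.tcpdump.org.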
#include <stdio.h>
#include <pcap.h>
#include <stdlib.h>
#include <string>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <iostream>
#define SIZE_ETHERNET 14 //length of the Ethernet header in bytes
using std::string;
using std::cout;
using std::endl;
string info;
static u_int count = 0;
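void get_packet(u_char * args, const pcap_pkthdr * header, const u_char * packet){
char from_ip[INET_ADDRSTRLEN], to_ip[INET_ADDRSTRLEN];
//packet holds the raw frame. Capture starts at the data link layer, so the first header is the Ethernet header
//(not always, but most LANs are Ethernet; on another link type, use the matching structure). It is skipped here by its fixed length.
//The filter set in main() selects port 80, the usual web port. HTTP runs over TCP and TCP runs over IP, so the layering to parse is clear.
if(header->caplen < SIZE_ETHERNET + sizeof(struct ip)) return; //frame too short for an IP header
const struct ip * ip_header = (const struct ip *)(packet + SIZE_ETHERNET);
//Skip the Ethernet header to reach the IP header. struct ip is the Linux definition of the IP header; on Windows, use the corresponding structure.
if(ip_header->ip_p != IPPROTO_TCP) return; //"port 80" also matches UDP, and only TCP is parsed here
u_int ip_hdr_len = ip_header->ip_hl * 4; //ip_hl counts 32-bit words; the header is longer than sizeof(struct ip) when IP options are present
if(ip_hdr_len < sizeof(struct ip) || header->caplen < SIZE_ETHERNET + ip_hdr_len + sizeof(struct tcphdr)) return;
const struct tcphdr * tcp_header = (const struct tcphdr *)(packet + SIZE_ETHERNET + ip_hdr_len);
//Skip the Ethernet and IP headers to reach the TCP header. Likewise, struct tcphdr is the Linux definition; on Windows, use the corresponding structure.
inet_ntop(AF_INET, &ip_header->ip_src, from_ip, sizeof(from_ip));
inet_ntop(AF_INET, &ip_header->ip_dst, to_ip, sizeof(to_ip));
//Source and destination IP addresses come from the IP header.
//ip_len is stored in network byte order, hence ntohs (the sample output further down was printed without this conversion, so its lengths look byte-swapped).
printf("from %s:%d to %s:%d, ip packet len = %d\n", from_ip, ntohs(tcp_header->source), to_ip, ntohs(tcp_header->dest), ntohs(ip_header->ip_len));
//Source and destination ports come from the TCP header.
printf("tcp syn = %d, ack = %d\n", tcp_header->syn, tcp_header->ack);
u_int tcp_hdr_len = tcp_header->doff * 4; //doff counts 32-bit words and includes any TCP options
u_int offset = SIZE_ETHERNET + ip_hdr_len + tcp_hdr_len;
if(tcp_hdr_len < sizeof(struct tcphdr) || header->caplen <= offset) return; //no payload captured
//This is the payload carried by TCP. On port 80 that is HTTP, including the HTTP headers. HTTP is an application-layer protocol: the Linux kernel implements TCP and UDP
//but has no structure describing HTTP. HTTP is simple text, though, and the output below shows its content.
//The payload is not NUL-terminated, so the string is built with an explicit length.
string content_str((const char *)(packet + offset), header->caplen - offset);
if(content_str.find("HTTP") != string::npos) { //this check is discussed with the output below
cout << content_str << endl;
}
}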
int main(int argc, char ** argv) {
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t * handle;
struct bpf_program fp;
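char filter_exp[] = "port 80"; //capture filter; the syntax is described in the libpcap documentation. This one matches packets whose source or destination port is 80
if(argc < 2) {
fprintf(stderr, "usage: %s <device>\n", argv[0]);
exit(1);
}
char * dev = argv[1];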
bpf_u_int32 mask;
bpf_u_int32 net;
struct pcap_pkthdr header;
const u_char *packet;
if(pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
fprintf(stderr, "Can't get netmask for device %s: %s\n", dev, errbuf);
net = 0;
mask = 0;
}
printf("net = %u, mask = %u\n", net, mask); //both values are unsigned, hence %u
//dev = pcap_lookupdev(errbuf);
/*if(dev == NULL) {
fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
exit(2);
}*/
/*
pcap_t *pcap_open_live(const char *device, int snaplen, int promisc, int to_ms, char *ebuf)
snaplen is an integer which defines the maximum number of bytes to be captured per packet by pcap.
promisc, when set to true, brings the interface into promiscuous mode.
to_ms is the read timeout in milliseconds (a value of 0 means no timeout).
*/
handle = pcap_open_live(dev, BUFSIZ, 1, 100000, errbuf);
if(!handle) {
fprintf(stderr, "Couldn't open device: %s %s\n", dev, errbuf);
exit(1);
}
if(pcap_compile(handle, &fp, filter_exp, 0, net) == -1) {
fprintf(stderr, "Couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(handle));
exit(2);
}
if(pcap_setfilter(handle, &fp) == -1) {
fprintf(stderr, "Couldn't install filter %s: %s\n", filter_exp, pcap_geterr(handle));
exit(2);
}
//packet = pcap_next(handle, &header);
/* Print its length */
//printf("Jacked a packet with length of [%d]\n", header.len);
//pcap_loop(handle, 4, get_packet, NULL);
/*
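* pcap_dispatch(pcap_t * p, int cnt, pcap_handler callback, u_char * user)
* p is the capture handle returned by pcap_open_live
* cnt tells pcap_dispatch the maximum number of packets to process before returning
* pcap_loop takes the same arguments; a cnt of -1 makes it run until an error occurs or pcap_breakloop() is called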
*/
pcap_loop(handle, -1, get_packet, NULL);
pcap_freecode(&fp); //release the compiled filter program
pcap_close(handle);
exit(0);
}
Read alongside the documentation at www.tcpdump.org, this program is not hard to follow.
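One detail is worth spelling out: both the IP header and the TCP header are variable-length, so the payload offset is best derived from the length fields inside the headers rather than from `sizeof`. The following is a minimal sketch that works on a raw byte buffer. The names `payload_offset` and `kEthernetLen` are hypothetical, and it assumes an untagged Ethernet frame carrying IPv4 and TCP.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of libpcap. Assumes an Ethernet II frame
// (no VLAN tag) that carries IPv4 and TCP.
const std::size_t kEthernetLen = 14;

// Returns the offset of the TCP payload inside the frame, or -1 if the
// frame is too short or a header length field is invalid.
long payload_offset(const std::uint8_t *frame, std::size_t caplen) {
    if (caplen < kEthernetLen + 20) return -1;        // need a minimal IP header
    // Low nibble of the first IP byte: header length in 32-bit words.
    std::size_t ip_len = (frame[kEthernetLen] & 0x0f) * 4u;
    if (ip_len < 20) return -1;
    std::size_t tcp_start = kEthernetLen + ip_len;
    if (caplen < tcp_start + 20) return -1;           // need a minimal TCP header
    // High nibble of TCP byte 12: data offset in 32-bit words.
    std::size_t tcp_len = (frame[tcp_start + 12] >> 4) * 4u;
    if (tcp_len < 20) return -1;
    std::size_t off = tcp_start + tcp_len;
    if (off > caplen) return -1;                      // header runs past the capture
    return static_cast<long>(off);
}
```

For a SYN segment with 8 bytes of TCP options (a 28-byte TCP header) and no IP options, the payload would start at byte 62 rather than byte 54.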
Below is a sample run. I ran ./a.out eth0, opened a browser (starting from a blank page), entered Baidu's address, and captured the following packets.
difa:/program# ./a.out eth0
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 12288
tcp syn = 1, ack = 0
from 119.75.217.56:80 to 192.168.1.154:2660, ip packet len = 12288
tcp syn = 1, ack = 1
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 10240
tcp syn = 0, ack = 1
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 64514
tcp syn = 0, ack = 1
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: BAIDUID=CDAA16F90ADB78EDFBFC5FEB0D4A620C:FG=1; USERID=c9a484d8b976b5a3e70b5ae36d0f6d; J_MY=1; BDUSS=zNMMnM2cUhSLWFKS2FkTGJqQjdnVEpwODcwTFNQOXZvOFN5QkRMclViV0FPVkpNQUFBQUFBJCQAAAAAAAAAAApBCIorNXYFbWVtb3J5bXlhbm4AAAAAAAAAAAAAAAAAAAAAAAAAAACAy2V7AAAAAAAAAAAAAAAAAU9CAAAAAAAxMC42NS4yNICsKkyArCpMZ; OPENPLATFORM_SP=2b356d656d6f72796d79616e6e7605_1277865106
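The program does not exit on its own. The output above is only an excerpt; the rest is omitted because the Chinese text in later packets was displayed as garbled characters.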
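The `ip packet len` values in this output are larger than an Ethernet frame can carry, which suggests that the 16-bit length field was printed in network byte order on a little-endian host, without `ntohs`. As a sketch under that assumption, swapping the two bytes gives plausible sizes. `swap16` is a hypothetical helper, used here instead of `ntohs` so that the result does not depend on the machine running it.

```cpp
#include <cstdint>

// Hypothetical helper: swap the two bytes of a 16-bit value.
// On a little-endian host this is what ntohs() does.
std::uint16_t swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
```

With this, `swap16(12288)` is 48, `swap16(10240)` is 40 and `swap16(64514)` is 764, which is consistent with a SYN carrying TCP options, a bare ACK, and the HTTP request.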
./a.out eth0
eth0 is the network interface. To look up a default capture device instead, libpcap provides an API for that; see the documentation.
net = 108736, mask = 16777215
These are the interface's network address and netmask, printed as raw integers rather than in dotted-decimal notation.
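To make these two numbers readable, they can be passed through `inet_ntop`. `pcap_lookupnet` fills in both values in network byte order, so no byte swapping is needed first. The helper name `to_dotted` is hypothetical, and the sample values in the note assume a little-endian host like the one used for this run.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cstdint>
#include <string>

// Hypothetical helper: format an IPv4 value that is already in network byte
// order (as filled in by pcap_lookupnet) as a dotted-decimal string.
std::string to_dotted(std::uint32_t net_order) {
    struct in_addr addr;
    addr.s_addr = net_order;
    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr, buf, sizeof(buf)) == NULL) return "";
    return buf;
}
```

On such a host, `to_dotted(108736)` gives 192.168.1.0 and `to_dotted(16777215)` gives 255.255.255.0, which matches the 192.168.1.154 address seen in the capture.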
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 12288
tcp syn = 1, ack = 0
from 119.75.217.56:80 to 192.168.1.154:2660, ip packet len = 12288
tcp syn = 1, ack = 1
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 10240
tcp syn = 0, ack = 1
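This is TCP's three-way handshake, the process of establishing a connection. It is clearly visible in the SYN and ACK flags of the TCP header. HTTP is built on top of TCP, so these first three packets are used to set up the connection.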
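The mapping from the two flag bits to the handshake step can be written down directly. This is only an illustrative sketch: `handshake_step` is a hypothetical helper, and it looks at SYN and ACK alone, ignoring sequence numbers and the other TCP flags.

```cpp
#include <string>

// Hypothetical helper: name the handshake step from the SYN and ACK bits.
std::string handshake_step(bool syn, bool ack) {
    if (syn && !ack) return "SYN";      // client asks to open the connection
    if (syn && ack)  return "SYN-ACK";  // server accepts and acknowledges
    if (!syn && ack) return "ACK";      // final handshake step, or later traffic
    return "other";
}
```

Applied to the three packets above, the flag pairs (1, 0), (1, 1) and (0, 1) map to SYN, SYN-ACK and ACK in that order.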
from 192.168.1.154:2660 to 119.75.217.56:80, ip packet len = 64514
tcp syn = 0, ack = 1
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: BAIDUID=CDAA16F90ADB78EDFBFC5FEB0D4A620C:FG=1; USERID=c9a484d8b976b5a3e70b5ae36d0f6d; J_MY=1; BDUSS=zNMMnM2cUhSLWFKS2FkTGJqQjdnVEpwODcwTFNQOXZvOFN5QkRMclViV0FPVkpNQUFBQUFBJCQAAAAAAAAAAApBCIorNXYFbWVtb3J5bXlhbm4AAAAAAAAAAAAAAAAAAAAAAAAAAACAy2V7AAAAAAAAAAAAAAAAAU9CAAAAAAAxMC42NS4yNICsKkyArCpMZ; OPENPLATFORM_SP=2b356d656d6f72796d79616e6e7605_1277865106
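This part is easy to understand: once the connection is established, the browser sends an HTTP request to the server, and the content shown is the HTTP request header.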
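Because HTTP headers are plain text terminated by CRLF, the request line can be pulled out of the captured payload with ordinary string operations. The sketch below assumes the payload pointer and length have already been computed from the packet; `request_line` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the first line of an HTTP message (the
// request line) without the trailing CRLF. Empty if no CRLF is found.
std::string request_line(const char *payload, std::size_t len) {
    std::string data(payload, len);   // length-bounded; payload is not NUL-terminated
    std::string::size_type eol = data.find("\r\n");
    if (eol == std::string::npos) return "";
    return data.substr(0, eol);
}
```

For the request captured above, this would return `GET / HTTP/1.1`.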
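One small note: my program runs in a Linux virtual machine and opens the interface in promiscuous mode, while the browser was Firefox running on Windows. That is why the User-Agent header shows Windows.
Note: packet capture programs generally work from the data link layer, and most operating systems provide system calls for this. libpcap is not part of the operating system. It is a third-party library, and it is cross-platform, so it can be used on both Linux and Windows, provided it is installed.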