jNetPcap Packet Capture Tool: Study Notes (Part 1)

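The Java platform itself does not support low-level network operations, so third-party packages use JNI to wrap each operating system's C libraries and expose a higher-level Java interface. Commonly used libraries include Jpcap and jNetPcap, both of which are Java wrappers around tcpdump/libpcap.
This article works through the tutorial on the jNetPcap website; corrections are welcome.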

Background

Data encapsulation:

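  1. User information is converted into data so that it can be sent over the network (application, presentation, and session layers).
  2. Data is converted into segments, and a reliable connection is established between the sending and receiving hosts (transport layer; data unit: segment).
  3. Segments are converted into packets, or datagrams, and a logical address is placed in the header so that each packet can be routed across the internetwork (network layer; data unit: packet).
  4. Packets or datagrams are converted into frames for transmission on the local network. On the local network segment, a hardware address uniquely identifies each host (data link layer; data unit: frame).
  5. Frames are converted into a bit stream, using a digital encoding and clocking scheme (physical layer; data unit: bits).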

The transport layer breaks the request into TCP segments, adds some sequence numbers and checksums to the data, and then passes the request to the local internet layer. The internet layer fragments the segments into IP datagrams of the necessary size for the local network and passes them to the host-to-network layer for transmission onto the wire. The host-to-network layer encodes the digital data as analog signals appropriate for the particular physical medium and sends the request out the wire where it will be read by the host-to-network layer of the remote system to which it's addressed.
—— Java Network Programming

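The transport layer splits the data into TCP segments, adds a TCP header (sequence number, checksum) to each one, and hands them to the local network layer. The network layer encapsulates the TCP segments in IP datagrams and passes them to the link layer, which delivers them to the receiver's link layer. The receiver's network layer performs basic checks to verify that each IP datagram is intact, reassembles the datagram if it was fragmented, and passes it up to the receiver's transport layer.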

TCP segment format:

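TCP is a connection-oriented transport protocol in which the receiver acknowledges the data it receives back to the sender. The sequence number and acknowledgment number in the header are the core of the mechanism that makes delivery reliable. The sequence number gives the position of a segment's first data byte in the sender's byte stream, so a receiver that sees contiguous sequence numbers knows that no data has been lost. In its acknowledgment, the receiver fills the acknowledgment number field with the sequence number of the next byte it expects to receive.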
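
The relationship between the two fields can be sketched in a few lines of Java. This is only an illustration, not part of the jNetPcap tutorial: the class and method names are made up, and it assumes a plain data segment received in order (SYN and FIN flags, which each consume one sequence number, are ignored).

```java
/** Illustrative only: how a receiver derives the acknowledgment number. */
final class TcpAckExample {

    /**
     * Returns the acknowledgment number for a data segment received in order.
     * Sequence numbers count bytes and wrap around at 2^32.
     */
    static long nextAck(long sequenceNumber, int payloadLength) {
        return (sequenceNumber + payloadLength) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // A segment starting at byte 1000 and carrying 500 bytes is
        // acknowledged with 1500, the next byte the receiver expects.
        System.out.println(nextAck(1000L, 500));
    }
}
```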

IP datagram format:
[Figure 1: IP datagram header format]

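Here we focus on the fragmentation mechanism. When a finished IP datagram is handed to the data link layer to be framed, it may be fragmented to fit the link's MTU. Each fragment carries its own IP header, and the total length, identification, flags (MF, DF), and fragment offset fields in those headers are set so that the fragments can be put back together after delivery. As an example, take an IP datagram of 3820 bytes, 20 of which are header, and a link MTU of 1420 bytes. How is the datagram fragmented, and how do the header fields of each fragment change?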

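  • Fragmentation: the data portion is split according to the MTU into 1400 bytes, 1400 bytes, and 1000 bytes.
  • Total length: before fragmentation, the total length in the IP header is 3820. Afterwards, the three fragments have total lengths of 1420, 1420, and 1020 bytes.
  • Identification: if the value was 0xFF00 before fragmentation, all three fragments must carry the same value.
  • Flags: MF (MORE_FRAGMENTS) is false before fragmentation; afterwards it is true for the first two fragments and false for the last.
    DF (DON'T_FRAGMENT) is false before fragmentation, because a datagram with DF set may not be fragmented, and it stays false in all three fragments.
  • Fragment offset: always 0 before fragmentation; afterwards the three fragments have offsets 0, 1400/8 = 175, and 2800/8 = 350, since the field counts units of 8 bytes.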
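
The arithmetic in this example can be reproduced with a short Java sketch. The class, its fields, and the method name are hypothetical and have nothing to do with the jNetPcap API; the sketch also assumes a header without options and that DF is not set.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: recomputes the header fields of each IPv4 fragment. */
final class FragmentationExample {

    /** The header fields of one fragment that differ from the original datagram. */
    static final class Fragment {
        final int totalLength;        // header plus payload, in bytes
        final boolean moreFragments;  // the MF flag
        final int offsetUnits;        // fragment offset field, in 8-byte units

        Fragment(int totalLength, boolean moreFragments, int offsetUnits) {
            this.totalLength = totalLength;
            this.moreFragments = moreFragments;
            this.offsetUnits = offsetUnits;
        }
    }

    static List<Fragment> fragment(int datagramLength, int headerLength, int mtu) {
        int payload = datagramLength - headerLength;
        // Every fragment except the last must carry a multiple of 8 payload bytes.
        int maxPayload = ((mtu - headerLength) / 8) * 8;
        List<Fragment> result = new ArrayList<>();
        for (int sent = 0; sent < payload; sent += maxPayload) {
            int size = Math.min(maxPayload, payload - sent);
            boolean more = sent + size < payload;
            result.add(new Fragment(headerLength + size, more, sent / 8));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Fragment f : fragment(3820, 20, 1420)) {
            System.out.println("totalLength=" + f.totalLength
                    + " MF=" + f.moreFragments + " offset=" + f.offsetUnits);
        }
    }
}
```

For the datagram above this prints total lengths of 1420, 1420, and 1020 with offsets 0, 175, and 350, and MF set on the first two fragments only.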

Related reading: http://blog.csdn.net/kernel_jim_wu/article/details/7447377

Clarifying the terminology

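Note: the terms segment, fragment, packet, and datagram are very confusing. Drawing on my own understanding of networking, I have adjusted the wording in many places according to context, and I have added some material that is not in the original but that I consider necessary. Here is what I mean by fragment, packet, and datagram in my translation.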

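  • segment: the data unit of the TCP protocol at the transport layer.
  • datagram: the data unit of the UDP protocol at the transport layer. In "IP datagram", however, datagram means an IP packet and is equivalent to packet.
  • fragment: this requires understanding the fragmentation mechanism first. IP datagrams vary in length. When a datagram is longer than the MTU of the data link layer (1500 bytes on Ethernet, for example), it has to be split into fragments. The receiver's network layer then uses the fragment offsets to put the fragments back in their original order and restore the datagram.
  • packet: the protocol data unit of the network layer. In this article, packet refers to a unit that the peer sent after fragmentation and that arrives at the local network layer, while datagram refers to the complete unit that the local side reassembles from the received packets.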

These are my own views. I also attach an explanation found online, and I welcome discussion of this question in the comments.

[Figure 2: the explanation from the web referred to above]

Working through the official tutorial

We need to handle the incoming stream of packets. So the first thing we need to setup is a packet handler that will receive packets from libpcap. We’re not going to be concerned with multi-threading issues in this tutorial. So to receive packets our main application class will simply implement the PcapPacketHandler interface. Once we have the packets we will need to check if the packet is Ip4 packet and if its fragmented or not.
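We need to handle the incoming packets. The first step is to set up a packet handler that receives packets from libpcap. Multi-threading is not considered here. Our application class receives packets by implementing the PcapPacketHandler interface, and it checks whether each packet is an IPv4 packet and whether it is fragmented.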

For all Ip4 packets, fragmented or not, we are going to stuff them into a reassembly buffer that we are going to use for IP datagram, fragmented or not.
Every IPv4 packet, fragmented or not, is stored in a reassembly buffer from which the IP datagram is later built.

The Ip4 flag NO_MORE_FRAGMENTS is going to give us a clue about when the fragment is complete, but we can’t always rely on that flag. Fragments can arrive out of sequence or even be dropped along the way and never arrive. So we are also going to keep track of how many bytes we have reassembled. When that total matches the length of the entire unfragmented datagram, then we know we have received all fragments and we are done.
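The NO_MORE_FRAGMENTS flag of an IPv4 packet hints that the fragments have ended, but we cannot rely on it alone: fragments may arrive out of order or be lost in transit. We therefore also track the total number of bytes reassembled so far. When that total equals the length of the whole datagram, all fragments have been received.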
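
The byte-counting idea can be sketched as follows. This is a simplified illustration rather than the tutorial's code: the class and method names are invented, offsets are given in bytes, and the sketch assumes that no fragment is duplicated or overlaps another (either would make a plain byte count unreliable).

```java
/** Illustrative only: decides completeness by counting reassembled bytes. */
final class FragmentTracker {
    private int bytesReceived;        // payload bytes stored so far
    private int expectedLength = -1;  // unknown until the last fragment arrives

    /** Records one fragment; offsetBytes is the fragment offset times 8. */
    void add(int offsetBytes, int length, boolean moreFragments) {
        bytesReceived += length;
        if (!moreFragments) {
            // Only the last fragment reveals the length of the whole payload.
            expectedLength = offsetBytes + length;
        }
    }

    boolean isComplete() {
        return expectedLength >= 0 && bytesReceived == expectedLength;
    }

    public static void main(String[] args) {
        FragmentTracker tracker = new FragmentTracker();
        tracker.add(2800, 1000, false);  // the last fragment arrives first
        System.out.println(tracker.isComplete()); // false: 2800 bytes still missing
        tracker.add(0, 1400, true);
        tracker.add(1400, 1400, true);
        System.out.println(tracker.isComplete()); // true: 3800 of 3800 bytes
    }
}
```

The example feeds the fragments out of order on purpose: the last fragment alone is not enough, which is why the flag cannot be the only completion test.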

For those cases where fragment is dropped and never arrives, we are also going to implement a simple timeout mechanism that will timeout each reassembly buffer past certain amount of time.
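For fragments that are lost in transit and never arrive, we implement a simple timeout mechanism that expires each reassembly buffer once a certain amount of time has passed.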

Pseudocode for reassembling IP fragments on the receiving side:

loop {
  Receive packet from libpcap;
  if packet is Ip4 packet then
    get or create reassembly buffer and store in a map;

    calculate offset into the buffer and add fragment

    if the packet is complete then
      remove buffer from map;
      dispatch buffer to user's handler;
    endif

  endif

  timeout buffer entries;
}

User handler {
  receive reassembly buffers;
  create a new IP only packet;
  scan the packet;
  call packet.toString() to get pretty output;
}

1. Reassembly buffer handler

This is a very important piece of our application therefore we need to plan it out in detail. We’re calling this buffer IpReassemblyBuffer and it extends a JBuffer.
We create a class, IpReassemblyBuffer, to manage the reassembly buffer; it extends JBuffer.

We are going to allocate a large JBuffer which will hold our ip header and all the fragments combined. Like so:
The buffer is allocated large enough to hold the IP header and all the fragments.

The buffer is also going to keep track of timeouts. We’re going to set a time value at which time the buffer becomes officially timed out. We will implement a simple isTimedout():boolean method to check for that condition. The method simply compares the timeout timestamp with the current time and if its past its due date, return true.
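The buffer also keeps track of its timeout: the constructor of IpReassemblyBuffer stores the timestamp at which the buffer expires, and the isTimedout():boolean method checks whether that moment has passed. The timeout requires that all fragments needed to restore the complete IP datagram arrive within a set period. Once the current time is past the timestamp, the buffer is considered timed out and the method returns true.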

The buffer needs to keep track of number of bytes already assembled and the total length of the IP datagram. When the 2 are equal, that means the buffer is complete and we can dispatch it to the user. We’re also going to implement as boolean method that checks for this condition isComplete().
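A buffer is a dynamic thing: think of it as an IP datagram that is still being assembled. It tracks the number of bytes assembled so far and the total length of the IP datagram. When the two are equal, reassembly is complete and the buffer is dispatched to the user's handler (the network layer hands the data up to the transport layer). The isComplete():boolean method checks this condition.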

To keep track of all the buffers, we’re going to use a JRE Map and use a 32 bit int hash we generate from ip fragments ip header using fields, Ip4.id(), Ip4.source(), Ip4.destination(), Ip4.protocol like so:
There is more than one buffer, so we keep them in a HashMap, using a 32-bit hash code generated from fields of the IP header as the map key.

int hash = (id() << 16) ^ source() ^ destination() ^ type();
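
To see what this expression produces, here is a standalone sketch that takes the four header fields as plain int parameters instead of reading them from an Ip4 header. The class name and the sample addresses are made up for illustration. It also shows one limitation worth knowing: XOR is symmetric, so swapping source and destination gives the same hash, and two different flows can therefore collide.

```java
/** Illustrative only: the tutorial's hash expression on plain int values. */
final class ReassemblyKeyExample {

    static int hash(int id, int source, int destination, int protocol) {
        return (id << 16) ^ source ^ destination ^ protocol;
    }

    public static void main(String[] args) {
        int a = 0xC0A80001; // 192.168.0.1
        int b = 0xC0A80002; // 192.168.0.2
        // id 0xFF00, UDP (protocol 17)
        System.out.println(Integer.toHexString(hash(0xFF00, a, b, 17)));
        // The reverse direction yields the same key.
        System.out.println(hash(0xFF00, a, b, 17) == hash(0xFF00, b, a, 17));
    }
}
```

If such collisions matter, a map key that compares all four fields for equality would be a safer choice than the bare hash; that is a design suggestion, not something the tutorial does.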

We’re also going to use a PriorityQueue that will prioritize buffers for us based on the timeout timestamp value. Buffers will be ordered according to timeout value. The packets on top of the queue are going to be either timedout or closer to timeout than any other buffer on the queue. This is going to let us efficiently check packets on the queue, until we reach a packet that is not timedout, at which time we can stop.
A priority queue orders the buffers held in the map by their timeout timestamps. Buffers that have timed out or are closest to timing out sit at the front, so we call isTimedout() on them in turn and stop as soon as we reach the first buffer that has not timed out.
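
A minimal sketch of this map-plus-queue arrangement, using only the JDK collections. The Buffer class here is a stand-in with just a key and a deadline, not the tutorial's IpReassemblyBuffer, and the method names track, expire, and pending are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative only: expiring reassembly buffers in deadline order. */
final class TimeoutQueueExample {

    /** Stand-in for a reassembly buffer: a map key and an absolute deadline. */
    static final class Buffer {
        final int key;         // hash of the IP header fields
        final long timeoutAt;  // deadline in milliseconds

        Buffer(int key, long timeoutAt) {
            this.key = key;
            this.timeoutAt = timeoutAt;
        }

        boolean isTimedout(long now) {
            return now >= timeoutAt;
        }
    }

    private final Map<Integer, Buffer> buffers = new HashMap<>();
    private final PriorityQueue<Buffer> timeouts =
            new PriorityQueue<>(Comparator.comparingLong((Buffer b) -> b.timeoutAt));

    void track(Buffer buffer) {
        buffers.put(buffer.key, buffer);
        timeouts.add(buffer);
    }

    /** Removes and returns every buffer whose deadline has passed. */
    List<Buffer> expire(long now) {
        List<Buffer> expired = new ArrayList<>();
        // The earliest deadline is always at the head, so we can stop at the
        // first buffer that is still alive.
        while (!timeouts.isEmpty() && timeouts.peek().isTimedout(now)) {
            Buffer buffer = timeouts.poll();
            buffers.remove(buffer.key);
            expired.add(buffer);
        }
        return expired;
    }

    int pending() {
        return buffers.size();
    }

    public static void main(String[] args) {
        TimeoutQueueExample example = new TimeoutQueueExample();
        example.track(new Buffer(1, 300L));
        example.track(new Buffer(2, 100L));
        example.track(new Buffer(3, 200L));
        System.out.println(example.expire(150L).size()); // only buffer 2 has expired
        System.out.println(example.pending());
    }
}
```

A real implementation would also remove a buffer from the queue when it completes normally; that step is left out here to keep the sketch short.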

The first fragment that we see is the one that creates the buffer for that Ip datagram. At the time of the construction of the buffer, we’re going to use the ip header of that fragment as a template for the IP header we need to insert in front of all the fragments in the buffer. We also need to reset a few fields in the header to match the new packet that we are creating out of the fragments. We need to either recalculate or reset the header crc, clear the MORE_FRAGMENTS flag, drop any optional headers by resetting the hlen field to 5 and also set the total length field to the new length of our IP datagram.
The first fragment: the first fragment we see creates the buffer, and its IP header serves as the template for the header of the reassembled datagram. This happens in the buffer's constructor. The header is placed in front of all the fragment data, and several of its fields are modified to match the reassembled datagram: the CRC checksum is recalculated or reset, the MORE_FRAGMENTS flag is cleared, optional headers are dropped by setting hlen to 5, the total length is set to the new datagram length, and the fragment offset is adjusted as well.
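
To make these header changes concrete, the sketch below applies them to a raw 20-byte IPv4 header held in a byte array. It does not use jNetPcap's Ip4 class; the class and method names are hypothetical, the field positions follow the standard IPv4 header layout, and keeping the DF bit is my own choice.

```java
import java.util.Arrays;

/** Illustrative only: turning a fragment's IPv4 header into a datagram header. */
final class HeaderTemplateExample {

    /** Standard Internet checksum over the first {@code length} bytes (even length). */
    static int checksum(byte[] header, int length) {
        int sum = 0;
        for (int i = 0; i < length; i += 2) {
            sum += ((header[i] & 0xFF) << 8) | (header[i + 1] & 0xFF);
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // fold the carries back in
        }
        return ~sum & 0xFFFF;
    }

    static byte[] rewrite(byte[] fragmentHeader, int payloadLength) {
        byte[] h = Arrays.copyOf(fragmentHeader, 20); // drop any options
        h[0] = 0x45;                                  // version 4, hlen 5
        int total = 20 + payloadLength;
        h[2] = (byte) (total >> 8);                   // total length, high byte
        h[3] = (byte) total;                          // total length, low byte
        h[6] &= 0x40;                                 // keep DF, clear MF and offset bits
        h[7] = 0;                                     // clear the rest of the offset
        h[10] = 0;                                    // zero the checksum field
        h[11] = 0;
        int crc = checksum(h, 20);
        h[10] = (byte) (crc >> 8);
        h[11] = (byte) crc;
        return h;
    }

    public static void main(String[] args) {
        byte[] fragment = {
            0x45, 0x00, 0x05, (byte) 0x8C, 0x00, 0x00, 0x20, (byte) 0xAF,
            0x40, 0x11, 0x00, 0x00, (byte) 0xC0, (byte) 0xA8, 0x00, 0x01,
            (byte) 0xC0, (byte) 0xA8, 0x00, (byte) 0xC7
        };
        byte[] h = rewrite(fragment, 95);
        // A header with a correct checksum sums to zero when re-checked.
        System.out.println(checksum(h, 20) == 0);
    }
}
```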

The buffer will never be complete unless we receive that last fragment. That last fragment is crucial since it tells us the length of the original IP datagram. If all the fragment arrive in sequence then the last fragment also means that reassembly is complete and we can dispatch to user. Although we could receive fragments out of sequence and still receive a fragment after the last one has been received. Another important thing we need to set, is to change the size of the buffer to match that of the entire datagram. The buffer’s physical size is 8K, our datagrams are probably going to be smaller than that, so there will be some unused space at the end, but the buffer will be strictly bounded to datagram data.
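The last fragment: a buffer can never be complete until the last fragment has arrived, because that fragment tells us the length of the original IP datagram. If the fragments arrive in order, the last fragment also completes reassembly. If they arrive out of order, other fragments may still follow it, and completion is determined from the datagram length and the bytes received. On receiving the last fragment we also set the buffer's size to that of the whole datagram: the buffer is physically 8K while IP datagrams are usually smaller, so some space at the end of the buffer stays unused.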

So in summary. We have a buffer Map and a timeout Queue. The Map keeps track of reassembly buffers for us based on a special hashcode, while the timeout queue uses the priority queue mechanism to sort our buffers and keep buffers that have timed out at the top.
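In summary, the application reassembles IP datagrams with a HashMap and a timeout queue. The map stores the buffer that belongs to each hash code, and the timeout queue orders the buffers by timestamp so that the timed-out ones come first.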

2. The user handler

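Note: the user here can be thought of as the application-layer processes attached to the transport layer's logical ports. The user handler simulates the dispatching done by the transport layer.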

The user handler is going to receive ip reassembly buffers. These buffers may or may not be complete, but they will always have at least an ip header and 1 fragment.
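The user handler receives the reassembled IP datagram buffers. A buffer may be incomplete (meaning that fragments were lost), but it always holds at least an IP header and one fragment.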

We will check if the buffer is complete and report an error message if its not. Otherwise we will just create a packet out of it.
isComplete():boolean is used to check whether the datagram was reassembled successfully; if it was not, an error message is reported.

There is no need to copy the data out of the buffer, it already contains everything we need. It is freshly allocated so it's ours to do with as we please. It has an Ip4 header at the start and then all of the reassembled fragments already copied into it.
The IP datagram does not need to be copied again: if reassembly succeeded, the buffer holds exactly the complete IP datagram, and the user handler works directly on the data in the buffer.

We are simply going to peer a JMemoryPacket with our buffer. Then we are going to run a scan on the packet to decode it, telling the scanner that the first header is Ip4.
By peering a JMemoryPacket with the buffer, we can analyze the IP datagram in the buffer at the packet level. Telling the scanner that the first header is Ip4 is enough for it to scan and decode the packet.

3. The packet handler

All it needs to do is implement the PcapPacketHandler interface. We are also going to use a static main method that will setup the libpcap portion and register our application as listener to libpcap packets.
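The class implements the PcapPacketHandler interface to process incoming packets. The static main method sets up the libpcap portion and registers the application as the listener for libpcap packets.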

import org.jnetpcap.Pcap;
import org.jnetpcap.packet.PcapPacketHandler;

public class IpReassemblyExample implements PcapPacketHandler<Object> {

  public static void main(String[] args) {
    StringBuilder errbuf = new StringBuilder(); // receives libpcap error messages
    // Open a capture file
    Pcap pcap = Pcap.openOffline("tests/test-ipreassembly2.pcap", errbuf);
    if (pcap == null) {
      System.err.println(errbuf.toString());
      return;
    }
    // Enter the dispatch loop
    pcap.loop(
        6, // Collect 6 packets
        new IpReassemblyExample(
            5 * 1000, // 5 second timeout for reassembly buffer
            new IpReassemblyBufferHandler() {/* Omitted for now */}),
        ""); // user object handed back to nextPacket()
  }
}

Here we are using the static main method as a start for our application. We open up a capture file and enter a dispatch loop. We’re only collecting 6 packets and we are registering our application as the PcapPacketHandler. Our constructor takes an IpReassemblyBufferHandler that will be notified with reassembled buffers. We create an anonymous class for that since we only do very minor work in its handler callback method.
The static main method opens a capture file and enters a dispatch loop that reads 6 packets (which I take to be the fragments here), handing each one to our application, which is registered as the PcapPacketHandler. The constructor takes an IpReassemblyBufferHandler, created as an anonymous inner class, which is notified of each reassembled buffer.

Let us look at the pcap.loop call:

 Pcap.loop(6, new IpReassemblyExample(5 * 1000, new IpReassemblyBufferHandler() {...}), "");  

JavaDoc:

org.jnetpcap.Pcap.loop(int cnt, PcapPacketHandler<T> handler, T user);
@cnt         number of packets to process before the loop returns
@handler     the PcapPacketHandler that the loop dispatches each packet to
@user        a user-supplied object handed through to the handler; the example passes an empty string

Constructor of the packet handler:

IpReassemblyExample(int timeOut, IpReassemblyBufferHandler handler){
...
}
@timeOut    timeout for each reassembly buffer (the example passes 5 * 1000 for 5 seconds)
@handler    the IpReassemblyBufferHandler that is notified with reassembled buffers

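Every received IP packet is processed by the nextPacket method:
1. Check whether the packet has an IPv4 header.
2. Check the MORE_FRAGMENTS flag to see whether more fragments follow.
2.1 bufferFragment() adds a fragment that is followed by others to the reassembly buffer.
2.2 bufferLastFragment() handles the last fragment and records the length of the whole IP datagram.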

public class IpReassemblyExample implements
PcapPacketHandler<Object> {

  private Ip4 ip = new Ip4(); // Ip4 header  

  public void nextPacket(PcapPacket packet, Object user) {  

    if (packet.hasHeader(ip)) {  
      final int flags = ip.flags();  

      /* 
       * Check if we have an IP fragment 
       */  
      if ((flags & Ip4.FLAG_MORE_FRAGEMNTS) != 0) {  
        bufferFragment(packet, ip);  

        /* 
         * record the last fragment 
         */  
      } else {  
        bufferLastFragment(packet, ip);  
      }  

      /* 
       * Our crude timeout mechanism, should be implemented as a separate thread 
       */  
      timeoutBuffers();  
    }  
  }  
}  

We process each Ip4 packet a little differently depending on whether the Ip4.FLAG_MORE_FRAGEMNTS flag is set. If it is not set that means it is the last fragment, otherwise we received a fragment from inside the datagram. If a packet is not fragmented at all, it only contains a single fragment and is always the last fragment and we treat it as a last fragment.
The MORE_FRAGMENTS flag tells us whether a received packet is the last fragment. A packet that was never fragmented also has the flag cleared, so it is treated as a datagram made up of a single, last fragment.

We use two methods, bufferFragment() and bufferLastFragment(), to record the fragments in the reassembly buffer. The bufferLastFragment() is a little bit special in that it records the length of the entire ip datagram we are reassembling and if all the fragments arrived in sequence it also means we’re done with this buffer.
bufferFragment() adds fragments that are followed by others to the reassembly buffer, while bufferLastFragment() records the length of the whole IP datagram.
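
The tutorial's handler only looks at the MORE_FRAGMENTS flag. Combining the flag with the fragment offset distinguishes all four cases, as the following sketch shows. It takes the two header values as plain parameters rather than reading them from an Ip4 header, and the class, enum, and method names are made up for illustration.

```java
/** Illustrative only: classifying an IPv4 packet by MF flag and fragment offset. */
final class FragmentKindExample {

    enum Kind { UNFRAGMENTED, FIRST, MIDDLE, LAST }

    static Kind classify(boolean moreFragments, int fragmentOffset) {
        if (moreFragments) {
            // More data follows: this is the first or a middle fragment.
            return fragmentOffset == 0 ? Kind.FIRST : Kind.MIDDLE;
        }
        // Nothing follows: either the datagram was never fragmented,
        // or this is the final piece of a fragmented one.
        return fragmentOffset == 0 ? Kind.UNFRAGMENTED : Kind.LAST;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, 0));    // FIRST
        System.out.println(classify(true, 175));  // MIDDLE
        System.out.println(classify(false, 350)); // LAST
        System.out.println(classify(false, 0));   // UNFRAGMENTED
    }
}
```

In the tutorial's terms, FIRST and MIDDLE would go to bufferFragment(), while LAST and UNFRAGMENTED would go to bufferLastFragment().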
