netfilter's handling of child connections

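The copyleft of this document belongs to yfydz. It is released under the GPL and may be freely copied and redistributed, provided that the document is kept intact; commercial use of any kind is strictly prohibited.
msn: [email protected]
Source: http://yfydz.cublog.cn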

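1. Introduction
In multi-connection protocols such as FTP and H.323, the master connection carries control information while the actual data is exchanged over child connections. A child connection's port is usually opened dynamically and closed once the transfer finishes, and its parameters (protocol, addresses, ports) are carried in the master connection's data. A firewall therefore has to recognize the information in the master connection that describes the child connection, and dynamically open the corresponding port for the future child connection. netfilter calls such a future child connection an expected connection (an expectation). Once the child connection's packets arrive and the connection is set up, it behaves like an ordinary connection, except that its state is reported as RELATED; the expectation is then said to be confirmed.
The Linux kernel source discussed here is version 2.4.26.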

2. Data structures and functions

2.1 The expectation list

All expectations are kept in a single linked list; no hash table is used:
LIST_HEAD(ip_conntrack_expect_list);  // net/ipv4/netfilter/ip_conntrack_core.c

2.2 The expectation structure

struct ip_conntrack_expect
{
 /* Internal linked list (global expectation list) */
 struct list_head list;
 /* reference count */
 atomic_t use;
 /* expectation list for this master */
 struct list_head expected_list;
 /* The conntrack of the master connection */
 struct ip_conntrack *expectant;
 /* The conntrack of the sibling connection, set after
  * expectation arrived */
 struct ip_conntrack *sibling;
 /* Tuple saved for conntrack */
 struct ip_conntrack_tuple ct_tuple;
 /* Timer function; deletes the expectation. */
 struct timer_list timeout;
 /* Data filled out by the conntrack helpers follow: */
 /* We expect this tuple, with the following mask */
 struct ip_conntrack_tuple tuple, mask;
 /* Function to call after setup and insertion */
 int (*expectfn)(struct ip_conntrack *new);
 /* At which sequence number did this expectation occur */
 u_int32_t seq;
 
 union ip_conntrack_expect_proto proto;
 union ip_conntrack_expect_help help;
};
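Member descriptions:
struct list_head list: node in the global expectation list, ip_conntrack_expect_list
atomic_t use: reference count
struct list_head expected_list: node in the master connection's list of expectations (its sibling_list)
struct ip_conntrack *expectant: the master connection
struct ip_conntrack *sibling: the actual child connection, set once it has arrived
struct ip_conntrack_tuple ct_tuple: the expectation's tuple as saved for conntrack; ip_conntrack_change_expect() copies the original tuple here before NAT replaces it
struct timer_list timeout: the expectation's timer
The members above are managed by the conntrack core; the members below are filled in by the conntrack helper of the multi-connection protocol:
struct ip_conntrack_tuple tuple, mask: the tuple of the expected child connection; mask selects which fields must match. Typically only the child's destination port is known and the source port is arbitrary, so the source port is masked out;
int (*expectfn)(struct ip_conntrack *new): called after the child connection has been set up, for processing inside the child connection such as handling the child's own child connections (e.g. H.323); set to NULL when there is only one level of child connections;
u_int32_t seq: the TCP sequence number in the master connection at which the description of the expected child connection starts (TCP only)
union ip_conntrack_expect_proto proto: protocol-specific information about the child connection
union ip_conntrack_expect_help help: helper-specific information about the child connection, such as the sequence number and length of the text in the master connection that describes it, kept for later use when NAT rewrites that text;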
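
How tuple and mask work together can be sketched in user space. The types and names below (struct tuple, tuple_mask_cmp, demo_match) are simplified stand-ins chosen for illustration, not the kernel's definitions. The idea, in the spirit of expect_cmp(), is that a field takes part in the comparison only where the mask has bits set, so a zero source-port mask makes the source port a wildcard.

```c
#include <stdint.h>

/* Simplified stand-in for struct ip_conntrack_tuple (illustration only). */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protonum;
};

/* Return 1 if t equals expected on every bit selected by mask. */
int tuple_mask_cmp(const struct tuple *t,
                   const struct tuple *expected,
                   const struct tuple *mask)
{
    return !(((t->src_ip ^ expected->src_ip) & mask->src_ip)
          || ((t->dst_ip ^ expected->dst_ip) & mask->dst_ip)
          || ((t->src_port ^ expected->src_port) & mask->src_port)
          || ((t->dst_port ^ expected->dst_port) & mask->dst_port)
          || ((t->protonum ^ expected->protonum) & mask->protonum));
}

/* Hypothetical FTP-style expectation: 10.0.0.1 -> 10.0.0.2:20000/tcp,
 * source port left as a wildcard (mask 0). */
int demo_match(uint16_t src_port, uint16_t dst_port)
{
    struct tuple expected = { 0x0a000001u, 0x0a000002u, 0, 20000, 6 };
    struct tuple mask     = { 0xffffffffu, 0xffffffffu, 0, 0xffff, 0xff };
    struct tuple pkt      = { 0x0a000001u, 0x0a000002u, src_port, dst_port, 6 };

    return tuple_mask_cmp(&pkt, &expected, &mask);
}
```

With this mask, any source port matches as long as the addresses, destination port and protocol agree, which mirrors the FTP mask built in ip_conntrack_ftp.c.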

2.3 Functions that operate on expectations

Check whether a new connection matches an expectation:
static inline int expect_cmp(const struct ip_conntrack_expect *i,
        const struct ip_conntrack_tuple *tuple)

Free an expectation:
static void
destroy_expect(struct ip_conntrack_expect *exp)

Drop a reference to an expectation and free it once it is no longer in use:
inline void ip_conntrack_expect_put(struct ip_conntrack_expect *exp)

Look up an expectation by tuple:
struct ip_conntrack_expect *
ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple)
static inline struct ip_conntrack_expect *
__ip_ct_expect_find(const struct ip_conntrack_tuple *tuple)
 
/* remove one specific expecatation from all lists, drop refcount
 * and expire timer.
 * This function can _NOT_ be called for confirmed expects! */
static void unexpect_related(struct ip_conntrack_expect *expect)
/* remove one specific expectation from all lists and drop refcount,
 * does _NOT_ delete the timer. */
static void __unexpect_related(struct ip_conntrack_expect *expect)
Remove all unconfirmed expectations that belong to a master connection:
static void remove_expectations(struct ip_conntrack *ct, int drop_refcount)
Create an expectation related to a master connection:
int ip_conntrack_expect_related(struct ip_conntrack *related_to,
    struct ip_conntrack_expect *expect)
 
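3. Processing flow

3.1 Basic flow

When a master-connection packet that carries child-connection information enters netfilter, connection tracking runs first. ip_conntrack_in() checks whether the packet belongs to a multi-connection protocol that has a conntrack helper; if so, it calls the helper, which creates the expectation. If NAT is required, do_bindings() in the NAT code calls the NAT helper, which rewrites the address and port information in the master connection's payload and updates the expectation accordingly. If the child connection's packets arrive before the expectation times out, a connection entry is created as for any ordinary connection and its state is marked RELATED; under NAT, the expect() function of the protocol's NAT helper sets up the NAT information for the child connection itself.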
 
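3.2 Parsing the expected child connection

Expectations are created by the conntrack module of each multi-connection protocol, that is, in each helper's help() function, which is called from ip_conntrack_in():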
/* net/ipv4/netfilter/ip_conntrack_core.c */
...
  ret = ct->helper->help((*pskb)->nh.iph, (*pskb)->len,
           ct, ctinfo);
...

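help() parses the application-layer payload to derive the child connection's parameters. For FTP, this means parsing the PORT and EPRT commands and the 227 and 229 replies on the master connection to obtain the child connection's address and port, and then filling in a struct ip_conntrack_expect: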
/* net/ipv4/netfilter/ip_conntrack_ftp.c */
......
 memset(&expect, 0, sizeof(expect));
 /* Update the ftp info */
 LOCK_BH(&ip_ftp_lock);
 if (htonl((array[0] << 24) | (array[1] << 16) | (array[2] << 8) | array[3])
     == ct->tuplehash[dir].tuple.src.ip) {
  exp->seq = ntohl(tcph->seq) + matchoff;
  exp_ftp_info->len = matchlen;
  exp_ftp_info->ftptype = search[i].ftptype;
  exp_ftp_info->port = array[4] << 8 | array[5];
 } else {
  /* Enrico Scholz's passive FTP to partially RNAT'd ftp
     server: it really wants us to connect to a
     different IP address.  Simply don't record it for
     NAT. */
  DEBUGP("conntrack_ftp: NOT RECORDING: %u,%u,%u,%u != %u.%u.%u.%u\n",
         array[0], array[1], array[2], array[3],
         NIPQUAD(ct->tuplehash[dir].tuple.src.ip));
  /* Thanks to Cristiano Lincoln Mattos
     <[email protected]> for reporting this potential
     problem (DMZ machines opening holes to internal
     networks, or the packet filter itself). */
  if (!loose) goto out;
 }
// Tuple and mask of the child connection, i.e. the address and port
// decoded from the payload
 exp->tuple = ((struct ip_conntrack_tuple)
  { { ct->tuplehash[!dir].tuple.src.ip,
      { 0 } },
    { htonl((array[0] << 24) | (array[1] << 16)
     | (array[2] << 8) | array[3]),
      { .tcp = { htons(array[4] << 8 | array[5]) } },
      IPPROTO_TCP }});
 exp->mask = ((struct ip_conntrack_tuple)
  { { 0xFFFFFFFF, { 0 } },
    { 0xFFFFFFFF, { .tcp = { 0xFFFF } }, 0xFFFF }});
 exp->expectfn = NULL;
 /* Ignore failure; should only happen with NAT */
// Create the expectation
 ip_conntrack_expect_related(ct, &expect);
 out:
 UNLOCK_BH(&ip_ftp_lock);
......
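
The array[] values used above come from the comma-separated numbers in a PORT command or 227 reply. The following is a minimal user-space sketch of that decoding, not the kernel's parser: the function names (parse_port_args, port_args_ip, port_args_port) are hypothetical, and only the "h1,h2,h3,h4,p1,p2" form is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse "h1,h2,h3,h4,p1,p2" into six values in the range 0..255.
 * Parsing stops at the first character that is neither a digit nor a
 * separating comma. Returns the number of characters consumed, or 0
 * if the input does not contain six well-formed fields. */
size_t parse_port_args(const char *data, size_t len, uint32_t array[6])
{
    size_t i;
    int n = 0;       /* index of the field being parsed */
    int digits = 0;  /* digits seen so far in the current field */

    array[0] = 0;
    for (i = 0; i < len; i++) {
        char c = data[i];

        if (c >= '0' && c <= '9') {
            array[n] = array[n] * 10 + (uint32_t)(c - '0');
            if (array[n] > 255)
                return 0;
            digits++;
        } else if (c == ',' && digits > 0 && n < 5) {
            array[++n] = 0;
            digits = 0;
        } else {
            break;   /* terminator such as '\r' or ')' */
        }
    }
    return (n == 5 && digits > 0) ? i : 0;
}

/* Address in host byte order, combined the same way as in the excerpt. */
uint32_t port_args_ip(const uint32_t array[6])
{
    return (array[0] << 24) | (array[1] << 16) | (array[2] << 8) | array[3];
}

/* Port in host byte order: high byte first, then low byte. */
uint16_t port_args_port(const uint32_t array[6])
{
    return (uint16_t)(array[4] << 8 | array[5]);
}
```

For the argument string "192,168,1,10,78,52" this yields address 192.168.1.10 and port 78 * 256 + 52 = 20020; the kernel code then converts both to network byte order with htonl() and htons() before storing them in the tuple.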

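3.2 Creating the expectation

This is done by ip_conntrack_expect_related(). It takes the expectation supplied by the protocol's conntrack helper, allocates memory for a new one, fills in the remaining members and adds it to the expectation lists, where it is checked whenever a new connection is created. Before allocating memory it validates the supplied expectation: whether it duplicates an existing one, whether the master connection has already reached the helper's maximum number of expectations, and so on.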
/* Add a related connection. */
int ip_conntrack_expect_related(struct ip_conntrack *related_to,
    struct ip_conntrack_expect *expect)
{
 struct ip_conntrack_expect *old, *new;
 int ret = 0;
 WRITE_LOCK(&ip_conntrack_lock);
 /* Because of the write lock, no reader can walk the lists,
  * so there is no need to use the tuple lock too */
 DEBUGP("ip_conntrack_expect_related %p\n", related_to);
 DEBUGP("tuple: "); DUMP_TUPLE(&expect->tuple);
 DEBUGP("mask:  "); DUMP_TUPLE(&expect->mask);
// Search the expectation list for an entry with the supplied tuple and mask
 old = LIST_FIND(&ip_conntrack_expect_list, resent_expect,
          struct ip_conntrack_expect *, &expect->tuple,
   &expect->mask);
 if (old) {
// Found: the master connection's packet was retransmitted; nothing new is created, only the expectation's timer is restarted
  /* Helper private data may contain offsets but no pointers
     pointing into the payload - otherwise we should have to copy
     the data filled out by the helper over the old one */
  DEBUGP("expect_related: resent packet\n");
  if (related_to->helper->timeout) {
   if (!del_timer(&old->timeout)) {
    /* expectation is dying. Fall through */
    old = NULL;
   } else {
    old->timeout.expires = jiffies +
     related_to->helper->timeout * HZ;
    add_timer(&old->timeout);
   }
  }
  if (old) {
   WRITE_UNLOCK(&ip_conntrack_lock);
   return -EEXIST;
  }
 } else if (related_to->helper->max_expected &&
     related_to->expecting >= related_to->helper->max_expected) {
// Not found; check whether the master's number of pending expectations has reached the helper's limit
  /* old == NULL */
  if (!(related_to->helper->flags &
        IP_CT_HELPER_F_REUSE_EXPECT)) {
   WRITE_UNLOCK(&ip_conntrack_lock);
        if (net_ratelimit())
         printk(KERN_WARNING
           "ip_conntrack: max number of expected "
           "connections %i of %s reached for "
           "%u.%u.%u.%u->%u.%u.%u.%u\n",
           related_to->helper->max_expected,
           related_to->helper->name,
                       NIPQUAD(related_to->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip),
                       NIPQUAD(related_to->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.ip));
   return -EPERM;
  }
  DEBUGP("ip_conntrack: max number of expected "
         "connections %i of %s reached for "
         "%u.%u.%u.%u->%u.%u.%u.%u, reusing\n",
          related_to->helper->max_expected,
         related_to->helper->name,
         NIPQUAD(related_to->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip),
         NIPQUAD(related_to->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.ip));
 
  /* choose the the oldest expectation to evict */
  list_for_each_entry(old, &related_to->sibling_list,
                                        expected_list)
   if (old->sibling == NULL)
    break;
  /* We cannot fail since related_to->expecting is the number
   * of unconfirmed expectations */
  IP_NF_ASSERT(old && old->sibling == NULL);
  /* newnat14 does not reuse the real allocated memory
   * structures but rather unexpects the old and
   * allocates a new.  unexpect_related will decrement
   * related_to->expecting.
   */
  unexpect_related(old);
  ret = -EPERM;
 } else if (LIST_FIND(&ip_conntrack_expect_list, expect_clash,
        struct ip_conntrack_expect *, &expect->tuple,
        &expect->mask)) {
// Check for a clash with an existing expectation
  WRITE_UNLOCK(&ip_conntrack_lock);
  DEBUGP("expect_related: busy!\n");
  return -EBUSY;
 }
// Allocate memory for a new expectation
 new = (struct ip_conntrack_expect *)
       kmalloc(sizeof(struct ip_conntrack_expect), GFP_ATOMIC);
 if (!new) {
  WRITE_UNLOCK(&ip_conntrack_lock);
  DEBUGP("expect_relaed: OOM allocating expect\n");
  return -ENOMEM;
 }
 
 DEBUGP("new expectation %p of conntrack %p\n", new, related_to);
// Copy the supplied expectation into the newly allocated buffer
 memcpy(new, expect, sizeof(*expect));
// Master connection of this expectation
 new->expectant = related_to;
 new->sibling = NULL;
// Initial reference count
 atomic_set(&new->use, 1);
 
 /* add to expected list for this connection */ 
// Add to the master connection's list of expectations
 list_add_tail(&new->expected_list, &related_to->sibling_list);
 /* add to global list of expectations */
// Add to the global expectation list
 list_prepend(&ip_conntrack_expect_list, &new->list);
 /* add and start timer if required */
 if (related_to->helper->timeout) {
// Set up the timer
  init_timer(&new->timeout);
  new->timeout.data = (unsigned long)new;
  new->timeout.function = expectation_timed_out;
  new->timeout.expires = jiffies +
     related_to->helper->timeout * HZ;
  add_timer(&new->timeout);
 }
// Increment the master connection's count of pending expectations
 related_to->expecting++;
 WRITE_UNLOCK(&ip_conntrack_lock);
 return ret;
}

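3.3 Rewriting child-connection information under NAT

When NAT is in use, the address and port information carried in the master connection's payload usually has to be rewritten: the address is changed to the firewall's own address and the port to an unused port on the firewall. Note that when the information is encoded as text rather than in binary form, as in FTP, the rewritten data may well differ in length from the original. The length in the IP header then has to be updated, the sequence numbers of later TCP packets in the same direction have to be shifted accordingly, and so do the acknowledgment numbers of TCP packets in the opposite direction, until the master connection ends.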
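
The sequence and acknowledgment shifting described above can be sketched as follows. This is a simplified user-space model that assumes a single rewrite per direction; struct seq_fixup and the two functions are hypothetical names, not the kernel's ip_nat_seq_adjust() implementation.

```c
#include <stdint.h>

/* Hypothetical record of one payload rewrite in one direction. */
struct seq_fixup {
    uint32_t correction_pos; /* sequence number of the rewritten packet */
    int32_t  offset;         /* new payload length minus old length */
};

/* Serial-number comparison: nonzero if a is after b, modulo 2^32. */
int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Sequence number to put on the wire for a packet travelling in the
 * rewritten direction. The rewritten packet itself keeps its number;
 * everything after it is shifted by the length difference. */
uint32_t adjust_seq(const struct seq_fixup *f, uint32_t seq)
{
    return seq_after(seq, f->correction_pos) ? seq + (uint32_t)f->offset : seq;
}

/* Acknowledgment number from the peer, mapped back to the sender's
 * own numbering by undoing the shift. */
uint32_t adjust_ack(const struct seq_fixup *f, uint32_t ack)
{
    return seq_after(ack, f->correction_pos) ? ack - (uint32_t)f->offset : ack;
}
```

For example, if a PORT command in the packet with sequence number 1000 grows by 3 bytes, the sender's next packet with sequence number 1024 leaves the firewall as 1027, and the peer's acknowledgment 1027 is delivered to the sender as 1024.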
 
do_bindings() calls the protocol's NAT helper:
/* net/ipv4/netfilter/ip_nat_core.c */
......
 if (helper) {
  struct ip_conntrack_expect *exp = NULL;
  struct list_head *cur_item;
  int ret = NF_ACCEPT;
  int helper_called = 0;
  DEBUGP("do_bindings: helper existing for (%p)\n", ct);
  /* Always defragged for helpers */
  IP_NF_ASSERT(!((*pskb)->nh.iph->frag_off
          & htons(IP_MF|IP_OFFSET)));
  /* Have to grab read lock before sibling_list traversal */
  READ_LOCK(&ip_conntrack_lock);
  list_for_each_prev(cur_item, &ct->sibling_list) {
   exp = list_entry(cur_item, struct ip_conntrack_expect,
      expected_list);
     
   /* if this expectation is already established, skip */
   if (exp->sibling)
    continue;
// Check whether this is the packet that needs rewriting
   if (exp_for_packet(exp, pskb)) {
    /* FIXME: May be true multiple times in the
     * case of UDP!! */
    DEBUGP("calling nat helper (exp=%p) for packet\n", exp);
// Call the protocol's NAT helper to rewrite the data; it may return NF_DROP on failure.
// The packet is processed further only if NF_ACCEPT is returned
    ret = helper->help(ct, exp, info, ctinfo,
         hooknum, pskb);
    if (ret != NF_ACCEPT) {
     READ_UNLOCK(&ip_conntrack_lock);
     return ret;
    }
    helper_called = 1;
   }
  }
  /* Helper might want to manip the packet even when there is no
   * matching expectation for this packet */
  if (!helper_called && helper->flags & IP_NAT_HELPER_F_ALWAYS) {
   DEBUGP("calling nat helper for packet without expectation\n");
// The helper was not called above but asks to see every packet, so call it now
   ret = helper->help(ct, NULL, info, ctinfo,
        hooknum, pskb);
   if (ret != NF_ACCEPT) {
    READ_UNLOCK(&ip_conntrack_lock);
    return ret;
   }
  }
  READ_UNLOCK(&ip_conntrack_lock);
  
  /* Adjust sequence number only once per packet
   * (helper is called at all hooks) */
// For TCP, adjust the sequence and acknowledgment numbers if necessary
  if (is_tcp && (hooknum == NF_IP_POST_ROUTING
          || hooknum == NF_IP_LOCAL_IN)) {
   DEBUGP("ip_nat_core: adjusting sequence number\n");
   /* future: put this in a l4-proto specific function,
    * and call this function here. */
   ip_nat_seq_adjust(*pskb, ct, ctinfo);
  }
  return ret;
 } else
  return NF_ACCEPT;
......

help() rewrites the child-connection information in the packet. Taking FTP as an example:

/* net/ipv4/netfilter/ip_nat_ftp.c */
......
static unsigned int help(struct ip_conntrack *ct,
    struct ip_conntrack_expect *exp,
    struct ip_nat_info *info,
    enum ip_conntrack_info ctinfo,
    unsigned int hooknum,
    struct sk_buff **pskb)
{
 struct iphdr *iph = (*pskb)->nh.iph;
 struct tcphdr *tcph = (void *)iph + iph->ihl*4;
 unsigned int datalen;
 int dir;
 struct ip_ct_ftp_expect *ct_ftp_info;
 if (!exp)
  DEBUGP("ip_nat_ftp: no exp!!");
 ct_ftp_info = &exp->help.exp_ftp_info;
 /* Only mangle things once: original direction in POST_ROUTING
    and reply direction on PRE_ROUTING. */
 dir = CTINFO2DIR(ctinfo);
// Rewrite the data only once: original direction at POST_ROUTING, reply direction at PRE_ROUTING
 if (!((hooknum == NF_IP_POST_ROUTING && dir == IP_CT_DIR_ORIGINAL)
       || (hooknum == NF_IP_PRE_ROUTING && dir == IP_CT_DIR_REPLY))) {
  DEBUGP("nat_ftp: Not touching dir %s at hook %s\n",
         dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY",
         hooknum == NF_IP_POST_ROUTING ? "POSTROUTING"
         : hooknum == NF_IP_PRE_ROUTING ? "PREROUTING"
         : hooknum == NF_IP_LOCAL_OUT ? "OUTPUT" : "???");
  return NF_ACCEPT;
 }
 datalen = (*pskb)->len - iph->ihl * 4 - tcph->doff * 4;
 LOCK_BH(&ip_ftp_lock);
 /* If it's in the right range... */
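// Check that the end of the text to rewrite falls within this packet's sequence range;
// seq and len were recorded by the conntrack helper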
 if (between(exp->seq + ct_ftp_info->len,
      ntohl(tcph->seq),
      ntohl(tcph->seq) + datalen)) {
// Rewrite the data; drop the packet if that fails
  if (!ftp_data_fixup(ct_ftp_info, ct, pskb, ctinfo, exp)) {
   UNLOCK_BH(&ip_ftp_lock);
   return NF_DROP;
  }
 } else {
// The packet holds only part of the data: drop it
  /* Half a match?  This means a partial retransmisison.
     It's a cracker being funky. */
  if (net_ratelimit()) {
   printk("FTP_NAT: partial packet %u/%u in %u/%u\n",
          exp->seq, ct_ftp_info->len,
          ntohl(tcph->seq),
          ntohl(tcph->seq) + datalen);
  }
  UNLOCK_BH(&ip_ftp_lock);
  return NF_DROP;
 }
 UNLOCK_BH(&ip_ftp_lock);
 return NF_ACCEPT;
}
 
......
 
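// Rewrite the address/port description in the FTP payload: the address becomes the
// NAT-translated address and the port is replaced by an unused port found on the firewall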
static int ftp_data_fixup(const struct ip_ct_ftp_expect *ct_ftp_info,
     struct ip_conntrack *ct,
     struct sk_buff **pskb,
     enum ip_conntrack_info ctinfo,
     struct ip_conntrack_expect *expect)
{
 u_int32_t newip;
 struct iphdr *iph = (*pskb)->nh.iph;
 struct tcphdr *tcph = (void *)iph + iph->ihl*4;
 u_int16_t port;
// newtuple holds the child connection's tuple after NAT
 struct ip_conntrack_tuple newtuple;
 MUST_BE_LOCKED(&ip_ftp_lock);
 DEBUGP("FTP_NAT: seq %u + %u in %u\n",
        expect->seq, ct_ftp_info->len,
        ntohl(tcph->seq));
 /* Change address inside packet to match way we're mapping
    this connection. */
// Choose the IP addresses according to active or passive mode
 if (ct_ftp_info->ftptype == IP_CT_FTP_PASV
     || ct_ftp_info->ftptype == IP_CT_FTP_EPSV) {
// Passive mode
  /* PASV/EPSV response: must be where client thinks server
     is */
  newip = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.ip;
  /* Expect something from client->server */
  newtuple.src.ip =
   ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip;
  newtuple.dst.ip =
   ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.ip;
 } else {
// Active mode
  /* PORT command: must be where server thinks client is */
  newip = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.ip;
  /* Expect something from server->client */
  newtuple.src.ip =
   ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.ip;
  newtuple.dst.ip =
   ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.ip;
 }
 newtuple.dst.protonum = IPPROTO_TCP;
 newtuple.src.u.tcp.port = expect->tuple.src.u.tcp.port;
 /* Try to get same port: if not, try to change it. */
 for (port = ct_ftp_info->port; port != 0; port++) {
  newtuple.dst.u.tcp.port = htons(port);
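// Starting from the port announced in the payload, try each port up to 65535 to find a free one.
// A non-zero return value means the port is in use; on 0 the port is free and the
// expectation's port has been changed to the new one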
  if (ip_conntrack_change_expect(expect, &newtuple) == 0)
   break;
 }
 if (port == 0)
  return 0;
// Rewrite the payload
 if (!mangle[ct_ftp_info->ftptype](pskb, newip, port,
       expect->seq - ntohl(tcph->seq),
       ct_ftp_info->len, ct, ctinfo))
  return 0;
 return 1;
}
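
The port search in ftp_data_fixup() relies on a 16-bit counter wrapping from 65535 to 0 to end the loop. A minimal sketch of that pattern follows, with a hypothetical callback (port_busy_fn) standing in for the ip_conntrack_change_expect() call; the demo predicates are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical predicate: returns nonzero if the port is already in use. */
typedef int (*port_busy_fn)(uint16_t port);

/* Try start, start + 1, ..., 65535 and return the first free port.
 * The 16-bit counter wraps to 0 after 65535, which ends the loop;
 * 0 is therefore also the "nothing free" return value. */
uint16_t find_free_port(uint16_t start, port_busy_fn busy)
{
    uint16_t port;

    for (port = start; port != 0; port++) {
        if (!busy(port))
            return port;
    }
    return 0;
}

/* Demo predicate: pretend ports 20000..20002 are in use. */
int demo_busy(uint16_t port)
{
    return port >= 20000 && port <= 20002;
}

/* Demo predicate: every port is in use. */
int all_busy(uint16_t port)
{
    (void)port;
    return 1;
}
```

Note that the search never goes below the starting port: ports smaller than the one announced in the payload are not tried.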

3.4 Modifying an expectation

An important task of the NAT helper is to modify the expectation's tuple:
/* Change tuple in an existing expectation */
int ip_conntrack_change_expect(struct ip_conntrack_expect *expect,
          struct ip_conntrack_tuple *newtuple)
{
 int ret;
 MUST_BE_READ_LOCKED(&ip_conntrack_lock);
 WRITE_LOCK(&ip_conntrack_expect_tuple_lock);
 DEBUGP("change_expect:\n");
 DEBUGP("exp tuple: "); DUMP_TUPLE(&expect->tuple);
 DEBUGP("exp mask:  "); DUMP_TUPLE(&expect->mask);
 DEBUGP("newtuple:  "); DUMP_TUPLE(newtuple);
 if (expect->ct_tuple.dst.protonum == 0) {
  /* Never seen before */
  DEBUGP("change expect: never seen before\n");
  if (!ip_ct_tuple_equal(&expect->tuple, newtuple)
      && LIST_FIND(&ip_conntrack_expect_list, expect_clash,
            struct ip_conntrack_expect *, newtuple, &expect->mask)) {
   /* Force NAT to find an unused tuple */
   ret = -1;
  } else {
   memcpy(&expect->ct_tuple, &expect->tuple, sizeof(expect->tuple));
   memcpy(&expect->tuple, newtuple, sizeof(expect->tuple));
   ret = 0;
  }
 } else {
  /* Resent packet */
  DEBUGP("change expect: resent packet\n");
  if (ip_ct_tuple_equal(&expect->tuple, newtuple)) {
   ret = 0;
  } else {
   /* Force NAT to choose again the same port */
   ret = -1;
  }
 }
 WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock);
 
 return ret;
}

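3.5 Linking a new child connection to its master

When the child connection's first packet arrives, netfilter first creates a connection structure (struct ip_conntrack) for it and then checks whether the connection is an expected child connection; if it is, the new connection is linked to its master connection.
This is done in init_conntrack():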

/* net/ipv4/netfilter/ip_conntrack_core.c */

static struct ip_conntrack_tuple_hash *
init_conntrack(const struct ip_conntrack_tuple *tuple,
        struct ip_conntrack_protocol *protocol,
        struct sk_buff *skb)
{
 struct ip_conntrack *conntrack;
 struct ip_conntrack_tuple repl_tuple;
 size_t hash;
 struct ip_conntrack_expect *expected;
 int i;
 static unsigned int drop_next = 0;
// First set up the new connection
 if (!ip_conntrack_hash_rnd_initted) {
  get_random_bytes(&ip_conntrack_hash_rnd, 4);
  ip_conntrack_hash_rnd_initted = 1;
 }
..... (omitted)
 INIT_LIST_HEAD(&conntrack->sibling_list);
// From here on, check for a matching expectation
 WRITE_LOCK(&ip_conntrack_lock);
 /* Need finding and deleting of expected ONLY if we win race */
 READ_LOCK(&ip_conntrack_expect_tuple_lock);
// Look up the new connection's tuple in the expectation list
 expected = LIST_FIND(&ip_conntrack_expect_list, expect_cmp,
        struct ip_conntrack_expect *, tuple);
 READ_UNLOCK(&ip_conntrack_expect_tuple_lock);
 /* If master is not in hash table yet (ie. packet hasn't left
    this machine yet), how can other end know about expected?
    Hence these are not the droids you are looking for (if
    master ct never got confirmed, we'd hold a reference to it
    and weird things would happen to future packets). */
// An expectation was found, but its master connection is not confirmed yet: treat the expectation as invalid
 if (expected && !is_confirmed(expected->expectant))
  expected = NULL;
 /* Look up the conntrack helper for master connections only */
// Only master connections may have a helper
 if (!expected)
  conntrack->helper = ip_ct_find_helper(&repl_tuple);
 /* If the expectation is dying, then this is a looser. */
// Expectation found: delete its timer. If that fails, the timer has already
// fired and the timeout handler is in the middle of removing the expectation
 if (expected
     && expected->expectant->helper->timeout
     && ! del_timer(&expected->timeout))
  expected = NULL;
 if (expected) {
// Expectation found
  DEBUGP("conntrack: expectation arrives ct=%p exp=%p\n",
   conntrack, expected);
  /* Welcome, Mr. Bond.  We've been expecting you... */
  IP_NF_ASSERT(master_ct(conntrack));
  __set_bit(IPS_EXPECTED_BIT, &conntrack->status);
// The new connection's master points to the expectation
  conntrack->master = expected;
// The expectation's sibling points to the new connection
  expected->sibling = conntrack;
// Remove the expectation from the global expectation list
  LIST_DELETE(&ip_conntrack_expect_list, expected);
// Decrement the master's count of pending expectations, so that new ones can be created
  expected->expectant->expecting--;
// Take a reference on the master connection
  nf_conntrack_get(&master_ct(conntrack)->infos[0]);
 }
 atomic_inc(&ip_conntrack_count);
 WRITE_UNLOCK(&ip_conntrack_lock);
 if (expected && expected->expectfn)
  expected->expectfn(conntrack);
 return &conntrack->tuplehash[IP_CT_DIR_ORIGINAL];
}

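3.6 Setting up NAT for the child connection

Under NAT, a child connection also needs its own NAT information once it has been established. This is set up by the expect() function of the protocol's NAT helper. Taking FTP as an example: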
 
/* net/ipv4/netfilter/ip_nat_ftp.c */
...
static unsigned int
ftp_nat_expected(struct sk_buff **pskb,
   unsigned int hooknum,
   struct ip_conntrack *ct,
   struct ip_nat_info *info)
{
 struct ip_nat_multi_range mr;
 u_int32_t newdstip, newsrcip, newip;
 struct ip_ct_ftp_expect *exp_ftp_info;
 struct ip_conntrack *master = master_ct(ct);
 
 IP_NF_ASSERT(info);
 IP_NF_ASSERT(master);
// The connection's NAT information must not have been initialized yet
 IP_NF_ASSERT(!(info->initialized & (1<<HOOK2MANIP(hooknum))));
 DEBUGP("nat_expected: We have a connection!\n");
 exp_ftp_info = &ct->master->help.exp_ftp_info;
 LOCK_BH(&ip_ftp_lock);
// Determine the NAT mapping from the type of child connection
 if (exp_ftp_info->ftptype == IP_CT_FTP_PORT
     || exp_ftp_info->ftptype == IP_CT_FTP_EPRT) {
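// Active mode: the server opens the child connection to the client.
// New destination IP = source IP of the original direction; new source IP = destination IP of the original direction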
  /* PORT command: make connection go to the client. */
  newdstip = master->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.ip;
  newsrcip = master->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.ip;
  DEBUGP("nat_expected: PORT cmd. %u.%u.%u.%u->%u.%u.%u.%u\n",
         NIPQUAD(newsrcip), NIPQUAD(newdstip));
 } else {
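// Passive mode: the client opens the child connection to the server.
// New destination IP = source IP of the reply direction; new source IP = destination IP of the reply direction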
  /* PASV command: make the connection go to the server */
  newdstip = master->tuplehash[IP_CT_DIR_REPLY].tuple.src.ip;
  newsrcip = master->tuplehash[IP_CT_DIR_REPLY].tuple.dst.ip;
  DEBUGP("nat_expected: PASV cmd. %u.%u.%u.%u->%u.%u.%u.%u\n",
         NIPQUAD(newsrcip), NIPQUAD(newdstip));
 }
 UNLOCK_BH(&ip_ftp_lock);
 if (HOOK2MANIP(hooknum) == IP_NAT_MANIP_SRC)
  newip = newsrcip;
 else
  newip = newdstip;
 DEBUGP("nat_expected: IP to %u.%u.%u.%u\n", NIPQUAD(newip));
// Fill in the NAT range parameters
 mr.rangesize = 1;
 /* We don't want to manip the per-protocol, just the IPs... */
 mr.range[0].flags = IP_NAT_RANGE_MAP_IPS;
 mr.range[0].min_ip = mr.range[0].max_ip = newip;
 /* ... unless we're doing a MANIP_DST, in which case, make
    sure we map to the correct port */
 if (HOOK2MANIP(hooknum) == IP_NAT_MANIP_DST) {
  mr.range[0].flags |= IP_NAT_RANGE_PROTO_SPECIFIED;
  mr.range[0].min = mr.range[0].max
   = ((union ip_conntrack_manip_proto)
    { .tcp = { htons(exp_ftp_info->port) } });
 }
// Set up NAT for the child connection
 return ip_nat_setup_info(ct, &mr, hooknum);
}
......
expect() is reached from ip_nat_fn() in ip_nat_standalone.c, which calls call_expect(), which in turn calls the helper's (*expect) function pointer:
/* net/ipv4/netfilter/ip_nat_standalone.c */
......
static inline int call_expect(struct ip_conntrack *master,
         struct sk_buff **pskb,
         unsigned int hooknum,
         struct ip_conntrack *ct,
         struct ip_nat_info *info)
{
 return master->nat.info.helper->expect(pskb, hooknum, ct, info);
}
......
// NAT information has to be set up for NEW or RELATED connections
   if (ct->master
       && master_ct(ct)->nat.info.helper
       && master_ct(ct)->nat.info.helper->expect) {
// Child connection: call the helper's expect() to set up its NAT information
    ret = call_expect(master_ct(ct), pskb,
        hooknum, ct, info);
   } else {
#ifdef CONFIG_IP_NF_NAT_LOCAL
    /* LOCAL_IN hook doesn't have a chain!  */
    if (hooknum == NF_IP_LOCAL_IN)
     ret = alloc_null_binding(ct, info,
         hooknum);
    else
#endif
// Master connection: NAT information is set up from the configured iptables NAT rules
    ret = ip_nat_rule_find(pskb, hooknum, in, out,
             ct, info);
   }
......
 
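3.7 Deleting expectations

3.7.1 Normal deletion of a child connection

A child connection is an ordinary connection and is also deleted by destroy_conntrack(), which does some extra work when it finds that the connection is a child connection: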
 
static void
destroy_conntrack(struct nf_conntrack *nfct)
{
 struct ip_conntrack *ct = (struct ip_conntrack *)nfct, *master = NULL;
 struct ip_conntrack_protocol *proto;
 DEBUGP("destroy_conntrack(%p)\n", ct);
 IP_NF_ASSERT(atomic_read(&nfct->use) == 0);
 IP_NF_ASSERT(!timer_pending(&ct->timeout));
 /* To make sure we don't get any weird locking issues here:
  * destroy_conntrack() MUST NOT be called with a write lock
  * to ip_conntrack_lock!!! -HW */
 proto = ip_ct_find_proto(ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.protonum);
 if (proto && proto->destroy)
  proto->destroy(ct);
 if (ip_conntrack_destroyed)
  ip_conntrack_destroyed(ct);
 WRITE_LOCK(&ip_conntrack_lock);
 /* Delete us from our own list to prevent corruption later */
 list_del(&ct->sibling_list);
 /* Delete our master expectation */
 if (ct->master) {
// ct is a child connection; ct->master is its expectation
  if (ct->master->expectant) {
// ct->master->expectant is the master connection
   /* can't call __unexpect_related here,
    * since it would screw up expect_list */
// Remove the expectation from the master connection's list of expectations
   list_del(&ct->master->expected_list);
// master is the master connection
   master = ct->master->expectant;
  }
// Free the expectation's memory
  kfree(ct->master);
 }
 WRITE_UNLOCK(&ip_conntrack_lock);
// If there is a master connection, drop the reference held on it
 if (master)
  ip_conntrack_put(master);
 DEBUGP("destroy_conntrack: returning ct=%p to slab\n", ct);
 kmem_cache_free(ip_conntrack_cachep, ct);
 atomic_dec(&ip_conntrack_count);
}

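3.7.2 Deletion on timeout

If the child connection's packets never arrive, the expectation's timer expires and the timeout function expectation_timed_out() is called: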
 
static void expectation_timed_out(unsigned long ul_expect)
{
 struct ip_conntrack_expect *expect = (void *) ul_expect;
 DEBUGP("expectation %p timed out\n", expect); 
 WRITE_LOCK(&ip_conntrack_lock);
 __unexpect_related(expect);
 WRITE_UNLOCK(&ip_conntrack_lock);
}
Call sequence:
expectation_timed_out()
  ->__unexpect_related()
    ->ip_conntrack_expect_put()
      ->destroy_expect()
        ->kfree()
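
The tail of this call chain is a plain reference-count release: the object is freed only when the last reference is dropped. A minimal user-space sketch of that pattern follows; the structure and function names are hypothetical, a plain int stands in for the kernel's atomic_t, and the freed flag exists only so that the demo can observe the release.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an expectation with a reference count. */
struct expect {
    int  use;     /* reference count (atomic_t in the kernel) */
    int *freed;   /* demo only: set to 1 when the object is released */
};

/* Allocate an expectation holding one reference, as atomic_set(&use, 1) does. */
struct expect *expect_alloc(int *freed)
{
    struct expect *e = malloc(sizeof(*e));

    if (e) {
        e->use = 1;
        e->freed = freed;
    }
    return e;
}

/* Take an additional reference. */
void expect_get(struct expect *e)
{
    e->use++;
}

/* Drop one reference; release the memory when none remain. */
void expect_put(struct expect *e)
{
    if (--e->use == 0) {
        if (e->freed)
            *e->freed = 1;
        free(e);
    }
}
```

With one extra reference taken through expect_get(), the first expect_put() leaves the object alive and the second one frees it.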
 
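4. Conclusion

netfilter describes both master and child connections with struct ip_conntrack and links the two through struct ip_conntrack_expect. The master_ct() macro obtains the master connection from a child connection: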
#define master_ct(conntr) (conntr->master ? conntr->master->expectant : NULL)
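
The linkage behind this macro can be sketched with stripped-down structures. The type and function names below are hypothetical and keep only the three pointers involved: child->master points to the expectation, and the expectation's expectant and sibling point to the master and the child respectively.

```c
#include <stddef.h>

struct conn;

/* Stand-in for struct ip_conntrack_expect: links master and child. */
struct expectation {
    struct conn *expectant;  /* master connection */
    struct conn *sibling;    /* child connection, once it has arrived */
};

/* Stand-in for struct ip_conntrack. */
struct conn {
    struct expectation *master;  /* NULL for a master connection */
};

/* Function form of the master_ct() macro. */
struct conn *master_of(const struct conn *c)
{
    return c->master ? c->master->expectant : NULL;
}

/* Link a child to its master the way init_conntrack() does when the
 * expected connection arrives, then follow the pointers back. */
int demo_linkage(void)
{
    struct conn ctrl = { NULL };
    struct conn data = { NULL };
    struct expectation exp = { &ctrl, NULL };

    data.master = &exp;
    exp.sibling = &data;

    return master_of(&data) == &ctrl && master_of(&ctrl) == NULL;
}
```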
