Linux Network Protocol Registration and How the Kernel Handles It

1 Overview

1.1 Network protocol stack models

The two best-known network protocol models are the 7-layer OSI model and the 5-layer TCP/IP model, as shown in the figure below:

[Figure: network protocol models (OSI vs. TCP/IP)]

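The rest of this article follows the TCP/IP model. As data passes between layers, the kernel must know which protocol should process it. For example, at L2 (the link layer), after the NIC driver receives an skb, netif_receive_skb() must call the appropriate handler to pass the skb up to the upper-layer protocol. The following sections describe how the Linux kernel organizes and invokes these handlers.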

1.2 How protocol handlers are organized in the kernel

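In the kernel, a protocol is represented by a struct packet_type. When the system initializes, or when the corresponding module is loaded, each protocol is added to ptype_base, a hash table with 16 buckets in which every bucket is a doubly linked list. The function dev_add_pack() registers a given protocol (packet_type) with the system.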

Handlers of type ETH_P_ALL are kept separately in the ptype_all list; they are used by sniffers (packet capture tools).
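To make the bucket selection concrete, here is a small user-space sketch (not kernel code) of how a protocol code maps to one of the 16 ptype_base buckets. PTYPE_HASH_MASK mirrors the kernel macro used by dev_add_pack() in section 2.2, assuming the 16-entry table described above; ptype_bucket() is a hypothetical helper introduced only for illustration.

```c
#include <arpa/inet.h>  /* htons, ntohs */
#include <assert.h>     /* for the checks in the usage example */
#include <stdint.h>

/* Assumption: a 16-bucket table, so the mask keeps the low 4 bits. */
#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)
#define ETH_P_ALL 0x0003

/* Hypothetical helper: return the ptype_base bucket for a type stored
 * in network byte order (as packet_type.type is), or -1 for ETH_P_ALL,
 * which goes on the separate ptype_all list instead. */
static int ptype_bucket(uint16_t be_type)
{
    if (be_type == htons(ETH_P_ALL))
        return -1;
    return ntohs(be_type) & PTYPE_HASH_MASK;
}
```

With this mask, ETH_P_IP (0x0800) and ETH_P_8021Q (0x8100) both land in bucket 0, while ETH_P_IPV6 (0x86DD) lands in bucket 13. Unrelated protocols can share a bucket, which is why each bucket is a list and the receive path still compares ptype->type against skb->protocol.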

The overall architecture is shown below:

[Figure: kernel data structures of protocol handlers]

2 Registering, unregistering, and invoking protocol handlers

Protocols are registered with dev_add_pack(). We start with how a protocol is represented in the kernel.

2.1 Protocol representation: packet_type

It is defined as follows:

struct packet_type {
    __be16 type; /* This is really htons(ether_type). */
    struct net_device *dev; /* NULL is wildcarded here */
    int (*func) (struct sk_buff *,
                 struct net_device *,
                 struct packet_type *,
                 struct net_device *);
    struct sk_buff *(*gso_segment)(struct sk_buff *skb,
                                   int features);
    int (*gso_send_check)(struct sk_buff *skb);
    struct sk_buff **(*gro_receive)(struct sk_buff **head,
                                    struct sk_buff *skb);
    int (*gro_complete)(struct sk_buff *skb);
    void *af_packet_priv;
    struct list_head list;
};

The main fields are:

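  • type:

    The protocol code that identifies the protocol. It is stored in network byte order (htons(ether_type), as the comment in the struct says). The values are defined in include/linux/if_ether.h: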

    #define ETH_P_LOOP       0x0060 /* Ethernet Loopback packet */
    #define ETH_P_PUP        0x0200 /* Xerox PUP packet */
    #define ETH_P_PUPAT      0x0201 /* Xerox PUP Addr Trans packet */
    #define ETH_P_IP         0x0800 /* Internet Protocol packet */
    #define ETH_P_X25        0x0805 /* CCITT X.25 */
    #define ETH_P_ARP        0x0806 /* Address Resolution packet */
    #define ETH_P_BPQ        0x08FF /* G8BPQ AX.25 Ethernet Packet [ NOT AN OFFICIALLY REGISTERED ID ] */
    #define ETH_P_IEEEPUP    0x0a00 /* Xerox IEEE802.3 PUP packet */
    #define ETH_P_IEEEPUPAT  0x0a01 /* Xerox IEEE802.3 PUP Addr Trans packet */
    #define ETH_P_DEC        0x6000 /* DEC Assigned proto */
    #define ETH_P_DNA_DL     0x6001 /* DEC DNA Dump/Load */
    #define ETH_P_DNA_RC     0x6002 /* DEC DNA Remote Console */
    #define ETH_P_DNA_RT     0x6003 /* DEC DNA Routing */
    #define ETH_P_LAT        0x6004 /* DEC LAT */
    #define ETH_P_DIAG       0x6005 /* DEC Diagnostics */
    #define ETH_P_CUST       0x6006 /* DEC Customer use */
    #define ETH_P_SCA        0x6007 /* DEC Systems Comms Arch */
    #define ETH_P_TEB        0x6558 /* Trans Ether Bridging */
    #define ETH_P_RARP       0x8035 /* Reverse Addr Res packet */
    #define ETH_P_ATALK      0x809B /* Appletalk DDP */
    #define ETH_P_AARP       0x80F3 /* Appletalk AARP */
    #define ETH_P_8021Q      0x8100 /* 802.1Q VLAN Extended Header */
    #define ETH_P_IPX        0x8137 /* IPX over DIX */
    #define ETH_P_IPV6       0x86DD /* IPv6 over bluebook */
    #define ETH_P_PAUSE      0x8808 /* IEEE Pause frames. See 802.3 31B */
    #define ETH_P_SLOW       0x8809 /* Slow Protocol. See 802.3ad 43B */
    #define ETH_P_WCCP       0x883E /* Web-cache coordination protocol defined in draft-wilson-wrec-wccp-v2-00.txt */
    #define ETH_P_PPP_DISC   0x8863 /* PPPoE discovery messages */
    #define ETH_P_PPP_SES    0x8864 /* PPPoE session messages */
    #define ETH_P_MPLS_UC    0x8847 /* MPLS Unicast traffic */
    #define ETH_P_MPLS_MC    0x8848 /* MPLS Multicast traffic */
    #define ETH_P_ATMMPOA    0x884c /* MultiProtocol Over ATM */
    #define ETH_P_ATMFATE    0x8884 /* Frame-based ATM Transport over Ethernet */
    #define ETH_P_PAE        0x888E /* Port Access Entity (IEEE 802.1X) */
    #define ETH_P_AOE        0x88A2 /* ATA over Ethernet */
    #define ETH_P_TIPC       0x88CA /* TIPC */
    #define ETH_P_1588       0x88F7 /* IEEE 1588 Timesync */
    #define ETH_P_FCOE       0x8906 /* Fibre Channel over Ethernet */
    #define ETH_P_FIP        0x8914 /* FCoE Initialization Protocol */
    #define ETH_P_EDSA       0xDADA /* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
    #define ETH_P_802_3      0x0001 /* Dummy type for 802.3 frames */
    #define ETH_P_AX25       0x0002 /* Dummy protocol id for AX.25 */
    #define ETH_P_ALL        0x0003 /* Every packet (be careful!!!) */
    #define ETH_P_802_2      0x0004 /* 802.2 frames */
    #define ETH_P_SNAP       0x0005 /* Internal only */
    #define ETH_P_DDCMP      0x0006 /* DEC DDCMP: Internal only */
    #define ETH_P_WAN_PPP    0x0007 /* Dummy type for WAN PPP frames*/
    #define ETH_P_PPP_MP     0x0008 /* Dummy type for PPP MP frames */
    #define ETH_P_LOCALTALK  0x0009 /* Localtalk pseudo type */
    #define ETH_P_CAN        0x000C /* Controller Area Network */
    #define ETH_P_PPPTALK    0x0010 /* Dummy type for Atalk over PPP*/
    #define ETH_P_TR_802_2   0x0011 /* 802.2 frames */
    #define ETH_P_MOBITEX    0x0015 /* Mobitex */
    #define ETH_P_CONTROL    0x0016 /* Card specific control frames */
    #define ETH_P_IRDA       0x0017 /* Linux-IrDA */
    #define ETH_P_ECONET     0x0018 /* Acorn Econet */
    #define ETH_P_HDLC       0x0019 /* HDLC frames */
    #define ETH_P_ARCNET     0x001A /* 1A for ArcNet :-) */
    #define ETH_P_DSA        0x001B /* Distributed Switch Arch. */
    #define ETH_P_TRAILER    0x001C /* Trailer switch tagging */
    #define ETH_P_PHONET     0x00F5 /* Nokia Phonet frames */
    #define ETH_P_IEEE802154 0x00F6 /* IEEE802.15.4 frame */
    
  • dev:

    The device on which the protocol is enabled. A NULL dev acts as a wildcard that matches any device.

  • func:

    The protocol's handler. For example, after the NIC receives data, netif_receive_skb() calls the func whose type matches skb->protocol to process the skb.

  • af_packet_priv:

    Used by PF_PACKET sockets; it points to the sock structure of the socket that created this packet_type.

  • list:

    The list_head that links this packet_type into ptype_all or into one of the ptype_base buckets.
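The matching rules implied by type and dev fit in a few lines of user-space C. This is a simplified sketch: the structs are trimmed stand-ins rather than the kernel definitions, ptype_accepts() is a hypothetical helper, and the real netif_receive_skb() in section 2.4 also considers orig_dev and bonding masters.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-ins, NOT the kernel definitions. */
struct net_device { const char *name; };
struct sk_buff;  /* opaque in this sketch */

struct packet_type {
    uint16_t type;           /* htons(ether_type), as in the kernel */
    struct net_device *dev;  /* NULL means "any device" */
    int (*func)(struct sk_buff *, struct net_device *,
                struct packet_type *, struct net_device *);
};

/* Hypothetical helper: would this handler accept a frame whose
 * protocol (network byte order) is `proto`, received on `dev`? */
static bool ptype_accepts(const struct packet_type *pt,
                          uint16_t proto, const struct net_device *dev)
{
    return pt->type == proto && (pt->dev == NULL || pt->dev == dev);
}
```

A handler registered with dev == NULL, like ipv6_packet_type, accepts its protocol on every interface, whereas one bound to a specific device ignores the same protocol arriving elsewhere.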

2.2 Registering a protocol

Registration takes two steps: first initialize the protocol's packet_type, then add it to ptype_base with dev_add_pack().

Take IPv6 registration as an example:

static struct packet_type ipv6_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IPV6),
    .func = ipv6_rcv,
    .gso_send_check = ipv6_gso_send_check,
    .gso_segment = ipv6_gso_segment,
    .gro_receive = ipv6_gro_receive,
    .gro_complete = ipv6_gro_complete,
};

static int __init ipv6_packet_init(void)
{
    dev_add_pack(&ipv6_packet_type);
    return 0;
}

/* net/core/dev.c */
/**
 *  dev_add_pack - add packet handler
 *  @pt: packet type declaration
 *
 *  Add a protocol handler to the networking stack. The passed &packet_type
 *  is linked into kernel lists and may not be freed until it has been
 *  removed from the kernel lists.
 *
 *  This call does not sleep therefore it can not
 *  guarantee all CPU's that are in middle of receiving packets
 *  will see the new packet type (until the next received packet).
 */

void dev_add_pack(struct packet_type *pt)
{
    int hash;

    spin_lock_bh(&ptype_lock);
    if (pt->type == htons(ETH_P_ALL))
        list_add_rcu(&pt->list, &ptype_all);
    else {
        hash = ntohs(pt->type) & PTYPE_HASH_MASK;
        list_add_rcu(&pt->list, &ptype_base[hash]);
    }
    spin_unlock_bh(&ptype_lock);
}
EXPORT_SYMBOL(dev_add_pack);

The code first defines and initializes ipv6_packet_type, setting its type and handlers; ipv6_packet_init() then calls dev_add_pack() to add ipv6_packet_type to the kernel's ptype_base.

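dev_add_pack() itself is simple: holding ptype_lock, it checks the protocol type code. If it is ETH_P_ALL, pt is added to ptype_all; otherwise it is added to the doubly linked list in ptype_base selected by the hash ntohs(pt->type) & PTYPE_HASH_MASK. The insertion uses list_add_rcu(), so the receive path can walk the lists under RCU without taking ptype_lock.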
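The control flow of dev_add_pack() can be reproduced outside the kernel as a single-threaded model. The sketch assumes a hand-written list_head whose insertion happens right after the head (as list_add() and list_add_rcu() do) and deliberately omits ptype_lock and RCU; ptype_tables_init() and model_dev_add_pack() are hypothetical names.

```c
#include <arpa/inet.h>  /* htons, ntohs */
#include <assert.h>
#include <stdint.h>

#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)
#define ETH_P_ALL 0x0003

/* Minimal circular doubly linked list in the style of list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n right after head, like list_add()/list_add_rcu(). */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

struct packet_type {
    uint16_t type;           /* network byte order */
    struct list_head list;
};

static struct list_head ptype_base[PTYPE_HASH_SIZE];
static struct list_head ptype_all;

static void ptype_tables_init(void)
{
    for (int i = 0; i < PTYPE_HASH_SIZE; i++)
        list_init(&ptype_base[i]);
    list_init(&ptype_all);
}

/* Single-threaded model of dev_add_pack(): same branches,
 * but no spinlock and no RCU publication. */
static void model_dev_add_pack(struct packet_type *pt)
{
    if (pt->type == htons(ETH_P_ALL))
        list_add(&pt->list, &ptype_all);
    else
        list_add(&pt->list, &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK]);
}
```

Because each new entry goes right after the list head, the most recently registered handler in a bucket is the first one the receive path visits.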

2.3 Unregistering a protocol

The function dev_remove_pack() unregisters a protocol. Again taking IPv6 as an example:

/* net/ipv6/af_inet6.c*/
static void ipv6_packet_cleanup(void)
{
    dev_remove_pack(&ipv6_packet_type);
}

/* net/core/dev.c */
/**
 *  dev_remove_pack  - remove packet handler
 *  @pt: packet type declaration
 *
 *  Remove a protocol handler that was previously added to the kernel
 *  protocol handlers by dev_add_pack(). The passed &packet_type is removed
 *  from the kernel lists and can be freed or reused once this function
 *  returns.
 *
 *  This call sleeps to guarantee that no CPU is looking at the packet
 *  type after return.
 */
void dev_remove_pack(struct packet_type *pt)
{
    __dev_remove_pack(pt);

    synchronize_net();
}
EXPORT_SYMBOL(dev_remove_pack);

/**
 *  __dev_remove_pack    - remove packet handler
 *  @pt: packet type declaration
 *
 *  Remove a protocol handler that was previously added to the kernel
 *  protocol handlers by dev_add_pack(). The passed &packet_type is removed
 *  from the kernel lists and can be freed or reused once this function
 *  returns.
 *
 *      The packet type might still be in use by receivers
 *  and must not be freed until after all the CPU's have gone
 *  through a quiescent state.
 */
void __dev_remove_pack(struct packet_type *pt)
{
    struct list_head *head;
    struct packet_type *pt1;

    spin_lock_bh(&ptype_lock);

    if (pt->type == htons(ETH_P_ALL))
        head = &ptype_all;
    else
        head = &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK];

    list_for_each_entry(pt1, head, list) {
        if (pt == pt1) {
            list_del_rcu(&pt->list);
            goto out;
        }
    }

    printk(KERN_WARNING "dev_remove_pack: %p not found.\n", pt);
out:
    spin_unlock_bh(&ptype_lock);
}
EXPORT_SYMBOL(__dev_remove_pack);

In net/ipv6/af_inet6.c, ipv6_packet_cleanup() unregisters the IPv6 protocol by calling dev_remove_pack().

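dev_remove_pack() is a wrapper around __dev_remove_pack(), which does the actual unlinking: it uses the packet_type's type to locate the list that holds it, walks that list, and removes the matching entry, printing a warning if the entry is not found. dev_remove_pack() then calls synchronize_net(), which waits for an RCU grace period, so that by the time dev_remove_pack() returns, no CPU is still using the removed (unregistered) packet_type and it can be freed or reused.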
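A single-threaded model of __dev_remove_pack() shows the lookup-then-unlink logic. This sketch uses hand-written list helpers and hypothetical names (ptype_head(), model_add_pack(), model_remove_pack()); because it has no concurrent readers there is nothing for synchronize_net() to wait for, and its list_del() stands in for list_del_rcu().

```c
#include <arpa/inet.h>  /* htons, ntohs */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)
#define ETH_P_ALL 0x0003

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n. The kernel's list_del_rcu() also poisons n->prev but
 * keeps n->next valid for readers still walking the list. */
static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

struct packet_type {
    uint16_t type;           /* network byte order */
    struct list_head list;
};

static struct list_head ptype_base[PTYPE_HASH_SIZE];
static struct list_head ptype_all;

static void ptype_tables_init(void)
{
    for (int i = 0; i < PTYPE_HASH_SIZE; i++)
        list_init(&ptype_base[i]);
    list_init(&ptype_all);
}

/* Choose the list exactly as dev_add_pack()/__dev_remove_pack() do. */
static struct list_head *ptype_head(const struct packet_type *pt)
{
    if (pt->type == htons(ETH_P_ALL))
        return &ptype_all;
    return &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK];
}

static void model_add_pack(struct packet_type *pt)
{
    list_add(&pt->list, ptype_head(pt));
}

/* Model of __dev_remove_pack(): search, unlink, or warn. */
static bool model_remove_pack(struct packet_type *pt)
{
    struct list_head *head = ptype_head(pt);

    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        if (pos == &pt->list) {  /* same object, like pt == pt1 */
            list_del(pos);
            return true;
        }
    }
    fprintf(stderr, "dev_remove_pack: %p not found.\n", (void *)pt);
    return false;
}
```

Removing a handler that was never registered only prints the warning, mirroring the printk() in __dev_remove_pack().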

2.4 Invoking protocol handlers

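Handlers are invoked the same way for both protocol containers, ptype_base and ptype_all: the kernel walks the doubly linked list and, for every packet_type that matches, calls packet_type->func() either indirectly through deliver_skb() or directly. The difference is that ptype_all is itself a single doubly linked list and can be walked directly, whereas ptype_base is a hash table of doubly linked lists, so the bucket to walk must first be computed from skb->protocol.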

For example, the relevant code in netif_receive_skb() is:

int netif_receive_skb(struct sk_buff *skb)
{
    struct packet_type *ptype, *pt_prev;
    struct net_device *orig_dev;
    struct net_device *master;
    struct net_device *null_or_orig;
    struct net_device *null_or_bond;
    int ret = NET_RX_DROP;
    __be16 type;

    /* ... */

    pt_prev = NULL;
    rcu_read_lock();

    /* ... */

    list_for_each_entry_rcu(ptype, &ptype_all, list) {
        if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
            ptype->dev == orig_dev) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }

    /* ... */

    type = skb->protocol;
    list_for_each_entry_rcu(ptype,
                            &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
        if (ptype->type == type && (ptype->dev == null_or_orig ||
                                    ptype->dev == skb->dev || ptype->dev == orig_dev ||
                                    ptype->dev == null_or_bond)) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }

    if (pt_prev) {
        ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    } else {
        kfree_skb(skb);
        /* Jamal, now you will not able to escape explaining
         * me how you were going to use this. :-)
         */
        ret = NET_RX_DROP;
    }

out:
    rcu_read_unlock();
    return ret;
}
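The pt_prev variable implements a small reference-counting trick. Every matching handler except the last is called through deliver_skb(), which takes an extra reference on the skb (skb->users) before calling func; the last handler is called directly and inherits the caller's own reference; if nothing matched, the skb is freed. The toy model below assumes an integer users counter and handlers that always drop the reference they receive; all names are hypothetical and -1 merely stands in for NET_RX_DROP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: users plays the role of skb->users. */
struct sk_buff { int users; int delivered; };

struct handler {
    int (*func)(struct sk_buff *skb);
    int matches;  /* precomputed: does this handler want the skb? */
};

/* A handler that consumes the skb and drops its reference. */
static int consume(struct sk_buff *skb)
{
    skb->delivered++;
    skb->users--;  /* like kfree_skb() */
    return 0;
}

/* Model of deliver_skb(): take a reference, then call the handler. */
static int deliver(struct sk_buff *skb, const struct handler *h)
{
    skb->users++;
    return h->func(skb);
}

/* The pt_prev pattern from netif_receive_skb(). */
static int dispatch(struct sk_buff *skb, const struct handler *hs, size_t n)
{
    const struct handler *prev = NULL;
    int ret = -1;  /* stands in for NET_RX_DROP */

    for (size_t i = 0; i < n; i++) {
        if (!hs[i].matches)
            continue;
        if (prev)
            ret = deliver(skb, prev);
        prev = &hs[i];
    }
    if (prev) {
        ret = prev->func(skb);  /* last handler gets our reference */
    } else {
        skb->users--;           /* nobody wanted it: kfree_skb() */
        ret = -1;
    }
    return ret;
}
```

In netif_receive_skb() the same pt_prev is carried from the ptype_all loop into the ptype_base loop, so a sniffer on ptype_all receives an extra reference through deliver_skb(), while the final protocol handler, such as ipv6_rcv(), consumes the original one.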
Original article: http://blog.163.com/vic_kk/blog/static/4947052420101045435182/
