The two best-known network protocol models are the 7-layer OSI model and the 5-layer TCP/IP model, shown in the figure below:
[Figure: Network protocol models]
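The rest of this article follows the TCP/IP model. As data moves between layers, the kernel has to know which protocol should process the data in transit. For example, at L2, once the NIC driver has received an skb, netif_receive_skb() must call the appropriate handler to pass the skb up to the upper-layer protocol. The remainder of this article describes how the Linux kernel organizes and invokes these handlers.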
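In the kernel, a protocol is represented by a struct packet_type. When the system initializes, or when the corresponding module is loaded, each protocol is added to ptype_base, a 16-bucket hash table in which every bucket is a doubly linked list. The function dev_add_pack() registers a given protocol (packet_type) with the system.
In addition, handlers registered for ETH_P_ALL are kept separately in the ptype_all list; these are used by sniffers.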
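To make the bucket selection concrete, here is a small user-space sketch (not kernel code) of how a type code maps onto ptype_base. It assumes PTYPE_HASH_SIZE is 16, as described above, and copies a few ETH_P_* values from include/linux/if_ether.h; the helper name ptype_bucket() is hypothetical.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 buckets, as in the kernel version discussed here. */
#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)

/* A few codes copied from include/linux/if_ether.h. */
#define ETH_P_ALL   0x0003
#define ETH_P_IP    0x0800
#define ETH_P_ARP   0x0806
#define ETH_P_8021Q 0x8100
#define ETH_P_IPV6  0x86DD

/*
 * Hypothetical helper: where would a packet_type with this type code
 * (stored in network byte order, like packet_type.type) be linked?
 * Returns the ptype_base bucket index, or -1 for the separate ptype_all list.
 */
static int ptype_bucket(uint16_t type_be)
{
    if (type_be == htons(ETH_P_ALL))
        return -1;
    /* Convert back to host order, then keep only the low 4 bits. */
    return ntohs(type_be) & PTYPE_HASH_MASK;
}
```

With 16 buckets only the low 4 bits of the type code matter, so ETH_P_IP (0x0800) and ETH_P_8021Q (0x8100) share bucket 0. That is why the receive path still compares ptype->type against the packet's type while walking a bucket.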
The overall layout is shown below:
[Figure: Kernel data structures of protocol handlers]
Protocols are registered with dev_add_pack(). We start with how a protocol is represented in the kernel.
struct packet_type is defined as follows:
```c
struct packet_type {
    __be16                  type;   /* This is really htons(ether_type). */
    struct net_device       *dev;   /* NULL is wildcarded here           */
    int                     (*func) (struct sk_buff *,
                                     struct net_device *,
                                     struct packet_type *,
                                     struct net_device *);
    struct sk_buff          *(*gso_segment)(struct sk_buff *skb,
                                            int features);
    int                     (*gso_send_check)(struct sk_buff *skb);
    struct sk_buff          **(*gro_receive)(struct sk_buff **head,
                                             struct sk_buff *skb);
    int                     (*gro_complete)(struct sk_buff *skb);
    void                    *af_packet_priv;
    struct list_head        list;
};
```
The main fields are:
type: the protocol code, which identifies the protocol type. The codes are defined in include/linux/if_ether.h:
```c
#define ETH_P_LOOP       0x0060  /* Ethernet Loopback packet */
#define ETH_P_PUP        0x0200  /* Xerox PUP packet */
#define ETH_P_PUPAT      0x0201  /* Xerox PUP Addr Trans packet */
#define ETH_P_IP         0x0800  /* Internet Protocol packet */
#define ETH_P_X25        0x0805  /* CCITT X.25 */
#define ETH_P_ARP        0x0806  /* Address Resolution packet */
#define ETH_P_BPQ        0x08FF  /* G8BPQ AX.25 Ethernet Packet [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_IEEEPUP    0x0a00  /* Xerox IEEE802.3 PUP packet */
#define ETH_P_IEEEPUPAT  0x0a01  /* Xerox IEEE802.3 PUP Addr Trans packet */
#define ETH_P_DEC        0x6000  /* DEC Assigned proto */
#define ETH_P_DNA_DL     0x6001  /* DEC DNA Dump/Load */
#define ETH_P_DNA_RC     0x6002  /* DEC DNA Remote Console */
#define ETH_P_DNA_RT     0x6003  /* DEC DNA Routing */
#define ETH_P_LAT        0x6004  /* DEC LAT */
#define ETH_P_DIAG       0x6005  /* DEC Diagnostics */
#define ETH_P_CUST       0x6006  /* DEC Customer use */
#define ETH_P_SCA        0x6007  /* DEC Systems Comms Arch */
#define ETH_P_TEB        0x6558  /* Trans Ether Bridging */
#define ETH_P_RARP       0x8035  /* Reverse Addr Res packet */
#define ETH_P_ATALK      0x809B  /* Appletalk DDP */
#define ETH_P_AARP       0x80F3  /* Appletalk AARP */
#define ETH_P_8021Q      0x8100  /* 802.1Q VLAN Extended Header */
#define ETH_P_IPX        0x8137  /* IPX over DIX */
#define ETH_P_IPV6       0x86DD  /* IPv6 over bluebook */
#define ETH_P_PAUSE      0x8808  /* IEEE Pause frames. See 802.3 31B */
#define ETH_P_SLOW       0x8809  /* Slow Protocol. See 802.3ad 43B */
#define ETH_P_WCCP       0x883E  /* Web-cache coordination protocol defined in draft-wilson-wrec-wccp-v2-00.txt */
#define ETH_P_PPP_DISC   0x8863  /* PPPoE discovery messages */
#define ETH_P_PPP_SES    0x8864  /* PPPoE session messages */
#define ETH_P_MPLS_UC    0x8847  /* MPLS Unicast traffic */
#define ETH_P_MPLS_MC    0x8848  /* MPLS Multicast traffic */
#define ETH_P_ATMMPOA    0x884c  /* MultiProtocol Over ATM */
#define ETH_P_ATMFATE    0x8884  /* Frame-based ATM Transport over Ethernet */
#define ETH_P_PAE        0x888E  /* Port Access Entity (IEEE 802.1X) */
#define ETH_P_AOE        0x88A2  /* ATA over Ethernet */
#define ETH_P_TIPC       0x88CA  /* TIPC */
#define ETH_P_1588       0x88F7  /* IEEE 1588 Timesync */
#define ETH_P_FCOE       0x8906  /* Fibre Channel over Ethernet */
#define ETH_P_FIP        0x8914  /* FCoE Initialization Protocol */
#define ETH_P_EDSA       0xDADA  /* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */

#define ETH_P_802_3      0x0001  /* Dummy type for 802.3 frames */
#define ETH_P_AX25       0x0002  /* Dummy protocol id for AX.25 */
#define ETH_P_ALL        0x0003  /* Every packet (be careful!!!) */
#define ETH_P_802_2      0x0004  /* 802.2 frames */
#define ETH_P_SNAP       0x0005  /* Internal only */
#define ETH_P_DDCMP      0x0006  /* DEC DDCMP: Internal only */
#define ETH_P_WAN_PPP    0x0007  /* Dummy type for WAN PPP frames*/
#define ETH_P_PPP_MP     0x0008  /* Dummy type for PPP MP frames */
#define ETH_P_LOCALTALK  0x0009  /* Localtalk pseudo type */
#define ETH_P_CAN        0x000C  /* Controller Area Network */
#define ETH_P_PPPTALK    0x0010  /* Dummy type for Atalk over PPP*/
#define ETH_P_TR_802_2   0x0011  /* 802.2 frames */
#define ETH_P_MOBITEX    0x0015  /* Mobitex (kaz@cafe.net) */
#define ETH_P_CONTROL    0x0016  /* Card specific control frames */
#define ETH_P_IRDA       0x0017  /* Linux-IrDA */
#define ETH_P_ECONET     0x0018  /* Acorn Econet */
#define ETH_P_HDLC       0x0019  /* HDLC frames */
#define ETH_P_ARCNET     0x001A  /* 1A for ArcNet :-) */
#define ETH_P_DSA        0x001B  /* Distributed Switch Arch. */
#define ETH_P_TRAILER    0x001C  /* Trailer switch tagging */
#define ETH_P_PHONET     0x00F5  /* Nokia Phonet frames */
#define ETH_P_IEEE802154 0x00F6  /* IEEE802.15.4 frame */
```
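dev: the device on which this protocol is enabled. A NULL dev acts as a wildcard, meaning any device.
func: the protocol's handler. For example, after the NIC receives data, netif_receive_skb() calls the func that matches skb->protocol to process the skb.
af_packet_priv: used by PF_PACKET sockets; it points to the sock structure of whoever created this packet_type.
list: the list node that links this packet_type into ptype_base or ptype_all.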
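The dev wildcard and the func callback can be illustrated with a small user-space sketch. Here struct net_device and struct sk_buff are hypothetical stand-ins that carry only a name and a device pointer, and ptype_wants_dev(), example_rcv() and maybe_call() are illustrative helpers, not kernel functions. Passing skb->dev as both dev and orig_dev is a simplification made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what this sketch needs. */
struct net_device { const char *name; };
struct sk_buff { struct net_device *dev; };

struct packet_type {
    unsigned short type;          /* host order here, for simplicity */
    struct net_device *dev;       /* NULL is wildcarded: any device */
    int (*func)(struct sk_buff *, struct net_device *,
                struct packet_type *, struct net_device *);
};

/* Does this handler want packets that arrived on dev? */
static int ptype_wants_dev(const struct packet_type *pt,
                           const struct net_device *dev)
{
    return pt->dev == NULL || pt->dev == dev;
}

static int rcv_calls;             /* counts how often example_rcv() ran */

/* Example handler with the same signature as packet_type->func. */
static int example_rcv(struct sk_buff *skb, struct net_device *dev,
                       struct packet_type *pt, struct net_device *orig_dev)
{
    (void)skb; (void)dev; (void)pt; (void)orig_dev;
    rcv_calls++;
    return 0;
}

/* Invoke pt->func if pt accepts the skb's device; -1 means "not for me". */
static int maybe_call(struct packet_type *pt, struct sk_buff *skb)
{
    if (!ptype_wants_dev(pt, skb->dev))
        return -1;
    return pt->func(skb, skb->dev, pt, skb->dev);
}
```

A handler registered with dev == NULL sees traffic from every interface, while one bound to eth0 silently ignores packets arriving on eth1.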
Registering a protocol takes two steps: first initialize its packet_type, then call dev_add_pack() to add it to ptype_base.
For example, here is how IPv6 is registered:
```c
static struct packet_type ipv6_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IPV6),
    .func = ipv6_rcv,
    .gso_send_check = ipv6_gso_send_check,
    .gso_segment = ipv6_gso_segment,
    .gro_receive = ipv6_gro_receive,
    .gro_complete = ipv6_gro_complete,
};

static int __init ipv6_packet_init(void)
{
    dev_add_pack(&ipv6_packet_type);
    return 0;
}

/**
 * dev_add_pack - add packet handler
 * @pt: packet type declaration
 *
 * Add a protocol handler to the networking stack. The passed &packet_type
 * is linked into kernel lists and may not be freed until it has been
 * removed from the kernel lists.
 *
 * This call does not sleep therefore it can not
 * guarantee all CPU's that are in middle of receiving packets
 * will see the new packet type (until the next received packet).
 */
void dev_add_pack(struct packet_type *pt)
{
    int hash;

    spin_lock_bh(&ptype_lock);
    if (pt->type == htons(ETH_P_ALL))
        list_add_rcu(&pt->list, &ptype_all);
    else {
        hash = ntohs(pt->type) & PTYPE_HASH_MASK;
        list_add_rcu(&pt->list, &ptype_base[hash]);
    }
    spin_unlock_bh(&ptype_lock);
}
EXPORT_SYMBOL(dev_add_pack);
```
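The code first defines and initializes ipv6_packet_type, setting type and the various handlers; ipv6_packet_init() then calls dev_add_pack() to add ipv6_packet_type to the kernel's ptype_base.
dev_add_pack() itself is simple: it checks the protocol's type code. If the code is ETH_P_ALL, pt is added to ptype_all; otherwise it is added to the doubly linked list in ptype_base selected by the hash value ntohs(pt->type) & PTYPE_HASH_MASK. Both cases run with ptype_lock held.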
dev_remove_pack() unregisters a protocol. Again using IPv6 as the example:
```c
/* net/ipv6/af_inet6.c */
static void ipv6_packet_cleanup(void)
{
    dev_remove_pack(&ipv6_packet_type);
}

/* net/core/dev.c */
/**
 * dev_remove_pack - remove packet handler
 * @pt: packet type declaration
 *
 * Remove a protocol handler that was previously added to the kernel
 * protocol handlers by dev_add_pack(). The passed &packet_type is removed
 * from the kernel lists and can be freed or reused once this function
 * returns.
 *
 * This call sleeps to guarantee that no CPU is looking at the packet
 * type after return.
 */
void dev_remove_pack(struct packet_type *pt)
{
    __dev_remove_pack(pt);

    synchronize_net();
}
EXPORT_SYMBOL(dev_remove_pack);

/**
 * __dev_remove_pack - remove packet handler
 * @pt: packet type declaration
 *
 * Remove a protocol handler that was previously added to the kernel
 * protocol handlers by dev_add_pack(). The passed &packet_type is removed
 * from the kernel lists and can be freed or reused once this function
 * returns.
 *
 * The packet type might still be in use by receivers
 * and must not be freed until after all the CPU's have gone
 * through a quiescent state.
 */
void __dev_remove_pack(struct packet_type *pt)
{
    struct list_head *head;
    struct packet_type *pt1;

    spin_lock_bh(&ptype_lock);

    if (pt->type == htons(ETH_P_ALL))
        head = &ptype_all;
    else
        head = &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK];

    list_for_each_entry(pt1, head, list) {
        if (pt == pt1) {
            list_del_rcu(&pt->list);
            goto out;
        }
    }

    printk(KERN_WARNING "dev_remove_pack: %p not found.\n", pt);
out:
    spin_unlock_bh(&ptype_lock);
}
EXPORT_SYMBOL(__dev_remove_pack);
```
In net/ipv6/af_inet6.c, ipv6_packet_cleanup() unregisters the IPv6 protocol by calling dev_remove_pack(), which does the actual work.
dev_remove_pack() is a wrapper around __dev_remove_pack(). The latter first uses the packet_type's type code to find the list it hangs on, then walks that list and removes the given packet_type, printing a warning if it is not found. Finally, dev_remove_pack() calls synchronize_net(), so that by the time it returns, no CPU in the kernel is still using the removed (unregistered) packet_type.
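A matching user-space sketch of the removal path follows, again built on a minimal list_head model with no locking. Like __dev_remove_pack(), the hypothetical remove_pack() searches only the chain the type hashes to and warns if the entry is missing; add_pack() and ptype_head() are likewise illustrative names. What it cannot model is synchronize_net(): in the kernel, that wait is what makes it safe to free the packet_type once dev_remove_pack() returns.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PTYPE_HASH_SIZE 16                 /* assumption, as above */
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)
#define ETH_P_ALL 0x0003

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;              /* no longer on any list */
}

/* Trimmed packet_type: only what add/remove need. */
struct packet_type {
    uint16_t type;                         /* network byte order */
    struct list_head list;
};

static struct list_head ptype_base[PTYPE_HASH_SIZE];
static struct list_head ptype_all;

static void ptype_init(void)
{
    for (int i = 0; i < PTYPE_HASH_SIZE; i++)
        init_list_head(&ptype_base[i]);
    init_list_head(&ptype_all);
}

/* The list a packet_type belongs on, chosen exactly as at registration. */
static struct list_head *ptype_head(const struct packet_type *pt)
{
    if (pt->type == htons(ETH_P_ALL))
        return &ptype_all;
    return &ptype_base[ntohs(pt->type) & PTYPE_HASH_MASK];
}

static void add_pack(struct packet_type *pt)
{
    assert(pt != NULL);
    list_add(&pt->list, ptype_head(pt));
}

/*
 * Hypothetical counterpart of __dev_remove_pack(): search only the chain
 * pt hashes to and unlink it. Returns 0 on success, or -1 after printing
 * a warning if pt was never registered. No locking, no synchronize_net().
 */
static int remove_pack(struct packet_type *pt)
{
    struct list_head *head = ptype_head(pt);

    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        if (pos == &pt->list) {
            list_del(pos);                 /* return at once: pos is now unlinked */
            return 0;
        }
    }
    fprintf(stderr, "remove_pack: %p not found.\n", (void *)pt);
    return -1;
}
```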
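For both protocol containers, ptype_base and ptype_all, handlers are invoked in a similar way: walk the doubly linked list and hand the skb to every packet_type that matches. Every match except the last is reached indirectly through deliver_skb(), and the last one has its packet_type->func() called directly once the walk is over. The difference is that ptype_all is itself a single doubly linked list and can be walked directly, whereas ptype_base is a hash table of doubly linked lists, so the bucket to walk must first be computed from skb->protocol.
For example, the relevant code in netif_receive_skb() is: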
```c
int netif_receive_skb(struct sk_buff *skb)
{
    struct packet_type *ptype, *pt_prev;
    struct net_device *orig_dev;
    struct net_device *master;
    struct net_device *null_or_orig;
    struct net_device *null_or_bond;
    int ret = NET_RX_DROP;
    __be16 type;

    /* ... */

    list_for_each_entry_rcu(ptype, &ptype_all, list) {
        if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
            ptype->dev == orig_dev) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }

    /* ... */

    type = skb->protocol;
    list_for_each_entry_rcu(ptype,
            &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
        if (ptype->type == type &&
            (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
             ptype->dev == orig_dev || ptype->dev == null_or_bond)) {
            if (pt_prev)
                ret = deliver_skb(skb, pt_prev, orig_dev);
            pt_prev = ptype;
        }
    }

    if (pt_prev) {
        ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    } else {
        kfree_skb(skb);
        /* Jamal, now you will not able to escape explaining
         * me how you were going to use this. :-)
         */
        ret = NET_RX_DROP;
    }

    /* ... */
}
```
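The pt_prev variable implements a small optimization. In the kernel, deliver_skb() increments skb->users before calling func, so each handler receives its own reference to the skb. By holding back the most recent match until the next one is found, the last matching handler can be called directly and take over the caller's original reference, saving one increment/decrement pair per packet. The sketch below models only this reference counting in user space; the trimmed struct sk_buff, the two-argument handler signature, count_rcv(), dispatch(), and the simplified kfree_skb() and deliver_skb() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical skb: just a reference count and a "was freed" flag. */
struct sk_buff { int users; int freed; };

struct packet_type;
typedef int (*rcv_fn)(struct sk_buff *, struct packet_type *);

struct packet_type {
    unsigned short type;
    rcv_fn func;
    int calls;                    /* counts deliveries, for the demo */
};

/* Drop one reference; the last one frees the buffer. */
static void kfree_skb(struct sk_buff *skb)
{
    assert(skb->users > 0);
    if (--skb->users == 0)
        skb->freed = 1;
}

/* Example handler: record the call, then release the reference it owns. */
static int count_rcv(struct sk_buff *skb, struct packet_type *pt)
{
    pt->calls++;
    kfree_skb(skb);
    return 0;
}

/* Like the kernel's deliver_skb(): take an extra reference, then call func. */
static int deliver_skb(struct sk_buff *skb, struct packet_type *pt)
{
    skb->users++;
    return pt->func(skb, pt);
}

/* Hand skb to every entry of pts[] whose type matches, using the pt_prev
 * pattern from netif_receive_skb(). Returns -1 if nobody matched. */
static int dispatch(struct sk_buff *skb, struct packet_type **pts,
                    size_t n, unsigned short type)
{
    struct packet_type *pt_prev = NULL;
    int ret = -1;

    for (size_t i = 0; i < n; i++) {
        if (pts[i]->type != type)
            continue;
        if (pt_prev)
            ret = deliver_skb(skb, pt_prev);  /* earlier match: extra ref */
        pt_prev = pts[i];
    }
    if (pt_prev) {
        ret = pt_prev->func(skb, pt_prev);    /* last match reuses our ref */
    } else {
        kfree_skb(skb);                       /* nobody wanted the packet */
        ret = -1;
    }
    return ret;
}
```

With two IPv4 handlers and one ARP handler registered, an IPv4 skb reaches both IPv4 handlers, each drops exactly the reference it was given, and the buffer is freed after the second handler returns.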
Original article: http://blog.163.com/vic_kk/blog/static/4947052420101045435182/