Linux packet processing

http://blog.chinaunix.net/uid-24148050-id-464587.html

http://blog.csdn.net/zhangskd/article/details/21469399

http://blog.csdn.net/weixiuc/article/details/2955565

http://blog.csdn.net/zhangskd/article/details/21627963 ---- NAPI

http://blog.csdn.net/qy532846454/article/details/6993695 ------- L4 processing

http://blog.chinaunix.net/uid-15014334-id-4411101.html  --- prose overview


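NAPI is Linux's newer API for processing data from network cards. The story goes that nobody could come up with a better name, so it was simply called NAPI (New API). It was introduced during the 2.5 development series.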

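Put simply, NAPI combines interrupt-driven and polled reception.
Interrupts respond promptly, and when traffic is light they cost little CPU time. Their drawback is that heavy traffic generates too many interrupts, and since every interrupt consumes a fair amount of CPU time, efficiency ends up lower than with polling. Polling is the opposite: it suits heavy traffic because each poll is cheap, but it keeps consuming CPU time even when little or no data arrives.
NAPI is a hybrid of the two: interrupts at low data rates, polling at high data rates. Normally the device is in interrupt mode. When data arrives, the interrupt handler runs, disables the device's receive interrupt, and hands processing over to a poll routine (run from the NET_RX softirq). Data that arrives in the meantime does not need to raise another interrupt, because the poll routine keeps draining the device, and the interrupt is re-enabled only once no new data remains.
Clearly, at very low and very high data rates NAPI gets the benefits of interrupts and of polling respectively, and performs well. When the rate is unstable and sits somewhere in between, NAPI spends a fair amount of time switching between the two modes, and efficiency is somewhat lower.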
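The switch between interrupt mode and poll mode described above can be illustrated with a small user-space model. This is only a sketch: struct sim_nic and every sim_* function are hypothetical names invented for the illustration, not kernel APIs, and the "interrupt" is just a function call.

```c
#include <stdbool.h>

/* Hypothetical model of a NIC; none of these names are kernel APIs. */
struct sim_nic {
    int rx_pending;      /* packets waiting in the device's RX ring */
    bool irq_enabled;    /* whether the RX interrupt is unmasked */
    bool poll_scheduled; /* whether the poll routine is queued to run */
    int irq_count;       /* interrupts actually taken */
    int processed;       /* packets handed to the stack */
};

/* A packet arrives: an interrupt is raised only if the IRQ is unmasked. */
static void sim_packet_arrives(struct sim_nic *nic)
{
    nic->rx_pending++;
    if (nic->irq_enabled) {
        /* "Interrupt handler": mask the IRQ and defer the work to polling. */
        nic->irq_count++;
        nic->irq_enabled = false;
        nic->poll_scheduled = true;
    }
}

/* Poll routine: handle at most `budget` packets per call. */
static int sim_poll(struct sim_nic *nic, int budget)
{
    int work = 0;

    while (work < budget && nic->rx_pending > 0) {
        nic->rx_pending--;
        nic->processed++;
        work++;
    }
    if (work < budget) {
        /* Ring drained: stop polling and return to interrupt mode. */
        nic->poll_scheduled = false;
        nic->irq_enabled = true;
    }
    return work;
}

/* Deliver a burst of packets, then poll until the ring is empty.
 * Returns the number of interrupts the burst cost. */
static int sim_burst(struct sim_nic *nic, int packets, int budget)
{
    int before = nic->irq_count;

    for (int i = 0; i < packets; i++)
        sim_packet_arrives(nic);
    while (nic->poll_scheduled)
        sim_poll(nic, budget);
    return nic->irq_count - before;
}
```

Starting from a struct sim_nic with irq_enabled set to true, a burst of 100 packets costs a single interrupt in this model, and so does a single isolated packet, which is the behaviour the paragraph above attributes to NAPI at the two extremes.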


At boot, Linux registers the following handlers.

In net_dev_init:

for_each_possible_cpu(i) {
    struct softnet_data *sd = &per_cpu(softnet_data, i);

    sd->backlog.poll = process_backlog; ------ called from the softirq when packets are processed
}

open_softirq(NET_TX_SOFTIRQ, net_tx_action);

open_softirq(NET_RX_SOFTIRQ, net_rx_action); ---------- registers the softirq that receives packets
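The registration pattern used by open_softirq can be pictured as a table of function pointers plus a pending bitmask. The following is a minimal user-space sketch under that assumption; the sim_* names are hypothetical, only the two softirq names are borrowed from the kernel, and the real implementation is per-CPU and considerably more involved.

```c
/* Hypothetical miniature of the softirq vector. */
enum { SIM_NET_TX_SOFTIRQ, SIM_NET_RX_SOFTIRQ, SIM_NR_SOFTIRQS };

typedef void (*sim_softirq_fn)(void);

static sim_softirq_fn sim_softirq_vec[SIM_NR_SOFTIRQS];
static unsigned int sim_pending; /* bit n set => softirq n has been raised */

/* Counterpart of open_softirq(): remember the handler for slot nr. */
static void sim_open_softirq(int nr, sim_softirq_fn fn)
{
    sim_softirq_vec[nr] = fn;
}

/* Mark a softirq as pending; the handler runs later, not here. */
static void sim_raise_softirq(int nr)
{
    sim_pending |= 1u << nr;
}

/* Run every pending handler once, lowest number first.
 * Returns how many handlers ran. */
static int sim_do_softirq(void)
{
    unsigned int pending = sim_pending;
    int ran = 0;

    sim_pending = 0;
    for (int nr = 0; nr < SIM_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && sim_softirq_vec[nr]) {
            sim_softirq_vec[nr]();
            ran++;
        }
    }
    return ran;
}

static int sim_rx_runs; /* incremented by the example RX handler */

/* Stand-in for net_rx_action in this sketch. */
static void sim_net_rx_action(void)
{
    sim_rx_runs++;
}
```

Raising the same softirq twice before it runs results in a single handler invocation here, since pending state is one bit per softirq.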


In inet_init:

static struct packet_type ip_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};

dev_add_pack(&ip_packet_type); ------------- registers the handler for the ETH_P_IP type
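What dev_add_pack achieves can be sketched as a registry keyed by EtherType. The version below is a simplified user-space illustration: the sim_* names are hypothetical, a plain linked list stands in for the kernel's lookup structures, and the type is kept in host byte order for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct packet_type and dev_add_pack(). */
struct sim_packet_type {
    uint16_t type;                              /* EtherType, host byte order */
    int (*func)(const uint8_t *l3, size_t len); /* L3 entry function */
    struct sim_packet_type *next;
};

static struct sim_packet_type *sim_ptype_list;

/* Add a handler to the front of the list. */
static void sim_dev_add_pack(struct sim_packet_type *pt)
{
    pt->next = sim_ptype_list;
    sim_ptype_list = pt;
}

/* Call every handler whose type matches; return the number of matches. */
static int sim_deliver(uint16_t ethertype, const uint8_t *l3, size_t len)
{
    int matches = 0;

    for (struct sim_packet_type *pt = sim_ptype_list; pt; pt = pt->next) {
        if (pt->type == ethertype) {
            pt->func(l3, len);
            matches++;
        }
    }
    return matches;
}

static int sim_ip_packets; /* packets seen by the example IPv4 handler */

static int sim_ip_rcv(const uint8_t *l3, size_t len)
{
    (void)l3;
    (void)len;
    sim_ip_packets++;
    return 0;
}

static struct sim_packet_type sim_ip_packet_type = {
    .type = 0x0800, /* ETH_P_IP */
    .func = sim_ip_rcv,
};
```

After sim_dev_add_pack(&sim_ip_packet_type), a frame with EtherType 0x0800 reaches sim_ip_rcv, while an ARP frame (0x0806) finds no handler in this sketch.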

The full list of protocol IDs:

#define ETH_P_LOOP 0x0060 /* Ethernet Loopback packet */
#define ETH_P_PUP 0x0200 /* Xerox PUP packet */
#define ETH_P_PUPAT 0x0201 /* Xerox PUP Addr Trans packet */
#define ETH_P_IP 0x0800 /* Internet Protocol packet */
#define ETH_P_X25 0x0805 /* CCITT X.25 */
#define ETH_P_ARP 0x0806 /* Address Resolution packet */
#define ETH_P_BPQ 0x08FF /* G8BPQ AX.25 Ethernet Packet [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_IEEEPUP 0x0a00 /* Xerox IEEE802.3 PUP packet */
#define ETH_P_IEEEPUPAT 0x0a01 /* Xerox IEEE802.3 PUP Addr Trans packet */
#define ETH_P_BATMAN 0x4305 /* B.A.T.M.A.N.-Advanced packet [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_DEC 0x6000 /* DEC Assigned proto */
#define ETH_P_DNA_DL 0x6001 /* DEC DNA Dump/Load */
#define ETH_P_DNA_RC 0x6002 /* DEC DNA Remote Console */
#define ETH_P_DNA_RT 0x6003 /* DEC DNA Routing */
#define ETH_P_LAT 0x6004 /* DEC LAT */
#define ETH_P_DIAG 0x6005 /* DEC Diagnostics */
#define ETH_P_CUST 0x6006 /* DEC Customer use */
#define ETH_P_SCA 0x6007 /* DEC Systems Comms Arch */
#define ETH_P_TEB 0x6558 /* Trans Ether Bridging */
#define ETH_P_RARP 0x8035 /* Reverse Addr Res packet */
#define ETH_P_ATALK 0x809B /* Appletalk DDP */
#define ETH_P_AARP 0x80F3 /* Appletalk AARP */
#define ETH_P_8021Q 0x8100 /* 802.1Q VLAN Extended Header */
#define ETH_P_IPX 0x8137 /* IPX over DIX */
#define ETH_P_IPV6 0x86DD /* IPv6 over bluebook */
#define ETH_P_PAUSE 0x8808 /* IEEE Pause frames. See 802.3 31B */
#define ETH_P_SLOW 0x8809 /* Slow Protocol. See 802.3ad 43B */
#define ETH_P_WCCP 0x883E /* Web-cache coordination protocol
                           * defined in draft-wilson-wrec-wccp-v2-00.txt */
#define ETH_P_PPP_DISC 0x8863 /* PPPoE discovery messages */
#define ETH_P_PPP_SES 0x8864 /* PPPoE session messages */
#define ETH_P_MPLS_UC 0x8847 /* MPLS Unicast traffic */
#define ETH_P_MPLS_MC 0x8848 /* MPLS Multicast traffic */
#define ETH_P_ATMMPOA 0x884c /* MultiProtocol Over ATM */
#define ETH_P_LINK_CTL 0x886c /* HPNA, wlan link local tunnel */
#define ETH_P_ATMFATE 0x8884 /* Frame-based ATM Transport
                              * over Ethernet
                              */
#define ETH_P_PAE 0x888E /* Port Access Entity (IEEE 802.1X) */
#define ETH_P_AOE 0x88A2 /* ATA over Ethernet */
#define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */
#define ETH_P_802_EX1 0x88B5 /* 802.1 Local Experimental 1. */
#define ETH_P_TIPC 0x88CA /* TIPC */
#define ETH_P_8021AH 0x88E7 /* 802.1ah Backbone Service Tag */
#define ETH_P_MVRP 0x88F5 /* 802.1Q MVRP */
#define ETH_P_1588 0x88F7 /* IEEE 1588 Timesync */
#define ETH_P_FCOE 0x8906 /* Fibre Channel over Ethernet */
#define ETH_P_TDLS 0x890D /* TDLS */
#define ETH_P_FIP 0x8914 /* FCoE Initialization Protocol */
#define ETH_P_QINQ1 0x9100 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_QINQ2 0x9200 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_QINQ3 0x9300 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_EDSA 0xDADA /* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
#define ETH_P_AF_IUCV 0xFBFB /* IBM af_iucv [ NOT AN OFFICIALLY REGISTERED ID ] */


#define ETH_P_802_3_MIN 0x0600 /* If the value in the ethernet type is greater than or equal to this value
                                * then the frame is Ethernet II. Else it is 802.3 */


/*
 * Non DIX types. Won't clash for 1500 types.
 */


#define ETH_P_802_3 0x0001 /* Dummy type for 802.3 frames */
#define ETH_P_AX25 0x0002 /* Dummy protocol id for AX.25 */
#define ETH_P_ALL 0x0003 /* Every packet (be careful!!!) */
#define ETH_P_802_2 0x0004 /* 802.2 frames */
#define ETH_P_SNAP 0x0005 /* Internal only */
#define ETH_P_DDCMP 0x0006 /* DEC DDCMP: Internal only */
#define ETH_P_WAN_PPP 0x0007 /* Dummy type for WAN PPP frames */
#define ETH_P_PPP_MP 0x0008 /* Dummy type for PPP MP frames */
#define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type */
#define ETH_P_CAN 0x000C /* CAN: Controller Area Network */
#define ETH_P_CANFD 0x000D /* CANFD: CAN flexible data rate */
#define ETH_P_PPPTALK 0x0010 /* Dummy type for Atalk over PPP */
#define ETH_P_TR_802_2 0x0011 /* 802.2 frames */
#define ETH_P_MOBITEX 0x0015 /* Mobitex */
#define ETH_P_CONTROL 0x0016 /* Card specific control frames */
#define ETH_P_IRDA 0x0017 /* Linux-IrDA */
#define ETH_P_ECONET 0x0018 /* Acorn Econet */
#define ETH_P_HDLC 0x0019 /* HDLC frames */
#define ETH_P_ARCNET 0x001A /* 1A for ArcNet :-) */
#define ETH_P_DSA 0x001B /* Distributed Switch Arch. */
#define ETH_P_TRAILER 0x001C /* Trailer switch tagging */
#define ETH_P_PHONET 0x00F5 /* Nokia Phonet frames */
#define ETH_P_IEEE802154 0x00F6 /* IEEE802.15.4 frame */
#define ETH_P_CAIF 0x00F7 /* ST-Ericsson CAIF protocol */
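The role of ETH_P_802_3_MIN in the list above is to separate the two meanings of the 16-bit field that follows the source MAC address: values of 0x0600 and above are Ethernet II EtherTypes, values up to 1500 are IEEE 802.3 payload lengths. A minimal sketch of that classification follows; the enum and function names are hypothetical, and the value is assumed to be already converted to host byte order.

```c
#include <stdint.h>

/* Hypothetical result type for this illustration. */
enum sim_frame_kind {
    SIM_FRAME_8023_LENGTH, /* field is an 802.3 payload length */
    SIM_FRAME_ETHERTYPE,   /* field is an Ethernet II EtherType */
    SIM_FRAME_UNDEFINED    /* 1501..1535: neither a length nor a type */
};

/* Classify the type/length field (host byte order). */
static enum sim_frame_kind sim_classify_type_field(uint16_t value)
{
    if (value >= 0x0600) /* ETH_P_802_3_MIN */
        return SIM_FRAME_ETHERTYPE;
    if (value <= 1500)   /* largest 802.3 payload length */
        return SIM_FRAME_8023_LENGTH;
    return SIM_FRAME_UNDEFINED;
}
```

For example, 0x0800 (ETH_P_IP) is classified as an EtherType, while 46 is classified as a length.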


Also in inet_init, the L4 protocols are registered:

if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)
    pr_crit("%s: Cannot add ICMP protocol\n", __func__);
if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)
    pr_crit("%s: Cannot add UDP protocol\n", __func__);
if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)
    pr_crit("%s: Cannot add TCP protocol\n", __func__);
#ifdef CONFIG_IP_MULTICAST
if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)
    pr_crit("%s: Cannot add IGMP protocol\n", __func__);
#endif

enum {
  IPPROTO_IP = 0, /* Dummy protocol for TCP */
  IPPROTO_ICMP = 1, /* Internet Control Message Protocol */
  IPPROTO_IGMP = 2, /* Internet Group Management Protocol */
  IPPROTO_IPIP = 4, /* IPIP tunnels (older KA9Q tunnels use 94) */
  IPPROTO_TCP = 6, /* Transmission Control Protocol */
  IPPROTO_EGP = 8, /* Exterior Gateway Protocol */
  IPPROTO_PUP = 12, /* PUP protocol */
  IPPROTO_UDP = 17, /* User Datagram Protocol */
  IPPROTO_IDP = 22, /* XNS IDP protocol */
  IPPROTO_DCCP = 33, /* Datagram Congestion Control Protocol */
  IPPROTO_RSVP = 46, /* RSVP protocol */
  IPPROTO_GRE = 47, /* Cisco GRE tunnels (rfc 1701,1702) */

  IPPROTO_IPV6 = 41, /* IPv6-in-IPv4 tunnelling */

  IPPROTO_ESP = 50, /* Encapsulation Security Payload protocol */
  IPPROTO_AH = 51, /* Authentication Header protocol */
  IPPROTO_BEETPH = 94, /* IP option pseudo header for BEET */
  IPPROTO_PIM = 103, /* Protocol Independent Multicast */

  IPPROTO_COMP = 108, /* Compression Header protocol */
  IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
  IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */

  IPPROTO_RAW = 255, /* Raw IP packets */
  IPPROTO_MAX
};
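The effect of inet_add_protocol can be sketched as filling one slot of a 256-entry table indexed by the IPv4 protocol number, which the receive path later consults. The code below is a user-space illustration only: the sim_* names are hypothetical, and the real kernel table is updated atomically and read under RCU, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_INET_PROTOS 256

/* Hypothetical stand-in for struct net_protocol. */
struct sim_net_protocol {
    int (*handler)(const uint8_t *l4, size_t len);
};

static const struct sim_net_protocol *sim_inet_protos[SIM_MAX_INET_PROTOS];

/* Register a handler; returns 0 on success, -1 if the slot is taken. */
static int sim_inet_add_protocol(const struct sim_net_protocol *prot,
                                 uint8_t num)
{
    if (sim_inet_protos[num])
        return -1;
    sim_inet_protos[num] = prot;
    return 0;
}

/* Dispatch on the protocol field of the IPv4 header.
 * Returns the handler's result, or -1 if nothing is registered. */
static int sim_local_deliver(uint8_t protocol, const uint8_t *l4, size_t len)
{
    const struct sim_net_protocol *ipprot = sim_inet_protos[protocol];

    if (!ipprot)
        return -1;
    return ipprot->handler(l4, len);
}

/* Example handler: just reports the segment length it was given. */
static int sim_udp_handler(const uint8_t *l4, size_t len)
{
    (void)l4;
    return (int)len;
}

static const struct sim_net_protocol sim_udp_protocol = {
    .handler = sim_udp_handler,
};
```

Registering sim_udp_protocol under number 17 (IPPROTO_UDP) makes sim_local_deliver(17, ...) reach the handler; a second registration for the same number fails, which corresponds to the "< 0" checks in the inet_init snippet.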


struct sk_buff {
    struct sock *sk;          ------- owning socket
    struct net_device *dev;   ------- owning device

    unsigned int len,
                 data_len;
    __u16 mac_len,
          hdr_len;

    __be16 inner_protocol;
    __u16 inner_transport_header;
    __u16 inner_network_header;
    __u16 inner_mac_header;
    __u16 transport_header;
    __u16 network_header;
    __u16 mac_header;
    /* These elements must be at the end, see alloc_skb() for details.  */
    sk_buff_data_t tail;
    sk_buff_data_t end;
    unsigned char *head,
                  *data;
    unsigned int truesize;
    atomic_t users;

    __u8 pkt_type:3;    -------- packet type: PACKET_HOST, PACKET_BROADCAST, PACKET_MULTICAST, etc.
    __be16 protocol;    -------- protocol type, e.g. ETH_P_802_3

};
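The head, data, tail and end fields above delimit the buffer and the currently valid bytes inside it. The sketch below shows how the three basic pointer operations move them. It is a reduced user-space model with hypothetical sim_* names: plain pointers are used for all four fields, the storage is a fixed array, and running out of room returns NULL, whereas the real helpers treat overruns as fatal bugs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced buffer modelled on sk_buff. */
struct sim_skb {
    uint8_t *head, *data, *tail, *end;
    unsigned int len;     /* number of valid bytes: tail - data */
    uint8_t storage[256];
};

/* Empty buffer with `headroom` bytes reserved in front of data. */
static void sim_skb_init(struct sim_skb *skb, unsigned int headroom)
{
    skb->head = skb->storage;
    skb->end = skb->storage + sizeof skb->storage;
    skb->data = skb->tail = skb->head + headroom;
    skb->len = 0;
}

/* Append n bytes at the tail; returns the start of the new area. */
static uint8_t *sim_skb_put(struct sim_skb *skb, unsigned int n)
{
    uint8_t *old_tail = skb->tail;

    if ((size_t)(skb->end - skb->tail) < n)
        return NULL;
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Prepend n bytes (e.g. add a header); returns the new data pointer. */
static uint8_t *sim_skb_push(struct sim_skb *skb, unsigned int n)
{
    if ((size_t)(skb->data - skb->head) < n)
        return NULL;
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Strip n bytes from the front (e.g. skip a header). */
static uint8_t *sim_skb_pull(struct sim_skb *skb, unsigned int n)
{
    if (skb->len < n)
        return NULL;
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

On receive, each layer pulls its own header so that data points at the next layer; on transmit, each layer pushes its header into the headroom in front of the payload.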

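The non-NAPI case:

Top-half processing flow:

The driver calls netif_rx, which runs ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail); to put the packet on the backlog queue.

The driver normally calls eth_type_trans first, which has to perform the following steps (this part has caused bugs before!):
skb_reset_mac_header(skb);
skb_pull_inline(skb, ETH_HLEN); ----- moves data so that it points at the L3 header, then determines the packet type, etc.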
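The two steps above, followed by the packet-type decision, can be sketched in user space as follows. This is an illustration rather than the kernel function: struct sim_frame and all sim_* names are hypothetical, the destination address is compared against a single device address, and the handling of 802.3 length fields is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_ETH_ALEN 6  /* bytes in a MAC address */
#define SIM_ETH_HLEN 14 /* dst MAC + src MAC + type */

enum sim_pkt_type {
    SIM_PACKET_HOST,
    SIM_PACKET_BROADCAST,
    SIM_PACKET_MULTICAST,
    SIM_PACKET_OTHERHOST
};

/* Hypothetical reduced view of a received frame. */
struct sim_frame {
    const uint8_t *mac_header; /* start of the Ethernet header */
    const uint8_t *data;       /* current parse position */
    size_t len;                /* bytes remaining from data */
    enum sim_pkt_type pkt_type;
};

/* Returns the EtherType in host byte order, or 0 if the frame is too short. */
static uint16_t sim_eth_type_trans(struct sim_frame *f,
                                   const uint8_t dev_addr[SIM_ETH_ALEN])
{
    static const uint8_t bcast[SIM_ETH_ALEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    const uint8_t *dst;

    if (f->len < SIM_ETH_HLEN)
        return 0;

    f->mac_header = f->data; /* like skb_reset_mac_header() */
    dst = f->mac_header;
    f->data += SIM_ETH_HLEN; /* like skb_pull_inline(skb, ETH_HLEN) */
    f->len -= SIM_ETH_HLEN;

    if (memcmp(dst, bcast, SIM_ETH_ALEN) == 0)
        f->pkt_type = SIM_PACKET_BROADCAST;
    else if (dst[0] & 0x01) /* group bit set */
        f->pkt_type = SIM_PACKET_MULTICAST;
    else if (memcmp(dst, dev_addr, SIM_ETH_ALEN) != 0)
        f->pkt_type = SIM_PACKET_OTHERHOST;
    else
        f->pkt_type = SIM_PACKET_HOST;

    /* The type field is big-endian on the wire. */
    return (uint16_t)((f->mac_header[12] << 8) | f->mac_header[13]);
}
```

Forgetting either step leaves later layers reading from the wrong offset, which is presumably the kind of mistake the warning above refers to.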

When a packet has to be copied, consider skb_copy_expand. To attach the skb to a socket, use skb_set_owner_w.


Bottom-half processing flow (see also http://blog.csdn.net/weixiuc/article/details/2955569):

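Without NAPI, the bottom-half flow for a received packet is:

net_rx_action // softirq
    |--> process_backlog() // default poll
               |--> __netif_receive_skb (calls skb_reset_network_header) // L2 handler; L2 information may still be inspected here (e.g. by br_handle_frame) although skb->data already points at the L3 header; iterates over the registered L3 entry functions
                            |--> ip_rcv() // L3 entry; runs the Netfilter hook functions internally
                                       |---> tcp_v4_rcv etc. // L4 entry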

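net_rx_action invokes the poll function of each scheduled napi_struct through work = n->poll(n, weight);

If the NIC driver does not support NAPI, the default napi_struct->poll() function is process_backlog(). process_backlog calls __netif_receive_skb, which in turn calls __netif_receive_skb_core.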
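The way net_rx_action shares CPU time between poll functions can be sketched as a loop with a per-instance weight and an overall budget. The model below is a simplification in user space: struct sim_napi and the sim_* functions are hypothetical, a fixed array replaces the kernel's poll list, and the time limit that the real softirq also enforces is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-in for a napi_struct and its queue. */
struct sim_napi {
    int backlog;   /* packets waiting for this instance */
    int weight;    /* upper bound on packets handled per poll call */
    int scheduled; /* nonzero while the instance wants to be polled */
};

/* Backlog-style poll: consume up to `weight` packets and stop being
 * scheduled once fewer than `weight` were available. */
static int sim_process_backlog(struct sim_napi *n, int weight)
{
    int work = n->backlog < weight ? n->backlog : weight;

    n->backlog -= work;
    if (work < weight)
        n->scheduled = 0;
    return work;
}

/* Poll scheduled instances in turn until all are idle or the budget is
 * used up. Returns the number of packets processed. */
static int sim_rx_action_loop(struct sim_napi *list, size_t count, int budget)
{
    int done = 0;
    int progress = 1;

    while (budget > 0 && progress) {
        progress = 0;
        for (size_t i = 0; i < count && budget > 0; i++) {
            struct sim_napi *n = &list[i];
            int work;

            if (!n->scheduled)
                continue;
            work = sim_process_backlog(n, n->weight);
            budget -= work;
            done += work;
            progress = 1;
        }
    }
    return done;
}
```

An instance that still has packets when the budget runs out stays scheduled, so it is picked up again the next time the loop runs; this keeps one busy device from starving the rest of the system.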

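       If the interface has been attached to a bridge, that is, br_add_if has run err = netdev_rx_handler_register(dev, br_handle_frame, p);, then dev->rx_handler is br_handle_frame. During packet processing __netif_receive_skb_core and then br_handle_frame run in turn, so the bridge-layer filters and the IP-layer filters are applied in sequence.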

This is the ebtables functionality (http://ebtables.sourceforge.net/misc/ebtables-man.html              http://www.cnblogs.com/peteryj/archive/2011/07/24/2115602.html)

Based on the protocol type, the L3 handler is selected and invoked via ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);, for example ip_rcv.

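ip_rcv executes NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish);, which runs the netfilter hooks. These hooks are all registered through nf_register_hooks or nf_register_hook, for example in br_netfilter_init. Netfilter relies mainly on four techniques: connection tracking, packet filtering, network address translation (NAT), and packet mangling. --------- This is the iptables functionality. iptables, ebtables, arptables and so on are all implemented on top of netfilter.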
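The NF_HOOK pattern, where a chain of registered hooks decides whether the continuation function (ip_rcv_finish in the call above) runs at all, can be sketched as follows. This is a user-space illustration with hypothetical sim_* names; only the two verdicts are modelled, and a single global chain stands in for the per-protocol, per-hook-point chains of the real netfilter.

```c
#define SIM_MAX_HOOKS 8

enum { SIM_NF_DROP = 0, SIM_NF_ACCEPT = 1 };

/* Hypothetical packet: a single integer mark is enough for the demo. */
struct sim_pkt {
    int mark;
};

typedef int (*sim_hookfn)(struct sim_pkt *pkt);

struct sim_hook {
    sim_hookfn fn;
    int priority; /* lower values run first */
};

static struct sim_hook sim_hooks[SIM_MAX_HOOKS];
static int sim_nr_hooks;

/* Insert a hook, keeping the chain sorted by ascending priority.
 * Returns 0 on success, -1 if the chain is full. */
static int sim_register_hook(sim_hookfn fn, int priority)
{
    int i;

    if (sim_nr_hooks == SIM_MAX_HOOKS)
        return -1;
    for (i = sim_nr_hooks; i > 0 && sim_hooks[i - 1].priority > priority; i--)
        sim_hooks[i] = sim_hooks[i - 1];
    sim_hooks[i].fn = fn;
    sim_hooks[i].priority = priority;
    sim_nr_hooks++;
    return 0;
}

/* Run the chain; call okfn only if every hook accepts the packet.
 * Returns okfn's result, or -1 if some hook dropped the packet. */
static int sim_nf_hook(struct sim_pkt *pkt, int (*okfn)(struct sim_pkt *))
{
    for (int i = 0; i < sim_nr_hooks; i++) {
        if (sim_hooks[i].fn(pkt) == SIM_NF_DROP)
            return -1;
    }
    return okfn(pkt);
}

/* Example "mangle" hook: sets bit 0 of the mark and accepts. */
static int sim_mark_hook(struct sim_pkt *pkt)
{
    pkt->mark |= 1;
    return SIM_NF_ACCEPT;
}

/* Example "filter" hook: drops packets that carry bit 1 in the mark. */
static int sim_filter_hook(struct sim_pkt *pkt)
{
    return (pkt->mark & 2) ? SIM_NF_DROP : SIM_NF_ACCEPT;
}

/* Example continuation, playing the role of ip_rcv_finish. */
static int sim_finish(struct sim_pkt *pkt)
{
    return pkt->mark;
}
```

Because the chain is ordered by priority, the order in which modules register does not decide the order in which their hooks see a packet.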

Finally ip_rcv_finish runs, followed by ip_route_input_noref and so on, and then ip_local_deliver_finish, which calls ipprot->handler(skb); according to the L4 protocol.

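At L4, UDP for example runs udp_rcv, __udp4_lib_rcv, udp_queue_rcv_skb, __udp_queue_rcv_skb, sock_queue_rcv_skb, sk->sk_data_ready, and sock_def_readable: the socket is looked up by port, the corresponding receive function is called, and the process that owns the socket is woken up.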
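The last step of the chain, finding the socket by destination port, queueing the datagram and signalling the owner, can be sketched like this. It is a user-space model with hypothetical sim_* names: a small array replaces the kernel's UDP hash tables, only the destination port is matched (no addresses), and "waking the process" is reduced to a counter bumped by the data-ready callback.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_SOCKS 4
#define SIM_QUEUE_LEN 4

/* Hypothetical reduced socket. */
struct sim_sock {
    uint16_t port; /* bound local port, host byte order */
    int queued;    /* datagrams waiting in the receive queue */
    int wakeups;   /* times the data-ready callback fired */
    void (*data_ready)(struct sim_sock *sk);
};

static struct sim_sock *sim_socks[SIM_MAX_SOCKS];

/* Default callback, playing the role of sock_def_readable. */
static void sim_def_readable(struct sim_sock *sk)
{
    sk->wakeups++;
}

/* Bind a socket to a port; returns 0, or -1 if the table is full. */
static int sim_bind(struct sim_sock *sk, uint16_t port)
{
    for (int i = 0; i < SIM_MAX_SOCKS; i++) {
        if (!sim_socks[i]) {
            sk->port = port;
            sk->queued = 0;
            sk->wakeups = 0;
            sk->data_ready = sim_def_readable;
            sim_socks[i] = sk;
            return 0;
        }
    }
    return -1;
}

/* UDP header layout: src port (2), dst port (2), length (2), checksum (2).
 * Returns 0 if the datagram was queued, -1 if it was discarded. */
static int sim_udp_deliver(const uint8_t *udp, size_t len)
{
    uint16_t dport;

    if (len < 8)
        return -1; /* shorter than a UDP header */
    dport = (uint16_t)((udp[2] << 8) | udp[3]);

    for (int i = 0; i < SIM_MAX_SOCKS; i++) {
        struct sim_sock *sk = sim_socks[i];

        if (sk && sk->port == dport) {
            if (sk->queued == SIM_QUEUE_LEN)
                return -1; /* receive queue full */
            sk->queued++;
            sk->data_ready(sk); /* notify the owner of the socket */
            return 0;
        }
    }
    return -1; /* no socket bound to this port */
}
```

A datagram addressed to a port nobody has bound is simply discarded in this sketch.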


