Linux routing table: the route command

1. Preface

https://blog.csdn.net/vevenlcf/article/details/48026965

The difference between host routes and network routes is described here: https://blog.csdn.net/buhuiguowang/article/details/81026050

2. route command parameters

mike@ubuntu:~/workspace/DCU-LEDE$ man route > log.txt
mike@ubuntu:~/workspace/DCU-LEDE$ cat log.txt 
ROUTE(8)             Linux System Administrator's Manual             ROUTE(8)

NAME
       route - show / manipulate the IP routing table

SYNOPSIS
       route [-CFvnNee] [-A family |-4|-6]

       route  [-v]  [-A  family  |-4|-6] add [-net|-host] target [netmask Nm]
              [gw Gw] [metric N] [mss M] [window W] [irtt I]  [reject]  [mod]
              [dyn] [reinstate] [[dev] If]

       route  [-v]  [-A  family |-4|-6] del [-net|-host] target [gw Gw] [net‐
              mask Nm] [metric M] [[dev] If]

       route  [-V] [--version] [-h] [--help]

DESCRIPTION
       Route manipulates the kernel's IP routing tables.  Its primary use  is
       to set up static routes to specific hosts or networks via an interface
       after it has been configured with the ifconfig(8) program.

       When the add or del options  are  used,  route  modifies  the  routing
       tables.  Without these options, route displays the current contents of
       the routing tables.

OPTIONS
       -A family
              use the specified address family (eg `inet'). Use route  --help
              for  a full list. You can use -6 as an alias for --inet6 and -4
              as an alias for -A inet

       -F     operate on the kernel's FIB (Forwarding Information Base) rout‐
              ing table.  This is the default.

       -C     operate on the kernel's routing cache.

       -v     select verbose operation.

       -n     show  numerical  addresses  instead of trying to determine sym‐
              bolic host names. This is useful if you are trying to determine
              why the route to your nameserver has vanished.

       -e     use  netstat(8)-format  for  displaying the routing table.  -ee
              will generate a very long line with  all  parameters  from  the
              routing table.

       del    delete a route.

       add    add a new route.

       target the  destination  network or host. You can provide an addresses
              or symbolic network or host name. Optionally you can use  /pre‐
              fixlen notation instead of using the netmask option.

       -net   the target is a network.

       -host  the target is a host.

       netmask NM
              when adding a network route, the netmask to be used.

       gw GW  route packets via a gateway.
              NOTE:  The specified gateway must be reachable first. This usu‐
              ally means that you have to set up a static route to the  gate‐
              way beforehand. If you specify the address of one of your local
              interfaces, it will be used to decide about  the  interface  to
              which the packets should be routed to. This is a BSDism compat‐
              ibility hack.

       metric M
              set the metric field in the routing table (used by routing dae‐
              mons)  to  M.  If  this  option is not specified the metric for
              inet6 (IPv6) address family defaults to '1', for inet (IPv4) it
              defaults  to  '0'. You should always specify an explicit metric
              value to not rely on those defaults -  they  also  differ  from
              iproute2.

       mss M  sets  MTU  (Maximum Transmission Unit) of the route to M bytes.
              Note that the current implementation of the route command  does
              not allow the option to set the Maximum Segment Size (MSS).

       window W
              set  the  TCP  window size for connections over this route to W
              bytes. This is typically only used on AX.25 networks  and  with
              drivers unable to handle back to back frames.

       irtt I set the initial round trip time (irtt) for TCP connections over
              this route to I milliseconds (1-12000). This is typically  only
              used  on  AX.25  networks.  If  omitted the RFC 1122 default of
              300ms is used.

       reject install a blocking route, which will force a  route  lookup  to
              fail.   This  is  for  example used to mask out networks before
              using the default route. This is NOT for firewalling.

       mod, dyn, reinstate
              install a dynamic or modified route. These flags are for  diag‐
              nostic purposes, and are generally only set by routing daemons.

       dev If force  the route to be associated with the specified device, as
              the kernel will otherwise try to determine the  device  on  its
              own  (by checking already existing routes and device specifica‐
              tions, and where the route is added to). In  most  normal  net‐
              works you won't need this.

              If  dev If is the last option on the command line, the word dev
              may be omitted, as it's the default. Otherwise the order of the
              route modifiers (metric netmask gw dev) doesn't matter.

EXAMPLES
       route add -net 127.0.0.0 netmask 255.0.0.0 metric 1024 dev lo
              adds  the  normal  loopback  entry, using netmask 255.0.0.0 and
              associated with the "lo" device (assuming this device was  pre‐
              viously set up correctly with ifconfig(8)).

       route add -net 192.56.76.0 netmask 255.255.255.0 metric 1024 dev eth0
              adds  a route to the local network 192.56.76.x via "eth0".  The
              word "dev" can be omitted here.

       route del default
              deletes the current default route, which is  labeled  "default"
              or  0.0.0.0 in the destination field of the current routing ta‐
              ble.

       route del -net 192.56.76.0 netmask 255.255.255.0
              deletes the route. Since the Linux routing kernel  uses  class‐
              less  addressing,  you  pretty  much always have to specify the
              netmask that is same as as seen in 'route -n' listing.

       route add default gw mango
              adds a default route (which will be  used  if  no  other  route
              matches).   All  packets  using  this  route  will be gatewayed
              through the address of a node named "mango". The  device  which
              will  actually  be  used  for  that route depends on how we can
              reach "mango" - "mango" must be on directly reachable route.

       route add mango sl0
              Adds the route to the host named "mango" via the SLIP interface
              (assuming that "mango" is the SLIP host).

       route add -net 192.57.66.0 netmask 255.255.255.0 gw mango
              This command adds the net "192.57.66.x" to be gatewayed through
              the former route to the SLIP interface.

       route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
              This is an obscure one documented so people know how to do  it.
              This  sets  all  of the class D (multicast) IP routes to go via
              "eth0". This is the correct normal configuration  line  with  a
              multicasting kernel.

       route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject
              This  installs  a  rejecting  route  for  the  private  network
              "10.x.x.x."

       route -6 add 2001:0002::/48 metric 1 dev eth0
              This adds a IPv6 route with the specified metric to be directly
              reachable via eth0.

OUTPUT
       The  output  of the kernel routing table is organized in the following
       columns

       Destination
              The destination network or destination host.

       Gateway
              The gateway address or '*' if none set.

       Genmask
              The netmask for the destination net;  '255.255.255.255'  for  a
              host destination and '0.0.0.0' for the default route.

       Flags  Possible flags include
              U (route is up)
              H (target is a host)
              G (use gateway)
              R (reinstate route for dynamic routing)
              D (dynamically installed by daemon or redirect)
              M (modified from routing daemon or redirect)
              A (installed by addrconf)
              C (cache entry)
              !  (reject route)

       Metric The 'distance' to the target (usually counted in hops).

       Ref    Number of references to this route. (Not used in the Linux ker‐
              nel.)

       Use    Count of lookups for the route.  Depending on the use of -F and
              -C this will be either route cache misses (-F) or hits (-C).

       Iface  Interface to which packets for this route will be sent.

       MSS    Default  maximum  segment  size  for  TCP connections over this
              route.

       Window Default window size for TCP connections over this route.

       irtt   Initial RTT (Round Trip Time). The kernel uses  this  to  guess
              about the best TCP protocol parameters without waiting on (pos‐
              sibly slow) answers.

       HH (cached only)
              The number of ARP entries and cached routes that refer  to  the
              hardware  header cache for the cached route. This will be -1 if
              a hardware address is not  needed  for  the  interface  of  the
              cached route (e.g. lo).

       Arp (cached only)
              Whether  or not the hardware address for the cached route is up
              to date.

FILES
       /proc/net/ipv6_route
       /proc/net/route
       /proc/net/rt_cache

SEE ALSO
       ifconfig(8), netstat(8), arp(8), rarp(8), ip(8)

HISTORY
       Route for Linux was originally written by Fred N. van Kempen, and then
       modified by Johannes Stille and Linus Torvalds for pl15. Alan Cox added
       the mss and window options for Linux 1.1.22. irtt support and merged
       with netstat from Bernd Eckenfels.

AUTHOR
       Currently maintained by Phil Blundell and Bernd Eckenfels.

net-tools                         2014-02-17                         ROUTE(8)

3. route source analysis: BusyBox

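Consider configuring routes such as the following from a shell.

Add host routes:
# route add -host 192.168.1.2 dev eth0 
# route add -host 10.20.30.148 gw 10.20.30.40     # reach host 10.20.30.148 via gateway 10.20.30.40

Add network routes:
# route add -net 10.20.30.40 netmask 255.255.255.248 eth0   # add network 10.20.30.40/29, directly attached to eth0
# route add -net 10.20.30.48 netmask 255.255.255.248 gw 10.20.30.41 # add network 10.20.30.48/29 via gateway 10.20.30.41
# route add -net 192.168.1.0/24 eth1

Add the default route:
# route add default gw 192.168.1.1

Delete routes: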
# route del -host 192.168.1.2 dev eth0:0
# route del -host 10.20.30.148 gw 10.20.30.40
# route del -net 10.20.30.40 netmask 255.255.255.248 eth0
# route del -net 10.20.30.48 netmask 255.255.255.248 gw 10.20.30.41
# route del -net 192.168.1.0/24 eth1
# route del default gw 192.168.1.1

Each of these commands ends up in BusyBox's own route.c, whose entry point is route_main():

int route_main(int argc UNUSED_PARAM, char **argv)
{
	unsigned opt;
	int what;
	char *family;
	char **p;

	/*
		route add -net 192.56.76.0 netmask 255.255.255.0 dev eth0 // add a static route
		route add default gw 192.168.0.1 // add the default route
		route del -net 192.168.1.0/24 gw 192.168.0.1 // delete a route
		route -n // display the routing table
	*/

	/* First, remap '-net' and '-host' to avoid getopt problems. */
	p = argv;
	while (*++p) {
		if (strcmp(*p, "-net") == 0 || strcmp(*p, "-host") == 0) {
			p[0][0] = '#';
		}
	}

	opt = getopt32(argv, "A:ne", &family);

	if ((opt & ROUTE_OPT_A) && strcmp(family, "inet") != 0) {
#if ENABLE_FEATURE_IPV6
		if (strcmp(family, "inet6") == 0) {
			opt |= ROUTE_OPT_INET6;	/* Set flag for ipv6. */
		} else
#endif
		bb_show_usage();
	}

	argv += optind;

	/* No more args means display the routing table. */
	if (!*argv) { // the command was a bare "route": display all routes
		int noresolve = (opt & ROUTE_OPT_n) ? 0x0fff : 0;
#if ENABLE_FEATURE_IPV6
		if (opt & ROUTE_OPT_INET6)
			INET6_displayroutes();
		else
#endif
			bb_displayroutes(noresolve, opt & ROUTE_OPT_e);

		fflush_stdout_and_exit(EXIT_SUCCESS);
	}

	/* Check verb.  At the moment, must be add, del, or delete. */
	what = kw_lookup(tbl_verb, &argv);
	if (!what || !*argv) {		/* Unknown verb or no more args. */
		bb_show_usage();
	}

#if ENABLE_FEATURE_IPV6
	if (opt & ROUTE_OPT_INET6)
		INET6_setroute(what, argv);
	else
#endif
		INET_setroute(what, argv); // what is the verb: add, del, or delete

	return EXIT_SUCCESS;
}
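
The remap loop at the top of route_main() is easy to try in isolation. Below is a minimal user-space sketch of the same idea; remap_net_host() is a name made up for this illustration and is not a BusyBox function. It overwrites the leading '-' of "-net" and "-host" with '#', so that an option parser no longer mistakes them for option clusters, and later code can look for the rewritten strings instead.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the remap step (hypothetical helper, not BusyBox code):
 * overwrite the leading '-' of "-net" and "-host" with '#'.
 * argv must be NULL-terminated and its strings must be writable.
 * Returns the number of arguments that were rewritten.
 */
static int remap_net_host(char **argv)
{
	int changed = 0;
	char **p = argv;

	assert(argv != NULL && argv[0] != NULL);
	while (*++p) {		/* skip argv[0], the program name */
		if (strcmp(*p, "-net") == 0 || strcmp(*p, "-host") == 0) {
			(*p)[0] = '#';
			changed++;
		}
	}
	return changed;
}
```

Calling it on the argument vector of "route add -net 192.168.1.0/24 eth1" rewrites the third argument to "#net" and leaves everything else alone. The strings are modified in place, which is why they must not be string literals.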

Since these are IPv4 routes, the function above calls INET_setroute():

static NOINLINE void INET_setroute(int action, char **args)
{
	/* char buffer instead of bona-fide struct avoids aliasing warning */
	char rt_buf[sizeof(struct rtentry)];
	struct rtentry *const rt = (void *)rt_buf;

	const char *netmask = NULL;
	int skfd, isnet, xflag;

    ...
    
	/* Create a socket to the INET kernel. */
	skfd = xsocket(AF_INET, SOCK_DGRAM, 0);

	if (action == RTACTION_ADD)
		xioctl(skfd, SIOCADDRT, rt);
	else
		xioctl(skfd, SIOCDELRT, rt);

	if (ENABLE_FEATURE_CLEAN_UP) close(skfd);
}

The body of this function mainly fills in the struct rtentry that rt points to (backed by rt_buf). At the end it calls xioctl() with cmd set to SIOCADDRT or SIOCDELRT and rt as the argument, and the kernel does the rest.
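
To make the "fill in struct rtentry" step concrete, here is a minimal sketch of what the elided part has to produce for a command like "route add -net 10.20.30.48 netmask 255.255.255.248 gw 10.20.30.41". The helper names set_ipv4() and fill_net_route() are invented for this example, and the real BusyBox code handles many more keywords (metric, dev, reject and so on); this only shows the three addresses and the flags.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/route.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Store a dotted-quad IPv4 address in a generic sockaddr. Returns 0 on success. */
static int set_ipv4(struct sockaddr *sa, const char *text)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	if (inet_pton(AF_INET, text, &sin.sin_addr) != 1)
		return -1;
	memcpy(sa, &sin, sizeof(sin));
	return 0;
}

/*
 * Fill *rt for "route add -net <dst> netmask <mask> gw <gw>".
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if an address does not parse.
 */
static int fill_net_route(struct rtentry *rt, const char *dst,
			  const char *mask, const char *gw)
{
	memset(rt, 0, sizeof(*rt));
	if (set_ipv4(&rt->rt_dst, dst) || set_ipv4(&rt->rt_genmask, mask) ||
	    set_ipv4(&rt->rt_gateway, gw))
		return -1;
	rt->rt_flags = RTF_UP | RTF_GATEWAY;	/* no RTF_HOST: a network route */
	return 0;
}
```

Installing the route would then be a matter of opening a socket with socket(AF_INET, SOCK_DGRAM, 0) and calling ioctl(fd, SIOCADDRT, &rt), which only succeeds for a process with CAP_NET_ADMIN. That step is left out of the sketch so it can be compiled and tried without touching the routing table.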

4. route in the kernel source

int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
{
	struct sock *sk = sock->sk;
	int err = 0;
	struct net *net = sock_net(sk);

	switch (cmd) {
    ......
	case SIOCADDRT:
	case SIOCDELRT:
	case SIOCRTMSG:
		err = ip_rt_ioctl(net, cmd, (void __user *)arg);
		break;
    ......
}
int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
{
	struct fib_config cfg;
	struct rtentry rt;
	int err;

	switch (cmd) {
	case SIOCADDRT:		/* Add a route */
	case SIOCDELRT:		/* Delete a route */
		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
			return -EPERM;

		if (copy_from_user(&rt, arg, sizeof(rt))) // copy the struct rtentry in from user space
			return -EFAULT;

		rtnl_lock();
		err = rtentry_to_fib_config(net, cmd, &rt, &cfg); // translate the rtentry into the fib_config cfg
		if (err == 0) {
			struct fib_table *tb;

			if (cmd == SIOCDELRT) {
				tb = fib_get_table(net, cfg.fc_table); //cfg.fc_table = 0
				if (tb)
					err = fib_table_delete(tb, &cfg);
				else
					err = -ESRCH;
			} else {
				tb = fib_new_table(net, cfg.fc_table); //cfg.fc_table = 0
				if (tb)
					err = fib_table_insert(tb, &cfg);
				else
					err = -ENOBUFS;
			}

			/* allocated by rtentry_to_fib_config() */
			kfree(cfg.fc_mx);
		}
		rtnl_unlock();
		return err;
	}
	return -EINVAL;
}

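This function does the following:

a. rtentry_to_fib_config() converts the rtentry structure into a fib_config;

First, the members of struct fib_config. Note that the RTPROT_* and NLM_F_* macros and the rt_scope_t and RTN_* enums are not actually defined inside the structure; they are pasted in here next to the fields whose values they describe.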

struct fib_config {
	u8			fc_dst_len; // number of significant bits in the destination (prefix length)
	u8			fc_tos;
	/* rtm_protocol */
#define RTPROT_UNSPEC	0
#define RTPROT_REDIRECT	1	/* Route installed by ICMP redirects;
				   not used by current IPv4 */
#define RTPROT_KERNEL	2	/* Route installed by kernel		*/
#define RTPROT_BOOT	3	/* Route installed during boot		*/
#define RTPROT_STATIC	4	/* Route installed by administrator	*/
	u8			fc_protocol; // one of the RTPROT_* macros above

	enum rt_scope_t {
		RT_SCOPE_UNIVERSE=0,
	/* User defined values	*/
		RT_SCOPE_SITE=200,
		RT_SCOPE_LINK=253,
		RT_SCOPE_HOST=254,
		RT_SCOPE_NOWHERE=255
	};
	u8			fc_scope; // e.g. RT_SCOPE_HOST, RT_SCOPE_LINK

	/* rtm_type */
enum {
	RTN_UNSPEC,
	RTN_UNICAST,		/* Gateway or direct route	*/
	RTN_LOCAL,		/* Accept locally		*/
	RTN_BROADCAST,		/* Accept locally as broadcast,
				   send as broadcast */
	RTN_ANYCAST,		/* Accept locally as broadcast,
				   but send as unicast */
	RTN_MULTICAST,		/* Multicast route		*/
	RTN_BLACKHOLE,		/* Drop				*/
	RTN_UNREACHABLE,	/* Destination is unreachable   */
	RTN_PROHIBIT,		/* Administratively prohibited	*/
	RTN_THROW,		/* Not in this table		*/
	RTN_NAT,		/* Translate this address	*/
	RTN_XRESOLVE,		/* Use external resolver	*/
	__RTN_MAX
};
	u8			fc_type; // route type, one of the RTN_* values above
	/* 3 bytes unused */
	u32			fc_table;
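	__be32			fc_dst; // destination address
	__be32			fc_gw; // gateway address
	int			fc_oif; // = dev->ifindex, index of the output interface
	u32			fc_flags;
	u32			fc_priority;
	__be32			fc_prefsrc; // = ifa->ifa_local, the preferred local source address (set when an alias such as eth0:1 is given)
	struct nlattr		*fc_mx; // points to the buffer of metric attributes (struct nlattr)
	struct rtnexthop	*fc_mp; // points to the multipath next-hop descriptions
	int			fc_mx_len; // size in bytes of the data fc_mx points to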
	int			fc_mp_len;
	u32			fc_flow;

	/* Modifiers to NEW request */
#define NLM_F_REPLACE	0x100	/* Override existing		*/
#define NLM_F_EXCL	0x200	/* Do not touch, if it exists	*/
#define NLM_F_CREATE	0x400	/* Create, if it does not exist	*/
#define NLM_F_APPEND	0x800	/* Add to end of list		*/
	u32			fc_nlflags; // see the NLM_F_* macros above
	struct nl_info		fc_nlinfo;
 };
static int rtentry_to_fib_config(struct net *net, int cmd, struct rtentry *rt,
				 struct fib_config *cfg)
{
	__be32 addr;
	int plen;

	memset(cfg, 0, sizeof(*cfg));
	cfg->fc_nlinfo.nl_net = net;

	if (rt->rt_dst.sa_family != AF_INET) // address family of the destination
		return -EAFNOSUPPORT;

	/*
	 * Check mask for validity:
	 * a) it must be contiguous.
	 * b) destination must have all host bits clear.
	 * c) if application forgot to set correct family (AF_INET),
	 *    reject request unless it is absolutely clear i.e.
	 *    both family and mask are zero.
	 */
	plen = 32;
	addr = sk_extract_addr(&rt->rt_dst); // extract the destination address
	if (!(rt->rt_flags & RTF_HOST)) { // not a host route, so it is a network route
		__be32 mask = sk_extract_addr(&rt->rt_genmask); // extract the netmask

		if (rt->rt_genmask.sa_family != AF_INET) { // netmask family is not AF_INET
			if (mask || rt->rt_genmask.sa_family)	// reject unless both the mask and its family are zero
				return -EAFNOSUPPORT;
		}

		// This is a network route. For 192.168.1.0/24, addr=192.168.1.0 and mask=255.255.255.0.
		// bad_mask() returns true when the mask is not contiguous or addr has host bits set, i.e. the route is misconfigured.
		if (bad_mask(mask, addr)) 
			return -EINVAL;

		plen = inet_mask_len(mask); // prefix length: number of 1 bits in the mask
	}

	cfg->fc_dst_len = plen; 
	cfg->fc_dst = addr;

	if (cmd != SIOCDELRT) {
		cfg->fc_nlflags = NLM_F_CREATE;
		cfg->fc_protocol = RTPROT_BOOT;
	}

	if (rt->rt_metric)
		cfg->fc_priority = rt->rt_metric - 1;

	if (rt->rt_flags & RTF_REJECT) {
		cfg->fc_scope = RT_SCOPE_HOST;
		cfg->fc_type = RTN_UNREACHABLE;
		return 0;
	}

	cfg->fc_scope = RT_SCOPE_NOWHERE;
	cfg->fc_type = RTN_UNICAST;

	if (rt->rt_dev) {
		char *colon;
		struct net_device *dev;
		char devname[IFNAMSIZ];

		if (copy_from_user(devname, rt->rt_dev, IFNAMSIZ-1))
			return -EFAULT;

		devname[IFNAMSIZ-1] = 0;
		colon = strchr(devname, ':'); // a colon marks an alias, e.g. eth0:1 is an alias of eth0
		if (colon)
			*colon = 0;
		dev = __dev_get_by_name(net, devname); // look up the device by its interface name
		if (!dev)
			return -ENODEV;
		cfg->fc_oif = dev->ifindex;
		if (colon) {
			struct in_ifaddr *ifa;
			struct in_device *in_dev = __in_dev_get_rtnl(dev);
			if (!in_dev)
				return -ENODEV;
			*colon = ':';
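			for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) // walk the addresses configured on the device
				if (strcmp(ifa->ifa_label, devname) == 0) // label matches the alias name
					break;
			if (ifa == NULL)
				return -ENODEV;
			cfg->fc_prefsrc = ifa->ifa_local; // use the alias's local address as the preferred source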
		}
	}

	addr = sk_extract_addr(&rt->rt_gateway); // extract the gateway address
	if (rt->rt_gateway.sa_family == AF_INET && addr) {
		cfg->fc_gw = addr; // gateway address
		if (rt->rt_flags & RTF_GATEWAY && // the route goes through a gateway
		    inet_addr_type(net, addr) == RTN_UNICAST)
			cfg->fc_scope = RT_SCOPE_UNIVERSE;
	}

	if (cmd == SIOCDELRT)
		return 0;

	if (rt->rt_flags & RTF_GATEWAY && !cfg->fc_gw) // RTF_GATEWAY set but no valid gateway address: fail
		return -EINVAL;

	if (cfg->fc_scope == RT_SCOPE_NOWHERE)
		cfg->fc_scope = RT_SCOPE_LINK;

	if (rt->rt_flags & (RTF_MTU | RTF_WINDOW | RTF_IRTT)) { // any per-route metric requested
		struct nlattr *mx;
		int len = 0;

		mx = kzalloc(3 * nla_total_size(4), GFP_KERNEL);
		if (mx == NULL)
			return -ENOMEM;

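		// Note on put_rtax(): mx points to the attribute buffer (it is not a function pointer); put_rtax() writes one attribute at offset len and returns the new length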
		if (rt->rt_flags & RTF_MTU)
			len = put_rtax(mx, len, RTAX_ADVMSS, rt->rt_mtu - 40);

		if (rt->rt_flags & RTF_WINDOW)
			len = put_rtax(mx, len, RTAX_WINDOW, rt->rt_window);

		if (rt->rt_flags & RTF_IRTT)
			len = put_rtax(mx, len, RTAX_RTT, rt->rt_irtt << 3);

		cfg->fc_mx = mx;
		cfg->fc_mx_len = len;
	}

	return 0;
}
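
The two mask helpers used above, bad_mask() and inet_mask_len(), carry most of the validation for network routes. The following is a user-space sketch of the same checks, written from the description in the kernel comment ("it must be contiguous", "destination must have all host bits clear") and not copied from the kernel. The names mask_is_bad() and mask_to_prefix_len() are invented here, and unlike the kernel versions they take values in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the (mask, addr) pair is unusable for a network route:
 * addr has bits outside the mask, or the mask's 1 bits are not
 * contiguous from the top. Both values are in host byte order.
 */
static int mask_is_bad(uint32_t mask, uint32_t addr)
{
	uint32_t host_bits = ~mask;

	if (addr & host_bits)
		return 1;		/* host bits set in the destination */
	if (host_bits & (host_bits + 1))
		return 1;		/* mask has a hole, e.g. 255.0.255.0 */
	return 0;
}

/* Prefix length of a contiguous mask: the number of leading 1 bits. */
static int mask_to_prefix_len(uint32_t mask)
{
	int len = 0;

	while (mask & 0x80000000u) {
		len++;
		mask <<= 1;
	}
	return len;
}
```

The contiguity test relies on the fact that the inverted mask of a valid netmask has the form 0...01...1, so adding 1 to it yields a single bit that does not overlap with it. For 192.168.1.0 with 255.255.255.0 both checks pass and the prefix length is 24; for 192.168.1.5 with the same mask the first check fails, which corresponds to the -EINVAL path above.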

b. Handling of cmd == SIOCDELRT;

			if (cmd == SIOCDELRT) {
				tb = fib_get_table(net, cfg.fc_table); //cfg.fc_table = 0
				if (tb)
					err = fib_table_delete(tb, &cfg); // see its implementation below
// Look up the hash chain selected by id; return the table on a match, otherwise NULL
struct fib_table *fib_get_table(struct net *net, u32 id)
{
	struct fib_table *tb;
	struct hlist_head *head;
	unsigned int h;

	if (id == 0)
		id = RT_TABLE_MAIN;
	h = id & (FIB_TABLE_HASHSZ - 1); // h = id & 0xff when FIB_TABLE_HASHSZ is 256

	rcu_read_lock();
	// For how fib_table_hash[*] is created, see https://blog.csdn.net/guodong1010/article/details/52245555
	head = &net->ipv4.fib_table_hash[h]; // the chains are populated in fib_new_table()
	hlist_for_each_entry_rcu(tb, head, tb_hlist) { // walk the chain looking for a table whose id matches
		if (tb->tb_id == id) {
			rcu_read_unlock();
			return tb;
		}
	}
	rcu_read_unlock();
	return NULL;
}
int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
{
	struct trie *t = (struct trie *) tb->tb_data;
	u32 key, mask;
	int plen = cfg->fc_dst_len;
	u8 tos = cfg->fc_tos;
	struct fib_alias *fa, *fa_to_delete;
	struct list_head *fa_head;
	struct leaf *l;
	struct leaf_info *li;

	if (plen > 32)
		return -EINVAL;

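	key = ntohl(cfg->fc_dst); // destination address in host byte order
	mask = ntohl(inet_make_mask(plen)); // netmask built from the prefix length

	if (key & ~mask) // host bits must be zero: the 1 bits of the mask cover the network part, the 0 bits the host part
		return -EINVAL;

	key = key & mask; // network address
	l = fib_find_node(t, key); // returns the matching leaf, if any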

	if (!l)
		return -ESRCH;

	li = find_leaf_info(l, plen); // find the leaf_info on this leaf whose prefix length is plen

	if (!li)
		return -ESRCH;

	fa_head = &li->falh; // list of fib_alias entries hanging off the leaf_info
	fa = fib_find_alias(fa_head, tos, 0); 

	if (!fa)
		return -ESRCH;

	pr_debug("Deleting %08x/%d tos=%d t=%p\n", key, plen, tos, t);

	fa_to_delete = NULL;
	fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list); // step back one entry so that the loop below starts at fa
	list_for_each_entry_continue(fa, fa_head, fa_list) {
		struct fib_info *fi = fa->fa_info;

		if (fa->fa_tos != tos)
			break;

		if ((!cfg->fc_type || fa->fa_type == cfg->fc_type) && // route type
		    (cfg->fc_scope == RT_SCOPE_NOWHERE ||
		     fa->fa_info->fib_scope == cfg->fc_scope) && // route scope, e.g. host, link...
		    (!cfg->fc_prefsrc ||
		     fi->fib_prefsrc == cfg->fc_prefsrc) && // preferred source address matches
		    (!cfg->fc_protocol ||
		     fi->fib_protocol == cfg->fc_protocol) && // routing protocol
		    fib_nh_match(cfg, fi) == 0) {
			fa_to_delete = fa;
			break;
		}
	}

	if (!fa_to_delete)
		return -ESRCH;

	fa = fa_to_delete;
	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
		  &cfg->fc_nlinfo, 0);

	list_del_rcu(&fa->fa_list);

	if (!plen)
		tb->tb_num_default--;

	if (list_empty(fa_head)) {
		hlist_del_rcu(&li->hlist);
		free_leaf_info(li);
	}

	if (hlist_empty(&l->list))
		trie_leaf_remove(t, l);

	if (fa->fa_state & FA_S_ACCESSED)
		rt_cache_flush(cfg->fc_nlinfo.nl_net);

	fib_release_info(fa->fa_info);
	alias_free_mem_rcu(fa);
	return 0;
}
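
The long condition in the loop of fib_table_delete() follows one pattern: a field left at its "unset" value in the request matches anything, and a field that is set must match exactly. Here is a small stand-alone sketch of that pattern. The structures route_entry and delete_request and the function delete_matches() are simplified stand-ins invented for this example, and the next-hop comparison done by fib_nh_match() is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SCOPE_NOWHERE 255	/* plays the role of RT_SCOPE_NOWHERE */

/* Hypothetical, simplified stand-in for fib_alias plus fib_info. */
struct route_entry {
	uint8_t type, scope, protocol;
	uint32_t prefsrc;
};

/* Hypothetical, simplified stand-in for the fields of fib_config used here. */
struct delete_request {
	uint8_t type;		/* 0 = any type */
	uint8_t scope;		/* SCOPE_NOWHERE = any scope */
	uint8_t protocol;	/* 0 = any protocol */
	uint32_t prefsrc;	/* 0 = any preferred source */
};

/* Return 1 if the entry satisfies every field the request actually specifies. */
static int delete_matches(const struct delete_request *req,
			  const struct route_entry *e)
{
	return (!req->type || e->type == req->type) &&
	       (req->scope == SCOPE_NOWHERE || e->scope == req->scope) &&
	       (!req->prefsrc || e->prefsrc == req->prefsrc) &&
	       (!req->protocol || e->protocol == req->protocol);
}
```

This explains why "route del" does not need to repeat everything that was given to "route add": in rtentry_to_fib_config() above, fc_protocol is only set when cmd != SIOCDELRT, so for a delete it stays 0 and any protocol matches, and fc_scope stays RT_SCOPE_NOWHERE unless a usable gateway was given.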

c. Handling of cmd == SIOCADDRT;

				tb = fib_new_table(net, cfg.fc_table); //cfg.fc_table = 0
				if (tb)
					err = fib_table_insert(tb, &cfg);
struct fib_table *fib_new_table(struct net *net, u32 id)
{
	struct fib_table *tb;
	unsigned int h;

	if (id == 0)
		id = RT_TABLE_MAIN;
	tb = fib_get_table(net, id); // if a table with this id is already on its hash chain, return it; otherwise create one with fib_trie_table()
	if (tb)
		return tb;

	tb = fib_trie_table(id); // allocate a new fib_table
	if (!tb)
		return NULL;

	switch (id) {
	case RT_TABLE_LOCAL:
		net->ipv4.fib_local = tb;
		break;

	case RT_TABLE_MAIN:
		net->ipv4.fib_main = tb;
		break;

	case RT_TABLE_DEFAULT:
		net->ipv4.fib_default = tb;
		break;

	default:
		break;
	}

	h = id & (FIB_TABLE_HASHSZ - 1);
	hlist_add_head_rcu(&tb->tb_hlist, &net->ipv4.fib_table_hash[h]);// add tb to its hash chain
	return tb;
}
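
fib_get_table() and fib_new_table() together form a "look up, else create" pair over a small hash table keyed by the table id. The sketch below shows that pairing in plain user-space C. Everything in it (struct table, table_get(), table_new(), the bucket array) is invented for illustration; it uses ordinary singly linked chains and has no locking or RCU, which the kernel versions do need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_HASHSZ 256	/* assumed power of two, like FIB_TABLE_HASHSZ */
#define TABLE_MAIN   254	/* same value as RT_TABLE_MAIN */

struct table {
	uint32_t id;
	struct table *next;	/* next table in the same bucket */
};

static struct table *buckets[TABLE_HASHSZ];

/* Lookup only: return the table with this id, or NULL. id 0 means "main". */
static struct table *table_get(uint32_t id)
{
	struct table *tb;

	if (id == 0)
		id = TABLE_MAIN;
	for (tb = buckets[id & (TABLE_HASHSZ - 1)]; tb; tb = tb->next)
		if (tb->id == id)
			return tb;
	return NULL;
}

/* Get-or-create: return the existing table, else allocate and link a new one. */
static struct table *table_new(uint32_t id)
{
	struct table *tb;

	if (id == 0)
		id = TABLE_MAIN;
	tb = table_get(id);
	if (tb)
		return tb;
	tb = calloc(1, sizeof(*tb));
	if (!tb)
		return NULL;
	tb->id = id;
	tb->next = buckets[id & (TABLE_HASHSZ - 1)];	/* insert at bucket head */
	buckets[id & (TABLE_HASHSZ - 1)] = tb;
	return tb;
}
```

This mirrors the asymmetry in ip_rt_ioctl(): deleting uses the lookup-only function, because a route cannot be removed from a table that does not exist, while adding uses the get-or-create function. Since route(8) always passes fc_table = 0, both paths end up on the main table.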
int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
{
	struct trie *t = (struct trie *) tb->tb_data;
	struct fib_alias *fa, *new_fa;
	struct list_head *fa_head = NULL;
	struct fib_info *fi;
	int plen = cfg->fc_dst_len; // prefix length (number of bits in the network part)
	u8 tos = cfg->fc_tos; // type of service
	u32 key, mask;
	int err;
	struct leaf *l;

	if (plen > 32)
		return -EINVAL;

	key = ntohl(cfg->fc_dst); // destination address in host byte order

	pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);

	mask = ntohl(inet_make_mask(plen)); // build the netmask from the prefix length

	if (key & ~mask) // host bits set: not a valid network address
		return -EINVAL;

	key = key & mask; // address & mask = network address

	fi = fib_create_info(cfg); // obtain a struct fib_info for this route
	if (IS_ERR(fi)) {
		err = PTR_ERR(fi);
		goto err;
	}

	l = fib_find_node(t, key);  // look up the leaf for key
	fa = NULL;

	if (l) { // the leaf already exists
		fa_head = get_fa_head(l, plen); // head of the fib_alias list for this prefix length
		fa = fib_find_alias(fa_head, tos, fi->fib_priority); // search the list for an alias with this tos/priority
	}

	/* Now fa, if non-NULL, points to the first fib alias
	 * with the same keys [prefix,tos,priority], if such key already
	 * exists or to the node before which we will insert new one.
	 *
	 * If fa is NULL, we will need to allocate a new one and
	 * insert to the head of f.
	 *
	 * If f is NULL, no fib node matched the destination key
	 * and we need to allocate a new one of those as well.
	 */

	if (fa && fa->fa_tos == tos &&
	    fa->fa_info->fib_priority == fi->fib_priority) { // an alias with the same tos and priority exists
		struct fib_alias *fa_first, *fa_match;

		err = -EEXIST;
		if (cfg->fc_nlflags & NLM_F_EXCL)
			goto out;

		/* We have 2 goals:
		 * 1. Find exact match for type, scope, fib_info to avoid
		 * duplicate routes
		 * 2. Find next 'fa' (or head), NLM_F_APPEND inserts before it
		 */
		fa_match = NULL;
		fa_first = fa;
		fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
		list_for_each_entry_continue(fa, fa_head, fa_list) {
			if (fa->fa_tos != tos)
				break;
			if (fa->fa_info->fib_priority != fi->fib_priority)
				break;
			if (fa->fa_type == cfg->fc_type &&
			    fa->fa_info == fi) {
				fa_match = fa;
				break;
			}
		}

		if (cfg->fc_nlflags & NLM_F_REPLACE) { // replace the existing entry
			struct fib_info *fi_drop;
			u8 state;

			fa = fa_first;
			if (fa_match) {
				if (fa == fa_match)
					err = 0;
				goto out; // an exact match already exists, so there is nothing to replace; otherwise build new_fa below
			}
			err = -ENOBUFS;
			new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
			if (new_fa == NULL)
				goto out;

			fi_drop = fa->fa_info;
			new_fa->fa_tos = fa->fa_tos;
			new_fa->fa_info = fi;
			new_fa->fa_type = cfg->fc_type;
			state = fa->fa_state;
			new_fa->fa_state = state & ~FA_S_ACCESSED;

			list_replace_rcu(&fa->fa_list, &new_fa->fa_list);
			alias_free_mem_rcu(fa);

			fib_release_info(fi_drop);
			if (state & FA_S_ACCESSED)
				rt_cache_flush(cfg->fc_nlinfo.nl_net);
			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);

			goto succeeded;
		}
		/* Error if we find a perfect match which
		 * uses the same scope, type, and nexthop
		 * information.
		 */
		if (fa_match) // exact duplicate: fail with -EEXIST
			goto out;

		if (!(cfg->fc_nlflags & NLM_F_APPEND))
			fa = fa_first;
	}
	err = -ENOENT;
	if (!(cfg->fc_nlflags & NLM_F_CREATE))
		goto out;

	err = -ENOBUFS;
	new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL); // no identical alias was found, so allocate a new one
	if (new_fa == NULL)
		goto out;

	// initialize the fib_alias
	new_fa->fa_info = fi; // attach the fib_info obtained above
	// fill in the key fields
	new_fa->fa_tos = tos;
	new_fa->fa_type = cfg->fc_type;
	new_fa->fa_state = 0;
	/*
	 * Insert new entry to the list.
	 */

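	if (!fa_head) { // NULL: no alias list exists yet for this prefix
		fa_head = fib_insert_node(t, key, plen); // insert a node into the trie; this is the core step, and its internals are not yet analyzed here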
		if (unlikely(!fa_head)) {
			err = -ENOMEM;
			goto out_free_new_fa;
		}
	}

	if (!plen)
		tb->tb_num_default++;

	list_add_tail_rcu(&new_fa->fa_list,
			  (fa ? &fa->fa_list : fa_head));

	rt_cache_flush(cfg->fc_nlinfo.nl_net);
	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
		  &cfg->fc_nlinfo, 0);
succeeded:
	return 0;

out_free_new_fa:
	kmem_cache_free(fn_alias_kmem, new_fa);
out:
	fib_release_info(fi);
err:
	return err;
}

Finally, the function announces the change by sending a route message:

	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
		  &cfg->fc_nlinfo, 0);

How that message is received and handled is not covered here; see https://blog.csdn.net/chenliang0224/article/details/82534489, which describes the receive side.
