ipvlan L3S mode

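ipvlan has three modes: l2, l3, and l3s. There is plenty of material online about the first two but very little about the third, so I read the code and am writing down what I found.
Why look at ipvlan at all? It is a particularly good fit for multi-tenant NAT. In that scenario the tenants' private VPC subnets may overlap, so routing has to be isolated with net namespaces, VRFs, or similar mechanisms. However, there is usually only one public NIC and one public gateway, and the gateway address is usually not in the same subnet as the tenants' public addresses, so they cannot all be placed in one layer-2 domain. ipvlan l3 mode solves this problem. I'll stop here on that topic and write up the usage when I have time; back to l3s mode.
For reference, the English description from the kernel documentation: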

4.1 L2 mode:
    In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.

4.2 L3 mode:
    In this mode TX processing up to L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves
will not receive nor can send multicast / broadcast traffic.

4.3 L3S mode:
    This is very similar to the L3 mode except that iptables (conn-tracking)
works in this mode and hence it is L3-symmetric (L3s). This will have slightly less
performance but that shouldn't matter since you are choosing this mode over plain-L3
mode to make conn-tracking work.

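Although the documentation says L3S "is very similar to the L3 mode", the code paths are completely different.

After the physical interface receives a packet, the interface's rx handler is called. As the code below shows, in IPVLAN_MODE_L3S the handler simply returns RX_HANDLER_PASS without doing anything, so the packet continues through the normal kernel stack: ip_rcv runs the PREROUTING hook and then calls ip_rcv_finish.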


rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
{
    struct sk_buff *skb = *pskb;
    struct ipvl_port *port = ipvlan_port_get_rcu(skb->dev);

    if (!port)
        return RX_HANDLER_PASS;

    switch (port->mode) {
    case IPVLAN_MODE_L2:
        return ipvlan_handle_mode_l2(pskb, port);
    case IPVLAN_MODE_L3:
        return ipvlan_handle_mode_l3(pskb, port);
    case IPVLAN_MODE_L3S:
        return RX_HANDLER_PASS;
    }

    /* Should not reach here */
    WARN_ONCE(true, "ipvlan_handle_frame() called for mode = [%hx]\n",
              port->mode);
    kfree_skb(skb);
    return RX_HANDLER_CONSUMED;
}
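
To make the effect of the return value concrete, here is a minimal user-space sketch of how a receive loop reacts to an rx handler's result. Everything prefixed `toy_` is hypothetical and exists only for illustration. The sketch also assumes, as a simplification, that in L2/L3 mode the handler retargets the packet to the slave device and asks for another round; it is not the kernel implementation.

```c
/* Toy stand-ins for the kernel's rx_handler_result_t values. */
enum toy_rx_result {
    TOY_RX_CONSUMED,  /* handler took ownership of the packet; stop */
    TOY_RX_ANOTHER,   /* handler changed the device; run the loop again */
    TOY_RX_PASS,      /* handler did nothing; continue up the normal stack */
};

enum toy_mode { TOY_MODE_L2, TOY_MODE_L3, TOY_MODE_L3S };

struct toy_pkt {
    int dev;        /* index of the device the packet is currently on */
    int slave_dev;  /* hypothetical: slave that owns the destination address */
};

/* Toy rx handler: in L3S mode the packet stays on the master untouched. */
static enum toy_rx_result toy_handle_frame(struct toy_pkt *pkt,
                                           enum toy_mode mode, int master_dev)
{
    if (pkt->dev != master_dev)
        return TOY_RX_PASS;     /* no handler is attached to slave devices */
    if (mode == TOY_MODE_L3S || pkt->slave_dev == master_dev)
        return TOY_RX_PASS;     /* L3S, or no distinct slave to retarget to */
    pkt->dev = pkt->slave_dev;  /* L2/L3 (simplified): retarget to the slave */
    return TOY_RX_ANOTHER;
}

/* Toy receive loop: returns the device on which the IP layer sees the
 * packet, or -1 if the handler consumed it. */
static int toy_receive(struct toy_pkt *pkt, enum toy_mode mode, int master_dev)
{
    for (;;) {
        switch (toy_handle_frame(pkt, mode, master_dev)) {
        case TOY_RX_ANOTHER:
            continue;           /* another round with the new device */
        case TOY_RX_CONSUMED:
            return -1;          /* nothing reaches the IP layer from here */
        case TOY_RX_PASS:
            return pkt->dev;    /* hand the packet to the IP layer */
        }
    }
}
```

With a master at index 0 and a slave at index 3, `toy_receive` returns 0 in L3S mode (the IP layer still sees the packet on the master) and 3 in L3 mode (the IP layer sees it on the slave).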

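ip_rcv_finish then calls l3mdev_ip_rcv, and this is the key part.

When an ipvlan interface is created (ipvlan_link_new) in IPVLAN_MODE_L3S mode, the physical interface gets l3mdev_ops = &ipvl_l3mdev_ops and a netfilter hook attached. The mode switch is done in ipvlan_set_port_mode: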


static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
{
   struct ipvl_dev *ipvlan;
   struct net_device *mdev = port->dev;
   int err = 0;

   ASSERT_RTNL();
   if (port->mode != nval) {
       if (nval == IPVLAN_MODE_L3S) {
           /* New mode is L3S */
           err = ipvlan_register_nf_hook();
           if (!err) {
               mdev->l3mdev_ops = &ipvl_l3mdev_ops;
               mdev->priv_flags |= IFF_L3MDEV_MASTER;
           } else
               return err;
       } else if (port->mode == IPVLAN_MODE_L3S) {
           /* Old mode was L3S */
           mdev->priv_flags &= ~IFF_L3MDEV_MASTER;
           ipvlan_unregister_nf_hook();
           mdev->l3mdev_ops = NULL;
       }
       list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
           if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
               ipvlan->dev->flags |= IFF_NOARP;
           else
               ipvlan->dev->flags &= ~IFF_NOARP;
       }
       port->mode = nval;
   }
   return err;
}

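l3mdev_ops is normally used to define routing-lookup logic. VRF is implemented this way, for example: it looks up routes in a specific routing table. In ipvlan L3S mode, the l3mdev_l3_rcv callback (ipvlan_l3_rcv) defines its own lookup. It finds the slave interface from the packet's destination address (or the ARP target IP), treats that slave as the input interface, and looks up the route in the slave's netns. Because the destination is a local address there, the packet ends up in ip_local_deliver and is handled as locally destined traffic.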

static struct nf_hook_ops ipvl_nfops[] __read_mostly = {
    {
        .hook     = ipvlan_nf_input,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = INT_MAX,
    },
    {
        .hook     = ipvlan_nf_input,
        .pf       = NFPROTO_IPV6,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = INT_MAX,
    },
};


static struct l3mdev_ops ipvl_l3mdev_ops __read_mostly = {
    .l3mdev_l3_rcv = ipvlan_l3_rcv,
};


struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
                  u16 proto)
{
    struct ipvl_addr *addr;
    struct net_device *sdev;

    addr = ipvlan_skb_to_addr(skb, dev);
    if (!addr)
        goto out;

    sdev = addr->master->dev;
    switch (proto) {
    case AF_INET:
    {
        int err;
        struct iphdr *ip4h = ip_hdr(skb);

        err = ip_route_input_noref(skb, ip4h->daddr, ip4h->saddr,
                       ip4h->tos, sdev);
        if (unlikely(err))
            goto out;
        break;
    }
    case AF_INET6:
    {
        struct dst_entry *dst;
        struct ipv6hdr *ip6h = ipv6_hdr(skb);
        int flags = RT6_LOOKUP_F_HAS_SADDR;
        struct flowi6 fl6 = {
            .flowi6_iif   = sdev->ifindex,
            .daddr        = ip6h->daddr,
            .saddr        = ip6h->saddr,
            .flowlabel    = ip6_flowinfo(ip6h),
            .flowi6_mark  = skb->mark,
            .flowi6_proto = ip6h->nexthdr,
        };

        skb_dst_drop(skb);
        dst = ip6_route_input_lookup(dev_net(sdev), sdev, &fl6, flags);
        skb_dst_set(skb, dst);
        break;
    }
    default:
        break;
    }

out:
    return skb;
}
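
The lookup step in ipvlan_l3_rcv can be pictured with a small user-space sketch: map the destination address to the slave that owns it, then route as if the packet had arrived on that slave in the slave's namespace. The `toy_` types, the linear table, and the integer namespace ids are all assumptions made for illustration; the kernel uses its own address lookup and real `struct net` objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one IPv4 address configured on one slave. */
struct toy_addr {
    uint32_t ip;        /* address in host byte order */
    int slave_ifindex;  /* slave device that owns the address */
    int netns_id;       /* namespace the slave lives in */
};

/* Input interface and namespace that the route lookup will use. */
struct toy_route_ctx {
    int iif;
    int netns_id;
};

/* Linear scan standing in for the kernel's address lookup. */
static const struct toy_addr *toy_find_slave(const struct toy_addr *tbl,
                                             size_t n, uint32_t daddr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].ip == daddr)
            return &tbl[i];
    return NULL;
}

/* If a slave owns daddr, route with the slave as input interface in the
 * slave's namespace; otherwise leave the master's context unchanged,
 * which mirrors the "goto out" path when no address matches. */
static struct toy_route_ctx toy_l3_rcv(const struct toy_addr *tbl, size_t n,
                                       uint32_t daddr, int master_ifindex,
                                       int master_netns)
{
    struct toy_route_ctx ctx = { master_ifindex, master_netns };
    const struct toy_addr *a = toy_find_slave(tbl, n, daddr);

    if (a) {
        ctx.iif = a->slave_ifindex;
        ctx.netns_id = a->netns_id;
    }
    return ctx;
}
```

Because the lookup is keyed on the destination address alone, two tenants in different namespaces can share one master device as long as their slave addresses are distinct.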

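As you can see, the code flow differs a lot from L3 mode: in L3S the packet passes the PREROUTING hook first, then the slave interface is found, and then the packet passes the LOCAL_IN hook.
In a NAT scenario, L3 mode calls netif_rx_internal after finding the slave interface, so the packet goes through the whole protocol stack once more, which makes one-to-one NAT from outside to inside possible. In L3S mode, once the slave interface is found, ip_local_deliver is called and the packet goes straight to the LOCAL_IN hook and then up to the upper-layer protocols, so there is no opportunity left to do DNAT.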
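
The argument above boils down to the order of stages on the receive path. The sketch below encodes the two orders exactly as described in this post (they are not derived from the kernel source) and checks whether a PREROUTING stage still lies ahead once the slave has been identified. All names are hypothetical labels for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Receive path in L3 mode, as described in the text: the slave is found
 * first, then the packet runs through the stack again. */
static const char *const toy_path_l3[] = {
    "rx_handler: slave found",
    "PREROUTING",
    "route lookup",
    "LOCAL_IN",
};

/* Receive path in L3S mode, as described in the text: PREROUTING has
 * already run by the time the slave is identified. */
static const char *const toy_path_l3s[] = {
    "PREROUTING",
    "l3mdev_l3_rcv: slave found + route lookup",
    "LOCAL_IN",
};

/* True if a PREROUTING stage appears after the slave has been found,
 * i.e. a DNAT rule could still rewrite the destination at that point. */
static bool toy_can_dnat_after_slave(const char *const *path, size_t n)
{
    bool slave_found = false;

    for (size_t i = 0; i < n; i++) {
        if (strstr(path[i], "slave found"))
            slave_found = true;
        else if (slave_found && strcmp(path[i], "PREROUTING") == 0)
            return true;
    }
    return false;
}
```

Under these assumptions `toy_can_dnat_after_slave(toy_path_l3, 4)` is true and `toy_can_dnat_after_slave(toy_path_l3s, 3)` is false, which matches the conclusion above.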
