[Open vSwitch Source Code Analysis, Part 4] Key Control-Plane Interfaces and Call Flows

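For reasons of length, the previous article covered only the first two groups of key control-plane interfaces. This article continues with the configuration of basic Layer 2 protocols. OpenFlow control is a large topic in its own right and is left for the next article.
1. Lifecycle interfaces for virtual devices
2. Service configuration interfaces for virtual devices
3. Starting the basic Layer 2 protocols
4. Enabling OpenFlow and installing flow tables
This section uses BFD configuration as its example. For both STP and BFD, configuration starts in vsctl. ovs-vsctl is the command-line client for OVSDB configuration: besides editing OVSDB table rows directly, it can also enable Layer 2 protocols. None of these features is enabled by default at initialization.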

static void
usage(void)
{
    printf("\
%s: ovs-vswitchd management utility\n\
usage: %s [OPTIONS] COMMAND [ARG...]\n\
\n\
Open vSwitch commands:\n\
  init                        initialize database, if not yet initialized\n\
  show                        print overview of database contents\n\
  emer-reset                  reset configuration to clean state\n\
\n\
Bridge commands:\n\
  add-br BRIDGE               create a new bridge named BRIDGE\n\
  add-br BRIDGE PARENT VLAN   create new fake BRIDGE in PARENT on VLAN\n\
  del-br BRIDGE               delete BRIDGE and all of its ports\n\
  list-br                     print the names of all the bridges\n\
  br-exists BRIDGE            exit 2 if BRIDGE does not exist\n\
  br-to-vlan BRIDGE           print the VLAN which BRIDGE is on\n\
  br-to-parent BRIDGE         print the parent of BRIDGE\n\
  br-set-external-id BRIDGE KEY VALUE  set KEY on BRIDGE to VALUE\n\
  br-set-external-id BRIDGE KEY  unset KEY on BRIDGE\n\
  br-get-external-id BRIDGE KEY  print value of KEY on BRIDGE\n\
  br-get-external-id BRIDGE  list key-value pairs on BRIDGE\n\
\n\
Port commands (a bond is considered to be a single port):\n\
  list-ports BRIDGE           print the names of all the ports on BRIDGE\n\
  add-port BRIDGE PORT        add network device PORT to BRIDGE\n\
  add-bond BRIDGE PORT IFACE...  add bonded port PORT in BRIDGE from IFACES\n\
  del-port [BRIDGE] PORT      delete PORT (which may be bonded) from BRIDGE\n\
  port-to-br PORT             print name of bridge that contains PORT\n\
\n\
Interface commands (a bond consists of multiple interfaces):\n\
  list-ifaces BRIDGE          print the names of all interfaces on BRIDGE\n\
  iface-to-br IFACE           print name of bridge that contains IFACE\n\
\n\
Controller commands:\n\
  get-controller BRIDGE      print the controllers for BRIDGE\n\
  del-controller BRIDGE      delete the controllers for BRIDGE\n\
  set-controller BRIDGE TARGET...  set the controllers for BRIDGE\n\
  get-fail-mode BRIDGE       print the fail-mode for BRIDGE\n\
  del-fail-mode BRIDGE       delete the fail-mode for BRIDGE\n\
  set-fail-mode BRIDGE MODE  set the fail-mode for BRIDGE to MODE\n\
\n\
Manager commands:\n\
  get-manager                print the managers\n\
  del-manager                delete the managers\n\
  set-manager TARGET...      set the list of managers to TARGET...\n\
\n\
SSL commands:\n\
  get-ssl                     print the SSL configuration\n\
  del-ssl                     delete the SSL configuration\n\
  set-ssl PRIV-KEY CERT CA-CERT  set the SSL configuration\n\
\n\
Auto Attach commands:\n\
  add-aa-mapping BRIDGE I-SID VLAN   add Auto Attach mapping to BRIDGE\n\
  del-aa-mapping BRIDGE I-SID VLAN   delete Auto Attach mapping VLAN from BRIDGE\n\
  get-aa-mapping BRIDGE              get Auto Attach mappings from BRIDGE\n\
\n\
Switch commands:\n\
  emer-reset                  reset switch to known good state\n\
\n\
%s\
\n\
Options:\n\
  --db=DATABASE               connect to DATABASE\n\
                              (default: %s)\n\
  --no-wait                   do not wait for ovs-vswitchd to reconfigure\n\
  --retry                     keep trying to connect to server forever\n\
  -t, --timeout=SECS          wait at most SECS seconds for ovs-vswitchd\n\
  --dry-run                   do not commit changes to database\n\
  --oneline                   print exactly one line of output per command\n",
           program_name, program_name, ctl_get_db_cmd_usage(), ctl_default_db());
    vlog_usage();
    printf("\
  --no-syslog             equivalent to --verbose=vsctl:syslog:warn\n");
    stream_usage("database", true, true, false);
    printf("\n\
Other options:\n\
  -h, --help                  display this help message\n\
  -V, --version               display version information\n");
    exit(EXIT_SUCCESS);
}

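The code above prints the command-line usage. BFD and STP are both switched on through enable settings in the database: STP through the stp_enable column of the Bridge table, and BFD through the enable key of the bfd column in the Interface table. Once the setting is enabled, the corresponding Layer 2 protocol starts running.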

/* Configures BFD on 'ofp_port' in 'ofproto'.  This function has no effect if
 * 'ofproto' does not have a port 'ofp_port'. */
void
ofproto_port_set_bfd(struct ofproto *ofproto, ofp_port_t ofp_port,
                     const struct smap *cfg)
{
    struct ofport *ofport;
    int error;

    ofport = ofproto_get_port(ofproto, ofp_port);
    if (!ofport) {
        VLOG_WARN("%s: cannot configure bfd on nonexistent port %"PRIu32,
                  ofproto->name, ofp_port);
        return;
    }

    error = (ofproto->ofproto_class->set_bfd
             ? ofproto->ofproto_class->set_bfd(ofport, cfg)
             : EOPNOTSUPP);
    if (error) {
        VLOG_WARN("%s: bfd configuration on port %"PRIu32" (%s) failed (%s)",
                  ofproto->name, ofp_port, netdev_get_name(ofport->netdev),
                  ovs_strerror(error));
    }
}
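
The dispatch in ofproto_port_set_bfd is an "optional callback" pattern: if the provider class does not implement set_bfd, the call degrades to EOPNOTSUPP rather than crashing. The following is a minimal standalone sketch of that pattern only. The demo_class and demo_port types and the demo_* functions are hypothetical stand-ins, not the real ofproto structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a port; not the real 'struct ofport'. */
struct demo_port {
    int bfd_enabled;
};

/* Hypothetical stand-in for a provider class with an optional callback. */
struct demo_class {
    int (*set_bfd)(struct demo_port *port, int enable);   /* May be NULL. */
};

/* Example provider implementation: records the setting and succeeds. */
static int
demo_set_bfd_impl(struct demo_port *port, int enable)
{
    port->bfd_enabled = enable;
    return 0;
}

/* Calls the provider's set_bfd if it exists, otherwise reports that the
 * operation is not supported, as ofproto_port_set_bfd does above. */
static int
demo_port_set_bfd(const struct demo_class *cls, struct demo_port *port,
                  int enable)
{
    assert(cls != NULL && port != NULL);
    return cls->set_bfd ? cls->set_bfd(port, enable) : EOPNOTSUPP;
}
```

A provider that leaves set_bfd as NULL makes demo_port_set_bfd return EOPNOTSUPP and leaves the port untouched, which in the real function is the case that gets logged with VLOG_WARN.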

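First, a brief introduction to BFD.

To protect critical applications, networks are designed with redundant backup links. When a failure occurs, network devices are expected to detect it quickly and switch traffic to the backup link so that the network converges faster. Some link types (such as POS) provide fast failure detection through hardware mechanisms, but others (such as Ethernet) have no such mechanism. In that case applications have to rely on the upper-layer protocol's own mechanisms for failure detection. Those mechanisms take more than 1 second to detect a failure, which some applications cannot tolerate. Some routing protocols, such as OSPF and IS-IS, offer a Fast Hello feature to speed up detection, but it still only reaches a granularity of about 1 second, and it is specific to the protocol that implements it, so it cannot provide fast failure detection for other protocols.

BFD (Bidirectional Forwarding Detection) is a lightweight protocol for quickly checking the connectivity of the forwarding path between two adjacent routers or switches. It is a simple "Hello" protocol that in many respects resembles the neighbor detection part of well-known routing protocols. A pair of systems periodically send detection packets over the channel on which their session is established. If one system receives no detection packet from its peer for long enough, it concludes that some part of the bidirectional channel to the neighboring system has failed. In this way protocol neighbors can quickly detect a connectivity failure on the forwarding path, bring the backup forwarding path into use sooner, and improve the performance of the existing network.

The detection mechanism BFD provides is independent of the interface media type, the encapsulation format, and the associated upper-layer protocol (OSPF, BGP, RIP, and so on). BFD establishes a session between two routers and quickly reports detected failures to the running routing protocol, which triggers the routing protocol to recompute its routing table and greatly reduces the convergence time of the whole network. BFD itself cannot discover neighbors; the upper-layer protocol has to tell it which neighbor to establish a session with.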

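BFD packet format
BFD detection packets are UDP packets. Two packet types are defined: control packets and echo packets.
A BFD session uses version 1 by default when it is established. If the peer sends version 0 packets, the session automatically switches to version 0. On routers that provide it, the show bfd neighbors command shows the version in use.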

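1) Control packets
The format is as follows:
[Figure 1: BFD control packet format]
- Vers: BFD protocol version, currently 1
- Diag: diagnostic code giving the reason for the most recent session state change on the local BFD system
- Sta: local BFD session state
- P: Poll flag. The sender sets it when parameters change, and the receiver must respond to the packet immediately
- F: Final flag. It must be set in the packet that responds to a packet with the P flag set
- C: forwarding/control separation flag (Control Plane Independent). When set, changes in the control plane do not affect BFD detection. For example, if the control plane is IS-IS, BFD can keep monitoring the link state while IS-IS restarts or performs GR
- A: Authentication Present flag. When set, the session must be authenticated
- D: Demand flag. When set, the sender wishes to monitor the link in demand mode
- R: reserved bit (RFC 5880 names this bit M, Multipoint, and requires it to be zero)
- Detect Mult: detection time multiplier, used by the detecting side to compute the detection timeout
- Length: packet length
- My Discriminator: local identifier of the BFD session
- Your Discriminator: remote identifier of the BFD session
- Desired Min TX Interval: minimum interval at which the local system wants to send BFD packets
- Required Min RX Interval: minimum interval between received BFD packets that the local system supports
- Required Min Echo RX Interval: minimum interval between received echo packets that the local system supports (set to 0 if the local system does not support echo)
- Auth Type: authentication type. The protocol currently defines: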

  • Simple Password
  • Keyed MD5
  • Meticulous Keyed MD5
  • Keyed SHA1
  • Meticulous Keyed SHA1

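- Auth Length: length of the authentication section
- Authentication Data: the authentication data
The authentication section is optional and is included in a packet only when needed, using one of the authentication types listed above. The protocol assigns UDP destination port 3784 to control packets.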
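
To make the field list above concrete, here is a rough C sketch of the 24-byte mandatory section of a control packet, following the layout in RFC 5880 section 4.1. The struct and helper names are made up for illustration and are not the definitions used in the OVS source. Multi-byte fields would still need byte-order conversion on a real wire.

```c
#include <assert.h>
#include <stdint.h>

/* Mandatory section of a BFD control packet (RFC 5880, section 4.1).
 * Illustrative layout only; names are hypothetical. */
struct bfd_ctl_hdr {
    uint8_t  vers_diag;            /* Vers (3 bits) | Diag (5 bits). */
    uint8_t  state_flags;          /* Sta (2 bits) | P F C A D M flags. */
    uint8_t  detect_mult;          /* Detect Mult. */
    uint8_t  length;               /* Length of the whole packet, bytes. */
    uint32_t my_disc;              /* My Discriminator. */
    uint32_t your_disc;            /* Your Discriminator. */
    uint32_t desired_min_tx;       /* Microseconds. */
    uint32_t required_min_rx;      /* Microseconds. */
    uint32_t required_min_echo_rx; /* Microseconds, 0 if echo unsupported. */
};

/* Session state values carried in the Sta field. */
enum { BFD_STATE_ADMIN_DOWN = 0, BFD_STATE_DOWN = 1,
       BFD_STATE_INIT = 2, BFD_STATE_UP = 3 };

/* Flag bits in the low six bits of the second byte. */
enum { BFD_FLAG_POLL = 0x20, BFD_FLAG_FINAL = 0x10, BFD_FLAG_CPI = 0x08,
       BFD_FLAG_AUTH = 0x04, BFD_FLAG_DEMAND = 0x02, BFD_FLAG_MPOINT = 0x01 };

/* Packs version and diagnostic code into the first byte. */
static uint8_t
bfd_pack_vers_diag(unsigned vers, unsigned diag)
{
    assert(vers <= 7 && diag <= 31);
    return (uint8_t) ((vers << 5) | diag);
}

/* Packs session state and flag bits into the second byte. */
static uint8_t
bfd_pack_state_flags(unsigned state, unsigned flags)
{
    assert(state <= 3 && flags <= 0x3f);
    return (uint8_t) ((state << 6) | flags);
}

static unsigned bfd_vers(const struct bfd_ctl_hdr *h)  { return h->vers_diag >> 5; }
static unsigned bfd_diag(const struct bfd_ctl_hdr *h)  { return h->vers_diag & 0x1f; }
static unsigned bfd_state(const struct bfd_ctl_hdr *h) { return h->state_flags >> 6; }
static unsigned bfd_flags(const struct bfd_ctl_hdr *h) { return h->state_flags & 0x3f; }
```

For example, a version 1 packet in state Up with the Poll flag set has 0x20 as its first byte and 0xE0 as its second.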

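2) Echo packets
BFD does not define the format of echo packets. The format is a purely local matter: the remote end only has to send the packet back along the reverse path. The source and destination IP addresses of an echo packet are the same.
Session establishment
Before BFD can detect failures, a peer session has to be established between the two ends of the path. Once the session is established, each end sends BFD control packets to the other at the negotiated rate to detect failures. The monitored path can be a label switched path, another type of tunnel, or switched Ethernet.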
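
The "negotiated rate" comes from the interval fields of the control packet. As a sketch based on RFC 5880 (asynchronous mode, jitter ignored), each side transmits no faster than the larger of its own Desired Min TX Interval and the peer's Required Min RX Interval, and declares the session down after the peer's Detect Mult times the agreed receive interval. The function names below are hypothetical, and all times are in microseconds.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Interval at which the local system sends control packets: it must not
 * send faster than it wants to, nor faster than the peer can receive. */
static uint64_t
bfd_tx_interval_us(uint64_t local_desired_min_tx,
                   uint64_t remote_required_min_rx)
{
    return max_u64(local_desired_min_tx, remote_required_min_rx);
}

/* Detection time on the local system in asynchronous mode: the peer's
 * Detect Mult times the agreed interval of the peer's transmissions. */
static uint64_t
bfd_detect_time_us(uint8_t remote_detect_mult,
                   uint64_t local_required_min_rx,
                   uint64_t remote_desired_min_tx)
{
    assert(remote_detect_mult != 0);   /* A zero multiplier is invalid. */
    return remote_detect_mult
           * max_u64(local_required_min_rx, remote_desired_min_tx);
}
```

With a peer that advertises Detect Mult 3 and wants to send every 100 ms, and a local Required Min RX Interval of 1 s, the local detection time works out to 3 s.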
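1) Session initialization
During the initialization phase of BFD session establishment, the application decides whether each end takes the active or the passive role, but at least one end must be active.
2) Session establishment
Session establishment is a three-way handshake. After it completes, the session is in the Up state at both ends, and the relevant parameters have been negotiated along the way. Later state changes are driven by the results of failure detection and handled accordingly. The state machine transitions are as follows:
[Figure 2: BFD state machine transitions]
Taking the establishment of a BFD session as an example, the state transitions work as follows:
[Figure 3: BFD session establishment between stations A and B]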
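- Stations A and B start BFD. Each begins in state "down" and sends BFD packets carrying state "down"
- Station B receives a BFD packet with state "down", switches its local state to "init", and sends BFD packets carrying state "init"
- Once station B's local BFD state is "init", further packets with state "down" are ignored
- Station A goes through the same state changes
- Station B receives a BFD packet with state "init" and switches its local state to "up"
- Station A goes through the same state changes
- After the "down => init" transition, A and B each start a timeout timer. Its purpose is to keep the local state from getting stuck in "init" (the A-B connection may have broken at this point, so the session cannot be established normally). If no BFD packet with state "init" or "up" arrives within the allotted time, the state automatically falls back to "down"
- A local state of "up" means the session has been established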
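
The steps above can be sketched as a small transition function. This is a simplified illustration modeled on the reception rules in RFC 5880 section 6.8.6, not the bfd_process_packet logic in OVS; the function names are hypothetical and diagnostics, discriminators, and the Poll sequence are left out.

```c
#include <assert.h>

enum bfd_state {
    BFD_ADMIN_DOWN = 0,
    BFD_DOWN = 1,
    BFD_INIT = 2,
    BFD_UP = 3
};

/* Returns the new local state after receiving a control packet whose Sta
 * field is 'remote'. */
static enum bfd_state
bfd_next_state(enum bfd_state local, enum bfd_state remote)
{
    if (local == BFD_ADMIN_DOWN) {
        return local;               /* Packets are discarded. */
    }
    if (remote == BFD_ADMIN_DOWN) {
        return BFD_DOWN;            /* Peer took the session down. */
    }

    switch (local) {
    case BFD_DOWN:
        if (remote == BFD_DOWN) {
            return BFD_INIT;        /* First leg of the handshake. */
        }
        return remote == BFD_INIT ? BFD_UP : BFD_DOWN;
    case BFD_INIT:
        /* A "down" packet is ignored while in "init". */
        return remote == BFD_INIT || remote == BFD_UP ? BFD_UP : BFD_INIT;
    case BFD_UP:
        return remote == BFD_DOWN ? BFD_DOWN : BFD_UP;
    default:
        assert(0);                  /* BFD_ADMIN_DOWN was handled above. */
        return BFD_DOWN;
    }
}

/* Called when the detection timer expires without a valid packet: a
 * session stuck in "init", or an established one, falls back to "down". */
static enum bfd_state
bfd_on_detect_timeout(enum bfd_state local)
{
    return local == BFD_INIT || local == BFD_UP ? BFD_DOWN : local;
}
```

Feeding the function the sequence a station sees during a normal handshake (a "down" packet, then an "init" packet) takes it from "down" through "init" to "up".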

A brief look at the BFD implementation

/* Initializes, destroys, or reconfigures the BFD session 'bfd' (named 'name'),
 * according to the database configuration contained in 'cfg'.  Takes ownership
 * of 'bfd', which may be NULL.  Returns a BFD object which may be used as a
 * handle for the session, or NULL if BFD is not enabled according to 'cfg'.
 * Also returns NULL if cfg is NULL. */
struct bfd *
bfd_configure(struct bfd *bfd, const char *name, const struct smap *cfg,
              struct netdev *netdev) OVS_EXCLUDED(mutex)
{
    static atomic_count udp_src = ATOMIC_COUNT_INIT(0);

    int decay_min_rx;
    long long int min_tx, min_rx;
    bool need_poll = false;
    bool cfg_min_rx_changed = false;
    bool cpath_down, forwarding_if_rx;

    if (!cfg || !smap_get_bool(cfg, "enable", false)) {
        bfd_unref(bfd);
        return NULL;
    }

    ovs_mutex_lock(&mutex);
    if (!bfd) {
        bfd = xzalloc(sizeof *bfd);
        bfd->name = xstrdup(name);
        bfd->forwarding_override = -1;
        bfd->disc = generate_discriminator();
        hmap_insert(all_bfds, &bfd->node, bfd->disc);

        bfd->diag = DIAG_NONE;
        bfd->min_tx = 1000;
        bfd->mult = 3;
        ovs_refcount_init(&bfd->ref_cnt);
        bfd->netdev = netdev_ref(netdev);
        bfd->rx_packets = bfd_rx_packets(bfd);
        bfd->in_decay = false;
        bfd->flap_count = 0;

        /* RFC 5881 section 4
         * The source port MUST be in the range 49152 through 65535.  The same
         * UDP source port number MUST be used for all BFD Control packets
         * associated with a particular session.  The source port number SHOULD
         * be unique among all BFD sessions on the system. */
        bfd->udp_src = (atomic_count_inc(&udp_src) % 16384) + 49152;

        bfd_set_state(bfd, STATE_DOWN, DIAG_NONE);

        bfd_status_changed(bfd);
    }

    bfd->oam = smap_get_bool(cfg, "oam", false);

    atomic_store_relaxed(&bfd->check_tnl_key,
                         smap_get_bool(cfg, "check_tnl_key", false));
    min_tx = smap_get_int(cfg, "min_tx", 100);
    min_tx = MAX(min_tx, 1);
    if (bfd->cfg_min_tx != min_tx) {
        bfd->cfg_min_tx = min_tx;
        if (bfd->state != STATE_UP
            || (!bfd_in_poll(bfd) && bfd->cfg_min_tx < bfd->min_tx)) {
            bfd->min_tx = bfd->cfg_min_tx;
        }
        need_poll = true;
    }

    min_rx = smap_get_int(cfg, "min_rx", 1000);
    min_rx = MAX(min_rx, 1);
    if (bfd->cfg_min_rx != min_rx) {
        bfd->cfg_min_rx = min_rx;
        if (bfd->state != STATE_UP
            || (!bfd_in_poll(bfd) && bfd->cfg_min_rx > bfd->min_rx)) {
            bfd->min_rx = bfd->cfg_min_rx;
        }
        cfg_min_rx_changed = true;
        need_poll = true;
    }

    decay_min_rx = smap_get_int(cfg, "decay_min_rx", 0);
    if (bfd->decay_min_rx != decay_min_rx || cfg_min_rx_changed) {
        if (decay_min_rx > 0 && decay_min_rx < bfd->cfg_min_rx) {
            VLOG_WARN("%s: decay_min_rx cannot be less than %lld ms",
                      bfd->name, bfd->cfg_min_rx);
            bfd->decay_min_rx = 0;
        } else {
            bfd->decay_min_rx = decay_min_rx;
        }
        /* Resets decay. */
        bfd->in_decay = false;
        bfd_decay_update(bfd);
        need_poll = true;
    }

    cpath_down = smap_get_bool(cfg, "cpath_down", false);
    if (bfd->cpath_down != cpath_down) {
        bfd->cpath_down = cpath_down;
        bfd_set_state(bfd, bfd->state, DIAG_NONE);
        need_poll = true;
    }

    eth_addr_from_string(smap_get_def(cfg, "bfd_local_src_mac", ""),
                         &bfd->local_eth_src);
    eth_addr_from_string(smap_get_def(cfg, "bfd_local_dst_mac", ""),
                         &bfd->local_eth_dst);
    eth_addr_from_string(smap_get_def(cfg, "bfd_remote_dst_mac", ""),
                         &bfd->rmt_eth_dst);

    bfd_lookup_ip(smap_get_def(cfg, "bfd_src_ip", ""),
                  htonl(0xA9FE0101) /* 169.254.1.1 */, &bfd->ip_src);
    bfd_lookup_ip(smap_get_def(cfg, "bfd_dst_ip", ""),
                  htonl(0xA9FE0100) /* 169.254.1.0 */, &bfd->ip_dst);

    forwarding_if_rx = smap_get_bool(cfg, "forwarding_if_rx", false);
    if (bfd->forwarding_if_rx != forwarding_if_rx) {
        bfd->forwarding_if_rx = forwarding_if_rx;
        if (bfd->state == STATE_UP && bfd->forwarding_if_rx) {
            bfd_forwarding_if_rx_update(bfd);
        } else {
            bfd->forwarding_if_rx_detect_time = 0;
        }
    }

    if (need_poll) {
        bfd_poll(bfd);
    }
    ovs_mutex_unlock(&mutex);
    return bfd;
}
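
Two small pieces of arithmetic in bfd_configure are easy to check in isolation: the UDP source port is derived from a global counter and folded into the range 49152 to 65535, and the configured min_tx/min_rx values are clamped to at least 1 ms. The helpers below are a standalone sketch of that arithmetic with hypothetical names. They are not part of OVS.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BFD_SRC_PORT_MIN = 49152,     /* Lowest allowed source port. */
    BFD_SRC_PORT_COUNT = 16384    /* 65535 - 49152 + 1 ports in the range. */
};

/* Maps the n-th created session to a UDP source port, mirroring
 * "(atomic_count_inc(&udp_src) % 16384) + 49152" in bfd_configure. */
static uint16_t
bfd_udp_src_port(uint32_t session_index)
{
    uint32_t port = (session_index % BFD_SRC_PORT_COUNT) + BFD_SRC_PORT_MIN;

    assert(port >= 49152 && port <= 65535);
    return (uint16_t) port;
}

/* Mirrors "MAX(min_tx, 1)": intervals below 1 ms are raised to 1 ms. */
static long long int
bfd_clamp_interval_ms(long long int configured)
{
    return configured > 1 ? configured : 1;
}
```

One consequence visible from the modulo: ports are unique only for the first 16384 sessions created, after which the counter wraps and values are reused.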

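The code above covers most of the settings that go into a BFD control packet. The next step is how the encoded BFD packet is handed to the kernel for further processing.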

/* Executes, against 'dpif', up to the first 'n_ops' operations in 'ops'.
 * Returns the number actually executed (at least 1, if 'n_ops' is
 * positive). */
static size_t
dpif_netlink_operate__(struct dpif_netlink *dpif,
                       struct dpif_op **ops, size_t n_ops)
{
    enum { MAX_OPS = 50 };

    struct op_auxdata {
        struct nl_transaction txn;

        struct ofpbuf request;
        uint64_t request_stub[1024 / 8];

        struct ofpbuf reply;
        uint64_t reply_stub[1024 / 8];
    } auxes[MAX_OPS];

    struct nl_transaction *txnsp[MAX_OPS];
    size_t i;

    n_ops = MIN(n_ops, MAX_OPS);
    for (i = 0; i < n_ops; i++) {
        struct op_auxdata *aux = &auxes[i];
        struct dpif_op *op = ops[i];
        struct dpif_flow_put *put;
        struct dpif_flow_del *del;
        struct dpif_flow_get *get;
        struct dpif_netlink_flow flow;

        ofpbuf_use_stub(&aux->request,
                        aux->request_stub, sizeof aux->request_stub);
        aux->txn.request = &aux->request;

        ofpbuf_use_stub(&aux->reply, aux->reply_stub, sizeof aux->reply_stub);
        aux->txn.reply = NULL;

        switch (op->type) {
        case DPIF_OP_FLOW_PUT:
            put = &op->u.flow_put;
            dpif_netlink_init_flow_put(dpif, put, &flow);
            if (put->stats) {
                flow.nlmsg_flags |= NLM_F_ECHO;
                aux->txn.reply = &aux->reply;
            }
            dpif_netlink_flow_to_ofpbuf(&flow, &aux->request);
            break;

        case DPIF_OP_FLOW_DEL:
            del = &op->u.flow_del;
            dpif_netlink_init_flow_del(dpif, del, &flow);
            if (del->stats) {
                flow.nlmsg_flags |= NLM_F_ECHO;
                aux->txn.reply = &aux->reply;
            }
            dpif_netlink_flow_to_ofpbuf(&flow, &aux->request);
            break;

        case DPIF_OP_EXECUTE:
            /* Can't execute a packet that won't fit in a Netlink attribute. */
            if (OVS_UNLIKELY(nl_attr_oversized(
                                 dp_packet_size(op->u.execute.packet)))) {
                /* Report an error immediately if this is the first operation.
                 * Otherwise the easiest thing to do is to postpone to the next
                 * call (when this will be the first operation). */
                if (i == 0) {
                    VLOG_ERR_RL(&error_rl,
                                "dropping oversized %"PRIu32"-byte packet",
                                dp_packet_size(op->u.execute.packet));
                    op->error = ENOBUFS;
                    return 1;
                }
                n_ops = i;
            } else {
                dpif_netlink_encode_execute(dpif->dp_ifindex, &op->u.execute,
                                            &aux->request);
            }
            break;

        case DPIF_OP_FLOW_GET:
            get = &op->u.flow_get;
            dpif_netlink_init_flow_get(dpif, get, &flow);
            aux->txn.reply = get->buffer;
            dpif_netlink_flow_to_ofpbuf(&flow, &aux->request);
            break;

        default:
            OVS_NOT_REACHED();
        }
    }

    for (i = 0; i < n_ops; i++) {
        txnsp[i] = &auxes[i].txn;
    }
    nl_transact_multiple(NETLINK_GENERIC, txnsp, n_ops);

    for (i = 0; i < n_ops; i++) {
        struct op_auxdata *aux = &auxes[i];
        struct nl_transaction *txn = &auxes[i].txn;
        struct dpif_op *op = ops[i];
        struct dpif_flow_put *put;
        struct dpif_flow_del *del;
        struct dpif_flow_get *get;

        op->error = txn->error;

        switch (op->type) {
        case DPIF_OP_FLOW_PUT:
            put = &op->u.flow_put;
            if (put->stats) {
                if (!op->error) {
                    struct dpif_netlink_flow reply;

                    op->error = dpif_netlink_flow_from_ofpbuf(&reply,
                                                              txn->reply);
                    if (!op->error) {
                        dpif_netlink_flow_get_stats(&reply, put->stats);
                    }
                }
            }
            break;

        case DPIF_OP_FLOW_DEL:
            del = &op->u.flow_del;
            if (del->stats) {
                if (!op->error) {
                    struct dpif_netlink_flow reply;

                    op->error = dpif_netlink_flow_from_ofpbuf(&reply,
                                                              txn->reply);
                    if (!op->error) {
                        dpif_netlink_flow_get_stats(&reply, del->stats);
                    }
                }
            }
            break;

        case DPIF_OP_EXECUTE:
            break;

        case DPIF_OP_FLOW_GET:
            get = &op->u.flow_get;
            if (!op->error) {
                struct dpif_netlink_flow reply;

                op->error = dpif_netlink_flow_from_ofpbuf(&reply, txn->reply);
                if (!op->error) {
                    dpif_netlink_flow_to_dpif_flow(&dpif->dpif, get->flow,
                                                   &reply);
                }
            }
            break;

        default:
            OVS_NOT_REACHED();
        }

        ofpbuf_uninit(&aux->request);
        ofpbuf_uninit(&aux->reply);
    }

    return n_ops;
}
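
Because dpif_netlink_operate__ handles at most MAX_OPS operations per call and returns how many it actually executed, a caller has to keep invoking it until the whole batch is consumed. The following is a minimal sketch of such a driving loop using dummy operations. The demo_* names are hypothetical, and the loop is an assumption about how a caller would use the return value, not a copy of the OVS caller.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_OPS = 50 };

/* Dummy operation; the real code uses 'struct dpif_op'. */
struct demo_op {
    int done;
};

/* Stand-in for dpif_netlink_operate__(): handles at most MAX_OPS entries
 * and returns the number handled (at least 1 if 'n_ops' is positive). */
static size_t
demo_operate_chunk(struct demo_op **ops, size_t n_ops)
{
    size_t i;

    if (n_ops > MAX_OPS) {
        n_ops = MAX_OPS;
    }
    for (i = 0; i < n_ops; i++) {
        ops[i]->done = 1;
    }
    return n_ops;
}

/* Feeds the whole batch through the chunked function and returns how
 * many calls were needed. */
static size_t
demo_operate(struct demo_op **ops, size_t n_ops)
{
    size_t calls = 0;

    while (n_ops > 0) {
        size_t chunk = demo_operate_chunk(ops, n_ops);

        assert(chunk >= 1 && chunk <= n_ops);  /* Guarantees progress. */
        ops += chunk;
        n_ops -= chunk;
        calls++;
    }
    return calls;
}
```

The "at least 1" guarantee in the function's comment is what keeps such a loop from spinning forever, including in the oversized-packet case where the batch is cut short at index i.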

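dpif_netlink_operate__ again goes through the Netlink protocol family. For the DPIF_OP_EXECUTE case the message type is identified by OVS_PACKET_CMD_EXECUTE, and when the kernel receives a message of this type it calls ovs_packet_cmd_execute to handle it.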

static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
{
    struct ovs_header *ovs_header = info->userhdr;
    struct net *net = sock_net(skb->sk);
    struct nlattr **a = info->attrs;
    struct sw_flow_actions *acts;
    struct sk_buff *packet;
    struct sw_flow *flow;
    struct sw_flow_actions *sf_acts;
    struct datapath *dp;
    struct ethhdr *eth;
    struct vport *input_vport;
    u16 mru = 0;
    int len;
    int err;
    bool log = !a[OVS_PACKET_ATTR_PROBE];

    err = -EINVAL;
    if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
        !a[OVS_PACKET_ATTR_ACTIONS])
        goto err;

    len = nla_len(a[OVS_PACKET_ATTR_PACKET]);
    packet = __dev_alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
    err = -ENOMEM;
    if (!packet)
        goto err;
    skb_reserve(packet, NET_IP_ALIGN);

    nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);

    skb_reset_mac_header(packet);
    eth = eth_hdr(packet);

    /* Normally, setting the skb 'protocol' field would be handled by a
     * call to eth_type_trans(), but it assumes there's a sending
     * device, which we may not have.
     */
    if (eth_proto_is_802_3(eth->h_proto))
        packet->protocol = eth->h_proto;
    else
        packet->protocol = htons(ETH_P_802_2);

    /* Set packet's mru */
    if (a[OVS_PACKET_ATTR_MRU]) {
        mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
        packet->ignore_df = 1;
    }
    OVS_CB(packet)->mru = mru;

    /* Build an sw_flow for sending this packet. */
    flow = ovs_flow_alloc();
    err = PTR_ERR(flow);
    if (IS_ERR(flow))
        goto err_kfree_skb;

    err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
                         packet, &flow->key, log);
    if (err)
        goto err_flow_free;

    err = ovs_nla_copy_actions(net, a[OVS_PACKET_ATTR_ACTIONS],
                   &flow->key, &acts, log);
    if (err)
        goto err_flow_free;

    rcu_assign_pointer(flow->sf_acts, acts);
    packet->priority = flow->key.phy.priority;
    packet->mark = flow->key.phy.skb_mark;

    rcu_read_lock();
    dp = get_dp_rcu(net, ovs_header->dp_ifindex);
    err = -ENODEV;
    if (!dp)
        goto err_unlock;

    input_vport = ovs_vport_rcu(dp, flow->key.phy.in_port);
    if (!input_vport)
        input_vport = ovs_vport_rcu(dp, OVSP_LOCAL);

    if (!input_vport)
        goto err_unlock;

    packet->dev = input_vport->dev;
    OVS_CB(packet)->input_vport = input_vport;
    sf_acts = rcu_dereference(flow->sf_acts);

    local_bh_disable();
    err = ovs_execute_actions(dp, packet, sf_acts, &flow->key);
    local_bh_enable();
    rcu_read_unlock();

    ovs_flow_free(flow, false);
    return err;

err_unlock:
    rcu_read_unlock();
err_flow_free:
    ovs_flow_free(flow, false);
err_kfree_skb:
    kfree_skb(packet);
err:
    return err;
}
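
One detail in ovs_packet_cmd_execute is how it fills in the skb protocol field without a sending device: an EtherType at or above 0x0600 is kept as is, and anything smaller is treated as an 802.3 length field and reported as ETH_P_802_2. The sketch below reproduces only that decision in userspace C. The DEMO_* constants and demo_skb_protocol are hypothetical names, with values taken from the Linux if_ether.h definitions of ETH_P_802_3_MIN and ETH_P_802_2. The result is in host byte order for readability, whereas the kernel keeps it in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DEMO_ETH_P_802_3_MIN = 0x0600,  /* Smallest value that is an EtherType. */
    DEMO_ETH_P_802_2 = 0x0004       /* Marker for 802.2 (length-style) frames. */
};

/* 'h_proto' points at the two type/length bytes of an Ethernet header in
 * network byte order.  Returns the protocol value in host byte order. */
static uint16_t
demo_skb_protocol(const uint8_t *h_proto)
{
    uint16_t value;

    assert(h_proto != NULL);
    value = (uint16_t) ((h_proto[0] << 8) | h_proto[1]);
    return value >= DEMO_ETH_P_802_3_MIN ? value : DEMO_ETH_P_802_2;
}
```

An IPv4 frame (bytes 08 00) keeps its EtherType 0x0800, while a frame whose field is a length such as 0x002e is classified as 802.2.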

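ovs_packet_cmd_execute ends up in the corresponding output action, which sends the packet out of the relevant port. In the other direction, the BFD state machine is maintained and processed in the control plane, so when the data plane receives a BFD packet it has to pass it up to the control plane for processing.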

enum upcall_type {
    BAD_UPCALL,                 /* Some kind of bug somewhere. */
    MISS_UPCALL,                /* A flow miss.  */
    SFLOW_UPCALL,               /* sFlow sample. */
    FLOW_SAMPLE_UPCALL,         /* Per-flow sampling. */
    IPFIX_UPCALL                /* Per-bridge sampling. */
};
/* A packet passed up from the datapath to userspace.
 *
 * The 'packet', 'key' and 'userdata' may point into data in a buffer
 * provided by the caller, so the buffer should be released only after the
 * upcall processing has been finished.
 *
 * While being processed, the 'packet' may be reallocated, so the packet must
 * be separately released with ofpbuf_uninit().
 */
struct dpif_upcall {
    /* All types. */
    struct dp_packet packet;    /* Packet data,'dp_packet' should be the first
                   member to avoid a hole. This is because
                   'rte_mbuf' in dp_packet is aligned atleast
                   on a 64-byte boundary */
    enum dpif_upcall_type type;
    struct nlattr *key;         /* Flow key. */
    size_t key_len;             /* Length of 'key' in bytes. */
    ovs_u128 ufid;              /* Unique flow identifier for 'key'. */
    struct nlattr *mru;         /* Maximum receive unit. */
    struct nlattr *cutlen;      /* Number of bytes shrink from the end. */

    /* DPIF_UC_ACTION only. */
    struct nlattr *userdata;    /* Argument to OVS_ACTION_ATTR_USERSPACE. */
    struct nlattr *out_tun_key;    /* Output tunnel key. */
    struct nlattr *actions;    /* Argument to OVS_ACTION_ATTR_USERSPACE. */
};

The code above shows the upcall message types. A BFD control packet is passed up by the datapath module as an upcall of type MISS_UPCALL and is processed in the control plane.
