vswitchd is the userspace daemon whose core job is to run the ofproto logic. OVS is implemented against the OpenFlow switch specification. Take L2 forwarding as an example: a traditional switch (including the Linux bridge implementation) looks up a CAM table to find the port matching the destination MAC, while Open vSwitch takes the incoming skb and looks for a matching flow. If a flow exists, the skb is not the first packet of that flow, and the output port can be found in flow->action. The point worth stressing is the SDN idea that every packet must map to a flow, and the packet's behavior is given by the flow's actions. A traditional switch's actions boil down to forward, receive, or drop; SDN defines many more: rewrite the skb's contents, reroute the packet, clone several copies onto different paths, and so on.
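As a conceptual contrast (illustrative pseudocode only; every helper named here is hypothetical, not an actual OVS or kernel API):
/* Traditional L2 switch: forward on a CAM lookup keyed by destination MAC. */
void traditional_forward(struct sk_buff *skb)
{
    int port = cam_lookup(dst_mac(skb));    /* hypothetical helpers */
    forward(skb, port);
}
/* Flow-based switch: match the whole flow key, act on the flow's actions. */
void flow_based_forward(struct sk_buff *skb)
{
    struct sw_flow *flow = flow_lookup(extract_key(skb));
    if (flow)
        execute_actions(skb, flow->action); /* forward, mangle, clone, ... */
    else
        upcall_to_userspace(skb);           /* first packet of the flow */
}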
If the skb has no matching flow, it is the first packet of a flow and one must be created for it. vswitchd loops forever checking for incoming ofproto requests; they may come from ovs-ofctl, or arrive as upcall requests that openvswitch.ko sends over netlink. In the common case they are flow-miss requests asking for a flow to be created, and vswitchd builds the flow and its actions according to the OpenFlow specification. Let's walk through that path:
Since Open vSwitch models an L2 switch, every packet first arrives on some port, which ends up in ovs_dp_process_received_packet. That function first generates a key from the skb via ovs_flow_extract, then calls ovs_flow_tbl_lookup to look the flow up by key. If no flow is found, it calls ovs_dp_upcall, which ships a dp_upcall_info structure to vswitchd over netlink (via genlmsg_unicast).
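Condensed into code, the miss path looks roughly like this. This is a sketch modeled on the OVS 1.x kernel module, not the verbatim source; error handling, stats, and locking are stripped, and exact signatures vary between versions.
/* Sketch of the datapath receive path (simplified). */
void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
{
    struct datapath *dp = p->dp;
    struct sw_flow_key key;
    struct sw_flow *flow;
    int key_len;

    /* Build the flow key from the packet headers. */
    if (ovs_flow_extract(skb, p->port_no, &key, &key_len)) {
        kfree_skb(skb);
        return;
    }

    /* Exact-match lookup in the kernel flow table. */
    flow = ovs_flow_tbl_lookup(rcu_dereference(dp->table), &key, key_len);
    if (!flow) {
        /* Miss: hand the packet up to vswitchd over netlink. */
        struct dp_upcall_info upcall = {
            .cmd = OVS_PACKET_CMD_MISS,
            .key = &key,
        };
        ovs_dp_upcall(dp, skb, &upcall);
        return;
    }

    /* Hit: run the flow's actions (output, mangle, clone, ...). */
    OVS_CB(skb)->flow = flow;
    ovs_execute_actions(dp, skb);
}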
vswitchd handles those netlink requests in handle_upcalls. For flow-table misses it calls handle_miss_upcalls, which in turn calls handle_flow_miss. Let's look at the implementation of handle_miss_upcalls first (excerpted below):
static void
handle_miss_upcalls(struct dpif_backer *backer, struct dpif_upcall *upcalls,
size_t n_upcalls)
{
/* Construct the to-do list.
*
* This just amounts to extracting the flow from each packet and sticking
* the packets that have the same flow in the same "flow_miss" structure so
* that we can process them together. */
hmap_init(&todo);
n_misses = 0;
The comment spells it out: the loop below walks the struct dpif_upcall instances that netlink delivered to userspace. Each one carries the missed packet and the flow key generated from it, and packets sharing a flow key are batched up to be processed together.
for (upcall = upcalls; upcall < &upcalls[n_upcalls]; upcall++) {
fitness = odp_flow_key_to_flow(upcall->key, upcall->key_len, &flow);
port = odp_port_to_ofport(backer, flow.in_port);
odp_flow_key_to_flow first calls the parse_flow_nlattrs helper (under lib/) to parse upcall->key/upcall->key_len, recording each attribute found in the present_attrs bitmap and stashing the corresponding struct nlattr pointers into struct nlattr *attrs[]. It then walks each bit set in present_attrs, pulls the value out of upcall->key, and stores it into flow. VLAN parsing gets special treatment via parse_8021q_onward.
odp_port_to_ofport converts flow.in_port, the datapath port number, into an OpenFlow port, i.e. a struct ofport_dpif *.
flow_extract(upcall->packet, flow.skb_priority,
&flow.tunnel, flow.in_port, &miss->flow);
This parses the packet itself into miss->flow; the function overlaps in places with odp_flow_key_to_flow.
/* Add other packets to a to-do list. */
hash = flow_hash(&miss->flow, 0);
existing_miss = flow_miss_find(&todo, &miss->flow, hash);
if (!existing_miss) {
hmap_insert(&todo, &miss->hmap_node, hash);
miss->ofproto = ofproto;
miss->key = upcall->key;
miss->key_len = upcall->key_len;
miss->upcall_type = upcall->type;
list_init(&miss->packets);
n_misses++;
} else {
miss = existing_miss;
}
list_push_back(&miss->packets, &upcall->packet->list_node);
}
flow_hash computes the hash of miss->flow, which is then used to look up a struct flow_miss * in the todo hmap. If nothing is found, this is the first miss for the flow, so the flow_miss is initialized and inserted into todo; finally the packet is appended to the flow_miss->packets list. This confirms the earlier conclusion: for a batch of upcalls, packets belonging to the same flow_miss are chained under that flow_miss and processed together.
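flow_miss_find is the standard OVS find-in-hmap idiom; it is likely implemented along these lines (a sketch matching the 1.x source, minor details may differ):
static struct flow_miss *
flow_miss_find(struct hmap *todo, const struct flow *flow, uint32_t hash)
{
    struct flow_miss *miss;

    /* Visit only the entries with a matching hash, then confirm with a
     * full flow comparison. */
    HMAP_FOR_EACH_WITH_HASH (miss, hmap_node, hash, todo) {
        if (flow_equal(&miss->flow, flow)) {
            return miss;
        }
    }
    return NULL;
}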
OVS defines a facet to represent a userspace program's (i.e. vswitchd's) view of a matched flow. The kernel has its own view of the same flow; the facet captures the part the two views agree on, and each differing part is represented by a subfacet. struct subfacet is where the datapath actions live.
If the flow key computed by the datapath matches the flow key vswitchd computes from the packet exactly, the facet contains a single subfacet. If the datapath's flow key carries more fields than the one vswitchd derives from the packet, each extra variant becomes a subfacet of its own.
struct subfacet {
/* Owners. */
struct hmap_node hmap_node; /* In struct ofproto_dpif 'subfacets' list. */
struct list list_node; /* In struct facet's 'subfacets' list. */
struct facet *facet; /* Owning facet. */
/* Key.
*
* To save memory in the common case, 'key' is NULL if 'key_fitness' is
* ODP_FIT_PERFECT, that is, odp_flow_key_from_flow() can accurately
* regenerate the ODP flow key from ->facet->flow. */
enum odp_key_fitness key_fitness;
struct nlattr *key;
int key_len;
long long int used; /* Time last used; time created if not used. */
uint64_t dp_packet_count; /* Last known packet count in the datapath. */
uint64_t dp_byte_count; /* Last known byte count in the datapath. */
/* Datapath actions.
*
* These should be essentially identical for every subfacet in a facet, but
* may differ in trivial ways due to VLAN splinters. */
size_t actions_len; /* Number of bytes in actions[]. */
struct nlattr *actions; /* Datapath actions. */
enum slow_path_reason slow; /* 0 if fast path may be used. */
enum subfacet_path path; /* Installed in datapath? */
};
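To make the ownership concrete, here is a purely illustrative walk over a facet's subfacets, using the real LIST_FOR_EACH macro from lib/list.h and the fields shown above (the helper itself is hypothetical):
/* Hypothetical helper: dump every subfacet owned by one facet. */
static void
facet_dump_subfacets(const struct facet *facet)
{
    struct subfacet *subfacet;

    LIST_FOR_EACH (subfacet, list_node, &facet->subfacets) {
        /* Each subfacet carries its own datapath key and actions; the
         * actions are essentially identical across subfacets of a facet. */
        printf("subfacet: key_len=%d actions_len=%zu\n",
               subfacet->key_len, subfacet->actions_len);
    }
}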
Now let's look at handle_flow_miss:
/* Handles flow miss 'miss' on 'ofproto'. May add any required datapath
* operations to 'ops', incrementing '*n_ops' for each new op. */
static void
handle_flow_miss(struct ofproto_dpif *ofproto, struct flow_miss *miss,
struct flow_miss_op *ops, size_t *n_ops)
{
struct facet *facet;
uint32_t hash;
/* The caller must ensure that miss->hmap_node.hash contains
* flow_hash(miss->flow, 0). */
hash = miss->hmap_node.hash;
facet = facet_lookup_valid(ofproto, &miss->flow, hash);
This looks up the flow in struct ofproto_dpif *ofproto, the structure representing the datapath. ofproto->facets is a hash map: the miss flow's hash is computed first, then the hmap_node chain for that hash is scanned for a matching flow. The comparison is brute force: a straight memcmp.
if (!facet) {
struct rule_dpif *rule = rule_dpif_lookup(ofproto, &miss->flow);
if (!flow_miss_should_make_facet(ofproto, miss, hash)) {
handle_flow_miss_without_facet(miss, rule, ops, n_ops);
At this point it is judged not worth creating a flow facet: for trivial traffic, setting up a facet costs more than it saves.
return;
}
facet = facet_create(rule, &miss->flow, hash);
Otherwise, we create a facet for this flow.
}
handle_flow_miss_with_facet(miss, facet, ops, n_ops);
}
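The "brute force" comparison mentioned above is flow_equal, which in lib/flow.h is little more than a memcmp (paraphrased; the exact form may differ by version):
static inline bool
flow_equal(const struct flow *a, const struct flow *b)
{
    return !memcmp(a, b, sizeof *a);
}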
struct flow_miss is a wrapper around a flow that speeds up batched handling of missed flows. In most cases the facet does get created:
2012-10-26T07:15:43Z|22522|ofproto_dpif|INFO|[qinq] miss flow, create facet: vlan_tci 0, proto 0x806, in_port 1, src mac 0:16:3e:83:0:1, dst mac 0:25:9e:5d:62:53
2012-10-26T07:15:43Z|22529|ofproto_dpif|INFO|[qinq] miss flow, create facet: vlan_tci 0, proto 0x806, in_port 2, src mac 0:25:9e:5d:62:53, dst mac 0:16:3e:83:0:1
The logs show one full-duplex conversation creating two flows, each with its own facet.
Next comes handle_flow_miss_with_facet, which calls subfacet_make_actions to generate the actions. That function first calls action_xlate_ctx_init to initialize an action_xlate_ctx structure, defined as follows:
struct action_xlate_ctx {
/* action_xlate_ctx_init() initializes these members. */
/* The ofproto. */
struct ofproto_dpif *ofproto;
/* Flow to which the OpenFlow actions apply. xlate_actions() will modify
* this flow when actions change header fields. */
struct flow flow;
/* The packet corresponding to 'flow', or a null pointer if we are
* revalidating without a packet to refer to. */
const struct ofpbuf *packet;
/* Should OFPP_NORMAL update the MAC learning table? Should "learn"
* actions update the flow table?
*
* We want to update these tables if we are actually processing a packet,
* or if we are accounting for packets that the datapath has processed, but
* not if we are just revalidating. */
bool may_learn;
/* The rule that we are currently translating, or NULL. */
struct rule_dpif *rule;
/* Union of the set of TCP flags seen so far in this flow. (Used only by
* NXAST_FIN_TIMEOUT. Set to zero to avoid updating rules'
* timeouts.) */
uint8_t tcp_flags;
/* xlate_actions() initializes and uses these members. The client might want
* to look at them after it returns. */
struct ofpbuf *odp_actions; /* Datapath actions. */
tag_type tags; /* Tags associated with actions. */
enum slow_path_reason slow; /* 0 if fast path may be used. */
bool has_learn; /* Actions include NXAST_LEARN? */
bool has_normal; /* Actions output to OFPP_NORMAL? */
bool has_fin_timeout; /* Actions include NXAST_FIN_TIMEOUT? */
uint16_t nf_output_iface; /* Output interface index for NetFlow. */
mirror_mask_t mirrors; /* Bitmap of associated mirrors. */
/* xlate_actions() initializes and uses these members, but the client has no
* reason to look at them. */
int recurse; /* Recursion level, via xlate_table_action. */
bool max_resubmit_trigger; /* Recursed too deeply during translation. */
struct flow base_flow; /* Flow at the last commit. */
uint32_t orig_skb_priority; /* Priority when packet arrived. */
uint8_t table_id; /* OpenFlow table ID where flow was found. */
uint32_t sflow_n_outputs; /* Number of output ports. */
uint16_t sflow_odp_port; /* Output port for composing sFlow action. */
uint16_t user_cookie_offset;/* Used for user_action_cookie fixup. */
bool exit; /* No further actions should be processed. */
struct flow orig_flow; /* Copy of original flow. */
};
Then xlate_actions is called. OpenFlow 1.0 defines the following actions:
enum ofp10_action_type {
OFPAT10_OUTPUT, /* Output to switch port. */
OFPAT10_SET_VLAN_VID, /* Set the 802.1q VLAN id. */
OFPAT10_SET_VLAN_PCP, /* Set the 802.1q priority. */
OFPAT10_STRIP_VLAN, /* Strip the 802.1q header. */
OFPAT10_SET_DL_SRC, /* Ethernet source address. */
OFPAT10_SET_DL_DST, /* Ethernet destination address. */
OFPAT10_SET_NW_SRC, /* IP source address. */
OFPAT10_SET_NW_DST, /* IP destination address. */
OFPAT10_SET_NW_TOS, /* IP ToS (DSCP field, 6 bits). */
OFPAT10_SET_TP_SRC, /* TCP/UDP source port. */
OFPAT10_SET_TP_DST, /* TCP/UDP destination port. */
OFPAT10_ENQUEUE, /* Output to queue. */
OFPAT10_VENDOR = 0xffff
};
Each action type takes its own data structure, e.g.:
/* Action structure for OFPAT10_SET_VLAN_VID. */
struct ofp_action_vlan_vid {
ovs_be16 type; /* OFPAT10_SET_VLAN_VID. */
ovs_be16 len; /* Length is 8. */
ovs_be16 vlan_vid; /* VLAN id. */
uint8_t pad[2];
};
/* Action structure for OFPAT10_SET_VLAN_PCP. */
struct ofp_action_vlan_pcp {
ovs_be16 type; /* OFPAT10_SET_VLAN_PCP. */
ovs_be16 len; /* Length is 8. */
uint8_t vlan_pcp; /* VLAN priority. */
uint8_t pad[3];
};
union ofp_action {
ovs_be16 type;
struct ofp_action_header header;
struct ofp_action_vendor_header vendor;
struct ofp_action_output output;
struct ofp_action_vlan_vid vlan_vid;
struct ofp_action_vlan_pcp vlan_pcp;
struct ofp_action_nw_addr nw_addr;
struct ofp_action_nw_tos nw_tos;
struct ofp_action_tp_port tp_port;
};
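For a feel of the wire format, here is how one of these actions might be filled in by a controller or a test harness (hypothetical helper; the values follow the struct comments above):
/* Build an OFPAT10_SET_VLAN_VID action that rewrites traffic onto the
 * given VLAN; 2 + 2 + 2 + 2 pad = 8 bytes, matching "Length is 8." */
static void
make_set_vlan_vid(struct ofp_action_vlan_vid *vid, uint16_t vlan)
{
    memset(vid, 0, sizeof *vid);
    vid->type = htons(OFPAT10_SET_VLAN_VID);
    vid->len = htons(sizeof *vid);
    vid->vlan_vid = htons(vlan);
}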
do_xlate_actions takes an array of struct ofp_action and performs a different operation for each one, e.g.:
case OFPUTIL_OFPAT10_OUTPUT:
xlate_output_action(ctx, &ia->output);
break;
case OFPUTIL_OFPAT10_SET_VLAN_VID:
ctx->flow.vlan_tci &= ~htons(VLAN_VID_MASK);
ctx->flow.vlan_tci |= ia->vlan_vid.vlan_vid | htons(VLAN_CFI);
break;
case OFPUTIL_OFPAT10_SET_VLAN_PCP:
ctx->flow.vlan_tci &= ~htons(VLAN_PCP_MASK);
ctx->flow.vlan_tci |= htons(
(ia->vlan_pcp.vlan_pcp << VLAN_PCP_SHIFT) | VLAN_CFI);
break;
case OFPUTIL_OFPAT10_STRIP_VLAN:
ctx->flow.vlan_tci = htons(0);
break;
For packet forwarding the key case is xlate_output_action, which calls xlate_output_action__. The port passed in is an OpenFlow port number or one of the reserved control values, as the definition of enum ofp_port shows:
enum ofp_port {
/* Maximum number of physical switch ports. */
OFPP_MAX = 0xff00,
/* Fake output "ports". */
OFPP_IN_PORT = 0xfff8, /* Send the packet out the input port. This
virtual port must be explicitly used
in order to send back out of the input
port. */
OFPP_TABLE = 0xfff9, /* Perform actions in flow table.
NB: This can only be the destination
port for packet-out messages. */
OFPP_NORMAL = 0xfffa, /* Process with normal L2/L3 switching. */
OFPP_FLOOD = 0xfffb, /* All physical ports except input port and
those disabled by STP. */
OFPP_ALL = 0xfffc, /* All physical ports except input port. */
OFPP_CONTROLLER = 0xfffd, /* Send to controller. */
OFPP_LOCAL = 0xfffe, /* Local openflow "port". */
OFPP_NONE = 0xffff /* Not associated with a physical port. */
};
In xlate_output_action__ most traffic falls into the OFPP_NORMAL case, which calls xlate_normal. There, mac_learning_lookup searches the MAC table for the packet's output port, then output_normal is called, which ultimately lands in compose_output_action:
static void
compose_output_action__(struct action_xlate_ctx *ctx, uint16_t ofp_port,
bool check_stp)
{
const struct ofport_dpif *ofport = get_ofp_port(ctx->ofproto, ofp_port);
uint16_t odp_port = ofp_port_to_odp_port(ofp_port);
ovs_be16 flow_vlan_tci = ctx->flow.vlan_tci;
uint8_t flow_nw_tos = ctx->flow.nw_tos;
uint16_t out_port;
...
out_port = vsp_realdev_to_vlandev(ctx->ofproto, odp_port,
ctx->flow.vlan_tci);
if (out_port != odp_port) {
ctx->flow.vlan_tci = htons(0);
}
commit_odp_actions(&ctx->flow, &ctx->base_flow, ctx->odp_actions);
nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_OUTPUT, out_port);
ctx->sflow_odp_port = odp_port;
ctx->sflow_n_outputs++;
ctx->nf_output_iface = ofp_port;
ctx->flow.vlan_tci = flow_vlan_tci;
ctx->flow.nw_tos = flow_nw_tos;
}
commit_odp_actions encodes all of the accumulated actions into nlattr format in ctx->odp_actions; the nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_OUTPUT, out_port) that follows appends the packet's output port. With that, a flow's action list is more or less fully assembled.
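To make the encoding concrete, here is a hedged sketch that composes a minimal datapath action list with the lib/ netlink helpers (nl_msg_put_u32 and ofpbuf_use_stub are real OVS functions; the two-port output sequence is invented for illustration):
/* Compose a datapath action list that clones the packet to ports 1 and 2. */
struct ofpbuf odp_actions;
uint64_t stub[64];

ofpbuf_use_stub(&odp_actions, stub, sizeof stub);

/* Each OVS_ACTION_ATTR_OUTPUT attribute means "send a copy out this
 * datapath port". */
nl_msg_put_u32(&odp_actions, OVS_ACTION_ATTR_OUTPUT, 1);
nl_msg_put_u32(&odp_actions, OVS_ACTION_ATTR_OUTPUT, 2);

/* odp_actions.data / odp_actions.size are now ready to be handed to the
 * kernel as the actions of a flow-setup request. */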
Next let's discuss the CAM table inside vswitchd; the code lives in lib/mac-learning.h and lib/mac-learning.c.
vswitchd maintains an internal MAC/port CAM table whose entries age out after 300 seconds. The table also defines the notion of a flooding VLAN: if a VLAN is marked flooding, no addresses are learned on it and all forwarding within that VLAN is done by flooding.
/* A MAC learning table entry. */
struct mac_entry {
struct hmap_node hmap_node; /* Node in a mac_learning hmap. */
struct list lru_node; /* Element in 'lrus' list. */
time_t expires; /* Expiration time. */
time_t grat_arp_lock; /* Gratuitous ARP lock expiration time. */
uint8_t mac[ETH_ADDR_LEN]; /* Known MAC address. */
uint16_t vlan; /* VLAN tag. */
tag_type tag; /* Tag for this learning entry. */
/* Learned port. */
union {
void *p;
int i;
} port;
};
/* MAC learning table. */
struct mac_learning {
    struct hmap table;          /* Learning table. */
    struct list lrus;           /* In-use entries, least recently used at the
                                   front, most recently used at the back. */
    uint32_t secret;            /* Secret for randomizing hash table. */
    unsigned long *flood_vlans; /* Bitmap of learning disabled VLANs. */
    unsigned int idle_time;     /* Max age before deleting an entry. */
};
Each mac_entry hangs off mac_learning->table by its hmap_node and sits on the mac_learning->lrus LRU list by its lru_node; idle_time is the maximum aging time before an entry is deleted.
static uint32_t
mac_table_hash(const struct mac_learning *ml, const uint8_t mac[ETH_ADDR_LEN],
uint16_t vlan)
{
unsigned int mac1 = get_unaligned_u32((uint32_t *) mac);
unsigned int mac2 = get_unaligned_u16((uint16_t *) (mac + 4));
return hash_3words(mac1, mac2 | (vlan << 16), ml->secret);
}
The mac_entry hash is computed by hash_3words over the MAC address, the VLAN, and mac_learning->secret.
mac_entry_lookup: checks whether a mac_entry already exists for a given MAC address and VLAN.
get_lru: returns the first mac_entry on the LRU list.
mac_learning_create/mac_learning_destroy: create and destroy the mac_learning table.
mac_learning_may_learn: returns true if the VLAN is not a flooding VLAN and the MAC address is not a multicast address.
mac_learning_insert: inserts a mac_entry into mac_learning. It first checks via mac_entry_lookup whether an entry for the MAC/VLAN pair already exists; if not, and the table already holds MAC_MAX entries, the oldest entry is aged out, then a new mac_entry is created and inserted into the CAM table.
mac_learning_lookup: calls mac_entry_lookup to look up the entry for a MAC address on a given VLAN in the CAM table.
mac_learning_run: sweeps the table, expiring mac_entry structures whose time is up. A sketch of how these calls combine in the OFPP_NORMAL path follows.
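Putting it together, the learn-then-forward pattern in xlate_normal looks roughly like this (a condensed sketch modeled on ofproto/ofproto-dpif.c of the same era; bundle selection, gratuitous-ARP locking, and error paths are omitted, and flood_packets is a hypothetical stand-in for the real flood loop):
/* Learn the source MAC, then look up the destination; flood on a miss. */
static void
xlate_normal_sketch(struct action_xlate_ctx *ctx, uint16_t vlan,
                    struct ofbundle *in_bundle)
{
    struct mac_learning *ml = ctx->ofproto->ml;
    struct mac_entry *mac;

    /* Learning side: remember which bundle the source MAC came from. */
    if (ctx->may_learn && mac_learning_may_learn(ml, ctx->flow.dl_src, vlan)) {
        mac = mac_learning_insert(ml, ctx->flow.dl_src, vlan);
        mac->port.p = in_bundle;
    }

    /* Forwarding side: known destination -> one port, unknown -> flood. */
    mac = mac_learning_lookup(ml, ctx->flow.dl_dst, vlan, &ctx->tags);
    if (mac) {
        output_normal(ctx, mac->port.p, vlan);
    } else {
        flood_packets(ctx, vlan);   /* hypothetical stand-in */
    }
}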
How to Port Open vSwitch to New Software or Hardware
====================================================

Open vSwitch (OVS) is intended to be easily ported to new software and
hardware platforms. This document describes the types of changes that are
most likely to be necessary in porting OVS to Unix-like platforms. (Porting
OVS to other kinds of platforms is likely to be more difficult.)

Vocabulary
----------

For historical reasons, different words are used for essentially the same
concept in different areas of the Open vSwitch source tree. Here is a
concordance, indexed by the area of the source tree:

    datapath/               vport        ---
    vswitchd/               iface        port
    ofproto/                port         bundle
    ofproto/bond.c          slave        bond
    lib/lacp.c              slave        lacp
    lib/netdev.c            netdev       ---
    database                Interface    Port

Open vSwitch Architectural Overview
-----------------------------------

The following diagram shows the very high-level architecture of Open vSwitch
from a porter's perspective.

    +-------------------+
    |    ovs-vswitchd   |<-->ovsdb-server
    +-------------------+
    |      ofproto      |<-->OpenFlow controllers
    +--------+-+--------+
    | netdev | | ofproto|
    +--------+ |provider|
    | netdev | +--------+
    |provider|
    +--------+

Some of the components are generic. Modulo bugs or inadequacies, these
components should not need to be modified as part of a port:

- "ovs-vswitchd" is the main Open vSwitch userspace program, in vswitchd/.
  It reads the desired Open vSwitch configuration from the ovsdb-server
  program over an IPC channel and passes this configuration down to the
  "ofproto" library. It also passes certain status and statistical
  information from ofproto back into the database.

- "ofproto" is the Open vSwitch library, in ofproto/, that implements an
  OpenFlow switch. It talks to OpenFlow controllers over the network and to
  switch hardware or software through an "ofproto provider", explained
  further below.

- "netdev" is the Open vSwitch library, in lib/netdev.c, that abstracts
  interacting with network devices, that is, Ethernet interfaces. The netdev
  library is a thin layer over "netdev provider" code, explained further
  below.

The other components may need attention during a port. You will almost
certainly have to implement a "netdev provider". Depending on the type of
port you are doing and the desired performance, you may also have to
implement an "ofproto provider" or a lower-level component called a "dpif"
provider.

The following sections talk about these components in more detail.

Writing a netdev Provider
-------------------------

A "netdev provider" implements an operating system and hardware specific
interface to "network devices", e.g. eth0 on Linux. Open vSwitch must be
able to open each port on a switch as a netdev, so you will need to
implement a "netdev provider" that works with your switch hardware and
software.

struct netdev_class, in lib/netdev-provider.h, defines the interfaces
required to implement a netdev. That structure contains many function
pointers, each of which has a comment that is meant to describe its behavior
in detail. If the requirements are unclear, please report this as a bug.

The netdev interface can be divided into a few rough categories:

* Functions required to properly implement OpenFlow features. For example,
  OpenFlow requires the ability to report the Ethernet hardware address of a
  port. These functions must be implemented for minimally correct operation.

* Functions required to implement optional Open vSwitch features. For
  example, the Open vSwitch support for in-band control requires netdev
  support for inspecting the TCP/IP stack's ARP table. These functions must
  be implemented if the corresponding OVS features are to work, but may be
  omitted initially.

* Functions needed in some implementations but not in others. For example,
  most kinds of ports (see below) do not need functionality to receive
  packets from a network device.

The existing netdev implementations may serve as useful examples during a
port:

* lib/netdev-linux.c implements netdev functionality for Linux network
  devices, using Linux kernel calls. It may be a good place to start for
  full-featured netdev implementations.

* lib/netdev-vport.c provides support for "virtual ports" implemented by the
  Open vSwitch datapath module for the Linux kernel. This may serve as a
  model for minimal netdev implementations.

* lib/netdev-dummy.c is a fake netdev implementation useful only for
  testing.

Porting Strategies
------------------

After a netdev provider has been implemented for a system's network devices,
you may choose among three basic porting strategies.

The lowest-effort strategy is to use the "userspace switch" implementation
built into Open vSwitch. This ought to work, without writing any more code,
as long as the netdev provider that you implemented supports receiving
packets. It yields poor performance, however, because every packet passes
through the ovs-vswitchd process. See [INSTALL.userspace.md] for
instructions on how to configure a userspace switch.

If the userspace switch is not the right choice for your port, then you will
have to write more code. You may implement either an "ofproto provider" or a
"dpif provider". Which you should choose depends on a few different factors:

* Only an ofproto provider can take full advantage of hardware with built-in
  support for wildcards (e.g. an ACL table or a TCAM).

* A dpif provider can take advantage of the Open vSwitch built-in
  implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and other
  features. An ofproto provider has to provide its own implementations, if
  the hardware can support them at all.

* A dpif provider is usually easier to implement, but most appropriate for
  software switching. It "explodes" wildcard rules into exact-match entries
  (with an optional wildcard mask). This allows fast hash lookups in
  software, but makes inefficient use of TCAMs in hardware that support
  wildcarding.

The following sections describe how to implement each kind of port.

ofproto Providers
-----------------

An "ofproto provider" is what ofproto uses to directly monitor and control
an OpenFlow-capable switch. struct ofproto_class, in
ofproto/ofproto-provider.h, defines the interfaces to implement an ofproto
provider for new hardware or software. That structure contains many function
pointers, each of which has a comment that is meant to describe its behavior
in detail. If the requirements are unclear, please report this as a bug.

The ofproto provider interface is preliminary. Please let us know if it
seems unsuitable for your purpose. We will try to improve it.

Writing a dpif Provider
-----------------------

Open vSwitch has a built-in ofproto provider named "ofproto-dpif", which is
built on top of a library for manipulating datapaths, called "dpif". A
"datapath" is a simple flow table, one that is only required to support
exact-match flows, that is, flows without wildcards. When a packet arrives
on a network device, the datapath looks for it in this table. If there is a
match, then it performs the associated actions. If there is no match, the
datapath passes the packet up to ofproto-dpif, which maintains the full
OpenFlow flow table. If the packet matches in this flow table, then
ofproto-dpif executes its actions and inserts a new entry into the dpif flow
table. (Otherwise, ofproto-dpif passes the packet up to ofproto to send the
packet to the OpenFlow controller, if one is configured.)

When calculating the dpif flow, ofproto-dpif generates an exact-match flow
that describes the missed packet. It makes an effort to figure out what
fields can be wildcarded based on the switch's configuration and OpenFlow
flow table. The dpif is free to ignore the suggested wildcards and only
support the exact-match entry. However, if the dpif supports wildcarding,
then it can use the masks to match multiple flows with fewer entries and
potentially significantly reduce the number of flow misses handled by
ofproto-dpif.

The "dpif" library in turn delegates much of its functionality to a "dpif
provider". The following diagram shows how dpif providers fit into the Open
vSwitch architecture:

                _
               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-| |
     userspace |   +--------+ |  dpif  | |
               |   | netdev | +--------+ |
               |   |provider| |  dpif  | |
               |   +---||---+ +--------+ |
               |       ||     |  dpif  | | implementation of
               |       ||     |provider| | ofproto provider
               |_      ||     +---||---+ |
                       ||         ||     |
                _  +---||-----+---||---+ |
               |   |          |datapath| |
        kernel |   |          +--------+_|
               |   |                   |
               |_  +--------||---------+
                            ||
                   physical NIC

struct dpif_class, in lib/dpif-provider.h, defines the interfaces required
to implement a dpif provider for new hardware or software. That structure
contains many function pointers, each of which has a comment that is meant
to describe its behavior in detail. If the requirements are unclear, please
report this as a bug.

There are two existing dpif implementations that may serve as useful
examples during a port:

* lib/dpif-netlink.c is a Linux-specific dpif implementation that talks to
  an Open vSwitch-specific kernel module (whose sources are in the
  "datapath" directory). The kernel module performs all of the switching
  work, passing packets that do not match any flow table entry up to
  userspace. This dpif implementation is essentially a wrapper around calls
  into the kernel module.

* lib/dpif-netdev.c is a generic dpif implementation that performs all
  switching internally. This is how the Open vSwitch userspace switch is
  implemented.
vswitchd is the heart of OVS: all of the OpenFlow logic is implemented there. OVS generally breaks down into three parts: the datapath, vswitchd, and ovsdb. The datapath is tied to a concrete data-plane platform, say a whitebox switch or the Linux kernel, and it is not a mandatory component. ovsdb stores the vswitch's own configuration: ports, topology, rules, and so on. In the OVS distribution vswitchd ships as a userspace process, but that is not set in stone; the excerpt above describes how to port OVS to other platforms and is, for now, about the only official document that outlines the OVS architecture.
vswitchd itself is layered. The daemon layer at the top talks to ovsdb, pushing configuration down and propagating updates. In the middle sits the ofproto layer, which talks to OpenFlow controllers and exposes the ofproto provider interface via ofproto_class, so that the platform-specific OpenFlow implementations all share one interface.
In OVS terms, a netdev stands for a platform's device implementation, e.g. the Linux kernel's net_device, or a port on a switch platform after porting. struct netdev_class defines the interface a netdev provider must implement; a platform supplies these uniform entry points to create, destroy, open, and close netdev devices, among other operations. Each netdev type is registered via netdev_register_provider, and vswitchd keeps every registered type in a struct cmap named netdev_classes.
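As a rough illustration of the provider pattern (a schematic sketch only: the real struct netdev_class in lib/netdev-provider.h has many more callbacks, and their names and signatures shift between OVS versions; "myplatform" is a made-up provider):
/* Schematic provider: only .type is shown; the real structure has dozens
 * of callbacks (construct/destruct, send/receive, get_etheraddr, ...). */
static const struct netdev_class myplatform_netdev_class = {
    .type = "myplatform",   /* name Interface records use to pick it */
    /* ... remaining callbacks wired to the platform's primitives ... */
};

void
myplatform_netdev_register(void)
{
    /* Adds the class to vswitchd's table of netdev types (the cmap
     * netdev_classes mentioned above). */
    netdev_register_provider(&myplatform_netdev_class);
}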
The ofproto layer defines the OpenFlow interface through ofproto_class. Several other data structures matter alongside it: struct ofproto, struct ofport, struct rule, struct oftable, and struct ofgroup (their relationships are sketched after this list).
1. struct ofproto represents one OpenFlow switch; it holds the struct ofproto_class, a hash map of struct ofport, the struct oftable array, a hash map of struct ofgroup, etc.
2. struct ofport represents one port of the OpenFlow switch and is associated with a struct netdev device abstraction.
3. struct rule represents one OpenFlow rule; a rule contains a set of struct rule_actions.
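A rough containment sketch of those relationships (illustrative pseudo-declarations, simplified from ofproto/ofproto-provider.h, not the actual definitions):
struct ofproto {                        /* One OpenFlow switch. */
    const struct ofproto_class *ofproto_class; /* Provider hooks. */
    struct hmap ports;                  /* Contains "struct ofport"s. */
    struct oftable *tables;             /* Array of flow tables... */
    int n_tables;                       /* ...this many of them. */
    struct hmap groups;                 /* Contains "struct ofgroup"s. */
};

struct ofport {                         /* One switch port. */
    struct ofproto *ofproto;            /* Owning switch. */
    struct netdev *netdev;              /* Underlying device abstraction. */
};

struct rule {                           /* One OpenFlow flow entry. */
    struct ofproto *ofproto;            /* Owning switch. */
    const struct rule_actions *actions; /* The rule's action set. */
};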