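OpenFlow Controller: SDN splits the traditional network architecture into a Control Plane and a Data Plane. The OpenFlow Controller is the Control Plane: it remotely manages OpenFlow Switches over an agreed protocol, adding, deleting, or modifying their Flow Entries.
OpenFlow Switch: a switch that implements the OpenFlow Switch specification.
- OpenFlow-only Switch: every packet is processed exclusively by the OpenFlow flow-table Pipeline.
- OpenFlow-hybrid Switch: a switch that supports both the traditional network protocol stack and the OpenFlow protocol.
OpenFlow Channel: the interface an OpenFlow Switch exposes to the outside. It carries the protocol messages from a Remote Controller that manipulate the switch.
OpenFlow Protocol: the protocol specification for message exchange between a Remote Controller and an OpenFlow Switch.
Flow Table: holds multiple Flow Entries that control where packets go.
Group Table: complements the Flow Table with more advanced forwarding behavior, such as Flooding, Multipath, Fast Reroute, and Link Aggregation.
Meter Table: applies QoS policies to packets that match a flow entry.
Pipeline: a chain of linked Flow Tables that determines the sequence of operations applied to a packet.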
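An OpenFlow switch forwards packets based on multiple flow tables (Flow Table) and one group table (Group Table), and communicates with the OpenFlow controller over the OpenFlow Channel. The controller can push configuration to the switch to add, delete, or update flow entries (Flow Entry) in its flow tables. The switch can also forward a packet to the controller and let the controller decide how to handle it.
The OpenFlow specification mainly defines the functional components of an OpenFlow switch and the communication channel between the switch and the controller. The specification is still evolving; this article is based on OpenFlow 1.3.5.
The OpenFlow specification is divided into four main parts: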
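An OpenFlow switch can be configured with up to 65280 (0xff00) ports. The specification divides switch ports into 3 categories:
OpenFlow currently defines 8 reserved ports: ALL, CONTROLLER, TABLE, IN_PORT, ANY, LOCAL, NORMAL, and FLOOD.
Reserved port types are identified by 16-bit values (the OpenFlow 1.0 encoding; from OpenFlow 1.1 on, port numbers are 32 bits wide):
The last 3 (LOCAL, NORMAL, and FLOOD) are optional; NORMAL and FLOOD exist only on OpenFlow-hybrid switches.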
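As a reference sketch, the 16-bit reserved port numbers can be written down as constants. The values below follow the `ofp_port` enum of the OpenFlow 1.0 openflow.h as best recalled here, so treat them as an assumption to verify against the header; the helper name `classify_port` is hypothetical.

```python
# Reserved port numbers in the 16-bit OpenFlow 1.0 numbering (assumed from
# the ofp_port enum in openflow.h; verify against the header you build with).
OFPP_MAX = 0xFF00  # 65280: highest number usable for a regular port

RESERVED_PORTS = {
    0xFFF8: "IN_PORT",
    0xFFF9: "TABLE",
    0xFFFA: "NORMAL",
    0xFFFB: "FLOOD",
    0xFFFC: "ALL",
    0xFFFD: "CONTROLLER",
    0xFFFE: "LOCAL",
    0xFFFF: "NONE",  # named ANY in OpenFlow 1.1 and later
}


def classify_port(port_no: int) -> str:
    """Return the reserved name, 'regular', or 'invalid' for a port number."""
    if port_no in RESERVED_PORTS:
        return RESERVED_PORTS[port_no]
    if 1 <= port_no <= OFPP_MAX:
        return "regular"
    return "invalid"


if __name__ == "__main__":
    for n in (1, 65280, 0xFFFD, 0xFFF0):
        print(hex(n), classify_port(n))
```

The gap between 0xff00 and 0xfff8 is left unassigned in this sketch, which is why such numbers are reported as invalid.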
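An OpenFlow switch can hold multiple flow tables, and each flow table can contain multiple flow entries. A packet is matched against the entries of every table it traverses, so a single packet can match multiple flow entries.
**OpenFlow matches and processes packets using flow entries that are either user-defined or preset.** All flow entries are organized into Flow Tables, and within one Flow Table they are matched in order of priority. An OpenFlow switch contains at least one Flow Table and may contain several, numbered sequentially from 0. The specification defines a pipelined processing flow: a packet entering the switch must start matching at Flow Table 0. A Goto-Table instruction can skip ahead to any higher-numbered Flow Table, but it can never jump back to a lower-numbered one. When a packet matches a flow entry, the switch first updates that entry's statistics (its counters, such as the total number of matched packets and bytes) and then carries out the entry's instructions: for example, jump to a later Flow Table for further processing, modify the packet's Action Set, or execute actions immediately. Once the packet has reached its last Flow Table, every Action in its Action Set is executed, such as forwarding to a port, modifying a packet field, or dropping the packet. The OpenFlow specification describes and defines all currently supported Instructions and Actions in detail.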
The OpenFlow pipeline (Pipeline Processing):
The main Actions in an Action Set include:
A Set-Field Action can be of the following types:
NOTE: To implement QoS, CoS (Class of Service) is used together with ToS. ToS lives in the IPv4 header; Set VLAN priority corresponds to CoS.
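The match fields of a flow entry can match on any field of a packet's L2, L3, or L4 headers, for example the source MAC address of the Ethernet frame, the protocol type and IP addresses of the IP packet, or the TCP/UDP port numbers. The specification also lets switch vendors optionally support wildcard matching.
Match fields in v1.0 (v1.3 defines more field types):
How a packet is processed inside an OpenFlow switch:
The switch first parses the incoming packet and starts matching at Table 0, checking that table's flow entries from highest to lowest priority. Within one flow table, a packet matches at most one flow entry. The matched entry's instructions decide whether processing continues in another table. If not, matching stops and the action set is executed. If so, matching continues in the next table, whose ID must be greater than the current table's ID. If the packet matches no regular entry and the table has a table-miss entry, the instructions of that entry are executed; typically they send the packet to the OpenFlow controller, drop it, or pass it on to another flow table. Without a table-miss entry, the packet is dropped by default. A table-miss entry normally has all Match Fields empty (fully wildcarded) and priority 0.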
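The lookup procedure described above can be sketched as a small simulation. This is a simplified model, not a switch implementation: the action set is kept as a plain ordered list (the specification keeps at most one action per type and executes them in a fixed order), and the names `FlowEntry`, `lookup`, and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowEntry:
    priority: int
    match: dict                        # field -> required value; {} matches everything
    actions: list = field(default_factory=list)  # added to the action set on a hit
    goto_table: Optional[int] = None   # must be larger than the current table id
    packets: int = 0                   # counter, updated on every hit


def lookup(table, packet):
    """Return the highest-priority entry whose match fields all equal the packet's."""
    hits = [e for e in table if all(packet.get(k) == v for k, v in e.match.items())]
    return max(hits, key=lambda e: e.priority) if hits else None


def run_pipeline(tables, packet):
    """Walk the tables starting at table 0 and return the resulting action set."""
    action_set, table_id = [], 0
    while True:
        entry = lookup(tables[table_id], packet)
        if entry is None:              # no match and no table-miss entry: drop
            return ["drop"]
        entry.packets += 1
        action_set.extend(entry.actions)
        if entry.goto_table is None:   # pipeline ends here: execute the action set
            return action_set or ["drop"]
        if entry.goto_table <= table_id:
            raise ValueError("goto_table must point to a higher-numbered table")
        table_id = entry.goto_table


tables = {
    0: [FlowEntry(10, {"in_port": 1}, ["set_vlan:10"], goto_table=1),
        FlowEntry(0, {}, ["output:CONTROLLER"])],          # table-miss entry
    1: [FlowEntry(5, {"eth_type": 0x0800}, ["output:2"])],  # no table-miss here
}

if __name__ == "__main__":
    print(run_pipeline(tables, {"in_port": 1, "eth_type": 0x0800}))
    print(run_pipeline(tables, {"in_port": 3}))
```

In this sketch, a packet from port 1 that is not IPv4 reaches table 1, finds neither a match nor a table-miss entry, and is dropped.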
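Modes in which the OpenFlow controller and the OpenFlow switch interact over flow entries:
Besides processing a packet directly, a flow entry can direct it to the Group Table for forwarding. Several flow entries can reference the same group in their actions so that the same actions are applied to their packets, which simplifies flow-table maintenance.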
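A minimal sketch of that sharing, assuming a hypothetical in-memory model in which a group is a list of action buckets and every bucket is executed (the behavior of an "all" type group). The names `groups`, `flows`, and `expand` are illustrative only.

```python
# Hypothetical in-memory model: group id -> list of action buckets.
groups = {1: [["output:2"], ["output:3"]]}

# Two flow entries share group 1 instead of repeating its actions.
flows = [
    {"match": {"ip_dst": "10.0.0.1"}, "actions": ["group:1"]},
    {"match": {"ip_dst": "10.0.0.2"}, "actions": ["group:1"]},
]


def expand(actions):
    """Resolve group references into the concrete actions of every bucket."""
    out = []
    for action in actions:
        if action.startswith("group:"):
            group_id = int(action.split(":", 1)[1])
            for bucket in groups[group_id]:
                out.extend(bucket)
        else:
            out.append(action)
    return out


if __name__ == "__main__":
    groups[1].append(["output:4"])     # one change to the group...
    for flow in flows:                 # ...is seen by every flow that references it
        print(flow["match"], expand(flow["actions"]))
```

Updating the group once changes the effective actions of both flow entries, which is the maintenance benefit the paragraph describes.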
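OpenFlow 1.3 also adds the Meter table, which is attached to flow entries and applies QoS policies to the packets matching them. Its fields are:
This part of the specification defines how an OpenFlow Switch connects to and communicates with a Controller, along with the related message types.
Once an OpenFlow switch has started, it can establish a connection to the OpenFlow controller; this connection is called the "OpenFlow channel". The connection is initiated by the switch toward the controller. For security and high availability, the specification also defines how to encrypt the channel between Controller and Switch and how to set up multiple connections (a main connection and auxiliary connections). The OpenFlow channel supports TLS: the controller and the switch authenticate each other with server and client certificates. The default TCP port is 6633 in early versions of the protocol; later revisions of the specification use the IANA-assigned port 6653. After the switch has connected, the controller sometimes also performs topology discovery for the OpenFlow network, for which it can use LLDP (Link Layer Discovery Protocol).
OpenFlow message header: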
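Every OpenFlow message starts with the same 8-byte header: an 8-bit version, an 8-bit type, a 16-bit total length, and a 32-bit transaction id (xid), all in network byte order. The sketch below packs and parses that header with Python's `struct` module; the function names are hypothetical.

```python
import struct

# version (1 byte), type (1 byte), length (2 bytes), xid (4 bytes), big-endian
OFP_HEADER = struct.Struct("!BBHI")


def pack_header(version: int, msg_type: int, length: int, xid: int) -> bytes:
    """Serialize an OpenFlow header."""
    return OFP_HEADER.pack(version, msg_type, length, xid)


def unpack_header(data: bytes) -> dict:
    """Parse the first 8 bytes of an OpenFlow message."""
    version, msg_type, length, xid = OFP_HEADER.unpack_from(data)
    return {"version": version, "type": msg_type, "length": length, "xid": xid}


# A Hello message without a body is just the header:
# wire version 0x04 is OpenFlow 1.3, type 0 is HELLO.
hello = pack_header(0x04, 0, OFP_HEADER.size, 1)

if __name__ == "__main__":
    print(hello.hex(), unpack_header(hello))
```

The length field covers the header itself, so a body-less message carries the value 8.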
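OpenFlow message types at a glance (v1.0):
The OpenFlow specification defines three categories of messages, each with several subtypes. The controller and the switch use these three categories to establish connections, install flow entries, and exchange information, which gives the controller control over every OpenFlow switch in the network.
Controller-to-Switch messages: initiated by the Controller and received and processed by the Switch; they include the messages below. The Controller mainly uses them to query the Switch's state and to modify its configuration.
Asynchronous messages: sent by the Switch to the Controller to report asynchronous events on the Switch; they include the messages below. For example, when a rule is removed because it timed out, the Switch automatically sends a Flow-Removed message so that the Controller can react, for instance by reinstalling the rule.
Symmetric messages: as the name implies, these messages can be sent in either direction. They are mainly used to establish the connection and to check whether the peer is alive, and include the following three messages.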
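The three categories can be made concrete with the v1.0 message type numbers. The enum below reproduces the `ofp_type` values of OpenFlow 1.0 as best recalled here (an assumption to check against openflow.h). The grouping follows the text of the 1.0 specification, which lists Error among the asynchronous messages, although the header file comments mark it as symmetric. The name `category` is hypothetical.

```python
from enum import IntEnum


class OFPType(IntEnum):
    """OpenFlow 1.0 message types (assumed from the ofp_type enum)."""
    HELLO = 0
    ERROR = 1
    ECHO_REQUEST = 2
    ECHO_REPLY = 3
    VENDOR = 4
    FEATURES_REQUEST = 5
    FEATURES_REPLY = 6
    GET_CONFIG_REQUEST = 7
    GET_CONFIG_REPLY = 8
    SET_CONFIG = 9
    PACKET_IN = 10
    FLOW_REMOVED = 11
    PORT_STATUS = 12
    PACKET_OUT = 13
    FLOW_MOD = 14
    PORT_MOD = 15
    STATS_REQUEST = 16
    STATS_REPLY = 17
    BARRIER_REQUEST = 18
    BARRIER_REPLY = 19
    QUEUE_GET_CONFIG_REQUEST = 20
    QUEUE_GET_CONFIG_REPLY = 21


SYMMETRIC = {OFPType.HELLO, OFPType.ECHO_REQUEST, OFPType.ECHO_REPLY, OFPType.VENDOR}
ASYNCHRONOUS = {OFPType.PACKET_IN, OFPType.FLOW_REMOVED,
                OFPType.PORT_STATUS, OFPType.ERROR}


def category(msg_type: int) -> str:
    """Classify a message type; everything else is a controller-to-switch exchange."""
    t = OFPType(msg_type)
    if t in SYMMETRIC:
        return "symmetric"
    if t in ASYNCHRONOUS:
        return "asynchronous"
    return "controller-to-switch"


if __name__ == "__main__":
    for t in OFPType:
        print(f"{t.value:2d} {t.name:26s} {category(t)}")
```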
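The figure below shows a typical message exchange between an OpenFlow Controller and a Switch:
The last part of the OpenFlow specification defines the data structures of the various OpenFlow messages in detail, including the message header. They are not repeated here; for details, see the data structure definitions in the openflow.h header file of the OpenFlow source code.
The evolution of the OpenFlow protocol has always centered on two aspects: enhancing the control plane, so that the system becomes richer and more flexible, and enhancing the forwarding plane, so that it can match more fields and execute more actions. Each version improves on its predecessor to some degree, but versions from OpenFlow 1.1 onward are not compatible with earlier ones. To give the industry a stable platform, the ONF, which maintains the protocol, designated OpenFlow 1.0 and 1.3 as long-term-support stable versions, and for some time later versions are expected to stay compatible with the stable versions.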
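OpenFlow 1.0 specifies a single flow table in each OpenFlow switch, used for packet lookup, processing, and forwarding. The switch can communicate with only one controller, and the flow table is maintained through OpenFlow messages sent by the controller. The flow table consists of flow entries, each of which is a forwarding rule made up of header fields, counters, and actions.
OpenFlow 1.1 introduces multiple flow tables (256 levels). Splitting the matching process into several steps forms a processing pipeline, which makes effective and flexible use of the multi-table structure inherent in hardware; distributing packet processing across tables also keeps a single flow table from growing excessively. OpenFlow 1.1 further adds VLAN and MPLS tag handling and the Group table: referencing the same group from the actions of different flow entries applies the same actions to their packets and simplifies flow-table maintenance. OpenFlow 1.1 is a watershed in the protocol's history: it breaks compatibility with OpenFlow 1.0, and all later versions build on it. OpenFlow 1.1 also renames the header fields to match fields.
For better extensibility, OpenFlow 1.2 no longer defines the match fields of a rule as a fixed-length structure. It encodes them as TLVs instead, called OXM (OpenFlow Extensible Match), so that users can flexibly specify their own match fields; this adds more match fields while also saving flow-table space. OpenFlow 1.2 also allows multiple controllers to connect to the same switch for reliability, and the controllers can change their roles by sending messages. Another important point is that IPv6 is supported from OpenFlow 1.2 onward.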
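To illustrate the TLV idea, the sketch below encodes single OXM entries. It assumes the OXM header layout of a 16-bit class, a 7-bit field id, a 1-bit has-mask flag, and an 8-bit payload length, and it uses the class and field numbers of the basic match class as best recalled here; verify them against the specification before relying on them. The function name `oxm_tlv` is hypothetical.

```python
import struct

OFPXMC_OPENFLOW_BASIC = 0x8000  # OXM class of the standard match fields (assumed)
OFPXMT_OFB_IN_PORT = 0          # field ids inside that class (assumed)
OFPXMT_OFB_ETH_TYPE = 5


def oxm_tlv(oxm_class: int, field_id: int, value: bytes, mask: bytes = b"") -> bytes:
    """Encode one OXM TLV: 32-bit header followed by the value and optional mask."""
    if mask and len(mask) != len(value):
        raise ValueError("mask must be as long as the value")
    payload = value + mask
    has_mask = 1 if mask else 0
    header = (oxm_class << 16) | (field_id << 9) | (has_mask << 8) | len(payload)
    return struct.pack("!I", header) + payload


if __name__ == "__main__":
    in_port = oxm_tlv(OFPXMC_OPENFLOW_BASIC, OFPXMT_OFB_IN_PORT, struct.pack("!I", 1))
    eth_type = oxm_tlv(OFPXMC_OPENFLOW_BASIC, OFPXMT_OFB_ETH_TYPE,
                       struct.pack("!H", 0x0800))
    print(in_port.hex(), eth_type.hex())
```

Because each entry carries its own type and length, a match only occupies space for the fields it actually uses, which is the space saving mentioned above.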
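Building on versions 1.1 and 1.2, OpenFlow 1.3, released in April 2012, became the long-term-support stable version. Its flow tables support 40 match fields, enough for existing network applications. OpenFlow 1.3 also adds the Meter table, which controls the transmission rate of packets for the flow entries attached to it, although the control mechanism is still fairly simple. It improves version negotiation, letting the switch and the controller agree on an OpenFlow version according to their capabilities. Connection setup gains auxiliary connections, which improve the switch's processing efficiency and allow applications to work in parallel. Other additions include support for IPv6 extension headers and table-miss entries.
OpenFlow 1.4, released in 2013, is still an incremental improvement on 1.3, with no major changes in the forwarding plane. It adds a flow-table synchronization mechanism, in which several flow tables share the same match fields but can define different actions, and Bundle messages, which keep state consistent when a controller sends a complete group of messages or sends messages to several switches at once. It also adds descriptions of optical port properties and flow monitoring for multi-controller setups.
Packet type identification in the pipeline (Ethernet packets, PPP packets); egress tables.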
NOTE: idle_timeout is not included in the output of ovs-ofctl dump-flows br_name.
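Commonly used fields:
NOTE: A field of an upper protocol layer must not be given a concrete value while the corresponding lower-layer field is left unspecified. In other words, a flow rule may set lower-layer protocol fields to concrete values and leave upper-layer fields wildcarded or unspecified (matching any value), but it may not set an upper-layer field to a concrete value while a lower-layer field is wildcarded or unspecified. Otherwise, all flow rules in ovs-vswitchd are lost and the network becomes unreachable.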
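The layering rule in this note can be checked before a flow is submitted. The sketch below is only an illustration: the prerequisite table is a hypothetical subset built from the field names used in this article, not the full rule set that Open vSwitch enforces.

```python
# Hypothetical prerequisite table: upper-layer field -> lower-layer fields that
# must carry a concrete value in the same flow rule.
PREREQUISITES = {
    "nw_src": ["dl_type"],
    "nw_dst": ["dl_type"],
    "nw_proto": ["dl_type"],
    "tp_src": ["dl_type", "nw_proto"],
    "tp_dst": ["dl_type", "nw_proto"],
}


def missing_prerequisites(match: dict) -> dict:
    """Map each offending field to the lower-layer fields it lacks."""
    problems = {}
    for name, required in PREREQUISITES.items():
        if name in match:
            absent = [r for r in required if r not in match]
            if absent:
                problems[name] = absent
    return problems


if __name__ == "__main__":
    ok = {"dl_type": 0x0800, "nw_proto": 6, "tp_dst": 80}
    bad = {"tp_dst": 80}
    print(missing_prerequisites(ok))
    print(missing_prerequisites(bad))
```

A match on TCP port 80 is accepted only when the Ethernet type and the IP protocol are pinned down as well; the second rule is reported as incomplete.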
Details:
Field (key/value) | Meaning |
---|---|
in_port=port | Matches OpenFlow port port |
dl_vlan=vlan | Matches IEEE 802.1q Virtual LAN tag vlan. |
dl_vlan_pcp=priority | Matches IEEE 802.1q Priority Code Point (PCP) priority, which is specified as a value between 0 and 7, inclusive. A higher value indicates a higher frame priority level. |
dl_src=xx:xx:xx:xx:xx:xx dl_dst=xx:xx:xx:xx:xx:xx | Matches an Ethernet source (or destination) address specified as 6 pairs of hexadecimal digits delimited by colons (e.g. 00:0A:E4:25:6B:B0). |
dl_src=xx:xx:xx:xx:xx:xx/xx:xx:xx:xx:xx:xx dl_dst=xx:xx:xx:xx:xx:xx/xx:xx:xx:xx:xx:xx | Matches an Ethernet source (or destination) address specified as 6 pairs of hexadecimal digits delimited by colons (e.g. 00:0A:E4:25:6B:B0), with a wildcard mask following the slash. Useful masks: 01:00:00:00:00:00 matches only the multicast bit, so dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 matches all multicast (including broadcast) Ethernet packets and dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 matches all unicast Ethernet packets; ff:ff:ff:ff:ff:ff is an exact match (equivalent to omitting the mask); 00:00:00:00:00:00 wildcards all bits (equivalent to dl_dst=*). |
dl_type=ethertype | Matches Ethernet protocol type ethertype, which is specified as an integer between 0 and 65535 |
nw_src=ip[/netmask] nw_dst=ip[/netmask] | When dl_type is 0x0800 (possibly via shorthand, e.g. ip or tcp), matches IPv4 source (or destination) address ip, which may be specified as an IP address or host name. When dl_type=0x0806 or arp is specified, matches the ar_spa or ar_tpa field, respectively, in ARP packets for IPv4 and Ethernet. When dl_type=0x8035 or rarp is specified, matches the ar_spa or ar_tpa field, respectively, in RARP packets for IPv4 and Ethernet. |
nw_proto=proto | When ip or dl_type=0x0800 is specified, matches IP protocol type proto, which is specified as a decimal number between 0 and 255, inclusive (e.g. 1 to match ICMP packets or 6 to match TCP packets). When ipv6 or dl_type=0x86dd is specified, matches IPv6 header type proto, which is specified as a decimal number between 0 and 255, inclusive (e.g. 58 to match ICMPv6 packets or 6 to match TCP). When arp or dl_type=0x0806 is specified, matches the lower 8 bits of the ARP opcode. When rarp or dl_type=0x8035 is specified, matches the lower 8 bits of the ARP opcode. |
nw_tos=tos | Matches IP ToS/DSCP or IPv6 traffic class field tos, which is specified as a decimal number between 0 and 255, inclusive. |
nw_ecn=ecn | Matches ecn bits in IP ToS or IPv6 traffic class fields, which is specified as a decimal number between 0 and 3, inclusive. |
nw_ttl=ttl | Matches IP TTL or IPv6 hop limit value ttl, which is specified as a decimal number between 0 and 255, inclusive. |
tp_src=port tp_dst=port | When dl_type and nw_proto specify TCP or UDP, tp_src and tp_dst match the UDP or TCP source or destination port port |
icmp_type=type icmp_code=code | When dl_type and nw_proto specify ICMP or ICMPv6, type matches the ICMP type and code matches the ICMP code. |
table=number | If specified, limits the flow manipulation and flow dump commands to only apply to the table with the given number between 0 and 254. |
vlan_tci=tci[/mask] | Matches modified VLAN TCI tci. If mask is omitted, tci is the exact VLAN TCI to match; if mask is specified, then a 1-bit in mask indicates that the corresponding bit in tci must match exactly, and a 0-bit wildcards that bit. |
ip_frag=frag_type | When dl_type specifies IP or IPv6, frag_type specifies what kind of IP fragments or non-fragments to match. The following values of frag_type are supported: "no" matches only non-fragmented packets; "yes" matches all fragments; "first" matches only fragments with offset 0; "later" matches only fragments with nonzero offset; "not_later" matches non-fragmented packets and fragments with zero offset. |
arp_sha=xx:xx:xx:xx:xx:xx arp_tha=xx:xx:xx:xx:xx:xx | When dl_type specifies either ARP or RARP, arp_sha and arp_tha match the source and target hardware address, respectively. |
tun_id=tunnel-id[/mask] | Matches tunnel identifier tunnel-id. Only packets that arrive over a tunnel that carries a key (e.g. GRE with the RFC 2890 key extension and a nonzero key value) will have a nonzero tunnel ID. |
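The masked form of dl_src and dl_dst in the table above is a bitwise comparison: only the bits set in the mask have to agree. A minimal sketch with hypothetical helper names, using the multicast and unicast examples from the table:

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def mac_matches(addr: str, spec: str) -> bool:
    """Check addr against 'value' or 'value/mask' as written for dl_src / dl_dst."""
    value, _, mask = spec.partition("/")
    mask_bits = mac_to_int(mask) if mask else (1 << 48) - 1  # no mask: exact match
    return (mac_to_int(addr) & mask_bits) == (mac_to_int(value) & mask_bits)


MULTICAST = "01:00:00:00:00:00/01:00:00:00:00:00"
UNICAST = "00:00:00:00:00:00/01:00:00:00:00:00"

if __name__ == "__main__":
    for addr in ("ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01", "00:0a:e4:25:6b:b0"):
        print(addr, mac_matches(addr, MULTICAST), mac_matches(addr, UNICAST))
```

The broadcast address counts as multicast here because its lowest bit of the first octet is set, which matches the description in the table.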
Commonly used fields:
NOTE: A flow rule can have multiple actions; they are executed one after another in the order specified.
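Because actions run in the order given, a modification only affects the outputs that come after it. The sketch below models this on a packet represented as a dictionary; it handles just three of the actions listed in this article, and the function name `apply_actions` is hypothetical.

```python
def apply_actions(packet: dict, actions: list) -> list:
    """Run actions left to right; return (port, packet snapshot) for each output."""
    sent = []
    pkt = dict(packet)                      # work on a copy of the packet
    for action in actions:
        name, _, arg = action.partition(":")
        if name == "mod_vlan_vid":
            pkt["vlan_vid"] = int(arg)
        elif name == "strip_vlan":
            pkt.pop("vlan_vid", None)
        elif name == "output":
            sent.append((int(arg), dict(pkt)))  # the packet as it looks right now
        else:
            raise ValueError(f"unsupported action in this sketch: {action}")
    return sent


if __name__ == "__main__":
    actions = ["output:1", "mod_vlan_vid:10", "output:2", "strip_vlan", "output:3"]
    for port, pkt in apply_actions({"vlan_vid": 5}, actions):
        print(port, pkt)
```

Port 1 receives the packet with its original VLAN id, port 2 receives it with VLAN id 10, and port 3 receives it untagged.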
Details:
Field (key/value) | Meaning |
---|---|
output:port | Outputs the packet to port |
output:src[start..end] | Outputs the packet to the OpenFlow port number read from src, which must be an NXM field as described above. For example, output:NXM_NX_REG0[16..31] outputs to the OpenFlow port number written in the upper half of register 0. |
enqueue:port:queue | Enqueues the packet on the specified queue within port port |
normal | Subjects the packet to the device’s normal L2/L3 processing. |
flood | Outputs the packet on all switch physical ports other than the port on which it was received and any ports on which flooding is disabled |
all | Outputs the packet on all switch physical ports other than the port on which it was received. |
controller(key=value…) | Sends the packet to the OpenFlow controller as a “packet in” message. The supported key-value pairs are: max_len=nbytes : Limit to nbytes the number of bytes of the packet to send to the controller. By default the entire packet is sent. reason=reason: Specify reason as the reason for sending the message in the “packet in” message. The supported reasons are action (the default), no_match, and invalid_ttl. id=controller-id : Specify controller-id |
in_port | Outputs the packet on the port from which it was received. |
drop | Discards the packet, so no further processing or forwarding takes place. |
mod_vlan_vid:vlan_vid | Modifies the VLAN id on a packet. |
mod_vlan_pcp:vlan_pcp | Modifies the VLAN priority on a packet. |
strip_vlan | Strips the VLAN tag from a packet if it is present. |
push_vlan:ethertype | Push a new VLAN tag onto the packet. |
push_mpls:ethertype | If the packet does not already contain any MPLS labels, changes the packet’s Ethertype to ethertype, which must be either the MPLS unicast Ethertype 0x8847 or the MPLS multicast Ethertype 0x8848, and then pushes an initial label stack entry. |
pop_mpls:ethertype | Strips the outermost MPLS label stack entry. |
mod_dl_src:mac | Sets the source Ethernet address to mac. |
mod_dl_dst:mac | Sets the destination Ethernet address to mac. |
mod_nw_src:ip | Sets the IPv4 source address to ip. |
mod_nw_dst:ip | Sets the IPv4 destination address to ip. |
mod_tp_src:port | Sets the TCP or UDP source port to port. |
mod_tp_dst:port | Sets the TCP or UDP destination port to port. |
mod_nw_tos:tos | Sets the IPv4 ToS/DSCP field to tos, which must be a multiple of 4 between 0 and 255. |
resubmit([port],[table]) | Re-searches this OpenFlow flow table (or the table whose number is specified by table) with the in_port field replaced by port (if port is specified) |
set_tunnel:id set_tunnel64:id | If outputting to a port that encapsulates the packet in a tunnel and supports an identifier (such as GRE), sets the identifier to id. |
set_queue:queue | Sets the queue that should be used to queue when packets are output. |
pop_queue | Restores the queue to the value it was before any set_queue actions were applied. |
dec_ttl dec_ttl[(id1,id2)] | Decrement TTL of IPv4 packet or hop limit of IPv6 packet. |
set_mpls_ttl:ttl | Set the TTL of the outer MPLS label stack entry of a packet. ttl should be in the range 0 to 255 inclusive. |
dec_mpls_ttl | Decrement TTL of the outer MPLS label stack entry of a packet. |
move:src[start..end]->dst[start..end] | Copies the named bits from field src to field dst. src and dst must be NXM field names as defined in nicira-ext.h, e.g. NXM_OF_UDP_SRC or NXM_NX_REG0. Examples: move:NXM_NX_REG0[0..5]->NXM_NX_REG1[26..31] copies the six bits numbered 0 through 5, inclusive, in register 0 into bits 26 through 31, inclusive; move:NXM_NX_REG0[0..15]->NXM_OF_VLAN_TCI[] copies the least significant 16 bits of register 0 into the VLAN TCI field. |
load:value->dst[start..end] | Writes value to bits start through end, inclusive, in field dst. Example: load:55->NXM_NX_REG2[0..5] loads value 55 (bit pattern 110111) into bits 0 through 5, inclusive, in register 2. |
push:src[start..end] | Pushes bits start through end, inclusive, of field src onto the top of the stack. Example: push:NXM_NX_REG2[0..5] pushes the value stored in register 2 bits 0 through 5, inclusive, onto the internal stack. |
pop:dst[start..end] | Pops the top of the stack, retrieves bits start through end, inclusive, from the popped value, and stores them into the corresponding bits of dst. Example: pop:NXM_NX_REG2[0..5] pops the value from the top of the stack and sets register 2 bits 0 through 5, inclusive, based on bits 0 through 5 of the value just popped. |
set_field:value->dst | Writes the literal value into the field dst, which should be specified as a name used for matching. Example: set_field:fe80:0123:4567:890a:a6ba:dbff:fefe:59fa->ipv6_src |
learn(argument[,argument]…) | This action adds or modifies a flow in an OpenFlow table, similar to ovs-ofctl --strict mod-flows. The arguments specify the flow’s match fields, actions, and other properties, as follows. |
idle_timeout=seconds hard_timeout=seconds priority=value | These key-value pairs have the same meaning as in the usual ovs-ofctl flow syntax. |
fin_idle_timeout=seconds fin_hard_timeout=seconds | Adds a fin_timeout action with the specified arguments to the new flow. |
table=number | The table in which the new flow should be inserted. Specify a decimal number between 0 and 254. The default, if table is unspecified, is table 1. |
field=value field[start..end]=src[start..end] field[start..end] | Adds a match criterion to the new flow. |
load:value->dst[start..end] load:src[start..end]->dst[start..end] | Adds a load action to the new flow. |
output:field[start..end] | Adds an output action to the new flow’s actions, which outputs to the OpenFlow port taken from field[start..end], which must be an NXM field as described above. |
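The bit-range notation used by load and move in the table above can be illustrated with plain integer arithmetic. In this sketch registers are ordinary Python integers and the helpers `bits`, `load`, and `move` are hypothetical stand-ins for the actions of the same name; bit 0 is the least significant bit.

```python
def bits(value: int, start: int, end: int) -> int:
    """Extract bits start..end (inclusive) of value."""
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


def load(value: int, dst: int, start: int, end: int) -> int:
    """Return dst with bits start..end replaced by value (load:value->dst[start..end])."""
    width = end - start + 1
    if value >= (1 << width):
        raise ValueError("value does not fit in the bit range")
    mask = ((1 << width) - 1) << start
    return (dst & ~mask) | (value << start)


def move(src: int, s_start: int, s_end: int, dst: int, d_start: int, d_end: int) -> int:
    """Return dst after copying src[s_start..s_end] into dst[d_start..d_end]."""
    if s_end - s_start != d_end - d_start:
        raise ValueError("source and destination ranges differ in width")
    return load(bits(src, s_start, s_end), dst, d_start, d_end)


if __name__ == "__main__":
    reg2 = load(55, 0, 0, 5)                     # load:55->NXM_NX_REG2[0..5]
    print(bin(reg2))
    reg1 = move(0b101011, 0, 5, 0, 26, 31)       # move:REG0[0..5]->REG1[26..31]
    print(hex(reg1))
```

Bits outside the named range are left untouched, which is what distinguishes these actions from overwriting the whole field.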
http://net.zol.com.cn/461/4610667.html
https://www.sdnlab.com/sdn-guide/14716.html
http://www.just4coding.com/blog/2016/12/31/introducing-openflow/
https://www.li-rui.top/2018/12/01/network/openflow介绍/
https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-switch-v1.3.5.pdf
https://www.sdnlab.com/14484.html