ovs-cni in Kubernetes

ovs-cni is a Kubernetes CNI plugin provided by kubevirt that attaches pod interfaces to an OVS (Open vSwitch) bridge. It works by creating a veth pair, plugging one end into the OVS bridge and moving the other end into the pod's network namespace.
ovs-cni does not create the bridge automatically, so the bridge must be created in advance.
ovs-cni also does not implement cross-host pod communication; a scheme for cross-host communication over OVS must be planned beforehand.

Environment

A k8s cluster with multus installed is required, because the ovs configuration is defined through the network-attachment-definitions CRD that multus provides.
The k8s environment is as follows:

root@master:~# kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION     CONTAINER-RUNTIME
master   Ready    master   183d   v1.17.3   192.168.122.20   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2
node1    Ready    <none>   183d   v1.17.3   192.168.122.21   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2
node2    Ready    <none>   183d   v1.17.3   192.168.122.22   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2

root@master:~# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5b644bc49c-4vfjx   1/1     Running   2          46d
calico-node-5gtw7                          1/1     Running   2          46d
calico-node-mqt6l                          1/1     Running   4          46d
calico-node-t4vjh                          1/1     Running   2          46d
coredns-9d85f5447-4znmx                    1/1     Running   4          42d
coredns-9d85f5447-fh667                    1/1     Running   2          42d
etcd-master                                1/1     Running   8          183d
kube-apiserver-master                      1/1     Running   0          27h
kube-controller-manager-master             1/1     Running   8          183d
kube-multus-ds-amd64-7b4fw                 1/1     Running   0          5h13m
kube-multus-ds-amd64-dq2s8                 1/1     Running   0          5h13m
kube-multus-ds-amd64-sqf8g                 1/1     Running   0          5h13m
kube-proxy-l4wn7                           1/1     Running   5          183d
kube-proxy-prhcm                           1/1     Running   5          183d
kube-proxy-psxqt                           1/1     Running   8          183d
kube-scheduler-master                      1/1     Running   8          183d

Run the following command on all three nodes to install Open vSwitch. If cross-host pod communication is required, the host's externally facing NIC can be added to the bridge (an example follows the install command).

apt install openvswitch-switch/eoan
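
For example, assuming br1 as the bridge and eth1 as the host's external NIC (both names illustrative), the uplink could be attached like this. Note that moving a NIC onto an OVS bridge takes over its connectivity, so the host's IP configuration may need to move to the bridge interface:

# create the bridge and plug the external NIC into it (names are assumptions)
ovs-vsctl add-br br1
ovs-vsctl add-port br1 eth1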

Installing ovs-cni

Download the ovs-cni source code to obtain the yaml file used to install ovs-cni:

git clone https://github.com/kubevirt/ovs-cni.git
cd ovs-cni
cp manifests/ovs-cni.yml.in ./ovs-cni.yaml

Replace the following placeholders in ovs-cni.yaml (a scripted way to do this follows the list):

# install into the kube-system namespace
NAMESPACE -> kube-system
# image path of ovs-cni-plugin
${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-plugin
# CNI binary directory; once ovs-cni-plugin starts, it copies the ovs binary from its image to this path
CNI_MOUNT_PATH -> /opt/cni/bin
# image pull policy; it does not have to be Always
OVS_CNI_PLUGIN_IMAGE_PULL_POLICY -> Always
# image path of ovs-cni-marker
${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-marker
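
One way to script these substitutions, assuming the placeholders appear in ovs-cni.yaml verbatim as listed above (check the file first; the exact placeholder form may differ between versions):

sed -i \
    -e 's|NAMESPACE|kube-system|g' \
    -e 's|${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-plugin|g' \
    -e 's|CNI_MOUNT_PATH|/opt/cni/bin|g' \
    -e 's|OVS_CNI_PLUGIN_IMAGE_PULL_POLICY|Always|g' \
    -e 's|${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-marker|g' \
    ovs-cni.yaml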

Install

root@master:~/ovs/ovs-cni-master# kubectl apply -f ovs-cni.yaml
daemonset.apps/ovs-cni-amd64 created
clusterrole.rbac.authorization.k8s.io/ovs-cni-marker-cr created
clusterrolebinding.rbac.authorization.k8s.io/ovs-cni-marker-crb created
serviceaccount/ovs-cni-marker created

As shown below, the ovs-cni pods are now Running on all three nodes.

root@master:~/ovs/ovs-cni-master# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5b644bc49c-4vfjx   1/1     Running   2          46d
calico-node-5gtw7                          1/1     Running   2          46d
calico-node-mqt6l                          1/1     Running   4          46d
calico-node-t4vjh                          1/1     Running   2          46d
coredns-9d85f5447-4znmx                    1/1     Running   4          42d
coredns-9d85f5447-fh667                    1/1     Running   2          42d
etcd-master                                1/1     Running   8          183d
kube-apiserver-master                      1/1     Running   0          28h
kube-controller-manager-master             1/1     Running   8          183d
kube-multus-ds-amd64-7b4fw                 1/1     Running   0          5h26m
kube-multus-ds-amd64-dq2s8                 1/1     Running   0          5h26m
kube-multus-ds-amd64-sqf8g                 1/1     Running   0          5h26m
kube-proxy-l4wn7                           1/1     Running   5          183d
kube-proxy-prhcm                           1/1     Running   5          183d
kube-proxy-psxqt                           1/1     Running   8          183d
kube-scheduler-master                      1/1     Running   8          183d
ovs-cni-amd64-2wjnx                        1/1     Running   0          4m53s
ovs-cni-amd64-dp7w5                        1/1     Running   0          4m53s
ovs-cni-amd64-l849m                        1/1     Running   0          4m53s

As the ovs-cni.yaml above shows, the ovs-cni pod is configured with two containers: ovs-cni-plugin and ovs-cni-marker. Their roles are described below.
a. ovs-cni-plugin is an initContainer whose job is to copy the ovs binary from the image to /opt/cni/bin on the host. The container exits as soon as the copy finishes, which is why a Running pod shows READY 1/1 with only one container. The ovs-cni-plugin related state is visible in the pod's describe output:

Init Containers:
  ovs-cni-plugin:
    Container ID:  docker://b74f58af95cf2e36be9c34bc168fcf57a51643b4aeaef92dbff7eae1b25951f8
    Image:         quay.io/kubevirt/ovs-cni-plugin
    Image ID:      docker-pullable://quay.io/kubevirt/ovs-cni-plugin@sha256:4101c52617efb54a45181548c257a08e3689f634b79b9dfcff42bffd8b25af53
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /ovs
      /host/opt/cni/bin/ovs
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 16 Aug 2020 15:03:48 +0000
      Finished:     Sun, 16 Aug 2020 15:03:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/opt/cni/bin from cnibin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovs-cni-marker-token-mg682 (ro)

b. ovs-cni-marker reports the OVS bridges it discovers on a node to Kubernetes, exposing them as node resources.
Create bridge br1 on all three nodes:

root@master:~# ovs-vsctl add-br br1
root@master:~# ovs-vsctl show
10e5bd4e-be5c-4f68-ba52-59d428e9dbe3
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.12.0"

The node's Capacity and Allocatable now contain an additional ovs-cni resource, where "1k" is the number of OVS ports that can be attached to bridge br1 (hard-coded in the marker; there is no parameter to change it).

root@master:~/ovs/ovs-cni-master# kubectl describe node master
...
Capacity:
  ovs-cni.network.kubevirt.io/br1:  1k
...
Allocatable:
  ovs-cni.network.kubevirt.io/br1:  1k
...

Looking at the marker process, -ovs-socket specifies the OVS database socket file, which is used to obtain OVS bridge and interface information:

root@master:~# ps -ef | grep marker | grep -v grep
root     23338 23319  0 15:03 ?        00:00:04 ./marker -v 3 -logtostderr -node-name master -ovs-socket /host/var/run/openvswitch/db.sock

Using ovs-cni

First create a net-attach-def. The following parameters can be used (a trunk example follows the list):

name (string, required): the name of the network.
type (string, required): "ovs".
bridge (string, required): name of the bridge to use.
vlan (integer, optional): VLAN ID of attached port. Trunk port if not specified.
mtu (integer, optional): MTU.
trunk (optional): List of VLAN ID's and/or ranges of accepted VLAN ID's.
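
For instance, a trunk port accepting VLAN 100 plus the range 200-250 could use a config like this (illustrative, following the trunk format in the cni-plugin docs):

{
    "cniVersion": "0.3.1",
    "type": "ovs",
    "bridge": "br1",
    "trunk": [ { "id": 100 }, { "minID": 200, "maxID": 250 } ]
}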

Create a single ovs interface
Create a net-attach-def named ovs-conf, with CNI type ovs, bridge br1, and VLAN ID 100.
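
A minimal manifest along the lines of the upstream ovs-cni example; the resourceName annotation ties the attachment to the br1 resource reported by the marker:

cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 100
    }'
EOF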

Create a pod whose annotations specify the ovs-conf network.
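
A sketch of such a pod; the pod name test matches the kubectl exec below, while the busybox image and sleep command are illustrative stand-ins for any long-running container:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf
spec:
  containers:
  - name: test
    image: busybox            # illustrative image
    command: ["sleep", "3600"]
EOF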

Check the pod's network interfaces: lo is the default loopback interface, tunl0 is created automatically by the calico network, eth0 is the default pod interface created by calico, and net1 is the newly created interface attached to the OVS bridge (it is one end of a veth pair; net1's peer is plugged into the OVS bridge).

root@master:~# kubectl exec -it test ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP
    link/ether 1e:6a:39:93:ba:e8 brd ff:ff:ff:ff:ff:ff
    inet 10.24.166.138/32 scope global eth0
       valid_lft forever preferred_lft forever
6: net1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:00:00:75:a6:3e brd ff:ff:ff:ff:ff:ff

This pod was scheduled onto node1. Looking at bridge br1 there, a veth74ca52ee interface has appeared with VLAN tag 100; its peer is net1 inside the pod (a quick cross-check follows the output below).

root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "veth74ca52ee"
            tag: 100
            Interface "veth74ca52ee"
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.12.0"
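
To confirm that veth74ca52ee really is the peer of the pod's net1, compare ifindexes: inside the pod, net1 showed as 6: net1@if5, so its peer is host interface index 5, and the host-side veth's own @ifN suffix should point back at index 6:

# on node1: the interface with index 5 should be veth74ca52ee@if6
ip link show | grep '^5:'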

Create multiple ovs interfaces
Multiple interfaces can be added by specifying the same net-attach-def more than once in the pod's annotations,
or by specifying several different net-attach-defs.
a. Specify the same net-attach-def ovs-conf multiple times.
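
A sketch with an illustrative pod name; listing ovs-conf twice in the networks annotation yields two attachments:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-multi            # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf
spec:
  containers:
  - name: test-multi
    image: busybox
    command: ["sleep", "3600"]
EOF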

On bridge br1 on node1, two veth interfaces have been added, both tagged with VLAN 100:

root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "veth2fa51154"
            tag: 100
            Interface "veth2fa51154"
        Port "veth99fb8572"
            tag: 100
            Interface "veth99fb8572"
    ovs_version: "2.12.0"

b. Specify several different net-attach-defs.
First create another net-attach-def, ovs-conf1, with VLAN ID 200.
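
Same shape as ovs-conf, with only the name and vlan changed (again modeled on the upstream example):

cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 200
    }'
EOF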

When creating the pod, specify both ovs-conf and ovs-conf1.
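
A sketch with an illustrative pod name, listing both attachments in order:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-two-vlans        # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf1
spec:
  containers:
  - name: test-two-vlans
    image: busybox
    command: ["sleep", "3600"]
EOF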

Two veth interfaces are added to bridge br1 on node1, this time with different VLAN tags: 100 from ovs-conf and 200 from ovs-conf1.

root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
    Bridge "br1"
        Port "veth1d98bc6f"
            tag: 100
            Interface "veth1d98bc6f"
        Port "br1"
            Interface "br1"
                type: internal
        Port "veth2e3c55ba"
            tag: 200
            Interface "veth2e3c55ba"
    ovs_version: "2.12.0"

References

https://github.com/kubevirt/ovs-cni
https://github.com/kubevirt/ovs-cni/blob/master/docs/cni-plugin.md
https://github.com/kubevirt/ovs-cni/blob/master/docs/marker.md
