This article analyzes the CNI part of Kube-OVN, focusing on the CNI daemon and on how kubelet invokes the CNI binary to add and delete container networks.
The kube-ovn CNI configuration file is /etc/cni/net.d/01-kube-ovn.conflist. The CNI binary communicates with the CNI daemon over a Unix socket, sending requests to set up and tear down container networks:
{
  "name": "kube-ovn",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "kube-ovn",
      "server_socket": "/run/openvswitch/kube-ovn-daemon.sock"
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
The binary sends a POST /api/v1/add request to the CNI daemon process:
client := request.NewCniServerClient(netConf.ServerSocket)
response, err := client.Add(request.CniRequest{
CniType: netConf.Type,
PodName: podName,
PodNamespace: podNamespace,
ContainerID: args.ContainerID,
NetNs: args.Netns,
IfName: args.IfName,
Provider: netConf.Provider,
DeviceID: netConf.DeviceID,
VfDriver: netConf.VfDriver,
})
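The request/response mechanics above can be sketched with a plain net/http server bound to a Unix socket. The socket path, payload, and handler body below are illustrative stand-ins, not kube-ovn's actual code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// startDaemon stands in for the CNI daemon: an HTTP server on a Unix socket
// with an /api/v1/add handler (the real handler is handleAdd, section 2.6).
func startDaemon(sock string) (net.Listener, error) {
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return nil, err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/add", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "ack: %s", body)
	})
	go func() { _ = http.Serve(ln, mux) }()
	return ln, nil
}

// callAdd stands in for the CNI binary's client: an http.Client whose
// transport dials the Unix socket instead of TCP (the URL host is ignored).
func callAdd(sock, payload string) (string, error) {
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", sock)
		},
	}}
	resp, err := client.Post("http://cni/api/v1/add", "application/json",
		strings.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// demo wires the two ends together over a temporary socket.
func demo(payload string) (string, error) {
	sock := filepath.Join(os.TempDir(), fmt.Sprintf("cni-demo-%d.sock", os.Getpid()))
	os.Remove(sock)
	ln, err := startDaemon(sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	defer os.Remove(sock)
	return callAdd(sock, payload)
}

func main() {
	out, err := demo(`{"pod_name":"nginx"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point here is only the Unix-socket transport; the real daemon registers handleAdd and handleDel on this server.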
The CNI daemon is started with: kube-ovn-daemon --ovs-socket=/run/openvswitch/db.sock --bind-socket=/run/openvswitch/kube-ovn-daemon.sock --enable-mirror=false --encap-checksum=false --service-cluster-ip-range=10.96.0.0/12 --iface= --network-type=geneve --default-interface-name=
It reads the chassis ID from /etc/openvswitch/system-id.conf and writes it into the node's ovn.kubernetes.io/chassis annotation:
apiVersion: v1
kind: Node
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    node.alpha.kubernetes.io/ttl: "0"
    ovn.kubernetes.io/allocated: "true"
    ovn.kubernetes.io/chassis: a789446a-df5c-4f9f-b65e-fa385962855f
If started with --enable-mirror=true, configureGlobalMirror calls ovs-vsctl add-port br-int ${portName} to create the mirror port and enable traffic mirroring; with --enable-mirror=false, configureEmptyMirror disables traffic mirroring.
2.3.1 Reads the MAC, IP, CIDR, port name, and gateway from the node annotations, then calls configureNodeNic to configure the node NIC. The command run is: ovs-vsctl --may-exist add-port br-int ovn0 -- set interface ovn0 type=internal -- set interface ovn0 external_ids:iface-id=${port_name} external_ids:ip=${pod_ip}
_uuid : cbfb82d1-edd6-4f18-9bb2-cdc6a37f9e54
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {iface-id=node-node1, ip="100.64.0.2", ovn-installed="true"}
ifindex : 7
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 1
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "00:00:00:fb:59:c5"
mtu : 1400
mtu_request : []
name : ovn0
ofport : 2
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=282850635, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=3116636, tx_bytes=2608833065, tx_dropped=0, tx_errors=0, tx_packets=3262147}
status : {driver_name=openvswitch}
type : internal
2.3.2 configureNic then sets the IP, MAC, and MTU on the ovn0 interface.
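Using the values from the dump above (100.64.0.2, MTU 1400, and the 100.64.0.0/16 join subnet), the equivalent manual configuration would be roughly the following; kube-ovn performs this through netlink calls rather than the ip command, so this is only an illustrative sketch:

```shell
ip link set ovn0 mtu 1400
ip link set ovn0 address 00:00:00:fb:59:c5
ip addr add 100.64.0.2/16 dev ovn0
ip link set ovn0 up
```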
The daemon then instantiates a Kubernetes controller watching four resources: ProviderNetworks, Subnets, Pods, and Nodes.
go wait.Until(ovs.CleanLostInterface, time.Minute, stopCh)
go wait.Until(recompute, 10*time.Minute, stopCh)
if ok := cache.WaitForCacheSync(stopCh, c.providerNetworksSynced, c.subnetsSynced, c.podsSynced, c.nodesSynced); !ok {
klog.Fatalf("failed to wait for caches to sync")
return
}
klog.Info("Started workers")
go wait.Until(c.loopOvn0Check, 5*time.Second, stopCh)
go wait.Until(c.runAddOrUpdateProviderNetworkWorker, time.Second, stopCh)
go wait.Until(c.runDeleteProviderNetworkWorker, time.Second, stopCh)
go wait.Until(c.runSubnetWorker, time.Second, stopCh)
go wait.Until(c.runPodWorker, time.Second, stopCh)
go wait.Until(c.runGateway, 3*time.Second, stopCh)
2.5.1 loopOvn0Check reads the node's gateway and verifies that it can be pinged through ovn0.
2.5.2 runAddOrUpdateProviderNetworkWorker: if this node is listed in the provider network's excludeNodes, it calls cleanProviderNetwork / ovsCleanProviderNetwork to undo the host NIC setup — removing the port from the bridge, removing the bridge IP, setting the bridge device down, and cleaning up routes.
spec:
  customInterfaces:
  - interface: enp1s0
    nodes:
    - node1
  defaultInterface: eth0
  excludeNodes:
  - node2
2.5.2.1 Otherwise initProviderNetwork looks up the network interface configured for this node and calls ovsInitProviderNetwork.
configExternalBridge sets up the external bridge with: ovs-vsctl --may-exist add-br br-net1 -- set bridge br-net1 external_ids:vendor=kube-ovn.
configProviderNic adds the host NIC to the external bridge with: ovs-vsctl --may-exist add-port ${br-net1} ${enp1s0} -- set port ${enp1s0} external_ids:vendor=kube-ovn. The IP address, MAC, and routes are moved off the host NIC and onto the bridge.
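A sketch of the command sequence these two functions amount to, for interface enp1s0 and bridge br-net1. The IP address shown is hypothetical, and the real code moves addresses and routes via netlink rather than shelling out:

```shell
ovs-vsctl --may-exist add-br br-net1 -- set bridge br-net1 external_ids:vendor=kube-ovn
ovs-vsctl --may-exist add-port br-net1 enp1s0 -- set port enp1s0 external_ids:vendor=kube-ovn
# move the address from the NIC onto the bridge (address is hypothetical)
ip addr del 192.168.1.10/24 dev enp1s0
ip addr add 192.168.1.10/24 dev br-net1
ip link set br-net1 up
```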
spec:
  customInterfaces:
  - interface: enp1s0
    nodes:
    - node1
  defaultInterface: eth0
  excludeNodes:
  - node2
status:
  conditions:
  - lastTransitionTime: "2021-08-12T06:26:41Z"
    lastUpdateTime: "2021-08-12T06:26:41Z"
    node: node1
    reason: InitOVSBridgeSucceeded
    status: "True"
    type: Ready
  readyNodes:
  - node1
  vlans:
  - vlan1
_uuid : d095aab5-da4a-45c0-b260-180e169280a6
auto_attach : []
controller : []
datapath_id : "0000525400c02e6e"
datapath_type : ""
datapath_version : ""
external_ids : {vendor=kube-ovn}
fail_mode : []
flood_vlans : []
flow_tables : {}
ipfix : []
mcast_snooping_enable: false
mirrors : []
name : br-net1
netflow : []
other_config : {hwaddr="52:54:00:c0:2e:6e"}
ports : [3a9a5e6c-f981-449f-9c2f-962b9acfbf53, aa183004-64ee-498a-8314-35924ca52637]
protocols : []
rstp_enable : false
rstp_status : {}
sflow : []
status : {}
stp_enable : false
2.5.3 runDeleteProviderNetworkWorker
ovsCleanProviderNetwork removes the host NIC from the external bridge and deletes the external bridge.
It also removes the provider-network labels from the Node resource and updates it in Kubernetes.
2.5.4 runSubnetWorker adjusts routes.
Policy routing: TODO, to be analyzed.
2.5.5 runPodWorker sets the bandwidth limits on the pod's interface.
2.5.6 runGateway
2.5.6.1 setIPSet updates the IPSet tables: ovn40services, ovn40subnets, ovn40local-pod-ip-nat, ovn40subnets-nat, and ovn40other-node
Name: ovn40services
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 408
References: 5
Number of entries: 1
Members:
10.96.0.0/12

Name: ovn40subnets
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 472
References: 8
Number of entries: 2
Members:
100.64.0.0/16
10.16.0.0/16

Name: ovn40local-pod-ip-nat
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 280
References: 1
Number of entries: 4
Members:
10.16.0.5
10.16.0.6
10.16.0.4
10.16.0.11
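The dumps above could be reproduced manually with commands like the following; kube-ovn itself maintains these sets programmatically through an ipset library and swaps set contents rather than invoking the ipset binary, so treat this purely as an illustration:

```shell
ipset create ovn40services hash:net
ipset add ovn40services 10.96.0.0/12
ipset create ovn40subnets hash:net
ipset add ovn40subnets 100.64.0.0/16
ipset add ovn40subnets 10.16.0.0/16
ipset create ovn40local-pod-ip-nat hash:ip
ipset add ovn40local-pod-ip-nat 10.16.0.5
```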
2.5.6.2 setPolicyRouting — TODO, to be analyzed
2.5.6.3 setIptables installs the iptables rules; this is the bulk of the gateway setup.
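These rules match against the ipsets listed in 2.5.6.1. A representative sketch of the kind of rules installed — the exact rules are version-dependent and are an assumption here, not taken from the source:

```shell
# masquerade traffic leaving subnets that have natOutgoing enabled
iptables -t nat -A POSTROUTING -m set --match-set ovn40subnets-nat src \
         -m set ! --match-set ovn40subnets dst -j MASQUERADE
# accept forwarded traffic to/from the overlay subnets
iptables -A FORWARD -m set --match-set ovn40subnets src -j ACCEPT
iptables -A FORWARD -m set --match-set ovn40subnets dst -j ACCEPT
```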
2.5.6.4 setGatewayBandwidth
2.5.6.5 setICGateway
2.5.6.6 setExGateway
2.5.6.7 appendMssRule
Finally the daemon starts an HTTP server; the handlers for api/v1/add and api/v1/del are handleAdd and handleDel respectively.
2.6.1 handleAdd
The received CniRequest looks like:
cni_type: kube-ovn
pod_name: nginx-57dd86f5cc-9btkq
pod_namespace:
container_id: 532bed4f476d3e2a63f14907bfe9ddd5d89278eb75a3340901f5d8acdabbd46a
net_ns: /proc/10112/ns/net
if_name: eth0
provider: ovn
2.6.1.1 Reads the pod annotations: MAC, IP, CIDR, gateway, subnet, ingress/egress rates, provider network, VLAN ID, IP address, and NIC type.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ovn.kubernetes.io/allocated: "true"
    ovn.kubernetes.io/cidr: 100.100.0.0/16
    ovn.kubernetes.io/gateway: 100.100.0.1
    ovn.kubernetes.io/ip_address: 100.100.0.3
    ovn.kubernetes.io/logical_switch: subnet1
    ovn.kubernetes.io/mac_address: 00:00:00:B8:58:AB
    ovn.kubernetes.io/network_type: vlan
    ovn.kubernetes.io/pod_nic_type: veth-pair
    ovn.kubernetes.io/provider_network: net1
    ovn.kubernetes.io/routed: "true"
    ovn.kubernetes.io/vlan_id: "1000"
2.6.1.2 createOrUpdateIPCr manages the IP custom resource: it creates one if none exists and updates it otherwise.
2.6.1.3 If the NIC type is internal-port, configureNicWithInternalPort is called — TODO, to be analyzed.
Otherwise configureNic creates a veth pair and adds the host end to the OVS bridge: ovs-vsctl --may-exist add-port br-int ${hostNicName} -- set interface ${hostNicName} external_ids:iface-id=${ifaceID} external_ids:pod_name=${podName} external_ids:pod_namespace=${podNamespace} external_ids:ip=${ipStr} external_ids:pod_netns=${podNetns}
configureHostNic configures the host-side NIC; if a VLAN ID is present, ovs.SetPortTag sets the port tag to ${vlanID}.
ovn.kubernetes.io/ingress_rate: "3"
ovn.kubernetes.io/egress_rate: "1"
ovs.SetInterfaceBandwidth: if the ingress/egress annotations are set, it finds the port on the OVS bridge and applies the egress_rate annotation to the interface as ingress policing (further analysis TODO): ingress_policing_rate=1000 and ingress_policing_burst=1000 * 8 / 10.
_uuid : 62e3e579-3736-413d-9a63-591ec062ae35
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : full
error : []
external_ids : {iface-id=starter-backend-69459d76f5-ktrwn.ns1, ip="10.16.0.11", ovn-installed="true", pod_name=starter-backend-69459d76f5-ktrwn, pod_namespace=ns1, pod_netns="/proc/10542/ns/net"}
ifindex : 3082
ingress_policing_burst: 800
ingress_policing_rate: 1000
lacp_current : []
link_resets : 1
link_speed : 10000000000
link_state : up
lldp : {}
mac : []
mac_in_use : "ee:09:ec:2a:7d:59"
mtu : 1400
mtu_request : []
name : "9e3f831b4601_h"
ofport : 1542
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=100, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=2, tx_bytes=820, tx_dropped=0, tx_errors=0, tx_packets=10}
status : {driver_name=veth, driver_version="1.0", firmware_version=""}
type : ""
The other direction uses an OVS QoS record configured from the ingress_rate annotation; the QoS record's UUID is set in the port's qos column:
_uuid : 8f5b65a2-868a-4f78-bfe6-559c35fbbd5d
external_ids : {iface-id=starter-backend-69459d76f5-ktrwn.ns1, pod="ns1/starter-backend-69459d76f5-ktrwn"}
other_config : {max-rate="3000000"}
queues : {}
type : linux-htb
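The unit conversions can be inferred from the two OVSDB dumps above. This is a sketch of the arithmetic, not the actual kube-ovn source: annotation rates appear to be in Mbps, with egress_rate mapped to interface ingress policing (kbps) and ingress_rate mapped to a linux-htb QoS max-rate (bps):

```go
package main

import "fmt"

// Conversions inferred from the dumps (assumed, not verified in source):
//   egress_rate  (Mbps) -> ingress_policing_rate  = rate * 1000 (kbps)
//                          ingress_policing_burst = rate_kbps * 8 / 10
//   ingress_rate (Mbps) -> QoS max-rate           = rate * 1_000_000 (bps)

func policingRateKbps(egressMbps int) int  { return egressMbps * 1000 }
func policingBurstKbps(egressMbps int) int { return egressMbps * 1000 * 8 / 10 }
func qosMaxRateBps(ingressMbps int) int    { return ingressMbps * 1000 * 1000 }

func main() {
	// ovn.kubernetes.io/egress_rate: "1", ovn.kubernetes.io/ingress_rate: "3"
	fmt.Println(policingRateKbps(1), policingBurstKbps(1), qosMaxRateBps(3))
}
```

Plugging in the annotation values from above (egress_rate "1", ingress_rate "3") yields 1000, 800, and 3000000, matching ingress_policing_rate, ingress_policing_burst, and the QoS max-rate in the dumps.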
configureContainerNic configures the NIC inside the container: IP, MAC, MTU, and routes.
2.6.1.4 ovs.ConfigInterfaceMirror configures traffic mirroring for the interface.