There are four main types of communication in a Kubernetes network:
Containers in the same pod share one network namespace, so they can communicate directly over the loopback interface (lo).
This further breaks down into:
Each Service is assigned a virtual IP, called the Service IP.
The kube-proxy component implements routing and forwarding for Service IPs, building a virtual forwarding network on top of the overlay network.
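As a rough illustration (not kube-proxy's actual implementation), the Service-IP idea can be sketched as a virtual IP that forwards each new connection to one of the Service's pod endpoints; the VIP and endpoint addresses below are example values:

```python
import random

# Hypothetical Service: a VIP from the service CIDR fronting two pod IPs.
SERVICE_VIP = "10.68.0.10"
ENDPOINTS = ["10.3.104.4", "10.3.166.139"]

def route(dst: str) -> str:
    """Return the address a new connection is actually sent to."""
    if dst != SERVICE_VIP:
        return dst                   # not a Service IP: forward unchanged
    # DNAT to a randomly chosen backend, as kube-proxy's iptables mode does
    return random.choice(ENDPOINTS)

print(route("10.68.0.10") in ENDPOINTS)  # True: VIP traffic lands on a pod
```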
kube-proxy implements the following functions:
There are two cases to distinguish:
Traffic from a k8s service out to the Internet
Traffic from the Internet in to a k8s service
Steps:
Note: this is the same process as a virtual machine or cloud host reaching the Internet through NAT, as covered earlier.
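The NAT step can be sketched in a few lines (a conceptual model only; real SNAT happens in iptables/conntrack on the node, and the addresses are examples from this document):

```python
def snat(packet: dict, node_ip: str) -> dict:
    """Rewrite the source address of an outbound packet to the node's IP."""
    out = dict(packet)
    out["src"] = node_ip
    return out

# A pod reaching the Internet: the reply returns to the node, which maps it
# back to the pod, exactly like a VM behind NAT.
pkt = {"src": "10.3.104.4", "dst": "8.8.8.8"}
print(snat(pkt, "192.168.122.14"))  # {'src': '192.168.122.14', 'dst': '8.8.8.8'}
```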
To let Internet traffic into the k8s cluster, the options are:
Based on the sections above, the k8s network can be layered as:
1. Internet: the external network (strictly speaking, the Internet is not part of the k8s cluster network)
2. Node network: the network that each host (master, node, etcd, etc.) itself belongs to, configured on the physical NICs
3. Service network: a virtual network
4. Pod network: a virtual network
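These layers must live in disjoint address ranges. A quick sanity check with Python's standard `ipaddress` module, using the node and pod CIDRs that appear later in this document (the service CIDR here is an assumed example value):

```python
import ipaddress

node_net = ipaddress.ip_network("192.168.122.0/24")  # node network (this doc's hosts)
svc_net = ipaddress.ip_network("10.68.0.0/16")       # service network (assumed value)
pod_net = ipaddress.ip_network("10.3.0.0/16")        # pod network (calico ippool below)

nets = [node_net, svc_net, pod_net]
for i, a in enumerate(nets):
    for b in nets[i + 1:]:
        assert not a.overlaps(b), f"{a} overlaps {b}"
print("no overlapping ranges")  # all three layers are disjoint
```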
CNI (Container Network Interface) is the standard interface for container networking.
| | Overlay | L3 Routing | Underlay |
|---|---|---|---|
| Description | Encapsulates layer-2 frames inside IP packets for transport | Forwards IP packets to the destination host via layer-3 routing | Uses the underlying network's IPs directly; pods communicate on the same network as the hosts |
| Network requirements | Low: IP reachability | L2 reachability or BGP reachability | L2 reachability / switch support |
| Performance | Medium: encap/decap overhead | High: plain route forwarding | High: almost no overhead |
| IP type | Virtual IP | Virtual IP | Physical IP |
| Access from outside the cluster | Ingress/NodePort | Ingress/NodePort | Ingress/NodePort |
| Access control | Network Policy | Network Policy | iptables / external network |
| Static IP | Not supported | Not supported | Supported |
| Use cases | Modest performance requirements; inflexible network environments | Most scenarios | High performance requirements; direct communication with existing services; static IPs needed |
| Open-source products | flannel-vxlan, openshift-sdn | calico, flannel-hostgw | Macvlan/IPvlan |
[root@master1 ~]# ls /etc/cni/net.d/
10-calico.conflist calico-kubeconfig calico-tls
[root@master1 ~]# which calico
/opt/kube/bin/calico
Official site: https://www.projectcalico.org/
Summary of advantages:
1. Better resource utilization
2. Scalability
3. Simpler and easier to debug
4. Fewer dependencies
5. Adaptability
1. Felix: an agent process running on every host, mainly responsible for network interface management and monitoring, routing, ARP management, ACL management and synchronization, status reporting, and so on.
[root@master1 ~]# ps -ef |grep felix |grep -v grep
root 8712 7527 0 12:06 ? 00:00:00 runsv felix
root 8723 8712 5 12:06 ? 00:30:24 calico-node -felix
The corresponding processes can be found on all other nodes as well.
2. etcd: a distributed key-value store, responsible for keeping network metadata consistent and ensuring the accuracy of Calico's network state; it can be shared with the etcd used by Kubernetes.
3. BGP Client (BIRD): Calico deploys one BGP client per host, implemented with BIRD, a separate and actively developed project that implements many dynamic routing protocols such as BGP, OSPF, and RIP. Its role in Calico is to watch the routes injected by Felix on the host and advertise them to the remaining hosts via BGP, making the network reachable end to end.
4. BGP Route Reflector: at large scale, a full mesh of BGP clients alone hits a scaling limit, because pairwise peering among all nodes requires N*(N-1)/2 sessions, i.e. O(N²) growth. To solve this, BGP Route Reflectors can be used: every BGP client peers only with designated RR nodes for route synchronization, which greatly reduces the number of sessions.
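The scaling argument is easy to check numerically: a full mesh needs N*(N-1)/2 BGP sessions in total, while a single route reflector needs only N-1.

```python
def mesh_sessions(n: int) -> int:
    """Total BGP sessions in a full node-to-node mesh of n nodes."""
    return n * (n - 1) // 2

def rr_sessions(n: int) -> int:
    """Total sessions when every client peers only with one reflector."""
    return n - 1

print(mesh_sessions(4), rr_sessions(4))      # 6 3   (this document's 4-node cluster)
print(mesh_sessions(100), rr_sessions(100))  # 4950 99
```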
Calico mainly uses two protocols for communication.
BGP itself can run in two modes: node-to-node mesh or via a Route Reflector (both are demonstrated below).
Install the calicoctl tool
[root@master1 ~]# wget https://github.com/projectcalico/calicoctl/releases/download/v3.16.5/calicoctl-linux-amd64
[root@master1 ~]# mv calicoctl-linux-amd64 /bin/calicoctl
[root@master1 ~]# chmod a+x /bin/calicoctl
If Calico was chosen during a kubeasz install, the calicoctl command is already present by default:
[root@master1 ~]# which calicoctl
/opt/kube/bin/calicoctl
View Calico node information
[root@master1 ~]# calicoctl get node
NAME
master1
master2
node1
node2
View detailed Calico node information
[root@master1 ~]# calicoctl get node -o yaml
......
......
View the current IP pool
[root@master1 ~]# calicoctl get ippool
NAME CIDR SELECTOR
default-ipv4-ippool 10.3.0.0/16 all()
View the current network mode
[root@master1 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+-------------------+-------+----------+-------------+
| 192.168.122.12 | node-to-node mesh | up | 04:06:47 | Established |
| 192.168.122.13 | node-to-node mesh | up | 04:07:56 | Established |
| 192.168.122.14 | node-to-node mesh | up | 04:06:37 | Established |
+----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
As shown, the cluster is using BGP in node-to-node mesh mode.
[root@master1 ~]# netstat -anp | grep ESTABLISH | grep bird
tcp 0 0 192.168.122.11:179 192.168.122.13:39842 ESTABLISHED 8955/bird
tcp 0 0 192.168.122.11:179 192.168.122.14:54717 ESTABLISHED 8955/bird
tcp 0 0 192.168.122.11:179 192.168.122.12:59785 ESTABLISHED 8955/bird
My k8s cluster has 4 nodes, so each node shows 3 connections, confirming that the nodes peer with each other pairwise.
Reference: https://kubernetes.io/zh/docs/concepts/services-networking/network-policies/
A network policy isolates and restricts traffic on the network.
A CNI plugin solves pod-to-pod connectivity across nodes, which yields one flat network. But what about scenarios such as the following?
That is where network policies come in.
A network policy is essentially iptables firewall rules expressed as a YAML resource; if you are comfortable with iptables, network policies are not hard.
Let's walk through an official YAML example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default          # the policy applies to the default namespace (further narrowed by podSelector below)
spec:
  podSelector:
    matchLabels:
      role: db                # only pods labeled role=db in the default namespace are affected
  policyTypes:
  - Ingress                   # like iptables -P INPUT DROP: deny all inbound traffic by default
  - Egress                    # like iptables -P OUTPUT DROP: deny all outbound traffic by default
  ingress:                    # inbound allowlist on top of the default deny
  - from:
    - ipBlock:                # allowlist entry by IP range
        cidr: 172.17.0.0/16   # allowed range
        except:
        - 172.17.1.0/24       # ranges/IPs denied inside the allowed range
    - namespaceSelector:      # allowlist entry by namespace label
        matchLabels:
          project: myproject  # allow all pods in namespaces labeled project=myproject
    - podSelector:
        matchLabels:
          role: frontend      # allow pods labeled role=frontend in the same namespace
    ports:
    - protocol: TCP
      port: 6379              # port filter: all three sources above may only reach TCP 6379
  egress:                     # outbound allowlist on top of the default deny
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978
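The ipBlock rule above can be emulated with the standard `ipaddress` module: a source address is admitted if it falls inside `cidr` but outside every `except` range (a sketch of the semantics, not the actual enforcement path):

```python
import ipaddress

def ip_allowed(src: str, cidr: str, excepts: list) -> bool:
    """True if src is inside cidr but not inside any except range."""
    ip = ipaddress.ip_address(src)
    if ip not in ipaddress.ip_network(cidr):
        return False
    return not any(ip in ipaddress.ip_network(e) for e in excepts)

print(ip_allowed("172.17.0.5", "172.17.0.0/16", ["172.17.1.0/24"]))  # True
print(ip_allowed("172.17.1.5", "172.17.0.0/16", ["172.17.1.0/24"]))  # False
```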
1. Create two namespaces
[root@master1 ~]# kubectl create ns dev
namespace/dev created
[root@master1 ~]# kubectl create ns test
namespace/test created
2. Run one pod in each namespace
[root@master1 ~]# kubectl run nginx1 --image=nginx:1.15-alpine -n dev
pod/nginx1 created
[root@master1 ~]# kubectl run nginx2 --image=nginx:1.15-alpine -n test
pod/nginx2 created
3. Check the IPs of the two pods
[root@master1 ~]# kubectl get pods -o wide -n dev
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx1 1/1 Running 0 8m33s 10.3.104.4 192.168.122.14
[root@master1 ~]# kubectl get pods -o wide -n test
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx2 1/1 Running 0 8m26s 10.3.166.139 192.168.122.13
4. Verify that the two pods in different namespaces can reach each other
[root@master1 ~]# kubectl exec -it nginx1 -n dev -- ping -c 2 10.3.166.139
PING 10.3.166.139 (10.3.166.139): 56 data bytes
64 bytes from 10.3.166.139: seq=0 ttl=62 time=1.454 ms
64 bytes from 10.3.166.139: seq=1 ttl=62 time=0.891 ms
[root@master1 ~]# kubectl exec -it nginx2 -n test -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
64 bytes from 10.3.104.4: seq=0 ttl=62 time=1.094 ms
64 bytes from 10.3.104.4: seq=1 ttl=62 time=4.969 ms
5. Create a network policy for the dev namespace (deny all inbound traffic, allow all outbound)
[root@master1 ~]# vim dev-netpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dev-netpolicy
spec:
  podSelector: {}             # any pod
  policyTypes:
  - Ingress                   # Ingress means inbound
[root@master1 ~]# kubectl apply -f dev-netpolicy.yaml -n dev
networkpolicy.networking.k8s.io/dev-netpolicy created
[root@master1 ~]# kubectl get netpol -n dev
NAME POD-SELECTOR AGE
dev-netpolicy 9s
6. Verify the dev pod can still ping the test pod (outbound from dev is allowed)
[root@master1 ~]# kubectl exec -it nginx1 -n dev -- ping -c 2 10.3.166.139
PING 10.3.166.139 (10.3.166.139): 56 data bytes
64 bytes from 10.3.166.139: seq=0 ttl=62 time=0.345 ms
64 bytes from 10.3.166.139: seq=1 ttl=62 time=0.690 ms
7. Verify the test pod cannot ping the dev pod (inbound to dev is denied)
[root@master1 ~]# kubectl exec -it nginx2 -n test -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
1. Modify the YAML and apply it
[root@master1 ~]# vim dev-netpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dev-netpolicy
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress                    # add Egress for outbound traffic; everything else unchanged
[root@master1 ~]# kubectl apply -f dev-netpolicy.yaml -n dev
networkpolicy.networking.k8s.io/dev-netpolicy configured   # apply updated the existing policy in place
2. Verify (for the dev namespace, both inbound and outbound are now denied)
Outbound ping fails:
[root@master1 ~]# kubectl exec -it nginx1 -n dev -- ping -c 2 10.3.166.139
PING 10.3.166.139 (10.3.166.139): 56 data bytes
Inbound ping fails:
[root@master1 ~]# kubectl exec -it nginx2 -n test -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
1. Create another pod named nginx3 in the dev namespace, and another named nginx4 in the test namespace
[root@master1 ~]# kubectl run nginx3 --image=nginx:1.15-alpine -n dev
pod/nginx3 created
[root@master1 ~]# kubectl run nginx4 --image=nginx:1.15-alpine -n test
pod/nginx4 created
2. Label the nginx1 pod in the dev namespace
[root@master1 haha]# kubectl label pod nginx1 app=nginx1 -n dev
pod/nginx1 labeled
3. Modify the policy and apply it
[root@master1 ~]# vim dev-netpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dev-netpolicy
spec:
  podSelector:
    matchLabels:
      app: nginx1             # match pods by label
  policyTypes:
  - Ingress
  - Egress                    # deny both directions by default
  ingress:
  - from:
    - ipBlock:                # IP address block
        cidr: 10.3.0.0/16     # allowed range
        except:
        - 10.3.166.139/32     # IP denied inside the allowed range (the /32 mask is required); this is the pod IP of nginx2 in the test namespace
[root@master1 ~]# kubectl apply -f dev-netpolicy.yaml -n dev
networkpolicy.networking.k8s.io/dev-netpolicy configured
[root@master1 ~]# kubectl get netpol -n dev
NAME POD-SELECTOR AGE
dev-netpolicy app=nginx1 85m
Note: altogether this means that for pods labeled app=nginx1 in the dev namespace, traffic from the 10.3.0.0/16 range is allowed in, but traffic from 10.3.166.139 is denied.
4. Confirm the pod IPs of nginx1 and nginx3
[root@master1 ~]# kubectl get pods -o wide -n dev
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx1 1/1 Running 0 134m 10.3.104.4 192.168.122.14
nginx3 1/1 Running 0 20m 10.3.104.5 192.168.122.14
5. Verify
[root@master1 ~]# kubectl exec -it nginx4 -n test -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
64 bytes from 10.3.104.4: seq=0 ttl=63 time=0.146 ms
64 bytes from 10.3.104.4: seq=1 ttl=63 time=0.150 ms
[root@master2 ~]# kubectl exec -it nginx2 -n test -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
[root@master2 ~]# kubectl exec -it nginx2 -n test -- ping -c 2 10.3.104.5
PING 10.3.104.5 (10.3.104.5): 56 data bytes
64 bytes from 10.3.104.5: seq=0 ttl=62 time=0.507 ms
64 bytes from 10.3.104.5: seq=1 ttl=62 time=0.749 ms
1. Modify the policy and apply it
[root@master1 ~]# vim dev-netpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dev-netpolicy
spec:
  podSelector:
    matchLabels:
      app: nginx1             # the policy protects pods labeled app=nginx1
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          access_nginx1: "true"  # only pods labeled access_nginx1=true may connect (within the same namespace)
[root@master1 ~]# kubectl apply -f dev-netpolicy.yaml -n dev
networkpolicy.networking.k8s.io/dev-netpolicy configured
2. Verify: nginx3 in the same namespace cannot ping nginx1
[root@master2 ~]# kubectl exec -it nginx3 -n dev -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
3. Verify: after labeling nginx3 with access_nginx1=true, it can reach nginx1
[root@master2 ~]# kubectl label pod nginx3 access_nginx1=true -n dev
pod/nginx3 labeled
[root@master2 ~]# kubectl exec -it nginx3 -n dev -- ping -c 2 10.3.104.4
PING 10.3.104.4 (10.3.104.4): 56 data bytes
64 bytes from 10.3.104.4: seq=0 ttl=63 time=0.260 ms
64 bytes from 10.3.104.4: seq=1 ttl=63 time=0.164 ms
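The matchLabels semantics are simply "every selector key/value must be present in the pod's labels". A minimal emulation of why the label flip above changed the result (label names taken from this example):

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every key/value in the selector appears in the pod's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

selector = {"access_nginx1": "true"}
print(matches(selector, {"run": "nginx3"}))                           # False: denied
print(matches(selector, {"run": "nginx3", "access_nginx1": "true"}))  # True: allowed
```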
A final word: these small examples are only a starting point; build more complex policies against the syntax according to your business needs.
[root@master1 ~]# vim bgpconfig.yml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false  # disable node-to-node mesh mode
  asNumber: 61234               # custom AS number
[root@master1 ~]# calicoctl apply -f bgpconfig.yml
Successfully applied 1 'BGPConfiguration' resource(s)
Verify
[root@master1 ~]# calicoctl get bgpconfig
NAME LOGSEVERITY MESHENABLED ASNUMBER
default Info false 61234
[root@master1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-nginx-59bd7848d6-4fsph 1/1 Running 1 22h 10.3.104.35 192.168.122.14
deploy-nginx-59bd7848d6-c5c49 1/1 Running 1 22h 10.3.166.160 192.168.122.13
[root@master1 ~]# ping 10.3.104.35
Pods in the k8s cluster can no longer be pinged.
[root@master1 ~]# calicoctl get nodes --output=wide
NAME ASN IPV4 IPV6
master1 (61234) 192.168.122.11/24
master2 (61234) 192.168.122.12/24
node1 (61234) 192.168.122.13/24
node2 (61234) 192.168.122.14/24
[root@master1 ~]# vim bgppeer.yml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: bgppeer-global
spec:
  peerIP: 192.168.122.11        # designate the route reflector node (the BGP peer)
  asNumber: 61234
Apply and verify
[root@master1 ~]# calicoctl apply -f bgppeer.yml
Successfully applied 1 'BGPPeer' resource(s)
[root@master1 ~]# calicoctl get bgppeer
NAME PEERIP NODE ASN
bgppeer-global 192.168.122.11 (global) 61234
Verify
[root@master1 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+---------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+---------------+-------+----------+-------------+
| 192.168.122.12 | node specific | up | 15:10:22 | Established |
| 192.168.122.13 | node specific | up | 15:10:22 | Established |   <- peer type is now node specific
| 192.168.122.14 | node specific | up | 15:10:22 | Established |
+----------------+---------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
[root@master1 ~]# ping -c 2 10.3.104.35
PING 10.3.104.35 (10.3.104.35) 56(84) bytes of data.
64 bytes from 10.3.104.35: icmp_seq=1 ttl=63 time=0.359 ms
64 bytes from 10.3.104.35: icmp_seq=2 ttl=63 time=0.759 ms   <- pods in the k8s cluster are reachable again
Apart from the BGP peer (reflector) node, which holds N-1 connections, every other node holds only 1:
[root@master1 ~]# netstat -anp | grep ESTABLISH | grep bird
tcp 0 0 192.168.122.11:55617 192.168.122.14:179 ESTABLISHED 8955/bird
tcp 0 0 192.168.122.11:179 192.168.122.12:43342 ESTABLISHED 8955/bird
tcp 0 0 192.168.122.11:33696 192.168.122.13:179 ESTABLISHED 8955/bird
[root@master2 ~]# netstat -anp | grep ESTABLISH | grep bird
tcp 0 0 192.168.122.12:43342 192.168.122.11:179 ESTABLISHED 8216/bird
[root@node1 ~]# netstat -anp | grep ESTABLISH | grep bird
tcp 0 0 192.168.122.13:179 192.168.122.11:33696 ESTABLISHED 13541/bird
[root@node2 ~]# netstat -anp | grep ESTABLISH | grep bird
tcp 0 0 192.168.122.14:179 192.168.122.11:55617 ESTABLISHED 12223/bird
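The per-node counts in the netstat output above follow directly from the topology: with one reflector among N nodes, the reflector holds N-1 BGP sessions and every other node holds exactly one.

```python
def sessions_per_node(n_nodes: int, is_reflector: bool) -> int:
    """BGP sessions a node holds in a single-reflector topology."""
    return n_nodes - 1 if is_reflector else 1

print(sessions_per_node(4, True))   # 3: master1, the reflector
print(sessions_per_node(4, False))  # 1: master2, node1, node2
```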
[root@master1 ~]# calicoctl delete -f bgppeer.yml
Successfully deleted 1 'BGPPeer' resource(s)
[root@master1 ~]# vim bgpconfig.yml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true   # change back to true
  asNumber: 61234
[root@master1 ~]# calicoctl apply -f bgpconfig.yml
Successfully applied 1 'BGPConfiguration' resource(s)
[root@master1 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+-------------------+-------+----------+-------------+
| 192.168.122.12 | node-to-node mesh | up | 08:17:11 | Established |
| 192.168.122.13 | node-to-node mesh | up | 08:17:11 | Established |
| 192.168.122.14 | node-to-node mesh | up | 08:17:11 | Established |
+----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
Next: k8s-HPA