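1. Preface
Please credit the original source when reposting, and respect the author's work.
Source code: https://github.com/nicktming/flannel
Branch: tming-v0.10.0 (based on v0.10.0)
flannel series:
1. [docker network][flannel] Configuration, installation, and testing
2. [docker network][flannel] What happens behind the scenes
3. [docker network][flannel] A brief source code analysis
The previous two articles, "[docker network][flannel] Configuration, installation, and testing" and "[docker network][flannel] What happens behind the scenes", tested cross-host container access with the flannel vxlan backend. This article takes a brief look at the source code to see how flannel implements and manages that. It follows the main logic only and skips most details, so the analysis proceeds along the main function.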
2. Finding the ExternalInterface
The ExternalInterface is the device the node uses to reach the outside network.
// backend/common.go
type ExternalInterface struct {
Iface *net.Interface
IfaceAddr net.IP
ExtAddr net.IP
}
// main.go
func main() {
...
var extIface *backend.ExternalInterface
var err error
// Check the default interface only if no interfaces are specified
if len(opts.iface) == 0 && len(opts.ifaceRegex) == 0 {
// No iface was specified, so look up the default interface (eth0 on this machine)
extIface, err = LookupExtIface("", "")
if err != nil {
log.Error("Failed to find any valid interface to use: ", err)
os.Exit(1)
}
} else {
// Look up by the specified iface names
// Check explicitly specified interfaces
for _, iface := range opts.iface {
extIface, err = LookupExtIface(iface, "")
if err != nil {
log.Infof("Could not find valid interface matching %s: %s", iface, err)
}
if extIface != nil {
break
}
}
...
}
log.Infof("======>IfaceAddr:%v, ExtAddr:%v, Iface.Name:%s, Iface.Index:%d, Iface.MTU:%d", extIface.IfaceAddr, extIface.ExtAddr,
extIface.Iface.Name, extIface.Iface.Index, extIface.Iface.MTU)
...
}
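The purpose of this code is to determine which device the node uses to communicate with the outside network. Every branch calls the same function, LookupExtIface. If neither iface nor ifaceRegex (a regular-expression pattern) is specified, it looks for the default device, which is eth0 on this machine.
If iface is specified, the device is looked up directly by name, for example eth0. If no device is found that way, flannel then tries ifaceRegex.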
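The selection order described above can be sketched without touching real network devices. This is a minimal illustration, not flannel's implementation: pickInterface, its parameters, and the host list are hypothetical, and matching patterns against interface names only is a simplification.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// pickInterface mirrors the selection order described in the text, using
// plain interface names instead of real devices (all inputs are hypothetical):
//   available:   names of the interfaces present on the host
//   names:       explicitly requested names, tried in order
//   patterns:    regular expressions, tried only if no name matched
//   defaultName: the interface used when nothing was requested
func pickInterface(available, names, patterns []string, defaultName string) (string, error) {
	// Nothing requested: fall back to the default interface.
	if len(names) == 0 && len(patterns) == 0 {
		return defaultName, nil
	}
	present := make(map[string]bool, len(available))
	for _, a := range available {
		present[a] = true
	}
	// Explicit names take priority.
	for _, n := range names {
		if present[n] {
			return n, nil
		}
	}
	// Then try each pattern against every available interface.
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return "", fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		for _, a := range available {
			if re.MatchString(a) {
				return a, nil
			}
		}
	}
	return "", errors.New("no matching interface")
}

func main() {
	host := []string{"lo", "eth0", "eth1"}

	name, err := pickInterface(host, nil, nil, "eth0")
	fmt.Println(name, err) // eth0 <nil>

	name, err = pickInterface(host, []string{"ens3"}, []string{"^eth1$"}, "eth0")
	fmt.Println(name, err) // eth1 <nil>
}
```

The second call shows the fallback: the requested name ens3 does not exist on the hypothetical host, so the pattern decides.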
Run:
[root@master flannel]# pwd
/root/go/src/github.com/coreos/flannel
[root@master flannel]# go build .
[root@master flannel]# ./flannel --etcd-endpoints="http://172.21.0.16:2379"
I1103 10:51:59.247205 6879 main.go:480] Determining IP address of default interface
I1103 10:51:59.247563 6879 main.go:493] Using interface with name eth0 and address 172.21.0.16
I1103 10:51:59.247575 6879 main.go:510] Defaulting external address to interface address (172.21.0.16)
I1103 10:51:59.247584 6879 main.go:232] ======>IfaceAddr:172.21.0.16, ExtAddr:172.21.0.16, Iface.Name:eth0, Iface.Index:2, Iface.MTU:1500
...
The output shows that the default device eth0 was found.
[root@master flannel]# ifconfig eth0
eth0: flags=4163 mtu 1500
inet 172.21.0.16 netmask 255.255.240.0 broadcast 172.21.15.255
inet6 fe80::5054:ff:fed5:4f7e prefixlen 64 scopeid 0x20
ether 52:54:00:d5:4f:7e txqueuelen 1000 (Ethernet)
RX packets 10578209 bytes 2187448866 (2.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10211912 bytes 2268572896 (2.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@master flannel]#
3. Subnet management
This is the subnet configuration already stored in etcd.
[root@master flannel]# etcdctl get /coreos.com/network/config
{"Network": "10.0.0.0/16", "SubnetLen": 24, "SubnetMin": "10.0.1.0","SubnetMax": "10.0.20.0", "Backend": {"Type": "vxlan"}}
[root@master flannel]#
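To get a feel for what this configuration allows, the following sketch counts how many subnets of length SubnetLen fit between SubnetMin and SubnetMax. It is an illustration only: networkConfig, ipToUint32, and subnetCount are hypothetical names, and the struct keeps just the fields needed here.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"net"
)

// networkConfig holds only the fields used in this sketch; the field names
// follow the JSON keys stored under /coreos.com/network/config.
type networkConfig struct {
	SubnetLen int
	SubnetMin string
	SubnetMax string
}

// ipToUint32 converts a dotted IPv4 string to its 32-bit value.
func ipToUint32(s string) (uint32, error) {
	ip := net.ParseIP(s).To4()
	if ip == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", s)
	}
	return binary.BigEndian.Uint32(ip), nil
}

// subnetCount returns how many subnets of size /SubnetLen lie between
// SubnetMin and SubnetMax, both inclusive.
func subnetCount(raw string) (int, error) {
	var cfg networkConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return 0, err
	}
	if cfg.SubnetLen < 1 || cfg.SubnetLen > 32 {
		return 0, fmt.Errorf("invalid SubnetLen %d", cfg.SubnetLen)
	}
	lo, err := ipToUint32(cfg.SubnetMin)
	if err != nil {
		return 0, err
	}
	hi, err := ipToUint32(cfg.SubnetMax)
	if err != nil {
		return 0, err
	}
	if hi < lo {
		return 0, fmt.Errorf("SubnetMax %s is below SubnetMin %s", cfg.SubnetMax, cfg.SubnetMin)
	}
	// Number of addresses covered by one subnet of the configured length.
	step := uint32(1) << uint(32-cfg.SubnetLen)
	return int((hi-lo)/step) + 1, nil
}

func main() {
	raw := `{"Network": "10.0.0.0/16", "SubnetLen": 24, "SubnetMin": "10.0.1.0","SubnetMax": "10.0.20.0", "Backend": {"Type": "vxlan"}}`
	n, err := subnetCount(raw)
	fmt.Println(n, err) // 20 <nil>
}
```

With the configuration above, the range 10.0.1.0/24 through 10.0.20.0/24 gives 20 candidate subnets, so at most 20 hosts can hold a lease at the same time.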
Creating the subnet manager
===> main.go
func main() {
...
sm, err := newSubnetManager()
...
}
func newSubnetManager() (subnet.Manager, error) {
if opts.kubeSubnetMgr {
return kube.NewSubnetManager(opts.kubeApiUrl, opts.kubeConfigFile)
}
cfg := &etcdv2.EtcdConfig{
Endpoints: strings.Split(opts.etcdEndpoints, ","),
Keyfile: opts.etcdKeyfile,
Certfile: opts.etcdCertfile,
CAFile: opts.etcdCAFile,
Prefix: opts.etcdPrefix,
Username: opts.etcdUsername,
Password: opts.etcdPassword,
}
prevSubnet := ReadSubnetFromSubnetFile(opts.subnetFile)
log.Infof("======>prevSubnet:%s\n", prevSubnet)
return etcdv2.NewLocalManager(cfg, prevSubnet)
}
// Read the subnet from /run/flannel/subnet.env
func ReadSubnetFromSubnetFile(path string) ip.IP4Net {
var prevSubnet ip.IP4Net
if _, err := os.Stat(path); !os.IsNotExist(err) {
prevSubnetVals, err := godotenv.Read(path)
if err != nil {
log.Errorf("Couldn't fetch previous subnet from subnet file at %s: %s", path, err)
} else if prevSubnetString, ok := prevSubnetVals["FLANNEL_SUBNET"]; ok {
err = prevSubnet.UnmarshalJSON([]byte(prevSubnetString))
if err != nil {
log.Errorf("Couldn't parse previous subnet from subnet file at %s: %s", path, err)
}
}
}
return prevSubnet
}
===> subnet/etcdv2/local_manager.go
func NewLocalManager(config *EtcdConfig, prevSubnet ip.IP4Net) (Manager, error) {
r, err := newEtcdSubnetRegistry(config, nil)
if err != nil {
return nil, err
}
return newLocalManager(r, prevSubnet), nil
}
func newLocalManager(r Registry, prevSubnet ip.IP4Net) Manager {
return &LocalManager{
registry: r,
previousSubnet: prevSubnet,
}
}
LocalManager has two fields:
registry: a client for accessing the data in etcd.
previousSubnet: the subnet that was last allocated to this machine.
Run:
[root@master flannel]# ./flannel --etcd-endpoints="http://172.21.0.16:2379"
I1103 11:22:18.097587 10865 main.go:480] Determining IP address of default interface
I1103 11:22:18.097773 10865 main.go:493] Using interface with name eth0 and address 172.21.0.16
I1103 11:22:18.097783 10865 main.go:510] Defaulting external address to interface address (172.21.0.16)
I1103 11:22:18.097791 10865 main.go:232] ======>IfaceAddr:172.21.0.16, ExtAddr:172.21.0.16, Iface.Name:eth0, Iface.Index:2, Iface.MTU:1500
I1103 11:22:18.097807 10865 main.go:168] ======>prevSubnet:0.0.0.0/0
...
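On the first run the file /run/flannel/subnet.env may not exist yet, which means either that no subnet has been allocated on this machine before or that the file was deleted. That is why the log shows ======>prevSubnet:0.0.0.0/0.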
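The fallback to 0.0.0.0/0 can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in for ReadSubnetFromSubnetFile: it does not use godotenv or flannel's ip.IP4Net, and parseSubnetEnv and previousSubnet are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// parseSubnetEnv scans KEY=VALUE lines and returns the FLANNEL_SUBNET value.
func parseSubnetEnv(content string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, value, found := strings.Cut(strings.TrimSpace(sc.Text()), "=")
		if found && key == "FLANNEL_SUBNET" {
			return value, true
		}
	}
	return "", false
}

// previousSubnet returns the network of the previously leased subnet, or
// "0.0.0.0/0" when the file is missing, unreadable, or has no valid entry.
func previousSubnet(path string) string {
	const none = "0.0.0.0/0"
	data, err := os.ReadFile(path)
	if err != nil {
		return none
	}
	value, ok := parseSubnetEnv(string(data))
	if !ok {
		return none
	}
	// ParseCIDR also yields the masked network, e.g. 10.0.13.1/24 -> 10.0.13.0/24.
	_, network, err := net.ParseCIDR(value)
	if err != nil {
		return none
	}
	return network.String()
}

func main() {
	// A path that does not exist, to show the fallback value.
	fmt.Println(previousSubnet("/path/that/does/not/exist/subnet.env")) // 0.0.0.0/0
}
```

Returning the masked network rather than the raw value is a choice made so that a file entry of 10.0.13.1/24 is reported as 10.0.13.0/24.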
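4. Getting the backend
4.1 vxlan registration
When configuring flannel you have to say which Type to use. The earlier article "[docker network][flannel] Configuration, installation, and testing" used vxlan; there are others, such as udp. Inside flannel each Type is a backend. Every backend does the same job; only the implementation differs.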
type Backend interface {
RegisterNetwork(ctx context.Context, config *subnet.Config) (Network, error)
}
type Network interface {
Lease() *subnet.Lease
MTU() int
Run(ctx context.Context)
}
type BackendCtor func(sm subnet.Manager, ei *ExternalInterface) (Backend, error)
At program startup every backend registers itself with the backend manager. Take vxlan as an example:
===> backend/vxlan/vxlan.go
func init() {
backend.Register("vxlan", New)
}
const (
defaultVNI = 1
)
type VXLANBackend struct {
subnetMgr subnet.Manager
extIface *backend.ExternalInterface
}
func New(sm subnet.Manager, extIface *backend.ExternalInterface) (backend.Backend, error) {
backend := &VXLANBackend{
subnetMgr: sm,
extIface: extIface,
}
return backend, nil
}
// backend/manager.go
var constructors = make(map[string]BackendCtor)
func Register(name string, ctor BackendCtor) {
constructors[name] = ctor
}
vxlan registers its New function in the backend manager's constructors map, which tells the backend manager how to create a vxlan backend.
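The registration pattern can be tried in isolation. The following is a single-file sketch under simplifying assumptions: Backend, Ctor, and the two toy backends are hypothetical, and the constructor takes no arguments, whereas flannel's BackendCtor receives a subnet manager and the external interface.

```go
package main

import (
	"fmt"
	"sort"
)

// Backend is a stand-in for flannel's backend.Backend interface.
type Backend interface {
	Name() string
}

// Ctor builds a Backend. It is simplified for this sketch.
type Ctor func() (Backend, error)

// constructors maps a backend type name to the function that creates it.
var constructors = make(map[string]Ctor)

// Register records how to build the backend with the given name.
func Register(name string, ctor Ctor) {
	constructors[name] = ctor
}

type vxlanBackend struct{}

func (vxlanBackend) Name() string { return "vxlan" }

type udpBackend struct{}

func (udpBackend) Name() string { return "udp" }

// init runs before main, so the map is filled by the time main starts.
func init() {
	Register("vxlan", func() (Backend, error) { return vxlanBackend{}, nil })
	Register("udp", func() (Backend, error) { return udpBackend{}, nil })
}

// registered lists the known backend names in sorted order.
func registered() []string {
	names := make([]string, 0, len(constructors))
	for n := range constructors {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(registered()) // [udp vxlan]
	be, err := constructors["vxlan"]()
	fmt.Println(be.Name(), err) // vxlan <nil>
}
```

The point of the pattern is that the manager never names a concrete backend type; adding a backend only requires another Register call.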
4.2 Getting the backend
Since every type registers itself at program startup, how do we obtain a specific backend? Look back at the main function.
func main() {
...
// Fetch the network config (i.e. what backend to use etc..).
config, err := getConfig(ctx, sm)
if err == errCanceled {
wg.Wait()
os.Exit(0)
}
// Create a backend manager then use it to create the backend and register the network with it.
bm := backend.NewManager(ctx, sm, extIface)
be, err := bm.GetBackend(config.BackendType)
if err != nil {
log.Errorf("Error fetching backend: %s", err)
cancel()
wg.Wait()
os.Exit(1)
}
...
}
1. config, err := getConfig(ctx, sm) fetches the following configuration from etcd.
[root@master flannel]# etcdctl get /coreos.com/network/config
{"Network": "10.0.0.0/16", "SubnetLen": 24, "SubnetMin": "10.0.1.0","SubnetMax": "10.0.20.0", "Backend": {"Type": "vxlan"}}
[root@master flannel]#
2. bm := backend.NewManager(ctx, sm, extIface) creates a backend manager. Why pass in sm and extIface? Because they are needed to construct a concrete backend, which is what actually uses them. The constructor type shows this: type BackendCtor func(sm subnet.Manager, ei *ExternalInterface) (Backend, error).
func NewManager(ctx context.Context, sm subnet.Manager, extIface *ExternalInterface) Manager {
return &manager{
ctx: ctx,
sm: sm,
extIface: extIface,
active: make(map[string]Backend),
}
}
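3. be, err := bm.GetBackend(config.BackendType) is implemented as follows. The logic is simple: if a backend of that type already exists it is returned directly; otherwise one is created with the registered New function.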
func (bm *manager) GetBackend(backendType string) (Backend, error) {
bm.mux.Lock()
defer bm.mux.Unlock()
betype := strings.ToLower(backendType)
// see if one is already running
if be, ok := bm.active[betype]; ok {
return be, nil
}
// first request, need to create and run it
befunc, ok := constructors[betype]
if !ok {
return nil, fmt.Errorf("unknown backend type: %v", betype)
}
be, err := befunc(bm.sm, bm.extIface)
if err != nil {
return nil, err
}
bm.active[betype] = be
bm.wg.Add(1)
go func() {
<-bm.ctx.Done()
bm.mux.Lock()
delete(bm.active, betype)
bm.mux.Unlock()
bm.wg.Done()
}()
return be, nil
}
After this call we have a VXLANBackend instance.
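The get-or-create behaviour of GetBackend can be checked with a small model. This sketch is not flannel's manager: the types are hypothetical, the set of known types is hard-coded, and the goroutine that removes the backend when the context is cancelled is left out.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// backend is a placeholder for a concrete backend instance.
type backend struct{ kind string }

// manager caches one backend per type name.
type manager struct {
	mu      sync.Mutex
	active  map[string]*backend
	created int // number of constructions, kept only for the demonstration
}

func newManager() *manager {
	return &manager{active: make(map[string]*backend)}
}

// get returns the cached backend for kind, creating it on first use.
// The lookup is case-insensitive, as in GetBackend.
func (m *manager) get(kind string) (*backend, error) {
	m.mu.Lock()
	defer m.mu.Unlock()

	key := strings.ToLower(kind)
	if be, ok := m.active[key]; ok {
		return be, nil // already created
	}
	// Hard-coded list standing in for the constructors map.
	if key != "vxlan" && key != "udp" {
		return nil, fmt.Errorf("unknown backend type: %v", key)
	}
	be := &backend{kind: key}
	m.created++
	m.active[key] = be
	return be, nil
}

func main() {
	m := newManager()
	a, _ := m.get("VXLAN")
	b, _ := m.get("vxlan")
	fmt.Println(a == b, m.created) // true 1

	_, err := m.get("ipip")
	fmt.Println(err) // unknown backend type: ipip
}
```

Both calls return the same pointer and the constructor path runs once, which is the property the mutex and the active map exist to guarantee.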
5. Registering the network
===> main.go
func main() {
...
bn, err := be.RegisterNetwork(ctx, config)
...
}
===> backend/vxlan/vxlan.go
func (be *VXLANBackend) RegisterNetwork(ctx context.Context, config *subnet.Config) (backend.Network, error) {
// Parse our configuration
cfg := struct {
VNI int
Port int
GBP bool
DirectRouting bool
}{
VNI: defaultVNI,
}
if len(config.Backend) > 0 {
if err := json.Unmarshal(config.Backend, &cfg); err != nil {
return nil, fmt.Errorf("error decoding VXLAN backend config: %v", err)
}
}
log.Infof("VXLAN config: VNI=%d Port=%d GBP=%v DirectRouting=%v", cfg.VNI, cfg.Port, cfg.GBP, cfg.DirectRouting)
devAttrs := vxlanDeviceAttrs{
vni: uint32(cfg.VNI),
name: fmt.Sprintf("flannel.%v", cfg.VNI),
vtepIndex: be.extIface.Iface.Index,
vtepAddr: be.extIface.IfaceAddr,
vtepPort: cfg.Port,
gbp: cfg.GBP,
}
// Create the flannel.1 device
dev, err := newVXLANDevice(&devAttrs)
if err != nil {
return nil, err
}
dev.directRouting = cfg.DirectRouting
subnetAttrs, err := newSubnetAttrs(be.extIface.ExtAddr, dev.MACAddr())
if err != nil {
return nil, err
}
// When acquiring the subnet lease, store the external address and the flannel.1 MAC address
lease, err := be.subnetMgr.AcquireLease(ctx, subnetAttrs)
switch err {
case nil:
case context.Canceled, context.DeadlineExceeded:
return nil, err
default:
return nil, fmt.Errorf("failed to acquire lease: %v", err)
}
// Ensure that the device has a /32 address so that no broadcast routes are created.
// This IP is just used as a source address for host to workload traffic (so
// the return path for the traffic has an address on the flannel network to use as the destination)
if err := dev.Configure(ip.IP4Net{IP: lease.Subnet.IP, PrefixLen: 32}); err != nil {
return nil, fmt.Errorf("failed to configure interface %s: %s", dev.link.Attrs().Name, err)
}
return newNetwork(be.subnetMgr, be.extIface, dev, ip.IP4Net{}, lease)
}
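The first part of RegisterNetwork relies on a common Go idiom: fill a struct with defaults, then let json.Unmarshal overwrite only the keys that are present. The sketch below isolates that step; vxlanConfig and parseBackendConfig are names chosen for this example, and the second input with VNI 42 and Port 8472 is a made-up configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const defaultVNI = 1

// vxlanConfig has the same fields as the anonymous struct in RegisterNetwork.
type vxlanConfig struct {
	VNI           int
	Port          int
	GBP           bool
	DirectRouting bool
}

// parseBackendConfig applies defaults first and then decodes raw on top of
// them. It also returns the device name derived from the VNI.
func parseBackendConfig(raw []byte) (vxlanConfig, string, error) {
	cfg := vxlanConfig{VNI: defaultVNI}
	if len(raw) > 0 {
		// Keys missing from raw leave the defaults untouched; unknown keys
		// such as "Type" are ignored by encoding/json.
		if err := json.Unmarshal(raw, &cfg); err != nil {
			return cfg, "", fmt.Errorf("error decoding VXLAN backend config: %v", err)
		}
	}
	return cfg, fmt.Sprintf("flannel.%v", cfg.VNI), nil
}

func main() {
	cfg, dev, _ := parseBackendConfig([]byte(`{"Type": "vxlan"}`))
	fmt.Printf("%+v %s\n", cfg, dev)
	// {VNI:1 Port:0 GBP:false DirectRouting:false} flannel.1

	cfg, dev, _ = parseBackendConfig([]byte(`{"Type": "vxlan", "VNI": 42, "Port": 8472}`))
	fmt.Printf("%+v %s\n", cfg, dev)
	// {VNI:42 Port:8472 GBP:false DirectRouting:false} flannel.42
}
```

With the Backend section used in this article, {"Type": "vxlan"}, the VNI stays at its default of 1, which is why the device is named flannel.1.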
Run:
===> Before running
[root@master flannel]# etcdctl ls /coreos.com/network/subnets
[root@master flannel]#
===> Running
[root@master flannel]# ./flannel --etcd-endpoints="http://172.21.0.16:2379"
I1103 12:15:50.487232 18595 main.go:480] Determining IP address of default interface
I1103 12:15:50.487439 18595 main.go:493] Using interface with name eth0 and address 172.21.0.16
I1103 12:15:50.487449 18595 main.go:510] Defaulting external address to interface address (172.21.0.16)
I1103 12:15:50.487457 18595 main.go:232] ======>IfaceAddr:172.21.0.16, ExtAddr:172.21.0.16, Iface.Name:eth0, Iface.Index:2, Iface.MTU:1500
I1103 12:15:50.487505 18595 main.go:168] ======>prevSubnet:10.0.13.0/24
I1103 12:15:50.487570 18595 main.go:240] Created subnet manager: Etcd Local Manager with Previous Subnet: 10.0.13.0/24
I1103 12:15:50.487576 18595 main.go:243] Installing signal handlers
I1103 12:15:50.488757 18595 main.go:358] Found network config - Backend type: vxlan
I1103 12:15:50.488800 18595 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1103 12:15:50.490867 18595 local_manager.go:201] Found previously leased subnet (10.0.13.0/24), reusing
I1103 12:15:50.491779 18595 local_manager.go:220] Allocated lease (10.0.13.0/24) to current node (172.21.0.16)
I1103 12:15:50.491908 18595 main.go:305] Wrote subnet file to /run/flannel/subnet.env
I1103 12:15:50.491926 18595 main.go:309] Running backend.
I1103 12:15:50.492172 18595 vxlan_network.go:60] watching for new subnet leases
I1103 12:15:50.493612 18595 main.go:401] Waiting for 22h59m59.997445279s to renew lease
...
===> After running
[root@master ~]# ifconfig flannel.1
flannel.1: flags=4163 mtu 1450
inet 10.0.13.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::d457:dcff:febb:a72e prefixlen 64 scopeid 0x20
ether d6:57:dc:bb:a7:2e txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
[root@master ~]#
After running there is a new flannel.1 device. Now look at what is in etcd:
[root@master ~]# etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/10.0.13.0-24
[root@master ~]# etcdctl get /coreos.com/network/subnets/10.0.13.0-24
{"PublicIP":"172.21.0.16","BackendType":"vxlan","BackendData":{"VtepMAC":"d6:57:dc:bb:a7:2e"}}
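etcd now records that machine 172.21.0.16 holds the 10.0.13.0/24 network, together with the MAC address of that machine's flannel.1 device. Other machines use this MAC address when they join the flannel network, namely when they add their FDB and neighbor entries.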
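To make the stored record concrete, here is a sketch that decodes the key and value shown above using only the standard library. The struct and function names are hypothetical and cover just the fields visible in the etcdctl output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strings"
)

// leaseAttrs matches the JSON value stored for one subnet.
type leaseAttrs struct {
	PublicIP    string
	BackendType string
	BackendData json.RawMessage // backend-specific, decoded separately
}

// vxlanData is the vxlan-specific part of the value.
type vxlanData struct {
	VtepMAC string
}

// keyToCIDR turns the last element of a key such as
// ".../subnets/10.0.13.0-24" into "10.0.13.0/24".
func keyToCIDR(key string) (string, error) {
	base := key[strings.LastIndex(key, "/")+1:]
	cidr := strings.Replace(base, "-", "/", 1)
	if _, _, err := net.ParseCIDR(cidr); err != nil {
		return "", err
	}
	return cidr, nil
}

// decodeLease extracts the host address and the VTEP MAC from the value.
func decodeLease(value string) (net.IP, net.HardwareAddr, error) {
	var attrs leaseAttrs
	if err := json.Unmarshal([]byte(value), &attrs); err != nil {
		return nil, nil, err
	}
	var data vxlanData
	if err := json.Unmarshal(attrs.BackendData, &data); err != nil {
		return nil, nil, err
	}
	ip := net.ParseIP(attrs.PublicIP)
	if ip == nil {
		return nil, nil, fmt.Errorf("invalid PublicIP %q", attrs.PublicIP)
	}
	mac, err := net.ParseMAC(data.VtepMAC)
	if err != nil {
		return nil, nil, err
	}
	return ip, mac, nil
}

func main() {
	cidr, _ := keyToCIDR("/coreos.com/network/subnets/10.0.13.0-24")
	ip, mac, err := decodeLease(`{"PublicIP":"172.21.0.16","BackendType":"vxlan","BackendData":{"VtepMAC":"d6:57:dc:bb:a7:2e"}}`)
	fmt.Println(cidr, ip, mac, err)
	// 10.0.13.0/24 172.21.0.16 d6:57:dc:bb:a7:2e <nil>
}
```

These three values, the subnet, the host address, and the VTEP MAC, are everything a peer needs to reach this machine's containers.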
In addition, each machine writes its allocated subnet information to its own /run/flannel/subnet.env.
[root@master flannel]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.0.0.0/16
FLANNEL_SUBNET=10.0.13.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false
[root@master flannel]#
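Two values in this file differ from what was seen earlier: the lease is 10.0.13.0/24 but FLANNEL_SUBNET is 10.0.13.1/24, and eth0 has an MTU of 1500 but FLANNEL_MTU is 1450. The sketch below reproduces both numbers under two assumptions that are inferred from the outputs in this article rather than taken from the code shown: the file stores the first address after the subnet address, and the MTU is reduced by a 50-byte encapsulation overhead. The names subnetEnv and vxlanOverhead are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// vxlanOverhead is the assumed number of bytes added by VXLAN encapsulation.
const vxlanOverhead = 50

// subnetEnv derives the FLANNEL_SUBNET and FLANNEL_MTU values from the
// leased subnet and the MTU of the external interface.
func subnetEnv(leaseCIDR string, extMTU int) (string, int, error) {
	_, network, err := net.ParseCIDR(leaseCIDR)
	if err != nil {
		return "", 0, err
	}
	ip4 := network.IP.To4()
	if ip4 == nil {
		return "", 0, fmt.Errorf("not an IPv4 subnet: %s", leaseCIDR)
	}
	ones, _ := network.Mask.Size()

	// First address after the subnet address, e.g. 10.0.13.0 -> 10.0.13.1.
	first := make(net.IP, net.IPv4len)
	binary.BigEndian.PutUint32(first, binary.BigEndian.Uint32(ip4)+1)

	return fmt.Sprintf("%s/%d", first, ones), extMTU - vxlanOverhead, nil
}

func main() {
	subnet, mtu, err := subnetEnv("10.0.13.0/24", 1500)
	fmt.Println(subnet, mtu, err) // 10.0.13.1/24 1450 <nil>
}
```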
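AcquireLease calls tryAcquireLease to request a subnet. There are three main steps:
1. Check etcd for an existing subnet lease for this IP.
2. Check the local /run/flannel/subnet.env for a subnet previously allocated to this host.
3. Allocate a new subnet.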
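The three steps can be modelled as a pure function over in-memory data. This is a simplified sketch, not tryAcquireLease itself: the lease type, the pool argument, and the returned reason strings are hypothetical, and taking the first free subnet from the pool is a simplification of however flannel chooses among free subnets.

```go
package main

import (
	"errors"
	"fmt"
)

// lease is a simplified view of one entry under /coreos.com/network/subnets.
type lease struct {
	Subnet   string // e.g. "10.0.13.0/24"
	PublicIP string // external address of the host that owns the subnet
}

// chooseSubnet applies the three steps in order and reports which one matched.
func chooseSubnet(existing []lease, myIP, prev string, pool []string) (string, string, error) {
	taken := make(map[string]bool, len(existing))
	// Step 1: a lease already registered for this host's IP wins.
	for _, l := range existing {
		if l.PublicIP == myIP {
			return l.Subnet, "found lease for current IP", nil
		}
		taken[l.Subnet] = true
	}
	// Step 2: reuse the subnet remembered locally if nobody else holds it.
	if prev != "" && !taken[prev] {
		return prev, "reusing previous subnet", nil
	}
	// Step 3: allocate a subnet that is still free.
	for _, s := range pool {
		if !taken[s] {
			return s, "allocated new subnet", nil
		}
	}
	return "", "", errors.New("out of subnets")
}

func main() {
	pool := []string{"10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"}
	others := []lease{{Subnet: "10.0.1.0/24", PublicIP: "172.21.0.12"}}

	fmt.Println(chooseSubnet(others, "172.21.0.12", "", pool))
	// 10.0.1.0/24 found lease for current IP <nil>
	fmt.Println(chooseSubnet(others, "172.21.0.16", "10.0.13.0/24", pool))
	// 10.0.13.0/24 reusing previous subnet <nil>
	fmt.Println(chooseSubnet(others, "172.21.0.16", "", pool))
	// 10.0.2.0/24 allocated new subnet <nil>
}
```

The ordering matters: steps 1 and 2 both try to give the host the subnet it had before, and only step 3 changes it.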
6. Watching for changes
func main() {
...
log.Info("Running backend.")
wg.Add(1)
go func() {
bn.Run(ctx)
wg.Done()
}()
...
}
bn, err := be.RegisterNetwork(ctx, config) is the Network object returned in section 5 (Registering the network). It holds the allocated subnet, its expiration time, and so on.
func (nw *network) Run(ctx context.Context) {
...
subnet.WatchLeases(ctx, nw.subnetMgr, nw.SubnetLease, events)
...
}()
...
for {
select {
case evtBatch := <-events:
nw.handleSubnetEvents(evtBatch)
...
}
}
Two functions matter here:
subnet.WatchLeases(ctx, nw.subnetMgr, nw.SubnetLease, events): watches the subnets in etcd; whenever a machine is added, removed, or updated, the change is sent to events.
handleSubnetEvents: performs the operations required by each change.
for _, event := range batch {
...
switch event.Type {
case subnet.EventAdded:
if directRoutingOK {
log.V(2).Infof("Adding direct route to subnet: %s PublicIP: %s", sn, attrs.PublicIP)
if err := netlink.RouteReplace(&directRoute); err != nil {
log.Errorf("Error adding route to %v via %v: %v", sn, attrs.PublicIP, err)
continue
}
} else {
log.V(2).Infof("adding subnet: %s PublicIP: %s VtepMAC: %s", sn, attrs.PublicIP, net.HardwareAddr(vxlanAttrs.VtepMAC))
if err := nw.dev.AddARP(neighbor{IP: sn.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
log.Error("AddARP failed: ", err)
continue
}
if err := nw.dev.AddFDB(neighbor{IP: attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
log.Error("AddFDB failed: ", err)
// Try to clean up the ARP entry then continue
if err := nw.dev.DelARP(neighbor{IP: event.Lease.Subnet.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
log.Error("DelARP failed: ", err)
}
continue
}
// Set the route - the kernel would ARP for the Gw IP address if it hadn't already been set above so make sure
// this is done last.
if err := netlink.RouteReplace(&vxlanRoute); err != nil {
log.Errorf("failed to add vxlanRoute (%s -> %s): %v", vxlanRoute.Dst, vxlanRoute.Gw, err)
// Try to clean up both the ARP and FDB entries then continue
if err := nw.dev.DelARP(neighbor{IP: event.Lease.Subnet.IP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
log.Error("DelARP failed: ", err)
}
if err := nw.dev.DelFDB(neighbor{IP: event.Lease.Attrs.PublicIP, MAC: net.HardwareAddr(vxlanAttrs.VtepMAC)}); err != nil {
log.Error("DelFDB failed: ", err)
}
continue
}
}
case subnet.EventRemoved:
...
}
}
}
There are three operations:
1. AddARP
2. AddFDB
3. RouteReplace (adds the route)
So whenever the subnets in etcd change, flannel on every machine has to update its own ARP, FDB, and route tables.
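The EventAdded branch applies the three operations in a fixed order and removes the earlier entries again if a later one fails. The sketch below models that with an in-memory map in place of the kernel tables. Everything in it is hypothetical (peer, step, applyAll, stepsFor, the failOn switch), and undoing in strict reverse order is a choice made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// peer describes a remote host as learned from its lease.
type peer struct {
	SubnetCIDR string // e.g. "10.0.13.0/24"
	SubnetIP   string // address of the remote flannel.1, e.g. "10.0.13.0"
	PublicIP   string // remote host address
	VtepMAC    string // MAC of the remote flannel.1
}

// step pairs an action with the action that undoes it.
type step struct {
	name string
	do   func() error
	undo func()
}

// applyAll runs the steps in order. If one fails, the steps that already
// succeeded are undone and the error is returned.
func applyAll(steps []step) error {
	for i, s := range steps {
		if err := s.do(); err != nil {
			for j := i - 1; j >= 0; j-- {
				steps[j].undo()
			}
			return fmt.Errorf("%s failed: %w", s.name, err)
		}
	}
	return nil
}

// stepsFor builds the three steps for p. table stands in for the kernel
// tables; failOn names a step that should fail, for demonstration only.
func stepsFor(table map[string]string, p peer, failOn string) []step {
	mk := func(name, key, val string) step {
		return step{
			name: name,
			do: func() error {
				if name == failOn {
					return errors.New("simulated failure")
				}
				table[key] = val
				return nil
			},
			undo: func() { delete(table, key) },
		}
	}
	return []step{
		mk("AddARP", "arp "+p.SubnetIP, p.VtepMAC),            // remote flannel.1 IP -> VTEP MAC
		mk("AddFDB", "fdb "+p.VtepMAC, p.PublicIP),            // VTEP MAC -> remote host IP
		mk("RouteReplace", "route "+p.SubnetCIDR, p.SubnetIP), // remote subnet via remote flannel.1
	}
}

func main() {
	p := peer{SubnetCIDR: "10.0.13.0/24", SubnetIP: "10.0.13.0", PublicIP: "172.21.0.16", VtepMAC: "d6:57:dc:bb:a7:2e"}

	ok := map[string]string{}
	fmt.Println(applyAll(stepsFor(ok, p, "")), len(ok)) // <nil> 3

	bad := map[string]string{}
	fmt.Println(applyAll(stepsFor(bad, p, "RouteReplace")), len(bad))
	// RouteReplace failed: simulated failure 0
}
```

In the failing run both the ARP and the FDB entries are gone again, so a half-configured peer is never left behind.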
Run:
Start flannel on another machine (172.21.0.12). It gets the subnet 10.0.10.0/24 and performs the operations needed to connect its flannel.1 with the one on machine 172.21.0.16.
// Start flannel on another machine (172.21.0.12)
[root@worker flannel]# ./flannel --etcd-endpoints="http://172.21.0.16:2379" --ip-masq=true --v=2
...
I1103 12:53:18.445859 6246 local_manager.go:147] Found lease (10.0.10.0/24) for current IP (172.21.0.12)
...
I1103 12:53:18.449268 6246 vxlan_network.go:138] adding subnet: 10.0.13.0/24 PublicIP: 172.21.0.16 VtepMAC: d6:57:dc:bb:a7:2e
...
Inspect the flannel.1 device and its ARP, FDB, and route entries:
[root@worker ~]# ifconfig flannel.1
flannel.1: flags=4163 mtu 1450
inet 10.0.10.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::acf0:22ff:fef1:f63d prefixlen 64 scopeid 0x20
ether ae:f0:22:f1:f6:3d txqueuelen 0 (Ethernet)
RX packets 9 bytes 756 (756.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 672 (672.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
[root@worker ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
...
10.0.13.0 10.0.13.0 255.255.255.0 UG 0 0 0 flannel.1
...
[root@worker ~]#
[root@worker ~]# bridge fdb show
...
10.0.13.0 dev flannel.1 lladdr d6:57:dc:bb:a7:2e PERMANENT
...
[root@worker ~]# ip neighbor show
...
10.0.13.0 dev flannel.1 lladdr d6:57:dc:bb:a7:2e PERMANENT
...
[root@worker ~]#
Similarly, the flannel log on 172.21.0.16 now contains:
[root@master flannel]# ./flannel --etcd-endpoints="http://172.21.0.16:2379" --ip-masq=true --v=2
...
I1103 12:53:18.446670 24228 vxlan_network.go:138] adding subnet: 10.0.10.0/24 PublicIP: 172.21.0.12 VtepMAC: ae:f0:22:f1:f6:3d
Now check whether flannel.1 on 172.21.0.16 and flannel.1 on 172.21.0.12 can reach each other.
// flannel.1(172.21.0.16) ===> flannel.1(172.21.0.12)
[root@master flannel]# ping -c 1 10.0.10.0
PING 10.0.10.0 (10.0.10.0) 56(84) bytes of data.
64 bytes from 10.0.10.0: icmp_seq=1 ttl=64 time=0.402 ms
--- 10.0.10.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.402/0.402/0.402/0.000 ms
[root@master flannel]#
// flannel.1(172.21.0.12) ===> flannel.1(172.21.0.16)
[root@worker ~]# ping -c 1 10.0.13.0
PING 10.0.13.0 (10.0.13.0) 56(84) bytes of data.
64 bytes from 10.0.13.0: icmp_seq=1 ttl=64 time=0.399 ms
--- 10.0.13.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.399/0.399/0.399/0.000 ms
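The two devices can now reach each other.
7. Summary
1. When requesting a subnet, flannel proceeds in the following order; the principle is to keep each machine's subnet from changing whenever possible:
1.1 Check etcd for an existing subnet lease for this IP.
1.2 Check the local /run/flannel/subnet.env for a subnet previously allocated to this host.
1.3 Allocate a new subnet.
2. When a machine is added, removed, or updated, every machine updates its ARP, FDB, and route tables accordingly, so communication is not affected.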