Ceph performance jitter stops kube-ovn from working

Background


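Ceph HDD performance suddenly degraded sharply. As a result ovn-central could not complete leader election and therefore could not pass its health check. etcd performance was also poor: kube-ovn-controller could not update its leader-election record in etcd, so kube-ovn-controller restarted frequently.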

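ovsdb is sensitive to disk performance. An ordinary HDD (here, Ceph HDD with degraded performance) cannot meet its requirements, so ovn-central could not elect a leader normally. Read/write I/O duration was high, about 4 s.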

I/O contention may also be a factor, for example from filebeat-like workloads.
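To get a rough feel for synchronous write latency on the disk that backs the databases, a small probe like the one below can help. It is only a sketch: the function name `sync_write_probe` is hypothetical, and it assumes GNU `dd` (for `oflag=dsync`) and a directory you are allowed to write to.

```shell
# sync_write_probe DIR [COUNT]
# Writes COUNT blocks of 4 KiB with O_DSYNC, so every write waits for the
# disk, then prints the summary line from dd (bytes, elapsed time, rate).
sync_write_probe() {
    dir=$1
    count=${2:-100}
    probe="$dir/.sync_probe.$$"
    dd if=/dev/zero of="$probe" bs=4k count="$count" oflag=dsync 2>&1 | tail -n 1
    rm -f "$probe"
}

# Usage (path taken from this article):
#   sync_write_probe /etc/origin 100
```

Dividing the elapsed time by the block count gives an approximate per-write latency, which can be compared with the election timer.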



# E:\yealink-code\ovs\ovsdb\raft.c


    case RAFT_FOLLOWER:
        if (now < raft->election_base + raft->election_timer) {
            VLOG_WARN_RL(&rl, "ignoring vote request received after only "
                         "%lld ms (minimum election time is %"PRIu64" ms)",
                         now - raft->election_base, raft->election_timer);
            return true;
        }
        return false;


    /* The election timeout base value for leader election, in milliseconds.
     * It can be set by unixctl cluster/change-election-timer. Default value is
     * ELECTION_BASE_MSEC. */
    uint64_t election_timer;

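During an election, leaving network round trips aside, there is at least one log write and one write to the OVN SB DB. If a single read or write takes more than 3 s, the election cannot finish within the 5 s election timer.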
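The arithmetic behind that claim can be written down as a tiny check. This is a sketch under the assumptions stated in the text (two sequential writes, about 3000 ms each, 5000 ms timer); the helper name `election_fits` is hypothetical.

```shell
# election_fits TIMER_MS IO_MS WRITES
# Succeeds only if WRITES sequential disk writes of IO_MS each finish
# before an election timer of TIMER_MS expires.
election_fits() {
    timer_ms=$1
    io_ms=$2
    writes=$3
    [ $((io_ms * writes)) -lt "$timer_ms" ]
}

# Numbers from this incident: 2 writes x 3000 ms = 6000 ms > 5000 ms.
if election_fits 5000 3000 2; then
    echo "fits"
else
    echo "does not fit"
fi

# The raft.c comment above names cluster/change-election-timer. As a stopgap
# the timer could be raised on the leader, for example (10000 is only an
# example value, and the socket path is the one used later in this article):
#   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 10000
```

Raising the timer only hides slow storage; moving the databases to faster disks addresses the cause.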


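Analysis showed the problem was related to insufficient memory after a disk was replaced in Ceph: the OSD pod, limited to 5G, ran short of memory, and with a limit below 5G the OSD pod cannot start at all.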

Preparing to migrate the data

Overview of the file paths OVS depends on

rm -rf /var/run/openvswitch
rm -rf /var/run/ovn
rm -rf /etc/origin/openvswitch/
rm -rf /etc/origin/ovn/
rm -rf /etc/cni/net.d/00-kube-ovn.conflist
rm -rf /etc/cni/net.d/01-kube-ovn.conflist
rm -rf /var/log/openvswitch
rm -rf /var/log/ovn
/etc/origin/
├── openvswitch
│   ├── conf.db
│   └── system-id.conf
└── ovn
    ├── ovnnb_db.db
    └── ovnsb_db.db

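# This directory contains only the OVN and OVS database files
# About 2000 ports take roughly 30M, so the data will not exceed 1G
# All of these DBs need to sit on at least SSD (Ceph) storage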

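Migration steps

Create a 5G disk, mount it at /etc/origin/, then recreate the ovn-central and ovs-ovn pods.

There are three nodes; replace them one at a time, restarting ovn-central on each.

Restarting ovs-ovn interrupts networking, but restarting only the ovs-ovn pods on the masters has no impact, because no business workloads run on the masters.

ovn-central on master1 had already been adjusted, and master2 was affected the most, so master2 was handled first.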

# mkdir /etc/origin/
mkdir /etc/origin1
mkfs.xfs -f /dev/vdc
mount /dev/vdc /etc/origin1
cp -fr /etc/origin/* /etc/origin1
umount /etc/origin1
# umount fails if your shell's current directory is inside the mount
# fuser -um /dev/vdc   # lists the processes still using the device
mount /dev/vdc /etc/origin
vi /etc/fstab
# add the following line to /etc/fstab
/dev/vdc                              /etc/origin          xfs     defaults        0 0
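Before recreating the pods it is worth confirming that the new disk is really mounted and that the old databases were copied onto it. The check below is a minimal sketch: `check_origin` is a hypothetical helper, it assumes the `mountpoint` tool from util-linux is installed, and the file names come from the directory tree shown earlier.

```shell
# check_origin DIR
# Succeeds only if DIR is a mountpoint and the three database files
# listed in the tree above exist and are non-empty.
check_origin() {
    dir=$1
    mountpoint -q "$dir" || { echo "$dir is not a mountpoint" >&2; return 1; }
    for f in openvswitch/conf.db ovn/ovnnb_db.db ovn/ovnsb_db.db; do
        [ -s "$dir/$f" ] || { echo "missing or empty: $dir/$f" >&2; return 1; }
    done
}

# Usage:
#   check_origin /etc/origin && echo "safe to recreate ovn-central and ovs-ovn"
```

If the check fails because the database files are missing, the pods would start from an empty directory, which leads to the problem described next.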

Problem: if the directory is replaced with an empty one, the result is as follows

mount /dev/vdc /etc/origin


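Recreate ovn-central and ovs

ovn-central and ovs both initialize normally, but they do not inherit the previous member ID. It is therefore better to keep the previous data, so the node reuses its existing member identity and catches up incrementally.


# Record of the erroneous result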


[root@pro-k8s-master-2 ovn]# tailf ovsdb-server-nb.log

2022-06-10T02:00:20.063Z|00451|raft|INFO|tcp:10.120.33.146:35616: syntax "{"cluster":"c0924b48-fd71-4b92-9b01-39a901e5d5c6","comment":"heartbeat","from":"d57ee44d-74f3-416a-8b41-d8448471b8ff","leader_commit":7884166,"log":[],"prev_log_index":7884166,"prev_log_term":2318,"term":2318,"to":"a9df09bf-a805-4aa0-87b5-2da62490e6dd"}": syntax error: Parsing raft append_request RPC failed: misrouted message (addressed to a9df but we're 5e13)


root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
d57e
Name: OVN_Northbound
Cluster ID: c092 (c0924b48-fd71-4b92-9b01-39a901e5d5c6)
Server ID: d57e (d57ee44d-74f3-416a-8b41-d8448471b8ff)
Address: tcp:[10.120.33.146]:6643
Status: cluster member
Role: leader
Term: 2387
Leader: self
Vote: self

Last Election started 53321 ms ago, reason: timeout
Last Election won: 50300 ms ago
Election timer: 5000
Log: [7868635, 7884644]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->248f ->a9df <-248f <-a9df
Disconnections: 243
Servers:
    5e13 (5e13 at tcp:[10.120.34.53]:6643) next_index=7884644 match_index=0 last msg 1853455 ms ago   # the node that joined with an empty directory; it registered as a new member
    248f (248f at tcp:[10.120.35.101]:6643) next_index=7884644 match_index=7884643 last msg 797 ms ago
    d57e (d57e at tcp:[10.120.33.146]:6643) (self) next_index=7884625 match_index=7884643
    a9df (a9df at tcp:[10.120.34.53]:6643) next_index=7884644 match_index=7884643 last msg 797 ms ago   # the original member

The new member can be removed with ovs-appctl cluster/kick

Reference

$ ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl list-commands
The available commands are:
  cluster/cid             DB
  cluster/kick            DB SERVER
  cluster/leave           DB
  cluster/sid             DB
  cluster/status          DB
  coverage/show
  exit
  list-commands
  memory/show
  ovsdb-server/add-db     DB
  ovsdb-server/add-remote REMOTE
  ovsdb-server/compact
  ovsdb-server/connect-active-ovsdb-server
  ovsdb-server/disable-monitor-cond
  ovsdb-server/disconnect-active-ovsdb-server
  ovsdb-server/get-active-ovsdb-server
  ovsdb-server/get-sync-exclude-tables
  ovsdb-server/list-dbs
  ovsdb-server/list-remotes
  ovsdb-server/perf-counters-clear
  ovsdb-server/perf-counters-show
  ovsdb-server/reconnect
  ovsdb-server/remove-db  DB
  ovsdb-server/remove-remote REMOTE
  ovsdb-server/set-active-ovsdb-server
  ovsdb-server/set-sync-exclude-tables
  ovsdb-server/sync-status
  version
  vlog/close
  vlog/disable-rate-limit [module]...
  vlog/enable-rate-limit  [module]...
  vlog/list
  vlog/list-pattern
  vlog/reopen
  vlog/set                {spec | PATTERN:destination:pattern}



ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 5e13

Run it


root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/kick OVN_Northbound 5e13
started removal
root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
d57e
Name: OVN_Northbound
Cluster ID: c092 (c0924b48-fd71-4b92-9b01-39a901e5d5c6)
Server ID: d57e (d57ee44d-74f3-416a-8b41-d8448471b8ff)
Address: tcp:[10.120.33.146]:6643
Status: cluster member
Role: leader
Term: 2389
Leader: self
Vote: self

Last Election started 246592 ms ago, reason: timeout
Last Election won: 241569 ms ago
Election timer: 5000
Log: [7868635, 7884726]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->248f ->a9df <-248f <-a9df
Disconnections: 243
Servers:
    248f (248f at tcp:[10.120.35.101]:6643) next_index=7884726 match_index=7884725 last msg 16 ms ago
    d57e (d57e at tcp:[10.120.33.146]:6643) (self) next_index=7884656 match_index=7884725
    a9df (a9df at tcp:[10.120.34.53]:6643) next_index=7884726 match_index=7884723 last msg 2305 ms ago

# The member has been kicked out

Similarly, clean up the conflicting member in the southbound database


root@pro-k8s-master-1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
6a5d
Name: OVN_Southbound
Cluster ID: 7be1 (7be14312-0b05-45c3-91ad-7cad7e9abd2c)
Server ID: 6a5d (6a5df53d-a44c-4524-ad98-ed1a71d9134d)
Address: tcp:[10.120.33.146]:6644
Status: cluster member
Role: follower
Term: 29685
Leader: 5e34
Vote: 5e34

Last Election started 104771 ms ago, reason: leadership_transfer
Last Election won: 104587 ms ago
Election timer: 5000
Log: [333796091, 333800501]
Entries not yet committed: 1
Entries not yet applied: 1
Connections: ->5e34 ->9699 <-5e34 <-9699
Disconnections: 149
Servers:
    5e34 (5e34 at tcp:[10.120.34.53]:6644) last msg 20 ms ago
    2a81 (2a81 at tcp:[10.120.34.53]:6644) last msg 8971396 ms ago
    6a5d (6a5d at tcp:[10.120.33.146]:6644) (self)
    9699 (9699 at tcp:[10.120.35.101]:6644) last msg 69925 ms ago





ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick OVN_Southbound 2a81


Last Election started 252638 ms ago, reason: leadership_transfer
Last Election won: 252454 ms ago
Election timer: 5000
Log: [333796091, 333810309]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->5e34 ->9699 <-5e34 <-9699
Disconnections: 149
Servers:
    5e34 (5e34 at tcp:[10.120.34.53]:6644) last msg 3 ms ago
    6a5d (6a5d at tcp:[10.120.33.146]:6644) (self)
    9699 (9699 at tcp:[10.120.35.101]:6644) last msg 217792 ms ago

# The last reply from 9699 is far too old, though. This slow third member may be why leadership keeps switching between the other two

[root@pro-k8s-master-3 ovn]# tailf ovsdb-server-sb.log
2022-06-10T04:02:43.431Z|00043|raft|INFO|server 6a5d is leader for term 29676
2022-06-10T04:03:11.661Z|00044|raft|INFO|server 5e34 is leader for term 29677
2022-06-10T04:07:26.783Z|00045|raft|INFO|server 6a5d is leader for term 29678
2022-06-10T04:07:53.643Z|00046|raft|INFO|server 5e34 is leader for term 29679
2022-06-10T04:12:17.662Z|00047|raft|INFO|server 6a5d is leader for term 29680
2022-06-10T04:12:47.224Z|00048|raft|INFO|server 5e34 is leader for term 29681
2022-06-10T04:17:25.519Z|00049|raft|INFO|server 6a5d is leader for term 29682
2022-06-10T04:17:58.691Z|00050|raft|INFO|server 5e34 is leader for term 29683
2022-06-10T04:23:23.640Z|00051|raft|INFO|server 6a5d is leader for term 29684
2022-06-10T04:23:58.321Z|00052|raft|INFO|server 5e34 is leader for term 29685
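Leadership flapping like this is easier to see when the log is summarized. The following sketch counts how many times each server became leader; it assumes the `server <id> is leader for term <n>` message format shown in the log above, and `leader_counts` is a hypothetical helper name.

```shell
# leader_counts
# Reads an ovsdb-server log on stdin and prints "<server-id> <count>" for
# every server that was announced as leader, sorted by server ID.
leader_counts() {
    awk '/ is leader for term / { n[$2]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage (run in the directory that holds the log, as in the tailf above):
#   leader_counts < ovsdb-server-sb.log
```

For the ten log lines above it prints `5e34 5` and `6a5d 5`: leadership changed hands ten times in about 21 minutes.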


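Summary

etcd and the three OVS/OVN databases are best placed on local SSDs, since they already have their own replication. Other databases also perform poorly on Ceph.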


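Comparing the figures from when performance was poor against the normal ones:

Read: 3.59 * 1000 / 6 ≈ 600

Write: 2.66 * 1000 / 51 ≈ 52

Reads were about 600 times slower and writes about 52 times slower.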


After the disk switch, and once HDD performance had recovered, kube-ovn-controller restarted less often.
