Problem Description
On a single-node Ceph cluster, the machine room lost power unexpectedly over the weekend. On Monday a large number of PGs were unhealthy (pg down, stale, etc.), most OSDs were down, and manually starting the OSDs failed. The OSD logs were full of errors such as: ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
Ceph Version
[root@hyhive /]# ceph version
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
[root@hyhive /]#
Initial Diagnosis
1. Check the OSD status
[root@hyhive osd]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-5 18.56999 root hdd_root
-6 18.56999 host hdd_host
1 1.85699 osd.1 down 0 1.00000
2 1.85699 osd.2 down 0 1.00000
3 1.85699 osd.3 up 1.00000 1.00000
4 1.85699 osd.4 down 0 1.00000
5 1.85699 osd.5 up 1.00000 1.00000
6 1.85699 osd.6 up 1.00000 1.00000
7 1.85699 osd.7 down 0 1.00000
8 1.85699 osd.8 down 0 1.00000
9 1.85699 osd.9 down 0 1.00000
10 1.85699 osd.10 down 0 1.00000
-3 0.88399 root ssd_root
-4 0.88399 host ssd_host
0 0.44199 osd.0 down 0 1.00000
11 0.44199 osd.11 down 1.00000 1.00000
-1 19.45399 root default
-2 19.45399 host hyhive
0 0.44199 osd.0 down 0 1.00000
1 1.85699 osd.1 down 0 1.00000
2 1.85699 osd.2 down 0 1.00000
3 1.85699 osd.3 up 1.00000 1.00000
4 1.85699 osd.4 down 0 1.00000
5 1.85699 osd.5 up 1.00000 1.00000
6 1.85699 osd.6 up 1.00000 1.00000
7 1.85699 osd.7 down 0 1.00000
8 1.85699 osd.8 down 0 1.00000
9 1.85699 osd.9 down 0 1.00000
10 1.85699 osd.10 down 0 1.00000
11 0.44199 osd.11 down 1.00000 1.00000
[root@hyhive osd]#
[root@hyhive osd]#
2. Check the disk mount status
[root@hyhive osd]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 138.8G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 475M 0 part /boot
└─sda3 8:3 0 131.9G 0 part
├─centos-root 253:0 0 99.9G 0 lvm /
└─centos-swap 253:1 0 32G 0 lvm [SWAP]
sdb 8:16 0 447.1G 0 disk
├─sdb1 8:17 0 442.1G 0 part
└─sdb2 8:18 0 5G 0 part
sdc 8:32 0 447.1G 0 disk
├─sdc1 8:33 0 442.1G 0 part
└─sdc2 8:34 0 5G 0 part
sdd 8:48 0 1.8T 0 disk
├─sdd1 8:49 0 1.8T 0 part
└─sdd2 8:50 0 5G 0 part
sde 8:64 0 1.8T 0 disk
├─sde1 8:65 0 1.8T 0 part
└─sde2 8:66 0 5G 0 part
sdf 8:80 0 1.8T 0 disk
├─sdf1 8:81 0 1.8T 0 part /var/lib/ceph/osd/ceph-3
└─sdf2 8:82 0 5G 0 part
sdg 8:96 0 1.8T 0 disk
├─sdg1 8:97 0 1.8T 0 part
└─sdg2 8:98 0 5G 0 part
sdh 8:112 0 1.8T 0 disk
├─sdh1 8:113 0 1.8T 0 part /var/lib/ceph/osd/ceph-5
└─sdh2 8:114 0 5G 0 part
sdi 8:128 0 1.8T 0 disk
├─sdi1 8:129 0 1.8T 0 part /var/lib/ceph/osd/ceph-6
└─sdi2 8:130 0 5G 0 part
sdj 8:144 0 1.8T 0 disk
├─sdj1 8:145 0 1.8T 0 part
└─sdj2 8:146 0 5G 0 part
sdk 8:160 0 1.8T 0 disk
├─sdk1 8:161 0 1.8T 0 part
└─sdk2 8:162 0 5G 0 part
sdl 8:176 0 1.8T 0 disk
├─sdl1 8:177 0 1.8T 0 part
└─sdl2 8:178 0 5G 0 part
sdm 8:192 0 1.8T 0 disk
├─sdm1 8:193 0 1.8T 0 part
└─sdm2 8:194 0 5G 0 part
[root@hyhive osd]#
From the lsblk output, many of the OSD data partitions are not mounted. This looks like the cause of the down OSDs; the following steps confirm it.
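To see at a glance which partitions currently have no mount point, the lsblk output can be filtered. A minimal sketch (the -nr/awk combination is just one way to do this):
# list partitions that currently have no mount point
lsblk -nr -o NAME,TYPE,MOUNTPOINT | awk '$2 == "part" && $3 == ""'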
3. Check the ceph-osd log
After a manual attempt to start the osd.7 service failed, I checked its log, which showed:
[root@hyhive osd]# vim /var/log/ceph/ceph-osd.7.log
2018-12-03 13:05:49.951385 7f060066f800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2018-12-03 13:05:50.159180 7f6ddf984800 0 set uid:gid to 167:167 (ceph:ceph)
2018-12-03 13:05:50.159202 7f6ddf984800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 27941
2018-12-03 13:05:50.159488 7f6ddf984800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2018-12-03 13:10:52.974345 7f1405dab800 0 set uid:gid to 167:167 (ceph:ceph)
2018-12-03 13:10:52.974368 7f1405dab800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 34223
2018-12-03 13:10:52.974634 7f1405dab800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2018-12-03 13:10:53.123099 7f0f7af13800 0 set uid:gid to 167:167 (ceph:ceph)
2018-12-03 13:10:53.123120 7f0f7af13800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 34295
2018-12-03 13:10:53.123365 7f0f7af13800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2018-12-03 13:10:53.275191 7f4a49579800 0 set uid:gid to 167:167 (ceph:ceph)
2018-12-03 13:10:53.275212 7f4a49579800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 34356
2018-12-03 13:10:53.275464 7f4a49579800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
Listing the /var/lib/ceph/osd/ceph-7 directory shows it is empty: the OSD's data partition is simply not mounted there, so the daemon cannot find its superblock.
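The same check can be run across all OSD data directories at once. A minimal sketch; it relies only on the fact that a properly mounted data partition contains a whoami file (visible in the directory listing further below):
# flag OSD data directories that look unmounted (no whoami file inside)
for d in /var/lib/ceph/osd/ceph-*; do
    [ -e "$d/whoami" ] || echo "$d appears to be unmounted"
done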
Recovery Steps
- Method 1
Approach:
1. Mount a data partition on a temporary directory
2. Read the OSD id it belongs to from its whoami file
3. Unmount the temporary mount
4. Mount the partition at the matching /var/lib/ceph/osd/ directory
5. Restart that OSD's service
The commands take the form below (a scripted variant is sketched after them):
mount /dev/{device} /mnt
cat /mnt/whoami
umount /mnt
mount -t xfs /dev/{device} /var/lib/ceph/osd/ceph-{osd_id}
systemctl restart ceph-osd@{osd_id}
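If several disks need the same treatment, the per-disk commands can be wrapped in a loop. A minimal sketch, assuming the partition list is taken from the unmounted data partitions in the lsblk output above and that our environment's xfs mount options apply (adjust both as needed):
# loop form of method 1: read each partition's OSD id, then mount it in place
# the partition list below is only an example based on the lsblk output above
for dev in /dev/sdd1 /dev/sde1 /dev/sdg1 /dev/sdj1 /dev/sdk1 /dev/sdl1; do
    mount "$dev" /mnt                      # temporary mount just to read whoami
    osd_id=$(cat /mnt/whoami)              # OSD id this partition belongs to
    umount /mnt
    mount -t xfs -o noatime,logbsize=128k "$dev" /var/lib/ceph/osd/ceph-$osd_id
    systemctl reset-failed ceph-osd@$osd_id    # clear systemd's start-limit state
    systemctl restart ceph-osd@$osd_id
done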
A complete hands-on run against one disk is shown below:
[root@hyhive osd]# mount /dev/sdm1 /mnt
[root@hyhive osd]#
[root@hyhive osd]# cd /mnt
[root@hyhive mnt]# ls
activate.monmap ceph_fsid fsid journal_uuid magic store_version systemd whoami
active current journal keyring ready superblock type
[root@hyhive mnt]#
[root@hyhive mnt]# cat whoami
10
[root@hyhive mnt]#
[root@hyhive mnt]# cd ../
[root@hyhive /]# umount /mnt
[root@hyhive /]#
[root@hyhive /]#
# Note: the -o options below are tuning parameters specific to our environment; they can be omitted.
[root@hyhive /]# mount -t xfs -o noatime,logbsize=128k /dev/sdm1 /var/lib/ceph/osd/ceph-10/
[root@hyhive /]#
[root@hyhive /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 138.8G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 475M 0 part /boot
└─sda3 8:3 0 131.9G 0 part
├─centos-root 253:0 0 99.9G 0 lvm /
└─centos-swap 253:1 0 32G 0 lvm [SWAP]
sdb 8:16 0 447.1G 0 disk
├─sdb1 8:17 0 442.1G 0 part
└─sdb2 8:18 0 5G 0 part
sdc 8:32 0 447.1G 0 disk
├─sdc1 8:33 0 442.1G 0 part
└─sdc2 8:34 0 5G 0 part
sdd 8:48 0 1.8T 0 disk
├─sdd1 8:49 0 1.8T 0 part
└─sdd2 8:50 0 5G 0 part
sde 8:64 0 1.8T 0 disk
├─sde1 8:65 0 1.8T 0 part
└─sde2 8:66 0 5G 0 part
sdf 8:80 0 1.8T 0 disk
├─sdf1 8:81 0 1.8T 0 part /var/lib/ceph/osd/ceph-3
└─sdf2 8:82 0 5G 0 part
sdg 8:96 0 1.8T 0 disk
├─sdg1 8:97 0 1.8T 0 part
└─sdg2 8:98 0 5G 0 part
sdh 8:112 0 1.8T 0 disk
├─sdh1 8:113 0 1.8T 0 part /var/lib/ceph/osd/ceph-5
└─sdh2 8:114 0 5G 0 part
sdi 8:128 0 1.8T 0 disk
├─sdi1 8:129 0 1.8T 0 part /var/lib/ceph/osd/ceph-6
└─sdi2 8:130 0 5G 0 part
sdj 8:144 0 1.8T 0 disk
├─sdj1 8:145 0 1.8T 0 part
└─sdj2 8:146 0 5G 0 part
sdk 8:160 0 1.8T 0 disk
├─sdk1 8:161 0 1.8T 0 part
└─sdk2 8:162 0 5G 0 part
sdl 8:176 0 1.8T 0 disk
├─sdl1 8:177 0 1.8T 0 part
└─sdl2 8:178 0 5G 0 part
sdm 8:192 0 1.8T 0 disk
├─sdm1 8:193 0 1.8T 0 part /var/lib/ceph/osd/ceph-10
└─sdm2 8:194 0 5G 0 part
[root@hyhive /]#
[root@hyhive /]# systemctl restart ceph-osd@10
Job for [email protected] failed because start of the service was attempted too often. See "systemctl status [email protected]" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed [email protected]" followed by "systemctl start [email protected]" again.
[root@hyhive /]#
[root@hyhive /]# systemctl reset-failed ceph-osd@10
[root@hyhive /]#
[root@hyhive /]# systemctl restart ceph-osd@10
[root@hyhive /]#
[root@hyhive /]# systemctl status ceph-osd@10
● [email protected] - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-12-03 13:15:27 CST; 6s ago
Process: 39672 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 39680 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/[email protected]
└─39680 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
Dec 03 13:15:27 hyhive systemd[1]: Starting Ceph object storage daemon...
Dec 03 13:15:27 hyhive systemd[1]: Started Ceph object storage daemon.
Dec 03 13:15:27 hyhive ceph-osd[39680]: starting osd.10 at :/0 osd_data /var/lib/ceph/osd/ceph-10 /var/lib/ceph/osd/ceph-10/journal
[root@hyhive /]#
- Method 2:
After handling one disk, method 1 proved tedious: all the repeated mounting and unmounting only serves to discover which OSD each disk belongs to. Is there a command that shows the disk-to-OSD mapping for every disk at once? There is: ceph-disk list.
Approach:
1. List the disk-to-OSD mapping for all disks
2. Mount each data partition according to that mapping
3. Restart the corresponding OSD services
The commands are:
ceph-disk list
mount -t xfs /dev/{device} /var/lib/ceph/osd/ceph-{osd_id}
systemctl restart ceph-osd@{osd_id}
A complete run looks like this:
[root@hyhive /]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/sda :
/dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
/dev/sda3 other, LVM2_member
/dev/sda2 other, xfs, mounted on /boot
/dev/sdb :
/dev/sdb2 ceph journal, for /dev/sdb1
/dev/sdb1 ceph data, prepared, cluster ceph, osd.11, journal /dev/sdb2
/dev/sdc :
/dev/sdc2 ceph journal, for /dev/sdc1
/dev/sdc1 ceph data, prepared, cluster ceph, osd.0, journal /dev/sdc2
/dev/sdd :
/dev/sdd2 ceph journal, for /dev/sdd1
/dev/sdd1 ceph data, prepared, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde :
/dev/sde2 ceph journal, for /dev/sde1
/dev/sde1 ceph data, prepared, cluster ceph, osd.2, journal /dev/sde2
/dev/sdf :
/dev/sdf2 ceph journal, for /dev/sdf1
/dev/sdf1 ceph data, active, cluster ceph, osd.3, journal /dev/sdf2
/dev/sdg :
/dev/sdg2 ceph journal, for /dev/sdg1
/dev/sdg1 ceph data, prepared, cluster ceph, osd.4, journal /dev/sdg2
/dev/sdh :
/dev/sdh2 ceph journal, for /dev/sdh1
/dev/sdh1 ceph data, active, cluster ceph, osd.5, journal /dev/sdh2
/dev/sdi :
/dev/sdi2 ceph journal, for /dev/sdi1
/dev/sdi1 ceph data, active, cluster ceph, osd.6, journal /dev/sdi2
/dev/sdj :
/dev/sdj2 ceph journal, for /dev/sdj1
/dev/sdj1 ceph data, prepared, cluster ceph, osd.7, journal /dev/sdj2
/dev/sdk :
/dev/sdk2 ceph journal, for /dev/sdk1
/dev/sdk1 ceph data, prepared, cluster ceph, osd.8, journal /dev/sdk2
/dev/sdl :
/dev/sdl2 ceph journal, for /dev/sdl1
/dev/sdl1 ceph data, active, cluster ceph, osd.9, journal /dev/sdl2
/dev/sdm :
/dev/sdm2 ceph journal, for /dev/sdm1
/dev/sdm1 ceph data, active, cluster ceph, osd.10, journal /dev/sdm2
[root@hyhive /]#
[root@hyhive /]# mount -t xfs -o noatime,logbsize=128k /dev/sdk1 /var/lib/ceph/osd/ceph-8/
[root@hyhive /]# systemctl reset-failed ceph-osd@8
[root@hyhive /]# systemctl restart ceph-osd@8
[root@hyhive /]# systemctl status ceph-osd@8
● [email protected] - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-12-03 13:21:23 CST; 28s ago
Process: 48154 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 48161 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/[email protected]
└─48161 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
Dec 03 13:21:23 hyhive systemd[1]: Starting Ceph object storage daemon...
Dec 03 13:21:23 hyhive systemd[1]: Started Ceph object storage daemon.
Dec 03 13:21:23 hyhive ceph-osd[48161]: starting osd.8 at :/0 osd_data /var/lib/ceph/osd/ceph-8 /var/lib/ceph/osd/ceph-8/journal
Dec 03 13:21:49 hyhive ceph-osd[48161]: 2018-12-03 13:21:49.959213 7f9eec9e6800 -1 osd.8 129653 log_to_monitors {default=true}
[root@hyhive /]#
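For the remaining OSDs, the same mount-and-restart cycle can be scripted around ceph-disk list. A minimal sketch, assuming the output format shown above ("ceph data, prepared, ..., osd.N, ...") and our environment's xfs mount options; review the mapping before running it:
# mount every "prepared" (i.e. not yet active) ceph data partition at its OSD
# directory and restart the corresponding daemon
ceph-disk list 2>/dev/null | grep 'ceph data, prepared' | while read -r line; do
    dev=$(echo "$line" | awk '{print $1}')                        # e.g. /dev/sdk1
    osd_id=$(echo "$line" | grep -o 'osd\.[0-9]*' | cut -d. -f2)  # e.g. 8
    [ -n "$osd_id" ] || continue
    mountpoint -q /var/lib/ceph/osd/ceph-$osd_id && continue      # already mounted
    mount -t xfs -o noatime,logbsize=128k "$dev" /var/lib/ceph/osd/ceph-$osd_id
    systemctl reset-failed ceph-osd@$osd_id
    systemctl restart ceph-osd@$osd_id
done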
Cluster State After Recovery
[root@hyhive /]#
[root@hyhive /]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-5 18.56999 root hdd_root
-6 18.56999 host hdd_host
1 1.85699 osd.1 up 1.00000 1.00000
2 1.85699 osd.2 up 1.00000 1.00000
3 1.85699 osd.3 up 1.00000 1.00000
4 1.85699 osd.4 up 1.00000 1.00000
5 1.85699 osd.5 up 1.00000 1.00000
6 1.85699 osd.6 up 1.00000 1.00000
7 1.85699 osd.7 up 1.00000 1.00000
8 1.85699 osd.8 up 1.00000 1.00000
9 1.85699 osd.9 up 1.00000 1.00000
10 1.85699 osd.10 up 1.00000 1.00000
-3 0.88399 root ssd_root
-4 0.88399 host ssd_host
0 0.44199 osd.0 up 1.00000 1.00000
11 0.44199 osd.11 up 1.00000 1.00000
-1 19.45399 root default
-2 19.45399 host hyhive
0 0.44199 osd.0 up 1.00000 1.00000
1 1.85699 osd.1 up 1.00000 1.00000
2 1.85699 osd.2 up 1.00000 1.00000
3 1.85699 osd.3 up 1.00000 1.00000
4 1.85699 osd.4 up 1.00000 1.00000
5 1.85699 osd.5 up 1.00000 1.00000
6 1.85699 osd.6 up 1.00000 1.00000
7 1.85699 osd.7 up 1.00000 1.00000
8 1.85699 osd.8 up 1.00000 1.00000
9 1.85699 osd.9 up 1.00000 1.00000
10 1.85699 osd.10 up 1.00000 1.00000
11 0.44199 osd.11 up 1.00000 1.00000
[root@hyhive /]#
[root@hyhive /]# ceph -s
cluster 0eef9474-08c7-445e-98e9-35120d03bf19
health HEALTH_WARN
too many PGs per OSD (381 > max 300)
monmap e1: 1 mons at {hyhive=192.168.3.1:6789/0}
election epoch 23, quorum 0 hyhive
fsmap e90: 1/1/1 up {0=hyhive=up:active}
osdmap e129886: 12 osds: 12 up, 12 in
flags sortbitwise,require_jewel_osds
pgmap v39338084: 2288 pgs, 12 pools, 1874 GB data, 485 kobjects
3654 GB used, 15800 GB / 19454 GB avail
2288 active+clean
client io 4357 kB/s rd, 357 kB/s wr, 265 op/s rd, 18 op/s wr
[root@hyhive /]#