Troubleshooting Ceph Errors

Contents

  • Error 1
  • Error 2
  • Fixing the ceph initial monitor(s) error

Error 1

[root@ct ceph]# ceph -s
  cluster:
    id:     dfb110f9-e0e0-4544-9f13-9141750ee9f6
    health: HEALTH_WARN
            Degraded data redundancy: 192 pgs undersized

  services:
    mon: 3 daemons, quorum ct,c1,c2
    mgr: ct(active), standbys: c2, c1
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   3 pools, 192 pgs
    objects: 0  objects, 0 B
    usage:   2.0 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     102 active+undersized
             90  stale+active+undersized
Checking the OSD status shows that c2 is not connected:
[root@ct ceph]# ceph osd status
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  |  ct  | 1026M | 1022G |    0   |     0   |    0   |     0   | exists,up |
| 1  |  c1  | 1026M | 1022G |    0   |     0   |    0   |     0   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
Fix:
Restart the OSD on c2:
[root@c2 ~]# systemctl restart ceph-osd.target
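After the restart it is worth confirming that the OSD on c2 actually came back. A minimal sketch, assuming the tabular `ceph osd status` output shown above (osd_host_up is a hypothetical helper, not a ceph command):

```shell
# osd_host_up reads `ceph osd status` table text on stdin and prints "up"
# when the given host has an OSD row in the "up" state, "down" otherwise.
osd_host_up() {
  if grep -w "$1" | grep -qw 'up'; then echo up; else echo down; fi
}

# Usage on the deploy node:
#   ceph osd status | osd_host_up c2
```

Once all OSDs report up, the undersized PG warning should clear on its own as the placement groups recover.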

Error 2

[root@ct ceph]# ceph -s
  cluster:
    id:     44d72edb-4085-4cfc-8652-eb670472f169
    health: HEALTH_WARN
            clock skew detected on mon.c1, mon.c2

  services:
    mon: 3 daemons, quorum ct,c1,c2
    mgr: c1(active), standbys: c2, ct
    osd: 3 osds: 1 up, 1 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   1.0 GiB used, 1023 GiB / 1024 GiB avail
    pgs: 
Fix:
(1) Restart the NTP service on the controller node:
[root@ct ceph]# systemctl restart ntpd
(2) Re-sync the compute node's clock from the controller node (192.168.100.10 is the controller's IP):
[root@c2 ~]# ntpdate 192.168.100.10
(3) Restart the mon service on the controller node:
[root@ct ceph]# systemctl restart ceph-mon.target
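Re-syncing NTP as above is the real fix. If brief skew warnings still flap after the restart, the monitors' tolerance can be widened in ceph.conf (an assumption worth stating: the default mon clock drift allowed is 0.05 s, and raising it only hides residual drift, it does not fix the clocks):

```ini
[mon]
# default tolerance is 0.05 s; widening it masks residual skew rather than fixing NTP
mon clock drift allowed = 0.2
```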

Fixing the ceph initial monitor(s) error


Running ceph-deploy mon create-initial fails.

The relevant part of the error output:

[ceph2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph2][WARNIN] monitor: mon.ceph2, might not be running yet
[ceph2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph2.asok mon_status
[ceph2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph2][WARNIN] monitor ceph2 does not exist in monmap
[ceph2][WARNIN] neither public_addr nor public_network keys are defined for monitors
[ceph2][WARNIN] monitors may not be able to form quorum

Note public_network in the warnings: it is reported because public_network is not set in ceph.conf.

Fix:

Edit ceph.conf and add public_network = 192.168.1.0/24 (adjust the subnet to your own environment).
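The added line lives in the [global] section of ceph.conf on the deploy node (the 192.168.1.0/24 subnet is only an example; use the network your monitors actually listen on):

```ini
[global]
# network the monitors bind to -- example subnet, adjust to your environment
public_network = 192.168.1.0/24
```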

After this change, running ceph-deploy mon create-initial again still fails; the relevant output:

[ceph3][WARNIN] provided hostname must match remote hostname
[ceph3][WARNIN] provided hostname: ceph3
[ceph3][WARNIN] remote hostname: localhost
[ceph3][WARNIN] monitors may not reach quorum and create-keys will not complete
[ceph3][WARNIN] ********************************************************************************
[ceph3][DEBUG ] deploying mon to ceph3
[ceph3][DEBUG ] get remote short hostname
[ceph3][DEBUG ] remote hostname: localhost
[ceph3][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 3 monitors

The error says /etc/ceph/ceph.conf exists with different content, so push the config with --overwrite-conf.

Command:

ceph-deploy --overwrite-conf config push ceph1 ceph2 ceph3

After pushing the config and re-running ceph-deploy mon create-initial, the error persists, now with different output:

[ceph3][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph3.asok mon_status
[ceph3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph_deploy.mon][WARNIN] mon.ceph3 monitor is not yet in quorum, tries left: 1
[ceph_deploy.mon][WARNIN] waiting 20 seconds before retrying
[ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
[ceph_deploy.mon][ERROR ] ceph1
[ceph_deploy.mon][ERROR ] ceph3
[ceph_deploy.mon][ERROR ] ceph2

Investigation shows that the nodes' hostnames do not match their entries in /etc/hosts.

Fix: change each node's hostname so it matches /etc/hosts:

On node 1: hostnamectl set-hostname ceph1
On node 2: hostnamectl set-hostname ceph2
On node 3: hostnamectl set-hostname ceph3
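A quick way to confirm the fix on each node is to check that its short hostname resolves via /etc/hosts. A minimal sketch (hostname_in_hosts is a hypothetical helper):

```shell
# hostname_in_hosts succeeds when the given name occurs as a whole word in the
# hosts-file text fed on stdin, ignoring comment lines.
hostname_in_hosts() {
  grep -vE '^[[:space:]]*#' | grep -qw "$1"
}

# On each node:
#   hostname_in_hosts "$(hostname -s)" < /etc/hosts || echo "hostname not in /etc/hosts"
```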

Running ceph-deploy mon create-initial once more still fails, but this time with yet another error; the middle part of the output:

[ceph2][ERROR ] no valid command found; 10 closest matches:
[ceph2][ERROR ] perf dump {} {}
[ceph2][ERROR ] log reopen
[ceph2][ERROR ] help
[ceph2][ERROR ] git_version
[ceph2][ERROR ] log flush
[ceph2][ERROR ] log dump
[ceph2][ERROR ] config unset 
[ceph2][ERROR ] config show
[ceph2][ERROR ] get_command_descriptions
[ceph2][ERROR ] dump_mempools
[ceph2][ERROR ] admin_socket: invalid command
[ceph_deploy.mon][WARNIN] mon.ceph2 monitor is not yet in quorum, tries left: 5
[ceph_deploy.mon][WARNIN] waiting 5 seconds before retrying
[ceph2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph2.asok mon_status
[ceph2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

Fix: run sudo pkill ceph on every node, then run ceph-deploy mon create-initial again on the deploy node.
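Before the final retry it helps to verify that no stale ceph daemons survived the pkill. A sketch (count_ceph_daemons is a hypothetical helper that counts ceph-* process names from `ps -e -o comm=`-style input):

```shell
# count_ceph_daemons prints how many process names on stdin start with "ceph-".
count_ceph_daemons() {
  grep -c '^ceph-' || true   # grep -c exits 1 on zero matches; still prints 0
}

# On each node (should print 0 after the pkill):
#   ps -e -o comm= | count_ceph_daemons
```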

After that, the ERROR messages are gone: the initial monitor(s) are configured and all keys are collected. The following keyrings now appear in the current directory:

ceph.bootstrap-mds.keyring
ceph.bootstrap-mgr.keyring
ceph.bootstrap-osd.keyring
ceph.bootstrap-rgw.keyring
ceph.client.admin.keyring

Environment for reference:

[root@ct ceph]# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core) 
[root@ct ceph]# ceph -v
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
