CMake Error at cmake/modules/BuildBoost.cmake:270 (_add_library):
Cannot find source file:
xxHash/xxhash.c
Tried extensions .c .C .c++ .cc .cpp .cxx .cu .m .M .mm .h .hh .h++ .hm
.hpp .hxx .in .txx
Solution:
The xxHash sources were simply not pulled in when the repository was cloned (the submodule was missed); re-cloning fixes it.
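If you do not want to re-clone everything, fetching the missing submodules inside the existing checkout usually works as well (a minimal sketch, assuming xxHash is tracked as a git submodule of the ceph repository):
# full re-clone including all submodules
git clone --recurse-submodules https://github.com/ceph/ceph.git
# or, inside the existing clone, pull in only the missing submodules
cd ceph
git submodule update --init --recursive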
Note: do_cmake.sh now defaults to creating a debug build of ceph that can be up to 5x slower with some workloads. Please pass "-DCMAKE_BUILD_TYPE=RelWithDebInfo" to do_cmake.sh to create a non-debug release.
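For example, to configure a non-debug build as the note suggests (run from the top of the ceph source tree):
./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo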
Solution: you can try the workaround from https://tracker.ceph.com/issues/36373:
cp /usr/local/lib/librados.so.2 /usr/lib/
cd /usr/local/lib/python3/dist-packages/
cp * /usr/lib/python3/dist-packages/
apt-get install librbd-dev
apt-get install ceph
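After copying shared libraries into /usr/lib it is worth refreshing the dynamic linker cache so the new copies are picked up; this is a generic Linux step, not part of the tracker issue above:
ldconfig
# confirm the library is now resolvable by the loader
ldconfig -p | grep librados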
Error message:
root@node1:/lib/x86_64-linux-gnu# ceph health detail
HEALTH_WARN 2 mgr modules have failed dependencies
[WRN] MGR_MODULE_DEPENDENCY: 2 mgr modules have failed dependencies
Module 'rbd_support' has failed dependency: /lib/x86_64-linux-gnu/librbd.so.1: undefined symbol: _ZN19TokenBucketThrottleC1EPN4ceph6common11CephContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmmP9SafeTimerPSt5mutex
Module 'telemetry' has failed dependency: /lib/x86_64-linux-gnu/librbd.so.1: undefined symbol: _ZN19TokenBucketThrottleC1EPN4ceph6common11CephContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmmP9SafeTimerPSt5mutex
Search for every copy of librbd.so.1. The cloned tree contains /home/ceph-gitee/ceph/build/lib/librbd.so.1, while the copy under /usr/lib is the defective one; just copy the file from the ceph build directory over the one under /usr/lib (remember to keep a backup).
root@node1:/lib/x86_64-linux-gnu# find / -name librbd.so.1
/home/ceph-gitee/ceph/build/lib/librbd.so.1
/usr/lib/x86_64-linux-gnu/librbd.so.1
/usr/lib/librbd.so.1
/snap/lxd/20789/lib/x86_64-linux-gnu/librbd.so.1
/snap/lxd/20806/lib/x86_64-linux-gnu/librbd.so.
Fix:
root@node1:/lib/x86_64-linux-gnu# cp /home/ceph-gitee/ceph/build/lib/librbd.so.1 .
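A slightly safer variant of the same fix, keeping a backup of the old library before overwriting it (paths taken from the find output above; the .bak name is just a convention):
cp /usr/lib/x86_64-linux-gnu/librbd.so.1 /usr/lib/x86_64-linux-gnu/librbd.so.1.bak
cp /home/ceph-gitee/ceph/build/lib/librbd.so.1 /usr/lib/x86_64-linux-gnu/
ldconfig
Restarting the ceph-mgr daemon may also be necessary before the MGR_MODULE_DEPENDENCY warning clears.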
Error description:
root@node1:/var/lib/ceph/osd# ceph health detail
HEALTH_WARN 4 mgr modules have failed dependencies; Degraded data redundancy: 1 pg undersized
[WRN] MGR_MODULE_DEPENDENCY: 4 mgr modules have failed dependencies
Module 'orchestrator' has failed dependency: No module named 'ceph'
Module 'rbd_support' has failed dependency: librbd.so.1: cannot open shared object file: No such file or directory
Module 'telemetry' has failed dependency: librbd.so.1: cannot open shared object file: No such file or directory
Module 'volumes' has failed dependency: No module named 'ceph'
[WRN] PG_DEGRADED: Degraded data redundancy: 1 pg undersized
pg 1.0 is stuck undersized for 34m, current state active+undersized+remapped, last acting [1,0]
Fix:
The install shipped with a default CRUSH rule, replicated_rule, which had host as the default failure domain:
$ ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
First, I found the pool associated with pg 1.0:
$ ceph osd pool ls
device_health_metrics
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+undersized+remapped
and confirmed that the pg is using the default rule:
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule
Rather than modifying the default CRUSH rule, I chose to create a new replicated rule, this time specifying osd (a.k.a. device) as the failure-domain type (docs: CRUSH Map Types and Buckets), again assuming the default CRUSH root:
# osd crush rule create-replicated <name> <root> <failure-domain-type> [<device-class>]
$ ceph osd crush rule create-replicated replicated_rule_osd default osd
$ ceph osd crush rule dump replicated_rule_osd
{
    "rule_id": 1,
    "rule_name": "replicated_rule_osd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
Then I assigned the new rule to the existing pool:
$ ceph osd pool set device_health_metrics crush_rule replicated_rule_osd
set pool 1 crush_rule to replicated_rule_osd
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule_osd
Finally, I confirmed the pg state:
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+clean
To shrink a cluster or replace hardware, you can remove an OSD at runtime. In Ceph, an OSD is usually one ceph-osd daemon on a host, running on top of one disk; if a host has several data disks, you have to remove the corresponding ceph-osd for each of them in turn. Before you start, check the cluster's capacity to see whether it is close to its limit, and make sure that removing the OSD will not push the cluster past its near full ratio.
Note: do not let the cluster reach its full ratio when removing an OSD; removing an OSD may cause the cluster to reach or exceed the full ratio.
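For that capacity check, the standard utilization commands are enough (both are stock ceph CLI commands):
ceph df       # cluster-wide and per-pool usage
ceph osd df   # per-OSD usage, including each OSD's %USE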
Taking the OSD out of the cluster
Before you remove an OSD it is usually up and in; take it out of the cluster first so that Ceph starts rebalancing and copying its data to the other OSDs:
ceph osd out {osd-num}
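For example, to take osd.1 out (the id 1 is just a placeholder):
ceph osd out 1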
Observing the data migration
Once the OSD has been taken out of the cluster, Ceph begins rebalancing the cluster and migrating placement groups off the OSD that is being removed. You can watch this process with the ceph tool:
ceph -w
You will see the placement group states change from active+clean to active, some degraded objects, and finally back to active+clean once the migration completes. (Press Ctrl-C to exit.)
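If you prefer a one-shot view to the streaming log, the usual status commands show the same rebalancing progress:
ceph -s        # overall cluster status, including recovery progress
ceph pg stat   # one-line summary of placement group states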
Note:
Sometimes (typically in a "small" cluster with only a few hosts, such as a small test cluster) taking an OSD out can put CRUSH into a corner case where some PGs remain stuck in the active+remapped state. If you run into this, mark the OSD back in with:
ceph osd in {osd-num}
Once the cluster has returned to its original state, set the OSD's weight to 0 instead of marking it out, using:
ceph osd crush reweight osd.{osd-num} 0
After running this you can watch the data migration again; it should complete normally. The difference between marking an OSD out and reweighting it to 0 is that in the former case the weight of the bucket containing the OSD does not change, while in the latter case the bucket's weight is reduced by the OSD's weight. In some situations the reweight approach is a better fit for "small" clusters.
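For example, draining osd.1 via reweighting instead of marking it out (osd id 1 is a placeholder):
ceph osd in 1
ceph osd crush reweight osd.1 0
ceph -w   # watch until the PGs return to active+clean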
Stopping the OSD
After being taken out of the cluster the OSD may still be running, i.e. its state is up and out. Stop the OSD process before removing it:
ssh {osd-host}
sudo /etc/init.d/ceph stop osd.{osd-num}
Once stopped, the OSD's state becomes down.
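On systemd-based installations the same step is normally done through the systemd unit rather than the init script (adjust the unit name to your deployment):
sudo systemctl stop ceph-osd@{osd-num}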
Removing the OSD
This procedure removes an OSD from the cluster's CRUSH map, deletes its authentication key, removes its entry from the OSD map, and removes its entry from ceph.conf, in that order. If the host has several disks, repeat these steps for the OSD of each disk.
Remove the OSD's entry from the CRUSH map so that it no longer receives data. You can also decompile the CRUSH map, remove the OSD from the device list, remove the corresponding host bucket entry or remove the host bucket itself (if it is in the CRUSH map and you intend to remove the host), then recompile the map and apply it. See the "Remove an OSD" documentation for details:
ceph osd crush remove {name}
Delete the OSD authentication key:
ceph auth del osd.{osd-num}
The ceph in the ceph-{osd-num} path is the $cluster-$id; if your cluster name differs from ceph, change it accordingly.
Remove the OSD:
ceph osd rm {osd-num}
# for example
ceph osd rm 1
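On Luminous and later releases, the separate crush remove / auth del / osd rm steps can usually be collapsed into a single purge command:
ceph osd purge {osd-num} --yes-i-really-mean-it
# for example
ceph osd purge 1 --yes-i-really-mean-it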
Log in to the host that holds the master copy of ceph.conf:
ssh {admin-host}
cd /etc/ceph
vim ceph.conf
Remove the corresponding entry from the ceph.conf configuration file:
[osd.1]
host = {hostname}
From the host that holds the master copy of ceph.conf, copy the updated ceph.conf to the /etc/ceph directory of the other hosts in the cluster.
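A minimal way to push the updated file out, assuming plain SSH access to the other nodes ({other-host} is a placeholder):
scp /etc/ceph/ceph.conf {other-host}:/etc/ceph/ceph.conf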
Remove the VG that belonged to the OSD:
vgremove ceph*
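It is safer to list the volume groups first and confirm which one belongs to the removed OSD before deleting anything; ceph-volume normally names them ceph-<uuid> (this assumes an LVM-based OSD layout):
vgs                    # list volume groups; ceph OSD VGs are typically named ceph-<uuid>
lvs -o +devices        # show which physical devices back each logical volume
vgremove ceph-<uuid>   # remove only the VG of the OSD that was just deleted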