There are three hosts, node1, node2, and node3, each with three OSDs, as laid out below. osd.1, 3, 5, 6, 7, and 8 are SSDs; osd.0, 2, and 4 are SATA disks.
We build a pool named ssd on osd.1, osd.3, and osd.5 using three replicas; a pool named sata on osd.0, osd.2, and osd.4 using erasure coding with k=2, m=1 (two OSDs hold data chunks and one holds the coding chunk); and a pool named metadata on osd.6, osd.7, and osd.8 for the CephFS metadata.
The ssd and sata pools form a writeback cache tier, with ssd as the hot storage (cache) and sata as the cold storage (backing store). The sata and metadata pools together back a CephFS that is mounted at /mnt/cephfs.
1. Install the software
(1) Install the dependencies
apt-get install autoconf automake autotools-dev libbz2-dev debhelper default-jdk git javahelper junit4 libaio-dev libatomic-ops-dev libbabeltrace-ctf-dev libbabeltrace-dev libblkid-dev libboost-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev libcurl4-gnutls-dev libedit-dev libexpat1-dev libfcgi-dev libfuse-dev libgdata-common libgdata13 libgoogle-perftools-dev libkeyutils-dev libleveldb-dev libnss3-dev libsnappy-dev liblttng-ust-dev libtool libudev-dev libxml2-dev pkg-config python python-argparse python-nose uuid-dev uuid-runtime xfslibs-dev yasm
(2) Download, build, and install Ceph
wget http://ceph.com/download/ceph-0.89.tar.gz
tar xzf ceph-0.89.tar.gz
cd ceph-0.89
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make -j4
make install
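A quick check that the build installed where expected (assuming /usr/bin is on the default PATH):
ceph --version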
Reference: http://docs.ceph.com/docs/master/install/manual-deployment/
(3) Synchronize the clocks
Run the following on every node:
ntpdate cn.pool.ntp.org
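If ntpdate is missing on a node (an assumption about the base install), it can be installed with:
apt-get install ntpdate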
2. Set up the monitor
(1) Install the init script from the source tree:
cp src/init-ceph /etc/init.d/ceph
(2) Generate the cluster fsid:
uuidgen
2fc115bf-b7bf-439a-9c23-8f39f025a9da
vim /etc/ceph/ceph.conf and set fsid = 2fc115bf-b7bf-439a-9c23-8f39f025a9da
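For the monitor bootstrap a minimal [global] section along the following lines is typically enough; the values are the same ones used in the full configuration listed later in this guide:
[global]
fsid = 2fc115bf-b7bf-439a-9c23-8f39f025a9da
mon initial members = node1,node2,node3
mon host = 172.10.2.171,172.10.2.172,172.10.2.173
auth cluster required = cephx
auth service required = cephx
auth client required = cephx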
(3) Generate a monitor keyring at /tmp/ceph.mon.keyring
ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
(4) Generate the client.admin keyring at /etc/ceph/ceph.client.admin.keyring
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
(5) Import ceph.client.admin.keyring into ceph.mon.keyring
ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
(6) On node1, create the initial monmap with a single monitor named node1; /tmp/monmap holds the monmap
monmaptool --create --add node1 172.10.2.171 --fsid 2fc115bf-b7bf-439a-9c23-8f39f025a9da /tmp/monmap
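The generated monmap can be inspected before it is used:
monmaptool --print /tmp/monmap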
(7) Create the directory that stores the monitor's data (mainly its keyring and store.db)
mkdir -p /var/lib/ceph/mon/ceph-node1
(8) Assemble the initial data the mon daemon needs from the monitor map and keyring
ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
(9) Mark the monitor as fully created so the init script will manage it
touch /var/lib/ceph/mon/ceph-node1/done
(10) Start the monitor
/etc/init.d/ceph start mon.node1
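To confirm the monitor is up and in quorum (the client.admin keyring created above is assumed to be in /etc/ceph):
ceph mon stat
ceph -s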
3. Add the OSDs
(1) Prepare and format the disk
ceph-disk prepare --cluster ceph --cluster-uuid 2fc115bf-b7bf-439a-9c23-8f39f025a9da --fs-type xfs /dev/sdb
mkdir -p /var/lib/ceph/bootstrap-osd/
mkdir -p /var/lib/ceph/osd/ceph-0
(2) Activate (mount) the OSD
ceph-disk activate /dev/sdb1 --activate-key /var/lib/ceph/bootstrap-osd/ceph.keyring
(3) After adding the [osd] sections to /etc/ceph/ceph.conf, all OSDs can be started with:
/etc/init.d/ceph start
If ceph osd stat still does not show the OSD as up afterwards, remove the upstart marker and start again:
rm -rf /var/lib/ceph/osd/ceph-2/upstart
/etc/init.d/ceph start
(4) Set up passwordless SSH login to node2 (this step is optional)
ssh-keygen
ssh-copy-id node2
(5) To add OSDs on the second node, the bootstrap-osd keyring has to be copied over first
scp /var/lib/ceph/bootstrap-osd/ceph.keyring [email protected]:/var/lib/ceph/bootstrap-osd/
Then repeat steps (1)-(3) above.
Proceed in the same way until node1, node2, and node3 each have 3 OSDs; a loop sketch for one node follows below.
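A sketch of preparing and activating all three OSDs on one node, assuming the data disks are /dev/sdb, /dev/sdc, and /dev/sdd (hypothetical device names; adjust per node):
# /var/lib/ceph/bootstrap-osd/ must already hold ceph.keyring, as in steps (1) and (5)
for dev in sdb sdc sdd; do
    ceph-disk prepare --cluster ceph --cluster-uuid 2fc115bf-b7bf-439a-9c23-8f39f025a9da --fs-type xfs /dev/$dev
    ceph-disk activate /dev/${dev}1 --activate-key /var/lib/ceph/bootstrap-osd/ceph.keyring
done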
4. Create the MDS for the file system
(1) Create the directory that stores the MDS data
mkdir -p /var/lib/ceph/mds/ceph-node1/
(2) Generate the keyring for the MDS (needed when cephx authentication is used)
ceph auth get-or-create mds.node1 mon 'allow rwx' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mds/ceph-node1/keyring
(3) Start mds.node1
/etc/init.d/ceph start mds.node1
Proceed in the same way to create an MDS on each of node1, node2, and node3.
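The MDS daemons will not become active until a file system is created on top of the pools, but that they are running can already be verified with:
ceph mds stat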
5. Add a monitor on the second node
(1)ssh node2
(2)mkdir -p /var/lib/ceph/mon/ceph-node2
(3)ceph auth get mon. -o /tmp/ceph.mon.keyring
(4)ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
(5)ceph mon getmap -o /tmp/monmap
(6)ceph-mon --mkfs -i node2 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
(7)touch /var/lib/ceph/mon/ceph-node2/done
(8)rm -f /var/lib/ceph/mon/ceph-node2/upstart
(9)/etc/init.d/ceph start mon.node2
Proceed in the same way so that node1, node2, and node3 each run a monitor.
At this point ps -ef | grep ceph should show one mon process, one mds process, and three osd processes on every node, and ceph -s reports the same picture.
The configuration file is as follows:
[global]
fsid = 2fc115bf-b7bf-439a-9c23-8f39f025a9da
mon initial members = node1,node2,node3
mon host = 172.10.2.171,172.10.2.172,172.10.2.173
public network = 172.10.2.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1
[mon.node1]
host = node1
mon addr = 172.10.2.171:6789
[mon.node2]
host = node2
mon addr = 172.10.2.172:6789
[mon.node3]
host = node3
mon addr = 172.10.2.173:6789
[osd]
osd crush update on start = false
[osd.0]
host = node1
addr = 172.10.2.171:6789
[osd.1]
host = node1
addr = 172.10.2.171:6789
[osd.2]
host = node2
addr = 172.10.2.172:6789
[osd.3]
host = node2
addr = 172.10.2.172:6789
[osd.4]
host = node3
addr = 172.10.2.173:6789
[osd.5]
host = node3
addr = 172.10.2.173:6789
[osd.6]
host = node3
addr = 172.10.2.173:6789
[osd.7]
host = node2
addr = 172.10.2.172:6789
[osd.8]
host = node1
addr = 172.10.2.171:6789
[mds.node1]
host = node1
[mds.node2]
host = node2
[mds.node3]
host = node3
6. Modify the crushmap
(1) Get the current crush map
ceph osd getcrushmap -o compiled-crushmap-filename
(2) Decompile it
crushtool -d compiled-crushmap-filename -o decompiled-crushmap-filename
(3) Edit decompiled-crushmap-filename: add three roots, one per pool, place the corresponding OSDs under each root, and add a rule (ruleset) per pool that takes from its root and sets the pool type (replicated or erasure).
(4) Recompile it
crushtool -c decompiled-crushmap-filename -o compiled-crushmap-filename
(5) Inject the new crush map
ceph osd setcrushmap -i compiled-crushmap-filename
The edited crushmap is as follows:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
root sata {
    id -1       # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0      # rjenkins1
    item osd.0 weight 0.1
    item osd.2 weight 0.1
    item osd.4 weight 0.1
}
root ssd {
    id -8       # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0      # rjenkins1
    item osd.1 weight 0.1
    item osd.3 weight 0.1
    item osd.5 weight 0.1
}
root metadata {
    id -9       # do not change unnecessarily
    # weight 0.000
    alg straw
    hash 0      # rjenkins1
    item osd.7 weight 0.1
    item osd.6 weight 0.1
    item osd.8 weight 0.1
}
# rules
rule ssd {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type osd
    step emit
}
rule sata {
    ruleset 0
    type erasure
    min_size 1
    max_size 10
    step take sata
    step chooseleaf firstn 0 type osd
    step emit
}
rule metadata {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take metadata
    step chooseleaf firstn 0 type osd
    step emit
}
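Before injecting the edited map, the rules can be sanity-checked offline with crushtool, using the compiled map file from step (4):
crushtool --test -i compiled-crushmap-filename --rule 1 --num-rep 3 --show-mappings
crushtool --test -i compiled-crushmap-filename --rule 0 --num-rep 3 --show-mappings
The mappings for ruleset 1 should only use osd.1, osd.3, and osd.5, and those for ruleset 0 only osd.0, osd.2, and osd.4.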
7. Create the pools
(1) Create the ssd pool, type replicated
Command form: ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name]
Actual command:
ceph osd pool create ssd 128 128 replicated ssd
(2) Create the sata pool, type erasure
Command form: ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure [erasure-code-profile] [crush-ruleset-name]
Actual command:
ceph osd pool create sata 128 128 erasure default sata
To list the available erasure-code profiles:
ceph osd erasure-code-profile ls
To show the contents of a specific profile:
ceph osd erasure-code-profile get default
which outputs:
directory=/usr/lib/ceph/erasure-code
k=2
m=1
plugin=jerasure
technique=reed_sol_van
The erasure-code profile matters: once it has been applied to a pool it cannot be changed for that pool. A profile is created with:
ceph osd erasure-code-profile set myprofile \
k=3 \
m=2 \
ruleset-failure-domain=rack
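After the sata pool is created, it can be checked that it picked up the intended rule and that its size equals k+m = 3 (a quick check; in this Ceph release the option is still named crush_ruleset):
ceph osd pool get sata crush_ruleset
ceph osd pool get sata size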
(3) Create the metadata pool, type replicated
ceph osd pool create metadata 128 128 replicated metadata
To check PG status, ceph pg dump shows which PGs are placed on which OSDs.
ceph osd lspools lists the existing pools, and ceph osd tree shows the OSD layout.
8. Build the cache tier