16. Distributed File System: GlusterFS

GlusterFS

Lab objective:

Understand the theory behind GlusterFS and master its configuration.

 

Background:

Why the GlusterFS (Gluster File System) distributed file system came about:

Take Taobao as an example. Suppose all of its data for the whole country sat in Hangzhou. What would happen on Double 11, under an enormous flood of traffic? (The concurrency itself is handled by load balancing.) With huge numbers of users reading from and writing to the disk arrays around the clock, even the most powerful array would eventually buckle. And even when traffic is light, users far from Hangzhou get slower access than users close by.

A distributed file system addresses these problems: put a server in each city, so that users in Shanghai access the Shanghai server and users in Beijing access the Beijing server. All of the data behaves as one large pool; it is split into shards, and each city's servers hold a different set of shards.

Within each city, RAID is used to protect the data against loss, and LVM is used so that storage can be grown as the amount of data keeps increasing.

Even if the data in one city is damaged, it can be restored.

 

Four well-known distributed file systems:

1. GlusterFS     2. Ceph     3. MooseFS     4. Lustre

GlusterFS and Ceph are generally considered the strongest of the four, and both are now backed by Red Hat (through its acquisitions of the companies behind them).

Today GlusterFS dominates; Ceph is widely expected to dominate in the future.

 

Lab topology:


Node1, Node2, Node3, Node4, Node5, with IP addresses 192.168.1.121 through 192.168.1.125

 

Notes:

1. Disable the firewall and SELinux on all machines (see the sketch right after these notes).

2. Node1 through Node5 are installed from the GlusterFS 3.0 ISO image. At the installer boot screen, press Esc, type linux and press Enter; the system then copies only the files it needs from the ISO and installs faster. If you simply press Enter at the boot screen, the entire ISO is copied over and installation is much slower.
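A quick sketch of note 1 on RHEL/CentOS 6, to be run on every node (node1 through node5):

service iptables stop                                            stop the firewall now
chkconfig iptables off                                           keep it off after reboot
setenforce 0                                                     switch SELinux to permissive immediately
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config     disable SELinux permanently (takes effect after reboot)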

 

Lab steps:

Step 1: Create the partitions

Node1-4: configuration that must be done on nodes 1 through 4

[root@node1 ~]# vim /etc/hosts

192.168.1.121   node1.example.com  node1

192.168.1.122   node2.example.com  node2

192.168.1.123   node3.example.com  node3

192.168.1.124   node4.example.com  node4

192.168.1.125   node5.example.com  node5

 

[root@node1 ~]# service glusterd  status

glusterd (pid  1207) is running...   When the system is installed from the dedicated ISO, the glusterd service is installed and started by default; on any other system you have to install the glusterd service yourself.
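If a node was installed from a stock CentOS/RHEL image rather than the dedicated ISO, a minimal sketch of getting the daemon running might look like this (assuming a repository that provides the glusterfs-server package is configured):

yum -y install glusterfs-server          the package that provides glusterd
service glusterd start
chkconfig glusterd on                    start it automatically at boot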

 

 

[root@node1 ~]# fdisk  -cu  /dev/sda

 

Command (m for help): n

Command action

  e   extended

  p   primary partition (1-4)

e

Selected partition 4

First sector (107005952-209715199, default 107005952):

Using default value 107005952

Last sector, +sectors or +size{K,M,G} (107005952-209715199, default 209715199):

Using default value 209715199

 

Command (m for help): n

First sector (107008000-209715199, default 107008000):

Using default value 107008000

Last sector, +sectors or +size{K,M,G} (107008000-209715199, default 209715199): +10G

 

Command (m for help): t

Partition number (1-5): 5

Hex code (type L to list codes): 8e

Changed system type of partition 5 to 8e (Linux LVM)

 

Command (m for help): w

The partition table has been altered!

 

[root@node1 ~]# partx -a  /dev/sda

BLKPG: Device or resource busy

error adding partition 1

BLKPG: Device or resource busy

error adding partition 2

BLKPG: Device or resource busy

error adding partition 3     (these errors are harmless: partitions 1-3 are already registered with the kernel; partx only needs to add the new partitions 4 and 5)

[root@node1 ~]# pvcreate /dev/sda5

 Physical volume "/dev/sda5" successfully created

[root@node1 ~]# vgcreate   vg_bricks    /dev/sda5

 Volume group "vg_bricks" successfully created

[root@node1 ~]# lvcreate  -L  5G  -T  vg_bricks/brickspool     create a 5 GB thin pool named brickspool inside the vg_bricks volume group

 Logical volume "lvol0" created

 Logical volume "brickspool" created

[root@node1 ~]# lvcreate -V  1G  -T  vg_bricks/brickspool   -n  brick1     carve a 1 GB thin volume named brick1 out of the brickspool thin pool

 Logical volume "brick1" created

[root@node1 ~]# mkfs.xfs -i  size=512  /dev/vg_bricks/brick1     format with XFS; the inode size is set to 512 bytes so that GlusterFS extended attributes fit inside the inode

meta-data=/dev/vg_bricks/brick1  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144,imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                       sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node1 ~]# mkdir  -p  /bricks/brick1

[root@node1 ~]# echo "/dev/vg_bricks/brick1  /bricks/brick1   xfs   defaults 0  0"  >> /etc/fstab   mount /dev/vg_bricks/brick1 on /bricks/brick1 automatically at boot

 [root@node1 ~]# mount  -a

[root@node1 ~]# mkdir    /bricks/brick1/brick     once the file system is mounted, create this directory; it is the brick that will actually hold the data shards
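The same sequence is then repeated on node2, node3 and node4. Collected into one sketch (paths and sizes exactly as above; the /dev/sda5 LVM partition is assumed to already exist on each node):

pvcreate /dev/sda5
vgcreate vg_bricks /dev/sda5
lvcreate -L 5G -T vg_bricks/brickspool
lvcreate -V 1G -T vg_bricks/brickspool -n brick1
mkfs.xfs -i size=512 /dev/vg_bricks/brick1
mkdir -p /bricks/brick1
echo "/dev/vg_bricks/brick1  /bricks/brick1  xfs  defaults 0 0" >> /etc/fstab
mount -a
mkdir /bricks/brick1/brick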

 

Step 2: Set up GlusterFS

Node1:

 

Once each city node has its partitions ready, the nodes are joined into one big "pool" so the data can be managed as a whole, including dynamic splitting and placement of data and redundant copies. We run the commands on node1, which simply makes node1 the node we administer from.

[root@node1 ~]# gluster     the gluster command drops you into the gluster console

gluster> peer   probe  node1.example.com

peer probe: success. Probe on localhost not needed     run on node1 itself, so there is nothing to add: a node cannot probe itself

gluster> peer   probe  node2.example.com

peer probe: success.

gluster> peer   probe  node3.example.com

peer probe: success.

gluster> peer  probe   node4.example.com

peer probe: success.

gluster> peer  status

Number of Peers: 3

 

Hostname: node2.example.com

Uuid: ce2fe11c-94b8-4884-9d3a-271f01eff280

State: Peer in Cluster (Connected)

 

Hostname: node3.example.com

Uuid: 28db58f8-5f8a-4a7f-94a9-03a8a8137fdd

State: Peer in Cluster (Connected)

 

Hostname: node4.example.com

Uuid: 808a34d9-80cf-4077-acfa-f255f52aa9be

State: Peer in Cluster (Connected)

 

Create the first GlusterFS volume, with bricks replicated in pairs

[root@node1 ~]# gluster

gluster> volume  create   firstvol replica  2   node1.example.com:/bricks/brick1/brick   node2.example.com:/bricks/brick1/brick    node3.example.com:/bricks/brick1/brick    node4.example.com:/bricks/brick1/brick     create the firstvol volume

volume create: firstvol: success: please start the volume to access data

gluster> volume  start  firstvol     after creating firstvol it must be started, otherwise it cannot be used

volume start: firstvol: success

gluster> volume   info  firstvol     show the firstvol volume information

 

Volume Name: firstvol

Type: Distributed-Replicate

Volume ID: 4087aecf-a4fe-4773-a1af-fa19d6a7aba5

Status: Started

Snap Volume: no

Number of Bricks: 2 x 2 = 4     replicated in pairs: bricks 1 and 2 mirror each other, bricks 3 and 4 mirror each other

Transport-type: tcp

Bricks:

Brick1: node1.example.com:/bricks/brick1/brick

Brick2: node2.example.com:/bricks/brick1/brick

Brick3: node3.example.com:/bricks/brick1/brick

Brick4: node4.example.com:/bricks/brick1/brick

Options Reconfigured:

performance.readdir-ahead: on

auto-delete: disable

snap-max-soft-limit: 90

snap-max-hard-limit: 256

gluster> exit     changes are saved automatically when you exit
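Before mounting from the client it is worth confirming that all brick processes are online; a quick check from any node (a sketch, the exact output varies by version):

gluster volume status firstvol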

 

Client: NFS mount

[root@node5 ~]# mount  -t nfs  node1.example.com:/firstvol   /mnt

[root@node5 ~]# cd /mnt     the firstvol volume exported by node1 is now mounted over NFS

[root@node5 mnt]# touch    file{0..9}     create ten files

[root@node5 mnt]# ls

file0 file1  file2  file3 file4  file5  file6 file7  file8  file9

 

 

Node1

[root@node1 ~]# cd /bricks/brick1/brick/

[root@node1 brick]# ls

file3 file4  file7  file9

 

Node2

[root@node2 brick1]# cd   brick/

[root@node2 brick]# ls

file3 file4  file7  file9

 

Node3

[root@node3 ~]# cd /bricks/brick1/brick

[root@node3 brick]# ls

file0 file1  file2  file5 file6  file8

 

Node4

[root@node4 ~]# cd  /bricks/brick1/brick/

[root@node4 brick]# ls

file0 file1  file2  file5 file6  file8

Client: Samba mount

Node1     Samba shares and mounts use the CIFS protocol; the client needs the cifs-utils package installed, and the server needs the Samba packages
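A minimal sketch of the packages involved (package names assumed from the RHEL 6 / Red Hat Storage repositories; samba-vfs-glusterfs is what provides the vfs objects = glusterfs module used in the share section below):

On node1 (the Samba server):   yum -y install samba samba-vfs-glusterfs

On node5 (the client):         yum -y install cifs-utils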

[root@node1 ~]# service   smb  restart

Shutting down SMB services:                                [FAILED]

Starting SMB services:                                     [  OK  ]

[root@node1 ~]# service nmb  restart

Shutting down NMB services:                                [FAILED]

Starting NMB services:                                     [  OK  ]

[root@node1 ~]# chkconfig  smb  on

[root@node1 ~]# chkconfig nmb  on

[root@node1 ~]# useradd user1

[root@node1 ~]# smbpasswd  -a  user1

New SMB password:

Retype new SMB password:

Added user user1.

[root@node1 ~]# vim /etc/samba/smb.conf     you only need to start the Samba services; you do not have to write this share section yourself

[gluster-firstvol]              the section is written into the configuration file automatically (a Gluster hook adds it when the volume is started)

comment = For samba share of volume firstvol

vfs objects = glusterfs

glusterfs:volume = firstvol

glusterfs:logfile = /var/log/samba/glusterfs-firstvol.%M.log

glusterfs:loglevel = 7

path = /

read only = no

guest ok = yes

 

Node5

[root@node5 ~]# smbclient -L  192.168.1.121     list the share names Samba is exporting over CIFS

Enter root's password:

Anonymous login successful

Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-169.1.el6rhs]

 

         Sharename       Type     Comment

---------       ----     -------    the share name is gluster-firstvol

         gluster-firstvol Disk      For samba share of volume firstvol

         IPC$            IPC       IPC Service (Samba Server Version 3.6.9-169.1.el6rhs)

Anonymous login successful

Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-169.1.el6rhs]

 

         Server               Comment

         ---------            -------

         NODE1                Samba Server Version 3.6.9-169.1.el6rhs

         NODE5                Samba Server Version 3.6.9-169.1.el6rhs

 

         Workgroup            Master

         ---------            -------

         MYGROUP              NODE1

 

[root@node5 ~]#  mount  -t cifs //192.168.1.121/gluster-firstvol /mnt  -o  user=user1,password=redhat   

[root@node5 ~]# df     

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1751408  46084436  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29772   158325  16% /boot

//192.168.1.121/gluster-firstvol

                       2076672   66592  2010080   4% /mnt

The mount succeeded!
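To keep the password off the command line, mount.cifs also accepts a credentials file; a small sketch (the file path /root/.smbcred is arbitrary):

echo -e "username=user1\npassword=redhat" > /root/.smbcred
chmod 600 /root/.smbcred
mount -t cifs //192.168.1.121/gluster-firstvol /mnt -o credentials=/root/.smbcred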

 

Step 3: Add new bricks (nodes)

If the nodes are running out of disk space, new bricks or nodes can be added to cope with the growing amount of data.

Node3

[root@node3 brick]# lvcreate  -V 1G  -T vg_bricks/brickspool   -n  brick3

 Logical volume "brick3" created

[root@node3 brick]# mkfs.xfs  -i size=512    /dev/vg_bricks/brick3

meta-data=/dev/vg_bricks/brick3  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144, imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560, version=2

        =                      sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node3 brick]# mkdir  /bricks/brick3/

[root@node3 ~]# echo "/dev/vg_bricks/brick3  /bricks/brick3   xfs   defaults 0  0"  >> /etc/fstab

[root@node3 ~]# mount  -a

[root@node3 ~]# mkdir /bricks/brick3/brick

 

Node4

[root@node4 brick]# lvcreate  -V 1G   -T  vg_bricks/brickspool   -n brick3

 Logical volume "brick3" created

[root@node4 brick]# mkfs.xfs  -i size=512  /dev/vg_bricks/brick3

meta-data=/dev/vg_bricks/brick3  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144, imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560, version=2

        =                      sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node4 brick]# mkdir /bricks/brick3/

[root@node4 ~]# echo "/dev/vg_bricks/brick3  /bricks/brick3   xfs   defaults 0  0"  >> /etc/fstab

[root@node4 ~]# mount  -a

[root@node4 ~]# mkdir /bricks/brick3/brick

 

 

 

Node1

gluster> volume  add-brick   firstvol  node3.example.com:/bricks/brick3/brick   node4.example.com:/bricks/brick3/brick

volume add-brick: success

gluster> volume  info  firstvol

 

Volume Name: firstvol

Type: Distributed-Replicate

Volume ID: 4087aecf-a4fe-4773-a1af-fa19d6a7aba5

Status: Started

Snap Volume: no

Number of Bricks: 3 x 2 = 6

Transport-type: tcp

Bricks:

Brick1: node1.example.com:/bricks/brick1/brick

Brick2: node2.example.com:/bricks/brick1/brick

Brick3: node3.example.com:/bricks/brick1/brick

Brick4: node4.example.com:/bricks/brick1/brick

Brick5: node3.example.com:/bricks/brick3/brick

Brick6: node4.example.com:/bricks/brick3/brick

Options Reconfigured:

performance.readdir-ahead: on

auto-delete: disable

snap-max-soft-limit: 90

snap-max-hard-limit: 256

gluster>exit

 

 

Step 4: Rebalance

Although the new bricks were added successfully, they are empty: existing data is not redistributed automatically.

[root@node3 /]# ls    /bricks/brick3/brick

[root@node3 /]#

 

Even data added now does not land on them:

[root@node5 mnt]# touch    text{0..10}

[root@node5 mnt]# ls

file0 file2  file4  file6 file8  text0  text10 text3  text5  text7 text9

file1 file3  file5  file7 file9  text1  text2  text4  text6  text8

 

[root@node1 ~]# cd /bricks/brick1/brick     the new data went to Node1 and Node2

[root@node1 brick]# ls

file3 file4  file7  file9 text1  text3  text8 text9 

 

[root@node3 /]# ls  /bricks/brick1/brick/     new data also went to /bricks/brick1 on Node3 and Node4

file0 file1  file2  file5 file6  file8  text0 text10  text2  text4 text5  text6  text7

 

[root@node3 /]# ls  /bricks/brick3/brick     but the newly added bricks on Node3 and Node4 are still empty!

[root@node3 /]#

------------------------- no matter how much data you add, nothing shows up here -------------------------

 

For this reason we run a rebalance so that the data is spread evenly across all of the bricks, including the new ones.

[root@node1 brick]# gluster volume  rebalance  firstvol start 

volume rebalance: firstvol: success: Starting rebalance on volume firstvol has been successful.

ID: fc79b11d-13a0-4428-9e53-79f04dbc65a4

 

[root@node3 /]# ls    /bricks/brick3/brick

file0 file1  file2  text0 text10  text3

[root@node4 ~]# ls  /bricks/brick3/brick     the new bricks on Node3 and Node4 have now been given data

file0 file1  file2  text0 text10  text3

Data that already existed is also redistributed.

After the rebalance the contents are not perfectly even across bricks. New files are spread over the replica pairs in a roughly round-robin fashion: for example, the next file you create may land on node1 & node2, the one after that on node3 & node4's original bricks, and only the third on node3 & node4's new bricks.
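Before relying on the new layout, it helps to confirm that the rebalance has actually finished; a quick check from any server node (a sketch, the exact columns differ between versions):

gluster volume rebalance firstvol status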

 

Step 5: ACLs and quotas

ACL:

A file system mounted without the acl option cannot have ACLs set on it; the acl mount option has to be added first.

node5:

[root@node5 mnt]# setfacl -m  u:user1:rw-  /mnt

setfacl: /mnt: Operation not supported     ACLs are not supported on this mount yet

 

Add the acl option for the mount in /etc/fstab:

[root@node5 home]# vim  /etc/fstab

node1.example.com:/firstvol /mnt  nfs    defaults,_netdev,acl   0  0

Enable ACL support on the Gluster volume:

[root@node1 brick]# gluster

gluster> volume  set  firstvol acl  on

volume set: success

gluster> exit

Remount the file system:

[root@node5 home]# mount -o  acl    node1.example.com:/firstvol   /mnt

Set the user's ACL again:

[root@node5 ~]# mount

node1:/firstvol on  /mnt type  nfs  (rw,acl,addr=192.168.1.121)

[root@node5 mnt]# setfacl -m  u:user1:rw-  /mnt

[root@node5 home]# getfacl /mnt

getfacl: Removing leading '/' from absolute path names

# file: mnt

# owner: root

# group: root

user::rwx

user:user1:rw-

group::r-x

mask::rwx

other::r-x

 

[root@node5 ~]# cd  /mnt

[root@node5 mnt]# su  user1

[user1@node5 mnt]$ touch 123

[user1@node5 mnt]$ ls

file0 file2  file4  file6 file8  text0  text10 text3  text5  text7 text9

file1 file3  file5  file7 file9  text1  text2  text4  text6  text8  123

 

Quota:

Just as ordinary disks have quota settings, GlusterFS has its own quota feature.

Node1:

[root@node1 /]# gluster

gluster> volume  quota firstvol  enable     turn the quota feature on

volume quota : success

 

Node5:

[root@node5 /]# cd  /mnt

[root@node5 mnt]# mkdir  Limit

[root@node5 mnt]# ls

file0 file2  file4  file6 file8  Limit  text1  text2  text4  text6  text8

file1 file3  file5  file7 file9  text0  123  text3 text5  text7  text9 text10

 

Node1:

Limit the directory /mnt/Limit (seen by the volume as /Limit) to 20 MB of space:

gluster> volume   quota firstvol  limit-usage  /Limit 20MB     set the size limit

volume quota : success

 

Verification:

Create a 25 MB file inside /mnt/Limit:

[root@node5 mnt]# cd  /mnt/Limit

[root@node5 Limit]# ls

[root@node5 Limit]# dd if=/dev/zero  of=bigfile  bs=1M count=25

25+0 records in

25+0 records out

26214400 bytes (26 MB) copied, 2.22657 s,11.8 MB/s

[root@node5 Limit]# ls

bigfile

[root@node5 Limit]# ll  -h

total 25M

-rw-r--r-- 1 root root  25M Nov 3  2015 bigfile

The 25 MB file was created without complaint, which makes it look as if the quota is not working. The reason is that the data had not yet been flushed to disk; once it is synced, the quota accounting catches up and the limit is enforced.

[root@node5 Limit]# sync

[root@node5 Limit]# dd if=/dev/zero  of=bigfile1  bs=1M count=25

dd: opening `bigfile1': Disk quota exceeded
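The delay comes from quota accounting being refreshed only periodically. On 3.x releases that support them, the quota timeouts can reportedly be tightened so the limit bites sooner (a sketch; treat the option names as an assumption and check the quota command usage on your version):

gluster volume quota firstvol soft-timeout 0       re-check usage on every write while below the soft limit
gluster volume quota firstvol hard-timeout 0       re-check usage on every write once above the soft limit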

 

gluster> volume   quota  firstvol list     show the quota settings and usage

Path      Hard-limit   Soft-limit   Used     Available   Soft-limit exceeded?   Hard-limit exceeded?

----------------------------------------------------------------------------------------------------

/Limit    20.0MB       80%          50.0MB   0Bytes      Yes                    Yes

 

 

Step 6: Failover between nodes with CTDB

The setup so far has a weak point: node5 mounts the firstvol "pool" directly from node1, so if node1 goes down, the client loses access to the whole GlusterFS service.

To make the service fault-tolerant and reliable, the service address must be able to fail over between nodes: when one node fails, another node takes over and keeps serving. This is what CTDB provides. It relies on a virtual IP (VIP); externally, clients access GlusterFS through the VIP.

 

 

Node1-4: configuration that must be done on nodes 1 through 4

[root@node1 /]# lvcreate  -V 1G -T vg_bricks/brickspool   -n  brick4

 Logical volume "brick4" created

[root@node1 /]# mkfs.xfs  -i size=512  /dev/vg_bricks/brick4

meta-data=/dev/vg_bricks/brick4  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144,imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                      sectsz=512   sunit=0 blks,lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node1 /]# mkdir   /bricks/brick4

[root@node1 /]# echo  "/dev/vg_bricks/brick4  /bricks/brick4  xfs  defaults  0  0" >>  /etc/fstab

[root@node1 /]# mount  -a

[root@node1 /]# mkdir   /bricks/brick4/brick

 

 

[root@node1 brick]# gluster

gluster> volume  create  ctdb replica  4  node1.example.com:/bricks/brick4/brick   node2.example.com:/bricks/brick4/brick   node3.example.com:/bricks/brick4/brick   node4.example.com:/bricks/brick4/brick

volume create: ctdb: success: please start the volume to access data

Do not start the ctdb volume yet; first make the changes below on every node.

 

 

Edit the CTDB hook scripts of the glusterd service; nodes 1-4 all need the same change.

[root@node1 brick]# vim  /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh

# to prevent the script from running for volumes it was not intended.

# User needs to set META to the volume that serves CTDB lockfile.

META="ctdb"     this name must match the gluster volume created above

[root@node1 brick]# vim /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh

# to prevent the script from running for volumes it was not intended.

# User needs to set META to the volume that serves CTDB lockfile.

META="ctdb"     this name must match the gluster volume created above

 

Start the ctdb volume:

gluster> volume  start  ctdb

volume start: ctdb: success

 

Install the ctdb2.5 package; nodes 1-4 all need this.

[root@node1 /]# yum  -y remove  ctdb     the stock ctdb package has to be removed before ctdb2.5 can be installed

[root@node1 /]# yum  -y install  ctdb2.5

 

 

Create the nodes file (it lists which machines take part in the failover); nodes 1-4 all need it.

[root@node1 ~]# vim  /etc/ctdb/nodes

192.168.1.121

192.168.1.122

192.168.1.123

192.168.1.124

 

Create the public-addresses file, which specifies the VIP and the interface it floats on; nodes 1-4 all need it.

[root@node1 /]# vim /etc/ctdb/public_addresses

192.168.1.1/24   eth0

 

Start the ctdb service; it must be started on nodes 1-4, otherwise failover will not happen.

[root@node1 /]# /etc/init.d/ctdb restart

Shutting down ctdbd service: CTDB is not running

                                                          [  OK  ]

Starting ctdbd service:                                    [  OK  ]

[root@node1 /]# chkconfig ctdb  on

 

[root@node1 ~]# ctdb status

Number of nodes:4

pnn:0 192.168.1.121    OK

pnn:1 192.168.1.122    OK

pnn:2 192.168.1.123    OK

pnn:3 192.168.1.124    OK (THIS NODE)

Generation:1192510318

Size:4

hash:0 lmaster:0

hash:1 lmaster:1

hash:2 lmaster:2

hash:3 lmaster:3

Recovery mode:NORMAL (0)

Recovery master:0

[root@node2 /]# ctdb  ip

Public IPs on node 1

192.168.1.1-1

 

CTDB can be slow to bring up the VIP... be patient.

The VIP 192.168.1.1/24 has come up on Node4, so Node4 is currently the node serving the VIP.

[root@node4 ~]# ip addr  list

1: lo: mtu 16436 qdisc noqueue state UNKNOWN

   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

   inet 127.0.0.1/8 scope host lo

2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000

   link/ether 00:0c:29:42:b7:f2 brd ff:ff:ff:ff:ff:ff

   inet 192.168.1.124/24 brd 192.168.1.255 scope global eth0

    inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0

 

Ping the VIP from Node5 to verify it:

[root@node5 ~]# ping 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 56(84) bytesof data.

64 bytes from 192.168.1.1: icmp_seq=1ttl=64 time=4.64 ms

64 bytes from 192.168.1.1: icmp_seq=2ttl=64 time=0.609 ms

^C

--- 192.168.1.1 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1380ms

rtt min/avg/max/mdev =0.609/2.625/4.642/2.017 ms

The VIP is reachable.

 

NFS mount:

Node5 mounts through the VIP:

[root@node5 ~]# mount   -t  nfs     192.168.1.1:/ctdb   /mnt

[root@node5 ~]# df

Filesystem        1K-blocks    Used Available Use% Mounted on

/dev/sda2          50395844 1748972  46086872  4% /

tmpfs                506168       0   506168   0% /dev/shm

/dev/sda1            198337   29772    158325 16% /boot

192.168.1.1:/ctdb   1038336  32768   1005568   4% /mnt

 

[root@node5 ~]# cd  /mnt

[root@node5 mnt]# touch  file{1..5}

[root@node5 mnt]# ls

file1 file2  file3  file4 file5  lockfile

For failover to be seamless, the nodes that fail over between each other must hold the same data; that is why the ctdb volume was created with replica 4, so every node keeps an identical copy.

[root@node1 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node2 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node3 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node4 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

 

Verify failover between nodes:

[root@node4 ~]# init  0     shut down node4

[root@node1 ~]# ctdb  status     check from another node: the VIP has failed over to Node1

Number of nodes:4

pnn:0 192.168.1.121    OK (THIS NODE)

pnn:1 192.168.1.122    OK

pnn:2 192.168.1.123    OK

pnn:3 192.168.1.124    DISCONNECTED|UNHEALTHY|INACTIVE

Generation:495021029

Size:3

hash:0 lmaster:0

hash:1 lmaster:1

hash:2 lmaster:2

Recovery mode:NORMAL (0)

Recovery master:0

[root@node1 ~]# ip  addr  list

1: lo: mtu 16436 qdisc noqueue state UNKNOWN

   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

   inet 127.0.0.1/8 scope host lo

2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000

   link/ether 00:0c:29:29:d3:3a brd ff:ff:ff:ff:ff:ff

   inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0

    inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0

Node1 has taken over the VIP from Node4 and keeps the service running.

 

The mount on Node5 still works, which shows the failover succeeded.

[root@node5 mnt]# df

Filesystem        1K-blocks    Used Available Use% Mounted on

/dev/sda2          50395844 1749132  46086712  4% /

tmpfs                506168       0   506168   0% /dev/shm

/dev/sda1            198337   29772   158325  16% /boot

192.168.1.1:/ctdb   1038336  32768   1005568   4% /mnt

Note: sometimes ctdb status shows nodes that are not OK. What is going on?

Check the log /var/log/log.ctdb; it may show something like "can't open /gluster/lock/lockfile: No such file or directory". In that case the file has to be created by hand on every node (create the /gluster/lock directory first), and then the ctdb service restarted.
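A minimal sketch of that manual fix, run on every node (directory and file names taken from the log message above):

mkdir -p /gluster/lock              the mount point used for the ctdb lock volume
touch /gluster/lock/lockfile
/etc/init.d/ctdb restart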

 

Step 7: Geo-replication (site-to-site replication)

Geo-replication is used to back up a site's data: whenever the master site produces new data, it is synchronized to the slave site in near real time, protecting against data loss on the master. The slave site is purely a backup; data created directly on the slave is never copied back to the master.

 

Create the mastervol volume on Node1 & Node2, then mount and use it

[root@node1 ~]# gluster

gluster> volume   create  mastervol   replica 2  node1.example.com:/bricks/brick1/brick  node2.example.com:/bricks/brick1/brick

volume create: mastervol: success: please start the volume to access data

gluster> volume  start  mastervol

volume start: mastervol: success

gluster> exit

[root@node1 ~]# echo  "node1.example.com:/mastervol   /mnt  glusterfs defaults,_netdev  0  0" >>  /etc/fstab

[root@node1 ~]# mount  -a

[root@node1 ~]# df

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1793688  46042156  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29791   158306  16% /boot

/dev/mapper/vg_bricks-brick1

                       1038336   33296  1005040   4% /bricks/brick1

node1.example.com:/mastervol

                       1038336   33408  1004928   4% /mnt

 

Create the slavevol volume on Node5, then mount and use it (Node5 needs its own brick at /bricks/brick1/brick, provisioned the same way as in Step 1)

[root@node5 mnt]# gluster

gluster> volume  create   slavevol  node5.example.com:/bricks/brick1/brick

volume create: slavevol: success: please start the volume to access data

gluster> volume  start  slavevol

volume start: slavevol: success

gluster> exit

[root@node5 /]# echo  "node5.example.com:/slavevol  /mnt glusterfs  defaults,_netdev  0 0 " >> /etc/fstab

[root@node5 /]# mount  -a

 

[root@node5 /]# df

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1793544  46042300  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29772   158325  16% /boot

/dev/mapper/vg_bricks-brick1

                       1038336   33296  1005040   4% /bricks/brick1

node5.example.com:/slavevol

                       1038336   33408  1004928   4% /mnt

 

 

Create the corresponding user and group on Node5 (used for account authentication during replication)

[root@node5 /]# groupadd  geogroup

[root@node5 /]# useradd  -G  geogroup  geoaccount

[root@node5 /]# passwd geoaccount

Changing password for user geoaccount.

New password:

BAD PASSWORD: it is based on a dictionary word

BAD PASSWORD: is too simple

Retype new password:

passwd: all authentication tokens updated successfully.

 

Replication between the master and slave sites needs a connection between them: node1 logs in to node5 with node5's account and copies the data over to create the backup. Typing node5's credentials on every login is impractical, so SSH key authentication is used for automatic login: the master site's public key is copied to the slave site, and the slave uses it to verify the master's private key, letting the two nodes connect automatically.

[root@node1 ~]# ssh-keygen     generate the key pair on the master site

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

3d:af:76:30:7c:bc:5c:c3:36:82:a0:d3:98:ae:15:[email protected]

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

|                 |

|     E          |

|      o..       |

|     .=Sooo .   |

|     =.. =o+ *  |

|    ...   =.= o |

|    ..   ..+    |

|   ..   ...     |

+-----------------+

Copy the master site's public key to the slave site for the geoaccount account:

[root@node1 ~]# ssh-copy-id  [email protected]

[email protected]'s password:

Now try logging into the machine, with"ssh '[email protected]'", and check in:

 

 .ssh/authorized_keys

 

to make sure we haven't added extra keys that you weren't expecting.

On node5, add the following options to the file /etc/glusterfs/glusterd.vol:

[root@node5 ~]# vim  /etc/glusterfs/glusterd.vol

volume management

   type mgmt/glusterd

   option working-directory /var/lib/glusterd

   option transport-type socket,rdma

    option  mountbroker-root  /var/mountbroker-root

   option mountbroker-geo-replication.geoaccount  slavevol

   option geo-replication-log-group  geogroup

   option rpc-auth-allow-insecure  on

   option transport.socket.keepalive-time 10

   option transport.socket.keepalive-interval 2

   option transport.socket.read-fail-log off

   option ping-timeout 0

#  option base-port 49152

end-volume

Create the corresponding /var/mountbroker-root directory and give it mode 0711:

[root@node5 ~]# mkdir /var/mountbroker-root

[root@node5 ~]# chmod 0711  /var/mountbroker-root

 

Finally, restart the glusterd service on node5:

[root@node5 ~]# service  glusterd  restart

Starting glusterd:                                         [  OK  ]

 

Set up the replication on node1 (the master):

[root@node1 ~]# gluster

gluster> system::   execute   gsec_create

Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub

gluster> volume  geo-replication   mastervol [email protected]::slavevol create  push-pem     create a geo-replication session: the master's mastervol volume will be copied, as user geoaccount, to the slavevol volume on the slave site

Creating geo-replication session between mastervol & [email protected]::slavevol has been successful

gluster> volume  geo-replication  mastervol [email protected]::slavevol start

Starting geo-replication session between mastervol & [email protected]::slavevol has been successful     start replication between the master and slave sites

gluster> volume   geo-replication  mastervol [email protected]::slavevol  status             check the state of the master/slave session

MASTER NODE          MASTER VOL    MASTER BRICK            SLAVE                                 STATUS    CHECKPOINT STATUS    CRAWL STATUS       

---------------------------------------------------------------------------------------------------------------------------------------------

node1.example.com    mastervol     /bricks/brick1/brick    [email protected]::slavevol    faulty   N/A                  N/A                

node2.example.com    mastervol     /bricks/brick1/brick    [email protected]::slavevol    faulty   N/A                  N/A    

 

Run the following command on node5 (it installs the pem keys for the geoaccount user; after this the faulty status shown above should clear):

[root@node5 ~]# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh   geoaccount mastervol  slavevol

Successfully copied file.

Command executed successfully.
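With the keys in place, the session status can be re-checked from node1; the faulty state shown earlier should change to an active session (a sketch, using the same status command as above):

gluster volume geo-replication mastervol [email protected]::slavevol status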

 

Create data on Node1:

[root@node1 /]# cd /mnt

[root@node1 mnt]# mkdir  file{0..5}

[root@node1 mnt]# ls

file0 file1  file2  file3 file4  file5

 

The data from Node1 arrives on Node5 after a short delay:

[root@node5 ~]# cd   /mnt

[root@node5 mnt]# ls

[root@node5 mnt]# ls

[root@node5 mnt]# ls

[root@node5 mnt]# ls

file0 file1  file2  file3 file4  file5

 

Note: replication between the master and slave sites is continuous; as soon as the master site has new or changed data it is copied to the slave site. The slave can only act as a backup: data created on the slave site is never synchronized back to the master volume.
