1. Introduction
2. What is GlusterFS
3. Preliminary Assumptions
4. GlusterFS Installation
4.1. Distributed storage configuration
4.1.1. Peer Probe
4.1.2. Create Storage Volume
4.1.3. Start storage volume
4.1.4. Setting up Client
4.1.5. Testing GlusterFS distributed configuration
4.2. Replicated storage configuration
4.2.1. Peer Probe
4.2.2. Create Storage Volume
4.2.3. Start storage volume
4.2.4. Setting up client
4.2.5. Testing GlusterFS replicated configuration
5. Expanding GlusterFS volumes
6. Security Settings
7. Conclusion
8. Appendix
8.1. Incorrect number of bricks
8.2. Host storage.server1 not a friend
Whether you administer a small home network or an enterprise network for a large company, data storage is always a concern, be it a lack of disk space or an inefficient backup solution. In both cases GlusterFS can be the right tool for the job, because it lets you scale your resources horizontally as well as vertically. In this guide we will configure distributed and replicated (mirrored) data storage. As the names suggest, GlusterFS's distributed mode spreads your data across multiple network nodes, while replicated mode makes sure that all your data is mirrored on every node.
After reading the introduction you should already have a fair idea of what GlusterFS is. You can think of it as an aggregation service for the spare disk space across your whole network. It connects all nodes that have GlusterFS installed over TCP or RDMA, and either combines all available disk space into a single storage volume (distributed mode) or uses the available disk space on all nodes to mirror your data (replicated mode). Each volume therefore consists of storage contributed by multiple nodes; in GlusterFS terminology, each contributed export directory is called a brick.
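The difference between the two modes can be made concrete with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: the function name usable_capacity is made up for this guide, brick sizes are whole gigabytes, and a replicated volume is assumed to be a single replica set, so its capacity is limited by its smallest brick.

```shell
# Illustrative only: estimate the usable size of a volume from its brick sizes.
# usable_capacity MODE SIZE_GB...
#   distributed: brick sizes add up
#   replicated : every brick holds a full copy, so the smallest brick is the limit
usable_capacity() {
    mode=$1
    shift
    total=0
    smallest=$1
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    case $mode in
        distributed) echo "$total" ;;
        replicated)  echo "$smallest" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, usable_capacity distributed 100 200 prints 300, while usable_capacity replicated 100 200 prints 100.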
Although GlusterFS can be installed and used on any Linux distribution, this article primarily uses Ubuntu Linux. You should still be able to follow this guide on other distributions such as Red Hat, Fedora, SUSE, etc. The only part that differs is the GlusterFS installation process.
Furthermore, this guide will use 3 example hostnames:
storage.server1 - GlusterFS storage server
storage.server2 - GlusterFS storage server
storage.client - GlusterFS storage client
Use a DNS server or the /etc/hosts file to define these hostnames, and adapt the steps in this guide to your own scenario.
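If you go the /etc/hosts route, it can help to confirm that every hostname really has an entry before continuing. The helper below is a minimal sketch, not part of GlusterFS: the name missing_hosts is hypothetical, and it only inspects a hosts-format file, so names that resolve through DNS alone would still be reported as missing.

```shell
# Illustrative only: report which hostnames have no entry in a hosts file.
# missing_hosts HOSTS_FILE NAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name among the fields after the address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

Running missing_hosts /etc/hosts storage.server1 storage.server2 storage.client prints nothing when all three names are defined, and otherwise prints each missing name on its own line.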
The GlusterFS server needs to be installed on every host you wish to add to your final storage volume, in our case storage.server1 and storage.server2. You can run GlusterFS on a single server with a client connection and have it act like an NFS server. However, the true value of GlusterFS shows when multiple server hosts act as one. Use the following commands to install the GlusterFS server on both servers:
storage.server1 $ sudo apt-get install glusterfs-server
and
storage.server2 $ sudo apt-get install glusterfs-server
The above commands will install and start glusterfs-server on both systems. Confirm that both servers are running with:
$ sudo service glusterfs-server status
First we will create a GlusterFS distributed volume. In distributed mode, GlusterFS spreads data across all connected bricks. For example, if clients write files file1, file2, file3 and file4 to a GlusterFS mounted directory, then storage.server1 might hold file1 and file2 while storage.server2 gets file3 and file4. This scenario is illustrated in the diagram below.
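To get a feel for how a file name can decide which brick a file lands on, here is a toy sketch. It is not GlusterFS's actual placement algorithm: it simply takes the cksum CRC of the file name modulo the number of bricks, and the function name pick_brick is made up for this illustration.

```shell
# Toy model only: choose a brick for a file by hashing its name.
# pick_brick FILE_NAME BRICK...
pick_brick() {
    name=$1
    shift
    [ "$#" -gt 0 ] || return 1
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Skip (crc mod number-of-bricks) bricks; the next one is the target.
    shift $(( crc % $# ))
    printf '%s\n' "$1"
}
```

Calling pick_brick file1 storage.server1:/dist-data storage.server2:/dist-data prints one of the two bricks, and always the same one for the same file name. That is the property that lets a client find a file again without a central index.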
First, we need to get both GlusterFS servers to talk to each other, which effectively creates a pool of trusted servers.
storage.server1 $ sudo gluster peer probe storage.server2
Probe successful
The above command adds storage.server2 to the trusted server pool. These settings are replicated to all connected servers, so you do not have to run the command on the other servers. Both servers will now have a peer configuration file similar to the one below:
$ cat /etc/glusterd/peers/951b8732-42f0-42e1-a32f-0e1c4baec4f1
uuid=951b8732-42f0-42e1-a32f-0e1c4baec4f1
state=3
hostname1=storage.server2
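Since the peer file is a plain list of key=value lines, individual fields are easy to pull out in a script. The helper below is a small sketch (the name peer_field is hypothetical) and it assumes the file layout shown above.

```shell
# Illustrative only: print the value of one key from a key=value peer file.
# peer_field PEER_FILE KEY      (KEY is e.g. uuid, state or hostname1)
peer_field() {
    sed -n "s/^$2=//p" "$1"
}
```

For the file above, peer_field /etc/glusterd/peers/951b8732-42f0-42e1-a32f-0e1c4baec4f1 hostname1 prints storage.server2. For day-to-day checks, sudo gluster peer status reports the same information without touching the files directly.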
Next, we can use both servers to define a new storage volume consisting of two bricks, one for each server.
storage.server1 $ sudo gluster volume create dist-vol storage.server1:/dist-data storage.server2:/dist-data
Creation of volume dist-vol has been successful. Please start the volume to access data.
The above command created a new volume called dist-vol consisting of two bricks. If the directory /dist-data does not exist, it will also be created on both servers by the above command. As mentioned before, you can also create a volume with just one brick and thus make the GlusterFS server act like an NFS server. You can check whether your new volume was created with:
$ sudo gluster volume info dist-vol
Volume Name: dist-vol
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/dist-data
Brick2: storage.server2:/dist-data
Now we are ready to start the new volume:
storage.server1 $ sudo gluster volume start dist-vol
Starting volume dist-vol has been successful
storage.server1 $ sudo gluster volume info dist-vol
Volume Name: dist-vol
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/dist-data
Brick2: storage.server2:/dist-data
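When scripting these steps, it is handy to read a single field, such as Status, out of the volume info output instead of eyeballing it. The filter below is a minimal sketch that assumes the "Key: value" layout shown above; the name volume_field is made up for this guide.

```shell
# Illustrative only: print the value of one "Key: value" line read from stdin.
# ... | volume_field KEY
volume_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}
```

With the output above, sudo gluster volume info dist-vol | volume_field Status prints Started, so a script can refuse to mount the volume until that is the case.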
This concludes the configuration of the GlusterFS data servers in distributed mode. The end result should be a new distributed volume called dist-vol consisting of two bricks.
Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount this volume on any host. Log in to the client host and install the GlusterFS client:
storage.client $ sudo apt-get install glusterfs-client
Next, create a mount point to which you will mount your new dist-vol GlusterFS volume, for example export-dist:
storage.client $ sudo mkdir /export-dist
Now, we can mount the dist-vol GlusterFS volume with the mount command:
storage.client $ sudo mount -t glusterfs storage.server1:dist-vol /export-dist
All should be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly:
$ mount | grep glusterfs
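The same check can be turned into a yes/no test for use in scripts. The sketch below assumes the usual Linux mount output format, "<source> on <mount point> type <fstype> (<options>)", and that the file system type contains the string glusterfs (typically fuse.glusterfs). The function name is_gluster_mount is hypothetical.

```shell
# Illustrative only: succeed if stdin (mount-style lines) shows a GlusterFS
# file system mounted on the given mount point.
# mount | is_gluster_mount MOUNT_POINT
is_gluster_mount() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $5 ~ /glusterfs/ { found = 1 }
                    END { exit !found }'
}
```

For example, mount | is_gluster_mount /export-dist && echo "dist-vol is mounted" prints the message only when the volume is really mounted there.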
Everything is ready, so we can start some tests. On the client side, create four files in the GlusterFS mounted directory:
storage.client $ touch /export-dist/file1 /export-dist/file2 /export-dist/file3 /export-dist/file4
GlusterFS will now take the files and distribute them among the bricks in the dist-vol volume. In this example, storage.server1 contains:
storage.server1 $ ls /dist-data/
file3 file4
and storage.server2 will contain:
storage.server2 $ ls /dist-data
file1 file2
Of course your results may be different.
The procedure for creating a replicated GlusterFS volume is similar to the one for the distributed volume explained earlier. In fact, the only difference is the way the GlusterFS volume is created. But let's go through it again from the start:
First, we need to get both GlusterFS servers to talk to each other, which effectively creates a pool of trusted servers.
storage.server1 $ sudo gluster peer probe storage.server2
Probe successful
If this is already done you can skip this step.
In this step we need to create a replica volume.
$ sudo gluster volume create repl-vol replica 2 storage.server1:/repl-data storage.server2:/repl-data
Creation of volume repl-vol has been successful. Please start the volume to access data.
In plain terms, the above command created a replicated volume (replica) called repl-vol. The number 2 in the command is the replica count, that is, the number of copies kept of each file. It also means that when expanding this volume we always need to add bricks in multiples of the replica count (2, 4, 6, 8, etc.).
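The rule that bricks must come in multiples of the count given after the replica keyword is easy to check before calling gluster at all. The function below is a minimal sketch with a made-up name, valid_brick_count; it only does the arithmetic and does not talk to GlusterFS.

```shell
# Illustrative only: succeed if BRICKS is a positive multiple of REPLICA.
# valid_brick_count REPLICA BRICKS
valid_brick_count() {
    [ "$1" -gt 0 ] && [ "$2" -gt 0 ] && [ $(( $2 % $1 )) -eq 0 ]
}
```

For example, valid_brick_count 2 1 || echo "bricks must come in multiples of 2" prints the warning, whereas valid_brick_count 2 4 succeeds silently.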
It is time to start our new replicated volume:
$ sudo gluster volume start repl-vol
Starting volume repl-vol has been successful
Check the status:
storage.server1 $ sudo gluster volume info repl-vol
Volume Name: repl-vol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/repl-data
Brick2: storage.server2:/repl-data
The client configuration is the same as when setting up the client for the distributed volume mount.
Install client:
storage.client $ sudo apt-get install glusterfs-client
Create a mount point:
storage.client $ sudo mkdir /export-repl
Mount the repl-vol GlusterFS volume with the mount command:
storage.client $ sudo mount -t glusterfs storage.server1:repl-vol /export-repl
All should now be ready. Use the mount command to see whether you have mounted the GlusterFS volume correctly:
$ mount | grep glusterfs
The point of the replicated GlusterFS volume is that data is seamlessly mirrored across all nodes. Thus, when creating files in /export-repl/
$ touch /export-repl/file1 /export-repl/file2 /export-repl/file3 /export-repl/file4
all files will be available on both servers:
storage.server1 $ ls /repl-data/
file1 file2 file3 file4
and
storage.server2 $ ls /repl-data/
file1 file2 file3 file4
If you need to scale up your data storage to include additional bricks, the process is simple. Assuming storage.server3 and storage.server4 have already been added to the trusted pool:
$ sudo gluster volume add-brick repl-vol storage.server3:/repl-data storage.server4:/repl-data
This will add another two bricks of storage to your repl-vol. Once you add new bricks you may need to re-balance the entire volume with:
$ sudo gluster volume rebalance repl-vol fix-layout start
and sync / migrate all data with:
$ sudo gluster volume rebalance repl-vol migrate-data start
Furthermore, you can check the re-balance progress with:
$ sudo gluster volume rebalance repl-vol status
In addition to the above configuration, you can make a volume more secure by allowing only certain hosts to access it. For example, if we want only the host with the IP address 10.1.1.10 to be allowed to access the volume repl-vol, we use the following command:
$ sudo gluster volume set repl-vol auth.allow 10.1.1.10
To allow an entire subnet, use an asterisk as a wildcard (quoted, so that the shell does not try to expand it):
$ sudo gluster volume set repl-vol auth.allow '10.1.1.*'
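To see which client addresses a given pattern would cover, the wildcard can be tried out with ordinary shell pattern matching. This is only an approximation for experimenting: GlusterFS evaluates auth.allow itself and its matching rules may differ in detail, and the function name ip_allowed is made up for this sketch.

```shell
# Illustrative only: succeed if ADDRESS matches PATTERN (shell glob semantics).
# ip_allowed PATTERN ADDRESS
ip_allowed() {
    case $2 in
        $1) return 0 ;;   # unquoted on purpose so that * acts as a wildcard
        *)  return 1 ;;
    esac
}
```

For example, ip_allowed '10.1.1.*' 10.1.1.25 succeeds while ip_allowed '10.1.1.*' 10.1.2.25 fails, and the exact address 10.1.1.10 matches only itself.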
GlusterFS is powerful, GPLv3-licensed software. It can even serve as a quick software RAID 1 by combining two bricks on separate physical devices of a single host into one replicated GlusterFS volume. Of course, dedicated software RAID would be a better fit for that job, but the possibility is there. I found GlusterFS easy to use and configure.
Here I will list a few errors I encountered while playing with GlusterFS, together with their solutions:
Incorrect number of bricks supplied 1 for type REPLICATE with count 2
If you have created a volume with replica count 2, you need to add bricks in multiples of 2, so at least 2 additional bricks at a time.
Host storage.server1 not a friend
Add the GlusterFS server to the trusted pool with gluster peer probe before you attempt to include it in a volume.