Distributed Storage using GlusterFS

Jake Sanders | 02 March 2015

A common issue with cloud applications is the requirement for persistent storage. If you're spinning up instances all across your datacentre, you have no idea which physical host will end up serving your applications. This points to the need for some sort of networked file system. If you've ever used a Linux-based network, you've probably heard of NFS. But NFS is a poor choice for distributed / concurrent systems, and as that is what cloud computing is all about, we will need a better solution.

GlusterFS

GlusterFS is an "open source, distributed file system designed for massive scale." This is slightly misleading, though: Gluster isn't a filesystem in its own right. Instead, it aggregates several existing file systems so that data is distributed across multiple hosts. The underlying filesystem is usually XFS (recommended by the GlusterFS developers), but ZFS and plain ext4 are also used.

So, why use GlusterFS for your storage needs? I'll outline a few of the features and you can decide for yourself.

GlusterFS is...

  • Designed for commodity hardware: You can spread your data across redundant servers and if one goes down, it should be transparent to clients. You can add or remove disks on the fly.
  • Accessible: Gluster can export its files over NFS and CIFS (SMB) if you cannot use the native Gluster client.
  • Easy to set up: the only requirement for GlusterFS is a kernel that supports FUSE (2.6.14 or later).
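
If you want to confirm the FUSE requirement on a host before installing anything, a small check along these lines may help. It is a sketch rather than an official Gluster tool: the `version_at_least` helper is a name made up for this example, and it assumes GNU `sort` (for the `-V` flag) and a Linux `/proc` filesystem.

```shell
#!/bin/sh
# Succeeds if version $2 is the same as or newer than version $1.
version_at_least() {
    required="$1"
    actual="$2"
    # sort -V orders version strings; the required version must come first (or be equal).
    lowest=$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Strip any distribution suffix, e.g. "3.13.0-44-generic" becomes "3.13.0".
kernel=$(uname -r | cut -d- -f1)

if version_at_least 2.6.14 "$kernel"; then
    echo "kernel $kernel is new enough for FUSE"
else
    echo "kernel $kernel predates FUSE support"
fi

# FUSE is listed here once it is built in or its module is loaded.
grep -w fuse /proc/filesystems || echo "fuse not currently registered (try: sudo modprobe fuse)"
```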

Let's Have a Go

Note: before starting, ensure each node's hostname resolves correctly to the address of the host, and that times are synchronised using NTP.
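
A rough way to check both prerequisites from a shell is sketched below. The `resolves` helper is a name invented for this example; pass your own node names as arguments (with none given, the script checks the local hostname). It assumes `getent` is available, and it only queries NTP if `ntpq` happens to be installed.

```shell
#!/bin/sh
# Succeeds if the given name resolves to an address via the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Default to this machine's own hostname when no node names are given.
[ "$#" -gt 0 ] || set -- "$(hostname)"

for node in "$@"; do
    if resolves "$node"; then
        echo "$node resolves to $(getent hosts "$node" | awk '{print $1; exit}')"
    else
        echo "$node does NOT resolve; fix DNS or /etc/hosts first"
    fi
done

# List the NTP peers this host is using, if the ntp tools are installed.
if command -v ntpq >/dev/null 2>&1; then
    ntpq -p || echo "ntpq could not reach a local NTP daemon"
else
    echo "ntpq not found; verify time synchronisation some other way"
fi
```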

For a quick replicated GlusterFS setup on a couple of Ubuntu hosts, run the following commands on each host:


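# Create a 15 GB file to back the brick (a real disk is preferable outside of testing)
sudo fallocate -l 15G /brick
# Install GlusterFS 3.5 and the XFS tools
sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.5 && sudo apt-get update
sudo apt-get -y install glusterfs-server xfsprogs
# Format the backing file as XFS; 512-byte inodes leave room for Gluster's extended attributes
sudo mkfs.xfs -i size=512 /brick
# Mount the brick and create the directory the volume will use
sudo mkdir -p /export/brick && sudo mount /brick /export/brick && sudo mkdir -p /export/brick/volume
# Remount the brick at boot
sudo sh -c "echo \"/brick /export/brick xfs defaults 0 0\" >> /etc/fstab"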

Then, on one of the hosts (host1 in the commands below), run the following:


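# Add the second host to the trusted pool
sudo gluster peer probe [address.of.host2]
# Create a volume that keeps a copy of every file on both bricks
sudo gluster volume create gv0 replica 2 [address.of.host1]:/export/brick/volume [address.of.host2]:/export/brick/volume
sudo gluster volume info
# Restrict access to the listed clients, then bring the volume online
sudo gluster volume set gv0 auth.allow [client1.address,client2.address,...,clientn.address]
sudo gluster volume start gv0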
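
Before moving on to a client, it is worth confirming that the pool and volume look healthy. The sketch below wraps three standard `gluster` subcommands in a helper; `check_volume` is a name invented for this example, not part of Gluster.

```shell
#!/bin/sh
# Show peer and volume state. Returns non-zero if the gluster CLI is missing
# or any of the queries fails.
check_volume() {
    vol="$1"
    if ! command -v gluster >/dev/null 2>&1; then
        echo "gluster CLI not found on this host" >&2
        return 1
    fi
    # Peers in the trusted pool and their connection state.
    sudo gluster peer status || return 1
    # Volume type, options and brick list.
    sudo gluster volume info "$vol" || return 1
    # Which brick processes are currently online.
    sudo gluster volume status "$vol"
}

check_volume gv0 || echo "volume check failed"
```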

Then on a client:


# Install the native (FUSE) client
sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.5 && sudo apt-get update
sudo apt-get -y install glusterfs-client
# Mount the volume and make the mount point world-writable
sudo mkdir -p /mnt/volume && sudo mount -t glusterfs [address.of.host]:/gv0 /mnt/volume && sudo chmod a+w /mnt/volume
# Write a test file to the volume
echo "Hello from $HOSTNAME" > /mnt/volume/test.txt
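
To check that the volume really is shared, read the file back from a second client, or list `/export/brick/volume` on each server, where a replica 2 volume should show the same `test.txt`. The mount above will not survive a reboot, so you may also want an fstab entry. The sketch below only prints a candidate line for review instead of editing `/etc/fstab`; `fstab_line` and the `host1.example.com` address are placeholders invented for this example.

```shell
#!/bin/sh
# Print an fstab entry for a GlusterFS mount.
# Arguments: server address, volume name, mount point.
fstab_line() {
    # _netdev marks the mount as needing the network, so it is deferred until networking is up.
    printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

# Placeholder server address; substitute one of your Gluster hosts.
fstab_line host1.example.com gv0 /mnt/volume
```

Once you are happy with the printed line, append it to `/etc/fstab` on the client and test it with `sudo mount -a`.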

Architecture overview

A quick overview of GlusterFS concepts:

  • Trusted pool - all the hosts in a given cluster.
  • Node - any server in the trusted pool, interchangeable with host/server.
  • Brick - any filesystem, preferably a physical disk formatted with XFS.
  • Export - the mount path of the brick(s) on a server.
  • Subvolume - a brick after being processed by at least one translator.
  • Volume - the final share after passing through all the translators.
  • Translator - takes a subvolume or brick, processes it (for example, by distributing or replicating data), and offers a subvolume connection.

Gluster takes your bricks and passes them through several translators before exposing them as volumes. In a multi-host setup, the "cluster" translator is responsible for distribution and replication.
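
As one illustration of the cluster translator's two roles, a volume created with `replica 2` and four bricks is both replicated and distributed: Gluster pairs consecutive bricks into replica sets and spreads files across the pairs. The sketch below only prints such a command for review; the host names, the volume name `gv1`, and the `brick_list` helper are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical host names; substitute your own. Order matters:
# host1+host2 form one replica pair, host3+host4 the other.
HOSTS="host1 host2 host3 host4"
BRICK_PATH="/export/brick/volume"

# Build a space-separated "host:path" brick list.
# Arguments: brick path, then one or more host names.
brick_list() {
    path="$1"
    shift
    list=""
    for host in "$@"; do
        list="${list:+$list }$host:$path"
    done
    printf '%s\n' "$list"
}

# Print the command rather than running it, so it can be reviewed first.
# $HOSTS is deliberately unquoted so it splits into one argument per host.
echo "sudo gluster volume create gv1 replica 2 $(brick_list "$BRICK_PATH" $HOSTS)"
```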
