[Linux-cluster] GFS2 as virtual machine disk store

Steven Whitehouse swhiteho at redhat.com
Tue Aug 29 09:45:44 UTC 2017


Hi,


On 26/08/17 07:11, Gionatan Danti wrote:
> Hi list,
> I am evaluating how to refresh my "standard" cluster configuration and 
> GFS2 clearly is on the table ;)
>
> GOAL: to have a 2-node HA cluster running DRBD (active/active), GFS2 
> (to store the disk images) and KVM (as hypervisor). The cluster has to 
> support live migration, but manual failover is sufficient (ie: if 
> something goes wrong, it is OK to require a sysadmin to take action to 
> restore services).
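
As an aside, a dual-primary DRBD resource for that kind of setup is 
usually declared roughly as follows (a minimal sketch assuming DRBD 8.4 
syntax; the resource, host and device names are made up, and a 
dual-primary setup really does want fencing configured as well):

    resource r0 {
        net {
            protocol C;                # synchronous replication
            allow-two-primaries yes;   # both nodes Primary (active/active)
        }
        on node1 {
            device  /dev/drbd0;
            disk    /dev/vg0/lv_gfs2;
            address 192.168.100.1:7789;
        }
        on node2 {
            device  /dev/drbd0;
            disk    /dev/vg0/lv_gfs2;
            address 192.168.100.2:7789;
        }
    }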
>
> The idea is to, by default, always run the VMs on the first host 
> (using virtlockd or sanlock to prevent the same virtual machine from 
> being started on the second host). Should anything bad happen, or 
> should the first host be in maintenance mode, the VMs can be 
> migrated/restarted on the second host.
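
The locking part is normally just libvirt configuration via the stock 
virtlockd plugin; a sketch (the lockspace path here is an assumption, 
and it must live on storage both hosts share, e.g. the GFS2 mount):

    # /etc/libvirt/qemu.conf
    lock_manager = "lockd"        # use virtlockd for disk image locking

    # /etc/libvirt/qemu-lockd.conf
    # assumed path on the shared GFS2 mount, visible to both hosts
    file_lockspace_dir = "/var/lib/libvirt/images/lockd"

With that in place, starting the same guest on the second host fails 
with a lock error instead of silently corrupting the image.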
>
> I have a few questions:
>
> - other people have told me GFS2 is not well suited for such a task 
> and that I am going to see much lower performance than on a local 
> filesystem (replicated via other means). This advice stems from the 
> requirement to maintain not only proper write ordering but also 
> strict cache coherency between the hosts. However, from what I 
> understand reading the GFS2 documentation, when operating mostly on a 
> single host (ie: not running anything on the second node), the 
> overhead should be negligible. Am I right, or horribly wrong?
>
Yes, there is some additional overhead due to the clustering. However, 
you can usually organise things so that the overhead is minimised by 
being careful about the workload, as you mentioned above.
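
One easy win is mounting with noatime, so that plain reads do not turn 
into metadata writes and extra glock traffic (the device and mount 
point below are just examples):

    mount -t gfs2 -o noatime,nodiratime /dev/drbd0 /var/lib/libvirt/images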

> - reading the Red Hat documentation here[1], I see that it is 
> strongly advised to set cache=none for any virtual disk. Is this 
> required for proper operation, or is it "only" a performance 
> optimisation to avoid what is stated above (ie: two hosts sharing the 
> same data in the pagecache, thus requiring coherency traffic)? As I 
> really like the improved performance with cache=writeback (which, by 
> virtue of barrier passing, comes without data-loss concerns), do you 
> think it is safe to use writeback in production?
No. You want to use the default data=ordered for the most part. It is 
less a question of data loss and more a question of whether, in case 
of a power outage, it is possible for a file being written to end up 
with incorrect content. That can happen in the data=writeback case 
(where block allocation has succeeded, but the new data has not yet 
been written to disk) but not in the data=ordered case.
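
To make the two knobs concrete: data=ordered is the GFS2 journaling 
mode (and the default, so you normally do not need to spell it out), 
while cache=none is the qemu cache mode that the documentation you 
cite recommends per guest disk. A sketch, with assumed device and file 
names:

    # GFS2 side; data=ordered is already the default
    mount -t gfs2 -o noatime,data=ordered /dev/drbd0 /var/lib/libvirt/images

    <!-- libvirt guest disk, bypassing the host pagecache -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/guest1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>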

>
> - I plan to have a volume of about 8 or 16 TB. I understand that GFS2 
> is tested with much bigger volumes (ie: 100 TB), but I would ask: 
> would you trust a multi-TB volume on GFS2? What about fsck? Does it 
> work well/reliably?
Yes, it works well. The size limit was based on fsck time rather than 
any reliability issues. It will work reliably at much larger sizes, 
but fsck will take longer and use more memory.
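
For reference, fsck.gfs2 must be run with the filesystem unmounted on 
every node; the invocation itself is straightforward (the device name 
is assumed):

    fsck.gfs2 -n /dev/drbd0    # read-only check; answers "no" to everything
    fsck.gfs2 -y /dev/drbd0    # repair run; answers "yes" to all prompts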

I hope that answers a few more of your questions,

Steve.

>
> - I plan to put GFS2 on top of LVM (for backup snapshots) and 
> replicate the volume with DRBD. Do you see any drawbacks in this 
> approach?
>
> - finally, how do you feel about running your production virtual 
> machines on DRBD + GFS2?
>
> Thank you all.
>
> [1] 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Global_File_System_2/index.html#s1-VMsGFS2-gfs2
>
