[Linux-cluster] sparse-file clone breaks on GFS2

Pierre-Philipp Braun pbraun at nethence.com
Sun Sep 20 17:19:08 UTC 2020

 >> cd573cfaace07e7949bc0c46028904ff  /root/dummy-on-ext4.ext4
 >> cd573cfaace07e7949bc0c46028904ff  /data2/dummy-clone-from-ext4.ext4
 >> and... WOW!  Cloning it yet again, it finally remains intact!
 >> cd573cfaace07e7949bc0c46028904ff  dummy-clone-from-ext4-and-clone.ext4
 >> cd573cfaace07e7949bc0c46028904ff  dummy-clone-from-ext4.ext4
 >> It's strange.  Anyone?
 > Absolutely. Really strange.

It truly happened; however, when I try this again now, I cannot
reproduce it.  All I see now is corruption whenever I copy a sparse
file-system image from GFS2, whether it was created on EXT4 or not.

dd if=/dev/zero of=/root/dummy.ext4 bs=1G count=0 seek=1
mkfs.ext4 /root/dummy.ext4
cp --sparse=always /root/dummy.ext4 /data2/dummy-clone-from-ext4.ext4
cp --sparse=always /data2/dummy-clone-from-ext4.ext4 
cp --sparse=always /data2/dummy-clone-from-ext4.ext4 
md5sum /root/dummy.ext4 /data2/dummy-clone-from-ext4.ext4 \
    /data2/dummy-clone-from-ext4-clone.ext4 /root/dummy-clone-from-gfs2.ext4


aefedffac2f2cbc0d8fe15155703e7a0  /root/dummy.ext4
aefedffac2f2cbc0d8fe15155703e7a0  /data2/dummy-clone-from-ext4.ext4
a3da5ced2af823e06a352124b9c800c7  /data2/dummy-clone-from-ext4-clone.ext4
a3da5ced2af823e06a352124b9c800c7  /root/dummy-clone-from-gfs2.ext4

And it is consistent, as this second run shows:

be123782b303fc391f93478063d21b51  /root/dummy.ext4
be123782b303fc391f93478063d21b51  /data2/dummy-clone-from-ext4.ext4
22fa85945a94001089740f024a4c3f1e  /data2/dummy-clone-from-ext4-clone.ext4
22fa85945a94001089740f024a4c3f1e  /root/dummy-clone-from-gfs2.ext4
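To see where the clones actually diverge from the original, rather than
only that their checksums differ, something like the following could
help.  It is a minimal sketch: `first_diffs` is a hypothetical helper
name, and it relies on `cmp -l`, which lists each differing byte with
its 1-based decimal offset (note that hexdump -C prints 0-based
hexadecimal offsets).

```shell
#!/bin/sh
# Hypothetical helper: print the first N (default 10) byte offsets
# at which two files differ, as 1-based decimal numbers.
first_diffs() {
    a=$1
    b=$2
    n=${3:-10}
    # cmp -l prints "offset octal_byte_a octal_byte_b" for every
    # differing byte; keep only the offset column.
    cmp -l -- "$a" "$b" 2>/dev/null | awk '{ print $1 }' | head -n "$n"
}

# Example with the files from the runs above:
# first_diffs /root/dummy.ext4 /data2/dummy-clone-from-ext4-clone.ext4
```

Comparing those offsets with the hexdump -C output of both files should
show whether whole blocks changed or only scattered bytes.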

The same happens with an XFS sparse file:

dd if=/dev/zero of=/root/dummy.xfs bs=1G count=0 seek=1
mkfs.xfs /root/dummy.xfs
cp --sparse=always /root/dummy.xfs /data2/dummy-clone-from-ext4.xfs
cp --sparse=always /data2/dummy-clone-from-ext4.xfs 
cp --sparse=always /data2/dummy-clone-from-ext4.xfs 
md5sum /root/dummy.xfs /data2/dummy-clone-from-ext4.xfs \
    /data2/dummy-clone-from-ext4-clone.xfs /root/dummy-clone-from-gfs2.xfs

324c660d154cad3dd35f69a8531fc290  /root/dummy.xfs
324c660d154cad3dd35f69a8531fc290  /data2/dummy-clone-from-ext4.xfs
2e58d6e15a01ef3cf31b89e0ff219c7a  /data2/dummy-clone-from-ext4-clone.xfs
2e58d6e15a01ef3cf31b89e0ff219c7a  /root/dummy-clone-from-gfs2.xfs

Thankfully, this does not happen with a sparse file that has no file
system on it, i.e. one containing _only_ zeroes:

dd if=/dev/zero of=/data2/dummy bs=1G count=0 seek=1
cp --sparse=always /data2/dummy /data2/dummy-clone
md5sum /data2/dummy /data2/dummy-clone

cd573cfaace07e7949bc0c46028904ff  /data2/dummy
cd573cfaace07e7949bc0c46028904ff  /data2/dummy-clone

hexdump -C /data2/dummy
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
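hexdump only shows what the file reads as; it does not tell whether a
clone is still sparse.  A quick check, assuming GNU coreutils stat
(`%s` apparent size in bytes, `%b` allocated blocks, `%B` bytes per such
block), is sketched below; `sparse_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: for each file, print its apparent size and the
# number of bytes actually allocated on disk (GNU stat assumed).
sparse_usage() {
    for f in "$@"; do
        stat -c '%s %b %B' -- "$f" | {
            read -r size blocks bsize
            echo "$f: apparent=$size allocated=$((blocks * bsize))"
        }
    done
}

# Example: sparse_usage /data2/dummy /data2/dummy-clone
```

An allocated value far below the apparent size means the file is still
sparse.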

I will try other tests with self-forged data plus some sparse regions
and see how it goes.  I doubt it is related to having a file system on
GFS2 as such.
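A self-forged test of that kind could look roughly like this.  It is
only a sketch, assuming GNU coreutils (`truncate`, `dd iflag=fullblock`,
`cp --sparse=always`); the function name `sparse_roundtrip`, the 16 MiB
size and the 0xff blocks at 0, 1, 5 and 15 MiB are arbitrary choices
for illustration.

```shell
#!/bin/sh
# Sketch only (assumes GNU coreutils): forge a sparse file with known
# data blocks separated by holes, clone it with cp --sparse=always,
# and compare the clone byte for byte.
# Usage: sparse_roundtrip DIR   (DIR would be the GFS2 mount, e.g. /data2)
sparse_roundtrip() {
    dir=$1
    src="$dir/forged-src"
    dst="$dir/forged-clone"
    rm -f -- "$src" "$dst"
    # 16 MiB apparent size, nothing allocated yet: the whole file is a hole
    truncate -s 16M -- "$src"
    # Write one 4 KiB block of 0xff bytes at 0, 1, 5 and 15 MiB;
    # seek counts 4096-byte blocks, so 1 MiB = 256 blocks.
    # conv=notrunc leaves the rest of the file (and its holes) untouched.
    for mib in 0 1 5 15; do
        head -c 4096 /dev/zero | tr '\0' '\377' |
            dd of="$src" bs=4096 count=1 iflag=fullblock \
               seek=$((mib * 256)) conv=notrunc 2>/dev/null
    done
    cp --sparse=always -- "$src" "$dst"
    if cmp -s -- "$src" "$dst"; then
        echo "OK: $dst matches $src"
    else
        echo "MISMATCH: $dst differs from $src"
        return 1
    fi
}
```

Running it as `sparse_roundtrip /data2` on the GFS2 mount and then on a
local EXT4 directory such as /root would show whether plain data with
holes is enough to trigger the problem.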

 >> In case it matters, I am using vanilla Linux 4.18.20 and not the RHEL
 >> nor CentOS with patches.
 > What distribution are you using (Slackware)? Any additional data about
 > your filesystem (ie: num journals/nodes)? I would like to try to
 > replicate the issue if I can find some spare time...
Slackware-current, with a kernel version hopefully close enough to
RHEL8's, plus gfs2-utils and dlm *1, but I suppose it can be reproduced
on RHEL8.  Actually, I would prefer that you reproduce the issue on a
fairly classic setup.  It would make my case easier to defend ;-)

The GFS2 file system was created with
`mkfs.gfs2 -j2 -t pro5s:data2 /dev/drbd2` and runs on two DRBD8
dual-primary nodes.  I still have no idea which storage layer is causing
the trouble here, so I also tested with a single mount on node2, and
also with node1 as secondary.  No success there (the corruption still
occurs), which is actually good news for me, as I wouldn't like my
infrastructure to be f&^%ed up at the block-device level.

# mkfs.gfs2 -V
mkfs.gfs2 master (built Aug 16 2020 21:36:57)
Copyright (C) Red Hat, Inc.  2004-2010  All rights reserved.

Oops, I should not be using the development branch, so I switched to
the release.  No big surprise: copying an EXT4 sparse file from GFS2
still corrupts it here.

# mkfs.gfs2 -V
mkfs.gfs2 3.3.0 (built Sep 20 2020 19:28:37)
Copyright (C) Red Hat, Inc.  2004-2020  All rights reserved.

# dlm_controld -V
dlm_controld 4.0.9 (built Aug 17 2020 07:02:31)
Copyright Red Hat, Inc. 2004-2013

# tunegfs2 -l /dev/drbd2
tunegfs2 (Aug 16 2020 21:36:58)
File system volume name: pro5s:data2
File system UUID: 6a95be71-e865-4706-b7c8-0c1bb0c3e232
File system magic number: 0x1161970
Block size: 4096
Block shift: 12
Root inode: 66241
Master inode: 32854
Lock protocol: lock_dlm
Lock table: pro5s:data2

I am looking forward to hearing from you, or from anyone with a GFS2
mount in place who is willing to reproduce the issue.  Here are sample
command lines to reproduce it; meanwhile, I will try with something
other than some

dd if=/dev/zero of=dummy.xfs bs=1G count=0 seek=1
mkfs.xfs dummy.xfs
cp --sparse=always dummy.xfs dummy-clone.xfs
md5sum dummy.xfs dummy-clone.xfs
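One control that might narrow things down is to copy the same image
once with `--sparse=always` and once with `--sparse=never` and compare
both copies to the source.  If only the sparse copy differs, that would
point at how holes are detected and skipped during the copy rather than
at the data itself.  A hedged sketch, with `compare_sparse_modes` as a
hypothetical helper name and the `.sparse-always` / `.sparse-never`
suffixes chosen arbitrarily:

```shell
#!/bin/sh
# Hypothetical control experiment: copy SRC into DIR once per --sparse
# mode and report whether each copy is byte-identical to SRC.
# Returns 1 if any copy differs.
compare_sparse_modes() {
    src=$1
    dir=$2
    base=$(basename -- "$src")
    status=0
    for mode in always never; do
        dst="$dir/$base.sparse-$mode"
        cp --sparse="$mode" -- "$src" "$dst"
        if cmp -s -- "$src" "$dst"; then
            echo "$mode: identical"
        else
            echo "$mode: DIFFERS"
            status=1
        fi
    done
    return "$status"
}

# Example, run from a directory on the GFS2 mount:
# dd if=/dev/zero of=dummy.xfs bs=1G count=0 seek=1
# mkfs.xfs dummy.xfs
# compare_sparse_modes dummy.xfs .
```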

*1 https://pagure.io/gfs2-utils https://pagure.io/dlm

