[Linux-cluster] GFS2 loses data under kernel 2.6.24...

Glen Dosey doseyg at r-networks.net
Fri Feb 8 05:01:57 UTC 2008


I experienced this today at work on a RHEL5 system and have since reproduced it
at home on Fedora 8. Perhaps I am doing something foolish...

I have a fully patched RHEL5 x86_64 system that works fine with the Red
Hat-supplied cluster packages, except that NFS server performance is abysmal
(~640Mb/s over NFS). After pulling my hair out trying to fix NFS, I decided to
just grab the latest kernel, which fixed the problem (~980Mb/s over NFS). But
it introduced another, much more serious problem, which I've reproduced
on my FC8 x86_64 system at home.

I already have all the cman/clvmd/openais/gfs[2]-utils packages
installed through the package manager. I downloaded kernel 2.6.24 from
kernel.org, did a straight `make -j4 rpm`, and installed the
resulting RPM on both systems. Both systems worked fine with the stock
RHEL/Fedora kernels, but here's what happens under 2.6.24:

[root@eclipse test]# dd if=/dev/zero of=test3.dd bs=512M count=1
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 7.95285 s, 67.5 MB/s
[root@eclipse test]# ll
total 2101312
-rw-r--r-- 1 root root          0 2008-02-07 23:25 test2.dd
-rw-r--r-- 1 root root  536870912 2008-02-07 23:42 test3.dd
-rw-r--r-- 1 root root 1073741824 2008-02-07 22:54 test.dd
[root@eclipse test]# cd ..
[root@eclipse mnt]# umount /mnt/test/
[root@eclipse mnt]# mount /mnt/test/
[root@eclipse mnt]# mount | grep test
/dev/mapper/disk00-test on /mnt/test type gfs2 (rw,hostdata=jid=0:id=524289:first=1)
[root@eclipse mnt]# cd /mnt/test/
[root@eclipse test]# ll
total 2101312
-rw-r--r-- 1 root root          0 2008-02-07 23:25 test2.dd
-rw-r--r-- 1 root root          0 2008-02-07 23:42 test3.dd
-rw-r--r-- 1 root root 1073741824 2008-02-07 22:54 test.dd

Files that contain data simply drop to zero size after an umount and remount. I've
tried a variety of file sizes, and also tried files containing real data
(not all zeros). This worked under the RHEL kernels, so is there
something I'm doing wrong?
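For anyone trying to reproduce this with real data rather than zeros, here is a minimal sketch in Python. It assumes the GFS2 filesystem is mounted at /mnt/test as above; the helper names write_test_file() and check_test_file() are hypothetical and not part of any cluster tool. It writes random (not all-zero) data, fsyncs it, records the size and SHA-256 digest, and rechecks both after the remount:

```python
import hashlib
import os


def write_test_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data to path, fsync it, return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            block = os.urandom(n)  # random bytes, so the file is not all zeros
            f.write(block)
            h.update(block)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to disk before returning
    return h.hexdigest()


def check_test_file(path, expected_size, expected_digest, chunk=1 << 20):
    """Return True only if path still has the expected size and SHA-256 digest."""
    if os.path.getsize(path) != expected_size:
        return False  # e.g. the file was truncated to zero length
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == expected_digest
```

Call write_test_file("/mnt/test/test4.dd", 512 * 1024 * 1024) before `umount /mnt/test/`, keep the returned digest, then call check_test_file() with the same size and digest after `mount /mnt/test/`. A result of False means the size or contents changed across the remount.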

Both systems are running cman and form a quorate two-node cluster (where
the second node doesn't exist). At work it's a 1TB shared filesystem, but
here at home it's just a local disk, so nothing else has any
access to it.

If someone could point out what I'm doing wrong, I'd appreciate it,
or just let me know if this won't work for whatever reason. I haven't even
started on getting the GFS1 modules to build against this kernel.

Thanks,
Glen
