[Linux-cluster] FS corruption|SAN + GFS2 + DRBD|clvm + snapshot

Wed Apr 27 06:26:25 UTC 2011

Hi All,
I'm looking for a way to back up GFS data to protect against GFS
corruption. A couple of weeks ago my FS was corrupted, and I'm not able
to repair it because gfs_fsck runs out of memory. This is what dmesg
shows:

Node 0 HighMem per-cpu: empty
Free pages:       78508kB (0kB HighMem)
Active:2060003 inactive:2004573 dirty:0 writeback:0 unstable:0
free:19627 slab:3437 mapped-file:22 mapped-anon:4064873 pagetables:13656
Node 0 DMA free:10828kB min:8kB low:8kB high:12kB active:0kB
inactive:0kB present:10428kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3255 16132 16132
Node 0 DMA32 free:54744kB min:3276kB low:4092kB high:4912kB
active:1658160kB inactive:1572852kB present:3333344kB
pages_scanned:5178723 all_unreclaimable? yes
lowmem_reserve[]: 0 0 12877 12877
Node 0 Normal free:12936kB min:12968kB low:16208kB high:19452kB
active:6581884kB inactive:6445152kB present:13186560kB
pages_scanned:35577289 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 0*8kB 4*16kB 4*32kB 4*64kB 1*128kB 2*256kB 1*512kB
1*1024kB 0*2048kB 2*4096kB = 10828kB
Node 0 DMA32: 16*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
1*1024kB 0*2048kB 13*4096kB = 54744kB
Node 0 Normal: 10*4kB 2*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB
0*1024kB 0*2048kB 3*4096kB = 12936kB
Node 0 HighMem: empty
55 pagecache pages
Swap cache: add 2826670, delete 2826670, find 34309/65098, race 0+0
Free swap  = 0kB
Total swap = 10223608kB
Free swap:            0kB
4390912 pages of RAM
281043 reserved pages
601 pages shared
0 pages swap cached
Out of memory: Killed process 3552, UID 0, (gfs_fsck).
gfs_fsck: page allocation failure. order:0, mode:0x201d2

Call Trace:
 [<ffffffff8000f576>] __alloc_pages+0x2ef/0x308
 [<ffffffff80012ed7>] __do_page_cache_readahead+0x96/0x179
 [<ffffffff8001386c>] filemap_nopage+0x14c/0x360
 [<ffffffff8000898c>] __handle_mm_fault+0x1fa/0xfaa
 [<ffffffff800c78fb>] generic_file_read+0xac/0xc5
 [<ffffffff80067b55>] do_page_fault+0x4cb/0x874
 [<ffffffff8005ede9>] error_exit+0x0/0x84

I must say that I'm not sure what is going on, because I have 16G of RAM
in the server and a 64-bit OS. According to the docs, gfs_fsck should
need about 8G of RAM (the GFS volume size is about 2.7TB). Thank God
I was able to mount it read-only.
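
As a rough sanity check of the dmesg figures above, the page and kB
counters can be converted to GiB. This is only a sketch: it assumes
4 KiB pages (the usual x86_64 default) and that the kernel's "kB" values
are KiB; the variable names are made up for illustration.

```python
# Convert the counters from the OOM report above into GiB.
# Assumption: 4 KiB pages; kernel "kB" values are treated as KiB.
PAGE_KIB = 4
KIB_PER_GIB = 1024 * 1024


def pages_to_gib(pages, page_kib=PAGE_KIB):
    """Convert a page count to GiB."""
    return pages * page_kib / KIB_PER_GIB


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


# Values copied from the dmesg output.
ram_pages = 4390912          # "4390912 pages of RAM"
mapped_anon_pages = 4064873  # "mapped-anon:4064873"
total_swap_kib = 10223608    # "Total swap = 10223608kB"
free_swap_kib = 0            # "Free swap  = 0kB"

ram_gib = pages_to_gib(ram_pages)
anon_gib = pages_to_gib(mapped_anon_pages)
swap_used_gib = kib_to_gib(total_swap_kib - free_swap_kib)

print(f"RAM:              {ram_gib:.2f} GiB")
print(f"Anonymous memory: {anon_gib:.2f} GiB")
print(f"Swap in use:      {swap_used_gib:.2f} GiB")
```

If those assumptions hold, nearly all of the roughly 16.75 GiB of RAM
was anonymous memory and all of the swap (about 9.75 GiB) was in use
when gfs_fsck was killed, which is well above the 8G I expected.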

Now I'm looking for a way to solve the issue and avoid a similar
situation in the future. I'm thinking about some way to "back up" my
data. The main goal is to avoid long downtime. I can even accept some
data loss (e.g. 1 day or so).

I was thinking about two scenarios:
1. SAN + GFS2 + DRBD. The issue is that I don't know whether it is
possible. We use 2 nodes with shared SAN storage. I want to replicate it
to iSCSI storage asynchronously (protocol A in DRBD). I think it would
be possible if we used only local drives instead of the SAN, but so far
I haven't found any docs covering this scenario.

2. GFS2 + CLVM + snapshot

Which of those is better and more reliable? Is the 1st one possible at
all (in some way)? Or is there some other solution to guard against
GFS corruption and a very long recovery process?

What about fsck.gfs2? How fast is it?

Thx in advance!

-- 
mr