[Linux-cluster] Unformatting a GFS cluster disk

Wendy Cheng s.wendy.cheng at gmail.com
Sun Mar 30 19:54:17 UTC 2008


christopher barry wrote:
> On Fri, 2008-03-28 at 07:42 -0700, Lombard, David N wrote:
>>>> A fun feature is that the multiple snapshots of a file have the identical
>>>> inode value
>>>>         
>
>   

I can't fault this statement, but I would prefer to think of snapshots 
as different trees with their own root inodes (see "Figure 3" of 
"Network Appliance: File System Design for an NFS File Server Appliance" 
- the PDF paper can be downloaded from 
http://en.wikipedia.org/wiki/Write_Anywhere_File_Layout; scroll down to 
the bottom of the page).

In the SAN environment, I also like to think of multiple snapshots as 
different trees that may share the same disk blocks for faster backup 
(no write) and less disk space consumption, but each with its own root 
inode. At recovery time, the (different) trees can be exported and seen 
by the Linux host as different LUN(s). The internal details can be 
quite tedious and I'm not in the position to describe them here.

> So, I'm trying to understand what to takeaway from this thread:
> * I should not use them?
> * I can use them, but having multiple snapshots introduces a risk that a
> snap-restore could wipe files completely by potentially putting a
> deleted file on top of a new file?
>   

Isn't that exactly what a "restore" is supposed to do? Knowing this 
caveat without being told, you don't look like an admin who would make 
this mistake.

> * I should use them - but not use multiples.
> * something completely different ;)
>
> Our primary goal here is to use snapshots to enable us to backup to tape
> from the snapshot over FC - and not have to pull a massive amount of
> data over GbE nfs through our NAT director from one of our cluster nodes
> to put it on tape. We have thought about a dedicated GbE backup network,
> but would rather use the 4Gb FC fabric we've got.
>
>   
Check the Netapp NOW web site (http://now.netapp.com - accessible by its 
customers) to see whether other folks have good tips about this. I just 
did a quick search and found a document titled "Linux Snapshot Records 
and LUN Resizing in a SAN Environment". It is a little bit out of date 
(dated 1/27/2003 with RHEL 2.1) but still very usable in an ext3 
environment.

In general, GFS backup from the Linux side at run time has been a pain, 
mostly because of its slowness: the process has to walk through the 
whole filesystem and read every single file, which ends up accumulating 
a non-trivial amount of cached glocks and memory. For a sizable 
filesystem (say in the TB range, like yours), past experience has shown 
that after backup(s) the filesystem latency can climb to an unacceptable 
level unless its glocks are trimmed. There is a tunable written 
specifically for this purpose though (glock_purge - introduced via 
RHEL 4.5).
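
For what it's worth, a minimal sketch of how that tunable is usually 
turned on for a GFS1 mount point (the /mnt/gfs path and the 50% value 
are made-up examples; check the tunable list on your own GFS version):

  # ask GFS to trim up to 50% of the unused glocks (RHEL 4.5+ GFS1)
  gfs_tool settune /mnt/gfs glock_purge 50

  # confirm the current value; 0 means purging stays disabled
  gfs_tool gettune /mnt/gfs | grep glock_purge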

The problem can certainly be helped by the snapshot functions embedded 
in the Netapp SAN box. However, if tape (done from the Linux host?) is 
preferred, as you described, due to space considerations, you may want 
to take a (filer) snapshot instance and do a (filer) "lun clone" of it. 
That is then followed by a gfs mount as a separate gfs filesystem (this 
is more involved than people would expect - more on this later). After 
that, the tape backup can take place without interfering with the 
original gfs filesystem on the Linux host. On the filer end, 
copy-on-write will fork disk blocks as soon as new write requests come 
in, with or without the tape backup activities.
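
Roughly, the filer side of that would look something like the following 
(the volume, lun and igroup names are made up here, and the exact 
syntax should be double-checked against your Data ONTAP version):

  # on the filer: freeze a point-in-time view of the volume
  snap create vol1 gfs_backup_snap

  # clone the gfs lun out of that snapshot (space-efficient, no copy)
  lun clone create /vol/vol1/gfs_lun_clone -b /vol/vol1/gfs_lun gfs_backup_snap

  # expose the clone to the backup host over FC
  lun map /vol/vol1/gfs_lun_clone backup_igroup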

The thinking here is to leverage the embedded Netapp copy-on-write 
feature to speed up the backup process with a reasonable disk space 
requirement. The snapshot volume and the cloned lun shouldn't take much 
disk space, and we can turn on the gfs readahead and glock_purge 
tunables with minimum interruption to the original gfs volume. The 
caveat here is GFS-mounting the cloned lun - for one, gfs itself at 
this moment doesn't allow mounting multiple devices that carry the same 
filesystem identifier (the -t value you use at mkfs time, e.g. 
"cluster-name:filesystem-name") on the same node - but that can be 
fixed (by rewriting the filesystem ID and lock protocol - I will start 
to test out the described backup script and a gfs kernel patch next 
week). Also, as with any tape backup from a Linux host, you should not 
expect an image of a gfs-mountable device (when retrieving from tape) - 
it is basically a collection of all the files residing on the gfs 
filesystem at the time the backup takes place.
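
To make that caveat concrete, once the cloned lun shows up on the 
backup node the Linux side would look roughly like this (the device 
names, the lock_nolock choice and the new table name are only 
illustrative - this is exactly the part I still need to test):

  # rewrite the superblock so the clone no longer carries the same
  # "cluster-name:filesystem-name" ID and cluster lock protocol as the
  # original (the device must not be mounted while doing this)
  gfs_tool sb /dev/mapper/gfs_clone proto lock_nolock
  gfs_tool sb /dev/mapper/gfs_clone table backup:gfs_clone

  # mount the clone locally and stream the files (not a device image)
  # to tape over FC
  mount -t gfs /dev/mapper/gfs_clone /mnt/gfs_backup
  tar cf /dev/st0 -C /mnt/gfs_backup .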

Will the above serve your need? Maybe other folks have (other) better 
ideas?

BTW, the described procedure is not well tested yet and, more 
importantly, no statement in this email represents my ex-employer's or 
my current employer's official recommendations.

-- Wendy
