[Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>

christopher barry Christopher.Barry at qlogic.com
Fri Apr 11 15:47:16 UTC 2008


On Fri, 2008-04-11 at 10:28 -0500, Wendy Cheng wrote:
> christopher barry wrote:
> > On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote:
> >   
> >> gordan at bobich.net wrote:
> >>     
> >>>       
> >>>> my setup:
> >>>> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's
> >>>> not new stuff, but corporate standards dictated the rev of rhat.
> >>>>         
> >>> [...]
> >>>       
> >>>> I'm noticing huge differences in compile times - or any home file access
> >>>> really - when doing stuff in the same home directory on the gfs on
> >>>> different nodes. For instance, the same compile on one node is ~12
> >>>> minutes - on another it's 18 minutes or more (not running concurrently).
> >>>> I'm also seeing weird random pauses in writes - like saving a file in vi,
> >>>> where what would normally take less than a second may take up to 10 seconds.
> >>>>         
> >
> > Anyway, thought I would re-connect to you all and let you know how this
> > worked out. We ended up scrapping gfs. Not because it's not a great fs,
> > but because I was using it in a way that was playing to its weak
> > points. I had a lot of time and energy invested in it, and it was hard
> > to let it go. Turns out that connecting to the NetApp filer via nfs is
> > faster for this workload. I couldn't believe it either, as my bonnie and
> > dd type tests showed gfs to be faster. But for the use case of large
> > sets of very small files, and lots of stats going on, gfs simply cannot
> > compete with NetApp's nfs implementation. GFS is an excellent fs, and it
> > has its place in the landscape - but for a development build system,
> > the NetApp is simply phenomenal.
> >   
> 
> Assuming you run both configurations (nfs-wafl vs. gfs-san) on the very 
> same netapp box (?) ...

yes.

> 
> Both configurations have their pros and cons. The wafl-nfs runs in 
> native mode, which certainly has its advantages - you've made a good 
> choice - but the latter (gfs-on-netapp san) can work well in other 
> situations. The biggest problem with your original configuration is the 
> load-balancer. Round-robin scheduling (and its variants) will not 
> work well if you have a write-intensive workload that has to fight for 
> locks between multiple GFS nodes. IIRC, there are gfs customers running 
> build-compile development environments. They normally assign groups of 
> users to different GFS nodes, say user ids starting with a-e on node 1, 
> f-j on node 2, etc.

Exactly. I was about to implement the sh (source hash) scheduler in LVS,
which I believe would have accomplished the same thing, only
automatically and in a statistically balanced way (see the sketch
below). I actually still might. I've had some developers test out the
nfs solution, and for some of them gfs is still better. I know that if
users are pinned to a node - but can still fail over in the event of
node failure - that would yield the best possible performance.
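
For reference, here's a minimal sketch of what that sh setup might look
like with ipvsadm. The VIP, the real-server addresses, and the guess
that the directors are balancing ssh logins are all hypothetical, not
from my actual config:

    # Define the virtual service with the source-hash (sh) scheduler,
    # so each client IP consistently hashes to the same real server.
    ipvsadm -A -t 10.0.0.10:22 -s sh

    # Add the GFS nodes as real servers (direct routing).
    ipvsadm -a -t 10.0.0.10:22 -r 10.0.0.101:22 -g
    ipvsadm -a -t 10.0.0.10:22 -r 10.0.0.102:22 -g
    # ...and so on for the remaining nodes.

Source hashing keeps a given client pinned to one node, so lock
ownership stays put, while a health checker like ldirectord or
keepalived can still pull a dead node out of the table.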

The main reason the IT group wants to use nfs is all of the other
benefits, such as file-level snapshots and better backup performance.
Now that they see a chink in the gfs performance armor - mainly because
I implemented the wrong load-balancing algorithm - they're circling for
the kill. I'm interested in how well nfs will scale with users versus
the gfs-san approach.
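
Rather than argue about it, I'll probably measure it with a stat-heavy
small-file loop run against both mounts. This is only a sketch - the
mount point and file count below are made up for illustration:

    #!/bin/sh
    # Hypothetical micro-benchmark: create, stat, and remove many small
    # files to mimic a build tree's metadata pattern. Run it once on the
    # gfs mount and once on the nfs mount, then compare the times.
    DIR=${1:-/mnt/buildtest}    # assumed path, pass your own
    N=10000
    mkdir -p "$DIR" && cd "$DIR" || exit 1

    echo "create:"
    time sh -c '
        i=0
        while [ "$i" -lt '"$N"' ]; do
            echo data > "f$i"
            i=$((i+1))
        done'

    echo "stat:"
    time sh -c 'for f in f*; do stat "$f" >/dev/null; done'

    rm -f ./f*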

> 
> One piece of encouraging news from this email is that gfs-netapp-san 
> runs well under bonnie. GFS1 has struggled with bonnie (large numbers 
> of smaller files within one single node) for a very long time. One of 
> the reasons is that its block allocation tends to get spread across 
> the disk whenever there is resource group contention. It is very 
> difficult for the Linux IO scheduler to merge these blocks within one 
> single server. When the workload becomes IO-bound, the locks are 
> subsequently stalled and everything starts to snowball after that. 
> Netapp SAN has one more layer of block allocation indirection within 
> its firmware, and its write speed is "phenomenal" (I'm borrowing your 
> words ;) ), mostly due to the NVRAM where it can aggressively cache 
> write data - this helps GFS relieve its small-file issue quite well.
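
For anyone reading this in the archive: a rough way to watch whether
the IO scheduler is actually managing to merge those scattered blocks
is the extended iostat output while the workload runs - the device
name here is just a placeholder:

    # rrqm/s and wrqm/s show read/write requests merged per second.
    # Low merge rates under the small-file workload would line up with
    # the scattered block allocation Wendy describes.
    iostat -x /dev/sdb 5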

Thanks for all of your input Wendy.

-C

> 
> -- Wendy



