[Linux-cluster] GFS block size

Jeff Sturm jeff.sturm at eprize.com
Wed Jan 5 13:47:44 UTC 2011


Adam,

Thank you for the background on stuffed inodes and resource groups, it is much appreciated.

For this specific application most files are under 1k.  A few are larger (20-30k), but they are rare, so I think we can accommodate a small performance hit for those.  Overall the file system may contain 500,000 or more of these small files at a time.

The improvement we measured is a bit more than "modest".  Our benchmark finishes about 30% faster with the 1k block size compared to 4k.  That's a nice win for a simple change.  Disk bandwidth to/from shared storage might be a factor--we have 12 nodes accessing this storage, so the aggregate bandwidth is considerable.
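
For reference, the 1k-block test filesystem was created along these lines (the cluster name, journal count and device path below are illustrative, not our real ones):

    gfs_mkfs -p lock_dlm -t mycluster:testfs -j 12 -b 1024 /dev/vg_shared/lv_test

Adam's suggestion of a smaller resource group size would just be an extra "-r <megabytes>" on that same command line; we haven't tried that yet.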

It has been suggested to me that NFS would yield further performance gains, but I have not attempted it.  RHCS has so far met our expectations of high availability.  Given that NFS is not a cluster file system, I'm nervous that such a setup could introduce new points of failure.  (I realize that NFS could be coupled with e.g. DRBD+pacemaker for failover purposes.)

We implemented the typical GFS1 tuneables long ago (noatime, noquota, statfs_fast).  Disabling SELinux also helped.  Checking block size was truly an afterthought, and we had not given any consideration to resource group size either.
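
In case the details are useful to anyone else, the setup looks roughly like this (the mount point and device names are made up for the example):

    # /etc/fstab entry for the GFS1 filesystem
    /dev/vg_shared/lv_data  /mnt/gfsdata  gfs  defaults,noatime,noquota  0 0

    # statfs_fast is a tuneable rather than a mount option and doesn't persist
    # across remounts, so we apply it after each mount (e.g. from an init script):
    gfs_tool settune /mnt/gfsdata statfs_fast 1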

I've learned a ton about disk storage by implementing shared storage and clustered filesystems over the past 3 years.  Block devices are a bit "magical" in general, and widely misunderstood by system administrators and software engineers.  (For example, I've heard some fantastic performance claims on ext3 file systems that turned out to demonstrate how effective Linux is at hiding disk latency.)  Thanks again to you and this list for providing continued insight.

-Jeff

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Adam Drew
> Sent: Tuesday, January 04, 2011 2:18 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] GFS block size
> 
> If your average file size is less than 1k then using a block size of 1k may be a good
> option. If you can fit your data in a single block you get the minor performance boost
> of using a stuffed inode, so you never have to walk from your inode to a separate data
> block. The performance boost should be small but could add up to larger gains over
> time with lots of transactions. If your average data payload is less than the default
> block size, however, you'll end up wasting the difference in every block. So, from a
> filesystem perspective, using a 1k block size to store mostly sub-1k files may be a
> good idea.
> 
> You may additionally want to experiment with reducing your resource group size.
> Blocks are organized into resource groups. If you are using 1k blocks and sub-1k files
> then you'll end up with tons of stuffed inodes per resource group. Some operations in
> GFS (such as deletes) require locking the resource group metadata, so you may start
> to experience performance bottlenecks depending on usage patterns and disk layout.
> 
> All in all, I'd be skeptical of claims of large performance gains over time from
> changing rg size and block size, but modest gains may be had. Still, some access
> patterns and filesystem layouts may see greater gains from such tweaking. However, I
> would expect the most significant gains (in GFS1 at least) to come from mount options
> and tuneables.
> 
> Regards,
> Adam Drew
> 
> ----- Original Message -----
> From: "juncheol park" <nukejun at gmail.com>
> To: "linux clustering" <linux-cluster at redhat.com>
> Sent: Tuesday, January 4, 2011 1:42:45 PM
> Subject: Re: [Linux-cluster] GFS block size
> 
> I also experimented with a 1k block size on GFS1. Although a smaller block size can
> improve disk usage, it is typically recommended to use a block size equal to the page
> size, which is 4k on Linux.
> 
> I don't remember all the details of the results. However, for large files, the overall
> performance of read/write operations with a 1k block size was much worse than with a
> 4k block size. That is to be expected, though. If you don't mind the performance
> degradation for large files, it would be fine for you to use 1k.
> 
> Just my two cents,
> 
> -Jun
> 
> 
> On Fri, Dec 17, 2010 at 3:53 PM, Jeff Sturm <jeff.sturm at eprize.com> wrote:
> > One of our GFS filesystems tends to have a large number of very small
> > files, on average about 1000 bytes each.
> >
> >
> >
> > I realized this week we'd created our filesystems with default
> > options.  As an experiment on a test system, I've recreated a GFS
> > filesystem with "-b 1024" to reduce overall disk usage and disk bandwidth.
> >
> >
> >
> > Initially, tests look very good—single file creates are less than one
> > millisecond on average (down from about 5ms each).  Before I go very
> > far with this, I wanted to ask:  Has anyone else experimented with the
> > block size option, and are there any tricks or gotchas to report?
> >
> >
> >
> > (This is with CentOS 5.5, GFS 1.)
> >
> >
> >
> > -Jeff
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> 



