[Linux-cluster] GFS2 'No space left on device' while df reports 1gb is free

Bob Peterson rpeterso at redhat.com
Thu Sep 10 13:37:25 UTC 2015


----- Original Message -----
> Is there a paper which describes that "worst case"? I did not know about
> those allocation subtleties.

Unfortunately, no.

(snip) 
> Heh, another person blindly copied the parameters I usually use for very
> small filesystems, and we used those for tests in virtualized
> environments with limited space. As part of that testing we tried to grow
> GFS2 and found that with fairly large resource groups on small enough block
> devices we lose a significant amount of space, because the remaining space
> is insufficient to fit one more rg. For example, with two 8MB journals,
> growing a ~(256 + 2*8)MB filesystem to ~300MB failed with 128MB resource
> groups but succeeded with 32MB ones.
> 
> Well, that just means that one size does not fit all.

Indeed.
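
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes, as your failed grow suggests, that gfs2_grow can only add whole
resource groups, and it ignores per-rgrp metadata overhead, so treat the
numbers as approximations rather than what the tools will actually report
(the sizes below are made up for illustration, not your exact test case):

MB = 1024 * 1024

def grow_estimate(added_bytes, rgrp_bytes):
    # How many whole rgrps fit in the newly added space, and what is left over.
    new_rgrps = added_bytes // rgrp_bytes
    stranded = added_bytes - new_rgrps * rgrp_bytes
    return new_rgrps, stranded

added = 100 * MB    # pretend the block device grew by 100MB
for rgrp in (128 * MB, 32 * MB):
    n, stranded = grow_estimate(added, rgrp)
    print("rgrp size %3dMB: %d new rgrp(s), %3dMB left unusable"
          % (rgrp // MB, n, stranded // MB))

# rgrp size 128MB: 0 new rgrp(s), 100MB left unusable  (nothing to grow into)
# rgrp size  32MB: 3 new rgrp(s),   4MB left unusable

The bigger the resource group, the bigger the chunk of space you can
strand at the end of the device.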

> Sure.
> Is it safe enough to just drop that '-r' parameter from the mkfs command
> line for production filesystems?

Yes, as a rule using the default for -r is best. You can also run into
a performance problem if your resource group size is too big. I discovered
this and documented it in Bugzilla bug #1154782 (which may be private):
https://bugzilla.redhat.com/show_bug.cgi?id=1154782

Basically, if the rgrp size is the maximum (2GB), you will have 33 bitmap
blocks per rgrp. That translates into a LOT of page cache lookups for
every bitmap operation, which kills performance. I've written a patch that
fixes this, but it's not available yet, and when it is, it will only be for
RHEL 7.2 and above. So for now you have to find a happy medium between too
many rgrps, which costs usable space, and rgrps that are too big, which
costs performance. Using the default for -r is generally best.
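
If you want to see where that 33 comes from, here is a quick sketch. It
assumes 4K filesystem blocks, GFS2's two bitmap bits per block, a 128-byte
rgrp header in the first bitmap block and a 24-byte metadata header in the
rest (see gfs2_ondisk.h for the exact structure sizes); it also ignores the
bitmap blocks themselves, so it is only an approximation:

BLOCK_SIZE = 4096
RGRP_HEADER = 128        # struct gfs2_rgrp in the first bitmap block
META_HEADER = 24         # struct gfs2_meta_header in the others
BLOCKS_PER_BYTE = 4      # 2 bitmap bits per block

def bitmap_blocks(rgrp_mb):
    # Count the bitmap blocks needed to describe one resource group.
    blocks = rgrp_mb * 1024 * 1024 // BLOCK_SIZE
    covered = (BLOCK_SIZE - RGRP_HEADER) * BLOCKS_PER_BYTE
    nbitmaps = 1
    while covered < blocks:
        covered += (BLOCK_SIZE - META_HEADER) * BLOCKS_PER_BYTE
        nbitmaps += 1
    return nbitmaps

for size in (32, 256, 512, 2048):
    print("-r %4d -> about %2d bitmap block(s) per rgrp"
          % (size, bitmap_blocks(size)))

# -r   32 -> about  1 bitmap block(s) per rgrp
# -r  256 -> about  5 bitmap block(s) per rgrp
# -r  512 -> about  9 bitmap block(s) per rgrp
# -r 2048 -> about 33 bitmap block(s) per rgrp

Every block allocation has to search those bitmaps, so the bigger the rgrp,
the more pages each search can touch.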

> I suspect there will be attempts to migrate to much bigger block
> devices (e.g. 1TB -> 20TB), but I'd rather not concentrate on them now...
> 
> >
> > If you do mkfs.gfs2 and specify -r512, you will be able to use much more
> > of the file system, and it won't get into this problem until much later.
> 
> What would be a rule of thumb for predicting such errors?
> I mean, at which point (in MB or %) should we start to worry that we may
> get such an error, depending on the rg size? Is there a point below which
> we definitely won't get them?

Well, that's a sliding scale, and the calculations are messy.
(Hence the need to clean them up).

We always recommend implementing GFS2 in a test environment first
before putting it into production, so you can try these things to
see what works best for your use case.

Regards,

Bob Peterson
Red Hat File Systems
