[Linux-cluster] GFS2 'No space left on device' while df reports 1gb is free

Bob Peterson rpeterso at redhat.com
Thu Sep 10 13:37:25 UTC 2015


----- Original Message -----
> Is there a paper which describes that "worst case"? I did not know about
> those allocation subtleties.

Unfortunately, no.

(snip) 
> Heh, another person blindly copied the parameters I usually use for very
> small filesystems, and we used those for tests in virtualized
> environments with limited space. As part of that testing we tried to grow
> GFS2 and found that with fairly large resource groups on small enough block
> devices we lose a significant amount of space, because the remaining space
> is insufficient to fit one more rg. For example, with two 8MB journals,
> growing a ~(256 + 2*8)MB filesystem to ~300MB failed with 128MB resource
> groups but succeeded with 32MB ones.
> 
> Well, that just means that one size does not fit all.

Indeed.
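
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes, as your failed grow suggests, that gfs2_grow can only add whole
resource groups, and it ignores per-rgrp metadata overhead, so treat the
numbers as approximations rather than what the tools will actually report
(the sizes below are made up for illustration, not your exact test case):

MB = 1024 * 1024

def grow_estimate(added_bytes, rgrp_bytes):
    # How many whole rgrps fit in the newly added space, and what is left over.
    new_rgrps = added_bytes // rgrp_bytes
    stranded = added_bytes - new_rgrps * rgrp_bytes
    return new_rgrps, stranded

added = 100 * MB    # pretend the block device grew by 100MB
for rgrp in (128 * MB, 32 * MB):
    n, stranded = grow_estimate(added, rgrp)
    print("rgrp size %3dMB: %d new rgrp(s), %3dMB left unusable"
          % (rgrp // MB, n, stranded // MB))

# rgrp size 128MB: 0 new rgrp(s), 100MB left unusable  (nothing to grow into)
# rgrp size  32MB: 3 new rgrp(s),   4MB left unusable

The bigger the resource group, the bigger the chunk of space you can
strand at the end of the device.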

> Sure.
> Is it safe enough to just drop that '-r' parameter from the mkfs command
> line for production filesystems?

Yes, as a rule using the default for -r is best. You can also run into
a performance problem if your resource group size is too big. I discovered
this and documented it in Bugzilla bug #1154782 (which may be private):
https://bugzilla.redhat.com/show_bug.cgi?id=1154782

Basically, if the rgrp size is the maximum (2GB), you will have 33 bitmap
blocks per rgrp. That translates into a LOT of page cache lookups for
every bitmap operation, which kills performance. I've written a patch that
fixes this, but it's not available yet, and when it is, it will only be for
RHEL 7.2 and above. So for now you have to find a happy medium between too
many rgrps, which costs usable space, and rgrps that are too big, which
costs performance. Using the default for -r is generally best.
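
If you want to see where that 33 comes from, here is a quick sketch. It
assumes 4K filesystem blocks, GFS2's two bitmap bits per block, a 128-byte
rgrp header in the first bitmap block and a 24-byte metadata header in the
rest (see gfs2_ondisk.h for the exact structure sizes); it also ignores the
bitmap blocks themselves, so it is only an approximation:

BLOCK_SIZE = 4096
RGRP_HEADER = 128        # struct gfs2_rgrp in the first bitmap block
META_HEADER = 24         # struct gfs2_meta_header in the others
BLOCKS_PER_BYTE = 4      # 2 bitmap bits per block

def bitmap_blocks(rgrp_mb):
    # Count the bitmap blocks needed to describe one resource group.
    blocks = rgrp_mb * 1024 * 1024 // BLOCK_SIZE
    covered = (BLOCK_SIZE - RGRP_HEADER) * BLOCKS_PER_BYTE
    nbitmaps = 1
    while covered < blocks:
        covered += (BLOCK_SIZE - META_HEADER) * BLOCKS_PER_BYTE
        nbitmaps += 1
    return nbitmaps

for size in (32, 256, 512, 2048):
    print("-r %4d -> about %2d bitmap block(s) per rgrp"
          % (size, bitmap_blocks(size)))

# -r   32 -> about  1 bitmap block(s) per rgrp
# -r  256 -> about  5 bitmap block(s) per rgrp
# -r  512 -> about  9 bitmap block(s) per rgrp
# -r 2048 -> about 33 bitmap block(s) per rgrp

Every block allocation has to search those bitmaps, so the bigger the rgrp,
the more pages each search can touch.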

> I suspect there will be attempts to migrate to much bigger block
> devices (e.g. 1TB -> 20TB), but I'd rather not concentrate on them now...
> 
> >
> > If you do mkfs.gfs2 and specify -r512, you will be able to use much more
> > of the file system, and it won't get into this problem until much later.
> 
> What would be a rule of thumb for predicting such errors?
> I mean, at which point (in MB or %) should we start to worry that we may
> get such an error, depending on the rg size? Is there a point below which
> we definitely won't get them?

Well, that's a sliding scale, and the calculations are messy.
(Hence the need to clean them up).

We always recommend implementing GFS2 in a test environment first
before putting it into production, so you can try these things to
see what works best for your use case.

Regards,

Bob Peterson
Red Hat File Systems
