[dm-devel] trouble with generic/081

Dave Chinner david at fromorbit.com
Thu Jan 5 22:46:00 UTC 2017


On Thu, Jan 05, 2017 at 10:12:25PM +0100, Zdenek Kabelac wrote:
> Dne 5.1.2017 v 20:29 Eric Sandeen napsal(a):
> >On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
> >>>Anyway, at this point I'm not convinced that anything but the filesystem
> >>>should be making decisions based on storage error conditions.
> >>
> >>So far I'm not convinced that doing nothing is better than at least trying to unmount.
> >>
> >>Since doing nothing is known to cause SEVERE filesystem damage,
> >>while I haven't heard of any when 'unmount' is in the field.
> >
> >I'm pretty sure that's exactly what started this thread.  ;)
> >
> >Failing IOs should never cause "severe filesystem damage" - that is what
> >a journaling filesystem is /for/.  Can you explain further?
> 
> well all I know are user reports - from users who were able to use 'XFS'
> with an exhausted thin-pool while having 'snapshots' of their volumes.
> 
> Since there was no 'umount' and XFS upon write error just retried
> endlessly to write the block over and over - the system appeared

Which has already been fixed upstream.

And my 2c worth on the "lvm unmounting filesystems on error" - stop
it, now. It's the wrong thing to do, and it makes it impossible for
filesystems to handle the error and recover gracefully when
possible.

> to the users nice & usable for quite a long time (especially when
> boxes had 32G of RAM or more...)
> 
> Maybe writes passed to 'uniquely' owned blocks....
> 
> Then after a day, two, or three, OOM finally killed.
> Users realized the thin-pool was out of space - added room to the VG and pool
> and tried xfs_repair - but the whole FS was largely lost.

That sounds very much like a block device snapshot corruption
problem, not a filesystem problem. As always, the filesystem gets
blamed for data loss, regardless of where the problem really lies.

> Use an LV and make some thin snapshots.
> 
> Then change various parts of the origin - at various moments before the pool
> is out of space
> 
> So you will get lots of different scenarios of missing data.
> 
> You will mostly not get into the trouble mentioned if you
> have just a single thinLV and you exhaust the thin-pool while using it.
> 
> Games with snapshots are needed.

This really sounds like a problem with snapshot ENOSPC error
handling, not a filesystem issue - the filesystem is simply the
messenger here...

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com



