[dm-devel] trouble with generic/081

Zdenek Kabelac zkabelac at redhat.com
Thu Jan 5 19:13:08 UTC 2017


On 5.1.2017 19:24, Eric Sandeen wrote:
> (dropping fstests list)
>
> On 1/5/17 4:35 AM, Zdenek Kabelac wrote:
>> On 5.1.2017 00:03, Eric Sandeen wrote:
>>>
>>>
>>> On 12/16/16 2:15 AM, Christoph Hellwig wrote:
>>>> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>
> ...
>
>>>> What XFS did on IRIX was to let the volume manager call into the fs
>>>> and shut it down.  At this point no further writes are possible,
>>>> but we do not expose the namespace under the mount point, and the
>>>> admin can fix the situation with all the normal tools.
>>>
>>> <late to the party>
>>>
>>> Is there a need for this kind of call-up when xfs now has the configurable
>>> error handling so that it will shut down after X retries or Y seconds
>>> of a persistent error?
>>
>>
>> We likely need to open an RFE bugzilla here - and specify how it should
>> work when certain conditions are met.
>
> We need volume manager people & filesystem people to coordinate a solution,
> bugzillas are rarely the best place to do that.  ;)

IMHO it's usually better than sending mails to various lists -
we need all the facts in a single place instead of digging them out of list
archives.


>
>> Current 'best effort' tries to minimize damage by doing a full stop
>> when the pool approaches 95% fullness. That is relatively 'low/small'
>> headroom for a small-sized thin-pool - but a commonly sized thin-pool
>> still has reasonably big free space to flush most of the page cache
>> to disk before things go crazy.
>
> Sounds like pure speculation. "95%" says nothing about actual space left
> vs. actual amount of outstanding buffered IO.

It's quite a similar approach to a filesystem reserving some space for the
'root' user - so things can still proceed once an ordinary user has exhausted
the fs.
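
For illustration, a minimal sketch (not lvm2/dmeventd code) of how such a
fullness policy can be checked from the outside, by parsing the documented
'dmsetup status' thin-pool line; the pool name 'vg-pool-tpool' and the 95%
threshold are placeholders, not anything lvm2 ships:

  import subprocess

  def thin_pool_data_fullness(dm_name):
      # A thin-pool status line looks like (see the kernel's
      # Documentation/device-mapper/thin-provisioning.txt):
      #   <start> <len> thin-pool <tid> <usedMeta>/<totalMeta>
      #       <usedData>/<totalData> ...
      out = subprocess.check_output(["dmsetup", "status", dm_name],
                                    universal_newlines=True)
      fields = out.split()
      if fields[2] != "thin-pool":
          raise ValueError("%s is not a thin-pool" % dm_name)
      used, total = (int(x) for x in fields[5].split("/"))
      return 100.0 * used / total

  THRESHOLD = 95.0                   # the 'full-stop' level discussed above
  if thin_pool_data_fullness("vg-pool-tpool") >= THRESHOLD:
      print("thin-pool almost full: stop writers / extend the pool")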


>> Now - we could probably detect the kernel version and the features
>> present in xfs/ext4 - and change our reactions accordingly.
>
> Unlikely.  Kernel version doesn't mean anything when distros are
> involved.  Many features are not advertised in any way.

Aren't those 'new' features exposed via /sys in some way?
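
At least for XFS's configurable error handling the knobs do live in sysfs
(under /sys/fs/xfs/<dev>/error/ on kernels that have the feature), so a
runtime probe is possible. A hedged sketch - 'dm-3' is just a placeholder
device name:

  import os

  def xfs_has_configurable_errors(bdev):
      # Kernels with configurable XFS error handling expose per-device
      # knobs such as .../error/metadata/EIO/max_retries; probing for
      # the directory is enough to detect the feature at runtime.
      return os.path.isdir("/sys/fs/xfs/%s/error/metadata" % bdev)

  print(xfs_has_configurable_errors("dm-3"))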

>>
>> We need to know what to do with a 3.X kernel, a 4.X kernel, and the
>> features present in the kernel - and how we can detect them at runtime.
>
> Like Mike said, we need to make upstream work, first.
>
> Distros can figure out where to go from there.
>

lvm2 upstream is 'distro' & 'kernel' independent.

It is designed to COVER UP known kernel bugs and to detect the features that
are present. That is the design purpose of lvm2 and it's a KEY feature of
lvm2.

So we can't just 'drop' existing users because we like the new 4.X kernel so
much. (Though we may issue a serious WARNING message when a user is using
something with bad consequences for him.)


> Anyway, at this point I'm not convinced that anything but the filesystem
> should be making decisions based on storage error conditions.

So far I'm not convinced doing nothing is better than trying at least an
unmount.

Doing nothing is known to cause SEVERE filesystem damage, while I haven't
heard of such damage when 'unmount' is in the field.

Users are not happy - but usually the filesystem is repairable once new space
is added. (Note here - users typically use a couple of LVs and usually have
some space left to succeed with flush & umount.)

>
> I think unmounting the filesystem is a terrible idea, and hch & mike
> seem to agree.  It's problematic in many ways.

So let's get back to the core trouble -


A data-exhausted thin-pool still allows the 'fs' user to write to already
provisioned space - while erroring out on non-provisioned/missing space.

If the filesystem is not immediately stopped on the 1st such error (as
remount-ro does for ext4), it continues destroying itself to a major degree,
because after a reboot the 'non-provisioned' space may actually be there
again: users typically use snapshots, so a write to a shared block requires
provisioning new space, and when that provisioning fails the thin volume
metadata does not point to a 'non-existing' block - it still points to the
old, existing block from before the error.

This puts the filesystem in a rather 'tragic' situation, as it reads data
out of the thin volume without knowing how consistent they are - i.e. some
mixture of old and new data.
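
To make that 'mixture of old and new data' concrete, here is a toy model
(pure illustration, nothing like real dm-thin internals) of a snapshot-shared
block whose copy-on-write provisioning fails:

  # Data space is exhausted; block 0 is shared with a snapshot.
  pool_free = 0
  shared = {0}
  origin = {0: "A (old)"}            # thin volume's logical->data map

  def write(blk, data):
      global pool_free
      if blk in shared:              # shared block: copy-on-write needs
          if pool_free == 0:         # a freshly provisioned pool block
              return "EIO"           # provisioning fails; the mapping
          pool_free -= 1             # still points at the old block
          shared.discard(blk)
      origin[blk] = data
      return "ok"

  print(write(0, "B (new)"))         # -> EIO: the new data is lost
  print(origin[0])                   # -> 'A (old)': reads still return
                                     #    the pre-error contents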

I've proposed a couple of things, e.g.:

A configurable option so that the 1st provisioning error makes ALL further
'writes' to the thin volume fail - this solves the filesystem repair trouble -
but it was not seen as a good idea by Mike, as it would complicate the logic
in the thinp target.

We could possibly implement this by remapping tables via lvm - but it's not
quite easy to provide such a feature.


We could actually put 'error' targets in place of the thins - and let the
filesystem deal with it - but some older XFS still basically OOMs later
without telling the user a word about how bad it is (I've seen users with
lots of RAM keep working for 2 days....) unless the user monitors syslog for
the stream of write errors.
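
As a sketch of that 'error target' variant, driven with plain dmsetup calls
(the device name and sector count are placeholders and must match the live
thin volume; this is not an existing lvm2 command):

  import subprocess

  def fail_all_io(dm_name, sectors):
      # Replace the device's table with the dm 'error' target so every
      # further read/write fails immediately instead of half-succeeding.
      for cmd in (["dmsetup", "suspend", dm_name],
                  ["dmsetup", "load", dm_name,
                   "--table", "0 %d error" % sectors],
                  ["dmsetup", "resume", dm_name]):
          subprocess.check_call(cmd)

  # fail_all_io("vg-thinvol", 2097152)   # hypothetical 1GiB thin volume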


>
> I'm not super keen on shutting down the filesystem, for similar reasons,
> but I have a more open mind about that because the implications to the
> system are not so severe.

Yes - an instant 'shutdown' is a nice option - except lots of users are not
using thin for their root volume - just for some data volume (virtual
machines) - so killing the machine is quite a major obstruction then; unmount
is just a tiny bit nicer.


> Upstream now has better xfs error handling configurability.  Have you
> tested with that?  (for that matter, what thinp test framework exists
> on the lvm2/dm side?  We currently have only minimal testing in fstests,
> to be honest.  Until we have a framework to test against this seems likely
> to continue going in theoretical circles.)

See e.g. the lvm2/tests/shell subdir.

Regards

Zdenek




