[dm-devel] [RFC] disk doesn't spin down with thin pool + dmeventd

Mon Jan 11 09:21:29 UTC 2016

On 9.1.2016 at 12:51, Alan Jenkins wrote:
> On 09/01/16 10:07, Alan Jenkins wrote:
>> On 08/01/16 08:17, Zdenek Kabelac wrote:
>> > On 7.1.2016 at 20:31, Alan Jenkins wrote:
>> >> Hi
>> >>
>> >> I tried using Docker on my Fedora NAS box.  It created a thin pool LV,
>> >> which caused hard drive activity every ~10 seconds.
>> >>
>> >> dmeventd queries the thin pool every 10 seconds, and it causes a
>> >> transaction commit in order to make sure the statistics are up to
>> >> date. But transactions are already supposed to be committed after
>> >> 1 second. (See Documentation/device-mapper/thin-provisioning.txt,
>> >> "Updating on-disk metadata").
>> >>
>> >> It seems like a simple case of "don't do that".  The kernel already
>> >> lets us avoid the commit.  How about it (patch below)?  If it seems
>> >> reasonable, I can whip up a commit message for it.
>> >>
>> >
>> > Hi
>> >
>> > I believe it's already solved upstream in version 2.02.133
>> > of lvm2 package with this commit:
>> >
>> > 81e9ab3156badecc6a64447708c4ae4886e3c244
>> > Date: Thu Oct 22 12:36:25 2015 +0200
>> >
>> > Which version of lvm2 are you using ?
>> >
>> > Regards
>> >
>> > Zdenek
>>
>> Thanks! That explains a question I had.
>>
>> My patch was based on lvm2 master. The upstream commit applies to the
>> _status_ task, but I applied dm_task_no_flush() to the _wait_ task:
>> DM_DEVICE_WAITEVENT / DM_DEV_WAIT_CMD.
>>
>> I need to test this again, and I shall. (I started testing with version
>> lvm2-2.02.132-2.fc23.x86_64).
>>
>> But from the code, it looks like we need *both* patches to fix the problem.
>> See:
>>
>>
>> 1. dm-ioctl.c: dev_wait() seems to include the exact same code as
>> dev_table_status, specifically the call to __dev_status():
>
> Never mind.  For me, the upstream commit fixes the problem on its own.
>
> The wait task does get run every 10 seconds.  But the 10 second timeout
> interrupts it before __dev_status() is called.  So setting noflush on the
> status task had already fixed the problem.
>

Status with flush on a thin-pool is supposed to give you an 'accurate' value,
while status without flush gives you a 'possibly out-of-date' value, since not
all IO is on disk yet (possibly including the IO that triggers provisioning).

However, for dmeventd we should be quite happy with the no-flush variant.
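
To make the difference concrete, here is a minimal sketch of a no-flush
status query done from a script rather than from dmeventd. It is not what
dmeventd does internally (that goes through libdevmapper and
dm_task_no_flush(), as discussed above). It assumes the `--noflush` option
of `dmsetup status` and the thin-pool status field order described in
thin-provisioning.txt. The device name `vg-pool` and the sample status line
are made up for illustration.

```python
def status_command(pool_dev, flush=False):
    """Build a 'dmsetup status' command line for one device.

    With flush=False the --noflush option is added, so the query should
    not force a metadata commit (values may be slightly out of date).
    """
    cmd = ["dmsetup", "status"]
    if not flush:
        cmd.append("--noflush")
    cmd.append(pool_dev)
    return cmd


def parse_thin_pool_status(line):
    """Parse one thin-pool line as printed by 'dmsetup status <device>'.

    Assumed layout:
      <start> <length> thin-pool <transaction id>
      <used meta>/<total meta> <used data>/<total data>
      <held metadata root> <mode> ...
    """
    fields = line.split()
    if len(fields) < 8 or fields[2] != "thin-pool":
        raise ValueError("not a thin-pool status line: %r" % line)
    used_meta, total_meta = (int(x) for x in fields[4].split("/"))
    used_data, total_data = (int(x) for x in fields[5].split("/"))
    return {
        "transaction_id": int(fields[3]),
        "metadata_percent": 100.0 * used_meta / total_meta,
        "data_percent": 100.0 * used_data / total_data,
        "mode": fields[7],  # e.g. rw, ro, out_of_data_space
    }


if __name__ == "__main__":
    # Hypothetical sample line; on a real system you would run
    # status_command("vg-pool") with subprocess (as root) instead.
    sample = ("0 20971520 thin-pool 1 124/4096 512/1024 - rw "
              "discard_passdown queue_if_no_space -")
    print(" ".join(status_command("vg-pool")))
    print(parse_thin_pool_status(sample))
```

For a monitoring loop, the usage percentages from the no-flush variant should
be good enough to decide when to extend the pool, which is all dmeventd needs.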

AFAIK the kernel has also been slightly improved here, so if there is
nothing to write, it should skip the disk sync in this case (even if
'flush' had remained set).

And while letting the disk sleep was one of the reasons to add this flag, the
'core' reason why 'no-flush' basically has to be used ATM with every
lvm2/dmeventd status call is the blocking nature of this flush when the
thin-pool gets overfilled (suspend/status needs to be able to return with
some appropriate return code in this case). We need to think about some
better ways here, but so far this is a 'reasonably' good workaround.

Regards

Zdenek