[linux-lvm] Add udev-md-raid-safe-timeouts.rules
Wol's lists
antlists at youngman.org.uk
Mon Apr 16 15:02:26 UTC 2018
On 16/04/18 12:43, Austin S. Hemmelgarn wrote:
> On 2018-04-15 21:04, Chris Murphy wrote:
>> I just ran into this:
>> https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec
>>
>>
>> This solution is inadequate, can it be made more generic? This isn't
>> an md specific problem, it affects Btrfs and LVM as well. And in fact
>> raid0, and even non-raid setups.
>>
>> There is no good reason to prevent deep recovery, which is what
>> happens with the default command timer of 30 seconds, with this class
>> of drive. Basically that value is going to cause data loss for the
>> single device and also raid0 case, where the reset happens before deep
>> recovery has a chance. And even if deep recovery fails to return user
>> data, what we need to see is the proper error message: read error UNC,
>> rather than a link reset message which just obfuscates the problem.
>
> This has been discussed at least once here before (probably more times,
> hard to be sure since it usually comes up as a side discussion in an
> only marginally related thread).
Sorry, but where is "here"? This message is cross-posted to about three
lists at least ...
> Last I knew, the consensus here was
> that it needs to be changed upstream in the kernel, not by adding a udev
> rule because while the value is technically system policy, the default
> policy is brain-dead for anything but the original disks it was
> intended for (30 seconds works perfectly fine for actual SCSI devices
> because they behave sanely in the face of media errors, but it's
> horribly inadequate for ATA devices).
>
> To re-iterate what I've said before on the subject:
>
imho (and it's probably going to be a pain to implement :-) there should
be a soft time-out and a hard time-out. The soft time-out should trigger
"drive is taking too long to respond" messages that end up in a log - so
that people who actually care can keep track of this sort of thing.
The hard timeout should be the current set-up, where the kernel just
gives up.
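
A minimal sketch of the two-tier idea, written as userspace Python purely
for illustration (it is not kernel code and not an existing interface).
The names classify() and on_tick() are hypothetical, the 30-second soft
value mirrors the default command timer mentioned above, and the
180-second hard value is an arbitrary assumption:

```python
import logging

log = logging.getLogger("cmd_timer")

# Assumed values for illustration only.
SOFT_TIMEOUT = 30.0    # seconds before a warning is logged
HARD_TIMEOUT = 180.0   # seconds before the command is abandoned


def classify(elapsed, soft=SOFT_TIMEOUT, hard=HARD_TIMEOUT):
    """Map the time a command has been outstanding to 'ok', 'warn' or 'abort'."""
    if soft > hard:
        raise ValueError("soft timeout must not exceed hard timeout")
    if elapsed >= hard:
        return "abort"   # hard time-out: give up on the command
    if elapsed >= soft:
        return "warn"    # soft time-out: log it, but keep waiting
    return "ok"


def on_tick(device, elapsed, already_warned=False):
    """Hypothetical periodic check for one outstanding command."""
    state = classify(elapsed)
    if state == "warn" and not already_warned:
        # Log only once per command so a slow drive does not flood the log.
        log.warning("%s: drive is taking too long to respond (%.0fs)",
                    device, elapsed)
    return state
```

With these assumed values, a command outstanding for 45 seconds would
produce one log line and keep waiting, and only after 180 seconds would
it be abandoned.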
Cheers,
Wol