[dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag

Pavel Begunkov asml.silence at gmail.com
Sun Oct 31 13:27:08 UTC 2021


On 10/29/21 21:08, Darrick J. Wong wrote:
> On Fri, Oct 29, 2021 at 08:23:53PM +0100, Pavel Begunkov wrote:
>> On 10/29/21 17:57, Darrick J. Wong wrote:
>>> On Fri, Oct 29, 2021 at 12:46:14PM +0100, Pavel Begunkov wrote:
>>>> On 10/28/21 23:59, Dave Chinner wrote:
>>>> [...]
>>>>>>> Well, my point is doing recovery from bit errors is by definition not
>>>>>>> the fast path.  Which is why I'd rather keep it away from the pmem
>>>>>>> read/write fast path, which also happens to be the (much more important)
>>>>>>> non-pmem read/write path.
>>>>>>
>>>>>> The trouble is, we really /do/ want to be able to (re)write the failed
>>>>>> area, and we probably want to try to read whatever we can.  Those are
>>>>>> reads and writes, not {pre,f}allocation activities.  This is where Dave
>>>>>> and I arrived at a month ago.
>>>>>>
>>>>>> Unless you'd be ok with a second IO path for recovery where we're
>>>>>> allowed to be slow?  That would probably have the same user interface
>>>>>> flag, just a different path into the pmem driver.
>>>>>
>>>>> I just don't see how 4 single-line branches to propagate RWF_RECOVERY
>>>>> down to the hardware are in any way an imposition on the fast path.
>>>>> It's no different from passing RWF_HIPRI down to the hardware *in the
>>>>> fast path* so that the IO runs the hardware in polling mode because
>>>>> it's faster for some hardware.
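
For context, the propagation Dave is describing amounts to translating an
RWF_* bit from userspace into the matching IOCB_* bit on the kiocb, the
same way RWF_HIPRI becomes IOCB_HIPRI today, so the common case pays one
untaken branch per flag. A minimal standalone sketch of that pattern,
where IOCB_DATA_RECOVERY and the constant values are illustrative
stand-ins rather than the real uapi definitions:

/*
 * Sketch only: mirrors the RWF_* -> IOCB_* translation the kernel does
 * for flags like RWF_HIPRI; the recovery flag is the proposed addition.
 * Names and values are placeholders, not the real uapi constants.
 */
#define RWF_HIPRI               0x01
#define RWF_RECOVERY_DATA       0x20        /* proposed flag, value illustrative */

#define IOCB_HIPRI              (1u << 0)
#define IOCB_DATA_RECOVERY      (1u << 1)   /* hypothetical kiocb bit */

static int sketch_set_rw_flags(unsigned int *ki_flags, unsigned int rwf)
{
        unsigned int flags = 0;

        if (rwf & ~(RWF_HIPRI | RWF_RECOVERY_DATA))
                return -1;                   /* unknown flag, would be -EOPNOTSUPP */

        if (rwf & RWF_HIPRI)
                flags |= IOCB_HIPRI;         /* existing polled-IO case */
        if (rwf & RWF_RECOVERY_DATA)
                flags |= IOCB_DATA_RECOVERY; /* the one extra test at issue */

        *ki_flags |= flags;
        return 0;
}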
>>>>
>>>> Not particularly about this flag, but it is expensive. It surely looks
>>>> cheap when it's just one feature, but there are dozens of them with
>>>> limited applicability, default-config kernels are already sluggish
>>>> when it comes to really fast devices, and it's not getting better.
>>>> Also, pretty often each of them adds a bunch of extra checks to fix
>>>> one thing or another.
>>>
>>> So we can't have data recovery because moving fast is the only goal?
>>
>> That's not what was said and you missed the point, which was in
>> the rest of the message.
> 
> ...whatever point you were trying to make was so vague that it was
> totally uninformative and I completely missed it.
> 
> What does "callbacks or bit masks" mean, then, specifically?  How
> *exactly* would you solve the problem that Jane is seeking to solve by
> using callbacks?
> 
> Actually, you know what?  I'm so fed up with every single DAX
> conversation turning into a ****storm of people saying NO NO NO NO NO NO
> NO NO to everything proposed that I'm actually going to respond to
> whatever I think your point is, and you can defend whatever I come up
> with.

Interesting. I don't want to break it to you, but nobody is going to
defend against interpretations you made up out of thin air. I think
there is one thing on which we can relate: I wonder as well what the
bloody hell that opus of yours was.


> 
>>>
>>> That's so meta.
>>>
>>> --D
>>>
>>>> So let's add a bit of pragmatism to the picture: if there is just one
>>>> user of a feature but it adds overhead for millions of machines that
>>>> won't ever use it, it's expensive.
> 
> Errors are infrequent, and since everything is cloud-based and disposable
> now, we can replace error handling with BUG_ON().  This will reduce code
> complexity, which will reduce code size, and improve icache usage.  Win!
> 
>>>> This one doesn't yet spill into paths I care about,
> 
> ...so you sail in and say 'no' even though you don't yet care...
> 
>>>> but in general
>>>> it'd be great if we start thinking more about such stuff instead of
>>>> throwing yet another if into the path, e.g. by shifting the overhead
>>>> from linear to a constant for cases that don't use it, for instance
>>>> with callbacks
> 
> Ok so after userspace calls into pread to access a DAX file, hits the
> poisoned memory line, and the machine check fires, what then?  I guess we
> just have to figure out how to get from the MCA handler (assuming the
> machine doesn't just reboot instantly) all the way back into memcpy?
> Ok, you're in charge of figuring that out because I don't know how to do
> that.
> 
> Notably, RWF_DATA_RECOVERY is the flag that we're calling *from* a
> callback that happens after the memory controller realizes it's lost
> something, kicks a notification to the OS kernel through ACPI, and the
> kernel signals userspace to do something about it.  Yeah, that's dumb
> since spinning rust already does all this for us, but that's pmem.
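
To make that flow concrete, a rough userspace sketch of the recovery step
under this proposal: the poisoned access raises SIGBUS with si_code
BUS_MCEERR_AR, and whoever owns the data then rewrites the damaged range
with the recovery flag so the driver can clear the poison before storing.
RWF_RECOVERY_DATA is the flag from this series and is not in any released
uapi header, so the value below is a stand-in:

/*
 * Userspace sketch: rewrite a poisoned DAX range with the proposed
 * recovery flag.  Assumes the application already took SIGBUS
 * (si_code == BUS_MCEERR_AR) for the mapping and has a good copy of
 * the data, e.g. from a replica, to write back.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <stddef.h>

#ifndef RWF_RECOVERY_DATA
#define RWF_RECOVERY_DATA 0x20          /* stand-in for the proposed flag */
#endif

static ssize_t rewrite_poisoned_range(int fd, void *good_copy, size_t len,
                                      off_t file_off)
{
        struct iovec iov = {
                .iov_base = good_copy,
                .iov_len  = len,
        };

        /*
         * A plain pwritev2() would trip over the poisoned pmem page;
         * the recovery flag is what routes this IO down the slow path
         * that clears the poison before writing the new data.
         */
        return pwritev2(fd, &iov, 1, file_off, RWF_RECOVERY_DATA);
}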
> 
>>>> or bit masks.
> 
> WTF does this even mean?
> 
> --D
> 
>>>>
>>>>> IOWs, saying that we shouldn't implement RWF_RECOVERY because it
>>>>> adds a handful of branches to the fast path is like saying that we
>>>>> shouldn't implement RWF_HIPRI because it slows down the fast path
>>>>> for non-polled IO....
>>>>>
>>>>> Just factor the actual recovery operations out into a separate
>>>>> function like:
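
The snippet Dave posted after this line was trimmed from the quote. The
shape he is arguing for is roughly the following standalone sketch; it is
neither his code nor the actual patch, and recovery_write(),
IOCB_DATA_RECOVERY and struct io_req are placeholder names:

/*
 * Standalone sketch of the factoring: the fast path pays a single,
 * normally-untaken flag test, and everything slow and careful lives in
 * an out-of-line recovery helper.
 */
#include <stddef.h>
#include <string.h>

#define IOCB_DATA_RECOVERY 0x1          /* placeholder kiocb-style flag */

struct io_req {
        unsigned int flags;
        void *dst;                      /* mapped pmem destination */
        const void *src;                /* caller's buffer */
        size_t len;
};

/* Slow path: would clear poison / repair badblocks before writing. */
static __attribute__((noinline)) size_t recovery_write(struct io_req *req)
{
        /* clear_poison(req->dst, req->len) would go here */
        memcpy(req->dst, req->src, req->len);
        return req->len;
}

static size_t pmem_write(struct io_req *req)
{
        if (__builtin_expect(req->flags & IOCB_DATA_RECOVERY, 0))
                return recovery_write(req);     /* rare, allowed to be slow */

        memcpy(req->dst, req->src, req->len);   /* fast path untouched */
        return req->len;
}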

-- 
Pavel Begunkov



