[Libguestfs] nbdkit blocksize filter, read-modify-write, and concurrency

Sat May 21 16:37:10 UTC 2022

On May 21 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
>> Hi,
>>
>> How does the blocksize filter take into account writes that end up
>> overlapping due to read-modify-write cycles?
>>
>> Specifically, suppose there are two non-overlapping writes handled
>> by two different threads, that, due to blocksize requirements,
>> overlap when expanded.  I think there is a risk that one thread may
>> partially undo the work of the other here.
>>
>> Looking at the code, it seems that writes of unaligned heads and
>> tails are protected with a global lock, but writes of aligned data
>> can occur concurrently.
>
> I agree.
>
> Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
> other filters impose thread model limits, the blocksize filter does
> not limit the thread model, so the thread model of nbdkit would also
> be NBDKIT_THREAD_MODEL_PARALLEL.
>
> That means that two writes either on different connections or
> pipelined on the same connection could happen at the same time.
> “blocksize_pwrite” would be called concurrently for the two requests.
>
>> However, does this not miss the case where there is one unaligned
>> write that overlaps with an aligned one?
>>
>> For example, with blocksize 10, we could have:
>> 
>> Thread 1: receives write request for offset=4, size=16
>> Thread 2: receives write request for offset=0, size=10
>> Thread 1: acquires lock, reads bytes 0-4
>> Thread 2: does aligned write (no locking needed), writes bytes 0-10
>> Thread 1: writes bytes 0-10, overwriting data from Thread 2
>
> I believe this analysis is correct.  (CC'd to Eric who knows a lot
> more about this.)
>
> However I don't think it's a bug.  If a client doesn't want writes to
> squash each other, then it shouldn't send overlapping requests.  I bet
> the same thing happens with an SSD.

But the requests are not overlapping from the client's point of view.
They only become overlapping when the server applies its
read-modify-write operation to align them to the blocksize.
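
To illustrate how two disjoint client writes can collide once they are
rounded out to block boundaries, here is a small sketch. The helper
names (expanded_range, overlaps) and the two sample writes are
hypothetical and not taken from the blocksize filter; only the
blocksize of 10 follows the example quoted above.

```python
def expanded_range(offset, size, blocksize):
    """Return the block-aligned [start, end) range that a
    read-modify-write for this request would have to touch."""
    start = (offset // blocksize) * blocksize
    end = -(-(offset + size) // blocksize) * blocksize  # round up
    return start, end


def overlaps(a, b):
    """True if the half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


blocksize = 10
w1 = (0, 4)  # (offset, size): bytes 0-3
w2 = (6, 4)  # (offset, size): bytes 6-9

# As sent by the client, the two requests are disjoint.
client_view = overlaps((w1[0], w1[0] + w1[1]), (w2[0], w2[0] + w2[1]))

# After expansion both requests cover the whole block [0, 10).
server_view = overlaps(expanded_range(*w1, blocksize),
                       expanded_range(*w2, blocksize))

print(client_view, server_view)  # prints: False True
```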

I think you said elsewhere that the blocksize reported by the NBD
server is only a preferred blocksize, so I'd be surprised if not
following this "preference" resulted in data corruption.

> NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
> write has been committed before sending another request.
>
> Do you have an example of a client which sends overlapping requests
> and depends on particular behaviour of the server?  You may be able to
> get it to work by using nbdkit-noparallel-filter which can be used to
> serialize nbdkit.

I'm working with the kernel's NBD client, and this race would explain
all the mysterious data corruption issues that I've seen with the S3
plugin. But I have not yet confirmed definitively that this is the
root cause.

For now, I'll avoid the blocksize filter and instead do the
read-modify-write in the plugin with proper locking. If that fixes it,
then I think we can conclude that the kernel is sending such requests
(but, as I said above, I would not consider them overlapping nor would I
consider this a bug).
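
A minimal sketch of what such locking could look like, assuming a toy
in-memory backend that only accepts whole-block I/O. This is not the
S3 plugin's code: AlignedBackend, LockedRMW and BLOCKSIZE are
hypothetical names, and a single lock is used for simplicity. The
point is that aligned writes take the same lock as the
read-modify-write cycles, so neither can slip in between the read and
the write of the other.

```python
import threading

BLOCKSIZE = 10  # hypothetical block size, as in the example above


class AlignedBackend:
    """Stand-in for storage that only handles whole blocks."""

    def __init__(self, nblocks):
        self.data = bytearray(nblocks * BLOCKSIZE)

    def read_block(self, n):
        return bytes(self.data[n * BLOCKSIZE:(n + 1) * BLOCKSIZE])

    def write_block(self, n, buf):
        assert len(buf) == BLOCKSIZE
        self.data[n * BLOCKSIZE:(n + 1) * BLOCKSIZE] = buf


class LockedRMW:
    """Splits a write into per-block pieces and updates each block
    while holding one lock, whether the piece is aligned or not."""

    def __init__(self, backend):
        self.backend = backend
        self.lock = threading.Lock()

    def pwrite(self, buf, offset):
        pos = 0
        while pos < len(buf):
            block, start = divmod(offset + pos, BLOCKSIZE)
            n = min(BLOCKSIZE - start, len(buf) - pos)
            with self.lock:
                if n == BLOCKSIZE:
                    # Full block: no read needed, but still locked.
                    self.backend.write_block(block, bytes(buf[pos:pos + n]))
                else:
                    # Partial block: read, patch, write back atomically.
                    cur = bytearray(self.backend.read_block(block))
                    cur[start:start + n] = buf[pos:pos + n]
                    self.backend.write_block(block, bytes(cur))
            pos += n


backend = AlignedBackend(nblocks=2)
dev = LockedRMW(backend)

# Two writes that are disjoint for the client but share block 0.
t1 = threading.Thread(target=dev.pwrite, args=(b"AAAA", 0))
t2 = threading.Thread(target=dev.pwrite, args=(b"BBBB", 6))
t1.start(); t2.start()
t1.join(); t2.join()

print(bytes(backend.data[:BLOCKSIZE]))
```

Each block update is atomic here, but a request spanning several
blocks is not atomic as a whole. A lock per block rather than one
global lock would allow more parallelism at the cost of extra
bookkeeping.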

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«