[Libguestfs] [nbdkit PATCH 2/4] truncate: Factor out reading real_size under mutex

Thu Apr 25 18:16:50 UTC 2019

On 4/25/19 9:07 AM, Richard W.M. Jones wrote:
> On Wed, Apr 24, 2019 at 05:24:39PM -0500, Eric Blake wrote:
>> Add a helper function get_real_size() to make it easier to add sanity
>> checking that mutex calls don't fail.
> 
> This patch is fine, ACK.
> 
>> I'm a bit unsure why truncate.c needs a lock anyway. I guess it's
>> because we are storing 'size' and 'real_size' as globals rather than
>> per-connection values in the handle, and we're worrying about parallel
>> connections where we don't want one thread reading inconsistent values
>> while another thread is in the middle of truncate_get_size()? At any
>> rate, as long as we don't have dynamic resize in the NBD protocol,
>> size can't change within the confines of a single connection. And even
>> if we DO assume that the underlying plugin reports different sizes for
>> different connections, our use of a global does not play well (if
>> client A sees size 1k, then client B sees size 2k, that changes client
>> A's use of 'real_size' to be something different than client A was
>> expecting).
>>
>> So, I think we have to move 'size' and 'real_size' into the
>> per-connection handle, at which point, can't we just set them once
>> during truncate_prepare by moving the existing guts of
>> truncate_get_size there, and then letting truncate_get_size just
>> return h->size while all other truncate_* functions just access
>> h->real_size without a lock?
> 
> Even storing fields in the handle is not necessarily safe when using
> the parallel thread model, because you can have multiple requests
> outstanding on the same handle.  (However I believe it is safe given
> the current implementation of get_size / lack of resize support in
> nbdkit.)

Looking to the future, I'm wondering how we would safely handle any
resize actions. Our current code was locking only around the time it
read 'real_size', but if you have:

thread A      thread B
.pread
 - lock
 - read real_size
 - unlock
             .resize
               -lock
               - call next_ops->resize
               - update real_size
               - unlock
 - call next_ops->pread

we still have a bug where the pread is operating on the wrong size.
Basically, we need a way to prevent resize until there are no
outstanding ops, which means a lock around the access of real_size is
insufficient, it has to be a lock around the whole of .pread; but
locking like that serializes us unless we use a pthread_rwlock instead
of a mutex (.resize grabs the rwlock for write, all other ops grab the
rwlock for read, and the lock is owned for the entire function rather
than just the copying of 'real_size').  So implementing resize already
needs a change in locking, hence dropping the existing mutex lock in the
short term isn't costing us any wasted code churn.

Thus, I'm leaning towards rewriting this patch to track per-handle
sizing without a lock, along with a big comment about how a lock is
necessary if we implement resize support. I'll go ahead and push the
other three patches, as well as audit for any other filters mistakenly
relying on global instead of per-handle state.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190425/29851805/attachment.sig>