[Libguestfs] [libnbd PATCH] states: Never block state machine inside REPLY

Eric Blake eblake at redhat.com
Tue Jun 25 14:37:54 UTC 2019


On 6/25/19 4:03 AM, Richard W.M. Jones wrote:
> On Wed, Jun 19, 2019 at 01:18:01PM -0500, Eric Blake wrote:
>> Oddly enough, I am not getting any measurable performance difference
>> with this patch applied and using examples/threaded-reads-and-writes
>> coupled with nbdkit.  My explanation is that in the common case, once
>> a server has something to send, it is going to send the entire reply
>> as fast as it can, rather than sending a partial reply and waiting;
>> and our attempts to keep up to 64 commands in flight mean that our
>> real bottleneck is not in sending our next command (even if we have
>> to wait for the server's reply to finish), but at the server (if we
>> are saturating the server's limit on in-flight requests, the server
>> won't read our next request until it has finished its current reply).
>> Still, I'm sure that it is possible to construct corner cases where
>> this shows more of an effect.
> 
> What I'm going to say is, I think, stating the obvious, but my intuition
> is that there are going to be two types of load.  In the first and easiest
> case you want to read or write sequentially over the whole or a large
> section of the image (think: qemu-img convert).  In this case you know
> well in advance what parts of the image you want to read/write and can
> keep the in flight queue full at all times.  This is what
> threaded-reads-and-writes actually tests.
> 
> The harder case is random access (think: qemu with a database guest).
> There is a short queue, for example because of data dependencies
> between the requests, so the issued commands buffer is always short.
> Also there may be server latency because of the random access pattern.
> This kind of workload should show the benefit of this commit, but we
> don't really have a test for this kind of workload.

Yes, that's a fair assessment.
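
For the first kind of load the client's only job is to keep the queue
topped up.  A rough sketch of that loop follows; the signatures here
follow the current stable libnbd aio API (nbd_aio_pread taking a
completion-callback argument plus NBD_NULL_COMPLETION), which is an
assumption on my part and not exactly what this thread compiled
against.  Error checking is omitted, the handle is assumed to be
already connected (e.g. via nbd_connect_uri), and the data is simply
discarded rather than tracked per command the way
threaded-reads-and-writes does:

  #include <stddef.h>
  #include <stdint.h>
  #include <libnbd.h>

  #define MAX_IN_FLIGHT 64
  #define BUFSIZE (64 * 1024)

  /* One buffer per slot; a real client would hand each buffer to a
   * consumer on completion instead of throwing the data away. */
  static char buf[MAX_IN_FLIGHT][BUFSIZE];
  static int64_t cookies[MAX_IN_FLIGHT];  /* 0 = slot free */

  static void
  sequential_reads (struct nbd_handle *nbd)
  {
    int64_t size = nbd_get_size (nbd);
    uint64_t offset = 0;
    size_t i, in_flight = 0;

    while (offset < (uint64_t) size || in_flight > 0) {
      /* Refill the queue: issue a read into every free slot. */
      for (i = 0; i < MAX_IN_FLIGHT && offset < (uint64_t) size; ++i) {
        if (cookies[i] == 0) {
          uint64_t n = (uint64_t) size - offset;
          if (n > BUFSIZE) n = BUFSIZE;
          cookies[i] = nbd_aio_pread (nbd, buf[i], n, offset,
                                      NBD_NULL_COMPLETION, 0);
          offset += n;
          in_flight++;
        }
      }
      /* Let the state machine make progress, then retire whatever
       * completed (successfully or with an error). */
      nbd_poll (nbd, -1);
      for (i = 0; i < MAX_IN_FLIGHT; ++i) {
        if (cookies[i] != 0 &&
            nbd_aio_command_completed (nbd, cookies[i]) != 0) {
          cookies[i] = 0;
          in_flight--;
        }
      }
    }
  }

The random-access load collapses that refill step: when each offset
depends on the result of the previous read, the same loop degenerates
to one command in flight, and that is where not parking the whole
connection on a partial reply should pay off.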


>>  /* STATE MACHINE */ {
>>   REPLY.START:
>> +  /* If rlen is non-zero, we are resuming an earlier reply cycle. */
>> +  if (h->rlen > 0) {
>> +    if (h->reply_cmd) {
>> +      assert (nbd_internal_is_state_processing (h->reply_cmd->state));
>> +      SET_NEXT_STATE (h->reply_cmd->state);
> 
> This is essentially the "stack of states" idea that I had early on,
> but with a maximum stack depth of 1.
> 
> I originally rejected the idea then because it means that the
> generator would no longer have a complete view of all state
> transitions in the state machine, and thus wouldn't be able to enforce
> invariants such as not jumping to a state in the middle of a state
> group.  (In fact with this patch that is now the case -- I thought the
> generator would give an error about this, but maybe my test is wrong).
> 
> Nevertheless given that we need to do this, maybe it's better to drop
> the idea that the generator needs to have a complete view.
> 
> In my original plan we would have had "push state" and "pop state"
> operations (pop state is a general jump to "any state", which is why
> it breaks the assumption in the generator).

In addition to stacking at most one state, this also limits the
stacking to the REPLY sub-state.  But I was also
surprised that the generator did not balk at me setting the state to the
contents of a variable rather than to the usual situation of setting it
to an all-caps %FOO magic identifier.
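
To make that concrete, the "resume" amounts to nothing more than this
(a toy sketch of how I picture it, not the generated code -- the state
names here are invented; only the h->rlen, h->reply_cmd and per-command
state fields correspond to the patch quoted above):

  #include <stddef.h>

  enum state {
    STATE_READY,
    STATE_REPLY_RECV_HEADER,    /* invented names; the real ones come */
    STATE_REPLY_RECV_PAYLOAD,   /* out of the generator               */
  };

  struct command_in_flight {
    enum state state;           /* sub-state to resume on next POLLIN */
    /* ... */
  };

  struct handle {
    size_t rlen;                          /* bytes still expected */
    struct command_in_flight *reply_cmd;  /* owner of the partial reply */
    /* ... */
  };

  static enum state
  reply_start (struct handle *h)
  {
    if (h->rlen > 0 && h->reply_cmd != NULL)
      /* Resuming: "pop" the one saved state and jump back to wherever
       * the earlier reply cycle was interrupted, rather than trying to
       * parse a reply header that is not on the wire. */
      return h->reply_cmd->state;
    /* Fresh reply: read and parse its header as before. */
    return STATE_REPLY_RECV_HEADER;
  }

The restriction compared to a general push/pop stack is that only a
processing sub-state of the REPLY group can ever be parked this way.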


>> +++ b/lib/internal.h
>> @@ -253,6 +253,7 @@ struct command_in_flight {
>>    uint32_t count;
>>    void *data; /* Buffer for read/write */
>>    struct command_cb cb;
>> +  enum state state; /* State to resume with on next POLLIN */
>>    bool data_seen; /* For read, true if at least one data chunk seen */
>>    uint32_t error; /* Local errno value */
>>  };
> 
> The patch seems reasonable.  Does this obviate any need to split the
> state machine?

Yes, as far as I can see, this is all we ever have to worry about.  It
is possible to determine the entirety of wstate (either we're blocked
under the ISSUE_COMMAND sub-state, or h->request and h->in_write_payload
are sufficient to see what remains of the current request), and the
entirety of rstate (either we're under the REPLY sub-state, or
h->reply_cmd coupled with h->rlen and h->reply_cmd->state determines
what remains of the current reply).
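
In toy form again (same invented shape as the sketch earlier, with the
write side added; h->request and h->in_write_payload are the fields
named above, everything else is a stand-in rather than internal.h):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct command_in_flight;     /* as in the earlier sketch */

  /* Toy handle: everything needed to pick up where we left off lives
   * in these few fields. */
  struct toy_handle {
    /* Write side: either we are blocked inside the ISSUE_COMMAND group,
     * or these two fields say how much of the request is still unsent. */
    uint8_t request[28];        /* request header being written */
    bool in_write_payload;      /* header done, payload partially sent */

    /* Read side: either we are blocked inside the REPLY group, or
     * reply_cmd records the sub-state to resume and rlen says how many
     * bytes of the current reply are still expected. */
    size_t rlen;
    struct command_in_flight *reply_cmd;
  };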

> 
> ACK

Okay, I'll go ahead and push this one today.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
