[Libguestfs] [libnbd PATCH] states: Never block state machine inside REPLY
Eric Blake
eblake at redhat.com
Tue Jun 25 14:37:54 UTC 2019
On 6/25/19 4:03 AM, Richard W.M. Jones wrote:
> On Wed, Jun 19, 2019 at 01:18:01PM -0500, Eric Blake wrote:
>> Oddly enough, I am not getting any measurable performance difference
>> with this patch applied and using examples/threaded-reads-and-writes
>> coupled with nbdkit. My explanation is that in the common case, once
>> a server has something to send, it is going to send the entire reply
>> as fast as it can, rather than sending a partial reply and waiting;
>> and our attempts to keep up to 64 commands in flight mean that the
>> real bottleneck is not in sending our next command (even if we have to
>> wait for the server's reply to finish), but rather at the server (if we
>> saturate the server's limit on in-flight requests, the server won't
>> read our next request until it has finished its current reply).
>> Still, I'm sure that it is possible to construct corner cases where
>> this shows more of an effect.
>
> What I'm going to say is, I think, stating the obvious, but my
> intuition is that there are going to be two types of load.  In the
> first and easiest case you want to read or write sequentially over the
> whole or a large section of the image (think: qemu-img convert).  In
> this case you know well in advance what parts of the image you want to
> read/write and can keep the in-flight queue full at all times.  This is
> what threaded-reads-and-writes actually tests.
>
> The harder case is random access (think: qemu with a database guest).
> There is a short queue, for example because of data dependencies
> between the requests, so the issued commands buffer is always short.
> Also there may be server latency because of the random access pattern.
> This kind of workload should show the benefit of this commit, but we
> don't really have a test for this kind of workload.
Yes, that's a fair assessment.
>> /* STATE MACHINE */ {
>> REPLY.START:
>> + /* If rlen is non-zero, we are resuming an earlier reply cycle. */
>> + if (h->rlen > 0) {
>> + if (h->reply_cmd) {
>> + assert (nbd_internal_is_state_processing (h->reply_cmd->state));
>> + SET_NEXT_STATE (h->reply_cmd->state);
>
> This is essentially the "stack of states" idea that I had early on,
> but with a maximum stack depth of 1.
>
> I originally rejected the idea then because it meant that the
> generator would no longer have a complete view of all state
> transitions in the state machine, and thus wouldn't be able to enforce
> invariants such as not jumping to a state in the middle of a state
> group.  (In fact with this patch that is now the case -- I thought the
> generator would give an error about this, but maybe my test is wrong.)
>
> Nevertheless given that we need to do this, maybe it's better to drop
> the idea that the generator needs to have a complete view.
>
> In my original plan we would have had "push state" and "pop state"
> operations (pop state is a general jump to "any state", which is why
> it breaks the assumption in the generator).
In addition to stacking at most one state, this also limits the
stacking to occur during the REPLY sub-state.  But I was also
surprised that the generator did not balk at me setting the state to
the contents of a variable rather than to the usual all-caps %FOO
magic identifier.
>> +++ b/lib/internal.h
>> @@ -253,6 +253,7 @@ struct command_in_flight {
>> uint32_t count;
>> void *data; /* Buffer for read/write */
>> struct command_cb cb;
>> + enum state state; /* State to resume with on next POLLIN */
>> bool data_seen; /* For read, true if at least one data chunk seen */
>> uint32_t error; /* Local errno value */
>> };
>
> The patch seems reasonable. Does this obviate any need to split the
> state machine?
Yes, as far as I can see, this is all we should ever have to worry
about.  It is possible to determine the entirety of wstate (either we're
blocked under the ISSUE_COMMAND sub-state, or h->request and
h->in_write_payload are sufficient to see what remains of the current
request), and the entirety of rstate (either we're under the REPLY
sub-state, or h->reply_cmd coupled with h->rlen and h->reply_cmd->state
determines what remains of the current reply).
>
> ACK
Okay, I'll go ahead and push this one today.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org