[libvirt] [PATCH v2 11/38] Introduce virStreamSkip

Michal Privoznik mprivozn at redhat.com
Mon May 15 08:54:03 UTC 2017


On 05/15/2017 10:25 AM, Daniel P. Berrange wrote:
> On Fri, May 12, 2017 at 09:29:27AM +0200, Michal Privoznik wrote:
>> On 05/05/2017 04:48 PM, Daniel P. Berrange wrote:
>>> On Fri, May 05, 2017 at 01:25:34PM +0200, Michal Privoznik wrote:
>>>> On 05/04/2017 11:29 PM, John Ferlan wrote:
>>>>>
>>>>>
>>>>> On 04/20/2017 06:01 AM, Michal Privoznik wrote:
>>>>>> This API can be used to tell the other side of the stream to skip
>>>>>
>>>>> s/can be/is  (unless it can be used for something else ;-))
>>>>>
>>>>>> some bytes in the stream. This can be used to create a sparse
>>>>>> file on the receiving side of a stream.
>>>>>>
>>>>>> It takes just one argument @length, which says how big the hole
>>>>>> is. Since our streams are not rewindable like regular files, we
>>>>>> don't need @whence argument like seek(2) has.
>>>>>
>>>>> lseek is an implementation detail...  However, it could be stated that
>>>>> the skipping would be from the current point in the file forward by some
>>>>> number of bytes.  It's expected to be used in conjunction with code that
>>>>> is copying over the real (or non-zero) data and should be considered an
>>>>> optimization over sending zere data segments.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>>>>> ---
>>>>>>  include/libvirt/libvirt-stream.h |  3 +++
>>>>>>  src/driver-stream.h              |  5 ++++
>>>>>>  src/libvirt-stream.c             | 57 ++++++++++++++++++++++++++++++++++++++++
>>>>>>  src/libvirt_public.syms          |  1 +
>>>>>>  4 files changed, 66 insertions(+)
>>>>>>
>>>>>
>>>>> While it would be unused for now, should @flags be added.  Who knows
>>>>> what use it could have, but avoids a new Flags API, but does cause a few
>>>>> other wording changes here.
>>>>
>>>> Ah sure. We should have @flags there. Good point.
>>>>
>>>>>
>>>>> Perhaps it's just me - but "Skip" and "HoleSize" just seem awkward.
>>>>> Would "virStreamSetSkip" and "virStreamGetSkip" be more apropos? (or
>>>>> s/Skip/HoleSize/ - ewww). Names would then follow our more recent
>>>>> function naming guidelines. I think I dislike the HoleSize much more
>>>>> than the Skip.
>>>>
>>>> SetSkip and GetSkip sound wrong to me instead :D
>>>>
>>>>>
>>>>>> diff --git a/include/libvirt/libvirt-stream.h b/include/libvirt/libvirt-stream.h
>>>>>> index bee2516..4e0a599 100644
>>>>>> --- a/include/libvirt/libvirt-stream.h
>>>>>> +++ b/include/libvirt/libvirt-stream.h
>>>>>> @@ -50,6 +50,9 @@ int virStreamRecvFlags(virStreamPtr st,
>>>>>>                         size_t nbytes,
>>>>>>                         unsigned int flags);
>>>>>>  
>>>>>> +int virStreamSkip(virStreamPtr st,
>>>>>> +                  unsigned long long length);
>>>>>
>>>>> Was there consideration for using 'off_t' instead of ULL? I know it's an
>>>>> implementation detail of virFDStreamData and lseek() usage, but it does
>>>>> hide things... IDC either way.
>>>>
>>>> The problem with off_t is that it is signed type, while ULL is unsigned.
>>>> There's not much point in sending a negative offset, is there?
>>>> Moreover, we use ULL for arguments like offset (not sure really why).
>>>> Frankly, I don't really know why. Perhaps some types don't exist everywhere?
>>>
>>> If anything, we would use  size_t, for consistency with the Send/Recv
>>> methods. 
>>
>> So I've given this some though and ran some experiments. On a 32bit arch
>> I've found this:
>>
>> long long 8 signed
>> size_t 4 unsigned
>> ssize_t 4 signed
>> off_t 4 signed
>>
>> So size_t is 4 bytes long and long long is 8 bytes. This got me
>> thinking, size_t type makes sense for those APIs where we need to
>> address individual bytes. But what would happen if I have the following
>> file on a 32 bit arch:
>>
>> [2MB data] -> [5GB hole] -> [2M data]
>>
>> The hole does not fit into size_t, but it would fit into long long. On
>> the other hand, we are very likely to hit lseek() error as off_t is 4
>> bytes also (WTF?!). On a 64 bit arch everything is as expected:
> 
> Were you testing libvirt, or testing a standalone demo program ?

Standalone demo program, that basically does this:

#define PRINT_SIZE(x)   printf( #x " %d %s\n", sizeof(x), (x)-1 < 0 ?
"signed" : "unsigned")

PRINT_SIZE(long long);
PRINT_SIZE(size_t);
PRINT_SIZE(off_t);


> 
> Libvirt should be defining the macro that enables large file support,
> which turns off_t into a 64-bit type.  So in fact, we would actually
> be wrong to use size_t - off_t or long long is actually what we need
> here.

I think we really need just long long. Consider that even though libvirt
enables large file support internally, we don't enable it at the header
files level. It's responsibility of developers of mgmt application to do
that. Having said that, if we had off_t in public API we can't possibly
know its size as it depends on something out of our control. Therefore,
I see long long as our only option.

Now, question is whether we want signed or unsigned long long. I don't
have an opinion about that. On one hand, off_t is signed, but that's
because lseek() can seek backwards. We don't have that in our streams.
Yet. On the other hand, our streams are different to regular files. I
view them as a unidirectional pipe. With some extensions (e.g. sparse
messages). lseek() doesn't work over pipes, does it. But then again,
long long might be more future proof, if we will ever want to assign a
meaning to negative seeks.

Michal




More information about the libvir-list mailing list