[PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

Vladimir Sementsov-Ogievskiy vsementsov at virtuozzo.com
Fri Oct 16 14:28:42 UTC 2020

15.10.2020 20:37, Andrey Shinkevich wrote:
> On 15.10.2020 18:56, Max Reitz wrote:
>> On 14.10.20 20:57, Andrey Shinkevich wrote:
>>> On 14.10.2020 15:01, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> Limit COR operations by the base node in the backing chain when the
>>>>> overlay base node name is given. It will be useful for a block stream
>>>>> job when the COR-filter is applied. The overlay base node is passed as
>>>>> the base itself may change due to concurrent commit jobs on the same
>>>>> backing chain.
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich at virtuozzo.com>
>>>>> ---
>>>>>    block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>>    1 file changed, 37 insertions(+), 2 deletions(-)
>>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>>> index c578b1b..dfbd6ad 100644
>>>>> --- a/block/copy-on-read.c
>>>>> +++ b/block/copy-on-read.c
>>>>> @@ -122,8 +122,43 @@ static int coroutine_fn
>>>>> cor_co_preadv_part(BlockDriverState *bs,
>>>>>                                               size_t qiov_offset,
>>>>>                                               int flags)
>>>>>    {
>>> [...]
>>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>>>> +                                          state->base_overlay, true,
>>>>> offset,
>>>>> +                                          n, &n);
>>>>> +            if (ret) {
>>>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>>>> +            }
>>>>> +        }
>>>> Furthermore, I just noticed – can the is_allocated functions not return
>>>> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>>>>    (I’m not sure.)
>>>> Max
>>> The check for EOF is managed earlier in the stream_run() for a
>>> block-stream job. For other cases of using the COR-filter, the check for
>>> EOF can be added to the cor_co_preadv_part().
>>> I would be more than happy if we can escape the duplicated checking for
>>> is_allocated in the block-stream. But how the stream_run() can stop
>>> calling the blk_co_preadv() when EOF is reached if is_allocated removed
>>> from it?
>> True.  Is it that bad to lose that optimization, though?  (And I would
>> expect the case of a short backing file to be rather rare, too.)
>>> May the cor_co_preadv_part() return EOF (or other error code)
>>> to be handled by a caller if (ret == 0 && n == 0 && (flags &
>> That sounds like a bad hack.  I’d rather keep the double is_allocated().
>> But what would be the problem with losing the short backing file
>> optimization?  Just performance?  Or would we end up writing actual
>> zeroes into the overlay past the end of the backing file?  Hm, probably
>> not, if the COR filter would detect that case and handle it like stream
>> does.
>> So it seems only a question of performance to me, and I don’t think it
>> would be that bad to in this rather rare case to have a bunch of useless
>> is_allocated and is_allocated_above calls past the backing file’s EOF.
>> (Maybe I’m wrong, though.)
>> Max
> Thank you, Max, for sharing your thoughts on this subject.
> The double check for the is_allocated in the stream_run() is a performance degradation also.
> And we will make a check for the EOF in the cor_co_preadv_part() in either case, won't we?
> Andrey

I'd keep is_allocated logic in block-stream as is for now. It's not good that we check block-status several times (in block-stream, than in cor filter, than in generic COR code), but it shouldn't be real problem, and we can postpone optimizations for the next step.

Also, the resulting architecture is not final. I believe that in bright future block-stream will work through block-copy like backup does. And COR filter will call block_copy() by itself, and generic COR code will be dropped together with BDRV_REQ_COR flag. And stream will do just one background call of block_copy for the whole device, like backup in finish on my in-flight backup series. And all extra levels of block_status checking will leave.

About EOF problem discussed here, let's look at more generic problem: we are going to skip _large_ area, but skipping chunk-by-chunk is inefficient. So, we just want to learn to skip large areas. The simplest way is just to call is_allocated/is_allocated_above
from current offset to device end, if we decided to skip current chunk. Then we'll know how much to skip totally. But that kind of optimization is not directly related to these series and may be done in separate if needed.

Best regards,

More information about the libvir-list mailing list