[libvirt] PATCH: Disable QEMU drive caching

Steve Ofsthun sofsthun at virtualiron.com
Wed Oct 8 19:03:54 UTC 2008


Anthony Liguori wrote:
> Daniel P. Berrange wrote:
>> On Wed, Oct 08, 2008 at 11:06:27AM -0500, Anthony Liguori wrote:
>>> Sorry, it was mistakenly private - fixed now.
>> Xen does use O_DIRECT for paravirt driver case  - blktap is using the
>> combo
>> of AIO+O_DIRECT.
> 
> You have to use O_DIRECT with linux-aio.  And blktap is well known to
> have terrible performance.  Most serious users use blkback/blkfront and
> blkback does not avoid the host page cache.  It maintains data integrity
> by passing through barriers from the guest to the host.  You can
> approximate this in userspace by using fdatasync.
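
(For anyone following along: the AIO+O_DIRECT combination referenced above boils down to roughly the following.  This is a hedged sketch only, not blktap's or QEMU's actual code; the image path and sizes are invented.  Build with -laio.)

#define _GNU_SOURCE              /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* O_DIRECT bypasses the host page cache entirely. */
    int fd = open("/var/lib/images/guest.img", O_RDWR | O_DIRECT);
    if (fd < 0)
        return 1;

    /* O_DIRECT requires sector-aligned buffers, offsets and lengths. */
    void *buf;
    if (posix_memalign(&buf, 512, 4096))
        return 1;
    memset(buf, 0, 4096);

    io_context_t ctx = 0;
    if (io_setup(1, &ctx) < 0)
        return 1;

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, 4096, 0);   /* async write at offset 0 */
    if (io_submit(ctx, 1, cbs) != 1)
        return 1;

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);      /* wait for completion */

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}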

This is not accurate (at least for HVM guests using PV drivers on Xen 3.2).  blkback does indeed bypass the host page cache completely.  Its I/O behavior is akin to O_DIRECT: I/O is DMA'd directly to/from guest pages without involving any dom0 buffering.  blkback's barrier support only enforces write ordering within the blkback I/O stream(s); it does nothing to synchronize data in the host page cache.

Data written through blkback therefore modifies the storage "underneath" any data in the host page cache, without flushing or invalidating the cached copies, so subsequent access to the page cache by qemu-dm will see stale data.  In our own Xen product we must explicitly flush the backing store's data from the host page cache at qemu-dm start-up to guarantee proper data access.  It is not safe to access the same backing object with both qemu-dm and blkback simultaneously.
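
(Roughly what that start-up flush amounts to, reduced to a sketch.  This is illustrative rather than our product code, and the helper name is invented: write back any dirty pages for the backing file, then ask the kernel to drop the now-clean cached copies so that later buffered reads see what blkback wrote underneath them.)

#define _XOPEN_SOURCE 600        /* for posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>

int flush_backing_cache(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Push any dirty pages for this file down to storage first. */
    if (fdatasync(fd) < 0) {
        close(fd);
        return -1;
    }

    /* Then invalidate the (now clean) cached pages for the file. */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return ret;
}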

> The issue the bug addresses, iozone performs better than native, can be
> addressed in the following way:
> 
> 1) For IDE, you have to disable write-caching in the guest.  This should
> force an fdatasync in the host.
> 2) For virtio-blk, we need to implement barrier support.  This is what
> blkfront/blkback do.

I don't think this is enough.  Barrier semantics are local to a particular I/O stream.  There would be no reason for the barrier to affect the host page cache (unless the I/Os are buffered by the cache).
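
(To make that concrete: a guest barrier could only be honoured this way for writes that were buffered through the host page cache on the image file descriptor in the first place.  A hedged sketch with a made-up helper name, not virtio-blk or QEMU code.)

#include <unistd.h>

/* Honour a guest-issued barrier by flushing the one file the emulated
 * writes were buffered through.  This says nothing about other
 * descriptors, or about paths (such as blkback) that bypass the host
 * page cache entirely. */
int handle_guest_barrier(int image_fd)
{
    /* Preceding guest writes were ordinary buffered pwrite()s on
     * image_fd; fdatasync() pushes that file's dirty pages to disk. */
    return fdatasync(image_fd);
}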

> 3) For SCSI, we should support ordered queuing which would result in an
> fdatasync when barriers are injected.
> 
> This would result in write performance being what was expected in the
> guest while still letting the host coalesce IO requests, perform
> scheduling with other guests (while respecting each guest's own ordering
> requirements).

I generally agree with your suggestion that host page cache performance benefits shouldn't be discarded just to make naive benchmark data collection easier.  Anyone suggesting that QEMU-emulated disk I/O could somehow outperform the host I/O system should recognize that something is wrong with their benchmark setup.  Unfortunately this discussion keeps reappearing in the Xen community, and I am sure that as QEMU/KVM/virtio matures, similar threads will continue to resurface.

Steve

> 
> Regards,
> 
> Anthony Liguori
> 
>>  QEMU code is only used for the IDE emulation case which isn't
>> interesting from a performance POV.
>>
>> Daniel
>>   
> 