[libvirt] PATCH: Disable QEMU drive caching

Wed Oct 8 16:06:27 UTC 2008

Daniel P. Berrange wrote:
> On Wed, Oct 08, 2008 at 01:15:46PM +0200, Chris Lalancette wrote:
>> Daniel P. Berrange wrote:
>>> QEMU defaults to allowing the host OS to cache all disk I/O. This has a
>>> couple of problems:
>>>
>>>  - It is a waste of memory because the guest already caches I/O ops
>>>  - It is unsafe on host OS crash - all unflushed guest I/O will be
>>>    lost, and there are no ordering guarantees, so metadata updates could
>>>    be flushed to disk, while the journal updates were not. Say goodbye
>>>    to your filesystem.
>>>  - It makes benchmarking more or less impossible / worthless because
>>>    what the benchmark thinks are disk writes just sit around in memory
>>>    so guest disk performance appears to exceed host disk performance.
>>>
>>> This patch disables caching on all QEMU guests. NB, Xen has long done this
>>> for both PV & HVM guests - QEMU only gained this ability when -drive was
>>> introduced, and sadly kept the default to unsafe cache=on settings.
>> I'm for this in general, but I'm a little worried about the "performance
>> regression" aspect of this.  People are going to upgrade to 0.4.7 (or whatever),
>> and suddenly find that their KVM guests perform much more slowly.  This is
>> better in the end for their data, but we might hear large complaints about it.
> 
> Yes & no. They will find their guests perform more consistently. With the
> current system their guests will perform very erratically depending on 
> memory & I/O pressure on the host. If the host I/O cache is empty & has 
> no I/O load, current guests will be "fast",

They will perform marginally better than if cache=off.  This is because 
the Linux host knows more about the underlying hardware than the guest 
and is able to do smarter read-ahead.  When using cache=off, the host 
cannot perform any sort of read-ahead.

> but if host I/O cache is full
> and they do something which requires more host memory (eg start up another
> guest), then all existing guests get their I/O performance trashed as the
> I/O cache has to be flushed out, and future I/O is unable to be cached. 

This is not accurate.  Dirty pages in the host page cache are not 
reclaimable until they're written to disk.  If you're in a seriously low 
memory situation, then the thing allocating memory is going to sleep 
until the data is written to disk.  If an existing guest is trying to do 
I/O, then what things will degenerate to is basically cache=off, since 
the guest must wait for other pending I/O to complete.
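
One rough way to watch this on a Linux host is to look at how much dirty 
data is still waiting for writeback.  A minimal sketch, assuming the 
usual "Name:  value kB" layout of /proc/meminfo; the function names are 
made up for illustration:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most values are in kB; a few counters (e.g. HugePages_Total) have no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_and_writeback(path="/proc/meminfo"):
    """Return (Dirty, Writeback) as reported by the host, or None if absent."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty"), info.get("Writeback")
```

Dirty is data in the page cache that has not been written out yet; 
Writeback is data currently being written.  Neither can be reclaimed 
until the write completes.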

> Xen went through this same change and there were not any serious
> complaints, particularly once it was explained that the previous system had
> zero data integrity guarantees. The current system merely provides an
> illusion of performance - any attempt to show that performance has 
> decreased is impossible because any attempt to run benchmarks with
> existing caching just results in meaningless garbage.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=444047

I can't see this bug, but a quick grep of ioemu in xen-unstable for 
O_DIRECT reveals that they are not in fact using O_DIRECT.

O_DIRECT, O_SYNC, and fsync are not the same mechanism.
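
A minimal sketch of the distinction, assuming a Linux host and Python's 
os module.  The helper names and the 4096-byte alignment are 
illustrative assumptions, not a description of what QEMU or ioemu 
actually do:

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real code should query the device's block size


def write_then_fsync(path, data):
    """Buffered write through the page cache, flushed by an explicit fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # may return while the data is only in the page cache
        os.fsync(fd)        # blocks until the kernel has pushed it to the device
    finally:
        os.close(fd)


def write_o_sync(path, data):
    """Still uses the page cache, but each write() returns only after the
    data has been handed to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_o_direct(path, data):
    """Bypass the host page cache (Linux only).

    Buffer address, length and file offset must be suitably aligned, and
    some filesystems (tmpfs, for example) reject O_DIRECT with an OSError.
    O_DIRECT alone is about caching, not about durability.
    """
    if not data or len(data) % BLOCK:
        raise ValueError("length must be a non-zero multiple of BLOCK")
    buf = mmap.mmap(-1, len(data))  # anonymous mmap gives a page-aligned buffer
    try:
        buf.write(data)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o600)
        try:
            os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

The first two are about when data reaches the disk; only the third 
changes whether the host caches it at all.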

Regards,

Anthony Liguori