[libvirt] PATCH: Disable QEMU drive caching
Daniel P. Berrange
berrange at redhat.com
Wed Oct 8 16:33:37 UTC 2008
On Wed, Oct 08, 2008 at 11:06:27AM -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> >On Wed, Oct 08, 2008 at 01:15:46PM +0200, Chris Lalancette wrote:
> >>Daniel P. Berrange wrote:
> >>>QEMU defaults to allowing the host OS to cache all disk I/O. This has a
> >>>couple of problems:
> >>>
> >>> - It is a waste of memory because the guest already caches I/O ops
> >>> - It is unsafe on host OS crash - all unflushed guest I/O will be
> >>> lost, and there are no ordering guarantees, so metadata updates could
> >>> be flushed to disk while the journal updates were not. Say goodbye
> >>> to your filesystem.
> >>> - It makes benchmarking more or less impossible / worthless because
> >>> what the benchmark thinks are disk writes just sit around in memory,
> >>> so guest disk performance appears to exceed host disk performance.
> >>>
> >>>This patch disables caching on all QEMU guests. NB, Xen has long done
> >>>this for both PV & HVM guests - QEMU only gained this ability when
> >>>-drive was introduced, and sadly kept the unsafe cache=on behaviour
> >>>as the default.
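(As a side note for anyone following the thread: the change boils down to
appending cache=off to every -drive argument libvirt generates. A minimal
sketch of the idea in C is below - the helper name and option layout are
invented for illustration, it is not the actual patch.)

#include <stdio.h>

/* Hypothetical helper, for illustration only (not the actual libvirt
 * patch): format one -drive argument for a guest disk, always appending
 * cache=off so QEMU opens the image with O_DIRECT instead of going
 * through the host page cache. */
static int
format_drive_arg(char *buf, size_t buflen, const char *path, int index)
{
    return snprintf(buf, buflen,
                    "file=%s,index=%d,media=disk,cache=off",
                    path, index);
}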
> >>I'm for this in general, but I'm a little worried about the "performance
> >>regression" aspect of this. People are going to upgrade to 0.4.7 (or
> >>whatever),
> >>and suddenly find that their KVM guests perform much more slowly. This is
> >>better in the end for their data, but we might hear large complaints
> >>about it.
> >
> >Yes & no. They will find their guests perform more consistently. With the
> >current system their guests will perform very erratically depending on
> >memory & I/O pressure on the host. If the host I/O cache is empty & has
> >no I/O load, current guests will be "fast",
>
> They will perform marginally better than if cache=off. This is because
> the Linux host knows more about the underlying hardware than the guest
> and is able to do smarter read-ahead. When using cache=off, the host
> cannot perform any sort of read-ahead.
>
> >but if the host I/O cache is full
> >and they do something which requires more host memory (e.g. start up another
> >guest), then all existing guests get their I/O performance trashed as the
> >I/O cache has to be flushed out, and future I/O cannot be cached.
>
> This is not accurate. Dirty pages in the host page cache are not
> reclaimable until they're written to disk. If you're in a seriously low
> memory situation, then the thing allocating memory is going to sleep
> until the data is written to disk. If an existing guest is trying to do
> I/O, then things will basically degenerate to cache=off, since
> the guest must wait for other pending I/O to complete.
>
> >Xen went through this same change and there were no serious
> >complaints, particularly when it was explained that the previous system
> >had zero data integrity guarantees. The current system merely provides
> >an illusion of performance - showing that performance has decreased is
> >impossible, because running benchmarks with the existing caching just
> >results in meaningless garbage.
> >
> >https://bugzilla.redhat.com/show_bug.cgi?id=444047
>
> I can't see this bug, but a quick grep of ioemu in xen-unstable for
> O_DIRECT reveals that they are not in fact using O_DIRECT.
Sorry, it was mistakenly private - fixed now.
Xen does use O_DIRECT for the paravirt driver case - blktap uses the
combination of AIO + O_DIRECT. The QEMU code is only used for the IDE
emulation case, which isn't interesting from a performance POV.
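To spell out what that means at the syscall level: "using O_DIRECT" simply
means the image or device is opened with the O_DIRECT flag, so reads &
writes bypass the host page cache. A minimal sketch (illustrative only,
not code taken from blktap or QEMU):

#define _GNU_SOURCE           /* needed for O_DIRECT on glibc */
#include <fcntl.h>

/* Illustrative only: opening with O_DIRECT makes reads & writes bypass
 * the host page cache; the caller must then use suitably aligned buffers
 * and transfer sizes. */
int open_image_direct(const char *path)
{
    return open(path, O_RDWR | O_DIRECT);
}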
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|