[libvirt] [virt-devel] RFC: Modelling timers / clocks & tick policies in libvirt
Daniel P. Berrange
berrange at redhat.com
Fri Mar 5 17:05:06 UTC 2010
On Fri, Mar 05, 2010 at 06:50:47AM -1000, Zachary Amsden wrote:
> On 03/05/2010 04:27 AM, Daniel P. Berrange wrote:
> >
> > * HPET
> > Multiple timers with periodic interrupts
> > Can replace PIT/RTC timers
> >
> >They all generally suck in real hardware, and this gets worse in virtual
> >machines.
> >Many different approaches to making them suck less in VMWare, Xen & KVM,
> >but there are some reasonably common concepts....
> >
>
> HPET doesn't suck.
The VMWare timekeeping docs mention that it has timeout race conditions,
a poorly defined spec for timer granularity, drift & speed of access, & bad
implementations in the real world, which I read as 'sucks' ;-)
> > * Interrupt timers
> >
> > - Ticks can not always be delivered on time
> >
> > Policies to deal with "missed" ticks:
> >
> > 1. Deliver at normal rate without catchup
> > 2. Deliver at higher rate to catch up
> > 3. Merge into 1 tick & deliver asap
> >
> > 4. Discard all missed ticks
> >
>
> The issue is actually more complex than just these policies. A naive
> implementation of the policy leads to a guest DOS of the host.
>
> We actually have such a bug, and it demands a policy which merges ticks
> over a certain threshold and does not deliver ASAP. It's tricky and
> complex to fix because it means our notion of timers for the guest is
> wrong, and we need to introduce a higher order scheduling behaviour.
>
> In general, there isn't much we can tune here, but what we can tune is
> whether the other counters (RTC / HPET / TSC / ACPI) stay in sync with
> ticks delivered. It's not perfect or completely well defined because
> the tick can't actually be delivered until a fairly complex set of
> hardware rules is obeyed. This may not be apparent now, because it gets
> worse as we implement more hardware support for NMIs and SMIs. An ideal
> solution would sync the other counters when the tick is generated, not
> when it is injected. However, this leads us back to the DOS attack.
> There are also problems with SMP timing here (which CPU gets timer
> interrupts can change, and are they broadcast?). These problems are
> made worse because we don't gang schedule.
FYI, I wasn't trying to suggest good / bad policies here. I was just
attempting to document the policies that I see have been implemented
so far. For the libvirt XML the key issue is to identify a way to
list possible policies that can be extended as new ones appear in
hypervisors.
> > * TSC
> > - rdtsc instruction can be exposed to guests in two ways
> >
> > 1. Trap + emulate (slow, but more reliable)
> > 2. Native (fast, but possibly unreliable)
> >
> > Optionally also expose a 'rdtscp' instruction
> >
> > Possibly set a fixed HZ independent of host.
> >
>
> There is also
>
> 3) a mixed approach; trap and emulate only when required, allow native
> access and offset appropriately at each exit; and
>
> 4) a SMP safe approach; trap and emulate always, and interlock SMP
> access to the clock so it is globally consistent
>
> 5) a secure approach; trap and emulate always and hide host time. This
> precludes the possibility of SMP, as timing differences can be observed
> since we don't gang schedule. This obviously has implications for the
> other timers.
>
> So this variable is not a simple boolean, but a multi-choice.
Yep, I captured this increased range of options later after seeing that
Xen has 4 possible choices now!
> >------------------
> >
> > * All timers run in "apparent time" ie track guest wallclock
> > * Missed tick policy is to deliver at higher rate to catchup
> > * TSC can be switched between native/emulate (virtual_rdtsc=TRUE|FALSE)
> > * TSC can have hardcoded HZ in emulate mode (apparentHZ=VALUE)
> > * RTC time of day is synced to host at startup (rtc.diffFromUTC or
> > rtc.startTime)
> > * VMWare tools reset guest TOD if it gets out of sync
> >
>
> There is also lateness hiding; (timeTracker.hideLateness); adjust TSC to
> compensate for lateness of injected interrupts (it's the slightly buggy
> counter compensation at each tick I mention above).
Thanks, I'd not seen any reference to that one in the docs.
> >Xen timekeeping
> >---------------
> >
> > * TSC. Can run in 4 modes
> >
> > - auto: emulate if host TSC is unstable. native with invariant TSC
> > - native: always native regardless of host TSC stability
> > - emulate: trap + emulate regardless of host TSC invariant
> > - pvrdtsc: native, requiring invariant TSC. Also exposes rdtscp
> > instruction
> >
>
> TSC is complex enough without RDTSCP. Let's consider rdtscp as a host
> optimization for vendors of hardware with buggy clocks who want fast
> gettimeofday system calls. We already are compensating to try to keep
> virtual TSC in sync on KVM and probably don't need this mode.
I included rdtscp because it is one of the things that the latest Xen 4.0
tree now implements, so we need to be able to represent it in the libvirt XML.
> >Meaning of 'mode':
> >
> > Control how the clock is exposed to guest.
> >
> > auto: native if safe, otherwise emulate
> > native: always native
> > emulate: always emulate
> > paravirt: native + paravirtualize
> >
> > NB: Only relevant for TSC. All other timers are always emulated.
> >
>
> auto, native, emulate can map nicely for us, but it would be good to
> have an smp safe mode. (A secure mode is more of a global setting for
> all timers).
For any of the enumerations I fully expect that we would add further allowed
values to the libvirt XML over time. The goal is to get the baseline on
current implementations & try to keep it easily extensible for future ideas.
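
As a rough illustration of what "easily extensible" could look like on the
consuming side, the allowed 'mode' values can live in one table that grows
over time. This is only a sketch in Python, not libvirt code: the helper name
tsc_timer_xml and the TIMER_MODES table are made up for the example, and the
values are simply the four listed in the quoted proposal.

```python
from xml.etree import ElementTree as ET

# Allowed values for the proposed 'mode' attribute, copied from the quoted
# proposal. Supporting a new value later (say, a hypothetical SMP-safe mode)
# would mean adding an entry here, not changing the code below.
TIMER_MODES = {"auto", "native", "emulate", "paravirt"}


def tsc_timer_xml(mode, frequency=None):
    """Render a <timer name='tsc' .../> element (hypothetical helper)."""
    if mode not in TIMER_MODES:
        raise ValueError("unsupported timer mode: %r" % (mode,))
    timer = ET.Element("timer", {"name": "tsc", "mode": mode})
    if frequency is not None:
        # Optional fixed frequency, as in the proposed 'frequency' attribute.
        timer.set("frequency", str(frequency))
    return ET.tostring(timer, encoding="unicode")
```

Unknown values are rejected explicitly rather than silently mapped to a
default, so a config written for a newer schema fails loudly on older code.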
> >Mapping to VMWare
> >-----------------
> >
> >eg with guest config showing
> >
> > diffFromUTC='123456'
> > apparentHZ='123456'
> > virtual_rdtsc=False
> >
> >libvirt XML gets:
> >
> > <clock mode='variable' adjustment='123456'>
> > <timer name='tsc' frequency='123456' mode='native'/>
> > </clock>
> >
> >
> >Mapping to Xen
> >--------------
> >
> >eg with guest config showing
> >
> > timer_mode=3
> > hpet=1
> > tsc_mode=2
> > localtime=1
> >
> > <clock mode='localtime'>
> > <timer name='platform' tickpolicy='merge' wallclock='host'/>
> > <timer name='hpet'/>
> > <timer name='tsc' mode='native'/>
> > </clock>
> >
> >
> >Mapping to KVM
> >--------------
> >
> >eg with guest ARGV showing
> >
> > -no-kvm-pit-reinjection
> > -clock base=localtime,clock=guest,driftfix=slew
> > -no-hpet
> >
> >
> > <clock mode='localtime'>
> > <timer name='rtc' tickpolicy='catchup' wallclock='guest'/>
> > <timer name='pit' tickpolicy='none'/>
> > <timer name='hpet' present='no'/>
> > </clock>
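
To make the KVM mapping above concrete, here is a minimal Python sketch that
turns the quoted ARGV into the quoted XML. It is an illustration under
assumptions, not an implementation: the function name kvm_argv_to_clock is
hypothetical, the option-to-attribute mapping is inferred from this one
example only, and the 'utc' fallback for a missing base= is my own guess.

```python
from xml.etree import ElementTree as ET


def kvm_argv_to_clock(argv):
    """Build the proposed <clock> element from a KVM argument list.

    Handles only the three options shown in the quoted example.
    """
    opts = {}
    if "-clock" in argv:
        # The value follows the flag, e.g. "base=localtime,clock=guest,driftfix=slew"
        value = argv[argv.index("-clock") + 1]
        opts = dict(kv.split("=", 1) for kv in value.split(","))

    # Assumption: fall back to 'utc' when no base= is given.
    clock = ET.Element("clock", {"mode": opts.get("base", "utc")})

    rtc = {"name": "rtc"}
    if opts.get("driftfix") == "slew":
        rtc["tickpolicy"] = "catchup"
    if "clock" in opts:
        rtc["wallclock"] = opts["clock"]
    ET.SubElement(clock, "timer", rtc)

    if "-no-kvm-pit-reinjection" in argv:
        ET.SubElement(clock, "timer", {"name": "pit", "tickpolicy": "none"})
    if "-no-hpet" in argv:
        ET.SubElement(clock, "timer", {"name": "hpet", "present": "no"})
    return clock
```

Calling it with the three arguments from the example yields a clock element
with mode='localtime' and the rtc, pit and hpet timers shown above.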
> >
> >
> >
> >Further reading
> >---------------
> >
> >VMWare has the best doc:
> >
> > http://www.vmware.com/pdf/vmware_timekeeping.pdf
> >
> >Xen:
> >
> > Docs on 'tsc_mode' at
> >
> > $SOURCETREE/docs/misc/tscmode.txt
> >
> > Docs for 'timer_mode' in the source code only:
> >
> > xen/include/public/hvm/params.h
> >
> >KVM:
> >
> > No docs at all. Guess from -help descriptions, reading source code &
> > asking clever people about it :-)
> >
>
> Let me propose an XML mapping a bit later today. I haven't had coffee
> yet, and we know what that can do.
Ok, thanks for the feedback so far.
Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|