[libvirt] [virt-devel] RFC: Modelling timers / clocks & tick policies in libvirt

Daniel P. Berrange berrange at redhat.com
Fri Mar 5 17:05:06 UTC 2010


On Fri, Mar 05, 2010 at 06:50:47AM -1000, Zachary Amsden wrote:
> On 03/05/2010 04:27 AM, Daniel P. Berrange wrote:
> >
> >  * HPET
> >     Multiple timers with periodic interrupts
> >     Can replace PIT/RTC timers
> >
> >They all generally suck in real hardware, and this gets worse in virtual
> >machines. Many different approaches to making them suck less in VMWare,
> >Xen & KVM, but there are some reasonably common concepts....
> >   
> 
> HPET doesn't suck.

The VMWare timekeeping docs mention that it has timeout race conditions,
a poorly defined spec for timer granularity, drift & speed of access, & bad
implementations in the real world, which I read as 'sucks' ;-)

> >  * Interrupt timers
> >
> >      - Ticks can not always be delivered on time
> >
> >        Policies to deal with "missed" ticks:
> >
> >         1. Deliver at normal rate without catchup
> >         2. Deliver at higher rate to catch up
> >         3. Merge into 1 tick & deliver asap
> >         4. Discard all missed ticks
> >   
> 
> The issue is actually more complex than just these policies.  A naive 
> implementation of the policy leads to a guest DOS of the host.
> 
> We actually have such a bug, and it demands a policy which merges ticks 
> over a certain threshold and does not deliver ASAP.  It's tricky and 
> complex to fix because it means our notion of timers for the guest is 
> wrong, and we need to introduce a higher order scheduling behaviour.
> 
> In general, there isn't much we can tune here, but what we can tune is 
> whether the other counters (RTC / HPET / TSC / ACPI) stay in sync with 
> ticks delivered.  It's not perfect or completely well defined because 
> the tick can't actually be delivered until a fairly complex set of 
> hardware rules is obeyed.  This may not be apparent now, because it gets 
> worse as we implement more hardware support for NMIs and SMIs.  An ideal 
> solution would sync the other counters when the tick is generated, not 
> when it is injected.  However, this leads us back to the DOS attack.  
> There are also problems with SMP timing here (which CPU gets timer 
> interrupts can change, and are they broadcast?).  These problems are 
> made worse because we don't gang schedule.

FYI, I wasn't trying to suggest good / bad policies here. I was just
attempting to document the policies that I see have been implemented
so far. For the libvirt XML the key issue is to identify a way to
list possible policies that can be extended as new ones appear in
hypervisors.
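
To make the four policies listed above concrete, here is a minimal Python
sketch of what each one might do during a single timer period. It is only
one reading of the list, not how any hypervisor implements it: the enum
names, the run_period() helper and the catchup_rate parameter are all
hypothetical, and it ignores the threshold-based merging Zachary describes.

```python
from enum import Enum


class MissedTickPolicy(Enum):
    """Hypothetical names for the four policies in the quoted list."""
    NO_CATCHUP = 1  # deliver at normal rate without catchup
    CATCHUP = 2     # deliver at higher rate to catch up
    MERGE = 3       # merge into 1 tick & deliver asap
    DISCARD = 4     # discard all missed ticks


def run_period(policy, backlog, catchup_rate=2):
    """Simulate one timer period in which one new tick falls due.

    `backlog` is the number of ticks missed so far. Returns a tuple
    (ticks delivered this period, backlog left afterwards).
    """
    if policy is MissedTickPolicy.NO_CATCHUP:
        # The oldest pending tick goes out and the new one queues up,
        # so the guest never recovers the lost time.
        return 1, backlog
    if policy is MissedTickPolicy.CATCHUP:
        # Deliver up to catchup_rate ticks until the backlog drains.
        pending = backlog + 1
        delivered = min(catchup_rate, pending)
        return delivered, pending - delivered
    if policy is MissedTickPolicy.MERGE:
        # All missed ticks collapse into a single extra tick.
        return (2 if backlog else 1), 0
    if policy is MissedTickPolicy.DISCARD:
        # Missed ticks are dropped; only the current tick is delivered.
        return 1, 0
    raise ValueError(f"unknown policy: {policy!r}")
```

With a backlog of 3 missed ticks, CATCHUP delivers 2 ticks and leaves 2
pending, while MERGE delivers 2 ticks and clears the backlog at once.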

> >  * TSC
> >      - rdtsc instruction can be exposed to guests in two ways
> >
> >         1. Trap + emulate (slow, but more reliable)
> >         2. Native         (fast, but possibly unreliable)
> >
> >        Optionally also expose a 'rdtscp' instruction
> >
> >        Possibly set a fixed HZ independent of host.
> >   
> 
> There is also
> 
> 3) a mixed approach; trap and emulate only when required, allow native 
> access and offset appropriately at each exit; and
> 
> 4) a SMP safe approach; trap and emulate always, and interlock SMP 
> access to the clock so it is globally consistent
> 
> 5) a secure approach; trap and emulate always and hide host time.  This 
> precludes the possibility of SMP, as timing differences can be observed 
> since we don't gang schedule.  This obviously has implications for the 
> other timers.
> 
> So this variable is not a simple boolean, but a multi-choice.

Yep, I captured this increased range of options later after seeing that
Xen has 4 possible choices now!

> >------------------
> >
> >  * All timers run in "apparent time", i.e. track guest wallclock
> >  * Missed tick policy is to deliver at higher rate to catch up
> >  * TSC can be switched between native/emulate (virtual_rdtsc=TRUE|FALSE)
> >  * TSC can have hardcoded HZ in emulate mode  (apparentHZ=VALUE)
> >  * RTC time of day is synced to host at startup (rtc.diffFromUTC or
> >    rtc.startTime)
> >  * VMWare tools reset guest TOD if it gets out of sync
> >   
> 
> There is also lateness hiding; (timeTracker.hideLateness); adjust TSC to 
> compensate for lateness of injected interrupts (it's the slightly buggy 
> counter compensation at each tick I mention above).

Thanks, I'd not seen any reference to that one in the docs.

> >Xen timekeeping
> >---------------
> >
> >   * TSC. Can run in 4 modes
> >
> >      - auto: emulate if host TSC is unstable. native with invariant TSC
> >      - native: always native regardless of host TSC stability
> >      - emulate: trap + emulate regardless of host TSC invariant
> >      - pvrdtsc: native, requiring invariant TSC. Also exposes rdtscp
> >        instruction
> >   
> 
> TSC is complex enough without RDTSCP.  Let's consider rdtscp as a host 
> optimization for vendors of hardware with buggy clocks who want fast 
> gettimeofday system calls.  We already are compensating to try to keep 
> virtual TSC in sync on KVM and probably don't need this mode.

I included rdtscp because it is one of the things that the latest Xen 4.0 tree
now implements, so we need to be able to represent it in the libvirt XML.

> >Meaning of 'mode':
> >
> >  Control how the clock is exposed to guest.
> >
> >       auto: native if safe, otherwise emulate
> >     native: always native
> >    emulate: always emulate
> >   paravirt: native + paravirtualize
> >
> >   NB: Only relevant for TSC. All other timers are always emulated.
> >   
> 
> auto, native, emulate can map nicely for us, but it would be good to 
> have an smp safe mode.  (A secure mode is more of a global setting for 
> all timers).

For any of the enumerations I fully expect that we would add further allowed
values to the libvirt XML over time. The goal is to get the baseline on
current implementations & try to keep it easily extensible for future ideas.
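
As a rough sketch of how the 'mode' values quoted above could be handled,
the Python below resolves 'auto' to a concrete mode and maps Xen's numeric
tsc_mode onto the proposed names. The function and variable names are
hypothetical. Only tsc_mode=2 -> 'native' comes from the Xen example in
this mail; the other three numbers are my assumption from Xen's
tscmode.txt and should be checked against it.

```python
# Mode names as proposed in the quoted text. New values can be added
# here as hypervisors grow new modes (e.g. an SMP-safe one).
KNOWN_MODES = {"auto", "native", "emulate", "paravirt"}

# Xen tsc_mode numbers. Only 2 -> "native" appears in this thread;
# 0, 1 and 3 are assumptions taken from Xen's tscmode.txt.
XEN_TSC_MODE = {0: "auto", 1: "emulate", 2: "native", 3: "paravirt"}


def resolve_tsc_mode(mode, host_tsc_safe):
    """Return the mode actually used for the TSC.

    'auto' means native if safe, otherwise emulate; every other known
    mode is used as given. Unknown modes are rejected rather than
    guessed at.
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unsupported timer mode: {mode!r}")
    if mode == "auto":
        return "native" if host_tsc_safe else "emulate"
    return mode
```

Rejecting unknown values outright, instead of falling back to 'auto',
keeps a config written for a newer hypervisor from silently changing
behaviour on an older one.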

> >Mapping to VMWare
> >-----------------
> >
> >eg with guest config showing
> >
> >    diffFromUTC='123456'
> >      apparentHZ='123456'
> >   virtual_rdtsc=False
> >
> >libvirt XML gets:
> >
> >   <clock mode='variable' adjustment='123456'>
> >     <timer name='tsc' frequency='123456' mode='native'/>
> >   </clock>
> >
> >
> >Mapping to Xen
> >--------------
> >
> >eg with guest config showing
> >
> >      timer_mode=3
> >      hpet=1
> >      tsc_mode=2
> >      localtime=1
> >
> >    <clock mode='localtime'>
> >      <timer name='platform' tickpolicy='merge' wallclock='host'/>
> >      <timer name='hpet'/>
> >      <timer name='tsc' mode='native'/>
> >    </clock>
> >
> >
> >Mapping to KVM
> >--------------
> >
> >eg with guest ARGV showing
> >
> >    -no-kvm-pit-reinjection
> >    -clock base=localtime,clock=guest,driftfix=slew
> >    -no-hpet
> >
> >
> >   <clock mode='localtime'>
> >     <timer name='rtc' tickpolicy='catchup' wallclock='guest'/>
> >     <timer name='pit' tickpolicy='none'/>
> >     <timer name='hpet' present='no'/>
> >   </clock>
> >
> >
> >
> >Further reading
> >---------------
> >
> >VMWare has the best doc:
> >
> >   http://www.vmware.com/pdf/vmware_timekeeping.pdf
> >
> >Xen:
> >
> >   Docs on 'tsc_mode' at
> >
> >     $SOURCETREE/docs/misc/tscmode.txt
> >
> >   Docs for 'timer_mode'  in the source code only:
> >
> >     xen/include/public/hvm/params.h
> >
> >KVM:
> >
> >   No docs at all. Guess from -help descriptions, reading source code &
> >   asking clever people about it :-)
> >   
> 
> Let me propose an XML mapping a bit later today.  I haven't had coffee 
> yet, and we know what that can do.

Ok, thanks for the feedback so far.
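
For illustration, the KVM mapping quoted above could be sketched in Python
roughly as follows. This is not libvirt code: kvm_argv_to_clock() is a
hypothetical helper, it handles only the three arguments from the example,
and the driftfix=slew -> tickpolicy='catchup' and clock=guest ->
wallclock='guest' rules are inferred from that single example.

```python
import xml.etree.ElementTree as ET


def kvm_argv_to_clock(argv):
    """Build a <clock> element from the KVM arguments in the example."""
    clock = ET.Element("clock")
    args = iter(argv)
    for arg in args:
        if arg == "-clock":
            # The value looks like "base=localtime,clock=guest,driftfix=slew".
            opts = dict(kv.split("=", 1) for kv in next(args).split(","))
            if "base" in opts:
                clock.set("mode", opts["base"])
            rtc = ET.SubElement(clock, "timer", name="rtc")
            if opts.get("driftfix") == "slew":
                rtc.set("tickpolicy", "catchup")
            if "clock" in opts:
                rtc.set("wallclock", opts["clock"])
        elif arg == "-no-kvm-pit-reinjection":
            ET.SubElement(clock, "timer", name="pit", tickpolicy="none")
        elif arg == "-no-hpet":
            ET.SubElement(clock, "timer", name="hpet", present="no")
    return clock


example = kvm_argv_to_clock([
    "-no-kvm-pit-reinjection",
    "-clock", "base=localtime,clock=guest,driftfix=slew",
    "-no-hpet",
])
print(ET.tostring(example, encoding="unicode"))
```

The timers come out in ARGV order rather than the order shown in the
example XML.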

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



