[libvirt] RFC: Modelling timers/ clocks / tick policies in libvirt

Daniel P. Berrange berrange at redhat.com
Fri Mar 5 14:28:55 UTC 2010


One of the things we've not dealt with in libvirt yet is how to model the
various evil hacks most virt products have for dealing with timers in
guests. This email tries to outline the problems & the way each virt system
has dealt with them. Finally it suggests how to manage this in libvirt
domain XML. Comments please :-)

             Virtual machine timer management in libvirt
             ===========================================

On PC hardware there are a number of terrible timers / clock sources available
to operating systems

 * PIT
    Timer with periodic interrupts

 * RTC

    Time of Day clock, continuous running
    Timer with periodic interrupts

 * Local APIC Timer
    Timer with periodic interrupts

 * ACPI Timer
    Timer with periodic interrupts

 * TSC
    Read via rdtsc instruction. No interrupts

    Unreliable on some hardware. eg changes frequency.
    Not synced between cores
    Different HZ across hosts

 * HPET
    Multiple timers with periodic interrupts
    Can replace PIT/RTC timers

They all generally suck in real hardware, and this gets worse in virtual machines.
VMWare, Xen & KVM take many different approaches to making them suck less, but there
are some reasonably common concepts....


Virtual timekeeping problems
----------------------------

Three primary problems / areas to deal with:

 * Time of day clock (RTC)

     - Initialized to UTC/Localtime/Timezone/UTC+offset
     - Two modes of operation:
        1. Guest wallclock: only runs when guest is executing. ie stopped across save/restore, etc
        2. Host  wallclock: runs continuously with host wall time.

 * Interrupt timers

     - Ticks can not always be delivered on time

       Policies to deal with "missed" ticks:

        1. Deliver at normal rate without catchup
        2. Deliver at higher rate to catch up
        3. Merge into 1 tick & deliver asap
        4. Discard all missed ticks

 * TSC
     - rdtsc instruction can be exposed to guests in two ways

        1. Trap + emulate (slow, but more reliable)
        2. Native         (fast, but possibly unreliable)

       Optionally also expose a 'rdtscp' instruction

       Possibly set a fixed HZ independent of host.
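
To make the four missed-tick policies above concrete, here is a rough Python
sketch of when ticks would reach a guest that cannot take interrupts for a
while. It is only a toy model: the function name deliver_ticks, the "stall"
window, and the choice of catching up at twice the normal rate are all
assumptions made for illustration, not how any of the hypervisors below
actually implement it.

```python
def deliver_ticks(policy, period, stall_start, stall_end, end, catchup_gap=None):
    """Return the times at which timer ticks reach the guest.

    Ticks fall due every `period` time units up to `end`. The guest cannot
    take interrupts during [stall_start, stall_end), so ticks falling due
    in that window count as "missed".
    """
    if catchup_gap is None:
        catchup_gap = period // 2  # assumption: catch up at 2x the normal rate
    stall_len = stall_end - stall_start
    delivered = []
    merged = False
    for due in range(period, end + 1, period):
        missed = stall_start <= due < stall_end
        if policy == "none":
            # Normal rate, no catchup: the timer is effectively frozen during
            # the stall, so every later tick arrives late by the stall length.
            when = due if due < stall_start else due + stall_len
        elif policy == "catchup":
            # Missed ticks are queued and replayed with a shorter gap until
            # the backlog has drained.
            ready = stall_end if missed else due
            earliest = delivered[-1] + catchup_gap if delivered else ready
            when = max(ready, earliest)
        elif policy == "merge":
            # All missed ticks collapse into one tick delivered asap.
            if missed:
                if merged:
                    continue
                merged = True
                when = stall_end
            else:
                when = due
        elif policy == "discard":
            # Missed ticks are simply dropped.
            if missed:
                continue
            when = due
        else:
            raise ValueError("unknown policy: %r" % policy)
        if when <= end:
            delivered.append(when)
    return delivered


if __name__ == "__main__":
    for policy in ("none", "catchup", "merge", "discard"):
        print(policy, deliver_ticks(policy, 10, 25, 62, 100))
```

With a period of 10 and a stall covering [25, 62), only 'catchup' gets all
ten ticks to the guest by time 100; 'merge' delivers seven and 'discard' six,
while 'none' also delivers six but each of the later ones arrives 37 units late.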


VMWare timekeeping
------------------

 * All timers run in "apparent time" ie track guest wallclock
 * Missed tick policy is to deliver at higher rate to catch up
 * TSC can be switched between native/emulate (virtual_rdtsc=TRUE|FALSE)
 * TSC can have hardcoded HZ in emulate mode  (apparentHZ=VALUE)
 * RTC time of day is synced to host at startup (rtc.diffFromUTC or rtc.startTime)
 * VMWare tools reset guest TOD if it gets out of sync

Xen timekeeping
---------------

 * Virtual platform timer (VPT) used as source for other timers
 * VPT has 4 modes

    0: delay_for_missed_ticks

     Missed ticks are delivered when next scheduled, at the normal
     rate. RTC runs in guest wallclock, so is delayed. No catchup is
     attempted

    1: no_delay_for_missed_ticks

     Missed ticks are delivered when next scheduled, at the normal
     rate. RTC runs in host wallclock, so is not delayed.

    2: no_missed_ticks_pending

     Missed ticks are discarded & next tick is delivered normally. RTC
     runs in host wallclock.


    3: one_missed_tick_pending

     Missed interrupts are collapsed into a single late tick. RTC
     runs in host wallclock.

  * HPET

    Optionally enabled

  * TSC. Can run in 4 modes

     - auto: emulate if host TSC is unstable. native with invariant TSC
     - native: always native regardless of host TSC stability
     - emulate: trap + emulate regardless of whether host TSC is invariant
     - pvrdtsc: native, requiring invariant TSC. Also exposes rdtscp instruction


KVM timekeeping
---------------

 * PIT: can be in kernel, or userspace (userspace deprecated for KVM)

   Default tick policies differ between the two impls

    - Userspace: Default: missed ticks are delivered when next scheduled at normal rate

       -tdf flag enables tick reinjection to catch up

    - Kernel: Default: Missed ticks are delivered at higher rate to catch up

       -no-kvm-pit-reinjection to disable tick reinjection catchup

 * RTC

    TOD clock can run in host or guest wallclock (clock=host|guest)

     Default: missed ticks are delivered when next scheduled at normal rate

     -rtc-td-hack or -clock driftfix=slew:  missed ticks are delivered at a
      higher rate to catchup

 * TSC

    - Always runs native.

 * HPET

    - Optionally enabled/disabled


Mapping in libvirt XML
----------------------

Currently supports setting Time of Day clock via

  <clock offset="utc"/>

    Always sync to UTC

  <clock offset="localtime"/>

    Always sync to host timezone

  <clock offset="timezone" timezone='Europe/Paris'/>

    Sync to arbitrary timezone

  <clock offset="variable" adjustment='123456'/>

    Sync to UTC + arbitrary offset



Proposal to model all timer policies as sub-elements of this <clock/>.
In general we will allow zero or more <timer/> elements following the
syntax:

  <timer name='platform|pit|rtc|hpet|tsc'
    wallclock='host|guest'
   tickpolicy='none|catchup|merge|discard'
    frequency='123'
         mode='auto|native|emulate|paravirt'
      present='yes|no' />

Meaning of 'name':

  Names map to regular PC timers / clocks.  'platform' refers to the
  (optional) master virtual clock source that may be used to drive the
  policy of "other" clocks (eg as used in Xen). Which clocks are
  controlled by the platform clock is left undefined, because it has
  varied in Xen over time.

Meaning of 'tickpolicy':

       none: continue to deliver at normal rate (ie ticks are delayed)
    catchup: deliver at higher rate to catchup
      merge: ticks merged into 1 single tick
    discard: all missed ticks are discarded

Meaning of 'wallclock':

  Only valid for name='rtc' or 'platform'

    host: RTC wallclock always tracks host time
   guest: RTC wallclock always tracks guest time (ie only advances while guest runs)

Meaning of 'frequency':

  Set a fixed frequency in HZ.

  NB: Only relevant for TSC. All other timers are fixed (PIT, RTC), or
      fully guest controlled frequency (HPET)

Meaning of 'mode':

 Control how the clock is exposed to guest.

      auto: native if safe, otherwise emulate
    native: always native
   emulate: always emulate
  paravirt: native + paravirtualize

  NB: Only relevant for TSC. All other timers are always emulated.


Meaning of 'present':

  Used to override default set of timers visible to the guest. eg to
  enable or disable the HPET



Mapping to VMWare
-----------------

eg with guest config showing

   diffFromUTC='123456'
     apparentHZ='123456'
  virtual_rdtsc=False

libvirt XML gets:

  <clock offset='variable' adjustment='123456'>
    <timer name='tsc' frequency='123456' mode='native'/>
  </clock>


Mapping to Xen
--------------

eg with guest config showing

     timer_mode=3
     hpet=1
     tsc_mode=2
     localtime=1

libvirt XML gets:

   <clock offset='localtime'>
     <timer name='platform' tickpolicy='merge' wallclock='host'/>
     <timer name='hpet'/>
     <timer name='tsc' mode='native'/>
   </clock>
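
A rough Python sketch of how this Xen mapping could be automated, using the
existing offset attribute of <clock/>. The function name xen_to_clock_xml and
the two lookup tables are invented for illustration. The timer_mode table is
one reading of the VPT mode descriptions earlier in this mail (modes 0 and 1
both become tickpolicy 'none' and differ only in wallclock). The tsc_mode
numbering beyond the single 2 => native case shown above is an assumption
taken from Xen's tscmode.txt. Unlike the example above, the sketch writes
present= explicitly on the hpet timer.

```python
import xml.etree.ElementTree as ET

# timer_mode -> (tickpolicy, wallclock); an interpretation, not a settled mapping
XEN_TIMER_MODE = {
    0: ("none", "guest"),    # delay_for_missed_ticks
    1: ("none", "host"),     # no_delay_for_missed_ticks
    2: ("discard", "host"),  # no_missed_ticks_pending
    3: ("merge", "host"),    # one_missed_tick_pending
}

# tsc_mode -> mode; only 2 => native appears in the example, rest assumed
XEN_TSC_MODE = {0: "auto", 1: "emulate", 2: "native", 3: "paravirt"}


def xen_to_clock_xml(cfg):
    """Build a <clock> element (as a string) from a dict of Xen settings."""
    offset = "localtime" if cfg.get("localtime") else "utc"
    clock = ET.Element("clock", offset=offset)
    if "timer_mode" in cfg:
        tickpolicy, wallclock = XEN_TIMER_MODE[cfg["timer_mode"]]
        ET.SubElement(clock, "timer", name="platform",
                      tickpolicy=tickpolicy, wallclock=wallclock)
    if "hpet" in cfg:
        ET.SubElement(clock, "timer", name="hpet",
                      present="yes" if cfg["hpet"] else "no")
    if "tsc_mode" in cfg:
        ET.SubElement(clock, "timer", name="tsc",
                      mode=XEN_TSC_MODE[cfg["tsc_mode"]])
    return ET.tostring(clock, encoding="unicode")


if __name__ == "__main__":
    print(xen_to_clock_xml(
        {"timer_mode": 3, "hpet": 1, "tsc_mode": 2, "localtime": 1}))
```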


Mapping to KVM
--------------

eg with guest ARGV showing

   -no-kvm-pit-reinjection
   -clock base=localtime,clock=guest,driftfix=slew
   -no-hpet


libvirt XML gets:

  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup' wallclock='guest'/>
    <timer name='pit' tickpolicy='none'/>
    <timer name='hpet' present='no'/>
  </clock>
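
Going the other way, a driver would need to turn the <clock> element back
into command line arguments. A minimal Python sketch follows, assuming the
top-level attribute is the existing offset=. The function name
clock_xml_to_argv is hypothetical, and the flag spellings are copied from
this mail as-is rather than checked against any particular QEMU/KVM release.
It also assumes the in-kernel PIT, whose default is to reinject ticks.

```python
import xml.etree.ElementTree as ET


def clock_xml_to_argv(xml_text):
    """Translate a <clock> element into KVM arguments (flags as in this mail)."""
    clock = ET.fromstring(xml_text)
    offset = clock.get("offset", "utc")
    if offset not in ("utc", "localtime"):
        # 'timezone' and 'variable' offsets are out of scope for this sketch
        raise ValueError("unsupported clock offset: %r" % offset)
    timers = {t.get("name"): t for t in clock.findall("timer")}
    argv = []

    # Kernel PIT catches up by default; tickpolicy 'none' turns that off.
    pit = timers.get("pit")
    if pit is not None and pit.get("tickpolicy") == "none":
        argv.append("-no-kvm-pit-reinjection")

    # RTC: time of day base, wallclock source and optional tick catchup.
    opts = ["base=" + offset]
    rtc = timers.get("rtc")
    if rtc is not None:
        if rtc.get("wallclock") is not None:
            opts.append("clock=" + rtc.get("wallclock"))
        if rtc.get("tickpolicy") == "catchup":
            opts.append("driftfix=slew")
    argv += ["-clock", ",".join(opts)]

    # HPET is only mentioned when it has to be switched off.
    hpet = timers.get("hpet")
    if hpet is not None and hpet.get("present") == "no":
        argv.append("-no-hpet")
    return argv
```

Fed the example XML above (with offset='localtime'), this gives back the
three arguments shown in the guest ARGV, with the -clock value as a separate
list item.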



Further reading
---------------

VMWare has the best doc:

  http://www.vmware.com/pdf/vmware_timekeeping.pdf

Xen:

  Docs on 'tsc_mode' at

    $SOURCETREE/docs/misc/tscmode.txt

  Docs for 'timer_mode'  in the source code only:

    xen/include/public/hvm/params.h

KVM:

  No docs at all. Guess from -help descriptions, reading source code & asking
  clever people about it :-)



-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



