[libvirt] Libvirt domain event usage and consistency

Michal Privoznik mprivozn at redhat.com
Fri Nov 25 15:34:07 UTC 2016


On 25.11.2016 14:38, Roman Mohr wrote:
> Hi,
> 
> I recently started to use the libvirt domain events. With them I increase
> the responsiveness of my VM state watchers.
> In general it works pretty well. I just listen to the events and do a
> periodic resync to cope with missed events.
> 
> While watching the events I ran into a few interesting situations I wanted
> to share. The points 1-3 describe some minor issues or irregularities.
> Point 4 is about the fact that domain and state updates are not versioned
> which makes it very hard to stay in sync with libvirt when using events.
> 
> My libvirt version is 1.2.18.4.

This might be the root cause. I'm unable to see some of the scenarios
you're seeing. Have you tried the latest release (or even git HEAD) to
check whether all the scenarios you are describing still stand?

> 
> 1) Event order seems to be weird on startup:
> 
> When listening for VM lifecycle events I get this order:
> 
> {"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z",
> "reason": "Booted", "domain_name": "generic", "domain_id":
> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
> {"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z",
> "reason": "Added", "domain_name": "generic", "domain_id":
> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
> 
> It is strange that a VM already boots before it is defined. Is this the
> intended order?

I don't see this order, so this is probably fixed upstream.

> 
> 2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order

I don't think you can define a domain with that flag. What's the actual
action?

> 
> {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> "reason": "Added", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> "reason": "Booted", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}


Interesting, so here the "defined" event is delivered before the "started"
event. Also, where is the "suspended" event?

> 
> This boot-order makes it hard to track active domains by listening to
> life-cycle events. One could theoretically still always fetch the VM state
> in the event callback and check the state, but if the state is not
> immediately transferred with the event itself, it can already be outdated,
> so this might be racy (not transparent to the libvirt bindings user), and as
> described in (3) currently not even possible. In general the real existing
> events seem to differ quite significantly from the described life-cycle in
> [1].

Again, in the upstream I see something different:
event 'lifecycle' for domain $domain: Started Booted
event 'lifecycle' for domain $domain: Suspended Paused
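
One way to make active-domain tracking tolerant of the orderings reported
above is to let only start/stop style events change the active set and to
ignore "Defined" and "Undefined" for that purpose. This is only a sketch: the
event names are the strings from the JSON logs in this thread, the function
name track_active is hypothetical, and treating "Stopped" as the only
deactivating event is an assumption.

```python
# Event names as they appear in the JSON logs quoted in this thread.
ACTIVATING = {"Started", "Suspended", "Resumed"}
DEACTIVATING = {"Stopped"}  # assumption: only "Stopped" ends a domain's run


def track_active(events):
    """Return the set of active domain IDs after replaying (type, id) pairs."""
    active = set()
    for event_type, domain_id in events:
        if event_type in ACTIVATING:
            active.add(domain_id)
        elif event_type in DEACTIVATING:
            active.discard(domain_id)
        # "Defined" / "Undefined" concern the persistent config only,
        # so they never change the active set here.
    return active


# The order reported in point 1: "Started" arrives before "Defined".
uuid = "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"
result = track_active([("Started", uuid), ("Defined", uuid)])
```

With this approach the result is the same whether "Defined" arrives before or
after "Started".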


> 
> 3) "Defined" event is triggered before the domain is completely defined
> 
> {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> "reason": "Added", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> "reason": "Booted", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> 
> When I try to process the first event and do a xmldump I get:
> 
>    Event: [Code-42] [Domain-10] Domain not found: no domain with matching
> uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)
> 
> So it seems like I get the event before the domain is completely ready.

You know that you shouldn't be calling libvirt APIs from event callbacks?
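
A common way to follow that advice is to have the callback only enqueue the
event and return, and to do the follow-up work (such as the XML dump) in a
separate worker thread. The sketch below shows that hand-off with the
standard library only, so it runs without libvirt. The names event_queue,
lifecycle_callback, worker and handled are hypothetical, and the callback is
fed a plain UUID string here where a real lifecycle callback would receive a
domain object.

```python
import queue
import threading

event_queue = queue.Queue()
handled = []  # stands in for whatever the worker does with each event


def lifecycle_callback(conn, dom_uuid, event, detail, opaque):
    # No libvirt calls here: record what arrived and return immediately.
    event_queue.put((dom_uuid, event, detail))


def worker(process):
    # Runs outside the event loop; follow-up libvirt calls would go here.
    while True:
        item = event_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        process(item)


thread = threading.Thread(target=worker, args=(handled.append,))
thread.start()

# Simulate the first event from point 3 arriving in the event loop.
lifecycle_callback(None, "b9906489-6d5b-40f8-a742-ca71b2b84277",
                   "Defined", "Added", None)

event_queue.put(None)
thread.join()
```

Note that this only moves the work out of the callback; a domain lookup done
by the worker can still fail if the domain is gone by the time it runs.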

> 
> 4) The libvirt domain description is not versioned
> 
> I would expect that every time I update a domainxml (update from third
> party entity), or an event is generated (update from libvirt), that the
> resource version of a Domain is increased and that I get this resource
> version when I do a xmldump or when I get an event. Without this there is
> afaik no way to stay in sync with libvirt, even if you do regular polling
> of all domains. The main issue here is that I can never know if events in
> the queue arrived before my latest domain resync or after it.
> 
> Also note that this is not about delivery guarantees of events. It is just
> about having a consistent view of a VM and the individual event. If I have
> resource versions, I can decide if an event is still interesting for me or
> not, which is exactly what I need to solve the syncing problem above.
> When I do a complete relisting of all domains to sync, I know which version
> I got and I can then see on every event if it is newer or older.
> 
> If, alongside the event, the domain xml, the VM state, and the
> resource version were sent to a client, it would be even better. Then,
> whenever there is a new event for a VM in the queue, I can be sure that
> this domainxml I see is the one which triggered the event. This xml is then
> a complete representation for this revision number.

I recall some people asking for this. Basically, they were worried that
somebody from outside could manipulate their XMLs without them knowing.
Frankly, I don't recall what our answer to that was.

Having a version number in live XML makes sense. However, it makes less
sense for config XML - there would be no way to start with version
#0 once I've edited the file.
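
To make the request in point 4 concrete, here is a sketch of how a client
could use such a version if it existed. libvirt provides no such field in the
setup discussed in this thread, so the version number, the DomainView class
and the apply_event function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainView:
    version: int  # hypothetical per-domain resource version
    state: str


def apply_event(cache, uuid, event_version, new_state):
    """Apply an event only if it is newer than the cached view."""
    current = cache.get(uuid)
    if current is not None and event_version <= current.version:
        return False  # event predates the last resync: drop it
    cache[uuid] = DomainView(event_version, new_state)
    return True


cache = {}
uuid = "b9906489-6d5b-40f8-a742-ca71b2b84277"

# A full resync observed the domain at (hypothetical) version 5.
cache[uuid] = DomainView(version=5, state="running")

# An event still sitting in the queue from before the resync is ignored.
stale_applied = apply_event(cache, uuid, 4, "paused")

# A newer event is applied and advances the cached version.
fresh_applied = apply_event(cache, uuid, 6, "paused")
```

The comparison is what answers the question of whether a queued event arrived
before or after the latest resync.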

Michal



