
Re: [Libvir] [PATCH] virDomainMigrate version 4 (for discussion only!)

On Thu, Jul 19, 2007 at 06:03:29PM +0100, Richard W.M. Jones wrote:
> Some observations about Xen migration and error handling.
> The Xen migration protocol isn't stable between releases.  It changed 
> between 3.0.3 and 3.1.0.  There doesn't seem to be any versioning, and 
> incompatible versions of Xen seem happy to attempt migrations between 
> them, even though these will certainly fail.
> The source host's xend forks an xc_save process.  It appears to me that 
> xc_save will happily write to _anything_ listening on port 8002, even if 
> that thing closes the socket prematurely.  (Try running 'xm migrate' on 
> one host and at the same time 'nc -l 8002 > /dev/null' on the target 
> host.  The 'xm migrate' will happily complete without error.  Meanwhile 
> the domain you "migrated" just gets deleted.)
> Partly because of the lack of error reporting, and partly because the 
> xend -> xc_save fork will make error reporting difficult to add, libvirt 
> has a hard time displaying errors in this situation.  It is quite 
> possible to call virDomainMigrate and get a domain back which shortly 
> afterwards "disappears", all without any indication of error.

We need to be careful about where we draw the line here. We can jump
through all sorts of hoops in libvirt, but at the end of the day there is
some majorly broken stuff in Xen that really just needs fixing rather than
working around. I'd rather submit fixes to upstream Xen where needed to
make migration more reliable than put too much complexity into libvirt,
even if it means we have more limited error reporting for current XenD.
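
To illustrate the kind of hoop I mean, here is a rough sketch (not a
proposal) of polling the destination after a migrate call returns, to catch
a domain that silently "disappears". The function name and the
`domain_exists` callable are hypothetical; in real code the callable would
presumably wrap a lookup of the domain by name on the destination
connection.

```python
import time


def domain_survived(domain_exists, checks=5, interval=1.0, sleep=time.sleep):
    """Return True if the domain is still present at every check.

    domain_exists is a hypothetical zero-argument callable that returns
    True while the migrated domain is visible on the destination host.
    """
    for _ in range(checks):
        if not domain_exists():
            # The migrate call "succeeded", but the domain is gone.
            return False
        sleep(interval)
    return True
```

Even this only narrows the window: a domain can still vanish after the last
check, and the caller still has no idea why it failed, which is why fixing
the error reporting in Xen itself looks like the better investment.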

The mailing list archive link eludes me right now, but upstream Xen was
reasonably receptive to the idea of bringing xc_save/restore back into the
XenD process, which would resolve a huge class of error reporting problems.
The original motivation for making them separate processes was that the
code was fragile and crashed XenD a fair bit, but that's likely no longer
a problem.

|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
