[Libvir] [PATCH] virDomainMigrate version 4 (for discussion only!)

Daniel Veillard veillard at redhat.com
Mon Jul 23 10:23:01 UTC 2007


On Mon, Jul 23, 2007 at 11:00:21AM +0100, Richard W.M. Jones wrote:
> Daniel Veillard wrote:
> >>Firstly we send a "prepare" message to the destination host.  The 
> >>destination host may reply with a cookie.  It may also suggest a URI (in 
> >>the current Xen implementation it just returns gethostname).  Secondly 
> >>we send a "perform" message to the source host.
> >>
> >>Correspondingly, there are two new driver API functions:
> >>
> >>  typedef int
> >>      (*virDrvDomainMigratePrepare)
> >>                      (virConnectPtr dconn,
> >>                       char **cookie,
> >>                       int *cookielen,
> >>                       const char *uri_in,
> >>                       char **uri_out,
> >>                       unsigned long flags,
> >>                       const char *dname,
> >>                       unsigned long resource);
> >>
> >>  typedef int
> >>      (*virDrvDomainMigratePerform)
> >>                      (virDomainPtr domain,
> >>                       const char *cookie,
> >>                       int cookielen,
> >>                       const char *uri,
> >>                       unsigned long flags,
> >>                       const char *dname,
> >>                       unsigned long resource);
> >
> >  I wonder if 2 steps are really sufficient. I have the feeling that a
> >third step virDrvDomainMigrateFinish() might be needed; it could, for
> >example, resume on the target side and also verify the domain is actually
> >okay. That could improve error handling and feels a lot more like a
> >transactional system where you really want an atomic work/fail operation
> >and nothing else.
> 
> Yes.  It's important to note that the internals may be changed later, 
> although that may complicate things if we want to allow people to run 
> incompatible versions of the daemons.  [Another email on that subject is 
> coming up after this one].
> 
> I'm not sure exactly how what you propose would work in the Xen case. 
> In the common error case (incompatible xend leading to domains being 
> eaten), the domain is actually created for a short while on the dest 
> host.  It dies later, but it seems possible to me that there is a race 
> where domainMigrateFinish could return "OK", and yet the domain would 
> fail later.  In another case -- where the domain could not connect to 
> its iSCSI backend -- this could be even more common.
> 
> Note also that there is a third step in the current code.  After 
> domainMigrate has finished, the controlling client then does 
> "virDomainLookupByName" in order to fetch the destination domain object. 
>  This is subject to the race conditions in the preceding paragraph.

  Yes, my point is that other migrations may need a more complex third
step, for example to activate the domain on the target after the data has
been transferred in step 2. In the case of Xen the semantics of the migrate
command include the restart on the other side, but that may not always be
the case. For Xen the backend can then simply call virDomainLookupByName
on the remote target node.
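  For illustration, such a third entry point could mirror the two driver
hooks quoted above. The name follows my virDrvDomainMigrateFinish()
suggestion, but the return type and parameters here are only a sketch for
discussion, not part of the patch:

  typedef virDomainPtr
      (*virDrvDomainMigrateFinish)
                      (virConnectPtr dconn,
                       const char *dname,
                       const char *cookie,
                       int cookielen,
                       const char *uri,
                       unsigned long flags);

  Having it return the domain object would also cover the lookup that the
client currently has to do by name on the destination.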

> >>There are two corresponding wire messages 
> >>(REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and 
> >>REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument 
> >>shuffling, albeit rather complicated because of the number of arguments 
> >>passed in and out.
> >>
> >>The complete list of messages which go across the wire during a 
> >>migration is:
> >>
> >>  client -- prepare --> destination host
> >>  client <-- prepare reply -- destination host
> >>  client -- perform --> source host
> >>  client <-- perform reply -- source host
> >>  client -- lookupbyname --> destination host
> >>  client <-- lookupbyname reply -- destination host
> >
> >  Okay, instead of trying to reuse lookupbyname to assert completion,
> >I would rather make a third special entry point. Sounds more generic
> >to me, but again it's an implementation point, not a blocker; all this
> >is hidden behind the API.
> 
> Noted.

  At some point we should make the network protocol part of the ABI,
but let's try to avoid breaking it too often :-)
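  To make that concrete, a third entry point would add one more
request/reply pair to the exchange quoted above
(REMOTE_PROC_DOMAIN_MIGRATE_FINISH is a hypothetical name, formed like
the two existing ones):

  client -- prepare --> destination host
  client <-- prepare reply -- destination host
  client -- perform --> source host
  client <-- perform reply -- source host
  client -- finish --> destination host
  client <-- finish reply (domain) -- destination host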

> >>Capabilities
> >>------------
> >>
> >>I have extended capabilities with <migration_features>.  For Xen this is:
> >>
> >><capabilities>
> >>  <host>
> >>   <migration_features>
> >>     <live/>
> >>     <uri_transports>
> >>       <uri_transport>tcp</uri_transport>
> >>     </uri_transports>
> >>   </migration_features>
> >
> >  Nice, but what is the expected set of values for uri_transport ?
> >Theoretically that can be any scheme name from RFC 2396 (or later):
> >  scheme        = alpha *( alpha | digit | "+" | "-" | "." )
> >
> >  unless I misunderstood this.
> 
> I think I'm misunderstanding you.  In the Xen case it would be changed 
> to <uri_transport>xenmigr</uri_transport>.

  Yes, if you think it's okay.

> >> #define REMOTE_CPUMAPS_MAX 16384
> >>+#define REMOTE_MIGRATE_COOKIE_MAX 256
> >
> >  Hum, what is that? Sorry to show my ignorance; feel free to point
> >me to an XDR FAQ!
> 
> In XDR you can either have unlimited strings (well, the limit is 
> 2^32-1), or you can impose a limit on their length.  Some common types 
> in XDR notation:
> 
>   Unlimited strings:      string foo<>;
>   Limited-length strings: string foo<1000>;
>   Fixed-length strings:   char foo[1000];
>   Byte arrays:            opaque foo[1000];
> 
> Now if we just defined all strings as <> type, then the 
> automatically-built XDR receivers would accept unlimited amounts of data 
> from a buggy or malicious client.  In particular they would attempt to 
> allocate large amounts of memory, crashing the server[1].  So instead we 
> impose upper limits on the length of various strings.  They are defined 
> at the top of qemud/remote_protocol.x.

  okay, thanks for the explanations :-)
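  So presumably the limit bounds the cookie field in
qemud/remote_protocol.x. A minimal sketch of such a declaration (the
struct and field names here are illustrative, not copied from the patch):

  struct remote_domain_migrate_prepare_ret {
      /* Bounded byte array: the generated XDR decoder refuses any
       * cookie longer than REMOTE_MIGRATE_COOKIE_MAX bytes instead
       * of trying to allocate it. */
      opaque cookie<REMOTE_MIGRATE_COOKIE_MAX>;
  };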

> >>+ * Since typically the two hypervisors connect directly to each
> >>+ * other in order to perform the migration, you may need to specify
> >>+ * a path from the source to the destination.  This is the purpose
> >>+ * of the uri parameter.  If uri is NULL, then libvirt will try to
> >>+ * find the best method.  Uri may specify the hostname or IP address
> >>+ * of the destination host as seen from the source.  Or uri may be
> >>+ * a URI giving transport, hostname, user, port, etc. in the usual
> >>+ * form.  Refer to driver documentation for the particular URIs
> >>+ * supported.
> >>+ *
> >>+ * The maximum bandwidth (in Mbps) that will be used to do migration
> >>+ * can be specified with the resource parameter.  If set to 0,
> >>+ * libvirt will choose a suitable default.  Some hypervisors do
> >>+ * not support this feature and will return an error if resource
> >>+ * is not 0.
> >
> >  Do you really want to fail there too ?
> 
> I'm not sure ...  It seemed like the safest thing to do: since we are
> unable to enforce the requested limit, Bad Things might happen
> (effectively a DoS on the network).

  Okay, it's probably better to avoid any fuzziness in the definition
of the API, and to return an error in that case. I'm convinced!
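  To make the caller's side concrete: assuming the public entry point
from this patch takes the parameters described in the comment above
(flags, dname, uri, resource) and returns the domain object on the
destination, client code would look roughly like this (the connection
URIs, the VIR_MIGRATE_LIVE flag, and the error handling are illustrative
only):

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int migrate_guest(void)
  {
      virConnectPtr src = virConnectOpen("xen:///");
      virConnectPtr dst = virConnectOpen("xen://dest.example.com/");
      virDomainPtr dom = virDomainLookupByName(src, "guest1");

      /* uri == NULL lets libvirt pick the path between the two
       * hypervisors; resource == 0 asks for the default bandwidth,
       * since a hypervisor that cannot enforce a limit will error
       * on any non-zero value. */
      virDomainPtr mig = virDomainMigrate(dom, dst, VIR_MIGRATE_LIVE,
                                          NULL /* dname */,
                                          NULL /* uri */,
                                          0 /* resource, Mbps */);
      if (mig == NULL) {
          fprintf(stderr, "migration failed\n");
          return -1;
      }
      return 0;
  }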

> >Similarly, the capability to 
> >limit bandwidth should be added to the <capabilities><migration_features>
> >possibly as a <bandwidth/> optional element.
> 
> Yes.  Note that although xend takes and indeed requires a resource 
> parameter, the implementation in xen 3.1 completely ignores it.  For 
> this reason, <bandwidth/> is _not_ a xen 3.1 capability.

  okay
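  For the record, the capabilities fragment quoted above would then grow
to something like this for a driver that can enforce the limit
(illustrative; per your note, Xen 3.1 would omit <bandwidth/>):

   <migration_features>
     <live/>
     <bandwidth/>
     <uri_transports>
       <uri_transport>tcp</uri_transport>
     </uri_transports>
   </migration_features>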

  thanks!

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/