[libvirt] Block replication driver

Eric Blake eblake at redhat.com
Fri Feb 12 20:22:01 UTC 2016


On 12/02/2015 02:08 AM, Simon Kollberg wrote:
> Hi!

Apologies for not noticing this mail sooner.

> 
> I'm working on supporting a new FT/HA solution for qemu called COLO
> (http://wiki.qemu.org/Features/COLO). The part that is currently being
> focused on for libvirt integration is Block Replication
> (http://wiki.qemu.org/Features/BlockReplication) which enables guest state
> synchronization for disks.

Here are some rough thoughts on the matter, although we may go through
several iterations before landing on something that everyone likes.

> 
> Right now there are three issues that I'd like to get your input on:
> 
> 1.
> As you can see on the block replication wiki-page we need to reference the
> secondary disk id.
> 
> Example from the wiki:
> -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> -drive if=xxx,driver=replication,mode=secondary,\
>      ...
>      file.backing.backing=colo1
> 
> My initial thought was to manually set the alias of the
> disk and add a new reference element to the backingStore:
> <disk type='file' device='disk'>
>   ...
>   <alias name='colo1'/>
> </disk>
> <disk type='file' device='disk'>
>   ...
>   <backingStore type='file'>
>     ...
>     <reference name='colo1'/>
>   </backingStore>
> </disk>
> 
> Though, I quickly realized that setting the alias is only done by the
> hypervisor and is therefore not an option with the current code.
> 
> Would it be bad letting the user set the alias, and if so, do you have any
> ideas of how to solve the referencing?

I'm a little bit leery of letting the user set the alias; one benefit
we've had of NOT letting the user control it is that we could avoid name
collisions.  It's not a strong enough reason to reject the idea, but
certainly worth thinking about.

Another consideration: if you do 'virsh dumpxml' on a running domain,
the live xml contains alias names; you can then 'virsh define' that xml,
and the aliases will be silently dropped.  This is in fact useful, if we
have to change the alias name we generate under the hood when first
starting a domain under a newer version of qemu.  If the user can set
the alias, we are stuck with that name.  On the other hand, as long as
we have an alias name and use it consistently, we can just document that
the user must not cause conflicts, and making the name persistent may be
rather easy.

On the other hand, we DO want to make the index='1' of <backingStore>
something that becomes persistent.  And the <target dev='...'> attribute
coupled with the <backingStore index='...'> is sufficiently unique to
reference ANY element of the backing chain.

That is, I would lean towards something more like this:

<disk type='file' device='disk'>
  ...
  <source file='...' index='0'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
</disk>
<disk type='file' device='disk'>
  ...
  <backingStore type='replication'>
    ...
    <reference dev='vda' index='0'/>
  </backingStore>
</disk>

A couple of things to note there: I think a new type='replication'
(rather than reusing existing type='file') will make it obvious that we
are adding new XML specifically for block replication; then in that new
type, we can add a new <reference> that refers to dev='vda' and
index='0' (we'll have to start exposing an index for the active layer,
not just the backingStore layers), as what the device will be replicating.
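
To make the (dev, index) addressing concrete, a two-layer chain might be
numbered roughly as follows.  This is only a sketch: index='1' on
<backingStore> is what live XML already reports, index='0' on the
active-layer <source> is the proposed addition and does not exist today,
and the file paths are placeholders.

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <!-- assumption: index='0' on the active layer is the proposed new attribute -->
  <source file='/path/to/active.qcow2' index='0'/>
  <!-- index='1' already appears in live XML for the first backing layer -->
  <backingStore type='file' index='1'>
    <format type='raw'/>
    <source file='/path/to/base.raw'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
</disk>
```

With that numbering, <reference dev='vda' index='0'/> would name the
active image, and index='1' would name the base image.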

> 2.
> The format of the disk and the driver type currently shares the same
> attribute in libvirt (the type attribute on driver XML element). However,
> with the new replication disk driver you need to be able to set both the
> disk format and also the driver name.
> 
> Example from the wiki:
> -drive if=xxx,driver=replication,mode=secondary,\
>          file.file.filename=active_disk.qcow2,\
>          file.driver=qcow2,\

So we are basically stacking TWO drivers on top of a single file.  I
think that means we'll want two layers of XML, something like:

<disk type='replication'>
  <backingStore type='file'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/active_disk.qcow2'/>
  </backingStore>
</disk>

Again, anywhere we have two layers of protocol in qemu to get to the
underlying file, it makes sense to have two layers of XML in libvirt.
We'll want the same sort of type='quorum' as a new disk type for
handling quorum drives, which have zero direct <source> elements but
instead have multiple <backingStore> child elements.  Ideally, since
everything can be represented as a BDS tree in qemu, it should also be
represented as a similar tree in XML in libvirt, except that libvirt has
already taken the shortcut that a single protocol and file layer can be
combined (that is, we show qcow2 images and source files in the same
layer), due to historical usage.
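
For illustration only, a quorum disk under that scheme might look
roughly like this.  None of it exists in libvirt today: the
type='quorum' value, the use of sibling <backingStore> children (laid
out the same way as in the replication sketch), and the paths are all
assumptions made up for the example.

```xml
<!-- hypothetical: type='quorum' is not an existing libvirt disk type -->
<disk type='quorum' device='disk'>
  <!-- no direct <source>; each <backingStore> child is one quorum member -->
  <backingStore type='file'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/member1.qcow2'/>
  </backingStore>
  <backingStore type='file'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/member2.qcow2'/>
  </backingStore>
  <target dev='vdb' bus='virtio'/>
</disk>
```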

>          ...
> 
> I saw that there was a function in libvirt called virStorageFileProbeFormat
> that could let us get the format of the disk without stating it in the XML.
> But as I'm sure you know, it's strongly advised not to be used since you can
> trick the function by modifying the disk file.

Correct, any solution that requires probing rather than explicit format
will not fly.

> 
> 
> 3.
> When using the replication driver the secondary disk is supposed to be added
> but not attached.
> Example from the wiki:
> -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
> -drive if=xxx,driver=replication,mode=secondary,\
>     ...
> 
> Clearly, trying to setup a disk without a target is not allowed at the
> moment. Is there any better way of doing it?

Hmm. I'm almost wondering if <disk> is the wrong element.  Most of the
XML is trying to describe something the guest will see, but if we are
creating a replication driver that is NOT visible to the guest, that
almost argues that we should create an entirely new sibling element next
to <disk>.  The new element would not need a <target> (because it is not
guest visible), but would otherwise be similar to <disk>.
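
Purely as a sketch of the shape that could take (the element name
'hiddendisk' is invented here and is not part of any libvirt schema; the
file name is borrowed from the wiki example):

```xml
<devices>
  <!-- hypothetical element: 'hiddendisk' is a made-up name for this sketch -->
  <hiddendisk type='file'>
    <driver name='qemu' type='raw'/>
    <source file='1.raw'/>
    <!-- no <target>: the guest never sees this image directly -->
  </hiddendisk>

  <!-- the guest-visible replication disk stays a normal <disk> -->
  <disk type='file' device='disk'>
    ...
    <target dev='vda' bus='virtio'/>
  </disk>
</devices>
```

How the replication disk would then point at such an element is still
open, since a dev/index pair is not available without a <target>.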

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
