[libvirt] [PATCH v7 00/13] qemu: Add quorum support to libvirt

Alberto Garcia berto at igalia.com
Wed Jan 20 14:47:56 UTC 2016


Hi Peter,

I'm the current maintainer of Quorum in QEMU and I'd like to try to
answer some of your comments.

On Fri, Jan 08, 2016 at 06:20:04PM +0100, Peter Krempa wrote:

> So I have a few comments/observations regarding the quorum block
> driver in qemu and its usability.
>
> At first I'd like to ask you to describe your use case a bit
> more. I'm currently lacking the motivation to do anything about
> this, as the series is just partial and I don't really see any
> advantage of using the quorum driver at all and can't come up with
> any useful use case.
>
> Also a good use case is usually a good reason to drive development
> of a feature and I'm afraid that this could become abandoned without
> any real use.

The original use case for which Quorum was designed was a data center
doing redundancy with storage in multiple separate rooms shared using
NFS.

The customer was facing not only problems in the file servers
themselves but, mainly, data corruption across the network. Quorum
can correct this on the fly and is able to identify which one of the
file servers is causing the problem without having to rebuild a whole
array (as would be the case with RAID).

Quorum is also used for the COLO block replication functionality
currently being discussed in QEMU:

   http://wiki.qemu.org/Features/BlockReplication

> 1) No tracking of integrity
>     As the quorum members don't have headers, failed quorum members
>     are not recorded and remembered. The user or management app then
>     has to do this externally for given storage devices.
>
> 2) No internal tracking of quorum members
>     Members of the quorum don't have any header marking them
>     as such and thus any images may be mixed together with
>     unforeseen/catastrophic results. Higher level management then
>     needs to take the role of remembering which images belong
>     together. Reimplementing this looks like reimplementing a
>     distributed storage system to me.

That's right, Quorum does not have its own file format and was
designed to work with any driver or protocol that QEMU supports, so
I'm not sure if there's much that can be done about this.

> 3) Lack of auto-resync:
>     Once the quorum gets a few inconsistencies it does not
>     automatically resync like the linux MD driver. With the current
>     implementation the only way to resync this would be to issue a
>     block-mirror (blockCopy) to /dev/null so that all blocks are
>     read and rewritten to the identical copy. This also requires a
>     user action.
>
>     Additionally the member of the quorum is not ignored if it was
>     out of sync in any previous time without being resynced allowing
>     for split-brain/corruption scenarios.

Quorum can fix errors on the fly (there's the 'rewrite-corrupted' flag
for that), so in those cases no manual intervention is required.
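
As a rough illustration of the voting idea, here is a toy model in
Python. It is not QEMU's actual code: the function name quorum_read,
its parameters and the in-memory list standing in for the children are
all made up for this sketch. It only shows how a majority vote can
return good data, point at the child that disagreed, and optionally
repair it in the spirit of 'rewrite-corrupted'.

```python
from collections import Counter


def quorum_read(replicas, threshold, rewrite_corrupted=False):
    """Vote on one block that was read from every child.

    replicas: list of bytes objects, one copy of the block per child
              (hypothetical stand-in for the real block devices).
    threshold: minimum number of identical copies needed to win.
    Returns (winning_data, indexes_of_outvoted_children).
    Raises IOError if no version reaches the threshold.
    """
    winner, votes = Counter(replicas).most_common(1)[0]
    if votes < threshold:
        raise IOError("quorum failure: no version reached the threshold")

    # Children whose copy differs from the winner are the suspects.
    outvoted = [i for i, data in enumerate(replicas) if data != winner]

    if rewrite_corrupted:
        for i in outvoted:
            replicas[i] = winner  # overwrite the bad copy with the winner

    return winner, outvoted
```

With three copies where the second one is damaged, the call returns
the good data, reports child 1 as outvoted, and (with
rewrite_corrupted=True) leaves all three copies identical again.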

If we want a way to auto-resync a complete image that should be
doable, I believe it's relatively simple to implement in QEMU
(depending on the semantics).

For the manual resync I also agree that it would be good to have a
simple API to do that in case the user wants to do it manually. That
can be done.

> 4) Necessity for at least 3 copies
>     Since a majority needs to win in a vote, you need at least 3
>     member disks for this to be fault-tolerant.
>
> 5) Lack of speedup
>     Since always all blocks are read from all members and verified
>     the quorum backend doesn't really add any speed to the
>     reads. This can be mostly attributed to the fact that fault
>     tracking is not present.
>
>     In other cases, due to internal error correcting codes it's very
>     unlikely that a storage medium would return a corrupted sector
>     without producing an error.

4) and 5) are part of the design of Quorum: as I said, one of the
goals is to detect (and correct) silent data corruption on the fly,
not to speed up disk access or to be space efficient.
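
To make the arithmetic behind point 4 concrete, a small sketch
(the helper names are hypothetical and only for illustration; it
assumes a strict majority is required to win a vote):

```python
def min_majority_threshold(children):
    """Smallest number of matching copies that is a strict majority."""
    if children < 1:
        raise ValueError("need at least one child")
    return children // 2 + 1


def tolerated_faults(children):
    """How many children may return bad data while a majority still exists."""
    return children - min_majority_threshold(children)
```

Under that assumption two copies tolerate no bad member
(tolerated_faults(2) is 0), while three copies tolerate one
(tolerated_faults(3) is 1), which is why three is the minimum for
fault tolerance.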

> 6) Almost every remote storage technology does quorums internally
>     Any distributed storage (ceph/rbd, gluster, sheepdog, etc..)
>     provide the quorum functionality internally with added benefit
>     that their internal working fixes problems when split of the
>     network occurs.
>
> 7) Tools are restricted to qemu and qemu-img
>     It's a "proprietary" implementation so for a rebuild you have
>     to use one of the two tools. AFAIK qemu-img is not really
>     user friendly for the less common disk backends and we don't
>     really provide any abstraction on top of that. This means
>     that there really aren't any reasonable tools to do an offline
>     resync. (Okay, if you know which instance is okay, you can just
>     copy it ...)

Right. If this is important I can propose to write a tool for QEMU to
deal with this. It's probably a good idea anyway.

> This series also lacks implementation of any user/management
> warning method that a block operation didn't have 100% votes in the
> quorum voting thus it's not really possible for the users to do a
> rebuild/diagnostic if something fails.

I can't say much about this series because I haven't looked into the
code in detail yet, but I'm willing to help fix the existing problems,
add the missing features and improve the code (both in libvirt and
QEMU) if there are no other major blockers.

Thanks,

Berto



