[libvirt] [PATCH v7 00/13] qemu: Add quorum support to libvirt

Alberto Garcia berto at igalia.com
Tue Feb 23 16:15:30 UTC 2016


On Thu, Feb 11, 2016 at 02:50:55PM +0100, Peter Krempa wrote:

> > > Whoa. Data corruption across the network? I'm not quite sure
> > > whether I'd use this to cover up a problem with the storage
> > > technology or network rather than just fix the root cause. If
> > > you have 3 copies, and manage to have a sector where all 3
> > > differ then the quorum driver won't help. And it will make it
> > > even harder to find any possible problems.
> > 
> > But in that case you detect that it went wrong and you get an I/O
> > error. The problem with silent data corruption is that it can be
> > hard to detect.
> 
> Yes, and that's why it should be fixed at the network storage
> technology layer rather than anywhere else.

I've had the chance to discuss this a bit with a cloud provider that
is using Quorum.

In their experience they have had problems in their tests with Gluster
or Ceph, particularly when sharing the same images among several
clients. They have experienced major delays and crashes when one of
the nodes fails, and in general they don't consider them stable enough
for their needs. On the other hand NFS is easy to use and manipulate,
robust, and allows the use of hardware appliances.

> > If there's a bit-flip across the network Quorum can detect it,
> > report it and correct the faulty version without needing to
> > rebuild everything.
> 
> I still think that you do want to rebuild the whole volume in such a
> case if you care about your data in the slightest. Otherwise you
> don't have to do stuff like this.

In general, yes. And you're right: I agree that having an API to deal
with these scenarios is a good idea, and I can work on it.
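
To make the voting behaviour we are discussing concrete, here is a
rough sketch in Python. It is not QEMU code: the function name
quorum_read and the default threshold of 2 are made up for
illustration, and it only models a single read of one sector from N
copies.

```python
from collections import Counter


def quorum_read(copies, threshold=2):
    """Vote on one sector read from several copies (illustrative only).

    Returns the winning data and the indexes of the copies that were
    outvoted. Raises IOError when no version reaches the threshold,
    e.g. when all three copies differ.
    """
    counts = Counter(copies)
    winner, votes = counts.most_common(1)[0]
    if votes < threshold:
        raise IOError("quorum failure: no version reached the threshold")
    # These are the copies that a rewrite-corrupted style repair would
    # need to overwrite with the winning version.
    outvoted = [i for i, data in enumerate(copies) if data != winner]
    return winner, outvoted


if __name__ == "__main__":
    data, bad = quorum_read([b"abc", b"abc", b"abd"])
    print(data, bad)
```

With two matching copies the read succeeds and the third copy is
reported as outvoted; with three different copies the caller gets an
I/O error instead of silently wrong data.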

> > >     * since we don't use node-names yet, it's not really
> > >       possible to do block jobs on quorum disks, thus they are
> > >       forbidden
> > 
> > I'm not sure what the status of node names in libvirt is; I could
> > also try to help make it happen.
> 
> They are basically non-existent. To be honest I think that node
> name support, a better approach to constructing block devices and
> their backing chains, and better handling of block jobs should be
> done prior to quorum.
> 
> This series tries to partially do things that are part of the plan
> for how to approach disks. One of them is that the backing chain of
> a disk is persisted in the XML and then fully constructed.
> 
> By adding this code the refactor will be even more painful than it
> would currently be.
> 
> I'm actually planning to do this in the near future, but
> unfortunately this is not a weekend project.
> 
> > 
> > >     * since block jobs are forbidden and rewrite-corrupted can't
> > >       be enabled, no way to do the rebuild
> > 
> > 'rewrite-corrupted' can be easily added to the series so I don't
> > think that's a problem. The block jobs thing I would need to
> > see first. Would you really need to have node names in order to
> > rebuild a Quorum?
> 
> Most probably yes. Without them, it will be just an ugly hack.

For the common usage I think you can use the device name just fine,
but if you have a scenario where a Quorum is part of a backing chain
then it wouldn't work without node names.

Berto



