[libvirt] potential bug in qemu snapshots

Eric Blake eblake at redhat.com
Wed Dec 22 21:02:04 UTC 2010


On 12/22/2010 11:35 AM, Eric Blake wrote:
> Right now, we create qemu snapshots to a file by using an exec: monitor
> command that passes 'compressor | { dd && dd; }' with stdout connected
> to the target file.  However, since dd is using a larger bs= than
> PIPE_MAX, it is conceivable that under heavy machine load, that dd will
> get a short read from the pipe, and unless we use the GNU extension of
> iflag=fullblock, that short read will be padded out to the output block
> size and result in data corruption in the destination file.
> 
> The possibility of corruption due to short reads when dd is fed input
> through a pipe but produces output through a large block size has
> occurred several times on the coreutils mailing list:
> http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00092.html
> Coreutils currently obeys the (non-intuitive) POSIX requirements for
> this short read behavior, and while it is being considered to make dd
> when POSIXLY_CORRECT is unset be more sensible, you can't rely on that
> being a default.
> 
> Should we refuse to make snapshots if we don't detect the GNU extension
> of 'dd iflag=fullblock'?  Probably safe to do on GNU/Linux machines, but
> I'm wondering if it will have negative effects on BSD machines where GNU
> coreutils is not installed.

I've gone back and read POSIX again.  It states:

If the bs=expr operand is specified and no conversions other than sync,
noerror, or notrunc are requested, the data returned from each input
block shall be written as a separate output block; if the read returns
less than a full block and the sync conversion is not specified, the
resulting output block shall be the same size as the input block. If the
bs=expr operand is not specified, or a conversion other than sync,
noerror, or notrunc is requested, the input shall be processed and
collected into full-sized output blocks until the end of the input is
reached.

So I stand corrected - because we aren't using conv=sync, there is no
zero padding or data corruption; however, behavior can still be
improved.  With bs=, a short read results in a short write, so we aren't
using the full 1M block size that we asked of dd, and end up with more
syscalls than necessary.  Swapping to ibs= obs= instead of bs= allows dd
to do fewer syscalls on output.  Patch coming up soon, but I'm grateful
that this is not a data corruption bug after all.  And the GNU extension
of iflag=fullblock is only useful when mixing bs= and conv=sync, so
there's no need to use that extension.

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 619 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20101222/b2c8fe60/attachment-0001.sig>


More information about the libvir-list mailing list