[Libguestfs] Checksums and other verification

Nir Soffer nsoffer at redhat.com
Thu Mar 2 13:37:13 UTC 2023


On Thu, Mar 2, 2023 at 10:46 AM Richard W.M. Jones <rjones at redhat.com> wrote:
>
> On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > > I think it would be more useful if (or in addition) it could compute
> > > the checksum of a stream which is being converted with 'qemu-img
> > > convert'.  Extra points if it can compute the checksum over either the
> > > input or output stream.
> >
> > I thought about this, it could be a filter that you add in the graph
> > that gives you checksum as a side effect of copying. But this requires
> > disabling unordered writes, which is pretty bad for performance.
> >
> > But even if you compute the checksum during a transfer, you want to
> > verify it by reading the transferred data from storage. Once you have
> > computed the checksum you can keep it for verifying the same image in
> > the future.
>
> The use-case I have in mind is being able to verify a download when
> you already know the checksum and are copying / converting the image
> in flight.
>
> eg: You are asked to download https://example.com/distro-cloud.qcow2
> with some published checksum and you will on the fly download and
> convert this to raw, but want to verify the checksum (of the qcow2)
> during the conversion step.  (Or at some point, but during the convert
> avoids having to spool the image locally.)

I'm thinking about the same flow. I think the best way to verify is:

1. The remote server publishes a block-checksum of the image
2. The system gets the block-checksum from the server (from an HTTP header?)
3. The system pulls data from the server and pushes it to the target disk in
   the wanted format
4. The system computes a checksum of the target disk and compares it with
   the published one
This way you verify the entire pipeline, including the storage. If we compute
a checksum during the conversion, we verify only that we got the correct data
from the server.
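
Step 4 above could look roughly like the sketch below. This is only an
illustration, not the blkhash algorithm: it assumes the target is a raw image
(so the bytes read back are the guest data), a hypothetical block size, and a
made-up construction (SHA-256 of the per-block SHA-256 digests). The function
names are hypothetical too.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, not a blkhash default


def block_checksum(f, block_size=BLOCK_SIZE):
    """Hash each block read from f, then hash the sequence of block digests.

    Illustrative construction only; a real tool such as blkhash defines
    its own format.
    """
    outer = hashlib.sha256()
    while True:
        block = f.read(block_size)
        if not block:
            break
        outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()


def verify(f, published, block_size=BLOCK_SIZE):
    """Return True if the data read back from storage matches the published checksum."""
    return block_checksum(f, block_size) == published
```

Here f would be the target disk opened for reading in binary mode after the
copy has finished, so the check covers what actually landed on storage.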

If we care only about verifying the transfer from the server, we can compute
the checksum during the download, which is likely to be sequential (so easy to
integrate with blkhash).
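
A minimal sketch of hashing during a sequential download, using plain SHA-256
from the Python standard library as a stand-in for blkhash. The function name
and chunk size are hypothetical; src and dst are any binary file-like objects
(for example an HTTP response and a local file).

```python
import hashlib


def copy_with_checksum(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in order, hashing the bytes as they pass through."""
    h = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)   # the hash sees the bytes in stream order
        dst.write(chunk)
    return h.hexdigest()
```

This works only because the data arrives in order. It verifies what was
received from the server, not what ended up on storage.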

If we want to validate nbdcopy, it will be much harder to compute a checksum
inside nbdcopy because it does not stream the data in order.
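
To illustrate why ordering matters: a streaming hash depends on the order in
which the data is fed to it, so feeding it blocks in completion order gives a
different digest. A small demonstration with SHA-256 (the block contents and
the function name are made up for the example):

```python
import hashlib


def stream_digest(chunks):
    """Feed chunks to a single SHA-256 context in the order given."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


in_order = [b"block-0", b"block-1", b"block-2"]
# Same blocks, but the second one completed before the first.
out_of_order = [in_order[1], in_order[0], in_order[2]]

same = stream_digest(in_order) == stream_digest(out_of_order)  # False
```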

Nir


