[libvirt] [PATCH 2/2] rbd: Implement buildVolFrom using RBD cloning
Wido den Hollander
wido at widodh.nl
Fri Jan 22 13:14:34 UTC 2016
On 21-01-16 21:55, John Ferlan wrote:
>
>
> On 12/23/2015 04:29 AM, Wido den Hollander wrote:
>> RBD supports cloning by creating a snapshot, protecting it and create
>> a child image based on that snapshot afterwards.
>>
>> The RBD storage driver will try to find a snapshot with zero deltas between
>> the current state of the original volume and the snapshot.
>>
>> If such a snapshot is found a clone/child image will be created using
>> the rbd_clone2() function from librbd.
>>
>> It will use the same features, strip size and stripe count as the parent image.
>>
>> This implementation will only create a single snapshot on the parent image if
>> this never changes. That should improve performance when removing the parent
>> image at some point.
>>
>> During build the decision will be made to user rbd_diff_iterate() or rbd_diff_iterate2().
>> The latter is faster, but only available on Ceph versions after 0.94 (Hammer).
>>
>> Cloning is only supported if RBD format 2 is used. All images created by libvirt
>> are already format 2.
>>
>> If a RBD format 1 image is used as the original volume libvirt will return VIR_ERR_OPERATION_UNSUPPORTED
>
> Long line... and the API should return 0 or -1 (based on other
> backends)... Caller doesn't expect that VIR_ERR* on return...
>
> I assume you realize that 'virsh vol-create-from' and 'vol-clone' end up
> using the same API...
>
> FWIW: I reviewed bottom up - I may also have been distracted more than
> once looking for specific code. Hopefully this is coherent ;-)
>
Thanks for all the feedback, really appreciate it!
Working on a revised version of all the patches right now. Rebasing a
lot, editing commits, etc, etc.
The coding style changes are new to me though. I run make syntax-check
and that works. Never knew that this was needed:
static int
virStorageRBDXXXXXXX
All the exisiting code does:
static int virStorageRBDXXXXXX
Anyway, I'll get back with new patches. Ignore anything which is on the
list right now.
Btw, a Pull Request mechanism like Github would be so much easier than
sending patches per e-mail ;)
>>
>> Signed-off-by: Wido den Hollander <wido at widodh.nl>
>> ---
>> src/storage/storage_backend_rbd.c | 340 ++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 340 insertions(+)
>>
>> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
>> index 5e16e7a..d22e5e0 100644
>> --- a/src/storage/storage_backend_rbd.c
>> +++ b/src/storage/storage_backend_rbd.c
>> @@ -33,6 +33,7 @@
>> #include "viruuid.h"
>> #include "virstring.h"
>> #include "virutil.h"
>> +#include "time.h"
>
> Instead of time, perhaps virrandom... see note below
>
>> #include "rados/librados.h"
>> #include "rbd/librbd.h"
>>
>> @@ -632,6 +633,344 @@ virStorageBackendRBDBuildVol(virConnectPtr conn,
>> return ret;
>> }
>>
>> +static int virStorageBackendRBDImageInfo(rbd_image_t image, char *volname,
>> + uint64_t *features,
>> + uint64_t *stripe_unit,
>> + uint64_t *stripe_count)
>
> static int
> virStorage...
>
> and one line per parameter
>
> Also, I think this moves to help with the previous patch...
>
>> +{
>> + int r = -1;
>
> int ret = -1;
>
>> + uint8_t oldformat;
>> +
>> + r = rbd_get_old_format(image, &oldformat);
>> + if (r < 0) {
>
> You could use (in more places than just here):
>
> "if ((r = rbd_*(...)) < 0) {"
>
> I'll just mention it once though...
>
>> + virReportSystemError(-r, _("failed to get the format of RBD image %s"),
>> + volname);
>> + goto cleanup;
>> + }
>> +
>> + if (oldformat != 0) {
>> + virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
>> + _("RBD image %s is old format. Does not support "
>> + "extended features and striping"),
>> + volname);
>> + r = VIR_ERR_OPERATION_UNSUPPORTED;
>
> unnecessary
>
>> + goto cleanup;
>> + }
>
> FWIW:
> This is the check I'm referring to in the review of the other patch.
>
>> +
>> + r = rbd_get_features(image, features);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to get the features of RBD image %s"),
>> + volname);
>> + goto cleanup;
>> + }
>> +
>> + r = rbd_get_stripe_unit(image, stripe_unit);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to get the stripe unit of RBD image %s"),
>> + volname);
>> + goto cleanup;
>> + }
>> +
>> + r = rbd_get_stripe_count(image, stripe_count);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to get the stripe count of RBD image %s"),
>> + volname);
>> + goto cleanup;
>> + }
>
> ret = 0;
>
>> +
>> + cleanup:
>> + return r;
>
> return ret;
>
>> +}
>> +
>> +/* Callback function for rbd_diff_iterate() */
>> +static int virStorageBackendRBDIterateCb(uint64_t offset ATTRIBUTE_UNUSED,
>> + size_t length ATTRIBUTE_UNUSED,
>> + int exists ATTRIBUTE_UNUSED,
>> + void *arg)
>
> static int
> virStorage*
>
>
>> +{
>> + /*
>> + * Just set that there is a diff for this snapshot, we do not care where
>> + */
>> + *(int*) arg = 1;
>> + return -1;
>
> So I assume the only reason this gets called then is when there is a
> difference...
>
>> +}
>> +
>
> This one will require some comments w/r/t what it returns... I'm also
> curious how would the caller distinguish between EPERM and a -1 return
> call? That's the problem with returning negative errno values. I think
> you need to pick specific numbers to return and not mess w/ errno.
>
> Seems like you have 1 for success, 0 for nothing found, and -1 for
> error. Then your caller is adjusted thusly.
>
>> +static int virStorageBackendRBDSnapshotFindNoDiff(rbd_image_t image, char *imgname,
>> + virBufferPtr snapname)
>
> static int
> virStorage...
>
> and one line per parameter
>
>> +{
>> + int r = -1;
>
> int ret = -1;
>
>> + int snap_count;
>> + int max_snaps = 128;
>> + size_t i;
>> + int diff;
>> + rbd_snap_info_t *snaps = NULL;
>> + rbd_image_info_t info;
>> +
>> + r = rbd_stat(image, &info, sizeof(info));
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to stat the RBD image %s"),
>> + imgname);
>> + goto cleanup;
>> + }
>
> Perhaps interesting to note that virStorageBackendRBDVolWipe calls this
> too - why not call it once and save/pass the result (by reference)?
>
>> +
>> + do {
>> + if (VIR_ALLOC_N(snaps, max_snaps))
>> + goto cleanup;
>> +
>> + snap_count = rbd_snap_list(image, snaps, &max_snaps);
>
> There isn't an API to tell you how many are there? Seems like a much
> better idea to have that rather than iterative loop like this trying to
> guess... Especially one that creates 128 to start... Lots of wasted
> space if there's only 1. There are some API's where you pass NULL for
> 'snaps' and 'max_snaps = 0', then it returns the number found... That
> would be more useful here.
>
>> + if (snap_count <= 0)
>> + VIR_FREE(snaps);
>> +
>> + } while (snap_count == -ERANGE);
>> +
>> + VIR_DEBUG("Found %d snapshots for RBD image %s", snap_count, imgname);
>
> Found -# snapshots will look very strange on error...
>
>> +
>> + if (snap_count == 0) {
>> + r = -ENOENT;
>> + goto cleanup;
>> + }
>
> This then becomes:
>
> if (snap_count <= 0) {
> if (snap_count == 0)
> ret = 0;
> goto cleanup;
> }
>> +
>> + if (snap_count > 0) {
>
> That means this if goes away...
>
>> + for (i = 0; i < snap_count; i++) {
>> + VIR_DEBUG("Quering diff for RBD snapshot %s@%s", imgname,
>> + snaps[i].name);
>
> Querying or checking.
>
>> +
>> + /* The callback will set diff to non-zero if there is a diff */
>> + diff = 0;
>> +
>> +/*
>> + * rbd_diff_iterate2() is available in versions above Ceph 0.94 (Hammer)
>> + * It uses a object map inside Ceph which is faster than rbd_diff_iterate()
>> + * which iterates all objects.
>
> Rather than split function calls between #if #else #endif - I'd copy
> that duplicated line into both.
>
>> + */
>> +#if LIBRBD_VERSION_CODE > 266
>> + r = rbd_diff_iterate2(image, snaps[i].name, 0, info.size, 0, 1,
>> +#else
>> + r = rbd_diff_iterate(image, snaps[i].name, 0, info.size,
>> +#endif
>> + virStorageBackendRBDIterateCb, (void *)&diff);
>> +
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to iterate RBD snapshot %s@%s"),
>> + imgname, snaps[i].name);
>> + goto cleanup;
>> + }
>> +
>> + if (diff == 0) {
>
> So this is the I found the matching image/snaps[i].name and it's exactly
> the same? I'm a bit unclear on what the iterate call does especially
> with that callback function added into the mix.
>
>> + VIR_DEBUG("RBD snapshot %s@%s has no delta", imgname,
>> + snaps[i].name);
>> + virBufferAsprintf(snapname, "%s", snaps[i].name);
>> + r = 0;
>
> I assume though here we're saying match and the same, so the caller
> should take a specific path, thus:
>
> ret = 1;
>
>> + goto cleanup;
>> + }
>> +
>> + VIR_DEBUG("RBD snapshot %s@%s has deltas", imgname,
>> + snaps[i].name);
>
> Otherwise, we'll try the next snaps[i]?
>
>> + }
>> + }
>> +
>> + r = -ENOENT;
>
> Thus if we get here we found nothing matching, so:
>
> ret = 0;
>
>> +
>> + cleanup:
>> + if (snaps)
>> + rbd_snap_list_end(snaps);
>> +
>> + VIR_FREE(snaps);
>> +
>> + return r;
>
> return ret;
>
>> +}
>> +
>> +static int virStorageBackendRBDSnapshotCreate(rbd_image_t image, char *imgname,
>> + char *snapname)
>
> static int
> virStorage...
>
> one line per argument
>
>> +{
>> + int r = -1;
>
> int ret = -1;
>
>> +
>> + VIR_DEBUG("Creating RBD snapshot %s@%s", imgname, snapname);
>> +
>> + r = rbd_snap_create(image, snapname);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to create RBD snapshot %s@%s"),
>> + imgname, snapname);
>> + goto cleanup;
>> + }
>
> ret = 0;
>
>> +
>> + cleanup:
>> + return r;
>
> return ret;
>
>> +}
>> +
>> +static int virStorageBackendRBDSnapshotProtect(rbd_image_t image, char *imgname,
>> + char *snapname)
>
> static int
> virStorage...
>
> And one line per parameter...
>
>> +{
>> + int r = -1;
>
> int ret = -1;
>
>> + int protected;
>> +
>> + VIR_DEBUG("Quering if RBD snapshot %s@%s is protected", imgname, snapname);
>
> Querying or Checking
>
>> +
>> + r = rbd_snap_is_protected(image, snapname, &protected);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed verify if RBD snapshot %s@%s "
>> + "is protected"), imgname, snapname);
>> + goto cleanup;
>> + }
>> +
>> + if (protected == 0) {
>> + VIR_DEBUG("RBD Snapshot %s@%s is not protected, protecting",
>> + imgname, snapname);
>> +
>> + r = rbd_snap_protect(image, snapname);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed protect RBD snapshot %s@%s"),
>> + imgname, snapname);
>> + goto cleanup;
>> + }
>> + } else {
>> + VIR_DEBUG("RBD Snapshot %s@%s is already protected", imgname, snapname);
>> + }
>> +
> ret = 0;
>
>> + cleanup:
>> + return r;
>
> return ret;
>
>> +}
>> +
>> +static int virStorageBackendRBDCloneImage(rados_ioctx_t io, char *origvol,
>> + char *newvol)
>
> static int
> virStorage...
>
> Also one line per parameter...
>
>> +{
>> + int r = -1;
>
> int ret = -1;
>
>> + int order = 0;
>> + uint64_t features;
>> + uint64_t stripe_count;
>> + uint64_t stripe_unit;
>> + virBuffer snapname = VIR_BUFFER_INITIALIZER;
>> + char *snapname_buff;
>
> = NULL;
>
> Since we can get to cleanup without ever initializing...
>
>> + rbd_image_t image = NULL;
>> +
>> + r = rbd_open(io, origvol, &image, NULL);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to open the RBD image %s"),
>> + origvol);
>> + goto cleanup;
>> + }
>> +
>> + r = virStorageBackendRBDImageInfo(image, origvol, &features, &stripe_unit,
>> + &stripe_count);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to get info from RBD image %s"),
>> + origvol);
>
> Don't overwrite error messages from virStorageBackendRBDImageInfo
>
> This then just becomes:
>
> if (virStorageBackendRBDImageInfo() < 0)
>
> Yes, I see this is where the commit message claim about returning
> VIR_ERR_OPERATION_UNSUPPORTED is sourced, but I don't think that's the
> right choice...
>
>
>> + goto cleanup;
>> + }
>> +
>> + /*
>> + * First we attempt to find a snapshot which has no differences between
>> + * the current state of the RBD image.
>> + *
>> + * This prevents us from creating a new snapshot for every clone operation
>> + * while it could be that the original volume has not changed
>> + */
>> + r = virStorageBackendRBDSnapshotFindNoDiff(image, origvol, &snapname);
>> + if (r < 0 && r != -ENOENT) {
>> + virReportSystemError(-r, _("failed to find snapshot for RBD image %s"),
>> + origvol);
>> + goto cleanup;
>> + }
>
> Same here regarding error message...
>
> Using my suggestions above "if (r < 0) goto cleanup;"
>
>> +
>> + /*
>> + * No such snapshot could be found, so we will create a new snapshot
>> + * and use that for cloning
>> + */
>> + if (r == -ENOENT) {
>
> again from my suggestion "if (r == 0)"
>
>> + VIR_DEBUG("No RBD snapshot with zero delta could be found for image %s",
>> + origvol);
>> +
>> + virBufferAsprintf(&snapname, "libvirt-%d", (int)time(NULL));
>
> Ewww... I assume you're shooting for unique name, why not use
> 'virRandomInt'.
>
>> +
>> + if (virBufferCheckError(&snapname) < 0)
>> + goto cleanup;
>> +
>> + snapname_buff = virBufferContentAndReset(&snapname);
>> +
>> + r = virStorageBackendRBDSnapshotCreate(image, origvol, snapname_buff);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to snapshot RBD image %s"),
>> + origvol);
>> + goto cleanup;
>> + }
>
> Same here regarding error message
>
>> + } else {
>> + snapname_buff = virBufferContentAndReset(&snapname);
>
> This path assumes snapname was generated via the virBufferAsprintf in
> the called function, but this path doesn't do the virBufferCheckError
> leaving the possibility that the virBufferAsprintf failed and then the
> snapname_buff could be "NULL" (if error was set), resulting in issues
> below...
>
>> +
>> + VIR_DEBUG("Found RBD snapshot %s with zero delta for image %s",
>> + snapname_buff, origvol);
>
> This is somewhat redundant with the VIR_DEBUG in the *NoDiff helper
>
>> + }
>> +
>> + VIR_DEBUG("Using snapshot name %s for cloning RBD image %s to %s",
>> + snapname_buff, origvol, newvol);
>> +
>> + /*
>> + * RBD snapshots have to be 'protected' before they can be used
>> + * as a parent snapshot for a child image
>> + */
>> + r = virStorageBackendRBDSnapshotProtect(image, origvol, snapname_buff);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to protect RBD snapshot %s@%s"),
>> + origvol, snapname_buff);
>> + goto cleanup;
>> + }
>
> Same here regarding error message.
>
>> +
>> + VIR_DEBUG("Performing RBD clone from %s to %s", origvol, newvol);
>> +
>> + r = rbd_clone2(io, origvol, snapname_buff, io, newvol, features, &order,
>> + stripe_unit, stripe_count);
>
> Seeing rbd_clone2 makes me wonder is there an rbd_clone? and if so,
> then what version does rbd_clone2 show up in? e.g. why no
> LIBRBD_VERSION_CODE syntax here?
>
> Why even get this far if rbd_clone2 isn't available?
>
>
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to clone RBD volume %s to %s"),
>> + origvol, newvol);
>> + goto cleanup;
>> + }
>> +
>> + VIR_DEBUG("Cloned RBD image %s to %s", origvol, newvol);
>> +
> ret = 0;
>
>> + cleanup:
>> + virBufferFreeAndReset(&snapname);
>> + VIR_FREE(snapname_buff);
>> +
>> + if (image)
>> + rbd_close(image);
>> +
>> + return r;
>
> return ret;
>
>> +}
>> +
>
> Since the code relies upon rbd_clone2, rather than go through all the
> effort in the code only to fail because the right function isn't
> available - check once, early on and be done. I used 266 here because
> it's the *iterate2 - if "rbd_clone2" is later, then that version should
> be used. Just be sure to document your choice.
>
>
> #if LIBRBD_VERSION_CODE > 266
>
>> +static int virStorageBackendRBDBuildVolFrom(virConnectPtr conn,
>> + virStoragePoolObjPtr pool,
>> + virStorageVolDefPtr newvol,
>> + virStorageVolDefPtr origvol,
>> + unsigned int flags)
>
> static int
> virStorage...
>
>> +{
>> + virStorageBackendRBDState ptr;
>> + ptr.cluster = NULL;
>> + ptr.ioctx = NULL;
>> + int r = -1;
>
> Add a 'int ret = -1;'
>
>> +
>> + VIR_DEBUG("Creating clone of RBD image %s/%s with name %s",
>> + pool->def->source.name, origvol->name, newvol->name);
>> +
>> + virCheckFlags(0, -1);
>> +
>> + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0)
>> + goto cleanup;
>> +
>> + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0)
>> + goto cleanup;
>> +
>> + r = virStorageBackendRBDCloneImage(ptr.ioctx, origvol->name, newvol->name);
>> + if (r < 0) {
>> + virReportSystemError(-r, _("failed to clone volume '%s/%s' to %s"),
>> + pool->def->source.name, origvol->name,
>> + newvol->name);
>
> This will replicate or overwrite an error from called function? Causing
> the user to look deeper in error log history... If you really need the
> pool name, then pass it since you're passing the other two.
>
> Thus this could just become:
>
> if (virStorage*() < 0)
>
>> + goto cleanup;
>> + }
>
> ret = 0;
>
>> +
>> + cleanup:
>> + virStorageBackendRBDCloseRADOSConn(&ptr);
>> + return r;
>
> return ret;
>
>> +}
>> +
>
> #else
>
> static int
> virStorageBackendRBDBuildVolFrom(virConnectPtr conn ATTRIBUTE_UNUSED,
> virStoragePoolObjPtr pool ATTRIBUTE_UNUSED,
> virStorageVolDefPtr newvol
> ATTRIBUTE_UNUSED,
> virStorageVolDefPtr origvol
> ATTRIBUTE_UNUSED,
> unsigned int flags ATTRIBUTE_UNUSED)
> {
> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> _(""))
>
> Devise an appropriate error message...
>
> return -1;
>
> }
>
> #endif
>
>
> John
>> static int virStorageBackendRBDRefreshVol(virConnectPtr conn,
>> virStoragePoolObjPtr pool ATTRIBUTE_UNUSED,
>> virStorageVolDefPtr vol)
>> @@ -793,6 +1132,7 @@ virStorageBackend virStorageBackendRBD = {
>> .refreshPool = virStorageBackendRBDRefreshPool,
>> .createVol = virStorageBackendRBDCreateVol,
>> .buildVol = virStorageBackendRBDBuildVol,
>> + .buildVolFrom = virStorageBackendRBDBuildVolFrom,
>> .refreshVol = virStorageBackendRBDRefreshVol,
>> .deleteVol = virStorageBackendRBDDeleteVol,
>> .wipeVol = virStorageBackendRBDVolWipe,
>>
More information about the libvir-list
mailing list