[libvirt] [PATCH 1/2] rbd: Add wiping RBD volumes by using rbd_discard() or rbd_write()

John Ferlan jferlan at redhat.com
Thu Jan 21 19:10:02 UTC 2016



On 12/23/2015 10:06 AM, Wido den Hollander wrote:
> This allows users to use the volume wiping functionality of the libvirt
> storage driver.
> 
> This patch also adds a new wiping algorithm VIR_STORAGE_VOL_WIPE_ALG_DISCARD
> 
> By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with
> RBD this will call rbd_write() in chunks of the underlying object size
> to completely zero out the volume.
> 
> With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the
> same object size chunks which will trim/discard all underlying RADOS objects
> in the Ceph cluster.
> 
> Signed-off-by: Wido den Hollander <wido at widodh.nl>
> ---
>  include/libvirt/libvirt-storage.h |   4 +
>  src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++-
>  tools/virsh-volume.c              |   2 +-
>  3 files changed, 159 insertions(+), 2 deletions(-)
> 

I found these buried in my to-do list of things to look at during the
holiday break. I figure bumping the thread will bring it back into focus.
Semantically speaking, this patch is a v2 of the original patch series.

I'm still a bit conflicted about whether to add a new option to Wipe or
to develop a new API. I see value in both options. Perhaps thinking of
this as "trim" rather than "discard" would make it more palatable as a
wipe algorithm. As a new API, each backend driver could decide whether
it supports the discard/trim option, but that's quite a bit more work
(essentially mimicking the Wipe functionality, but issuing a trim instead).
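For comparison, here is a minimal sketch of how the new-API route could let each backend opt in: the backend table carries an optional trim callback, and the dispatcher reports "unsupported" when it is NULL. All names here (fake_backend, trimVol, fake_rbd_trim, dispatch_trim) are hypothetical and are not part of libvirt.

```c
#include <stdio.h>

/* Hypothetical per-backend table: a backend that cannot trim
 * simply leaves trimVol as NULL. */
struct fake_backend {
    const char *name;
    int (*trimVol)(const char *volname);
};

/* Hypothetical RBD implementation; a real one would loop over
 * rbd_discard() in object-size chunks. */
static int
fake_rbd_trim(const char *volname)
{
    printf("trimming %s\n", volname);
    return 0;
}

/* Dispatcher for the hypothetical API.
 * Returns 0 on success, -1 if the backend failed,
 * and -2 if the backend has no trim support at all. */
static int
dispatch_trim(const struct fake_backend *backend, const char *volname)
{
    if (!backend->trimVol) {
        fprintf(stderr, "storage backend '%s' does not support trim\n",
                backend->name);
        return -2;
    }

    return backend->trimVol(volname) < 0 ? -1 : 0;
}

/* One backend that opts in, one that does not. */
static const struct fake_backend rbd_backend = { "rbd", fake_rbd_trim };
static const struct fake_backend dir_backend = { "dir", NULL };
```

The extra work mentioned above is everything around this: the public API, RPC plumbing, and virsh command, on top of each backend's implementation.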

I'll note up front that if we go with adding a new wipe algorithm and
update virsh-volume.c to recognize it, then virsh.pod would also need
an update to describe it.

Also, rather than one patch here, I suggest smaller individual patches
to make it easier to debug issues down the line with git bisect. I see
perhaps four patches:

Patch 1:
You probably want to start by adjusting virStorageBackendVolWipeLocal.
In particular, the switch statement there needs some tweaking: first to
use the "switch ((virStorageVolWipeAlgorithm) algorithm) {" construct,
and second to fix a 'bug' I just noticed in the current design. If the
current 'default:' case is taken, the code reports an error but still
attempts the SCRUB command (which will then fail with a different error).
BTW: instead of 'default:' it would be the *_LAST case. If you're really
ambitious, adding a check for the "expected" 'flags' bits would also be
beneficial, especially since you'll be adding one.
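A minimal sketch of the Patch 1 shape, assuming a stand-in copy of the existing enum values and a hypothetical pick_scrub_pattern() helper (not libvirt code). It shows the typed switch, an explicit _LAST case instead of 'default:', an error path that returns before anything would run scrub, and a flags check; fprintf() stands in for virReportError().

```c
#include <stdio.h>

/* Stand-in copy of the existing enum values; the real definition is in
 * include/libvirt/libvirt-storage.h. */
typedef enum {
    VIR_STORAGE_VOL_WIPE_ALG_ZERO = 0,
    VIR_STORAGE_VOL_WIPE_ALG_NNSA = 1,
    VIR_STORAGE_VOL_WIPE_ALG_DOD = 2,
    VIR_STORAGE_VOL_WIPE_ALG_BSI = 3,
    VIR_STORAGE_VOL_WIPE_ALG_GUTMANN = 4,
    VIR_STORAGE_VOL_WIPE_ALG_SCHNEIER = 5,
    VIR_STORAGE_VOL_WIPE_ALG_PFITZNER7 = 6,
    VIR_STORAGE_VOL_WIPE_ALG_PFITZNER33 = 7,
    VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8,
    VIR_STORAGE_VOL_WIPE_ALG_LAST
} virStorageVolWipeAlgorithm;

/* Hypothetical helper: choose the scrub pattern name for 'algorithm'.
 * *pattern is set to NULL for ZERO (no scrub needed). Any unsupported
 * algorithm or unexpected flag returns -1, so the caller never goes on
 * to run scrub after an error has been reported. */
static int
pick_scrub_pattern(unsigned int algorithm, unsigned int flags,
                   const char **pattern)
{
    /* No wipe flags are defined yet; stands in for virCheckFlags(0, -1). */
    if (flags != 0) {
        fprintf(stderr, "unsupported flags 0x%x\n", flags);
        return -1;
    }

    *pattern = NULL;

    switch ((virStorageVolWipeAlgorithm) algorithm) {
    case VIR_STORAGE_VOL_WIPE_ALG_ZERO:
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_NNSA:
        *pattern = "nnsa";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_DOD:
        *pattern = "dod";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_BSI:
        *pattern = "bsi";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_GUTMANN:
        *pattern = "gutmann";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_SCHNEIER:
        *pattern = "schneier";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_PFITZNER7:
        *pattern = "pfitzner7";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_PFITZNER33:
        *pattern = "pfitzner33";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_RANDOM:
        *pattern = "random";
        return 0;
    case VIR_STORAGE_VOL_WIPE_ALG_LAST:
        break;
    }

    /* Reached for _LAST and for any out-of-range value. */
    fprintf(stderr, "unsupported algorithm %u\n", algorithm);
    return -1;
}
```

Because there is no 'default:', adding a new enum value without a matching case triggers a -Wswitch warning.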

Patch 2:
Add wipe support for rbd with the Zero algorithm. This gives a base.
The switch in virStorageBackendRBDVolWipe could still remain, but the
flags would only allow _ZERO.

Patch 3:
Then add the 'trim' option to libvirt-storage.h, virsh-volume.c, and
virsh.pod.

Patch 4:
This patch would add the 'trim' support to the backend. Also grab
virStorageBackendRBDImageInfo from patch 2. You're making the same
'stripe_count' call in this patch, but without the same checks. If
you're concerned about extra, unnecessary calls, you could allow the 3
return parameters to be NULL and then guard each fetch with an
"if (param)" check. The caller could then pass NULL for features and
unit if it doesn't care about them.


> diff --git a/include/libvirt/libvirt-storage.h b/include/libvirt/libvirt-storage.h
> index 2c55c93..139add3 100644
> --- a/include/libvirt/libvirt-storage.h
> +++ b/include/libvirt/libvirt-storage.h
> @@ -153,6 +153,10 @@ typedef enum {
>  
>      VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */
>  
> +    VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the
> +                                                     volume by using TRIM or
> +                                                     DISCARD */

Assuming we use wipe, I think the algorithm should be "TRIM", with the
option described as "trimming" the contents of the volume. Whether that
means sparse files, thin/sparse logical volumes, or rbd object
discarding, those other backends aren't your problem to solve here,
unless you want to make those changes too. Also, the 2nd/3rd comment
lines should line up under "1-pass".
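Roughly what the suggested entry could look like, assuming the TRIM name. Only the tail of the enum is reproduced, and the real header's "# ifdef VIR_ENUM_SENTINELS" guard around _LAST is omitted from this sketch. The continuation lines of the comment start directly under "1-pass".

```c
typedef enum {
    VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */

    VIR_STORAGE_VOL_WIPE_ALG_TRIM = 9, /* 1-pass, trim all data on the
                                          volume by using TRIM or
                                          DISCARD */

    VIR_STORAGE_VOL_WIPE_ALG_LAST
} virStorageVolWipeAlgorithm;
```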

> +
>  # ifdef VIR_ENUM_SENTINELS
>      VIR_STORAGE_VOL_WIPE_ALG_LAST
>      /*
> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
> index cdbfdee..d13658d 100644
> --- a/src/storage/storage_backend_rbd.c
> +++ b/src/storage/storage_backend_rbd.c
> @@ -32,6 +32,7 @@
>  #include "base64.h"
>  #include "viruuid.h"
>  #include "virstring.h"
> +#include "virutil.h"

I believe this isn't necessary; I was able to remove it without issue.

>  #include "rados/librados.h"
>  #include "rbd/librbd.h"
>  
> @@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED,
>      return ret;
>  }
>  
> +static int virStorageBackendRBDVolWipeZero(rbd_image_t image,
> +                                           char *imgname,
> +                                           rbd_image_info_t info,
> +                                           uint64_t stripe_count)

The newer libvirt convention is to put the return type on its own line:

static int
virStorage...

> +{
> +    int r = -1;

Add:

    int ret = -1;

The convention is usually 'ret' rather than just 'r'. Keeping 'r' for
rbd_*() call failures is fine, though, since it holds (and may be used
to report) the rbd_*-specific API error.

> +    size_t offset = 0;
> +    uint64_t length;
> +    char *writebuf;
> +
> +    if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0)
> +        goto cleanup;
> +
> +    while (offset < info.size) {
> +        length = MIN((info.size - offset), (info.obj_size * stripe_count));
> +
> +        r = rbd_write(image, offset, length, writebuf);
> +        if (r < 0) {
> +            virReportSystemError(-r, _("writing %llu bytes failed on "
> +                                       " RBD image %s at offset %llu"),

This generates a double space: "... failed on  RBD image..."

> +                                     (unsigned long long)length,
> +                                     imgname,
> +                                     (unsigned long long)offset);

So is length a "uint64_t" or not? I do note that librbd.h declares it
as a "size_t". My real question is why cast to (unsigned long long),
other than to match the %llu (of course).

As for offset, IIRC the convention for a size_t is "%zu", although for
this one I note that librbd.h declares it as a "uint64_t".
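To make the type question concrete, here is a small sketch that formats the same message two ways: with type-exact specifiers (%zu for the size_t length, PRIu64 for the uint64_t offset) and with the patch's cast-to-%llu approach. Both produce identical text, and the message has a single space before "RBD". The function names are made up for illustration, and whether PRIu64 is acceptable inside libvirt's translatable _() strings is a project-convention question this sketch doesn't settle.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* length is a size_t and offset a uint64_t, matching the len/ofs
 * parameters of librbd's rbd_write(). */

/* Variant 1: type-exact specifiers, no casts. */
static int
format_exact(char *buf, size_t buflen, size_t length,
             const char *imgname, uint64_t offset)
{
    return snprintf(buf, buflen,
                    "writing %zu bytes failed on RBD image %s at offset %"
                    PRIu64, length, imgname, offset);
}

/* Variant 2: what the patch does, casting both values to
 * unsigned long long so a single %llu works for either type. */
static int
format_cast(char *buf, size_t buflen, size_t length,
            const char *imgname, uint64_t offset)
{
    return snprintf(buf, buflen,
                    "writing %llu bytes failed on RBD image %s at offset %llu",
                    (unsigned long long)length, imgname,
                    (unsigned long long)offset);
}
```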

> +            goto cleanup;
> +        }
> +
> +        VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu",
> +                  (unsigned long long)length,
> +                  imgname, (unsigned long long)offset);

Similar comments regarding the casts and the variable types.
> +
> +        offset += length;
> +    }

Here you would add:

    ret = 0;

> +
> + cleanup:

writebuf is leaked; you need a VIR_FREE(writebuf) here.

> +    return r;

and this becomes return ret;

> +}
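Pulling the comments on this function together, here is a minimal, self-contained sketch of the corrected zeroing loop. To keep it runnable without a Ceph cluster it uses a hypothetical fake_rbd_write() stand-in, plain calloc()/free() in place of VIR_ALLOC_N()/VIR_FREE(), and fprintf() in place of virReportSystemError(). It also takes info.size and info.obj_size as plain values rather than the whole rbd_image_info_t. The chunk == 0 guard is an extra assumption of this sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#ifndef MIN
# define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical stand-in for librbd's
 *   ssize_t rbd_write(rbd_image_t image, uint64_t ofs, size_t len,
 *                     const char *buf);
 * It only totals the bytes so the loop can run without a cluster. */
static uint64_t fake_written;

static ssize_t
fake_rbd_write(uint64_t ofs, size_t len, const char *buf)
{
    (void)ofs;
    (void)buf;
    fake_written += len;
    return (ssize_t)len;
}

/* Zero 'size' bytes in chunks of obj_size * stripe_count bytes.
 * 'r' keeps the librbd result for the error message, 'ret' is the
 * function's result, and writebuf is freed on every path. */
static int
wipe_zero(const char *imgname, uint64_t size, uint64_t obj_size,
          uint64_t stripe_count)
{
    int ret = -1;
    ssize_t r;
    uint64_t offset = 0;
    uint64_t chunk = obj_size * stripe_count;
    size_t length;
    char *writebuf = NULL;

    if (chunk == 0)
        goto cleanup;

    /* calloc() zero-fills the buffer, as VIR_ALLOC_N does in libvirt. */
    if (!(writebuf = calloc(chunk, 1)))
        goto cleanup;

    while (offset < size) {
        length = MIN(size - offset, chunk);

        if ((r = fake_rbd_write(offset, length, writebuf)) < 0) {
            fprintf(stderr, "writing %zu bytes failed on RBD image %s "
                    "at offset %llu: %s\n", length, imgname,
                    (unsigned long long)offset, strerror((int)-r));
            goto cleanup;
        }

        offset += length;
    }

    ret = 0;

 cleanup:
    free(writebuf);
    return ret;
}
```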
> +
> +static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image,
> +                                              char *imgname,
> +                                              rbd_image_info_t info,
> +                                              uint64_t stripe_count)

static int
virStorage...

> +{
> +    int r = -1;

Need int ret = -1

> +    size_t offset = 0;
> +    uint64_t length;
> +
> +    VIR_DEBUG("Wiping RBD %s volume using discard)", imgname);
> +
> +    while (offset < info.size) {
> +        length = MIN((info.size - offset), (info.obj_size * stripe_count));
> +
> +        r = rbd_discard(image, offset, length);

rbd_discard also declares 'offset' as a uint64_t.

> +        if (r < 0) {
> +            virReportSystemError(-r, _("discarding %llu bytes failed on "
> +                                       " RBD image %s at offset %llu"),

Similar to *Zero, you'll get "...failed on  RBD image..."

> +                                     (unsigned long long)length,
> +                                     imgname,
> +                                     (unsigned long long)offset);

Similar comments regarding the casts of length and offset.

> +            goto cleanup;
> +        }
> +
> +        VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu",
> +                  (unsigned long long)length,
> +                  imgname, (unsigned long long)offset);

Similar comments regarding the casts.

> +
> +        offset += length;
> +    }

Here you would add:

    ret = 0;

> +
> + cleanup:
> +    return r;

And return ret;

> +}
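The discard variant needs no buffer, and because librbd declares rbd_discard() with uint64_t for both offset and length, both loop variables can simply be uint64_t. A minimal sketch, assuming a hypothetical fake_rbd_discard() that records each call, and a caller that passes chunk = info.obj_size * stripe_count:

```c
#include <errno.h>
#include <stdint.h>

#ifndef MIN
# define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical stand-in for librbd's
 *   int rbd_discard(rbd_image_t image, uint64_t ofs, uint64_t len);
 * It records each call so the chunk boundaries can be checked. */
#define MAX_CALLS 16
static uint64_t call_ofs[MAX_CALLS];
static uint64_t call_len[MAX_CALLS];
static int ncalls;

static int
fake_rbd_discard(uint64_t ofs, uint64_t len)
{
    if (ncalls == MAX_CALLS)
        return -ENOSPC;     /* fake failure: the call log is full */
    call_ofs[ncalls] = ofs;
    call_len[ncalls] = len;
    ncalls++;
    return 0;
}

/* Discard 'size' bytes in chunks of 'chunk' bytes. */
static int
wipe_discard(uint64_t size, uint64_t chunk)
{
    int ret = -1;
    int r;
    uint64_t offset = 0;
    uint64_t length;

    if (chunk == 0)
        goto cleanup;

    while (offset < size) {
        length = MIN(size - offset, chunk);

        if ((r = fake_rbd_discard(offset, length)) < 0)
            goto cleanup;   /* real code: virReportSystemError(-r, ...) */

        offset += length;
    }

    ret = 0;

 cleanup:
    return ret;
}
```

For example, a 10-byte image with 4-byte chunks produces discards at (0, 4), (4, 4) and (8, 2); the last chunk is clamped by the MIN().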
> +
> +static int virStorageBackendRBDVolWipe(virConnectPtr conn,
> +                                       virStoragePoolObjPtr pool,
> +                                       virStorageVolDefPtr vol,
> +                                       unsigned int algorithm,
> +                                       unsigned int flags)

static int
virStorage...

> +{
> +    virStorageBackendRBDState ptr;
> +    ptr.cluster = NULL;
> +    ptr.ioctx = NULL;
> +    rbd_image_t image = NULL;
> +    rbd_image_info_t info;
> +    uint64_t stripe_count;
> +    int r = -1;

Add
    int ret = -1;


> +
> +    virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO |
> +                  VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1);
> +
> +    VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name);
> +
> +    if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0)
> +        goto cleanup;
> +
> +    if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0)
> +        goto cleanup;
> +
> +    r = rbd_open(ptr.ioctx, vol->name, &image, NULL);
> +    if (r < 0) {

BTW: This can be written as:

    if ((r = rbd_open(ptr.ioctx, vol->name, &image, NULL)) < 0) {

The same applies to all the rbd_* calls.
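A tiny runnable illustration of the assign-and-test idiom, using a hypothetical fake_rbd_open() that, like librbd calls, returns 0 or a negative errno. Note that the inner parentheses are required, not just stylistic: '<' binds tighter than '=', so without them 'r' would receive the result of the comparison instead of the error code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an rbd_*() call. */
static int
fake_rbd_open(const char *name)
{
    return strcmp(name, "missing") == 0 ? -ENOENT : 0;
}

static int
open_image(const char *name)
{
    int r;

    /* Assign, then test the assigned value, in one condition. */
    if ((r = fake_rbd_open(name)) < 0) {
        fprintf(stderr, "failed to open the RBD image %s: %s\n",
                name, strerror(-r));
        return -1;
    }

    return 0;
}
```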

> +        virReportSystemError(-r, _("failed to open the RBD image %s"),
> +                             vol->name);
> +        goto cleanup;
> +    }
> +
> +    r = rbd_stat(image, &info, sizeof(info));
> +    if (r < 0) {
> +        virReportSystemError(-r, _("failed to stat the RBD image %s"),
> +                             vol->name);
> +        goto cleanup;
> +    }
> +
> +    r = rbd_get_stripe_count(image, &stripe_count);
> +    if (r < 0) {
> +        virReportSystemError(-r, _("failed to get stripe count of RBD image %s"),
> +                             vol->name);
> +        goto cleanup;
> +    }

I see the subsequent patch has some extra checks before calling this.
Why wouldn't those also need to be made here?

> +
> +    VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s",
> +              (unsigned long long)info.size, pool->def->source.name, vol->name);
> +
> +    switch (algorithm) {

Follow the convention of

  "switch ((virStorageVolWipeAlgorithm) algorithm) {"

Then each "case" lines up under "switch".

> +        case VIR_STORAGE_VOL_WIPE_ALG_ZERO:
> +            r = virStorageBackendRBDVolWipeZero(image, vol->name,
> +                                                info, stripe_count);

I would change this (and the next one) to:

    if (virStorageBackendRBDVolWipeZero(image, vol->name,
                                        info, stripe_count) < 0)
        goto cleanup;

Also, I ran these patches through Coverity; it complains that 'info'
(160 bytes) is passed by value. Although neither function modifies it,
why not just pass "info.size" and "info.obj_size", or pass the whole
'info' by reference (just to be safe)?

> +            break;
> +        case VIR_STORAGE_VOL_WIPE_ALG_DISCARD:
> +            r = virStorageBackendRBDVolWipeDiscard(image, vol->name,
> +                                                   info, stripe_count);
> +            break;
> +        default:

And list each allowed case explicitly, so it's clearer. That way, if
someone comes along in the future and adds ALG_ONE, the compiler catches
the missing case and the rbd code isn't forgotten.

> +            virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"),
> +                           algorithm);
> +            r = -VIR_ERR_INVALID_ARG;

This will be unnecessary...

> +            goto cleanup;
> +    }
> +
> +    if (r < 0) {
> +        virReportSystemError(-r, _("failed to wipe RBD image %s"),
> +                             vol->name);

This overwrites the errors already reported by *WipeZero and *WipeDiscard.

> +        goto cleanup;
> +    }

Then, on success:

    ret = 0;

> +
> + cleanup:
> +    if (image)
> +        rbd_close(image);
> +
> +    virStorageBackendRBDCloseRADOSConn(&ptr);
> +    return r;

return ret;

> +}
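Putting the comments on virStorageBackendRBDVolWipe together, a minimal sketch of the dispatch might look like the following. It assumes the algorithm is renamed to TRIM (value 9) and uses a stand-in enum, a reduced stand-in for rbd_image_info_t passed by pointer, and hypothetical wipe_zero()/wipe_trim() helpers; fprintf() stands in for virReportError(). The up-front range check is an assumption of this sketch: without a 'default:' label, an out-of-range value would match no case and fall through to ret = 0.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in enum: existing values plus the proposed TRIM entry. */
typedef enum {
    VIR_STORAGE_VOL_WIPE_ALG_ZERO = 0,
    VIR_STORAGE_VOL_WIPE_ALG_NNSA = 1,
    VIR_STORAGE_VOL_WIPE_ALG_DOD = 2,
    VIR_STORAGE_VOL_WIPE_ALG_BSI = 3,
    VIR_STORAGE_VOL_WIPE_ALG_GUTMANN = 4,
    VIR_STORAGE_VOL_WIPE_ALG_SCHNEIER = 5,
    VIR_STORAGE_VOL_WIPE_ALG_PFITZNER7 = 6,
    VIR_STORAGE_VOL_WIPE_ALG_PFITZNER33 = 7,
    VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8,
    VIR_STORAGE_VOL_WIPE_ALG_TRIM = 9,
    VIR_STORAGE_VOL_WIPE_ALG_LAST
} virStorageVolWipeAlgorithm;

/* Stand-in for rbd_image_info_t, reduced to the fields the helpers use. */
struct fake_info {
    uint64_t size;
    uint64_t obj_size;
};

/* Hypothetical helpers; real ones would report their own errors.
 * Setting fake_fail forces them to fail. */
static int fake_fail;

static int
wipe_zero(const struct fake_info *info, uint64_t stripe_count)
{
    (void)info;
    (void)stripe_count;
    return fake_fail ? -1 : 0;
}

static int
wipe_trim(const struct fake_info *info, uint64_t stripe_count)
{
    (void)info;
    (void)stripe_count;
    return fake_fail ? -1 : 0;
}

static int
vol_wipe(unsigned int algorithm, const struct fake_info *info,
         uint64_t stripe_count)
{
    int ret = -1;

    if (algorithm >= VIR_STORAGE_VOL_WIPE_ALG_LAST) {
        fprintf(stderr, "unsupported algorithm %u\n", algorithm);
        goto cleanup;
    }

    switch ((virStorageVolWipeAlgorithm) algorithm) {
    case VIR_STORAGE_VOL_WIPE_ALG_ZERO:
        if (wipe_zero(info, stripe_count) < 0)
            goto cleanup;   /* error already reported; don't overwrite */
        break;
    case VIR_STORAGE_VOL_WIPE_ALG_TRIM:
        if (wipe_trim(info, stripe_count) < 0)
            goto cleanup;
        break;
    case VIR_STORAGE_VOL_WIPE_ALG_NNSA:
    case VIR_STORAGE_VOL_WIPE_ALG_DOD:
    case VIR_STORAGE_VOL_WIPE_ALG_BSI:
    case VIR_STORAGE_VOL_WIPE_ALG_GUTMANN:
    case VIR_STORAGE_VOL_WIPE_ALG_SCHNEIER:
    case VIR_STORAGE_VOL_WIPE_ALG_PFITZNER7:
    case VIR_STORAGE_VOL_WIPE_ALG_PFITZNER33:
    case VIR_STORAGE_VOL_WIPE_ALG_RANDOM:
    case VIR_STORAGE_VOL_WIPE_ALG_LAST:
        fprintf(stderr, "unsupported algorithm %u\n", algorithm);
        goto cleanup;
    }

    ret = 0;

 cleanup:
    /* rbd_close() and closing the RADOS connection would go here. */
    return ret;
}
```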
> +
>  virStorageBackend virStorageBackendRBD = {
>      .type = VIR_STORAGE_POOL_RBD,
>  
> @@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = {
>      .buildVol = virStorageBackendRBDBuildVol,
>      .refreshVol = virStorageBackendRBDRefreshVol,
>      .deleteVol = virStorageBackendRBDDeleteVol,
> -    .resizeVol = virStorageBackendRBDResizeVol,
> +    .wipeVol = virStorageBackendRBDVolWipe,
> +    .resizeVol = virStorageBackendRBDResizeVol

There's no need to remove the ","; keeping it means the only diff is the added line.
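As a small illustration, C allows a trailing comma after the last member in an initializer list, so a table written this way grows by one-line diffs. The fake_backend struct and op_ok() below are hypothetical stand-ins for the virStorageBackend table and its callbacks.

```c
/* Hypothetical stand-in for the virStorageBackend table. */
typedef int (*fake_vol_op)(const char *vol);

struct fake_backend {
    fake_vol_op deleteVol;
    fake_vol_op resizeVol;
    fake_vol_op wipeVol;
};

static int
op_ok(const char *vol)
{
    (void)vol;
    return 0;
}

/* Every member line, including the last, ends in a comma, so adding
 * another member later does not touch the line before it. */
static const struct fake_backend backend = {
    .deleteVol = op_ok,
    .wipeVol = op_ok,
    .resizeVol = op_ok,
};
```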

>  };
> diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c
> index 7932ef2..3e95aa5 100644
> --- a/tools/virsh-volume.c
> +++ b/tools/virsh-volume.c
> @@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = {
>  VIR_ENUM_DECL(virStorageVolWipeAlgorithm)
>  VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST,
>                "zero", "nnsa", "dod", "bsi", "gutmann", "schneier",
> -              "pfitzner7", "pfitzner33", "random");
> +              "pfitzner7", "pfitzner33", "random", "discard");

I think "trim" will be better.
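Since the virsh strings are indexed by enum value, "trim" has to be appended in the position matching value 9. A rough sketch of the lookup, using a plain array as a hypothetical stand-in for what VIR_ENUM_IMPL() generates (wipe_alg_names and wipe_alg_from_string are made-up names):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the VIR_ENUM_IMPL() string table,
 * indexed by enum value, with "trim" at index 9. */
static const char *const wipe_alg_names[] = {
    "zero", "nnsa", "dod", "bsi", "gutmann", "schneier",
    "pfitzner7", "pfitzner33", "random", "trim",
};

#define N_WIPE_ALGS (sizeof(wipe_alg_names) / sizeof(wipe_alg_names[0]))

/* Return the enum value for a virsh --algorithm string, or -1. */
static int
wipe_alg_from_string(const char *name)
{
    size_t i;

    for (i = 0; i < N_WIPE_ALGS; i++) {
        if (strcmp(wipe_alg_names[i], name) == 0)
            return (int)i;
    }

    return -1;
}
```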


John
>  
>  static bool
>  cmdVolWipe(vshControl *ctl, const vshCmd *cmd)
> 



