[libvirt] [PATCH 1/2] rbd: Add wiping RBD volumes by using rbd_discard() or rbd_write()
John Ferlan
jferlan at redhat.com
Thu Jan 21 19:10:02 UTC 2016
On 12/23/2015 10:06 AM, Wido den Hollander wrote:
> This allows user to use the volume wiping functionality of the libvirt
> storage driver.
>
> This patch also adds a new wiping algorithm VIR_STORAGE_VOL_WIPE_ALG_DISCARD
>
> By default the VIR_STORAGE_VOL_WIPE_ALG_ZERO algorithm is used and with
> RBD this will called rbd_write() in chunks of the underlying object size
> to completely zero out the volume.
>
> With VIR_STORAGE_VOL_WIPE_ALG_DISCARD it will call rbd_discard() in the
> same object size chunks which will trim/discard all underlying RADOS objects
> in the Ceph cluster.
>
> Signed-off-by: Wido den Hollander <wido at widodh.nl>
> ---
> include/libvirt/libvirt-storage.h | 4 +
> src/storage/storage_backend_rbd.c | 155 +++++++++++++++++++++++++++++++++++++-
> tools/virsh-volume.c | 2 +-
> 3 files changed, 159 insertions(+), 2 deletions(-)
>
Found these buried in my todo list of things to look at from the
holiday break. I figure bumping the thread will bring it back into focus...
"Semantically" speaking - this patch is a v2 of the original patch series...
I'm still a bit conflicted whether to add a new option to Wipe or
whether a new API should be developed. I see value in both options.
Although perhaps thinking of this as "trim" and not "discard" could make
it more palatable for wipe. As a new API, each backend driver could
decide whether it supports the discard/trim option, but that's quite a
bit more work (essentially mimic the Wipe functionality, but generate Trim).
I'll note off the top that if we go with adding a new wipe algorithm and
we've updated virsh-volume.c to recognize that, then virsh.pod would
also need an update to describe it.
Also rather than one patch here - I suggest smaller individual patches
to make it easier to debug issues down the line when using git bisect. I
see perhaps 4 patches...
Patch 1:
You probably want to start by adjusting virStorageBackendVolWipeLocal.
In particular, the switch statement there needs some tweaking - first to
use the "switch ((virStorageVolWipeAlgorithm) algorithm) {" construct,
but also fixing a 'bug' I just noted in the current design. If the
current 'default:' option is taken, the code reports an error, but still
attempts the SCRUB command (which will return/cause a different error).
BTW: Instead of 'default:' it would be the *_LAST case... If you're really
ambitious, adding a check for the "expected" 'flags' bits would also be
beneficial, especially since you'll be adding one.
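To illustrate the Patch 1 suggestion, here is a minimal sketch and not the actual virStorageBackendVolWipeLocal code. The enum, the function name, and the pattern strings are hypothetical stand-ins. It only shows the typed switch with a *_LAST case, and an error path that returns before any scrub command could be attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for virStorageVolWipeAlgorithm (fewer members). */
typedef enum {
    SKETCH_WIPE_ALG_ZERO = 0,
    SKETCH_WIPE_ALG_NNSA,
    SKETCH_WIPE_ALG_RANDOM,
    SKETCH_WIPE_ALG_LAST
} sketchWipeAlgorithm;

/*
 * Selects a pattern for a later scrub step. Returns 0 and sets *pattern
 * on success. Returns -1 and leaves *pattern NULL on error, so the
 * caller has nothing to run after a failure.
 */
static int
sketchWipeSelect(unsigned int algorithm, const char **pattern)
{
    assert(pattern != NULL);
    *pattern = NULL;

    /* The cast lets the compiler warn (-Wswitch) about unhandled members. */
    switch ((sketchWipeAlgorithm) algorithm) {
    case SKETCH_WIPE_ALG_ZERO:
        *pattern = "zero";      /* placeholder string */
        return 0;
    case SKETCH_WIPE_ALG_NNSA:
        *pattern = "nnsa";      /* placeholder string */
        return 0;
    case SKETCH_WIPE_ALG_RANDOM:
        *pattern = "random";    /* placeholder string */
        return 0;
    case SKETCH_WIPE_ALG_LAST:
        /* The error would be reported here; return rather than carry on. */
        return -1;
    }

    /* Values outside the enum range match no case above. */
    return -1;
}
```

Because there is no 'default:' label, adding a new enum member without a matching case draws a compiler warning when -Wswitch is enabled.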
Patch 2:
Add wipe support for rbd and add the Zero algorithm. This gives a base.
The switch in virStorageBackendRBDVolWipe could still remain, but the
Flags would only be for _ZERO
Patch 3:
Then add the 'trim' option to libvirt-storage.h, virsh-volume.c, and
virsh.pod...
Patch 4:
This patch would add the 'trim' support to the backend. Also grab
virStorageBackendRBDImageInfo from patch 2. You're making the same
'stripe_count' call in this patch, but don't have the same checks. If
you're concerned about the perhaps extra unnecessary calls you could
allow the 3 return parameters to be NULL, then prior to fetching do "if
(param)" type trick. The caller could then provide a NULL if they don't
care about features and unit...
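The "if (param)" trick for Patch 4 could look roughly like the sketch below. Everything in it is hypothetical: the struct stands in for whatever librbd reports, and the function name is not the one from the series. The query counter exists only to show that values the caller did not ask for are never fetched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image description standing in for the librbd queries. */
typedef struct {
    uint64_t features;
    uint64_t stripe_unit;
    uint64_t stripe_count;
    unsigned int queries;   /* counts how many values were fetched */
} sketchImage;

/*
 * Each out-parameter may be NULL. A value is fetched only when the
 * caller supplied somewhere to store it. Always returns 0 here; a real
 * helper would return -1 when one of the underlying calls fails.
 */
static int
sketchImageInfo(sketchImage *image,
                uint64_t *features,
                uint64_t *stripe_unit,
                uint64_t *stripe_count)
{
    assert(image != NULL);

    if (features) {
        *features = image->features;
        image->queries++;
    }
    if (stripe_unit) {
        *stripe_unit = image->stripe_unit;
        image->queries++;
    }
    if (stripe_count) {
        *stripe_count = image->stripe_count;
        image->queries++;
    }
    return 0;
}
```

A wipe caller that only cares about the stripe count would pass NULL for the other two parameters and pay for a single query.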
> diff --git a/include/libvirt/libvirt-storage.h b/include/libvirt/libvirt-storage.h
> index 2c55c93..139add3 100644
> --- a/include/libvirt/libvirt-storage.h
> +++ b/include/libvirt/libvirt-storage.h
> @@ -153,6 +153,10 @@ typedef enum {
>
> VIR_STORAGE_VOL_WIPE_ALG_RANDOM = 8, /* 1-pass random */
>
> + VIR_STORAGE_VOL_WIPE_ALG_DISCARD = 9, /* 1-pass, discard all data on the
> + volume by using TRIM or
> + DISCARD */
Assuming we use wipe, I think the name should be "TRIM", with the option
described as "trimming" the contents of the volume, whether that's sparse
files, thin/sparse logical volumes, or rbd object discarding... Those
other cases aren't your problem to solve here, unless you have the desire
to make those changes too. Also, the 2nd/3rd comment lines should line up
under "1-pass"...
> +
> # ifdef VIR_ENUM_SENTINELS
> VIR_STORAGE_VOL_WIPE_ALG_LAST
> /*
> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
> index cdbfdee..d13658d 100644
> --- a/src/storage/storage_backend_rbd.c
> +++ b/src/storage/storage_backend_rbd.c
> @@ -32,6 +32,7 @@
> #include "base64.h"
> #include "viruuid.h"
> #include "virstring.h"
> +#include "virutil.h"
This isn't necessary, I believe. I was able to remove it without issue.
> #include "rados/librados.h"
> #include "rbd/librbd.h"
>
> @@ -700,6 +701,157 @@ static int virStorageBackendRBDResizeVol(virConnectPtr conn ATTRIBUTE_UNUSED,
> return ret;
> }
>
> +static int virStorageBackendRBDVolWipeZero(rbd_image_t image,
> + char *imgname,
> + rbd_image_info_t info,
> + uint64_t stripe_count)
Newer libvirt convention is:
static int
virStorage...
> +{
> + int r = -1;
Add:
int ret = -1;
Usually it's 'ret' instead of just 'r'... Keeping 'r' for rbd_*() call
failures is fine though, since it will hold (and can be used to report)
rbd_* specific API call errors...
> + size_t offset = 0;
> + uint64_t length;
> + char *writebuf;
> +
> + if (VIR_ALLOC_N(writebuf, info.obj_size * stripe_count) < 0)
> + goto cleanup;
> +
> + while (offset < info.size) {
> + length = MIN((info.size - offset), (info.obj_size * stripe_count));
> +
> + r = rbd_write(image, offset, length, writebuf);
> + if (r < 0) {
> + virReportSystemError(-r, _("writing %llu bytes failed on "
> + " RBD image %s at offset %llu"),
This will generate two spaces: "... failed on  RBD image..."
> + (unsigned long long)length,
> + imgname,
> + (unsigned long long)offset);
So is length a "uint64_t" or not? I do note that librbd.h deems it a
"size_t"... The question is more why cast to (unsigned long long), other
than to match the %llu (of course).
As for offset, IIRC the convention for a size_t is "%zu", although for
this one I note that librbd.h deems it a "uint64_t".
> + goto cleanup;
> + }
> +
> + VIR_DEBUG("Wrote %llu bytes to RBD image %s at offset %llu",
> + (unsigned long long)length,
> + imgname, (unsigned long long)offset);
Similar comments regarding the casts and the variable types.
> +
> + offset += length;
> + }
Here would be:
ret = 0;
> +
> + cleanup:
writebuf is leaked. Need a VIR_FREE()
> + return r;
and this becomes return ret;
> +}
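Pulling the comments on the *Zero helper together, a minimal sketch of its shape might be as follows. This is not the patch itself: the writer callback and the fake writer are hypothetical stand-ins for rbd_write(), and plain calloc()/free() replace VIR_ALLOC_N()/VIR_FREE(). It shows 'ret' for the function result, 'r' for the backend result, and the buffer being freed on every exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for rbd_write(): bytes written or negative errno. */
typedef int (*sketchWriteFunc)(uint64_t offset, size_t len, const char *buf);

/* Fake writer used for illustration: fails at or past sketchFailOffset. */
static uint64_t sketchFailOffset = UINT64_MAX;
static uint64_t sketchBytesWritten;

static int
sketchFakeWrite(uint64_t offset, size_t len, const char *buf)
{
    (void) buf;
    if (offset >= sketchFailOffset)
        return -EIO;
    sketchBytesWritten += len;
    return (int) len;
}

static int
sketchWipeZero(uint64_t size, size_t chunk, sketchWriteFunc writer)
{
    int ret = -1;           /* what this function returns */
    int r;                  /* raw result of the backend call */
    uint64_t offset = 0;
    char *writebuf = NULL;

    assert(chunk > 0 && writer != NULL);

    if (!(writebuf = calloc(chunk, 1)))
        goto cleanup;

    while (offset < size) {
        size_t length = (size - offset) < chunk ?
                        (size_t) (size - offset) : chunk;

        if ((r = writer(offset, length, writebuf)) < 0) {
            /* -r would be reported here, once, at the point of failure */
            goto cleanup;
        }
        offset += length;
    }

    ret = 0;

 cleanup:
    free(writebuf);         /* free(NULL) is a no-op, so this is always safe */
    return ret;
}
```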
> +
> +static int virStorageBackendRBDVolWipeDiscard(rbd_image_t image,
> + char *imgname,
> + rbd_image_info_t info,
> + uint64_t stripe_count)
static int
virStorage...
> +{
> + int r = -1;
Need int ret = -1
> + size_t offset = 0;
> + uint64_t length;
> +
> + VIR_DEBUG("Wiping RBD %s volume using discard)", imgname);
> +
> + while (offset < info.size) {
> + length = MIN((info.size - offset), (info.obj_size * stripe_count));
> +
> + r = rbd_discard(image, offset, length);
rbd_discard deems 'offset' to also be a uint64_t
> + if (r < 0) {
> + virReportSystemError(-r, _("discarding %llu bytes failed on "
> + " RBD image %s at offset %llu"),
Similar to *Zero - you'll have two spaces: "...failed on  RBD image..."
> + (unsigned long long)length,
> + imgname,
> + (unsigned long long)offset);
Similar comments regarding the casts of length and offset.
> + goto cleanup;
> + }
> +
> + VIR_DEBUG("Discarded %llu bytes of RBD image %s at offset %llu",
> + (unsigned long long)length,
> + imgname, (unsigned long long)offset);
Similar comments regarding the casts.
> +
> + offset += length;
> + }
Here would be
ret = 0;
> +
> + cleanup:
> + return r;
And return ret;
> +}
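On the message text and the casts, a small sketch under stated assumptions: the macro and function names below are made up for illustration, and snprintf() stands in for the libvirt reporting macros. Adjacent string literals are concatenated as-is, so a trailing space on one piece plus a leading space on the next yields two spaces. The casts to unsigned long long are lossless for a uint64_t and make the arguments match %llu regardless of how uint64_t is defined on the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* As posted: trailing space on the first piece, leading space on the second. */
#define SKETCH_MSG_AS_POSTED \
    "discarding %llu bytes failed on " \
    " RBD image %s at offset %llu"

/* Only one space at the join. */
#define SKETCH_MSG_FIXED \
    "discarding %llu bytes failed on " \
    "RBD image %s at offset %llu"

/* Formats one of the messages above; returns what snprintf() returns. */
static int
sketchFormatDiscardError(char *buf, size_t buflen, const char *fmt,
                         uint64_t length, const char *imgname,
                         uint64_t offset)
{
    assert(buf != NULL && fmt != NULL && imgname != NULL);
    return snprintf(buf, buflen, fmt,
                    (unsigned long long) length,
                    imgname,
                    (unsigned long long) offset);
}
```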
> +
> +static int virStorageBackendRBDVolWipe(virConnectPtr conn,
> + virStoragePoolObjPtr pool,
> + virStorageVolDefPtr vol,
> + unsigned int algorithm,
> + unsigned int flags)
static int
virStorage...
> +{
> + virStorageBackendRBDState ptr;
> + ptr.cluster = NULL;
> + ptr.ioctx = NULL;
> + rbd_image_t image = NULL;
> + rbd_image_info_t info;
> + uint64_t stripe_count;
> + int r = -1;
Add
int ret = -1;
> +
> + virCheckFlags(VIR_STORAGE_VOL_WIPE_ALG_ZERO |
> + VIR_STORAGE_VOL_WIPE_ALG_DISCARD, -1);
> +
> + VIR_DEBUG("Wiping RBD image %s/%s", pool->def->source.name, vol->name);
> +
> + if (virStorageBackendRBDOpenRADOSConn(&ptr, conn, &pool->def->source) < 0)
> + goto cleanup;
> +
> + if (virStorageBackendRBDOpenIoCTX(&ptr, pool) < 0)
> + goto cleanup;
> +
> + r = rbd_open(ptr.ioctx, vol->name, &image, NULL);
> + if (r < 0) {
BTW: This can be :
if ((r = rbd_open(ptr.ioctx, vol->name, &image, NULL)) < 0) {
For this and all rbd_* calls...
> + virReportSystemError(-r, _("failed to open the RBD image %s"),
> + vol->name);
> + goto cleanup;
> + }
> +
> + r = rbd_stat(image, &info, sizeof(info));
> + if (r < 0) {
> + virReportSystemError(-r, _("failed to stat the RBD image %s"),
> + vol->name);
> + goto cleanup;
> + }
> +
> + r = rbd_get_stripe_count(image, &stripe_count);
> + if (r < 0) {
> + virReportSystemError(-r, _("failed to get stripe count of RBD image %s"),
> + vol->name);
> + goto cleanup;
> + }
I see the subsequent patch has some extra checks before calling this.
Why wouldn't those also need to be made here?
> +
> + VIR_DEBUG("Need to wipe %llu bytes from RBD image %s/%s",
> + (unsigned long long)info.size, pool->def->source.name, vol->name);
> +
> + switch (algorithm) {
Follow the convention of
"switch ((virStorageVolWipeAlgorithm) algorithm) {"
Then each "case" lines up under "switch".
> + case VIR_STORAGE_VOL_WIPE_ALG_ZERO:
> + r = virStorageBackendRBDVolWipeZero(image, vol->name,
> + info, stripe_count);
I would change this (and the next one) to:
if (virStorageBackendRBDVolWipeZero(image, vol->name,
info, stripe_count) < 0)
goto cleanup;
Also, I ran these patches through Coverity - it complains that 'info' is
passed by value of 160 bytes... Although neither API adjusts it, why not
just pass "info.size" and "info.obj_size" or pass by reference the whole
'info' (just to be safe).
> + break;
> + case VIR_STORAGE_VOL_WIPE_ALG_DISCARD:
> + r = virStorageBackendRBDVolWipeDiscard(image, vol->name,
> + info, stripe_count);
> + break;
> + default:
Also list each allowed case explicitly instead of using 'default:' - it's
clearer. That way, if someone comes along in the future and adds ALG_ONE,
the rbd code won't be forgotten... The compiler catches the missing case.
> + virReportError(VIR_ERR_INVALID_ARG, _("unsupported algorithm %d"),
> + algorithm);
> + r = -VIR_ERR_INVALID_ARG;
This will be unnecessary...
> + goto cleanup;
> + }
> +
> + if (r < 0) {
> + virReportSystemError(-r, _("failed to wipe RBD image %s"),
> + vol->name);
This overwrites the errors already reported by the *WipeZero and
*WipeDiscard APIs.
> + goto cleanup;
> + }
The assumption here being
ret = 0;
> +
> + cleanup:
> + if (image)
> + rbd_close(image);
> +
> + virStorageBackendRBDCloseRADOSConn(&ptr);
> + return r;
return ret;
> +}
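For the dispatch in virStorageBackendRBDVolWipe, the comments above add up to something like the following sketch. The enum, struct, and helper names are hypothetical stand-ins, and the padding member only mimics a large info struct. It shows 'info' passed by pointer, every enum member listed with no 'default:', helpers that report their own errors so nothing is overwritten, and 'ret' set to 0 only on success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for virStorageVolWipeAlgorithm (fewer members). */
typedef enum {
    SKETCH_ALG_ZERO = 0,
    SKETCH_ALG_NNSA,
    SKETCH_ALG_TRIM,
    SKETCH_ALG_LAST
} sketchAlg;

/* Hypothetical stand-in for a large info struct such as rbd_image_info_t. */
typedef struct {
    uint64_t size;
    uint64_t obj_size;
    char pad[144];
} sketchInfo;

/* Helpers take a pointer, not a copy, and report their own errors. */
static int
sketchWipeZeroImage(const sketchInfo *info)
{
    return info->obj_size > 0 ? 0 : -1;
}

static int
sketchWipeTrimImage(const sketchInfo *info)
{
    return info->obj_size > 0 ? 0 : -1;
}

static int
sketchVolWipe(const sketchInfo *info, unsigned int algorithm)
{
    int ret = -1;

    assert(info != NULL);

    /* Out-of-range values match no case below, so reject them first. */
    if (algorithm >= SKETCH_ALG_LAST)
        goto cleanup;

    switch ((sketchAlg) algorithm) {
    case SKETCH_ALG_ZERO:
        if (sketchWipeZeroImage(info) < 0)
            goto cleanup;
        break;
    case SKETCH_ALG_TRIM:
        if (sketchWipeTrimImage(info) < 0)
            goto cleanup;
        break;
    case SKETCH_ALG_NNSA:
    case SKETCH_ALG_LAST:
        /* "unsupported algorithm" would be reported here */
        goto cleanup;
    }

    ret = 0;

 cleanup:
    return ret;
}
```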
> +
> virStorageBackend virStorageBackendRBD = {
> .type = VIR_STORAGE_POOL_RBD,
>
> @@ -708,5 +860,6 @@ virStorageBackend virStorageBackendRBD = {
> .buildVol = virStorageBackendRBDBuildVol,
> .refreshVol = virStorageBackendRBDRefreshVol,
> .deleteVol = virStorageBackendRBDDeleteVol,
> - .resizeVol = virStorageBackendRBDResizeVol,
> + .wipeVol = virStorageBackendRBDVolWipe,
> + .resizeVol = virStorageBackendRBDResizeVol
No need to remove the "," - that way the only diff is the added line.
> };
> diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c
> index 7932ef2..3e95aa5 100644
> --- a/tools/virsh-volume.c
> +++ b/tools/virsh-volume.c
> @@ -954,7 +954,7 @@ static const vshCmdOptDef opts_vol_wipe[] = {
> VIR_ENUM_DECL(virStorageVolWipeAlgorithm)
> VIR_ENUM_IMPL(virStorageVolWipeAlgorithm, VIR_STORAGE_VOL_WIPE_ALG_LAST,
> "zero", "nnsa", "dod", "bsi", "gutmann", "schneier",
> - "pfitzner7", "pfitzner33", "random");
> + "pfitzner7", "pfitzner33", "random", "discard");
I think "trim" will be better.
John
>
> static bool
> cmdVolWipe(vshControl *ctl, const vshCmd *cmd)
>