[libvirt] [RFC PATCH 1/1] gluster: cache glfs connection object per volume
Prasanna Kalever
pkalever at redhat.com
Tue Nov 29 15:13:10 UTC 2016
On Tue, Nov 15, 2016 at 3:07 PM, Peter Krempa <pkrempa at redhat.com> wrote:
> On Mon, Nov 14, 2016 at 20:04:16 +0530, Prasanna Kumar Kalever wrote:
>> This patch offer,
>> 1. Optimizing the calls to glfs_init() and friends
>> 2. Temporarily reduce the memory leak appear in libvirt process account,
>> even if the root cause for the leaks are glfs_fini() (7 - 10MB per object)
>
> That is not a valid use case as I've said a few times.
>
>> [Hopefully gluster should address this in its future releases, not very near]
>>
>> Currently, a start of a VM will call 2 glfs_new/glfs_init (which will create
>> glfs object, once for stat, read headers and next to chown) and then will fork
>> qemu process which will call once again (for actual read write IO).
>>
>> Not that all, in case if we are have 4 extra attached disks, then the total
>> calls to glfs_init() and friends will be (4+1)*2 in libvirt and (4+1)*1 in
>> qemu space i.e 15 calls. Since we don't have control over qemu process as that
>> executes in a different process environment, lets do not bother much about it.
>>
>> This patch shrinks these 10 calls (i.e objects from above example) to just
>> one, by maintaining a cache of glfs objects.
>>
>> The glfs object is shared across other only if volume name and all the
>> volfile servers match. In case of hit glfs object takes a ref and
>> only on close unref happens.
>>
>> Thanks to 'Peter Krempa' for the discussion.
>>
>> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>
>> ---
>>
>> WORK IN PROGRESS: (WIP)
>> ----------------
>> While initially caching the glfs object, i.e. in
>> virStorageBackendGlusterSetPreopened() I have took a ref=2, so on the following
>> virStorageBackendGlusterClosePreopened() --ref will make ref=1, which will
>> help not cleaning up the object which has to be shared with next coming disks
>> (if conditions meat).
>>
>> Given some context, the idea is that on time-out (or after all disks are
>> initiallized) some one should call virStorageBackendGlusterClosePreopened()
>> which will ideally make ref=0, hence cached object will be cleaned/deleted
>> calling glfs_fini()
Apologies for delay w.r.t this patch, was on PTO last week.
>
> Second option is to add storage driver APIs that will allow to register
> and unregister the storage driver volumes when the VM driver will need
> to access them.
>
> This is basically equivalent to calling virStorageFileInit(As) right when
> starting the VM and avoiding closing it until cleaning up from the
> start.
>
> This will immediately remove the two openings of the gluster volume when
> chmoding and checking that the volume exists.
>
> Additionally it gives you an convenient point to do reference counting
> on similar connections to the gluster node.
Wonderful. Was not able to figure out a point for ref counting. This
approch unblocks me :-)
>
> Thankfully gluster's security handling is so terrible that we don't
> really need to pass auth objects to it, since it would complicate things
> further.
>
>>
>> I had a thought of doing the time-out cleanup call in
>> virSecurityManagerSetAllLabel() or similar, but that looks too odd for me?
>
> No, your endeavour needs to be split into two separate things:
>
> 1) Unifying of calls to virStorageFileInit and virStorageFileDeinit so
> that it's called just once during VM start
>
> 2) adding connection caching to the gluster backend driver so that
> multiple similar connections don't need to open the connection.
Yeah, makes perfect sense.
>
> Adding timed closing of the cached connections would be complicated and
> prone to breaking.
>>
>> Some help ?
>> Thanks in advance.
>>
>> ---
>> src/storage/storage_backend_gluster.c | 136 ++++++++++++++++++++++++++++++++--
>> src/storage/storage_backend_gluster.h | 33 ++++++++-
>> 2 files changed, 160 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/storage/storage_backend_gluster.c b/src/storage/storage_backend_gluster.c
>> index 8e86704..4f53ebc 100644
>> --- a/src/storage/storage_backend_gluster.c
>> +++ b/src/storage/storage_backend_gluster.c
>> @@ -47,19 +47,132 @@ struct _virStorageBackendGlusterState {
>> char *dir; /* dir from URI, or "/"; always starts and ends in '/' */
>> };
>>
>> +virGlusterDefPtr ConnCache = {0,};
>
> variables in libvirt don't start with a capital letter. Also the type
> name is not accurately chosen.
Agree.
>
>> +
>> typedef struct _virStorageBackendGlusterState virStorageBackendGlusterState;
>> typedef virStorageBackendGlusterState *virStorageBackendGlusterStatePtr;
>>
>> +void
>> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs)
>> +{
>> + size_t i;
>> + virStorageBackendGlusterStatePtrPreopened entry = NULL;
>> +
>> + if (ConnCache == NULL && (VIR_ALLOC(ConnCache) < 0))
>
> This needs to be serialized, otherwise it can create races between
> multiple threads.
>
> Please use the virOnce infrastructure for this.
Okay!
>
>> + return;
>> +
>> + for (i = 0; i < ConnCache->nConn; i++) {
>> + if (STREQ(ConnCache->Conn[i]->volname, src->volume))
>
> As I've pointed out, just matching the volume name is not enough. All
> hosts and ports need to be checked as well, otherwise you can't be sure
> that the volume is the same.
>
> I've reiterated this point multiple times during our chat.
I remember this, sorry for not mentioning this in WIP list, will
address this in next spin.
Are there any util functions to compare IP vs FQDN Hostname ?
Else I need to write one.
>
>> + return;
>> + }
>> +
>> + if (VIR_ALLOC(entry) < 0)
>> + goto L1;
>> +
>> + if (VIR_STRDUP(entry->volname, src->volume) < 0)
>> + goto L1;
>> +
>> + entry->nhosts = src->nhosts;
>> + for (i = 0; i < src->nhosts; i++) {
>> + if (VIR_ALLOC_N(entry->hosts[i], strlen(src->hosts[i].name) + 1) < 0)
>> + goto L2;
>> + strcpy(entry->hosts[i], src->hosts[i].name);
>
>
> VIR_STRDUP instead of all the above.
Uhhhh...
My bad.
>
>> + }
>> +
>> + entry->fs = fs;
>> + entry->ref = 2; /* persist glfs obj per volume until a final timeout
>> + virStorageBackendGlusterClosePreopened() is called */
>> +
>> + if (VIR_INSERT_ELEMENT(ConnCache->Conn, -1, ConnCache->nConn, entry) < 0)
>> + goto L2;
>> +
>> + return;
>> +
>> +L2:
>> + for (i = 0; i < entry->nhosts; i++)
>> + VIR_FREE(entry->hosts[i]);
>> +L1:
>
> These won't pass syntax check. Please follow the libvirt coding rules
> and choose sensible names for the labels.
sure, Looks like I have edited some portions post to make syntax-check
execution.
sorry, will try not to repeat this.
>
>> + if(ConnCache->nConn == 0)
>> + VIR_FREE(ConnCache);
>> + VIR_FREE(entry->volname);
>> + VIR_FREE(entry);
>> +}
>> +
>> +glfs_t *
>> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src)
>> +{
>> + size_t i, j, k, ret = 0;
>> + size_t min, max;
>> +
>> + if (ConnCache == NULL)
>> + return NULL;
>> +
>> + virStorageBackendGlusterStatePtrPreopened entry;
>> +
>> + for (i = 0; i < ConnCache->nConn; i++) {
>> + entry = ConnCache->Conn[i];
>> + if (STREQ(entry->volname, src->volume)) {
>> + min = entry->nhosts < src->nhosts ? entry->nhosts : src->nhosts;
>> + max = entry->nhosts >= src->nhosts ? entry->nhosts : src->nhosts;
>
> If the number of hosts does not match, there's no point in checking that
> they are equal ... since they certainly are not.
Yeah, I was also thinking if I can convince you with the subset servers match ?
Example:
Disk1:
IPa, IPc, IPe, IPg
Disk2:
IPa, IPe
Since all addresses in Disk2 matches with Disk1, therefore ... ?
>
>> + for (j = 0; j< min; j++) {
>> + if (entry->nhosts == min) {
>> + for (k = 0; k < max; k++) {
>> + if (STREQ(entry->hosts[j], src->hosts[k].name)) {
>
> You are not checking port numbers.
Should also take care about transport type as well.
>
>> + ret = 1;
>> + break;
>> + }
>> + }
>> + if (!ret)
>> + return NULL;
>> + } else {
>> + for (k = 0; k < max; k++) {
>> + if (STREQ(src->hosts[j].name, entry->hosts[k])) {
>> + ret = 1;
>> + break;
>> + }
>> + }
>> + if (!ret)
>> + return NULL;
>> + }
>> + }
>> + entry->ref++;
>> + return entry->fs;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> +int
>> +virStorageBackendGlusterClosePreopened(glfs_t *fs)
>> +{
>> + size_t i;
>> + int ret = 0;
>> +
>> + if (fs == NULL)
>> + return ret;
>> +
>> + for (i = 0; i < ConnCache->nConn; i++) {
>> + if (ConnCache->Conn[i]->fs == fs) {
>> + if (--ConnCache->Conn[i]->ref)
>> + return ret;
>
> Please use libvirt's reference counted objects instead of hand-rolled
> reference counting stuff.
>
>> +
>> + ret = glfs_fini(ConnCache->Conn[i]->fs);
>> + VIR_FREE(ConnCache->Conn[i]->volname);
>> + VIR_FREE(ConnCache->Conn[i]);
>> +
>> + VIR_DELETE_ELEMENT(ConnCache->Conn, i, ConnCache->nConn);
>
> This needs locked access to the cache. This would introduce a rather
> nasty race possibility.
Yeah, something like virMutexLock(connCache->lock);
>
>> + }
>> + }
>> + return ret;
>> +}
>> +
>> static void
>> virStorageBackendGlusterClose(virStorageBackendGlusterStatePtr state)
>> {
>> if (!state)
>> return;
>>
>> - /* Yuck - glusterfs-api-3.4.1 appears to always return -1 for
>> - * glfs_fini, with errno containing random data, so there's no way
>> - * to tell if it succeeded. 3.4.2 is supposed to fix this.*/
>> - if (state->vol && glfs_fini(state->vol) < 0)
>> + if (state->vol && virStorageBackendGlusterClosePreopened(state->vol) < 0)
>> VIR_DEBUG("shutdown of gluster volume %s failed with errno %d",
>> state->volname, errno);
>>
>> @@ -556,8 +669,7 @@ virStorageFileBackendGlusterDeinit(virStorageSourcePtr src)
>> src, src->hosts->name, src->hosts->port ? src->hosts->port : "0",
>> src->volume, src->path);
>>
>> - if (priv->vol)
>> - glfs_fini(priv->vol);
>> + virStorageBackendGlusterClosePreopened(priv->vol);
>> VIR_FREE(priv->canonpath);
>>
>> VIR_FREE(priv);
>> @@ -630,11 +742,20 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src)
>> src, priv, src->volume, src->path,
>> (unsigned int)src->drv->uid, (unsigned int)src->drv->gid);
>>
>> +
>> + priv->vol = virStorageBackendGlusterFindPreopened(src);
>> + if (priv->vol) {
>> + src->drv->priv = priv;
>> + return 0;
>> + }
>> +
>> if (!(priv->vol = glfs_new(src->volume))) {
>> virReportOOMError();
>> goto error;
>> }
>>
>> + virStorageBackendGlusterSetPreopened(src, priv->vol);
>> +
>> for (i = 0; i < src->nhosts; i++) {
>> if (virStorageFileBackendGlusterInitServer(priv, src->hosts + i) < 0)
>> goto error;
>> @@ -652,8 +773,7 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src)
>> return 0;
>>
>> error:
>> - if (priv->vol)
>> - glfs_fini(priv->vol);
>> + virStorageBackendGlusterClosePreopened(priv->vol);
>> VIR_FREE(priv);
>>
>> return -1;
>> diff --git a/src/storage/storage_backend_gluster.h b/src/storage/storage_backend_gluster.h
>> index 6796016..a0326aa 100644
>> --- a/src/storage/storage_backend_gluster.h
>> +++ b/src/storage/storage_backend_gluster.h
>> @@ -22,9 +22,40 @@
>> #ifndef __VIR_STORAGE_BACKEND_GLUSTER_H__
>> # define __VIR_STORAGE_BACKEND_GLUSTER_H__
>>
>> -# include "storage_backend.h"
>> +#include "storage_backend.h"
>> +#include <glusterfs/api/glfs.h>
>
> NACK to this hunk. The gluster stuff needs to stay contained in the
> backend driver and should not leak to other parts of libvirt.
Will remove.
>
>>
>> extern virStorageBackend virStorageBackendGluster;
>> extern virStorageFileBackend virStorageFileBackendGluster;
>>
>> +
>> +struct _virStorageBackendGlusterStatePreopened {
>> + char *volname;
>> + size_t nhosts;
>> + char *hosts[1024]; /* FIXME: 1024 ? */
>
> You need to allocate this dynamically.
That's the idea, since this was initial patch, just gotneglected.
>
>> + glfs_t *fs;
>
> This needs to be internal.
>
>> + int ref;
>> +};
>> +
>> +typedef struct _virStorageBackendGlusterStatePreopened virStorageBackendGlusterStatePreopened;
>> +typedef virStorageBackendGlusterStatePreopened *virStorageBackendGlusterStatePtrPreopened;
>> +
>> +struct _virGlusterDef {
>> + size_t nConn;
>> + virStorageBackendGlusterStatePtrPreopened *Conn;
>> +};
>> +
>> +typedef struct _virGlusterDef virGlusterDef;
>> +typedef virGlusterDef *virGlusterDefPtr;
>> +
>> +extern virGlusterDefPtr ConnCache;
>> +
>> +void
>> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs);
>> +
>> +glfs_t*
>> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src);
>> +
>> +int
>> +virStorageBackendGlusterClosePreopened(glfs_t *fs);
>> #endif /* __VIR_STORAGE_BACKEND_GLUSTER_H__ */
>
>
> Again these can't be exported since they take gluster data types. Also
> it looks that they are internal to the gluster impl so there's no need
> to export them at all.
Make Sense.
Thanks a lot Peter!
Will post a modified version sometime tomorrow.
--
Prasanna
>
> Peter
More information about the libvir-list
mailing list