[libvirt] [PATCH v4 4/4] gluster: cache glfs connection object per volume
Prasanna Kalever
pkalever at redhat.com
Wed Dec 14 19:48:22 UTC 2016
On Mon, Dec 12, 2016 at 8:02 PM, Prasanna Kalever <pkalever at redhat.com> wrote:
> On Wed, Dec 7, 2016 at 10:38 PM, Peter Krempa <pkrempa at redhat.com> wrote:
>> On Tue, Dec 06, 2016 at 22:52:01 +0530, Prasanna Kumar Kalever wrote:
>>> Currently, if we have 4 extra attached disks, then for each disk we
>>> need to call 'glfs_init' (over the network) and friends, which could be costly.
>>>
>>> Additionally, the external snapshot scenario will further complicate the situation.
>>
>> This part was extracted to a separate patch already, thus this doesn't
>> have to deal with snapshots in any way than with other gluster
>> connections.
>
> Yeah!
Sorry, this still holds. On taking a snapshot,
virStorageFileGetMetadataRecurse will be called on each overlay,
which hits the cache during the respective virStorageFileInitAs
calls (in case the snapshots reside on the same volume).
Thanks,
--
Prasanna
>
>>
>>> This patch maintains a cache of glfs objects per volume, hence
>>> all disks/snapshots belonging to a volume whose glfs object is cached
>>> can avoid calls to 'glfs_init' and friends by simply taking a ref on
>>> the existing object.
>>>
>>> The glfs object is shared only if the volume name and all the volfile servers match
>>> (including hostname, transport and port number).
>>>
>>> Thanks to 'Peter Krempa' for all the inputs.
>>>
>>> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever at redhat.com>
>>> ---
>>> src/storage/storage_backend_gluster.c | 279 +++++++++++++++++++++++++++++++---
>>> 1 file changed, 254 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/src/storage/storage_backend_gluster.c b/src/storage/storage_backend_gluster.c
>>> index e0841ca..9f633ac 100644
>>> --- a/src/storage/storage_backend_gluster.c
>>> +++ b/src/storage/storage_backend_gluster.c
>>> @@ -36,6 +36,20 @@
>>>
>>> VIR_LOG_INIT("storage.storage_backend_gluster");
>>>
>>> +
>>> +typedef struct _virStorageBackendGlusterState virStorageBackendGlusterState;
>>> +typedef virStorageBackendGlusterState *virStorageBackendGlusterStatePtr;
>>> +
>>> +typedef struct _virStorageFileBackendGlusterPriv virStorageFileBackendGlusterPriv;
>>> +typedef virStorageFileBackendGlusterPriv *virStorageFileBackendGlusterPrivPtr;
>>> +
>>> +typedef struct _virStorageBackendGlusterCache virStorageBackendGlusterCache;
>>> +typedef virStorageBackendGlusterCache *virStorageBackendGlusterCachePtr;
>>> +
>>> +typedef struct _virStorageBackendGlusterCacheConn virStorageBackendGlusterCacheConn;
>>> +typedef virStorageBackendGlusterCacheConn *virStorageBackendGlusterCacheConnPtr;
>>> +
>>> +
>>> struct _virStorageBackendGlusterState {
>>> glfs_t *vol;
>>>
>>> @@ -47,8 +61,205 @@ struct _virStorageBackendGlusterState {
>>> char *dir; /* dir from URI, or "/"; always starts and ends in '/' */
>>> };
>>>
>>> -typedef struct _virStorageBackendGlusterState virStorageBackendGlusterState;
>>> -typedef virStorageBackendGlusterState *virStorageBackendGlusterStatePtr;
>>> +struct _virStorageFileBackendGlusterPriv {
>>> + glfs_t *vol;
>>
>> You should also keep a reference to the object in the cache, so that you
>> can return it and don't have to look it up again.
>>
>>> + char *canonpath;
>>> +};
>>> +
>>> +struct _virStorageBackendGlusterCacheConn {
>>> + virObjectLockable parent;
>>> + virStorageFileBackendGlusterPrivPtr priv;
>>
>> You don't really need the complete object here. you only need the glfs_t
>> part here. The 'cannonpath' variable is different for possible storage
>> source objects.
>
> got it.
>
>>
>>> + size_t nhosts;
>>> + virStorageNetHostDefPtr hosts;
>>> + char *volname;
>>> +};
>>> +
>>> +struct _virStorageBackendGlusterCache {
>>> + virMutex lock;
>>> + size_t nConn;
>>> + virStorageBackendGlusterCacheConnPtr *conn;
>>> +};
>>> +
>>> +
>>> +static virStorageBackendGlusterCachePtr virStorageBackendGlusterGlobalCacheObj;
>>> +static virClassPtr virStorageBackendGlusterCacheConnClass;
>>> +static virOnceControl virStorageBackendGlusterCacheOnce = VIR_ONCE_CONTROL_INITIALIZER;
>>> +static bool virStorageBackendGlusterCacheInitGlobalError;
>>> +static bool virStorageBackendGlusterCacheConnCleaned;
>>
>> As said. Don't use global flags. You should be able to avoid both of
>> them.
>>
>> Okay, I've seen below. They are not only avoidable, but your usage is
>> outright dangerous.
>
> Okay.
>
>>
>>> +
>>> +
>>> +static void virStorageBackendGlusterCacheConnDispose(void *obj);
>>> +
>>> +
>>> +static int virStorageBackendGlusterCacheConnOnceInit(void)
>>> +{
>>> + if (!(virStorageBackendGlusterCacheConnClass =
>>> + virClassNew(virClassForObjectLockable(),
>>> + "virStorageBackendGlusterCacheConnClass",
>>> + sizeof(virStorageBackendGlusterCacheConn),
>>> + virStorageBackendGlusterCacheConnDispose)))
>>
>> I've noticed that you actually create a lockable object here, but don't
>> use the locks. This is currently okay, since it's write-only. The lock
>> might come in handy though ...
>>
>>> + return -1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void virStorageBackendGlusterCacheConnDispose(void *object)
>>> +{
>>> + virStorageBackendGlusterCacheConnPtr conn = object;
>>> +
>>> + glfs_fini(conn->priv->vol);
>>> +
>>> + VIR_FREE(conn->priv->canonpath);
>>> + VIR_FREE(conn->priv);
>>> + VIR_FREE(conn->volname);
>>> +
>>> + virStorageNetHostDefFree(conn->nhosts, conn->hosts);
>>> +
>>> + /* all set to delete the connection from cache */
>>> + virStorageBackendGlusterCacheConnCleaned = true;
>>> +}
>>> +
>>> +VIR_ONCE_GLOBAL_INIT(virStorageBackendGlusterCacheConn);
>>> +
>>> +static bool
>>> +virStorageBackendGlusterCompareHosts(size_t nSrcHosts,
>>> + virStorageNetHostDefPtr srcHosts,
>>> + size_t nDstHosts,
>>> + virStorageNetHostDefPtr dstHosts)
>>> +{
>>> + bool match = false;
>>> + size_t i;
>>> + size_t j;
>>> +
>>> + if (nSrcHosts != nDstHosts)
>>> + return false;
>>> + for (i = 0; i < nSrcHosts; i++) {
>>> + for (j = 0; j < nDstHosts; j++) {
>>> + if (srcHosts[i].type != dstHosts[j].type)
>>> + continue;
>>> + switch (srcHosts[i].type) {
>>> + case VIR_STORAGE_NET_HOST_TRANS_UNIX:
>>> + if (STREQ(srcHosts[i].u.uds.path, dstHosts[j].u.uds.path))
>>> + match = true;
>>> + break;
>>> + case VIR_STORAGE_NET_HOST_TRANS_TCP:
>>> + case VIR_STORAGE_NET_HOST_TRANS_RDMA:
>>> + if (STREQ(srcHosts[i].u.inet.addr, dstHosts[j].u.inet.addr)
>>> + &&
>>> + STREQ(srcHosts[i].u.inet.port, dstHosts[j].u.inet.port))
>>> + match = true;
>>> + break;
>>> + case VIR_STORAGE_NET_HOST_TRANS_LAST:
>>> + break;
>>> + }
>>> + if (match)
>>> + break; /* out of inner for loop */
>>> + }
>>> + if (!match)
>>> + return false;
>>> + match = false; /* reset */
>>> + }
>>> + return true;
>>> +}
>>> +
>>> +static int
>>> +virStorageBackendGlusterCacheAdd(virStorageSourcePtr src,
>>> + virStorageFileBackendGlusterPrivPtr priv)
>>> +{
>>> + virStorageBackendGlusterCacheConnPtr newConn = NULL;
>>> + virStorageBackendGlusterCacheConnPtr conn = NULL;
>>> + size_t i;
>>> +
>>> + for (i = 0; i < virStorageBackendGlusterGlobalCacheObj->nConn; i++) {
>>
>> Umm ... you iterate the cache ...
>
> This is the problem with changing the code at the last minute.
> I edited something after I did git format-patch.
> Will address them.
>
>>
>>> + conn = virStorageBackendGlusterGlobalCacheObj->conn[i];
>>> + if (STRNEQ(conn->volname, src->volume))
>>> + continue;
>>> + if (virStorageBackendGlusterCompareHosts(conn->nhosts, conn->hosts,
>>> + src->nhosts, src->hosts)) {
>>> + return 0;
>>> + }
>>> + }
>>> +
>>> + if (virStorageBackendGlusterCacheConnInitialize() < 0)
>>> + return -1;
>>
>> ... before attempting to initialize the global cache?!? That doesn't
>> make sense. Also this call is not needed here, since the only entrypoint
>> is virStorageFileBackendGlusterInit which initializes it.
>>
>>> +
>>> + if (!(newConn = virObjectLockableNew(virStorageBackendGlusterCacheConnClass)))
>>> + return -1;
>>> +
>>> + if (VIR_STRDUP(newConn->volname, src->volume) < 0)
>>> + goto error;
>>> +
>>> + newConn->nhosts = src->nhosts;
>>> + if (!(newConn->hosts = virStorageNetHostDefCopy(src->nhosts, src->hosts)))
>>> + goto error;
>>> + newConn->priv = priv;
>>> +
>>> + if (VIR_APPEND_ELEMENT(virStorageBackendGlusterGlobalCacheObj->conn,
>>> + virStorageBackendGlusterGlobalCacheObj->nConn,
>>> + newConn) < 0)
>>> + goto error;
>>> +
>>> + return 0;
>>> +
>>> + error:
>>> + virStorageNetHostDefFree(newConn->nhosts, newConn->hosts);
>>> + VIR_FREE(newConn->volname);
>>
>> You leak newConn in some cases.
>
> Unref!
>
>>
>>> + return -1;
>>> +}
>>> +
>>> +static glfs_t *
>>> +virStorageBackendGlusterCacheQuery(virStorageSourcePtr src)
>>
>> We usually use "Lookup".
>
> good one. My naming skills are pathetic. Mercy me ...
>
>>
>>> +{
>>> + virStorageBackendGlusterCacheConnPtr conn;
>>> + size_t i;
>>> +
>>> + if (virStorageBackendGlusterGlobalCacheObj == NULL)
>>> + return NULL;
>>> +
>>> + for (i = 0; i < virStorageBackendGlusterGlobalCacheObj->nConn; i++) {
>>> + conn = virStorageBackendGlusterGlobalCacheObj->conn[i];
>>> + if (STRNEQ(conn->volname, src->volume))
>>> + continue;
>>> + if (virStorageBackendGlusterCompareHosts(conn->nhosts, conn->hosts,
>>> + src->nhosts, src->hosts)) {
>>> + virObjectRef(conn);
>>> + return conn->priv->vol;
>>> + }
>>> + }
>>> +
>>> + return NULL;
>>> +}
>>> +
>>> +static void
>>> +virStorageBackendGlusterCacheRefresh(virStorageSourcePtr src)
>>
>> This name is weird. Did you mean "Release"?
>
> As you say.
>
>>
>>> +{
>>> + virStorageFileBackendGlusterPrivPtr priv = src->drv->priv;
>>> + virStorageBackendGlusterCacheConnPtr conn = NULL;
>>> + size_t i;
>>> +
>>> + virMutexLock(&virStorageBackendGlusterGlobalCacheObj->lock);
>>> + for (i = 0; i < virStorageBackendGlusterGlobalCacheObj->nConn; i++) {
>>> + conn = virStorageBackendGlusterGlobalCacheObj->conn[i];
>>> + if (STRNEQ(conn->volname, src->volume))
>>> + continue;
>>> + if (virStorageBackendGlusterCompareHosts(conn->nhosts, conn->hosts,
>>> + src->nhosts, src->hosts)) {
>>> + virObjectUnref(conn);
>>> + if (!virStorageBackendGlusterCacheConnCleaned) {
>>
>> Umm, nope, that's not how it works. This is totally thread-unsafe. Another
>> thread can modify your global flag.
>>
>> You probably did not check how virObjectUnref actually works. The return
>> value notifies you if the object was disposed (you removed the last
>> reference) or not.
>
> Exactly.
>
> Okay, I realized virObjectUnref has a return value, which no one used
> earlier, or I missed them all.
> Bad hacker that I am, I used it the same way. I will start using the return value.
>
> The best part is, I did not notice your comment above and debugged the
> code; after many tries I finally found that virObjectUnref has a return
> value. When I started reading the email for the reply, I found that you
> had already given the clue above. Poor me!
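For my own notes, here is a minimal, hypothetical sketch of the pattern being described. FakeObject is a made-up stand-in, not libvirt's actual virObject internals; the point is only that an unref which reports disposal makes the global "cleaned" flag unnecessary:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object (not libvirt's virObject):
 * unref reports whether the object was disposed, so the caller needs no
 * shared "cleaned" flag. */
typedef struct {
    int refs;
} FakeObject;

static FakeObject *fakeObjectNew(void)
{
    FakeObject *obj = calloc(1, sizeof(*obj));
    if (obj)
        obj->refs = 1;
    return obj;
}

static void fakeObjectRef(FakeObject *obj)
{
    obj->refs++;
}

/* Returns true only when this call dropped the last reference
 * and the object was freed. */
static bool fakeObjectUnref(FakeObject *obj)
{
    if (--obj->refs > 0)
        return false;
    free(obj);
    return true;
}
```

So the cache-release path can simply do `if (fakeObjectUnref(conn)) { /* delete the cache entry */ }`, with no race on a global flag — per the comment above, the real virObjectUnref's return value notifies the caller of disposal the same way.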
>
>>
>>> + if (priv && priv->canonpath)
>>> + VIR_FREE(priv->canonpath);
>>> + goto unlock;
>>> + }
>>> + virStorageBackendGlusterCacheConnCleaned = false;
>>> + VIR_DELETE_ELEMENT(virStorageBackendGlusterGlobalCacheObj->conn, i,
>>> + virStorageBackendGlusterGlobalCacheObj->nConn);
>>> + VIR_FREE(src->drv);
>>
>> This leaks src->drv in cases where the entry was used by two objects
>> simultaneously. This needs to be done in the calling function. (As also
>> pointed out in one of the other patches)
>
> right, Will make this part of virStorageFileDeinit.
>
>>
>>> + }
>>> + }
>>> +
>>> + unlock:
>>> + virMutexUnlock(&virStorageBackendGlusterGlobalCacheObj->lock);
>>> +}
>>>
>>> static void
>>> virStorageBackendGlusterClose(virStorageBackendGlusterStatePtr state)
>>> @@ -538,32 +749,15 @@ virStorageBackend virStorageBackendGluster = {
>>> };
>>>
>>>
>>> -typedef struct _virStorageFileBackendGlusterPriv virStorageFileBackendGlusterPriv;
>>> -typedef virStorageFileBackendGlusterPriv *virStorageFileBackendGlusterPrivPtr;
>>> -
>>> -struct _virStorageFileBackendGlusterPriv {
>>> - glfs_t *vol;
>>> - char *canonpath;
>>> -};
>>> -
>>> -
>>> static void
>>> virStorageFileBackendGlusterDeinit(virStorageSourcePtr src)
>>> {
>>> - virStorageFileBackendGlusterPrivPtr priv = src->drv->priv;
>>> -
>>> VIR_DEBUG("deinitializing gluster storage file %p (gluster://%s:%s/%s%s)",
>>> src, src->hosts->u.inet.addr,
>>> src->hosts->u.inet.port ? src->hosts->u.inet.port : "0",
>>> src->volume, src->path);
>>>
>>> - if (priv->vol)
>>> - glfs_fini(priv->vol);
>>> - VIR_FREE(priv->canonpath);
>>> -
>>> - VIR_FREE(priv);
>>> -
>>> - VIR_FREE(src->drv);
>>> + virStorageBackendGlusterCacheRefresh(src);
>>> }
>>>
>>> static int
>>> @@ -610,6 +804,20 @@ virStorageFileBackendGlusterInitServer(virStorageFileBackendGlusterPrivPtr priv,
>>> return 0;
>>> }
>>>
>>> +static void
>>> +virStorageBackendGlusterCacheInit(void)
>>> +{
>>> + if (VIR_ALLOC(virStorageBackendGlusterGlobalCacheObj) < 0) {
>>> + virStorageBackendGlusterCacheInitGlobalError = false;
>>> + return;
>>> + }
>>> +
>>> + if (virMutexInit(&virStorageBackendGlusterGlobalCacheObj->lock) < 0) {
>>> + virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>> + _("Unable to initialize mutex"));
>>> + virStorageBackendGlusterCacheInitGlobalError = false;
>>> + }
>>> +}
>>>
>>> static int
>>> virStorageFileBackendGlusterInit(virStorageSourcePtr src)
>>> @@ -632,11 +840,32 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src)
>>> src, priv, src->volume, src->path,
>>> (unsigned int)src->drv->uid, (unsigned int)src->drv->gid);
>>>
>>> + if (virOnce(&virStorageBackendGlusterCacheOnce,
>>> + virStorageBackendGlusterCacheInit) < 0)
>>> + return -1;
>>> +
>>> + if (virStorageBackendGlusterCacheInitGlobalError)
>>> + return -1;
>>> +
>>> + virMutexLock(&virStorageBackendGlusterGlobalCacheObj->lock);
>>> +
>>> + priv->vol = virStorageBackendGlusterCacheQuery(src);
>>> + if (priv->vol) {
>>> + src->drv->priv = priv;
>>> + virMutexUnlock(&virStorageBackendGlusterGlobalCacheObj->lock);
>>> + return 0;
>>> + }
>>> +
>>> if (!(priv->vol = glfs_new(src->volume))) {
>>> virReportOOMError();
>>> - goto error;
>>> + goto unlock;
>>> }
>>>
>>> + if (virStorageBackendGlusterCacheAdd(src, priv) < 0)
>>> + goto unlock;
>>> +
>>> + virMutexUnlock(&virStorageBackendGlusterGlobalCacheObj->lock);
>>> +
>>
>> So yet another thread-unsafe part. You correctly add the not-yet-opened
>> connection to the cache, but there is no mechanism to guard stuff after
>> this.
>>
>> If a second thread obtains the cache entry while the first one has not yet
>> finished opening the connection, it will get an invalid one.
>>
>> Any thread obtaining an entry from the cache needs to check whether the
>> connection was initialized already as the first thing to do. It can
>> continue only if it's already open and otherwise needs to wait until the
>> original thread finishes.
>>
>> The first thread that added the object into the cache needs to open the
>> connection and notify all other threads potentially waiting on the
>> condition that it was opened. A virCond should help.
>
> Thanks for the clue, will need to figure out how to use virCond.
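For the record, a hypothetical sketch of that open-once/wait pattern, using plain pthreads in place of virCond/virMutex (the names and struct are made up, not the patch's actual types):

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cache entry: 'connected' is set exactly once by the thread
 * that created the entry and opened the connection; every other thread
 * that finds the entry in the cache must wait for it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool connected;
} CacheEntry;

static void cacheEntryInit(CacheEntry *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->connected = false;
}

/* Thread that found an existing entry: block until the opener finishes. */
static void cacheEntryWaitConnected(CacheEntry *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->connected)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Thread that added the entry: call after glfs_init() succeeds. */
static void cacheEntryMarkConnected(CacheEntry *e)
{
    pthread_mutex_lock(&e->lock);
    e->connected = true;
    pthread_cond_broadcast(&e->cond); /* wake all waiting threads */
    pthread_mutex_unlock(&e->lock);
}
```

(An error path would also need to mark the entry failed and broadcast, so waiters don't hang; omitted here for brevity.)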
>
>>
>>> for (i = 0; i < src->nhosts; i++) {
>>> if (virStorageFileBackendGlusterInitServer(priv, src->hosts + i) < 0)
>>> goto error;
>
> Apologies for the delay.
> I got switched to an internal priority item in gluster.
>
> Will try to work on the reviews tomorrow.
>
> Thanks Peter!
>
> --
> Prasanna