[libvirt] [PATCH v3 00/16] virObject adjustments for common object

Fri Jul 28 13:44:59 UTC 2017

On 07/28/2017 04:51 AM, Pavel Hrdina wrote:
> On Thu, Jul 27, 2017 at 04:46:41PM -0400, John Ferlan wrote:
>>
>>
>> On 07/25/2017 05:29 AM, Pavel Hrdina wrote:
>>> On Thu, Jun 22, 2017 at 10:02:30AM -0400, John Ferlan wrote:
>>>
>>> Let's move the discussion [1] into correct place.
>>>
>>>> Still, I must be missing something. Why is it wrong to create a new
>>>> object that would have a specific use? virObjectLockable was created at
>>>> some point in history and then used as a way to provide locking for a
>>>> virObject that only had a @ref. Someone could still create a virObject
>>>> as well and just get @refs. With the new objects based on those two
>>>> objects you get a few more features that allow for add, search, lookup,
>>>> remove.  But you call that wrong, which is confusing to me.
>>>
>>> What's wrong, or better wording, what I don't like personally about this
>>> approach is that in order to have features like add, search, lookup,
>>> remove it would require that the object is derived from
>>> virObjectLookupKeys.  The only thing it adds to the simple virObject
>>> is two variables @uuid and @name.
>>>
>>> We don't need the virObjectLookupKeys object at all.
>>>
>>
>> First off - thanks for taking the time to put your thoughts here. It
>> really helped me understand where I believe you were coming from. There
>> are some disadvantages to working at home when it comes to sounding
>> boards or drawing things up on a white board for those around you in an
>> office to provide/receive face-2-face feedback. Black and white text can
>> be harsher than intended.
>>
>> Now that I understand that it's less 'wrong' and more 'not necessary' or
>> as you put it a personal dislike I can come in off of the ledge ;-).
>> Still there's a lot of history w/r/t the initial posting where I was
>> encouraged to use/update Objects vs. doing what I did creating a new
>> module and essentially renaming objs and objlists. Scaling back and
>> considering usage of objects sent me down this current path which hasn't
>> received as much feedback. It may not be 100% correct, but I really was
>> hoping it wasn't wrong especially since I have working code.
>>
>> Storing the UUID/Name in an @obj is the means I use to make certain
>> decisions along the way - especially w/r/t Add/Remove logic which as I
>> pointed out previously - there's no way to "Add" something without a key
>> nor "Remove" it; however, the Lookup/Search logic do not need it because
>> the hash table keeps track of the key and all the find/lookup vir*obj
>> logic would have the @def->{uuid|name} as input for the search. The
>> ForEach logic won't use it, although it could (more later).
>>
>> Storing the keys in @obj removes a few decision points from the caller.
>> The consumer only cares they've created an object with 1 or 2 keys in
>> order to allow the objList code "find" those keys and make decisions
>> based on the key rather than having multiple API's that fetch using the
>> key name in the API name. Adding to the objlist is just that - adding
>> and not caring/knowing how many or which tables the object is using.
>>
>> While it's certainly not necessary to have UUID/Name and a LookupKeys
>> object, it is convenient and more than just for UUID/Name. The new
>> object is (will be) used to add bool's (active, persistent, started,
>> etc.) that are found in multiple objects in order to have a common API
>> to manage that. It's not required, but something possible. There are
>> some other 'shared' type pieces of data (configFile, autostartLink,
>> autostart, etc.) that could also make use of "sharing" data and code if
> 
> The shared code for storing a list of object is what we need and want,
> however adding this 'shared' data into the listing API is out of scope
> of that API and not all objects uses this data.  Yes I can see the
> benefit, but IMHO it doesn't belong into that API.
> 

And I don't believe that's the right solution. Ironically from the
original series when the idea of using polymorphic or inherited objects
was suggested, there seemed to be consensus that this was the correct
approach. Although once implemented and shown to work, now you don't
like it and would prefer some sort of common listing code. I'll stick
with this, update the series and post it again with all that I've
written here a bit more fresh in memory.

>> so inclined. Of course the primary reason I've adjusted all those
>> obj->def->X to be def = obj->def and def->X access is that moving the
>> @def and @newDef inside the object was another initial goal, which needs
>> to be rethought/reworked as that was considered too generic and possibly
>> prone to coding errors.
>>
>>> For example, current add/remove API is:
>>>
>>>
>>> ==== virObjectLookupHashNew ====
>>>
>>> virObjectLookupHashNew(virClassPtr klass,
>>>                        int tableElemsStart);
>>>
>>> could be
>>>
>>> virObjectListNew(int tableSize,
>>>                  bool useName,
>>>                  bool useUUID);
>>>
>>> or
>>>
>>> typedef enum {
>>>     VIR_OBJECT_LIST_TABLE_NAME = (1 << 0),
>>>     VIR_OBJECT_LIST_TABLE_UUID = (1 << 1),
>>> } virObjectListTableType;
>>>
>>> virObjectListNew(int tableSize,
>>>                  unsigned int flags);
>>>
>>> I don't see why the virObjectLookupHashNew() has to have the @klass
>>> parameter and the @tableElemsStart is not correct, it's not the number
>>> of elements, it's has table size.
>>>
>>
>> Well @klass is a parameter for other virObject*New() functions, so why
>> should virObjectLookupHashNew not have one?  Or asked differently how
>> would virObjectListNew determine it was derived from some class? Does
> 
> That's the difference, the proposed VirObjectList is a final class and
> currently there is no need to derive any other class from it.
> 
>> that leave the hash tables in the calling vir*obj.c files or in some
>> common file, such as vircommonobjlist.c? To me that's just a shim to
>> virObject for each of the vir*obj.c. I don't like that solution - we
> 
> All of the code for virObjectList would be for example in
> virobjectlist.c|h files, it would be a separate module that uses
> virObject as a building block and also is tailored to contain virObject.
> 
>> shouldn't create a shim that doesn't converge objects, we would take the
>> plunge and create object code. I also don't like ObjectList if I'm using
>> a hash table - it's not a representative name. Perhaps I'm missing
>> something else you had in mind.
> 
> I chose the "list" because that's what the API does, it's storing a data
> addressed by key without any specific order, usually this type of
> data structure is "map" or "dictionary", so yes, the "list" is not the
> perfect description, we can use "map" or "dict", but "hash" is one of
> the possible implementation of this data structure.
>

Lots of agita over a name... For as long as I've been doing this when
someone has an API w/ List or code with @list variables that means some
sort of forward or doubly-link linked list (flink and blink). Under the
covers even the hash table algorithm uses a forward linked list, but it
(mostly) limits the length of the list for any particular bucket to be
of a particular size before the table is made to grow in order to change
the population of the buckets.

A map is something to show you how to get from point A to point Z with
numerous route options through various points along the way that could
be interesting. Although the shortest distance between any 2 points is a
straight line - it's not always the fastest way. I associate the term
                with memory or cpu and has been a 2D/3D type concept. It
allows sharing between things and changing it's size has synchronicity
problems. A dictionary is something on a shelf that keeps everything
alphabetically (and numerically) ordered, but is not great for searches
because there's a higher likelihood of a few of the 62 letter/number
buckets in the english language being overloaded while other buckets
have very few entries. While "hash" isn't best name, it is the
implementation just as much as the current implementation is a linked
list of objects.

BTW: Yes what each of the existing API's returns is a forward linked
list of objects as (in the future) taken from a object containing a hash
table of all objects (such as how the domain code handles things).

>> @tableElemsStart is the starting size of the hash table that can grow
>> beyond the starting size. Whether it changes names to @tableSize is just
>> window dressing to hide the bikes in the shed. I think the reason why I
>> used a variable was to allow different sizes for different ObjLists.
> 
> Yes, the variable should be there, you should be able to specify the
> size of hash table, just the name was confusing.
> 

Again name agita. The reality is - it's just the starting size.  The
@tableSize could change over time. In fact tableSize is even a misnomer.
The libvirt hash table starting size of 10 could actually store up to 80
elements without ever changing size if it was perfectly balanced. Once
there were more than 8 elements in any one of the original 10 buckets
the table gets resized by a factor of 8 w/ a limitation that the next
size cannot be > "8 * 2048" - taking into account the size is the number
of buckets as opposed to the number of elements, but that's a different
problem and discussion.

>> There may have been something else but that's long since left the short
>> term memory. It might be for the clone or backup copy of an ObjList and
>> wanted to have the original table size in order to create the clone.
>> Doesn't seem I use it now though, so saving it could be dropped. Heck we
>> could drop the initial size logic too and give virHashGrow some
>> exercise. Although I see that has some interesting limitations w/r/t
>> bucket chain length and larger element count tables where a max of 2048
>> buckets can be created allowing bucket chains to potentially grow really
>> long. Perhaps not a problem today with test beds using 100's of objects,
>> but IIRC there was a storage example using 1000's of volumes if not 10's
>> of 1000's. Once you get to about 15K objs, then the probability of
>> longer chains increases (I played around with virhashtest a bit - which
>> currently only maxes at 250 objs, yawn).
>>
>> FWIW: I went away from @useName or @useBool type logic because some
>> tables have UUID only, some have Name only, others have both. For those
>> with one - there is a waste of space, but if we start with table sizes
>> of 1, it's minimized. Again, I preferred @primary and @secondary, but
>> that idea was already squashed. In any case, here's a crudely drawn
>> table of UUID/Name and the various vir*obj consumers.
>>
>>             UUID    |  Name
>> ____________________|___________
>> NodeDev     No      |  Yes
>> ____________________|___________
>> Secret      Yes     |  No [1]
>> ____________________|___________
>> NWFilter    Maybe[2]|  Yes
>> ____________________|___________
>> Interface   No      |  Yes
>> ____________________|___________
>> Network     Yes     |  Yes
>> ____________________|___________
>> StoragePool Yes     |  Yes
>> ____________________|___________
>> StorageVol  No      |  Yes
>> ____________________|___________
>> Domain      Yes     |  Yes
>> ____________________|___________
>>
>> [1] I do have patches that would make the secret usage_id be a secondary key
>>
>> [2] Awful piece of history that allows the nwfilter to be defined
>> without a UUID if saving the UUID to the file fails, thus allowing a
>> future start to overwrite the UUID of a previous nwfilter.
>>
>> There's also the possibility that Snapshots could use this, but I don't
>> think it's as beneficial since "unpredictable ordering" for HashTables
>> could make things AFU'd and I don't suppose you'd end up with 1000's of
>> snapshots.
>>
>>>
>>> ==== virObjectLookupHash(Add|Remove) ====
>>>
>>> virObjectLookupHashAdd(void *tableobj,
>>>                        virObjectLookupKeysPtr obj);
>>>
>>> virObjectLookupHashRemove(void *tableobj,
>>>                           virObjectLookupKeysPtr obj);
>>>
>>> The only difference without the virObjectLookupKeys object would be
>>> that the API will take two more parameters to give it the @uuid and
>>> @name.
>>>
>>> virObjectListAdd(virObjectListPtr listObj,
>>>                  void *obj,
>>>                  const char *name,
>>>                  const char *uuid);
>>>
>>> virObjectListRemove(virObjectListPtr listObj,
>>>                     const char *name,
>>>                     const char *uuid);
>>>
>>> or
>>>
>>> virObjectListRemoveByName(virObjectListPtr listObj,
>>>                           const char *name);
>>> virObjectListRemoveByUUID(virObjectListPtr listObj,
>>>                           const char *uuid);
>>>
>>> Yes, at first it may looks worse, but on the other hand it gives you
>>> better flexibility in the way that the object added/removed from the
>>> list doesn't even have to have the exact same variables and it will
>>> work with every object that is derived from virObject.
>>>
>>
>> Ahhh.. and this is where our fundamental difference lies. You see
>> obj->uuid and obj->name as unnecessary and I find them useful especially
>> for the purpose of adding/removing @obj to/from the @listObj.
> 
> Agree, for adding/removing you need all keys for all tables, having a
> common object with predefined keys makes it easier but also adds a
> limitations that you can use the list only for these objects.
> 

Forcing naming by UUID or by Name is limiting to me. Does it really
matter what the name of a key is to the hash object? No it just needs to
be unique and for our purposes is currently a string comparison. The
fact that the driver/vir*obj's "know" that it's by UUID or by Name is a
higher level abstraction. Still I capitulated and altered my primary and
secondary to use uuid/name and that's now causing other problems. Feels
like a no win situation.

When you add a 3rd key by FOO, then by having API's with By{KEY} names
means you have to add a series of API's instead of modifying the object
handling code to handle that.

In my view, for any current driver/vir*obj that has 1 key (e.g.
secrets), if I one day decide to use a second key (say by usageID), then
all I do is add that key to the virSecretObjNew processing and voila
under the covers the object knows how to handle things by one or two
keys. The driver/vir*obj code then can use LookupByKey instead of Search
for the new key. Similarly, if there were a 3rd key then a few simple
modifications to add a new key to virObjectLookupKeys, modify the
Add/Remove algorithm to handle the 3rd key, and teach the
Find/ForEach/Search API's to handle any one of 3 keys and things should
work. There's probably a couple of other missing details, but to me less
work than new API's represent.

The nodedev code currently has the most multiple key possibilities with
ObjListFind* API's for SysfsPath, Name, WWNs, FabricWWN, Cap, and
SCSIHostByWWNs. Since a key has to exist in all objs to be valid, that
means for nodedev's, the only one that comes close is SysfsPath, but it
doesn't work for the 'computer' object, so that's not an option. Thus
(for now) nodedev is stuck using a single key. Luckily we're guaranteed
some amount of uniqueness by the host/kernel algorithms.

>> It seems you'd rather see virObject* API's (or perhaps some new
>> vircommonobjlist.c) that force the callers to know which call to use
>> (ByName/ByUUID) and which table's were created for the object. I prefer
>> having a common API that doesn't need to be that specific, but does the
>> thing just using an argument.
> 
> Yes, that's exactly what I'm proposing, having a separate module, it's
> not even an virObject* API.
> 
>> Your characterization of "flexible" is my characterization of "control".
>> It's not like UUID/Name are going away - they're just being managed in a
>> lower layer. My view is that the @obj as created during vir*ObjNew is
>> what decides which tables it will need to be present in. The opposing
>> view is I have an object, I'm going to place it in this table and
>> possibly that table and I know where I want to put it and will manage
>> that from the calling function entirely.
>>
>> One thing I would like to see changed is *Remove returning status. Not a
>> a lot of good happens if something during the Remove processing fails
>> and no one is told.
>>
>>>
>>> ==== virObjectLookupHashFind ====
>>>
>>> virObjectLookupHashFind(void *tableobj,
>>>                         bool useUUID,
>>>                         const char *key);
>>>
>>> could be split into two APIs:
>>>
>>> virObjectListFindByName(virObjectListPtr listObj,
>>>                         const char *name);
>>> virObjectListFindByUUID(virObjectListPtr listObj,
>>>                         const char *UUID);
>>>
>>> Which in my opinion is way better than having a single API with a
>>> boolean parameter that switches what type of key it is.
>>>
>>
>> Six of one, half-dozen of the other. The consumer is still going to call
>> one or the other ByName/ByUUID, but that leaves that decision point
>> higher up. It's easy enough to make ByName and ByUUID API's, but they
>> just turn into shims to the common code - so why bother?
> 
> Because a bool parameter is not extendable, imagine that we would like
> to add another key in future, it would force us to rewrite all the APIs.
> 

So after X amount of years of using UUID/Name do you really believe
there's going to be a 3rd key? Even if there were, changing the @bool to
an @flags is very simple. It's not like we've never done that before and
it's not like we're exposing these API's in such a way that we need to
have a whole new API. It's an internal mechanism.

Still what purpose does a 3rd key ever serve for an object? That
probably means one didn't set unique enough primary and secondary keys.
A tertiary key means searching 3 tables to make sure none of them has
keys being used that the new object is trying to use as it's keyset. I
sincerely doubt there's much use for that. I have a feeling we'd
outright reject someone creating a 3 keys for the simple reason the
API's are designed to handle at most 2 keys which should guarantee
uniqueness as long as you're not forced by API naming scheme to have
certain named keys.

>> I still prefer the concept of @primary and @secondary. Consider the
>> table above - should it matter that a table stores UUID's? or Names? or
>> just one or two keys?  We made the decision early on during ObjNew what
>> our keys would be.  After that it's up to calling function to provide
>> some key that we'll search on. For some consumers which use both tables,
>> they want to Find in a specific table and that's why you'd have a
>> boolean to say - don't search primary, search secondary.
>>
>> How about a different idea. Why not make the hash table count variable
>> (1, 2, 3, etc.) and let the caller decide how many to create and which
> 
> I had the same idea, however for some object it doesn't give us any
> benefit to have 4 or 5 hash tables for different keys because the number
> of object stored in the tables would be small.  We have to think
> about balance between speed and memory consumption.
> 

Right and having more than 2 keys for any object means that object *has*
to have all N keys in order for creation of a tertiary table to exist
and all keys have to be unique between 3 tables. I hadn't fully thought
through that when originally writing.

Using a variable number of hash tables is primarily beneficial for that
memory consumption conundrum for those existing vir*obj that use 1
entry, IOW: Nodedev, Interface, Secret, and Volume.

Of those only nodedev seems to be stuck with 1 key. Secrets could (and
should) add unique_id as a Name to go along with the existing UUID.
Interfaces could add MAC and Volumes could add either Key or Path.

However, in order to do so because previous reviews forced me to use
UUID and Name, that means I either "utilize" UUID for "MAC" or "Key" or
I create a tertiary table just because I have driver/vol*obj that
doesn't conform the "chosen" standard using UUID and Name as the key names.

Does it at least make more sense now why I prefer non deterministic
names for the table key names. I chose primary and secondary, but it
could have been "foo" and "bar" too. As a side note for Interface, IIRC
a virtual MAC can change that means the table key changes which makes
things tricky. So doing so is "outside" my current scope.

>> to use. That way the nodedev code removes the 4 or 5 FindBy type API's
>> and replaces it with 4 or 5 hash tables to allow for lookup by key. The
>> consumer can then have vir*ObjListFind{ByUUID|ByName}, but that ends up
>> being a virObjectLookupHashFind(tableobj, tableidx, const char *key)
>> where the consumer knows tableidx = 0 is UUID, tableidx = 1 is Name, etc.
>>
>>>
>>> ==== virObjectLookupHashForEach ====
>>>
>>> virObjectLookupHashForEach(void *tableobj,
>>>                            bool useUUID,
>>>                            virHashIterator callback,
>>>                            virObjectLookupHashForEachDataPtr data);
>>>
>>> could be
>>>
>>> virObjectListForEach(virObjectListPtr listObj,
>>>                      virHashIterator callback,
>>>                      void *callbackData);
>>>
>>> There are two things that I don't like.  There is no need to have the
>>> @useUUID because both tables should contain the same objects if both are
>>> used so this can be hidden from the user and the API will simply use one
>>> of the available tables.  The second issue is that this API forces you
>>> to use some predefined data structure and you cannot create your own
>>> specifically tailored for the task that you are about to do.
>>>
>>> The @useUUID argument also applies to virObjectLookupHashSearch().
>>>
>>
>> [NB: written before rereading and writing the above paragraph regarding
>> configurable number of hash tables]
>>
>> How do you choose which table? How do you know which data the caller
>> wants? Refer to the table above - some objects have UUID and some have
>> Name. Some consumers may even want to return a list of all objects by
>> UUID using an input parameter and other by Name with a different input
>> parameter. The *ForEach ends up being a multi-purpose backend for NumOf,
>> Get{Name|UUID}, and Export functions. Typically the Get{Name|UUID} is
>> the "primary key" for the object, but I'm just saying...
>>
>> While I'm typing this response I just realized that the Get{Names|UUIDs}
>> type API functions wouldn't need vir*obj.c specific *Callback functions
>> to deref obj->def->{name|uuid} since obj->{uuid|name} exists! I'd still
>> need some function to determine whether @aclfilter was passed or not for
>> @def and I'd to know to @useUUID or not. But I could converge things
>> even more than I have.
>>
>>>
>>>
>>>
>>> The usage of the virObjectList* APIs could be something like this:
>>>
>>> virObjectListPtr
>>> virInterfaceObjListNew(void) {
>>>     return virObjectListNew(10, VIR_OBJECT_LIST_TABLE_NAME);
>>> }
>>>
>>> static virInterfaceObjPtr
>>> virInterfaceObjListFindByName(virObjectListPtr interfaces,
>>>                               const char *name)
>>> {
>>>     return virObjectListFindByName(interfaces, name);
>>> }
>>>
>>> Pavel
>>>
>>
>> And here's a perfect example of an object that only lives in one table.
>> If I call virObjectLookupHashForEach or virObjectLookupHashSearch and
>> expect to find something in objsUUID, I'm going to be sorely
>> disappointed. One solution is to have two API's - my solution is one API
>> with a parameter that dictates.
> 
> That's the thing, for APIs ForEach or Search you need to use only one
> hash table.  If there is only one, it's easy to pick the correct one, if
> there are two or more tables, they should contain the same objects only
> mapped by different keys so it doesn't matter which table is used.
> 
> Pavel
> 

OK - sure since all objects have to be in both tables *if* there are two
tables, then it's a selection of one table to search. However, since my
concept of @primary and @secondary was discarded - how does one know
which to pick - UUID or Name?  Nodedev is not going to find anything by
UUID. So as another adjustment to the next posting I'll alter the names
to be less deterministic. That way the Find/Search/ForEach algorithms
can always don't need to know whether the upper layer is using one or
two tables.

John

>> Given the amount of change since this series was first posted, I have to
>> deal with merge conflicts, the virHashSearch new argument, and using RW
>> locks for all the objlists. So I'm going to have to repost the patches
>> again anyway.  At this time, I prefer to leave them as is for the
>> reasons stated regarding avoiding a shim. If that's the avenue to take,
>> then I'll let someone else go down that path. I can finish up cleanup
>> and conversion of all the vir*obj code to use vir*Obj and vir*ObjList,
>> but converging to a shim is something I'll avoid.
>>
>>
>> John
>>
>> --
>> libvir-list mailing list
>> libvir-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/libvir-list