[Pulp-list] Importer Sync APIs

Tue Nov 22 15:18:34 UTC 2011

A few more random thoughts now that I've had some coffee.

> To be both consistent, flexible and efficient, I suggest an API based
> around a "ContentUnitData" class with the following attributes:
> - type_id
> - unit_id (may be None when defining a new unit to be added to Pulp)
> - key_data
> - other_data
> - storage_path (may be None if no bits are stored for the content type -
> perhaps whether or not bits are stored should be part of the content
> type definition?)

I actually wrote this about two weeks ago while refining the importer 
APIs. It was originally intended for the importer's "add" functionality, 
which is meant to handle user-uploaded content units (not gonna go any 
deeper into that now). I was happy to find it already there this 
morning :)
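
For anyone following along, the class quoted above could look roughly 
like the sketch below. The attribute names come from the proposal; the 
dataclass form, the defaults and the is_new() helper are assumptions 
made for illustration, not what is actually in the code base.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ContentUnitData:
    """Illustrative container for one content unit (attribute names from the proposal)."""

    type_id: str
    key_data: Dict[str, Any]
    other_data: Dict[str, Any] = field(default_factory=dict)
    # None while the unit has not been saved to Pulp yet.
    unit_id: Optional[str] = None
    # None when no bits are stored for this content type.
    storage_path: Optional[str] = None

    def is_new(self) -> bool:
        # Hypothetical helper: a unit without an ID has not been saved.
        return self.unit_id is None


unit = ContentUnitData(type_id="rpm", key_data={"name": "foo", "version": "1.0"})
```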

> The content management API itself could then look like:
>
> - get_units() -> two level mapping {type_id: {unit_id: ContentUnitData}}
> Replacement for get_unit_keys_for_repo()
> Note that if you're concerned about exposing 'unit_id', the existing
> APIs already exposed it as the return value from
> 'add_or_update_content_unit'.
> I think you're right to avoid exposing a "single lookup" API, at least
> initially - that's a performance problem waiting to happen.

I'm not sure how people would want the return value organized. I can 
see an argument for organizing by unit key too.

So to that end, I'm returning a custom object (currently named UnitBag, 
but I may find something better) that offers a bunch of transformations 
(queries is probably a better term) on the set of units. So "give them 
to me by unit key" or "give me just the unit keys" or "give them to me 
mapped by ID". Stuff like that.
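
As a rough idea of what I mean, here is a minimal sketch. Only the 
UnitBag name is real (and may change); the method names are made up for 
illustration, and units are plain dicts here to keep it short.

```python
class UnitBag:
    """Illustrative wrapper offering several views over one set of units."""

    def __init__(self, units):
        # Each unit is a dict with 'type_id', 'unit_id' and 'key_data' entries.
        self._units = list(units)

    def by_id(self):
        # "give them to me mapped by ID"
        return {u["unit_id"]: u for u in self._units}

    def by_type(self):
        # Two-level mapping {type_id: {unit_id: unit}}, as in the quoted get_units().
        result = {}
        for u in self._units:
            result.setdefault(u["type_id"], {})[u["unit_id"]] = u
        return result

    def unit_keys(self):
        # "give me just the unit keys"
        return [u["key_data"] for u in self._units]

    def by_unit_key(self):
        # Dicts are not hashable, so the key is a sorted tuple of its items.
        return {tuple(sorted(u["key_data"].items())): u for u in self._units}


bag = UnitBag([
    {"type_id": "rpm", "unit_id": "u1", "key_data": {"name": "foo", "version": "1.0"}},
    {"type_id": "errata", "unit_id": "u2", "key_data": {"id": "RHSA-1"}},
])
```

The caller picks whichever view suits it, and the underlying set of 
units is only fetched once.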

> - new_unit(type_id, key_data, other_data, relative_path) -> ContentUnitData
> Does *not* assign a unit ID (or touch the database at all)
> Does fill in absolute path in storage_path based on relative_path
> Replaces any use of "request_unit_filename"
>
> - save_unit(ContentUnitData) -> ContentUnitData
> Assigns a unit ID with the unit and stores the unit in the database
> Associates the unit with the repo
> Batching will be tricky due to error handling if the save fails
> Replaces any use of 'add_or_update_content_unit' and
> 'associate_content_unit'
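
To make the new_unit/save_unit split quoted above concrete, here is a 
minimal in-memory sketch. The class name, the dict standing in for the 
database, the UUID-based IDs and the storage layout are all assumptions 
for illustration; only the two call signatures come from the proposal.

```python
import os
import uuid


class InMemoryUnitStore:
    """Illustrative stand-in for the content manager behind an importer."""

    def __init__(self, repo_id, storage_root):
        self.repo_id = repo_id
        self.storage_root = storage_root
        self.units = {}            # unit_id -> unit dict, stands in for the database
        self.associations = set()  # (repo_id, unit_id) pairs

    def new_unit(self, type_id, key_data, other_data, relative_path):
        # Builds the unit only: no ID is assigned and nothing is stored.
        storage_path = None
        if relative_path is not None:
            storage_path = os.path.join(self.storage_root, type_id, relative_path)
        return {
            "type_id": type_id,
            "unit_id": None,
            "key_data": key_data,
            "other_data": other_data,
            "storage_path": storage_path,
        }

    def save_unit(self, unit):
        # Assigns an ID on first save, stores the unit, associates it with the repo.
        if unit["unit_id"] is None:
            unit["unit_id"] = str(uuid.uuid4())
        self.units[unit["unit_id"]] = unit
        self.associations.add((self.repo_id, unit["unit_id"]))
        return unit


# "/tmp/content" is a placeholder root, not Pulp's real storage location.
store = InMemoryUnitStore("repo-1", "/tmp/content")
draft = store.new_unit("rpm", {"name": "foo"}, {}, "foo-1.0.rpm")
```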

Another thing I forgot to discuss in that blog post is the idea of 
child units. So an errata is itself a unit, but has references to RPMs, 
which are their own units.

I'm thinking of sticking with the proposed model:

link_child(parent_unit, child_unit)

More details on that later, I'm still kinda fleshing it out.
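
In the meantime, a minimal sketch of the idea, assuming units are 
referred to by ID and links live in a simple in-memory mapping. Only the 
link_child(parent_unit, child_unit) signature is from the proposal; the 
class and the children_of() lookup are hypothetical.

```python
from collections import defaultdict


class ChildLinks:
    """Illustrative parent -> children bookkeeping for content units."""

    def __init__(self):
        # parent unit ID -> set of child unit IDs
        self._children = defaultdict(set)

    def link_child(self, parent_unit, child_unit):
        # Linking the same pair twice is harmless because children are a set.
        self._children[parent_unit].add(child_unit)

    def children_of(self, parent_unit):
        # Hypothetical lookup; sorted so the result is stable.
        return sorted(self._children.get(parent_unit, ()))


links = ChildLinks()
links.link_child("errata-1", "rpm-b")
links.link_child("errata-1", "rpm-a")
links.link_child("errata-1", "rpm-a")
```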

-- 
Jay Dobies
Freenode: jdob @ #pulp
http://pulpproject.org | http://blog.pulpproject.org