[Pulp-list] Importer Sync APIs
Nick Coghlan
ncoghlan at redhat.com
Mon Nov 21 23:33:39 UTC 2011
On 11/22/2011 07:43 AM, Jay Dobies wrote:
> http://blog.pulpproject.org/2011/11/21/importer-sync-apis/
>
> I know the week of Thanksgiving isn't the best time to ask for deep
> thought, but I'm asking anyway.
No Thanksgiving over here, so no holiday distractions for me :)
> I also know there are at least two other teams interested in writing
> plugins that I'd like to give some feedback on how this will meet their
> needs.
1. The new sync log API looks pretty good. What I'll do is set up my
sync commands to log to a file on disk (since of them run in a different
process), then when everything is done, read that file and pass the
contents back in the final report.
However, it would be nice to be able to store a "stats" mapping in
addition to the raw log data.
2. I *think* the 'working directory' API is the
'get_repo_storage_directory()' call on the conduit. However, I'm not
entirely clear on that, nor what the benefits are over using Python's
own tempfile module (although that may be an artefact of the requirement
for 2.4 compatibility in Pulp - with 2.5+, the combination of context
managers, tempfile.mkdtemp() and shutil.remove() means that cleaning up
temporary directories is a *lot* easier than it used to be)
3. The "request_unit_filename" and "add_or_update_content_unit" APIs
seem oddly asymmetrical, and the "get_unit_keys_for_repo" naming
includes quite a bit of redundancy.
To be both consistent, flexible and efficient, I suggest an API based
around a "ContentUnitData" class with the following attributes:
- type_id
- unit_id (may be None when defining a new unit to be added to Pulp)
- key_data
- other_data
- storage_path (may be None if no bits are stored for the content
type - perhaps whether or not bits are stored should be part of the
content type definition?)
The content management API itself could then look like:
- get_units() -> two level mapping {type_id: {unit_id: ContentUnitData}}
Replacement for get_unit_keys_for_repo()
Note that if you're concerned about exposing 'unit_id', the
existing APIs already exposed it as the return value from
'add_or_update_content_unit'.
I think you're right to avoid exposing a "single lookup" API, at
least initially - that's a performance problem waiting to happen.
- new_unit(type_id, key_data, other_data, relative_path) -> ContentUnitData
Does *not* assign a unit ID (or touch the database at all)
Does fill in absolute path in storage_path based on relative_path
Replaces any use of "request_unit_filename"
- save_unit(ContentUnitData) -> ContentUnitData
Assigns a unit ID with the unit and stores the unit in the database
Associates the unit with the repo
Batching will be tricky due to error handling if the save fails
Replaces any use of 'add_or_update_content_unit' and
'associate_content_unit'
- remove_unit(type_id, pulp_id) -> bool
True if removed, False if association retained
Replaces any use of 'unassociate_content_unit'
For the content unit lifecycle, I suggest adopting a reference counting
model where the importer owns one set of references (controlled via
save_unit/remove_unit on the importer conduit) and manual association
owns a second set of references (which the importer conduit can't
touch). A reference through either mechanism would then keep the content
unit alive and associated with the repository (the repo should present a
unified interface to other code, so client code doesn't need to care if
it is an importer association or a manual association that is keeping
the content unit alive).
4. It's probably worth also discussing the scratchpad API that lets the
importer store additional state between syncs. I like having this as
a convenience API (rather than having to store everything as explicit
metadata on the repo), but for debugging purposes, it would probably be
good to publish "_importer_scratchpad" as a metadata attribute on the
repo that is accessible via REST.
> This is too big and ambitious for me to get right on my own.
Definitely headed in the right direction, but I think it's worth pushing
the "structured data" approach even further. You've already started
doing this on the configuration side of things, I think it makes sense
on the metadata side as well.
Cheers,
Nick.
--
Nick Coghlan
Red Hat Engineering Operations, Brisbane
More information about the Pulp-list
mailing list