[Pulp-list] Client Refactoring

Mon Jul 25 15:28:38 UTC 2011

On Mon, Jul 25, 2011 at 10:10:57AM -0400, Jay Dobies wrote:
> What about things like query calls? Are they then defined in the
> objects themselves and each object exposes a number of query_*
> methods that indicate the possibilities?

Not sure what you mean by query calls.  If you mean the ability to pass in
arbitrary key/value pairs such that the server constructs a query based on the
input, I think those same types of calls could still be supported as they are
today.
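
As a rough sketch of that kind of pass-through (the helper name is made up
for illustration, and the base path is only borrowed from the URL style used
further down, not taken from existing client code):

```python
from urllib.parse import urlencode


def build_query_path(base_path, **criteria):
    """Turn arbitrary key/value pairs into a query string for the server."""
    if not criteria:
        return base_path
    # Sort the pairs so the resulting URL is deterministic.
    return base_path + "?" + urlencode(sorted(criteria.items()))


# The client only forwards the pairs; the server builds the actual query.
path = build_query_path("/pulp/api/repos", arch="x86_64", name="f15")
print(path)
```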

> I'm not a huge fan of this approach, but more generally I'm not a
> fan of the pure REST approach we've been taking either. It sounds
> great for simple use cases (single resource, trying to create/delete
> it). But I think it's ultimately going to be limiting in terms of
> developing an API that's actually useful.

I'm only really talking about developing a client library that interfaces
easily with the server API we already have.  I don't mean to suggest changes
to the server side as part of this effort.  

Right now we have a REST server API.  However, we have a client API library
that sometimes acts as if it's XML-RPC, e.g., calling methods with long
argument lists instead of working with resources directly.  Personally, I
think it would be a little easier if we exposed the resources in the client as
models, using Python classes.  If we have an API that is too hard to use, or
can't be used by interacting with resources in the RESTful way, I'd argue the
API could be improved.
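
To make the idea concrete, here is a minimal sketch of a resource exposed as a
Python class.  Everything in it is an assumption for illustration: the Repo
fields, the save()/get() methods, the /pulp/api/repos/<id> path, and the
FakeServer stand-in (used so the sketch runs without a network) are not the
existing client code.

```python
class FakeServer:
    """In-memory stand-in for the REST server (hypothetical)."""

    def __init__(self):
        self.store = {}

    def request(self, method, path, body=None):
        if method == "PUT":
            self.store[path] = dict(body)
        return self.store.get(path)


class Repo:
    """Hypothetical client-side model of a repo resource."""

    def __init__(self, server, id, name=None, arch=None, feed=None):
        self.server = server
        self.id = id
        self.name = name
        self.arch = arch
        self.feed = feed

    @staticmethod
    def path_for(id):
        # Assumed resource path; the real server route may differ.
        return "/pulp/api/repos/%s" % id

    def to_dict(self):
        return {"id": self.id, "name": self.name,
                "arch": self.arch, "feed": self.feed}

    def save(self):
        # One call that sends the resource state, no long argument list.
        self.server.request("PUT", self.path_for(self.id), self.to_dict())

    @classmethod
    def get(cls, server, id):
        data = server.request("GET", cls.path_for(id))
        return cls(server, **data)


server = FakeServer()
repo = Repo(server, "f15-x86_64", name="Fedora 15", arch="x86_64")
repo.save()

# Change an attribute on the model and push it back to the server.
repo.feed = "http://example.com/f15/"
repo.save()

fetched = Repo.get(server, "f15-x86_64")
```

The caller works with attributes on a resource and a single save(), rather
than a method that takes every field as a positional argument.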

Keep in mind, this is somewhat orthogonal to the client refactoring; it's
just something I noticed and thought would be good to get on the radar.

> It's the things that cross "resource" boundaries that complicate it.
> When adding a package to a repo, where does that fall? Is that
> .create() in package? Is that .add_package() in repo?

I'd say whichever way it's done on the server.  If you go against /repo to add
packages to it, the method would live on repo, and so on.
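
A small sketch of that rule, assuming for the sake of the example that the
server takes a POST against the repo to add a package.  The route, the
payload key, and the class names are all hypothetical:

```python
class RecordingServer:
    """Stand-in that records requests instead of sending them."""

    def __init__(self):
        self.calls = []

    def post(self, path, body):
        self.calls.append(("POST", path, body))


class Repo:
    """Hypothetical repo model; methods mirror routes under the repo URL."""

    def __init__(self, server, id):
        self.server = server
        self.id = id

    def add_package(self, package_id):
        # Assumed route: the server handles this under the repo resource,
        # so the client method lives on Repo rather than on a Package model.
        path = "/pulp/api/repos/%s/add_package" % self.id
        self.server.post(path, {"package_id": package_id})


server = RecordingServer()
Repo(server, "f15").add_package("pkg-1")
```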

> We currently have a big issue in our API on how bad our status APIs
> are. Everything is broken down into individual resources which just
> doesn't make sense from a usefulness perspective. I have to get the
> repo, then get its sync list, then get the sync entry to get to
> actual useful data. That's going to be somewhat annoying if we have
> to dig through an object model to get at all that, as compared to
> providing an api call that just says "get_latest_sync(repo_id)".

There's no reason why you couldn't have a repo.get_latest_sync(), where repo
is an instance of the model.  You're right, it shouldn't be that hard to do
from a client, nor should "business logic" be required in the client.  If
this is a very common operation, I'd even say the server should support it
natively by having a URL like:

/pulp/api/repos/<repo_id>/latest_sync

which returns the latest sync.
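
With a URL like that, the client method becomes a thin wrapper.  A minimal
sketch, where the Repo class, the FakeServer stand-in, and the fields in the
response are assumptions for illustration:

```python
class FakeServer:
    """Stand-in that serves canned responses keyed by path."""

    def __init__(self, responses):
        self.responses = responses

    def get(self, path):
        return self.responses[path]


class Repo:
    """Hypothetical repo model."""

    def __init__(self, server, id):
        self.server = server
        self.id = id

    def get_latest_sync(self):
        # No digging through the sync list on the client; the server
        # picks the latest entry and returns it directly.
        return self.server.get("/pulp/api/repos/%s/latest_sync" % self.id)


server = FakeServer({
    # Response fields are made up for the example.
    "/pulp/api/repos/f15/latest_sync": {"state": "finished"},
})
repo = Repo(server, "f15")
latest = repo.get_latest_sync()
```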

> I also think this is going to start to fall apart when we start
> trying to optimize queries. We're talking about absurdly large sets
> of data, so we're going to need to take performance into account
> early on. If we tie the client lib so tightly into an object model
> then I suspect we're going to be shoehorning in advanced
> (cross-object, derived fields, etc) queries later.

I think the server should support these queries, or at least have a mechanism
to build a dynamic query so that the results are computed server side, kind of
like what was done for package groups with and/or logic.  We don't want the
client to have to get the full list of groups, and then do and/or logic on the
large data set.  The server should be doing the query, and returning the
result set.
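
Roughly, the split could look like the sketch below.  The query format, the
function names, and the group fields are invented for the example and are not
how the package group queries are actually implemented; the point is only
that the client ships the criteria and gets back the matching set.

```python
def matches(item, query):
    """Server side: evaluate a nested and/or query against one item."""
    if "and" in query:
        return all(matches(item, sub) for sub in query["and"])
    if "or" in query:
        return any(matches(item, sub) for sub in query["or"])
    return item.get(query["field"]) == query["value"]


class FakeServer:
    """Stand-in for the server; holds the large data set."""

    def __init__(self, groups):
        self.groups = groups

    def search_groups(self, query):
        return [g for g in self.groups if matches(g, query)]


def find_groups(server, query):
    """Client side: send the query, receive only the result set."""
    return server.search_groups(query)


server = FakeServer([
    {"id": "web", "arch": "x86_64", "type": "mandatory"},
    {"id": "db", "arch": "i386", "type": "optional"},
    {"id": "dev", "arch": "x86_64", "type": "optional"},
])

# arch == x86_64 AND (type == mandatory OR id == dev)
query = {"and": [
    {"field": "arch", "value": "x86_64"},
    {"or": [
        {"field": "type", "value": "mandatory"},
        {"field": "id", "value": "dev"},
    ]},
]}
result = find_groups(server, query)
```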

--
-- James Slagle
--