[Pulp-dev] Single-Table Content API Changes, Performance Discussion

Fri Dec 7 19:30:31 UTC 2018

On Thu, Dec 6, 2018 at 6:04 PM Daniel Alley <dalley at redhat.com> wrote:

> I'm not necessarily against this but I'll recap some points I made on IRC:
>
> The burden of knowing where to go to get that information would be pushed
> off onto the API user.  If we're not returning the URL, then anyone using
> the API must know that they need to query /pulp/api/v3/content/file/files/
> (and likewise for every other content type), and that they need to use a
> filter for repository_version=... or repository_version_added=... and so
> on.
>
> I'm not sure how that would work, how that knowledge would be provided or
> if it's something that can be hardcoded into the bindings.  If you think
> that's possible, then I'm open to it.
>
>
The bindings that get generated provide documentation about the available
filters. The following doc block is generated in the bindings right now:

    def content_file_files_list(self, **kwargs):  # noqa: E501
        """content_file_files_list  # noqa: E501

        ViewSet for FileContent.  # noqa: E501
        This method makes a synchronous HTTP request by default. To make an
        asynchronous HTTP request, please pass async=True
        >>> thread = api.content_file_files_list(async=True)
        >>> result = thread.get()

        :param async bool
        :param str relative_path: Filter results where relative_path matches value
        :param str digest: Filter results where digest matches value
        :param str repository_version: Repository Version referenced by HREF
        :param str repository_version_added: Repository Version referenced by HREF
        :param str repository_version_removed: Repository Version referenced by HREF
        :param int page: A page number within the paginated result set.
        :param int page_size: Number of results to return per page.
        :return: InlineResponse2001
                 If the method is called asynchronously,
                 returns the request thread.
        """

A bindings user would therefore know that, when listing file content, she
can pass keyword arguments to filter that content. Does that alleviate the
concern?
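
For anyone not using the bindings, here is a minimal sketch of the same
idea: building the filtered list URL by hand from a content type name and
a repository version href. The CONTENT_ENDPOINTS lookup and the
content_list_url helper are hypothetical names used only for illustration;
they are not part of Pulp or of the generated bindings, and only the file
endpoint mentioned in this thread is filled in.

```python
from urllib.parse import urlencode

# Hypothetical lookup from a content type name to its list endpoint.
# Only the file endpoint appears in this thread; a real client would
# need one entry per installed content type.
CONTENT_ENDPOINTS = {
    "pulp_file.file": "/pulp/api/v3/content/file/files/",
}


def content_list_url(content_type, **filters):
    """Build the list URL for one content type with optional filters."""
    endpoint = CONTENT_ENDPOINTS[content_type]
    # Drop filters that were not supplied so they never show up as "None".
    query = {name: value for name, value in filters.items() if value is not None}
    return endpoint + ("?" + urlencode(query) if query else "")


# Example: list the file content added in repository version 1.
version_href = "/pulp/api/v3/repositories/4/versions/1/"
url = content_list_url("pulp_file.file", repository_version_added=version_href)
print(url)
```

The lookup table is exactly the knowledge Daniel is worried about pushing
onto the API user; with the bindings, the method name and its documented
keyword arguments carry that knowledge instead.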

>
>
> On Thu, Dec 6, 2018 at 4:53 PM Dennis Kliban <dkliban at redhat.com> wrote:
>
>> On Tue, Nov 20, 2018 at 12:31 PM Dennis Kliban <dkliban at redhat.com>
>> wrote:
>>
>>> On Mon, Nov 19, 2018 at 6:20 PM Daniel Alley <dalley at redhat.com> wrote:
>>>
>>>> Some of the API changes that are required by single-table-content would
>>>> be beneficial even if we didn't go forwards with the modelling changes.
>>>> For instance, currently we have single endpoints for each of
>>>> repository_version/.../content/,  .../added_content/, and
>>>> .../removed_content/ which mix content of all types together.  This makes
>>>> it impossible for clients to expect the data returned to follow any
>>>> particular schema.  What the single-table-content does is to provide
>>>> separate query urls for each content type present in the repository
>>>> version, which I believe is a usability win for us, and it's something we
>>>> could implement without using any of the modelling changes.
>>>>
>>>>
>>> The current behavior of the 'content' APIs is already causing a problem
>>> for our OpenAPI 2.0 schema. OpenAPI 2.0 does not support polymorphic
>>> responses. We are currently tracking this problem with a bug[0]. The only
>>> way to resolve it is to provide APIs that return homogeneous types.
>>>
>>> [0] https://pulp.plan.io/issues/4052
>>>
>>>
>>>> Besides being a general update, I'd like to start a discussion to
>>>> understand:  is changing the Pulp 3 API so that it's organized around
>>>> content type URLs OK with everyone? This resolves the usability issues of
>>>> returning mixed types. Are there any downsides with this approach?
>>>>
>>>> To clarify what I mean on that last point -- by "content type URLs" I
>>>> mean that where you currently get back the url "
>>>> /pulp/api/v3/repository_version/.../content/" under the "_content"
>>>> field on a repoversion, you would instead get back something like
>>>>
>>>> { "pulp_file.filecontent":
>>>> "/pulp/api/v3/content/file/files/?repository_version=.." }
>>>>
>>>
>>> I am +1 to making this change to our REST API.
>>>
>>
>> Thank you Daniel for putting together the patches[0,1] to make these
>> changes possible. I've had a chance to try out the Python bindings. When
>> using the bindings, I discovered that I could not do anything with the URLs
>> returned for each content type added or removed. Making the request to
>> those URLs requires making a call that looks like this:
>>
>>
>> api.content_file_files_list(repository_version_added=repository_version_1.href)
>>
>> What if instead the API returned the number of each content type added or
>> removed? A repository version response would then look like:
>>
>> {'base_version': None,
>>  'content_added': {'pulp_file.file': 4},
>>  'content_removed': {'pulp_file.file': 1},
>>  'content_summary': {'pulp_file.file': '3'},
>>  'created': datetime.datetime(2018, 12, 5, 23, 34, 26, 948749, tzinfo=tzlocal()),
>>  'href': '/pulp/api/v3/repositories/4/versions/1/',
>>  'number': 1}
>>
>> Thoughts?
>>
>> [0] https://github.com/pulp/pulp/pull/3774
>> [1] https://github.com/pulp/pulp_file/pull/133
>>
>>
>>>
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>