[Pulp-dev] Theme of the bachelor thesis

David Davis daviddavis at redhat.com
Fri Sep 21 14:34:18 UTC 2018


Our previous experiment was to analyze two different ways to store content
for a repository version: a direct relationship between a version and
content vs storing diffs between versions (which we have adopted in Pulp
3). So we didn’t really examine different branching models and I wouldn’t
really say we have a branching model today—it’s more of a linear history of
repository content sets.
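
To make the two approaches concrete, here is a minimal sketch in Python. It
is not Pulp's actual schema; the class and method names (DirectStore,
DiffStore, new_version, content) are made up for illustration, and content
units are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class DirectStore:
    """Each version holds a complete content set (direct relationship)."""
    versions: list = field(default_factory=list)

    def new_version(self, content):
        self.versions.append(frozenset(content))
        return len(self.versions) - 1

    def content(self, number):
        return set(self.versions[number])


@dataclass
class DiffStore:
    """Each version records only what it added and removed."""
    diffs: list = field(default_factory=list)

    def new_version(self, content):
        # Compare against the previous version to compute the diff.
        previous = self.content(len(self.diffs) - 1) if self.diffs else set()
        target = set(content)
        added = frozenset(target - previous)
        removed = frozenset(previous - target)
        self.diffs.append((added, removed))
        return len(self.diffs) - 1

    def content(self, number):
        # Replay the diffs from the first version up to the requested one.
        result = set()
        for added, removed in self.diffs[: number + 1]:
            result = (result - removed) | added
        return result


direct, diff = DirectStore(), DiffStore()
for snapshot in [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]:
    direct.new_version(snapshot)
    diff.new_version(snapshot)
```

Both stores give the same content for every version. The difference is in
what is stored per version (a full set vs a small diff) and in how much work
it takes to read a version back.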

I figure it might help if I outline some of my concerns about pursuing a
branching model.

First is the user experience. Only a small subset of users care about being
able to branch. If any branching model makes Pulp more difficult to use for
users who don’t care about branching, that would be a showstopper. Having
worked with users and other groups at Red Hat, I often worry that Pulp is
already too difficult to use for some users. I’m much more concerned about
simplifying Pulp than I am with introducing branching.

Secondly, a branching model would introduce code complexity. Already,
explaining how repository versions work is probably one of the most
difficult topics when on-boarding new developers. I don’t see how switching
to a branching model like git’s would make this any easier. Even if we
mirror git’s model, I am not sure that’s any help as most developers aren’t
aware of how git works internally.

Another problem is performance. You’ve mentioned this already. Any
branching model would have to be just as performant as what we have today
for it to be worth adopting. Perhaps if branching were a central feature of
Pulp, we would have some more leeway in terms of performance.

Lastly, users can branch today by creating new repositories and copying
over content. It’s a workaround and not as great as having a branching
model in the database but Katello uses this method when publishing content
views. I think it solves most of our users' needs. It’s easy for users to
understand and it keeps our data model simple.
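
As a rough illustration of that workaround (a toy in-memory model, not the
Pulp or Katello API; Repository and branch_by_copy are hypothetical names):

```python
class Repository:
    """A repository with a linear history of content sets (toy model)."""

    def __init__(self, name, content=()):
        self.name = name
        self.versions = [frozenset(content)]

    def latest(self):
        return self.versions[-1]

    def add_version(self, content):
        self.versions.append(frozenset(content))


def branch_by_copy(source, new_name, version=-1):
    """Emulate a branch: a new repository seeded from one source version."""
    return Repository(new_name, source.versions[version])


stable = Repository("stable", {"pkg-1.0"})
stable.add_version({"pkg-1.0", "lib-2.0"})

# "Branch" by copying the latest content into a new repository.
testing = branch_by_copy(stable, "testing")
testing.add_version(testing.latest() | {"pkg-1.1"})
```

After the copy the two repositories have independent linear histories, so
changes to "testing" never show up in "stable".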

To sum things up, I can’t imagine a branching model that could
overcome all of these problems. If the goal is just to analyze Pulp's system
of versioning content vs a model like git while not necessarily committing
any code to Pulp 3 (except perhaps just some small improvements/tweaks),
then I’m not opposed to this. I am just doubtful about how productive it
will be.

David


On Fri, Sep 21, 2018 at 3:34 AM Milan Kovacik <mkovacik at redhat.com> wrote:

>
>
> On Thu, Sep 20, 2018 at 11:14 PM David Davis <daviddavis at redhat.com>
> wrote:
>
>> I’m interested to know more about this versioning/branching proposal and
>> its goals. I assume this would be to explore changing Pulp 3 to use a
>> non-linear branching model?
>>
>
> The main goal of the thesis would be to compare Git and Pulp versioning,
> identifying the advantages and disadvantages of both where content storage
> and management are concerned; ideally we'd learn what use cases we are
> trading for what system traits.
>
>
>> Would the goal be to contribute this model to Pulp 3 codebase in the near
>> term (like before the MVP)? I guess my concern would be that I don’t think
>> there is enough of a need nor the time to properly vet it before the MVP. I
>> don’t think we can guarantee that a branching proposal would result in a
>> change to Pulp 3 in the near term, and switching the data model post-MVP is
>> potentially risky and likely backwards incompatible, so adoption later
>> would be unlikely also.
>>
>
> I think a valuable essay would base the conclusion on an experiment so
> we'll try to implement (a subset of) the Git data model in Pulp core to get
> some first-hand experimental data; we'd like to see the code, REST API and
> performance impact on Pulp core and Pulp plugins.
> I share your concerns though so we'd work on a fork to avoid pressure on
> both the experiment and Pulp goals, keeping a focused scope and minimal
> diff with the project origins; we'd ideally base the fork on an RC (MVP?)
> Pulp core tag.
> The decision to merge the fork to the origin projects would be based on
> the experiment results so it would be out of scope of the thesis.
>
>> I also want to point out that we explored another versioning model before
>> we went with our current design. We did some performance analysis and found
>> that the model wasn’t optimized for the most common use cases in Pulp.
>> Here’s the discussion in case you’re interested:
>>
>> https://www.redhat.com/archives/pulp-dev/2017-December/msg00075.html
>>
>
> Great input! I briefly looked at the conversation and your experiment and
> I believe we'll utilize this in the comparison indeed.
> I'd like to ask though, based on the mailing list discussion, did the
> branching experiment setup reflect the most frequent use case, i.e. a sparse
> tree/DAG with few (long) branches?
> Did the experiment consider the Git data model and workflows?
>
>
>> David
>>
>>
>> On Thu, Sep 20, 2018 at 10:01 AM Robin Chan <rchan at redhat.com> wrote:
>>
>>> Having only this exposure to the bachelor thesis, I've just been reading
>>> for my own learning.
>>>
>>> I'll only share that I would invite Vlada to perhaps write a blog post
>>> with some summary-type information targeting newcomers with an
>>> understanding of git. Such a post might be a way to understand Pulp by
>>> allowing them to relate existing understanding to Pulp a little quicker.
>>>
>>> On Thu, Sep 20, 2018 at 9:41 AM, Milan Kovacik <mkovacik at redhat.com>
>>> wrote:
>>>
>>>> Folks,
>>>>
>>>> we've been discussing my proposal with Vlada recently and he's about to
>>>> commit officially to it at the school.
>>>> I'd like to poke for feedback esp. if there are concerns/doubts about
>>>> the sanity of my proposal.
>>>> Encouraging comments are of course welcome too ;)
>>>>
>>>> Cheers,
>>>> milan
>>>>
>>>> On Mon, Sep 10, 2018 at 11:25 AM Milan Kovacik <mkovacik at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to propose a comparison of the Git versioning model to the
>>>>> Pulp3 versioning (and branching) model.
>>>>> The textual part would briefly introduce both Git and Pulp, with focus
>>>>> on their respective use cases, trying to identify overlapping scenarios.
>>>>> Next it would examine in detail the versioning models of both Git and
>>>>> Pulp3.
>>>>> The main matter would compare an implementation of the Git versioning
>>>>> model in Pulp3 (e.g. as a fork) to the current versioning model.
>>>>> The textual part would conclude with advantages and disadvantages of
>>>>> both versioning models, revisiting the use cases and comparing the
>>>>> performance (storage and time) of both implementations.
>>>>>
>>>>> I believe this would have a direct practical impact on Pulp3,
>>>>> affecting feature planning and if feasible, providing an alternative
>>>>> versioning implementation[1][2].
>>>>>
>>>>> --
>>>>> milan
>>>>>
>>>>> PS: I'm volunteering to oversee the thesis on the project part
>>>>>
>>>>> [1] https://pulp.plan.io/issues/3360
>>>>> [2] https://pulp.plan.io/issues/3842
>>>>>
>>>>> On Wed, Sep 5, 2018 at 9:16 PM, Austin Macdonald <austin at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> How about reproducibility? Bihan and I presented on this topic at
>>>>>> scipy but there is a lot left to say/research.
>>>>>> https://www.youtube.com/watch?v=5czGgUG0oXA
>>>>>>
>>>>>> In the scientific community, reproducibility is a hot topic (I bet
>>>>>> grad schools would love this!), and the computational aspect of it has been
>>>>>> under-emphasized in the existing literature. Our talk focused on the
>>>>>> fundamental problems of environmental reproducibility and how to use Pulp
>>>>>> to solve them. Especially since the scope was narrow, we barely scratched
>>>>>> the surface. There are a ton of potential angles; some ideas are that you could
>>>>>> catalog and compare technologies or discuss academic vs industry use cases
>>>>>> and trends.
>>>>>>
>>>>>> On Wed, Sep 5, 2018 at 5:01 AM Vladimir Dusek <vdusek at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I'm not asking you for a specific assignment; we'll set it up with
>>>>>>> Milan later. I only need some idea for a topic.
>>>>>>>
>>>>>>> The scope of the work should be 40-50 standard pages. Additional
>>>>>>> instructions can be found here -
>>>>>>> http://www.fit.vutbr.cz/info/szz/pokyny_bp.php.en#pozadavky.
>>>>>>> However, I don't think it's necessary for you to know them.
>>>>>>>
>>>>>>> All bachelor theses from our university from last year can be found here
>>>>>>> - http://www.fit.vutbr.cz/study/DP/BP.php.en. I presume most of
>>>>>>> them are written in Czech but there are abstracts in English which might be
>>>>>>> useful for inspiration.
>>>>>>>
>>>>>>> Random abstracts:
>>>>>>>
>>>>>>> Alias Analysis in C Compiler
>>>>>>> -------------------------------------
>>>>>>> This thesis is dedicated to the problem of alias analysis and
>>>>>>> possibilities of its improvement in the LLVM framework. The goal of this
>>>>>>> thesis is to improve the accuracy, which was achieved by extending the
>>>>>>> existing implementation of Andersen's algorithm to be field sensitive. The
>>>>>>> terms related to alias analysis and algorithms of the alias analysis
>>>>>>> available in LLVM are explained. These algorithms are compared according to
>>>>>>> their base idea, features, and limitations. The implementation of the field
>>>>>>> sensitivity has been tested using compiler test suites. Its impact on
>>>>>>> program compilation speed and performance has been analyzed. The measured
>>>>>>> results show an increase in the accuracy of alias analysis in the LLVM
>>>>>>> framework.
>>>>>>>
>>>>>>> Reinforcement Learning for Starcraft Game Playing
>>>>>>> --------------------------------------------------------------------
>>>>>>> This work focuses on methods of machine learning for playing
>>>>>>> real-time strategy games. The thesis applies mainly methods of Q-learning
>>>>>>> based on reinforcement learning. The practical part of this work is
>>>>>>> implementing an agent for playing Starcraft II. My solution is based on 4
>>>>>>> simple networks that collaborate with each other. Each of the networks also
>>>>>>> teaches itself how to process all given actions optimally. Analysis of the
>>>>>>> system is based on experiments and statistics from played games.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 8:27 PM Robin Chan <rchan at redhat.com> wrote:
>>>>>>>
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>> I'm not familiar with what types of topics are appropriate for this
>>>>>>>> type of project - could you share some criteria and examples of what makes
>>>>>>>> a good topic?
>>>>>>>>
>>>>>>>> Robin
>>>>>>>>
>>>>>>>> On Mon, Sep 3, 2018 at 8:50 AM, Vladimir Dusek <vdusek at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi team,
>>>>>>>>>
>>>>>>>>> since September I've been in the third year of my bachelor's studies,
>>>>>>>>> which means that I'm going to write a bachelor thesis. I haven't selected a topic
>>>>>>>>> yet and, if possible, I'd like to work on something that is relevant
>>>>>>>>> to Pulp. Do you have any ideas?
>>>>>>>>>
>>>>>>>>> Thank you guys,
>>>>>>>>> Vlada
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pulp-dev mailing list
>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>
>
> Thanks a lot!
> milan
>