[Pulp-dev] Theme of the bachelor thesis

Fri Sep 21 16:12:55 UTC 2018

On Fri, Sep 21, 2018 at 4:34 PM David Davis <daviddavis at redhat.com> wrote:

> Our previous experiment was to analyze two different ways to store content
> for a repository version: a direct relationship between a version and
> content vs storing diffs between versions (which we have adopted in Pulp
> 3). So we didn’t really examine different branching models and I wouldn’t
> really say we have a branching model today—it’s more of a linear history of
> repository content sets.
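
To make the two storage approaches concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not Pulp's actual schema: the class names SnapshotRepo and DiffRepo, their methods, and the content unit names are all made up for illustration. Model A links every version directly to its full content set; model B stores only what each version added and removed and replays those diffs on read.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotRepo:
    """Model A (hypothetical): each version stores its full content set."""
    versions: list = field(default_factory=list)

    def new_version(self, added=(), removed=()):
        base = self.versions[-1] if self.versions else frozenset()
        self.versions.append((base | frozenset(added)) - frozenset(removed))
        return len(self.versions)  # 1-based version number

    def content(self, number):
        # Reading is a direct lookup.
        return set(self.versions[number - 1])

    def rows_stored(self):
        return sum(len(v) for v in self.versions)


@dataclass
class DiffRepo:
    """Model B (hypothetical): each version stores only its diff."""
    diffs: list = field(default_factory=list)

    def new_version(self, added=(), removed=()):
        self.diffs.append((frozenset(added), frozenset(removed)))
        return len(self.diffs)  # 1-based version number

    def content(self, number):
        # Reading replays every diff up to the requested version.
        result = set()
        for added, removed in self.diffs[:number]:
            result |= added
            result -= removed
        return result

    def rows_stored(self):
        return sum(len(a) + len(r) for a, r in self.diffs)


snap, diff = SnapshotRepo(), DiffRepo()
for repo in (snap, diff):
    repo.new_version(added={"a", "b", "c"})
    repo.new_version(added={"d"}, removed={"a"})
```

Both toy models return the same content for any version; they differ in how much they store and in how much work a read takes, which is the trade-off the earlier experiment looked at.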
>
> I figure it might help if I outline some of my concerns about pursuing a
> branching model.
>
> First is the user experience. Only a small subset of users care about
> being able to branch. If any branching model makes Pulp more difficult to
> use for users who don’t care about branching, that would be a showstopper.
> Having worked with users and other groups at Red Hat, I often worry that
> Pulp is already too difficult to use for some users. I’m much more
> concerned about simplifying Pulp than I am with introducing branching.
>
> Secondly, a branching model would introduce code complexity. Already,
> explaining how repository versions work is probably one of the most
> difficult topics when on-boarding new developers. I don’t see how switching
> to a branching model like git has will make this any easier. Even if we
> mirror git’s model, I am not sure that’s any help as most developers aren’t
> aware of how git works internally.
>
> Another problem is performance. You’ve mentioned this already. Any
> branching model would have to be just as performant as what we have today
> for it to be worth adopting. Perhaps if branching were a central feature to
> Pulp, we would have some more leeway in terms of performance.
>
> Lastly, users can branch today by creating new repositories and copying
> over content. It’s a workaround and not as great as having a branching
> model in the database but Katello uses this method when publishing content
> views. I think it solves most of our users' needs. It’s easy for users to
> understand and it keeps our data model simple.
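
The copy-based workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: repositories are modelled as a plain dict mapping a name to a linear list of content sets, and the helper names branch_by_copy and add_version as well as the repository and package names are hypothetical, not Pulp or Katello APIs.

```python
def branch_by_copy(repositories, source, target):
    """Create repository `target` seeded with the latest content of `source`."""
    if target in repositories:
        raise ValueError(f"repository {target!r} already exists")
    latest = repositories[source][-1]
    # Copy the set so later changes to either repository stay independent.
    repositories[target] = [set(latest)]
    return repositories[target]


def add_version(repositories, name, added=(), removed=()):
    """Append a new version to the linear history of repository `name`."""
    latest = repositories[name][-1]
    repositories[name].append((latest | set(added)) - set(removed))


# Hypothetical repository and package names.
repos = {"rhel7": [{"bash-4.2", "glibc-2.17"}]}
branch_by_copy(repos, "rhel7", "rhel7-testing")
add_version(repos, "rhel7-testing", added={"bash-4.4"}, removed={"bash-4.2"})
```

Each repository keeps its own linear history, so the "branch" is just a second repository; nothing in this toy model records where the copy came from.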
>
> To kind of sum up things, I can’t imagine a branching model that could
> overcome all of these problems. If the goal is just to analyze Pulp's system
> of versioning content vs a model like git while not necessarily committing
> any code to Pulp 3 (except perhaps just some small improvements/tweaks),
> then I’m not opposed to this. I am just doubtful about how productive it
> will be.
>
>
We'd like to get an honest answer to the question: Excuse me, but why not
Git?
We won't presume any conclusions. In case the experiment shows the git-like
approach doesn't add much value, we'll have a solid answer for the question
above.
In any case, examining the details of the current versioning implementation
has value of its own: just as Robin mentioned earlier, we could use the
analysis part of the paper to explain the current solution to ourselves,
plugin writers and the users.
Your concerns are valid though and we would like to include them in the
discussion too.

Thanks,
milan

> David
>
>
> On Fri, Sep 21, 2018 at 3:34 AM Milan Kovacik <mkovacik at redhat.com> wrote:
>
>>
>>
>> On Thu, Sep 20, 2018 at 11:14 PM David Davis <daviddavis at redhat.com>
>> wrote:
>>
>>> I’m interested to know more about this versioning/branching proposal and
>>> its goals. I assume this would be to explore changing Pulp 3 to use a
>>> non-linear branching model?
>>>
>>
>> The main goal of the thesis would be to compare Git and Pulp versioning,
>> identifying the advantages and disadvantages of both, when content storage
>> and management is concerned; ideally we'd learn what use cases we are
>> trading for what system traits.
>>
>>
>>> Would the goal be to contribute this model to Pulp 3 codebase in the
>>> near term (like before the MVP)? I guess my concern would be that I don’t
>>> think there is enough of a need nor the time to properly vet it before the
>>> MVP. I don’t think we can guarantee that a branching proposal would result
>>> in a change to Pulp 3 in the near term, and switching the data model
>>> post-MVP is potentially risky and likely backwards incompatible, so
>>> adoption later would be unlikely also.
>>>
>>
>> I think a valuable essay would base the conclusion on an experiment so
>> we'll try to implement (a subset of) the Git data model in Pulp core to get
>> some first-hand experimental data; we'd like to see the code, REST API and
>> performance impact on Pulp core and Pulp plugins.
>> I share your concerns though so we'd work on a fork to avoid pressure on
>> both the experiment and Pulp goals, keeping a focused scope and minimal
>> diff with the project origins; we'd ideally base the fork on an RC (MVP?)
>> Pulp core tag.
>> The decision to merge the fork to the origin projects would be based on
>> the experiment results so it would be out of scope of the thesis.
>>
>>> I also want to point out that we explored another versioning model before
>>> we went with our current design. We did some performance analysis and found
>>> that the model wasn’t optimized for the most common use cases in Pulp.
>>> Here’s the discussion in case you’re interested:
>>>
>>> https://www.redhat.com/archives/pulp-dev/2017-December/msg00075.html
>>>
>>
>> Great input! I briefly looked at the conversation and your experiment and
>> I believe we'll utilize this in the comparison indeed.
>> I'd like to ask though, based on the mailing list discussion, did the
>> branching experiment setup reflect the most frequent use case, i.e. a sparse
>> tree/DAG with a few (long) branches?
>> Did the experiment consider the Git data model and workflows?
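
For reference, the kind of sparse tree with a few long branches meant here can be sketched as follows. This is a simplified illustration, not Git's object format and not Pulp code: the names Version, commit, history and merge_base are hypothetical, each version has at most one parent (so no merges), and every version records its full content set plus a pointer to its parent, loosely in the spirit of a Git commit pointing at a snapshot and its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """A node in the version tree (hypothetical, single parent only)."""
    content: frozenset
    parent: Optional["Version"] = None


def commit(parent, added=(), removed=()):
    """Create a child version of `parent` (or a root when parent is None)."""
    base = parent.content if parent else frozenset()
    return Version((base | frozenset(added)) - frozenset(removed), parent)


def history(version):
    """Walk parent links from a version back to the root."""
    chain = []
    while version is not None:
        chain.append(version)
        version = version.parent
    return chain


def merge_base(a, b):
    """Return the nearest common ancestor of two versions, or None."""
    seen = {id(v) for v in history(a)}
    for v in history(b):
        if id(v) in seen:
            return v
    return None


# Two branches growing from one root.
root = commit(None, added={"a", "b"})
stable = commit(root, added={"c"})
dev = commit(root, removed={"a"})
dev2 = commit(dev, added={"d"})
```

A branch in this sketch is nothing more than a reference to a tip version such as stable or dev2; the linear history Pulp has today corresponds to the special case where every version has exactly one child.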
>>
>>
>>> David
>>>
>>>
>>> On Thu, Sep 20, 2018 at 10:01 AM Robin Chan <rchan at redhat.com> wrote:
>>>
>>>> Having only this exposure to the bachelor thesis, I've just been
>>>> reading for my own learning.
>>>>
>>>> I'll only share that I would invite Vlada to perhaps write a blog post
>>>> with some summary-type information targeting newcomers who already
>>>> understand git. Such a post might help them understand Pulp by letting
>>>> them relate their existing knowledge to Pulp a little quicker.
>>>>
>>>> On Thu, Sep 20, 2018 at 9:41 AM, Milan Kovacik <mkovacik at redhat.com>
>>>> wrote:
>>>>
>>>>> Folks,
>>>>>
>>>>> we've been discussing my proposal with Vlada recently and he's about
>>>>> to commit officially to it at the school.
>>>>> I'd like to poke for feedback esp. if there are concerns/doubts about
>>>>> the sanity of my proposal.
>>>>> Encouraging comments are of course welcome too ;)
>>>>>
>>>>> Cheers,
>>>>> milan
>>>>>
>>>>> On Mon, Sep 10, 2018 at 11:25 AM Milan Kovacik <mkovacik at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to propose a comparison of the Git versioning model to the Pulp3
>>>>>> versioning (and branching) model.
>>>>>> The textual part would briefly introduce both Git and Pulp, with a
>>>>>> focus on their respective use cases, trying to identify overlapping
>>>>>> scenarios.
>>>>>> Next it would examine in detail the versioning models of both Git and
>>>>>> Pulp3.
>>>>>> The main matter would compare an implementation of the Git versioning model
>>>>>> in Pulp3 (e.g. as a fork) to the current versioning model.
>>>>>> The textual part would conclude with the advantages and disadvantages of
>>>>>> both versioning models, revisiting the use cases and comparing the
>>>>>> performance (storage and time) of both implementations.
>>>>>>
>>>>>> I believe this would have a direct practical impact on Pulp3,
>>>>>> affecting feature planning and if feasible, providing an alternative
>>>>>> versioning implementation[1][2].
>>>>>>
>>>>>> --
>>>>>> milan
>>>>>>
>>>>>> PS: I'm volunteering to oversee the thesis on the project part
>>>>>>
>>>>>> [1] https://pulp.plan.io/issues/3360
>>>>>> [2] https://pulp.plan.io/issues/3842
>>>>>>
>>>>>> On Wed, Sep 5, 2018 at 9:16 PM, Austin Macdonald <austin at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> How about reproducibility? Bihan and I presented on this topic at
>>>>>>> scipy but there is a lot left to say/research.
>>>>>>> https://www.youtube.com/watch?v=5czGgUG0oXA
>>>>>>>
>>>>>>> In the scientific community, reproducibility is a hot topic (I bet
>>>>>>> grad schools would love this!), and the computational aspect of it has been
>>>>>>> under-emphasized in the existing literature. Our talk focused on the
>>>>>>> fundamental problems of environmental reproducibility and how to use Pulp
>>>>>>> to solve them. Especially since the scope was narrow, we barely scratched
>>>>>>> the surface. There are a ton of potential angles; some ideas are that you could
>>>>>>> catalog and compare technologies or discuss academic vs industry use cases
>>>>>>> and trends.
>>>>>>>
>>>>>>> On Wed, Sep 5, 2018 at 5:01 AM Vladimir Dusek <vdusek at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm not asking you for a specific assignment; we'll set that up with
>>>>>>>> Milan later. I only need some ideas for a topic.
>>>>>>>>
>>>>>>>> The scope of the work should be 40-50 standard pages. Additional
>>>>>>>> instructions can be found here -
>>>>>>>> http://www.fit.vutbr.cz/info/szz/pokyny_bp.php.en#pozadavky.
>>>>>>>> However, I don't think it's necessary for you to know them.
>>>>>>>>
>>>>>>>> All bachelor theses from our university from last year can be found
>>>>>>>> here - http://www.fit.vutbr.cz/study/DP/BP.php.en. I presume most
>>>>>>>> of them are written in Czech but there are abstracts in English which might
>>>>>>>> be useful for inspiration.
>>>>>>>>
>>>>>>>> Random abstracts:
>>>>>>>>
>>>>>>>> Alias Analysis in C Compiler
>>>>>>>> -------------------------------------
>>>>>>>> This thesis is dedicated to the problem of alias analysis and
>>>>>>>> possibilities of its improvement in the LLVM framework. The goal of this
>>>>>>>> thesis is to improve the accuracy, which was achieved by extending the
>>>>>>>> existing implementation of Andersen's algorithm to be field sensitive. The
>>>>>>>> terms related to alias analysis and algorithms of the alias analysis
>>>>>>>> available in LLVM are explained. These algorithms are compared according to
>>>>>>>> their base idea, features, and limitations. The implementation of the field
>>>>>>>> sensitivity has been tested using compiler test suites. Its impact on
>>>>>>>> program compilation speed and performance has been analyzed. The measured
>>>>>>>> results show an increase in the accuracy of alias analysis in the LLVM
>>>>>>>> framework.
>>>>>>>>
>>>>>>>> Reinforcement Learning for Starcraft Game Playing
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> This work focuses on methods of machine learning for playing
>>>>>>>> real-time strategy games. The thesis mainly applies methods of Q-learning
>>>>>>>> based on reinforcement learning. The practical part of this work is
>>>>>>>> implementing an agent for playing Starcraft II. My solution is based on 4
>>>>>>>> simple networks that collaborate with each other. Each network also
>>>>>>>> teaches itself how to process all given actions optimally. Analysis of the
>>>>>>>> system is based on experiments and statistics from played games.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2018 at 8:27 PM Robin Chan <rchan at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>> I'm not familiar with what types of topics are appropriate for
>>>>>>>>> this type of project - could you share some criteria and examples of what
>>>>>>>>> makes a good topic?
>>>>>>>>>
>>>>>>>>> Robin
>>>>>>>>>
>>>>>>>>> On Mon, Sep 3, 2018 at 8:50 AM, Vladimir Dusek <vdusek at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi team,
>>>>>>>>>>
>>>>>>>>>> as of September I'm in the third year of my bachelor studies, which
>>>>>>>>>> means that I'm going to write a bachelor thesis. I haven't selected a topic
>>>>>>>>>> yet and, if possible, I'd like to work on something that is relevant
>>>>>>>>>> to Pulp. Do you have any ideas?
>>>>>>>>>>
>>>>>>>>>> Thank you guys,
>>>>>>>>>> Vlada
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>
>>>>>>>>>>
>> Thanks a lot!
>> milan
>>
>