[Pulp-dev] Moving to Github Actions

Brian Bouterse bmbouter at redhat.com
Sat Feb 8 15:21:51 UTC 2020


Thanks for replying, @dalley and @daviddavis. Both of your replies make good
points that resonate with me. Rather than replying inline, I'll try to
bring back some of your points and comment on them.

@dalley, your articulation of how we would split up the CI to run each part
on only one CI platform sounds good to me. +1 to the SELinux and FIPS
testing running on Centos CI, and everything else running in another CI.
This addresses my concern that we were going to duplicate features from one
CI to another.

@daviddavis +1 to merging PRs to give us more Github Actions data on repos
that are not managed by the plugin_template. I'm concerned about merging
Github Action PRs against plugin_template managed repos. For example with
pulp_file, I work on that regularly and I'd like to continue using the
existing CI capabilities it has as-is until the new system is ready. Let me
know if you think we should do this aspect differently.

@daviddavis, your point that we must move to Github Actions and off of
Travis makes sense to me because Travis is a huge bottleneck and Github
Actions can run a lot more in parallel. If we're going to do that, though, I
think we need to see a plan for how and when Pulp would leave Travis for
Github Actions. I think such a plan would need to cover a few aspects:

* We need details on each piece of the Travis workflow, where it will be
ported to, and a rough estimate of how long each piece would take. I think
these things would make a great EPIC.
* Who will work on it? I think it needs 2 fully dedicated people who
already understand the Travis setup in detail. It's too hard for one
person and would take too long. Not being able to have these people
fully dedicated to this task would be a deal-breaker for me. This type of
activity needs no distractions.
* It's got to happen fully. If we're leaving Travis for Github Actions, we
have to fully leave.
* I think it would be good if, when a plugin switches, it switches
fully and at once from Travis to Github Actions. Otherwise, every few
days another plugin_template update will take away a Travis feature and
move it to Github Actions, which across the 10+ plugins and 10+ features
would be painful and very confusing.
* It needs to come with education somehow. Maybe a demo video, blog post
recap, and certainly great docs replacing the Travis ones we have now.
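
To make the first bullet concrete, here is a rough sketch of what such an
inventory could look like. The pieces and their targets are taken from
@dalley's list further down in this thread. The names WorkflowPiece,
target_ci and estimate_days are hypothetical, and the estimates are left
empty on purpose because nobody has scoped the work yet. This is only an
illustration, not an agreed plan.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowPiece:
    """One piece of the current Travis workflow (hypothetical structure)."""
    name: str
    target_ci: str
    estimate_days: Optional[float] = None  # unknown until someone scopes it


# Pieces and targets as discussed in the thread; nothing here is decided.
PIECES = [
    WorkflowPiece("code formatting lints", "github-actions"),
    WorkflowPiece("commit message checks", "github-actions"),
    WorkflowPiece("changelog checks", "github-actions"),
    WorkflowPiece("Python/PostgreSQL/Django test matrix", "github-actions"),
    WorkflowPiece("deploy to PyPI on tag", "github-actions"),
    WorkflowPiece("SELinux tests", "centos-ci"),
    WorkflowPiece("FIPS tests", "centos-ci"),
]


def group_by_target(pieces):
    """Return {target_ci: [piece names]} so each piece lands on one CI only."""
    grouped = defaultdict(list)
    for piece in pieces:
        grouped[piece.target_ci].append(piece.name)
    return dict(grouped)


def unestimated(pieces):
    """Return the names of pieces that still have no time estimate."""
    return [piece.name for piece in pieces if piece.estimate_days is None]


if __name__ == "__main__":
    for target, names in group_by_target(PIECES).items():
        print(f"{target}: {len(names)} piece(s)")
        for name in names:
            print(f"  - {name}")
    print(f"still unestimated: {len(unestimated(PIECES))}")
```

Filling in estimate_days for every entry is the part that would tell us
whether we can afford the work now.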

I'm suggesting a plan instead of a decision because without a plan, I don't
know how long the work will take, and thus I can't know if we can afford it
in terms of development capacity now. Given the whole convo, I'm wondering
more about whether "now is the right time" and less about whether "this is
the right long-term idea". I think the best long-term situation for the Pulp
development community is likely not with Travis. Now could be the right
time, if we look at the development team and determine that we can meet all
of our goals while fully dedicating 1-3 people to this other effort.

Let me know how I can help. Thank you both and Fabricio for continuing to
drive this improvement for the community.

-Brian

On Thu, Feb 6, 2020 at 12:29 PM Daniel Alley <dalley at redhat.com> wrote:

> I agree that Centos CI should be a high priority, however I think it is
> still important to discuss what we want our end-state to look like, because
> that will strongly influence our approach going forwards.  And FWIW, I
> don't think Fabricio's work will do any harm in this respect, especially
> given that the main focus has been on repos that don't use the template
> (pulp-rpm-prerequisites, ansible-pulp), and are putting enough load on
> Travis to cause us tangible problems (ansible-pulp, pulp_file performance
> tests).
>
> I don't believe Fabricio was suggesting that some plugins would use Travis
> and other plugins would use Github Actions.  It was an idea thrown around
> that maybe we would want to support a choice of CI for potential plugin
> writers, but personally I think we should just ditch Travis entirely.  The
> outages (such as the one on Monday) and resource restrictions are hindering
> development, and I don't expect it to get better considering how many
> senior engineers they laid off after being sold to a private equity firm
> with a poor reputation. <https://news.ycombinator.com/item?id=19218036>
>
> But I also don't think we should try to use Centos CI to replace all the
> things Travis is currently doing.  I would rather use Github Actions for
> everything except for the very few workflows that require Centos CI,
> namely, running tests on a FIPS platform and with SELinux configured.  I
> think that this proposal would be both the optimal outcome, and also the
> easiest thing to do, and here is why.
>
> Centos CI would not be involved with any of the following:
>
> * Code formatting lints
> * Commit message checks
> * Changelog checks
> * Everything involving a matrix of different combinations of Python /
> PostgreSQL / Django versions
> * Deploy to PyPI upon pushing a new tag
> * Testing things against a specific PR or PRs (probably, if we were to run
> the jobs nightly instead of on every PR, which doesn't strike me as
> necessary)
>
> The majority of CI complexity is due to these auxiliary features and I
> don't see any reason to try to port this to Jenkins/Centos CI, much less
> try to maintain it across both CI systems.  Here we agree: that would be a
> nightmare.  Almost all of the CI-service-specific code deals with these
> auxiliary checks.  But Fabricio has already proven that these things are
> relatively easy to port to Github Actions, which, while different from
> Travis, is much more similar to Travis than Jenkins is.  And this work is
> already done, and will be really easy to port back into the plugin template
> to use everywhere.
>
> Of our various CI scripts, the only ones which would remain in common
> between GHA and CentosCI are install.sh and before_script.sh, which perform
> the core setup tasks for our containers.  Every other script in our
> .travis/ directory does something which can be the sole concern of Github
> Actions.  So the maintenance burden of maintaining that small amount of
> common code would not be very high, and certainly not double.
>
>
>
>
> On Thu, Feb 6, 2020 at 10:03 AM David Davis <daviddavis at redhat.com> wrote:
>
>> I think there is an immediate need to move to Github Actions. Yesterday,
>> for example, I spent a good deal of time on failing pulp_file jobs, which
>> are exceeding Travis' 50 minute threshold[0] (Github Actions has a 6 hour
>> limit). We've also been working for weeks on alleviating the bottlenecks
>> that we've been experiencing due to Travis' limit of 3 concurrent jobs.
>> Paying the Travis tax is detracting from our stakeholder work.
>>
>> Regarding supporting two CIs, won't we have to support multiple CIs to
>> run against selinux and FIPS? The only alternative would be to move
>> everything to CentOS CI. Fabricio's pulp_file PR demonstrates though that
>> our CI scripts can be made to run in multiple CIs. These scripts are the
>> majority of our CI/CD code; the Travis/Github Actions configs are only a
>> couple hundred lines. So most of our code will be shared across CIs, which
>> should alleviate most of the burden of supporting more than one CI.
>>
>> I would suggest as a next step we merge the ansible-pulp PR[1] as it
>> should provide some real world data about running on Github Actions which
>> we can consider. Moreover, its CI is independent from the plugin_template
>> and it should help to alleviate most of our bottlenecks in Travis. We can
>> postpone the decision around plugins until we have more data and consensus.
>>
>> [0] https://pulp.plan.io/issues/6104
>> [1] https://github.com/pulp/ansible-pulp/pull/217
>>
>> David
>>
>>
>> On Thu, Feb 6, 2020 at 5:51 AM Brian Bouterse <bmbouter at redhat.com>
>> wrote:
>>
>>> Inline replies to three convos would be too confusing, so I'm going to
>>> try to bring it back to a single thread.
>>>
>>> The Pulp team can't afford to do two CI's. I estimate it's taken many
>>> hundreds of hours cumulatively and probably >10 hours a week at least
>>> maintaining the CI for Travis in the plugin template. The current
>>> commitments and size of the pulp dev team can't sustain doubling that
>>> additional level of investment. Think about allllllll the changes that we
>>> make weekly. Are we prepared to "port" those continuously? I'm not. I think
>>> it's categorically a non-starter from a resource perspective.
>>>
>>> I don't think it's a good thing to split the plugins to use various
>>> CI's. Today if something doesn't work, it doesn't work in all plugins CI,
>>> and if someone fixes it, all plugins get fixed (for the most part).
>>> Splitting plugins across different CI's with incompatible features and no
>>> parity between them will put us in a situation where we lose the benefits
>>> of every improvement improving everyone.
>>>
>>> Is this work being done to serve a stakeholder asking for it? I ask
>>> because if it isn't, it's taking the place of work stakeholders are asking
>>> for to be delivered in Feb and March. Those timelines are so close, I'm
>>> surprised others perceive that now is the right time to take on a goal like
>>> this.
>>>
>>> I'm on PTO until the 17th so I will only be able to provide input on this
>>> decision sparsely until then.
>>>
>>> I'm perceiving that people don't want to continue on Travis and this is
>>> the way for some plugin writers to leave Travis. The problem is that
>>>
>>> On Wed, Feb 5, 2020 at 12:44 PM Fabricio Aguiar <
>>> fabricio.aguiar at redhat.com> wrote:
>>>
>>>> I believe we can add GH actions on plugin_template,  then we would have:
>>>> $ ./plugin-template --travis PLUGIN_NAME
>>>> or
>>>> $ ./plugin-template --ghactions PLUGIN_NAME
>>>> it is not implemented yet on plugin_template,
>>>> but my experience with pulp_file (
>>>> https://github.com/pulp/pulp_file/pull/353)  makes me think it will be
>>>> easy to create a template for it since I didn't change many files,
>>>> and I have not removed travis.
>>>> This way, we can make plugin_template run both, travis and GH actions.
>>>> Working with GH actions was a good exercise, I struggled to find a
>>>> replacement for TRAVIS_COMMIT_RANGE, and got some config issues with
>>>> kubectl and httpie.
>>>> I personally think changing to GH is totally optional for plugins, but
>>>> I believe ansible-pulp and pulp_rpm_prerequisites should move to GH
>>>> actions, as both do not use plugin_template and consume a lot of time.
>>>> And make plugin_template run in both travis and GH actions, for pushing
>>>> us to be more agnostic.
>>>>
>>>> Best regards,
>>>> Fabricio Aguiar
>>>> Software Engineer, Pulp Project
>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>> +55 11 999652368
>>>>
>>>>
>>>> On Wed, Feb 5, 2020 at 2:16 PM David Davis <daviddavis at redhat.com>
>>>> wrote:
>>>>
>>>>> Brian,
>>>>>
>>>>> Thanks for the feedback. Responses inline below.
>>>>>
>>>>> On Wed, Feb 5, 2020 at 10:31 AM Brian Bouterse <bmbouter at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> I'm concerned about the move to GH actions and also the timing. The
>>>>>> benefits of lowering the CI runtime are really great, but I'm worried it
>>>>>> isn't helping us towards our goals and even takes us further from them.
>>>>>>
>>>>>> I'm worried about double the outage risk. There are outages, and
>>>>>> structurally repo CI pipelines that require more services are at more risk
>>>>>> for total outage. This raises the risk of "total CI pipelines halting" in a
>>>>>> concerning way for me. Trading runtime for risk I don't think is an overall
>>>>>> win; I'd like to find a way to lower the runtime and keep risk the same or
>>>>>> lower.
>>>>>>
>>>>>
>>>>> We've been plagued by Travis outages and bottlenecks over the past
>>>>> year. Our plugin_template is currently tied to Travis so one option would
>>>>> be to allow plugin writers to choose which CI to use and divorce Pulp from
>>>>> being tied to a single one. This ought to reduce risk and the impact of
>>>>> outages.
>>>>>
>>>>>
>>>>>>
>>>>>> Whatever we do I want to make sure we're doing it fully through the
>>>>>> plugin template. Is this through the plugin template? If it isn't, or it
>>>>>> requires additional steps to configure it than they had before, then I'm
>>>>>> concerned about it taking us further from our goals of having the plugin
>>>>>> template take as much burden from the plugin writer as possible. I use this
>>>>>> thinking to answer the question posed from daviddavis. My take is that the
>>>>>> plugin template's goal is to make writing a plugin with great CI as easy as
>>>>>> possible. It's designed to be a quality improver and a time saver.
>>>>>>
>>>>>
>>>>> Agreed, the goal is to update the plugin_template. The plan is to
>>>>> start by moving ansible-pulp to Github Actions first and test out Github
>>>>> Actions as a viable replacement for Travis. Then move pulpcore and plugins
>>>>> (via the plugin_template). The ansible-pulp repo doesn't use
>>>>> plugin_template for its CI configuration so we don't have to change the
>>>>> plugin_template in testing out Github Actions for ansible-pulp and also
>>>>> ansible-pulp is the main hog of our Travis resources consuming job runners
>>>>> for 1+ hours.
>>>>>
>>>>> To your point about the plugin_template, supporting Github Actions
>>>>> shouldn't add additional burden to the plugin writer. The two options are
>>>>> to either move to Github Actions wholesale or let plugin writers choose
>>>>> which CI to use (which we could default). Either option would require zero
>>>>> extra steps for plugin writers. And the latter would give more flexibility
>>>>> to plugin writers if they want to use a different CI.
>>>>>
>>>>>
>>>>>>
>>>>>> Having the lower runtime is nice, but if we're going to put effort in
>>>>>> the CI, I'd like to bring up prioritizing getting the plugin_template
>>>>>> integrated with https://ci.centos.org/ as a high-value goal. I'm
>>>>>> concerned that we're about to ship the SELinux policy and we have no way to
>>>>>> test it. Similar concerns with certguard's dependency and its dependencies
>>>>>> not being packaged on Ubuntu (so it's hard to run on Travis). Also, I'm
>>>>>> concerned we don't have an environment to evaluate FIPS compatibility with.
>>>>>> Relatively speaking if we can only do one of these two initiatives at this
>>>>>> time, I believe we should do the CentOS CI.
>>>>>>
>>>>>
>>>>> I don't see moving to CentOS CI and Github Actions as mutually
>>>>> exclusive. In fact, I think moving to Github Actions could make it easier
>>>>> to use CentOS CI by making our CI/CD code more CI agnostic. Moreover,
>>>>> much of the hard work to move to Github Actions was already completed by
>>>>> Fabricio last week.
>>>>>
>>>>>
>>>>>> Lowering the runtime I'm really in favor of, so I hope these concerns
>>>>>> prompt discussion more than stop the initiative. What do you all think?
>>>>>>
>>>>>> On Wed, Feb 5, 2020 at 9:05 AM David Davis <daviddavis at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Great question. IMO the main benefit in continuing to support Travis
>>>>>>> is that we could better separate our test/deployment code from the CI
>>>>>>> specific bits so that most of the plugin_template code could be CI
>>>>>>> agnostic. That said, this would be more work. I think it comes down to
>>>>>>> whether we want our plugin_template to be more opinionated or more
>>>>>>> configurable.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 5, 2020 at 8:18 AM Dana Walker <dawalker at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 to moving to Github Actions.
>>>>>>>>
>>>>>>>> Can anyone think of reasons a plugin would want to stay with Travis
>>>>>>>> specifically?  As fao89 pointed out on the issue, at least each plugin that
>>>>>>>> does choose to move takes some of the workload with them to free up job
>>>>>>>> runners for plugins that choose to remain.
>>>>>>>>
>>>>>>>> Dana Walker
>>>>>>>>
>>>>>>>> She / Her / Hers
>>>>>>>>
>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>
>>>>>>>> Red Hat <https://www.redhat.com>
>>>>>>>>
>>>>>>>> dawalker at redhat.com
>>>>>>>> <https://www.redhat.com>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 4, 2020 at 10:26 AM David Davis <daviddavis at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Over the past year, we've experienced several growing pains with
>>>>>>>>> using Travis as our CI/CD environment. Perhaps the biggest has been the
>>>>>>>>> limitation of having only 3 concurrent job runners[0] across our entire
>>>>>>>>> Pulp organization. At times, it has slowed development by bottlenecking the
>>>>>>>>> merging of PRs and delayed numerous releases of Pulp.
>>>>>>>>>
>>>>>>>>> Last year, Github introduced Github Actions which offers open
>>>>>>>>> source projects 20 concurrent jobs[1]. I've filed an issue here to get
>>>>>>>>> feedback on moving our repos and plugins to Github Actions:
>>>>>>>>>
>>>>>>>>> https://pulp.plan.io/issues/6065
>>>>>>>>>
>>>>>>>>> Also, @fao89 has opened a couple PoC PRs to demonstrate using
>>>>>>>>> Github Actions:
>>>>>>>>>
>>>>>>>>> https://github.com/pulp/pulp_file/pull/353
>>>>>>>>> https://github.com/pulp/ansible-pulp/pull/217
>>>>>>>>>
>>>>>>>>> You'll notice for example that the ansible-pulp build time went
>>>>>>>>> from more than 1 hour[2] to 27 minutes[3] as all the jobs ran in parallel
>>>>>>>>> on Github Actions.
>>>>>>>>>
>>>>>>>>> Unless there are objections, we plan to merge the ansible-pulp PR
>>>>>>>>> this week since its CI configuration is independent from other pulp and
>>>>>>>>> plugin repos (ie it doesn't use the plugin_template's Travis files).
>>>>>>>>>
>>>>>>>>> We're hoping though to get feedback on whether we should move
>>>>>>>>> pulpcore and plugin repos to Github Actions. If so, should we provide
>>>>>>>>> plugins with the option to continue using Travis if they want?
>>>>>>>>>
>>>>>>>>> If there are no objections by February 11, 2020, we'll proceed with
>>>>>>>>> moving pulp_file to Github Actions and look at updating plugin_template.
>>>>>>>>>
>>>>>>>>> [0] https://travis-ci.com/plans
>>>>>>>>> [1]
>>>>>>>>> https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits
>>>>>>>>> [2] https://travis-ci.org/pulp/ansible-pulp/builds/645651353
>>>>>>>>> [3]
>>>>>>>>> https://github.com/fabricio-aguiar/ansible-pulp/actions/runs/33601847
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> _______________________________________________
>>>>>>>>> Pulp-dev mailing list
>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>