[Pulp-dev] Moving to Github Actions

Daniel Alley dalley at redhat.com
Thu Feb 6 17:28:57 UTC 2020


I agree that Centos CI should be a high priority, however I think it is
still important to discuss what we want our end-state to look like, because
that will strongly influence our approach going forwards.  And FWIW, I
don't think Fabricio's work will do any harm in this respect, especially
given that the main focus has been on repos that don't use the template
(pulp-rpm-prerequisites, ansible-pulp), and are putting enough load on
Travis to cause us tangible problems (ansible-pulp, pulp_file performance
tests).

I don't believe Fabricio was suggesting that some plugins would use Travis
and other plugins would use Github Actions.  It was an idea thrown around
that maybe we would want to support a choice of CI for potential plugin
writers, but personally I think we should just ditch Travis entirely.  The
outages (such as the one on Monday) and resource restrictions are hindering
development, and I don't expect it to get better considering how many
senior engineers they laid off after being sold to a private equity firm
with a poor reputation. <https://news.ycombinator.com/item?id=19218036>

But I also don't think we should try to use Centos CI to replace all the
things Travis is currently doing.  I would rather use Github Actions for
everything except for the very few workflows that require Centos CI,
namely, running tests on a FIPS platform and with SELinux configured.  I
think that this proposal would be both the optimal outcome, and also the
easiest thing to do, and here is why.

Centos CI would not be involved with any of the following:

* Code formatting lints
* Commit message checks
* Changelog checks
* Everything involving a matrix of different combinations of Python /
PostgreSQL / Django versions
* Deploy to PyPI upon pushing a new tag
* Testing things against a specific PR or PRs (probably, if we were to run
the jobs nightly instead of on every PR, which doesn't strike me as
necessary)
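
To put rough numbers on the matrix item above, here is a minimal sketch of
how a version matrix expands into jobs and how many sequential batches it
needs under a concurrency cap.  The version lists are hypothetical
placeholders, not our real matrix; the caps of 3 and 20 concurrent jobs are
the Travis and Github Actions figures quoted further down the thread.

```python
import itertools
import math

# Hypothetical version lists, for illustration only (not the real Pulp matrix).
PYTHON = ["3.6", "3.7", "3.8"]
POSTGRES = ["9.6", "11"]
DJANGO = ["2.2"]


def expand_matrix(*axes):
    """Return every combination of the given version axes as tuples."""
    return list(itertools.product(*axes))


def waves(n_jobs, concurrency):
    """Number of sequential batches needed to run n_jobs under a cap."""
    return math.ceil(n_jobs / concurrency)


jobs = expand_matrix(PYTHON, POSTGRES, DJANGO)
print(len(jobs), "jobs")
print("batches with 3 concurrent jobs:", waves(len(jobs), 3))
print("batches with 20 concurrent jobs:", waves(len(jobs), 20))
```

This only counts batches; it says nothing about queueing behind jobs from
other repos in the organization, which is where the cap hurts most.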

The majority of CI complexity is due to these auxiliary features and I
don't see any reason to try to port this to Jenkins/Centos CI, much less
try to maintain it across both CI systems.  Here we agree: that would be a
nightmare.  Almost all of the CI-service-specific code deals with these
auxiliary checks.  But Fabricio has already proven that these things are
relatively easy to port to Github Actions, which, while different from
Travis, is much more similar to Travis than Jenkins is.  And this work is
already done, and will be really easy to port back into the plugin template
to use everywhere.
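
One porting detail that comes up further down the thread is the lack of a
direct TRAVIS_COMMIT_RANGE equivalent.  A minimal sketch of one way to
rebuild such a range from the Github Actions event payload follows.  It
assumes only "pull_request" and "push" events matter, the function names
are made up for illustration, and it is not necessarily what the pulp_file
PR does.

```python
import json
import os


def commit_range(event_name, event):
    """Build a 'base...head' range from a Github Actions event payload."""
    if event_name == "pull_request":
        pr = event["pull_request"]
        return "{}...{}".format(pr["base"]["sha"], pr["head"]["sha"])
    if event_name == "push":
        return "{}...{}".format(event["before"], event["after"])
    raise ValueError("unsupported event: {}".format(event_name))


def commit_range_from_env(env=os.environ):
    """Read the payload file that Github Actions exposes to each job."""
    with open(env["GITHUB_EVENT_PATH"]) as f:
        event = json.load(f)
    return commit_range(env["GITHUB_EVENT_NAME"], event)
```

Inside a workflow step, commit_range_from_env() could feed the commit
message checks the same way TRAVIS_COMMIT_RANGE does today.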

Of our various CI scripts, the only ones which would remain in common
between GHA and CentosCI are install.sh and before_script.sh, which perform
the core setup tasks for our containers.  Every other script in our
.travis/ directory does something which can be the sole concern of Github
Actions.  So the burden of maintaining that small amount of common code
would not be very high, and certainly not double.
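
If the shared setup scripts ever do need to branch on the CI service, the
check can stay small.  This is a rough sketch, assuming the usual
environment variables each service sets (GITHUB_ACTIONS, TRAVIS, and
JENKINS_URL for a Jenkins-based setup such as Centos CI); the function name
and return values are hypothetical.

```python
import os


def detect_ci(env=None):
    """Guess which CI service is running from its environment variables."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        return "github-actions"
    if env.get("TRAVIS") == "true":
        return "travis"
    if "JENKINS_URL" in env:
        # Assumption: Centos CI jobs run under Jenkins and inherit this variable.
        return "jenkins"
    return "unknown"
```

Keeping this in one helper would leave the rest of the shared code free of
service-specific conditionals.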




On Thu, Feb 6, 2020 at 10:03 AM David Davis <daviddavis at redhat.com> wrote:

> I think there is an immediate need to move to Github Actions. Yesterday,
> for example, I spent a good deal of time on failing pulp_file jobs, which
> are exceeding Travis' 50 minute threshold[0] (Github Actions has a 6 hour
> limit). We've also been working for weeks on alleviating the bottlenecks
> that we've been experiencing due to Travis' limit of 3 concurrent jobs.
> Paying the Travis tax is detracting from our stakeholder work.
>
> Regarding supporting two CIs, won't we have to support multiple CIs to run
> against selinux and FIPS? The only alternative would be to move everything
> to CentOS CI. Fabricio's pulp_file PR demonstrates though that our CI
> scripts can be made to run in multiple CIs. These scripts are the majority
> of our CI/CD code; the Travis/Github Actions configs are only a couple
> hundred lines. So most of our code will be shared across CIs, which should
> alleviate most of the burden of supporting more than one CI.
>
> I would suggest as a next step we merge the ansible-pulp PR[1] as it
> should provide some real world data about running on Github Actions which
> we can consider. Moreover, its CI is independent from the plugin_template
> and it should help to alleviate most of our bottlenecks in Travis. We can
> postpone the decision around plugins until we have more data and consensus.
>
> [0] https://pulp.plan.io/issues/6104
> [1] https://github.com/pulp/ansible-pulp/pull/217
>
> David
>
>
> On Thu, Feb 6, 2020 at 5:51 AM Brian Bouterse <bmbouter at redhat.com> wrote:
>
>> Inline replies to three convos would be too confusing, so I'm going to
>> try to bring it back to a single thread.
>>
>> The Pulp team can't afford to do two CI's. I estimate it's taken many
>> hundreds of hours cumulatively and probably >10 hours a week at least
>> maintaining the CI for Travis in the plugin template. The current
>> commitments and size of the pulp dev team can't sustain doubling that
>> additional level of investment. Think about allllllll the changes that we
>> make weekly. Are we prepared to "port" those continuously? I'm not. I think
>> it's categorically a non-starter from a resource perspective.
>>
>> I don't think it's a good thing to split the plugins to use various CI's.
>> Today if something doesn't work, it doesn't work in all plugins CI, and if
>> someone fixes it, all plugins get fixed (for the most part). Splitting
>> plugins across different CI's with incompatible features and no parity
>> between them will put us in a situation where we lose the benefits of every
>> improvement improving everyone.
>>
>> Is this work being done to serve a stakeholder asking for it? I ask
>> because if it isn't, it's taking the place of work stakeholders are asking
>> for to be delivered in Feb and March. Those timelines are so close, I'm
>> surprised others perceive that now is the right time to take on a goal like
>> this.
>>
>> I'm on PTO until the 17th so I will only be able to provide input on this
>> decision sparsely until then.
>>
>> I'm perceiving that people don't want to continue on Travis and this is
>> the way for some plugin writers to leave Travis. The problem is that
>>
>> On Wed, Feb 5, 2020 at 12:44 PM Fabricio Aguiar <
>> fabricio.aguiar at redhat.com> wrote:
>>
>>> I believe we can add GH actions on plugin_template,  then we would have:
>>> $ ./plugin-template --travis PLUGIN_NAME
>>> or
>>> $ ./plugin-template --ghactions PLUGIN_NAME
>>> it is not implemented yet on plugin_template,
>>> but my experience with pulp_file (
>>> https://github.com/pulp/pulp_file/pull/353)  makes me think it will be
>>> easy to create a template for it since I didn't change many files,
>>> and I have not removed travis.
>>> This way, we can make plugin_template run both, travis and GH actions.
>>> Working with GH actions was a good exercise, I struggled to find a
>>> replacement for TRAVIS_COMMIT_RANGE, and got some config issues with
>>> kubectl and httpie.
>>> I personally think changing to GH is totally optional for plugins, but I
>>> believe ansible-pulp and pulp_rpm_prerequisites should move to GH actions,
>>> as both do not use plugin_template and consume a lot of time.
>>> And make plugin_template run in both travis and GH actions, for pushing
>>> us to be more agnostic.
>>>
>>> Best regards,
>>> Fabricio Aguiar
>>> Software Engineer, Pulp Project
>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>> +55 11 999652368
>>>
>>>
>>> On Wed, Feb 5, 2020 at 2:16 PM David Davis <daviddavis at redhat.com>
>>> wrote:
>>>
>>>> Brian,
>>>>
>>>> Thanks for the feedback. Responses inline below.
>>>>
>>>> On Wed, Feb 5, 2020 at 10:31 AM Brian Bouterse <bmbouter at redhat.com>
>>>> wrote:
>>>>
>>>>> I'm concerned about the move to GH actions and also the timing. The
>>>>> benefits of lowering the CI runtime are really great, but I'm worried it
>>>>> isn't helping us towards our goals and even takes us further from them.
>>>>>
>>>>> I'm worried about double the outage risk. There are outages, and
>>>>> structurally repo CI pipelines that require more services are at more risk
>>>>> for total outage. This raises the risk of "total CI pipelines halting" in a
>>>>> concerning way for me. Trading runtime for risk I don't think is an overall
>>>>> win; I'd like to find a way to lower the runtime and keep risk the same or
>>>>> lower.
>>>>>
>>>>
>>>> We've been plagued by Travis outages and bottlenecks over the past
>>>> year. Our plugin_template is currently tied to Travis so one option would
>>>> be to allow plugin writers to choose which CI to use and divorce Pulp from
>>>> being tied to a single one. This ought to reduce risk and the impact of
>>>> outages.
>>>>
>>>>
>>>>>
>>>>> Whatever we do I want to make sure we're doing it fully through the
>>>>> plugin template. Is this through the plugin template? If it isn't, or it
>>>>> requires more steps to configure than they had before, then I'm
>>>>> concerned about it taking us further from our goal of having the plugin
>>>>> template take as much burden from the plugin writer as possible. I use this
>>>>> thinking to answer the question posed from daviddavis. My take is that the
>>>>> plugin template's goal is to make writing a plugin with great CI as easy as
>>>>> possible. It's designed to be a quality improver and a time saver.
>>>>>
>>>>
>>>> Agreed, the goal is to update the plugin_template. The plan is to start
>>>> by moving ansible-pulp to Github Actions first and test out Github Actions
>>>> as a viable replacement for Travis. Then move pulpcore and plugins (via the
>>>> plugin_template). The ansible-pulp repo doesn't use plugin_template for its
>>>> CI configuration so we don't have to change the plugin_template in testing
>>>> out Github Actions for ansible-pulp and also ansible-pulp is the main hog
>>>> of our Travis resources consuming job runners for 1+ hours.
>>>>
>>>> To your point about the plugin_template, supporting Github Actions
>>>> shouldn't add additional burden to the plugin writer. The two options are
>>>> to either move to Github Actions wholesale or let plugin writers choose
>>>> which CI to use (which we could default). Either option would require zero
>>>> extra steps for plugin writers. And the latter would give more flexibility
>>>> to plugin writers if they want to use a different CI.
>>>>
>>>>
>>>>>
>>>>> Having the lower runtime is nice, but if we're going to put effort in
>>>>> the CI, I'd like to bring up prioritizing getting the plugin_template
>>>>> integrated with https://ci.centos.org/ as a high-value goal. I'm
>>>>> concerned that we're about to ship the SELinux policy and we have no way to
>>>>> test it. Similar concerns with certguard's dependency and its dependencies
>>>>> not being packaged on Ubuntu (so it's hard to run on Travis). Also, I'm
>>>>> concerned we don't have an environment to evaluate FIPS compatibility with.
>>>>> Relatively speaking if we can only do one of these two initiatives at this
>>>>> time, I believe we should do the CentOS CI.
>>>>>
>>>>
>>>> I don't see moving to CentOS CI and Github Actions as mutually
>>>> exclusive. In fact, I think moving to Github Actions could make it easier
>>>> to use CentOS CI by making our CI/CD code more CI agnostic. Moreover,
>>>> much of the hard work to move to Github Actions was already completed by
>>>> Fabricio last week.
>>>>
>>>>
>>>>> Lowering the runtime I'm really in favor of, so I hope these concerns
>>>>> prompt discussion more than stop the initiative. What do you all think?
>>>>>
>>>>> On Wed, Feb 5, 2020 at 9:05 AM David Davis <daviddavis at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Great question. IMO the main benefit in continuing to support Travis
>>>>>> is that we could better separate our test/deployment code from the CI
>>>>>> specific bits so that most of the plugin_template code could be CI
>>>>>> agnostic. That said, this would be more work. I think it comes down to
>>>>>> whether we want our plugin_template to be more opinionated or more
>>>>>> configurable.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 5, 2020 at 8:18 AM Dana Walker <dawalker at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 to moving to Github Actions.
>>>>>>>
>>>>>>> Can anyone think of reasons a plugin would want to stay with Travis
>>>>>>> specifically?  As fao89 pointed out on the issue, at least each plugin that
>>>>>>> does choose to move takes some of the workload with them to free up job
>>>>>>> runners for plugins that choose to remain.
>>>>>>>
>>>>>>> Dana Walker
>>>>>>>
>>>>>>> She / Her / Hers
>>>>>>>
>>>>>>> Software Engineer, Pulp Project
>>>>>>>
>>>>>>> Red Hat <https://www.redhat.com>
>>>>>>>
>>>>>>> dawalker at redhat.com
>>>>>>> <https://www.redhat.com>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 4, 2020 at 10:26 AM David Davis <daviddavis at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Over the past year, we've experienced several growing pains with
>>>>>>>> using Travis as our CI/CD environment. Perhaps the biggest has been the
>>>>>>>> limitation of having only 3 concurrent job runners[0] across our entire
>>>>>>>> Pulp organization. At times, it has slowed development by bottlenecking the
>>>>>>>> merging of PRs and delayed numerous releases of Pulp.
>>>>>>>>
>>>>>>>> Last year, Github introduced Github Actions which offers open
>>>>>>>> source projects 20 concurrent jobs[1]. I've filed an issue here to get
>>>>>>>> feedback on moving our repos and plugins to Github Actions:
>>>>>>>>
>>>>>>>> https://pulp.plan.io/issues/6065
>>>>>>>>
>>>>>>>> Also, @fao89 has opened a couple PoC PRs to demonstrate using
>>>>>>>> Github Actions:
>>>>>>>>
>>>>>>>> https://github.com/pulp/pulp_file/pull/353
>>>>>>>> https://github.com/pulp/ansible-pulp/pull/217
>>>>>>>>
>>>>>>>> You'll notice for example that the ansible-pulp build time went
>>>>>>>> from more than 1 hour[2] to 27 minutes[3] as all the jobs ran in parallel
>>>>>>>> on Github Actions.
>>>>>>>>
>>>>>>>> Unless there are objections, we plan to merge the ansible-pulp PR
>>>>>>>> this week since its CI configuration is independent from other pulp and
>>>>>>>> plugin repos (ie it doesn't use the plugin_template's Travis files).
>>>>>>>>
>>>>>>>> We're hoping though to get feedback on whether we should move
>>>>>>>> pulpcore and plugin repos to Github Actions. If so, should we provide
>>>>>>>> plugins with the option to continue using Travis if they want?
>>>>>>>>
>>>>>>>> If there are no objections by February 11, 2020, we'll proceed with
>>>>>>>> moving pulp_file to Github Actions and look at updating plugin_template.
>>>>>>>>
>>>>>>>> [0] https://travis-ci.com/plans
>>>>>>>> [1]
>>>>>>>> https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits
>>>>>>>> [2] https://travis-ci.org/pulp/ansible-pulp/builds/645651353
>>>>>>>> [3]
>>>>>>>> https://github.com/fabricio-aguiar/ansible-pulp/actions/runs/33601847
>>>>>>>>
>>>>>>>> David
>>>>>>>> _______________________________________________
>>>>>>>> Pulp-dev mailing list
>>>>>>>> Pulp-dev at redhat.com
>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev