[Pulp-dev] Moving to Github Actions

David Davis daviddavis at redhat.com
Sun Feb 9 14:45:54 UTC 2020


Thanks Brian and Daniel. I agree on the points you both raised.

Brian, to your specific questions/points:

## We need details on each piece of the Travis workflow, where it will be
ported to, and a rough estimate of how long each piece would take. I think
these things would make a great EPIC.

I have a Github Actions epic. I plan to update it this week based on our
conversation and will add more specific details, estimates, etc. I'll
respond when it's ready for review.

https://pulp.plan.io/issues/6065


## Who will work on it? It needs I think 2 fully dedicated people who
already completely understand the Travis stuff in detail. It's too hard for
one person and would take too long...

I definitely agree we need at least 2 people to work on this. We need as
many people as possible to understand Github Actions.

I don't know who has time for this right now. I imagine it'll probably have
to wait until next sprint (Sprint 67). Or at least I personally won't have
time until next week at the earliest. That'll give us time to plan though.

In the meantime, I'd consider letting the installer team merge Fabricio's
ansible-pulp PR[0]. This will also alleviate much of the immediate need and
let us begin collecting real world data/experience as well.


## It's got to happen fully - If we're leaving Travis for Github Actions,
we have to fully leave
## I think it would be good if when a plugin switches, they switch
fully-and-at-once from Travis to Github Actions...

Makes sense to me.


## It needs to come with education somehow. Maybe a demo video, blog post
recap, and certainly great docs replacing the Travis ones we have now.

100% agreed. I added these items to the epic.


[0] https://github.com/pulp/ansible-pulp/pull/217

David


On Sat, Feb 8, 2020 at 12:28 PM Daniel Alley <dalley at redhat.com> wrote:

> Thanks for your response Brian, I think all of those concerns are
> reasonable!  I'll try to add to/help with some of them.
>
> The approach Fabricio took with his PR to pulp_file is incredibly smart, I
> think.  In his PR to pulp_file, all of the CI scripts remain unchanged.  He
> just fakes being in a Travis environment by using the information that GHA
> provides to set all of the $TRAVIS_* environment variables [0] that those
> scripts use.  Not only was this much faster to do than doing a wholesale
> conversion to idiomatic GHA (Fabricio got everything working with only
> about 2 days of work!), it means that Travis can continue running for as
> long or as short as we want it to, and once we do switch over the process
> of converting the CI scripts to be more idiomatic with GHA can be done at
> our leisure rather than frontloading a bunch of work.
>
> Adding GHA support to the template and the other plugins should be as
> simple as taking the GHA configs (analogous to .travis.yml) that are
> already written for pulp_file, generalizing them for the template, and then
> re-applying the template to all the other plugins.  I don't expect that it
> would take longer than one engineer-day to complete the whole process!
>
> [0]
> https://github.com/pulp/pulp_file/pull/353/files#diff-d45cbc8d15de0f15cdce609ec195cf2eR34-R47
>
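For anyone who hasn't opened the PR, the trick Daniel describes can be
sketched roughly as below. This is a minimal Python illustration, not the
code from Fabricio's PR: the helper name travis_env_from_github is made up,
and which variables our scripts actually need is an assumption. The
GITHUB_* and TRAVIS_* names are the default variables each service
documents.

```python
import re


def travis_env_from_github(env):
    """Build TRAVIS_* variables from GITHUB_* ones (hypothetical helper)."""
    ref = env.get("GITHUB_REF", "")
    event = env.get("GITHUB_EVENT_NAME", "")
    travis = {
        "TRAVIS_COMMIT": env.get("GITHUB_SHA", ""),
        "TRAVIS_REPO_SLUG": env.get("GITHUB_REPOSITORY", ""),
        "TRAVIS_BUILD_DIR": env.get("GITHUB_WORKSPACE", ""),
        # Travis calls scheduled builds "cron"; GHA calls them "schedule".
        "TRAVIS_EVENT_TYPE": "cron" if event == "schedule" else event,
        # Travis sets this to the string "false" outside of PR builds.
        "TRAVIS_PULL_REQUEST": "false",
        "TRAVIS_PULL_REQUEST_BRANCH": "",
        "TRAVIS_TAG": "",
        "TRAVIS_BRANCH": "",
    }
    pr = re.match(r"refs/pull/(\d+)/", ref)
    if event == "pull_request" and pr:
        # PR builds: GITHUB_REF looks like refs/pull/<number>/merge.
        travis["TRAVIS_PULL_REQUEST"] = pr.group(1)
        travis["TRAVIS_PULL_REQUEST_BRANCH"] = env.get("GITHUB_HEAD_REF", "")
        travis["TRAVIS_BRANCH"] = env.get("GITHUB_BASE_REF", "")
    elif ref.startswith("refs/tags/"):
        # Tag builds: Travis sets TRAVIS_BRANCH to the tag name as well.
        travis["TRAVIS_TAG"] = ref[len("refs/tags/"):]
        travis["TRAVIS_BRANCH"] = travis["TRAVIS_TAG"]
    elif ref.startswith("refs/heads/"):
        travis["TRAVIS_BRANCH"] = ref[len("refs/heads/"):]
    return travis
```

A workflow step could export the result (for example with
os.environ.update(travis_env_from_github(os.environ))) before calling the
unchanged scripts in .travis/.
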
> On Sat, Feb 8, 2020 at 10:22 AM Brian Bouterse <bmbouter at redhat.com>
> wrote:
>
>> Thanks for replying @dalley and @daviddavis, both of your replies make
>> good points that resonate with me. Rather than inline responses, I'll try
>> to bring back some of your points and comment on them.
>>
>> @dalley, your articulation of how we would split up the CI to run each
>> part on only one CI platform sounds good to me. +1 to the SELinux and FIPS
>> testing running on Centos CI, and everything else running in another CI.
>> This addresses my concern that we were going to duplicate features from one
>> CI to another.
>>
>> @daviddavis +1 to merging PRs to give us more Github Actions data on
>> repos that are not managed by the plugin_template. I'm concerned about
>> merging Github Action PRs against plugin_template managed repos. For
>> example with pulp_file, I work on that regularly and I'd like to continue
>> using the existing CI capabilities it has as-is until the new system is
>> ready. Let me know if you think we should do this aspect differently.
>>
>> @daviddavis your point that we must move to Github Actions and off of
>> Travis makes sense to me because Travis is a huge bottleneck and Github
>> Actions can run a lot more in parallel. If we're going to do that though I
>> think we need to see a plan on how and when Pulp would leave Travis for
>> Github Actions. In terms of making such a plan I would think it would need
>> a few aspects in it:
>>
>> * We need details on each piece of the Travis workflow, where it will be
>> ported to, and a rough estimate of how long each piece would take. I think
>> these things would make a great EPIC.
>> * Who will work on it? It needs I think 2 fully dedicated people who
>> already completely understand the Travis stuff in detail. It's too hard for
>> one person and would take too long. Not being able to have these people
>> fully-dedicated on this task would be a deal-breaker for me. This type of
>> activity needs no distraction.
>> * It's got to happen fully - If we're leaving Travis for Github Actions,
>> we have to fully leave.
>> * I think it would be good if when a plugin switches, they switch
>> fully-and-at-once from Travis to Github Actions. I think this because
>> otherwise, every few days, another plugin_template update will take away a
>> Travis feature and move it to Github Actions, which across the 10+ plugins
>> and 10+ features would be painful. This would be very confusing I think.
>> * It needs to come with education somehow. Maybe a demo video, blog post
>> recap, and certainly great docs replacing the Travis ones we have now.
>>
>> I'm suggesting a plan instead of a decision because without a plan I
>> don't know how long the work will take, and thus I can't know if we can
>> afford it in terms of development capacity now. Given the whole convo, I'm
>> more wondering if "now is the right time" and less about "if this is the
>> right long-term idea". I think the best long-term situation for the Pulp
>> development community is likely not with Travis. Now could be the right
>> time, if we look at the development team and determine if we can meet all
>> of our goals while fully dedicating 1-3 people to this other effort.
>>
>> Let me know how I can help. Thank you both and Fabricio for continuing to
>> drive this improvement for the community.
>>
>> -Brian
>>
>> On Thu, Feb 6, 2020 at 12:29 PM Daniel Alley <dalley at redhat.com> wrote:
>>
>>> I agree that Centos CI should be a high priority, however I think it is
>>> still important to discuss what we want our end-state to look like, because
>>> that will strongly influence our approach going forwards.  And FWIW, I
>>> don't think Fabricio's work will do any harm in this respect, especially
>>> given that the main focus has been on repos that don't use the template
>>> (pulp-rpm-prerequisites, ansible-pulp), and are putting enough load on
>>> Travis to cause us tangible problems (ansible-pulp, pulp_file performance
>>> tests).
>>>
>>> I don't believe Fabricio was suggesting that some plugins would use
>>> Travis and other plugins would use Github Actions.  It was an idea thrown
>>> around that maybe we would want to support a choice of CI for potential
>>> plugin writers, but personally I think we should just ditch Travis
>>> entirely.  The outages (such as the one on Monday) and resource
>>> restrictions are hindering development, and I don't expect it to get better
>>> considering how many senior engineers they laid off after being sold to
>>> a private equity firm with a poor reputation.
>>> <https://news.ycombinator.com/item?id=19218036>
>>>
>>> But I also don't think we should try to use Centos CI to replace all the
>>> things Travis is currently doing.  I would rather use Github Actions for
>>> everything except for the very few workflows that require Centos CI,
>>> namely, running tests on a FIPS platform and with SELinux configured.  I
>>> think that this proposal would be both the optimal outcome, and also the
>>> easiest thing to do, and here is why.
>>>
>>> Centos CI would not be involved with any of the following:
>>>
>>> * Code formatting lints
>>> * Commit message checks
>>> * Changelog checks
>>> * Everything involving a matrix of different combinations of Python /
>>> PostgreSQL / Django versions
>>> * Deploy to PyPI upon pushing a new tag
>>> * Testing things against a specific PR or PRs (probably, if we were to
>>> run the jobs nightly instead of on every PR, which doesn't strike me as
>>> necessary)
>>>
>>> The majority of CI complexity is due to these auxiliary features and I
>>> don't see any reason to try to port this to Jenkins/Centos CI, much less
>>> try to maintain it across both CI systems.  Here we agree: that would be a
>>> nightmare.  Almost all of the CI-service-specific code deals with these
>>> auxiliary checks.  But Fabricio has already proven that these things are
>>> relatively easy to port to Github Actions, which, while different from
>>> Travis, is much more similar to Travis than Jenkins is.  And this work is
>>> already done, and will be really easy to port back into the plugin template
>>> to use everywhere.
>>>
>>> Of our various CI scripts, the only ones which would remain in common
>>> between GHA and CentosCI are install.sh and before_script.sh, which perform
>>> the core setup tasks for our containers.  Every other script in our
>>> .travis/ directory does something which can be the sole concern of Github
>>> Actions.  So the burden of maintaining that small amount of common code
>>> would not be very high, and certainly not double.
>>>
>>> On Thu, Feb 6, 2020 at 10:03 AM David Davis <daviddavis at redhat.com>
>>> wrote:
>>>
>>>> I think there is an immediate need to move to Github Actions.
>>>> Yesterday, for example, I spent a good deal of time on failing pulp_file
>>>> jobs, which are exceeding Travis' 50 minute threshold[0] (Github Actions
>>>> has a 6 hour limit). We've also been working for weeks on alleviating the
>>>> bottlenecks that we've been experiencing due to Travis' limit of 3
>>>> concurrent jobs. Paying the Travis tax is detracting from our stakeholder
>>>> work.
>>>>
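To put the concurrency limit above in rough numbers, here is a minimal
sketch, assuming a simple greedy scheduler where each job goes to whichever
runner frees up first. The function name wall_clock_minutes is hypothetical
and the job durations are made up for illustration; they are not measured
from our builds.

```python
import heapq


def wall_clock_minutes(job_minutes, runners):
    """Estimate total wall-clock time for a build on a fixed runner pool."""
    if runners < 1:
        raise ValueError("need at least one runner")
    # One entry per runner: the minute at which that runner is free again.
    free_at = [0] * min(runners, len(job_minutes))
    heapq.heapify(free_at)
    for minutes in job_minutes:
        start = heapq.heappop(free_at)  # earliest available runner
        heapq.heappush(free_at, start + minutes)
    return max(free_at, default=0)


# Hypothetical per-job durations in minutes (illustrative only).
jobs = [12, 18, 25, 9, 30, 14, 21]
print(wall_clock_minutes(jobs, 3))   # 48
print(wall_clock_minutes(jobs, 20))  # 30
```

With enough runners the build takes as long as its slowest job. With 3
runners shared across the whole organization, jobs queue behind each other,
and behind jobs from other repos, which this sketch does not model.
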
>>>> Regarding supporting two CIs, won't we have to support multiple CIs to
>>>> run against selinux and FIPS? The only alternative would be to move
>>>> everything to CentOS CI. Fabricio's pulp_file PR demonstrates though that
>>>> our CI scripts can be made to run in multiple CIs. These scripts are the
>>>> majority of our CI/CD code; the Travis/Github Actions configs are only a
>>>> couple hundred lines. So most of our code will be shared across CIs, which
>>>> should alleviate most of the burden of supporting more than one CI.
>>>>
>>>> I would suggest as a next step we merge the ansible-pulp PR[1] as it
>>>> should provide some real world data about running on Github Actions which
>>>> we can consider. Moreover, its CI is independent from the plugin_template
>>>> and it should help to alleviate most of our bottlenecks in Travis. We can
>>>> postpone the decision around plugins until we have more data and consensus.
>>>>
>>>> [0] https://pulp.plan.io/issues/6104
>>>> [1] https://github.com/pulp/ansible-pulp/pull/217
>>>>
>>>> David
>>>>
>>>>
>>>> On Thu, Feb 6, 2020 at 5:51 AM Brian Bouterse <bmbouter at redhat.com>
>>>> wrote:
>>>>
>>>>> Inline replies to three convos would be too confusing, so I'm going to
>>>>> try to bring it back to a single thread.
>>>>>
>>>>> The Pulp team can't afford to do two CI's. I estimate it's taken many
>>>>> hundreds of hours cumulatively and probably >10 hours a week at least
>>>>> maintaining the CI for Travis in the plugin template. The current
>>>>> commitments and size of the pulp dev team can't sustain doubling that
>>>>> level of investment. Think about allllllll the changes that we
>>>>> make weekly. Are we prepared to "port" those continuously? I'm not. I think
>>>>> it's categorically a non-starter from a resource perspective.
>>>>>
>>>>> I don't think it's a good thing to split the plugins to use various
>>>>> CI's. Today if something doesn't work, it doesn't work in all plugins' CI,
>>>>> and if someone fixes it, all plugins get fixed (for the most part).
>>>>> Splitting plugins across different CI's with incompatible features and no
>>>>> parity between them will put us in a situation where we lose the benefits
>>>>> of every improvement improving everyone.
>>>>>
>>>>> Is this work being done to serve a stakeholder asking for it? I ask
>>>>> because if it isn't, it's taking the place of work stakeholders are asking
>>>>> for to be delivered in Feb and March. Those timelines are so close, I'm
>>>>> surprised others perceive that now is the right time to take on a goal like
>>>>> this.
>>>>>
>>>>> I'm on PTO until the 17th so I will only be able to provide input on
>>>>> this decision sparsely until then.
>>>>>
>>>>> I'm perceiving that people don't want to continue on Travis and this
>>>>> is the way for some plugin writers to leave Travis. The problem is that
>>>>>
>>>>> On Wed, Feb 5, 2020 at 12:44 PM Fabricio Aguiar <
>>>>> fabricio.aguiar at redhat.com> wrote:
>>>>>
>>>>>> I believe we can add GH actions on plugin_template,  then we would
>>>>>> have:
>>>>>> $ ./plugin-template --travis PLUGIN_NAME
>>>>>> or
>>>>>> $ ./plugin-template --ghactions PLUGIN_NAME
>>>>>> it is not implemented yet on plugin_template,
>>>>>> but my experience with pulp_file (
>>>>>> https://github.com/pulp/pulp_file/pull/353)  makes me think it will
>>>>>> be easy to create a template for it since I didn't change many files,
>>>>>> and I have not removed travis.
>>>>>> This way, we can make plugin_template run both, travis and GH actions.
>>>>>> Working with GH actions was a good exercise, I struggled to find a
>>>>>> replacement for TRAVIS_COMMIT_RANGE, and got some config issues with
>>>>>> kubectl and httpie.
>>>>>> I personally think changing to GH is totally optional for plugins,
>>>>>> but I believe ansible-pulp and pulp_rpm_prerequisites should move to GH
>>>>>> actions, as both do not use plugin_template and consume a lot of time.
>>>>>> We could also make plugin_template run on both travis and GH actions,
>>>>>> to push us to be more CI agnostic.
>>>>>>
>>>>>> Best regards,
>>>>>> Fabricio Aguiar
>>>>>> Software Engineer, Pulp Project
>>>>>> Red Hat Brazil - Latam <https://www.redhat.com/>
>>>>>> +55 11 999652368
>>>>>>
>>>>>>
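On the TRAVIS_COMMIT_RANGE point in Fabricio's note above: one possible
stand-in is to read the two commit SHAs from the webhook payload that GHA
writes to the file named by GITHUB_EVENT_PATH. This is a minimal sketch and
an assumption on my part, not necessarily what the pulp_file PR ended up
doing; the function names are hypothetical. The payload fields used
(before/after for push events, pull_request.base.sha and
pull_request.head.sha for PRs) are part of GitHub's documented webhook
payloads.

```python
import json


def commit_range(event_name, payload):
    """Return a "<base>...<head>" range like TRAVIS_COMMIT_RANGE, or ""."""
    if event_name == "pull_request":
        pr = payload.get("pull_request", {})
        base = pr.get("base", {}).get("sha")
        head = pr.get("head", {}).get("sha")
    elif event_name == "push":
        base = payload.get("before")
        head = payload.get("after")
    else:
        # Other events (e.g. schedule) carry no commit range.
        base = head = None
    return f"{base}...{head}" if base and head else ""


def commit_range_from_file(event_name, event_path):
    """Load the event payload from disk and compute the range from it."""
    with open(event_path) as fh:
        return commit_range(event_name, json.load(fh))
```

One caveat: when a brand new branch is pushed, the "before" SHA is all
zeros, so a script consuming the range would need to handle that case
separately.
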
>>>>>> On Wed, Feb 5, 2020 at 2:16 PM David Davis <daviddavis at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Brian,
>>>>>>>
>>>>>>> Thanks for the feedback. Responses inline below.
>>>>>>>
>>>>>>> On Wed, Feb 5, 2020 at 10:31 AM Brian Bouterse <bmbouter at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm concerned about the move to GH actions and also the timing. The
>>>>>>>> benefits of lowering the CI runtime are really great, but I'm worried it
>>>>>>>> isn't helping us towards our goals and even takes us further from them.
>>>>>>>>
>>>>>>>> I'm worried about double the outage risk. There are outages, and
>>>>>>>> structurally repo CI pipelines that require more services are at more risk
>>>>>>>> for total outage. This raises the risk of "total CI pipelines halting" in a
>>>>>>>> concerning way for me. Trading runtime for risk I don't think is an overall
>>>>>>>> win; I'd like to find a way to lower the runtime and keep risk the same or
>>>>>>>> lower.
>>>>>>>>
>>>>>>>
>>>>>>> We've been plagued by Travis outages and bottlenecks over the past
>>>>>>> year. Our plugin_template is currently tied to Travis so one option would
>>>>>>> be to allow plugin writers to choose which CI to use and divorce Pulp from
>>>>>>> being tied to a single one. This ought to reduce risk and the impact of
>>>>>>> outages.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Whatever we do I want to make sure we're doing it fully through the
>>>>>>>> plugin template. Is this through the plugin template? If it isn't, or it
>>>>>>>> requires more steps to configure than plugin writers had before, then I'm
>>>>>>>> concerned about it taking us further from our goal of having the plugin
>>>>>>>> template take as much burden from the plugin writer as possible. I use this
>>>>>>>> thinking to answer the question posed from daviddavis. My take is that the
>>>>>>>> plugin template's goal is to make writing a plugin with great CI as easy as
>>>>>>>> possible. It's designed to be a quality improver and a time saver.
>>>>>>>>
>>>>>>>
>>>>>>> Agreed, the goal is to update the plugin_template. The plan is to
>>>>>>> start by moving ansible-pulp to Github Actions first and test out Github
>>>>>>> Actions as a viable replacement for Travis. Then move pulpcore and plugins
>>>>>>> (via the plugin_template). The ansible-pulp repo doesn't use
>>>>>>> plugin_template for its CI configuration so we don't have to change the
>>>>>>> plugin_template in testing out Github Actions for ansible-pulp and also
>>>>>>> ansible-pulp is the main hog of our Travis resources consuming job runners
>>>>>>> for 1+ hours.
>>>>>>>
>>>>>>> To your point about the plugin_template, supporting Github Actions
>>>>>>> shouldn't add additional burden to the plugin writer. The two options are
>>>>>>> to either move to Github Actions wholesale or let plugin writers choose
>>>>>>> which CI to use (which we could default). Either option would require zero
>>>>>>> extra steps for plugin writers. And the latter would give more flexibility
>>>>>>> to plugin writers if they want to use a different CI.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Having the lower runtime is nice, but if we're going to put effort
>>>>>>>> in the CI, I'd like to bring up prioritizing getting the plugin_template
>>>>>>>> integrated with https://ci.centos.org/ as a high-value goal. I'm
>>>>>>>> concerned that we're about to ship the SELinux policy and we have no way to
>>>>>>>> test it. Similar concerns with certguard's dependency and its dependencies
>>>>>>>> not being packaged on Ubuntu (so it's hard to run on Travis). Also, I'm
>>>>>>>> concerned we don't have an environment to evaluate FIPS compatibility with.
>>>>>>>> Relatively speaking if we can only do one of these two initiatives at this
>>>>>>>> time, I believe we should do the CentOS CI.
>>>>>>>>
>>>>>>>
>>>>>>> I don't see moving to CentOS CI and Github Actions as mutually
>>>>>>> exclusive. In fact, I think moving to Github Actions could make it easier
>>>>>>> to use CentOS CI by making our CI/CD code more CI agnostic. Moreover,
>>>>>>> much of the hard work to move to Github Actions was already completed by
>>>>>>> Fabricio last week.
>>>>>>>
>>>>>>>
>>>>>>>> Lowering the runtime I'm really in favor of, so I hope these
>>>>>>>> concerns prompt discussion more than stop the initiative. What do you all
>>>>>>>> think?
>>>>>>>>
>>>>>>>> On Wed, Feb 5, 2020 at 9:05 AM David Davis <daviddavis at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Great question. IMO the main benefit in continuing to support
>>>>>>>>> Travis is that we could better separate our test/deployment code from the
>>>>>>>>> CI specific bits so that most of the plugin_template code could be CI
>>>>>>>>> agnostic. That said, this would be more work. I think it comes down to
>>>>>>>>> whether we want our plugin_template to be more opinionated or more
>>>>>>>>> configurable.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 5, 2020 at 8:18 AM Dana Walker <dawalker at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1 to moving to Github Actions.
>>>>>>>>>>
>>>>>>>>>> Can anyone think of reasons a plugin would want to stay with
>>>>>>>>>> Travis specifically?  As fao89 pointed out on the issue, at least each
>>>>>>>>>> plugin that does choose to move takes some of the workload with them to
>>>>>>>>>> free up job runners for plugins that choose to remain.
>>>>>>>>>>
>>>>>>>>>> Dana Walker
>>>>>>>>>>
>>>>>>>>>> She / Her / Hers
>>>>>>>>>>
>>>>>>>>>> Software Engineer, Pulp Project
>>>>>>>>>>
>>>>>>>>>> Red Hat <https://www.redhat.com>
>>>>>>>>>>
>>>>>>>>>> dawalker at redhat.com
>>>>>>>>>> <https://www.redhat.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 4, 2020 at 10:26 AM David Davis <
>>>>>>>>>> daviddavis at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Over the past year, we've experienced several growing pains with
>>>>>>>>>>> using Travis as our CI/CD environment. Perhaps the biggest has been the
>>>>>>>>>>> limitation of having only 3 concurrent job runners[0] across our entire
>>>>>>>>>>> Pulp organization. At times, it has slowed development by bottlenecking the
>>>>>>>>>>> merging of PRs and delayed numerous releases of Pulp.
>>>>>>>>>>>
>>>>>>>>>>> Last year, Github introduced Github Actions which offers open
>>>>>>>>>>> source projects 20 concurrent jobs[1]. I've filed an issue here to get
>>>>>>>>>>> feedback on moving our repos and plugins to Github Actions:
>>>>>>>>>>>
>>>>>>>>>>> https://pulp.plan.io/issues/6065
>>>>>>>>>>>
>>>>>>>>>>> Also, @fao89 has opened a couple PoC PRs to demonstrate using
>>>>>>>>>>> Github Actions:
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/pulp/pulp_file/pull/353
>>>>>>>>>>> https://github.com/pulp/ansible-pulp/pull/217
>>>>>>>>>>>
>>>>>>>>>>> You'll notice for example that the ansible-pulp build time went
>>>>>>>>>>> from more than 1 hour[2] to 27 minutes[3] as all the jobs ran in parallel
>>>>>>>>>>> on Github Actions.
>>>>>>>>>>>
>>>>>>>>>>> Unless there are objections, we plan to merge the ansible-pulp
>>>>>>>>>>> PR this week since its CI configuration is independent from other pulp and
>>>>>>>>>>> plugin repos (ie it doesn't use the plugin_template's Travis files).
>>>>>>>>>>>
>>>>>>>>>>> We're hoping though to get feedback on whether we should move
>>>>>>>>>>> pulpcore and plugin repos to Github Actions. If so, should we provide
>>>>>>>>>>> plugins with the option to continue using Travis if they want?
>>>>>>>>>>>
>>>>>>>>>>> If there are no objections by February 11, 2020, we'll proceed
>>>>>>>>>>> with moving pulp_file to Github Actions and look at updating
>>>>>>>>>>> plugin_template.
>>>>>>>>>>>
>>>>>>>>>>> [0] https://travis-ci.com/plans
>>>>>>>>>>> [1]
>>>>>>>>>>> https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits
>>>>>>>>>>> [2] https://travis-ci.org/pulp/ansible-pulp/builds/645651353
>>>>>>>>>>> [3]
>>>>>>>>>>> https://github.com/fabricio-aguiar/ansible-pulp/actions/runs/33601847
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Pulp-dev mailing list
>>>>>>>>>>> Pulp-dev at redhat.com
>>>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>>>>>