[libvirt PATCH 4/4] gitlab-ci: Introduce a new test 'integration' pipeline stage

Wed Mar 2 12:11:04 UTC 2022

On Wed, Mar 02, 2022 at 09:43:24AM +0000, Daniel P. Berrangé wrote:
...

> > > 
> > > No, I got that part. My question was whether
> > > 
> > >   other-project-pipeline:
> > >     trigger:
> > >       project: other-project
> > >       strategy: depend
> > > 
> > >   our-job:
> > >     needs:
> > >       - other-project-pipeline
> > >       - project: other-project
> > >         job: other-project-job
> > >         artifacts: true
> > > 
> > > actually guarantees that the instance of other-project-job whose
> > > artifacts are available to our-job is the same one that was started
> > > as part of the pipeline triggered by the other-project-pipeline job.
> > 
> > Sorry for a delayed response.
> > 
> > I don't think so. We can basically only rely on the fact that the jobs would
> > actually be queued in the order they arrive, which means that jobs submitted earlier
> > should finish earlier, but that of course is only a premise, not a guarantee.
> > 
> > On the other hand I never intended to run the integration CI on every single
> > push to the master branch, instead, I wanted to make this a scheduled pipeline
> > which would effectively alleviate the problem, because with scheduled pipelines
> > there would very likely not be a concurrent pipeline running in libvirt-perl
> > which would make us download artifacts from a pipeline we didn't trigger
> > ourselves.
> 
> Ultimately when we switch to using merge requests, the integration tests
> should be run as a gating job, triggered from the merge train when the
> code gets applied to git, so that we prevent regressions actually making
> it into git master at all.
> 
> Post-merge integration testing always exhibits the problem that people will
> consider it somebody else's problem to fix the regression. So effectively
> whoever creates the integration testing system ends up burdened with the
> job of investigating failures and finding someone to poke to fix it. With
> it run pre-merge then whoever wants to get their code merged needs to
> investigate the problems. Now sometimes the problems will of course be
> with the integration test system itself, not the submitter's code, but
> this is OK because it leads to a situation where the job of maintaining
> the integration tests is more equitably spread across all involved and
> builds a mindset that functional / integration testing is a critical
> part of delivering code, which is something we've lacked for too long
> in libvirt.

Agreed.
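
Until we get there, restricting the integration stage to scheduled pipelines,
as I described above, could look roughly like the sketch below. The job names
and the 'needs' entries are copied from the example quoted earlier; the 'rules'
blocks are the only addition, and the script line is a made-up placeholder.
Note this only makes a concurrent libvirt-perl pipeline unlikely, it still
doesn't guarantee which pipeline the artifacts come from.

```yaml
# Sketch only: run the trigger job and the dependent job solely on
# scheduled pipelines, never on regular pushes.
other-project-pipeline:
  trigger:
    project: other-project
    strategy: depend
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'

our-job:
  needs:
    - other-project-pipeline
    - project: other-project
      job: other-project-job
      artifacts: true
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - ./run-integration-tests.sh   # hypothetical placeholder
```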

> 
> 
> > > > > Taking a step back, why exactly are we triggering a rebuild of
> > > > > libvirt-perl in the first place? Once we change that project's
> > > > > pipeline so that RPMs are published as artifacts, can't we just grab
> > > > > the ones from the latest successful pipeline? Maybe you've already
> > > > > explained why you did things this way and I just missed it!
> > > >
> > > > ...which brings us here. Well, I adopted the mantra that all libvirt-friends
> > > > projects depend on libvirt and given that we need libvirt-perl bindings to test
> > > > upstream, I'd like to always have the latest bindings available to test with
> > > > the current upstream build. The other reason why I did the way you commented on
> > > > is that during development of the proposal many times I had to make changes to
> > > > both libvirt and libvirt-perl in lockstep and it was tremendously frustrating
> > > > to wait for the pipeline to get to the integration stage only to realize that
> > > > the integration job didn't wait for the latest bindings and instead picked up
> > > > the previous latest artifacts which I knew were either faulty or didn't contain
> > > > the necessary changes yet.
> > > 
> > > Of course that would be annoying when you're making changes to both
> > > projects at the same time, but is that a scenario that we can expect
> > > to be common once the integration tests are in place?
> > > 
> > > To be clear, I'm not necessarily against the way you're doing things
> > > right now, it's just that it feels like using the artifacts from the
> > > latest successful libvirt-perl pipeline would lower complexity, avoid
> > > burning additional resources and reduce wait times.
> > > 
> > > If the only downside is having a worse experience when making
> > > changes to the pipeline, and we can expect that to be infrequent
> > > enough, perhaps that's a reasonable tradeoff.
> > 
> > I gave this more thought. What you suggest is viable, but the following is worth
> > considering if we go with your proposal:
> > 
> > - libvirt-perl jobs build upstream libvirt first in order to build the bindings
> >     -> generally it takes until right before the release that APIs/constants
> >        are added to the respective bindings (Perl/Python)
> >     -> if we rely on the latest libvirt-perl artifacts without actually
> >        triggering the pipeline, yes, the artifacts would be stable, but fairly
> >        old (unless we schedule a recurrent pipeline in the project to refresh
> >        them), thus not giving us feedback from the integration stage that
> >        bindings need to be added first, because the API coverage would likely
> >        fail, thus failing the whole libvirt-perl pipeline and thus invalidating
> >        the integration test stage in the libvirt project
> >         => now, I admit this would get pretty annoying because it would force
> >            contributors (or the maintainer) who add new APIs to add respective
> >            bindings as well in a timely manner, but then again ultimately we'd
> >            like our contributors to also introduce an integration test along
> >            with their feature...
> 
> Note right now the perl API coverage tests are configured to only be gating
> when run on nightly scheduled jobs. I stopped them being gating on contributions
> because if someone is fixing a bug in the bindings it is silly to force
> their merge request to also add new API bindings.
> 
> I'm thinking about whether we should even make the API coverage tests
> non-gating even for scheduled jobs. I miss the fact that when we see a
> notification of a failed pipeline we don't see at a glance whether it is
> a genuine build failure or merely a new API missing.

I think ^this could be done with an external dashboard monitoring gitlab CI
pipelines, providing all relevant information: artifacts, job that failed the
pipeline (as long as coverage is a separate job).
Anyhow, I believe that's a topic for another day :).
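
For the record, keeping coverage as a separate job that only gates on scheduled
pipelines could be sketched as below. This is an assumption about how it might
be written, not what libvirt-perl has today: the job name, stage and script are
placeholders; only 'rules' and 'allow_failure' are standard GitLab CI keywords.

```yaml
# Hypothetical job; shows a hard failure on scheduled pipelines and a
# soft failure (warning only) everywhere else.
api-coverage:
  stage: test
  script:
    - ./run-api-coverage.sh   # placeholder for the real coverage check
  rules:
    # Scheduled (e.g. nightly) pipelines: a failure fails the pipeline
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
      allow_failure: false
    # Any other pipeline: a failure is reported but does not gate
    - when: on_success
      allow_failure: true
```

With a separate job like this, a failed pipeline at least shows by job name
whether it was the coverage check or a genuine build failure.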

> 
> The Python bindings are in a slightly different situation. Sometimes the
> code generator can do the job on its own, but other times the code generator
> trips over its cane and breaks a leg. In the latter cases, we're always going
> to get a hard CI failure we can't ignore, unless we teach the code generator
> to make it a soft failure and just skip the API with a warning when it is
> something it can't cope with. I think if we're going to use the python
> bindings from automated tests we'll have no choice but to make the code
> generator treat it as a soft failure, if we want to use the tests as a
> gating check, otherwise you'll end up with a chicken & egg problem between
> merging new APIs to C lib and Python.

^This one's going to be "fun" though. Once we merge this integration CI
prototype I can work on improving the situation with Python bindings, we'll
need them anyway.

> 
> 
> > > > What's the point, we'd have to constantly refresh the tags if the platforms
> > > > come and go given our support, whereas fedora-vm and centos-stream-vm cover all
> > > > currently supported versions - always!
> > > > Other than that, I'm not sure that tags are passed on to the gitlab job itself,
> > > > I may have missed it, but unless the tags are exposed as env variables, the
> > > > provisioner script wouldn't know which template to provision. Also, the tag is
> > > > supposed to annotate the baremetal host in this case, so in that context having
> > > > '-vm' in the tag name makes sense, but doesn't for the provisioner script which
> > > > relies on/tries to be compatible with lcitool as much as possible.
> > > 
> > > Okay, my misunderstanding was caused by not figuring out the purpose
> > > of DISTRO. I agree that more specific tags are not necessary.
> > > 
> > > Should we make them *less* specific instead? As in, is there any
> > > reason for having different tags for Fedora and CentOS jobs as
> > > opposed to using a generic "this needs to run in a VM" tag for both?
> > 
> > Well, I would not be against, but I feel this is more of a political issue:
> > this HW was provided by Red Hat with the intention to be dedicated for Red Hat
> > workloads. If another interested 3rd party comes (and I do hope they will) and
> > provides HW, we should utilize the resources fairly in a way respectful to the
> > donor's/owner's intentions, IOW if party A provides a single machine to run
> > CI workloads using Debian VMs, we should not schedule Fedora/CentOS workloads
> > in there effectively saturating it.
> > So if the tags are to be adjusted, then I'd be in favour of recording the owner
> > of the runner in the tag.
> 
> If we have hardware available, we should use it to the best of its ability.
> Nothing is gained by leaving it idle if it has spare capacity to run jobs.
> 

Well, logically there's absolutely no disagreement with you here. Personally,
I would go about it the same way. The problem is that the HW we're talking about
wasn't an official donation, Red Hat still owns and controls the HW, so the
company can very much disagree with running other workloads on it long term.
I'm not saying we shouldn't test the limits, reliability and bandwidth to its
fullest potential. What I'm trying to say is that the major issue here is that
contributing to open source projects is a collaborative effort of all
interested parties (duh, should go without saying) and so we cannot expect a
single party which just happens to have the biggest stake in the project to run
workloads for everybody else. I mean the situation would have been different if
the HW were a proper donation, but unfortunately it is not. If we pick and run
workloads on various distros for the sake of getting coverage (which makes
total sense btw), it would later be harder to communicate back to the community
why the number of distros (or their variety) would need to be decreased once
the HW's capabilities are saturated with demanding workloads, e.g. migration
testing or device assignment, etc.

Whether I do or do not personally feel comfortable being involved in ^this
political situation and decision making, as a contributor using Red Hat's email
domain I do respect the company's intentions with regards to the offered HW.
I think that the final setup we agree on eventually is up for an internal
debate and doesn't have a direct impact on this proposal per se.
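
Just to illustrate what I meant by recording the owner of the runner in the
tag, here is a rough sketch. Everything in it is hypothetical (job name, stage,
tag name, DISTRO value, script); only 'tags' and 'variables' are standard
GitLab CI keywords, and I'm assuming DISTRO is what the provisioner script
reads.

```yaml
# Hypothetical names throughout.
integration-fedora:
  stage: integration
  tags:
    - redhat-vm-host        # made-up tag naming the owner of the runner
  variables:
    DISTRO: fedora          # example value only, assumed input for the provisioner
  script:
    - ./run-integration-tests.sh   # placeholder
```

Hardware from another party would then get its own owner tag, and jobs would
only land on it if they explicitly ask for that tag.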

> Until we start using it though, we will not have a clear idea of how many
> distro combinations we can cope with for integration testing. We'll also
> want to see how stable the jobs prove to be as we start using it for real.
> With that in mind it makes sense to start off with a limited number of distro
> jobs and monitor the situation. If it is reliable and the machine shows it
> has capacity to run more then we can add more, picking distros that give

Yes, the number of machines will rise once we progress with the test suite and
enable migration tests which is something I haven't polished and tested yet in
libvirt-tck.

> the maximum benefit in terms of identifying bugs.  IOW, I would much
> rather run 1x CentOS Stream + 1x Fedora + 1x Debian + 1x SUSE, than
> 2x CentOS and 2x Fedora, because the former will give much broader
> ability to find bugs.

Regards,
Erik