[libvirt PATCH 0/4] RFC: tests: introduce lavocado

Thu Jul 22 21:11:53 UTC 2021

Daniel P. Berrangé writes:

> On Tue, Jul 20, 2021 at 02:19:25PM +0200, Erik Skultety wrote:
>> On Thu, Jul 01, 2021 at 07:04:32PM +0100, Daniel P. Berrangé wrote:
>> > Libvirt has consumers writing applications in a variety of
>> > languages, and periodically reporting bugs.  My general wish
>> > for a test harness would be for something that can invoke
>> > and consume arbitrary test cases. Essentially if someone
>> > can provide us a piece of code that demonstrates a problem
>> > in libvirt, I would like to be able to use that directly as
>> > a functional test, regardless of language it was written
>> > in.
>> 
>> Well, in theory yes - it is all nice to let anyone contribute tests in their
>> language of preference. Realistically, look at the nwfilter stuff in TCK, it's
>> a pile of unintelligible Bash which cannot be debugged. So, I understand why
>> you'd want to imagine or write tests in Go, but someone will have to maintain
>> them otherwise we'd end up with a 3rd framework nobody will want to use.
>
> The nwfilter shell script was just badly written, and we should have
> pushed back on accepting it in that state, but it is was so broadly
> functional it was preferrable than getting no testing. It could have
> been just as horrible if written in Perl or Python too - the core
> approach of the tests themselves was just wrong. The use of shell
> was just icing on the cake.
>

Clearly that are exceptions, but I agree that for non-unit tests, the
"foreign" aspect of a language is pretty much irrelevant.  The
suitability of a given language will mostly be based on its own
characteristics, and Python seems to do quite well for writing tests
indeed.

Also, my impression is that the quality of documentation, test templates
and relevant functionality that is found in an existing "framework" is
generally inversely proportional to people's motivation to write a test
in another language.  Even if people have another language they strongly
prefer, and even if the project will execute without tests written in other
languages without major hurdles, documentation/templates/relevant APIs
weight very heavily on the other side of the scale.

That explains at least partially why Perl is, still to this day, the
language of virtually every single test on TCK.

>
> The main point is that we have applications consuming libvirt in
> multiple languages. Python and Go are the most important ones in
> terms of usage today, and we get some quite detailed bug reports
> with demo code. If we get a bug report with a Go demo program,
> it can't be assumed to be easy to just convert that to Python,
> and vica-verca.
>
> We do have to consider the maintainability of any tests being
> contributed, but I don't want to be rejecting bug reproducers
> that work, just because they're not in a specific language.
>
>
>

QEMU is a close example of tests written in many languages.  I agree
that it's better to have a test in whatever language than none at all.

Having said that, maintaining the quality of tests over time is hard
enough when they are mostly uniform (built and/or executed similarly).
When tests are written in "one off" languages that require extra steps
to be built/executed, this is further aggravated.

In short, as long as there's an easy way to run tests, they are run
often, results are clear and logs are also somewhat uniform (if not in
content at least wrt location), then it's a no-brainer whether to keep a
contributed test or not in the tree.

>> Luckily, Python as a language doesn't suffer from this and we've already made
>> that preference in the libvirt repo already. So, I'd suggest to use our
>> expertise in helping contributors who wish to contribute a test with porting
>> their test case to a Python one - yes, it does consumer human resources and
>> prolongs the time for a contribution to be accepted, but that pays off later
>> down the road when the test cases need to be adapted to new
>> conditions/environments.
>
> Certainly I'd say we can consider python as the default or preferred
> language for writing tests, but I'd like us to be open to accepting
> tests in other languages. 
>
>
>> > In theory the libvirt TCK allowed for that, and indeed the
>> > few tests written in shell essentially came to exist because
>> > someone at IBM had written some shell code to test firewalls
>> > and contributed that as a functional test. They just hacked
>> > their script to emit TAP data.
>> 
>> Others have already responded that Avocado can already do the same.
>> What I'd like to say, again in context of arbitrary languages used in tests,
>> the problem with the whole bash machinery IBM contributed is not only that it
>> cannot be debugged, but also that you cannot replay a single failed test, so if
>> an error happened in e.g. test 1148, you have literally no idea how to work
>> only with that test case. Knowing literally zero about Avocado's nrunner and
>> external (non-native) test suite support I cannot tell whether Avocado would
>> have done a better job than TCK (I doubt it), but what I know for sure is that
>> if you write native tests, you get much better support and control over the
>> test suite.
>
> As above the shell tests were just bad as an overall conceptual approach.
> Running commands and then comparing the stdout/err against expected
> output is way too fragile, and inflexible for debugging too. We see the
> same problem in QEMU block layer tests too.
>
>> IIRC it was still necessary in TCK to run some of the network-related
>> tests with root permissions, but polkit may help (I haven't tried) - I sure
>> don't want to run the test suite with root permissions to overcome some of the
>> issues, because that doesn't simulate the real world usage. Therefore, I'd like
>> libvirt Avocado as a framework to be set up by default in a way where polkit
>> rules are in effect.
>
> In theory you shouldn't need root perimissions directly, because you just
> have to grant access to libvirtd.  In practice of course that is just a
> facade as a libvirt connection gives you indirect permissions equiva to
> a root shell. The tests written was supposed to take care when interacting
> with host resources - so only use NICs that are explicitly listed in the
> config file, otherise skip the test. This kind of approach is doable for
> any test framework - just requires someone to define the best practice
> and reviewers to make sure people actually follow it.
>
>> > I think the test framework should not concern itself with this
>> > kind of thing directly. We already have libvirt-ci, which has
>> > tools for provisioning VMs with pre-defined package sets. The
>> > test harness should just expect that this has been done, and
>> > that it is already running in such an environment if the user
>> > wanted it to be.
>> 
>> I suggested Beraldo to put this one in so as to make it very clear what our
>> intentions are wrt/ coming up with a new framework and our expectations of it
>> as an end result (like I said, we already have 2 frameworks, we don't want this
>> one to end up the same)
>
>
>
>> 
>> > 
>> > In the TCK config file we provided setting for a URI to connect
>> > to other hosts, to enable multi-hosts tests like live migration
>> > to be done.
>> 
>> We would not exclude live migration from the test suite, it's simply a beast on
>> its own (mainly because of automation, more specifically preparing a know
>> destination to migrate to). But again, ^this is not something the framework
>> itself should be concerned about, just a background story :).
>> 
>> > 
>> > > I'm adding more information with some details inside the README file.
>> > 
>> > Overall, I'm more enthusiastic about writing tests in Python
>> > than Perl, for the long term, but would also potentially like
>> > to write tests in Go too.
>> > 
>> > I'm wondering if we can't bridge the divide between what we
>> > have already in libvirt-tck, and what you're bringing to the
>> > table with avocado here. While we've not done much development
>> > with the TCK recently, there are some very valuable tests
>> > there, especially related to firewall support and I don't
>> > fancy rewriting them.
>> 
>> Nobody does fancy rewriting those, but there's no way going around that, we'll
>> have to do it at some point because like I said, they're very error prone
>> since they match against the verbatim output of firewall CLI tools which leads
>> to frequent breakages and you figuring out what went wrong is pure torment.
>
> Agreed, we'll have to bite the bullet eventually. The hard thing is
> deciding just how we want to validate correctness. Comparing stdout
> format is an "easy" way. If we don't do that we need to figure out a
> way to declare what the firewall state is suposed to be, and how to
> validate a match, as we want to validate that the correct set of
> rules is actually generated.
>
> Possibly we should have used a python binding for libxtables to
> query the firewall state and thus been in control of the expected
> data format.
>
>> > Thus my suggestion is that we:
>> > 
>> >   - Put this avocado code into the libvirt-tck repository,
>> >     with focus on the supporting infra for making it easy to
>> >     write Python tests
>> 
>> Clearly I missed the section where you mentioned that we don't need to host
>> the framework in the main repo...Anyway, I disagree here, again, we have a
>> precedent of 2 failed frameworks where nobody really knows where to contribute
>> a test nor wants to deal with any additional repos. A few remarks
>> - it is much easier if the
>>   test suite lives alongside our unit test suite which is an "environment" every
>>   developer is familiar with.
>
> I don't buy that argument. We already have countless repos and people
> who are motivated to contribute do so easily. The failures of the
> existing frameworks is down to a number of reasons. People really
> didn't want to write Perl code is the big one for the TCK. For the
> other framework its complexity and fact that it was not safe to
> run on precious hosts put off ever using it.
>
> The biggest thing though is that we never even made an effort to
> develop a culture of writing functional tests as a prerequisite
> for merging feature.
>
> In QEMU there is a avocado integration in-repo, but very few
> QEMU contributors ever do anything with it.  The block iotests

Well, while this doesn't have to be the only deciding factor for in-tree
or out-of-tree tests, I mentioned exactly the opposite a few days ago as
a very positive outcome of having the avocado tests within the QEMU
repo, so I'll have to share it here:

 $ git log --pretty=format:"%an%x09" tests/acceptance/ | sort | uniq | wc -l
 36

And this is from a ~3 year old effort.  On the other hand, just for
comparison on libvirt-tck:

 $ git log --pretty=format:"%an%x09" scripts/ | sort | uniq | wc -l
 17

Within the span of ~12 years.  I understand other factors (such as the
average number of contributors in each project) apply here.

But, I must say that I'm positively impressed by the contributions and
number of contributors in "tests/acceptance", and I believe that the
ability to send a single patch series with tests included at once have
played an significant role here.

> integration is another functional test suite and that is very
> successful, because the block maintainers have set clear
> expectation that you must contributor tests to it if you want
> your features accepted.
>
> We can see the same with other projects which have a strong
> culture of testing. In OpenStack you'll not get your feature
> accepted unless you've made a credible effort to provide
> functional tests - which live in a completely separate git
> repo.
>

Right.  The project's culture and guidelines plays a large role in how
successfull testing will be.

>> - integration into gitlab CI would be much easier with a single repo than with
>>   a multi-repo setup
>>     -> yes, we already do that with binding builds, but I'm talking about
>>        setups where we'd have to configure gitlab to clone 3 repos and build
>>        them in chain, setup the framework config and execute the tests;
>
> As you say, we already deal with that. Cloning extra git repos is
> a complete non-issue in the grand scheme of everything needed to
> create and maintain a CI pipeline.  If we really wanted to we could
> create a psuedo-monolithic repo and graft in the individual repos
> using 'git subtree'.
>

>From a completely ignorant standpoint I ask: are there extra steps users
need to do do have those tests run on their own GIT clones?  I mean, how
does a CI system know that it must test "user/libvirt" with
"user/libvirt-tck"?  I will have my mind blown if this "just works",
with the right branch names used across repos, etc.

>> - keep in mind that we're trying to keep our efforts in sync with QEMU wrt/
>>   (should apply vice-versa as well), so if avocado-qemu was accepted in QEMU
>>   upstream as the current framework which also happens to be hosted under the
>>   main repo, I don't see a problem why we could not do the same in libvirt
>
> QEMU has a very different maintainance model, with its monolithic
> repo approach is actually a significant burden to QEMU development.
> QEMU would be better off in many ways by modularizing its repos.
> As above, having it in the tree doesn't magically make people
> contribute to it
>
>> - the main selling point IIRC of having TCK hosted in a standalone repo was
>>   that you could test ANY version of libvirt
>>     -> we didn't have CI back then, but we do now...so if we start running the
>>        functional tests early, every upstream release would technically be
>>        tested, so there would not be a need to test older versions of libvirt
>>        with the same test suite, because it would have already been done before
>>        releasing that specific version at some point in time
>
> That's only true of unit tests because those run in isolation only
> exercising the libvirt code. The functional / integration testing
> is looking at the results of the entire system. You absolutely
> need to run those tests in many different scenarios against both
> pre-release and post-release code, because changes in the OS
> will frequently break you. Changes in the OS will require fixing
> the test suite and you then need to run the updated test suite
> against the existing libvirt release. This is the key reason why
> it is beneficial to have to cleanly separated from the repo.
>

These points are true, but in my experience, it's very easy to make them
non-issue.

In a recent experience, I was using an acceptance tests to find a
regression (which slipped through because the test was not being run on
CI due to lack of suitable hardware), so Erik's point is also very
relevant here.

In that case, I had to preserve the test, while going to a point in
history where the "avocado_qemu" supporting code had some changes I
needed.  Either a "avocado run -p qemu_bin=/path/to/other/qemu-bin" or a
PYTHONPATH tweak pointing to a stable "avocado_qemu" was enough to solve
that.

Having important tests "always" running, and/or documenting/scripting
how to run newer tests with older repos/binaries is, in my experience,
an easy enough remedy to the single repo problem.

> We see this in Fedora/RHEL CI gating jobs, which run after the
> RPM builds are done, using a full composed OS environent. For
> many packages we have to untangle the test suite from the main
> app code, in order to be able to run it seprately
>
>> >   - Switch the TCK configuration file to use JSON/YAML 
>> >     instead of its current custom format. This is to enable
>> >     the Python tests to share a config file with the Perl
>> >     and Shell tests. Thus they can all have the same way
>> >     to figure out what block devs, NICs, PCI devs, etc
>> >     they are permitted to use, and what dirs it is safe
>> >     to create VM images in, etc.
>> 
>> I'm worried that any more time we spend on trying to re-shape TCK into
>> something modern is wasted compared to adopting Avocado. I've been fixing the
>> tests in TCK for a while just to make sure TCK doesn't fail, because the
>> original idea was for Avocado to utilize its nrunner to run TCK in its current
>> shape and start testing early just to give us time to transition from TCK to
>> Avocado by porting a good fistful of tests.
>
> The TCK is frequently broken largely because we never run it, and so it
> bit rots as the OS we're running it on changes beneath our feet. We see
> the same problem with any of QEMU's functional tests that don't get run
> on a frequent basis, includng the avocado tests it has.
>
> The only real solution there is to ensure whatever test suite we have
> is actually run frequently and set an expectation that a maintainers
> job is more than just writing code - they should be expected to help
> fix test failures that arise. If the CI is used as a gating check
> this becomes a strong motivating factor, because contributors code
> will never get merged if the gating is broken.
>
> With the every little experimentation I've done it looks like it should
> be trivial to get avocado to run all the existing TCK tests by just
> giving it a hint that they're TAP format. That looks like a small part
> of work involved in actually developing avocado into a solid test
> framework for libvirt.
>
> Regards,
> Daniel

Again, thanks a lot for all your feedback here!
- Cleber.