[Freeipa-devel] [RANT] pytest fixture scope is braindead

Nathaniel McCallum npmccallum at redhat.com
Mon Dec 8 14:15:10 UTC 2014


On Mon, 2014-12-08 at 12:27 +0100, Petr Viktorin wrote:
> On 12/05/2014 11:51 PM, Nathaniel McCallum wrote:
> > So I've been working this week on porting the OTP tests that were
> > manually coded to the new pytest framework. Now, I'm the first to want
> > better tooling to make our job easier. But, pytest? Meh. I'm having to
> > write just as much code (or more) to get my tests on par with pytest,
> > and they are riddled with subtle bugs.
> 
> Hi,
> Yeah, IPA runs into specific situations that are very different from how 
> tests usually work in other projects.

Yeah, this I get.

> And you took one of the harder cases.
> IPA's existing tests also don't really follow best practices – they 
> pretty much assume you run the whole test suite, or at least the whole 
> module, so selecting individual tests is currently not reliable. On the 
> other hand, this assumption made some things easy to code – like 
> complicated ordering and setup/teardown.

I think this is pretty much a red herring. It isn't so much about setup
and teardown as about mutually exclusive parameterization. pytest makes
the poor assumption that the lifecycles of different parameter values of
the same fixture may overlap. At the very least, an option should be
provided to control this.
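
For concreteness, the sort of knob I have in mind might look like the
sketch below. To be clear, the exclusive= keyword does not exist in
pytest; it is purely illustrative of the option I am asking for, reusing
the enablement fixture from my example quoted further down.

import pytest

# HYPOTHETICAL: 'exclusive' is not a real pytest keyword. It stands for
# "tear one param value down before setting the next one up", even when
# the fixture has a long-lived scope.
@pytest.fixture(params=["per-user", "global"], scope="module",
                exclusive=True)
def enablement(request, user):
    output = enable(user, request.param)
    request.addfinalizer(lambda: disable(user))
    return output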

> One thing we want to do during the move to pytest is to have the tests 
> be more independent of the current state of the database, and also of 
> each other.

Sure, this is a fine goal. Unless of course you want to test the
functionality of the code against all possible database states. I think
pretty much every xmlrpc test will have some component of this.

> If certain assertions only make sense in a specific order, either put 
> them in the same test*, or write additional code that explicitly handles 
> dependencies and ordering. (Sorry, that's the price to pay for isolated 
> tests – I don't see a way around it.)

In an ideal world, we define a test and all possible parameters to that
test, and allow that test to be run independently by choosing from the
prescribed inputs. This all works fine so long as there is a way to
indicate when inputs are mutually exclusive. And since each input takes
only a single value at a time anyway, that should be the default
behavior. Unfortunately, pytest fails on both of these counts.
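
Concretely, for the parametrized enablement fixture in the example quoted
further down, the default lifecycle I would expect looks like this (an
illustrative sketch, not actual pytest behavior or output):

enable(user, "per-user")    # set up the first param value
# ...run every test that needs "per-user"...
disable(user)               # torn down *before* the next value starts
enable(user, "global")      # set up the second param value
# ...run every test that needs "global"...
disable(user)               # final teardown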

> If on the other hand the order doesn't matter – any order of any subset 
> of individual tests should pass, which is our goal – but pytest's 
> default ordering makes the suite slow, there is a hook that allows you 
> to override the order: pytest_collection_modifyitems [0]. Stick it in 
> the affected module (or conftest.py file for multiple modules, or a 
> plugin for the whole suite), and sort the items explicitly.

The problem isn't (I think) ordering but lifecycle. If I scope the
fixture to the function but reorder the items, it will still be
constructed and torn down for every single test invocation. On the
other hand, if I give the fixture a longer-lived scope, its parameter
values are no longer kept mutually exclusive.
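
For reference, the reordering hook looks roughly like the sketch below
(the grouping key is only an illustration that sorts tests by their
enablement parameter):

# conftest.py (sketch only)
def pytest_collection_modifyitems(session, config, items):
    def enablement_param(item):
        # callspec holds the resolved parameter values for this item;
        # unparametrized items have no callspec at all
        callspec = getattr(item, "callspec", None)
        if callspec is None:
            return ""
        return str(callspec.params.get("enablement", ""))
    items.sort(key=enablement_param)

As far as I can tell, though, this only changes the execution order; it
does not change when fixtures are set up or finalized, which is the
actual problem.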

> If your test depends on something not existing or being disabled, you 
> should try removing/disabling it before the test, or write a fixture to 
> do that.

That's not the problem. The problem is that I can't define parameters
for a single object which set the same variable to different values.
Setting the same variable to different values means those parameters
are mutually exclusive.

> * might make sense, for starters?
> [0] 
> http://pytest.org/latest/plugins.html#_pytest.hookspec.pytest_collection_modifyitems
> 
> 
> > Most painful is pytest.fixture(). It is a cool idea, but the scoping
> > and lifecycle are horribly broken. Here is an example: I want to
> > create a user, enable OTP in two different ways, and test with two
> > different types of tokens under two different authentication
> > scenarios. This is a total of 8 tests.
> >
> > @pytest.fixture(scope="function") # This scope is default.
> > def user(request):
> >    user = create_user('tuser1')
> >    request.addfinalizer(lambda: delete_user('tuser1'))
> >    return user
> >
> > @pytest.fixture(params=["per-user", "global"])
> > def enablement(request, user):
> >    output = enable(user, request.param)
> >    request.addfinalizer(lambda: disable(user))
> >    return output
> >
> > @pytest.fixture(params=[{'type': 'HOTP'}, {'disabled': True}])
> > def token(request, user, enablement):
> >    output = create_token(request.param)
> >    request.addfinalizer(lambda: delete_token(output.id))
> >    return output
> >
> > @pytest.mark.parametrize("opts", [foo, bar])
> > def test_authentication(user, token, opts):
> >    return auth(user.uid, token.code(), opts)
> >
> > Because the default scope is "function", a new user is created for every
> > single run of token. This is *way* more work than necessary, and leads
> > to slow runtimes. Fortunately, we can fix this by changing the scope of
> > the user fixture to "module". In this case, only one user will be
> > created per module. So far so good.
> >
> > The enablement scope is still "function", so it will still be called
> > eight times (#tokens * #enablements * #opts). Now, this is a problem
> > because (let's say) switching enablement modes takes a long time. Here
> > is problem #1 with pytest: it presumes that all variants of a fixture
> > should be grouped by test parameters. This is a bad assumption. It may
> > make sense in some cases, but there is no way to change it.
> >
> > Now, we can work around this problem just like with the user fixture:
> > change the scope to "module". Unfortunately, we've just caused a second
> > bug. Now, enablement() will only be run twice; but the finalizer will
> > not be called until all tests are run. This means that "per-user"
> > enablement will still be active when "global" enablement runs; both will
> > be torn down simultaneously.
> >
> > This same problem arises for the token fixture: eight tokens will be
> > created and destroyed when only four are necessary. You guessed it, we
> > can fix this by changing the scope. But by doing this all four tokens
> > will be deleted simultaneously. This means we cannot test the state
> > where only a disabled token exists.
> >
> > Any attempt to use a fixture to create long-lived states (precisely what
> > fixtures are designed for) with mutually exclusive parameters is
> > extremely error prone as it is not clear which items are alive in the
> > LDAP db at any point. This leads to both long runtimes *and* extremely
> > subtle bugs in the tests that are multiplied by every layer of fixtures.
> > Call me nonplussed.
> >
> > Now, let's look at how this same test can be implemented without pytest:
> >
> > user = create_user('tuser1')
> > try:
> >    for enablement in ('per-user', 'global'):
> >      enable(user, enablement)
> >      try:
> >        for token in [{'type': 'HOTP'}, {'disabled': True}]:
> >          object = create_token(token)
> >          try:
> >            for opt in [foo, bar]:
> >              auth(user.uid, object.code(), opt)
> >          finally:
> >            delete_token(object.id)
> >      finally:
> >        disable(user)
> > finally:
> >    delete_user('tuser1')
> >
> > Comparing the non-pytest implementation to the pytest one we can see a
> > few things. First, the pytest code has some nice separation of roles.
> > This is the big benefit of pytest. Second, the non-pytest implementation
> > is less code. Third, the non-pytest implementation has no subtle scoping
> > bugs: we know exactly when each database record is created and
> > destroyed.
> 
> This "non-pytest" implementation is a *single test*, since it hard-codes 
> dependencies and ordering. If you write it as a single test function
> in pytest, it should look much the same, and get the 
> ordering/dependencies right (except the case of the test user already 
> existing).

Which conflicts with the desired granularity, and that granularity is
the whole reason we are moving away from this model. I should be able
to define, as fixtures, a set of states required to run any one of a
set of tests. Then I can choose to run just a single iteration of a set
of tests, and the required states should be assembled correctly. The
problem is that, with pytest, this all works unless the parameterized
fixtures contain mutually exclusive state values.
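
To make the granularity point concrete, the goal is that a selection like
the ones below just works, with only the states that the chosen case
needs being set up and then torn down again (the file name and generated
test IDs here are hypothetical; the exact IDs depend on how pytest names
the parameter combinations):

py.test test_otp.py -k "global"
py.test "test_otp.py::test_authentication[global-token0-opts0]"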

So, I'm obviously not proposing that we do something like the non-pytest
implementation. What I'm pointing out is that being forced to implement
it this way in pytest negates all the benefits of pytest.

> > I *want* to like pytest. Honestly I do. But this scoping idiocy has
> > already cost me roughly two days tracking down subtle bugs in the tests
> > (while not finding any bugs in the OTP code itself). And I can't think
> > that OTP tests written using pytest will be maintainable in any regard;
> > not when such small changes can cause such hard to find bugs.
> >
> > Now, it may be possible that I have just missed something obvious. If I
> > have, forgive me this rant. But I've spent a lot of time reading docs
> > and I haven't seen any obvious solutions to these problems.
> >
> > Nathaniel
> 
