[Freeipa-devel] [RANT] pytest fixture scope is braindead

Nathaniel McCallum npmccallum at redhat.com
Tue Dec 9 14:30:03 UTC 2014


On Tue, 2014-12-09 at 10:39 +0100, Petr Viktorin wrote:
> On 12/08/2014 03:15 PM, Nathaniel McCallum wrote:
> > On Mon, 2014-12-08 at 12:27 +0100, Petr Viktorin wrote:
> >> On 12/05/2014 11:51 PM, Nathaniel McCallum wrote:
> >>> So I've been working this week on porting the OTP tests that were
> >>> manually coded to the new pytest framework. Now, I'm the first to want
> >>> better tooling to make our job easier. But pytest? Meh. I'm having to
> >>> write just as much code (or more) to bring my tests up to par under
> >>> pytest, and they are riddled with subtle bugs.
> >>
> >> Hi,
> >> Yeah, IPA runs into specific situations that are very different from how
> >> tests usually work in other projects.
> >
> > Yeah, this I get.
> >
> >> And you took one of the harder cases.
> >> IPA's existing tests also don't really follow best practices – they
> >> pretty much assume you run the whole test suite, or at least the whole
> >> module, so selecting individual tests is currently not reliable. On the
> >> other hand, this assumption made some things easy to code – like
> >> complicated ordering and setup/teardown.
> >
> > I think this is pretty much a red herring. It isn't so much about setup
> > and teardown but about mutually exclusive parameterization. It is a poor
> > assumption on the part of pytest that the lifecycles of parameters
> > overlap. At the least, an option should be provided to control this.
> >
> >> One thing we want to do during the move to pytest is to have the tests
> >> be more independent of the current state of the database, and also of
> >> each other.
> >
> > Sure, this is a fine goal. Unless of course you want to test the
> > functionality of the code against all possible database states. I think
> > pretty much every xmlrpc test will have some component of this.
> >
> >> If certain assertions only make sense in a specific order, either put
> >> them in the same test*, or write additional code that explicitly handles
> >> dependencies and ordering. (Sorry, that's the price to pay for isolated
> >> tests – I don't see a way around it.)
> >
> > In an ideal world, we define a test and all possible parameters to that
> > test and allow the running of that test independently by choosing from
> > the prescribed inputs. This all works fine so long as you specify a way
> > to indicate when inputs are mutually exclusive. Since inputs are single
> > valued anyway, this should be the default behavior. Unfortunately,
> > pytest fails on both of these counts.
> >
> >> If on the other hand the order doesn't matter – any order of any subset
> >> of individual tests should pass, which is our goal – but pytest's
> >> default ordering makes the suite slow, there is a hook that allows you
> >> to override the order: pytest_collection_modifyitems [0]. Stick it in
> >> the affected module (or conftest.py file for multiple modules, or a
> >> plugin for the whole suite), and sort the items explicitly.
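> >>
> >> For example, something along these lines in conftest.py (a rough,
> >> untested sketch – the exact sort key depends on your fixtures):
> >>
> >> def pytest_collection_modifyitems(session, config, items):
> >>     def enablement_of(item):
> >>         # parametrized items carry their params in item.callspec;
> >>         # unparametrized items have no callspec at all
> >>         callspec = getattr(item, 'callspec', None)
> >>         if callspec is None:
> >>             return ''
> >>         return str(callspec.params.get('enablement', ''))
> >>     # stable sort: tests sharing an 'enablement' value end up adjacent
> >>     items.sort(key=enablement_of)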
> >
> > The problem isn't (I think) ordering but lifecycle. I think that if I
> > scope the fixture to the function but reorder the items, they will still
> > be constructed and torn down for every single function iteration. On the
> > other hand, if I scope the fixture for a longer lifecycle, the
> > parameters may not be mutually exclusive.
> >
> >> If your test depends on something not existing or being disabled, you
> >> should try removing/disabling it before the test, or write a fixture to
> >> do that.
> >
> > That's not the problem. The problem is that I can't define parameters
> > for a single object which sets the same variable to different values.
> > Such an act means the parameters are mutually exclusive.
> 
> OK, I see the problem now. Pytest fixtures are meant for resources, not 
> states of a shared object.
> You ran into a fairly exotic use case (which is what I feared, so I 
> wanted to look into porting this).
> I think in this case we should use separate users for the enablement 
> states, since user creation is not expensive – something like the rough 
> sketch below, reusing your hypothetical helpers. Or am I missing 
> something else?
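> 
> import pytest
> 
> @pytest.fixture(scope="module", params=["per-user", "global"])
> def enabled_user(request):
>     # one throw-away user per enablement state, so the two states can
>     # never overlap (create_user/enable/disable/delete_user are the
>     # same hypothetical helpers as in your example)
>     user = create_user('tuser-%s' % request.param)
>     enable(user, request.param)
>     def fin():
>         disable(user)
>         delete_user(user.uid)
>     request.addfinalizer(fin)
>     return user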

Yes, two things.

First, while a single user creation isn't expensive (~5 seconds w/ IPA
running on tmpfs), 360 user creations *are* (~30 minutes). This figure
does not include user deletions, which have a similar run-time. So I
estimate creating a user for every test adds about 45 minutes to a full
test run. This is entirely unnecessary.

Second, there is a much worse case: global OTP enablement. We can't
guarantee the changes take effect without sleeping for 60 seconds on
every modification (see 8b2f4443dcf61e1edf59ef0812ed05e1fa93f8fc). This
means 2 minutes for every enablement/disablement cycle. This isn't a big
deal if we do it once. If we do it 360 times, however... You can do the
math. It isn't pretty.

However, if we put this in a fixture which has module/class scope and
allow the parameters to be mutually exclusive (i.e., the finalizer for
the previous parameter is called immediately before construction of the
next parameter), this problem goes away.
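
Roughly what I have in mind, as a sketch only (this fakes the mutual
exclusion by hand inside the fixture, using the same hypothetical
create_user/delete_user/enable/disable helpers as the example quoted
below, and it assumes disable() is a harmless no-op when nothing is
enabled):

import pytest

@pytest.fixture(scope="module")
def user(request):
    user = create_user('tuser1')
    request.addfinalizer(lambda: delete_user('tuser1'))
    return user

@pytest.fixture(scope="module", params=["per-user", "global"])
def enablement(request, user):
    # enforce the mutual exclusion manually: tear down whatever
    # enablement is still active before switching to the new one
    disable(user)
    output = enable(user, request.param)
    request.addfinalizer(lambda: disable(user))
    return output

That gets one user and two enablements per module, but only because I
hand-coded knowledge of the fixture's internals into its setup – which
is exactly the ordering I think pytest should be able to do for me.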

> >> * might make sense, for starters?
> >> [0]
> >> http://pytest.org/latest/plugins.html#_pytest.hookspec.pytest_collection_modifyitems
> >>
> >>
> >>> Most painful is pytest.fixture(). It is a cool idea, but the scoping
> >>> and lifecycle are horribly broken. Here is an example: I want to create
> >>> a user, enable OTP in two different ways, and test with two different
> >>> types of tokens under two different authentication scenarios. That is a
> >>> total of 8 tests.
> >>>
> >>> @pytest.fixture(scope="function") # This scope is default.
> >>> def user(request):
> >>>     user = create_user('tuser1')
> >>>     request.addfinalizer(lambda: delete_user('tuser1'))
> >>>     return user
> >>>
> >>> @pytest.fixture(params=["per-user", "global"])
> >>> def enablement(request, user):
> >>>     output = enable(user, request.param)
> >>>     request.addfinalizer(lambda: disable(user))
> >>>     return output
> >>>
> >>> @pytest.fixture(params=[{'type': 'HOTP'}, {'disabled': True}])
> >>> def token(request, user, enablement):
> >>>     output = create_token(request.param)
> >>>     request.addfinalizer(lambda: delete_token(output.id))
> >>>     return output
> >>>
> >>> @pytest.mark.parametrize("opts", [foo, bar])
> >>> def test_authentication(user, token, opts):
> >>>     return auth(user.uid, token.code(), opts)
> >>>
> >>> Because the default scope is "function", a new user is created for every
> >>> single test. This is *way* more work than necessary, and leads
> >>> to slow runtimes. Fortunately, we can fix this by changing the scope of
> >>> the user fixture to "module". In this case, only one user will be
> >>> created per module. So far so good.
> >>>
> >>> The enablement scope is still "function", so it will still be called
> >>> eight times (#tokens * #enablements * #opts). Now, this is a problem
> >>> because (let's say) switching enablement takes a long time. Here
> >>> is problem #1 with pytest: it presumes that all variants of fixtures
> >>> should be grouped by test parameters. This is a bad assumption. It may
> >>> make sense in some cases, but there is no way to change it.
> >>>
> >>> Now, we can work around this problem just like with the user fixture:
> >>> change the scope to "module". Unfortunately, we've just caused a second
> >>> bug. Now, enablement() will only be run twice; but the finalizer will
> >>> not be called until all tests are run. This means that "per-user"
> >>> enablement will still be active when "global" enablement runs; both will
> >>> be torn down simultaneously.
> >>>
> >>> This same problem arises for the token fixture: eight tokens will be
> >>> created and destroyed when only four are necessary. You guessed it, we
> >>> can fix this by changing the scope. But by doing this all four tokens
> >>> will be deleted simultaneously. This means we cannot test the state
> >>> where only a disabled token exists.
> >>>
> >>> Any attempt to use a fixture to create long-lived states (precisely what
> >>> fixtures are designed for) with mutually exclusive parameters is
> >>> extremely error prone as it is not clear which items are alive in the
> >>> LDAP db at any point. This leads to both long runtimes *and* extremely
> >>> subtle bugs in the tests that are multiplied by every layer of fixtures.
> >>> Call me nonplussed.
> >>>
> >>> Now, let's look at how this same test can be implemented without pytest:
> >>>
> >>> user = create_user('tuser1')
> >>> try:
> >>>     for enablement in ('per-user', 'global'):
> >>>         enable(user, enablement)
> >>>         try:
> >>>             for token in [{'type': 'HOTP'}, {'disabled': True}]:
> >>>                 obj = create_token(token)
> >>>                 try:
> >>>                     for opt in [foo, bar]:
> >>>                         auth(user.uid, obj.code(), opt)
> >>>                 finally:
> >>>                     delete_token(obj.id)
> >>>         finally:
> >>>             disable(user)
> >>> finally:
> >>>     delete_user('tuser1')
> >>>
> >>> Comparing the non-pytest implementation to the pytest one we can see a
> >>> few things. First, the pytest code has some nice separation of roles.
> >>> This is the big benefit of pytest. Second, the non-pytest implementation
> >>> is less code. Third, the non-pytest implementation has no subtle scoping
> >>> bugs: we know exactly when each database record is created and
> >>> destroyed.
> >>
> >> This "non-pytest" implementation is a *single test*, since it hard-codes
> >> dependencies and ordering. If you write it as a single test function
> >> in pytest, it should look much the same, and get the
> >> ordering/dependencies right (except the case of the test user already
> >> existing).
> >
> > Which conflicts with the desired granularity; which is the whole reason
> > we are moving away from this model. I should be able to define, as
> > fixtures, a set of states required to run any one of a set of tests. Then
> > I can choose just to run a single iteration of a set of tests and the
> > set of states should be assembled correctly. The problem is that, with
> > pytest, this all works unless the parameterized fixtures contain
> > mutually exclusive state values.
> >
> > So, I'm obviously not proposing that we do something like the non-pytest
> > implementation. What I'm pointing out is that being forced to implement
> > it this way in pytest negates all the benefits of pytest.
> >
> >>> I *want* to like pytest. Honestly I do. But this scoping idiocy has
> >>> already cost me roughly two days tracking down subtle bugs in the tests
> >>> (while not finding any bugs in the OTP code itself). And I can't see
> >>> how OTP tests written using pytest will be maintainable in any regard;
> >>> not when such small changes can cause such hard-to-find bugs.
> >>>
> >>> Now, it may be possible that I have just missed something obvious. If I
> >>> have, forgive me this rant. But I've spent a lot of time reading docs
> >>> and I haven't seen any obvious solutions to these problems.
> >>>
> >>> Nathaniel
> 




