[Freeipa-devel] [RANT] pytest fixture scope is braindead

Petr Viktorin pviktori at redhat.com
Tue Dec 9 09:39:39 UTC 2014


On 12/08/2014 03:15 PM, Nathaniel McCallum wrote:
> On Mon, 2014-12-08 at 12:27 +0100, Petr Viktorin wrote:
>> On 12/05/2014 11:51 PM, Nathaniel McCallum wrote:
>>> So I've been working this week on porting the OTP tests that were
>>> manually coded to the new pytest framework. Now, I'm the first to want
>>> better tooling to make our job easier. But, pytest? Meh. I'm having to
>>> write just as much code (or more) to get my tests on par with pytest,
>>> and they are riddled with subtle bugs.
>>
>> Hi,
>> Yeah, IPA runs into specific situations that are very different from how
>> tests usually work in other projects.
>
> Yeah, this I get.
>
>> And you took one of the harder cases.
>> IPA's existing tests also don't really follow best practices – they
>> pretty much assume you run the whole test suite, or at least the whole
>> module, so selecting individual tests is currently not reliable. On the
>> other hand, this assumption made some things easy to code – like
>> complicated ordering and setup/teardown.
>
> I think this is pretty much a red herring. It isn't so much about setup
> and teardown but about mutually exclusive parameterization. It is a poor
> assumption on the part of pytest that the lifecycles of parameters
> overlap. At the least, an option should be provided to control this.
>
>> One thing we want to do during the move to pytest is to have the tests
>> be more independent of the current state of the database, and also of
>> each other.
>
> Sure, this is a fine goal. Unless of course you want to test the
> functionality of the code against all possible database states. I think
> pretty much every xmlrpc test will have some component of this.
>
>> If certain assertions only make sense in a specific order, either put
>> them in the same test*, or write additional code that explicitly handles
>> dependencies and ordering. (Sorry, that's the price to pay for isolated
>> tests – I don't see a way around it.)
>
> In an ideal world, we define a test and all possible parameters to that
> test and allow the running of that test independently by choosing from
> the prescribed inputs. This all works fine so long as you specify a way
> to indicate when inputs are mutually exclusive. Since inputs are single
> valued anyway, this should be the default behavior. Unfortunately,
> pytest fails on both of these counts.
>
>> If on the other hand the order doesn't matter – any order of any subset
>> of individual tests should pass, which is our goal – but pytest's
>> default ordering makes the suite slow, there is a hook that allows you
>> to override the order: pytest_collection_modifyitems [0]. Stick it in
>> the affected module (or conftest.py file for multiple modules, or a
>> plugin for the whole suite), and sort the items explicitly.
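
(For reference, a rough and untested sketch of such a hook, assuming the
parametrized fixture is called "enablement" as in your example below, and
that fixture parameters show up in each item's callspec:)

def pytest_collection_modifyitems(session, config, items):
    def enablement_key(item):
        # Parametrized items carry their parameter values in
        # item.callspec; unparametrized items have no callspec at all.
        callspec = getattr(item, 'callspec', None)
        if callspec is None:
            return ''
        return str(callspec.params.get('enablement', ''))

    # Sort in place so that tests sharing an enablement state run
    # together and a longer-lived fixture is not rebuilt needlessly.
    items.sort(key=enablement_key)
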
>
> The problem isn't (I think) ordering but lifecycle. I think that if I
> scope the fixture to the function but reorder the items, they will still
> be constructed and torn down for every single function iteration. On the
> other hand, if I scope the fixture for a longer lifecycle, the
> parameters may not be mutually exclusive.
>
>> If your test depends on something not existing or being disabled, you
>> should try removing/disabling it before the test, or write a fixture to
>> do that.
>
> That's not the problem. The problem is that I can't define parameters
> for a single object which sets the same variable to different values.
> Such an act means the parameters are mutually exclusive.

OK, I see the problem now. Pytest fixtures are meant for resources, not 
states of a shared object.
You ran into a fairly exotic use case (which is what I feared, and why I 
wanted to look into porting this).
I think in this case we should use separate users for the enablement 
states, since user creation is not expensive. Or am I missing something 
else?
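
A rough sketch of what I mean, reusing the (hypothetical) helpers from
your example (create_user, enable, disable, delete_user) -- one dedicated
user per enablement state, so the two states never have to share one
object and the fixture can safely outlive a single test:

import pytest

@pytest.fixture(scope="module", params=["per-user", "global"])
def enabled_user(request):
    # A separate user for each enablement state.
    name = 'tuser-' + request.param
    user = create_user(name)
    enable(user, request.param)

    def fin():
        disable(user)
        delete_user(name)
    request.addfinalizer(fin)
    return user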

>> * might make sense, for starters?
>> [0]
>> http://pytest.org/latest/plugins.html#_pytest.hookspec.pytest_collection_modifyitems
>>
>>
>>> Most painful is pytest.fixture(). It is a cool idea, but the scoping
>>> and lifecycle are horribly broken. Here is an example:
>>>
>>> I want to create a user, enable otp in two different
>>> ways, test with two different types of tokens under two different
>>> authentication scenarios. This is a total of 8 tests.
>>>
>>> @pytest.fixture(scope="function") # This scope is default.
>>> def user(request):
>>>     user = create_user('tuser1')
>>>     request.addfinalizer(lambda: delete_user('tuser1'))
>>>     return user
>>>
>>> @pytest.fixture(params=["per-user", "global"])
>>> def enablement(request, user):
>>>     output = enable(user, request.param)
>>>     request.addfinalizer(lambda: disable(user))
>>>     return output
>>>
>>> @pytest.fixture(params=[{'type': 'HOTP'}, {'disabled': True}])
>>> def token(request, user, enablement):
>>>     output = create_token(request.param)
>>>     request.addfinalizer(lambda: delete_token(output.id))
>>>     return output
>>>
>>> @pytest.mark.parametrize("opts", [foo, bar])
>>> def test_authentication(user, token, opts):
>>>     return auth(user.uid, token.code(), opts)
>>>
>>> Because the default scope is "function", a new user is created for every
>>> single test run. This is *way* more work than necessary, and leads
>>> to slow runtimes. Fortunately, we can fix this by changing the scope of
>>> the user fixture to "module". In this case, only one user will be
>>> created per module. So far so good.
>>>
>>> The enablement scope is still "function", so it will still be called
>>> eight times (#tokens * #enablements * #opts). Now, this is a problem
>>> because (let's say) switching enablement states takes a long time. Here
>>> is problem #1 with pytest: it presumes that all variants of fixtures
>>> should be grouped by test parameters. This is a bad assumption. It may
>>> make sense in some cases, but there is no way to change it.
>>>
>>> Now, we can work around this problem just like with the user fixture:
>>> change the scope to "module". Unfortunately, we've just caused a second
>>> bug. Now, enablement() will only be run twice; but the finalizer will
>>> not be called until all tests are run. This means that "per-user"
>>> enablement will still be active when "global" enablement runs; both will
>>> be torn down simultaneously.
>>>
>>> This same problem arises for the token fixture: eight tokens will be
>>> created and destroyed when only four are necessary. You guessed it, we
>>> can fix this by changing the scope. But by doing this all four tokens
>>> will be deleted simultaneously. This means we cannot test the state
>>> where only a disabled token exists.
>>>
>>> Any attempt to use a fixture to create long-lived states (precisely what
>>> fixtures are designed for) with mutually exclusive parameters is
>>> extremely error prone as it is not clear which items are alive in the
>>> LDAP db at any point. This leads to both long runtimes *and* extremely
>>> subtle bugs in the tests that are multiplied by every layer of fixtures.
>>> Call me nonplussed.
>>>
>>> Now, let's look at how this same test can be implemented without pytest:
>>>
>>> user = create_user('tuser1')
>>> try:
>>>     for enablement in ('per-user', 'global'):
>>>         enable(user, enablement)
>>>         try:
>>>             for token in [{'type': 'HOTP'}, {'disabled': True}]:
>>>                 object = create_token(token)
>>>                 try:
>>>                     for opt in [foo, bar]:
>>>                         auth(user.uid, object.code(), opt)
>>>                 finally:
>>>                     delete_token(object.id)
>>>         finally:
>>>             disable(user)
>>> finally:
>>>     delete_user('tuser1')
>>>
>>> Comparing the non-pytest implementation to the pytest one we can see a
>>> few things. First, the pytest code has some nice separation of roles.
>>> This is the big benefit of pytest. Second, the non-pytest implementation
>>> is less code. Third, the non-pytest implementation has no subtle scoping
>>> bugs: we know exactly when each database record is created and
>>> destroyed.
>>
>> This "non-pytest" implementation is a *single test*, since it hard-codes
>> dependencies and ordering. If you write it as a single test function
>> in pytest, it should look much the same, and get the
>> ordering/dependencies right (except the case of the test user already
>> existing).
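
To make this concrete, the single-test variant I have in mind is roughly
the sketch below (same hypothetical helpers as in your example, and
assuming auth() raises or returns a false value on failure):

def test_authentication_matrix():
    # All eight combinations in one test, with explicit ordering and
    # cleanup, so there is never any doubt about what exists in LDAP.
    user = create_user('tuser1')
    try:
        for enablement in ('per-user', 'global'):
            enable(user, enablement)
            try:
                for params in ({'type': 'HOTP'}, {'disabled': True}):
                    token = create_token(params)
                    try:
                        for opts in (foo, bar):
                            assert auth(user.uid, token.code(), opts)
                    finally:
                        delete_token(token.id)
            finally:
                disable(user)
    finally:
        delete_user('tuser1')
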
>
> Which conflicts with the desired granularity; which is the whole reason
> we are moving away from this model. I should be able to define, as
> fixtures, a set of states required to run any one of a set of tests. Then
> I can choose just to run a single iteration of a set of tests and the
> set of states should be assembled correctly. The problem is that, with
> pytest, this all works unless the parameterized fixtures contain
> mutually exclusive state values.
>
> So, I'm obviously not proposing that we do something like the non-pytest
> implementation. What I'm pointing out is that being forced to implement
> it this way in pytest negates all the benefits of pytest.
>
>>> I *want* to like pytest. Honestly I do. But this scoping idiocy has
>>> already cost me roughly two days tracking down subtle bugs in the tests
>>> (while not finding any bugs in the OTP code itself). And I can't think
>>> that OTP tests written using pytest will be maintainable in any regard;
>>> not when such small changes can cause such hard to find bugs.
>>>
>>> Now, it may be possible that I have just missed something obvious. If I
>>> have, forgive me this rant. But I've spent a lot of time reading docs
>>> and I haven't seen any obvious solutions to these problems.
>>>
>>> Nathaniel

-- 
Petr³



