[Freeipa-devel] [RANT] pytest fixture scope is braindead

Fri Dec 5 22:51:31 UTC 2014

So I've been working this week on porting the OTP tests that were
manually coded to the new pytest framework. Now, I'm the first to want
better tooling to make our job easier. But, pytest? Meh. I'm having to
write just as much code (or more) to get my tests on par with pytest,
and they are riddled with subtle bugs.

Most painful, is pytest.fixture(). It is a cool idea, but the scoping
and lifecycle is horribly broken. Here is an example:

Here is an example. I want to create a user, enable otp in two different
ways, test with two different types of tokens under two different
authentication scenarios. This is a total of 8 tests.

@pytest.fixture(scope="function") # This scope is default.
def user(request):
  user = create_user('tuser1')
  request.addfinalizer(lambda: delete_user('tuser1'))
  return user

@pytest.fixture(params=["per-user", "global"])
def enablement(request, user):
  output = enable(user, request.param)
  request.addfinalizer(lambda: disable(user))
  return output

@pytest.fixture(params=[{'type': 'HOTP'}, {'disabled': True}])
def token(request, user, enablement):
  output = create_token(request.param)
  request.addfinalizer(lambda: delete_token(output.id))
  return output

@pytest.mark.parametrize("opts", [foo, bar])
def test_authentication(user, token, opts):
  return auth(user.uid, token.code(), opts)

Because the default scope is "function", a new user is created for every
single run of token. This is *way* more work than necessary, and leads
to slow runtimes. Fortunately, we can fix this by changing the scope of
the user fixture to "module". In this case, only one user will be
created per module. So far so good.

The enablement scope is still "function", so it will still be called
eight times (#tokens * #enablements * #opts). Now, this is a problem
because (let's say) enablement takes a long time to switch tests. Here
is problem #1 with pytest: it presumes that all variants of fixtures
should be grouped by test parameters. This is a bad assumption. It may
make sense in some cases, but there is no way to change it.

Now, we can work around this problem just like with the user fixture:
change the scope to "module". Unfortunately, we've just caused a second
bug. Now, enablement() will only be run twice; but the finalizer will
not be called until all tests are run. This means that "per-user"
enablement will still be active when "global" enablement runs; both will
be torn down simultaneously.

This same problem arises for the token fixture: eight tokens will be
created and destroyed when only four are necessary. You guessed it, we
can fix this by changing the scope. But by doing this all four tokens
will be deleted simultaneously. This means we cannot test the state
where only a disabled token exists.

Any attempt to use a fixture to create long-lived states (precisely what
fixtures are designed for) with mutually exclusive parameters is
extremely error prone as it is not clear which items are alive in the
LDAP db at any point. This leads to both long runtimes *and* extremely
subtle bugs in the tests that are multiplied by every layer of fixtures.
Call me nonplussed.

Now, let's look at how this same test can be implemented without pytest:

user = create_user('tuser1')
try:
  for enablement in ('per-user', 'global'):
    enable(user, enablement)
    try:
      for token in [{'type': 'HOTP'}, {'disabled': True}]:
        object = create_token(token)
        try:
          for opt in [foo, bar]:
            auth(user.uid, object.code(), opt)
        finally:
          delete_token(object.id)
    finally:
      disable(user)
finally:
  delete_user('tuser1')

Comparing the non-pytest implementation to the pytest one we can see a
few things. First, the pytest code has some nice separation of roles.
This is the big benefit of pytest. Second, the non-pytest implementation
is less code. Third, the non-pytest implementation has no subtle scoping
bugs: we know exactly when each database record is created and
destroyed.

I *want* to like pytest. Honestly I do. But this scoping idiocy has
already cost me roughly two days tracking down subtle bugs in the tests
(while not finding any bugs in the OTP code itself). And I can't think
that OTP tests written using pytest will be maintainable in any regard;
not when such small changes can cause such hard to find bugs.

Now, it may be possible that I have just missed something obvious. If I
have, forgive me this rant. But I've spent a lot of time reading docs
and I haven't seen any obvious solutions to these problems.

Nathaniel