[Freeipa-devel] DN patch and documentation

Tue Jul 17 22:47:57 UTC 2012

On 07/10/2012 04:23 AM, Petr Viktorin wrote:
> I've read your summary (which you should summarize into a commit message
> before this is pushed), and gone through the patch.
> Here is what I found doing that; I didn't get to actual testing yet.
> I also didn't do a detailed review of ldap2.py changes yet.

Thank you for your careful review Petr, you had many good suggestions 
and comments. It was an enormous amount of material to wade through, 
thank you for your diligence. I'm in agreement with most of your 
suggestions.

> I agree with Simo that it would have been better to put at least your
> automated changes in a separate patch. Without knowing what was
> automated and what wasn't, I have to eyeball each change at least to
> figure that out. Not complaining, it's a reviewer's work after all, and
> with an intra-line diff tool it's not that hard, but I can't really
> honor your suggestion for a faster review :)

I agree fully it would be nice if things were broken up into smaller 
well defined patches. When I started the work I tried to keep things 
separate but as I worked I discovered there were so many inter-dependent 
relationships I was spending as much time juggling patch sets that could 
be applied independently as I was making forward progress to the real 
goal. So I abandoned the idea of independent patch sets in favor of 
getting the work completed as expeditiously as possible, it was a 
pragmatic trade-off. Now that the work is (nearly) completed I can see 
where some things can be broken out cleanly that was not obvious in the 
midst of the work. Hindsight is 20/20.

> ==== Blockers ====
>
> Mutable objects that support __hash__ are a *very* bad thing to do. It
> violates a fundamental assumption in Python, as you outlined. Mutable
> objects *must not* be allowed to be dictionary keys.
> I understand the patch is not perfect, but this is a big issue that
> invites very hard-to-debug errors.
> You say you're already working on a patch to fix this. Please post that
> ASAP; I can't ack the DN work without it.

Understood, I agree with you and I will start work on those changes 
immediately.

> In the long run, it would be good to use immutable DNs only: I think
> modeling them after Python strings in this way would be beneficial.

We edit DN's in a variety of places thus I feel mutable DN's have a 
distinct value.

If we only had immutable DN's then forming a new DN would require more 
code that would extract the components from the right locations in the 
original DN and use those extracted components to build a new DN. That 
would not be too far from some of the earlier code which used a series 
of calls over multiple lines to slice and dice the strings. I believe 
using just one simple obvious Python operator beats multiple lines of 
convoluted code every time. The expression of the concept is inherently 
obvious in the line of code using Python operator(s), plus it's vastly 
more succinct (easier to read, easier to code, more robust).

Because the logic to build the new DN is inherently centralized in the 
object operator it's a single implementation (much easier to maintain 
and prove correctness) rather than the cut-n-paste coping of code to 
decompose and reassemble an DN over many locations in the code base.

I believe mutable DN's have a valuable role to play and make the best 
use of Python's object oriented facilities. They go a long way to making 
IPA more robust and easier to comprehend.

> You seem to forged locked=True for some global DNs, e.g.
> ipalib/plugins/pwpolicy.py:172: global_policy_dn
> ipalib/plugins/migration.py:117: _compat_dn
> Locked by default would prevent these omissions.
>
>
>
> Please fix a lint failure; ipaserver.ipautil moved to ipapython.ipautil.
>
>
>
> ==== Design decision criticism ====
>
>   > Originally I was against automatic type promotion on the belief if
> you were comparing a string to a DN it was a failure of program logic,
> but there were some circumstances where it was evil necessity.
>
> Would you care to elaborate? I also think that's a logic failure; such
> circumstances should at least be marked by a TODO.

I should have taken better notes because I don't remember the exact 
place this came up, it was late in the testing and I believe the problem 
showed up in the ldap updater, possibly some other places.

But here is the bottom line: I encountered a handful of places in the 
code where a string was being compared to a DN object for equality (I 
seem to recall the equality comparison was indirect by virtue of the 
"in" operator). What I do recall is it was logically impossible to know 
a prori the string being tested was supposed to be a DN. If it were 
possible to know the string represented a DN it would have been easy to 
promote it to a DN object. But there were a handful of situations where 
there simply was not enough information available to ascertain if the 
string was logically a DN or not. This was never an issue when dn's were 
simple strings. If it's getting compared to a DN that's a pretty strong 
hint it's supposed to be a DN (fwiw the coercion must occur in the 
equality operator).

So I pondered the problem for a while and decided there was legitimate 
value in allowing an equality comparison between a string and DN. I 
could see no downside to it other than it *might* mask an error in 
program logic. I didn't take the decision lightly. I looked through the 
Python internals to see where type coercion occurs. While not often it's 
certainly done where "it makes sense" (e.g. between string types, 
between numerical types, between buffer types, etc.). I thought one 
could make the argument that a DN object is akin to a string type and 
hence falls into the same category of coercion as occurs between string 
types. I'm not saying I'm in love with it, rather it's a pragmatic 
necessity and can be justified if you look at it a certain way.

We could add a debug message with a backtrace whenever the coercion 
occurs if we wanted to completely eliminate all places in the code where 
a string-to-DN comparison occurs. But as I said I'm not sure it's 
possible to absolutely eliminate this comparison. We've eliminated 99.9% 
of those cases already, I'm not sure the other 0.1% can be eliminated or 
that it's worth the work involved. I'm happy with 99.9% :-)
>
>   > The find() and rfind() methods (modeled after their string cousins)
> to locate a RDN or DN in an existing DN.
>
> I would much rather see the index() and rindex() methods, which do the
> Pythonic thing of raising an exception rather than silently returning -1
> when the thing isn't found.
> Compare:
>       something[something.find(thing)]  => something[-1] if not found
>       something[something.index(thing)] => raises exception if not found

You're right exceptions are more Pythoninc than return codes. I can add 
index and rindex and convert the code to catch an exception.

However, we use find and rfind extensively in the IPA code base 
(DN.find() and DN.rfind() simply replaced their string equivalents 
during the conversion). Perhaps you should open a ticket to purge all 
use of find() and rfind() from our code base and add a prohibition 
against their use in our Python coding standard document.

>
>
> ipapython/dn.py:448:
>       Each argument must be scalar convertable to unicode or a callable
> object whose result is convertable to unicode.
>
> Is it really necessary to allow callables here? If the user has a
> callable, he/she should call it before making an AVA out of it. Same
> with converting to Unicode.
> As you point out in dn_summary, automatic coercion is evil.

There are two issues here, type coercion and string encoding.

Not all coercion is evil, there is value in making it easy to do the 
right thing and minimizing inline coercion that otherwise obfuscates the 
logical intent of the code. We shouldn't require programmers to be 
perfect coders who never forget to perform some operation in the 
hundreds of places it might be required. Rather the tools should quietly 
and consistently enforce the rules while leaving the code readable, 
concise and with clear intent.

There is a well established precedent in Python for coercion to string 
in a string context. Coercion to string is so fundamental to Python it 
has it's own low-level operator embedded in every class type definition 
utilized in both Python and the CPython implementations.

1) We know all DN components must be strings. Let's say someone passes 
an integer object; is there really any harm to converting this to a 
string representation of an integer? That makes the DN API much 
friendlier and it's use more succinct. In fact Python does automatic 
coercion to string in many places where it knows something must be a 
string, thus it's a well accepted and established practice. We do 
automatic coercion to string in many places in IPA (a good example is 
the ldap encode/decode operations).

I wish other Python API's also coerced to string as a service to the 
caller. I wish python-ldap would have coerced to string those elements 
of it's API which it knows must be strings. If it had we could have just 
passed DN objects into the python-ldap API instead of having to wrap the 
class to perform the coercion. This wouldn't be violating a guideline, 
rather it would be utilizing a inherent feature of the Python language.

There are an astonishing number of CPython modules that barf if you pass 
the wrong string type. Instead they should be friendly and provide the 
right string coercion. (There is a long and sordid history of this 
problem and I have a extensive write-up on it).

2) All strings in IPA are unicode. In python 3.x all strings will be 
unicode (the way it should have been in python 2.x alas). When string 
comparisons are performed they should be done as unicode objects, not 
encoded in some undefined encoding. Likewise when returning a string 
belonging to a component in a DN it should be unicode. Traditionally 
dn's have been str objects (a python-ldap foible), so the decoding of 
str to unicode promotes correctness in Python 2.x.

Accepting a callable is just a further refinement of the discussion in 
#1. If you allow for automatic coercion to sting (in a context that 
demands a string type) then why not allow anything which yields a 
string? It just makes the API friendlier and easier to use without 
introducing ugly compromises. Some coercion is evil, but some is darn 
nice and useful. The trick is recognizing which is which.

>
>
>
> ==== Major issues ====
>
> ipalib/plugins/aci.py:340:
> +        groupdn = DN(groupdn)
> +        try:
> +            if groupdn[0].attr == 'cn':
> +                dn = DN()
> +                entry_attrs = {}
> +                try:
> +                    (dn, entry_attrs) = ldap.get_entry(groupdn, ['cn'])
> +                except errors.NotFound, e:
> +                    # FIXME, use real name here
> +                    if test:
> +                        dn = DN(('cn', 'test'),
> api.env.container_permission)
> +                        entry_attrs = {'cn': [u'test']}
> +                if api.env.container_permission in dn:
> +                    kw['permission'] = entry_attrs['cn'][0]
> +                else:
> +                    if 'cn' in entry_attrs:
> +                        kw['group'] = entry_attrs['cn'][0]
> +        except IndexError:
> +            pass
>
> Why is any IndexError ignored here?

Because the original test that entered the code block whose size you're 
concerned about was:

if groupdn.startswith('cn='):

The very first executable statement in the try block is

if groupdn[0].attr == 'cn'

If groupdn is actually empty it immediately exits the block because 
there will be an IndexError due to trying to access groupdn[0]. The 
combination of the try and the first if test mimics the original logic 
for deciding if one should enter the block.

However, you make a valid point that there are other indexing statements 
in the block that could raise an IndexError that should not be silently 
caught and ignored. I will change that.

I chose to use IndexError because error checking on DN "existence 
checks" in the rest of the code most always uses IndexError and 
KeyError, I was trying to be consistent.

The test for entering the block can be changed to:

if len(groupdn) and groupdn[0].attr == 'cn':

and remove the try/except.

 > That try block is much too long, it
> could catch unintended errors. Please only `try` the smallest piece of
> code that could raise the exception you want to catch.

see above

> There are enough `DN(xyz.replace('ldap:///',''))` calls in aci.py to
> warrant a strip_ldap_prefix helper. (It could also only remove the
> prefix, not all the occurrences -- unlikely to matter but it's better to
> do things correctly.)

No argument, I agree but I didn't write the code and I was trying not to 
change code I didn't need to. Please open a ticket to get this fixed.

> ipalib/plugins/baseldap.py:1028:
> +            try:
> +                if dn[0].attr != self.obj.primary_key.name:
> +                    self.obj.handle_duplicate_entry(*keys)
> +            except (IndexError, KeyError):
> +                pass
>
> Again, there's too much in the try block. You don't want to catch errors
> from handle_duplicate_entry. Do this instead:
>       try:
>           dn_attr = dn[0].attr
>       except (IndexError, KeyError):
>           pass
>       else:
>           if dn_attr != self.obj.primary_key.name:
>               self.obj.handle_duplicate_entry(*keys)

Good suggestion, I'll fix it.

>
> ==== Minor issues ====
>
> In ipalib/plugins/aci.py:
> @@ -343,10 +341,12 @@ def _aci_to_kw(ldap, a, test=False, pkey_only=False):
>                else:
>                    # See if the target is a group. If so we set the
>                    # targetgroup attr, otherwise we consider it a subtree
> -                if api.env.container_group in target:
> -                    targetdn = unicode(target.replace('ldap:///',''))
> -                    target = DN(targetdn)
> -                    kw['targetgroup'] = target['cn']
> +                targetdn = DN(target.replace('ldap:///',''))
> +                if api.env.container_group in targetdn:
> +                    try:
> +                        kw['targetgroup'] = targetdn[0]['cn']
> +                    except (IndexError, KeyError):
> +                        pass
>                    else:
>                        kw['subtree'] = unicode(target)
>
> Why is (IndexError, KeyError) ignored?

Because the original code did not seem to handle the error condition 
either and I wasn't sure what to do with the error. But you're right, 
silently ignoring those two exceptions is wrong. I'll fix it, good catch 
on your part.

> Wouldn't be more correct to use endswith(DN(container_group, suffix))
> instead of the `in` operator here, and in other places like this.

Good idea, the test you suggest is actually the correct test (shame on 
me for trying to translate the code too literally). I'll fix it.

> ipapython/dn.py:26
>       LOCKED_ERROR_MSG = 'Object is locked, it cannot be modified: %s'
> and then:
>       raise TypeError(LOCKED_ERROR_MSG % (repr(self)))
>
> You should define a TypeError subclass for this.
> Using a string instead of a specialized object is exactly what this
> patch tries so hard to get rid of... :)

Moot point since locking will be replaced in favor of mutable and 
immutable DN variants. Base Python exceptions would be raised instead 
for these types of coding errors, that's consistent with how we handle 
those problems now.

> ipapython/entity.py:49:
>       elif isinstance(entrydata, DN):
>           self.dn = entrydata
>           self.data = ipautil.CIDict()
>       elif isinstance(entrydata, basestring):
>           self.dn = DN(entrydata)
>           self.data = ipautil.CIDict()
>
> Should we not use DNs exclusively here? At least a TODO is warranted.

It seemed reasonable to allow the constructor to coerce to DN rather 
than forcing the caller of the constructor to coerce a parameter.

Someday soon I hope the Entity and Entry classes go away as we converge 
on a single API. In the interim this compromise seems reasonable and 
meant there were fewer places to find and fix (everywhere where Entity 
and Entry objects were instantiated).

>
> ipaserver/ipaldap.py:109:
>       elif isinstance(entrydata,DN):
>           self.dn = entrydata
>           self.data = ipautil.CIDict()
>       elif isinstance(entrydata,str) or isinstance(entrydata,unicode):
>           self.dn = DN(entrydata)
>           self.data = ipautil.CIDict()
>
> Again, why not use DNs exclusively? At least put in a TODO.
> For now at least do isinstance(entrydata, basestring) instead of the two
> isinstances.

Agreed, there should be one isinstance(), I didn't code it, but I should 
have fixed it when I saw it.

>
>
> In several places:
> +    # Use a property for <attr> to assure it's always a DN type
> +    def _set_<attr>(self, value):
> +        self._<attr> = DN(value)
> +
> +    def _get_<attr>(self):
> +        assert isinstance(getattr(self, '_<attr>'), DN)
> +        return self._<attr>
> +
> +    ldap_suffix = property(_get_<attr>, _set_<attr>)
>
> Shouldn't the assert be in the setter? Do we not want people to only put
> DNs in these properties, instead of silently converting?

I'm afraid I don't follow you here. When setting the property it should 
be coerced to a DN, we want to force the property to always be a DN.

However it's possible for someone to set the _<attr> directly even 
though they should never do that, the assert in the getter is just 
bullet-proofing against that.

> This appears enough times to justify using a helper to construct these.
> But if you don't want to generalize, the getattr call is unnecessary.

Once again I don't follow you. How would a helper work?

I don't know why there is a getattr in the getter, it would seem to be 
unnecessary unless I was trying to protect against an uninitialized 
attribute. The getattr should probably be replaced with normal class 
attribute access. I'll fix that.

> tests/util.py:307:
>       if isinstance(expected, DN):
>           if isinstance(got, basestring):
>               got = DN(got)
>
> This may be overly broad, the tests should be able to check if they got
> a DN or a string. What about something like:
>
>       if isinstance(expected, DNString):
>           assert isinstance(got, basestring)
>           assert DN(got) == expected.dn
> with a suitable DNString class?

Sorry I don't understand what you're trying to protect against nor what 
a DNString class would be or what it's purpose is.

> ipapython/dn.py:541: def _set_locked(self, value):
> There should be only one way to do things, either the lock/unlock
> methods, or setting an attribute. More ways to do one same thing lead to
> subtle errors in one of the ways.
>
> The locking code is duplicated in AVA, RDN and DN; it would be good to
> make a Lockable mixin to avoid that. A similar case is with __repr__.
>
> Anyway unlocking is bad (see above) so these are mostly moot points.

A mixin would eliminate code duplication but in this case the 
duplication is trivial, not sure it warrants introducing a new class in 
this circumstance (judgment call), but the topic is moot because locking 
is going to be eliminated.

>
> ipaserver/plugins/ldap2.py:134:
>       class ServerSchema(object):
>
> Please don't use nested classes without reason.

There is a valid reason, the ServerSchema class is private to the 
SchemaCache class.

> ==== Nitpicks/opinions ====
>
> In ipa-replica-manage:369
>   >    sys.exit("winsync agreement already exists on subtree %s" %
> please remove the trailing space
>
>
>
> ipapython/dn.py:
>       in docstring:  DN(arg0, ..., locked=False, first_key_match=True)
>       followed by:  def __init__(self, *args, **kwds):
>       and:  kwds.get('first_key_match', True)
>
> I don't see the reason for this. Just write `def __init__(self, *args,
> locked=False, first_key_match=True)` and put a proper summary in the
> first line of the docstring. Same in AVA & RDN.

A valid point. I think I was trying to be too flexible/extensible.

> ipapython/ipautil.py:201:
> -    """Convert a kerberos realm into the IPA suffix."""
>
> Why are you removing a docstring?

Must have been a mistake. I'll fix.

> ipaserver/plugins/ldap2.py:79:
>       _debug_log_ldap = True
>       if _debug_log_ldap:
>           import pprint
>
> Just import it unconditionally if you need it.

Not to worry, I'm going to nuke the pprint formatting anyway, it never 
worked correctly and when it did it didn't add much value.

> ipalib/plugins/baseldap.py:1874:
> - sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0].lower(),
> y[1][self.obj.primary_key.name][0].lower())
> +    sortfn=lambda x,y: cmp(x[1][self.obj.primary_key.name][0],
> +        y[1][self.obj.primary_key.name][0])
>       entries.sort(sortfn)
>
> Since you're touching this: it's better (easier, faster, more readable)
> to use a key function. (Also, defining a named function is better with
> `def foo` than `foo=lambda...`):
>       def sortkey(x):
>           return x[1][self.obj.primary_key.name][0]
>       entries.sort(key=sortkey)

Agreed, it would have been better to rewrite this using a key function 
rather than preserving the old code, I'll fix.
>
>
> ipapython/entity.py:95:
>       # Python "internal" class attributes are prefixed and suffixed
>         with double underscores.
>       # Make sure we continue to return things like __str__ from
>         this class.
>
> I think a better solution would be making Entity a new-style class.
> Or even better, killing Entity off altogether. Which is the plan, so
> this is a moot point.
> Same with Entry in ipaserver/ipaldap.py.

Yeah, that would have worked too, probably cleaner, but I really want 
these classes to go away, not much point in tweaking them now.

> ipaserver/install/ldapupdate.py:132:
> +        assert isinstance(suffix, (None.__class__, DN))
>
> type(x) is preferable to x.__class__ (the same way e.g. len(x) is
> preferable to x.__len__). However, I think it would be more readable to use:
>       if suffix is not None:
>           assert isinstance(suffix, DN)

I don't think you can use type() to cover the case when something might 
be a derived subclass, hence the way it was coded (I anticipated DN 
might be subclassed). I was trying to allow for both None or any DN or 
subclass of DN in one statement. Perhaps not the best idea.

Your suggestion is more readable, I'll change it to match your suggestion.

> ipaserver/plugins/ldap2.py:109:
> def value_to_utf8(val):
>       '''
>       If val is not a string we need to convert it to a string
>       (specifically a unicode string). Naively we might think we need to
>       call str(val) to convert to a string. This is incorrect because if
>       val is already a unicode object then str() will call
>       encode(default_encoding) returning a str object encoded with
>       default_encoding. But we don't want to apply the default_encoding!
>       Rather we want to guarantee the val object has been converted to a
>       unicode string because from a unicode string we want to explicitly
>       encode to a str using our desired encoding (utf-8 in this case).
>
>       Note: calling unicode on a unicode object simply returns the exact
>       same object (with it's ref count incremented). This means calling
>       unicode on a unicode object is effectively a no-op, thus it's not
>       inefficient.
>       '''
> That is not an explanation of *what* the function does, but *how* it
> does it. Thus it should be in a comment, not in a docstring.
>
> Also, if something needs such an explanation, it also need unit tests to
> ensure it actually works correctly.
>
> IPASimpleLDAPObject has the same issue -- lengthy descriptions of design
> decisions should go in a comment (if not in a commit message or on the
> mailing list).

Valid point, I'll make them comments.

>   > Require that every place a dn is used it must be a DN object. This
>   > translates into lot of `assert isinstance(dn, DN)` sprinkled through
>   > out the code.
>
> That seems like a crutch to get your DN work done. We should remove
> these once the dust settles.

I disagree with removing the asserts. You're correct they were an 
invaluable aid in performing the conversion but they have a value beyond 
that. I fully expect programmers to make mistakes with DN usage in the 
future. Such mistakes are really easy to make especially when for years 
we just used strings. I've already seen examples of dn string formatting 
getting committed to master when folks should have known better. As such 
I believe the asserts will continue to play a valuable role in the 
(foreseeable?) future.

We do use asserts in a fair number of other locations. I don't presume 
you mean you want all assert statements removed.

Instead I think we should make sure the Python interpreter is started 
with the -O option in production mode, then assertions will be stripped 
out and not evaluated eliminating the overhead.

see http://wiki.python.org/moin/UsingAssertionsEffectively

>   > Python provides the __all__ module export list, we really should be
>   > making better use of that.
>
> I recommend importing names explicitly, or importing a module and using
> qualified names, over __all__ and import *. Star imports make it
> unnecessarily hard to see where names come from.
> (You do use explicit imports in the code, my gripe is just about
> recommending * and __all__).

We should come to a project consensus on this topic and add it to our 
Python style guide. This really isn't germane to the dn work.

>   > I equate Python decorators with C preprocessor hell when CPP macros
>   > are abused.
>
> I do not share that view. It is possible to abuse decorators, but not
> all of their use is necessarily evil.
> It's true that encode.py wasn't a good use of them.

Agreed, decorators are not always bad, they have a role to play. I was 
perhaps being too dramatic in my wording and not selective enough. Only 
some decorators are like C preprocessor hell, some are perfectly fine.

It took me so long to unravel and fully comprehend what the 
encode/decode decorators were doing in detail that I was probably 
venting my frustration with that experience.

>   > Removed all usage of get_schema() which was being called prior to
>   > accessing the .schema attribute of an object. If a class is using
>   > internal lazy loading as an optimization it's not right to require
>   > users of the interface to be aware of internal optimization's.
>   > Instead the class should handle it's lazy loading completely
>   > internally by loading the data on first access.
>   > This is very easy to do with class properties. schema is now a
>   > property of the class instead of an attribute. When the schema
>   > property is accessed it actually calls a private internal method to
>   > perform the lazy loading.
>
> In other words, you're reinventing the reify decorator:
> https://github.com/Pylons/pyramid/blob/master/pyramid/decorator.py#L1

Hmm... I wasn't aware of reify. A quick look at it and it smacks of 
self-modifying code, more hidden weirdness that's not obvious in static 
code reading. Personally I vastly prefer things to be obvious and not 
tricky, clever, obscure, and hidden.

>
> Being a grammar nazi, I must point out the it's/its confusion and
> plurals formed with apostrophes.

Yup, can't disagree, but heck I pumped out 10 pages of documentation in 
a little over a day, there was some grammar road-kill along the way no 
doubt.

BTW, as a non-native English speaker it's admirable to pick these things 
up. I'm impressed!

> ====
>
> Thanks for the work! This will really make IPA better when it's polished
> a bit. Hopefully all the criticism doesn't sound too negative :)

Thank you for the review. It was a lot of material to slog though and 
your careful attention is appreciated, it will definitely make it 
better. You made good points and I was in agreement with the majority of 
them. Thank you.

-- 
John Dennis <jdennis at redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/