[Freeipa-devel] [PATCH] Password vault

Mon Apr 20 06:12:33 UTC 2015

Dne 3.4.2015 v 05:37 Endi Sukma Dewata napsal(a):
> Hi,
>
> Attached are new patches replacing all old ones. Please take a look at
> them. They should applied in this order: 365, 353-8, 355-6, 357-3,
> 359-2, 360-1, 364-1, 361-1.

Thanks for squashing patches 362-364 into the original patches, it's 
much more digestible this way.

>
> I'm planning to merge the vault and vault container object and use the
> vault type attribute to distinguish between the two. See more discussion
> about that below.

OK.

>>>> 3) The container_vault config option should be renamed to
>>>> container_vaultcontainer, as it is used in the vaultcontainer plugin,
>>>> not the vault plugin.
>>>
>>> It was named container_vault because it defines the DN for of the
>>> subtree that contains all vault-related entries. I moved the base_dn
>>> variable from vaultcontainer object to the vault object for clarity.
>>
>> That does not make much sense to me. Vault objects are contained in
>> their respective vaultcontainer objects, not directly in cn=vaults.
>
> The cn=vaults itself is actually a vault container (i.e.
> ipaVaultContainer). Theoretically you could store a vault in any
> container including cn=vaults, but we just don't want people to use it
> that way.
>
> I think this is consistent with other plugins. For example, the
> container_user points to cn=users, which is an nsContainer. There is no
> concept of 'user container' other than the cn=users itself. But even if
> there is one, the container_user will still be stored in the user class.

In fact it is not consistent with other plugins. All entries managed by 
the user plugin are stored *directly* under cn=users. Entries managed by 
the vault plugin are not stored directly under cn=vaults, but rather 
anywhere in the cn=vaults subtree and their DN is derived from the DN of 
the parent vault container. For such objects, we don't set 
<plugin>.container_dn and don't have container_<plugin> constant, but 
rather define them as child objects of their container objects.

>
> When the vault & vaultcontainer is merged, this will no longer be an issue.

OK.

>
>>>> 4) The vault object should be child of the vaultcontainer object.
>>>>
>>>> Not only is this correct from the object model perspective, but it
>>>> would
>>>> also make all the container_id hacks go away.
>>>
>>> It's a bit difficult because it will affect how the container & vault
>>> ID's are represented on the CLI.
>>
>> Yes, but the API should be done right (without hacks) first. You can
>> tune the CLI after that if you want.
>
> I think the current framework is rather limiting. It's kind of hard to
> build an interface that looks exactly what you want then add the
> implementation later to match the interface because many things are
> interrelated. In this particular case the object hierarchy on the server
> side would affect how the vault ID will be represented on the client side.

It indeed is limiting and that's a good thing. We don't want people to 
be able to create any crazy interfaces they can imagine, inconsistent 
with everything else in IPA.

>
>>> In the design the container ID would be a single value like this:
>>>
>>>    $ ipa vault-add /services/server.example.com/HTTP
>>>
>>> And if the vault ID is relative (without initial slash), it will be
>>> appended to the user's private container (i.e. /users/<username>/):
>>>
>>>    $ ipa vault-add PrivateVault
>>>
>>> The implementation is not complete yet. Currently it accepts this
>>> format:
>>>
>>>    $ ipa vault-add <vault name> [--container <container ID>]
>>>
>>> and I'm still planning to add this:
>>>
>>>    $ ipa vault-add <vault ID>
>
> This is actually now done in the latest patch. Internally the ID is
> still split into name & parent ID.
>
>>> If the vault must be a child of vaultcontainer, and the vaultcontainer
>>> must be a child of a vaultcontainer, does it mean the vault ID would
>>> have to be split into separate arguments like this?
>>>
>>>    $ ipa vaultcontainer-add services server.example.com HTTP
>>>
>>> If that's the case we'd lose the ability to specify a relative vault ID.
>>
>> Yes, that's the case.
>>
>> But I don't think relative IDs should be a problem, we can do this:
>>
>>      $ ipa vaultcontainer-add a b c  # absolute
>>      $ ipa vaultcontainer-add . c    # relative
>
> I think a "." will be confusing because there's no concept of "current
> vaultcontainer" like "current directory".
>
>> or this:
>>
>>      $ ipa vaultcontainer-add '' a b c  # absolute
>>      $ ipa vaultcontainer-add c         # relative
>
> An empty string is also confusing and can be problematic to distinguish
> with missing argument.

I didn't mean empty string specifically, it could have been any special 
value.

>
>> or this:
>>
>>      $ ipa vaultcontainer-add a b c         # absolute
>>      $ ipa vaultcontainer-add c --relative  # relative
>>
>> or this:
>>
>>      $ ipa vaultcontainer-add a b c --absolute  # absolute
>>      $ ipa vaultcontainer-add c                 # relative
>
> Per discussion in the IPA-CS meeting, we'd rather keep the "/" for vault
> ID delimiters because the spaces will be confusing to users, but we'll
> not use absolute ID anymore.

I'm sorry if I gave you the impression that this is up for discussion, 
but it is not. You either follow the convention without doing ugly hacks 
or your patch will not be accepted.

It won't be confusing to users, because they are used to the convention.

>
> It's not implemented yet, but here is the plan. By default the vault
> will be created in the user's private container:
>
>    $ ipa vault-add PrivateVault
>
> For shared vaults, instead of specifying an absolute ID we can specify a
> --shared option:
>
>    $ ipa vault-add --shared projects/IPA
>
> Same thing with service vaults:
>
>    $ ipa vault-add --service server.example.com/LDAP
>
> To access a vault in another user's private container:
>
>    $ ipa vault-show --user testuser PrivateVault

Fine by me, as long as you follow the convention.

>
>>>> 16) You do way too much stuff in vault_add.forward(). Only code that
>>>> must be done on the client needs to be there, i.e. handling of the
>>>> "data", "text" and "in" options.
>>>>
>>>> The vault_archive call must be in vault_add.execute(), otherwise a) we
>>>> will be making 2 RPC calls from the client and b) it won't be called at
>>>> all when api.env.in_server is True.
>>>
>>> This is done by design. The vault_add.forward() generates the salt and
>>> the keys. The vault_archive.forward() will encrypt the data. These
>>> operations have to be done on the client side to secure the transport of
>>> the data from the client through the server and finally to KRA. This
>>> mechanism prevents the server from looking at the unencrypted data.
>>
>> OK, but that does not justify that it's broken in server-side API. It
>> can and should be done so that it works the same way on both client and
>> server.
>>
>> I think the best solution would be to split the command into two
>> commands, server-side vault_archive_raw to archive already encrypted
>> data, and client-side vault_archive to encrypt data and archive them
>> with vault_archive_raw in its .execute(). Same thing for vault_retrieve.
>
> Actually I think it's better to just merge the add and archive, reducing
> the number of RPC calls. The vault_archive now will support two types of
> operations:
>
> (a) Archive data into a new vault (it will create the vault just before
> archiving the data):
>
>    $ ipa vault-archive testvault --create --in data ...
>
> (b) Archive data into an existing vault:
>
>    $ ipa vault-archive testvault --in data ...
>
> The vault_add is now just a wrapper for the vault_archive(a).

If that's just an implementation detail, OK.

If it's possible to modify existing vault objects using vault-add or 
create new objects using vault-archive, then NACK.

>
>> BTW, I also think it would be better if there were 2 separate sets of
>> commands for binary and textual data
>> (vault_{archive,retrieve}_{data,text}) rather than trying to handle
>> everything in vault_{archive,retrieve}.
>
> I don't think we want to provide a separate command of every possible
> data type & operation combination. Users would get confused. The archive
> & retrieve commands should be able to handle all current & future data
> types with options.

A command with two sets of mutually exclusive options is really two 
commands in disguise, which is a good sign it should be divided into two 
actual commands.

Who are you to say users would get confused? I say they would be more 
confused by a command with a plethora of mutually exclusive "options".

What other possible data types are there?

>
>>> The add & archive combination was added for convenience, not for
>>> optimization. This way you would be able to archive data into a new
>>> vault using a single command. Without this, you'd have to execute two
>>> separate commands: add & archive, which will result in 2 RPC calls
>>> anyway.
>>
>> I think I would prefer if it was separate, as that would be consistent
>> with other plugins (e.g. for objects with members, we don't allow adding
>> members directly in -add, you have to use -add-member after -add).
>
> The vault data is similar to group description, not group members. When
> creating a group we can supply the description. If not specified it will
> be blank. Archiving vault data is similar to updating the group
> description.

It's similar to group members because there are separate commands to 
manipulate them.

You have to choose one of:

   a) vault data is settable using vault-add and vault-mod and gettable 
using vault-mod, vault-show and vault-find

   b) vault data is settable using vault-archive and gettable using 
vault-retrieve

Anything in between is not permitted.

>
> Vault secrets on the other hand is similar to group members. You will
> see that in the other patch.

>
>>>> 18) Why are vaultcontainer objects automatically created in vault_find?
>>>>
>>>> This is just plain wrong and has to be removed, now.
>>>
>>> The code was supposed to create the user's private container like in
>>> #17, but the behavior has been changed. If the container being searched
>>> is the user's private container, it will ignore the container not found
>>> error and return zero results as if the private container already
>>> exists. For other containers the container must already exist. For this
>>> to work I had to add a handle_not_found() into LDAPSearch so the plugins
>>> can customize the proper search response for the missing private
>>> container.
>>
>> No ad-hoc refactoring please. If you want to refactor anything, it
>> should be first designed properly and put in a separate patch.
>>
>> Anyway, what should actually happen here is that if parent object is not
>> found, its object plugin's handle_not_found is called, i.e. something
>> like this:
>>
>>      parent = self.obj.parent_object
>>      if parent:
>>          self.api.Object[parent].handle_not_found(*args[:-1])
>>      else:
>>          raise errors.NotFound(
>>              reason=self.obj.container_not_found_msg % {
>>                  'container': self.obj.container_dn,
>>              }
>>          )
>
> It will not work because vault doesn't have a parent object. I'm adding
> handle_not_found() into LDAPCreate and LDAPSearch in the first patch.

NACK, this change exists for the sole reason of supporting your hacks. 
Follow IPA convetions and this change won't be necessary.

>
>>>> 21) vault_archive is not a retrieve operation, it should be based on
>>>> LDAPUpdate instead of LDAPRetrieve. Or Command actually, since it does
>>>> not do anything with LDAP. The same applies to vault_retrieve.
>>>
>>> The vault_archive does not actually modify the LDAP entry because it
>>> stores the data in KRA. It is actually an LDAPRetrieve operation because
>>> it needs to get the vault info before it can perform the archival
>>> operation. Same thing with vault_retrieve.
>>
>> It is not a LDAPRetrieve operation, because it has different semantics.
>> Please use Command as base class and either use ldap2 for direct LDAP or
>> call vault_show instead of hacking around LDAPRetrieve.
>
> It's been changed to inherit from LDAPQuery instead.

NACK, it's not a LDAPQuery operation, because it has different 
semantics. There is more to a command than executing code, so you should 
use a correct base class.

>
>>>> 22) vault_archive will break with binary data that is not UTF-8 encoded
>>>> text.
>>>>
>>>> This is where it occurs:
>>>>
>>>> +        vault_data[u'data'] = unicode(data)
>>>>
>>>> Generally, don't use unicode() on str values and str() on unicode
>>>> values
>>>> directly, always use .decode() and .encode().
>
> The unicode(s, encoding) is actually equivalent to s.decode(encoding),
> so the following code will not solve the problem:
>
>    vault_data[u'data'] = data.decode()
>
> As you said, decode() will only work if the data being decoded actually
> follows the encoding rules (i.e. already UTF-8 encoded).
>
>>> It needs to be a Unicode because json.dumps() doesn't work with binary
>>> data. Fixed by adding base-64 encoding.
>
> The base-64 encoding is necessary to convert random binaries into ASCII
> so it can be decoded into Unicode. Here is the current code:
>
>    vault_data[u'data'] = unicode(base64.b64encode(data))
>
> which is equivalent to:
>
>    vault_data[u'data'] = base64.b64encode(data).decode()

If you read a little bit further, you would get to the point, which is 
certainly not calling .decode() without arguments, but *always 
explicitly specify the encoding*.

>
>> If something str needs to be unicode, you should use .decode() to
>> explicitly specify the encoding, instead of relying on unicode() to pick
>> the correct one.
>
> Since we know this is ASCII data we can now specify UTF-8 encoding.
>
>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>
> But for anything that comes from user input (e.g. filenames, passwords),
> it's better to use the default encoding because that can be configured
> by the user.

You are confusing user's configured encoding with Python's default 
encoding. Default encoding in Python isn't derived from user's 
localization settings.

Anyway, anything that comes from user input is already decoded using 
user's configured encoding when it enters the framework so I don't know 
why are you even bringing it up here.

>
>> Anyway, I think a better solution than base64 would be to use the
>> "raw_unicode_escape" encoding:
>
> As explained above, base-64 encoding is necessary because random
> binaries don't follow any encoding rules. I'd rather not use
> raw_unicode_escape because it's not really a text data.

The result of decoding binary data using raw_unicode_escape is perfectly 
valid unicode data which doesn't eat 33% more space like base64 encoded 
binary does, hence my suggestion.

Anyway, it turns out when encoded in JSON, raw_unicode_escape string 
generally takes more space than base64 encoded string because of JSON 
escaping, so base64 is indeed better.

> Here's how it's
> now implemented:
>
>>      if data:
>>          data = data.decode('raw_unicode_escape')
>
> Input data is already in binaries, no conversion needed.
>
>>      elif text:
>>          data = text
>
> Input text will be converted to binaries with default encoding:
>
>    data = text.encode()

See what the default encoding actually is and why you shouldn't rely on 
it above.

>
>>      elif input_file:
>>          with open(input_file, 'rb') as f:
>>              data = f.read()
>>          data = data.decode('raw_unicode_escape')
>
> Input contains binary data, no conversion needed.
>
>>      else:
>>          data = u''
>
> If not specified, the data will be empty string:
>
>    data = ''
>
> The data needs to be converted into binaries so it can be encrypted
> before transport (depending on the vault type):
>
>    data = self.obj.encrypt(data, ...)
>
>>      vault_data[u'data'] = data
>
> Then for transport the data is base-64 encoded first, then converted
> into Unicode:
>
>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')

>
>>>> 26) Instead of the delete_entry refactoring in baseldap and
>>>> vaultcontainer_add, you can put this in vaultcontainer_add's
>>>> pre_callback:
>>>>
>>>>      try:
>>>>          ldap.get_entries(dn, scope=ldap.SCOPE_ONELEVEL, attrs_list=[])
>>>>      except errors.NotFound:
>>>>          pass
>>>>      else:
>>>>          if not options.get('force', False):
>>>>              raise errors.NotAllowedOnNonLeaf()
>>>
>>> I suppose you meant vaultcontainer_del. Fixed, but this will generate an
>>> additional search for each delete.
>>>
>>> I'm leaving the changes baseldap because it may be useful later and it
>>> doesn't change the behavior of the current code.
>>
>> Again, no ad-hoc refactoring please.
>
> The refactoring has also been moved into a separate patch. Just a note,
> I still don't think a plugin should do a search and maybe generate a
> NotAllowedOnLeaf exception on each delete operation. The exception
> should have been generated automatically by the DS. But we can discuss
> that separately.

NACK, turns out there is a better (and preferable) solution I didn't 
remember before, you can use exception callback in vaultcontainer_del:

     def exc_callback(self, keys, options, exc, call_func, *call_args, 
**call_kwargs):
         if call_func.func_name == 'delete_entry':
             if isinstance(exc, errors.NotAllowedOnLeaf):
                 if not options.get('force', False):
                     raise errors.DatabaseError(...)
         raise exc

>
>>>> 28) The vault and vaultcontainer plugins seem to be pretty similar, I
>>>> think it would make sense to put common stuff in a base class and
>>>> inherit vault and vaultcontainer from that.
>>>
>>> I plan to refactor the common code later. Right now the focus is to get
>>> the functionality working correctly first.
>>
>> Please do it now, "later" usually means "never". It shouldn't be too
>> hard and I can give you a hand with it if you want.
>
> As mentioned above, I'm considering merging the vault & vault container
> classes, so no need to refactor the common code out of these classes.
> This will be delivered as a separate patch later.

OK.

>
> Thanks.
>

-- 
Jan Cholasta