[Freeipa-devel] [PATCH] Password vault

Wed May 27 05:39:54 UTC 2015

Dne 27.5.2015 v 02:38 Endi Sukma Dewata napsal(a):
> Please take a look at the attached patch to add vault-archive/retrieve
> commands.
>
> On 4/20/2015 1:12 AM, Jan Cholasta wrote:
>>>>>> 16) You do way too much stuff in vault_add.forward(). Only code that
>>>>>> must be done on the client needs to be there, i.e. handling of the
>>>>>> "data", "text" and "in" options.
>>>>>>
>>>>>> The vault_archive call must be in vault_add.execute(), otherwise
>>>>>> a) we
>>>>>> will be making 2 RPC calls from the client and b) it won't be
>>>>>> called at
>>>>>> all when api.env.in_server is True.
>>>>>
>>>>> This is done by design. The vault_add.forward() generates the salt and
>>>>> the keys. The vault_archive.forward() will encrypt the data. These
>>>>> operations have to be done on the client side to secure the
>>>>> transport of
>>>>> the data from the client through the server and finally to KRA. This
>>>>> mechanism prevents the server from looking at the unencrypted data.
>>>>
>>>> OK, but that does not justify that it's broken in server-side API. It
>>>> can and should be done so that it works the same way on both client and
>>>> server.
>>>>
>>>> I think the best solution would be to split the command into two
>>>> commands, server-side vault_archive_raw to archive already encrypted
>>>> data, and client-side vault_archive to encrypt data and archive them
>>>> with vault_archive_raw in its .execute(). Same thing for
>>>> vault_retrieve.
>>>
>>> Actually I think it's better to just merge the add and archive, reducing
>>> the number of RPC calls. The vault_archive now will support two types of
>>> operations:
>>>
>>> (a) Archive data into a new vault (it will create the vault just before
>>> archiving the data):
>>>
>>>    $ ipa vault-archive testvault --create --in data ...
>>>
>>> (b) Archive data into an existing vault:
>>>
>>>    $ ipa vault-archive testvault --in data ...
>>>
>>> The vault_add is now just a wrapper for the vault_archive(a).
>>
>> If that's just an implementation detail, OK.
>>
>> If it's possible to modify existing vault objects using vault-add or
>> create new objects using vault-archive, then NACK.
>
> The vault-archive is now completely separate from vault-add. You can no
> longer archive data while creating a vault:
>
>    $ ipa vault-add test
>    $ ipa vault-archive test --in secret.bin

OK.

>
>>>> BTW, I also think it would be better if there were 2 separate sets of
>>>> commands for binary and textual data
>>>> (vault_{archive,retrieve}_{data,text}) rather than trying to handle
>>>> everything in vault_{archive,retrieve}.
>>>
>>> I don't think we want to provide a separate command of every possible
>>> data type & operation combination. Users would get confused. The archive
>>> & retrieve commands should be able to handle all current & future data
>>> types with options.
>>
>> A command with two sets of mutually exclusive options is really two
>> commands in disguise, which is a good sign it should be divided into two
>> actual commands.
>>
>> Who are you to say users would get confused? I say they would be more
>> confused by a command with a plethora of mutually exclusive "options".
>>
>> What other possible data types are there?
>
> The patch has been simplified to support only binary data. The data can
> be specified using either of these options:
>
>    $ ipa vault-archive test --data <base-64 encoded data>
>    $ ipa vault-archive test --in <input file>
>
> I don't think we want to provide two separate commands for these options
> although they are mutually exclusive, do we?

No, these are not really 2 different options, but rather 2 forms of the 
same option, which for the lack of better support for files in the 
framework have to be done as 2 options.

>
>>>>> The add & archive combination was added for convenience, not for
>>>>> optimization. This way you would be able to archive data into a new
>>>>> vault using a single command. Without this, you'd have to execute two
>>>>> separate commands: add & archive, which will result in 2 RPC calls
>>>>> anyway.
>>>>
>>>> I think I would prefer if it was separate, as that would be consistent
>>>> with other plugins (e.g. for objects with members, we don't allow
>>>> adding
>>>> members directly in -add, you have to use -add-member after -add).
>>>
>>> The vault data is similar to group description, not group members. When
>>> creating a group we can supply the description. If not specified it will
>>> be blank. Archiving vault data is similar to updating the group
>>> description.
>>
>> It's similar to group members because there are separate commands to
>> manipulate them.
>
> Just because there are separate commands doesn't mean vault data
> (single-valued) is similar to group members (multi-valued). It uses
> separate commands because of the encryption steps involved while
> archiving/retrieving data.

That was not the point, but whatever.

>
>> You have to choose one of:
>>
>>    a) vault data is settable using vault-add and vault-mod and gettable
>> using vault-mod, vault-show and vault-find
>>
>>    b) vault data is settable using vault-archive and gettable using
>> vault-retrieve
>>
>> Anything in between is not permitted.
>
> As mentioned above, the add and archive commands are now separate.
>
>>>>>> 21) vault_archive is not a retrieve operation, it should be based on
>>>>>> LDAPUpdate instead of LDAPRetrieve. Or Command actually, since it
>>>>>> does
>>>>>> not do anything with LDAP. The same applies to vault_retrieve.
>>>>>
>>>>> The vault_archive does not actually modify the LDAP entry because it
>>>>> stores the data in KRA. It is actually an LDAPRetrieve operation
>>>>> because
>>>>> it needs to get the vault info before it can perform the archival
>>>>> operation. Same thing with vault_retrieve.
>>>>
>>>> It is not a LDAPRetrieve operation, because it has different semantics.
>>>> Please use Command as base class and either use ldap2 for direct
>>>> LDAP or
>>>> call vault_show instead of hacking around LDAPRetrieve.
>>>
>>> It's been changed to inherit from LDAPQuery instead.
>>
>> NACK, it's not a LDAPQuery operation, because it has different
>> semantics. There is more to a command than executing code, so you should
>> use a correct base class.
>
> Changed to inherit from Command as requested. Now these commands no
> longer have a direct access to the vault object (self.obj) although they
> are accessing vault objects like other vault commands. Also now the
> vault name argument has to be added explicitly on each command.

You can inherit from crud.Retrieve and crud.Update to get self.obj and 
the argument back.

>
> There are existing commands that inherit from LDAPQuery while doing
> other things too, so the 'semantic' seems to be kind of arbitrarily
> defined and subjective.

There is a lot of existing stuff in IPA that is bad in one way or 
another, but it doesn't mean new code should be bad as well.

>
>>>>>> 22) vault_archive will break with binary data that is not UTF-8
>>>>>> encoded
>>>>>> text.
>>>>>>
>>>>>> This is where it occurs:
>>>>>>
>>>>>> +        vault_data[u'data'] = unicode(data)
>>>>>>
>>>>>> Generally, don't use unicode() on str values and str() on unicode
>>>>>> values
>>>>>> directly, always use .decode() and .encode().
>>>
>>> The unicode(s, encoding) is actually equivalent to s.decode(encoding),
>>> so the following code will not solve the problem:
>>>
>>>    vault_data[u'data'] = data.decode()
>>>
>>> As you said, decode() will only work if the data being decoded actually
>>> follows the encoding rules (i.e. already UTF-8 encoded).
>>>
>>>>> It needs to be a Unicode because json.dumps() doesn't work with binary
>>>>> data. Fixed by adding base-64 encoding.
>>>
>>> The base-64 encoding is necessary to convert random binaries into ASCII
>>> so it can be decoded into Unicode. Here is the current code:
>>>
>>>    vault_data[u'data'] = unicode(base64.b64encode(data))
>>>
>>> which is equivalent to:
>>>
>>>    vault_data[u'data'] = base64.b64encode(data).decode()
>>
>> If you read a little bit further, you would get to the point, which is
>> certainly not calling .decode() without arguments, but *always
>> explicitly specify the encoding*.
>
> Added the explicit encoding name although it's not necessary since the
> data being encoded/decoded is base-64 encoded (i.e. ASCII).

It may not be necessary but it doesn't hurt either, anyway my main 
concern was with the other uses of unicode() in the original patch.

>
>>>> If something str needs to be unicode, you should use .decode() to
>>>> explicitly specify the encoding, instead of relying on unicode() to
>>>> pick
>>>> the correct one.
>>>
>>> Since we know this is ASCII data we can now specify UTF-8 encoding.
>>>
>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>>>
>>> But for anything that comes from user input (e.g. filenames, passwords),
>>> it's better to use the default encoding because that can be configured
>>> by the user.
>>
>> You are confusing user's configured encoding with Python's default
>> encoding. Default encoding in Python isn't derived from user's
>> localization settings.
>>
>> Anyway, anything that comes from user input is already decoded using
>> user's configured encoding when it enters the framework so I don't know
>> why are you even bringing it up here.
>
> It's irrelevant now that the command only supports binary data.

OK.

>
>>>> Anyway, I think a better solution than base64 would be to use the
>>>> "raw_unicode_escape" encoding:
>>>
>>> As explained above, base-64 encoding is necessary because random
>>> binaries don't follow any encoding rules. I'd rather not use
>>> raw_unicode_escape because it's not really a text data.
>>
>> The result of decoding binary data using raw_unicode_escape is perfectly
>> valid unicode data which doesn't eat 33% more space like base64 encoded
>> binary does, hence my suggestion.
>>
>> Anyway, it turns out when encoded in JSON, raw_unicode_escape string
>> generally takes more space than base64 encoded string because of JSON
>> escaping, so base64 is indeed better.
>
> Unchanged. It still uses base-64 encoding.

Right.

>
>>> Here's how it's
>>> now implemented:
>>>
>>>>      if data:
>>>>          data = data.decode('raw_unicode_escape')
>>>
>>> Input data is already in binaries, no conversion needed.
>>>
>>>>      elif text:
>>>>          data = text
>>>
>>> Input text will be converted to binaries with default encoding:
>>>
>>>    data = text.encode()
>>
>> See what the default encoding actually is and why you shouldn't rely on
>> it above.
>
> Irrelevant now.
>
>>>>      elif input_file:
>>>>          with open(input_file, 'rb') as f:
>>>>              data = f.read()
>>>>          data = data.decode('raw_unicode_escape')
>>>
>>> Input contains binary data, no conversion needed.
>>>
>>>>      else:
>>>>          data = u''
>>>
>>> If not specified, the data will be empty string:
>>>
>>>    data = ''
>>>
>>> The data needs to be converted into binaries so it can be encrypted
>>> before transport (depending on the vault type):
>>>
>>>    data = self.obj.encrypt(data, ...)
>>>
>>>>      vault_data[u'data'] = data
>>>
>>> Then for transport the data is base-64 encoded first, then converted
>>> into Unicode:
>>>
>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>

If KRA is not installed, vault-archive and vault-retrieve fail with 
internal error.

The commands still behave differently based on whether they were called 
from API which was initialized with in_server set to True or False.

There is no point in exposing the session_key, nonce and vault_data 
options in CLI when their value is always overwritten in forward().

Will this always succeed?

+        # deactivate vault record in KRA
+        response = kra_client.keys.list_keys(
+            client_key_id, pki.key.KeyClient.KEY_STATUS_ACTIVE)
+
+        for key_info in response.key_infos:
+            kra_client.keys.modify_key_status(
+                key_info.get_key_id(),
+                pki.key.KeyClient.KEY_STATUS_INACTIVE)

If not, we might get into an inconsistent state, where the vault is 
deleted in LDAP but still active in KRA. (I'm not sure if this is 
actually a problem or not.)

-- 
Jan Cholasta