[Freeipa-devel] [PATCH] Password vault

Thu May 28 05:43:49 UTC 2015

Dne 27.5.2015 v 07:39 Jan Cholasta napsal(a):
> Dne 27.5.2015 v 02:38 Endi Sukma Dewata napsal(a):
>> Please take a look at the attached patch to add vault-archive/retrieve
>> commands.
>>
>> On 4/20/2015 1:12 AM, Jan Cholasta wrote:
>>>>>>> 16) You do way too much stuff in vault_add.forward(). Only code that
>>>>>>> must be done on the client needs to be there, i.e. handling of the
>>>>>>> "data", "text" and "in" options.
>>>>>>>
>>>>>>> The vault_archive call must be in vault_add.execute(), otherwise
>>>>>>> a) we
>>>>>>> will be making 2 RPC calls from the client and b) it won't be
>>>>>>> called at
>>>>>>> all when api.env.in_server is True.
>>>>>>
>>>>>> This is done by design. The vault_add.forward() generates the salt
>>>>>> and
>>>>>> the keys. The vault_archive.forward() will encrypt the data. These
>>>>>> operations have to be done on the client side to secure the
>>>>>> transport of
>>>>>> the data from the client through the server and finally to KRA. This
>>>>>> mechanism prevents the server from looking at the unencrypted data.
>>>>>
>>>>> OK, but that does not justify that it's broken in server-side API. It
>>>>> can and should be done so that it works the same way on both client
>>>>> and
>>>>> server.
>>>>>
>>>>> I think the best solution would be to split the command into two
>>>>> commands, server-side vault_archive_raw to archive already encrypted
>>>>> data, and client-side vault_archive to encrypt data and archive them
>>>>> with vault_archive_raw in its .execute(). Same thing for
>>>>> vault_retrieve.
>>>>
>>>> Actually I think it's better to just merge the add and archive,
>>>> reducing
>>>> the number of RPC calls. The vault_archive now will support two
>>>> types of
>>>> operations:
>>>>
>>>> (a) Archive data into a new vault (it will create the vault just before
>>>> archiving the data):
>>>>
>>>>    $ ipa vault-archive testvault --create --in data ...
>>>>
>>>> (b) Archive data into an existing vault:
>>>>
>>>>    $ ipa vault-archive testvault --in data ...
>>>>
>>>> The vault_add is now just a wrapper for the vault_archive(a).
>>>
>>> If that's just an implementation detail, OK.
>>>
>>> If it's possible to modify existing vault objects using vault-add or
>>> create new objects using vault-archive, then NACK.
>>
>> The vault-archive is now completely separate from vault-add. You can no
>> longer archive data while creating a vault:
>>
>>    $ ipa vault-add test
>>    $ ipa vault-archive test --in secret.bin
>
> OK.
>
>>
>>>>> BTW, I also think it would be better if there were 2 separate sets of
>>>>> commands for binary and textual data
>>>>> (vault_{archive,retrieve}_{data,text}) rather than trying to handle
>>>>> everything in vault_{archive,retrieve}.
>>>>
>>>> I don't think we want to provide a separate command of every possible
>>>> data type & operation combination. Users would get confused. The
>>>> archive
>>>> & retrieve commands should be able to handle all current & future data
>>>> types with options.
>>>
>>> A command with two sets of mutually exclusive options is really two
>>> commands in disguise, which is a good sign it should be divided into two
>>> actual commands.
>>>
>>> Who are you to say users would get confused? I say they would be more
>>> confused by a command with a plethora of mutually exclusive "options".
>>>
>>> What other possible data types are there?
>>
>> The patch has been simplified to support only binary data. The data can
>> be specified using either of these options:
>>
>>    $ ipa vault-archive test --data <base-64 encoded data>
>>    $ ipa vault-archive test --in <input file>
>>
>> I don't think we want to provide two separate commands for these options
>> although they are mutually exclusive, do we?
>
> No, these are not really 2 different options, but rather 2 forms of the
> same option, which for the lack of better support for files in the
> framework have to be done as 2 options.
>
>>
>>>>>> The add & archive combination was added for convenience, not for
>>>>>> optimization. This way you would be able to archive data into a new
>>>>>> vault using a single command. Without this, you'd have to execute two
>>>>>> separate commands: add & archive, which will result in 2 RPC calls
>>>>>> anyway.
>>>>>
>>>>> I think I would prefer if it was separate, as that would be consistent
>>>>> with other plugins (e.g. for objects with members, we don't allow
>>>>> adding
>>>>> members directly in -add, you have to use -add-member after -add).
>>>>
>>>> The vault data is similar to group description, not group members. When
>>>> creating a group we can supply the description. If not specified it
>>>> will
>>>> be blank. Archiving vault data is similar to updating the group
>>>> description.
>>>
>>> It's similar to group members because there are separate commands to
>>> manipulate them.
>>
>> Just because there are separate commands doesn't mean vault data
>> (single-valued) is similar to group members (multi-valued). It uses
>> separate commands because of the encryption steps involved while
>> archiving/retrieving data.
>
> That was not the point, but whatever.
>
>>
>>> You have to choose one of:
>>>
>>>    a) vault data is settable using vault-add and vault-mod and gettable
>>> using vault-mod, vault-show and vault-find
>>>
>>>    b) vault data is settable using vault-archive and gettable using
>>> vault-retrieve
>>>
>>> Anything in between is not permitted.
>>
>> As mentioned above, the add and archive commands are now separate.
>>
>>>>>>> 21) vault_archive is not a retrieve operation, it should be based on
>>>>>>> LDAPUpdate instead of LDAPRetrieve. Or Command actually, since it
>>>>>>> does
>>>>>>> not do anything with LDAP. The same applies to vault_retrieve.
>>>>>>
>>>>>> The vault_archive does not actually modify the LDAP entry because it
>>>>>> stores the data in KRA. It is actually an LDAPRetrieve operation
>>>>>> because
>>>>>> it needs to get the vault info before it can perform the archival
>>>>>> operation. Same thing with vault_retrieve.
>>>>>
>>>>> It is not a LDAPRetrieve operation, because it has different
>>>>> semantics.
>>>>> Please use Command as base class and either use ldap2 for direct
>>>>> LDAP or
>>>>> call vault_show instead of hacking around LDAPRetrieve.
>>>>
>>>> It's been changed to inherit from LDAPQuery instead.
>>>
>>> NACK, it's not a LDAPQuery operation, because it has different
>>> semantics. There is more to a command than executing code, so you should
>>> use a correct base class.
>>
>> Changed to inherit from Command as requested. Now these commands no
>> longer have a direct access to the vault object (self.obj) although they
>> are accessing vault objects like other vault commands. Also now the
>> vault name argument has to be added explicitly on each command.
>
> You can inherit from crud.Retrieve and crud.Update to get self.obj and
> the argument back.
>
>>
>> There are existing commands that inherit from LDAPQuery while doing
>> other things too, so the 'semantic' seems to be kind of arbitrarily
>> defined and subjective.
>
> There is a lot of existing stuff in IPA that is bad in one way or
> another, but it doesn't mean new code should be bad as well.
>
>>
>>>>>>> 22) vault_archive will break with binary data that is not UTF-8
>>>>>>> encoded
>>>>>>> text.
>>>>>>>
>>>>>>> This is where it occurs:
>>>>>>>
>>>>>>> +        vault_data[u'data'] = unicode(data)
>>>>>>>
>>>>>>> Generally, don't use unicode() on str values and str() on unicode
>>>>>>> values
>>>>>>> directly, always use .decode() and .encode().
>>>>
>>>> The unicode(s, encoding) is actually equivalent to s.decode(encoding),
>>>> so the following code will not solve the problem:
>>>>
>>>>    vault_data[u'data'] = data.decode()
>>>>
>>>> As you said, decode() will only work if the data being decoded actually
>>>> follows the encoding rules (i.e. already UTF-8 encoded).
>>>>
>>>>>> It needs to be a Unicode because json.dumps() doesn't work with
>>>>>> binary
>>>>>> data. Fixed by adding base-64 encoding.
>>>>
>>>> The base-64 encoding is necessary to convert random binaries into ASCII
>>>> so it can be decoded into Unicode. Here is the current code:
>>>>
>>>>    vault_data[u'data'] = unicode(base64.b64encode(data))
>>>>
>>>> which is equivalent to:
>>>>
>>>>    vault_data[u'data'] = base64.b64encode(data).decode()
>>>
>>> If you read a little bit further, you would get to the point, which is
>>> certainly not calling .decode() without arguments, but *always
>>> explicitly specify the encoding*.
>>
>> Added the explicit encoding name although it's not necessary since the
>> data being encoded/decoded is base-64 encoded (i.e. ASCII).
>
> It may not be necessary but it doesn't hurt either, anyway my main
> concern was with the other uses of unicode() in the original patch.
>
>>
>>>>> If something str needs to be unicode, you should use .decode() to
>>>>> explicitly specify the encoding, instead of relying on unicode() to
>>>>> pick
>>>>> the correct one.
>>>>
>>>> Since we know this is ASCII data we can now specify UTF-8 encoding.
>>>>
>>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>>>>
>>>> But for anything that comes from user input (e.g. filenames,
>>>> passwords),
>>>> it's better to use the default encoding because that can be configured
>>>> by the user.
>>>
>>> You are confusing user's configured encoding with Python's default
>>> encoding. Default encoding in Python isn't derived from user's
>>> localization settings.
>>>
>>> Anyway, anything that comes from user input is already decoded using
>>> user's configured encoding when it enters the framework so I don't know
>>> why are you even bringing it up here.
>>
>> It's irrelevant now that the command only supports binary data.
>
> OK.
>
>>
>>>>> Anyway, I think a better solution than base64 would be to use the
>>>>> "raw_unicode_escape" encoding:
>>>>
>>>> As explained above, base-64 encoding is necessary because random
>>>> binaries don't follow any encoding rules. I'd rather not use
>>>> raw_unicode_escape because it's not really a text data.
>>>
>>> The result of decoding binary data using raw_unicode_escape is perfectly
>>> valid unicode data which doesn't eat 33% more space like base64 encoded
>>> binary does, hence my suggestion.
>>>
>>> Anyway, it turns out when encoded in JSON, raw_unicode_escape string
>>> generally takes more space than base64 encoded string because of JSON
>>> escaping, so base64 is indeed better.
>>
>> Unchanged. It still uses base-64 encoding.
>
> Right.
>
>>
>>>> Here's how it's
>>>> now implemented:
>>>>
>>>>>      if data:
>>>>>          data = data.decode('raw_unicode_escape')
>>>>
>>>> Input data is already in binaries, no conversion needed.
>>>>
>>>>>      elif text:
>>>>>          data = text
>>>>
>>>> Input text will be converted to binaries with default encoding:
>>>>
>>>>    data = text.encode()
>>>
>>> See what the default encoding actually is and why you shouldn't rely on
>>> it above.
>>
>> Irrelevant now.
>>
>>>>>      elif input_file:
>>>>>          with open(input_file, 'rb') as f:
>>>>>              data = f.read()
>>>>>          data = data.decode('raw_unicode_escape')
>>>>
>>>> Input contains binary data, no conversion needed.
>>>>
>>>>>      else:
>>>>>          data = u''
>>>>
>>>> If not specified, the data will be empty string:
>>>>
>>>>    data = ''
>>>>
>>>> The data needs to be converted into binaries so it can be encrypted
>>>> before transport (depending on the vault type):
>>>>
>>>>    data = self.obj.encrypt(data, ...)
>>>>
>>>>>      vault_data[u'data'] = data
>>>>
>>>> Then for transport the data is base-64 encoded first, then converted
>>>> into Unicode:
>>>>
>>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>>
>
> If KRA is not installed, vault-archive and vault-retrieve fail with
> internal error.

On a related note, since KRA is optional, can we move the vaults 
container to cn=kra,cn=vaults? This is the convetion used by the other 
optional components (DNS and recently CA).

>
>
> The commands still behave differently based on whether they were called
> from API which was initialized with in_server set to True or False.
>
>
> There is no point in exposing the session_key, nonce and vault_data
> options in CLI when their value is always overwritten in forward().
>
>
> Will this always succeed?
>
> +        # deactivate vault record in KRA
> +        response = kra_client.keys.list_keys(
> +            client_key_id, pki.key.KeyClient.KEY_STATUS_ACTIVE)
> +
> +        for key_info in response.key_infos:
> +            kra_client.keys.modify_key_status(
> +                key_info.get_key_id(),
> +                pki.key.KeyClient.KEY_STATUS_INACTIVE)
>
> If not, we might get into an inconsistent state, where the vault is
> deleted in LDAP but still active in KRA. (I'm not sure if this is
> actually a problem or not.)
>

-- 
Jan Cholasta