[Freeipa-devel] [PATCH] Password vault

Thu May 28 05:46:19 UTC 2015

Dne 28.5.2015 v 07:43 Jan Cholasta napsal(a):
> Dne 27.5.2015 v 07:39 Jan Cholasta napsal(a):
>> Dne 27.5.2015 v 02:38 Endi Sukma Dewata napsal(a):
>>> Please take a look at the attached patch to add vault-archive/retrieve
>>> commands.
>>>
>>> On 4/20/2015 1:12 AM, Jan Cholasta wrote:
>>>>>>>> 16) You do way too much stuff in vault_add.forward(). Only code
>>>>>>>> that
>>>>>>>> must be done on the client needs to be there, i.e. handling of the
>>>>>>>> "data", "text" and "in" options.
>>>>>>>>
>>>>>>>> The vault_archive call must be in vault_add.execute(), otherwise
>>>>>>>> a) we
>>>>>>>> will be making 2 RPC calls from the client and b) it won't be
>>>>>>>> called at
>>>>>>>> all when api.env.in_server is True.
>>>>>>>
>>>>>>> This is done by design. The vault_add.forward() generates the salt
>>>>>>> and
>>>>>>> the keys. The vault_archive.forward() will encrypt the data. These
>>>>>>> operations have to be done on the client side to secure the
>>>>>>> transport of
>>>>>>> the data from the client through the server and finally to KRA. This
>>>>>>> mechanism prevents the server from looking at the unencrypted data.
>>>>>>
>>>>>> OK, but that does not justify that it's broken in server-side API. It
>>>>>> can and should be done so that it works the same way on both client
>>>>>> and
>>>>>> server.
>>>>>>
>>>>>> I think the best solution would be to split the command into two
>>>>>> commands, server-side vault_archive_raw to archive already encrypted
>>>>>> data, and client-side vault_archive to encrypt data and archive them
>>>>>> with vault_archive_raw in its .execute(). Same thing for
>>>>>> vault_retrieve.
>>>>>
>>>>> Actually I think it's better to just merge the add and archive,
>>>>> reducing
>>>>> the number of RPC calls. The vault_archive now will support two
>>>>> types of
>>>>> operations:
>>>>>
>>>>> (a) Archive data into a new vault (it will create the vault just
>>>>> before
>>>>> archiving the data):
>>>>>
>>>>>    $ ipa vault-archive testvault --create --in data ...
>>>>>
>>>>> (b) Archive data into an existing vault:
>>>>>
>>>>>    $ ipa vault-archive testvault --in data ...
>>>>>
>>>>> The vault_add is now just a wrapper for the vault_archive(a).
>>>>
>>>> If that's just an implementation detail, OK.
>>>>
>>>> If it's possible to modify existing vault objects using vault-add or
>>>> create new objects using vault-archive, then NACK.
>>>
>>> The vault-archive is now completely separate from vault-add. You can no
>>> longer archive data while creating a vault:
>>>
>>>    $ ipa vault-add test
>>>    $ ipa vault-archive test --in secret.bin
>>
>> OK.
>>
>>>
>>>>>> BTW, I also think it would be better if there were 2 separate sets of
>>>>>> commands for binary and textual data
>>>>>> (vault_{archive,retrieve}_{data,text}) rather than trying to handle
>>>>>> everything in vault_{archive,retrieve}.
>>>>>
>>>>> I don't think we want to provide a separate command of every possible
>>>>> data type & operation combination. Users would get confused. The
>>>>> archive
>>>>> & retrieve commands should be able to handle all current & future data
>>>>> types with options.
>>>>
>>>> A command with two sets of mutually exclusive options is really two
>>>> commands in disguise, which is a good sign it should be divided into
>>>> two
>>>> actual commands.
>>>>
>>>> Who are you to say users would get confused? I say they would be more
>>>> confused by a command with a plethora of mutually exclusive "options".
>>>>
>>>> What other possible data types are there?
>>>
>>> The patch has been simplified to support only binary data. The data can
>>> be specified using either of these options:
>>>
>>>    $ ipa vault-archive test --data <base-64 encoded data>
>>>    $ ipa vault-archive test --in <input file>
>>>
>>> I don't think we want to provide two separate commands for these options
>>> although they are mutually exclusive, do we?
>>
>> No, these are not really 2 different options, but rather 2 forms of the
>> same option, which for the lack of better support for files in the
>> framework have to be done as 2 options.
>>
>>>
>>>>>>> The add & archive combination was added for convenience, not for
>>>>>>> optimization. This way you would be able to archive data into a new
>>>>>>> vault using a single command. Without this, you'd have to execute
>>>>>>> two
>>>>>>> separate commands: add & archive, which will result in 2 RPC calls
>>>>>>> anyway.
>>>>>>
>>>>>> I think I would prefer if it was separate, as that would be
>>>>>> consistent
>>>>>> with other plugins (e.g. for objects with members, we don't allow
>>>>>> adding
>>>>>> members directly in -add, you have to use -add-member after -add).
>>>>>
>>>>> The vault data is similar to group description, not group members.
>>>>> When
>>>>> creating a group we can supply the description. If not specified it
>>>>> will
>>>>> be blank. Archiving vault data is similar to updating the group
>>>>> description.
>>>>
>>>> It's similar to group members because there are separate commands to
>>>> manipulate them.
>>>
>>> Just because there are separate commands doesn't mean vault data
>>> (single-valued) is similar to group members (multi-valued). It uses
>>> separate commands because of the encryption steps involved while
>>> archiving/retrieving data.
>>
>> That was not the point, but whatever.
>>
>>>
>>>> You have to choose one of:
>>>>
>>>>    a) vault data is settable using vault-add and vault-mod and gettable
>>>> using vault-mod, vault-show and vault-find
>>>>
>>>>    b) vault data is settable using vault-archive and gettable using
>>>> vault-retrieve
>>>>
>>>> Anything in between is not permitted.
>>>
>>> As mentioned above, the add and archive commands are now separate.
>>>
>>>>>>>> 21) vault_archive is not a retrieve operation, it should be
>>>>>>>> based on
>>>>>>>> LDAPUpdate instead of LDAPRetrieve. Or Command actually, since it
>>>>>>>> does
>>>>>>>> not do anything with LDAP. The same applies to vault_retrieve.
>>>>>>>
>>>>>>> The vault_archive does not actually modify the LDAP entry because it
>>>>>>> stores the data in KRA. It is actually an LDAPRetrieve operation
>>>>>>> because
>>>>>>> it needs to get the vault info before it can perform the archival
>>>>>>> operation. Same thing with vault_retrieve.
>>>>>>
>>>>>> It is not a LDAPRetrieve operation, because it has different
>>>>>> semantics.
>>>>>> Please use Command as base class and either use ldap2 for direct
>>>>>> LDAP or
>>>>>> call vault_show instead of hacking around LDAPRetrieve.
>>>>>
>>>>> It's been changed to inherit from LDAPQuery instead.
>>>>
>>>> NACK, it's not a LDAPQuery operation, because it has different
>>>> semantics. There is more to a command than executing code, so you
>>>> should
>>>> use a correct base class.
>>>
>>> Changed to inherit from Command as requested. Now these commands no
>>> longer have a direct access to the vault object (self.obj) although they
>>> are accessing vault objects like other vault commands. Also now the
>>> vault name argument has to be added explicitly on each command.
>>
>> You can inherit from crud.Retrieve and crud.Update to get self.obj and
>> the argument back.
>>
>>>
>>> There are existing commands that inherit from LDAPQuery while doing
>>> other things too, so the 'semantic' seems to be kind of arbitrarily
>>> defined and subjective.
>>
>> There is a lot of existing stuff in IPA that is bad in one way or
>> another, but it doesn't mean new code should be bad as well.
>>
>>>
>>>>>>>> 22) vault_archive will break with binary data that is not UTF-8
>>>>>>>> encoded
>>>>>>>> text.
>>>>>>>>
>>>>>>>> This is where it occurs:
>>>>>>>>
>>>>>>>> +        vault_data[u'data'] = unicode(data)
>>>>>>>>
>>>>>>>> Generally, don't use unicode() on str values and str() on unicode
>>>>>>>> values
>>>>>>>> directly, always use .decode() and .encode().
>>>>>
>>>>> The unicode(s, encoding) is actually equivalent to s.decode(encoding),
>>>>> so the following code will not solve the problem:
>>>>>
>>>>>    vault_data[u'data'] = data.decode()
>>>>>
>>>>> As you said, decode() will only work if the data being decoded
>>>>> actually
>>>>> follows the encoding rules (i.e. already UTF-8 encoded).
>>>>>
>>>>>>> It needs to be a Unicode because json.dumps() doesn't work with
>>>>>>> binary
>>>>>>> data. Fixed by adding base-64 encoding.
>>>>>
>>>>> The base-64 encoding is necessary to convert random binaries into
>>>>> ASCII
>>>>> so it can be decoded into Unicode. Here is the current code:
>>>>>
>>>>>    vault_data[u'data'] = unicode(base64.b64encode(data))
>>>>>
>>>>> which is equivalent to:
>>>>>
>>>>>    vault_data[u'data'] = base64.b64encode(data).decode()
>>>>
>>>> If you read a little bit further, you would get to the point, which is
>>>> certainly not calling .decode() without arguments, but *always
>>>> explicitly specify the encoding*.
>>>
>>> Added the explicit encoding name although it's not necessary since the
>>> data being encoded/decoded is base-64 encoded (i.e. ASCII).
>>
>> It may not be necessary but it doesn't hurt either, anyway my main
>> concern was with the other uses of unicode() in the original patch.
>>
>>>
>>>>>> If something str needs to be unicode, you should use .decode() to
>>>>>> explicitly specify the encoding, instead of relying on unicode() to
>>>>>> pick
>>>>>> the correct one.
>>>>>
>>>>> Since we know this is ASCII data we can now specify UTF-8 encoding.
>>>>>
>>>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>>>>>
>>>>> But for anything that comes from user input (e.g. filenames,
>>>>> passwords),
>>>>> it's better to use the default encoding because that can be configured
>>>>> by the user.
>>>>
>>>> You are confusing user's configured encoding with Python's default
>>>> encoding. Default encoding in Python isn't derived from user's
>>>> localization settings.
>>>>
>>>> Anyway, anything that comes from user input is already decoded using
>>>> user's configured encoding when it enters the framework so I don't know
>>>> why are you even bringing it up here.
>>>
>>> It's irrelevant now that the command only supports binary data.
>>
>> OK.
>>
>>>
>>>>>> Anyway, I think a better solution than base64 would be to use the
>>>>>> "raw_unicode_escape" encoding:
>>>>>
>>>>> As explained above, base-64 encoding is necessary because random
>>>>> binaries don't follow any encoding rules. I'd rather not use
>>>>> raw_unicode_escape because it's not really a text data.
>>>>
>>>> The result of decoding binary data using raw_unicode_escape is
>>>> perfectly
>>>> valid unicode data which doesn't eat 33% more space like base64 encoded
>>>> binary does, hence my suggestion.
>>>>
>>>> Anyway, it turns out when encoded in JSON, raw_unicode_escape string
>>>> generally takes more space than base64 encoded string because of JSON
>>>> escaping, so base64 is indeed better.
>>>
>>> Unchanged. It still uses base-64 encoding.
>>
>> Right.
>>
>>>
>>>>> Here's how it's
>>>>> now implemented:
>>>>>
>>>>>>      if data:
>>>>>>          data = data.decode('raw_unicode_escape')
>>>>>
>>>>> Input data is already in binaries, no conversion needed.
>>>>>
>>>>>>      elif text:
>>>>>>          data = text
>>>>>
>>>>> Input text will be converted to binaries with default encoding:
>>>>>
>>>>>    data = text.encode()
>>>>
>>>> See what the default encoding actually is and why you shouldn't rely on
>>>> it above.
>>>
>>> Irrelevant now.
>>>
>>>>>>      elif input_file:
>>>>>>          with open(input_file, 'rb') as f:
>>>>>>              data = f.read()
>>>>>>          data = data.decode('raw_unicode_escape')
>>>>>
>>>>> Input contains binary data, no conversion needed.
>>>>>
>>>>>>      else:
>>>>>>          data = u''
>>>>>
>>>>> If not specified, the data will be empty string:
>>>>>
>>>>>    data = ''
>>>>>
>>>>> The data needs to be converted into binaries so it can be encrypted
>>>>> before transport (depending on the vault type):
>>>>>
>>>>>    data = self.obj.encrypt(data, ...)
>>>>>
>>>>>>      vault_data[u'data'] = data
>>>>>
>>>>> Then for transport the data is base-64 encoded first, then converted
>>>>> into Unicode:
>>>>>
>>>>>    vault_data[u'data'] = base64.b64encode(data).decode('utf-8')
>>>
>>
>> If KRA is not installed, vault-archive and vault-retrieve fail with
>> internal error.
>
> On a related note, since KRA is optional, can we move the vaults
> container to cn=kra,cn=vaults? This is the convetion used by the other
> optional components (DNS and recently CA).

I mean cn=vaults,cn=kra of course.

>
>>
>>
>> The commands still behave differently based on whether they were called
>> from API which was initialized with in_server set to True or False.
>>
>>
>> There is no point in exposing the session_key, nonce and vault_data
>> options in CLI when their value is always overwritten in forward().
>>
>>
>> Will this always succeed?
>>
>> +        # deactivate vault record in KRA
>> +        response = kra_client.keys.list_keys(
>> +            client_key_id, pki.key.KeyClient.KEY_STATUS_ACTIVE)
>> +
>> +        for key_info in response.key_infos:
>> +            kra_client.keys.modify_key_status(
>> +                key_info.get_key_id(),
>> +                pki.key.KeyClient.KEY_STATUS_INACTIVE)
>>
>> If not, we might get into an inconsistent state, where the vault is
>> deleted in LDAP but still active in KRA. (I'm not sure if this is
>> actually a problem or not.)
>>
>
>

-- 
Jan Cholasta