[Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Rob Crittenden rcritten at redhat.com
Mon Jan 14 21:56:42 UTC 2013


Petr Viktorin wrote:
> On 01/12/2013 12:49 AM, Rob Crittenden wrote:
>> Rob Crittenden wrote:
>>> Petr Viktorin wrote:
>>>> On 01/07/2013 05:42 PM, Rob Crittenden wrote:
>>>>> Petr Viktorin wrote:
>>>>>> On 01/07/2013 03:09 PM, Rob Crittenden wrote:
>>>>>>> Petr Viktorin wrote:
>>>> [...]
>>>>>>>>
>>>>>>>> Works for me, but I have some questions (this is an area I know
>>>>>>>> little
>>>>>>>> about).
>>>>>>>>
>>>>>>>> Can we be 100% sure these certs are always renewed together? Is
>>>>>>>> certmonger the only possible mechanism to update them?
>>>>>>>
>>>>>>> You raise a good point. If through some mechanism someone replaces
>>>>>>> one of
>>>>>>> these certs it will cause the script to fail. Some notification of
>>>>>>> this
>>>>>>> failure will be logged though, and of course, the certs won't be
>>>>>>> renewed.
>>>>>>>
>>>>>>> One could conceivably manually renew one of these certificates.
>>>>>>> It is
>>>>>>> probably a very remote possibility but it is non-zero.
>>>>>>>
>>>>>>>> Can we be sure certmonger always does the updates in parallel?
>>>>>>>> If it
>>>>>>>> managed to update the audit cert before starting on the others,
>>>>>>>> we'd
>>>>>>>> get
>>>>>>>> no CA restart for the others.
>>>>>>>
>>>>>>> These all get issued at the same time so should expire at the same
>>>>>>> time
>>>>>>> as well (see problem above). The script will hang around for 10
>>>>>>> minutes
>>>>>>> waiting for the renewal to complete, then give up.
>>>>>>
>>>>>> The certs might take different amounts of time to update, right?
>>>>>> Eventually, the expirations could go out of sync enough for it to
>>>>>> matter.
>>>>>> AFAICS, without proper locking we still get a race condition when the
>>>>>> other certs start being renewed some time (much less than 10 min)
>>>>>> after
>>>>>> the audit one:
>>>>>>
>>>>>> (time axis goes down)
>>>>>>
>>>>>>          audit cert                  other cert
>>>>>>          ----------                  ----------
>>>>>>      certmonger does renew                .
>>>>>>    post-renew script starts               .
>>>>>>   check state of other certs: OK          .
>>>>>>              .                   certmonger starts renew
>>>>>>   certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!
>>>>>
>>>>> This can't happen because we count the # of expected certs and wait
>>>>> until all are in MONITORING before continuing.
>>>>
>>>> The problem is that they're also in MONITORING before the whole renewal
>>>> starts. If the script happens to check just before the state changes
>>>> from MONITORING to GENERATING_CSR or whatever, we can get corruption.
>>>>
>>>>> The worst that would
>>>>> happen is the trust wouldn't be set on the audit cert and dogtag
>>>>> wouldn't be restarted.
>>>>>
>>>>>>
>>>>>>
>>>>>>> The state the system would be in is this:
>>>>>>>
>>>>>>> - audit cert trust not updated, so next restart of CA will fail
>>>>>>> - CA is not restarted so will not use updated certificates
>>>>>>>
>>>>>>>> And anyway, why does certmonger do renewals in parallel? It seems
>>>>>>>> that
>>>>>>>> if it did one at a time, always waiting until the post-renew
>>>>>>>> script is
>>>>>>>> done, this patch wouldn't be necessary.
>>>>>>>>
>>>>>>>
>>>>>>> From what Nalin told me certmonger has some coarse locking such
>>>>>>> that
>>>>>>> renewals in the same NSS database are serialized. As you point
>>>>>>> out, it
>>>>>>> would be nice to extend this locking to the post renewal scripts. We
>>>>>>> can
>>>>>>> ask Nalin about it. That would fix the potential corruption issue.
>>>>>>> It is
>>>>>>> still much nicer to not have to restart dogtag 4 times.
>>>>>>>
>>>>>>
>>>>>> Well, three extra restarts every few years seems like a small
>>>>>> price to
>>>>>> pay for robustness.
>>>>>
>>>>> It is a bit of a problem though because the certs all renew within
>>>>> seconds so end up fighting over who is restarting dogtag. This can
>>>>> cause
>>>>> some renewals to go into a failure state to be retried later. This is
>>>>> fine
>>>>> functionally but makes QE a bit of a pain. You then have to make sure
>>>>> that renewal is basically done, then restart certmonger and check
>>>>> everything again, over and over until all the certs are renewed.
>>>>> This is
>>>>> difficult to automate.
>>>>
>>>> So we need to extend the certmonger lock, and wait until Dogtag is back
>>>> up before exiting the script. That way it'd still take longer than 1
>>>> restart, but all the renews should succeed.
>>>>
>>>
>>> Right, but older dogtag versions don't have the handy servlet to tell
>>> that the service is actually up and responding. So it is difficult to
>>> tell from tomcat alone whether the CA is actually up and handling
>>> requests.
>>>
>>
>> Revised patch that takes advantage of new version of certmonger.
>> certmonger-0.65 adds locking from the time renewal begins to the end of
>> the post_save_command. This lets us be sure that no other certmonger
>> renewals will have the NSS database open in read-write mode.
>>
>> We need to be sure that tomcat is shut down before we let certmonger
>> save the certificate to the NSS database because dogtag opens its
>> database read/write and two writers can cause corruption.
>>
>> rob
>>
>
> stop_pkicad and start_pkicad need the Dogtag version check to select
> pki_cad/pki_tomcatd.

Fixed.
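
For illustration, the version check boils down to something like the sketch below. This is not the code from the patch: the helper name service_name_for is hypothetical, and the cutoff (Dogtag 10 and later use pki_tomcatd, earlier releases use pki_cad) is an assumption stated here for the example.

```python
def service_name_for(dogtag_version):
    """Return the CA service name for a Dogtag major version.

    Hypothetical helper, not part of the patch.
    Assumption: Dogtag 10 and later run as pki_tomcatd,
    earlier releases run as pki_cad.
    """
    if dogtag_version >= 10:
        return "pki_tomcatd"
    return "pki_cad"
```

Both stop_pkicad and start_pkicad would call the same helper so they cannot disagree about which service to act on.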

>
> A more serious issue is that stop_pkicad needs to be installed on
> upgrades. Currently the whole enable_certificate_renewal step in
> ipa-upgradeconfig is skipped if it was done before.

I added a separate upgrade test for this. It currently won't work in 
SELinux enforcing mode because certmonger isn't allowed to talk to dbus 
in an rpm post script. It's being looked at.

> In stop_pkicad can you change the first log message to "certmonger
> stopping %sd"? It's before the action so we don't want past tense.

Fixed.
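
As a rough sketch of the tense convention (present tense before the action, past tense after), assuming a hypothetical helper stop_log_messages and an instance name such as "pki-ca" that gets the trailing "d" from the format string. Only the "stopping" wording comes from the review; the "stopped" message is my own assumption for the example.

```python
def stop_log_messages(instance_name):
    """Return the (before, after) log messages for stopping the CA.

    Hypothetical helper, not part of the patch. The instance name is
    assumed to lack the trailing "d", which the format string adds.
    """
    # Logged before the action, so present tense.
    before = "certmonger stopping %sd" % instance_name
    # Logged once the stop has completed (assumed wording).
    after = "certmonger stopped %sd" % instance_name
    return before, after
```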

rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1079-3-renewal.patch
Type: text/x-patch
Size: 27128 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20130114/94fd27cf/attachment.bin>

