[Freeipa-devel] [PATCH] 1079 address CA subsystem renewal issues

Petr Viktorin pviktori at redhat.com
Tue Jan 15 14:41:06 UTC 2013


On 01/14/2013 10:56 PM, Rob Crittenden wrote:
> Petr Viktorin wrote:
>> On 01/12/2013 12:49 AM, Rob Crittenden wrote:
>>> Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
>>>>> On 01/07/2013 05:42 PM, Rob Crittenden wrote:
>>>>>> Petr Viktorin wrote:
>>>>>>> On 01/07/2013 03:09 PM, Rob Crittenden wrote:
>>>>>>>> Petr Viktorin wrote:
>>>>> [...]
>>>>>>>>>
>>>>>>>>> Works for me, but I have some questions (this is an area I know
>>>>>>>>> little about).
>>>>>>>>>
>>>>>>>>> Can we be 100% sure these certs are always renewed together? Is
>>>>>>>>> certmonger the only possible mechanism to update them?
>>>>>>>>
>>>>>>>> You raise a good point. If through some mechanism someone replaces
>>>>>>>> one of these certs, it will cause the script to fail. Some
>>>>>>>> notification of this failure will be logged though, and of course,
>>>>>>>> the certs won't be renewed.
>>>>>>>>
>>>>>>>> One could conceivably manually renew one of these certificates. It
>>>>>>>> is probably a very remote possibility but it is non-zero.
>>>>>>>>
>>>>>>>>> Can we be sure certmonger always does the updates in parallel? If
>>>>>>>>> it managed to update the audit cert before starting on the others,
>>>>>>>>> we'd get no CA restart for the others.
>>>>>>>>
>>>>>>>> These all get issued at the same time so should expire at the same
>>>>>>>> time as well (see problem above). The script will hang around for
>>>>>>>> 10 minutes waiting for the renewal to complete, then give up.
>>>>>>>
>>>>>>> The certs might take different amounts of time to update, right?
>>>>>>> Eventually, the expirations could go out of sync enough for it to
>>>>>>> matter. AFAICS, without proper locking we still get a race condition
>>>>>>> when the other certs start being renewed some time (much less than
>>>>>>> 10 min) after the audit one:
>>>>>>>
>>>>>>> (time axis goes down)
>>>>>>>
>>>>>>>          audit cert                  other cert
>>>>>>>          ----------                  ----------
>>>>>>>      certmonger does renew                .
>>>>>>>    post-renew script starts               .
>>>>>>>   check state of other certs: OK          .
>>>>>>>              .                   certmonger starts renew
>>>>>>>   certutil modifies NSS DB  +  certmonger modifies NSS DB  == boom!
>>>>>>
>>>>>> This can't happen because we count the # of expected certs and wait
>>>>>> until all are in MONITORING before continuing.
>>>>>
>>>>> The problem is that they're also in MONITORING before the whole
>>>>> renewal starts. If the script happens to check just before the state
>>>>> changes from MONITORING to GENERATING_CSR or whatever, we can get
>>>>> corruption.
>>>>>
>>>>>> The worst that would happen is the trust wouldn't be set on the
>>>>>> audit cert and dogtag wouldn't be restarted.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The state the system would be in is this:
>>>>>>>>
>>>>>>>> - audit cert trust not updated, so next restart of CA will fail
>>>>>>>> - CA is not restarted so will not use updated certificates
>>>>>>>>
>>>>>>>>> And anyway, why does certmonger do renewals in parallel? It seems
>>>>>>>>> that if it did one at a time, always waiting until the post-renew
>>>>>>>>> script is done, this patch wouldn't be necessary.
>>>>>>>>>
>>>>>>>>
>>>>>>>> From what Nalin told me certmonger has some coarse locking such
>>>>>>>> that renewals in the same NSS database are serialized. As you point
>>>>>>>> out, it would be nice to extend this locking to the post renewal
>>>>>>>> scripts. We can ask Nalin about it. That would fix the potential
>>>>>>>> corruption issue. It is still much nicer to not have to restart
>>>>>>>> dogtag 4 times.
>>>>>>>>
>>>>>>>
>>>>>>> Well, three extra restarts every few years seems like a small price
>>>>>>> to pay for robustness.
>>>>>>
>>>>>> It is a bit of a problem though because the certs all renew within
>>>>>> seconds so end up fighting over who is restarting dogtag. This can
>>>>>> cause some renewals to go into a failure state to be retried later.
>>>>>> This is fine functionally but makes QE a bit of a pain. You then have
>>>>>> to make sure that renewal is basically done, then restart certmonger
>>>>>> and check everything again, over and over until all the certs are
>>>>>> renewed. This is difficult to automate.
>>>>>
>>>>> So we need to extend the certmonger lock, and wait until Dogtag is
>>>>> back up before exiting the script. That way it'd still take longer
>>>>> than 1 restart, but all the renews should succeed.
>>>>>
>>>>
>>>> Right, but older dogtag versions don't have the handy servlet to tell
>>>> that the service is actually up and responding. So it is difficult to
>>>> tell from tomcat alone whether the CA is actually up and handling
>>>> requests.
>>>>
>>>
>>> Revised patch that takes advantage of new version of certmonger.
>>> certmonger-0.65 adds locking from the time renewal begins to the end of
>>> the post_save_command. This lets us be sure that no other certmonger
>>> renewals will have the NSS database open in read-write mode.
>>>
>>> We need to be sure that tomcat is shut down before we let certmonger
>>> save the certificate to the NSS database because dogtag opens its
>>> database read/write and two writers can cause corruption.
>>>
>>> rob
>>>
>>
>> stop_pkicad and start_pkicad need the Dogtag version check to select
>> pki_cad/pki_tomcatd.
>
> Fixed.
>
>>
>> A more serious issue is that stop_pkicad needs to be installed on
>> upgrades. Currently the whole enable_certificate_renewal step in
>> ipa-upgradeconfig is skipped if it was done before.
>
> I added a separate upgrade test for this. It currently won't work in
> SELinux enforcing mode because certmonger isn't allowed to talk to dbus
> in an rpm post script. It's being looked at.
>
>> In stop_pkicad can you change the first log message to "certmonger
>> stopping %sd"? It's before the action so we don't want past tense.
>
> Fixed.
>
> rob

I get a bunch of errors when installing the RPM:

   Updating   : freeipa-server-3.1.0GITfe82329-0.fc18.x86_64            4/14
certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20240902001817' returned non-zero exit status 1
certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20240902001813' returned non-zero exit status 1
certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20240902001814' returned non-zero exit status 1
certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20240902001815' returned non-zero exit status 1
certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20240902001816' returned non-zero exit status 1
certmonger failed to start tracking certificate: Command '/usr/bin/getcert start-tracking -d /etc/pki/pki-tomcat/alias -n auditSigningCert cert-pki-ca -c dogtag-ipa-renew-agent -B /usr/lib64/ipa/certmonger/stop_pkicad -C /usr/lib64/ipa/certmonger/renew_ca_cert "auditSigningCert cert-pki-ca" -P XXXXXXXX' returned non-zero exit status 1
certmonger failed to start tracking certificate: Command '/usr/bin/getcert start-tracking -d /etc/pki/pki-tomcat/alias -n ocspSigningCert cert-pki-ca -c dogtag-ipa-renew-agent -B /usr/lib64/ipa/certmonger/stop_pkicad -C /usr/lib64/ipa/certmonger/renew_ca_cert "ocspSigningCert cert-pki-ca" -P XXXXXXXX' returned non-zero exit status 1
certmonger failed to start tracking certificate: Command '/usr/bin/getcert start-tracking -d /etc/pki/pki-tomcat/alias -n subsystemCert cert-pki-ca -c dogtag-ipa-renew-agent -B /usr/lib64/ipa/certmonger/stop_pkicad -C /usr/lib64/ipa/certmonger/renew_ca_cert "subsystemCert cert-pki-ca" -P XXXXXXXX' returned non-zero exit status 1
certmonger failed to start tracking certificate: Command '/usr/bin/getcert start-tracking -d /etc/httpd/alias -n ipaCert -c dogtag-ipa-renew-agent -C /usr/lib64/ipa/certmonger/renew_ra_cert -p /etc/httpd/alias/pwdfile.txt' returned non-zero exit status 1
certmonger failed to start tracking certificate: Command '/usr/bin/getcert start-tracking -d /etc/pki/pki-tomcat/alias -n Server-Cert cert-pki-ca -c dogtag-ipa-renew-agent -P XXXXXXXX' returned non-zero exit status 1


For each stop-tracking the ipaupgrade.log says:
2030-07-20T04:07:40Z DEBUG Starting external process
2030-07-20T04:07:40Z DEBUG args=/usr/bin/getcert stop-tracking -i 20280801040707
2030-07-20T04:08:11Z DEBUG Process finished, return code=1
2030-07-20T04:08:11Z DEBUG stdout=Please verify that the certmonger service is still running.

2030-07-20T04:08:11Z DEBUG stderr=
2030-07-20T04:08:11Z ERROR certmonger failed to stop tracking certificate: Command '/usr/bin/getcert stop-tracking -i 20280801040707' returned non-zero exit status 1
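
Note the timestamps in that excerpt: the process starts at 04:07:40 and finishes at 04:08:11. A quick way to get the gap is sketched below. The two strings are copied from the log lines above; `elapsed_seconds` and `LOG_TS_FORMAT` are just names made up for this illustration, not anything in IPA.

```python
from datetime import datetime

# Timestamp layout used at the start of each ipaupgrade.log line shown above.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TS_FORMAT)
    finished = datetime.strptime(end, LOG_TS_FORMAT)
    return (finished - started).total_seconds()


# "Starting external process" vs. "Process finished" from the excerpt.
print(elapsed_seconds("2030-07-20T04:07:40Z", "2030-07-20T04:08:11Z"))  # 31.0
```

So getcert sat there for about half a minute before returning 1, which looks more like it waited on something and gave up than like an immediate failure. That last part is only a guess from the timing.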

If I run the same command by hand, it removes the request without problems.
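
To compare a by-hand run with what the upgrade logged, a small helper along these lines could record the same three things (return code, stdout, stderr) plus how long the call took. This is only a sketch: `run_timed` is a hypothetical name, and it is not how ipa-upgradeconfig itself runs commands.

```python
import subprocess
import time


def run_timed(args):
    """Run a command and return (returncode, stdout, stderr, seconds)."""
    start = time.monotonic()
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return proc.returncode, proc.stdout, proc.stderr, time.monotonic() - start
```

For the request in the log excerpt, the call would be `run_timed(["/usr/bin/getcert", "stop-tracking", "-i", "20280801040707"])`. Run from a normal shell, I would expect return code 0 and a much shorter elapsed time than during the RPM update, but I have not measured that.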

-- 
Petr³