[Linux-cluster] Ricci doesn't work

Chip Burke CBurke at innova-partners.com
Thu Sep 6 15:11:17 UTC 2012

Well, that was an easy enough fix in the end. I thought perhaps the password
for the VMware fence account was the issue and updated cluster.conf with a
placeholder password of 'password'. Ricci still would not work. So I updated
the actual ricci user account to use a password of 'password' and
restarted Ricci on all of the nodes. Ricci now works. So it clearly
did not like a character in the password I was using, which was
65peC&E$taFRE&U. In all likelihood the & was the problem character. To
confirm that hypothesis, I changed the Ricci password to 65peC&E$taFREU
and everything still worked as expected. So there is our answer. From your
standpoint I don't know whether that needs to be coded around, but at
least we know how to reproduce the issue.
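Assuming the password really does end up unescaped inside an XML request, coding around it would mean escaping XML's special characters before the value is embedded. A minimal POSIX sh sketch of that escaping follows; the function name xml_escape is hypothetical, and this is not how ccs or ricci actually build their messages:

```sh
#!/bin/sh
# Hypothetical helper (not part of ccs or ricci): escape the five XML
# special characters so a value can be embedded in XML text or in a
# quoted attribute. '&' is replaced first; otherwise the '&' introduced
# by the other entities would itself be escaped a second time.
xml_escape() {
    printf '%s' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}

# Single quotes keep the shell from expanding $taFRE.
printf '%s\n' "$(xml_escape '65peC&E$taFRE&U')"   # prints 65peC&amp;E$taFRE&amp;U
```

The order of the sed expressions matters: escaping '&' last would turn every '&lt;' into '&amp;lt;'.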

Thanks again for sticking with me on this even if the cause was somewhat

Chip Burke

On 9/6/12 6:11 AM, "Jan Pokorný" <jpokorny at redhat.com> wrote:

>On 05/09/12 16:21 +0000, Chip Burke wrote:
>> This gives me the same behavior.
>Sorry, I can now see it was a bad workaround guess from the beginning.
>I think the strace logs you provided contain good-enough information
>about the issue, but I am still scratching my head.
>Part of it is that there are two sub-issues, and I am not sure whether
>they are independent or causally related.
>The first one is hidden and very probably innocent: the EPIPE/SIGPIPE
>present in ricci.strace.5573.  First, there is an extra empty read
>because (I think) the client of ricci has shut down the connection
>first (seemingly an ungraceful way of disconnecting, but still
>tolerable), whereas ricci, despite this, tries to send a close_notify
>message so as to achieve the expected graceful disconnection.
>Apparently, this fails in this case, accompanied by EPIPE/SIGPIPE.
>However, the second one, the inability to process the request (it fails
>upon timeout, as can be seen in ricci.strace.5575), is severe.
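The EPIPE/SIGPIPE behaviour of the first issue can be reproduced locally. The sketch below uses a pipe instead of ricci's TLS socket, purely as an illustration; writing to a socket whose peer has gone away fails the same way. The writer keeps writing after the reader has exited, so it is either killed by SIGPIPE or, if SIGPIPE is ignored, its write() fails with EPIPE. The function name pipe_writer_status is made up for this example:

```sh
#!/bin/sh
# Report the exit status of a writer whose reader has already gone away.
# 'yes' writes "y" lines forever; 'head -n 1' exits after the first line,
# closing the read end of the pipe, so a later write by 'yes' fails.
# Only the status goes to fd 3 (the caller's stdout); everything 'yes'
# writes goes into the pipe.
pipe_writer_status() {
    { { yes; echo "$?" >&3; } | head -n 1 >/dev/null; } 3>&1
}

rc=$(pipe_writer_status)
# 141 = 128 + 13 (SIGPIPE) when the default action killed the writer;
# some other non-zero status (1 for GNU yes) if SIGPIPE was ignored and
# write() returned EPIPE instead.
echo "writer exit status: $rc"
```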
>> read(5, "\27\3\1\0p", 5)                = 5
>> read(5, 
>>o\361\21\265\266p"..., 112) = 112
>Here the last two of the first five bytes (0x00 0x70) give the size of
>the whole message (112), which is indeed read in full as expected.
>Ricci should *not* keep trying to read past this point, as the whole XML
>message should have been received by then.  But for some
>reason it does (see the subsequent poll with the POLLIN flag).
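For reference, those first five bytes form a TLS record header: one byte of content type (23 is application_data), two bytes of protocol version (3.1 is how TLS 1.0 is encoded) and a two-byte big-endian length. strace prints them as octal escapes, so "\27\3\1\0p" is 23, 3, 1, 0 and 0x70 ('p'). A small sh sketch that decodes it; the function name decode_tls_header is hypothetical:

```sh
#!/bin/sh
# Decode a 5-byte TLS record header given as five decimal byte values:
# content type, version major, version minor, length high, length low.
decode_tls_header() {
    ctype=$1 vmaj=$2 vmin=$3 len=$(( $4 * 256 + $5 ))
    echo "type=$ctype version=$vmaj.$vmin length=$len"
}

# Rebuild the header from the strace line read(5, "\27\3\1\0p", 5):
# printf emits the raw bytes, od -An -tu1 prints them as unsigned decimals,
# and the unquoted command substitution splits them into five arguments.
decode_tls_header $(printf '\027\003\001\000p' | od -An -tu1)
# prints type=23 version=3.1 length=112
```

The computed length, 0 * 256 + 112 = 112, matches the size of the follow-up read.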
>The easiest explanation is that this XML is not well-formed, which
>would come down to your obfuscated password (no offense meant, obfuscating
>it is perfectly reasonable).  Did your password contain any XML-unfriendly
>character, such as one of '<>"&'?  If so, could you please try digits,
>ASCII letters and surely-safe characters only (dot, dash, etc.)?
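Checking a candidate password for those four characters before trying it only needs a case pattern. A minimal POSIX sh sketch; the helper name has_xml_special and the sample passwords are made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the argument contains any of
# the XML-unfriendly characters < > " & ; fail (status 1) otherwise.
has_xml_special() {
    case $1 in
        *'<'* | *'>'* | *'"'* | *'&'*) return 0 ;;
        *) return 1 ;;
    esac
}

for pw in 'secret&1' 'secret-1.x'; do
    if has_xml_special "$pw"; then
        echo "$pw: contains XML-unfriendly characters"
    else
        echo "$pw: looks safe"
    fi
done
```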
>As outlined, these two issues may even be interconnected (bearing in
>mind the OpenSSL error queue in the main thread, which is not cleared
>explicitly as it probably should be).  I am going to look more
>into it (perhaps put together a simple client for you to try) once
>I know your situation with the password.
>If there is nothing suspicious about the password you use to authenticate
>against ricci, the inverse of the previously suggested workaround could
>be tried (manually pre-authenticating ccs against ricci);
>from the host ccs runs on, something along these lines:
>$ ccs ...  # if ~/.ccs/cacert.pem does not exist yet
>$ RICCIHOST=machina
>$ RICCICERT=$(mktemp -u /var/lib/ricci/certs/clients/client_cert_XXXXXX)
>$ scp ~/.ccs/cacert.pem root@$RICCIHOST:$RICCICERT
>$ ssh root@$RICCIHOST chown ricci:root $RICCICERT
>$ ssh root@$RICCIHOST restorecon $RICCICERT  # if using SELinux
>$ ssh root@$RICCIHOST service ricci restart
>$ ccs ...
>Linux-cluster mailing list
>Linux-cluster at redhat.com
