[Linux-cluster] Ricci doesn't work

Jan Pokorný jpokorny at redhat.com
Thu Sep 6 10:11:02 UTC 2012


On 05/09/12 16:21 +0000, Chip Burke wrote:
> This gives me the same behavior.

Sorry, I can now see it was a bad workaround guess from the beginning.

I think the strace logs you provided contain good enough information
about the issue, but I am still scratching my head.

Part of it is that there are two sub-issues and I am not sure whether
they are independent or causally related.


The first one is hidden and very probably innocent -- the one present
in ricci.strace.5573 -- EPIPE/SIGPIPE.  First, there is an extra empty
read because (I think) the client of ricci has shut down the connection
first (seems like an ungraceful way of disconnecting, but still tolerable),
but the ricci side, despite this fact, is trying to send a closure notify
message so as to achieve the expected graceful disconnection.
Apparently, this fails in this case, accompanied by EPIPE/SIGPIPE.


However, the second one -- the inability to process the request, failing
upon timeout as can be seen in ricci.strace.5575 -- is severe.
> read(5, "\27\3\1\0p", 5)                = 5
> read(5, "\373\303\16\20>\202%\34\211\214b\\l\260\354\3662\312\272\21<\t\r\235S\31o\361\21\265\266p"..., 112) = 112
Here the two trailing bytes of the first five (0x00 0x70) indicate the
size of the whole message, which is indeed read as expected (112).
Ricci should *not* keep trying to read past this point as the whole XML
message should have been received at this moment.  But for some
reason it does (see the subsequent poll with the POLLIN flag).
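
For reference, the five header bytes can be decoded by hand.  The
following is only a minimal sketch: decode_tls_header is a hypothetical
helper name, and it assumes the bytes are pasted in the escaped form
strace prints them and that they form a standard TLS record header
(1 byte content type, 2 bytes version, 2 bytes big-endian length):

```shell
# Hypothetical helper: decode a 5-byte TLS record header given as a
# printf-style escaped string (the form strace prints).
decode_tls_header() {
    # od prints each byte as an unsigned decimal number;
    # the unquoted expansion is intentional (one word per byte).
    set -- $(printf "$1" | od -An -tu1)
    # $1 = content type, $2.$3 = protocol version, $4 $5 = length
    echo "type=$1 version=$2.$3 length=$(( $4 * 256 + $5 ))"
}

decode_tls_header '\27\3\1\0p'
```

Content type 23 stands for application data and version 3.1 for
TLS 1.0; the length of 112 matches the size of the following read.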

The easiest explanation is that this XML is not well-formed, which
would boil down to the password you obfuscated (no objection to that,
it's highly reasonable).  Did your password contain any XML-unfriendly
character, such as one of '<>"&'?  If so, could you please try digits,
ASCII letters and surely-safe characters only (dot, dash, etc.)?
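
A quick way to check a candidate password without sending it anywhere
might look like the sketch below.  has_xml_special is a hypothetical
helper, not part of ccs or ricci, and it only tests for the four
characters named above:

```shell
# Hypothetical helper: succeed (exit status 0) if the argument contains
# one of the XML-unfriendly characters < > " &
has_xml_special() {
    case $1 in
        *'<'*|*'>'*|*'"'*|*'&'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, with two made-up example strings:
has_xml_special 'pa<ss&word' && echo "contains XML-unfriendly characters"
has_xml_special 'pass.word-1' || echo "looks safe"
```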


As outlined, these two issues may even be interconnected (bearing in
mind the OpenSSL error queue at the main thread, which does not get
cleared explicitly as it probably should).  I am going to look more
into it (perhaps put together a simple client for you to try) once
I know your situation with the password.

If there is nothing suspicious about the password you use to authenticate
against ricci, the inverse of the previously suggested workaround could
be tried (manually pre-authenticating ccs against ricci);
from the host ccs is being run on, something along these lines:

$ ccs ...  # if ~/.ccs/cacert.pem does not exist yet
$ RICCIHOST=machina
$ RICCICERT=$(mktemp -u /var/lib/ricci/certs/clients/client_cert_XXXXXX)
$ scp ~/.ccs/cacert.pem root@$RICCIHOST:$RICCICERT
$ ssh root@$RICCIHOST chown ricci:root $RICCICERT
$ ssh root@$RICCIHOST restorecon $RICCICERT  # if using SELinux
$ ssh root@$RICCIHOST service ricci restart
$ ccs ...

Thanks,
Jan



