[Linux-cluster] Error: ClientSocket(String): connect() failed: No such file or directory
Jan Pokorný
jpokorny at redhat.com
Mon Jun 22 17:26:14 UTC 2015
Hello Megan,
On 04/06/15 08:23 -0400, Megan . wrote:
> On Wed, Jun 3, 2015 at 10:31 AM, Megan . <nagemnna at gmail.com> wrote:
[...]
> FYI - i talked to our network folks and it looks like they were doing some
> testing last night with port failover which may or may not have caused this
unlikely, unless you were "lucky" enough to contact a different
actual machine under the network address than you intended or if
modclusterd was fragile enough to break on these intermittent
changes (not exactly sure what you mean by "port failover" TBH).
Indicated error:
>> Error: ClientSocket(String): connect() failed: No such file or directory
means that modclusterd on the particular node was not running (by itself,
this is still OK) and could not be started within 5 seconds, which is
what modcluster (ricci's helper, but from the clustermon package) tries to
do if the socket /var/run/clumond.sock (indication of a running modclusterd)
cannot be reached (for whatever reason, including SELinux, but that
should be OK as well).
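To illustrate where the "No such file or directory" wording comes from,
here is a minimal sketch (not the actual modcluster code) of what happens
when a client connects to a file-like socket whose path does not exist.
The helper name try_connect is hypothetical, and the stream socket type
is an assumption on my part:

```python
import errno
import os
import socket


def try_connect(path):
    """Try to connect to a Unix-domain stream socket at `path`.

    Returns None on success, otherwise the errno of the failure.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return None
    except OSError as e:
        # A missing socket file surfaces as ENOENT, whose message is
        # "No such file or directory".
        return e.errno
    finally:
        s.close()


if __name__ == "__main__":
    err = try_connect("/var/run/clumond.sock")
    print("connected" if err is None else os.strerror(err))
```

A missing socket file yields ENOENT, while a stale socket file with no
listener behind it would typically yield ECONNREFUSED instead, so the
message itself hints that the file was not there at all.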
So if the problem recurs, definitely check on the troubled node whether:
- the modclusterd service is running and/or is able to start (i.e.,
provide the /var/run/clumond.sock socket) within 5 seconds or so under
the typical workload (may be subtle in a virtualized environment)
- when modclusterd is started, /var/run/clumond.sock exists and has
the expected properties (file-like socket, expected permissions)
- the SELinux audit log (if SELinux is enabled) contains any
clumond.sock or modclusterd reference
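The first two checks above can be sketched roughly as follows. This is
only an illustration under assumptions: check_socket and its parameters
are hypothetical names, the stream socket type is assumed, and the
5-second default merely mirrors the figure mentioned above. It does not
cover the SELinux audit check.

```python
import os
import socket
import stat
import time


def check_socket(path, timeout=5.0, interval=0.25):
    """Poll `path` until it accepts a connection or `timeout` expires.

    Returns a dict describing the last observed state of the path.
    """
    deadline = time.monotonic() + timeout
    while True:
        report = {"exists": False, "is_socket": False,
                  "mode": None, "connectable": False}
        try:
            st = os.stat(path)
        except OSError:
            st = None  # missing or inaccessible
        if st is not None:
            report["exists"] = True
            report["is_socket"] = stat.S_ISSOCK(st.st_mode)
            report["mode"] = oct(stat.S_IMODE(st.st_mode))
            if report["is_socket"]:
                s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                s.settimeout(interval)
                try:
                    s.connect(path)
                    report["connectable"] = True
                except OSError:
                    pass  # socket file present but nobody is listening
                finally:
                    s.close()
        if report["connectable"] or time.monotonic() >= deadline:
            return report
        time.sleep(interval)


if __name__ == "__main__":
    print(check_socket("/var/run/clumond.sock"))
```

Running it right after (re)starting modclusterd on the suspect node
should show whether the socket appears in time and with sane permissions.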
> issue. However, I was able to correct it by fencing the problem nodes.
Provided that those "port failover" disturbances had settled down by that
time, perhaps modclusterd simply became happy again and stopped failing,
if that was the case previously.
>> Anybody ever seen "Error: ClientSocket(String): connect() failed: No such
>> file or directory" when doing a start all? Something seems to have
>> broken with our closer. Our UAT setup works as expected. I looked at
>> tcpdumps the best that i could (i'm not a network person though) and i
>> didn't see anything obvious. I shutdown iptables on all nodes.
FWIW, most if not all of the packet sniffing tools cannot hook into local
file-like sockets.
>> We are running Centos 6,6, ccs-0.16.2-75.el6_6.1.x86_64
Good, this excludes all known (and fixed!) bugs preventing modclusterd
from operating (IPv4-only environment, huge cluster.conf).
>> cman-3.0.12.1-68.el6.x86_64. We have a 12 node cluster in production that
>> allows us to share gfs2 iscsi mounts. no other services are used. clvmd
>> -R runs fine at this time. ccs -h node --sync --activate also runs fine.
>>
>>
>> [root at admin1 ~]# ccs -h admin1-ops --startall
>> Unable to start map1-ops, possibly due to lack of quorum, try --startall
>> Error: ClientSocket(String): connect() failed: No such file or directory
>> Started cache2-ops
>> Unable to start data1-ops, possibly due to lack of quorum, try --startall
>> Error: ClientSocket(String): connect() failed: No such file or directory
>> Started map2-ops
>> Unable to start archive1-ops, possibly due to lack of quorum, try
>> --startall
>> Error: ClientSocket(String): connect() failed: No such file or directory
>> Started data3-ops
>> Started mgmt1-ops
>> Unable to start admin1-ops, possibly due to lack of quorum, try --startall
>> Error: ClientSocket(String): connect() failed: No such file or directory
>> Started data2-ops
>> Started cache1-ops
The out-of-context, hilarious hint (suggesting --startall when you are
already using it) led me to file a bug: <https://bugzilla.redhat.com/1234515>.
Thanks for indirectly showing this off!
--
Jan