Hi all,<br><br>I'm having an odd problem with my 2-node cluster.<br><br>Everything starts up fine: I can relocate, restart, stop, and start services without trouble. I ran every rg_test test that was possible, and everything works as designed.<br>
<br>However, for some reason rgmanager is freezing.<br><br>Yesterday, I raised the log_level to 7 and also started clurgmgrd with the "-d" option. As expected, it started printing all the status checks to my logging facility. However, around 8 hours later it stopped: clurgmgrd froze completely and no longer responds to any "clusvcadm" commands. It also stopped printing anything to my log files.<br>
<br>The only solution is to restart rgmanager on both nodes.<br><br>I attached strace to the clurgmgrd process and got some timeouts, such as:<br><br><div style="margin-left: 80px; font-family: courier new,monospace;">
select(12, [10 11], NULL, NULL, {0, 908000}) = 0 (Timeout)<br></div><br>All the socket and FIFO files are in place with the correct permissions, and cman_tool and ccs_tool are working just fine.<br><br>I noticed that clurgmgrd isn't forking as it is expected to.<br>
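<br>For reference, a select() call that returns 0 with "(Timeout)" only means that none of the watched descriptors became ready within the given interval, so on its own it may simply be an idle poll loop rather than an error. Below is a minimal Python sketch of that behaviour. The pipe, the 0.1 s timeout, and the helper name wait_readable are illustrative assumptions and have nothing to do with clurgmgrd's internals:<br>

```python
import os
import select


def wait_readable(fd, timeout):
    """Return True if fd becomes readable within `timeout` seconds."""
    # select() returns three lists; an empty "readable" list is a timeout.
    readable, _, _ = select.select([fd], [], [], timeout)
    return fd in readable


# A pipe stands in for the descriptors a daemon might poll (assumption).
r, w = os.pipe()

idle = wait_readable(r, 0.1)   # nothing written yet, so select() times out
os.write(w, b"x")
ready = wait_readable(r, 0.1)  # data is pending, so select() reports it

os.close(r)
os.close(w)
print(idle, ready)
```

The first call corresponds to the "= 0 (Timeout)" result in the trace; only the second would show the descriptor in the returned set.<br>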
<br>I also ran clusvcadm under strace; here is the whole output.<br><br><div style="margin-left: 80px; font-family: courier new,monospace;">[root@node1 ~]# strace clusvcadm -R "Service Web"<br>execve("/usr/sbin/clusvcadm", ["clusvcadm", "-R", "Service Web"], [/* 18 vars */]) = 0<br>
brk(0)                                  = 0x83a4000<br>access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)<br>open("/etc/ld.so.cache", O_RDONLY)      = 3<br>fstat64(3, {st_mode=S_IFREG|0644, st_size=40664, ...}) = 0<br>
mmap2(NULL, 40664, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fa6000<br>close(3)                                = 0<br>open("/usr/lib/libcman.so.2", O_RDONLY) = 3<br>read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000MA\0004\0\0\0"..., 512) = 512<br>
fstat64(3, {st_mode=S_IFREG|0755, st_size=17368, ...}) = 0<br>mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa5000<br>mmap2(0x414000, 18952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x414000<br>
mmap2(0x418000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x418000<br>close(3)                                = 0<br>open("/lib/libpthread.so.0", O_RDONLY)  = 3<br>read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\330\265\0004\0\0\0"..., 512) = 512<br>
fstat64(3, {st_mode=S_IFREG|0755, st_size=125612, ...}) = 0<br>mmap2(0xb59000, 90592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb59000<br>mmap2(0xb6c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12) = 0xb6c000<br>
mmap2(0xb6e000, 4576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6e000<br>close(3)                                = 0<br>open("/lib/libdl.so.2", O_RDONLY)       = 3<br>read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P:\265\0004\0\0\0"..., 512) = 512<br>
fstat64(3, {st_mode=S_IFREG|0755, st_size=16428, ...}) = 0<br>mmap2(0xb53000, 12408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb53000<br>mmap2(0xb55000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1) = 0xb55000<br>
close(3)                                = 0<br>open("/usr/lib/libncurses.so.5", O_RDONLY) = 3<br>read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\204\252\0034\0\0\0"..., 512) = 512<br>fstat64(3, {st_mode=S_IFREG|0755, st_size=297464, ...}) = 0<br>
mmap2(0x3a9a000, 297220, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3a9a000<br>mmap2(0x3ada000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x40) = 0x3ada000<br>mmap2(0x3ae2000, 2308, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ae2000<br>
close(3)                                = 0<br>open("/lib/libc.so.6", O_RDONLY)        = 3<br>read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\237\237\0004\0\0\0"..., 512) = 512<br>fstat64(3, {st_mode=S_IFREG|0755, st_size=1606808, ...}) = 0<br>
mmap2(0x9e4000, 1324452, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x110000<br>mmap2(0x24e000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13e) = 0x24e000<br>mmap2(0x251000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x251000<br>
close(3)                                = 0<br>mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa4000<br>mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa3000<br>
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7fa36c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0<br>mprotect(0x24e000, 8192, PROT_READ)     = 0<br>
mprotect(0xb55000, 4096, PROT_READ)     = 0<br>mprotect(0xb6c000, 4096, PROT_READ)     = 0<br>mprotect(0x9e0000, 4096, PROT_READ)     = 0<br>munmap(0xb7fa6000, 40664)               = 0<br>set_tid_address(0xb7fa3708)             = 2488<br>
set_robust_list(0xb7fa3710, 0xc)        = 0<br>futex(0xbfdb0074, FUTEX_WAKE_PRIVATE, 1) = 0<br>rt_sigaction(SIGRTMIN, {0xb5d3d0, [], SA_SIGINFO}, NULL, 8) = 0<br>rt_sigaction(SIGRT_1, {0xb5d2e0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0<br>
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0<br>getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0<br>uname({sys="Linux", node="node1", ...}) = 0<br>geteuid32()                             = 0<br>
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0<br>brk(0)                                  = 0x83a4000<br>brk(0x83c5000)                          = 0x83c5000<br>socket(PF_FILE, SOCK_STREAM, 0)         = 3<br>
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0<br>connect(3, {sa_family=AF_FILE, path="/var/run/cman_client"}, 110) = 0<br>open("/dev/zero", O_RDONLY)             = 4<br>writev(3, [{"NAMC\3\0\0\20\24\0\0\0\7\0\0\0\0\0\0\0", 20}], 1) = 20<br>
recv(3, "NAMC\24\0\0\0h\3\0\0\7\0\0@\0\0\0\0", 20, 0) = 20<br>read(3, "\2\0\0\0\250\1\0\0\1\0\0\0\1\0\0\0\0\0\0\0\264\2\0\0\2\0\0\0hows"..., 852) = 852<br>writev(3, [{"NAMC\3\0\0\20\24\0\0\0\7\0\0\0\0\0\0\0", 20}], 1) = 20<br>
recv(3, "NAMC\24\0\0\0h\3\0\0\7\0\0@\0\0\0\0", 20, 0) = 20<br>read(3, "\2\0\0\0\250\1\0\0\1\0\0\0\1\0\0\0\0\0\0\0\264\2\0\0\2\0\0\0hows"..., 852) = 852<br>writev(3, [{"NAMC\3\0\0\20\274\1\0\0\220\0\0\0\0\0\0\0", 20}, {"x^\372\267\0\0\0\0\0P\372\267\304@A\0008Z\372\267\300\17\236\0\0\373\332\2774\374\332\277"..., 424}], 2) = 444<br>
recv(3, "NAMC\24\0\0\0\300\1\0\0\220\0\0@\0\0\0\0", 20, 0) = 20<br>read(3, "\0\0\0\0\250\1\0\0\1\0\0\0\1\0\0\0\0\0\0\0\264\2\0\0\2\0\0\0hows"..., 428) = 428<br>fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0<br>
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7faf000<br>write(1, "Local machine trying to restart "..., 54Local machine trying to restart service:Service Web...) = 54<br>socket(PF_FILE, SOCK_STREAM, 0)         = 5<br>
connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = 0<br>select(6, NULL, [5], [5], NULL)         = 1 (out [5])<br>write(5, "h\0\0\0\4\275\321?\22:\274\0\0\0\0h\0\23\205\202\0\0\0\0\0\0\0\0\0\0\0\0"..., 112) = 112<br>
select(6, [5], NULL, [5], NULL <unfinished ...><br></div><br>     Since it never did what it was supposed to do, I interrupted it with Ctrl+C.<br><br>     My server details follow. All nodes run the same kernel and package versions.<br>
<br><div style="margin-left: 80px;"><span style="font-family: courier new,monospace;"># cat /etc/redhat-release </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">Red Hat Enterprise Linux Server release 5.3 (Tikanga)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"># uname -a</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">Linux hows001nex 2.6.18-128.1.10.el5PAE #1 SMP Wed Apr 29 14:24:53 EDT 2009 i686 i686 i386 GNU/Linux</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"># rpm -qa | egrep 'cman|rgm'</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">rgmanager-2.0.46-1.el5</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">cman-2.0.98-1.el5</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"># cat /etc/cluster/cluster.conf</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"><?xml version="1.0"?></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><cluster alias="CLSCLU01" config_version="43" name="CLSCLU01"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        <fence_daemon post_fail_delay="0" post_join_delay="3"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">        <clusternodes></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                <clusternode name="node1" nodeid="1" votes="1"></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <fence></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                <method name="1"></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                                        <device name="node1-rsa"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                </method></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        </fence></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                </clusternode></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                <clusternode name="node2" nodeid="2" votes="1"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        <fence></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                                <method name="1"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                        <device name="node2-rsa"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                                </method></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        </fence></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                </clusternode></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        </clusternodes></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">        <cman expected_votes="1" two_node="1"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        <fencedevices></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                <fencedevice agent="fence_rsa" ipaddr="node1-rsa" login="username" name="node1-rsa" passwd="password"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                <fencedevice agent="fence_rsa" ipaddr="node2-rsa" login="username" name="node2-rsa" passwd="password"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        </fencedevices></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">        <rm log_level="7"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                <failoverdomains></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <failoverdomain name="WEB" ordered="1" restricted="1"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                <failoverdomainnode name="node1" priority="1"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                                <failoverdomainnode name="node2" priority="2"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        </failoverdomain></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                </failoverdomains></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                <resources></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <ip address="10.9.16.40" monitor_link="1"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        <ip address="10.9.16.41" monitor_link="1"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <ip address="10.9.16.45" monitor_link="1"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        <ip address="10.9.16.46" monitor_link="1"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                </resources></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                <service autostart="1" domain="WEB" name="Service Web" recovery="relocate"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        <ip ref="10.9.16.40"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <ip ref="10.9.16.41"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        <ip ref="10.9.16.45"/></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                        <ip ref="10.9.16.46"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                <script file="/etc/init.d/jboss423.sh" name="Script Jboss423"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                        <script file="/etc/init.d/httpd" name="Script Apache2"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                        <script file="/etc/init.d/xinetd" name="Script Xinetd"></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                                <script file="/etc/init.d/cron-user.sh" name="Script Crond User"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                        </script></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                                </script></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                        </ip></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">                </service></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        </rm></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"></cluster></span><br></div><br>Thank you,<br>
<br>- G. Felix<br>