[Spacewalk-list] Jabberd/sm segfaults on shutdown

Sun Oct 14 23:25:37 UTC 2012

Just when I thought I got OSAD figured out on the client side, I seem to be having trouble with jabberd on the spacewalk server side. OSA pings work and clients show an online status with a time stamp in the spacewalk Web UI.  The clients, however, are not picking up jobs in their queue in the reasonably timely manner that OSAD provides.  The clients wait until their next regularly scheduled rhn_check to pick up jobs in their queue.
This is what I am seeing on a CentOS 6.3 x64 server running Spacewalk 1.7:
When jabberd/c2s stops, sm immediately segfaults.  This behaviour is very consistent.  The restart output always looks like this:
[root@spacewalk ~]# service jabberd restart
Terminating jabberd processes ...
Stopping s2s:                                              [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping sm:                                               [FAILED]
Stopping router:                                           [  OK  ]
Initializing jabberd processes ...
Starting router:                                           [  OK  ]
Starting sm:                                               [  OK  ]
Starting c2s:                                              [  OK  ]
Starting s2s:                                              [  OK  ]
The error message from the logs is:
kernel: sm[19141]: segfault at 18 ip 0000003808b272d6 sp 00007fff49280ac8 error 4 in libc-2.12.so[3808a00000+189000]
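The kernel line above can be unpacked a little further. The following is a minimal sketch, not part of the original report, that parses the message and decodes its fields. It assumes the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode) and that the bracketed pair is the load address and size of the library mapping. The function name `decode_segfault` is only illustrative.

```python
import re

# Kernel segfault line copied from the log excerpt above.
LINE = ("kernel: sm[19141]: segfault at 18 ip 0000003808b272d6 "
        "sp 00007fff49280ac8 error 4 in libc-2.12.so[3808a00000+189000]")

# All numeric fields except the PID are printed in hexadecimal.
PATTERN = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the fields of a kernel segfault message as a dict."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Instruction pointer relative to the start of the library mapping.
        "offset_in_library": ip - base,
        # Assumed x86 page-fault error-code bits.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        shown = hex(value) if key in ("fault_address", "offset_in_library") else value
        print(f"{key}: {shown}")
```

Read this way, the message describes a user-mode read of an unmapped page at address 0x18, at offset 0x1272d6 inside libc-2.12.so. That would be consistent with a libc function being handed a NULL (or already freed) pointer during sm shutdown, though the log line alone does not prove it. If the matching glibc debuginfo is installed, the offset could be passed to a tool such as addr2line to identify the function.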
I checked that the <id> XML tags in c2s.xml and sm.xml both contain the FQDN of the server, which matches the CN in the jabberd server.pem.
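For anyone wanting to repeat that check, here is a rough sketch of how it could be scripted. It is an illustration under assumptions: the XML fragments are made-up stand-ins rather than the real c2s.xml or sm.xml, the helper names are hypothetical, and the certificate CN is passed in as a plain string (it could be read separately, for example with `openssl x509 -noout -subject -in <path to server.pem>`).

```python
import xml.etree.ElementTree as ET


def id_values(xml_text):
    """Collect the non-empty text of every <id> element in an XML document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("id")
            if el.text and el.text.strip()]


def ids_match_fqdn(xml_text, fqdn):
    """True if at least one <id> element equals the expected host name."""
    wanted = fqdn.lower()
    return any(value.lower() == wanted for value in id_values(xml_text))


# Hypothetical fragments for illustration only; real files contain far more.
SAMPLE_C2S = """
<c2s>
  <id>c2s</id>
  <local>
    <id>spacewalk.example.com</id>
  </local>
</c2s>
"""

SAMPLE_SM = """
<sm>
  <id>spacewalk.example.com</id>
</sm>
"""

if __name__ == "__main__":
    cn = "spacewalk.example.com"  # assumed CN of the jabberd certificate
    for name, text in (("c2s.xml", SAMPLE_C2S), ("sm.xml", SAMPLE_SM)):
        print(name, "matches CN:", ids_match_fqdn(text, cn))
```

To run it against a real server, the sample strings would be replaced with the contents of the actual configuration files.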
I tried opening 4 browsers and running each of the router, sm, c2s and s2s processes with the -D flag to catch any output.  Here are the tails of the c2s and sm outputs on shutdown (c2s is stopped first, then sm segfaults immediately after):
c2s
====
Sun Oct 14 17:58:03 2012 [notice] connection to router closed
sx (sx.c:78) freeing sx for 5
sx (sx.c:111) freeing 5 env plugins
sx (sasl_gsasl.c:767) cleaning up conn state
Sun Oct 14 17:58:03 2012 authreg_db.c:260 db module shutting down
Database handles still open at environment close
Open database handle: authreg.db/

sm
====
Sun Oct 14 17:58:03 2012 [notice] session ended: jid=osad-3b61b283b1@spacewalk.example.com/osad
Sun Oct 14 17:58:03 2012 user.c:81 freeing user osad-3b61b283b1@spacewalk.example.com
Sun Oct 14 17:58:03 2012 mod_privacy.c:105 freeing zebra ctx
Sun Oct 14 17:58:03 2012 mod_roster.c:65 freeing roster for osad-3b61b283b1@spacewalk.example.com
Segmentation fault
I've tried the "service jabberd stop; rm -rf /var/lib/jabberd/db/*; service jabberd start" trick, and that seemed to allow clients to pick up their queues quickly via OSAD, but only for a short time before failing again.
This makes me believe that something in the jabber session manager database gets corrupted over time, but I could be totally wrong.
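If corruption is the suspicion, one variation on the reset trick is to move the database files aside instead of deleting them, so they can still be inspected afterwards. The sketch below is only an illustration of that idea: the function name and the backup location are hypothetical, and it assumes jabberd has already been stopped before it runs and is started again afterwards.

```python
import shutil
import time
from pathlib import Path


def archive_db_files(db_dir, backup_root):
    """Move everything in db_dir into a new timestamped directory.

    Returns the backup directory and the names of the entries moved.
    The service using db_dir is assumed to be stopped already.
    """
    db_dir = Path(db_dir)
    target = Path(backup_root) / time.strftime("jabberd-db-%Y%m%d-%H%M%S")
    target.mkdir(parents=True)  # raises if this backup directory already exists
    moved = []
    for entry in sorted(db_dir.iterdir()):
        shutil.move(str(entry), str(target / entry.name))
        moved.append(entry.name)
    return target, moved


if __name__ == "__main__":
    # /var/lib/jabberd/db is the path from the command above;
    # /var/tmp is an arbitrary, assumed backup location.
    backup, names = archive_db_files("/var/lib/jabberd/db", "/var/tmp")
    print("moved", len(names), "entries to", backup)
```

Comparing the archived files from a working period with those from a failing one might help confirm or rule out the corruption theory.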
Is it recommended to use an alternate storage driver, perhaps SQLite instead of the default Berkeley DB?  Should I try re-running spacewalk-setup-jabber with the correct macros used during install?  Does anyone have any other troubleshooting steps or solutions I could try?
Thanks,
Giovanni