[Spacewalk-list] Scheduled Tasks and osa-dispatcher

Mon Jul 23 15:24:44 UTC 2012

On Mon, July 23, 2012 05:38, Jan Pazdziora wrote:
> On Fri, Jul 20, 2012 at 07:37:57PM -0400, Tom Priore wrote:
>> I'm new to Spacewalk, and while I have seen posts on osa-dispatcher in
>> the list, I have not yet been able to fix my system. I'm running
>> Spacewalk 1.7 on CentOS 6 with PostgreSQL.  I noticed none of the
>> scheduled jobs are running, and while troubleshooting this I discovered
>> errors with osa-dispatcher.  The error is below.
>>
>> I have checked the rhn.conf file for the correct host name and certs.
>> /etc/hosts is pointing to the correct fqdn/ip.
>> Jabberd is running.
>> iptables is off.
>> selinux is disabled.
>>
>> Starting osa-dispatcher: RHN 1233 2012/07/18 18:51:29 -05:00:
>> ('Traceback (most recent call last):\n  File
>> "/usr/share/rhn/osad/jabber_lib.py", line 252, in setup_connection\n
>>  c = self._get_jabber_client(js)\n  File
>> "/usr/share/rhn/osad/jabber_lib.py", line 309, in _get_jabber_client\n
>>    c.connect()\n  File "/usr/share/rhn/osad/jabber_lib.py", line 567,
>> in connect\n    jabber.Client.connect(self)\n  File
>> "/usr/lib/python2.6/site-packages/jabber/xmlstream.py", line 488, in
>> connect\n    raise socket.error("Unable to connect to the host and port
>> specified")\nerror: Unable to connect to the host and port
>> specified\n',)
>
> So, are you able to telnet to that hostname, port 5222?
>
> Is anything in /var/log/audit/audit.log?
>
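
If telnet is not installed, the same port check can be done from bash
itself.  This is only a minimal sketch: check_port is a made-up helper
name, it assumes a bash built with /dev/tcp support and the coreutils
"timeout" command, and the host name in the usage line is a placeholder.

```shell
#!/bin/bash
# check_port HOST [PORT]: print "open" if a TCP connection succeeds
# within 5 seconds, otherwise "closed".  PORT defaults to 5222 (jabberd c2s).
check_port() {
    local host=$1 port=${2:-5222}
    # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Usage (placeholder host name):
#   check_port spacewalk.example.com 5222
```

A "closed" result only says the TCP connect failed or timed out; it does
not say whether the cause is jabberd, name resolution, or a firewall.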

Caveat:  this is from experience with licensed Satellites, not Spacewalk
per se.  YMMV.  Maybe Jan will weigh back in on this.

I started getting this traceback again this morning.

It appears that there is still a bug in Satellite that leaves jabberd
broken under some circumstances.  One symptom is connections that the
server considers open but the clients consider closed.  In "lsof
-i -P|grep 5222" you'll see ESTABLISHED on the satellite and CLOSE_WAIT on
some clients.  What's worked for me has been to kill the associated pid on
both sides (on the satellite side this kills jabberd itself), then
restart jabberd, osa-dispatcher, and osad.
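
To compare the two sides quickly, the lsof output can be boiled down to a
count per TCP state.  A rough sketch, assuming lsof prints the state in
parentheses at the end of each TCP line (as in the ESTABLISHED/CLOSE_WAIT
output described above); summarize_5222 is a made-up helper name.

```shell
#!/bin/bash
# summarize_5222: read `lsof -i -P` output on stdin and print one line per
# TCP state seen on port 5222, as "STATE COUNT", sorted by state name.
summarize_5222() {
    awk '/:5222/ && match($0, /\([A-Z_0-9]+\)/) {
        # strip the surrounding parentheses from e.g. "(CLOSE_WAIT)"
        state = substr($0, RSTART + 1, RLENGTH - 2)
        count[state]++
    }
    END { for (s in count) print s, count[s] }' | sort
}

# Usage, on the satellite and on a client:
#   lsof -i -P | summarize_5222
```

Many ESTABLISHED entries on the satellite alongside CLOSE_WAIT on the
clients would match the mismatch described above.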

I've also had to reset jabberd sometimes like this:

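#!/bin/bash
# reset jabberd database entries per case 00627769
set -x
# stop both services before touching the jabberd database
service osa-dispatcher stop
service jabberd stop
# remove jabberd's on-disk database files
rm -f /var/lib/jabberd/db/*
# clear the push dispatcher/client rows (sqlplus, so an Oracle backend)
sqlplus "$(spacewalk-cfg-get default_db)" <<ENDOFSQL
delete from rhnPushDispatcher;
delete from rhnPushClient;
commit;
quit;
ENDOFSQL
service jabberd start
service osa-dispatcher start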
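
The script above uses sqlplus, so it only fits an Oracle backend.  For a
PostgreSQL-backed Spacewalk like Tom's, the SQL step could probably be fed
to psql instead.  This is an untested sketch: reset_sql is a made-up
helper name, and it assumes the db_user and db_name keys exist in rhn.conf
and that psql can authenticate (for example via ~/.pgpass).

```shell
#!/bin/bash
# reset_sql: print the same two deletes as the sqlplus step, wrapped in
# one transaction so that either both tables are cleared or neither is.
reset_sql() {
    cat <<'ENDOFSQL'
BEGIN;
DELETE FROM rhnPushDispatcher;
DELETE FROM rhnPushClient;
COMMIT;
ENDOFSQL
}

# Usage, in place of the sqlplus here-document (assumed rhn.conf keys):
#   reset_sql | psql -v ON_ERROR_STOP=1 \
#       -U "$(spacewalk-cfg-get db_user)" -d "$(spacewalk-cfg-get db_name)"
```

The service stop/start lines and the rm of /var/lib/jabberd/db/* would
stay as they are in the script above.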