[Spacewalk-list] Ongoing jabberd/osad issues.

Robert Paschedag robert.paschedag at web.de
Thu Aug 18 07:30:20 UTC 2016


Hi Daryl,

As long as the logs contain no error messages pointing to a problem with the jabber db, I wouldn't touch the db.
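
A quick way to look for such messages is to pull the error lines out of a log. The following is a minimal sketch: the function name `show_log_errors` is made up for illustration, and the pattern is simply the 'ERROR' marker that appears in the osa-dispatcher log lines quoted further down. jabberd itself may log somewhere else (often via syslog), so adjust the file and the pattern to your setup.

```shell
#!/bin/sh
# show_log_errors FILE   (hypothetical helper, not part of Spacewalk)
# Print, with line numbers, every line of FILE that contains the string ERROR.
# Exit status: 0 if at least one line matched, 1 if none matched,
# 2 if FILE cannot be read.
show_log_errors() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    grep -n "ERROR" "$log_file"
}
```

On a client this could be called as `show_log_errors /var/log/rhn/osad.log`. On the server, point it at the osa-dispatcher log; /var/log/rhn/osa-dispatcher.log is the likely location, but treat that path as an assumption.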

As I wrote earlier, I have only had to repair the db once in about 3 1/2 years.

So, what I would do now is to really delete the jabber db (back it up first, just in case) and start from a "clean" install. If the clients (which already have authentication information) do not re-register automatically, go to the client, stop osad, remove /etc/sysconfig/rhn/osad-auth.conf and start osad again. The client should then register, and the web GUI should show its status as "online". If not, check /var/log/rhn/osad.log on the client (if I remember correctly) and the osa-dispatcher logs on the server.
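
The two procedures above could be scripted roughly as follows. This is only a sketch under assumptions: the function names, the `DRY_RUN` switch and the backup location under /root are invented for illustration; the db directory and the auth file path are the ones mentioned in this thread; and stopping everything with `spacewalk-service` before touching the db is a cautious choice, not a documented requirement.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
# (Illustrative helper so the steps can be previewed safely.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the Spacewalk server: back up the jabberd db directory, then empty it.
# Arguments (both optional): db directory, backup archive path.
reset_jabberd_db() {
    db_dir="${1:-/var/lib/jabberd/db}"
    # Backup location is an assumption; any path outside db_dir will do.
    backup="${2:-/root/jabberd-db-$(date +%Y%m%d%H%M%S).tar.gz}"

    run spacewalk-service stop || return 1
    # Back up first, just in case; abort if the backup fails.
    run tar -czf "$backup" -C "$db_dir" . || return 1
    # Remove the contents but keep the directory and its ownership.
    run find "$db_dir" -mindepth 1 -delete || return 1
    run spacewalk-service start
}

# On a client that does not re-register by itself:
# stop osad, remove the stored credentials, start osad again.
reset_osad_auth() {
    auth_file="${1:-/etc/sysconfig/rhn/osad-auth.conf}"

    run service osad stop
    run rm -f "$auth_file" || return 1
    run service osad start
}
```

Previewing with `DRY_RUN=1 reset_jabberd_db` on the server (or `DRY_RUN=1 reset_osad_auth` on a client) prints the commands without running them; drop `DRY_RUN` and run as root once the output looks right.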

I also wrote that my spacewalk servers are NOT clients of themselves. I don't think that should be a problem, but just for "testing" you should deactivate the osad "client" on the spacewalk server.
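
On the server itself that could look like the sketch below. It assumes a SysV-init system where `service` and `chkconfig` are available (the `tomcat6` line in the status output quoted below hints at that, but it is a guess); the function name and the `DRY_RUN` switch are illustrative only.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the osad client on the Spacewalk server and keep it from
# starting at boot. This does not touch osa-dispatcher or jabberd.
disable_local_osad() {
    run service osad stop
    run chkconfig osad off
}
```

`chkconfig osad on` followed by `service osad start` would undo this after the test.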

Start with one test server.

Good luck.

Regards
Robert
On 17.08.2016 20:43, Daryl Rose <darylrose at outlook.com> wrote:
>
> I've posted here about issues that I've had with jabberd and osad, as have others. But I haven't gotten things resolved, so I am posting additional information.
>
>
> I put SW into production about a year ago. After a period of time, I noticed issues with the WUI, servers not reporting correctly, and other problems. Google searches suggested that I needed to shut down spacewalk and remove all the contents of /var/lib/jabberd/db. This seemed to work, but after a few months, I realized that osad was no longer communicating with osa-dispatcher.
>
>
> I started doing some additional research and learned that this was not a good way to resolve the issue. According to the official Spacewalk documentation, I should create a checkpoint and then clean up the log files, keeping the database and auth database files.
>
>
> https://fedorahosted.org/spacewalk/wiki/JabberDatabase
>
> These are the steps that I followed:
>
>
> /usr/bin/db_checkpoint -1 -h /var/lib/jabberd/db/ ## mark logs for deletion
> /usr/bin/db_archive -d -h /var/lib/jabberd/db/  ## delete logs
> service jabberd restart
>
> However, this also causes problems with jabberd and osad. If I use the commands as the documentation instructs, then osa-dispatcher will start but then die, and I get errors in the log about an invalid password.
>
>
> So to help explain my issue, I ran a test, tried to capture everything that I could, and am posting it here.
>
>
> 1. Listing of /var/lib/jabberd/db
>
> [root@<spwalk-server> db]# ls 
> __db.001  __db.006        log.0000000004  log.0000000009  log.0000000014  log.0000000019  log.0000000024  sm.db
> __db.002  authreg.db      log.0000000005  log.0000000010  log.0000000015  log.0000000020  log.0000000025
> __db.003  log.0000000001  log.0000000006  log.0000000011  log.0000000016  log.0000000021  log.0000000026
> __db.004  log.0000000002  log.0000000007  log.0000000012  log.0000000017  log.0000000022  log.0000000027
> __db.005  log.0000000003  log.0000000008  log.0000000013  log.0000000018  log.0000000023  log.0000000028
>
> 2. Spacewalk Server Status
>
> [root@<spwalk-server> db]# spacewalk-service status
> postmaster (pid  1175) is running...
> router (pid 21431) is running...
> sm (pid 21441) is running...
> c2s (pid 21451) is running...
> s2s (pid 21461) is running...
> tomcat6 (pid 1304) is running...                           [  OK  ]
> httpd (pid  1385) is running...
> osa-dispatcher (pid  21479) is running...
> rhn-search is running (1441).
> cobblerd (pid 1491) is running...
> RHN Taskomatic is running (1515).
>
> 3.  Most recent log file entry:
>
> 2016/08/17 07:44:13 -05:00 21476 0.0.0.0: osad/jabber_lib.__init__
> 2016/08/17 07:44:13 -05:00 21476 0.0.0.0: osad/jabber_lib.setup_connection('Connected to jabber server', '<spwalk-server>.com')
> 2016/08/17 07:44:13 -05:00 21476 0.0.0.0: osad/osa_dispatcher.fix_connection('Upstream notification server started on port', 1290)
> 2016/08/17 07:44:14 -05:00 21476 0.0.0.0: osad/jabber_lib.process_forever
>
> 4.  Ran the commands as instructed in the jabberd documentation.
>
> /usr/bin/db_checkpoint -1 -h /var/lib/jabberd/db/ ## mark logs for deletion
> /usr/bin/db_archive -d -h /var/lib/jabberd/db/  ## delete logs
> service jabberd restart
>
> 5.  Log file entry:
>
> 2016/08/17 13:28:19 -05:00 21476 0.0.0.0: osad/jabber_lib.main('ERROR', 'Traceback (most recent call last):\n  File "/usr/share/rhn/osad/jabber_lib.py", line 121, in main\n    self.process_forever(c)\n  File "/usr/share/rhn/osad/jabber_lib.py", line 179, in process_forever\n    self.process_once(client)\n  File "/usr/share/rhn/osad/osa_dispatcher.py", line 187, in process_once\n    client.retrieve_roster()\n  File "/usr/share/rhn/osad/jabber_lib.py", line 729, in retrieve_roster\n    stanza = self.get_one_stanza()\n  File "/usr/share/rhn/osad/jabber_lib.py", line 801, in get_one_stanza\n    self.process(timeout=tm)\n  File "/usr/share/rhn/osad/jabber_lib.py", line 1055, in process\n    data = self._read(self.BLOCK_SIZE)\nSSLError: (\'OpenSSL error; will retry\', "(-1, \'Unexpected EOF\')")\n')
> 2016/08/17 13:28:29 -05:00 21476 0.0.0.0: osad/jabber_lib.__init__
> 2016/08/17 13:28:29 -05:00 21476 0.0.0.0: osad/jabber_lib.setup_connection('Connected to jabber server', '<spwalk-server>.com')
> 2016/08/17 13:28:29 -05:00 21476 0.0.0.0: osad/jabber_lib.register('ERROR', 'Invalid password')
>
> 6.  Spacewalk server status
>
> [root@<spwalk-server> db]# spacewalk-service status
> postmaster (pid  1175) is running...
> router (pid 27119) is running...
> sm (pid 27129) is running...
> c2s (pid 27139) is running...
> s2s (pid 27149) is running...
> tomcat6 (pid 1304) is running...                           [  OK  ]
> httpd (pid  1385) is running...
> osa-dispatcher dead but pid file exists
> rhn-search is running (1441).
> cobblerd (pid 1491) is running...
> RHN Taskomatic is running (1515).
>
> 7. Long listing of /var/lib/jabberd/db
>
> [root@<spwalk-server> db]# ls -l
> total 7536
> -rw-r-----. 1 jabber jabber    24576 Aug 17 13:28 __db.001
> -rw-r-----. 1 jabber jabber   204800 Aug 17 13:29 __db.002
> -rw-r-----. 1 jabber jabber   270336 Aug 17 13:29 __db.003
> -rw-r-----. 1 jabber jabber    98304 Aug 17 13:29 __db.004
> -rw-r-----. 1 jabber jabber   753664 Aug 17 13:29 __db.005
> -rw-r-----. 1 jabber jabber    57344 Aug 17 13:29 __db.006
> -rw-r-----. 1 jabber jabber   368640 Aug 17 07:46 authreg.db
> -rw-r-----. 1 jabber jabber 10485760 Aug 17 13:29 log.0000000031
> -rw-r-----. 1 jabber jabber   487424 Aug 17 13:29 sm.db
>
> So neither completely cleaning out the jabberd database/log files nor creating a checkpoint and removing the log files that are due for cleanup works. What can I do to get jabberd and osad to work, so that I can push out updates when I need to?
>
>
> Thank you.
>
>
> Daryl