[Spacewalk-list] Osad problems

Matt Moldvan matt at moldvan.com
Thu Sep 1 19:57:22 UTC 2016


I have the same issues with Spacewalk 2.5 and the latest OSAD packages... the
connection still looks established on the client side, but for some reason
the client has stopped trying to send data.  The master no longer sees the
connection as open and therefore cannot send anything to it.

The only resolution I've found is to restart the client(s), but with so many
systems that caused the dispatchers to become unresponsive during our
maintenance windows.  Essentially, Puppet would run, restart OSAD, and the
restarts would consume all the HTTP connections and make the GUI unresponsive.
Update and reboot actions were picked up outside of the scheduled
maintenance, and it was all-around chaos.

So at this point I'm stuck babysitting OSAD status of systems because there
is nothing easily found in /var/log/osad that indicates an issue, even
though the client still has 5222 open to the dispatcher and the osad
service is running.  In the Spacewalk database, the system is marked
down... I ran an strace on the OSAD process on the client for about 30
minutes, and didn't see any attempts to do anything.

[me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
[me@osad-client1 ~]$ service osad status
osad (pid  21996) is running...
[me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
[me@osad-client1 ~]$ sudo strace -fp 21996
Process 21996 attached
select(4, [3], [], [], NULL

---
rhnschema=# select s.name,pc.state_id from rhnpushclient pc, rhnserver s
where s.name='osad-client1' and pc.server_id=s.id;
         name          | state_id
-----------------------+----------
 osad-client1          |        2
(1 row)

Even though osad-client1 thought it was still connected, the master didn't
have a corresponding connection on 5222:
[me@spacewalk-master ~]$ netstat -a | grep osad-client1
[me@spacewalk-master ~]$

For me, changing the values in /etc/jabberd/*.xml as recommended in
https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD wasn't going to
work... I tried that, and all systems would be disconnected and then
reconnect, causing some (perhaps insignificant) load on the database as
well as unnecessary network traffic and client processing.  I could see the
number of systems marked as "online" in the database flapping wildly between
1,000 and 5,000 over time.

One thing I did notice on the systems that were marked offline... a netstat
showed two connections, one in CLOSE_WAIT status and another in
ESTABLISHED.  On restart of OSAD, only one was there, in ESTABLISHED state
and the system was marked online again.

I'm thinking that the OSAD Python code isn't closing the sockets properly
when an error is encountered, and leaves the client thinking it's still
connected, while the master doesn't have a corresponding connection to send
data to.
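
To illustrate the kind of handling I mean, here is a minimal sketch. It is
not the actual OSAD code: the function name wait_for_data and the 30 second
default timeout are made up for illustration, and it is written for Python 3.
It waits with a finite select() timeout instead of blocking forever, and
closes the socket on EOF, error, or timeout so the caller knows it has to
reconnect:

```python
import select
import socket


def wait_for_data(sock, timeout=30.0):
    """Wait for data on sock; close it on EOF, error, or timeout.

    Hypothetical helper, not part of OSAD. Returns the bytes read, or
    None if the socket was closed and the caller should reconnect.
    """
    try:
        # A finite timeout means we never sit in select() indefinitely.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("no data within %.1f seconds" % timeout)
        data = sock.recv(4096)
        if not data:
            # An empty read means the peer closed its side. On a TCP
            # socket our side stays in CLOSE_WAIT until we close it too.
            raise ConnectionError("peer closed the connection")
        return data
    except OSError:
        # TimeoutError and ConnectionError are subclasses of OSError.
        sock.close()
        return None
```

A caller would treat a None return as "connection gone" and open a fresh
connection to the dispatcher rather than keep waiting on the old descriptor.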

Basically, as a workaround, I think I'm going to have systems restart OSAD
if they see connections open on 5222 in CLOSE_WAIT status... until
something better comes along and the client code is fixed up.
Unfortunately the workaround isn't even a full one... not every system had
multiple connections, but it's a step toward more systems staying usable
than before.
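
A rough sketch of that check, assuming Linux `netstat -tn` output (protocol
in the first column, foreign address in the fifth, state in the sixth) and a
SysV-style `service osad restart`. The function names are made up for
illustration, and it needs Python 3.7 or later:

```python
import subprocess

JABBER_PORT = "5222"


def has_stale_jabber_connection(netstat_output, port=JABBER_PORT):
    """Return True if any TCP connection to the given remote port is in CLOSE_WAIT."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        if remote.rsplit(":", 1)[-1] == port and state == "CLOSE_WAIT":
            return True
    return False


def restart_osad_if_stale():
    """Run netstat and restart osad when a CLOSE_WAIT connection on 5222 exists."""
    output = subprocess.run(
        ["netstat", "-tn"], capture_output=True, text=True, check=True
    ).stdout
    if has_stale_jabber_connection(output):
        subprocess.run(["service", "osad", "restart"], check=True)
        return True
    return False
```

Calling restart_osad_if_stale() from cron or a Puppet exec would cover the
systems that show the extra CLOSE_WAIT connection; it does nothing for the
ones that only have a single ESTABLISHED connection.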

On Thu, Sep 1, 2016 at 1:26 PM Konstantin Raskoshnyi <konrasko at gmail.com>
wrote:

> 2.4. I tried that; after I ran spacewalk-service restart it actually helped
> for one day.
>
> Now it's the same again, with no errors on either side.
>
> On Wed, Aug 31, 2016 at 9:06 AM, Matthew Madey <mattmadey at gmail.com>
> wrote:
>
>> What version of Spacewalk are you running? You likely need to reset the
>> osad credentials on the clients. This typically only occurs when the jabber
>> database has been corrupted.
>>
>> On the clients, run the below commands:
>>
>>
>> rm -f /etc/sysconfig/rhn/osad-auth.conf ; service osad restart
>>
>> You may find the below links helpful
>>
>> https://fedorahosted.org/spacewalk/wiki/OsadHowTo
>>
>> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD
>>
>>
>>
>>
>> On Aug 30, 2016 4:43 PM, "Konstantin Raskoshnyi" <konrasko at gmail.com>
>> wrote:
>>
>>> Something strange is happening with about a third of my osad clients.
>>>
>>> They don't pick up any jobs from osa-dispatcher, and there are no errors
>>> when starting the service.
>>>
>>> Also, if I restart osad, I see these logs on sp:
>>>
>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] [::ffff:172.90.7.220, port=43046] disconnect jid=osad-e43e3265db@spacewalk15.ooma.internal/osad, packets: 29, bytes: 3738
>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session ended: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: user unloaded jid=osad-e43e3265db@spacewalk15.ooma.internal
>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] traditional.digest authentication succeeded: osad-e43e3265db@/osad ::ffff:172.90.7.220:43454 TLS
>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] requesting session: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session started: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>>
>>> So it looks like everything should be fine.
>>>
>>> _______________________________________________
>>> Spacewalk-list mailing list
>>> Spacewalk-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>>