[Spacewalk-list] Osad problems

Konstantin Raskoshnyi konrasko at gmail.com
Thu Sep 1 22:59:03 UTC 2016


Yes, such a pain.
But here's something interesting: I have two servers. The first was set up
by a previous employee and was renamed manually; I had to replace the
cert on all clients and rename the server through the sp utility.
The second server was installed from scratch, has never been
renamed, and works smoothly without any problems with osa-dispatcher.

Anyway, thanks man

On Thu, Sep 1, 2016 at 12:57 PM, Matt Moldvan <matt at moldvan.com> wrote:

> I have the same issues with 2.5 and the latest OSAD packages... the connection
> still looks like it's established at the client side, but for some reason
> it has stopped trying to send data.  The master no longer sees the
> connection as open and therefore cannot send anything to it.
>
> The only resolution I've found is to restart the client(s), but for so
> many systems this caused the dispatchers to become unresponsive during our
> maintenance windows.  Essentially, Puppet would run, restart OSAD, and it
> would consume all the HTTP connections and make the GUI unresponsive.
> Update and reboot actions were picked up outside of the scheduled
> maintenance, and it was all around chaos.
>
> So at this point I'm stuck babysitting the OSAD status of systems because
> there is nothing easily found in /var/log/osad that indicates an issue,
> even though the client still has 5222 open to the dispatcher and the osad
> service is running.  In the Spacewalk database, the system is marked
> down... I ran an strace on the OSAD process on the client for about 30
> minutes, and didn't see any attempts to do anything.
>
> [me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
> osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
> [me@osad-client1 ~]$ service osad status
> osad (pid  21996) is running...
> [me@osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
> osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
> [me@osad-client1 ~]$ sudo strace -fp 21996
> Process 21996 attached
> select(4, [3], [], [], NULL
>
> ---
> rhnschema=# select s.name,pc.state_id from rhnpushclient pc, rhnserver s
> where s.name='osad-client1' and pc.server_id=s.id;
>          name          | state_id
> -----------------------+----------
>  osad-client1          |        2
> (1 row)
>
> Even though osad-client1 thought it was still connected, the master didn't
> have a corresponding connection on 5222:
> [me@spacewalk-master ~]$ netstat -a | grep osad-client1
> [me@spacewalk-master ~]$
>
> For me, changing the values in /etc/jabberd/*.xml as recommended in
> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD wasn't going to
> work... I tried that and all systems would be disconnected, then would
> reconnect, causing some (perhaps insignificant) load on the database as
> well as unnecessary network traffic and client processing.  I could see the
> number of systems marked as "online" in the database flapping wildly between
> 1,000 and 5,000 over time.
>
> One thing I did notice on the systems that were marked offline... a
> netstat showed two connections, one in CLOSE_WAIT status and another in
> ESTABLISHED.  On restart of OSAD, only one was there, in ESTABLISHED state
> and the system was marked online again.
>
> I'm thinking that the OSAD Python code isn't closing the sockets properly
> when an error is encountered, and leaves the client thinking it's still
> connected, while the master doesn't have a corresponding connection to send
> data to.
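The strace output above shows osad blocked in select() on fd 3 with a NULL timeout, so if the dispatcher side drops the connection without the client ever hearing about it, nothing wakes the process up. One generic mitigation for such half-open TCP connections is kernel keepalive probing. The sketch below illustrates that technique in Python; it is not code from OSAD or its jabber library, whether osad exposes its socket for this is an assumption, and the helper name `enable_keepalive` and the timing values are hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Hypothetical helper: turn on TCP keepalive probing for `sock`.

    If the peer stops answering probes (or answers with a reset because it
    no longer knows the connection), the kernel aborts the connection.
    A select() waiting on the socket then reports it readable, and the
    next recv() raises an error instead of the process waiting forever.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs; these constants are platform-specific
    # (all present on Linux), so set each one only if it exists.
    for name, value in (("TCP_KEEPIDLE", idle),      # idle seconds before first probe
                        ("TCP_KEEPINTVL", interval),  # seconds between probes
                        ("TCP_KEEPCNT", count)):      # failed probes before abort
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With the illustrative defaults, a dead peer would be noticed after roughly idle + interval × count seconds (about two minutes) rather than never. Closing the old socket whenever an error is raised would address the separate CLOSE_WAIT leak Matt describes.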
>
> Basically, as a workaround, I think I'm going to have systems restart OSAD
> if they see connections open on 5222 in CLOSE_WAIT status... until
> something better comes along and the client code is fixed up.
> Unfortunately the workaround isn't even a full one... not every system had
> multiple connections, but it's a step toward more systems staying usable
> than before.
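Matt's proposed workaround (restart OSAD when a connection to port 5222 sits in CLOSE_WAIT) could be scripted roughly as follows. This is a minimal sketch, not something posted to the list. It assumes Python 3.7+, the net-tools `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the SysV `service osad restart` command used in this thread; the function name `needs_restart` is hypothetical.

```python
import subprocess

XMPP_PORT = 5222  # jabberd client port the osad clients connect to


def needs_restart(netstat_output: str, port: int = XMPP_PORT) -> bool:
    """Return True if any TCP connection to `port` is in CLOSE_WAIT.

    Expects the text of `netstat -tn`, whose data lines look like:
      tcp  0  0 10.0.0.5:56939  10.0.0.1:5222  CLOSE_WAIT
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 data line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # rsplit keeps IPv6-mapped addresses like ::ffff:10.0.0.1:5222 intact.
        if foreign.rsplit(":", 1)[-1] == str(port) and state == "CLOSE_WAIT":
            return True
    return False


def main() -> None:
    out = subprocess.run(["netstat", "-tn"], capture_output=True,
                         text=True, check=True).stdout
    if needs_restart(out):
        subprocess.run(["service", "osad", "restart"], check=True)


if __name__ == "__main__":
    main()
```

As Matt notes, not every affected system had a lingering CLOSE_WAIT connection, so a check like this covers only part of the problem.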
>
> On Thu, Sep 1, 2016 at 1:26 PM Konstantin Raskoshnyi <konrasko at gmail.com>
> wrote:
>
>> 2.4. I tried that; actually, after I ran spacewalk-service restart, it
>> helped for one day.
>>
>> Now it's the same again, but there are no errors on either side.
>>
>> On Wed, Aug 31, 2016 at 9:06 AM, Matthew Madey <mattmadey at gmail.com>
>> wrote:
>>
>>> What version of Spacewalk are you running? You likely need to reset the
>>> osad credentials on the clients. This typically only occurs when the jabber
>>> database has been corrupted.
>>>
>>> On the clients, run the following command:
>>>
>>>
>>> rm -f /etc/sysconfig/rhn/osad-auth.conf ; service osad restart
>>>
>>> You may find the following links helpful:
>>>
>>> https://fedorahosted.org/spacewalk/wiki/OsadHowTo
>>>
>>> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD
>>>
>>>
>>>
>>>
>>> On Aug 30, 2016 4:43 PM, "Konstantin Raskoshnyi" <konrasko at gmail.com>
>>> wrote:
>>>
>>>> Something strange is going on with some of my osad clients (~1/3).
>>>>
>>>> They don't pick up any jobs from osa-dispatcher, and there are no errors
>>>> when starting the service.
>>>>
>>>> Also, if I restart osad on sp, I see these logs:
>>>>
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] [::ffff:172.90.7.220, port=43046] disconnect jid=osad-e43e3265db@spacewalk15.ooma.internal/osad, packets: 29, bytes: 3738
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session ended: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: user unloaded jid=osad-e43e3265db@spacewalk15.ooma.internal
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] traditional.digest authentication succeeded: osad-e43e3265db@/osad ::ffff:172.90.7.220:43454 TLS
>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] requesting session: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session started: jid=osad-e43e3265db@spacewalk15.ooma.internal/osad
>>>>
>>>> So it looks like everything should be fine.
>>>>
>>>> _______________________________________________
>>>> Spacewalk-list mailing list
>>>> Spacewalk-list at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>>>