[Spacewalk-list] Osad problems

Konstantin Raskoshnyi konrasko at gmail.com
Fri Sep 2 19:44:38 UTC 2016


No, the certs are fine; otherwise osad would not start without errors.

I found one Bugzilla bug for Spacewalk that is related to 2.4 on RHEL 7:
some problems with jabber. I'm probably going to update to 2.5.

On Friday, September 2, 2016, Ree, Jan-Albert van <J.A.v.Ree at marin.nl>
wrote:

> Sounds like you still have certificate issues.
>
> Are you sure all clients are using the correct new cert? If unsure, you
> might want to try manually specifying the correct new cert in the
> /etc/sysconfig/rhn/osad.conf file to see if that helps.
>
> Also, how did you replace the certs? Did you install a newer version of
> the certificates RPM?
>
>
> The database is only updated if the osa-dispatcher is running properly.
>
> The following post might be of some use too; it helped me a lot recently
> when debugging OSA-related issues:
> https://www.redhat.com/archives/spacewalk-list/2014-May/msg00124.html
>
>
> Regards
>
> Jan-Albert
>
>
>
> Jan-Albert van Ree | Linux System Administrator | MARIN Support Group
> MARIN | T +31 317 49 35 48 | J.A.v.Ree at marin.nl | www.marin.nl
>
> ------------------------------
> *From:* spacewalk-list-bounces at redhat.com on behalf of
> Konstantin Raskoshnyi <konrasko at gmail.com>
> *Sent:* Friday, September 02, 2016 00:59
> *To:* spacewalk-list at redhat.com
> *Subject:* Re: [Spacewalk-list] Osad problems
>
> Yes, such a pain.
> But here is something interesting: I have two servers. The first was set
> up by a previous employee and renamed manually, so I had to replace the
> cert on all clients and rename the server through the Spacewalk utility.
> The second server was installed from scratch and has never been renamed,
> and it works smoothly without any problems with osa-dispatcher.
>
> Anyway, thanks man
>
> On Thu, Sep 1, 2016 at 12:57 PM, Matt Moldvan <matt at moldvan.com> wrote:
>
>> I have the same issues with 2.5 and latest OSAD packages... the
>> connection still looks like it's established at the client side, but for
>> some reason it has stopped trying to send data.  The master no longer sees
>> the connection as open and therefore cannot send anything to it.
>>
>> The only resolution I've found is to restart the client(s), but for so
>> many systems this caused the dispatchers to become unresponsive during our
>> maintenance windows.  Essentially, Puppet would run, restart OSAD, and it
>> would consume all the HTTP connections and make the GUI unresponsive.
>> Update and reboot actions were picked up outside of the scheduled
>> maintenance, and it was all around chaos.
>>
>> So at this point I'm stuck babysitting OSAD status of systems because
>> there is nothing easily found in /var/log/osad that indicates an issue,
>> even though the client still has 5222 open to the dispatcher and the osad
>> service is running.  In the Spacewalk database, the system is marked
>> down... I ran an strace on the OSAD process on the client for about 30
>> minutes, and didn't see any attempts to do anything.
>>
>> [me at osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
>> osad    21996 root    3u  IPv4 7392569      0t0      TCP
>> osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
>> [me at osad-client1 ~]$ service osad status
>> osad (pid  21996) is running...
>> [me at osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
>> osad    21996 root    3u  IPv4 7392569      0t0      TCP
>> osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
>> [me at osad-client1 ~]$ sudo strace -fp 21996
>> Process 21996 attached
>> select(4, [3], [], [], NULL
>>
>> ---
>> rhnschema=# select s.name,pc.state_id from rhnpushclient pc, rhnserver s
>> where s.name='osad-client1' and pc.server_id=s.id;
>>          name          | state_id
>> -----------------------+----------
>>  osad-client1          |        2
>> (1 row)
>>
>> Even though osad-client1 thought it was still connected, the master
>> didn't have a corresponding connection on 5222:
>> [me at spacewalk-master ~]$ netstat -a | grep osad-client1
>> [me at spacewalk-master ~]$
>>
>> For me, changing the values in /etc/jabberd/*.xml as recommended in
>> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD wasn't going to
>> work... I tried that and all systems would be disconnected, then would
>> reconnect, causing some (perhaps insignificant) load on the database as
>> well as unnecessary network traffic and client processing.  I could see the
>> number of systems marked as "online" in the database flapping wildly between
>> 1,000 and 5,000 over time.
>>
>> One thing I did notice on the systems that were marked offline... a
>> netstat showed two connections, one in CLOSE_WAIT status and another in
>> ESTABLISHED.  On restart of OSAD, only one was there, in ESTABLISHED state
>> and the system was marked online again.
>>
>> I'm thinking that the OSAD Python code isn't closing the sockets properly
>> when an error is encountered, and leaves the client thinking it's still
>> connected, while the master doesn't have a corresponding connection to send
>> data to.
>>
>> Basically, as a workaround, I think I'm going to have systems restart
>> OSAD if they see connections open on 5222 in CLOSE_WAIT status... until
>> something better comes along and the client code is fixed up.
>> Unfortunately the workaround isn't even a full one... not every system had
>> multiple connections, but it's a step toward more systems staying usable
>> than before.
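
A minimal sketch of the CLOSE_WAIT workaround described above, in Python.
It is only an illustration and not part of OSAD: the function names are
made up, it assumes the usual Linux `netstat -tn` column layout and the
`service osad restart` command used elsewhere in this thread, and it
matches any TCP connection to port 5222, not only the one owned by the
osad process.

```python
import subprocess

JABBER_PORT = 5222  # port the osad client holds open to the dispatcher


def close_wait_connections(netstat_output, port=JABBER_PORT):
    """Return netstat lines for TCP connections to `port` in CLOSE_WAIT."""
    stale = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign.rsplit(":", 1)[-1] == str(port) and state == "CLOSE_WAIT":
            stale.append(line.strip())
    return stale


def restart_osad_if_stale():
    """Restart osad if any connection to the jabber port is in CLOSE_WAIT.

    Returns True when a restart was triggered. Needs root privileges.
    """
    output = subprocess.run(
        ["netstat", "-tn"], capture_output=True, text=True, check=True
    ).stdout
    stale = close_wait_connections(output)
    if stale:
        subprocess.run(["service", "osad", "restart"], check=True)
    return bool(stale)
```

Calling restart_osad_if_stale() periodically on each client (from cron,
for example) would approximate the workaround. As noted above, it would
not catch systems that go quiet with only a single ESTABLISHED connection.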
>>
>> On Thu, Sep 1, 2016 at 1:26 PM Konstantin Raskoshnyi <konrasko at gmail.com>
>> wrote:
>>
>>> 2.4. I tried that; after I ran spacewalk-service restart it actually
>>> helped for one day.
>>>
>>> Now it's the same again, but with no errors on either side.
>>>
>>> On Wed, Aug 31, 2016 at 9:06 AM, Matthew Madey <mattmadey at gmail.com>
>>> wrote:
>>>
>>>> What version of Spacewalk are you running? You likely need to reset the
>>>> osad credentials on the clients. This typically only occurs when the jabber
>>>> database has been corrupted.
>>>>
>>>> On the clients, run the below commands:
>>>>
>>>>
>>>> rm -f /etc/sysconfig/rhn/osad-auth.conf ; service osad restart
>>>>
>>>> You may find the below links helpful
>>>>
>>>> https://fedorahosted.org/spacewalk/wiki/OsadHowTo
>>>>
>>>> https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD
>>>>
>>>>
>>>>
>>>>
>>>> On Aug 30, 2016 4:43 PM, "Konstantin Raskoshnyi" <konrasko at gmail.com>
>>>> wrote:
>>>>
>>>>> Something strange is happening with about a third of my osad clients.
>>>>>
>>>>> They don't pick up any jobs from osa-dispatcher, and there are no
>>>>> errors when starting the service.
>>>>>
>>>>> Also, if I restart osad, I see these logs on the Spacewalk server:
>>>>>
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142]
>>>>> [::ffff:172.90.7.220, port=43046] disconnect jid=osad-e43e3265db at spacewalk15.ooma.internal/osad,
>>>>> packets: 29, bytes: 3738
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session ended:
>>>>> jid=osad-e43e3265db at spacewalk15.ooma.internal/osad
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: user unloaded
>>>>> jid=osad-e43e3265db at spacewalk15.ooma.internal
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142]
>>>>> traditional.digest authentication succeeded: osad-e43e3265db@/osad
>>>>> ::ffff:172.90.7.220:43454 TLS
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] requesting
>>>>> session: jid=osad-e43e3265db at spacewalk15.ooma.internal/osad
>>>>> Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session started:
>>>>> jid=osad-e43e3265db at spacewalk15.ooma.internal/osad
>>>>>
>>>>> So it looks like everything should be fine.
>>>>>
>>>>> _______________________________________________
>>>>> Spacewalk-list mailing list
>>>>> Spacewalk-list at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/spacewalk-list