[Spacewalk-list] Massive problems with slow updates on rhnServerAction

Jonathan Scott lists at xistenz.org
Fri Sep 21 13:55:14 UTC 2012


So, after the changes below, I uncommented my nightly errata cron and let
it fly. This morning the app is completely unresponsive (web and other
methods such as yum checks against it). There are ~570 postgres processes
running (and 258 httpd). The postgres processes break down as follows:

- 40 idle in transaction
- 225 UPDATE waiting (postgres says these are "update rhnServerAction set
status =1")
- 305 idle
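
For anyone wanting to reproduce a breakdown like the one above, here is a
minimal sketch, assuming the counts are read off the postgres process titles
in `ps` output. The function name `tally_backend_states` and the sample lines
are made up for illustration; the titles on your system may be formatted
differently.

```python
from collections import Counter

# Backend states to look for at the end of a postgres process title.
# These match the three states listed above; anything else is "other".
STATES = ("idle in transaction", "UPDATE waiting", "idle")


def tally_backend_states(ps_lines):
    """Count postgres backends by the state shown at the end of each title."""
    counts = Counter()
    for line in ps_lines:
        if "postgres:" not in line:
            continue  # not a postgres process line
        title = line.split("postgres:", 1)[1].strip()
        for state in STATES:
            if title.endswith(state):
                counts[state] += 1
                break
        else:
            counts["other"] += 1  # e.g. writer or logger processes
    return counts


# Hypothetical sample lines (user, database and addresses are invented).
sample = [
    "postgres  2101  postgres: spaceuser spaceschema 127.0.0.1(40112) idle in transaction",
    "postgres  2102  postgres: spaceuser spaceschema 127.0.0.1(40113) UPDATE waiting",
    "postgres  2103  postgres: spaceuser spaceschema 127.0.0.1(40114) idle",
    "postgres  2104  postgres: writer process",
    "apache    2200  /usr/sbin/httpd",
]
print(tally_backend_states(sample))
```

Feeding it the lines from something like `ps -eo user,pid,args` should give
comparable counts, provided the process titles end with the state as they do
in the sample.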

I am able to work with the database on its own just fine (it is responding
to connection attempts and general poking-around queries from an external
pgAdmin client). The server itself doesn't appear to be under much stress,
save for the 570 postgres processes eating memory and the httpd processes all
being dumped into swap due to the backup.

If I restart postgres I get a flood of traceback emails complaining that
the database is gone, then the app returns to some level of functionality.
/var/lib/pgsql/data/base/pgsql_tmp/ is empty.

Any thoughts?
Jonathan

On Thu, Sep 20, 2012 at 2:53 PM, Jonathan Scott <lists at xistenz.org> wrote:

> - Forgot about ulimit, not sure why... I deal with Oracle almost daily.
> I've adjusted my ulimit nofile setting (both soft and hard):
>
> # ulimit -a | grep files
> open files                      (-n) 4096
>
> # ulimit -aH | grep files
> open files                      (-n) 65000
>
> - There is a /var/lib/pgsql/data/base/pgsql_tmp/ folder, but it is empty
> at the moment; I will keep an eye on it.
>
> - I have the keepalive changed at the sysctl level, as every time I do so
> on the postgresql side, I cannot do so much as cancel a scheduled task
> without the app going 503 on me. Neither sysctl nor postgresql.conf changes
> appear to help in this case, regardless of what I set them to.
>
> Thanks again for your time and assistance with this frustrating issue o'
> mine,
> Jonathan
>
>
> On Thu, Sep 20, 2012 at 2:09 PM, Paul Robert Marino <prmarino1 at gmail.com> wrote:
>
>> Well, that's one part of it, but also check
>> ulimit -n
>> which can be set persistently with a nofile entry in
>> /etc/security/limits.conf or an equivalent file in
>> /etc/security/limits.d/
>>
>> There is also one more thing: do you have a
>> /var/lib/pgsql/data/base/pgsql_tmp/ directory, and does it contain
>> files? The existence of that directory indicates there was a query
>> that required more memory than it was allowed to use.
>>
>>
>> I suspect you may have a query that's timing out, and Spacewalk
>> kills the connection uncleanly and leaves behind artifact connections.
>> Or possibly you may be reaching a file handle limit on one of the
>> Spacewalk processes; keep in mind a network socket is counted as a
>> file.
>>
>>
>>
>> Also, in postgresql.conf, look at the section about TCP keepalive:
>>
>>
>> "
>> # - TCP Keepalives -
>> # see "man 7 tcp" for details
>>
>> #tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
>>                                         # 0 selects the system default
>> #tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
>>                                         # 0 selects the system default
>> #tcp_keepalives_count = 0               # TCP_KEEPCNT;
>>                                         # 0 selects the system default
>> "
>>
>> Note that all of these have to have a value greater than 1.
>> Setting these flags should cull any connections from clients that are
>> no longer there.
>>
>> On Thu, Sep 20, 2012 at 1:42 PM, Jonathan Scott <lists at xistenz.org> wrote:
>> > By this, do you mean the "fs.file-max" kernel parameter? If so, no, I have
>> > not; I am still at the default.
>> >
>> > - Jonathan
>> >
>> >
>> > On Thu, Sep 20, 2012 at 1:12 PM, Paul Robert Marino <prmarino1 at gmail.com> wrote:
>> >>
>> >> Also, did either of you tune the max open file limit?
>> >>
>> >> On Sep 20, 2012 1:08 PM, "Paul Robert Marino" <prmarino1 at gmail.com> wrote:
>> >>>
>> >>> I think I may have some idea of what may be causing this, but I haven't
>> >>> had time to look at it yet. Did either of you tune the sort memory or
>> >>> working memory in your postgresql.conf?
>> >>>
>> >>> On Sep 20, 2012 10:20 AM, "Patrick Hurrelmann"
>> >>> <patrick.hurrelmann at lobster.de> wrote:
>> >>>>
>> >>>> On 20.09.2012 16:04, Jonathan Scott wrote:
>> >>>> > Paul,
>> >>>> >
>> >>>> > This is reading like the exact same issue you and I discussed on-list
>> >>>> > a few weeks ago. I had closed out that thread as resolved, but the
>> >>>> > issue has since crept back up. Patrick breaks it down well; I too
>> >>>> > just get a pile-up of "idle in transaction" db connections which do
>> >>>> > not clear with any configuration change I have made (tcp timeout, idle
>> >>>> > timeout and connection limit adjustments in postgresql.conf); a
>> >>>> > restart of all associated services gives me about 3-5 days before the
>> >>>> > app becomes unresponsive.
>> >>>> >
>> >>>> > Patrick, may I ask how are you loading your errata?
>> >>>> >
>> >>>> > - Jonathan
>> >>>>
>> >>>> As a short update: I missed one node that had osad still running. Osad
>> >>>> was disabled there as well. I no longer have any update queries waiting.
>> >>>> There are still some idle transactions, but the number is way lower now.
>> >>>> I have the strong feeling that this is all connected to osad and push to
>> >>>> clients. Anyone else?
>> >>>>
>> >>>> But back to your question. I'm running David Nutter's centos-errata.py
>> >>>> on a nightly basis directly after a spacewalk restart.
>> >>>>
>> >>>> Regards
>> >>>> Patrick
>> >>>>
>> >>>> --
>> >>>> Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg
>> >>>>
>> >>>> HRB 178831, Amtsgericht München
>> >>>> Geschäftsführer: Dr. Martin Fischer, Rolf Henrich
>> >>>>
>> >>>> _______________________________________________
>> >>>> Spacewalk-list mailing list
>> >>>> Spacewalk-list at redhat.com
>> >>>> https://www.redhat.com/mailman/listinfo/spacewalk-list