[Spacewalk-list] Massive problems with slow updates on rhnServerAction

Wed Oct 31 09:01:57 UTC 2012

On 19.09.2012 10:13, Patrick Hurrelmann wrote:
> Hi List,
> 
> for some weeks now my SW 1.7 on CentOS 6.3 has been grinding to a halt
> regularly, and it is getting worse from day to day. Right now I have to
> restart it several times a day. The db connections to PostgreSQL fail with
> "FATAL: sorry, too many clients already". The max connections setting was
> already bumped several times and is currently set to 300. But that's not
> the real problem, it seems.
> 
> I tried to track it down and stumbled over frequent updates on the
> table rhnServerAction that take ages (several hours for a single update
> statement) to complete. The client seems to run into a timeout and
> reissues the statements (I see some update statements several times in
> the logs) while the old ones are still running, until all connections are
> in use and SW grinds to a halt.
> E.g.:
> 2012-09-18 11:24:31 CEST [3284]: [118-1] LOG:  duration: 8250548.876 ms
>  statement:
> 	                update rhnServerAction
> 	                    set status = 1,
> 	                        pickup_time = current_timestamp,
> 	                        remaining_tries = 3 - 1
> 	                where action_id = 6233
> 	                  and server_id = 1000010014
> 
> 2012-09-18 11:24:31 CEST [3119]: [295-1] LOG:  duration: 8248422.890 ms
>  statement:
> 	                update rhnServerAction
> 	                    set status = 1,
> 	                        pickup_time = current_timestamp,
> 	                        remaining_tries = 3 - 1
> 	                where action_id = 6252
> 	                  and server_id = 1000010007
> 	
> For each update on rhnServerAction the trigger
> rhn_server_action_mod_trig_fun() is fired, but I still can't see why the
> update should take so long. Manually analyzing the updates does not show
> anything suspicious.
> 
> My SW installation is not that big (35 clients, with osad and
> configuration management). Total database size is 2.3 GB. The table
> rhnServerAction itself only has 4600 rows.
> 
> 
> Can anybody please help in this regard or shed some light on this?
> 
> Regards
> Patrick
> 

Hi all,

just an update on the issue. I think I finally managed to fix this. After
reading the thread "rhn_check hangs"
(https://www.redhat.com/archives/spacewalk-list/2012-October/msg00024.html)
and the associated bugzilla entries, I tried the patch for
python-psycopg2 myself, as I found similar errors in my logs and
rhn_check hung several times. And it seems to be the cure. Since I
applied the patch and built a new rpm locally, I no longer have any
hanging update statements. All is running smoothly. I could even
re-enable osad on the clients and disable my nightly restart of SW. There
are still idle connections, but that's a different issue for sure.
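
For anyone who wants to check their own PostgreSQL logs for the same
symptom, here is a minimal sketch that picks out long-running statements.
It assumes log lines in the form quoted above ("LOG:  duration: <ms> ms");
the function name slow_statements and the one-hour threshold are only
illustrative and not part of Spacewalk or PostgreSQL.

```python
import re

# Matches the "duration: <milliseconds> ms" part of a PostgreSQL log line,
# in the format seen in the quoted log excerpt (assumed, not configurable here).
DURATION_RE = re.compile(r"LOG:\s+duration:\s+([0-9]+(?:\.[0-9]+)?)\s+ms")


def slow_statements(lines, threshold_hours=1.0):
    """Yield (hours, line) for each log line whose duration exceeds the threshold."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration line
        hours = float(match.group(1)) / 1000.0 / 3600.0  # ms -> s -> h
        if hours > threshold_hours:
            yield hours, line.rstrip()


if __name__ == "__main__":
    # The two duration lines from the quoted log excerpt.
    sample = [
        "2012-09-18 11:24:31 CEST [3284]: [118-1] LOG:  duration: 8250548.876 ms",
        "2012-09-18 11:24:31 CEST [3119]: [295-1] LOG:  duration: 8248422.890 ms",
    ]
    for hours, line in slow_statements(sample):
        print("%.2f h  %s" % (hours, line))
```

Both statements from the quoted excerpt come out at roughly 2.29 hours
each, which matches the "several hours for a single update" I described.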

The bugzilla entry and patch for this are at
https://bugzilla.redhat.com/show_bug.cgi?id=843723. Maybe someone else
can verify this and test whether it fixes their problems, too?

Is there any progress in getting this pushed upstream? From my point of
view this is becoming a showstopper. It seems that many problems are
connected to it.

Regards
Patrick

-- 
Lobster LOGsuite GmbH, Münchner Straße 15a, D-82319 Starnberg

HRB 178831, Amtsgericht München
Geschäftsführer: Dr. Martin Fischer, Rolf Henrich