[Spacewalk-list] cleanup-data-bunch takes forever

Gerald Vogt vogt at spamcop.net
Thu May 28 04:57:24 UTC 2015


Eventually cleanup-data-bunch finished after 8:40 h. Definitively not
suitable to run daily.

I don't quite understand why the query runs with two sequential scans
and does not use the indexes.

Does anyone have an idea how to optimize that? Is that always taking so
long? I am using CentOS 6.6, Spacewalk 2.3, PostgreSQL 8.4.20.

For now, I'll schedule that task monthly...

Thanks,

Gerald

On 27.05.15 13:39, Gerald Vogt wrote:
> Hi!
> 
> It seems cleanup-data-bunch takes a very long time.
> 
> I have noticed that it was in state "INTERRUPTED" and did not run for a
> couple of months. I have applied the changes as suggested in
> 
> https://www.redhat.com/archives/spacewalk-list/2015-May/msg00091.html
> 
> and now cleanup-data-bunch does run again. It's however very slow.
> Currently, it is already running for 2:15 h and still goes on.
> 
> I can see with pg_stat_activity that is running this query:
> 
> DELETE FROM rhnPackageChangeLogData
>          WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
> rhnPackageChangeLogRec )
> 
> which seems to keep the database busy for a while:
> 
>> spacewalk=# explain verbose DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>>                                                 QUERY PLAN                                                 
>> -----------------------------------------------------------------------------------------------------------
>>  Seq Scan on rhnpackagechangelogdata  (cost=803563.75..215621093.82 rows=201534 width=6)
>>    Output: 
>>    Filter: (NOT (SubPlan 1))
>>    SubPlan 1
>>      ->  Materialize  (cost=803563.75..804467.22 rows=64947 width=8)
>>            Output: rhnpackagechangelogrec.changelog_data_id
>>            ->  Unique  (cost=781080.12..803244.80 rows=64947 width=8)
>>                  Output: rhnpackagechangelogrec.changelog_data_id
>>                  ->  Sort  (cost=781080.12..792162.46 rows=4432935 width=8)
>>                        Output: rhnpackagechangelogrec.changelog_data_id
>>                        Sort Key: rhnpackagechangelogrec.changelog_data_id
>>                        ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..109859.35 rows=4432935 width=8)
>>                              Output: rhnpackagechangelogrec.changelog_data_id
>> (13 rows)
>>
>> spacewalk=# select count(*) from rhnpackagechangelogrec;
>>   count  
>> ---------
>>  5112795
>> (1 row)
>>
>> spacewalk=# select count(*) from rhnpackagechangelogdata;
>>  count  
>> --------
>>  401000
>> (1 row)
>>
> 
> If it is taking this long I guess there is no point scheduling it daily.
> 
> Any thoughts?
> 
> Thanks,
> 
> Gerald
> 
> 
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list
> 




More information about the Spacewalk-list mailing list