[Spacewalk-list] Run away DELETE query eating my disk space

Wed Apr 30 23:21:43 UTC 2014

On 30/04/14 14:05, Jon Miller wrote:
> Before asking for more disk space, I scheduled some down time to perform a
> "VACUUM FULL VERBOSE ANALYZE" on my PostgreSQL database today along with
> rebuilding all of my indexes in the DB. After bringing Spacewalk back up
> again, the DELETE query commenced again and is right back in the same cycle
> again. I was hoping that a full maintenance cycle would magically fix things
> for me.
>
> So more detail on my situation. Here is the EXPLAIN plan which also gives you
> insight into our DB size with the row counts:
>
>    EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>                                                  QUERY PLAN
>    ------------------------------------------------------------------------------------------------------
>     Seq Scan on rhnpackagechangelogdata  (cost=318312.96..134742963.76 rows=192456 width=6)
>       Filter: (NOT (SubPlan 1))
>       SubPlan 1
>         ->  Materialize  (cost=318312.96..318905.08 rows=42512 width=8)
>               ->  HashAggregate  (cost=317678.33..318103.45 rows=42512 width=8)
>                     ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..281315.66 rows=14545066 width=8)
>
> Beyond just raw row counts, the size of each table are
> rhnPackageChangeLogData = 110 MB and rhnPackageChangeLogRec = 2469 MB. Trying
> my best to impersonate a DBA here. A couple things I'm contemplating now:
>
> 1. Try increasing the working memory parameter.
>     I'm thinking this due to the fact that it looks like it wants to load a
>     ~2.5G hash in memory for that subquery. Within my postgresql.conf file,
>     the "work_mem" is not explicitly set and I'm still trying to figure out
>     how to query the current size.
>
> 2. Drop the DISTINCT clause from the subquery.
>     I'm thinking this because the EXPLAIN without the 'DISTINCT' clause makes
>     it drop the HashAggregate step:
>    EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec );
>                                               QUERY PLAN
>    ------------------------------------------------------------------------------------------------
>     Seq Scan on rhnpackagechangelogdata  (cost=352677.73..45926203886.33 rows=192456 width=6)
>       Filter: (NOT (SubPlan 1))
>       SubPlan 1
>         ->  Materialize  (cost=352677.73..554945.39 rows=14545066 width=8)
>               ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..281315.66 rows=14545066 width=8)
>
>    I actually see the following comment within the Task_queries.xml file:
>    "DISTINCT makes the DELETE faster for postgresql" but as I'm thinking "less
>    memory and able to complete" might be preferable than potential speed.
>
> 3. Others? Perhaps there an alternate SQL approach that can be taken? Or would you
>     suspect something else is terrible wrong here?

Reverse the logic so you find the id's to explicitly delete:

delete from rhnPackageChangeLogData where id in (SELECT d.id from 
rhnPackageChangeLogData d left join rhnPackageChangeLogRec l on 
(d.id=l.changelog_data_id) where l.changelog_data_id is null);

I don't think I have the same amount of data as you, but here are some 
quick stats:

spacewalk=# SELECT count(*) from rhnPackageChangeLogData;
  count
--------
  176325
(1 row)

Time: 20.140 ms
spacewalk=#
spacewalk=# select count(*) from rhnPackageChangeLogRec;
   count
---------
  5121684
(1 row)

Time: 421.497 ms
spacewalk=# begin;
BEGIN
spacewalk=# delete from rhnPackageChangeLogData where id in (SELECT d.id 
from rhnPackageChangeLogData d left join rhnPackageChangeLogRec l on 
(d.id=l.changelog_data_id) where l.changelog_data_id is null);
DELETE 0
Time: 2646.490 ms
spacewalk=# rollback;
ROLLBACK
Time: 0.155 ms

-- 
Postgresql & php tutorials
http://www.designmagick.com/