[Spacewalk-list] Run away DELETE query eating my disk space

Jon Miller jonebird at gmail.com
Wed Apr 30 04:05:57 UTC 2014


Before asking for more disk space, I scheduled some down time to perform a
"VACUUM FULL VERBOSE ANALYZE" on my PostgreSQL database today along with
rebuilding all of my indexes in the DB. After bringing Spacewalk back up
again, the DELETE query commenced again and is right back in the same cycle
again. I was hoping that a full maintenance cycle would magically fix things
for me.

So more detail on my situation. Here is the EXPLAIN plan which also gives you
insight into our DB size with the row counts:

  EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
                                                QUERY PLAN
  ------------------------------------------------------------------------------------------------------
   Seq Scan on rhnpackagechangelogdata  (cost=318312.96..134742963.76 rows=192456 width=6)
     Filter: (NOT (SubPlan 1))
     SubPlan 1
       ->  Materialize  (cost=318312.96..318905.08 rows=42512 width=8)
             ->  HashAggregate  (cost=317678.33..318103.45 rows=42512 width=8)
                   ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..281315.66 rows=14545066 width=8)

Beyond just raw row counts, the size of each table are
rhnPackageChangeLogData = 110 MB and rhnPackageChangeLogRec = 2469 MB. Trying
my best to impersonate a DBA here. A couple things I'm contemplating now:

1. Try increasing the working memory parameter.
   I'm thinking this due to the fact that it looks like it wants to load a
   ~2.5G hash in memory for that subquery. Within my postgresql.conf file,
   the "work_mem" is not explicitly set and I'm still trying to figure out
   how to query the current size. 

2. Drop the DISTINCT clause from the subquery.
   I'm thinking this because the EXPLAIN without the 'DISTINCT' clause makes
   it drop the HashAggregate step:
  EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec );
                                             QUERY PLAN
  ------------------------------------------------------------------------------------------------
   Seq Scan on rhnpackagechangelogdata  (cost=352677.73..45926203886.33 rows=192456 width=6)
     Filter: (NOT (SubPlan 1))
     SubPlan 1
       ->  Materialize  (cost=352677.73..554945.39 rows=14545066 width=8)
             ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..281315.66 rows=14545066 width=8)

  I actually see the following comment within the Task_queries.xml file:
  "DISTINCT makes the DELETE faster for postgresql" but as I'm thinking "less
  memory and able to complete" might be preferable than potential speed. 

3. Others? Perhaps there an alternate SQL approach that can be taken? Or would you
   suspect something else is terrible wrong here?

Thanks,
Jon Miller

Setup Info: Running spacewalk-postgresql 2.0.1-1.el6 on a RHEL 6.4 machine
            with 8 G of memory and PostgreSQL 8.4.13-1.el6_3.

Jon Miller <jonEbird at gmail.com> writes:

> Hello,
>
> I have a situation where it appears that a DELETE statement is eating my disk
> space via tmp table spaces and eventually fills the file system and then
> rolls it back. Then starts up again and I'm observing this cycle thanks to
> space alerts. By eating disk space, I'm talking about swings of close to 50G
> of space that occurs in the span of 44min. 
>
> Here is query being issued:
>   DELETE FROM rhnPackageChangeLogData
>     WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec )
>
> I had earlier added an extra 20G lun to our Spacewalk (2.0) server but that
> was before I realized what was happening. I'm curious if I can 1. determine
> how much disk space I'll really need and 2. is there a better way to perform
> this clean job? 
>
> Per #2, I'm very tempted to shutdown spacewalk, make a DB backup and then
> figure out a way to run that DELETE without transaction support, though I'm
> not sure how to do that with PostgreSQL yet.
>
> Before I get too crazy, I thought I'd ask folks here for advice.




More information about the Spacewalk-list mailing list