[Spacewalk-list] Run away DELETE query eating my disk space
Jon Miller
jonebird at gmail.com
Thu May 1 04:09:59 UTC 2014
Chris <dmagick at gmail.com> writes:
> On 30/04/14 14:05, Jon Miller wrote:
>> Before asking for more disk space, I scheduled some down time to perform a
>> "VACUUM FULL VERBOSE ANALYZE" on my PostgreSQL database today along with
>> rebuilding all of my indexes in the DB. After bringing Spacewalk back up
>> again, the DELETE query commenced again and is right back in the same cycle
>> again. I was hoping that a full maintenance cycle would magically fix things
>> for me.
>>
>> So more detail on my situation. Here is the EXPLAIN plan which also gives you
>> insight into our DB size with the row counts:
>>
>> EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>> QUERY PLAN
>> ------------------------------------------------------------------------------------------------------
>> Seq Scan on rhnpackagechangelogdata (cost=318312.96..134742963.76 rows=192456 width=6)
>> Filter: (NOT (SubPlan 1))
>> SubPlan 1
>> -> Materialize (cost=318312.96..318905.08 rows=42512 width=8)
>> -> HashAggregate (cost=317678.33..318103.45 rows=42512 width=8)
>> -> Seq Scan on rhnpackagechangelogrec (cost=0.00..281315.66 rows=14545066 width=8)
>>
>> Beyond just raw row counts, the size of each table are
>> rhnPackageChangeLogData = 110 MB and rhnPackageChangeLogRec = 2469 MB. Trying
>> my best to impersonate a DBA here. A couple things I'm contemplating now:
>>
>> 1. Try increasing the working memory parameter.
>> I'm thinking this due to the fact that it looks like it wants to load a
>> ~2.5G hash in memory for that subquery. Within my postgresql.conf file,
>> the "work_mem" is not explicitly set and I'm still trying to figure out
>> how to query the current size.
>>
>> 2. Drop the DISTINCT clause from the subquery.
>> I'm thinking this because the EXPLAIN without the 'DISTINCT' clause makes
>> it drop the HashAggregate step:
>> EXPLAIN DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec );
>> QUERY PLAN
>> ------------------------------------------------------------------------------------------------
>> Seq Scan on rhnpackagechangelogdata (cost=352677.73..45926203886.33 rows=192456 width=6)
>> Filter: (NOT (SubPlan 1))
>> SubPlan 1
>> -> Materialize (cost=352677.73..554945.39 rows=14545066 width=8)
>> -> Seq Scan on rhnpackagechangelogrec (cost=0.00..281315.66 rows=14545066 width=8)
>>
>> I actually see the following comment within the Task_queries.xml file:
>> "DISTINCT makes the DELETE faster for postgresql" but as I'm thinking "less
>> memory and able to complete" might be preferable than potential speed.
>>
>> 3. Others? Perhaps there an alternate SQL approach that can be taken? Or would you
>> suspect something else is terrible wrong here?
>
> Reverse the logic so you find the id's to explicitly delete:
>
> delete from rhnPackageChangeLogData where id in (SELECT d.id from
> rhnPackageChangeLogData d left join rhnPackageChangeLogRec l on
> (d.id=l.changelog_data_id) where l.changelog_data_id is null);
>
> I don't think I have the same amount of data as you, but here are some
> quick stats:
>
> spacewalk=# SELECT count(*) from rhnPackageChangeLogData;
> count
> --------
> 176325
> (1 row)
>
> Time: 20.140 ms
> spacewalk=#
> spacewalk=# select count(*) from rhnPackageChangeLogRec;
> count
> ---------
> 5121684
> (1 row)
>
> Time: 421.497 ms
> spacewalk=# begin;
> BEGIN
> spacewalk=# delete from rhnPackageChangeLogData where id in (SELECT d.id
> from rhnPackageChangeLogData d left join rhnPackageChangeLogRec l on
> (d.id=l.changelog_data_id) where l.changelog_data_id is null);
> DELETE 0
> Time: 2646.490 ms
> spacewalk=# rollback;
> ROLLBACK
> Time: 0.155 ms
Thanks for the reply. I wondered if I could do that but it would have been a
while before I would have came up with that query. Perhaps upstream folks
would like to test that version for Task_queries.xml within the git repo.
Like before, I like to turn it into a "select count(*)" version first as a
test. I ran it and forgot about it until I realized it was still running and
killed it after ~3.5hours.
spaceschema=# select count(*) from rhnPackageChangeLogData where id in
spaceschema-# (SELECT d.id from rhnPackageChangeLogData d
spaceschema(# left join rhnPackageChangeLogRec l on (d.id=l.changelog_data_id)
spaceschema(# where l.changelog_data_id is null);
^C
Session terminated, killing shell... ...killed.
I now fear that something more fundamental is wrong with my DB.
--
Jon Miller
More information about the Spacewalk-list
mailing list