[Spacewalk-list] Run away DELETE query eating my disk space

Jon Miller jonebird at gmail.com
Sun May 4 00:29:40 UTC 2014


That's another fine suggestion, though I still believe the issues were/are
related to the actual SQL queries being run. I completed my planned routine
last evening, but simply recreating the DB and reloading my backup didn't
change anything. I then changed the following PostgreSQL tunables (snippet below):
- shared_buffers from (default?) 32MB => 2048MB
- work_mem from 1MB => 20MB
- maintenance_work_mem from 16MB => 300MB
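
For reference, those are just edits to postgresql.conf (in the data directory,
/local/mnt/workspace/pgsql/data here) followed by a full restart, since
shared_buffers does not take effect on a plain reload. A minimal sketch with
the values I ended up using on this 8GB VM, not a general recommendation:

  shared_buffers = 2048MB        # was at the 32MB default
  work_mem = 20MB                # per sort/hash per backend, so kept modest
  maintenance_work_mem = 300MB   # used by VACUUM, CREATE INDEX, etc.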

After changing those, I'm able to run the SELECT version of that pesky query
that I was unable to complete before. So that's a victory, but unfortunately
I'm still staring at a fluctuating file system size: it now goes from 52G
available down to nothing in about 11 minutes, then presumably something is
removed and the cycle repeats from 52G free.
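
To watch the cycle, polling free space and the size of the temp directory does
the job; something along these lines (the paths come from the inotifywatch
output below):

  $ watch -n 30 'df -h /local/mnt; du -sh /local/mnt/workspace/pgsql/data/base/pgsql_tmp'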

When I use "inotifywatch" to help characterize the activity on the filesystem
(limited to opens and creates), I get the following:
  $ inotifywatch -r /local/mnt -e open -e create
  Establishing watches...
  Finished establishing watches, now collecting statistics.
  total    open     create  filename
  2990751  2986326  4425    /local/mnt/workspace/pgsql/data/base/pgsql_tmp/
  780      780      0       /local/mnt/workspace/pgsql/data/base/314186/
  710      682      28      /local/mnt/workspace/pgsql/data/pg_stat_tmp/
  154      154      0       /local/mnt/workspace/pgsql/data/global/
  77       77       0       /local/mnt/openv/var/host_cache/
  49       25       24      /local/mnt/openv/var/host_cache/19a/
  33       17       16      /local/mnt/openv/var/host_cache/021/
  ...

So that's why my attention stays on the DB. And now when I look at what queries
are running, I keep seeing only this one (how I'm checking is shown after the query):

SELECT DISTINCT snv.server_id AS server_id, S.name, S.release, SA.name as arch, urn.user_id
   FROM (select rhnChannelErrata.errata_id, rhnChannelErrata.channel_id, rhnServerChannel.server_id,rhnErrataPackage.package_id
          from rhnChannelErrata, rhnErrataPackage, rhnChannelNewestPackage, rhnPackageEVR,
               rhnServerChannel, rhnServerPackage, rhnPackageUpgradeArchCompat
          where rhnChannelErrata.errata_id = rhnErrataPackage.errata_id
            and rhnChannelErrata.channel_id = rhnChannelNewestPackage.channel_id
            and rhnErrataPackage.package_id = rhnChannelNewestPackage.package_id
            and rhnChannelErrata.channel_id = rhnServerChannel.channel_id
            and rhnChannelNewestPackage.name_id = rhnServerPackage.name_id
            and rhnServerChannel.server_id = rhnServerPackage.server_id
            and rhnChannelNewestPackage.evr_id = rhnPackageEVR.id
            and rhnServerPackage.package_arch_id = rhnPackageUpgradeArchCompat.package_arch_id
            and rhnPackageUpgradeArchCompat.pack )
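
For reference, the check itself is roughly just a look at pg_stat_activity
(on 8.4 the columns are still procpid and current_query):

  SELECT procpid, now() - query_start AS runtime, current_query
    FROM pg_stat_activity
   WHERE current_query <> '<IDLE>'
   ORDER BY runtime DESC;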

Does PostgreSQL use pgsql_tmp space when a query cannot be completed within
working memory (work_mem)?
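
If it does, I assume log_temp_files (available in 8.4) would confirm it;
setting something like this in postgresql.conf and doing a reload should log
the path and size of every temporary file above the threshold:

  log_temp_files = 10240        # in kB; 0 logs every temp file, -1 disables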

Thanks again,
Jon Miller

Paul Robert Marino <prmarino1 at gmail.com> writes:

> Try increasing taskomatic's working memory. I had a similar issue recently
> with taskomatic where my repo syncs were using more RAM than allowed. The
> default is 1 GB; I increased it to 2 GB and my problem went away.
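> If I remember right, that's the taskomatic.java.maxmemory setting (value in
> MB) in /etc/rhn/rhn.conf, e.g.
>
>   taskomatic.java.maxmemory=2048
>
> followed by a restart of taskomatic (service taskomatic restart).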
>
> -- Sent from my HP Pre3
>
> ---------------------------------------------------------------------------
> On May 2, 2014 19:01, Jon Miller <jonebird at gmail.com> wrote: 
>
> My current plan is to stop Spacewalk this evening, dump the DB, recreate the
> DB and load my data back. Why? I was troubleshooting another issue related to
> clients that use a Spacewalk proxy. While I was strace'ing Tomcat, I found
> that forked processes were segfault'ing and the thing they did just before
> was reading from a PostgreSQL socket. My evening procedure is going off the
> theory that my DB is somehow corrupted.
>
> Meanwhile, here are some specs on our Spacewalk server: 
> - VMware instance running RHEL 6.4 
> + 8GB of memory, 2 vCPUs 
> - PostgreSQL version 8.4.13-1.el6_3.x86_64
> - Spacewalk version 2.0.1-1 (spacewalk-postgresql) 
>
> Thanks for the interest, 
> Jon Miller 
>
> Paul Robert Marino <prmarino1 at gmail.com> writes: 
>
>> Increase your working memory. Also increase your maintenance working
>> memory. That will help, but we need to know more details before we can help:
>> - The exact specs of the box.
>> - The PostgreSQL configuration.
>> - The exact version of PostgreSQL you are running; there are huge
>>   differences between 8 and 9.
>> - What OS and version you are running on. For example, I run PostgreSQL 9.x
>>   on RHEL 6, which causes huge problems unless you change the default Java
>>   driver, which was a manual operation at the time I did it. This can also
>>   be a problem for external PostgreSQL databases.
>> 
>> -- Sent from my HP Pre3 
>> 
>> -
> -------------------------------------------------------------------------- 
>> On May 1, 2014 21:52, Jon Miller <jonebird at gmail.com> wrote: 
>> 
>> Chris <dmagick at gmail.com> writes: 
>> 
>>>> Thanks for the reply. I wondered if I could do that, but it would have
>>>> been a while before I would have come up with that query. Perhaps the
>>>> upstream folks would like to test that version for Task_queries.xml
>>>> within the git repo.
>>>>
>>>> Like before, I like to turn it into a "select count(*)" version first as
>>>> a test. I ran it and forgot about it until I realized it was still
>>>> running, and killed it after ~3.5 hours.
>>>> 
>>>> spaceschema=# select count(*) from rhnPackageChangeLogData where id in
>>>> spaceschema-# (SELECT d.id from rhnPackageChangeLogData d
>>>> spaceschema(# left join rhnPackageChangeLogRec l on (d.id=l.changelog_data_id)
>>>> spaceschema(# where l.changelog_data_id is null);
>>>> ^C
>>>> Session terminated, killing shell... ...killed.
>>> 
>>> Yikes. 
>>> 
>>> Time to check indexes? Check there is one on 
>>> rhnpackagechangelogrec(changelog_data_id): 
>>> 
>>> spacewalk=# \d rhnpackagechangelogrec
>>>            Table "public.rhnpackagechangelogrec"
>>>       Column       |           Type           |        Modifiers
>>> -------------------+--------------------------+------------------------
>>>  id                | numeric                  | not null
>>>  package_id        | numeric                  | not null
>>>  changelog_data_id | numeric                  | not null
>>>  created           | timestamp with time zone | not null default now()
>>>  modified          | timestamp with time zone | not null default now()
>>> Indexes:
>>>     "rhn_pkg_clr_id_pk" PRIMARY KEY, btree (id)
>>>     "rhn_pkg_clr_pid_cld_uq" UNIQUE, btree (package_id, changelog_data_id)
>>>     "rhn_pkg_clr_cld_uq" btree (changelog_data_id)
>>> 
>>> <snip> 
>> 
>> Yes, I have the same indexes: 
>> spaceschema=# \d rhnpackagechangelogrec
>>            Table "public.rhnpackagechangelogrec"
>>       Column       |           Type           |        Modifiers
>> -------------------+--------------------------+------------------------
>>  id                | numeric                  | not null
>>  package_id        | numeric                  | not null
>>  changelog_data_id | numeric                  | not null
>>  created           | timestamp with time zone | not null default now()
>>  modified          | timestamp with time zone | not null default now()
>> Indexes:
>>     "rhn_pkg_clr_id_pk" PRIMARY KEY, btree (id)
>>     "rhn_pkg_clr_pid_cld_uq" UNIQUE, btree (package_id, changelog_data_id)
>>     "rhn_pkg_clr_cld_uq" btree (changelog_data_id)
>> 
>>> and the one on 'id' in changelogData should be there because it's a 
>>> primary key: 
>>> 
>>> spacewalk=# \d rhnpackagechangelogdata
>>>        Table "public.rhnpackagechangelogdata"
>>>  Column  |           Type           |        Modifiers
>>> ---------+--------------------------+------------------------
>>>  id      | numeric                  | not null
>>>  name    | character varying(128)   | not null
>>>  text    | character varying(3000)  | not null
>>>  time    | timestamp with time zone | not null
>>>  created | timestamp with time zone | not null default now()
>>> Indexes:
>>>     "rhn_pkg_cld_id_pk" PRIMARY KEY, btree (id)
>>> 
>>> <snip> 
>> 
>> Same here as well or perhaps an extra one? (maybe you snipped too much?) 
>> spaceschema=# \d rhnpackagechangelogdata
>>        Table "public.rhnpackagechangelogdata"
>>  Column  |           Type           |        Modifiers
>> ---------+--------------------------+------------------------
>>  id      | numeric                  | not null
>>  name    | character varying(128)   | not null
>>  text    | character varying(3000)  | not null
>>  time    | timestamp with time zone | not null
>>  created | timestamp with time zone | not null default now()
>> Indexes:
>>     "rhn_pkg_cld_id_pk" PRIMARY KEY, btree (id)
>>     "rhn_pkg_cld_nt_idx" btree (name, "time")
>> 
>> Two days ago I actually shut down Spacewalk so I could do a "VACUUM FULL
>> VERBOSE ANALYZE;" and then rebuilt every index on the DB before bringing
>> Spacewalk back up again.
>> 
>> While the query was running yesterday, I was watching a "vmstat" output, and
>> while the system was busy, there didn't seem to be any blatant resource
>> constraint on the system as a whole: a lot of block I/O and the occasional
>> bit of swap activity, but definitely not thrashing.
>> 
>> Have you made any tweaks to your postgresql.conf that you can share? That is
>> where my head is now, but I've got more reading / learning to do before I
>> start introducing tuning changes.
>> 
>> Thanks, 
>
> _______________________________________________ 
> Spacewalk-list mailing list 
> Spacewalk-list at redhat.com 
> https://www.redhat.com/mailman/listinfo/spacewalk-list 



