[Spacewalk-list] Run away DELETE query eating my disk space
Jon Miller
jonebird at gmail.com
Fri May 2 22:54:35 UTC 2014
My current plan is to stop Spacewalk this evening, dump the DB, recreate the
DB, and load my data back. Why? I was troubleshooting another issue related to
clients that use a Spacewalk proxy. While strace'ing Tomcat, I found that
forked processes were segfaulting, and the last thing they did beforehand was
read from a PostgreSQL socket. My evening procedure is based on the theory
that my DB is somehow corrupted.
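For anyone following along, the dump-and-reload procedure might look roughly
like the sketch below. This is an illustrative outline only: the database and
role names ("spaceschema", "spaceuser") and the dump path are assumptions
based on common Spacewalk defaults, so adjust them to match your installation
before running anything.

```shell
# Hypothetical dump/recreate/reload pass -- names and paths are assumptions.
spacewalk-service stop                                  # quiesce Spacewalk
pg_dump -U spaceuser -Fc spaceschema > /var/tmp/spaceschema.dump
dropdb  -U postgres spaceschema                         # destructive!
createdb -U postgres -O spaceuser spaceschema
pg_restore -U spaceuser -d spaceschema /var/tmp/spaceschema.dump
spacewalk-service start
```

Using the custom format (`-Fc`) lets pg_restore rebuild indexes as part of the
load, which doubles as a test of the "corrupted DB" theory.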
Meanwhile, here are some specs on our Spacewalk server:
- VMware instance running RHEL 6.4
+ 8GB of memory, 2 vCPUs
- PostgreSQL version 8.4.13-1.el6_3.x86_64
- Spacewalk version 2.0.1-1 (spacewalk-postgresql)
Thanks for the interest,
Jon Miller
Paul Robert Marino <prmarino1 at gmail.com> writes:
> Increase your working memory. Also increase your maintenance working
> memory.
> That will help, but we need to know more details before we can help:
> - The exact specs of the box.
> - The PostgreSQL configuration.
> - The exact version of PostgreSQL you are running; there are huge
>   differences between 8 and 9.
> - What OS and version you are running on. For example, I run PostgreSQL
>   9.x on RHEL 6, which causes huge problems unless you change the default
>   Java driver (at the time I did it, a manual operation). This can also be
>   a problem for external PostgreSQL databases.
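The "working memory" settings Paul mentions are the work_mem and
maintenance_work_mem parameters in postgresql.conf. The values below are
illustrative starting points for an 8 GB host like the one described above,
not Spacewalk-blessed defaults; every value here is an assumption to be tuned
against your own workload.

```
# Hypothetical postgresql.conf fragment for an 8 GB RAM server -- assumptions.
shared_buffers = 1GB            # ~1/4 of RAM is a common rule of thumb
work_mem = 32MB                 # per-sort/per-hash memory; "working memory"
maintenance_work_mem = 256MB    # used by VACUUM, CREATE INDEX, REINDEX
effective_cache_size = 4GB      # planner hint: OS cache + shared_buffers
```

Note that work_mem is allocated per sort or hash operation, so a query with
several sorts on a busy server can use a multiple of this value.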
>
> -- Sent from my HP Pre3
>
> ---------------------------------------------------------------------------
> On May 1, 2014 21:52, Jon Miller <jonebird at gmail.com> wrote:
>
> Chris <dmagick at gmail.com> writes:
>
>>> Thanks for the reply. I wondered if I could do that, but it would have
>>> been a while before I would have come up with that query. Perhaps the
>>> upstream folks would like to test that version for Task_queries.xml
>>> within the git repo.
>>>
>>> As before, I turned it into a "select count(*)" version first as a
>>> test. I ran it and forgot about it until I realized it was still
>>> running, and killed it after ~3.5 hours.
>>>
>>> spaceschema=# select count(*) from rhnPackageChangeLogData where id in
>>> spaceschema-# (SELECT d.id from rhnPackageChangeLogData d
>>> spaceschema(# left join rhnPackageChangeLogRec l on
> (d.id=l.changelog_data_id)
>>> spaceschema(# where l.changelog_data_id is null);
>>> ^C
>>> Session terminated, killing shell... ...killed.
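As an aside, the `IN (...)` wrapper in that query is redundant: the inner
anti-join already selects the orphaned rows directly. A `NOT EXISTS`
formulation says the same thing and often plans better on PostgreSQL 8.4 than
a large `IN` subquery, and running `EXPLAIN` first shows whether the index on
changelog_data_id is actually used. A sketch, untested against a live
Spacewalk schema:

```sql
-- Check the plan before running anything expensive:
EXPLAIN
SELECT count(*)
FROM rhnPackageChangeLogData d
WHERE NOT EXISTS (
    SELECT 1
    FROM rhnPackageChangeLogRec r
    WHERE r.changelog_data_id = d.id   -- orphan check via the indexed column
);
```

If the plan still shows a sequential scan over both tables with no index use,
that points back at configuration (work_mem, effective_cache_size) rather
than missing indexes.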
>>
>> Yikes.
>>
>> Time to check indexes? Check there is one on
>> rhnpackagechangelogrec(changelog_data_id):
>>
>> spacewalk=# \d rhnpackagechangelogrec
>> Table "public.rhnpackagechangelogrec"
>> Column | Type | Modifiers
>> -------------------+--------------------------+------------------------
>> id | numeric | not null
>> package_id | numeric | not null
>> changelog_data_id | numeric | not null
>> created | timestamp with time zone | not null default now()
>> modified | timestamp with time zone | not null default now()
>> Indexes:
>> "rhn_pkg_clr_id_pk" PRIMARY KEY, btree (id)
>> "rhn_pkg_clr_pid_cld_uq" UNIQUE, btree (package_id, changelog_data_id)
>> "rhn_pkg_clr_cld_uq" btree (changelog_data_id)
>>
>> <snip>
>
> Yes, I have the same indexes:
> spaceschema=# \d rhnpackagechangelogrec
> Table "public.rhnpackagechangelogrec"
> Column | Type | Modifiers
> -------------------+--------------------------+------------------------
> id | numeric | not null
> package_id | numeric | not null
> changelog_data_id | numeric | not null
> created | timestamp with time zone | not null default now()
> modified | timestamp with time zone | not null default now()
> Indexes:
> "rhn_pkg_clr_id_pk" PRIMARY KEY, btree (id)
> "rhn_pkg_clr_pid_cld_uq" UNIQUE, btree (package_id, changelog_data_id)
> "rhn_pkg_clr_cld_uq" btree (changelog_data_id)
>
>> and the one on 'id' in changelogData should be there because it's a
>> primary key:
>>
>> spacewalk=# \d rhnpackagechangelogdata
>> Table "public.rhnpackagechangelogdata"
>> Column | Type | Modifiers
>> ---------+--------------------------+------------------------
>> id | numeric | not null
>> name | character varying(128) | not null
>> text | character varying(3000) | not null
>> time | timestamp with time zone | not null
>> created | timestamp with time zone | not null default now()
>> Indexes:
>> "rhn_pkg_cld_id_pk" PRIMARY KEY, btree (id)
>>
>> <snip>
>
> Same here, plus perhaps an extra one? (Maybe you snipped too much?)
> spaceschema=# \d rhnpackagechangelogdata
> Table "public.rhnpackagechangelogdata"
> Column | Type | Modifiers
> ---------+--------------------------+------------------------
> id | numeric | not null
> name | character varying(128) | not null
> text | character varying(3000) | not null
> time | timestamp with time zone | not null
> created | timestamp with time zone | not null default now()
> Indexes:
> "rhn_pkg_cld_id_pk" PRIMARY KEY, btree (id)
> "rhn_pkg_cld_nt_idx" btree (name, "time")
>
> Two days ago I actually shut down Spacewalk so I could do a "VACUUM FULL
> VERBOSE ANALYZE;" and then I rebuilt every index on the DB before bringing
> Spacewalk back up again.
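That maintenance pass can be done from a single psql session while Spacewalk
is stopped; a minimal sketch, assuming the database is named "spaceschema"
(adjust to your installation):

```sql
-- Run while connected to the Spacewalk database, with Spacewalk stopped.
VACUUM FULL VERBOSE ANALYZE;    -- rewrites tables, reclaims dead space
REINDEX DATABASE spaceschema;   -- rebuilds every index in the database
```

On 8.4, `VACUUM FULL` can bloat indexes as it compacts tables, which is why
the REINDEX afterwards matters; note both take exclusive locks, so this only
works with the application down.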
>
> While the query was running yesterday, I was watching "vmstat" output, and
> while it was busy there didn't seem to be any blatant resource constraint
> on the system as a whole: a lot of block I/O and the occasional swap
> activity, but definitely not thrashing.
>
> Have you made any tweaks to your postgresql.conf that you can share? That
> is where my head is now, but I've got more reading / learning to do before
> I start introducing tuning changes.
>
> Thanks,