[Spacewalk-list] Spacewalk-list Digest, Vol 42, Issue 31

Parsons, Aron parsonsa at bit-sys.com
Tue Nov 22 15:49:53 UTC 2011


You are right and I realized that this morning when thinking about directories and symlinks.

Your suggestion to do a nested query does make a difference and it looks like it gives the results we're looking for.

--- slow.sql    2011-11-22 10:04:25.951471422 -0500
+++ query.sql   2011-11-22 10:47:21.481745356 -0500
@@ -22,8 +22,12 @@
 rhnConfigRevision cr
 left join rhnConfigContent ccont
 on cr.config_content_id = ccont.id
-left join rhnChecksumView c
-on ccont.checksum_id = c.id,
+left join
+  (select cs.id, cs.checksum_type, cs.checksum
+   from rhnChecksumView cs
+   inner join rhnConfigContent ccont
+   on ccont.checksum_id = cs.id) c
+on c.id = ccont.checksum_id,
 rhnServerConfigChannel scc,
 rhnConfigFile cf,
 rhnConfigFileType cft,

explain analyze select E'/tmp/symlink' path,
cc.label config_channel,
ccont.contents file_contents,
ccont.is_binary is_binary,
c.checksum_type,
c.checksum,
ccont.delim_start, ccont.delim_end,
cr.revision,
cf.modified,
ci.username,
ci.groupname,
ci.filemode,
cft.label,
cct.priority,
ci.selinux_ctx,
case
when cft.label='symlink' then (select path from rhnConfigFileName where id = ci.SYMLINK_TARGET_FILENAME_ID)
else ''
end as symlink
from rhnConfigChannel cc,
rhnConfigInfo ci,
rhnConfigRevision cr
left join rhnConfigContent ccont
on cr.config_content_id = ccont.id
left join
  (select cs.id, cs.checksum_type, cs.checksum
   from rhnChecksumView cs
   inner join rhnConfigContent ccont
   on ccont.checksum_id = cs.id) c
on c.id = ccont.checksum_id,
rhnServerConfigChannel scc,
rhnConfigFile cf,
rhnConfigFileType cft,
rhnConfigChannelType cct
where scc.server_id = 1000010000
and scc.config_channel_id = cc.id
and cf.config_channel_id = cc.id
and cf.config_file_name_id = lookup_config_filename(E'/tmp/symlink')
and cr.config_file_id = cf.id
and cr.config_info_id = ci.id
and cf.latest_config_revision_id = cr.id
and cr.config_file_type_id = cft.id
and cct.id = cc.confchan_type_id
order by cct.priority, scc.position;

                                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------
 Sort  (cost=335.69..335.70 rows=1 width=886) (actual time=1.837..1.837 rows=1 loops=1)
   Sort Key: cct.priority, scc."position"
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=321.05..335.68 rows=1 width=886) (actual time=1.457..1.786 rows=1 loops=1)
         ->  Nested Loop  (cost=321.05..333.82 rows=1 width=812) (actual time=1.427..1.756 rows=1 loops=1)
               ->  Nested Loop  (cost=321.05..333.20 rows=1 width=812) (actual time=1.415..1.743 rows=1 loops=1)
                     Join Filter: (cc.id = scc.config_channel_id)
                     ->  Nested Loop  (cost=321.05..332.11 rows=1 width=819) (actual time=1.402..1.729 rows=1 loops=1)
                           Join Filter: ((cf.id = cr.config_file_id) AND (cf.latest_config_revision_id = cr.id))
                           ->  Nested Loop  (cost=0.00..8.67 rows=1 width=54) (actual time=0.955..1.227 rows=1 loops=1)
                                 Join Filter: (cc.confchan_type_id = cct.id)
                                 ->  Nested Loop  (cost=0.00..7.60 rows=1 width=55) (actual time=0.949..1.219 rows=1 loops=1)
                                       Join Filter: (cc.id = cf.config_channel_id)
                                       ->  Seq Scan on rhnconfigfile cf  (cost=0.00..6.51 rows=1 width=29) (actual time=0.941..1.209 rows=1 loops=1)
                                             Filter: (config_file_name_id = lookup_config_filename('/tmp/symlink'::character varying))
                                       ->  Seq Scan on rhnconfigchannel cc  (cost=0.00..1.04 rows=4 width=26) (actual time=0.003..0.004 rows=5 loops=1)
                                 ->  Seq Scan on rhnconfigchanneltype cct  (cost=0.00..1.03 rows=3 width=13) (actual time=0.002..0.002 rows=3 loops=1)
                           ->  Hash Left Join  (cost=321.05..322.90 rows=36 width=793) (actual time=0.442..0.485 rows=57 loops=1)
                                 Hash Cond: (cr.config_content_id = ccont.id)
                                 ->  Seq Scan on rhnconfigrevision cr  (cost=0.00..1.36 rows=36 width=42) (actual time=0.006..0.010 rows=39 loops=1)
                                 ->  Hash  (cost=320.61..320.61 rows=35 width=765) (actual time=0.400..0.400 rows=54 loops=1)
                                       ->  Hash Left Join  (cost=312.12..320.61 rows=35 width=765) (actual time=0.312..0.349 rows=54 loops=1)
                                             Hash Cond: (ccont.checksum_id = c.id)
                                             ->  Seq Scan on rhnconfigcontent ccont  (cost=0.00..8.35 rows=35 width=723) (actual time=0.004..0.011 rows=36 loops=1)
                                             ->  Hash  (cost=311.68..311.68 rows=35 width=59) (actual time=0.293..0.293 rows=36 loops=1)
                                                   ->  Hash Join  (cost=1.11..311.68 rows=35 width=59) (actual time=0.045..0.260 rows=36 loops=1)
                                                         Hash Cond: (c.checksum_type_id = ct.id)
                                                         ->  Nested Loop  (cost=0.00..310.09 rows=35 width=60) (actual time=0.021..0.212 rows=36 loops=1)
                                                               ->  Seq Scan on rhnconfigcontent ccont  (cost=0.00..8.35 rows=35 width=9) (actual time=0.001..0.016 rows=36 loops=1)
                                                               ->  Index Scan using rhnchecksum_pk on rhnchecksum c  (cost=0.00..8.61 rows=1 width=60) (actual time=0.004..0.004 row
s=1 loops=36)
                                                                     Index Cond: (c.id = ccont.checksum_id)
                                                         ->  Hash  (cost=1.05..1.05 rows=5 width=13) (actual time=0.013..0.013 rows=5 loops=1)
                                                               ->  Seq Scan on rhnchecksumtype ct  (cost=0.00..1.05 rows=5 width=13) (actual time=0.004..0.004 rows=5 loops=1)
                     ->  Seq Scan on rhnserverconfigchannel scc  (cost=0.00..1.06 rows=2 width=14) (actual time=0.007..0.008 rows=1 loops=1)
                           Filter: (scc.server_id = 1000010000::numeric)
               ->  Index Scan using rhn_conffiletype_id_pk on rhnconfigfiletype cft  (cost=0.00..0.60 rows=1 width=14) (actual time=0.009..0.010 rows=1 loops=1)
                     Index Cond: (cft.id = cr.config_file_type_id)
         ->  Index Scan using rhn_confinfo_id_pk on rhnconfiginfo ci  (cost=0.00..0.60 rows=1 width=88) (actual time=0.009..0.009 rows=1 loops=1)
               Index Cond: (ci.id = cr.config_info_id)
         SubPlan 1
           ->  Seq Scan on rhnconfigfilename  (cost=0.00..1.25 rows=1 width=20) (actual time=0.004..0.008 rows=1 loops=1)
                 Filter: (id = $0)
 Total runtime: 2.135 ms
(43 rows)


And if we pull out the WHERE clause, we can see that the correct results are returned for a file, symlink and directory.

     path     | config_channel | file_contents | is_binary | checksum_type |             checksum             | delim_start | delim_end | revision |           modified            |
 username | groupname | filemode |   label   | priority | selinux_ctx |  symlink
--------------+----------------+---------------+-----------+---------------+----------------------------------+-------------+-----------+----------+-------------------------------+
----------+-----------+----------+-----------+----------+-------------+-----------
 /tmp/symlink | test           |               |           |               |                                  |             |           |        1 | 2011-11-22 09:41:14.189421-05 |
          |           |          | symlink   |        1 |             | /dev/null
 /tmp/symlink | test           | test          | N         | md5           | 098f6bcd4621d373cade4e832627b4f6 | {|          | |}        |        1 | 2011-11-22 09:41:37.216542-05 |
 root     | root      |      644 | file      |        1 |             |
 /tmp/symlink | test           |               |           |               |                                  |             |           |        1 | 2011-11-22 09:41:48.279517-05 |
 root     | root      |      644 | directory |        1 |             |
(3 rows)

/aron

------------------------

Date: Tue, 22 Nov 2011 16:19:04 +0100
From: "'jpazdziora at redhat.com'" <jpazdziora at redhat.com>
To: "Parsons, Aron" <parsonsa at bit-sys.com>
Cc: "'spacewalk-list at redhat.com'" <spacewalk-list at redhat.com>
Subject: Re: [Spacewalk-list] Postgres spacewalk
Message-ID: <20111122151903.GA10682 at redhat.com>
Content-Type: text/plain; charset=us-ascii

On Mon, Nov 21, 2011 at 10:27:57PM +0000, Parsons, Aron wrote:
> I did some digging into this query because it is one that is very obviously slow when working with real systems.
> 
> It seems that the left join to rhnchecksum is the culprit of this slow query.  Here's the effect on execution time by changing it to an inner join:
> LEFT JOIN: 2973.886 ms
> INNER JOIN: 1.444 ms
> 
> Based on what this query is used for and the columns we're selecting, this seems safe to me; the inner join is just going to exclude rows that don't exist in both, but that should never happen under normal operation.  I have not tested this on Oracle, but have verified that client operations are still operating normally.  The real world effect is that this takes a ~50s 'rhncfg-client diff' on the client down to ~3s.  I tested by updating the query in /usr/share/rhn/server/handlers/config/rhn_config_management.py.
> 
> Jan,
> What's your take on this?

If you have a query

	select from table1
		outer join table 2
		outer join table 3

you cannot replace it with

	select from table1
		outer join table 2
		inner join table 3

-- that is not semantically equivalent.

While your are right that rhnConfigContent.checksum_id is defined as
not null, the same is not true for rhnConfigRevision.config_content_id
-- that value may be null. If that one is null,
rhnConfigContent.checksum_id will be null as well (in the select,
thanks to the outer join), and the select will return nulls from
rhnChecksum as well.

What you could do would be

	select from table1
		outer join (
		select table 2
			inner join table 3
		) as subselect

-- you'd need to check thou if it made some difference in the
execution plan.

-- 
Jan Pazdziora
Principal Software Engineer, Satellite Engineering, Red Hat



------------------------------

_______________________________________________
Spacewalk-list mailing list
Spacewalk-list at redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-list

End of Spacewalk-list Digest, Vol 42, Issue 31
**********************************************




More information about the Spacewalk-list mailing list