Home
The Toolkit for Online Communities
15887 Community Members, 0 members online, 1960 visitors today
Log In Register

Forum OpenACS Q&A: Problem with acs-subsite query "permissions" (need help)

OpenACS Home : Forums : OpenACS Q&A : Problem with acs-subsite query "permissions" (need help)

Icon of Envelope Request notifications

Hi,

I have a problem with the query "permissions" in
acs-subsite/www/permissions/perm-include-postgresql.xql

This is called for instance under forums-->administration-->permissions of a forum

The same query worked under dotLRN 2.0.3
Now after upgrade to 2.1 it is never returning

After analyzing the query I found out, that removing the first sub query the query at least returns after 70 sec.

Full Query

select ptab.grantee_id,
acs_object__name(ptab.grantee_id) as grantee_name,
o.object_type,
sum(ptab.read_p) as read_p, (case when sum(ptab.read_p) > 0 then 'checked' else '' end) as read_checked, sum(ptab.create_p) as create_p, (case when sum(ptab.create_p) > 0 then 'checked' else '' end) as create_checked, sum(ptab.write_p) as write_p, (case when sum(ptab.write_p) > 0 then 'checked' else '' end) as write_checked, sum(ptab.admin_p) as admin_p, (case when sum(ptab.admin_p) > 0 then 'checked' else '' end) as admin_checked,
sum(read_p + create_p + write_p + admin_p) as any_perm_p_
from (select grantee_id,
(case when privilege='read' then 2 else 0 end) as read_p, (case when privilege='create' then 2 else 0 end) as create_p, (case when privilege='write' then 2 else 0 end) as write_p, (case when privilege='admin' then 2 else 0 end) as admin_p
from acs_permissions_all
where object_id = '367496'
union all
select grantee_id,
(case when privilege='read' then -1 else 0 end) as read_p, (case when privilege='create' then -1 else 0 end) as create_p, (case when privilege='write' then -1 else 0 end) as write_p, (case when privilege='admin' then -1 else 0 end) as admin_p
from acs_permissions
where object_id = '367496'
union all
select -1 as grantee_id,
0 as read_p, 0 as create_p, 0 as write_p, 0 as admin_p
union all
select -2 as grantee_id,
0 as read_p, 0 as create_p, 0 as write_p, 0 as admin_p
union all
select component_id as grantee_id,
0 as read_p, 0 as create_p, 0 as write_p, 0 as admin_p
from group_component_map
where group_id = '-2'
union all
select segment_id as grantee_id,
0 as read_p, 0 as create_p, 0 as write_p, 0 as admin_p
from rel_segments rel_seg
where rel_seg.group_id = '-2'
union all
select segment_id as grantee_id,
0 as read_p, 0 as create_p, 0 as write_p, 0 as admin_p
from rel_segments rel_seg,
group_component_map gcm
where gcm.group_id = '-2'
and rel_seg.group_id = gcm.group_id
) ptab,
acs_objects o
where o.object_id = ptab.grantee_id
and acs_permission__permission_p(0, ptab.grantee_id, 'admin') = 'f'
group by ptab.grantee_id, grantee_name, object_type
order by object_type desc, grantee_name

Deleted sub query

select grantee_id,
(case when privilege='read' then 2 else 0 end) as read_p, (case when privilege='create' then 2 else 0 end) as create_p, (case when privilege='write' then 2 else 0 end) as write_p, (case when privilege='admin' then 2 else 0 end) as admin_p
from acs_permissions_all
where object_id = '367496'
union all

Results of "explain analyze"

  1. Before Upgrade

    QUERY PLAN
    Sort (cost=4130.43..4130.44 rows=1 width=41) (actual time=703.53..703.54 rows=7 loops=1)
    Sort Key: o.object_type, acs_object__name(ptab.grantee_id)
    -> Aggregate (cost=4130.23..4130.42 rows=1 width=41) (actual time=703.38..703.46 rows=7 loops=1)
    -> Group (cost=4130.23..4130.29 rows=6 width=41) (actual time=703.34..703.38 rows=10 loops=1)
    -> Sort (cost=4130.23..4130.24 rows=6 width=41) (actual time=703.34..703.34 rows=10 loops=1)
    Sort Key: ptab.grantee_id, acs_object__name(ptab.grantee_id), o.object_type
    -> Nested Loop (cost=0.00..4130.15 rows=6 width=41) (actual time=3.49..703.25 rows=10 loops=1)
    -> Subquery Scan ptab (cost=0.00..4110.07 rows=6 width=15) (actual time=0.98..680.43 rows=10 loops=1)
    -> Append (cost=0.00..4110.07 rows=6 width=15) (actual time=0.97..680.36 rows=10 loops=1)
    -> Subquery Scan "*SELECT* 1" (cost=0.00..30.91 rows=1 width=15) (actual time=0.97..3.03 rows=4 loops=1)
    -> Index Scan using acs_permissions_pk on acs_permissions (cost=0.00..30.91 rows=1 width=15) (actual time=0.96..3.00 rows=4 loops=1)
    Index Cond: (object_id = 367496)
    Filter: (acs_permission__permission_p(0, grantee_id, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 2" (cost=0.00..0.01 rows=1 width=0) (actual time=0.67..0.68 rows=1 loops=1)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.67..0.67 rows=1 loops=1)
    One-Time Filter: (acs_permission__permission_p(0, -1, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 3" (cost=0.00..0.01 rows=1 width=0) (actual time=0.67..0.68 rows=1 loops=1)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.67..0.67 rows=1 loops=1)
    One-Time Filter: (acs_permission__permission_p(0, -2, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 4" (cost=0.00..2863.14 rows=1 width=4) (actual time=72.34..72.34 rows=0 loops=1)
    -> Seq Scan on group_element_index (cost=0.00..2863.14 rows=1 width=4) (actual time=72.34..72.34 rows=0 loops=1)
    Filter: ((ancestor_rel_type = 'composition_rel'::character varying) AND (group_id = -2) AND (acs_permission__permission_p(0, element_id, 'admin'::character varying) = false))
    -> Subquery Scan "*SELECT* 5" (cost=0.00..11.67 rows=1 width=4) (actual time=0.98..2.40 rows=4 loops=1)
    -> Index Scan using rel_segments_grp_rel_type_uq on rel_segments rel_seg (cost=0.00..11.67 rows=1 width=4) (actual time=0.97..2.37 rows=4 loops=1)
    Index Cond: (group_id = -2)
    Filter: (acs_permission__permission_p(0, segment_id, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 6" (cost=0.00..1204.32 rows=1 width=12) (actual time=601.19..601.19 rows=0 loops=1)
    -> Nested Loop (cost=0.00..1204.32 rows=1 width=12) (actual time=601.19..601.19 rows=0 loops=1)
    -> Seq Scan on rel_segments rel_seg (cost=0.00..13.75 rows=2 width=8) (actual time=0.47..176.74 rows=450 loops=1)
    Filter: (acs_permission__permission_p(0, segment_id, 'admin'::character varying) = false)
    -> Index Scan using group_elem_idx_group_idx on group_element_index (cost=0.00..529.13 rows=1 width=4) (actual time=0.94..0.94 rows=0 loops=450)
    Index Cond: (("outer".group_id = group_element_index.group_id) AND (group_element_index.group_id = -2))
    Filter: (ancestor_rel_type = 'composition_rel'::character varying)
    -> Index Scan using acs_objects_pk on acs_objects o (cost=0.00..3.33 rows=1 width=21) (actual time=0.05..0.06 rows=1 loops=10)
    Index Cond: (o.object_id = "outer".grantee_id)
    Total runtime: 703.96 msec
  2. After Upgrade

    QUERY PLAN
    Sort (cost=3292.78..3292.78 rows=1 width=192) (actual time=66187.68..66187.69 rows=7 loops=1)
    Sort Key: o.object_type, acs_object__name(ptab.grantee_id)
    -> Aggregate (cost=3292.51..3292.77 rows=1 width=192) (actual time=66187.52..66187.60 rows=7 loops=1)
    -> Group (cost=3292.51..3292.59 rows=8 width=192) (actual time=66187.47..66187.51 rows=10 loops=1)
    -> Sort (cost=3292.51..3292.53 rows=8 width=192) (actual time=66187.46..66187.47 rows=10 loops=1)
    Sort Key: ptab.grantee_id, acs_object__name(ptab.grantee_id), o.object_type
    -> Nested Loop (cost=0.00..3292.39 rows=8 width=192) (actual time=218.35..66187.41 rows=10 loops=1)
    -> Subquery Scan ptab (cost=0.00..3244.88 rows=8 width=172) (actual time=216.04..66074.82 rows=10 loops=1)
    -> Append (cost=0.00..3244.88 rows=8 width=172) (actual time=216.03..66074.76 rows=10 loops=1)
    -> Subquery Scan "*SELECT* 1" (cost=0.00..1684.72 rows=3 width=172) (actual time=216.03..760.72 rows=4 loops=1)
    -> Index Scan using acs_permissions_pk on acs_permissions (cost=0.00..1684.72 rows=3 width=172) (actual time=216.02..760.69 rows=4 loops=1)
    Index Cond: (object_id = 367496)
    Filter: (acs_permission__permission_p(0, grantee_id, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 2" (cost=0.00..0.01 rows=1 width=0) (actual time=132.87..132.87 rows=1 loops=1)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=132.86..132.86 rows=1 loops=1)
    One-Time Filter: (acs_permission__permission_p(0, -1, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 3" (cost=0.00..0.01 rows=1 width=0) (actual time=131.24..131.24 rows=1 loops=1)
    -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=131.24..131.24 rows=1 loops=1)
    One-Time Filter: (acs_permission__permission_p(0, -2, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 4" (cost=0.00..1511.35 rows=1 width=4) (actual time=56.58..56.58 rows=0 loops=1)
    -> Index Scan using group_elem_idx_group_idx on group_element_index (cost=0.00..1511.35 rows=1 width=4) (actual time=56.58..56.58 rows=0 loops=1)
    Index Cond: (group_id = -2)
    Filter: ((ancestor_rel_type = 'composition_rel'::character varying) AND (acs_permission__permission_p(0, element_id, 'admin'::character varying) = false))
    -> Subquery Scan "*SELECT* 5" (cost=0.00..9.80 rows=1 width=4) (actual time=138.77..545.64 rows=4 loops=1)
    -> Index Scan using rel_segments_grp_rel_type_uq on rel_segments rel_seg (cost=0.00..9.80 rows=1 width=4) (actual time=138.76..545.62 rows=4 loops=1)
    Index Cond: (group_id = -2)
    Filter: (acs_permission__permission_p(0, segment_id, 'admin'::character varying) = false)
    -> Subquery Scan "*SELECT* 6" (cost=0.00..39.00 rows=1 width=12) (actual time=64447.67..64447.67 rows=0 loops=1)
    -> Nested Loop (cost=0.00..39.00 rows=1 width=12) (actual time=64447.66..64447.66 rows=0 loops=1)
    -> Seq Scan on rel_segments rel_seg (cost=0.00..13.75 rows=2 width=8) (actual time=132.99..64219.12 rows=458 loops=1)
    Filter: (acs_permission__permission_p(0, segment_id, 'admin'::character varying) = false)
    -> Index Scan using group_elem_idx_group_idx on group_element_index (cost=0.00..11.21 rows=1 width=4) (actual time=0.49..0.49 rows=0 loops=458)
    Index Cond: (("outer".group_id = group_element_index.group_id) AND (group_element_index.group_id = -2))
    Filter: (ancestor_rel_type = 'composition_rel'::character varying)
    -> Index Scan using acs_objects_pk on acs_objects o (cost=0.00..6.01 rows=1 width=172) (actual time=9.28..9.28 rows=1 loops=10)
    Index Cond: (o.object_id = "outer".grantee_id)
    Total runtime: 66187.96 msec

    Any idea why?

    Greetings,

    Nima

+
Posted by Don Baccus on
This could not have possibly worked well under .LRN 2.0 ... did forums perhaps do something else rather than call that page in 2.0?

Anyway, this query exhibits all the badness of the original permissions queries that I sped up back for 4.6. I didn't realize this query existed.

Note the UNION ALL statements. PG doesn't optimize UNION ALL. Also notice all the references to "-2" etc ... that's "The public" (or registered users?) ...

The denormalized party-privilege map I added in 4.6 can be used to speed this up. Take a look at the current permissions view "acs_object_party_privilege_map" and its use of "Party_approved_member_map" for a hint as to how to tackle this.

Also ... the call to "acs_permission__permission_p(0, ptab.grantee_id, 'admin') = 'f'" in the where clause is a BIG NO NO for both Oracle and PG.

They each should read something like:

"not in (select 1 from acs_object_party_privilege_map where object_id = 0 and grantee_id = ptab.grantee_id and privilege = 'admin')"

for performance.

However I think the use of object 0 is bogus in this case as it's now the "unregistered visitor" not the system context root. That "0" should be replaced with the current value for the context root (see acs_magic_objects)

+
Posted by Don Baccus on
This query hasn't changed in 2.1 vs. 2.0 ... jeff claims, in a commit, to have fixed the WHERE clause but I see no evidence in the code ... searching further ...
+
Posted by Don Baccus on
Hmmm...the updated plan is estimated to run FASTER than before, but the actual execution time is horrible.

Is this on the same machine, Nima? Both plans look good. I don't see why they would be different because the CVS LOG shows no changes to the query. Can you do a hand-inspection of the query on your 2.0.6 instance and your 2.1 instance to see if there's actually a difference?

My guess is that it's going to disk in the second case. Can you use TIME (put the query in a file and do a "time psql dbname -f the_file_with_the_query) to see how much CPU time vs. disk time is being used in this query?

My comments above still stand at least in regard to the acs_perm call in the WHERE clause ...

+
Posted by Don Baccus on
What query plan do you get for the deleted subquery?

We saw a case on openacs.org where PG 7.4 fell on its face on a query involving a view where in 7.3 it was very fast. Apparently 7.4 decided to materialize the view.

+
Posted by Nima Mazloumi on
The old forum calls exactly the same.

Both instances of openacs are on the same machine.

A diff on the file in 2.0.3 and 2.1 returned no differences.

The analysis for the deleted query returned this query plan for the after-upgrade installation:

QUERY PLAN
Merge Join (cost=6700.19..28463.87 rows=1331364 width=180) (actual time=0.27..239.48 rows=16 loops=1)
Merge Cond: ("outer".object_id = "inner".ancestor_id)
-> Index Scan using acs_permissions_pk on acs_permissions p (cost=0.00..4821.55 rows=115414 width=176) (actual time=0.11..173.07 rows=81449 loops=1)
-> Sort (cost=6700.19..6705.96 rows=2307 width=4) (actual time=0.13..0.15 rows=14 loops=1)
Sort Key: acs_object_context_index.ancestor_id
-> Index Scan using acs_object_context_index_pk on acs_object_context_index (cost=0.00..6571.32 rows=2307 width=4) (actual time=0.08..0.10 rows=5 loops=1)
Index Cond: (object_id = 367496)
Total runtime: 239.56 msec

The Time analysis for that query returned this:

 time psql unima0 -f test.sql
grantee_id | grantee_name | object_type | read_p | read_checked | create_p | create_checked | write_p | write_checked | admin_p | admin_checked | any_perm_p_
------------+----------------------------+-------------------+--------+--------------+----------+----------------+---------+---------------+---------+---------------+-------------
40727 | Stefan Wurdack | user | 0 | | 0 | | 0 | | -1 | | -1
435 | Main Site Administrators | rel_segment | 0 | | 0 | | 0 | | 0 | | 0
434 | Main Site Members | rel_segment | 0 | | 0 | | 0 | | 0 | | 0
52201 | Registered .LRN Guests | rel_segment | 0 | | 0 | | 0 | | 0 | | 0
52202 | Registered .LRN Non-Guests | rel_segment | 0 | | 0 | | 0 | | 0 | | 0
-1 | The Public | group | -1 | | 0 | | 0 | | 0 | | -1
-2 | Registered Users | application_group | 0 | | 0 | | 0 | | 0 | | 0
(7 rows)


real 1m11.734s
user 0m0.002s
sys 0m0.003s

+
Posted by Don Baccus on
What does EXPLAIN give you for the subquery in 2.0? This is quite the mystery since you've verified that the queries and db engine are exactly the same.

Have you run VACUUM FULL ANALYZE since updating? (I'm sure the answer's yes from the plans you've posted, I'm grasping at straws here).

How does one cause EXPLAIN to expand subquery plans rather than simply say "Subquery Scan (*SELECT* 5)"?

+
Posted by Don Baccus on
Also Nima ... is the 2.0 your production site? If so, is it a BUSY production site?

If so the entire discrepency might be due to cache contention, and the busy site will win that war over your inactive test site all the time. The notion is that your inactive site will be forced to go to disk to get its data nearly all the time because the busy site will be hogging both the DB shared buffer and the Linux disk block cache.

In fact I diagnosed a similar situation for furfly about six weeks ago ...

If this isn't the production site then there's no point in my continuing, so I'll stop for now until I read your response tomorrow AM. If it is, though, I can suggest an approach for making a diagnosis.

Don, the 2.0 is a very busy production site. 25.000 uploaded users accounts, 5.100 active users and between 30 - 100 concurrent users. The test server is on the same machine but on a different port. Both use the same aolserver and the same postgres.
+
Posted by Don Baccus on
Do you have a spare machine you can test on with the new version and a dump of your data? Cache contention seems very likely.

The way to confirm this on the production box would be to turn on statistics gathering in PG (edit the config file/SIGHUP the Postmaster) and to look at cache hit/miss ratios for the relevant tables in your test database.

But better would be to set up on a second machine and do your testing in isolation to remove all doubt. I highly recommend you do this before putting more energy scratching our heads over the query itself.

It may be a good idea to turn on stats on your production box along with testing your test instance on another box (if possible). You may be running on the hairy edge of overloading that server and stats may help determine whether or not more memory should be allocated to the shared buffer, etc.

I have a couple of canned queries available to help make sense of the statistics I can give you if you turn them on and are interested ...

+
Posted by Nima Mazloumi on
I have turned on the statistic collector. What should I do now? I have never worked with it before.
+
Posted by Don Baccus on
Here are queries I put into a file for furfly that helped a lot. The first selects (i.e. "select 'Best Hit Ratios'") are just there to give headers for the various outputs.

You can put them in a file then say psql dbname -f filename.sql

select 'Best hit ratios';
select relname,
   heap_blks_read + heap_blks_hit as heap_blk_requests,
   case when heap_blks_hit > 0
     then 
round(100.0-100.0*heap_blks_read/(heap_blks_read+heap_blks_hit),2)
     else round(0.0,2)
   end as heap_blks_hit_percent,
   idx_blks_read + idx_blks_hit as idx_blk_requests,
   case when idx_blks_hit > 0
     then round(100.0-100.0*idx_blks_read/(idx_blks_read + 
idx_blks_hit),2)
     else round(0.0,2)
   end as idx_blks_hit_percent
from pg_statio_user_tables
where heap_blks_read > 0
order by heap_blks_hit_percent desc
limit 10;

select 'Worst hit ratios';
select relname,
   heap_blks_read + heap_blks_hit as heap_blk_requests,
   case when heap_blks_hit > 0
     then 
round(100.0-100.0*heap_blks_read/(heap_blks_read+heap_blks_hit),2)
     else round(0.0,2)
   end as heap_blks_hit_percent,
   idx_blks_read + idx_blks_hit as idx_blk_requests,
   case when idx_blks_hit > 0
     then round(100.0-100.0*idx_blks_read/(idx_blks_read + 
idx_blks_hit),2)
     else round(0.0,2)
   end as idx_blks_hit_percent
from pg_statio_user_tables
where heap_blks_read > 0
order by heap_blks_hit_percent
limit 10;

select 'Most active tables';
select relname,
   heap_blks_read + heap_blks_hit as heap_blk_requests,
   case when heap_blks_hit > 0
     then 
round(100.0-100.0*heap_blks_read/(heap_blks_read+heap_blks_hit),2)
     else round(0.0,2)
   end as heap_blks_hit_percent,
   idx_blks_read + idx_blks_hit as idx_blk_requests,
   case when idx_blks_hit > 0
     then round(100.0-100.0*idx_blks_read/(idx_blks_read + 
idx_blks_hit),2)
     else round(0.0,2)
   end as idx_blks_hit_percent
from pg_statio_user_tables
where heap_blks_read > 0
order by heap_blk_requests desc
limit 10;

+
Posted by Nima Mazloumi on
Hi Don,

just tested your modified query.

Changing the WHERE-clause from

acs_permission__permission_p(0, ptab.grantee_id, 'admin') = 'f'

to

not exists (select 1
from acs_object_party_privilege_map m
where m.object_id = acs__magic_object_id('security_context_root')
and m.party_id = ptab.grantee_id
and m.privilege = 'admin')

improved the query that

1. it is returning results again, without having me to remove that subquery

2. I took 17265 ms for the query to suceed

I assume that it will be even faster on the productive instance.

More info on that coming soon.

Thank you very much,
Nima

+
Posted by Don Baccus on
OK, keep me posted, I've been sick with the flu since Friday and am just now recovering but will be checking in a couple of times a day until the flu's run its course (probably two-three more days)