Forum .LRN Q&A: Forums package problem

Collapse
Posted by Peter Alberer on
I noticed that posting a message into a forum takes about 10seconds
in my system. Looking at the db-procs i found the following to be
the problem:

->tcl calls forums_message__new()
-->forums_message__new() calls acs_object__update_last_modified()
--->recursively updates the last_modified date of acs_objects going
up the hierarchy.

At some point the object id reaches -3. Updating that takes about 9
seconds.

openacs-stage=# explain analyze update acs_objects set last_modified
= now() where object_id = -3;
NOTICE:  QUERY PLAN:

Index Scan using acs_objects_pk on acs_objects  (cost=0.00..3.94
rows=1 width=93) (actual time=0.08..0.17 rows=1 loops=1)
Total runtime: 1.52 msec
(takes 9sec to return to the prompt after that runtime info)
EXPLAIN 1
openacs-stage=#

Why does it take so long to update acs_object with ID "-3"? Is this
some kind of root object and updating that triggers an avalanche of
other updates?

TIA,

Collapse
Posted by Dan Wickstrom on
It sounds like you need to run vacuum analyze:


openacs4=# explain analyze update acs_objects set last_modified = now() where object_id = -3;
NOTICE:  QUERY PLAN:

Index Scan using acs_objects_pk on acs_objects  (cost=0.00..3.09 rows=1 width=86) (actual time=0.17..0.24 rows=1 loops=1)
Total runtime: 31.36 msec

EXPLAIN
openacs4=# select version();
                             version                              
------------------------------------------------------------------
 PostgreSQL 7.2.1 on sparc-sun-solaris2.8, compiled by GCC 2.95.3
(1 row)

openacs4=# 

Collapse
Posted by Peter Alberer on
I am running vacuum analyze every night. But to make sure i tried it again with running "vacuum analyze" immediately before the update. It takes 9secs nevertheless. Note that it does not take long to update any other object in the hierarchy of the forum message. Only the "-3" takes so long.
Collapse
Posted by Peter Alberer on
I tried this in a clean install of openacs4+dotLRN and it only takes under a second. (with 2000 items in the acs_objects table) In my main db there are 35000 entries in the acs_objects table :(
Collapse
Posted by Dan Wickstrom on
Sounds like a trigger attached to acs_objects might be causing the problem. You can view the triggers on a table using something like the following (this query excludes RI triggers):

openacs4=# select tgname from pg_trigger where tgrelid = (select oid from pg_class where relname = 'acs_objects') and tgname not like 'RI_Constraint%';
             tgname             
--------------------------------
 acs_objects_context_id_del_tr
 acs_objects_context_id_up_tr
 acs_objects_context_id_in_tr
 acs_objects_update_tr
 acs_objects_insert_tr
 acs_objects_last_mod_update_tr
 acs_objects_mod_ip_insert_tr
(7 rows)

Try dropping the triggers one by one and see if you can get the update speed of object_id = -3 to increase. If at the end of this exercise, you still have a problem, then it's probably one of the RI triggers. You can also try dropping them one by one, but be warned, there's alot of them.

Before you do any of this, make a backup, because your db will be in a bad state afterwards. Or if this a production site, do a dump and restore into a test db, and then do the debugging on the test site.

Collapse
Posted by Peter Alberer on
thanks dan, i will try that.
Collapse
Posted by Peter Alberer on
<p>create trigger acs_objects_context_id_up_tr after update<br>
on acs_objects<br>
for each row execute procedure acs_objects_context_id_up_tr ();</p>
<p>this is the evil doer :) i will investigate that further.</p>
Collapse
Posted by Peter Alberer on

I just had an in-depth look at the trigger causing problems (acs_objects_context_id_up_tr on acs_objects) and found the following:


the trigger starts with

if new.object_id = old.object_id and new.context_id = old.context_id and new.security_inherit_p = old.security_inherit_p then return new; end if; Raise Notice ''Continuing 1'';

Doing the update on acs_objects mentioned above does not cause the trigger code to return "new". But i do not understand why? None of "object_id", "context_id" or "security_inherit_p" have changed.


The second part of the trigger code causes the problem. A delete statement using an "in" clause.

delete from acs_object_context_index where object_id in (select object_id from acs_object_contexts where ancestor_id = old.object_id) and ancestor_id in (select ancestor_id from acs_object_contexts where object_id = old.object_id); Raise Notice ''Continuing 2'';

The rest of the trigger code does not really take much time. But the table acs_object_context_index has about 160.000 entries in my database, so this statement takes a looong time.

Analyzing a corresponding select statement shows the problem:

openacs-stage=# explain analyze select * from acs_object_context_index openacs-stage-# where object_id in (select object_id openacs-stage(# from acs_object_contexts openacs-stage(# where ancestor_id = -3) openacs-stage-# and ancestor_id in (select ancestor_id openacs-stage(# from acs_object_contexts openacs-stage(# where object_id = -3); NOTICE: QUERY PLAN: Seq Scan on acs_object_context_index (cost=0.00..12089439.77 rows=40624 width=12) (actual time=87654.15..174282.81 rows=167 loops=1) SubPlan -> Materialize (cost=45.67..45.67 rows=11 width=4) (actual time=0.00..0.47 rows=167 loops=162498) -> Index Scan using acs_obj_ctx_idx_ancestor_idx on acs_object_context_index (cost=0.00..45.67 rows=11 width=4) (actual time=0.07..1.74 rows=167 loops=1) -> Materialize (cost=28.71..28.71 rows=8 width=4) (actual time=0.00..0.01 rows=1 loops=729) -> Index Scan using acs_object_context_index_pk on acs_object_context_index (cost=0.00..28.71 rows=8 width=4) (actual time=0.07..0.08 rows=1 loops=1)
Total runtime: 174283.57 msec

I would be very happy about any help with this situtation.

Collapse
Posted by Peter Alberer on

Yippi !!

if new.object_id = old.object_id and new.context_id = old.context_id and new.security_inherit_p = old.security_inherit_p then return new; end if;

Both new.context_id and old.context_id are NULL therefor new != old.

I suppose adding:

if new.context_id is null and old.context_id is null then return new; end if;
Collapse
Posted by Dan Wickstrom on
Damn! I thought we fixed all of these triggers in the core a long time ago.

To fix it, combine the is null tests with the rest of the original if condition, or you will end up with some unexepected results. Use something like the following:

   
  if new.object_id = old.object_id and
     (new.context_id = old.context_id or (new.context_id is null and old.context_id is null)) and
     new.security_inherit_p = old.security_inherit_p then
    return new;
  end if;