Forum OpenACS Q&A: corrupt cr? How to audit cr_ tables?

A system running oacs 5.1 (pg 7.4, aolserver 4.0.10) is showing errors like this:

[26/Dec/2007:09:33:33][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (1401 seconds)
[26/Dec/2007:10:27:15][36768.5426176][-sched-] Warning: db_exec: longdb 2322454625 nsdb0 0or1row dbqd.acs-content-repository.tcl.acs-content-repository-init.acs_cr_scheduled_release_exec.schedule_releases
[26/Dec/2007:10:27:15][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (2322 seconds)
[26/Dec/2007:11:29:18][36768.5426176][-sched-] Warning: db_exec: longdb 2823465959 nsdb0 0or1row dbqd.acs-content-repository.tcl.acs-content-repository-init.acs_cr_scheduled_release_exec.schedule_releases
[26/Dec/2007:11:29:18][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (2823 seconds)
[26/Dec/2007:12:29:09][36768.5426176][-sched-] Warning: db_exec: longdb 2691307436 nsdb0 0or1row dbqd.acs-content-repository.tcl.acs-content-repository-init.acs_cr_scheduled_release_exec.schedule_releases
[26/Dec/2007:12:29:09][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (2691 seconds)
[26/Dec/2007:13:36:29][36768.5426176][-sched-] Warning: db_exec: longdb 3140522125 nsdb0 0or1row dbqd.acs-content-repository.tcl.acs-content-repository-init.acs_cr_scheduled_release_exec.schedule_releases
[26/Dec/2007:13:36:29][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (3140 seconds)
[26/Dec/2007:14:07:37][36768.6235136][-conn:openacs-4::1] Error: nsopenssl (openacs-4): SSL read error: tlsv1 alert unknown ca
[26/Dec/2007:14:36:32][36768.5426176][-sched-] Warning: db_exec: longdb 2703876075 nsdb0 0or1row dbqd.acs-content-repository.tcl.acs-content-repository-init.acs_cr_scheduled_release_exec.schedule_releases
[26/Dec/2007:14:36:32][36768.5426176][-sched-] Warning: sched: excessive time taken by proc 10 (2703 seconds)

and the db is growing to over 11gb.. (should be much less than 1) cr_scheduled_release_job is huge..

I'm guessing the cr is corrupt somehow. How would you suggest I go about checking it or otherwise resolve this?

Collapse
Posted by Torben Brosten on
For example, something simple like:

select * from cr_scheduled_release_job limit 3;

just hangs indefinitely.. so how would I find out what jobs are scheduled?

Collapse
Posted by Torben Brosten on
select * from cr_scheduled_release_log shows 700+ rows of this:

           exec_date           | items_released | items_expired | err_num | err_msg 
-------------------------------+----------------+---------------+---------+---------
 2007-11-29 02:02:03.004584+00 |              0 |             0 |         | 

ie. all indicate 0 items_released and no errors

Collapse
Posted by Torben Brosten on
=# select count(*) from cr_scheduled_release_job;
  count  
---------
 1690075
(1 row)

=# select * from cr_scheduled_release_job limit 3;
 job_id |           last_exec           
--------+-------------------------------
        | 2007-12-27 05:05:27.041547+00
        | 2007-12-27 05:05:27.041547+00
        | 2007-12-27 05:05:27.041547+00
(3 rows)

Collapse
Posted by Torben Brosten on
select count(*) from cr_scheduled_release_job where job_id is not null limit 3;
 count
-------
     0
(1 row)
Collapse
Posted by Torben Brosten on

Just did a

delete from cr_scheduled_release_job;

Will see what happens..

Collapse
Posted by Don Baccus on
Something's screwy...

create function cr_scheduled_release_tr () returns opaque as '
begin

raise EXCEPTION ''-20000: Inserts are not allowed into cr_scheduled_release_job.'';

return new;
end;' language 'plpgsql';

create trigger cr_scheduled_release_tr
before insert on cr_scheduled_release_job
for each row execute procedure cr_scheduled_release_tr ();

This trigger exists because cr_scheduled_release_job is supposed to be a one-row table.

Other than the initial INSERT that follows the CREATE TABLE command when OpenACS is installed, there's only one UPDATE statement on the table in acs-content-repository.

It looks to me as though PG's making the old dead rows visible (UPDATE creates a new row, but transaction management is supposed to make the previous row invisible).

BTW deleting all the rows isn't probably the best thing to do, since the trigger will now prevent you from adding the one row the UPDATE wants to UPDATE :) You might drop the trigger, add the row, and recreate the trigger.

However ... there may be something wrong with your database that's a lot more important than this table.

If I were you, I'd dump the database, load it into a fresh db, and do some sanity checking to make sure you can dump and restore it ...

Collapse
Posted by Tom Jackson on
Not related to the bug really, but scheduled procs should nearly always run in a thread. If any scheduled procs were supposed to run while this one was busy, they would be skipped. In addition, these skipped scheduled procs will not reschedule, so you lose all of them until a restart.