Forum OpenACS Q&A: OpenACS 4.6.3 restore problem on Postgres7.2

Help!
My OpenACS4.6 site was restored to a new Postgres 7.2 instance (with schema name other than nsadmin). 

[There is a a bug when restoring on 7.2, that needs to 
source a postgresql.sql script as described here:
https://openacs.org/forums/message-view?message_id=70759]

On startup, it hangs at:
[03/Jan/2006:05:14:26][69307.134533120][-main-] Notice: Querying '
        select 1 as admin_exists_p
        from dual
        where exists (select 1
                      from acs_object_party_privilege_map m, users u
                      where m.object_id = 0
                      and m.party_id = u.user_id
                      and m.privilege = 'admin');'

for 1 hour or so before a fatal exit:

The offending view is acs_object_party_privilege_map 
which queries this view, acs_object_grantee_priv_map 
for which I have 5.5 million rows in the underlying tables.

Has anybody seen this behaviour before?

Could it be that the index creation of such big tables takes several hours 
   and it just hasn't finished?
Could it be that the indexes are not extracted/restored by Postgres?
Could it be that I don't have the privileges on that indexes  
and can't use them, because the restore was to another 
Postgres database and user as originally?

Am I missing something else?

Any help greatly appreciated. 
Collapse
Posted by Torben Brosten on
Stefan Deusch,

Did you use the -O option with pg_dump?

Were there any errors reported during the import of the db (not the postgresql.sql file)?

Did you vacuum analyze before aolserver startup?

Which tables have the 5.5 million rows?

Any other information that might help others help you?

Collapse
Posted by Stefan Deusch on
YEs, on Pg72 I did the export:
pg_dump -O -d -R -D -Fp -x -c -f myDB.Jan04.sql myDB
and imported the text sql with psql. 


I have 5.5 million rows for this view
      acs_object_grantee_priv_map := SELECT a.object_id,           
      a.grantee_id, m.descendant AS privilege FROM
      acs_permissions_all a, acs_privilege_descendant_map m 
      WHERE (a.privilege = m.privilege);
which is nested in view 
      acs_object_party_privilege_map
which is joined with users and specifically user 'admin' on 
startup, see above. 
That's where it hangs... 

Further analysis in Postgres showed, that 
the indexes are there. The only difference btw. old and new 
system - that I can think of - is the nspostgres.so driver 
on AolServer 4.1 vs. the old postgres.so on 3.4 and that the 
new system has database and AolServer distributed vs. a 
colocated setup previously. 


What is also strange is that the query as seen in the ns_log 
output above executes in < 1s from AolServer at startup, but 
when connecting to PG with psql, it takes >15 mins. before I 
kill it. I.e. I cannot run it manually on old OR new system. 

Has anyone an explanation for that?
Collapse
Posted by Torben Brosten on
If aolserver hangs at that point, there may be another cause.

You are using aolserver4.1? Can you try the more stable aolserver 4.0.10? ie. make changes to the system once you have verified that the transfer works.

Have the aolserver performance settings changed? stacksize? maxconnections? maxthreads? These are in the config.tcl file.

Is this on a bsd type system? If so, does the startup script include "ulimit -n 1024"?

Does the startup log have any warnings or other errors in it before the "fatal error"?

On aolserver startup, are there any differences in the messages between the two systems (since the db and files are supposedly the same)?

Collapse
Posted by Marc Fournier on
Torben ... he's running the same aolserver and nspostgres as we have running in your vServer ... aolserver 4.0.10 and nspostgres 4.0 ...

the big part that bothers me in all of this is that the query doesn't appear to ever return when I run it from the command line *on* the database server itself ... and I've checked disk I/O, swap and CPUs, and there is nothing unusual happening that I can see ...

Collapse
Posted by Marc Fournier on
Just on a whim, I tried to load the database on a 7.4 server, and that query doesn't even work over there:

ERROR: operator is not unique: bit varying || "unknown"

HINT: Could not choose a best candidate operator. You may need to add explicit type casts.

CONTEXT: PL/pgSQL function "tree_right" line 12 at return

Collapse
Posted by Torben Brosten on

I had a 7.2 pg db with sql_ascii encoding that didn't import very well because it contained many characters that were not part of the sql_ascii set. Maybe there's one or more case creating silent errors during import?

check the original db dump with:

iconv -t ascii db_filename > db_test

It should error if there's anything it cannot convert. You could try 'iconv -c' on the dump for good measure, if you decide to import one more time..

Collapse
Posted by Torben Brosten on

btw, latin1 may be a more suitable choice than ascii.

Reviewing my notes from a year ago, there were a few cases where ; was not quoted in values, causing premature EOLs, for example:

ERROR:  unterminated quoted string at or near "'2m035 1/15hp universal 115volt 2 1/4 x 2 3/4 x4 1/4 ;" at character 993427

and other control codes in inappropriate places, resulting in some nearly unique errors like these:

invalid command \012cal_item',NULL);

and

invalid command \004
ERROR:  syntax error at or near "cal_item_write" at character 1157529

One can visually miss them quite easily while watching for errors scrolling off the screen.

IIRC, there were also cases where users had registered with nonstandard characters in name fields. Being late for me, and my thinking a bit far fetched right now, I see how something like that might cause some incomplete import of object_id data (and relationships) and subsequently adversley affect some abstracted views and queries.

Anyway, hope this helps,

Collapse
Posted by Marc Fournier on
tried both ascii and latin1, and in both cases, all I'm getting is:

pgsql74# iconv -t ascii ascina.sql > db.test

iconv: ascina.sql: cannot convert

and the last line in db.test is at the non-ascii characters at:

"Supported by an Erwin-Schr\xc3\xb6dinger Fellowship"

but, doing a SELECT on that record in the database itself, the full record is showing up, so the load (at least at that point) appears to be fine ...

Collapse
Posted by Torben Brosten on
sure.. but it only takes one special character in an expected place to wreak some havoc.

Trying a version of the dump via 'iconv -c' might be the fastest way to eliminate the gremlin character theory. On the off chance it works, it becomes a matter of grepping for the piece of offending content.