Forum OpenACS Q&A: Development branch hosed?

Posted by Ola Hansson on
Hi,

I checked out a fresh version of the HEAD and now I can't get to the homepage, http://192.168.0.100 anymore. It worked a couple of days ago.

AOLserver starts without errors (almost, see below) and listens on that IP, port 80.

These are the only errors I get during startup but they are not related, I believe:

...

[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Querying 
'select count(*) from pg_class;'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: dbinit: 
sql(localhost::dotlrn-test): 'select count(*) from pg_class'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Querying 'drop 
function __test__();'
[17/Sep/2002:11:26:10][15034.1024][-main-] Error: Ns_PgExec: 
result status: 7 message: ERROR:  RemoveFunction: function 
'__test__()' does not exist

[17/Sep/2002:11:26:10][15034.1024][-main-] Error: dbinit: 
error(localhost::dotlrn-test,ERROR:  RemoveFunction: function 
'__test__()' does not exist
): 'drop function __test__();'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Querying 
'create function __test__() returns integer as 'begin end;' 
language 'plpgsql';'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: dbinit: 
sql(localhost::dotlrn-test): 'create function __test__() returns 
integer as 'begin end;' language 'plpgsql''
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Querying 'drop 
function __test__();'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: dbinit: 
sql(localhost::dotlrn-test): 'drop function __test__();'
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Loading acs-tcl
[17/Sep/2002:11:26:10][15034.1024][-main-] Notice: Loading 
packages/acs-tcl/tcl/00-database-procs-postgresql.tcl...

...
When I request the index page, this is what I get:

[17/Sep/2002:11:32:05][15034.2051][-sched-] Notice: Running 
scheduled proc search_indexer...
[17/Sep/2002:11:32:05][15034.2051][-sched-] Debug: 
db_qd_get_fullname: following query in file: 
packages.search.tcl.search-procs proc: search_indexer
[17/Sep/2002:11:32:05][15034.2051][-sched-] Debug: PgBindCmd: sql 
= 
            select object_id, event_date, event
            from search_observer_queue
            order by event_date asc
        
[17/Sep/2002:11:32:05][15034.2051][-sched-] Notice: Querying '
            select object_id, event_date, event
            from search_observer_queue
            order by event_date asc;'
[17/Sep/2002:11:32:05][15034.2051][-sched-] Notice: dbinit: 
sql(localhost::dotlrn-test): '
            select object_id, event_date, event
            from search_observer_queue
            order by event_date asc
        '
[17/Sep/2002:11:32:05][15034.2051][-sched-] Notice: Done running 
scheduled proc search_indexer.

I noticed here that some work on the core has been going on lately. I'm not saying that this is what broke things, but it might have.
Posted by Dave Bauer on
Ola,

I had the same problem yesterday. It seems to be host-node map related. Tail the log, and you should see the query it stops on when you request a page. No error shows up in the log that I could find. I was just getting connection refused. I noticed that my system was saying connection refused by dave.deepskydesign.com even though I was using the IP address in the location bar. So perhaps adding an entry to the host-node map will help.

The search indexer runs every 30 seconds, so its output can often scroll by in the log while you are looking for a real error message.
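
One way to keep the search indexer noise out of the way is to filter the log before reading it. Below is a minimal sketch in Python, assuming the entry format seen in the excerpts in this thread (timestamp, pid.tid, then a thread tag such as -sched- or -conn2-) and assuming the indexer always logs from the -sched- thread. The function name drop_thread is made up for this example.

```python
import re

# An AOLserver log entry starts with "[timestamp][pid.tid][-thread-] ...";
# wrapped SQL and other continuation lines do not start this way.
ENTRY_START = re.compile(r"^\[[^\]]+\]\[[^\]]+\]\[-(?P<thread>[^\]]*)-\] ")


def drop_thread(lines, thread="sched"):
    """Yield log lines, skipping whole entries written by the given thread."""
    skipping = False
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            # New entry: decide once, then apply the decision to its
            # continuation lines as well.
            skipping = match.group("thread") == thread
        if not skipping:
            yield line
```

Passing an open log file (or any iterable of lines) to drop_thread leaves only the entries from the other threads, so the lines written while a page is being requested are easier to spot.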

I dropped my database and recreated it, and reloaded OpenACS and it was working again. Sorry, I didn't troubleshoot it further, I thought it was something I had done to my system.

If you don't need any of the data, this is an ok solution.

Posted by Jeff Davis on
Oh, I think I know what it is.  I just applied a patch to make
ForceHostP work.  If you have ForceHostP=1 the server will redirect
to the canonical hostname (i.e. http://yourserver.com will redirect
to http://www.yourserver.com).  This is normally what you want but
if the hostname you set in your server config tcl file does not
resolve or point at the correct IP you will be hosed.
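
As a rough illustration of the redirect behaviour described above, here is a simplified model in Python. It is not the actual OpenACS request processor code; the function force_host_redirect and its parameters are invented for the example, and it only compares the host part of the request against the configured location.

```python
from urllib.parse import urlsplit


def force_host_redirect(request_url, canonical_location, force_host_p=True):
    """Return the URL to redirect to, or None if the request is served as-is."""
    if not force_host_p:
        return None
    req = urlsplit(request_url)
    canon = urlsplit(canonical_location)
    if req.netloc.lower() == canon.netloc.lower():
        # Already on the canonical host: no redirect needed.
        return None
    # Keep path and query, swap in the canonical scheme and host.
    return req._replace(scheme=canon.scheme, netloc=canon.netloc).geturl()
```

With the canonical location set to http://www.yourserver.com, a request for http://yourserver.com/ is sent to http://www.yourserver.com/. If the canonical location is something the client cannot reach, every request ends up at a dead address, which matches the symptom reported here.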

You can either fix hostname or set ForceHostP to 0 (which should
probably be the default).

Let me know if that fixes it.

Posted by Lars Pind on
Jeff,

It should default to 0, the other is too risky.

Do you want to or should I? It's in packages/acs-kernel/acs-kernel.info.

/Lars

Posted by Ola Hansson on
You see, my server is only a dev server that I run locally so I can only get to it through 192.168.0.100 (except I can't for the moment😉).

I did do a total re-install - dropping and recreating the db, etc...

Jeff, I can't set the ForceHostP since I can't get to any of the urls on my setup.

The last snippet above is all I get in the log when I request the index page.

Posted by Jeff Davis on
you can change it directly via psql:
update apm_parameter_values set attr_value = '0' where parameter_id = 
(select parameter_id from apm_parameters where parameter_name = 'ForceHostP');
Anyway, I think the other problem is that it will not play well with host node maps. I will look into that as well.
Posted by Ola Hansson on
On some Debian boxes the [ns_info address] and/or (can't remember which) [ns_info hostname] does not work, so I usually just enter a valid IP...
Posted by David Walker on
I would think for most installs 0.0.0.0 (Listen on all IPs) should
be satisfactory.
Posted by Jeff Davis on
David, the issue is not which IP is listened to, it is that
ForceHostP will issue a redirect to the "canonical hostname".
If the hostname parameter of nssock does not translate to
something the client can resolve, the redirect will fail.  It uses
[ns_conn location] to determine the url to which to redirect.
If you are going to use an IP for Address you will also need to
provide an IP or hostname for nssock's hostname parameter
(and 0.0.0.0 for hostname would probably be a bad choice unless you
are only ever going to connect from localhost).
Posted by Jeff Davis on
I checked in a fix to make hostname based subsites work correctly
with ForceHostP set
Posted by Ola Hansson on
Jeff, I updated using the statement you provided and I restarted the server, but it still doesn't work with the following in config.tcl (this is what worked before).
#set hostname               [ns_info hostname]
set hostname 127.0.0.1
#set address                [ns_info address]
set address 192.168.0.100
What do I have to change?
Posted by Jeff Davis on
If you did the psql statement I sent then there is something else
altogether broken with your server.  Why don't you email me the
last 1000 lines of your error log and I will see if I see anything strange.

On the other hand, the setup you sent will fail with ForceHostP=1, since when the server
sees GET http://192.168.0.100 it will redirect to http://127.0.0.1
but your server is not listening on that address (which you can confirm
via netstat -an).  You should have hostname=192.168.0.100 as well
and that should work.
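
The mismatch can be checked mechanically. Here is a small sketch in Python, assuming both config values are given as literal IP addresses (as in the config.tcl excerpt quoted earlier in this thread); the helper name redirect_target_is_served is hypothetical.

```python
import ipaddress


def redirect_target_is_served(hostname_ip, listen_address):
    """True if a server bound to listen_address accepts connections
    addressed to hostname_ip (assumed to be one of the machine's own IPs)."""
    target = ipaddress.ip_address(hostname_ip)
    bound = ipaddress.ip_address(listen_address)
    # 0.0.0.0 (or ::) means "listen on every local address".
    return bound.is_unspecified or bound == target
```

For hostname 127.0.0.1 and address 192.168.0.100 this returns False; with both set to 192.168.0.100 it returns True. A True result only says the listener covers the redirect target. The client still has to be able to reach that address, so 127.0.0.1 would only ever work from the server machine itself.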

Posted by Ola Hansson on
No luck so far (with 192.168.0.100 as both hostname and address).
debian:~# netstat -an | grep :80
tcp        0      0 192.168.0.100:80        0.0.0.0:*             LISTEN 
I sent you the log file, Jeff. Thanks.
Posted by Ola Hansson on
It appears to be working now. I think I was seeing cached pages the last time I tried, when hostname and address were the same. I reinstalled a few times and closed and restarted the browser, and all of a sudden it "just worked".

Thanks Jeff, and the rest of you, for helping and for putting this important patch in there!

Posted by Kjell Wooding on
Glarg. I was burned by this one too. The ForceHostP default should
DEFINITELY be 0.
Posted by Jeff Davis on
The default has been changed to 0 in the .info file, but if you
don't want to reinstall and you can't get to your site, you will
have to change it in the db with the SQL I posted above.  Hopefully
we did not ruin too many people's day with this :)
Posted by Stan Kaufman on
It sounds as if you guys have fixed a variety of ills over the past 24 hours, but I wonder if something has gotten broken in the process. After a CVS update today, my server is dead in the water.

I brought this up over at this thread https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0006DJ&topic_id=12&topic=OpenACS%204%2e0%20Design but perhaps this is a more appropriate place to beg for help.

No pages get served even though the AOLServer starts OK, is listening on the correct port, and ForceHostP is 0. Any page request produces this error:

debug: RP (9.866 ms): rp_filter: setting up request: GET / 
[16/Sep/2002:14:26:44][26713.5126][-conn2-] Error: tclop: invalid return code from filter proc 'can't read "node(instance_name)": no such element in array': must be filter_ok, filter_return, or filter_break

Lars introduced instance_name in several packages/acs-tcl/tcl/ procs. I can't figure out what needs to be fixed here; it's too deep in the plumbing.

I've dropped the database and reinstalled a couple of times; the install goes fine without errors, but once it's complete, the server won't handle any more pages due to this error.

Any suggestions? Has anyone else installed from the current CVS and got it to work? TIA.

Posted by Jeff Davis on
Can you post the output of "cvs status site-nodes-procs.tcl request-processor-procs.tcl"
(run in the packages/acs-tcl/tcl dir)?
Posted by Stan Kaufman on
Hi Jeff! Thanks for your reply!

Here are the outputs from cvs status:

===================================================================
File: site-nodes-procs.tcl      Status: Up-to-date

   Working revision:    1.17
   Repository revision: 1.17    /cvsroot/openacs-4/packages/acs-tcl/tcl/site-nodes-procs.tcl,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

===================================================================
File: request-processor-procs.tcl       Status: Up-to-date

   Working revision:    1.21
   Repository revision: 1.21    /cvsroot/openacs-4/packages/acs-tcl/tcl/request-processor-procs.tcl,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

===================================================================
File: site-nodes-procs-oracle.xql       Status: Up-to-date

   Working revision:    1.11
   Repository revision: 1.11    /cvsroot/openacs-4/packages/acs-tcl/tcl/site-nodes-procs-oracle.xql,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

===================================================================
File: site-nodes-procs-postgresql.xql   Status: Up-to-date

   Working revision:    1.15
   Repository revision: 1.15    /cvsroot/openacs-4/packages/acs-tcl/tcl/site-nodes-procs-postgresql.xql,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

FWIW, I just installed the 4.5 tarball and it runs just fine (well it did once I'd psql'd ForceHostP to 0 as you directed above). So I'm confident the problem isn't with AOLServer or my nsd.tcl config file, or PG, or the install process itself. It could of course be some other stooooopid error I've made, but if so, I can't spot it.

Thanks!

Posted by Stan Kaufman on
This morning I nuked my local CVS repository and started fresh from openacs.org:/cvsroot, and the install went fine and the site comes up. I'm mystified as to what happened, since there were no new commits to the tree between my last update last night and this morning. So whatever was going on was a problem in my local CVS repository (or something). Bizarre.

I've posted a bit more info about this ordeal at https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0006DJ&topic_id=12&topic=OpenACS%204%2e0%20Design.

Anyway, thanks all for your help, and sorry for wasting bandwidth on a self-solving problem.