Forum OpenACS Q&A: Oracle Ate my Server

Posted by Kevin Crosbie on
Hi all,

I'm having a problem with my pool connections to Oracle.  My setup is 
as follows:  
Oracle 8.1.7
RedHat 7.2
AOLserver/3.3.1+ad13
ArsDigita Oracle Driver version 2.6
OpenACS 4.5.1b

Basically, I started my web-process, and had it running for a few 
days.   My site began to get very slow, so I ran "top" to see what 
was going on.   I had an Oracle process which had been running for 
1708 minutes and was taking up 99% of the CPU, 5% of Memory.

I downed the webserver to see if that would fix it.  It didn't.   

I checked the v$session view in Oracle to see what connections were 
being made, and there were three entries for OpenACS, one 
of which was performing a high number of physical reads and a high 
number of block gets.   This session had the same process ID as the 
one from my "top".
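
For anyone wanting to repeat that check, here is a minimal sketch of the kind of query involved, wrapped in Tcl so it can be run from an AOLserver page. The proc names session_io_sql and log_session_io are hypothetical, and it assumes the database user is allowed to read v$session, v$process and v$sess_io. The PID passed in is the one "top" shows for the Oracle process.

```tcl
# Hypothetical helper: build a query that maps an OS process id (the PID
# shown by "top") to its Oracle session and that session's I/O counters.
proc session_io_sql {spid} {
    # Accept digits only, so the value can be embedded safely in the SQL text.
    if {![regexp {^[0-9]+$} $spid]} {
        error "spid must be a non-negative integer, got \"$spid\""
    }
    return "
        select s.sid, s.serial#, s.username, s.program,
               p.spid, io.block_gets, io.physical_reads
          from v\$session s, v\$process p, v\$sess_io io
         where p.addr = s.paddr
           and io.sid = s.sid
           and p.spid = '$spid'"
}

# Hypothetical helper: run the query on an existing ns_db handle and write
# each returned row to the server log as column=value pairs.
proc log_session_io {db spid} {
    set row [ns_db select $db [session_io_sql $spid]]
    while {[ns_db getrow $db $row]} {
        set fields {}
        for {set i 0} {$i < [ns_set size $row]} {incr i} {
            lappend fields "[ns_set key $row $i]=[ns_set value $row $i]"
        }
        ns_log Notice [join $fields ", "]
    }
}
```

A session whose block_gets and physical_reads keep climbing between two runs is the one doing the work.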

I did a few tests.   I ran the following Tcl commands through a Tcl 
page on my server:
set pool [db_nth_pool_name $db_state(n_handles_used)]
ns_log Notice $pool
set dba [ns_db gethandle pool1]
ns_log Notice $dba
set dbc [ns_db gethandle pool3]
ns_log Notice $dbc
# Note: I rearranged the pools because pool2 fails
set dbb [ns_db gethandle pool2]
ns_log Notice $dbb
set output "Results: Pool: $pool, DB: $dba $dbb $dbc"
ns_return 200 text/html $output

This produced the following log output:
[05/Apr/2002:12:45:54][8196.7176][-conn4-] Notice: pooln
[05/Apr/2002:12:45:54][8196.7176][-conn4-] Notice: nsdb1
[05/Apr/2002:12:45:54][8196.7176][-conn4-] Notice: nsdb2
[05/Apr/2002:12:45:54][8196.7176][-conn4-] Notice: RP (280.098 ms): 
error in rp_handler: serving GET /pooltest.tcl 
	ad_url "/pooltest.tcl" maps to 
file "/web/demo/www/pooltest.tcl"
errmsg is no access to pool: "pool2"
[05/Apr/2002:12:45:54][8196.7176][-conn4-] Error: GET /pooltest.tcl  
no access to pool: "pool2"
    while executing
"ns_db gethandle pool2"
    invoked from within
"set dbb [ns_db gethandle pool2]"

Looking at "top", the Oracle process was still there.

Next I edited my AOLserver Tcl file and changed my Tcl library to 
point to an empty directory, i.e. so that it would not source the 
OpenACS Tcl files, and switched on EnableTclPages.
I reran the above code (without the "set pool" line), and all handles were 
obtained without errors.

This time the Oracle process had disappeared when I checked 
with "top", so the problem went away.

It looks like somewhere along the way, pool2 is being allocated and 
not released.   I found it strange that when I downed my web-server 
and restarted, the same process was there, and pool2 was still not 
available.
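
To rule out a leaked handle in test pages like the one above, a minimal sketch of a wrapper that always hands the handle back, even when the code using it throws an error. The proc name run_with_handle is hypothetical and is not part of AOLserver or OpenACS; only ns_db gethandle and ns_db releasehandle are real AOLserver calls here.

```tcl
# Hypothetical helper (not part of AOLserver or OpenACS): run a callback
# with a handle from the named pool and release the handle afterwards,
# whether or not the callback raised an error.
proc run_with_handle {pool callback} {
    global errorInfo
    set db [ns_db gethandle $pool]
    # eval appends the handle as the callback's last argument.
    set code [catch {eval $callback [list $db]} result]
    if {$code == 1} {
        # Save the stack trace before any further command can change it.
        set saved_info $errorInfo
    }
    # Always give the handle back to the pool.
    ns_db releasehandle $db
    if {$code == 1} {
        error $result $saved_info
    }
    return $result
}

# Illustrative use from a Tcl page (count_users is a made-up callback):
#   proc count_users {db} {
#       set row [ns_db 1row $db "select count(*) as n from users"]
#       return [ns_set get $row n]
#   }
#   ns_log Notice "users: [run_with_handle pool2 count_users]"
```

This only covers handles taken by your own pages. It says nothing about a session that Oracle itself keeps busy after the web server has gone away.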
Posted by Don Baccus on
I've had Oracle run away with my laptop like this once a couple of months ago, but I couldn't relate it to any particular action I was taking in OpenACS 4.5.

Nor could I replicate it later.  It happened once then never again.

It really had the feel of being an Oracle problem.

Of course your problem could be very different.  Do you have lots of users or content on your site?  It may be that you hit a query that does an unqualified join on two large tables.

If your site's a development site with little content, though, as much as I hate to say it you may've run into an Oracle problem.

Posted by Tobel Graves on

I had a similar problem, and I found that the answer was to uninstall the webmail module. It makes some calls to Java packages that must be downloaded from Sun's website and then loaded by hand (as is very well documented in the webmail module). If the classes aren't present, Oracle's internal JVM runs amok.

The problem is also solved if you install the appropriate Java classes: JavaMail and the Java Activation Framework.

Don't know if this will help you or not...

Good luck.

Posted by Kevin Crosbie on
Thanks for that Tobel.  That solved the problem.  The log was showing some webmail errors, but I let them go as I knew I hadn't installed the proper jars.

I've taken webmail off, and everything seems to be fine now.

Cheers.