Forum OpenACS Q&A: ORA-01000: maximum open cursors exceeded ??

Folks, using ACS 4.2 and Oracle, I've started seeing fairly frequent "ORA-01000: maximum open cursors exceeded" errors. This is with Oracle 8.1.7 on Solaris, and open_cursors = 500 in my init.ora.

This is a very lightly loaded Dev site, and I seemed to start seeing this problem after loading the Static Pages and Site Wide Search ACS packages, and using Intermedia with them. And when I do get this error, it is very disturbing as it takes down my whole site - every single db-backed page dies. Restarting AOLserver seems to alleviate the problem.

I am not sure what the effect of the error might be on another ACS install running on the same Oracle instance, but I know that sometimes when I've gotten this error on the web server, I've ALSO gotten it when trying to do ANYTHING in my sqlplus session. And other times, the error has appeard on the web server while sqlplus seemed ok.

But, one very strange thing is, immediately before, during, and after getting this error, I can often go to sqlplus and do:

SQL> select count(*)  from v$open_cursor;
  COUNT(*)
----------
        48

And 48 is clearly far under my 500 maximum open cursors. At other times, I've seen vastly more open cursors - in the hundreds - but it looks almost as if the number of cursors reported in v$open_cursor has no bearing on this problem, which seems odd to say the least.

Has anyone else seen this problem before?

Collapse
Posted by Brian Fenton on
Hi Andrew, this error means that somewhere cursors aren't being closed when opened. They will stay open until the database user is disconnected, which is probably why you're being forced to close AOLserver. The following bits of SQL should help you track down which cursors are causing the problem:
select user_name, status, osuser, machine, a.sql_text 
from v$session b, 
v$open_cursor a 
where a.sid = b.sid 
/ 
and
select a.value, b.name 
from v$mystat a, v$statname b 
where a.statistic# = b.statistic# 
and a.statistic#= 3 
/ 
I found these at http://www.orafaq.com/error/ora-01000.htm.
Hope this helps
Thanks, Brian. From that FAQ, it sounds like the only ways I could be getting cursors left open when they should not be are:
  1. Bugs in the Oracle driver.
  2. Uncommitted transactions.
  3. Failing to close cursors in PL/SQL.

Option 3 (PL/SQL) seems the likely culprit to me... Probably exacerbated by any big, long-running transactions.

I did not actually determine where my open cursors were coming from, but I did review the static-pages, site-wide-search, and acs-content-repository ACS packages, and fixed un-closed cursors in the following files:

site-wide-search/sql/content-revision-sws.sql
site-wide-search/sql/search-tables.sql
acs-content-repository/sql/content-perms.sql
acs-content-repository/sql/content-item.sql
acs-content-repository/sql/content-schedule.sql
acs-content-repository/sql/doc-package.sql

These are pretty trivial fixes, and I haven't really tested them at all yet on my system, but if somebody wants patches against ACS 4.2 holler and I'll get myself organized and provide them.

Also, it sure is annoying the way v$open_cursor.sql_text only shows you the first 60 characters of the SQL statement. I tried, but failed, to get the full text of the SQL from v$sqltext.piece in the way you can with this query:

-- What are users currently doing?
set linesize 180
select s.username, s.serial#, sql.sql_text
from v$session s, v$sqltext sql
where sql.address    = s.sql_address
  and sql.hash_value = s.sql_hash_value
  --and upper(s.username) like 'USERNAME%'
order by s.username ,s.sid ,s.serial# ,sql.piece ;
Collapse
Posted by Brian Fenton on
Hi Andrew, good to hear you've been tracking down the cursors.
Try this query to see the full SQL text:
select user_name, status, osuser, machine, a.sql_text, c.sql_text 
from v$session b, v$open_cursor a,  V$SQLAREA c
where a.sid = b.sid
and c.address = a.address;
I find AOLserver a much better client than SQL*Plus for outputting these long queries! A nice example of this is admin/monitoring/cassandracle/performance/pct-large-table-scans in Cassandracle.
Take care.
Collapse
Posted by Don Baccus on
Andrew - thanks for the dirty, hands-on debugging effort.  I've informed Dan Wickstrom (acs-content-repository) and Neophytos Demetriou (site-wide-search) that you've uncovered these cursor leaks,
so we should have these fixed by the time we roll-out our OpenACS4 alpha release of the toolkit (in a week or so).
Collapse
Posted by Dan Wickstrom on
One things I noticed about the openacs-4 code was the inconsistent way in which cursors were closed. For example, this code in doc-package.sql seems to imply that the cursor is closed when NOTFOUND occurs:

  begin
  
    v_result := '';
    v_started := 'f';

    open v_package_cur;
    loop
      fetch v_package_cur into v_line, v_text; 
      exit when v_package_cur%NOTFOUND;
      
      -- Look for the function header
      if v_started = 'f' then
        if v_text like '--%' then
          v_started := 't';
        end if;
      end if;
    
      -- Process the header
      if v_started = 't' then

        if v_text not like '--%' then
          close v_package_cur;
          return v_result;
        end if;
        
        v_result := v_result || v_text;
      end if;
    end loop;
    -- Return unfinished result
    return v_result;     

  end get_package_header;

end doc;
/
show errors

As you can see, the cursor is closed prior to returning from the loop, but if a NOTFOUND condition occurs, the loop and then the routine is exited without closing the cursor. To test this, I setup a simple test case that emulated this code, and I explicitly closed the cursor after the NOTFOUND condition occured. This worked without throwing an error, so, as Don pointed out to me, the cursor must have still been open, since closing an already closed cursor will cause an error.

Collapse
Posted by Don Baccus on
At least the Oracle documentation claims an error would be thrown!  In
aD's defense I'll point out that I found two conflicting examples in the Oracle documentation on explicit cursors, one which closes the cursor after a NOTFOUND condition was raised and one which doesn't.

Since explicit closing doesn't throw an error we should ferret out these cases and do the close.

Andrew - would you have time to make these changes to your local instance of ACS 4.2 Classic and let us know if doing so causes your cursor leak problem to go away?

Collapse
Posted by Dan Wickstrom on
I did try testing the close of an already closed cursor, and it threw and error, so I think it's probably safe to assume that a cursor is still open after a NOTFOUND event, since it doesn't throw an error when it is closed.
Collapse
Posted by Tom Jackson on

You definitely should not close the cursor inside the loop. You should close it just outside the loop, since that is when you are done with it. Also, you should get an error if you try to open an already open cursor, so you should check for that before you open the cursor. Occasionally you get an error in your loop. You should use the exception block to check for and close an open cursor (Just remembering that you might have to comment out the exception block to find bugs in your code)

  --
  -- Open the cursor
  --
  if not my_cursor%ISOPEN
  then
   OPEN my_cursor;
  end if;
  --
  -- Begin Looping through records
  --
  loop
   FETCH my_cursor INTO v_my_cursor_var;
   EXIT WHEN my_cursor%NOTFOUND;
   --
   -- Do some work
   --
   null;
  end loop;
  CLOSE my_cursor;
  exception 
   when others
   then
     if my_cursor%ISOPEN
     then
       CLOSE my_cursor;
     end if;
Collapse
Posted by Don Baccus on
The example closes the cursor in the loop just before the return.  As you point out it should close it at the end of the loop, I'm sure that's where Dan put it (since the loop only EXITs on that condition).

Now ... the exception case is one I can guarantee neither Dan nor I thought of, and thank you very much for pointing this out.  Yes, indeed, one should do so, just to be safe.

Andrew - would you have time to make these changes to your local instance of ACS 4.2 Classic and let us know if doing so causes your cursor leak problem to go away?
Since cleaning up the my cursors in site-wide-search and acs-content-repository, I've now done a fair amount of letting static-pages loading big blobs in and out of the CR and the like, and I haven't seen the maximum open cursors exceeded error come back. I still tend see rather large numbers of usually inactive cursors hanging around, so I suspect there's more of this bug in the other ACS packages. But coincidence or not, I haven't had it bite me again, after I cleaned up those two packages.

As for when cursors get closed, I always assumed that if you ever explicitly open a cursor in Oracle (as opposed to using a cursor for loop, for instance), then the only way it's ever going to get closed is if you manually close it, or if the client program (AOLserver, sqlplus) terminates its connection. But does that mesh with the behavior you guys have seen in practice?

As a quick check, I just did
  find openacs4 -name '*.[xs]ql' |xargs grep -i 'open ' |
    egrep -v 'ql:[[:space:]]*--|ql:[[:space:]]/[*]'

to list all the places in .sql and .xql files that cursors get opened (the nasty-looking egrep -v just excludes comment lines from the output). I see 74 such places in 28 files. Someone who knows PL/SQL and plpgsql better than I do might want to review each such open of a cursor to check that there is a corresponding close in all possible code paths, generating patches as necessary?

Now that Andrew has found and identified the issue, it would be good to deal with it systematically, rather than on an ad-hoc per-module basis, I think.

Collapse
Posted by Don Baccus on
Yes, it would be better to tackle this systematically.  We need a volunteer ... Dan's swamped, that much I know.
Anyone willing to take this task on?  Going over these procs looks like a good day's work.
Collapse
Posted by Dan Wickstrom on
I did go through the core packages and fix all of the open cursors.  Of course that was before Tom pointed out the need to test and close cursors from within exceptions.  I'll go ahead and fix the exception cases in the core (there's only a couple of them), if somebody else wants to track down and fix the rest of them for non-core packages.