Forum OpenACS Q&A: openacs.org stops working at night?

Last night, I happened to notice that openacs.org became completely unresponsive for a long time (probably 1-3 hours), around 4 or 5 am EDT. There were no errors or anything like that; it just refused to serve any page. The browser would just spin, waiting, and eventually give up.

To be sure it wasn't some sort of browser or network snafu on my end, I ssh'd to the openacs.org unix box, and tried hitting the openacs page with lynx from there - same problem. Load on the box was around 3 to 5. Two postmaster processes were sucking up the most cpu (dumping backup files?), but I don't know if that had anything to do with it.

Is this normal? To be expected? Does it happen every night? Anyone know why?

Posted by Randy Ferrer on
I was not able to get into the site until this afternoon. So either the site was down for more than 3 hours or the problem is intermittent...
Posted by Andrew Piskorski on
I did not try the site again until this afternoon either, so I simply assumed that the problem only occurred for a few hours last night. Seems my assumption was wrong...
Posted by Andrew Piskorski on
There was some discussion of the downtime on IRC.
Posted by C. R. Oldham on
So did someone restart it, or did it just mysteriously start working again?
Posted by Don Baccus on
It probably got restarted.  One thing that Janine/Mike noticed was that googlebot decided to take a swing through the site.

We have some known issues with the site but the causes aren't known, if that makes sense.  Posts in particular are extremely slow.  We need to do some site debugging to see what's going on.

Posted by Andrew Piskorski on
According to the /usr/local/aolserver/log/openacs-error.log* logs, the openacs.org AOLserver was last restarted at: 13/Apr/2003:22:47:51

Oh, oops. Looks like that's the wrong AOLserver. openacs.org-error.log says openacs.org restarted 3 times today:

[andrewp@samoyed log]$ grep 'AOLserv.*starting' openacs.org-error.log
[07/Apr/2003:15:46:32][836.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[07/Apr/2003:16:43:39][5210.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[07/Apr/2003:16:48:00][8046.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[13/Apr/2003:18:58:22][844.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[13/Apr/2003:19:06:47][824.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[13/Apr/2003:20:40:01][854.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[14/Apr/2003:16:07:59][16888.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[17/Apr/2003:10:32:12][12481.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[17/Apr/2003:12:05:03][15306.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[17/Apr/2003:12:56:38][32096.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[18/Apr/2003:08:04:12][27651.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[18/Apr/2003:09:02:01][8673.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
[18/Apr/2003:12:26:51][23113.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting
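
The restart lines above can also be tallied per day. This is a minimal Python sketch, assuming the error log keeps exactly the line format shown in the grep output; the function name `restarts_per_day` is made up for illustration, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Matches AOLserver start-up notices in the format shown in the grep output.
# Assumption: the timestamp is always [DD/Mon/YYYY:HH:MM:SS].
START_RE = re.compile(
    r"^\[(\d{2}/\w{3}/\d{4}):[\d:]+\]\[[^\]]*\]\[-main-\] "
    r"Notice: nsmain: AOLserver/\S+ starting"
)


def restarts_per_day(lines):
    """Count start-up notices per calendar day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = START_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[17/Apr/2003:12:56:38][32096.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting",
    "[18/Apr/2003:08:04:12][27651.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting",
    "[18/Apr/2003:09:02:01][8673.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting",
    "[18/Apr/2003:12:26:14][8673.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 stopping",
    "[18/Apr/2003:12:26:51][23113.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 starting",
]

counts = restarts_per_day(sample)
for day, n in counts.items():
    print(day, n)
```

Only "starting" notices are counted, so the "stopping" line in the sample is ignored.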

It's kind of interesting that just before the last restart, there was absolutely nothing in the error log between these two lines:

[18/Apr/2003:09:07:29][8673.36893][-sched:16-] Notice: NOTIF- process_all_replies starting
[18/Apr/2003:12:26:14][8673.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 stopping
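
Silent stretches like this one can be found mechanically. The sketch below is only an illustration in Python: it assumes every log line starts with a `[DD/Mon/YYYY:HH:MM:SS]` timestamp, that month names parse in an English locale, and the 30-minute threshold and the name `silent_gaps` are arbitrary choices, not anything the site uses.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of an error-log line, e.g. [18/Apr/2003:09:07:29]
TS_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})\]")


def parse_ts(line):
    """Return the line's timestamp, or None if it has none."""
    match = TS_RE.match(line)
    if match is None:
        return None
    # %b is locale-dependent; this assumes English month abbreviations.
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")


def silent_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (start, end, duration) for each gap of at least `threshold`."""
    gaps = []
    previous = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines without a timestamp
        if previous is not None and ts - previous >= threshold:
            gaps.append((previous, ts, ts - previous))
        previous = ts
    return gaps


sample = [
    "[18/Apr/2003:09:07:29][8673.36893][-sched:16-] Notice: NOTIF- process_all_replies starting",
    "[18/Apr/2003:12:26:14][8673.1024][-main-] Notice: nsmain: AOLserver/3.3.1+ad13 stopping",
]

gaps = silent_gaps(sample)
for start, end, duration in gaps:
    print(start, "->", end, "silent for", duration)
```

For the two lines quoted above, the gap works out to 3 hours, 18 minutes and 45 seconds.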
Posted by Malte Sussdorff on
We have seen some behaviour like that in our hosting environment as well. This is why we use keepalive to check for the AOLserver and automatically restart it.
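
The general idea of such a watchdog can be sketched as follows. This is not the actual keepalive tool, only a generic Python illustration under assumptions: the probed URL, the restart command, the number of attempts and the pause are all hypothetical, and the demo at the bottom uses stubs so that nothing is really probed or restarted.

```python
import http.client
import subprocess
import time
import urllib.request


def is_alive(url, timeout=10.0):
    """Return True if the URL answers with a non-error status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and HTTP error statuses end up here.
        return False


def check_and_restart(url, restart_cmd, probe=is_alive, run=subprocess.run,
                      attempts=3, pause=5.0):
    """Probe the server a few times; run restart_cmd only if every probe fails.

    Returns True if a restart was issued, False if the server answered.
    """
    for attempt in range(attempts):
        if probe(url):
            return False  # server answered, nothing to do
        if attempt < attempts - 1:
            time.sleep(pause)
    run(restart_cmd, check=False)
    return True


# Demo with stubs: a probe that always fails and a fake command runner.
issued = []
restarted = check_and_restart(
    "http://localhost:8000/",                            # hypothetical URL
    ["/hypothetical/restart-aolserver", "openacs.org"],  # hypothetical command
    probe=lambda url: False,
    run=lambda cmd, check=False: issued.append(cmd),
    pause=0,
)
print(restarted, issued)
```

Requiring several failed probes before restarting avoids bouncing the server over a single slow response, at the cost of a slightly longer outage when it really is hung.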
Posted by C. R. Oldham on
Ah, so the problem we are having with our site falling over is possibly not unique to us?  cf. my thread from earlier this week:

https://openacs.org/forums/message-view?message_id=93102

Of course, our AOLserver is segfaulting--is the OpenACS.org one doing the same thing?

Posted by Mike Sisk on
Our network seems to have an affinity to attract googlebot...

Yep, googlebot started going through the site last night and at one time there were over 20 googlebot connections stacked up.

There is a load problem with the site that Don and Jeff know about but haven't gotten around to fixing yet.  Until we get this issue resolved I added a robots.txt file to block googlebot from crawling the site.
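
The exact contents of that robots.txt are not quoted in this thread. As a sketch, a file that blocks only Googlebot could look like the `ROBOTS_TXT` string below (an assumption, not the file actually installed), and Python's standard `urllib.robotparser` can confirm how a well-behaved crawler would read it.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: block Googlebot everywhere, say nothing about other robots.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://openacs.org/forums/"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)  # hypothetical crawler name

print("Googlebot allowed:", googlebot_allowed)
print("Other robots allowed:", other_allowed)
```

Note that robots.txt is advisory: it only helps with crawlers that choose to honour it.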

Also, the site isn't running with keepalive, which we run on all our other sites and something we'll be adding here soon, too.

Posted by Randy Ferrer on
Perhaps there is an issue at Google... It might be a good idea to follow their suggestion as well: "For most sites, Googlebot should not access your site more than once every few seconds on average. Since network delays are involved it is possible over short periods the rate will appear to be slightly higher. If you find that we are placing too high a load on your site, please let us know by sending us e-mail at googlebot@google.com."

I and others use Google regularly to search Oacs since the Oacs search engine regrettably does not always produce the best results....

Posted by Michael Bluett on
There appears to be something wrong with the site search: most results are missing their titles and brief snippets. E.g. searching for "Google" brings back results like: "Untitled https://openacs.org/forums/message-view?message_id=53769". Perhaps this is due to Google's visit and the subsequent load on the machine?

The CVS repository suggests that there isn't a robots.txt for OpenACS.org in CVS. I plan to keep Google from attempting to post messages or log on to my site using robots.txt (by barring robots from /forum/message-post and /register). I'm sure a few of us would be interested in any subsequent alterations to the robots.txt to exclude various dynamic parts of the site from Google's gaze.
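
A path-based exclusion like that can be generated and checked before deploying it. This is a minimal Python sketch: the two paths come from the post above, while the helper `build_robots_txt`, the host `example.com` and the query strings are placeholders invented for the example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Render a robots.txt record that bars `user_agent` from the given paths."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(["/forum/message-post", "/register"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com stands in for the real site; the query strings are made up.
post_ok = parser.can_fetch("Googlebot", "http://example.com/forum/message-post?forum_id=1")
register_ok = parser.can_fetch("Googlebot", "http://example.com/register/")
view_ok = parser.can_fetch("Googlebot", "http://example.com/forum/message-view?message_id=1")

print(post_ok, register_ok, view_ok)
```

Disallow rules are prefix matches, so `/register` also covers anything beneath it, while message views stay crawlable.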

For those unaware, Google is doing a "deep crawl".

Posted by Tilmann Singer on
Dave Bauer and I have recently made some changes to the search on openacs.org, with the final goal of indexing only full threads instead of single messages, to avoid cluttering the search results. It seems to work already for new threads - check this search:

https://openacs.org/search/search?q=powerbook+browse+compile+debug

Note that the word 'debug' only occurs in the last message. Previously the search would not have found anything, because the query words are spread across several messages of the thread.
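
The difference between the two indexing granularities can be shown with a toy example. This Python sketch is not the search code running on openacs.org; the thread contents are invented, and the matching is a plain "every query word must appear" test.

```python
import re


def tokens(text):
    """Lower-cased set of words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_messages(threads, query):
    """Message-level index: a hit needs every query word in ONE message."""
    wanted = tokens(query)
    return [
        (thread_id, position)
        for thread_id, messages in threads.items()
        for position, message in enumerate(messages)
        if wanted <= tokens(message)
    ]


def search_threads(threads, query):
    """Thread-level index: a hit needs every query word somewhere in the thread."""
    wanted = tokens(query)
    return [
        thread_id
        for thread_id, messages in threads.items()
        if wanted <= tokens(" ".join(messages))
    ]


# Invented sample data: 'debug' appears only in the last message.
threads = {
    "thread-1": [
        "Trying to compile on my PowerBook",
        "You can browse the source tree first",
        "Then debug it step by step",
    ],
}

query = "powerbook browse compile debug"
message_hits = search_messages(threads, query)
thread_hits = search_threads(threads, query)
print(message_hits, thread_hits)
```

Per-message matching finds nothing for this query, while matching against the whole thread returns it.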

What still needs to be done is to reindex all the old messages. I don't know why the abstracts are currently wrong, but after the reindexing that will be repaired.

Posted by Michael Bluett on
Thanks Tilmann, the search should be much better after this.  I was always a bit disconcerted by the searching of individual messages rather than threads.
Posted by Andrew Piskorski on
Eventually, listing the highest-hit individual posts as part of the thread hit listing would be ideal, but even without that, just returning the thread hit in search looks like a big improvement. Thank you, Tilmann and Dave!

Hm, in Forums, although each individual post always has its own message_id, Forums gives a link to display just that one post (and its direct replies) only if the poster assigned a different subject line, rather than accepting the default subject. That's kind of disconcerting. Occasionally you want to give a link to an individual post, so there should be a (small, unobtrusive) link on each post for displaying just that post.

It'd also be very nice to have <a name="message_id_foo"></a> anchors embedded in the page too, so you could display the entire thread while jumping straight to a particular post. The UI decision of whether to make display using those name tags the default behavior for viewing a single post could (and should) be postponed to a later date; just sticking the name tags into the Forums HTML hurts nothing and would definitely be worthwhile.
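
How such anchors and the links pointing at them fit together can be sketched in a few lines. This Python sketch only illustrates the naming scheme suggested above and is not Forums code: the helper names are hypothetical, and the second message id in the demo is invented.

```python
from html import escape
from urllib.parse import urlencode

# message-view URL pattern as seen in links elsewhere in this thread
BASE_URL = "https://openacs.org/forums/message-view"


def anchor_name(message_id):
    """Anchor name following the message_id_foo pattern suggested above."""
    return f"message_id_{message_id}"


def anchor_tag(message_id):
    """Empty named anchor to place just before a post in the thread page."""
    return f'<a name="{escape(anchor_name(message_id))}"></a>'


def post_link(thread_root_id, message_id, base=BASE_URL):
    """Link that shows the whole thread but scrolls to one post."""
    query = urlencode({"message_id": thread_root_id})
    return f"{base}?{query}#{anchor_name(message_id)}"


tag = anchor_tag(53769)
link = post_link(53769, 53770)  # 53770 is a made-up reply id
print(tag)
print(link)
```

Because the fragment after `#` is handled by the browser, the server still renders the full thread and no change to the page's default behaviour is needed.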

Simple Matters of Programming, I guess. I'm sure others have thought of this too, just figured I'd point it out anyway...