Forum OpenACS Development: Re: Blocking access to login based on repetitive hits by ip address?

Hi Gustaf,
you say "on our system we have up to 3.5 mio hits per day,
around 15 dynamic views per second (sustained
avg over an hour, not counting images/css files)."

This is very impressive! You must have some interesting war stories to tell. Can you provide us with some information on your setup, hardware, failover, etc.? I'm sure many of us would like to hear how such a high-performance site is set up.

Regards and many thanks.
Brian

There is a slide set from the .LRN meeting in Heidelberg at

http://nm.wu-wien.ac.at/research/publications/learn-heidelberg-1.pdf
http://nm.wu-wien.ac.at/research/publications/learn-heidelberg-2.pdf

In part 1, towards the end, you can see our setup. The load is
distributed to three dual-processor Pentium 4 servers,
which are
  (1) a pound reverse proxy + AOLserver for static requests
  (2) AOLserver for dynamic requests
  (3) the database server running PostgreSQL

Server (1) handles the load easily; (2) is the
bottleneck of the configuration. These servers share
a common RAID system. We have a fallback configuration
for each system, but we did not attempt automatic
failover via a heartbeat etc.; switching is quite
easy through the reverse proxy. The system runs very
robustly; e.g. server (2) currently has an uptime of >400
days.

The performance tweaking was done by Peter Alberer and was
achieved through caching in various places
and stripping down some packages
(e.g. using static portal pages for courses and classes).
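
(As an aside, for readers wondering what "caching in various
places" can look like in OpenACS: a typical technique is to wrap
an expensive computation in util_memoize, which caches the result
server-side for a given number of seconds. The sketch below only
illustrates the technique; the proc, statement, and table names
are made up, and this is not necessarily what was done here.)

  # Hypothetical helper; proc, statement, and table names are made up.
  proc get_course_ids {} {
      db_list course_ids {select course_id from courses}
  }

  # Cache the result for 600 seconds instead of hitting the
  # database on every request.
  set course_ids [util_memoize get_course_ids 600]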

All performance figures are from the configuration described
above. For the next month we expect between 4 and 5 million
requests per day, and we hope to switch back to the dynamic
portlets in many places. We would not be able to handle this
load easily with the current configuration; furthermore, it is
quite hard to switch the configuration during the term.

Fortunately, we got a hardware grant and bought two
eight-processor Xeon systems (2.7 GHz MP) for
servers (2) and (3). On Friday we switched to
these new servers; from the SPECint rate, we should
be able to get nearly three times the throughput
of the old system. Currently, we have only 1.2
million requests per day (the term is just starting), so it is
too early to have a feeling for the real-world
performance of the new servers...

-gustaf

Thanks Gustaf. Again it's very impressive. Congratulations!
Posted by Andrew Piskorski on
Indeed, that is quite impressive!

Gustaf, your slides mention that you're using AOLserver 4.1. 4.1 is still in alpha, so I'm curious why you're using it rather than 4.0.x - could you tell us more about that? You said your rate limiter is serving dynamic content in AOLserver, so is it because of performance improvements there in 4.1?

Posted by Gustaf Neumann on
We simply got 4.1 (from cvs?) and it happens to work nicely.
I have no idea whether there is a performance difference
compared to the 4.0.* versions.  Here is the version we are using:

from ns.h:
*      $Header: /cvsroot/aolserver/aolserver/include/ns.h,v 1.58 2004/03/10 04:45:04 dossy Exp $
*/
#define NS_PATCH_LEVEL          "4.1.0"
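
(Incidentally, a running server reports this directly; ns_info is
a standard AOLserver command, so the patch level can also be
checked from a Tcl page or the control port:)

  % ns_info patchlevel
  4.1.0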

The throttle module can throttle all kinds of requests. We
did not want to catch cases where a user requests
an HTML page that includes a couple of images, but cases
where multiple HTML pages are requested frequently within
a time period from a user. In "throttle check", we simply do
  ...
  # embedded resources such as images are not counted
  if {[string match image/* [ns_guesstype $url]]} {
    return 0
  }
  ...
On our site, practically all HTML page requests
are quite costly dynamic requests. Without the
throttling code, we had users who tried to copy
the whole site content with IE or other tools.
Our users did that in particular when the server was
quite busy. This eager copying, however, had the effect
of a DoS attack, bringing the server to its knees.
The blocking code simply returns a short error
message to the requestor telling him to slow down...
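
(To make the idea concrete, here is a minimal sketch of such a
per-IP throttle, written as a plain-Tcl AOLserver preauth filter.
This is only an illustration, not the actual throttle module: the
proc name, window size, and request limit are made up, and a real
implementation would also expire stale per-IP entries.)

  # Run the check before every GET request is served.
  ns_register_filter preauth GET /* throttle_check

  proc throttle_check {why} {
      # Do not count embedded resources such as images.
      if {[string match image/* [ns_guesstype [ns_conn url]]]} {
          return filter_ok
      }
      set ip  [ns_conn peeraddr]
      set now [ns_time]
      set window  10   ;# seconds per counting window (assumed value)
      set maxhits 15   ;# allowed page requests per window (assumed value)

      if {[nsv_exists throttle $ip]} {
          foreach {start count} [nsv_get throttle $ip] break
          if {$now - $start < $window} {
              nsv_set throttle $ip [list $start [incr count]]
              if {$count > $maxhits} {
                  # Send a short error message asking the client to slow down.
                  ns_returnerror 503 "Too many requests - please slow down."
                  return filter_return
              }
              return filter_ok
          }
      }
      # First request from this IP, or the window expired: start a new one.
      nsv_set throttle $ip [list $now 1]
      return filter_ok
  }

Returning filter_return tells AOLserver that the response has
already been sent, so normal page processing is skipped.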

Does this answer your question?
-gustaf

Eight-way SMP boxes are horrendously expensive. Wouldn't you have been better off buying only one of those for the RDBMS, and using several dual CPU servers to run the AOLservers for the dynamic pages?
There are many aspects to this.

* First of all, we applied for a national hardware
grant, and it looks as if our impressive numbers
and some vision helped us to get it. However, the grant
was purely a server-hardware grant, and we are obliged
to spend the money for this purpose.

* Secondly, we got a very good deal on the machines,
much better than I expected (I can't talk about
prices here). There would have been
a significant price jump between the 2.7 GHz and 3 GHz
processors. Another, much more significant
price jump would have come from heading towards Itanium
machines. We got the machines without an OS, put FC2
on them without any problems, and that was it.

Concerning just one big iron, there are a couple of
aspects:

  - In our current setup (up to last week),
    the biggest bottleneck
    was the "dynamic" AOLserver, followed by the
    database server (with earlier versions of
    PostgreSQL, it was the other way around).
    So, using two n-way machines helps us
    immediately without restructuring the
    apps, thinking about flushing distributed
    caches, etc. With this setup we are simply
    on the safe side.

  - Many of the dynamic requests depend on the
    content repository and therefore the file
    system. I did not run tests, but I would
    not be surprised to run into problems when
    different machines hammer around in the
    same file system (e.g. shared via NFS).
    We are frequently thinking about further
    distribution and more redundant backend
    servers (managed by pound); we will go for it
    when necessary.

  - The two 8-way machines can be combined
    into one 16-way SMP machine, so we have the
    option to switch to one big database machine
    in the future.

Altogether, we are not only worrying about
performance, but also about reliability, robustness,
and maintainability. The new machines can be maintained
over the web (rebooted, etc.), they are highly redundant
(they can have hot-swap memory, but we did not go
for that), and they are nicely engineered.

Our system is seen as central infrastructure of a large and
important university. As with infrastructure
and utilities, people expect it to work 24x7 (but
we have no personnel for ensuring that). While many
(most?) learning management systems provide mostly print
materials (slides, handouts) in electronic form,
we have mostly interactive materials providing immediate
feedback. So the students really prepare for their
exams through the system; they rely on it. If our system
worked unreliably, many people would be immediately
upset, and if this happened at a bad moment, we would most
likely make it into the newspapers. So, spending more money
on robustness seems worthwhile.

-gustaf