Forum OpenACS Development: performance hint for busy sites with linux

Dear all,

maybe somebody finds this interesting and helpful:

In March we spent quite a long time trying to understand a set of strange performance problems under Linux. For example, password verification against an external server took 20 seconds or more (with the password server idle and no unusual network traffic), and simple db operations in PostgreSQL sometimes suddenly took 10 seconds, or a minute. We tried to address the PostgreSQL problem with various tuning options for checkpoint writing, with no apparent success.

The real cause is actually well known as the "ext3 latency problem" and became famous as the Firefox system-freeze problem under Linux, where the fix lies outside of Firefox. In short, the problem is caused by the "ordered writes" in fsync operations (data=ordered), which means that the writing of meta-data is delayed until all writing of content-data has finished. If a lot of data is being written, this can take a long time (even minutes), and during this period the file system might block every request, so the whole system appears to freeze. Our authentication problem actually came from writing to the log file, which was blocked as well.
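One way to check whether a machine shows this symptom is to time a small synced write while the site is busy. The following is only a rough sketch: fsync_probe is a hypothetical helper name, the probe file name is made up, and it assumes a dd that supports conv=fsync (as GNU dd does).

```shell
# Hypothetical helper: time one small synced write in the given directory.
# Prints the elapsed time in whole seconds.
# Assumption: dd supports conv=fsync (GNU coreutils dd does).
fsync_probe() {
    dir=${1:-.}
    f="$dir/fsync_probe.$$"          # made-up temporary file name
    start=$(date +%s)
    # Write a single 4k block and force it to disk before dd returns.
    dd if=/dev/zero of="$f" bs=4k count=1 conv=fsync 2>/dev/null
    end=$(date +%s)
    rm -f "$f"
    echo $((end - start))
}
```

Calling e.g. `fsync_probe /var/log` repeatedly during heavy write load should normally print 0; values of several seconds would point to stalls of the kind described above.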

There is a long discussion about the potential data-loss implications of changing data=ordered to data=writeback (which makes ext3 more similar to ext2). Read the discussions linked below and form your own opinion. Linus decided to change the default mount option for ext3 to data=writeback in newer Linux versions (2.6.30+).

If you have a new kernel and use no special mount options on your site, you are using this already. If you have an older version of the Linux kernel, you might consider altering the journaling mode with e.g.

sudo tune2fs -o journal_data_writeback /dev/hda1
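Before changing anything, it may help to see which journaling mode the mounted ext3 file systems currently report. The sketch below reads /proc/mounts; data_mode is a hypothetical helper name, and it assumes that the kernel lists the data= option in the fourth field there, which may not hold for every kernel version.

```shell
# Hypothetical helper: print the data= journaling mode found in one
# /proc/mounts-style line, or "unspecified" if no data= option is listed.
data_mode() {
    opts=$(printf '%s\n' "$1" | awk '{print $4}')
    mode=$(printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^data=//p' | head -n 1)
    printf '%s\n' "${mode:-unspecified}"
}

# Report mount point and mode for every ext3 file system currently mounted.
if [ -r /proc/mounts ]; then
    while read -r line; do
        case "$line" in
            *" ext3 "*)
                mnt=$(printf '%s\n' "$line" | awk '{print $2}')
                printf '%s: %s\n' "$mnt" "$(data_mode "$line")"
                ;;
        esac
    done < /proc/mounts
fi
```

The default stored in the superblock can also be inspected with `sudo tune2fs -l /dev/hda1` (look for the "Default mount options" line). Note that a default set via tune2fs is read at mount time, so it takes effect with the next mount or reboot.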

Consider this only if you have a busy site where large amounts of data are written. For our production site this change made a big difference.

http://lwn.net/Articles/328363/
http://article.gmane.org/gmane.linux.kernel/818261

Best regards

-gustaf neumann