Forum OpenACS Q&A: Response to PostgreSQL and Journaling FS

Posted by Rodger Donaldson on
A data point on file system overhead with databases: Oracle developers recently announced that they see a 30%+ performance improvement on Linux when moving from ext2 to raw partitions.

I would expect to see more overhead from ext3 (or any other journalled FS), since there are additional seeks/writes to the journals.

As for running with fsync disabled and relying on the journal to protect you while gaining performance: the corruption window is the time the data sits in RAM, before the FS layer tries to write it down to disk[1]. A journalling FS won't help with that, because it only commits data to the journal periodically. You could shorten that period, but then you'd be paying the same performance penalty you do for allowing PostgreSQL to run fsync() constantly.
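To make that window concrete, here is a minimal sketch in Python (not PostgreSQL's actual code) of the difference between handing data to the kernel and asking for it to be pushed to the device. The function name `durable_write` and the file name in the demo are made up for illustration. Note that `os.fsync()` only asks the OS to flush; as footnote [1] says, a drive's own write cache can still hold the data afterwards.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload to path and fsync before returning (illustrative helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        # os.write may accept fewer bytes than offered, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Up to here the data may exist only in the kernel's page cache (RAM).
        # fsync asks the OS to push it to the device; skipping this call is
        # what a "no-fsync" setup amounts to.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    durable_write(demo_path, b"commit 42\n")
```

Without the `os.fsync()` call the function returns sooner, and everything written since the last flush is exposed to a crash or power loss.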

Also, note that ReiserFS only journals metadata, not data; it'll leave you with a consistent filesystem, but it can leave you with corrupt data on it. It's also optimised for the case of many small files (proxy caches, news servers, etc.), not for a small number of large files (RDBMS).

ext3 can journal data, but it'll cost you for performance, since data is being written twice.
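As a rough back-of-the-envelope model of that "written twice" cost (a simplification, not a description of how ext3 lays out its journal), the sketch below counts bytes sent to the device with and without data journaling. The function `device_bytes` and its `metadata_bytes` parameter are hypothetical names chosen for illustration, and the figures in the demo are made up.

```python
def device_bytes(data_bytes, journal_data, metadata_bytes=0):
    """Estimate bytes written to the device for one batch of updates.

    Simplified model: metadata always goes to the journal and then to its
    final location; data goes to the journal too only when journal_data is True.
    """
    journal_writes = metadata_bytes + (data_bytes if journal_data else 0)
    in_place_writes = data_bytes + metadata_bytes
    return journal_writes + in_place_writes


if __name__ == "__main__":
    # Illustrative numbers only: 100 units of data, 4 units of metadata.
    print(device_bytes(100, journal_data=False, metadata_bytes=4))
    print(device_bytes(100, journal_data=True, metadata_bytes=4))
```

In this model, full data journaling roughly doubles the write volume whenever data dominates metadata, which is the case for a database's large table files.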

Really, the best answer for data you care about is dual power supplies, one on city power and one running through a UPS, plus disks with small caches, or caches that can be write-disabled. If IO is your bottleneck, RAID 10 is the answer.

[1] Of course, with modern disks that buffer writes in 4+ MB caches, even calling fsync() leaves a window of vulnerability, since data still in the disk's cache may be lost. This gets worse if you use hardware RAID that doesn't have battery backup for the drives and the controller's cache, since it's unlikely that your 64 MB+ of buffered writes will make it to disk in the event of a power loss.