Forum OpenACS Q&A: Response to Database errors after power fails

Posted by S. Y. on
I think what Don described is what some folks call write-back caching. If you have one of the nice new Ultra160 drives, your disk buffer is four or eight megabytes. Even a consumer-grade ATA-100 drive has a two meg disk buffer. As an OpenACS developer, you can work out how many typical bboard messages fit in two megs.
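To put a rough number on that, here is a back-of-the-envelope sketch. The 2 KB average message size is purely my assumption for illustration, not a measured OpenACS figure, and the function name is made up for this example.

```python
# Rough estimate of how many forum posts could sit in a drive's write cache.
# AVG_MESSAGE_BYTES is an assumed figure, not a measurement.
AVG_MESSAGE_BYTES = 2 * 1024  # assumption: about 2 KB per bboard message
MB = 1024 * 1024


def messages_in_cache(cache_bytes, message_bytes=AVG_MESSAGE_BYTES):
    """Return how many whole messages fit in a cache of cache_bytes."""
    return cache_bytes // message_bytes


# Buffer sizes mentioned in the post.
for label, size in [("2 MB buffer", 2 * MB),
                    ("4 MB buffer", 4 * MB),
                    ("8 MB buffer", 8 * MB)]:
    print(label, "holds roughly", messages_in_cache(size), "messages")
```

Under that assumption, even the two meg buffer could be holding on the order of a thousand messages that the OS already considers written.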

Somewhere on the Adaptec web site is a very terse FAQ that describes write-back caching. In that FAQ, they clearly warn that write-back caching should only be used on UPS-backed systems, lest the problems Bob described occur.

My Adaptec 29160 64-bit SCSI controller has three options for write-back caching: Yes, No, and N/C. The latter stands for "No Change", meaning that the SCSI controller will use whatever the drive has been set up for. Thus, if your drive manufacturer ships drives with write-back caching turned on, then the SCSI controller will use it. And yes, the Adaptec SCSI controller defaults are set to "N/C".

Journalled filesystems don't help, since neither the OS nor the database knows whether everything has actually been committed to the disk platters. It's really a function of the disk drive hardware.
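To show where software's knowledge ends, here is a minimal sketch of the most an application can do from its side. The helper name and the temporary file are just for illustration. The point is that `os.fsync` only asks the OS to push its buffers out to the device; if the drive itself has write-back caching on, it may acknowledge the write while the data is still sitting in its buffer.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write data and ask the OS to flush it to the storage device.

    Returns the size of the file as the OS reports it afterwards.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
    # From here on the OS considers the write complete. Whether the bytes
    # are on the platters or still in the drive's own cache is something
    # this code cannot see.
    return os.path.getsize(path)


# Illustrative usage with a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "message.txt")
print(write_and_sync(demo_path, b"a bboard message\n"), "bytes reported written")
```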

I'm sorry to have to parrot Don, but mission-critical data deserves a UPS. Period. And you have to be absolutely sure you can trust your client's data to that UPS. That means testing the computer/UPS connection. I use apcupsd, and currently my system is supposed to go down with two minutes left on the batteries. I'm not running a database, and my system can shut down properly in 20-30 seconds. You need to time your system shutdowns and make sure you have plenty of leeway for an orderly halt.
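The leeway check is simple arithmetic, sketched below. The function name and the safety factor of 2 are my own assumptions, and the 90-second database shutdown is a made-up figure for comparison; plug in the times you actually measure.

```python
def shutdown_margin(runtime_left_s, shutdown_s, safety_factor=2.0):
    """Seconds of battery left after a shutdown padded by safety_factor.

    A negative result means the padded shutdown would outlast the battery.
    """
    return runtime_left_s - shutdown_s * safety_factor


# Numbers from the post: shutdown triggers with 2 minutes of battery left,
# and a halt takes up to 30 seconds.
print(shutdown_margin(120, 30))   # 60.0

# Hypothetical database server that needs 90 seconds to stop cleanly.
print(shutdown_margin(120, 90))   # -60.0
```

A negative margin means the shutdown threshold needs to be raised so the halt starts with more battery time remaining.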

This is really a system administration issue, rather than a web/db developer issue. If you don't have a professional sysadmin on staff, you need to find someone to worry about this sort of stuff. Even good sysadmins make mistakes, though, which is why banks, insurance companies, and hospitals still have UPSes on their computers.