Forum OpenACS Q&A: File-Storage on PostgreSQL: Store files in the database or in the file-system

I'm not sure whether it is better to store file-storage's files in the database or in the file-system. As per default in .LRN2/OpenACS 5 files are stored in the database.

What are people's experiences with this? Is it bad for PostgreSQL performance with a growing number of files? Do vacuum and analyze take an unreasonably long time then? What about backup?

The major problem is the fact that they're uuencoded by the driver and this takes CPU cycles.  I did this originally because of a bunch of flaws in the PG large object implementation, including (at the time) not being able to make coherent backups of databases with native large objects.  The driver hack for LOBs solves this, backups are no problem.

You can now make coherent backups but, I believe, only if you use their binary format rather than the default format that dumps SQL statements to create your datamodel and load the backed-up data.  The problem with this is that there are still some dependency issues in the backing up of functions that require one to at times have to hand-edit the backup file to get a good restore.  This is easy in the SQL format ... I have no idea how one would do this in the binary format.

So for now we're still using the driver hack.  And as long as we do so, systems with a great deal of activity will perform better if you store files in the filesystem.

But the default's probably right because then the default database backup strategy that's documented correctly grabs content repository items (it's really a CR not file storage thing).

So, in OpenACS 5.0 we ship this line in etc/backup.sh:

pg_dump -f $dmp_file -Fp $dbname

This is probably incorrect if you store files in the database and should read:

pg_dump -f $dmp_file -Fc $dbname

No, -Fp is fine because the driver's uuencoding the file data before storing it in the database.

-Fc requires use of pg_restore ... it may be that this utility, with its re-ordering capabilities, is robust to the point where we can trust it to properly restore an OpenACS site but when it first came out we didn't really trust it.

If it is trustworthy in all cases we *could* investigate using PG native LOBs again (they've also been improved).

OK, just wanted to make sure that backups are all fine - and that the default value for the above mentioned parameter makes sense.
Don,

On a configured site (http://www.thedesignexperience.org) I was able to pg_dump and reload the database without loading postgresql.sql using PostgreSQL 7.4. It appears that the objects are dumped in the correct order with the latest version of PostgreSQL.

Cool that's good news ... maybe one of these days when I have a whole, whole bunch of spare time I may re-open analysis of the PG LOB issue.  It would be nice to use the built-in LOBs if it's safe and convenient.  Someday.  Like when everything else we care about works and there aren't any bugs :)