Forum OpenACS Q&A: AOLserver, aD, PG, ACS and OpenACS on Linux Journal, again

The tools of our choice have been mentioned several times on this month's LinuxJournal Magazine.

Reuven Lerner, who writes the At the Forge column, talking about the different web technologies that have come up over the past 4 years and where are they going, writes (most interesting paragraphs. There's more in the column):

Another open source web server, AOLserver, has long contained an
embedded Tcl interpreter. (...)

Of these, I find AOLserver's technique of persistent, pooled
connections to be the most elegant, since it works for all languages
--although that is almost always going to be Tcl-- and scales
extremely well. (...)

Another application server taht has been getting a lot of publicity is
the ArsDigita Content System, written and maintained by the ArsDigita
consulting company and released under the GPL.(...)

A volunteer effort known as OpenACS has been working to sove this
problem (dependency on Oracle) by porting the ACS software to use
PostgreSQL as a database. The software is not quite complete but does
include a great deal of functionality and will undoubtedly improve
over time.(...)

We have already begun to use ACS for some large jobs, in no small part
because of the very large number of working applications--not just
underlying tools--that come with it. Moreover, the fact that ACS is
free software and works with Linux makes it easy to work with since we
can rely on the community to provide functionality, documentation,
testing and bug fixes.(...)

There's also an article entitled "PHP4 and PostgreSQL: Building Serious Web Applications with Open Source Software" which praises the PostgreSQL improvements and robustness.

I just thumbed through the November issue, and his column talks about PostgreSQL vs. MySQL (and he doesn't understand that the REAL reason transactions are important become clear if you pull the plug on your processor in the midst of one).

What month is the issue you're referring to?

The december issue that just arrived.

Well, I don't claim to be the world's greatest authority on transactional theory. And I haven't yet seen the printed edition of the column that people mentioned about MySQL vs. PostgreSQL. However, my original manuscript did include the following paragraph:

Thanks to transactions, you can be sure that a transfer of money from
one account to another will not accidentally leave you with too much
money or with two little, even if the power fails in the middle of the
transaction.  Until the transaction is finally "committed," the
database pretends that none of it has happened.

FYI, I'm expecting to review ACS and OpenACS within the next six months; I'm going to be briefly distracted by the pluses and minuses of n-tier systems and SOAP, and then I'll get onto a year-long comparison of various application servers and frameworks.

OK, you do understand :)  My only criticism, then, is that the point should've been considerably amplified in the column.  Most folks in the MySQL world don't seem to understand this point, so the more frequently it is emphasized the better.  Overall, your column was very good.

ACS 4.0 is vastly different than ACS 3.x and previous versions, so before reviewing you might want to check in with us to see where we're at in our efforts to port the new version to PostgreSQL (and possibly InterBase, if the political machinations surrounding efforts to Open Source it come to an end).  You'll also probably want to take a look at Ars Digita's project to build a Java version of ACS 4.0, since Tcl seems to be the major complaint some folks have about the ACS.

You should also give some thought to doing at least a short review of PG 7.1 when it comes out, as it is another vast step forward along the  lines of 6.4 vs. 6.5 (when MVCC was implemented), or 6.5 to 7.0.  Outer joins, sub-selects in the "from" clause, views that can contain "union" clauses etc, a function manager that handles NULL parameters correctly, and REDO logging for even greater data integrity head the list of features included in PG 7.1.  It rocks! (I'm currently testing  a pre-beta version).

Yep, I'll look into all of those.

I'm using PostgreSQL more and more in my own work, and am anxiously awaiting 7.1 for all of the features you mentioned. Until now, the biggest reason for me not to use PostgreSQL has been the 8k tuple limit. But the more I use PostgreSQL, the most I'm totally impressed by it. When 7.1 comes out, I think that I'll basically be telling my clients to use either PostgreSQL (for nearly all jobs) or Oracle (if they need all sorts of fancy-shmancy features).

As for ACS 4.0, I wish that I had time to look into it more, particularly given the Java abstraction layer. But you can be sure that I'll be looking at it in the near future (personally and in print), and that I can't wait to see things get a bit more -- um -- organized...

As for the 8K tuple limit, it's a compile time option you can change. AFAIK, this will be completely gone in PG 7.2, but I'll let Don say more about it.

Let me give you a run-down of ACS 4.0's enhancements then:

* Complete Modularity

* Package Manager and Central Repository - You install just the ACS core and the packages you need from ArsDigita's central repository (ala Debian, but through a web interface).

* Improved Database API using bind variables - more secure and the programmer does not have to especifically grab and release db handles. Plus it's easier to port.

* Request Processor - takes care of all requests, filters. etc.

* Templated - complete separation of presentation and logic.

* Content Management package

* Easy to run several sites from one ACS 4 installation (almost impossible with 3.x)

* Several packages are already available, but not as many as there are in 3.x are available yet.

The Java version basically replaces Tcl with Java and it's currently in Alpha, with a beta release scheduled for next month and final in january.

I don't know if you are aware, but ArsDigita has a mod_aolserver for Apache that lets you run (Open)ACS in Apache, and an IIS dll that lets you run (Open)ACS on IIS.

Your columns have been great. I was kind of getting sick of only seeing MySQL and Python on LinuxJournal. It seemed to be all the magazine staff cared about.

Tuple limit will be entirely gone in PG 7.1, actually.  It works similarly to the large-object kludge I put in the PG driver to let us port things like the file-storage module without rewriting it to store files in the file system rather than the DB (though,  of course, Ben later did the sensible thing and rewrote it).

In other words, large columns get shuffled off into an auxiliary table.  This implies that retrieving them involves a JOIN operation.  The cost of that may well be offset by the fact that selecting the rows to be returned won't require dragging along the lengthy column data (in other words, the long columns are JOINed to the rows returned  by the query).  And there's built-in compression ala the secret lzText type we've been using in OpenACS.

However, as Roberto points out, the tuple size limit in PG has always been a compile-time option.  The PG folk haven't done a good job of making that clear to people, which has led to much of the world believing that the 8KB *default* limit on the length of a tuple was actually a *hard-wired* limit.  Which has never been true.  The *real*  hard-wired limit has always been 32KB.  Still fixed, still too short for LOBs, but a lot better than 8KB.

It is arguable that PG would've gotten a lot better press over the years if they'd just made the default blocksize 32KB rather than 8KB!

For instance,  Reuven, the OpenACS project recommends compiling with a 16KB blocksize.  Combined with the compressed lzText type, this has proven adequate for holding real-world bboard text, etc with truly long stuff like PDF files being stuffed into the db via the long-object hack I wrote into the AOLserver postgres driver.

With PG 7.1, it will be reasonable to run OpenACS with the default 8KB  tuple size and the long-object hack will no longer be necessary.  Which is truly exciting.

And there's built-in compression ala the secret lzText type we've been using in OpenACS.
Maybe I should have listened more closely. Where are the docs on the lzText scheme? Is lzText a Postgres basic type, or an OpenACS user-defined type? Where does the decompression take place, ie, how do you run lexical queries against it?
lzText is undocumented.  Jan Wieck implemented it, but shortly afterwards had a flash of inspiration and came up with the scheme that  is now used in PG 7.1 to implement long tuples (which includes compression).

I (and a couple of others) lobbied the PG crew to leave lzText in, since it was so obviously useful for the OpenACS project.  It was decided to leave it in, but as an undocumented feature in PG 7.0 to be  superceeded by PG 7.1's long tuples implementation (which works fine with plain-old "text", etc).

"lzText" can be used anywhere "text" can be used and is fully type-compatible with it.  Compression/decompression are invisible to the user.  In PG 7.1, pg_dump will change "lzText" to "text".