Forum OpenACS Q&A: Re: referencing a specific db pool

Posted by Tom Jackson on

This is something that is just crying to be done in pl. I moved a select heavy procedure from tcl to plsql and saw the time go from 8 hours to 8 minutes on a process we wanted to run ever two hours, and also greatly increased the functionality. It also really lowers the load on the server.

But there isn't anything really bad about a ten minute query if it is only run once a month, and you are only able to lower it to four minutes.

Posted by Mark Aufflick on
Yes - but in postgres a pl/pgsql function is implicitly a transaction, thus running into the problems mentioned above...
Posted by Jeff Davis on
If in the end you are just doing a collection of inserts you
might consider doing it all with a COPY command (you can
look at db_load_sql_data and ref-timezones for examples
of loading data with copy).
Posted by Tom Jackson on

I think what I did was setup a controller function and then another worker function which ran each operation. It had to be done this way for the similar reasons: user data might crash the process, but in that case only a single record would be affected. So in the worker function, catch the exception, and return 0 or 1 based on success/failure. On failure, run another function to do your logging.

This is always the case in pl, even if you can catch an exception, you cannot continue in the body of the function and you have to cleanup and return.

Moving stuff around inside the database is always going to be much quicker than going back and forth to the webserver for control.

Posted by Mark Aufflick on
ah - so you're saying that if i have a plpgsql function that iteratively calls another plpgsql function, and the query inside the inner function fails (and rolls back the querie(s) in that function, the OUTER function continues?
Posted by Tom Jackson on

I know it works that way in plsql, and plpgsql is very similar, you add an exception block to the bottom of the function, something like:

        return 1;

The WHEN OTHERS clause handles all exceptions not otherwise handled. If you exect certain types of errors and want to distinguish between them, you need to include them first.

If you have several dml statements that need to be executed, and you want as many of them to succeed as possible, you can follow the advice of _Oracle PL/SQL Programming, page 266-267 and have each statement in its own block:

   DELETE FROM employee WHERE ...;

   UPDATE company SET ...;

Using this method, you could do away with the second function, and use a sub-block to do the work:


This works in pl/sql, you might try a simple example in pl/pgsql.

Posted by Mark Aufflick on
Postgres doesn't support subtransactions, so any error anywhere between the start and end of a transaction will roll right back to the start.

Postgres also has no exception handling - you can run a transaction on exception to log it or similar, but you cannot prevent the rollback.

Now that I have added the second (simpler but more data) part of the end of month process, processing time has blown out to 30 mins - and the amount of data will increase as sales increase.

I am going to try parallelizing the two  which should reduce the overall execution time.

Or, I guess I just abandon my idea and manually enforce the business rules as well as allowing constraints to do their thing and catch the errors.

Such a nice idea though...

Posted by Jeff Davis on
There was a post by Markus Schaber on benchmarking various insert methods on the pgsql-performance list. The very short answer is that copy is the fastest by a large margin and individual inserts are very slow.:
Here are the results of some benchmarks we did inserting 30k rows into a
table, using "time psql -f blubb.sql -q dbname":

File            kingfisher              skate
30kinsert.sql   39.359s    762r/s       335.024s           90r/s
30kcommonce.sql 11.402s   2631r/s         7.086s         4233r/s
30kwithids.sql  10.138s   2959r/s         6.936s         4325r/s
30kprepare.sql   8.173s   3670r/s         5.189s         5781r/s
30kdump.sql      1.286s  23328r/s         0.785s        38216r/s
30kdumpseq.sql   1.498s  20026r/s         0.927s        32362r/s

Kingfisher is the single processor machine I mentioned yesterday night,
skate the SMP machine.

The table has five rows (bigint, bigint, double, double, timestamp
without time zone). The first of them has a "default nextval('sequence'
::text)" definition, and there are no further constraints or indices.

The 30kinsert.sql uses simple insert statements with autocommit on, and
we insert all but the first column which is filled by the default
sequence. With this test, kingfisher seems to have an irrealistic high
value of commits (syncs) per second (see what I wrote yesterday) [1],
skate has a more realistic value.

30kcommonce.sql, as suspected, gives a rather high boost by
encapsulating all into a single commit statement.

30kwithids gives a small boost by inserting pre-calculated sequence
numbers, so it seems not worth the effort to move this logic into the

30kprepare prepares the insert statement, and then issues 30k EXECUTE
statements within one transaction, the speedup is noticeable.

30kdump simply inserts the 30k rows as a dump via COPY FROM STDIN. (as
with 30kwithids, the first column is contained in the insert data, so
the default value sequence is not used). Naturally, this is by far the
fastest method.

30kdumpseq.sql uses COPY, too, but omits the first column and such
utilizes the sequence generation again. This gives a noticeable 15%
slowdown, but seems to be still fast enough for our purposes. Sadly, it
is not available within jdbc.

Thanks for your patience.

[1] We suspect this to be some strange interaction between ide,
cryptoloop and ext3fs, so that the sync() call somehow does not really
wait for the data to be physically written to the disk. (I really can't
imagine a crypto-looped notebook harddisk to do more syncs/second than a
SCSI-Based RAID in a server machine. We did some small benches on the
sync() / fsync() calls that seem to prove this conclusion.)
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich