Forum OpenACS Development: Response to Query Dispatcher initial specs posted

Collapse
Posted by Kapil Thangavelu on
thoughts

i'm glad to see the spec.

responding first to previous comments

roberto brings up the following
1. memory
2. extra learning curve effort

memory. simple example take the entire openacs distro,
divide by 100 (lets assume arbitrarily that queries take 1% of the
source) multiply by 20 (extra memory overhead for tcl structures,
again arbitrary). what you get is peanuts compared to modern
machines memory. hence the query dispatcher memory cost is rather
little.

actually some preliminary tests indicate that the size here is
pretty small something on the order 400K.

extra learning curve effort... hmmm... well db portability doesn't
come for free. imo, a db portablity/abstraction mechanism would be
an awesome selling point for developers and clients.

mike brings up
1. caching queries

do one thing and do it well. what about clusters... do you want to
replace util_memoize functionality in the dispatcher. i guess you
could make it a interact with util memomize. but one of the goals in
the porting guidlines is to minimize code impact. wrapping the db
interface for portablity is being done with a goal of minimizing
code impact. i guess i just don't see this as being a big win.
adding caching to the templating layer is another matter.

ryan brings up
1. not forcing a convention.
2. flexible naming scheme.
3. include files

the concept of an include file is interesting. i like it. it does
however make the work of an automation tools harder.

at the same time i think forcing a convention for this purpose is
good. i think regular structure to the system is a good thing (TM).

in the pre dispatcher days :),
if you're going to be reusing a query, wrap it in a proc, the whole
point of the query name is not for resuse but to uniquely tag
database calls. also the acs4 allows you to set multirows in a tcl
proc that will be evaluated directly in the template namespace.

even post dispatcher spec, reusing query names will affect its
implementation and the amount of information that it can reliably
gather from the run time environment to uniquely identify the query.

i think that verbosity is good in this case. ben's also trying to
future proof the spec for future refinements and developments. take
for example allowing the query dispatcher to reload queries when a
file changes.

if people haven't read it, i highly recommend reading the acs4
database access api,
http://developer.arsdigita.com/doc/db-api-detailed.html

esp. the stuff at the bottom re implementation design

pascal brings up
1. an interesting idea of registering queries as functional calls
2. multiple files is a pain (seconded by roberto)

there is some merit to this way methodology, but not as brought up i
think. i'm curious what exactly you had in mind, because i don't get
it. if you mean run time initialization of the dispatcher query
structures, i think this is much worse than startup initilization of
the queries.

ben adida writes the spec.

thoughts on the query dispatcher

when i first heard about the query dispatcher i was skeptical for a
number of reasons, most relating to the fact that other open source
db's seemed like poor candidates, and the limited viability of acs4
when aD would be moving on to acs5 in a few months. but i've changed
my mind, db portablity is a good thing, esp now the sapdb is out. (i
think that in a nutshell, this just became the most advanced open
source database by a signifigant margin). also the acs tcl version
seems to have exploded in modules the cvs @ aD shows 50+ modules.
i've also been working with acs-java, its nice, the programming
paradigm reflects its nature as a port of acs-tcl, but its still
slow to develop in compared to the acs-tcl. i'm pretty convinced
that once the openacs4 is out it will enjoy a vibrant life on its
own no matter what aD does.

cost concerns - run time efficiecny.
1. function execution - i'm more than a little concerned by the
systems architecture. its looks like you're adding at minimum a two
function call overhead to every db call. this kind of overhead is
going to add up quickly. not to mention that postgres isn't as fast
oracle (no i don't want to use -f).

2. thread contention for shared memory - granted i'm out of my
element, so someone please correct me if i'm wrong. the way nsv
shares are implemented uses a bucket lock structure, but we're still
dealing with concurrent access to a shared resource. i haven't
looked at the nsv c code to see if its actually doing a proper
multiple reader/ single writer scheme or what, in which case this
will be somewhat mitigated.

thoughts on the spec
1. mainly a point of curiosity, is there any aD commitment to
migrating these changes to the apm in the interest of supporting the
openacs and multi db packages?

2. i'm a bit lost about the identification and tagging of sql92
queries, how is sql92 to be identified, and where would it be
stored, how retrieved?

3. the xml spec itself is pretty primitive... i'm not sure about
tcl's xml capabilities ( i hope its got a frontend to expat and not
just using the tcl dom, although regexps might do alright), but
using attributes on some of these things makes a hell of lot sense
to me. one glaring example is the <rdbms> tag, instead of the
current nested tags for version, type why not <rdbms
version="8.1.6">oracle</rdbms>. i think a dtd for the finished spec
would be nice for validating xml parsers.

4. umm... having the FullQuery( or some unique part) name in the xml
would be a good thing.

5. it would probably be nice to also have the query_type (the db_api
call),
if only for debuggin purposes.

6. dropping all the dispatcher files in the directories with the
content sounds like an ugly mess to me. a central location in a
package for this type of content sounds much better, i hate dealing
with fs clutter. there is also the matter of making it easier for
the dispatcher to find things (minor) and one less registered proc
(minor) for the request processor.

7. add the file name and proc (if lib) to the xml. this gives some
flexibility in terms of where the sql file maybe located.

8. change the xml tag for querytext to query_text, a stylistic point.

i know i'm adding a bit to the xml spec, but my idea for development
and the spec is not to put the onus of database indepedence on the
developer but instead to create some developer tools to facilitate
meeting the requirements of the query dispatcher by extracing the
xml for the sql straight from a package. (including replacement of
the originating call with the appropiate calls to the dispatcher).
basically this offers developers freedom to develop without having
to edit a separate file for db calls, which seems to be the major
arg, i see here against the current spec.

its hard to evaluate the spec or this claim without working
prototypes to look at. that said i spent a couple hrs today and
worked up an extractor which fufills the spec as outlined in terms
of xml descriptors for queries. currently its just a prototype, but
i've built it with flexibility in input and output in mind, so that
it can change with the spec. it incoporates ryan's suggestion and
creates per package files. i've run it against the 50 odd packages
in the acs-packages cvs directory @aD. its still got a small number
of minor issues (which i'll resolve on tuesday when i get some free
time), but on whole it preforms well (handling vast majority of
cases, and taking a little over a second to run on my laptop). the
whole thing is written in python (because i value my time, both
present and future) and is currently downloadable from

http://www.zope.org/Members/k_vertigo/AcsSqlExtractor.py

the commandline options are neglibile at the moment basically it
works by calling it thusly,

python AcsSqlExtractor.py acs_root

it will create a directory called output with resulting files
including one called statistics which has some info it gathered
during the extraction, namely errors and usage stats on the db_api
in the files read in. its got some hardcoded values in it, that i
will change as i develop it further if people are interested in this
approach. i haven't run on the acs core, and its hard coded
currently to run on the cvs of the packages (although anyone looking
at the source can see how to change it).

so what did i learn from my experiment.

1. python is fun
2. the size of the resultant xml files (with all the xml cruft) is
neglible ~700k from over 671 tcl files and 50 odd packages.
3. there are several sql calls that use tcl vars for the query. this
needs to be dealt with by the developer (and perhaps by the
dispatcher spec).
4. there is alot of pl/sql code lying around, the db api call
db_exec_plsql is the third most popular call after db_1row, and
db_dml.

ok. its 4am and i need to head to sleep...