Forum OpenACS Development: Query Dispatcher initial specs posted

1: Query Dispatcher initial specs posted

Posted by Ben Adida on 03/08/01 10:00 PM

I've posted the Query Dispatcher initial specifications at
https://openacs.org/4/query-dispatcher

Please read it and comment by responding to this bboard posting. The
specifications are by no means fully complete. I will continue to spec
the details as I begin implementation next week. However, for a few
days, everything is on the table. Please don't hesitate to comment on
any piece of this specification! I *expect* that I've been an idiot in
at least 2 or 3 ways, and now's your chance to
call me on it.

2: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Roberto Mello on 03/09/01 12:26 AM

I may just be too tired today, but using the query dispatcher would mean that I'll need to write 3 files for every page I want to write? (XML with queries, Tcl, ADP)

If we cache all the queries in all packages, we'll use up quite a bit of memory no?

The idea is nice, but it seems a lot of trouble and an even steeper learning curve for programmers wanting to develop for OpenACS. But please correct me if I am wrong.

3: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Mike Linksvayer on 03/09/01 02:07 AM

Caching all queries will use very little memory.

I'd like to see optional caching of query results as part of this
infrastructure. I shouldn't have to explicitly use util_memoize to
cache results.

I like the content of the proposed spec, though it is a pain to read
in its current form (everyting inside a table width=750).

4: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Roberto Mello on 03/09/01 03:03 AM

"Caching all queries will use very little memory" doesn't say much. Can you elaborate on why you think so?

So you want to cache queries for pages that haven't run? Or you want a command that will run a query and cache the results for you, stuffing the results somewhere (nsv?)?

I still think the learning curve is just getting steeper, while no killer advantage gained.

5: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Ryan Campbell on 03/09/01 08:23 AM

I think you shouldn't force people to create a SQL file for each tcl/adp file. Just let them specify where a query is, like an include file. You might recommend a convention, but you shouldn't dictate one.

Some modules might choose to put all their queries in one monolithic SQL file, while others might choose to break them up based on the type of query. The ecommerce module in 3.x does this with TCL libraries and I think it makes it much easier to see all the logic associated with one area of the application at once. Perhaps this is not always the case, but you need to give people the flexibility so they can decide what's best for a particular situation.

You could still use the naming scheme to determine which exact SQL file to read from.

Also, I think the package_name.section_name.chunk_name.query_name is a bit verbose. How about just package_name.query_name? Shouldn't I be able to reuse a query in different places throughout a module? Your naming scheme seems to prohibit this.

6: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Pascal Scheffers on 03/09/01 08:55 AM

I agree with Robert that having three files for each page quickly becomes a hassle, and would make the learning curve very steep. I seriously like the idea of a query dispatcher, though.

Wouldn't it be more convenient create something along the lines of the proc_doc scheme?

proc_sql { rdbms version query_proc } {
   if { rdbms and version match this_install} {
     register_query query_proc ;# or something
   }
}

You could then either execute the query just by Tcl function name, which would require some smart source file query ordering so that the most specific query comes first upon sourcing of the library. Or you could execute by calling the query dispatcher (or a hybrid of the two). Sadly, we lose the ability to resource the query after change - forcing us to keep reloading nsd.

I know that this could lead to a large Tcl library, but some said (this week IIRC) hardware is cheap, programmers are expensive.

Excuse my poor knowledge of Tcl or (Open)ACS if this is not at all possible.

7: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Kapil Thangavelu on 03/11/01 01:42 PM

thoughts

i'm glad to see the spec.

responding first to previous comments

roberto brings up the following
1. memory
2. extra learning curve effort

memory. simple example take the entire openacs distro,
divide by 100 (lets assume arbitrarily that queries take 1% of the
source) multiply by 20 (extra memory overhead for tcl structures,
again arbitrary). what you get is peanuts compared to modern
machines memory. hence the query dispatcher memory cost is rather
little.

actually some preliminary tests indicate that the size here is
pretty small something on the order 400K.

extra learning curve effort... hmmm... well db portability doesn't
come for free. imo, a db portablity/abstraction mechanism would be
an awesome selling point for developers and clients.

mike brings up
1. caching queries

do one thing and do it well. what about clusters... do you want to
replace util_memoize functionality in the dispatcher. i guess you
could make it a interact with util memomize. but one of the goals in
the porting guidlines is to minimize code impact. wrapping the db
interface for portablity is being done with a goal of minimizing
code impact. i guess i just don't see this as being a big win.
adding caching to the templating layer is another matter.

ryan brings up
1. not forcing a convention.
2. flexible naming scheme.
3. include files

the concept of an include file is interesting. i like it. it does
however make the work of an automation tools harder.

at the same time i think forcing a convention for this purpose is
good. i think regular structure to the system is a good thing (TM).

in the pre dispatcher days :),
if you're going to be reusing a query, wrap it in a proc, the whole
point of the query name is not for resuse but to uniquely tag
database calls. also the acs4 allows you to set multirows in a tcl
proc that will be evaluated directly in the template namespace.

even post dispatcher spec, reusing query names will affect its
implementation and the amount of information that it can reliably
gather from the run time environment to uniquely identify the query.

i think that verbosity is good in this case. ben's also trying to
future proof the spec for future refinements and developments. take
for example allowing the query dispatcher to reload queries when a
file changes.

if people haven't read it, i highly recommend reading the acs4
database access api,
http://developer.arsdigita.com/doc/db-api-detailed.html

esp. the stuff at the bottom re implementation design

pascal brings up
1. an interesting idea of registering queries as functional calls
2. multiple files is a pain (seconded by roberto)

there is some merit to this way methodology, but not as brought up i
think. i'm curious what exactly you had in mind, because i don't get
it. if you mean run time initialization of the dispatcher query
structures, i think this is much worse than startup initilization of
the queries.

ben adida writes the spec.

thoughts on the query dispatcher

when i first heard about the query dispatcher i was skeptical for a
number of reasons, most relating to the fact that other open source
db's seemed like poor candidates, and the limited viability of acs4
when aD would be moving on to acs5 in a few months. but i've changed
my mind, db portablity is a good thing, esp now the sapdb is out. (i
think that in a nutshell, this just became the most advanced open
source database by a signifigant margin). also the acs tcl version
seems to have exploded in modules the cvs @ aD shows 50+ modules.
i've also been working with acs-java, its nice, the programming
paradigm reflects its nature as a port of acs-tcl, but its still
slow to develop in compared to the acs-tcl. i'm pretty convinced
that once the openacs4 is out it will enjoy a vibrant life on its
own no matter what aD does.

cost concerns - run time efficiecny.
1. function execution - i'm more than a little concerned by the
systems architecture. its looks like you're adding at minimum a two
function call overhead to every db call. this kind of overhead is
going to add up quickly. not to mention that postgres isn't as fast
oracle (no i don't want to use -f).

2. thread contention for shared memory - granted i'm out of my
element, so someone please correct me if i'm wrong. the way nsv
shares are implemented uses a bucket lock structure, but we're still
dealing with concurrent access to a shared resource. i haven't
looked at the nsv c code to see if its actually doing a proper
multiple reader/ single writer scheme or what, in which case this
will be somewhat mitigated.

thoughts on the spec
1. mainly a point of curiosity, is there any aD commitment to
migrating these changes to the apm in the interest of supporting the
openacs and multi db packages?

2. i'm a bit lost about the identification and tagging of sql92
queries, how is sql92 to be identified, and where would it be
stored, how retrieved?

3. the xml spec itself is pretty primitive... i'm not sure about
tcl's xml capabilities ( i hope its got a frontend to expat and not
just using the tcl dom, although regexps might do alright), but
using attributes on some of these things makes a hell of lot sense
to me. one glaring example is the <rdbms> tag, instead of the
current nested tags for version, type why not <rdbms
version="8.1.6">oracle</rdbms>. i think a dtd for the finished spec
would be nice for validating xml parsers.

4. umm... having the FullQuery( or some unique part) name in the xml
would be a good thing.

5. it would probably be nice to also have the query_type (the db_api
call),
if only for debuggin purposes.

6. dropping all the dispatcher files in the directories with the
content sounds like an ugly mess to me. a central location in a
package for this type of content sounds much better, i hate dealing
with fs clutter. there is also the matter of making it easier for
the dispatcher to find things (minor) and one less registered proc
(minor) for the request processor.

7. add the file name and proc (if lib) to the xml. this gives some
flexibility in terms of where the sql file maybe located.

8. change the xml tag for querytext to query_text, a stylistic point.

i know i'm adding a bit to the xml spec, but my idea for development
and the spec is not to put the onus of database indepedence on the
developer but instead to create some developer tools to facilitate
meeting the requirements of the query dispatcher by extracing the
xml for the sql straight from a package. (including replacement of
the originating call with the appropiate calls to the dispatcher).
basically this offers developers freedom to develop without having
to edit a separate file for db calls, which seems to be the major
arg, i see here against the current spec.

its hard to evaluate the spec or this claim without working
prototypes to look at. that said i spent a couple hrs today and
worked up an extractor which fufills the spec as outlined in terms
of xml descriptors for queries. currently its just a prototype, but
i've built it with flexibility in input and output in mind, so that
it can change with the spec. it incoporates ryan's suggestion and
creates per package files. i've run it against the 50 odd packages
in the acs-packages cvs directory @aD. its still got a small number
of minor issues (which i'll resolve on tuesday when i get some free
time), but on whole it preforms well (handling vast majority of
cases, and taking a little over a second to run on my laptop). the
whole thing is written in python (because i value my time, both
present and future) and is currently downloadable from

http://www.zope.org/Members/k_vertigo/AcsSqlExtractor.py

the commandline options are neglibile at the moment basically it
works by calling it thusly,

python AcsSqlExtractor.py acs_root

it will create a directory called output with resulting files
including one called statistics which has some info it gathered
during the extraction, namely errors and usage stats on the db_api
in the files read in. its got some hardcoded values in it, that i
will change as i develop it further if people are interested in this
approach. i haven't run on the acs core, and its hard coded
currently to run on the cvs of the packages (although anyone looking
at the source can see how to change it).

so what did i learn from my experiment.

1. python is fun
2. the size of the resultant xml files (with all the xml cruft) is
neglible ~700k from over 671 tcl files and 50 odd packages.
3. there are several sql calls that use tcl vars for the query. this
needs to be dealt with by the developer (and perhaps by the
dispatcher spec).
4. there is alot of pl/sql code lying around, the db api call
db_exec_plsql is the third most popular call after db_1row, and
db_dml.

ok. its 4am and i need to head to sleep...

8: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Ben Adida on 03/13/01 12:34 AM

Thanks for these useful comments!

I think I should clear up something that Don and I have assumed but which we haven't made completely clear yet. Today, PostgreSQL is the only open-source RDBMS we would want to use OpenACS with (because we need transactions and real stored procedures). Thus, I understand that some might question the reason for multi-RDBMS capability. However, it's relatively clear that ArsDigita's focus is now on Java, and not so much on Tcl. This leaves OpenACS as the "keeper of the Tcl faith." And it means we really want to keep OpenACS compatible with Oracle. And potentially compatible with DB2 and other closed-source RDBMSs. Plus, who knows, one day MySQL might be a real RDBMS, and then we wouldn't want to have an architecture that is single-RDBMS only, making us look a lot like the ArsDigita we criticized a few months ago. Thus, the real push for multi-RDBMS.

Now, to answer a few issues.

three files for each web page: I think the move from 2 to 3 is actually not a big deal, as long as things are clearly laid out in the file hierarchy. Also, if we're going to have arbitrary compatibility with various RDBMS systems, I'm really not sure how we can preserve the simple development environment without going to 3 files. Having a central SQL file or a thematic system makes it certainly no simpler to understand or program, does it? You still have to have 3 buffers open. And I can't think of an easier way of locating a query than by finding a file with the same name in a parallel directory.
steeper learning curve: yes. OpenACS 4.x will have a slightly steeper learning curve than 3.x. Building more maintainable, more structured code can only happen if you teach your developers a structure by which they can abide.
caching of query results: This is a serious philosophy difference with the way ACS/OpenACS has functioned in the past. The master data is always in the database. Cached queries sound like a great idea, but they result in horrendous cache management, dirty reads, etc... I think if someone wants that functionality (and sometimes that is warranted!), they should explicitly use util_memoize. There is danger in easily allowing caching where that is inappropriate, and I would strongly vote against that idea.
memory usage: I doubt that there is more than 1 meg of SQL queries in a running ACS installation. However, assuming this gets very large, we can always use some level of cache flushing. I doubt this will be necessary, but certainly there are plenty of cache size-limiting methods that we can implement here without ever changing the specs.
function call overheahd: Compared with the overhead of performing a DB query, function call overhead seems minimal. In high-end architectures, where the DB is going to be on a separate machine with network overhead, this seems negligible to me. At the end of the day, I'm not quite sure how to make this more efficient without sacrificing significant functionality.
thread contention for shared memory: Inherently, it seems the thread contention on DB pool data structures is much worse than the contention on a bucket of nsv vars (at least for the first query in a page). I can't guarantee that this won't become a problem on pages with lots of serial queries (where the DB pool contention is no longer an issue, but the nsv vars are). If someone wants to add some insight here, that would be great. I do think that the AOL guys optimized nsv's pretty well, though.
SQL92 queries: there is no explicit tagging of SQL92 queries. Sorry if that was unclear. It's just that we will try to make the default query (the one that is tagged with neither RDBMS type nor version) as SQL92 compliant as can be. Thus, when an RDBMS isn't SQL92 compliant for that query's constructs, we will have to implement a query specific to that RDBMS, and tag it appropriately. Thus, the more SQL92-compliant an RDBMS, the easier it will be to port OpenACS to it.
Tcl-proc version of dispatching: I thought about this a while ago, having some smart Tcl approach to dispatching queries. The one presented by Pascal with a proc_doc approach is cleaner than anything I'd thought of, and looks fairly cool. The problem is that this is Tcl-dependent. Why is that a problem? Because, even though I don't have an immediate interest in OpenACS/Java today, that may change in the future. If OpenACS/Java is going to be a straight Tcl/Java port, then the queries will remain the same. If the queries remain the same, we need a programming-language independent means of storing the queries. In general, if we're going to separate SQL from code, we don't want to make our SQL-dispatching dependent on a particular programming language.
Verbose query names: Ryan's comment is interesting. How does one reuse queries? It would be nice to have a naming scheme that allows for that. The problem is that current packages and current naming schemes assume uniqueness by Tcl page or Tcl proc, nothing more. Thus, I'm afraid there might be conflicts if we go to a package-level naming scheme. Maybe we can dedicate part of the namespace to package-shared queries. Or we can use the Tcl proc wrapping idea. I'm not 100% sure. But this is a very good point. As long as we're naming queries, reusing existing queries is something we should be able to do easily.
XML parsing and spec: Yes, the current XML spec is primitive, and I'm happy to take suggestions in improving it. While I'm very familiar with XML parsers and knowing where XML is useful, I haven't designed many DTDs to date. Someone who has should jump in and help me out here. Oh, and having the FullQuery name in the XML is obvious... sorry about that one! Good catch. As for having the filename and proc name in the XML, that will be part of the FullQuery name already (once that's fixed of course!).
Location of XML Query files: Ideally, there would be three directories (www, templates, and sql), ACS 3.x style. However, ACS 4.x chose to co-locate the .tcl and .adp files. Unless we want to change that, too, I don't think it makes sense to create a parallel structure for .sql files. What do people think?
Making a Developer's Life Easier: Kapil, I think your code is cool. Python or any other language, it's useful stuff. More than just cool, I agree with your approach. If people want to code for one DB and then want to use your extractor to make it more multi-RDBMS friendly, then that's awesome. If you polish up that code, make it more modular, and eventually make it fit the final spec we agree upon for the layout of these files, I will happily include your code in the distribution as a developer tool.

These are great comments. I will begin to edit the specs with some of these good ideas. In the meantime, please feel free to continue to contribute! This is great stuff!

9: Tcl-proc dispatching (response to 1)

Posted by Pascal Scheffers on 03/13/01 09:49 AM

First off, I agree with Ben that language dependancy should be brought to a minimum and a proc_sql method would not only be language dependant, but also very much scripting language dependant. On the other hand, because we are using a scripting language, we should be able to overload the proc_sql_* functions so it will export any format we like, even another programming language. Using Tcl in the package installers should never be a problem, so in fact, the proc_sql could even become a sql-pre-processor.

When I first mentioned proc_sql I hadn't thought of how to exactly implement it, and it is still a bit hard, because I am not fluent in Tcl. So I'll keep it somewhat more general. I do not know how portable this mechanism is for to other webservers, but I thing it would not be too hard to implement it in IIS/ASP.

What I want to end up with is the ability to do something like this:

	set selection [bboard_latest_sql $db $bboard_id $bboard_user_last_login]

	while {[ns_db getrow $db $selection]} {
	  ...
	}

The SQL library file looks something like:

proc_sql bboard_latest_sql "sql92" {dbhandle bboard_id last_login} {
	#this is the query we want to do, the specs make it mandatory to use the 'sql' variable.
 	#that way we can automagically translate to Java or something.

	set sql "select * from bboard where bboard_id=$bboard_id and created < date($last_login)"
	
	set dbr [ns_db getrow $dbhandle $sql]

	return $dbr	
}

proc_sql bboard_latest_sql "postgres/7" {dbhandle bboard_id last_login} {

	set sql "select * from bboard where bboard_id=$bboard_id and created < date($last_login)"
	
	set dbr [ns_db getrow $dbhandle $sql]

	return $dbr	
}

Sourcing the library involves stuffing the parameters of every proc_sql call into a list, pre-filterering the queries for queries that will not work anyway. So a Postgres install will not stuff any Oracle functions in its dispatcher list. At the end of library sourcing, a decision has to be wether we use dynamic query resolving or static.

In the dynamic scenario, the 'proc_sql_activate_libarary' function will bind each distinct sql proc to the generic dispatcher function, the dispatcher can then, upon invocation (with the Tcl-info command?!?) discover its parameter definition and dispatch the most appropriate version it found in the proc_sql_list.

In the static scenario, after sourcing the libary, the most appropriate version of each proc_sql is found and then all of those are added to the Tcl environment with proc $proc_sql_name $proc_sql_args. This will make library loading take longer, but dispatching will not require any additional list searching.

As a third option, a hybrid could be made where most sql is static, and some comes from resourcing the sql.tcl files, and some libraries are dynamic. Although I can't really imagin where that would be a good idea, but still. I do not suppose there will be many installations where two different flavours of database are used for different sections of the ACS.

As for translating this source file to Java, I do not know how to create a catch-all converter, but it can probably be done. Each invocation of [ns_db XXX $db $sql] can probably be translated 1-to-1 to a Java construct that returns an object for the result.

I am not sure wether this method would be any better than the proposed XML standard. The best way to go about this, for clarity and standards enforcing, would put all the proc_sql calls in a separate file, like the sql.xml. This methodology could probably be implemented on top of the XML files, running as the innards of the Query dispatcher. That would work best if the XML definition gets an additional attribute called 'query-type' which could be one of 'getrow' '0or1row' '1row', etc. Come to think of it, a query type attribute might be useful for other purposes as well.

10: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Don Baccus on 03/13/01 05:23 PM

A couple of quick ideas ...

Query reuse
How about a simple scoping system? Define a scope for each query - file, package or global. File scope would be what Ben's described thus far. When given a query name, have the dispatcher look for a match first at file scope, then package then global, returning the first match.
This would allow for a high level of reuse while not blindly introducing assumptions about names being unique across packages or the sytsem. The potential for query reuse is a major potential benefit of the query dispatcher approach - let's try to think of ways to make reuse practical.
proc_doc approach vs. XML
The nice thing about doing this in XML rather is that there would be no further need to translate such procs to Java, by either manual or automated means. ACS/Java works with any environment that supports servelets and all you'd need would be to do the XML parse in that environment and the Java could would be up and running. I think the XML approach is the simplest when we consider the problem of perhaps supporting a Java version of OpenACS 4 as well as a Tcl version.
As many of you know I have no personal interest in this, at the moment, anyway. But others have expressed an interest and we've talked about being a "big tent" project, and in that spirit looking forward and helping others by doing what we can now to simplify any future port of Java ACS4 to the OpenACS world is a good thing.
Three files?
Some of the .tcl scripts have become exceedingly simple, with so much of the application code now buried in PL/SQL (and PL/pgSQL for OpenACS) and presentation split into template files.
Might it be possible to extend the XML definition of queries slightly and have them define datasources as well as queries, at least in some cases? Some of those .tcl files might disappear altogether if we did so...

11: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Jonathan Ellis on 03/16/01 07:57 PM

Just wanted to point out that IMO Ben's objection to Pascal's idea on the grounds of tcl-specificness isn't a huge deal, given how easy tcl is to imbed. There already exists a way to embed tcl in java and vice versa. [scriptics.com seems to be down... ?] And pgsql's own pltcl.c is a whopping 2200 loc.

12: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Pascal Scheffers on 03/16/01 09:12 PM

Ah, but it is. I think what Ben tried to say is that the SQL queries are something very 'unlanguagy'. If at all possible, we should keep it as language-neutral as possible, not having to do any conversion is very useful. That doesn't mean we shouldn't try to implement functions on top of the Query Dispacher so it can perform the [ns_db getrow $db $sql] and maybe do some nice error handling, fail-over, etc. Although we would have to be careful not to try to re-invent the wheel.

It would be nice if the Query Dispatcher was something really stand-alone, which would encourage porting it to other languages. I think there is not much reason, if any, to tie it to (Open)ACS. If it's possible I will use it for non-web and hybrid projects. I know too much abstraction is bad for performance - but there are plenty projects that try to run on different flavors RDBMS anyway.

I would also very much like it if we store some meta-data about the query.

A return type for example would be very useful. Will it return 1-row, a row-set, one column (stuff it in a Tcl-list from the QD), a single value or an ORDBMS object?

Next up is the parameter values, typing them would be a handy (So we can, for example, handle the QQ-ing of strings in the dispatcher).

Room for comments in would be good. I find SQL hard to read, describing what it should return for a complex query should aide in porting the module.

Maybe an approximation of computational cost of the query (for billing or setting quotas on those resources).

And finaly some asserts (done with regular expressions or something) to check input and output data, for debugging purposes mostly.

Ben, I am willing to create the DTD for the XML document, if you don't want to spend time on creating one yourself.

13: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Kapil Thangavelu on 03/21/01 02:37 AM

making a developer's life easier - i've uploaded the next iteration
of the acs sql extractor to
zope.org/Members/k_vertigo/acs-sql-extractor.tgz

major changes include a lot of refactoring (i took a few minutes and
designed this time:). all parsing code in the parser (really just
regexs), configuration via commandline, config file, or environment
variables, somewhat configurable outputing, use of container objects
to model package->directory->file->content.

this release has less true functionality than the original when
operating on sql files, but it is much better designed and thought
out in terms of flexibility. the biggest todo thing now is rewriting
the parser (the writer needs some work, but that will be easier to
figure out once this spec finalizes (at this point it looks like i
need to use dom to insert in a db's specific version of query to
query xml file). the current one uses regex, which are not my forte,
but i believe that i've reached the limit of what i can do
realistically with regexs. if anyone here is a regex guru/expert and
wants to help please email me. at this point i believe there is too
much possible contextual information thats possible in the body of a
query, and a great deal more of the acs4 uses dynamic queries than i
had previously thought. there just seems to be too much irregularity
in the possible sql calls to use regexes for everything. a brief
contrived example in case someone can come up with a solution (btw
python has perl style regexs), to my current problem of retrieving
the query portion of a db api call.

db_string foo "
select user_name from [AdParameter "user_table"]
where user_id = ${user_id}
"
(or the alternate syntax with {select...})

at this point i'm going to try and hand roll a context based
descent parser (i no jack about parser terminology, but i believe
this is correct, and it looks pretty easily doable). the basic idea
behind the new parser is to use a simple regex to identify db calls
and use a parser on a window around the context of that call to
parse out the query properly for statistics analysis, and warnings
about dynamic sql.

the main reason i'm putting this out right now in this state, is to
invite comments and to reassure anyone who's interested that this is
being worked on. feedback welcome.

xml query files - all db versions in one files, i'm obviously coming
from a generative POV, but i'm also thinking about it from a
developers perspective of going through all the various db versions
of a call to add your own. i was originally thinking that with a
generative approach, the queries could be separted from the web
directory and be further subdivided into directories based on db
type (including ansi). after the server tastes the db pool it could
process the appriopiate directory (plus the ansi for anything not
found). after realizing how much dynamic sql there is i realize that
this is not feasible and that a developer will need to do some hand
editing (maybe... more on dynamic sql later).

this would also have the added benefit of making the parsing easier
and less memory intensive (read faster), the current spec requires
reading all of the lower level nodes (of a file with all db types)
to figure which db a query is for, storing this at a higher level in
the xml format would make vast reductions in the numbers of nodes
that need to visited. i'm granting that any xml parsing is a pretty
trivial concern, but i don't want my server taking 5 minutes to
startup (dom is slow, sax would be ideal).

speaking of which what tcl libraries were people thinking about
using for xml processing, i haven't looked at tcl's xml capabilities
for a while, and the last time i did they were all dom based written
in tcl. does tcl have a front end to expat?, a fast dom (level2?),
etc (links, please).

the bottom line concern here is def. whats easiest for developers to
work with?. is having a buffer to another directory open a huge
hassle for developers?

the acs-4 already aggregates content and logic for that content in
the www. but the whole query dispatching system seems to be moving
towards something more along the lines of system resources (esp if
query reuse comes into play), and i don't think its a good idea to
stick system resources in www. (aesthestics).

query reuse - is a good thing, and should help matienance, but i'm
clueless on the best way to go about it. btw. afaik there is no hard
enforcement of unique names built into acs, its all convention
based. in fact i see some duplication of names in the same file.
grep -rn db_* * > ../out.sql

tcl proc - embedding tcl is a not an elegant solution nor is
rengineering tcl code in another language. i still think this is a
very interesting idea, but its not language indepedent. and since
i'm considering doing a zope or perhaps a java port i would really
prefer an xml based format.

three files - i don't see how its possible, to remove the tcl files
all toghether. there are a lot of dynamically generated calls, much
more than i had originally thought, this (dynamics queries) is
something that needs to be discussed in the context of the query
dispatcher. also figure in things like redirects, ad_page_contract,
etc... eliminating the tcl would basically mean recreating it in
another form, why bother?

dtd - if no one else gets around to it, i'll write one up when the
spec finalizes (i'm going to be using it for optional validation of
generated xml files).

metadata - pascal brings up alot cool ideas about increasing the
metadata associated with a query. i'm all for this in a generative
world, in a developer working by hand world this looks like it might
be a little too tedious. while i think having the information
available is a good thing. i think that keeping the dispatcher as
light as possible is more important than increasing its features (at
least until some real world tests are possible). IMO (i have no hard
evidence, but i can't possibly see how it could be otherwise) the
acs4 is slower than acs3x because of all the extra joins and stuff,
(permissions, and privilege heirarchy stuff alone result isome slow
very common calls), i think streamlining the db abstraction layer to
keep its performance impact to a minimum is good thing. but adding
in metadata seems like a perfectly reasonable thing to do that
should be useful. i think that return types can be safely
encompassed by including the name of the db_api call in the metadata
(db_string, db_1row, db_multirow, etc). typing in tcl:), everything
is a string. comments are a cool. pre and post conditions on queries
sounds way too expensive, IMO, this is application specific
information and responsibility not the systems (unless you're using
a language like effiel, which has native support for this type of
thing).

dynamic sql - this seems to be the white horse that hasn't been
addressed yet. i was originally very concerned by this, but now i'm
not so sure if it really makes much impact in terms of
functionality. the original query is just being stored as string,
and when its retrieved by the dispatcher the string will be
evaluated in context of the pages where these variables presumably
exist. so in the sense of functionality its not a huge issue. by
dynamics sql i'm including all variants of non-hardcoded queries
including fully dynamic queries, those with embedded vars, and
function calls. it does become issue wrt to being able to quantify a
queries db abstraction, since the code that generates the query
could possible be in the tcl page. i really think that this issues
needs some discussion. 95%+ of the dynamic sql calls i currently see
in the acs4 are not of this variety and just use a simple function
call or variable to add a value to a comparison clause or what not.
but the remaining cases could be problematic.

sql92- ben could you elaborate a little more on what envisioned for
adding sql92 support. its sounds a little open ended at the moment.
from what you wrote it looks like your identifying a queries
constructs and comparing them to an rdbms capabilities wrt sql92
compliance... why not just take a simpler approach and use the sql92
query unless an rdbms version of the query is found.

14: Response to Query Dispatcher initial specs posted (response to 1)

Posted by Kapil Thangavelu on 03/24/01 01:22 AM

another version of acs-sql-extractor is out.

check it at
http://www.zope.org/Members/k_vertigo/acs-sql-extractor.tgz

it basically works according to spec with the one major caveat of
not always being correct in assoc. of query names with the right
proc in library files (my regex is a little too greedy). the output
is according to the current spec, with some minor additions of a
container tag for queries and a additional query tag to note the
type of db api method.

i also added in some basic documentation, check the changes file for
more info on whats changed.