Forum OpenACS Q&A: Testing a Load Balanced OpenACS4.5 Cluster

Hi all,
I'm working on an OpenACS 4.5 cluster installation. Here are some questions I'd like to discuss in the forum.

As you can read in Jon Salz's document server-cluster.html, the issue with OpenACS clustering is:
<<
Many heavily-hit sites sit behind load balancers, which means that requests to a particular site can be handled by one of several machines conspiring to appear as a single server. For instance, requests to www.foobar.com might be routed to either www1.foobar.com, www2.foobar.com, or www3.foobar.com, three physically separate servers which share an Oracle tablespace (and hence all the data in ACS).

Many database queries are memoized in individual servers' local memory (using the util_memoize procedures) to minimize fetches from the database. When a server updates an item in the database, the old item needs to be removed from the server's local cache (using util_memoize_flush) to force a database query the next time this item is accessed. But what happens when:

1. www1.foobar.com does util_memoize "get_greeble_info 43" (incurring an actual database lookup, SELECT * FROM greeble WHERE greeble_id = 43, and caching the result)
2. www2.foobar.com does util_memoize "get_greeble_info 43" (incurring a database lookup and caching the result)
3. www1.foobar.com UPDATEs the info for greeble #43 and does util_memoize_flush "get_greeble_info 43"
4. www2.foobar.com does util_memoize "get_greeble_info 43" (returning a cached value). The old info for greeble #43 hasn't been flushed from its local cache, so the result is outdated!

In general, if any of several servers can update an item, the old version of the item can remain in other servers' local caches.
>>

The solution to this problem is to issue a distributed cache flush on all cluster hosts whenever previously cached data is modified.

Two procedures of the ACS Tcl API are primarily involved in clustering and caching: util_memoize and util_memoize_flush. util_memoize caches the result of a script, for example a database query, so that AOLserver can answer the next request for the same data much faster.
util_memoize_flush flushes old cached values. This procedure is also responsible for triggering a distributed flush on all machines in the cluster.
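
To make the problem concrete, the stale read and the effect of a cluster-wide flush can be reproduced in plain Tcl, outside AOLserver. This is only a rough sketch: memoize, flush_local, flush_cluster, the db and cache arrays and the server list are hypothetical stand-ins for illustration, not the ACS procedures or their implementation.

```tcl
# Hypothetical stand-ins: one shared "database" and one cache per server.
set db(43) "old greeble"
array set cache {}
set servers {www1 www2 www3}

# Stand-in for the query that util_memoize would wrap.
proc get_greeble_info {id} {
    global db
    return $db($id)
}

# Evaluate $script for $server unless that server already cached the result.
proc memoize {server script} {
    global cache
    set key "$server,$script"
    if {![info exists cache($key)]} {
        set cache($key) [uplevel #0 $script]
    }
    return $cache($key)
}

# Drop the cached value on one server only.
proc flush_local {server script} {
    global cache
    unset -nocomplain cache($server,$script)
}

# Drop the cached value on every server in the (simulated) cluster.
proc flush_cluster {script} {
    global servers
    foreach server $servers {
        flush_local $server $script
    }
}

set script "get_greeble_info 43"
memoize www1 $script
memoize www2 $script

# www1 updates the row and flushes only its own cache.
set db(43) "new greeble"
flush_local www1 $script
set stale      [memoize www2 $script]   ;# www2 still serves its old copy
set fresh_www1 [memoize www1 $script]

# A cluster-wide flush makes www2 go back to the database.
flush_cluster $script
set fresh_www2 [memoize www2 $script]

puts "www2 after local flush:   $stale"
puts "www1 after local flush:   $fresh_www1"
puts "www2 after cluster flush: $fresh_www2"
```

In this sketch www2 keeps returning "old greeble" until flush_cluster runs, which is the behaviour a working util_memoize_flush is supposed to prevent.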

However, it seems that this procedure doesn't work in a cluster configuration. The body of this proc should probably be built at run time, but it actually seems to be evaluated only once, at startup, so the resulting body never contains the cluster flush (to be more precise, it contains only ns_cache flush util_memoize $script).

I haven't analyzed this code in depth yet, so for now I propose a rough solution. Modify the util_memoize_flush code as follows:

ad_proc util_memoize_flush {script} {
    Forget any cached value for <i>script</i>. If clustering is
    enabled, flush the caches on all servers in the cluster.

    @param script The Tcl script whose cached value should be flushed.
} {
    # modified by Pask on Friday 13 December 2002
    # Ask every peer in the cluster to flush its copy, then flush the local cache.
    server_cluster_httpget_from_peers "/SYSTEM/flush-memoized-statement.tcl?statement=[ns_urlencode $script]"
    ns_cache flush util_memoize $script
    ns_log "warning" "Pask: in util_memoize_flush, executed server_cluster_httpget_from_peers"
}

#$flush_body

OK, now util_memoize_flush always calls server_cluster_httpget_from_peers. However, if you test the cluster, you'll notice that it still doesn't work. Why?

Well, server_cluster_httpget_from_peers schedules a server_cluster_do_httpget via the ad_schedule_proc procedure.

<<ad_schedule_proc -once t -thread f -debug t 0 server_cluster_do_httpget "http://$host$url" $timeout>>

Looking at the code of ad_schedule_proc, you see:

<<
if { [server_cluster_enabled_p] && ![ad_canonical_server_p] && $all_servers == "f" } {
    return
}
>>

Now, [server_cluster_enabled_p] evaluates to 1 on a clustered installation, ![ad_canonical_server_p] can be 1 or 0, and $all_servers defaults to "f". So if the host running ad_schedule_proc is not the canonical server, the procedure returns early and the httpget is never executed! To solve this, simply add the -all_servers t flag:

<<
ad_schedule_proc -once t -thread f -debug t -all_servers t 0 server_cluster_do_httpget "http://$host$url" $timeout
>>
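
To see why the flag matters, the guard quoted from ad_schedule_proc can be tabulated in plain Tcl. This is just a sketch: schedule_skipped_p is a hypothetical helper, and its three arguments stand in for [server_cluster_enabled_p], [ad_canonical_server_p] and $all_servers.

```tcl
# Hypothetical helper mirroring the guard condition quoted above.
# Returns 1 when the guard would make the scheduling call return early.
proc schedule_skipped_p {cluster_enabled_p canonical_p all_servers} {
    return [expr {$cluster_enabled_p && !$canonical_p && $all_servers == "f"}]
}

# Clustering enabled in every row; vary the host role and the flag.
foreach canonical_p {1 0} {
    foreach all_servers {f t} {
        set skipped [schedule_skipped_p 1 $canonical_p $all_servers]
        puts "canonical=$canonical_p all_servers=$all_servers skipped=$skipped"
    }
}
```

The only combination that is skipped is a non-canonical host with all_servers left at "f", which is exactly the case hit by a flush issued from any server other than the canonical one.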

Now everything should work.

Any comments or suggestions? How should util_memoize_flush be modified properly?

P.