Forum OpenACS Q&A: Tracking actual traffic usage with ad_returnfile_background?

I've been playing around the ad_returnfile_background for delivering large files and it works greats -- and scales well. However, I'll need to be able to track actually bandwidth usage for every file delivery, which seems to be easily doable with a trace filter when using ns_returnfile:

ad_proc -public filter_conncontext { conn arg why } {
ns_log notice "Actual bytes sent: [ad_conn contentsentlength]"
return filter_ok
}
ad_register_filter trace * * filter_conncontext

However, with the bgdelivery procs, the trace filter is invoked when the thread is sent to background, and contentsentlength is explicitly set to the size of the file being delivered. So even if the user cuts off the download prematurely, the access log and the value in ns_conn will show the entire file as having been sent.

Any good ideas or hints?

When bg_delivery is used, the connection thread is in charge for the request only until the header is returned. The actual spooling is performed by a separate thread. Releasing the connection thread asap is the primary reason for improved scalability. This has however consequences for trace filters, which are as well run at the time the connection thread is released (at a time the file is not yet delivered). This affects as well ns_log, which is implemented in aolserver as a trace filter.

When using bgdelivery, one can get the final amount of data by extending the proc "end-delivery" (4th argument). If there is interest, i can turn every delivery into an object, such one can register a mixin class to overload the behavior without changing the original procs.

-gustaf neumann

Thanks Gustaf,

I should've seen the commented ns_log line in the end-delivery proc and figured this out myself 😉

I'm not familiar with xotcl, so I'm wondering is there's an easy way for my "own" package to overwrite the end-delivery proc? So I could basically leave the xotcl-core package unchanges while getting a callback for my own logging needs after the connection is terminated? (That's basically the extent of my needs -- objectifying the delivery is probably overkill if I can just hack the way it's done at the moment.)

Thanks,
Steffen

Steffen,

turning the fileSpooler into an object is quite trivial and should have done at the first place already (i have committed this to CVS HEAD). One question is whether the fileSpooler and/or every spooling act should be an object. For the sake of making a minimal change, i objectified just the fileSpooler. Otherwise it would be possible to customize every single spooling act with different behavior.

By turning the file spooler into an XOTcl object, it is possible to overload/extend every method in multiple ways, such as e.g. via mixins (see e.g. http://nm.wu-wien.ac.at/research/publications/xotcl-tclconf.pdf).

When you upgrade from CVS, and add e.g. a file xotcl-core/tcl/custom-bgdelivery-procs.tcl with the content below, you can add your logging there. For the sake of performance, i would suggest to avoid blocking operations in the bgdelivery thread. Make sure, that the this code is sourced after bgdelivery-procs.tcl.

One can as well add the custom behavior dynamically at runtime of the server. This could be done e.g. via ds/shell by issuing there

 
bgdelivery do eval {
   Class customSpooler
   customSpooler instproc end-delivery args {
      ns_log notice "CUSTOM end-delivery $args"
      next
   }
   ::fileSpooler mixin add customSpooler
}
Adding/changeing behavior via ds/shell is often convenient for debugging or developing.

Some background information about the code below. To understand it, one has to know the following about about initialization of the Tcl threads in aolserver: the Tcl threads are created on demand, that is, at the time, when e.g. the first bgdelivery happens. At that time, a thread is created and initialized with an initcmd. The code below simply adds commands to the initcmd of the bgdelivery thread. Therefore, the customSpooler will only exist in the background thread.

Hope this helps,

-gustaf neumann

#
# Save this content e.g. as 
# xotcl-core/tcl/custom-bgdelivery-procs.tcl
#
bgdelivery append initcmd {

  # A small class to overload the end-delivery method of the fileSpooler
  
  Class customSpooler
  customSpooler instproc end-delivery {filename fd channel bytes args} {
    #
    # A file delivery finished, do some custom behavior; here the files
    # are still open...
    #
    ns_log notice "CUSTOM end-delivery, bytes written = $bytes"
    
    # call the standard behavior of the fileSpooler (e.g. closing files)
    next
    
    # The standard behavior has finished, we could do here something as well...
  }

  # register the customSpooler to the fileSpooler
  ::fileSpooler mixin add customSpooler

  ns_log notice "+++ SPOOL customSpooler registered"
  
}
Thanks Gustaf,

I've upgraded to the CVS HEAD version of xotcl-core and retrieving the information works exactly as described (I even used to occation to install the request monitor which makes for a bit of fun as well.)

Now for my, hopefully, final question: Since the customSpooler runs only in the background thread, how do I transfer information about the request in the best manner. For example, I might was to log the object_id of the file returned. Or some information about the connection, such as the user's IP or the request referer. Obviously, ad_conn headers is not available by the time I get the finally tally of bytes downloaded and similarly I'm not able to do a global downloaded_object_id and then access that from the background thread.

Thank you for your input,
Steffen

The probably cleanest way is to add a client_data to ad_returnfile_background and its supporting functions (i have committed this just now to cvs head). One can pass arbitrary data such as item_id, revision_id etc. to the send delivery operation, like in the following example:

#
# Call ad_returnfile background (e.g. in acs-content-repository/tcl/revision-procs.tcl)
# with some provided client_data.
#
set client_data [list item_id $item_id revision_id $revision_id \
       headers [ns_set array [ns_conn headers]]]
ad_returnfile_background -client_data $client_data 200 $mime_type $filename
Also the custom spooler receives the client_data option, which is per default empty. If data is provided at invocation time it can be evaluated in the custom spooler.
  Class customSpooler
  customSpooler instproc end-delivery {{-client_data ""} filename fd channel bytes args} {
    ns_log notice "CUSTOM end-delivery, bytes written = $bytes, client_data=$client_data"
    next
  }
With the sample settings provided, you should be able to get the object-ids and the request headers, etc. in the background thread in an extensible manner

All the best
-gustaf neumann

Thanks again, Gustaf -- it works wonderfully.

As an after thought: In xotcl-request-monitor/tcl/throttle_mod-procs.tcl line 857, there's a bug on first run. Apparently, the f variable isn't set, which means that unset causes an error. Wrapping the statement in a catch {...} fixes the problem.

Let me know if you want the bug report in a more formalized fashion.

Cheers,
Steffen

Good!

no need for a formal bug report, i fixed this slightly different in CVS head.

all the best
-gustaf neumann

Dear Steffen,
I've been playing around the ad_returnfile_background for delivering large files and it works greats -- and scales well
in case you have figures from your testing setup, i would be interested in these...

-gustaf neumann

I don't have any solid data for sharing yet; we'll be moving 23 to new servers using bgdelivery in the beginning on January. By then, I'll try to find something more coherent to share...