Forum OpenACS Development: Re: What issues remain for OpenACS on AOLserver4

Posted by Andrew Piskorski on
Lars, my understanding is that fixing the Tcl thread-safety bug in question is quite non-trivial.

In his original 2003-03-26 post to the AOLserver list reporting the problem, Zoran said:

The problem is in Tcl generic/tclIOUtil.c and naive handling of static Tcl_Obj *cwdPathPtr. The pointer to this Tcl object gets shuffled around threads by simple reference, it is read (referenced) without proper locks, etc. The implementor obviously protected the most obvious write operations, but neglected any others. Also, the Rule#1 in Tcl "Do not pass Tcl_Obj's between threads" is grossly violated.

And here's a different, more round-about take on why it's probably a difficult problem:

Back in March when this came up on the AOLserver list, I didn't understand that the current working directory of a process is maintained by the kernel, not by the process itself. So I was speculating about maybe being able to fix things by simply giving every thread its own independent CWD in thread local storage (aka thread specific data). Here's what Rob Mayoff had to say about that:

Perhaps you do not realize that a process's current working directory is tracked by the kernel, not by the process. Tcl keeps track of its CWD for speed, but ultimately it's the kernel, not the process, that resolves relative pathnames, so it's the kernel's idea of the CWD that matters.

I believe that POSIX requires that all threads in a process share a working directory. Making each thread appear to have its own working directory requires either non-standard kernel support for per-thread CWD (which Linux has, but I don't think you can get to it through the pthreads interface), or intercepting every system call that involves a pathname (open, link, symlink, unlink, rename, access, stat, lstat, chdir, chroot, chmod, chown, lchown, mknod, mkdir, rmdir, bind, connect, and probably some more that I've forgotten). You might be able to ignore some of these for AOLserver, but intercepting any of them isn't necessarily easy, and it's definitely not possible to do so portably.

It still might be the best way to fix this problem, though.
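
To make Rob's point concrete, here is a minimal Python sketch (an illustration only, not anything taken from AOLserver or Tcl) showing that a chdir done in one thread changes the working directory seen by every other thread in the same process. The function name is made up for the example.

```python
import os
import tempfile
import threading


def cwd_seen_by_main_after_thread_chdir(target_dir):
    """Call chdir in a worker thread, then report the CWD the calling thread sees."""
    original = os.getcwd()
    try:
        worker = threading.Thread(target=os.chdir, args=(target_dir,))
        worker.start()
        worker.join()
        # The calling thread never called chdir itself, yet its CWD has moved,
        # because the kernel keeps one CWD for the whole process.
        return os.getcwd()
    finally:
        # Restore the process-wide CWD so the caller is not left somewhere odd.
        os.chdir(original)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seen = cwd_seen_by_main_after_thread_chdir(tmp)
        print("main thread saw the worker's directory:", os.path.samefile(seen, tmp))
```

This is why a per-thread variable holding "the CWD" does not help on its own: relative pathnames are still resolved against the single directory the kernel has on record.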

Note Rob's last line - scary! Zoran independently said much the same thing:

Eh, the cwd is the thing which is used by most path-related sys/lib calls to resolve the absolute path of the file. It is tracked in the kernel, not in the process, so in order to make this happen, you ought to intercept *all* of the sys/lib calls fiddling with paths. Now, Tcl with its virtual filesystem *might* achieve this, since it really isolates the upper layers from the OS-specifics. But, if you ask me, I think this is voodoo.

To be honest, I was also playing with this idea, but after giving it a serious thought, I've abandoned it.
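
For a rough feel of what "intercepting every call that fiddles with paths" would involve, here is a small Python sketch of a per-thread virtual CWD. Everything in it (`set_virtual_cwd`, `resolve`, `open_in_virtual_cwd`) is hypothetical and invented for illustration; it is not how Tcl's virtual filesystem or AOLserver works, and it only wraps `open`.

```python
import os
import threading

# Hypothetical per-thread state; each thread sees its own attributes.
_virtual = threading.local()


def set_virtual_cwd(path):
    """Record a working directory for the calling thread only."""
    _virtual.cwd = os.path.abspath(path)


def get_virtual_cwd():
    """Return this thread's virtual CWD, falling back to the process CWD."""
    return getattr(_virtual, "cwd", None) or os.getcwd()


def resolve(path):
    """Make a possibly relative path absolute using the thread's virtual CWD."""
    # os.path.join discards the base when `path` is already absolute.
    return os.path.normpath(os.path.join(get_virtual_cwd(), path))


def open_in_virtual_cwd(path, mode="r"):
    """Wrapped open(): the kernel only ever sees an absolute path."""
    return open(resolve(path), mode)
```

The catch is exactly the one Rob and Zoran describe: the same wrapping would be needed for stat, rename, unlink, mkdir and every other path-taking call, and any code that bypasses the wrappers still resolves paths against the real, process-wide CWD.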

Anyway, Zoran was working on fixing the bug, and last we heard he had some sort of fix (maybe only partial, I'm not sure) as of March 27, but it wasn't in the Tcl core yet. I haven't heard anything since then.

Oh yeah, and totally off-topic: This business of CWD always being tracked by the kernel, etc., is making me think that the exokernel guys really do have the right idea, and that safely multitasking the hardware and providing nice system call abstractions should be independent features of the OS environment, not both mushed together into the one system-wide kernel.