Lars, my understanding is that fixing the Tcl thread-safety bug in
question is quite non-trivial.
In his original 2003-03-26 post to the AOLserver list reporting the
problem, Zoran said:
The problem is in Tcl generic/tclIOUtil.c and naive handling
of static Tcl_Obj *cwdPathPtr. The pointer to this Tcl object
gets shuffled around threads by simple reference, it is
read (referenced) without proper locks, etc.
The implementor obviously protected the most obvious write
operations, but neglected any others. Also, the Rule#1 in
Tcl "Do not pass Tcl_Obj's between threads" is grossly violated.
And here's a different, more round-about take on why it's probably a
difficult problem:
Back in March when this came up on the AOLserver list, I didn't
understand that the current working directory of a process is
maintained by the kernel, not by the process itself.
So I was speculating about maybe being able to fix things by simply
giving every thread its own independent thread local storage (aka,
thread specific data) CWD. Here's what Rob Mayoff had to say about
that:
Perhaps you do not realize that a process's current working directory
is tracked by the kernel, not by the process. Tcl keeps track of its
CWD for speed, but ultimately it's the kernel, not the process, that
resolves relative pathnames, so it's the kernel's idea of the CWD that
matters.
I believe that POSIX requires that all threads in a process share a
working directory. Making each thread appear to have its own working
directory requires either non-standard kernel support for per-thread
CWD (which Linux has, but I don't think you can get to it through the
pthreads interface), or intercepting every system call that involves a
pathname (open, link, symlink, unlink, rename, access, stat, lstat,
chdir, chroot, chmod, chown, lchown, mknod, mkdir, rmdir, bind,
connect, and probably some more that I've forgotten). You might be
able to ignore some of these for AOLserver, but intercepting any of
them isn't necessarily easy, and it's definitely not possible to do so
portably.
It still might be the best way to fix this problem, though.
Note Rob's last line - scary! Zoran independently said much the same
thing:
Eh, the cwd is the thing which is used by most path-related sys/lib
calls to resolve the absolute path of the file. It is tracked in the
kernel, not in the process, so in order to make this happen, you ought
to intercept *all* of the sys/lib calls fiddling with paths. Now, Tcl
with its virtual filesystem *might* achieve this, since it really
isolates the upper layers from the OS-specifics. But, if you ask me,
I think this is voodoo.
To be honest, I was also playing with this idea, but after giving it a
serious thought, I've abandoned it.
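To make the "one CWD per process" point concrete, here's a minimal sketch in Python rather than Tcl or C (my choice, purely for brevity; nothing here comes from the AOLserver or Tcl sources). A worker thread calls chdir, and the main thread, which never changed directory itself, sees the new location. The function name chdir_in_thread is just illustrative.

```python
import os
import tempfile
import threading

def chdir_in_thread(target):
    # Runs in a worker thread, but chdir changes the CWD of the whole process.
    os.chdir(target)

original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Resolve symlinks so the comparison with getcwd() is apples to apples.
    tmp = os.path.realpath(tmp)
    worker = threading.Thread(target=chdir_in_thread, args=(tmp,))
    worker.start()
    worker.join()
    # The main thread never called chdir, yet its CWD has moved.
    seen_by_main = os.getcwd()
    # Restore the old CWD before the temporary directory is deleted.
    os.chdir(original)

print("main thread saw the worker's directory:", seen_by_main == tmp)
```

So any cached copy of the CWD (like Tcl's cwdPathPtr) can be invalidated by any thread at any time, which is part of why the locking matters so much.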
Anyway, Zoran was working on fixing the bug, and last we heard he had
some sort of fix (maybe only partial, I'm not sure) as of March 27,
but it wasn't in the Tcl core yet. I haven't heard anything since
then.
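For what it's worth, the "intercept every call that takes a pathname" idea that Rob and Zoran describe might look roughly like the sketch below. It's Python, not Tcl, and every name in it (thread_chdir, thread_getcwd, resolve, thread_open) is hypothetical. It assumes each thread keeps a private CWD in thread-local storage and that relative paths get rewritten to absolute ones before they ever reach the kernel. Only open is wrapped here; the catch is that stat, rename, unlink and all the rest would need the same treatment.

```python
import os
import threading

# Per-thread storage for the emulated working directory.
_local = threading.local()

def thread_getcwd():
    # Until a thread sets its own CWD, fall back to the real process CWD.
    return getattr(_local, "cwd", None) or os.getcwd()

def thread_chdir(path):
    # Only updates this thread's private notion of the CWD;
    # the kernel's process-wide CWD is never touched.
    _local.cwd = os.path.abspath(os.path.join(thread_getcwd(), path))

def resolve(path):
    # Absolute paths pass through; relative ones are anchored
    # at the calling thread's private CWD.
    return os.path.normpath(os.path.join(thread_getcwd(), path))

def thread_open(path, mode="r"):
    # One "intercepted" call. Every other path-taking call
    # would need an equivalent wrapper.
    return open(resolve(path), mode)
```

A thread would call thread_chdir("/some/dir") and then thread_open("note.txt", "w") instead of the real chdir and open. Anything that bypasses the wrappers (C extensions, child processes, library code calling the OS directly) still sees the kernel's CWD, which is exactly the hole that makes this approach hard to do completely.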
Oh yeah, and totally off-topic: This business of CWD always being
tracked by the kernel, etc., is making me think that the exokernel
guys really do have the right idea, and that safely multitasking the
hardware and providing nice system call abstractions should be
independent features of the OS environment, not both mushed
together into the one system-wide kernel.