Forum OpenACS Development: db_transaction hanging a thread how to detect, how to kill...

I am seeing what appears to be a deadlock within a db_transaction on OACS 4.6.2.

A page does something like:

db_transaction {
  db_dml ... {insert (money) values ((select * from wallet))}
  permission::grant morals bennett admin
  db_foreach vices {select vice from vices} {
      permission::grant $vice bennett use
  }
}

If I run this inside the db_transaction, the thread hangs and then, uh, something happens. I see the CPU load increase and stay there for a minute or two; then the CPU load drops back down to idle. Sophisticated ns_log debugging shows that the hang occurs at the first permission grant inside the db_foreach loop.

What I find interesting is that:

1)  Without the db_transaction, all is swell.
2)  Granting perms could cause a deadlock?

What do folks think might be happening? Can PG detect deadlocks?  If not, should db_transaction have a timeout/watchdog mechanism?

PG will detect deadlocks and abort one of the offending transactions when it finds one. See the DEADLOCK_TIMEOUT option (the time PG waits on a lock before running the deadlock check) at http://www.au.postgresql.org/docs/7.2/static/runtime-config.html

This is pure unfounded speculation, but I'd start with the db_ API and its handle allocation...

Jerry, you're not a loyal reader of my blog, are you? 😊

You shouldn't have db_dml statements inside a db_foreach or vice versa. Here's how to get around it:

http://rubick.com:8002/blogger/one-entry?entry%5fid=1643
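The short version, as a rough sketch against Jerry's example above (the query name is made up and the grant call is just copied from his pseudocode): pull the rows into a plain Tcl list first, so the select is finished with its handle before any DML or grant runs.

# Sketch: collect the rows up front with db_list, then loop.
# Nothing inside the foreach is holding a second db handle open.
set vices [db_list get_vices {select vice from vices}]
foreach vice $vices {
    permission::grant $vice bennett use
}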

Of course, the problem I sometimes have with that is that db_list and db_list_of_lists don't guarantee the order of the lists, so sorting gets screwed up.

So then you have to sort it yourself. Ugh.

Since the PG version of acs_permission__grant_permission, unlike the Oracle version, is not able to detect duplicate inserts into acs_permissions, it uses a crude work-around that involves a table lock.  To get around this you would need to do the table locking and the insert into acs_permissions manually from Tcl; see acs_permission__grant_permission for the details.
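In outline it would look something like this (a sketch only: the lock mode and the duplicate-safe insert are my guesses at what the function body does, and the bind variables are placeholders; read acs_permission__grant_permission before copying any of it):

db_transaction {
    # Serialize concurrent granters the way the PG function does,
    # then insert only if the row isn't already there.
    db_dml lock_perms {
        lock table acs_permissions in exclusive mode
    }
    db_dml grant_perm {
        insert into acs_permissions (object_id, grantee_id, privilege)
        select :object_id, :grantee_id, :privilege
        where not exists (select 1 from acs_permissions
                          where object_id = :object_id
                          and grantee_id = :grantee_id
                          and privilege = :privilege)
    }
}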

Also, now that I think about it, I seem to recall that this work-around no longer works correctly in newer versions of PG (newer being > 7.1, maybe).  In practice this hasn't been a problem, since permissions are generally not manipulated much outside of the package installation phase.

Jade is correct in saying that you shouldn't mix db_foreach and db_dml in a transaction, since the two statements will use separate db handles and you won't get the intended results. But even if you make this change, you will still have the hanging problem: the first permission grant, the one before the db_foreach loop, will still be holding the lock until the transaction closes.

Jerry, in the code above you're using two different database handles, both within a db_transaction block, right? Are the semantics of that even defined in the db_* API? What is it really doing? I bet only the first handle is getting a "begin transaction"; the second isn't doing anything special and is still in autocommit mode.

But either way it doesn't matter: if the intention is that the whole db_transaction block be one atomic transaction, that's impossible, as you're using two handles. Hm, probably db_transaction should notice this and throw an error... This seems like a good example of where the db_transaction syntactic sugar is noticeably more confusing than the plain "begin transaction ... end transaction" commands going on underneath.

I don't think db_transaction needs any form of timeout or deadlock detection. What it needs is to throw an error when people try to use two handles (to the same schema) within a db_transaction block, since two handles, by definition, can never be part of one transaction. Even if using two handles within a db_transaction block happens to accidentally do what you want in some particular case, I'd still call it a programming error, and one likely to make future maintenance of any such code rather confusing.
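Just to sketch the idea (this is not the actual db_ API code, and both variable names are hypothetical), the check could live wherever handles get handed out and be as dumb as:

# Hypothetical guard: if a db_transaction is open on this thread and a
# handle has already been allocated for it, refuse to hand out another.
if { $transaction_open_p && $n_handles_in_transaction > 0 } {
    error "db_transaction: this block already holds a database handle;\
        a second handle can never be part of the same transaction"
}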

I'm still wondering whether we ought to simply turn off all use of more than one handle at once by default. I haven't yet seen any case where using two database handles at once (to the same schema) was actually a good idea. Seems normally we only do it by accident...

John McCarthy: By Grabthar's hammer, by the suns of Warvan, you shall be avenged!
Jerry, what's up, did something happen to John McCarthy? He's still alive and well at Stanford AFAIK.
Andrew,

I say that this is an excellent opportunity to embed your extensive knowledge of the inner workings of the system by implementing your own suggestion to throw an error when this condition is detected. This type of inadvertent programming error may well be beyond the ability of the typical OACS programmer to detect and resolve. I would consider it a great service if you were to implement your fix.

I would consider this to be a bug fix, but you may want to elevate it to TIP status.

Randy

Heh, is someone trying to feed my ego in order to get work out of me? ;)

Randy, ok, if no one objects, I'll do it sometime. Don't know exactly when yet though.

Jade, the business end of db_list looks like this:
    while { [db_getrow $db $selection] } {
        lappend result [ns_set value $selection 0]
    }
    return $result
I don't see how this could screw up your sorting, provided your order by was correct in the first place.
I think Jade probably meant the order of the columns in the list.

Any user of db_list_of_lists has to assume that the column values come back in a particular order, but that order is defined elsewhere, in the query. So if you ever change the query, you also have to change any code that consumes the list db_list_of_lists generates. Effectively, the specific column order is defined in (at least) two independent places, which can lead to bugs.

If the result of the query is used in only one place, then you're probably using it right after the db_list_of_lists call that generates it, and so this source of bugs, while still present, is minor. You are unlikely to forget to change both places when they're only three lines apart in the code.

If the query needs to be called from several different places, what you probably really want is something like a keyed list, as found in R and (I think) TclX as well. You could probably play with Tcl arrays or ns_sets in some useful way here.
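For instance, if I remember the switch right, db_foreach will hand you each row as a Tcl array keyed by column name (a quick sketch; the query and column names are invented), which gets rid of the positional coupling, though you'd still want to wrap the query in a proc if it's called from several places:

# Sketch: -column_array gives you $row(colname) access per row,
# so the code no longer cares what position each column is in.
db_foreach get_foos {
    select foo_id, title, creation_date from foos
} -column_array row {
    ns_log Notice "foo $row(foo_id): $row(title), created $row(creation_date)"
}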

But instead, I've sometimes handled the problem like so:

Embed the query into a Tcl proc. Since you need to call it from several different places, you probably did this already anyway. Add an optional "-cols_var" parameter to your proc:

ad_proc atp_foo_query {
    {-cols_var {}}
    foo_id
} {
    Returns info about the foo.
} {
    if { ![empty_string_p $cols_var] } {
        upvar 1 $cols_var cols
    }
    # Order of the columns here and in the query MUST match:
    set cols [list a b c d e f g]
    return [db_list_of_lists big_nasty_query {
        -- Big nasty query goes here.
    }]
}

Now, the caller of that proc does this:

set cols [list]
set data_lol [atp_foo_query -cols_var {cols} $foo_id] 
foreach row $data_lol { 
   # This sets local variables for all columns:
   foreach $cols $row { break } 
   # Do more processing here...
}

Note that the caller still has to know the name of every column he's interested in, but he does not need to care exactly what order the columns appear in. So you can add a new column to the query and everything should (probably will) be fine. (Naturally, if the new column is named such that it clobbers a local variable, you will lose; but given a halfway decent choice of variable names this is unlikely.)
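If that worries you, one variation on the same caller (still a sketch, using the atp_foo_query proc sketched above) is to load each row into an array keyed by column name, so a newly added column can never collide with a local variable:

set data_lol [atp_foo_query -cols_var cols $foo_id]
foreach row_vals $data_lol {
    array unset row
    # Pair up column names and values; access them as $row(a), $row(b), ...
    foreach col $cols val $row_vals {
        set row($col) $val
    }
    # Do more processing here...
}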