Forum OpenACS Q&A: Slow SubSelect in BBoard Forum

Collapse
Posted by MaineBob OConnor on

Sulu: Warp 7 sir?
Kirk: It's too damn slow!...
-- Lyrics From: Star Trek Rhapsody by Weird Al

Here is the specific code that I use with my mods to the BBoard system in openacs3. What this does, is eliminate the "New Answers" page. This page is unreliable when using more than one machine to access the forums probably due to cookie changes. Also, the page often hides threads that I just didn't get around to read when I come back later.

My solution, implemented long ago, was to have just one page which shows the threads with the most recent listed first along with the name of the most recent poster.

Here is the code that I use. I have simplified it and removed the view so it is clear. The problem is that it can take 5 to 10 SECONDS to execute on our current bboard with about 18,000 messages.

The really slow part is the subselect: msg_id in (select... shown in red below. The subselect is used to get the most recent message in each thread. Every message in a thread has the same root_message_id and as well as the leading part of the sort_key (ex: 0004FF:01)

select now() from dual;
SELECT bb.msg_id, bb.root_msg_id, bb.refers_to, bb.topic_id,
bb.user_id AS poster_id, bb.one_line, bb.message, bb.html_p,
bb.posting_time, bb.sort_key, bt.topic,
u.first_names || ' ' || u.last_name AS name,  u.email
FROM bboard bb, bboard_topics bt, users u
WHERE bb.user_id = u.user_id AND
bb.topic_id = bt.topic_id AND
bb.dele_ted ISNULL AND
bt.topic_id = 11 and
date_num_days(sysdate() - bb.posting_time) < 90 and
bb.msg_id in (select max(bb.msg_id) from bboard
group by bb.root_msg_id)
order by bb.posting_time desc limit 1;
select now() from dual;

I even added a 2 column index:
CREATE INDEX bboard_root_msg_msg_idx
ON bboard (root_msg_id, msg_id);

But this seems to have little effect.

I've racked my brains on how to get the most recent row from variable sized group AS FAST AS POSSIBLE.

As I understand it, it is best to do the processing in the database before getting the results. I had considered getting the whole thread out of the database and in tcl throw away the unneeded messages.

Another idea is to build a helper table that inserts or overwrites the most recent message in a forum "topic" every time a post is made. Then, when I want to show the who and title of the most recent message in a thread, I just join bboard,users, bboard_topics with the new helper table.

Any suggestions?

-Bob

Collapse
Posted by Gilbert Wong on
I do the latter for my forums implementation.  I have a trigger which resets the last message id for any thread.  Each message also knows which thread it belongs to (another trigger).
Collapse
Posted by MaineBob OConnor on
Gilbert, could you share your triggers with me?  TIA
-Bob
Collapse
Posted by MaineBob OConnor on
Ok, I'm new to plpgsql, but I see that it is really cool, and potentally FAST and therefore it will be really useful to me. I haven't quite got the syntax right. Here is the code I plan to use in my helper table that stores the latest message in each thread.
Create table bboard_latest (
        root_msg_id      char(6) not null,
        msg_id           char(6) not null
);

DROP FUNCTION get_last_bbthread(char,char);
CREATE FUNCTION get_last_bbthread(char(6), char(6))
RETURNS boolean

AS 'DECLARE
vroot_msg_id ALIAS FOR $1;
vmsg_id ALIAS FOR $2;

BEGIN
IF length(vroot_msg_id) = 0 -- no root id, failure
   THEN RETURN ''f'';
END IF;

IF length(vmsg_id) = 0 -- no msg id, failure
   THEN RETURN ''f'';
END IF;

SELECT into temp x from bboard_latest 
  WHERE root_msg_id = vroot_msg_id;
  IF NOT FOUND -- is root not in table?
  THEN INSERT INTO bboard_latest (root_msg_id,msg_id)
    VALUES (vroot_msg_id, vmsg_id);
  ELSE UPDATE bboard_lastest
    SET msg_id = vmsg_id
    WHERE root_msg_id = vroot_msg_id;
  END IF;
  RETURN ''t'';
END;
' LANGUAGE 'plpgsql';

-- Try it
select get_last_bbthread('000400','000401');
ERROR:  parser: parse error at or near "x"

I have tried variations to get it right and I don't understand the necessity of "select into" vs just a select. I am further confused by the PG docs which say:

    Note: CREATE TABLE AS is functionally equivalent to SELECT INTO. CREATE TABLE AS is the recommended syntax, since SELECT INTO is not standard. In fact, this form of SELECT INTO is not available in PL/pgSQL or ecpg, because they interpret the INTO clause differently.

Ok, I would enjoy a bit of help and explaination.
THank you.
-Bob

Collapse
Posted by MaineBob OConnor on

Ok, Thank $your_deity for Google!

I found the solution with a couple of Google serches.
I had not defined "record".
So it is defined as rec and now works! Here's the good code:

DROP FUNCTION get_last_bbthread(char,char);
CREATE FUNCTION get_last_bbthread(char(6), char(6))
RETURNS boolean

AS 'DECLARE
vroot_msg_id ALIAS FOR $1;
vmsg_id ALIAS FOR $2;
rec record;

BEGIN
IF length(vroot_msg_id) = 0 -- no root id, failure
   THEN RETURN ''f'';
END IF;

IF length(vmsg_id) = 0 -- no msg id, failure
   THEN RETURN ''f'';
END IF;

  SELECT into rec * from bboard_latest 
    WHERE root_msg_id = vroot_msg_id;
  IF NOT FOUND -- is root not in table?
  THEN INSERT INTO bboard_latest (root_msg_id,msg_id)
    VALUES (vroot_msg_id, vmsg_id);
  ELSE UPDATE bboard_latest
    SET msg_id = vmsg_id
    WHERE root_msg_id = vroot_msg_id;
  END IF;
  RETURN ''t'';
END;
' LANGUAGE 'plpgsql';

-Bob

Collapse
Posted by Don Baccus on
It's saying that SELECT INTO means something different in PL/pgSQL as opposed to in the SQL engine proper, and that they recommend you use CREATE TABLE AS instead in SQL.

In PL/pgSQL SELECT INTO means the same as it does in Oracle's PL/SQL (which PL/pgSQL is modelled after). The idea is that you define some variables, say "v_foo" and "v_bar" and then select into those variables like this:

select t.foo, t.bar into v_foo, v_bar
from table t
where some_key = 'some_value';
Afterwards, v_foo and v_bar are available for use in your code. Grep around the OpenACS 4.5 sources for examples of this - there are tons of them.
Collapse
Posted by MaineBob OConnor on

OK, my above code works:

select get_last_bbthread('000400','000401');
 get_last_bbthread 
-------------------
 t
(1 row)

Now, I'm onto creating a trigger but it seems not to see my function:

CREATE TRIGGER bb_latest
BEFORE INSERT ON bbtest
FOR EACH ROW
EXECUTE PROCEDURE get_last_bbthread('root_msg_id','msg_id');

ERROR:  CreateTrigger: function get_last_bbthread() does not exist

Please help me find what is missing or incorrect. TIA.
Note: bbtest is the same as bboard and will eventually be changed in production.

-Bob

Collapse
Posted by MaineBob OConnor on
Ok, I took Don's suggestion and looked at some .sql code in oacs4.5. It appears that all functions used in triggers pass no variables and "returns opaque" is used to indicate this. Example:
create function my_fun () returns opaque as '
declare
      -- vars here;
begin
      -- code here;
end;' language 'plpgsql';

create trigger my_trigger after insert 
  on my_table for each row 
  execute procedure my_fun ();

So, slowly I'm getting this.... Based on this, it looks like I need to revise my function. What I need to understand is how column values are passed. In my working function above, They were passed to the function.

What I want this trigger to do, is to get two column values from bboard after any insert and then insert or update these two values into my helper table.
Hints appreciated!
-Bob

Collapse
Posted by MaineBob OConnor on
Ok, I've wandered around the pg docs and have finally groked it and I have something that works..... Posted here to help anyone else get the nack for messing with triggers and functions. Here it is:
DROP FUNCTION latest_thread();
CREATE FUNCTION latest_thread() RETURNS OPAQUE
AS 'DECLARE
   rec      record;

BEGIN
  SELECT into rec * from bboard_latest
    WHERE root_msg_id = new.root_msg_id;
  IF NOT FOUND -- is root not in table?
  THEN INSERT INTO bboard_latest (root_msg_id,msg_id)
    VALUES (new.root_msg_id, new.msg_id);
  ELSE UPDATE bboard_latest
    SET msg_id = new.msg_id
    WHERE root_msg_id = new.root_msg_id;
  END IF;
  RETURN NEW;
END;
' LANGUAGE 'plpgsql';

DROP TRIGGER latest_thread on bbtest;
CREATE TRIGGER latest_thread
AFTER INSERT ON bbtest
FOR EACH ROW
EXECUTE PROCEDURE latest_thread ();

-- now test it
select * from bboard_latest;
select * from bbtest;
insert into bbtest (root_msg_id,msg_id) values ('123456','123456');

-Bob