Forum OpenACS Development: Re: What is the general usefulness of the dir_all_users view?

I did a little more digging on the cc_users query plan and discovered that the problem seems to be that the genetic query optimization just gives a terrible result. The default threshold (GEQO_THRESHOLD) is 11 elements in the from clause and it turns out cc_users by itself is right on the threshold so when queried by itself it has a nice query plan but when joined it goes over the threshold and the (as far as I can tell -- horrible beyond words) genetic query optimizer takes over and we get these bad plans.

On openacs.org I did the following:

openacs.org-dev=# set geqo_threshold to 20;
SET VARIABLE
openacs.org-dev=# explain select * from forums_messages fm, cc_users u where fm.message_id = 55001 and u.user_id = fm.user_id;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..17.27 rows=1 width=379)
  ->  Nested Loop  (cost=0.00..15.23 rows=1 width=351)
        ->  Nested Loop  (cost=0.00..13.21 rows=1 width=335)
              ->  Nested Loop  (cost=0.00..10.34 rows=1 width=254)
                    ->  Nested Loop  (cost=0.00..8.30 rows=1 width=152)
                          ->  Nested Loop  (cost=0.00..6.26 rows=1 width=144)
                                ->  Index Scan using forums_messages_pk on forums_messages fm  (cost=0.00..4.23 rows=1 width=116)
                                ->  Index Scan using persons_pk on persons pe  (cost=0.00..2.02 rows=1 width=28)
                          ->  Index Scan using group_elem_idx_element_idx on group_element_index  (cost=0.00..2.03 rows=1 width=8)
                    ->  Index Scan using users_pk on users u  (cost=0.00..2.03 rows=1 width=102)
              ->  Index Scan using acs_objects_pk on acs_objects o  (cost=0.00..2.85 rows=1 width=81)
        ->  Index Scan using membership_rel_rel_id_pk on membership_rels mr  (cost=0.00..2.01 rows=1 width=16)
  ->  Index Scan using parties_pk on parties pa  (cost=0.00..2.02 rows=1 width=28)
Which is a nice plan and the query is then quite fast.

I posted on pgsql-performance about this problem and got an answer from Tom Lane about what is really going on here. Says Tom Lane on pgsql-performance:

This is not actually using GEQO. The reason you are seeing an effect from raising geqo_threshold is that geqo_threshold determines whether or not the view will be flattened into the upper query. For this particular query situation, flattening the view is essential (since you don't want the thing to compute the whole view). The relevant source code tidbit is

/*
 * Yes, so do we want to merge it into parent?    Always do
 * so if child has just one element (since that doesn't
 * make the parent's list any longer).  Otherwise we have
 * to be careful about the increase in planning time
 * caused by combining the two join search spaces into
 * one.  Our heuristic is to merge if the merge will
 * produce a join list no longer than GEQO_RELS/2.
 * (Perhaps need an additional user parameter?)
 */

AFAICS, your only good solution is to make geqo_threshold at least 14, since you want a 7-way join after flattening.

regards, tom lane