Forum OpenACS Development: Re: Project to build a new FtsEngineDriver using tsearch2 for PostgreSQL

Paul,

Right now the txt table includes the object_id (tid) and the indexed terms.

To expand that we might want to store the text to be indexed, but I am not sure how much duplicating that will work. For example, cr_revisions store the text of the revisions in cr_revisions.content.

We definitely need an efficient way to get the text to generate headlines when returning search results. Doing a seperate query for each row really doesn't work.

So, I guess a first draft of the tsearch search might be something like this:

create table txt (
    object_id integer references acs_objects,
    content text,
    tsv tsvector
);

And content to be indexed would be stored in the content column, and the tsvector index for the content would be stored in the tsv column.

For cr_items it would only contain the live revision of the content to be indexed, so other revisions would not be copied into the txt table.

I think we want to get most of the other attributes from the acs_objects table if possible.

One important point is that tsearch2 can assign 4 different weights to parts of the document. So the title, description, and content, as well as other metadata such as author, categories/keywords could be assigned to different weights. This should probably be configurable somehow. Should there be 4 coluns for each different part, A, B, C, D to be parsed by tsearch2 instead of one column just for "content"?