Forum OpenACS Q&A: Current Search Solutions for OpenACS

Posted by Bob OConnor on

Now that OpenACS version 3.2.2 is out the door {YES!}, I want to revive these questions posted earlier regarding SEARCH and solutions that various OpenACS users have come up with. Most answers to these questions were posted in April/May:

Are there simple solutions to search based on keywords and not full text that can be implemented now until we get the "ultimate" solution?

TIA - Bob

Posted by Bob OConnor on

I see that the email version of this question doesn't have the full URLs extracted from the a href tags. So here they are:

1) Search: Not yet implemented with postgres posting1

2) Full text search in PostgreSQL? posting2

3) Searching a PostgreSQL database posting3

Posted by Don Baccus on
I've barely begun resurrecting the old bboard search scripts used by photo.net before Oracle (and Context and Intermedia) entered the picture.  Slow, but served photo.net for a couple of years, and actually worked.

I've repaired bboard_contains, but my laptop doesn't have network access at the moment (working on that, I'm in Boston for three weeks), so I've not updated the CVS tree.  I've been sick with something that's suspiciously like the flu, right down to details like 103 degree fevers, so it probably won't be until next week that I get this first cut at a search facility - for bboard only - up and running.

I hope to work on a more comprehensive and more efficient approach for site-wide searching later this summer.

Posted by MaineBob OConnor on
Hi Dan, I hope you get over the flu soon.--- IANAD and I don't play one on TV, but rest and fluids are recommended!

The BBoard search from the old photo.net will help.  I also wonder if there is a somewhat manual way to set up a search for content.  I envision running the static html files through a program that will capture the keywords from each file's (Meta Name="keywords"...) tag and store them in a search table.  Then when someone does a search, they get the list of docs that have these keywords.  Also, it would be good to record the search words that a user has entered, so that when they get nothing, we can add to and tweak the search table.

Not wanting to reinvent the wheel, has anyone done this?
Even when we have the full text search, it would be nice to
have this edit feature to help users get quickly to the most
relevant content.
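
For anyone wanting to try the idea, here is a minimal sketch of it in Python, not an existing OpenACS feature. It assumes the keywords live in a comma-separated meta keywords tag; the names MetaKeywordParser, build_index, search and missed_log are made up for illustration. It matches single query words only, so a multi-word keyword such as "full text" would need extra handling.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the terms from <meta name="keywords" content="a, b, c">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def extract_keywords(html_text):
    parser = MetaKeywordParser()
    parser.feed(html_text)
    return parser.keywords


def build_index(pages):
    """pages maps a file path to its HTML text; returns keyword -> set of paths."""
    index = defaultdict(set)
    for path, html_text in pages.items():
        for keyword in extract_keywords(html_text):
            index[keyword].add(path)
    return index


def search(index, query, missed_log):
    """Return docs matching any query word; remember queries that find nothing."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    if not hits:
        # These are the queries an editor would review to tweak the table.
        missed_log.append(query)
    return sorted(hits)
```

In a real site the index and the missed-query log would presumably live in database tables rather than in memory, so that an editor could add keywords by hand.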

-Bob

Posted by MaineBob OConnor on
Sorry Don, I called you Dan in the message above. There is no way to change that.  When I want to post using the Q & A, my reply form
shows only the question, not all of the other answers.  I think
that to carry on a conversation, the "thread to date" should be
shown in the /q-and-a-post-reply-form.tcl.  As it stands now,
I have to open another browser to cut and paste or review
other answers that I may want to add to or comment on.  -Bob
Posted by Roberto Mello on
Post this as a "feature request" in the SDM, otherwise it'll be forgotten.
Posted by MaineBob OConnor on
Hi Roberto,

I posted this as a "feature request"; it's #533 in the SDM.

Posted by Nicolas Boretos on
Hi,
First of all, thanks for some great software. I do not know which approach to full text indexing you guys are considering, but you may want to check out the Isearch/Isite implementations on vinca.cnidr.org. These are open source, Z39.50-based index/retrieval solutions aimed at the documentation sector, along with a web gateway via CGI. The indexing and search binaries are about 300 KB.

Another site with similar software is www.indexdata.dk, again open source, offering their index, search and Zebra server.
Both offer standard document types (text, SGML, HTML, etc.), so fielded search is possible.

Assuming one can incorporate their libs into the ACS, to bypass a CGI implementation, I think that would solve indexing a site's content that is not in a DB, but indexing a postgres DB would probably mean table dumps, and then indexing. So now we have two structured DBs to contend with....
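
To make the "table dumps, and then indexing" step concrete, here is a rough sketch in Python under some loud assumptions: the table name messages and its columns msg_id, subject and body are hypothetical, and sqlite3 stands in for postgres only so the example runs on its own. It writes one text file per row, which a file-based indexer could then pick up.

```python
import sqlite3
from pathlib import Path


def dump_rows(conn, out_dir):
    """Write one text file per row so a file-based indexer can process them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    cur = conn.cursor()
    # Hypothetical table and column names, for illustration only.
    cur.execute("SELECT msg_id, subject, body FROM messages")
    for msg_id, subject, body in cur:
        path = out / f"{msg_id}.txt"
        path.write_text(f"{subject}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # In-memory sqlite3 database used as a stand-in for a postgres connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (msg_id TEXT, subject TEXT, body TEXT)")
    conn.execute("INSERT INTO messages VALUES ('0001', 'Search', 'Any keyword search yet?')")
    print(dump_rows(conn, "dump"))
```

The dump would have to be rerun, or made incremental, whenever rows change, which is the "two structured DBs" cost mentioned above.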

I am in the process of shifting this indexing gateway from Apache CGI to AOLserver CGI, but just can't seem to get AOLserver CGI to respond...

Anyway, my two cents,
Nicolas Boretos

Posted by Glen Stewart on
I've set a few web-crawler/search-engines loose on my company's Intranet and was quite disappointed with both ht://dig and UDMsearch - they are both VERY slow at lookups IF you can even complete the crawl.

I ended up writing my own crawler using Pavuk to crawl and Swish++ to index/search.  I've thrown a lot at Swish++ and it's the most impressive (speed, stability, overhead) indexing/search engine I've ever tried - GPL too.  See http://www.best.com/~pjl/software/swish/