Forum OpenACS Development: Response to Enhanced KM in OpenACS 4.0 -- interface to OpenCyc

OK, this helps quite a bit.

The query optimization requires taking the search request and passing it through OpenCyc to try to figure out the context of the request. Java is a good example: does the requestor want to know about the programming language, coffee, or the island in Indonesia?

One way to assist this contextualization is through the localization of the originating context. This can help with point 2. On the OpenACS site, for example, it is most likely that people are interested in the programming language. On Photo.Net, newbies are probably interested in the island, but people who are familiar with Philip and Alex's Guide and its ties to photo.net might be interested in the programming language. In neither case is coffee a likely context.

If I understand correctly, OpenCyc implemented on an OpenACS site could, for example, recognize that a content item is likely to be about Java even if the term "Java" isn't used in it because it knows that the words "Swing," "classes," "beans," and "introspection" are all terms that relate to Java the language. It should even be able to guess that an OpenACS.org page that uses the word "Swing" is probably relevant to somebody looking for information about user interfaces.
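To make sure I have the idea right, here is a minimal Python sketch of that kind of inference. It does not call OpenCyc at all: the `RELATED_TERMS` table and the `guess_topics` function are hypothetical stand-ins I made up, and a real knowledge base would supply the related terms instead of a hard-coded dictionary.

```python
import re

# Hypothetical related-term lists; a real deployment would ask the knowledge base.
RELATED_TERMS = {
    "Java (programming language)": {"swing", "classes", "beans", "introspection"},
    "Java (coffee)": {"roast", "espresso", "arabica", "brew"},
    "Java (island)": {"indonesia", "jakarta", "volcano", "bali"},
}

def guess_topics(text):
    """Return (topic, hit_count) pairs, best match first, for topics with at least one hit."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {topic: len(words & terms) for topic, terms in RELATED_TERMS.items()}
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [(topic, hits) for topic, hits in ranked if hits > 0]

# The page never mentions "Java", yet the related terms point to the language.
page = "Use introspection to discover which beans and classes a Swing panel exposes."
print(guess_topics(page))
```

Counting overlapping words is far cruder than what an inference engine would do, but it shows the shape of the result: a ranked list of candidate topics for a page that never uses the term itself.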

Now, OpenCyc could accomplish this goal simply by presenting the search results in three categories: programming language, coffee, and geographic location. To do this much, you would just have to pass the search results through OpenCyc and display by category. (Right so far?)
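As I picture it, the "display by category" step could look roughly like the sketch below. The cue words and the `toy_classify` function are invented placeholders for whatever classification OpenCyc would actually return; only the grouping logic is the point.

```python
from collections import defaultdict

# Made-up cue words standing in for the classification OpenCyc would supply.
CATEGORY_CUES = {
    "programming language": ("servlet", "jvm", "class"),
    "coffee": ("coffee", "roast", "brew"),
    "geographic location": ("indonesia", "island", "jakarta"),
}

def toy_classify(title):
    """Return the first category whose cue word appears in the title."""
    lowered = title.lower()
    for category, cues in CATEGORY_CUES.items():
        if any(cue in lowered for cue in cues):
            return category
    return "uncategorized"

def group_by_category(titles, classify=toy_classify):
    """Bucket search hits by category, preserving the original hit order."""
    grouped = defaultdict(list)
    for title in titles:
        grouped[classify(title)].append(title)
    return dict(grouped)

hits = [
    "Java servlet tutorial",
    "Dark roast Java beans",
    "Hiking on the island of Java",
    "Java FAQ",
]
for category, titles in group_by_category(hits).items():
    print(category, "->", titles)
```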

It would be more convenient, though, if OpenCyc knew that people searching the OpenACS site are extremely unlikely to be looking for information about coffee or geographic location. This information would reduce the number of user clicks required to get the info they want and also reduce the number of false positives. That, by itself, doesn't sound like a huge gain on a site the size and focus of OpenACS.org, although it could be more significant in a large corporate intranet.
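One way I can imagine expressing that site-level knowledge is as a per-site prior over the possible senses, with unlikely senses dropped before anything is shown. The numbers in `SITE_PRIORS` below are pure guesses for illustration, and `rank_senses` is a hypothetical helper, not anything OpenCyc or OpenACS provides.

```python
# Illustrative, made-up likelihoods of each sense of "Java" per originating site.
SITE_PRIORS = {
    "openacs.org": {
        "programming language": 0.95,
        "coffee": 0.02,
        "geographic location": 0.03,
    },
    "photo.net": {
        "programming language": 0.35,
        "coffee": 0.05,
        "geographic location": 0.60,
    },
}

def rank_senses(site, min_prior=0.1):
    """Return the senses worth showing for a site, most likely first."""
    priors = SITE_PRIORS.get(site, {})
    kept = [(sense, p) for sense, p in priors.items() if p >= min_prior]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [sense for sense, _ in kept]

print(rank_senses("openacs.org"))
print(rank_senses("photo.net"))
```

With a cutoff like this, a search from OpenACS.org would go straight to the programming-language results, while a search from photo.net would still offer two groupings.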

However, there is more here:

Step 1 requires no knowledge of the contents of the collection. Localization, as described above, does. In order to be able to localize, it's necessary to do Step 3, which is to have OpenCyc process the entire contents of a collection so that it "understands" what is in the collection.

[Snip.]

A simple category structure would enable localization, too. If the user searched from within the "travel" category, then the scope of the search would automatically be localized to travel.
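In code, category-based localization is little more than an extra filter on the search. The sketch below uses an invented in-memory `DOCUMENTS` list and a hypothetical `search` function; a real site would do the same thing against its content repository.

```python
# Invented sample documents; each one carries the category it was filed under.
DOCUMENTS = [
    {"title": "Java island itinerary", "category": "travel"},
    {"title": "Java servlet tutorial", "category": "programming"},
    {"title": "Packing for Bali", "category": "travel"},
]

def search(term, current_category=None):
    """Match titles containing the term, limited to the user's current category if any."""
    needle = term.lower()
    return [
        doc["title"]
        for doc in DOCUMENTS
        if needle in doc["title"].lower()
        and (current_category is None or doc["category"] == current_category)
    ]

print(search("java"))            # searched from the site root
print(search("java", "travel"))  # searched from within the "travel" category
```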

So, if I'm getting this right, OpenCyc could learn to be more useful by being able to access the keywords in the content repository. It could, for example, start to recognize that certain proc names are associated with, say, the request processor. So Cyc could tell the search engine to look for those proc names as additional search terms when a user enters "request processor" as the term. Alternatively, if the proc is used in several different applications, it could group search results by application. (I have no idea if this particular example would be useful to anybody; I'm just making it up.)
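Here is roughly what I mean by handing the search engine additional terms. The association table is hand-written and the proc names in it (`rp_example_filter`, `rp_example_handler`) are placeholders I invented, not real OpenACS procs; in the scenario above, Cyc would have learned these associations from the content repository.

```python
# Hypothetical associations; the proc names are invented placeholders.
ASSOCIATED_TERMS = {
    "request processor": ["rp_example_filter", "rp_example_handler"],
}

def expand_query(term):
    """Return the user's term followed by any additional terms associated with it."""
    key = term.strip().lower()
    return [term] + ASSOCIATED_TERMS.get(key, [])

print(expand_query("request processor"))
print(expand_query("chocolate"))  # nothing learned yet, so only the original term
```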

Category organizers, in my way of thinking, should spend as much time thinking about metadata as they do about the category structure. For example, I have a personal interest in chocolate. If I were responsible for the chocolate category at a site that incorporated both categories and a page-based search engine, I would want to create a thesaurus of concepts that describe chocolate to aid a classification engine. With OpenCyc, I would use the OpenCyc tools to do this work using assertions like:

is a type of: white, ivory, milk, dark, bittersweet, truffle, bonbon, ganache
is a manufacturer of: Callebaut, Valrhona, Nestle, Hershey
has(?) holidays: Easter, Valentine's Day
is not related: labrador retriever
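
Just to show how a classification engine might consume a thesaurus like this, here is a small Python sketch. It stores the assertions above in a plain dictionary rather than in OpenCyc's own assertion language, which I am not attempting to reproduce; `CHOCOLATE_THESAURUS` and `relation_to_chocolate` are names I made up for the example.

```python
# The assertions above, keyed by relationship; terms are stored in lowercase.
CHOCOLATE_THESAURUS = {
    "is a type of": [
        "white", "ivory", "milk", "dark", "bittersweet",
        "truffle", "bonbon", "ganache",
    ],
    "is a manufacturer of": ["callebaut", "valrhona", "nestle", "hershey"],
    "has holidays": ["easter", "valentine's day"],
    "is not related": ["labrador retriever"],
}

def relation_to_chocolate(term):
    """Return the relationship a term has to chocolate, or None if none is recorded."""
    needle = term.strip().lower()
    for relation, terms in CHOCOLATE_THESAURUS.items():
        if needle in terms:
            return relation
    return None

print(relation_to_chocolate("Valrhona"))
print(relation_to_chocolate("Labrador Retriever"))
```

The explicit "is not related" entry is what would let a classifier keep a page about chocolate labs out of the chocolate category.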

I had a similar thought to this a while back:

http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=0009Go&topic%5fid=ACS%20Design&topic=

I was originally thinking about relationships between content items only, but you could do it with relationships to categories or keywords as well. How much would you have to change in the content repository data model in order to do this?