Forum OpenACS Development: Re: Semantic Search in OpenACS

Posted by Neophytos Demetriou on
I have shared an initial implementation of the Tcl/C extension with Gustaf. It provides three commands: load_model, unload_model, and ev (short for "embeddings vector"). I plan to use this module tomorrow to compute the vectors to store in pgvector (and later in Solr and Faiss).

An embedding vector is a list of numbers that captures some of the semantics of the input: semantically similar inputs end up close together in the vector space. The actual numbers depend on the language model and how it was trained. Embedding vectors help us find phrases that are relevant to a query even when they use different words.
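
To make "close together" a bit more concrete, here is a minimal Python sketch of one common closeness measure, cosine similarity. This is an assumption on my part for illustration: the post does not say which measure is used, the three-dimensional vectors are made up, and real embeddings would come from the model (e.g. via the ev command) and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors, NOT real model output.
query = [0.9, 0.1, 0.3]
related = [0.8, 0.2, 0.4]      # points in roughly the same direction as the query
unrelated = [-0.1, 0.9, -0.5]  # points in a different direction

print(cosine_similarity(query, related))
print(cosine_similarity(query, unrelated))
```

With these toy numbers the related vector scores much higher than the unrelated one, which is the property a semantic search relies on.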

Here's an example of what the results might look like for a given query (note that the first result contains only one of the user's keywords, yet it is the most similar according to the language model used):

Search query: "Should I get health insurance?"

Search results:
1. Should I sign up for Medicare Part B if I have Veterans' Benefits?
(similarity score: 0.5152)
2. Can I sign up for Medicare Part B if I am working and have health insurance through an employer?
(similarity score: 0.4782)
3. How can I get help with my Medicare Part A and Part B premiums?
(similarity score: 0.4490)
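
A ranked list like the one above could be produced roughly as follows. This is only a sketch under stated assumptions: the rank function, the document labels, and the two-dimensional vectors are hypothetical, and cosine similarity is assumed as the scoring function. In practice the vectors would be computed by the model, and the nearest-neighbour lookup would be done by pgvector rather than in application code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query_vec, docs, top_k=3):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical documents with made-up vectors, NOT real model output.
docs = [
    ("doc A", [1.0, 0.0]),
    ("doc B", [0.6, 0.8]),
    ("doc C", [0.0, 1.0]),
    ("doc D", [-1.0, 0.0]),
]
query_vec = [1.0, 0.2]

results = rank(query_vec, docs)
for position, (score, text) in enumerate(results, start=1):
    print(f"{position}. {text}")
    print(f"(similarity score: {score:.4f})")
```

A brute-force scan like this is fine for a handful of documents; for larger collections a vector index is the more practical choice.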

If you have any questions, please do not hesitate to let me know.