Vector Search Primer – Unlocking Better Search

John Cizmar

Vice President @ MC+A

Using vector search instead of traditional keyword search produces better relevance and provides a better search experience. Vectors are also foundational to additional technologies that can further enhance your search experience. The shift in how relevancy is measured, from keyword density to “similarity,” is a game changer. Some key points:

  • Vector search provides a more effective, accurate and complete search experience beyond traditional keyword-based algorithms.
  • Vector embeddings turn complex information, like words, sentences, or entire documents, into mathematical objects that a computer can work with.
  • Vector search can be used in retail eCommerce and other industries to improve customer satisfaction, increase revenue and reduce costs.

Why is there a need for another method?

To understand the problem, let us look at the retail eCommerce space as an example. Sales are expected to exceed 8 trillion dollars by 2026. That is a lot of searches. The risk that your search fails to provide a compelling experience that meets or exceeds a user’s expectations is not something to overlook. Poor or underperforming search is a competitive disadvantage because the transaction ends up being completed somewhere else. All the big retailers are responding by improving their search accuracy to increase traffic. This then sets the expectations users bring with them when they visit other sites.

Traditional keyword search ranks results by matching the search terms against fields in the documents of a database. The more keywords that match, the more ‘relevant’ the document. If the keyword is not in the document, it will not be matched. A workaround is to create synonyms. This is typically a one-off fix that maps a specific keyword to an alternative keyword (think “Pop” and “Soda”). To be successful with this approach, a relevancy team has to constantly manage its list of rewrites. Each rewrite also widens what can match, which reduces the precision of the results, so this manual process can lower overall relevancy.
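To make the synonym trade-off concrete, here is a minimal sketch in Python. The catalog, the product IDs, and the `keyword_search` helper are all invented for illustration, and real search engines score matches in more sophisticated ways than a simple count.

```python
# Toy catalog; product IDs and names are made up for illustration.
CATALOG = {
    "p1": "cherry soda 12 pack",
    "p2": "diet cola soda",
    "p3": "pop up tent",
}

# A hand-maintained rewrite list, like the one a relevancy team would manage.
SYNONYMS = {"pop": ["soda"]}


def keyword_search(query, synonyms=None):
    """Rank documents by how many query terms (plus synonyms) they contain."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms.update((synonyms or {}).get(term, []))
    scores = {}
    for doc_id, text in CATALOG.items():
        matches = len(terms & set(text.lower().split()))
        if matches:
            scores[doc_id] = matches
    # Highest match count first; ties broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


without_synonyms = keyword_search("pop")
with_synonyms = keyword_search("pop", SYNONYMS)
print(without_synonyms)  # only the tent matches the literal word "pop"
print(with_synonyms)     # the sodas now match, but the tent is still there
```

In this toy run the synonym brings the sodas into the results, but the tent still matches too, which is the loss of precision described above.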

Vector Search to the Rescue! (I promise there will be no math)

Let’s start with understanding vectors. You might remember vectors from math class: they are quantities that have both magnitude (how big they are) and direction (where they are going). A simple example could be a car driving at 60 mph due east; 60 mph is the magnitude and “east” is the direction. In the context of machine learning and artificial intelligence, vectors represent far more complex information.

A VERY Quick Overview of Vector Embeddings

In the world of machine learning, an “embedding” is a way of representing more complex information, like words, sentences, or even entire documents, as vectors. Just like we can represent a car’s movement as a vector, we can also represent a word or a piece of text as a vector. These vectors have many more dimensions than just two (like “east” and “60 mph”), sometimes hundreds or even thousands, and each dimension represents a different feature or characteristic of the word or text. Vector embeddings allow us to turn complex and abstract things like words, sentences, or documents into mathematical objects (a.k.a. vectors) that a computer can understand. We use these vectors to find similarities and differences, group similar items together, make predictions, and so forth. So, in simple terms, vector embeddings are a way of turning words or other pieces of information into a mathematical form that computers can understand and work with.
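For readers who do want a peek under the hood, here is a small sketch of what "finding similarities" between embeddings can look like. The three-dimensional vectors below are invented by hand for illustration; real embeddings come from a trained model and have far more dimensions. Cosine similarity is used here as one common way to compare vectors, which is an assumption on my part since nothing above prescribes a particular measure.

```python
import math

# Hand-made 3-dimensional "embeddings"; the numbers are invented for
# illustration and do not come from any real model.
EMBEDDINGS = {
    "soda": [0.9, 0.1, 0.0],
    "pop": [0.8, 0.2, 0.1],
    "tent": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


soda_vs_pop = cosine_similarity(EMBEDDINGS["soda"], EMBEDDINGS["pop"])
soda_vs_tent = cosine_similarity(EMBEDDINGS["soda"], EMBEDDINGS["tent"])
print(f"soda vs pop:  {soda_vs_pop:.2f}")
print(f"soda vs tent: {soda_vs_tent:.2f}")
```

With these made-up numbers, "soda" and "pop" point in nearly the same direction while "soda" and "tent" do not, even though none of the words share any letters.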

Still with me? On to why this matters.

So, what is the hype all about? Traditionally, search experiences rely on keyword-based algorithms, like Term Frequency/Inverse Document Frequency (TF-IDF), so a searcher’s request is run against an index and matched to content according to that algorithm. This approach is limited by the effort required to balance precision and recall, which together determine relevancy.
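As a rough illustration of keyword-based scoring, the sketch below implements one simple textbook variant of TF-IDF. The documents and the `tf_idf_scores` helper are hypothetical, and production engines use more refined formulas, so treat this as a sketch of the idea only.

```python
import math
from collections import Counter

# Tiny made-up document set for illustration.
DOCS = {
    "d1": "red running shoes",
    "d2": "blue running shorts",
    "d3": "red red wine",
}


def tf_idf_scores(query, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0:
                continue  # a term that appears nowhere cannot contribute
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[doc_id] = score
    return scores


scores = tf_idf_scores("red shoes", DOCS)
missing = tf_idf_scores("sneakers", DOCS)
print(scores)
print(missing)  # every score is 0.0: the word is not in any document
```

The second query shows the limitation in miniature: a shopper typing "sneakers" gets nothing back, even though "red running shoes" is probably what they want.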

Vector search is a method for finding the most similar pieces of information in a database, based on these vector representations. This type of search looks for the most similar vectors in a database given a query vector. For example, if you have a database of word vectors, and you input a “vectorized query”, you will find the words in your database that are most similar to your input word.
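Here is a minimal sketch of that lookup: a brute-force search that compares the query vector with every stored vector and keeps the closest ones. The vectors and item names are invented for illustration, and straight-line (Euclidean) distance is just one possible measure of closeness. Real systems typically rely on approximate nearest-neighbor indexes rather than scanning everything.

```python
import heapq
import math

# Invented 3-dimensional vectors; a real index would hold model-generated
# embeddings with many more dimensions.
INDEX = {
    "cola": [0.9, 0.1, 0.1],
    "lemonade": [0.7, 0.3, 0.0],
    "sleeping bag": [0.1, 0.2, 0.9],
    "tent": [0.0, 0.1, 0.9],
}


def vector_search(query_vector, index, k=2):
    """Return the k item names whose vectors are nearest to the query vector."""
    return heapq.nsmallest(
        k, index, key=lambda name: math.dist(query_vector, index[name])
    )


# A hypothetical "vectorized query" that sits near the drinks.
results = vector_search([0.8, 0.2, 0.1], INDEX, k=2)
print(results)
```

Note that the query never has to contain the word "cola" or "lemonade"; it only has to land near them in the vector space.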

Vector search use cases

Vector search = Semantic Search

Semantic search seeks to improve search accuracy by understanding the intent of the query and the contextual meaning of terms as they appear in the searchable data space. In other words, it’s a fancy way of saying, “Find stuff that means the same thing, not just stuff that uses the same words.”
https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english

Vector search = Recommendations

Recommendation systems are what suggest the next movie to watch on Netflix, the next song to listen to on Spotify, or the next product to buy on Amazon. They aim to predict what you might like based on your past behavior or the behavior of other users who are similar to you. Each item (movie, song, product) is represented as a vector. With movies, as an example, this vector could include elements like the movie’s genre, the actors, the director, the length, and so on. Now, if you just watched “Inception,” a vector is created based on the characteristics of this movie. The recommendation system then searches for other movie vectors that are close to “Inception” in the vector space and recommends those to you.
https://www.imdb.com/title/tt3659388/?ref_=nv_sr_srsg_0_tt_8_nm_0_q_martian
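To sketch how movie characteristics might become a vector, the example below turns a set of tags into a list of ones and zeros and then recommends the titles with the most similar vectors. Everything here is an assumption for illustration: the tags assigned to "Inception", the other titles (which are hypothetical), and the choice of cosine similarity. A real recommender would also draw on viewing behavior and learned embeddings.

```python
import math

# Made-up metadata; apart from "Inception" the titles are hypothetical.
MOVIES = {
    "Inception": {"sci-fi", "thriller", "heist"},
    "Space Heist": {"sci-fi", "heist", "action"},
    "Cold Trail": {"thriller", "crime"},
    "Garden Party": {"romance", "comedy"},
}

# One vector dimension per tag seen anywhere in the catalog.
VOCAB = sorted(set().union(*MOVIES.values()))


def to_vector(tags):
    """Encode a set of tags as a list of 1.0/0.0 flags, one per known tag."""
    return [1.0 if tag in tags else 0.0 for tag in VOCAB]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recommend(watched, k=2):
    """Return the k other movies whose vectors are closest to the watched one."""
    target = to_vector(MOVIES[watched])
    others = [title for title in MOVIES if title != watched]
    others.sort(
        key=lambda title: cosine_similarity(target, to_vector(MOVIES[title])),
        reverse=True,
    )
    return others[:k]


recommendations = recommend("Inception")
print(recommendations)
```

The movie that shares two tags with "Inception" ranks first, the one that shares a single tag ranks second, and the romantic comedy is left out.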

Vector search = Question Answering

Question answering (QA) systems aim to answer questions posed by users, usually in natural language. Think Siri, Alexa, or Google Assistant. In a QA system, vector search is used to find the most suitable answer to a user’s question. Like the other use cases, each potential answer is represented as a vector. When a user asks a question, that question is also converted into a vector. The system then finds the answer vectors that are closest to the question vector, meaning they are likely to be the most relevant answers.
https://www.bing.com/search?q=who+won+the+battle+of+gettysburg&form=QBLH&sp=-1&ghc=2&lq=0&pq=who+won+the+battle+of+gettysburg&sc=9-32&qs=n&sk=&cvid=6165ECE901AA4B7A94CFC0F7C024EC72&ghsh=0&ghacc=0&ghpl=

Next Steps

Implementing vector search will provide better experiences for users. It is a paradigm shift that moves search from a model of term density to one of “similarity,” where the system understands the user’s intent beyond the words they entered into a search box. The opportunity for disruption is real. With the right strategic approach and tactical execution, this technology can effectively improve customer satisfaction, increase revenue, and reduce costs.
