Russell Proud
Co-Founder @ Decided.AI
Michael Cizmar
Managing Director @ MC+A
Introduction to Vectors and Search
Search has come a long way from the days of matching keywords to identify the most relevant results. The most recent improvement is semantic search, which extracts meaning from the content of documents (text and images) and from queries at search time to return more relevant results. An added benefit is fewer zero-result searches: a query for “Red running shoes with white stripes” that previously returned nothing can now return similar products, e.g. a pair of blue running shoes with white stripes. The same mechanism is also a great way to build “related product” lists.
Companies and search engineers have a number of approaches available for delivering semantic search across their search infrastructure, and with the explosion of ChatGPT and LLMs in general, more and more companies are looking at how they can implement it.
In this article we explore and compare some of the methods available with Elasticsearch. Specifically, we look at ELSER [reference], Elastic’s one-click semantic search function, and alternative approaches via eland, deploying some of the most recent text transformer models from Hugging Face to Elasticsearch.
The Importance of the Judgment Set in B2B Relevancy
Knowing Your Data

Understanding what you are searching over comes first. For this client, each product document carried the following fields:
- Title
- Description
- Brand
- Category Taxonomy
  - Title
  - Description
  - Department (L0)
  - Category (L1)
  - Sub Category (L2)
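To make this concrete, a product document shaped by these fields might look like the sketch below. The field names and values are our own illustration, not the client’s actual schema:

```python
# Hypothetical product document using the fields above
# (names and values are illustrative, not the client's schema).
product = {
    "title": "All-Purpose Flour, 25 lb",
    "description": "Unbleached all-purpose flour for baking and cooking.",
    "brand": "Acme Mills",
    "category": {
        "title": "Flour",
        "description": "Wheat and specialty baking flours",
        "department_l0": "Food & Beverage",
        "category_l1": "Baking Ingredients",
        "sub_category_l2": "Flour",
    },
}
```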
Baseline Results
Prior to any work being undertaken, the judgment set and the current lexical query were scored via the rank evaluation endpoint. This client’s use case is B2B, which typically means very knowledgeable users who are constant ‘active buyers’; the starting relevancy was therefore strong, yet unpredictable in many cases where the right result should have been obvious. The results of the baseline query are below.
| Search Method | Rank Eval Score |
| --- | --- |
| Current lexical query | 0.6347583710026408 |
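As a rough illustration, here is how such a rank evaluation call might look with the Python client. The index name, query, document IDs, and ratings are placeholders; a real judgment set contains far more rated documents per query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder connection details

# One entry per judged search term; ratings grade how relevant each
# document is for that term (here on a 0-3 scale).
requests = [
    {
        "id": "ap_flour",
        "request": {"query": {"match": {"title": "ap flour"}}},  # the lexical query under test
        "ratings": [
            {"_index": "products", "_id": "sku-123", "rating": 3},
            {"_index": "products", "_id": "sku-456", "rating": 1},
        ],
    },
]

# "dcg" with normalize=True yields nDCG; k=10 scores the top ten hits.
response = es.rank_eval(
    index="products",
    requests=requests,
    metric={"dcg": {"k": 10, "normalize": True}},
)
print(response["metric_score"])
```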
One-Click Semantic Search?

ELSER (Elastic Learned Sparse EncodeR) is Elastic’s out-of-the-box semantic search model: you deploy it with a single click, and it expands documents at ingest and queries at search time into weighted token sets, with no model selection or training required.
How Does it Perform?
| Search Method | Rank Eval Score |
| --- | --- |
| Current lexical query | 0.6347583710026408 |
| ELSER | 0.3742842674545238 |
| ELSER combined with lexical | 0.6318659471766467 |
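For context, here is a minimal sketch of what the combined variant can look like with the Python client: a bool query whose should clauses hold a text_expansion clause for ELSER alongside the lexical match. The index name, the ml.tokens field, and the simplified lexical clause are our own assumptions, not the client’s production query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder connection details

# ELSER and lexical scoring combined: both should clauses contribute
# to the final score of each hit.
response = es.search(
    index="products",  # assumed index name
    query={
        "bool": {
            "should": [
                {
                    "text_expansion": {
                        "ml.tokens": {  # rank_features field written by the ELSER ingest pipeline
                            "model_id": ".elser_model_1",
                            "model_text": "ap flour",
                        }
                    }
                },
                {"match": {"title": "ap flour"}},  # stand-in for the client's lexical query
            ]
        }
    },
)
```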
In our client’s case, using ELSER showed a large reduction in relevancy, though is this really the case? As we touched on, the judgment set is static: if products weren’t in the prior result set of the lexical query, they won’t add to the score, even if they are highly relevant.
We used Quepid to execute the searches, manually inspected the results, and re-scored some known low-performing queries.
To give you an example of one of these queries, below is the query “AP Flour”. This term currently scores 0.51 using the nDCG@10 scoring method. When we manually reviewed and rescored the ELSER results, this query’s score improved to 0.81. Finally, the combined lexical and ELSER query scored 1.00, meaning the top 10 results were all highly relevant.
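For readers unfamiliar with the metric, nDCG@10 rewards placing highly relevant results near the top of the first ten positions and normalizes against the ideal ordering, so a score of 1.0 means the top ten could not be ordered better. In its common formulation:

$$\mathrm{DCG@10} = \sum_{i=1}^{10} \frac{2^{rel_i} - 1}{\log_2(i+1)}, \qquad \mathrm{nDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}$$

where $rel_i$ is the judged relevance of the result at position $i$ and IDCG@10 is the DCG@10 of the ideal ordering.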
| Search Term | Search Method | Quepid Score |
| --- | --- | --- |
| ap flour | Lexical | 0.51 |
| ap flour | ELSER | 0.81 |
| ap flour | ELSER & Lexical Combined | 1.00 |
We repeated the exercise for other known low performers, such as “bowls”, where ELSER alone fell well short of lexical but the combined query still came out ahead:

| Search Term | Search Method | Quepid Score |
| --- | --- | --- |
| ap flour | Lexical | 0.51 |
| ap flour | ELSER | 0.81 |
| ap flour | ELSER & Lexical Combined | 1.00 |
| bowls | Lexical | 0.75 |
| bowls | ELSER | 0.23 |
| bowls | ELSER & Lexical Combined | 0.77 |
Taking It to the Next Level: Eland and Transformers

Beyond ELSER, eland (Elastic’s Python toolkit) can import Hugging Face transformer models into an Elasticsearch cluster for dense vector search. At a high level, the steps are (sketched in code after this list):

- Import the model into your Elasticsearch cluster
- Deploy the model so it is available to use
- Create an index with the relevant dense_vector (kNN) fields; the model is then referenced against those fields at query time via the query_vector_builder parameter
- Create an ingestion pipeline that passes the relevant fields through the model and outputs the vectors for storing alongside the document
- Index the documents via the pipeline, either by re-indexing from source or copying an existing index
- Modify or create a new query that searches the index
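A condensed sketch of those steps follows. The index, pipeline, and field names are our own placeholders; the dims value assumes all-MiniLM-L12-v2’s 384-dimensional output, and the input_output processor option assumes a recent 8.x release (older releases use field_map and target_field instead):

```python
# Steps 1-2: import and start the model with eland's CLI (run from a shell):
#   eland_import_hub_model \
#     --url https://localhost:9200 \
#     --hub-model-id sentence-transformers/all-MiniLM-L12-v2 \
#     --task-type text_embedding \
#     --start

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder connection details
MODEL_ID = "sentence-transformers__all-minilm-l12-v2"  # eland's naming convention

# Step 3: an index with a dense_vector field sized for this model.
# Fields beyond the ones mapped here would be added dynamically on reindex.
es.indices.create(
    index="products-vectors",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "title_embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# Step 4: an ingest pipeline that embeds the title at index time.
es.ingest.put_pipeline(
    id="title-embedding-pipeline",
    processors=[
        {
            "inference": {
                "model_id": MODEL_ID,
                "input_output": {
                    "input_field": "title",
                    "output_field": "title_embedding",
                },
            }
        }
    ],
)

# Step 5: copy an existing index through the pipeline.
es.reindex(
    source={"index": "products"},
    dest={"index": "products-vectors", "pipeline": "title-embedding-pipeline"},
    wait_for_completion=False,  # run as a background task for large indices
)

# Step 6: a kNN query that embeds the search text with the same model.
response = es.search(
    index="products-vectors",
    knn={
        "field": "title_embedding",
        "k": 10,
        "num_candidates": 100,
        "query_vector_builder": {
            "text_embedding": {"model_id": MODEL_ID, "model_text": "ap flour"},
        },
    },
)
```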
How Did They Perform?
| Search Method | Rank Eval Score |
| --- | --- |
| Current lexical query | 0.6347583710026408 |
| ELSER | 0.3742842674545238 |
| ELSER combined with lexical | 0.6318659471766467 |
| all-minilm-l12-v2 | 0.35326441721855384 |
| Combined lexical and all-minilm-l12-v2 | 0.6343283590195509 |
| all-mpnet-base-v2 | 0.3503362897757086 |
| Combined lexical and all-mpnet-base-v2 | 0.6342153512985733 |
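The “combined” rows above blend vector and lexical scoring. Recent Elasticsearch releases allow a knn clause alongside a standard query in a single search request, with the two scores summed and weighted via boost; here is a minimal sketch, reusing the placeholder names from the previous example:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # placeholder connection details
MODEL_ID = "sentence-transformers__all-minilm-l12-v2"  # assumed eland model id

# Hybrid search: lexical match and kNN in one request; boost values
# weight each side's contribution to the summed score.
response = es.search(
    index="products-vectors",  # assumed index name
    query={"match": {"title": {"query": "ap flour", "boost": 0.5}}},
    knn={
        "field": "title_embedding",
        "k": 10,
        "num_candidates": 100,
        "boost": 0.5,
        "query_vector_builder": {
            "text_embedding": {"model_id": MODEL_ID, "model_text": "ap flour"},
        },
    },
)
```

Tuning the two boost values is how you trade lexical precision against semantic recall.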
Re-scoring the same low-performing terms in Quepid tells a similar story, with all-minilm-l12-v2 performing strongly both on its own and combined with lexical:

| Search Term | Search Method | Quepid Score |
| --- | --- | --- |
| ap flour | Lexical | 0.51 |
| ap flour | ELSER | 0.81 |
| ap flour | ELSER & Lexical Combined | 1.00 |
| ap flour | all-minilm-l12-v2 | 0.89 |
| ap flour | all-minilm-l12-v2 & Lexical Combined | 1.00 |
| bowls | Lexical | 0.75 |
| bowls | ELSER | 0.23 |
| bowls | ELSER & Lexical Combined | 0.77 |
| bowls | all-minilm-l12-v2 | 0.81 |
| bowls | all-minilm-l12-v2 & Lexical Combined | 0.92 |
TL;DR Summary
What we’ve undertaken here shows us that:
- Combining lexical search with a machine learning model yields results comparable to pure lexical when scored against a static judgment set
- Anecdotally, all-minilm-l12-v2 outperforms both ELSER and a good lexical query, based on our manual re-scoring of some known low-performing terms
- ELSER is simple to deploy and yields better results when combined with lexical, but has its limitations for our use case
- all-minilm-l12-v2 is likely superior to ELSER
This undertaking was not aimed at providing a definitive answer for our client to follow; hence, time was not invested in rescoring every single term in the judgment set. It was intended to give our client a path forward that they could implement internally and then measure using their internal analytics.
At the end of the day, there is only so much you can do measuring results programmatically; the proof is in the pudding, as they say. With strong indications like those above (and more than we undertook as part of this work), the customer will take the work and implement an A/B test, measuring each search term via click-through rate, click position, conversions, zero-result search terms, and other internal metrics, and continually refining.
If you’re looking to improve your B2B or eCommerce search, we have the experience, tools, and capabilities to deliver, independently or alongside your team, the process and systems needed to implement semantic search and improve your customer experience.