Elasticsearch 8 NLP and Beyond

Gustavo Llermaly

Elastic Certified Engineer

Elasticsearch releases always include many features and improvements. You can review all of them in the official release guide. From our perspective, the most interesting additions are the new Natural Language Processing (NLP) features.

Elasticsearch NLP “The fun part!”

NLP describes methods and techniques that allow software to understand natural language in text or audio. The Elasticsearch machine learning features for NLP are built on BERT-style transformer models that conform to the standard BERT model interface.

For practical purposes, in Elasticsearch we use ML models to facilitate NLP. These models allow us to perform text processing that enriches text-based content, making it more robust and useful. With that enrichment, a search experience can, for example, interpret a user’s intent from a string of text and provide a better experience.

Constructing and training models is a topic for a separate article, so for the sake of this one, we’ll assume that you already have an existing model in place. We’ll walk through how the new Elasticsearch capabilities can be used to store and leverage your model in a search solution.

The first key point is that most of these capabilities are applied at index time. During ingestion, the document is processed and enriched with additional metadata that is ‘inferred’ from the document’s own content by a pre-trained model.

For our example enrichment, we’ll use some common NLP tasks in Elastic:

  • Language detection
  • Named entity extraction
  • Phrase prediction (mask filling)

Language detection

In this simple example, before indexing a document, the language detection model enriches the document based on what the model inferred from the document – in this case, the content of the document title. Understanding the language of a document is a simple but powerful capability. You could, for example, use specific language mappings ‘automagically’ based on the output of this model to provide more precise and meaningful results to your users. You know what’s the best part? This model is available by default in Elasticsearch!
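To make the index-time idea concrete, here is a minimal Python sketch that builds an ingest pipeline body around the built-in `lang_ident_model_1` model used in the request further down. The function name, the `title` source field, and the `ml.language` target field are assumptions chosen for illustration; the body would still need to be sent to the ingest pipeline API under a name of your choosing.

```python
import json


def build_language_pipeline(source_field="title", model_id="lang_ident_model_1"):
    """Return an ingest pipeline body that detects a field's language at index time."""
    return {
        "description": "Detect the language of a document field",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # The built-in model reads its input from a field named "text",
                    # so the document's own field is mapped onto that name.
                    "field_map": {source_field: "text"},
                    # Hypothetical location for the inferred values.
                    "target_field": "ml.language",
                }
            }
        ],
    }


pipeline = build_language_pipeline()
print(json.dumps(pipeline, indent=2))
```

Mapping the field explicitly keeps the document schema independent of the input name the model expects.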


Using Language detection in Elasticsearch

To use the model, navigate to Dev Tools in Kibana and run the following request:

				
					POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "text": "hello, my name is Gustavo"
      }
    }
  ]
}

				
			

This POST runs the string through the model and adds several fields to our ‘document’, including the language the model predicted.
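The idea of choosing language-specific settings from the model output can be sketched in Python. The sample below is hand-written to resemble the shape of a simulate response, with the result under `ml.inference`; its values are illustrative, not real model output. The helper names and the code-to-analyzer table are assumptions, although the analyzer names themselves are built-in Elasticsearch analyzers.

```python
# Hand-written sample shaped like a _simulate response (illustrative values only).
sample_response = {
    "docs": [
        {
            "doc": {
                "_source": {
                    "text": "hello, my name is Gustavo",
                    "ml": {
                        "inference": {
                            "predicted_value": "en",
                            "model_id": "lang_ident_model_1",
                        }
                    },
                }
            }
        }
    ]
}

# Hypothetical lookup from language code to a built-in language analyzer.
ANALYZERS = {"en": "english", "es": "spanish", "fr": "french", "de": "german"}


def detected_language(response, target_path=("ml", "inference")):
    """Read the predicted language of the first simulated document."""
    node = response["docs"][0]["doc"]["_source"]
    for key in target_path:
        node = node[key]
    return node["predicted_value"]


def analyzer_for(language_code):
    """Pick a language analyzer, falling back to the standard analyzer."""
    return ANALYZERS.get(language_code, "standard")


language = detected_language(sample_response)
print(language, analyzer_for(language))
```

Falling back to `standard` keeps documents searchable when the detected language has no dedicated entry in the table.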

Extracting named entities (NER)

According to Wikipedia, named-entity recognition or NER “is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.” In other words, NER picks words out of a passage of text and classifies them as proper names or numerical entities.

Using NER in Elasticsearch

To use named-entity recognition in Elasticsearch, we need to load one of the many supported third-party models. The good news is that the process of loading models is straightforward.

1. Install the Eland client, which loads models into Elasticsearch

				
					python -m pip install eland
				
			

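2. Build the Eland Docker image. Run this from a checkout of the Eland repository, which contains the Dockerfile; the trailing dot is the build context

				
					docker build -t elastic/eland .
				
			

3. Push the model to Elasticsearch, replacing the URL with yours in the format user:password@url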

				
					  docker run -it --rm --network host \
    elastic/eland \
    eland_import_hub_model \
      --url https://user:password@example-instance-url:9243 \
      --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
      --task-type ner \
      --start

				
			

Sit tight while the model is pushed to your cluster. Once it has been pushed, you can try it out.
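One detail to keep in mind when calling an imported model is its ID inside Elasticsearch, which is derived from the Hugging Face Hub ID. The sketch below approximates that naming under the assumption that slashes become double underscores and the ID is lowercased. The helper is hypothetical, not Eland's own function, so verify the resulting ID against the trained models list in your cluster.

```python
def elasticsearch_model_id(hub_model_id):
    """Approximate the model ID assigned when a Hub model is imported."""
    # Assumption: "/" is replaced by "__" and the whole ID is lowercased.
    return hub_model_id.replace("/", "__").lower()


hub_id = "elastic/distilbert-base-cased-finetuned-conll03-english"
model_id = elasticsearch_model_id(hub_id)
print(f"POST /_ml/trained_models/{model_id}/deployment/_infer")
```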

				
					POST /_ml/trained_models/elastic__distilbert-base-cased-finetuned-conll03-english/deployment/_infer
{
  "docs": {
    "text_field": "MC+A Is a search company located in Chicago, Michael Cizmar is the CEO"
  }
}

				
			

The response again carries additional fields, this time describing the classification of the entities found in the text. Pretty cool!
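Once the entities come back, they are usually easier to work with when grouped by class. The following Python sketch assumes the response contains an `entities` list whose items carry `entity` and `class_name` keys; the sample is hand-written for illustration and is not actual model output.

```python
from collections import defaultdict

# Hand-written sample in the assumed response shape (not real model output).
sample_response = {
    "entities": [
        {"entity": "Chicago", "class_name": "LOC"},
        {"entity": "Michael Cizmar", "class_name": "PER"},
    ]
}


def entities_by_class(response):
    """Group recognized entity strings under their predicted class."""
    grouped = defaultdict(list)
    for item in response.get("entities", []):
        grouped[item["class_name"]].append(item["entity"])
    return dict(grouped)


print(entities_by_class(sample_response))
```

Grouped this way, each class can be stored in its own keyword field and used for filtering or faceting.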

Mask Filling

Mask filling, or masked language modeling, is an ML task that masks some words in a sentence and predicts which words should replace those masks. It can be very helpful when you need a statistical understanding of text-based data, and it can be applied to domain-specific content, such as a large corpus of research papers.
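The masking half of the task is simple enough to sketch in Python. The helper below, a hypothetical function written for this article, replaces one word with BERT's `[MASK]` token and wraps the result in the same request body shape used in the requests in this article.

```python
import json
import re

MASK_TOKEN = "[MASK]"


def mask_word(sentence, word):
    """Replace the first whole-word occurrence of `word` with the mask token."""
    pattern = r"\b" + re.escape(word) + r"\b"
    masked, count = re.subn(pattern, MASK_TOKEN, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} not found in sentence")
    return masked


masked = mask_word(
    "The city of Chicago is considered one of the best places to live", "Chicago"
)
# Body shape mirrors the _infer requests shown in this article.
request_body = {"docs": {"text_field": masked}}
print(json.dumps(request_body))
```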

Using Mask Filling in Elasticsearch

Since we already have eland in place, we just need to rerun it with a different model and task-type.

				
					eland_import_hub_model --url https://user:password@example-instance-url:9243 --hub-model-id bert-base-uncased --task-type fill_mask --start
				
			

Mask filling is one of my personal favorite techniques because you never know what the algorithm will predict. Let’s take a closer look by executing this request:

				
					POST /_ml/trained_models/bert-base-uncased/deployment/_infer
{
  "docs": {
    "text_field": "Michael Cizmar is a [MASK] person"
  }
}

				
			

What do you think the answer will be?

In our case, the mask was replaced by ‘business.’ Not too bad out of the box!
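To turn a prediction back into a full sentence, the predicted token only needs to be substituted for the mask. This is a minimal Python sketch; the helper name is hypothetical, and it assumes the predicted token has already been read from the response.

```python
MASK_TOKEN = "[MASK]"


def fill_mask(text, predicted_token):
    """Substitute the predicted token for the single mask in the text."""
    if text.count(MASK_TOKEN) != 1:
        raise ValueError("expected exactly one mask token")
    return text.replace(MASK_TOKEN, predicted_token)


print(fill_mask("Michael Cizmar is a [MASK] person", "business"))
```

The same substitution works for any request with a single mask, such as the one that follows.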

				
					POST /_ml/trained_models/bert-base-uncased/deployment/_infer
{
  "docs": {
    "text_field": "The city of [MASK] is considered one of the best places to live"
  }
}
				
			

Conclusion

In this article, we covered some of the more interesting NLP capabilities in Elasticsearch, along with a few demonstrations of how you can use them.
