Gustavo Llermaly
Elastic Certified Engineer
Elasticsearch NLP “The fun part!”
- Language detection
- Extract named entities
- Phrase prediction
Language detection
In this simple example, before indexing a document, the language detection model enriches the document based on what the model inferred from the document – in this case, the content of the document title. Understanding the language of a document is a simple but powerful capability. You could, for example, use specific language mappings ‘automagically’ based on the output of this model to provide more precise and meaningful results to your users. You know what’s the best part? This model is available by default in Elasticsearch!
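To make the 'automagic' mapping idea a bit more concrete, here is a minimal Python sketch that picks a language analyzer from a detected language code. The `ANALYZERS` table, the `analyzer_for` helper, and the 0.5 probability threshold are illustrative assumptions, not something Elasticsearch provides; only the analyzer names (`english`, `spanish`, and so on) are built-in Elasticsearch language analyzers.

```python
# Hypothetical lookup table: detected language code -> built-in analyzer name.
ANALYZERS = {
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "de": "german",
    "pt": "portuguese",
}


def analyzer_for(lang_code: str, probability: float, threshold: float = 0.5) -> str:
    """Return an analyzer name for a detected language.

    Falls back to the 'standard' analyzer when the language is unknown
    or the model was not confident enough (threshold is an assumption).
    """
    if probability < threshold:
        return "standard"
    return ANALYZERS.get(lang_code, "standard")


print(analyzer_for("es", 0.93))
```

In practice you would feed this helper the language code and probability that the model wrote into the document, and use the result to decide which field or index the text should go to.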
Using Language detection in Elasticsearch
To use the model, navigate to Dev Tools in Kibana and run the following request:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "text": "hello, my name is Gustavo"
      }
    }
  ]
}
This POST runs the string through the model and adds several fields to our ‘document’, including the predicted language.
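If you call the simulate endpoint from code rather than Dev Tools, you will want to pull the prediction out of the response. The sketch below assumes the inference processor wrote its result under the default `ml.inference` field and that the prediction is stored in `predicted_value`. The `detected_language` helper is a hypothetical name, and the `sample` response is hand-written and trimmed for illustration, not captured output.

```python
def detected_language(simulate_response: dict) -> str:
    """Extract the predicted language from a _simulate response.

    Assumes the result lives under _source.ml.inference.predicted_value.
    """
    source = simulate_response["docs"][0]["doc"]["_source"]
    return source["ml"]["inference"]["predicted_value"]


# Hand-written, trimmed example of what a response might look like.
sample = {
    "docs": [
        {
            "doc": {
                "_source": {
                    "text": "hello, my name is Gustavo",
                    "ml": {
                        "inference": {
                            "predicted_value": "en",
                            "model_id": "lang_ident_model_1",
                        }
                    },
                }
            }
        }
    ]
}

print(detected_language(sample))
```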
Extracting named entities (NER)
According to Wikipedia, named-entity recognition or NER “is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.” In other words, NER picks out the words in a text that are proper names or numerical expressions and labels each one with a category.
Using NER in Elasticsearch
To use named-entity recognition in Elasticsearch, we need to load one of the many supported third-party models. The good news is that the process of loading models is straightforward.
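1. Install the Eland client, which loads models into Elasticsearch
python -m pip install eland
2. Build the Eland Docker image (run this from the root of a clone of the Eland repository, where the Dockerfile lives)
docker build -t elastic/eland .
3. Push the model to Elasticsearch, replacing the URL with yours in the format user:password@url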
docker run -it --rm --network host \
  elastic/eland \
  eland_import_hub_model \
  --url https://user:password@example-instance-url:9243 \
  --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
  --task-type ner \
  --start
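One detail worth knowing before calling the model: the ID it gets inside Elasticsearch is not the Hub ID verbatim. To my knowledge, Eland replaces the `/` in the Hub ID with a double underscore. The sketch below encodes that assumption; `es_model_id` and `infer_path` are hypothetical helper names, and you can confirm the real ID with `GET _ml/trained_models`.

```python
def es_model_id(hub_model_id: str) -> str:
    """Derive the Elasticsearch model ID from a Hugging Face Hub ID.

    Assumption: Eland swaps '/' for '__' when it registers the model.
    """
    return hub_model_id.replace("/", "__")


def infer_path(hub_model_id: str) -> str:
    """Build the inference endpoint path used in this article."""
    return f"/_ml/trained_models/{es_model_id(hub_model_id)}/deployment/_infer"


print(infer_path("elastic/distilbert-base-cased-finetuned-conll03-english"))
```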
Sit tight while the model is pushed to your cluster. Once it is there, you can try it out:
POST /_ml/trained_models/elastic__distilbert-base-cased-finetuned-conll03-english/deployment/_infer
{
  "docs": {
    "text_field": "MC+A Is a search company located in Chicago, Michael Cizmar is the CEO"
  }
}
The model again adds fields to our document, this time describing the entities it found and how each one was classified. Pretty cool!
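Once you have the NER output, a common next step is to group the entities by class so they can be indexed as separate fields. The sketch below assumes the response carries an `entities` list whose items have `entity` and `class_name` keys; check the output of your Elasticsearch version before relying on those names. The `entities_by_class` helper is hypothetical and the `sample` data is hand-written for illustration, not real model output.

```python
from collections import defaultdict


def entities_by_class(infer_result: dict) -> dict:
    """Group entity strings by their predicted class (e.g. PER, LOC)."""
    grouped = defaultdict(list)
    for ent in infer_result.get("entities", []):
        grouped[ent["class_name"]].append(ent["entity"])
    return dict(grouped)


# Hand-written example shaped like the assumed response format.
sample = {
    "entities": [
        {"entity": "Chicago", "class_name": "LOC"},
        {"entity": "Michael Cizmar", "class_name": "PER"},
    ]
}

print(entities_by_class(sample))
```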
Mask Filling
Mask Filling, or masked language modeling, is an ML task of masking some words in a sentence and predicting which words should replace those masks. Mask filling can be very helpful when you need a statistical understanding of text-based data, and it can be applied to domain-specific content, such as a large corpus of research papers.
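The masking half of the task is easy to sketch in Python. The helper below swaps one word for a mask token and wraps the result in a request body shaped like the ones used in this article. The `[MASK]` token is the one BERT-style models expect; `mask_word` and `fill_mask_body` are hypothetical names, and the whitespace-based word matching is a deliberate simplification.

```python
def mask_word(sentence: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `word` with the mask token.

    Words are matched on whitespace boundaries only (a simplification).
    """
    words = sentence.split()
    if word not in words:
        raise ValueError(f"{word!r} not found in sentence")
    words[words.index(word)] = mask_token
    return " ".join(words)


def fill_mask_body(text: str) -> dict:
    """Build a request body with the masked text in 'text_field'."""
    return {"docs": {"text_field": text}}


masked = mask_word("Elasticsearch is a search engine", "search")
print(fill_mask_body(masked))
```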
Using Mask Filling in Elasticsearch
Since we already have eland in place, we just need to rerun it with a different model and task-type.
eland_import_hub_model --url https://user:password@example-instance-url:9243 --hub-model-id bert-base-uncased --task-type fill_mask --start
Personally, mask filling is one of my favorite techniques because you never know what the algorithm will predict. Let’s take a closer look by executing this request:
POST /_ml/trained_models/bert-base-uncased/deployment/_infer
{
  "docs": {
    "text_field": "Michael Cizmar is a [MASK] person"
  }
}
What do you think the answer will be?
As you can see, the mask was replaced by ‘business.’ Not too bad out of the box!
POST /_ml/trained_models/bert-base-uncased/deployment/_infer
{
  "docs": {
    "text_field": "The city of [MASK] is considered one of the best places to live"
  }
}
Conclusion
In this article, we covered some of the more interesting NLP capabilities in Elasticsearch (language detection, named-entity recognition, and mask filling), along with a short demonstration of each.