ReactiveSearch provides many options for modifying the search query before it reaches Elasticsearch, such as replacing the search term or removing words from it. All of these are also available in pipelines through pre-built stages.
In this guide, we will build a pipeline that overrides the _reactivesearch endpoint and uses pre-built stages to manipulate the query before it reaches Elasticsearch.
Pretext
Before pipelines, this functionality was provided through Query rules. Users create these rules, and the rules manipulate matching queries before they reach Elasticsearch. This is useful, for example, when certain terms need to be filtered out.
Relevant Search
In order to build a pipeline that implements relevant search, we will utilize the following pre-built stages. Here's a brief description of what each one of them will do:
searchRelevancy: This stage applies settings to the ReactiveSearch API body. This is useful for applying default settings to fields like dataField, which would otherwise throw an error if not passed from the client.
replaceSearchTerm: This stage replaces the search term entered by the user.
removeWords: As the name suggests, this stage removes words from the search term.
replaceWords: This stage replaces words in the search term.
addFilter: This stage adds filters to the search query.
promoteResults: This stage injects results at certain positions in the response.
hideResults: This stage hides certain results based on the _id field matched.
customData: This stage adds custom data to the response body.
Now that we briefly know about the stages that can help us make search more relevant, let's go through the assumptions.
Assumptions
For the purpose of this example, we will be working with an index called good-books-ds that contains data about books. It is important to know which index we are working with in order to understand the stages being applied.
While overriding the _reactivesearch endpoint, we will override it only for the good-books-ds index, and the pipeline will be invoked by POST requests.
Pre Setup
Let's define the basics of the pipeline as follows:
enabled: true
description: Pipeline to implement relevant search
routes:
  - path: good-books-ds/_reactivesearch
    method: POST
    classify:
      category: reactivesearch
envs:
  category: reactivesearch
  index:
    - good-books-ds
Note that we have also set the envs.index field to good-books-ds. This is an optional step but is good practice: the Elasticsearch stage reads the index from this field as a fallback.
We are also setting envs.category to reactivesearch for reference.
Stages
Now that we have the pre setup out of the way, let's define the stages for the pipeline.
Authorization
We need to make sure that the requests made to this endpoint are authenticated. To do this, we can use the pre-built stage authorization. We can define it in the following way:
- id: authorize request
  use: authorization
It's as simple as that. We don't need to do anything else; the pipeline takes care of the rest.
Search Relevancy
As explained above, this stage lets us set default values for fields so that they are applied automatically even if they are not passed in the request body. ReactiveSearch pipelines provide this as a pre-built stage named searchRelevancy.
We will define this stage in the following way:
- id: search relevancy
  use: searchRelevancy
  inputs:
    search:
      dataField:
        - original_title
      size: 1
    suggestion:
      dataField:
        - original_title
      enablePopularSuggestions: true
      size: 3
      popularSuggestionsConfig:
        size: 1
      enableRecentSuggestions: true
      recentSuggestionsConfig:
        size: 1
  continueOnError: false
In the above, we are passing the following fields as inputs:
search.dataField: The field present in the index to be set as dataField in the request body.
search.size: The size value to be set in the request body if it's not already passed.
suggestion.dataField: Similar to search.dataField, but applied to suggestion type requests.
suggestion.enablePopularSuggestions: Sets the enablePopularSuggestions field in the request body if it's not already passed.
suggestion.size: Same as search.size, except that it applies to suggestion type requests.
suggestion.popularSuggestionsConfig.size: As the name suggests, sets the size field of popularSuggestionsConfig if it's not already passed in the request body.
suggestion.enableRecentSuggestions: Indicates whether or not to enable recent suggestions; it is set if not already passed in the request body.
suggestion.recentSuggestionsConfig.size: Same as suggestion.popularSuggestionsConfig.size, except that it applies to recent suggestions.
Besides this, we are also setting continueOnError to false, which indicates that the pipeline should stop executing if this stage fails. This is important because, without these fields applied, the request will not be properly translated to the equivalent Elasticsearch request.
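To make the "set only if not already passed" behaviour concrete, here is a minimal Python sketch. It is an illustration rather than the ReactiveSearch implementation: the function name apply_defaults and the shallow, top-level merge are assumptions, and the sample values mirror the search inputs above.

```python
from copy import deepcopy

# Defaults mirroring the `search` inputs of the stage definition above.
SEARCH_DEFAULTS = {"dataField": ["original_title"], "size": 1}


def apply_defaults(query: dict, defaults: dict) -> dict:
    """Return a copy of `query` with every missing top-level key filled from `defaults`."""
    merged = deepcopy(query)
    for key, value in defaults.items():
        # Values sent by the client win; defaults only fill the gaps.
        merged.setdefault(key, deepcopy(value))
    return merged


# The client passed `size` but not `dataField`, so only `dataField` is added.
client_query = {"value": "some query", "size": 5}
print(apply_defaults(client_query, SEARCH_DEFAULTS))
# {'value': 'some query', 'size': 5, 'dataField': ['original_title']}
```

Nested settings such as popularSuggestionsConfig would need a recursive merge, which this sketch leaves out for brevity.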
Replace Search Term
At times, we might need to replace the search term with something different. This can be achieved by using the pre-built stage replaceSearchTerm. We can define it in the following way:
- id: replace search term
  use: replaceSearchTerm
  inputs:
    data: harry potter
We can pass the new search term through the inputs.data field.
Remove Words
At times, we might even want to remove certain words from the search term entered by the user. We can use the pre-built stage removeWords in a situation like this. We can define it in the following way:
- id: remove words
  use: removeWords
  inputs:
    data:
      - test
      - rick astley
We can pass the words to be removed in the inputs.data field. This field should be an array of strings, and every word that occurs in this array will be removed from the search term.
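The effect of removing words can be modelled with a few lines of Python. This is a rough sketch rather than the stage's actual code: the function name remove_words is hypothetical, and whole-word, case-insensitive matching is an assumption (the real stage may tokenize differently).

```python
import re


def remove_words(term: str, words: list[str]) -> str:
    """Drop every listed word or phrase from `term` (whole-word, case-insensitive)."""
    for word in words:
        pattern = r"\b" + re.escape(word) + r"\b"
        term = re.sub(pattern, " ", term, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return " ".join(term.split())


print(remove_words("test songs by rick astley live", ["test", "rick astley"]))
# songs by live
```

Because matching is on word boundaries, a term such as "testing" is left untouched by the entry "test" in this sketch.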
Replace Words
Sometimes, we might need to replace certain words with other words. This can be for any number of reasons, such as improving search relevancy for the user.
We can do that in the following way using the pre-built stage replaceWords:
- id: replace words
  use: replaceWords
  inputs:
    data:
      harry: harry potter
We can pass the words we want to replace in the inputs.data field as an object. Every word matching a key in the data field will be replaced with the value passed along with it.
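As an illustration of the key-to-value replacement, the following Python sketch applies a replacement map in a single pass. The function name replace_words and the case-sensitive, whole-word matching are assumptions made for this example, not documented behaviour of the stage.

```python
import re


def replace_words(term: str, replacements: dict[str, str]) -> str:
    """Replace each whole word in `term` that matches a key of `replacements`."""
    if not replacements:
        return term
    alternatives = "|".join(re.escape(key) for key in replacements)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    # A single pass means text inserted by one replacement is not replaced again.
    return pattern.sub(lambda match: replacements[match.group(1)], term)


print(replace_words("harry and the stone", {"harry": "harry potter"}))
# harry potter and the stone
```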
Add Filter
Adding a filter is sometimes a useful way to improve the search results. Let's say we want to add a filter that sets the authors field to Agatha Christie. We can do that using the pre-built stage addFilter.
That can be achieved in the following way:
- id: add filter
  use: addFilter
  inputs:
    data:
      authors: Agatha Christie
In the above case, we pass the data through the inputs.data field. This field should be an object where every key is a field to filter on and the corresponding value is the value that field should match.
Reactive Search
Now that we have applied most query rules, let's finally make the ReactiveSearch call. We can do that by using the pre-built stage reactivesearchQuery. This can be defined in the following way:
- id: reactive search query
  use: reactivesearchQuery
Elasticsearch Query
Once we have executed the ReactiveSearch query, we can continue and hit Elasticsearch with the translated query. This can be done by using the pre-built stage elasticsearchQuery in the following way:
- id: elasticsearch query
  use: elasticsearchQuery
Promote Results
Let's now manipulate the response that we got from Elasticsearch. Let's say we want to modify the response and add a new item in the 5th position. We can do that by using the pre-built stage promoteResults.
This stage can be defined in the following way:
- id: promote 5th result
  use: promoteResults
  inputs:
    data:
      - doc:
          _id: inserted_5
          _source:
            title: This is the 5th result
        position: 5
We pass the data to be inserted through the inputs.data field, as shown above. The data field should be an array of objects. Each object should contain the following fields:
doc: The document to insert.
position: The position at which the document is to be inserted.
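The insertion can be pictured with a small Python sketch that works on a plain list of hits. It assumes that position is 1-based (so position 5 becomes the 5th item), which matches the example above but is not confirmed here as the stage's exact rule; the function name promote_results and the sample _id values are made up for illustration.

```python
def promote_results(hits: list[dict], promoted: list[dict]) -> list[dict]:
    """Return a new hit list with each promoted doc inserted at its 1-based position."""
    result = list(hits)
    # Insert lower positions first so later inserts land where they were asked to.
    for item in sorted(promoted, key=lambda entry: entry["position"]):
        index = max(item["position"] - 1, 0)
        # list.insert appends when the index is past the end of the list.
        result.insert(index, item["doc"])
    return result


hits = [{"_id": f"book_{n}"} for n in range(1, 7)]
promoted = [
    {
        "doc": {"_id": "inserted_5", "_source": {"title": "This is the 5th result"}},
        "position": 5,
    }
]
print([hit["_id"] for hit in promote_results(hits, promoted)])
# ['book_1', 'book_2', 'book_3', 'book_4', 'inserted_5', 'book_5', 'book_6']
```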
Hide Results
As explained above, at times we might want to hide certain results. This can be achieved by passing the _id of each such document to the pre-built stage hideResults.
This stage can be defined in the following way:
- id: hide results
  use: hideResults
  inputs:
    data:
      - some_id_to_remove
We pass the IDs to hide as an array in the inputs.data field, and any result with a matching _id (if present in the response) will be removed.
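In effect, hiding results is a filter over the hits by _id. A minimal Python sketch of that idea follows; the function name hide_results and the flat list of hits are assumptions made for illustration.

```python
def hide_results(hits: list[dict], hidden_ids: list[str]) -> list[dict]:
    """Return the hits whose `_id` is not listed in `hidden_ids`."""
    hidden = set(hidden_ids)
    return [hit for hit in hits if hit.get("_id") not in hidden]


hits = [{"_id": "a"}, {"_id": "some_id_to_remove"}, {"_id": "b"}]
print(hide_results(hits, ["some_id_to_remove"]))
# [{'_id': 'a'}, {'_id': 'b'}]
```

An ID that does not occur in the response simply has no effect.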
Custom Data
Let's say we want to add some custom data to the response body. We can do that through the customData pre-built stage. This stage can be defined in the following way:
- id: custom data
  use: customData
  inputs:
    data:
      reference: Hercule Poirot
In the above example, the response will have a custom key added to it. This key will be reference and its value will be Hercule Poirot.
Basically, the object passed in the inputs.data field is inserted as is at the root level of the response body.
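Inserting at the root level amounts to merging the object into the top-level keys of the response. The Python sketch below shows that idea; the function name add_custom_data and the placeholder response body are hypothetical, and how the real stage handles a key that already exists is not covered by this guide.

```python
def add_custom_data(response_body: dict, data: dict) -> dict:
    """Return a copy of `response_body` with the keys of `data` added at the root."""
    return {**response_body, **data}


# Placeholder response body, used only to show where the new key lands.
body = {"search": {"hits": []}}
print(add_custom_data(body, {"reference": "Hercule Poirot"}))
# {'search': {'hits': []}, 'reference': 'Hercule Poirot'}
```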
Create the pipeline
Now that we have the whole pipeline defined, we can create the pipeline by hitting the ReactiveSearch instance.
The URL we will hit is /_pipeline with a POST request.
The above endpoint expects a multipart/form-data body with the pipeline key containing the path to the pipeline file. All the scriptRef files can be passed as separate keys in the form data and will be parsed by the API automatically. Read more about this endpoint here.
We can create the pipeline with the following request:
The request below assumes that all the files mentioned in this guide are present in the current directory.
curl -X POST 'CLUSTER_ID/_pipeline' -H "Content-Type: multipart/form-data" --form "pipeline=@pipeline.yaml"
Testing the pipeline
This pipeline can be tested with the following request. We will hit the URL CLUSTER_ID/good-books-ds/_reactivesearch.
The body passed will be the following:
{
  "query": [
    {
      "value": "some query"
    }
  ]
}
Note that the dataField will be automatically added by searchRelevancy. The search term will also be replaced and manipulated.
Hit the above request using the following cURL to see the magic happen:
curl -X POST CLUSTER_ID/good-books-ds/_reactivesearch -H "Content-Type: application/json" -d '{"query": [{"value": "some query"}]}'
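For readers who prefer a script over cURL, the same test request can be built with Python's standard library. This is a sketch under assumptions: the base URL http://localhost:8000 is a made-up stand-in for CLUSTER_ID, the helper name build_search_request is hypothetical, and any credentials your cluster requires are left out.

```python
import json
import urllib.request


def build_search_request(base_url: str, term: str) -> urllib.request.Request:
    """Build the POST request for the overridden _reactivesearch route."""
    body = {"query": [{"value": term}]}
    return urllib.request.Request(
        url=f"{base_url}/good-books-ds/_reactivesearch",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; replace it with your own cluster URL.
request = build_search_request("http://localhost:8000", "some query")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running cluster, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the body with json.dumps guarantees that the payload is valid JSON, which is easy to get wrong when writing it by hand inside a shell command.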
Above and Beyond
Let's say we want to get more out of the above pre-built stages. We can make the data passed to them dynamic. Suppose we have a complex JavaScript script that determines whether or not a certain query is to be replaced with a new query.
We can use this script in a custom stage through the scriptRef field. Once the script runs, all it needs to do is store the new search term in the context.
Let's say we save the new search term in the context with the key newSearchTerm. Once we do that, we can access this new search term in the inputs.data field for replaceSearchTerm. This can be achieved in the following way:
The example below assumes that the stage that adds the newSearchTerm field has its id set to determine search term.
- id: replace dynamic search term
  use: replaceSearchTerm
  needs:
    - determine search term
  inputs:
    data: '{{newSearchTerm}}'
Yes, it's as simple as that and the pipeline will take care of the rest.
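The flow can be mimicked outside the pipeline to see how the pieces fit together. The Python sketch below is only a model: determine_search_term stands in for the JavaScript script (its rule is invented for this example), and render imitates how a {{key}} placeholder could be resolved from the context. The real resolution logic lives inside ReactiveSearch and may differ.

```python
import re


def determine_search_term(term: str) -> dict:
    """Hypothetical stand-in for the custom script: decide on a new search term."""
    new_term = "harry potter" if term.strip().lower() == "harry" else term
    # The returned key is what a later stage reads from the context.
    return {"newSearchTerm": new_term}


def render(template: str, context: dict) -> str:
    """Replace every {{key}} placeholder in `template` with the context value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


context = {}
context.update(determine_search_term("Harry"))
print(render("{{newSearchTerm}}", context))
# harry potter
```

In this model the needs entry corresponds to running determine_search_term before render, so the key already exists in the context when the placeholder is resolved.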