Some ways to optimize Elasticsearch queries
Search with as few fields as possible
The more fields a query_string or multi_match query targets, the slower it runs. A common technique to improve search speed across multiple fields is to copy their values into a single field at index time and then search that field instead. This can be automated with the copy_to mapping parameter. Here is an example in which the values of name and plot are copied into name_and_plot:
    PUT movies
    {
      "mappings": {
        "properties": {
          "name_and_plot": {
            "type": "text"
          },
          "name": {
            "type": "text",
            "copy_to": "name_and_plot"
          },
          "plot": {
            "type": "text",
            "copy_to": "name_and_plot"
          }
        }
      }
    }
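With this mapping in place, a search only needs to target the combined field. The following is a minimal sketch that builds both request bodies as plain Python dictionaries so the difference is visible. It does not contact a cluster, and the helper names (multi_field_query, combined_field_query) and the sample search text are illustrative assumptions, not part of any client library.

```python
import json


def multi_field_query(text, fields):
    """Request body for a multi_match search across several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def combined_field_query(text, field="name_and_plot"):
    """Request body for a single match search on the copy_to target field."""
    return {"query": {"match": {field: text}}}


# Sample search text is made up for illustration.
before = multi_field_query("space station", ["name", "plot"])
after = combined_field_query("space station")

# Either body could be sent as the JSON payload of GET movies/_search.
print(json.dumps(before))
print(json.dumps(after))
```

The second body touches one field instead of two, which is the point of copying the values at index time.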
Pre-index data
Use patterns in your queries to optimize how the data is indexed. For example, if all your documents have a price field and most queries run range aggregations over a fixed list of ranges (for example, 0-10, 10-100, 100-1000, …), you can make these aggregations faster by pre-indexing the ranges and using a terms aggregation instead. Suppose documents look like this:
    PUT index/_doc/1
    {
      "designation": "spoon",
      "price": 13
    }
A range aggregation on the price field would look like this:
    GET index/_search
    {
      "aggs": {
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              { "to": 10 },
              { "from": 10, "to": 100 },
              { "from": 100 }
            ]
          }
        }
      }
    }
Instead, we can add a price_range field, mapped as keyword, and store each document's range in it at index time:
    PUT index
    {
      "mappings": {
        "properties": {
          "price_range": {
            "type": "keyword"
          }
        }
      }
    }

    PUT index/_doc/1
    {
      "designation": "spoon",
      "price": 13,
      "price_range": "10-100"
    }
Searches can then run a terms aggregation on the price_range field instead of a range aggregation on price:
    GET index/_search
    {
      "aggs": {
        "price_ranges": {
          "terms": {
            "field": "price_range"
          }
        }
      }
    }
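The price_range value has to be computed by whatever code indexes the documents. One possible way to do that is sketched below, assuming the same three buckets as the range aggregation (lower bound inclusive, upper bound exclusive). Only the "10-100" label comes from the example; the other two labels and the function names are assumptions made for illustration.

```python
from bisect import bisect_right

# Bucket boundaries taken from the range aggregation: <10, 10 to <100, >=100.
BOUNDARIES = [10, 100]
# Only "10-100" appears in the example; the other labels are assumed.
LABELS = ["*-10", "10-100", "100-*"]


def price_range(price):
    """Return the label of the bucket that contains the given price."""
    return LABELS[bisect_right(BOUNDARIES, price)]


def with_price_range(doc):
    """Return a copy of the document enriched with its price_range label."""
    return {**doc, "price_range": price_range(doc["price"])}


doc = with_price_range({"designation": "spoon", "price": 13})
print(doc)
```

Because the label is fixed at index time, changing the bucket boundaries later means reindexing the affected documents.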
Prefer the keyword type for identifiers when mapping
Not all numeric data must be mapped as a numeric type. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better suited to term queries and other term-level queries.
Identifiers, such as a product code or ID, are rarely used in range queries; they are usually retrieved with term-level queries.
Consider mapping numeric fields with the type keyword if:
- You do not intend to run range queries on the field
- You need to retrieve data as quickly as possible. A term query on a keyword field is often faster than a term query on a numeric field
If you are not sure how the field will be queried, you can use a multi-field to map it as both a keyword and a numeric type:
    PUT my_index
    {
      "mappings": {
        "properties": {
          "tier": {
            "type": "integer",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
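With a multi-field mapping like this, exact-match lookups can target the tier.keyword sub-field while range queries keep using the numeric tier field. The sketch below only builds the two request bodies as Python dictionaries; the helper names and the sample values are assumptions for illustration.

```python
import json


def term_on_keyword(value, field="tier"):
    """Exact-match lookup against the keyword sub-field (values are strings)."""
    return {"query": {"term": {f"{field}.keyword": str(value)}}}


def range_on_numeric(gte, lte, field="tier"):
    """Range lookup against the numeric parent field."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


exact = term_on_keyword(3)
span = range_on_numeric(1, 5)

# Either body could be sent as the JSON payload of GET my_index/_search.
print(json.dumps(exact))
print(json.dumps(span))
```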
Avoid script usage
If possible, avoid using scripts in searches. Scripts cannot make use of index structures, which results in slower searches.
If you often use scripts to transform indexed data, you can speed up searches by transforming the data before indexing instead. However, this means you will spend more time indexing.
An index, my_test_scores, contains two long fields:
- math_score
- verbal_score
When running a search, users often use scripts to sort results by the sum of these two field values:
    GET /my_test_scores/_search
    {
      "query": {
        "term": {
          "grad_year": "2020"
        }
      },
      "sort": [
        {
          "_script": {
            "type": "number",
            "script": {
              "source": "doc['math_score'].value + doc['verbal_score'].value"
            },
            "order": "desc"
          }
        }
      ]
    }
To speed up the search, you can perform this calculation at index time and store the result in a separate field to sort on.
First, add a new field, total_score, to the index. The total_score field will contain the sum of the math_score and verbal_score field values.
    PUT /my_test_scores/_mapping
    {
      "properties": {
        "total_score": {
          "type": "long"
        }
      }
    }
Next, create an ingest pipeline containing a script processor that sums math_score and verbal_score and stores the result in the total_score field.
    PUT _ingest/pipeline/my_test_scores_pipeline
    {
      "description": "Calculates the total test score",
      "processors": [
        {
          "script": {
            "source": "ctx.total_score = (ctx.math_score + ctx.verbal_score)"
          }
        }
      ]
    }
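Before reindexing, it can be worth checking the pipeline against a sample document with the simulate pipeline API (POST _ingest/pipeline/my_test_scores_pipeline/_simulate). The sketch below builds a request body for that endpoint and mirrors the script locally to show the value to expect. It does not contact a cluster, and the helper names are assumptions for illustration.

```python
import json


def simulate_body(docs):
    """Wrap sample documents in the shape the _simulate endpoint expects."""
    return {"docs": [{"_source": d} for d in docs]}


def expected_total(doc):
    """Local mirror of the pipeline script: math_score + verbal_score."""
    return doc["math_score"] + doc["verbal_score"]


sample = {
    "student": "kimchy",
    "grad_year": "2020",
    "math_score": 800,
    "verbal_score": 800,
}

body = simulate_body([sample])
print(json.dumps(body))
print(expected_total(sample))
```

If the simulated response shows a total_score that matches the local calculation, the pipeline behaves as intended for that document.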
To update existing data, use this pipeline to reindex the documents from my_test_scores into a new index, for example my_test_scores_2.
    POST /_reindex
    {
      "source": {
        "index": "my_test_scores"
      },
      "dest": {
        "index": "my_test_scores_2",
        "pipeline": "my_test_scores_pipeline"
      }
    }
Continue to use the pipeline when indexing any new documents into my_test_scores_2.
    POST /my_test_scores_2/_doc/?pipeline=my_test_scores_pipeline
    {
      "student": "kimchy",
      "grad_year": "2020",
      "math_score": 800,
      "verbal_score": 800
    }
Finally, users can sort on the total_score field instead of using a script:
    GET /my_test_scores_2/_search
    {
      "query": {
        "term": {
          "grad_year": "2020"
        }
      },
      "sort": [
        {
          "total_score": {
            "order": "desc"
          }
        }
      ]
    }
References
https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.htm