Some ways to optimize Elasticsearch queries

Tuesday, 23/06/2020

Tram Ho

Search with as few fields as possible

The more fields a query_string or multi_match query targets, the slower it runs. A common technique to improve search speed across multiple fields is to copy their values into a single field at index time and then search that one field. This can be automated with the copy_to mapping parameter. Here is an example:

PUT movies
{
  "mappings": {
    "properties": {
      "name_and_plot": {
        "type": "text"
      },
      "name": {
        "type": "text",
        "copy_to": "name_and_plot"
      },
      "plot": {
        "type": "text",
        "copy_to": "name_and_plot"
      }
    }
  }
}
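
As a rough illustration of what changes at query time, the Python sketch below builds two request bodies as plain dictionaries: a multi_match over both source fields, and a single match against the combined name_and_plot field. The helper names and the sample search text are hypothetical; only the field names come from the mapping above.

```python
import json


def multi_field_query(text: str) -> dict:
    """Body for a multi_match query that searches name and plot separately."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["name", "plot"],
            }
        }
    }


def combined_field_query(text: str) -> dict:
    """Body for a match query on the copy_to target, name_and_plot."""
    return {"query": {"match": {"name_and_plot": text}}}


if __name__ == "__main__":
    # Hypothetical search text; the printed body could be sent to GET movies/_search.
    print(json.dumps(combined_field_query("space station"), indent=2))
```

The second body touches one field instead of two, which is the saving this section describes.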

Pre-index data

You should make use of patterns in your queries to optimize how the data is indexed. For example, if all your documents have a price field and most queries run range aggregations over a fixed list of ranges (for example 0-10, 10-100, 100-1000, ...), you can make these aggregations faster by pre-indexing the range each document falls into and using a terms aggregation instead. Suppose documents look like this:

PUT index/_doc/1
{
  "designation": "spoon",
  "price": 13
}

Without pre-indexing, the ranges are computed at search time with a range aggregation:

GET index/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 10 },
          { "from": 10, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}

Alternatively, we can add a keyword field, price_range, and store the range each document falls into at index time:

PUT index
{
  "mappings": {
    "properties": {
      "price_range": {
        "type": "keyword"
      }
    }
  }
}

PUT index/_doc/1
{
  "designation": "spoon",
  "price": 13,
  "price_range": "10-100"
}
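
The price_range value has to be computed somewhere before the document is indexed. A minimal Python sketch of that step follows, assuming the same boundaries as the range aggregation above (lower bound inclusive, upper bound exclusive). The bucket labels other than "10-100" and the helper names are hypothetical.

```python
# Assumed buckets, mirroring the range aggregation above:
# (lower bound inclusive, upper bound exclusive, label). None means unbounded.
PRICE_BUCKETS = [
    (None, 10, "0-10"),
    (10, 100, "10-100"),
    (100, None, "100+"),
]


def price_range_label(price: float) -> str:
    """Return the label of the bucket that contains price."""
    for lower, upper, label in PRICE_BUCKETS:
        above_lower = lower is None or price >= lower
        below_upper = upper is None or price < upper
        if above_lower and below_upper:
            return label
    # Only reachable for values that compare false everywhere, such as NaN.
    raise ValueError(f"no bucket for price {price!r}")


def with_price_range(doc: dict) -> dict:
    """Return a copy of doc with a price_range field added."""
    enriched = dict(doc)
    enriched["price_range"] = price_range_label(doc["price"])
    return enriched


if __name__ == "__main__":
    print(with_price_range({"designation": "spoon", "price": 13}))
```

Keeping the boundaries in one list makes it easier to keep the indexed labels and any remaining range aggregations in agreement.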

Search requests can then aggregate on the price_range field with a terms aggregation instead of running a range aggregation on price:

GET index/_search
{
  "aggs": {
    "price_ranges": {
      "terms": {
        "field": "price_range"
      }
    }
  }
}

Prefer the keyword type when mapping identifiers

Not all numeric data needs to be mapped as a numeric field type. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better for term and other term-level queries.

Identifiers, such as a product code or ID, are rarely used in range queries. They are usually retrieved with term-level queries instead.

Consider mapping numeric identifiers as keyword if:

You do not plan to search the field with range queries
Fast retrieval is important. term queries on keyword fields are often faster than term queries on numeric fields

If you are unsure which type to use, you can use a multi-field to map the data as both a numeric type and a keyword:

PUT my_index
{
  "mappings": {
    "properties": {
      "tier": {
        "type": "integer",
        "fields": {
          "keyword": { 
            "type":  "keyword"
          }
        }
      }
    }
  }
}
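
With this mapping, each kind of query can target the representation that suits it. The Python sketch below builds both request bodies as dictionaries, assuming the keyword sub-field holds the string form of the number. The helper names are hypothetical.

```python
def tier_term_query(value: int) -> dict:
    """Exact-match lookup against the keyword sub-field, tier.keyword."""
    # Assumption: the keyword sub-field indexes the number as its string form.
    return {"query": {"term": {"tier.keyword": str(value)}}}


def tier_range_query(minimum: int, maximum: int) -> dict:
    """Range lookup against the integer field, both bounds inclusive."""
    return {"query": {"range": {"tier": {"gte": minimum, "lte": maximum}}}}


if __name__ == "__main__":
    print(tier_term_query(3))
    print(tier_range_query(1, 5))
```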

Avoid script usage

If possible, avoid using scripts in searches. Scripts cannot make use of index structures, so using them in search queries can result in slower searches.

If you often use scripts to transform indexed data, you can speed up search by making these changes before the data is indexed instead. However, this usually means slower indexing.

For example, an index named my_test_scores contains two long fields:

math_score
verbal_score

When running a search, users often use scripts to sort results by the sum of these two field values:

GET /my_test_scores/_search
{
  "query": {
    "term": {
      "grad_year": "2020"
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "source": "doc['math_score'].value + doc['verbal_score'].value"
        },
        "order": "desc"
      }
    }
  ]
}

To speed up the search, you can perform this calculation at index time and store the result in a separate field to sort on.

First, add a new field, total_score, to the index. The total_score field will contain the sum of the math_score and verbal_score field values.

PUT /my_test_scores/_mapping
{
  "properties": {
    "total_score": {
      "type": "long"
    }
  }
}

Next, create an ingest pipeline with a script processor that sums math_score and verbal_score and stores the result in the total_score field.

PUT _ingest/pipeline/my_test_scores_pipeline
{
  "description": "Calculates the total test score",
  "processors": [
    {
      "script": {
        "source": "ctx.total_score = (ctx.math_score + ctx.verbal_score)"
      }
    }
  ]
}

To update existing data, use this pipeline to reindex all documents from my_test_scores into a new index, for example my_test_scores_2.

POST /_reindex
{
  "source": {
    "index": "my_test_scores"
  },
  "dest": {
    "index": "my_test_scores_2",
    "pipeline": "my_test_scores_pipeline"
  }
}

Continue using the pipeline to index any new documents into my_test_scores_2.

POST /my_test_scores_2/_doc/?pipeline=my_test_scores_pipeline
{
  "student": "kimchy",
  "grad_year": "2020",
  "math_score": 800,
  "verbal_score": 800
}
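
To check what the pipeline should store for the document above, the same sum can be reproduced on the client side. The Python sketch below is not part of the pipeline approach; it is a hypothetical helper that mirrors the script processor's arithmetic, and it could also serve as an alternative when documents are enriched before being sent.

```python
def with_total_score(doc: dict) -> dict:
    """Return a copy of doc with total_score = math_score + verbal_score."""
    enriched = dict(doc)  # leave the caller's document unchanged
    enriched["total_score"] = doc["math_score"] + doc["verbal_score"]
    return enriched


if __name__ == "__main__":
    sample = {
        "student": "kimchy",
        "grad_year": "2020",
        "math_score": 800,
        "verbal_score": 800,
    }
    print(with_total_score(sample)["total_score"])
```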

Finally, users can sort on the total_score field instead of using a script:

GET /my_test_scores_2/_search
{
  "query": {
    "term": {
      "grad_year": "2020"
    }
  },
  "sort": [
    {
      "total_score": {
        "order": "desc"
      }
    }
  ]
}

References

https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.htm

Source : Viblo
