Function score in Elasticsearch

Tram Ho

1. Introduction.

At the heart of the Elasticsearch search engine is the concept of a score, which most of us know from TF-IDF.

The score measures how relevant a document is to the search keywords.

The function score lets you modify the score of the returned documents, which makes it easier to tune what influences the ranking and to produce results that match your requirements more closely.

2. Practical use cases of the function score.

2.1 Add the influence of views to note search results.

In a note-taking application, besides matching text in the notes, we want the view count of a note to influence its score: the more views a note has, the more likely the user is to be searching for it again.

Take the view count of each result and use SUM to add it to the current score, producing the new score.
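As a rough sketch of what such a request body could look like, the Python snippet below builds a function_score query as a plain dict. The field names "content" and "views" and the helper name views_sum_query are assumptions made for illustration; the original post does not name its fields.

```python
import json


def views_sum_query(keyword):
    """Build a function_score body that adds the raw view count to the text score."""
    return {
        "query": {
            "function_score": {
                # Text relevance part; "content" is an assumed field name.
                "query": {"match": {"content": keyword}},
                # Use the raw value of the assumed "views" field as the function value.
                "field_value_factor": {
                    "field": "views",
                    "missing": 0,  # notes without a view count contribute 0
                },
                # new_score = query_score + function_value
                "boost_mode": "sum",
            }
        }
    }


body = views_sum_query("meeting notes")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the search request body by whichever client you use.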

However, this approach has a drawback: adding the raw view count makes the score far too large compared with the text relevance score. Notes with very high view counts then dominate the ranking, saturating the results and pushing out the notes the user actually wants.

Solution:

Apply a logarithm to the view count to reduce its influence on the score. The logarithm rises quickly for small values and flattens out as the view count grows.

The score increase is thus expressed on a much smaller scale, which ensures that the view count does not affect the score too much.

For example:

  • 10 views: log10(10) = 1 => score = score + 1
  • 20 views: log10(20) ≈ 1.30 => score = score + 1.30
  • 50 views: log10(50) ≈ 1.70 => score = score + 1.70
  • 100 views: log10(100) = 2 => score = score + 2
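The numbers in the list can be checked with a few lines of Python. This is only a sketch: the helper name boosted_score is hypothetical, and the query fragment at the end assumes a "views" field. In Elasticsearch, field_value_factor accepts a modifier such as "log" (base 10) or "log1p" (base 10 of value + 1); "log1p" is used here as one option because the logarithm of 0 views is undefined.

```python
import math


def boosted_score(score, views):
    """Add log10(views) to the text score; views is assumed to be >= 1."""
    return score + math.log10(views)


# Print the added value for the view counts used in the list.
for views in (10, 20, 50, 100):
    print(views, "views add", round(math.log10(views), 2))

# Assumed query fragment: the same dampening inside a function_score query.
log_views_function = {
    "field_value_factor": {
        "field": "views",     # assumed field name
        "modifier": "log1p",  # log10(views + 1), safe when views == 0
        "missing": 0,
    }
}
```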

The lower the base of the logarithm, the higher the value added to the score.
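To see how the base changes the added value, the sketch below tabulates the logarithm of a few view counts for bases 2, e and 10. The view counts and the helper names are chosen for illustration only. Base 2 is included purely for comparison and is not meant to correspond to a built-in field_value_factor modifier.

```python
import math

# Bases to compare; the labels are only for display.
BASES = {"log2": 2.0, "ln": math.e, "log10": 10.0}


def added_score(views, base):
    """Value added to the score when using a logarithm with the given base."""
    return math.log(views, base)


def variation_table(view_counts=(10, 20, 50, 100)):
    """Return (views, {label: added value}) rows, rounded to 2 decimals."""
    rows = []
    for views in view_counts:
        row = {name: round(added_score(views, base), 2) for name, base in BASES.items()}
        rows.append((views, row))
    return rows


for views, row in variation_table():
    print(f"{views:>5} views", row)
```

For every view count above 1 the base-2 column is the largest and the base-10 column the smallest, which is the effect described above.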

2.2 Use saturation so that the added value does not exceed the desired threshold.

When the scores of the documents do not differ much, we can use the saturation function to make sure the effect of a quantity never exceeds a desired value. In other words, once the quantity reaches a certain level its effect saturates, which keeps the score formula balanced.

Formula:

Graph:

Since 0 < saturation < 1, you only need to choose a coefficient t (the upper threshold) and use t * saturation as the value added to the score. For example, if the effect of views should never exceed 5, set t = 5.
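A minimal sketch of the capping idea follows. It assumes the saturation form value / (value + pivot), which is how the Elasticsearch script score documentation describes its saturation function. The pivot of 100 views and the helper names are hypothetical; t = 5 is the threshold from the example.

```python
def saturation(value, pivot):
    """value / (value + pivot): 0 at value 0, 0.5 at value == pivot, below 1 always."""
    return value / (value + pivot)


def capped_view_boost(views, pivot=100.0, t=5.0):
    """Boost from views that approaches, but never reaches, the threshold t."""
    return t * saturation(views, pivot)


def boosted_score(score, views, pivot=100.0, t=5.0):
    """Text score plus the capped view boost."""
    return score + capped_view_boost(views, pivot, t)


for views in (0, 10, 100, 1000, 1_000_000):
    print(views, "views add", round(capped_view_boost(views), 3))
```

The pivot decides how fast the boost approaches t: a note with exactly pivot views receives half of the maximum boost.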

2.3 Use the gauss function to boost scores around a focal point.

Suppose you want a search feature that boosts scores within a given range (e.g. of time). For example, we boost notes whose update time lies within about 24 hours of a key point in time, and the closer a note is to that point, the larger the boost. The key point (the time the user is interested in, called the origin in the gauss function) is, for example, 11/13/2020 @ 12:00 am (UTC). The further a note is from this point, the more its score decreases. The scale is about 24 hours around the origin; beyond this scale we no longer need to care about boosting the score.

I converted the time quantities to milliseconds: 24h => 86400000 ms …
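The sketch below reproduces the shape of the gauss decay in plain Python, following the formula given in the Elasticsearch function score documentation (the multiplier equals decay, 0.5 by default, at a distance of scale from the origin). The field name "updated_at" is an assumption, as is storing it as a numeric epoch-millisecond value; for a date field, origin and scale would normally be written in date formats such as "1d" instead.

```python
import math
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000  # 24h expressed in milliseconds

# Origin from the example: 11/13/2020 @ 12:00 am (UTC), as epoch milliseconds.
ORIGIN_MS = int(datetime(2020, 11, 13, tzinfo=timezone.utc).timestamp() * 1000)


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gauss decay multiplier: 1.0 at the origin, `decay` at distance `scale`."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Assumed query fragment for a numeric "updated_at" field (epoch milliseconds).
gauss_function = {
    "gauss": {
        "updated_at": {
            "origin": ORIGIN_MS,
            "scale": DAY_MS,
        }
    }
}

for hours in (0, 6, 12, 24, 48):
    updated = ORIGIN_MS - hours * 60 * 60 * 1000
    print(hours, "h from origin:", round(gauss_decay(updated, ORIGIN_MS, DAY_MS), 3))
```

The multiplier is symmetric around the origin, so notes updated 12 hours before and 12 hours after it receive the same boost.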

Above are some use cases for the function score. I hope you find them helpful.


Source : Viblo