Learn about Elasticsearch Aggregations

Tram Ho

Elasticsearch aggregations let you compute statistics and summaries over your data with a simple search request. You send these requests with the GET method, either from the Dev Tools console in Kibana, with cURL, or through an Elasticsearch API client in your code. In this article, I use Kibana's Dev Tools to run the queries.

Here are two examples of where aggregations are useful:

  1. You run an online clothing business and want to know the average price of all products in the catalog.
  2. You want to check how many products cost less than $100 and how many cost between $100 and $200.
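The two questions above map onto an avg metric aggregation and a range bucket aggregation. The sketch below builds both request bodies as Python dicts that mirror the JSON you would type into Dev Tools. It assumes a hypothetical catalog index with a numeric “price” field; the index, field and aggregation names are illustrative, not part of the Kibana sample data.

```python
import json

# Assumption: a hypothetical catalog index whose documents have a numeric "price" field.
average_price_request = {
    "size": 0,  # return only the aggregation result, not the matching documents
    "aggs": {
        "average_price": {"avg": {"field": "price"}}
    },
}

# Range buckets: "from" is inclusive and "to" is exclusive.
price_ranges_request = {
    "size": 0,
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 100},               # price < 100
                    {"from": 100, "to": 200},  # 100 <= price < 200
                ],
            }
        }
    },
}

print(json.dumps(average_price_request, indent=2))
print(json.dumps(price_ranges_request, indent=2))
```

Each range bucket in the response carries a document count, which answers the "how many products" question directly.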

To start using aggregations, you need a running Elasticsearch installation and some data in an index. In this article, we use the sample eCommerce orders and sample web logs data sets that ship with Kibana.

To load them, go to the Kibana home page and click “Load a data set and a Kibana dashboard”.

Aggregation syntax

aggs: the keyword that marks an aggregation request.

name_of_aggregation: a name you choose for the aggregation; its result is returned under this name.

type_of_aggregation: the type of aggregation to run, such as avg, terms or cardinality.

field: the keyword that introduces the field to aggregate on.

document_field_name: the name of the document field we want to aggregate.
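These pieces nest inside one another. As a sketch, the hypothetical helper aggregation_request below builds that skeleton as a Python dict. The top-level “size” parameter is not part of the syntax described above; setting it to 0 is a common way to get only the aggregation result back.

```python
import json


def aggregation_request(name, agg_type, field, size=0):
    """Build a search body with a single aggregation:
    aggs -> name_of_aggregation -> type_of_aggregation -> field."""
    return {
        "size": size,  # 0 = return no hits, only the aggregation result
        "aggs": {
            name: {                 # name_of_aggregation (chosen by you)
                agg_type: {         # type_of_aggregation, e.g. "avg" or "terms"
                    "field": field  # document_field_name
                }
            }
        },
    }


body = aggregation_request("name_of_aggregation", "type_of_aggregation", "document_field_name")
print(json.dumps(body, indent=2))
```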

For example:
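A minimal sketch of such a query, written as a Python dict that mirrors the JSON body sent with GET kibana_sample_data_logs/_search. It assumes the count is done with a value_count aggregation; the aggregation name “ip_count” is our own choice.

```python
import json

# Body for: GET kibana_sample_data_logs/_search
# Assumption: the count uses a value_count aggregation; "ip_count" is an arbitrary name.
clientip_count_request = {
    "size": 0,  # skip the hits, return only the aggregation
    "aggs": {
        "ip_count": {
            "value_count": {"field": "clientip"}
        }
    },
}

print("GET kibana_sample_data_logs/_search")
print(json.dumps(clientip_count_request, indent=2))
```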

The query above returns the number of “clientip” values in the index “kibana_sample_data_logs”.

The main types of aggregations

Aggregations can be divided into four groups: bucket aggregations, metric aggregations, matrix aggregations, and pipeline aggregations.

  • Metric aggregations: compute values, such as averages or counts, from the field values of the documents being aggregated.
  • Bucket aggregations: instead of computing metrics from fields, they group documents into buckets. Each bucket has a criterion that decides whether a document in the current context belongs to it.
  • Pipeline aggregations: take the output of other aggregations as their input, rather than documents.
  • Matrix aggregations: operate on more than one field and produce statistics computed from the values of those fields across the documents.
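To make the four groups concrete, here is a sketch of a single request body for the eCommerce sample that uses one of each. It assumes the fields “user”, “taxful_total_price” and “total_quantity” from the sample data; the aggregation names are our own, and the combination is illustrative rather than a recommended query.

```python
import json

# Body for: GET kibana_sample_data_ecommerce/_search
# Aggregation names ("per_user", "total_spent", ...) are arbitrary choices.
types_request = {
    "size": 0,
    "aggs": {
        # Bucket aggregation: one bucket per value of "user"
        "per_user": {
            "terms": {"field": "user"},
            "aggs": {
                # Metric aggregation computed inside each bucket
                "total_spent": {"sum": {"field": "taxful_total_price"}}
            },
        },
        # Pipeline aggregation: reads the output of the aggregations above
        "top_spender": {
            "max_bucket": {"buckets_path": "per_user>total_spent"}
        },
        # Matrix aggregation: statistics computed across several numeric fields
        "price_vs_quantity": {
            "matrix_stats": {"fields": ["taxful_total_price", "total_quantity"]}
        },
    },
}

print(json.dumps(types_request, indent=2))
```

Note that the metric sits inside the bucket aggregation as a sub-aggregation, while the pipeline aggregation sits beside it and points at it through “buckets_path”.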

Some important aggregations

Five important aggregations in Elasticsearch are:

  1. Cardinality aggregation
  2. Stats aggregation
  3. Filter aggregation
  4. Terms aggregation
  5. Nested aggregation

Cardinality aggregation

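The cardinality aggregation is a single-value metric aggregation that counts the number of distinct values of a field. The count is approximate (Elasticsearch estimates it with the HyperLogLog++ algorithm), but it is close to exact for fields with relatively few distinct values.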

To find out how many distinct SKUs the eCommerce sample data contains, we run a cardinality aggregation on the “sku” field.
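A sketch of that request as a Python dict, followed by the exact distinct count on a few hypothetical documents, to show what the aggregation estimates. The aggregation name “distinct_skus” and the toy documents are our own.

```python
import json

# Body for: GET kibana_sample_data_ecommerce/_search
sku_cardinality_request = {
    "size": 0,
    "aggs": {
        "distinct_skus": {
            "cardinality": {"field": "sku"}
        }
    },
}
print(json.dumps(sku_cardinality_request, indent=2))

# What cardinality estimates, computed exactly on hypothetical documents
# whose "sku" field holds a list of values:
hypothetical_docs = [{"sku": ["A", "B"]}, {"sku": ["A"]}, {"sku": ["C"]}]
distinct_count = len({sku for doc in hypothetical_docs for sku in doc["sku"]})
print(distinct_count)  # 3
```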


Stats Aggregation

The stats aggregation is a multi-value metric aggregation that computes statistics over the numeric values of a field in the aggregated documents.

The statistics returned are min, max, sum, count and avg.

Let's check the statistics of the total_quantity field in the sample eCommerce data.
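A sketch of the request as a Python dict, plus a small local function (hypothetical, for comparison only) that computes the same five statistics on a non-empty list of numbers. The aggregation name “quantity_stats” and the sample values are our own.

```python
import json

# Body for: GET kibana_sample_data_ecommerce/_search
stats_request = {
    "size": 0,
    "aggs": {
        "quantity_stats": {"stats": {"field": "total_quantity"}}
    },
}
print(json.dumps(stats_request, indent=2))


def local_stats(values):
    """Compute count, min, max, avg and sum for a non-empty list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Hypothetical total_quantity values from four orders
print(local_stats([2, 4, 1, 3]))
```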


Filter Aggregation

The filter aggregation is a bucket aggregation that defines a single bucket containing all documents that match a filter condition. You can then run further calculations on the documents in that bucket.

For example, we keep only the documents whose user is “eddie” and calculate the average price of the products that user bought.
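A sketch of that request as a Python dict. It assumes the product price is read from the “products.price” field of the eCommerce sample; the names “eddie_orders” and “avg_price” are arbitrary.

```python
import json

# Body for: GET kibana_sample_data_ecommerce/_search
# Assumption: the price comes from "products.price".
eddie_request = {
    "size": 0,
    "aggs": {
        "eddie_orders": {
            # Single bucket: only documents whose user is "eddie"
            "filter": {"term": {"user": "eddie"}},
            "aggs": {
                # Metric computed only on the documents in that bucket
                "avg_price": {"avg": {"field": "products.price"}}
            },
        }
    },
}

print(json.dumps(eddie_request, indent=2))
```

The filter goes in the bucket aggregation and the calculation goes in its “aggs” sub-section, so the average only sees eddie's documents.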


Terms Aggregation

The terms aggregation is a bucket aggregation that builds its buckets dynamically from field values: each distinct value of the specified field gets its own bucket. By default, only the 10 buckets with the most documents are returned; the size parameter changes this number.

In this example, we run a terms aggregation on the “user” field. The result contains one bucket per user, and each bucket holds the number of documents for that user.

Our query is:
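A sketch of the request as a Python dict, followed by a local illustration on hypothetical user values of how documents are grouped into key/doc_count buckets, highest count first. The aggregation name and the user list are our own.

```python
import json
from collections import Counter

# Body for: GET kibana_sample_data_ecommerce/_search
user_terms_request = {
    "size": 0,
    "aggs": {
        "orders_per_user": {
            # One bucket per distinct "user" value; 10 is the default bucket count
            "terms": {"field": "user", "size": 10}
        }
    },
}
print(json.dumps(user_terms_request, indent=2))

# Local illustration on hypothetical "user" values, one per document:
users = ["eddie", "mary", "eddie", "sonya", "eddie", "mary"]
buckets = [
    {"key": user, "doc_count": count}
    for user, count in Counter(users).most_common()  # sorted by count, descending
]
print(buckets)
```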


Nested Aggregation

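The nested aggregation is one of the most important bucket aggregations. It lets you aggregate fields inside nested documents: the objects stored in a field of type “nested”, which Elasticsearch indexes as separate hidden documents so that each object's sub-fields stay associated with one another.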

To use the nested aggregation on a field, that field must have the “nested” type in the index mapping.

The sample eCommerce data has no field of type “nested”, so we create a new index with a field “Employee” of type “nested”.
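A sketch of such a mapping as a Python dict. The index name “nested_aggregation” and the sub-fields of “Employee” (first, last, salary) are hypothetical choices for illustration.

```python
import json

# Body for: PUT nested_aggregation   (hypothetical index name)
# The sub-fields first, last and salary are also assumptions.
nested_mapping = {
    "mappings": {
        "properties": {
            "Employee": {
                "type": "nested",  # each object is indexed as its own hidden document
                "properties": {
                    "first": {"type": "text"},
                    "last": {"type": "text"},
                    "salary": {"type": "double"},
                },
            }
        }
    }
}

print("PUT nested_aggregation")
print(json.dumps(nested_mapping, indent=2))
```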

Next, we add a few documents to the new index.
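One way to do this is the _bulk API, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline. The sketch below builds such a body in Python for a hypothetical index named “nested_aggregation”; the employee names and salaries are made-up sample values.

```python
import json

# Hypothetical documents, each with an array of nested "Employee" objects.
docs = [
    {"Employee": [
        {"first": "Anna", "last": "Nguyen", "salary": 4200},
        {"first": "Binh", "last": "Tran", "salary": 3100},
    ]},
    {"Employee": [
        {"first": "Chi", "last": "Le", "salary": 5000},
    ]},
]

# One action line ({"index": {}} uses the index from the URL) plus one source line per document.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"  # the bulk body must end with a newline

print("POST nested_aggregation/_bulk")
print(bulk_body, end="")
```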

With this data in place, we can run a nested aggregation on the “Employee” field.
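A sketch of a nested aggregation that finds the lowest salary, assuming an index named “nested_aggregation” whose “Employee” objects have a numeric “salary” sub-field (both hypothetical). The local part shows, on made-up documents, that the nested bucket counts the nested objects rather than the top-level documents.

```python
import json

# Body for: GET nested_aggregation/_search   (hypothetical index name)
nested_request = {
    "size": 0,
    "aggs": {
        "employees": {
            # Step into the nested "Employee" objects
            "nested": {"path": "Employee"},
            "aggs": {
                # Sub-aggregations now run on the nested objects themselves
                "min_salary": {"min": {"field": "Employee.salary"}}
            },
        }
    },
}
print(json.dumps(nested_request, indent=2))

# Local illustration on hypothetical documents: 2 top-level docs, 3 nested objects.
hypothetical_docs = [
    {"Employee": [{"salary": 4200}, {"salary": 3100}]},
    {"Employee": [{"salary": 5000}]},
]
nested_objects = [emp for doc in hypothetical_docs for emp in doc["Employee"]]
min_salary_local = min(emp["salary"] for emp in nested_objects)
print(len(nested_objects), min_salary_local)
```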


Summary

This article covered some common ways to use aggregations. Other aggregations you may find useful include:

  • Date histogram aggregation: groups documents into buckets by date interval.
  • Scripted metric aggregation: computes a metric using scripts.
  • Top hits aggregation: returns the top matching documents, for example within each bucket.
  • Range aggregation: groups documents into buckets by a set of value ranges.
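As an illustration of two of these, the sketch below combines a date histogram with a top hits sub-aggregation on the eCommerce sample: one bucket per day, each holding its most expensive order. It assumes the sample fields “order_date” and “taxful_total_price” and Elasticsearch 7.2 or later, where the parameter is called “calendar_interval” (older versions used “interval”). The aggregation names are our own.

```python
import json

# Body for: GET kibana_sample_data_ecommerce/_search
daily_request = {
    "size": 0,
    "aggs": {
        "orders_per_day": {
            # Date histogram: one bucket per calendar day of "order_date"
            "date_histogram": {"field": "order_date", "calendar_interval": "day"},
            "aggs": {
                # Top hits: the single most expensive order in each daily bucket
                "most_expensive_order": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"taxful_total_price": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}

print(json.dumps(daily_request, indent=2))
```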

There are many other, less common aggregations that this article does not cover. To learn more, see the official Elasticsearch aggregations documentation listed in the references.

References

https://logz.io/blog/elasticsearch-aggregations/

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html


Source: Viblo