Elasticsearch aggregations help you perform calculations and compute statistics over your data with a simple search query. You can run these queries with GET requests from Dev Tools in the Kibana UI, with cURL, or through an API client in your code. In this article, I will use Dev Tools in the Kibana UI to run the queries.
Here are two examples of where aggregations are useful:
- You run an online clothing business and want to know the average price of all products in the catalog.
- You want to check how many products cost up to $100 and how many cost between $100 and $200.
To start using aggregations, you need to install Elasticsearch and have some data in an Elasticsearch index. In this article, we will use the sample eCommerce orders and sample web logs data sets provided by Kibana.
You can get the data by going to the Kibana homepage and clicking “Load a data set and a Kibana dashboard”.
Aggregation Syntax
    "aggs": {
      "name_of_aggregation": {
        "type_of_aggregation": {
          "field": "document_field_name"
        }
      }
    }
aggs: this keyword indicates that you want to perform an aggregation.
name_of_aggregation: the name you choose for the aggregation; it labels the result in the response.
type_of_aggregation: the type of aggregation to run, for example value_count or avg.
field: the keyword that introduces the target field.
document_field_name: the name of the document field to aggregate on.
For example:
    GET /kibana_sample_data_logs/_search
    {
      "size": 0,
      "aggs": {
        "ip_count": {
          "value_count": {
            "field": "clientip"
          }
        }
      }
    }
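The query above returns the number of “clientip” values in the kibana_sample_data_logs index. The count appears under aggregations.ip_count.value in the response, and "size": 0 keeps the matching documents themselves out of the response.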
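The same request can also be prepared from code instead of Dev Tools. The following is a minimal Python sketch using only the standard library. The helper names build_agg and build_search_request are hypothetical, and the base URL http://localhost:9200 assumes an unsecured local node. The sketch only builds the request; the lines that would send it are left as comments.

```python
import json
import urllib.request


def build_agg(name, agg_type, field):
    """Return an "aggs" clause that follows the syntax described above."""
    return {"aggs": {name: {agg_type: {"field": field}}}}


def build_search_request(base_url, index, body):
    """Build (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "size": 0 suppresses the search hits so only the aggregation is returned.
body = {"size": 0, **build_agg("ip_count", "value_count", "clientip")}

# Assumption: a local, unsecured Elasticsearch node. Adjust for your cluster.
request = build_search_request(
    "http://localhost:9200", "kibana_sample_data_logs", body
)

# To run it against a live cluster:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["aggregations"]["ip_count"]["value"])
```

Keeping the body as a plain dictionary makes it easy to copy a query between Dev Tools and code, since the JSON is identical in both places.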
The main types of aggregations
Aggregations can be divided into four groups: bucket aggregations, metric aggregations, matrix aggregations, and pipeline aggregations.
- Metric aggregations: compute a value, such as a sum or an average, from field values in the documents being aggregated.
- Bucket aggregations: do not compute values from fields the way metric aggregations do; instead they group documents into buckets. Each bucket has a criterion that determines whether a document in the current context belongs to it.
- Pipeline aggregations: take their input from the output of other aggregations rather than from documents.
- Matrix aggregations: operate on multiple fields and produce a matrix of statistics from the values extracted from those fields.
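To make the difference between these groups concrete, here is a small Python sketch that imitates metric, bucket, and pipeline aggregations on a handful of in-memory documents. The documents and field names are hypothetical, and the code only mirrors the ideas; it is not how Elasticsearch computes them. Matrix aggregations are left out of the sketch.

```python
from collections import defaultdict

# Hypothetical documents; not taken from the Kibana sample data.
docs = [
    {"category": "shoes", "price": 50.0},
    {"category": "shoes", "price": 70.0},
    {"category": "shirts", "price": 20.0},
    {"category": "shirts", "price": 40.0},
    {"category": "hats", "price": 15.0},
]

# Metric aggregation: one number computed over all documents.
avg_price = sum(d["price"] for d in docs) / len(docs)

# Bucket aggregation: documents are grouped, nothing is computed yet.
buckets = defaultdict(list)
for d in docs:
    buckets[d["category"]].append(d)

# A metric aggregation nested inside each bucket: average price per category.
avg_per_bucket = {
    key: sum(d["price"] for d in members) / len(members)
    for key, members in buckets.items()
}

# Pipeline aggregation: works on the output of another aggregation,
# here the highest of the per-category averages.
max_avg = max(avg_per_bucket.values())

print(avg_price)       # 39.0
print(avg_per_bucket)  # {'shoes': 60.0, 'shirts': 30.0, 'hats': 15.0}
print(max_avg)         # 60.0
```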
Some important aggregations
Five important aggregations in Elasticsearch are:
- Cardinality aggregation
- Stats aggregation
- Filter aggregation
- Terms aggregation
- Nested aggregation
Cardinality Aggregation
This is a single-value metric aggregation that calculates the number of distinct values of a particular field. The count is approximate.
To find out how many unique SKUs there are in the e-commerce data, we run this query:
    GET /kibana_sample_data_ecommerce/_search
    {
      "size": 0,
      "aggs": {
        "unique_skus": {
          "cardinality": {
            "field": "sku"
          }
        }
      }
    }
The result is a single number, returned under aggregations.unique_skus.value in the response.
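What the cardinality aggregation counts can be illustrated with a short Python sketch. The orders below are hypothetical, and the sketch assumes a field may hold either one value or a list of values. Note that this version counts exactly, whereas Elasticsearch uses an approximate algorithm (HyperLogLog++), so results on large data sets can differ slightly.

```python
# Hypothetical orders; a field may hold a single value or a list of values.
orders = [
    {"sku": ["A-1", "B-2"]},
    {"sku": ["B-2"]},
    {"sku": ["C-3", "A-1"]},
]


def exact_cardinality(documents, field):
    """Count the distinct values of a field, flattening list values."""
    seen = set()
    for doc in documents:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        seen.update(values)
    return len(seen)


unique_skus = exact_cardinality(orders, "sku")
print(unique_skus)  # 3
```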
Stats Aggregation
This is a multi-value metric aggregation that calculates statistics over numeric values from the aggregated documents.
The statistics returned are min, max, sum, count and avg.
Try checking the statistics of the total_quantity field in the sample data:
    GET /kibana_sample_data_ecommerce/_search
    {
      "size": 0,
      "aggs": {
        "quantity_stats": {
          "stats": {
            "field": "total_quantity"
          }
        }
      }
    }
Result: the response contains count, min, max, avg and sum under aggregations.quantity_stats.
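As a rough illustration of what the stats aggregation returns, the following Python sketch computes the same five statistics over a list of numbers. The function name and the sample quantities are hypothetical, and the handling of an empty input is a choice made for this sketch.

```python
def stats(values):
    """Compute count, min, max, avg and sum, like a stats aggregation."""
    values = list(values)
    count = len(values)
    if count == 0:
        # Sketch choice for empty input: nothing to take min/max/avg of.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / count,
        "sum": total,
    }


# Hypothetical total_quantity values from four orders.
quantity_stats = stats([2, 4, 1, 2])
print(quantity_stats)
# {'count': 4, 'min': 1.0, 'max': 4.0, 'avg': 2.25, 'sum': 9.0}
```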
Filter Aggregation
This aggregation belongs to the bucket aggregations. It defines a single bucket containing the documents that match the filter condition, and you can run further calculations on the documents in that bucket.
For example, we filter the documents whose user is “eddie” and calculate the average price of the products that person bought.
    GET /kibana_sample_data_ecommerce/_search
    {
      "size": 0,
      "aggs": {
        "User_based_filter": {
          "filter": {
            "term": { "user": "eddie" }
          },
          "aggs": {
            "avg_price": {
              "avg": { "field": "products.price" }
            }
          }
        }
      }
    }
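Result: under aggregations.User_based_filter, the response reports doc_count (the number of documents that matched the filter) and avg_price.value (the average price within that bucket).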
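The two steps, filtering into one bucket and then running a metric inside it, can be sketched in Python as follows. The orders are hypothetical and much simpler than the sample data; the sketch averages over every product price in the matching orders.

```python
# Hypothetical orders, each with a user and a list of purchased products.
orders = [
    {"user": "eddie", "products": [{"price": 10.0}, {"price": 30.0}]},
    {"user": "mary", "products": [{"price": 99.0}]},
    {"user": "eddie", "products": [{"price": 20.0}]},
]


def filter_then_avg(documents, user):
    """Keep one bucket of matching documents, then average prices inside it."""
    bucket = [d for d in documents if d["user"] == user]
    prices = [p["price"] for d in bucket for p in d["products"]]
    avg = sum(prices) / len(prices) if prices else None
    return {"doc_count": len(bucket), "avg_price": avg}


result = filter_then_avg(orders, "eddie")
print(result)  # {'doc_count': 2, 'avg_price': 20.0}
```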
Terms Aggregation
This is a bucket aggregation that creates buckets from field values. The number of buckets is dynamic: each distinct value of the specified field gets its own bucket.
In the example below, we run a terms aggregation on the “user” field. The result has one bucket per user, and each bucket reports the number of documents for that user.
Our query is:
    GET /kibana_sample_data_ecommerce/_search
    {
      "size": 0,
      "aggs": {
        "Terms_Aggregation": {
          "terms": { "field": "user" }
        }
      }
    }
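Result: the response lists the buckets under aggregations.Terms_Aggregation.buckets, where each bucket has a key (the user) and a doc_count (the number of documents for that user).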
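A terms aggregation behaves much like counting occurrences per value, as this Python sketch shows. The documents are hypothetical. The sketch orders buckets by document count, highest first, and keeps at most size buckets, matching the default behavior of the terms aggregation (whose default size is 10); breaking ties by key is a choice made here for a stable order.

```python
from collections import Counter


def terms_buckets(documents, field, size=10):
    """Return one bucket per distinct value, ordered by doc_count descending."""
    counts = Counter(d[field] for d in documents if field in d)
    # Ties are broken by key so the output order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Hypothetical orders with a "user" field.
orders = [
    {"user": "eddie"},
    {"user": "mary"},
    {"user": "eddie"},
    {"user": "gwen"},
    {"user": "mary"},
    {"user": "eddie"},
]

buckets = terms_buckets(orders, "user")
for bucket in buckets:
    print(bucket)
# {'key': 'eddie', 'doc_count': 3}
# {'key': 'mary', 'doc_count': 2}
# {'key': 'gwen', 'doc_count': 1}
```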
Nested Aggregation
This is one of the most important bucket aggregations. A nested aggregation lets you aggregate over nested documents, that is, the objects stored in a field of type “nested”, each of which has its own sub-fields.
A field must have the “nested” type in the index mapping before you can use a nested aggregation on it.
The sample e-commerce data has no field of type “nested”, so we will create a new index with a field “Employee” whose type is “nested”:
    PUT nested_aggregation
    {
      "mappings": {
        "properties": {
          "Employee": {
            "type": "nested",
            "properties": {
              "first": { "type": "text" },
              "last": { "type": "text" },
              "salary": { "type": "double" }
            }
          }
        }
      }
    }
Add some data to the index we just created:
    PUT nested_aggregation/_doc/1
    {
      "group": "Logz",
      "Employee": [
        {
          "first": "Ana",
          "last": "Roy",
          "salary": "70000"
        },
        {
          "first": "Jospeh",
          "last": "Lein",
          "salary": "64000"
        },
        {
          "first": "Chris",
          "last": "Gayle",
          "salary": "82000"
        },
        {
          "first": "Brendon",
          "last": "Maculum",
          "salary": "58000"
        },
        {
          "first": "Vinod",
          "last": "Kambli",
          "salary": "63000"
        },
        {
          "first": "DJ",
          "last": "Bravo",
          "salary": "71000"
        },
        {
          "first": "Jaques",
          "last": "Kallis",
          "salary": "75000"
        }
      ]
    }
Now we have sample data to perform Nested Aggregation. Look at the example below to see how it works:
    GET /nested_aggregation/_search
    {
      "aggs": {
        "Nested_Aggregation": {
          "nested": {
            "path": "Employee"
          },
          "aggs": {
            "Min_Salary": {
              "min": {
                "field": "Employee.salary"
              }
            }
          }
        }
      }
    }
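Result: under aggregations.Nested_Aggregation, the response reports doc_count (the number of nested Employee objects, 7 for the document above) and Min_Salary.value (the lowest salary, which is 58000 for this data).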
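The effect of the nested aggregation can be sketched in Python: first step into the nested objects under the given path, then run the metric over them. The function name nested_min is hypothetical; the salaries are the ones indexed above, kept as strings and converted to numbers in the sketch.

```python
# The document indexed above, reduced to the fields the aggregation uses.
document = {
    "group": "Logz",
    "Employee": [
        {"first": "Ana", "salary": "70000"},
        {"first": "Jospeh", "salary": "64000"},
        {"first": "Chris", "salary": "82000"},
        {"first": "Brendon", "salary": "58000"},
        {"first": "Vinod", "salary": "63000"},
        {"first": "DJ", "salary": "71000"},
        {"first": "Jaques", "salary": "75000"},
    ],
}


def nested_min(documents, path, field):
    """Collect the nested objects under `path`, then take the minimum of `field`."""
    nested = [obj for doc in documents for obj in doc.get(path, [])]
    values = [float(obj[field]) for obj in nested if field in obj]
    return {"doc_count": len(nested), "min": min(values) if values else None}


result = nested_min([document], "Employee", "salary")
print(result)  # {'doc_count': 7, 'min': 58000.0}
```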
Summary
This article covered some techniques for making use of aggregations. A few other aggregations may also be useful to you:
- Date histogram aggregation: groups documents into buckets by date or time interval.
- Scripted metric aggregation: computes a metric with custom scripts.
- Top hits aggregation: returns the most relevant documents in each bucket.
- Range aggregation: groups documents into buckets defined by a set of value ranges.
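The range aggregation fits the price-range question from the introduction. The Python sketch below imitates it on hypothetical prices; as in Elasticsearch, each range includes its from value and excludes its to value. The function name range_buckets is made up for this illustration.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from")
        high = r.get("to")
        count = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


# Hypothetical product prices.
prices = [20.0, 99.99, 100.0, 150.0, 200.0, 250.0]

# "Below $100" and "$100 up to, but not including, $200".
buckets = range_buckets(prices, [{"to": 100}, {"from": 100, "to": 200}])
for bucket in buckets:
    print(bucket)
# {'from': None, 'to': 100, 'doc_count': 2}
# {'from': 100, 'to': 200, 'doc_count': 2}
```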
There are also many other, less common aggregations that are not covered in this article. To learn more, see the references below.
References
https://logz.io/blog/elasticsearch-aggregations/
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html