1. Introduction
Hello everyone. Today I will introduce a popular search technology: Elasticsearch. In this article, we will explore its concepts, installation, APIs, and basic operations.
2. Introducing elasticsearch
Elasticsearch is an open source technology for building servers that support full-text search.
Elasticsearch is a search engine based on the Lucene library. It provides a full-featured, distributed search engine with an HTTP web interface that works with JSON data. Elasticsearch is developed in Java and was originally released as open source under the Apache license (the licensing changed starting with version 7.11).
3. Installation
There are many ways to install Elasticsearch; you can consult the installation documentation on the Elasticsearch homepage.
In this article, I will install it using docker.
I often choose docker because it lets me turn the server on and off as needed. With multiple projects, it is also possible to create separate Elasticsearch containers so that each project works with an independent server. In addition, it avoids many of the errors a traditional installation can run into (missing libraries, related packages, and so on).
With a little knowledge of docker, we can set it up easily. First pull the Elasticsearch image with the command:
    docker pull docker.elastic.co/elasticsearch/elasticsearch:7.9.2
After successfully pulling the image, we can start the Elasticsearch server in a docker container with the command:
    docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.9.2
Once the Elasticsearch server has started successfully in docker, the container binds ports 9200 and 9300 to the local machine.
Port 9200 serves all APIs over the HTTP protocol, while port 9300 uses a custom binary protocol for communication between nodes in a cluster.
We can visit http://localhost:9200 in a browser to view information about the Elasticsearch server.
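The same check can be scripted. Below is a minimal Python sketch using only the standard library. The helper names `fetch_json` and `describe_server` are made up for this illustration, and the `sample` dictionary is a hand-written stand-in for the server's reply (not captured output), so the summary logic can run without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # HTTP port published by the docker run command


def fetch_json(url):
    """Send a GET request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_server(info):
    """Summarize the root endpoint reply; missing keys fall back to 'unknown'."""
    version = info.get("version", {}).get("number", "unknown")
    cluster = info.get("cluster_name", "unknown")
    return f"cluster={cluster} version={version}"


# Hand-written stand-in for the reply, so this part runs offline.
sample = {"cluster_name": "docker-cluster", "version": {"number": "7.9.2"}}
print(describe_server(sample))
```

With the container running, `describe_server(fetch_json(BASE_URL))` should give the same kind of summary for the live server.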
4. Concepts
In this section, we will learn about some of the core concepts of Elasticsearch.
Node
A node is a single server that is part of the cluster, which stores your data, and is involved in indexing and search functions.
Cluster
A cluster is a set of connected nodes in Elasticsearch. If you start a single node, as the command that starts the Elasticsearch container above does, you already have a cluster with 1 node in it.
Index
An index is a collection of documents that have similar properties. An index is identified by a name, which is used when performing operations such as adding, updating, or deleting documents within it.
Document
A document is the basic unit of information that can be indexed. An index can hold as many documents as you need.
5. Elasticsearch API
Next, we will explore the basic APIs of Elasticsearch.
5.1. Check the status
First, let's check the status of our Elasticsearch server using the following API:
    GET http://localhost:9200/_cat/health
Use the curl command to test it:
    ~$ curl -X GET http://localhost:9200/_cat/health
    1603283841 12:37:21 docker-cluster green 1 1 0 0 0 0 0 0 - 100.0%
The status green shows that our Elasticsearch server is up and running, with all of its shards allocated.
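If you want to read this line from a script, the sketch below splits it into named fields. The column names are an assumption on my part: they follow the header that Elasticsearch 7.x prints when `?v` is added to `_cat/health`, and the function name `parse_cat_health` is only illustrative.

```python
# Column names assumed from the header printed by `_cat/health?v` in 7.x.
HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status", "node.total", "node.data",
    "shards", "pri", "relo", "init", "unassign", "pending_tasks",
    "max_task_wait_time", "active_shards_percent",
]


def parse_cat_health(line):
    """Split one whitespace-separated _cat/health line into a dict of strings."""
    values = line.split()
    if len(values) != len(HEALTH_COLUMNS):
        raise ValueError(
            f"expected {len(HEALTH_COLUMNS)} columns, got {len(values)}"
        )
    return dict(zip(HEALTH_COLUMNS, values))


# The line returned by the curl command in this section.
line = "1603283841 12:37:21 docker-cluster green 1 1 0 0 0 0 0 0 - 100.0%"
health = parse_cat_health(line)
print(health["cluster"], health["status"], health["active_shards_percent"])
```

A monitoring script could then simply compare `health["status"]` against `"green"`.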
5.2. API indexes
In Elasticsearch, an index plays a role similar to a table in an SQL database: it is a collection of data that we can query for the information we want. In essence, we organize our data and put it into indexes, and Elasticsearch then helps us query and search that data.
Get index
First we list the indexes with the following API:
    GET http://localhost:9200/_cat/indices
At first the server may not have any indexes. That is fine: we continue to the index creation API and will check again later.
Create index
Create an index named product using the following API:
    PUT http://localhost:9200/product?pretty
The curl command:
    ~$ curl -X PUT http://localhost:9200/product?pretty
    {
      "acknowledged" : true,
      "shards_acknowledged" : true,
      "index" : "product"
    }
A response like the one above means that the index product was created successfully. The pretty parameter formats the JSON output so that it is easier to read.
Now let's call the get index API again:
    ~$ curl -X GET http://localhost:9200/_cat/indices?v
    health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   product _g763PiKRceBeVtZ8bUw2A   1   1          0            0       208b           208b
We can now see the index that has just been created. The ?v parameter adds the header row, which makes the output easier to read. The health is yellow rather than green because the index has 1 replica (the rep column) and a single-node cluster has no second node to place it on.
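Because `?v` adds a header row, the table can be turned into a list of dictionaries with a few lines of Python. This is a minimal sketch that assumes no cell is empty (a closed index, for example, would leave some columns blank and break the simple split); `parse_cat_table` is a name I made up for the example.

```python
def parse_cat_table(text):
    """Turn `_cat` output requested with ?v into a list of row dicts.

    Assumes every row has a value in every column.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# The output of `_cat/indices?v` shown in this section.
output = """\
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   product _g763PiKRceBeVtZ8bUw2A   1   1          0            0       208b           208b
"""
rows = parse_cat_table(output)
print(rows[0]["index"], rows[0]["health"], rows[0]["docs.count"])
```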
Delete index
Delete the index using the following API:
    DELETE http://localhost:9200/product?pretty
Execute the curl command:
    ~$ curl -X DELETE http://localhost:9200/product?pretty
    {
      "acknowledged" : true
    }
The result shown above means the index product has been deleted successfully.
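Creating and deleting an index differ only in the HTTP method, which is easy to see when the requests are built in code. The sketch below builds the two requests with Python's standard library without sending them; `index_request` is a hypothetical helper, not part of any Elasticsearch client.

```python
import urllib.request

BASE_URL = "http://localhost:9200"


def index_request(method, index):
    """Build (but do not send) a request that creates or deletes an index."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT (create) or DELETE (delete)")
    return urllib.request.Request(f"{BASE_URL}/{index}?pretty", method=method)


create = index_request("PUT", "product")
delete = index_request("DELETE", "product")
print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
# To run one against a live server: urllib.request.urlopen(create)
```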
5.3. Document APIs
In this section, we will see how to work with the data stored in an index, that is, its documents: the CRUD document APIs and how to retrieve document information.
First, let's create a new index named book with the following command:
    curl -X PUT http://localhost:9200/book?pretty
Create document
Create a new document in an index using the following API:
    POST http://localhost:9200/{index}/_doc/{document_id}
    <{document request body}>
For example, create a new document with id = 1 in the index book with the following command (when the id is given, PUT is accepted as well, and the example uses it):
    curl -X PUT http://localhost:9200/book/_doc/1 -H 'Content-Type: application/json' -d'
    {
      "name": "Math",
      "price": 111,
      "author": {
        "name": "John",
        "age": 55
      }
    }
    '
Normally we specify the id of the document. If we leave out document_id and send the request with POST to /{index}/_doc, the server automatically generates an id for the document.
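The choice between a given id and a generated id can be sketched in Python as follows. The request is only built, not sent, and `create_document_request` is an illustrative name of my own; the rule it encodes (PUT with an id, POST to `/{index}/_doc` without one) is the behavior described above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"


def create_document_request(index, document, doc_id=None):
    """Build (but do not send) an indexing request.

    With doc_id: PUT /{index}/_doc/{doc_id}.
    Without doc_id: POST /{index}/_doc, so the server generates the id.
    """
    if doc_id is None:
        url, method = f"{BASE_URL}/{index}/_doc", "POST"
    else:
        url, method = f"{BASE_URL}/{index}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


book = {"name": "Math", "price": 111, "author": {"name": "John", "age": 55}}
with_id = create_document_request("book", book, doc_id=1)
auto_id = create_document_request("book", book)
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```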
Update document
Updating works like the create document API above and uses the PUT method:
    PUT http://localhost:9200/{index}/_doc/{document_id}
    <{document request body}>
For an update we must specify the id of the document. Note that the request body replaces the whole stored document, so it must contain every field we want to keep.
Get document by ID
Retrieve a document by its id using the following API:
    GET http://localhost:9200/{index}/_doc/{document_id}?pretty
For example, use curl to retrieve the document with id = 1 created above in the index book:
    ~$ curl -X GET http://localhost:9200/book/_doc/1?pretty
    {
      "_index" : "book",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 3,
      "_seq_no" : 3,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "name" : "Math",
        "price" : 111,
        "author" : {
          "name" : "John",
          "age" : 55
        }
      }
    }
The document itself is in the _source part of the output.
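When this reply is handled in code, the `found` flag is worth checking before reading `_source`. Here is a small sketch using the reply shown above as a Python dictionary; `extract_source` is a hypothetical helper name.

```python
def extract_source(reply):
    """Return the stored document from a get-by-id reply, or None if not found."""
    if not reply.get("found", False):
        return None
    return reply["_source"]


# The reply from this section, written as a Python dict.
reply = {
    "_index": "book",
    "_type": "_doc",
    "_id": "1",
    "_version": 3,
    "_seq_no": 3,
    "_primary_term": 1,
    "found": True,
    "_source": {
        "name": "Math",
        "price": 111,
        "author": {"name": "John", "age": 55},
    },
}

document = extract_source(reply)
print(document["name"], document["author"]["name"])
```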
Show all documents
Besides that, we can list all documents in the index with the following command:
    curl -X GET http://localhost:9200/book/_search?pretty
Delete document
Delete a document using the following API:
    DELETE http://localhost:9200/{index}/_doc/{document_id}
5.4. Query on Elasticsearch
At this point, we have covered the concepts and some basic APIs for working with documents and indexes in Elasticsearch.
In this section, we will look at a very important feature of Elasticsearch: querying, that is, how to get exactly the data we want.
Elasticsearch offers many query types, but in this section I only cover some basic ones, which should help you understand the idea, apply it, and go on to learn other queries.
First, to make the query examples easy to follow, we need a sample data set. In this example I will use the sample data elasticoffee-data from the Github repo elastic/examples.
Specifically, I will insert 20 records from that sample data into the elasticoffee index with the following command:
    curl -X POST http://localhost:9200/elasticoffee/_bulk?pretty -H 'Content-Type: application/x-ndjson' -d '
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "1" } }
    {"sceneID": "2", "sceneData": "0", "entityID": "zwave.quad2", "quadId": 2, "quadMod": "1", "@timestamp": "2018-02-27T22:26:39Z", "beverageClass": "Hot Beverages", "beverage": "Latte", "beverageSide": "left", "beverageIndex": 5, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "2" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad1", "quadId": 1, "quadMod": "0", "@timestamp": "2018-02-27T22:26:39Z", "beverageClass": "Hot Beverages", "beverage": "Mocha", "beverageSide": "left", "beverageIndex": 2, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "3" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad2", "quadId": 2, "quadMod": "1", "@timestamp": "2018-02-27T22:26:39Z", "beverageClass": "Hot Beverages", "beverage": "Espresso", "beverageSide": "left", "beverageIndex": 4, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "4" } }
    {"sceneID": "2", "sceneData": "0", "entityID": "zwave.quad1", "quadId": 1, "quadMod": "0", "@timestamp": "2018-02-28T15:29:39Z", "beverageClass": "Hot Beverages", "beverage": "Americano", "beverageSide": "left", "beverageIndex": 1, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "5" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad1", "quadId": 1, "quadMod": "0", "@timestamp": "2018-02-28T15:29:40Z", "beverageClass": "Hot Beverages", "beverage": "Cappuccino", "beverageSide": "left", "beverageIndex": 0, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "6" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad1", "quadId": 1, "quadMod": "0", "@timestamp": "2018-02-28T15:29:40Z", "beverageClass": "Hot Beverages", "beverage": "Mocha", "beverageSide": "left", "beverageIndex": 2, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "7" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad4", "quadId": 4, "quadMod": "1", "@timestamp": "2018-02-28T15:36:24Z", "beverageClass": "Hot Beverages", "beverage": "Coffee", "beverageSide": "right", "beverageIndex": 6, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "8" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad4", "quadId": 4, "quadMod": "1", "@timestamp": "2018-02-28T15:36:24Z", "beverageClass": "Hot Beverages", "beverage": "Espresso", "beverageSide": "right", "beverageIndex": 4, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "9" } }
    {"sceneID": "2", "sceneData": "0", "entityID": "zwave.quad4", "quadId": 4, "quadMod": "1", "@timestamp": "2018-02-28T15:36:25Z", "beverageClass": "Hot Beverages", "beverage": "Latte", "beverageSide": "right", "beverageIndex": 5, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "10" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T15:36:26Z", "beverageClass": "Hot Beverages", "beverage": "Cappuccino", "beverageSide": "right", "beverageIndex": 0, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "11" } }
    {"sceneID": "2", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T15:36:26Z", "beverageClass": "Hot Beverages", "beverage": "Americano", "beverageSide": "right", "beverageIndex": 1, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "12" } }
    {"sceneID": "4", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T15:36:25Z", "beverageClass": "Hot Beverages", "beverage": "Macchiato", "beverageSide": "right", "beverageIndex": 3, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "13" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T15:36:25Z", "beverageClass": "Hot Beverages", "beverage": "Mocha", "beverageSide": "right", "beverageIndex": 2, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "14" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad4", "quadId": 4, "quadMod": "1", "@timestamp": "2018-02-28T15:34:38Z", "beverageClass": "Hot Beverages", "beverage": "Coffee", "beverageSide": "right", "beverageIndex": 6, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "15" } }
    {"sceneID": "4", "sceneData": "0", "entityID": "zwave.quad4", "quadId": 4, "quadMod": "1", "@timestamp": "2018-02-28T15:36:23Z", "beverageClass": "Hot Beverages", "beverage": "Other", "beverageSide": "right", "beverageIndex": 7, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "16" } }
    {"sceneID": "3", "sceneData": "0", "entityID": "zwave.quad2", "quadId": 2, "quadMod": "1", "@timestamp": "2018-02-27T22:26:40Z", "beverageClass": "Hot Beverages", "beverage": "Coffee", "beverageSide": "left", "beverageIndex": 6, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "17" } }
    {"sceneID": "4", "sceneData": "0", "entityID": "zwave.quad2", "quadId": 2, "quadMod": "1", "@timestamp": "2018-02-27T22:26:42Z", "beverageClass": "Hot Beverages", "beverage": "Other", "beverageSide": "left", "beverageIndex": 7, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "18" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T16:46:05Z", "beverageClass": "Hot Beverages", "beverage": "Cappuccino", "beverageSide": "right", "beverageIndex": 0, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "19" } }
    {"sceneID": "2", "sceneData": "0", "entityID": "zwave.quad3", "quadId": 3, "quadMod": "0", "@timestamp": "2018-02-28T16:46:39Z", "beverageClass": "Hot Beverages", "beverage": "Americano", "beverageSide": "right", "beverageIndex": 1, "quantity": 1}
    { "index" : { "_index" : "elasticoffee", "_type" : "doc", "_id" : "20" } }
    {"sceneID": "1", "sceneData": "0", "entityID": "zwave.quad1", "quadId": 1, "quadMod": "0", "@timestamp": "2018-02-28T15:54:22Z", "beverageClass": "Hot Beverages", "beverage": "Cappuccino", "beverageSide": "left", "beverageIndex": 0, "quantity": 1}
    '
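The bulk body is newline-delimited JSON: one action line, then one document line, repeated, and the body has to end with a newline. Writing that by hand is error-prone, so here is a minimal Python sketch that generates it from a list of dictionaries. `build_bulk_body` is a name I made up, and as a simplification the action lines carry only `_id`, on the assumption that the index name is already in the URL as in the curl command above.

```python
import json


def build_bulk_body(docs, start_id=1):
    """Build an NDJSON bulk body: an action line followed by a source line per doc.

    Assumes the index is named in the request URL, so each action line
    only sets the document id.
    """
    lines = []
    for doc_id, doc in enumerate(docs, start=start_id):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Two shortened records, just to show the shape of the body.
docs = [
    {"entityID": "zwave.quad2", "beverage": "Latte", "beverageIndex": 5},
    {"entityID": "zwave.quad1", "beverage": "Mocha", "beverageIndex": 2},
]
body = build_bulk_body(docs)
print(body, end="")
```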
To query, we will use the following API:
    GET http://localhost:9200/elasticoffee/_search?pretty
    <{request search body}>
We will change the request body for each query. I will give a few examples to make this easy to visualize; pay attention to the request body in each one.
Example 1 – Search all documents, 2 documents per page, skipping the first 5 hits
    ~$ curl -X GET 'http://localhost:9200/elasticoffee/_search?pretty' --header 'Content-Type: application/json' --data-raw '{
        "query": {
            "match_all": {}
        },
        "size": 2,
        "from": 5
    }
    '
    ------ OUTPUT ----------------
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 20,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "6",
            "_score" : 1.0,
            "_source" : {
              "sceneID" : "3",
              "sceneData" : "0",
              "entityID" : "zwave.quad1",
              "quadId" : 1,
              "quadMod" : "0",
              "@timestamp" : "2018-02-28T15:29:40Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Mocha",
              "beverageSide" : "left",
              "beverageIndex" : 2,
              "quantity" : 1
            }
          },
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "7",
            "_score" : 1.0,
            "_source" : {
              "sceneID" : "3",
              "sceneData" : "0",
              "entityID" : "zwave.quad4",
              "quadId" : 4,
              "quadMod" : "1",
              "@timestamp" : "2018-02-28T15:36:24Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Coffee",
              "beverageSide" : "right",
              "beverageIndex" : 6,
              "quantity" : 1
            }
          }
        ]
      }
    }
The response contains 2 documents, with ids 6 and 7: "from": 5 skips the first 5 hits and "size": 2 returns the next 2.
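The effect of `from` and `size` is the same as slicing a list. The sketch below mimics it in plain Python on the ids 1 to 20, assuming the hits come back in id order as they did in the output above (without a sort, Elasticsearch does not promise that order). The helper names are illustrative.

```python
def search_body(size, offset):
    """Request body for a match_all search that skips `offset` hits."""
    return {"query": {"match_all": {}}, "size": size, "from": offset}


def window(hits, size, offset):
    """Local stand-in for from/size: skip `offset` hits, then take `size`."""
    return hits[offset:offset + size]


ids = list(range(1, 21))  # the 20 sample documents, assumed to be in id order
print(search_body(2, 5))
print(window(ids, 2, 5))  # prints [6, 7]
```

For page-based navigation, the offset for page n (counting from 1) would be `(n - 1) * size`.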
Example 2 – Search all documents, 2 documents per page, sorted by beverageIndex DESC
    curl -X GET 'http://localhost:9200/elasticoffee/_search?pretty' --header 'Content-Type: application/json' --data-raw '{
        "query": {
            "match_all": {}
        },
        "sort": [
            {
                "beverageIndex": {
                    "order": "desc"
                }
            }
        ],
        "size": 2
    }
    '
    ------------- OUTPUT ---------------
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 20,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "15",
            "_score" : null,
            "_source" : {
              "sceneID" : "4",
              "sceneData" : "0",
              "entityID" : "zwave.quad4",
              "quadId" : 4,
              "quadMod" : "1",
              "@timestamp" : "2018-02-28T15:36:23Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Other",
              "beverageSide" : "right",
              "beverageIndex" : 7,
              "quantity" : 1
            },
            "sort" : [
              7
            ]
          },
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "17",
            "_score" : null,
            "_source" : {
              "sceneID" : "4",
              "sceneData" : "0",
              "entityID" : "zwave.quad2",
              "quadId" : 2,
              "quadMod" : "1",
              "@timestamp" : "2018-02-27T22:26:42Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Other",
              "beverageSide" : "left",
              "beverageIndex" : 7,
              "quantity" : 1
            },
            "sort" : [
              7
            ]
          }
        ]
      }
    }
=> The response contains the 2 documents with the highest beverageIndex, id = 15 and 17.
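To double-check this result, the same sort can be reproduced locally. The sketch below lists the (id, beverageIndex) pairs from the sample data and sorts them in descending order. Python's sort is stable, so documents with equal values keep their id order here; that tie order is a property of this local sketch, not something Elasticsearch guarantees.

```python
# (document id, beverageIndex) pairs taken from the 20 sample records.
PAIRS = [
    (1, 5), (2, 2), (3, 4), (4, 1), (5, 0), (6, 2), (7, 6), (8, 4),
    (9, 5), (10, 0), (11, 1), (12, 3), (13, 2), (14, 6), (15, 7),
    (16, 6), (17, 7), (18, 0), (19, 1), (20, 0),
]


def top_by_beverage_index(pairs, size):
    """Sort by beverageIndex descending and keep the first `size` entries."""
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ordered[:size]


top_two = top_by_beverage_index(PAIRS, 2)
print([doc_id for doc_id, _ in top_two])  # prints [15, 17]
```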
Example 3 – Search documents with the condition entityID = "zwave.quad4", 2 documents per page
    curl -X GET 'http://localhost:9200/elasticoffee/_search?pretty' --header 'Content-Type: application/json' --data-raw '{
        "query": {
            "match": {
                "entityID": "zwave.quad4"
            }
        },
        "size": 2
    }'
    ------------- OUTPUT ---------------
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : 1.3397743,
        "hits" : [
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "7",
            "_score" : 1.3397743,
            "_source" : {
              "sceneID" : "3",
              "sceneData" : "0",
              "entityID" : "zwave.quad4",
              "quadId" : 4,
              "quadMod" : "1",
              "@timestamp" : "2018-02-28T15:36:24Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Coffee",
              "beverageSide" : "right",
              "beverageIndex" : 6,
              "quantity" : 1
            }
          },
          {
            "_index" : "elasticoffee",
            "_type" : "doc",
            "_id" : "8",
            "_score" : 1.3397743,
            "_source" : {
              "sceneID" : "1",
              "sceneData" : "0",
              "entityID" : "zwave.quad4",
              "quadId" : 4,
              "quadMod" : "1",
              "@timestamp" : "2018-02-28T15:36:24Z",
              "beverageClass" : "Hot Beverages",
              "beverage" : "Espresso",
              "beverageSide" : "right",
              "beverageIndex" : 4,
              "quantity" : 1
            }
          }
        ]
      }
    }
=> The response returns the first 2 of the 5 matching documents, id = 7 and 8, both with entityID = "zwave.quad4".
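In code, the two parts of a search reply that matter most are usually `hits.total.value` (how many documents matched) and the list under `hits.hits` (the page that was returned). A small sketch, using a trimmed copy of the reply above with the `_source` fields left out for brevity; `summarize_hits` is a hypothetical helper.

```python
def summarize_hits(reply):
    """Return (total number of matches, ids of the hits on this page)."""
    hits = reply["hits"]
    total = hits["total"]["value"]
    page_ids = [hit["_id"] for hit in hits["hits"]]
    return total, page_ids


# Trimmed copy of the reply from Example 3 (document bodies omitted).
reply = {
    "hits": {
        "total": {"value": 5, "relation": "eq"},
        "max_score": 1.3397743,
        "hits": [
            {"_index": "elasticoffee", "_id": "7", "_score": 1.3397743},
            {"_index": "elasticoffee", "_id": "8", "_score": 1.3397743},
        ],
    }
}

total, page_ids = summarize_hits(reply)
print(total, page_ids)
```

Here the total is 5 while only 2 ids are returned, because "size": 2 limits the page and not the match count.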
Example 4 – Search for documents with the conditions entityID = "zwave.quad4" and beverageIndex = "5"
=> Use the query bool.must:
    curl -X GET 'http://localhost:9200/elasticoffee/_search?pretty' --header 'Content-Type: application/json' --data-raw '{
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "entityID": "zwave.quad4"
                        }
                    },
                    {
                        "match": {
                            "beverageIndex": "5"
                        }
                    }
                ]
            }
        }
    }'
=> The response returns the document with id = 9, which satisfies both conditions entityID = "zwave.quad4" and beverageIndex = "5".
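Nested request bodies like this one are easier to get right when they are assembled from small pieces. The following Python sketch builds the same body from two helper functions; `match` and `bool_query` are names I chose for the illustration, not functions from an Elasticsearch client library.

```python
import json


def match(field, value):
    """One match clause for a single field."""
    return {"match": {field: value}}


def bool_query(must=(), should=()):
    """Wrap clauses in a bool query, leaving out empty sections."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    return {"query": {"bool": clauses}}


body = bool_query(must=[match("entityID", "zwave.quad4"),
                        match("beverageIndex", "5")])
print(json.dumps(body, indent=4))
```

Passing the same clauses as `should=` instead of `must=` turns the AND condition into an OR condition.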
Example 5 – Search for documents with the condition entityID = "zwave.quad4" or beverageIndex = "5"
=> Use the query bool.should:
    curl -X GET 'http://localhost:9200/elasticoffee/_search?pretty' --header 'Content-Type: application/json' --data-raw '{
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "entityID": "zwave.quad4"
                        }
                    },
                    {
                        "match": {
                            "beverageIndex": "5"
                        }
                    }
                ]
            }
        }
    }'
=> The response returns the documents with id = 9, 7, 8, 14, 15, 1, each of which satisfies at least one of the two conditions entityID = "zwave.quad4" or beverageIndex = "5".
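The difference between must (every condition holds) and should (at least one holds) can be checked against the sample data without a server. The sketch below applies both rules to the (id, entityID, beverageIndex) values of the 20 records. It is only a local model of the matching logic: Elasticsearch also scores the hits, which is why its reply lists id 9 first, while this sketch keeps id order.

```python
# (document id, entityID, beverageIndex) taken from the 20 sample records.
ROWS = [
    (1, "zwave.quad2", 5), (2, "zwave.quad1", 2), (3, "zwave.quad2", 4),
    (4, "zwave.quad1", 1), (5, "zwave.quad1", 0), (6, "zwave.quad1", 2),
    (7, "zwave.quad4", 6), (8, "zwave.quad4", 4), (9, "zwave.quad4", 5),
    (10, "zwave.quad3", 0), (11, "zwave.quad3", 1), (12, "zwave.quad3", 3),
    (13, "zwave.quad3", 2), (14, "zwave.quad4", 6), (15, "zwave.quad4", 7),
    (16, "zwave.quad2", 6), (17, "zwave.quad2", 7), (18, "zwave.quad3", 0),
    (19, "zwave.quad3", 1), (20, "zwave.quad1", 0),
]


def conditions(row):
    """Evaluate the two conditions from the examples for one record."""
    _, entity_id, beverage_index = row
    return (entity_id == "zwave.quad4", beverage_index == 5)


must_ids = [row[0] for row in ROWS if all(conditions(row))]
should_ids = [row[0] for row in ROWS if any(conditions(row))]
print(must_ids)    # prints [9]
print(should_ids)  # prints [1, 7, 8, 9, 14, 15]
```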
6. Conclusion
In this article I briefly introduced Elasticsearch, how to install it, and how to use some of its basic APIs. Besides the basic queries in the examples above, Elasticsearch offers many other types of queries; you can find more in the documentation on the Elasticsearch homepage. I hope this article helps you in your study and work. Thank you for your attention.