How to create indexes and some common queries in Elasticsearch

Tram Ho

What is an Elasticsearch index?

According to the Elasticsearch (ES) documentation, an index is a collection of related documents. ES stores data as JSON documents. Each document maps a set of keys (the names of fields or properties) to their corresponding values (strings, numbers, booleans, dates, arrays of values, ...).

An index is identified by its name, which is used when indexing, searching, updating, or deleting documents in that index.

Some rules for naming indexes:

  1. Lowercase letters only.
  2. Must not contain the special characters \, /, *, ?, ", <, >, |, space, comma (,), or #.
    • Before 7.0, an index name could contain a colon (:).
    • In 7.0+, this is not supported.
  3. Must not start with -, _, or +.
  4. Must not be . or ..
  5. Must not be longer than 255 bytes. Note that the limit is in bytes, so names with multi-byte characters reach it faster.
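The rules above can be checked before sending a request. The following is only a rough sketch of such a check, not how ES itself validates names; the function name `index_name_problems` and the `allow_colon` flag are my own inventions for illustration.

```python
# Characters the naming rules above forbid anywhere in an index name.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def index_name_problems(name, allow_colon=False):
    """Return a list of rule violations; an empty list means the name looks valid.

    allow_colon is a hypothetical switch for pre-7.0 clusters.
    """
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(c for c in set(name) if c in FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if ":" in name and not allow_colon:
        problems.append("contains a colon (not supported in 7.0+)")
    if name[0] in "-_+":
        problems.append("must not start with -, _ or +")
    if name in (".", ".."):
        problems.append("must not be . or ..")
    # The limit is measured in bytes, so encode before measuring.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


print(index_name_problems("my-index"))
print(index_name_problems("_My Index"))
```

An empty list means no rule was violated. A name such as `_My Index` reports several problems at once, which is handy when showing feedback to a user.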

Inverted index

ES uses a data structure called an inverted index, which is designed to make full-text searches fast.

An inverted index lists every unique word that appears in any document and, for each word, the documents in which it appears (a mapping from word to documents). The inverted index is built from the documents, stored in a shard, and then used to search those documents.
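To make the idea concrete, here is a toy inverted index in plain Python. It is only a sketch of the data structure: real ES analysis (tokenizing, stemming, and so on) is far richer than the `lower().split()` assumed here, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every word of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Made-up sample documents, keyed by document id.
docs = {
    1: "Elasticsearch stores JSON documents",
    2: "An inverted index maps terms to documents",
    3: "Elasticsearch builds an inverted index",
}
inverted = build_inverted_index(docs)

print(sorted(inverted["elasticsearch"]))          # [1, 3]
print(sorted(search_all(inverted, "inverted index")))  # [2, 3]
```

Looking up a word is a single dictionary access, which is why a search does not need to scan every document.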

During indexing, ES stores documents and builds an inverted index so that document data can be searched in near real time. Indexing starts with the index API, through which a JSON document can be added to or updated in a specific index.

You can find more details about the inverted index in ES at Inverted index and Understanding Inverted Index ES.

Create index with API

We interact with ES through REST APIs using HTTP methods such as GET, POST, PUT, and DELETE.
To add a new index to the cluster, call the create index API with the PUT method.

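In addition, the request body can specify the following items:

  • Settings: configuration options for the index, such as the number of shards and replicas. They can be nested under an index key or, more concisely, written directly under settings.

  • Mappings: definitions of the fields in the index.

  • Aliases: alternative names for the index.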
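As an illustration, the sketch below builds a create-index request with all three items. The index name `my-index`, the fields `title` and `publish_date`, and the alias `books` are hypothetical, and the typeless mappings shape assumes ES 7.0+. The code only assembles the request; sending it with an HTTP client is left out.

```python
import json


def create_index_request(index_name, shards=3, replicas=2):
    """Build (method, path, body) for a create-index call; nothing is sent."""
    body = {
        # Concise form: options written directly under "settings".
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # Hypothetical fields, typeless mappings (7.0+).
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "publish_date": {"type": "date"},
            }
        },
        # Hypothetical alias with no extra options.
        "aliases": {"books": {}},
    }
    return "PUT", f"/{index_name}", body


method, path, body = create_index_request("my-index")
print(method, path)
print(json.dumps(body, indent=2))
```

The printed method, path, and JSON can be pasted into any HTTP client or console that talks to the cluster.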

For more, see the ES REST APIs documentation.

Some popular queries in Elasticsearch

Basic Match Query

  • Search for a keyword in all fields.

  • Search in a specific field.

  • Put the query in the body of the request.
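The three variants can be sketched as follows. I assume the first two use the `q` parameter of the search API and the third uses a `match` query in the request body; the index name `books` and the field `title` are hypothetical.

```python
from urllib.parse import urlencode


def uri_search(index, keyword, field=None):
    """Build a URI search path; with no field, the default fields are searched."""
    q = f"{field}:{keyword}" if field else keyword
    return f"/{index}/_search?" + urlencode({"q": q})


def match_body(field, text):
    """Build a request body that runs a match query on one field."""
    return {"query": {"match": {field: text}}}


print("GET", uri_search("books", "guide"))           # keyword in all fields
print("GET", uri_search("books", "guide", "title"))  # keyword in one field
print("GET /books/_search", match_body("title", "guide"))
```

Note that `urlencode` percent-encodes the colon in `title:guide`, which keeps the URL valid.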

Match All Query

The simplest query: it matches all documents and gives each one a _score of 1.0.

Match None Query

The inverse of the match_all query: it does not match any document.
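Both queries have almost empty bodies. A small sketch, with helper names of my own choosing; the optional `boost` on `match_all` changes the `_score` given to every hit.

```python
def match_all(boost=None):
    """Body that matches every document; boost optionally replaces the 1.0 score."""
    clause = {} if boost is None else {"boost": boost}
    return {"query": {"match_all": clause}}


def match_none():
    """Body that matches no document at all."""
    return {"query": {"match_none": {}}}


print(match_all())
print(match_all(boost=1.2))
print(match_none())
```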

Match Phrase Query

Requires all terms in the query string to appear in the document, in the same order as in the query, and close to each other.

The slop parameter controls how far apart the terms may be while still counting as a match. The default is 0, meaning the terms must be adjacent.
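A minimal sketch of a `match_phrase` body with `slop`. The field `title` and the phrase are hypothetical.

```python
def match_phrase(field, phrase, slop=0):
    """Body for a phrase search; slop is the number of positions terms may be moved."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# slop=0: the terms must be adjacent and in order.
strict = match_phrase("title", "search engine")
# slop=1: one position of slack, so one extra word may sit between the terms.
relaxed = match_phrase("title", "search engine", slop=1)
print(strict)
print(relaxed)
```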

Multi-fields Query

Search for a query string across multiple fields.
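I assume this refers to the `multi_match` query, which takes the text and a list of fields. The field names below are hypothetical.

```python
def multi_match(text, fields):
    """Body that runs the same match text against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


body = multi_match("elasticsearch guide", ["title", "summary"])
print(body)
```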

Boolean Query

A query that matches documents based on boolean combinations of other queries. The bool query maps to the Lucene BooleanQuery and is built from clauses such as must, should, must_not, and filter. Put simply, the first three correspond roughly to these operators in relational database queries:

  • must ~ AND
  • must_not ~ NOT
  • should ~ OR

The filter clause behaves like must, except that it does not contribute to the score.
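A sketch of a bool query that uses all four clauses. The fields (`title`, `status`, `tags`, `year`) and their values are made up for illustration.

```python
def bool_query(must=(), must_not=(), should=(), filter=()):
    """Combine sub-queries into one bool query, skipping empty clauses."""
    clauses = {
        "must": list(must),
        "must_not": list(must_not),
        "should": list(should),
        "filter": list(filter),
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"title": "elasticsearch"}}],       # AND
    must_not=[{"term": {"status": "draft"}}],           # NOT
    should=[{"match": {"tags": "search"}}],             # OR, raises the score
    filter=[{"range": {"year": {"gte": 2015}}}],        # required, not scored
)
print(body)
```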

Boosting Query

Increase the weight of specific fields so that matches in them count more toward the score.
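One common way to do this, and the one I assume is meant here, is the caret notation in a `multi_match` field list, for example `title^3`. ES also has a separate `boosting` query with positive and negative clauses, which this sketch does not cover. Field names and weights are hypothetical.

```python
def boosted_multi_match(text, field_boosts):
    """Build a multi_match body where each field may carry a ^boost suffix."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Matches in title count three times as much as matches in summary.
body = boosted_multi_match("elasticsearch guide", {"title": 3, "summary": 1})
print(body)
```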

Fuzzy Query

Returns documents that contain terms similar to the search term, as measured by Levenshtein edit distance.

The edit distance is the number of one-character changes required to turn one term (string) into another. These changes may include:

  • Changing one character (box → fox).
  • Deleting one character (black → lack).
  • Inserting one character (sic → sick).
  • Transposing two adjacent characters (act → cat).
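The four kinds of change above can be counted with a small dynamic-programming routine. This is a textbook sketch (the optimal string alignment variant, which counts a transposition as one step), not the code ES or Lucene actually runs.

```python
def edit_distance(a, b):
    """Count the fewest single-character changes that turn a into b.

    Substitution, deletion, insertion and transposition of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


for source, target in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(source, "->", target, edit_distance(source, target))
```

Each of the four pairs has distance 1, one for each kind of change in the list.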

To find similar terms, the fuzzy query creates a set of all possible variations, or expansions, of the search term within a specified edit distance. The query then returns exact matches for each expansion.

We can configure the fuzziness parameter of the query.

  • 0, 1, 2: the maximum Levenshtein distance allowed.
  • AUTO: chooses the maximum distance based on the length of the term.
    • Length 0..2: the term must match exactly (maximum distance 0).
    • Length 3..5: the maximum distance is 1.
    • Length > 5: the maximum distance is 2.
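Below is a sketch of a fuzzy query body, together with a helper that mirrors the AUTO table above. The helper `auto_fuzziness` only illustrates the length thresholds (3 and 6 by default); it is my own function, not an ES API. The field `title` and the misspelled term are hypothetical.

```python
def auto_fuzziness(term, low=3, high=6):
    """Mirror the AUTO rule: distance 0 below low, 1 below high, otherwise 2."""
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


def fuzzy_query(field, value, fuzziness="AUTO"):
    """Body for a fuzzy query; fuzziness may be 0, 1, 2 or "AUTO"."""
    return {"query": {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}}


body = fuzzy_query("title", "elasticsaerch")
print(body)
for term in ["es", "index", "elasticsaerch"]:
    print(term, auto_fuzziness(term))
```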

Exists Query

Returns documents that contain an indexed value for a field.
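A minimal sketch of the body; the field name `publisher` is hypothetical.

```python
def exists_query(field):
    """Body that matches documents having an indexed value for the field."""
    return {"query": {"exists": {"field": field}}}


body = exists_query("publisher")
print(body)
```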

IDs Query

Returns documents based on their IDs. The query uses the document ID stored in the _id field.
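A sketch of an `ids` query body. The ID values are made up, and the helper converts them to strings only to keep the example uniform.

```python
def ids_query(doc_ids):
    """Body that matches documents whose _id is in the given collection."""
    return {"query": {"ids": {"values": [str(doc_id) for doc_id in doc_ids]}}}


body = ids_query([1, 4, 100])
print(body)
```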

Wildcard Query

Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters.

  • The * operator matches zero or more characters, including the empty string.
  • The ? operator matches any single character.

Wildcard operators can be combined with other characters to create a wildcard pattern.
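The sketch below builds a wildcard query body and adds a small local matcher to show what `*` and `?` mean. The matcher `wildcard_matches` is my own approximation for illustration, not ES code; the field `user.id` and the patterns are hypothetical.

```python
import re


def wildcard_query(field, pattern):
    """Body for a wildcard query on one field."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def wildcard_matches(pattern, term):
    """Local illustration: * is any run of characters, ? is exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, term, flags=re.DOTALL) is not None


body = wildcard_query("user.id", "ki*y")
print(body)
print(wildcard_matches("ki*y", "kimchy"))  # True
print(wildcard_matches("ki?y", "kiy"))     # False, ? needs one character
```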

Regexp Query

Returns documents that contain terms matching a regular expression, which can express more complex patterns than a wildcard query.

A regular expression (regex) is a way to match patterns in data using placeholder characters, called operators.
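A sketch of a regexp query body. ES uses the Lucene regular expression syntax, which differs from Python's `re` in places, so the local check is only an approximation that happens to agree for a simple pattern like this one. The field `user.id` and the pattern are hypothetical.

```python
import re


def regexp_query(field, pattern):
    """Body for a regexp query on one field."""
    return {"query": {"regexp": {field: {"value": pattern}}}}


body = regexp_query("user.id", "k.*y")
print(body)

# Rough local check: the pattern has to match the whole term, hence fullmatch.
for term in ["ky", "kimchy", "kimchi"]:
    print(term, re.fullmatch("k.*y", term) is not None)
```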

For more information, see the Query DSL documentation.

These are some of my findings from studying Elasticsearch. They may have shortcomings, and I welcome suggestions from readers.

Thanks all ❤️

Source : Viblo