Memory usage optimization in Elasticsearch

Tram Ho

Turn off unnecessary features

By default, Elasticsearch indexes most fields and adds doc values to them so that they can be searched and used in aggregations. For example, if you have a numeric field called “foo” that you only use to plot histograms and never search or filter on, you can disable indexing for that field in the mapping.
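A minimal sketch of such a mapping, written as a Python dictionary that serializes to the JSON body of an index-creation request. The `integer` type is an assumption made for illustration; only the field name "foo" comes from the text.

```python
import json

# Mapping sketch: "foo" keeps its doc values (the default), so it can still be
# aggregated, but "index": False means it is not indexed for search.
# The "integer" type is an assumption for this example.
mapping = {
    "mappings": {
        "properties": {
            "foo": {
                "type": "integer",
                "index": False,
            }
        }
    }
}

# This JSON would be the body of a `PUT /<index-name>` request.
print(json.dumps(mapping, indent=2))
```

Queries that search or filter on "foo" are no longer possible with this mapping, so it only fits fields used purely for aggregations.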

If you do not care about scoring, you can configure Elasticsearch to index only the matching documents for every term, without term frequencies or positions. You can still search on this field, but phrase queries will raise errors, and scoring will assume that each term appears only once in every document.
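One way to express this is the `index_options` mapping parameter set to `docs` on a text field. The sketch below assumes a hypothetical text field named "message"; the field name is not from the original article.

```python
import json

# Mapping sketch: index only document numbers for each term.
# "message" is a hypothetical field name used for illustration.
mapping = {
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                # "docs" drops term frequencies and positions from the index,
                # which is why phrase queries on this field stop working.
                "index_options": "docs",
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```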

Do not use the default dynamic string mappings

By default, string data that is not predefined in the mapping is indexed both as text and as keyword. This means the same data takes up considerably more index storage than it needs to if you only use one of the two. You can avoid this by defining the type of each field in the mapping in advance, or by setting up a dynamic template that maps string data to a single type, either text or keyword.
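A dynamic template for this could look roughly like the sketch below, which maps every dynamically detected string to `keyword` only. The template name "strings_as_keyword" is an arbitrary label chosen for this example; swap `keyword` for `text` if full-text search is what you need.

```python
import json

# Dynamic template sketch: any new string field becomes a keyword field
# instead of the default text + keyword multi-field.
# "strings_as_keyword" is a hypothetical template name.
mapping = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keyword": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(mapping, indent=2))
```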

Watch shard size

Larger shards store data more efficiently. To increase shard size, you can create indexes with fewer primary shards, create fewer indexes (for example, by using the Rollover API), or modify an existing index with the Shrink API. Keep in mind that large shards also have drawbacks, such as longer recovery times.
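As a rough illustration of the rollover approach, the helper below builds a Rollover API call that starts a new index only once the current one is large or old enough, rather than on a fixed schedule. The alias name "logs-write" and the thresholds are assumptions for this sketch, not recommendations from the article.

```python
import json

def rollover_request(alias, max_size="50gb", max_age="30d"):
    """Build a Rollover API call (method, path, body).

    The default thresholds are illustrative assumptions. The index behind the
    alias is rolled over when any of the conditions is met.
    """
    body = {"conditions": {"max_size": max_size, "max_age": max_age}}
    return "POST", f"/{alias}/_rollover", body

# "logs-write" is a hypothetical write alias.
method, path, body = rollover_request("logs-write")
print(method, path)
print(json.dumps(body, indent=2))
```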

Disable the _source field

The _source field stores the original JSON body of the document. If you don’t need access to it, you can disable it. However, APIs that need access to _source, such as update and reindex, will then not work.
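Disabling it is a mapping-level setting. A minimal sketch of the request body for creating an index without stored source:

```python
import json

# Mapping sketch: do not store the original JSON body of each document.
# Update and reindex rely on _source, so they will not work on this index.
mapping = {
    "mappings": {
        "_source": {"enabled": False}
    }
}

print(json.dumps(mapping, indent=2))
```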

Force Merge

Indexes in Elasticsearch are stored in one or more shards. Each shard is a Lucene index made up of one or more segments, which are the actual files stored on disk. Larger segments store data more efficiently.

The force merge API can be used to reduce the number of segments per shard. In many cases, the number of segments can be reduced to one per shard by setting max_num_segments=1.
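The call itself is a single POST with `max_num_segments` as a query parameter. The helper below only builds the request path; the index name "logs-2024.01" is a made-up example.

```python
from urllib.parse import quote, urlencode

def forcemerge_path(index, max_num_segments=1):
    """Return the path and query string for a force merge request."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"/{quote(index, safe='')}/_forcemerge?{query}"

# "logs-2024.01" is a hypothetical index name.
print("POST", forcemerge_path("logs-2024.01"))
# prints: POST /logs-2024.01/_forcemerge?max_num_segments=1
```

Force merging is best reserved for indexes that no longer receive writes, since merging down to a single segment is expensive.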

Shrink Index

The Shrink API allows you to reduce the number of shards in an index. Together with the Force Merge API above, this can significantly reduce the number of segments and shards of an index.
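A sketch of the two requests a shrink typically involves: first make the source index read-only and move a copy of every shard to one node, then shrink into a new index. The target shard count must be a factor of the source shard count, which the first helper checks. All index and node names here are hypothetical.

```python
def valid_shrink_targets(source_shards):
    """Shard counts an index with `source_shards` primaries can shrink to."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

def shrink_requests(source, target, target_shards, node_name):
    """Build the (method, path, body) tuples for preparing and shrinking."""
    prepare = {
        "settings": {
            # Relocate a copy of every shard to a single node.
            "index.routing.allocation.require._name": node_name,
            # Block writes; shrinking requires a read-only source index.
            "index.blocks.write": True,
        }
    }
    shrink = {"settings": {"index.number_of_shards": target_shards}}
    return [
        ("PUT", f"/{source}/_settings", prepare),
        ("POST", f"/{source}/_shrink/{target}", shrink),
    ]

print(valid_shrink_targets(8))  # [1, 2, 4]
for request in shrink_requests("logs-old", "logs-old-shrunk", 1, "node-1"):
    print(request)
```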

Use the smallest numeric type possible

The type you choose for numeric data can have a significant impact on disk usage. Specifically, integers should be stored with an integer type (byte, short, integer, or long), and floating-point values should be stored as scaled_float where appropriate, or otherwise in the smallest type that fits the use case. For example, using float instead of double, or half_float instead of float, helps save space.
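To make the choice concrete, the sketch below picks the narrowest Elasticsearch integer type for a known value range and shows a mapping that uses compact types. The field names and the `scaling_factor` of 100 (two decimal places) are assumptions for illustration.

```python
# Elasticsearch signed integer types and their widths in bits.
INTEGER_TYPES = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]

def smallest_integer_type(lo, hi):
    """Return the narrowest integer type that can hold every value in [lo, hi]."""
    for name, bits in INTEGER_TYPES:
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")

# Hypothetical fields: an HTTP status code, a percentage, and a price.
mapping = {
    "mappings": {
        "properties": {
            "status": {"type": smallest_integer_type(100, 599)},
            "cpu_percent": {"type": "half_float"},
            # scaled_float stores value * scaling_factor as an integer.
            "price": {"type": "scaled_float", "scaling_factor": 100},
        }
    }
}

print(mapping["mappings"]["properties"]["status"])  # {'type': 'short'}
```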

Put fields in the same order in documents

Because multiple documents are compressed together into blocks, longer duplicate strings are more likely to be found in the _source of those documents if their fields always appear in the same order.
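On the client side, one simple way to get a consistent field order is to serialize every document with sorted keys before indexing it. This is just one possible approach, and the sample documents are invented.

```python
import json

def serialize(doc):
    """Serialize a document with keys in a fixed (alphabetical) order."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))

# Two hypothetical documents built with different key orders.
a = serialize({"timestamp": "2024-01-01T00:00:00Z", "host": "web-1", "status": 200})
b = serialize({"status": 200, "timestamp": "2024-01-01T00:00:00Z", "host": "web-1"})

print(a)
print(a == b)  # True: both serialize with the same field order
```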

Roll up historical data

Keeping older data can be useful for later analysis, but it is often avoided because of storage costs. You can use the data rollup feature to summarize historical data and store it using far less disk space.


Source : Viblo