Designing Time Series database with MongoDB

Tram Ho

In the process of making my IoT project on campus even though it was just beginning, I had some interesting problems. The first is that considering the use of SQL and NoSQL to store data, I think that compared to SQL, defining tables to store data does not seem to be a good solution in the development process. Because at first maybe the system only has a sensor to measure the humidity and then improve it to include the temperature sensor, creating a table to save will be complicated when querying between the tables and rewriting, the old data is processed. how ? . So I decided to use NoSql and the challenge this time was to design a database that stores time-series data with Mongodb. In this article I will mention how to design schema and how it affects memory in the read, write, update and delete that I have read.

Viblo logo

What is Time-series data?

Time-series Data : is a series of data, usually collecting data continuously from the same source over a period of time. This analysis of data over time aims to track changes over time. Regardless of the industry, real-time querying, analysis and reporting needs. For a stock trader they need continuous data to run algorithms to analyze trends and identify opportunities.

There are 2 types of data in time-series data

  • The regular time series, the type that shoots data at regular intervals.
  • Unusual time series (events) are events that occasionally take one shot.

Design time-series schema

The first thing I want to mention is that there is no design that is suitable for all applications. It depends on the requirements of your project that you have to choose accordingly. To illustrate, let's say that every 1s of temperature sensors sends data to the server and you have to handle it.

Method 1: Save every incoming data into a document (One document per data point)

Suppose this is data about collecting moisture per second.

Method 2: Combine data and save each document for 1 minute (Time-based bucketing of one document per minute)

Compare these 2 designs

Because I don't have much time to collect, I will use a statistics of the article I read to compare. Here is the article link: https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices

They basically compared how the amount of stored data impacts storage size and affects memory based on data collected in 4 weeks.

The impact on Storage

Using design 1 seems to be quite understandable as it is like a row in a table. This design will create a large number of documents

Viblo logo

This is a comparison of the number of documents

Viblo logo

This is a comparison of data size (MB units) and storage size

As you can see, the data size in Design 1 is very memory intensive (note a little bit why the Data size is so much bigger than the Disk Storage Size because MongoDB's WiredTiger storage engine has compressed the data before saving it to disk. )

The impact on the memory usage

A large number of documents will not only increase the size of the data storage but also increase the size of the index. Unlike some key-value databases that position themselves, MongoDB provides extra indexes that give you more flexible access to your data, allowing you to optimize query performance. . In addition, saving indexes for fast queries also takes up memory as shown below, their indexing for document in the 4th week has a size of 103.03 MB already: v

Viblo logo

Option 3: Size-based bucketing

This type of design is suitable for situations where data is not inserted regularly. In time-based applications such as IoT projects, sensor data can be generated at irregular intervals, meaning that some sensors can be activated under certain conditions and then provided. data for the database (sometimes sent in 1s, 15s sent in 1 time, etc.). In these situations, using method 2 may not seem like a good idea. The alternative solution is to save by size. With the size-based locking feature, we design the schema around a document with some conditions.

For example, we will design with a limit of 200 events per document or 1 day's expiry (whichever comes first)

The example above is trying to limit the number of inserts added per document

The data will be added until it reaches max of 200 then it will create a new document based on the command ( upsert: true ).

The advantage of this approach is that when we take data over time, for example, one day or several days will be extremely effective. Using this method is one of the most effective ways to store data. IoT in MongoDB.

These are the ways I synthesize hope to help people choose the right design for their application.

Reference article: https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices

Share the news now

Source : Viblo