Build an efficient request counting function

Tram Ho

Problem

The task is to track the number of requests sent to the system so that the backend can display it in detail by day, month or year.
Because the system receives a very large number of requests, up to millions per day, three things need to be solved: counting the requests, designing a database for long-term storage, and querying it efficiently.

General idea

1. Saving the number of requests

1.1 Ideas

Let's start with the idea everyone thinks of first: for each request received, we send one command to the db that adds 1 to the request count.

Problems:

  1. Each request has to wait for the extra call to the db, which slows down the response.
  2. A large number of simultaneous requests increases the load on the db, which can affect the processing speed of the entire service.

1.2 Improvement

How can we increment the request count quickly while minimizing the number of commands sent to the db?
We need to tweak the architecture a bit.

  1. Instead of sending each increment to the db, we send it to Redis. Redis keeps its data in memory, so increments are very fast and it can handle a large number of simultaneous requests.
  2. We keep the total number of requests for the day in Redis. At the end of each day, a service reads that total from Redis and writes it to the db.
    => This avoids calling the db too often.
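
The two steps above can be sketched as follows. This is a minimal illustration, not the author's implementation: the class InMemoryCounterStore is a hypothetical stand-in for Redis, the db is represented by a plain dict, and the key format requests.{day} is an assumption made only for this example.

```python
from collections import Counter


class InMemoryCounterStore:
    """Stand-in for Redis in this sketch: counters live in process memory."""

    def __init__(self):
        self._counts = Counter()

    def incr(self, key, amount=1):
        # A cheap in-memory increment; no db round trip on the request path.
        self._counts[key] += amount
        return self._counts[key]

    def take(self, key):
        # Read the counter and remove it in one step.
        return self._counts.pop(key, 0)


def handle_request(store, day):
    """Request path: bump the counter for the given day (yyyy-mm-dd)."""
    return store.incr(f"requests.{day}")


def flush_day(store, db_totals, day):
    """End-of-day job: move the day's total into the db (a dict here)."""
    total = store.take(f"requests.{day}")
    db_totals[day] = db_totals.get(day, 0) + total
    return total
```

However many requests arrive during the day, the db receives a single write per day from flush_day, which is the point of the improved architecture.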

2. Efficient handling of long-term storage & querying

2.1 Ideas

Because we need to track the number of requests in detail, hour by hour, we need to store the number of requests per hour in the db.
I use MongoDB, so the simplest approach is to save one record per hour.
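
A minimal sketch of what such a per-hour record could look like, written as a Python dict. The field names partner_id, hour and count are assumptions for illustration and not the author's actual schema. The helper total_for_day shows the aggregation a reader of this data has to do.

```python
from datetime import date, datetime


def hourly_record(partner_id, when, count):
    """One document per partner per hour (hypothetical field names)."""
    hour_start = when.replace(minute=0, second=0, microsecond=0)
    return {"partner_id": partner_id, "hour": hour_start, "count": count}


def total_for_day(records, partner_id, day):
    """Summing a day means scanning up to 24 hourly documents."""
    return sum(
        r["count"]
        for r in records
        if r["partner_id"] == partner_id and r["hour"].date() == day
    )
```

With this layout, a non-leap year produces up to 8,760 hourly records per partner, and every daily, monthly or yearly figure has to be summed from them at query time.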

Problems:

  1. Saving one record per hour like this forces us to aggregate the data ourselves for each day, month and year.
  2. Queries have to read and aggregate a large number of records, which makes them slow.

2.2 Improvement

1. We can apply the Bucket Pattern: group data of the same kind into a single record, which saves storage and makes queries simpler.
For example, all the hourly request totals of one day can be saved in a single record.

With this way of saving, we can easily query information by day. What about by month and by year?
Similarly, we can save the monthly information in a single record per month, and likewise one record per year.

=> This way we can query by day, month or year simply and efficiently.
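
A minimal sketch of what the day and month buckets could look like as Python dicts. The field names (type, period, hours, days, total) and the choice to keep a running total in each bucket are assumptions for illustration, not the author's schema.

```python
def day_bucket(partner_id, day, hourly_counts):
    """Build one record for a day; hourly_counts maps hour (0-23) to a count."""
    hours = {f"{h:02d}": c for h, c in sorted(hourly_counts.items())}
    return {
        "partner_id": partner_id,
        "type": "day",
        "period": day,  # "yyyy-mm-dd"
        "hours": hours,
        "total": sum(hours.values()),
    }


def month_bucket(partner_id, month, day_buckets):
    """Build one record for a month ("yyyy-mm") from that month's day buckets."""
    days = {
        b["period"][-2:]: b["total"]
        for b in day_buckets
        if b["period"].startswith(month)
    }
    return {
        "partner_id": partner_id,
        "type": "month",
        "period": month,
        "days": days,
        "total": sum(days.values()),
    }
```

With buckets shaped like this, showing one day, one month or one year for a partner is a lookup of a single record instead of an aggregation over many hourly records.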

2. To produce the data queried above, the synchronization step must now update 3 records (day, month, year) instead of just 1 as before.
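
A sketch of how one hourly count could be turned into the three updates. The function only builds (filter, update) pairs; the field names are hypothetical. With pymongo, each pair could be passed to collection.update_one(filter, update, upsert=True), so that a bucket is created the first time it is touched.

```python
def bucket_updates(partner_id, day, hour, count):
    """Return (filter, update) pairs for the day, month and year buckets.

    day is "yyyy-mm-dd" and hour is an int from 0 to 23.
    """
    year, month, day_of_month = day.split("-")
    return [
        (
            {"partner_id": partner_id, "type": "day", "period": day},
            {"$inc": {f"hours.{hour:02d}": count, "total": count}},
        ),
        (
            {"partner_id": partner_id, "type": "month", "period": f"{year}-{month}"},
            {"$inc": {f"days.{day_of_month}": count, "total": count}},
        ),
        (
            {"partner_id": partner_id, "type": "year", "period": year},
            {"$inc": {f"months.{month}": count, "total": count}},
        ),
    ]
```

Note that $inc is not idempotent: applying the same count twice doubles it, so the sync job needs to know whether a given day has already been written.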

Other problems

1. How should information be saved in Redis?

You may hesitate between storing each counter as a plain key-value pair and using a Redis hash to hold the data for the day.
In my opinion, a hash is better: all the information can be fetched with a single hgetall command and deleted once the synchronization is done.
Redis hashes also support incrementing through the hincrby command.
The key can have the following form: counts.{partner_id}.{yyyy-mm-dd}
For example: counts.123456.2023-03-01
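
A minimal sketch of this layout in Python. It assumes a client object exposing hincrby and hgetall the way redis-py does, and it assumes that each hash field is the hour of the day ("00" to "23"), which the article does not state explicitly.

```python
from datetime import datetime


def counter_key(partner_id, day):
    """Key format from the article: counts.{partner_id}.{yyyy-mm-dd}."""
    return f"counts.{partner_id}.{day}"


def record_request(client, partner_id, when: datetime):
    """Add 1 to the field for the request's hour inside the day's hash."""
    key = counter_key(partner_id, when.strftime("%Y-%m-%d"))
    field = when.strftime("%H")  # assumption: one field per hour
    return client.hincrby(key, field, 1)


def read_day(client, partner_id, day):
    """Fetch every hourly counter of a day with one hgetall call."""
    raw = client.hgetall(counter_key(partner_id, day))

    def text(value):
        # redis-py returns bytes unless decode_responses=True is set.
        return value.decode() if isinstance(value, bytes) else value

    return {text(field): int(count) for field, count in raw.items()}
```

Because the whole day lives under one key, the sync job can read it with one command and remove it with one delete on that key.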

2. Handling syncing after every day

In my opinion, you should save the state of each sync and let the job run several times a day, so that a single failed run does not lose data.
The logic is as follows:

  1. Check whether the sync for the day has already succeeded
  2. If it has, stop
  3. If it has failed or has never run, perform the sync
  4. Save the sync state
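
The four steps above can be sketched as a small function. This is an illustration under assumptions: the state is kept in a dict keyed by day, and read_counts and write_counts are hypothetical callables standing in for the Redis read and the db write.

```python
def sync_day(day, state, read_counts, write_counts):
    """Sync one day's counters unless that day is already marked as done.

    Returns True if a sync was performed, False if it was skipped.
    """
    # Steps 1 and 2: stop if this day was already synced successfully.
    if state.get(day) == "done":
        return False

    # Step 3: not synced yet (or a previous run failed), so sync now.
    counts = read_counts(day)
    write_counts(day, counts)  # if this raises, the state is left untouched

    # Step 4: record success so later runs of the job skip this day.
    state[day] = "done"
    return True
```

Since the state is saved only after the write succeeds, a run that fails leaves the day unmarked and the next scheduled run tries again, while a successful run makes the remaining runs of that day no-ops.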

Note: As soon as one hash in Redis has been synced, delete that hash right away, rather than waiting to delete all the hashes after the whole synchronization is complete.

That's the end of the article. See you all in the next posts.

Source : Viblo