!!! WARNING: Everything below was done in ONE (01) day !!!
Do not follow my example, or you will regret it !!!
BUT IT IS WORTH READING
Legend has it that on a beautiful Sunday, Minh Monmen suddenly had a bright idea: what if I built a dashboard to check the system's CCU in real time? An extremely fancy idea, except that plenty of other tools already present these numbers, and do it better: Google Analytics, Firebase Analytics, ... All we would have to do is plug them in and everything is ready.
But no. As a suspicious person, I do not trust those tools and only treat the GA family as a reference. More importantly, I want full control over the numbers I produce: what errors they have, what they mean, and why they are the way they are, instead of closing my eyes and nodding along with an opaque number while wondering what it really means.
Don’t Reinvent The Wheel, Unless You Plan on Learning More About Wheels
This is the title of an article I read on codinghorror.com about not reinventing the wheel when others have already done it very well. So if you are a pure coder and do not care about the mystery behind the numbers, you can stop here, because everything that follows is me reinventing the wheel.
First things first
Before getting back to the main story, please take one minute to commemorate the author Minh Monmen, who wrote the code in just one day but took up to five days to record his journey down the path of (old) knowledge. The author's occupation, health, marital status, and gender should not be brought up here.
This article will certainly be very long and may have to be split into several parts, so to give you some motivation up front, here is a summary of what I learned after one hard day designing this CCU measuring system:
- Understanding CCU measurement is very hard (of course).
- I got to practice two qualities I have long cherished: super fast and surreal (i.e. unrealistic =))).
- A super-light, super-fast approximation algorithm: HyperLogLog
- In-memory caching (the serious kind)
- Batch processing
- Using resources frugally
- And finally the conclusion: Technology does NOT make your application run faster. YOU are the one who does. ...
And many other things that I learned not only while building this system, but that also apply to many other problems giving me headaches. Only when you understand this does reinventing the wheel teach you not just how to ride, but how to make your car actually move.
Here is the knowledge you should prepare before approaching this article:
- Golang (the implementation uses Golang, so it helps to know it)
- Time series data
- Probabilistic data structures (data that is estimated rather than exact)
- Prometheus and Grafana (used for collecting data and drawing charts)
- Courage and eagerness to learn (the most important item)
That's enough preamble, let's go.
A few lines of basic knowledge
If you have worked on web or apps, you will surely have heard at least once about CCU, known colloquially as kon ku, or Concurrent Users: the number of users on a system at the same time. By definition it really is that plain and simple.
The definition is simple, but how CCU is calculated, and what CCU means to you, is a completely different story. There is no exact formula that applies to every system, because each way of measuring CCU carries a different meaning. For example:
- For real-time systems such as games or chat, where users hold a persistent connection to the server (a socket, etc.), CCU can be measured by counting the number of connections to the server at a given moment. This number can tell you the system's load capacity and feed into limit calculations, hardware planning, etc.
- For ordinary web systems without long-lived connections, such as news or e-commerce sites, CCU can be measured by counting the number of clients sending requests to the server within a period of time. This number also describes part of the traffic you must serve, but depending on the system it needs to be combined with the number of requests made and the nature of each request, ... to give an accurate view of your system.
In this article I will cover the kind of CCU calculation that fits most of today's web systems and apps, which do not use long-lived connections but rely on the user's HTTP requests.
How is CCU actually calculated?
Normally, with ordinary analytics systems, you can easily track indicators such as DAU, MAU, the total number of requests over a period, ... and you can certainly use them to estimate CCU. However, these figures stop at being estimates. You may know that you had 100K active users in a day, but are they evenly distributed? Your CCU may well be concentrated in just a few hot hours.
And of course, wanting to measure CCU is not enough to be able to measure it. Your system must meet several conditions on both the client and the server.
For the CCU to be measurable, the client must do something that lets the server recognize the user as currently active. It could be:
- a request to initiate a session
- or a ping request (e.g. one ping to the server every minute)
- or a request performing an action such as reading or writing data
- or, more advanced, the client records the time the user spends on the app or the web (and sends it to the server later) ...
The most common and accurate mechanism, used by both web and app clients, is a dedicated ping request to the server. The request may mean nothing more than ping, or it may carry extra packed information. What matters is that the client calls it to tell the server that the user is active in the app.
The ping request can be sent every 10 seconds, 20 seconds, 60 seconds, or every few minutes, depending on the characteristics of each app's sessions.
There are two ways to measure CCU on the server side:
- One: track a dedicated ping request. If the client already pings the server periodically, the only thing the server needs to do is monitor that request.
- Two: when a dedicated ping request is not available, add a middleware layer that hooks into every endpoint (or most endpoints) of the system. This is only a workaround for clients without a dedicated ping request, and it introduces more sources of error.
Because my application does not yet support ping requests, I reviewed my system architecture, found it suitable, and went with the second method.
Difficulties piled up
Everything began to get difficult, and difficulty breeds discouragement. Once I started the implementation, I ran into the following difficulties with both the data and the technology:
Identifying one CCU
This looks like a very hard question to answer. When do you count one CCU? Is it when someone logs in to your site? Do you also include users who are not logged in? How many CCU is one user with two open tabs? Do requests from Google bots, Facebook bots, 3rd parties, ... also count as one CCU each?
When I first tried to define a CCU, I was immediately confused. Once again, this question has no universal answer; it depends entirely on what your system supports, your assumptions about the business, and how much error you accept in the resulting number.
Because I have no control over the client, I settled for the information I have about a client on each request:
- User ID: obtained from the user's session / token
- Client IP: obtained from the external proxy / gateway layer
- User Agent: obtained from the request header
Combining these three pieces of information gives me a way to identify an approximate CCU.
Why approximate? Because there are quite a few cases where one client calls sometimes with and sometimes without a token, or one client opens many different tabs, or one IP hides many separate clients, or many clients share the same user agent ...
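Combining the three fields can be sketched as a small helper; the field order and the use of SHA-1 for shortening are my own illustrative choices, not necessarily what the original system does:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// ccuIdentifier combines user ID, client IP, and user agent into one
// short fixed-length key. Hashing keeps bucket members small and uniform
// instead of storing the long raw concatenation.
func ccuIdentifier(userID, clientIP, userAgent string) string {
	raw := fmt.Sprintf("%s-%s-%s", userID, clientIP, userAgent)
	sum := sha1.Sum([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(ccuIdentifier("42", "203.0.113.7", "Mozilla/5.0"))
}
```

The same request always yields the same identifier, which is exactly what a uniqueness counter needs; the imprecision discussed above comes from the inputs, not from this step.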
How long does one CCU last?
In other words, for how long should one recorded CCU be counted as present?
That is the next question to answer for systems without long-lived connections. Users only interact with the system through individual requests, so the CCU only makes sense when attached to a period of time. For example: in one minute, 1,000 people accessed the system, meaning that within that minute 1,000 distinct users sent requests to the system.
The choice of timeframe, 1 minute, 3 minutes, or 10s, 20s, ... depends entirely on how well you understand your system. For example, systems whose clients interact through high-frequency requests can shrink the timeframe, and vice versa.
Here I define three levels for my system (this is entirely my own invention and not based on any theory, so treat it as a reference only):
- PEAK level: the minimum time window (used as the base unit for counting). I chose the longest gap between two requests from a client. It relates to the number of requests during a peak moment, and I call it Peak CCU. For example, my app periodically calls the server every minute, so the longest gap between two client requests is one minute. However, this number can be inaccurate when there are no recurring requests, or when the client is doing work that does NOT need to call the server.
- SESSION level: the average time window. I chose a value around the average length of one user session. This number tells you how many users appeared during a session; I call it Session CCU. It usually reflects CCU more accurately, since it covers users who opened the app but generated no server-side work and therefore do not show up in the Peak CCU at that moment. For example, my app's average Time On App is 5 minutes, so I use 5 minutes as the window for Session CCU.
- LIMIT level: the longest window still considered for CCU. It is usually flexible depending on requirements, but I usually set it to 3~6 times the SESSION level. The goal is to look at overall traffic over a window long enough to assess the quality and limits of campaigns that attract visitors. For example, comparing two consecutive CCU peaks: does the later peak bring the system new traffic or just the old traffic again? This window should not be too long, or everything gets diluted and smoothed out. I usually choose a window of 15~30 minutes and call this number the Limit CCU.
Counting unique CCU
Once we can identify a CCU and have a window in which to count it, how do we count the unique CCU within that window? Counting uniques over small numbers is a simple problem, but when requests reach tens or hundreds of thousands within a short period, finding the unique CCU count quickly is not easy at all.
The first idea that comes to mind is the Redis Set, a Redis data type that stores a collection without duplicate elements. We can easily add elements and count the elements of a set with O(1) complexity.
But using a Redis Set with string data like this has several disadvantages:
- Union is slow (complexity O(n), where n is the total number of elements across the sets). This makes calculations over larger windows much slower (for example, computing the unique CCU over 5 minutes by unioning the 5 per-minute CCU keys).
- It uses lots of RAM because the data is stored as a set of strings. Although we can save some by hashing user_id-client_ip-user_agent into a shorter string, the RAM needed to store the CCU data is still very large.
Fortunately, there is an algorithm born to handle exactly this job: HyperLogLog.
HyperLogLog is an algorithm that approximates the number of distinct values in a set. It has high accuracy while keeping a tiny data footprint, it supports many operations involving counting elements across one or more sets, and most importantly it is very fast. Super fast, if we organize our data properly.
This gently recalls knowledge from a previous article; you can google HyperLogLog for the details. For now it is enough to know this: I can store CCU data accurate to the minute, for a system doing tens of millions of requests per day, with only about 10MB of RAM.
For realtime measurement I used HyperLogLog on Redis, via the Redis HyperLogLog data type available since Redis 2.8.9. You can read more about how Redis implements it in this article: Redis new data structure: the HyperLogLog.
Integrating with the existing system
This was the relatively tricky part, given the short time (1 day): wiring CCU counting into an already complicated system. For simplicity, my system can be described as follows:
I would intervene at the middleware layer to hook into requests. This carries a lot of performance risk, because the middleware affects every request from the client. In fact, my first attempt cut the throughput of my middleware tier in half.
A slight advantage is that I already have a monitoring system with Prometheus + Grafana. So my job now only involves collecting data in the middleware and exporting it to Prometheus so the chart can be viewed in Grafana.
Embarking on the implementation
All that long talk was so you understand the nature of problem solving in programming. The time it takes to solve a problem can be very short (I only needed one day), but the time it takes to accumulate enough knowledge to solve it is not short at all. All this knowledge took me many months of working on projects involving tracking, analytics, system architecture, monitoring, .... I have tried hard to make the article as concise and easy to understand as possible. Please keep reading.
Here are the steps I took to collect the data:
- When a request hits the system, build the CCU identifier by combining the information described above.
- To get a finer-grained view of the CCU, parse the user-agent to identify the device the user is on and preliminarily classify clients by type.
- Create keys on Redis corresponding to the PEAK level, here one per minute, to store the HyperLogLog; each one is called a bucket. Each type of client gets its own bucket.
- Add the CCU identifier to the corresponding bucket with the PFADD command.
That's it for the data collector. On to the data exporter.
- Prometheus periodically calls the data exporter endpoint to scrape metrics.
- Corresponding to the 3 windows PEAK, SESSION, and LIMIT there are 3 respective metrics. The data exporter computes them using the PFCOUNT command.
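What the exporter hands to Prometheus is just text exposition lines, one gauge per (level, device) pair. A minimal formatter could look like this; the metric and label names are my own invention, not from the original system:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatCCUMetrics renders PFCOUNT results as Prometheus text-format
// lines. Keys are sorted so the output is deterministic.
func formatCCUMetrics(counts map[string]map[string]int64) string {
	var b strings.Builder
	levels := make([]string, 0, len(counts))
	for lvl := range counts {
		levels = append(levels, lvl)
	}
	sort.Strings(levels)
	for _, lvl := range levels {
		devices := make([]string, 0, len(counts[lvl]))
		for d := range counts[lvl] {
			devices = append(devices, d)
		}
		sort.Strings(devices)
		for _, d := range devices {
			fmt.Fprintf(&b, "ccu_%s{device=%q} %d\n", lvl, d, counts[lvl][d])
		}
	}
	return b.String()
}

func main() {
	out := formatCCUMetrics(map[string]map[string]int64{
		"peak":    {"web": 1200, "android": 800},
		"session": {"web": 4100, "android": 2600},
	})
	fmt.Print(out)
}
```

A real exporter would typically use the official Prometheus client library instead of hand-formatting, but the wire format is this simple.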
That's it for the data exporter. Finally, use Grafana to draw the chart. This part is too easy to even mention =)))
Accepting errors on individual users
Oops, during the trial run I discovered many cases that cause errors when identifying a CCU:
- Requests from 3rd parties.
- Requests from the same client that differ between the public API and the private API in whether they carry a token.
- Requests from one user logged into multiple accounts behind the same identifier.
Many things affect whether a request is considered to come from a new CCU. To overcome this limitation, I split the measurement into two separate sets of indicators:
- CCU measured by unique user (counting only logged-in users, identified by user_id)
- CCU measured by unique session (counting by the IP + user-agent + user_id identifier)
Thanks to these two indicators, I get a more complete and accurate view of the number of users in the system, both in terms of users (more accurate, but missing guests) and sessions (less accurate, but more comprehensive).
Accepting error across the whole system
The next sacrifice concerns the accuracy of the data. As mentioned above, computing CCU with a Redis Set would give more exact numbers. However, to save memory and to speed up unions between buckets, I accepted using Redis HyperLogLog for the counting.
With a relatively low error rate (&lt;2%), HyperLogLog brings a superior benefit in this case: with only about 10MB of RAM I can store the system's entire CCU data, instead of several hundred MB with a Redis Set.
Accepting ... well, not accepting
Finally, the sacrifice in performance. As I said, the performance risk is a huge one that I had to accept. In the first implementation, for each request I:
- parse the user-agent to categorize the request
- call PFADD twice to add the identifier to the corresponding buckets.
Although the user-agent parsing and the Redis calls run completely async with respect to responding to the user, the middleware benchmark dropped as follows:
------ Before implement ------
Requests per second: 17332.91 [#/sec] (mean)
Time per request: 5.515 [ms] (mean)
------ After implement ------
Requests per second: 7644.02 [#/sec] (mean)
Time per request: 10.027 [ms] (mean)
That is a big sacrifice, isn't it? Sacrificing half the performance just for measurement. So I did not accept it and kept looking for further optimizations. After several hours of testing (only this optimization took that long), I found the culprits that cost my application so much performance:
- Parsing the user-agent to classify requests is quite slow, because it has to run through many different regexes. (~1ms/op)
- PFADD, despite its O(1) complexity and the use of the Redis pipeline (which works like a bulk command), still consumes a lot of resources and network bandwidth, and even affects the speed of the main operations (which also use Redis).
Now let’s analyze what we can do about these two issues.
The first issue is to speed up user-agent parsing. The starting observation is that the user agent of an app or browser repeats across the requests of one user, and even across users with the same device. So I thought of an in-memory cache: using process-global storage to cache the results of user-agent parsing.
Of course, I did not arrive at this in-memory cache idea out of nowhere; until then I had only ever used Redis as the caching layer. But now even Redis itself was the thing slowing us down, so I had to find another way. Incidentally, one morning while surfing the news, I read two blog posts about Golang in-memory cache libraries: The State of Caching in Go and Introducing Ristretto: A High-Performance Go Cache. So I decided to use ristretto. My benchmarks showed low RAM usage and impressive response speed. One job done.
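The idea can be sketched without any library: memoize the parse result keyed by the raw user-agent string. The parseDevice classifier below is a placeholder for a real regex-based user-agent parser, and ristretto additionally bounds memory and handles eviction, which this toy map does not:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseDevice is a stand-in for an expensive regex-based UA parser.
func parseDevice(ua string) string {
	switch {
	case strings.Contains(ua, "Android"):
		return "android"
	case strings.Contains(ua, "iPhone"):
		return "ios"
	default:
		return "web"
	}
}

var uaCache sync.Map // raw UA string -> device type

// cachedDevice returns the parse result from the in-memory cache,
// computing and storing it the first time a user agent is seen.
func cachedDevice(ua string) string {
	if v, ok := uaCache.Load(ua); ok {
		return v.(string)
	}
	d := parseDevice(ua)
	uaCache.Store(ua, d)
	return d
}

func main() {
	fmt.Println(cachedDevice("Mozilla/5.0 (Linux; Android 10)")) // parses, then caches
	fmt.Println(cachedDevice("Mozilla/5.0 (Linux; Android 10)")) // cache hit
}
```

Because the set of distinct user agents is tiny compared to the number of requests, the ~1ms parse cost is paid roughly once per UA instead of once per request.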
The second problem is to improve the flow of writes into Redis via the PFADD command. With every request to the system calling the two commands above, the Redis connections slowed the system down. I researched in many places, awesome lists and so on, but could not think of a feasible approach. So I set it aside for a while and explored other issues first.
But just as I was reading about the thread-safety of the Redis library for Golang, a comment in an issue enlightened me: https://github.com/go-redis/redis/issues/166#issuecomment-149360999 .
The idea is a buffer/flush mechanism, here implemented with Muster, a very old library (archived by Facebook) that handles internal batching. Requests to Redis are pushed into an internal batch; periodically, every 1s or every 10,000 requests, I flush the batch down to Redis using the Redis pipeline. This saves a lot of resources and avoids clogging your Redis client.
Wow, what a surprisingly worthwhile discovery. I implemented it with the library above and here is the result:
------ Before implement ------
Requests per second: 17332.91 [#/sec] (mean)
Time per request: 5.515 [ms] (mean)
------ After implement #1 ------
Requests per second: 7644.02 [#/sec] (mean)
Time per request: 10.027 [ms] (mean)
------ After implement #2 ------
Requests per second: 16592.69 [#/sec] (mean)
Time per request: 6.027 [ms] (mean)
I will just leave this here and say nothing.
Skipping past the fiddly side work of writing a data exporter and building a simple Grafana dashboard with the metrics:
- Concurrent Session (PEAK, SESSION, LIMIT) by (device type)
- Concurrent User (PEAK, SESSION, LIMIT) by (device type)
- User TOA (time on app) by (device type)
I have succeeded in implementing real-time CCU measurement across the system without affecting the running system too much. Right?
At this point, let me close this very long and tiring article about my inspiring Sunday. Many thanks for your patience in following me this far. Although I keep telling you not to do what I did, if anyone wants to try, give it a try. Everything I learned along the way was rewarding and has found many applications in my other projects.
Through this article, I draw three lessons (to repeat):
- Reinventing the wheel teaches you a lot, but only if you really understand what you are doing.
- When a problem is too hard to solve, just let it go and do something else. The answer will naturally hit you in the face before long.
- Technology, again, does not make your application run faster. You are the one who does that.