Attention when choosing MySQL as a database

Tram Ho

Coming to web programming, surely everyone of us has worked with the database. Especially the relational database management system like MySQL and PostgreSQL …

In which case should MySQL be used, which case should not use MySQL? What should I pay attention to when operating MySQL? …

Today I share with everyone the MySQL operating experience I have learned over the years.

Scaling is quite hard

Although when the release is finished running very fast but …

For ease of understanding, I take an example of using MySQL in the chat system with huge message data (eg Facebook Messenger for example)

Then the table to save the message I guess will look like this:

Of course not only this table but also many other tables. For example, the users table to save user information. Table attachments to manage the attached file information when sending. Table tags to save user tag information …

In addition, not only chat features but also users’ behavioral analysis features will be needed. So then we can save the user’s access_log to MySQL (or it can be saved in some external storage parts such as S3 for example).

1 series of processing threads like that, when the release is complete, with a small number of users, the query speed will be very fast. And few people pay attention to scaling later.

What is the problem?

The data gets bigger and cannot cache up memory, leading to read from disk and slowing down

The first stage when release, the amount of data used on MySQL is small, the memory allocated to the cache is small, it can also push data onto it.

But after a time of operation, the amount of data increased dramatically, the number of users access also increased. Resulting in the cache hit rate on MySQL will decrease (because it is not enough memory to cache). The result is to read data from the disk instead of from the cache. And make the speed drop. (Memory data read is 200 times faster than HDD, and 10 times more than SSD).

To solve this problem in a short time, there are 2 ways:

  • Increase memory to be able to cache more data on memory. This is called scale up .
  • Apply master slave mechanism to the system. The Master will take over the recording. Slave will do the reading. Whenever the data on the master is changed, it will be replicated to slave children to synchronize data. This is also called scale out .

No matter how you do it, the cost of running on MySQL will increase. At some point this cost will increase to a level that may occur when the monthly revenue does not pay for the server.

With this situation, what do you think?

  • The read and write speed will be slower, making the user experience reduced.
  • Want to release many campaigns to attract users. But MySQL was bottleneck and couldn’t do campaign.
  • Change to a new type of database. But due to its different nature, it is imperative to correct the application-side logic. Moreover, the migration of big data from one server to another may take many days. (1 day application maintain can make revenue drop quite a lot).

Computer maintenance, quite big failover

When MySQL is being run under a master-slave mechanism (data is always synchronized from master to slave). Unfortunately for one reason the master is down. Then MySQL will perform failover (transfer the downed master to another backup master). This process will take several minutes (with RDS case of aws).

For businesses with strict requirements in terms of downtime (for example, games), it may be difficult to accept the case if it takes a few minutes to perform failover.

So how should we solve that problem?

Select a more appropriate database?

In order to solve this problem, maybe we should choose another database other than MySQL. For example, NoSQL, for example.

But when the data of the system is already big, choosing a different database is not a good choice because it may need to fix the code a lot, the time it takes a lot of time to migrate from database to database. (may take days to 1 week).

So right in the design phase, thinking about choosing a suitable database is an extremely important thing, especially think about how big the future of data will be. With NoSQL, there are countless types, such as Cassandra, DynamoDB.

Both of these types are extremely scalable, big guys like Facebook, Discord, Instagram are currently using it.

I would like to tell you some of the main features of these two guys:

Cassandra:

  • Developed by Facebook, then donated to Apache.
  • Cassandra is a distributed database system that combines the best of Google Bigtable and Amazon DynamoDB. Cassandra development language is Java.
  • Data is spread to multiple nodes, so if one node dies, it can still query the data on another node. Therefore its fault tolerance is extremely large.
  • Its Throughput increases with the number of nodes. If you want it to handle more requests, just increase the number of nodes.
  • Write speed is always faster than read. Because write just append to the file. While reading need to check the CAP (can understand this is data integrity) between nodes, so the time will be longer. Although it is long, the read or write time is only in ms.

DynamoDB:

  • DynamoDB is a distributed database system, developed by Amazon.
  • It is possible to set read throughput parameters of read and write separately. And Amazon will automatically scale to meet that number of throughput.
  • For example, at the time of peak time, we set the read throughput number to be slightly higher, at the time of less access, the throughput setting is lower. To save cost.

==> In summary, choosing which database is the most important thing is that it does not lose the user experience.

Returning to the chat system problem as above, if the number of users increases and the user experience is not lost, we need to design a system that is scalable. Especially with the database, you should not choose MySQL, but should choose another type of database, for example, better.

Conclude

How do you feel? Also imagine the guy MySQL and NoSQL guy how different it?

Based on each problem we should research and make the right decision. Leave the maintain more simple.

It is easy to say, but indeed there is no big system in the world where design 1 can run for a lifetime. Almost any system has a system improvement. For example, early Chatwork code in PHP then switched to Scala for better performance …

Share the news now

Source : viblo.asia