How to build scalable architectures

Tram Ho


Scaling the architecture of an existing application is often seen as a hard problem. People tend to think it takes sophisticated tools and a big budget. That is not always the case. The first step is to choose the right tool for the job. For example, some databases are faster at reading data, while others are better at writing it. Even with the right tool, a single server may not be enough. That is when you need a solid understanding of architecture to work with multiple servers. Of course, you can rely on the managed services that AWS provides. But you should also know how to build a highly scalable system starting from the most basic architecture.

The basic elements

Choose the right technology

As mentioned above, the first step is to choose the right tool for the job. Different programming languages are suited to different purposes.

For example:

  • Python offers strong data-processing capabilities in very few lines of code, and it ships with many built-in libraries.

  • Node.js may have the largest ecosystem of third-party packages today, but it runs JavaScript on a single thread. To use several CPU cores you have to run several processes, for example with a process manager such as PM2.

  • The choice of database is just as important as the choice of programming language. SQL gives you a powerful, expressive language for querying and manipulating data, but that expressiveness can make SQL databases slower than NoSQL stores for some workloads. Databases are usually optimized either for reads or for writes: some are faster at reading data, while others are faster at writing it.

The problem of multiple servers

As mentioned above, even with the right technology, a single machine will eventually not be enough once your application has to grow. Sometimes even two or three machines are not enough. To run on multiple servers, your backend should be stateless. Why stateless? Because if it is stateful, we need a way to share state from one server to all the others, which makes the architecture much more complex. This is one reason functional languages, which discourage mutable state, are so popular, and part of the motivation behind languages such as Scala.
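
To make the difference concrete, here is a minimal sketch, not a production design. The class names (`SharedStore`, `StatefulServer`, `StatelessServer`) are made up for this illustration, and the shared store is just an in-memory dict standing in for an external database or key-value service.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Stand-in for an external store (a database, a key-value service, ...)."""
    data: dict = field(default_factory=dict)


class StatefulServer:
    """Keeps per-user counters in its own memory; other servers cannot see them."""

    def __init__(self):
        self.visits = {}

    def handle(self, user_id: str) -> int:
        self.visits[user_id] = self.visits.get(user_id, 0) + 1
        return self.visits[user_id]


class StatelessServer:
    """Keeps nothing locally; all state lives in the shared store."""

    def __init__(self, store: SharedStore):
        self.store = store

    def handle(self, user_id: str) -> int:
        count = self.store.data.get(user_id, 0) + 1
        self.store.data[user_id] = count
        return count


# Two stateful servers disagree about the same user...
a, b = StatefulServer(), StatefulServer()
stateful_results = [a.handle("alice"), b.handle("alice")]

# ...while two stateless servers backed by one store stay consistent.
store = SharedStore()
c, d = StatelessServer(store), StatelessServer(store)
stateless_results = [c.handle("alice"), d.handle("alice")]

print(stateful_results)   # [1, 1]
print(stateless_results)  # [1, 2]
```

With the stateful version, the answer depends on which server happens to receive the request. With the stateless version, any server can be added or removed without losing anything.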

When you place a load balancer in front of your servers, it forwards each request to the least busy server. Any server may therefore receive any request, so every server must return the same response to the same request. For that reason, make your application stateless as early as possible.
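
The "least busy server" rule can be sketched in a few lines. This is only an illustration of the idea, assuming "busy" means the number of requests currently in flight; real load balancers offer several strategies, and the names `Backend` and `LeastBusyBalancer` are invented here.

```python
class Backend:
    """One application server as seen by the load balancer."""

    def __init__(self, name: str):
        self.name = name
        self.active = 0  # requests currently in flight


class LeastBusyBalancer:
    """Sends each new request to the backend with the fewest active requests."""

    def __init__(self, backends):
        self.backends = list(backends)

    def acquire(self) -> Backend:
        # min() returns the first backend in case of a tie.
        backend = min(self.backends, key=lambda b: b.active)
        backend.active += 1
        return backend

    def release(self, backend: Backend) -> None:
        # Call this when the request has finished.
        backend.active -= 1


balancer = LeastBusyBalancer([Backend("s1"), Backend("s2"), Backend("s3")])

first = balancer.acquire()
second = balancer.acquire()
third = balancer.acquire()
print([first.name, second.name, third.name])  # ['s1', 's2', 's3']

balancer.release(second)         # s2 finishes its request
fourth = balancer.acquire()
print(fourth.name)               # s2
```

Note that the balancer never asks which user a request belongs to. That only works if every backend is stateless.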

Caching and rate limiting

Imagine you have to run the same computation every 100 ms for every user. That makes your server very vulnerable to DDoS attacks. The easiest fix is to put a caching layer in between: the operation is computed only once, for the first user, and the following users simply get the stored result from the cache.
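
Here is a minimal sketch of such a caching layer, assuming a simple in-process cache with a time-to-live so that entries expire on their own. `TTLCache` and `expensive_report` are hypothetical names, and the clock is injectable only so the example can be run without waiting.

```python
import time


class TTLCache:
    """Caches computed values and recomputes them after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # still fresh: serve from cache
        value = compute()                # missing or expired: recompute
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key) -> None:
        """Manual reset, for when the underlying data is known to have changed."""
        self._entries.pop(key, None)


calls = []


def expensive_report():
    calls.append(1)                      # count how often we really compute
    return sum(range(1_000))


fake_now = [0.0]                         # fake clock for the demonstration
cache = TTLCache(ttl_seconds=0.1, clock=lambda: fake_now[0])

first = cache.get_or_compute("report", expensive_report)
second = cache.get_or_compute("report", expensive_report)  # served from cache
fake_now[0] = 0.2                        # the entry has now expired
third = cache.get_or_compute("report", expensive_report)   # recomputed

print(len(calls))  # 2
```

Three requests came in, but the expensive function only ran twice: once for the first user and once after the entry expired.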

Caching is useful, but it has a downside: the data can become stale. You therefore need a cache invalidation mechanism that forces the data to be recalculated. As a rule, cache the server's output rather than the user's input, because user input changes often. With a cache, the server survives when users request the same resource over and over. But if a user requests different resources, say once every 1 ms, the server will go down. That is why we also need rate limiting: if not enough time has passed since the client's previous request, the new request is rejected. This keeps the server alive.
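
The rate-limiting rule described above (reject a request if too little time has passed since the previous one) might look like the sketch below. `MinIntervalLimiter` is an invented name, state is kept per client in a plain dict, and the clock is injectable so the example is deterministic. Production systems often use other schemes such as token buckets; this is only the simplest variant.

```python
import time


class MinIntervalLimiter:
    """Rejects a client's request if it arrives too soon after the last accepted one."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_accepted = {}  # client_id -> time of last accepted request

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        last = self._last_accepted.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False                       # too soon: reject
        self._last_accepted[client_id] = now   # accept and remember when
        return True


fake_now = [0.0]                               # fake clock for the demonstration
limiter = MinIntervalLimiter(min_interval=0.1, clock=lambda: fake_now[0])

decisions = []
for t, client in [(0.0, "alice"), (0.001, "alice"), (0.001, "bob"), (0.15, "alice")]:
    fake_now[0] = t
    decisions.append(limiter.allow(client))

print(decisions)  # [True, False, True, True]
```

Alice's second request arrives 1 ms after her first and is rejected, while Bob is unaffected because the limit is tracked per client.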

Those are the basics you need to master to build a server that stays up. Next, I will walk through some common application architectures.

Common architectures


This is the basic architecture, which you can build with a standard web stack in a single evening. It does not scale, but it is good enough for a small application or a weekly exercise.

  • Data served: a few GB
  • Users served: a few thousand
  • Easy to take down with a DDoS attack


We add the caching and rate limiting described above to protect against DDoS attacks. This also makes the server faster. However, this architecture is still hard to scale, because the backend is still stateful.

  • Data served: a few GB
  • Users served: tens of thousands
  • Not scalable yet


With this architecture we have scaled out to multiple servers. If one server goes down, the others can still handle requests, but the database still runs on a single server.

  • Data served: a few TB
  • Users served: hundreds of thousands
  • Because there is only one database server, it will be overwhelmed if too many queries arrive at once.


With this architecture, our application is faster, and the database layer is under much less strain thanks to a load balancer.

  • Data served: hundreds of TB
  • Users served: millions
  • If users are far away from your servers, the application may feel slow
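
One common way to spread database load, and this is an assumption on my part since the text above does not spell out the setup, is to send writes to a primary server and rotate reads across replicas. The sketch below only shows the routing decision. `ReadWriteRouter` and the server names are hypothetical, and deciding "read or write" by looking for a leading `SELECT` is a deliberate oversimplification.

```python
import itertools


class ReadWriteRouter:
    """Routes reads to replicas in round-robin order and writes to the primary."""

    def __init__(self, primary: str, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query: str) -> str:
        # Simplification: anything that starts with SELECT counts as a read.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])

queries = [
    "SELECT * FROM users",
    "INSERT INTO users (name) VALUES ('alice')",
    "SELECT * FROM orders",
    "select count(*) from users",
]
targets = [router.route(q) for q in queries]
print(targets)
# ['db-replica-1', 'db-primary', 'db-replica-2', 'db-replica-1']
```

In a read-heavy application most queries land on the replicas, so the primary only has to keep up with the writes.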


As you may have guessed, to solve this problem we use a CDN. With servers in many different locations, the distance to each user is reduced.

  • Data served: hundreds of TB
  • Users served: more than 10 million
  • The application is now limited by the volume of data: you are constrained by the database server in your region.


With a distributed database such as Riak, your data capacity is no longer limited by a single machine.

  • Data available: unlimited
  • Users served: as many as you can attract
  • The only remaining problem, of course, is money.
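
Distributed stores like Riak spread keys over many nodes using consistent hashing. The sketch below illustrates the general technique only; it is not Riak's actual implementation, and `HashRing`, the node names, and the number of virtual nodes are all choices made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Maps a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 64):
        # Each physical node gets `vnodes` positions on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first ring position after the key's hash owns the key (wrapping around).
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
small = HashRing(["n1", "n2", "n3"])
large = HashRing(["n1", "n2", "n3", "n4"])   # one node added

moved = [k for k in keys if small.node_for(k) != large.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
print({large.node_for(k) for k in moved})    # only the new node receives keys
```

When a node is added, only a fraction of the keys change owner, and every key that moves goes to the new node. That is what lets you keep adding machines as the data grows, as long as you can pay for them.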


We have looked at some of the most popular architectures for most projects. You don’t need to stick with them – if the job requires it, go and design your own. Just remember that every tool has different uses and make sure you are using the right tools for your job.


Happy coding


Source : Viblo