How Cloudflare designed its system to scale quickly and defend against large-scale DDoS attacks

Over the past few weeks, several legacy DNS and DDoS-mitigation services have suffered large-scale problems under powerful network attacks. Many Cloudflare customers have (graciously) asked what we do to handle attacks like these.

Every service has its limits, and Cloudflare is no exception, but our architecture has been strong enough to withstand the recent attacks, and it will continue to scale to stop the larger attacks that will inevitably come. We are, without pause, mitigating the very botnets involved. Based on the attack data that has been published, together with data shared with us privately, we have successfully mitigated attacks of the same scale and type without harming customers.

This is a good moment to examine how Cloudflare's architecture differs from that of legacy DNS and DDoS-mitigation services, and how those differences have helped us protect our customers from ever-growing threats.

Analogy: How databases scaled

Before looking at our architecture, take a few minutes to consider a similar, more familiar technology problem: scaling databases. From the mid-1980s, when relational databases began to flourish, until the early 2000s, companies equated scaling with buying bigger hardware. The game was always the same: buy the largest database server you could afford, start loading data, and hope you could buy a bigger server before you ran out of room. Hardware companies responded with increasingly expensive machines optimized for databases.

Meet the IBM z13 mainframe (source: IBM)

At a certain point, the information some organizations wanted to store could no longer fit in a single box. Google is the classic example. When the company was still a startup, it did not have the resources to buy the most expensive database servers. And even with the money, the biggest servers could not have stored everything (read: the whole Internet) it wanted to include.

So, instead of following the path of its predecessors, Google wrote new software that allowed many inexpensive, ordinary servers to work together as if they were one large database. Over time, as Google developed more services, the software became more efficient at spreading load across all the machines in Google's network to make full use of their network, compute, and storage capacity. And when Google needs to scale, it only has to add more ordinary servers, expanding resources to meet demand.

Legacy DNS and DDoS Mitigation

Now compare this with how legacy DNS and DDoS-mitigation services defend against attacks. Traditionally, the way to stop an attack has been to buy or build a big box and use it to filter incoming traffic. If you dig into the technical details of most legacy DDoS-mitigation providers, you will find hardware from companies like Cisco, Arbor Networks, and Radware clustered together into "scrubbing centers."

Waste disposal system CC BY-SA 3.0 image by Annabel

As in the old database world, there are plenty of tricks for getting these giant iron boxes to work together (sort of), but the result is fragile. Typically, the physical limit on the number of packets a single box can absorb becomes the effective limit on the total volume a provider can mitigate. And in a large DDoS attack, most of the attack traffic never reaches the scrubbing center at all: with only a few scrubbing locations, the upstream ISPs become the bottleneck.
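The bottleneck argument can be made concrete with a little arithmetic. The sketch below is a simplified model, not a description of any real provider: it assumes each site can absorb no more than the smaller of its upstream link capacity and its filtering capacity, and every figure in it is a hypothetical number chosen only for illustration.

```python
def absorbable_gbps(sites):
    """Return the total attack traffic (Gbps) a set of sites can absorb.

    Each site is an (upstream_gbps, filter_gbps) pair. Traffic must first fit
    through the upstream link and then through the filtering hardware, so a
    site is limited by the smaller of the two.
    """
    return sum(min(upstream, filtering) for upstream, filtering in sites)


# Hypothetical numbers: four scrubbing centers with very large boxes but
# limited upstream capacity, versus one hundred modest sites.
scrubbing_centers = [(100, 400)] * 4
distributed_sites = [(40, 40)] * 100

centralized_total = absorbable_gbps(scrubbing_centers)
distributed_total = absorbable_gbps(distributed_sites)
print(centralized_total, distributed_total)
```

In this toy model the oversized boxes never get the chance to use their filtering capacity, because the links in front of them saturate first.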

Moreover, hardware scrubbing is cost-inefficient. If you are a DNS provider, how many times a year are you actually attacked? Is it worth investing in a lot of expensive mitigation hardware in every data center? Even if you are a legacy DDoS provider, your service is typically used only when a customer is under attack, so it seems hard to justify building capacity far beyond the largest attack you have seen so far. Many people reason this way, but that conclusion is proving fatal to the traditional model.

The future is not wrapped in boxes

From Cloudflare's earliest days, we have viewed our infrastructure the way Google viewed its database. Early on, many traditional DDoS-mitigation vendors tried to persuade us to use their technology. We ourselves also toyed with building a giant box to scrub traffic. It looked like a fascinating technical challenge, but we realized that the model would never scale.

Instead, we started with a very simple architecture. Cloudflare's first systems had only three components: routers, switches, and servers. Since then we have simplified the model even further, in some cases dropping the router entirely and using switches that can hold enough of the routing table to route packets across the geographic region the data center serves.

Rather than relying on load balancers or dedicated DDoS-mitigation hardware, either of which can become a bottleneck during an attack, we wrote software that uses BGP, the Internet's fundamental routing protocol, to distribute load geographically and within each data center in our network. The key principle of this design is that every server in every cluster can answer every type of request. Our software continuously reallocates load based on what specific customers need at a specific time. In other words, during a large-scale attack we automatically spread the load across tens of thousands of servers.
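To illustrate the principle that every server can answer every request, here is a minimal sketch of two-stage load spreading. It is not Cloudflare's actual software: the data center codes, server names, and the `route` and `pick_server` helpers are hypothetical, and the "nearest data center" is simply passed in as a parameter to stand in for what BGP routing would decide on the real network.

```python
import hashlib
from collections import Counter

# Hypothetical clusters. Any server in a cluster may serve any request,
# so no dedicated load balancer or scrubbing box sits in front of them.
DATA_CENTERS = {
    "FRA": ["fra-1", "fra-2", "fra-3"],
    "SJC": ["sjc-1", "sjc-2"],
}


def pick_server(servers, flow_id):
    """Map a flow to one server by hashing the flow identifier."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route(nearest_dc, flow_id):
    """Stage 1: the network delivers the flow to the nearest data center.
    Stage 2: a hash spreads flows evenly across that cluster."""
    return pick_server(DATA_CENTERS[nearest_dc], flow_id)


# Simulate 3000 attack flows arriving in Frankfurt.
load = Counter(route("FRA", f"198.51.100.{i % 250}:{i}") for i in range(3000))
print(load)
```

Because the hash depends only on the flow identifier and the cluster size, adding a server to a cluster adds capacity without any special-purpose hardware in the path.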

Graphene: Simple structure 100 times stronger than the best steel (credit: Wikipedia)

As a result, we can keep investing in our network cost-effectively. If Frankfurt needs 10% more capacity, we can ship it 10% more servers, without having to decide whether to buy a Colossus Mega Scrubber™ box.

Because every core in every server in every data center can help mitigate attacks, each new data center we bring online makes us better at stopping attacks close to their source: faster, more flexible, and more powerful. In the end, the best answer to a massively distributed botnet is a massively distributed network. This is how the Internet was meant to work from the beginning: strength through distribution, not concentration in a handful of scrubbing locations.

How is our DDoS Mitigation service so cheap?

Resource efficiency shows up in operating costs, not just capital costs. Because we use the same equipment and network to deliver all of Cloudflare's features, we rarely incur extra bandwidth costs when mitigating an attack. The explanation takes a moment, because to understand it you first need to know a little about how we buy bandwidth.

We pay our providers for bandwidth on an aggregate basis, billed monthly at the 95th percentile of the greater of ingress and egress. Ingress is the network term for traffic sent into our network. Egress is traffic sent out of our network.
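As a rough illustration of this kind of billing, the sketch below computes a monthly billable rate from periodic traffic samples. It assumes the bill is based on the 95th percentile of the greater of ingress and egress, using the common "discard the top 5% of samples" method; the `billable_mbps` function and the sample values are hypothetical, and real contracts vary in sampling interval and rounding.

```python
import math


def billable_mbps(samples):
    """Return the 95th-percentile rate over (ingress, egress) samples.

    For each sample only the larger direction counts. The samples are
    sorted, the top 5% are discarded, and the highest remaining value
    is the billable rate.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    peaks = sorted(max(ingress, egress) for ingress, egress in samples)
    index = math.ceil(len(peaks) * 95 / 100) - 1
    return peaks[index]


# Hypothetical month: 95 normal samples, plus 5 samples taken during a
# short attack in which ingress spikes far above egress.
normal = [(10, 100)] * 95
attack = [(900, 100)] * 5

quiet_month = billable_mbps([(10, 100)] * 100)
attacked_month = billable_mbps(normal + attack)
print(quiet_month, attacked_month)
```

In this example the brief spike falls entirely within the discarded top 5%, so the billable rate is the same with or without the attack.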

Besides DDoS mitigation, Cloudflare offers many other services, including caching. The nature of a cache is that more traffic should always flow out of it than into it. In our case, under normal conditions, we see more egress (outgoing traffic) than ingress (incoming traffic).

DDoS attacks drive ingress up but do not affect egress. Yet even in a very large attack, it is extremely rare for ingress to exceed egress. Because we pay only for the greater of ingress and egress, and because egress is almost always higher than ingress, we have an enormous amount of effectively free bandwidth with which to absorb attacks.
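The headroom argument comes down to one comparison. The sketch below assumes, as a simplification, that the billable rate at any moment is the greater of ingress and egress; the function names and all traffic figures are hypothetical and chosen only to show the shape of the argument.

```python
def billable_gbps(ingress, egress):
    """Only the larger direction of traffic is billed."""
    return max(ingress, egress)


def extra_billable_gbps(normal_ingress, egress, attack):
    """Extra billable bandwidth caused by an attack of the given size.

    The attack adds to ingress only; egress is unchanged.
    """
    before = billable_gbps(normal_ingress, egress)
    during = billable_gbps(normal_ingress + attack, egress)
    return during - before


# Hypothetical cache-heavy network: 100 Gbps in, 1000 Gbps out.
moderate_attack = extra_billable_gbps(100, 1000, attack=500)
huge_attack = extra_billable_gbps(100, 1000, attack=1200)
print(moderate_attack, huge_attack)
```

Any attack smaller than the gap between egress and normal ingress adds nothing to the bill under this model; only the portion that pushes ingress above egress is paid for.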

As our services become more popular, our capacity to stop attacks grows with them. Many people wonder how we can charge a flat fee regardless of attack size. And while many legacy providers have said that pro bono DDoS mitigation would cost them millions of dollars, we are able to protect politically and artistically important sites, free of charge, against dozens of very large attacks through Project Galileo.

Winning the arms race

Cloudflare is the only DNS provider designed from the beginning to mitigate large-scale DDoS attacks. Just as DDoS attacks are distributed by nature, Cloudflare's DDoS mitigation system is distributed across our entire global network.

Make no mistake: we are in an arms race with attackers. But we are well positioned in that fight, both technically and economically. Against a legacy provider, attackers hold the advantage: the provider's costs are high because it must buy expensive equipment and bandwidth, while the attackers' costs are low because they use hacked devices. That is why our trump card is our software, which spreads load across our huge, distributed network of inexpensive hardware. By keeping costs down, we can keep investing heavily in capacity ahead of future attacks.

Today, we believe Cloudflare has more capacity to stop attacks than all of our competitors combined. And we continue to expand, bringing new data centers online week after week. The good news for our customers is that we have designed Cloudflare to scale cost-effectively while staying ahead of attackers. Every service has its limits, and we must always remain vigilant against new threats. But we believe this is the right direction, and that our capacity will stay ahead of even the most damaging attacks we can anticipate.

ITZone via Cloudflare
