Instagram – What Goes into Its System Design?

Tram Ho

Is it difficult to design a system similar to Instagram? And what should you pay attention to?

A service for keeping simple memories, with uploading / downloading (viewing) of photos, sounds easy to build. But designing an architecture that can hold a **huge amount of traffic** like Instagram's is a genuinely hard and complex problem. Let's walk through the problems involved in designing a system similar to Instagram!

Hi, I'm Khanh Ney, a backend developer who likes to tackle problems DIY-style (Do It Yourself). Hopefully this article helps those of you who haven't yet encountered design problems, and shows how to estimate and evaluate a system design problem.

To design an effective system, you need to understand its core requirements and identify the elements that will make the system scalable in the future.

1 / System Requirements

  • Required:
    • Allows users to upload / download (view) photos, their own or other users'
    • Users can search for information such as user name, photo, or video
    • As a social network, follow and like features are an indispensable part
    • Images, videos, and other content uploaded by users must not be lost for as long as the system's retention policy allows
    • The system itself is able to generate and display NewsFeed content from all followed users.
  • Optional requirements:
    • The system responds to Download / View requests in under 200ms
    • The system should be capable of meeting high availability (Highly Available)
  • Note:
    • Users will store a large number of images and other content, so efficient storage management is key to achieving the lowest response time.
    • The system must guarantee 100% durability of stored data (images, videos, …)
    • Estimated read / write ratio ≈ 5:1 (on average a user views 5 photos for every 1 photo uploaded).

2 / Estimated storage capacity

  • Assume the system has 10M users, with 1M daily active users (DAU)
  • 200,000 images uploaded per day → 200,000 / 86,400 s ≈ 2.3 photos/s
  • On average, 1 image is ≈ 300KB in size
    • 2.1 / Estimated Traffic
      • With a read / write ratio of ≈ 5:1
      • Reads: 5 × 200,000 uploads/day = 1,000,000 photo views/day → 1,000,000 / (24 hours × 3,600 seconds) ≈ 12 read photos/s (QPS – queries per second)
      • Together with ≈ 2.3 writes/s, that is ≈ 14 requests per second (rps) in total
      • Read bandwidth: 300 (KB/photo) × 12 (photos/s) = 3,600KB/s ≈ 3.5MB/s
    • 2.2 / Estimated Storage
      • 200,000 (photos/day) × 300 (KB/photo) ≈ 60GB/day
        • for 1 year → 60GB/day × 365 days ≈ 22TB/year
    • 2.3 / Estimated Bandwidth
      • Write requests (Upload): ≈ 2.3 (photos/s) × 300 (KB/photo) ≈ 700KB/s of incoming data
      • Read requests (Download / View): ≈ 12 (requests/s) × 300 (KB/photo) ≈ 3.5MB/s of outgoing data
    • 2.4 / Estimated Memory (Cache)
      • Following the 20-80 rule, we cache the hottest 20% of photos
        • With ≈ 12 read rps → per day: 12 × 3,600 seconds × 24 hours ≈ 1,000,000 read requests/day
        • By the 20-80 rule, we need: 1,000,000 × 0.2 (20% hot photos) × 300 (KB/photo) ≈ 60GB of cache
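The estimates above can be reproduced with a small back-of-the-envelope script. All inputs are the assumptions stated in this section, so you can swap in your own numbers:

```python
# Back-of-the-envelope capacity estimation for an Instagram-like system.
# All inputs are the assumptions from the article; adjust them for your system.

SECONDS_PER_DAY = 24 * 3600  # 86,400

uploads_per_day = 200_000
avg_photo_kb = 300
read_write_ratio = 5  # 5 reads per write

# Traffic
write_qps = uploads_per_day / SECONDS_PER_DAY        # ≈ 2.3 photos/s
reads_per_day = uploads_per_day * read_write_ratio   # 1,000,000 views/day
read_qps = reads_per_day / SECONDS_PER_DAY           # ≈ 12 photos/s

# Storage
storage_gb_per_day = uploads_per_day * avg_photo_kb / 1_000_000  # ≈ 60 GB/day
storage_tb_per_year = storage_gb_per_day * 365 / 1_000           # ≈ 22 TB/year

# Bandwidth
incoming_kb_s = write_qps * avg_photo_kb       # ≈ 700 KB/s
outgoing_mb_s = read_qps * avg_photo_kb / 1_000  # ≈ 3.5 MB/s

# Cache (20-80 rule: keep the hottest 20% of daily reads in memory)
cache_gb = reads_per_day * 0.2 * avg_photo_kb / 1_000_000  # ≈ 60 GB

print(f"write QPS ≈ {write_qps:.1f}, read QPS ≈ {read_qps:.1f}")
print(f"storage ≈ {storage_gb_per_day:.0f} GB/day, {storage_tb_per_year:.1f} TB/year")
print(f"bandwidth in ≈ {incoming_kb_s:.0f} KB/s, out ≈ {outgoing_mb_s:.1f} MB/s")
print(f"cache ≈ {cache_gb:.0f} GB")
```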

3 / Database & Storage

Collections / Tables

  • Designing a database to store the information above won't cause you much trouble, but from a high-level perspective you need a strategy for data growth, and latency is a key design target.
  • With SQL vs NoSQL options, we need a suitable strategy; a few things to note before choosing a database for this system:
  • Number of records to be stored: 200,000 records/day → over 10 years = 200,000 × 365 days × 10 years ≈ 730M records
  • Each record (document) is assumed to be under 1KB (this can be measured from the DB's CLI, or estimated from the fields, e.g. with MongoDB: _id = 12 bytes, a number = 8 bytes, …)
  • There is no need to store relationships between records
  • The system is read-heavy → from the notes above, effective DB types include: key-value (Redis), wide-column (Cassandra), document NoSQL (MongoDB)
  • For the files themselves, we can use distributed object storage such as S3, Backblaze, …
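To make the "under 1KB per record" assumption concrete, here is a hypothetical photo-metadata document in MongoDB style. The field names are purely illustrative, not a real Instagram schema; the actual image bytes live in object storage, and the record only holds a pointer to them:

```python
import json

# Hypothetical photo-metadata document; field names are illustrative only.
photo_doc = {
    "_id": "64f1c2ab0000000000000001",  # ObjectId is 12 bytes in MongoDB
    "user_id": "64f0aa000000000000000042",  # owner of the photo
    "caption": "sunset at the beach",
    "storage_path": "s3://photos/2023/09/64f1c2.jpg",  # bytes live in object storage
    "size_kb": 300,
    "created_at": "2023-09-01T10:00:00Z",
    "like_count": 0,
}

# Sanity-check the size assumption: the serialized record stays well under 1 KB.
record_size = len(json.dumps(photo_doc).encode())
print(record_size)
```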

4 / Sharding

  • With requirements from the system:
    • The system responds with the ability to respond from your Download / Viewed requests in less than 200ms
    • The system should be capable of meeting high availability (Highly Available)
  • Sharding is an effective solution, helping you distribute data across nodes (you can think of it as a distributed evolution of an index ~ my take). It lets you scale the DB horizontally.
  • Implementing sharding solves the problems above; in addition, deploying the sharded nodes with a master-slave architecture (replica set) gives the system High Availability.

Note: while estimating metadata storage, if the generated data stays within acceptable limits, you can still choose a scale-up solution and simply grow the system's infrastructure. In general, though, with a sharded system I feel secure for the next 5 or 10 years; what remains is choosing good SHARD KEYs (similar to MongoDB indexes) and testing thoroughly, to make sure your judgment stays ALIVE in the production environment.
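A minimal sketch of hash-based shard routing, assuming a fixed shard count and `photo_id` as the shard key (both are illustrative choices; production systems typically use many virtual shards or consistent hashing so that rebalancing is cheaper):

```python
import hashlib

NUM_SHARDS = 16  # assumption for illustration; real systems use many more

def shard_for(key: str) -> int:
    """Map a shard key (e.g. a photo id or user id) to a shard number.

    A stable hash (md5) is used rather than Python's built-in hash(),
    which is randomized per process and so unusable for routing.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# All reads and writes for the same key always land on the same shard:
print(shard_for("photo:123456"))
```

Note that sharding by user_id keeps one user's photos together (good for profile pages) but makes celebrity accounts into hot shards, while sharding by photo_id spreads load evenly but scatters a user's photos; that trade-off is exactly why shard-key selection deserves testing.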

5 / NewsFeed – where most users of Fb or Instagram spend their time

  • With the traditional approach (the one I used before), the process of creating a NewsFeed is as follows:
    • Get the list of users currently being followed → query the latest photos of each user
    • Then query the additional metadata (photo, user, …) → respond to the user
    • Use some algorithm to order the images.
  • The procedure looks like this: Query → Sorting → Merging → Ranking.
    • Problems: with this approach we run into the following:
      • High latency at the DB, from querying multiple tables for a single request
      • High latency in executing the Query → Sorting → Merging → Ranking logic.
    • Solution: do the work in the background, or on a trigger → pre-generate the NewsFeed
      • For example:
        • Create a table generateNewsFeed, linking:
          • Users (recipients + top ranking)
          • Author
          • … metadata (NewsFeed ref, or NewsFeed info)
        • Each time a user creates a post (photo):
          • The server does a search → update (e.g. creates a generateNewsFeed record), runs the ranking algorithm → updates the DB.
          • Or you can run scheduled task events to handle the create / update jobs on the generateNewsFeed table
          • When a user requests their NewsFeed → the server queries the generateNewsFeed table
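The pre-generation (fan-out-on-write) idea above can be sketched with in-memory stand-ins for the real tables. The names (`generate_news_feed`, `rank`, etc.) follow the example, but the details are hypothetical:

```python
from collections import defaultdict

# In-memory stand-ins for the real tables; names are illustrative.
followers = defaultdict(set)            # author_id -> set of follower ids
generate_news_feed = defaultdict(list)  # user_id -> list of (rank, post_id)

def follow(user_id, author_id):
    followers[author_id].add(user_id)

def rank(post_id):
    # Placeholder ranking: newer posts (larger ids here) rank higher.
    return post_id

def on_new_post(author_id, post_id):
    """Fan-out on write: pre-insert the post into every follower's feed."""
    for user_id in followers[author_id]:
        generate_news_feed[user_id].append((rank(post_id), post_id))
        generate_news_feed[user_id].sort(reverse=True)  # highest-ranked first

def get_feed(user_id, limit=10):
    """Serving a feed request is now a single cheap lookup."""
    return [post_id for _, post_id in generate_news_feed[user_id][:limit]]

follow("alice", "bob")
on_new_post("bob", 1)
on_new_post("bob", 2)
print(get_feed("alice"))  # [2, 1]
```

The expensive Query → Sorting → Merging → Ranking pipeline now runs at write time (or in a scheduled task), so the read path is a simple indexed lookup.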
  • The next problem to wrap our brains around: how will the NewsFeed be delivered to clients (users)? We have the following approaches.
    • Pull, i.e. clients send requests to fetch the NewsFeed; the problems with this approach:
      • Clients must send a request whenever they want the NewsFeed (like in the Facebook mobile app, where you swipe down from the top of the screen to load new content)
      • Sometimes the server has no data to return (either all of the NewsFeed has already been fetched, or there is nothing new from the followed users).
    • Push, i.e. when one of the followed users creates a new post, the server automatically pushes it to all clients following that user; this approach also has problems:
      • For users with a BIG follower count (say 100 thousand), the system will push NewsFeed updates continuously if they post at a dizzying rate
      • Assuming we accept this for users with fewer than 100 thousand followers, we then have to choose a protocol to implement Push. There are many; a few suggestions: HTTP Polling, HTTP Long Polling, WebSocket, … You can read up on them, compare, and choose what fits your system. As the Vietnamese saying goes, "with Buddha, wear a cassock; with a ghost, wear paper clothes" (adapt to the circumstances) → from a pros / cons analysis we can optimize how the NewsFeed is sent to clients, as follows:
        • For authors with a SMALL follower count (< 100 thousand), we can choose PUSH to send the NewsFeed to those clients.
        • For authors with a HIGH follower count:
          • NewsFeeds with high RANKING, or meeting some criterion the server prioritizes → PUSH-based delivery can be used
          • normal NewsFeeds → use PULL-based delivery to view the NewsFeed

6 / Caching

We have many levels of cache: client cache, CDN cache, web server cache, application cache, DB cache, … but here we will only talk about the application cache.

  • Memcached or another engine like Redis will be a tool to help you optimize lookups of HOT posts / photos, relying on eviction mechanisms such as:
    • LRU (Least Recently Used)
    • LFU (Least Frequently Used)

→ With the current system, the NewsFeed prioritizes ranking by time (most recent first) → so we will choose LRU for cache eviction.
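A minimal sketch of the LRU eviction policy chosen above; in practice you would use Memcached or Redis rather than rolling your own, but this shows the mechanism:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache for hot photo metadata (capacity in entries)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("photo:1", b"...")
cache.put("photo:2", b"...")
cache.get("photo:1")          # touch photo:1 so it becomes most recent
cache.put("photo:3", b"...")  # evicts photo:2, the least recently used
print(cache.get("photo:2"))   # None
```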

  • Note:
    • Follow the 20-80 traffic rule (cache the hot 20%)
    • Caching leads to headaches like cache invalidation, so master the game thoroughly so you never have to fall back on "it's not a bug, it's a feature".

Bingo… that's quite a lot. Hopefully this article shares some useful knowledge about designing an effective system, giving you a perspective and things to pay attention to when you get hands-on with real projects.


Source : Viblo