(Kafka) Choosing the Right Hardware

Tram Ho

Why do we care about choosing hardware?

  • Choosing the hardware configuration for a Kafka broker is not difficult, but it is as much an art as a science.
    • There are no strict hardware requirements for Kafka.
    • It can run well on many different systems.
    • The main concern is its performance.
  • However, a few factors contribute to its overall performance, such as:
    • Disk throughput
    • Disk capacity
    • Memory…
  • Once you determine which type of performance matters most for your system, you can choose the optimal hardware.

Disk Throughput

  • Producer client performance is directly affected by the throughput of the broker disks used for storing log segments. A produced message must be committed to the broker's local storage, and clients typically wait until the broker confirms the commit before treating the message as delivered. This means that faster disk writes give your system lower produce latency.
  • The obvious decision when it comes to disk throughput is the traditional choice between spinning hard drives (HDDs) and solid-state disks (SSDs).
    • SSDs have extremely fast seek and access times -> good performance.
    • HDDs: the other option, which is more economical and offers more capacity per unit. HDD performance can be improved by using multiple HDDs in one broker, configured as independent data directories.
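The multiple-data-directory setup above is configured through the broker's `log.dirs` property in `server.properties`; the mount paths below are illustrative, assuming one directory per physical drive:

```properties
# server.properties (illustrative paths): list one data directory per
# physical HDD so that partitions, and their log segments, are spread
# across the drives.
log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs,/mnt/disk3/kafka-logs
```

Kafka assigns each partition to a single directory, so this spreads whole partitions (not individual segments) across drives.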

Disk Capacity

  • Another aspect that needs attention is disk capacity. The amount of disk storage needed is determined by how many messages must be retained at any given time.
  • If the broker is expected to receive about 1 TB of traffic per day and retain it for 7 days -> the broker needs at least 7 TB of storage for logs.
  • You should also allow at least 10% overhead for other files, and reserve headroom for traffic fluctuations during periods of heavier use.
  • Storage capacity should also be considered when sizing a Kafka cluster and deciding when to scale it. => The cluster's total traffic can be balanced across brokers by giving each topic multiple partitions; when the storage capacity of a single broker becomes insufficient, more brokers can be added.
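The sizing rule above can be sketched as a small calculation. The 1 TB/day and 7-day figures come from the example in the text; the 10% overhead and the replication-factor parameter are assumptions for illustration:

```python
def required_storage_tb(daily_traffic_tb, retention_days,
                        overhead_fraction=0.10, replication_factor=1):
    """Rough disk sizing: data retained over the retention window,
    times replication, plus headroom for other files and traffic spikes."""
    retained = daily_traffic_tb * retention_days * replication_factor
    return retained * (1 + overhead_fraction)

# The example from the text: 1 TB/day retained for 7 days, plus 10% overhead.
print(round(required_storage_tb(1, 7), 2))  # 7.7
```

With a replication factor of 3, the same workload would need roughly three times as much total cluster storage.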

Memory

  • In normal operation, a Kafka consumer reads from the end of a partition, lagging behind the producers only slightly, if at all. In that case the consumer reads messages that are still in the page cache, which is much faster than reading them from disk. Having more memory available for the page cache therefore increases consumer client performance.
  • In fact, Kafka itself does not need much heap memory for the Java Virtual Machine (JVM). Even a broker handling “X messages per second and a data rate of X megabits per second can run with a 5 GB heap”.
  • The rest of the system memory is used for the page cache, with the benefit that the system can cache the log segments in use. This is the main reason it is advisable to run Kafka on a dedicated machine, not shared with any other application -> it contributes to better performance.
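A sketch of how the heap is typically constrained when starting the broker, leaving the rest of RAM to the page cache. `KAFKA_HEAP_OPTS` is the environment variable honored by Kafka's startup scripts, and the 5 GB figure matches the one quoted above:

```shell
# Pin the broker to a fixed 5 GB heap; all remaining RAM stays
# available to the OS page cache for log segments.
export KAFKA_HEAP_OPTS="-Xms5g -Xmx5g"
bin/kafka-server-start.sh config/server.properties
```

Setting `-Xms` equal to `-Xmx` avoids heap resizing pauses; deliberately leaving most of the machine's memory unclaimed by the JVM is what lets the page cache do its job.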

Source: Viblo