Explaining Kafka producer configuration

Tram Ho

  • acks
    • acks = 0: the producer does not wait for any response from the broker and treats the message as sent as soon as it leaves. If an error occurs and the broker never receives the message, the producer will not know about it, and the message may be lost. However, because the producer never waits for a result from the server, throughput is extremely high.
    • acks = 1: the producer receives a success response from the broker as soon as the leader has received the message. If the message cannot be written to the leader, the producer receives an error and can resend the message, avoiding data loss. The message can still be lost if the leader crashes before the replicas have copied it. Throughput in this mode depends on whether we send synchronously or asynchronously. If the application waits for the server's response to each message, latency increases significantly. If it uses callbacks instead, latency is reduced, but throughput is still limited by the number of in-flight messages (i.e. how many messages the producer may send before receiving replies from the server).
    • acks = all: the producer receives a success response from the broker only after all in-sync replicas have received the message. This is the safest mode: when more than one replica is in sync, the message exists on more than one broker and survives a crash of the leader. However, latency is higher, because the producer waits for more than one broker to receive the message.
  • buffer-memory
    • The total number of bytes of memory the producer can use to buffer messages that are waiting to be sent to the server. If the application produces messages faster than the producer can deliver them to the server, the buffer fills up; further send calls then block for up to max.block.ms and throw an exception after that.
    • This setting should correspond roughly to the total memory used by the producer, but it is not a hard limit, because not all of the producer's memory goes to buffering. Some additional memory is used for compression (if enabled) and for handling in-flight requests.
  • retries
    • When the producer receives an error from the server, the error may be temporary (for example, the partition has no leader at the moment). The retries parameter controls how many times the producer will try to resend the message before giving up and returning an error to the client.
    • By default the producer waits 100 ms after a failed attempt before resending; this interval can be changed with retry.backoff.ms.
  • batch-size
    • When multiple records are sent to the same partition, the producer batches them together. This parameter controls the amount of memory, in bytes, used for each batch. When the batch is full, all messages in it are sent.
    • However, the producer does not wait for a batch to fill up before sending: it will also send a half-full batch, or even a batch containing a single message. A large batch.size is therefore not in itself a cause of delay.
  • linger-ms
    • Controls how long the producer waits for additional messages to join the current batch. The producer sends a batch when linger-ms has elapsed or the batch is full, whichever comes first. By default, the producer sends messages as soon as a sender thread is available, even if the batch contains only one message. Setting linger-ms above zero makes the producer wait a few milliseconds for more messages to be added to the batch before sending it to the broker; this increases latency but also increases Kafka throughput.
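
To make the settings above concrete, here is a minimal sketch in Python. The dictionary uses the Java client's property names (acks, buffer.memory, max.block.ms, retries, retry.backoff.ms, batch.size, linger.ms); all values are illustrative assumptions, not tuning advice. The helper names to_properties and simulate_batches are hypothetical, and simulate_batches is a deliberately simplified model of the batch-size / linger-ms rule described above, not the client's real sender logic.

```python
# Settings discussed above, keyed by the Java client's property names.
# All values are illustrative assumptions, not tuning advice.
producer_config = {
    "acks": "all",              # wait for every in-sync replica
    "buffer.memory": 33554432,  # bytes available for unsent records
    "max.block.ms": 60000,      # how long send() may block on a full buffer
    "retries": 3,               # resend attempts before reporting an error
    "retry.backoff.ms": 100,    # pause between attempts
    "batch.size": 16384,        # bytes per partition batch
    "linger.ms": 5,             # wait for more records before sending
}


def to_properties(config):
    """Render the settings as the lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


def simulate_batches(records, batch_size, linger_ms):
    """Toy model of batching for a single partition (hypothetical helper).

    records: list of (arrival_ms, size_bytes), sorted by arrival time.
    Returns a list of (record_count, reason), where reason is "full" when
    the next record would not fit, or "linger" when the wait time ran out.
    """
    sent = []
    count = 0         # records in the open batch
    used = 0          # bytes in the open batch
    opened_at = None  # arrival time of the first record in the open batch
    for arrival_ms, size in records:
        # The linger time expired before this record arrived.
        if count and arrival_ms - opened_at >= linger_ms:
            sent.append((count, "linger"))
            count, used = 0, 0
        # The record does not fit: send the open batch first.
        if count and used + size > batch_size:
            sent.append((count, "full"))
            count, used = 0, 0
        if count == 0:
            opened_at = arrival_ms
        count += 1
        used += size
    if count:
        # Whatever is left goes out once its linger time is over.
        sent.append((count, "linger"))
    return sent


records = [(0, 100), (1, 100), (2, 100), (50, 100)]
print(simulate_batches(records, batch_size=250, linger_ms=10))
# [(2, 'full'), (1, 'linger'), (1, 'linger')]
```

In this model, setting linger_ms to 0 sends every record in its own batch, while a larger value lets records that arrive close together share a batch, which mirrors the latency versus throughput trade-off described for linger-ms.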


Source : Viblo