Kafka - Exactly once semantics with Matthias J. Sax

  • Published: 26 Jul 2024
  • Hey Everyone,
    In this episode, Matthias and I did a deep dive into the world of exactly-once semantics in Kafka. We discussed how Kafka has implemented EOS and how failures of different components are handled.
    The talk also covers the patterns Kafka uses to maintain reliability, durability, and correctness in the system.
    Enjoy!
    Chapters:
    00:00 Introduction
    03:48 What is Exactly once semantics in Kafka?
    07:39 Building blocks of Exactly once semantics
    10:20 Why is exactly once a hard problem?
    12:35 Idempotent producer - how is it implemented?
    24:45 What happens when the producer crashes?
    32:45 What is a transaction in Kafka world?
    33:30 Transactional producer - how is it implemented?
    48:12 What happens when a producer fails? Abort?
    51:19 What happens when a broker dies?
    55:28 How is high watermark used?
    58:48 EOS in stream processing
    01:07:25 Topology changes in stream processing
    01:18:00 Use cases of EOS
    I hope you liked the discussion and learned about EOS in Kafka. Please don't forget to hit the like button and subscribe to the channel. More talks coming soon.
    Important links:
    Matthias: / mjsax
    Confluent blog: www.confluent.io/blog/exactly...
    Enabling EOS in Kafka Streams: www.confluent.io/blog/enablin...
    My Discord Server: / discord
    Connect with me on Linkedin: / kaivalya-apte-2217221a
    Join my discord server, where I go LIVE frequently.
    Cheers,
    The GeekNarrator

Comments • 13

  • @shyamagrawal6161
    @shyamagrawal6161 1 year ago +4

    One of the best practical sessions on Kafka. Thanks to both of you for the effort and valuable time.

  • @JardaniJovonovich192
    @JardaniJovonovich192 1 year ago +5

    This is a brilliant video on EOS in Kafka. Thanks to you and Matthias!
    A couple of questions,
    1. I did not completely understand why a watermark is even required if a consumer can only read committed events. Could you please shed some light on what a watermark is and why exactly it is needed?
    2. If a microservice is spawned (and running) on 5 Kubernetes pods, each pod has the same code running inside it. Assume that Kafka producer code with transactional semantics is also running in this code. In this case, how do I give a separate transaction_id to each of the pods while the code is one and the same?
    Do I have to provide `producer.transaction_id = random_generated_id`?

    • @TheGeekNarrator
      @TheGeekNarrator  1 year ago +2

      “Reply from Matthias (for some reason his comments are not visible)”
      Glad you liked the episode!
      1) We did not cover all the details about the “last stable offset” (LSO) watermark. To give a brief answer: when sending a batch of messages to the consumer, the broker needs to attach some metadata that the consumer needs to filter out aborted messages. If the batch contained pending (not yet committed or aborted) messages, the broker could not compute this metadata.
      2) Yes, you need a unique id per pod. Using a random one might work but could have some disadvantages. A better way might be to derive the id from some unique pod metadata (you would need to deploy as a StatefulSet); see the sketch below. Check out www.confluent.io/kafka-summit-sf18/deploying-kafka-streams-applications/ for more details.
      Hope this helps.
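
      A minimal sketch of that idea in Java (the service name, bootstrap address, and id prefix are placeholders, not from the episode): derive `transactional.id` from the pod hostname, which is unique per pod and stable across restarts when the application is deployed as a StatefulSet.

```java
import java.net.InetAddress;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class PodScopedTransactionalProducer {
    public static KafkaProducer<String, String> create() throws Exception {
        // In a StatefulSet the hostname is stable per pod, e.g. "orders-service-3".
        String podName = InetAddress.getLocalHost().getHostName();

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Unique per pod, but the same across restarts of that pod.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-service-tx-" + podName);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions(); // fences any zombie instance using the same transactional.id
        return producer;
    }
}
```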

    • @JardaniJovonovich192
      @JardaniJovonovich192 1 year ago +1

      @@TheGeekNarrator
      Thanks, Kaivalya and Matthias, that makes sense now. One more question related to Kafka configs in general,
      Which of the parameters `max.partition.fetch.bytes`, `max.poll.interval.ms`, `max.poll.records`, and `fetch.max.bytes` needs to be tuned to increase the throughput of a Kafka Streams consumer? Also, a brief description of each parameter would really help; the official documentation is a bit confusing to understand :(

    • @TheGeekNarrator
      @TheGeekNarrator  1 year ago +2

      Matthias is unable to comment because of some restrictions, so I am posting his reply:
      In general, I would recommend asking questions like this on StackOverflow, the Kafka user mailing list, or the Confluent channels (Developer Forum or Community Slack).
      To answer your question:
      If a broker hosts multiple partitions that a consumer reads from, the consumer sends a single fetch request to the broker, requesting data from all those partitions, and the broker sends back a single batch of data (which may contain messages from several of those partitions). The parameter `fetch.max.bytes` controls the maximum overall response size from the broker. The config `max.partition.fetch.bytes` controls how much data the broker can add to the response per partition. I.e., this second parameter can be used if you want to limit the data of a single partition per round-trip. In general, `fetch.max.bytes` is the parameter you might want to tune to increase throughput.
      `max.poll.interval.ms` defines how frequently `poll()` must be called, and has nothing to do with throughput. It basically affects the liveness check for the consumer, and it's also related to rebalancing. Lastly, `max.poll.records` controls how many records a single `poll()` call may return. Again, it has nothing to do with throughput. If a fetch request returns 1000 records in the response and you set `max.poll.records` to 100, you would need to call `poll()` 10 times to consume the full response. I.e., 9 of those `poll()` calls will be served from the consumer's in-memory buffer without the need to fetch data from the broker. `max.poll.records` is related to `max.poll.interval.ms`: if you allow more records to be returned from `poll()`, you need more time to process the data and thus you will call `poll()` less frequently (i.e., you spend more time between two `poll()` calls). So either decreasing `max.poll.records` or increasing `max.poll.interval.ms` can help you avoid dropping out of the consumer group due to timeouts.
      Hope this helps.
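
      To make the parameters concrete, here is a hedged example of a consumer configuration; the group id, bootstrap address, and values are illustrative placeholders, not recommendations.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ThroughputTunedConsumer {
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        // Total size of one fetch response from a broker -- the main throughput knob.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);           // 50 MB
        // Per-partition cap within that response.
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 5 * 1024 * 1024);  // 5 MB
        // How many records a single poll() may hand back (served from the in-memory buffer).
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        // Max allowed gap between two poll() calls before the consumer is considered dead.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);               // 5 minutes

        return new KafkaConsumer<>(props);
    }
}
```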

    • @JardaniJovonovich192
      @JardaniJovonovich192 1 year ago +1

      @@TheGeekNarrator Thank you very much, Matthias and Kaivalya. This definitely helps.

  • @VJBHARADWAJ
    @VJBHARADWAJ 1 year ago +1

    👌👌👌

  • @pietrogalassi1113
    @pietrogalassi1113 1 year ago +2

    Brilliant video!!! Some questions:
    1. If I have a long-running execution with external system calls (let's say from 2 minutes to 50 minutes), what is the best approach to handle them with Kafka Streams and EOS?
    2. Is EOS also applicable to a simple Kafka consumer/producer (they're used internally by Kafka Streams), and if so, how can it be used in combination with long-running executions?
    Thanks!

    • @TheGeekNarrator
      @TheGeekNarrator  1 year ago +2

      Thank you @Pietro Galassi. Regarding your questions, here’s my take:
      1) In my experience, using Kafka Streams for long-running executions isn't ideal (I will let Matthias correct me), mainly because while you are executing a task (read: processing an event) the partition is blocked, waiting for the previous execution to finish. I would go for a queuing system like SQS for such cases.
      2) IIUC, EOS is applicable using the Streams API only; see the config sketch below. Using the low-level consumer/producer APIs you can get idempotent behaviour.
      I will let Matthias know about the question.
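
      For reference, a minimal sketch of turning on EOS in Kafka Streams (application id, bootstrap address, and topic names are placeholders). `exactly_once_v2` is the processing guarantee in Kafka 3.0+; older versions use `exactly_once`.

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EosStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // Enables transactional producers and read_committed consumers under the hood.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // trivial copy topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```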

    • @TheGeekNarrator
      @TheGeekNarrator  1 year ago +1

      Reply from Matthias. Unfortunately he is unable to reply directly because of some issue. 👇
      The main point for EOS is, of course, that side effects are not covered. I.e., if the external call modifies some external state, those modifications might be done more than once (if the TX fails and is retried). Also, there is the TX timeout: you would need to increase it, implying that you block downstream consumers because the LSO does not advance. So you should not combine external calls and EOS.
      For plain consumers/producers: it's a big "it depends". For a "plain" consumer/producer there is no EOS, because it's impossible. Read_committed implies you won't read aborted data, but you still might read committed data multiple times. For the producer, you can use the TX-API, but EOS is more than using TX. I actually submitted a Kafka Summit talk (for London 2023) about EOS and will cover the details there if the talk gets accepted.
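
      To illustrate the producer-side TX-API mentioned above, here is a minimal, hedged sketch (topic names and the transactional id are placeholders). As noted, this alone is not end-to-end EOS; it only makes the two writes atomic from the point of view of read_committed consumers.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TxProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "tx-demo-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("topic-a", "key", "value-1"));
                producer.send(new ProducerRecord<>("topic-b", "key", "value-2"));
                producer.commitTransaction(); // both writes become visible atomically
            } catch (Exception e) {
                // Real code should distinguish fatal errors (e.g. ProducerFencedException)
                // from abortable ones; for this sketch we simply abort and rethrow.
                producer.abortTransaction();  // read_committed consumers never see either write
                throw e;
            }
        }
    }
}
```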

  • @rajatahuja4720
    @rajatahuja4720 1 year ago

    How do we implement an exactly-once Kafka consumer? The consumer can die before acknowledging to the broker, and when it comes back it can read the message again. How can we implement a Kafka consumer which is EOS?

    • @TheGeekNarrator
      @TheGeekNarrator  1 year ago

      Hi Rajat, I guess we discussed that scenario here, didn't we? With a transactional and idempotent producer and a read_committed consumer you achieve EOS (see the sketch below). You can find more details in this blog; let me know if you have specific questions: www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
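
      As a hedged illustration of that combination (not code from the episode; topic, group, and id names are placeholders), here is the classic read-process-write loop where the consumed offsets are committed inside the producer's transaction, so the output records and the input progress succeed or fail together.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReadProcessWrite {
    public static void main(String[] args) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "rpw-group");
        c.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // never read aborted data
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);         // offsets go via the TX
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "rpw-tx-1");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(List.of("input-topic"));
            producer.initTransactions();

            while (true) { // abort/retry handling omitted for brevity
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("output-topic", r.key(), r.value()));
                }
                // Commit the consumed offsets as part of the same transaction.
                producer.sendOffsetsToTransaction(nextOffsets(records), consumer.groupMetadata());
                producer.commitTransaction();
            }
        }
    }

    // Compute, per partition, the offset to resume from (last consumed offset + 1).
    private static Map<TopicPartition, OffsetAndMetadata> nextOffsets(
            ConsumerRecords<String, String> records) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> recs = records.records(tp);
            offsets.put(tp, new OffsetAndMetadata(recs.get(recs.size() - 1).offset() + 1));
        }
        return offsets;
    }
}
```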