Apache Spark Vs. Apache Flink Vs. Apache Kafka Vs. Apache Storm! Data Streaming Tools Compared!

  • Published: Dec 18, 2024

Comments • 12

  • @streambased 6 months ago +3

    Imagine if every topic in Kafka had infinite retention. Instant data lake!

    • @thedataguygeorge 6 months ago +1

      That's the dream!

    • @myaps1859 1 month ago

    • @ov3rcl0cked 29 days ago

      This concept kind of exists in Pulsar. You can move old data into a tiered solution: basically, old logs move out to S3 or something similar, or even just to cheaper disks. It's not perfect, because it doesn't really have a good backup and recovery mechanism, but the data offloaded to S3 is still live and available in Pulsar; it just has higher latency. We basically use an approach like this for important telemetry data. Pulsar is functionally very similar to Kafka. It even has a drop-in replacement for the Kafka client and a Kafka protocol extension, so you can use it in place of Kafka. Apparently there is work being done to store all Pulsar state in S3 for the exact purpose of creating a data lake directly off Pulsar with a minimal latency increase. I do agree, though, that this concept is super interesting, because we use Pulsar and Kafka as messaging systems but also find a lot of utility in using them for event storage and replay. We have to combine a lot of data from various live sources into materialized views, and being able to play back data has been very useful.
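To make the retention-and-replay idea in this thread concrete, here is a minimal sketch in plain Python. It is a toy in-memory log, not the Kafka or Pulsar API: `ToyLog`, `Event`, `count_by_key`, and the `retention` parameter are hypothetical names invented for illustration. It only shows why a log that keeps everything can rebuild a materialized view from offset 0, while a log with bounded retention can rebuild it only from what is left.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    offset: int
    key: str
    value: float


@dataclass
class ToyLog:
    """In-memory stand-in for a topic (hypothetical, not a real client API)."""
    retention: Optional[int] = None  # max events kept; None means keep everything
    events: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, key: str, value: float) -> None:
        self.events.append(Event(self.next_offset, key, value))
        self.next_offset += 1
        if self.retention is not None and len(self.events) > self.retention:
            # Bounded retention: drop the oldest events beyond the limit.
            del self.events[: len(self.events) - self.retention]

    def replay(self, from_offset: int = 0) -> list:
        # Return every retained event at or after the requested offset.
        return [e for e in self.events if e.offset >= from_offset]


def count_by_key(events: list) -> dict:
    """Build a tiny 'materialized view': number of events seen per key."""
    view = {}
    for e in events:
        view[e.key] = view.get(e.key, 0) + 1
    return view


readings = [("sensor-a", 1.0), ("sensor-b", 2.0), ("sensor-a", 3.0)]
unbounded = ToyLog()             # "infinite retention"
bounded = ToyLog(retention=2)    # only the two newest events survive
for key, value in readings:
    unbounded.append(key, value)
    bounded.append(key, value)

full_view = count_by_key(unbounded.replay())
partial_view = count_by_key(bounded.replay())
```

With the unbounded log, the replayed view counts both `sensor-a` events; with the bounded log, the first event has already been dropped, so the rebuilt view is incomplete. Tiered storage, as described above, is one way to keep the unbounded behavior without keeping all data on fast disks.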

  • @thedailyepochs338 5 months ago

    Love the video. I have a question, though: do you think a broad understanding of Java is needed to work with Apache Kafka in production? Thanks.

    • @thedataguygeorge 4 months ago

      I would say not really. You can use Python instead if you prefer, for sure.

    • @thedailyepochs338 4 months ago

      @thedataguygeorge Thanks for the response. Could you please share any Kafka or Flink tutorials you know? All the Kafka and Flink courses I have seen require knowing Java.

  • @rikirolly 6 months ago +1

    Add Redpanda to the comparison :D

    • @thedataguygeorge 6 months ago

      I'll get it in the next one!

    • @ov3rcl0cked 29 days ago

      Redpanda is functionally the same as Kafka; its primary design goal is to be a drop-in Kafka replacement.