Kafka Deep Dive w/ an Ex-Meta Staff Engineer

  • Published: Sep 10, 2024

Comments • 128

  • @user-vq7cu7cs3w
    @user-vq7cu7cs3w 1 month ago +16

    1 Hello Interview video = 100 Exponent videos and Medium articles. Thanks a ton for these.

  • @ediancomachio2783
    @ediancomachio2783 1 month ago +27

    You have amazing teaching skills! The World Cup example was incredibly good and entertaining to watch. I’ve paused my interview journey for now, but I always watch your videos for the pure knowledge they provide. Thank you so much!

  • @scarlettc123
    @scarlettc123 1 month ago +5

    Relied on your videos heavily while preparing for my system design interview and accepted my staff engineer offer today. You're doing the lord's work by not putting this content behind a paywall. I'll recommend your stuff whenever someone asks me for interview prep resources in the future. 🙏🙏

    • @hello_interview
      @hello_interview 1 month ago

      Amazing work! So happy to help and thanks for sharing your story with us.

  • @aghileslounis
    @aghileslounis 1 month ago +2

    It's not just interview prep at this point but a VERY high-quality Kafka course!
    God, you're so talented 😲
    I don't know if it's just me, but it's EXACTLY how I like to learn new things: diagrams, some code, and a high-quality high-level overview. The rest I'll figure out easily.
    People will love your courses if you decide to make some; it's very rare for someone at your level to take the time to explain things that well.

  • @MrSnackysmorez
    @MrSnackysmorez 29 days ago +1

    I cannot thank you guys enough for putting these videos together! The way you lay the points out and provide the information really goes well with my learning style. Please keep them coming as I cannot get enough of these. Your content is the best out there in terms of teaching system design!

  • @XzcutioneR2
    @XzcutioneR2 29 days ago +3

    These are super cool! I would love to see more of such deep dives into topics like Elasticsearch, Flink, and distributed databases like CockroachDB.

  • @karvinus
    @karvinus 1 month ago +1

    I always struggled to choose between queues, streams, and pub-sub, but this video makes it super easy to understand what to use and when to use it.

  • @amitagarwal8006
    @amitagarwal8006 1 month ago +1

    Absolutely loved it! Especially the part explaining how different systems can utilise it. Waiting for more of these!!

  • @Pockykuma
    @Pockykuma 1 month ago +5

    I feel like I am committing a crime to watch this for free. Keep it up, Evan!

  • @abhilashbandi3866
    @abhilashbandi3866 1 month ago +3

    Superb. For someone with only theoretical knowledge of Kafka, this helped me understand the "topic" a little better. Request for a video on ZooKeeper (I think Kafka moved away from ZooKeeper to KRaft).

    • @hello_interview
      @hello_interview 1 month ago +1

      Yah, exactly right re: KRaft. Consensus is something I maybe should have mentioned, but while it's key to the internals, it's not really necessary to know about in an interview.

    • @abhilashbandi3866
      @abhilashbandi3866 1 month ago

      @@hello_interview Thank you for the videos. Do interviews at the staff/principal level focus on consensus? Or at least touch on it?

  • @sushilprusty2766
    @sushilprusty2766 14 days ago

    You have saved my time with a short, beautiful, and to-the-point answer.

  • @SunilKumar-jl6dl
    @SunilKumar-jl6dl 1 month ago +1

    I have used Kafka a lot, but this video really reinforced the nitty-gritty details. Great content!

  • @SantoshKumar2
    @SantoshKumar2 1 month ago +1

    Most awaited topic. Thank you for the detailed and insightful video on Kafka. Your every video is a gold mine. 🙏🏻 ❤

  • @MrDianitaChan
    @MrDianitaChan 24 days ago

    Thanks so much for putting this video together. I love the way you explain everything. Keep up the good work!

  • @prashantsalgaocar
    @prashantsalgaocar 1 month ago +2

    Amazing discussions and pointers as always, Evan... always look forward to your videos.

  • @randymujica136
    @randymujica136 1 month ago +1

    Great explanation as always! I keep watching your videos over and over. As a previous commenter mentioned, a deep dive on ZooKeeper would be great; it's mentioned many times in orchestration/coordination scenarios along with consistent hashing, and I think it would be valuable to understand how it works.

    • @hello_interview
      @hello_interview 1 month ago +1

      Feel free to vote for what you want to see next here! www.hellointerview.com/learn/system-design/answer-keys/vote

    • @randymujica136
      @randymujica136 1 month ago

      @@hello_interview done

  • @adhirajbhattacharya8574
    @adhirajbhattacharya8574 26 days ago

    Your process for teaching is amazing: diagrams and a perfect balance of high-level and low-level info via the deep dives. Anyone interested in knowing more has enough of a base to search for themselves.
    Please make more of the technology deep dives. It would also be great if you could do deep dives on some difficult core concepts.
    Elasticsearch, MongoDB, Cassandra, graph DBs, something detailed on load balancer, rate limiter, and API gateway implementations, geoindexes or spatial indexes.

  • @alby_tho
    @alby_tho 1 month ago +1

    First!
    Shoutout to this channel! It really prepared me for all my system design interviews this cycle

  • @JohnCF
    @JohnCF 1 month ago

    Great video! I'm glad to have found HelloInterview's YouTube channel. A lot more practically useful content and advice for actual system design interviews compared to other channels on YouTube.

  • @goyalpankaj237
    @goyalpankaj237 1 month ago

    This is hands down the best Kafka explanation I have seen so far :)

  • @Amin-wd4du
    @Amin-wd4du 14 days ago

    The most important part, the limitations around the number of consumers and partitions, was not covered.

  • @ruleind
    @ruleind 1 month ago

    The mock interviews were very useful for me!
    Your content is the best! Do keep pushing out the content!

  • @redheart97
    @redheart97 12 days ago

    Amazing content!! Would love to see more deep dives. Maybe into some common AWS tools used in system design interviews.

    • @hello_interview
      @hello_interview 12 days ago

      There's a DynamoDB write-up on our website!

  • @mohammednisar1994
    @mohammednisar1994 6 days ago

    I feel like a pro already! Nice job.

  • @annesmith9070
    @annesmith9070 22 days ago

    This is great - keep them coming - if you produce it I'll consume it!

  • @3rd_iimpact
    @3rd_iimpact 1 month ago

    Listening via AUX while I’m driving. Love it. Curious to see it visually later.

  • @YoungY2-uu9rj
    @YoungY2-uu9rj 1 month ago +1

    Thank you. Love every video published so far.

  • @SaurinShah1
    @SaurinShah1 1 month ago +2

    Thank you for all you guys do!

  • @danielryu6527
    @danielryu6527 1 month ago

    I have a system design interview tomorrow and this is perfect timing!

  • @notrequired28
    @notrequired28 1 month ago

    Thanks Evan, nicely explained with enough depth. Could you consider adding a section on why Kafka is fast even though it is durable (disk vs. in-memory)? Also, a common decision point is choosing between alternatives, for example Kafka vs. Kinesis or Kafka vs. RabbitMQ; could you add when not to use Kafka and to look at alternatives instead?

  • @fadygamilmahrousmasoud5863
    @fadygamilmahrousmasoud5863 1 month ago

    Very, very insightful. Keep up the amazing work on this series.
    Thanks.

  • @NikhilJain2013
    @NikhilJain2013 1 month ago

    Nicely explained, specifically the difference between topic and partition. Glad you are making videos on system design. I have a doubt.
    As per your explanation, we create a queue per consumer group for a topic, which we call a partition; to scale further, we split a single partition across different brokers, and the same consumer group will be getting data from different brokers for the partitions we created. Please let me know if my understanding is correct?
    topic = events
    consumer group = A, B
    partitions (queues) = events_A and events_B
    to scale more, we distribute events_A across 2 brokers, broker 1 and broker 2; some events will go to broker 1 and some will go to broker 2
    now consumer group A will be getting data from the events_A queue (partition on broker 1) and the events_A queue (partition on broker 2)

  • @AvneetSingh_011
    @AvneetSingh_011 1 month ago

    Your videos are so informative and helpful; would love to see more videos from your side.

  • @harrylee27
    @harrylee27 1 month ago

    With SQS, you probably don't need a retry topic; the attempts are tracked on the main queue, and you can configure that when the retry attempts exceed some threshold, the message is put into the DLQ. Also, the consumer just tells SQS whether the message got processed; if it failed or timed out, SQS will make the message visible again and increase the attempt count, and SQS will put the message into the DLQ if needed, not the consumer.
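
    A minimal sketch of the SQS behavior described above, using the AWS SDK v3 for JavaScript; the queue URL, DLQ ARN, and thresholds are hypothetical:

    import {
      SQSClient,
      SetQueueAttributesCommand,
      ReceiveMessageCommand,
      DeleteMessageCommand,
    } from "@aws-sdk/client-sqs";

    const sqs = new SQSClient({ region: "us-east-1" });
    const queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/main-queue"; // hypothetical
    const dlqArn = "arn:aws:sqs:us-east-1:123456789012:main-queue-dlq";             // hypothetical

    // SQS itself (not the consumer) moves a message to the DLQ once it has been received
    // maxReceiveCount times without being deleted.
    await sqs.send(new SetQueueAttributesCommand({
      QueueUrl: queueUrl,
      Attributes: {
        RedrivePolicy: JSON.stringify({ deadLetterTargetArn: dlqArn, maxReceiveCount: "3" }),
        VisibilityTimeout: "30", // an unacked message becomes visible again after 30s
      },
    }));

    // The consumer "acks" by deleting the message; otherwise it reappears and the receive count grows.
    const { Messages } = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20,
    }));
    for (const msg of Messages ?? []) {
      // ...process msg.Body here...
      await sqs.send(new DeleteMessageCommand({ QueueUrl: queueUrl, ReceiptHandle: msg.ReceiptHandle! }));
    }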

  • @WyattsDeBestDad
    @WyattsDeBestDad 1 month ago

    In terms of Horizontal Scaling, from the accompanying article:
    "Horizontal Scaling With More Brokers: The simplest way to scale Kafka is by adding more brokers to the cluster. This helps distribute the load and offers greater fault tolerance. Each broker can handle a portion of the traffic, increasing the overall capacity of the system. It's really important that when adding brokers you ensure that your topics have sufficient partitions to take advantage of the additional brokers. More partitions allow more parallelism and better load distribution. If you are under partitioned, you won't be able to take advantage of these newly added brokers."
    My understanding is that Kafka can be scaled horizontally and dynamically, perhaps as a system sees an unanticipated increase in volume. If that's correct, does the above imply that partitions can be added dynamically too? In the example cited, the LeBron James campaign, I took that to mean you'd add extra partitions for that campaign in anticipation of the additional traffic. In the case of hot partitions, can one of the prescribed techniques (say random salting or compound keys) be added on the fly? If this is non-trivial, can you maybe link to how this is achieved?
    Thanks so much!

    • @hello_interview
      @hello_interview 1 month ago

      In general, these are things handled by managed versions of Kafka, such as AWS MSK or Confluent Cloud. How they dynamically scale depends on each managed service. Typically, handling hot partitions is still not managed dynamically and requires conscious effort on the part of the developer.
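
      As a rough sketch of the partition side of this, the kafkajs admin client can add partitions to an existing topic; the broker address, topic name, and count below are hypothetical. Note that Kafka only lets you increase the count, and doing so changes which partition a given key hashes to for future messages:

      import { Kafka } from "kafkajs";

      const kafka = new Kafka({ clientId: "ops-script", brokers: ["localhost:9092"] }); // hypothetical brokers
      const admin = kafka.admin();

      await admin.connect();
      // count is the new total number of partitions for the topic, not a delta.
      await admin.createPartitions({
        topicPartitions: [{ topic: "click-events", count: 12 }], // hypothetical topic
      });
      await admin.disconnect();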

  • @benie871
    @benie871 1 month ago

    Thanks so much for these deep dive videos 🙌. Also your system design videos are very helpful in my learning journey.

  • @bangbang86
    @bangbang86 21 days ago

    An Elasticsearch deep dive would be great.

  • @RaviChoudhary_iitkgp
    @RaviChoudhary_iitkgp 1 month ago

    Thanks for the amazing explanation & deep dive into Kafka :)

  • @RezaZulfikarNaipospos-v4u
    @RezaZulfikarNaipospos-v4u 1 day ago

    Please create a use case for hybrid cloud architecture, for example a mobile retail application (in the cloud) connecting to a branch system (the branch can run in offline mode too) :D

  • @fayezabusharkh3987
    @fayezabusharkh3987 1 month ago

    Thank you! Great explanation as always

  • @akshayjhamb1022
    @akshayjhamb1022 1 month ago

    For handling a Kafka consumer going down, we could turn manual offset commits on; there's an auto-commit offset option and also a timer setting for when to auto-commit. Great video for Kafka revision, though. Also, it would have been great if you had mentioned the limit on the number of Kafka consumer applications based on the number of partitions.
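
    A minimal kafkajs sketch of the manual-commit approach mentioned above, with hypothetical broker, topic, and group names; the offset is committed only after processing succeeds, so a consumer that dies mid-message will re-read it on restart:

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "worker", brokers: ["localhost:9092"] }); // hypothetical
    const consumer = kafka.consumer({ groupId: "order-processors" });             // hypothetical group

    await consumer.connect();
    await consumer.subscribe({ topic: "orders", fromBeginning: false });

    await consumer.run({
      autoCommit: false, // default is committing on a timer; disable it to commit manually
      eachMessage: async ({ topic, partition, message }) => {
        await handle(message); // if this throws or the process dies, the offset is never committed
        await consumer.commitOffsets([
          { topic, partition, offset: (Number(message.offset) + 1).toString() }, // commit the *next* offset
        ]);
      },
    });

    async function handle(message: { value: Buffer | null }) {
      console.log(message.value?.toString());
    }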

  • @trueinviso1
    @trueinviso1 1 month ago

    Love these deep dives, thanks!

  • @firezdog
    @firezdog 1 month ago

    I tried to summarize the example, but I'm not convinced I'm getting to the crux of why events might be processed out of order:
    Example. Imagine we have a website covering a live event and we want to display up-to-date news as it occurs. There will be a producer (maybe a reporter on a keyboard) and a consumer. The producer will put updates on a queue and the consumer will process them and put them on the site. What happens if the number of events increases dramatically? For example, what if instead of covering one live event, our website wants to cover 10 live events?
    A single queue might not be enough to handle so many events (memory pressure), so our producer could start publishing to multiple queues. If we have a pattern of events
    A > B > C > D
    it's possible they will be distributed between the queues as
    Q1: A C
    Q2: B D
    In particular, the consumer might process A and B first. Network issues then delay the arrival of C so that it does not get to Q1 until after D arrives on Q2. Assuming the consumer isn't going to wait for Q1 to fill up and continues to work on Q2 (because that's the only source of work), the events are processed as
    A > B > D > C
    We can solve this problem by associating one queue with one event type. We may not publish updates between events in the order in which they occurred, but supposing D represented scoring a goal and C kicking a ball, we'll never publish events in an order that reverses the causality.
    At some point, though, we'll have so many events that a single consumer cannot keep up with them. But if we scale consumers, we need to be able to coordinate their work. We have to make sure, for example, that consumers don't process the same events. To fix this, we might try to distinguish our consumers into groups, each of which is responsible for handling a single queue (and incidentally preserving the ordering of items on that queue). (We need to guarantee events are processed at most and ideally exactly once.)
    The final problem we might run into is that dividing queues between specific live events might not give us a sufficient level of granularity if the events are of different kinds (soccer vs. football). In that case, we need to distinguish groups of queues as well according to *topic*.
    These examples show some of the use cases for Kafka: we want to ensure that a stream of heterogeneous events is processed in order while partitioning those events at different levels of granularity.
    * Note: the example depends on network issues and may seem a little bit contrived. In fact, the mere interleaving of events between queues is enough to produce out of order reporting if the consumer cannot coordinate the way in which it processes events with the producer. I think the main point comes across if we can convince ourselves that without some additional framework, the order in which the consumer consumes will only coincide with the order in which the producer produces if we're lucky.
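
    In Kafka terms, the "one queue per live event" idea above is usually expressed by keying messages: records with the same key always land on the same partition, so per-event order is preserved even when the topic is spread over many partitions. A rough kafkajs sketch with hypothetical broker, topic, and key names:

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "reporter", brokers: ["localhost:9092"] }); // hypothetical
    const producer = kafka.producer();

    await producer.connect();
    // All updates for one match share a key, so they land on one partition and stay in order.
    // Updates for different matches may interleave, but causality within a match is preserved.
    await producer.send({
      topic: "live-updates", // hypothetical topic
      messages: [
        { key: "match:bra-arg", value: "C: ball kicked toward goal" },
        { key: "match:bra-arg", value: "D: goal scored" },
        { key: "match:fra-ger", value: "kickoff" },
      ],
    });
    await producer.disconnect();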

  • @mohamedessammorsy967
    @mohamedessammorsy967 1 month ago

    Really amazing, you explained it really well.
    Thanks for the great effort :)

  • @MASTERISHABH
    @MASTERISHABH 3 days ago

    Hey, I might be wrong, but that batch time and size aren't possible in the kafkajs lib out of the box, since every send resolves based on the provided ack and the rest of the code continues from there, so batched messages wouldn't get an ack and batching won't work in JS this way.
    It does support sendBatch separately, but if we have an API then batching isn't directly possible unless we write a custom function to buffer messages as objects on the JS side and run it periodically to flush them out to Kafka, and even then size-based batching in JS won't be so easy, as per my understanding.
    Let me know if I'm missing something here.
    P.S.: Talking about 39:28
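
    A rough sketch of the custom client-side buffering described above (flush when either a size or a time threshold is hit, then one send for the whole batch); the class, topic, and thresholds are hypothetical and error handling is omitted:

    import { Kafka, Message, Producer } from "kafkajs";

    // Hypothetical buffering wrapper: flush when maxSize messages accumulate or maxMs elapses.
    class BatchingProducer {
      private buffer: Message[] = [];
      private timer?: NodeJS.Timeout;

      constructor(private producer: Producer, private topic: string,
                  private maxSize = 100, private maxMs = 500) {}

      async add(msg: Message) {
        this.buffer.push(msg);
        if (this.buffer.length >= this.maxSize) return this.flush();
        if (!this.timer) this.timer = setTimeout(() => this.flush(), this.maxMs);
      }

      async flush() {
        if (this.timer) { clearTimeout(this.timer); this.timer = undefined; }
        if (this.buffer.length === 0) return;
        const messages = this.buffer;
        this.buffer = [];
        await this.producer.send({ topic: this.topic, messages }); // one request for the whole batch
      }
    }

    const kafka = new Kafka({ clientId: "api", brokers: ["localhost:9092"] }); // hypothetical
    const producer = kafka.producer();
    await producer.connect();
    const batcher = new BatchingProducer(producer, "events"); // hypothetical topic
    await batcher.add({ key: "user-1", value: "clicked" });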

  • @anuragtiwari3032
    @anuragtiwari3032 1 month ago

    Liked even though I haven't watched the video. I know it will be a banger!

  • @RezaZulfikarNaipospos-v4u
    @RezaZulfikarNaipospos-v4u 1 day ago

    How do we monitor Kafka? Which metrics should we focus on for alerting?

  • @user-og7ho4dd9u
    @user-og7ho4dd9u 23 days ago

    I am working on a project where I want to process events asynchronously but in order. I am thinking of using Kafka/Kinesis. How do I ensure that two events are actually ingested into Kafka in order? What if event A's ingestion is delayed by some network issue and event B, which happened later, gets ingested before A?

  • @mouleeswarkothandaraman7095
    @mouleeswarkothandaraman7095 6 days ago

    Hi Evan, great content! One question: how about using a time-series DB like Influx or Prometheus for aggregation by time slices? Would that work?

  • @fatemehrezaei3727
    @fatemehrezaei3727 1 month ago

    Thank you so much. Love your channel. Please provide a deep dive on Redis too. 🙏

  • @SujeetBanerjee-b9g
    @SujeetBanerjee-b9g 12 days ago

    [22:32] What's Flink? Is it an alternative to Redis? Is this a design for a scalable "leaderboard" type of application?

  • @shikharupadhyay7435
    @shikharupadhyay7435 22 days ago

    Great video Evan.......

  • @ziake
    @ziake 1 month ago

    This is so great, thanks a lot. One question about the diagram at 43:24: is it actually possible to have the leader and followers of a partition on the same broker? I thought that with 2 brokers, as in the example, the max replication factor is 2, with the leader and follower on separate brokers.

  • @patrickshepherd1341
    @patrickshepherd1341 1 month ago

    I don't know if you'll see this, but please read if you do. I've been having a lot of trouble lately, and I would really value even just some very brief advice.
    I've been watching your videos a lot to prepare for an upcoming interview. I'm a PhD computer scientist, but I left professorship about a year ago to work in industry. I've been REALLY unsuccessful in landing anything, but there's never any feedback, so I don't know what I'm doing wrong. Just a lot of anonymous rejections. I'm literally facing bankruptcy. I have an interview on Wednesday, and so I'm learning all I can about modern system design. I'm hopeful, but trying not to get too excited. I'm really thankful for your videos and all the information you provide, though. It's really helping.
    Hypothetical question: if you became a single parent with no education at 24, but buckled down and raised your kid and took care of your sick mom and made it through an undergrad and a phd over the course of 10 years, and maybe don't have a huge professional footprint because your grad work + all the course materials you made don't add up to a very impressive github profile, what can you do to stand out more? Keeping a house running and raising a kid and taking care of a sick parent doesn't get many stars on your repos, but I think it speaks a lot to adaptability, perseverance, problem solving acumen, etc. But those aren't things you can typically bring up in a cover letter or resume, and you're not supposed to talk about it in interviews, so it just looks like I haven't done anything serious outside of my grad software. It feels like a catch 22.
    Can you give me any advice? I could wallpaper a house with all the rejects/ignores I've gotten.
    Thank you again for your great videos!

    • @hello_interview
      @hello_interview 1 month ago

      Really sorry to hear what you've been going through; it's not easy. We work with so many candidates lately who are similarly struggling to land a job in this market. I wish I had a silver bullet, but unfortunately not many novel insights I can offer here. Referrals help. I'd try to leverage connections as best you can if the main challenge is getting through the door to the first interview.
      Beyond that, look for companies with take-home assessments as the first round. This widens the number of candidates that get a shot and puts things back in your control; just need to crush the take-home.

  • @chrisgu4121
    @chrisgu4121 22 days ago

    Great video! One question: how does Kafka handle exactly-once delivery? Is it good enough to enable idempotence on the producer to ensure that?
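
    For context, a hedged kafkajs sketch (names are hypothetical): enabling idempotence de-duplicates producer retries to a partition, which covers the write path; end-to-end exactly-once additionally needs transactions on the producer and read_committed consumers:

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "payments", brokers: ["localhost:9092"] }); // hypothetical

    const producer = kafka.producer({
      idempotent: true,            // broker de-duplicates retried produce requests
      maxInFlightRequests: 1,      // keeps ordering intact while retrying
      transactionalId: "payments-producer-1", // required for transactional guarantees
    });

    await producer.connect();
    const txn = await producer.transaction();
    try {
      await txn.send({ topic: "payments", messages: [{ key: "order-42", value: "charged" }] });
      await txn.commit(); // consumers reading with read_committed only see committed messages
    } catch (err) {
      await txn.abort();
    }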

  • @ninlar-codes
    @ninlar-codes 1 month ago

    Excellent job on this. This is so helpful. I'm familiar with Azure Service Bus and Azure Event Hubs, since we use the Azure Stack. With Azure Event Hubs, the consumers maintain their own bookmark or offset into each partition, so they can choose when to checkpoint and/or replay events / records / messages if needed. Does Kafka have something similar? If I commit the offset to Kafka, but I want to replay events due to data loss, can I reset my offset?
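
    Kafka consumers likewise track a committed offset per partition, and it can be rewound to replay. A rough kafkajs sketch of two ways to do it, with hypothetical names (the admin reset requires the whole consumer group to be stopped first):

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "replayer", brokers: ["localhost:9092"] }); // hypothetical

    // Rewind a running consumer to a specific offset on one partition.
    async function rewindLiveConsumer() {
      const consumer = kafka.consumer({ groupId: "audit-readers" }); // hypothetical group
      await consumer.connect();
      await consumer.subscribe({ topic: "audit-log" });              // hypothetical topic
      await consumer.run({ eachMessage: async ({ message }) => console.log(message.offset) });
      consumer.seek({ topic: "audit-log", partition: 0, offset: "1500" }); // hypothetical offset
    }

    // Or reset the group's committed offsets to the beginning while the group is stopped.
    async function resetGroupToEarliest() {
      const admin = kafka.admin();
      await admin.connect();
      await admin.resetOffsets({ groupId: "audit-readers", topic: "audit-log", earliest: true });
      await admin.disconnect();
    }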

  • @hbhavsi
    @hbhavsi 1 month ago

    Amazing video, thanks so much for sharing! The person in the Redis video mentioned 5 key technologies that are either the most common or ones everyone should know. Do you guys plan to cover the other 3 after Redis and Kafka? That would be AMAZING!! :) Which ones were you referring to?

    • @hello_interview
      @hello_interview 1 month ago +1

      Planning content on ElasticSearch, Postgres, and Dynamo next. Some internal debate about #5 but you'll see those sometime in the coming weeks.

    • @hbhavsi
      @hbhavsi 1 month ago

      @@hello_interview amazing, thank you so much!!

  • @maazshaikh7905
    @maazshaikh7905 27 days ago

    Can you kindly recommend any research papers about Kafka that students can use academically to learn about the history/development of Kafka, some live case studies, and further improvements in the field?

  • @Nick-lw7rj
    @Nick-lw7rj 1 month ago

    First off, thank you for these videos and resources, they are very valuable to anyone studying for interviews.
    I'm curious though, how would you improve the interview process as someone who's been on both sides of it for a number of years?
    I question the value of these interviews given that people are being asked to design massive systems, for billions of users, engineered by hundreds/thousands of people over a number of years, which were iteratively improved over time. They're expected to have a pretty ideal solution by having researched the problem or similar ones ahead of time, or much less often, having faced similar problems themselves. If someone was asked to actually design a system in a real production environment, they would spend ample time researching it ahead of time anyway, so I don't necessarily understand the value of them knowing it up front in an interview.
    I'm also curious how you would react if you were still interviewing people, and a candidate proposed a solution that's an identical or near-identical copy of yours. Would you pass them as long as they understood why each component is needed, and why certain technologies should be used over others? Would you have time to properly gauge that in a 45 minute interview once they've finished their design?

    • @hello_interview
      @hello_interview 1 month ago +1

      That's a big topic! One that likely requires a full blog post.
      I will say that, in general, we agree. The interview process within big tech is stuck in a local minimum and is in need of a facelift. But as long as the supply of engineers exceeds demand, there isn't much incentive for companies. Their hiring process may have poor recall, but if precision stays high, they don't really care.

    • @Nick-lw7rj
      @Nick-lw7rj 1 month ago

      @@hello_interview agreed about a needed facelift, until then, the grind continues :) thanks again for these

  • @MyAeroMove
    @MyAeroMove 1 month ago

    Very well structured!

  • @rjl-s5p
    @rjl-s5p 1 month ago

    In the section about using Kafka for messenger, how would the topics and partitions for a messaging application like Messenger be structured to achieve low latency and high throughput? For example, if there are 1 billion users on the platform, would there be one billion topics, or a single topic with a billion partitions, one for each user (which I don't think is possible since the recommendation is 4k partitions per broker and max of 200K per cluster)? Is there a different approach that could be considered? What are the tradeoffs for each option?
    And great video. Thank you for doing this.

    • @hello_interview
      @hello_interview 1 month ago

      Some alternatives discussed here: www.hellointerview.com/learn/system-design/answer-keys/whatsapp

  • @rupeshjha4717
    @rupeshjha4717 1 month ago

    Good going, please keep this series going!
    I had a question regarding consumer concurrency, which is not discussed in this video.
    Let's say I have 1 consumer group with 2 consumers running and a topic with 8 partitions; each consumer will be assigned 4 partitions when concurrency = 1. How is the consumer affected if consumer concurrency is changed to 2?

    • @yiannig7347
      @yiannig7347 1 month ago

      Are you asking what happens if consumer threads are increased from 1 to 2 for a single consumer instance in a group? If so, the consumer is still a single client of the broker, like kafka-client-01 and kafka-client-02.
      With more threads, the consumer can process messages from its assigned partitions concurrently, improving throughput. However, it still handles the same number of partitions overall.
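
      In kafkajs, the per-consumer concurrency knob being described is partitionsConsumedConcurrently; a minimal sketch with hypothetical names, where the partition assignment itself does not change:

      import { Kafka } from "kafkajs";

      const kafka = new Kafka({ clientId: "worker", brokers: ["localhost:9092"] }); // hypothetical
      const consumer = kafka.consumer({ groupId: "group-a" });                      // hypothetical

      await consumer.connect();
      await consumer.subscribe({ topic: "events" }); // say this consumer is assigned 4 of the 8 partitions

      await consumer.run({
        // Up to 2 of this consumer's assigned partitions are processed in parallel;
        // the assignment (4 partitions each for 2 consumers over 8 partitions) stays the same.
        partitionsConsumedConcurrently: 2,
        eachMessage: async ({ partition, message }) => {
          console.log(partition, message.offset);
        },
      });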

    • @hello_interview
      @hello_interview 1 month ago +1

      Thanks for the assist!

  • @ItsMeIshir
    @ItsMeIshir 1 month ago

    Good video. Thanks for making it.

  • @dibll
    @dibll 1 month ago

    If we use a compound key of adId:userId, won't it result in one partition per ad per user? Is there any concern about having too many partitions, each holding a small number of messages?

    • @hello_interview
      @hello_interview 1 month ago

      It’s consistent hashing on the partition key. So it’s not a new partition per ad:user pair.
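
      A toy illustration of that: the compound key is hashed and taken modulo the existing partition count, so arbitrarily many ad:user pairs collapse onto a fixed set of partitions. Kafka's default partitioner uses murmur2; a simple FNV-1a hash stands in here just to show the idea:

      // Many compound keys map onto a fixed number of partitions; no partition is created per key.
      function fnv1a(key: string): number {
        let hash = 0x811c9dc5;
        for (let i = 0; i < key.length; i++) {
          hash ^= key.charCodeAt(i);
          hash = Math.imul(hash, 0x01000193) >>> 0;
        }
        return hash;
      }

      const numPartitions = 8;
      const partitionFor = (key: string) => fnv1a(key) % numPartitions;

      console.log(partitionFor("ad42:user1")); // different ad:user pairs...
      console.log(partitionFor("ad42:user2")); // ...all land somewhere in the same 8 partitions;
      console.log(partitionFor("ad7:user1"));  // adding users never adds partitions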

  • @shrishtigupta6902
    @shrishtigupta6902 1 month ago

    Thank you for this detailed video. A quick question, though: I'm still confused about when to use RabbitMQ and when to use Kafka, since both of them can be helpful for all these use cases.

  • @sergei5104
    @sergei5104 1 month ago

    I have a question: If I want to consume a message and then perform a long-lasting task (like web crawling) before committing the offset, does it mean that I need to have a configuration where the number of consumers is strictly equal to the number of partitions to avoid duplicate readings of the same message?

    • @hello_interview
      @hello_interview 1 month ago

      Nope, just have them as part of the same consumer group.

  • @ayosef
    @ayosef 1 day ago

    Thank you very much!
    Which tool are you using for the whiteboard? It looks very clean!

  • @aforty1
    @aforty1 1 month ago

    Thank you!

  • @htm332
    @htm332 1 month ago

    Re: the Ticketmaster example. Wouldn't partitioning a single event across multiple partitions break fairness?

  • @lilav.5945
    @lilav.5945 1 month ago

    Excellent!

  • @AkashTyagiii
    @AkashTyagiii 25 days ago

    Which application are you using for drawing this?

  • @konstantinwilleke6292
    @konstantinwilleke6292 1 month ago

    Great resource!
    What’s the name of the drawing/diagram app?

  • @cedarparkfamily
    @cedarparkfamily 1 month ago

    I'm still able to see the advertisements.

    • @hello_interview
      @hello_interview 1 month ago

      Yah, we're struggling to figure that out. YouTube doesn't make it easy.

  • @quicktips3858
    @quicktips3858 1 month ago

    To me, as a Brazilian, it's much easier to understand using football... :D

    • @hello_interview
      @hello_interview 1 month ago

      Hopefully your national team can get it together 😉

    • @quicktips3858
      @quicktips3858 1 month ago

      @@hello_interview Let’s hope so!

  • @aravindravva3833
    @aravindravva3833 1 month ago

    13:54 What is N?