Kafka vs. RabbitMQ - who wins and why? | Systems Design Interview 0 to 1 with Ex-Google SWE

  • Published: Nov 19, 2024
  • Science

Comments • 101

  • @VijayJain
    @VijayJain 10 months ago +36

    The key thing to mention is that if sensor metrics are spread across multiple partitions, they can still be read out of order; ordering is only guaranteed within a partition.

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      Yep!!

    • @pankajjahagirdar1278
      @pankajjahagirdar1278 6 months ago +6

      Partitioning can be on deviceId, so readings from the same device end up in the same partition.

    • @RaviRanjan_ssj4
      @RaviRanjan_ssj4 4 months ago

      yah, they need to be properly aggregated, perhaps based on timestamps
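
To illustrate the point made in this thread, here is a minimal sketch of key-based partitioning. It is a toy in-process model, not Kafka client code: `NUM_PARTITIONS`, `partition_for`, and `publish` are hypothetical names, and CRC32 stands in for whatever hash a real partitioner uses (Kafka's default Java partitioner uses murmur2 on the key bytes).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 is a stand-in here)."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def publish(readings):
    """Append (device_id, seq) readings to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for device_id, seq in readings:
        partitions[partition_for(device_id)].append((device_id, seq))
    return partitions


readings = [
    ("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 2),
    ("sensor-b", 2), ("sensor-a", 3),
]
logs = publish(readings)

for partition, log in sorted(logs.items()):
    print(partition, log)
```

Because every reading for a given device hashes to the same partition, its readings keep their relative order inside that partition's log. Nothing is promised about ordering across partitions, which is why cross-device aggregation would still need something like timestamps.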

  • @M3t4lstorm
    @M3t4lstorm 4 months ago +12

    RabbitMQ has durable queues, where the messages are written to disk; they can also be replicated to other nodes. RabbitMQ also has "Streams", which are queue-like in that you can pop and read messages in order, but the messages are not deleted once read, meaning another consumer can start from the beginning (or any point in the stream) and read the messages again.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago +2

      Thanks for sharing! For any of these open source technologies, I imagine they all sort of converge over time

  • @rhodyborn
    @rhodyborn 10 months ago +11

    Throwing together a design doc and this video was quite helpful, thanks. Also, P!

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      Great! I should make the caveat though that you should definitely check the official docs of both technologies at this point, because it is very possible that they have evolved significantly!

  • @comingfall6348
    @comingfall6348 5 months ago +3

    You helped me so much with this video, your explanations are clear and made me really understand the core of it

  • @Mactwinz105a
    @Mactwinz105a 3 months ago +1

    I'm interviewing several SWEs for a position at my company and this video helps with preparing questions. Thanks!

  • @2005kpboy
    @2005kpboy 1 year ago +7

    Low-latency backbones (LLBs) used by trading firms beat these two by wide margins.
    I guess these are designed for high throughput whereas LLBs are designed for low latency.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +10

      I'll be honest, I've never heard of these and I work in trading. I'll give it a look.

    • @yashkhd1100
      @yashkhd1100 1 year ago +8

      I guess you are talking about ultra-low-latency full-stack systems, which use a combination of HW and SW to get really low latencies. Solace is one such vendor. These systems have very limited use cases, mostly in trading. Kafka and RabbitMQ are all about scale and provide good-enough latencies for a wide range of use cases.

    • @ROFEL
      @ROFEL 1 year ago

      @@jordanhasnolife5163 you a quant?

  • @misamee75
    @misamee75 5 months ago +1

    Interesting video.
    As for the advantages of the log based systems, the metrics example seems a bit of a stretch since one could use a memory based system and include a precise timestamp as part of the message's payload.
    The second example is more appropriate.

  • @dariashevchenko5609
    @dariashevchenko5609 4 months ago +3

    just started the video and already heard 2 jokes! well done, this is IT I am looking for!)

  • @indraneelghosh6607
    @indraneelghosh6607 11 months ago +3

    This was a really helpful overview of the messaging systems. Can you do a video that goes more into the implementation details of the message brokers you discussed? For instance, answer to questions like how and under what conditions does Kafka guarantee exactly once message delivery and what sort of guarantees these message brokers give in case of failures etc.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +3

      The brokers themselves don't really give any guarantees I'd say, but rather the broker in combination with a stream processing framework is where you start seeing those. Hopefully the stream processing video clears that one up a bit.

  • @quirkyquester
    @quirkyquester 7 months ago +3

    Amazing topics and video. I've been thinking about learning more about RabbitMQ and Kafka for a while and wanted to know the difference. This answers the question I had in mind.

    • @FilterChain
      @FilterChain 4 months ago

      They are not interchangeable; they have different use cases.

  • @Aditigoyal1997
    @Aditigoyal1997 9 months ago +3

    Insightful and crisp video. Thanks!

  • @tienat299
    @tienat299 4 months ago +1

    I really enjoyed this video; it cleared up my confusion about the differences between them. Can you make another video about how RabbitMQ and Kafka solve their cons, like those in your conclusion? I'd really appreciate it.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago +1

      I think these are cons inherent in their design. Solving them tends to be avoiding them by choosing another solution :)

  • @david6851
    @david6851 1 year ago +5

    Wait, I think RabbitMQ has some failover strategies? I think they also have quorum queues and classic queues, where quorum queues actually have a persistent log on disk? Not 100% sure about what I am saying though, maybe a bit too specific to RMQ. Someone correct me if I am wrong.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +4

      You may be right - when I talk about these technologies I try to just focus on "in memory" versus "log based" because specific implementations can change all the time. It's very possible that depending on how you configure RMQ you can do any of those things.

    • @varshard0
      @varshard0 9 months ago +4

      You are correct. RabbitMQ persists messages if both the queue and the message are marked as durable.

    • @goldmund67
      @goldmund67 8 months ago +3

      Wanted to clarify the same thing. RMQ supports durable queues but just to be clear that means saving messages in the queue not yet ACK'd when the system goes down or memory is unavailable. Doesn't mean full replay.

  • @beecal4279
    @beecal4279 2 months ago +1

    Why would having a separate message queue for each consumer reduce throughput for an in-memory MQ (4:00) but increase throughput for a log-based MQ (6:22)?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      Well assuming we can handle all of our load with one message broker, we can use a round robin in memory topic. For kafka, there's no round robining, you just consume each message, and need more partitions, one per consumer. You can do the same for ActiveMQ, but that's going to be less of a perfect job mapping than just round robining based on which consumer is available.

  • @2sourcerer
    @2sourcerer 3 months ago +1

    6:20 For log-based message broker as set up in the video, both consumers read from m1 to m4? So it's not a job dispatch where consumers share the workload?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      That's correct, that's not what log based brokers are for. If you wanted that, you could partition the log based broker, or use something like a topic on a JMS broker.
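
The two delivery models discussed in this thread can be contrasted with a small in-process sketch. This is not broker code: `round_robin_dispatch` and `log_fan_out` are hypothetical helper names, and strict round robin is a simplification (a real in-memory broker would typically hand work to whichever consumer is free).

```python
from itertools import cycle

messages = ["m1", "m2", "m3", "m4"]
consumers = ["A", "B"]


def round_robin_dispatch(messages, consumers):
    """Queue style: each message is handed to exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(msg)
    return assignment


def log_fan_out(messages, consumers):
    """Log style: every consumer reads the whole log, in order."""
    return {c: list(messages) for c in consumers}


print(round_robin_dispatch(messages, consumers))
print(log_fan_out(messages, consumers))
```

In the first model the consumers share the workload; in the second, each consumer sees m1 through m4, and sharing work would require splitting the log into partitions.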

  • @Dozer456123
    @Dozer456123 2 months ago +1

    It's common practice with memory based brokers to have deadletter queues. If a message fails to process N times, it fails over to a 2nd queue, that is a "failed messages" queue. These can be re-driven at any point. Kind of mitigates some of the downside you were talking about with them.

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      This is true, but ordering still isn't guaranteed. And the DLQ is still in memory, so I don't believe it's durable

    • @Dozer456123
      @Dozer456123 2 months ago +1

      @@jordanhasnolife5163 totally valid!
      Ordering is the biggest part IMO. If you’re talking about like 60 day old events expiring from memory queues, then yea I guess, but 🤷🏻‍♂️. From my experience, it’s just assumed that SQS for example is durable, even though it’s mem-based.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago

      @@Dozer456123 That's a good point! From my own experience using them, you also need to be careful of slow consumers, as you can also run out of memory pretty easily on a broker.

    • @timothyh1965
      @timothyh1965 1 month ago

      @@Dozer456123 SQS is a cloud service though, that makes it different
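
The dead-letter pattern described at the top of this thread can be sketched in a few lines. This is a toy in-process model under stated assumptions: `MAX_ATTEMPTS`, `drain`, and `handler` are hypothetical names, and a failed message is simply requeued at the back of a `deque`.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit ("N" in the comment above)


def drain(queue, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages; requeue failures and park a message after max_attempts."""
    dead_letters = []
    attempts = {}
    while queue:
        msg = queue.popleft()
        try:
            handler(msg)
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_attempts:
                dead_letters.append(msg)  # give up: move to the "failed messages" queue
            else:
                queue.append(msg)  # retry later; the original ordering is lost here
    return dead_letters


processed = []


def handler(msg):
    """Hypothetical consumer logic that always fails on one poison message."""
    if msg == "bad":
        raise ValueError("cannot process")
    processed.append(msg)


dlq = drain(deque(["ok-1", "bad", "ok-2"]), handler)
print(processed, dlq)
```

The requeue step is where the ordering caveat raised in the reply shows up: a retried message is processed after messages that were originally behind it.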

  • @pejpm
    @pejpm 5 months ago +1

    With log based, what happens if a consumer ungracefully drops out? Is there a timeout where it’ll assume that consumer is no longer subscribed?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago +1

      A bit confused what you mean here. The consumer is basically just polling the log based broker, so if it drops, we have its last offset, and it can come back whenever.
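
A minimal sketch of the offset idea in the reply above, as a toy in-process model rather than a real client: the `Log` class and its `poll`/`commit` methods are hypothetical. (As a side note, Kafka consumer groups do additionally use heartbeats and a session timeout, configured via `session.timeout.ms`, to decide when to reassign a dead consumer's partitions; that part is not modeled here.)

```python
class Log:
    """Append-only log that keeps every message plus one committed offset per consumer."""

    def __init__(self):
        self.entries = []
        self.offsets = {}

    def append(self, msg):
        self.entries.append(msg)

    def poll(self, consumer, max_records=1):
        """Return up to max_records messages starting at the consumer's committed offset."""
        start = self.offsets.get(consumer, 0)
        return self.entries[start:start + max_records]

    def commit(self, consumer, count):
        """Advance the consumer's committed offset by count messages."""
        self.offsets[consumer] = self.offsets.get(consumer, 0) + count


log = Log()
for m in ["m1", "m2", "m3", "m4"]:
    log.append(m)

first = log.poll("B", max_records=2)
log.commit("B", len(first))
# Consumer B drops out here. Nothing is deleted from the log in the meantime.
resumed = log.poll("B", max_records=10)
fresh = log.poll("C", max_records=10)  # a brand-new consumer starts from offset 0
print(first, resumed, fresh)
```

Since reads do not remove anything, a consumer that comes back simply continues from its last committed offset, and a new consumer can replay the log from the start.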

  • @vadimc4812
    @vadimc4812 10 months ago +4

    RabbitMQ has different storage options that are log-based. Calling it "in memory" doesn't sound right. Not sure about the others in the list, but it could be the same case with them.

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +4

      Yeah, I always make these videos with the caveat that they could very well be incorrect.
      When a technology is open source, it can adapt significantly over time, and additionally people like to throw in a lot of features so that they have more to advertise for the product. A better title for the video would be "in memory vs. log based message brokers", but I just like to clickbait because I'm a narcissist.

    • @medievalogic
      @medievalogic 8 months ago +1

      are you talking about Durable queues?

    • @reallylordofnothing
      @reallylordofnothing 5 months ago +1

      @@jordanhasnolife5163 instead of apologizing, you are doubling down on your mistake. way to go! This means I need to take everything you say with a pinch of salt. So much for building a reputation.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago +1

      @@reallylordofnothing it's technology, I don't know what to tell you, things change. In a few years from now, this could all be different.
      Anyways, as you mentioned, I'm a stranger on the internet, and I do make mistakes, and no one should fully trust what another person is saying anyways.
      Best of luck in your studies

    • @timothyh1965
      @timothyh1965 1 month ago

      @@reallylordofnothing RabbitMQ is just an example, moron. And it traditionally was used and works the way Jordan explained, which is what you will be asked in a message interview. Deep dives into features of specific technologies are irrelevant to this discussion.

  • @andrebrandao690
    @andrebrandao690 9 days ago +1

    This was an amazing explanation thx

  • @tungthanh8192
    @tungthanh8192 4 months ago +1

    I think SQS must be a log-based message broker, it allows replay, and also has message retention, 1 message by 1 handler. Could u explain why it is considered an in-memory broker?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Can't speak to the SQS internals exactly as it is closed source, but I was under the impression SQS was memory and Kinesis was log, could be wrong tho

  • @poketopa1234
    @poketopa1234 4 months ago +1

    Best video so far imho

  • @databasemadness
    @databasemadness 4 months ago +1

    Great video and explanation! Subbed 🙌

  • @thequang9234
    @thequang9234 5 months ago +1

    awesome work man, keep it up!

  • @firezdog
    @firezdog 7 months ago +1

    I don’t see anything inherent in the log based queue that would limit throughput - except the slower reads from the disk. I feel like the semantics of the queue should be independent of the way it is stored - if that makes sense.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      Makes sense to me, but at least for jms vs Kafka they're inherently different paradigms.

    • @firezdog
      @firezdog 7 months ago +1

      I was thinking about this more and I suppose the limits on memory could be one reason why the semantics are different. When you’re processing from memory you need to clear items out or the queue gets too big and you’ll get thrashing / crashing. I’ve seen this happen with RabbitMQ at work.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      @@firezdog Absolutely - if you buffer too much you'll bring down the broker
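
The memory-pressure point in this thread can be illustrated with a bounded buffer. This is only a sketch using Python's standard `queue.Queue`; the size limit of 3 and the `try_publish` helper are hypothetical, and real brokers have their own flow-control mechanisms.

```python
import queue

buffer = queue.Queue(maxsize=3)  # hypothetical memory budget: 3 messages


def try_publish(msg):
    """Return True if the message was buffered, False if the buffer is full."""
    try:
        buffer.put_nowait(msg)
        return True
    except queue.Full:
        return False


# A producer outpaces a consumer that has not read anything yet.
accepted = [try_publish(f"m{i}") for i in range(5)]

# Once the consumer removes one message, there is room for another.
buffer.get_nowait()
after_consume = try_publish("m5")
print(accepted, after_consume)
```

An in-memory queue only stays healthy if consumers keep clearing it, which is one reason its semantics tend to be "deliver, then delete" rather than "retain and replay".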

  • @prashantshubham
    @prashantshubham 1 year ago +5

    Audio and video sync issue? 🤔

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago

      It's possible, I fully edited this thing with my sound off

    • @smoran02
      @smoran02 7 months ago

      @@jordanhasnolife5163 I have this problem on all your videos FWIW. Love the content though.

  • @krish000back
    @krish000back 3 months ago +1

    Which of the two types is Amazon SNS? I believe Log based, as it delivers each message in order and to each of the consumer, not Round robin based?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      I'm not sure, was originally under the impression kinesis was log based and SNS was jms

    • @ManiBalajiC
      @ManiBalajiC 2 months ago +1

      SNS is pub/sub, SQS is producer/consumer... so it's SQS

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      @@ManiBalajiC Ah missed this said SNS, yeah no clue

  • @devops_junkie9203
    @devops_junkie9203 5 months ago +1

    Ah, this is amazing. I have some junior developers that I am training on MS architecture; we are now at the integration point and wanted to see which is better for our case. It seems we might be using both options. Thanks

  • @Bruh468
    @Bruh468 11 months ago +3

    Great videos + How someone so young can have so much knowledge .. but bro why do you have named it -> Systems Design Interview 0 to 1 ? Systems Design 0 to 1 could have been an apt name .. don't you think? its more useful than just for interviews..

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +1

      I think it's harder to speak authoritatively about actual systems because there's a lot more "it depends" and existing company considerations going into making those choices :). Plus, these things change so fast that for the purposes of accuracy about the technologies my videos are good for interviews, but could be incorrect IRL

  • @rodfrazier1125
    @rodfrazier1125 9 months ago +10

    For a visual learner like me, I understand but the way you scribbled all over the board. It can throw people off. Just a bit of advice maybe you can take some goodness from. Great video though man!

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +1

      Appreciate it, thanks Rod!

    • @narendrayadav71
      @narendrayadav71 4 months ago

      In my last google design interview, interviewer asked me to not draw any diagram. Just speak.
      Now I am practicing explaining without diagrams.
      But yes for most people diagrams helps to understand.

    • @lugebeatzz8747
      @lugebeatzz8747 2 months ago

      @@narendrayadav71 Damn, companies are getting more vigorous in the recruiting department.
      It's hard to explain these concepts without diagrams. You can understand it using words only but at some point the listener or even the one explaining can skip things or forget.

  • @JLJConglomeration
    @JLJConglomeration 4 months ago +1

    I'm confused by having multiple consumers on a log-based message broker, if every one has to go through every message then isn't that a lot of redundant work?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Sometimes you want to do redundant work if you want two different systems building up state based on the messages. Other times, you might only need one consumer as you've mentioned

  • @kinshaabid3063
    @kinshaabid3063 6 months ago +1

    Very clear and easy explanation.❤

  • @sarudon5615
    @sarudon5615 8 months ago +1

    6:16 if its a message queue and last message processed by consumer B was M2 then according to Queue next message to process should be M1 but you are saying next message to be processed is M3. This is confusing explain / correct me if I'm wrong

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago +1

      probably just a poor explanation on my part, messages are processed FIFO

  • @OurPastSecrets
    @OurPastSecrets 6 months ago +1

    You don’t actually mention if Kafka is the log or message type vs rabbit

  • @fallencheeto4762
    @fallencheeto4762 1 year ago +7

    Nice mustache

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +4

      Thanks Mr. Cheeto, unfortunately I can't actually grow facial hair so now I'm left with this abomination

  • @abdelrahmanemam6478
    @abdelrahmanemam6478 6 months ago +2

    Great explanation!

  • @AlexBowman-1999
    @AlexBowman-1999 11 months ago +1

    Im building out a game with a microservice architecture and an eventbus system. I want to know some thoughts on the idea of using both kafka AND rabbitmq. I would use kafka for the user auth service where all user data will be saved to disk to maintain reliable syncing with other services at all costs. My thought is that the auth service is by far the most critical service and needs a robust system in place. I am ok with the processing time taking a bit longer as this is just for the signing up/registering a user process(this only happens one time per user). Then for all other game related events, it will be handled with rabbitmq to utilize the in memory speeds. I understand that setting up/maintaining these two systems will be more difficult, but overall do you think this idea has some merit to it?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +4

      Well, you may not love my answer, but considering this is a personal project, I think that you're building for scale that you don't need :)
      I'd probably just use some existing OAuth service and for the game related events you could always just have the server connect to all clients via websockets.

  • @LUN-bo2fb
    @LUN-bo2fb 5 months ago +1

    can't in memory broker implement multi consumer group?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      I'm not sure and I imagine it depends on which broker implementation you're referring to, but what's the significance of this?

  • @monishchhadwa777
    @monishchhadwa777 9 months ago +1

    Great explanation!

  • @htm332
    @htm332 1 year ago +11

    Gangster stuff

  • @nedotraxxxx
    @nedotraxxxx 4 months ago +1

    Thanks, dude 👍

  • @rembautimes8808
    @rembautimes8808 3 months ago

    Great video thanks for sharing

  • @joshidev.dev88
    @joshidev.dev88 1 year ago +1

    You can have round robin in kafka using null partition key

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago

      I'll have to look into that, afaik every consumer still sees every message but I'm sure there's some way to do it

  • @benagarr
    @benagarr 6 months ago +1

    Great video

  • @cloud_architector
    @cloud_architector 10 months ago +1

    Kafka has much better performance than in memory rabbit mq

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +2

      In practice this may be true, I'd have to look at benchmarks.
      I think that from an interviewing perspective though, knowing the architectural differences between them and what that means (at least in theory) is important.

  • @medievalogic
    @medievalogic 8 months ago +1

    gold

  • @radonspace2098
    @radonspace2098 6 months ago +1

    Thanks Jordan.
    Great video.
    Please get a life.
    😄😄

  • @phsaurav
    @phsaurav 9 months ago +1

    Great video. Clear and on point. Thank you. 🫡

  • @harris1801
    @harris1801 3 months ago +1

    solid vid

  • @VickyGYT
    @VickyGYT 4 months ago +1

    Better option is nats.io

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Yeah, have to look into nats more, we use it in some places at my company and I believe it's an in memory broker, but off the top of my head don't know the advantages/disadvantages to it, though I've heard it can be more aggressive to kick slow consumers

    • @VickyGYT
      @VickyGYT 4 months ago

      @@jordanhasnolife5163 it has both, memory and jetstream(kafka like, store and fwd). Super lightweight and low maintenance. Don’t know why people still use Kafka.