The key thing to mention is that if sensor metrics are spread across multiple partitions, they can still be read out of order, so ordering is only guaranteed within a partition.
Yep!!
Partitioning can be on deviceId, so readings from the same device land in the same partition.
yah, they need to be properly aggregated, perhaps based on timestamps
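A minimal sketch of the deviceId partitioning idea from this thread, not any broker's real API. `NUM_PARTITIONS`, `partition_for`, and `route` are hypothetical names, and CRC32 is just a convenient stable hash chosen for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(device_id: str) -> int:
    # Stable hash, so the same device always maps to the same partition.
    return zlib.crc32(device_id.encode("utf-8")) % NUM_PARTITIONS


def route(readings):
    """Append each (device_id, seq) reading to its partition in arrival order."""
    partitions = defaultdict(list)
    for device_id, seq in readings:
        partitions[partition_for(device_id)].append((device_id, seq))
    return partitions


readings = [("dev-a", 1), ("dev-b", 1), ("dev-a", 2), ("dev-b", 2), ("dev-a", 3)]
partitions = route(readings)
```

Within one partition a device's readings keep their order; readings from devices in different partitions still have no defined order relative to each other, which is where timestamp-based aggregation comes in.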
RabbitMQ has durable queues, where the messages are written to disk; they can also be replicated to other nodes. RabbitMQ also has "Streams", which are queue-like in that you can read messages in order, but the messages are not deleted once read, meaning another consumer can start from the beginning (or any point in the stream) and read the messages again.
Thanks for sharing! For any of these open source technologies, I imagine they all sort of converge over time
Throwing together a design doc and this video was quite helpful, thanks. Also, P!
Great! I should make the caveat though that you should definitely check the official docs of both technologies at this point, because it is very possible that they have evolved significantly!
You helped me so much with this video, your explanations are clear and made me really understand the core of it
I'm interviewing several SWEs for a position at my company, and this video helps with preparing questions. Thanks!
Low-latency backbones used by trading firms beat these two by wide margins.
I guess these are designed for high throughput, whereas LLBs are designed for low latency.
I'll be honest, I've never heard of these and I work in trading. I'll give it a look.
I guess you're talking about ultra-low-latency full-stack systems that use a combination of HW and SW to get really low latencies. Solace is one such vendor. These systems have very limited use cases, mostly in trading. Kafka and RabbitMQ are all about scale and provide latencies that are good enough for a wide range of use cases.
@@jordanhasnolife5163 you a quant?
Interesting video.
As for the advantages of the log based systems, the metrics example seems a bit of a stretch since one could use a memory based system and include a precise timestamp as part of the message's payload.
The second example is more appropriate.
just started the video and already heard 2 jokes! well done, this is IT I am looking for!)
This was a really helpful overview of the messaging systems. Can you do a video that goes more into the implementation details of the message brokers you discussed? For instance, how and under what conditions does Kafka guarantee exactly-once message delivery, and what guarantees do these message brokers give in case of failures?
The brokers themselves don't really give any guarantees I'd say, but rather the broker in combination with a stream processing framework is where you start seeing those. Hopefully the stream processing video clears that one up a bit.
Amazing topic and video. I've been thinking about learning more about RabbitMQ and Kafka for a while and wanted to know the difference. This answers my question.
They are not interchangeable; they have different use cases.
Insightful and crisp video. Thanks.!
I really enjoyed this video, it cleared up the difference between them. Can you make another video about how RabbitMQ and Kafka solve their cons, like those in your conclusion? I'd really appreciate it.
I think these are cons inherent in their design. Solving them tends to be avoiding them by choosing another solution :)
Wait, I think RabbitMQ has some failover strategies? I think they also have quorum queues and classic queues, where quorum queues actually have a persistent log on disk? Not 100% sure about what I am saying though, maybe a bit too specific to RMQ. Someone correct me if I am wrong.
You may be right - when I talk about these technologies I try to just focus on "in memory" versus "log based" because specific implementations can change all the time. It's very possible that depending on how you configure RMQ you can do any of those things.
You are correct. RabbitMQ persists messages if both the queue and the message are marked as durable.
Wanted to clarify the same thing. RMQ supports durable queues but just to be clear that means saving messages in the queue not yet ACK'd when the system goes down or memory is unavailable. Doesn't mean full replay.
Why would having a separate message queue for each consumer reduce throughput for an in-memory MQ (4:00), but increase throughput for a log-based MQ (6:22)?
Well assuming we can handle all of our load with one message broker, we can use a round robin in memory topic. For kafka, there's no round robining, you just consume each message, and need more partitions, one per consumer. You can do the same for ActiveMQ, but that's going to be less of a perfect job mapping than just round robining based on which consumer is available.
6:20 For log-based message broker as set up in the video, both consumers read from m1 to m4? So it's not a job dispatch where consumers share the workload?
That's correct, that's not what log based brokers are for. If you wanted that, you could partition the log based broker, or use something like a topic on a JMS broker.
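A toy illustration of the two delivery styles discussed above, assuming messages m1 to m4 and two consumers as in the video's example. `round_robin_dispatch` and `log_fan_out` are hypothetical helpers, not any broker's API.

```python
from collections import deque
from itertools import cycle

messages = ["m1", "m2", "m3", "m4"]


def round_robin_dispatch(msgs, consumers):
    """In-memory queue style: each message goes to exactly one consumer, then is removed."""
    queue = deque(msgs)
    received = {c: [] for c in consumers}
    turn = cycle(consumers)
    while queue:
        received[next(turn)].append(queue.popleft())
    return received


def log_fan_out(msgs, consumers):
    """Log style: messages stay in the log and each consumer reads all of them via its own offset."""
    log = list(msgs)
    offsets = {c: 0 for c in consumers}
    received = {c: [] for c in consumers}
    for c in consumers:
        while offsets[c] < len(log):
            received[c].append(log[offsets[c]])
            offsets[c] += 1
    return received


shared = round_robin_dispatch(messages, ["A", "B"])
fanned = log_fan_out(messages, ["A", "B"])
```

In the first case the consumers split the work; in the second, both consumers see m1 through m4, which is why sharing load on a log usually means partitioning it.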
It's common practice with memory based brokers to have deadletter queues. If a message fails to process N times, it fails over to a 2nd queue, that is a "failed messages" queue. These can be re-driven at any point. Kind of mitigates some of the downside you were talking about with them.
This is true, but ordering still isn't guaranteed. And the DLQ is still in memory, so I don't believe it's durable
@@jordanhasnolife5163 totally valid!
Ordering is the biggest part IMO. If you’re talking about like 60 day old events expiring from memory queues, then yea I guess, but 🤷🏻‍♂️. From my experience, it’s just assumed that SQS for example is durable, even though it’s mem-based.
@@Dozer456123 That's a good point! From my own experience using them, you also need to be careful of slow consumers, as you can also run out of memory pretty easily on a broker.
@@Dozer456123 SQS is a cloud service though, that makes it different
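A rough sketch of the dead-letter pattern described in this thread, assuming a hypothetical retry limit `MAX_ATTEMPTS` (the "N" above) and a made-up `flaky_handler`. It is a simulation in plain Python, not a real broker client.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit


def process_with_dlq(messages, handler):
    """Retry each failing message up to MAX_ATTEMPTS, then move it to a dead-letter list."""
    queue = deque((m, 0) for m in messages)
    done, dead_letters = [], []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            if attempts + 1 >= MAX_ATTEMPTS:
                dead_letters.append(msg)
            else:
                # Re-enqueue at the back: later messages now get processed first.
                queue.append((msg, attempts + 1))
    return done, dead_letters


def flaky_handler(msg):
    # Hypothetical handler that always fails on one message.
    if msg == "bad":
        raise ValueError("cannot process")


done, dead_letters = process_with_dlq(["ok-1", "bad", "ok-2"], flaky_handler)
```

The failed message can be re-driven later from `dead_letters`, but note that "ok-2" was handled before "bad" was resolved, which is the ordering caveat raised in the reply.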
With log based, what happens if a consumer ungracefully drops out? Is there a timeout where it’ll assume that consumer is no longer subscribed?
A bit confused what you mean here. The consumer is basically just polling the log based broker, so if it drops, we have its last offset, and it can come back whenever.
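A small sketch of the "we have its last offset" point, using a hypothetical `LogBroker` class. This toy model only tracks committed offsets; it ignores things like Kafka's consumer-group session timeouts, which affect partition reassignment rather than whether the messages are still there.

```python
class LogBroker:
    """Toy append-only log that remembers each consumer's committed offset."""

    def __init__(self):
        self.log = []
        self.committed = {}  # consumer name -> next offset to read

    def append(self, msg):
        self.log.append(msg)

    def poll(self, consumer, max_messages=1):
        start = self.committed.get(consumer, 0)
        return self.log[start:start + max_messages]

    def commit(self, consumer, count):
        self.committed[consumer] = self.committed.get(consumer, 0) + count


broker = LogBroker()
for m in ["m1", "m2", "m3", "m4"]:
    broker.append(m)

# The consumer reads two messages, commits them, then "crashes".
first_batch = broker.poll("consumer-a", max_messages=2)
broker.commit("consumer-a", len(first_batch))

# Whenever it comes back, it resumes from the committed offset.
after_restart = broker.poll("consumer-a", max_messages=10)
```

Because nothing is deleted on read, the broker does not need to know whether the consumer is still alive in order to keep its place.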
RabbitMQ has different storage options that are log-based. Calling it "in memory" doesn't sound right. Not sure about the others in the list, but it could be the same case with them.
Yeah, I always make these videos with the caveat that they could very well be incorrect.
When a technology is open source, it can adapt significantly over time, and additionally people like to throw in a lot of features so that they have more to advertise for the product. A better title for the video would be "in memory vs. log based message brokers", but I just like to clickbait because I'm a narcissist.
are you talking about Durable queues?
@@jordanhasnolife5163 instead of apologizing, you are doubling down on your mistake. way to go! This means I need to take everything you say with a pinch of salt. So much for building a reputation.
@@reallylordofnothing it's technology, I don't know what to tell you, things change. In a few years from now, this could all be different.
Anyways, as you mentioned, I'm a stranger on the internet, and I do make mistakes, and no one should fully trust what another person is saying anyways.
Best of luck in your studies
@@reallylordofnothing RabbitMQ is just an example, moron. And it traditionally was used and works the way Jordan explained, which is what you will be asked in a message interview. Deep dives into features of specific technologies are irrelevant to this discussion
This was an amazing explanation thx
I think SQS must be a log-based message broker, it allows replay, and also has message retention, 1 message by 1 handler. Could u explain why it is considered an in-memory broker?
Cant speak to the sqs internals exactly as it is closed source, but I was under the impression sqs was memory and kinesis was log, could be wrong tho
Best video so far imho
Great video and explanation! Subbed 🙌
awesome work man, keep it up!
I don’t see anything inherent in the log based queue that would limit throughput - except the slower reads from the disk. I feel like the semantics of the queue should be independent of the way it is stored - if that makes sense.
Makes sense to me, but at least for jms vs Kafka they're inherently different paradigms.
I was thinking about this more and I suppose the limits on memory could be one reason why the semantics are different. When you’re processing from memory you need to clear items out or the queue gets too big and you’ll get thrashing / crashing. I’ve seen this happen with RabbitMQ at work.
@@firezdog Absolutely - if you buffer too much you'll bring down the broker
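To illustrate the memory-limit point, here is a minimal sketch using Python's standard `queue.Queue` with a hypothetical cap of 3 buffered messages. It only models the buffer filling up; what a real broker does at that point (block producers, drop, page to disk) varies by product and configuration.

```python
import queue

buffer = queue.Queue(maxsize=3)  # hypothetical cap on buffered messages
rejected = []

for i in range(5):
    try:
        buffer.put_nowait(f"m{i}")
    except queue.Full:
        # No consumer is draining the buffer, so new messages cannot be accepted.
        rejected.append(f"m{i}")
```

With a slow or absent consumer, the buffer hits its cap and the producer side has to deal with the overflow, which is the pressure a memory-based broker is under.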
Audio and video sync issue? 🤔
It's possible, I fully edited this thing with my sound off
@@jordanhasnolife5163 I have this problem on all your videos FWIW. Love the content though.
Which of the two types is Amazon SNS? I believe Log based, as it delivers each message in order and to each of the consumer, not Round robin based?
I'm not sure, was originally under the impression kinesis was log based and SNS was jms
SNS is pub/sub, SQS is producer/consumer, so it's SQS.
@@ManiBalajiC Ah missed this said SNS, yeah no clue
Ah, this is amazing. I have some junior developers that I am training on MS Architecture; we are now at the integration point and wanted to see which is better for our case. It seems we might be using both options. Thanks
Great videos + how can someone so young have so much knowledge .. but bro, why did you name it -> Systems Design Interview 0 to 1? Systems Design 0 to 1 could have been an apt name .. don't you think? It's more useful than just for interviews..
I think it's harder to speak authoritatively about actual systems because there's a lot more "it depends" and existing company considerations going into making those choices :). Plus, these things change so fast that for the purposes of accuracy about the technologies my videos are good for interviews, but could be incorrect IRL
As a visual learner, I understood it, but the way you scribbled all over the board can throw people off. Just a bit of advice you can maybe take something from. Great video though man!
Appreciate it, thanks Rod!
In my last Google design interview, the interviewer asked me not to draw any diagrams. Just speak.
Now I am practicing explaining without diagrams.
But yes, for most people diagrams help with understanding.
@@narendrayadav71 Damn, companies are getting more vigorous in the recruiting department.
It's hard to explain these concepts without diagrams. You can understand it using words only but at some point the listener or even the one explaining can skip things or forget.
I'm confused by having multiple consumers on a log-based message broker, if every one has to go through every message then isn't that a lot of redundant work?
Sometimes you want to do redundant work if you want two different systems building up state based on the messages. Other times, you might only need one consumer as you've mentioned
Very clear and easy explanation.❤
6:16 If it's a message queue and the last message processed by consumer B was M2, then according to the queue the next message to process should be M1, but you are saying the next message to be processed is M3. This is confusing. Explain / correct me if I'm wrong.
probably just a poor explanation on my part, messages are processed FIFO
You don’t actually mention if Kafka is the log or message type vs rabbit
Kafka is the log
Nice mustache
Thanks Mr. Cheeto, unfortunately I can't actually grow facial hair so now I'm left with this abomination
Great explanation!
I'm building out a game with a microservice architecture and an event bus system. I want to know some thoughts on the idea of using both Kafka AND RabbitMQ. I would use Kafka for the user auth service, where all user data will be saved to disk to maintain reliable syncing with other services at all costs. My thought is that the auth service is by far the most critical service and needs a robust system in place. I am OK with the processing time taking a bit longer, as this is just for the sign-up/registration process (this only happens one time per user). Then all other game-related events will be handled with RabbitMQ to utilize the in-memory speeds. I understand that setting up/maintaining these two systems will be more difficult, but overall do you think this idea has some merit to it?
Well, you may not love my answer, but considering this is a personal project, I think that you're building for scale that you don't need :)
I'd probably just use some existing OAuth service and for the game related events you could always just have the server connect to all clients via websockets.
Can't an in-memory broker implement multiple consumer groups?
I'm not sure and I imagine it depends on which broker implementation you're referring to, but what's the significance of this?
Great explanation!
Gangster stuff
I'm certainly trapping upon these streets
Thanks, dude 👍
Great video thanks for sharing
You can have round robin in kafka using null partition key
I'll have to look into that, afaik every consumer still sees every message but I'm sure there's some way to do it
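A simplified model of the null-key idea, not real Kafka client code. It assumes keyless messages are spread across partitions in turn (newer Kafka producers actually use a sticky strategy that fills a batch per partition) and that the consumers share one consumer group, where each partition is read by only one group member. `produce_without_key` and `consume_as_group` are hypothetical helpers.

```python
from itertools import cycle

NUM_PARTITIONS = 2  # hypothetical partition count


def produce_without_key(msgs, num_partitions):
    """Spread keyless messages across partitions in turn (a simplification)."""
    partitions = [[] for _ in range(num_partitions)]
    for p, msg in zip(cycle(range(num_partitions)), msgs):
        partitions[p].append(msg)
    return partitions


def consume_as_group(partitions, consumers):
    """Assign each partition to exactly one consumer in the group."""
    seen = {c: [] for c in consumers}
    for i, part in enumerate(partitions):
        seen[consumers[i % len(consumers)]].extend(part)
    return seen


partitions = produce_without_key(["m1", "m2", "m3", "m4"], NUM_PARTITIONS)
group_view = consume_as_group(partitions, ["A", "B"])
```

Under those assumptions the consumers split the messages rather than each seeing all of them; consumers in separate groups would each still read every message.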
Great video
Kafka has much better performance than in memory rabbit mq
In practice this may be true, I'd have to look at benchmarks.
I think that from an interviewing perspective though, knowing the architectural differences between them and what that means (at least in theory) is important.
gold
Thanks Jordan.
Great video.
Please get a life.
😄😄
Never!
Great video. Clear and on point. Thank you. 🫡
solid vid
Better option is nats.io
Yeah, have to look into nats more, we use it in some places at my company and I believe it's an in memory broker, but off the top of my head don't know the advantages/disadvantages to it, though I've heard it can be more aggressive to kick slow consumers
@@jordanhasnolife5163 it has both, memory and jetstream(kafka like, store and fwd). Super lightweight and low maintenance. Don’t know why people still use Kafka.