Hey Jordan, just wanted to say I landed an E5 role at Meta and your videos were a big part of helping me get past the system design interview. They weren't the only resource I used, but they were definitely a huge help, especially early on in my studying, when they helped me fill a lot of holes in my knowledge. Thanks so much for producing these videos!
Woo!! Congrats man and good luck in the new role!
Congrats! Mind sharing your resources and learning path so they can help others as well?
@@aishwaryakala9653 Hello Interview blogs and videos are awesome. They also help with the logistics of how to structure and pace the interview.
I also watched various other random YT videos on topics I felt I didn't understand, including System Design Fight Club, but those videos are so unfocused and long-winded that you really have to watch them at high speed and skip over large parts.
DDIA is a great read. Although I can't directly attribute anything I did in any interview to it (there usually isn't enough time), I do think it helped me reinforce and internalize a lot of ideas I'd only skimmed. Kleppmann is also a great, engaging writer.
I read Alex Xu's books and... idk personally I kind of hated them and found some of his solutions very questionable, but maybe that's just me. The writing isn't that great and some of it is a real slog to get through. It seemed to me like he didn't get anyone to proof-read or edit them. A lot of people like them though.
Even for material that's bad, though, reading it can still help, because it stimulates your mind to think about what's wrong with it and come up with your own better solution.
More important than any specific resource... practice! At least for Meta you have only 35 to 40 minutes after accounting for introductions and questions at the end. Write down checkpoints/milestones for how many minutes you can spend on each step of your system design and practice consistently hitting them. Being able to come up with a great design in 2 hours is pointless because they will cut you off long before then.
I practiced with a random design prompt every day and a timer and made sure I was hitting those checkpoints. Also consider that in the real interview you will probably get interrupted sometimes.
Also if you can afford it, consider some paid mocks.
Hey, this was a helpful comment! I got E5 at Meta too recently, and timing is the name of the game. I've been through the same resources and agree with your opinions on them. I did one additional thing to really get better, which was super random: I stumbled upon a Cassandra deep-dive book and went through the first 150 or so pages. This wasn't specifically to learn Cassandra, though I'd say that was a side effect. The main knowledge is how a production-level, highly distributed system handles topics like fault tolerance and coordination, and the issues arising out of them. The way it's described in the book is just excellent from a writing perspective. This was more "foundational" knowledge I gained around a lot of the concepts, and it has helped me a lot, so I'd highly recommend it.
Great in-depth video. Loved that the unnecessary or commonplace parts of the design aren't dwelled on, only the critical parts: the scale, concurrency, and ordering problems, and achieving atomicity without transactions. Thanks Jordan. I watch your videos for system design and don't need to go anywhere else.
Hi Jordan. Kafka ordering is only guaranteed within a partition. To get Kafka to guarantee total ordering, therefore, you'd need to configure the topic to have only one partition.
Yeah we partition by auctionid here
@@jordanhasnolife5163 Ah…sorry, yes!
@@jordanhasnolife5163 correct me if I am wrong, but we also need only one single-threaded consumer listening to a partition to ensure order, right?
@@akshayyn Typically in Kafka you're not interleaving which entries go to which consumer; each consumer tends to read all entries in its partition.
But yes, technically events could be processed out of order if you did do that.
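To illustrate the partitioning point: here's a minimal sketch (topic name, serializers, and the IDs are assumptions, not from the video) of producing bids keyed by auction ID, so that every bid for a given auction hashes to the same partition and stays ordered relative to the others:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class BidProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: don't treat a bid as durable until the ISR has it
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String auctionId = "auction-42"; // hypothetical auction ID
            String bidJson = "{\"userId\":\"u1\",\"amount\":105.00}";
            // Keying by auctionId makes the default partitioner hash the key,
            // so all bids for this auction land in one partition, in order.
            producer.send(new ProducerRecord<>("bids", auctionId, bidJson));
        }
    }
}
```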
Dude, this was awesome. Well done on the design and the explanation.
You look like Marv from Home Alone after the iron fell on his face. Love the vids tho
Been a rough week for me
Great content as always, thanks! Some questions if I may: 1) On the state machine replication for the bid engine backup, isn't it almost like a two-phase commit? And because the user has to wait for the backup writes/replication confirmation, how is it going to be fast, or at least faster than Kafka? Imagine you put the same in-memory bid engine behind Kafka; isn't that going to be similar or even better? 2) On handling the hot bid part, I see that although we can scale the readers (getting the current bidding info), the bidding engine, most likely a single machine, is still the bottleneck and arguably does most of the heavy lifting: acquiring the lock, deciding the winning logic, sending to the backup/Kafka, etc.
Yes, it basically is a two-phase commit, except the message only has to get to Kafka (I'm using a successful write there as an indication that the write is durable), as opposed to being fully replicated in memory downstream.
2) Yeah, but there's not really a solution to this. If you need to choose an absolute ordering over multiple things, they've gotta go to one place. We could buffer in many different Kafka queues and have the bid engine poll at the rate it's able to, but then this whole process becomes asynchronous, and I had wanted it to be synchronous.
@@jordanhasnolife5163 Not sure I fully get the argument for a synchronous process though. As long as the throughput is high (and arguably an async architecture might be better here), user experience would not be affected?
@@ekinrf Perhaps not. Being "synchronous" is really an illusion over an asynchronous network anyways.
Jordan, how do you feel about the delivery framework that's suggested by a vast number of YouTubers?
Functional and non-functional requirements -> back-of-envelope estimation -> list core entities -> API design -> high-level design -> deep dive.
I find you have your own style of delivery, and it is still smooth.
I used to do this in the 1.0 playlist. I found that it wasted a lot of time for me, and I think that most people are pretty capable of figuring out the APIs that you need fairly quickly. The high level design to deep dive is something I've considered doing more, which is why I tend to have these overview slides. I don't think giving a high level design without some initial discussion first makes much sense to me.
@@jordanhasnolife5163 Yes, I think the high-level design always takes a very similar form if you do it without discussing how you are going to handle and store data.
I am still deciding whether I should draw a high-level diagram in my upcoming interview. I did this in a mock interview, and eventually the interviewer still asked me what the data model looks like and how I handle race conditions.
I think putting the high-level design after presenting the data model, as a quick summary of the discussion so far, may be a good idea.
Thanks for the great video. One quick question: how can a Kafka queue have multiple consumers, as at 28:36 for the "top-bid kafka queue"? I heard one Kafka partition can have only one consumer at a time. I guess we need to post the record into multiple partitions (partitioned by which server will be subscribing..?), or, realistically, just change the last bit to an in-memory queue like pub/sub?
You can have many consumers on the same partition, you just can't do round robin within a partition
@@jordanhasnolife5163 Ahh thanks. I had a misconception about Kafka: I thought it was one consumer per partition, but after seeing your reply I found out it's one consumer per partition within a consumer group, where each consumer group keeps its own offset. Thanks!!
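To make that rule concrete, a minimal sketch (group IDs and topic name are made up): run this once with each group.id and every group independently receives all records in the partition, each tracking its own offset; consumers sharing a group.id would instead split the partitions between them:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class TopBidConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Each distinct group.id has its own offset and sees ALL records.
        // e.g. run once as "websocket-fanout" and once as "analytics".
        props.put("group.id", args.length > 0 ? args[0] : "websocket-fanout");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("top-bids"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("auction=%s top bid=%s%n", r.key(), r.value());
                }
            }
        }
    }
}
```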
If the interview is to design a read-heavy, write-light system, then sync is fine. But for heavy writes and reads, it can be dangerous/hard to sell this sync design: the sync part should only be doing early rejection, while the surviving bids need to be queued and eventually processed async by a single node. It's more like a flash sale problem, where rejection starts at the client/browser.
I agree that putting all bids in kafka and processing them asynchronously is the better way to go here.
Thanks Jordan for an amazing upload as always!
A question from the last slide: what is the use of the Auction DB? Is it just for the bidding engine to read and write some state that it needs? It looks like only the bidding engine is interacting with it.
Yeah, we basically need it to write the end-of-auction result, as well as to query for existing auctions when users want to bid.
I think I missed it (or maybe I keep tuning out at the right time), but where is the actual collision case discussed, where two people submit the same bid price at the exact same time (and how is it determined who wins)? I understand the "we lock each bid coming in -> increment sequence -> ship to primary/backup engine for state, Kafka for source of truth", but wouldn't the two concurrent writes get the same sequence ID? I guess what I'm asking is what the collision principle is there. Either way, phenomenal video. I'm having a great time binging these before my interview, weh.
what does locking accomplish? We aren't "locking bids", each bid must successfully grab a lock so we can assign it a sequence number.
@@jordanhasnolife5163 Ah, I see the difference there: so we are locking around the sequence number to avoid writing two of the same sequence IDs? I can see it from a logistical perspective, but where does the fairness aspect come into play? If we have two bids at the exact same time, wouldn't the winner be whoever acquires the lock first? I guess if it isn't up to us to decide who wins the tiebreaker (for the sake of this exercise), and it's just luck that your bid is accepted before the other one because you got the lock first, then I get how it's expected to work. Thanks for the follow-up!
@@rr5349 Yeah I don't see how you can ever be "fair" unless one party decides which thing gets there first. You can't trust distributed timestamps.
Very nice video, Jordan.
One question I have is: how do we accurately restore the Auction state on the server if both the Bid Engine and Backup Engine go down?
MOAR BACKUPS
Beyond a point, everything can fail. There's no way to guarantee fault tolerance against everything, but within reason hopefully.
Hey Jordan, Do you plan on bringing back the low level system design videos anytime going forward?
At least for the time being, I'm planning to stick with distributed systems design. That being said, I'm sure I'll eventually fall into a rut, and once I do I may revisit these!
One quick question: how would we handle hot auctions near the deadline, when there are too many bids? In that case, we might need to put Kafka before our bidding engine.
Agreed, we lose any request-response ability for our requests, but what can ya do
Hey Jordan,
Why don't we use something like the transactional outbox pattern for writing to Kafka?
Instead of the broker listening to the DB, I can have an outbox table whose job is to send the data to Kafka, and this will achieve the dual write with very high throughput.
Because then I have to write to disk first which lowers my throughput. If this table is in memory then by all means go for it
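For reference, a minimal sketch of the relay half of the outbox pattern being proposed (the table, columns, and connection details are assumptions): bids get inserted into `outbox` in the same DB transaction that records them, and this loop drains the table into Kafka with at-least-once delivery:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class OutboxRelay {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/bids")) {
            while (true) {
                // Rows were written in the same transaction as the bid itself,
                // so nothing acked to a user can be missing from this table.
                try (PreparedStatement sel = db.prepareStatement(
                        "SELECT id, auction_id, payload FROM outbox ORDER BY id LIMIT 100");
                     ResultSet rs = sel.executeQuery()) {
                    while (rs.next()) {
                        long id = rs.getLong("id");
                        // .get() blocks until the broker acks, so we only ever
                        // delete rows that are durably in Kafka (at-least-once).
                        producer.send(new ProducerRecord<>("bids",
                                rs.getString("auction_id"), rs.getString("payload"))).get();
                        try (PreparedStatement del = db.prepareStatement(
                                "DELETE FROM outbox WHERE id = ?")) {
                            del.setLong(1, id);
                            del.executeUpdate();
                        }
                    }
                }
                Thread.sleep(50); // crude poll interval
            }
        }
    }
}
```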
Thanks for the detailed overview. Quick question: do you think a Redis Sorted Set would help to find the top bids? It could even serve as a backup option. Any thoughts please?
Well, we only need to know the top bid at a time, as opposed to the top k, which is when a sorted set would be useful.
Hey, thanks for this. I was wondering if Redis would be a good choice to run the Bid Engine on, given that it's atomic and single-threaded. Custom Lua scripts could have the logic to push things into Kafka. Do you see a downside to this?
I think if you can make what you said work that seems pretty feasible!
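A minimal sketch of the atomic half of that idea (key names and values are hypothetical): the compare-and-set of the top bid runs as a Lua script, which Redis executes atomically on its single thread. One caveat: a Lua script can't call out to Kafka, so the publish would have to happen after the script returns (or via XADD to a Redis stream that a forwarder drains):

```java
import redis.clients.jedis.Jedis;
import java.util.List;

public class RedisBidEngine {
    // Runs atomically inside Redis: read the current top bid, and replace it
    // only if the new bid is strictly higher. Returns 1 = accepted, 0 = rejected.
    private static final String ACCEPT_BID_LUA =
        "local current = tonumber(redis.call('GET', KEYS[1]) or '0') " +
        "local bid = tonumber(ARGV[1]) " +
        "if bid > current then " +
        "  redis.call('SET', KEYS[1], ARGV[1]) return 1 " +
        "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Object result = jedis.eval(ACCEPT_BID_LUA,
                    List.of("top-bid:auction-42"), // hypothetical key
                    List.of("105.00"));
            // Publishing the accepted bid to Kafka would happen here, after
            // the script returns; Redis itself can't produce to Kafka.
            System.out.println((Long) result == 1L ? "accepted" : "rejected");
        }
    }
}
```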
At 31:28, why do we need a Bid Gateway behind the LB? I think it should be the other way around: Bid Gateway, and then an LB for the Bid Engine.
Technically you'd want one for both. The bid gateway is likely going to be run by many servers which we can round robin to. Finding the right bidding engine will depend on the id of the auction.
One thing I want to point out is that if the primary goes down, the backup server will have to wait to catch up on all the Kafka messages in its partition before it can start serving requests. So we can have some unavailability, which is OK here since we are trading it for consistency.
Another thing: since Kafka already guarantees ordering within a partition, I don't think we need the sequence numbers. Since the bid engine evaluates the bids in serial order through a critical section and also persists messages to Kafka in that critical section, the ordering of messages in Kafka will be consistent with the sequence numbers, making them redundant. @jordan let me know if I missed something.
On reviewing the pseudocode, it seems like we are writing to Kafka in a background thread. I don't think that's feasible, as it'll violate our durability guarantees. I think we have to persist to Kafka before we acknowledge a bid.
1) Totally agree that you'd have to wait for the backup to read all the Kafka messages. You could mitigate this by sending messages to the backup first and having it put them in Kafka, but there are tradeoffs there. The sequence numbers were just for the case where we publish to Kafka on another thread.
As for the separate Kafka thread thing, I registered a handler for when we get the ack, at which point we return to the user. But then, is it truly synchronous?
@@jordanhasnolife5163 Got it. Though I am having trouble understanding what the benefit of having a separate thread to publish messages is.
I don't think we can use multiple background threads to publish to Kafka, because that would screw up ordering and can create inconsistency: e.g., seq no 5 is persisted while the primary died before persisting seq no 4. And since seq no 4 was used in determining seq no 5's accepted/rejected status, this would leave the system inconsistent.
And if we're using a single background thread, we might as well use the main thread, since publishing to Kafka is the bottleneck in bid processing. Using a single background thread just changes where we wait, i.e., before entering the critical section (main-thread scenario) vs. after entering the critical section (background-thread scenario).
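For what it's worth, a minimal sketch of the "return to the user on the Kafka ack" approach described above (the class, names, and CompletableFuture plumbing are assumptions): the critical section both assigns the sequence number and hands the record to the producer, so Kafka's per-partition order matches the sequence numbers, and the client's response completes only in the ack callback:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.concurrent.CompletableFuture;

public class BidPublisher {
    private final KafkaProducer<String, String> producer;
    private long nextSeq = 0; // guarded by the critical section below

    public BidPublisher(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    /** The web layer completes the bidder's HTTP response from this future. */
    public CompletableFuture<Long> submitBid(String auctionId, String bidJson) {
        CompletableFuture<Long> ack = new CompletableFuture<>();
        synchronized (this) {
            // Critical section: assigning the sequence number and enqueueing
            // to the producer together keeps Kafka's order consistent with seq.
            long seq = nextSeq++;
            producer.send(new ProducerRecord<>("bids", auctionId, seq + ":" + bidJson),
                (metadata, exception) -> {
                    // Runs on the producer's I/O thread once the broker acks;
                    // only now do we tell the bidder the bid is durable.
                    if (exception != null) ack.completeExceptionally(exception);
                    else ack.complete(seq);
                });
        }
        return ack;
    }
}
```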
Love the content! Can you also make a video on YouTube Analytics: video counts, watch time with no double-counted views in a given time period, and extensible to new metrics for content creators?
I'd say this sounds a lot like the "top K" problem, but instead of getting the top K you compute it for all of them.
@@tttrrrrr1841 Yeah, I saw this on Leetcode only; can you add more details? You mean you used the video chunk_id to get the count, and its time_window can be used to multiply the count to get total watch time?
@@tttrrrrr1841 You mean you used video segment_id events to aggregate, and then used video_id to perform those metric operations at query time?
@@tttrrrrr1841 Can you also tell us what question you had in the second round of SD?
@@tttrrrrr1841 lol funny enough I did not see your leetcode comment, this one was me I promise
Why can't we have multiple instances of the bid engine and use a distributed lock with Redis (to update the bid in progress)? Will that work? Maybe another basic question: is the backup the same instance as the bid engine?
For your first question, that's gonna lead to a lot of contention since they're all trying to update the same bid object (at least that's what I think you're trying to say)
#2 - not sure what you mean here. They're different servers
@@jordanhasnolife5163 Thanks for you reply. I meant , is it the same bid engine code deployed on diff server?
@@madhuj6912 Ah yeah basically, but it just pulls incoming events from kafka
Hi Jordan,
I believe that while a bid is being processed, we keep the auction's price updated with the highest bid. What if we also keep the bid ID in the same auction table of the auction DB?
Now, if we keep the auction DB replicated using leader-follower with a consensus algorithm, then we won't need the backup engine at all, even if the (now stateless) bid engine goes down.
Will this method work?
Yeah for sure. It just now takes a full consensus write to submit a bid, so your throughput goes down quite a bit.
Why do we have the auction DB (MySQL) when Kafka and the time-series DB were chosen as the source of truth?
This is just for the metadata of the auction itself, not the bids
In the 2nd pseudocode, you enqueue some bids into an in-memory queue and then ship them to Kafka, am I right?
Yep!
Hey Jordan, do you plan to do an email system design?
If I think it has some aspects of it that are new then sure
So how does the bid engine actually determine the ordering? Like what is the actual logic?
You just grab a lock, increment the sequence number by 1, release the lock.
Hey Jordan your linkedin?
www.linkedin.com/in/jordan-epstein-69b017177?
Thank you for the video! Quick question, at ruclips.net/video/3aX-lC5_P1M/видео.htmlsi=cHbcNLOSx6Z6Rsik&t=1008: is there an option where the Bid Engine first persists the bid (accepted/rejected) into a database, and the database uses CDC to publish bids to all consumers, like Kafka does in your design? The database would be the source of truth.
Absolutely. But that introduces a disk to a problem where we otherwise don't have one. If this is acceptable latency for us, I think it's a significantly preferable solution.
The first viewer on a Saturday evening? Damn, that's sad
Lol
Bro exposing his company system design 💀💀
Has he worked at eBay?
No never lmao, I'm a bit confused here
@@scuderia6272 no i meant like "he getting the system design from his company and putting it here", like he has a bidding system in his company and putting it here not on ebay XDD
@TenFrenchMathematiciansInACoat GUYS ITS JUST A JOKE 😭
😂