This kind of content can make someone fall in love with software engineering.
This is one of the most in-depth, real-world, highly practical applications of software engineering concepts, and it mimics how design discussions among colleagues happen in good companies. This is pure gold.
💙🥲
Finding your channel feels like finding gold!
There are a ton of SD videos on YouTube with shallow content, basically exactly what you mention a junior or mid-level candidate would do.
Going in-depth for senior and staff is one of the highlights of your content. Please continue doing that.
Also, please don't worry about the length of the videos. Keep the gold coming :)
@hello_interview waiting eagerly for the next video
This is by far the best system design interview I've ever seen on the internet. Keep up the great work, sir...
Totally agree @sudarshanpanke7329
You somehow managed to make preparing for system design interviews really fun. Massively underrated channel
You’re the best!
Watch this and then imagine if Evan puts together a System Design Learning Course. Just imagine that!!!! I mean, we (the learners) would jump on it sooo quickly. This is just absolutely amazing. It combines years of experience with a hands-on approach that actually works, along with book content, presented in a very professional manner. Evan, think about it 🙂
Maybe one day! For now just happy with all the people learning about hello interview and getting tons of free value
This is not just interview prep, this is some serious stuff here. Thanks a lot!
🫡
Honestly, it is the best SD showcase I've ever seen. You are the best. I watched all your videos and then whiteboarded them myself. Thank you!
So glad you like them and very smart to try them yourself and not just blindly consume!
Awesome walkthrough! As a junior engineer I learned a lot. Near the end you said you wanted to keep it short, but I appreciate the nuances you point out for your decisions, including design choices between mid- and higher-level candidates. Time didn't faze me at all, I watched every second!
Hell yah!
I rarely leave comments but this is BY FAR the best system design video I have seen on youtube. The thing I like most is that it differentiates from common textbook solutions that we see everywhere and you explain why certain choices were made. Thanks!
Thanks for commenting!! Glad you enjoyed it
You are the best at what you do; after listening to your videos I can no longer tolerate other system design videos. Please make more system design content!
Coming soon! 🫡
Totally agreed! I also cannot tolerate anyone else at this point
Ah, nice, you re-uploaded! Thanks a lot for taking the feedback and acting quickly on this. And sorry if it caused inconvenience for you 😄 Thanks a lot for all of your hard work. 🙏
Thanks so much for calling that out! Glad to get it fixed within the first day :)
Is e-commerce (design amazon / ebay) not as common as it once was?
Really helpful videos, especially the breakdown of different expectations at mid/senior/staff levels, common things you see from candidates, and context on the actual systems, like the shard limits for event streams. I used to work on Kinesis; happy you chose it!
How cool! That must’ve been fun to work on :)
I have purchased so far: Alex Xu, Grokking, and System Design by Mikhail Smarshchok. I would say I learned a LOT from Mikhail Smarshchok as far as internals go, and then I bumped into your Hello Interview. ABSOLUTE BEST stuff, which can replace Alex and Grokking for me. I was ALWAYS looking for how to connect the functional requirements to the later high-level design / deep dives, but both Alex and Grokking fail many times in connecting the dots. Subscribed and a big fan of yours. I will consider purchasing coaching or mocks once I am through your videos and feel confident.
I've purchased both of those as well (I'm a big fan of Mikhail's YouTube channel, but once he became popular he stopped posting and decided to monetize). Alex Xu is very dry and hand-wavy for me. I've asked questions on the chat (Discourse?), but those are answered by other students, and I don't have any confidence that they're speaking from experience.
Great job with creating this content to help us prep for interviews!
Just one thing to note: you can't send a 302 redirect for POST requests; it has to be a GET request.
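For illustration, a minimal sketch of the GET-then-redirect flow, assuming a Spring Boot click endpoint (the controller, `recordClickEvent`, and `lookupRedirectUrl` here are hypothetical, not from the video):

```java
import java.net.URI;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ClickController {

    // GET /click?adId=123: record the click, then send the browser onward with a 302.
    @GetMapping("/click")
    public ResponseEntity<Void> trackClick(@RequestParam String adId) {
        recordClickEvent(adId); // e.g. put the event on the stream before redirecting
        return ResponseEntity.status(HttpStatus.FOUND) // 302 Found
                .location(URI.create(lookupRedirectUrl(adId)))
                .build();
    }

    private void recordClickEvent(String adId) { /* enqueue to Kinesis/Kafka */ }

    private String lookupRedirectUrl(String adId) {
        return "https://example.com/landing/" + adId; // stand-in for the Ad DB lookup
    }
}
```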
I discovered your channel a few days ago and love all the videos so far. I love the solution to enforce idempotent click tracking. There are a lot of SD videos, but only your SD videos provide guidelines on the expectations based on candidates' levels. So far I watch one video per day, so I will run out of videos to watch very soon xD
Excellent, as usual! Thanks so much 🙏 While concluding, you mentioned you want to keep the videos short. Please don't reduce the length. 1 hour is a sweet spot and it's necessary to capture all the important "juicy" tidbits and details you highlight. Please keep it coming 🙌
If you HAVE to reduce something, please reduce the time between videos to 1 week 😛
No, seriously, thanks so much.
Haha trying 😝
One of the best channels on system design! Please keep going!
Evan, thank you for sharing - this is the most in-depth system design content I've found online. However, regarding the decision of whether to use Flink checkpoints, I have several points I'd like to discuss.
First, while I'm not familiar with Kinesis, for Kafka, consumer offsets are committed by the consumers themselves. For a Flink job performing window aggregation calculations, it inevitably consists of multiple operators: the Kafka consumer, window aggregator, and the sink that writes results to either an OLAP database or downstream Kafka topics are all different operators. Without using checkpoints and the automatic offset commits triggered by checkpoints, the Flink job would be practically unable to correctly commit offsets to Kafka.
This issue isn't just about whether in-flight data is stored in checkpoints; it's also about whether consumer offsets are being correctly committed. Without this mechanism, the concept of 'restarting consumption from an earlier offset when the Flink job fails' cannot be properly implemented, as we would have no mechanism to ensure that this 'earlier offset' is positioned before all data that hasn't been processed and flushed downstream.
If checkpoints and the automatic offset commits they trigger are disabled, there are only three options:
1. Automatically commit the offset immediately every time Flink's Kafka consumer receives a message, which obviously provides no fault tolerance and will lead to data loss.
2. Commit offsets periodically, but this cannot guarantee that downstream window aggregations have completed and flushed results to the sink when the consumer commits the offset. Although we can set a relatively long commit interval to mitigate this issue, it still doesn't provide exact guarantees.
3. Craft a manual offset-commit mechanism to ensure that the Kafka consumer only commits offsets after downstream window aggregations have completed and flushed results to the sink. However, this is extremely difficult and unnecessary, as Flink doesn't have a built-in feedback/callback mechanism for downstream operators to notify upstream operators, so you would have to implement this yourself.
But if that's the case, why not directly use checkpoint + 2PC? Therefore, I believe checkpoints are necessary here, even if you only want to achieve at-least-once semantics.
The Flink community has invested significant effort in optimizing checkpoint performance, such as introducing incremental checkpoints and implementing state backends with better performance. All these efforts aim to make checkpoints more practical and capable of handling large data volumes with short intervals. I believe leveraging these latest technological advances in Flink and using checkpoints is a better choice, rather than adopting various workaround solutions. 🙏
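For reference, wiring up what this comment argues for is only a few lines in Flink. A minimal sketch, assuming an S3 checkpoint bucket and intervals that are made up for illustration:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedAggregationJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10s; with the Kafka source, offsets are committed
        // back to Kafka only when a checkpoint completes.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000L);
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/flink-checkpoints");

        // Incremental checkpoints: only changed RocksDB SST files are uploaded.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // ... source -> keyBy(adId) -> window aggregate -> sink ...
        // env.execute("ad-click-aggregation"); // after adding source/sink
    }
}
```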
Literally recommended Hello Interview to everyone I've mock interviewed with
You rock
These interview preps make you feel like if you know enough system design, have good cross-team examples for behavioral questions, and can solve LeetCode medium-hards fast, you can get to a higher level quicker than by going through internal promotions.
This video helped me ace the system design interview. The detailed explanations provided in-depth knowledge of various components, which was extremely helpful for answering follow-up questions during my interview.
Absolutely understood everything in this, being a starting data engineer myself and looking for my next venture. This is class content. Thank you so much. 🎉❤
This is the best system design video I have seen on YouTube so far. Really loved the in-depth discussion. Would love to see more videos. 👍
I felt this was much better than the Alex Xu System Design Vol 2 on the same topic. Great job!
High praise!
This is the best ad click aggregation system design video and article I have ever seen!
This was such a pleasure to watch. Thank You. I would love to see a video on a metrics monitoring system. There will be some common components with ad-click aggregators.
The level of detail in this video is crazy
You honestly make the best content on system design. Can you do a playlist on the system design topics themselves?
I mean videos where you discuss replication in depth, concurrency, etc.
Will definitely consider this!
The content is great as always. However, I found this system a bit frustrating; there is so much specific knowledge one should know to really end up with a good design for this kind of problem. I hope I don't get a system like this in an interview.
Thank you so much for these videos! They're absolute goldmines; I'm genuinely amazed at the depth and clarity they offer. I've learned more from these than from many of the books I've read. Truly invaluable!
So glad you like them!
I just found your channel and these videos are really great and super helpful! I love how you verbalize the challenges and tradeoffs as they come up, it provides a great example for how to communicate in an interview.
If we replace the Kinesis stream with Kafka, we can perhaps get rid of Flink and S3.
Kafka Streams can aggregate a stream with group functions and time windows (tumbling, hopping, etc.). Further, there's ksqlDB, which can query the events Kafka retains (7 days by default) so they can be compared with the OLAP store to verify data integrity.
Also, I love your videos. Looking forward to more of those.
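Worth noting: the windowed aggregation lives in the Kafka Streams DSL (a client library on top of Kafka) rather than in the broker itself, and the 7-day figure is Kafka's default topic retention, which is configurable. A minimal sketch of a tumbling-window click count, with hypothetical topic and application-id names:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ClickCountApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // "ad-clicks" is keyed by adId; count clicks per ad in 1-minute tumbling windows.
        // The result is materialized in a local state store (backed by a changelog topic).
        builder.stream("ad-clicks", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
               .count();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ad-click-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```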
I saw some videos and your content is so great. Thank you so much for clarifying the SQL vs NoSQL debate. I always thought that bringing that into an interview was irrelevant but was afraid to do it. 😅
Keep up the amazing work.
Yah funny how that was evangelized in a couple books and then just stuck
Really good! Listened 3 times to pick up every single detail. Thanks.
Thanks a lot for uploading these videos. They are very informative. Keep doing the good work.
Stellar video. Only suggestion for the next ones: your drawing tool seems to have keyboard shortcuts for those diagrams and other options on the toolbar. It'll greatly improve your quality of life! Keep it up!!
Thank you for a great video.
For a senior candidate it would be helpful, in my opinion, to narrate the data structures that underpin these solutions in addition to the supporting vendor products/technologies. For instance, for fast aggregation of counters, one could demonstrate the use of fast but slightly imprecise counters via a count-min-sketch-like data structure, and for slower but more accurate counters the use of MapReduce jobs. Aggregates and statistics over varying windows are almost a necessity for contemporary monitoring, analytics, and ML systems. In such cases, retaining data in memory backed by persistent storage, in the form of tree data structures keyed by aggregation windows, is useful for range queries at varying granularities. For example: root window [0-100), immediate child node windows [0-50), [50-100), etc.
It could also be helpful to talk about idempotency within queue consumers, and about out-of-sequence arrival of events in the queue (handled through watermarking).
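To make the count-min-sketch suggestion above concrete, here is a toy version (illustrative only; real implementations use proper pairwise-independent hash families rather than `hashCode()` XOR a seed):

```java
import java.util.Random;

/** Minimal count-min sketch for approximate click counts (illustrative only). */
public class CountMinSketch {
    private final long[][] table;
    private final int[] seeds;
    private final int width;

    public CountMinSketch(int depth, int width) {
        this.table = new long[depth][width];
        this.width = width;
        this.seeds = new Random(42).ints(depth).toArray();
    }

    private int bucket(String key, int row) {
        return Math.floorMod(key.hashCode() ^ seeds[row], width);
    }

    public void add(String adId) {
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(adId, row)]++;
        }
    }

    /** Never undercounts; may overcount due to hash collisions. */
    public long estimate(String adId) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(adId, row)]);
        }
        return min;
    }
}
```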
I can do some future videos that go deeper on probabilistic data structures or other more foundational topics.
Amazing, really better than other stuff I found on the internet!
Kudos to you!
This is great. And in case someone wants to deep dive specifically into Lambda and Kappa, you can refer to Alex Xu's book 2.
Love these! And can't recommend the Hello Interview mock interviews enough!
Wahoo thanks Ben!
These are so good, such deep dives, and so clear! Thanks, because they cover so many different aspects. I am looking forward to many more videos. I am currently preparing for interviews and these are so helpful.
Thanks so much for doing this! Greatly appreciated! By far the best system design videos I've seen.
Absolutely brilliant content, mate. Keep 'em coming. The only channel I have notifications turned on for.
I like the signed impressionID for deduping. Great video. Thanks!
Amazing content! Very much appreciate you posting these 🙌
System design padawan here. I have a question about the hybrid approach: what makes us trust the "Kinesis -> Connector -> S3 -> Spark -> Worker -> OLAP" results more than the "Kinesis -> Flink -> OLAP" results? Is it a guarantee that the connector always successfully writes the data to S3? Or does Flink make some kind of tradeoff for speed? Kind of confused about that piece and figured I'd ask. Thanks again!
I am also curious about this
+1 on this question.
IMO, Spark is useful when you have really out-of-order events, like events arriving half an hour late or something. Then, by using Spark, you can reorder the events and get a more accurate count. On the other hand, for events that are only a few minutes late, you can configure Flink with the allowed lateness of a few minutes.
That being said, the cost of maintaining 2 code bases and ensuring that they are updated at the same time (to avoid triggering discrepancy alerts) doesn't seem worth it for such edge cases.
I'd be interested to hear @hello_interview's thoughts on this though.
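For anyone curious, the allowed-lateness configuration mentioned above looks roughly like this in Flink (a sketch with a hypothetical `ClickEvent` POJO and an inline stand-in source):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LatenessExample {
    public static class ClickEvent {
        public String adId = "ad-1";
        public long eventTime = System.currentTimeMillis();
        public long count = 1;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(new ClickEvent()) // stand-in for the Kinesis/Kafka source
           .assignTimestampsAndWatermarks(
               WatermarkStrategy.<ClickEvent>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                   .withTimestampAssigner((e, ts) -> e.eventTime))
           .keyBy(e -> e.adId)
           .window(TumblingEventTimeWindows.of(Time.minutes(1)))
           .allowedLateness(Time.minutes(5)) // late events re-fire the window instead of being dropped
           .sum("count")
           .print();

        env.execute("lateness-example");
    }
}
```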
> Is it a guarantee that the connector always successfully writes the data to S3?
Yes. That is something provided by AWS as a managed service, and it should meet their SLAs. AWS would've created fallbacks and fault tolerance to ensure that all events that are in Kinesis reach S3.
I don't see there being any compromise due to speed. Kinesis has a retention policy, so the data isn't going anywhere, and S3 is highly available, so the data will be written there as well.
You are a legend, man. Make some more of the videos mentioned on your website: Search, E-commerce, Hotel Booking system, etc.
Beautiful Design and Amazing explanation - just impressed with the elegance of the design and the beauty of software engineering.
😍
need more like this man!
Amazing explanation skill you have, OMG.
Looking forward to your next videos. Please upload more design problems. It's almost a month since you last uploaded. Love your content.
Sorry, was traveling. Recording a video today! Up by EOW
Hey Evan, your videos have been the best, seriously! The only suggestion I have is if you could use a dedicated mic that would be wonderful. The volume is too low even on max volume on my earphones.
Updates in latest video! Have a nice mix now :)
@hello_interview yay 😀
Thanks for the video; it is really in-depth and informative. You covered the Flink + streaming part really well, and why not to use checkpointing was a really good point. I saw some videos in the past that said let's do checkpointing and never explained why it will or won't work.
For the streaming components, both the streams (Kinesis or Kafka) and the stream processor (Flink/Kafka Streams), I have the following questions (I was asked similar questions in an interview):
1. We said that we will partition the shards or streams using adId, but we have 10M ads at any given point in time. In this case, if I consider other things like replicas of the Kafka topic or even Kinesis shards, don't we have to create lots of shards?
2. The max traffic is 10k cps, so most of the ads are not clicked and won't have active traffic, meaning most of the shards will be empty. In this case, what should the approach be?
3. If we group some ads based on, let's say, advertiserId, then we introduce the noisy neighbour problem, where one ad generates lots of traffic and blocks the other ads.
Those questions are great and definitely come up often with streaming setups. I'm no expert but ...
Partitioning with 10M Ads ... Yeah, if we just went with adId for partitioning, it would mean loads of shards or partitions, which isn’t practical. Instead, you can use consistent hashing or modulo partitioning to map multiple adIds to a smaller, manageable number of partitions.
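In code that mapping is tiny; this mirrors what Kafka's default partitioner does (hash the key, mod the partition count), sketched here with plain `hashCode()` for illustration:

```java
public class AdPartitioner {
    /** Map an unbounded adId space onto a fixed number of partitions. */
    public static int partitionFor(String adId, int numPartitions) {
        // floorMod avoids negative buckets when hashCode() is negative
        return Math.floorMod(adId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same adId always lands on the same partition, so per-ad ordering is preserved.
        System.out.println(partitionFor("ad-12345", 64));
    }
}
```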
Hey, I love these videos. I only used your videos and Designing Data-Intensive Applications, and that was enough for an E4 offer at Meta. I love the advice you give and the common pitfalls you point out.
Crushed it. Congrats on your offer!
Thanks a lot! This video is super helpful. It not only helped me understand the key components of an ad click aggregator, but more importantly it taught me what the interviewer expects from SDEs at different levels. One question: do we need to dive into the aggregation logic details in Flink?
Very useful video, the best among others. I got this same question twice in my interview loop. Very happy with how I answered.
Nice. Good job!
Thanks for the content; it's great. A few queries:
1. As we compute 10-second aggregates and write them to the OLAP DB, a deep dive on how the user queries a 1-minute window, a 1-hour window, maybe a 1-day and an all-time window would be great info.
2. What could the OLAP database choice be, and what factors should we consider?
3. Another aspect is freshness of the data: how much does each component in the system contribute to the latency before an event becomes available in the aggregates?
Other aspects are great; thanks.
I like your videos, I have learned a lot.
A couple comments on this video:
a. I think the system would benefit from Redis in the click processor service: not the idempotency lock, but a redirectUrl cache {adId: redirectUrl} to reduce reads on the Ad DB. It might be an MRU cache to avoid overloading the Redis (see the sketch below).
b. I'm not sure why you are pushing Kinesis so hard in this solution. I mean, yes, I learned something about Kinesis, but it would be more practical just to use Kafka, which can handle the load peaks and keeps event history as well, so it's possible to run the reconciliation procedures from it.
c. I learned about Flink, thanks. I used a Redis aggregator in my own solution.
Thank you so much for your work!
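The redirectUrl cache from point (a) would be a classic cache-aside pattern. A minimal sketch, assuming the Jedis client, a one-hour TTL, and a hypothetical Ad DB lookup:

```java
import redis.clients.jedis.Jedis;

public class RedirectUrlCache {
    private final Jedis jedis = new Jedis("localhost", 6379); // hypothetical Redis endpoint

    public String getRedirectUrl(String adId) {
        String cached = jedis.get("redirect:" + adId);
        if (cached != null) {
            return cached; // cache hit: no Ad DB read
        }
        String url = loadFromAdDb(adId); // cache miss: fall back to the Ad DB
        jedis.setex("redirect:" + adId, 3600, url); // TTL bounds staleness and memory use
        return url;
    }

    private String loadFromAdDb(String adId) {
        return "https://example.com/landing/" + adId; // stand-in for the real DB lookup
    }
}
```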
I've become an annual premium member of HelloInterview after watching a couple of your videos. It'd be really helpful if you started a chat/messaging platform for the paid subscribers, so that we could get our questions answered without relying on YouTube.
Thanks for the detailed explanation! Definitely learned some new things in this video.
I love this channel. Very good job, sir; your strategy is really good and comprehensive. Straight to the main points. Bravo.
For the hot shard problem in Kafka you can salt the keys, but then in Flink we will no longer have all the events for an ad aggregated in one Flink task. In the case where we are sinking to Redshift, we could aggregate there; but if we want to access it in-stream, maybe we need a secondary stage that aggregates everything?
So that added salt needs to be handled one way or the other.
A single Flink task can read from various shards.
This is gold right here. Thank you!
Awesome content!! I was hoping to get more clarity on how the same Flink node determines the partitions it needs to consume from in the hot shard case, which we are effectively handling by adding a number between 0 and N to the partition key, so that clicks for the same adId are published to multiple partitions in the stream.
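One common answer to this question, and to the salting comment above, is two-stage aggregation: Flink tasks aggregate the salted keys independently, then a second keyBy on the bare adId merges the partial counts. A sketch under those assumptions (the `AdCount` POJO, salt values, and inline source are made up; processing-time windows keep it self-contained):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SaltedAggregation {
    public static class AdCount {
        public String adId;
        public int salt;
        public long count;
        public AdCount() {}
        public AdCount(String adId, int salt, long count) {
            this.adId = adId; this.salt = salt; this.count = count;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<AdCount> clicks = env.fromElements( // stand-in for the stream source
            new AdCount("hot-ad", 0, 1), new AdCount("hot-ad", 3, 1));

        // Stage 1: partial counts per salted key, spreading the hot ad across tasks.
        DataStream<AdCount> partials = clicks
            .keyBy(c -> c.adId + ":" + c.salt)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .reduce((a, b) -> new AdCount(a.adId, a.salt, a.count + b.count));

        // Stage 2: strip the salt and merge the partials into the true per-ad count.
        partials
            .keyBy(p -> p.adId)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .reduce((a, b) -> new AdCount(a.adId, -1, a.count + b.count))
            .print();

        env.execute("salted-aggregation");
    }
}
```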
This is great content. However, I would like to point out that the Lambda architecture includes both batch and speed layers, which process historical and real-time data in parallel. It’s not solely reliant on batch processing.
I believe the checkpoint is still necessary to avoid data loss. If the server crashes, the checkpoint stores the offset in the Kafka/Kinesis stream, and the job restarts from the offset in the checkpoint.
Without the checkpoint, Flink has no idea where it should restart from, and data will be lost between the last time it sent data to the OLAP store and the time it crashed.
However, Kinesis is aware of all subscription offsets, allowing Flink to read from the correct position in the stream.
Kafka also stores the offset for each registered consumer group, so when the Flink job restarts it will start consuming messages from the last committed offset. However, I think checkpointing is necessary for data integrity if we are using a processWindow function for the aggregation in the Flink pipeline, since it's a stateful operator: if we disable checkpointing and a server crash happens, it will lose the state and there will be data loss. This can be severe for large window sizes of 15 or 30 minutes.
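For context on where the offsets live: the Flink Kafka connector can start from the consumer group's committed offsets, but by default it only commits them when a checkpoint completes, so without checkpointing you'd have to rely on the consumer's auto-commit. A sketch (topic, group id, and intervals are hypothetical):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

public class ClickSource {
    public static KafkaSource<String> build() {
        return KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("ad-clicks")
            .setGroupId("click-aggregator")
            // Resume from the group's committed offsets; fall back to earliest if none exist.
            .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
            // With checkpointing disabled, only the consumer's auto-commit persists offsets.
            .setProperty("enable.auto.commit", "true")
            .setProperty("auto.commit.interval.ms", "5000")
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();
    }
}
```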
How can you identify if interviewer is asking Product vs Infrastructure system design question?
Great explanation and design! Would like to know what are the pitfalls of an alternative design using Kafka with Kafka Streams. Thanks!
The final solution _is_ literally lambda architecture.
I am learning so much, my god
Thank you so much for the explanation. However, I'd prefer that you write out the functional and non-functional requirements instead of pasting them from the clipboard. The main reason is that it makes me think about what the requirements should be. It feels more like pair programming.
Since you are filtering for a specific AdId (AdId = 13), grouping by AdId is redundant. The result will only include data for AdId = 13, so grouping won't change the output.
Great video! I learnt a lot! One question I have is regarding the hot shard handling. When the click processor detects there is a hot shard and decides to change the key from AdId to AdId:n, how would it let Flink know that it now needs to change the aggregation logic (and sharding logic) for that particular AdId? (I believe this would also have a race condition when it changes within a minute window, but any data integrity issue that arise from it should get resolved by the batch job)
I believe the aggregation logic must have a way to detect such AdIds, maybe by inspecting the key format, and process them accordingly. One more thing: once we store these results in the OLAP database, we need additional handling for such ads to generate accurate results, since we have multiple aggregation rows for them in the DB (AdId:1 ... AdId:n) and need one more level of aggregation.
Incredible video with excellent drawing and explanation.
Thank you for sharing!
I got a question about the decision to not use checkpointing in Flink. If you don’t enable that then where would you store the Kafka consumer offsets for recovering?
I love the content you post. I am able to clear my interviews because of your notes.
In the above video, do you think a time-series DB would be a better choice for the Click DB compared to Cassandra?
Hi Evan, absolutely gold content. I have one doubt here.
For a really hot ad, you are adding a random number (1-n) to build the partition key before adding it to the Kinesis stream. Now this particular ad can land in multiple Spark consumers. How will the different Spark consumers aggregate the data for this ad? Is there anything I am missing?
Are you referring to keyBy(adId) in Flink?
Why is this not Lambda architecture? It has both real-time and batch processing, so what is the difference?
Thank you, btw
Thanks for the great videos - they are extremely helpful.
I noticed at around 24 mins in you mention querying for the number of distinct userIDs. I don't think you're going to be able to serve queries like that using the pre-aggregation you suggest doing with Flink. I don't know a good solution to this problem other than dumping the list of userIDs for each minute window into your OLAP DB. You might be able to use HLL (HyperLogLog) sketches for this, but depending on how granular the dimensions are in your DB, it may not be worth it.
I think it's at least worth mentioning this if we think users care about unique counts.
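If unique counts do matter, one option along the HLL lines mentioned above is Redis's HyperLogLog commands, keyed per ad and minute bucket (the key naming here is made up):

```java
import redis.clients.jedis.Jedis;

public class UniqueUserCounter {
    private final Jedis jedis = new Jedis("localhost", 6379); // hypothetical Redis endpoint

    /** Record a user against a per-ad, per-minute HyperLogLog. */
    public void recordClick(String adId, long minuteBucket, String userId) {
        jedis.pfadd("uniq:" + adId + ":" + minuteBucket, userId);
    }

    /** Approximate distinct users over a range of minute buckets (~0.81% standard error). */
    public long uniqueUsers(String adId, long fromMinute, long toMinute) {
        String[] keys = new String[(int) (toMinute - fromMinute + 1)];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "uniq:" + adId + ":" + (fromMinute + i);
        }
        return jedis.pfcount(keys); // PFCOUNT unions the HLLs across buckets
    }
}
```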
Your videos are very helpful. Thank you 🙏
When will you post the next interview video? I've been waiting about a month!!! Really appreciate the effort.
Tomorrow!!
For Kinesis hot shards, we don't know if an ad is hot beforehand. So are these ad_id 0-N suffixes always active? Is it OK to use 10x the number of shards we'd need under normal circumstances?
For Flink, we have the same number of Flink workers as Kinesis shards, right? If a worker dies, how will the new worker keep track of the pointer from the old one? Are they stateful backups instead of stateless?
This is a great question. In reality you can make predictions here. We know, based on budget and historical performance, which ads we'd need to be worried about beforehand.
I didn't understand the idempotency part. What if the same user sees the ad two times in a day? Wouldn't we want to count that?
Love your content! Keep them coming!
Thank you for the video! Isn't this exactly what Lambda architecture is and not a "hybrid between lambda and kappa"?
The best system design content.
Thanks a lot for helping me prepare for my upcoming interviews.
Can you please clarify the difference between product design and system design at Meta?
www.hellointerview.com/blog/meta-system-vs-product-design :)
Amazing system design. I've been searching for something exactly like that, which is interview driven, not "let me show you all the ways you can do it" driven.
May I just make a small suggestion: please use "etcetera", not "eccetera"...
Simply the best resource !!!!
Thanks a lot, great explanation \o/
Regarding handling the idempotency: why can't we get away with only the ad impressionId? Is a signed one actually required?
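On why the signature matters: a bare impressionId can be fabricated or guessed by a malicious client, whereas a server-issued signature proves the ID was actually handed out when the ad was served. A minimal HMAC sketch (the secret and token format are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ImpressionSigner {
    private static final byte[] SECRET = "server-side-secret".getBytes(StandardCharsets.UTF_8);

    /** Issued with the ad: "impressionId|signature". */
    public static String sign(String impressionId) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
        byte[] sig = mac.doFinal(impressionId.getBytes(StandardCharsets.UTF_8));
        return impressionId + "|" + Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
    }

    /** Reject click events whose signature doesn't match: we never issued that ID. */
    public static boolean verify(String token) throws Exception {
        int sep = token.lastIndexOf('|');
        return sep > 0 && sign(token.substring(0, sep)).equals(token);
    }
}
```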
This is awesome, thanks Evan. I have a question on the usage of blob storage: isn't it supposed to be used for storing unstructured data like images, audio, and video files? Could you please elaborate on 1) how the Spark task reads data from S3, and 2) how the Spark job would sustain reading a full day's data? Is checkpointing used in this context?
I really like the leveling information.
This is another great video!! Please keep it coming. Can we use mirroring in Kafka and have Spark read from the mirror and provide data to the reconciliation service?
You know, I have less familiarity there. Potentially, but I'm not totally sure.
The system design videos on this channel are the best out there. Thanks for putting in so much time!
I did have a question regarding the proposed reconciliation architecture: I get that data accuracy is important and that it acts as a sort of "auditor" in our system. However, you mentioned that errors might stem from e.g. bad code pushes or out-of-order events in the stream.
The proposed reconciliation architecture would really only fix issues that occur *within Flink*, though, right? At the end of the day, the Spark job is still acting upon that same data from Kinesis, so in the case of out-of-order events or bad code pushes, it would also be affected, no?
Yah, if you messed something up in the server before Kinesis you'd still be screwed. But you'd want to keep that part as simple as possible. You can trust that Kinesis will be reliable, and out-of-order won't matter for reconciliation.
@@hello_interview got it. Thanks for the quick response. :)
The final design has both real-time and batch processing. Why is it not Lambda architecture?
This one had a different approach I hadn't seen before. Nice design.
Great video, covers so many new topics. Is the use of a time-series database practical for storing click events by timestamp? By design, that would help with aggregation in 1 minute windows, no?
Hi Evan, thank you very much for your great video. But there is one point I don't fully get: the way you handle the hot shard problem by randomly appending a number to the AdId. What would happen if we have a few popular advertisements? In that case, we might need an executor for all shards. Is my assumption right? I really hope you can help me clear that up. Or if anyone here knows the answer, could you please help me?
best way to learn system design.
Nice video. Liked the deep dive. One quick question: why not introduce a DB earlier, before Kafka/Kinesis, and have Kafka read from that DB, instead of storing events just for recon? Any flaw with this approach?
Question: For the sharding while processing the events through Kinesis, the adId was suggested as the sharding key. This doesn't look like the best approach. At scale, millions of ads are being run on the platform and a good share of them have high enough volume. Going by the presented logic, the number of shards would explode. What do you think about this?
Love the content! Thank you for making these!