I have seen many other channels on system design, but I’ve yet to see someone explain this so clearly and yet make it so simple to understand. Evan, you’re an absolute GEM of a TEACHER.
Can’t thank you enough for starting this channel and explaining things to us. May you have a long life, and keep your beautiful teachings coming..🙏🏼
You rock!
I have cracked two system design rounds just because of you guys. More than the design itself, the interviewers are loving the methodical approach I am taking while designing, satisfying one functional requirement at a time.
LETS GO!!! You rock. Well done!
Evan, I don't think it can be said enough what a wonderful teacher you are. You make complex things seem so simple. Thank you for making these videos and making us a little better with your knowledge.
A few hours ago, I watched your video on Uber's system design, and I was really impressed by how you broke it down. You started with the high-level design and then shifted focus to the non-functional requirements, which was very insightful. It was a great learning experience for me, and I'm looking forward to seeing more videos from your channel!
More coming!
I started watching your videos a few days ago, and they're really helpful. You explain any complex problem in the simplest way. Best system design videos on the internet, without any fee. Thank you so much for your effort. Keep it up :) ❤
You rock!
honestly these videos are pretty long but they are really easy to watch, thanks for your work brother
Evan I just want to say I love you. You are an absolute rock star and these breakdowns are so so helpful. I am so appreciative of you and your work. Keep going!
Made my day!
Great Explanation, Keep going ❤
I want to let you know that I've watched all your system design videos, looking forward to new videos, Thanks.
Thank you, Evan! I always enjoy your system design interviews and posts, and really appreciate all that you do! And a huge shoutout for creating Hello Interview! The system design posts, especially the Overview ones, are incredibly helpful for a beginner! The guided practice, the mocking and feedback systems, and the way generative AI is used for summaries and feedback are absolutely fantastic. I love it!
So hyped to hear you've been enjoying Guided Practice too!
hi helloInterview! you have the best system design videos bar none. thanks for making these videos!
My pleasure! Glad you like them
Your system design tutorials are very clear and informative. It's just what I was looking for. Thank you for making these videos. Subscribed to YouTube + the Hello Interview practice subscription.
Right on! Glad to hear it :)
Thank you Evan, great video. Video on bidding system design please. 🙏
Thank you for expanding on the security piece!
always enjoy watching your videos and learning something
Thanks! Very helpful and informative video for the start!
I think that it's fine to say that we are good with one instance of a database based on calculations, but we never want it to be a SPOF, so we need at least one replica
Agreed! I said that in the video, no? Maybe it slipped my mind if not
54:45 honestly mesmerized by how smooth the cursor movement is during "wicked fast cus it's in memory"
Thanks for doing an easy one. needed that confidence booster.
My pleasure!
Haha, this video should have come before the others, but as always, the content is top quality!
Hindsight 20/20!
@hello_interview You folks should write a book. I'm pretty sure it would be better than the others out there!
Thanks so much for these videos Evan! They're extremely helpful
Glad you liked it :)
Amazing as always! You guys rock! Eternally grateful!
🫶
Loved this as much as all the other ones. The detail in your videos is praiseworthy.
Could you try creating a framework for LLD questions? That is another important elimination round in interviews.
Maybe in the future!
Thanks for changing the font!!!
Got you :)
Hi Evan, great content again, thanks a lot, maybe simple but fun to revise with you
♥️
Thanks a lot for this! super helpful stuff, I've got two questions tho:
1) Which parts of our system here can be regionalized? e.g. database, redis instances? and which cannot be
2) About the database going down part at the end, and the replica taking over, who's managing this transition? and the cronjob that dumps a snapshot in S3, which instance of the database exactly is it snapshotting from if the main one is down?
Hope my questions make sense :))
Hello Evan, I think if you're doing another read from the DB to check for a collision, you will need a strongly consistent system though. Eventual consistency does not guarantee that the latest update is reflected, meaning the colliding short code may already have been written to the DB without being visible to the read that checks for it. So if the requirement is that you're designing an eventually consistent system, you can't really use that approach.
Just make the short code your primary key
Yes, you can make the short code the primary key. But even if you’re using a NoSQL store like DynamoDB, read-after-write consistency only holds within the same region, not across a global table. For example, I write a record with short code xxx1. Then another short code generation hashes to exactly xxx1. When the second request checks the DB, there is no guarantee that the previous write has already made it to the node being read.
Thanks for these great videos! For the first time, I think I disagree with a point you made during the final conclusions. To satisfy high availability, it seems insufficient to back up the entity database to S3. If the DB goes down, then presumably the read path also goes down until the backup comes back online.
Since the DB is only on the order of 500 GB, what would you say to just expanding the Redis instance into a partitioned cluster (using the same partition key as your original DB)? I think we'd get additional fault tolerance from having Redis spread out (we can replicate keys a few times) while still relying on your S3 snapshot. The failure mode then allows for slightly slower reads (since we may go to the wrong node on a read) while a Redis node recovers from the S3 snapshot, rather than having to fail until a more traditional DB instance recovers.
I agree with you and need to watch again to remember what I said. But if I implied that just backups were enough, that’s wrong. You’d need a hot replica or two
When you added the cache (Redis) with only key: value, you stripped it of the expiration time. This functionality would break. You didn't mention it, but you could set an expiration time on the Redis key, forcing a cache miss. The call would then need to confirm if it's simply out of cache (LRU) or in fact expired as a shortened URL when it goes to the main DB.
Yeah, good call, oversight. I need the expiration to set the TTL on the key-value pair.
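To make the exchange above concrete, here is a minimal sketch of a cache lookup that carries the link's expiration into the cache TTL. It is an illustration only: `TtlCache` is a hypothetical in-memory stand-in for Redis, `db` is a plain dict of `short_code -> (long_url, expires_at)`, and `max_cache_ttl` is an assumed cap, none of which come from the video.

```python
from typing import Dict, Optional, Tuple

class TtlCache:
    """Hypothetical in-memory stand-in for a key-value cache with per-key TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl_seconds: float, now: float) -> None:
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, cache_expires_at = entry
        if now >= cache_expires_at:
            del self._data[key]  # TTL elapsed: behave like a cache miss
            return None
        return value

def resolve(short_code: str, cache: TtlCache,
            db: Dict[str, Tuple[str, Optional[float]]],
            now: float, max_cache_ttl: float = 3600.0) -> Optional[str]:
    """Return the long URL, or None if the code is unknown or expired."""
    cached = cache.get(short_code, now)
    if cached is not None:
        return cached
    row = db.get(short_code)
    if row is None:
        return None
    long_url, expires_at = row
    if expires_at is not None and expires_at <= now:
        return None  # the short URL itself has expired, not just the cache entry
    # Never cache past the link's own expiration.
    ttl = max_cache_ttl if expires_at is None else min(max_cache_ttl, expires_at - now)
    cache.set(short_code, long_url, ttl, now)
    return long_url
```

On a miss the database remains the source of truth, so the caller can tell "evicted from cache" apart from "expired as a short URL".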
great stuff, thanks a ton for this
Thank you for the video, really simple design. However, you didn't touch on different availability zones, which could be a latency killer: a basic ping from EU to JP is more than 200ms. If you really need 200ms latency, you should definitely create at least a couple of zones and design how you will keep them in sync
Legend is back!
Hey Evan, just yesterday I went through the written article for tiny URL on your website. I have read Alex’s book, the Grokking interview book, and your articles for the same design. All your articles, along with Alex's and Grokking's, are great. As Alex has a book edition, I wanted to suggest you do the same for the articles that you have on the website. You can create a book out of them and it would be pretty cool.
One day we hope to :)
I think another benefit of using a 302 redirect over a 301 is that, in the event the short URL expires or is no longer valid, using a 302 will help the user pick up on that. If a 301 is used and one user has cached the old URL while a new user now uses the same short URL to redirect to another website, that could potentially be a bug (until the first user empties their browser cache)
Agreed!
I really like your system design topics! Could we please have design for Twitter or Instagram?
Lovely video! Thanks!
You bet!
55:40 What if the global counter is down? It was mentioned at the end.
Just curious: in an interview, do we care about the time during the transition while bringing the replica up? If it could be 30 seconds and a server needs to get new IDs, what should we do? It should be a corner case, but it could indeed happen in the real world.
What other options do we have? Use a PostgreSQL table record as an ID generator? DynamoDB should work here as well. I like the ID batching idea a lot. It limits the number of times we need to contact the counter, which makes it more efficient.
If we are designing this for a Product Architecture interview, which areas should we focus on more, and which can we skip? (specifically for the Meta Product Architecture interview)
Great video! Will this type of system design come up in a Meta Product Architecture interview? There is a lot of backend content and calculation in this video so I'm just wondering, though I thoroughly enjoyed learning it as well.
Meta doesn't typically ask this question anymore, but if they did, yah it'd be fair game in PA.
Just curious: if the interviewer starts to deep dive, especially on the API gateway or microservices scaling, what kind of tools are needed, and what are some monitoring and logging best practices? Or maybe k8s-related things for orchestrating the setup, or do you think this would be overkill from the interviewer's side?
Also, can we make the Writer a Lambda here?
wrt lambda, you could I suppose. Write throughput is still high enough that I wouldn't. Would likely get expensive. Observability is always a nice thing to talk about if you have time, but generally it's extra imo.
Evan you rock! However I think there's a slight correction needed at 55:29 where it was mentioned 3.5T numbers whereas we previously had 56B instead?
3.5T is 62^7 so if we went up to a short code of length 7. Might not have been clear there.
How will you manage expiration? I think a better solution is to use Cassandra, with shortUrl as the partition key, and you can use TTL out of the box
I have a question about the CAP theorem explained here. Shouldn't we focus on consistency to make sure that the data remains unique, so that when one user writes a new short URL, the data another user reads while generating a shortened URL is correct? The only way I think we can guarantee this is using a single instance for the DB
Make the short code the primary key :)
which tool do you recommend to use during a System Design interview to gather requirement and draw the diagrams like you did? Which one are you using? 🙂
Excalidraw
@hello_interview, thanks for the content. I have a question regarding the cache. Is it really the read-through cache pattern? Per the architecture diagram it seems like the app is in charge of keeping the cache in sync with the DB, which is cache-aside. My understanding was that in the read-through pattern the app interacts only with the cache and the cache is responsible for keeping itself in sync. Thanks.
Can you explain better the 55:00 part when you say to get the next 1000 counts from the global counter and keep it in memory. That means you get the current count, for example 8, from the global counter. So you have the current count, and during the next 1000 write requests, you use that 8 to increment in memory and generate the base62 of 9, and after that 10, and 11, and eventually you hit the 1000th write request and update the global counter with 1008.
In that case, when you get the value 8 of the global counter, you keep it in a global variable?
Yah you’d fetch 8-1008 and keep those in memory on the server so you don’t have to worry about the counter again until you run out. If the server goes down, you lose at most 1k, so be it.
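The batching idea from this exchange can be sketched as follows. This is a rough illustration under stated assumptions: `GlobalCounter` is a hypothetical in-process stand-in for the shared counter (in a real deployment this might be a single atomic increment such as Redis `INCRBY`), and `IdAllocator` and the exact range boundaries are illustrative names and choices, not something specified in the video.

```python
import string
from typing import Tuple

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols

def base62(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class GlobalCounter:
    """Hypothetical stand-in for the shared counter service."""

    def __init__(self, start: int = 0) -> None:
        self.value = start

    def reserve(self, size: int) -> Tuple[int, int]:
        """Atomically claim `size` ids; returns the inclusive (first, last) range."""
        first = self.value + 1
        self.value += size
        return first, self.value

class IdAllocator:
    """Per-server allocator that only contacts the counter once per batch."""

    def __init__(self, counter: GlobalCounter, batch_size: int = 1000) -> None:
        self.counter = counter
        self.batch_size = batch_size
        self._next = 1
        self._last = 0  # empty range, so the first call reserves a batch

    def next_id(self) -> int:
        if self._next > self._last:
            self._next, self._last = self.counter.reserve(self.batch_size)
        value = self._next
        self._next += 1
        return value

counter = GlobalCounter(start=8)
allocator = IdAllocator(counter)
print(base62(allocator.next_id()))  # first id in the reserved batch is 9
```

If a server crashes, the unused remainder of its batch is simply skipped, which matches the "you lose at most 1k" trade-off described in the reply.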
The counter as a single point of failure seems dangerous. I'd be happier if it were higher availability. Just spreading it across multiple redis instances where each has an assigned modulo would be pretty easy right?
Very! Just enable high availability with redis sentinel. Should’ve mentioned this but took it for granted tbh
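For reference, the "assigned modulo" idea from the question could look roughly like the sketch below. It illustrates only the commenter's sharding suggestion, not the Sentinel failover recommended in the reply; `ShardCounter` is a hypothetical name and the counters here are plain in-process objects rather than Redis instances.

```python
class ShardCounter:
    """One of N independent counters; shard i only issues ids congruent to i mod N."""

    def __init__(self, shard_index: int, shard_count: int) -> None:
        if not 0 <= shard_index < shard_count:
            raise ValueError("shard_index must be in [0, shard_count)")
        self.shard_index = shard_index
        self.shard_count = shard_count
        self._issued = 0  # how many ids this shard has handed out so far

    def next_id(self) -> int:
        value = self.shard_index + self._issued * self.shard_count
        self._issued += 1
        return value

shards = [ShardCounter(i, 3) for i in range(3)]
ids = [shard.next_id() for shard in shards for _ in range(4)]
print(sorted(ids))
```

Because the shards never overlap, losing one only removes a slice of the id space temporarily; the other shards keep issuing unique ids without coordination.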
Great video! Thank you so much. Just one quick question: how did you do the calculations about sizes? Like 62^6 is 56B and 500 bytes * 1B is 500 GB? Are these just general approximations that everybody should know?
62^6, no. You'd need a calculator for that usually.
But 500 bytes * 1B = 500 GB definitely
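The estimates quoted in this thread are easy to check with a few lines of arithmetic. The helper names below are illustrative; the 500 bytes per row and 1B rows are the figures quoted in the question.

```python
ALPHABET_SIZE = 62  # 0-9, a-z, A-Z

def code_space(length: int) -> int:
    """Number of distinct base62 short codes of exactly `length` characters."""
    return ALPHABET_SIZE ** length

def storage_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw storage estimate, ignoring indexes and replication."""
    return rows * bytes_per_row

print(f"{code_space(6):,}")  # 56,800,235,584 (about 56B)
print(f"{code_space(7):,}")  # 3,521,614,606,208 (about 3.5T)
print(storage_bytes(1_000_000_000, 500) // 10**9, "GB")  # 500 GB
```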
Thank you! But isn't it a bit more algorithm and not as much system design? Because basically the main thing here is to create the algorithm that returns the url.
Yah
Could you please improve the video resolution? it doesn't look like 1080p60!
Should we also know GraphQL even if we have no experience with it or will REST suffice for the META system design interview?
90%+ of the time REST is fine. Worth just reading the basics of GraphQL, mostly because it’s pretty straightforward as a concept and cool to know
@hello_interview please bring video on system design of google docs
Working on it!
Thanks hope it comes early as I have my interviews scheduled.
what if multiple people want to keep the same alias and there was no concept of user accounts?
At 55:15, where does 3.5 trillion number come from?
I would argue that something in the functional requirements is not really correct.
You have built a redirect API, so the person with the short URL must use your website instead of pasting the URL into the browser.
Shouldn't we instead create a CNAME that redirects the client when the client pastes the URL into the browser? Indeed, it will take longer for our new DNS record to propagate, but I would say that this is a more complete system.
It is overkill. You will have to manage certificates as well, and you would end up with longer names which are hard to remember or copy from a printed screen
I guess we need a 302 for one more purpose: how will I expire the tiny URL? If the redirect gets cached, then we will not have the ability to expire it. Is that correct?
Correct!
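A small sketch of the redirect decision discussed here, written as a pure function so it is independent of any web framework. Returning 404 for unknown or expired codes is an assumption made for the example (the thread does not specify a status for that case), and `redirect_response` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

def redirect_response(long_url: Optional[str], expires_at: Optional[float],
                      now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for a short-URL lookup."""
    if long_url is None:
        return 404, {}  # unknown short code
    if expires_at is not None and expires_at <= now:
        return 404, {}  # assumption: expired links are treated as not found
    # 302 is a temporary redirect, so clients keep asking the server
    # and an expiration can take effect on the next request.
    return 302, {"Location": long_url}
```

With a 301 the browser may remember the mapping and skip the server entirely, which is why the expiration check above would never run for that user.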
doesn't allowing custom aliases mean a collision check/read is still required with the counter method or am i missing something?
Good point. In theory, you could put constraints on the alias to ensure not. Like >7 characters. Or only lowercase and greater than 3 characters. Bit contrived though. I’d just go with the retry on PK error and the counter reduces the likelihood of that.
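The "short code as primary key, retry on PK error" approach can be sketched with SQLite from the Python standard library. This is a single-node illustration only: it does not address the multi-region consistency concern raised elsewhere in the thread, and the table layout and function names are assumptions for the example.

```python
import sqlite3
from typing import Iterable

def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )

def try_insert(conn: sqlite3.Connection, short_code: str, long_url: str) -> bool:
    """Insert the mapping; return False if the short code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                (short_code, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key violation: a custom alias or earlier code owns it

def shorten(conn: sqlite3.Connection, long_url: str,
            candidates: Iterable[str], max_attempts: int = 3) -> str:
    """Try candidate codes (e.g. successive counter values) until one inserts."""
    for attempt, code in enumerate(candidates):
        if attempt >= max_attempts:
            break
        if try_insert(conn, code, long_url):
            return code
    raise RuntimeError("could not allocate a short code")

conn = sqlite3.connect(":memory:")
create_schema(conn)
try_insert(conn, "mylink", "https://a.example")                  # custom alias
print(shorten(conn, "https://b.example", ["mylink", "1a"]))      # falls through to "1a"
```

The database's uniqueness check replaces the separate read-before-write, so there is no window between checking and inserting on that node.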
Quick question, as this is for beginners, how many years of experience would a beginner be expected to have?
Hard to say. 0-4 ish
❤
What happens if the global counter goes down? isn't that a single point of failure?
nvm this was answered a few min later
My first system design question was Ticketmaster 😭
Hopefully you watched our video!
Which software are you using for drawing diagrams i wonder ?
Excalidraw
excalidraw
Excalidraw ;)
Excalidraw-draw-draw-draw-draw. (is there some echo in here?)
Like and Subscribed please don't stop
Hi sir is there a way to connect and work with you, I am a senior software engineer at a big MNC and an content creator with 25k subs, a 5 minute chat or call would be really helpful, Thanks.
Connect with me on linkedin! (link in description)
Could not get past the alelebilititties spelling mistake.
About what I’d expect from a Liverpool fan ;)