Learned a lot from this video. Bro, one request: can you make a video about your Uber interview, covering the questions asked round by round and the HR round, which was pretty tough as you mentioned in the video (Got job in Uber)?
Thanks Abhishek! I won't be mentioning the questions asked, because we aren't allowed to. "Got hired" will turn to "Got fired". 😝 You can go through the content on the channel, it's more extensive than an interview set 😁
Thank you very much for this. Excellent explanation through such a complicated topic. Really helped me think through a follower service I have been struggling to commit to.
Hi Gaurav, Your energy is just unmatched! Audience Request: Please consider doing a video on how would one architect IRCTC Tatkal Booking scenario - with hundreds of thousands of tickets sold in 2 to 3 minutes time duration. Thanks
@gaurav. First of all, your videos are great. Big thanks to you. I have a question. - if I add functionality of showing user’s own profile. - all his posts will also be cached? If yes, how and where?
Hi Gaurav, First of all excellent work on the videos :) I have a doubt about the DB selection. What I understand is that when we need to store information about users we may use MySQL because of the strong relationships etc., but since the content [activity] of the user is kind of unstructured, wouldn't it be better to use NoSQL? By unstructured I mean we may or may not have a caption, may or may not have images, could have videos instead, or comments that could be recursively long. Please correct me if I'm not going in the right direction! Once again awesome work :)
IMO it'll be better to have a combination of NoSQL and RDBMS. For example, data that needs to be updated regularly, such as the number of likes, should be kept in a NoSQL DB, whereas things like the content of a post, which does not change so frequently, are better stored in an RDBMS.
Hi Gaurav, First of all, a big thanks for all the videos that you are making. I am preparing for my interviews and these videos are helping me a lot. I would request you to please make a video explaining designing of a game, maybe a little complex game such as FIFA, which can give us an idea as to how to implement real-time occurrences. Since all the moves in a game need to be processed in real time it would be different from a system like Instagram as they can afford to have a delay of a few seconds but these games can not. Thanks :)
Hi Gaurav, I really like your videos they are very clear and to the point. I would like you to share in one of your videos how the Amazon Market Place Design will look and work like
From the mobile system design perspective, pull model is not suited for reasons like battery consumption, drop in network connectivity but a nice explanation of various possibilities.
Great content!!! Any tips or advice on how to determine the type of DB to use? I believe you opted to use some form of SQL database, but the motivation behind it would be something useful for the audience (or at least me!!).
I went for the SQL DB because I hadn't spoken much about NoSQL databases then, and because I am more comfortable talking about SQL databases since I have professional experience working with them. Now, however, I would choose NoSQL for stuff that doesn't have strict consistency requirements. ruclips.net/video/xQnIN9bW0og/видео.html
Fantastic explanation, but could you clarify one thing? Once a post is stored in the DB by a user, you notify the user feed service. The user feed service gets the followers for the given userId and adds the post to "each user's queue", which could hold at most 20 posts. Could you explain how the queues are established? Are they dynamic? Say there are 20 followers: so 20 queues?
Great video Gaurav. Explained in simple, crystal clear thoughts. However, I have the below questions on this design: 1. As you have mentioned, when a celebrity like Justin Bieber, who has millions of followers, publishes a post on Instagram, the system has to pre-compute the user feeds for all those millions of followers and it might end up crashing. Can we skip the pre-compute for these cases which involve millions of users and insert those posts into the feed at runtime? Or do we have any better way to handle this? 2. We are computing the feed at, say, 20 posts per user. How do we handle it when the user accesses more than 20 posts? I assume we hit the DB to get those posts, but won't that create heavy load on the system when more users scroll past the first 20? Is there a better way to handle this? Thank you so much for all your videos
13:30 Hi Gaurav - thanks for pointing out the need for a load balancer with the snapshot technique stored onto Gateway for network routing when we horizontally scale the server-side. But why is communicating with the load balancer inefficient? Is this to avoid constant network calls ( which are slow ) and to utilize the SS, which can be stored into memory-side on the Gateway application?
Actually a lot of consideration and thinking from multiple angles is required while doing a System Design. Sometimes, it's just like 'hey, where would that service get the data from? would it need any authentication? etc.' Prepare well!!!
Awesome video series on this channel 💚. One request Gaurav- Could you please share insights on how the video&audio based systems are designed,built and the kind of algorithms/libraries that go into transcoding on scale, as they are computationally intensive tasks. Thanks.
Hi Gaurav, a few questions that I was asked during one of my interviews: 1) Is the follower-followee ER design efficient when you have millions of people using the system?
I think we could have Likes in Posts/Comments table. The reason being they don't violate NF rules + it is going to be more efficient in terms of space. And if we think whether it's a characteristic of a post, then I can't think of why not? Also, since RDs support indexing, you could also include any suggestions on which all keys to index or anything that saves the world.
Will we store the post meta-data in a relational db ? Would that quickly fall apart given the scale of Insta ? Or would a nosql like Cassandra be the way to go with the tables you've described ? What do you think ?
Hi Gaurav Great video it is. Thanks for this. Had a query. How efficient it would be when a celebrity having 50 million followers(or may be more) posts something and we need to add the post in cache for all of the followers?
Hi Gaurav, I love the way you explain things. This video actually sums up all the major components, including the DB structures and high-level architecture. I have one question regarding the design, which is more about low-level design; it would be great if you could create a video on that. Q. If I need to design the data storage in memory, which data structures should we use to store the posts, likes and follower data, such that we can fulfill the given features efficiently? Thanks,
Very good explanation Gaurav. The way Instagram generates the feed today has changed drastically with their new graph API, which focuses more on relationships. It would be great to see a video on that.
@GauravSen: Excellent video! Please help me with a question: Can you please explain the reasoning/thought-process behind choosing a Relational database for Users and Feed Schema?? What factors do you consider when taking such a decision? Thanks and Regards
Amazing way of presenting the core concept Gaurav.. Really helpful, thanks.. :) Do we have any probability of learning about the chat feature as well .?
That is a very good video, but regarding #3: I wouldn't use just a follower/followee table, and the reason is scalability. It would be much better to have a table something like: userId, entityId, connectionType, and insert two records in the database when user1 becomes a follower of user2: (user1, user2, following) means that user1 follows user2, and (user2, user1, follower) means that user2 has user1 as a follower. With that, it's easier to shard and scale (here sharding would be done based on userId, the first column).
Sir, I know you will not tell us how you studied system design on your own, but hats off to the content 👍 In your time there might not have been resources 😞 so how did you learn on your own?
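A rough sketch of the two-record idea from the comment above, in Python. The shard count, the modulo shard function and the in-memory dictionaries are assumptions made for illustration only; the comment itself only proposes the (userId, entityId, connectionType) layout and sharding by userId.

```python
NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example

# Each shard maps (user_id, connection_type) -> set of entity ids.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Pick a shard from the userId column (simple modulo, an assumption)."""
    return user_id % NUM_SHARDS


def follow(follower_id, followee_id):
    """Write two records so each user's edges live on that user's own shard."""
    out_edges = shards[shard_for(follower_id)]
    out_edges.setdefault((follower_id, "following"), set()).add(followee_id)
    in_edges = shards[shard_for(followee_id)]
    in_edges.setdefault((followee_id, "follower"), set()).add(follower_id)


def following(user_id):
    """Everyone user_id follows, read from a single shard."""
    return sorted(shards[shard_for(user_id)].get((user_id, "following"), set()))


def followers(user_id):
    """Everyone who follows user_id, read from a single shard."""
    return sorted(shards[shard_for(user_id)].get((user_id, "follower"), set()))
```

Both lookups touch exactly one shard, which is the benefit the comment is pointing at. The cost is that each follow becomes two writes that can land on different shards.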
@@blinkkeebs No, I'm just going over the reddit site and reverse engineering the features, but I've also changed some things. BTW I'm building it in MEAN stack.
@@Eduardo-fk7ft Oh! I'm also building reddit clone but I use React and Next.js for the server-side rendering. The db I use is postgresql. I think I should add some extra features for the project and I found this video :)
@@blinkkeebs Very nice tech stack!! Yes, the best way to learn anything is to make it your way or change it; that is something that works best for me. Good luck, and may Google be your best friend!! :P
Nice video! I have one question, though: If we precompute the feed, what happens when a user scrolls past the 20 posts we have cached? Then we would have to compute the rest of the feed at the request of the user, which would be inefficient, right? Sorry if I didn't understand that correctly.
Great explanation as always. However, I have a little doubt. Instead of sending notifications from the Post Service to the user feed service directly, what if we introduce an event bridge in between, so that different services communicate with each other based on the events produced by other services?
For pushing notifications, WebSockets won't work when the application is closed. Also, WebSockets are a costly way to manage a connection just for update-related pushes. They will drain a lot of battery unnecessarily. WebSockets are more useful in more real-time use cases when the application is actually open, e.g. chat applications and multi-player gaming applications. Notifications for post updates need not be real-time. Firebase Cloud Messaging is the best way to do that.
Hi, could you share more technical details on how this load balancer can have a snapshot of each service? How is the sharing of these snapshots between the load balancer and the API gateway actually implemented?
Hi Gaurav, Your explanations are very clear and relevant. I love your accent. Most of the guys use fake accents just for videos, which irritates me a lot. One request I want to put here is for the system design of a Metro system and a Stack Overflow system, as I mostly saw them in the thread.
It would be great if you made a video that delved deep into the concepts of load balancer vs. proxy servers (forward and reverse proxy) vs gateways. TIA! :)
I don't see any discussion on the type of databases used SQL vs NoSQL and also how you are scaling the DB, what type of queries you are using and how you are partitioning the DB? Also, it would be more helpful if you can explain with some concrete examples on the capacity estimates like the total users, Daily Active users, daily file uploads, total DB/Object storage etc
In the celebrity use case, the decision also has to be made on the basis of whether the polling is affecting the user's device battery or not. WorkManager on Android is a great way to achieve that. It optimises resources and gives a CPU chunk to a particular application only when the device has decent resources to spare. This could surely save a lot of CPU clocks on our server end for sending PUSH.
Hi @Gaurav, it was a really nice video, I liked it very much. But there are 2 things I want to ask: 1. How about fetching the feed of followed people based on timestamp, returning up to 2 full pages (a full page being what fills a mobile screen, or let's say the top 20), and when the user scrolls through 70% of the current feed, sending the next two pages? 2. Can the sending of notifications be learned? I mean, if the user is not clicking the notifications about a particular person, from then on the system will not send notifications about that person. The system would learn the user's behaviour, which would save a lot of time otherwise spent sending a notification for every followed person.
Thanks Ritesh! The solutions I would choose are: 1) Recompute the feed using the queries mentioned, and paginate as required. 2) User behaviour is difficult, expensive and controversial to predict. I would go for simple solutions before going for custom solutions. Chances of building something unnecessary are lesser. 😁
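As a hedged illustration of "recompute the feed and paginate as required", here is a small Python sketch of timestamp-based (keyset) pagination. The post dictionaries, the field names `author_id` and `created_at`, and the in-memory filtering are assumptions for the example; a real system would run the equivalent query against the posts table.

```python
def feed_page(posts, followee_ids, before=None, limit=20):
    """Return one page of the feed, newest first, plus a cursor for the next page.

    posts: iterable of dicts with 'id', 'author_id' and 'created_at' (hypothetical fields).
    before: (created_at, id) of the last post already shown, or None for the first page.
    """

    def sort_key(post):
        # The id breaks ties between posts created at the same instant.
        return (post["created_at"], post["id"])

    candidates = sorted(
        (
            p
            for p in posts
            if p["author_id"] in followee_ids
            and (before is None or sort_key(p) < before)
        ),
        key=sort_key,
        reverse=True,
    )
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = sort_key(page[-1]) if has_more else None
    return page, next_cursor
```

The client sends back `next_cursor` when the user scrolls near the end of the current page. Using a cursor rather than an offset keeps pages stable when new posts arrive in between requests.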
I believe we can store the user feed on the client as well, since we can recompute it in case the app is reinstalled. Another thing is that a server-side cache will be updated even when the user is not using the app, but if we store it on the client we will only be updating it (a hybrid model will be better) when the user is up and running.
@@gkcs I would agree. Some users like me use Instagram in the browser only. :) As far as the app is concerned we can keep the feed on mobile, but I see challenges as well: let's say I did not open the app for 4 hours and the whole 20-post feed needs to be updated, then we need to do a big recalculation step. For fast response times, as long as the user is on the app we can cache the feed on the app, and a push will go directly to the mobile without one extra hop in between. But I am still not sure what will be better; maybe somewhere in between server + client would be ideal. I may be completely wrong. Thoughts?
Hi Gaurav, Excellent video as usual. I had one doubt. If any celebrity posts an image, Will the system still update the queues for userfeeds of millions of users? Will it be efficient ? If not, how is it done?
Great Video. For use case user-follower scenario, I see technical solution as Graphical DS problem (considering time). Reason is fetching feed both side will become simple and fast. Looks like an interesting problem to analyze/compare various SD approaches against above use case.
My question is about the DB design for followerID and followeeID: I assume none of the columns here have constraints on them. If my assumption is right, then how is the TAT kept within limits without indexing the table? Basically, I am not getting (i) how this table is connected to other parts of the whole schema and (ii) how results are retrieved in time without slowing down the total response time. Besides, thank you very much Gaurav, like always ;) :)
@Gaurav Sen, hello, thanks for the video. My question is why did you go for a table-based DB here, any specific reason? I was thinking that for maintaining post comments the subset pattern of MongoDB would be a better option here.
I interviewed for PayPal a few weeks ago, and this was the exact systems design question they asked. They said to me, “Do a system design for Instagram”. I smiled, because I had just watched this video a few days prior, and so I knew exactly how to answer. Thank you for this video, you helped me get a job, for real :)
Congratulations!
wow awesome Elli May got job in Paypal 😁
What's there to be proud of when you've seen the answer to an interview question beforehand?
@@brandonzheng1092 so one should not feel proud anyways, cuz he/she had studied that in books b4... lame perception
@@brandonzheng1092 I sense happiness rather than pride. Nothing this person said implies pride, and even if so, why not? I'd be proud.
Damn this kid is good. Better than most of the "veteran" system architects I've worked with.
You must have worked with some really crappy architects if that is indeed the case.
Hi Sen, for the database design I think you should go from Logical ERD first then derive Physical Tables from there, it is more natural approach
As a backend developer who struggles to do projects due to thinking too much and taking a lot of time to make dumb things... I think I have found one awesome channel for me.
Thanks for your videos man!
😁
Probably the first video of the system design I have seen. Being a front end developer, I had a fair idea about the things but the way you explained just wow
I'm not even a CSE student. But watching your videos, Gaurav, actually intrigues me a lot and motivates me to learn more about programming and design my own scalable system one day. Thanks Gaurav.
Try the Coding Train channel. You can watch most of his videos and have fun. It's like watching a movie and he does real coding. For example watch this video; even if you don't know programming, you will still understand it and it's so much fun.
Coding the snake game
ruclips.net/video/AaGK-fj-BAM/видео.html
Explains a lot why there are so many well paid people behind each successful online service, so complex, wow
Been addicted to this channel recently and binge watching it even though I have my exams ongoing xD. Man you are the best. (saying this from my experience of having watched more than 100 "Real" Coding YouTubers) My Systems Design knowledge is growing by leaps and bounds by watching these and I plan to implement these good designs just after my tests are over. I have worked on several applications as a backend developer, and always put heavy stress on scalability, flexibility and ACID properties. But your channel has taught me a lot more good techniques and design concepts.
Thank you 😁
Seriously man,
It is a great help
Well, I wasn't even looking for this. Just got a random recommendation and now I'm watching this with full focus at 3AM.
Never thought this day would come😂
Awesome video.
Gaurav, first of all, thank you for a fantastic, simple and clear explanation. Second of all, I can imagine the work that went into putting this video together; it must be a humongous task preparing the right content, taping, editing, etc. Great work!
Thank you!
All the important concepts are explained very simply and this is what makes this video amazing.
What you described as a load balancer is the work of service discovery or a metadata service like zookeeper/consul/etcd, and what you described as a gateway is the work of a load balancer or reverse proxy.
Looks like you haven't used these systems in practice (I am not blaming) and are trying to inform others based on (your interpretation of) what you read online.
good point.
Can I ask a question about your comment? I totally agree that what he's describing as a load balancer is actually more like Zookeeper. But I'm confused about the gateway comment. My understanding, which I may be wrong about, is that a gateway will handle authentication and authorization, and then route the incoming HTTP request to one or more services to accomplish the task at hand, depending on the configuration. So yes, it sort of acts as a reverse proxy with the addition of authentication logic and the possibility of making synchronous service calls. But I don't see how the gateway is a load balancer. It doesn't distribute API calls based on load. It distributes them based on function. If you wanted load balancing between a service and the gateway, or between a service and another service, you would still need a load balancer. Is this correct?
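To make the three roles discussed in this thread concrete, here is a minimal Python sketch. It is not the design from the video and not how any particular product works: the class names, addresses and the round-robin policy are all assumptions. The registry stands in for service discovery (the ZooKeeper/Consul/etcd role), the gateway routes by function after an auth check, and a separate balancer spreads calls across instances.

```python
import itertools


class ServiceRegistry:
    """Stand-in for service discovery: maps a service name to its instances."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def instances(self, service):
        return list(self._instances.get(service, []))


class RoundRobinBalancer:
    """Distributes calls for one service across its instances in turn."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def pick(self):
        return next(self._cycle)


class Gateway:
    """Routes by function (path prefix) after a placeholder auth check."""

    def __init__(self, registry, routes):
        self._routes = routes  # path prefix -> service name
        self._balancers = {
            service: RoundRobinBalancer(registry.instances(service))
            for service in set(routes.values())
        }

    def route(self, path, token):
        if not token:  # placeholder for real authentication/authorization
            raise PermissionError("missing token")
        for prefix, service in self._routes.items():
            if path.startswith(prefix):
                return service, self._balancers[service].pick()
        raise LookupError(f"no route for {path}")


registry = ServiceRegistry()
registry.register("posts", "10.0.0.1:8000")
registry.register("posts", "10.0.0.2:8000")
registry.register("feed", "10.0.0.3:8000")

gateway = Gateway(registry, {"/posts": "posts", "/feed": "feed"})
```

Here the gateway chooses the service by what the request is for, and only the balancer decides which instance handles it, which matches the distinction drawn in the comment above.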
Would love to see a system design of notifications (activity feed) in twitter/IG etc. Aggregate etc.
@Gaurav I think it should be "postId" in place of "activityId" in the "comment table" when you were explaining feature no. 2.
Let's say we want to find all the comments for a particular post: then we will look in the comment table at the postId column.
Correct me if I am wrong.
Hi Gaurav,
Greetings. I love your work, I am a subscriber and a frequent liker. However, I find an implicit assumption in your system which considers the Instagram mobile app as thin clients. The process of storing the posts in the cache in the server would result in an unscalable system. I believe the posts are cached in the user's app memory(cache and physical storage), considering that these apps have a considerable chunk of internal storage used. An added proof for this would be if you try to open up Instagram in offline mode you can still see past posts and a toast message which says "couldn't refresh feed". I would like to have the cache on the user's system and then an identifier that is stored in a place where you are storing the cache of the post on the server.
(considering the news feed functionality. This can be applied for other uses too. )
Thanks.
This is a very good point. Thanks for posting 😁
@@gkcs can you explain system design for telegram
@@gkcs the posts ARE stored in application cache... but it doesn't invalidate the fact that mobile apps aren't still thin clients. a user can delete an app, or visit from a third party integration (not built by instagram) -- in which case these timeline feeds are still stored in horizontal caches. you wouldn't believe the amount of money instagram/twitter/etc spend on memcache to make this happen.
@@dustindiaz so which way should I follow? Do I need to cache posts on the client side?
I'm confused about the like section: whenever the user clicks like, should the client side make a request?
admin wadidaw caching on the client is helpful when revisiting an application. This way a user can be presented with information immediately.
Caching on the server, on the other hand, is necessary for large scale services to deliver things like timelines, since raw SQL queries based on this system design would cause the system to fall over with just a decent amount of traffic.
hi Bro.. Actually the way you explained the stuff is very simple and clear.. Thanks for your time for making such videos..
Thank you!
Great video! That said, in your descriptions of the database schema, you should mention hotspotting as a justification for certain decisions as well. Namely, a very good reason to not add a "likes" column to posts is that it creates a lot of contention on rows in a single table, especially because single posts can get hundreds of thousands of likes. You arrived at the same conclusion - building tables that allows for writes to avoid contention and thus reads to be aggregations (which can then utilize caching) - but I think focusing on the larger problem of hotspotting motivates your design decisions better.
Noob question, please what is hotspotting
@@rujotheone some records are getting queried more than others. the specific instance that contains the record will be much busier than the rest of the system. you're not balancing the load ideally uniformly.
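A small Python sketch of the pattern described in this thread: each like is its own row, so writes never update a shared counter on the post, and the count is an aggregation that can be cached. The table layout, the in-process dictionary used as a cache, and the invalidate-on-write policy are assumptions for illustration, not the schema from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes ("
    " post_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " PRIMARY KEY (post_id, user_id))"
)

like_count_cache = {}  # post_id -> cached count (stand-in for a real cache)


def add_like(post_id, user_id):
    """Insert one row per like; no single 'likes' cell on the post is updated."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
    like_count_cache.pop(post_id, None)  # naive invalidation, for the example only


def like_count(post_id):
    """Aggregate on read and remember the result until the next write."""
    if post_id not in like_count_cache:
        row = conn.execute(
            "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
        ).fetchone()
        like_count_cache[post_id] = row[0]
    return like_count_cache[post_id]
```

For a post that is liked constantly, invalidating on every write would defeat the cache; a time-based expiry that tolerates a slightly stale count is probably the more realistic choice there.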
Wow his content is really at next level
I love this guy and respect his efforts and the amount of hard work he puts in each and every video
7:16 I think we do need a "type" on the Activity table. For example, suppose there's a postID of "123" and a commentID that is also "123". Since both postID and commentID can be interpreted as activityID on the Activity table, if there is a row on the Activity table with activityID "123" we don't know whether it's for the post or for the comment, unless we have a "type" column to distinguish between them.
True in this case; it depends on your system though. If the id is a UUID, then there won't be a case where postId and commentId are equal.
Great video man, really appreciate the fact that you've been posting such detailed conceptual content for free.
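The ambiguity raised above can be shown with a short Python sketch. The table and column names (`activity_type`, `activity_id`) are hypothetical, loosely following the comment's wording rather than the exact schema in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes ("
    " activity_type TEXT NOT NULL CHECK (activity_type IN ('post', 'comment')),"
    " activity_id INTEGER NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " PRIMARY KEY (activity_type, activity_id, user_id))"
)

# Post 123 and comment 123 share the same numeric id.
conn.executemany(
    "INSERT INTO likes (activity_type, activity_id, user_id) VALUES (?, ?, ?)",
    [("post", 123, 1), ("post", 123, 2), ("comment", 123, 3)],
)


def count_likes(activity_type, activity_id):
    """Count likes for one activity; the type disambiguates equal ids."""
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE activity_type = ? AND activity_id = ?",
        (activity_type, activity_id),
    ).fetchone()
    return row[0]


def count_likes_without_type(activity_id):
    """What a lookup by id alone would return: post and comment likes mixed."""
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE activity_id = ?", (activity_id,)
    ).fetchone()
    return row[0]
```

With globally unique ids such as UUIDs, as the reply notes, the lookup by id alone would no longer mix the two, and the type column becomes a convenience rather than a necessity.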
Hi Gaurav! Very thankful to you for sharing your knowledge with the rest of the world!
I have 3 questions about the GATEWAY: 1) Is it a Micro-Service? If not, what exactly is it (i.e what does it contain)? 2) It seems like a single point of failure, looking at the diagram. 3) If we have multiple instances of a Gateway, then would the Load balancers be needed in between Client and the Gateway Service ?
Hi Gaurav, thanks for this great post. You look so young, how could you be so knowledgeable?
He is actually 45 years old. He designed a system that removes aging signs from his youtube uploads...
Hi Gaurav, thank you for amazing content. Can you please share your thoughts on why you chose SQL database for all these data instead of NoSQL? Since the volume is high and eventual consistency seems to be ok, can we use NoSQL database for this kind of data?
Thanks
Fantastic work it is because of people like you skills of general masses are also rising
Awesome video! Thanks a lot, Gaurav, this was very insightful. I have a question though, regarding likes and feeds. When retrieving the feed, what would you say is the best way of knowing whether the user has liked the post they are seeing at the moment? Calling the DB every time seems a bit overkill.
Keep cracking on!
Thanks for explaining the practical use of all we study in our syllabus..Your videos are superb!
Glad to hear that!
Hey Gaurav great videos bro.. Every software engineer should know system designs to build scalable, robust applications.. keep rocking!
Thank you!
@@gkcs by the way now you're a software engineer with at least 3- 4 years of experience by now. Do you still practice algorithms? I'm in this dilemma whether to practice or take it light
@@praveen3123 Never stop learning !
Thanks for posting this, the part on how to handle the news feed helped me out a lot, originally I could only think of the first method which the administrative tasks are way too high, precomputing the news feed is an option I didn't even think about. thanks :)!
Glad it helped 😁
I didn't know the DBMS subject was so exciting....
Life is incomplete without a Gkcs design video
Hahaha!
I'm so glad I found your channel. Keep up the good work! Nice videos:)
Thanks 😁
Amazing stuff - not only informative - but interesting!
Thanks!
Regarding the hybrid approach: in practice, User1 follows ordinary users and celebrities as well. When an ordinary user posts, the post is pushed to User1, but when a celebrity posts, the system/client has to pull. How does the client know when it has to pull? @gaurav sen sir, can you please explain, or correct me if I misunderstood something...
Lots of learning from this video. Bro, one request: can you make a video on your Uber interview, about the questions asked round by round, and that HR round which was pretty tough, as you mentioned in the video (Got job in Uber)?
Thanks Abhishek!
I won't be mentioning the questions asked, because we aren't allowed to. "Got hired" will turn to "Got fired". 😝
You can go through the content on the channel, it's more extensive than an interview set 😁
Thank you very much for this. Excellent explanation through such a complicated topic. Really helped me think through a follower service I have been struggling to commit to.
Hi Gaurav,
Your energy is just unmatched!
Audience Request: Please consider doing a video on how would one architect IRCTC Tatkal Booking scenario - with hundreds of thousands of tickets sold in 2 to 3 minutes time duration. Thanks
I'll try to work on this 😁
@gaurav.
First of all, your videos are great.
Big thanks to you.
I have a question.
- if I add functionality of showing user’s own profile.
- all his posts will also be cached?
If yes, how and where?
Awesome man, im glad to see your channel, subscribed immediately! Very helpful!
Dude who follows Stephen Hawking on Instagram and why would he even be on Instagram..😂😂😂 JK
P.S. Great Video btw 👍
Hi Gaurav, first of all, excellent work on the videos :) I have a doubt about the DB selection. What I understand is that when we need to store information about the user we may use MySQL because of strong relationships etc., but since the content [activity] of the user is kind of unstructured, wouldn't it be better to use NoSQL? By unstructured I mean we may or may not have a caption, may or may not have images, could have videos instead, and comments in that case can be recursively long. Please correct me if I'm not going in the right direction! Once again, awesome work :)
IMO it'll be better to have a combination of NoSQL and RDBMS. For example, tables which need to be updated regularly, such as the number of likes, must be kept in a NoSQL DB, whereas things like the content of a post, which do not change so frequently, are better stored in an RDBMS.
Hi Gaurav,
First of all, a big thanks for all the videos that you are making. I am preparing for my interviews and these videos are helping me a lot. I would request you to please make a video explaining designing of a game, maybe a little complex game such as FIFA, which can give us an idea as to how to implement real-time occurrences. Since all the moves in a game need to be processed in real time it would be different from a system like Instagram as they can afford to have a delay of a few seconds but these games can not.
Thanks :)
Thanks Ayush!
I am working on tic tac toe currently. It'll be progressing towards more complicated games soon. 😁
From Designing Tinder to Instagram, in a very short time :D
Hahaha, just 6 months 😉
Hi Gaurav, I really like your videos they are very clear and to the point. I would like you to share in one of your videos how the Amazon Market Place Design will look and work like
From the mobile system design perspective, pull model is not suited for reasons like battery consumption, drop in network connectivity but a nice explanation of various possibilities.
Great content!!! Any tips or advise on how to determine the type of DB to use? I believe you opted to use some form of SQL database but what was the motivation behind it would be something useful for the audience(or at least me!!).
I went for the SQL DB because I hadn't spoken much about NoSQL databases then, and because I am more comfortable talking about SQL databases since I have professional experience working with them.
Now, however, I would choose NoSQL for stuff that doesn't have strict consistency requirements.
ruclips.net/video/xQnIN9bW0og/видео.html
@@gkcs Makes sense. Thanks Gaurav..
fantastic explanation. but could you clarify one thing?
Once a user's post is stored in the DB, you notify the user feed service. The user feed service gets the followers for the given userId and adds the post to "each user's queue", which could hold at most 20 posts. Could you explain how the queues are established? Are they dynamic? Say there are 20 followers, so 20 queues?
Great video Gaurav. Explained it in simple, crystal clear thoughts.
However i have the below questions on this design:
1. As you have mentioned, when a celebrity like Justin Bieber, who has millions of followers, publishes a post on Instagram, the system has to pre-compute the user feeds for all those millions of followers and might end up crashing. Can we skip the pre-compute for cases that involve millions of users and insert those posts into feeds at runtime? Or do we have a better way to handle this?
2. We are computing, say, 20 feed items per user. How do we handle it when the user accesses more than 20? I assume we hit the DB to get those items, but won't that create heavy load on the system when more users scroll past 20? Is there a better way to handle this?
Thank you so much for all your videos
13:30 Hi Gaurav - thanks for pointing out the need for a load balancer with the snapshot technique stored onto Gateway for network routing when we horizontally scale the server-side. But why is communicating with the load balancer inefficient? Is this to avoid constant network calls ( which are slow ) and to utilize the SS, which can be stored into memory-side on the Gateway application?
Actually a lot of considerations and thinking in multiple angles is required while doing a System Design.
Sometimes, it's just like 'hey, where would that service get the data from? would it need any authentication?' etc. Prepare well!!!
Awesome video series on this channel 💚. One request Gaurav- Could you please share insights on how the video&audio based systems are designed,built and the kind of algorithms/libraries that go into transcoding on scale, as they are computationally intensive tasks. Thanks.
Thank you!
I'll get to designing Netflix/YouTube in a while. That will be fun! 😁
Hi Gaurav, a few queries which were asked of me during one of my interviews:
1) Is the follower - followee ER design efficient when you have millions of people using the system ?
I think we could have Likes in Posts/Comments table. The reason being they don't violate NF rules + it is going to be more efficient in terms of space. And if we think whether it's a characteristic of a post, then I can't think of why not?
Also, since RDs support indexing, you could also include any suggestions on which all keys to index or anything that saves the world.
You mean the "count of likes for this post", or "who has liked which content"?
@@gkcs The count of likes.
@@PrashantMarshal Will you update this record on every like?
Gaurav Sen wouldn’t the strategy for updating the Activity table (be it batched queries or point queries) be valid for the Comments table too?
Nice explanation. Excellent work
Will we store the post meta-data in a relational db ? Would that quickly fall apart given the scale of Insta ? Or would a nosql like Cassandra be the way to go with the tables you've described ? What do you think ?
Love you system design videos. Love from Nepal 👍
Hi Gaurav
Great video it is. Thanks for this.
Had a query. How efficient it would be when a celebrity having 50 million followers(or may be more) posts something and we need to add the post in cache for all of the followers?
Hi Gaurav,
I love the way you explain things. This video actually sums up all the major components, including the DB structures and High-level architecture.
I have one question regarding the design, which is more on low-level design, it would be great if you can create a video on that.
Q. If I need to design the data storage in-memory, which data structure we should use to store the posts. likes, follower data, such that we can fulfill the given features efficiently.
Thanks,
Hey gourav sir😊🙏
Nice overview and well explained.
You are a really great person who shares your personal experience. 👍
Thanks Sandip!
great video! the animation part is awesome. I like all your system design videos.
Thank you 😁
Amazing Video, Thanks Gaurav :)
What a wonderful channel!!! just subscribed
Very good explanation, Gaurav. The way Instagram generates the feed today has changed drastically with their new graph API, which focuses more on relationships. It would be great to see a video on that.
I'll look into it :)
Fabulous video. Looking forward to more video. Can you elaborate on empirically optimize a ranked feed?
Your system design implementation is going over my head. I think I need to study the basics first; only then can I get what you want to say.
@GauravSen: Excellent video! Please help me with a question:
Can you please explain the reasoning/thought-process behind choosing a Relational database for Users and Feed Schema?? What factors do you consider when taking such a decision?
Thanks and Regards
Amazing way of presenting the core concept Gaurav.. Really helpful, thanks.. :) Do we have any probability of learning about the chat feature as well .?
That is a very good video, but regarding #3, I wouldn't use just a follower/followee table, and the reason is scalability.
It would be much better to have a table something like:
userId, entityId, connectionType, and actually insert two records into the database when user1 becomes a follower of user2,
like
user1, user2, following => means that user1 follows user2
and
user2, user1, follower => means that user2 has user1 as a follower
With that, it's easier to shard and scale (here sharding would be done based on userId, the first column).
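The two-record idea above can be sketched as follows. This is only an illustration under stated assumptions: the shard count, the CRC32-based `shard_for` function, and the in-memory lists standing in for database shards are all hypothetical.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of database shards
shards = [[] for _ in range(NUM_SHARDS)]  # each list stands in for one shard's table


def shard_for(user_id):
    # Stable hash of the first column (userId) picks the shard.
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def follow(follower, followee):
    # Two rows per relationship, each stored on the shard of its own userId.
    shards[shard_for(follower)].append((follower, followee, "following"))
    shards[shard_for(followee)].append((followee, follower, "follower"))


def connections(user_id, connection_type):
    # Both "who do I follow" and "who follows me" touch a single shard.
    rows = shards[shard_for(user_id)]
    return [entity for (uid, entity, ctype) in rows
            if uid == user_id and ctype == connection_type]
```

The cost of this layout is a doubled write and the need to keep the two rows consistent; the benefit is that neither lookup has to fan out across shards.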
Great video, ty. I'm building an app that does something tangential to this, really helpful for real-world work!
Sir, I know you will not tell us how you studied system design on your own. But hats off to the content 👍
In your time there might not have been resources 😞 so how did you learn on your own back then??
I'm building a reddit clone, and your way of designing the news feed gave me a lot of ideas, thank you!!
me too! reddit clone from ben awad's tutorial? :D
@@blinkkeebs No, I'm just going over the reddit site and reverse engineering the features, but I've also changed some things. BTW I'm building it in MEAN stack.
@@Eduardo-fk7ft Oh! I'm also building reddit clone but I use React and Next.js for the server-side rendering. The db I use is postgresql. I think I should add some extra features for the project and I found this video :)
@@blinkkeebsVery nice tech stack!!.
Yes, the best way to learn anything is to make it your way or change it, that is something that works best for me.
Good luck, and may Google be your best friend!! :P
Just a test message to see whether you read it or not.
BTW very good system design playlist.
Nice video! I have one question, though: If we precompute the feed, what happens when a user scrolls past the 20 posts we have cached? Then we would have to compute the rest of the feed at the request of the user, which would be inefficient, right? Sorry if I didn't understand that correctly.
Great explanation as always. However, I have a little doubt. Instead of sending notifications from the Post Service to the user feed service directly, what if we introduce an event bridge in between, so that different services communicate with each other based on the events produced by other services?
For pushing notifications, WebSockets won't work when the application is closed. Also, WebSockets are a costly way to manage a connection just for update-related pushes. They will drain a lot of battery unnecessarily. WebSockets are more useful in real-time use cases when the application is actually open, e.g. chat applications and multiplayer gaming applications. Notifications for post updates need not be real-time. Firebase Cloud Messaging is the best way to do that.
Hi, Could you share more technical details on how this load balancer can have snapshot of each service. how are sharing of these snapshots between load balancer and api gateway actually implemented
Hi Gaurav, your explanations are very clear and relevant. I love your accent. Most of the guys use fake accents just for videos, which irritates me a lot. One request I want to put here is for the system design of a Metro system and of the Stack Overflow system, as I mostly saw them in the thread.
StackOverflow is interesting, I'll try working on its design video soon :)
Good video, my question though would be why you chose an sql datastore and not nosql, considering that the app is read heavy and need to scale
It would be great if you made a video that delved deep into the concepts of load balancer vs. proxy servers (forward and reverse proxy) vs gateways. TIA! :)
I'll look into it, thanks!
I don't see any discussion on the type of databases used SQL vs NoSQL and also how you are scaling the DB, what type of queries you are using and how you are partitioning the DB?
Also, it would be more helpful if you can explain with some concrete examples on the capacity estimates like the total users, Daily Active users, daily file uploads, total DB/Object storage etc
at 20:20
Why storing feed in cache as LRU ?
I mean if feed is used it should be deleted or replaced by new one right ?
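One way to read the LRU question, assuming the cache is keyed by user (an assumption, not something confirmed in the thread): LRU decides whose feed gets evicted when memory runs out, while replacing old posts with new ones happens inside each user's entry. A minimal sketch with a hypothetical `FeedCache` class:

```python
from collections import OrderedDict


class FeedCache:
    """LRU cache of precomputed feeds, keyed by user ID."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.feeds = OrderedDict()  # least recently used user first

    def get(self, user_id):
        if user_id not in self.feeds:
            return None  # cache miss: the caller would recompute from the DB
        self.feeds.move_to_end(user_id)  # mark this user as recently active
        return self.feeds[user_id]

    def put(self, user_id, feed):
        self.feeds[user_id] = feed
        self.feeds.move_to_end(user_id)
        if len(self.feeds) > self.capacity:
            self.feeds.popitem(last=False)  # evict the least recently active user
```

Under this reading, users who have not opened the app for a while lose their cached feed first, and it is rebuilt the next time they show up.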
In the celebrity use case, the decision also has to be made on the basis of whether polling affects the user's device battery. WorkManager on Android is a great way to achieve that. It optimises the resources and decides when to give a CPU chunk to a particular application, only when the device has decent resources to spare. This could surely save a lot of CPU clocks on our server end for sending PUSH.
Hi @Gaurav,
It was really nice video . I liked it very much. But 2 things I want to ask :
1. How about getting the user feed of followed people based on timestamp, providing up to 2 full pages of feed (a full page being whatever fills the mobile screen, or let's say the top 20), and when the user scrolls to 70% of the current feed, sending the next two pages?
2. Can sending notifications to the user be learned? I mean, if the user is not clicking the notifications for a particular person, then from the next time the system will not send notifications for that person. The system would learn the user's behaviour, which would save a lot of time otherwise spent sending notifications for each followed person.
Thanks Ritesh! The solutions I would choose are:
1) Recompute the feed using the queries mentioned, and paginate as required.
2) User behaviour is difficult, expensive and controversial to predict. I would go for simple solutions before going for custom solutions. Chances of building something unnecessary are lesser. 😁
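A rough sketch of "recompute and paginate", assuming posts carry a unique timestamp that can serve as a cursor. The function name `feed_page`, the dict layout of a post, and the in-memory list standing in for the posts table are assumptions for illustration; a real system would run the equivalent query against the database.

```python
def feed_page(posts, followees, before_ts=None, page_size=20):
    """Return one page of posts by followees, newest first, older than before_ts."""
    candidates = [
        p for p in posts
        if p["user_id"] in followees and (before_ts is None or p["ts"] < before_ts)
    ]
    candidates.sort(key=lambda p: p["ts"], reverse=True)
    page = candidates[:page_size]
    # A full page may have more behind it, so hand back the oldest timestamp
    # as the cursor for the next request; a short page means the feed is done.
    next_cursor = page[-1]["ts"] if len(page) == page_size else None
    return page, next_cursor
```

The client passes the returned cursor as `before_ts` on its next request, so scrolling past the cached posts only ever costs one bounded query per page.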
Concise, at the same time; broad and easy to understand.
Thanks 😁
Great . Thanks.
Could you please do a tutorial on Expedia design system if possible.
Thanks
NoSQL db would be better for this right?
I don't understand this push model and pull model. How does this work ? Any detailed explanation, how can this be implemented ?
I believe we can store the user feed on the client as well, since we can recompute in case of app is reinstalled. Other thing is cache will be updated even when user is not even using the app but if we store at client we will only be updating (Hybrid Model will be better) when user is up and running.
You can cache it on the client. But it's better to keep such responsibilities on the server.
@@gkcs I would agree, Some users like me use Instagram on browser only. :)
As far as the app is concerned, we can keep the feed on the mobile, but I see challenges as well. Let's say I did not open the app for around 4 hours and the whole 20-post feed needs to be updated; then we need to do a big recalculation step.
For fast response times, as long as the user is on the app we can cache the feed on the app, and pushes will go directly to the mobile without one extra hop in between.
But I am still not sure what will be better; maybe somewhere in between server + client would be ideal. I may be completely wrong.
Thoughts?
Hi Gaurav, Excellent video as usual. I had one doubt. If any celebrity posts an image, Will the system still update the queues for userfeeds of millions of users? Will it be efficient ? If not, how is it done?
Those subscribers will pull from a queue of celebrity posts.
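A minimal sketch of the hybrid model, under the assumption that the pull happens on the server whenever a user reads their feed, so the client never has to know which accounts are celebrities. The follower threshold, the function names, and the in-memory dicts are all hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical follower cutoff, tiny for the sketch

followers = defaultdict(set)        # user -> users who follow them
following = defaultdict(set)        # user -> users they follow
feeds = defaultdict(list)           # user -> posts pushed to them as (ts, author, text)
celebrity_posts = defaultdict(list) # celebrity -> their own posts, pulled on read


def follow(follower, followee):
    followers[followee].add(follower)
    following[follower].add(followee)


def is_celebrity(user):
    return len(followers[user]) >= CELEBRITY_THRESHOLD


def publish(author, ts, text):
    post = (ts, author, text)
    if is_celebrity(author):
        # One write, regardless of how many followers there are.
        celebrity_posts[author].append(post)
    else:
        # Ordinary users: fan out to each follower's precomputed feed.
        for f in followers[author]:
            feeds[f].append(post)


def read_feed(user, limit=20):
    # Merge the pushed feed with posts pulled from followed celebrities.
    pulled = [p for a in following[user] if is_celebrity(a)
              for p in celebrity_posts[a]]
    return sorted(feeds[user] + pulled, reverse=True)[:limit]
```

This keeps the write cost of a celebrity post constant and moves the work to read time, where it is bounded by how many celebrities a single user follows.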
Great Video.
For use case user-follower scenario, I see technical solution as Graphical DS problem (considering time). Reason is fetching feed both side will become simple and fast.
Looks like an interesting problem to analyze/compare various SD approaches against above use case.
You could try...
question is about the DB design for followerID and followeeID : I assume none of the columns here have constraints on them. If my assumption is right, then how the TAT is in the limit without indexing on the table? Basically, I am not getting
(i)how this table is connected to other parts of the whole schema and
(ii)retrieving results in time rather not slowing down the total response time?
Besides, Thank you very much Gaurav like always ;) :)
Absolutely loved your explanation Gaurav. Thank you :)
hi Gaurav bro, it was amazing .waiting for more such videos
Thanks!
Thanks Gaurav for this video, is there any video on stories back end structure?
@Gaurav Sen, hello, thanks for the video. My question is: why did you go for a table-based DB here? Any specific reason? I was thinking that for maintaining post comments, wouldn't the subset pattern of MongoDB be a better option here?
Should i be making the exact same tables of news feeds(Likes,Posts..) for every users.Thanks in advance
Love your videos . One important question , how do you be yourself on camera cuz I'm generally scared ! Thank you
Thank you Takeda!
Two things I keep in mind are:
1) There are unlimited retakes. Mistakes can just be edited out.
2) I love what I am doing. 😁
@@gkcs Thanks a lot man ! Have a good one .
Loved this playlist, thank you brother
Awesome content.. what I am looking for always get from your videos.
Keep it up.
Great video!
When code_report says something, you better believe it 😎