I just got asked this question in an interview, but with the added feature of following interests too. I was surprised that I answered pretty much the same thing stated here, and I passed the interview! One thing to mention: some companies/interviewers want to see the SQL queries written out so they can see how you join the tables, so I'd say be prepared for that.
If user A follows B and C, and B follows A back, then all three should be on the same shard. In the same way, if B follows 10 more people and even one of them follows back, all those 10 should be on the same shard too, and it keeps going until all the data is on a single shard. It looks like a very abstract approach; I'm not sure why people don't think a little more rather than explaining it in that abstract way.
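As an illustration of the kind of SQL join the comment above says interviewers may ask for, here is a minimal sketch of a home-feed query. The `follows`/`tweets` schema, column names and sample rows are hypothetical, and SQLite is used only so the example runs self-contained.

```python
import sqlite3

# Hypothetical minimal schema: who follows whom, and who posted what.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER,
                      PRIMARY KEY (follower_id, followee_id));
CREATE TABLE tweets  (tweet_id INTEGER PRIMARY KEY, user_id INTEGER,
                      content TEXT, created_at INTEGER);
""")
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [(10, 2, "hello from 2", 100),
     (11, 3, "hello from 3", 200),
     (12, 4, "user 1 does not follow 4", 300)],
)

def home_feed(user_id, limit=20):
    """Newest tweets from everyone `user_id` follows, built with a JOIN."""
    rows = conn.execute(
        """
        SELECT t.tweet_id, t.user_id, t.content
        FROM tweets AS t
        JOIN follows AS f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()
```

This is the "build the feed on read" version; at Twitter scale it is exactly the query that the caching and fan-out ideas in the video try to avoid running for every request.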
This level of quality content is available for free, it blows my mind! Also, I am churning through your Blind 75 list of questions and I am loving your solution videos.
Very good tutorial as always from NeetCode. Kudos. One confusion though: I am aware of the publisher/subscriber pattern and I am also aware of message queues. What is new to me is a "Pub/Sub message queue", and I'm not sure what that is. From what I can tell, the author is describing message-queue behaviour rather than pub/sub. The impact you are creating is far better and bigger than that of anyone working for FAANG.
Cache the feed page in the CDN and purge it on update (feeds are tagged with user IDs). The infrastructure is basically multi-layer data retrieval: uid -> followees -> tweets (sorted by timestamp), then merge to get the final result. The uid -> followee mapping can be stored compactly and updated when needed (K/V or RDB). followee -> tweets would be a sharded DB with all posted tweets (K/V). It would just be a simple backend, and most of the load would be handled by the CDN.
That's more or less what I think he described for his feed cache. But it doesn't solve the problem he brings up, where we don't want to update all the followers' feed caches whenever a popular user posts a tweet. Also, I don't know how to do it, but when you say "on update", I'm assuming that whenever a person posts a tweet, all the users following that person get "updated". In that case, the only thing that needs to change is inserting the new tweet into the feed (and probably popping out whichever oldest or least important tweet in the feed the new one replaces). So I don't think retrieving and merging all the relevant tweets each time there is an "update" makes sense. I think that's why he brought up pub/sub: it's just a queue where, whenever a new tweet comes in, the least important one gets popped out.
@@marspark6351 Maybe it's possible to identify "popular" users and, when they tweet, only cache that tweet instead of sending a message through the pub/sub.
Yes, you must really dislike Elon Musk (to put it mildly). > Who is most popular on Twitter? Kim Kardashian. Probably over 100 million followers. ..... Putting the subject aside, you made good content. Thank you!
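To illustrate the distinction the comment asks about, here is a toy in-memory sketch; the class names are hypothetical and this is not any real broker's API. With a work queue, each message is handled by exactly one consumer; with pub/sub, every subscriber gets its own copy. Managed "pub/sub" services such as Google Cloud Pub/Sub combine the two: a topic delivers each message to every subscription, and each subscription then behaves like a queue whose messages are shared among that subscription's workers.

```python
from collections import deque

class WorkQueue:
    """Point-to-point queue: each message is consumed by exactly one consumer."""
    def __init__(self):
        self._messages = deque()

    def publish(self, msg):
        self._messages.append(msg)

    def consume(self):
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None

class PubSubTopic:
    """Pub/sub: every subscriber receives its own copy of each message."""
    def __init__(self):
        self._inboxes = {}  # subscriber name -> that subscriber's own queue

    def subscribe(self, name):
        self._inboxes[name] = deque()

    def publish(self, msg):
        # Fan the message out: one copy per subscriber.
        for inbox in self._inboxes.values():
            inbox.append(msg)

    def consume(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None
```

In a feed design, the "deliver this new tweet to several independent consumers" step is the pub/sub side, while spreading the per-follower work across many workers is arguably the queue side.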
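The "uid -> followees -> tweets, then merge" step described above is a k-way merge of already-sorted lists. A minimal sketch, assuming each followee's tweets come back newest first as `(timestamp, tweet_id)` pairs; the data and helper names are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical data: followee -> that user's tweets as (timestamp, tweet_id),
# already sorted newest first (as a per-user K/V lookup could return them).
tweets_by_user = {
    "bob":   [(300, "b2"), (100, "b1")],
    "carol": [(250, "c2"), (50, "c1")],
    "dave":  [(400, "d1")],
}
# Hypothetical uid -> followees mapping.
followees_of = {"alice": ["bob", "carol"]}

def build_feed(uid, limit=10):
    lists = (tweets_by_user.get(f, []) for f in followees_of.get(uid, []))
    # k-way merge: every input list is newest-first, so heapq.merge with
    # reverse=True lazily yields one combined newest-first stream.
    merged = heapq.merge(*lists, reverse=True)
    # islice stops after `limit` items, so the rest is never merged.
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

Because the merge is lazy, fetching only the first page of a feed touches roughly `limit` items plus one head element per followee list.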
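The "insert the new tweet and pop the oldest one" update described above can be sketched with a bounded deque. This assumes "least important" simply means "oldest"; a ranked feed would need a different structure. `FeedCache` and its capacity are hypothetical.

```python
from collections import deque

class FeedCache:
    """Per-user precomputed feed holding at most `capacity` tweet IDs, newest first."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self._feeds = {}

    def push(self, user_id, tweet_id):
        feed = self._feeds.setdefault(user_id, deque(maxlen=self.capacity))
        # appendleft on a full deque(maxlen=...) silently drops the item at the
        # right end, i.e. the oldest tweet, so each update is O(1).
        feed.appendleft(tweet_id)

    def read(self, user_id):
        return list(self._feeds.get(user_id, ()))
```

Each new tweet costs one constant-time update per follower, instead of re-merging every followee's tweets on each "update".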
There shouldn't be any userId in the POST /v1/tweet/create endpoint. This is because we will get the id of the user initiating the request from the authentication token in the request header. Putting sensitive information like authentication tokens in the request body is a security risk
There's no difference in security, whether you put the token in the headers or the body. But it's better to put it in the headers because your gateway can start checking it or sending the request to the destination API before it downloads the body. Putting the userId in the body doesn't make sense here, but it would allow you to have other features like "postponed tweets". And another service with an internal token (without the userId) could call the existing API to post those messages.
Use a DB like Cassandra: users, tweets, followers, follows, feed. Shard everything by user ID to colocate related data. Fan out to followers' feeds on tweet. For celebrity users, fetch the celebrity tweets from cache when building the feed. Have some background jobs pre-populate other good feed candidates. Rank the feed by some scoring system. Push likes and retweets to an event stream and update cached like counters in Redis from the stream every so often. Shard on tweet ID and spin up some read replicas if needed.
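A minimal sketch of the "update cached like counters from the stream every so often" idea. A plain dict stands in for Redis and the event format is hypothetical; the point is that many like events for the same tweet collapse into one counter write per flush.

```python
from collections import Counter

# Stand-in for Redis: tweet_id -> like count. A real deployment would instead
# issue one INCRBY per tweet_id in the batch (e.g. in a pipeline).
like_counts = {}

def flush_likes(events):
    """Aggregate one batch of stream events, then apply one increment per tweet."""
    batch = Counter(e["tweet_id"] for e in events if e["type"] == "like")
    for tweet_id, delta in batch.items():
        like_counts[tweet_id] = like_counts.get(tweet_id, 0) + delta
    return len(batch)  # number of counter writes this batch cost
```

The trade-off is freshness: counts lag by up to one flush interval, which is usually acceptable for like counters.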
That's correct, I meant that while NoSQL is easier to scale (automatically or by specifying a shard key), we can still scale relational DBs via sharding.
If you have the capacity for asynchronously pre-building timelines for all (active) users, why don't you increase the capacity of the cache layer for the RLDB, or store the tweets in a fast KV NoSQL?
Even with a NoSQL KV store, given such massive reads you'd probably have to deal with its sharding anyway. I don't think you could just set up Cassandra and mindlessly keep adding nodes to the cluster. So by choosing a SQL DB, the author just makes that logic explicit.
Speaking of popular users: we can separate tweet data by some follower threshold (say 10k followers), and when a popular profile posts a new tweet, we only need to update that separate feed. Every normal profile will check that feed in case they follow popular profiles.
It's such a silly, cursed situation with system design interviews. Usually you have functional requirements to support 10e6+ users, but you can't make an even remotely viable design to support those requirements. It's always a hand-wavy "a thing" you can't apply to real life in any way. And the most outrageous thing: in real life you never design for scale without an already working product. It's always tweaks after the fact for current and near-future loads, based on the numbers you have on hand.
Ngl, as an aspiring software engineer, I find this video helpful in terms of macro design. How about a new video style covering the different duties of a software engineer? 👀👀
DDIA (Designing Data-Intensive Applications by Martin Kleppmann) is the most comprehensive resource (assuming you have at least some experience). Also, most companies (including Twitter) release blog posts and white papers about the technical challenges they faced and how they overcame them. I think many beginners miss these, but they are an extremely valuable and free resource, which is why they are commonly referenced by system design textbooks.
I think the best way to test a senior developer is to ask them to explain their own projects and how they solved certain problems. That way you can see whether they have the experience and knowledge needed to do the job. Random architecture tests like this are pointless for senior positions. I would not be able to list all of those things, because interviews are stressful and I am not the best communicator. However, give me a few hours to prepare and I would design a system that is even better, and with a day or two to code I could even build a proof of concept (although that would be unreasonable to ask). Testing such things during an interview mainly tells you how well a person's memory works in a stressful situation. But real developers never face that kind of stressful situation, because they have already written code for it. The way to pass tests like these is to mention as many buzzwords as possible, but that filters out the real scientists who don't play such games. I think questions should focus more on actual problem-solving ability and past work, because that is a better indicator of success in the software engineering world.
It would be helpful if you presented it as a more realistic interview Q&A scenario, since the interviewer often interrupts the process, asks a different question, or asks you to take a different approach. Mocks presented only as a monologue don't serve that purpose.
One of the defining features of Twitter is timely notifications about new tweets from people you follow. Could you please describe how it could be implemented in this architecture? Likes and comments allow users to attach their content to a potentially popular tweet. How would that affect our storage layer? What challenges, if any, would we face with a multi-AZ deployment of such a system? Thank you for your time and interest in our company.
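A minimal in-memory sketch of the hybrid approach described above: tweets from normal users are pushed into followers' precomputed feeds (fan-out on write), while tweets from users above the threshold are stored once and pulled in at read time (fan-out on read). The 10,000-follower cutoff comes from the comment; all names are hypothetical, and the read path shows the extra "which of my followees are celebrities?" check another commenter asks about.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # follower cutoff taken from the comment; an assumption

followers = defaultdict(set)          # user -> their followers
following = defaultdict(set)          # user -> users they follow
feeds = defaultdict(list)             # precomputed feed: (ts, tweet_id) pairs
celebrity_tweets = defaultdict(list)  # celebrity -> own tweets: (ts, tweet_id)

def follow(follower, followee):
    followers[followee].add(follower)
    following[follower].add(followee)

def is_celebrity(user):
    return len(followers[user]) >= CELEBRITY_THRESHOLD

def post(user, ts, tweet_id):
    if is_celebrity(user):
        # Fan-out on read: store once; followers pull it when they read.
        celebrity_tweets[user].append((ts, tweet_id))
    else:
        # Fan-out on write: push into every follower's precomputed feed.
        for f in followers[user]:
            feeds[f].append((ts, tweet_id))

def read_feed(user, limit=10):
    items = list(feeds[user])
    # Extra read-time step: merge in tweets from followed celebrities.
    for followee in following[user]:
        if is_celebrity(followee):
            items.extend(celebrity_tweets[followee])
    items.sort(reverse=True)  # newest first
    return [tweet_id for _, tweet_id in items[:limit]]
```

This simplification ignores users who cross the threshold over time; a real system would need to decide what happens to tweets posted before or after a user changes category.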
I would send the tweet timestamp from the client. If you handle it server-side and something breaks and delays the server-side ingestion of the tweet, you'd have an incorrect timestamp. ("Wow, what an amazing touchdown!" posted 2 hours after the touchdown and way out of context on feeds etc)
Why do you need a relational DB? All this relationship data can be stored in a document DB as JSON for high performance, low latency and scalability. Using a relational DB will not be efficient in this scenario, because we need high availability and only need eventual consistency, so a NoSQL DB like MongoDB is preferred over a relational DB here. Correct me if I am wrong.
What this seems to miss is that what's important about Twitter is the data structure that represents a timeline of tweets. There are systems that generate it, and then there are systems that inject tweets into that timeline. These can be ads, or they can be other tweets that the ML systems want to promote, etc. That's the data structure at the heart of what Twitter is. The tech here should be built around that concept, NOT the other way around.
Splendid! Solid content with crystal clear pronunciation and a comfortable pace. How did you practice your speaking? I wish I could speak without "er", "en", "aa" and other meaningless filler words in a system design interview.
Something I didn't understand: you suggest sharding on user ID because then the people a user follows will be grouped on the same shard. However, users can have a lot of followers, and their followers will be distributed across different shards. So you'd have to duplicate a user's tweets on every shard that has one of their followers on it, in which case you probably have enough fan-out that you're not really sharding anymore; it's just replicas with more steps (at least for reads; writes would be meaningfully sharded). Am I missing something here? It feels like to get any value out of sharding you'd have to do something MUCH more complicated, like assigning users to shards based on similarity graphs.
I appreciate the effort and care you put into this video but I think it could use a little more focus. Especially at the sharding-for-writes portion. You jumped around a lot to digressions that made that line of thought hard to follow.
I loved your video very much, and thanks a lot for the effort you made. These are the questions we actually face when working on the back end. One small question: if someone asks you what kind/type of architecture this is, what would your answer be?
I wouldn't combine reads of tweets with "reads" of videos into a single figure for the data we're going to read from our "storage". Storing and streaming videos and storing and reading text tweets plus metadata are completely different tasks, which access and handle data in completely different ways.
It's amazing how this kind of large-scale system can grow and become so complex, with so many components and "moving parts", and it's impressive how it handles a massive number of users and petabytes of data storage. In the end, I didn't understand whether your solution uses sharding on the database or not. If it does, how do you solve the sharding-key issue? It doesn't look possible to use the "every account followed by someone" strategy, for the reasons you talked about yourself. Is it possible to have sharding and read replicas at the same time? And how would you handle that: many load balancers, one per shard, each in front of that shard's replica cluster?
Thank you for the interesting video. However, I doubt that a relational database can store the tweets. I was just asked to design Twitter in a job interview and came up with something very similar, but I suggested using Aerospike for messages with the following schema: id -> list of messages. Aerospike scales horizontally, so there is no need to think about sharding.
If sharding by user ID, then to retrieve a single tweet (e.g. from a direct link) you would need to query all shards. Is that tolerable, or how do you overcome it? And what about the hot user problem? Sharding by user ID does not work well in that case.
So if we only return the cache, then for a user who follows a celebrity the feed will not be up to date. That means every time a user comes in, we still need to query the list of people that user is subscribed to, right? To check whether any of them is a celebrity.
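A small sketch of the cost described above, assuming tweets are placed on shards by `author_id % NUM_SHARDS` (the shard count, placement rule and helper names are all hypothetical). Looking a tweet up by ID alone has to scatter to every shard and gather the answers, while knowing the author narrows it to one shard. One common mitigation, not shown here, is to make the shard recoverable from the tweet ID itself or to keep a separate tweet_id -> shard index.

```python
NUM_SHARDS = 4

# Each "shard" is a dict of tweet_id -> (author_id, text), placed by author.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for_user(user_id):
    return user_id % NUM_SHARDS  # hypothetical modulo placement

def save_tweet(tweet_id, author_id, text):
    shards[shard_for_user(author_id)][tweet_id] = (author_id, text)

def get_tweet_by_id(tweet_id):
    """Only the tweet ID is known, so every shard must be asked (scatter-gather)."""
    # Scatter: query every shard (a real system would do this in parallel).
    responses = [shard.get(tweet_id) for shard in shards]
    # Gather: at most one shard should hold the tweet.
    hits = [r for r in responses if r is not None]
    return (hits[0] if hits else None), len(responses)

def get_tweet_with_author(tweet_id, author_id):
    """If the author is known too, a single shard is enough."""
    return shards[shard_for_user(author_id)].get(tweet_id), 1
```

The second return value is the number of shards contacted, which is what grows with the cluster size in the scatter-gather case.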
How about starting from the data model rather than the architecture? And let architecture emerge from constraints on use cases? Enterprise applications 101.
Thank you so much for this video and its great content. One thing to correct, maybe: at 12:24, it's not good to save the authorization token in the DB, for security reasons. If someone says that in an interview, the interviewer may think the interviewee doesn't care about security and reject them.
Do you mean that passing user IDs along with the request implies the auth token is stored in the DB? I don't see him explicitly mention in the video where auth tokens are stored. Also, out of curiosity, where do we store the auth tokens then?
Design Twitter for Dummies: Create a users table; create a posts table; posts can optionally refer to other posts using a primary key. When someone uses the "@" symbol, the front-end will use whatever text follows to search the database of users (sorted by interaction, popularity, and match %). Done. "But it needs to scale!" - deploy on AWS, fallback to Azure, fallback to Google Cloud. Done. "What about ads?" - Google ads. Done. "What about measuring things?" - Google Analytics. Done. "What about banning users?" - add a banned boolean flag and have an admin panel for users who have the admin boolean set to true. Done. I could whip this up in about 3 hours flat. Front-end AND back-end AND deploying it.
At the end of minute 8: a relation between a follower and a followee is not the same thing as a relation in a relational database, so that alone isn't a reason to pick one.
It doesn't seem like there's a good general solution. What works for the average user, doesn't work well for super big users. I heard that Twitter separated users with more than a certain number of followers into a separate use case. Not sure if they ever solved it.
I agree that the data is more on the relational side. But why can't we put the tweets in a NoSQL DB like Cassandra or Scylla? From our follow table I know which followees' tweets I have to fetch, and once I know that, I simply search the shards for those followees' tweets.
So are you guys interested in working at Twitter? 😅Btw, don't forget to "Batch" click the like & subscribe buttons.
🚀 neetcode.io/ - Get lifetime access to every course I ever create!
You should leave Google for Twitter
Tweet this video to Elon, he might make you CEO. He's weird like that.
@@BhargavSushant lol maybe i should
Yes hire then next day fire
@@NeetCode you should look at your website. I tried to go pro, but for some reason the Google API won't let me sign up. I don't know if I'm the only one with the problem or if it's general.
Being a SWE these days is just insane. In any other job, you'd get hired and then learn the system over time by working with people at the company. As a SWE, you already have to know how Twitter works just to get through one of the six or so interviews for a job fixing bugs or writing new features. Does every other SWE know this shit just from going to school or working in the field for a few years? I've been a SWE for 10 years and these are all semi-new concepts to me. I've never once had to design a system like this, but I guess now companies want you to be an expert on day one. I thought I could avoid cramming algorithms and system design stuff if I didn't try to get a job at FAANG, but now every little startup expects you to be a senior-level engineer just to make 140k. I feel like my 10 years of experience count for literally nothing.
10 years and you barely did system design? Typically, moving up in seniority means taking a higher-level approach to problems and leaving the implementation to juniors.
@@garlicpress6121 I feel like web-based software has skewed everyone's perception and makes people think this is the only kind of software dev in the world. There are so many other domains where you would never need to know this sort of stuff, either for interviews or for the work itself: for example, low-level programming for drivers, OS-level software, or desktop applications.
I hear you. I have 15 years of experience and I am finding this very strange. I functioned without knowing LeetCode algos and this insane system design stuff, and I did pretty well! I don't know what value these things are adding, TBH.
@@garlicpress6121 I think for a typical SWE, doing system design is common, but not to this degree. Normally it's building on top of or improving existing systems to add features or improve performance/scale/reliability, etc.
@@vhchoang If you work at a startup and a new product is being created, you get the opportunity to design these kinds of things.
Man, finding a job as a Software Engineer is just crazy.
You need to go through at least 4 to 6 rounds of interviews, starting with a technical take home challenge, then a follow up discussion about that challenge, and then another live technical coding interview, and then a live behavioral interview, and then a live system design interview, and then maybe a product delivery interview, and probably a chat with CTO or VP at the end of that.
And then, once you're hired, you're just gonna be focused on fixing bugs and building features, it is very rare that you are creating a fresh system from scratch, unless you're working at a start-up, and even then, you're going to be working with other Engineers to design that system.
In most other industries, you typically learn what you need for the job over time, through hands on experience. Only in Software Engineering do all Companies just expect you to be a data structure and algorithm wiz, have previous experience so you can answer those behavioral questions, and then design some abstract system from scratch within 1 hour, just to get hired.
Love your content, your videos helped me land a position at Twitter one year ago, but I just got laid off from Twitter and will start watching your videos again 😅
I'm sorry to hear that, wish you the best - it's only a matter of time!!!
me too😂😂
Starting from 4:26 to 7:38, that's pretty much superfluous arithmetic you're going to be doing during a systems design interview. The time you spend mentally calculating those numbers is going to be wasted, just to arrive at a conclusion of "it's a lot", which is almost a given in any systems design interview. Your time will be better spent calculating those numbers while you're doing your high-level design portion, if needed. One example of needing to calculate those numbers is a TopK system for trending topics in a social media feed (which doesn't pertain to a basic Twitter implementation).
Ask your interviewer for DAUs and if it's anything over 100M, move on to the core components section (Tweet, User, Feed), rather than calculate capacity estimations.
I would love to see more System Design content !! nice video man
Thank you, more to come!
@@NeetCode I think he meant on your youtube channel haha..
@@sanskarkaazi3830 obviously, what else could he mean?
@@indiging8330 neetcode has premium courses on his website as well so not there but here.. you get what i mean?
Can you talk about Pinterest, or someone link some available content.
The biggest thing about sharding is that we could potentially lose joins, and it adds a huge layer of complexity to the application.
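To make the "losing joins" point concrete, here is a minimal sketch of what the application ends up doing instead: group the IDs it needs by shard, fetch from each shard separately, and join in memory. The two-shard layout, `user_id % NUM_SHARDS` placement and sample rows are hypothetical.

```python
NUM_SHARDS = 2
# Users placed by user_id % NUM_SHARDS; each dict stands in for one DB server,
# so there is no single database that could run the JOIN.
user_shards = [{2: "bob"}, {1: "alice", 3: "carol"}]
# (tweet_id, user_id, text); user 4 has no matching user row.
tweets = [(10, 1, "hi"), (11, 2, "yo"), (12, 4, "orphan")]

def join_tweets_with_authors(tweets):
    # Group the distinct author IDs by the shard that owns them, so each shard
    # gets one batched lookup instead of one lookup per tweet.
    ids_by_shard = {}
    for _, user_id, _ in tweets:
        ids_by_shard.setdefault(user_id % NUM_SHARDS, set()).add(user_id)
    names = {}
    for shard_no, ids in ids_by_shard.items():
        shard = user_shards[shard_no]
        names.update({uid: shard[uid] for uid in ids if uid in shard})
    # In-memory inner join, the equivalent of
    # JOIN users ON users.user_id = tweets.user_id
    return [(tid, names[uid], text) for tid, uid, text in tweets if uid in names]
```

All of this routing and batching used to be a single line of SQL, which is the added application complexity the comment refers to.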
wow !!! from algorithms to system design, love to see more on system design videos
This is a fantastic example of a realistic architecture screen. I would note for viewers that you will almost certainly not be able to think of and describe everything that was covered here and as someone who conducts 3 or 4 of these every week, I don't expect candidates to cover everything here in the 20-30 minutes I have with them. But as you go through this video, the issues presented scale really well with the expectations that go along with the seniority of the candidate and position. We actually skip a lot of the preliminary setup so that we can delve into the more complex issues for more senior candidates. If you're a mid level, I'm not expecting you to come at me talking about batching out feeds and dynamically updating them based on high popularity tweets.
No, with a test like that you're already filtering for ex-Twitter employees. That would be fine if you're building a social network, but you'd miss out on all the brilliant devs who, for example, designed large e-commerce or data-pipeline architectures, because that requires a very different approach.
I guess twitter will be a case study in “does talent matter” and “how interchangeable/disposable are sw engineers”.
It will also be a case study on whether these software companies are truly overstaffed or not. If Twitter survives after laying off so many people, it may inspire other companies to consider downsizing.
@@KennethBoneth I think the main issue with scaling down on employees is that the remaining employees will essentially have to monitor and handle the same amount of work as before scaling down, which will cause additional stress and probably a less than healthy work life balance.
Not really, Tesla and SpaceX are both well known for their horrendous work environments. So it depends on the management and the owner of the company in this case.
@@bryanyang7626 that's true, might not work well with other companies once people start realizing their lives are worth more than slaving away
@@Mattarii That is true if you were properly staffed to begin with. If twitter is as overstaffed as many people believe, then a large chunk of employees are effectively doing nothing. IF twitter goes from properly staffed to understaffed, you are correct. If twitter is going from overstaffed to properly staffed, then that won't happen.
Once I had an interview where I had to explain how to design something, and I totally missed the point. This definitely gives us a clear idea.
It's not about writing a user story, and not even building the actual application, but identifying the most critical points and possible components and to come up with how to solve it.
Thanks again.
I can't help but find it slightly hilarious that you released this video during the ongoing controversies happening at Twitter.
But in all seriousness, amazing content!
Musk will hire him
It's because Musk tweeted the HLD of Twitter on Twitter. You can see that in the thumbnail of this video too.
Loved it!
The only issue I see is sharding having all the people who follow each other in the same shard. That's just not possible, as a friend of yours will follow someone in another shard group at some point.
I haven't got a good answer for that yet, apart from saying we should use a GraphDB here that hopefully is optimised for sharding this kind of data...
Yes, that seems like a big oversight. Each shard will hold only a subset of a user's followees, so the proposed user ID shard key really doesn't do anything for us.
Yeah, I felt like I was missing something when he talked about sharding, so I scrolled down to the comments to confirm.
I just paused at that part; it seems incorrect. I think the best shard key may be the tweet ID (assuming chronological IDs like Snowflake), since people generally access the latest tweets, so those can be fetched in a single request on a cache miss.
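A minimal sketch of why chronological IDs help here, assuming the commonly described Snowflake layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit per-machine sequence). The epoch constant and function names are illustrative assumptions, not Twitter's actual code. Because the timestamp occupies the high bits, sorting by ID is the same as sorting by creation time.

```python
# Assumed Snowflake-style layout: 41 bits time | 10 bits machine | 12 bits sequence.
EPOCH_MS = 1288834974657  # assumed custom epoch (the value often cited for Twitter)
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def make_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack a timestamp, machine ID and sequence number into one 64-bit integer."""
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    elapsed = timestamp_ms - EPOCH_MS
    # Timestamp in the high bits means numeric order == chronological order.
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def timestamp_of(tweet_id: int) -> int:
    """Recover the creation time (ms since the Unix epoch) from an ID."""
    return (tweet_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

A range of recent IDs therefore maps to a contiguous range of recent tweets, which is what makes "latest tweets in one request" plausible under range-based sharding.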
@@salient244 Yeah, but you'd still need to store the follow relationships somehow, and you'd run into the sharding issue when it scales up.
You've got it wrong. The idea is to have all the _followers_ of the user on one shard. This way, when the user posts a tweet, you get all their follower IDs from one shard with one query. Then you use this list of IDs to update their respective feeds with the tweet. When a user requests their feed, they get it pre-computed from the cache, not built on the fly.
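A minimal fan-out-on-write sketch of what this reply describes, with in-memory dicts standing in for the follower shard and the feed cache. The names and the tiny feed cap are assumptions for illustration only.

```python
from collections import defaultdict, deque

FEED_SIZE = 3  # hypothetical cap on cached feed length (tiny for the demo)

# author -> set of follower IDs (the data the reply wants co-located on one shard)
followers: dict = defaultdict(set)
# user -> precomputed feed, newest first; deque drops the oldest when full
feeds: dict = defaultdict(lambda: deque(maxlen=FEED_SIZE))


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)


def post_tweet(author: str, tweet_id: int) -> None:
    """Fan-out on write: push the new tweet into every follower's cached feed."""
    for user in followers[author]:
        feeds[user].appendleft(tweet_id)  # oldest entry falls off the right end


def read_feed(user: str) -> list:
    """Reads are cheap: the feed was already built at write time."""
    return list(feeds[user])
```

The trade-off is that writes cost one cache update per follower, which is exactly what becomes expensive for accounts with millions of followers.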
Don't guess the capacity, there are infinite servers, infinite ram, infinite disk. Don't calculate. Only poor calculate. Is the design horizontally scalable? Yes. Go home now
Great video! One question (or perhaps a mistake): at 18:20, you say all the people this guy follows should be on one shard, but I don't think that's possible. If person A follows B and C, then B and C must be on one shard. If person E follows C and D, then C and D must be on one shard, but C is already on B's shard. So maybe B, C and D are all on one shard, but as long as people keep following different accounts, we will end up with only one shard.
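The argument can be checked mechanically. Assuming the rule "all accounts a given user follows must share one shard", merging each user's followees with union-find shows how quickly the groups collapse. The account names are made up.

```python
def shard_groups(follows: dict) -> list:
    """Group accounts that the co-location rule forces onto the same shard."""
    parent: dict = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for followees in follows.values():
        for other in followees[1:]:
            union(followees[0], other)  # everyone one user follows must share a shard
        for f in followees:
            find(f)  # register accounts that are someone's only followee

    groups: dict = {}
    for account in list(parent):
        groups.setdefault(find(account), set()).add(account)
    return list(groups.values())
```

With A following B and C and E following C and D, B, C and D already land in one group; on a real follow graph almost everything merges into a single giant group, which is the commenter's point.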
Thanks for this comment, I really didn't get this sharding thing :) It looks impossible to shard per user. I thought maybe I had misunderstood this point, but after your comment it's clear.
Sounds like he meant "each person the user follows will be located uniquely within a single shard" and not "all the people he follows will be in the same shard". The phrasing isn't great.
First time Kim Kardashian has come up in any tech video I've watched
Your content is way, WAY better than the others on YouTube! Great work!
I can't believe I only just found this channel. Great content
Very much enjoyed the video, the explanation, the simplicity and the clarity it brought out. Thank you
Glad it was helpful!
Wow! That's a lot to take in, maybe because I'm sleepy, but it sparked my interest at the same time. Please put out more of this.
I just got asked this question in an interview, with the added feature of following interests too. I was surprised that I answered pretty much the same thing that is stated here, and I passed the interview! One thing to mention is that some companies/interviewers want to see SQL queries written out so they can see how you join the tables, so be prepared for that.
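For interviews that ask for SQL, the basic home-timeline query is a join between the follow table and the tweets table. A self-contained SQLite sketch; the table and column names are hypothetical.

```python
import sqlite3


def home_timeline(conn: sqlite3.Connection, user_id: int, limit: int = 20) -> list:
    """Newest tweets from everyone `user_id` follows, via a join on the follows table."""
    return conn.execute(
        """
        SELECT t.tweet_id, t.author_id, t.body
        FROM follows AS f
        JOIN tweets AS t ON t.author_id = f.followee_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT, created_at INTEGER);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER, PRIMARY KEY (follower_id, followee_id));
    INSERT INTO follows VALUES (1, 2), (1, 3);
    INSERT INTO tweets VALUES (10, 2, 'hello', 100), (11, 3, 'hi', 200), (12, 4, 'not followed', 300);
    """
)
```

User 1 follows users 2 and 3, so the tweet by user 4 is excluded and the rest come back newest first.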
Extremely good discussion in this video, more of this please!
If user A follows B and C, and B follows A back, then all three should be on the same shard. In the same way, if B follows 10 more people and even one of them follows back, all those 10 should be on the same shard too, and so on until all the data is on a single shard. This looks like a very abstract approach; I'm not sure why people don't think a little more instead of explaining it so abstractly.
This level of quality content is available for free, it blows my mind! Also, I am churning through your Blind 75 list of questions and I am loving your solution videos.
How is this quality content?
@@umarqureshi8499 What's wrong with it?
Thank you for explaining in such detail. I learned about sharding and will definitely use it in my projects.
Literally amazing, man. Take a bow 🙇‍♂️
Very good tutorial as always from NeetCode. Kudos.
One confusion though: I am aware of the publisher/subscriber pattern, and I am also aware of message queues. What is new to me is a "pub/sub message queue", and I'm not sure what that is. From what I can tell, the author is describing message-queue behaviour rather than pub/sub.
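A toy illustration of the distinction being asked about, with in-memory classes standing in for a broker (the class names are hypothetical): a work queue delivers each message to exactly one consumer, while a pub/sub topic delivers a copy to every subscriber. Many real brokers combine both; for example, Kafka delivers each message to every consumer group, while consumers within one group split the partitions between them.

```python
from collections import deque
from typing import Dict, Optional


class WorkQueue:
    """Point-to-point queue: each message is consumed by exactly one worker."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def publish(self, msg: str) -> None:
        self._items.append(msg)

    def consume(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


class Topic:
    """Pub/sub topic: every subscriber gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, deque] = {}

    def subscribe(self, name: str) -> None:
        self._subscribers[name] = deque()

    def publish(self, msg: str) -> None:
        for inbox in self._subscribers.values():
            inbox.append(msg)  # fan the message out to every subscriber

    def consume(self, name: str) -> Optional[str]:
        inbox = self._subscribers[name]
        return inbox.popleft() if inbox else None
```

In the Twitter design, a new tweet published to a topic could be read independently by, say, a feed-update service and a search indexer, each of which might internally use a work queue to spread the load.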
The impact you are creating is far bigger and better than that of anyone working for FAANG.
Cache the feed page in the CDN and purge it on update (the feed is tagged with user IDs). The infrastructure is basically multi-layer data retrieval: uid -> followees -> tweets (sorted by timestamp), then merge to get the final result.
The uid -> followee mapping can be stored compactly and updated when needed (K/V or RDB).
followee -> tweets would be a sharded DB of all posted tweets (K/V).
It would just be a simple backend, and most of the load would be handled by the CDN.
I think that's more or less what he described in his feed cache explanation.
But it doesn't solve the problem he brings up where we don't want to update all the followers' feed cache whenever a popular user posts a tweet.
Also, I don't know how to do it, but when you say "on update", I'm assuming that whenever a person posts a tweet, all the users following that person get "updated". In that case, the only thing that needs to change is inserting the new tweet into each feed (and probably popping out whichever oldest or least important tweet the new one replaces). So I don't think retrieving and merging all the relevant tweets on every "update" makes sense. I think that's why he brought up pub/sub: it's just a queue where, whenever a new tweet comes in, the least important one gets popped out.
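The "insert the new tweet, evict the least important one" update can be sketched with a fixed-size min-heap. The scoring function is left as an assumption; if the score is simply the timestamp, this evicts the oldest tweet.

```python
import heapq


class BoundedFeed:
    """Keeps the `capacity` highest-scoring tweets; inserting past capacity evicts the lowest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: list = []  # min-heap of (score, tweet_id); the root is the least important

    def insert(self, score: float, tweet_id: int) -> None:
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, tweet_id))
        elif score > self._heap[0][0]:
            # Pop the least important tweet and push the new one in a single O(log n) step.
            heapq.heapreplace(self._heap, (score, tweet_id))
        # Otherwise the new tweet is less important than everything cached; drop it.

    def items(self) -> list:
        """Tweet IDs ordered from most to least important."""
        return [tid for _, tid in sorted(self._heap, reverse=True)]
```

Each new tweet costs O(log n) per follower feed, with no re-reading or re-merging of the tweets already cached.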
@@marspark6351 Maybe it's possible to identify "popular" users, and when they tweet, only cache that tweet instead of sending a message through the pub/sub.
I almost spilled my coffee when I heard the words "How hard can it be?" LOL
Yes, you must really dislike Elon Musk (to put it mildly).
> Who is most popular on Twitter? Kim Kardashian. Probably over 100 million followers.
.....
--
Putting the subject aside, you made good content. Thank you!
There shouldn't be any userId in the POST /v1/tweet/create endpoint. This is because we get the ID of the user initiating the request from the authentication token in the request header. Putting sensitive information like authentication tokens in the request body is a security risk.
There's no difference in security, whether you put the token in the headers or the body. But it's better to put it in the headers because your gateway can start checking it or sending the request to the destination API before it downloads the body. Putting the userId in the body doesn't make sense here, but it would allow you to have other features like "postponed tweets". And another service with an internal token (without the userId) could call the existing API to post those messages.
Amazing! This is one of the best System Design videos I've watched :) Great job!
Use a DB like Cassandra: users, tweets, followers, follows, feed. Everything sharded by user ID to colocate relevant data.
Fan out to followers' feeds on each tweet. For celebrity users, fetch the celebrity tweets from the cache when building the feed. Have some background jobs pre-populate other good feed candidates, and rank the feed with some scoring system.
Push likes and retweets to an event stream, and periodically update cached like counters in Redis from the stream. Shard on tweet ID and spin up some read replicas if needed.
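A rough sketch of the "update counters from the stream every so often" idea: a plain dict stands in for the Redis counters, and a list of `(event_type, tweet_id)` tuples stands in for one batch read from the stream. The event names are assumptions, and only likes are counted for brevity. Aggregating per batch means one counter write per tweet instead of one per like.

```python
from collections import Counter

like_counts: dict = {}  # stand-in for cached counters (e.g. keys in Redis)


def apply_batch(events: list) -> None:
    """Fold a batch of (event_type, tweet_id) stream events into the cached counters."""
    deltas: Counter = Counter()
    for kind, tweet_id in events:
        if kind == "like":
            deltas[tweet_id] += 1
        elif kind == "unlike":
            deltas[tweet_id] -= 1
        # Other event types (e.g. retweets) would update their own counters; ignored here.
    for tweet_id, delta in deltas.items():
        # One write per tweet per batch, however many likes it received.
        like_counts[tweet_id] = like_counts.get(tweet_id, 0) + delta
```

For a viral tweet, this turns thousands of increments per second into a single write per batch interval, at the cost of counts lagging slightly behind.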
The problem right now is not designing a workable system, but designing one that works smoothly without spending much $$$ on infrastructure.
Correction at 9:01: we can also implement sharding in most NoSQL databases.
That's correct, I meant that while NoSQL is easier to scale (automatically or by specifying a shard key), we can still scale relational DBs via sharding.
Nice catch boss
If you have the capacity to asynchronously pre-build timelines for all (active) users, why don't you increase the capacity of the cache layer in front of the relational DB, or store the tweets in a fast KV NoSQL store?
Even with a NoSQL KV store, with reads this massive you'd probably have to deal with its sharding anyway. I don't think you'd just set up Cassandra and mindlessly keep throwing nodes into the cluster. So by choosing a SQL DB, the author just makes that logic explicit.
The abstract design is vital! Now I realize this.
Speaking of popular users: we can separate tweet data by some follower threshold (say 10k followers), and when a popular profile posts a new tweet, we only need to update that one separate feed. Every normal profile then checks that feed if it follows popular profiles.
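A sketch of this hybrid push/pull approach, assuming in-memory dicts for the caches and the 10k cutoff from the comment (exposed as a parameter so it can be exercised with small numbers). Normal authors' tweets are pushed into followers' feeds at write time; popular authors' tweets are stored once and merged in at read time.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # follower cutoff suggested in the comment

followers: dict = defaultdict(set)         # author -> followers
following: dict = defaultdict(set)         # user -> authors they follow
feeds: dict = defaultdict(list)            # user -> pushed (ts, tweet_id) from normal authors
celebrity_tweets: dict = defaultdict(list) # popular author -> their own (ts, tweet_id)


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def post(author: str, ts: int, tweet_id: str, threshold: int = CELEBRITY_THRESHOLD) -> None:
    if len(followers[author]) >= threshold:
        celebrity_tweets[author].append((ts, tweet_id))  # stored once, pulled at read time
    else:
        for user in followers[author]:
            feeds[user].append((ts, tweet_id))  # pushed at write time


def read_feed(user: str, limit: int = 20) -> list:
    """Merge the precomputed feed with tweets pulled from followed celebrities."""
    pulled = [t for a in following[user] for t in celebrity_tweets.get(a, [])]
    newest_first = sorted(feeds[user] + pulled, reverse=True)
    return [tid for _, tid in newest_first[:limit]]
```

Note that the read path still has to look at whom the user follows to find the celebrity tweets, so the precomputed cache alone is not enough for users who follow popular accounts.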
So...use the average Twitterer's tweets as load dampening. They should do that. It will make Twitter even less popular.
It's such a silly and cursed situation with system design interviews. Usually you have functional requirements to support 10e6+ users, but you can't make even a remotely viable design to support them. It's always a hand-wavy "thing" you can't apply to real life in any way. And the most outrageous thing: in real life you never design for scale without an already working product. It's always tweaks after the fact for current and near-future loads, based on the numbers you have on hand.
Looking forward to part 2!!! More in-depth
I really enjoy watching your video!!
Amazing video, this has made me curious about systems design roles in industry
Don't forget ads. Imagine how complex this whole thing becomes when we add in ads.
Nice video! Gotta love some systems design
Ngl, as an aspiring software engineer, I find this video helpful in terms of macro design. A new video style covering the different duties of a software engineer? 👀👀
What books or sources did you use to get a strong grip on system design?
DDIA (Designing Data-Intensive Applications) is the most comprehensive resource, assuming you have at least some experience.
Also, most companies (including twitter) release blog posts and white papers about technical challenges they faced and how they overcame them. I think many beginners miss these, but they are an extremely valuable and free resource, which is why they are commonly referenced by system design textbooks.
Thanks!!
@@NeetCode Is there a central URL where you find those blog posts, or do you just Google them?
A good place to start is by learning the classic OOP design patterns. It's less about the OOP and more about the patterns.
What a nice video, I learnt a lot even as a junior developer.
Btw, how can I find the official Twitter engineering paper you mentioned at the end?
I’d try checking their engineering blog for leads.
People watch Netflix; I watch NeetCode.
my IT classes coming in clutch
I think the best way to test a senior developer is to ask them to explain their own projects, and how they solved certain problems. That way you can see if they have the necessary experience and knowledge to do the job.
Such random architecture tests are pointless for senior positions. I would not be able to list all those things, because interviews are stressful and I am not the best communicator. However, give me a few hours to prepare and I would design an even better system, and with a day or two of coding I could even build a proof of concept (although that would be unreasonable to ask).
Testing such things during an interview mainly tells you how well a person's memory works in a stressful situation. But real developers never have stressful situations, because they already made some code for that. The way to pass such tests is to mention as many buzzwords as possible, but that filters out the real scientists who do not play such games. I think questions should focus more on actual problem-solving ability and past work, because that is a better indicator of success in the software engineering world.
If the interviewer is Elon, all you need to do is remember the word “turboencabulator”.
It would be helpful if you presented it as a more realistic interview Q&A scenario, since the interviewer often interrupts the process, asks a different question, or asks you to take a different approach. Mocks presented only as a monologue do not serve that purpose.
If we shard based on user ID, won't a shard become a hotspot (if the user is a celebrity or has a large number of tweets)?
That initial diss on twitter is everything 😂😂
One of the defining features of Twitter is timely notifications about new tweets from people you follow. Could you please describe how that could be implemented in this architecture? Likes and comments allow users to attach their content to a potentially popular tweet. How would that affect our storage layer? What challenges, if any, would we face with a multi-AZ deployment of such a system? Thank you for your time and interest in our company.
I would send the tweet timestamp from the client. If you handle it server-side and something breaks and delays the server-side ingestion of the tweet, you'd have an incorrect timestamp. ("Wow, what an amazing touchdown!" posted 2 hours after the touchdown and way out of context on feeds etc)
Why do you need a relational DB? All this relationship data can be stored in a document DB as JSON for high performance, low latency and scalability. A relational DB will not be efficient in this scenario because we need high availability and eventual consistency is enough, so a NoSQL DB like MongoDB is preferred here. Correct me if I am wrong.
What this seems to miss is that what's important about Twitter is the data structure that represents a timeline of tweets. There are systems that generate it, and then there are systems that inject tweets into that timeline. These can be ads, or other tweets that the ML systems want to promote, etc. That data structure is at the heart of what Twitter is. The tech here should be built around that concept, NOT the other way around.
Isn't what you are describing the "feed" part that he covers? I'm confused about why else you'd need a database ordered by timeline.
This is great. I LOLed at 0:48. This video is neet.
Splendid! Solid content with crystal-clear pronunciation and a comfortable pace. How did you practice your speaking? I wish I could speak in a system design interview without "er", "en", "aa" and other meaningless filler words.
Considering how many joins you would have to do in a relational DB, it would be hard to justify one for Twitter.
Awesome video, what are you using as your board?
All followers are created equal. Some followers are more equal than others #AnimalFarmTwitter
omg, you're insane. thank you!
we need more of these for sure
Neet: How hard could it be?
Candidate: *sweats profusely seeing Elon* 😥
Something I didn't understand:
You suggest sharding on user ID so that the people a user follows will be grouped on the same shard.
However, users can have a lot of followers, and those followers will be distributed across different shards. So you'd have to duplicate a user's tweets across every shard that contains one of their followers, in which case you probably have enough fan-out that you're not really sharding anymore; it's just replicas with more steps (at least for reads; writes would be meaningfully sharded).
Am I missing something here? It feels like to get any value out of sharding you'd have to do something MUCH more complicated like assign users to shards based off similarity graphs.
I appreciate the effort and care you put into this video but I think it could use a little more focus. Especially at the sharding-for-writes portion. You jumped around a lot to digressions that made that line of thought hard to follow.
thank uuuuuuu can you please upload more videos on system design and object oriented design. I know you might be busyy but would mean a LOTT!!!!
I loved your video very much, and thanks a lot for the effort you made. These are the questions we actually face when working on the backend side.
One small question,
If someone asks you, what kind/type of architecture is this? What will be your answer?
This is great! Thank you!
"They recently fired everyone so Twitter needs someone who knows Twitter beforehand" LMAO
I would argue you need an index on both followee and follower, because on Twitter you can see the relationship both ways.
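A small SQLite sketch of this point, with a hypothetical `follows` table: the composite primary key serves "who does X follow?", and a second index with the columns reversed serves "who follows X?" without scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, followee_id)  -- serves "who does X follow?"
    );
    -- Reversed column order serves "who follows X?"
    CREATE INDEX idx_follows_followee ON follows (followee_id, follower_id);
    INSERT INTO follows VALUES (1, 2), (1, 3), (4, 2);
    """
)


def following(user_id: int) -> list:
    rows = conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id", (user_id,)
    )
    return [r[0] for r in rows]


def followers_of(user_id: int) -> list:
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id", (user_id,)
    )
    return [r[0] for r in rows]
```

Whether the planner actually picks the secondary index can be checked with `EXPLAIN QUERY PLAN` on real data.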
don't know much about sharding, but I do have a lot of experience with *sharting*
Twitter will become a case study on "How bloated is the company, and what is the minimum number of people a company needs to run?"
I wouldn't combine reads of tweets and "reads" of videos into a single figure for the data we're going to read from our "storage". Storing and streaming videos and storing and reading text tweets plus metadata are completely different tasks that access and handle data in completely different ways.
It's amazing how this kind of large-scale system can grow and become so complex, with so many components and “moving parts”, and it's impressive how it works with a massive number of users and petabytes of data. In the end, I didn't understand whether your solution uses sharding on the database or not. If it does, how do you solve the shard-key issue? It doesn't look possible to use the “every account followed by someone” strategy, for the very reasons you talked about.
Is it possible to have sharding and read replicas at the same time? And how would you handle it: with many load balancers, one per shard in front of its replica cluster?
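Yes, the two compose: each shard can be its own primary plus read replicas. A minimal client-side routing sketch, assuming hash-mod sharding and round-robin reads over each shard's replicas; the host names are hypothetical, and in practice this logic often lives in a routing proxy rather than in one load balancer per shard. Plain modulo hashing remaps most keys when the shard count changes, which is why consistent hashing is often used instead.

```python
import hashlib
import itertools


class ShardRouter:
    """Route by shard key: writes go to the shard's primary, reads rotate over its replicas."""

    def __init__(self, shards: list) -> None:
        # Each shard: {"primary": "db-0-primary", "replicas": ["db-0-r1", "db-0-r2"]}
        self.shards = shards
        # Shards without replicas fall back to reading from the primary.
        self._replica_cycles = [itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards]

    def shard_index(self, key: str) -> int:
        # Stable hash (unlike Python's salted hash()), so routing survives process restarts.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def for_write(self, key: str) -> str:
        return self.shards[self.shard_index(key)]["primary"]

    def for_read(self, key: str) -> str:
        return next(self._replica_cycles[self.shard_index(key)])
```

Reads from replicas may be slightly stale because of replication lag, which is usually acceptable for a timeline.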
I was left with the same impression. I don't see how this sharding could work
Thank you for the interesting video. However, I doubt that a relational database can store the tweets. I was just asked to design Twitter in a job interview and came up with something very similar, but I suggested using Aerospike for messages with the schema id -> list of messages. Aerospike scales horizontally, so there is no need to think about sharding.
Great video. Learnt a lot.
If we shard by user ID, then to retrieve a single tweet (e.g. from a direct link) you would need to query all shards. Is that tolerable, or how do you overcome it?
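One common workaround (an assumption here, not something from the video) is to encode the author's shard number in the tweet ID itself, so a direct link can be routed to exactly one shard instead of querying all of them. The 12-bit shard field is a hypothetical choice.

```python
SHARD_BITS = 12  # hypothetical: room for 4096 logical shards


def make_tweet_id(sequence: int, shard: int) -> int:
    """Pack a per-shard sequence number and the owning shard into one ID."""
    if not 0 <= shard < (1 << SHARD_BITS):
        raise ValueError("shard out of range")
    return (sequence << SHARD_BITS) | shard


def shard_of(tweet_id: int) -> int:
    """Recover the owning shard from the ID alone, with no scatter-gather."""
    return tweet_id & ((1 << SHARD_BITS) - 1)


def sequence_of(tweet_id: int) -> int:
    return tweet_id >> SHARD_BITS
```

Mapping many logical shards onto fewer physical databases lets shards be moved later without changing any IDs.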
And what about the hot user problem? Sharding by user ID does not work well in that case.
Yep, but there is no requirement in this case to be able to request tweet by id directly without knowing the author of the tweet.
Twitter has more engineers than HR
So if we return only the cache, but the user follows a celebrity, the feed will not be up to date. That means every time a user comes in, we still need to query the list of people they follow, right? To check whether any of them is a celebrity.
How about starting from the data model rather than the architecture? And let architecture emerge from constraints on use cases? Enterprise applications 101.
Gracias - Thanks, great video.
Elon: bulls**t design. You are fired!
Lol "...in that case you may have to because they recently fired everyone so they need people that know how it works..." 😂
If we have a read-heavy system, why aren't we using a master/slave design?
Thank you so much for this video and its good content. One thing to correct, maybe: at 12:24, it's not good to save the authorization token in the DB, for security reasons. If someone says that in an interview, the interviewer may think the interviewee doesn't care about this and reject them.
Do you mean that passing user IDs along with the request implies the auth token is stored in the DB? I don't see him explicitly mention in the video where to store auth tokens. Also, out of curiosity, where do we store the auth tokens then?
@@zhenghaohe4727 we don't store them, we validate them against our secrets
Design Twitter for Dummies: Create a users table; create a posts table; posts can optionally refer to other posts using a primary key. When someone uses the "@" symbol, the front-end will use whatever text follows to search the database of users (sorted by interaction, popularity, and match %). Done.
"But it needs to scale!" - deploy on AWS, fallback to Azure, fallback to Google Cloud. Done. "What about ads?" - Google ads. Done. "What about measuring things?" - Google Analytics. Done. "What about banning users?" - add a banned boolean flag and have an admin panel for users who have the admin boolean set to true. Done.
I could whip this up in about 3 hours flat. Front-end AND back-end AND deploying it.
I would love to see it :)
@@NeetCode me too
@@NeetCode Will repeat this assertion on multiple YT channels in 3 hours flat.😀
So how is traction going? Any users yet?
which tool are you using to draw the diagrams?
I was waiting for something like that
At the end of minute 8: a relation between a follower and a followee is not the same as a relation in a relational database, so that's not a reason to choose one.
Bro, how do you draw so well with the mouse?
It doesn't seem like there's a good general solution. What works for the average user, doesn't work well for super big users. I heard that Twitter separated users with more than a certain number of followers into a separate use case. Not sure if they ever solved it.
Simply amazing content
Sharding is easy. Sometimes I do it by accident
Great video, I even got a similar answer so I know it's good ;D
Great! Thanks a lot.
I agree that the data is more on the relational side. But why can't we put the tweets in a NoSQL DB like Cassandra or Scylla? From our follow table I know which followees' tweets I have to fetch, so I simply have to look up the shards where those followees' tweets are stored.
@NeetCode please show your face once. I really want to see you.
Probably one day :)