2: Instagram + Twitter + Facebook + Reddit | Systems Design Interview Questions With Ex-Google SWE

  • Published: 25 Nov 2024

Comments • 366

  • @idiot7leon
    @idiot7leon 8 months ago +28

    Brief Outline
    00:01:09 Objectives
    00:02:18 Capacity Estimation
    00:03:49 Fetching Follower/Following
    00:05:44 Follower/Following Tables
    00:07:56 Follower/Following Database
    00:09:24 Follower/Following Partitioning/Data Model
    00:10:34 NewsFeed (Naive)
    00:12:08 NewsFeed (Optimal)
    00:13:39 NewsFeed Diagram
    00:17:27 Posts Database/Schema
    00:18:39 Popular Users
    00:20:10 Caching Popular Posts
    00:22:35 Fetching popular users that a given user follows
    00:25:57 Security Levels on Posts
    00:28:15 Nested Comments
    00:30:08 Nested Comments Access Patterns
    00:31:44 Graph Database
    00:33:53 Alternatives to Graph Database
    00:34:41 DFS Index (similar to GeoHash!)
    00:36:48 Final Diagram
    Thanks, Jordan!

  • @kaushalgala
    @kaushalgala 4 months ago +8

    One of the best system design videos on social media use cases. Not only is this great for interviewees, it's also useful for interviewers, as well as for professionals looking for brainstorming ideas to optimize existing systems.

  • @raymondyang2018
    @raymondyang2018 1 year ago +77

    Thanks for the video. I don't think I can survive at Amazon for another month. I also barely have time outside work to study, so I don't think I can put in enough prep time for interviews. At this point, I'm strongly considering just quitting without another job lined up and spending a month or two grinding leetcode and system design. I know the job market is bad, I just don't care anymore.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +21

      I believe in you man, you got this!

    • @Saurabh2816
      @Saurabh2816 1 year ago +2

      Rooting for you buddy. You got this!

    • @Doomer1234
      @Doomer1234 1 year ago +3

      You got this mate!

    • @whyrutryingsohard
      @whyrutryingsohard 10 months ago +1

      You got this

    • @rdwok14
      @rdwok14 9 months ago +2

      Good luck to you man. I feel the same way about my job right now. Forced RTO is probably the final straw for me; it has really made me so much more stressed and less productive, and they're about to lose an engineer because of it.

  • @cc-to2jn
    @cc-to2jn 11 months ago +15

    Man, it took me 2.5 hours to digest this video, coming up with my own design and watching yours. How long did it take you? Great content as always, thanks for always putting out such high quality work!

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +31

      Glad to hear! Yeah I'd say one of these videos takes me around 8 or so hours to make, but I think it's awesome that you're taking the time to pause and think through it and decide if you agree or not, I think that's a much better process than just mindlessly watching!

  • @pgkansas
    @pgkansas 8 months ago +4

    As an extension (or a new post), it would be good to add:
    - how to efficiently refresh the feed (an algo to sort the feed => timeline, real-time ML recommendation systems) and keep adding new items to a user's news feed. The DFS traversal also helps.
    - showing popular (VIP) folks commenting on a post; this is usually shown instead of other normal comments.
    Excellent post!

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago +2

      1) Good point! I imagine as a front end optimization we probably poll for the next batch of feed items the moment we open the app. The ML stuff I'll cover in a subsequent video.
      2) I covered displaying "top"/"hot" comments like how reddit does it in a separate video a couple weeks ago :)

  • @ekamwalia5757
    @ekamwalia5757 5 months ago +3

    Love the video Jordan! On my way to binging your entire System Design 2.0 playlist.

  • @alekseyklintsevich4601
    @alekseyklintsevich4601 9 months ago +7

    Another way the follower/followee table can be implemented is with DynamoDB, using a secondary index on follower. DynamoDB will do most of what you stated under the hood when you add a secondary index on a field.

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +2

      Interesting - if that's how they actually implement it then seems totally reasonable to me!

    • @veshnuramakrishnan3618
      @veshnuramakrishnan3618 1 month ago +1

      Secondary Index is local to a shard. So the request has to go to multiple shards to collect data. It will be inefficient.

    • @veshnuramakrishnan3618
      @veshnuramakrishnan3618 1 month ago

      Or did they change it?

    • @capriworld
      @capriworld 1 month ago

      @@veshnuramakrishnan3618
      1) In the case of Cassandra, an LSI will have to look at all the shards. 2) A GSI will look at all the matching shards in the case of DynamoDB. 3) Adding userId as Jordan says makes it a single-shard lookup, which I also think is better.

  • @quirkyquester
    @quirkyquester 2 months ago +1

    Your content is gold bro! Thank you! Love it, and I also love the comment section Q & A. One idea is that you could summarize some questions from the comments and make a Q & A video for each design.

  • @Robinfromthehood88
    @Robinfromthehood88 1 year ago +15

    Hi Jordan. Nice video. As someone who participated in a very famous news feed design and re-design, I can say that you explained some nice concepts here and assembled a legit design.
    With that being said, you did mention you're gonna go way deeper, beyond any other system design youtuber, on such subjects. Personally, for me, a seasoned engineer (who's here for some humor as well), I found it not to my level. There are crazy, crazy issues when building a scalable news feed (even at the HLD stage).
    One example:
    - How do you make sure that a Middle East user can see his feed from Australia if a region is down? Or if he's using an Australian VPN now for some reason? (assuming that user wants to see live updates fairly fast [say minutes?])
    News feeds work really hard to make distant regions as accessible as possible, very fast, and there are reasons for that.
    There are many parts to take into account if depth is what you are looking for in your designs.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +13

      I appreciate the constructive criticism. I think you make a good point and will attempt to go further in future videos.
      At the same time, I am trying to strike a happy medium here between every failure scenario and instead teach for the interview specifically. I am trying to go more in depth about the decisions asked in interview questions, though maybe I'll devote some more time in the future to those edge cases.
      Also, just curious, which one did you help build? Would love to learn more!
      Have a good day!

    • @abhijit2614
      @abhijit2614 1 year ago +1

      Maybe I exceeded the char limit 😂

    • @clintnosleep
      @clintnosleep 1 year ago +1

      @@jordanhasnolife5163 Great video - I'd also appreciate more thoughts about geo replication / multi-region design. Fantastic channel.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +4

      @@clintnosleep Thanks! The more I think about this, I'll try to cover it very high level, but at the end of the day this is mainly an interview prep channel so I'd like to keep videos to the point!

    • @pavankhubani4191
      @pavankhubani4191 6 months ago +1

      Great video first of all. I think the explanation in the video is in enough depth, but would really like to hear about the additional points @Robinfromthehood88 mentioned.

  • @theunusual4566
    @theunusual4566 1 month ago +1

    The Humor in Between the Teaching was amazing :)
    Thanks a lot.

  • @the-unspectd
    @the-unspectd 1 month ago +3

    Hi Jordan! Thank you for the great video.
    There are still a couple of things I don’t quite understand:
    1) You mentioned that the posts DB is sharded by user_id, which makes sense for querying posts by a user. But what about retrieving a post by its id? This is likely to be a very common use case.
    2) What happens with Flink nodes during a restart? Do they read directly from storage? Kafka topics usually have a configured retention, and re-reading everything could be quite slow.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      1) You can use userId + timestamp as a postID, you don't need a dedicated post ID field
      2) Yes, or they snapshot to S3 on an interval and read from there to speed things up. I have a dedicated flink video that attempts to explain why it is useful for fault tolerance.
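
      A rough sketch of that composite-ID idea (illustrative only, not from the video; the function names, ID format, and shard count are made up):

      import time

      def make_post_id(user_id: int) -> str:
          # millisecond timestamp keeps one user's posts roughly sorted by recency
          return f"{user_id}:{int(time.time() * 1000)}"

      def shard_for(post_id: str, num_shards: int) -> int:
          # the userId prefix is all we need to route a post lookup to its shard
          user_id = post_id.split(":")[0]
          return hash(user_id) % num_shards

      pid = make_post_id(42)
      print(pid, "-> shard", shard_for(pid, 16))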

  • @rezaasadollahi9880
    @rezaasadollahi9880 5 months ago +1

    Thanks Jordan for preparing these videos. I noticed in a couple of your videos you raised a concern about graph DBs being inefficient when we need to jump from one address on disk to another due to disk latency. However, this is not the case with the SSD drives that are everywhere these days. With an SSD, any address lookup on disk is done in O(1) time.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      Would have to look into it more as I can't remember my operating systems too well, but that would certainly help. SSDs are a bit more expensive though, so that's unfortunate.

  • @eason7936
    @eason7936 1 year ago +4

    Really nice video, it's very clear and impressive. I really appreciate you sharing it.

  • @rakeshreddytheeghala9397
    @rakeshreddytheeghala9397 2 months ago +3

    If I understand the hybrid approach of fanning out posts correctly, we update the posts of non-popular users in the individual users' feed caches. For popular users' posts, we keep them in a separate cache. I want to know how the posts are finally sent to the online users. I guess users maintain a long poll with the news feed servers, which return new posts whenever they are available, i.e. the feed cache is updated with enough posts. Now for the popular posts, the news feed service has to send them to so many users. How is that scalable?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      We don't send it via a long poll for popular users, we implement polling on a less frequent basis.

    • @rakeshreddytheeghala9397
      @rakeshreddytheeghala9397 2 months ago

      ​@@jordanhasnolife5163
      1. Isn't polling on a less frequent basis the same as long polling?
      2. I am interested in popular posts, not popular users

  • @MayankAgrawal-j8h
    @MayankAgrawal-j8h 10 months ago +5

    Thanks for the great content as usual. One question on cassandra being the storage backend for followers: Can that lead to write conflicts if you delete the row? If yes, do you think it makes sense to handle unfollows with a different table and periodically bulk update the follower table?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +5

      Yeah it can, assuming you unfollow like right after following. That being said, it's worth noting that I believe Cassandra uses tombstones for deletes, so assuming those are there, when the various replicas perform anti-entropy the tombstone should reconcile with the original write to be removed from the SSTable.

  • @AjItHKuMaR-ns5yu
    @AjItHKuMaR-ns5yu 10 months ago +9

    Hey. Thanks for your efforts. I love the way you explain things. I have one doubt on the feed generation part. I am new to stream processing with Kafka and Flink, so pardon me if my question is stupid.
    You said that we use CDC and maintain the user : [followers] list in memory in Flink. I have 2 questions here.
    Firstly, there are 2.5 billion users on Instagram. Are we really going to maintain that many user : [followers] lists in Flink? Is it even capable of holding this much data in memory, as it's mainly a real-time stream processing framework?
    Secondly, I read that both Kafka and Flink are push-based systems. So, when a user's following is updated, I understand it can be consumed by Flink, which makes the necessary updates. However, suppose Flink goes down; since all the data was in memory, it is bound to get flushed. When it comes up again, are we going to load all the data into memory again?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +4

      Good questions! I think that potentially watching the video I made on flink may help you, but the gist is:
      1) lots of partitions! We can hold this data if we do that, and also you actually can use disk with flink to store data if you need more storage.
      2) flink checkpoints state so that it is not lost!
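
      For anyone curious what (2) looks like in code, a minimal sketch assuming PyFlink (illustrative only; the 60s interval is arbitrary, and the state backend / S3 details are left as comments):

      from pyflink.datastream import StreamExecutionEnvironment

      env = StreamExecutionEnvironment.get_execution_environment()
      # Snapshot operator state (e.g. the user -> [followers] maps) every 60s.
      # With a RocksDB state backend the keyed state can also spill to local disk,
      # and checkpoints can be written to durable storage such as S3, so a
      # restarted job restores from the last checkpoint instead of replaying
      # the entire Kafka topic.
      env.enable_checkpointing(60_000)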

    • @nvntsingh
      @nvntsingh 4 months ago +1

      @@jordanhasnolife5163 Checkpoints only work if we use disk storage. Am I correct?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago +1

      @@nvntsingh No you can store data in memory and still checkpoint to something like S3

  • @hassansazmand1747
    @hassansazmand1747 1 year ago +2

    great work I love how your videos have evolved!

  • @aforty1
    @aforty1 6 months ago +2

    Your channel is amazing. Liked and commented for the algorithm!

  • @hbhavsi
    @hbhavsi 1 month ago +2

    Amazing content bro! Wanted to ask you - which video in your System Design 2.0 playlist talks more in depth about Change Data Capture?

    • @jordanhasnolife5163
      @jordanhasnolife5163  25 days ago +1

      Probably I'd have to say the "concepts" video about stream processing.

  • @nhancu3964
    @nhancu3964 6 months ago +2

    Great explanation. I'm wondering how these systems avoid re-recommending posts in news feeds (like TikTok does with videos). Do they store all viewed history 🙄🙄

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      Short answer is yes + using bloom filters as an optimization, long answer is that I have a video posting on this in 5 minutes :)
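
      A toy sketch of the bloom filter idea (illustrative only; the sizes, key format, and hashing scheme are made up):

      # Toy Bloom filter for "has this user already seen this post?" --
      # false positives just mean we skip a post; false negatives never happen.
      import hashlib

      class BloomFilter:
          def __init__(self, size_bits=1 << 20, num_hashes=4):
              self.size = size_bits
              self.num_hashes = num_hashes
              self.bits = bytearray(size_bits // 8)

          def _positions(self, item: str):
              for i in range(self.num_hashes):
                  h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                  yield int.from_bytes(h[:8], "big") % self.size

          def add(self, item: str):
              for pos in self._positions(item):
                  self.bits[pos // 8] |= 1 << (pos % 8)

          def might_contain(self, item: str) -> bool:
              return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

      seen = BloomFilter()
      seen.add("user42:post123")
      print(seen.might_contain("user42:post123"))  # True
      print(seen.might_contain("user42:post999"))  # almost certainly False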

  • @2sourcerer
    @2sourcerer 1 month ago +2

    40:43 Hi Jordan, what if the Post Flink nodes die and are relaunched? Does the fact that Kafka has durability mean that the Flink nodes can easily be relaunched and pull the data they need? Is it faster to pull from Kafka or just pull from the Cassandra posts DB?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      Flink takes durable snapshots of its own state, but you can also read from cassandra

  • @choucw1206
    @choucw1206 11 months ago +3

    I have a few questions regarding the way you shard the post data/table
    1. In the video you propose to use userId as the shard key. I know that in your design you had the user-verified and popular posts cached. But I think there may still be use cases where you need to query it directly. How do you alleviate the "hot partition" issue for popular users?
    2. Some other resources refer to using a "Snowflake ID" as the shard key, which was used by Twitter to generate global IDs for tweets, users, etc. (The GitHub repo was archived, so they might have moved to something else.) However, none of them explain how using this shard key can make queries efficient. For example, a query like "find all tweets/posts in the last 7 days published by Elon" will require hitting every partition node. Did you look at this when you researched this topic?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +1

      1) FWIW I guess by hot partition here it's not an issue of posting too much data, but more-so too many reads. If we had too much data, I think we'd have to basically "inner shard" where users can be split across many partitions. Since it's probably just too many reads IRL, I'd probably just add more replicas for that partition.
      2) I didn't look into this, but to me it seems pointless lol. Maybe more of an optimization to balance writes on partitions evenly, not sure.

    • @learningwheel8442
      @learningwheel8442 5 months ago

      @@jordanhasnolife5163 I think sharding on userID or postID depends on the use case. If we are looking to fetch all posts created by a user to populate their user timeline, then userID-based sharding makes sense.
      On the other hand, if we also want to get all new/top posts created in the last x hours/days (e.g. Reddit), or to run analytics on all new posts created in the last x hours/days, then sharding on postID and sorting by timestamp, or using a snowflake ID with the timestamp embedded, makes sense, so we can efficiently get the new tweets without having to query all the posts database partitions that are sharded on user id. Thoughts?

  • @msebrahim-007
    @msebrahim-007 28 days ago +1

    I noticed that in this design CDC was used to prevent partial failures and distributed transactions (such as 2-phase-commit). However, the final result that is produced requires the Feed Service to perform multiple operations which could be prone to a partial failure scenario? The Feed Service must query the "user-verified-following cache", then the "popular posts cache", and lastly the "newsfeed caches" before it is able to aggregate the results and return a newsfeed to a user. I may have not caught this but are we using something like 2PC here?

    • @jordanhasnolife5163
      @jordanhasnolife5163  27 days ago

      Reading from multiple places doesn't require a two phase commit, assuming I don't need perfect consistency across each database that I'm reading

  • @tarushreesabarwal6618
    @tarushreesabarwal6618 6 months ago +1

    Great video Jordan, had a qq: at 25:23, where we have the users table and the user followers table, the event will be triggered on the User Followers table first, e.g. all the followers for user id 6, maybe [3, 5, 10].
    Then after this result reaches Flink, it would take every follower's id and check whether it is verified against the Users table.
    So the trigger events on these 2 DBs won't be simultaneous. I am imagining CDC to be like DynamoDB streams.
    Also, I didn't completely understand why we are using both Kafka and Flink; can't we send the trigger event on any DB change directly to Flink?
    I am a beginner, so pardon me for any obvious questions asked

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      They don't have to be simultaneous; once a user gets verified we stop sending their posts to all followers and begin putting them in the popular posts db.
      As for question 2, all streaming solutions require some sort of message broker under the hood, it's just a question of how much you abstract things away from the developer

  • @mayankchhabra3070
    @mayankchhabra3070 1 day ago +1

    Hi Jordan, Thanks for the amazing video!
    I had a few questions around the posts DB.
    1) As we are using Cassandra here, which is a leaderless replication DB: let's say a user uploads a post and then immediately wants to update it. Because we are using leaderless replication, it's possible that the user may or may not be able to read their own writes here. Does it make sense to have some kind of write-back cache, which can provide read-your-writes consistency, so that a user can update a very recently uploaded post and then flush these writes/updates to Cassandra (assuming most users only update a new post within 5-10 minutes of the upload)?
    2) Follow-up to the first question: if we have a write-back cache (assuming we use a distributed cache) and the cache goes down, that would lead to the posts getting dropped, as they were not committed to the DB. In this case, would a write-ahead log with the cache help us make this more fault tolerant?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 hour ago

      1) I guess it depends on whether we use quorums, and yeah even then we can fail. I think your solution makes sense, but it's probably just too much of an edge case to be worth covering/doing since this isn't vital data.
      2) It depends where you store the write ahead log and how you replicate it :)

  • @gangsterism
    @gangsterism 1 year ago +1

    this information is going to be very useful for my massive application with up to 0 to 1 users, achieving speed is certainly going to be a concern

  • @shubhamjain1153
    @shubhamjain1153 3 months ago +1

    Hey, as always love your content, my primary learning source. I have seen you usually combine Kafka with Flink for message processing. This makes sure processing happens in real time, but then why are we persisting data in Kafka as well? Also, could you add a detailed video around which problem scenarios we combine Kafka with Flink for, and its operational nuances?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      You need to put data in kafka to read it from flink. I have a concepts video on stream processing which I hope would address any knowledge gaps regarding your use cases question.

  • @kushalsheth0212
    @kushalsheth0212 8 months ago +2

    the confidence at 20:49 that "all women follow me, they love me" 😂was amazing.

  • @brandonwl
    @brandonwl 1 year ago +4

    What is the point of having flink? I don't see checkpointing being a benefit because there is no state needed between each user-follower message?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 year ago +4

      Good point! I suppose it's not completely necessary from a functionality perspective, though from a correctness perspective Flink does help us ensure that each message will be handled at least once, which is good for keeping our tables in sync.

    • @ShivangiSingh-wc3gk
      @ShivangiSingh-wc3gk 3 months ago

      Because we don't have any state, even if we use a Kafka consumer with an at-least-once policy we would be good.

  • @AAASHIVAM
    @AAASHIVAM 1 month ago +2

    Can we use HBase instead of MySQL for the comments db? Since HBase can support a higher throughput of writes, as it first writes to a WAL.

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      Seems reasonable to me since we don't need multirow transactions here I don't believe

  • @thunderzeus8706
    @thunderzeus8706 4 months ago +1

    Regarding the follower and following table, I wonder why we write to the follower table first and then use CDC to update following, not the other way around?
    When user A follows user B, I feel it is more important to reflect to A that "A followed B" instantly, while it is not a big deal for B to know immediately "if A has followed B". Say B takes a look at all B's followers, it is not time sensitive to reflect A.
    Thanks!

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      I didn't think about this one too much as I figure that we'll eventually need both, but I think you could probably make cases for either

  • @TechPeck-zm9pe
    @TechPeck-zm9pe 3 months ago +1

    If the posts table is partitioned by user-id, there will be uneven data distribution across Cassandra partitions, right? Shouldn't it be partitioned by post-id? Actually, it should be partitioned using a composite primary key of the user-id and post-id. This ensures that there is uniform distribution of data, and a read to retrieve all posts by a user can be accomplished by reading a small number of partitions.
    Reading a small number of partitions for this kind of partitioning is an acceptable tradeoff given the uniform data distribution we get.

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      You're correct there, though eventually we'll need some form of derived data to shard posts by user. You can send partitions from one node to another over time.

  • @subramanichockalingam5637
    @subramanichockalingam5637 2 months ago +1

    Thank you! Inspiring. Didn't feel like a 43-minute video. Really enjoyed it. But one common question for both DSA and System Design: how do I make sure that I am actually learning instead of just memorising?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      Try some practice problems! See if you feel like you can figure out a solution easily, and then take some time and critique it yourself.

  • @VibhorKumar-uo9dd
    @VibhorKumar-uo9dd 4 days ago +1

    One question regarding the fan-out approach. While pushing posts to each follower, we push them to a particular news feed cache corresponding to that user. My doubt is whether these news feed caches are just another caching layer sharded on user id (let's say 10 caching servers sharded on userId for 100 users), or whether they are specific to the user (100 users, 100 caches in that case)?

  • @hbhavsi
    @hbhavsi 1 month ago +1

    Hey Jordan, what's the difference between Kafka and Flink when you use both for news feed generation? (Kafka for Posts DB CDC and Flink for User-Followers CDC.)

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      I'm using both in conjunction. One is the queue, one consumes the queue.

  • @Anonymous-ym6st
    @Anonymous-ym6st 3 months ago +1

    I remember Meta has a paper around how they use TAO to do graph relationship management efficiently, compared with a graphDB (but I didn't read into it). Wondering if it uses the same approach as the DFS index? (Also maybe a good paper candidate for the new series!)

  • @koeber99
    @koeber99 10 months ago +1

    Instagram + Twitter + Facebook + Reddit (part-1)
    How would your design change for each of the services? Instagram and Facebook have more photos and videos, therefore a CDN and S3 would be involved. But are there other potential changes to be made?
    In regard to using Kafka to process posts: while sharding by userID improves message locality and simplifies processing for specific users, it does not enforce message order within each user's stream. I think this will be ok for human users; however, if there is an automated service using tweets that sends messages one after the other, the ordering will not be correct in the newsfeed!

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      1) Yeah I wouldn't really change anything here, just supply the s3 url in the content of the tweet
      2) You can perform ordering in the cache itself via something like a sorted list on the timestamp of each post. Inserts will be a little bit slower, but who cares?
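
      A small sketch of that sorted-structure idea for (2), assuming a Redis sorted set per user via redis-py and a locally running Redis (illustrative only; the key names and the 1000-entry cap are made up):

      import redis

      r = redis.Redis()

      def push_to_feed(follower_id: int, post_id: str, created_at_ms: int):
          # score = post timestamp, so the feed stays ordered even if posts arrive late
          r.zadd(f"feed:{follower_id}", {post_id: created_at_ms})
          # keep only the newest ~1000 entries per user
          r.zremrangebyrank(f"feed:{follower_id}", 0, -1001)

      def read_feed(follower_id: int, limit: int = 50):
          # newest first
          return r.zrevrange(f"feed:{follower_id}", 0, limit - 1)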

    • @koeber99
      @koeber99 10 months ago

      @@jordanhasnolife5163 cool, thanks... whenever you get a chance, can you look at part 2 of my questions? Thanks!

  • @timchen6510
    @timchen6510 11 months ago +4

    Hey Jordan, great video, learnt a ton. One question: for managing the user follows/following relationship, can we just update the two tables in one transaction instead of using CDC? For example, if user 1 follows 2, we basically create one entry in the user-follows table as 1 : 2 and one entry in the user-following table as 2 : 1, and everything happens at the same time.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +5

      You can do this! That being said, these are big tables. So the chance that this transaction will be on multiple nodes is actually pretty high. In that case, you may find yourself having to do a two phase commit, which could slow down your write quite a bit.

    • @timchen6510
      @timchen6510 11 months ago

      @@jordanhasnolife5163 makes sense, thank you!

  • @ugene1100
    @ugene1100 8 months ago +1

    definitely a ton to learn from your videos. Appreciate your work Jordan!

  • @tanvirkekan6111
    @tanvirkekan6111 4 months ago +1

    For the reader (U1): let's say U1 : {f1, f2, f3 (verified)}. You mentioned it will first query the user-following-verified cache, which is great for knowing whether the reader follows a verified user; in this case f3, so the popular posts cache is fetched to get some feed items for f3. But how will U1 know about f1 and f2? Does it have to do a DB call to the userId - FollowerID Cassandra DB each time?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Well all the other posts should just be in the per-user cache
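
      Loosely, that hybrid read path might look like the following sketch (illustrative only; the cache shapes and names are assumptions, not from the video):

      # Hypothetical read path for the hybrid fan-out: merge the precomputed
      # per-user feed with on-the-fly reads of popular (verified) followees' posts.
      def build_feed(user_id, per_user_cache, verified_following_cache, popular_posts_cache, limit=50):
          feed = list(per_user_cache.get(user_id, []))           # posts fanned out at write time
          for vip in verified_following_cache.get(user_id, []):  # verified accounts this user follows
              feed.extend(popular_posts_cache.get(vip, []))      # their posts are pulled at read time
          feed.sort(key=lambda post: post["ts"], reverse=True)   # newest first
          return feed[:limit]

      feed = build_feed(
          "u1",
          {"u1": [{"id": "p1", "ts": 100}]},   # per-user news feed cache
          {"u1": ["f3"]},                      # verified users that u1 follows
          {"f3": [{"id": "p9", "ts": 200}]},   # popular posts cache
      )
      print([p["id"] for p in feed])  # ['p9', 'p1']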

  • @gaurangpateriya4879
    @gaurangpateriya4879 5 months ago +1

    I also had another query: since we are using Cassandra, would we need to implement a mechanism to distinguish between write/edit operations and sync operations between nodes, somewhere before new data is propagated to the Kafka queue? I mean, to make sure only writes and edits are propagated and not sync updates, as the CDC would be capturing data from all the replicas.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      Believe it's gonna handle it for us
      cassandra.apache.org/doc/stable/cassandra/operating/cdc.html

  • @ariali2067
    @ariali2067 8 months ago +1

    What does the data in the news feed caches look like? Is it something where each entry is a tuple? Ideally we should only store the post id in the cache, I would assume?

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago

      I think you'd actually want the post text here - it's pretty small anyways! But yeah something like a tree set based on timestamp of posts

  • @NBetweenStations
    @NBetweenStations 6 months ago +2

    Nice video Jordan! So for building newsfeeds, is the idea that when a post is made, the Flink node locally stores a mapping of which followers a user has and writes that post id to the Redis caches for those user ids? Thanks again!

  • @Summer-qs7rq
    @Summer-qs7rq 10 months ago +2

    Hey Jordan, I had a question related to the user-follower relation. Why can't we use a graph db to store user-follower or user-following instead of storing it in a NoSQL Cassandra-like db?
    Also, when should I use a graph db? If the disk reads are sequential?
    Thanks a ton

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +5

      Hey Summer! A graph db isn't necessary for this relation because we aren't actually doing any traversals. We just want to know for a given user, who they follow, or who follows them.
      Generally speaking, I'd avoid using graph DBs if there is a way to model the problem other than by using a graph, as they're slow for general purpose tasks due to poor data locality. So to answer your question, you should use a graph db strictly if you plan on traversing multiple edges in a graph. So for example, "find all people who are separated from me by exactly two edges in a facebook friends graph".

    • @Summer-qs7rq
      @Summer-qs7rq 10 months ago

      @@jordanhasnolife5163 that makes sense.

  • @yiannigeorgantas1551
    @yiannigeorgantas1551 2 months ago +1

    Thank you for the video. I was wondering: should we be concerned with the durability of the Redis caches? Do we assume there will always be one replica with power from which we can restore state?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      I wouldn't say so, given
      1) they are caches, and the events can be re-created via a batch job to our DB or something
      2) we can replicate them

  • @charan775
    @charan775 1 month ago +1

    8:20
    in your 4th video, you suggested using a single leader setup for user_id, chat_id because of conflicts like a user joining and then leaving a chat
    can't the same reasoning be applied in this follower/following case? why did we go with leaderless?

    • @jordanhasnolife5163
      @jordanhasnolife5163  25 days ago

      Yeah, we could say the same thing more or less. I think it's just less important to me that follower semantics are correct than chat membership (if someone leaves a chat I want to make sure they do, but if you unfollow someone and for some reason it doesn't work not the end of the world). But fair argument, and nice catch!

  • @2sourcerer
    @2sourcerer 1 month ago +1

    15:54 The user followers table is indexed on user ID, right? Say X, and you can get who follows X (say users A, B, and C) from that (or via CDC), right? And say the person on the top-left is X, so the Flink consumer has to shuffle that new post from X and figure out that it's A, B, and C's feed caches it needs to send to. But at the same time, that user followers table is partitioned on user ID. So A not only follows X, but also follows Y, and that Y user may be on a different user-followers table partition than X. So there is another Flink consumer for that other user-followers table partition that Y is on, and that consumer also has to send new posts from Y to the same A's feed cache? And somehow that A's feed cache needs to arrange X's and Y's posts chronologically?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago

      Everything that you said is correct, it seems you're implying that there's some sort of challenge to doing this though. Which I don't think there is.

    • @2sourcerer
      @2sourcerer 1 month ago

      @@jordanhasnolife5163 OK got it. Now I can see that I do get it, more or less. Thank you very much Jordan, very few YouTubers actually take the time to answer questions, I really appreciate it.

  • @jameshunt1822
    @jameshunt1822 11 months ago +1

    I have a tough time sitting through your videos. Nothing wrong on your part. I don't feel confident I can use these words (technologies) in an interview without knowing the ins and outs of them. Yeah, I need to learn the basics first.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago

      Have you watched the concepts series? That's a bit of a prerequisite

  • @VaibhavMittal-t4j
    @VaibhavMittal-t4j 9 months ago +1

    Thanks for your video. You suggested using Flink to manage the user-follows DB and the user-following DB. Why can't we just maintain one DB which holds the following relationship and have 2 indexes, each on one column? One index will be on the follower ID column and the other on the following ID column. Won't that solve our problem?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      Think about what happens when these tables are massive and need to be partitioned. All of a sudden if you want to index/partition by the first column of both tables you can't, hence you need two different tables.

  • @firezdog
    @firezdog 6 months ago +1

    so is the need to partition what prevents us from indexing on both user and follower? I wrote this in my notes:
    index key: by user or by follower?
    we need both (one to quickly find all the users i follow, the other to find all the users who follow me)
    if we have user-follower, i can quickly find everyone a given user is followed by (log n)
    but to find everyone a given user follows, i have to look at each row for that user as a follower and collect the results (n)
    the need to partition prevents us from indexing on two keys in our db?
    would the expensive query to find all the users a user follows be classified as "scatter-gather"?
    by using CDC to solve this problem, are you essentially trading space for time?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      1) Yes
      2) Yep that would be a scatter gather
      3) You're trading work on the write path for less work on the read path, I wouldn't say you're trading space here really as we're using the same amount of space (barring kafka, but you could make a similar argument even if we used two phase commit)

  • @deepakshankar249
    @deepakshankar249 9 months ago +1

    Jordan, you are a rockstar.. Thanks buddy 🙏

  • @tabishaleem369
    @tabishaleem369 4 months ago +1

    Hey Jordan, in the user-follows & user-following table are we just storing the user ids? If so, doesn't that mean we will have to do separate calls to the user table to fetch the user info, or use a relational db to make joins on indexed column?
    If we do keep the user info inside the user-follows and user-following key then aren't we duplicating the same user data many no of times?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Hey! Yes, but I imagine we'll likely paginate these results when we show them to the user so we'll only want to say fetch 25 user profiles at a time. You did describe the tradeoff of normalized vs. denormalized data, but I'd keep things normalized here.

  • @dkcjenx
    @dkcjenx 4 months ago +1

    Hi Jordan, at 7:36, are we able to remove the Flink component if we choose to use a lossless message queue? I don't quite get the purpose of using Flink if the message queue can already guarantee no data loss.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Flink is an optimization to avoid having to make an expensive DB read every time that we want to route a group chat message. As far as message delivery guarantees we don't need it.

  • @damienk5052
    @damienk5052 2 months ago +1

    Instead of having two different tables for user-followers and user-following, is it possible to have a single table where we index on 2 fields? Something like:
    partitionKey and index #1 --> userID
    clusteringKey and index #2 --> followerID

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      These tables are partitioned. You can do this, but if you want one to update when the other does you'll need a two phase commit. Personally I prefer to avoid that and just make them eventually consistent with derived data.

  • @igorrybalka2611
    @igorrybalka2611 11 months ago +3

    Thanks for such a detailed walkthrough, enjoyed the thought process.
    I have a question whether we actually need separate derived data for "verified users someone follows"? Can this status not be part of user-follows table (similar to how access control is implemented on another table)? I suppose the downside is that we need to update a lot of relations when user becomes "verified" but it probably doesn't happen too often.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago

      Yep totally could, and I think you outlined the tradeoff well!

    • @quirkyquester
      @quirkyquester 1 month ago +1

      Could we simply have a Redis cache keep a set of verified/popular users? Whenever Flink gets an update for user-follower / user-following, Flink can also check which one is verified and keep that in its cache. It only needs to be done once per user added, and for that user the data won't change much in the future?

    • @huangregin3807
      @huangregin3807 1 month ago

      @@quirkyquester I thought about this as well. It seems overkill to keep the mapping of {user: list of verified followers} in the cache when a set of verified users (a partitioned cache) is good enough for news feed post fetching. This way, we also don't need CDC from the user-followees table for Flink to populate the verified user cache; just monitoring the verified column in the users table is sufficient.

  • @rishabhsaxena8096
    @rishabhsaxena8096 10 months ago +2

    Hey Jordan,
    In the Follower/Following Partition Data Model (9:55), you have used the partition key as userID and the sort key as follower/following ID. But what I understand is that we will have 2 tables like you mentioned: one will be a userID -> FollowerID mapping and another would be a FolloweeID -> userID mapping. Here I understand that a single user will be on a single partition, so we can quickly get all the followers of the user, but then if we want all the followees of the user it would again be slow, since we will have followees sitting in different partitions. Could you please let me know if I'm missing something here?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +2

      Sure! The point here is that we have two different tables:
      1) user -> someone that follows the user
      2) user -> someone that the user is following
      For both of these tables, we partition on the user column. This allows us to quickly figure out for a given user, who their following is, and who they follow via one singular network request (for each query). If we didn't do this, figuring out who I follow might take many different requests to different partitions which we'd then have to aggregate.
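
      A toy sketch of those two tables plus the CDC hop between them (illustrative only; the in-memory dicts and the event tuple stand in for Cassandra tables and a Kafka topic):

      from collections import defaultdict

      user_followers = defaultdict(set)   # keyed (partitioned) by the followed user
      user_following = defaultdict(set)   # keyed (partitioned) by the following user
      cdc_events = []                     # stand-in for the Kafka topic fed by CDC

      def follow(follower_id, followee_id):
          user_followers[followee_id].add(follower_id)             # synchronous write
          cdc_events.append(("follow", follower_id, followee_id))  # picked up asynchronously

      def apply_cdc_event(event):
          op, follower_id, followee_id = event
          if op == "follow":
              user_following[follower_id].add(followee_id)         # derived, eventually consistent

      follow(1, 2)
      apply_cdc_event(cdc_events.pop(0))
      print(user_followers[2], user_following[1])  # {1} {2}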

    • @rishabhsaxena8096
      @rishabhsaxena8096 10 months ago +1

      Cool, that makes sense.
      Thanks 😊

  • @sateeshgudla7393
    @sateeshgudla7393 3 months ago +1

    Thank you for the videos; they are helping me improve my system design skills.
    I have a query regarding the follower/following database using Cassandra. It is mentioned that write conflicts are not an issue. However, if a follow and unfollow action occur in immediate succession and both events succeed on separate nodes, wouldn't this cause a conflict and lead to a Last Write Wins (LWW) situation?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      I'm imagining we'd model set deletion as an actual event in the database. In such a case, assuming quorum consistency, at least one node should get both the follow and the unfollow, and the follow must obviously occur first since they're causally related. Then we'd see that we've unfollowed.
      Even besides that, if we were really going for correctness, we could just ditch cassandra. For twitter followers, it's not the end of the world.
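
      A tiny sketch of what modeling follow/unfollow as events could look like (illustrative only; the tuple format and timestamps are made up):

      # Keep follow/unfollow as timestamped events; the current state of a
      # (follower, followee) pair is whichever event has the latest timestamp (LWW).
      events = [
          ("follow",   1, 2, 1000),
          ("unfollow", 1, 2, 1005),  # causally after the follow
      ]

      def is_following(follower, followee):
          relevant = [e for e in events if e[1] == follower and e[2] == followee]
          if not relevant:
              return False
          latest = max(relevant, key=lambda e: e[3])
          return latest[0] == "follow"

      print(is_following(1, 2))  # False -- the later unfollow wins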

    • @sateeshgudla7393
      @sateeshgudla7393 3 months ago +1

      Thanks for the prompt reply jordan

  • @Anonymous-ym6st
    @Anonymous-ym6st 3 months ago +1

    This is a great video for helping me understand how CDC is really used in a real app. I am wondering, in a real development environment, how is CDC deployed? How does Kafka know there are changes it needs to go through? Or is it more like a Scribe-style scheduled data pipeline that runs SQL automatically every few minutes?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      Look into debezium. Effectively it's database triggers that publish to kafka.
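
      A rough sketch of consuming a Debezium CDC topic with kafka-python (illustrative only; the topic name, server address, and field layout are assumptions about a typical Debezium setup):

      import json
      from kafka import KafkaConsumer

      # Debezium typically publishes one topic per table, e.g. "<server>.<db>.<table>",
      # with "op" (c/u/d) and "after" (the new row image) in each event.
      consumer = KafkaConsumer(
          "dbserver1.social.user_followers",
          bootstrap_servers="localhost:9092",
          value_deserializer=lambda v: json.loads(v.decode("utf-8")),
      )

      for message in consumer:
          event = message.value.get("payload", message.value)
          if event.get("op") in ("c", "u"):
              row = event["after"]  # the changed follower row, ready to fan out downstream
              print("changed row:", row)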

  • @Craftizmv
    @Craftizmv 2 months ago +1

    This is an in-depth and super awesome video, Jordan. Could you please share whether all this infra is going to be multi-region or in the same region (like in AWS)? Multi-region will bring extra complexity, right?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      I don't really see how it brings too much extra complexity in this case (at least within the scope of an interview). It'll definitely slow down the delivery of certain messages to certain user caches, but I think the process remains the same.

    • @Craftizmv
      @Craftizmv 2 months ago

      @@jordanhasnolife5163 Makes sense Jordan ! Thanks !

  • @vorandrew
    @vorandrew 11 months ago +1

    user following task - make a 'user|follower' table storage model. read model = redis :followers and :following sets, and update them async (flink)
    feed - totally disagree. You make an individual feed per user - yes. But you don't clone posts... you make an array of post_ids ONLY, order/filter them by time, importance, whatever... Fetch posts on the client side by id

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +1

      I may have to double disagree with you back, since that fetch of posts by id can go to any number of partitions

    • @vorandrew
      @vorandrew 11 months ago

      @@jordanhasnolife5163 so what? use same Cassandra db...
      saving posts will give overhead = avg 100 posts in feed * timestamp (8) + post (140) etc...

  • @lv0320
    @lv0320 2 months ago +1

    If we were to use graph database instead for follow graph (Twitter uses graphDB), curious how can we handle fetching both follower and following relationships efficiently? Is the option in that case then to have two graph DBs and use CDC?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      I don't really see what the graph database gets us here, considering we aren't doing any graph traversals of our follower relationships.

  • @jen17a
    @jen17a 8 months ago +1

    Hi Jordan!
    Thanks for the video. I do have some questions (might be basic but I am just starting to learn these technologies).
    1) Say I had created an account and followed a few folks 5 years back. And today I decide to post a video. How does Flink have the user:[follower] list from 5 years back? Does it fetch from the main storage?
    2) There are billions of users on insta/fb, how is Flink handling this data?

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago +1

      1) It doesn't go anywhere, flink can store things in a persistent manner, see rocksdb persistence
      2) Partitioning!!!!

  • @yuanshaoqian
    @yuanshaoqian 6 months ago +1

    Great video, thanks a lot. I am also trying to study how to design ML recommendation based newsfeed (FB/IG/Twitter these days mostly rank the feed based on some ML score instead of post creation time), I couldn’t find a lot of material on it, could you make a video on this too?

  • @beecal4279
    @beecal4279 2 months ago +1

    thanks for the video! Is the part for reddit nested comment (from 28:32 ) now superseded by the (15)Reddit Comments video?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      I think the latter should be a superset of the former, agreed

  • @harikareddy5759
    @harikareddy5759 11 months ago +1

    Jordan, awesome video once again! What is up with the video descriptions? 😂

  • @tunepa4418
    @tunepa4418 5 months ago +1

    Hello Jordan. If we want to support finding friends of friends in this problem, I guess a graph database would be ideal for the follows table?

  • @javaadpatel9097
    @javaadpatel9097 1 month ago +1

    Hey Jordan, thanks for the awesome video. For the comments structure, how do you handle more than 26 comments? The comments are labelled "a", "b", etc., but what happens after the "z"th comment?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      Keep in mind an ID is used for a layer of nested comments. See how we can extend this structure to pretty much arbitrarily large size, depending on the number of characters that we use per Id?
      post
      | comment aa
      | comment ab
      | | comment abaa
      | | comment abab
      | | | comment ababaa
      | comment ac
      | | comment acaa
      ...
      | comment zz
      | | comment zzaa
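
      To make the "similar to GeoHash" analogy concrete, a toy sketch of how a whole subtree becomes one prefix range scan over those IDs (illustrative only; the comment IDs are just the example above):

      # Comments keyed by their DFS-style path ID; lexicographic order == DFS order,
      # so an entire subtree is a contiguous range sharing the parent's prefix.
      comments = {
          "aa": "top-level comment",
          "ab": "another top-level comment",
          "abaa": "reply to ab",
          "abab": "second reply to ab",
          "ababaa": "reply to abab",
          "ac": "third top-level comment",
      }

      def subtree(parent_id):
          # In a real index this is a range scan: parent_id <= key < parent_id + '{'
          return {k: v for k, v in sorted(comments.items()) if k.startswith(parent_id)}

      print(subtree("ab"))  # ab, abaa, abab, ababaa -- the whole nested thread under "ab"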

    • @javaadpatel9097
      @javaadpatel9097 1 month ago +1

      Perfect, that makes sense. This is a super useful way to think about nested comments and how you can show top-level vs nested comments using the BFS and DFS approaches.

  • @seemapandey2020
    @seemapandey2020 6 months ago +1

    Thanks.
    @jordan With CDC, does it mean that the entire user-follower mapping would be available in Kafka, all the time?
    Generally my view was that CDC is for 'change' capture, so it's good for incremental change processing. And the initiated 'change' stream of user-follower mapping updates, via the introduced Kafka producer, would be marked completed after processing via Flink and eventually flushed out.
    Though here it's being recommended as a replacement for persistence modelling. Need help creating a mental model of it - why so? Otherwise the approach is never an intuitive extension.
    How does the Kafka get re-populated across deployments, and after any reboots following an outage?
    Also, I appreciate your fresh thinking and detailed approach to the solution - is it really implemented practically for any similar use case in industry? Or is it limited to being discussed just theoretically?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      I'm not sure I understand your question. Once the data gets to flink and makes it into an S3 snapshot, it can be cleared from kafka. At that point, we'll always have the state available for us.

    • @seemapandey2020
      @seemapandey2020 6 months ago +1

      ​@@jordanhasnolife5163 My view is that - the stream processing of runtime upstream 'change' from Follower service on Flink as the change consumer would also be flushed from S3 snapshot eventually, once its change processing is complete.
      Would the entire user-following mapping be persisted and maintained on Flink even after the 'change' processing is complete? Otherwise, how does it serve queries on the user-following mapping?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      @@seemapandey2020 You're correct, I'm advising you cache the entire thing

  • @rishabbanerjee5152
    @rishabbanerjee5152 22 days ago +1

    For situations where there is causal dependency (comments in this video) why do you never use version vectors for eventual consistency and instead resort to single leader replication?
    Is it because
    1. It’s complicated.
    2. Until eventual consistency has happened, users might become confused on seeing missing comments?

    • @jordanhasnolife5163
      @jordanhasnolife5163  22 days ago +1

      Version vectors are great for telling me post hoc which writes are causally dependent, but they don't stop me from reading a replica that isn't causally consistent until something like anti-entropy occurs

  • @thunderzeus8706
    @thunderzeus8706 4 months ago +1

    Hi Jordan, I have another question about the News Feed Diagram slide.
    In the diagram, the post database, the follower database, the Kafka queues, and the Flink nodes are all partitioned on user_id. Is it true that although they share the same partition key, the number of partitions in each of the components can be different? Or do they share the exact same partition method and count?
    The reason I ask is that I think there could be a need for autoscaling in each of those components on their own. If they share the exact same partition method and count, it would be hard to keep those components in sync; if they actually do, I don't think it is an easy job. How is that achieved?
    If they do not share the exact same partition approach, say the post DB has 1000 partitions, the follower DB has 500 partitions, the two Kafka queues have 800 and 1000 partitions respectively, and the Flink nodes have 1000 partitions. Is there a load balancer in between each of those components to redirect them to the correct shard in the next component? On the other hand, partitioning on the same key is not too meaningful then, as we are doing a "semi shuffle" anyway.
    Thanks! Please let me know if the question is not clear to you.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      All that matters is that user X is always on the same node. How we actually shard each piece of this can be tweaked as much as we need to.

    • @thunderzeus8706
      @thunderzeus8706 4 months ago +1

      @@jordanhasnolife5163 Sorry I don't think I followed. What does "same node" refer to?
      As far as my understanding goes, the DB, the Kafka queue, and the Flink should be on DIFFERENT servers. In that case, from DB to Kafka, from Kafka to flink, each of those steps need to go from one server to the other (via a load balancer), or in other words need a network call.
      That's why I don't agree with "user X is always on the same node".

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      @@thunderzeus8706 You're correct, they're on different components. There is a load balancer (either a separate component, or each component itself is subscribed to ZooKeeper) between each layer.

    • @thunderzeus8706
      @thunderzeus8706 4 months ago +1

      @@jordanhasnolife5163 Thank you for the reply Jordan! If that is the case, is it true that "it is not important for the PostDB and FollowerDB to have the same sharding key userID as the Flink node"? Because whatever sharding key the DBs use, the data has to undergo a resharding step (a network call) before entering the Flink node, no matter whether it is sharded by userID, postID or some other ID. It feels similar to a MapReduce batch job, which involves a sort+shuffle step in between.
      When I watched this video for the first time, my intuition was "we save network calls by having every component share the same sharding key". According to your reply above, I guess that was not the case. Is that correct?
      I also want confirmation on what "same node" you referred to in your prior reply.
      Thanks!

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      @@thunderzeus8706 I simply meant that for a given Flink node, all data for user id x should be handled on that same Flink node.
      As for your first question, the databases can be sharded differently than Flink, due to the load balancer. However, you'd typically want Kafka to be sharded the exact same way as Flink so that each Flink node reads from one Kafka partition.

  • @onurozer7218
    @onurozer7218 12 days ago +1

    Thanks! Can you explain whether Flink is really capable of keeping all of this on its own?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 days ago

      Not really much to explain here as it's a binary answer, to which I'd have to say yes, assuming we have enough shards.
      Alternatively, we can skip the flink optimization and just ask our database who the followers of a given user are.

  • @rishabbanerjee5152
    @rishabbanerjee5152 22 days ago +1

    Can you please change the color when you write in such dense diagrams? It's difficult to follow when you make changes or write something in the same white color. For small diagrams it's fine.

  • @ramm5621
    @ramm5621 10 months ago +2

    Hey jordan,
    great video as always. So I had a question earlier until I realized why you did it. Diff question now
    So the tradeoff you made here is global secondary index on followerId vs having 2 DBs updated by CDC with exactly-once processing guarantee using flink?
    In terms of needing distributed transactions and having redundant data both approaches would be about even right? Maybe not having the redundant data (global index) eating up disk speeds up queries or is that wrong?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +4

      Hey Ram! So having a global secondary index here would mean that per write we'd need to update the secondary index right away (which could be on a different partition), hence we could be looking at a distributed transaction.
      With CDC, we avoid this, knowing that *eventually* our second table will be updated

    • @ramm5621
      @ramm5621 10 months ago

      @@jordanhasnolife5163 So supposing a downstream DB (followerId) write fails, we'd have to send a rollback event to rollback the change in the upstream DB (userId) right? But until then we're serving the data in a non-atomic manner. Vs. in a dist. transaction we never serve reads from a partially committed transaction but we suffer in terms of write performance and we can't read from the modified rows.
      Assuming I understand this correctly, I really like this tradeoff in terms of providing signal in interviews.
      So we could go with CDC with rollbacks even for a reservation/ticketmaster system (Isolation is strict but worst case show a room/ticket is taken and then they can retry) , whereas the only time you go with the dist. transaction is when you'd rather never show the non-atomic data than be slower.

  • @georgekousouris4900
    @georgekousouris4900 1 month ago +1

    Hi Jordan! Thanks a lot for all the videos!
    Small question, does it make sense to put the entire post in Kafka, and then subsequently on Flink, etc ? If the post is big/has media, wouldn't it be better to put just the post id through Kafka to Flink and then have the Feed Service fetch from the Posts DB ? Also perhaps the User DB to fetch the profile pic, name, etc ? So we minimise duplicated load on Kafka/Flink ?
    Otherwise, we could store the {user: (poster_id, post_id)} relation only in Flink and then fetch all the data and populate a cache for the user ?
    What do you think ?

    • @jordanhasnolife5163
      @jordanhasnolife5163  1 month ago +1

      While I see your point, the posts db is inherently sharded. Meaning to get a news feed, we'd have to read from many posts db shards. All media should just be URLs to S3.
      Question for you: what's the benefit of twitter limiting their post size to 140 characters? :)
      EDIT: I now realize that you're suggesting to not put the actual post data in kafka, and just fetch it from flink. I think that's a reasonable optimization, but in practice it wouldn't really save us that much time.

    • @georgekousouris4900
      @georgekousouris4900 1 month ago

      ​@@jordanhasnolife5163 By limiting to 140 characters, we can calculate the following:
      - 500 million tweets per day:
      - 300 bytes per tweet if we assume we only store (user_id, timestamp, tweet, uuid, media_link)
      => 150 GB of tweet data per day.
      If each user has ~100 friends, that's 15TB of replicated tweet data that we have to store on caches sharded by friend_id. We can probably have 60 beefy in-memory caches with 250GB RAM to host this with a TTL of 1 day?
      The next day, we can evict by least frequently used, since hopefully every user will have a newly curated news feed in their cache every day.
      However, we probably also want to show information about the user (Their full name, their profile pic, their handle). Perhaps we do not want to put this information with each message, since if a person makes 15 tweets (that we want to cache), their info shouldn't be so replicated.
      So could we perhaps have a Users DB cache, where we add to it a reader_id: (poster_id, poster_info) and then shard the caches by reader_id?
      This way the reader will look at the newsfeed cache, fetch all posts, and then fetch the appropriate user info from the Users Cache?
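
      For what it's worth, the arithmetic above works out; here is the same estimate as a quick sketch in Python (every input is the commenter's assumption, not a measured number):

      tweets_per_day = 500_000_000
      bytes_per_tweet = 300                      # user_id, timestamp, tweet, uuid, media_link
      raw_per_day = tweets_per_day * bytes_per_tweet
      print(raw_per_day / 1e9)                   # ~150 GB/day of raw tweet data

      followers_per_user = 100                   # assumed average fan-out
      replicated_per_day = raw_per_day * followers_per_user
      print(replicated_per_day / 1e12)           # ~15 TB/day replicated into feed caches

      cache_node_ram_gb = 250
      print(replicated_per_day / 1e9 / cache_node_ram_gb)  # ~60 cache nodes for a 1-day TTL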

    • @georgekousouris4900
      @georgekousouris4900 Месяц назад +1

      Side note: Couldn't see my own comment for a few minutes after I posted it, so does RUclips use multi-leader/leaderless replication for nested comments? 🤔

    • @jordanhasnolife5163
      @jordanhasnolife5163  Месяц назад

      @@georgekousouris4900 I wouldn't be surprised!

  • @just_a_normal_lad
    @just_a_normal_lad 10 месяцев назад +1

    Thanks for the wonderful video; I really loved it. I have one doubt regarding the part where you suggested using a CDC pipeline for updating the user-follows DB. The reason you provided is to avoid the overhead of maintaining distributed transactions, such as the 2-phase commit. For example, let's consider T1 as updating the User-follower DB and T2 as updating the User-follows DB. Maintaining transactionality between T1 and T2 is difficult without using SAGA or 2-phase commit. That's why the suggestion is to use a CDC pipeline.
    However, my question is: by using a CDC pipeline, are we not just replacing T2 with a Kafka producer call? Doesn't this still pose the same issue? What happens if the Kafka producer call fails, even with its multiple retry mechanisms? My concern is whether replacing the DB call with a Kafka producer call truly addresses the distributed transaction issue.

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад

      Yep, it's still an extra networking call, you make a fair point there. But it's also non-blocking if for whatever reason Kafka is down. I can still submit new follow requests, and then the other table can be updated "later", as opposed to needing both writes to go through at exactly the same time.
      If you need both tables to be perfectly consistent, sure, go ahead and use 2PC.

    • @just_a_normal_lad
      @just_a_normal_lad 10 месяцев назад +1

      Thanks for the reply. I got the idea. Looking forward to the next video

  • @douglasgomes9144
    @douglasgomes9144 5 месяцев назад +1

    Hey, Jordan! Which videos or playlist do you recommend for someone who wants to start from scratch? BTW thanks for the amazing content!

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 месяцев назад +1

      Perhaps I'd watch systems design concepts 2.0, and then system design problems 2.0!

    • @douglasgomes9144
      @douglasgomes9144 5 месяцев назад +1

      Thanks a lot! You are amazing!

  • @kaqqao
    @kaqqao 7 месяцев назад +1

    For the follower/following case, why not index on both columns? It is possible to have a secondary index on a non-partition-key column, right?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 месяцев назад

      If there are tons of these relationships, not all rows will be on the same node. Then you'll have to make a local secondary index, so for getting the counts for a given user you'd need to distribute your query.
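
      To make the scatter/gather point concrete, here's a hypothetical sketch in Python (made-up data): with the table partitioned by follower_id, the query in the partition-key direction hits one shard, while "who follows user X?" via a local secondary index has to touch every shard and merge the results:

      # rows are (follower_id, followee_id); shard = follower_id % 3 (illustrative only)
      shards = [
          [(3, 8)],          # shard 0
          [(1, 7), (4, 7)],  # shard 1
          [(2, 7), (2, 9)],  # shard 2
      ]

      def following_of(follower_id):
          # partition-key direction: a single shard answers the query
          return [followee for f, followee in shards[follower_id % len(shards)] if f == follower_id]

      def followers_of(user_id):
          # local-secondary-index direction: scatter to every shard, then gather
          return [f for shard in shards for f, followee in shard if followee == user_id]

      print(following_of(2))  # [7, 9] -- one shard read
      print(followers_of(7))  # [1, 4, 2] -- had to touch all shards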

  • @yashbansal1038
    @yashbansal1038 3 месяца назад +1

    yo jordan can you start adding links to the slides that you use to explain the system designs?
    Will help in keeping notes.

  • @jporritt
    @jporritt 4 месяца назад +1

    On Twitter people complain about shadowbanning, which is some sort of noticeably reduced interaction with their posts. They have a lot of weird and wonderful conspiracy theories about the mechanism behind this algorithm. Considering the above, though, the eventual consistency of the CDC part may explain what they're actually seeing.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 месяца назад

      Yeah this is interesting, I think there actually has been some legitimate evidence to the contrary though haha

  • @joonlee3863
    @joonlee3863 7 месяцев назад +1

    This may sound like a dumb question, but when designing the schema for the User-Follower table, you mentioned that the reason you chose not to write to both the User-Follower and User-Following DBs directly is to avoid 2PC, since it'd be a distributed write.
    But what if we put both User-Follower and User-Following Tables in the same DB? What other reasons are there other than there's no good way to partition the DB without screwing up the other table (distributed query)?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 месяцев назад +1

      You just named the best reason haha!

    • @joonlee3863
      @joonlee3863 7 месяцев назад

      @@jordanhasnolife5163 thanks! Was wondering if there were other good reasons on why you separated the 2 tables into separate DBs

  • @Rohit-hs8wp
    @Rohit-hs8wp 11 месяцев назад +1

    Have a question on the user-follow DB choice. You said write conflicts will not be an issue (why so?).
    Suppose I have 3 nodes and I am using quorum reads and writes (R=2, W=2). Suppose user 1 follows users 2, 3, and 4 in very quick succession.
    The User1-follows-User2 write goes to Node 1 and Node 2. The User1-follows-User3 write goes to Node 2 and Node 3. The User1-follows-User4 write goes to Node 1 and Node 3.
    Node 1 -> (User1, User2), (User1, User4); Node 2 -> (User1, User2), (User1, User3); Node 3 -> (User1, User3), (User1, User4).
    Now, since Cassandra uses LWW, one of these follow relationships would be lost based on timestamp.
    We could have mitigated this issue if we had used Riak, maintaining User_Follow(user_id, set of follower_ids) as a set CRDT, or User_Follow(user_id, follower_id) with version vectors and storing siblings in the face of concurrent writes.
    Please do comment your thoughts on this. (Thank you for the videos, I'm learning a lot from them.)

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 месяцев назад

      Hey, the reason the above should be fine is that those writes are to different rows, so they should never conflict with one another in the first place.
      Eventually, anti-entropy will take place and then we'll sync up as expected.

    • @Rohit-hs8wp
      @Rohit-hs8wp 11 месяцев назад

      @@jordanhasnolife5163 Yes you are right. Thank you for the reply.

  • @truptijoshi2535
    @truptijoshi2535 6 месяцев назад +1

    Hi Jordan, What is the difference between User-verified-following cache and the popular caches?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад

      One contains posts from verified users, the other contains which users are verified

  • @manishasharma-hy5mj
    @manishasharma-hy5mj 6 месяцев назад +1

    Hi Jordan, can you please explain the CDC part once more: how does it work, from which table to which table does it capture changes, and what advantage do we get? Please 🙏

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад

      1) When a user posts, we want to send their post to all their followers (assuming they don't have too many)
      2) We have all of those relationships in a database table already
      3) We need to load their followers, but this can be an expensive call to the DB
      4) We instead pre-cache all of this information by using change data capture on the DB to get it into a Flink instance, which we shard on posterId, so all of their followers will already be there
      5) When the post comes in, look at the followers, and send the post to the appropriate place (rough sketch below)
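
      A rough, purely illustrative sketch of steps 4 and 5, with plain Python dictionaries standing in for Flink keyed state and the feed caches (all names are invented):

      follower_state = {}   # poster_id -> set of follower ids (stand-in for Flink keyed state)
      feed_caches = {}      # follower_id -> list of posts (stand-in for the feed caches)

      def on_cdc_event(poster_id, follower_id, op):
          # step 4: CDC events keep the follower set for each poster warm locally
          followers = follower_state.setdefault(poster_id, set())
          if op == "follow":
              followers.add(follower_id)
          else:
              followers.discard(follower_id)

      def on_post_event(poster_id, post):
          # step 5: no DB call needed; the followers are already local to this key
          for follower_id in follower_state.get(poster_id, ()):
              feed_caches.setdefault(follower_id, []).append(post)

      on_cdc_event(42, 1, "follow")
      on_cdc_event(42, 2, "follow")
      on_post_event(42, {"post_id": 900, "text": "hello"})
      print(feed_caches)  # the post was delivered to followers 1 and 2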

    • @gayathrikarthik9041
      @gayathrikarthik9041 4 месяца назад +1

      @@jordanhasnolife5163 Thank you for the explanation. Another question on this - how much of the users' follower data will we capture into the Flink instance? What would be the strategy for this? And how long is the retention of user-follower data in Flink? Imagine a case where userA started following userB 3 years ago. Will this change data be available in Flink?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 месяца назад

      @@gayathrikarthik9041 Yes, and forever. We can re-partition and rebalance our data as necessary over time.

  • @akbarkool
    @akbarkool 8 месяцев назад +1

    Question on the common CDC pattern you use when adding data to 2 tables: why not just make 2 separate calls to these tables (with retries)? Why do you need 2-phase commit? (If rolling back is the issue, I don't see how the CDC pattern is helping with that.)

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 месяцев назад +1

      1) You can do this, now you just have to spend some more time thinking about all of the partial failure scenarios, how long you want to retry for, etc etc. In practice it's probably fine.
      2) When a write gets undone to table 1, that also goes through table 1's CDC and gets propagated to table 2.

    • @akbarkool
      @akbarkool 8 месяцев назад +1

      @@jordanhasnolife5163 All the extra thinking about retries and everything else is surely less than the effort to set up an entire Kafka queue + CDC pipeline? I think we are over-engineering this aspect of the design and an interviewer could push back on this.

  • @Summer-qs7rq
    @Summer-qs7rq 10 месяцев назад +1

    Thanks for this lovely video. Appreciate your efforts here.
    However, I have a question regarding nested comments.
    What is the downside of using a document DB for storing nested comments? Is there a situation that makes a document DB more optimal than a DFS-based index for nested comments? Could you please shed some light on the decision making between documents vs. DFS?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад +1

      Yeah, I think with a document DB you lose the ability to query for a comment by ID. Instead you basically have to fetch the parent comment and traverse down, as opposed to being able to get it with a single query off the bat.
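
      For context on the alternative being compared here, a hypothetical sketch of a DFS-style (materialized path) index for nested comments, which keeps both access patterns cheap; the keys and values are made up:

      # Each comment's key is its parent's key plus its own position. Keys sort in
      # DFS order, so a whole subtree is one contiguous prefix range, while any
      # single comment is still addressable directly by its key.
      comments = {
          "001":         "top level comment",
          "001.001":     "first reply",
          "001.001.001": "reply to the first reply",
          "001.002":     "second reply",
          "002":         "another top level comment",
      }

      def subtree(prefix):
          # with keys kept sorted (e.g. as a clustering key), this is a range scan
          return {k: v for k, v in sorted(comments.items())
                  if k == prefix or k.startswith(prefix + ".")}

      print(subtree("001.001"))    # the first reply plus everything nested under it
      print(comments["001.002"])   # one comment fetched directly by its key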

  • @PoRBvG
    @PoRBvG 4 месяца назад +1

    Hi Jordan!
    Great content as always! I have a question regarding the setup with two tables:
    Table 1: User -> Followee
    Table 2: User -> Follower
    I understand you used CDC (log-based, I assume) through Kafka and Flink from Table 1 to Table 2. The reasoning behind this makes total sense, allowing swift access to all followers and followees, partitioned by user (id). My question is, do we use the first table for pull (fanout-on-read) for celebrity users, and the second table for push (fanout-on-write) for regular users?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 месяца назад +1

      Yep seems correct to me

    • @yiannigeorgantas1551
      @yiannigeorgantas1551 2 месяца назад +1

      From my understanding, for celebrity users, we're trading less storage cost for more compute cost. Celebrity posts are merged into a user's feed when they read their feed, and this happens each time they reload their feed (requires more compute). However, celebrity posts don't get copied millions of times over in the feed cache, which saves storage.
      I think the main reason for making this trade-off is to avoid having a celebrity user block other messages from progressing with a large fan-out when they post.
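
      A toy sketch of that hybrid read path (all names hypothetical): posts pushed to the user's feed cache at write time are merged, at read time, with posts pulled from the shared popular/celebrity cache for the celebrities this user follows:

      def build_feed(user_id, precomputed_feed, celebrity_posts, followed_celebs, limit=50):
          # pull recent celebrity posts at read time instead of fanning them out on write
          pulled = [p for celeb in followed_celebs.get(user_id, [])
                    for p in celebrity_posts.get(celeb, [])]
          merged = precomputed_feed.get(user_id, []) + pulled
          return sorted(merged, key=lambda p: p["ts"], reverse=True)[:limit]

      feed = build_feed(
          user_id=1,
          precomputed_feed={1: [{"ts": 10, "text": "post from a normal friend"}]},
          celebrity_posts={99: [{"ts": 12, "text": "post from a celebrity"}]},
          followed_celebs={1: [99]},
      )
      print(feed)  # celebrity post first (newer), then the pushed post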

    • @yiannigeorgantas1551
      @yiannigeorgantas1551 2 месяца назад +1

      One more observation: when the fan-out happens at write time, it needs to be coordinated by a single shard. When it happens at read time, that computation is spread across a cluster of nodes.

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 месяца назад +1

      ​@@yiannigeorgantas1551 especially agree with your second point about blocking other posts

  • @techlifewithmohsin6142
    @techlifewithmohsin6142 5 месяцев назад +1

    Do we really need the complex CDC approach? Databases like DDB with a secondary index can be used, where user_id is the partition key and follower_id is the secondary index. This solves both followers and following in the same table.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 месяцев назад

      How do you deal with different partitioning schemas? DDB has global indexes, but then you need two-phase commit.

    • @techlifewithmohsin6142
      @techlifewithmohsin6142 5 месяцев назад

      @@jordanhasnolife5163 When we add a relationship or entry in DDB it would be a single transaction; there is no two-phase commit.

    • @techlifewithmohsin6142
      @techlifewithmohsin6142 5 месяцев назад +1

      @@jordanhasnolife5163 The idea here would be to use an LSI, given that for this use case we don't need a GSI, so the two-phase commit (which happens internally with the shadow table) would be avoided. It can possibly lead to a hot partition if we're querying a particular user more often, but then our cache will come into play anyway. So I think DDB can further simplify this. A good trade-off to consider.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 месяцев назад +2

      @@techlifewithmohsin6142 I don't think I agree that a local secondary index would be sufficient here. If I want to find all followers of user x and all people that user x follows, how would I partition the table so that I can efficiently do this without duplicating a ton of data?

    • @techlifewithmohsin6142
      @techlifewithmohsin6142 5 месяцев назад

      ​@@jordanhasnolife5163 yeah now I'm realizing, the two phase commit, we would need GSI.

  • @pbsupriya
    @pbsupriya 6 месяцев назад +1

    Hi Jordan, Thank you for all the content. I have a question - Why can't we use a secondary index for fetching followers/following mentioned at 5:40?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад +1

      Because they are sharded differently! Using a local secondary index means that one of the two tables would require a scatter/gather read request.

    • @pbsupriya
      @pbsupriya 6 месяцев назад

      @@jordanhasnolife5163 Oh okay. Thank you. Great content. I recommended it to 10 people in 2 days :)

  • @Didneodl
    @Didneodl Год назад +1

    Thanks for another great video! What do you think about supporting a "people you might know" feature like on FB?
    In that case we'd have to traverse more depth in the follower/followee relationships. Could a columnar DB still be a good option, or would you go with a graph DB in that case?

    • @jordanhasnolife5163
      @jordanhasnolife5163  Год назад +1

      Well I guess the interesting thing here is that potential friends all have mutual friends, so every time you add a friend you could add all of their friends as potential ones.
      But agreed generally speaking something like a graph db may be best

  • @harinimurali1180
    @harinimurali1180 Месяц назад +1

    Can we use graph db for storing follower/following relationships?

  • @Prakhar1405
    @Prakhar1405 9 месяцев назад +1

    What happens when a Flink node is down during caching? Will this cached data be stored in S3 as well? How will we recover from this?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 месяцев назад

      If a Flink node goes down, we can bring up another new node, restore the state of the failed node from its previous checkpoint in S3, and then replay the Kafka queues beginning from the checkpoint barriers associated with this S3 checkpoint. I'd probably recommend watching the video that I made about Flink!

  • @levyshi
    @levyshi 6 месяцев назад +1

    One question on the post service: when a user is posting, do they send the photos to S3 first, and then after the upload is done, make a call to the post service? Or do they send the photo to the post service, and the post service writes to S3? How should we handle the post service failing after the upload to S3 is complete but before it can write to the Cassandra DB?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад

      Ultimately that's your call.
      You probably have less ability to do validation uploading right to S3, but also less latency than having to upload to a server first.
      You can just do a client retry for the second part. If your client is down, whatever, we have an extra image in S3, no big deal.
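
      If we go with the "upload straight to S3 first" option, one common pattern is a presigned URL; here's a hedged sketch with boto3 (the bucket and key names are made up, and this isn't necessarily how the video's design does it):

      import boto3

      s3 = boto3.client("s3")

      def create_upload_url(media_key: str) -> str:
          # the post service hands this URL to the client, which PUTs the image
          # bytes directly to S3, skipping an extra hop through our servers
          return s3.generate_presigned_url(
              "put_object",
              Params={"Bucket": "hypothetical-post-media", "Key": media_key},
              ExpiresIn=600,
          )

      # Client flow (informal):
      #   1) url = create_upload_url("posts/123.jpg"), then PUT the image bytes to url
      #   2) call the post service with the resulting S3 URL; retry on failure --
      #      worst case an orphaned image sits in S3, which is cheap to tolerate or GC later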

  • @user-xu3nx9tk8v
    @user-xu3nx9tk8v 9 месяцев назад +1

    Hi, thanks for posting this. I have a question: I don't understand how we are using Cassandra to store
    the user-follower relationship at 9:42.
    1. Please correct me if I am wrong, but the Cassandra primary key has to be unique, so you cannot
    represent "user 1 has user 2 and user 3 as followers" like this:
    user1 -> user2
    user1 -> user3
    Are we bundling (user, follower) into a single primary key, and querying by partition key?
    2. If user1 has 100 million followers (for example Elon Musk or Justin Bieber), storing them in the same partition might not be a good idea?
    I might be misunderstanding something; can you elaborate on how you represent the user-to-follower relationship in Cassandra?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 месяцев назад

      1) The Cassandra primary key is the combination of a partitioning key and the clustering (or sorting key). The partitioning key here is the first user, and the clustering key is the second user.
      2) Yeah, you're correct that for someone with 100 million followers we probably can't store this all on one partition. We can introduce a second component to our partitioning key which has some number (from say 1-100) that we only use for popular users. Then we can perform re-partitioning for our popular user data.
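
      For concreteness, here's a hypothetical CQL schema along the lines of this answer, embedded as a Python string (table and column names are assumptions): the partition key is (user_id, bucket) and the clustering key is follower_id, so (user1, user2) and (user1, user3) are two distinct rows sharing a partition, and the primary key stays unique without any special bundling:

      # Hypothetical CQL illustrating the answer above -- not the exact schema from the video.
      USER_FOLLOWERS_CQL = """
      CREATE TABLE IF NOT EXISTS user_followers (
          user_id     bigint,    -- the user being followed
          bucket      smallint,  -- 0 for normal users; 1..N for very popular users to split the partition
          follower_id bigint,    -- one follower per row
          PRIMARY KEY ((user_id, bucket), follower_id)
      );
      """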

  • @John-nhoJ
    @John-nhoJ Год назад +1

    Disagree with the scatter-gather for followers/following. Index both fields, shard on the user_id of the follower. It's not a linear scan if you use both indices.

    • @jordanhasnolife5163
      @jordanhasnolife5163  Год назад +2

      You can't index on both fields on the same table unless you mean an inner sort, which doesn't help us get all followers for a given user without a linear scan.
      Feel free to elaborate

    • @John-nhoJ
      @John-nhoJ Год назад +1

      @@jordanhasnolife5163 Table - follower
      follower_user_id: PK, idx (shard key)
      followed_user_id: FK (user.id), idx
      get_followers_by_user_id(user_id) will have to do scatter gather over replicas of the shards, but you don't have demands for strong consistency and most places prevent you from paginating deeply into someone's followers. E.g. no way Twitter will show you all of Taylor Swift's followers.

    • @John-nhoJ
      @John-nhoJ Год назад +2

      @jordanhasnolife5163 why no response? Cowering in fear?

  • @jianchengli8517
    @jianchengli8517 10 месяцев назад +1

    What if a user newly subscribes to another user and refreshes the timeline? How does this streaming solution work, given that apparently there will be a cache miss for posts from this new followee (posted before the subscription)?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад +1

      Yep, you just don't see them. Not the end of the world, there are no guarantees on the news feed.

  • @satvik-kaushik
    @satvik-kaushik Месяц назад +1

    I have a doubt, if you could please help me out.
    In the follower-following relationship, you mentioned that we can just merge relationships present in different database nodes, but what if one DB has received the call to delete while the following relationship is still present in another? How do we ensure consistency then?

    • @jordanhasnolife5163
      @jordanhasnolife5163  Месяц назад

      Not entirely sure what you mean here.
      We have two tables: user-followers (I'll call this A), and user following (I'll call this B).
      B is derived from A via change data capture. If I want to delete an entry, I just delete it from A. It's using single leader replication so I don't believe I should have to worry about consistency issues here.
      The tables are eventually consistent with one another.

    • @satvik-kaushik
      @satvik-kaushik Месяц назад +1

      8:00 seems like A is using leaderless replication.
      Are we saying that if a conflict exists for a while in the DB we don't really care because it is eventually consistent and we can show stale data to a client for a while since it's not critical to our functionality?

    • @jordanhasnolife5163
      @jordanhasnolife5163  Месяц назад +1

      @@satvik-kaushik Ah damn forgot I used Cassandra. I think assuming we're using quorums we should have very few conflicts, but ultimately we'll converge to a "consistent state", even if it isn't necessarily the correct one. E.g. I follow, unfollow, and follow again and the DB thinks that I unfollowed. That being said, this isn't critical state so who cares haha

    • @satvik-kaushik
      @satvik-kaushik Месяц назад

      @@jordanhasnolife5163 thanks for taking out the time to reply to all comments and help people with doubts, massively appreciated!

  • @CompleteAbsurdist
    @CompleteAbsurdist 6 месяцев назад +1

    For the posts DB, what's your opinion on using MongoDB? At the end of the day, the post data is almost always text. Won't a document-based DB be suitable instead of Cassandra?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 месяцев назад

      I don't really think it's JSON text, so as long as it isn't too long (e.g. 140 characters) I don't think that the document format would make a huge difference, but hard to say!

  • @mcee311
    @mcee311 Год назад +2

    For write conflicts, what if a user follows then unfollows? I guess last write wins in this situation?

    • @jordanhasnolife5163
      @jordanhasnolife5163  Год назад +3

      Yeah basically, it's an edge case but ideally shouldn't happen too often. Fortunately if we screw up a following it's not the end of the world.

  • @scbs2007
    @scbs2007 10 месяцев назад +1

    "Kafka - partition by follower Id": do you think it is a pragmatic approach to have tons of partitions in kafka? instead should we not have created a limited set of partitions and mapped users to one of the entries in that set? Or did you mean to say the latter?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад

      When I say "partition by follower Id", what I really mean is "use a partitioning function that uses the range of hashes on follower id". Agree with your concern otherwise!

  • @hung291100
    @hung291100 10 месяцев назад +1

    So in conclusion write conflicts won’t be a problem if the data doesn’t usually get updated?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 месяцев назад

      What are you saying this in reference to? I'd say that write conflicts aren't a problem if we don't have writes that overwrite one another.