3: Dropbox + Google Drive | Systems Design Interview Questions With Ex-Google SWE

  • Published: Nov 21, 2024

Comments • 162

  • @ricardobedin2953
    @ricardobedin2953 11 months ago +45

    Hey Jordan -- just wanna say thank you. I got multiple staff-level offers with big tech companies and your videos were the main resource I used for systems design. You are doing a phenomenal job that no other channel (that I know of) is close to doing. Thanks my man!

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +3

      Let's gooo! Congrats Ricardo, makes me super happy to hear! Glad your hard work paid off

    • @johnlong9786
      @johnlong9786 9 months ago +2

      And thank you, Ricardo, for referring me here in your article about your journey.

    • @truptijoshi2535
      @truptijoshi2535 5 months ago

      Hey @ricardobedin2953, could you please let me know if you used any other source along with this one for staff-level offers?

  • @debarshighosh9059
    @debarshighosh9059 8 months ago +17

    Hey Jordan, in the capacity estimation, I think the total doc size should be 10 PB (1B users * 100 docs/user * 100 kB/doc = 10^9 * 10^7 bytes = 10^16 bytes = 10 PB), rather than 100 TB
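
    A quick sanity check of that arithmetic in Python, using the comment's own numbers:

    ```python
    # 1B users * 100 docs/user * 100 kB/doc, as stated above
    users = 1_000_000_000
    docs_per_user = 100
    bytes_per_doc = 100 * 1_000           # 100 kB
    total_bytes = users * docs_per_user * bytes_per_doc
    print(total_bytes / 10**15, "PB")     # -> 10.0 PB, not 100 TB
    ```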

  • @knightbird00
    @knightbird00 12 days ago +1

    Bookmarks for my revision
    8:26 Chunking (checksum, integrity)
    9:46 Chunks table - (replication type, versions, single leader, multi leader, siblings)
    18:36 Uploading file (S3, DB 2PC, may be multi part, pre-signed)
    20:35 Pushing files to users
    23:00 File routing (CDC for all file changes, Ref 4:18 Document permissions)
    Can also add things related to the client and client-side database, Merkle trees for detecting file changes, erasure coding for block replication, etc.

  • @zhonglin5985
    @zhonglin5985 7 months ago +1

    hey Jordan, nice content again, and thanks for answering all my previous questions! One question for this video -- how would you handle hash collisions when you use MD5 as the unique identifier for blocks?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      I think these would be pretty darn infrequent, but if we needed to we could perform the hash check; we only skip the re-upload if the chunk with the same hash corresponds to this fileId
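
      For illustration, that dedup check could look something like the sketch below; the store.get call is a hypothetical lookup against the chunk metadata keyed on (fileId, hash), and the byte comparison is the collision guard:

      ```python
      import hashlib

      def chunk_hash(data: bytes) -> str:
          # MD5 as a dedup fingerprint; collisions are vanishingly rare but possible.
          return hashlib.md5(data).hexdigest()

      def needs_upload(store, file_id: str, data: bytes) -> bool:
          # store.get is hypothetical: returns the stored chunk bytes for
          # (file_id, hash), or None if this file has no chunk with that hash.
          existing = store.get(file_id, chunk_hash(data))
          if existing is None:
              return True               # no chunk with this hash for this file
          return existing != data       # byte-compare to rule out a collision
      ```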

  • @davidwang9350
    @davidwang9350 5 months ago +2

    Thanks for these great videos Jordan! Very thorough and well-explained.
    Quick question: If we wanted to support the additional functionality of "given a file, return which users already have access to it", what would you opt for? I was thinking that the query to go from "fileId" --> "users with access" is slow since the Permissions DB is indexed/partitioned on userId. Would the ideal solution be similar to what you employed in the Twitter design? Use CDC from the Permissions DB to construct a new DB (basically derived data) which partitions and indexes on fileId?

  • @1986yatinpatil
    @1986yatinpatil 5 months ago +2

    Hey Jordan,
    Great content! Your system design videos offer a refreshing perspective compared to the repetitive patterns seen online.
    I have a question regarding storing files (and posts from your previous video) in a per-user cache. Is this cache stored in memory or persistently? Also, in the event of data loss from this per-user cache, would the setup involving Kafka, Flink, and the cache be capable of rebuilding the state from scratch?
    And similar question about the caching we do on the flink node. If the Flink node goes down, can it restore the state of the in memory cache or will it have to replay all messages from Kafka to restore the state?
    Thanks!

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      Realistically, I'd think it would have to be on disk, since otherwise that would be memory usage per user which could be expensive! While there's not a great way to rebuild the cache downstream (without it being in the flink ecosystem and having one flink node routing messages to a per user flink cache via kafka), the cache will refill itself as more people post!

  • @yrfvnihfcvhjikfjn
    @yrfvnihfcvhjikfjn 11 months ago +1

    Hey Jordan. I just wanted to express my condolences. I know you are going through a tough time now after such a significant loss. 😢

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +2

      I appreciate the condolences. Not every day you lose your virginity

    • @yrfvnihfcvhjikfjn
      @yrfvnihfcvhjikfjn 11 months ago +1

      @@jordanhasnolife5163 your voice does sound pretty raspy in this video

  • @ShreeharshaV
    @ShreeharshaV 2 months ago +1

    Great video Jordan. I have a few questions below.
    1) You mentioned that the client first uploads chunks to S3, followed by calling the chunk service to update the metadata (S3 URL link). Can you please explain how the client directly starts uploading to S3 first? How does the client know where to upload?
    2) How does the CDN come into the picture here? Does S3 have some integration with the CDN so that it automatically caches files to the CDN? Can you explain how that happens in a little more detail?
    3) How does a file read happen on the client? Didn't follow how the client is subscribing to those Kafka topics. Does it happen through an API?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      1) Look into signed URLs, you can ask the server to basically give you an S3 link to upload to. Alternatively, you can just upload via a server, but then you have some extra network hops there.
      2) I can't say exactly how it works, as I imagine it changes from CDN to CDN, but yeah I think it's pretty easy to hook a CDN like cloudflare into S3. Give it a google.
      3) The client figures out which documents it needs to listen to changes for and the client application can listen to the appropriate servers. In retrospect, having the client subscribe to the kafka broker directly may have been a bit short sighted on my end, it may be better to use a dissemination approach that looks a bit more like facebook messenger or twitter where the client has a "change server" that is subscribing to kafka on its behalf.
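
      For reference, the pre-signed URL flow from point 1 is a standard S3 feature; a rough boto3 sketch (bucket and key names are made up):

      ```python
      import boto3

      s3 = boto3.client("s3")

      # Server side: mint a short-lived URL the client can PUT the chunk to
      # directly, skipping the extra hop through our own servers.
      upload_url = s3.generate_presigned_url(
          "put_object",
          Params={"Bucket": "drive-chunks", "Key": "file123/v2/ab54f0"},
          ExpiresIn=900,  # valid for 15 minutes
      )
      # Client side (e.g. with requests): requests.put(upload_url, data=chunk_bytes)
      ```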

  • @krish000back
    @krish000back 2 months ago +1

    Hey Jordan, thanks for the great content in every video.
    A couple of clarification questions:
    1. For the Chunk DB, how will a lock be taken for a single-leader DB? Will it be materialized locking, or are we going to create some dummy entry? What if the database doesn't support materialized locking?
    2. Can we move the file version lock (optimistic) from the Chunks DB to the File DB, since we can add an entry without requiring any file upload and don't need any special locking on the DB end? Probably the following schema:
    FileMetadata: userId (PK), filePath_version (SK), fileId, status .....
    Chunks: fileId, chunkHash.....
    3. How do user permissions for files work in the CDN? Does the CDN hit a service every time a user tries to access the file?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      Sorry, maybe I'm forgetting some of the points that I made in the video, so timestamps would be useful.
      1) If it's SQL, we can just use a transaction to atomically upload multiple chunks to the metadata db at once.
      2) Sure, as long as there's a version number in that table
      3) I guess you'd have to authenticate with a service, yes. Or, optionally, the CDN could cache some sort of session token for each user and what that grants them access to.

    • @krish000back
      @krish000back 2 months ago

      @@jordanhasnolife5163 Thanks, #2 and #3 are clearer now.
      For #1, you mentioned locking at 11:30. But yeah, if we write atomically, it makes sense. I believe that will be done before the client uploads any chunks at all.
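
      To make the atomic write in #1 concrete, a minimal sketch assuming a hypothetical chunks metadata table; in Python's sqlite3, the `with conn` block is a single transaction, so all chunk rows for the new version commit together or not at all:

      ```python
      import sqlite3

      conn = sqlite3.connect("metadata.db")

      def commit_version(file_id, version, chunks):
          # chunks: list of (chunk_index, chunk_hash, s3_url) for the new version.
          with conn:  # one transaction: every row commits, or none do
              conn.executemany(
                  "INSERT INTO chunks (file_id, version, chunk_index, chunk_hash, s3_url) "
                  "VALUES (?, ?, ?, ?, ?)",
                  [(file_id, version, i, h, url) for (i, h, url) in chunks],
              )
      ```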

  • @siddharthgupta6162
    @siddharthgupta6162 9 months ago +4

    This is so much better than the other Dropbox system designs and so easy to understand. You are a gem, bro!
    However, I do see you use a lot of CDC in the system designs - is it used in industry as well? Or is it basically because we don't want to assign another server for this stuff and let the DB take care of it? Also, can we not use it with Spark consumers?
    I have never seen CDC used in Grokking (didn't like the content so stopped using it), and also haven't seen it in my career (so far), so just curious.

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +5

      From what I understand CDC is a pretty new design pattern, however it's really semantically similar to just putting a write in Kafka and then using a stateful consumer to handle it and upload it to multiple places; it just changes which data sink is the source of truth!
      The main reason that I use CDC is that, to keep multiple data sources in line, using stream processing frameworks with guarantees about message delivery is to me a lot better than using two-phase commit all over the place or having to reason about failure scenarios.

  • @felixliao8314
    @felixliao8314 3 months ago +2

    thanks for the content buddy. also, is that a protein powder on your burner?

  • @deepitapai2269
    @deepitapai2269 5 months ago +1

    Great video, Jordan! Could you elaborate a little on why we chose a Kafka queue over a cache for unpopular file changes?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      While we could aggregate the full document state in a server and send that to the user, we want to only send incremental changes that a user needs to apply. We can do this by having them listen to a kafka queue of these ordered incremental changes to those files.

    • @samsai4460
      @samsai4460 4 months ago

      @@jordanhasnolife5163 That doesn't make sense to me: why not give users the latest chunks via a cache, instead of making them build the file incrementally (isn't that slow)?

  • @just_a_normal_lad
    @just_a_normal_lad 9 months ago +1

    Wow, I truly appreciate your videos! Every time I watch them, it serves as a powerful reminder of how much there is to learn. One doubt: what do you mean by sharding in Kafka? AFAIK there is no sharding concept in Kafka. Two approaches I can think of: 1. Have one Kafka cluster set up for each shard, with a single topic, and push all events to it. 2. Have one Kafka cluster set up with multiple topics, one per shard, so clients push events to a specific topic based on the sharding logic. Anything else you wanted to convey, or any other approach I missed?

    • @just_a_normal_lad
      @just_a_normal_lad 9 months ago +1

      @jordanhasnolife5163 Also, whichever approach we select, we're sure that for each user we need a separate consumer group so that it can subscribe to the single Kafka queue/topic. Are we OK with having billions of consumer groups in Kafka? One more tradeoff with this approach is that if User1 and User2 belong to the same shard, then both of them will be reading the same set of events; some events won't be applicable to most users, but they will still consume and ignore them.

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      When I say sharding I mean partitioning, which definitely exists in Kafka. I don't need a single consumer per userId, but I do want to ensure that each userId has only one consumer reading messages that correspond to it, hopefully this makes sense!
      Yes, I don't particularly mind that the same consumer will be handling requests of multiple users, otherwise we'll have billions of consumer groups like you mentioned.
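
      For illustration, keying by userId with kafka-python might look like the sketch below (the topic name and variables are made up); messages with the same key always hash to the same partition, so one consumer sees all of a given user's events, in order:

      ```python
      from kafka import KafkaProducer  # pip install kafka-python

      producer = KafkaProducer(bootstrap_servers="broker:9092")

      # Keyed by userId: Kafka hashes the key to pick the partition, so every
      # change event for this user lands on the same partition, in order.
      producer.send(
          "file-changes",
          key=str(user_id).encode(),  # user_id: the user this change is for
          value=change_bytes,         # already-serialized change metadata
      )
      ```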

  • @iknowinfinity
    @iknowinfinity 4 months ago +1

    Hey Jordan, thanks for the awesome videos!
    I see that you often use Flink in your designs. So, I believe that it's assumed to be a full-fledged consumer with all the checkpointing and everything. If yes, then won't that be expensive? Are there any alternatives?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      Checkpointing does have the side effect of taking up storage; you could skip it, but then you'd have to re-read many more Kafka messages / lose fault tolerance.
      Under which part of this would you consider it "expensive"?

    • @iknowinfinity
      @iknowinfinity 4 months ago

      @@jordanhasnolife5163 It seems a little overwhelming to me to have a job manager, Zookeeper, storage, keeping track of barrier messages, etc. just to have fault tolerance. It might be that this is the bare minimum required for achieving fault tolerance, but I was wondering if there is anything better here?

  • @chingyuanwang8957
    @chingyuanwang8957 9 months ago +1

    Hi Jordan, amazing video, it really helped me out! Quick question on the file routing part: if the file cache only contains the file ID, and we still need to check the file's details, why is it necessary to propagate file changes to the user? Thanks for clarifying!

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +1

      Hey Ching Yuan! I'm not 100% sure what you mean here. We need to let users know when the chunks of a given file have changed so that they can go to S3 and fetch the new chunks. Feel free to clarify and I'll get back to you.

  • @AdityaRaj-bo9qe
    @AdityaRaj-bo9qe 6 months ago +2

    Hi Jordan, regarding Flink, there is the concept of a window: the interval of time over which we do the join, so that we don't store a stream's entire lifetime of data in memory. What window type should be used here?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      Hey! No window here, as I'm attempting to limit the amount of data that we're performing joins on (so... infinite window)

  • @michaelparrott128
    @michaelparrott128 11 months ago +2

    Would be interesting to hear about how exactly the file reader connects to these queues. Are the queues something like SQS? How would a client device (e.g. phone or computer) connect to that? When thinking about it, I thought one solution could be to broadcast that there is a new version for file X, then the client goes to a file reader service to read the new chunk data.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago

      The queues are just kafka queues, super easy to subscribe to a given topic

    • @bshm1718
      @bshm1718 11 months ago

      @@jordanhasnolife5163 Since we have 1 billion users, can we have 1 billion topics in Kafka, with each user subscribing to their topic? Can Kafka scale to that many topics?

    • @deantodoroski5059
      @deantodoroski5059 10 months ago

      @@jordanhasnolife5163 Do you suggest client apps (android, ios, web, desktop) connect directly to Kafka? I have no experience with Kafka; a quick google search shows people recommending against it and putting some proxy in front of Kafka. Do you have experience with this? Btw, great videos! :)

  • @vaishnavimanjunath7901
    @vaishnavimanjunath7901 4 months ago +1

    Hey Jordan, thanks for this great video.
    1. What exactly do we store in CDC when a file changes? Is it the file_id and the chunk IDs that are new/updated? In that case, on the user end we pull chunks that the user does not have, but how is chunk ordering maintained?
    2. On file upload, the file can go through a block server that divides the data into chunks. Once a user makes changes to a file, how are new chunks and updated chunks identified? Does chunk creation for new content happen on the user end?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      1) Yep! The chunk order is determined when we write to the database (ultimately we can just add a full new "version" of the metadata in the DB, and put this in kafka, as the metadata itself shouldn't be super big). The client then looks at which chunk hashes it is missing when it receives that metadata list.
      2) Yeah this can be done client side, think of something similar to a git diffing algorithm (probably merkle trees).
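
      A toy version of that client-side diffing, assuming fixed-size chunks; real systems often use content-defined chunking or Merkle trees so that an insert near the start of a file doesn't shift and invalidate every later chunk:

      ```python
      import hashlib

      CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size used in these comments

      def chunk_hashes(data: bytes) -> list[str]:
          return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
                  for i in range(0, len(data), CHUNK_SIZE)]

      def chunks_to_upload(old_hashes: list[str], new_data: bytes) -> list[int]:
          # Indices of chunks whose hash changed, or that are brand new.
          new_hashes = chunk_hashes(new_data)
          return [i for i, h in enumerate(new_hashes)
                  if i >= len(old_hashes) or old_hashes[i] != h]
      ```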

  • @dmitrigekhtman1082
    @dmitrigekhtman1082 9 months ago +2

    Good stuff!
    I agree with the guy further down in the comments that the final read interaction with the queues is confusing -- what is the end-user doing with the file changes coming off the queue and how does the user read from the queue?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +2

      Sure! So ideally what the messages are in the queues are basically just telling you the diff between the current chunks of the file and the new version of the file. So from there you could basically see the S3 urls of those different chunks and go fetch them.
      The user reads from the queue by subscribing to their own partition (based on userId). It can also subscribe to 10-20 "popular partitions" for documents with tons of users.

    • @AmitYadav-cn6gj
      @AmitYadav-cn6gj 9 months ago +1

      @@jordanhasnolife5163 - How would this design deal with permissioning of the popular files? It seems like users would have visibility into all the popular files (since they are subscribing to 10-20 "popular partitions") regardless of whether they have permission to even be aware of the presence of these files?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +1

      @@AmitYadav-cn6gj Yep, good point! I think that for this particular case you would basically have to interact with these queues via an intermediate server which is aware of the permissions and sends you documents back accordingly.

    • @rakeshvarma8091
      @rakeshvarma8091 5 months ago +1

      @@jordanhasnolife5163 In this scenario, the intermediate server will end up sending the document to several users, right? Since it is popular.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago +1

      @@rakeshvarma8091 I'd say it would be more likely that users would query the popular server rather than it pushing the documents to users. You could also have many consumers on the popular queue so that you can have many servers that serve requests here.

  • @huguesbouvier3821
    @huguesbouvier3821 11 months ago +1

    Thank you for the video! I really like your videos; the quality of the second batch is excellent. How you dive deep into the problems makes for very interesting viewing.
    For the chunk DB, how would you solve the issue of 2 writers creating a new version at the same time (phantom reads, I think?). Would the solution be materializing the conflict? How could we do that?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago

      Assuming it's single-leader replication, the idea is basically just to use a predicate lock on the fileId and version you're claiming.
      This should ideally be fast as we have an index on those fields!

    • @kanani749
      @kanani749 7 months ago +1

      @@jordanhasnolife5163 I want to build off the other question. Could you provide clarification if possible?
      In order to achieve fast throughput when writing to the chunk DB, would serializable snapshot isolation be the most optimal way of resolving potential concurrency issues?
      Additionally, would the predicate lock be on the file ID and version #? Would this be to prevent write skew, and specifically a phantom write?
      Also, I'm assuming we would need to use ZooKeeper or etcd to manage holding these distributed locks?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago +1

      @@kanani749 I think it really depends on how often you expect to have to lock. Even something like a primary key I would think needs to get a lock if it's to be monotonically increasing, so in reality, the true answer is as always, it depends :)
      This being said, I think this is something that gets abstracted from you in the db so you likely don't have to worry much about it.
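
      One way to picture the version-claim idea from this thread without an explicit predicate lock, assuming a hypothetical file_versions table with a UNIQUE constraint on (file_id, version): whichever writer inserts first wins, and the loser re-reads and retries:

      ```python
      import sqlite3

      def claim_next_version(conn, file_id: str, last_seen: int) -> bool:
          # Assumes: CREATE TABLE file_versions (file_id TEXT, version INTEGER,
          #                                      UNIQUE (file_id, version))
          try:
              with conn:  # single transaction
                  conn.execute(
                      "INSERT INTO file_versions (file_id, version) VALUES (?, ?)",
                      (file_id, last_seen + 1),
                  )
              return True                # we own version last_seen + 1
          except sqlite3.IntegrityError:
              return False               # a concurrent writer claimed it first
      ```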

  • @5atc
    @5atc 10 months ago +4

    Wouldn't the chunk versioning result in a lot of additional metadata DB rows if a small change is made to a large file? E.g. if a single 4MB chunk is modified in a 1GB file, does that mean we increment the version number and add 250 new rows to the metadata DB, where 249 rows point to the existing S3 chunks (but with the incremented version number) and 1 row points to the newly modified chunk?

    • @5atc
      @5atc 10 months ago +1

      Also, am I right in thinking that by using a SQL DB, ACID transactions mean that all the rows for a new version number would be written together? Which is good, because we avoid situations where the lead or follower replicas have some but not all rows for the chunks of a specific version number. I think this would be a challenge if we chose a NoSQL DB?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +2

      Yup, that's my reasoning for using SQL here! That being said, some NoSQL databases still support atomic writes I think.
      As to your second point, which is a good one (and I should have touched upon more within the video), you can basically just upload metadata rows for the new chunks (and then for the chunks that already exist you'd use the existing metadata rows from the highest possible version). Of course there would have to be some smart logic regarding how to know which of the previous version chunk metadata rows that we'd have to pull in.

    • @5atc
      @5atc 10 months ago

      That makes sense, thanks! These videos are great, keep it up!
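
      For completeness, the read side of that scheme might look like the sketch below, assuming a hypothetical chunks table with one metadata row per chunk per version (unchanged chunks get fresh rows that point at the existing S3 objects):

      ```python
      def latest_chunks(conn, file_id: str) -> list[tuple[int, str]]:
          # Returns (chunk_index, s3_url) for the newest committed version.
          rows = conn.execute(
              "SELECT chunk_index, s3_url FROM chunks"
              " WHERE file_id = ? AND version ="
              "   (SELECT MAX(version) FROM chunks WHERE file_id = ?)"
              " ORDER BY chunk_index",
              (file_id, file_id),
          )
          return list(rows)
      ```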

  • @mytabelle
    @mytabelle 9 months ago +1

    When talking about hot/popular files:
    - if I understand correctly for your proposed solution, we have one connection for normal files, and one additional one for every popular file we have access to. For the unpopular files, the notification service would push these updates to the user, but for the popular ones, we'd push to some service and its replicas, and users would pull from that service, correct?
    - is there an issue with sending update notifications in a batch? We could keep one connection for each user, but simply send, say, 100 updates per second for popular files. This means there's a minor delay for the end user, but makes our system less complex.

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      1) yep!
      2) I'm a bit confused what you're saying here, we could certainly send batch updates, but then we have to send them to a lot of users automatically. This can definitely work, but it may be easier for users to just poll the changes as they want them.
      Nice questions!

  • @meenalgoyal8933
    @meenalgoyal8933 8 months ago +1

    Hey Jordan, thank you so much for these videos. I find them very helpful and in-depth. I had 2 questions.
    1. How would the client know for the first time about any new file shared with them, especially popular files? For non-popular files, we can still update the per-user cache to reflect that, so when the user checks the cache they know about it. But for popular files, we rely on the user already knowing the fileId.
    Do you think the user needs to make periodic calls to the permission DB to get an idea of files shared with them, or maybe a long-polling request to the permission DB?
    2. I understand that Apache Flink seems like a good option for many design questions due to its partitioning and caching capabilities to maintain the state from one stream so we can join it when a message from the other stream comes. But out of curiosity, what would be a good alternative to Flink? Thoughts?

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago

      1) If I recall correctly, in this design the client doesn't need to be aware of its file access, it stays connected to one server which has all of its changes delivered directly to it. As for popular files, you're correct that we need to do extra work because we are polling for changes. I'd say that on each poll the client should first poll the permissions db to see what files it has permissions for, and which of those are popular. From there it can poll the servers that hold changes to popular files. See the twitter design, it's very similar.
      2) Technology wise: kafka streams, spark streaming. Design wise? I'm not entirely sure, I think at the end of the day each client wants to minimize its open connections, and so you can't really avoid using some intermediary layer that delivers the relevant changes to the server that the client is listening to. Message queues with a stream processing framework seems like the best choice here to avoid constant polling.

    • @meenalgoyal8933
      @meenalgoyal8933 8 months ago

      @@jordanhasnolife5163 Thank you! 1) makes sense. For 2), yes, I was asking technology-wise. I got introduced to Flink through your videos only. Do you know if Kafka Streams also offers similar things to Flink:
      1. Fault tolerance and restoring from a checkpoint after a crash?
      2. A cache which stores state when consuming multiple streams and joining them in real time?
      I suspect #2 might not be offered in Kafka Streams.

  • @zhonglin5985
    @zhonglin5985 7 months ago +2

    Suddenly got a low-level question about Flink's partitioning -- is it required that the ChunkTable CDC topic, the PermissionTable CDC topic, and the Flink cluster all be partitioned in the same way? By "the same way" I mean not only that they should use the same partition key (fileId), but that they should use the same hash function and the same number of partitions.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago +2

      Yeah, it basically has to be, or else the data could be going to different Flink nodes

  • @__noob__coder__
    @__noob__coder__ 11 months ago +1

    Timestamp - 19:40: can we use S3 events to detect new chunks being uploaded to S3? So when a worker detects this kind of event, it can put an entry in the database. The chunks being uploaded to S3 will contain some metadata, like which file ID and version they belong to, so the worker can get the file ID and version for the chunk from the S3 object metadata.

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago +1

      Possibly! Hadn't heard of s3 events before but something like that could work too. I think for an interview you generally want to be as technologically agnostic as possible, so unless every cloud storage provider has handlers like these maybe I'd avoid it.
      IRL would work great tho

    • @samsai4460
      @samsai4460 4 months ago +1

      @@jordanhasnolife5163 You are right; we should always be able to easily migrate to new storage in the future if it comes with more performance and less cost

  • @YTJones
    @YTJones 10 months ago +2

    Thanks for the excellent vids Jordan! I had two questions:
    1. Could we use something like change data capture to keep S3 and the chunks DB in sync? the chunks DB could be seen as derived data from what's in S3, so could you turn the S3 uploads into an event stream and use the same Kafka + flink combo as in other vids?
    2. In the hybrid approach where we push most docs into user caches, what do we do to handle cache invalidation if the doc is edited? Do we just write to the cache directly first to make sure the user sees it, then write to S3/the appropriate chunks DB and propagate to any other users? Or do we first need to write to the central sources then update the cache, allowing a brief period where the user can't see their own update (sounds no bueno)?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      1) I'm actually not sure whether s3 supports cdc, but if it does then yeah I'm all for it!
      2) I think every file change would basically go through the same pipeline of into the object store then into flink and off to user caches. I'd imagine that after a user makes an edit we'd probably perform a client side optimization where we locally cache their edit and if the version number is higher than whatever is in their cache we show them their local version.
      Nice questions!

    • @CompleteAbsurdist
      @CompleteAbsurdist 7 months ago +1

      @@jordanhasnolife5163
      > I'm actually not sure whether s3 supports cdc, but if it does then yeah I'm all for it!
      A little late to the party, but here goes - you can configure S3 to trigger a Lambda function on a file operation, which can in turn call your application to do what you want.
      Alternatively, you can trigger an SQS or SNS event on the addition of a new S3 file that your application can listen to.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      @@CompleteAbsurdist Very cool! You'd have to be able to derive the metadata from the document, perhaps you could embed it in the name somehow

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      @@CompleteAbsurdist Very cool! Thanks for sharing!

  • @OF-nf2cb
    @OF-nf2cb 4 months ago +1

    Hey Jordan - great video, I'm spending a lot of time making sure I understand the details. For uploading, I don't understand why it's okay to do parallel commits to S3 and the chunks metadata DB. If the chunks metadata DB write fails or the S3 write fails, then when the user reads, won't that be a big problem?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      I don't believe I said to do these in parallel, you'd want to go to s3 first

    • @samsai4460
      @samsai4460 4 months ago

      @@jordanhasnolife5163 You said going to S3 and coming back, then going to the DB and coming back, would take a lot of time. So do it in parallel; if the DB call fails we can't notify the user, but S3 might have some stale data, which can be cleaned via batch jobs

    • @OF-nf2cb
      @OF-nf2cb 4 months ago +2

      Sorry, I had misremembered. For the case where you upload to the chunks metadata DB after the S3 upload and it fails, you mentioned we could clean that up later, but won't that create issues downstream when someone tries to read the file?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 months ago

      @@OF-nf2cb When will anyone try to read the file? There wouldn't be metadata for it. If we reupload after a failed attempt we'll overwrite the old data.

    • @OF-nf2cb
      @OF-nf2cb 4 months ago

      @@jordanhasnolife5163 But if one chunk fails, wouldn't they still be able to read the rest of the file from the chunks that didn't fail, and thus just see the file with a bunch of words missing? Sorry if I'm completely missing something fundamental here

  • @Prakhar1405
    @Prakhar1405 8 months ago +1

    Nice video Jordan.
    I have a question on how we read file data. For example, let's say the file has version 4, which contains only changed blocks. And the customer adds a new device which does not have the file. How do we construct the file on the device?

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago

      You go to the database and ask it for all of the blocks of the file! Ideally each chunk has a reference to the chunk before it so we can reconstruct easily by reading the database

    • @Prakhar1405
      @Prakhar1405 8 months ago +2

      @hasnolife5163 Thanks for the explanation. I did well in my design interview by following your videos. Thanks a lot!!

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago

      @@Prakhar1405 That's great man!! Super glad to hear :)

  • @priteshacharya
    @priteshacharya 6 months ago +1

    I have a question on the "unpopular file changes" queue. Since it's partitioned on userId, a single partition would cover multiple users. Let's say a single partition has 10,000 users; does that mean a client would have to listen to updates for all 10,000 users when it only needs one of them?
    Someone else mentioned it's not a good idea to expose the Kafka queue directly to the user. I agree with them. There has to be a service that connects the Kafka queue to the user.

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago +2

      Very reasonable, another very easy solution is to just have a server read from that kafka queue and use a websocket to push the changes to the user. See messenger, notification service design.
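
      A rough sketch of that intermediary with kafka-python, assuming a hypothetical userId-to-websocket map populated as users connect; the server consumes the user-partitioned topic and forwards each change over the socket it already holds:

      ```python
      from kafka import KafkaConsumer  # pip install kafka-python

      open_sockets = {}  # hypothetical: userId -> websocket, filled on connect

      consumer = KafkaConsumer(
          "unpopular-file-changes",
          bootstrap_servers="broker:9092",
          group_id="change-servers",  # one group; each server owns some partitions
      )

      for msg in consumer:
          user_id = msg.key.decode()       # messages are keyed by userId
          sock = open_sockets.get(user_id)
          if sock is not None:
              sock.send(msg.value)         # push the incremental change down
      ```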

  • @adityaatri2053
    @adityaatri2053 3 months ago +1

    Hey Jordan, thanks for the super useful content. I have one question -
    So when user A opens a doc (either popular or unpopular), we read from the cache to get all the chunks for a given file. And when another user B makes changes to that file while it is open for user A at the same time, we listen to CDC for any updated chunks.

  • @Anonymous-ym6st
    @Anonymous-ym6st 2 months ago +1

    I really like the detailed discussion on multi-leader + lock + version vector vs. single-leader + lock. But I am a bit lost at 16:16 on MySQL. Does that mean we can just leverage MySQL's ACID transactions to solve the Google Drive problem? Do we still need a separate lock to solve the problem?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      We're going to write multiple pieces of metadata to a database (one per file chunk), which occupy different rows. We want that write to be atomic, and we don't want it to be interleaved with someone else making a similar write at the same time. That's why ACID transactions are useful here.

    • @Anonymous-ym6st
      @Anonymous-ym6st 2 months ago +1

      @@jordanhasnolife5163 Thanks for the detailed reply! So the single leader + lock we described at 11:40, is that the internal mechanism that MySQL uses for achieving ACID? When we use MySQL in a real case, do we need to set any config to achieve this?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      @@Anonymous-ym6st Yep! Just use transactions.

  • @kushalsheth0212
    @kushalsheth0212 8 months ago +1

    Hey, good video, great explanation, but I have a doubt: why are we storing in S3 and also in the chunk DB at 18:40? Shouldn't we only save chunks in the chunk DB?

    • @jordanhasnolife5163
      @jordanhasnolife5163  8 months ago

      The chunk DB is really for the metadata about the chunk; S3 is for the actual file content

    • @samsai4460
      @samsai4460 4 months ago +1

      @@jordanhasnolife5163 S3 should give us some kind of URL to the objects, which should be part of the chunk metadata, right?
      This is not properly represented in the diagram. But great content man!

  • @maxvettel7337
    @maxvettel7337 6 months ago +1

    Hi Jordan, great explanation, but I don't understand how the client assembles the file from chunks after downloading. In what order should the chunks be assembled?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      We can have either 1) the chunk number or 2) a pointer to the next chunk id in the metadata table per chunk

    • @maxvettel7337
      @maxvettel7337 6 months ago +1

      @@jordanhasnolife5163 Thanks a lot

  • @saimunshahee5823
    @saimunshahee5823 10 months ago +2

    Dumb question maybe, but should the CDNs be in front of the S3 service? I.e., should the file readers/writers be hitting the CDN first, and then the CDN routes to S3? I might have an incorrect understanding of this.
    Awesome channel btw, learned a lot from your System Designs 2.0 playlist!

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      I think that's potentially a fair option to lower write latency, but it could also be possible that we then evict data from the CDN that we want to keep in it, so there are tradeoffs

    • @kushalsheth0212
      @kushalsheth0212 8 months ago

      I am not getting why we are uploading the file to S3; we are already saving chunks to the chunk DB, so why do we need S3? Can you please help me understand?

    • @adityagarg6466
      @adityagarg6466 7 months ago +1

      @@kushalsheth0212 ChunksDB contains the metadata info, and not the real file.

  • @MANJEETSINGH-kg6do
    @MANJEETSINGH-kg6do 3 months ago +1

    Hey Jordan, thanks for the amazing content. I have one doubt regarding the database replication for document permissions. Here we are using single-leader replication to avoid the causal dependency issue. But the following/follower table in the Facebook video (2nd video) could also have the same issue, and we used leaderless replication there instead of single-leader. Could you clarify why that is?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      1) As far as leaderless replication goes, we can always have causal dependencies even with quorum consistency due to things like sloppy quorums/hinted handoff.
      2) I don't believe we used leaderless replication in tinyURL. We want single-leader replication there to avoid write conflicts.

    • @MANJEETSINGH-kg6do
      @MANJEETSINGH-kg6do 3 months ago +1

      @@jordanhasnolife5163 Sorry, by mistake I mentioned TinyURL; I meant the following/follower table in the Instagram + Twitter + Facebook video (2nd video) [corrected the above comment].
      There you mentioned using leaderless replication (Cassandra), at this point in the 2nd video: ruclips.net/video/S2y9_XYOZsg/видео.html

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      @@MANJEETSINGH-kg6do My thinking was basically that a) you can mostly avoid write conflicts due to causal dependencies here by actually storing delete operations (kinda like a CRDT), and b) it's follower/following relationships, not something where the data needs to have the same integrity as a stored file, so I care less (mostly this)

  • @zalooooo
    @zalooooo 10 months ago +1

    Nice vid, but no mention of operational transforms or CRDTs for concurrent writes to the same block? I think that's a critically important part of this problem to at least acknowledge. If I gave this question as an interviewer, I'd expect to see an acknowledgement of the challenges of multiple users modifying the same part of a document, which your solution doesn't address.

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago

      This is Google Drive (which tries to act like a distributed file system), you're talking about Google docs. I'll do that one soon enough

    • @zalooooo
      @zalooooo 10 months ago +1

      Makes sense! I definitely misread G Drive as G Docs

  • @zachlandes5718
    @zachlandes5718 9 months ago +1

    You mention zookeeper near the end -- what will you manage with zookeeper in the diagrammed design?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      We pretty much need zookeeper for managing which partitions belong on which nodes, as well as when to perform failovers when a given leader goes down.

  • @Nisshant1112
    @Nisshant1112 9 months ago +2

    Hey Jordan!
    When we are partitioning Kafka on fileId, won't we end up having billions of partitions in Kafka?

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago +3

      Hey Nisshant - when I say partition by fileid, that really means "use consistent hashing over the range of fileId hashes" as far as Kafka is concerned.
      Hopefully this makes sense.
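
      In other words, something like the simplified stand-in below for Kafka's key partitioner (Kafka's real default uses murmur2, but the idea is the same): a fileId maps into a fixed set of partitions, so "partition by fileId" never means one partition per file:

      ```python
      import hashlib

      NUM_PARTITIONS = 64  # fixed and illustrative; not one per file

      def partition_for(file_id: str) -> int:
          digest = hashlib.md5(file_id.encode()).hexdigest()
          return int(digest, 16) % NUM_PARTITIONS
      ```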

  • @anshulkatare
    @anshulkatare 2 months ago +3

    Is it possible for users to directly connect to Kafka for file changes?
    I mean, there are a billion users with multiple devices to have connect to Kafka and pull the file chunk changes.
    1. A lot of events are not going to be related to their files.
    2. How is Kafka going to store each user's offset?
    That receiving part is not clear to me.

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      It's possible, but you explained why we don't want to do it - too many connections to the Kafka broker, and too many messages to filter out. Better to have some intermediary servers that consume from Kafka, as they can be trusted to ensure each user only sees document changes they have permissions for.

    • @anshulkatare
      @anshulkatare 2 months ago +2

      @@jordanhasnolife5163 So the users' devices are connected to this intermediary consumer server.
      This server is going to store the offset at the user-device-fileId-chunk level and maintain a single connection to the Kafka broker.
      So what DB will it store this in? And what protocol will it use to push the changes? Does UDP make any sense here?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      @@anshulkatare Sorry, why do we need a database here? I don't think UDP makes sense; something like a websocket between user and server seems more than sufficient to me.

    • @anshulkatare
      @anshulkatare 2 months ago

      @@jordanhasnolife5163 As you said, the intermediary servers make websocket connections with the users and channel whatever the changes are to users' devices through the websocket.
      Now the consumers (intermediary servers) need to coordinate who is connected to which user; maybe we need a ZooKeeper for this, which will hold the server-user connection mapping and track which changes are sent and which are in the queue.
      So user - server - fileChunk - isSent kind of data.
      This whole file-sharing management is what we need a datastore for.
      Let me know if I am thinking in the right direction.

  • @yiannigeorgantas1551
    @yiannigeorgantas1551 2 months ago +1

    For unpopular file changes, the file reader would need to connect to the shard matching their userId, and ignore all other userIds that are collocated on the same shard. For popular changes, they would connect to a shard for each popular file they want to listen to, and ignore all other popular files that are collocated on the same shard. Is my understanding correct?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago

      Yep! That being said, "ignore" probably just means that the particular server the user connects to has some sense of which users need the changes for which documents, and only pushes them to those users.

    • @yiannigeorgantas1551
      @yiannigeorgantas1551 2 months ago

      @@jordanhasnolife5163 Makes sense, thank you!

  • @5atc
    @5atc 10 months ago +1

    Thank you for the video! Does each Flink instance effectively maintain an up-to-date, local slice of the permissions DB for the files contained within the Flink instance's shard of fileIds? How does that get stored inside the Flink instance?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      Yep, basically! It gets there via change data capture.

  • @levyshi
    @levyshi 6 months ago +1

    For pushing file changes to users, does it mean that when the user is viewing the file and the owner changes the content, the file should change in real time?
    If not, couldn't the user just fetch the latest version from S3 when they open the file?
    Or is the purpose of this to reduce fetching of entire files: if the file is already on the user's device, we just need to make sure the updated changes are also there?

    • @levyshi
      @levyshi 6 months ago +1

      Also, if I add permission for another user, does it require a distributed transaction? Since it's sharded on userId, if two users are on different nodes, we'll need to read from one and then write to another.

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      1) I think this is more of a design question, but yes it could mean they would see changes in real time. More likely, what it means is that clicking to open a file on your computer doesn't take a few seconds to poll the latest version of it.
      2) Not sure what you mean here. We're just adding permissions for one user at a time (userId, fileId) to the appropriate partition in the database

    • @levyshi
      @levyshi 6 months ago +1

      ​@@jordanhasnolife5163 Yeah, sorry Ignore the second question

  • @andreybraslavskiy522
    @andreybraslavskiy522 6 months ago +1

    Hi Jordan, thank you for your effort to help us prepare for the interview. Can you help me with a question? Flink reads from Kafka and caches data about file-to-user permissions. What if Flink crashes? How will it restore its state? Re-read both Kafka topics from the start? Or does it have a replica of its state somewhere?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago

      It occasionally snapshots state to S3, and then you can re-read from the Kafka offsets that correspond to that state

    • @andreybraslavskiy522
      @andreybraslavskiy522 6 months ago

      @@jordanhasnolife5163 thanks, I should learn more about Flink.

  • @saurabhmittal6947
    @saurabhmittal6947 5 months ago +1

    hey jordan, loving your content, but I was wondering how the version would increment here. Let's say a user has created a file, which is v0, then he makes some changes; would we be continuously uploading those changes to S3? If that's the case, do we keep incrementing the version with a timestamp, or what?
    And if, let's say, uploading to S3 is based on a trigger, then do we keep incrementing the version of the file by 1 from the currently loaded version?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      I envision it as uploads only get made when you hit save, and then you increment the version number of the highest known version by 1. If there are conflicts you have to resolve them locally.

  • @truptijoshi2535
    @truptijoshi2535 5 months ago +1

    Hey Jordan,
    How do we divide the file into chunks and how do we determine their order?
    How do we get to know which chunk was modified?
    Thank you!

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      I'd imagine they probably use something like merkle trees for this

  • @lagneslagnes
    @lagneslagnes 5 months ago +1

    Chunks DB:
    How do you get the latest versions across all chunks of a document?
    You might want to add another boolean field called "latest"?
    I'm assuming each chunk of the same document can have a different latest version.
    If we tried to increment the version of all chunks at the same time, it kind of defeats the purpose of having chunks.

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      You could always upload metadata rows for an entire version of a document, and then only upload the "new" chunks to S3, since that's the expensive operation. Adding a metadata row for each is fine.

  • @visiablehanle
    @visiablehanle 11 months ago +6

    10 PB?

  • @SaiAkhil-my8ps
    @SaiAkhil-my8ps 7 months ago +1

    If the unpopular file changes are partitioned on userId, and the user has not opened the file which has a change while the message is at the top of the queue, what are we doing with that Kafka message?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 months ago

      Not entirely sure what you mean here.
      If the user doesn't have the file open, and they miss multiple changes on a file, they can go ahead and poll the most recent state of the document from the db

  • @John-nhoJ
    @John-nhoJ 11 months ago +3

    Not covered - How do you enforce a storage limit, especially on things like live document edits AND file uploads?

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 months ago

      What do you mean by live document edits here?
      I think that might come into play for Google docs. For the file uploads, you do know in advance how big each chunk is and could perform some client side validation on the size - (hit a backend service to tell you how much data you've uploaded, which gets changed every time that you upload new chunks).

    • @John-nhoJ
      @John-nhoJ 11 months ago +1

      @@jordanhasnolife5163
      Uploading the file metadata isn't really secure. You could intercept the request and say that your massive chunk is 1 kb.
      By live document edits, I mean like 40 people collaborating in Google docs. You have to take the text into consideration when computing the overall storage usage of the document.

    • @user-se9zv8hq9r
      @user-se9zv8hq9r 11 months ago

      ​@@John-nhoJ did you know that before 2021 google docs / sheets space was infinite, which is how there were those 2 projects on github which chunked and uploaded files into base64 then put them in sheets and docs for infinite space? lol, like to store the ubuntu iso was 534 docs. Then google caught on and now includes it as part of the drive space. I'm curious how they do that though efficiently

  • @manishasharma-hy5mj
    @manishasharma-hy5mj 5 months ago +1

    Hi Jordan, you talked in your videos about how CRDTs are useful in multi-leader replication.
    But here you are saying single-leader replication for document permissions.
    Can you please help clarify?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago

      Keyword is "document permissions". That's just a table mapping a docId to all the userIds that have access.
      The actual content of the document itself is maintained by a CRDT.

  • @indraneelghosh6607
    @indraneelghosh6607 10 months ago +1

    Does Flink store the state of all files? What if there are a billion files? Will it still be able to store the updated state of all the files? How does it do this exactly?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 months ago +1

      Partitions, partitions, and more partitions :) We're sharding on fileId.

  • @ankitagarwal4022
    @ankitagarwal4022 6 months ago +1

    Hi Jordan, how long will Flink keep the file-to-user permission information?

  • @monaagrawal9695
    @monaagrawal9695 3 months ago +1

    It's not clear who is doing the chunking. Is the client doing it and uploading directly to S3? How does it send metadata to the DB then?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      Yeah I'd propose chunking be done client side. You can send chunk metadata to the DB.

    • @monaagrawal9695
      @monaagrawal9695 3 months ago

      @@jordanhasnolife5163 So we have mobile clients which can handle the chunking logic. On the other hand, your proposal is to have client apps subscribing to a Kafka queue for file updates. Isn't it a server-side thing to produce and consume from Kafka?

  • @Anonymous-ym6st
    @Anonymous-ym6st 2 months ago +1

    Interesting learning at 26:20 that we can use Kafka as the storage for file pushes. So the reason (compared with the Twitter design using Redis) is to remember the offset of the last read, while a post/tweet won't be sliced into multiple changes, so Redis is better for the whole thing? Is my understanding correct?
    Also, for the connection: I remember we used websockets in the Twitter one. Is there any reason we use long polling over websockets here? Thanks in advance!

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      To be clear, I'm not using Kafka for file uploads. I'm using it for the file metadata; the files themselves still belong in S3. I think the end designs here should be fairly similar, the difference being I don't care as much about a cached "news feed" of file changes for individual users. Just either deliver the incremental differences, or they can poll for all files on Dropbox start-up.
      As for long polling vs. websockets, I don't really have a great reason. Long polling is honestly probably fine for both, since the connection is unidirectional.

  • @alekseyklintsevich4601
    @alekseyklintsevich4601 9 months ago +1

    The way you're using Flink is not really what it was designed for. You're using it as an in-memory store

    • @jordanhasnolife5163
      @jordanhasnolife5163  9 months ago

      I don't agree with you here. Flink is meant for processing messages in real time. I'm using it to handle incoming messages and distribute them to a variety of sink locations

  • @justlc7
    @justlc7 5 months ago +1

    Where can we find your notes?

  • @szyulian
    @szyulian 2 months ago +1

    Watched. 😶