I hope you loved the video. If you did, do share a word on social media. It would mean the world to me ❤
Can you share the notes for GFS ?
Thank you.
It's 4 in the morning and I just finished watching this amazing, amazing video on GFS; really amazing decisions they took in 2003. Got a few questions, but first I will try to look them up on my own. Hats off to your teaching skills 🤩
Thank you Pranjal :)
I am a beginner in system design, and this paper is so brilliant that many lines in it gave me a whole new perspective on how things work and how I can make them more efficient.
I could imagine those ideas applied to other situations in computer science and was amazed by how well it turns out.
Thanks for such an easy explanation and content, bhaiya.
Loved the video 👌
Great explanation. Thank you! Of the multiple times I felt "wow", two places are 1. How heartbeat is efficiently used and avoided persistence of chunk to location mapping. 2. How the write failures are handled with 2 step acks from primary replica.
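A rough sketch of the first point, in case it helps other readers: the master keeps the chunk-to-location mapping only in memory and rebuilds it from what chunkservers report. The class and method names here (`Master`, `on_heartbeat`, `where`) are made up for illustration, and the sketch assumes each heartbeat carries the server's full chunk list, which is a simplification.

```python
from collections import defaultdict


class Master:
    """Toy master: chunk locations are never persisted, only rebuilt from reports."""

    def __init__(self):
        # chunk handle -> set of chunkserver ids currently claiming a replica
        self.locations = defaultdict(set)

    def on_heartbeat(self, server_id, chunk_handles):
        # Forget what we believed about this server, then trust its latest report.
        for servers in self.locations.values():
            servers.discard(server_id)
        for handle in chunk_handles:
            self.locations[handle].add(server_id)

    def where(self, handle):
        # Sorted list of servers holding the chunk (empty if unknown).
        return sorted(self.locations.get(handle, ()))


master = Master()
master.on_heartbeat("cs1", ["chunk-a", "chunk-b"])
master.on_heartbeat("cs2", ["chunk-b"])
print(master.where("chunk-b"))
```

Because the chunkserver is the source of truth for what is on its own disks, a restarted master can repopulate `locations` just by listening, with nothing to keep in sync on disk.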
I can't express my gratitude adequately for creating such excellent content
Thank you for supplying detailed information regarding the Google File System, as outlined and explained in the referenced paper.
The way you explained it is just awesome.
when you dropped this gem! I was also reading the GFS white paper and came here to watch the video after completing it.
Even though I had understood the paper, watching your video made me fall in love with GFS even more 😊
Thanks man! means a ton ✨
love it, thanks for sharing. I did not learn this much in last 10 years.
Nice explanation! This was covered in the fourth year. Hadoop was inspired by GFS, and that's when Yahoo open sourced it and the big data revolution started.
Great video.
The Primary replica concept reminds me of the leader replica concept of a Kafka topic across brokers.
More paper reviews! I like how you explain
Thank you so much Arpit for the whole video. I had a question about why we would need a global mutation order, but then I went through the paper and rewound some parts of the video to understand that multiple clients would be trying to write to the same chunk, and hence a global mutation order is very important. Thanks for instilling curiosity :-). Means a ton to me too.
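A small sketch of why that ordering matters, under some loud assumptions: the `Primary` and `Replica` classes, the 8-byte chunk, and the `(offset, payload)` mutation shape are all invented for illustration. The only idea taken from the paper is that the primary assigns serial numbers and every replica applies mutations in that serial order.

```python
import itertools


class Primary:
    """Toy primary: stamps each incoming mutation with the next serial number."""

    def __init__(self):
        self._serial = itertools.count(1)

    def order(self, mutation):
        return (next(self._serial), mutation)


class Replica:
    """Toy replica holding a tiny 8-byte 'chunk'."""

    def __init__(self):
        self.data = bytearray(8)
        self.applied = []

    def apply(self, ordered_mutations):
        # Apply in serial-number order regardless of arrival order.
        for serial, (offset, payload) in sorted(ordered_mutations):
            self.data[offset:offset + len(payload)] = payload
            self.applied.append(serial)


primary = Primary()
first = primary.order((0, b"AAAA"))   # write from client 1
second = primary.order((2, b"BBBB"))  # overlapping write from client 2

r1, r2 = Replica(), Replica()
r1.apply([first, second])
r2.apply([second, first])  # arrived in the opposite order
print(r1.data == r2.data)
```

Without the serial numbers, the two replicas would apply the overlapping writes in arrival order and could end up with different bytes for the same chunk.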
This has been awesome so far, I am half way through the video.
If this is not too much to ask, it would be great if the notes presented in the video could be shared 🙏
Great video! Really made me want to go back and read the source paper 😃
Around 1:13:00, I was slightly confused as to how the master detects corruption of chunks on a specific chunk server (doesn’t seem scalable for the master to keep checking the checksums regularly). After skimming the paper it seems the master is informed about potential corruption by the chunkserver with the corrupted chunk rather than the master detecting it proactively.
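To make that concrete, here is a minimal sketch of chunkserver-side verification. The paper describes 64 KB blocks with a 32-bit checksum each, verified by the chunkserver before it returns data; using CRC32 from `zlib` and the function names below are my assumptions, not something the paper specifies.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, as described in the paper


def block_checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum per 64 KB block (CRC32 is an assumed choice)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def read_verified(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Verify every block overlapping the requested range before returning data."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # A real chunkserver would fail the read and report this to the master.
            raise IOError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]


data = bytes(range(256)) * 1024          # 256 KB, i.e. four blocks
sums = block_checksums(data)
print(len(sums), read_verified(data, sums, 70000, 4))
```

The master never scans data itself here; it only hears about a bad replica when a chunkserver trips over a mismatch on a read (or, per the paper, during an idle-time scan) and can then re-replicate from a good copy.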
Thank you 🙌
Thanks for such detailed video. More power to you !
Thanks :)
This is great; would love to see a paper on Bigtable as well.
As a sequel, can you please add another video on the whitepaper MapReduce: Simplified Data Processing on Large Clusters?
Also will it be possible to have a video covering Colossus? I know there are not enough resources available for that.
What's so special about it? Doesn't torrent work the same way? It just looks like Google renaming a BitTorrent client to Google File System.
Excellent explanation. I'm at 40:41 but it's totally amazing. Please make more explanation videos on research papers.
I have a playlist of 6 paper dissections on my channel.
I want someone to say to me the way Arpit says "such a beautiful software" 😊
Jokes apart, what a video! I'm really glad that i stumbled on this. Thank you
Hahhaha 😅
Very well explained and really helped me understand… please do Bigtable next 😃
Good video. Not really into file management systems or computer science (I'm a 3d artist) but still watched.
Hello Arpit, thank you for this video. I came across your channel today, amazing. Can you please create a similar video on Hadoop File System (I think it will be more or less similar to this one) or on How PySpark uses and processes data on Hadoop. Please?
Your teaching skills are great👍
Thanks Ganesh!
Thanks for making such amazing content, big fan of your work ❤. Just a small doubt, if I am not missing something: at 1:01:19 you said that the client has to write to each replica itself. Don't you think it would be better if the client wrote only to the primary replica, and it was then the primary's duty to write to all secondary replicas? It would help in 2 ways:
1) The client doesn't have to send the confirmation that the write to all replicas is completed; all of this responsibility can be encapsulated in the primary replica.
2) The network overhead of writing from the client to each replica can be reduced by the primary, which can use Google's very high bandwidth connectivity. Also, if the replicas are near each other or in the same physical zone, we save the overhead of passing the data across hundreds of miles.
Waiting for your response.
Kafka does the same thing that you are suggesting. My best guess is that GFS took this call based on the network bandwidth available in data centers in 2003.
Thanks for the great explanation!
I have one doubt: How is the checkpoint in a compact B-tree format? Isn't it more like an append-only log?
Hi Arpit,
Thanks for all the great content you are making.
I had a doubt: at 17:22 you talk about how the client translates (file, offset) -> (file, chunk, offset).
But how would the client know on which chunk the data is present? Wouldn’t that data reside inside master?
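For anyone with the same doubt: because the chunk size is fixed, the client can compute the chunk index with plain arithmetic and only then ask the master which chunk handle and replicas that index maps to. A minimal sketch, assuming the paper's 64 MB chunk size (the function name `to_chunk_coords` is made up):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks


def to_chunk_coords(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within that chunk)."""
    return divmod(byte_offset, CHUNK_SIZE)


# A read at byte 200 MB of a file lands in the fourth chunk (index 3).
chunk_index, within = to_chunk_coords(200 * 1024 * 1024)
print(chunk_index, within)
```

So the client knows *which* chunk of the file it wants without any help; what lives only on the master is the mapping from (file name, chunk index) to the chunk handle and the chunkservers holding it.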
Hi @Arpit, in this video you mostly talked about what one GFS cluster looks like. But I have a doubt: if there are multiple GFS clusters, how does the routing happen? I mean, how does the system store the info about which cluster a particular file is stored in?
You can add a router in front of it and apply any routing strategy like hash/range/static etc.
Nailed it.. Thank you very much for the detailed video Arpit. Stay curious :)
Loved this video and also implemented a hello world of GFS, can you also make a similar video of Kafka paper..
A good one to watch, but I wonder where commodity hardware is being used for such distributed systems.
Making chunks of files, but how can these make the migration easy? I think, migration of the whole system will be done when required, in general. I also think, making chunks is also complicating the system with certain trade-offs.
On every heartbeat, the chunk server sends the list of chunks it is holding; is that a good idea?
What if the checksum is not written correctly on the disk?
I don't think the write is that efficient. Google does this on its own customized infra, where it will work well, but on a general network it will be quite inefficient. Also, I think the GFS client will take up a lot of network resources, as it is sending the data to all replicas as well.
Once I get the replicas holding a particular chunk, do I need to check with the master when accessing it again? I can keep hitting the same replica for the same chunk.
Amazing explanation!! Loved it!!
Would love to have a similar walkthrough for aurora db
What happens to the older chunks? I mean, if I want to overwrite some chunk, GFS creates a new chunk and changes the sequence in the op log (as in, for this file these are the chunks in this sequence). But how does it identify and handle the older chunks?
@arpit bhayani - Can you share slides or notes?
Hi @AsliEngineering, during the global ordering across chunk servers, how would the primary chunkserver know which chunkservers are secondaries, so that it can send them the order that needs to be persisted?
Just wow ❤
great video!
Wow a great way to explain .. Thanks ..
Arpit, as you mentioned, the GFS client talks directly with the chunk servers to get particular chunk data, and the ACL is maintained on the master server. Does it mean that there is no access rights checking on the chunk servers? Can anyone with the details of a chunk server (metadata obtained from the master server) access the data on that particular chunk server?
Does HDFS have the same primary replica scenario, and how do you set the primary replica for a file?
Can someone help me with the calculation? If a 64 MB chunk requires 64 bytes of metadata, then how much data could 1 GB of metadata hold?
Arpit said 10^6 but shouldn't it be 10^4?
Also, how do chunk servers talk to each other? Do they also keep a mapping of other servers locally?
Might be 64 KB is offset used to ensure Data Integrity
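On the 10^6 vs 10^4 question, working the arithmetic through may help. This treats the paper's "less than 64 bytes per 64 MB chunk" as exactly 64 bytes and ignores namespace metadata, so it is a back-of-the-envelope sketch rather than a precise figure.

```python
CHUNK_SIZE = 64 * 2**20       # 64 MB of data per chunk
METADATA_PER_CHUNK = 64       # assumed: exactly 64 bytes of master metadata per chunk
METADATA_BUDGET = 2**30       # 1 GB of master memory

chunks = METADATA_BUDGET // METADATA_PER_CHUNK   # how many chunks 1 GB can describe
addressable = chunks * CHUNK_SIZE                # bytes of file data those chunks hold
ratio = CHUNK_SIZE // METADATA_PER_CHUNK         # data bytes per metadata byte

print(chunks, addressable // 2**40, ratio)
```

The ratio of data to metadata is 2^20, roughly 10^6, which matches the number from the video: 1 GB of metadata describes about 2^24 (roughly 16.7 million) chunks, or about a petabyte of data.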
Can you share the notes ?
Great video loved it
Dhanyavaad ji.
How do we handle atomic or contentious operations here? If two clients are writing to the same chunks, are they going to acquire a lock on LRU?
This is not a transactional system like a database, so contention is not common. But in case there is any, pessimistic locking is a simple solution.
Writes: 48:05
Hi Arpit, would it be feasible to implement this in Golang? What are your thoughts?
Yes. totally. pretty easy to implement a quick prototype.
Will watch it again after completing my ongoing assignment 🫶🫡
Where can I find the notes that are being displayed in the video?
Can you share these notes?
Doesn't torrent work the same way? I didn't find any difference other than renaming BitTorrent clients to Google File System.
Great discussion Arpit, please make more such long videos. I have one question from the discussion.
Why is the client sending the same write to all the chunk servers? Why not send it just to the primary replica and have the primary send the write to the other replicas? (Similar to Kafka, where the leader broker sends the writes to other brokers and waits for the acks from the replica brokers.)
Exactly. Why is the client waiting for acknowledgement from the majority? Isn't the primary's acknowledgement enough? The system will take care of the replication after that.
Maybe to protect the primary replica's bandwidth. Primary replica sending the writes to the other 2 will eat up its bandwidth, instead they chose to preserve it and use up the client's bandwidth. That's my hunch
Same thought.. The bandwidth would often be faster between chunk servers.
That's a great question. My take is that the client writes to the primary and then the primary writes to at least half of the replicas. Let's say the replication factor is 3; then it waits for at least one more replica to ACK, so a total of 2 writes are confirmed. Then they update the master and return to the client. This ensures at least N / 2 replicas are consistent in the system.
Also, my take is that this is less about saving the chunk server's bandwidth: writes from client to server are very unreliable and need to cross the entire internet, hence they are slow. Mostly, I believe the chunk server would copy the data internally and ensure consistency.
Client -> broadcast to all replicas .. I AM NOT IN FAVOUR.
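One detail worth adding to this thread: as I read the paper's data flow section, the data is not fanned out by the client to every replica in parallel. It is pushed along a chain of chunkservers (each one forwards to the next closest while still receiving), and only the small control message goes to the primary. The paper gives the ideal time to push B bytes to R replicas as B/T + R·L, where T is the network throughput and L the per-hop latency. The sketch below computes that, next to a hypothetical client fan-out over a single outbound link; the fan-out model and both function names are my own assumptions for comparison.

```python
def chain_push_seconds(size_bytes, replicas, throughput_bps, hop_latency_s):
    """Pipelined chain: data streams through each hop, so the cost is B/T + R*L."""
    return size_bytes * 8 / throughput_bps + replicas * hop_latency_s


def fanout_push_seconds(size_bytes, replicas, throughput_bps, hop_latency_s):
    """Assumed alternative: the client sends a full copy to each replica over one link."""
    return replicas * size_bytes * 8 / throughput_bps + hop_latency_s


# 1 MB to 3 replicas over 100 Mbps links with 1 ms per hop.
chain = chain_push_seconds(1_000_000, 3, 100e6, 0.001)
fanout = fanout_push_seconds(1_000_000, 3, 100e6, 0.001)
print(f"chain: {chain * 1000:.0f} ms, fan-out: {fanout * 1000:.0f} ms")
```

Under these assumptions the chain uses each machine's full outbound bandwidth once, which is close to what several commenters here are asking for, just without making the primary the one that has to fan the data out.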
Gold Content , so nice presentation , handwriting ,Thanks for all the efforts 🫶🫡
This was a paper I read earlier than you, hehe 😁
Hey sir, in case of hot spots, won't writes be further affected?
I am in the final year of undergrad and have never read anything in system design. Will I be able to understand this? Please be honest, sir.
Go through it and find out.
If I were you and in college, I would have gone through such videos more often than not. Do not doubt your abilities before even trying.
To be honest, being a student you will be able to learn quicker than the experienced folks. Do not let others tell you what you can learn and what you cannot.
@AsliEngineering Thanks a lot sir for your answer, would surely go by your advice
What font are you using in the video?
These are handwritten notes. But glad you thought it was a font.
@AsliEngineering You have very good looking handwriting!
How do you make the diagrams?
They are all hand drawn in my GoodNotes app.
💯💯👍🏻👍🏻
👍👍
Thank you for this awesome video.
Here the master is working more like a coordinator node, isn't it?
ZooKeeper also does the same thing, right?
It handles the information about which replica resides on which node.
I think we are using multi-master replication here: every replica acts as a master for one chunk while the others act as followers.
Don't know man.. Google drive kinda sucks