This channel’s title is becoming more and more accurate as the series continues.
Amen to that brother
@@jordanhasnolife5163 my advice to you as you go down this route is that if you make your channel more specialized, you risk losing a general audience. you'll have to decide whether the tradeoff is worth it -- or you'll have to put in more effort trying to fit the specialized topic to the general audience. I am interested in these papers but also feel like I get lost in a lot of the details -- and I am still struggling to feel competent at general system design. I think what's helpful / interesting here is that we see how real-world problems get solved, not just the toy examples you have to work through. but if you want to make these papers relevant to a broader audience, you might have to work a little harder to uncover the lessons that aren't specific to the particular system -- the ones we can think about when we get interview questions, when we're working, etc.
This is the best video of yours I've checked out so far! Great job!
Jordan, keep up the amazing work. Thank you.
gold, pure gold, thanks for this
Oh my, I wanted to replicate this GFS all by myself and was searching for good resources. This is literally so golden, thanks a lot :D
Always enjoy your system design videos and following your no life strategy 💯👍
Jordan sat alone in his dimly lit room, eyes fixed on the screen. Kafka Streams flowed before him like a slow, seductive dance. His fingers moved over the keyboard, sending commands that made the data bend to his will: smooth, precise, and totally in his control.
“Processing this much data feels like... processing my love life,” he chuckled, leaning in closer. “A little messy at first, but once I get my hands on it, everything falls into place... perfectly.”
The logs rolled in real time, a rhythm that matched his heartbeat. Who needed real life when the streams responded to him this way? Controlled. Obedient. Alive. Jordan didn’t just process data; he made it swoon.
Real or ChatGPT? Either way, well done, I do think about kafka streams a lot
Hey Jordan! It took me a while to realize that writing is done in chunks and that chunks can get stored on different servers! I think this is a basic principle and it would be beneficial to share it in the video. Actually, why do we want to store multiple chunks of the same file on different servers? We could achieve high throughput by assigning multiple servers to store files, with each server handling one whole file. Thanks for the video!
Thanks Mark!
1) I feel like I tried to cover this at 3:48
2) We use chunks to better distribute data for super large files, but 64MB is still big enough that there isn't a massive overhead to having to jump around every once in a while to a new server to load the next chunk
3) We'll eagerly load the metadata of all chunk locations as an optimization when we try to read one file, and at that point you're pretty much not losing any read performance
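To make the chunk math concrete, here's a minimal sketch of how a client might turn a file offset into a chunk lookup, assuming the paper's fixed 64MB chunk size (the function name is hypothetical, not from GFS):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed 64 MB chunks

def locate(byte_offset: int) -> tuple[int, int]:
    """Map a file byte offset to (chunk index, offset within that chunk).

    The client sends the file name and chunk index to the master, gets back
    the chunk handle plus replica locations, and then reads directly from a
    chunk server at the within-chunk offset.
    """
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

# Reading at the 200 MB mark lands in chunk 3, 8 MB into that chunk.
print(locate(200 * 1024 * 1024))  # (3, 8388608)
```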
Thanks a lot for such great content!
A question about the checksums on writes at 33:47: when/why could we have a corrupted fragment of a 64KB region like C1? I didn't get why that could happen... Thanks!
With enough standard commodity disks data can just get corrupted randomly sometimes due to hardware failures.
@@jordanhasnolife5163 Since data can be corrupted randomly, shouldn't region checksums be recalculated all the time, not only when there is a write involving multiple 64KB regions? I guess not, since that sounds pretty expensive...
In the provided example, it seems the reason to recalculate all three regions' checksums comes from the fact that the write involves those regions, but maybe I'm missing something. My confusion is about why potential corruption even matters.
@@ВалерийГоловко-т9я On every write certainly!
Corruption matters because then we no longer have data that we thought we could access.
Now if we just had one replica, sure, detecting corruption would be useless. But when you have 3, you can see that one has been corrupted and make another replica off of one of the uncorrupted ones.
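To spell out the mechanics, here's a rough sketch of per-region checksumming; CRC32 is a stand-in, since the paper specifies 32-bit checksums over 64KB regions but not a particular algorithm, and the function names are hypothetical:

```python
import zlib

REGION = 64 * 1024  # each 64 KB region of a chunk gets its own 32-bit checksum

def region_checksums(chunk: bytes) -> list[int]:
    """Compute one checksum per 64 KB region of a chunk."""
    return [zlib.crc32(chunk[i:i + REGION]) for i in range(0, len(chunk), REGION)]

def corrupt_regions(chunk: bytes, stored: list[int]) -> list[int]:
    """On a read, recompute checksums and flag any region that no longer matches."""
    return [i for i, crc in enumerate(region_checksums(chunk)) if crc != stored[i]]
```

A chunk server that finds a mismatch returns an error to the reader and reports it to the master, which re-replicates the chunk from an intact replica, which is exactly why detection only pays off when more than one copy exists.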
love the subtle pun, whether intended or not 😂
Thanks for the great video! Question about atomic appends. Why is interleaving bad? Is it because 2 different chunks could be appended next to each other, so adding a buffer still allows concurrent writes? Confusing 😢
If I want to write some data, it's possible that data doesn't make any sense if I only see the first few lines and then somewhere much later in the next chunk I see the last three lines. If two clients append data A and data B, I don't want to read the first half of A, then B, then the second half of A (because there was a chunk boundary).
A = "My name is Jordan"
B = "Corinna Kopf"
If we interleave these they become "My name is Corinna Kopf Jordan" Doesn't make much sense anymore does it?
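The boundary problem is easy to simulate. A toy sketch of the interleaving above, with an artificially tiny chunk size so that A straddles a boundary:

```python
A = "My name is Jordan"
B = "Corinna Kopf"

CHUNK = 11  # tiny chunk size, just to force A across a chunk boundary

# A fills its first chunk; B's append lands before A's remainder does.
interleaved = A[:CHUNK] + B + " " + A[CHUNK:]
print(interleaved)  # My name is Corinna Kopf Jordan
```

GFS's record append sidesteps this by writing each record atomically within a single chunk: if a record won't fit in the current chunk, that chunk gets padded out and the record starts fresh in a new one, so no record ever straddles a boundary.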
Great Content !!! ♥
if network bandwidth was not a concern, then would it be more efficient for the client to write to all three replicas in parallel rather than "data pipelining" to the closest chunk server?
also, after successful replication and storing in memory, would the client get an "ACK" from the closest chunk server which then forwards that on to the master where it would perhaps append to its op log?
Not really sure what you mean by most efficient. If you mean the fastest write throughput, then possibly, but I imagine that depends on the client's outgoing bandwidth.
The client receives an ack from the primary chunk server, as that's the one that initiates the write.
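For a sense of scale, the paper estimates the pipelined transfer time for B bytes over R replicas as roughly B/T + R*L, where T is per-link throughput and L is per-hop latency. A quick back-of-envelope using the paper's 100 Mbps links and an assumed 1 ms hop latency:

```python
B = 64 * 1024 * 1024       # one full 64 MB chunk
T = 100e6 / 8              # 100 Mbps link, in bytes per second
L = 0.001                  # assumed 1 ms per-hop latency
R = 3                      # three replicas

pipelined = B / T + R * L  # each chunk server forwards bytes as they arrive
fan_out = R * B / T        # client pushes all three copies over its own uplink
print(f"{pipelined:.2f}s vs {fan_out:.2f}s")  # 5.37s vs 16.11s
```

Pipelining lets every machine use its full outbound bandwidth, while fanning out from the client triples the load on the client's single uplink.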
Great content
This is just to let you know that you didn't have anything in the description of this video, i.e. no pointers to the paper, etc.
I'm almost certain the first link in the description is a link to the paper.
Sipes Roads
hi, I am sleeping. how can you get hired at Google?
I don't know what this means
Dude u lost a lot of weight. Nice work
Ha thank you man though I'm afraid it's about to be bulking season so I'll be right back
One must imagine Sisyphus happy
@@huz1 I like that
Has anyone tried implementing it from scratch?
Google, Hadoop, I'm sure plenty of others
A quick question though, what did you have for starters today? 😅
Care to elaborate?