Dropbox Removed their SSDs, got 20% faster writes

  • Published: Sep 21, 2024

Comments • 74

  • @hnasr • 1 year ago +6

    Get my Fundamentals of Database Engineering Udemy course: database.husseinnasser.com

  • @arbazadam3407 • 1 year ago +8

    This guy right here is the OG backend engineer. The topics he chooses for his videos are the ones a backend engineer should be familiar with. Though I find his style of teaching a bit hard to grasp, I have massive respect for the topics he picks. Unlike others, who only talk about CRUD operations, this dude is totally different 🙏🙏🙏🙏🙌🙌🙌🙌🙌🙌

  • @sidheartgadekar • 1 year ago +129

    Sounds like a job for Pied Piper Inc. :D

    • @stevefan8283 • 1 year ago +3

      Just do it middle-out

    • @g33ktube16 • 1 year ago +6

      Sounds like a job for Gilfoyle

    • @fss1704 • 1 year ago

      Sounds like a job for dric codes, the plagiarism of SV applied to a "real" business.

  • @timmitchell9021 • 1 year ago +44

    Knowing that Magic Pocket is for cold storage, I don't think it is that surprising that the SSDs were more trouble than they were worth.

  • @ruojautuma1 • 1 year ago +85

    Based on the topic alone it sounds exceptional, but then you realize it's just "parallel throughput of a massive HDD pool > a single SSD" and honestly it's not that interesting anymore. Sure, it's something to keep in mind when planning storage solutions. There's always going to be a bottleneck somewhere anyway: network, disk array, or indeed cache. How much scalability do you want, and what is your budget? That should already help you establish which bottleneck you'll have, so plan everything else with that in mind.

    • @MercyFromOverwatch2 • 1 year ago +3

      Sounds like Dropbox is just being cheap

    • @monad_tcp • 1 year ago +2

      I realized that when I put 12 x 24 TB HDDs in my workstation.
      Oh, the combined throughput of the entire array is greater than my Samsung 970 SSD for sequential writes, but not greater than my PCIe M.2 thingy (I bet it cheats since it has 1 GB of RAM itself).

    • @monad_tcp • 1 year ago +1

      0.5 ms of latency doesn't matter if you are going to spend tens of seconds transferring your big 4TB "block"; it only means it finishes 0.5 ms later.

    • @jemiebridges3197 • 1 year ago

      Yeah, it's pretty obvious. There are various types of SSDs. In this case you want the "crappy" version without DRAM. Terrible for a boot drive, but excellent as a cache for a mechanical drive array.

    • @hp67c • 1 year ago

      That's the conclusion I reached less than seven minutes into the video. Talk about comparing a single apple to a crate of oranges.

  • @tomhekker • 1 year ago +1

    As a specialist in storage latency: they just needed more SSDs. A single SSD will never win against hundreds of spinning disks. It sounds like they were completely wearing out their SSDs with writes, and that's what caused the latency and the hardware failures.

  • @rightangleoverseas2391 • 1 year ago +3

    Truly mind-blowing, so counterintuitive at first. Thank you, I'm a big fan!

  • @efkastner • 1 year ago +15

    "The word block is the most overloaded word in software engineering" 19:40
    Oh, I love this! It really is pretty overloaded. What are some other overloaded terms?
    "Model"
    "User"
    "Auth" (this one is a bit of a cheat since it's a shortening of at least two other terms)

  • @boredape1257 • 1 year ago +4

    So nice to see that HDD technology is still alive.

    • @deleater • 1 year ago +1

      God bless IBM

    • @boredape1257 • 1 year ago

      @deleater I actually had an IBM-Hitachi drive at some point. It was quite amazing.

    • @deleater • 1 year ago +1

      @boredape1257 Lucky you :)

  • @bikerchrisukk • 1 year ago +1

    Your topic isn't quite relevant to me (though I do own servers), but you convey your points in a really nice, friendly, and simple way for the layman. Nice one 👍

  • @pacifico4999 • 1 year ago +9

    Sounds like the perfect use case for Optane. That is, if Optane were not dead.

    • @davidmcken • 1 year ago +1

      At the time Dropbox started using SSDs, Optane was far from dead. From the sounds of it they don't even need a lot of it per server (tens of GB would work for them). Much lower latency, and no wear leveling to speak of either (their use case is similar to MySQL binary logs with rotation, so it would naturally wear-level anyway).
      Despite that, it sounds like they took an approach similar to Ceph, which makes each disk an OSD and then duplicates the writes for redundancy (i.e., no two replicas exist on the same OSD, server, rack, etc.) as many times as needed, so the writes naturally spread out over the disks and remain independent of each other. The write returns / is marked as complete when a certain number of replicas are written to disk.
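
To make the replication idea in the comment above concrete, here is a minimal sketch (Python, illustrative only; the `Replica` type, paths, and `required_acks` parameter are assumptions, not Dropbox's or Ceph's actual code) of acknowledging a write only once enough replicas on distinct failure domains have durably persisted it:

```python
import os
from dataclasses import dataclass

@dataclass
class Replica:
    path: str            # directory where this replica stores its blocks (hypothetical)
    failure_domain: str  # e.g. "rack-1", "rack-2" -- no two copies share one

def replicate_block(block_id: str, data: bytes,
                    replicas: list[Replica], required_acks: int) -> bool:
    """Write the block to replicas in distinct failure domains; report success
    once required_acks copies are durably on disk (fsync'd)."""
    acks = 0
    used_domains = set()
    for r in replicas:
        if r.failure_domain in used_domains:
            continue  # enforce "no two copies on the same rack/server"
        try:
            path = os.path.join(r.path, f"{block_id}.blk")
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # must be durable before it counts as an ack
            used_domains.add(r.failure_domain)
            acks += 1
            if acks >= required_acks:
                return True  # safe to tell the client the write is complete
        except OSError:
            continue  # a failed replica simply doesn't count toward the quorum
    return False
```

A real system would issue these writes in parallel and pick placement via a map (CRUSH in Ceph's case); the sequential loop here only keeps the sketch short.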

  • @autohmae • 1 year ago +7

    Obviously the SSDs would fail much sooner than the HDDs, because they are a write-back cache, so they absorb lots of little changes. That means many writes hit the SSD, while in Dropbox's case the HDDs are mostly just appending files.
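
As a rough illustration of the write-back pattern described in that comment (a toy Python model with made-up names and a 4 MiB flush threshold, not Dropbox's code): every small write is absorbed and acknowledged on the fast cache tier, while the backing tier only sees occasional large batched flushes, so the cache device takes nearly all of the write operations.

```python
from collections import deque

class WriteBackCache:
    """Toy write-back cache: small writes land on a fast cache device;
    the slow backing device only sees large batched flushes."""

    def __init__(self, flush_threshold: int = 4 * 1024 * 1024):  # flush every ~4 MiB
        self.flush_threshold = flush_threshold
        self.pending = deque()
        self.pending_bytes = 0
        self.cache_write_ops = 0     # program cycles hitting the SSD-like tier
        self.backing_write_ops = 0   # large sequential writes reaching the HDD-like tier

    def write(self, data: bytes) -> None:
        # Every little change lands on (and is acknowledged from) the cache device.
        self.pending.append(data)
        self.pending_bytes += len(data)
        self.cache_write_ops += 1
        if self.pending_bytes >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # One big sequential append to the backing store, then drop the cached copies.
        _batch = b"".join(self.pending)
        self.backing_write_ops += 1
        self.pending.clear()
        self.pending_bytes = 0

cache = WriteBackCache()
for _ in range(10_000):
    cache.write(b"x" * 512)                            # 10,000 tiny writes
print(cache.cache_write_ops, cache.backing_write_ops)  # 10000 vs. 1
```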

  • @LianParma • 1 year ago +3

    Were they using 3D XPoint (Optane) drives? Optane drives or persistent memory modules seem like a good option for such a system.

  • @hgbugalou • 1 year ago +4

    I would be super curious whether an Optane SSD would perform better in this situation. It's already been proven that Optane drives, even first-gen ones, can outperform modern NAND drives when it comes to random access and many concurrent operations of that type. This is why I am sad Intel is retiring it; it has some fabulous use cases and is fundamentally different from NAND.

    • @monad_tcp • 1 year ago

      Wasn't there a substitute for it that was essentially battery-backed RAM?

    • @hgbugalou • 1 year ago +1

      @monad_tcp It's a thing, but it's really a step backwards. Battery-backed DRAM existed before Optane. It works, but it is cumbersome and takes much more energy than Optane.
      Optane is unique because it doesn't use NAND but is still non-volatile. It has much better wear leveling and total lifetime than NAND, and as I mentioned it handles random and concurrent access better than NAND. NAND of course smokes Optane in most other I/O operations, but Optane does have niches where it excels (e.g., it's a great OS boot drive or cache drive). I hate that it's going away already.

  • @hazemal-takleh3703 • 1 year ago

    What an interesting blog post. Your analysis, Hussein, was very valuable.

  • @hippopotoftea • 1 year ago +2

    ASMR drives are the quietest in the data centre!

  • @medicalwei • 1 year ago +1

    7:17 ... and write it asynchronously to ASMR 😂

  • @llortaton2834 • 1 year ago +1

    This vs. a Btrfs/ZFS implementation on Unraid with ZFS RAID, 8x 10 SSDs with zoned namespaces (IOPS FOR DAYS)

  • @gljames24 • 1 year ago +1

    The new hard drives use HAMR, heat-assisted magnetic recording, which uses a laser to heat the platter so the magnetic material can be flipped.

  • @justingiovanetti • 1 year ago +1

    I suspect this won’t mean that we can keep our node_modules in Dropbox.

  • @SnazzieTV • 1 year ago

    For some reason your videos stopped getting recommended to me, and I completely forgot I used to watch your channel.

  • @gljames24 • 1 year ago +1

    I'm currently looking at bcachefs development for per-file RAID and caching disks, so this was an interesting watch. I have CMR hard drives, so I don't know how much this matters.

  • @ABUNDANCEandBEYONDATHLETE • 1 year ago

    It seems like something they should have known before they installed it, no? Sounds like an oversight by the designer.

  • @imanmokwena1593 • 1 year ago +1

    Thanks! I want to get your GIS book, but I feel like a Udemy course would be easier to digest and better to learn from, lol.

  • @monad_tcp • 1 year ago

    23:44 The next best thing: memory as cache. But what if it fails? Send the request to 3 or 4 servers and use some cache-locking algorithm for the writes to the HDDs; let it duplicate a bit, then clean up later by marking the space as free.

  • @hritiksingh4905 • 1 year ago

    You should try reading Salesforce engineering blogs. They are pretty good too!

  • @catlmarc9618 • 1 year ago

    I'm not that smart, and even I can understand this. Thanks for your great explanation. 👍

  • @hz8711 • 1 year ago

    Hey, thank you for this video, very interesting! :)
    I was thinking that they just buy S3 storage and resell it :D At least I know that they started like that, but it makes sense to move on to their own infrastructure.

  • @tempaccount8256 • 1 year ago +4

    First! A big follower of your content.

  • @gblargg • 1 year ago

    tl;dw (the article can be read in far less than 30 minutes): they had a single SSD "cache" (more of a buffer) in front of an array of disks, so the speed of that SSD was the bottleneck. There were other reasons too (a bunch of SSDs failed at the same time a while back) to eliminate the SSD rather than, say, put an array of buffer SSDs in front.
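
Back-of-the-envelope arithmetic for why a single buffer device becomes the bottleneck, as the comment above summarizes. The numbers below are illustrative round figures chosen for the sketch, not values from Dropbox's post:

```python
# Illustrative throughput comparison; all inputs are assumptions, not from the article.
ssd_seq_write_mb_s = 2_000     # one NVMe SSD, sustained sequential writes (assumed)
hdd_seq_write_mb_s = 200       # one 7200 rpm HDD, sequential writes (assumed)
hdds_per_storage_node = 100    # hypothetical disk count behind the single SSD buffer

aggregate_hdd_mb_s = hdd_seq_write_mb_s * hdds_per_storage_node   # 20,000 MB/s
print(f"Aggregate HDD write bandwidth: {aggregate_hdd_mb_s} MB/s")
print(f"Single SSD buffer bandwidth:   {ssd_seq_write_mb_s} MB/s")

# If every incoming write must land on the one SSD before being flushed to the
# disks, the whole node is capped at the SSD's rate, not the array's:
bottleneck = min(ssd_seq_write_mb_s, aggregate_hdd_mb_s)
print(f"Effective node write rate:     {bottleneck} MB/s (the SSD)")
```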

  • @ImAManMann • 1 year ago

    They were stupid for using SSDs like that. I build high-capacity storage for the environments I manage, and you never use a single SSD in this way. Think about it: if an SSD is faster than an HDD, an array of SSDs will be faster than an array of HDDs. When designing a storage system, I always build the cache as an array of SSDs. To be more accurate, I actually have three tiers, with RAM being one of them. When you do this at a large scale you need a fabric and a fast network (100 Gb and up), and you start isolating storage types... and now I get to the part of the video where you bring up multiple SSDs, lol. Nice job.

  • @AmCanTech • 1 year ago

    Can you link the OG blog post?!

  • @2022_temporary • 1 year ago

    Nice!

  • @ewenchan1239 • 1 year ago

    I'm currently using my SSDs as a temporary stop-gap measure as I migrate my systems and consolidate them down to a single server.
    The proposal on the table right now is that I will have eight HGST 6 TB SATA 6 Gbps 7200 rpm drives in a raidz2 ZFS pool, which will host the VMs' OS disk images/files, and that will be kept separate from the larger bulk storage holding the rest of my data.
    I've found that when I tried to pile everything onto the main bulk storage, the performance of the VMs suffered (sometimes semi-catastrophically) because they were waiting for the bulk storage's I/O to catch up.
    And with the incoming server/data migration, I'm going to be writing a fair bit of data to the bulk storage, and I want the VMs to stay alive and responsive during that time.
    So that's the plan for the moment.

  • @Tobarja • 1 year ago

    Here for the ASMR.

  • @terjeoseberg990 • 1 year ago

    The problem was that they had one SSD. They needed 100 SSDs.

  • @dixztube • 1 year ago

    Working in a small business, I don't always understand the obsession with speed over other factors... what's up with that!?

    • @davidmcken • 1 year ago +1

      I'm not sure it's speed over other factors; they do mention the replicas. The service would not be all that useful if the data could not be retrieved or was corrupted in some way.
      Speed underlies a lot of other factors, though: the faster the data can be written, the more a single server can handle. A lot of these cloud services operate on an oversubscription model that monetizes resources that would sit under-utilized in servers deployed at a smaller (read: pretty much everyone else's) business. When you are paying per-server / per-rack costs, you want to squeeze out as much performance as possible. They are already at the point where redundant servers are in place (whereas in a small business you have to make a whole plan / business case just to get your second server to act as the backup).

    • @dixztube • 1 year ago

      @davidmcken That makes sense. Thanks! Any good book recommendations for a front-end / light-backend guy to learn more about all this, or about backend engineering in general?

  • @YasheshBharti • 1 year ago

    Hussein, stream on Twitch and discuss problems and blog posts! People will go crazy!

  • @simon3121 • 1 year ago +2

    No value in this video for me. Reading the article takes a few minutes, and I knew more from that than after 3 minutes of listening to this video.

    • @elihernandez330 • 1 year ago +1

      Yeah, that's what I just did, then came back here. He's just pumping in fluff to justify all the ad breaks.

  • @RahulSharma-bh1ux • 1 year ago +1

    When they said the SSDs reached their write endurance, they meant the SSDs failed because of the wear and tear that happens when data is written to them over and over.
    SSDs have a maximum write endurance rating; i.e., if a 2 TB SSD has that value set at 200 TB, it means that after we write a total of 200 TB of data, the SSD would die. (A back-of-the-envelope sketch of this rating follows after this reply thread.)

    • @deleater • 1 year ago

      Don't spread misinformation! Unless a malfunction unfortunately occurs outside the designed constraints, in almost all cases the data would not die once the drive passes its rated write endurance, because that's not how it is designed to work. You will still be able to read all of the data; you just won't be able to write to it anymore.

    • @RahulSharma-bh1ux • 1 year ago

      @deleater OK, thanks for the correction. Do you have a link on that handy? It would relieve my anxiety and let me put more of my stuff on SSDs. Right now I keep a bunch of HDDs and only a couple of NVMe drives.

    • @RahulSharma-bh1ux • 1 year ago +1

      OK. So here is what I summarized from explanations on the internet: past the write endurance, the drive won't suddenly become read-only. It will start getting write errors, and the disk will start marking the errored areas as bad, thereby reducing the available space. Past a certain degradation, the write/delete operations themselves become unreliable, making the drive risky to modify. So by that definition some of the data would still be present, but it can get botched by partial writes/deletes, effectively making the disk an unreliable source of data.
      To your point, one should keep a disk nearing that limit for read purposes and not overwrite important data. Any such carefully managed data should stay available for the longer term. Still, it's best to move the critical data to a new drive and use the old disk as a spare.

    • @RahulSharma-bh1ux • 1 year ago

      @dhruv By the above summary, Dropbox had no reason to trust a disk nearing its write endurance, since their primary use case is writes.
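
A quick back-of-the-envelope on the endurance rating discussed in the thread above (illustrative numbers only; the 200 TB TBW figure and the daily write rates are assumptions for the sketch, not Dropbox's actual specs):

```python
# Rough SSD lifetime estimate from a TBW (terabytes written) rating.
# All inputs are illustrative assumptions.
capacity_tb = 2.0          # drive capacity
tbw_rating_tb = 200.0      # endurance rating: total TB that may be written
daily_writes_tb = 0.1      # ~100 GB written per day (light desktop use)

drive_writes_total = tbw_rating_tb / capacity_tb   # 100 full-drive writes
lifetime_days = tbw_rating_tb / daily_writes_tb    # 2000 days, about 5.5 years
print(f"{drive_writes_total:.0f} full-drive writes, ~{lifetime_days / 365:.1f} years at this rate")

# A write-back cache in front of a busy HDD array sees far more traffic:
cache_daily_writes_tb = 10.0                       # hypothetical heavy ingest
print(f"As a busy cache: ~{tbw_rating_tb / cache_daily_writes_tb:.0f} days to hit the rating")
```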

  • @shift-happens • 1 year ago +1

    Do it ALL in ASMR next time :D

  • @sapito169 • 1 year ago +1

    Nothing new:
    1. We create our own cache to increase performance.
    2. We remove our own cache to increase performance.

  • @biplobmanna • 1 year ago

    Simpler, higher-quality UX usually means over-engineered parts in the backend.

  • @jimbobbyrnes • 1 year ago

    Have they even thought about using RAM as a cache yet?

    • @hnasr • 1 year ago

      The problem is RAM is not durable, so you might lose data. Maybe once ULTRARAM is a thing?

    • @davidmcken • 1 year ago

      Durability. It was mentioned that once data was written to the SSD, it was signalled upstream that the write was complete, and therefore the web server could discard the data, safe in the assumption that it could be retrieved later on. MySQL calls this the doublewrite buffer (not sure what Postgres calls it).
      Comparing it to an RDBMS, Dropbox seems to be dealing with an issue similar to the concept of pages, and for the SMR setup, dirty pages needing to be written all at once.
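
A minimal sketch of that "acknowledge only after it is durable" idea from the comment above (illustrative Python; the staging directory, file names, and function are assumptions, not Dropbox's or MySQL's actual mechanism):

```python
import os

STAGING_DIR = "/var/tmp/staging"  # hypothetical fast, durable staging area (SSD-like)

def accept_upload(block_id: str, data: bytes) -> bool:
    """Persist the block to the staging device and fsync before acknowledging,
    so the caller may safely discard its copy once this returns True."""
    os.makedirs(STAGING_DIR, exist_ok=True)
    tmp_path = os.path.join(STAGING_DIR, f"{block_id}.tmp")
    final_path = os.path.join(STAGING_DIR, f"{block_id}.blk")
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # data is on stable storage, not just in page cache
    os.replace(tmp_path, final_path)  # atomic rename: the whole block exists or it doesn't
    dir_fd = os.open(STAGING_DIR, os.O_RDONLY)
    try:
        os.fsync(dir_fd)              # make the rename itself crash-safe
    finally:
        os.close(dir_fd)
    return True  # only now is it safe to signal "write complete" upstream
```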

  • @potaetoupotautoe7939 • 1 year ago

    I wonder why they used SSDs in the first place; they could have used custom high-capacity, high-speed server RAM or the HBM used in GPUs as cache storage, with custom software to work things out, if they wanted to do better. They need a better architecture designer.