Our data is GONE... Again - Petabyte Project Recovery Part 1

  • Published: 21 Nov 2024

Comments • 8K

  • @markclayton8977
    @markclayton8977 2 years ago +4088

    The irony of a cloud storage provider sponsoring this segment is not lost on Linus. I like that.

    • @Electrex8
      @Electrex8 2 years ago +148

      The most amazing part is a backup provider also sponsored the first video on losing their data. Incredible timing.

    • @Time4Technology
      @Time4Technology 2 years ago +185

      @@Electrex8 "Hi we would like to sponsor your next data loss video, can you put us on your waiting list?"

    • @legominimovieproductions
      @legominimovieproductions 2 years ago +20

      I mean, backing up a petabyte of stuff on a cloud provider is so fucking expensive, and you need to pay huge amounts for bandwidth (even with 200MBps it will take forever), so it's not like it's a realistic option
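
To put a rough number on "forever", here is a back-of-the-envelope sketch of the transfer time for one petabyte at the rate mentioned above. The decimal petabyte, the two readings of "200MBps" (megabytes versus megabits per second), and the function name `transfer_days` are assumptions for illustration; real transfers would also lose time to protocol overhead and interruptions.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # decimal petabyte in bytes (assumption)


def transfer_days(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move total_bytes at a sustained, uninterrupted rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Reading "200MBps" as 200 megabytes per second
days_megabytes = transfer_days(PETABYTE, 200 * 10**6)

# Reading it as 200 megabits per second (8 bits per byte)
days_megabits = transfer_days(PETABYTE, 200 * 10**6 / 8)

print(f"{days_megabytes:.1f} days at 200 MB/s")
print(f"{days_megabits:.1f} days at 200 Mbit/s")
```

Under these assumptions the transfer takes roughly 58 days at 200 megabytes per second and about 463 days at 200 megabits per second.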

    • @ZerotheWanderer
      @ZerotheWanderer 2 years ago +6

      @@legominimovieproductions If they built the drive and sent it to the host already loaded/backed up/ready to go, I wonder what the service would run.

    • @pyjama9556
      @pyjama9556 2 years ago +1

      Generously negotiated for future f***ups no doubt!!

  • @Lmpy
    @Lmpy 2 years ago +2905

    LTT never ceases to amaze me on how professional and unprofessional they actually are at the same time.

    • @hambo76
      @hambo76 2 years ago +259

      You just described every corporation and Government in the world.

    • @HiddenChin
      @HiddenChin 2 years ago +25

      Do as I say, not as I do.

    • @forresthopkinsa
      @forresthopkinsa 2 years ago +20

      Definitely. But minus the professional part.

    • @bubbaandy89
      @bubbaandy89 2 years ago +56

      Right? I've worked in infrastructure for years, the consumer end videos are awesome and insightful, but the server/infrastructure videos frustrate me so much sometimes...

    • @paulb4334
      @paulb4334 2 years ago +17

      Yet 100% entertaining which is the only metric by which to value an entertainment business ;)

  • @KoSiNeK
    @KoSiNeK 2 years ago +829

    I don't know why, but "server issues" episodes are my favourite LTT videos. Content like this just doesn't exist anywhere else.

    • @rfitzgerald2004
      @rfitzgerald2004 2 years ago +47

      That's what I like too, there's only so many gaming hardware reviews I can stand to watch, they're all much of a muchness to me, but I really enjoy the infrastructure and unusual project videos the most

    • @ryanq.4799
      @ryanq.4799 2 years ago +35

      IMO It feels more real, and a lot like old LTT did, just overall more entertaining to watch than the usual formula

    • @noxious8
      @noxious8 2 years ago +14

      Me too. One of the first LTT videos I watched was the one years ago where Linus, Anthony and Jake were doing stuff in the server room on the weekend

    • @UpSideDownTech
      @UpSideDownTech 2 years ago +8

      Right?! The Whonnock Server died video is one of my favorites to watch! I have no reason why, but I just like watching it for some reason😂

    • @Mesmaroth_
      @Mesmaroth_ 2 years ago +11

      Check out Craft Computing if you like home lab server videos. Techno Tim as well for homelab hosting tutorials.

  • @OnlineWerds
    @OnlineWerds 2 years ago +148

    As a data center engineer your storage content is my favorite content. I'm terribly sorry for your issues here.

  • @TJ-vh2ps
    @TJ-vh2ps 2 years ago +613

    Postmortem reports like this are hugely valuable, but companies don’t usually share them. This is a great service to the community.

    • @AegisHyperon
      @AegisHyperon 2 years ago +9

      Because companies don't let their storage get to this situation

    • @MajesticBlueFalcon
      @MajesticBlueFalcon 2 years ago +5

      @@AegisHyperon exactly. Companies from the get-go have an official IT dept. or outsource it to a competent MSP.

    • @LG1ikLx
      @LG1ikLx 2 years ago +18

      @@MajesticBlueFalcon you would be surprised how many companies mess up. What about if the IT dept didn't do their job properly and skipped over certain things in order to save time?

    • @jdatlas4668
      @jdatlas4668 2 years ago +7

      @@AegisHyperon not true. I do sysadmin for small and midsize businesses, and you wouldn't believe the kinds of things I've had to take over. Usually it's either some guy who does something else at the company and thinks he knows stuff but doesn't, or the work of some usually very mediocre external company.

    • @johngangemi1361
      @johngangemi1361 2 years ago

      @@AegisHyperon oh yes they do.

  • @The_Keeper
    @The_Keeper 2 years ago +11824

    Linus: "Right, *Now* we won't ever lose data again!"
    Data storage: "How many time do we have to teach you this lesson, old man?"

    • @limemason
      @limemason 2 years ago +149

      @Nimki rafa 8 What the fuck?

    • @FelipeGutierrez-me9th
      @FelipeGutierrez-me9th 2 years ago +25

      More like power outages😂

    • @treborrrrr
      @treborrrrr 2 years ago +114

      @@limemason Spam bots, they reply to every comment automatically. Just report and move on.

    • @markaged
      @markaged 2 years ago +1

      *times

    • @defeatSpace
      @defeatSpace 2 years ago +27

      @@limemason Only for fans over 18 years old, where's the confusion coming from?

  • @HAWKF305
    @HAWKF305 2 years ago +4850

    Linus: Hates how USB and HDMI are being named.
    Also Linus: New new new vault

    • @mjtt12
      @mjtt12 2 years ago +204

      If you can't beat them, join them.

    • @robertt9342
      @robertt9342 2 years ago +272

      Well it’s pretty clear. It’s not like it’s named new old vault.

    • @marqs37
      @marqs37 2 years ago +129

      @@robertt9342 Don't give him ideas.

    • @adamjurak708
      @adamjurak708 2 years ago +16

      @@robertt9342 That was my first thought when he talked about reusing the old vault. It would be a new vault built from the old vault. So... New Old Vault [NOV for short]

    • @Pico2199
      @Pico2199 2 years ago +48

      At least new vault and new new vault aren't being renamed vault 2.0 and vault 2.0 + new

  • @ulbuilder
    @ulbuilder 2 years ago +386

    Your backups must be tested
    So you know they work as expected
    Offline is best
    So you can rest
    When lightning strikes unexpected

  • @jstadler
    @jstadler 2 years ago +361

    As a full-time sysadmin I always wondered how you guys sustained your data without a real backup plan. As it turns out now, you didn't. Really sorry to hear that, guys!
    That's exactly why people like me get hired. Companies think they can do it on their own until they lose critical data to misconfigs and missing maintenance. Hurts to learn it the hard way.
    I really recommend that you guys create offline backups to tape storage for all your archived content.
    And respect for admitting you did it wrong so others can learn!
    Keep on making such great content!

    • @heavyq
      @heavyq 2 years ago +16

      I'm not a sysadmin, just a network guy that dabbles in sysadmin stuff and yeah, it blew my mind to hear what happened here. If they open a spot to hire an IT guy I think I'm gonna apply :D

    • @TheGruwy10
      @TheGruwy10 2 years ago +4

      Get this dude hired, quick!

    • @GrayMatter70
      @GrayMatter70 2 years ago +12

      I'm not a sysadmin either, but I'm also surprised they didn't catch the offline drives earlier. Even without the regular data scrubs, basic monitoring should have caught that. As for tape backups, I agree but also advise caution that tape backups can fail too, so they need to be planned properly. I've done tape backups myself but that was a long time ago.

    • @StarFireG3
      @StarFireG3 2 years ago +6

      Yep. I've been doing this for 25 years now. I worked for a couple of companies with big RAID systems but no backup. It's a struggle to get the people responsible to buy sufficient backup systems. In one case, only one week after installing the backup solution and taking the first full backup, the main RAID system failed and died. Without a backup this company would have gone out of business completely. I have seen this happen to companies before.

    • @brighton_geek
      @brighton_geek 2 years ago +2

      You would need one hell of a tape array to back up that kind of data, not to mention it would take forever! I don't see tapes as a practical offline backup solution for this quantity of data for a company of LTT's size. It is better to have a duplicate server in a DC with clean power and resilient backups and replicate the data; that would act as a backup and be a suitable DR solution.
      Without backups I do wonder if they have a BCDR plan in place also?

  • @obedulloa6219
    @obedulloa6219 2 years ago +3271

    If Linus manages his data the way he manages hardware... it's no surprise the data dropped

  • @Cluesman
    @Cluesman 2 years ago +629

    "a lot of power outages" + "transferring that much data might take months" sounds like a recipe for another video in this series.

    • @Carcinogenic2
      @Carcinogenic2 2 years ago +19

      Yeah, on how bad a power grid can be and how important a UPS becomes in such situations.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +4

      I'll hazard a guess that they keep blowing a fuse, and don't have a generator for the building or a UPS for the servers.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +1

      LOL I can see the title: Ever try to back up a few sextibytes? Or even just a few exobytes? No? Well, funny thing happened...
      Or "This is awkward...newcubed16 ...."
      Please tell me they have fiber to the new vault and aren't trying to do this over a normal connection.

    • @namAehT
      @namAehT 2 years ago +15

      @@gorkskoal9315 They do have a UPS for their server room, but for a few months they didn't because their UPS caught fire. Also it sounds like they never configured the servers to _safely_ shut down when the UPS was running low; instead the UPS ran out of power and the servers got their plug pulled.

    • @larrylentini5688
      @larrylentini5688 2 years ago +4

      Natural gas backup generators aren't very expensive relative to petabytes of hard drives, they should probably invest in one.

  • @ashleymc5599
    @ashleymc5599 2 years ago +969

    "We never hired a full-time IT person" was stated and I immediately had the urge to bust out the popcorn and look at IT pros in the comment section.

    • @onceuponaban
      @onceuponaban 2 years ago +41

      To be fair many of the LMG staff do qualify as IT pros in skill, if not in formal credentials.

    • @fulcrum7082
      @fulcrum7082 2 years ago +93

      @@onceuponaban no. Just no.
      The constant fuckups show that they are not

    • @applepie9806
      @applepie9806 2 years ago +16

      The funniest thing is the next two comments under this are from the IT pros.

    • @fulcrum7082
      @fulcrum7082 2 years ago +51

      @@applepie9806 Been in IT for 10 years: worked as an infrastructure engineer for a hospital, technical lead for an MSP supporting SMEs, and finally solutions architect for a £250m company. I'm also a freelance consultant. I love LTT's vids; I normally just have them on in the background whilst I'm working. They do make some big mistakes, but it's all part of the drama :L

    • @SirNarax
      @SirNarax 2 years ago +3

      I am a bit of an IT professional myself.

  • @shwolverine2300
    @shwolverine2300 2 years ago +2619

    Linus: "the way they name HDMI generations is so confusing"
    also Linus: "we move the data from the old vault to new new vault and then name the old vault new new new vault with a bit of upgrade"

    • @jacobleeson4763
      @jacobleeson4763 2 years ago +12

      Lmao

    • @treescompany3462
      @treescompany3462 2 years ago +26

      I hope they pin this

    • @fazz010s
      @fazz010s 2 years ago +3

      Ayyyyy

    • @shawno8253
      @shawno8253 2 years ago +28

      It's the new 14nm +++++++++++

    • @davidyu1813
      @davidyu1813 2 years ago +24

      USB: this guy seems to know his stuff. let's just learn how to name things from him.

  • @loadnabox1943
    @loadnabox1943 2 years ago +2150

    Linus, I have over a decade of experience in managing multi-petabyte ZFS with five nines uptime in large ISPs. I think you may have the wrong cause of the data loss and it may not (MAY NOT) be as lost as you think.
    Please reach out to me

    • @JosephGamacheKD0AHS
      @JosephGamacheKD0AHS 2 years ago +151

      Upvoting this to get it seen.

    • @B0A2
      @B0A2 2 years ago +87

      Tweet at him

    • @Jumalten001
      @Jumalten001 2 years ago +17

      No you dont

    • @philb5593
      @philb5593 2 years ago +173

      I would recommend that you reach out to them as well.
      Linus does read a lot of comments, but YouTube isn’t a good way to get a response.

    • @lordelliott42
      @lordelliott42 2 years ago +109

      Email their business email address.

  • @OfficialSamuelC
    @OfficialSamuelC 2 years ago +557

    I feel Jake holds a lot more of LTT together with his expertise than we think. Underrated!

    • @riks.1773
      @riks.1773 2 years ago +40

      The fact that he takes the time to actually look and uncover this is enough to earn him employee of the month

    • @romanbaranovichi5375
      @romanbaranovichi5375 2 years ago +20

      It also helps that he's worked there from when they were getting serious about their data storage, so he knows the reasoning behind why the things are set up the way they are

    • @kstenders
      @kstenders 2 years ago +6

      @@riks.1773 usually you set up a monitoring with alerting for checking the health state of your storages.

    • @riks.1773
      @riks.1773 2 years ago +2

      @@kstenders yes, but I never assumed they configured that... because I've seen other simple things get overlooked

    • @VanlockFR
      @VanlockFR 2 years ago +3

      @@riks.1773 as Linus explained, these are routine checks that they should have been doing monthly, for years. AND they didn't set up any email alerts, so they never got notified of the failures!

  • @LabGecko
    @LabGecko 2 years ago +245

    Tech Tips' data loss is due to one thing - quantum variability. :D
    The data was in a state of flux until someone audited, at which point it was forced to exist or not exist. Some were observed to be the latter.

    • @HilbertXVI
      @HilbertXVI 2 years ago +1

      Tf are you on about?

    • @LabGecko
      @LabGecko 2 years ago +52

      @@HilbertXVI if you don't like quantum jokes then I'm half-certain there is a dimension on which you didn't comment.

    • @omary5439
      @omary5439 2 years ago +27

      Schrodinger's hard drive?

    • @ThomasGroshong
      @ThomasGroshong 2 years ago +1

      😂

    • @malcomoguji3910
      @malcomoguji3910 4 months ago +1

      To be or not to be 😹😹😹

  • @QualityDoggo
    @QualityDoggo 2 years ago +858

    Just hearing "never hired a full time IT person" makes me go "uh oh... I don't like where this is going..." a good sysadmin who can help protect systems is a valuable part of any modern company

    • @danielgomez7236
      @danielgomez7236 2 years ago +153

      The world's biggest IT youtube channel, there's no IT guy

    • @darthkarl99
      @darthkarl99 2 years ago +177

      Classic case of responsibility creep. As Linus and others have become responsible for more stuff as the company has grown their ability to handle routine IT maintenance duties has dropped off, and because it's happened slowly over time it's never quite shown up on anyone's radars as a matter of concern.

    • @uwirl4338
      @uwirl4338 2 years ago +23

      Yeah, because just so you know, only other sysadmins value sysadmins. It's an extremely simple job, so the rest of us think we can do it, and we sure can until our real job prevents us. If only we could teach monkeys a couple of bash commands and have them be sysadmins for a couple dozen bananas.

    • @chrismcveigh4498
      @chrismcveigh4498 2 years ago +38

      As a sysadmin/sysengineer: unfortunately these guys, although knowledgeable, aren’t professionals, and "works" doesn’t always mean "works properly" :/

    • @Habdabi
      @Habdabi 2 years ago +5

      That's why the sys admin job is dying out and most mid sized companies pay less to move it to cloud based systems that are more reliable (for now, until the price gets hiked)

  • @normandabald6501
    @normandabald6501 2 years ago +692

    The second most important thing to consider about backups, behind actually having them in the first place, is TESTING THEM!
    If you don't test your backups then you don't have backups.

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 years ago +6

      Only if you have shit backup software. Last year I did a restore of our main HPC file system after an upgrade, and everything came back. The only "testing" necessary is the occasional restore when users have done daft stuff and deleted files by accident. Then again I have a "proper" backup system in IBM Spectrum Protect (née TSM). If you use toy backup systems (aka everything else in my view) then yeah, test them regularly.

    • @zazethe6553
      @zazethe6553 2 years ago +16

      This is not a backup system, it's live storage.
      But you are right.

    • @johngangemi1361
      @johngangemi1361 2 years ago +2

      Agreed

    • @jacquesb5248
      @jacquesb5248 2 years ago +3

      yeah actually checking that the backups are running

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 years ago +10

      @@jacquesb5248 Nope, if you have to "check" that your backups are running then you are doing it wrong. This should be integrated into your monitoring system so you get told that your backup *DIDN'T* run. Checking manually is prone to someone forgetting or being on holiday or insert a thousand other reasons. Also, getting told daily that your backup ran becomes an issue, because it is seen as background noise and you get bored checking the same report day in, day out. Basically, being notified that something is as expected is the wrong way to do anything. You need to be notified that something is *NOT* as expected, in this case that the backup didn't run to completion without errors.
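
The "alert only on the exception" idea described above can be sketched as follows. This is a minimal illustration, not any particular monitoring product: the `BackupJob` record, its fields, and the `overdue_jobs` function are hypothetical names, and a real setup would read the last-success timestamps from the backup software and push the messages to an alerting channel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupJob:
    """Hypothetical record of one scheduled backup job."""
    name: str
    last_success: Optional[datetime]  # None if it never completed cleanly
    max_age: timedelta                # how stale a success may be before alerting


def overdue_jobs(jobs: List[BackupJob], now: datetime) -> List[str]:
    """Return alert messages only for jobs that are NOT as expected.

    Healthy jobs produce no output at all, so an empty list means
    there is nothing for a human to read.
    """
    alerts = []
    for job in jobs:
        if job.last_success is None:
            alerts.append(f"{job.name}: no successful backup on record")
        elif now - job.last_success > job.max_age:
            alerts.append(
                f"{job.name}: last success was {job.last_success:%Y-%m-%d %H:%M}"
            )
    return alerts
```

Run on a schedule, a check like this stays silent while everything is fine and only speaks up when a backup is missing or stale.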

  • @leodoz1016
    @leodoz1016 2 years ago +1055

    Alternate title: The LMG group MIGHT hire an actual IT person

    • @callowaymotorcompany
      @callowaymotorcompany 2 years ago +92

      Linus Media Group group

    • @leodoz1016
      @leodoz1016 2 years ago +18

      @@callowaymotorcompany yes

    • @Achilleaa
      @Achilleaa 2 years ago +24

      Linus LMG group might hire an IT person for new new vault

    • @frozenturbo8623
      @frozenturbo8623 2 years ago +2

      @@Achilleaa With Seagate again for another Vault.

    • @DevamBansal
      @DevamBansal 2 years ago +2

      @@callowaymotorcompany LMGG group?

  • @RobertCrawfordRobert4049
    @RobertCrawfordRobert4049 2 years ago +16

    As soon as they switched from storage spaces I kind of saw this coming; I've got a 912tb S2D cluster that serves as storage for about 200 or so virtual machines and it's been rock solid and performance with NVME cache has been solid. One of the things I saw on Spiceworks was a warning about over engineering infrastructure.

  • @makingtechsense126
    @makingtechsense126 2 years ago +590

    Tape (LTO-9) is still an affordable option for backups. Especially for data that doesn't change. Yeah, it's old tech but it still works.

    • @mkastelovic
      @mkastelovic 2 years ago +59

      Yep, completely agree with you, Tape library with LTO 9 tapes will be much safer. And it isn't so slow as people think. :)

    • @jspafford
      @jspafford 2 years ago +28

      @@mkastelovic 250-300MBps. And they have worm tapes. And by using a dual drive tape robot, it makes backups completely automated. Restores too. Backing up to individual LTO drives having to load tape after tape is too much labor. Backups will never get done.

    • @unlink1649
      @unlink1649 2 years ago +71

      Modern tape storage has INSANE capacity. We are talking 32 petabytes per rack. ETERNUS DX600 S5 is one such system.

    • @mkastelovic
      @mkastelovic 2 years ago +14

      @@jspafford Well, if you have the library, the backup is done automatically. Plus, in their case, we are talking about incremental backups, where most of the old videos don't change at all ;), so the backup will be done during the night.

    • @jojojojo4332
      @jojojojo4332 2 years ago +7

      I agree with all of you, except for one thing. Linus has said that he has quite a lot of data that isn't that important, meaning that buying a tape robot would be quite an expensive investment. Maybe not even worth the trouble.

  • @waveformdistortion
    @waveformdistortion 2 years ago +49

    Well if you hadn't made this video, I never would have known to check if automatic scrubbing was enabled on my storebought NAS. It wasn't. I don't believe it's ever suffered a power failure, being connected to a UPS and configured for automatic shutdown when the UPS drops below 50% battery since day one, so no automatic scrub on resume either. It's now set to automatically scrub once a month, so thanks!

    • @linusnexus9000
      @linusnexus9000 2 years ago +1

      Same here on a Synology box, thanks to your comment I checked and noticed it wasn't enabled either. I also activated a monthly schedule :)

  • @SimonPoirier
    @SimonPoirier 2 years ago +945

    Other pro tip: if building such large-scale storage, make sure your disks are from different manufacturing batches. Imagine the nightmare of having disks with consecutive serials wearing out and failing almost at the same time.

    • @lostintechnology1851
      @lostintechnology1851 2 years ago +102

      Or they could just buy a professional backup solution and get proper training operating it plus a maintenance contract. You know the way every real enterprise would do it :D

    • @entelin
      @entelin 2 years ago +168

      @@lostintechnology1851 It's a different situation, he said this is non essential archival footage, the creation of these servers created content, the failure of it created content, and yeah, backing that stuff up would cost a lot of money... so risk/reward. The best option isn't necessarily always the right option.

    • @gamingbud926
      @gamingbud926 2 years ago +6

      That is... a pretty smart idea.

    • @lupsik1
      @lupsik1 2 years ago +4

      @@entelin Didn't watch the video yet, but it sounds like something that RAID 5 would solve instantly, and it would cost them barely any storage with that many hard drives

    • @KingSvenDeluxe
      @KingSvenDeluxe 2 years ago +33

      Or just never use Seagate.

  • @cromulence
    @cromulence 2 years ago +21

    I’m responsible for our SANs at work and there’s something else that wasn’t touched on in this video: make sure you configure email reporting from your storage nodes! The sooner you’re notified about issues, the sooner corrective action can be taken. Additionally, if possible, keep hardware spares at each site where the hardware is, so if a drive has failed (or even if it’s in a predictive failure state), you can swap a new drive in ASAP. Same goes for other hardware, such as controller cache batteries; these too can fail, and can do so silently, allowing the node to continue working, but with degraded performance.
    TL;DR - Keep an eye on your infrastructure and monitor it!

    • @Nickwilde7755
      @Nickwilde7755 1 year ago

      This. If they had been notified from the first drive, this most likely would've been prevented

  • @Kblender798
    @Kblender798 2 years ago +125

    Please adopt LTO tape backups into your workflow! It's indispensable as a deep storage solution, especially within my field of work (film industry).

  • @paulbrooks4395
    @paulbrooks4395 2 years ago +66

    I worked for an MSP where they had fired the previous person in charge of backups. I was on the infrastructure team. We found that 65% of our customer backups were no good and something like 85-90% of offsite replication was failing. It was 8 months before we could return all the backups to normal and reduce the back checks workflow to less than a few hours per week. During the 8 months, I spent the first 4 working to get the backups all straightened out with almost every hour of my workday.
    Suffice to say, having an ops team with competent people who are organized and themselves redundant and able to check each other’s work without judgement is absolutely paramount for a team in charge of critical systems.
    I personally love working on backups because it’s a silent way to ensure continuity while working with amazing technologies.

    • @capps1994
      @capps1994 2 years ago +5

      As someone in IT I know the pain. One thing I go by is that you don't have a valid backup unless you have tested it. I've had times (granted, back in the day, like 8-9 years ago) where the software would say it's a good backup. God forbid you need to restore, as it will just fail. Very fun times, they are

    • @Phoen1x883
      @Phoen1x883 2 years ago +3

      Good worker! Providing billable service with no ongoing expenses like "maintenance" or "checking the backups".
      -most MSP management, probably

    • @aravindpallippara1577
      @aravindpallippara1577 2 years ago

      @@Phoen1x883 well, if you can do something on company time to help the company's bottom line... you should?
      There are things like loyalty and goodwill even in corporations

  • @DarrynJones
    @DarrynJones 2 years ago +1848

    "I'm the highest ranking person in the company, the highest ranking person in the IT team, and the person who decided not to hire a dedicated IT staff. There is no way to determine who's accountable here" - Linus 2022

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +100

      Bet he still might not hire one since he 'learned his lesson'. Oh well live and learn!

    • @Dimmers
      @Dimmers 2 years ago +16

      @Connor Williams but by that logic it means he will fix what they failed at and not for anything that may arise. If they don't have a full time or part time IT person then the same or similar issues are doomed to happen again

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +10

      @@Dimmers that's my point, hopefully we see a video posted asking for applications soon so this doesn't happen again!

    • @KP3droflxp
      @KP3droflxp 2 years ago +14

      @Connor Williams it would be quite dumb for them to hire an IT specialist because a good portion of their content is working on their own IT systems.

    • @RalphInRalphWorld
      @RalphInRalphWorld 2 years ago +50

      @@KP3droflxp they need an IT specialist to schedule and perform regular preventative maintenance. Otherwise, their team will just fix things when they break like this video.

  • @MerlinsBeard91
    @MerlinsBeard91 2 years ago +4

    As someone who works in the IT field for a small company I will be following this very closely. Anything that you guys do like this I absolutely love and try to implement it if it is appropriate for my company.

    • @CommanderRiker0
      @CommanderRiker0 2 years ago

      Most enterprise NAS already do all this for you, for example Synology.

  • @GTRShaun
    @GTRShaun 2 years ago +223

    In the takeaways at the end of the video, there was no mention of monitoring. If zfs zed was configured to email somebody/service desk on events like drive failure, this disaster could have been averted by replacing failing drives one at a time as they failed instead of accidentally finding the house of cards your enterprise is built on. Monitoring for failure should have been the most prominent takeaway.
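
As a complement to event-driven notifications from the ZFS event daemon, a scheduled health check is one simple way to surface a degraded pool early. The sketch below assumes a host with the `zpool` command installed and relies on `zpool status -x`, which prints "all pools are healthy" when nothing is wrong. The function names are hypothetical, and how the alert gets delivered (email, chat, ticket) is left out.

```python
import subprocess

# Message printed by `zpool status -x` when every pool is fine.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_need_attention(status_output: str) -> bool:
    """True if the `zpool status -x` output reports anything but healthy.

    Unexpected output (for example "no pools available") is also
    treated as needing attention, which errs on the side of alerting.
    """
    return HEALTHY_MESSAGE not in status_output


def read_pool_status() -> str:
    """Run `zpool status -x` and return its combined output as text."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is inspected via the output instead
    )
    return result.stdout + result.stderr


# Example use on a ZFS host (not executed here):
#     if pools_need_attention(read_pool_status()):
#         ...send the output to whoever is on call...
```

Keeping the parsing separate from the command call makes the decision logic testable on a machine without ZFS.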

    • @davidbubble6863
      @davidbubble6863 2 years ago +8

      My takeaway is that no system is safe from hard drive failure, and the owner of a system this big should hire someone dedicated to take care of it.

    • @yensteel
      @yensteel 2 years ago +15

      Thought it was weird too. An email as soon as one drive fails could reduce response time. The number of drives they are handling meant the chances of 2 or more failing at the same time is pretty high.
      What about reserve drives to automatically repair when one degrades? Not foolproof but a good start. For bit rot, more frequent scrubbing?
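
The point about drive count can be made concrete with a binomial estimate. This is only a sketch under loud assumptions: drives fail independently, each with the same probability `p` during the window in which nobody is watching, and the figures of 60 drives and 2% are hypothetical, not numbers from the video. Same-batch drives and shared power events would break the independence assumption and make things worse.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent drives fail,
    when each fails with probability p in the window considered."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 60 drives, each with a 2% chance of failing
# during an unmonitored stretch.
two_or_more = prob_at_least(2, 60, 0.02)
print(f"P(at least 2 failures) = {two_or_more:.3f}")
```

With these made-up inputs the chance of two or more failures comes out at roughly one in three, which is why an alert on the first failure matters so much.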

    • @glenby2u
      @glenby2u 2 years ago +3

      even a post power outage check or weekly job for an intern... oh well. once is a mistake, twice is a problem, thrice = low value asset.

    • @rosen9425
      @rosen9425 2 years ago

      My thoughts too. File it under "mistakes were made", it's the big locker you can't miss 😁

    • @NumptyMcNumptyface
      @NumptyMcNumptyface 2 years ago +5

      Not just configure it, also test that configuration. I've worked at a place where the storage system was set up to send an email in case of pending doom. Problem was it wasn't configured correctly, so the emails never reached their recipient.
      How did they find out about the impending doom? Well, the system also gave off a sound alert as well as flashing an LED, which were only noticed when I was given a tour of the server room.

  • @JeffGeerling
    @JeffGeerling 2 years ago +572

    I think we all have a lot of cases of 'didn't follow our own advice' in the storage/DR world. Unless it affects your bottom line, backups and DR tend to be lower on the priority list.
    And lower on the priority list usually means either "not configured at all" or at minimum "never been tested before" :(

    • @SodaWithoutSparkles
      @SodaWithoutSparkles 2 years ago +4

      Always test your backup and fail-safe. There is no use in having a backup if it doesn't work at all.
      Don't just do backups, TEST your backups

    • @Deerhunter360
      @Deerhunter360 2 years ago +6

      @Nimki rafa 8 shut up bot

    • @lostphotographs3936
      @lostphotographs3936 2 years ago

      As a fellow Repair and Recovery guy in the SS world we sell hundreds of drives globally to guys in that very situation. TRUST ME !
      new new vault...... " vault 3 " ..... 😇

    • @ImAManMann
      @ImAManMann 2 years ago +1

      I always follow my advice for backing up data because there is a simple rule... if you back up your data, you won't need the backup, if you don't back up your data you WILL need the backup.

    • @waspennator
      @waspennator 2 years ago +1

      Backups and UPS should be essentials at this point, lost drives on my old comp cause I had the "bright" idea to use it in the middle of a bad wind storm with only a surge protector.

  • @bencoomer2000
    @bencoomer2000 2 years ago +227

    You know. It's nice to see someone that handles things like an adult, admit mistakes, acknowledge that some failures aren't simple "that person screwed up", and use it to constructively fix problems

    • @beermarket9971
      @beermarket9971 2 years ago +8

      If he was handling this as an adult he would have hired a full-time IT person a long time ago; this is childish

    • @alias_not_needed
      @alias_not_needed 2 years ago +7

      @@beermarket9971 Why? It is everyone's own choice how important their data is. If they can live with the loss of some old footage, I see no problem in their actions...

    • @beermarket9971
      @beermarket9971 2 years ago

      @@alias_not_needed There are plenty of reasons why this is childish in my POV:
      For one, you should value what belongs to you and protect it from predictable breakdown; otherwise you come across as a spoiled child.
      Second, as a CEO you have the duty to protect and save your employees' work. Accidents do happen, but when they are caused by a lack of prevention, the people in charge (or the CEO) come across as childish.
      Finally, when a CEO cannot hold someone accountable for data loss (or work loss), it's ultimately his fault and he should just own it. Maybe I missed it, but it didn't quite come across like that.
      I don't want this to come across as negative. I like LTT, it looks like an amazing place to work, and I admire Linus. But this is frustrating to watch...

  • @gwheeler1609
    @gwheeler1609 2 года назад +1

    Mate, I really appreciate the honesty of this video. Eating humble pie in order to educate your viewers shows real dedication to your mission.

  • @brodur
    @brodur 2 года назад +331

    I am very interested to see how the recovery process goes. As someone who has only ever done disaster recovery in the realm of terabytes... yikes. Good luck friends.

    • @FireWyvern870
      @FireWyvern870 2 года назад +20

      Damn, these bots
      #RUclipsKilledTrustedFlagging

    • @theluigifan42
      @theluigifan42 2 года назад +2

      these bots out here calling youngboy "extravagant"

    • @leexgx
      @leexgx 2 года назад +1

      What I don't understand is why auto mod isn't capturing them (whenever I post a link, 90% of the time my post gets auto-modded and disappears).

    • @FireWyvern870
      @FireWyvern870 2 года назад +1

      @@marcogenovesi8570 both are problems. One is not higher than the other.

  • @lucasmenchone2826
    @lucasmenchone2826 2 года назад +715

    HR meeting with Linus: “All our data has been lost, i’m gonna fire someone…
    But not before i fire up our segway to our sponsor…”

    • @cogYo
      @cogYo 2 года назад +5

      🤣🤣🤣🤣

    • @klaasmuller9663
      @klaasmuller9663 2 года назад +16

      *Segue

    • @DailyCorvid
      @DailyCorvid 2 года назад +1

      Linus is the only person whose adverts I enjoy. Angry Joe started putting tonnes of effort into his, but they are so forced!! I think Linus actually gets a laugh-kick out of saying LTTSTORE where it's crowbarred into something lol. I know I do, but not as much kick as the coffee in this LTTSTORE FLASK WILL HAVE.
      Linus dude, over all the years I have watched you I don't think I ever credited you properly. Well done man, this thing you've all created is really cool :)

    • @UncleKennysPlace
      @UncleKennysPlace 2 года назад +5

      @@Avendesora Except, of course, it's totally wrong to use one of those words for the other. Unless your server room is so large that you must use a Segway to get to the sponsor.

    • @SuperNGLP
      @SuperNGLP 2 года назад +1

      Gotta make up for that loss of money somehow.

  • @technogamer18
    @technogamer18 2 года назад +917

    “This caused the array to offline itself to prevent further degradation”
    …Been there, array. Been there.

  • @emeraldmorningmist
    @emeraldmorningmist 2 года назад +1

    First off, I am sorry for LTT about the data loss. Secondly, I am glad it wasn't "active" or current data but rather old RUclips videos, and those can be recovered (but only the uploaded videos and not any extra material/footage you had stored). Good luck on the project!

  • @moralapostel
    @moralapostel 2 года назад +402

    Big mistake to immediately replace drives that weren't even dead and had only shown some failures. By removing them, LTT removed all the (still good) parity data on those drives. They probably should've run a scrub first, and only then removed the possibly malfunctioning drives.

    • @hallif7295
      @hallif7295 2 года назад +2

      Wouldn't that take a long time tho?

    • @bkrich
      @bkrich 2 года назад +2

      Yeah I was thinking the same

    • @AyoKeito
      @AyoKeito 2 года назад +8

      I'm pretty sure those wouldn't survive a scrub either.

    • @bkrich
      @bkrich 2 года назад +29

      @@AyoKeito we wouldn't know that for sure, but we do know it didn't survive the replacements

    • @dracotrapnet
      @dracotrapnet 2 года назад +6

      If they are offlined, they are already dirty parity data.

  • @wesrihn
    @wesrihn 2 года назад +261

    Ahhh, the reason I originally subbed to LTT, insane server builds and configs.

    • @theairaccumulator7144
      @theairaccumulator7144 2 года назад +13

      Insanely bad and mismanaged server builds

    • @UrielZeptim
      @UrielZeptim 2 года назад +5

      @@theairaccumulator7144 the point still stands

    • @anona1443
      @anona1443 2 года назад

      And lots of dropping expensive hardware

  • @DangerousDac
    @DangerousDac 2 года назад +195

    Well this "presentation" format certainly has a different energy to it than Whonnock died.

    • @philb5593
      @philb5593 2 года назад +31

      The vault is hardly the beating heart of the company that whonnock was, and sounds like this unfolded over the course of days and weeks as Jake found the issues and they are still working on rebuilding the data.
      The vault is just archive data. Whonnock is the in progress projects, and I think at that time Linus said there was no backup.

  • @_GhostMiner
    @_GhostMiner 2 года назад +55

    Linus being so calm while talking about one of his/their biggest oopsies is so cool 😄

  • @myname7021
    @myname7021 2 года назад +149

    10:30 and most importantly: monitor your environment! SNMP, Syslogs and even specialized monitoring agents are an easy way to monitor your environment.

    • @grrkaa8450
      @grrkaa8450 2 года назад +3

      PRTG has entered the chat

    • @towel2473
      @towel2473 2 года назад +15

      The irony is that they advertise these products in segues but, it seems, don't implement them.

    • @BTMikeMan
      @BTMikeMan 2 года назад +6

      @@towel2473 I was going to say, did they not have Pulseway deployed :)

    • @adg1355
      @adg1355 2 года назад +1

      Rather messages from SMART and HBA utilities.

  • @seriphim8542
    @seriphim8542 2 года назад +123

    At that density and the infrequency of the older data being updated you really should consider acquiring a tape library. A couple iSCSI targets and a 250 slot LTO library would keep you until you more than double your current use. But considering the increasing file sizes of the raw files you're ingesting I would recommend going for a 3-3.5X scaling.

    • @grrkaa8450
      @grrkaa8450 2 года назад

      A 250 slot library for what? 3 PB of direct access tape storage?

    • @killer2600
      @killer2600 2 года назад +5

      Tape is slow. I think the whole point of their setup is fast access to footage, new or old, for editing purposes. If they were just hanging on to it for keeping's sake, then tape is an option, but I think they keep it so they can retrieve previous footage on demand to splice into the current video being edited.

    • @joross8
      @joross8 2 года назад +11

      ​@@grrkaa8450 Tape is slow, but much cheaper per TB.
      Typically you would have a hybrid system where users interacting with the data would hit high speed disk storage of some sort, and that disk storage would be running software that would migrate copies of files, or just less accessed files to tape.
      It's effectively the best of both worlds, users have the speed and accessibility of high speed storage, but the high speed pool is much smaller, and most of the archival data is on less expensive tape drives. The only time you hit a slow down is when a user has to access the stuff on tape which would be normally pulled when the user accessed a stub file representing the file on the disk pool.

    • @Wooble57
      @Wooble57 2 года назад +5

      @@killer2600 so do both? Use tape as an economical backup option.

    • @MDKAOD
      @MDKAOD 2 года назад +8

      @@grrkaa8450 Why keep the data in hot storage at all? Archive to tape (not backup) toss it in a fire safe.

  • @jblyon2
    @jblyon2 2 года назад +43

    I've been through a number of mergers and acquisitions over the past 10+ years. On every single one the IT dept/employees who do IT tasks for the other entity have been running without viable backups, server monitoring, out of band management, or alerting. Most also lacked UPS units (or working UPS units), and one was even running RAID0 on a production server and couldn't figure out why it kept failing on them. It's a scary world out there.

  • @eternalko
    @eternalko 2 года назад +2

    A very practical piece of advice: store your "old" archival data (like photos) on a hard drive that is not connected to power or a server. Use other cloud storage all you want, but keep one disconnected, low-tech option.

    • @klebdotio3284
      @klebdotio3284 2 года назад

      Suddenly I feel smart for keeping my backup drives in an antistatic bag unplugged

  • @billhollinshead
    @billhollinshead 2 года назад +534

    A data *recovery* policy abides with this: "The only 'known-good backup' is one that you *have* successfully restored." 😀

    • @klaernie
      @klaernie 2 года назад +12

      There is even the question, if the old Premiere projects are still loadable in current software versions..

    • @khatarin
      @khatarin 2 года назад +12

      Former Data Protection Product Manager here for some 30-40k servers at my old job: Yes. :)

    • @feesh9977
      @feesh9977 2 года назад

      Eeplwllwlwl

    • @ProTechShow
      @ProTechShow 2 года назад +1

      This is the way

    • @DAndyLord
      @DAndyLord 2 года назад +7

      When discussing redundancy, one is none, two is one. That's how I discuss backup options with my clients. If it's mission critical you need a layered backup system.

  • @captdev
    @captdev 2 года назад +522

    As an operations engineer, the amount of red flags that the process you followed here brought up was terrifying. Please write processes for this sort of stuff and test them - it's all fun and games till you lose something essential because of a stupid decision from 5 years ago

    • @williameldridge9382
      @williameldridge9382 2 года назад +32

      Not to mention they used Seagate drives. They are just completely unreliable. I wouldn't trust them in any circumstance. I've replaced hundreds of Seagate drives due to failure, but only a handful of WD/Hitachi. It isn't surprising, as Seagate purchased the worst hard drive company that ever existed, Maxtor. And they didn't learn their lesson; they got even more Seagate drives.

    • @jrdemasi
      @jrdemasi 2 года назад +13

      Why anyone trusts this guy for basically anything is beyond me. Lol.

    • @mikex4941
      @mikex4941 2 года назад +7

      @@williameldridge9382 Got a different experience. I'm still rocking Seagate and WD drives while all of my Hitachi drives from the same era as all my other drives died. But not sure right now though.

    • @esbekay
      @esbekay 2 года назад +2

      seriously, it's hard to watch

    • @JLeYang
      @JLeYang 2 года назад +20

      @@williameldridge9382 Hard drive manufacturers have all had bad batches, it's just the nature of the beast now. I have had failures from all brands in usage. You should see hard drives as a consumable (especially as a storage array), run SMART and replace when health is detected as bad. The bigger issue is people not doing backups, that's a failure on you and your users to not enforce that.

  • @TristensMadness
    @TristensMadness 2 года назад +322

    Please be server room related. I’ve been craving some of that content recently

    • @toxicxshotsx
      @toxicxshotsx 2 года назад +13

      Me too man!! Also ^s/o to the milfs in the 20 mile radius comments ahah

    • @williamprimeee
      @williamprimeee 2 года назад +2

      yeah we all wana see his server ;)

    • @gabrielrojasg.3180
      @gabrielrojasg.3180 2 года назад +1

      I started following Linus by server content haha

    • @SuperNGLP
      @SuperNGLP 2 года назад +4

      You just have to wait until something goes wrong and boom new server content!
      Maybe we pay seagate to send Bad drives, so we get new content sooner?
      Sounds like a good, reasonable idea.

    • @frozenturbo8623
      @frozenturbo8623 2 года назад +1

      Wait until Seagate fails again in Vault 3 then we have Vault 4 until we got into Vault 76 and That marks the End of Seagate.

  • @karenwang313
    @karenwang313 2 года назад

    Mad props for coming out and saying you guys screwed up. All of us can learn from this and hopefully not lose any data of our own.

  • @Unreasonable0ne
    @Unreasonable0ne 2 года назад +386

    I'm just wondering why LTT didn't go for tape storage for their servers, since, as Linus said himself, it was for archival purposes and more of a fun project to test out the tech they got. They even got a tape drive some time ago afaik. It doesn't make sense to keep the drives spinning for years if they are not actively used or maintained.

    • @PanKosiu
      @PanKosiu 2 года назад +44

      Basically this. it was the first thing I thought of. If the archive data never changes, tapes will be crazy cheap way of backing up old videos.

    • @Stasiek_Zabojca
      @Stasiek_Zabojca 2 года назад +29

      Because they probably want to have quick access to it, I think... To cut something out of old video and things like that? As far as I know, tape storage does not give you that luxury.

    • @aoeuable
      @aoeuable 2 года назад +27

      @@Stasiek_Zabojca You could store lower-bitrate stuff on fast storage for browsing and only get the tape out when you need access to the original files.

    • @666Tomato666
      @666Tomato666 2 года назад +14

      Tape storage is cost competitive on the level of multiple petabytes, not single petabytes.
      So it's nothing that any significant minority of viewers will ever see in person, let alone be part of decision making process to buy, install or configure.

    • @beid777
      @beid777 2 года назад +8

      Because he'd rather have "dope hardware" instead of using tape. If they need access to it that's fine, every week or month or time frame you do a fresh backup to tape and keep your servers running for access and have tapes as backup. He failed to implement backup in depth which is basically industry standard.
      Archive is not backup. Redundant and separated storage of data is backup.

  • @Tetraknot
    @Tetraknot 2 года назад +90

    Love your show! Just wanted to chime in here coming from an IT background supporting large companies in datacenters as well as being a content creator. Trying to maintain an accessible RAID of ever growing content only gets more difficult and expensive over time. You will eventually need a full time employee to manage your content if you go this route and at some point you will need to migrate your entire content to a new RAID when 1 petabyte isn't enough anymore and that's not going to be fun.
    The alternative cheaper and simpler solution is to archive your content to tape which will have a much higher chance of surviving the years to come as it's not on spinning platters that run 24/7. Yes, getting access to a piece of content you want to grab on short notice will be more annoying but you can always keep a smaller RAID with your completed videos and archive your raw content via tape as it's the RAW video content that really eats up the TB which is why you might want to consider archiving your raw video.

    • @pixelmaster98
      @pixelmaster98 2 года назад +8

      just build a giant data center that uses robots to automatically fetch & read tapes, so it's at least automated, even if it still takes half an hour. Building a data center is probably also great content for the channel ^^
      /s

    • @alextraska
      @alextraska 2 года назад +6

      @@pixelmaster98 yea until crash override and acid burn have a hacking battle with your tape robots

    • @zicklane
      @zicklane 2 года назад

      Ok no one asked

    • @blademan7671
      @blademan7671 2 года назад +3

      This response from a pro is why you would leave a job like this to pros. As this pro demonstrated, #1 is identifying and understanding the requirements. Do you really need all your old content available online, or maybe offline is good enough? Then solution to fit the needs.

    • @geoff_cline
      @geoff_cline 2 года назад

      This could also be done with AWS Glacier

  • @gvfc
    @gvfc 2 года назад +773

    In my first months as a sysadmin I learned a lesson: always keep a secondary backup that isn't on-premise. Power can go out, and you'll have a few bad sectors on your drives. But if there's a fire and your server goes with it, all of a sudden giving a few bucks to Jeff Bezos doesn't sound that bad of a deal after all.

    • @radical_dog
      @radical_dog 2 года назад +62

      Yeah, not paying for cloud storage basically confirms that they wouldn't cry to sleep if they lost the whole lot. Which is a reasonable decision since it's not mission critical data.

    • @TonytheEE
      @TonytheEE 2 года назад +27

      They had a remote server in a previous VLOG a year or two back. I wonder what's up with that?

    • @tpmeredith
      @tpmeredith 2 года назад +10

      Heck anyone with a 5 or more user office 365 tenant can get unlimited onedrive backup. Yes it's slow to backup, yes it's full of details like 25TB sharepoint sites that you have to subdivide, but it IS unlimited for very cheap and an offsite backup.

    • @radical_dog
      @radical_dog 2 года назад +64

      @@tpmeredith No such thing as "unlimited", it just means "we haven't written down a hard limit". 720TB would definitely be knocking on that door!

    • @Kevin-jb2pv
      @Kevin-jb2pv 2 года назад +9

      I think they've covered this in the past, and the problem is that they just have so much at this point that the upload will take forever. But that doesn't mean you're not right. If anything, they should do it _now_ because every day they wait is going to just be more they have to upload. I'm sure there's something out there that will just start uploading everything in the background until it catches up.
      Also, IDK if it would only be "a few bucks" for the amount they need. IDK what that kind of enterprise level storage costs, but it's probably not cheap and I'll bet that even on "unlimited" cloud storage plans there's probably a catch written in the fine print with some way of restricting the storage in practice, like restricting the upload bandwidth past a certain amount of data uploaded to such a slow rate that they would never be able to upload faster than they create new data...

  • @glock21guy
    @glock21guy 2 года назад +15

    I don't really think this was an issue of not having a "tech person", or "not having time" to set up. It was simply an oversight. Setting up scrubs and SMART alerts doesn't take long, and you certainly don't need a full time person sitting around waiting for trouble notices from monitoring applications.

  • @JessSimpson1313
    @JessSimpson1313 2 года назад +39

    Hey Linus, 2 best-practice recommendations I didn't hear you state, but which would be very important.
    1) For every 24 disks (on average) you should have 1 hot spare. This drive should be in the same HA zone as the 24 disks, so if a failure occurs or a scrub detects failures, the rebuild to the spare can start automatically. This gives you time to get replacement equipment etc. without having to worry about your data while you're purchasing.
    2) If you cannot do full backups, such as to a cloud or a dedicated location, the next best thing is to ensure your data is spread across 2 different technology solutions. As this is entirely archival, and you're not worried about location protection, your 2nd system could be replaced by either an always-on VTFS (virtual tape FS) or data simply streamed to tape backups, with 1 guy pulling the tapes about once a month. Tape is rather inexpensive and has a great shelf life.
    I've been doing IT storage and data protection engineering for the last 10 years, and customers in your position, without dedicated staff but gathering increasingly large amounts of data, are sadly all too common.

    • @peterpain6625
      @peterpain6625 2 года назад +6

      They're using Seagate drives. So for every drive they should have a hotspare ;) No seriously. I'd go 12D 1HS at least. The way they manhandle their servers i'd say VTFS is prone to a hilarious video with a lot of grey confetti :D

    • @brucepayan2845
      @brucepayan2845 2 года назад

      Offsite rotating backups?

    • @JessSimpson1313
      @JessSimpson1313 2 года назад

      @@brucepayan2845 that would be ideal, but in the video they said they couldn’t afford offsite backups.

  • @perrygolden
    @perrygolden 2 года назад +169

    When your downtime and data loss is measured in lost $, hiring full time systems engineer becomes a very attractive value proposition.

    • @gabrielenitti3243
      @gabrielenitti3243 2 года назад +13

      I don't think any of this will produce any downtime for his company. The petabyte's worth of data he may lose is, as he said, just a "nice to have". It's not the actual production server where they store the current projects and videos. His employees may not even know about this data loss.

  • @yrmoma
    @yrmoma 2 года назад +39

    Thank you for this. I've never scrubbed my ZFS pool because I didn't know what that meant. I now have it set up to do it monthly and am running one as we speak. 5 hour estimate for completion

    • @yrmoma
      @yrmoma 2 года назад +21

      Update: no errors on all drives of my 8 drive Z2 array. Awesome! Took about 8 hours and they're 2 tb drives.

  • @onlnagent
    @onlnagent 2 года назад +13

    It's amazing that a company in the tech field can take such a YOLO approach to backups and still be credible to some.

    • @johnathanera5863
      @johnathanera5863 2 года назад

      Because its frankly unimportant for their company. Get that stick out your ass bud.

  • @Jordan_C_Wilde
    @Jordan_C_Wilde 2 года назад +115

    "We lost a sh*tload of video data, lets make an educational video about it" - Most Linus thing ever

  • @henningbutz2289
    @henningbutz2289 2 года назад +178

    Let's set this straight: there are more backup options than local spinning disks and cloud storage. The cheapest way would be an LTO tape library. An LTO-8 tape (12TB of uncompressed storage) is about 50-100€, which is only a fraction of the cost of spinning disks. Tapes are also archival grade and can be labelled and stored on a shelf somewhere. As their backup files don't really change, you could just put a few projects on one tape and chuck it in the warehouse.

    • @thomasphillips885
      @thomasphillips885 2 года назад +10

      Yeah he's done a video about tape storage before

    • @7eis
      @7eis 2 года назад +14

      This is not the logic channel

    • @bostjanko
      @bostjanko 2 года назад +2

      You must be old :-), like me.

    • @markm4120
      @markm4120 2 года назад +16

      Yep, the system my team and I designed included LTO with 2 robotic libraries. Archival data doesn't belong on a hard drive.

    • @SierraLimaOscar
      @SierraLimaOscar 2 года назад +16

      While I agree with the archive not being on spinning disks, long term storage of tapes is an issue in itself. It requires regular maintenance, climate controlled warehousing and copying every few years. I work in broadcasting and I have only seen deep archives done correctly maybe once in my career. I have quoted archival systems several times and the face customers make when they see the numbers and are then informed it does not include any recurring and on-going operational cost is always funny (not really).
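
As a rough check on the tape figures quoted in this thread (12 TB native per LTO-8 tape, 50-100€ per tape), the sketch below estimates media cost for an archive of about one petabyte. The 1000 TB archive size and the function names are assumptions for illustration only, and the estimate deliberately leaves out the tape drive, the library, and the ongoing storage and maintenance costs raised in the replies.

```python
import math

TAPE_CAPACITY_TB = 12        # LTO-8 native capacity, as quoted in the comment
PRICE_RANGE_EUR = (50, 100)  # per-tape price range, as quoted in the comment
ARCHIVE_TB = 1000            # assumption: "a petabyte" taken as 1000 TB


def tapes_needed(archive_tb, tape_tb=TAPE_CAPACITY_TB):
    """Number of whole tapes required to hold archive_tb of uncompressed data."""
    return math.ceil(archive_tb / tape_tb)


def media_cost_range(archive_tb, copies=1):
    """(low, high) media cost in EUR for the given number of full copies."""
    count = tapes_needed(archive_tb) * copies
    low, high = PRICE_RANGE_EUR
    return count * low, count * high


if __name__ == "__main__":
    tapes = tapes_needed(ARCHIVE_TB)
    low, high = media_cost_range(ARCHIVE_TB)
    print(f"{tapes} tapes, roughly {low}-{high} EUR in media for one copy")
```

Passing `copies=2` doubles the tape count, which models keeping a duplicate set of cartridges in a second location.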

  • @BiffaPlaysCitiesSkylines
    @BiffaPlaysCitiesSkylines 2 года назад +533

    Up to 80tb myself and needing more soon....! This hoarding raw footage is a nightmare 🤣

    • @Briceronie
      @Briceronie 2 года назад +33

      hey i watch your cities skylines videos. hope your day is going well. much love

    • @TheMallaclllypse
      @TheMallaclllypse 2 года назад +24

      Hello everybody and welcome back to the next episode of fix my NAS.

    • @StrokeMahEgo
      @StrokeMahEgo 2 года назад +4

      Consider cloud, or tape based backups that you mail to a trusted friend or put it in a safety box at a bank.

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines 2 года назад +10

      @@Briceronie hi, thanks 😊

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines 2 года назад +7

      @Malaclypse The Elder yes, that'll be me soon lol 😆

  • @jasonlevi7030
    @jasonlevi7030 2 года назад +15

    Sure sounds like a good time to make a video about how tape drive systems aren't as obsolete as many might think and maybe even get yourself a super cool tape robot!
    You could also dig into data reconstruction/recovery software to see what you can pry out of the drives you've pulled and maybe try out the old "HDD in a freezer" trick.
    There you go. Two new video ideas (that I'd love to see presented by Jake and Anthony respectively) to hopefully recoup some of the costs of this oversight.

    • @mrmotofy
      @mrmotofy 2 года назад +1

      I used to think tape drives were old... I recently saw a tape drive with TB or something... guess I was wrong

    • @pof1857
      @pof1857 2 года назад +2

      @@mrmotofy LTO-9 is 18TB/tape.

  • @tkirchmann
    @tkirchmann 2 года назад +280

    (oversimplified) Summary: The power dropped out a bunch of times and LTT dropped the ball on configuring the servers so the servers dropped a bunch of errors before dropping physical drives out of the servers resulting in the servers permanently dropping some data... I see a familiar pattern here.

    • @RippahRooJizah
      @RippahRooJizah 2 года назад +3

      HOLD IT!
      I'm not sure what you are getting at.

    • @ZNotFound
      @ZNotFound 2 года назад +20

      At least they get to drop a new video about it.

    • @sushimshah2896
      @sushimshah2896 2 года назад +5

      Would've been nice if (Mass)Drop sponsored then as well

    • @Thefreakyfreek
      @Thefreakyfreek 2 года назад +4

      Linus drop tips

    • @4-Avenue
      @4-Avenue 2 года назад +5

      How are we supposed to trust Linus' tech tips if they keep dropping the ball :(
      But at least they show us!

  • @andrewnotmyrealname7827
    @andrewnotmyrealname7827 2 года назад +735

    All techs: "Follow this advice!"
    Those same techs: "YOLO"

    • @placate9051
      @placate9051 2 года назад +45

      Ay gotta know the rules before you break them

    • @K-----
      @K----- 2 года назад +3

      To be fair it's more, follow this advice if X and then the same techs don't really have X. He basically said that at 9:27

    • @snowysysadmin59
      @snowysysadmin59 2 года назад +4

      Ok but we all know linus has said before "do as i say, not as i do"

    • @TheAssirra
      @TheAssirra 2 года назад +3

      "Do as I say, not as I do".

  • @marcozanuttigh2060
    @marcozanuttigh2060 2 года назад +60

    The perfect opportunity for testing out tape backup! I had 4 HDDs fail at the same time in my RAID 6 storage server, with total data loss. I recovered all my data from my tapes! It was only 80TB of data, but when it comes to price for large backups, tape is king!

    • @heavyq
      @heavyq 2 года назад +3

      Tape is so underrated by so many people. It's such a great choice for storing a shitload of data for long periods of time.

    • @dakyno
      @dakyno 2 года назад

      "only 80TB" bruh

    • @geort45
      @geort45 2 года назад

      @@heavyq problem is the drives cost a shitload...

  • @justindacosta3d
    @justindacosta3d 2 года назад +2

    Thanks for doing this video. I'm sure it made a LOT of people go back and check that their home servers, or the servers they support, are not vulnerable.

  • @bluedot5555
    @bluedot5555 2 года назад +194

    Considering a lot of this is for cold storage, it would be neat to see you implement a tape drive for this use case. Tapes would store a ton of data pretty cheaply and safely. They're also something that many people don't even know is still in use.

    • @jimbo-dev
      @jimbo-dev 2 года назад +15

      He does have tape drive, but he probably would need an autoloader and bit of backup automation 🤔

    • @aimannorazman7959
      @aimannorazman7959 2 года назад +16

      Yep, tape is very viable, especially if the data is only accessed a few times after the final video has been uploaded to RUclips, like Linus said.

    • @florabee9283
      @florabee9283 2 года назад +3

      Tape backup burns less coal too, and is immune to blackouts.

    • @chaos.corner
      @chaos.corner 2 года назад +7

      @@florabee9283 Also electrical problems. For small concerns like myself, I've tried tape but disk is just easier and cheaper and tracks needs better. For Linus, tape is definitely well worth a look.

    • @TheBacktimer
      @TheBacktimer 2 года назад +5

      I've made the case here many times that it is overkill to store all footage as raw data. I'm not really surprised this strategy failed, and very sorry to hear it. It's not like you can simply back up a few petabytes to another machine. So yeah, tapes. Maybe even Amazon Glacier? At the very least I would have made a second backup tier to store compressed data. Another option would be to store finished renders in max quality on Blu-rays. That's still a lot better in case of a permanent loss. And a lot cheaper.

  • @cbrugiati
    @cbrugiati 2 года назад +182

    There's another type of storage for enterprise who needs a lot of storage. LTO is a lot better when it's too much data like you have

    • @rogerwilco2
      @rogerwilco2 2 года назад +32

      Yes.
      We store over 70 PB of archive data on tapes.
      They have their own failure modes though, but overall it's a good solution.
      We once had a tape robot arm out of alignment, and it knocked a lot of tapes out of the storage.

    • @volvo09
      @volvo09 2 года назад +21

      Yes, if it's for archive, why keep it on actively running drives...

    • @geort45
      @geort45 2 года назад +31

      It's the clear solution, for a normal user an LTO drive is expensive AF, but in his case it'd be cheap compared to a server... and the tape cartridges are very cheap for what they can store... he could have duplicate cartridges of all his data even. Instead he insists on buying bigger and more expensive hardware which is more complicated to maintain and has many more points of failure.

    • @234ne14
      @234ne14 2 года назад +19

      Which is funny, because I thought LTT had tape backups after the... third(?) server crash. Linus did a full review of the LTO-8 thunderbolt dock.

    • @fridaycaliforniaa236
      @fridaycaliforniaa236 2 года назад +1

      Excuse my ignorance, what is a LTO ? (I'm too lazy to search on Wikipedia lol)

  • @Charlie8913
    @Charlie8913 2 года назад +35

    Oh my gosh, they had no automatic scrubs and no automatic e-mail notification when a drive fails? That's absolutely necessary maintenance basics for ZFS...
    I wish LTT luck on restoring their data!

    • @oddeye
      @oddeye 2 года назад

      I wish them luck too, but all the information they tell about how it's important to backup the drives and have multiple backups they don't even follow. Is it just we'll fix it later or the cost to do it isn't a justifiable reason?

    • @SquirreliciousMe
      @SquirreliciousMe 2 года назад

      Also funny as they had multiple videos with sponsors like Pulseway where they brag about having everything monitored (so I guess they don’t use it… or didn’t configure that either…)
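A rough sketch of the "tell someone when a pool is unhappy" half of this thread, in Python. It leans on `zpool status -x`, which prints a one-line all-clear when every pool is healthy and the details of any troubled pool otherwise. The function names are made up for illustration, and delivering the report (email, chat, pager) is deliberately left out. This is an assumption-laden example, not how LTT's servers are actually configured.

```python
import subprocess

# Text that `zpool status -x` prints when nothing is wrong.
HEALTHY_MARKER = "all pools are healthy"


def pools_need_attention(status_output: str) -> bool:
    """Return True unless the status text reports every pool as healthy.

    Empty or unexpected output also counts as "needs attention", so a
    broken check fails loudly instead of silently.
    """
    return HEALTHY_MARKER not in status_output.lower()


def read_pool_status() -> str:
    """Run `zpool status -x` and return whatever it printed."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is still worth reporting
    )
    return result.stdout + result.stderr
```

Run from cron or a systemd timer, something like `if pools_need_attention(read_pool_status()): notify(...)` would cover the notification side, where `notify` is whatever hypothetical delivery hook you trust. Scheduling `zpool scrub <pool>` on a regular cadence is the other half.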

  • @dctech4432
    @dctech4432 2 года назад +2

    Y'all spend A LOT of money on redundancy for data, how about allocating "a reasonable amount of money" to redundant power backup strategies. Generators, solar panels, enterprise UPS w/ some SLA battery banks, or a nice LiPo/LiFe array. Buy yourself some time, with a big enough buffer for power outages. Do an energy audit of what absolutely must never lose power, and consider your options. Custom automating your alternative power sources, or even offloading your grid expenses with alt energy would pay off in MANY ways. You have a roof on that building, load it up with some panels. It would make a supreme video series as well!

  • @jeffbillings
    @jeffbillings 2 года назад +92

    LTT, please look into an implementing an LTO 8 tape library as a proper backup to your network pool! Tapes are so much cheaper than drives, and are the preferred archive format for long term. The tape robot and archiving software would do all the hard work and keeping track of data.

    • @QuickQuips
      @QuickQuips 2 года назад +6

      I was about to suggest the same. Newer (1-2 years)/more frequently accessed video goes to the petabyte but classic ones go to tape.
      They only talked about it 3 years ago. ruclips.net/video/alxqpbSZorA/видео.html

    • @VampyWorm
      @VampyWorm 2 года назад

      the one-time tapes would actually make sense only downside is you would have to pay for application to read/write said tapes (i.e commvault). But that isn't all terribly bad.

    • @dakotahsoucy
      @dakotahsoucy 2 года назад

      Tape is definitely a good way to go, especially with a tape library. As for applications to write to the tapes, there are some powerful open-source ones such as Bacula but it might take someone a bit of time to get it up and running.

    • @tooc4n
      @tooc4n 2 года назад +2

      you LOOK like an LTT employee.

    • @vicheaterx
      @vicheaterx 2 года назад +1

      @@VampyWorm Yep, but a BackupExec copy with 1 agent would be in the hundreds or very low thousands USD / yr. Tech support included :))) 46+2 LTO8 tapes would absolutely rule LTT. Just done a 4 drive, 2 autoloader, 2 libraries implementation, it took 3 people around one week to fully set up copy & backup jobs, I'm very impressed with the results!!

  • @ryan0io
    @ryan0io 2 года назад +221

    Please don't use 15 wide vdevs. Groups of 6 wide in raid-z2 is a good choice for spinning rust (4 data + 2 parity). As a zfs user for 10+ years, I cannot imagine running multiple 15 wide vdevs.

    • @namAehT
      @namAehT 2 года назад +36

      Really wide VDEVs are only OK when using SSDs or low capacity HDDs. The rebuild time on a 12 drive VDEV of 12TB drives is insane, and the stress the other disks are under during that period can easily cause one to fail. 6-8 drives on a RAIDZ2 seems to be the sweet spot for large drives, maybe 9 drive RAIDZ3 if you're _really_ paranoid.
      EDIT: I'm also saying this as someone who's running 8TB drives in 9 drive RAIDZ2 VDEVs. I have plenty of slots for more drives, so I'm sticking with 8TB drives for the time being.

    • @tpmeredith
      @tpmeredith 2 года назад +13

      Let alone multiple 15 wide vdevs in raidz2! Even worse. Then 4 of them in one pool? Of course that data was a time bomb.

    • @alessandrozigliani2615
      @alessandrozigliani2615 2 года назад +6

      More than 6 raidz2 using 20tb disks sounds a little edgy. I would require disks rated 1 error over 10^17 bits for that.
      15 is objectively scary with raidz2. 10 with adequate replication or backups would already be edgy.
      With raidz3 maybe 15 is not crazy but you might want to upgrade the pool at some point with 40tb drives or more, if they ever come out. Which would be totally nuts.

    • @ryan0io
      @ryan0io 2 года назад +6

      11 wide z3 vdevs would be the most I'd be comfortable with regardless of ssd / rated error rate. But once at 11 wide z3, why not go (2x) 6 wide z2? One extra drive, one extra parity, more striping (more performance), more flexibility in adding / removing / replacing devices. All a balance between redundancy / space efficiency and flexibility. To me, 6 drive z2's, and just multiply as needed. Let's think about worst case. For 6 drive z2's, you lose 2 drives. You have a 4 drive "raid 0" to deal with until redundancy is returned. Not great, but not terrible. Email alerts, etc. But a 15 wide z2? No email alerting? 2 drives die, you get a 13 wide "raid 0". Good luck.

    • @tpmeredith
      @tpmeredith 2 года назад +2

      @@ryan0io exactly. 100% right. Especially with a linus budget lmao.
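To put numbers on the 15-wide versus 6-wide argument in this thread, here is a small Python sketch. The `Layout` class is purely illustrative, and it ignores ZFS padding and metadata overhead, so the efficiency figures are upper bounds rather than what a real pool would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A pool made of identical RAID-Z vdevs (illustrative model only)."""
    vdevs: int   # number of vdevs in the pool
    width: int   # drives per vdev
    parity: int  # parity drives per vdev (2 for RAID-Z2, 3 for RAID-Z3)

    @property
    def total_drives(self) -> int:
        return self.vdevs * self.width

    @property
    def data_drives(self) -> int:
        return self.vdevs * (self.width - self.parity)

    @property
    def efficiency(self) -> float:
        """Fraction of raw drives that hold data rather than parity."""
        return self.data_drives / self.total_drives

    @property
    def exposed_width(self) -> int:
        """Drives that must ALL survive once a vdev has used up its parity."""
        return self.width - self.parity


wide = Layout(vdevs=4, width=15, parity=2)     # the layout criticised above
narrow = Layout(vdevs=10, width=6, parity=2)   # same 60 drives, 6-wide Z2

for name, layout in (("4 x 15-wide Z2", wide), ("10 x 6-wide Z2", narrow)):
    print(f"{name}: {layout.data_drives} data drives, "
          f"{layout.efficiency:.0%} efficient, "
          f"{layout.exposed_width}-wide stripe with no redundancy after "
          f"{layout.parity} failures in one vdev")
```

With the same 60 drives, the wide layout keeps 52 data drives against 40, which is the space the narrow layout gives up in exchange for a 4-wide rather than 13-wide unprotected stripe in the worst case.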

  • @ismaela.6973
    @ismaela.6973 2 года назад +52

    I.....I strongly believe he needs to hire an I.T full time to manage and do preventive maintenance on those data servers

    • @tobimai4843
      @tobimai4843 2 года назад +2

      On WAN show he said he thinks about it, also because of the Lab

  • @mbgdemon
    @mbgdemon 2 года назад

    These videos about your big fuckups are by far the most informational and educational videos on your channel... I have a little checklist of shit not to do when I set up a storage system, wouldn't have heard about these pitfalls anywhere else.

  • @MOLINE7708
    @MOLINE7708 2 года назад +35

    Bro, hire a dedicated sys admin. You have too many employees that rely on your server infrastructure to yolo everything yourself. You mention that you, Anthony, and Jake work on it, but they also are writers. You have enough data and infrastructure to warrant a dedicated and experienced sys admin at this point

    • @peterpain6625
      @peterpain6625 2 года назад +9

      I wouldn't want that job. They'll go behind his/her/their back at any opportunity anyways because "it's faster that way" or "reasons". The way LTT grew, the IT-Guy job is a surefire way to get PTSD now ;) No way they can establish any structure now.

    • @outofahat9363
      @outofahat9363 2 года назад +2

      @@peterpain6625 they know enough to be dangerous

    • @peterpain6625
      @peterpain6625 2 года назад

      @@outofahat9363 They know a lot in some areas and go full Dunning-Kruger in others ;)

  • @nathanielsottung
    @nathanielsottung 2 года назад +228

    I would love to know why tape backups aren't considered. It seems to be one of the more economical options and is great for archival. Also, as a photographer who works with tens of terabytes I would love to learn more about tape backup.

    • @SnotRocket123
      @SnotRocket123 2 года назад +60

      As an actual IT professional, learn about it from literally anywhere other than RUclips.

    • @Lexan_YT
      @Lexan_YT 2 года назад +6

      It would probably be insanely slow if they ever wanted to use the videos to edit from

    • @ker6349
      @ker6349 2 года назад +28

      @@Lexan_YT theoretically they'd use it to reinstall on new drives and use the tape backups as backups and not main drives

    • @marekspacirek
      @marekspacirek 2 года назад +16

      @@Lexan_YT Backup is not main storage. With Dell Powervault TL and IBM Spectrum we are achieving 1-2Gb/s write and read speeds. So restore of that data isn't that insanely slow.

    • @yensteel
      @yensteel 2 года назад +10

      Wow, I just checked, LTO-8 standard goes up to 12 TB per cartridge! It's very interesting!
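For anyone wondering how the tape numbers in this thread scale to a petabyte, a back-of-the-envelope sketch in Python. The 12 TB figure is the native LTO-8 cartridge capacity mentioned above. The 360 MB/s per-drive rate is LTO-8's rated native speed and is an assumption here, since real throughput depends on the drive, the software, and how well the source can keep the tape streaming. Decimal units (1 TB = 1,000,000 MB) are assumed throughout.

```python
import math


def cartridges_needed(archive_tb: float, cartridge_tb: float = 12.0) -> int:
    """Whole cartridges required to hold the archive, uncompressed."""
    return math.ceil(archive_tb / cartridge_tb)


def transfer_days(archive_tb: float, throughput_mb_s: float, drives: int = 1) -> float:
    """Days to stream the archive at a sustained per-drive rate.

    Ignores tape swaps, verification passes, and any source bottleneck.
    """
    total_mb = archive_tb * 1_000_000
    seconds = total_mb / (throughput_mb_s * drives)
    return seconds / 86_400


archive_tb = 1000  # 1 PB, matching the scale discussed in the video
print(cartridges_needed(archive_tb), "cartridges")
print(f"{transfer_days(archive_tb, 360):.1f} days with one drive")
print(f"{transfer_days(archive_tb, 360, drives=4):.1f} days with four drives")
```

Under those assumptions a full restore is weeks with a single drive and closer to one week with four, which is slow for editing but plausible for a backup of last resort.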

  • @SuperSmashDolls
    @SuperSmashDolls 2 года назад +378

    All of this was very patiently and thoroughly explained, except for one thing: what happened to that LTO-8 drive you were planning to put into service years ago?

    • @jayred8289
      @jayred8289 2 года назад +28

      I thought that they should have tape backup too

    • @lolish1234
      @lolish1234 2 года назад +41

      @@jayred8289 i mean how does such a big company with so many resources not have a 3-2-1 backup, even if it's some raw data. It's not like they're short on cash, are they

    • @FormerHumanX
      @FormerHumanX 2 года назад +5

      It was probably a review unit sent to LTT just to make a video and not something they were actively going to implement.

    • @TheDemocrab
      @TheDemocrab 2 года назад +33

      @@lolish1234 Because it's not ridiculously important data, Linus even says in the video that half the reason they bother keeping it around is because they can make interesting videos on it. I wouldn't be surprised if the eventual goal was a 3-2-1 backup system but they wanted to cover setting up each stage in videos which kept slipping cause LMG is pretty busy until we get to today. A lesson into why businesses with large data needs should be hiring their own IT guy.

    • @sarowie
      @sarowie 2 года назад +2

      @@TheDemocrab Setting up a cable testing lab and acquiring more space is more important than building a 3-2-1 backup system?
      Yes, "in hindsight" everything is easy to judge, but assuming Linus sets his priorities straight, he literally has more issues with monitor cables than with his raw video archive.

  • @andreasbrand3191
    @andreasbrand3191 2 года назад +1

    that is exactly the reason why I stopped building my own storage servers and got my first Synology like 10 years ago!
    Obviously I far less storage demand (I got 4TB of triple backed data and 25TB of nice to have original videos and RAW photos backed up ones). All secured via parity, auto-scrubbing, snapshot deduplication etc. I've never run into any issue and I've basically distributed more than 20 of DiskStations in my family and close friend's circle to people with far less IT know how than me... and I'm a different kind of scientist with ok-ish Hobby IT knowledge.
    There is no way on earth I can build something half reliable and convenient as purchasing a Synology or maybe QNAP and put another one up as backup at my parent's place!

  • @JWSpradlin
    @JWSpradlin 2 года назад +165

    With how large these drives are, I would really recommend going with Raid-Z3. I'm not saying larger drives fail more often, but rather resilvering a vDev with large drives takes INSANELY long. And resilvering hammers the remaining drives. Raid-Z1 and Z2 were great with like 2-8TB drives. 20TB? Not so much.

    •  2 года назад +12

      Finally somebody making sense.

    • @Kevin-jb2pv
      @Kevin-jb2pv 2 года назад +1

      This. IDK about the specifics of the different RAID configs, but I do think that it makes a lot more sense to have more smaller drives so that if and when something fails, it has less of a chance to wipe out _everything._

    • @jdatlas4668
      @jdatlas4668 2 года назад +1

      Definitely. Same reasoning why RAID6 isn't considered "good enough" in large drive arrays any more, either.

    • @kozmokohler
      @kozmokohler 2 года назад +3

      **cries in RAID rebuilds** Seriously though how is ZFS not a widely adopted standard of storage?

    • @Momi_V
      @Momi_V 2 года назад +3

      @@Kevin-jb2pv More smaller drives can cost you a lot more though. You need twice as many servers, twice as much space (and also more power and cooling, though that's not much of a concern here). But if a drive has twice as much capacity at nearly the same speed it'd probably be appropriate to think of them as two drives in terms of their redundancy needs (2 out of 15*10 TB is fine, 2 out of 15*20 TB is like 2 out of 30*10 TB drives which is risky)

  • @viridisdraco
    @viridisdraco 2 года назад +229

    Linus, I used to tell my loved ones "there are 2 kinds of people in the world: those who have a backup and those who wish they had". I used to work on storage rack support and I've seen the worst of the worst, including a 24 hour straight marathon to restore a super critical one. But I've also seen a storage rack with all the capacitors blown due to a lightning strike that fried a little unprotected datacenter.
    so... are you hiring an IT fulltime person now? :P

    • @JoeBlow-ub1us
      @JoeBlow-ub1us 2 года назад +25

      lol this guy is like, "Where do I send my resume?"

    • @calebdevore3395
      @calebdevore3395 2 года назад +11

      @Telleva You deleted their data, and blamed them for not backing it up..?

    • @AlexAlex-jk2tn
      @AlexAlex-jk2tn 2 года назад +5

      Actually there are 3 kinds of people in the world: those who have a backup, those who wish they had, and those who check that it is possible to restore data from the backup. I mean that lots of companies think they have backups, but they have never tried to restore data from them, and it is possible that their "backups" are not recoverable. Just try to restore data from your backup and you might be unpleasantly surprised.

    • @ToothlessSnakeable
      @ToothlessSnakeable 2 года назад +1

      @Telleva I have my stuff saved on icloud and Google photos

  • @pseudonymity0000
    @pseudonymity0000 2 года назад +180

    When backing up, always remember 3,2,1. 3 copies, 2 local, 1 remote.
    Another important thing not to forget: RAID is not a backup.

    • @ApolloPTT
      @ApolloPTT 2 года назад +16

      The most basic rule of sysadmins

    • @HermanIdzerda
      @HermanIdzerda 2 года назад +3

      This was just on my mind as well. 3-2-1 - I do it at home as well.

    • @pseudonymity0000
      @pseudonymity0000 2 года назад +13

      @L. Kärkkäinen You're right, However, this could be mitigated through tapes. They are actually ideal for this kind of data, as video files are sequential data files. Tape is also archival class, meaning they should not suffer from bit rot over time when stored properly.
      if they need old footage from years ago, they can grab the tape from archive, and it should seek and fetch the data off relatively quickly.
      Tape also solves the offline problem, as they should only be loaded when writing new data, or if you intend to retrieve it.

    • @MAxAMILLIoN757
      @MAxAMILLIoN757 2 года назад

      Why is raid not considered backup? I was considering using a 2 drive raid synology nas for my desktop files, and possibly copying that data to a cloud provider like wasabi as well. Is this not a good solution for “safely” storing my crap?

    • @Mychanel691
      @Mychanel691 2 года назад +3

      Yeah, tape is one of the best solution to offline data storage. It is "old" tech, but it does the job. For personal use i have cloud for archives, but for larger businesses a tape library is a nice touch. Only problem is the software, it can be high priced.

  • @aquarianage3953
    @aquarianage3953 2 года назад +3

    Thank you for sharing, Linus. This is a sobering heads-up video for all of us who seek future dealings with our own DIY servers. Peace.👍

  • @RachelMant
    @RachelMant 2 года назад +163

    The whole way through.. could not stop myself saying "shoulda got a small tape library and backed up to that" - it's very cost effective esp by comparison to the options presented at around 9:20. modern LTO tapes store tens of TB per tape and with LTTs connections, swinging a library, a couple of drives and a full suite of tapes should be no more expensive than a few months of cloud storage while not hurting the power bill - even our small library here at home consumes at most 350W total for the controller shell, expansion shell and all drives + gantry.

    • @Herlehy
      @Herlehy 2 года назад +51

      He also missed on the fact that things like Deep Archive at AWS are answers to this for around $1/TB-Month. Yes, you pay to retrieve it, but in reality, you are rarely ever going to. It is a vault of last resort.
      So it is doable for $1k-$2k a month. Time to do the cost-benefit analysis with more correct values vs the on prem tape vault.

    • @preevetElizabeth
      @preevetElizabeth 2 года назад +20

      Yea offline tape drives seem like the answer to this issue, can even have boxes of tape offsite

    • @Banzai51
      @Banzai51 2 года назад +4

      And recovery would be measured in years. For something this large, tape isn't practical.

    • @HughGordon
      @HughGordon 2 года назад +11

      @@Herlehy He already answered that question though. None of this is mission critical and RUclips is literally providing cloud backup for all the videos and they're paying his company to do it!

    • @SwervingLemon
      @SwervingLemon 2 года назад +4

      Our studies concluded that tape backup fails about 1/4 ÷ nT times, where nT is the number of tapes in the backup. If you recover a tape backup that involved more than 2 tapes, you're already at a coin flip.
      Tape is delicate and requires very careful storage to even work 3/4 of the time.
      Each additional tape adds another chance of failure.
      Petabytes of data on tape would take literally years to back up, years to recover and have a virtually 0% chance of recovery.
      You'd hope that Delta backups would make it more efficient, but they only complicate matters further, sadly.

  • @ihavekalashnikovyoudomath9275
    @ihavekalashnikovyoudomath9275 2 года назад +51

    I would love to see a follow up on this with how much data was saved, and how much was lost. What videos are only on RUclips, and how much they can still refer back to

  • @NoProHarrie
    @NoProHarrie 2 года назад +407

    Moral of this story: hire a IT specialist already Linus.

    • @InventorZahran
      @InventorZahran 2 года назад +17

      Linus: "I am the IT specialist."

    • @gorkskoal9315
      @gorkskoal9315 2 года назад +2

      ^^^^^^^^^^^^^^^^

    • @ticler
      @ticler 2 года назад +5

      They can very well afford midrange EMC or Netapp storages that will be more stable and may be as performant as these toy storages.

    • @Carcinogenic2
      @Carcinogenic2 2 года назад +3

      @@ticler
      They can rot as bad as the 'toy' storages do. It's enough that they don't get attention. And where would the many hours of fun content about it go?

    • @gorkskoal9315
      @gorkskoal9315 2 года назад

      @@ticler THANK YOU *HUG*

  • @acine2122
    @acine2122 2 года назад +5

    Anthony is more than a writer and IT person. He is the true face of LMG, and my hero.

  • @excellentswordfight8215
    @excellentswordfight8215 2 года назад +118

    Having such large raid groups (15 drives) without any hot spares or replacement routines with large drives seems rather dangerous as well. If you already have two dead drives in a vdev it's not that unlikely that you will lose a third during the resilver.
    Anyway, LTT's IT infrastructure has always been a bit of a dumpster fire, but maybe they do it intentionally cause it results in a lot of great content 😅
    I wonder if they have thought about connecting an 84 drive SAS expansion to their ssd tier and just have old data migrate to spinning drives (I think seagate has a rebranded dell box if they have a partnership with seagate).

  • @nermanus
    @nermanus 2 года назад +84

    LMG seems like a company where everybody does everything and that can work to a degree if you have just a couple of employees but it's a disaster when you have a bigger business to run.

    • @ericwhite265
      @ericwhite265 2 года назад +7

      @UnjustifiedRecs I don't understand how you easily lose track of a server that should be sending out notifications to someone that drives have died.

    • @sapier
      @sapier 2 года назад +2

      @@ericwhite265 very true. just about every commercial nas software has some notification system for when a drive goes down. you shouldn't have to audit the system to find that there are several that have failed.

  • @primesyndicate272
    @primesyndicate272 2 года назад +239

    "With great power comes great responsibility"
    Watch it Linus, we all know what happens to characters who say those cursed words.

    • @Dual_Ralle
      @Dual_Ralle 2 года назад +21

      With great comments comes great botsibilites.

    • @play2windemon208
      @play2windemon208 2 года назад +1

      Didn't know linus had a kid named Peter Parker

    • @bz3086
      @bz3086 2 года назад +2

      Prime,
      At least 3 pro-establishment bots were trained to oppose yours and my viewpoint.
      🤣

    • @johnmorgan6316
      @johnmorgan6316 2 года назад +6

      Anyone else seeing these annoying bots everywhere

  • @nevertakeadayoff
    @nevertakeadayoff 2 года назад

    i will refrain from saying anything negative because i appreciate your honesty.

  • @alanaktion
    @alanaktion 2 года назад +179

    I can’t even imagine building such massive storage servers and then never running a scrub or even manually checking the disks, wow. I have a relatively tiny home server with like 80 TB of storage and I run monthly scrubs, manually verify disks constantly, and make regular cold storage backups.

    • @DarksiderDude
      @DarksiderDude 2 года назад +64

      I've been working IT for the past 5 years, and never scrubbing our drives or verifying disks is unthinkable. LTT need to hire an actual IT guy, not just tech enthusiasts.

    • @drizzle8309
      @drizzle8309 2 года назад +16

      "Relatively tiny" my buttocks... Also, kind of a dick move bro. I think they realise they made a mistake.

    • @deViant14
      @deViant14 2 года назад +16

      @@drizzle8309 yeah a mistake like never checking your fire extinguishers still work...ever...on a 100 story building.

    • @drizzle8309
      @drizzle8309 2 года назад +15

      @@deViant14 lol did you watch the video? none of it was critical
      also, they would've made a video saying "we should've checked our fire extinguishers", to which OP then would've replied that he always checks the fire extinguisher in his "small $3M villa"...

    • @merma9042
      @merma9042 2 года назад +2

      @@deViant14 well in the situation your talking about lives are lost. So that comparison is a reach at best.

  • @joekenorer
    @joekenorer 2 года назад +743

    Linus: "We're not sure who's accountable here, so I'm considering hiring someone to be accountable because the situation is currently untenable without an appropriate system of blame in place."

    • @antiisocial
      @antiisocial 2 года назад +36

      Sounds like my company. Lol

    • @monkyyy0
      @monkyyy0 2 года назад +25

      .... yes
      Caring comes from being the person to blame

    • @saiyadulahmad2012
      @saiyadulahmad2012 2 года назад +14

      Said every CEO ever.

    • @acatch22
      @acatch22 2 года назад +24

      look up "Diffusion of responsibility" to understand what he truly meant.

    • @chloefletcher9612
      @chloefletcher9612 2 года назад +10

      Ahhh you've discovered the world of IT.

  • @danbell8536
    @danbell8536 2 года назад +10

    Very impressed with the honesty on this channel. I know plenty of IT folks who would never admit losing data. I run large ZFS storage arrays at my work. When my primary ZFS array is due for replacement (after moving data and workloads to a new array), I then create a Zpool on the old array configured for max capacity and sequential I/O. I then snap and replicate (zfs send/receive) the data on the primary array nightly to the old array. I don't need a ton of performance or redundancy on the old array as it only receives the changed blocks on each replication and is only used for Oh Sh!t moments. I also HIGHLY recommend you add mirrored "Special" devices to your Zpools. Special devices (man zpool) are used for storing metadata (use SSD/NVME) and removing those I/O's from your slower main Zpool drives. You will be amazed at the performance increase, I promise.

    • @shanemshort
      @shanemshort 2 года назад +2

      you have to be careful with those special devices though, if you happen to have them configured in a non-redundant way and they go away, you drop the entire pool.

    • @alessandrozigliani2615
      @alessandrozigliani2615 2 года назад

      @@shanemshort great advice from you both. I totally agree. But they forgot to scrub a 2PB array and its backup, letting them rot for years. I mean... guys, come on.

    • @mirror1766
      @mirror1766 2 года назад

      @@alessandrozigliani2615 There was a backup? It could be scrubbed?

    • @mirror1766
      @mirror1766 2 года назад

      man 7 zpoolconcepts should be where 'special' is hiding. POSIX compatibility, a COW filesystem, and magnetic media with its large seek times make for a less than ideal combination if performance matters.
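The nightly snapshot-and-replicate routine described at the top of this thread boils down to three ZFS commands. Below is a hedged Python sketch that only builds the argument lists so they can be inspected before anything runs. The dataset, snapshot, and target names are placeholders, and the helper name `replication_commands` is invented for this example.

```python
from typing import List, Tuple


def replication_commands(
    dataset: str, prev_snap: str, new_snap: str, target: str
) -> Tuple[List[str], List[str], List[str]]:
    """Build the argv lists for one incremental replication cycle.

    1. take a new snapshot of the source dataset
    2. send only the blocks changed since the previous snapshot
    3. receive that stream into the dataset on the old array
    """
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    receive = ["zfs", "receive", target]
    return snapshot, send, receive


# Placeholder names, purely for illustration.
snapshot, send, receive = replication_commands(
    "tank/archive", "nightly-01", "nightly-02", "oldarray/archive"
)
for argv in (snapshot, send, receive):
    print(" ".join(argv))
```

In practice the output of the send step is piped into the receive step, often over SSH to the other machine. Because only the changed blocks travel, the old array does not need much performance, which is the point made in the original comment.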

  • @emilemcgee6031
    @emilemcgee6031 2 года назад

    Honestly the first RUclipsr I regularly watched. And still do

  • @arwenevenstar9843
    @arwenevenstar9843 2 года назад +124

    One thing that was not included in the root cause analysis, is single ZFS vDev per pool. In the case of archival data, using a single vDev per pool and multiple pools per server, makes more sense. This would have potentially allowed more data recovery than multiple vDevs in a single pool. Write once data, (basically what archival is), means you can also fill the pool up higher. Perhaps even 95%.
    That said, of course not making the vDev too wide should also be mentioned. Meaning if you are going with single vDev per pool, don't use a vDev of 16 or more disks. The wider the vDev, the longer the RAID-Z2 re-build time since ZFS may have to read more disks per data block / stripe.
    Good luck.

    • @arwenevenstar9843
      @arwenevenstar9843 2 года назад +16

      Forgot to mention that with larger disks, (like 20TB), using fewer disks per vDev is also suggested. Having week long rebuilds, even with RAID-Z2's 2 disk parity is still pretty risky. So 10 to 12 disks maximum in a vDev, with single vDev per pool is probably optimal on a storage versus cost basis.
      Last, leaving a free disk slot in each server for replace in place is also a good idea. This allows you to replace a failing, but not yet failed, disk, with higher degree of safety than simply pulling the failing disk. Allowing ZFS to read data from the failing disk as well as the rest of the vDev to re-create the failing disk into the replacement disk. Thus, if there are other unknown errors, less chance of data loss. ZFS is one of the few RAID schemes that allows this functionality, (though probably more common today than when ZFS first came out). Of course, this does not help in the case of a completely failed disk, nor in some failing disk cases where it's in bad shape.

    • @racerex340
      @racerex340 2 года назад +6

      yeah, and triple-parity / mirroring for such large drives if you're going to be running on a home-brew system that you're not 100% confident that you'll be notified of errors. This is why enterprise storage and enterprise backup platforms exist.

    • @BassRacerx
      @BassRacerx 2 года назад +1

      If they can't hire a full time IT team to manage this it would make the most sense to contract a 3rd party to manage something like this. It's not "mission critical", it is not "top secret", it's just old already uploaded videos. It would make sense to hire an expert to set up and maintain this for 1k a month, which is about 1/5 or 1/6th of the cost of one full time person.

    • @KP3droflxp
      @KP3droflxp 2 года назад +3

      @BassRacerx but hiring someone wouldn’t get you content which is one of the main motivations of projects like this.

    • @arwenevenstar9843
      @arwenevenstar9843 2 года назад

      @@PBMS123 Good point.

  • @777arc2
    @777arc2 2 года назад +147

    Linus, when you did the cloud pricing calc you missed something- backblaze wasn't showing you the archival level rates (most likely on purpose), for example with Azure it's $0.001 per GB per month as long as you're OK with the delay of accessing archival level files. So more like $1k per month for 1PB which aint bad.

    • @SuperSmashDolls
      @SuperSmashDolls 2 года назад +24

      AWS S3 also has archival tiers that are competitive with tape, though retrieval fees will bite you in the ass if you don't plan around them.

    • @jayred8289
      @jayred8289 2 года назад +8

      Kind of what I was thinking that they should probably be using tape cold storage

    • @ryanjones8977
      @ryanjones8977 2 года назад +11

      I use S3 Glacier Deep Archive for my backups. Ends up being just a few pennies a month.

    • @aurelienlux
      @aurelienlux 2 года назад +7

      @@ryanjones8977 Deep Glacier is good as a last resort because it's so cheap but the retrieval cost is quite significant. So personally I use it as a backup of a backup.

    • @peterh7575
      @peterh7575 2 года назад +7

      @@SuperSmashDolls screw AWS/Amazon. they're horrible.
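The archival-tier arithmetic at the top of this thread is easy to check. A small Python sketch using the quoted $0.001 per GB per month and decimal units (1 TB = 1000 GB); retrieval and egress fees, which several replies warn about, are deliberately not modelled, and actual provider pricing should be checked rather than taken from a comment.

```python
def monthly_storage_cost(archive_tb: float, usd_per_gb_month: float) -> float:
    """Storage-only monthly cost in USD; no retrieval or egress fees."""
    return archive_tb * 1000 * usd_per_gb_month


archive_tb = 1000   # 1 PB
rate = 0.001        # USD per GB per month, the figure quoted in the thread

monthly = monthly_storage_cost(archive_tb, rate)
print(f"${monthly:,.0f} per month, ${monthly * 12:,.0f} per year")
```

That reproduces the roughly $1k per month for 1 PB quoted above, or about $12k per year before any data is ever read back.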

  • @kwerboom
    @kwerboom 2 years ago +37

    "The rule of two: One is none. Two is one. If it's important, you need a backup." - C. G. P. Grey

  • @ericd4mation
    @ericd4mation 2 years ago +2

    Thanks for pointing out needing to manually schedule a parity check!
    I've been using Unraid and I assumed that it would have scheduled _something_ by default. Nope. Parity hasn't been checked since I set it up in October.

  • @tommiaijala2732
    @tommiaijala2732 2 years ago +76

    Protip: Also set up proper monitoring of the system and hard drives so that you react immediately when even ONE drive fails.

    • @blowfly71
      @blowfly71 2 years ago +3

      This. And either constant notification of the error condition (an email every hour, etc.) and/or escalation to someone else if it's not resolved within a particular time frame. Oh, and have hot spares.

    • @jfolz
      @jfolz 2 years ago +7

      @@blowfly71 just have alerts mate. You don't want constant "everything is OK" messages, because you will start ignoring those real quick and miss the one that says it's no longer OK.

    • @Thomas0918273645
      @Thomas0918273645 2 years ago +2

      @@jfolz he wasn't talking about constant notifications, but ongoing reminders if an error has occurred but wasn't fixed yet. That way you can't miss the single notification of a failed drive.

    • @blowfly71
      @blowfly71 2 years ago +3

      @@jfolz That's what I meant. You have alerts that require action, and escalate if not resolved...

    • @jfolz
      @jfolz 2 years ago +2

      @@blowfly71 got it. Though sending constant messages does have a benefit: it's a canary for your monitoring ;)
      It's probably better to have monitoring that monitors the monitoring though.
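The policy this thread converges on (stay silent while healthy, repeat the alert while a fault is unresolved, escalate after a deadline) could be sketched roughly as below. This is an illustration only: the `Alert` class, the `next_action` function, and both time thresholds are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Optional

REMIND_EVERY_S = 3600        # hypothetical: repeat the alert hourly while unresolved
ESCALATE_AFTER_S = 4 * 3600  # hypothetical: involve someone else after four hours


@dataclass
class Alert:
    name: str
    raised_at: float                      # seconds, same clock as `now`
    last_sent_at: Optional[float] = None  # None until the first message goes out
    resolved: bool = False


def next_action(alert: Alert, now: float) -> str:
    """Return 'none', 'notify', or 'escalate' for a single alert."""
    if alert.resolved:
        return "none"  # no "everything is OK" messages
    due = alert.last_sent_at is None or now - alert.last_sent_at >= REMIND_EVERY_S
    if not due:
        return "none"  # a reminder was sent recently enough
    if now - alert.raised_at >= ESCALATE_AFTER_S:
        return "escalate"  # unresolved for too long, send to a second person
    return "notify"
```

The caller is expected to update `last_sent_at` after each message it actually sends. Nothing here verifies that the monitoring process itself is still running, which is the canary concern raised in the last reply.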

  • @sasidharasarma8625
    @sasidharasarma8625 2 years ago +150

    Team: Our data is gone
    Linus: So we got our content for today’s video

    • @schmitt00
      @schmitt00 2 years ago +4

      and quite a couple more

    • @JamezMartinez
      @JamezMartinez 2 years ago +2

      as long as they do not lose the data for that video too...

    • @ProTechShow
      @ProTechShow 2 years ago +1

      I do like this about LMG. I've been called in to help with several incidents of a similar nature and the level of stress as people see their livelihoods on the line can be pretty extreme. The fact that LMG can just make lemonade out of it is quite refreshing (pun not intended).

  • @TehDeejus
    @TehDeejus 2 years ago +58

    This is the issue with massive storage on single disks like 20TB: it takes forever to rebuild, and you're more likely to have another failure during the rebuild cycle. Also, you should always have some hot spares so it rebuilds automatically once a failure is detected instead of you doing it manually.

    • @JustinDavis90
      @JustinDavis90 2 years ago +1

      Declustered parity helps with this issue significantly by getting every disk in the array involved in rebuilding the lost data instead of a single parity disk or two.
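To put a rough number on "takes forever", here is a back-of-the-envelope sketch of the best-case rebuild time for one replacement drive. The 200 MB/s sustained write rate is an assumed figure, not a measurement from the video, and the function name is made up for illustration; rebuilds on a busy array typically run slower.

```python
def rebuild_hours(drive_tb: float, write_mb_per_s: float) -> float:
    """Best-case hours to fill one replacement drive at a constant write rate."""
    total_bytes = drive_tb * 1e12                   # decimal terabytes
    seconds = total_bytes / (write_mb_per_s * 1e6)  # decimal megabytes per second
    return seconds / 3600


# 20 TB drive, assumed 200 MB/s sustained write speed.
print(f"{rebuild_hours(20, 200):.1f} hours")
```

With those assumptions a single 20TB drive needs more than a day of uninterrupted writing, which is the window in which a second failure can occur.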

  • @verdantia
    @verdantia 2 years ago

    You and your bunch give us so much of yourselves, thank you for putting so much time and precision in all your work.