Our data is GONE... Again - Petabyte Project Recovery Part 1

  • Published: Oct 3, 2024
  • Configure your own workstation at lambdalabs.com...
    Check out Hetzner Cloud and use code LTT22 for $20 off at linustechtips.h...
    It's been a long time since we've had any serious data loss, but on this episode, we're discussing a software misconfiguration that has resulted in us losing an unknown amount of data on our petabyte project storage clusters.
    Discuss on the forum: linustechtips....
    Check out 45Drives at the links below
    Website: lmg.gg/eGo2K
    YouTube: lmg.gg/6ModQ
    Buy Seagate 20TB Exos Drives
    On Amazon: geni.us/WFfs
    On Newegg: geni.us/XhkNI
    Purchases made through some store links may provide some compensation to Linus Media Group.
    ► GET MERCH: lttstore.com
    ► AFFILIATES, SPONSORS & REFERRALS: lmg.gg/sponsors
    ► PODCAST GEAR: lmg.gg/podcast...
    ► SUPPORT US ON FLOATPLANE: www.floatplane...
    FOLLOW US ELSEWHERE
    ---------------------------------------------------
    Twitter: / linustech
    Facebook: / linustech
    Instagram: / linustech
    TikTok: / linustech
    Twitch: / linustech
    MUSIC CREDIT
    ---------------------------------------------------
    Intro: Laszlo - Supernova
    Video Link: • [Electro] - Laszlo - S...
    iTunes Download Link: itunes.apple.c...
    Artist Link: / laszlomusic
    Outro: Approaching Nirvana - Sugar High
    Video Link: • Sugar High - Approachi...
    Listen on Spotify: spoti.fi/UxWkUw
    Artist Link: / approachingnirvana
    Intro animation by MBarek Abdelwassaa / mbarek_abdel
    Monitor And Keyboard by vadimmihalkevich / CC BY 4.0 geni.us/PgGWp
    Mechanical RGB Keyboard by BigBrotherECE / CC BY 4.0 geni.us/mj6pHk4
    Mouse Gamer free Model By Oscar Creativo / CC BY 4.0 geni.us/Ps3XfE
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro

Comments • 8K

  • @The_Keeper
    @The_Keeper 2 years ago +11812

    Linus: "Right, *Now* we won't ever lose data again!"
    Data storage: "How many time do we have to teach you this lesson, old man?"

    • @limemason
      @limemason 2 years ago +149

      @Nimki rafa 8 What the fuck?

    • @FelipeGutierrez-me9th
      @FelipeGutierrez-me9th 2 years ago +25

      More like power outages😂

    • @treborrrrr
      @treborrrrr 2 years ago +114

      @@limemason Spam bots, they reply to every comment automatically. Just report and move on.

    • @markaged
      @markaged 2 years ago +1

      *times

    • @defeatSpace
      @defeatSpace 2 years ago +27

      @@limemason Only for fans over 18 years old, where's the confusion coming from?

  • @markclayton8977
    @markclayton8977 2 years ago +4054

    The irony of a cloud storage provider sponsoring this segment is not lost on Linus. I like that.

    • @Electrex8
      @Electrex8 2 years ago +147

      The most amazing part is a backup provider also sponsored the first video on losing their data. Incredible timing.

    • @Time4Technology
      @Time4Technology 2 years ago +184

      @@Electrex8 "Hi we would like to sponsor your next data loss video, can you put us on your waiting list?"

    • @legominimovieproductions
      @legominimovieproductions 2 years ago +20

      I mean, backing up a petabyte of stuff on a cloud provider is so fucking expensive, and you need to pay huge amounts for bandwidth (even with 200MBps it will take forever), so it's not a realistic option.

    • @ZerotheWanderer
      @ZerotheWanderer 2 years ago +5

      @@legominimovieproductions If they built the drive and sent it to the host already loaded/backed up/ready to go, I wonder what the service would run.

    • @pyjama9556
      @pyjama9556 2 years ago +1

      Generously negotiated for future f***ups no doubt!!
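
The "it will take forever" point in this thread can be checked with quick arithmetic. Below is a minimal sketch, assuming a decimal petabyte (10^15 bytes) and a perfectly sustained link; the thread does not say whether "200MBps" means megabytes or megabits per second, so both readings are shown. The function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the days needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


PETABYTE = 10**15  # decimal petabyte (assumption; the thread gives no exact size)

# "200MBps" read literally as 200 megabytes per second.
print(f"{transfer_days(PETABYTE, 200e6):.1f} days at 200 MB/s")
# The same figure read as 200 megabits per second (25 MB/s).
print(f"{transfer_days(PETABYTE, 200e6 / 8):.1f} days at 200 Mbit/s")
```

Under these assumptions the upload takes roughly 58 days at 200 MB/s and about 463 days at 200 Mbit/s, before any protocol overhead or interruptions.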

  • @Lmpy
    @Lmpy 2 years ago +2861

    LTT never ceases to amaze me on how professional and unprofessional they actually are at the same time.

    • @hambo76
      @hambo76 2 years ago +256

      You just described every corporation and Government in the world.

    • @HiddenChin
      @HiddenChin 2 years ago +25

      Do as I say, not as I do.

    • @forresthopkinsa
      @forresthopkinsa 2 years ago +20

      Definitely. But minus the professional part.

    • @bubbaandy89
      @bubbaandy89 2 years ago +56

      Right? I've worked in infrastructure for years, the consumer end videos are awesome and insightful, but the server/infrastructure videos frustrate me so much sometimes...

    • @paulb4334
      @paulb4334 2 years ago +17

      Yet 100% entertaining which is the only metric by which to value an entertainment business ;)

  • @ashleymc5599
    @ashleymc5599 2 years ago +956

    "We never hired a full-time IT person" was stated and I immediately had the urge to bust out the popcorn and look at IT pros in the comment section.

    • @onceuponaban
      @onceuponaban 2 years ago +41

      To be fair many of the LMG staff do qualify as IT pros in skill, if not in formal credentials.

    • @fulcrum7082
      @fulcrum7082 2 years ago +91

      @@onceuponaban No. Just no.
      The constant fuckups show that they are not.

    • @applepie9806
      @applepie9806 2 years ago +16

      The funniest thing is the next two comments under this are from the IT pros.

    • @fulcrum7082
      @fulcrum7082 2 years ago +50

      @@applepie9806 Been in IT for 10 years: worked as an infrastructure engineer for a hospital, technical lead for an MSP supporting SMEs, and finally solutions architect for a £250m company. I'm also a freelance consultant. I love LTT's vids; I normally just have them on in the background whilst I'm working. They do make some big mistakes, but it's all part of the drama :L

    • @SirNarax
      @SirNarax 2 years ago +3

      I am a bit of an IT professional myself.

  • @HAWKF305
    @HAWKF305 2 years ago +4835

    Linus: Hates how USB and HDMI are being named.
    Also Linus: New new new vault

    • @mjtt12
      @mjtt12 2 years ago +204

      If you can't beat them, join them.

    • @robertt9342
      @robertt9342 2 years ago +271

      Well it’s pretty clear. It’s not like it’s named new old vault.

    • @marqs37
      @marqs37 2 years ago +129

      @@robertt9342 Don't give him ideas.

    • @adamjurak708
      @adamjurak708 2 years ago +16

      @@robertt9342 That was my first thought when he talked about reusing the old vault. It would be a new vault built from the old vault. So... New Old Vault [NOV for short]

    • @Pico2199
      @Pico2199 2 years ago +47

      At least new vault and new new vault aren't being renamed vault 2.0 and vault 2.0 + new

  • @TJ-vh2ps
    @TJ-vh2ps 2 years ago +611

    Postmortem reports like this are hugely valuable, but companies don’t usually share them. This is a great service to the community.

    • @AegisHyperon
      @AegisHyperon 2 years ago +9

      Because companies don't let their storage get to this situation

    • @MajesticBlueFalcon
      @MajesticBlueFalcon 2 years ago +5

      @@AegisHyperon Exactly. Companies from the get-go have an official IT dept. or outsource it to a competent MSP.

    • @LG1ikLx
      @LG1ikLx 2 years ago +18

      @@MajesticBlueFalcon you would be surprised how many companies mess up. What about if the IT dept didn't do their job properly and skipped over certain things in order to save time?

    • @jdatlas4668
      @jdatlas4668 2 years ago +7

      @@AegisHyperon not true. I do sysadmin for small and midsize businesses, and you wouldn't believe the kinds of things I've had to take over. Usually it's either some guy who does something else at the company and thinks he knows stuff but doesn't, or the work of some usually very mediocre external company.

    • @johngangemi1361
      @johngangemi1361 2 years ago

      @@AegisHyperon oh yes they do.

  • @shwolverine2300
    @shwolverine2300 2 years ago +2611

    Linus: "the way they name HDMI generations is so confusing"
    also Linus: "we move the data from the old vault to new new vault and then name the old vault new new new vault with a bit of an upgrade"

    • @jacobleeson4763
      @jacobleeson4763 2 years ago +12

      Lmao

    • @treescompany3462
      @treescompany3462 2 years ago +26

      I hope they pin this

    • @fazz010s
      @fazz010s 2 years ago +3

      Ayyyyy

    • @shawno8253
      @shawno8253 2 years ago +28

      It's the new 14nm +++++++++++

    • @davidyu1813
      @davidyu1813 2 years ago +24

      USB: this guy seems to know his stuff. let's just learn how to name things from him.

  • @OnlineWerds
    @OnlineWerds 2 years ago +145

    As a data center engineer your storage content is my favorite content. I'm terribly sorry for your issues here.

  • @leodoz1016
    @leodoz1016 2 years ago +1047

    Alternate title: The LMG group MIGHT hire an actual IT person

    • @callowaymotorcompany
      @callowaymotorcompany 2 years ago +92

      Linus Media Group group

    • @leodoz1016
      @leodoz1016 2 years ago +18

      @@callowaymotorcompany yes

    • @Achilleaa
      @Achilleaa 2 years ago +24

      Linus LMG group might hire an IT person for new new vault

    • @frozenturbo8623
      @frozenturbo8623 2 years ago +2

      @@Achilleaa With Seagate again for another Vault.

    • @DevamBansal
      @DevamBansal 2 years ago +2

      @@callowaymotorcompany LMGG group?

  • @Cluesman
    @Cluesman 2 years ago +626

    "a lot of power outages" + "transferring that much data might take months" sounds like a recipe for another video in this series.

    • @Carcinogenic2
      @Carcinogenic2 2 years ago +19

      Yeah, on how bad a power grid can be and how important a UPS becomes in such situations.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +4

      I'll hazard a guess that they keep blowing a fuse and don't have a generator for the building or a UPS for the servers.

    • @gorkskoal9315
      @gorkskoal9315 2 years ago +1

      LOL I can see the title: Ever try to back up a few sextibytes? Or even just a few exabytes? No? Well, funny thing happened...
      Or "This is awkward...newcubed16 ...."
      Please tell me they have fiber to the new vault and aren't trying to do this over a normal connection.

    • @namAehT
      @namAehT 2 years ago +15

      @@gorkskoal9315 They do have a UPS for their server room, but for a few months they didn't because their UPS caught fire. Also, it sounds like they never configured the servers to shut down _safely_ when the UPS was running low; instead the UPS ran out of power and the servers got their plug pulled.

    • @larrylentini5688
      @larrylentini5688 2 years ago +4

      Natural gas backup generators aren't very expensive relative to petabytes of hard drives, they should probably invest in one.

  • @KoSiNeK
    @KoSiNeK 2 years ago +821

    I don't know why, but "server issues" episodes are my favourite LTT videos. Content like this just doesn't exist anywhere else.

    • @rfitzgerald2004
      @rfitzgerald2004 2 years ago +47

      That's what I like too, there's only so many gaming hardware reviews I can stand to watch, they're all much of a muchness to me, but I really enjoy the infrastructure and unusual project videos the most

    • @ryanq.4799
      @ryanq.4799 2 years ago +35

      IMO It feels more real, and a lot like old LTT did, just overall more entertaining to watch than the usual formula

    • @noxious8
      @noxious8 2 years ago +14

      Me too. One of the first LTT videos I watched was the one years ago where Linus, Anthony and Jake were doing stuff in the server room on the weekend.

    • @UpSideDownTech
      @UpSideDownTech 2 years ago +8

      Right?! The Whonnock Server died video is one of my favorites to watch! I have no idea why, but I just like watching it for some reason😂

    • @Mesmaroth_
      @Mesmaroth_ 2 years ago +11

      Check out Craft Computing if you like home lab server videos. Techno Tim as well for homelab hosting tutorials.

  • @ulbuilder
    @ulbuilder 2 years ago +381

    Your backups must be tested
    So you know they work as expected
    Offline is best
    So you can rest
    When lightning strikes unexpected

  • @lucasmenchone2826
    @lucasmenchone2826 2 years ago +713

    HR meeting with Linus: “All our data has been lost, i’m gonna fire someone…
    But not before i fire up our segway to our sponsor…”

    • @cogYo
      @cogYo 2 years ago +5

      🤣🤣🤣🤣

    • @klaasmuller9663
      @klaasmuller9663 2 years ago +17

      *Segue

    • @DailyCorvid
      @DailyCorvid 2 years ago +1

      Linus is the only person whose adverts I enjoy. Angry Joe started putting tonnes of effort into his, but they are so forced!! I think Linus actually gets a laugh-kick out of saying LTTSTORE where it's crowbarred into something lol. I know I do, but not as much kick as the coffee in this LTTSTORE FLASK WILL HAVE.
      Linus dude, over all the years I have watched you I don't think I ever credited you properly. Well done man, this thing you've all created is really cool :)

    • @UncleKennysPlace
      @UncleKennysPlace 2 years ago +5

      @@Avendesora Except, of course, it's totally wrong to use one of those words for the other. Unless your server room is so large that you must use a Segway to get to the sponsor.

    • @SuperNGLP
      @SuperNGLP 2 years ago +1

      Gotta make up for that loss of money somehow.

  • @obedulloa6219
    @obedulloa6219 2 years ago +3268

    If Linus manages his data the way he manages hardware... it's no surprise the data dropped

  • @loadnabox1943
    @loadnabox1943 2 years ago +2143

    Linus, I have over a decade of experience managing multi-petabyte ZFS with five nines uptime at large ISPs. I think you may have the wrong cause of the data loss, and the data may not (MAY NOT) be as lost as you think.
    Please reach out to me

    • @JosephGamacheKD0AHS
      @JosephGamacheKD0AHS 2 years ago +151

      Upvoting this to get it seen.

    • @B0A2
      @B0A2 2 years ago +87

      Tweet at him

    • @Jumalten001
      @Jumalten001 2 years ago +17

      No you dont

    • @philb5593
      @philb5593 2 years ago +173

      I would recommend that you reach out to them as well.
      Linus does read a lot of comments, but YouTube isn’t a good way to get a response.

    • @lordelliott42
      @lordelliott42 2 years ago +109

      Email their business email address.

  • @LabGecko
    @LabGecko 2 years ago +241

    Tech Tips' data loss is due to one thing - quantum variability. :D
    The data was in a state of flux until someone audited, at which point it was forced to exist or not exist. Some were observed to be the latter.

    • @HilbertXVI
      @HilbertXVI 2 years ago +1

      Tf are you on about?

    • @LabGecko
      @LabGecko 2 years ago +52

      @@HilbertXVI if you don't like quantum jokes then I'm half-certain there is a dimension on which you didn't comment.

    • @omary5439
      @omary5439 2 years ago +27

      Schrodinger's hard drive?

    • @ThomasGroshong
      @ThomasGroshong 2 years ago +1

      😂

    • @malcomoguji3910
      @malcomoguji3910 2 months ago +1

      To be or not to be 😹😹😹

  • @DarrynJones
    @DarrynJones 2 years ago +1840

    "I'm the highest ranking person in the company, the highest ranking person in the IT team, and the person who decided not to hire a dedicated IT staff. There is no way to determine who's accountable here" - Linus 2022

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +100

      Bet he still might not hire one since he 'learned his lesson'. Oh well live and learn!

    • @Dimmers
      @Dimmers 2 years ago +16

      @Connor Williams but by that logic it means he will fix what they failed at and not for anything that may arise. If they don't have a full time or part time IT person then the same or similar issues are doomed to happen again

    • @connorwilliams9285
      @connorwilliams9285 2 years ago +10

      @@Dimmers that's my point, hopefully we see a video posted asking for applications soon so this doesn't happen again!

    • @KP3droflxp
      @KP3droflxp 2 years ago +13

      @Connor Williams it would be quite dumb for them to hire an IT specialist because a good portion of their content is working on their own IT systems.

    • @RalphInRalphWorld
      @RalphInRalphWorld 2 years ago +49

      @@KP3droflxp they need an IT specialist to schedule and perform regular preventative maintenance. Otherwise, their team will just fix things when they break like this video.

  • @billhollinshead
    @billhollinshead 2 years ago +535

    A data *recovery* policy abides with this: "The only 'known-good backup' is one that you *have* successfully restored." 😀

    • @klaernie
      @klaernie 2 years ago +12

      There is even the question, if the old Premiere projects are still loadable in current software versions..

    • @khatarin
      @khatarin 2 years ago +12

      Former Data Protection Product Manager here for some 30-40k servers at my old job: Yes. :)

    • @feesh9977
      @feesh9977 2 years ago

      Eeplwllwlwl

    • @ProTechShow
      @ProTechShow 2 years ago +1

      This is the way

    • @DAndyLord
      @DAndyLord 2 years ago +7

      When discussing redundancy, one is none, two is one. That's how I discuss backup options with my clients. If it's mission critical you need a layered backup system.
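
The "only a restored backup is a known-good backup" rule in this thread can be turned into a routine check. The following is a minimal sketch, assuming the backup has already been restored into a scratch directory by whatever tool is in use; `tree_digests` and `verify_restore` are hypothetical helper names, not part of any backup product.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large video files fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> list[str]:
    """Return source files that are missing or different in the restore."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return sorted(p for p in src if dst.get(p) != src[p])
```

An empty list from `verify_restore(Path("/data/projects"), Path("/mnt/restore-test"))` (both paths are placeholders) means every source file came back byte-for-byte; anything else names the files that did not.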

  • @paulbrooks4395
    @paulbrooks4395 2 years ago +66

    I worked for an MSP where they had fired the previous person in charge of backups. I was on the infrastructure team. We found that 65% of our customer backups were no good and something like 85-90% of offsite replication was failing. It was 8 months before we could return all the backups to normal and reduce the backup-check workflow to less than a few hours per week. During those 8 months, I spent the first 4 working almost every hour of my workday to get the backups straightened out.
    Suffice to say, having an ops team with competent people who are organized and themselves redundant and able to check each other’s work without judgement is absolutely paramount for a team in charge of critical systems.
    I personally love working on backups because it’s a silent way to ensure continuity while working with amazing technologies.

    • @capps1994
      @capps1994 2 years ago +5

      As someone in IT, I know the pain. One thing I go by is that you don't have a valid backup unless you have tested it. I've had times (granted, back in the day, 8-9 years ago) where the software would say it's a good backup, but god forbid you need to restore, as it would just fail. Very fun times, they were.

    • @Phoen1x883
      @Phoen1x883 2 years ago +3

      Good worker! Providing billable service with no ongoing expenses like "maintenance" or "checking the backups".
      -most MSP management, probably

    • @aravindpallippara1577
      @aravindpallippara1577 2 years ago

      @@Phoen1x883 Well, if you can do something on company time to help the company's bottom line... you should?
      There are things like loyalty and goodwill even in corporations.

  • @cromulence
    @cromulence 2 years ago +20

    I’m responsible for our SANs at work and there’s something else that wasn’t touched on in this video: make sure you configure email reporting from your storage nodes! The sooner you’re notified about issues, the sooner corrective action can be taken. Additionally, if possible, keep hardware spares at each site where the hardware is, so if a drive has failed (or even if it’s in a predictive failure state), you can swap a new drive in ASAP. The same goes for other hardware, such as controller cache batteries; these too can fail, and can do so silently, allowing the node to continue working, but with degraded performance.
    TL;DR - Keep an eye on your infrastructure and monitor it!

    • @Nickwilde7755
      @Nickwilde7755 1 year ago

      This. If they had been notified from the first drive, this most likely would've been prevented

  • @SimonPoirier
    @SimonPoirier 2 years ago +944

    Other pro tip: if building such large-scale storage, make sure your disks are from different manufacturing batches. Imagine the nightmare of having disks with consecutive serials wearing out and failing almost at the same time.

    • @lostintechnology1851
      @lostintechnology1851 2 years ago +102

      Or they could just buy a professional backup solution and get proper training operating it plus a maintenance contract. You know the way every real enterprise would do it :D

    • @entelin
      @entelin 2 years ago +168

      @@lostintechnology1851 It's a different situation, he said this is non essential archival footage, the creation of these servers created content, the failure of it created content, and yeah, backing that stuff up would cost a lot of money... so risk/reward. The best option isn't necessarily always the right option.

    • @gamingbud926
      @gamingbud926 2 years ago +6

      That is... a pretty smart idea.

    • @lupsik1
      @lupsik1 2 years ago +4

      @@entelin Didn't watch the video yet, but it sounds like something that RAID 5 would solve instantly, and it would cost them barely any storage with that many hard drives.

    • @wrash
      @wrash 2 years ago +6

      Yuuup.
      Had 3 out of 7 Seagate drives in a preconfigured server die within a month. Only some data was lost, fun times.

  • @andrewnotmyrealname7827
    @andrewnotmyrealname7827 2 years ago +735

    All techs: "Follow this advice!"
    Those same techs: "YOLO"

    • @placate9051
      @placate9051 2 years ago +45

      Ay gotta know the rules before you break them

    • @K-----
      @K----- 2 years ago +3

      To be fair it's more, follow this advice if X and then the same techs don't really have X. He basically said that at 9:27

    • @snowysysadmin59
      @snowysysadmin59 2 years ago +4

      Ok, but we all know Linus has said before, "do as I say, not as I do"

    • @typerightseesight
      @typerightseesight 2 years ago

      DO WORK!

  • @technogamer18
    @technogamer18 2 years ago +915

    “This caused the array to offline itself to prevent further degradation”
    …Been there, array. Been there.

  • @karenwang313
    @karenwang313 2 years ago

    Mad props for coming out and saying you guys screwed up. All of us can learn from this and hopefully not lose any data of our own.

  • @viridisdraco
    @viridisdraco 2 years ago +229

    Linus, I used to tell my loved ones, "There are two kinds of people in the world: those who have a backup and those who wish they had." I used to work in storage rack support and I've seen the worst of the worst, including a 24-hour straight marathon to restore a super critical one. But I've also seen a storage rack with all its capacitors blown by a lightning strike that fried a little unprotected datacenter.
    so... are you hiring an IT fulltime person now? :P

    • @JoeBlow-ub1us
      @JoeBlow-ub1us 2 years ago +25

      lol this guy is like, "Where do I send my resume?"

    • @calebdevore3395
      @calebdevore3395 2 years ago +11

      @Telleva You deleted their data, and blamed them for not backing it up..?

    • @AlexAlex-jk2tn
      @AlexAlex-jk2tn 2 years ago +5

      Actually, there are three kinds of people in the world: those who have a backup, those who wish they had, and those who check that it is possible to restore data from the backup. Lots of companies think they have backups, but they haven't actually tried to restore data from them, and it is possible that their "backups" are not recoverable. Just try to restore data from your backup and you might be unpleasantly surprised.

    • @ToothlessSnakeable
      @ToothlessSnakeable 2 years ago +1

      @Telleva I have my stuff saved on icloud and Google photos

  • @makingtechsense126
    @makingtechsense126 2 years ago +590

    Tape (LTO-9) is still an affordable option for backups. Especially for data that doesn't change. Yeah, it's old tech but it still works.

    • @mkastelovic
      @mkastelovic 2 years ago +59

      Yep, completely agree with you. A tape library with LTO-9 tapes will be much safer. And it isn't as slow as people think. :)

    • @jspafford
      @jspafford 2 years ago +28

      @@mkastelovic 250-300MBps. And they have WORM tapes. And using a dual-drive tape robot makes backups completely automated. Restores too. Backing up to individual LTO drives and having to load tape after tape is too much labor. Backups will never get done.

    • @unlink1649
      @unlink1649 2 years ago +71

      Modern tape storage has INSANE capacity. We are talking 32 petabytes per rack. ETERNUS DX600 S5 is one such system.

    • @mkastelovic
      @mkastelovic 2 years ago +14

      @@jspafford Well, if you have the library, the backup is done automatically. Plus, in their case, we are talking about incremental backups, where most of the old videos don't change at all ;), so the backup will be done during the night.

    • @jojojojo4332
      @jojojojo4332 2 years ago +7

      I agree with all of you, except for one thing. Linus has said that a lot of this data isn't that important, meaning that buying a tape robot would be quite an expensive investment. Maybe not even worth the trouble.
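
To put rough numbers on the tape discussion above, here is a minimal sketch, assuming a decimal petabyte, LTO-9's native (uncompressed) capacity of 18 TB per cartridge, and the 300 MB/s upper figure quoted in the thread. Video footage rarely compresses further, so the native capacity is used. The function names are illustrative only.

```python
def tapes_needed(data_bytes: int, tape_bytes: int) -> int:
    """Number of cartridges needed, rounding up to whole tapes."""
    return -(-data_bytes // tape_bytes)  # integer ceiling division


def write_days(data_bytes: float, rate_bytes_per_s: float, drives: int = 1) -> float:
    """Days of continuous writing across `drives` drives running in parallel."""
    return data_bytes / (rate_bytes_per_s * drives) / 86_400


PETABYTE = 10**15            # decimal petabyte (assumption)
LTO9_NATIVE = 18 * 10**12    # 18 TB native capacity per LTO-9 cartridge

print(tapes_needed(PETABYTE, LTO9_NATIVE))  # 56
print(f"{write_days(PETABYTE, 300e6, drives=2):.1f} days with two drives")
```

The estimate ignores tape changes, verification passes, and any slowdown from the source array, so real runs would take longer.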

  • @QualityDoggo
    @QualityDoggo 2 years ago +855

    Just hearing "never hired a full time IT person" makes me go "uh oh... I don't like where this is going..." a good sysadmin who can help protect systems is a valuable part of any modern company

    • @danielgomez7236
      @danielgomez7236 2 years ago +153

      The world's biggest IT youtube channel, there's no IT guy

    • @darthkarl99
      @darthkarl99 2 years ago +177

      Classic case of responsibility creep. As Linus and others have become responsible for more stuff as the company has grown their ability to handle routine IT maintenance duties has dropped off, and because it's happened slowly over time it's never quite shown up on anyone's radars as a matter of concern.

    • @uwirl4338
      @uwirl4338 2 years ago +23

      Yeah, because just so you know, only other sysadmins value sysadmins. It's an extremely simple job, so the rest of us think we can do it, and we sure can until our real job prevents us. If only we could teach monkeys a couple of bash commands and have them be sysadmins for a couple dozen bananas.

    • @chrismcveigh4498
      @chrismcveigh4498 2 years ago +38

      As a sysadmin/sysengineer, unfortunately these guys although knowledgeable, aren’t professionals and works doesn’t always mean works properly :/

    • @Habdabi
      @Habdabi 2 years ago +5

      That's why the sys admin job is dying out and most mid sized companies pay less to move it to cloud based systems that are more reliable (for now, until the price gets hiked)

  • @acine2122
    @acine2122 2 years ago +5

    Anthony is more than a writer and IT person. He is the true face of LMG, and my hero.

  • @brodur
    @brodur 2 years ago +331

    I am very interested to see how the recovery process goes. As someone who has only ever done disaster recovery in the realm of terabytes... yikes. Good luck friends.

    • @FireWyvern870
      @FireWyvern870 2 years ago +20

      Damn, these bots
      #YouTubeKilledTrustedFlagging

    • @theluigifan42
      @theluigifan42 2 years ago +2

      these bots out here calling youngboy "extravagant"

    • @leexgx
      @leexgx 2 years ago +1

      What I don't understand is why auto mod isn't capturing them (whenever I post a link, 90% of the time my post gets auto-modded and disappears).

    • @FireWyvern870
      @FireWyvern870 2 years ago +1

      @@marcogenovesi8570 both are problems. One is not higher than the other.

  • @OfficialSamuelC
    @OfficialSamuelC 2 years ago +556

    I feel Jake holds a lot more of LTT together with his expertise than we think. Underrated!

    • @riks.1773
      @riks.1773 2 years ago +40

      Fact he takes the time to actually look and uncover this is enough to be praised employee of the month

    • @romanbaranovichi5375
      @romanbaranovichi5375 2 years ago +20

      It also helps that he's worked there from when they were getting serious about their data storage, so he knows the reasoning behind why the things are set up the way they are

    • @kstenders
      @kstenders 2 years ago +6

      @@riks.1773 Usually you set up monitoring with alerting to check the health of your storage.

    • @riks.1773
      @riks.1773 2 years ago +2

      @@kstenders Yes, but I never assumed they configured that... because I've seen other simple things get overlooked.

    • @VanlockFR
      @VanlockFR 2 years ago +3

      @@riks.1773 as Linus explained, it's routine checks that they should have been doing monthly, for years. AND they didn't set any email alerts so they never got notified of the failures !

  • @Kblender798
    @Kblender798 2 years ago +125

    Please adopt LTO tape backups into your workflow! It's indispensable as a deep storage solution, especially within my field of work (film industry).

  • @brice0403
    @brice0403 1 year ago +11

    When Linus says that something is "nobody's fault" it usually means it was his fault 😂

  • @Jordan_C_Wilde
    @Jordan_C_Wilde 2 years ago +115

    "We lost a sh*tload of video data, lets make an educational video about it" - Most Linus thing ever

  • @JeffGeerling
    @JeffGeerling 2 года назад +572

    I think we all have a lot of cases of 'didn't follow our own advice' in the storage/DR world. Unless it affects your bottom line, backups and DR tend to be lower on the priority list.
    And lower on the priority list usually means either "not configured at all" or at minimum "never been tested before" :(

    • @SodaWithoutSparkles
      @SodaWithoutSparkles 2 года назад +4

      Always test your backup and fail-safe. There is no use in having a backup if it doesn't work at all.
      Don't just do backups, TEST your backups

    • @Deerhunter360
      @Deerhunter360 2 года назад +6

      @Nimki rafa 8 shut up bot

    • @lostphotographs3936
      @lostphotographs3936 2 года назад

      As a fellow Repair and Recovery guy in the SS world we sell hundreds of drives globally to guys in that very situation. TRUST ME !
      new new vault...... " vault 3 " ..... 😇

    • @ImAManMann
      @ImAManMann 2 года назад +1

      I always follow my advice for backing up data because there is a simple rule... if you back up your data, you won't need the backup, if you don't back up your data you WILL need the backup.

    • @waspennator
      @waspennator 2 года назад +1

      Backups and UPS should be essentials at this point, lost drives on my old comp cause I had the "bright" idea to use it in the middle of a bad wind storm with only a surge protector.

  • @GTRShaun
    @GTRShaun 2 года назад +223

    In the takeaways at the end of the video, there was no mention of monitoring. If zfs zed was configured to email somebody/service desk on events like drive failure, this disaster could have been averted by replacing failing drives one at a time as they failed instead of accidentally finding the house of cards your enterprise is built on. Monitoring for failure should have been the most prominent takeaway.

    • @davidbubble6863
      @davidbubble6863 2 года назад +8

      My takeaway is that no system is safe from hard drive failure, and the owner of a system this big should hire someone dedicated to take care of it.

    • @yensteel
      @yensteel 2 года назад +15

      Thought it was weird too. An email as soon as one drive fails could reduce response time. The number of drives they are handling meant the chances of 2 or more failing at the same time is pretty high.
      What about reserve drives to automatically repair when one degrades? Not foolproof but a good start. For bit rot, more frequent scrubbing?

    • @glenby2u
      @glenby2u 2 года назад +3

      even a post power outage check or weekly job for an intern... oh well. once is a mistake, twice is a problem, thrice = low value asset.

    • @rosen9425
      @rosen9425 2 года назад

      My thoughts too. File it under "mistakes were made", it's the big locker you can't miss 😁

    • @NumptyMcNumptyface
      @NumptyMcNumptyface 2 года назад +5

      Not just configure it, also test that configuration. I've worked at a place where the storage system was set up to send an email in case of pending doom. Problem was it wasn't configured correctly so the emails never reached their recipient.
      How did they find out about the impending doom? Well, the system also gave off a sound alert as well as flashing an LED, which were only noticed when I was given a tour of the server room.
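
Editor's sketch of the health check this thread is asking for. It is not what LTT runs; it is a minimal example that assumes the pools are ZFS and that the tab-separated output of `zpool list -H -o name,health` is available. The function names and the sample pool names (`vault1`, `vault2`) are hypothetical.

```python
import subprocess


def read_pool_health() -> str:
    """Run `zpool list` in script mode (-H: no header, tab-separated)."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def unhealthy_pools(zpool_list_output: str) -> list[tuple[str, str]]:
    """Return (pool, health) pairs for every pool that is not ONLINE."""
    problems = []
    for line in zpool_list_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or unexpected lines
        # Split on the last tab only, so unusual pool names stay intact.
        name, health = line.rsplit("\t", 1)
        if health.strip() != "ONLINE":
            problems.append((name, health.strip()))
    return problems


if __name__ == "__main__":
    # Hypothetical sample; on a real host use read_pool_health() instead.
    sample = "vault1\tONLINE\nvault2\tDEGRADED\n"
    for name, health in unhealthy_pools(sample):
        print(f"ALERT: pool {name} is {health}")
```

Whatever sends the alert should itself be tested end to end, which is the point of the story above: a check that fires into a mailbox nobody receives is no better than no check.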

  • @ericd4mation
    @ericd4mation 2 года назад +2

    Thanks for pointing out needing to manually schedule a parity check!
    I've been using Unraid and I assumed that it would have scheduled _something_ by default. Nope. Parity hasn't been checked since I set it up in October.

  • @jstadler
    @jstadler 2 года назад +358

    As a full time Sysadmin i always wondered how you guys sustained your data without a real backup plan. As it turns out now, you didn't. Really sorry to hear that guys!
    That's exactly why people like me get hired. Companies think they can do it on their own until they lose critical data to misconfigs and missing maintenance. Hurts to learn it the hard way.
    I really recommend you guys to create offline backups to tape storage for all your archived content.
    And respect for admitting having it done wrong so others can learn!
    Keep on making such great content!

    • @heavyq
      @heavyq 2 года назад +16

      I'm not a sysadmin, just a network guy that dabbles in sysadmin stuff and yeah, it blew my mind to hear what happened here. If they open a spot to hire an IT guy I think I'm gonna apply :D

    • @TheGruwy10
      @TheGruwy10 2 года назад +4

      Get this dude hired, quick!

    • @GrayMatter70
      @GrayMatter70 2 года назад +12

      I'm not a sysadmin either, but I'm also surprised they didn't catch the offline drives earlier. Even without the regular data scrubs, basic monitoring should have caught that. As for tape backups, I agree but also advise caution that tape backups can fail too, so they need to be planned properly. I've done tape backups myself but that was a long time ago.

    • @StarFireG3
      @StarFireG3 2 года назад +6

      Yep. I'm doing this for 25 years now. I worked for a couple of companies with big raid systems but no backup. It's a struggle to get the responsible persons to buy sufficient backup systems. In one case only one week after installing the backup solution and having the first full backup, the main raid system failed and died. Without backup this company would have gone out of business completely. I have seen this happen to companies before.

    • @brighton_geek
      @brighton_geek 2 года назад +2

      You would need one hell of a tape array to back up that kind of data, not to mention it would take forever! I don't see tapes as a practical offline backup solution for this quantity of data for a company of LTT's size. It is better to have a duplicate server in a DC with clean power and resilient backups and replicate the data; that would act as a backup and be a suitable DR solution.
      Without backups I do wonder if they have a BCDR plan in place also?

  • @BiffaPlaysCitiesSkylines
    @BiffaPlaysCitiesSkylines 2 года назад +533

    Up to 80tb myself and needing more soon....! This hoarding raw footage is a nightmare 🤣

    • @Briceronie
      @Briceronie 2 года назад +33

      hey i watch your cities skylines videos. hope your day is going well. much love

    • @TheMallaclllypse
      @TheMallaclllypse 2 года назад +24

      Hello everybody and welcome back to the next episode of fix my NAS.

    • @StrokeMahEgo
      @StrokeMahEgo 2 года назад +4

      Consider cloud, or tape based backups that you mail to a trusted friend or put it in a safety box at a bank.

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines 2 года назад +10

      @@Briceronie hi, thanks 😊

    • @BiffaPlaysCitiesSkylines
      @BiffaPlaysCitiesSkylines 2 года назад +7

      @Malaclypse The Elder yes, that'll be me soon lol 😆

  • @DangerousDac
    @DangerousDac 2 года назад +195

    Well this "presentation" format certainly has a different energy to it than Whonnock died.

    • @philb5593
      @philb5593 2 года назад +31

      The vault is hardly the beating heart of the company that whonnock was, and sounds like this unfolded over the course of days and weeks as Jake found the issues and they are still working on rebuilding the data.
      The vault is just archive data. Whonnock is the in progress projects, and I think at that time Linus said there was no backup.

  • @RobertCrawfordRobert4049
    @RobertCrawfordRobert4049 2 года назад +16

    As soon as they switched from storage spaces I kind of saw this coming; I've got a 912tb S2D cluster that serves as storage for about 200 or so virtual machines and it's been rock solid and performance with NVME cache has been solid. One of the things I saw on Spiceworks was a warning about over engineering infrastructure.

  • @moralapostel
    @moralapostel 2 года назад +402

    Big mistake to immediately replace the drives that weren't even dead, which just showed some failures. By removing them LTT removed all the (still good) parity data on those. Probably should've run a scrub first, and then removed the possibly malfunctioning drives.

    • @hallif7295
      @hallif7295 2 года назад +2

      Wouldn't that take a long time tho?

    • @bkrich
      @bkrich 2 года назад +2

      Yeah I was thinking the same

    • @AyoKeito
      @AyoKeito 2 года назад +8

      I'm pretty sure those wouldn't survive a scrub either.

    • @bkrich
      @bkrich 2 года назад +29

      @@AyoKeito we wouldn’t know that for sure but we do know it didn’t survive the replacements

    • @dracotrapnet
      @dracotrapnet 2 года назад +6

      If they are offlined, they are already dirty parity data.

  • @waveformdistortion
    @waveformdistortion 2 года назад +49

    Well if you hadn't made this video, I never would have known to check if automatic scrubbing was enabled on my storebought NAS. It wasn't. I don't believe it's ever suffered a power failure, being connected to a UPS and configured for automatic shutdown when the UPS drops below 50% battery since day one, so no automatic scrub on resume either. It's now set to automatically scrub once a month, so thanks!

    • @linusnexus9000
      @linusnexus9000 2 года назад +1

      Same here on a Synology box, thanks to your comment I checked and noticed it wasn't enabled either. I also activated a monthly schedule :)

  • @TristensMadness
    @TristensMadness 2 года назад +322

    Please be server room related. I’ve been craving some of that content recently

    • @toxicxshotsx
      @toxicxshotsx 2 года назад +13

      Me too man!! Also ^s/o to the milfs in the 20 mile radius comments ahah

    • @williamprimeee
      @williamprimeee 2 года назад +2

      yeah we all wana see his server ;)

    • @gabrielrojasg.3180
      @gabrielrojasg.3180 2 года назад +1

      I started following Linus by server content haha

    • @SuperNGLP
      @SuperNGLP 2 года назад +4

      You just have to wait until something goes wrong and boom new server content!
      Maybe we pay seagate to send Bad drives, so we get new content sooner?
      Sounds like a good, reasonable idea.

    • @frozenturbo8623
      @frozenturbo8623 2 года назад +1

      Wait until Seagate fails again in Vault 3 then we have Vault 4 until we got into Vault 76 and That marks the End of Seagate.

  • @mikkelp1234
    @mikkelp1234 2 года назад +1

    You should consider using tape backup. Cheap backup option for a lot of data

  • @Tetraknot
    @Tetraknot 2 года назад +90

    Love your show! Just wanted to chime in here coming from an IT background supporting large companies in datacenters as well as being a content creator. Trying to maintain an accessible RAID of ever growing content only gets more difficult and expensive over time. You will eventually need a full time employee to manage your content if you go this route and at some point you will need to migrate your entire content to a new RAID when 1 petabyte isn't enough anymore and that's not going to be fun.
    The alternative cheaper and simpler solution is to archive your content to tape which will have a much higher chance of surviving the years to come as it's not on spinning platters that run 24/7. Yes, getting access to a piece of content you want to grab on short notice will be more annoying but you can always keep a smaller RAID with your completed videos and archive your raw content via tape as it's the RAW video content that really eats up the TB which is why you might want to consider archiving your raw video.

    • @pixelmaster98
      @pixelmaster98 2 года назад +8

      just build a giant data center that uses robots to automatically fetch & read tapes, so it's at least automated, even if it still takes half an hour. Building a data center is probably also great content for the channel ^^
      /s

    • @alextraska
      @alextraska 2 года назад +6

      @@pixelmaster98 yea until crash override and acid burn have a hacking battle with your tape robots

    • @zicklane
      @zicklane 2 года назад

      Ok no one asked

    • @blademan7671
      @blademan7671 2 года назад +3

      This response from a pro is why you would leave a job like this to pros. As this pro demonstrated, #1 is identifying and understanding the requirements. Do you really need all your old content available online, or maybe offline is good enough? Then solution to fit the needs.

    • @geoff_cline
      @geoff_cline 2 года назад

      This could also be done with AWS Glacier

  • @normandabald6501
    @normandabald6501 2 года назад +691

    The second most important thing to consider about backups, behind actually having them in the first place, is TESTING THEM!
    If you don't test your backups then you don't have backups.

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 года назад +6

      Only if you have shit backup software. Last year I did a restore of our main HPC file system after an upgrade, everything came back. The only "testing" necessary is the occasional restore when users have done daft stuff and deleted files by accident. Then again I have a "proper" backup system in IBM Spectrum Protect (nee TSM). If you use toy backup systems (aka everything else in my view) then yeah test them regularly.

    • @zazethe6553
      @zazethe6553 2 года назад +16

      This is not a backup system, it's live storage.
      But you are right.

    • @johngangemi1361
      @johngangemi1361 2 года назад +2

      Agreed

    • @jacquesb5248
      @jacquesb5248 2 года назад +3

      yeah actually checking that the backups are running

    • @jonathanbuzzard1376
      @jonathanbuzzard1376 2 года назад +10

      @@jacquesb5248 Nope, if you have to "check" that your backups are running then you are doing it wrong. This should be integrated into your monitoring system so you get told that your backup *DIDN'T* run. Checking manually is prone to someone forgetting or being on holiday or insert a thousand other reasons. Getting told daily that your backup ran also becomes an issue, where it is seen as background noise and you get bored checking the same report day in, day out. Basically being notified that something is as expected is the wrong way to do anything. You need to be notified that something is *NOT* as expected, in this case that the backup didn't run to completion without errors.
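
Editor's sketch of the "alert on what did NOT happen" idea from this thread. It assumes the backup job records the time of its last error-free completion somewhere the monitor can read it; the function name and the 26-hour default (a daily job plus some slack) are illustrative assumptions, not anything from the video.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_alert(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumed: daily job plus slack
) -> Optional[str]:
    """Return an alert message if the backup is overdue, otherwise None.

    Silence means everything is fine, so nobody learns to ignore a
    daily "backup OK" email.
    """
    if last_success is None:
        return "no successful backup on record"
    age = now - last_success
    if age > max_age:
        return f"last successful backup finished {age} ago (limit {max_age})"
    return None


if __name__ == "__main__":
    now = datetime(2022, 2, 1, 12, 0)
    print(backup_alert(now - timedelta(hours=2), now))  # healthy: None
    print(backup_alert(now - timedelta(days=3), now))   # overdue: a message
```

The design choice is that the monitor, not the backup job, owns the alert: if the job never starts, crashes, or hangs, the recorded time simply goes stale and the check still fires.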

  • @captdev
    @captdev 2 года назад +522

    As an operations engineer, the amount of red flags that the process you followed here brought up was terrifying. Please write processes for this sort of stuff and test them - it's all fun and games till you lose something essential because of a stupid decision from 5 years ago

    • @williameldridge9382
      @williameldridge9382 2 года назад +32

      Not to mention they used Seagate drives. They are just completely unreliable. I wouldn't trust them in any circumstance. I've hundreds of Seagate drives due to failure, but only a handful of WD/Hitachi. It isn't surprising as Seagate purchased the worst hard drive company that ever existed, Maxtor. And they didn't learn their lesson, they got even more Seagate drives.

    • @jrdemasi
      @jrdemasi 2 года назад +13

      Why anyone trusts this guy for basically anything is beyond me. Lol.

    • @mikex4941
      @mikex4941 2 года назад +7

      @@williameldridge9382 Got a different experience. I'm still rocking Seagate and WD drives while all of my Hitachi drives from the same era as all my other drives died. But not sure right now though.

    • @esbekay
      @esbekay 2 года назад +2

      seriously, its hard to watch

    • @JLeYang
      @JLeYang 2 года назад +20

      @@williameldridge9382 Hard drive manufacturers have all had bad batches, it's just the nature of the beast now. I have had failures from all brands in usage. You should see hard drives as a consumable (especially as a storage array), run SMART and replace when health is detected as bad. The bigger issue is people not doing backups, that's a failure on you and your users to not enforce that.

  • @timoonitamarooni
    @timoonitamarooni 2 года назад

    I'm so sorry you've had to deal with this! I haven't watched the full video yet (I'll get back to it) so I'm not sure if this is something mentioned but, in terms of operational controls to prevent scope creep/creepback / operational swiss-cheesing, RACI matrices are good tools used correctly, it might seem super unnecessary or tedious but in terms for defining infrastructure maintenance (or other) tasks and roles and who does what (not to be overly prescriptive or take away from a lax culture but as a documentation tool so there's no confusion as to who does backups, who does audit, who makes sure ssl certs are up to date and how often etc) it is super effective.
    That being said technical failures are ultimately unavoidable, hopefully some of that loss was transferred via insurance? Best of luck going forward y'all

  • @NoProHarrie
    @NoProHarrie 2 года назад +408

    Moral of this story: hire a IT specialist already Linus.

    • @InventorZahran
      @InventorZahran 2 года назад +17

      Linus: "I am the IT specialist."

    • @misham6547
      @misham6547 2 года назад +7

      Or a cybersecurity expert, would make for really interesting videos

    • @gorkskoal9315
      @gorkskoal9315 2 года назад +2

      ^^^^^^^^^^^^^^^^

    • @ticler
      @ticler 2 года назад +5

      They can very well afford midrange EMC or Netapp storages that will be more stable and may be as performant as these toy storages.

    • @Carcinogenic2
      @Carcinogenic2 2 года назад +3

      @@ticler
      They can rot as bad as the 'toy' storages do. It's enough that they don't get attention. And where would the many hours of fun content about it go?

  • @MOLINE7708
    @MOLINE7708 2 года назад +35

    Bro, hire a dedicated sys admin. You have too many employees that rely on your server infrastructure to yolo everything yourself. You mention that you, Anthony, and Jake work on it, but they also are writers. You have enough data and infrastructure to warrant a dedicated and experienced sys admin at this point

    • @peterpain6625
      @peterpain6625 2 года назад +9

      I wouldn't want that job. They'll go behind his/her/their back at any opportunity anyways because "it's faster that way" or "reasons". The way LTT grew, the IT-Guy job is a surefire way to get PTSD now ;) No way they can establish any structure now.

    • @outofahat9363
      @outofahat9363 2 года назад +2

      @@peterpain6625 they know enough to be dangerous

    • @peterpain6625
      @peterpain6625 2 года назад

      @@outofahat9363 They know a lot in some areas and go full Dunning-Kruger in others ;)

  • @wesrihn
    @wesrihn 2 года назад +261

    Ahhh, the reason I originally subbed to LTT, insane server builds and configs.

    • @theairaccumulator7144
      @theairaccumulator7144 2 года назад +13

      Insanely bad and mismanaged server builds

    • @UrielZeptim
      @UrielZeptim 2 года назад +5

      @@theairaccumulator7144 the point still stands

    • @anona1443
      @anona1443 2 года назад

      And lots of dropping expensive hardwares

  • @arjunyg4655
    @arjunyg4655 2 года назад +115

    Ah and now once again we understand why enterprise storage is so expensive, you pay for the support and for the software that makes it impossible for this kind of stuff to happen. Maybe older wiser Linus will consider something more serious next time.

    • @rya3190
      @rya3190 2 года назад +10

      I mean, he did say none of this is mission critical, and really more of a luxury...and an excuse to experiment with high tech toys.

  • @marcozanuttigh2060
    @marcozanuttigh2060 2 года назад +60

    the perfect opportunity for testing out tape backup! i had 4 hdd's failing at the same time in my raid 6 storage server with total data loss. i recovered all my data from my tapes! it was only 80TB of data, but when it comes to price for large backups, tape is king!

    • @heavyq
      @heavyq 2 года назад +3

      Tape is so underrated by so many people. It's such a great choice for storing a shitload of data for long periods of time.

    • @dakyno
      @dakyno 2 года назад

      "only 80TB" bruh

    • @geort45
      @geort45 2 года назад

      @@heavyq problem is the drives cost a shitload...

  • @ismaela.6973
    @ismaela.6973 2 года назад +52

    I.....I strongly believe he needs to hire an I.T full time to manage and do preventive maintenance on those data servers

    • @tobimai4843
      @tobimai4843 2 года назад +2

      On WAN show he said he thinks about it, also because of the Lab

  • @myname7021
    @myname7021 2 года назад +149

    10:30 and most importantly: monitor your environment! SNMP, Syslogs and even specialized monitoring agents are an easy way to monitor your environment.

    • @grrkaa8450
      @grrkaa8450 2 года назад +3

      PRTG has entered the chat

    • @towel2473
      @towel2473 2 года назад +15

      The irony is that they advertise these products in segues but don't implement them, it seems.

    • @BTMikeMan
      @BTMikeMan 2 года назад +6

      @@towel2473 I was going to say, did they not have Pulseway deployed :)

    • @adg1355
      @adg1355 2 года назад +1

      Rather messages from SMART and HBA utilities.

  • @user-ei7ed6zy9k
    @user-ei7ed6zy9k 2 года назад +12

    You’re a multimillion dollar company with never having an infrastructure consultant to some degree?

  • @chaos.corner
    @chaos.corner 2 года назад +49

    "Never underestimate the bandwidth of a station wagon full of magnetic tapes hurtling down the highway."

    • @jrevillug
      @jrevillug 2 года назад +4

      Thank you Professor Tanenbaum.

  • @perrygolden
    @perrygolden 2 года назад +169

    When your downtime and data loss is measured in lost $, hiring full time systems engineer becomes a very attractive value proposition.

    • @gabrielenitti3243
      @gabrielenitti3243 2 года назад +13

      i don't think any of this will produce any downtime for his company. The Petabyte worth of data he may lose, as he said, is just a "nice to have". It's not the actual production server where they store the current projects and videos. His employees may not even know about this data loss.

  • @cbrugiati
    @cbrugiati 2 года назад +182

    There's another type of storage for enterprise who needs a lot of storage. LTO is a lot better when it's too much data like you have

    • @rogerwilco2
      @rogerwilco2 2 года назад +32

      Yes.
      We store over 70 PB of archive data on tapes.
      They have their own failure modes though, but overall it's a good solution.
      We once had a tape robot arm out of alignment, and it knocked a lot of tapes out of the storage.

    • @volvo09
      @volvo09 2 года назад +21

      Yes, if it's for archive, why keep it on actively running drives...

    • @geort45
      @geort45 2 года назад +31

      It's the clear solution, for a normal user an LTO drive is expensive AF, but in his case it'd be cheap compared to a server... and the tape cartridges are very cheap for what they can store... he could have duplicate cartridges of all his data even. Instead he insists on buying bigger and more expensive hardware which is more complicated to maintain and has many more points of failure.

    • @234ne14
      @234ne14 2 года назад +19

      Which is funny, because I thought LTT had tape backups after the... third(?) server crash. Linus did a full review of the LTO-8 thunderbolt dock.

    • @fridaycaliforniaa236
      @fridaycaliforniaa236 2 года назад +1

      Excuse my ignorance, what is a LTO ? (I'm too lazy to search on Wikipedia lol)

  • @verdantia
    @verdantia 2 года назад

    You and your bunch give us so much of yourselves,thank you for putting so much time and precision in all your work.

  • @seriphim8542
    @seriphim8542 2 года назад +124

    At that density and the infrequency of the older data being updated you really should consider acquiring a tape library. A couple iSCSI targets and a 250 slot LTO library would keep you until you more than double your current use. But considering the increasing file sizes of the raw files you're ingesting I would recommend going for a 3-3.5X scaling.

    • @grrkaa8450
      @grrkaa8450 2 года назад

      A 250 slot library for what? 3 PB of direct access tape storage?

    • @killer2600
      @killer2600 2 года назад +5

      Tape is slow. I think the whole point of their setup is for fast access to footage new or old for editing purposes. If they were just hanging on to it for keep sake then Tape is an option but I think they keep it so they can retrieve previous footage on-demand to splice into the current video being edited.

    • @joross8
      @joross8 2 года назад +11

      ​@@grrkaa8450 Tape is slow, but much cheaper per TB.
      Typically you would have a hybrid system where users interacting with the data would hit high speed disk storage of some sort, and that disk storage would be running software that would migrate copies of files, or just less accessed files to tape.
      It's effectively the best of both worlds, users have the speed and accessibility of high speed storage, but the high speed pool is much smaller, and most of the archival data is on less expensive tape drives. The only time you hit a slow down is when a user has to access the stuff on tape which would be normally pulled when the user accessed a stub file representing the file on the disk pool.

    • @animefreak5757
      @animefreak5757 2 года назад +5

      @@killer2600 so do both? use tape as a economical backup option.

    • @MDKAOD
      @MDKAOD 2 года назад +8

      @@grrkaa8450 Why keep the data in hot storage at all? Archive to tape (not backup) toss it in a fire safe.
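
Editor's sketch of the hybrid tiering described earlier in this thread: hot files stay on disk, and files nobody has touched for a while become candidates to migrate to tape. It is a toy selection rule, not a real hierarchical storage manager; the function name, the 180-day threshold, and the sample file names are all assumptions. Access times can also be unreliable on filesystems mounted with `noatime`, so a real system would track usage differently.

```python
SECONDS_PER_DAY = 86_400


def tape_candidates(files, now, min_idle_days=180):
    """Pick files that have been idle long enough to move to tape.

    files: iterable of (path, last_access_epoch_seconds) pairs
    now:   current time as epoch seconds
    """
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    return sorted(path for path, last_access in files if last_access < cutoff)


if __name__ == "__main__":
    now = 1_700_000_000  # arbitrary reference time for the example
    files = [
        ("2016/old_project.mov", now - 400 * SECONDS_PER_DAY),
        ("2022/current_project.mov", now - 10 * SECONDS_PER_DAY),
    ]
    for path in tape_candidates(files, now):
        print("migrate to tape:", path)
```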

  • @SuperSmashDolls
    @SuperSmashDolls 2 года назад +380

    All of this was very patiently and thoroughly explained, except for one thing: what happened to that LTO-8 drive you were planning to put into service years ago?

    • @jayred8289
      @jayred8289 2 года назад +28

      I thought that they should have tape backup too

    • @lolish1234
      @lolish1234 2 года назад +41

      @@jayred8289 i mean how does such a big company with so many resources not have a 3-2-1 backup, even if it's some raw data. It's not like they're short on cash, are they

    • @FormerHumanX
      @FormerHumanX 2 года назад +5

      It was probably a review unit sent to LTT just to make a video and not something they were actively going to implement.

    • @TheDemocrab
      @TheDemocrab 2 года назад +33

      @@lolish1234 Because it's not ridiculously important data, Linus even says in the video that half the reason they bother keeping it around is because they can make interesting videos on it. I wouldn't be surprised if the eventual goal was a 3-2-1 backup system but they wanted to cover setting up each stage in videos which kept slipping cause LMG is pretty busy until we get to today. A lesson into why businesses with large data needs should be hiring their own IT guy.

    • @sarowie
      @sarowie 2 года назад +2

      @@TheDemocrab Setting up a cable testing lab and acquiring more space is more important than building a 3-2-1 backup system?
      Yes "in hindsight" everything is easy to judge, but assuming Linus sets his priorities straight, literally he has more issues with monitor cables than his raw video archive.

  • @tkirchmann
    @tkirchmann 2 года назад +280

    (oversimplified) Summary: The power dropped out a bunch of times and LTT dropped the ball on configuring the servers so the servers dropped a bunch of errors before dropping physical drives out of the servers resulting in the servers permanently dropping some data... I see a familiar pattern here.

    • @RippahRooJizah
      @RippahRooJizah 2 года назад +3

      HOLD IT!
      I'm not sure what you are getting at.

    • @ZNotFound
      @ZNotFound 2 года назад +20

      At least they get to drop a new video about it.

    • @sushimshah2896
      @sushimshah2896 2 года назад +5

      Would've been nice if (Mass)Drop sponsored then as well

    • @Thefreakyfreek
      @Thefreakyfreek 2 года назад +4

      Linus drop tips

    • @4-Avenue
      @4-Avenue 2 года назад +5

      how are we supposed to trust linus' tech tips if they keep dropping the ball :(
      But at least they show us!

  • @RazielAU
    @RazielAU 2 года назад

    This is why I prefer off the shelf solutions like a Synology NAS. You know things are already configured correctly and probably won't run into any weird issues like this. I lost a drive in my array around 2 weeks ago, the system started beeping and sending out e-mails. I was immediately aware there was a problem, at that point I just popped in a new drive, hit a few buttons and off it went rebuilding it. Looking at what happened here, several drives have already failed, but no one even seemed to be aware of it, on top of that things weren't configured correctly, so data will probably be lost.

  • @joekenorer
    @joekenorer 2 года назад +743

    Linus: "We're not sure who's accountable here, so I'm considering hiring someone to be accountable because the situation is currently untenable without an appropriate system of blame in place."

    • @antiisocial
      @antiisocial 2 года назад +36

      Sounds like my company. Lol

    • @monkyyy0
      @monkyyy0 2 года назад +25

      .... yes
      Caring comes from being the person to blame

    • @saiyadulahmad2012
      @saiyadulahmad2012 2 года назад +14

      Said every CEO ever.

    • @acatch22
      @acatch22 2 года назад +24

      look up "Diffusion of responsibility" to understand what he truly meant.

    • @chloefletcher9612
      @chloefletcher9612 2 года назад +10

      Ahhh you've discovered the world of IT.

  • @Unreasonable0ne
    @Unreasonable0ne 2 года назад +386

    I'm just wondering why LTT didn't go for tape storage for their servers, since, as Linus said himself, it was for archival purposes and more of a fun project to test out the tech they got. They even got a tape drive some time ago afaik. It doesn't make sense to keep the drives spinning for years if they are not actively used or maintained.

    • @PanKosiu
      @PanKosiu 2 года назад +44

      Basically this. it was the first thing I thought of. If the archive data never changes, tapes will be crazy cheap way of backing up old videos.

    • @Stasiek_Zabojca
      @Stasiek_Zabojca 2 года назад +29

      Because they probably want to have quick access to it, I think... To cut something out of old video and things like that? As far as I know, tape storage does not give you that luxury.

    • @aoeuable
      @aoeuable 2 года назад +27

      @@Stasiek_Zabojca You could store lower-bitrate stuff on fast storage for browsing and only get the tape out when you need access to the original files.

    • @666Tomato666
      @666Tomato666 2 года назад +14

      Tape storage is cost competitive on the level of multiple petabytes, not single petabytes.
      So it's nothing that any significant minority of viewers will ever see in person, let alone be part of decision making process to buy, install or configure.

    • @beid777
      @beid777 2 года назад +8

      Because he'd rather have "dope hardware" instead of using tape. If they need access to it that's fine, every week or month or time frame you do a fresh backup to tape and keep your servers running for access and have tapes as backup. He failed to implement backup in depth which is basically industry standard.
      Archive is not backup. Redundant and separated storage of data is backup.

  • @sasidharasarma8625
    @sasidharasarma8625 2 года назад +150

    Team: Our data is gone
    Linus: So we got our content for today’s video

    • @schmitt00
      @schmitt00 2 года назад +4

      and quite a couple more

    • @JamezMartinez
      @JamezMartinez 2 года назад +2

      as long as they do not lose the data for that video too...

    • @ProTechShow
      @ProTechShow 2 года назад +1

      I do like this about LMG. I've been called in to help with several incidents of a similar nature and the level of stress as people see their livelihoods on the line can be pretty extreme. The fact that LMG can just make lemonade out of it is quite refreshing (pun not intended).

  • @abursh
    @abursh 2 года назад +1

    At what point do you start to consider pushing your least frequently accessed materials out to cloud storage (e.g. AWS Glacier)? When does it start to become more cost effective to pay someone else to store your oldest stuff, and run a smaller on-site storage cluster for the things you need frequent access to? Both in service costs, and time spent.

    • @RobinCernyMitSuffix
      @RobinCernyMitSuffix 2 года назад

      that's the thing: it's actually not cheaper to have "cloud" storage. It's cheaper to own and manage your own hardware.

  • @excellentswordfight8215
    @excellentswordfight8215 2 года назад +118

    Having such large RAID groups (15 drives) of large drives without any hot spares or replacement routines seems rather dangerous as well. If you already have two dead drives in a vdev, it's not that unlikely that you will lose a third during the resilver.
    Anyway, LTT's IT infrastructure has always been a bit of a dumpster fire, but maybe they do it intentionally because it results in a lot of great content 😅
    I wonder if they have thought about connecting an 84-drive SAS expansion to their SSD tier and just having old data migrate to spinning drives (I think Seagate has a rebranded Dell box, if they have a partnership with Seagate).

  • @gvfc
    @gvfc 2 года назад +773

    In my first months as a sysadmin I learned a lesson: always keep a secondary backup that isn't on-premise. Power can go out, and you'll have a few bad sectors on your drives. But if there's a fire and your server goes with it, all of a sudden giving a few bucks to Jeff Bezos doesn't sound that bad of a deal after all.

    • @radical_dog
      @radical_dog 2 года назад +62

      Yeah, not paying for cloud storage basically confirms that they wouldn't cry themselves to sleep if they lost the whole lot. Which is a reasonable decision since it's not mission critical data.

    • @TonytheEE
      @TonytheEE 2 года назад +27

      They had a remote server in a previous VLOG a year or two back. I wonder what's up with that?

    • @tpmeredith
      @tpmeredith 2 года назад +10

      Heck anyone with a 5 or more user office 365 tenant can get unlimited onedrive backup. Yes it's slow to backup, yes it's full of details like 25TB sharepoint sites that you have to subdivide, but it IS unlimited for very cheap and an offsite backup.

    • @radical_dog
      @radical_dog 2 года назад +64

      @@tpmeredith No such thing as "unlimited", it just means "we haven't written down a hard limit". 720TB would definitely be knocking on that door!

    • @Kevin-jb2pv
      @Kevin-jb2pv 2 года назад +9

      I think they've covered this in the past, and the problem is that they just have so much at this point that the upload will take forever. But that doesn't mean you're not right. If anything, they should do it _now_ because every day they wait is going to just be more they have to upload. I'm sure there's something out there that will just start uploading everything in the background until it catches up.
      Also, IDK if it would only be "a few bucks" for the amount they need. IDK what that kind of enterprise level storage costs, but it's probably not cheap and I'll bet that even on "unlimited" cloud storage plans there's probably a catch written in the fine print with some way of restricting the storage in practice, like restricting the upload bandwidth past a certain amount of data uploaded to such a slow rate that they would never be able to upload faster than they create new data...

  • @primesyndicate272
    @primesyndicate272 2 года назад +239

    "With great power comes great responsibility"
    Watch it Linus, we all know what happens to characters who say those cursed words.

    • @Dual_Ralle
      @Dual_Ralle 2 года назад +21

      With great comments comes great botsibilites.

    • @play2windemon208
      @play2windemon208 2 года назад +1

      Didn't know linus had a kid named Peter Parker

    • @bz3086
      @bz3086 2 года назад +2

      Prime,
      At least 3 pro-establishment bots were trained to oppose yours and my viewpoint.
      🤣

    • @johnmorgan6316
      @johnmorgan6316 2 года назад +6

      Anyone else seeing these annoying bots everywhere

  • @Cineenvenordquist
    @Cineenvenordquist 2 года назад

    I have to hate both how some drives are 'just dead' and how errors are uncharacterized otherwise, so it's an information free video all in all. You can pick Andy Jeszy personal cloud tutorial papers from pretty far back and suddenly have a sort of base plan.

  • @ryan0io
    @ryan0io 2 года назад +221

    Please don't use 15 wide vdevs. Groups of 6 wide in raid-z2 is a good choice for spinning rust (4 data + 2 parity). As a zfs user for 10+ years, I cannot imagine running multiple 15 wide vdevs.

    • @namAehT
      @namAehT 2 года назад +36

      Really wide VDEVs are only OK when using SSDs or low capacity HDDs. The rebuild time on a 12 drive VDEV of 12TB drives is insane, and the stress the other disks are under during that period can easily cause one to fail. 6-8 drives on a RAIDZ2 seems to be the sweet spot for large drives, maybe 9 drive RAIDZ3 if you're _really_ paranoid.
      EDIT: I'm also saying this as someone who's running 8TB drives in 9 drive RAIDZ2 VDEVs. I have plenty of slots for more drives, so I'm sticking with 8TB drives for the time being.

    • @tpmeredith
      @tpmeredith 2 года назад +13

      Let alone multiple 15 wide vdevs in raidz2! Even worse. Then 4 of them in one pool? Of course that data was a time bomb.

    • @alessandrozigliani2615
      @alessandrozigliani2615 2 года назад +6

      More than 6 raidz2 using 20tb disks sounds a little edgy. I would require disks rated 1 error over 10^17 bits for that.
      15 is objectively scary with raidz2. 10 with adequate replication or backups would already be edgy.
      With raidz3 maybe 15 is not crazy but you might want to upgrade the pool at some point with 40tb drives or more, if they ever come out. Which would be totally nuts.

    • @ryan0io
      @ryan0io 2 года назад +6

      11 wide z3 vdevs would be the most I'd be comfortable with regardless of ssd / rated error rate. But once at 11 wide z3, why not go (2x) 6 wide z2? One extra drive, one extra parity, more striping (more performance), more flexibility in adding / removing / replacing devices. All a balance between redundancy / space efficiency and flexibility. To me, 6 drive z2's, and just multiply as needed. Let's think about the worst case. For 6 drive z2's, you lose 2 drives: you have a 4 drive "raid 0" to deal with until redundancy is returned. Not great, but not terrible. Email alerts, etc. But a 15 wide z2? No email alerting? 2 drives die and you get a 13 wide "raid 0". Good luck.
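
The trade-off described in the comment above can be put into numbers with a rough sketch. The `Layout` class and its fields are hypothetical names for illustration, and the model ignores ZFS metadata and padding overhead; it only counts data versus parity drives and how many drives are left carrying a vdev once its parity is used up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A pool made of identical RAID-Z vdevs (illustrative model only)."""
    name: str
    vdevs: int    # number of vdevs striped together in the pool
    width: int    # drives per vdev
    parity: int   # parity drives per vdev (2 for raidz2, 3 for raidz3)

    @property
    def total_drives(self) -> int:
        return self.vdevs * self.width

    @property
    def data_drives(self) -> int:
        return self.vdevs * (self.width - self.parity)

    @property
    def efficiency(self) -> float:
        """Fraction of raw capacity left for data, ignoring ZFS overhead."""
        return self.data_drives / self.total_drives

    def unprotected_width(self) -> int:
        """Drives remaining in one vdev after `parity` drives have failed.

        At that point the vdev has no redundancy left, so every one of
        these drives has to survive until the resilver completes.
        """
        return self.width - self.parity


layouts = [
    Layout("1 x 15-wide raidz2", vdevs=1, width=15, parity=2),
    Layout("1 x 11-wide raidz3", vdevs=1, width=11, parity=3),
    Layout("2 x 6-wide raidz2", vdevs=2, width=6, parity=2),
]

for layout in layouts:
    print(
        f"{layout.name}: {layout.total_drives} drives, "
        f"{layout.data_drives} data, {layout.efficiency:.0%} usable, "
        f"{layout.unprotected_width()} drives exposed per vdev "
        f"after {layout.parity} failures"
    )
```

Under this model, 11-wide raidz3 and two 6-wide raidz2 vdevs both give 8 data drives, which matches the "one extra drive, one extra parity" comparison, while the 15-wide raidz2 leaves 13 drives with no redundancy after two failures.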

    • @tpmeredith
      @tpmeredith 2 года назад +2

      @@ryan0io exactly. 100% right. Especially with a linus budget lmao.

  • @tommiaijala2732
    @tommiaijala2732 2 года назад +76

    Protip: Also setup proper monitoring of system and harddrives so that you react immediately when even ONE drive fails.

    • @blowfly71
      @blowfly71 2 года назад +3

      This. And either constant notification of error condition (email every hour etc) and/or escalation to someone else if not resolved in a particular time frame. Oh and have hotspares

    • @jfolz
      @jfolz 2 года назад +7

      @@blowfly71 just have alerts mate. You don't want constant "everything is OK" messages, because you will start ignoring those real quick and miss the one that says it's no longer OK.

    • @Thomas0918273645
      @Thomas0918273645 2 года назад +2

      @@jfolz he wasn't talking about constant notifications, but ongoing reminders if an error has occurred but wasn't fixed yet. That way you can't miss the single notification of a failed drive.

    • @blowfly71
      @blowfly71 2 года назад +3

      @@jfolz that's what I meant. You have alerts that require action, and escalate if not resolved...

    • @jfolz
      @jfolz 2 года назад +2

      @@blowfly71 got it. Though sending constant messages does have a benefit: it's a canary for your monitoring ;)
      It's probably better to have monitoring that monitors the monitoring though.
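
The "remind until fixed, then escalate" idea from this thread might look something like the sketch below. The function name, the one-hour reminder interval, and the 24-hour escalation threshold are all assumptions picked for illustration; nothing here is tied to a particular monitoring product.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_action(
    failed_at: datetime,
    last_notified: Optional[datetime],
    now: datetime,
    remind_every: timedelta = timedelta(hours=1),     # assumed interval
    escalate_after: timedelta = timedelta(hours=24),  # assumed threshold
) -> str:
    """Decide what to do about one unresolved failure.

    Returns "wait", "notify", or "escalate". No messages are sent while
    everything is healthy; this is only called for an open failure.
    """
    # Respect the reminder interval so people are not flooded.
    if last_notified is not None and now - last_notified < remind_every:
        return "wait"
    # The failure has been open too long: involve someone else.
    if now - failed_at >= escalate_after:
        return "escalate"
    return "notify"


t0 = datetime(2022, 1, 1, 0, 0)
print(next_action(t0, None, t0))                          # first alert
print(next_action(t0, t0, t0 + timedelta(minutes=30)))    # too soon to repeat
print(next_action(t0, t0 + timedelta(hours=23), t0 + timedelta(hours=25)))
```

A separate heartbeat from the monitoring host would cover the "monitor the monitoring" point; that part is not modelled here.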

  • @HarrySarge96
    @HarrySarge96 2 года назад +29

    I love this series !!! Can’t wait for next year’s

    • @geort45
      @geort45 2 года назад

      I bet he's a sleepwalker that fucks up the server just to have an excuse to buy a newer, bigger one

  • @CarnivalPS
    @CarnivalPS 2 года назад +4

    Sorry but I’m sticking to WD. Too many Seagate drives have failed for me. 😞

  • @Uhn_Tis_Uhn_Tis_Uhn_Tis_Baby
    @Uhn_Tis_Uhn_Tis_Uhn_Tis_Baby 2 года назад +264

    Linus, “We need a full time IT person, we keep losing data”
    Linus - Doesn’t hire a full time IT person.
    Also Linus, “I’m going to build another storage server with EVEN more storage.
    IT professionals, “hey, I’ve seen this one before!”

    • @gorkskoal9315
      @gorkskoal9315 2 года назад

      lol well anyone really. lol. Something something something something insanity.

    • @astronemir
      @astronemir 2 года назад

      The IT challenges are content

  • @jblyon2
    @jblyon2 2 года назад +43

    I've been through a number of mergers and acquisitions over the past 10+ years. On every single one the IT dept/employees who do IT tasks for the other entity have been running without viable backups, server monitoring, out of band management, or alerting. Most also lacked UPS units (or working UPS units), and one was even running RAID0 on a production server and couldn't figure out why it kept failing on them. It's a scary world out there.

  • @danbell8536
    @danbell8536 2 года назад +10

    Very impressed with the honesty on this channel. I know plenty of IT folks who would never admit losing data. I run large ZFS storage arrays at my work. When my primary ZFS array is due for replacement (after moving data and workloads to a new array), I then create a Zpool on the old array configured for max capacity and sequential I/O. I then snap and replicate (zfs send/receive) the data on the primary array nightly to the old array. I don't need a ton of performance or redundancy on the old array as it only receives the changed blocks on each replication and is only used for Oh Sh!t moments. I also HIGHLY recommend you add mirrored "Special" devices to your Zpools. Special devices (man zpool) are used for storing metadata (use SSD/NVME) and removing those I/O's from your slower main Zpool drives. You will be amazed at the performance increase, I promise.
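
A nightly snapshot-and-replicate job of the kind described above could be sketched as follows. The dataset names, the `nightly-` snapshot prefix, and the helper function are hypothetical; the sketch only builds the `zfs snapshot` and incremental `zfs send -i ... | zfs receive` command lines and prints them rather than running anything. In a real setup the receive side would usually run on the other host (for example over SSH), which is left out here.

```python
import shlex
from datetime import date
from typing import List


def replication_commands(
    dataset: str, target: str, prev_snapshot: str, today: date
) -> List[str]:
    """Build shell commands for one incremental replication run.

    dataset       -- source dataset, e.g. "tank/projects" (hypothetical)
    target        -- dataset on the receiving pool
    prev_snapshot -- full name of the last snapshot already replicated
    """
    new_snapshot = f"{dataset}@nightly-{today.isoformat()}"
    snapshot_cmd = f"zfs snapshot {shlex.quote(new_snapshot)}"
    # -i sends only the blocks that changed between the two snapshots.
    send_cmd = (
        f"zfs send -i {shlex.quote(prev_snapshot)} {shlex.quote(new_snapshot)}"
        f" | zfs receive {shlex.quote(target)}"
    )
    return [snapshot_cmd, send_cmd]


for cmd in replication_commands(
    "tank/projects",
    "backup/projects",
    "tank/projects@nightly-2022-01-31",
    date(2022, 2, 1),
):
    print(cmd)
```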

    • @shanemshort
      @shanemshort 2 года назад +2

      you have to be careful with those special devices though, if you happen to have them configured in a non-redundant way and they go away, you drop the entire pool.

    • @alessandrozigliani2615
      @alessandrozigliani2615 2 года назад

      @@shanemshort great advice from you both. I totally agree. But they forgot to scrub a 2PB array and its backup, letting them rot for years. I mean... guys, come on.

    • @mirror1766
      @mirror1766 2 года назад

      @@alessandrozigliani2615 There was a backup? It could be scrubbed?

    • @mirror1766
      @mirror1766 2 года назад

      man 7 zpoolconcepts should be where 'special' is hiding. POSIX compatibility, a COW filesystem, and magnetic media with its large seek times make for a less than ideal combination if performance matters.

  • @NERTech
    @NERTech 2 года назад

    pov you store all your files on a single SSD... "I don't need that" Me now: "I might need that"

  • @Master_KayOz
    @Master_KayOz 2 года назад +69

    It's so refreshing to hear the boss of a company confessing: "We f-ed up!" when usually the rhetoric goes: "this or that employee/supplier/provider.....". Props to Linus for being such a stand-up guy!

    • @gorkskoal9315
      @gorkskoal9315 2 года назад

      LoL: no no B+: Well actually our backups underperformed to undermeat the metrics of sucess, and our data was rightsized to meet the new standard: aka well ...uhhh we're having a moment....we just lost 6 weeks of work...we done fucked up.

    • @nickwallette6201
      @nickwallette6201 2 года назад +4

      And the community here is _still_ full of jackals. "That's what you get for being sloppy." Is it any wonder there's no transparency when empathy is a scarce resource?

    • @advice-13
      @advice-13 2 года назад +2

      Actually, he said "We fucked up ... but it doesn't matter, so who cares." If they actually lost important data, I'm sure the tone of the video would be a lot different.

    • @roodoff2411
      @roodoff2411 2 года назад +3

      As the company boss and the key decision maker, I would have preferred it if he said "I fcked up" and not "we fcked up". No dedicated IT employee, no scheduled maintenance and builds/implementations based on his decision making and dictation. The buck ends with him. He fcked up.

    • @gorkskoal9315
      @gorkskoal9315 2 года назад

      @@roodoff2411 ^^^^^^^^ he tends to do that. do that to.

  • @bencoomer2000
    @bencoomer2000 2 года назад +227

    You know. It's nice to see someone that handles things like an adult, admit mistakes, acknowledge that some failures aren't simple "that person screwed up", and use it to constructively fix problems

    • @beermarket9971
      @beermarket9971 2 года назад +8

      If he was handling this as an adult he would have hired a full-time IT person a long time ago. This is childish.

    • @alias_not_needed
      @alias_not_needed 2 года назад +7

      @@beermarket9971 Why? It is everyones own choice how important their data is. If they can live with the loss of some old footage, i see no problem in their actions...

    • @beermarket9971
      @beermarket9971 2 года назад

      ​@@alias_not_needed There are plenty of reasons why this is childish in my POV:
      For one, you should value what belongs to you and protect it from predictable breakdown, otherwise you come across as a spoiled child.
      Second, as a CEO you have a duty to protect and save your employees' work. Accidents do happen, but when they are caused by a lack of prevention, the people in charge (or the CEO) come across as childish.
      Finally, when a CEO cannot hold someone accountable for data loss (or work loss), it's ultimately his fault and he should just own it. Maybe I missed it, but it didn't quite come across like that.
      I don't want this to come across as negative. I like LTT, it looks like an amazing place to work, and I admire Linus. But this is frustrating to watch...

  • @rustyshakkleford
    @rustyshakkleford 2 года назад +53

    Linus: We lost all our data again :/
    Also Linus: Why don't more people build and configure their own servers? It's so easy!

  • @DxBang3D
    @DxBang3D 2 года назад

    This reminds me of this one time, at base ca... nope, I had everything placed on the backup, went to format all of my drives, and then went to restore the backup... just to find out that the backup itself had died.... FTK Imager couldn't even help me, since the circuit board on the backup disk had died: a brand-new disk, dead on day two. Went to buy another one of the same disk, replaced the circuit board and prayed to the data-recovery gods... they answered, and my data was recovered :)
    I said goodbye to Western Digital that day!

  • @nathanielsottung
    @nathanielsottung 2 года назад +228

    I would love to know why tape backups aren't considered. It seems to be one of the more economical options and is great for archival. Also, as a photographer who works with tens of terabytes I would love to learn more about tape backup.

    • @SnotRocket123
      @SnotRocket123 2 года назад +60

      As an actual IT professional, learn about it from literally anywhere other than RUclips.

    • @Lexan_YT
      @Lexan_YT 2 года назад +6

      It would probably be insanely slow if they ever wanted to use the videos to edit from

    • @ker6349
      @ker6349 2 года назад +28

      @@Lexan_YT theoretically they'd use it to reinstall on new drives and use the tape backups as backups and not main drives

    • @marekspacirek
      @marekspacirek 2 года назад +16

      @@Lexan_YT Backup is not main storage. With Dell Powervault TL and IBM Spectrum we are achieving 1-2Gb/s write and read speeds. So restore of that data isn't that insanely slow.

    • @yensteel
      @yensteel 2 года назад +10

      Wow, I just checked, LTO-8 standard goes up to 12 TB per cartridge! It's very interesting!

  • @TheSleepyCraftsman
    @TheSleepyCraftsman 2 года назад +102

    You could have made incremental slow tape backups for emergencies like this.

    • @jesseinfinite
      @jesseinfinite 2 года назад +5

      Literally nobody would've time for that, especially given that this isn't just archives of old videos that are static in number, but that is continuously increasing every single week.

    • @bamzilla1616
      @bamzilla1616 2 года назад +27

      @@jesseinfinite LTO spanning archives are exceptionally easy to implement and add very little overhead given proper organisation.

    • @bgezal
      @bgezal 2 года назад +5

      1 PB takes 83 uncompressed LTO-8 tapes (or 33 compressed), or 55 (22) LTO-9 tapes, whenever they become available.
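
The tape counts above are easy to reproduce. The sketch below assumes decimal units (1 PB = 1000 TB) and the published LTO capacities of 12 TB native / 30 TB compressed for LTO-8 and 18 TB native / 45 TB compressed for LTO-9. It rounds up to whole cartridges, so each figure comes out one higher than the rounded-down numbers in the comment. The compressed figures rely on the nominal 2.5:1 ratio, which already-compressed video is unlikely to reach.

```python
import math

TB_PER_PB = 1000  # decimal units, as tape capacities are marketed

# Published cartridge capacities in TB.
CAPACITY_TB = {
    "LTO-8": {"native": 12, "compressed": 30},
    "LTO-9": {"native": 18, "compressed": 45},
}


def tapes_needed(data_tb: float, capacity_tb: float) -> int:
    """Whole cartridges required to hold data_tb, rounding up."""
    return math.ceil(data_tb / capacity_tb)


for generation, caps in CAPACITY_TB.items():
    native = tapes_needed(TB_PER_PB, caps["native"])
    compressed = tapes_needed(TB_PER_PB, caps["compressed"])
    print(f"{generation}: {native} tapes native, {compressed} at nominal compression")
```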

    • @tangyboi6420
      @tangyboi6420 2 года назад +1

      Yup! Tape drive backup solution that holds any video older than 2 years old and then pay a company to keep it safe.

    • @DantalionNl
      @DantalionNl 2 года назад +16

      @@jesseinfinite The entirety of CERN is tape backed. Clearly, if it can work for an organization with such immense data needs, it can work for Linus too.

  • @nermanus
    @nermanus 2 года назад +84

    LMG seems like a company where everybody does everything and that can work to a degree if you have just a couple of employees but it's a disaster when you have a bigger business to run.

    • @ericwhite265
      @ericwhite265 2 года назад +7

      @UnjustifiedRecs I don't understand how you easily lose track of a server that should be sending out notifications to someone that drives have died.

    • @sapier
      @sapier 2 года назад +2

      @@ericwhite265 very true. just about every commercial nas software has some notification system for when a drive goes down. you shouldn't have to audit the system to find that there are several that have failed.

  • @HadesTimer
    @HadesTimer 2 года назад +1

    This is why on-premise storage solutions are so expensive. Everything has to be maintained by someone whose job it is to do so. Which means paying someone extra besides paying for all the hardware.

  • @kieranholland1048
    @kieranholland1048 2 года назад +50

    Me: I like it when LTT does server videos, I hope they do another one soon...
    Also me: How many errors? Oh sh*t ...I take it back.

  • @Coz131
    @Coz131 2 года назад +259

    Honestly I never thought LTT was professional with their IT operations. They seem to have more failures than they should. If they want archival storage, they should just use tape backup.

    • @TamerBayouq
      @TamerBayouq 2 года назад +1

      Nah Bro, ZIP DRIVES WOOP!

    • @QuentinStephens
      @QuentinStephens 2 года назад +14

      And thus it's a useful lesson for the rest of us.

    • @bits3608
      @bits3608 2 года назад +28

      @@TamerBayouq Nah, stone tablets are more reliable.

    • @ndc5544p
      @ndc5544p 2 года назад +9

      and that's ok. It generates content and it informs a broader audience about this kind of issues.

    • @narfbite5239
      @narfbite5239 2 года назад +15

      Didn’t they try this years ago, but decided not to because of cost or retrieval speed… something like that.

  • @kwerboom
    @kwerboom 2 года назад +37

    "The rule of two: One is none. Two is one. If it's important, you need a backup." - C. G. P. Grey

  • @mikeromero4162
    @mikeromero4162 2 года назад

    Congrats... you are getting pretty close to my life's book.

  • @jewjubes3688
    @jewjubes3688 2 года назад +230

    Linus: "RAID Is NOT a backup"
    Also Linus: "Yeah, so I didn't back up my RAID"

    • @catbertz
      @catbertz 2 года назад +4

      I haven't finished the video yet, but I face palmed so hard when he said no back up.

    • @flintstone1409
      @flintstone1409 2 года назад +9

      Tbh its not such a big deal for the reasons he mentioned, as its just archive of old videos but yeah

    • @melissablick779
      @melissablick779 2 года назад +4

      @@catbertz Pretty sure that was a deliberate business decision. He knew this could happen, but losing that data would not impact day-to-day operations.

    • @ForFourGaming
      @ForFourGaming 2 года назад +4

      @@catbertz oh yeah, let's just casually back up 2 petabytes of non-critical data, that for sure makes sense

    • @sleeplesson
      @sleeplesson 2 года назад

      @@ForFourGaming LTO tape. It's not complicated. Back it up, and store it in proper archival conditions, tape will last 20-50 years.

  • @Charlie8913
    @Charlie8913 2 года назад +35

    Oh my gosh, they had no automatic scrubs and no automatic e-mail notification when a drive fails? That's absolutely necessary maintenance basics for ZFS...
    I wish LTT luck on restoring their data!

    • @oddeye
      @oddeye 2 года назад

      I wish them luck too, but all the information they tell about how it's important to backup the drives and have multiple backups they don't even follow. Is it just we'll fix it later or the cost to do it isn't a justifiable reason?

    • @SquirreliciousMe
      @SquirreliciousMe 2 года назад

      Also funny as they had multiple videos with sponsors like Pulseway where they brag about having everything monitored (so I guess they don’t use it… or didn’t configure that either…)

  • @yrmoma
    @yrmoma 2 года назад +39

    Thank you for this. I've never scrubbed my ZFS pool because I didn't know what that meant. I now have it set up to do it monthly and am running one as we speak. 5 hour estimate for completion

    • @yrmoma
      @yrmoma 2 года назад +21

      Update: no errors on all drives of my 8 drive Z2 array. Awesome! Took about 8 hours and they're 2 tb drives.
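
For anyone wanting to script the monthly scrub mentioned above, a minimal sketch might look like this. The 30-day interval and the pool name `tank` are assumptions; `zpool scrub <pool>` is the actual command, but the sketch only builds it and does not run it, since that needs a host with ZFS installed.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def scrub_is_due(
    last_scrub: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed monthly schedule
) -> bool:
    """True if the pool was never scrubbed or the last scrub is too old."""
    return last_scrub is None or now - last_scrub >= max_age


def scrub_command(pool: str) -> List[str]:
    """Argument list suitable for subprocess.run(..., check=True)."""
    return ["zpool", "scrub", pool]


now = datetime(2022, 2, 1)
if scrub_is_due(None, now):
    print("would run:", " ".join(scrub_command("tank")))
```

Many distributions ship their own scrub timers or cron jobs for ZFS, so checking for an existing one before adding a script like this avoids running two schedules.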

  • @_GhostMiner
    @_GhostMiner 2 года назад +55

    Linus being so calm while talking about one of his/their biggest oopsies is so cool 😄

  • @JessSimpson1313
    @JessSimpson1313 2 года назад +37

    Hey Linus, 2 best practice recommendations I didn't hear you state, but would be very important. 1) For every 24 disks (on average) you should have 1 hot spare. This drive should be in the same HA zone as the 24 disks, so if a failure occurs or a scrub detects failures it can automatically start the rebuild to the spare. This gives you time to get replacement equipment etc., without having to worry about your data while you're purchasing. 2) If you cannot do full backups, such as to a cloud or dedicated location, the next best thing is to ensure your data is across 2 different technology solutions. As this is entirely archival, and you're not worried about location protection, your 2nd system could be replaced by either an always-on VTFS (virtual tape FS) or just streaming to tape backups with 1 guy pulling the tapes about once a month. Tape is rather inexpensive and has a great shelf life. I've been doing IT Storage and Data Protection engineering for the last 10 years, and customers in your position, with no dedicated staff but increasingly large data sizes, are all too common sadly.

    • @peterpain6625
      @peterpain6625 2 года назад +6

      They're using Seagate drives. So for every drive they should have a hotspare ;) No seriously. I'd go 12D 1HS at least. The way they manhandle their servers i'd say VTFS is prone to a hilarious video with a lot of grey confetti :D

    • @brucepayan2845
      @brucepayan2845 2 года назад

      Offsite rotating backups?

    • @JessSimpson1313
      @JessSimpson1313 2 года назад

      @@brucepayan2845 that would be ideal, but in the video they said they couldn’t afford offsite backups.

  • @pseudonymity0000
    @pseudonymity0000 2 года назад +180

    When backing up, always remember 3,2,1. 3 copies, 2 local, 1 remote.
    Another important thing not to forget, Raid is not a backup.
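
As a rough illustration of the rule as stated in this comment (3 copies, 2 local, 1 remote), a check could be sketched like this. The `Copy` class and location names are hypothetical. Copies are counted by distinct location, so several drives in one RAID array still count as a single copy, which is the "RAID is not a backup" point. Note that the rule is also commonly phrased as two different media types plus one offsite copy; that variant is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Copy:
    location: str   # e.g. "workstation", "nas", "cloud" (hypothetical names)
    offsite: bool   # True if stored away from the primary site


def satisfies_3_2_1(copies: Iterable[Copy]) -> bool:
    """3 distinct copies, at least 2 local and at least 1 remote."""
    copies = list(copies)
    # Deduplicate by location: a RAID array is one location, one copy.
    all_locations = {c.location for c in copies}
    local = {c.location for c in copies if not c.offsite}
    remote = {c.location for c in copies if c.offsite}
    return len(all_locations) >= 3 and len(local) >= 2 and len(remote) >= 1


good = [Copy("workstation", False), Copy("nas", False), Copy("cloud", True)]
raid_only = [Copy("nas", False), Copy("nas", False), Copy("cloud", True)]

print(satisfies_3_2_1(good))
print(satisfies_3_2_1(raid_only))
```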

    • @ApolloPTT
      @ApolloPTT 2 года назад +16

      The most basic rule of sysadmins

    • @HermanIdzerda
      @HermanIdzerda 2 года назад +3

      This was just on my mind as well. 3-2-1 - I do it at home as well.

    • @pseudonymity0000
      @pseudonymity0000 2 года назад +13

      @L. Kärkkäinen You're right, However, this could be mitigated through tapes. They are actually ideal for this kind of data, as video files are sequential data files. Tape is also archival class, meaning they should not suffer from bit rot over time when stored properly.
      if they need old footage from years ago, they can grab the tape from archive, and it should seek and fetch the data off relatively quickly.
      Tape also solves the offline problem, as they should only be loaded when writing new data, or if you intend to retrieve it.

    • @MAxAMILLIoN757
      @MAxAMILLIoN757 2 года назад

      Why is raid not considered backup? I was considering using a 2 drive raid synology nas for my desktop files, and possibly copying that data to a cloud provider like wasabi as well. Is this not a good solution for “safely” storing my crap?

    • @Mychanel691
      @Mychanel691 2 года назад +3

      Yeah, tape is one of the best solution to offline data storage. It is "old" tech, but it does the job. For personal use i have cloud for archives, but for larger businesses a tape library is a nice touch. Only problem is the software, it can be high priced.

  • @reread2549
    @reread2549 2 года назад +20

    Before I retired I spent most of my time imaging and backing up for 25 to 200 seat companies because it was such an overwhelming task for most companies' IT departments. I was looked at as more of a cost department until the ransomware attacks started, and then I was looked at as much more of a valued service when I could tell them: no, you don't have to pay them. I have all of your data.

    • @mikehawk4517
      @mikehawk4517 2 года назад +1

      And when the ransomware attack happened is also when they realized you were the one doing the attack ;)

  • @matzevoje
    @matzevoje 2 года назад

    If you have a lot of drives, try to mix the manufacture dates, because drives out of one production batch tend to die at the same time. You see this happen regularly in huge data centers: I had 20-ish failed drives in one day and they were all manufactured close to the same time. This goes for HDDs. I have not seen this on SSDs, but there were far more HDDs than SSDs at the data center.