How To Rebuild and Resync From Failed Hard Drive Raid 5 on mdadm

  • Published: Sep 8, 2024
  • In this video I will set one of my hard drives to be faulty, then replace it with another hard drive, and rebuild and resync the RAID 5 on Linux using mdadm. Thanks for the view!
    ○○○ Send Me Stuff ○○○
    Don Hui
    PO BOX 765
    Farmingville, NY 11738
    DISCLAIMER: This video and description contain affiliate links, which means that if you click on one of the product links, I’ll receive a small commission.
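
A minimal sketch of the fail, remove, add sequence the description talks about. The array name /dev/md0 and the member /dev/sdc1 in the usage comments are hypothetical placeholders, not the names used in the video, and the function names are made up for illustration.

```shell
# Mark a member as faulty and detach it from the array so the drive can be pulled.
fail_and_remove() {    # usage: fail_and_remove /dev/md0 /dev/sdc1
    sudo mdadm --manage "$1" --fail "$2"
    sudo mdadm --manage "$1" --remove "$2"
}

# Add the replacement; on a degraded array md starts the rebuild by itself.
add_replacement() {    # usage: add_replacement /dev/md0 /dev/sdc1
    sudo mdadm --manage "$1" --add "$2"
}

# Show the array state and the kernel's view of the rebuild.
show_status() {        # usage: show_status /dev/md0
    sudo mdadm --detail "$1"
    cat /proc/mdstat
}
```

Check the device names with `show_status` before failing anything: on a RAID 5 that is already degraded, failing a second member loses the array.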

Comments • 80

  • @matthewhelton1725 4 years ago +2

    Great tutorial! I've been seeing a lot of YT computing channels using/promoting RAID5 lately, and it is good to talk about it and even better to understand it. One thing to keep in mind for RAID5 (any type: controller-based RAID5, MDRAID-5 or LVM-RAID-5): there is a limit to the size of the array before you exceed the URE rating of the hard disks (about 12TB, or 10^14 bits); once this happens there is really no way to ~*know*~ if your data is good. As disks get larger, the probability of a URE increases (with 4+ TB disks, by a factor of two or more depending on the physical head layout). On point, RAID6, with its double parity, neatly covers this gap, but write performance with RAID6 is even worse than with RAID5. This is one reason why RAID5+0 (which makes it much simpler to keep the RAID disk groups under the URE limit) and especially RAID1+0 have become more popular with enterprise storage solutions (but unfortunately cost a lot more). Other controller-less software bulk storage solutions like Gluster, Ceph and ZFS also sidestep these issues with more robust internal error checking than offered by the RAID5 parity stripe. Just something to consider and research.
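
To put a number on the URE point above, here is a rough worked estimate. It assumes errors are independent and occur at a fixed rate of one per 10^14 bits read (the figure quoted in the comment), which is a simplification; the function name is made up for illustration.

```shell
# Probability of hitting at least one unrecoverable read error (URE) while
# reading <terabytes_read>, using P = 1 - exp(-bits * rate).
# Assumption: independent errors at a fixed per-bit rate (default 1e-14).
ure_probability() {    # usage: ure_probability <terabytes_read> [ure_rate_per_bit]
    awk -v tb="$1" -v rate="${2:-1e-14}" 'BEGIN {
        bits = tb * 1e12 * 8                     # decimal terabytes -> bits
        printf "%.2f\n", 1 - exp(-bits * rate)
    }'
}
```

Under these assumptions `ure_probability 12.5` prints 0.63, i.e. reading 10^14 bits gives roughly a 63% chance of at least one URE, and a drive rated at 1e-15 (`ure_probability 12.5 1e-15`) drops that to about 0.10.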

  • @lawrencedoliveiro9104 4 years ago +3

    mdadm is wonderful! Speaking as a sysadmin who has seen his share of drive failures, I would never bother with hardware RAID again.

    • @wontcreep 2 years ago

      I was wondering what the reasons would be to buy a hardware RAID solution over this.

  • @seejjordan 4 years ago +2

    Nice haircut! Love the channel. One of the best, as it doesn't shy away from the more difficult stuff, but doesn't go into insane complexity either. Ty.

  • @viktor133100 4 years ago +1

    I was looking for a while for a clear tutorial on how to change a faulty disk in RAID 5, and this is very clear.

  • @BrianThomas 1 year ago +1

    Wow. I had no idea it was that simple. I'm starting to rethink my open media vault setup. Thank you 💞

  • @thomasneal7126 2 months ago

    Great video!!! This is what I have been looking for. Thanks so much for sharing your knowledge.

  • @michaelkebe 4 years ago

    You should consider switching from screen to tmux. It took a short while but now I am more than happy.

  • @lawrencedoliveiro9104 4 years ago

    2:05 screen is very useful for accessing remote machines, where you only need one SSH connection and can open any number of terminal sessions on top of that.
    Here’s a tip: the default CTRL-A escape character conflicts with one of the line-editing keystrokes in Bash that I use a lot. So in my ~/.screenrc, I have the line
    escape ^Zz
    which changes the escape keystroke to CTRL-Z instead. This interferes with the stop-job character in Bash (which you now have to type as CTRL-Z followed by Z), but that’s OK because screen offers a better alternative to job control anyway.

  • @Ziad_B 3 years ago +1

    Hey, I'd be really interested in a tutorial for boosting a RAID 5's write speed with an SSD cache!
    Hopefully you find the time to make that happen!

  • @Coentjeeee 1 month ago

    Wow, thank you. You helped me out again. Thanks for the video :D

  • @Gatsu563 4 months ago

    This was very helpful, thanks!

  • @YazeedAlKhalaf 4 years ago +2

    Thank you for doing the video! You’re awesome 😎

  • @antunezcarlos 1 year ago

    Awesome video!!
    Question. Where can I get that hard drive NAS enclosure? Can you or anyone share the link please. Thanks

  • @otter-pro 4 years ago +2

    I recommend tmux instead of screen, as it is a better modern alternative.

  • @simtcr 1 year ago

    Thank you very much. I was trying to figure out what to do after a failed disk is replaced.
    If you don't mind, I have an unrelated question.
    Let's say I built a RAID with 2 x 4TB disks, giving 4TB of usable storage.
    What do I do when I fill up the 4TB of storage?

  • @Zellonous 4 years ago +3

    Since it's a pi... What happens if the SD card dies? If you had a copy of it and replaced the dead SD card with the new working one, would it affect anything with the raid?
    If you didn't have a backup of it, what then? Can the software pick up the raid and utilize its existing state automagically?

    • @NovaspiritTech 4 years ago +1

      Yup, since it's not encrypted it will work. You can even move these 4 drives to another Debian install with mdadm and you should be good to go.

    • @AndersJackson 4 years ago +3

      @@NovaspiritTech when moving between systems, you need to change the host name in the meta information of each disk and the RAID.
      You usually also store the configuration of the RAID in /etc/mdadm.conf (or something like that) so it can start faster. But if you make a backup of the SD card, you will be good.
      Still, if you didn't make a backup, mdadm(8) can scan all disks in the system and recreate the RAID from the meta info stored there. But it is faster if you store the information in the /e/mdadm.conf file. 😜
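
A minimal sketch of the two steps described in this reply: letting mdadm find the array from the metadata on the disks, then recording it in the config file. The default path /etc/mdadm/mdadm.conf is the Debian location and is an assumption here (other distributions use /etc/mdadm.conf); the function names are made up for illustration.

```shell
# Scan all block devices for md superblocks and start any arrays found.
assemble_existing_arrays() {
    sudo mdadm --assemble --scan
}

# Append ARRAY lines for the running arrays to the config file, so they are
# assembled under a stable name at boot.
save_array_config() {    # usage: save_array_config [config_path]
    sudo mdadm --detail --scan | sudo tee -a "${1:-/etc/mdadm/mdadm.conf}" >/dev/null
}
```

An array created on another host is often assembled as /dev/md127 until it is listed in the config. On Debian-based systems the initramfs may also need to be regenerated after editing the file.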

  • @lawrencedoliveiro9104 4 years ago +1

    4:17 In the demo of mdadm I did for the local Linux users’ group a few years ago, I used USB sticks, and I pulled one of them out to cause a failure.
    Edit: this was also handy because I could demonstrate hot-swapping without shutting anything down.

  • @OnceARider 3 years ago

    Awesome guide! I have another question: how do I transfer an existing RAID 5 from one system to another?

  • @IEnjoyCreatingVideos 4 years ago

    Great video Don! Thank you for sharing it with us!💖👌👍😎JP

  • @sirstefano1968 3 years ago

    Hi,
    this NAS project is highly risky for data: after a few restarts the RAID is no longer detected, so it has to be rebuilt or resynchronized. But the thing that surprised me is the following:
    I plugged in an 8 GB USB stick, and when I mounted it the RAID 5 dropped out and got corrupted (it only saw two disks). I had to stop the RAID from the command line:
    sudo mdadm --stop md0
    sudo mdadm --assemble --run --force --update=resync /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    After that, OMV saw the RAID in a clean state (resync pending).
    I went into the filesystem menu and mounted it. All right... for now!

  • @samwork3038 4 years ago +1

    How did you know which drive to remove from the array, like physically, with the screwdriver and stuff? Also, great video. Many people create RAID 5 or 6 arrays and have no idea how to recover from a failed drive.

    • @NovaspiritTech 4 years ago +3

      the quad sata board has writing on it telling me which one is sata3

    • @AndersJackson 4 years ago +1

      @@NovaspiritTech this is also a reason to have disks from different manufacturers.
      All disks also have a serial number printed on them. So you can get the serial numbers of all the working ones and know which ones not to remove. And if you are lucky and the faulty one is still responsive, you can get the serial number from it.
      It shows up in the logs when the machine boots and sees the disk for the first time. 😜
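
One way to match serial numbers to device names, sketched under the assumption that the system populates /dev/disk/by-id (udev-based distributions do). The link names there usually embed model and serial; with USB-to-SATA bridges the name may describe the bridge instead of the drive. The function name is made up for illustration.

```shell
# List whole-disk entries from /dev/disk/by-id next to the kernel device
# each one points to, e.g. "ata-<model>_<serial> -> /dev/sda".
list_disk_ids() {    # usage: list_disk_ids [by-id directory]
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-* "$dir"/usb-* "$dir"/scsi-*; do
        [ -e "$link" ] || continue                        # glob matched nothing
        case "$link" in *-part[0-9]*) continue ;; esac    # skip partition links
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Comparing this list with the members shown by `mdadm --detail` tells you which serial numbers belong to healthy members, so the remaining physical drive is the one to pull.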

  • @JoseRodriguez-wy4jb 4 years ago

    Great video. Thanks for your explanations. This NAS box seems to me a very interesting option.
    Is it possible to move the root partition to NAS hard drives? Could you explain how to do it?

    • @Bandicoot803 4 years ago

      @Jose Rodriguez: You can't do it on a Raspberry Pi as it can only boot off the µSD card. Booting off a RAID array requires a completely different setup on either an i386 or amd64 platform.

  • @TomClaessens 3 years ago

    Hi Don, maybe make a future video on the same theme, but on how to recover from a failed RAID card/controller? That recently happened at work. They thought one of the drives was acting up, but apparently the controller had failed. How would you recover from something like that?

  • @GeorgeTJ 4 years ago +4

    Hey Don, how did you know which of the 4 drives was the faulty one when you disassembled it to replace?

    • @IanC14 4 years ago

      I'm guessing he already made a note of which drive serial number was assigned to SDx

    • @Ray-dz9fn 4 years ago

      For the Quad SATA, I think the related/affected drive light will wink

  • @AndersJackson 4 years ago +1

    Do you want to experiment with RAID?
    Just use a bunch of USB memory sticks and create small disks.
    OR use a virtual machine with a couple of virtual disks. Crash them? Copy /dev/zero onto them with dd(1). 😜
    You should also check the contents of a disk before you add it. I use fdisk(1) with the list option, -l.
    There is also a reason to make a partition of known size, because not all disks are created equal. If the new one is slightly smaller, you can't add it to the RAID. You can add a larger one, but you can't use the extra space for the RAID. The members, be they hda or hda1, have to be of equal size.
    And it is a good reason not to have all disks from the same batch and/or manufacturer. The risk of them crashing at the same time is lower that way.
    RAID-5 can "only" lose one disk. Lose two, and you are lost.

  • @chinook575 4 years ago

    Was the Barracuda you are using here an SMR spindle or an SSD? Curious about rebuild times if it's SMR.

  • @theloniuser 2 years ago

    It looks to me like you have 4 active devices and no spare. Do I need to have 3 devices and a spare in this setup or do I not need the spare?

  • @wiideathmodtv 4 years ago

    Can you do an update on the AMD NAS? Did you ever update the PSU?

  • @Valnurat 3 years ago

    When I do "sudo mdadm --manage /dev/md127 -a /dev/sde1" it is just added as a spare. How do I force my NAS to use the spare?

  • @Kenny_Ded 4 years ago

    Another good terminal multiplexer is tmux

  • @jarisipilainen3875 4 years ago +1

    4:56 Use the same command to make it non-faulty or active sync, lol xD. No need to wait 20 hours.

  • @dan8t669 4 years ago

    Is this process not easier thru the OMV interface?

  • @mohareb12 4 years ago +1

    What OS do you use? It looks cool.

    • @MarcusWeyer 4 years ago

      Looks like Elementary OS elementary.io/

    • @AndersJackson 4 years ago

      @@MarcusWeyer that is technically a distribution. 😜

    • @MarcusWeyer 4 years ago

      Anders Jackson A distribution of an OS, perhaps...?

    • @AndersJackson 4 years ago

      @@MarcusWeyer a distribution based on an OS.
      Linux, the kernel, is the OS. Rest is distribution. 😜

    • @MarcusWeyer 4 years ago

      Anders Jackson So the kernel he is using is Linux... so what OS is he using? I don’t think people refer to Windows 7, 8, 10 as distributions because they use the Windows kernel. The kernel is only a portion of the entire operating system package. Point being, I don’t think there is anything wrong with referring to Ubuntu, Linux Mint, Elementary OS, etc. as operating systems. Is there any point in being so pedantic? Heck, “OS” is in the name of the distribution. 🤷🏼‍♂️

  • @stevenbell9589 4 years ago

    I know they say this setup supports 4 drives at 4TB each, but have you tested, say, 4 drives at 5TB each? Do you think it would work?

    • @AndersJackson 4 years ago

      It would work, but syncing a new disk will take a very long time. All the other disks have to be read and the data written to the new disk, with parity calculations for everything written. That is 5 TB written to that one disk. Do the calculations yourself. 😜
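
The calculation suggested above can be sketched as plain arithmetic: member size divided by sustained write speed. The 100 MB/s used in the example is an assumed throughput for illustration, not a measurement from this setup, and the function name is made up.

```shell
# Lower-bound estimate of rebuild time in hours: every byte of the new
# member has to be written once at the sustained rate.
resync_hours() {    # usage: resync_hours <member_size_tb> <sustained_mb_per_s>
    awk -v tb="$1" -v mbps="$2" 'BEGIN {
        printf "%.1f\n", (tb * 1e12) / (mbps * 1e6) / 3600
    }'
}
```

`resync_hours 5 100` prints 13.9, so a 5 TB member at an assumed 100 MB/s needs about 14 hours at best. Real rebuilds on a busy or USB-attached array are usually slower.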

  • @ierosgr 4 years ago

    I find it strange that the sync procedure happens automatically... so it initializes the disk, formats it and then syncs it? In your video all of that seemed to take no time at all: the state went straight to active sync on all drives instead of showing one rebuilding. (Did you have a time-lapse there?)
    Another method I have seen for adding a drive included:
    sudo mdadm /dev/md0 --add /dev/sd(whatever letter the new drive is)
    sudo mdadm --grow --raid-devices=4 --backup-file=/boot/raid.bac /dev/md0
    sudo resize2fs /dev/md0 (in order for the new drive to have a filesystem)
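
A hedged sketch of the grow sequence quoted above, for extending an array by one member (as opposed to replacing a failed one, which needs only the add step). Device names and the member count in the usage comments are hypothetical, the function names are made up, and an ext2/3/4 filesystem directly on the array is assumed.

```shell
# Add one more member and reshape the array so it becomes an active device.
# Without the --grow step, a device added to a healthy array stays a spare.
grow_with_new_member() {    # usage: grow_with_new_member /dev/md0 /dev/sde1 5
    sudo mdadm "$1" --add "$2"
    sudo mdadm --grow "$1" --raid-devices="$3"
}

# Once the reshape has finished (watch /proc/mdstat), enlarge the existing
# filesystem so it fills the bigger array.
expand_filesystem() {       # usage: expand_filesystem /dev/md0
    sudo resize2fs "$1"
}
```

Note that resize2fs does not put a filesystem on the new drive; it grows the one filesystem that spans the whole array.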

  • @jerrychandler657 4 years ago

    How would you go about recovery of a bad SD card which is a very likely scenario?

    • @macemoneta 4 years ago

      1. Shutdown
      2. Pop out the bad microSD
      3. Pop in the replacement with the dd image you made once your configuration was the way you wanted it.
      Total downtime about 60 seconds. That's why you prepare.
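
The dd image mentioned in step 3 could be made roughly like this. It is a sketch that assumes GNU dd and that the card is read from another machine (or at least while idle); the source device and image name in the usage comment are examples only, and the function name is made up.

```shell
# Copy a whole SD card (or any block device) into an image file.
backup_sd_card() {    # usage: backup_sd_card /dev/mmcblk0 pi-backup.img
    sudo dd if="$1" of="$2" bs=4M conv=fsync status=progress
}
```

Restoring is the same command with source and destination swapped. Double-check the target device first, because dd overwrites it without asking.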

  • @Starky3000 4 years ago

    How long did the rebuild actually take? There was under 1TiB of information on it.

    • @NovaspiritTech 4 years ago

      Not too long, since I only had about 130 GB of data.
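
To see how long a rebuild is expected to take on your own array, the progress line in /proc/mdstat can be pulled out with a small helper. This is a sketch that assumes the usual "recovery = NN.N% ... finish=NNmin" layout of that file; the function name is made up for illustration.

```shell
# Print percent done and estimated time remaining for any array that is
# currently recovering, resyncing or reshaping.
rebuild_progress() {    # usage: rebuild_progress [mdstat file]
    awk '/recovery =|resync =|reshape =/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/)       pct = $i
            if ($i ~ /^finish=/) eta = substr($i, 8)   # text after "finish="
        }
        print pct, eta
    }' "${1:-/proc/mdstat}"
}
```

It prints nothing when no rebuild is running. Wrapping it in `watch` gives a live view similar to the one in the video.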

  • @dan8t669 4 years ago

    I wanna build a NAS using the Rock Pi 4 + penta sata hat + OMV for 5x4TB drives in raid5.
    Any opinions? yay/nay

    • @AndersJackson 4 years ago

      It's going to take a lot of time to build the RAID.
      Apart from that, I can't see any problems.
      Practice with some files as devices first, though. Make a couple of files the same way you would make swap files. These can be used as disks in a RAID.
      MUCH easier than practicing on a real system. 😜
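
Practicing on files, as suggested above, can be sketched with loop devices. The array name /dev/md100, the 200M file size and the disk0.img to disk3.img file names are arbitrary choices for illustration, and the function name is made up.

```shell
# Create four sparse image files, attach each to a free loop device and
# build a throwaway RAID 5 from them.
make_practice_array() {    # usage: make_practice_array [md device] [directory]
    md="${1:-/dev/md100}"
    dir="${2:-.}"
    loops=""
    for i in 0 1 2 3; do
        truncate -s 200M "$dir/disk$i.img"
        loops="$loops $(sudo losetup --find --show "$dir/disk$i.img")"
    done
    # $loops is left unquoted on purpose so each loop device is its own argument
    sudo mdadm --create "$md" --level=5 --raid-devices=4 $loops
}
```

When done, stop the array with `mdadm --stop`, detach the loop devices with `losetup -d` and delete the image files.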

  • @wadud92 4 years ago

    Still out of stock and doesn't look like it will get better any time soon what with the virus :(
    I'll keep on holding on but man do I want this case and hat

  • @user-wl7gp4ei2i 1 year ago

    How on earth did you get the RAID 5 to even form with mdadm in the first place? Whenever I try to do so on Raspbian Lite, after doing an update && upgrade and then "sudo mdadm -Cv -l5 -c64 -n4 -pls /dev/md0 /dev/sd{a,b,c,d}1", it starts rebuilding drive 4 as expected right off the bat, but after a couple of hours there is a failure with the JMicron USB controller: /dev/sda,b,c,d vanish and leave me with /dev/sde,f, and the RAID is lost until I reboot. Then I can unmount, zero everything, repartition and start again. The only RAID levels that seem to be able to sync completely are RAID 0 and RAID 1; obviously RAID 0 skips this process entirely, and RAID 1 means I have to make two separate drive spaces, which will be a pain to scale. I don't see why anyone would be interested in using RAID with 4 drives and not use RAID 5 or 10 at a minimum, because you may as well just use JBOD and not bother with the complexity of a RAID controller. I'm lost. Any ideas on how to get RAID 5 or 10 to build and sync without the JMicron thing falling over and not allowing the RAID to form?

  • @DavidMadeira29 4 years ago

    "I'm still having on/off issues, but I'm very used to pointing the zeroes..." 🗨😘🔋

  • @ewenchan1239 4 years ago

    What happens if you remove the drive BEFORE detaching it?

    • @NovaspiritTech 4 years ago

      That's fine as well... detaching is for hot swap, for when you don't want to turn off your computer. If you turn off the computer, it will auto-detach.

    • @ewenchan1239 4 years ago

      @@NovaspiritTech
      So...once you remove the drive, it will always auto-detach on shutdown?
      Interesting.

  • @David_Quinn_Photography 4 years ago

    Now can we get a video on how to expand our RAID? Say I have 3TB and now need 5TB. Can you do a video on that?

    • @AndersJackson 4 years ago +2

      Just add some more disks to the RAID. 😜
      If you want, you can also put in bigger disks instead of the originals, one at a time. Just let each one sync before you replace the next.
      When all are replaced, you can resize the RAID to the size of the smallest of the new ones.
      Just make a RAID of some virtual disks in a virtual machine, and you can experiment with them.
      Don't use disks that are too large though, as syncing takes time on big disks.
      To crash a disk, just copy /dev/zero onto the disk you want to crash. Do that while the RAID is running, and watch /proc/mdstat as he did in the video while you do this.
      Then remove it as in the video, write zeros on it again, and add it as a new disk to the RAID.
      Also try adding a spare disk to the RAID. It isn't used until one disk crashes. All of this can be seen in the mdstat file.
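
The "replace every disk with a bigger one, then resize" route described above ends with a step like the following. This is a sketch that assumes every member has already been swapped and resynced and that an ext2/3/4 filesystem sits directly on the array; /dev/md0 is a placeholder and the function name is made up.

```shell
# Let each member contribute its full size (limited by the smallest one),
# then grow the filesystem into the new space.
use_full_member_size() {    # usage: use_full_member_size /dev/md0
    sudo mdadm --grow "$1" --size=max
    sudo resize2fs "$1"
}
```

Trying this on a practice array first is cheap insurance, in the spirit of the advice in this reply.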

    • @macemoneta 4 years ago

      @@AndersJackson Yes, being able to play with RAID in a virtual system to gain familiarity with failure modes and recovery is a very underappreciated feature on Linux with md / BTRFS / ZFS RAID.

  • @bc-kelley 4 years ago

    Is your device numbering off, why is there not a number 3??

  • @zxjason 4 years ago

    I tried mdadm software RAID, and the I/O load is too high for the Raspberry Pi and causes failures.

    • @AndersJackson 4 years ago +1

      I have a RAID on USB sticks and it works. There's no way the disks could be too fast for it. It has to be something else.

  • @jarisipilainen3875 4 years ago

    8:53 Fastest recovery ever. That was the same drive.

    • @Marin3r101 4 years ago +1

      Or there was not much data on the array to begin with.

  • @spexpl 4 years ago

    If I had two such devices. What if I could make the space of two RAID5s look like one pool...

  • @Hex-Mas 4 years ago +1

    MDMA

  • @Robber7 4 years ago

    Just a minor thing, your intro should be rendered at MUCH higher bitrate (or rendered from scratch for every video). When you re-encode it in your videos the encoder bit-rate starved artifacts really show. Just those small details that may degrade the "quality" feel of the intro. Nice video tho as usual! :)

    • @NovaspiritTech 4 years ago

      Hahahah, you spotted my lazy editing... it's much easier to just drop in a pre-rendered intro than to make it from scratch each time, and faster than using After Effects every time LOL

    • @Robber7 4 years ago

      @@NovaspiritTech haha, it's all good man. I don't think people really care. I'm just one of those who notice it, I guess :P
      But aren't there ways of setting it up in a project, like a composition, so it's ready to be dropped in and rendered on every video without any extra work? Not sure if it's possible, but I'd be surprised if there wasn't a way :P

  • @jarisipilainen3875 4 years ago

    7:14 Just say you recorded changing the drive before or after. You can't be that fast. Your PC clock shows the same time, lol. No lying: if you do it, do it xD. The clock says the drive was changed at the same time, but the video was sped up, lol. Just admit you did not set up the same thing again. It is the same time. You continued as if you had changed the drive.

  • @drkcodeman 4 years ago

    lol, that is just too much work to add another drive. Windows Storage Spaces for the win.