What to do with a Degraded ZFS Pool

  • Published: Nov 14, 2023
  • Thanks to Vultr for sponsoring today's video. Visit getvultr.com/craft to start your free trial, and receive $250 in credit for your first 30 days!
    This video is going to focus on a ZFS Mirror inside Proxmox, but the commands and steps apply to nearly every type of ZFS array, regardless of the system they're running on. In short, there are four commands you should know...
    zpool status - Display the health and disk info for all ZFS Pools
    zpool scrub [pool_name] - Verify all data blocks can be read in a ZFS Pool
    zpool offline [pool_name] [disk] - Deactivate a disk in a ZFS Pool
    zpool replace [pool_name] [old_disk] [new_disk] - Remove a disk from a Pool, and replace it with another (see the example workflow sketched below)
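    A minimal sketch of how those four commands fit together when one half of a mirror dies (pool name and device paths are placeholders; on a Proxmox boot pool you also need to copy the boot partitions first, as covered in the comments below):
    zpool status rpool                  # pool shows DEGRADED, failed disk shows FAULTED/UNAVAIL
    zpool scrub rpool                   # optional: verify what is still readable
    zpool offline rpool /dev/disk/by-id/nvme-OLD_DISK-part3
    zpool replace rpool /dev/disk/by-id/nvme-OLD_DISK-part3 /dev/disk/by-id/nvme-NEW_DISK
    zpool status rpool                  # watch the resilver run to completion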
    But first... What am I drinking???
    Ex Novo Brewing (Portland, OR) Proper Fence NZ Pilsner (4.8%)
    Links to items below may be affiliate links for which I may be compensated
    Recommended Videos:
    What is RAID? - • What is RAID???
    Proxmox 8.0 Tutorial - • Let's Install Proxmox ...
    Observium Network Monitor - • Keep an eye on your ne...
    Grab yourself a Pint Glass at craftcomputing.store
    Follow me on Mastodon @Craftcomputing@hostux.social
    Support me on Patreon and get access to my exclusive Discord server. Chat with myself and the other hosts on Talking Heads all week long.
    / craftcomputing
  • Science

Comments • 190

  • @CraftComputing • 7 months ago • +3

    Thanks to Vultr for sponsoring today's video. Visit getvultr.com/craft to start your free trial, and receive $250 in credit for your first 30 days!

    • @hpsfresh • 7 months ago • +1

      Proxmox sends emails when a pool is not OK. Just set up a good smarthost with dpkg-reconfigure postfix
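      For reference, a minimal sketch of pointing Postfix at a smarthost (hostname, port and the test address are placeholders; the mail command assumes mailutils is installed):
      dpkg-reconfigure postfix                           # interactive: choose "Satellite system"
      postconf -e 'relayhost = [smtp.example.com]:587'   # or set the relay directly
      systemctl reload postfix
      echo "test" | mail -s "pve mail test" root         # confirm delivery actually works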

  • @TheRowie75 • 7 months ago • +70

    If your first nvme fails, your Proxmox will not boot anymore, because you have to copy the 3 partitions (UEFI) from the first nvme to the new one with "sgdisk /dev/nvme0n1 -R /dev/nvme1n1" ... then renew the GUID with sgdisk -G /dev/nvme1n1 ... then add the 3rd partition's id to the pool and resilver! ;-)

    • @CraftComputing • 7 months ago • +25

      This is correct ^^^ My brain didn't think about the fact this was also a boot pool. Thanks for chiming in :-)

    • @Varengard • 7 months ago • +4

      @@CraftComputing maybe should pin it since your video basically teaches people how to set up their proxmox server to fail to boot if their OG mirrored drives fail one after the other

    • @succubiuseisspin3707 • 7 months ago • +1

      @@Varengard In my case I had to do some additional steps
      sgdisk --replicate=/dev/sdb (new empty disk) /dev/sda (existing disk with data on it)
      sgdisk --randomize-guids /dev/sdb (new disk)
      pve-efiboot-tool format /dev/sdb2 --force (new disk)
      pve-efiboot-tool init /dev/sdb2 (new disk)
      And AFTER that do the ZFS resilver to the zfs partition

    • @eDoc2020 • 7 months ago

      I was looking for a comment along these lines. I specifically noticed he was replacing a single partition with a whole disk.

    • @TheRowie75 • 7 months ago

      @@succubiuseisspin3707
      sgdisk -R = "--replicate"
      sgdisk -G = "--randomize-guids"
      ;-)
      on a UEFI-installed Proxmox you only need to
      proxmox-boot-tool refresh
      Let me know if I forgot something! ;-)
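      Pulling this thread together, a sketch of the boot-disk half of the replacement on a UEFI Proxmox install, roughly following the "ZFS on Linux" section of the Proxmox docs (device names are examples; nvme0n1 is the healthy disk, nvme1n1 the new one, OLD_PART3 is the failed disk's third partition as shown by zpool status):
      sgdisk /dev/nvme0n1 -R /dev/nvme1n1        # copy the partition table from the healthy disk
      sgdisk -G /dev/nvme1n1                     # give the copy new random GUIDs
      proxmox-boot-tool format /dev/nvme1n1p2    # set up the new ESP
      proxmox-boot-tool init /dev/nvme1n1p2      # register it so kernels get synced to it
      zpool replace -f rpool OLD_PART3 /dev/nvme1n1p3   # resilver only the ZFS partition
      proxmox-boot-tool status                   # sanity-check that both ESPs are listed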

  • @Royaleah • 7 months ago • +33

    I would use disk-by-id for the replacement disk. From my experience this makes things simpler in the long run. And if I ever import a pool, I force it to import by id, as it then doesn't care what ports the disks are connected to.
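    A short sketch of doing it that way (the by-id names are placeholders; match them against the disk's serial number):
    ls -l /dev/disk/by-id/ | grep -v part        # find the stable name of the new disk
    zpool replace rpool OLD_DEVICE /dev/disk/by-id/nvme-BRAND_MODEL_SERIALNUMBER
    zpool import -d /dev/disk/by-id rpool        # on import, force by-id names regardless of port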

    • @Darkk6969 • 7 months ago • +2

      I do the same thing too. Earlier I was using the traditional ZFS replace command and it only labeled the drive by its short path rather than disk-by-id. I had to redo the replace command and it worked perfectly without data loss.

    • @CMDRSweeper • 7 months ago • +1

      @@67fabs I use this; it includes the disk's serial number and makes it easy to find the culprit that is causing issues and replace it.

  • @djcmike • 7 months ago • +20

    You forget the partitions for boot and the like. So if your other old drive dies, you've got a problem booting.

  • @lavalamp3773 • 7 months ago • +73

    Pretty crazy that Proxmox gives no warning about a failed drive and degraded array and that you had to go digging to see it.

    • @nathanscarlett4772 • 7 months ago • +7

      Yeah, that's insane. Synology gives statuses on drives all the time

    • @hachikiina • 7 months ago

      to be fair, proxmox isn't a primary nas system but there should be a better implementation for raid health in proxmox. I think you can set up daily checks with email notification for this but it really should be apparent on the main dashboard

    • @marcogenovesi8570 • 7 months ago • +30

      It will send you an email notification if it's configured correctly. And that's the first thing you should do after install, just like with TrueNAS and other stuff

    • @Darkk6969 • 7 months ago

      @@marcogenovesi8570 Yes and no. It will send out a notification once the ZFS pool has failed and finished resilvering. It won't send out a notification while it's in a degraded state, which is crazy. You can change this in the ZFS checks script inside Proxmox. No idea why it wasn't set by default.
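      For anyone hunting for that knob: the ZFS event daemon reads /etc/zfs/zed.d/zed.rc. A sketch of the commonly adjusted settings (values are examples):
      ZED_EMAIL_ADDR="root"            # where zed sends its mail (root is then forwarded by postfix)
      ZED_NOTIFY_INTERVAL_SECS=3600    # rate-limit repeated notifications for the same event
      ZED_NOTIFY_VERBOSE=1             # also mail on completed scrubs/resilvers, not only faults
      # systemctl restart zfs-zed      # apply the changes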

    • @MakersEase • 7 months ago

      @@marcogenovesi8570 Exactly..

  • @alanwparkerap • 7 months ago • +22

    I feel like you forgot a step. In proxmox official documentation you need to first copy the partition scheme with sgdisk to ensure the replacement disk can also be booted from. The config you have there mirrors data but not boot partitions

    • @alanwparkerap • 7 months ago • +3

      Sorry missed comment where this was already addressed.

  • @novellahub • 7 months ago • +5

    Hopefully round 2 will break the YouTube algorithm!

  • @GourmetSaint • 7 months ago • +7

    What about the boot partitions? There's a guide in the Proxmox docs, under ZFS on Linux, which describes the sgdisk copying of the partitions before the zpool replace.

  • @LtGen.Failure • 7 months ago • +1

    You can have surprises regarding the disk size. A 1TB SSD is not always the same size as another 1TB SSD. I ran into this problem when trying to replace a faulty SSD with a new one. The replacement I purchased had a little bit less storage capacity than the faulty one, despite coming from the same manufacturer and being the same model and size. In the end I recreated my ZFS pool and restored the data from my backup.

  • @rett.isawesome • 7 months ago • +1

    wow incredible. I'm seeing this for the first time. I've never seen anything like this. one of a kind.

  • @eDoc2020 • 7 months ago • +3

    This is the type of thing that should have a built-in GUI option. Especially since, as others have said, you did not copy the boot pool or records.

    • @ericneo2 • 6 months ago

      Yeah, there really should. There should be a warning in the overview that then takes you through the steps: first scrub, then replace, and if both fail, restore from backup.

  • @criostasis • 7 months ago • +1

    Good video, reminds me to sync my external backup today!

  • @SystemPromowania • 7 months ago • +1

    More Proxmox content is always nice to watch :)

  • @elmestguzman3038 • 7 months ago • +2

    Came for the x79 content and stayed for the x99!!! Now all about the proxmox!!!

  • @breadworkshop • 7 months ago

    Big fan of Office Buzzword Jeff, reminds me of some of the senior management at my job 😂

  • @prodeous • 7 months ago • +3

    Have to admit, I love the sponsor skit you showed.. did you write it yourself or was it provided to you? The level of buzzwords: perfect.

  • @sonsti8014 • 7 months ago

    I would very much be interested in a video about monitoring solutions! Thanks as always!

  • @1leggeddog • 7 months ago • +1

    This vid is going in my home server playlist 🥰

  • @larslessel4501 • 7 months ago

    Thanks for a great video!

  • @RetiredRhetoricalWarhorse • 7 months ago

    I've had errors on my large pool and I reset the disk twice in many years. It's still going and it's been months without a single error on that disk. Really interesting. I have a replacement new in box ready if I ever do feel the need to replace a disk but so far... nothing. That's over an array of 12 disks. I mean sure, my disks aren't worked very hard but still I'm impressed.

  • @TAP7a • 7 months ago • +1

    Haircut looking sharp

  • @RockNLol2009 • 6 months ago

    In the video you say it correctly, but in the video description the commands are "zfs ..." instead of "zpool ...". Just stumbled over this, while replacing an SSD ;)

  • @ZaPirate • 17 days ago

    thanks for the video

  • @WolfgangDemeter • 7 months ago • +6

    Good video, but didn't you forget to replicate the partition table of the old / known good device (-part3 for rpool), and instead used the whole new NVMe device for rpool? If your now-old NVMe device fails, you have no boot partition to boot from on your newly added NVMe. Or am I completely wrong here?
    These are the steps I usually take to replace a failed Proxmox ZFS rpool disc:
    1) Replace the physical failed/offline drive with /dev/sdc (for example)
    2) Initialize the disk: from the WebUI, Servername -> Disks -> Initialize Disk with GPT (/dev/sdc), OR gdisk /dev/sdc --> o (new empty GPT) --> w (write)
    3) Copy the partition table from /dev/sda (known good device) to /dev/sdc: sgdisk --replicate=/dev/sdc /dev/sda
    4) Ensure the GUIDs are randomized: sgdisk --randomize-guids /dev/sdc
    5) Install GRUB on the new disk: grub-install /dev/sdc
    6) Replace the disk in the ZFS pool: zpool replace rpool /dev/sdc3
       OR zpool attach rpool /dev/sda3 /dev/sdc3 --> sda3 known good device / sdc3 new device
    7) Maybe detach the old disk from the ZFS pool: zpool detach rpool /dev/sdx3
    8) Maybe set up the Proxmox Boot Tool on the new device:
       proxmox-boot-tool status
       proxmox-boot-tool format /dev/sdc2
       proxmox-boot-tool init /dev/sdc2
       proxmox-boot-tool clean

    • @ericneo2 • 6 months ago

      Just curious, is there no need to offline the old disk? or replace the old disk with the new one at step 6?

    • @WolfgangDemeter • 6 months ago • +1

      @@ericneo2 "zpool replace ..." does that implicitly if your replacement disk is in the same physical location (/dev/sdc). If your hardware detects the replaced disk in a different physical location (like /dev/sdz), then you might have to go the "zpool attach / detach" route.
      As far as I recollect, I never had to use "zpool offline / online". I think those commands are more relevant if you use something like Fibre Channel to connect to your storage disks and maybe have to redo some network cabling.

    • @ericneo2 • 6 months ago

      @@WolfgangDemeter That makes a lot of sense, thanks man.

  • @JaredByer • 7 months ago

    Hey, when you built that NAS what did you use to split up the power supply to four molex? A couple of molex splitters? Some Sata to molex adapters? some other type of adapter?

  • @PMEP12 • 7 months ago • +1

    This video came at the perfect time for me, as my boot disk became degraded a few weeks ago. It'd be pretty good if there were a walk-through of migrating from a single disk to a mirror. AFAIK, this isn't a thing, and I can't clone the drive because one specific path causes a disk error whenever I try to do anything with it, so I need to re-install the OS and migrate the VM specs over to the new boot drives.

    • @kennis942 • 7 months ago • +1

      mine became degraded a year ago and then it took me a year to replace one drive. No real read, write or checksum errors, but I got a new one on warranty, so.... hopefully the 2 other drives that now show as degraded are just mistakes/misreadings

  • @emiellr • 7 months ago

    Fresh cut, love it

  • @praecorloth • 7 months ago • +4

    5:03 Re: Monitoring
    I'm partial to Nagios, myself. Nagios Core can be a pretty big hurdle for people new to the homelab scene. However, Nagios XI provides you a free 7 node, 100 service check license. That gets you access to the core configuration manager, dashboards, performance data, all kinds of good stuff.
    Now, for the purposes of this story, as well as full disclosure, I work for Nagios Enterprises. When I started there, I was like, "This is cool, but I'll probably keep using Nagios Core, because I'm a nerd like that." My brother in Flying Spaghetti Monster, that sentiment lasted all of like two weeks. Then I was like, "Can I have an XI license to use for my home environment?" :D

  • @WyvernDotRed • 7 months ago

    10:20 with BTRFS, it may be smaller than the rest of the drives.
    And the parity of a RAID configuration can be rebuilt on the leftover disks, if they have enough free space.
    But BTRFS has limits when scaling to larger servers, as the performance-optimised RAID5/6, and to a degree RAID10, should be avoided.
    I'm using RAID1 with RAID1C3 for the metadata in my server, but haven't gotten around to making this array useful yet.
    This RAID only guarantees the number of copies, and it spreads the writes more randomly, since it's filesystem-level RAID.

  • @Sunlight91 • 7 months ago

    12:30 You do a cost-benefit analysis for beer? Did you create a spreadsheet?

  • @fierce134 • 7 months ago • +2

    Also, nice haircut Jeff

  • @ajpenninga • 7 months ago • +7

    It's always good to know some of the tools under the hood. Maybe a full featured video of ZFS commands on Ubuntu or similar?

  • @adamchandler9260 • 7 months ago

    When do we get an updated video on the new studio?

  • @bobruddy • 7 months ago • +1

    don't you have to worry about the uefi partition? or is that on the zfs volume?

  • @msokolovskii • 7 months ago

    Hi Jeff! Can you recommend any reliable enterprise SSD for home server application for storing all my VMs which are controlled and accessed by proxmox? Preferably these disks should be available in local stores, not only as used parts.

  • @SideSweep-jr1no • 7 months ago

    Damn I love that Ad intro 😄!
    [edit] You look different... Maybe even... Do I dare to say it... handsome? Do you have more hair? Or did you change hairstyle? WTF happened?!!

  • @TevisC • 7 months ago

    UnRaid supports ZFS natively now. Drive health is very easily monitored.

  • @daves4081 • 7 months ago

    ZFS is awesome

  • @ierosgr • 7 months ago • +3

    You should have mentioned that if the drive being replaced is part of a boot pool and the installation is in UEFI mode, then you need to recreate the partition tables on the new disk yourself so they exactly match the working disk. The process is not as straightforward as it seems in the video.

  • @marcogenovesi8570 • 7 months ago

    Imho this video is a bit rough, a bit rushed. There's no mention of setting up email notifications for events and ZFS (very important to get notified of failures), and no mention of the fact that this is a root pool, so you have to follow a different procedure to create the boot partition and update the config so Proxmox re-generates the boot files there too.

  • @UntouchedWagons • 7 months ago • +1

    Just a note that the new drive does not have a bootloader.

  • @inputoutput-hd7jl • 7 months ago • +2

    you should really still be using disk uuid's when replacing the degraded disk

  • @tollertup • 7 months ago

    I just had my SSD mirror disks fail, both Crucial BX500 240gb disks. exactly one month apart. Luckily I adopted the habit of buying random 120/240/500gb ssds whenever I buy anything as they are sooo cheap. I replaced the 2 disks with Crucial MX500 240gb disks, they are actually larger! only by 10gb.

  • @ewenchan1239 • 7 months ago • +1

    Two things:
    1) ZFS on root is good on Proxmox IF you do NOT intend on passing a GPU through, and/or if you are willing to deal with the additional complexities that ZFS root presents when you're updating the GRUB boot command line options (which is required for GPU passthrough).
    2) Replacing the disk by its /dev/# path rather than its /dev/disk/by-id/# path is NOT the preferred way to do it, because on a system with multiple NVMe controllers the way Debian/Proxmox enumerates those controllers and slots CAN change; replacing the disk by its by-id path is the preferred way to run the zpool replace command, since it then no longer matters WHERE you put the disk (i.e. on a different NVMe controller), so long as it is present in the system.

  • @pkt1213 • 7 months ago

    Looks like you got a haircut. For this episode we'll call you Fresh.

  • @fasti8993 • 1 month ago

    You are saying that keeping tabs on system health is paramount, which is absolutely true. But what are the best monitoring systems?

  • @IvanToman • 7 months ago • +3

    I don't think that using /dev/nvme0n1 instead of /dev/disk/by-id/... is smart idea :)

    • @marcogenovesi8570 • 7 months ago • +1

      it's fine, zfs locks on the drive signature anyway when starting the array. It's not a normal linux filesystem. The only downside is that it is less readable

  • @jeffhex • 6 months ago

    Is there a way to do all that from the GUI like in TrueNAS?

  • @Grid21 • 7 months ago

    Also, your link to the Observium tutorial link is broken and doesn't seem to go anywhere.

  • @ScottGagon • 7 months ago

    Synergistic-ally Action Cloud Deliverables!

  • @Richard25000 • 7 months ago

    I had a pool randomly decide to corrupt, no drive failures..
    It made FreeBSD Kernel Panic anytime the pool was attempted to be mounted. (TrueNAS Core)
    I thought maybe I'll boot linux and try mount it (TrueNAS Scale)
    Nope managed to kernel panic linux too...
    Thankfully, I replicated to another nas, and recreated and restored the pool..
    Painful, but it was a known restore timescale rather than trying to fix a broken pool..

  • @fxandbabygirllvvs • 7 months ago

    Hey man, not to pester you, but what's up with the shop remodel? Any update? I would love to see more on it.

  • @gregliming6875 • 7 months ago

    Wow... I could've used this video 3 weeks ago.

  • @RambozoClown • 7 months ago

    While software RAID is the future, I really did like my old Compaq hardware RAID servers. See a drive with a yellow light, pull it and replace it with a new one, job done. The controller does everything in the background, no need to even log in. The green light starts blinking as it rebuilds, when it's solid, it's done. I could even run redundant hot swap controllers in the cluster boxes.

    • @marcogenovesi8570 • 7 months ago • +1

      ZFS has an autoreplace option for each pool/array (off by default) to do that. If it detects a drive was pulled and a new drive is inserted in its place, it will format it and start the array rebuild.
      It also has the ability to control the enclosure LEDs (if the SAS HBA and the cable support the feature and the Linux system sees the LEDs) through the zfs-zed daemon (which is also responsible for sending email on disk failures).
      So in theory it can do that too.
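      A sketch of enabling that behaviour (pool name is an example; zfs-zed has to be running to act on the hotplug events):
      zpool set autoreplace=on tank     # auto-rebuild onto a disk inserted in the same slot
      zpool get autoreplace tank        # confirm the property took
      systemctl status zfs-zed          # the daemon that handles the replace/LED/email events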

    • @RambozoClown • 7 months ago

      @@marcogenovesi8570 Thanks, I wasn't aware of that. Something I will want to look into for sure.

  • @berndeckenfels • 7 months ago

    If you replace with the short disk name is it still finding the disk by id later on when device names change?

    • @marcogenovesi8570 • 7 months ago

      Yes, ZFS looks at the filesystem signatures on boot; it does not store a path the way normal Linux does in /etc/fstab.

    • @berndeckenfels • 7 months ago

      @@marcogenovesi8570 it’s just weird that the short disk name was mixed with the long in Jeff’s status, is that changing after reboot?

    • @marcogenovesi8570 • 7 months ago

      @@berndeckenfels it can change name if the disk is no longer at the same /dev/sdX path.

  • @alexanderlavoie5461 • 7 months ago

    I have a 2200ge 4c4t on one of my Prox machines. I also have a 2400ge that I can swap into it. Would Prox get really upset if all of a sudden if has a new CPU running better show?

    • @alexanderlavoie5461 • 7 months ago

      The show***

    • @marcogenovesi8570 • 7 months ago

      Proxmox (and linux OS in general) does not care. You can power off, change the CPU and power on and it will be fine.

  • @cashy2011 • 7 months ago

    Just checked on my servers thinking all is fine....well :D

  • @gsrcrxsi • 7 months ago

    The scrub command does not accept the -v option.
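    True: the scrub runs in the background, and progress is read from status instead. A sketch (pool name is an example):
    zpool scrub tank                 # kicks off the scrub and returns immediately
    zpool status tank                # shows "scrub in progress", percent done and an ETA
    watch -n 60 zpool status tank    # poll it; Ctrl+C stops the watch, not the scrub
    zpool scrub -s tank              # -s cancels a running scrub if needed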

  • @frankwong9486 • 7 months ago

    Software RAID hardware requirements: technically they do exist.
    You need a SATA controller or PCIe lanes, depending on the drive type.
    PCIe lanes are not cheap 😢

  • @Grid21 • 7 months ago

    I have a question: can you cover how to do this on TrueNAS? Because I am unsure if I'd be able to find the faulty disk should that ever come up for me.

    • @CraftComputing • 7 months ago

      For a TrueNAS pool, it's the exact same setup. zpool status, zpool scrub, zpool offline, zpool replace.

    • @darkain • 7 months ago • +1

      TrueNAS has this in the UI already. Just replace the physical disk, go into the UI under System > Boot > Status, and everything can be managed right there. It's all point-n-click :)

    • @LA-MJ • 7 months ago

      Honestly it works better in TrueNAS GUI. No stress, You will do just fine, if the time comes.

    • @marcogenovesi8570 • 7 months ago

      @@LA-MJ *when

  • @icmann4296 • 3 months ago

    How do I expand my z2 pool by one drive?

  • @LaserFur • 7 months ago • +1

    A decade ago I had an NTFS RAID [1] really mess up everything. One drive dropped out, and then months later when I rebooted, the drive came back online. The OS took alternating old and new data and everything was a mess. Worse yet, I did not know what was wrong and tried to repair it. If I had pulled out one of the drives I would have gotten either the old or the new data, but instead I lost a month's worth of work, as it had been a month since the last full backup.

    • @marcogenovesi8570 • 7 months ago

      that's more like a raid1. A raid0 stops working if you pull a drive

    • @LaserFur • 7 months ago

      sorry. yes raid 1. @@marcogenovesi8570

    • @dudaskank • 7 months ago

      But why was the problematic drive still connected? Or did you not notice the failure?

    • @LaserFur • 7 months ago • +1

      @@dudaskank I didn't notice, since when I looked both drives were working. There was no way for the hardware RAID to realize the two drives were out of sync.

  • @sourcilavise3788 • 7 months ago

    Just my two cents, but wouldn't it be better to install Proxmox on a single NVMe, back up that NVMe, and make a datastore of SATA SSDs/HDDs to store the virtual machines and data (with a RAID-Z1, for example)? I don't remember the previous videos, but maybe you need the read/write speed of those NVMe drives?

    • @marcogenovesi8570 • 7 months ago • +1

      nvme is much better than sata ssd/hdd for VMs and using them as system drives does not hurt. Not sure what logic you follow to waste a nvme for system, use a single drive, and then put the VMs on slower drives.

  • @hardleecure • 4 months ago

    zfs on windows any good yet or still a dumpster fire?

  • @SEOng-gs7lj • 7 months ago

    I have checksum errors in my vm image on a zfs mirror... any idea how to find the cause and reset it to 0?

    • @marcogenovesi8570 • 7 months ago

      the cause may be anything from a loose cable to a dying disk, so check disk health with smartctl -a /dev/sdX (X= whatever a b c d is the drive).
      Then you can clear errors with
      zpool clear poolname

    • @SEOng-gs7lj • 7 months ago

      @@marcogenovesi8570 they are both nvme so no cables to speak of, and both pass smartctl with 1% and 2% used respectively. I can clear the errors, but they come back right after the scrub completes

    • @testingtime7780 • 5 months ago

      @@SEOng-gs7lj Forget ZFS. I had the same thing; I changed cables etc. It could be RAM, it could be your power delivery; anything that misbehaves even a tiny bit can break ZFS. I moved away, too much stress with that crap.

  • @yoghurrt1 • 7 months ago • +1

    It's tiem to leave a comment

    • @CraftComputing • 7 months ago • +2

      For the algorithm!

    • @yoghurrt1 • 7 months ago

      @@CraftComputing for the algorithm! *raises sword*

  • @dustojnikhummer • 7 months ago

    Sadly my server only has a single M.2 slot. But this is also useful for TrueNAS, hopefully.

    • @CraftComputing • 7 months ago • +1

      Yep! Same steps for TrueNAS, or any other ZFS configuration.

    • @Ian_Carolan • 7 months ago

      Create zfs root filesystem with copies=2 property set on your single nvme drive. This will provide some extra proofing against bit rot, but note this is redundancy not a backup.

    • @dustojnikhummer • 7 months ago

      @@Ian_Carolan Will that halve usable space?

    • @Ian_Carolan • 7 months ago

      @@dustojnikhummer Yes, each file and meta data are stored twice on the disk, with the copy being used to recover from bit rot or corruption if detected. Copies can be greater than 2. This is like creating a mirror when you only have one disk. The copies= property must be set at create/install time before the system is installed to the root file system as only newly created files are copied twice or more to disk after the property is set.
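      For illustration, a sketch of the property in use (dataset names are examples; copies only affects blocks written after it is set, which is why it needs to be set at install/creation time to cover everything):
      zfs create -o copies=2 rpool/data/important   # every block of this dataset is stored twice
      zfs get copies rpool/data/important           # verify the setting
      zfs set copies=2 rpool/data/photos            # on an existing dataset, only future writes get duplicated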

  • @AndehX • 5 months ago

    *Second disk fails*
    *Replace it following this guide*
    *wipe forehead, crisis averted*
    *First disk fails*
    *System is unable to boot because vital boot partitions are weirdly not mirrored during the resilvering process*
    *Facepalms*

  • @killingtimeitself • 7 months ago

    "sometimes the first step to solving a problem, is creating a problem" Everyone who has been burnt by not testing things properly.

    • @marcogenovesi8570 • 7 months ago

      why have a test environment when you can test in production

    • @killingtimeitself • 7 months ago

      for home gamers its pretty much the case, unless you put MORE effort into it because why not@@marcogenovesi8570

  • @unijabnx2000 • 7 months ago

    if you drop to a shell... and run mutt/mail you should have gotten those degraded alerts emailed to root user.

  • @ofacesig • 7 months ago

    I currently run an NFS share on my TrueNAS and feed 10Gbps to my Proxmox box for VM storage. I don't like what I'm seeing on how Proxmox reports this stuff.

  • @SikSlayer • 7 months ago • +1

    Repost?

    • @CraftComputing • 7 months ago • +2

      Yeah. Major edit error made the first cut.

    • @fierce134 • 7 months ago

      Hope you don't get a video made about your error!

    • @CraftComputing • 7 months ago • +1

      To whom it may concern: *

  • @loginof-docs • 7 months ago

    zpool scrub -v pool_name
    invalid option 'v'

  • @CS-yg4vt • 7 months ago

    I respect Proxmox but would prefer just a standalone TrueNAS Core. I guess I'm a newb?

  • @MarcoZ1ITA1 • 7 months ago • +1

    Do NOT use the simple /dev/ path when replacing a disk. there's no guarantee Linux will always use the same number for the same drive. There's a reason why rpool is set up with disk by id.

    • @pierreb4319 • 7 months ago

      and yet ZFS doesn't really care but it's prettier

  • @lifefromscratch2818 • 7 months ago

    That's horrifying that it doesn't send you an email like when a backup fails and show some blinking lights somewhere on the dashboard.

    • @marcogenovesi8570 • 7 months ago

      yes it will send emails to admins, it can also send emails for backups and other events

    • @Darkk6969 • 7 months ago

      @@marcogenovesi8570 Except it won't send out e-mail when ZFS is in degraded state. It will when the ZFS pool dies which is already too late. I've tested this scenario and it won't send out an e-mail. I get e-mails about everything else except that which is crazy.

    • @marcogenovesi8570 • 7 months ago

      @@Darkk6969 ZFS sends emails independently through the zfs-zed daemon; see the Proxmox docs for "ZFS_on_Linux" and the zfs-zed documentation in general.
      Since I like to have email notifications from that and other base Linux services that by default send email to "root" (for example the smartctl daemon), I have set an alias in
      /etc/postfix/generic
      to remap all mail sent to "rootATlocaldomain", "rootATmydomain.lan" (my local domain) and "rootATpve1.mydomain.lan" (the full host name of this host, as specified in /etc/hosts)
      to the address that other notifications are also sent to.
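      A sketch of that remap with placeholder addresses (smtp_generic_maps rewrites local addresses on outgoing mail):
      # /etc/postfix/generic
      root@pve1.mydomain.lan    admin@example.com
      root@mydomain.lan         admin@example.com
      # /etc/postfix/main.cf
      smtp_generic_maps = hash:/etc/postfix/generic
      # rebuild the lookup table and reload
      postmap /etc/postfix/generic
      systemctl reload postfix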

    • @marcogenovesi8570 • 7 months ago

      @@Darkk6969 another way is explained in TechnoTim's video about proxmox email notifications, it's very in-depth and is a recommended watch imho. It has a section about zfs notifications too.

  • @hicknopunk • 7 months ago

    How you got zfs working in cpm is beyond me 😅

  • @computersales • 7 months ago

    ZFS may be better, but replacing a drive on a RAID controller is so much easier. I was stumped when I had my first ever drive failure, and all the instructions said to pull the drive and insert a new one. Didn't realize it was that easy.

    • @horst.zimmermann • 7 months ago • +3

      unless your RAID controller dies and you don't have the same model handy; then you are f...... Ask me how I know 😂

    • @marcogenovesi8570 • 7 months ago • +2

      depends on the RAID controller, as that's just an option that can be on or off depending on the sysadmin's choice. ZFS can do that too if you enable the autoreplace option for that pool.

    • @computersales • 7 months ago

      @@marcogenovesi8570 I'm surprised that feature isn't touted more often. It would be cool if I could just swap a dead drive in my zfs arrays without doing anything.

  • @Vatharian • 7 months ago

    zpool-scrub doesn't have the -v argument...?

    • @marcogenovesi8570 • 7 months ago

      no, it is a command that starts a background process. Scrub on a large HDD array can take days
      If you want to follow the progress you can use this command
      watch zpool status poolname
      And exit the watch process with ctrl+c (the scrub continues unaffected)

  • @justinnamilee • 7 months ago

    $12 USD for 4 beers... tall boys by the look of it... For Canadians, that's not bad. Downright approachable. xD

    • @CraftComputing • 7 months ago

      Yeah, but do you spend $12 for 4x Pilsners at 4.8% that taste like every other Pilsner, or do you spend $12 on 4x genuinely unique 8.4% Double IPAs?

  • @strandvaskeren • 7 months ago

    I feel all these "let's simulate a disk failure" videos send a dangerous message. If you have a pool of healthy disks and simulate one or more of them failing, the rebuild process works great. If you have a pool of old and thus crappy disks and one of them fails for real, the process of replacing and rebuilding onto a new disk will very likely kill one or more of your remaining, equally old and crappy disks, leaving you high and dry.

    • @welshalan • 7 months ago

      It's ok. You take backups for this scenario, obeying the 3,2,1 rule. 😉

    • @marcogenovesi8570 • 7 months ago • +1

      yeah when an old disk fails you should just throw the whole array away and buy new drives. That's the only logical solution

    • @CraftComputing • 7 months ago

      ....are you advocating for no redundancy, only backups? I really don't understand... "If a drive fails, the rest may fail while rebuilding, so why bother showing how to rebuild". Did I get that right?

    • @welshalan • 7 months ago

      @@CraftComputing is that directed at me? I can't tell on mobile. I'm definitely not advocating no redundancy. I was only pointing out that redundancy is not the only tool in our kit. Local redundancy, geo redundancy. Whatever's required based on the importance of the data being stored. (Hopefully not just the budget, but we do work in tech, it happens). Definitely a great case for having arrays of disks bought at different times, and disks should be replaced to a schedule. Looking at backblaze's data though, disk failures aren't consecutive. It's somewhat random but does increase with age. Redundancy and backups all the way. If our important data doesn't exist in multiple copies, it doesn't really exist.

    • @welshalan • 7 months ago

      @@CraftComputing one thing I was wondering about though with proxmox. Does it do a regular automatic parity check / patrol read across the zfs array, to check for errors on a regular basis, out of the box?

  • @klobiforpresident2254 • 7 months ago

    I stopped watching your channel a few years ago, I'm not sure why. Now that I need to store several terabytes of data that I don't want to sacrifice my D:/ drive for I guess I'm back. The more things change, the more they stay the same.

  • @ychto • 7 months ago • +2

    No you're a ZFS pool

    • @CraftComputing • 7 months ago

      I identify as a 4000 ADA SFF

    • @ychto • 7 months ago • +1

      So you get killer framerates in Solitaire?@@CraftComputing

  • @lawrencerubanka7087 • 1 month ago

    I love Proxmox, just disappointed that their UI is rubbish. Having to copy and paste the drive ID into a command line to specify a replacement drive is not enterprise grade nothing. You get what you pay for, eh?

  • @5Breaker • 7 months ago

    I don't like ZFS on Proxmox in particular. Every install it keeps about 50% of the drive for the system and the other half for the pool. Since I don't need 500G for Proxmox itself, I either remove the ZFS pool entirely or extend it, which is definitely not easy. I'd rather go through a Debian netinstall, get a clean LVM install, and then install Proxmox over it, instead of dealing with ZFS from the ISO.
    ZFS on a dedicated NAS is a completely different thing though, so don't get me wrong. I like ZFS, but not in combination with Proxmox.

    • @marcogenovesi8570 • 7 months ago • +2

      afaik it creates a separate pool only if you are using LVM.
      When I install with zfs it just creates a single pool with all the space.

  • @yt2100 • 7 months ago

    I have been watching your channel for a long time, and you have been very helpful on many topics, but this video is very misleading as you skipped a very critical step of replicating the partition setup so you could also mirror the boot partition. You acknowledge this in another comment, but it's not sticky and you have not made any attempt to correct this video. But the Vultr sponsorship is stuck at the top... I'm all for making money, but you should correct your video immediately as you are impacting people who may be using this advice and breaking their systems. The viewers are just as important as sponsors.

  • @Liqweed1337 • 7 months ago

    Don't be too clingy about your digital data.
    I lost my whole digital existence (my drives) twice in my lifetime.

    • @marcogenovesi8570 • 7 months ago

      that's easy for you to say when you have nothing left to lose

    • @Liqweed1337 • 7 months ago

      you will realize how little the things you can lose are actually worth.

    • @marcogenovesi8570 • 7 months ago

      @@Liqweed1337 that's cope

  • @kspau13 • 7 months ago

    zfs is not the best choice for home labs or small business.

    • @FunctionGermany • 4 months ago

      then what is? BTRFS? doesn't have stable parity redundancy.

  • @YoutubeHandlesSuckBalls • 7 months ago

    I've been casually looking for a while to try to find a disk format/system that will take care of a bunch of drives, allow you to add or remove a drive at will, of various sizes, and report any errors as they occur. It should try to maximise throughput (read and write) while having redundancy, so long as there are at least 2 drives. Ideally with a web interface.
    I know, moon on a stick.

    • @marcogenovesi8570 • 7 months ago • +1

      probably btrfs-based NAS system like Rockstor running RAID1. Btrfs "raid1" works at the data level so as long as you have enough redundancy it's ok if it's different drives. For example you can do a RAID1 with 2x 512GB drives and 1x 1TB drives. Data will be split equally so it's either on one 512GB drive and the 1TB or on the other 512GB drive and the 1TB one.
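      A minimal sketch of that layout (device names are examples; btrfs "raid1" means two copies of every chunk spread across any of the member drives, not a fixed two-disk mirror):
      mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd   # mixed sizes are fine
      mount /dev/sdb /mnt/pool
      btrfs filesystem usage /mnt/pool                          # shows how much usable space the mix yields
      btrfs device add /dev/sde /mnt/pool && btrfs balance start /mnt/pool   # grow the pool later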

    • @YoutubeHandlesSuckBalls • 7 months ago

      @@marcogenovesi8570 Sounds good. I'm increasingly noticing that the better filesystems require a linux based OS. Might have to bite the bullet on that one.

  • @billclintonisrapist870 • 7 months ago

    I use Zabbix to monitor Proxmox via "Linux agent" and Proxmox API template. Would make for a good video. Make sure to build your own Zabbix and not use the evaluation VM image

  • @feelsbad1193 • 7 months ago

    Please more ZFS content!!! Maybe some ZFS on root also? Not BSD but Linux ZFS on root? I would love to see some of that.

  • @orangeActiondotcom • 7 months ago

    correct answer: run screaming away from zfs and never look back

    • @testingtime7780 • 5 months ago

      True, I was on ZFS once and faced a lot of issues like checksum errors; it was the worst experience I ever had. The pool degraded rapidly (HDDs had no issues), and in the end, I had to shut down the server and redo everything. The only positive aspect was the performance-read/write speeds were around 700 MB/s, which was close to my 10 Gbit max speed. However, it caused too many headaches. kkthxbb ZFS