The bht requires mailx to be configured and kornsh. It would not let me execute regardless if I specify an email or not. I think the badblocks version would be:
badblocks -b 32768 -c 512 -wsv /dev/ -o error_output.txt
For multiple disks, I use tmux, but it is not as convenient as the bht.
For those who are using USB 3.0 disk, to get the smartctl to work, you would need to specify the device as SAT to get the smart info; otherwise, it is not going to work.
e.g. smartctl -A -d sat /dev/
smartctl -t short -d sat /dev/
smartctl -t long -d sat /dev/
Going to check this out for my server build. What would you consider to be the cutoff number of bad blocks or reallocated sectors for a drive to be considered no good?
For something important or production use, anything > 0 is pulled out. For casual use, testing, temporary storage, something in the single digits doesn't bother me.
@Art of Server Are you able to explain in a bit more detail how to install these 2 programs? I just installed Linux mint 20 to test 7 14 TB drives and am not able to install these programs as they only have source code for download. Thank you
Didn't know this tool. Last week I created a script that uses smartmontools to list every drive and get its serial, then the script checks if the serial is in the exclude list (you insert your OS's HD serial there) and then runs badblocks asynchronously on every HDD plugged in, using "&" at the end of the command. My script is better in a scenario where you replace multiple drives every time and don't want to insert every disk path so it can be tested.
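For anyone curious, the idea described above (list the drives, skip the OS disk by serial, start one job per remaining drive) could be sketched roughly like this. It is not the commenter's script: `pick_targets`, `print_plan` and `EXCLUDE_SERIALS` are made-up names, `lsblk` is used here instead of smartmontools to get the serials, and the sketch only prints the commands instead of running them, since badblocks in write mode wipes the drive.

```shell
#!/bin/sh
# Hypothetical sketch: print a badblocks command for every disk whose serial
# is not in the exclude list. Nothing is executed against the drives.

# Space-separated serials to skip (placeholder value; put the OS disk here).
EXCLUDE_SERIALS="OS_DISK_SERIAL"

# Reads "NAME SERIAL" pairs on stdin and prints the names that should be tested.
pick_targets() {
    while read -r name serial; do
        [ -n "$serial" ] || continue          # no serial reported: skip it
        case " $EXCLUDE_SERIALS " in
            *" $serial "*) continue ;;        # excluded (e.g. the OS disk)
        esac
        echo "$name"
    done
}

# Turns the selected names into one command line per drive.
# To really start the tests you would run each command with "&" appended
# and then "wait" for all of them; here they are only printed.
print_plan() {
    pick_targets | while read -r name; do
        echo "badblocks -b 32768 -c 512 -wsv -o badblocks_$name.log /dev/$name"
    done
}
```

Usage would be something like `lsblk -dno NAME,SERIAL | print_plan`, and then reading the printed list carefully before running anything from it.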
Excellent video! I am looking at setting up a home server and would like to use this utility to test all the hard drives going into the server ahead of time. I'm not very familiar with Linux command line so I was wondering if you would be able to walk me through the setup in further detail. I have already done the git clone command but am unable to run the utility. Any help would be greatly appreciated.
Hi! thanks for watching. So, the instructions on how to use this tool are in the README file on GitHub. Check there first, and see if that makes sense to you. Let me know if you have questions after reading the README file.
@@ArtofServer Yes, I did go through and make sure I had all requirements installed before attempting to run. I did not reboot the system after installing everything, not sure if that is necessary at all. I am currently running badblocks on a single hard drive so will not be able to reboot until that is complete. I am running a clean install of Ubuntu 18.04.
@@dereksumption6018 well, the error message basically means you don't have ksh i think. you'll need to figure out why ksh is not there even though you installed it.
I usually do "ddrescue --force /dev/zero /dev/sdX" to erase the disk. It even omits bad sectors and has a nice output. Nice midnight admin work, too. :)
I think this tool would really help me if I was smart enough to get it to work. Don't know if anyone will find this comment but one of the external utilities that bht checks for before it runs, pvs, is not available/not installed on my system (Ubuntu 18.04 Desktop). I cannot install it on its own nor can I find any straight answers online. Is it because I didn't enable LVM during installation? Is it part of something else? Do I need a server distro? Any help would be welcome. Thanks
The pvs command is part of the lvm2 package. You need to install that package to have that available. I'm not as familiar with ubuntu, but I believe something like this command will do the trick: sudo apt-get install lvm2 Hope that helps.. :-)
No, I don't think I would recommend this type of testing for SSDs as you would end up reducing the endurance of the SSDs. When I can find the time, I was thinking of adding an SSD test mode by using blkdiscard... When I find time! 😂
Following up on this, I know this is 5 years old. As best I can tell, this script will not work on any system that does not have the ksh93 (Korn) shell. All the other prereqs are fine, as smartctl, badblocks, sha256sum, and lsscsi are common tools, even on TrueNAS Scale. Except ksh93. If anyone has a newer method they can share it would be appreciated. I'm doing a dry run, building a portable 6 drive NAS in prep for a 24 drive monster next year. Installed TrueNAS, enabled SSH, and found this dependency is not there. So either I'll need to build a flash drive with a boot environment that has ksh93 or find another solution.
@@ArtofServer got the following error:
ATTN: this hard drive testing process will wipe all data
ATTN: on the following hard drives:
ATTN: /dev/sdb
ATTN: ARE YOU SURE YOU WANT TO PROCEED?: y
INFO: collecting SMART data from each drive.
ERR: could not identify model and/or serial number for /dev/sdb. Abort.
Edit: I found it. I don't use Linux as much as I should and am not super familiar with the terminal. It was a dependency issue. Specifically ksh, which is listed as ksh93 on the requirements list, but ksh93 is not the name of the package. ... I really should have figured that out faster... I'm disappointed with myself now...
ok I have messed something up... I created an install on a USB drive (NOT a live disc) and I am currently using this to test some WD Easy Stores, however I needed to test another drive right away and that test has another 3 days left on it. So, I tried to make another installation to run on another PC to test the drive. When I try to run it on the new installation I keep getting "sudo: bht: command not found" or "sudo: unable to execute ./bht: No such file or directory". I have checked that it is executable, which is all Google keeps telling me, but I can't seem to get it to run. It is probably something stupid that I am forgetting but it's killing me. I know this isn't really the place but any help anyone can offer would be much appreciated. I vaguely remember an issue like this with a simple fix when I set up the first one, but can't remember it and I am hoping this will help others with this problem.
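For others hitting the same "unable to execute ./bht: No such file or directory" message even though the file is there and executable: that usually points at the interpreter named on the script's first line being missing, which matches the ksh fix in the edit above. A rough way to check it, assuming the script starts with a `#!` line (`shebang_ok` is just a name made up for this sketch, not part of bht):

```shell
#!/bin/sh
# Sketch: check whether the interpreter named on a script's "#!" line exists.
shebang_ok() {
    first=
    # Read only the first line; tolerate a missing trailing newline.
    IFS= read -r first < "$1" || [ -n "$first" ] || return 2
    case "$first" in
        '#!'*) ;;
        *) echo "no #! line in $1"; return 2 ;;
    esac
    interp=${first#??}        # drop the leading "#!"
    interp=${interp# }        # drop one optional space after it
    interp=${interp%% *}      # keep the first word (the interpreter path)
    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing interpreter: $interp"
        return 1
    fi
}
```

Running `shebang_ok ./bht` prints the interpreter path and whether it exists. If the first line goes through `/usr/bin/env`, this only confirms that `env` exists, so `command -v ksh` is still worth checking separately.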
I'm curious, what OS are you booted into while using the BHT script? I'm trying to get it working on Ubuntu and starting to feel like that might be my whole problem. KSH93 does not seem to be available for Ubuntu. Is it CentOS you're using, or something else?
Hey man, thanks for the video. Running Debian 10, but can't seem to install pvs, and the script won't run without it. I've done a ton of Googling and can't find a package called pvs or that has pvs included. Any thoughts? Thanks again.
I would like to thank you so much for the time and effort you spent in making this script and sharing it in a video for beginners like me. But would you please write the prerequisites for this script in the description? For example, you need to install first: ksh, pvs (found in package lvm2), smartctl (found in smartmontools), lsscsi. And add sysstat to run iostat 🌹💕🌹
@@ArtofServer pvs is not. Ksh93 is misleading for beginners and that’s what happened to me. In the end I realised it is ksh package. I don’t mean to belittle your efforts, but rather to spread the good work you benevolently gave away to us newbies. God bless you and have a blessed day.
@@MichealG I'm open to suggestions for improvements, so thanks for that. I'll look into adding pvs/lvm requirements. I may add some clarification on the ksh93 issue. I meant for that to distinguish from ksh88 and pdksh, which may still be available on some unix systems. Thanks for the suggestions.
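Pulling together the prerequisites mentioned in this thread, a small check like the one below may save some Googling. It is only a sketch: `missing_tools` and `package_for` are names invented here, the package names are the Debian/Ubuntu ones reported by commenters (plus e2fsprogs, which is where badblocks lives), and the README remains the authoritative list.

```shell
#!/bin/sh
# Sketch: report which of the tools named in this thread are missing.

# Prints every argument that is not found on PATH, one per line.
# Note: pvs and smartctl usually live in /usr/sbin, so run this as root
# (or with /usr/sbin on PATH) to avoid false alarms.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Maps a tool name to the package that provides it.
package_for() {
    case "$1" in
        ksh)       echo ksh ;;            # listed as "ksh93" in the requirements
        pvs)       echo lvm2 ;;
        smartctl)  echo smartmontools ;;
        lsscsi)    echo lsscsi ;;
        iostat)    echo sysstat ;;
        badblocks) echo e2fsprogs ;;
        *)         echo "$1" ;;           # unknown: assume package name == tool name
    esac
}

# Prints an install hint for each missing tool.
report_missing() {
    for tool in $(missing_tools ksh pvs smartctl lsscsi iostat badblocks); do
        echo "missing: $tool (try: sudo apt-get install $(package_for "$tool"))"
    done
}
```

Calling `report_missing` prints nothing when everything is already installed.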
I didn't. The default block size of the script is 32k. It was not chosen to match the native sector size of the drive, but to optimize for testing throughput as described in the video.
Thank you very much for publishing this script... I've used it to test a few disks already. My first batch of tests was 100% ok but now I'm testing two more disks and I got this result after two days: badblocks[Testing with pattern 0xff Interrupted at block 217784832] badblocks[Testing with pattern 0xff Interrupted at block 198981120] The test seems to have stopped... I was under the impression that in case of encountering bad blocks it would keep testing and report at the end. Should I do anything else to conclude those disks are bad? Should I run the test again to confirm? Thanks!
I don't know for sure, but I think "interrupted" means the process was interrupted somehow... I would re-run the test and see if it breaks at the same block(s).
@@chrismoore9997 No, it probably won't work on FreeBSD / FreeNAS at this point due to the hardware checking being Linux-isms. But I'm sure it can be ported to work in FreeBSD. I'll look into porting it, but how are your shell scripting skills? Want to help me port it? :-)
Too bad this tool isn't available any longer. Also, half these commands don't work in Ubuntu, even after running the script on the Github page to check all the tools needed are already installed. Kinda strange nobody has mentioned this. I assume you're running Debian and this would all have to be done in Debian to work.
Or you could just stick them all in a RAID array and initialise it. Trust me, you will soon find out if there is a bad disk, and hey, if something fails early, good news: you're protected :-)
Excellent! Will use this to test the disks for the next server build.
Glad it will be helpful!
Thanks for the video. I plan on running badblocks on (2) 14TB drives. Since this video is 4 years old, is there anything you would do differently today? Any issues running this from a Ubuntu Live USB drive?
No, I still use it today just as I did in the past. I wish I had more time to improve on it though, but rarely have time these days.
Bookmarking this as I'm working on building a 24 drive TrueNAS Scale system. :D At some point I'm going to have to do this and let it sit and do its thing for a month and a half. 😃 I don't know if such an old post is being monitored but how much storage space do you need to allocate for the log output dumps? As 24 x 16TB drives is going to be A LOT.
You know, i'm not sure... the log can get quite large, but I would think it's less than a few GB in total.
Hello Art of Server!
Thanks for the amazing video, I just started this test on 6 new drives I purchased, and so far so good!
Also, I was wondering if it was advisable to run this kind of test on NVMe drives. I'm about to setup a hybrid pool with ZFS, using NVMe drives for ZIL and the special metadata device, and I'm thinking I would like to have for these drives the reassurance bht is providing me for traditional spinning hard drives.
Setting aside the limited endurance of flash storage, is there a good reason not to run bht or badblocks on NVMe drives?
Thanks again!
I'm glad you found this helpful! I don't think I would use this type of testing on flash storage. Most modern flash storage already keep track of wear, and their failure mode is usually related to controller failure (if not by cell wear out). For flash storage, I would just check the statistics on wear leveling, remaining endurance, and GB written. For maintenance, I might run a discard. That's just my opinion.
Thanks for the thoughtful response!!
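As a rough illustration of "just check the statistics" for NVMe: smartctl's NVMe health output includes lines such as "Percentage Used" and "Data Units Written", and a small filter can pull out the wear-related ones. The function names below are made up for this sketch, and `/dev/nvme0` in the usage note is only a placeholder device.

```shell
#!/bin/sh
# Sketch: filters for the NVMe health section printed by smartctl.

# Keeps only the wear-related lines of the health output.
nvme_wear() {
    grep -E '^(Available Spare|Percentage Used|Data Units Written|Media and Data Integrity Errors):'
}

# Prints the "Percentage Used" figure as a bare number (no % sign).
percentage_used() {
    awk -F': *' '/^Percentage Used:/ { sub(/%/, "", $2); print $2 }'
}
```

Typical use would be `sudo smartctl -A /dev/nvme0 | nvme_wear`, saved once when the drive is new and again later, so the numbers can be compared over time.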
Thanks for this video. I wish I could click the thumbs up more than once.
Ha ha.. thanks! Really glad you found this useful! :-)
@Art of Server
Thanks for the great video(s). I have run bht on a few drives already.
If I remember correctly, you have mentioned (probably in a different video) that you wouldn't use drives which showed bad blocks in this test.
But what about drives with reallocated sectors, even a small number of them?
You do bring Reallocated_Sector_Ct in the summary. I have SAS drives and in the SMART report for one drive I just bought it has "Elements in grown defects list: 4". Would you use such a drive?
Drives are cheap enough now, I think I would not use that drive. If you can get it replaced by the vendor you bought from, I would choose that route. If you can't get it replaced, I would only use it for testing situations where the data may not be as important.
When drives develop bad sectors, it doesn't mean it is completely unusable. I find that there are 2 scenarios: 1) a few bad sectors, but then it remains stable and the drive can work for many years, or 2) the bad sectors keep growing and fails completely after some time. The only way to find out which scenario you have, is to do further testing and monitor the defect list. It may not be worth the time to do this if you can get it replaced. If not, then run bht on the drive a couple of times and see if the defects grow or not.
@@ArtofServer Thanks. I'll try to get it replaced. This drive was sold as "new out of box" and then turns out to have been in use for over 28000 hours. The price wasn't bad even for a used drive; I wouldn't have returned it just for being used but I'm not going to keep a drive with "elements in grown defects list".
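To "monitor the defect list" on a SAS drive as suggested above, one option is to record the grown defect count before a test run and compare it afterwards. A minimal sketch, assuming the count appears in `smartctl -a` output on an "Elements in grown defect list:" line as quoted in this thread (`grown_defects` and `defects_grew` are hypothetical helper names):

```shell
#!/bin/sh
# Sketch: extract and compare the SAS grown defect count.

# Reads smartctl output on stdin and prints the grown defect count.
grown_defects() {
    awk -F': *' '/^Elements in grown defects? list:/ { print $2 }'
}

# Succeeds (exit 0) if the count after the test is higher than before.
# $1 = count before, $2 = count after
defects_grew() {
    [ "$2" -gt "$1" ]
}
```

For example: `before=$(sudo smartctl -a /dev/sdX | grown_defects)`, run the test, take `after` the same way, then `defects_grew "$before" "$after" && echo "defects are growing"`. A count that stays put over a couple of full runs is the stable scenario described above.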
I would love to hear your experiences about deciphering smartctl readouts. I realize this is a vast topic, but it's an important one. Currently, I am sitting in front of a ZFS test which has been going on for ~1 year. Just a humble FreeNAS. It is near me. Yesterday I noticed the sound of a repeated HDD spin-up. Sounds like a drive is stuck in a loop: it does a pattern of something for ~10 seconds, then spins up. smartctl shows interesting stuff (high numbers compared to other drives), but no fail. It is at this moment I am reminded of how strange these smartctl readouts are. I recall in the past that some HDDs don't support all SMART tests and the values can show up wonky. In my situation, if this drive continues like this, it is likely to fail. If I was not near the computer in a silent room I would have no way to know. Even right now, in order to know exactly which one it is, I will have to detach it from the case and keep the wires connected, JUST to know if it is indeed the one I suspect. (plug it in somewhere else=yea) Looks like an IronWolf I got last year. I'd be mad if this is the case, b/c I specifically went out of my way to NOT get the ones with the funny firmware (stripe gate?)
That could be an interesting topic for a future video.
I avoid Seagate at all costs. I'll gladly pay more for a used HGST over a brand new Seagate. I've lost count of how many times a customer has reached out to me to get help troubleshooting their storage setup and it turned out to be some strange Seagate issue.
Nice video useful as always
In case someone wanted to run 1 pass instead of 4 what would be the option to do that?
Is bht's main difference from dd that it can run on multiple HDDs simultaneously?
It took 6 days at what Hdd capacity? Does wd cwd100emaz pid refer to a 10TB drive ?
Thanks for your time and effort.
Thanks for watching!
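On the single-pass question, which did not get an answer here: with plain badblocks (I have not checked whether bht exposes this), `-w` on its own writes and verifies four patterns (0xaa, 0x55, 0xff, 0x00), and `-t` limits it to the pattern you name. The helper below is a hypothetical sketch that only prints the command line, using the block-size options quoted elsewhere in this thread; the test itself is destructive.

```shell
#!/bin/sh
# Sketch: print (do not run) a destructive badblocks command for one device.
# $1 = device path, $2 = optional single test pattern (e.g. 0x00 or random)
bb_command() {
    dev=$1
    pattern=${2:-}
    log="badblocks_$(basename "$dev").log"
    if [ -n "$pattern" ]; then
        # One write+verify pass with the given pattern only.
        echo "badblocks -b 32768 -c 512 -wsv -t $pattern -o $log $dev"
    else
        # Default write mode: four patterns, i.e. four passes.
        echo "badblocks -b 32768 -c 512 -wsv -o $log $dev"
    fi
}
```

So `bb_command /dev/sdX 0x00` prints a one-pass command, while `bb_command /dev/sdX` prints the default four-pass one. A single pass is a weaker test than the full set, so it is a trade of thoroughness for time.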
Great video & channel
May I have several questions:
1. Is badblocks still relevant with new disks? I've read that newer HDDs might not report errors to badblocks, so it will never know a sector was bad because it was reallocated by the HDD behind the scenes. In this case people recommend S.M.A.R.T., where such a value is shown.
2. The man page of badblocks mentions that results from badblocks can be used for creating/correcting ext3/ext4 file systems. I know that badblocks can be run on either a disk or a partition, but does the existing FS have some effect on badblocks, and is it recommended to run on an FS other than ext3/ext4?
3. Currently I have an HDD which I have cloned using ddrescue with no problems, but badblocks -nvs on this drive created an 11GB! log file. smartctl does not show anything suspicious (Reallocated_Sector_Ct is 0, Power_On_Hours is 136, Multi-Zone Error Rate is NA). What do you recommend for this case?
Thanks
Late to the show but for anyone else who sees this question.... Even if it's reallocating in the backend it would still trigger a SMART count uptick. This is why it's important to track the values before you start the process and after.
Badblocks, as used here, is designed for low-level inspection of the disk, not file level. The actions in this video are destructive. ZFS, which would be used in this case, is different from extX.
3. Restart the system and run it again. I've seen posts online where people were getting lists and lists of bad sectors, but when they ran it again.... nothing, and SMART shows clean.
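Tracking the values before and after can be done by saving two `smartctl -A` snapshots and comparing them. The sketch below assumes the usual ATA attribute table layout (ID in column 1, attribute name in column 2, raw value starting in column 10); `smart_changed` is a made-up name, and only the first word of each raw value is compared.

```shell
#!/bin/sh
# Sketch: list SMART attributes whose raw value changed between two snapshots.
# $1 = file saved before the test, $2 = file saved after the test
smart_changed() {
    awk 'NR == FNR {                       # first file: remember raw values
             if ($1 ~ /^[0-9]+$/) before[$2] = $10
             next
         }
         $1 ~ /^[0-9]+$/ && ($2 in before) && before[$2] != $10 {
             printf "%s: %s -> %s\n", $2, before[$2], $10
         }' "$1" "$2"
}
```

For example: `sudo smartctl -A /dev/sdX > before.txt`, run the test, `sudo smartctl -A /dev/sdX > after.txt`, then `smart_changed before.txt after.txt`. Power_On_Hours will always move; a change in Reallocated_Sector_Ct is the one to worry about.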
Thank you, wonderful test!
Thank you!
Question: what kind of errors or percentage of errors would be acceptable for a used drive?
That's a good question. I would look at the stability of the error rather than the number. Some medium errors are indicators of a larger problem.. so if the error keeps incrementing, that's not good. But if there's 1 or 2 bad sectors, and it remains stable and doesn't increment even after a full test like running this BHT, then it might be okay. Ideally, I'd like to see 0 errors and no growing defects, but short of that, I don't mind a handful of bad sectors that are not growing if the price was right.
Old video but I'm going to give it a shot.
Not quite on topic, but I happened to use badblocks and then a long smartctl test on new drives, and I have a weird issue with the HDDs that is unrelated to the tests. I've bought 7 new Dell branded (for a Dell R740) TOSHIBA AL15SEB18EQY SAS drives. All of them have this weird issue. Even though the tests pass, the smartctl field "Non-medium error count:" increases by 1 with each reboot or power cycle of the server, for all drives. It might not be a medium error (after all, the name "Non-medium error count" says as much), but a problem with the controller (using HBA330) or the cabling or the caddies (if I want to stretch enough). I have 2 identical servers though, and other hard drives with the same spec as the Toshibas, and all the other drives don't exhibit the same issue with the "Non-medium error count:" field. The same behavior happens when the drives are tested on a different server. Any insights on what to look for, or could I let this field slide and concentrate more on "Elements in grown defect list:"?? Maybe any better tool to check them?
PS It seems to me that upon initialization of the drives the controller asks for some unknown parameter, I assume, and since they can't return that info this counter increases by one, or something along those lines.
Thank you in advance.
The non-medium error really isn't specific enough to really understand what is going on. Your "PS" comment might be hinting you in the right direction, but without specifics, I can't really say anything. If the drives work after initialization and incrementing the counter, I would not worry too much about non-specific errors.
Curious, what command are you running on the right side of the screen? You referred to it as "iostack", but I don't see a program called that.
It is 'iostat' not iostack. :-)
Never mind, I was looking at the wrong video. However, it would make your customers more "sticky" if you had a blog where you posted example commands and results and theories, etc. Like a printable summary of what your videos covered. The only thing missing from your videos is standing in a sweltering hot or freezing cold warehouse while you talk about SAS. Do you have a video where you talk about FCP?
What is "FCP"? I think of the European car parts seller... or is that for Final Cut Pro? Or?
@@ArtofServer Fibre channel
Thank you for this!! This is exactly what I was looking for but wasn't even sure existed. Can I buy you a beer or something?
Ha ha ha... glad this was helpful! I only drink tea though LOL
Art of Server I don’t drink either - just didn’t want to be cheap! Seriously, this just helped me out tremendously. I don’t need anything in your eBay store at the moment, and I’ve already hit the like button. If there’s some other way I can show my appreciation, say the word!
@@DuckDuckDad please subscribe and share my videos and channel with any friends that you think would be interested. thanks for your support! :-)
@@ArtofServer Speaking of our human evolution, I think it will be wise for all humans to learn how to accept donations/contributions. If we can't teach each other then information will require institutions. If we cannot sell to each other then corporations are required. Personally I have yet to figure out how to accept donations. Sure, it's easy just to click the yes buttons... but what of the headaches? I just use PayPal and Venmo for friends and family, but the thought of putting up a public "donate link" to me sounds like > spend 1 hour a month checking (reconciling) when you're not popular >> and > hire an accountant to track the BS if you become well known
I have another question. I ran the test, the final output was different than yours. I do not see a line for "Smart Reallocated Sector" and "Smart_Multizone_Error_Rate"....
HUS724040ALS640_PCG6B85X:
badblocks[Reading and comparing Pass completed, 0 bad blocks found. (0/0/0 errors)]
HDD Type:[SAS]
SMART:[power_on_time(hours:minutes)=7672:15]
Any idea why?
I think because you are testing SAS drives and not SATA drives. SAS and SATA have different SMART data.
@@ArtofServer Thank you
@@jcmichel5768 Yeah. SMART data output is not the same for SAS.
I think the script/binary was made for SATA.
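A note for anyone else testing SAS drives: they do not report SATA-style attributes such as Reallocated_Sector_Ct, but `smartctl -a` on a SAS drive normally prints an "Elements in grown defect list" line, which is the closest equivalent. Below is a minimal sketch of pulling that number out, assuming that line format. The function name `grown_defects` and the sample text are illustrative only and are not part of bht.

```shell
# Print the grown defect count from `smartctl -a` output read on stdin.
# Prints nothing if the line is absent (for example on a SATA drive).
grown_defects() {
    sed -n 's/^Elements in grown defect list:[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Sample text standing in for real output; on a live system this would be
# something like: smartctl -a /dev/sdX | grown_defects
printf '%s\n' 'Elements in grown defect list: 4' | grown_defects
```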
Great video, thank you for sharing!
Thanks for watching!
Got any tips for doing something like this on Windows? 1. just for a few drives? 2. for many more? I don't have linux experience.
That's a great question, but unfortunately, I'm not well versed in Windows. I'm sure there are Windows tools for wiping drives and doing a full sector by sector test. I just don't know what they are or which one is the best.
Glad I found this video after buying some of your great products on eBay. I have a Nas4free installation, and I can’t find equivalent tools under FreeBSD. Before I spend hours trying to figure out how to make a live usb with Linux plus persistent install of bht and badblocks, I’m curious how much more comprehensive your approach is compared to, say, using smartctl in long self test mode, or dd in test mode.
Glad this could help.
The SMART long test is more of a surface scan and does not really test writing to sectors. It's not comprehensive, but it can detect sectors going bad, because they often first manifest as being difficult to read. I'm not aware of a "dd test mode"?
Sorry, I meant diskinfo using -t option which does “Perform a simple and rather naive benchmark of the disks seek and transfer performance.”
bht requires mailx to be configured and the Korn shell. It would not let me execute regardless of whether I specified an email or not. I think the badblocks version would be:
badblocks -b 32768 -c 512 -wsv /dev/ -o error_output.txt
For multiple disks, I use tmux, but it is not as convenient as bht.
For those who are using a USB 3.0 disk: to get smartctl to work, you need to specify the device type as SAT; otherwise it will not return the SMART info.
e.g. smartctl -A -d sat /dev/
smartctl -t short -d sat /dev/
smartctl -t long -d sat /dev/
I think there's an open issue on that... when I can find time, I'll have to fix that.
Going to check this out for my server build. What would you consider to be the cutoff number of bad blocks or reallocated sectors for a drive to be considered no good?
For something important or production use, anything > 0 is pulled out. For casual use, testing, temporary storage, something in the single digits doesn't bother me.
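The rule of thumb above can be sketched as a small helper. This is only an illustration: `drive_verdict` is a hypothetical name, "single digits" is read as 1 to 9, and treating anything above 9 as a reject is an assumption that the reply does not state.

```shell
# Classify a drive by its count of bad blocks or reallocated sectors.
# 0 -> fine for production; 1-9 -> casual use only; 10+ -> reject (assumed).
drive_verdict() {
    count=$1
    if [ "$count" -eq 0 ]; then
        echo "ok for production"
    elif [ "$count" -le 9 ]; then
        echo "casual use only"
    else
        echo "reject"
    fi
}

drive_verdict 0
drive_verdict 4
drive_verdict 37
```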
What command did you use to monitor the throughput on the right?
iostat command run in a loop.
@Art of Server Are you able to explain in a bit more detail how to install these 2 programs? I just installed Linux mint 20 to test 7 14 TB drives and am not able to install these programs as they only have source code for download. Thank you
Thanks for sharing. Wish you the best. Liked and subscribed 👍
Didn't know this tool. Last week I created a script that uses smartmontools to list every drive and get its serial. The script then checks whether that serial is in the exclude list (you insert your OS's HD serial there) and runs badblocks asynchronously on every HDD plugged in, using "&" at the end of the command.
My script is better in a scenario where you replace multiple drives every time and don't want to insert every disk path so it can be tested.
That's cool! :-)
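The exclude-by-serial approach described above might look roughly like this. It is a dry-run sketch, not the commenter's actual script: `plan_tests`, the serials, and the log file naming are made up for illustration, and the badblocks flags are the ones posted elsewhere in this thread. It only prints the commands, since `badblocks -w` wipes the drive.

```shell
# Read "device serial" pairs on stdin and print the badblocks command that
# would be launched for every drive whose serial is not in the exclude list.
# Dry run only: nothing is executed, because badblocks -w destroys data.
plan_tests() {
    exclude=" $1 "   # space-separated serials to skip, e.g. the OS drive
    while read -r dev serial; do
        case "$exclude" in
            *" $serial "*) continue ;;   # excluded serial, skip this drive
        esac
        echo "badblocks -b 32768 -c 512 -wsv -o $serial.log $dev &"
    done
}

# Hypothetical inventory; a real script would build this from smartctl -i.
printf '%s\n' '/dev/sda OSDISK123' '/dev/sdb TESTDISK456' | plan_tests 'OSDISK123'
```

Printing the commands first makes it easy to double-check that the OS drive really is excluded before anything destructive runs.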
Excellent video! I am looking at setting up a home server and would like to use this utility to test all the hard drives going into the server ahead of time. I'm not very familiar with Linux command line so I was wondering if you would be able to walk me through the setup in further detail. I have already done the git clone command but am unable to run the utility. Any help would be greatly appreciated.
Hi! thanks for watching.
So, the instructions on how to use this tool are in the README file on github. Check there first, and see if that makes sense to you. Let me know if you have questions after reading the README file.
@@ArtofServer I'm getting the following error: "bash: /home/derek/bin/bht/bht /bin/ksh: bad interpreter: Permission denied" I am running with su
@@dereksumption6018 that looks like you're trying to run without the ksh? did you install ksh and other requirements mentioned in the README?
@@ArtofServer Yes, I did go through and make sure I had all requirements installed before attempting to run. I did not reboot the system after installing everything; not sure if that is necessary at all. I am currently running badblocks on a single hard drive, so I will not be able to reboot until that is complete. I am running a clean install of Ubuntu 18.04.
@@dereksumption6018 well, the error message basically means you don't have ksh i think. you'll need to figure out why ksh is not there even though you installed it.
I usually do "ddrescue --force /dev/zero /dev/sdX" to erase the disk. It even omits bad sectors and has a nice output. Nice midnight admin work, too. :)
I think this tool would really help me if I was smart enough to get it to work. Don't know if anyone will find this comment but one of the external utilities that bht checks for before it runs, pvs, is not available/not installed on my system (Ubuntu 18.04 Desktop). I cannot install it on its own nor can I find any straight answers online. Is it because I didn't enable LVM during installation? Is it part of something else? Do I need a server distro? Any help would be welcome. Thanks
The pvs command is part of the lvm2 package. You need to install that package to have that available. I'm not as familiar with ubuntu, but I believe something like this command will do the trick:
sudo apt-get install lvm2
Hope that helps.. :-)
@@ArtofServer Thanks for the reply. I'll give that a try the next chance I get.
Is this recommended for SSDs as well?
No, I don't think I would recommend this type of testing for SSDs as you would end up reducing the endurance of the SSDs. When I can find the time, I was thinking of adding a SSD test mode by using blkdiscard... When I find time! 😂
Thank you this works great.
Great to hear!
Following up on this: I know this is 5 years old, but as best I can tell this script will not work on any system that does not have the ksh93 (Korn) shell. All the other prereqs are fine, as smartctl, badblocks, sha256sum, and lsscsi are common tools, even on TrueNAS Scale. Except ksh93. If anyone has a newer method they can share, it would be appreciated. I'm doing a dry run, building a portable 6 drive NAS in prep for a 24 drive monster next year. I installed TrueNAS, enabled SSH, and found this dependency is not there. So either I'll need to build a flash drive with a boot environment that has ksh93 or find another solution.
The Korn shell should be available on most Unix-like OSes these days.
Will this method work in VMs, for HDDs that are passed through?
I don't see why it would not?
@@ArtofServer I got the following error:
ATTN: this hard drive testing process will wipe all data
ATTN: on the following hard drives:
ATTN: /dev/sdb
ATTN: ARE YOU SURE YOU WANT TO PROCEED?: y
INFO: collecting SMART data from each drive.
ERR: could not identify model and/or serial number for /dev/sdb. Abort.
Edit: I found it. I don't use Linux as much as I should and am not super familiar with the terminal. It was a dependency issue. Specifically ksh which is listed as ksh93 on the requirements list but ksh93 is not the name of the package. ... I really should have figured that out faster... I'm disappointed with myself now...
OK, I have messed something up... I created an install on a USB drive (NOT a live disc) and I am currently using it to test some WD Easy Stores. However, I needed to test another drive right away and that test has another 3 days left on it, so I tried to make another installation to run on another PC to test the drive. When I try to run it on the new installation I keep getting "sudo: bht: command not found" or "sudo: unable to execute ./bht: No such file or directory". I have checked that it is executable, which is all Google keeps telling me, but I can't seem to get it to run. It is probably something stupid that I am forgetting, but it's killing me. I know this isn't really the place, but any help anyone can offer would be much appreciated. I vaguely remember an issue like this with a simple fix when I set up the first one, but can't remember it, and I am hoping this will help others with this problem.
Dude so cool!
Thanks!
I'm curious, what OS are you booted into while using the BHT script? I'm trying to get it working on Ubuntu and starting to feel like that might be my whole problem. KSH93 does not seem to be available for Ubuntu.
Is it CentOS you're using, or something else?
Centos 7. Ksh93 should be available on all Linux distros, but you might need to install it.
can you please share the iostat command?
I think it's just "iostat -m 1"
Excellent! Subbed :)
Hey man, thanks for the video. Running Debian 10, but can't seem to install pvs, and the script won't run without it. I've done a ton of Googling and can't find a package called pvs or one that has pvs included. Any thoughts?
Thanks again.
Sorry, found it. It's in package lvm2. Great script!
glad you figured it out!
oh man is there a ver I can just boot off a live thumb drive and just run the tests? im not a Linux user.. I don't even know how to use Linux.
sorry, i don't have such an option right now.
I would like to thank you so much for the time and effort you spent in making this script and sharing it in a video for beginners like me. But would you please list the prerequisites for this script in the description? For example, you need to install first: ksh, pvs (found in package lvm2), smartctl (found in smartmontools), lsscsi. And add sysstat to run iostat 🌹💕🌹
All of that is written in the readme file, isn't it?
@@ArtofServer pvs is not. Ksh93 is misleading for beginners and that’s what happened to me. In the end I realised it is ksh package. I don’t mean to belittle your efforts, but rather to spread the good work you benevolently gave away to us newbies. God bless you and have a blessed day.
@@MichealG I'm open to suggestions for improvements, so thanks for that. I'll look into adding pvs/lvm requirements. I may add some clarification on the ksh93 issue. I meant for that to distinguish from ksh88 and pdksh, which may still be available on some unix systems. Thanks for the suggestions.
@@ArtofServer The real appreciation and thanks are for you. God bless you. ❤️
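Since the command-to-package mapping trips up several people in this thread, here is a rough pre-flight check. It is a sketch rather than anything shipped with bht: `package_for` and `missing_hints` are hypothetical helpers, and the package names are the Debian/Ubuntu ones mentioned in the comments (plus e2fsprogs, which provides badblocks).

```shell
# Map each command to the package that provides it (Debian/Ubuntu names).
package_for() {
    case "$1" in
        ksh)       echo ksh ;;
        pvs)       echo lvm2 ;;
        smartctl)  echo smartmontools ;;
        lsscsi)    echo lsscsi ;;
        iostat)    echo sysstat ;;
        badblocks) echo e2fsprogs ;;
        *)         echo "$1" ;;   # assume the package shares the command's name
    esac
}

# Print an install hint for every listed command that is not on PATH.
missing_hints() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 ||
            echo "$cmd missing: try installing $(package_for "$cmd")"
    done
}

missing_hints ksh pvs smartctl lsscsi badblocks iostat
```

Tools such as pvs often live in /sbin, so running the check as a regular user may report them missing even when the package is installed.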
I’m interested in knowing why you used block size 512 when all modern hdds are 4096 physical block size
I didn't. The default block size of the script is 32k. It was not chosen to match the native sector size of the drive, but to optimize for testing throughput as described in the video.
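To put numbers on the block size point: badblocks works on `-c` blocks of `-b` bytes at a time, so the amount of data per batch is simply their product. A quick sketch of the arithmetic, assuming the `-b 32768 -c 512` values from the command posted earlier in this thread (`batch_bytes` is just an illustrative helper):

```shell
# Bytes handled per badblocks batch = block size (-b) x blocks per batch (-c).
batch_bytes() {
    echo $(( $1 * $2 ))
}

batch_bytes 32768 512   # 32k blocks, 512 at a time -> 16777216 (16 MiB)
batch_bytes 512 64      # hypothetical 512-byte blocks at -c 64 -> 32768 (32 KiB)
```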
Thank you very much for publishing this script... I've used to test a few disks already. My first batch of tests was 100% ok but now I'm testing two more disks and I got this result after two days:
badblocks[Testing with pattern 0xff Interrupted at block 217784832]
badblocks[Testing with pattern 0xff Interrupted at block 198981120]
The test seems to have stopped... I was under the impression that if it encountered bad blocks it would keep testing and report at the end. Should I do anything else to conclude those disks are bad? Should I run the test again to confirm?
Thanks!
I don't know for sure, but I think "interrupted" means the process was interrupted somehow... I would re-run the test and see if it breaks at the same block(s).
@@ArtofServer I see. I've relaunched the test. Thanks again.
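To compare where two runs stopped, it can help to pull the block number out of the summary line and convert it to a byte offset. A small sketch, assuming the summary format shown above and the script's default 32k block size (the helper names are made up for illustration):

```shell
# Extract the block number from a "... Interrupted at block N" summary line.
interrupted_block() {
    printf '%s\n' "$1" | sed -n 's/.*Interrupted at block \([0-9][0-9]*\).*/\1/p'
}

# Convert a block number to a byte offset for a given block size.
block_offset() {
    echo $(( $1 * $2 ))
}

blk=$(interrupted_block 'badblocks[Testing with pattern 0xff Interrupted at block 217784832]')
echo "stopped at block $blk, byte offset $(block_offset "$blk" 32768)"
```

If a re-run stops at a different block each time, that points more toward an external interruption than toward a specific bad region on the disk.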
Nice video
Has anyone seen an updated version of this online? It's not working for me now.
what's not working?
This no longer works. When I try running bht, it keeps complaining about pvs not being installed, but I can't find anything about pvs except for LVM.
That's exactly what pvs is....
@@ArtofServer yeah. But this tool don't work without it.
@@VioletDragonsProjects I guess I need to add that to the requirements list? Usually LVM stuff is included in the OS.
okay so im using ubuntu and i cannot for the life of me get the bht to work
I'd need a lot more info than that if you actually want some help. Or, open an issue on github with details.
how is this test on smr drives?
No idea. I avoid SMR like COVID-19.
@@ArtofServer thanks for getting back to me. I got 8tb brand new for 80canadian each 12 total so could not refuse.
@@YuriShevchouk that might be tempting to some, but you couldn't pay me to take SMR drives... Good luck with them!
@@ArtofServer was about to ask how many months this could take on a smr drive ;)
Is this your script?
yup.
@@ArtofServer - nice work. Do you know if it will work using a FreeNAS system instead of from a Linux host OS?
@@chrismoore9997 No, it probably won't work on FreeBSD / FreeNAS at this point, because the hardware checking relies on Linux-isms. But I'm sure it can be ported to work on FreeBSD. I'll look into porting it, but how are your shell scripting skills? Want to help me port it? :-)
tmux!
Too bad this tool isn't available any longer. Also, half these commands don't work in Ubuntu, even after running the script on the Github page to check all the tools needed are already installed. Kinda strange nobody has mentioned this. I assume you're running Debian and this would all have to be done in Debian to work.
I'm not sure what doesn't work for you. also, I am not running on Debian. This video was done on CentOS.
Or you could just stick them all in a RAID array and initialise it. Trust me, you will soon find out if there is a bad disk, and hey, if something fails early, good news: you're protected :-)