this post was submitted on 30 Nov 2023

Data Hoarder


Here is a fairly robust way to ensure a drive is safe to put into service. I have used this before and caught drives that would have failed shortly after being put into production, and some that would have failed once they were more than half full.

  1. Check S.M.A.R.T Info: Confirm no (0) Seek Error Rate, Read Error Rate, Reallocated Sector Count, Uncorrectable Sector Count

  2. Run Short S.M.A.R.T test

  3. Repeat Step 1

  4. Run Conveyance S.M.A.R.T test

  5. Repeat Step 1

  6. Run Destructive Badblocks test (read and write)

  7. Repeat Step 1

  8. Perform a FULL Format (Overwrite with Zeros)

  9. Repeat Step 1

  10. Run Extended S.M.A.R.T test

  11. Repeat Step 1
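The eleven steps above can be sketched as a command plan. The post does not name any tools, so the use of `smartctl` (smartmontools), `badblocks`, and GNU `dd` here is an assumption, as is the placeholder device `/dev/sdX`. The sketch only builds and prints the commands; it does not run anything, since steps 6 and 8 destroy all data on the target drive.

```python
from typing import List


def smart_check(device: str) -> List[str]:
    """Step 1: print the SMART attribute table (assumes smartmontools)."""
    return ["smartctl", "-A", device]


def burn_in_plan(device: str) -> List[List[str]]:
    """Return the 11-step sequence from the post as a list of commands.

    Tool choice is an assumption, not the author's stated method.
    WARNING: the badblocks and dd commands erase the whole device.
    """
    tests = [
        ["smartctl", "-t", "short", device],       # step 2
        ["smartctl", "-t", "conveyance", device],  # step 4
        ["badblocks", "-wsv", device],             # step 6, destructive write test
        ["dd", "if=/dev/zero", f"of={device}", "bs=1M", "status=progress"],  # step 8
        ["smartctl", "-t", "long", device],        # step 10, extended test
    ]
    plan = [smart_check(device)]
    for test in tests:
        plan.append(test)
        # "Repeat Step 1" after every test. smartctl self-tests run in the
        # background, so wait for them to finish before this check.
        plan.append(smart_check(device))
    return plan


if __name__ == "__main__":
    for number, command in enumerate(burn_in_plan("/dev/sdX"), start=1):
        print(f"{number:2d}. {' '.join(command)}")
```

Printing the plan first makes it easy to double-check the device name before running any of the destructive steps by hand.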

Return the drive if either of the following is true:

A) The formatting speed drops more than 10MB/s below 80MB/s, i.e. under roughly 70MB/s (my defective one was ~40MB/s from first power-on)

B) The S.M.A.R.T tests show error count increasing at any step
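The two return criteria can be written down as a small check. This is a sketch under assumptions: criterion A is read as "any measured speed under 80 - 10 = 70 MB/s", the attribute names follow the labels `smartctl -A` commonly prints, and the snapshot dictionaries are hypothetical inputs you would fill in by hand or with your own parser after each "Repeat Step 1".

```python
from typing import Dict, List

# Attribute labels as commonly shown by smartctl; the exact set a drive
# reports varies by vendor, so treat this tuple as an assumption.
WATCHED = (
    "Raw_Read_Error_Rate",
    "Seek_Error_Rate",
    "Reallocated_Sector_Ct",
    "Offline_Uncorrectable",
)


def too_slow(speeds_mb_s: List[float], baseline: float = 80.0,
             margin: float = 10.0) -> bool:
    """Criterion A: any sample more than `margin` below `baseline`."""
    return any(speed < baseline - margin for speed in speeds_mb_s)


def errors_increased(snapshots: List[Dict[str, int]]) -> bool:
    """Criterion B: a watched raw value grew between consecutive checks."""
    for before, after in zip(snapshots, snapshots[1:]):
        for name in WATCHED:
            if after.get(name, 0) > before.get(name, 0):
                return True
    return False


def should_return(snapshots: List[Dict[str, int]],
                  speeds_mb_s: List[float]) -> bool:
    """Return the drive if either criterion is met."""
    return too_slow(speeds_mb_s) or errors_increased(snapshots)
```

Comparing consecutive snapshots rather than demanding zeros avoids tripping on attributes that some vendors report as large non-zero raw values from the start.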

It is also highly advisable to stagger the testing (and repeat some of it) if you plan on using multiple drives in a pool/RAID config. This way the wear on the drives differs, which reduces the likelihood of them failing at the same time. For example, I re-ran either the full format or the badblocks test on some of the drives, so some drives have 48 hours of testing, some have 72, and some have 96. This way, the chance of multiple drive failures during a rebuild is lower.
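As an illustration of the staggering idea, the sketch below assigns each drive in a pool a different amount of total test time. The 48/72/96 hour figures come from the example above; treating each extra pass as roughly 24 hours and cycling through three levels are assumptions, and the drive names are hypothetical.

```python
from typing import Dict, List


def stagger(drives: List[str], base_hours: int = 48,
            pass_hours: int = 24, levels: int = 3) -> Dict[str, int]:
    """Assign test hours round-robin so pool members carry different wear.

    Drive i gets the base test time plus (i % levels) extra passes.
    """
    return {
        drive: base_hours + (index % levels) * pass_hours
        for index, drive in enumerate(drives)
    }


if __name__ == "__main__":
    for drive, hours in stagger(["disk1", "disk2", "disk3", "disk4"]).items():
        print(f"{drive}: {hours} h of testing")
```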

top 17 comments
[–] [email protected] 2 points 11 months ago (1 children)

Jeez, you're burning through so much of the drive's lifespan just checking the damn thing. If a failed drive will cause problems worthy of this amount of burn-in time, you need a more robust setup.

I run all used eBay drives. Except for a glance at the SMART data before adding them to the array, I don't test them at all. Just keep an extra drive or two on hand as spares. Life's easier when you plan for failure instead of fighting it.

[–] [email protected] 1 points 11 months ago

Same, except I also use Scrutiny to flag drives for my attention. It makes educated guesses for a pass/fail mark, using analysis of vendor-specific interpretations of SMART values, matched against the failure thresholds from the BackBlaze survey. It can tell you things like "the current value for the Command Timeout attribute for this drive falls into the 1-10% bracket of probability of failure according to BackBlaze".

It helps me to plan ahead. If for example I have 3 drives that Scrutiny says "smell funny" it would be nice if I had 2-3 spares on hand rather than just 1. Or if two of those drives happen to be together in a 2-pair mirror perhaps I can swap one somewhere else.

[–] [email protected] 1 points 11 months ago

I just full format and check smart, seems like a lot of work for each drive…

[–] [email protected] 1 points 11 months ago

What program are you using to run those tests? Is it usable on windows 10? Thanks for putting your guide up!

[–] [email protected] 1 points 11 months ago (2 children)

Way Overkill.

Single pass read (SMART test is fine) and single pass write (ones, zeros, random, whatever you want) is more than adequate to determine any issues a new disk may have out of the gate, unless you want to isolate a fringe case condition and waste time and wear on your hard drive doing so.

[–] [email protected] 1 points 11 months ago

For real. I suppose if you kept one single copy of the drive you'd want to really, really make sure? But then again why would you keep one copy of anything?

TLDR: smart is smort enuf

[–] [email protected] 1 points 11 months ago

I do it the other way around: first write (zero wipe), then read (SMART long test). Served me well for many disks. :)

[–] [email protected] 1 points 11 months ago

Most I would do is a write test followed by a read test, and then check the smart counters

[–] [email protected] 1 points 11 months ago

I just plug it in and glance over smart data.

[–] [email protected] 1 points 11 months ago (1 children)

Seek Error Rate and Read Error Rate can't be zero.

[–] [email protected] 1 points 11 months ago

Yeah I was under the impression these two attributes vary so wildly between vendors that they're basically void of meaning by now.

[–] [email protected] 1 points 11 months ago

A single full read and a single full write test should be plenty. Drives tend to fail really early on or not at all until EOL.

[–] [email protected] 1 points 11 months ago

I didn’t know all of this until today. I just plug in and use 😅.

[–] [email protected] 1 points 11 months ago

I guess having a backup and error-correcting file systems like ZFS or BTRFS will help you more long term. Sure, watch for SMART values, but imho don't go overboard with tests. I do an extended SMART test, rebuild/extend my RAID, check a quick SMART test again and that's it. Drives can die at any time, even if they were fine after a long test cycle. The 3-2-1 rule should save you from data loss.

[–] [email protected] 1 points 11 months ago

How about if I have already filled the new hard drive (still have the data on the source drives) and just want to make sure all of it is readable (before erasing the data from the source drive), without having to copy all the data off the new drive?
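One possible approach, sketched here purely as an illustration: hash every file on the source and on the new drive and compare the digests. That forces a full read of the data on the new drive without copying it anywhere. The function names and the choice of SHA-256 are assumptions, and data written very recently may be served from the OS cache rather than from the disk itself.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path relative to `root` to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def mismatches(source: Path, copy: Path) -> List[str]:
    """List source files that are missing or differ on the copy."""
    source_digests = tree_digests(source)
    copy_digests = tree_digests(copy)
    return sorted(
        name for name, digest in source_digests.items()
        if copy_digests.get(name) != digest
    )
```

An empty list from `mismatches(Path("/mnt/source"), Path("/mnt/new"))` (hypothetical mount points) would mean every source file was read back from the new drive with matching content.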

[–] [email protected] 1 points 11 months ago

But why do all this if using raid with hot spare? If a new drive fails, just replace it once detected that it failed?

[–] [email protected] 1 points 11 months ago

Question: How do you monitor format speed?
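The thread does not answer this, but one generic way to get a speed figure is to note how many bytes have been written at a few points in time and divide the difference by the elapsed time. The sketch below assumes you already have such `(seconds, bytes_written)` samples from whatever progress output your tool prints; the sample format and function name are hypothetical.

```python
from typing import List, Tuple


def interval_speeds(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert (timestamp_seconds, total_bytes_written) samples to MB/s.

    Each result covers the interval between two consecutive samples,
    using 1 MB = 1,000,000 bytes.
    """
    speeds = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed > 0:  # skip duplicate timestamps
            speeds.append((b1 - b0) / elapsed / 1_000_000)
    return speeds
```

Looking at per-interval speeds instead of one overall average makes a slowdown partway through the drive easier to spot.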