Has HDD SMART ever been useful for you?

Is HDD SMART useful?

  • Yep! I got a SMART warning, backed up the data, and a while later it went kaput

    Votes: 3 37.5%
  • Useless! I got a broken HDD with a healthy SMART status

    Votes: 2 25.0%
  • SMART is so dumb

    Votes: 3 37.5%

  • Total voters
    8

orangpelupa

Elite Bug Hunter
Legend
My anecdote:

* A clicking HDD with horrible read/write speeds has a HEALTHY SMART status.
* An HDD with a "failed" SMART status is still going strong after years (more than 3 years, I think).
* An HDD suddenly broke completely and SMART gave no warning (it's supposed to warn you of imminent failure after the BIOS screen or after the Windows logo, I forget which).
 
Your interpretation of the SMART report is what's wrong. It isn't a single "drive is good" or "drive is bad" result; the attributes are indicators, and you need to know what changes in them mean. It's all about knowing which signs to pay attention to: certain fields may not signal failure on their own, but they serve as a prompt for further investigation.
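To illustrate the difference between the overall verdict and the individual indicators, here is a simplified sketch. It assumes the usual ATA attribute fields (normalized VALUE, vendor THRESH, and RAW count) and treats the overall result as failed only once some normalized value reaches its threshold; real drives apply this to pre-fail attributes and each vendor picks its own thresholds, so this is an approximation. The attribute values are made up. What it shows: a drive can report PASSED while a counter such as reallocated sectors is already non-zero.

```python
from dataclasses import dataclass

# Attribute IDs that posters later in this thread single out as early-warning counters.
WATCHED_IDS = {5, 187, 188, 197, 198, 199}


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int   # normalized value, usually counts down toward the threshold
    thresh: int  # vendor-defined failure threshold (0 means it never trips)
    raw: int     # raw counter, e.g. number of reallocated sectors


def overall_verdict(attrs):
    """Simplified pass/fail: FAILED only once a normalized value reaches its threshold."""
    failing = [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]
    return "FAILED" if failing else "PASSED"


def worth_investigating(attrs):
    """Watched counters with a non-zero raw value, regardless of the overall verdict."""
    return [a.name for a in attrs if a.attr_id in WATCHED_IDS and a.raw > 0]


drive = [
    Attribute(5, "Reallocated_Sector_Ct", value=100, thresh=36, raw=8),
    Attribute(197, "Current_Pending_Sector", value=100, thresh=0, raw=0),
]
print(overall_verdict(drive))      # PASSED
print(worth_investigating(drive))  # ['Reallocated_Sector_Ct']
```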

My file server has never suffered any data loss in the past decade and a half.
 

It's the official tools from Seagate, HGST, and WD themselves that report a failure, despite the HDD still working fine for years. And they're also the ones that say an HDD is healthy, and then the next day it goes kaput.

Shouldn't those manufacturers already have the statistics, so their apps can properly warn you of imminent failure?
 
Didn't get any SMART warnings; my HDD died suddenly a couple of weeks ago. In my case I think it might have been the controller or something rather than the actual disk. Maybe SMART doesn't cover that.

But there is no guarantee that any kind of sensor is 100% accurate, and even if it is, hardware can always fail before you have a chance to do something about it. In my case I've only had 2 mechanical drives fail on me in 15+ years, and one of them got carried around every day, so it suffered a lot of abuse. Just back up your important stuff and learn to live with the fact that you might lose your pr0n collection some day ;)
 
As I said before, you need to look at all the indicators. You can't just assume it's an all-or-nothing thing.

Some are more important than others. Some may not be in a "failure" state, but seeing them increase is a sign of impending failure. I don't have time to explain the ins and outs of these (Gears 5 unlocking soon), but if you do any amount of reading about these particular indicators on file server forums, you'll be better off.

Here are the more important SMART indicators to monitor. Any increase in these is an early warning sign, even if they aren't flagged as FAILING / FAILING NOW.
5 - Reallocated sectors count
187 - Reported uncorrectable errors
188 - Command time-out
197 - Current pending sector count
198 - Uncorrectable sector count
199 - UDMA CRC error rate
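To make "any increase" concrete, here is a minimal sketch that compares two snapshots of raw values for the IDs listed above and reports which ones went up. It assumes you already have the raw values as a mapping of attribute ID to integer (for example parsed from `smartctl -A` output); the function name and snapshot numbers are hypothetical. Raw-value encodings are vendor-specific, so treat a jump in one of these as a prompt to investigate rather than a verdict.

```python
# Attribute IDs from the list above, with the names used in the thread.
WATCHED = {
    5: "Reallocated sectors count",
    187: "Reported uncorrectable errors",
    188: "Command time-out",
    197: "Current pending sector count",
    198: "Uncorrectable sector count",
    199: "UDMA CRC error rate",
}


def increases(before, after):
    """Return {id: (old_raw, new_raw)} for watched attributes whose raw value went up.

    `before` and `after` map attribute ID -> raw value. IDs missing from either
    snapshot are skipped, because not every drive reports every attribute.
    """
    return {
        attr_id: (before[attr_id], after[attr_id])
        for attr_id in WATCHED
        if attr_id in before and attr_id in after and after[attr_id] > before[attr_id]
    }


last_month = {5: 0, 197: 0, 199: 3}
today = {5: 8, 197: 0, 199: 3}
for attr_id, (old, new) in increases(last_month, today).items():
    print(f"{attr_id} {WATCHED[attr_id]}: {old} -> {new}")
```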
 
I do agree we're far better off now that we have these valuable statistics than we were pre-SMART.
We now tend to preemptively schedule replacement of drives as soon as a sector remapping or timeout occurs.

But the OP does have a point. I've seen many drives fail suddenly despite active SMART monitoring not breathing a word beforehand.
 
Uhm, those stats are provided by SMART monitoring, so they are breathing a word beforehand.
 
Yes and we're glad to have them. But I'm talking about cases where there wasn't any indication beforehand of any such events, and still sudden instant drive failure.
 
And my clicking HDD still shows healthy smart :/

BTW, my HDD with a failed SMART status has had a maxed-out pending sector count for years and is still working fine :/
 
I remember reading a Google report about this a few years ago. The conclusion was that only about 30% of their HDD failures (Google has a lot of HDDs, as you can imagine) had SMART warnings beforehand. There's no data on how long an HDD keeps going with SMART warnings, since I imagine they always replace an HDD that shows one, so there's no indication of the false-positive rate.
 
Also, you should be running at least a SMART short test every month and then examining the results, instead of passively looking only at the indicators from your normal usage.
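For examining the results, smartmontools is one common option: `smartctl -t short /dev/sdX` starts a short self-test, and `smartctl -A /dev/sdX` prints the vendor attribute table on ATA/SATA drives (`/dev/sdX` stands for your device). The sketch below only parses that table from text, so it runs without a drive. The sample output is trimmed and its values are made up, and the regex assumes the default ten-column layout, keeping just the leading number of RAW_VALUE.

```python
import re

# Trimmed, made-up example of the attribute table printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then leading digits of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> dict with name, normalized value, worst, threshold and raw value."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, blank, or other non-attribute line
        attr_id, name, value, worst, thresh, raw = match.groups()
        attrs[int(attr_id)] = {
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "raw": int(raw),
        }
    return attrs


table = parse_attributes(SAMPLE)
print(table[197]["raw"])  # 8
```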

Before I add drives to my array, I put them through at least 2 cycles of what's known in unRaid as "preclear". One cycle takes around 36 hours for an 8 TB drive, so 2 cycles take 72 hours. It exercises the drive with a full verified read, verified write, and verified read. A SMART test is run before and after, and the results are compared. I have had drives fail the test by showing pending sectors, uncorrectable sectors, or even read failures. These get returned for replacement under warranty.
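As a rough check on those timings, the sketch below assumes one preclear cycle is three full-surface passes (the read, write, and read described above) at a steady average speed, and ignores the time spent on the SMART steps. Under those assumptions, 36 hours per cycle on an 8 TB drive works out to an average of about 185 MB/s, and cycle time scales linearly with capacity. The function names are hypothetical.

```python
def implied_speed_mb_s(capacity_tb, hours_per_cycle, passes_per_cycle=3):
    """Average speed needed to finish the full-surface passes in the given time."""
    total_bytes = capacity_tb * 1e12 * passes_per_cycle  # decimal TB, as drives are sold
    return total_bytes / (hours_per_cycle * 3600) / 1e6   # bytes/s -> MB/s


def hours_per_cycle(capacity_tb, speed_mb_s, passes_per_cycle=3):
    """Inverse: rough hours for one cycle at a given average speed."""
    total_bytes = capacity_tb * 1e12 * passes_per_cycle
    return total_bytes / (speed_mb_s * 1e6) / 3600


speed = implied_speed_mb_s(8, 36)
print(f"{speed:.0f} MB/s")                                     # 185 MB/s
print(f"{hours_per_cycle(8, speed) * 2:.0f} h for 2 cycles")  # 72 h for 2 cycles
```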

Hard drives follow a bathtub curve: they tend to fail either early on or extremely late in their life cycle. It's important to test drives before you trust them with your data.
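The bathtub curve is often modeled as the sum of three failure-rate (hazard) terms: a Weibull hazard h(t) = (k/λ)(t/λ)^(k-1) with shape k < 1 for early failures, a constant rate for random failures, and a Weibull hazard with k > 1 for wear-out. This sketch uses parameters picked only to produce that shape; they are not fitted to any real drive data. It shows that the early-failure term is largest right at the start, which is the part of the curve that up-front testing aims to catch.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k/scale) * (t/scale)**(k-1); falls over time if k < 1, rises if k > 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Illustrative bathtub curve: infant mortality + random failures + wear-out.

    All parameters are made up for illustration; t_years must be > 0.
    """
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=20.0)  # decreasing
    random_failures = 0.02                                              # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)            # increasing
    return infant_mortality + random_failures + wear_out


for t in (0.1, 1, 3, 6, 10):
    print(f"year {t:>4}: hazard {bathtub_hazard(t):.3f}")
```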
 
BTW, SMART is even more useless for SSDs.

I have an old Samsung SSD, the one with the electrical bug where stored data deteriorates the older it gets. Samsung then released a "refresh tool" that basically moves the data from one block to another.

It's hilariously slow. The write speed can drop as low as KILOBYTES per second, hahaha.

The read speed also swings up and down randomly like a pendulum.

Then I repurposed it as portable storage for temporary files only, and it works well enough.

About that bathtub curve: that's my justification for getting a used 1 TB first-gen Seagate SSHD for my PS4 Pro.

It was only 25 dollars, and it's already something like 5 to 7 years old.

Currently very happy with it. It's blazing fast! (when reading from cache; it's utterly slow when writing to disk)
 
As I said before, you need to look at all the indicators. You can't just assume it's an all-or-nothing thing.

Some are more important than others. Some may not be in a "failure" state, but seeing them increase is a sign of impending failure. I don't have time to explain the ins and outs of these (Gears 5 unlocking soon), but if you do any amount of reading about these particular indicators on file server forums, you'll be better off.

Here are the more important SMART indicators to monitor. Any increase in these is an early warning sign, even if they aren't flagged as FAILING / FAILING NOW.
5 - Reallocated sectors count
187 - Reported uncorrectable errors
188 - Command time-out
197 - Current pending sector count
198 - Uncorrectable sector count
199 - UDMA CRC error rate

^ This

If you are on Windows I recommend using CrystalDiskInfo: https://crystalmark.info/en/download

Under the Function tab of CrystalDiskInfo, check both Resident and Startup so that it starts when Windows boots and keeps running even after you close its window.

If one of the above SMART Indicators increases you will get a warning and the status of the drive in question goes from Good (Green) to Caution (Yellow).
 
Yeah, I use that app too. It's free and easy to read.
 
5 - Reallocated sectors count
187 - Reported uncorrectable errors
188 - Command time-out
197 - Current pending sector count
198 - Uncorrectable sector count
199 - UDMA CRC error rate
And are those numbers good? I have no idea
 
Take
197 - Current pending sector count
Is 197 good? Is it bad? I don't know.
How much does it have to increase by before it's a problem? 1? 2? 10? I don't know.
Do I have to run the program every day and write down the numbers?
When you say increasing in a short time, what's that? Minutes, days, weeks? I don't know.
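The thread doesn't settle on a time window, so the following is only a sketch of one way to avoid writing numbers down by hand: store each reading with a timestamp and report any watched attribute that changed within a chosen window. It assumes the raw values are gathered elsewhere (for example by parsing `smartctl -A` output) and that the script is run periodically by a scheduler such as cron or Windows Task Scheduler; the function names, the window, and the sample readings are hypothetical. Following the advice earlier in the thread, any non-zero change counts as worth a closer look.

```python
from datetime import datetime, timedelta

# Attribute IDs singled out earlier in the thread.
WATCHED = (5, 187, 188, 197, 198, 199)


def record(log, when, raw_values):
    """Append one timestamped snapshot of {attribute_id: raw_value} to the log."""
    log.append((when, dict(raw_values)))


def changes_within(log, window):
    """Compare the newest snapshot with the oldest one taken inside `window`.

    Assumes `log` is in chronological order. Returns {attribute_id: change}
    for watched attributes whose raw value differs between the two snapshots.
    """
    if not log:
        return {}
    newest_time, newest = log[-1]
    oldest = next(snap for when, snap in log if newest_time - when <= window)
    return {
        attr_id: newest[attr_id] - oldest[attr_id]
        for attr_id in WATCHED
        if attr_id in newest and attr_id in oldest and newest[attr_id] != oldest[attr_id]
    }


log = []
record(log, datetime(2024, 1, 1), {5: 0, 197: 0})
record(log, datetime(2024, 1, 5), {5: 0, 197: 2})
record(log, datetime(2024, 1, 9), {5: 1, 197: 2})
print(changes_within(log, timedelta(days=7)))   # {5: 1}
print(changes_within(log, timedelta(days=30)))  # {5: 1, 197: 2}
```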
 