How to determine if a SATA drive is failing.
When is it a good time to check to see if a hard drive is failing? Well, when your console is full of IO/seek errors, I’d say that is a pretty good time! Hah.
According to research conducted by Google, a document entitled Failure Trends in a Large Disk Drive Population states that the manufacturer, particular model and vintage plays a role, but does not provide failure statistics on model and manufacturers. Most drives were run at 45C or less.
From the SMART data, scan errors, reallocations, offline reallocations and probational counts had a significant correlation with failure probability, whereas seek errors, calibration retries and spin retries had little significance.
Soooo…. you want to look at the Raw_Read_Error_Rate, Seek_Error_Rate and Reallocated_Sector_Ct information from smartctl.
[root@SOMESERVER ~]# smartctl --all /dev/sdb | grep Error Error logging capability: (0x01) Error logging supported. 1 Raw_Read_Error_Rate 0x000f 117 100 006 Pre-fail Always - 166491825 7 Seek_Error_Rate 0x000f 090 060 030 Pre-fail Always - 999290467 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0 SMART Error Log Version: 1
[root@SOMESERVER ~]# smartctl --all /dev/sdb| grep Reallocated_Sector_Ct5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
In regards to Reallocated_Sector_Ct, the normalized values (current=100, worst=100) indicate the drive is in perfect condition (higher is better, and looking at the overall report it appears that 100 is “best”). The threshold value (36) just indicates how low the normalized value would have to drop before the manufacturer would consider the drive to be in a “Pre-fail” condition.
If you run “smartctl –all /dev/sdb | grep Error” again and notice that Raw_Read_Error_Rate and Seek_Error_Rate keep incrementing AND Reallocated_Sector_Ct is greater than 0, its pretty safe to say that you have a ticking time-bomb on your hands. You should consider replacing those drives as soon as possible.