Fundamentally, hard drives can suffer one of two classes of failures:
- Predictable failures, such as mechanical wear and aging, happen gradually over time. A monitoring device can detect these, much as a temperature dial on the dashboard of an automobile can warn a driver that the engine has started to overheat before serious damage occurs.
- Unpredictable failures, such as an electronic component failing, may occur suddenly and without warning.
Mechanical failures, which are usually predictable, account for 60 percent of drive failures. The purpose of S.M.A.R.T. is to warn a user or system administrator of an imminent drive failure while there is still time to take preventive action, such as copying the data to a replacement device. Approximately 30 percent of failures can be predicted by S.M.A.R.T.
Work at Google on over 100,000 drives has shown that S.M.A.R.T. status, taken as a whole, has little overall predictive value.
However, certain sub-categories of S.M.A.R.T. information do correlate with actual failure rates, specifically in the period following the first scan error.
Drives are 39 times more likely to fail within 60 days of a first scan error than drives with no such errors. First-time errors in reallocations, offline reallocations, and probational counts are also strongly correlated with a higher probability of hard drive failure.
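As a rough illustration of how a monitoring script might act on these findings, the sketch below compares two snapshots of a drive's counters and reports any watched counter that has just moved from zero to non-zero. The counter names and the `first_time_errors` helper are hypothetical, chosen only to mirror the four signals named above; real drives expose vendor-specific attributes that would need to be mapped onto them.

```python
# Hypothetical counter names mirroring the four signals discussed above.
# Real drives report vendor-specific attributes; mapping them is left out.
WATCHED_COUNTERS = (
    "scan_errors",
    "reallocations",
    "offline_reallocations",
    "probational_count",
)


def first_time_errors(previous, current):
    """Return the watched counters that moved from zero to non-zero."""
    return [
        name
        for name in WATCHED_COUNTERS
        if previous.get(name, 0) == 0 and current.get(name, 0) > 0
    ]


# Two made-up snapshots of the same drive, taken at different times.
before = {"scan_errors": 0, "reallocations": 3, "probational_count": 0}
after = {"scan_errors": 1, "reallocations": 4, "probational_count": 0}

print(first_time_errors(before, after))  # ['scan_errors']
```

Only the first occurrence is flagged, since the figures above concern the period after a first error; a counter that was already non-zero, such as `reallocations` in the example, is not reported again.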
PC techguide's page on S.M.A.R.T. (2003) comments that the technology has gone through three phases:
"In its original incarnation SMART provided failure prediction by monitoring certain online hard drive activities. A subsequent version improved failure prediction by adding an automatic off-line read scan to monitor additional operations. The latest SMART III technology not only monitors hard drive activities but adds failure prevention by attempting to detect and repair sector errors. Also, whilst earlier versions of the technology only monitored hard drive activity for data that was retrieved by the operating system, SMART III tests all data and all sectors of a drive by using off-line data collection to confirm the drive's health during periods of inactivity."