SMART Tools

I didn't know it until I looked it up while writing this, but "SMART" had to mean something ... "Self-Monitoring, Analysis, and Reporting Technology." I just wanted to know what was wrong with the hard drive a friend gave me ...

I have to admit I have no idea if this applies to SSDs: I have a couple, but haven't experimented. I've used it extensively on spinning disks.

Installation

On Debian or Ubuntu:

# apt-get install smartmontools

On Fedora:

# dnf install smartmontools

The First Test

# smartctl -t short /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.13-200.fc25.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Apr  4 21:07:45 2017

Use smartctl -X to abort test.
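If you'd rather script the wait than watch the clock, you can pull the number of minutes out of that output. This is a minimal sketch that assumes smartctl keeps the "Please wait N minutes" wording shown above. The name smart_wait_minutes is my own invention and is not part of smartmontools. The commented example also uses smartctl -l selftest, which prints only the self-test log instead of everything --all shows.

```shell
# Hypothetical helper (not part of smartmontools): read the output of
# `smartctl -t short DEVICE` (or `-t long`) on stdin and print N from the
# "Please wait N minutes for test to complete." line.
# Assumes smartctl keeps that exact wording; prints nothing if it's absent.
smart_wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minute.*/\1/p'
}

# Example (needs root and a real drive; /dev/sdb is just the example device):
#   mins=$(smartctl -t short /dev/sdb | smart_wait_minutes)
#   sleep $(( (mins + 1) * 60 ))    # one extra minute of slack
#   smartctl -l selftest /dev/sdb   # print only the self-test log
```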

This can be run on a drive that's mounted and in use in a functional system. We've requested a short self-test, and as its output explains, you need to wait a couple of minutes for it to finish. After that, retrieve the results:

# smartctl --all /dev/sdb

This will dump a lot of information: you may want to pipe it to less. Let's look at this in pieces.

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MQ01ABD...
Device Model:     TOSHIBA MQ01ABD075
Serial Number:    42M9P0BXT
LU WWN Device Id: 5 000039 3f2587e20
Firmware Version: AX002M
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Apr  4 21:10:04 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Every detail of the drive specs you could hope to know. Good start. Next interesting bit:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1625
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2913
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  ...
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       10512
  ...

There's lots more to this table. Much of it ... I have no idea about. But probably the most important thing to note is the 'WHEN_FAILED' column: normally it just shows a '-', and if anything else shows up there, you have a problem. Values that I find interesting are "Start_Stop_Count", "Reallocated_Sector_Ct" (if that has a non-zero 'RAW_VALUE', even if it's not calling it a fail, you should worry about the drive), and "Power_On_Hours".
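Checking those two things by eye gets tedious if you have several drives. Here's a minimal sketch that scans the attribute table for you. It assumes the column layout shown above, where WHEN_FAILED is the 9th whitespace-separated field and the first word of RAW_VALUE is the 10th. smart_check_attrs is a hypothetical name. smartctl -A prints just the attribute table, though piping in --all output should also work.

```shell
# Hypothetical helper (not part of smartmontools): scan the attribute table
# from `smartctl -A DEVICE` (or --all) on stdin. Prints a line for any
# attribute whose WHEN_FAILED column isn't "-", and one for a non-zero
# Reallocated_Sector_Ct raw value. Exits 1 if it printed anything, else 0.
# Assumes the column layout shown above.
smart_check_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && NF >= 10 {
            if ($9 != "-") {
                print "WHEN_FAILED: " $2 " (" $9 ")"
                bad = 1
            }
            if ($2 == "Reallocated_Sector_Ct" && $10 != "0") {
                print "reallocated sectors: " $10
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Example (as root):
#   smartctl -A /dev/sdb | smart_check_attrs || echo "take a closer look at /dev/sdb"
```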

The next interesting bit is the test results:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10512         -
# 2  Extended captive    Interrupted (host reset)      90%      9998         -
# 3  Short offline       Completed without error       00%      9998         -
...

Your output may be shorter or much longer, but if it's much longer, you've already been running SMART tests and probably don't need this page. The '# 1' entry is the most recent: its "LifeTime(hours)" value of 10512 matches the drive's current Power_On_Hours, and it completed without error. Now that you can read the output, run some more tests, or at least one more. I recommend the 'long' test:

# smartctl -t long /dev/sdb
...
Please wait 32 minutes for test to complete.
...

This takes longer (the time varies by drive), but you examine the results the same way as before. Because it's a more thorough check than the short test, it's a good one to run if you're concerned about the integrity of the drive.
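Once the long test finishes, its result becomes entry '# 1' in the self-test log. If you want a script to tell you whether that newest entry passed, here's a minimal sketch. It assumes the log layout shown earlier and the exact "Completed without error" wording. smart_last_test_ok is a hypothetical name. Run it only after the wait has passed, because while a test is still in progress the '# 1' entry is the previous test.

```shell
# Hypothetical helper (not part of smartmontools): read the self-test log from
# `smartctl -l selftest DEVICE` (or --all) on stdin, print the newest entry
# ("# 1"), and exit 0 only if that entry reads "Completed without error".
smart_last_test_ok() {
    awk '
        # smartctl lists the most recent test first, numbered "# 1".
        /^# *1 / {
            print
            found = 1
            ok = /Completed without error/
            exit
        }
        # END still runs after the early exit and sets the final status.
        END { exit !(found && ok) }
    '
}

# Example (as root, once the long test has had time to finish):
#   smartctl -l selftest /dev/sdb | smart_last_test_ok || echo "last self-test did not pass"
```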

smartctl is capable of much more. Read the (long) man page: man smartctl.
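For example, piping --all to less works, but smartctl can also print each of the pieces covered here on its own: -i for the information section, -A for the attribute table, and -l selftest for the self-test log. This is a small wrapper that runs those three in order. The name smart_report is my own and is not something smartmontools provides.

```shell
# Hypothetical wrapper (not part of smartmontools): print the three pieces of
# `smartctl --all` discussed above, one after another, for a single device.
# Usage (as root): smart_report /dev/sdb | less
smart_report() {
    smartctl -i "$1"            # information section: model, serial, capacity
    smartctl -A "$1"            # vendor-specific attribute table
    smartctl -l selftest "$1"   # self-test log, newest entry first
}
```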