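Smartmontools is a disk health monitoring toolkit. It works by controlling and querying the SMART (Self-Monitoring, Analysis and Reporting Technology) system built into hard drives.
SMART monitors the drive's head assembly, the platter motor and drive system, the internal circuitry, and the platter surface media. When its analysis indicates that the drive may be about to fail, it warns the user in time to avoid data loss. SMART only takes effect if the motherboard supports it, and it cannot predict every possible drive failure.
Windows has no built-in SMART tool, so third-party software is needed there. Virtual disks in VMware virtual machines do not support SMART. Linux has supported SMART for a long time.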
smartctl [options] device
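-h, --help, --usage Show command help
-V, --version, --copyright, --license Print version, copyright, and license information
-i, --info Show identity information for the given device
-g NAME, --get=NAME Show a device setting; NAME is one of all, aam, apm, dsn, lookahead, security, wcache, rcache, wcreorder, wcache-sct
-a, --all Print all SMART information for the device
-x, --xall Print all information for the device, SMART and non-SMART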
--scan Scan for disk devices
--scan-open Scan for disk devices and try to open each one
-j, --json[=[cgiosuv]] Print output in JSON format
-q TYPE, --quietmode=TYPE Quiet mode; TYPE is one of errorsonly, silent, noserial
-d TYPE, --device=TYPE Specify the device type; TYPE is one of ata, scsi[+TYPE], nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, sntjmicron[,NSID], intelliprop,N[+TYPE], marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test
-T TYPE, --tolerance=TYPE Tolerance for failed commands; TYPE is one of normal, conservative, permissive, verypermissive
-b TYPE, --badsum=TYPE What to do on a checksum error in a SMART data structure; TYPE is one of warn, exit, ignore
-r TYPE, --report=TYPE Report transactions (low-level debugging output)
-n MODE[,STATUS], --nocheck=MODE[,STATUS] Skip the check when the device is in a low-power mode; MODE is one of never, sleep, standby, idle
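-s VALUE, --smart=VALUE Enable or disable SMART on the device; VALUE is on or off
-o VALUE, --offlineauto=VALUE Enable or disable automatic offline testing; VALUE is on or off
-S VALUE, --saveauto=VALUE Enable or disable attribute autosave; VALUE is on or off
-s NAME[,VALUE], --set=NAME[,VALUE] Change a device setting (for example aam, apm, lookahead, wcache)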
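-H, --health Show the device's SMART health status
-c, --capabilities Show the device's SMART capabilities
-A, --attributes Show the vendor-specific SMART attributes and their values
-f FORMAT, --format=FORMAT Set the output format for attributes
-l TYPE, --log=TYPE Show the log of the given type; common types are error, selftest, selective, directory, background, scttemp[sts,hist]
-v N,OPTION, --vendorattribute=N,OPTION Set the display option for vendor attribute N
-t TEST, --test=TEST Run a test; TEST is one of offline, short, long, conveyance, force, vendor,N, select,M-N, pending,N, afterselect,[on|off]
-C, --captive Run the test in captive mode, i.e. in the foreground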
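-t short Test the drive in the background; takes a short time
-t long Test the drive in the background; takes a long time
-C -t short Test the drive in the foreground; takes a short time
-C -t long Test the drive in the foreground; takes a long time
-X, --abort Abort any self-test running in the background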
apt install smartmontools
smartctl -i /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Purple
Device Model: WDC WD40PURX-78NZ6Y0
Serial Number: WD-WCC7K4AN0E4C
LU WWN Device Id: 5 0014ee 20fdccc05
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Mar 30 09:18:37 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Available means the drive supports SMART; Enabled means SMART is turned on.
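To check these two lines from a script, the output of smartctl -i can be filtered with awk. The following is a minimal sketch, not part of smartmontools: the function name smart_support_state is hypothetical, and it assumes the ATA-style "SMART support is:" lines shown above (other device types may word these lines differently).

```shell
# Hypothetical helper: read `smartctl -i` output on stdin and classify
# the "SMART support is:" lines.
# Prints one of: enabled, disabled, available, unavailable, unknown
smart_support_state() {
    awk '
        /^SMART support is:/ {
            if ($0 ~ /Unavailable/)                    state = "unavailable"
            else if ($0 ~ /Enabled/)                   state = "enabled"
            else if ($0 ~ /Disabled/)                  state = "disabled"
            else if ($0 ~ /Available/ && state == "")  state = "available"
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage (needs root and a real device, so it is left commented out):
# smartctl -i /dev/sda | smart_support_state
```

If the result is "disabled" or "available", SMART can be switched on with the --smart=on option described earlier.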
smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.
smartctl -a /dev/sda
smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
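The value after "result:" is PASSED, which means the drive is in good health. If it shows FAILED! instead, it is best to replace the server's drive immediately.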
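For unattended checks, the health line can be extracted from the text, or the exit status of smartctl can be inspected. The sketch below makes two assumptions: the function names are hypothetical, and the text parser expects the ATA-style line shown above. The bit test relies on the EXIT STATUS section of the smartctl man page, where bit 3 (value 8) means the SMART status check returned "DISK FAILING".

```shell
# Hypothetical helper: read `smartctl -H` output on stdin and print the
# text after "test result:" (for example PASSED).
health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Hypothetical helper: succeed if bit 3 (value 8) is set in the given
# smartctl exit status, i.e. the drive reported "DISK FAILING".
status_reports_failing() {
    [ $(( $1 & 8 )) -ne 0 ]
}

# Usage (needs root and a real device, so it is left commented out):
# smartctl -H /dev/sda | health_result
# smartctl -H /dev/sda >/dev/null; status_reports_failing "$?" && echo "replace the drive"
```

Checking the exit status avoids depending on the exact wording of the output, which differs between device types.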
smartctl -A /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 142 142 054 Pre-fail Offline - 68
3 Spin_Up_Time 0x0007 122 122 024 Pre-fail Always - 185 (Average 189)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 715
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 115 115 020 Pre-fail Offline - 34
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 12687
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 372
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 830
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 830
194 Temperature_Celsius 0x0002 193 193 000 Old_age Always - 31 (Min/Max 7/41)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
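The attribute table above can also be screened automatically. The following sketch assumes the ten-column ATA layout shown in the header line (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function name flag_attributes is hypothetical, and the choice to watch the raw values of attributes 5, 197, and 198 is an assumption based on common practice, not a rule stated by smartmontools.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print one
# line per attribute that deserves a closer look.
flag_attributes() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0; name = $2
            value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            if (thresh > 0 && value <= thresh)
                # Normalized value has fallen to or below the vendor threshold.
                print id, name, "value<=threshold"
            else if ((id == 5 || id == 197 || id == 198) && raw > 0)
                # Assumption: any reallocated, pending, or uncorrectable
                # sector is worth reporting.
                print id, name, "raw=" raw
        }
    '
}

# Usage (needs root and a real device, so it is left commented out):
# smartctl -A /dev/sda | flag_attributes
```

For the sample table above this prints nothing, because every normalized value is above its threshold and the raw counts of attributes 5, 197, and 198 are all 0.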
smartctl -l error /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged
smartctl --test=long /dev/sda
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 119 minutes for test to complete.
Test will complete after Tue Oct 12 17:14:21 2021
Use smartctl -X to abort test.
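Because a background test returns immediately, a script has to wait before reading the result. One possible approach, sketched below, is to pull the estimated duration out of the "Please wait N minutes" line shown above. The function name selftest_wait_minutes is hypothetical, and the estimate is only the drive's own guess, so a test on a busy drive may take longer.

```shell
# Hypothetical helper: read `smartctl -t ...` output on stdin and print
# the estimated test duration in minutes.
selftest_wait_minutes() {
    sed -n 's/^Please wait \([0-9][0-9]*\) minute.*/\1/p'
}

# Usage (needs root and a real device, so it is left commented out):
# minutes=$(smartctl -t long /dev/sda | selftest_wait_minutes)
# sleep "$(( minutes * 60 ))"
# smartctl -l selftest /dev/sda
```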
smartctl -C -t short /dev/sda
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in captive mode".
Drive command "Execute SMART Short self-test routine immediately in captive mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Tue Oct 12 16:03:19 2021
smartctl -X /dev/sda
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Abort SMART off-line mode self-test routine".
Self-testing aborted!
smartctl -l selftest /dev/sda