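SMART Disk Monitoring Scheme
1 Purpose
In today's big-data environments, disk performance and stability are critical to the business. On Linux, smartctl is one of the most commonly used disk inspection tools.
This document analyzes smartctl on Linux. It explains how to use the tool and examines SMART (Self-Monitoring, Analysis and Reporting Technology).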
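2 Terms, Definitions and Abbreviations
2.1 Terms and Definitions
The terms and definitions specific to this document are listed in Table 2.1.
Table 2.1
Term/Definition    Meaning
SMART    Self-Monitoring, Analysis and Reporting Technology
2.2 Abbreviations
This document uses the abbreviations listed in Table 2.2.
Table 2.2
Abbreviation    Full form    Chinese term
SMART    Self-Monitoring, Analysis and Reporting Technology    自监察分析及报告技术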
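3 smartctl
smartctl is a command-line tool shipped in the smartmontools-5.38-2.el5 RPM. It performs SMART tasks: printing SMART self-test and error reports, enabling or disabling SMART automatic testing, and starting disk self-tests.
Syntax:
smartctl [options] device
device:
"/dev/hd[a-t]" IDE/ATA disks
"/dev/sd[a-z]" SCSI disks. Note that SATA disks are accessed through the libata
library and also appear under these names, so add the option "-d ata" for them.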
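3.1 [options]:
Options are grouped by type.
3.1.1 Information options:
-h Help
-V Version information
-i Print basic information (device model, serial number, firmware version, ...)
-a Print all SMART information for the disk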
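3.1.2 Run-time behavior options:
-q TYPE Set the quiet mode for output.
TYPE is one of three values:
errorsonly Print only error logs.
silent Print nothing.
noserial Do not print the serial number.
-d TYPE Specify the disk type. If it is not given, smartctl guesses the type
from the device name.
-T TYPE Specify how tolerant smartctl is of errors, that is, whether it keeps running after one.
TYPE is one of four values:
conservative Exit on any error.
normal Exit if a mandatory SMART command fails.
permissive Ignore one failure of a mandatory SMART command.
verypermissive Ignore all failures of mandatory SMART commands.
-b TYPE Specify what smartctl does when a checksum error occurs.
TYPE is one of three values:
warn Print a warning and continue.
exit Exit smartctl.
ignore Continue without printing a warning.
-r TYPE For smartmontools developers.
-n POWERMODE Specify whether smartctl skips the check when the disk is in a low-power mode.
By default the power mode is ignored and the disk is checked.
POWERMODE is one of four values:
never Always check.
sleep Check unless the disk is in sleep mode.
standby Check unless the disk is in sleep or standby mode.
idle Check unless the disk is in sleep, standby, or idle mode.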
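The four POWERMODE values form a ladder: each one skips the check in one more power state than the previous one. The sketch below models only that decision rule. The state labels ("SLEEP", "STANDBY", "IDLE", "ACTIVE") and the function name are assumptions chosen for illustration; it does not query a real disk.

```python
# For each -n setting, the power states in which the check is skipped.
_SKIPPED_STATES = {
    "never": set(),
    "sleep": {"SLEEP"},
    "standby": {"SLEEP", "STANDBY"},
    "idle": {"SLEEP", "STANDBY", "IDLE"},
}


def should_check(powermode, current_state):
    """Return True if a check would proceed for this -n setting (illustrative model)."""
    return current_state not in _SKIPPED_STATES[powermode]


for mode in ("never", "sleep", "standby", "idle"):
    print(mode, should_check(mode, "STANDBY"))
```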
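3.1.3 SMART feature switches:
-s on/off Enable or disable SMART on the disk.
-o on/off Enable or disable SMART automatic offline testing, which scans the disk
for defects every four hours.
-S on/off Enable or disable autosave of vendor-specific attributes.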
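3.1.4 SMART read and display options
-H Report the disk's health status. A failing report means the disk has already
failed or is predicted to fail within 24 hours.
-c Show the generic SMART capabilities the disk supports and their current state.
-A Show the vendor-specific SMART attributes the disk supports. Attributes are
numbered from 1 to 253 and have specific names.
-l TYPE Select the log to display.
TYPE is one of four values:
error Show only the error log.
selftest Show only the self-test log.
selective Show only the selective self-test log.
directory Show only the Log Directory.
-v N,OPTION Use a vendor-specific presentation when displaying vendor-specific SMART attribute N.
-F TYPE Tell smartctl how to behave when it meets known but still unresolved
hardware or software bugs.
-P TYPE Control whether smartctl applies the presets stored in its drive database to the disk.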
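3.1.5 SMART offline test and self-test options
-t TEST Run a test immediately. Can be combined with -C.
TEST is one of the following:
offline Offline test. Can be run on a disk with mounted file systems.
short Short self-test. Can be run on a disk with mounted file systems.
long Long self-test. Can be run on a disk with mounted file systems.
conveyance [ATA only] Conveyance self-test. Can be run on a disk with mounted file systems.
select,N-M
select,N+SIZE [ATA only] Selective self-test of part of the disk's LBAs. N is the
starting LBA, M is the ending LBA, and SIZE is the number of LBAs
to test.
-C Run the test in captive mode.
Notes: (1) -C must be used together with -t, but it has no effect with -t offline.
(2) -C keeps the disk very busy, so it is best used on a disk with no mounted file systems.
-X Abort a test running in non-captive mode.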
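The two selective-test forms describe the same kind of span in different ways, which a short helper can make concrete. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes SIZE counts LBAs, so that a span starting at N with SIZE blocks ends at N+SIZE-1 inclusive.

```python
def selective_test_arg(start, end=None, size=None):
    """Format the argument for -t as select,N-M or select,N+SIZE (hypothetical helper)."""
    if (end is None) == (size is None):
        raise ValueError("give exactly one of end or size")
    if start < 0:
        raise ValueError("start LBA must be non-negative")
    if end is not None:
        if end < start:
            raise ValueError("end LBA must not precede start LBA")
        return f"select,{start}-{end}"
    if size <= 0:
        raise ValueError("size must be positive")
    return f"select,{start}+{size}"


def span_end(start, size):
    """Last LBA of a span, assuming SIZE is a count of LBAs."""
    return start + size - 1


print(selective_test_arg(10, end=20))
print(selective_test_arg(10, size=11), "ends at LBA", span_end(10, 11))
```

Under that assumption, both calls above describe the LBAs 10 through 20.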
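3.2 Common examples
3.2.1 Checking overall health
Check the current overall health of /dev/sda. PASSED means the disk is healthy; any other result means the disk has failed or is about to fail.
smartctl -H /dev/sda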
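A monitoring script usually needs this verdict as a boolean. The sketch below extracts it from text that has already been captured from `smartctl -H`. The sample line is an assumption based on typical ATA output and the function name is hypothetical; SCSI devices word their status differently and are not handled here.

```python
import re

# Matches the overall-health line as typically printed for ATA disks (assumed format).
_HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def health_passed(output):
    """Return True for PASSED, False for any other verdict, None if no verdict line is found."""
    match = _HEALTH_RE.search(output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


sample = "SMART overall-health self-assessment test result: PASSED\n"
print(health_passed(sample))
```

Returning None separately from False lets a caller distinguish "the disk reports a problem" from "the output could not be parsed".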
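3.2.2 Printing all information
Print all SMART information for /dev/sda.
smartctl -a /dev/sda
For an ATA disk this is equivalent to running, in order:
smartctl -H /dev/sda
smartctl -i /dev/sda
smartctl -c /dev/sda
smartctl -A /dev/sda
smartctl -l error /dev/sda
smartctl -l selftest /dev/sda
smartctl -l selective /dev/sda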
3.2.3 Enabling and disabling SMART
Enable or disable SMART on /dev/sda.
smartctl -s on /dev/sda
smartctl -s off /dev/sda
To see whether SMART is currently enabled, use the -i option.
smartctl -i /dev/sda
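The enabled state can be read out of captured `smartctl -i` text. This is a sketch that assumes the output contains a line of the form "SMART support is: Enabled" or "SMART support is: Disabled", as ATA disks typically print; the function name is hypothetical.

```python
def smart_enabled(info_output):
    """Return True/False from the 'SMART support is:' line, or None if it is absent."""
    for line in info_output.splitlines():
        if line.startswith("SMART support is:"):
            state = line.split(":", 1)[1].strip()
            # A second line with the same prefix reports availability; skip it.
            if state in ("Enabled", "Disabled"):
                return state == "Enabled"
    return None


sample = (
    "SMART support is: Available - device has SMART capability.\n"
    "SMART support is: Enabled\n"
)
print(smart_enabled(sample))
```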
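3.2.4 Offline test
Run an offline test on /dev/sda. Its results are mainly used to update the SMART attributes.
smartctl -t offline /dev/sda
3.2.5 Short self-test
Run a short self-test on /dev/sda.
smartctl -t short /dev/sda
3.2.5.1 Watching test progress
The -c option shows how far the test has progressed: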
# smartctl -c /dev/sda
…
Self-test execution status: ( 242) Self-test routine in progress...
20% of test remaining.
…
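The remaining percentage in the output above can be turned into a progress figure. The sketch below works on captured text only and takes its sample from the excerpt shown; the function name is hypothetical.

```python
import re

_REMAINING_RE = re.compile(r"(\d+)% of test remaining")


def percent_done(capabilities_output):
    """Return the completed percentage, or None if no test is reported in progress."""
    match = _REMAINING_RE.search(capabilities_output)
    if match is None:
        return None
    return 100 - int(match.group(1))


sample = (
    "Self-test execution status: ( 242) Self-test routine in progress...\n"
    "20% of test remaining.\n"
)
print(percent_done(sample))  # 80
```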
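3.2.5.2 Viewing test results
The -l selftest option shows the recorded test results for /dev/sda:
For the test numbered "#1", Completed without error means the test finished with no errors.
For the test numbered "#2", Aborted by host means the user aborted the test with 90% still remaining.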
# smartctl -l selftest /dev/sda
...
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 9535 -
# 2 Extended offline Aborted by host 90% 9534 -
...
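The log rows above can be parsed into records for automated checks. This is a minimal sketch: it keeps the test description and status together in one text field, because the column spacing in the excerpt is not reliable enough to split them, and the type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class SelfTestEntry(NamedTuple):
    num: int
    text: str            # test description and status, kept together
    remaining: int       # percent of the test left
    lifetime_hours: int
    first_error_lba: str


# Anchored from the right: "<n>% <hours> <lba>" closes every row.
_ROW_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(output):
    """Return one SelfTestEntry per '# N ...' row; other lines are ignored."""
    entries = []
    for line in output.splitlines():
        m = _ROW_RE.match(line.strip())
        if m:
            entries.append(
                SelfTestEntry(int(m[1]), m[2], int(m[3]), int(m[4]), m[5])
            )
    return entries


sample = """\
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 9535 -
# 2 Extended offline Aborted by host 90% 9534 -
"""
for entry in parse_selftest_log(sample):
    ok = "Completed without error" in entry.text
    print(entry.num, ok, entry.remaining)
```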
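3.2.6 Viewing SMART attribute values
The -A option shows the SMART attribute values for /dev/sda.
smartctl -A /dev/sda
Each row describes one SMART attribute.
RAW_VALUE: the attribute's actual value. For example, the row with ID 12 gives the actual number of disk power cycles.
VALUE: ranges from 1 to 254 and is derived from RAW_VALUE; the disk firmware performs the conversion itself.
THRESH: ranges from 0 to 255. It is the threshold that VALUE is compared against. If VALUE is less than or equal to THRESH, the attribute is no longer in a normal state.
TYPE: Pre-fail means that when VALUE is less than or equal to THRESH, a related disk failure is imminent.
Old_age means that when VALUE is less than or equal to THRESH, the attribute shows the disk has worn out with age.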
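The VALUE/THRESH rule can be applied mechanically to each row. The sketch below assumes the usual ten-column layout of `smartctl -A` rows (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the sample rows are illustrative values, not taken from a real disk, and the names are hypothetical.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int
    thresh: int
    type: str
    raw: str


def parse_attribute_row(line):
    """Parse one attribute row; return None for headers and other lines."""
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return Attribute(int(fields[0]), fields[1], int(fields[3]),
                     int(fields[5]), fields[6], fields[9].strip())


def verdict(attr):
    """Apply the rule above: VALUE <= THRESH means the attribute has tripped."""
    if attr.value > attr.thresh:
        return "ok"
    return "failure imminent" if attr.type == "Pre-fail" else "worn out"


rows = [
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036   Pre-fail  Always   -   0",
    "  9 Power_On_Hours          0x0032   001   001   005   Old_age   Always   -   9535",
]
for row in rows:
    attr = parse_attribute_row(row)
    print(attr.id, attr.name, verdict(attr))
```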
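3.3 smartctl structure
The main structure of the smartctl tool is shown in the figure below: after parsing its options, it sets or queries SMART information according to the values they specify.
3.4 SMART attributes
smartctl -A /dev/sda lists many of the disk's SMART attributes, which indicate whether the disk is healthy.
The following table gives the meaning of each attribute:
ID Hex Attribute name Description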
01 0x01 Read Error Rate (Vendor specific raw
value.) Stores data related to the rate of hardware read errors that
occurred when reading data from a disk surface. The raw value has
different structure for different vendors and is often not meaningful as
a decimal number.
02 0x02 Throughput Performance Overall (general)
throughput performance of a hard disk drive. If the value of this
attribute is decreasing there is a high probability that there is a
problem with the disk.
03 0x03 Spin-Up Time Average time of spindle spin up (from zero RPM to fully operational [millisecs]).
04 0x04 Start/Stop Count A tally of spindle
start/stop cycles. The spindle turns on, and hence the count is
increased, both when the hard disk is turned on after having before been
turned entirely off (disconnected from power source) and when the hard
disk returns from having previously been put to sleep mode.
05 0x05 Reallocated Sectors Count Count of
reallocated sectors. When the hard drive finds a read/write/verification
error, it marks that sector as "reallocated" and transfers data to a
special reserved area (spare area). This process is also known as
remapping, and reallocated sectors are called "remaps". The raw value
normally represents a count of the bad sectors that have been found and
remapped. Thus, the higher the attribute value, the more sectors the
drive has had to reallocate. This allows a drive with bad sectors to
continue operation; however, a drive which has had any reallocations at
all is significantly more likely to fail in the near future.[2] While
primarily used as a metric of the life expectancy of the drive, this
number also affects performance. As the count of reallocated sectors
increases, the read/write speed tends to become worse because the drive
head is forced to seek to the reserved area whenever a remap is
accessed. A workaround which will preserve drive speed at the expense of
capacity is to create a disk partition over the region which contains
remaps and instruct the operating system to not use that partition.
06 0x06 Read Channel Margin Margin of a channel
while reading data. The function of this attribute is not specified.
07 0x07 Seek Error Rate (Vendor specific raw
value.) Rate of seek errors of the magnetic heads. If there is a partial
failure in the mechanical positioning system, then seek errors will
arise. Such a failure may be due to numerous factors, such as damage to a
servo, or thermal widening of the hard disk. The raw value has
different structure for different vendors and is often not meaningful as
a decimal number.
08 0x08 Seek Time Performance Average performance
of seek operations of the magnetic heads. If this attribute is
decreasing, it is a sign of problems in the mechanical subsystem.
09 0x09 Power-On Hours (POH)
Count of hours in power-on state. The raw value of this attribute shows
total count of hours (or minutes, or seconds, depending on manufacturer)
in power-on state.
10 0x0A Spin Retry Count Count of retry of spin
start attempts. This attribute stores a total count of the spin start
attempts to reach the fully operational speed (under the condition that
the first attempt was unsuccessful). An increase of this attribute value
is a sign of problems in the hard disk mechanical subsystem.
11 0x0B Recalibration Retries or Calibration Retry Count
This attribute indicates the count that recalibration was requested
(under the condition that the first attempt was unsuccessful). An
increase of this attribute value is a sign of problems in the hard disk
mechanical subsystem.
12 0x0C Power Cycle Count This attribute indicates the count of full hard disk power on/off cycles.
13 0x0D Soft Read Error Rate Uncorrected read errors reported to the operating system.
180 0xB4 Unused Reserved Block Count Total "Pre-Fail" Attribute used at least in HP devices.
183 0xB7 SATA Downshift Error Count Western Digital and Samsung attribute.
184 0xB8 End-to-End error / IOEDC This
attribute is a part of Hewlett-Packard's SMART IV technology, as well as
part of other vendors' IO Error Detection and Correction schemas, and
it contains a count of parity errors which occur in the data path to the
media via the drive's cache RAM.
185 0xB9 Head Stability Western Digital attribute.
186 0xBA Induced Op-Vibration Detection Western Digital attribute.
187 0xBB Reported Uncorrectable Errors The count of errors that could not be recovered using hardware ECC.
188 0xBC Command Timeout The count of aborted
operations due to HDD timeout. Normally this attribute value should be
equal to zero and if the value is far above zero, then most likely there
will be some serious problems with power supply or an oxidized data
cable.
189 0xBD High Fly Writes HDD producers implement a
Fly Height Monitor that attempts to provide additional protections for
write operations by detecting when a recording head is flying outside
its normal operating range. If an unsafe fly height condition is
encountered, the write process is stopped, and the information is
rewritten or reallocated to a safe region of the hard drive. This
attribute indicates the count of these errors detected over the lifetime
of the drive.
This feature is implemented in most modern Seagate drives and some of
Western Digital’s drives, beginning with the WD Enterprise WDE18300 and
WDE9180 Ultra2 SCSI hard drives, and will be included on all future WD
Enterprise products.
190 0xBE Airflow Temperature (WDC) or Airflow
Temperature Celsius (HP) Airflow temperature on Western Digital
HDs (same as Temperature [0xC2], but the current value is 50 less for some models.
Marked as obsolete.)
191 0xBF G-sense Error Rate The count of errors resulting from externally-induced shock & vibration.
192 0xC0 Power-off Retract Count or Emergency Retract Cycle
Count (Fujitsu) Count of times the heads are loaded off the
media. Heads can be unloaded without actually powering off.
193 0xC1 Load Cycle Count or Load/Unload Cycle
Count (Fujitsu) Count of load/unload cycles into head landing zone
position.
The typical lifetime rating for laptop (2.5-in) hard drives is 300,000
to 600,000 load cycles. Some laptop drives are programmed to unload the
heads whenever there has not been any activity for about five
seconds. Many Linux installations write to the file system a few times a
minute in the background. As a result, there may be 100 or more load
cycles per hour, and the load cycle rating may be exceeded in less than a
year.
194 0xC2 Temperature or Temperature Celsius Current internal temperature.
195 0xC3 Hardware ECC Recovered (Vendor specific
raw value.) The raw value has different structure for different vendors
and is often not meaningful as a decimal number.
196 0xC4 Reallocation Event Count Count of remap
operations. The raw value of this attribute shows the total count of
attempts to transfer data from reallocated sectors to a spare area. Both
successful & unsuccessful attempts are counted.
197 0xC5 Current Pending Sector Count Count of
"unstable" sectors (waiting to be remapped, because of read errors). If
an unstable sector is subsequently read successfully, this value is
decreased and the sector is not remapped. Read errors on a sector will
not remap the sector (since it might be readable later); instead, the
drive firmware remembers that the sector needs to be remapped, and
remaps it the next time it's written.
198 0xC6 Uncorrectable Sector Count or
Offline Uncorrectable or
Off-Line Scan Uncorrectable Sector Count
The total count of uncorrectable errors when reading/writing a
sector. A rise in the value of this attribute indicates defects of the
disk surface and/or problems in the mechanical subsystem.
199 0xC7 UltraDMA CRC Error Count The count of
errors in data transfer via the interface cable as determined by ICRC
(Interface Cyclic Redundancy Check).
200 0xC8 Multi-Zone Error Rate The count of errors
found when writing a sector. The higher the value, the worse the disk's
mechanical condition is.
200 0xC8 Write Error Rate (Fujitsu) The total count of errors when writing a sector.
201 0xC9 Soft Read Error Rate or
TA Counter Detected
Count of off-track errors.
202 0xCA Data Address Mark errors or
TA Counter Increased
Count of Data Address Mark errors (or vendor-specific).
203 0xCB Run Out Cancel Count of ECC errors
204 0xCC Soft ECC Correction Count of errors corrected by software ECC
205 0xCD Thermal Asperity Rate (TAR) Count of errors due to high temperature.
206 0xCE Flying Height Height of heads above the
disk surface. A flying height that's too low increases the chances of a
head crash while a flying height that's too high increases the chances
of a read/write error.
207 0xCF Spin High Current Amount of surge current used to spin up the drive.
208 0xD0 Spin Buzz Count of buzz routines needed to spin up the drive due to insufficient power.
209 0xD1 Offline Seek Performance Drive’s seek performance during its internal tests.
210 0xD2 Unknown (found in Maxtor 6B200M0 200GB and Maxtor 2R015H1 15GB disks)
211 0xD3 Vibration During Write Vibration During Write
212 0xD4 Shock During Write Shock During Write
220 0xDC Disk Shift Distance the disk has shifted
relative to the spindle (usually due to shock or temperature). Unit of
measure is unknown.
222 0xDE Loaded Hours Time spent operating under data load (movement of magnetic head armature)
223 0xDF Load/Unload Retry Count Count of times head changes position.
224 0xE0 Load Friction Resistance caused by friction in mechanical parts while operating.
225 0xE1 Load/Unload Cycle Count Total count of load cycles
226 0xE2 Load 'In'-time Total time of loading on
the magnetic heads actuator (time not spent in parking area).
227 0xE3 Torque Amplification Count Count of attempts to compensate for platter speed variations
228 0xE4 Power-Off Retract Cycle The count of times
the magnetic armature was retracted automatically as a result of
cutting power.
230 0xE6 GMR Head Amplitude Amplitude of "thrashing" (distance of repetitive forward/reverse head motion)
231 0xE7 Temperature Drive Temperature
232 0xE8 Endurance Remaining Number of physical
erase cycles completed on the drive as a percentage of the maximum
physical erase cycles the drive is designed to endure
232 0xE8 Available Reserved Space Intel SSD reports
the number of available reserved space as a percentage of reserved
space in a brand new SSD.
233 0xE9 Power-On Hours Number of hours elapsed in the power-on state.
233 0xE9 Media Wearout Indicator Intel SSD reports a
normalized value of 100 (when the SSD is new) and declines to a minimum
value of 1. It decreases while the NAND erase cycles increase from 0 to
the maximum-rated cycles.
240 0xF0 Head Flying Hours Time while head is positioning
240 0xF0 Transfer Error Rate(Fujitsu) Count of times the link is reset during a data transfer.
241 0xF1 Total LBAs Written Total count of LBAs written
242 0xF2 Total LBAs Read Total count of LBAs read.
Some S.M.A.R.T. utilities will report a negative number for the raw value since in reality it has 48 bits rather than 32.
250 0xFA Read Error Retry Rate Count of errors while reading from a disk
254 0xFE Free Fall Protection Count of "Free Fall Events" detected
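The note under attribute 242 says some utilities show a negative raw value because the field holds 48 bits rather than 32. One plausible mechanism, shown here as an assumption rather than a description of any particular utility, is truncating the raw value to a signed 32-bit integer. The function name and the sample value are hypothetical.

```python
def as_signed_32(raw):
    """Interpret the low 32 bits of raw as a signed two's-complement integer."""
    low = raw & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


raw = 3_000_000_000          # fits easily in 48 bits, but exceeds 2**31 - 1
print(raw < (1 << 48))       # True: a valid 48-bit raw value
print(as_signed_32(raw))     # -1294967296
```

Reading the full 48-bit field as an unsigned integer avoids the problem.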
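3.5 SMART self-test
smartctl -t offline/short/long starts a self-test of the chosen type on the disk.
offline:
This is the default self-test.
short:
The short self-test is meant to determine quickly whether the disk is faulty.
The test comprises many items, all defined by the disk vendor, such as the following:
a) Electrical tests, which exercise the disk's internal circuitry. The details are specified by the disk vendor, for example:
A) Cache test.
B) Read/write circuitry test.
C) Read/write head test.
b) Seek and servo tests, which check the disk's ability to find and servo on data tracks.
c) Read and verify tests, which check the disk's ability to read part or all of the disk.
long:
Also called the extended self-test. Its test items are similar to those of the short test, but it takes much longer.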