[archlinux][hardware] Lifespan analysis of the ThinkPad T450's built-in SSD after using it for bcache

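This analysis was prompted by two things I did earlier:

[troubleshoot][archlinux][bcache] Modifying the Linux filesystem / partition scheme / building a hybrid drive / major surgery to give the system a new life (reworking the underlying architecture, NO reinstall!)

[archlinux][hardware] Checking the lifespan of an SSD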

 

After finishing the low-level disk reorganization on December 6, I recorded the following disk metrics:

/home/tong/Workspace/system/bcache [tong@T7] [17:18]
> cat 20161206 
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.11-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SanDisk based SSDs
Device Model:     SanDisk SSD U110 16GB
Serial Number:    153486400725
LU WWN Device Id: 5 001b44 ec81598d5
Firmware Version: U21B001
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Dec  6 20:03:00 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   3) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   -O----   100   100   000    -    0
  9 Power_On_Hours          -O----   100   100   000    -    2199
 12 Power_Cycle_Count       -O----   100   100   000    -    693
171 Program_Fail_Count      -O----   100   100   000    -    0
172 Erase_Fail_Count        -O----   100   100   000    -    0
173 Avg_Write/Erase_Count   -O----   100   100   000    -    45
174 Unexpect_Power_Loss_Ct  -O----   100   100   000    -    56
187 Reported_Uncorrect      -O----   100   100   000    -    0
230 Perc_Write/Erase_Count  -O----   100   100   000    -    150
232 Perc_Avail_Resrvd_Space PO----   100   100   005    -    0
234 Perc_Write/Erase_Ct_BC  -O----   100   100   000    -    111
241 Total_LBAs_Written      -O----   100   100   000    -    538537024
242 Total_LBAs_Read         -O----   100   100   000    -    1275507679
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       GPL,SL  R/O      1  Summary SMART error log
0x03       GPL,SL  R/O     16  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06       GPL,SL  R/O      1  SMART self-test log
0x09       GPL,SL  R/W      1  Selective self-test log
0x10       GPL,SL  R/O      1  SATA NCQ Queued Error log
0x11       GPL,SL  R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS       1  Device vendor specific log
0xa2       GPL,SL  VS       2  Device vendor specific log
0xa3       GPL,SL  VS       1  Device vendor specific log
0xa6-0xa7  GPL,SL  VS     255  Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: 1 (16 sectors)
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 65535
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN               MIN_LBA               MAX_LBA  CURRENT_TEST_STATUS
    1  18446744073709551615  18446744073709551615  Not_testing
    2  18446744073709551615  18446744073709551615  Not_testing
    3  18446744073709551615  18446744073709551615  Not_testing
    4  18446744073709551615  18446744073709551615  Not_testing
    5  18446744073709551615  18446744073709551615  Not_testing
65535  18446744073709551615                 65534  Read_scanning was never started
Selective self-test flags (0xffff):
  Currently read-scanning the remainder of the disk.
If Selective self-test is pending on power-up, resume after 65535 minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              42  ---  Current Temperature
0x05  0x010  1               -  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              54  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x05  0x030  1              46  ---  Highest Average Short Term Temperature
0x05  0x038  1              46  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              95  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               1  N--  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2            5  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0001  2            0  Command failed due to ICRC error


/home/tong/Workspace/system/bcache [tong@T7] [17:18]
> 
Disk metrics on December 6

On December 19 I recorded the disk metrics again:

/home/tong/Workspace/system/bcache [tong@T7] [17:20]
> cat 20161219 
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SanDisk based SSDs
Device Model:     SanDisk SSD U110 16GB
Serial Number:    153486400725
LU WWN Device Id: 5 001b44 ec81598d5
Firmware Version: U21B001
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec 19 16:51:12 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   3) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   -O----   100   100   000    -    0
  9 Power_On_Hours          -O----   100   100   000    -    2351
 12 Power_Cycle_Count       -O----   100   100   000    -    707
171 Program_Fail_Count      -O----   100   100   000    -    0
172 Erase_Fail_Count        -O----   100   100   000    -    0
173 Avg_Write/Erase_Count   -O----   100   100   000    -    61
174 Unexpect_Power_Loss_Ct  -O----   100   100   000    -    56
187 Reported_Uncorrect      -O----   100   100   000    -    0
230 Perc_Write/Erase_Count  -O----   100   100   000    -    203
232 Perc_Avail_Resrvd_Space PO----   100   100   005    -    0
234 Perc_Write/Erase_Ct_BC  -O----   100   100   000    -    120
241 Total_LBAs_Written      -O----   100   100   000    -    598719455
242 Total_LBAs_Read         -O----   100   100   000    -    1338182982
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       GPL,SL  R/O      1  Summary SMART error log
0x03       GPL,SL  R/O     16  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06       GPL,SL  R/O      1  SMART self-test log
0x09       GPL,SL  R/W      1  Selective self-test log
0x10       GPL,SL  R/O      1  SATA NCQ Queued Error log
0x11       GPL,SL  R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS       1  Device vendor specific log
0xa2       GPL,SL  VS       2  Device vendor specific log
0xa3       GPL,SL  VS       1  Device vendor specific log
0xa6-0xa7  GPL,SL  VS     255  Device vendor specific log

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: 1 (16 sectors)
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 65535
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN               MIN_LBA               MAX_LBA  CURRENT_TEST_STATUS
    1  18446744073709551615  18446744073709551615  Not_testing
    2  18446744073709551615  18446744073709551615  Not_testing
    3  18446744073709551615  18446744073709551615  Not_testing
    4  18446744073709551615  18446744073709551615  Not_testing
    5  18446744073709551615  18446744073709551615  Not_testing
65535  18446744073709551615                 65534  Read_scanning was never started
Selective self-test flags (0xffff):
  Currently read-scanning the remainder of the disk.
If Selective self-test is pending on power-up, resume after 65535 minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              49  ---  Current Temperature
0x05  0x010  1               -  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              54  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x05  0x030  1              46  ---  Highest Average Short Term Temperature
0x05  0x038  1              46  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              95  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  N--  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2            5  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0001  2            0  Command failed due to ICRC error


/home/tong/Workspace/system/bcache [tong@T7] [17:20]
> 
Disk metrics on December 19

The comparison:

/home/tong/Workspace/system/bcache [tong@T7] [17:20]
> diff 20161206 20161219 
1c1
< smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.11-1-ARCH] (local build)
---
> smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
17c17
< Local Time is:    Tue Dec  6 20:03:00 2016 CST
---
> Local Time is:    Mon Dec 19 16:51:12 2016 CST
62,63c62,63
<   9 Power_On_Hours          -O----   100   100   000    -    2199
<  12 Power_Cycle_Count       -O----   100   100   000    -    693
---
>   9 Power_On_Hours          -O----   100   100   000    -    2351
>  12 Power_Cycle_Count       -O----   100   100   000    -    707
66c66
< 173 Avg_Write/Erase_Count   -O----   100   100   000    -    45
---
> 173 Avg_Write/Erase_Count   -O----   100   100   000    -    61
69c69
< 230 Perc_Write/Erase_Count  -O----   100   100   000    -    150
---
> 230 Perc_Write/Erase_Count  -O----   100   100   000    -    203
71,73c71,73
< 234 Perc_Write/Erase_Ct_BC  -O----   100   100   000    -    111
< 241 Total_LBAs_Written      -O----   100   100   000    -    538537024
< 242 Total_LBAs_Read         -O----   100   100   000    -    1275507679
---
> 234 Perc_Write/Erase_Ct_BC  -O----   100   100   000    -    120
> 241 Total_LBAs_Written      -O----   100   100   000    -    598719455
> 242 Total_LBAs_Read         -O----   100   100   000    -    1338182982
126c126
< 0x05  0x008  1              42  ---  Current Temperature
---
> 0x05  0x008  1              49  ---  Current Temperature
140c140
< 0x07  0x008  1               1  N--  Percentage Used Endurance Indicator
---
> 0x07  0x008  1               2  N--  Percentage Used Endurance Indicator

/home/tong/Workspace/system/bcache [tong@T7] [17:21]
> 

The key metrics are marked in red.
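The changed attributes can also be extracted programmatically instead of being read off the diff. What follows is a minimal Python sketch, assuming the attribute table keeps the column layout seen in the two dumps; `parse_attributes` and `attribute_deltas` are illustrative names, not part of smartmontools. The sample rows are copied from the dumps.

```python
import re

# One attribute row looks like:
# "241 Total_LBAs_Written      -O----   100   100   000    -    538537024"
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+[-A-Z]{6}\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)\s*$"
)


def parse_attributes(text):
    """Map attribute name -> raw value for each SMART attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(3))
    return attrs


def attribute_deltas(before, after):
    """Map name -> (old, new, new - old) for attributes whose raw value changed."""
    return {
        name: (before[name], after[name], after[name] - before[name])
        for name in before
        if name in after and after[name] != before[name]
    }


# Subset of rows copied from the 20161206 and 20161219 dumps.
DUMP_1206 = """\
  9 Power_On_Hours          -O----   100   100   000    -    2199
173 Avg_Write/Erase_Count   -O----   100   100   000    -    45
174 Unexpect_Power_Loss_Ct  -O----   100   100   000    -    56
241 Total_LBAs_Written      -O----   100   100   000    -    538537024
"""
DUMP_1219 = """\
  9 Power_On_Hours          -O----   100   100   000    -    2351
173 Avg_Write/Erase_Count   -O----   100   100   000    -    61
174 Unexpect_Power_Loss_Ct  -O----   100   100   000    -    56
241 Total_LBAs_Written      -O----   100   100   000    -    598719455
"""

if __name__ == "__main__":
    deltas = attribute_deltas(
        parse_attributes(DUMP_1206), parse_attributes(DUMP_1219)
    )
    for name, (old, new, diff) in deltas.items():
        print(f"{name}: {old} -> {new} (+{diff})")
```

In practice the two strings would be replaced by the contents of the saved files `20161206` and `20161219`.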

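From the initial setup of the new machine around the end of October last year until December 6, only 1% of the rated lifespan had been used. In the mere 13 days from December 6 to December 19, the value rose to 2%.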
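As a rough cross-check, the Percentage Used Endurance Indicator itself can be extrapolated linearly. This is only a sketch and not the method used in this post: it assumes wear continues at the rate observed over the 13 days and that 100% marks the end of life. Because the indicator is reported as a whole number, the step from 1 to 2 may overstate the true rate considerably. The function name `days_until_worn_out` is hypothetical.

```python
def days_until_worn_out(percent_used, percent_per_day):
    """Days left until the indicator reaches 100, assuming a constant wear rate."""
    return (100 - percent_used) / percent_per_day


# Indicator values from the two dumps: 1 on December 6, 2 on December 19.
observed_rate = (2 - 1) / 13  # percent per day; coarse, the indicator is an integer

remaining_days = days_until_worn_out(2, observed_rate)
remaining_years = remaining_days / 365

if __name__ == "__main__":
    print(f"about {remaining_days:.0f} days, or {remaining_years:.1f} years")
```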

Assuming an SSD lasts 10 years under normal use, the calculation is as follows:

/home/tong/Workspace/system/bcache [tong@T7] [16:53]
> bc -l
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
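For details type `warranty'. 
538537024 / 365                  # Round the earlier period of normal use to 365 days.
1475443.90136986301369863013          # Average LBAs written per day under normal use.
(598719455 - 538537024) / 13
4629417.76923076923076923076          # Average LBAs written per day over the past 13 days.
10 / 4                       # Scale the 10-year baseline by the ratio of daily writes between the two cases (taken as 4).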
2.5000000000000000000

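Based on the calculation above, the SSD's daily write volume under bcache is about 3.1 times that of normal use (4629417.77 / 1475443.90 ≈ 3.14), which the calculation rounds up to 4.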
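To put the LBA counts into more familiar units, they can be converted to bytes. The sketch below assumes that the raw value of Total_LBAs_Written counts 512-byte logical sectors, matching the sector size reported in the dumps; some drives use other units for this attribute, so treat the result as an estimate. `lbas_to_gib` is an illustrative helper name.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl (assumed unit of the counter)
GIB = 1024 ** 3


def lbas_to_gib(lbas, sector_bytes=SECTOR_BYTES):
    """Convert a count of logical blocks to GiB."""
    return lbas * sector_bytes / GIB


written_1206 = 538537024  # Total_LBAs_Written on December 6
written_1219 = 598719455  # Total_LBAs_Written on December 19

total_before_gib = lbas_to_gib(written_1206)
per_day_bcache_gib = lbas_to_gib(written_1219 - written_1206) / 13

if __name__ == "__main__":
    print(f"written before bcache: {total_before_gib:.1f} GiB")
    print(f"written per day with bcache: {per_day_bcache_gib:.2f} GiB")
```

Under that assumption the drive had taken roughly 257 GiB of writes before the change and about 2.2 GiB per day afterwards.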

Taking the usual rule-of-thumb figure of 10 years, the SSD's lifespan under bcache is a quarter of that: 2.5 years.
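The same arithmetic can be reproduced in a few lines of Python. This sketch uses only the numbers from the bc session plus the 10-year baseline, which is itself an assumption of this post; `scaled_lifespan` is an illustrative name. It prints both the unrounded ratio and the result for the rounded factor of 4 used in the post.

```python
BASELINE_YEARS = 10.0  # rule-of-thumb lifespan under normal use (assumption of this post)

lbas_1206 = 538537024  # Total_LBAs_Written on December 6
lbas_1219 = 598719455  # Total_LBAs_Written on December 19

normal_per_day = lbas_1206 / 365               # prior use rounded to 365 days
bcache_per_day = (lbas_1219 - lbas_1206) / 13  # 13 days with bcache
ratio = bcache_per_day / normal_per_day


def scaled_lifespan(baseline_years, write_ratio):
    """Lifespan if wear scales linearly with the daily write volume."""
    return baseline_years / write_ratio


if __name__ == "__main__":
    print(f"daily write ratio: {ratio:.2f}")
    print(f"lifespan with unrounded ratio: {scaled_lifespan(BASELINE_YEARS, ratio):.2f} years")
    print(f"lifespan with ratio rounded to 4: {scaled_lifespan(BASELINE_YEARS, 4):.2f} years")
```

With the unrounded ratio the estimate comes out at a little over 3 years; rounding the ratio up to 4 gives the more pessimistic 2.5 years.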

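I allocated only half of the 16 GB SSD, 8 GB, to bcache. Because bcache manages the SSD cache with an LRU policy, using the full 16 GB would double the amount of cached content, and the number of reads and writes would naturally double as well. In that case the lifespan would shrink to 1.25 years, which matches 百合's experience exactly: that SSD failed after one year.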

So, based on the above, this SSD of mine will fail before June 2019.
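The date follows from adding the 2.5-year estimate to the day the bcache setup was completed. A small sketch, assuming the clock starts on December 6, 2016 and using 365.25 days per year; `projected_failure_date` is a hypothetical helper.

```python
from datetime import date, timedelta


def projected_failure_date(start, lifespan_years):
    """Add a lifespan in years to a start date, using 365.25 days per year."""
    return start + timedelta(days=round(lifespan_years * 365.25))


bcache_start = date(2016, 12, 6)  # day the disk reorganization was finished
estimate = projected_failure_date(bcache_start, 2.5)

if __name__ == "__main__":
    print(estimate.isoformat())
```

The result lands in early June 2019, in line with the rough figure above.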

 

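On Taobao, a second-hand 16 GB NGFF SSD sells for 35 yuan, and even a brand-new 64 GB one costs only 150.

I have decided to keep using this SSD as a consumable. The end. :)

 

Of course, I will keep checking it periodically and update this post as needed.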

 
