V2EX = way to explore

Help interpreting smartctl output

    xiaoyuesanshui · 2023-06-14 17:12:17 +08:00 · 587 views
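    Recently transmission keeps reporting: local data corrupted #1777, pls verify local data

    The NAS has always been shut down cleanly, but the first drive has been in service since 2019, so I can't help worrying about the health of the drives.

    smartctl -H reports PASSED on every drive.
    But after smartctl -t short, two of the drives show results that don't look right: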

    SMART Self-test log structure revision number 1
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Short captive Interrupted (host reset) 90% 31594 -
    # 2 Short captive Interrupted (host reset) 90% 31594 -
    # 3 Short captive Interrupted (host reset) 90% 31594 -
    # 4 Short captive Interrupted (host reset) 90% 31594 -
    # 5 Short captive Interrupted (host reset) 90% 31594 -
    # 6 Short captive Interrupted (host reset) 90% 31594 -
    # 7 Short captive Interrupted (host reset) 90% 31594 -
    # 8 Short captive Interrupted (host reset) 90% 31594 -
    # 9 Short captive Interrupted (host reset) 90% 31594 -
    #10 Short captive Interrupted (host reset) 90% 31594 -
    #11 Short captive Interrupted (host reset) 90% 31590 -
    #12 Short captive Interrupted (host reset) 90% 31590 -
    #13 Short captive Interrupted (host reset) 90% 31590 -
    #14 Short captive Interrupted (host reset) 90% 31590 -
    #15 Short captive Interrupted (host reset) 90% 31590 -
    #16 Short captive Interrupted (host reset) 90% 31590 -
    #17 Short captive Interrupted (host reset) 90% 31590 -
    #18 Short captive Interrupted (host reset) 90% 31590 -
    #19 Short captive Interrupted (host reset) 90% 31590 -
    #20 Short captive Interrupted (host reset) 90% 31590 -
    #21 Short captive Interrupted (host reset) 90% 31590 -
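
To make the pattern in the log above easier to see, here is a minimal Python sketch that parses self-test log rows and counts attempts per LifeTime(hours) value. The sample is an excerpt of four rows from the log; the regular expression and the field names (`desc_status`, `remaining_pct`, and so on) are my own assumptions about the layout, not anything defined by smartctl.

```python
import re
from collections import Counter

# Excerpt of the self-test log quoted above (rows 1, 10, 11 and 21).
LOG = """\
# 1 Short captive Interrupted (host reset) 90% 31594 -
#10 Short captive Interrupted (host reset) 90% 31594 -
#11 Short captive Interrupted (host reset) 90% 31590 -
#21 Short captive Interrupted (host reset) 90% 31590 -
"""

# Assumed layout: "# N  <description and status>  <remaining>%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)$")


def parse_selftest_log(text):
    """Return one dict per log row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        num, desc_status, remaining, hours, lba = m.groups()
        rows.append({
            "num": int(num),
            "desc_status": desc_status,
            "remaining_pct": int(remaining),
            "hours": int(hours),
            "lba": None if lba == "-" else int(lba),
        })
    return rows


rows = parse_selftest_log(LOG)
attempts_per_hour = Counter(r["hours"] for r in rows)
print(attempts_per_hour)
```

Every row stops at 90% remaining with no LBA recorded, and the attempts cluster in two power-on hours, which reads more like repeated interrupted attempts than like 21 separate media errors.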


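    I looked around for information, but didn't find much that helped.

    Later I unmounted every partition on the drive and ran smartctl -t short again, with the same result.

    The output of smartctl -a /dev/sdc is below: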

    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-19-amd64] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family: Western Digital Red
    Device Model: WDC WD40EFRX-68N32N0
    Serial Number: WD-WCC7K7RJ5Z78
    LU WWN Device Id: 5 0014ee 2bbbd7c5e
    Firmware Version: 82.00A82
    User Capacity: 4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes: 512 bytes logical, 4096 bytes physical
    Rotation Rate: 5400 rpm
    Form Factor: 3.5 inches
    Device is: In smartctl database [for details use: -P show]
    ATA Version is: ACS-3 T13/2161-D revision 5
    SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is: Wed Jun 14 17:11:26 2023 CST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x04) Offline data collection activity
                                            was suspended by an interrupting command from host.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (  41) The self-test routine was interrupted
                                            by the host with a hard or soft reset.
    Total time to complete Offline
    data collection:                (43560) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        ( 463) minutes.
    Conveyance self-test routine
    recommended polling time:        (   5) minutes.
    SCT capabilities:              (0x303d) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.
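
The "Self-test execution status: ( 41)" value above can be unpacked by hand. As I understand the ATA SMART data layout, the upper four bits of that byte carry the result code and the lower four bits carry the remaining work in units of 10%. The sketch below decodes it on that assumption; the `STATUS` table is my own paraphrase of the codes and is worth checking against the ATA specification or the smartmontools source before relying on it.

```python
# Assumed meaning of the upper nibble of the self-test execution status byte.
STATUS = {
    0x0: "completed without error (or never run)",
    0x1: "aborted by host",
    0x2: "interrupted by host with a hard or soft reset",
    0x3: "fatal or unknown error, could not complete",
    0x4: "completed: unknown failure",
    0x5: "completed: electrical failure",
    0x6: "completed: servo/seek failure",
    0x7: "completed: read failure",
    0x8: "completed: handling damage suspected",
    0xF: "in progress",
}


def decode_selftest_status(value):
    """Split the status byte into (description, percent remaining)."""
    code = value >> 4                   # upper nibble: result code
    remaining_pct = (value & 0x0F) * 10  # lower nibble: tenths remaining
    return STATUS.get(code, "reserved"), remaining_pct


print(decode_selftest_status(41))
```

41 is 0x29, so this gives code 2 with 90% remaining, which matches the "Interrupted (host reset) 90%" rows in the self-test log.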

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   194   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   162   159   021    Pre-fail  Always       -       6875
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       244
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31595
     10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       220
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
    193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       14413
    194 Temperature_Celsius     0x0022   106   104   000    Old_age   Always       -       44
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       5
    198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
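
The overall health check passes, yet the attribute table above is not entirely clean. Here is a small Python sketch that pulls out the sector-health counters with a non-zero RAW_VALUE. The choice of attributes to watch (5, 197, 198) is my own assumption about what matters most here, and the parser simply assumes whitespace-separated columns with RAW_VALUE in the tenth field.

```python
# Four rows copied from the attribute table above.
ATTRS = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

# Assumption: these IDs are the ones worth flagging; adjust to taste.
WATCH = {5, 197, 198}


def nonzero_sector_counters(text):
    """Return {attribute name: raw value} for watched attributes with raw > 0."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or malformed row
        attr_id, name, raw = int(fields[0]), fields[1], int(fields[9])
        if attr_id in WATCH and raw > 0:
            found[name] = raw
    return found


print(nonzero_sector_counters(ATTRS))  # {'Current_Pending_Sector': 5}
```

One possible reason -H still says PASSED is that Current_Pending_Sector has a THRESH of 000 in this table, so its normalized VALUE is not expected to trip a failure even when the raw count is non-zero. Treat that reading as an assumption on my part.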

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    # 1 Short captive Interrupted (host reset) 90% 31594 -
    # 2 Short captive Interrupted (host reset) 90% 31594 -
    # 3 Short captive Interrupted (host reset) 90% 31594 -
    # 4 Short captive Interrupted (host reset) 90% 31594 -
    # 5 Short captive Interrupted (host reset) 90% 31594 -
    # 6 Short captive Interrupted (host reset) 90% 31594 -
    # 7 Short captive Interrupted (host reset) 90% 31594 -
    # 8 Short captive Interrupted (host reset) 90% 31594 -
    # 9 Short captive Interrupted (host reset) 90% 31594 -
    #10 Short captive Interrupted (host reset) 90% 31594 -
    #11 Short captive Interrupted (host reset) 90% 31590 -
    #12 Short captive Interrupted (host reset) 90% 31590 -
    #13 Short captive Interrupted (host reset) 90% 31590 -
    #14 Short captive Interrupted (host reset) 90% 31590 -
    #15 Short captive Interrupted (host reset) 90% 31590 -
    #16 Short captive Interrupted (host reset) 90% 31590 -
    #17 Short captive Interrupted (host reset) 90% 31590 -
    #18 Short captive Interrupted (host reset) 90% 31590 -
    #19 Short captive Interrupted (host reset) 90% 31590 -
    #20 Short captive Interrupted (host reset) 90% 31590 -
    #21 Short captive Interrupted (host reset) 90% 31590 -

    SMART Selective self-test log data structure revision number 1
    SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
    Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.


    Is there something wrong with this drive?


    Also, after I rebooted the server, the tasks that were erroring in transmission went back to normal.
    Addendum 1  ·  2023-06-16 10:19:32 +08:00
    The result of sudo smartctl -t long /dev/sdc is in:

    # 1 Extended offline Completed: read failure 90% 31615 9047736

    Is this drive about done for?
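
For locating the failing spot, the LBA_of_first_error from the long test (9047736) can be converted into a byte offset and a 4096-byte physical sector index, using the sector sizes and capacity that smartctl -a printed for this drive. This is plain arithmetic; the function name `locate_lba` is just an illustrative helper.

```python
LOGICAL_SECTOR = 512                 # "Sector Sizes: 512 bytes logical"
PHYSICAL_SECTOR = 4096               # "4096 bytes physical"
CAPACITY_BYTES = 4_000_787_030_016   # "User Capacity" from smartctl -a


def locate_lba(lba):
    """Return (byte offset, physical sector index, first LBA of that physical sector)."""
    byte_offset = lba * LOGICAL_SECTOR
    physical_index = byte_offset // PHYSICAL_SECTOR
    first_lba = physical_index * (PHYSICAL_SECTOR // LOGICAL_SECTOR)
    return byte_offset, physical_index, first_lba


offset, phys, first = locate_lba(9047736)
print(offset, phys, first)  # 4632440832 1130967 9047736
print(f"{offset / CAPACITY_BYTES:.2%} of the way into the disk")
```

The failing LBA sits about 4.6 GB from the start of the disk and happens to be the first logical sector of its physical sector, so the eight logical sectors 9047736 through 9047743 share the same 4096-byte physical sector.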
    10 replies  ·  2023-06-16 17:04:30 +08:00
    #1 · paranoiagu · 2023-06-14 18:37:28 +08:00 via Android
    Maybe SMART itself is broken on that drive. I have a WD Red that seems to behave the same way.
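    #2 · xiaoyuesanshui (OP) · 2023-06-15 09:03:27 +08:00
    What do you mean by SMART being broken?
    As I understand it, SMART is a program, isn't it?
    I have five drives in total: sda/b/c/d/e.
    b and d test normally.
    c and e show the same error.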
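    #3 · julyclyde · 2023-06-15 13:30:18 +08:00
    Every test fails at 31594,
    yet no error log was produced.
    Still, since the fault reproduces consistently, I'd suggest replacing the drive.

    If you aren't ready to give up on it, run a long test.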
    #4 · xiaoyuesanshui (OP) · 2023-06-15 13:54:45 +08:00
    31594 is the power-on time in hours.
    The long test is already running.
    #5 · julyclyde · 2023-06-16 13:55:22 +08:00
    @xiaoyuesanshui Oh, I misread the column. I thought 31594 was LBA_of_first_error; so it's actually LifeTime(hours)??

    short tests really do get interrupted easily. Use long.
    #6 · xiaoyuesanshui (OP) · 2023-06-16 13:57:07 +08:00
    @julyclyde The long test finished today:
    Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
    1 Extended offline Completed: read failure 90% 31615 9047736

    197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10

    It looks like the drive is on its way out. I'm already migrating the data and getting ready to replace it.
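    #7 · julyclyde · 2023-06-16 13:59:44 +08:00
    @xiaoyuesanshui A small mercy: at least it's a definite result, so you can act on it with confidence.
    Unlike some junk I've dealt with before, which still reported health OK when it damn well couldn't even communicate anymore.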
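    #8 · xiaoyuesanshui (OP) · 2023-06-16 14:09:55 +08:00
    @julyclyde But I also have an 18 TB drive with a bit over ten thousand hours on it that hasn't seen much use. short never produces a result, and long ends up interrupted too. It's pretty annoying.

    That drive is still under warranty, though, and has no bad sectors so far, so I'm migrating the data onto it first.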
    #9 · julyclyde · 2023-06-16 17:00:29 +08:00
    @xiaoyuesanshui Try running the test while nothing is reading or writing?
    Single-user mode, nothing mounted.
    #10 · xiaoyuesanshui (OP) · 2023-06-16 17:04:30 +08:00
    @julyclyde I did run the long test while nothing was reading or writing, and it was still interrupted.