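NetApp Storage Fails to Boot: Troubleshooting (Initialize and Reinstall the System)
Test environment:
The storage system originally consisted of two independent controllers plus a disk shelf; it now has one controller plus the disk shelf. At power-on, the expansion shelf was switched on first and the controller one minute later. The system would not come up, and after several failed attempts we decided to boot into Maintenance mode to investigate (comparable to Safe Mode in Windows 7).
Troubleshooting:
During startup, press Ctrl+C to interrupt the normal boot and enter the boot menu.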
Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_64/freebsd/image1/kernel:0x200000/10088648 0xb9f0c8/4301024 Entry at 0x80271e20
Loading X86_64/freebsd/image1/platform.ko:0xfba000/1990365 0x11a0000/296352 0x11e85a0/273360
Starting program at 0x80271e20
NetApp Data ONTAP 8.3.1P2
Copyright (C) 1992-2015 NetApp.
All rights reserved.
Checking boot device filesystem
** /dev/da0s1
** Phase 1 - Read and Compare FATs
** Phase 2 - Check Cluster Chains
** Phase 3 - Checking Directories
** Phase 4 - Checking for Lost Files
69 files, 1011584 free (31612 clusters)
MARK FILE SYSTEM CLEAN? yes
MARKING FILE SYSTEM CLEAN
Retry #1 of 5: /sbin/fsck_msdosfs /dev/da0s1
Retry #2 of 5: /sbin/fsck_msdosfs /dev/da0s1
Repaired boot device filesystem
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
^CBoot Menu will be available.
WARNING: The battery is unfit to retain data during a power
outage. This is likely because the battery is
discharged but could be due to other temporary
conditions.
When the battery is ready, the boot process will
complete and services will be engaged.
To override this delay, press 'c' followed by 'Enter'
c
CAUTION: Using this appliance without NVRAM
battery backup coupled with a power
failure condition CAN CAUSE DATA LOSS.
Are you sure you want to continue (y or n)? y
Proceeding without NVRAM battery backup.
Please choose one of the following:
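(1) Normal Boot. # Normal startup
(2) Boot without /etc/rc. # Boot without applying the settings in /etc/rc
(3) Change password. # Reset the superuser password here if it has been forgotten
(4) Clean configuration and initialize all disks. # Wipe the configuration and initialize all disks
(5) Maintenance mode boot. # Enter Maintenance mode; worth trying when the system will not boot normally
(6) Update flash from backup config. # Update the flash from the backup configuration
(7) Install new software first. # Install new software
(8) Reboot node. # Reboot the node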
Selection (1-8)? 5
ixgbe: e1a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e1b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
Ipspace "iwarp-ipspace" created
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
add host 127.0.10.1: gateway 127.0.20.1
5
You have selected the maintenance boot option:
the system has booted in maintenance mode allowing the
following operations to be performed:
? acorn
acpadmin aggr
cna_flash disk
disk_latency disk_list
disk_mung disk_shelf
diskcopy disktest
dumpblock environment
fcadmin fcstat
fctest fru_led
ha-config halt
help ifconfig
key_manager led_off
led_on nv8
raid_config sasadmin
sasstat scsi
sesdiag sldiag
storage stsb
sysconfig systemshell
ucadmin version
vmservices vol
vol_db vsa
xortest
Type "help <command>" for more details.
In a High Availablity configuration, you MUST ensure that the
partner node is (and remains) down, or that takeover is manually
disabled on the partner node, because High Availability
software is not started or fully enabled in Maintenance mode.
FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED
NOTE: It is okay to use 'show/status' sub-commands such as
'disk show or aggr status' in Maintenance mode while the partner is up
Continue with boot? y
y
Ipspace "acp-ipspace" created
original max threads=40, original heap size=41943040
bip_nitro Virtual Size Limit=79455027 Bytes
bip_nitro: user memory=724406272, actual max threads=41, actual heap size=43201331
WARNING: Giving up waiting for mroot
Tue Feb 14 07:59:49 UTC 2017
*> ? # Lists the commands supported in Maintenance mode
? disktest key_manager stsb
acorn dumpblock led_off sysconfig
acpadmin environment led_on systemshell
aggr fcadmin nv8 ucadmin
cna_flash fcstat raid_config version
disk fctest sasadmin vmservices
disk_latency fru_led sasstat vol
disk_list ha-config scsi vol_db
disk_mung halt sesdiag vsa
disk_shelf help sldiag xortest
diskcopy ifconfig storage
*> disk show
Local System ID: 1575136460
DISK OWNER POOL SERIAL NUMBER HOME DR HOME
------------ ------------- ----- ------------- ------------- -------------
0a.10.6 sz-3240-02(1575136687) Pool0 LXW6RH4M sz-3240-02(1575136687)
0b.10.7 sz-3240-02(1575136687) Pool0 LXW63XYM sz-3240-02(1575136687)
0a.10.2 sz-3240-01(1575136460) Pool0 LXW72ZGM sz-3240-01(1575136460)
0b.10.5 sz-3240-01(1575136460) Pool0 LXW1W02M sz-3240-01(1575136460)
0b.10.11 sz-3240-02(1575136687) Pool0 LXW6364M sz-3240-02(1575136687)
0a.10.8 sz-3240-02(1575136687) Pool0 LXV3HE7L sz-3240-02(1575136687)
0b.10.9 sz-3240-02(1575136687) Pool0 LXW5YNSM sz-3240-02(1575136687)
0a.10.4 sz-3240-01(1575136460) Pool0 LXWT76HL sz-3240-01(1575136460)
0b.10.3 sz-3240-01(1575136460) Pool0 LXW6ELRM sz-3240-01(1575136460)
0a.10.10 sz-3240-02(1575136687) Pool0 LXW6DTTM sz-3240-02(1575136687)
0b.10.1 sz-3240-01(1575136460) Pool0 LXW6R84M sz-3240-01(1575136460)
0a.10.0 sz-3240-01(1575136460) Pool0 LXV3GV4L sz-3240-01(1575136460)
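As the output above shows, the 12 disks are split evenly between the two controllers. Only one controller is present now, so the system most likely resides on disks owned by the other controller, and because that controller is missing, the system cannot boot.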
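The ownership split can also be tallied from a captured copy of the `disk show` output instead of being read off by eye. The following is a minimal sketch, assuming the output has been saved as text in the column layout shown above; the function name `count_disks_by_owner` and the `SAMPLE` excerpt are illustrative and not part of Data ONTAP.

```python
import re
from collections import Counter

# Matches a disk row up to its OWNER column, e.g. "0a.10.6 sz-3240-02(1575136687)".
# Header and separator lines contain no "(digits)" and are skipped.
OWNER_RE = re.compile(r"^(\S+)\s+(\S+)\((\d+)\)")


def count_disks_by_owner(disk_show_text):
    """Return a Counter mapping owner system ID -> number of disks."""
    counts = Counter()
    for line in disk_show_text.splitlines():
        match = OWNER_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Shortened excerpt of the output above (three of the twelve rows).
SAMPLE = """\
Local System ID: 1575136460
DISK OWNER POOL SERIAL NUMBER HOME DR HOME
0a.10.6 sz-3240-02(1575136687) Pool0 LXW6RH4M sz-3240-02(1575136687)
0a.10.2 sz-3240-01(1575136460) Pool0 LXW72ZGM sz-3240-01(1575136460)
0b.10.5 sz-3240-01(1575136460) Pool0 LXW1W02M sz-3240-01(1575136460)
"""

if __name__ == "__main__":
    for sysid, disk_count in sorted(count_disks_by_owner(SAMPLE).items()):
        print(sysid, disk_count)
```

Any system ID in the result that differs from the `Local System ID` line belongs to a controller that is not the one currently booted.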
Now manually assign all of the disks to the current controller.
*> disk reassign -s 1575136687 -d 1575136460
# Reassign all disks owned by controller 1575136687 to controller 1575136460
# Syntax: reassign {-s <old_sysid>} [-d <new_sysid>] [-p <partner_sysid>] - reassign disks from old filer
Partner node must not be in Takeover mode during disk reassignment from maintenance mode.
Serious problems could result!!
Do not proceed with reassignment if the partner is in takeover mode. Abort reassignment (y/n)? n
After the node becomes operational, you must perform a takeover and giveback of the HA partner node to ensure disk reassignment is successful.
Do you want to continue (y/n)? y
Disk ownership will be updated on all disks previously belonging to Filer with sysid 1575136687.
Do you want to continue (y/n)? y
Cannot do remote rescan. Use 'run local disk show' on the console of sz-3240-01 for it to scan the newly assigned disks
Feb 14 08:04:52 [sz-3240-01:diskown.RescanMessageFailed:warning]: Could not send rescan message to sz-3240-01. Use the "disk show" command in nodeshell of sz-3240-01 for it to scan the newly inserted disks.
*> disk show
Local System ID: 1575136460
DISK OWNER POOL SERIAL NUMBER HOME DR HOME
------------ ------------- ----- ------------- ------------- -------------
0a.10.6 sz-3240-01(1575136460) Pool0 LXW6RH4M sz-3240-01(1575136460)
0b.10.7 sz-3240-01(1575136460) Pool0 LXW63XYM sz-3240-01(1575136460)
0a.10.2 sz-3240-01(1575136460) Pool0 LXW72ZGM sz-3240-01(1575136460)
0b.10.5 sz-3240-01(1575136460) Pool0 LXW1W02M sz-3240-01(1575136460)
0b.10.11 sz-3240-01(1575136460) Pool0 LXW6364M sz-3240-01(1575136460)
0a.10.8 sz-3240-01(1575136460) Pool0 LXV3HE7L sz-3240-01(1575136460)
0b.10.9 sz-3240-01(1575136460) Pool0 LXW5YNSM sz-3240-01(1575136460)
0a.10.4 sz-3240-01(1575136460) Pool0 LXWT76HL sz-3240-01(1575136460)
0b.10.3 sz-3240-01(1575136460) Pool0 LXW6ELRM sz-3240-01(1575136460)
0a.10.10 sz-3240-01(1575136460) Pool0 LXW6DTTM sz-3240-01(1575136460)
0b.10.1 sz-3240-01(1575136460) Pool0 LXW6R84M sz-3240-01(1575136460)
0a.10.0 sz-3240-01(1575136460) Pool0 LXV3GV4L sz-3240-01(1575136460)
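Checking that no disk is still owned by the old system ID can be scripted as well. This is a rough sketch under the assumption that the `disk show` text was captured from the console; `disks_not_owned_locally` is a hypothetical helper name, and only the OWNER column is inspected.

```python
import re


def disks_not_owned_locally(disk_show_text):
    """Return the names of disks whose OWNER system ID differs from the local one."""
    local = re.search(r"Local System ID:\s*(\d+)", disk_show_text)
    if local is None:
        raise ValueError("no 'Local System ID' line found")
    local_id = local.group(1)

    foreign = []
    for line in disk_show_text.splitlines():
        # First column is the disk name, second is "owner(sysid)".
        match = re.match(r"\s*(\S+)\s+\S+?\((\d+)\)", line)
        if match and match.group(2) != local_id:
            foreign.append(match.group(1))
    return foreign
```

An empty list means every listed disk reports the local system ID as its owner, which is the state shown in the output above.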
All disks now belong to the remaining controller. Next, reinstall the storage operating system:
*> halt
Waiting for PIDS: 624.
Terminated
Uptime: 8m46s
System halting...
Phoenix TrustedCore(tm) Server
Copyright 1985-2006 Phoenix Technologies Ltd.
All Rights Reserved
BIOS version: 5.3.0
Portions Copyright (c) 2007-2014 NetApp, Inc. All Rights Reserved
CPU = 1 Processors Detected, Cores per Processor = 4
Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Testing RAM
512MB RAM tested
8192MB RAM installed
6144 KB L2 Cache
System BIOS shadowed
USB 2.0: MICRON eUSB DISK
BIOS is scanning PCI Option ROMs, this may take a few seconds...
+++++++++++++++++++
Boot Loader version 3.6
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2014 NetApp, Inc. All Rights Reserved.
CPU Type: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Once the machine comes back up, boot the storage operating system manually:
LOADER-A> boot_ontap
Loading X86_64/freebsd/image1/kernel:0x200000/10088648 0xb9f0c8/4301024 Entry at 0x80271e20
Loading X86_64/freebsd/image1/platform.ko:0xfba000/1990365 0x11a0000/296352 0x11e85a0/273360
Starting program at 0x80271e20
NetApp Data ONTAP 8.3.1P2
Copyright (C) 1992-2015 NetApp.
All rights reserved.
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
^CBoot Menu will be available.
WARNING: The battery is unfit to retain data during a power
outage. This is likely because the battery is
discharged but could be due to other temporary
conditions.
When the battery is ready, the boot process will
complete and services will be engaged.
To override this delay, press 'c' followed by 'Enter'
c
CAUTION: Using this appliance without NVRAM
battery backup coupled with a power
failure condition CAN CAUSE DATA LOSS.
Are you sure you want to continue (y or n)? y
Proceeding without NVRAM battery backup.
Please choose one of the following:
(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
Selection (1-8)? 4
ixgbe: e1a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e1b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
Ipspace "iwarp-ipspace" created
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
add host 127.0.10.1: gateway 127.0.20.1
Zero disks, reset config and install a new file system?:
Please answer yes or no
Zero disks, reset config and install a new file system?: yes
This will erase all the data on the disks, are you sure?: y
Rebooting to finish wipeconfig request.
Waiting for PIDS: 615.
Skipped backing up /var file system to CF.
Terminated
.
Uptime: 3m13s
System rebooting...
Phoenix TrustedCore(tm) Server
Copyright 1985-2006 Phoenix Technologies Ltd.
All Rights Reserved
BIOS version: 5.3.0
Portions Copyright (c) 2007-2014 NetApp, Inc. All Rights Reserved
CPU = 1 Processors Detected, Cores per Processor = 4
Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Testing RAM
512MB RAM tested
8192MB RAM installed
6144 KB L2 Cache
System BIOS shadowed
USB 2.0: MICRON eUSB DISK
BIOS is scanning PCI Option ROMs, this may take a few seconds...
+++++++++++++++++++
Boot Loader version 3.6
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2014 NetApp, Inc. All Rights Reserved.
CPU Type: Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_64/freebsd/image1/kernel:0x200000/10088648 0xb9f0c8/4301024 Entry at 0x80271e20
Loading X86_64/freebsd/image1/platform.ko:0xfba000/1990365 0x11a0000/296352 0x11e85a0/273360
Starting program at 0x80271e20
NetApp Data ONTAP 8.3.1P2
Copyright (C) 1992-2015 NetApp.
All rights reserved.
*******************************
* *
* Press Ctrl-C for Boot Menu. *
* *
*******************************
Wipe filer procedure requested.
WARNING: The battery is unfit to retain data during a power
outage. This is likely because the battery is
discharged but could be due to other temporary
conditions.
When the battery is ready, the boot process will
complete and services will be engaged.
To override this delay, press 'c' followed by 'Enter'
c
CAUTION: Using this appliance without NVRAM
battery backup coupled with a power
failure condition CAN CAUSE DATA LOSS.
Are you sure you want to continue (y or n)? y
Proceeding without NVRAM battery backup.
ixgbe: e1a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e1b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2a: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
ixgbe: e2b: ** JUMBOMBUF DEBUG ** switching to large buffers(9k -> 3k): (sz = 5120)!
original max threads=40, original heap size=41943040
bip_nitro Virtual Size Limit=80844390 Bytes
bip_nitro: user memory=742682624, actual max threads=42, actual heap size=44459622
Ipspace "iwarp-ipspace" created
WAFL CPLEDGER is enabled. Checklist = 0x7ff841ff
add host 127.0.10.1bootarg.bootmenu.selection is |4a|
: gateway 127.0.20.1
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
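What follows is a long wait. During initialization all disks are zeroed at the same time, so the duration does not depend on how many disks there are, only on disk capacity and rotational speed.
After the reboot completes, the system enters the initial setup dialog, which covers cluster settings, IP addresses, and so on (to be covered in a later post; stay tuned).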
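The reasoning that the wait depends on per-disk capacity and speed rather than on disk count can be put into a back-of-the-envelope estimate. The sketch below assumes the whole disk is written once at a constant sustained throughput and that all disks run in parallel; both function names and any throughput figure passed in are hypothetical, not values reported by Data ONTAP.

```python
def estimate_disk_hours(capacity_gb, write_mb_per_s):
    """Rough hours needed to write one full pass over a single disk."""
    if capacity_gb <= 0 or write_mb_per_s <= 0:
        raise ValueError("capacity and throughput must be positive")
    seconds = capacity_gb * 1000 / write_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600


def estimate_total_hours(disk_capacities_gb, write_mb_per_s):
    """With all disks processed in parallel, the largest disk sets the total."""
    return max(estimate_disk_hours(c, write_mb_per_s) for c in disk_capacities_gb)
```

Under these assumptions, twelve identical disks take as long as one, while a larger or slower disk lengthens the wait for the whole shelf.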