Tuesday, August 6, 2013

Replacing a drive in a NetApp filer

The NetApp filer in the lab recently suffered a disk failure.  With the failed disk confirmed dead and removed, and the replacement disk inserted, the next step was to assign the new disk to the filer.  The first attempt failed:
fas3050clow*> disk assign 0a.29
disk 0a.29 (S/N 3HY0T1GG00007342W9NJ) is already owned by system cr2conffd03 (ID 84173417).
disk assign: Assign failed for one or more disks in the disk list.
A detour.  The following output excerpt confirmed that the disk still carried ownership information from a previous filer in its DNA:
fas3050clow*> disk show -a
  DISK       OWNER                  POOL   SERIAL NUMBER
------------ -------------          -----  -------------
0a.29        cr2conffd03(84173417)   Pool0  3HY0T1GG00007342W9NJ
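When several disks arrive from other systems, the same check can be scripted.  The following is a minimal sketch, assuming the `disk show -a` output has been captured to text and that every owned disk appears in the same four-column layout as the row above.  The function names `parse_disk_show` and `foreign_disks` are made up for illustration and are not part of any NetApp tooling.

```python
import re

# Sample text modeled on the console output above.
SAMPLE = """\
  DISK       OWNER                  POOL   SERIAL NUMBER
------------ -------------          -----  -------------
0a.29        cr2conffd03(84173417)   Pool0  3HY0T1GG00007342W9NJ
"""

# One data row: disk name, owner name with numeric system ID, pool, serial.
# Assumption: rows look like the single example in this post.
ROW = re.compile(r"^(\S+)\s+(\S+)\((\d+)\)\s+(\S+)\s+(\S+)\s*$")


def parse_disk_show(text):
    """Return one dict per row that names an owner; other lines are skipped."""
    disks = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            disk, owner, system_id, pool, serial = match.groups()
            disks.append({
                "disk": disk,
                "owner": owner,
                "owner_id": int(system_id),
                "pool": pool,
                "serial": serial,
            })
    return disks


def foreign_disks(text, local_name):
    """Return the disks whose recorded owner is not the local filer."""
    return [d for d in parse_disk_show(text) if d["owner"] != local_name]


if __name__ == "__main__":
    for d in foreign_disks(SAMPLE, "fas3050clow"):
        print(d["disk"], "is owned by", d["owner"], "ID", d["owner_id"])
```

The header and separator rows do not match the pattern, so only real disk rows are reported.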
Quick help from the community set me in the right direction.  A few commands accomplished the required task:
fas3050clow*> priv set advanced
fas3050clow*> disk assign 0a.29 -s unowned -f
Note: Disks may be automatically assigned to this node, since option disk.auto_assign is on.
fas3050clow*> disk assign 0a.29
Thu May 13 13:30:56 CDT [fas3050clow: diskown.changingOwner:info]: changing ownership for disk 0a.29 (S/N 3HY0T1GG00007342W9NJ) from unowned (ID -1) to fas3050clow (ID 101175198)
Thu May 13 13:30:56 CDT [fas3050clow: HTTPPool00:warning]: HTTP XML Authentication failed from 192.168.110.71.
fas3050clow*> Thu May 13 13:30:56 CDT [fas3050clow: diskown.RescanMessageFailed:warning]: Could not send rescan message to fas3050clow. Please type disk show on the console of fas3050clow for it to scan the newly inserted disks.
Thu May 13 13:30:56 CDT [fas3050clow: raid.assim.label.upgrade:info]: Upgrading RAID labels.
Thu May 13 13:30:57 CDT [fas3050clow: disk.fw.downrevWarning:warning]: 1 disks have downrev firmware that you need to update.
Thu May 13 13:31:00 CDT [fas3050clow: monitor.globalStatus.ok:info]: The system's global status is normal.
Shortly after, the firmware on the replacement disk was automatically upgraded:
Thu May 13 13:31:18 CDT [fas3050clow: dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X274_SCHT6146F10.NA16.LOD on 1 disk(s) of plex [Pool0]…
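The console messages in this post share one shape: a timestamp, then `[host: event:severity]:`, then free text.  As a rough sketch, assuming that shape holds and that terminal line wrapping has been removed first, the warnings can be pulled out of a captured session like this.  The names `parse_line` and `warning_events` are illustrative only.

```python
import re

# Timestamp, then "[host: event.name:severity]: message".
# Assumption: all messages follow the layout seen in this post.
LOG_LINE = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{2,5}) "
    r"\[(?P<host>[^:\]]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)"
)

# Lines taken from the session above, already unwrapped.
SAMPLE = [
    "Thu May 13 13:30:57 CDT [fas3050clow: disk.fw.downrevWarning:warning]: "
    "1 disks have downrev firmware that you need to update.",
    "fas3050clow*> Thu May 13 13:30:56 CDT [fas3050clow: "
    "diskown.RescanMessageFailed:warning]: Could not send rescan message to fas3050clow.",
    "Thu May 13 13:31:00 CDT [fas3050clow: monitor.globalStatus.ok:info]: "
    "The system's global status is normal.",
]


def parse_line(line):
    """Return the message fields as a dict, or None if the line is not a log message."""
    # search() rather than match() so a leading console prompt is tolerated.
    match = LOG_LINE.search(line)
    return match.groupdict() if match else None


def warning_events(lines):
    """Return the event names of every message logged at warning severity."""
    parsed = (parse_line(line) for line in lines)
    return [p["event"] for p in parsed if p and p["severity"] == "warning"]


if __name__ == "__main__":
    for event in warning_events(SAMPLE):
        print(event)
```

Plain command lines such as `fas3050clow*> priv set` do not match and are ignored.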
I confirmed via NetApp System Manager (my GUI crutch) that the replacement disk is now a spare for the two aggregates owned by the head.  I then updated the storage array spreadsheet I maintain, which tracks disks, spares, arrays, LUNs, aggregates, volumes, exports, groups, pools, etc. for the various lab storage.
One additional item I learned from a NetApp engineer is that spares are not meant to remain static.  Rather, the spare role is designed to float between disks as failures occur.  This runs against a habit I'm learning to break, carried over from older storage arrays where a spare pressed into active duty was returned to spare status once the failed disk was replaced.
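The floating spare idea can be illustrated with a toy model.  This is only a sketch of the bookkeeping, not of how the filer works internally.  The slot names other than 0a.29, the starting roles, and the function `replace_failed_disk` are all hypothetical.

```python
def replace_failed_disk(roles, failed_slot):
    """Model one failure and replacement with a floating spare.

    roles maps a disk slot to "data" or "spare".  A spare takes over the
    data role and keeps it; the new disk in the failed slot becomes the spare.
    """
    if roles.get(failed_slot) != "data":
        raise ValueError(f"{failed_slot} is not a data disk")
    spares = sorted(slot for slot, role in roles.items() if role == "spare")
    if not spares:
        raise RuntimeError("no spare available")
    updated = dict(roles)
    updated[spares[0]] = "data"      # the spare stays in active duty
    updated[failed_slot] = "spare"   # the replacement disk is the new spare
    return updated


if __name__ == "__main__":
    # Hypothetical starting layout for illustration.
    before = {"0a.27": "data", "0a.28": "spare", "0a.29": "data"}
    after = replace_failed_disk(before, "0a.29")
    for slot in sorted(after):
        print(slot, after[slot])
```

After the call, the spare role has moved from 0a.28 to 0a.29, and nothing moves it back.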
Don't forget to exit advanced privilege mode when done:
fas3050clow*> priv set
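For repeat use, the command sequence from this post can be collected in one place.  The sketch below only builds the command strings for a given disk; it does not connect to a filer.  The function name `reassign_commands` is made up, and the disk name pattern is an assumption drawn from the single example 0a.29.

```python
import re

# Assumption: disk names look like "0a.29" (adapter, dot, disk ID),
# based only on the one name that appears in this post.
DISK_NAME = re.compile(r"^[0-9]+[a-z]\.[0-9]+$")


def reassign_commands(disk):
    """Return the console commands used above to take over a foreign-owned disk."""
    if not DISK_NAME.match(disk):
        raise ValueError(f"unexpected disk name: {disk!r}")
    return [
        "priv set advanced",                  # enter advanced privilege mode
        f"disk assign {disk} -s unowned -f",  # clear the previous owner
        f"disk assign {disk}",                # assign the disk to this filer
        "priv set",                           # leave advanced mode, as in this post
    ]


if __name__ == "__main__":
    for command in reassign_commands("0a.29"):
        print(command)
```

Printing the list before typing anything makes it easy to check the disk name, which matters because the `-f` flag forces the ownership change.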
