Thursday, January 30, 2014

Restoring data from Snapshots using the snap restore command

OK, so you have allocated the correct snap reserve space, configured snap schedules and snap autodelete, and users can access their snapshots and recover their own data without involving the backup team. Everyone is happy, so you are happy. Then, all of a sudden, on a Friday evening you get a call from the VP of Marketing, close to tears, saying he has lost all the data on his network drive. Windows shows a recovery time of 2 hours, but he needs his 1 GB PST accessible now, because he is on VPN with a client and needs to pull some old mails from it. That's nothing unusual: he had a lot of data, and to recover it Windows has to read everything from the snapshot and write it back to the network drive, which obviously takes time. So what do you do? Tell him to navigate to his PST and recover it first (which shouldn't take long on a fast connection) and then recover the rest, or tell him "OK, I have recovered all your data" while you are still on the phone and become the hero?
I'd certainly take the chance to become the hero with a minute or less of work, but before we do, there are a few things to note.
For volume snaprestore:
·                     The volume must be online and must not be a mirror.
·                     When reverting the root volume, the filer will be rebooted.
·                     Non-root volumes do not require a reboot; however, when reverting a non-root volume, all ongoing access to the volume must be terminated, just as when a volume is brought offline.
For single-file snaprestore:
·                     The volume used for restoring the file must be online and must not be a mirror.
·                     If restore_as_path is specified, the path must be a full path to a filename, and must be in the same volume as the volume used for the restore.
·                     Files other than normal files and LUNs are not restored. This includes directories (and their contents), and files with NT streams.
·                     If there is not enough space in the volume, the single-file snaprestore will not start.
·                     If the file already exists (in the active file system), it will be overwritten with the version in the snapshot.
There are two ways to restore data. The first is for system admins: the "snap restore" command, invoked by SMO, SMVI, FilerView or the system console. The second is for end users, who can restore by copying files from the .snapshot or ~snapshot directory, or by using the revert function in Windows XP or newer. Restoring data with the snap restore command is very quick (seconds), even for TBs of data. The syntax for snap restore is as follows:
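“snap restore -t vol -s <snapshot_name> <volume_name>” to revert a whole volume, or
“snap restore -t file -s <snapshot_name> -r <restore_as_path> <path_to_file>” to restore a single file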
If you don't want to restore the data to a different location, omit the -r argument, and the filer will replace the current file with the version from the snapshot. If you don't specify a snapshot name, the system lists all available snapshots and prompts you to select the one you want to restore from.
Here's the simplest form of the command, used to recover a single file:
testfiler> snap restore -t file /vol/testvol/RootQtree/test.pst
WARNING! This will restore a file from a snapshot into the active filesystem. If the file already exists in the active filesystem, it will be overwritten with the contents from the snapshot.
Are you sure you want to do this? yes
The following snapshots are available for volume testvol:
date            name
------------    ---------
Nov 17 13:00    hourly.0
Nov 17 11:00    hourly.1
Nov 17 09:00    hourly.2
Nov 17 00:00    weekly.0
Nov 16 21:00    hourly.3
Nov 16 19:00    hourly.4
Nov 16 17:00    hourly.5
Nov 16 15:00    hourly.6
Nov 16 00:00    nightly.0
Nov 15 00:00    nightly.1
Nov 14 00:00    nightly.2
Nov 13 00:00    nightly.3
Nov 12 00:00    nightly.4
Nov 11 00:00    nightly.5
Nov 10 00:00    weekly.1
Nov 09 00:00    nightly.6
Nov 03 00:00    weekly.2
Oct 27 00:00    weekly.3
Which snapshot in volume testvol would you like to revert the file from? nightly.5
You have selected file /vol/testvol/RootQtree/test.pst, snapshot nightly.5

Proceed with restore? yes
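If you usually restore "as of" a certain time, a small helper can pick the right snapshot from a listing like the one above. This is a minimal Python sketch, assuming you have captured the `date  name` rows as text; because the listing omits the year, the `year` argument is an assumption you supply. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def parse_snapshot_list(text: str, year: int) -> list[tuple[datetime, str]]:
    """Parse rows like 'Nov 17 13:00    hourly.0' into (timestamp, name) pairs.

    The listing has no year, so the caller supplies one (an assumption).
    Header and separator rows are skipped because they don't parse as dates.
    """
    snaps = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        month, day, hhmm, name = parts
        try:
            when = datetime.strptime(f"{year} {month} {day} {hhmm}", "%Y %b %d %H:%M")
        except ValueError:
            continue
        snaps.append((when, name))
    return snaps


def latest_before(snaps: list[tuple[datetime, str]], target: datetime) -> str | None:
    """Return the name of the newest snapshot taken at or before `target`."""
    candidates = [s for s in snaps if s[0] <= target]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s[0])[1]
```

With the full listing above and an assumed year of 2013, `latest_before(snaps, datetime(2013, 11, 11, 12, 0))` returns `'nightly.5'`, the name you would then type at the prompt.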

Wednesday, January 29, 2014

Data Deduplication Concepts

Data Deduplication

What is Data Deduplication?

Data deduplication essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored; however, an index of all the data is retained in case any of it is ever required. Deduplication reduces the required storage capacity because only unique data is stored.

For example, a typical email system might contain 100 instances of the same one megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy. In this example, a 100 MB storage demand could be reduced to only 1 MB.
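The email example can be reproduced with a few lines of Python. This sketch performs file-level deduplication: each file is fingerprinted with SHA-1 (one of the hash algorithms mentioned below), only the first copy of each fingerprint is stored, and every instance keeps a reference to that copy. The attachment content is dummy data.

```python
import hashlib


def dedupe_files(files):
    """File-level deduplication: store one copy per unique content fingerprint.

    Returns (store, refs): store maps fingerprint -> the single stored copy,
    refs holds one fingerprint per original file (its pointer to that copy).
    """
    store = {}
    refs = []
    for data in files:
        fp = hashlib.sha1(data).hexdigest()
        store.setdefault(fp, data)  # only the first instance is actually kept
        refs.append(fp)
    return store, refs


attachment = b"x" * (1024 * 1024)     # a 1 MB attachment (dummy content)
mailbox_copies = [attachment] * 100   # 100 instances across the mail system
store, refs = dedupe_files(mailbox_copies)
logical_mb = sum(len(f) for f in mailbox_copies) // 2**20
physical_mb = sum(len(d) for d in store.values()) // 2**20
print(logical_mb, "MB ->", physical_mb, "MB")  # 100 MB -> 1 MB
```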

Why Data Deduplication?
In general, data deduplication improves data protection, increases the speed of service, and reduces costs.
Lower storage space requirements save money on disk expenditure.
The more efficient use of disk space also allows for longer disk retention periods, which provides better recovery time objectives (RTO) for a longer time and reduces the need for tape backups.
Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication, and disaster recovery.

How Does Data Deduplication Work?
Data deduplication can generally operate at the file, block, and even the bit level.
File deduplication eliminates duplicate files (as in the example above), but this is not a very efficient means of deduplication.
Block and bit deduplication look within a file and save unique iterations of each block or bit.
Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This generates a practically unique number (a fingerprint) for each piece, which is then stored in an index. If a file is updated, only the changed data is saved: if only a few bytes of a document or presentation change, only the changed blocks are saved, and the changes don't constitute an entirely new file. This behavior makes block and bit deduplication far more efficient. However, block and bit deduplication take more processing power and use a much larger index to track the individual pieces.
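The following Python sketch shows the block-level idea: data is split into fixed 4 KB blocks, each block is fingerprinted with SHA-1, and a block is stored only if its fingerprint is new. When a few bytes of the file change, only the one affected block is added. The block size and function names are illustrative choices, not any vendor's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed 4 KB blocks (illustrative choice)


def write_file(data: bytes, index: dict) -> list:
    """Split data into fixed-size blocks and store only blocks whose
    fingerprint is not already in the index. Returns the file's list of
    fingerprints, i.e. the pointers needed to rebuild it."""
    recipe = []
    for off in range(0, len(data), BLOCK_SIZE):
        chunk = data[off:off + BLOCK_SIZE]
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in index:
            index[fp] = chunk  # new, unique block: store it
        recipe.append(fp)
    return recipe


def read_file(recipe: list, index: dict) -> bytes:
    """Rebuild a file by following its fingerprints into the block index."""
    return b"".join(index[fp] for fp in recipe)


index = {}
v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))  # 4 distinct blocks
write_file(v1, index)

v2 = bytearray(v1)
v2[5000:5005] = b"hello"  # edit a few bytes inside the second block
recipe_v2 = write_file(bytes(v2), index)

print(len(index))  # 5: the 4 original blocks plus the 1 changed block
```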

Deduplication in NetApp Environment

NetApp has implemented deduplication at the fixed block (4 KB) level, which gives more space savings and works very efficiently. Because it works at the block level, irrespective of file type or data format, you can deduplicate any type of file on either CIFS or NFS, and you can even deduplicate a LUN of any size, no matter where its blocks are written in the volume.
This picture gives a high-level overview of data before and after the deduplication process. Boxes of the same colour denote blocks with identical data. Before deduplication, the duplicate blocks were written to different areas of the disk; when the deduplication process runs, it identifies all the duplicate blocks and removes them, so only unique blocks of data remain in the volume.
As stated before, the deduplication process runs at the storage level, so no configuration is required on the application side, and applications keep accessing the data as before. Because the system creates fingerprints while writing new data, there is a negligible performance impact on your system; however, if your filer is heavily utilized and constantly above 50% utilization, the performance impact will average around 15%.
Under the hood
Whenever new data is written to a FlexVol that has A-SIS (NetApp's term for deduplication) turned on, ONTAP creates a fingerprint for every block of data it writes, for later comparison. At this point the system writes the data like any other system, except that it records some extra information: a fingerprint for every block. You then either start the deduplication process manually or schedule it to run at a specific time. Once the deduplication process starts, the fingerprints are checked for duplicates. When duplicates are found, a byte-by-byte comparison of the blocks is done first to make sure the blocks are indeed identical; if they are, the block's pointer is updated to the already existing data block and the new (duplicate) data block is released.
The maximum sharing for a block is 255. This means, for example, that if there are 500 duplicate blocks, deduplication would reduce that to only 2 blocks. Also note that this ability to share blocks is different from the ability to keep 255 Snapshot copies for a volume.
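To make the fingerprint check, the byte-by-byte verification and the 255-reference limit concrete, here is a toy Python model. It is not ONTAP's implementation: it deduplicates inline as blocks arrive rather than as a scheduled post-process, and it only keeps the structures needed to show the arithmetic. With 500 identical blocks it ends up with 2 physical blocks (255 + 245 references), matching the example above.

```python
from __future__ import annotations

import hashlib

MAX_SHARING = 255  # per the text: at most 255 references per physical block


def dedupe(blocks: list[bytes]) -> list[int]:
    """Map each logical block to a physical block id.

    Fingerprints find candidate matches; a byte-by-byte compare confirms
    them (fingerprints can collide); a physical block is shared by at most
    MAX_SHARING logical blocks, after which a new copy is kept.
    """
    physical = []   # physical block contents
    refcount = []   # references per physical block
    by_fp = {}      # fingerprint -> candidate physical block ids
    mapping = []
    for blk in blocks:
        fp = hashlib.sha1(blk).hexdigest()
        target = None
        for pid in by_fp.get(fp, []):
            if physical[pid] == blk and refcount[pid] < MAX_SHARING:
                target = pid
                break
        if target is None:  # no usable match: keep this block as a new copy
            target = len(physical)
            physical.append(blk)
            refcount.append(0)
            by_fp.setdefault(fp, []).append(target)
        refcount[target] += 1
        mapping.append(target)
    return mapping


mapping = dedupe([b"\0" * 4096] * 500)
print(len(set(mapping)))  # 2: 255 references on one block, 245 on the next
```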

Deduplication in EMC Environment
This is just a high-level overview of deduplication in EMC, as EMC only added a dedupe function to its Celerra range of products in January 2009.
EMC has deployed file-level deduplication on its newer Celerra models, in conjunction with compression technology. Because it works at the file level, deduplication is very fast; however, it gives very small savings, because two or more files must be identical to be dedupe candidates. The compression technology used by EMC gives an additional level of space savings, and because it uses spare CPU cycles on the system, you don't have to invest in expensive specialized compression products. However, even with deduplication and compression working together, it gives lower storage savings than NetApp's fixed-block deduplication. How? Here are the details:
Because it works at the file level and files must match exactly to be deduplicated, VMDK files and any LUNs created on the storage are not deduplicated.
It targets only infrequently accessed files, as compressing active files is not a good idea.
By default, files larger than 200 MB are left untouched.
Compression works only on files larger than 24 KB.
It disables MPFS.
It has a performance impact when reading deduplicated and compressed files.
Celerra Data Deduplication vs. NetApp Data Deduplication
User interface
  Celerra: GUI; simple graphical user interface, a one-click operation to enable
  NetApp: CLI only; cryptic commands with limited flexibility
Additional space savings on duplicate PLUS unique data
  NetApp does not offer compression; this makes EMC more efficient in saving space
Unlimited file system size
  Celerra: EMC supports a 16 TB file system size across the entire Celerra unified storage series (NX4 through the NS-960)
  NetApp: NetApp limits the file system size based on the filer model, from only 1 TB on the FAS2020 to a maximum of 16 TB on the FAS6080
Integrated with snaps
  Celerra: Celerra snaps do not negatively affect deduplication space savings in production file systems; space savings can be realized immediately
  NetApp: NetApp will not achieve space savings from deduplication on any data that is currently part of a snapshot

How to check the part number of an installed adapter in ONTAP

There are a number of situations where you want to check the part numbers of the PCI adapters installed in your NetApp FAS or V-Series system. How do you do it?

Whatever method you use today, there's a simple one that is also undocumented (at least in the man pages): "sysconfig -ca".
Just run the command and it will list the part numbers of all the installed PCI adapters, and also check whether they are in the appropriate slots.

Here's sample output:

XXYY> sysconfig -ca
sysconfig: slot 3 OK: X3147: NetApp NVRAM6 512MB
sysconfig: slot 2 OK: X1049A: PCI-E Quad 10/100/1000 Ethernet G20
sysconfig: slot 1 OK: X2054B: LSI 949E; PCI-E quad-port Fibre Channel (LSI7404EP)
sysconfig: Unless directed by NetApp Global Services volumes root,  should have the volume option create_ucode set to On.
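If you collect this output from many filers, a short parser makes it easy to build an inventory or flag adapters whose status is not OK. This Python sketch assumes the line shape shown in the sample above (`sysconfig: slot <n> <STATUS>: <part>: <description>`); lines that don't match, such as the trailing warning, are ignored. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Line shape assumed from the sample above:
#   sysconfig: slot <n> <STATUS>: <part number>: <description>
SLOT_LINE = re.compile(r"^sysconfig: slot (\d+) (\w+): ([^:\s]+): (.+)$")


def parse_sysconfig_ca(output: str) -> list[dict]:
    """Return one dict per adapter line; other lines (warnings) are ignored."""
    adapters = []
    for line in output.splitlines():
        m = SLOT_LINE.match(line.strip())
        if m:
            slot, status, part, desc = m.groups()
            adapters.append({"slot": int(slot), "status": status,
                             "part": part, "description": desc})
    return adapters


sample = """\
sysconfig: slot 3 OK: X3147: NetApp NVRAM6 512MB
sysconfig: slot 2 OK: X1049A: PCI-E Quad 10/100/1000 Ethernet G20
sysconfig: slot 1 OK: X2054B: LSI 949E; PCI-E quad-port Fibre Channel (LSI7404EP)"""

for a in sorted(parse_sysconfig_ca(sample), key=lambda a: a["slot"]):
    flag = "" if a["status"] == "OK" else "  <-- check slot"
    print(f'slot {a["slot"]}: {a["part"]}{flag}')
```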

Monday, January 20, 2014

Which is faster, NDMPcopy or vol copy?

If I only count speed, then vol copy wins, because it copies blocks directly from disk without going through the file system. However, I think it's best suited to migrating a volume.

Advantages of vol copy:
  • CPU usage can be throttled
  • Source volume snapshots can be copied
  • Up to four copy operations can run simultaneously
  • Once started, it goes to the background and you can use the console for other purposes

Limitations of vol copy:
  • The destination can't be the root volume
  • The destination volume must be offline
  • All data in the destination volume will be overwritten
  • The destination volume must be the same size as or larger than the source
  • A single file or directory cannot be specified for the copy operation
  • Both volumes must be of the same type: traditional or flexible
  • If data is copied between two filers, each filer must have the other filer's entry in its /etc/hosts.equiv file and a loopback address for itself in its /etc/hosts file
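Before starting a vol copy, it can help to check the restrictions above in a script. This Python sketch uses a hypothetical `Volume` record that you would fill in yourself (for example from "vol status" and "df" output); it checks the root, offline, size and type rules only, and does not model the /etc/hosts.equiv and /etc/hosts requirements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical record for illustration; fill it in from "vol status" and "df".
    name: str
    size_gb: int
    flexible: bool  # True = FlexVol, False = traditional volume
    online: bool
    is_root: bool


def vol_copy_problems(src: Volume, dst: Volume) -> list[str]:
    """List which of the vol copy restrictions above this pair violates.
    (The /etc/hosts.equiv and /etc/hosts checks are not modelled.)"""
    problems = []
    if dst.is_root:
        problems.append("destination is the root volume")
    if dst.online:
        problems.append("destination must be offline")
    if dst.size_gb < src.size_gb:
        problems.append("destination is smaller than source")
    if src.flexible != dst.flexible:
        problems.append("source and destination volume types differ")
    return problems


src = Volume("vol1", 500, flexible=True, online=True, is_root=False)
dst = Volume("vol1_new", 400, flexible=True, online=True, is_root=False)
print(vol_copy_problems(src, dst))
# ['destination must be offline', 'destination is smaller than source']
```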

However, for copying data between two filers for testing or any other purpose, ndmpcopy is more suitable, because it gives you additional control and fewer restrictions, which is very useful.

Advantages of ndmpcopy:
  • Little or no CPU overhead
  • Incremental copy is supported
  • No limitation on volume size or type
  • No need to take the destination volume offline
  • A single file or directory can also be specified
  • No file fragmentation on the destination volume, because all data is copied sequentially from the source volume, giving an improved data layout
  • No configuration is required between the two filers; a username and password are used for authentication

Limitations of ndmpcopy:
  • Snapshots can't be copied from the source
  • The console is not available while the copy operation is running, so you can't run multiple ndmpcopy operations
  • If lots of small files have to be copied, the copy operation will be slower

So, as you have seen, both work well, but neither can replace the other; each has its own use.
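The comparison boils down to a rule of thumb, sketched here in Python. This is my distillation of the two lists above, not an official NetApp rule, and the function and parameter names are hypothetical: vol copy for whole-volume migrations (or when snapshots must come along) when its restrictions can be met, ndmpcopy otherwise.

```python
def choose_copy_tool(migrating_whole_volume: bool, need_snapshots: bool,
                     dest_can_go_offline: bool, same_volume_type: bool) -> str:
    """Rule of thumb from the vol copy / ndmpcopy comparison above.

    vol copy: fastest and copies snapshots, but needs an offline destination
    of the same volume type. ndmpcopy: fewer restrictions, no snapshots.
    """
    vol_copy_ok = dest_can_go_offline and same_volume_type
    if need_snapshots:
        # Only vol copy copies snapshots; if it can't be used, neither tool fits.
        return "vol copy" if vol_copy_ok else "neither"
    if migrating_whole_volume and vol_copy_ok:
        return "vol copy"
    return "ndmpcopy"


print(choose_copy_tool(migrating_whole_volume=False, need_snapshots=False,
                       dest_can_go_offline=True, same_volume_type=True))  # ndmpcopy
```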

How to check unplanned downtime details for a NetApp filer

Every now and then someone asks us for the system's uptime, and we just type 'uptime' on the system console to get the details instantly.

This is a really handy command for knowing when the system was last rebooted and how many operations per protocol it has served since then. Wouldn't life be a little easier if managers were satisfied with this? Alas, that doesn't happen: they ask for all the details since we acquired the system, or since 1st January, and then we go back to the Excel sheet or PPT we created for the monthly report to pull the data.

How about getting the same information from the system with a single command? Wouldn't that be cool? Fortunately, there is a little-known command, 'availtime', right inside ONTAP that does exactly this, as if it was created with our bosses in mind.

HOST02*> availtime full
Service statistics as of Sat Aug 28 18:07:33 BST 2010
System  (UP). First recorded 68824252 secs ago on Mon Jun 23 04:16:41 BST 2008
        Planned   downs 31, downtime 6781737 secs, longest 6771328, Tue Sep  9 15:07:33 BST 2008
        Uptime counting unplanned downtime: 100.00%; counting total downtime:  90.14%
NFS     (UP). First recorded 68824242 secs ago on Mon Jun 23 04:16:51 BST 2008
        Planned   downs 43, downtime 6849318 secs, longest 6839978, Wed Sep 10 10:11:43 BST 2008
        Uptime counting unplanned downtime: 100.00%; counting total downtime:  90.04%
CIFS    (UP). First recorded 61969859 secs ago on Wed Sep 10 12:16:34 BST 2008
        Planned   downs 35, downtime 17166 secs, longest 7351, Thu Jul 30 13:52:25 BST 2009
        Uptime counting unplanned downtime: 100.00%; counting total downtime:  99.97%
HTTP    (UP). First recorded 47876362 secs ago on Fri Feb 20 14:08:11 GMT 2009
        Planned   downs 8, downtime 235 secs, longest 53, Wed Jan 20 14:10:18 GMT 2010
        Unplanned downs 16, downtime 4915 secs, longest 3800, Mon Jul 27 16:01:02 BST 2009
        Uptime counting unplanned downtime:  99.98%; counting total downtime:  99.98%
FCP     (DOWN). First recorded 68817797 secs ago on Mon Jun 23 06:04:16 BST 2008
        Planned   downs 17, downtime 44988443 secs, longest 38209631, Sat Aug 28 18:07:33 BST 2010
        Unplanned downs 6, downtime 78 secs, longest 21, Fri Feb 20 15:24:44 GMT 2009
        Uptime counting unplanned downtime:  99.99%; counting total downtime:  34.62%
iSCSI   (DOWN). First recorded 61970687 secs ago on Wed Sep 10 12:02:46 BST 2008
        Planned   downs 21, downtime 38211244 secs, longest 36389556, Sat Aug 28 18:07:33 BST 2010
        Uptime counting unplanned downtime: 100.00%; counting total downtime:  38.33%
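If you need the same percentages for a report period of your own, you can recompute them from the seconds figures. Judging from the sample output, the percentage appears to be (recorded - downtime) / recorded, truncated (not rounded) to two decimals: the System line works out to 90.146%, which is printed as 90.14%. That formula is my inference from these numbers, not documented behaviour. This Python sketch uses integer arithmetic to avoid float rounding surprises.

```python
def uptime_pct(recorded_secs: int, downtime_secs: int) -> str:
    """Uptime percentage to two decimals, truncated rather than rounded.

    Works in hundredths of a percent with integer arithmetic."""
    hundredths = (recorded_secs - downtime_secs) * 10000 // recorded_secs
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# System: 68824252 secs recorded, 6781737 secs (all planned) downtime
print(uptime_pct(68824252, 6781737))     # 90.14
# HTTP, counting total downtime: 235 planned + 4915 unplanned secs
print(uptime_pct(47876362, 235 + 4915))  # 99.98
# FCP, counting unplanned downtime only
print(uptime_pct(68817797, 78))          # 99.99
```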

I'm not sure why NetApp has kept this command in advanced mode, but once you know it, I bet next time you won't be able to stop yourself from going into advanced mode to see how much unscheduled downtime you've had since the last reset.

The shorter version of this command is just 'availtime'. It shows the same information as 'availtime full', but abbreviates the output, denoting Planned with P and Unplanned with U, which is very useful if you want to parse it in a script.

HOST04*> availtime
Service statistics as of Sat Aug 28 18:07:33 BST 2010
System  (UP). First recorded (20667804) on Wed Sep 23 09:35:49 GMT 2009
        P  5, 496, 139, Fri Dec 11 15:58:19 GMT 2009
        U  1, 1605, 1605, Wed Mar 31 17:01:41 GMT 2010
CIFS    (UP). First recorded (20666589) on Wed Sep 23 09:56:04 GMT 2009
        P  7, 825, 646, Thu Jan 21 19:08:03 GMT 2010
        U  1, 77, 77, Wed Mar 31 16:34:54 GMT 2010
HTTP    (UP). First recorded (20664731) on Wed Sep 23 10:27:02 GMT 2009
        P  3, 51, 22, Thu Jan 21 19:17:25 GMT 2010
        U  4, 203, 96, Thu Jan 21 19:08:03 GMT 2010
FCP     (UP). First recorded (20477735) on Fri Sep 25 14:23:38 GMT 2009
        P  3, 126, 92, Thu Jan 21 19:07:57 GMT 2010
        U  4, 108, 76, Wed Mar 31 16:34:53 GMT 2010
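As an example of scripting against the short form, this Python sketch turns it into a dictionary per service. It assumes the line shapes in the sample above, and reads the P/U fields in the same order as the full output (number of downs, total downtime in seconds, longest down in seconds, date); the function name is hypothetical.

```python
from __future__ import annotations

import re

# Line shapes assumed from the sample output above.
SERVICE = re.compile(r"^(\w+)\s+\((UP|DOWN)\)\. First recorded \((\d+)\)")
DOWNS = re.compile(r"^([PU])\s+(\d+), (\d+), (\d+), (.+)$")


def parse_short_availtime(text: str) -> dict:
    """Map each service to its state plus its P (planned) / U (unplanned)
    entries: number of downs, total downtime secs, longest down secs, date."""
    stats = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SERVICE.match(line)
        if m:
            current = m.group(1)
            stats[current] = {"state": m.group(2), "recorded": int(m.group(3))}
            continue
        m = DOWNS.match(line)
        if m and current is not None:
            kind, downs, total, longest, when = m.groups()
            stats[current][kind] = {"downs": int(downs), "downtime": int(total),
                                    "longest": int(longest), "when": when}
    return stats
```

For instance, `[name for name, s in parse_short_availtime(output).items() if "U" in s]` lists every service that has had unplanned downtime since the last reset.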

To reset the output, use the 'reset' switch; it zeroes all the counters. Make sure you record the statistics before resetting, because once the counters are reset you can no longer get uptime details going back to when the system was built. So you may want to do this only after you acquire a new system, once all the configuration is done and it's time for it to start serving user requests.

HA Configuration Checker (ha-config-check.cgi)

The HA Configuration Checker is a Perl script that detects errors in the configuration of a pair of NetApp HA (active-active) storage controllers. It runs as a command from a Unix shell or Windows prompt, and also doubles as a CGI script that can be executed by a Unix web server. The script uses rsh or ssh to communicate with the storage controllers you're checking, so you'll need the appropriate permissions for rsh to run on both storage controllers in the HA pair.

If no /etc/hosts.equiv entry exists for the host you are running the script from, the username and password must be provided to the script.

D:\>ha-config-check.exe -l filer1 filer2
filer1 rsh login: bali
Password: ********
filer2 rsh login: bali
Password: ********

The output looks like this:

== NetApp HA Configuration Checker v2.0.0 ==

Checking rsh logins. rsh filer1 -l bali:******** version

Checking rsh logins. rsh filer2 -l bali:******** version
Checking Data ONTAP versions...
Checking licenses...
Checking HA configuration identity...
Checking cf status...
Checking fcp cfmode settings...
fcp: FCP is not licensed.
Checking options...
Option timed.sched                  1h
 on filer2 has no match on filer1
Option timed.sched                  hourly
 on filer1 has no match on filer2
HA configuration issue(s) found above. Please correct them and rerun this script
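The "Checking options..." step boils down to comparing option values between the two partners. The Python sketch below illustrates that comparison on two dictionaries; it is not the checker's actual code, it does not collect the values for you (you might gather them from each filer's "options" output yourself), and `timed.enable` is just an illustrative extra entry. Note that a plain string comparison flags 'hourly' vs '1h' as a mismatch, just as the checker did above.

```python
from __future__ import annotations


def option_mismatches(opts_a: dict[str, str], opts_b: dict[str, str],
                      name_a: str, name_b: str) -> list[str]:
    """Report every option whose value differs (or is missing) between two HA partners."""
    report = []
    for opt in sorted(set(opts_a) | set(opts_b)):
        a, b = opts_a.get(opt), opts_b.get(opt)
        if a != b:
            report.append(f"{opt}: {name_a}={a!r} {name_b}={b!r}")
    return report


# timed.sched values from the checker output above; timed.enable is illustrative.
filer1 = {"timed.sched": "hourly", "timed.enable": "on"}
filer2 = {"timed.sched": "1h", "timed.enable": "on"}
print(option_mismatches(filer1, filer2, "filer1", "filer2"))
# ["timed.sched: filer1='hourly' filer2='1h'"]
```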

Download the tool from the NOW link below.


When a disk enters the Maintenance Center on one node (there can be many reasons for this, one being that the disk is not valid for use), the partner may not be aware of it. As a result, one node may exclude the disk from its disk inventory while the partner includes the same disk in its inventory. This generates "cf.disk.inventory.mismatch" and "CLUSTER ERROR: DISK/SHELF COUNT MISMATCH" AutoSupport messages.

If the disk in the Maintenance Center is repaired after testing, the disk inventory mismatch is resolved automatically. If the disk fails testing, it should be removed.

filer1> disk show 1b.57
------------ ------------- ----- ------------- -------------
1b.57 filer2 (151704668) Pool0 3SK0AFTX0000902778L9 filer2 (151704668)
filer2> disk show 1b.57
------------ ------------- ----- ------------- -------------
1b.57 filer2 (151704668) FAILED 3SK0AFTX0000902778L9 filer2 (151704668)

From syslog (/etc/messages):

Fri Jul  6 23:02:24 EDT [filer1: raid.disk.missing:info]: Disk 0b.57 Shelf 3 Bay 9 [NETAPP   X291_S15K7420F15 NA00] S/N [3SK0AFTX0000902778L9] is missing from the system

Fri Jul  6 18:52:10 EDT [filer1: cf.disk.inventory.mismatch:CRITICAL]: Status of the disk 1b.57 (20000024:B6564A61:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000) has recently changed or the node (filer2) is missing the disk.

After some time, the disk either moves to the failed pool for good (if ONTAP fails to correct the soft error on that disk) or returns to the inventory once the soft error has been fixed. See the message below from syslog.

Fri Jul  6 20:41:11 EDT [filer1: cf.disk.inventory.mismatchOK:info]: The node (filer2) included the disk 0b.57 (20000024:B6564A61:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000) in its inventory.
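To check whether any inventory mismatch is still open, you can scan /etc/messages for these two events. This Python sketch assumes the message shapes in the excerpts above and that the lines are in chronological order. It keys each disk by the WWN in parentheses rather than by name, because the same disk appears as 1b.57 and 0b.57 in the excerpts. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Event shape assumed from the /etc/messages excerpts above.
EVENT = re.compile(r"(cf\.disk\.inventory\.mismatch(?:OK)?):\w+\]: .*?\(([0-9A-F:]+)\)")


def unresolved_mismatches(lines) -> list[str]:
    """Return WWNs whose most recent inventory event is still a mismatch.

    Assumes `lines` are in chronological order."""
    last_event = {}
    for line in lines:
        m = EVENT.search(line)
        if m:
            event, wwn = m.groups()
            last_event[wwn] = event  # later events overwrite earlier ones
    return sorted(w for w, ev in last_event.items()
                  if ev == "cf.disk.inventory.mismatch")
```

For example, `unresolved_mismatches(open("messages").read().splitlines())` run against a copy of the log would list the WWNs still awaiting a matching mismatchOK event.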

Failed disk replacement in NetApp

Disk failures are very common in storage environments, and as storage administrators we come across them often; how often depends on how many disks your storage systems have. The more disks you manage, the more often you face this situation.
I have written this post assuming RAID-DP with FC-AL disks, because RAID-DP is always better than RAID4, and we don't use SCSI loops. By design, RAID-DP protects against a double disk failure in a single RAID group. In other words, you will not lose data even if 2 disks in a single RG fail, whether at the same time or one after another.
Like any other storage system, ONTAP uses a disk from the spare pool to rebuild the data from the surviving disks as soon as it detects a failed disk, and sends an AutoSupport message to NetApp for parts replacement. Once NetApp receives the AutoSupport, they initiate the RMA process and the part is delivered to the address listed for that system in NetApp's records. When the disk arrives, you either replace it yourself or ask a NetApp engineer to come onsite and replace it. Either way, as soon as you replace the disk, your system detects the new working disk and adds it to the spare pool.
Now, wasn't that pretty simple and straightforward? Oh yes, because we are using software-based disk ownership with disk auto-assignment turned on. It's much like your child catching a cold, calling the GP himself and getting cured, rather than asking you to take care of him. But what if there are more complications?
Now let's cover the other things that can get in the way, and other complications.
Scenario 1:
I replaced my drive and the light shows green or amber, but ‘sysconfig -r' still shows the drive as broken.
Sometimes we face this problem because the system was not able to label the disk properly, or the replacement disk itself is not good. The first thing to try is labelling the disk correctly; if that doesn't work, try replacing it with another, known-good disk. If that doesn't work either, contact NetApp and follow their guidelines.
To relabel the disk from "BROKEN" to "SPARE", first note down the broken disk ID, which you can get from “aggr status -r”. Then go to advanced mode with “priv set advanced” and run “disk unfail <disk_id>”. At this stage your filer will throw 3-4 errors on the console, syslog or SNMP traps, depending on your configuration, but this is the final step: the disk should now be good, which you can confirm with “disk show” for detailed status or with the “sysconfig -r” command. If the status change doesn't show at first, give the system a few seconds to recognize the changed status of the disk.
Scenario 2:
Two disks have failed from same raid group and I don’t have any spare disk in my system.
Now you are really in big trouble, because you always need at least one spare disk available in your system; NetApp recommends a 1:28 ratio, i.e. one spare for every 28 disks. With a dual disk failure, you have a very high chance of losing data if another disk fails while you are rebuilding onto the spare disk or while you are waiting for new disks to arrive.
So always keep at least 2 spare disks in your system. One spare is also fine as far as the system is concerned, and it will not complain about spares, but if you leave the system with only one spare disk, the Maintenance Center will not work and the system will not scan any disk for potential failure.
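Combining the two rules above gives a quick way to size spares. This Python sketch applies the 1:28 ratio, rounding up (my interpretation of "one spare for every 28 disks"), and never goes below the minimum of 2 recommended here; the function name is hypothetical.

```python
import math


def recommended_spares(total_disks: int, disks_per_spare: int = 28, minimum: int = 2) -> int:
    """One spare per 28 disks (rounded up), but never fewer than 2."""
    return max(minimum, math.ceil(total_disks / disks_per_spare))


for n in (14, 56, 84, 100):
    print(n, "disks ->", recommended_spares(n), "spares")
# 14 disks -> 2 spares
# 56 disks -> 2 spares
# 84 disks -> 3 spares
# 100 disks -> 4 spares
```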
Back to the situation above, a dual disk failure with no spares available: your best bet is to ring NetApp to replace the failed disks ASAP. Or, if you think you are losing the patient, take a disk of the same type from another healthy system: do a disk fail on it, remove it, and use it to replace the failed disk in the other system.
After adding the disk to the other filer, if it shows a partial/failed volume, make sure the volume reported as partial/failed belongs to the newly inserted disk by using the “vol status -v” and “vol status -r” commands. If so, just destroy that volume with the “vol destroy” command and then zero the disk with “disk zero spares”.
This exercise will not take more than 15 minutes (except disk zeroing, which depends on your disk type and capacity), and you will then have a single disk failure in each of 2 systems, each of which can survive another disk failure. But what if you don't do this and keep running your system with a dual disk failure? Your system will shut itself down after 24 hours; yes, it will shut down by itself, without any failover, to get your attention. There is a registry setting that controls how long your system keeps running after a disk failure, but I think 24 hours is a good time, and you shouldn't increase or decrease it unless you don't care about the data sitting there or about anyone accessing it.
Scenario 3:
My drive failed but there is no disk with amber lights
This often happens because the disk electronics have failed and the system can no longer recognize the disk as part of it. In this situation, you first need to know the disk name. There are a couple of ways to find out which disk has failed:
a) “sysconfig -r”: look for the broken disk list
b) From the AutoSupport message, check for the failed disk ID
c) “fcadmin device_map”: look for a disk with an xxx or “BYP” message
d) In /etc/messages, look for a failed or bypassed disk warning, which gives the disk ID
Once you have identified the failed disk ID, run “disk fail <disk_id>” and check whether you see an amber light. If not, use “blink_on <disk_id>” in advanced mode to turn on the disk LED, or, if that fails, turn on the adjacent disks' lights with the same blink_on command so you can identify the disk correctly. Alternatively, you can use the led_on command instead of blink_on to turn on the LEDs of the disks adjacent to the defective disk rather than its red LED.
If you use the auto-assign function, the system assigns the disk to the spare pool automatically; otherwise, use the “disk assign <disk_id>” command to assign the disk to the system.
Scenario 4:
Disk LED remains orange after replacing failed disk
This happens because you were in a hurry and didn't give the system enough time to recognize the changes. When the failed disk is removed from its slot, the disk LED remains lit until Enclosure Services notices and corrects it, which generally takes around 30 seconds after the failed disk is removed.
Since you've already done it, use the led_off command from advanced mode. If that doesn't work, because the system believes the LED is off when it is actually on, simply turn the LED on and then back off again using “led_on <disk_id>” and then “led_off <disk_id>”.
Scenario 5:
Disk reconstruction failed
A number of issues can make RAID reconstruction onto a new disk fail, including an enclosure access error, a file system disk not responding/missing, a spare disk not responding/missing, or something else; however, the most common reason for this failure is outdated firmware on the newly inserted disk.
Check whether the newly inserted disk has the same firmware as the other disks. If not, first update the firmware on the newly inserted disk, and reconstruction should then finish successfully.
Scenario 6:
Disk reconstruction stuck at 0% or failed to start
This might be an error, or it might be due to a limitation in ONTAP, i.e. no more than 2 reconstructions should run at the same time. The error you are most likely to find is that the RAID group was in a degraded state and the system went through an unclean shutdown, so parity is marked inconsistent and needs to be recomputed after boot. However, parity recomputation requires all data disks in the RAID group to be present, and we already have a failed disk in the RG, so the aggregate will be marked as WAFL_inconsistent. You can confirm this condition with the “aggr status -r” command.

If this is the case, you have to run wafliron, using the command “aggr wafliron start <aggr_name>” in advanced mode. Make sure you contact NetApp before starting wafliron, as it unmounts all the volumes hosted in the aggregate until the first phase of tests completes. The time wafliron takes to complete the first phase depends on many variables, such as the size of the volumes/aggregate/RG and the number of files/snapshots/LUNs, so you can't predict how long it will take; it might be 1 hour or it might be 4-5 hours. So if you need to run wafliron, contact NetApp first.