Tuesday 31 January 2017

How to troubleshoot the RAID controller modules in Dell PowerVault MD3600f storage?

                                             
Designed for flexibility, the MD3600f arrays support a range of drive types, enclosures, and RAID levels, all within a single array. The Dell PowerVault MD3600f array supports RAID levels 0, 1, 10, 5, and 6, with up to 192 physical disks per group in RAID 0, 1, and 10, up to 30 physical disks per group in RAID 5 and 6, and up to 512 virtual disks.
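The limits above can be checked before a disk group is planned. The following is a minimal Python sketch, not a Dell tool: the function names and the table are hypothetical, and the numbers are simply the ones quoted in this article.

```python
# Limits as quoted in this article; they are not read from the array itself.
MAX_DISKS_PER_GROUP = {0: 192, 1: 192, 10: 192, 5: 30, 6: 30}
MAX_VIRTUAL_DISKS = 512


def check_disk_group(raid_level, disk_count):
    """Return a list of problems; an empty list means the plan fits the limits."""
    if raid_level not in MAX_DISKS_PER_GROUP:
        return [f"RAID {raid_level} is not one of the supported levels"]
    limit = MAX_DISKS_PER_GROUP[raid_level]
    if disk_count > limit:
        return [f"RAID {raid_level} allows at most {limit} disks per group, got {disk_count}"]
    return []


def check_virtual_disk_total(count):
    """True if the planned number of virtual disks is within the quoted limit."""
    return count <= MAX_VIRTUAL_DISKS
```

For example, check_disk_group(5, 31) reports a problem because RAID 5 groups are limited to 30 physical disks, while check_disk_group(10, 192) returns an empty list.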
The storage array is connected to a host using two hot-swappable RAID controller modules. The RAID controller modules are identified as RAID controller module 0 and RAID controller module 1.
Each RAID controller module has four FCIN (host) port connectors that provide FC connections to the host or node. Each RAID controller module also contains an Ethernet management port and an SAS Out port connector. The Ethernet management port allows the user to install a dedicated management station (server or stand-alone system). The SAS Out port allows the user to connect the storage array to optional expansion enclosures for additional storage capacity.
Certain events can cause a RAID controller module to fail and/or shut down. Unrecoverable ECC memory or PCI errors, or critical physical conditions can cause a lockdown. If the RAID storage array is configured for redundant access and cache mirroring, the surviving controller can normally recover without data loss or shutdown.
Invalid Storage Array: The RAID controller module is supported only in a Dell-supported storage array. After installation in the storage array, the controller performs a set of validation checks. The array status LED is lit steady amber while the RAID controller module completes these initial tests and the controllers finish booting. If the RAID controller module detects a non-Dell supported storage array, the controller aborts startup. The RAID controller module does not generate any events to alert the user to an invalid array, but the array status LED flashes amber to indicate a fault state.
ECC Errors: RAID controller firmware can detect ECC errors and can recover from a single-bit ECC error whether the RAID controller module is in a redundant or non-redundant configuration. A storage array with redundant controllers can recover from multi-bit ECC errors as well. The RAID controller module initiates failover if it experiences up to 10 single-bit errors or up to 3 multi-bit errors.
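The ECC rules above can be written down as two small checks. This is only an illustrative Python sketch: the function names are hypothetical, it does not reflect the actual firmware logic, and it assumes that failover is triggered once a counter reaches the figure quoted in the article (10 single-bit or 3 multi-bit errors).

```python
SINGLE_BIT_LIMIT = 10  # figure quoted in this article
MULTI_BIT_LIMIT = 3    # figure quoted in this article


def can_recover(multi_bit, redundant):
    """Single-bit errors are recoverable in any configuration;
    multi-bit errors only when the controllers are redundant."""
    return redundant or not multi_bit


def reaches_failover_threshold(single_bit_count, multi_bit_count):
    """Assumption: failover starts when either counter reaches its limit."""
    return single_bit_count >= SINGLE_BIT_LIMIT or multi_bit_count >= MULTI_BIT_LIMIT
```

With these definitions, a multi-bit error on a single-controller array is reported as not recoverable, which matches the point that only redundant configurations can survive it.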
PCI Errors: The storage array firmware can detect PCI errors, but it can recover from them only when the RAID controller modules are configured for redundancy. If a virtual disk uses cache mirroring, it fails over to its peer RAID controller module, which initiates a flush of the dirty cache.
The storage array generates a critical event if the RAID controller module detects a critical condition that could cause immediate failure of the array and/or loss of data.
If both RAID controller modules fail simultaneously, the enclosure cannot issue critical or noncritical event alarms for any enclosure component.
How to troubleshoot the RAID controller modules?
It is recommended to turn off the host server before turning off the array to prevent loss of data in the case of non-redundant configurations.
If the array status LED is solid or blinking amber:
In the Array Management Window (AMW), select the Summary tab and click Storage Array needs attention. Follow the procedures listed in the Recovery Guru and wait up to 5 minutes to check whether the LED has turned blue. If following the Recovery Guru procedures does not solve the problem, complete the following procedure to further troubleshoot the array.
Turn off the host server as appropriate.
Remove the RAID controller module and verify that the pins on the backplane and the RAID controller module are not bent.
Reinstall the RAID controller module and wait for 30 seconds.
Check the RAID controller module status LED.
If the fault persists, replace the RAID controller module, then turn on the host server.
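The amber-LED procedure above is essentially a short decision sequence. The Python sketch below restates it as a function that returns the next step to take. The enum, the function, and its parameters are hypothetical names chosen for illustration. The LED state has to be read by eye, since nothing here talks to the array.

```python
from enum import Enum


class ArrayLed(Enum):
    BLUE = "blue"
    SOLID_AMBER = "solid amber"
    BLINKING_AMBER = "blinking amber"


def next_step(led, recovery_guru_done, module_reseated):
    """Return the next troubleshooting step for the observed array status LED."""
    if led is ArrayLed.BLUE:
        return "No action needed: the array status LED is blue."
    if not recovery_guru_done:
        return "Follow the Recovery Guru procedures in the AMW, then wait up to 5 minutes."
    if not module_reseated:
        return ("Turn off the host server, check the pins for damage, "
                "reinstall the RAID controller module, and wait 30 seconds.")
    return "Replace the RAID controller module and turn on the host server."
```

Calling next_step(ArrayLed.SOLID_AMBER, True, False), for instance, returns the reseat step, because the Recovery Guru has already been tried and the module has not yet been reseated.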
If both LEDs for any given FCIN port are unlit:
Turn off the server, storage arrays, and expansion enclosures.
Reseat the RAID controller module and reconnect cables to the storage array and the server. Restart the storage array and wait until the array is fully booted.
Turn on the server. Recheck the LEDs of the affected port(s).
Replace the fiber optic cables of any port(s) where both LEDs are unlit.
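When several ports are involved, it helps to note which ones still have both LEDs unlit after the restart. The Python sketch below assumes the two LED states of each port have been written down by hand as a pair of booleans. The port names and the function are hypothetical.

```python
def ports_needing_new_cable(port_leds):
    """Return the ports whose two LEDs are both unlit.

    port_leds maps a port name to a (first_led_lit, second_led_lit) pair.
    """
    return sorted(
        name
        for name, (first_lit, second_lit) in port_leds.items()
        if not first_lit and not second_lit
    )


# Example observation recorded after the array has fully booted.
observed = {
    "FCIN 0": (True, True),
    "FCIN 1": (False, False),
    "FCIN 2": (True, False),
    "FCIN 3": (False, False),
}
cables_to_replace = ports_needing_new_cable(observed)
```

In this example only the ports with both LEDs unlit are selected. A port with one lit LED is left alone, since the procedure above applies only when both are dark.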
An MD3600f series storage array supports single as well as dual RAID controller configurations. If only one RAID controller module is installed in the array, it must be installed in slot 0, and a RAID controller module blank must be installed in slot 1.
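The slot rule above can be expressed as a small check. This Python sketch is illustrative only: the function name and the three slot labels ("controller", "blank", "empty") are assumptions made for the example.

```python
def slot_problems(slot0, slot1):
    """Check slot contents against the rule stated above.

    Each argument is "controller", "blank", or "empty" (hypothetical labels).
    Returns a list of problems; an empty list means the layout is valid.
    """
    problems = []
    if slot0 != "controller":
        problems.append("Slot 0 must hold a RAID controller module.")
    if slot1 not in ("controller", "blank"):
        problems.append("Slot 1 must hold a second RAID controller module or a blank.")
    return problems
```

A dual configuration ("controller", "controller") and a single configuration ("controller", "blank") both pass, while a lone controller placed in slot 1 is flagged.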
RAID controller modules can be removed and installed without turning off the array. It is recommended not to remove a RAID controller module while data is being transferred. Replacing or installing a RAID controller module that is connected to a host server causes the host server to lose communication with the array and may require a reboot of the host server.
Navigator System Private Ltd offers storage maintenance support, including spare parts and replaceable unit service, in most cities under flexible maintenance contracts.
Navigator System offers server support, maintenance, repair, and service in Bangalore, Chennai, Mumbai, Hyderabad, Delhi, Pune and all over India.
If you have questions, please feel free to contact us at +91 9845451006 or sales@navigatorsystem.com
Please visit our website for more details: http://www.navigatorsystem.com