Friday, November 21, 2014

Repository Disk Corruption Issue in PowerHA SystemMirror 7.1

Why the issue normally occurs

 If any node in the cluster encounters errors with the repository disk, or fails while accessing it, the cluster enters a limited (restricted) mode of operation.
 In this mode, most topology-related operations are not allowed. For example, a node cannot be added to the cluster and a node cannot join the cluster. As a result, starting the cluster on the problematic node fails, because the "node_join" event cannot succeed while the repository disk is corrupted.

This type of issue arises when a storage-level problem (I/O errors) has made the storage disks inaccessible.


How to determine whether the problem is with the repository disk:









When the repository disk fails, you are notified of the disk failure. PowerHA SystemMirror keeps notifying you of the repository disk failure until it is resolved.


To determine what the problem is with the repository disk, you can view the following log files:

   1.  hacmp.out
   2.  AIX error log (using the errpt command)
 


hacmp.out error message

The following is an example of an error message in the hacmp.out log file when a repository disk fails:

ERROR: rep_disk_notify : Tue Jan 10 13:38:22 CST 2012 : Node "abc123"(0x62518DTS1H0638E873GE041A74C40ZF9) on Cluster test has lost access to repository disk hdisk3.
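
A quick way to look for this message is to search hacmp.out for the rep_disk_notify tag. The sketch below assumes the log path is passed in explicitly, since the location of hacmp.out depends on the release and configuration. The function name find_repo_disk_errors is made up for illustration.

```shell
# Hypothetical helper: print every repository disk failure notice in a log file.
# Usage: find_repo_disk_errors /path/to/hacmp.out
find_repo_disk_errors() {
    log_file="${1:-}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi
    # rep_disk_notify is the tag that appears in the sample message above
    grep "rep_disk_notify" "$log_file"
}
```

The function returns the exit status of grep, so a non-zero status on a readable file simply means no matching message was found.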



 AIX error log


LABEL:              OPMSG
IDENTIFIER:     AA8AB241
 

When a node loses access to the repository disk, an entry is made in the AIX error log of each node that has a problem.
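
The error log entries can be filtered by the identifier shown above with the -j flag of errpt. This is only a sketch: the function name is hypothetical, and entries with the OPMSG identifier may include other operator messages as well, so the output still has to be read for the repository disk text.

```shell
# Hypothetical helper: list AIX error log entries with the OPMSG identifier.
# Pass -a as the first argument for the detailed report instead of the summary.
show_opmsg_entries() {
    if ! command -v errpt >/dev/null 2>&1; then
        echo "errpt not found (this command exists only on AIX)" >&2
        return 127
    fi
    if [ "${1:-}" = "-a" ]; then
        errpt -a -j AA8AB241    # detailed entries
    else
        errpt -j AA8AB241       # one-line summary per entry
    fi
}
```

Run it on each node that reported the problem, since the entry is written to the error log of every affected node.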











 Below are the steps that we have used successfully to recover from the repository disk corruption issue.

1. Verify that the CAA services are running on both cluster nodes using # lssrc -g caa . If they are not running, start the services on both nodes.
2. Remove the repository disk configuration from the node where the repository disk corruption is reported, using the command # rmcluster -F -r <repository disk>
3. Rebuild or recover the repository disk using the command # clusterconf -r <repository-Disk>
4. Once this succeeds, the CAA_VG appears again on the affected node and becomes active.
5. Sync the cluster from the node that has the latest cluster information.
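
The command steps above can be sketched as a small wrapper. This is not an official procedure: the function names and the DRY_RUN switch are assumptions added for illustration, and the lspv call is only one possible way to check that the CAA volume group is back. By default the wrapper prints the commands instead of running them, so they can be reviewed first.

```shell
# Hypothetical wrapper around steps 1 to 4. With DRY_RUN=1 (the default)
# the commands are only printed, nothing is executed.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_repo_disk() {
    repo_disk="${1:-}"
    if [ -z "$repo_disk" ]; then
        echo "usage: recover_repo_disk <repository-disk>" >&2
        return 2
    fi
    run_step lssrc -g caa                 || return 1   # step 1: check CAA services
    run_step rmcluster -F -r "$repo_disk" || return 1   # step 2: remove repository disk configuration
    run_step clusterconf -r "$repo_disk"  || return 1   # step 3: rebuild from the repository disk
    run_step lspv                         || return 1   # step 4: look for the CAA volume group
}
```

For example, recover_repo_disk hdisk3 with DRY_RUN=1 prints the four commands for hdisk3. Step 5, syncing the cluster, is left out on purpose and is done separately from the node with the latest cluster information.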

If that does not work, we need to add new repository disks to the cluster configuration again.
