UNIX SYSTEM ADMINISTRATION : 2014

Friday, December 12, 2014

Resolving the Pseudo-Terminal (pty) Error Issue

To find the maximum number of pseudo-terminals (PTY) in IBM AIX:

# lsattr -l pty0 -E

To increase or decrease the number of pseudo-terminals:

# smit pty

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
STATE to be configured at boot time                     [available]
Maximum number of Pseudo-Terminals                      [256]    <-- change this
Maximum number of BSD Pseudo-Terminals                  [16]
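The configured limit can also be read without opening SMIT. The following is a minimal sketch, assuming that the 'lsattr -l pty0 -E' output prints the attribute name in the first column and its value in the second, and that the limit is stored in an attribute named ATTnum (BSDnum for BSD-style ptys). Verify both on your own system. The helper name pty_limit is hypothetical.

```shell
# pty_limit: hypothetical helper. Reads `lsattr -l pty0 -E` output on stdin
# and prints the value of one attribute. The default attribute name, ATTnum,
# is an assumption about how AIX names the pseudo-terminal limit.
pty_limit() {
    awk -v attr="${1:-ATTnum}" '$1 == attr { print $2 }'
}

# Usage on an AIX host (not executed here):
#   lsattr -l pty0 -E | pty_limit
#   lsattr -l pty0 -E | pty_limit BSDnum
```

If the attribute name matches, the same value can probably be changed from the command line with chdev against pty0 instead of SMIT; check the chdev documentation before relying on that.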

--------------------------------------------------------------------------------------------------------

Diagnosing the problem

The symptoms may indicate that an application is holding on to ptys and not releasing them.

1. Log in to the console.

2. Use the 'fuser' command to find the culprit application, like this:
      # cd /dev/pts
      # fuser *
    The 'fuser' command lists all PIDs associated with each pty device.
    If a process is not releasing its ptys, its PID will occur many times in the fuser output above.

3. Verify which process is holding the ptys and inform the respective team for corrective action.
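Counting the repeated PIDs by eye is tedious when hundreds of ptys are in use. The sketch below tallies how often each PID appears in the fuser output. It assumes each output line looks like "name: pid pid ...", possibly with a usage-code letter after a PID. The helper name count_pty_holders is hypothetical.

```shell
# count_pty_holders: hypothetical helper. Reads `fuser` output on stdin and
# prints "<count> <pid>" lines, highest count first.
count_pty_holders() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /:$/) continue      # skip the "name:" column
            pid = $i
            gsub(/[^0-9]/, "", pid)      # drop usage-code suffixes such as "c"
            if (pid != "") count[pid]++
        }
    }
    END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2n
}

# Usage on an AIX host (not executed here):
#   cd /dev/pts && fuser * 2>&1 | count_pty_holders | head
```

The PID on the first line of the result is the first candidate to investigate in step 3.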

Friday, November 21, 2014

Mailx and sendmail Configuration in AIX

Below are the prerequisites for the sendmail configuration.

1. The server IP and the FQDN of the mail server.
2. The mail server should be reachable on SMTP port 25.
3. Ensure that the server IP has been allowed to send and receive mail through the mail relay server or mail server.

Sendmail configuration:

1. Edit the /etc/sendmail.cf file, look for the "DS" entry, and add the FQDN of the mail server immediately after it (for example, DSmail.abc.com).

2. Add an entry for the mail server in the /etc/hosts file.

3. Stop the sendmail service and start it again:
# stopsrc -s sendmail
# startsrc -s sendmail

4. Verify that you are able to send mail using the command below.

# echo "test" | mailx -v abc123@gmail.com
WARNING: local host name (test_boot) is not qualified; see cf/README: WHO AM I?
abc123@gmail.com... Connecting to mail.abc.com. via relay...
220 abc.com ESMTP
>>> EHLO test_boot
250-abc.com
250-PIPELINING
250-SIZE 10485760
250 8BITMIME
>>> MAIL From:<aks@test_boot> SIZE=28
250 ok
>>> RCPT To:<abc123@gmail.com>
>>> DATA
250 ok
354 go ahead punk, make my day
>>> .
250 ok 1412847209 qp 31504 by mail.abc.com
abc123@gmail.com... Sent (ok 1412846609 qp 3188 by abc.com)
Closing connection to mail.abc.com.
>>> QUIT
221 abc.com Goodbye.
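The configuration and the test above can be checked with two small filters. This is a sketch under stated assumptions: the relay is defined on a line beginning with "DS" in sendmail.cf, and the 'mailx -v' transcript has the same shape as the one shown above. The helper names relay_host and smtp_result are hypothetical.

```shell
# relay_host: hypothetical helper. Reads a sendmail.cf on stdin and prints
# whatever follows "DS" on the relay definition line.
relay_host() {
    sed -n 's/^DS//p'
}

# smtp_result: hypothetical helper. Reads a `mailx -v` transcript on stdin.
# Prints "rejected: <line>" for the first 5xx reply, "sent" if the
# "... Sent" confirmation line is present, otherwise "unknown".
smtp_result() {
    awk '
        /^5[0-9][0-9][ -]/ { print "rejected: " $0; bad = 1; exit }
        /\.\.\. Sent/      { sent = 1 }
        END { if (!bad) print (sent ? "sent" : "unknown") }'
}

# Usage on an AIX host (not executed here):
#   relay_host < /etc/sendmail.cf
#   echo "test" | mailx -v abc123@gmail.com 2>&1 | smtp_result
```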

Common Errors

If you get the error 554 (mail rejected):

1. Check the mail queue of the server using the command # mailq. If the sent message is not present in the queue, there is no issue with the server configuration; the mail server is most likely rejecting the mail.
2. Check with the mail team whether the IP is allowed to send mail.
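The reasoning in step 1 can be written as a small check. This sketch assumes that the mailq output lists the recipient address of every message still in the queue. The helper name diagnose_554 and its messages are hypothetical.

```shell
# diagnose_554: hypothetical helper. Reads `mailq` output on stdin and takes
# the recipient address as its first argument.
diagnose_554() {
    if grep -F -q -- "$1"; then
        echo "queued: message is still on this server, check local sendmail"
    else
        echo "not queued: rejection is likely on the mail server side"
    fi
}

# Usage on an AIX host (not executed here):
#   mailq | diagnose_554 abc123@gmail.com
```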

Repository Disk Corruption Issue in PowerHA SystemMirror 7.1

Why the issue normally occurs

If any node in the cluster encounters errors with the repository disk, or a failure while accessing the disk, the cluster enters a limited or restricted mode of operation.
In this mode of operation, most topology-related operations are not allowed. For example, a node cannot be added to the cluster and a node cannot join the cluster. Because of this, starting the cluster on the problematic node will fail, since the "node_join" event will not succeed while the repository disk is corrupted.

This type of issue arises when a storage-level problem (for example, I/O errors) makes the storage disks inaccessible.

How to determine whether there is a problem with the repository disk:

When the repository disk fails, you are notified of the disk failure. PowerHA SystemMirror notifies you of the repository disk failure until it is resolved.

To determine what the problem is with the repository disk, you can view the following log files:

   1. hacmp.out
   2. AIX error log (using the errpt command)

hacmp.out error message

The following is an example of an error message in the hacmp.out log file when a repository disk fails:

ERROR: rep_disk_notify : Tue Jan 10 13:38:22 CST 2012 : Node "abc123"(0x62518DTS1H0638E873GE041A74C40ZF9) on Cluster test has lost access to repository disk hdisk3.

AIX error log

LABEL:              OPMSG
IDENTIFIER:     AA8AB241

When a node loses access to the repository disk, an entry is made in the AIX error log of each node that has a problem.
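To pull the affected disk names out of a long hacmp.out, a filter such as the following can help. It is a sketch that assumes the message text matches the rep_disk_notify example shown above. The helper name lost_repo_disks is hypothetical, and the path to hacmp.out depends on your installation.

```shell
# lost_repo_disks: hypothetical helper. Reads hacmp.out on stdin and prints
# each repository disk named in a rep_disk_notify "lost access" message once.
lost_repo_disks() {
    sed -n 's/.*rep_disk_notify.*lost access to repository disk \([^ .]*\).*/\1/p' | sort -u
}

# Usage on a cluster node (not executed here):
#   lost_repo_disks < <path to hacmp.out>
#   errpt -j AA8AB241        # matching entries in the AIX error log
```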

Below are the steps that we have already performed successfully to overcome the repository disk corruption issue.

1. Verify that the CAA services are running on both cluster nodes using # lssrc -g caa. If they are not running, start the services on both nodes.
2. Remove the repository disk configuration from the node where the repository disk corruption is reported using the command # rmcluster -F -r <repository disk>
3. Rebuild or recover the repository disk using the command # clusterconf -r <repository-Disk>
4. Once this succeeds, the CAA_VG will appear again on the affected node and become active.
5. Sync the cluster from the node that has the latest cluster information.

If that does not work, we need to add new repository disks to the cluster configuration.
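Step 1 of the procedure above can be checked quickly with a filter over the lssrc output. This sketch assumes the usual lssrc layout: a header line, then one line per subsystem with the status in the last column. The helper name inactive_caa is hypothetical.

```shell
# inactive_caa: hypothetical helper. Reads `lssrc -g caa` output on stdin and
# prints the name of every subsystem whose status is not "active".
inactive_caa() {
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1 }'
}

# Usage on each cluster node (not executed here):
#   lssrc -g caa | inactive_caa
```

No output means every CAA subsystem in the group reported as active.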

Thursday, July 31, 2014

DUAL VIO'S CONCEPTS

Dual VIO CONCEPT
===========
Why do we need Dual VIO?

To understand the requirement for dual VIO, we need to step back and think about the single-VIO concept.

While doing so, a few questions arise. Let us take an example of a small virtualized environment with 1 VIO server and 4 VIO client LPARs, where all disks come from the VIO server.

1) How will we do maintenance activity that requires VIO downtime?
ans) For any downtime maintenance activity (upgrade etc.), or in case of any issue that involves downtime of the VIO server, we need to bring down all 4 LPARs and then perform the activity on the VIO server.

2) What if these 4 LPARs are production critical and the customer cannot afford the downtime?
ans) This is the challenge in performing any activity that requires VIO downtime.

3) What will happen if the VIO server itself reboots unexpectedly or goes down?
ans) The VIO server is a single point of failure (SPOF) in a virtualized single-VIO implementation (where all the disks come from the VIO server).

To solve these issues, IBM introduced the dual VIO concept, which addresses all these drawbacks of a simple virtualized environment.

----------------------------------------------------------------------------------------------

What is the Dual VIO Concept?

The dual VIO concept provides redundancy at the VIO server level. It means that if one VIO server is down, network communication and the disks remain accessible through the second VIO server.

HERE THE QUESTIONS ARISE:

How will communication happen through the 2nd VIO, and how does the failover happen?
ans) While configuring dual VIO we also need to implement SEA failover, which takes care of the network communication of the LPARs.

How will the disks be accessible from the 2nd VIO, and why?
ans) All the disks are mapped in cluster mode to both VIO servers at the storage level, i.e. the disk with the same PVID is present on both VIO servers, and the VIOS mapping for the same disk is done from both VIO servers. At the client level, you will see two paths, each coming from a different vscsi adapter.
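On a client LPAR, the two-path claim can be verified from the path listing. The sketch below assumes lspath prints one line per path in the form "status disk parent", for example "Enabled hdisk0 vscsi0". The helper name single_path_disks is hypothetical.

```shell
# single_path_disks: hypothetical helper. Reads `lspath` output on stdin and
# prints every disk that has fewer than two paths in the Enabled state.
single_path_disks() {
    awk '
        NF >= 2         { seen[$2] = 1 }
        $1 == "Enabled" { ok[$2]++ }
        END { for (d in seen) if (ok[d] < 2) print d }' | sort
}

# Usage on a VIO client LPAR (not executed here):
#   lspath | single_path_disks
```

Any disk printed here would lose access if the VIO server behind its only working path went down, so it is worth checking before planned VIO maintenance.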