Monday, June 11, 2018

Starting ...... Currently no name


I am not a writer, but whenever I get time, I try to pen something down. Normally I post technical stuff, but this is the first time I have tried to write something new.


So please comment if you like it, and provide suggestions to improve it.


Synopsis
========

This story revolves around an engineer named Kumar. This is the first part, in which I have tried to describe the first day of Kumar's engineering college life.
There are many interesting incidents that can be penned down.


Part-1 

It was the month of May. The sun was overhead and the temperature was about 40 degrees, but my racing thoughts kept me distracted from the sweat on my neck and the sticky humidity of the day.
I booked an auto and started toward the engineering college.

Me: How far is the college?
Auto driver: It is 8 km outside the city.

There was complete silence. As we neared the college, I enjoyed the lush green fields, grazing cattle and small dhabas, which I was seeing after a long time.

Auto driver: This is the college entrance gate.
Me: Thanks, how much?
Auto driver: 40 Rs.
Me: Here is your 40 Rs, thanks.

Scene-1 - Entry to College

I started walking slowly toward the entry gate, thinking about the hostel, ragging, colleagues and blah blah blah, when I heard somebody calling me.
It was the security guard at the gate.

Security: Hey, come here.
Me: Yes, please?
Security: New admission?
Me: Yes. Can you please tell me where the principal's office is?
Security: Please make an entry in the register. Go straight; the principal's office is on the right.

I entered the college gate and headed for the principal's office. It was a big campus. Its shady trees, sunny green lawns and rows of roses caught my eye. A cool breeze started blowing, giving a cooling sensation.

I saw a group of students sitting in the corridor.

Me: Hi, Kumar here. May I know where the principal's office is?
Senior: Go straight, the last room.
Me: Thanks.
Senior (laughing): Will see you soon.
I reached the principal's office and knocked on the door.
Principal: Come in.
Me: Thanks.
Principal: Take your seat.
Me: Okay, thanks sir.
The discussion started with the introduction, expectations and curriculum, and then it came to the hostel.
Principal: You have opted for the hostel.
Me: Yes.
Principal: I will call the hostel warden to discuss the room allocation.
Me: Okay, thanks sir.
Principal: He will be here in 5 minutes.

The hostel warden entered the principal's office and greeted him.
Warden: Good afternoon, sir.
Principal: Good afternoon. Meet Kumar. He has joined today and will be needing your help.
Me: Good afternoon.
Warden: Are you a new joinee?
Me: Yes.
Warden: There are 5 rooms free in the hostel, 4 on the first floor and 1 on the ground floor, and all are twin sharing.
Me: The ground floor will be fine for me.
Warden: Write down the hostel address and the room number in your notepad.
Me: Okay, thanks sir.

I left the principal's office and looked at my watch. It was 3 PM and I was hungry.
I went to a nearby local dhaba, had some food, took an auto to the hostel and reached the hostel entrance gate.

Scene-2 - Entry to Hostel

Security: Good evening.
Me: Good evening. (I handed over the note.)
Security: Welcome.
Me: Thanks.
Security: Please come with me.

We walked along the hostel corridor and finally reached the allocated room.

Security: This is the room that has been allocated to you.
Me: Thanks.

I threw down all my luggage, locked the room and went to take a much-needed shower. After that I went to bed. I don't know when my eyes grew heavy and I fell into a deep sleep. A sudden knock woke me up, and I opened the door.

Me: Hi, good evening.
Pandey: Hi, I am Pandey.
Me: Good evening, Kumar here. I have newly joined.
Pandey: Ohh.. okay.. Good evening, mate.
Me: How is the hostel?
Pandey: Fine.
Me: I have heard about ragging. Does it happen here also?
Pandey (laughing): You will come to know in a few days.
Me: Please tell me, have you faced it here?
Pandey: Yes, nothing to worry about..
Me: Can you tell me what I should be doing, as you have been in the hostel for a few days?
Pandey: Sure. Don't look into the eyes of any senior, greet them whenever you see them and obey their instructions. The rest will be fine.
Me: Thanks, will do that.

It was around 8 PM, and we headed to the canteen for food. There I met a few more colleagues and introduced myself. My eyes were searching for a known face when I suddenly saw Singhji, one of my schoolmates. I smiled and went toward him.


Me: Singhji, you here!
Singhji: Kumar, how are you?
Me: Good. Meeting after a long time. What about you?
Singhji: I am fine. Nice seeing you..
Me: Singhji, which room are you staying in?
Singhji: S-12.
Me: Ok.. we have a lot to talk about. We will sit together after food.
Singhji: Ok, Kumar.
Me: Pandey, at what time do you go to bed?
Pandey: 11 PM.
Me: Ok, I will go with Singhji and come back before 11.

We finished our food while talking, and I went with Singhji to his room. Many of our colleagues were already sitting there and playing cards.


Singhji: Meet my old friend and our classmate.
Me: Good evening, friends.
Singhji: Kumar, consider it your own room and take a seat.
Me: Thanks, Singhji.
Singhji: This is our usual timepass after college hours. If we go out of the hostel after college, the seniors will catch us.
Basker: Do you know how to play cards?
Me: Yes.
Basker: We are playing dahla pakad (a card game). Would you like to join us?
Me: Thanks, yes I will.

We started playing, and there was complete silence; everybody was concentrating as the last phase was coming.

Suddenly Anurag (one of our colleagues) entered with a bang.
Anurag: Basker, Singhji, how are you?
Singhji: Fine. Meet my old friend Kumar, who joined us today.
Anurag: Hi Kumar, how are you?
Me: Fine.
Anurag: Which branch?
Me: Computer science.
Anurag: Playing cards, mates? This room has become the adda for cards.
Basker: Yes sirji, you can join us.

After 20 minutes, there was a knock on the door and 3-4 people came into the room with cigarettes in their hands and completely red eyes, and the unpleasant smell of alcohol spread across the room.
Everybody left the cards, stood up and greeted them with heads down.

Senior: Hey, what were you all doing?
Basker: Sir, we were playing cards. Would you like to join us?
Senior (looking at me): New bakra (joinee)..
Me: Yes sir.
Senior: Give your intro.
Me: .......................
Senior: Sing a song for me.
Me (started singing): ratkali ek khwab me aayi..
Senior: Shit.. can you sing any new song?
Me: tujhe dekha to yeh jaana sanam....

Senior: Basker, you do the nagin dance to this song. Kumar, you should not stop singing before 5 minutes.

I started singing, and Basker started rolling his hands like a cobra.. hiss.. hiss.. with such an expression that everybody started laughing, me too.

Senior (angry): Kumar, how dare you stop before 5 minutes?
Senior: Get down on your knees and stay there until I tell you.
Me: Ok sir.
Senior: Now we will have a cricket match here. Basker, you will be batting; Anurag, you will be bowling; and Kumar, you need to do the commentary.
Me: Ok sir.
Me: Gayle is getting ready to face the first ball from Prasad. Prasad is running in from the stadium end, and Gayle plays a defensive shot.
Senior: Idiot.. I told you Basker is batting and Anurag is bowling... where did Gayle and Prasad come from?
Me: Sorry sir, you told me to do commentary, so that is what I was doing.
Senior: Singh, can you give Kumar a slap?
Singhji: Ok sir.
Senior: Is this a slap? Shall I teach you how to slap?
Singhji (slapping): No sir..
Senior: How did it feel? Was it fine, or do you require more?
Me (rubbing my cheek): It was hard.. will start the commentary.
Me: Basker is on the crease to take strike for the first ball from Anurag. Basker hits the ball toward the short fine leg boundary.. fielders are running behind it, but it looks like the ball will reach the boundary, and it does. Anurag bowls the second delivery, which is of fullish length, and Basker defends.

It went on for the next 5 minutes.

Senior: Ok guys, we are feeling sleepy... see you tomorrow.. good night..


This is how the first day of Kumar's engineering life went.


To be continued ..............................

Many more to come..

Please comment if you like it..




Thursday, May 31, 2018

OPENSSH 7.1 (7.1.102.1100) -- issues





OPENSSH 7.1 (7.1.102.1100)  .

1. Adding ciphers using the “+” sign

After adding the below line to sshd_config:

Ciphers  + blowfish-cbc,arcfour256,arcfour128

#ssh -vv test123

debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctrchacha20-poly1305@openssh.com,,blowfish-cbc,arcfour256,arcfour128
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctrchacha20-poly1305@openssh.com,,blowfish-cbc,arcfour256,arcfour128

The issue seems to be the double comma in the resulting list, which prevents the added ciphers from working.
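The double comma can be spotted mechanically. The following is only a minimal sketch, assuming the cipher proposal has been copied out of the `ssh -vv` output as a plain string; the function name `empty_cipher_entries` is made up for this illustration and is not part of OpenSSH.

```python
def empty_cipher_entries(proposal):
    """Return the positions of empty names in a comma-separated cipher list."""
    names = proposal.split(",")
    return [index for index, name in enumerate(names) if not name.strip()]


# Cipher proposal copied from the debug2 line above.
proposal = ("aes128-ctr,aes192-ctr,aes256-ctrchacha20-poly1305@openssh.com,"
            ",blowfish-cbc,arcfour256,arcfour128")

for index in empty_cipher_entries(proposal):
    print("empty cipher name at position", index)
```

An empty name in the list is what a double comma looks like after splitting, so any reported position points at the spot to inspect.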

2. Unsupported ciphers listed by the OpenSSH package itself

List of ciphers supported by the OpenSSH 7.1 package:

$ssh -Q cipher
3des-cbc
blowfish-cbc
cast128-cbc
arcfour
arcfour128
arcfour256
aes128-cbc
aes192-cbc
aes256-cbc
aes128-ctr
aes192-ctr
aes256-ctr


As per the sshd_config man page, the default cipher list is:


chacha20-poly1305@openssh.com, aes128-ctr,aes192-ctr,aes256-ctr, aes256-gcm@openssh.com 



Connectivity result before applying the Ciphers

$ ssh -vv test123
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctrchacha20-poly1305@openssh.com, >>>List of ciphers supported by default by OPENSSH7.1
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctrchacha20-poly1305@openssh.com,

This suggests that the default cipher list OpenSSH 7.1 actually offers is different from the one shown in the sshd_config man page. The “aes256-gcm@openssh.com” cipher doesn’t appear in the default cipher list when we do the connectivity test.



The below cipher list was added as per references from different sites.

Ciphers blowfish-cbc,aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,chacha20-poly1305@openssh.com,aes128-gcm@openssh.com,aes256-gcm@openssh.com


Working Configuration



But when we tried the below option after removing the gcm ciphers, it worked.

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,chacha20-poly1305@openssh.com,aes256-cbc,3des-cbc


Common errors:

Error 1) Unable to negotiate with x.x.x.x.: no matching cipher found. Their offer: aes128-cbc,blowfish-cbc,3des-cbc lost connection

Solution) Add the below line to the sshd_config configuration file:

Ciphers aes128-ctr,aes192-ctr,aes256-ctr,chacha20-poly1305@openssh.com,aes256-cbc,3des-cbc,blowfish-cbc

Stop and start the sshd service.

Error 2) Sometimes passwordless authentication or the key negotiation uses only DSA keys, and the connection fails.
Ans) In OpenSSH 7, DSA keys are disabled by default, which means that any key negotiation or connection that uses only DSA keys will fail. In this scenario, we first need to validate the connectivity using ssh -vv <server-name> and check which key types are being accepted, to confirm the exact issue.

Add the below lines to sshd_config to allow DSA keys:

HostKeyAlgorithms +ssh-dss

PubkeyAcceptedKeyTypes +ssh-dss

Stop and start the sshd service.

Test the connectivity.

AIX - CPU Utilization -Some points

CPU Utilization in AIX
======================

In this example, we will try to understand how the CPU entitlement parameters actually work in AIX.

Let us try to understand this through different scenarios.

Take the below example:


Mode                  : Uncapped
Entitled Capacity     : 3.00
Online Virtual CPUs   : 20
Maximum Virtual CPUs  : 30
Minimum Virtual CPUs  : 2
Minimum Capacity      : 1.00
Maximum Capacity      : 10.00


Entitled Capacity: This LPAR is entitled (guaranteed) to get 3 CPUs.
Minimum Capacity: The minimum requirement to start this LPAR is 1 CPU.
Maximum Capacity: The maximum entitlement for this LPAR is 10 CPUs.

The question here is: what actually is Maximum Capacity, and how does it work?
Ans) The Maximum Capacity parameter comes into the picture when we talk about DLPAR operations. It means that we can increase the Entitled Capacity online up to the Maximum Capacity value, i.e. to 10 in this scenario.
Maximum entitlement has no relation to the CPU utilization of the LPAR. Many people have the misconception that this is the maximum value up to which the CPU utilization of the LPAR can go.

Ques) In this scenario, what is the maximum CPU utilization this LPAR can achieve?
Ans) This LPAR is uncapped, which means that it can go up to the maximum limit allowed by its configuration and requirement, provided that the CPU pool has enough free CPU.
When we say it can go up to the maximum limit allowed by its configuration, the "Online Virtual CPUs" value comes into the picture. As we know, each virtual CPU can use the power of at most 1 physical CPU, as required. The maximum CPU utilization of this LPAR therefore depends on the number of online virtual CPUs and the free CPUs available in the CPU pool.
In this scenario, the CPU utilization of this LPAR can go up to a maximum of 20 CPUs, provided that there are enough CPU resources in the CPU pool.


Now take the same scenario, but with the CPU mode set to "capped":


Mode                  : Capped
Entitled Capacity     : 3.00
Online Virtual CPUs   : 20
Maximum Virtual CPUs  : 30
Minimum Virtual CPUs  : 2
Minimum Capacity      : 1.00
Maximum Capacity      : 10.00



Ques) In this scenario, what is the maximum CPU utilization this LPAR can achieve?
Ans) The CPU mode for this LPAR is capped, which means that it can never go above the Entitled Capacity. In this case the Entitled Capacity is 3 CPUs, so the CPU utilization of this LPAR can't go beyond 3 CPUs.
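The two scenarios can be summed up in a few lines of Python. This is a simplified model written only for illustration, not how the hypervisor is implemented: the function name and the `pool_free` parameter (free capacity in the shared CPU pool) are assumptions, and the model ignores details such as uncapped weights.

```python
def max_cpu_consumption(mode, entitled, online_vcpus, pool_free):
    """Rough upper bound on the physical CPU an LPAR can consume.

    mode         -- "capped" or "uncapped"
    entitled     -- Entitled Capacity in processing units
    online_vcpus -- Online Virtual CPUs
    pool_free    -- assumed free capacity in the shared pool (hypothetical input)
    """
    if mode.lower() == "capped":
        # A capped LPAR never goes beyond its entitlement.
        return float(entitled)
    # An uncapped LPAR is limited by its virtual CPUs and by what the pool can spare.
    return min(float(online_vcpus), entitled + pool_free)


print(max_cpu_consumption("uncapped", 3.0, 20, pool_free=50.0))
print(max_cpu_consumption("capped", 3.0, 20, pool_free=50.0))
```

Note that Maximum Capacity (10.00) does not appear anywhere in the calculation, which is the point made above: it matters only for DLPAR operations.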

Wednesday, May 30, 2018

NMON- Analyzing Memory Usage



Recently I got a request from the application team to clarify the AIX memory utilization. After going through a lot of documentation, this is what I was able to understand.





Total virtual: 36 GB  >>>  this is the total memory allocated, which includes physical memory and paging space.
Accessed virtual: 16.3 GB  >>>  active virtual pages (paging space + real memory), which comes to around 45.3% of the total virtual memory.

In AIX Virtual Memory Management, free memory is used for cache, and whenever an application requests memory, it is freed automatically.

In our latest report, the physical memory consumption is as below.

%used = 93.7%   -- the total physical memory used out of 32 GB (i.e. process + system + cache)

Note: This will always be high in AIX, as it includes the cache as well.

%free = 6.3%


This section of NMON shows how the physical memory is used and how it is split up (refer to the screenshot attached below).




Numperm (cache) = 41.9%  >>>  This is used for cache (filesystem cache etc.) for better performance, and will be freed automatically by the operating system when an application requests memory.
Process         = 40.4%  >>>  This much of the 32 GB physical memory is used by application processes.
System          = 11.4%  >>>  This much of the 32 GB physical memory is used by operating system processes.
Free            = 6.3%   >>>  This is the free physical memory available out of 32 GB.


Basically, when we talk about performance we consider only “process + system”. If this is above 90-95%, then we can see a performance impact.
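As a quick worked example with the figures from the report above, here is a small Python sketch. The function name and the choice of 90% as the default threshold are assumptions made for this illustration (the lower end of the 90-95% range mentioned above).

```python
def memory_pressure(process_pct, system_pct, threshold=90.0):
    """Return (process + system percentage, True if it reaches the threshold)."""
    used = process_pct + system_pct
    return used, used >= threshold


# Figures taken from the NMON report above; the cache (numperm) is deliberately left out.
used, under_pressure = memory_pressure(process_pct=40.4, system_pct=11.4)
print(f"process + system = {used:.1f}% of physical memory")
print("performance impact likely" if under_pressure else "no memory pressure from process + system")
```

So even though %used shows 93.7%, the part that matters for performance is only about half of the physical memory in this report.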


Saturday, December 16, 2017

lspv shows newly assigned LUN as "VeritasVolumes" instead of "None"

Problem

AIX's lspv shows a newly assigned LUN as "VeritasVolumes" instead of "None".
The catch was that the Veritas Volume Manager package had been removed from the server long ago, and no Veritas services were configured on the server.

Error:

test# extendvg -f datavg hdisk3
Disk hdisk3 is already in volume group VeritasVolumes

Probable cause


The lspv command reads the customized database (ODM). That means the output we are getting is due to the PV attributes defined in the customized database.
It may be that hdisk3 was earlier part of Veritas Volume Manager and the proper procedure was not followed when removing the Veritas volumes, which caused inconsistencies in the customized database.


Understanding the exact cause


1) To understand the exact issue, we tried to get the disk details from the customized database. We found that a PV attribute named "VeritasVolumes" was set for the problematic disk, which was not present for the other AIX LVM disks. We cross-checked all the affected disks and found the same entry.

# odmget -q value=hdisk3 CuAt

CuAt:
        name = "VeritasVolumes"
        attribute = "pv"
        value = "hdisk3"
        type = "R"
        generic = ""
        rep = ""
        nls_index = 0

2) We removed the disk from the server using the rmdev command, to see whether this attribute value gets removed. But no luck: all the other attributes were removed, but this attribute was not. This gave us the impression that the attribute either needs to be removed by the correct VxVM command or needs to be forcefully removed from the customized ODM database.


Resolution


Normally, the process below is used to remove a disk from VxVM control, if VxVM is configured.


1. Tell the vxconfigd daemon to enter enabled mode:
# vxdctl enable
2. Check the disk details:
# vxdisk -e list|grep hdisk3
test_aks0_1242 auto      -             -            online       hdisk3     std

3. Uninitialise the device to remove the VxVM information:
# /etc/vx/bin/vxdiskunsetup -C test_aks0_1242

4. Remove it from VxVM's view as well:
# vxdisk rm test_aks0_1242

5. lspv now shows the device without the VeritasVolumes tag:
# lspv|grep hdisk3
hdisk3        none                                None


But what if the VxVM package itself is not present, so that it is not possible to remove the disk from VxVM control using any VxVM command? In that case we followed the steps below.


Step 1) Validated the hdisk3 disk details.
test# lspv | grep hdisk3
hdisk3          none                                VeritasVolumes

Step 2) Removed the disk, to see whether the ODM PV attribute information of hdisk3 gets cleared.

test# rmdev -Rdl hdisk3
hdisk3 deleted

Step 3) Found that even after removing the disk, the PV attribute of hdisk3 is not cleared.
test# odmget -q value=hdisk3 CuAt

CuAt:
        name = "VeritasVolumes"
        attribute = "pv"
        value = "hdisk3"
        type = "R"
        generic = ""
        rep = ""
        nls_index = 0
test# odmget -q name=hdisk3 CuAt
test# odmget -q name=hdisk3 CuDv
test# odmget -q value3=hdisk3 CuDvDr
test# odmget -q name=hdisk3 CuDep
test# odmget -q name=hdisk3 CuVPD

Step 4) Removed the PV attribute from the customized ODM.

# odmdelete -q value=hdisk3 -o CuAt
0518-307 odmdelete: 1 objects deleted.

Step 5) Ran cfgmgr to re-configure the disk.

#cfgmgr
test# lspv | grep hdisk3
hdisk3          none                                none

Note: It is not required to remove the disk using rmdev; we can directly remove the ODM definition.
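If many disks have to be checked, the odmget output can be scanned with a small script. This is only a sketch, assuming the output of `odmget -q value=hdisk3 CuAt` has been saved as text; the helper names `parse_odm_stanzas` and `stale_pv_entries` are hypothetical and are not AIX tools.

```python
def parse_odm_stanzas(text):
    """Parse odmget-style output into a list of dictionaries, one per stanza."""
    stanzas = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":") and "=" not in stripped:
            # A line such as "CuAt:" starts a new stanza.
            current = {"_class": stripped[:-1]}
            stanzas.append(current)
        elif "=" in stripped and current is not None:
            key, _, value = stripped.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return stanzas


def stale_pv_entries(stanzas, disk):
    """Stanzas that tag the disk as a PV under a name other than the disk itself."""
    return [s for s in stanzas
            if s.get("attribute") == "pv" and s.get("value") == disk and s.get("name") != disk]


# Output copied from the odmget command shown above.
sample = '''
CuAt:
        name = "VeritasVolumes"
        attribute = "pv"
        value = "hdisk3"
        type = "R"
        generic = ""
        rep = ""
        nls_index = 0
'''

for entry in stale_pv_entries(parse_odm_stanzas(sample), "hdisk3"):
    print("leftover PV attribute:", entry["name"])
```

The script only reports the leftover entry; the actual removal is still done with odmdelete as in Step 4.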

Thursday, November 09, 2017

HMC Commandline




Getting the frame details



hscroot@hmc-op:~> lssyscfg -r sys -F name
op710-1-xxxxxxx
op710-2-xxxxxxx
op720-1-xxxxxxx


Getting the LPAR details in the frame with status

hscroot@hmc-op:~> lssyscfg -m op710-2-SN1008B2A -r lpar -F name,lpar_id,state
op710-2-Client5-Fedora-Core-4,6,Running
op710-2-Client4-openSUSE-10.0,5,Running
op710-2-Client3-Debian-3.1,4,Running
op710-2-Client2-RHAS4U3,3,Running
op710-2-Client1-SLES9SP3,2,Running
op710-2-VIO-Server,1,Running
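The comma-separated output produced by the -F option is easy to post-process. The following is a minimal Python sketch, assuming the output of the command above has been captured as text (for example over ssh); the function name `parse_lpar_list` is made up for this example.

```python
import csv
import io


def parse_lpar_list(output, fields=("name", "lpar_id", "state")):
    """Turn 'lssyscfg -r lpar -F name,lpar_id,state' output into dictionaries."""
    reader = csv.reader(io.StringIO(output))
    return [dict(zip(fields, row)) for row in reader if row]


# Output copied from the lssyscfg example above.
output = """op710-2-Client5-Fedora-Core-4,6,Running
op710-2-Client4-openSUSE-10.0,5,Running
op710-2-Client3-Debian-3.1,4,Running
op710-2-Client2-RHAS4U3,3,Running
op710-2-Client1-SLES9SP3,2,Running
op710-2-VIO-Server,1,Running
"""

lpars = parse_lpar_list(output)
not_running = [lpar["name"] for lpar in lpars if lpar["state"] != "Running"]
print(len(lpars), "LPARs found,", len(not_running), "not running")
```

The `fields` tuple has to match the order given to -F; if more attributes are requested, extend the tuple in the same order.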

Getting the resource allocation for frame

hscroot@HMC:~> lshwres -r mem -m Server-8204-XXX-XXXX --level sys
configurable_sys_mem=114688,curr_avail_sys_mem=256,pend_avail_sys_mem=256,installed_sys_mem=114688,max_capacity_sys_mem=deprecated,
deconfig_sys_mem=0,sys_firmware_mem=2560,mem_region_size=256,configurable_num_sys_huge_pages=0,curr_avail_num_sys_huge_pages=0,pend_avail_num_sys_huge_pages=0,max_num_sys_huge_pages=6,requested_num_sys_huge_pages=0,huge_page_size=16384,total_sys_bsr_arrays=16,bsr_array_size=8,curr_avail_sys_bsr_arrays=0,max_mem_pools=0
hscroot@HMC:~>

Getting the resource allocation for LPAR

HMC:~> lssyscfg -m Server-8206-E48-XXXXXXX  -r prof --filter "lpar_names=test_retail"
name=test_retail_Profile_OK,lpar_name=test_retail,lpar_id=2,lpar_env=aixlinux,all_resources=0,min_mem=28872,desired_mem=28872,max_mem=28872,min_num_huge_pages=0,
desired_num_huge_pages=0,max_num_huge_pages=0,proc_mode=ded,min_procs=6,desired_procs=6,max_procs=6,sharing_mode=share_idle_procs,"io_slots=,lpar_io_pool_ids=none,
max_virtual_slots=10,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=none,virtual_eth_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=1,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lhea_logical_ports=none,lhea_capabilities=none,lpar_proc_compat_mode=default,
electronic_err_reporting=null,virtual_fc_adapters=none



Changing the memory allocation for an LPAR

chsyscfg -r prof -m Server-8206-E48-SN2239B16  -i "name=test_retail_Profile_OK,lpar_name=test_retail,min_mem=94208,desired_mem=94208,max_mem=94208"


Changing the Virtual CPU parameter for an LPAR

chsyscfg -r prof -m Server-8206-E48-SN2239B16  -i "name=test_retail_Profile_OK,lpar_name=test_retail,min_procs=7,desired_procs=7,max_procs=7"

Changing the entitled capacity for an LPAR

chsyscfg -r prof -m Server-8206-E48-SN2239B16  -i "name=test_retail_Profile_OK,lpar_name=test_retail,min_proc_units=0.1,desired_proc_units=0.2,max_proc_units=2.0"


Starting and shutting down an LPAR

To start the LPAR named "test_retail" with the profile "test_retail_Profile_OK", first check the LPAR state and default profile, then activate it:


hscroot@hmc-570:~> lssyscfg -m Server-9110-510-SN100129A -r lpar -F name,lpar_id,state,default_profile
VIOS1.3-FP8.0,1,Running,default
linux_test,2,Not Activated,client_default



chsysstate -m Server-8206-E48-SN2239B16  -r lpar -o on -n test_retail -f  test_retail_Profile_OK

Shutting down the LPAR "test12" immediately

chsysstate -m SYSTEM-9131-52A-SN10XXXXX -r lpar -o shutdown -n test12  --immed


Important commands


lshmc -v                                        Shows vital product data, such as the serial number.
lshmc -V                                        Shows the release of the HMC.
lshmc -n                                        Shows network information of the HMC.
hmcshutdown -r -t now                           Reboots the HMC.
lssysconn -r all                                Shows the connected managed systems.
chhmcusr -u hscpe -t passwd -v abc1234          Changes the password of user hscpe.
lshmcusr                                        Lists the users of the HMC.
monhmc -r disk                                  Shows the filesystems of the HMC.
monhmc -r proc                                  Shows details of the processors.
monhmc -r mem                                   Shows details of the memory.
rmvterm -m SYSTEM-9117-570-SN10XXXXX -p name    Forces the closure of a virtual terminal session.
lspartition -dlpar                              Shows DLPAR-capable partitions.


And now let's run some commands on a VIOS using viosvrcmd.

hscroot@hmc-570:~> viosvrcmd -m Server-9115-520-SNxxxxx -p VIOS1.3-FP8.0 -c "mkvg -f -vg datavg hdisk2 hdisk3"
datavg
hscroot@hmc-570:~> viosvrcmd -m Server-9115-520-SNxxxxxx -p VIOS1.3-FP8.0 -c "mklv -lv testlv datavg 10G"
testlv
hscroot@hmc-570:~> viosvrcmd -m Server-9115-520-SNxxxxxx -p VIOS1.3-FP8.0 -c "lsvg -lv datavg"
datavg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
testlv              jfs        160   160   1    closed/syncd  N/A

Cluster Issue

Why did the SP2 failover fail?

Observations :

1. After analyzing the logs, we noticed the error below in the cluster logs: the cluster event “get_disk_vg_fs” had failed. To pinpoint the actual issue and why this event failed, we dug deeper into the logs and found that the cluster services had problems activating/mounting the cluster filesystem /sapmnt/SP2.





   

2. When we initiate the SP2 cluster failover, the cluster unmounts the filesystems and exports the VGs from node1, and then imports all the VGs and mounts the respective filesystems on node2. As per the logs, the cluster VGs were successfully exported from node1 and imported on node2, but the cluster failed to mount the /sapmnt/SP2 filesystem.

3. Once we got these details from the cluster logs, we investigated further to find out why the cluster was facing issues with the /sapmnt/SP2 filesystem during the failover. We found that /sapmnt/SP2 was already NFS-mounted: it had been manually mounted on node2 through the normal NFS commands. That means the cluster was not able to mount the filesystem since it was already mounted.

4. We verified the requirement for the /sapmnt/SP2 filesystem on node2 with the SAP/DB team on a call and, upon confirmation, unmounted it. As per the application team, this filesystem is needed wherever the SP2 application is running. To meet that requirement, we configured /sapmnt/SP2 as an NFS cross-mount inside the cluster, and again performed the cluster failover test and application validation.

Everything was fine.


Tuesday, September 12, 2017

Dynamic Routing -gated services aix


In TCP/IP, routing can be one of two types:

1. Static routing
2. Dynamic routing

With static routing, you maintain the routing table manually using the route command. Static routing is practical for a single network communicating with one or two other networks.

* Note - However, as your network begins to communicate with more networks, the number of gateways increases, and so does the amount of time and effort required to maintain the routing table manually.
With dynamic routing, daemons update the routing table automatically. Routing daemons continuously receive information broadcast by other routing daemons, and so continuously update the routing table.



In AIX, TCP/IP provides two daemons for use in dynamic routing:

1. routed daemon
2. gated daemon

The gated daemon supports:

 a) Routing Information Protocol (RIP) and Routing Information Protocol Next Generation (RIPng)
 b) Exterior Gateway Protocol (EGP)
 c) Border Gateway Protocol (BGP) and BGP4+
 d) Defense Communications Network Local-Network Protocol (HELLO)
 e) Open Shortest Path First (OSPF)
 f) Simple Network Management Protocol (SNMP), and some more


Routing daemons can operate in one of two modes:
1. passive
2. active

In active mode, routing daemons both broadcast routing information periodically about their local network to gateways and hosts, and receive routing information from hosts and gateways.

In passive mode, routing daemons receive routing information from hosts and gateways, but do not attempt to keep remote gateways updated (they do not advertise their own routing information).

Dynamic routing daemons, however, must be run in the passive (quiet) mode when run on a host that is not a gateway.

Recently I came across an environment where the gated service was used with the OSPF routing protocol.

This was something new for me, so I started reading PDFs and blogs to understand the exact concepts.

The most important point is that if you want to understand the complete configuration, you first need to understand the routing protocol, how it works and its network terms.





Now let us go through the basic concepts of the OSPF routing protocol that will be helpful in configuration.


OSPF 

  • Dynamic routing protocol
  • Link-state technology
  • Runs over IP, protocol 89
  • Designed by the IETF for TCP/IP
  • Supports VLSM - variable-length subnet masks, i.e. subnetting
  • Multi-vendor - it is a standard protocol and is supported by all vendors
  • Fast rerouting - OSPF detects changes in the topology, such as link failures, and converges on a new loop-free routing structure within seconds
  • Minimises routing protocol traffic
  • Low bandwidth requirements
  • Supports different types of areas
  • Route summarisation and authentication


 Under construction  ....  


Sunday, September 10, 2017

Network performance .. Some points


Recently I was trying to understand an issue in which the customer complained that their network connections were getting dropped.
The network team worked on it for a long time, and then came to the Unix team to look at it from the server end as well.

Since it was a virtualized environment, we started looking from the network end first. We also asked the application team to let us know how these connections are set up.

We were hoping that some tuning at both ends would resolve the issue.

Network stats
=============

108038312 packets received
                67173530 acks (for 3510816000 bytes)
                295731 duplicate acks
                0 acks for unsent data
                97425484 packets (2215095896 bytes) received in-sequence
                22985 completely duplicate packets (28717295 bytes)
                0 old duplicate packets
                8552 packets with some dup. data (5423403 bytes duped)
                8332754 out-of-order packets (461387377 bytes)
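
To put these counters in proportion, the ratios can be worked out with a few lines of Python. This is only a rough sketch using the numbers printed above; which denominators are the most meaningful is a judgment call, and here the packets received and the acks were chosen.

```python
def percent(part, whole):
    """Share of 'part' in 'whole', as a percentage."""
    return 100.0 * part / whole


# Counters copied from the statistics above.
packets_received = 108038312
out_of_order = 8332754
acks = 67173530
duplicate_acks = 295731

out_of_order_pct = percent(out_of_order, packets_received)
duplicate_ack_pct = percent(duplicate_acks, acks)

print(f"out-of-order packets: {out_of_order_pct:.2f}% of packets received")
print(f"duplicate acks:       {duplicate_ack_pct:.2f}% of acks")
```

The out-of-order share is the figure that stands out, which is why the rest of the analysis concentrates on it.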

What is the reason for these out-of-order and duplicate packets at the receiving end?

There are a few possible causes:

1. Network congestion
2. The adapter (EtherChannel) configuration
3. The adapter buffers, etc.

The Adapter Configuration
=====================
In our scenario, the EtherChannel is configured as link aggregation, but with the “round-robin” algorithm.

Let us first understand the round-robin algorithm.


Round-robin: All outgoing traffic is spread evenly across all of the adapters in the EtherChannel. It provides the highest bandwidth optimization for the AIX server system. While round-robin distribution is the ideal way to utilize all the links equally, we should also consider that it introduces the potential for out-of-order packets at the receiving system.

The out-of-order packets and duplicate acks can all be due to the EtherChannel “round-robin” algorithm, or may indicate other network issues.
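
Why round-robin can reorder packets is easy to see with a toy model. The Python sketch below is not how EtherChannel is implemented; it simply assumes one packet is sent per time unit, alternating over the links, and that each link has its own fixed latency (the latency values are made up).

```python
def round_robin_arrivals(num_packets, link_latencies):
    """Send packets 0..n-1 one per tick, round-robin over the links.

    Returns the sequence numbers in the order they arrive at the receiver.
    """
    arrivals = []
    for seq in range(num_packets):
        link = seq % len(link_latencies)          # round-robin link selection
        arrival_time = seq + link_latencies[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_out_of_order(arrival_order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest_seen = -1
    late = 0
    for seq in arrival_order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late


uneven = round_robin_arrivals(6, link_latencies=[1, 4])
even = round_robin_arrivals(6, link_latencies=[1, 1])
print("uneven links:", uneven, "out of order:", count_out_of_order(uneven))
print("equal links: ", even, "out of order:", count_out_of_order(even))
```

As long as both links behave identically the order is preserved; as soon as one link is slower (a longer path, a busier switch port), consecutive packets overtake each other, which the receiver counts as out-of-order packets.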



2. We noticed that a lot of TCP ack packets are getting delayed. This is normal TCP/IP behavior in UNIX, but it can sometimes be an issue for applications that demand high performance (response time).

This is normally customized at the application level, but we also have an option in AIX to overcome it. The “TCP_NODELAY” socket option is disabled by default, which means the TCP Nagle algorithm is used on network transmissions, and it delays sending small successive packets.

The Nagle algorithm means that a TCP connection can have only one outstanding small segment that has not yet been acknowledged. Clearly this delays sending further packets until either the acknowledgement is received or TCP can bundle up more data into a full segment. Setting tcp_nodelay to 1 is a dynamic change and can improve response time.
Sometimes this is very helpful in getting the network throughput needed by applications that demand fast response times, but it will increase the CPU overhead and may lead to network congestion.



Before reaching a conclusion, we also need to validate various other parameters. .... Under construction ....
                                                                                                                                                   

Friday, August 25, 2017

NIM mksysb Restoration issue - "image.data has invalid logical volume data"


Recently we were working on a DR exercise. We were trying to restore a mksysb and faced a unique issue. Even after checking the IBM sites, we were not able to find satisfactory steps for resolution.

Issue noticed and troubleshooting steps:
----------------

  1. While restoring, we were receiving the below errors, after which we were redirected to the main page (the language selection page).
  2. We noticed that the system was getting an error while parsing the image.data file and was throwing the error "image.data has invalid logical volume data".
  3. To isolate the issue further, we created a new image.data file from the source server and again tried to restore the mksysb after adding the image.data resource.
  4. But no luck; we faced the same errors after this as well. This made it clear that the image.data file itself was not damaged, and that the issue was with its content.
  5. We then validated the image.data files from many other servers which had been restored successfully.
  6. After a lot of investigation and validating many image.data files of the successfully restored servers, we figured out that the problematic server did not have a paging space (hd6) defined under rootvg.


Resolution :


  • As a trial-and-error approach, we edited the image.data file of the server and added the paging space logical volume to it.
  • Created a new image.data resource and assigned it to the NIM client.
  • After doing so, the NIM mksysb restoration went smoothly without any issues.
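
A check like this can be done before starting a restore. The Python sketch below is only an illustration: it assumes that image.data describes each logical volume in an lv_data stanza with "KEY= value" lines, including a TYPE field, and the sample excerpt is hypothetical and heavily shortened. The function names are made up for this example.

```python
def lv_data_stanzas(image_data_text):
    """Collect the key/value pairs of every lv_data stanza."""
    stanzas = []
    current = None
    for line in image_data_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and "=" not in stripped:
            # A new stanza starts; only lv_data stanzas are of interest.
            current = {} if stripped == "lv_data:" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and "=" in stripped:
            key, _, value = stripped.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def has_paging_lv(image_data_text):
    """True if at least one lv_data stanza has TYPE= paging."""
    return any(s.get("TYPE") == "paging" for s in lv_data_stanzas(image_data_text))


# Hypothetical, shortened excerpt; a real image.data has many more fields.
sample = """
lv_data:
    VOLUME_GROUP= rootvg
    LOGICAL_VOLUME= hd5
    TYPE= boot

lv_data:
    VOLUME_GROUP= rootvg
    LOGICAL_VOLUME= hd6
    TYPE= paging
"""

print("paging space defined:", has_paging_lv(sample))
```

Running such a check against the image.data of each server before the DR exercise would have pointed at the missing paging space much earlier.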










Thursday, July 27, 2017

sendmail -issue : sendmail listening to only localhost

Problem Statement 
================
The servers test01 and test02 are in a cluster. The customer's concern was that they were not able to reach the cluster IP on port 25.

Analysis and understanding the exact problem 
======================================
1. First we validated whether the sendmail service was running. It was running fine.
2. We tried to telnet to the cluster IP on port 25, and the connection was rejected. We tried to telnet to all the IPs configured on the network interfaces on port 25, but that also failed.
3. We tried to telnet to localhost on port 25, and it was successful. That means there is no port-level blocking on the server.
4. We validated the sendmail configuration file to check the relay server configuration and found it to be OK.
5. Now the question arises: if the port is open and we are able to telnet to localhost, why are we not able to do so for the other IPs? Why is the sendmail service listening only on localhost? This was the actual issue that needed to be sorted out.
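
The telnet checks in steps 2 and 3 can also be scripted. This is a minimal Python sketch, assuming Python is available on a host that can reach the server; the function names are made up, and the addresses in the usage comment are placeholders, not the real cluster IPs.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_addresses(addresses, port=25):
    """Map each address to True/False depending on whether the port answers."""
    return {address: port_is_reachable(address, port) for address in addresses}


# Example usage (placeholder addresses):
#   check_addresses(["127.0.0.1", "192.0.2.10"])
# A result that is True only for 127.0.0.1 matches the symptom described above:
# the service is listening on localhost only.
```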


While searching for a solution, we went through the net and found a useful document. Sharing the same.

For sendmail
============
https://www.novell.com/support/kb/doc.php?id=7003912

After performing the change described there in the configuration file, it was working fine for all the IPs.

For postfix
==========

1. Open /etc/postfix/main.cf file:
# vi /etc/postfix/main.cf
2. Append / modify line as follows to bind to localhost (127.0.0.1) only:
inet_interfaces = 127.0.0.1
3. If you need to bind to 127.0.0.1 and 192.168.2.1, enter:
inet_interfaces = 192.168.2.1,127.0.0.1
Save and close the file. You need to stop and start Postfix when this parameter changes. So type the following to restart Postfix:
# /etc/init.d/postfix restart