Thursday, July 21, 2016

How to create a new FS/VG/LV in SUSE Linux


How to create a new FS/VG/LV in SUSE Linux
===========================================

Pre-requisites:

1. Capture the output of the df -h, fdisk -l, multipath -ll, vgs, pvs and lvs commands.
2. Take a backup of /etc/fstab.


Steps:

1. Once the storage team allocates the LUN, note down the LUN ID provided by the storage team.
   Suppose LUN-ID = AB0004lm0000000008n00876d00005e9e

   Run the command below to detect the new LUN at the server level:

#rescan-scsi-bus.sh

2. Validate that the new LUN has been detected:
    #ls -ltr /dev/mapper/                                       - check the latest (last) entry to cross-check
    #multipath -ll |grep "AB0004lm0000000008n00876d00005e9e"
    #ls -l /dev/disk/by-id/ |grep -i "AB0004lm0000000008n00876d00005e9e"

3. Once you have found the new LUN, note down its logical name (the multipath device name, e.g. /dev/mapper/mpathdi).



4. Create the PV on the new device:

 # pvcreate /dev/mapper/mpathdi

5. Create the volume group:
 #vgcreate  abhidata3vg /dev/mapper/mpathdi

6. Once done, create the LV:
#lvcreate -L 20G -n lvabhidata3 abhidata3vg

7. Create the XFS filesystem on the new LV:
   #mkfs.xfs /dev/mapper/abhidata3vg-lvabhidata3

8. Create the directory on which you want to mount the new FS (for example /oracle/SQ7/sapdata3; here we use /aks) and mount the filesystem:


#mkdir /aks
#mount /dev/abhidata3vg/lvabhidata3 /aks

9. To make the change persistent across reboots, add an entry for this filesystem in /etc/fstab:

  /dev/abhidata3vg/lvabhidata3    /aks xfs     rw,noatime,nodiratime,barrier=0  0 0

  The mount options above are the ones we normally prefer for XFS filesystems.
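
As a quick sanity check (a minimal sketch assuming the same names used above: /dev/mapper/mpathdi, abhidata3vg, lvabhidata3 and the /aks mount point), the fstab entry and the new filesystem can be verified with:

#mount -a                              # re-reads /etc/fstab; any error here points to a bad entry
#df -h /aks                            # confirm the new filesystem is mounted with the expected size
#vgs abhidata3vg ; lvs abhidata3vg     # confirm the VG/LV layout
#grep /aks /etc/fstab                  # double-check the fstab entry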

Friday, July 15, 2016

HACMP Failover Test Scenarios


                          

      CLUSTER FAILOVER TEST SCENARIOS IN AN AIX ENVIRONMENT

This document covers the cluster failover test scenarios in an AIX environment.

In AIX, there are normally three ways of performing failover testing:
1.       Manual failover by moving the resource group
2.       Automatic failover by abruptly halting a node
3.       Failover testing by removing attached hardware (disabling NICs, pulling cables, etc.)




Important points that a system administrator needs to validate before performing any failover test:

1. A data backup should be handy.

2. A cluster snapshot should be taken.

3. A configuration backup should be taken (including the RG attributes and FS details).

4. If a crossmount is configured, verify the exports file and compare the crossmounted filesystems.
    In one case we noticed that the cluster filesystem was mounted as a normal NFS mount, which caused issues during the failover test, because the cluster uses the entries in the file "/usr/es/sbin/cluster/etc/exports" (if it exists) to mount and unmount the FS.

5. During a failover test, if an RG goes into the error state, there are cases where the cluster will not let you execute any cluster commands and you may have to reboot the nodes. So keep the relevant teams informed that a server reboot of both nodes may be required in case of any issues. A quick pre-check sketch is shown below.
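
A minimal pre-check sketch, using the commands referenced in this post plus standard AIX commands, to record the state of both nodes before the test:

# /usr/es/sbin/cluster/utilities/clRGinfo      # note which node currently owns each RG
# lssrc -ls clstrmgrES                         # cluster manager should report a stable state
# lsvg -o                                      # active volume groups before the test
# df -g                                        # filesystem layout before the test
# netstat -i                                   # interfaces and service/boot IPs before the test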



    Manual Failover Testing by Moving the RGs

Steps :
1.  Take a console session on both nodes.
2.  Verify the resource group availability on the nodes before the failover test.
               Command to be used #/usr/es/sbin/cluster/utilities/clRGinfo
# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State          Node          
-----------------------------------------------------------------------------
RES_01         ONLINE               node1      >>>>>  RG (RES_01) currently active on node1
               OFFLINE              node2

RES_02         ONLINE               node2
               OFFLINE              node1

3.  In this case we are going to manually move the resource group (RES_01) from node1 to node2.
4.  From node1, run the command #smitty clstop
                  node1# smitty clstop
                               Stop Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Stop now, on system restart or both                 now                    +
  Stop Cluster Services on these nodes               [node1 ]                +           >>>>>>   select the node
  BROADCAST cluster shutdown?                         true                   +
* Select an Action on Resource Groups            Move Resource Groups      >>>>>  need to select this option for  manual failover


5. The next screen will ask for the resource group to move and the destination node. Select the appropriate resource group and press Enter; this will start the failover.

6. From node2, verify the RG status using the command #/usr/es/sbin/cluster/utilities/clRGinfo
First probable output (failover in progress):


     # clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State                  Node          
-----------------------------------------------------------------------------
RES_01         OFFLINE              node1
               ACQUIRING            node2      >>>>>  failover initiated, node2 is acquiring the resource group

RES_02         ONLINE               node2
               OFFLINE              node1


Second probable output (failover completed):

# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State            Node          
-----------------------------------------------------------------------------
RES_01         OFFLINE              node1
               ONLINE               node2      >>>>>  failover completed successfully, node2 has acquired resource group RES_01

RES_02         ONLINE               node2
               OFFLINE              node1

Note: When stopping the cluster on node1, the first thing executed is the cluster stop script. It brings down the applications and unmounts all application filesystems. If your application stop script is not able to stop all application processes, some filesystems cannot be unmounted and the failover fails. When all resources are down on node1, HACMP starts to bring up all resources on node2. The application start script is the last thing HACMP runs.

7. Verify the status of the cluster using the command #lssrc -ls clstrmgrES. It should be in the "stable" state; if so, everything is fine.
8. Perform a server-level health check to validate that the filesystems and cluster IPs have moved successfully (a sample check is shown below).
9. Inform the APP/DB team to start the APP/DB services, or validate the APP/DB status after the failover.
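
A minimal post-failover check sketch, run on node2 and assuming the same resource group names used above:

node2# /usr/es/sbin/cluster/utilities/clRGinfo   # RES_01 should now show ONLINE on node2
node2# lssrc -ls clstrmgrES                      # cluster manager should be back in the stable state
node2# lsvg -o                                   # shared volume groups should now be active here
node2# df -g                                     # application filesystems should be mounted on node2
node2# netstat -i                                # the service IP should have moved to node2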


  Forcing an automatic failover by halting the active node (typically not recommended, but an option)
HACMP is intelligent enough to differentiate between a deliberate shutdown and an abrupt halt of a node due to a hardware failure. When forcing a failover by bringing down the active node, the shutdown and reboot commands will not trigger a failover.
                                 Only the halt command will force an automatic RG failover from the server end.

1.       Login to node1 and run the command #halt -q as root. This will bring down node1 abruptly and force the RG active on node1 to fail over automatically to node2.
2.       Login to node2 and verify the resource group status using the command below.

# clRGinfo
-----------------------------------------------------------------------------
Group Name     Group State            Node          
-----------------------------------------------------------------------------
RES_01         OFFLINE              node1
               ONLINE               node2      >>>>>  failover completed successfully, node2 has acquired resource group RES_01

RES_02         ONLINE               node2
               OFFLINE              node1

3.       Verify that all the filesystems and IPs are available on node2 after the automatic failover.
4.       Inform the APP/DB team to validate the APP/DB status and start the services (if applicable).














Saturday, April 23, 2016

Introduction to OPENSSH


           Introduction to  openSSH & SSH (Secure Shell)                                                       



What is OpenSSH?

OpenSSH is a free implementation of the SSH 1 and SSH 2 protocols. It was originally developed as part of the OpenBSD (Berkeley Software Distribution) operating system and is now released as a generic solution for UNIX or Linux® and similar operating systems.

What does the OpenSSH package provide?
Basically, OpenSSH provides three kinds of services (simple usage examples are shown below):
  • Remote login to a server (SSH)
  • Secure file transfer (SFTP)
  • Secure copy (SCP)
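
A quick usage sketch of the three services; the user name, host name and paths below are only placeholders:

# ssh user@remote-host                           # interactive remote login
# sftp user@remote-host                          # interactive secure file transfer
# scp /tmp/report.txt user@remote-host:/tmp/     # secure copy of a single file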

Why SSH?


   SSH was designed as a replacement for Telnet and for unsecured remote shell protocols such as the Berkeley rlogin, rsh, and rexec protocols. Those protocols send information, notably passwords, in plaintext, rendering them susceptible to interception and disclosure using packet analysis.

Note:  The encryption used by SSH is intended to provide confidentiality and integrity of data over an unsecured network, such as the Internet.


In UNIX, the SSH server configuration file is sshd_config and the client configuration file is ssh_config; both are normally located under the /etc/ssh directory.



What is SSH?
The Secure Shell (SSH) protocol was developed to get around these limitations; the standard TCP port 22 has been assigned for contacting SSH servers.

1. SSH provides for encryption of the entire communication channel, including the login and password credential exchange.

2. It can be used with public and private keys to provide automatic authentication for logins.

3. You can also use SSH as an underlying transport protocol for other services.



How does the SSH protocol work?

SSH architecture

IETF RFCs 4251 through 4256 define SSH as the "Secure Shell Protocol for remote login and other secure network services over an insecure network." The shell consists of three main elements.

·         Transport Layer Protocol: This protocol accommodates server authentication, privacy, and integrity with perfect forward privacy. This layer can provide optional compression and is run over a TCP/IP connection but can also be used on top of any other dependable data stream.
It sets up encryption, integrity verification, and (optionally) compression and exposes to the upper layer an API for sending and receiving plain text packets.

·         User Authentication Protocol: This protocol authenticates the client to the server and runs over the transport layer. Common authentication methods include password, public key, keyboard-interactive, GSSAPI, SecureID, and PAM.

·         Connection Protocol: This protocol multiplexes the encrypted tunnel into numerous logical channels and runs over the User Authentication Protocol. A single SSH connection can host multiple channels concurrently, each transferring data in both directions.




What are the different SSH protocol versions?
When SSH protocol version 1 was first introduced, many vulnerabilities were reported, and intermediate versions such as 1.3 and 1.5 were released to fix them.

Currently there are two major SSH protocol versions:
1.       SSH protocol version 1
2.       SSH protocol version 2

What is SSH Protocol Version 1 ?
SSH version 1 makes use of several patented encryption algorithms (however, some of these patents have expired) and is vulnerable to a well known security exploit that allows an attacker to insert data into the communication stream.
What is SSH Protocol Version 2?
SSH protocol version 2 is the default protocol used these days.
This is due to some major advancements in version 2 compared to version 1.
The workflow of the SSH login is almost the same as in version 1, but there are some major changes at the protocol level.
Some of these changes include improved encryption standards, public-key certification, much better message authentication codes, periodic replacement of session keys, etc.

Various ciphers are available, including Blowfish, Triple DES, CAST-128, the Advanced Encryption Standard (AES), and ARCFOUR, with key sizes ranging from 512 bits to as high as 32768 bits.

Why is SSH protocol version 1 not encouraged?
                                                                After SSH version 1 had been in use, it was noticed that attackers could insert unauthorized content into an encrypted SSH stream because of the insufficient data-integrity protection provided by the CRC-32 check used in this version of the protocol. The SSH developers later released fixes, but vulnerabilities kept being found because the weakness lies in the design of the protocol itself.



Differences between SSH1 and SSH2 protocols

  • SSH-2: Separate transport, authentication, and connection protocols.
    SSH-1: One monolithic protocol.

  • SSH-2: Strong cryptographic integrity check.
    SSH-1: Weak CRC-32 integrity check; admits an insertion attack in conjunction with some bulk ciphers.

  • SSH-2: Supports password changing.
    SSH-1: N/A.

  • SSH-2: Any number of session channels per connection (including none).
    SSH-1: Exactly one session channel per connection (requires issuing a remote command even when you don't want one).

  • SSH-2: Full negotiation of modular cryptographic and compression algorithms, including bulk encryption, MAC, and public-key.
    SSH-1: Negotiates only the bulk cipher; all others are fixed.

  • SSH-2: Encryption, MAC, and compression are negotiated separately for each direction, with independent keys.
    SSH-1: The same algorithms and keys are used in both directions (although RC4 uses separate keys, since the algorithm's design demands that keys not be reused).

  • SSH-2: Extensible algorithm/protocol naming scheme allows local extensions while preserving interoperability.
    SSH-1: Fixed encoding precludes interoperable additions.

  • SSH-2: User authentication methods: publickey (DSA, RSA*, OpenPGP), hostbased, password (Rhosts dropped due to insecurity).
    SSH-1: Supports a wider variety: public-key (RSA only), RhostsRSA, password, Rhosts (rsh-style), TIS, Kerberos.

  • SSH-2: Use of Diffie-Hellman key agreement removes the need for a server key.
    SSH-1: Server key used for forward secrecy on the session key.

  • SSH-2: Supports public-key certificates.
    SSH-1: N/A.

  • SSH-2: User authentication exchange is more flexible, and allows requiring multiple forms of authentication for access.
    SSH-1: Allows for exactly one form of authentication per session.

  • SSH-2: hostbased authentication is in principle independent of client network address, and so can work with proxying, mobile clients, etc. (though this is not currently implemented).
    SSH-1: RhostsRSA authentication is effectively tied to the client host address, limiting its usefulness.

  • SSH-2: Periodic replacement of session keys.
    SSH-1: N/A.







How to know which SSH protocol version is used for a connection?








[root@saks20161 ~]# telnet 192.168.0.115 22
Trying 192.168.0.115...
Connected to 192.168.0.115 (192.168.0.115).
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3          >>>>>>>  the banner shows the protocol version in use (here 2.0)
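
Another way to see the negotiated protocol (a hedged sketch using the same example host as above; the exact debug wording can vary between OpenSSH versions) is the verbose mode of the SSH client itself:

# ssh -v -o BatchMode=yes root@192.168.0.115 2>&1 | grep -i "remote protocol"    # the debug output shows the remote protocol version and server software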


 SSH security and configuration best practices

SSH security hardening is also required to minimize attacks. OpenSSH provides a lot of flexibility, letting us enable or disable various features through the SSH configuration file.
Below is a list of settings and practices that you can use to tighten and enhance SSH security with regard to remote host access:

      Restrict the root account to console access only:

# vi /etc/ssh/sshd_config
PermitRootLogin no

Create private-public key pairs, using a strong passphrase to protect the private key:

a) Never generate a passphrase-less key pair or rely on passphrase-less key logins.
b) Use a larger key size for stronger security:

ssh-keygen -t rsa -b 4096
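
A short usage sketch for distributing the new key (user and remote-host are placeholders):

# ssh-copy-id user@remote-host           # append the public key to the remote authorized_keys file
# ssh user@remote-host                   # should now ask for the key passphrase instead of the account password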


Restrict SSH access by controlling user access

We can restrict user access through SSH as needed in the SSH configuration file. The four directives below can be used for this.

·         AllowUsers
·         AllowGroups
·         DenyUsers
·         DenyGroups

# vi /etc/ssh/sshd_config
AllowUsers fsmythe bnice swilson



Only use SSH Protocol 2

# vi /etc/ssh/sshd_config
Protocol 2


Don't allow Idle sessions, and configure the Idle Log Out Timeout interval:

# vi /etc/ssh/sshd_config
ClientAliveInterval 600                           # (set to 600 seconds = 10 minutes)

Disable host-based authentication:

# vi /etc/ssh/sshd_config
HostbasedAuthentication no

Disable users' .rhosts files

# vi /etc/ssh/sshd_config
IgnoreRhosts yes


Confine SFTP users to their own home directories by using Chroot SSHD

# vi /etc/ssh/sshd_config
ChrootDirectory /data01/home/%u


Disable empty passwords:

# vi /etc/ssh/sshd_config
PermitEmptyPasswords no

Configure an increase in SSH logging verbosity:

# vi /etc/ssh/sshd_config
LogLevel DEBUG


IMP: After making any of the above changes in the SSH configuration file, you need to stop and start the SSH service. The changes will affect only new connections; existing SSH connections will keep using the earlier configuration.
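
A hedged sketch of validating and restarting the service; the exact restart command depends on the distribution, and it is wise to keep the current session open until a fresh login has been confirmed to work:

# sshd -t                       # syntax-check /etc/ssh/sshd_config before restarting
# service sshd restart          # SysV-style init scripts
# systemctl restart sshd        # systemd-based distributions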

******************************************************************************









         

Thursday, January 28, 2016

NTP Configuration

                        NTP

Network Time Protocol (NTP) is a networking protocol for clock synchronization
between computer systems over packet-switched, variable-latency data networks.

NTP is one of the oldest Internet protocols. NTP was originally designed by David L. Mills of the University of Delaware.

The protocol uses a client-server model. NTP uses UDP port 123 for sending and receiving timestamps (packets).

NTP uses a hierarchical, semi-layered system of time sources. Each level of this hierarchy is termed a "stratum" .


 Note: Suppose your NTP master server is at stratum 3; the clients will then be at stratum 4.


* stratum 16 is used to indicate that a device is unsynchronized.


How to configure NTP (on AIX).

On client
  1. Verify that you have a server suitable for synchronization. Enter:
    # ntpdate -d ip.address.of.server
    
    The offset must be less than 1000 seconds for xntpd to synch. If the offset is greater than 1000 seconds, change the time manually on the client and run the ntpdate -d again.
    If you get the message, "no server suitable for synchronization found", verify xntpd is running on the server (see above) and that no firewalls are blocking port 123.

    2. Specify your xntp server in /etc/ntp.conf (a sample ntp.conf sketch is shown after these steps). Enter:
    # vi /etc/ntp.conf
          (Comment out the "broadcastclient" line and add "server ip.address.of.server prefer".)
           Leave the driftfile and tracefile at their defaults.
  3. Start the xntpd daemon:
    # startsrc -s xntpd
    

  4. Uncomment xntpd from /etc/rc.tcpip so it will start on a reboot.
    # vi /etc/rc.tcpip
    
    Uncomment the following line:
    start /usr/sbin/xntpd "$src_running"
    
    If using the -x flag, add "-x" to the end of the line. You must include the quotes around the -x.

  5. Verify that the client is synched.
    # lssrc -ls xntpd
    
    NOTE: Sys peer should display the IP address or name of your xntp server. This process may take up to 12 minutes.
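
A minimal sketch of what the resulting /etc/ntp.conf might look like on the client (the server address is a placeholder; the driftfile/tracefile paths are assumed to be the AIX defaults):

#broadcastclient                     # commented out, as described in step 2
server 192.168.0.10 prefer           # IP address of your NTP server
driftfile /etc/ntp.drift
tracefile /etc/ntp.trace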



*****************Under Construction*********************************

Saturday, September 19, 2015

Introduction to GPFS Filesystem




  • IBM introduced the GPFS filesystem in 1998.
  • GPFS is a high-performance clustered file system developed by IBM.

  • GPFS provides concurrent high-speed file access to applications executing on multiple nodes of a cluster.

  • It is a high-performance shared-disk file system that can provide fast data access from all nodes in a homogeneous or heterogeneous cluster of servers running AIX, Linux or Windows.

  • All nodes in a GPFS cluster have the same GPFS journaled filesystem mounted, allowing multiple nodes to be active at the same time on the same data.




GPFS Filesystem internals 


A file system (or stripe group) consists of a set of disks that are used to store file metadata as well as data and structures used by GPFS, including quota files and GPFS recovery logs.


                 How does the GPFS filesystem work?

Whenever a disk is added to a GPFS filesystem, a file system descriptor is written to it. The filesystem descriptor is written at a fixed position
on each disk, which helps GPFS identify the disk and its place in the file system.

The filesystem descriptor contains file system specifications and information about the state of the file system.


The GPFS filesystem uses the concepts of inodes, indirect blocks and data blocks to store and access data on the disks.


                 What is metadata?

Inodes and indirect blocks are considered metadata.
The metadata for each file is stored in its inode and contains information such as the file name, file size and last-modification timestamp.

For faster access, the inode of a small file also contains the addresses of all disk blocks that contain the file data.


You can control which disks GPFS uses for storing metadata when creating the file system using the mmcrfs command or
when modifying the file system at a later time by issuing the mmchdisk command.


How to define which disk will be used for storing the metadata ?


As already discussed, the format of the disk descriptor file is:

Diskname:::Diskusage:FailureGroup::StoragePool:

The DiskUsage field decides what kind of data you are going to store on the disk.

Below are the options that can be used.

  • dataAndMetadata   >>  indicates that the disk stores both data and metadata
  • dataOnly          >>  indicates that the disk stores only data
  • metadataOnly      >>  indicates that the disk contains only metadata
  • descOnly          >>  indicates that the disk contains only a file system descriptor



         

We can also use the same options with the mmchdisk command to change the disk usage of an existing disk.


After changing the disk usage parameter with mmchdisk, we need to run the mmrestripefs command with the -r option to re-allocate the data
according to the new disk parameters. This is an online activity, but mmrestripefs is I/O intensive, so it should be executed when the I/O load is
low.

ex. mmchdisk gpfs0 change -d "gpfsnsd:::dataOnly"

After this, confirm that the change has been applied using the command below:
mmlsdisk gpfs0


GPFS and memory


GPFS uses three areas of memory:


  •  memory allocated from the kernel heap, 
  • memory allocated within the daemon segment, and 
  • shared segments accessed from both the daemon and the kernel.


Memory allocated from the kernel heap
GPFS uses kernel memory for control structures such as vnodes and related structures
 that establish the necessary relationship with the operating system

Memory allocated within the daemon segment
GPFS uses daemon segment memory for file system manager functions. Because of that, the file system manager
 node requires more daemon memory since token states for the entire file system are initially stored there.

File system manager functions requiring daemon memory include:

  • Structures that persist for the execution of a command
  • Structures that persist for I/O operations
  • States related to other nodes



Shared segments accessed from both the daemon and the kernel

Shared segments consist of both pinned and unpinned memory that is allocated at daemon startup.
The initial values are the system defaults. However, you can change these values later using the mmchconfig


The pinned memory is called the pagepool and is configured by setting the pagepool cluster configuration parameter.
This pinned area of memory is used for storing file data and for optimizing the performance of various data access patterns


In a non-pinned area of the shared segment, GPFS keeps information about open and recently opened files. This information is held in two forms:
    1.  full inode cache
    2.   stat cache



Pinned  memory


GPFS  uses pinned memory (also called pagepool memory) for storing file data and metadata in support of I/O operations.
With some access patterns, increasing the amount of pagepool memory can increase I/O performance


Increased pagepool memory can be useful in the following cases:
There are frequent writes that can be overlapped with application execution.
There is frequent reuse of file data that can fit in the pagepool.
The I/O pattern contains various sequential reads large enough that the prefetching data improves performance.


Pinned memory regions cannot be swapped out to disk, which means that GPFS will always consume at least the value of pagepool in system memory.


Non-pinned memory
There are two levels of cache used to store file metadata:

Inode cache
The inode cache contains copies of inodes for open files and for some recently used files that are no longer open.
The maxFilesToCache parameter controls the number of inodes cached by GPFS.

Every open file on a node consumes a space in the inode cache.
Additional space in the inode cache is used to store the inodes for recently used files in case another application needs that data.

The number of open files can exceed the value defined by the maxFilesToCache parameter so that applications can keep operating. However,
 when the maxFilesToCache number is exceeded, recently opened files are no longer cached, and only inode data for currently open files is kept in the cache.


Stat cache
The stat cache contains enough information to respond to inquiries about the file and open it, but not enough information to read from it or write to it.

A stat cache entry consumes significantly less memory than a full inode. The default size of the stat cache is four times the maxFilesToCache parameter.

This value may be changed through the maxStatCache parameter on the mmchconfig command.
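
As a hedged tuning sketch (the values are only examples, not recommendations, and some changes take effect only after the GPFS daemon is recycled), the pagepool and both caches are changed with mmchconfig and checked with mmlsconfig:

# mmchconfig pagepool=2G                                 # enlarge the pinned pagepool (example value)
# mmchconfig maxFilesToCache=5000,maxStatCache=20000     # enlarge the inode and stat caches (example values)
# mmlsconfig                                             # verify the currently configured values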



Monday, September 14, 2015

Adding the space or disks in GPFS Filesystem

          Steps to add the disks to the filesystem

Step 1: Before adding disks to GPFS, capture the details of the existing GPFS disks.


       # mmlsnsd
         File system   Disk name    NSD servers
         --------------------------------------------------------------------------
          gpfs0         nsd08        (directly attached)

          gpfs0         nsd09        (directly attached)


      #mmlsnsd -m  >> this shows the mapping between each NSD and its underlying device.
 

Step 2 : Before adding the disk to the GPFS filesystem, we need to create the
         NSDs (GPFS disks) using the mmcrnsd command.

         To create an NSD we need to create a disk descriptor file. The format of the file is as follows;
         it is not necessary to define all fields.
     

         disk-Name:Primaryserver:backupserver:diskusage:failuregroup:desiredname:storagepool
       
  I am going to add hdisk1, hdisk2, hdisk3, hdisk4, hdisk5 and hdisk6 to the filesystem gpfs0.


    Create the file /tmp/abhi/gpfs-disks.txt .

hdisk1:::dataAndMetadata::nsd01::
hdisk2:::dataAndMetadata::nsd02::
hdisk3:::dataAndMetadata::nsd03::
hdisk4:::dataAndMetadata::nsd04::
hdisk5:::dataAndMetadata::nsd05::
hdisk6:::dataAndMetadata::nsd06::



#mmcrnsd -F /tmp/abhi/gpfs-disks.txt

mmcrnsd: Processing disk hdisk1
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk hdisk4
mmcrnsd: Processing disk hdisk5
mmcrnsd: Processing disk hdisk6
mmcrnsd: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

Once the command is successful, we can see the NSD names corresponding to the disks in the lspv output.



# lspv
hdisk0          00c334b6af00e77b                    rootvg          active
hdisk1          none                                nsd01
hdisk2          none                                nsd02
hdisk3          none                                nsd03
hdisk4          none                                nsd04
hdisk5          none                                nsd05
hdisk6          none                                nsd06
hdisk8          none                                nsd08
hdisk9          none                                nsd09


Also  we need to verify using the mmlsnsd command .

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)

 gpfs0         nsd09        (directly attached)

(free disk)   nsd01        (directly attached)

(free disk)   nsd02        (directly attached)

(free disk)   nsd03        (directly attached)

(free disk)   nsd04        (directly attached)

(free disk)   nsd05        (directly attached)

(free disk)   nsd06        (directly attached)


Step 3: After this we need to add the disks to the filesystem.

Before adding the disks to the GPFS filesystem, we need to create another disk descriptor file.
Since some of the parameters were already defined while creating the NSDs, there is no need to define them again here.
                  The fields disk name, disk usage, failure group and storage pool should be defined.

By default a GPFS cluster has one storage pool, "system", but we can define more storage pools as per our requirements.

diskname:::diskusage:failuregroup::storagepool:

    cat /tmp/abhi/gpfs-disk.txt
nsd01:::dataAndMetadata:-1::system
nsd02:::dataAndMetadata:-1::system
nsd03:::dataAndMetadata:-1::system

#mmadddisk gpfs0 -F /tmp/abhi/gpfs-disk.txt -r  >>> the -r option is used here to re-balance the existing data across all the disks

Note: Rebalancing data is an I/O-intensive job; it is not advisable to use this option during peak load.

Once added, verify the filesystem size using df -gt and also the output of #mmlsnsd (a further verification sketch is shown after the output below).

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)

 gpfs0         nsd09        (directly attached)

 gpfs0         nsd01        (directly attached)

 gpfs0         nsd02        (directly attached)

 gpfs0         nsd03        (directly attached)

 gpfs0         nsd04        (directly attached)

 gpfs0         nsd05        (directly attached)

 gpfs0         nsd06        (directly attached)
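
A hedged verification sketch once the disks have been added (the /gpfs0 mount point is an assumption; use your filesystem's actual mount point):

# mmlsdisk gpfs0            # the new NSDs should be listed as part of gpfs0 and be available
# mmdf gpfs0                # per-disk and total free space in the filesystem
# df -g /gpfs0              # filesystem size as seen by the operating system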





Wednesday, July 29, 2015

CPU MONITORING, PERFORMANCE & TUNING


The central processing unit (CPU) of a computer is a piece of hardware that carries out the instructions of a computer program. It performs the basic arithmetical, logical, and input/output operations of a computer system. The CPU is like the brain of the computer - every instruction, no matter how simple, has to go through the CPU.

A typical CPU has a number of components.

1. ALU - the arithmetic logic unit, which performs simple arithmetic and logical operations.

2. CU - the control unit, which manages the various components of the computer. It reads and interprets instructions from memory and transforms them into a series of signals to activate other parts of the computer. The control unit calls upon the arithmetic logic unit to perform the necessary calculations.

3. Cache - CPU caching keeps recently (or frequently) requested data in a place where it is easily accessible. This avoids the delay associated with reading data from RAM.



What is CPU Processor Clock Speed ?

A processor's clock speed measures one thing -- how many times per second the processor has the opportunity to do something.

Ex. A 2.3 GHz processor's clock ticks 2.3 billion times per second, while a 2.6 GHz processor's clock ticks 2.6 billion times per second. All things being equal, the 2.6 GHz chip should be approximately 13 percent faster.



                    What is CPU Caching ?
CPU caching keeps recently (or frequently) requested data in a place where it is easily accessible. This avoids the delay associated with reading data from RAM.
                 
  • A CPU cache places a small amount of memory directly on the CPU. This memory is much faster than the system RAM because it operates at the CPU's speed rather than the system bus speed. The idea behind the cache is that chip makers assume that if data has been requested once, there's a good chance it will be requested again. Placing the data on the cache makes it accessible faster.


 WHY IS CACHE REQUIRED FOR BETTER PERFORMANCE?
 The CPU accesses data from memory, and it is connected to memory through the system bus. The clock speed of the CPU is much higher than the speed of the system bus. To complete any request, the CPU has to fetch data from memory, which can only be reached via the system bus, and this is where the bus speed becomes a bottleneck: the request-processing power of the CPU is impacted.
                                                    To overcome this latency, the concept of CPU caching was introduced. The cache sits on the processor chip, stores recently or frequently requested data, and is many times faster than accessing data from memory. Since the required data is already available in the cache, the CPU does not have to wait for memory, and the request-processing speed increases.

Typically there are now 3 layers of cache on modern CPU cores:

    L1 cache is very small and very tightly bound to the actual processing units of the CPU; it can typically fulfil data requests within 3 CPU clock ticks. L1 cache tends to be around 4-32KB depending on the CPU architecture and is split between instruction and data caches.

    L2 cache is generally larger but a bit slower and is generally tied to a CPU core. Recent processors tend to have 512KB of cache per core; this cache makes no distinction between instruction and data caches - it is a unified cache.

    L3 cache tends to be shared by all the cores present on the CPU and is much larger and slower again, but it is still a lot faster than going to main memory.


Note: CPU performance also depends largely on the size of the L1, L2 & L3 caches.



 Performance metrics in terms of CPU performance

latency
    The time that one system component spends waiting for another component in order to complete the entire task. Latency can be defined as wasted time. In networking discussions, latency is defined as the travel time of a packet from source to destination.

response time
    The time between the submission of a request and the completion of the response.

    response time = service time + wait time

service time
    The time between the initiation and completion of the response to a request.

throughput
    The number of requests processed per unit of time.

wait time
    The time between the submission of the request and the initiation of the response.



Response Time

Because response time equals service time plus wait time, you can increase performance in this area by:

    Reducing wait time

    Reducing service time

Understanding different aspects of CPU service time


  Suppose an LPAR has 2 physical CPUs allocated to it (SMT disabled, single-threaded mode). Each CPU will then process one request at a time, so there is no wait time and the service time is minimal, which in turn improves the application response time.

  In another case, suppose the LPAR is assigned 0.4 processing units and 2 virtual CPUs, with SMT-2 enabled, which means 2 threads per virtual CPU. Each virtual CPU is entitled to 20 ms per time cycle per core. If requests from both threads of virtual CPU 1 are queued in the run queue at the same time, the thread with the higher priority is dispatched for execution first; the CPU dispatcher and scheduler decide when to give the time slice to the other thread, as per the scheduling algorithms. If the primary physical CPU is not able to provide a time slice to the thread, a context switch happens and the request is executed by another physical CPU from the same pool. Either way, the service time increases, and this in turn increases the application response time.
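
A hedged sketch of the standard AIX commands used to check an LPAR's CPU configuration and utilization when investigating this kind of behaviour:

# lparstat -i        # entitled capacity, number of virtual CPUs, SMT mode, shared/dedicated processor mode
# smtctl             # show (or change) the SMT mode of the LPAR
# sar -P ALL 5 3     # per-logical-CPU utilization, three 5-second samples
# vmstat 5 3         # run-queue length and user/system/idle/wait CPU breakdown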