Saturday, September 19, 2015

Introduction to GPFS Filesystem




  • IBM introduced the GPFS filesystem in 1998.
  • GPFS is a high-performance clustered file system developed by IBM.

  • GPFS provides concurrent, high-speed file access to applications executing on multiple nodes of a cluster.

  • It is a high-performance shared-disk file system that can provide fast data access from all nodes in a homogeneous or heterogeneous cluster of servers running the AIX, Linux, or Windows operating systems.

  • All nodes in a GPFS cluster have the same GPFS journaled filesystem mounted, allowing multiple nodes to be active at the same time on the same data (a quick way to check this is shown below).
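
For instance, here is a hedged sketch of how to list the cluster members and check which nodes currently have a filesystem mounted (the filesystem name gpfs0 is only an example):

# mmlscluster                  >>        lists the nodes that are members of the GPFS cluster
# mmlsmount gpfs0 -L           >>        lists the nodes on which the filesystem gpfs0 is currently mounted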




GPFS Filesystem internals 


A file system (or stripe group) consists of a set of disks that are used to store file data, file metadata, and structures used by GPFS itself, including quota files and GPFS recovery logs.


                 How does the GPFS filesystem work?

Whenever a disk is added to a GPFS filesystem, a file system descriptor is written on it. The file system descriptor is written at a fixed position
on each disk, which helps GPFS identify the disk and its place in the file system.

The filesystem descriptor contains file system specifications and information about the state of the file system.


The GPFS filesystem uses the concepts of inodes, indirect blocks, and data blocks to store and access data on the disks.


                 What is metadata?

Inodes and indirect blocks are considered metadata.
The metadata for each file is stored in its inode and contains information such as the file size, ownership, permissions, and last-modification timestamp.

For faster access, the inode of a small file also contains the addresses of all the disk blocks that hold the file's data; larger files use indirect blocks to reference their data blocks.


You can control which disks GPFS uses for storing metadata when creating the file system using the mmcrfs command or
when modifying the file system at a later time by issuing the mmchdisk command.
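
As a hedged sketch (the device name, mount point, block size, and descriptor file path are only examples, and the exact mmcrfs syntax varies slightly between GPFS versions), metadata placement can be set at creation time by listing each disk with its DiskUsage value in a descriptor file and passing that file to mmcrfs:

# mmcrfs gpfs0 -F /tmp/disk-desc.txt -A yes -B 256K -T /gpfs0
        >>  gpfs0 is the device name, -A yes enables automatic mount, -B sets the block size and -T the mount point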


How to define which disk will be used for storing the metadata ?


As already discussed, the format of the disk descriptor file is:

Diskname:::Diskusage:FailureGroup::StoragePool:

The DiskUsage field decides what kind of data will be stored on the disk.

Below are the options that can be used.

  • dataAndMetadata     >>        indicates that the disk stores both data and metadata
  • dataOnly                   >>        indicates that the disk stores only data
  • metadataOnly            >>       indicates that the disk stores only metadata
  • descOnly                   >>        indicates that the disk holds only a copy of the file system descriptor



         

We can also use the same options with the mmchdisk command to change the disk usage of an existing disk.


But after changing the disk usage parameter with the mmchdisk command, we need to run the mmrestripefs command with the -r option to re-allocate the data
as per the new disk parameter. This is an online activity, but running the mmrestripefs command is I/O intensive, so it should be executed when the I/O load is
low.

ex. mmchdisk gpfs0 change -d "gpfsnsd:::dataOnly"

After this, confirm whether the change has been applied successfully using the command below:
mmlsdisk gpfs0
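
A hedged sketch of the follow-up restripe after such a change (to be run when the I/O load is low):

# mmrestripefs gpfs0 -r       >>  re-allocates the data so that it conforms to the new disk usage settings
# mmlsdisk gpfs0              >>  confirm the disk usage and availability after the restripe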


GPFS and memory


GPFS uses three areas of memory:


  •  memory allocated from the kernel heap, 
  • memory allocated within the daemon segment, and 
  • shared segments accessed from both the daemon and the kernel.


Memory allocated from the kernel heap
GPFS uses kernel memory for control structures such as vnodes and related structures that establish the necessary relationship with the operating system.

Memory allocated within the daemon segment
GPFS uses daemon segment memory for file system manager functions. Because of that, the file system manager node requires more daemon memory, since token states for the entire file system are initially stored there.

File system manager functions requiring daemon memory include:

  • Structures that persist for the execution of a command
  • Structures that persist for I/O operations
  • States related to other nodes



Shared segments accessed from both the daemon and the kernel

Shared segments consist of both pinned and unpinned memory that is allocated at daemon startup.
The initial values are the system defaults. However, you can change these values later using the mmchconfig command.


The pinned memory is called the pagepool and is configured by setting the pagepool cluster configuration parameter.
This pinned area of memory is used for storing file data and for optimizing the performance of various data access patterns


In a non-pinned area of the shared segment, GPFS keeps information about open and recently opened files. This information is held in two forms:
    1.  The full inode cache
    2.  The stat cache



Pinned  memory


GPFS  uses pinned memory (also called pagepool memory) for storing file data and metadata in support of I/O operations.
With some access patterns, increasing the amount of pagepool memory can increase I/O performance


Increased pagepool memory can be useful in the following cases:
  • There are frequent writes that can be overlapped with application execution.
  • There is frequent reuse of file data that can fit in the pagepool.
  • The I/O pattern contains sequential reads large enough that prefetching the data improves performance.


Pinned memory regions cannot be swapped out to disk, which means that GPFS will always consume at least the value of pagepool in system memory.


Non-pinned memory
There are two levels of cache used to store file metadata:

Inode cache
The inode cache contains copies of inodes for open files and for some recently used files that are no longer open.
The maxFilesToCache parameter controls the number of inodes cached by GPFS.

Every open file on a node consumes space in the inode cache.
Additional space in the inode cache is used to store the inodes for recently used files in case another application needs that data.

The number of open files can exceed the value defined by the maxFilesToCache parameter, so that applications can continue to operate. However,
when the maxFilesToCache number is exceeded, recently opened files are no longer cached, and only the inode data of currently open files is kept in the cache.


Stat cache
The stat cache contains enough information to respond to inquiries about the file and open it, but not enough information to read from it or write to it.

A stat cache entry consumes significantly less memory than a full inode. The default size of the stat cache is four times the maxFilesToCache parameter.

This value may be changed through the maxStatCache parameter on the mmchconfig command.
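
As a hedged sketch (the values are only illustrative, and depending on the GPFS level a pagepool change may require the GPFS daemon to be recycled before it takes effect), these parameters are inspected and changed with mmlsconfig and mmchconfig:

# mmlsconfig pagepool                                      >>  show the current pagepool value
# mmchconfig pagepool=2G                                   >>  change the pagepool size
# mmchconfig maxFilesToCache=4000,maxStatCache=16000       >>  change the inode cache and stat cache sizes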



Monday, September 14, 2015

Adding the space or disks in GPFS Filesystem

          Steps to add the disks to the filesystem

step 1 : Before adding disks to GPFS, take the details of the existing GPFS disks using the commands below.


       # mmlsnsd
         File system   Disk name    NSD servers
         --------------------------------------------------------------------------
          gpfs0         nsd08        (directly attached)

          gpfs0         nsd09        (directly attached)


      #mmlsnsd -m  >> this gives the details of the corresponding device and NSD volume ID for each disk.
 

Step 2 : Before adding the disk to the GPFS filesystem, we need to create the
         GPFS disk (NSD) using the command mmcrnsd.

         For creating an NSD we need to create a disk descriptor file. The format of the file is as follows;
         it is not necessary to define all the fields.
     

         disk-Name:Primaryserver:backupserver:diskusage:failuregroup:desiredname:storagepool
       
  I am going to add hdisk1, hdisk2, hdisk3, hdisk4, hdisk5 and hdisk6 to the filesystem gpfs0.


    Create the file /tmp/abhi/gpfs-disks.txt .

hdisk1:::dataAndMetadata::nsd01::
hdisk2:::dataAndMetadata::nsd02::
hdisk3:::dataAndMetadata::nsd03::
hdisk4:::dataAndMetadata::nsd04::
hdisk5:::dataAndMetadata::nsd05::
hdisk6:::dataAndMetadata::nsd06::



#mmcrnsd -F /tmp/abhi/gpfs-disks.txt

mmcrnsd: Processing disk hdisk1
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk hdisk4
mmcrnsd: Processing disk hdisk5
mmcrnsd: Processing disk hdisk6
mmcrnsd: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

Once the command is successful, we can see the NSD names corresponding to the disks in the lspv output.



# lspv
hdisk0          00c334b6af00e77b                    rootvg          active
hdisk1          none                                nsd01
hdisk2          none                                nsd02
hdisk3          none                                nsd03
hdisk4          none                                nsd04
hdisk5          none                                nsd05
hdisk6          none                                nsd06
hdisk8          none                                nsd08
hdisk9          none                                nsd09


Also, we need to verify using the mmlsnsd command.

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)

 gpfs0         nsd09        (directly attached)

(free disk)   nsd01        (directly attached)

(free disk)   nsd02        (directly attached)

(free disk)   nsd03        (directly attached)

(free disk)   nsd04        (directly attached)

(free disk)   nsd05        (directly attached)

(free disk)   nsd06        (directly attached)


step 3 : After this we need to add the disks to the filesystem.

Before adding the disks to the GPFS filesystem, we need to create a disk descriptor file.
Since we already defined some of the parameters while creating the NSDs, there is no need to define them again here.
                  Only the fields "diskname", diskusage, failure group and storagepool need to be defined.

By default a GPFS cluster will have one storage pool, "system", but we can define more storage pools as per our requirements.

diskname:::diskusage:failuregroup::storagepool:

    cat /tmp/abhi/gpfs-disk.txt
nsd01:::dataAndMetadata:-1::system
nsd02:::dataAndMetadata:-1::system
nsd03:::dataAndMetadata:-1::system
nsd04:::dataAndMetadata:-1::system
nsd05:::dataAndMetadata:-1::system
nsd06:::dataAndMetadata:-1::system

#mmadddisk gpfs0 -F /tmp/abhi/gpfs-disk.txt -r  >>> the -r option is used here to re-balance the data across all the disks, including the new ones

Note: Rebalancing of data is an I/O-intensive job. It is not preferred to use this option during peak load.

Once added, verify the new filesystem size using df -gt and also the output of #mmlsnsd.

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)

 gpfs0         nsd09        (directly attached)

 gpfs0         nsd01        (directly attached)

 gpfs0         nsd02        (directly attached)

 gpfs0         nsd03        (directly attached)

 gpfs0         nsd04        (directly attached)

 gpfs0         nsd05        (directly attached)

 gpfs0         nsd06        (directly attached)
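
In addition to df, a hedged way to check the capacity from the GPFS side is the mmdf command, which reports free and used space per NSD and per storage pool:

# mmdf gpfs0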





Wednesday, July 29, 2015

CPU ---MONITORING & PERFORMANCE & TUNING


The central processing unit (CPU) of a computer is a piece of hardware that carries out the instructions of a computer program. It performs the basic arithmetical, logical, and input/output operations of a computer system. The CPU is like the brain of the computer: every instruction, no matter how simple, has to go through the CPU.

A typical CPU has a number of components.
1. ALU - the arithmetic logic unit, which performs simple arithmetic and logical operations

    2. CU - the control unit, which manages the various components of the computer. It reads and interprets instructions from memory and transforms them into a series of signals to activate other parts of the computer. The control unit calls upon the arithmetic logic unit to perform the necessary calculations.

    3. Cache - keeps recently (or frequently) requested data in a place where it is easily accessible, avoiding the delay associated with reading it from RAM (explained further below).



What is CPU Processor Clock Speed ?

A processor's clock speed measures one thing -- how many times per second the processor has the opportunity to do something.

Ex. A 2.3 GHz processor's clock ticks 2.3 billion times per second, while a 2.6 GHz processor's clock ticks 2.6 billion times per second. All things being equal, the 2.6 GHz chip should be approximately 13 percent faster.



                    What is CPU Caching ?
CPU caching keeps recently (or frequently) requested data in a place where it is easily accessible. This avoids the delay associated with reading data from RAM.
                 
  • A CPU cache places a small amount of memory directly on the CPU. This memory is much faster than the system RAM because it operates at the CPU's speed rather than the system bus speed. The idea behind the cache is that chip makers assume that if data has been requested once, there's a good chance it will be requested again. Placing the data on the cache makes it accessible faster.


 WHY IS CACHE REQUIRED FOR BETTER PERFORMANCE?
 The CPU accesses data from memory, and it is connected to memory through the system bus. The clock speed of the CPU is much higher than the speed of the system bus, so to complete any request the CPU has to wait for data to be fetched over the slower bus; as a result, the request-processing power of the CPU is impacted.
                                                    To overcome this latency, the concept of CPU caching was introduced. The cache sits on the processor chip, stores recently or frequently requested data, and is many times faster than fetching the data from memory. Since the required data is often already available in the cache, the CPU does not have to wait for memory, and the request-processing speed increases.

Typically there are now 3 layers of cache on modern CPU cores:

    L1 cache is very small and very tightly bound to the actual processing units of the CPU; it can typically fulfil data requests within 3 CPU clock ticks. L1 cache tends to be around 4-32 KB depending on CPU architecture and is split between instruction and data caches.

    L2 cache is generally larger but a bit slower and is generally tied to a CPU core. Recent processors tend to have 512 KB of cache per core, and this cache makes no distinction between instructions and data; it is a unified cache.

    L3 cache tends to be shared by all the cores present on the CPU and is much larger and slower again, but it is still a lot faster than going to main memory.


Note : CPU performance also depends largely on the sizes of the L1, L2 and L3 caches.



 Performance metrics in terms of CPU performance

latency

The time that one system component spends waiting for another component in order to complete the entire task. Latency can be defined as wasted time. In networking discussions, latency is defined as the travel time of a packet from source to destination.

response time

The time between the submission of a request and the completion of the response.

response time = service time + wait time

service time

The time between the initiation and completion of the response to a request.

throughput

The number of requests processed per unit of time.

wait time

The time between the submission of the request and the initiation of the response.



Response Time

Because response time equals service time plus wait time, you can increase performance in this area by:

    Reducing wait time

    Reducing service time
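
A small worked example (the numbers are only illustrative): if a request spends 5 ms waiting in the run queue and then needs 15 ms of service time, its response time is 5 ms + 15 ms = 20 ms, so reducing either component reduces the response time.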

Understanding different aspects of CPU service time


  Suppose an LPAR has 2 dedicated physical CPUs allocated to it (no SMT enabled / single-threaded mode). Each CPU will then be processing one request at a time, so there is effectively no wait time and the least possible service time, which in turn improves the application response time.

                                                                   In the other case, suppose the LPAR is assigned 0.4 CPU of entitlement and 2 virtual CPUs, and SMT-2 is enabled; that means there are 2 threads per virtual CPU. Each virtual CPU is entitled to 20 ms per dispatch cycle per core. If requests from both threads of virtual CPU 1 are queued in the run queue at the same time, the thread with the higher priority will be dispatched for execution first, and the CPU dispatcher and scheduler will decide when to give a time slice to the other thread as per the scheduling algorithms. If the primary physical CPU is not able to provide the time slice to the thread, a context switch happens and the request is executed by another physical CPU of the same pool. That means the service time will increase either way, and this in turn increases the application response time.
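
A hedged sketch of the standard AIX commands usually used to watch these CPU metrics on an LPAR (the intervals and counts are only examples):

# vmstat 2 5        >>  the r column shows the run queue; us/sy/id/wa show user, system, idle and I/O-wait CPU time
# lparstat 2 5      >>  shows the physical CPU consumed (physc) and the entitlement used (%entc) by the LPAR
# mpstat -s 2 5     >>  shows how the load is spread across the SMT threads of each virtual CPU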




Thursday, April 09, 2015

Cluster issue(netmon.cf )--solved

                           HACMP CLUSTER ISSUE.

The HACMP cluster nodes each have only one Ethernet adapter, which is virtual.


           IP configuration on both nodes

node-1
 en0:
 boot-ip        : 192.168.3.7
 persistent-ip  : 10.1.1.16

node-2
 en0:
 boot-ip        : 192.168.3.8
 persistent-ip  : 10.1.1.18

Problem statement


After performing the re-configuration, when we started the cluster, one node was getting rebooted automatically.


              Understanding the exact issue.

1. We verified the cluster logs (the hacmp.out and cluster.log files) and found the error messages below on both the nodes.

       hacmp.out log entry on "node-1"

dec  5 23:07:52 node-1 user:notice HACMP for AIX: EVENT START: fail_interface node-1 192.168.3.7    >>>>>>>>>>>>>>>>> this indicates that there is some issue with boot-ip interface
dec  5 23:07:52 node-1 user:notice HACMP for AIX: EVENT COMPLETED: fail_interface node-1 192.168.3.7 0
dec  5 23:07:59 node-1 local0:crit clstrmgrES[7143544]: Sun dec  5 23:07:59 announcementCb: Called, state=ST_STABLE, provider token 1
dec  5 23:07:59 node-1 local0:crit clstrmgrES[7143544]: Sun dec  5 23:07:59 announcementCb: GsToken 2, AdapterToken -1, rm_GsToken 1
dec  5 23:07:59 node-1 local0:crit clstrmgrES[7143544]: Sun dec  5 23:07:59 announcementCb: GRPSVCS announcment code=512; exiting
dec  5 23:07:59 node-1 local0:crit clstrmgrES[7143544]: Sun dec  5 23:07:59  CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs) >>> this indicates that there can be a heartbeat issue.
dec  5 23:07:59 node-1 daemon:err|error haemd[13041862]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395,                                     haemd: 2521-032 Cannot dispatch group services (1). >>> This again indicates that there is some issue with the boot IPs.
dec  5 23:07:59 node-1 user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
dec  5 19:08:00 node-1 user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!


                      Cluster.log error entry

cllsstbys: No communication interfaces found.


Similar error messages were received on the other node as well.


Now the question arises: why are both the nodes complaining that the boot-ip interfaces are down?


1. So we re-validated the cluster configuration, tried a ping test and also performed a heartbeat functionality test from both the nodes to figure out the exact issue.
                           The cluster configuration was fine, the offline synchronization was happening successfully, the heartbeat links were operational and the ping test was successful.

 2. Then we started verifying the cluster-related configuration files, and at last we were successful in finding the root cause.


                 Root Cause


 After going through the netmon.cf file, which is normally used in virtualized cluster environments, we found the following entries:

        Node-1

!REQD !ALL 192.168.2.8

  The node-1 Ethernet adapter will be considered up only if it is able to ping 192.168.2.8.

  Here was the issue: the entry in the netmon.cf file was incorrect, i.e. the wrong IP was mentioned in the file. Because of this entry, node-1 was trying to reach the IP address 192.168.2.8; since this IP does not exist and is unreachable, the cluster marked the interface as down.
         Node-2

!REQD !ALL 192.168.2.7

 The node-2 Ethernet adapter will be considered up only if it is able to ping 192.168.2.7.


     Here was the same issue with the entry in the netmon.cf file: since the IP 192.168.2.7 was also not reachable, the cluster was marking this interface as down in the cluster log as well.


This led to a condition where each node thought that the other node was not reachable and tried to grab the RGs; to maintain data integrity, it then rebooted the other node.



                   Solution provided


We modified the entries in the netmon.cf file on both the nodes as follows:

 Node-1

!REQD !ALL 192.168.3.8


Node-2

!REQD !ALL 192.168.3.7
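
Before synchronizing, a hedged sanity check is to confirm from each node that its netmon.cf target actually answers (addresses as per the configuration above):

# ping -c 3 192.168.3.8      >> run from node-1; must succeed
# ping -c 3 192.168.3.7      >> run from node-2; must succeed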

After that we synchronized the cluster and started it, and the issue was resolved.


Note : Removing the netmon.cf file altogether would also have resolved the issue.

Friday, March 13, 2015

0505-121:alt_disk_install error & 0516-082 -----solved

0516-082:/usr/sbin/lchangelv: Unable to access a special device file.

0505-121:alt_disk_install error


0516-082:/usr/sbin/lchangelv: Unable to access a special device file.
Execute redefinevg and synclvodm to build correct environment.


              Understanding the exact issue first

 The lchangelv command is a low-level command used to change LV parameters. As per the error, lchangelv is not able to access the special device files, which means the issue seems to be with the device files and possibly with the ODM as well.

                  Below are the steps performed  


     1. For understanding, I ran alt_clone again and tried to figure out the major numbers of the device files related to alt_clone, and got the details below:


                                             major , minor number
brw-rw----    1 root     system       40,  9   Mar 14 06:32 alt_hd10opt
brw-rw----    1 root     system       40, 14   Mar 14 06:32 alt_fslv01







                                                                                                             
 2. Verified the existing device files with major number 40 to figure out whether those device files were actually in use (see the sketch below).
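
A hedged sketch of how such a check can be done (the major number 40 comes from the listing above):

# ls -l /dev | grep "^b" | grep " 40,"      >> list the block special files whose major number is 40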

3. Copied all the device files with major number 40 to a different folder as a backup.

4. Ran alt_clone again, but got the same error. However, I noticed that the minor numbers were again starting with 20, which means there are also some ODM entries affecting this.

5. Since every device file will have an entry in CuDvDr, I started searching for the major number 40 in it:
odmget -q value1=40 CuDvDr
6. Once again cross-verified the values and, after confirmation, removed them:
odmdelete -q value1=40 -o CuDvDr

7. Now my CuDvDr class is clean. But we need to verify all the other object classes:

#odmget CuAt | grep alt
#odmget CuDv | grep alt
#odmget CuDvDr | grep alt
#odmget CuDep | grep alt

#odmget -q parent=altinst_rootvg -o CuDv








8. Found that entries were present only in CuDv and CuDep. Hence removed those entries using the commands below.


odmdelete -q parent=altinst_rootvg -o CuDv
odmdelete -q name=altinst_rootvg -o CuDep



9. Started alt_clone again and it completed successfully.

Wednesday, January 21, 2015

LINUX Disk Storage Management & LVM Storage Management

       LINUX Disk Storage Management & LVM Storage Management




                     Disk Storage  Management
                    ------------------------------------------

Physical disks are represented in LINUX as /dev/sda (SCSI disks) and /dev/hda (IDE disks).

Suppose there are three SCSI disks connected to the server. They will appear on the server as:

1st  Disk --/dev/sda
2nd Disk --/dev/sdb
3rd Disk --/dev/sdc



A valid block device could be one of two types of entries:

    A mapped device — A logical volume in a volume group, for example, /dev/mapper/VolGroup00-LogVol02.

    A static device — A traditional storage volume, for example, /dev/hdbX or /dev/sdaX, where hdb and sda are storage device names and X is the partition number.

What is a Partition?

The physical disk can be divided into one or more logical disks. These logical disks are known as partitions.


The idea is that if you have one hard disk, and want to have, say, two operating systems on it, you can divide the disk into two partitions. Each operating system uses its partition as it wishes and doesn't touch the other ones. This way the two operating systems can co-exist peacefully on the same hard disk. Without partitions one would have to buy a hard disk for each operating system.


*On an IDE drive you can have up to 63 partitions: 3 primary and 60 logical (contained in one extended partition).

*On a SCSI drive the maximum number of partitions is 15.

Ex. -- Suppose you want to create 4 partitions on the new disk /dev/sdb assigned to the server.

After partitioning, the newly created partitions will appear as
 /dev/sdb1, /dev/sdb2, /dev/sdb3 and /dev/sdb4



What is Extended Partition ?


An extended partition is the only kind of partition that can have multiple partitions inside. Think of it like a box that contains other boxes, the logical partitions.

 The extended partition can't store anything itself; it's just a holder for logical partitions.
The extended partition is a way to get around the fact that you can only have four primary partitions on a drive. You can put lots of logical partitions inside it.
 
 What is Logical Partition?

Logical partitions are partitions that are created by dividing up the extended partition.


                       MBR(Master Boot Record)

The MBR is a small program that is executed when a computer begins to boot up (i.e., start up) in order to find the operating system and load parts of it into memory.

The first sector of the disk is the master boot record (MBR).


 The master boot record contains a small program that:

1.  Reads the partition table and checks which partition is active (that is, marked bootable).
2.  Reads the first sector of that partition, the partition's boot sector (the MBR is also a boot sector, but it has a special status and therefore a special name). This boot sector contains another small program that reads the first part of the operating system stored on that partition (assuming it is bootable), and then starts it.


Understanding the Partitioning Concept


There are two ways of partitioning the disk :

 1. Standard Partitions using parted
 2. LVM Partition Management


 Standard Partitions using parted

The parted utility is used in Linux for partitioning disks; in particular it is needed for disks larger than 2 TB, since they require a GPT partition table.

By default, the parted package is included when installing Red Hat Enterprise Linux.

 Using the parted utility , we can perform below tasks.

    a)  View the existing partition table

    b)  Change the size of existing partitions

    c)  Add partitions from free space or additional hard drives 



                          Viewing the Partition Table
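
A hedged example of viewing the partition table of a disk (the device name /dev/sdb is only an example):

# parted /dev/sdb print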






            Creating a Partition

 For creating a partition on a new disk, first we need to label the disk.

From the partition table, determine the start and end points of the new partition and what partition type it should be.
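
A hedged sketch, assuming a brand-new disk /dev/sdb with an msdos label (the sizes are only examples):

# parted /dev/sdb mklabel msdos                      >> label the new disk first
# parted /dev/sdb mkpart primary ext4 1MiB 20GiB     >> create a primary partition between the chosen start and end points
# parted /dev/sdb print                              >> verify the new partition
# mkfs.ext4 /dev/sdb1                                >> create a filesystem on it before use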







 Removing a Partition
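
A hedged example (partition number 4 is illustrative):

# parted /dev/sdb print      >> note the number of the partition to be removed
# parted /dev/sdb rm 4       >> remove partition number 4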


 Creation of swap Partition using parted
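
A hedged sketch (the device, sizes and resulting partition number /dev/sdb2 are only examples):

# parted /dev/sdb mkpart primary linux-swap 20GiB 24GiB    >> create the partition with the linux-swap type
# mkswap /dev/sdb2                                         >> initialize it as swap space
# swapon /dev/sdb2                                         >> enable it; add an /etc/fstab entry to make it persistent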

 Creating a LVM Partition using parted
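
A hedged sketch (the partition number and the volume group / logical volume names are only examples):

# parted /dev/sdb mkpart primary 24GiB 40GiB     >> create the partition
# parted /dev/sdb set 3 lvm on                   >> set the lvm flag on partition number 3
# pvcreate /dev/sdb3                             >> initialize it as an LVM physical volume
# vgcreate datavg /dev/sdb3                      >> create a volume group on it
# lvcreate -L 5G -n datalv datavg                >> carve a 5 GB logical volume out of the volume group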


 Creation of boot partition using parted
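
A hedged sketch (partition number 1 is illustrative):

# parted /dev/sdb set 1 boot on      >> mark partition 1 as bootable
# parted /dev/sdb print              >> the boot flag should now appear against the partition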






Saturday, January 17, 2015

REDHAT LINUX BASICS -STARTUP

                    LINUX -OPERATING SYSTEM.

Linus Torvalds released Linux in 1991 under the GPL.

What is kernel?

The kernel is called the heart of the operating system. The kernel is the program that acts as the chief of operations between the hardware and the rest of the system.

There are many functionalities that are handled by the kernel. Below is a list of some critical functionalities:
 1. Starting & stopping other programs
 2. Handling requests for memory
 3. Accessing disks
 4. Managing network connections etc.


Kernels are basically of two types:

1. Monolithic ----- provides all the services that applications need.
                              EX: Linux uses a monolithic kernel.

2. Micro kernel --- consists of a small core set of services. It needs other modules to be loaded to perform other functions.
                               EX: Windows.



LINUX distributions are classified into two groups:

1. Commercial     --  This type of distribution tends to have a longer release cycle. Commercial vendors also generally offer support for their distribution at a certain cost. EX -- Red Hat, SUSE

2. Non-commercial  -- The company offers the non-commercial distribution basically for testing of its software. Several non-commercial distributions are also backed by support.
Ex: Debian, Fedora, Ubuntu



LINUX Licences:

GNU General Public License (GPL)  --- The GPL states that the software released under it is free. It is acceptable to take the software and resell it for your own profit, but when reselling it, if any changes are made to the code, you need to release the full source code including the changes under the GPL, and the new source code will also be under the GPL. EX: Red Hat

BSD & Apache  -- These types of licences allow the user to modify the source code without disclosing the changes made to it.


------------------------------------------------------------------------------------------------

Basic Linux System Administration Tasks:

1. User Management
2. Logical Volume Management
3. Network Management
4. Device Management
5. Package Management

 --------------------------------------------------------------------------------------

User Management In Linux

1.  Every file or program under Linux is owned by a user.
2.  Each user will be having a unique User ID(UID).
3.  Root user is known as super user which can do all the tasks in linux.
     By default the UID for root user is "0" .
4.  System users normally have UIDs from 0 to 499. Manually created users will have UIDs after that.

 5. All the user information in Linux is kept in text files.

Below are the files where the user's information is kept.

1.  /etc/passwd -- this file stores the user name, password field, UID, GID, GECOS, home directory and login shell information

2. /etc/shadow ---  this file stores the encrypted password information for user accounts.

Why was the /etc/shadow file required if it was possible to keep the passwords in the /etc/passwd file alone?

 Ans: As we all know, the /etc/passwd file is readable by all users, and this was leading to a security threat, since it was easy for attackers to try to crack the encrypted passwords. So to handle this, Linux introduced the /etc/shadow file, which is readable only by the root user or other privileged programs that require access to that information.



How to create a user 

Using the "useradd" command we create users in Linux.
Whenever the useradd command is run, the defaults from the ASCII text file "/etc/default/useradd" are applied.

Content of /etc/default/useradd

# useradd defaults file
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

 *The above-mentioned parameters are automatically applied once the useradd command is executed.

By default, a group will also be created for the new user .




Changing the default values(changing the /etc/default/useradd parameters)


When invoked with only the -D option, useradd will display the current default values

[root@abhi ~]# useradd -D
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes






--------------------------------------------------------------------------------------------------------
Below help page of linux will be helpful in using the useradd command:



Usage: useradd [options]USER-NAME

Options:
  -b,   --base-dir BASE_DIR             base directory for the home directory of the   new account
  -c, --comment COMMENT               GECOS field of the new account
  -d, --home-dir HOME_DIR            home directory of the new account
  -e, --expiredate EXPIRE_DATE    expiration date of the new account(The date is specified in the format YYYY-MM-DD.)
  -f, --inactive INACTIVE                 password inactivity period of the new account
  -g, --gid GROUP                            name or ID of the primary group of the new account
  -G, --groups GROUPS                   list of supplementary groups of the new account
  -m, --create-home                          create the user's home directory 
  -M, --no-create-home                      do not create the user's home directory
  -p, --password PASSWORD           encrypted password of the new account
  -r, --system                                     create a system account
  -s, --shell SHELL                            login shell of the new account
  -u, --uid UID                                    user ID of the new account
  -U, --user-group                              create a group with the same name as the user

--------------------------------------------------------------------------------------------------
Example

1. #useradd test

 This will create a user ID and its home directory. The home directory will be "/home/user-id" by default.

 2. # useradd -d /home/test  -p test123  test

Here we are creating a user test with home directory "/home/test", and the password string stored in /etc/shadow will be literally "test123".

The "-p" parameter is not recommended unless you pass an already encrypted password (for example one created with crypt); see the sketch below the /etc/shadow output.

[root@abhi ~]# cat /etc/shadow |grep -i test
test:test123:16452:1:90:7:::
[root@abhi ~]#
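
A hedged sketch of passing an already encrypted password instead (openssl is used here only for illustration, and the user name testuser is hypothetical):

# useradd -d /home/testuser -p "$(openssl passwd -1 'test123')" testuser      >> the hashed string, not the plain text, lands in /etc/shadow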


3. Creation of a system user account with UID 510 and primary group ID 500. A system user account will not have a home directory, but the account will have no ageing (i.e. it never expires) by default.
[root@abhi ~]# useradd -u 510 -g 500 -r test

**** This is helpful when a customer requests a user account for collecting some details, which can't create any files or directories except in /tmp.


4. If you want to create a system user with a home directory, you need to use the -m option.
#useradd -r -m test

5. Creating a user ID whose GECOS is "test user". The user account expires on 2015-12-18, and the password will become inactive 5 days after it expires.

[root@abhi ~]# useradd -c "test user" -e 2015-12-18  -f 5 test2

[root@abhi ~]# cat /etc/passwd |grep -i test2
test2:x:501:501:test user:/home/test2:/bin/bash






Content of /etc/shadow file after this:

[root@abhi ~]# cat /etc/shadow|grep test2
test2:!!:16452:1:90:7:5:16787:

Note:   -f 0 means that the account will be disabled as soon as the password expires

            -f -1 means that the password-inactivity feature will be disabled for this user.



Changing the base-dir (HOME) parameter in the /etc/default/useradd file
[root@abhi ~]# useradd -D -b /home/test
[root@abhi ~]# useradd -D
GROUP=100
HOME=/home/test
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes
[root@abhi~]#





----------------------------------------------------------------------------------------------------------------------
                    How to remove a user .

We are using the command "userdel" to remove the user.




# userdel test       -----removes the user from the system (including the entries in the /etc/passwd & /etc/shadow files). It will not remove the user's home directory.
#userdel -r test     -----removes the user definition and also the home directory of the user.
#userdel -f -r test  -----removes the user definition, home directory and other definitions of the user forcefully, even if the user is still logged in.

               Changing the attributes of user

We can change the attributes of users using the "usermod" command.

Below are the options available for usermod command

Usage: usermod [options] LOGIN

Options:
  -c, --comment  COMMENT                   new value of the GECOS field
  -d, --home HOME_DIR                           new home directory for the user account
  -e, --expiredate EXPIRE_DATE             set account expiration date to EXPIRE_DATE
  -f, --inactive INACTIVE                         set password inactive after expiration to INACTIVE
  -g, --gid GROUP                                    force use GROUP as new primary group
  -G, --groups GROUPS                            new list of supplementary GROUPS
   -l, --login NEW_LOGIN                       new value of the login name
  -L, --lock                                                lock the user account
  -m, --move-home                                    move contents of the home directory to the
                                                                  new location (use only with -d)
   -s, --shell SHELL                                  new login shell for the user account
  -u, --uid UID                                          new UID for the user account
  -U, --unlock                                            unlock the user account



#usermod -L test  ---locks the user account
#usermod -U test ----unlocks the user account
# usermod -u 505 test ---changes the UID of the user
# usermod -g admin test  --changes the primary group of user test to admin
#usermod -G users,admin,system test  -- sets users, admin & system as the supplementary groups of user "test"
#usermod -e 2015-12-18  -f 5 test2     -- sets the account expiry date of the user test2 to 18th Dec 2015 and the password to become inactive 5 days after it expires
#usermod -a -G aks test -- appends the user to the group aks (-a must be used together with -G)
#usermod -m -d /etc/test test ---moves the home directory and its contents to the new location /etc/test for user test
---------------------------------------------------------------------------------------------------------------------


                            How to create a group

We can create a group using the command "groupadd". Group details are stored in the files /etc/group and /etc/gshadow.

#groupadd aks   ---creates a group named "aks"
#groupadd -g 508 abhi ---creates a group abhi with GID 508

                        How to delete a group

we can delete a group using the groupdel command

#groupdel aks

                       Modifying group attributes

Group Attributes are modified using the command "groupmod"

#groupmod -g 510 abhi -- changing the GID for group abhi
#groupmod -n test abhi ---changing the group name from "abhi" to "test"

------------------------------------------------------------------------------------------------------------------------------------



 Some tips on applying Security Hardening   for users.

1. Setting the password policies for particular user 



Listing the current password policies applied to user "test"
#chage -l test

Last password change                                                           : Jan 17, 2015
Password expires                                                                   : Apr 17, 2015
Password inactive                                                                  : never
Account expires                                                                     : never
Minimum number of days between password change        : 0
Maximum number of days between password change        : 90
Number of days of warning before password expires         : 7


 Setting the Maximum parameter to 90: the user will be prompted to change the password after 90 days.

  # chage -M 90 test
  
       


#chage -W 8 test --  start warning the user 8 days before the password expires