Saturday, September 19, 2015

Introduction to GPFS Filesystem




  • IBM introduced the GPFS filesystem in 1998.
  • GPFS is a high-performance clustered file system developed by IBM.
  • GPFS provides concurrent, high-speed file access to applications executing on multiple nodes of a cluster.
  • It is a shared-disk file system that provides fast data access from all nodes in a homogeneous or heterogeneous cluster of servers running AIX, Linux, or Windows.
  • All nodes in a GPFS cluster mount the same GPFS journaled filesystem, allowing multiple nodes to work on the same data at the same time.




GPFS Filesystem internals 


A file system (or stripe group) consists of a set of disks that are used to store file metadata as well as data and structures used by GPFS, including quota files and GPFS recovery logs.


                 How does the GPFS filesystem work?

Whenever a disk is added to a GPFS filesystem, a file system descriptor is written on it. The file system descriptor is written at a fixed position
on each disk, which helps GPFS identify the disk and its place in the file system.

The filesystem descriptor contains file system specifications and information about the state of the file system.


GPFS uses the concepts of inodes, indirect blocks and data blocks to store and access file data on the disks.
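
As a quick way to see what GPFS knows about an existing filesystem and its disks, the standard listing commands can be used (gpfs0 is the filesystem name used in the examples later in this post):

# mmlsfs gpfs0      >> lists the filesystem attributes (block size, inodes, mount point, etc.)
# mmlsdisk gpfs0    >> lists every disk in the filesystem with its usage type and status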


                 What is metadata?

Inodes and indirect blocks are considered metadata.
The metadata for each file is stored in its inode and contains information such as the file size and the last modification timestamp.

For faster access, the inode of a small file also contains the addresses of all disk blocks that hold the file's data.


You can control which disks GPFS uses for storing metadata when creating the file system using the mmcrfs command or
when modifying the file system at a later time by issuing the mmchdisk command.


How to define which disks will be used for storing metadata?


As already discussed, the format of the disk descriptor file is:

Diskname:::DiskUsage:FailureGroup::StoragePool:

The DiskUsage field determines what kind of data will be stored on the disk.

Below are the options that can be used (an example descriptor file follows the list).

  • dataAndMetadata   >>  indicates that the disk stores both data and metadata
  • dataOnly          >>  indicates that the disk stores only data
  • metadataOnly      >>  indicates that the disk stores only metadata
  • descOnly          >>  indicates that the disk holds only a copy of the file system descriptor
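
For example, a descriptor file that dedicates one NSD to metadata and another to data only could look like the sketch below; the NSD names and the failure group value are purely illustrative:

nsd01:::metadataOnly:-1::system
nsd02:::dataOnly:-1::system

Such a file is then passed to mmcrfs (when creating the filesystem) or mmadddisk (when extending it) with the -F option.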



         

We can also use the same options with the mmchdisk command to change the disk usage of an existing disk.


After changing the disk usage parameter with the mmchdisk command, we need to run the mmrestripefs command with the -r option to re-allocate the data
according to the new disk parameters. This is an online activity, but the mmrestripefs command is I/O intensive, so it should be executed when the
I/O load is low.

ex. mmchdisk gpfs0 change -d "gpfsnsd:::dataOnly"

After this, confirm whether the change has been applied successfully using the below command:
mmlsdisk gpfs0
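
Putting the whole sequence together, a minimal sketch (assuming the filesystem is gpfs0 and the NSD is gpfsnsd, as in the example above) looks like this:

# mmchdisk gpfs0 change -d "gpfsnsd:::dataOnly"   >> change the disk usage of the NSD
# mmlsdisk gpfs0                                  >> verify the new disk usage setting
# mmrestripefs gpfs0 -r                           >> re-place existing data according to the new setting (I/O intensive)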


GPFS and memory


GPFS uses three areas of memory:


  •  memory allocated from the kernel heap, 
  • memory allocated within the daemon segment, and 
  • shared segments accessed from both the daemon and the kernel.


Memory allocated from the kernel heap
GPFS uses kernel memory for control structures such as vnodes and related structures that establish the necessary relationship with the operating system.

Memory allocated within the daemon segment
GPFS uses daemon segment memory for file system manager functions. Because of that, the file system manager node requires more daemon memory, since token states for the entire file system are initially stored there.

File system manager functions requiring daemon memory include:

  • Structures that persist for the execution of a command
  • Structures that persist for I/O operations
  • States related to other nodes
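
To see which node is currently acting as the file system manager (and therefore carrying this extra daemon memory), the mmlsmgr command can be used:

# mmlsmgr gpfs0    >> shows the file system manager node for the filesystem gpfs0
# mmlsmgr          >> with no argument, lists the manager node for every filesystem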



Shared segments accessed from both the daemon and the kernel

Shared segments consist of both pinned and unpinned memory that is allocated at daemon startup.
The initial values are the system defaults. However, you can change these values later using the mmchconfig command.
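
To check the values currently in effect, mmlsconfig can be queried for the individual parameters discussed in the rest of this section:

# mmlsconfig pagepool
# mmlsconfig maxFilesToCache
# mmlsconfig maxStatCache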


The pinned memory is called the pagepool and is configured by setting the pagepool cluster configuration parameter.
This pinned area of memory is used for storing file data and for optimizing the performance of various data access patterns.


In a non-pinned area of the shared segment, GPFS keeps information about open and recently opened files. This information is held in two forms:
    1. full inode cache
    2. stat cache



Pinned  memory


GPFS uses pinned memory (also called pagepool memory) for storing file data and metadata in support of I/O operations.
With some access patterns, increasing the amount of pagepool memory can increase I/O performance.


Increased pagepool memory can be useful in the following cases:

  • There are frequent writes that can be overlapped with application execution.
  • There is frequent reuse of file data that can fit in the pagepool.
  • The I/O pattern contains sequential reads large enough that prefetching the data improves performance.


Pinned memory regions cannot be swapped out to disk, which means that GPFS will always consume at least the value of pagepool in system memory.
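
As a sketch of how the pagepool size can be changed (the 2G value is purely illustrative; size the pagepool according to your workload and the memory available on the node):

# mmchconfig pagepool=2G

Depending on the GPFS level, the new pagepool size may take effect only after GPFS is restarted on the node.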


Non-pinned memory
There are two levels of cache used to store file metadata:

Inode cache
The inode cache contains copies of inodes for open files and for some recently used files that are no longer open.
The maxFilesToCache parameter controls the number of inodes cached by GPFS.

Every open file on a node consumes space in the inode cache.
Additional space in the inode cache is used to store the inodes of recently used files, in case another application needs that data.

The number of open files can exceed the value defined by the maxFilesToCache parameter so that applications can keep operating. However,
when the maxFilesToCache number is exceeded, there is no further caching of recently opened files, and only the inodes of open files are kept in the cache.


Stat cache
The stat cache contains enough information to respond to inquiries about the file and open it, but not enough information to read from it or write to it.

A stat cache entry consumes significantly less memory than a full inode. The default value of maxStatCache is four times the maxFilesToCache parameter.

This value may be changed through the maxStatCache parameter on the mmchconfig command.
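
For example, both cache sizes can be adjusted in a single mmchconfig invocation; the numbers below are only illustrative, and changes to these parameters typically take effect after GPFS is restarted on the affected nodes:

# mmchconfig maxFilesToCache=4000,maxStatCache=16000
# mmlsconfig maxFilesToCache    >> verify the new setting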



Monday, September 14, 2015

Adding space or disks to a GPFS Filesystem

          Steps to add disks to the filesystem

Step 1 : Before adding disks to GPFS, record the details of the existing GPFS disks using the mmlsnsd command.


       # mmlsnsd
         File system   Disk name    NSD servers
         --------------------------------------------------------------------------
          gpfs0         nsd08        (directly attached)
          gpfs0         nsd09        (directly attached)


      # mmlsnsd -m  >> this shows, for each NSD, the NSD volume ID and the corresponding local device name
 

Step 2 : Before adding the disks to the GPFS filesystem, we need to create the
         GPFS NSDs using the mmcrnsd command.

         For creating an NSD we need to create a disk descriptor file. The format of the file is as follows;
         it is not necessary to define all fields.


         DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool

  I am going to add hdisk1, hdisk2, hdisk3, hdisk4, hdisk5 and hdisk6 to the filesystem gpfs0.


    Create the file /tmp/abhi/gpfs-disks.txt :

hdisk1:::dataAndMetadata::nsd01::
hdisk2:::dataAndMetadata::nsd02::
hdisk3:::dataAndMetadata::nsd03::
hdisk4:::dataAndMetadata::nsd04::
hdisk5:::dataAndMetadata::nsd05::
hdisk6:::dataAndMetadata::nsd06::



#mmcrnsd -F /tmp/abhi/gpfs-disks.txt

mmcrnsd: Processing disk hdisk1
mmcrnsd: Processing disk hdisk2
mmcrnsd: Processing disk hdisk3
mmcrnsd: Processing disk hdisk4
mmcrnsd: Processing disk hdisk5
mmcrnsd: Processing disk hdisk6
mmcrnsd: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

Once the command is successful, we can see the NSD names corresponding to the disks in the lspv output.



# lspv
hdisk0          00c334b6af00e77b                    rootvg          active
hdisk1          none                                nsd01
hdisk2          none                                nsd02
hdisk3          none                                nsd03
hdisk4          none                                nsd04
hdisk5          none                                nsd05
hdisk6          none                                nsd06
hdisk8          none                                nsd08
hdisk9          none                                nsd09


We also need to verify using the mmlsnsd command.

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)
 gpfs0         nsd09        (directly attached)
 (free disk)   nsd01        (directly attached)
 (free disk)   nsd02        (directly attached)
 (free disk)   nsd03        (directly attached)
 (free disk)   nsd04        (directly attached)
 (free disk)   nsd05        (directly attached)
 (free disk)   nsd06        (directly attached)


Step 3 : After this we need to add the new NSDs to the filesystem.

Before adding the disks to the GPFS filesystem, we need to create another disk descriptor file.
Since some of the parameters were already defined while creating the NSDs, there is no need to define them again here;
only the fields DiskName, DiskUsage, FailureGroup and StoragePool need to be specified.

By default a GPFS cluster has one storage pool, "system", but we can define more storage pools as per our requirement.

DiskName:::DiskUsage:FailureGroup::StoragePool:

    cat /tmp/abhi/gpfs-disk.txt
nsd01:::dataAndMetadata:-1::system
nsd02:::dataAndMetadata:-1::system
nsd03:::dataAndMetadata:-1::system
nsd04:::dataAndMetadata:-1::system
nsd05:::dataAndMetadata:-1::system
nsd06:::dataAndMetadata:-1::system

# mmadddisk gpfs0 -F /tmp/abhi/gpfs-disk.txt -r   >> the -r option re-balances the existing data across all the disks, including the new ones

Note: Re-balancing data is an I/O intensive job; it is not recommended to use this option during peak load.

Once added, verify the new filesystem size using df -gt and also check the output of mmlsnsd.

# mmlsnsd
 File system   Disk name    NSD servers
--------------------------------------------------------------------------
 gpfs0         nsd08        (directly attached)

 gpfs0         nsd09        (directly attached)

 gpfs0         nsd01        (directly attached)

 gpfs0         nsd02        (directly attached)

 gpfs0         nsd03        (directly attached)

 gpfs0         nsd04        (directly attached)

 gpfs0         nsd05        (directly attached)

 gpfs0         nsd06        (directly attached)
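
As a final check, assuming the filesystem is mounted at /gpfs0 (the mount point here is an assumption; adjust it to your environment), the added capacity can be confirmed with:

# df -g /gpfs0       >> filesystem size as seen by the operating system
# mmdf gpfs0         >> free and used space per disk and per storage pool, as seen by GPFS
# mmlsdisk gpfs0     >> status and disk usage type of every disk in the filesystem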