- IBM introduced the General Parallel File System (GPFS) in 1998.
- GPFS is a high-performance clustered file system developed by IBM.
- GPFS provides concurrent, high-speed file access to applications executing on multiple nodes of a cluster.
- It is a high-performance shared-disk file system that provides fast data access from all nodes in a homogeneous or heterogeneous cluster of servers running AIX, Linux, or Windows.
- All nodes in a GPFS cluster can mount the same GPFS journaled file system, allowing multiple nodes to be active on the same data at the same time.
GPFS Filesystem internals
A file system (or stripe group) consists of a set of disks that are used to store file metadata as well as data and structures used by GPFS, including quota files and GPFS recovery logs.
How does the GPFS file system work?
Whenever a disk is added to a GPFS file system, a file system descriptor is written to it. The descriptor is written at a fixed position
on each disk, which helps GPFS identify the disk and its place in the file system.
The file system descriptor contains file system specifications and information about the state of the file system.
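To see which disks currently hold a copy of the file system descriptor, the mmlsdisk command can be queried with the -L option (gpfs0 is an example device name, and the exact output columns may vary by GPFS release); disks holding a descriptor copy are flagged with "desc" in the remarks column.
ex. mmlsdisk gpfs0 -L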
GPFS uses inodes, indirect blocks, and data blocks to store and access file data on the disks.
What is metadata?
Inodes and indirect blocks are considered metadata.
The metadata for each file is stored in its inode and contains information such as the file size and the time of last modification.
For faster access, the inodes of small files also contain the addresses of all disk blocks that hold the file data.
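As a quick illustration (the path /gpfs0/testfile is hypothetical), the standard stat and ls commands show this inode-level metadata for a file stored on a GPFS mount, including its size, timestamps, and inode number:
ex. stat /gpfs0/testfile
ex. ls -i /gpfs0/testfile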
You can control which disks GPFS uses for storing metadata when creating the file system with the mmcrfs command, or
when modifying the file system at a later time by issuing the mmchdisk command.
How do you define which disks will be used for storing metadata?
As discussed earlier, the format of the disk descriptor file is:
DiskName:::DiskUsage:FailureGroup::StoragePool
The DiskUsage field determines what kind of data will be stored on the disk.
The options that can be used are listed below, followed by an example descriptor file.
- dataAndMetadata >> indicates that the disk stores both data and metadata
- dataOnly >> indicates that the disk stores only data
- metadataOnly >> indicates that the disk stores only metadata
- descOnly >> indicates that the disk holds only a copy of the file system descriptor.
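As an illustrative sketch (the NSD names, failure groups, pool names, and the path /tmp/gpfs.desc are all made up for this example, and the exact mmcrfs syntax depends on your GPFS release), a descriptor file using these options and the corresponding file system creation could look like this:
# /tmp/gpfs.desc - example disk descriptor file
gpfs1nsd:::dataAndMetadata:1::system
gpfs2nsd:::metadataOnly:2::system
gpfs3nsd:::dataOnly:3::datapool
ex. mmcrfs /gpfs0 gpfs0 -F /tmp/gpfs.desc
Note that metadata can be placed only in the system storage pool, so the metadataOnly and dataAndMetadata disks belong to the system pool.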
The same options can be used with the mmchdisk command to change the disk usage of an existing disk.
After changing the disk usage parameter with mmchdisk, run the mmrestripefs command with the -r option to re-allocate the data
according to the new disk parameters. This is an online activity, but mmrestripefs is I/O intensive, so it should be run when the I/O load is low.
ex. mmchdisk gpfs0 change -d "gpfsnsd:::dataOnly"
After this, confirm that the change was applied successfully using the command below.
mmlsdisk gpfs0
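To complete the change, the restripe step mentioned above would then be run against the same example device, so that existing data and metadata are re-placed according to the new disk usage:
ex. mmrestripefs gpfs0 -r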
GPFS and memory
GPFS uses three areas of memory:
- memory allocated from the kernel heap,
- memory allocated within the daemon segment, and
- shared segments accessed from both the daemon and the kernel.
Memory allocated from the kernel heap
GPFS uses kernel memory for control structures such as vnodes and related structures
that establish the necessary relationship with the operating system.
Memory allocated within the daemon segment
GPFS uses daemon segment memory for file system manager functions. Because of that, the file system manager
node requires more daemon memory since token states for the entire file system are initially stored there.
File system manager functions requiring daemon memory include:
- Structures that persist for the execution of a command
- Structures that persist for I/O operations
- States related to other nodes
Shared segments accessed from both the daemon and the kernel
Shared segments consist of both pinned and unpinned memory that is allocated at daemon startup.
The initial values are the system defaults; however, you can change these values later using the mmchconfig command.
The pinned memory is called the pagepool and is configured by setting the pagepool cluster configuration parameter.
This pinned area of memory is used for storing file data and for optimizing the performance of various data access patterns.
In a non-pinned area of the shared segment, GPFS keeps information about open and recently opened files. This information is held in two forms:
1. full inode cache
2. stat cache
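Before tuning any of these areas, the current values can be checked per attribute with mmlsconfig (the attribute names are the same ones used by mmchconfig; the values reported are the defaults unless they were changed for your cluster):
ex. mmlsconfig pagepool
ex. mmlsconfig maxFilesToCache
ex. mmlsconfig maxStatCache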
Pinned memory
GPFS uses pinned memory (also called pagepool memory) for storing file data and metadata in support of I/O operations.
With some access patterns, increasing the amount of pagepool memory can increase I/O performance.
Increased pagepool memory can be useful in the following cases:
- There are frequent writes that can be overlapped with application execution.
- There is frequent reuse of file data that can fit in the pagepool.
- The I/O pattern contains sequential reads large enough that prefetching data improves performance.
Pinned memory regions cannot be swapped out to disk, which means that GPFS will always consume at least the value of pagepool in system memory.
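As a sketch of how the pagepool could be enlarged (the 4G value is purely illustrative and should be sized against the memory actually available on your nodes; the -i option, which applies the change immediately as well as persistently, is assumed to be supported by your GPFS release):
ex. mmchconfig pagepool=4G -i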
Non-pinned memory
There are two levels of cache used to store file metadata:
Inode cache
The inode cache contains copies of inodes for open files and for some recently used files that are no longer open.
The maxFilesToCache parameter controls the number of inodes cached by GPFS.
Every open file on a node consumes space in the inode cache.
Additional space in the inode cache is used to store the inodes of recently used files in case another application needs that data.
The number of open files can exceed the value defined by the maxFilesToCache parameter so that applications can continue to operate. However,
once the maxFilesToCache number is exceeded, recently opened files are no longer cached, and only the inode data of open files is kept in the cache.
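For example, to raise the inode cache limit (5000 is an illustrative value; on most GPFS releases this parameter takes effect only after the GPFS daemon is restarted on the affected nodes):
ex. mmchconfig maxFilesToCache=5000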
Stat cache
The stat cache contains enough information to respond to inquiries about the file and open it, but not enough information to read from it or write to it.
A stat cache entry consumes significantly less memory than a full inode. The default size of the stat cache is four times the value of the maxFilesToCache parameter.
This value may be changed through the maxStatCache parameter on the mmchconfig command.
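Similarly, the stat cache size can be set explicitly instead of relying on the four-times-maxFilesToCache default (20000 is only an example value; as with maxFilesToCache, a daemon restart is generally needed for the new value to take effect):
ex. mmchconfig maxStatCache=20000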