
RHEL 7, 8, 9, 10 – Storage Issues

Storage issues in Red Hat Enterprise Linux (RHEL) are among the most critical problems administrators face. They can cause boot failures, application downtime, data corruption, or performance degradation.

This guide provides a structured troubleshooting approach that works consistently across RHEL 7 through RHEL 10.

1. Identify the Storage Problem
Start by understanding what type of storage issue you are facing.
Symptom                    Likely Cause
Filesystem full          → Disk usage or log growth
Mount fails at boot      → /etc/fstab error
Disk not detected        → Hardware or driver issue
LVM volumes missing      → VG/LV not activated
Read-only filesystem     → Filesystem corruption
Slow I/O                 → Disk or SAN performance
iSCSI/NFS not mounting   → Network or auth issue

2. Check Disk Detection and Hardware Status
List Block Devices

# lsblk
Check Disk Details
# blkid
# fdisk -l
Check Kernel Disk Messages
# dmesg | grep -i sd
If disks are missing, verify:
  • SAN mapping
  • VM disk attachment
  • Hardware health
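If a disk was mapped or attached while the system was running, a SCSI bus rescan often makes it visible without a reboot. A minimal sketch, assuming the standard sysfs interface: writing the wildcard string "- - -" (channel, target, LUN) to a host's scan file triggers the rescan.

```shell
#!/bin/sh
# Build the sysfs scan path for a given SCSI host number.
scan_path() {
    printf '/sys/class/scsi_host/host%s/scan' "$1"
}

# Rescan every SCSI host on the system (requires root).
rescan_all() {
    for host in /sys/class/scsi_host/host*; do
        [ -e "$host/scan" ] && echo "- - -" > "$host/scan"
    done
}

scan_path 0    # prints the path the wildcard scan string is written to
# rescan_all   # uncomment to trigger the rescan as root
```

After the rescan, re-run lsblk to confirm the new device appears.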
3. Filesystem Full Issues
Check Disk Usage

# df -h
Find Large Files
# du -sh /* 2>/dev/null
Clear Logs Safely
# journalctl --vacuum-time=7d
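The checks above can be combined into a small watchdog that flags any filesystem over a threshold. A sketch, assuming an illustrative 90% limit; wire it into cron or a monitoring agent as needed.

```shell
#!/bin/sh
THRESHOLD=90   # warn at this usage percentage (illustrative value)

# Decide whether a usage percentage breaches the threshold.
check_usage() {   # args: mountpoint used_percent
    if [ "$2" -ge "$THRESHOLD" ]; then
        echo "WARN: $1 at ${2}% (>= ${THRESHOLD}%)"
    else
        echo "OK: $1 at ${2}%"
    fi
}

# Feed it real numbers from df: drop the header and the '%' sign.
df -P | awk 'NR > 1 {gsub(/%/, "", $5); print $6, $5}' |
while read -r mount used; do
    check_usage "$mount" "$used"
done
```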

4. Read-Only Filesystem Issues
This usually indicates filesystem corruption.
Verify Mount Status
# mount | grep ro
Remount (Temporary)
# mount -o remount,rw /
Permanent Fix
Boot into rescue mode
Run:
# fsck -y /dev/mapper/rhel-root
Never run fsck on mounted filesystems.

5. Fix /etc/fstab Mount Failures
Incorrect entries cause boot into emergency mode.
Check fstab
# vi /etc/fstab
Verify UUIDs
# blkid
Test fstab
# mount -a
Comment out invalid entries if necessary.
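Before rebooting, /etc/fstab can also be sanity-checked mechanically. Valid entries have four to six whitespace-separated fields (the dump and pass columns default to 0 when omitted). A small sketch; point it at a copy of the file first if you prefer.

```shell
#!/bin/sh
# Flag fstab lines whose field count is outside the valid 4-6 range
# (device, mountpoint, fstype, options required; dump/pass optional).
check_fstab() {   # arg: path to an fstab file
    awk 'NF > 0 && $1 !~ /^#/ && (NF < 4 || NF > 6) {
             printf "line %d looks malformed (%d fields): %s\n", NR, NF, $0
             bad = 1
         }
         END { exit bad }' "$1"
}

if [ -r /etc/fstab ]; then
    check_fstab /etc/fstab && echo "fstab field counts look sane"
fi
```

This only checks field counts; `mount -a` remains the real test of the entries themselves.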

6. LVM Issues (Most Common in RHEL)
Check LVM Status

# pvs
# vgs
# lvs
Activate Volume Groups
# vgchange -ay
Scan for Missing Volumes
# pvscan
# vgscan
# lvscan

7. Extend LVM Filesystem (Low Space)
Extend Logical Volume

# lvextend -L +10G /dev/rhel/root
Resize Filesystem
# xfs_growfs /
# resize2fs /dev/rhel/root
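Note that xfs_growfs takes the mount point while resize2fs takes the device, and `lvextend -r` (--resizefs) can extend the LV and grow the filesystem in one step. A small helper sketching the tool choice; the paths are the examples used above.

```shell
#!/bin/sh
# Return the filesystem-grow command for a given fstype.
# xfs_growfs operates on the mount point; resize2fs on the device.
grow_cmd() {   # args: fstype mountpoint device
    case "$1" in
        xfs)            echo "xfs_growfs $2" ;;
        ext2|ext3|ext4) echo "resize2fs $3" ;;
        *)              echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}

grow_cmd xfs / /dev/rhel/root     # → xfs_growfs /
grow_cmd ext4 / /dev/rhel/root    # → resize2fs /dev/rhel/root
```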

8. Recover Missing or Corrupt LVM
Rebuild LVM Metadata

# vgcfgrestore vg_name
List backups:
# ls /etc/lvm/archive/

9. Boot Fails Due to Storage Issues
Check initramfs

# lsinitrd
Rebuild initramfs
# dracut -f
Verify Root Device
# blkid

10. NFS Storage Issues
Check Mount Status

# mount | grep nfs
Test Connectivity
# showmount -e server_ip
Restart Services
# systemctl restart nfs-client.target

11. iSCSI Storage Issues
Check iSCSI Sessions

# iscsiadm -m session
Discover Targets
# iscsiadm -m discovery -t sendtargets -p target_ip
Login to Target
# iscsiadm -m node -l

12. Multipath Issues (SAN Storage)
Check Multipath Status
# multipath -ll
Restart Multipath
# systemctl restart multipathd

13. Storage Performance Issues
Check Disk I/O

# iostat -xm 5
Identify Slow Processes
# iotop

14. SELinux Storage-Related Issues
SELinux may block access to mounted volumes.
Check Denials
# ausearch -m avc -ts recent
Fix Context
# restorecon -Rv /mount_point

15. Backup and Data Safety (Before Fixes)
Always verify backups before major storage changes.

# rsync -av /data /backup

16. Best Practices to Prevent Storage Issues
  • Monitor disk usage proactively
  • Validate /etc/fstab changes
  • Use LVM snapshots
  • Keep rescue media available
  • Monitor SAN/NAS health
  • Perform regular filesystem checks
Conclusion
Storage troubleshooting in RHEL 7, 8, 9, and 10 follows consistent principles:
  • Verify hardware and detection
  • Fix filesystem and LVM issues
  • Validate mounts and network storage
  • Monitor performance and prevent recurrence
Using this step-by-step approach ensures data integrity, stability, and minimal downtime in enterprise Linux environments.

RHEL 7, 8, 9, 10 – Network Issues

Network issues in Red Hat Enterprise Linux (RHEL) can cause service outages, application failures, storage disconnects, and cluster instability.

This guide provides a systematic, version-aware approach to diagnosing and fixing network problems across RHEL 7 through RHEL 10.

1. Identify the Network Problem
Start by identifying what exactly is failing.
Symptom                      Possible Cause
No network connectivity    → Interface down, cable, driver
Cannot reach gateway       → Routing issue
DNS not resolving          → DNS configuration
Network slow               → Duplex / MTU / congestion
Interface missing          → Driver or udev issue
Network fails after reboot → NetworkManager config
Services unreachable       → Firewall or SELinux

2. Check Network Interface Status
List Interfaces
# ip link show
Check Interface IP
# ip addr show
Bring Interface Up
# ip link set eth0 up

3. Verify Network Services (RHEL Differences)
RHEL Version    Network Service
RHEL 7        → NetworkManager / network
RHEL 8+       → NetworkManager only
Check NetworkManager
# systemctl status NetworkManager
Restart if needed:
# systemctl restart NetworkManager

4. Test Basic Connectivity
Test Loopback
# ping 127.0.0.1
Test Gateway
# ping <gateway-ip>
Test External IP
# ping 8.8.8.8
If IP works but hostname fails → DNS issue.
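The tests above form a ladder: loopback, gateway, external IP, then DNS. The first rung that fails points at the layer to investigate. A sketch that encodes this logic; the gateway is discovered from the routing table, with an unreachable TEST-NET address as fallback.

```shell
#!/bin/sh
# Map the first failing rung of the connectivity ladder to a diagnosis.
diagnose() {   # args: lo gw ext dns  (each "ok" or "fail")
    [ "$1" = ok ] || { echo "loopback failed: TCP/IP stack problem"; return; }
    [ "$2" = ok ] || { echo "gateway unreachable: local link or ARP issue"; return; }
    [ "$3" = ok ] || { echo "external IP unreachable: routing or upstream issue"; return; }
    [ "$4" = ok ] || { echo "name resolution failed: DNS issue"; return; }
    echo "all connectivity tests passed"
}

# Run the real probes (ping/getent must be available).
gw=$(ip route 2>/dev/null | awk '/^default/ {print $3; exit}')
r() { "$@" >/dev/null 2>&1 && echo ok || echo fail; }
diagnose "$(r ping -c1 -W1 127.0.0.1)" \
         "$(r ping -c1 -W1 "${gw:-192.0.2.1}")" \
         "$(r ping -c1 -W1 8.8.8.8)" \
         "$(r getent hosts google.com)"
```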

5. Check Routing Table
# ip route show
Ensure a default route exists:
default via <gateway> dev eth0
Add route (temporary):
# ip route add default via <gateway>

6. DNS Troubleshooting
Check DNS Configuration

# cat /etc/resolv.conf
Test DNS Resolution
# nslookup google.com
# dig google.com
NetworkManager DNS
# nmcli dev show | grep DNS

7. NetworkManager (nmcli) Troubleshooting
Show Connections

# nmcli connection show
Check Active Connection
# nmcli device status
Restart Connection
# nmcli connection down <conn-name>
# nmcli connection up <conn-name>

8. Fix Network Issues After Reboot
Check auto-connect:

# nmcli connection show <conn-name> | grep autoconnect
Enable:
# nmcli connection modify <conn-name> connection.autoconnect yes

9. Firewall Issues (firewalld)
Check Firewall Status

# firewall-cmd --state
List Rules
# firewall-cmd --list-all
Allow Service or Port
# firewall-cmd --add-service=ssh --permanent
# firewall-cmd --add-port=8080/tcp --permanent
# firewall-cmd --reload
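When scripting firewall changes, it helps to check whether a port is already open before adding a duplicate rule. A minimal sketch that parses `firewall-cmd --list-ports` output; the helper name is illustrative.

```shell
#!/bin/sh
# Succeed if PORT/tcp appears in a `firewall-cmd --list-ports` output string.
port_open() {   # args: list_ports_output port
    case " $1 " in
        *" $2/tcp "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Usage against the live firewall (requires firewalld running):
#   ports=$(firewall-cmd --list-ports)
#   port_open "$ports" 8080 || {
#       firewall-cmd --add-port=8080/tcp --permanent
#       firewall-cmd --reload
#   }
port_open "22/tcp 8080/tcp" 8080 && echo "8080/tcp present in sample output"
```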

10. SELinux Network-Related Issues
SELinux can block network connections.
Check SELinux Status
# getenforce
Identify Denials
# ausearch -m avc -ts recent
Enable Required Boolean
# setsebool -P httpd_can_network_connect on

11. Interface Missing or Renamed
List NICs

# lspci | grep -i ethernet
Check Drivers
# lsmod | grep <driver>
Predictable Interface Names
# ip link
Example: ens192 instead of eth0

12. MTU and Performance Issues
Check MTU

# ip link show eth0
Set MTU (Temporary)
# ip link set dev eth0 mtu 9000
Make Permanent
# nmcli connection modify <conn-name> 802-3-ethernet.mtu 9000
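As a sanity check when tuning MTU: the usable TCP payload per frame is the MTU minus 40 bytes of IPv4 and TCP headers (20 bytes each, without options). A quick illustration of the arithmetic:

```shell
#!/bin/sh
# TCP payload bytes per frame = MTU - 20 (IPv4 header) - 20 (TCP header).
tcp_payload() {
    echo $(( $1 - 40 ))
}

tcp_payload 1500   # → 1460 (standard Ethernet)
tcp_payload 9000   # → 8960 (jumbo frames)
```

Remember that jumbo frames only help if every device on the path supports the larger MTU.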

13. Bonding / Teaming Issues
Check Bond Status

# cat /proc/net/bonding/bond0
Restart Bond
# nmcli connection down bond0
# nmcli connection up bond0

14. Network Logs and Debugging
Kernel Messages
# dmesg | grep -i network
NetworkManager Logs
# journalctl -u NetworkManager

15. Network Storage Impact (NFS / iSCSI)
Network failures may affect storage mounts.
# showmount -e server_ip
# iscsiadm -m session

16. Best Practices to Prevent Network Issues
  • Use NetworkManager consistently
  • Validate firewall rules
  • Document static IP settings
  • Monitor network latency
  • Test changes before reboot
  • Keep NIC drivers updated
Conclusion
Network troubleshooting in RHEL 7, 8, 9, and 10 follows the same fundamentals:
  • Verify interfaces and IPs
  • Check routing and DNS
  • Validate NetworkManager
  • Review firewall and SELinux
Using this step-by-step approach ensures quick resolution and stable connectivity in enterprise Linux environments.

Splunk Server and Forwarder Installation

In any enterprise environment, log collection and analysis are critical for security monitoring, performance troubleshooting, and threat detection. Splunk is a market-leading platform that helps organizations collect, index, and visualize machine data.

However, manually installing Splunk Enterprise Server and configuring forwarders on several client machines can be time-consuming. In this blog post, we will walk through the entire process end to end.

Understanding the Components

Splunk Enterprise Server
This is the main Splunk system that stores, indexes, and searches all logs. It provides:
  • Web UI
  • Indexing database
  • Search head
  • User management
  • Dashboard visualization
Splunk Universal Forwarder
This is a lightweight agent installed on client machines. It:
  • Sends logs to the Splunk server
  • Runs silently as a background service
  • Consumes minimal CPU & memory
Prerequisites:

Server Requirements

  • OS: Linux (Ubuntu/RHEL/CentOS/Amazon Linux)
  • 4+ GB RAM
  • 20+ GB Disk
  • Port 8000 (Web), 8089 (mgmt), 9997 (data input) open
Client Requirements
  • Linux-based client machines
  • sudo access
  • Network reachability to server port: 9997
Download Splunk Installer Links
Component Download URL
Splunk Enterprise https://www.splunk.com/en_us/download/splunk-enterprise.html
Splunk Forwarder https://www.splunk.com/en_us/download/universal-forwarder.html

SPLUNK ENTERPRISE SERVER INSTALLATION 

Step 1: Update system

# dnf update -y

Step 2: Create Splunk OS User

Splunk should never run as root.
# useradd -m splunk
Verify:
# id splunk
uid=1001(splunk) gid=1001(splunk) groups=1001(splunk)

Step 3: Download Splunk Enterprise
Download Splunk from the official Splunk website and copy it to your server (example path used below):
/root/splunk-9.0.2-17e00c557dc1-linux-2.6-x86_64.rpm

Step 4: Install Splunk Enterprise
Install the RPM package:
# rpm -ivh splunk-9.0.2-17e00c557dc1-linux-2.6-x86_64.rpm
By default, Splunk installs to:
# ls -ld /opt/splunk/
drwx------ 12 splunk splunk 4096 Dec 20 03:04 /opt/splunk/
[root@inddcpspn01 ~]#

Step 5: Set Correct Ownership
Give ownership of Splunk files to the splunk user:
# chown -R splunk:splunk /opt/splunk

Step 6: First Start of Splunk (Create Admin User)
This step is critical
The admin user is created only on the first successful start.
Run the following command as the splunk user:
# sudo -u splunk /opt/splunk/bin/splunk start
Press q to exit the license pager, then answer the prompts:
Do you agree with this license? [y/n]: y
Please enter an administrator username: admin
Please enter a new password: Welcome@123
Please confirm new password: Welcome@123

Step 7: Verify Admin User Creation
Check the password file:
# ls -l /opt/splunk/etc/passwd
# cat /opt/splunk/etc/passwd
You should see:
:admin:$6$5SYFmoISyswPtUPt$AXKb2n0RD7mL8UAz1wyZkgTdHkHWFIes/9DMz.4gw3.xnVyLyxpzj1mADGt8HTVJ.ky7f8tay1.bg.7osl7ci1::Administrator:admin:changeme@example.com:::20441
If this file exists, the admin user is created successfully.

Step 8: Enable Splunk at Boot
Install chkconfig first if the system reports it missing (for example, on RHEL/CentOS 9):
# dnf install chkconfig
$ sudo -u splunk /opt/splunk/bin/splunk stop
$ /opt/splunk/bin/splunk enable boot-start -user splunk
Init script installed at /etc/init.d/splunk.
Init script is configured to run at boot.
$ sudo -u splunk /opt/splunk/bin/splunk start

Step 9: Start / Stop Splunk
Start Splunk
$ sudo -u splunk /opt/splunk/bin/splunk start
Stop Splunk
$ sudo -u splunk /opt/splunk/bin/splunk stop
Check Status
$ sudo -u splunk /opt/splunk/bin/splunk status

Step 10: Access Splunk Web UI
Open a browser and go to:
http://<server-ip>:8000 or http://<Server FQDN>:8000

Login with:
Username: admin
Password: Welcome@123

Step 11: (Optional) Firewall Configuration

Allow Splunk Web port:
# firewall-cmd --add-port=8000/tcp --permanent
# firewall-cmd --reload

Common Issues & Fixes
Admin password not working? Typical causes:
  • Splunk was started once before the admin password was seeded
  • /opt/splunk/etc/passwd was never created
  • Splunk was started as the wrong user
Fix: stop Splunk, remove the stale /opt/splunk/etc/passwd, and start Splunk again with --seed-passwd (or re-answer the first-start prompts).

ENABLE SPLUNK DATA INPUT (TCP 9997)
Log into Splunk Web UI:
URL:
http://<server-ip>:8000 or http://<Server FQDN>:8000
Then:
Enable Receiving Port
Go to Settings → Forwarding and Receiving
Click Configure Receiving
Click New Receiving Port
Enter:
Port: 9997

Save
Verify Receiving Port
# netstat -tulnp | grep 9997
tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN      42402/splunkd
# ss -tulnp | grep 9997
tcp   LISTEN 0      128          0.0.0.0:9997      0.0.0.0:*    users:(("splunkd",pid=42402,fd=197))
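The listening check can be scripted for monitoring. A sketch that parses `ss -tln` output (the local address:port is the fourth column); the helper names are illustrative.

```shell
#!/bin/sh
# Parse `ss -tln`-style output on stdin; succeed if the port is listening.
match_port() {   # arg: port
    awk -v p=":$1" '$4 ~ p"$" { found = 1 } END { exit !found }'
}

# Wrapper against the live socket table.
listening() { ss -tln 2>/dev/null | match_port "$1"; }

if listening 9997; then
    echo "splunkd receiving port 9997 is listening"
else
    echo "nothing on 9997 - recheck Settings > Forwarding and Receiving"
fi
```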

SPLUNK FORWARDER INSTALLATION ON CLIENT 

Step 1: Create Splunk User on Client
Splunk services should not run as root.
# useradd -m splunk
Verify:
# id splunk
uid=1001(splunk) gid=1001(splunk) groups=1001(splunk)

Step 2: Download Splunk Universal Forwarder
Download the Universal Forwarder package from the Splunk website and copy it to the client server.
Example RPM file:
splunkforwarder-9.0.2-17e00c557dc1-linux-2.6-x86_64.rpm

Step 3: Install Splunk Universal Forwarder
Install the RPM:
# rpm -ivh splunkforwarder-9.0.2-17e00c557dc1-linux-2.6-x86_64.rpm
Default installation path:
# ls -ld /opt/splunkforwarder
drwxr-xr-x 9 splunk splunk 4096 Dec 19 22:02 /opt/splunkforwarder

Step 4: Set Correct Ownership
# chown -R splunk:splunk /opt/splunkforwarder

Step 5: First Start of Splunk Forwarder
Start the Universal Forwarder for the first time:
$ sudo -u splunk /opt/splunkforwarder/bin/splunk start 
Press q to exit the license pager, then answer the prompts:
Do you agree with this license? [y/n]: y
Please enter an administrator username: admin
Please enter a new password: Welcome@123
Please confirm new password: Welcome@123

Step 6: Enable Forwarder to Start at Boot
Install chkconfig using dnf or yum if it is missing:
# dnf install chkconfig
$ sudo -u splunk /opt/splunkforwarder/bin/splunk stop
# /opt/splunkforwarder/bin/splunk enable boot-start -user splunk
Systemd unit file installed by user at /etc/systemd/system/SplunkForwarder.service.
Configured as systemd managed service.
$ sudo -u splunk /opt/splunkforwarder/bin/splunk start

Step 7: Configure Forwarder to Send Data to Indexer
Add Splunk Indexer as Receiving Destination
$ sudo -u splunk /opt/splunkforwarder/bin/splunk add forward-server 192.168.10.109:9997
Warning: Attempting to revert the SPLUNK_HOME ownership
Warning: Executing "chown -R splunk /opt/splunkforwarder"
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Splunk username: admin
Password:
Added forwarding to: 192.168.10.109:9997.

Verify:
$ sudo -u splunk /opt/splunkforwarder/bin/splunk list forward-server
Warning: Attempting to revert the SPLUNK_HOME ownership
Warning: Executing "chown -R splunk /opt/splunkforwarder"
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Active forwards:
        192.168.10.109:9997
Configured but inactive forwards:
        None

Step 8: Add Log Files to Monitor
Example: Monitor Linux system logs
$ sudo -u splunk /opt/splunkforwarder/bin/splunk add monitor /var/log/messages
Warning: Attempting to revert the SPLUNK_HOME ownership
Warning: Executing "chown -R splunk /opt/splunkforwarder"
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Added monitor of '/var/log/messages'.

For Ubuntu:
$ sudo -u splunk /opt/splunkforwarder/bin/splunk add monitor /var/log/syslog

Step 9: Restart Splunk Forwarder
$ sudo -u splunk /opt/splunkforwarder/bin/splunk restart

Step 10: Verify Data on Splunk Server
On the Splunk Enterprise server:
Login to Splunk Web : http://<server-ip>:8000 or http://<Server FQDN>:8000
Go to Search & Reporting
Run:
index=_internal | stats count by host
You should see the client hostname.

Firewall Configuration (Optional)
Allow outgoing traffic to indexer:
# firewall-cmd --add-port=9997/tcp --permanent
# firewall-cmd --reload

Common Issues & Troubleshooting
Forwarder not sending data? Check for:
  • Indexer receiving port 9997 not enabled
  • Firewall blocking traffic
  • Incorrect indexer IP in the forward-server configuration
Check forwarder status
$ sudo -u splunk /opt/splunkforwarder/bin/splunk status
Warning: Attempting to revert the SPLUNK_HOME ownership
Warning: Executing "chown -R splunk /opt/splunkforwarder"
egrep: warning: egrep is obsolescent; using grep -E
splunkd is running (PID: 1768).
splunk helpers are running (PIDs: 1794).
egrep: warning: egrep is obsolescent; using grep -E

Check logs
$ tail -f /opt/splunkforwarder/var/log/splunk/splunkd.log

Conclusion
You have successfully installed and configured the Splunk Enterprise Server and the Splunk Universal Forwarder on the Splunk server and Splunk client machine. The Splunk client is now actively forwarding log data to the Splunk Enterprise server, enabling centralized log collection, monitoring, and analysis across the environment.

This setup provides better visibility into system activity, faster troubleshooting, and a scalable foundation for enterprise-level monitoring and observability.

RHEL 7, 8, 9, 10 – Security Issues

Security issues in Red Hat Enterprise Linux (RHEL) can surface as login failures, service denials, SELinux blocks, firewall problems, authentication errors, or compliance violations.

This guide provides a structured troubleshooting methodology applicable to RHEL 7 through RHEL 10.

1. Identify the Type of Security Issue
Before making changes, determine what is being blocked.
User cannot log in → PAM / SSH / SELinux
Service not accessible → Firewall / SELinux
Permission denied → SELinux / file context
SSH connection refused → sshd / firewall
Application fails after reboot → SELinux labeling
Compliance scan failures → OpenSCAP / crypto policy

2. Check System Logs First (Golden Rule)
Authentication and Security Logs

# tail -f /var/log/secure
systemd Journal (All Versions)
# journalctl -xe
# journalctl -u sshd

3. SELinux Troubleshooting (Most Common Issue)
SELinux is enabled by default in all RHEL versions.
Check SELinux Status
# getenforce
# sestatus
Identify SELinux Denials
# ausearch -m avc -ts recent
Or:
# journalctl | grep AVC
Interpret SELinux Alerts
# sealert -a /var/log/audit/audit.log
Fix SELinux Issues (Recommended Approach)
Restore File Contexts
# restorecon -Rv /path
Enable Required Booleans
# getsebool -a | grep httpd
# setsebool -P httpd_can_network_connect on
Temporary Disable (For Testing Only)
# setenforce 0
Permanent disable (NOT recommended):
# vi /etc/selinux/config

4. Firewall Issues (firewalld)
Check Firewall Status

# systemctl status firewalld
# firewall-cmd --state
List Active Rules
# firewall-cmd --list-all
Allow a Service or Port
# firewall-cmd --add-service=http --permanent
# firewall-cmd --add-port=8080/tcp --permanent
# firewall-cmd --reload
Verify Zones
# firewall-cmd --get-active-zones

5. SSH Security Issues
Check SSH Service

# systemctl status sshd
Verify SSH Configuration
# sshd -t
# vi /etc/ssh/sshd_config
Common issues:
  • PermitRootLogin no
  • PasswordAuthentication no
  • Wrong SSH port
Restart SSH Safely
# sshd -t && systemctl restart sshd

6. User Authentication & PAM Issues
Verify User Account

# id username
# passwd -S username
Check Account Lockout
# faillog -u username
# pam_tally2 --user username   # RHEL 7
# faillock --user username     # RHEL 8+
Reset Failed Login Count
# faillock --user username --reset
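On a mixed estate, scripts need to pick the right lockout tool per release: pam_tally2 on RHEL 7, faillock on RHEL 8 and later. A sketch keyed off VERSION_ID in /etc/os-release; the helper name and username are illustrative.

```shell
#!/bin/sh
# Pick the account-lockout reset command for a given RHEL major version.
reset_cmd() {   # args: rhel_major username
    if [ "$1" -le 7 ]; then
        echo "pam_tally2 --user $2 --reset"
    else
        echo "faillock --user $2 --reset"
    fi
}

# Derive the major version from /etc/os-release (defaults to 8 if unknown).
major=$(. /etc/os-release 2>/dev/null; echo "${VERSION_ID%%.*}")
reset_cmd "${major:-8}" username
```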

7. File and Directory Permission Issues
Check Ownership

# ls -ld /path
Fix Permissions
# chmod 755 /path
# chown user:group /path
Permissions alone may not fix SELinux issues.

8. sudo Issues
Check sudo Access

# sudo -l
Validate sudoers File
# visudo
Check:
username ALL=(ALL) ALL

9. Security Updates and Patch Issues
Check Installed Security Updates
# yum updateinfo list security # RHEL 7
# dnf updateinfo list security # RHEL 8+
Apply Security Updates
# yum update --security
# dnf update --security

10. OpenSCAP & Compliance Failures
Scan System

# oscap xccdf eval --profile standard --results scan.xml /usr/share/xml/scap/ssg/content/ssg-rhel*.xml
Common Compliance Failures
  • Password complexity
  • SSH hardening
  • File permissions
  • Crypto policies
11. Crypto Policy Issues (RHEL 8+)
Check Current Policy
# update-crypto-policies --show
Set Default Policy
# update-crypto-policies --set DEFAULT

12. Auditd Issues
Check Audit Service

# systemctl status auditd
Search Audit Logs
# ausearch -k ssh

13. Container Security Issues (RHEL 8+)
SELinux + Containers

# podman inspect container_name | grep SELinux
Fix volume labels by appending :Z (private label) or :z (shared label) to the mount:
# podman run -v /host/data:/data:Z container_image

14. Kernel & Security Module Issues
Check Loaded Modules

# lsmod
Rebuild SELinux Labels
# touch /.autorelabel
# reboot

15. Best Practices to Prevent Security Issues
  • Keep SELinux enabled
  • Monitor /var/log/secure
  • Apply security patches regularly
  • Use firewalld zones properly
  • Test changes in non-production
  • Enable audit logging
Conclusion
Security troubleshooting in RHEL 7, 8, 9, and 10 follows a consistent methodology:
  • Identify blocked access
  • Review logs
  • Check SELinux and firewall
  • Validate authentication and permissions
  • Apply fixes systematically
Following these steps ensures secure, compliant, and stable systems in enterprise environments.

RHEL 7, 8, 9, 10 – Bootloader Issues

GRUB (Grand Unified Bootloader) problems are among the most common causes of Linux systems failing to boot. Across RHEL 7, 8, 9, and 10, the bootloader stack remains GRUB2 + systemd, with differences mainly in BIOS vs UEFI handling.

This guide provides a version-aware, step-by-step approach to diagnosing and fixing GRUB issues in all supported RHEL versions.

1. Common GRUB Issues in RHEL
Symptom                    Likely Cause
No GRUB menu             → Missing or corrupted GRUB
grub> prompt             → GRUB config missing
grub rescue> prompt      → Core GRUB files missing
Boot loops               → Wrong kernel or root device
Kernel not found         → Incorrect grub.cfg
System boots to rescue   → Wrong kernel parameters

2. Understand GRUB Differences by RHEL Version
RHEL Version    Firmware               GRUB Location
RHEL 7          BIOS / UEFI            /boot/grub2/grub.cfg
RHEL 8          UEFI default           /boot/efi/EFI/redhat/grub.cfg
RHEL 9          UEFI only (mostly)     /boot/efi/EFI/redhat/grub.cfg
RHEL 10         UEFI only (expected)   /boot/efi/EFI/redhat/grub.cfg
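The firmware type can be detected at runtime: /sys/firmware/efi exists only when the system booted via UEFI, which also determines which grub.cfg to regenerate. A sketch with the directory parameterized so the logic is testable.

```shell
#!/bin/sh
# Derive the grub.cfg path from firmware type (UEFI if the efi dir exists).
grub_cfg_path() {   # arg: firmware sysfs dir (normally /sys/firmware/efi)
    if [ -d "$1" ]; then
        echo /boot/efi/EFI/redhat/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

cfg=$(grub_cfg_path /sys/firmware/efi)
echo "regenerate with: grub2-mkconfig -o $cfg"
```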

3. Access the GRUB Menu

Reboot the system
Press Esc or Shift
If GRUB appears, select Advanced options
Try booting an older kernel
If this works, the issue is likely a broken kernel or config, not GRUB itself.

4. Fix Temporary GRUB Issues (Edit Boot Parameters)
Highlight the kernel
Press e
Find the line starting with linux
Common Debug Parameters
rd.break
systemd.unit=rescue.target
systemd.unit=emergency.target
selinux=0
nomodeset
Press Ctrl + X to boot.

5. Fix “grub>” or “grub rescue>” Prompt
Identify Boot and Root Partitions
ls
ls (hd0,gpt1)/
Set Correct Root
set root=(hd0,gpt1)
set prefix=(hd0,gpt1)/boot/grub2
insmod normal
normal
If GRUB loads, reinstall it permanently (see Section 8).

6. Boot Using RHEL Installation ISO (Rescue Mode)
This is the most reliable recovery method.
Boot from RHEL 7/8/9 ISO
Select:
Troubleshooting → Rescue a Red Hat Enterprise Linux system
Mount the system automatically
Enter shell

7. Chroot into Installed System
# chroot /mnt/sysimage
From here, all fixes apply directly to your installed OS.

8. Reinstall GRUB (Correct Method by Version)
RHEL 7 – BIOS Systems
# grub2-install /dev/sda
# grub2-mkconfig -o /boot/grub2/grub.cfg

RHEL 7 – UEFI Systems
# yum reinstall grub2-efi shim
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

RHEL 8, 9, 10 – UEFI Systems
# dnf reinstall grub2-efi shim
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Verify EFI entries:
# efibootmgr -v

9. Rebuild GRUB Configuration Only (If GRUB Exists)
Sometimes only grub.cfg is broken.
RHEL 7 (BIOS)

# grub2-mkconfig -o /boot/grub2/grub.cfg
RHEL 8/9/10 (UEFI)
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

10. Fix Wrong Root or UUID in GRUB
Check actual UUIDs:
# blkid
Update GRUB config if root device changed:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

11. Fix GRUB After Disk or Partition Changes
If disk order changed (sda → sdb):
Verify disks:

# lsblk
Reinstall GRUB to correct disk:
# grub2-install /dev/sda

12. Secure Boot Issues (RHEL 8+)
If system fails after enabling Secure Boot:
# dnf reinstall shim grub2-efi kernel
Ensure Secure Boot–signed kernels are installed.

13. SELinux and GRUB Boot Failures
Temporary Fix
Edit GRUB kernel line:

selinux=0
Permanent Fix
# touch /.autorelabel
# reboot

14. Best Practices to Avoid GRUB Issues
  • Keep multiple kernels installed
  • Avoid manual GRUB edits
  • Always regenerate grub.cfg after disk changes
  • Keep rescue ISO available
  • Use snapshots before kernel updates (VMs)
15. Quick GRUB Recovery Checklist
  • Boot older kernel
  • Rescue mode via ISO
  • Chroot into system
  • Reinstall GRUB
  • Regenerate grub.cfg
  • Verify EFI boot entries
Conclusion
GRUB issues across RHEL 7, 8, 9, and 10 follow the same recovery principles:
  • Identify firmware (BIOS vs UEFI)
  • Use rescue mode
  • Reinstall GRUB properly
  • Regenerate configuration files
Mastering these steps ensures fast recovery and minimal downtime in enterprise Linux environments.

Chef Installation

Chef is a powerful configuration management tool that helps automate infrastructure, manage configurations, and ensure consistency across environments. 

Chef Server: Central hub that stores cookbooks, policies, and node metadata
Chef Workstation: Used by admins to develop cookbooks and interact with the server using Knife
Chef Infra Client (Node): Target system managed by Chef
Chef Manage: Web-based UI for managing Chef Server

Download and Upload Chef Packages
Download the required RPM packages from Chef Downloads (https://www.chef.io/downloads) and upload them to the Chef Server using WinSCP or scp.
Packages used in this setup:
  • Chef Infra Server: `chef-server-core-14.9.23-1.el7.x86_64.rpm`
  • Chef Workstation: `chef-workstation-21.10.640-1.el7.x86_64.rpm`
  • Chef Manage: `chef-manage-2.5.4-1.el7.x86_64.rpm`
  • Chef Infra Client: `chef-17.6.18-1.el7.x86_64.rpm`
Install Chef Infra Server
Log in as root on the Chef Server and install the package:
# cd /tmp/chef
# dnf install chef-server-core-14.9.23-1.el7.x86_64.rpm -y
Configure the Chef Server:
# chef-server-ctl reconfigure
Chef License Acceptance
Before you can continue, 3 product licenses
must be accepted. View the license at
https://www.chef.io/end-user-license-agreement/
Licenses that need accepting:
  • Chef Infra Server
  • Chef Infra Client
  • Chef InSpec
Do you accept the 3 product licenses (yes/no)?
> yes
Check the status of Chef services:
# chef-server-ctl status

Create Chef Admin User
Create an administrator user:
# chef-server-ctl user-create admin System Admin sysadm@ppc.com 'Welcome@123' \
--filename /etc/opscode/admin.pem
Create the Organization:
# chef-server-ctl org-create chefmng 'chefmanager' --association_user admin --filename /etc/opscode/org-validator.pem
List existing organizations:
# chef-server-ctl org-list
Verify private keys:
# find /etc/opscode/ -name "*.pem"

Install Chef Manage on the Chef Server:
# cd /tmp/chef
# dnf install chef-manage-2.5.4-1.el7.x86_64.rpm -y
# chef-server-ctl reconfigure
# chef-manage-ctl reconfigure
Type 'yes' to accept the software license agreement, or anything else to cancel.
yes
Access the UI in your browser:
https://<chef-server-ip>

Login with user "admin" & password "Welcome@123"

Install Chef Workstation:
On the Chef Workstation machine:
# cd /tmp/chef
# dnf install chef-workstation-21.10.640-1.el7.x86_64.rpm -y
Verify installation:
# chef --version
# knife --version
Set Command Executable Path:
# vi ~/.bash_profile
export PATH=$PATH:/opt/opscode/bin

Generate a Chef repository:
# chef generate repo chef-repo
+---------------------------------------------+
            Chef License Acceptance
Before you can continue, 1 product license
must be accepted. View the license at
https://www.chef.io/end-user-license-agreement/
License that need accepting:
  * Chef Workstation
Do you accept the 1 product license (yes/no)?
> yes
Create a `.chef` directory for Knife configuration:
# mkdir ~/chef-repo/.chef
# cd ~/chef-repo

Step 7: Configure SSH Access
Generate SSH keys on the Chef Workstation:
# ssh-keygen -b 4096
Copy the public key to the Chef Server:
# ssh-copy-id root@192.168.10.108
Copy the `.pem` files from Chef Server to Workstation:
# scp root@192.168.10.108:/root/*.pem ~/chef-repo/.chef
Verify copied keys:
# ls ~/chef-repo/.chef

Configure Knife:
Create the Knife configuration file:
# vim ~/chef-repo/.chef/config.rb
Add the following content:
current_dir = File.dirname(__FILE__)
log_level                :info
log_location             STDOUT
node_name                "admin"
client_key               "#{current_dir}/admin.pem"
chef_server_url          "https://inddcpchf01.ppc.com/organizations/chefmng"
cookbook_path            ["#{current_dir}/../cookbooks"]

Fetch SSL certificates:
# knife ssl fetch
Verify connectivity:
# knife client list

Install Chef Infra Client
On the client node:
# cd /tmp/chef
# dnf install chef-17.6.18-1.el7.x86_64.rpm -y

Step 10: Bootstrap a Client Node
From the Chef Workstation:
# knife bootstrap <chef client IP Address> --ssh-user <user name> --ssh-password <password> --node-name <chef client node name>
Verify nodes:
# knife node list
# knife node show client-node
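When enrolling many nodes, the bootstrap commands can be generated from a simple host list. A sketch; the helper name and sample hosts are illustrative, and the password flag is omitted so knife can prompt (or SSH keys can be used).

```shell
#!/bin/sh
# Build a knife bootstrap command line for one node (values illustrative).
bootstrap_cmd() {   # args: ip ssh_user node_name
    echo "knife bootstrap $1 --ssh-user $2 --node-name $3"
}

# One line per node: ip ssh_user node_name
while read -r ip user node; do
    bootstrap_cmd "$ip" "$user" "$node"
done <<EOF
192.168.10.110 root web01
192.168.10.111 root web02
EOF
```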

Create Cookbook Directory
# mkdir -p ~/chef-repo/cookbooks/sample_nginx
# cd ~/chef-repo/cookbooks/sample_nginx

Generate Cookbook
# chef generate cookbook .

Edit Default Recipe
Edit `recipes/default.rb`:

package 'nginx' do
  action :install
end

service 'nginx' do
  action [:enable, :start]
end

file '/etc/nginx/sites-available/default' do
  content 'server { listen 80; server_name localhost; location / { root /var/www/html; index index.html; } }'
  notifies :restart, 'service[nginx]'
end

Upload the cookbook to Chef Server:
# knife cookbook upload sample_nginx
Bootstrap the node with the recipe:
# knife bootstrap <chef client IP Address> --ssh-user <user name> --ssh-password <password> --node-name <chef client node name>
Run Chef Client manually on the node:
# chef-client

Chef Resources:

package (Linux/Unix/Windows)
action ---> :install, :upgrade, :remove, :purge
version ---> Specify version
options ---> Extra CLI options for package manager
timeout ---> Wait time for install

Variables:
node['cookbook']['package_name'] ---> Package name (nginx, httpd, etc.)
node['cookbook']['package_version'] ---> Version to install

service (Linux/Unix/Windows)
action ---> :start, :stop, :restart, :reload, :enable, :disable
supports ---> Hash of supported actions (restart, reload, status)
subscribes ---> Trigger action on resource change
timeout ---> Wait time for service command

Variables:
node['cookbook']['service_name'] ---> Service name
node['cookbook']['service_action'] ---> Desired actions

template
source ---> Template file in cookbook (.erb)
path ---> Target path (override resource name)
owner ---> File owner
group ---> File group
mode ---> File permissions (0644)
variables ---> Hash of variables passed to template (@var)
action ---> :create, :create_if_missing, :delete
notifies ---> Trigger another resource on change
backup ---> Number of backups to keep

Variables:
node['cookbook']['doc_root'] ---> Document root (Linux)
node['cookbook']['iis_root'] ---> IIS root (Windows)
node['cookbook']['port'] ---> Port number
node['cookbook']['server_name'] ---> Server hostname

file
content ---> File content
owner ---> File owner
group ---> File group
mode ---> File permissions (0644)
backup ---> Number of backups to keep
action ---> :create, :delete, :touch

Variables:
node['cookbook']['file_path'] ---> File path
node['cookbook']['file_content'] ---> Content

user
comment ---> User description/full name
uid ---> User ID
home ---> Home directory
shell ---> Login shell
password ---> Hashed password
manage_home ---> Create home directory if true
action ---> :create, :remove, :modify, :lock, :unlock

Variables:
node['cookbook']['user_name'] ---> Username
node['cookbook']['user_home'] ---> Home directory
node['cookbook']['user_shell'] ---> Shell
node['cookbook']['user_password'] ---> Password hash

directory
owner ---> Directory owner
group ---> Directory group
mode ---> Directory permissions (0755)
recursive ---> Create parent directories if missing
action ---> :create, :delete, :nothing

Variables:
node['cookbook']['dir_path'] ---> Path
node['cookbook']['dir_owner'] ---> Owner
node['cookbook']['dir_group'] ---> Group

execute
command ---> Command to execute
cwd ---> Working directory
environment ---> Environment variables
creates ---> Skip execution if file exists
action ---> :run, :nothing

Variables:
node['cookbook']['exec_command'] ---> Command
node['cookbook']['exec_cwd'] ---> Working directory

powershell_script (Windows)
code ---> PowerShell commands to execute
cwd ---> Working directory
guard_interpreter ---> Interpreter for guards (:powershell_script)
action ---> :run, :nothing

Variables:
node['cookbook']['ps_script'] ---> Code string
node['cookbook']['ps_cwd'] ---> Working directory

cron (Linux/Unix)
minute ---> Minute field
hour ---> Hour field
day ---> Day of month
month ---> Month field
weekday ---> Day of week
command ---> Command to execute
user ---> Run as this user
action ---> :create, :delete, :run

Variables:
node['cookbook']['cron_minute'] ---> Minute
node['cookbook']['cron_hour'] ---> Hour
node['cookbook']['cron_command'] ---> Command
node['cookbook']['cron_user'] ---> User
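For reference, a cron resource with minute '0' and hour '2' renders the same schedule as this hand-written crontab entry (the script path is hypothetical):

```
# m  h  dom  mon  dow  command
0  2  *  *  *  /usr/local/bin/nightly_backup.sh
```

Letting Chef manage the entry instead of editing crontab by hand keeps the schedule idempotent and version-controlled.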

remote_file
source ---> URL or file path to copy from
path ---> Destination path
owner ---> File owner
group ---> File group
mode ---> Permissions (0644)
action ---> :create, :create_if_missing, :delete
checksum ---> SHA-256 checksum to verify file integrity

Variables:
node['cookbook']['remote_file_source'] ---> URL/path
node['cookbook']['remote_file_path'] ---> Destination
node['cookbook']['remote_file_owner'] ---> Owner
node['cookbook']['remote_file_mode'] ---> Permissions
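The checksum property expects a SHA-256 hex digest. A quick sketch of computing one locally for an artifact you intend to pin (the /tmp file here is a stand-in for the real artifact):

```shell
# Compute the SHA-256 digest to paste into remote_file's checksum property
echo 'hello' > /tmp/artifact.bin        # stand-in for the real artifact
sha256sum /tmp/artifact.bin | awk '{print $1}'
```

Pinning the checksum makes the download idempotent: Chef skips re-fetching when the local copy already matches.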

git
repository ---> Git repo URL
revision ---> Branch, tag, or commit
destination ---> Local clone path
user ---> Run as user
action ---> :checkout, :sync, :export
enable_submodules ---> true/false

Variables:
node['cookbook']['git_repo'] ---> Repo URL
node['cookbook']['git_branch'] ---> Branch or tag
node['cookbook']['git_dest'] ---> Destination path

bash (Linux/Unix)
code ---> Bash commands
cwd ---> Working directory
environment ---> Environment variables
user ---> Run as this user
group ---> Run as this group
action ---> :run, :nothing

Variables:
node['cookbook']['bash_code'] ---> Commands
node['cookbook']['bash_cwd'] ---> Working directory

windows_feature
feature_name ---> Name of Windows feature
action ---> :install, :remove, :nothing
all ---> Install dependent features (true/false)
Variables:
node['cookbook']['feature_name'] ---> Feature to install

ark (Linux/Unix)
url ---> Download URL
path ---> Installation path
owner ---> Owner
group ---> Group
action ---> :put, :install, :cherry_pick
checksum ---> Verify file integrity

Variables:
node['cookbook']['ark_url'] ---> Archive URL
node['cookbook']['ark_path'] ---> Install path

GitLab CE Installation

GitLab CE Installation on RHEL 9 / CentOS 9
GitLab Community Edition (CE) is a powerful, self-hosted DevOps platform that provides Git repository management, CI/CD pipelines, artifact storage, container registry, issue tracking, and more. This guide walks you through installing GitLab CE on RHEL 9 / CentOS 9, configuring a custom external URL, and implementing SSL/TLS using Apache (httpd) as a reverse proxy.

1. Install Required Dependencies

Before installing GitLab, ensure your system has the required packages.
# dnf install -y curl policycoreutils openssh-server openssh-clients

2. Add GitLab CE Repository
Use GitLab’s official repository installation script.
# curl -sS https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.rpm.sh | sudo bash

3. Install GitLab CE
# dnf install -y gitlab-ce
This installs all required GitLab components, including NGINX (bundled), Redis, and PostgreSQL.

4. Configure GitLab URL
Edit the primary GitLab configuration file:
# vim /etc/gitlab/gitlab.rb
Add or modify the external URL:
external_url 'http://www.gitlab.ppc.com'
Save and exit.

5. Reconfigure GitLab
Run the reconfiguration command to generate configurations and start services.
# gitlab-ctl reconfigure
GitLab will now be accessible at the configured external URL (once DNS resolves), or directly via:
http://server-hostname
http://server-IP-address

SSL/TLS Implementation Using Apache (httpd)
GitLab comes with a built-in NGINX server, but many enterprises prefer using Apache for SSL termination and reverse proxying.
Below is how to configure Apache with SSL for GitLab.

6. Install Apache HTTP Server
# dnf install -y httpd mod_ssl
# systemctl enable httpd
# systemctl start httpd

7. Generate or Install SSL Certificates
You can use:
Self-signed Certificates (testing)
Let's Encrypt (production)
CA-signed Certificates (enterprise)

To generate a self-signed certificate:
# openssl req -newkey rsa:2048 -nodes -keyout /etc/pki/tls/private/gitlab.key -x509 -days 365 -out /etc/pki/tls/certs/gitlab.crt
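The interactive prompts can be skipped with -subj. A minimal non-interactive sketch, writing to /tmp for illustration and assuming the CN should match the GitLab external URL host:

```shell
# Non-interactive self-signed certificate; CN matches the GitLab external URL
openssl req -newkey rsa:2048 -nodes \
  -keyout /tmp/gitlab.key -x509 -days 365 \
  -subj "/CN=www.gitlab.ppc.com" \
  -out /tmp/gitlab.crt

# Inspect the subject and validity window before copying into /etc/pki/tls
openssl x509 -in /tmp/gitlab.crt -noout -subject -dates
```

Verifying the subject and dates up front avoids a restart cycle with Apache pointing at a mismatched certificate.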

8. Configure Apache Reverse Proxy for GitLab
Create a new Apache configuration file:
# vim /etc/httpd/conf.d/gitlab.conf
Add the following configuration:
<VirtualHost *:443>
ServerName www.gitlab.ppc.com

SSLEngine on
SSLCertificateFile /etc/pki/tls/certs/gitlab.crt
SSLCertificateKeyFile /etc/pki/tls/private/gitlab.key

ProxyPreserveHost On

# Assumes GitLab's bundled NGINX listens on 127.0.0.1:8080
# (set nginx['listen_port'] = 8080 in /etc/gitlab/gitlab.rb and reconfigure)
<Location />
ProxyPass http://127.0.0.1:8080/
ProxyPassReverse http://127.0.0.1:8080/
</Location>
</VirtualHost>

<VirtualHost *:80>
ServerName www.gitlab.ppc.com
Redirect permanent / https://www.gitlab.ppc.com/
</VirtualHost>

Save and exit.

9. Adjust SELinux Policies (if enabled)
# setsebool -P httpd_can_network_connect 1

10. Restart Apache
# systemctl restart httpd
You can now access GitLab using HTTPS:
https://www.gitlab.ppc.com

Conclusion

You have successfully installed GitLab CE on RHEL 9 / CentOS 9, configured the external URL, and set up SSL/TLS security using Apache as a reverse proxy. With GitLab now running securely, you can begin creating repositories, configuring CI/CD pipelines, managing runners, and integrating GitLab with your DevOps ecosystem.

Jenkins Installation

Jenkins Installation on RHEL 9: A Complete Guide
This guide walks through the steps required to install and configure Jenkins on RHEL 9, including setting up Java, dependencies, Jenkins repository, system service configuration, reverse proxy with Apache, Python tooling, Terraform, and PowerShell.

1. Install Required Dependencies
Begin by installing all essential packages, including Java 17, Git, compiler tools, Node.js, Python pip, Docker-related libraries, and others.
# dnf install -y fontconfig java-17-openjdk git gcc gcc-c++ nodejs gettext device-mapper-persistent-data lvm2 bzip2 python3-pip wget libseccomp
# java --version

2. Configure Jenkins Repository
Download and add the Jenkins repository for RHEL-based systems.
# wget -O /etc/yum.repos.d/jenkins.repo https://pkg.jenkins.io/redhat-stable/jenkins.repo
# rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io.key

3. Install and Start Jenkins
Install Jenkins and configure it as a service.
# dnf install jenkins -y
# systemctl start jenkins
# systemctl enable jenkins
# systemctl status jenkins

Jenkins will now be available on port 8080, for example: http://192.168.10.106:8080
Retrieve the initial admin password, then enter it and continue:
# cat /var/lib/jenkins/secrets/initialAdminPassword
6bedde9c71eb4d999a5cfdfe43f0d052

Click "Install suggested plugins" and wait for the plugin installation to complete.
Once the plugins are installed, set the admin password and email address, then click Save and Continue.
Set the Jenkins URL (IP address or FQDN hostname with port 8080), then click Save and Finish.

Jenkins is now ready to use.


Install additional plugins as needed: Ansible, Terraform, PowerShell, GitHub, GitLab, AWS, GCP, and Azure.
4. Configure Apache as a Reverse Proxy for Jenkins
Install and enable Apache HTTP Server.
# dnf install httpd -y  
# systemctl start httpd 
# systemctl enable httpd 
# systemctl status httpd

Navigate to the Apache configuration directory:
# cd /etc/httpd/conf.d/ 
# mv welcome.conf welcome.conf.bkp 
# vi jenkins.conf
ProxyRequests Off
ProxyPreserveHost On
AllowEncodedSlashes NoDecode

<Proxy http://localhost:8080/*>
Require all granted
</Proxy>

ProxyPass / http://localhost:8080/ nocanon
ProxyPassReverse / http://localhost:8080/
ProxyPassReverse / http://www.jenkins.ppc.com/

Restart Apache: 
# systemctl restart httpd

Now Jenkins will be accessible using your domain or server IP via port 80.
http://www.jenkins.ppc.com


5. Install and Configure Python Tools
Upgrade pip and install commonly used DevOps/Cloud SDKs. 
# python3 -m pip install --upgrade pip 
# pip3 install ansible
# pip3 install gcloud 
# pip3 install awscli 
# pip3 install azure-cli 
# pip3 install --upgrade pyvmomi 
# pip3 install vmware-vcenter 
# pip3 install --upgrade git+https://github.com/vmware/vsphere-automation-sdk-python.git

Create and Activate Python Virtual Environment
# python3 -m venv venv_name 
# source venv_name/bin/activate 
# pip install --upgrade pip setuptools
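If multiple Jenkins jobs share this server, an isolated environment keeps their Python dependencies from colliding. A quick sketch (the venv name and path are arbitrary):

```shell
# Create and use a throwaway virtual environment
python3 -m venv /tmp/jenkins_venv
. /tmp/jenkins_venv/bin/activate
python -c 'import sys; print(sys.prefix)'   # prefix now points inside the venv
deactivate
```

Packages installed while the venv is active stay out of the system site-packages, so a job can pin its own ansible or awscli version.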

6. Install Terraform on RHEL 9
Add the HashiCorp repo and install Terraform.
# yum install -y yum-utils 
# yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo 
# yum -y install terraform 
# terraform version

7. Install PowerShell on RHEL 9
Install PowerShell from the official RPM package. 
# dnf install https://github.com/PowerShell/PowerShell/releases/download/v7.5.4/powershell-7.5.4-1.rh.x86_64.rpm 
# pwsh --version

Conclusion
You've successfully installed Jenkins, configured Apache reverse proxy, set up Python cloud tooling, installed Terraform, and enabled PowerShell on RHEL 9. This setup prepares your server for end-to-end DevOps automation, CI/CD pipelines, cloud provisioning, and infrastructure management.
Feel free to extend Jenkins further using plugins and pipeline automation.

Ansible AWX

Install Ansible AWX on CentOS/RHEL8/9
If you want to manage automation at scale, Ansible AWX (the open-source version of Ansible Tower) is a powerful solution. This guide walks you through installing AWX 17.1.0 on a CentOS/RHEL-based system using Docker and Docker Compose.

Prerequisites:

Before starting, ensure you have:
  • A fresh CentOS/RHEL system (8/9 preferred)
  • Root or sudo access
  • Internet connectivity
Step 1: Install Required Packages
# dnf -y install git gcc gcc-c++ nodejs gettext device-mapper-persistent-data lvm2 bzip2 python3-pip wget libseccomp

Step 2: Remove Old Docker Installation
# dnf remove docker* -y

Step 3: Configure Docker Repository
# dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo

Step 4: Install Docker CE
# dnf -y install docker-ce
# systemctl enable docker
# systemctl start docker 
# systemctl status docker

Step 5: Install Python Build Dependencies
# python3 -m pip install --upgrade pip
# pip3 install setuptools_rust 
# pip3 install wheel 
# pip3 install ansible
# pip3 install docker-compose

Step 6: Download AWX Installer
# git clone -b 17.1.0 https://github.com/ansible/awx.git
# cd awx/installer

Step 7: Create Required Directories
# mkdir -p /opt/awx/pgdocker /opt/awx/awxcompose /opt/awx/projects

Step 8: Generate a Secret Key
# openssl rand -base64 30
Copy this key for later use.
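The secret key is just random base64; capturing it into a shell variable avoids a clipboard round-trip when editing the inventory:

```shell
# 30 random bytes base64-encode to a 40-character string
SECRET_KEY=$(openssl rand -base64 30)
echo "secret_key=$SECRET_KEY"
```

Paste the printed line directly into the inventory file in the next step.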

Step 9: Edit the AWX Inventory File
Open the inventory file:
# vi inventory
Update the following parameters as needed:
admin_password=Welcome@123
awx_official=true
pg_database=awx
pg_password=Welcome@123
awx_alternate_dns_servers="192.168.10.100,192.168.20.100"
postgres_data_dir="/opt/awx/pgdocker"
docker_compose_dir="/opt/awx/awxcompose"
project_data_dir="/opt/awx/projects"
secret_key=XXXXXXXXXX   # openssl rand -base64 30 command output
Save and exit the file.

Step 10: Run the AWX Installer

Once the inventory file is updated, run:
# ansible-playbook -i inventory install.yml
This may take several minutes.

Step 11: Access AWX Web Interface

After the installation completes, open a browser and go to:
http://<server-ip>
http://<server_hostname or FQDN>

Log in using:
Username: admin
Password: Welcome@123 (or the password you set)

Conclusion
Installing Ansible AWX on CentOS/RHEL 8/9 provides a powerful and centralized way to manage automation across your infrastructure. By following this guide, you’ve prepared your system with all required dependencies, deployed Docker and Docker Compose, configured AWX using the official installer, and successfully launched the AWX web interface.
With AWX now running, you can:
  • Create and manage projects, inventories, and credentials
  • Build and schedule playbook automation workflows
  • Monitor job executions in real time
  • Integrate AWX with Git, cloud providers, and external systems
  • Scale automation across teams and environments
This setup forms the foundation for enterprise-grade automation and can be expanded further with clustering, HTTPS/SSL configuration, LDAP/AD integration, and backup strategies.

Renew GPFS (IBM Spectrum Scale) Certificates

IBM Spectrum Scale (GPFS) uses internal SSL certificates to secure communication among cluster nodes. When these certificates are close to expiration—or have already expired—you must renew them to restore healthy cluster communication.

This article provides step-by-step instructions for renewing GPFS certificates using both the online (normal) and offline (expired certificate) methods.

Renewing GPFS Certificate – Online Method (Recommended)
Use this method when the certificates have NOT yet expired.
This method does not require shutting down the cluster.

1. Check the current certificate expiry date
Run on any cluster node:
# mmcommon run mmgskkm print --cert /var/mmfs/ssl/id_rsa_committed.cert | grep Valid
Or:
# /usr/lpp/mmfs/bin/mmcommon run mmgskkm print --cert /var/mmfs/ssl/id_rsa_committed.cert | grep Valid
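Beyond grepping for "Valid", you can compute how many days remain. A sketch using openssl and GNU date, shown here against a throwaway demo certificate in /tmp; on a real cluster node, point CERT at /var/mmfs/ssl/id_rsa_committed.cert instead:

```shell
# Generate a throwaway 90-day demo certificate (stand-in for the GPFS cert)
CERT=/tmp/demo.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -subj "/CN=gpfs-demo" -days 90 -out "$CERT" 2>/dev/null

# Extract the notAfter date and convert it to days remaining
END=$(openssl x509 -in "$CERT" -noout -enddate | cut -d= -f2)
DAYS=$(( ( $(date -d "$END" +%s) - $(date +%s) ) / 86400 ))
echo "Days until expiry: $DAYS"
```

Anything under 30 days is a reasonable trigger to schedule the online renewal while it is still possible.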

2. Generate new authentication keys
# mmauth genkey new

3. Commit the new keys
# mmauth genkey commit

4. Validate the updated certificate on all nodes
# mmcommon run mmgskkm print --cert /var/mmfs/ssl/id_rsa_committed.cert | grep Valid
Or:
# /usr/lpp/mmfs/bin/mmcommon run mmgskkm print --cert /var/mmfs/ssl/id_rsa_committed.cert | grep Valid

Renewing GPFS Certificate – Offline Method (Certificates Already Expired)
If the cluster fails to start or nodes cannot communicate due to an expired certificate, use this offline method.
This requires a temporary cluster shutdown and manual time adjustment.

1. Verify certificate expiration
# mmdsh -N all 'openssl x509 -in /var/mmfs/ssl/id_rsa_committed.pub -dates -noout'

2. Stop NTP service (important for manual time rollback)
# lssrc -s xntpd
# stopsrc -s xntpd

3. Shut down GPFS on all nodes
# mmshutdown -a

4. Stop CCR monitoring on quorum nodes
# mmdsh -N quorumNodes "/usr/lpp/mmfs/bin/mmcommon killCcrMonitor"

5. Roll back the system time on ALL nodes
Set the clock just before the certificate expiry time.
Example:
# date 072019542025
Explanation:
07 = Month (July)
20 = Day
19:54 = Time
2025 = Year
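On a Linux jump host with GNU date you can sanity-check the MMDDhhmmYYYY string before typing it on the nodes (the AIX date command itself does not accept -d):

```shell
# Build/verify the AIX-style MMDDhhmmYYYY timestamp with GNU date
# Target: one minute before a certificate expiring 2025-07-20 19:55
date -d "2025-07-20 19:54" +%m%d%H%M%Y
# → 072019542025
```

Getting this string wrong on even one node leaves that node's clock (and therefore its certificate validation) out of step with the rest of the cluster.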

6. Restart CCR monitor
# mmdsh -N quorumNodes "/usr/lpp/mmfs/bin/mmcommon startCcrMonitor"

7. Generate & commit new keys
# mmauth genkey new
# mmauth genkey commit

8. Restore correct date and restart NTP
# date <current_correct_time>
# startsrc -s xntpd

9. Verify the new certificate
# mmdsh -N all 'openssl x509 -in /var/mmfs/ssl/id_rsa_committed.pub -dates -noout'

10. Restart GPFS on all nodes
# mmstartup -a

Extracting Disk Details (Size, LUN ID, and WWPN) on IBM AIX

Managing storage on IBM AIX systems often requires gathering detailed information about disks — including their size, LUN ID, and WWPN (World Wide Port Name) of the Fibre Channel adapters they connect through.

This information is especially useful for SAN teams and system administrators when verifying storage mappings, troubleshooting, or documenting configurations.

In this post, we’ll look at a simple shell script that automates this task.

The script:
  • Loops through all disks known to AIX (lspv output).
  • Extracts each disk’s LUN ID from lscfg.
  • Gets its size in GB using bootinfo.
  • Finds all FC adapters (fcsX) and displays their WWPNs.
  • Prints a consolidated, easy-to-read summary.
The Script

#!/bin/ksh
for i in $(lspv | awk '{print $1}')
do
    # Get LUN ID
    LUNID=$(lscfg -vpl "$i" | grep -i "LIC" | awk -F. '{print $NF}')

    # Get size in GB (bootinfo reports MB)
    DiskSizeMB=$(bootinfo -s "$i")
    DiskSizeGB=$(echo "scale=2; $DiskSizeMB/1024" | bc)

    # Loop over all FC adapters and print one line per adapter
    for j in $(lsdev -Cc adapter | grep fcs | awk '{print $1}')
    do
        WWPN=$(lscfg -vpl "$j" | grep -i "Network Address" | sed 's/.*Address[ .]*//')
        echo "Disk: $i Size: ${DiskSizeGB}GB LUN ID: $LUNID WWPN: $WWPN"
    done
done


How It Works:
  • lspv lists all disks managed by AIX (e.g., hdisk0, hdisk1).
  • lscfg -vpl hdiskX displays detailed configuration information for each disk, including the LUN ID.
  • bootinfo -s hdiskX returns the disk size in megabytes.
  • lsdev -Cc adapter | grep fcs lists all Fibre Channel adapters (fcs0, fcs1, etc.).
  • lscfg -vpl fcsX | grep "Network Address" shows the adapter’s WWPN.
  • sed 's/.*Address[ .]*//' cleans the output, leaving only the WWPN value.
Example Output:
Disk: hdisk0 Size: 100.00GB LUN ID: 500507680240C567 WWPN: C0507601D8123456
Disk: hdisk0 Size: 100.00GB LUN ID: 500507680240C567 WWPN: C0507601D8123457
Disk: hdisk1 Size: 200.00GB LUN ID: 500507680240C568 WWPN: C0507601D8123456
Disk: hdisk1 Size: 200.00GB LUN ID: 500507680240C568 WWPN: C0507601D8123457


This shows each disk (hdiskX) with its size, LUN ID, and all connected FC adapter WWPNs.
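The sed clean-up step can be checked in isolation against a canned lscfg line:

```shell
# Strip everything up to and including the dotted "Network Address" label
line='        Network Address.............C0507601D8123456'
echo "$line" | sed 's/.*Address[ .]*//'
# → C0507601D8123456
```

The greedy .*Address match consumes the label, and [ .]* eats the dot padding, leaving only the WWPN value.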

Presenting Fibre-Channel Storage to AIX LPARs with Dual VIOS (NPIV / vFC)

Present SAN LUNs to AIX LPARs using NPIV / virtual Fibre Channel (vFC) so each LPAR has redundant SAN paths through two VIOS servers (VIOS1 = primary, VIOS2 = backup) and can use multipathing (native MPIO or PowerPath).

NPIV (N_Port ID Virtualization) lets an LPAR present its own virtual WWPNs to the SAN while physical Fibre Channel hardware is on the VIOS. With two VIOS nodes and dual SAN fabrics, you get end-to-end redundancy:
  • VIOS1 and VIOS2 each present vFC adapters to the LPAR via the HMC.
  • Each VIOS has physical FC ports connected to redundant SAN switches/fabrics.
  • LUNs are zoned and masked to VIOS WWPNs. AIX LPARs discover LUNs, use multipathing, and survive single-path failures.
Prerequisites & Assumptions:
  • HMC admin, VIOS (padmin/root), and AIX root access available.
  • VIOS1 & VIOS2 installed, registered with HMC and reachable.
  • Each VIOS has at least one physical FC port (e.g., fcs0, fcs1).
  • SAN team will perform zoning & LUN masking.
  • Backups of VIOS and HMC configs completed.
  • You know which LPARs should receive which LUNs.
High-Level Flow:
  • Collect physical FC adapter names & WWPNs from VIOS1 and VIOS2.
  • Provide WWPNs to SAN admin for zoning & LUN masking.
  • Create vFC adapters for each AIX LPAR on the HMC and map them across VIOS1/VIOS2.
  • Verify mappings on HMC and VIOS (lsmap).
  • Ensure VIOS physical FC ports are logged into fabric.
  • On AIX LPARs: run cfgmgr, enable multipathing, create PVs/VGs/LVs as required.
  • Test failover by disabling a path and verifying I/O continues.
  • Document and monitor.
Step-by-Step Configuration

Step 1 — Verify VIOS Physical Fibre Channel Adapters
On VIOS1 and VIOS2, log in as padmin and identify FC adapters:
$ lsdev -type adapter
Expected output snippet:
VIOS1:
fcs0 Available 00-00 Fibre Channel Adapter
fcs1 Available 00-01 Fibre Channel Adapter
VIOS2:
fcs0 Available 00-00 Fibre Channel Adapter
fcs1 Available 00-01 Fibre Channel Adapter
Retrieve WWPNs for each adapter:
$ lsattr -El fcs0 | grep -i wwpn
Record results:
VIOS     Adapter   WWPN
VIOS1    fcs0      20:00:00:AA:AA:AA
VIOS2    fcs0      20:00:00:CC:CC:CC

Step 2 — SAN Zoning & LUN Presentation
Provide the recorded VIOS WWPNs to the SAN Administrator.
Request:
  • Zoning between each VIOS WWPN and Storage Controller ports.
  • LUN masking to present LUN-100 to both VIOS WWPNs.
  • Confirmation that both VIOS ports see the LUNs across both fabrics.
Tip: Ensure both fabrics (A & B) are zoned independently for redundancy.

Step 3 — Create Virtual Fibre Channel (vFC) Adapters via HMC

On the HMC:
  • Select AIX-LPAR1 → Configuration → Virtual Adapters.
  • Click Add → Virtual Fibre Channel Adapter.
  • Create two vFC adapters: vfc0 mapped to VIOS1, vfc1 mapped to VIOS2.
  • Save the configuration and activate it (via Dynamic LPAR operation if supported).
Expected vFC mapping:
Adapter     Client LPAR    Server VIOS       Mapping Status
vfc0           AIX-LPAR1     VIOS1                Mapped OK
vfc1           AIX-LPAR1     VIOS2                Mapped OK

Step 4 — Verify vFC Mapping on VIOS
Log in to each VIOS (padmin):
$ lsmap -all -type fcs

Example output:
On VIOS1:
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ ----------- --------
vfchost0 U9105.22A.XXXXXX-V1-C5 5 AIX-LPAR1 AIX
Status:LOGGED_IN
FC name:fcs0
Ports logged in: 2
VFC client name: fcs0
VFC client WWPN: 10:00:00:11:22:33:44:55

On VIOS2:
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ ----------- --------
vfchost0 U9105.22A.XXXXXX-V2-C6 5 AIX-LPAR1 AIX
Status:LOGGED_IN
FC name:fcs0
Ports logged in: 2
VFC client name: fcs1
VFC client WWPN: 10:00:00:55:66:77:88:99

Confirm each VIOS vFC host maps to the correct AIX vFC client.

Step 5 — Verify VIOS FC Port Fabric Login
On each VIOS:
$ fcstat fcs0
Verify that:
  • The port is online
  • The port is logged into the fabric
  • There are no link errors

Step 6 — Discover Devices on AIX LPAR

If the LPAR is not yet running, activate it from the HMC (use SMS mode only when you need to select a boot device):
  • Open the HMC and launch a vterm/console for AIX-LPAR1.
  • HMC GUI: Tasks → Operations → Activate → Advanced → Boot Mode = SMS → Activate.
  • In the SMS console: 5 (Select Boot Options) → Select Install/Boot Device → List All Devices → pick the device → Normal Boot Mode → Yes to exit and boot from that device.
Once AIX is up, scan for new devices and verify the Fibre Channel adapters:
# cfgmgr
# lsdev -Cc adapter | grep fcs
fcs0 Available Fibre Channel Adapter
fcs1 Available Fibre Channel Adapter
List discovered disks:
# lsdev -Cc disk
# lspv
Expected:
hdisk12 Available 00-08-00-4,0 16 Bit LUNZ Disk Drive

Step 7 — Configure Multipathing
If using native AIX MPIO, verify:
# lspath
Enabled hdisk12 fscsi0
Enabled hdisk12 fscsi1
If using EMC PowerPath:
# powermt display dev=all
Confirm both paths active.

Step 8 — Test Redundancy / Failover
To validate multipathing:
On VIOS1, disable the FC port temporarily:
$ rmdev -l fcs0 -R
On AIX LPAR, verify disk is still accessible:
# lspath -l hdisk12
Expected:
Enabled hdisk12 fscsi1
Failed hdisk12 fscsi0
Re-enable path:
$ cfgdev
Confirm path restoration:
Enabled hdisk12 fscsi0
Enabled hdisk12 fscsi1

Step 9— Post-Deployment Checks
Verify all paths:
# lspath
Check VIOS logs for FC errors:
$ errlog -ls
Save configuration backups:
$ backupios -file /home/padmin/vios1_bkup
$ backupios -file /home/padmin/vios2_bkup