
Booting an LPAR into Maintenance Mode (Fix the Issue)

Maintenance Boot:
A Maintenance Boot (also called Maintenance Mode or Service Mode Boot) is a special boot method on an AIX Logical Partition (LPAR) that allows system administrators to perform system recovery, troubleshooting, patching, or disk maintenance operations without booting the normal rootvg.

Instead of loading the regular AIX system image from disk, the LPAR boots into a minimal AIX environment (typically from a NIM SPOT, installation media, or alternate disk) to perform offline administrative tasks.

When is a Maintenance Boot Used?
You perform a maintenance boot in scenarios such as:
System recovery – when rootvg or boot logical volumes are corrupted or unbootable.
Migration / Upgrade – to patch, update, or migrate the OS from an alternate environment (using alt_disk_copy or nimadm).
Multipathing conversion – e.g., when converting from EMC PowerPath to native AIX MPIO, as changes must be made to rootvg while it’s inactive.
Filesystem repairs – to run fsck or modify /etc/filesystems on a rootvg that cannot mount normally.
ODM or device configuration repair – to fix corrupted Object Data Manager entries or remove faulty drivers.
Password or security recovery – to reset root password or access a locked system.

During a Maintenance Boot:
The system boots into a temporary minimal AIX kernel loaded from a NIM SPOT, installation DVD, or alternate disk image.
The rootvg of the system (your main AIX OS) remains unmounted initially.
Administrators can mount rootvg manually or use tools like alt_rootvg_op, chroot, or smitty maintenance to work inside it.
Once maintenance is complete (e.g., package removal, MPIO setup, filesystem fix), the LPAR can reboot normally back into its regular rootvg.

Maintenance Boot Sources:
NIM SPOT: Most common in enterprise environments. Boots over the network from a NIM master using a defined SPOT resource (e.g. nim -o maint_boot -a spot=spot_7300-03-01 lpar1).
AIX Installation DVD/ISO/VIOS Repository: Bootable media containing the AIX base OS. Used when no network boot is available.
Alternate Disk (altinst_rootvg): Boots from a cloned rootvg copy created by alt_disk_copy. Common for patching or PowerPath→MPIO conversions.

NIM Reset and Maintenance Boot Preparation:

1. Reset the NIM client definition <nimclient>
When a NIM client has a stale or failed state (e.g., after an interrupted operation), you must reset it.
# nim -F -o reset <nimclient>
Explanation:
-F: Forces the reset operation.
-o reset: Clears all current operations/states for the NIM client.
<nimclient>: The name of the NIM client object on the NIM master.
If it fails or reports that allocations still exist, use the force attribute:
# nim -o reset -a force=yes <nimclient>
This completely clears any current NIM state associated with that client.

2. Deallocate All Resources for the Client
If resources (SPOT, mksysb, lpp_source, etc.) are still allocated to the client, remove them:
# nim -Fo deallocate -a subclass=all <nimclient>
Explanation:
-F: Forces deallocation even if the NIM database thinks resources are still in use.
-o deallocate: Removes all allocated resources.
-a subclass=all: Targets all resource types.
This ensures the client can be cleanly redefined or reused for another operation (like an alternate disk maintenance boot).
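
The reset and deallocate steps above can be sketched as a small dry-run helper. This is a minimal sketch: the function name nim_cleanup_plan is hypothetical and not part of NIM. It only prints the two commands shown in steps 1 and 2 so they can be reviewed before anything is run on the NIM master.

```shell
# Hypothetical helper: print (do not run) the NIM cleanup commands
# from steps 1 and 2 for a single client.
nim_cleanup_plan() {
    client=$1
    if [ -z "$client" ]; then
        echo "usage: nim_cleanup_plan <nimclient>" >&2
        return 1
    fi
    # Step 1: force-reset the client state
    echo "nim -F -o reset $client"
    # Step 2: force-deallocate every resource still attached to it
    echo "nim -Fo deallocate -a subclass=all $client"
}
```

Running nim_cleanup_plan mcsm prints the two commands. On the NIM master, the reviewed output could then be piped to sh -x to execute it.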

3. Remove an Old or Corrupted NIM Object Definition
If a previous client definition <nimclient> or a test system needs to be deleted:
# nim -o remove <nimclient>
Explanation:
This removes the NIM database entry for that client completely — use it only when you are sure the client definition is no longer needed.

4. Recreate or Define a NIM Client or Machine Object
You can recreate a machine definition using the SMIT interface:
# smit nim_mkmac
                                            [Entry Fields]
* NIM Machine Name                          [<nimclient>]
* Machine Type                              [standalone] +
* Hardware Platform Type                    [chrp] +
  Kernel to use for Network Boot            [64] +
  Communication Protocol used by client     [nimsh] +
  Primary Network Install Interface
*   Cable Type                              tp +
    Network Speed Setting                   [] +
    Network Duplex Setting                  [] +
*   NIM Network                             ent-NetworkX
*   Host Name                               <nimclient>
    Network Adapter Hardware Address        [0]
    Network Adapter Logical Device Name     []
  IPL ROM Emulation Device                  [] +/
  CPU Id                                    []
  VLAN Tag Priority (0 to 7)                [] #
  VLAN Tag Identifier (0 to 4094)           [] #
  Machine Group                             [] +
  Managing System Information
  WPAR Options
    Managing System                         []
  -OR-
  LPAR Options
    Identity                                []
    Management Source                       [] +
  Comments                                  []

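Alternatively, you can define the client manually via CLI. The if1 attribute takes the NIM network name, the client host name, and the adapter hardware address (0 if unknown), in that order:
# nim -o define -t standalone -a platform=chrp -a netboot_kernel=mp -a if1="<network_name> <hostname> 0" -a cable_type1=tp <nimclient>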

5. Verify the NIM Client Definition
After creation, verify that the client exists and is configured correctly:
# lsnim | grep <nimclient>
Example:
# lsnim | grep mcsm
Expected output should show your NIM client with type standalone.
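
A plain grep also matches names that merely contain the client name (for example mcsm2). As a minimal sketch, the hypothetical helper below matches the name exactly and prints the object type. It assumes the usual three-column lsnim listing (name, class, type) arrives on standard input.

```shell
# Hypothetical helper: read "name class type" lines (lsnim-style) on stdin
# and print the type of the object whose name matches $1 exactly.
# Returns non-zero if the object is not listed.
nim_object_type() {
    awk -v name="$1" '
        $1 == name { print $3; found = 1 }
        END        { exit !found }
    '
}
```

On the NIM master this would be used as lsnim | nim_object_type mcsm, which should print standalone for a correctly defined client.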

6. Initiate Maintenance Boot from a SPOT
Once the NIM client is properly defined and reset, you can boot it into maintenance mode using a defined SPOT (Shared Product Object Tree).
# nim -o maint_boot -a spot=spot_7300-03-01 <nimclient>
Explanation:
-o maint_boot: Tells NIM to initiate a maintenance (service) boot.
-a spot=spot_7300-03-01: Specifies which SPOT to use (adjust for your AIX TL/SP version).
<nimclient>: Replace with the actual NIM client name (e.g., mcsm).
This allocates the SPOT and prepares the NIM master to serve a maintenance network boot. The client itself still has to be network-booted from SMS, as described in the steps that follow.

7. Check NIM resources:
# lsnim -l
Check specific SPOT details:
# lsnim -l spot_7300-03-01
Check the status of the nimsh subsystem (the NIM service handler used for client communication):
# lssrc -s nimsh
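
Before booting the client it can help to confirm that the SPOT is usable. The sketch below is an illustration, not a NIM tool: the helper names spot_rstate and spot_is_ready are hypothetical, and it assumes the lsnim -l output contains an "Rstate = ..." line whose value is "ready for use" when the resource is available.

```shell
# Hypothetical helpers: parse `lsnim -l <spot>` style output from stdin.

# Print the value of the Rstate attribute, trimmed of surrounding blanks.
spot_rstate() {
    awk -F= '
        /^[[:space:]]*Rstate[[:space:]]*=/ {
            v = $2
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            print v
        }
    '
}

# Succeed only if the resource reports "ready for use".
spot_is_ready() {
    [ "$(spot_rstate)" = "ready for use" ]
}
```

A possible use on the NIM master: lsnim -l spot_7300-03-01 | spot_is_ready || echo "SPOT not ready".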

8. Log in to the HMC and boot the LPAR into SMS mode
HMC: https://hmchostname.ppc.com --> log in as user "hscroot" with its password
System resources --> Partitions --> select the LPAR "aixtest01"
Select the partition profile --> Operation --> Activate partition --> with SMS mode
Select the partition profile --> Operation --> Open Console

9. To set up the ping test, select 2, "Setup Remote IPL (Initial Program Load)"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. I/O Device Information
4. Select Console
5. Select Boot Options
-------------------------------------------------------------------------------
Navigation Keys:
X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:2

10. Select the network adapter (make sure it is the adapter connected to the back/management network)
NIC Adapters

Device Slot Hardware Address
1. Port 1 - 2 PORT Gigabit Et Un-P1-T9                 00096bff616b
2. Port 2 - 2 PORT Gigabit Et Un-P1-T10           00096bff616a

--------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: 1


11. Select Internet Protocol Version.
1. IPv4 - Address Format 123.231.111.222
2. IPv6 - Address Format 1234:5678:90ab:cdef:1234:5678:90ab:cdef

--------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key: 1

12. Network Parameters
Port 1 - 2 PORT Gigabit Et Un-P1-T9 00096bff616b
1. IP Parameters
2. Adapter Parameters
3. Ping Test
--------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: 1

13. IP Parameters
Port 1 - 2 PORT Gigabit Et Un-P1-T9 00096bff616b
1. Client IP Address [192.168.10.101] --> NIM client IP address
2. Server IP Address [192.168.22.100] --> NIM server IP address
3. Gateway IP Address [192.168.10.1] --> NIM client gateway
4. Subnet Mask [255.255.255.0] --> NIM client subnet mask

--------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: 1, 2, 3 and 4 (select each item in turn and enter its value)
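
In the sample values above, the client (192.168.10.101) and the NIM server (192.168.22.100) are on different subnets, so the gateway entry matters for the ping test. The following sketch shows the underlying check. The helper names ip_to_int and same_subnet are hypothetical, and the code handles plain dotted-quad IPv4 addresses only.

```shell
# Hypothetical helpers for checking whether a gateway is needed.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet <ip_a> <ip_b> <netmask>
# Succeeds when both addresses share the same network part.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, same_subnet 192.168.10.101 192.168.22.100 255.255.255.0 fails, which indicates that traffic to the NIM server has to go through the gateway configured in item 3.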

14. Network Parameters
Port 1 - 2 PORT Gigabit Et Un-P1-T9 00096bff616b
1. IP Parameters
2. Adapter Parameters
3. Ping Test
--------------------------------------------------------------------------------
Navigation keys:
M = return to main menu
ESC key = return to previous screen X = eXit System Management Services
--------------------------------------------------------------------------------
Type the number of the menu item and press Enter or Select a Navigation key: 3

Note: Run the ping test to validate connectivity to the NIM server. It must succeed before you continue.

15. Once the ping test succeeds, press M to return to the main menu, then select 5, "Select Boot Options"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Main Menu
1. Select Language
2. Setup Remote IPL (Initial Program Load)
3. I/O Device Information
4. Select Console
5. Select Boot Options
-------------------------------------------------------------------------------
Navigation Keys:
X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:5

16. Press 1 to select "Select Install/Boot Device"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Multiboot
1. Select Install/Boot Device
2. Configure Boot Device Order
3. Multiboot Startup <OFF>
4. SAN Zoning Support
-------------------------------------------------------------------------------
Navigation keys:
M = return to Main Menu
ESC key = return to previous screen X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:1

17. Press 4 to select "Network" (or 5, "List all Devices", to see every bootable device)
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Select Device Type
1. Tape
2. CD/DVD
3. Hard Drive
4. Network
5. List all Devices
------------------------------------------------------------------------------
Navigation keys:
M = return to Main Menu
ESC key = return to previous screen X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:4

18. Review the device list. In this case, press 2 to select "Logical LAN"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Select Device
Device Current Device
Number Position Name
1. - Interpartition Logical LAN (loc=U8247.22L.211E16A-V9-C2-T1)
2. - Interpartition Logical LAN (loc=U8287.23L.211E16A-V10-C2-T2)
-------------------------------------------------------------------------------
Navigation keys:
M = return to Main Menu
ESC key = return to previous screen X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:2

19. Press 3 to select "Service Mode Boot"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Select Task
Logical LAN (loc=U8287.23L.211E16A-V10-C2-T2)
1. Information
2. Normal Mode Boot
3. Service Mode Boot
-------------------------------------------------------------------------------
Navigation keys:
M = return to Main Menu
ESC key = return to previous screen X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:3

20. Press 1 to select "Yes"
PowerPC Firmware
Version FW860.20 (SV860_064)
SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
Are you sure you want to exit System Management Services?
1. Yes
2. No
-------------------------------------------------------------------------------
Navigation Keys:

X = eXit System Management Services
-------------------------------------------------------------------------------
Type menu item number and press Enter or select Navigation key:1

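21. After the system has restarted from the boot device (the sample screen below was captured from a USB boot; with NIM the boot device is the network adapter), type 1 and press Enter to define the system console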
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM

-------------------------------------------------------------------------------
                                Welcome to AIX.
                   boot image timestamp: 13:55:11 10/16/2025
                 The current time and date: 13:55:11 10/16/2025
        processor count: 1; memory size: 8192MB; kernel size: 57983554
boot device: /pci@80000002000001b/usb@0/hub@1/usb-scsi@4/disk@0,0: \ppc\chrp\bootfile.exe
-------------------------------------------------------------------------------

******* Please define the System Console. *******
Type a 1 and press Enter to use this terminal as the system console.

22. You are now at the Maintenance menu. Type 1 to select "Access a Root Volume Group", then type 0 to continue past the warning.
Maintenance
Type the number of your choice and press Enter.
1 Access a Root Volume Group
2 Copy a System Dump to Removable Media
3 Access Advanced Maintenance Functions
4 Erase Disks
5 Configure Network Disks (iSCSI)
6 Select Storage Adapters
Warning:
If you choose to access a root volume group, you will NOT be able to return to the Base Operating System Installation menus without rebooting.
0 Continue

88 Help ?
>>> 99 Previous Menu
>>> Choice [99]:
Type the number of your choice and press Enter. 0


23. Access a Root Volume Group
Type the number for a volume group to display the logical volume information
and press Enter.
1) Volume Group 00050a850000d6000000013c4fcbb3ba contains these disks:
hdisk0 10240 vscsi

2) Volume Group 00cda8df00004c000000010a622ba81c contains these disks:
hdisk3 10240 vscsi hdisk1 10240 vscsi
3) Volume Group 00050a850000d60000000116c60aefc3 contains these disks:
hdisk5 10240 vscsi hdisk2 10240 vscsi
4) Volume Group 00050a850000d6000000013c203de90d contains these disks:
hdisk4 10240 vscsi

88 Help ?
>>> 99 Previous Menu
>>> Choice [99]:
Type the number of your choice and press Enter.: 1

24. Volume Group Information
---------------------------------------------------------------------------
Volume Group ID 00050a850000d6000000013c4fcbb3ba includes the following
logical volumes:
loglv01 loglv00 fslv02 fslv03
---------------------------------------------------------------------------
Type the number of your choice and press Enter.
1) Access this Volume Group and start a shell
2) Access this Volume Group and start a shell before mounting filesystems
88 Help ?
>>> 99 Previous Menu
>>> Choice [99]:
Type the number of your choice and press Enter.: 1

25. The volume group is imported and a maintenance shell starts
Importing Volume Group....................................................................
rootvg
Checking the / filesystem.
The current volume is: /dev/hd4
Primary superblock is valid.
J2_LOGREDO:log redo processing for /dev/hd4
Primary superblock is valid.
Checking the /usr filesystem.
The current volume is: /dev/hd2
Primary superblock is valid.
Exit from this shell to continue the process of accessing the root
volume group.
#
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fix for BLV Corruption or Boot Disk Misconfiguration:

1: Boot the LPAR into maintenance mode using the steps above.

2: Verify Rootvg and Disks
List the disks and confirm which disk belongs to rootvg:
# lspv | grep rootvg
Example output:
hdisk1 00f6429b8f3b2a5e rootvg active
Ensure this disk is your intended boot disk (and not a PowerPath pseudo-device like hdiskpowerX).

3: Remove Incorrect Boot Logical Volume and Recreate It
If hd5 already exists, remove it:
# rmlv -f hd5
Recreate hd5 as a boot logical volume in rootvg on the correct physical disk (not a hdiskpower device — use a real MPIO disk):
# mklv -y hd5 -t boot -a e rootvg 1 hdisk1

4: Rebuild the BLV Device Links
Clean and recreate symbolic links in /dev for the boot process:
# cd /dev
# rm -f ipldevice ipl_blv
# ln -sf /dev/rhdisk1 /dev/ipldevice
# ln -sf /dev/rhd5 /dev/ipl_blv
Verify:
# ls -l /dev/ipldevice /dev/ipl_blv
They should both exist and point to the correct raw devices.

5: Rebuild the Boot Image
Now rebuild the boot logical volume:
# bosboot -ad /dev/ipldevice
Expected output (exact wording and message numbers vary by AIX level):
bosboot: Boot image is 262144 bytes.
bosboot: 0518-506 bosboot completed successfully.
If you still see:
0301-154 bosboot: missing boot logical volume hd5
or
0503-185 bosboot: I/O error
then double-check that:
/dev/ipldevice points to a valid raw disk (rhdiskX)
hd5 exists and is type boot
Disk is part of rootvg and active
You can confirm:
# lslv hd5

6: Set Bootlist Correctly
Once bosboot succeeds, set the bootlist to the correct disk:
# bootlist -m normal -o # shows current bootlist
# bootlist -m normal hdisk1 # sets boot device to hdisk1
# bootlist -m normal -o # verify
If you are using MPIO (not PowerPath), do not use hdiskpower devices in your bootlist — use the underlying MPIO hdiskX.
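
Steps 3 to 6 can be collected into one reviewable sequence. The sketch below is a dry run only: the function name blv_rebuild_plan is hypothetical, it prints the commands from the steps above for a given disk and executes nothing. It also rejects hdiskpower names, following the advice not to use PowerPath pseudo-devices.

```shell
# Hypothetical helper: print (do not run) the BLV rebuild commands
# from steps 3 to 6 for the given rootvg disk.
blv_rebuild_plan() {
    disk=$1
    case $disk in
        hdiskpower*)
            echo "refusing PowerPath pseudo-device: $disk" >&2
            return 1 ;;
        hdisk[0-9]*)
            ;;  # looks like a regular hdiskN name
        *)
            echo "usage: blv_rebuild_plan hdiskN" >&2
            return 1 ;;
    esac
    cat <<EOF
rmlv -f hd5
mklv -y hd5 -t boot -a e rootvg 1 $disk
rm -f /dev/ipldevice /dev/ipl_blv
ln -sf /dev/r$disk /dev/ipldevice
ln -sf /dev/rhd5 /dev/ipl_blv
bosboot -ad /dev/ipldevice
bootlist -m normal $disk
bootlist -m normal -o
EOF
}
```

Calling blv_rebuild_plan hdisk1 prints the eight commands in order, so each one can be checked against lspv output before it is typed into the maintenance shell.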

7: Verify the Boot Disk
Confirm that the disk is bootable:
# bootinfo -B hdisk1
bootinfo -B returns 1 if the disk is bootable. If it returns 0, re-check the disk selection and re-run bosboot -ad /dev/ipldevice.

8: Reboot and Validate
Reboot the LPAR from HMC or console:
# shutdown -Fr
or
# reboot
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Recover Root Password in Maintenance Mode (AIX):
1: Boot the LPAR into maintenance mode using the steps above.
2: Change the password using the passwd command:

# passwd root
3: Reboot the LPAR from HMC or console:
# shutdown -Fr
or
# reboot
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AIX Root Filesystem Recovery in Maintenance Mode:

1: Boot the LPAR into maintenance mode using the steps above.
2. Run Filesystem Consistency Check (fsck)
Corruption in /, /usr, or /var is the most common reason for boot failure.
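Run fsck on the unmounted JFS2 filesystems. To get a shell with the filesystems still unmounted, choose option 2 ("Access this Volume Group and start a shell before mounting filesystems") in the Volume Group Information menu: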
# fsck -y /dev/hd4 # root filesystem
# fsck -y /dev/hd2 # /usr
# fsck -y /dev/hd9var # /var
# fsck -y /dev/hd3 # /tmp

Repeat until each returns:
The filesystem is clean.
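If you see log device errors, you can reformat the JFS2 log device. On a default rootvg this is hd8 (check the log attribute in /etc/filesystems). Note that logform discards any log records that have not yet been replayed:
# lslv hd8 # verify log device
# logform /dev/hd8
Then re-run fsck.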
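
The "repeat until clean" instruction can be sketched as a bounded loop. This is an illustration under stated assumptions: the function name fsck_until_clean and the FSCK override variable are hypothetical, and the loop assumes fsck exits with status 0 once the filesystem is clean.

```shell
# Hypothetical helper: re-run fsck on one device until it exits 0
# or the pass limit is reached.
# Usage: fsck_until_clean <device> [max_passes]
# Set FSCK to substitute another command (useful for testing).
fsck_until_clean() {
    dev=$1
    max=${2:-3}
    pass=1
    while [ "$pass" -le "$max" ]; do
        echo "pass $pass: ${FSCK:-fsck} -y $dev"
        if ${FSCK:-fsck} -y "$dev"; then
            return 0        # assumed to mean the filesystem is clean
        fi
        pass=$((pass + 1))
    done
    echo "$dev still reports errors after $max passes" >&2
    return 1
}
```

In the maintenance shell this could be called once per logical volume, for example fsck_until_clean /dev/hd4 followed by the same call for /dev/hd2, /dev/hd9var and /dev/hd3.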

3: Mount the Repaired Filesystems
After a successful fsck, mount the filesystems under /mnt to confirm they are usable:
# mount /dev/hd4 /mnt
# mount /dev/hd2 /mnt/usr
# mount /dev/hd9var /mnt/var
# mount /dev/hd3 /mnt/tmp
Verify mount points:
# df -m /mnt /mnt/usr /mnt/var

4: Reboot the LPAR from HMC or console:
# shutdown -Fr
or
# reboot
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AIX Hung Process or Service Recovery in Maintenance Mode:

1: Boot the LPAR into maintenance mode using the steps above.
2: Disable the service that hangs during boot by commenting out its entry:

# vi /etc/inittab ---> comment out the entry for the service (AIX inittab comments start with ":")
# vi /etc/services ---> comment out the entry for the service (comments start with "#")
3: Reboot the LPAR from HMC or console:
# shutdown -Fr
or
# reboot
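
Editing /etc/inittab by hand in a maintenance shell is easy to get wrong, so the change can be previewed first. The sketch below is hypothetical (disable_inittab_entry is not an AIX command). It prints a copy of an inittab-format file with one entry commented out, relying on the AIX convention that a leading colon marks an inittab comment. The entry name myapp in the usage note is an invented example.

```shell
# Hypothetical helper: print the given inittab-format file with the entry
# whose identifier (first colon-separated field) equals $1 commented out.
# The input file is not modified.
# Usage: disable_inittab_entry <identifier> <file>
disable_inittab_entry() {
    id=$1
    file=$2
    awk -F: -v id="$id" '
        $1 == id { print ":" $0; next }   # prefix the matching entry with ":"
                 { print }                # pass every other line through
    ' "$file"
}
```

A possible workflow: run disable_inittab_entry myapp /etc/inittab > /tmp/inittab.new, compare the result with the original, and copy it into place only after keeping a backup of /etc/inittab.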
