The other day I ran into a machine that failed to boot with a fatal error:
krtld: failed to open '/platform/i86pc/kernel/amd64/unix'
krtld: bind_primary(): no relocation information found for module /platform/i86pc/kernel/amd64/unix
krtld: error during initial load/link phase
or as can be more fully seen in the screen shot:
After rebooting and selecting failsafe mode from the GRUB menu, I imported the syspool and mounted the current boot image filesystem:
# zpool import -f syspool
# zpool get bootfs syspool
NAME     PROPERTY  VALUE                   SOURCE
syspool  bootfs    syspool/rootfs-nmu-000  local
# mount -F zfs syspool/rootfs-nmu-000 /mnt
Next I needed to inspect the boot_archive file, which is easily done by attaching it as a lofi device and mounting it as an hsfs filesystem:
# mkdir /a
# cd /mnt/platform/i86pc/amd64
# lofiadm -a `pwd`/boot_archive
/dev/lofi/1
# mount -F hsfs /dev/lofi/1 /a
# cd /a
# ls
boot etc kernel
Walking through the boot_archive image I observed that we had no platform directory – so no wonder the machine couldn’t find the unix kernel file.
I assumed the boot_archive file was somehow corrupt and, as so many documents recommend, decided to rebuild the archive using bootadm:
# bootadm update-archive -R /mnt
updating //platform/i86pc/boot_archive
updating //platform/i86pc/amd64/boot_archive
After this, I unmounted /a, destroyed the lofi device, unmounted /mnt and exported the syspool before rebooting, only to find the exact same error.
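The teardown described above is not shown in the original session. A minimal sketch of it follows, assuming the same mount points (/a and /mnt), lofi device (/dev/lofi/1) and pool name (syspool) as the earlier commands. The `run` and `teardown` helpers and the `DRY_RUN` switch are hypothetical names added for illustration; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Dry-run by default: set DRY_RUN=0 to really execute the commands.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" != "0" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# teardown [LOFI_DEVICE] [POOL]: undo the inspection mounts in reverse order.
# The defaults are assumptions taken from the session above.
teardown() {
    lofi_dev=${1:-/dev/lofi/1}
    pool=${2:-syspool}
    cd / || return 1                 # step out of /a so it is not busy
    run umount /a &&                 # unmount the hsfs view of the archive
    run lofiadm -d "$lofi_dev" &&    # detach the lofi device
    run umount /mnt &&               # unmount the boot environment
    run zpool export "$pool"         # release the pool before rebooting
}
```

Calling `teardown` with no arguments prints the four commands in order; running it with `DRY_RUN=0` from a shell whose working directory is not under /a or /mnt performs them.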
I booted the machine from failsafe once more and decided to check the location of the unix file for a 64-bit kernel:
# ls -l /mnt/platform/i86pc/kernel/amd64/unix
-rwxr-xr-x 1 root sys 1903080 Sep 9 02:09 /platform/i86pc/kernel/amd64/unix
Hmm, this file does exist so why doesn’t it get copied into the boot_archive file?
After poking at this for a while and trying to read up on the process of boot archive creation, I eventually turned to the source and started seeing references to a cache directory, which turned out to be the archive_cache directory here:
/platform/i86pc/archive_cache
/platform/i86pc/amd64/archive_cache
Upon checking these directories it turned out that the platform sub-directory had been removed, thus any attempt to rebuild the boot_archive file was doomed to failure.
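A quick way to make the same check is sketched below, assuming the boot environment is mounted at /mnt as in the earlier steps. The `check_cache` function is a hypothetical helper, not part of bootadm; it only tests whether each archive_cache directory still contains a platform subdirectory.

```shell
#!/bin/sh
# check_cache [ROOT]: report whether each archive_cache under ROOT still
# holds a platform/ subdirectory. Returns non-zero if any is missing.
check_cache() {
    root=${1:-/mnt}      # assumed mount point of the boot environment
    status=0
    for cache in platform/i86pc/archive_cache \
                 platform/i86pc/amd64/archive_cache; do
        if [ -d "$root/$cache/platform" ]; then
            echo "ok: $cache/platform"
        else
            echo "MISSING: $cache/platform"
            status=1
        fi
    done
    return $status
}
```

On the broken machine described here, `check_cache /mnt` would be expected to report both entries as MISSING.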
There’s very little documentation about the archive_cache and even less about fixing it when it’s corrupt – there’s nothing in the man page.
Trying to be clever, I took a copy of the platform directory from a similarly configured machine, copied it into both the 32-bit and 64-bit archive_cache directories, then rebuilt the boot_archive file again.
Upon rebooting the machine this time, I got this error:
Alas this just goes to show that not all machines are identical and when you’re dealing with the kernel and dynamic kernel modules, everything has to be exactly right.
I also tried being clever in another way: I thought that if I deleted the archive_cache directories and then created empty archive_cache directories with the same permissions, bootadm would find nothing in the cache and repopulate it.
Apparently not. If you do this you end up with a small boot_archive file and nothing in the archive_cache. This was me trying to outthink the software when in reality I was being too clever for my own good.
It turns out that if you delete the archive_cache directories completely, then bootadm will notice this, recreate the directory for you and then create a valid boot_archive file.
# bootadm update-archive -v
archive cache directory not found: //platform/i86pc/archive_cache
archive cache directory not found: //platform/i86pc/amd64/archive_cache
new /boot/acpi/tables
new /boot/solaris/bootenv.rc
need to create directory path for //platform/i86pc/archive_cache/boot/solaris
need to create directory path for //platform/i86pc/amd64/archive_cache/boot/solaris
new /boot/solaris/devicedb/master
need to create directory path for //platform/i86pc/archive_cache/boot/solaris/devicedb
need to create directory path for //platform/i86pc/amd64/archive_cache/boot/solaris/devicedb
cannot find: /etc/cluster/nodeid: No such file or directory
new /etc/dacf.conf
need to create directory path for //platform/i86pc/archive_cache/etc
need to create directory path for //platform/i86pc/amd64/archive_cache/etc
new /etc/devices/devid_cache
need to create directory path for //platform/i86pc/archive_cache/etc/devices
need to create directory path for //platform/i86pc/amd64/archive_cache/etc/devices
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
new /etc/devices/mdi_scsi_vhci_cache
cannot find: /etc/devices/retire_store: No such file or directory
new /etc/devices/pci_unitaddr_persistent
new /etc/driver_aliases
new /etc/driver_classes
new /etc/mach
new /etc/name_to_major
new /etc/name_to_sysnum
. . .
new /platform/i86pc/kernel/amd64/unix
new /platform/i86pc/kernel/unix
. . .
new /platform/i86pc/ucode/AuthenticAMD/3010-00
new /etc/zfs/zpool.cache
need to create directory path for //platform/i86pc/archive_cache/etc/zfs
need to create directory path for //platform/i86pc/amd64/archive_cache/etc/zfs
updating //platform/i86pc/boot_archive
Unable to extend //platform/i86pc/boot_archive... rebuilding archive
Successfully created //platform/i86pc/boot_archive
updating //platform/i86pc/amd64/boot_archive
Unable to extend //platform/i86pc/amd64/boot_archive... rebuilding archive
Successfully created //platform/i86pc/amd64/boot_archive
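The session above shows only the rebuild, not the removal of the cache directories that preceded it. The whole fix could be sketched roughly as follows. The `run` and `rebuild_archive` helpers and the `DRY_RUN` switch are hypothetical, and the use of `-R` with an alternate root is an assumption carried over from the earlier failsafe command (the run shown above used no `-R`). The script prints the commands by default rather than executing them.

```shell
#!/bin/sh
# Dry-run by default: set DRY_RUN=0 to really execute the commands.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" != "0" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_archive [ROOT]: remove both archive_cache directories entirely
# (not just their contents), then let bootadm recreate the cache and archives.
rebuild_archive() {
    root=${1:-/mnt}      # assumed alternate root of the boot environment
    run rm -rf "$root/platform/i86pc/archive_cache" \
               "$root/platform/i86pc/amd64/archive_cache" &&
    run bootadm update-archive -v -R "$root"
}
```

Removing the directories outright is the important detail: as noted earlier, leaving empty archive_cache directories in place produced a small boot_archive and an empty cache.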
Upon creation of the new boot_archive, I rebooted and the system came up normally.