Nutanix CVM Booting issue after Upgrading VMware ESXi Hypervisor

The Nutanix Controller VM (CVM) is the data plane of the Nutanix hyperconverged infrastructure (HCI) platform and serves its I/O operations. After a VMware ESXi hypervisor upgrade, however, the CVM can fail to boot because it loses passthrough control of the storage devices.

When upgrading ESXi to 5.0_U2 (some call it SP2) or 6.x, some upgrade methods leave the Nutanix Controller VM unbootable: the CVM reports that device “03:00.0” (at least) is not in “passthrough” mode.

In one example, VMware Update Manager (VUM) assumed a complete set of upgrades and patches, which re-installed drivers that the Nutanix factory removes. As a result, the passthrough devices became “owned” by an ESXi driver that should not have taken control.

Example of configuration after ESXi bootup:

[Screenshot: Nutanix CVM Booting issue on VMware ESXi]

Also, a message similar to the following may be displayed in /var/run/log/vmkernel.log:

Nutanix CVM Troubleshooting steps

If the 03:00.0 device appears as shown above, with its proper name, a driver in ESXi has taken ownership of the device.
Typically, the Intel card shows “unknown/unknown” instead of Patsburg Dual 4-Port SATA Storage Control Unit.
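
One way to see how the host currently labels the device is to filter the PCI listing for its address. The following is only a minimal sketch: it assumes lspci is available in the ESXi shell and that its output contains the address string as written here, and the helper name pci_lines_for is hypothetical.

```shell
# Hypothetical helper: print PCI listing lines that mention a given address.
# It reads the listing on stdin, so it can also be tried on saved output.
pci_lines_for() {
    # -F treats the address as a fixed string, so the dots are not wildcards
    grep -F -- "$1"
}

# Possible usage on the ESXi host:
#   lspci | pci_lines_for "03:00.0"
```

If the printed line carries the controller's full name rather than “unknown/unknown”, that matches the symptom described above.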

To check if unwanted drivers are present, type:

esxcli software vib list | egrep "rste|mpt2sas"

If any lines of results are shown, the old, unwanted drivers were put back by the upgrade procedure.
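
The same check can be wrapped in a small helper that prints only the names of the matching VIBs. This is a sketch rather than an official procedure: it assumes the VIB name is the first column of the esxcli software vib list output, and the function name list_unwanted_vibs is hypothetical. Unlike the egrep command, it matches on the name column only.

```shell
# Hypothetical helper: read `esxcli software vib list` output on stdin and
# print the name (assumed to be the first column) of every VIB whose name
# contains rste or mpt2sas.
list_unwanted_vibs() {
    awk '$1 ~ /rste|mpt2sas/ { print $1 }'
}

# Possible usage on the ESXi host:
#   found=$(esxcli software vib list | list_unwanted_vibs)
#   if [ -n "$found" ]; then
#       echo "Unwanted drivers present:"; echo "$found"
#   else
#       echo "No rste/mpt2sas drivers found."
#   fi
```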

To remove these drivers and free up the Nutanix devices, type the following on the ESXi host:

esxcli software vib remove -f -n scsi-rste

esxcli software vib remove -f -n scsi-mpt2sas

An ESXi reboot is required.
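
Where several hosts need the same cleanup, it can help to generate the removal commands and review them before running anything. The sketch below only prints commands in the form shown above and does not execute them; the helper name print_vib_remove_commands is hypothetical.

```shell
# Hypothetical helper: turn VIB names (one per line on stdin) into removal
# commands of the form used above, printing them instead of running them.
print_vib_remove_commands() {
    while IFS= read -r name; do
        # Skip blank lines
        [ -n "$name" ] || continue
        printf 'esxcli software vib remove -f -n %s\n' "$name"
    done
}

# Possible usage on the ESXi host (review the printed commands first):
#   printf '%s\n' scsi-rste scsi-mpt2sas | print_vib_remove_commands
```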

After this step, the 03:00.0 device (and any sub-devices) and the 086:00.0 device must be re-enabled for passthrough. This requires yet another ESXi reboot before the CVM is restarted.

Conclusion

Nutanix offers its customers excellent support and advises logging a case with Nutanix support for any issue related to the Nutanix HCI platform, so that it can be resolved as quickly as possible, on a best-effort basis, with minimal or near-zero downtime for the running production environment.

Thanks for being here. Enjoy the Hyper HCI Blog, and stay tuned for the latest Hyperhci posts on social media.
