Nutanix VMware ESXi Cluster: Slow Storage vMotion Performance Issue

If you run a Nutanix cluster with the VMware ESXi hypervisor, you might see slow Storage vMotion performance in your Nutanix hyperconverged (HCI) infrastructure.

This can happen because of how Storage vMotion is implemented in ESXi for NAS storage (which includes NFS datastores such as those on Nutanix), and because the Nutanix VAAI plugin is either not installed or not in effect.
 

There are two storage vMotion issues

There are two main Storage vMotion issues in a Nutanix HCI cluster:

Issue 1: vStorage APIs for Array Integration (VAAI)

When you Storage vMotion (migrate) a VM on block storage that supports VAAI, the ESXi host can trigger a VAAI primitive and offload the task to the storage array, which does the heavy lifting and makes the operation very quick.

This does not happen with ESXi on NAS storage, as explained in the VMware document "VAAI NAS Primitives", under Full File Clone: “There is one noticeable difference between this primitive and the XCOPY primitive on block storage: This primitive cannot be called for Storage vMotion operations, which continue to use the Data Mover.”

Read more : Install Nutanix VAAI in VMware ESXi

Nutanix VAAI Plugin

Issue 2: ESXi Data Mover Component (FSDM)

The ESXi Data Mover component, which is used for VMware ESXi Storage vMotion, has three different versions:

  • FSDM – the simplest, most compatible and slowest data mover (also the highest level in the IO stack)
  • FS3DM – more advanced and faster
  • FS3DM Hardware Accelerated – offloads to the array


FSDM is the version used for NFS/NAS storage on ESXi hosts. A big disadvantage of FSDM is that it does not “understand” thin/thick provisioning.

If you perform a Storage vMotion on a VM with huge, thin-provisioned disks (VMDKs), FSDM copies the whole provisioned space byte by byte, including the zero blocks, which effectively inflates the thin disk on the destination datastore.

In this scenario, the Nutanix cluster would need to physically copy each byte of the VM from one container to another up to the full size of all the disks.
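
To make the cost of copying zero blocks concrete, here is a minimal Python sketch that compares a copy of every block with a copy that skips all-zero blocks. It is only an illustration of the idea, not ESXi or Nutanix code: the block size, the function names and the simulated disk are all assumptions chosen for the example.

```python
BLOCK = 4096  # illustrative block size in bytes (an assumption, not an ESXi value)


def full_copy(src):
    """Copy every block, zeros included (the behaviour described for FSDM)."""
    dst = bytearray(len(src))
    written = 0
    for off in range(0, len(src), BLOCK):
        chunk = src[off:off + BLOCK]
        dst[off:off + len(chunk)] = chunk
        written += len(chunk)
    return bytes(dst), written


def zero_skipping_copy(src):
    """Copy only blocks that contain data; all-zero blocks are left unwritten."""
    dst = bytearray(len(src))  # zero-filled, standing in for unallocated thin space
    written = 0
    for off in range(0, len(src), BLOCK):
        chunk = src[off:off + BLOCK]
        if not any(chunk):  # block holds nothing but zero bytes
            continue
        dst[off:off + len(chunk)] = chunk
        written += len(chunk)
    return bytes(dst), written


# Hypothetical thin disk: 100 blocks provisioned, only the first 10 hold data.
disk = b"\x01" * (10 * BLOCK) + bytes(90 * BLOCK)

full_image, full_written = full_copy(disk)
thin_image, thin_written = zero_skipping_copy(disk)

print("bytes written, full copy:", full_written)
print("bytes written, zero-skipping copy:", thin_written)
```

Both copies produce the same disk image, but the full copy writes ten times as much data in this example. The ratio grows with the share of empty space on the thin disk.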

Given both issues above (no VAAI offload and FSDM), a Storage vMotion on NFS storage comes down to copying large VMDK files.

The Nutanix converged platform is designed for large numbers of parallel virtual machine IO operations. Single-threaded sequential IO to a single vdisk is much slower than multi-threaded IO.

Read more : Storage vMotion and provisioning for “Monster VMs”

Data Mover Overview

Nutanix Recommended Solution

As we can’t force ESXi to use VAAI for Storage vMotion on NFS storage, we can only work around the issues:

  • Upgrading the Nutanix AOS (Acropolis Operating System) version on the cluster may further improve performance, as optimizations are continually included in new releases (check the release notes for details).
  • If your VM has huge, thin-provisioned VMDKs (vdisks), it may be faster to shut down the machine, manually copy the VM files to the new container (datastore) and then re-import the VM into vSphere/vCenter. This prevents FSDM from inflating the thin-provisioned files.
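
To judge whether the manual copy is worth the downtime, a rough estimate can help. The Python sketch below compares the time to copy the full provisioned size (as Storage vMotion with FSDM would) against the time to copy only the used space plus a fixed overhead for shutdown and re-import. Every number in it is hypothetical, and it assumes the same sustained throughput for both paths, which may not hold in a real cluster.

```python
def copy_hours(size_gib, throughput_mib_s):
    """Hours needed to copy size_gib at a sustained throughput in MiB/s."""
    seconds = size_gib * 1024 / throughput_mib_s
    return seconds / 3600


# All values below are hypothetical; measure your own environment.
PROVISIONED_GIB = 2048   # total size of the thin-provisioned disks
USED_GIB = 200           # space actually holding data
THROUGHPUT_MIB_S = 200   # assumed sustained copy rate, same for both paths
OVERHEAD_HOURS = 0.25    # assumed time to shut down and re-import the VM

# Storage vMotion with FSDM copies the full provisioned size.
svmotion_hours = copy_hours(PROVISIONED_GIB, THROUGHPUT_MIB_S)

# A manual copy that keeps the files thin moves only the used space.
manual_hours = copy_hours(USED_GIB, THROUGHPUT_MIB_S) + OVERHEAD_HOURS

print(f"Storage vMotion: {svmotion_hours:.2f} h, manual copy: {manual_hours:.2f} h")
```

The manual route only wins when the disks are mostly empty and the VM can tolerate the downtime. For a nearly full disk the two estimates converge and the overhead makes the manual copy slower.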

Conclusion

A Nutanix ESXi cluster gives its best performance on NFS storage, but it might show slow Storage vMotion performance. If it does, log a case with VMware and Nutanix to resolve the problem.

Thanks for being here, and enjoy the Hyper HCI Blog 🙂