What Is Nutanix Degraded Node Forwarding State

Nutanix has developed an intelligent, self-healing system that proactively detects emerging issues and raises warnings as alerts in Prism and by e-mail. A more recent addition to this system is the degraded node forwarding state in Nutanix clusters.

A Nutanix cluster runs an algorithm in the background that detects degraded nodes based on each node's performance score in the global peer health database. If a node's score is not up to the mark, and its Controller VM (CVM) is struggling to complete operations in real time because of slowness or unresponsiveness related to network, disk, or memory (DIMM), the cluster places that node in the forwarding state to prevent a failure.

In other words, when a CVM has trouble hosting cluster services and is no longer fully reliable for normal cluster operations, the cluster places that node in the forwarding state.

Once the cluster detects a node as degraded (because of poor performance), leadership roles and critical services are no longer hosted on that node.

Until a degraded node is unmarked from the degraded state, Cassandra on that node remains in the forwarding state (see Cluster Components). Once the node is marked as fixed in Nutanix Prism, Cassandra restarts its services on the node.

Degraded Node State Cause

A Nutanix cluster detects nodes with performance issues by evaluating each node against a predefined health and performance score.

The node health score depends on the following factors:

Network bandwidth reduction
Network packet loss or drops (e.g., an OVS packet looping issue on Nutanix AHV)
Network latency
Soft lockups
Partially bad disks (e.g., bit-rot data corruption)
Disk failure (e.g., SSD failure)
Hardware issues (such as an unreliable DIMM with ECC errors)
RPC failures or timeouts
Remote procedure call (RPC) latency
A failed metadata drive, an initiated node removal, or an unexpected subsystem fault
Controller VM (CVM) or critical services rebooting frequently

Note: If one node consistently receives poor scores for approximately 10 minutes, its peers mark it as a degraded node. Clustering algorithms are used to identify outlier scores.
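The note above can be illustrated with a small sketch. Nutanix does not publish the exact scoring or clustering algorithm here, so this example swaps in a simple median-based outlier test. The score scale (higher is healthier), the one-sample-per-minute interval, the threshold, and the function names are all assumptions made for illustration.

```python
from statistics import median

WINDOW = 10      # assumed: one score sample per minute, so 10 samples is about 10 minutes
THRESHOLD = 3.0  # assumed: distance below the median, in MAD units, that counts as an outlier


def is_outlier(node, scores):
    """Hypothetical rule: True if node's score is far below the cluster median."""
    values = list(scores.values())
    med = median(values)
    # Median absolute deviation; fall back to 1.0 when all peers agree exactly.
    mad = median(abs(v - med) for v in values) or 1.0
    return (med - scores[node]) / mad > THRESHOLD


def degraded_nodes(history):
    """history: list of {node: score} samples, oldest first, same nodes in each."""
    recent = history[-WINDOW:]
    if len(recent) < WINDOW:
        return set()  # not enough samples yet to call anything "consistent"
    # A node is flagged only if it was an outlier in every recent sample.
    return {n for n in recent[0] if all(is_outlier(n, s) for s in recent)}


# Usage: node "D" scores poorly in ten consecutive samples.
history = [{"A": 95, "B": 94, "C": 96, "D": 40}] * 10
flagged = degraded_nodes(history)  # {"D"}
```

Requiring the node to be an outlier in every sample of the window is what keeps a single bad reading from triggering the degraded state.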

Degraded Node Event Alert

When the cluster declares a node degraded, Prism raises the following degraded node alert messages:

1. Metadata service on CVM ip_address is running in forwarding mode due to reason.

2. Cassandra on CVM ip_address is running in forwarding mode due to reason.

3. Possible degraded node
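The first two alert messages follow a fixed template, so a script that watches alert text can pull out the service, CVM address, and reason. This is a minimal sketch that assumes the wording shown above is stable. The function name `parse_forwarding_alert`, the sample IP address, and the sample reason text are illustrative and not taken from Nutanix documentation.

```python
import re

# Matches alerts 1 and 2 above; ip_address and reason are the variable parts.
ALERT_RE = re.compile(
    r"^(?P<service>Metadata service|Cassandra) on CVM (?P<cvm>\S+) "
    r"is running in forwarding mode due to (?P<reason>.+?)\.?$"
)


def parse_forwarding_alert(message):
    """Return a dict with service, cvm, and reason, or None if the text does not match."""
    match = ALERT_RE.match(message.strip())
    return match.groupdict() if match else None


# Usage with a made-up alert line (hypothetical IP and reason).
sample = "Cassandra on CVM 10.0.0.5 is running in forwarding mode due to node degraded."
parsed = parse_forwarding_alert(sample)
```

The third alert, "Possible degraded node", carries no address or reason, so the function returns None for it.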

Determine Degraded Node

To check for a degraded node in your cluster, run the following NCC health check from any CVM in the cluster:

 nutanix@cvm$ ncc health_checks system_checks degraded_node_check

Degraded Node Impact

A degraded node affects the Nutanix cluster in the following ways:

Impact 1 : Cluster performance may be significantly degraded. In the case of multiple nodes with the same condition, the cluster may become unable to service I/O requests.

Impact 2 : Containers or data stores might be unavailable for 10 minutes until the node is marked as degraded.

Impact 3 : Upgrades and break fixes are not allowed until the degraded node is fixed.

Impact 4 : Continuing to run a degraded node can affect overall cluster and user VM performance.

Impact 5 : ZooKeeper might place the node into maintenance mode and forward its leadership positions to another CVM.

Impact 6 : Cassandra services remain in the forwarding state.

Impact 7 : Leadership roles and critical services are not hosted on that node.

Impact 8 : A degraded node can adversely affect the performance of the entire cluster.

Nutanix Auto-healing Action

When the cluster detects a degraded node, the Nutanix auto-healing system works to mitigate the effects of the failure and reduce the overall impact on the cluster.

To mitigate the impact, the software can perform one of the following actions:

Prevent components on the degraded node from acquiring leadership roles
Place the degraded node's CVM in maintenance mode and reboot the CVM to stop its services
Shut down the host
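The first action in the list above, keeping a degraded node out of leadership roles, can be sketched as a filter applied before any election. This is only an illustration of the idea. The function names, the node names, and the tie-break rule (lowest name wins) are assumptions, and the real election logic in AOS is not described in this article.

```python
def eligible_leaders(nodes, degraded):
    """Return the nodes that may hold leadership, skipping degraded ones."""
    return [n for n in nodes if n not in degraded]


def elect_leader(nodes, degraded):
    """Pick a leader among healthy nodes; None if every node is degraded."""
    candidates = eligible_leaders(nodes, degraded)
    if not candidates:
        return None
    return min(candidates)  # assumed tie-break: lowest node name


# Usage: cvm-1 is degraded, so leadership goes to another CVM.
leader = elect_leader(["cvm-1", "cvm-2", "cvm-3"], degraded={"cvm-1"})
```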


Conclusion

Nutanix keeps evolving the intelligence of its platform with each new release of AOS and the AHV hypervisor, automating tasks and self-healing as much as possible. The degraded node forwarding state is one important feature added to Nutanix AOS / CVM to minimize the impact of failures and performance degradation.

Thanks for being with HyperHCI Tech Blog to learn a new tech topic every day!

Manish Kumar

I’m Manish Kumar, founder of HyperHCI.com and a senior IT consultant with 13+ years of experience in infrastructure design and cybersecurity. I am an official certified SME for ISC2 and Nutanix, and I hold CISSP, CompTIA Security+, VMware, and AWS certifications. My expertise covers HCI, virtualization, cloud computing, networking, and security across Nutanix, VMware, and AWS platforms.
