What Is the Nutanix Degraded Node Forwarding State?

Nutanix Degraded Node Forwarding State

Nutanix has developed a highly intelligent, self-healing system that proactively detects emerging issues and raises warnings as alerts in Prism and by e-mail. Nutanix recently added another feature to this system: the degraded node forwarding state.

A Nutanix cluster runs a background algorithm that detects degraded nodes from the health scores that peer nodes assign to each node and that are tracked cluster-wide. If a node's score falls below the expected level, for example because its Controller VM (CVM) cannot complete operations in real time or becomes slow or unresponsive due to network, disk, or memory (DIMM) problems, the cluster places that node in the forwarding state to prevent a failure.

In other words, when a CVM has trouble hosting cluster services and can no longer be fully relied on for normal cluster operations, the cluster places that node in the forwarding state.

Once the cluster detects a node as degraded (because of low performance), leadership roles and critical services are no longer hosted on that node.

Until a degraded node is unmarked from the degraded state, the Cassandra service on that node remains in the forwarding state (see Cluster Components). Once the node is marked as fixed in Prism, Cassandra restarts its services on the node.
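
The rule described above, where a degraded node keeps Cassandra in forwarding mode and is excluded from leadership until it is marked as fixed, can be sketched as a tiny state model. This is a hypothetical Python illustration only: the `Node` class, the `CassandraMode` enum, and the `leadership_candidates()` helper are made-up names, not part of any Nutanix API, and the real cluster logic is far more involved.

```python
from dataclasses import dataclass
from enum import Enum


class CassandraMode(Enum):
    NORMAL = "normal"
    FORWARDING = "forwarding"


@dataclass
class Node:
    """Hypothetical model of a cluster node (not a Nutanix API)."""
    name: str
    degraded: bool = False

    @property
    def cassandra_mode(self) -> CassandraMode:
        # Cassandra stays in forwarding mode while the node is marked degraded.
        return CassandraMode.FORWARDING if self.degraded else CassandraMode.NORMAL

    def mark_degraded(self) -> None:
        # The cluster has detected the node as degraded.
        self.degraded = True

    def mark_fixed(self) -> None:
        # Stands in for marking the node as fixed in Prism; Cassandra resumes normal service.
        self.degraded = False


def leadership_candidates(nodes: list[Node]) -> list[str]:
    """Names of nodes that may host leadership roles and critical services."""
    return [node.name for node in nodes if not node.degraded]


if __name__ == "__main__":
    cluster = [Node("cvm-a"), Node("cvm-b"), Node("cvm-c")]
    cluster[1].mark_degraded()
    print(leadership_candidates(cluster))  # ['cvm-a', 'cvm-c']
    print(cluster[1].cassandra_mode)       # CassandraMode.FORWARDING
    cluster[1].mark_fixed()
    print(leadership_candidates(cluster))  # ['cvm-a', 'cvm-b', 'cvm-c']
```

Modelling the forwarding mode as a property derived from the degraded flag keeps the two states from drifting apart: there is no way to unmark a node while leaving Cassandra in forwarding mode.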

Degraded Node State Cause

The Nutanix cluster detects nodes with performance issues by evaluating each node against a predefined health and performance score.

A node's health score depends on the following factors:

  • Hardware issues (such as unreliable DIMMs with ECC errors)
  • Remote procedure call (RPC) failures or timeouts
  • RPC latency
  • A failed metadata drive, an initiated node removal, or an unexpected subsystem fault
  • Frequent reboots of the Controller VM (CVM) or of a critical service

Note: If a node consistently receives poor scores from its peers for approximately 10 minutes, the peers mark it as a degraded node. Clustering algorithms are used to identify the outlier scores.
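
The roughly 10-minute rule in the note can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the source only says that clustering algorithms find outlier scores, so this sketch swaps in a simpler robust statistic (the modified z-score based on the median absolute deviation, MAD) to flag unusually low scores, and it reports a node only once that node has been an outlier in every one of the last 10 per-minute snapshots. The score scale, the `threshold` and `min_mad` values, the one-snapshot-per-minute cadence, and the function names are all hypothetical.

```python
from statistics import median


def low_score_outliers(scores: dict[str, float],
                       threshold: float = 3.5,
                       min_mad: float = 1.0) -> set[str]:
    """Return nodes whose score is abnormally low compared with their peers.

    Uses the modified z-score built on the median absolute deviation (MAD).
    This is a stand-in for the clustering approach, not Nutanix's algorithm.
    """
    values = list(scores.values())
    med = median(values)
    # Floor the MAD so that a group of identical scores still exposes an outlier.
    mad = max(median(abs(v - med) for v in values), min_mad)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score;
    # only scores below the median (worse health) are considered.
    return {node for node, v in scores.items()
            if 0.6745 * (med - v) / mad > threshold}


def degraded_nodes(history: list[dict[str, float]], window: int = 10) -> set[str]:
    """Flag nodes that were low-score outliers in each of the last `window` snapshots.

    With one snapshot per minute (hypothetical cadence), window=10 approximates
    the ~10-minute rule.
    """
    if window < 1 or len(history) < window:
        return set()
    recent = [low_score_outliers(snapshot) for snapshot in history[-window:]]
    return set.intersection(*recent)


if __name__ == "__main__":
    healthy = {"cvm-a": 95.0, "cvm-b": 96.0, "cvm-c": 94.0, "cvm-d": 95.0}
    sick = {"cvm-a": 95.0, "cvm-b": 96.0, "cvm-c": 94.0, "cvm-d": 40.0}
    print(degraded_nodes([healthy] * 5 + [sick] * 10))  # {'cvm-d'}
    print(degraded_nodes([sick] * 9 + [healthy]))       # set()
```

Requiring the node to be an outlier in every snapshot of the window means a single bad minute (for example, a brief RPC latency spike) does not mark a node as degraded; only a sustained pattern does.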

Degraded Node Event Alert

When the cluster declares a node degraded, Prism raises the following alerts:

1. Metadata service on CVM ip_address is running in forwarding mode due to reason.

2. Cassandra on CVM ip_address is running in forwarding mode due to reason.

3. Possible degraded node
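
If Prism alerts are forwarded to e-mail or another tool, a small parser can pull the service, CVM IP address, and reason out of the two forwarding-mode messages. The Python sketch below assumes the alert text follows the templates listed above exactly and that the CVM address is IPv4; the real wording may vary between AOS versions, and `parse_alert()` and `ForwardingAlert` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Modelled on the two forwarding-mode alert templates; Prism's exact wording
# may differ between AOS versions (assumption). IPv4 addresses only.
FORWARDING_ALERT = re.compile(
    r"^(?P<service>Metadata service|Cassandra) on CVM "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"is running in forwarding mode due to (?P<reason>.+?)\.?$"
)


class ForwardingAlert(NamedTuple):
    service: str
    cvm_ip: str
    reason: str


def parse_alert(message: str) -> Optional[ForwardingAlert]:
    """Extract service, CVM IP and reason from a forwarding-mode alert, or return None."""
    match = FORWARDING_ALERT.match(message.strip())
    if match is None:
        return None
    return ForwardingAlert(match["service"], match["ip"], match["reason"])


if __name__ == "__main__":
    # Made-up sample message for illustration only.
    sample = "Cassandra on CVM 10.0.0.12 is running in forwarding mode due to high RPC latency."
    print(parse_alert(sample))
    # ForwardingAlert(service='Cassandra', cvm_ip='10.0.0.12', reason='high RPC latency')
    print(parse_alert("Possible degraded node"))  # None
```

The third alert, "Possible degraded node", carries no IP address or reason in its template, so the parser deliberately returns None for it.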

Determine Degraded Node

To check whether any node in your cluster is degraded, run the following NCC health check from any CVM in the cluster:

 nutanix@cvm$ ncc health_checks system_checks degraded_node_check

Degraded Node Impact

A degraded node affects the Nutanix cluster in the following ways:

Impact 1 : Cluster performance may be significantly degraded. In the case of multiple nodes with the same condition, the cluster may become unable to service I/O requests.

Impact 2 : Containers or datastores might be unavailable for approximately 10 minutes, until the node is marked as degraded.

Impact 3 : Upgrades and break fixes are not allowed until the degraded node is fixed.

Impact 4 : Continuing to run a degraded node can affect overall cluster and user VM performance.

Impact 5 : ZooKeeper might place the node into maintenance mode and forward its leadership positions to another CVM.

Impact 6 : Cassandra services remain in the forwarding state.

Impact 7 : The leadership and critical services will not be hosted on that node.

Impact 8 : Degraded node can adversely affect the performance of an entire cluster.

Nutanix Auto-healing Action

When the cluster detects a degraded node, the Nutanix auto-healing system works to mitigate the effects of failures and reduce the overall impact on the cluster.

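To mitigate the impact, the cluster can take actions such as moving leadership and critical services off the degraded node, keeping the node's Cassandra service in the forwarding state until the node is marked as fixed, and placing the node into maintenance mode.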

Conclusion

With each new release of Nutanix AOS and the AHV hypervisor, Nutanix extends the system's intelligence to automate tasks and heal the system automatically wherever possible. The degraded node forwarding state is one important feature added to AOS (running in the CVM) to minimize the impact of failures and performance degradation.

Thanks for reading the HyperHCI Tech Blog, where you can learn a new tech topic every day!