Nutanix Availability Domain Provides Node, Block and Rack Awareness feature

The Nutanix Availability Domain (AD) is a key construct of the Nutanix Acropolis Distributed Storage Fabric (ADSF). It ensures that critical Nutanix components and data blocks remain redundant in every supported failure scenario of the Nutanix distributed system.

What Is Meant by Availability Domain?

First, let's understand the function of an availability domain.

A domain is the set of all possible replacements for a variable such as a disk, node, block or rack.

In other words, the domain is the set of all possible values (disk, node, block or rack) that make the function “work”.

When finding the domain of a mathematical function, remember:

  • The denominator (bottom) of a fraction cannot be zero
  • The number under a square root sign cannot be negative

In short, an availability domain keeps one or two extra components (n+1 or n+2, where n is a disk, node, block or rack) for fault tolerance, so that the Nutanix cluster keeps working without interruption.

Availability Domain

Nutanix Supported Awareness Levels

Nutanix supports four levels of awareness:

  • Disk (all AOS versions)
  • Node / Host (all AOS versions)
  • Block / Chassis (AOS 4.5 and later)
  • Rack (AOS 5.9 and later)

How Availability Domains Work

The Nutanix Availability Domain provides data resiliency. When data or components fail, it maintains redundancy and so prevents the cluster from entering an unsafe state.

Availability Domain : Fault Domain Type – Disk

Fault Domain Type Disk

Availability Domain : Fault Domain Type – Node / Host

Fault Domain type Host

Awareness Key Components

The Nutanix Availability Domain (AD) protects data, metadata and cluster configuration data against simultaneous failures. It does this through Replication Factor (RF 2 / 3) and Redundancy Factor (RF 2 / 3).

The critical components that awareness protects in a Nutanix cluster are:

  1. Data ( Virtual Machine – VM Data )
  2. Metadata ( Cassandra )
  3. Cluster configuration data ( Zookeeper )



Data

Data is one of the most critical components for Nutanix cluster awareness, and the cluster must maintain redundant copies of VM data.

The Nutanix Distributed Storage Fabric (DSF) writes replicas of a VM's data to other nodes, blocks or racks in the cluster. If a node, block or rack fails, the data therefore remains available without damage or corruption.
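The placement rule can be illustrated with a short Python sketch. This is not the actual DSF placement algorithm, and several assumptions are built in. The `Node` type and the `place_replicas` function are hypothetical. The greedy first-fit selection is a simplification, since real placement also weighs capacity and performance. The only rule the sketch enforces is that no two replicas share a fault domain at the chosen awareness level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical description of a node and the fault domains it belongs to.
    name: str
    block: str
    rack: str


def place_replicas(nodes, rf, level):
    """Pick `rf` nodes so that no two share a fault domain.

    `level` is "node", "block" or "rack". Greedy first-fit: walk the node
    list and take a node only if its domain is not used yet. Returns None
    when there are not enough distinct domains to hold `rf` replicas.
    """
    used_domains = set()
    chosen = []
    for node in nodes:
        domain = node.name if level == "node" else getattr(node, level)
        if domain in used_domains:
            continue  # a replica already lives in this node/block/rack
        used_domains.add(domain)
        chosen.append(node)
        if len(chosen) == rf:
            return chosen
    return None  # not enough distinct domains at this awareness level


cluster = [
    Node("n1", block="b1", rack="r1"),
    Node("n2", block="b1", rack="r1"),
    Node("n3", block="b2", rack="r1"),
    Node("n4", block="b2", rack="r2"),
    Node("n5", block="b3", rack="r2"),
]

print([n.name for n in place_replicas(cluster, rf=2, level="block")])  # ['n1', 'n3']
print([n.name for n in place_replicas(cluster, rf=3, level="block")])  # ['n1', 'n3', 'n5']
print(place_replicas(cluster, rf=3, level="rack"))                     # None
```

The example cluster has five nodes in three blocks but only two racks. Three replicas can therefore be placed block-aware, but not rack-aware.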

Data with Block and Rack Awareness

Use case 1: In a 4-node cluster, when a disk fails, each CVM handles 25% of the metadata scan and data rebuild.

Use case 2: In a 10-node cluster, each CVM handles 10% of the metadata scan and data rebuild.

Use case 3: In a 50-node cluster, each CVM handles 2% of the metadata scan and data rebuild.
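The arithmetic behind these use cases is that the rebuild work is spread over every CVM, so each CVM's share is 100% divided by the node count. The sketch below assumes a perfectly even split, which is a simplification. The helper name `rebuild_share_percent` is hypothetical.

```python
def rebuild_share_percent(node_count: int) -> float:
    """Percentage of the metadata scan and data rebuild handled by each CVM,
    assuming the work is divided evenly across all CVMs in the cluster."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return 100.0 / node_count


for nodes in (4, 10, 50):
    print(f"{nodes} nodes -> {rebuild_share_percent(nodes):g}% per CVM")
```

The point of the design is that per-CVM rebuild work shrinks as the cluster grows, because more CVMs share it.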



Metadata ( Cassandra )

Nutanix uses its own customized version of Cassandra to store metadata and other essential information. Cassandra uses a ring-like structure and replicates the metadata and essential information across the nodes in the cluster.

Cassandra runs on all nodes of the cluster. These nodes communicate with each other once a second using the Gossip protocol, ensuring that the state of the database is current on all nodes.
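The sketch below shows a gossip-style state exchange. It is a generic textbook model, not Cassandra's or Nutanix's implementation, and the `GossipNode` class is hypothetical. Each node keeps a heartbeat counter per peer. When two nodes gossip, both keep the higher counter for every peer.

```python
class GossipNode:
    """Hypothetical node that tracks the latest heartbeat seen for each peer."""

    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}

    def tick(self):
        # One gossip round for this node (the text notes Cassandra gossips once a second).
        self.heartbeats[self.name] += 1

    def gossip_with(self, other):
        # Both sides end up with the newest heartbeat known to either of them.
        merged = dict(self.heartbeats)
        for peer, beat in other.heartbeats.items():
            merged[peer] = max(merged.get(peer, 0), beat)
        self.heartbeats = merged
        other.heartbeats = dict(merged)


a, b, c = GossipNode("a"), GossipNode("b"), GossipNode("c")
for node in (a, b, c):
    node.tick()
a.gossip_with(b)
b.gossip_with(c)
a.gossip_with(b)
print(a.heartbeats)  # {'a': 1, 'b': 1, 'c': 1}
```

Node `a` never talks to `c` directly, yet after a few pairwise exchanges it still learns `c`'s state. No central coordinator is needed.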

Cassandra peers are selected clockwise around the ring and distributed among blocks or racks to take advantage of awareness, so that no two peers are on the same block or rack.
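Clockwise, block-aware peer selection can be modeled as follows. This is a simplified model, not the actual Cassandra or Nutanix token assignment, and `build_block_aware_ring` and `peers_for` are hypothetical names. The sketch orders nodes on the ring by interleaving blocks round-robin. It then takes a key's owner plus the next nodes clockwise as that key's peers.

```python
from itertools import zip_longest


def build_block_aware_ring(blocks):
    """Order nodes on the ring so that consecutive positions fall in different blocks.

    `blocks` maps a block (or rack) name to its nodes. Nodes are taken
    round-robin from each block, which keeps neighbours in different blocks
    as long as the blocks hold the same number of nodes.
    """
    ring = []
    for row in zip_longest(*blocks.values()):
        ring.extend(node for node in row if node is not None)
    return ring


def peers_for(ring, owner_index, replicas):
    """Return the owner and the next `replicas - 1` nodes walking clockwise."""
    return [ring[(owner_index + i) % len(ring)] for i in range(replicas)]


blocks = {
    "block-A": ["A1", "A2"],
    "block-B": ["B1", "B2"],
    "block-C": ["C1", "C2"],
}
ring = build_block_aware_ring(blocks)
print(ring)                   # ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
print(peers_for(ring, 4, 3))  # ['B2', 'C2', 'A1']
```

With equal-sized blocks, any three consecutive peers land in three different blocks. With uneven blocks, this simple round-robin can place two nodes of the same block side by side, so a real implementation needs extra checks.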

Cassandra Block and Rack Awareness

As of AOS 5.0, the supported cluster Fault Tolerance (FT) levels are FT1 and FT2. FT1 corresponds to metadata RF3 with data RF2, and FT2 corresponds to metadata RF5 with data RF3.
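These numbers follow a pattern. Data needs FT + 1 copies, while metadata needs 2 × FT + 1 replicas. A plausible reason for the metadata figure is that a majority of metadata replicas must survive FT failures; this quorum reading is an assumption, not something stated above. The small helper below, with the hypothetical name `replication_factors`, reproduces the mapping.

```python
def replication_factors(ft_level: int) -> dict:
    """Map a cluster Fault Tolerance level to data and metadata RF.

    Data RF = FT + 1: one extra copy per tolerated failure.
    Metadata RF = 2 * FT + 1: a majority of replicas survives FT failures
    (the quorum interpretation is an assumption of this sketch).
    """
    if ft_level not in (1, 2):
        raise ValueError("AOS supports FT1 and FT2")
    return {"data_rf": ft_level + 1, "metadata_rf": 2 * ft_level + 1}


print(replication_factors(1))  # {'data_rf': 2, 'metadata_rf': 3}
print(replication_factors(2))  # {'data_rf': 3, 'metadata_rf': 5}
```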



Cluster Configuration Data

Nutanix uses Zookeeper to store essential configuration data for the cluster. The Zookeeper roles are distributed in a block- or rack-aware manner to ensure availability if a block or rack fails.

In the event of a block or rack failure or outage, for example when one of the Zookeeper nodes goes down (there are three redundant Zookeeper instances), the Zookeeper role is transferred to another node in the cluster, as shown below:

Zookeeper Block or Rack Awareness

When the block or rack comes back online, the Zookeeper role would be transferred back to maintain block or rack awareness.
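The role movement can be modeled as a small placement routine. This is a hypothetical sketch, not Nutanix's implementation, and `place_zookeeper` is an illustrative name. The routine keeps three Zookeeper roles on nodes in distinct blocks where possible. It re-runs over the healthy nodes whenever block state changes. It prefers nodes that already hold a role, so a role moves only when a failure or a returning block requires it.

```python
def place_zookeeper(nodes, current, failed_blocks, count=3):
    """Choose `count` healthy nodes for the Zookeeper role.

    nodes: mapping of node name -> block name.
    current: nodes holding the role now (kept when still useful).
    failed_blocks: blocks that are offline.
    Prefers one node per block; shares a block only when too few blocks are healthy.
    """
    healthy = [n for n, blk in nodes.items() if blk not in failed_blocks]
    # Existing healthy holders first, so roles move only when needed.
    ordered = [n for n in current if n in healthy] + [n for n in healthy if n not in current]
    chosen, used_blocks = [], set()
    for n in ordered:  # first pass: at most one role per block
        if nodes[n] not in used_blocks:
            chosen.append(n)
            used_blocks.add(nodes[n])
        if len(chosen) == count:
            return chosen
    for n in ordered:  # second pass: allow sharing a block (degraded awareness)
        if n not in chosen:
            chosen.append(n)
        if len(chosen) == count:
            break
    return chosen


nodes = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C", "C2": "C"}
zk = place_zookeeper(nodes, [], set())
print(zk)  # ['A1', 'B1', 'C1']
zk = place_zookeeper(nodes, zk, {"A"})
print(zk)  # ['B1', 'C1', 'B2']  (block A offline)
zk = place_zookeeper(nodes, zk, set())
print(zk)  # ['B1', 'C1', 'A1']  (block A back online)
```

When block A comes back, the role held on the duplicate node in block B moves back to block A. This restores one Zookeeper role per block.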



Nutanix Cluster FT Level

Block and rack awareness are built into the software. Rack awareness is achieved by striping the cluster across racks and reusing block awareness, with each “block” actually placed in a different rack.

Availability Domain : Nutanix Cluster FT Level with RF 2 / 3

Awareness Level   Redundancy Factor (RF)   FT Level   Min. Units   Simultaneous Failure Tolerance   Disk Failure Tolerance
Node              2                        1          3 Nodes      1 Node                           1 Disk
Node              3                        2          5 Nodes      2 Nodes                          2 Disks
Block             2                        1          3 Blocks     1 Block                          1 Disk
Block             3                        2          5 Blocks     2 Blocks                         2 Disks
Rack              2                        1          3 Racks      1 Rack                           1 Disk
Rack              3                        2          5 Racks      2 Racks                          2 Disks

Availability Domain : Nutanix Cluster FT Level with Erasure Coding ( EC )

Awareness Level   FT Level   Min. Units   Simultaneous Failure Tolerance
Node              1          4 Nodes      1 Node
Node              2          6 Nodes      2 Nodes
Block             1          4 Blocks     1 Block
Block             2          6 Blocks     2 Blocks
Rack              1          4 Racks      1 Rack
Rack              2          6 Racks      2 Racks
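The two tables can be turned into a quick eligibility check. This minimal sketch assumes the minimum-unit values in the tables are the only constraint; real sizing also depends on capacity and AOS version. `MIN_UNITS` and `highest_ft` are hypothetical names.

```python
# Minimum number of units (nodes, blocks or racks) per FT level, taken from the tables above.
MIN_UNITS = {
    "rf": {1: 3, 2: 5},  # RF2 -> FT1, RF3 -> FT2
    "ec": {1: 4, 2: 6},  # erasure coding
}


def highest_ft(units: int, erasure_coding: bool = False) -> int:
    """Return the highest FT level the given number of units supports (0 = none)."""
    table = MIN_UNITS["ec" if erasure_coding else "rf"]
    supported = [ft for ft, minimum in table.items() if units >= minimum]
    return max(supported, default=0)


for racks in (2, 3, 4, 5, 6):
    print(racks, highest_ft(racks), highest_ft(racks, erasure_coding=True))
```

For example, four racks allow rack awareness at FT1 with either RF2 or erasure coding. FT2 needs five racks with RF3, or six with erasure coding.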


Conclusion

The Nutanix Availability Domain is a unique feature of the Nutanix hyperconverged platform. It maintains the redundancy of data and components using Fault Tolerance and Replication Factor.
