Nutanix Availability Domain (AD) is a key construct of the Nutanix Acropolis Distributed Storage Fabric (ADSF). It ensures that critical Nutanix components and data blocks remain redundant in every supported failure scenario of the Nutanix distributed system.
What is actually meant by Availability Domain?
First, understand where the name comes from: the domain of a function in mathematics.
The domain of a function is the set of all possible input values for which the function “works” (is defined).
Availability Domain applies the same idea to infrastructure. The variable is a disk, node, block or rack, and the domain is the set of all units that can stand in for a failed one while the cluster keeps working.
When finding the domain of a mathematical function, remember:
- The denominator (bottom) of a fraction cannot be zero
- The number under a square root sign cannot be negative
In short, an availability domain keeps one or two units of spare capacity (n+1 or n+2, where n is the number of disks, nodes, blocks or racks the workload needs) for fault tolerance, so that the Nutanix cluster keeps working without interruption.
Nutanix Supported Awareness Levels
Nutanix supports four levels of awareness:
- Disk (all AOS versions)
- Node / Host (all AOS versions)
- Block / Chassis (AOS 4.5 and later)
- Rack (AOS 5.9 and later)
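The version requirements above can be expressed as a small lookup table. The sketch below is illustrative only: the names `MIN_AOS_VERSION` and `available_awareness_levels` are hypothetical, and the version thresholds come straight from the list above rather than from any Nutanix API.

```python
# Hypothetical lookup built from the list above:
# awareness level -> minimum AOS version (None = supported in all AOS versions).
MIN_AOS_VERSION = {
    "disk": None,
    "node": None,
    "block": (4, 5),
    "rack": (5, 9),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '5.10.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def available_awareness_levels(aos_version: str) -> list[str]:
    """Return the awareness levels the list above says this AOS version supports."""
    current = parse_version(aos_version)
    return [
        level
        for level, minimum in MIN_AOS_VERSION.items()
        if minimum is None or current >= minimum
    ]


print(available_awareness_levels("5.5"))
```

Comparing tuples instead of strings avoids the classic trap where "5.10" sorts before "5.9" as text.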
How Availability Domain Works
Nutanix Availability Domain delivers data resiliency: it maintains redundant copies of data and components so that the failure of one or more components does not push the cluster into an unsafe state.
Availability Domain : Fault Domain Type – Disk
Availability Domain : Fault Domain Type – Node / Host
Awareness Key Components
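Nutanix Availability Domain (AD) protects data, metadata and cluster configuration data against simultaneous failures using Replication Factor (RF 2 / 3) and Redundancy Factor (RF 2 / 3). Replication Factor sets how many copies of each data block are kept, while Redundancy Factor sets how many simultaneous failures the cluster as a whole can tolerate.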
The key awareness-critical components of a Nutanix cluster are:
- Data ( Virtual Machine – VM Data )
- Metadata ( Cassandra )
- Cluster configuration data ( Zookeeper )
Data
Data is one of the most critical components for Nutanix cluster awareness: the cluster must always maintain redundant copies of the VM data.
The Nutanix Distributed Storage Fabric (DSF) ensures that replicas of VM data are written to other nodes, blocks or racks in the cluster, so that if a node, block or rack fails, the data remains available without damage or corruption.
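The placement rule described above (every replica in a different node, block or rack) can be sketched as follows. This is a simplified, hypothetical model: `place_replicas` is not a Nutanix function, and real DSF replica selection also weighs disk usage and performance, whereas this sketch simply takes the first eligible node in name order.

```python
def place_replicas(nodes: dict[str, str], rf: int, preferred: str) -> list[str]:
    """Pick `rf` nodes for the replicas of one data block.

    nodes maps each node name to the fault domain (block or rack) it lives in.
    The first copy stays on the preferred (local) node; every further copy goes
    to a node whose fault domain does not already hold a copy.
    """
    chosen = [preferred]
    used_domains = {nodes[preferred]}
    for node, domain in sorted(nodes.items()):
        if len(chosen) == rf:
            break
        if domain not in used_domains:
            chosen.append(node)
            used_domains.add(domain)
    if len(chosen) < rf:
        raise ValueError(f"need {rf} fault domains, found {len(used_domains)}")
    return chosen


cluster = {"A1": "block-A", "A2": "block-A", "B1": "block-B", "C1": "block-C"}
print(place_replicas(cluster, 2, "A1"))
```

With RF 2 and the write landing on A1, the second copy skips A2 (same block) and goes to B1, so losing block A never removes both copies.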
Use case 1: In a 4-node cluster where a disk fails, each CVM handles 25% of the metadata scan and data rebuild.
Use case 2: In a 10-node cluster, each CVM handles 10% of the metadata scan and data rebuild.
Use case 3: In a 50-node cluster, each CVM handles 2% of the metadata scan and data rebuild.
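All three use cases follow one rule: with N CVMs sharing the rebuild, each CVM handles 100/N percent of the work. The sketch below assumes the work is spread evenly across every CVM; `rebuild_share_per_cvm` is a hypothetical helper for illustration.

```python
def rebuild_share_per_cvm(node_count: int) -> float:
    """Percentage of the metadata scan and data rebuild each CVM handles,
    assuming the work is spread evenly across every CVM in the cluster."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return 100 / node_count


for nodes in (4, 10, 50):
    print(f"{nodes} nodes -> {rebuild_share_per_cvm(nodes):g}% per CVM")
```

This is why larger clusters recover faster from a disk failure: the same rebuild is divided into smaller slices per CVM.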
Metadata ( Cassandra )
Nutanix uses its own heavily customized version of Cassandra to store metadata and other essential information. Cassandra uses a ring-like structure and replicates each piece of metadata to a set of peer nodes in the ring (three or five copies, depending on the cluster fault tolerance level).
Cassandra runs on all nodes of the cluster. These nodes communicate with each other once a second using the Gossip protocol, ensuring that the state of the database is current on all nodes.
Metadata replicas are placed on peers around the ring in a clockwise manner and distributed across blocks or racks, so that no two peers holding the same metadata are on the same block or rack.
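The clockwise, block-aware peer selection can be sketched as a walk around the ring. This is a hypothetical simplification for illustration: `ring_peers` is not part of Nutanix or Cassandra, and the real token assignment is more involved. The ring is modelled as a list of (node, block) pairs in ring order.

```python
def ring_peers(ring: list[tuple[str, str]], owner_index: int, replicas: int) -> list[str]:
    """Walk the ring clockwise from the owning node and collect `replicas`
    nodes, skipping any node whose block (or rack) already holds a copy."""
    peers: list[str] = []
    used_blocks: set[str] = set()
    for step in range(len(ring)):
        node, block = ring[(owner_index + step) % len(ring)]
        if block in used_blocks:
            continue  # same block as an existing peer: not block aware
        peers.append(node)
        used_blocks.add(block)
        if len(peers) == replicas:
            return peers
    raise ValueError(f"only {len(peers)} blocks available for {replicas} replicas")


ring = [("N1", "B1"), ("N2", "B1"), ("N3", "B2"),
        ("N4", "B2"), ("N5", "B3"), ("N6", "B3")]
print(ring_peers(ring, 5, 3))
```

Starting at N6, the walk wraps around the end of the ring to N1, skips N2 (block B1 is taken) and picks N3, giving one peer per block.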
In AOS 5.0, the supported cluster fault tolerance (FT) levels are FT1 and FT2, which correspond to metadata RF3 with data RF2, and metadata RF5 with data RF3, respectively.
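The FT-to-RF mapping above follows a simple pattern, sketched below under our own reading of the numbers: metadata RF is 2 × FT + 1 (so a majority of copies survives FT failures), and data RF is FT + 1 (so at least one copy survives). The function name `replication_for_ft` is hypothetical.

```python
def replication_for_ft(ft: int) -> dict[str, int]:
    """Map a cluster FT level to metadata and data replication factors."""
    if ft not in (1, 2):
        raise ValueError("AOS 5.0 supports FT1 and FT2 only")
    return {
        "metadata_rf": 2 * ft + 1,  # a majority of copies survives `ft` failures
        "data_rf": ft + 1,          # at least one data copy survives `ft` failures
    }


print(replication_for_ft(2))
```

For FT2, five metadata copies minus two failures leaves three, still a majority of five, while three data copies minus two failures leaves one readable copy.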
Cluster Configuration Data
Nutanix uses Zookeeper to store essential cluster configuration data. This data is distributed in a block- or rack-aware manner to ensure availability if a block or rack fails.
In the event of a block or rack failure or outage, for example when one of the Zookeeper nodes goes down (there are three redundant Zookeeper instances), the Zookeeper role is transferred to another node in the cluster.
When the block or rack comes back online, the Zookeeper role is transferred back to restore block or rack awareness.
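The failover behaviour described above can be sketched as follows. The selection logic is a hypothetical illustration, not Nutanix's actual algorithm: `reassign_zookeeper` moves the role off every node in the failed block, prefers a spare node in a block that does not already run Zookeeper, and otherwise accepts any healthy spare, temporarily giving up block awareness until the failed block returns.

```python
def reassign_zookeeper(zk_nodes: list[str], failed_block: str,
                       node_block: dict[str, str]) -> list[str]:
    """Return the new Zookeeper node set after `failed_block` goes offline.

    zk_nodes:   nodes currently running Zookeeper.
    node_block: maps every node in the cluster to its block (or rack).
    """
    survivors = [n for n in zk_nodes if node_block[n] != failed_block]
    used_blocks = {node_block[n] for n in survivors}
    spares = sorted(n for n, block in node_block.items()
                    if block != failed_block and n not in zk_nodes)
    result = list(survivors)
    while len(result) < len(zk_nodes):
        if not spares:
            raise ValueError("not enough healthy nodes to restore Zookeeper")
        # Prefer a spare in a block without Zookeeper; otherwise take any spare.
        fresh = [n for n in spares if node_block[n] not in used_blocks]
        pick = (fresh or spares)[0]
        spares.remove(pick)
        result.append(pick)
        used_blocks.add(node_block[pick])
    return result


blocks = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C", "C2": "C"}
print(reassign_zookeeper(["A1", "B1", "C1"], "A", blocks))
```

In a three-block cluster, losing block A leaves no block without Zookeeper, so the role lands on B2 and block awareness is only restored when block A comes back. With a fourth block available, the replacement would go there instead.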
Nutanix Cluster FT Level
Block and rack awareness are built into the software. Rack awareness is achieved by striping the cluster across racks and reusing block awareness, so that each “block” is actually in a different rack.
Availability Domain : Nutanix Cluster FT Level with RF 2 / 3
Awareness Level | Redundancy Factor (RF) | FT Level | Min. Units | Simultaneous failure tolerance | Disk failure tolerance |
Node | 2 | 1 | 3 Nodes | 1 Node | 1 Disk |
Node | 3 | 2 | 5 Nodes | 2 Nodes | 2 Disks |
Block | 2 | 1 | 3 Blocks | 1 Block | 1 Disk |
Block | 3 | 2 | 5 Blocks | 2 Blocks | 2 Disks |
Rack | 2 | 1 | 3 Racks | 1 Rack | 1 Disk |
Rack | 3 | 2 | 5 Racks | 2 Racks | 2 Disks |
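The minimum unit counts in the RF table follow the pattern 2 × FT + 1 at every awareness level, the same majority-quorum shape as the metadata replication factor. The sketch below encodes that pattern as read from the table; `min_units_rf` and `max_ft_rf` are hypothetical helpers, not Nutanix tooling.

```python
def min_units_rf(ft: int) -> int:
    """Minimum nodes, blocks or racks for RF-based protection at FT level `ft`
    (FT1 / RF2 needs 3 units, FT2 / RF3 needs 5, as in the table)."""
    return 2 * ft + 1


def max_ft_rf(units: int) -> int:
    """Highest FT level (0, 1 or 2) that `units` nodes/blocks/racks can support."""
    return max((ft for ft in (1, 2) if units >= min_units_rf(ft)), default=0)


print(max_ft_rf(4))
```

For example, four blocks are enough for block-aware FT1 but not FT2, which needs a fifth block.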
Availability Domain : Nutanix Cluster FT Level with Erasure Coding ( EC )
Awareness Level | FT Level | Min. Units | Simultaneous failure tolerance |
Node | 1 | 4 Nodes | 1 Node |
Node | 2 | 6 Nodes | 2 Nodes |
Block | 1 | 4 Blocks | 1 Block |
Block | 2 | 6 Blocks | 2 Blocks |
Rack | 1 | 4 Racks | 1 Rack |
Rack | 2 | 6 Racks | 2 Racks |
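Comparing the two tables, erasure coding needs one more unit than RF at each FT level. The sketch below turns both tables into a lookup so a planned layout can be checked; `MIN_UNITS` and `supported_ft_levels` are hypothetical names, and the numbers are copied from the tables above.

```python
# Minimum units from the two tables above: (protection, FT level) -> units.
MIN_UNITS = {
    ("rf", 1): 3,
    ("rf", 2): 5,
    ("ec", 1): 4,
    ("ec", 2): 6,
}


def supported_ft_levels(units: int, protection: str) -> list[int]:
    """FT levels a cluster of `units` nodes/blocks/racks can run with the given
    protection: 'rf' (replication factor) or 'ec' (erasure coding)."""
    if protection not in ("rf", "ec"):
        raise ValueError("protection must be 'rf' or 'ec'")
    return [ft for (kind, ft), minimum in MIN_UNITS.items()
            if kind == protection and units >= minimum]


print(supported_ft_levels(5, "rf"), supported_ft_levels(5, "ec"))
```

A five-block cluster, for instance, can reach block-aware FT2 with RF3 but only FT1 with erasure coding.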
Conclusion
Nutanix Availability Domain is a unique feature of the Nutanix hyperconverged platform that maintains the redundancy of data and components using fault tolerance levels and replication factor.