
Nutanix RF 2 / 3: Redundancy Factor Vs Replication Factor Explained

Do you know the difference between the Nutanix HCI Redundancy Factor (RF 2 / 3), the Replication Factor (RF 2 / 3), and the Resiliency Status? Nutanix RF 2 and RF 3 have different working mechanisms, but both use the same concept: adding an extra layer of protection against component failure, i.e. hardware component failure and software-defined storage media failure.

On the Nutanix HCI platform we can change / increase the Redundancy Factor from RF 2 to RF 3, but not vice versa.

Let's talk about the Nutanix Redundancy Factor vs Replication Factor (Nutanix RF 2 vs RF 3) in detail.

Nutanix Fail-over techniques

Nutanix fail-over techniques are divided into two factors: first, a hardware fail-over technique that makes the hardware redundant using Redundancy Factor RF 2 / 3, and second, a data-redundancy fail-over that protects the data using Replication Factor RF 2 / 3, as follows:

What is Nutanix Redundancy ?

Nutanix Redundancy is the provision of extra functional capability that gives continuous, uninterrupted operation in case of component failure.

Redundancy Factor RF 2 / 3 depends on an automatic fault-tolerance mechanism that relies on specialized hardware to detect a hardware fault or component failure and instantaneously switch to a redundant component, whether the failed component is a processor, memory, power supply unit (PSU), I/O subsystem, storage subsystem, or storage media / drive. The cut-over is seamless and offers non-stop service.

However, redundancy does not necessarily maintain full functionality or fidelity. The system may operate in a degraded state, but it will not be put immediately into a dangerous state.

Failure Hardware Component List

In real scenarios, every hardware vendor provides redundancy for hardware component failures to keep operations uninterrupted.

Nutanix Fail-over Component Table

Here is a list of which component(s) might be redundant, along with the fault-tolerance mechanism.

What is Nutanix Redundancy Factor ?

The Nutanix Redundancy Factor (RF 2 / 3) determines how many component failures can be sustained while still delivering uninterrupted operations.

Nutanix offers two Redundancy Factors for component failure:

Nutanix Redundancy Factor 2 RF-2

Nutanix Redundancy Factor 2 (RF 2) is the default redundancy factor required to build a Nutanix cluster. Nutanix requires a minimum of three nodes to form an enterprise-level, fault-tolerant cluster that can sustain a single component failure.

If a Nutanix cluster is configured with Redundancy Factor 2 (RF-2), the hardware has two-way component redundancy: if one component fails, the backup component takes over the operational load without any interruption, delivering single-component failure sustainability in the Nutanix cluster.

Use Case: A Nutanix server has two Power Supply Units (PSUs) in a redundant state; if one PSU fails or becomes faulty for any reason, the other takes over the responsibility of supplying power to the Nutanix server without any interruption or downtime.

Nutanix Redundancy Factor 3 RF-3

Nutanix Redundancy Factor 3 (RF 3) delivers an advanced level of redundancy. The Nutanix hyper-converged (HCI) cluster requires a minimum of five nodes and can sustain up to two component failures simultaneously.

Redundancy Factor 3 (RF 3) means the hardware has three-way component redundancy: if two components fail, the backup components take over the operational load without any interruption, delivering two-component failure sustainability in the Nutanix cluster.

Nutanix Redundancy Factor 3 (RF 3) is useful in the most critical environments, where Redundancy Factor 2 (RF 2) is not enough to handle more than one component failure.

Use Case: If a Nutanix cluster has five or more nodes, it can sustain the failure of up to two nodes or storage drives without any interruption or breakdown and will continue operating.
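The relationship between redundancy factor, minimum cluster size, and tolerated failures described above can be captured in a minimal sketch. The function names here are illustrative only, not part of any Nutanix API:

```python
# Minimal sketch: how the redundancy factor relates to the minimum
# cluster size and to how many simultaneous failures can be sustained.
# RF 2 -> 3+ nodes, survives 1 failure; RF 3 -> 5+ nodes, survives 2.

RF_RULES = {
    2: {"min_nodes": 3, "tolerated_failures": 1},
    3: {"min_nodes": 5, "tolerated_failures": 2},
}

def can_form_cluster(redundancy_factor: int, node_count: int) -> bool:
    """Check whether a cluster of the given size supports the requested RF."""
    return node_count >= RF_RULES[redundancy_factor]["min_nodes"]

def survives(redundancy_factor: int, failed_components: int) -> bool:
    """Check whether the cluster keeps serving after N simultaneous failures."""
    return failed_components <= RF_RULES[redundancy_factor]["tolerated_failures"]

print(can_form_cluster(3, 4))          # False: RF 3 needs at least five nodes
print(survives(2, 1), survives(2, 2))  # True False
print(survives(3, 2))                  # True
```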

What is Nutanix Resiliency Status ?

Nutanix Resiliency Status, also known as data protection, creates data redundancy and the highest degree of availability for the master data blocks distributed across the Nutanix cluster.

The Nutanix resiliency mechanism ensures the data blocks are redundant (having one or two extra copies): a majority of nodes must agree before anything is committed, which is enforced using the Paxos algorithm. This ensures strict consistency for all data and global metadata stored as part of the platform.
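The majority-agreement rule can be sketched as below. This shows only the quorum-counting idea; real Paxos also involves proposal numbers and multiple phases, and the names are illustrative:

```python
# Sketch of the majority (quorum) rule enforced before a commit:
# a value is committed only when a strict majority of nodes accept it.

def has_majority(acks: int, cluster_size: int) -> bool:
    """True when more than half of the cluster acknowledged the value."""
    return acks > cluster_size // 2

print(has_majority(2, 3))  # True: 2 of 3 nodes agree
print(has_majority(2, 5))  # False: 2 of 5 is not a majority
```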

The Nutanix platform uses checksums to ensure data redundancy and availability in the case of a node or disk failure or corruption. The OpLog acts as a staging area that absorbs incoming writes onto a low-latency SSD tier.

Upon being written to the local OpLog, the data is synchronously replicated to the OpLog of one or two other Nutanix CVMs (depending on the RF) before being acknowledged (Ack) as a successful write to the host. This ensures that the data exists in at least two or three independent locations and is fault tolerant.
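The synchronous write path above can be sketched as follows. The classes are illustrative stand-ins (real CVMs replicate over the network); only the ordering matters: the ack happens after RF copies exist.

```python
# Sketch of the synchronous OpLog write path: the write is acknowledged
# to the host only after RF copies (local + RF-1 remote) are persisted.

class OpLog:
    def __init__(self):
        self.entries = []

    def append(self, data: bytes):
        self.entries.append(data)

def write(data: bytes, local: OpLog, peers: list, rf: int) -> bool:
    """Persist locally, replicate to rf - 1 peer OpLogs, then ack."""
    local.append(data)
    for peer in peers[: rf - 1]:       # one peer for RF 2, two for RF 3
        peer.append(data)              # synchronous: done before the ack
    copies = 1 + min(len(peers), rf - 1)
    return copies >= rf                # ack only once RF copies exist

local, p1, p2 = OpLog(), OpLog(), OpLog()
print(write(b"block", local, [p1, p2], 3))  # True: 3 copies before the ack
```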


Nutanix Replication Factor 2 RF-2

Nutanix Replication Factor 2 (RF 2) requires a minimum of three nodes with Redundancy Factor 2 (RF-2) and maintains one extra copy of each master data block (VM data and OpLog), so the cluster can sustain a single storage drive failure, bit-rot or data corruption on a drive, or a single node failure.

Nutanix Node Failure Scenario

Nutanix Replication Factor 3 RF-3

Nutanix Replication Factor 3 (RF 3) requires a minimum of five nodes and maintains two extra copies of each master data block (VM data and OpLog), so the cluster can sustain two storage drive failures, bit-rot or data corruption on drives, or the failure of up to two nodes.

How Replication Factor Works ?

The Nutanix Replication Factor (RF 2 / 3) for data is configured via Prism at the container level. All nodes participate in OpLog replication to eliminate any "hot nodes", ensuring linear performance at scale.

While the data is being written, a checksum is computed and stored as part of its metadata. Data is then asynchronously drained to the extent store, where the RF is implicitly maintained. In the case of a node or disk failure, the data is re-replicated among all nodes in the cluster to maintain the RF.
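The re-replication step can be sketched as below. The data structures are illustrative, not Nutanix internals; the point is that surviving nodes copy under-replicated blocks until each block again has RF copies.

```python
# Sketch of re-replication after a node failure: drop copies held by
# dead nodes, then add copies on surviving nodes until the RF is restored.

from itertools import cycle

def re_replicate(placement: dict, alive: list, rf: int) -> dict:
    """placement maps block id -> list of nodes holding a copy."""
    targets = cycle(alive)
    for block, nodes in placement.items():
        nodes[:] = [n for n in nodes if n in alive]  # drop dead-node copies
        while len(nodes) < rf:                       # restore the RF
            candidate = next(targets)
            if candidate not in nodes:
                nodes.append(candidate)
    return placement

placement = {"b1": ["A", "B"], "b2": ["B", "C"]}
healed = re_replicate(placement, alive=["A", "C", "D"], rf=2)
print(healed)  # every block has 2 copies again, none on failed node B
```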

Any time the data is read, the checksum is computed to ensure the data is valid. If the checksum and data don't match, the replica of the data is read and replaces the non-valid copy.

Data is also consistently monitored to ensure integrity even when active I/O isn't occurring. The scrubber operation of Stargate (a Nutanix cluster component) continuously scans through extent groups and performs checksum validation when disks aren't heavily utilized. This protects against things like bit rot and corrupted sectors.

Nutanix Redundancy Factor Vs Resiliency / Replication Factor

How to Change the Redundancy Factor (RF) on a Nutanix Cluster?

We can change the Nutanix Redundancy Factor from RF-2 to RF-3, but not vice versa, from the Nutanix Prism web console or from the command line.
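From a CVM command line, the change can be made with ncli. The syntax below is a sketch based on the ncli cluster commands; verify it against the documentation for your AOS version before running it:

```shell
# Check the current redundancy state of the cluster (run from a CVM).
ncli cluster get-redundancy-state

# Increase the redundancy factor from 2 to 3 (a one-way change; this
# assumes the cluster already has at least five nodes).
ncli cluster set-redundancy-state desired-redundancy-factor=3
```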

Watch the video below to see how to do it.

https://www.youtube.com/watch?v=scvzQ3SNAEE
Change Nutanix cluster RF Number

Conclusion

Nutanix delivers a robust fail-over mechanism: the Redundancy Factor (RF 2 / 3) for hardware component fault tolerance and the Replication Factor (RF 2 / 3) for data protection, creating redundant copies of data blocks to handle one- or two-component failure scenarios without any operational interruption in a production environment.
