How Nutanix Protects Against Bit-rot (Silent Data Corruption)

The Nutanix Controller VM (CVM) is designed to protect against Bit-rot (silent data corruption) on local SSD and HDD storage and keep VM data safe.

Bit-rot is one of the most dangerous threats to data stored on any type of storage media, and Software Defined Storage (SDS) offers a solution for it.

What is Bit-rot?

Bit-rot is the silent killer of data: the gradual deterioration of the integrity of data stored on storage media.

Bit-rot corrupts data in the background without your knowledge.
It is also known as data rot, data decay, bit decay, or silent corruption.

Bit-rot works silently, corrupting data bits slowly over time, and its impact grows as the disk ages with use.

Example: Worms eat a leaf bit by bit, leaving many holes until the leaf is useless. Bit-rot works the same way: it corrupts the data bits stored on an SSD/HDD drive until the saved data becomes useless and can no longer be accessed, and you see an error such as "Data corrupted or missing".

Why Does Bit-rot Data Corruption Happen?

There are many reasons why Bit-rot takes place on storage and silently corrupts data over time.

Bit-rot is a well-known issue for storage vendors. Arrays such as Dell EMC PowerMax, X-IO ISE 900, HPE 3PAR StoreServ, NetApp AFA A800, Fujitsu ETERNUS, IBM FlashSystem 9100, and Tegile IntelliFlash are designed with core routines that periodically scan the storage media to detect Bit-rot.

Bit-rot may happen for the following reasons:

  • Change or loss of the electric charge (on SSDs) or magnetic state (on HDD platters) that stores the data bits
  • Wear and attrition of the storage media
  • Dust contamination
  • Radiation affecting the SSD/HDD
  • High heat affecting the SSD/HDD
  • A disk that makes noise when powered on, a sign that it is failing
  • Disk overheating
  • Corrosion, and the heat generated by "thermal asperities" during erase or write activity, which can cause the data bits in affected sectors to fade

How Nutanix Protects Against Bit-rot

Nutanix HCI differs from other HCI solutions in how efficiently it handles data protection, prevention, and integrity. Its storage layer, ADSF (Acropolis Distributed Storage Fabric), provides highly available and highly resilient storage.

Nutanix ADSF provides core software-based data protection mechanisms that prevent data loss due to Bit-rot, along with data recovery mechanisms.

Bit-rot Prevention Methods

A Nutanix cluster uses three methods, handled by the Nutanix CVMs:

Checksum Algorithm

Nutanix uses a checksum for every I/O operation (reads and writes) and keeps the data checksums in its metadata store to maintain data integrity.

These I/O operations are performed by the Nutanix cluster component called Stargate. Stargate computes the checksum of each piece of data written to local storage and records it in the metadata DB so that future reads can be validated.

Stargate requires every read and write to carry a checksum value, which it uses to validate data integrity.
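The checksum idea can be illustrated with a small sketch. It is not Stargate's code: the class names, the `flip_bit` helper, and the choice of CRC32 (from Python's `zlib`) are assumptions made for illustration only. The data and its checksum are kept in separate maps, loosely mirroring "data on disk, checksum in the metadata store", and a read recomputes the checksum and refuses to return data that no longer matches.

```python
import zlib


class ChecksumError(Exception):
    """Raised when stored data no longer matches its recorded checksum."""


class ChecksummedStore:
    """Hypothetical toy block store that keeps data and checksums apart,
    loosely mirroring 'data on disk, checksum in the metadata store'."""

    def __init__(self):
        self.disk = {}      # block_id -> bytes as stored on the "drive"
        self.metadata = {}  # block_id -> CRC32 recorded at write time

    def write(self, block_id, data: bytes) -> None:
        # Write path: store the data and record its checksum in metadata.
        self.disk[block_id] = data
        self.metadata[block_id] = zlib.crc32(data)

    def read(self, block_id) -> bytes:
        # Read path: recompute the checksum and compare before returning.
        data = self.disk[block_id]
        if zlib.crc32(data) != self.metadata[block_id]:
            raise ChecksumError(f"checksum mismatch on block {block_id!r}")
        return data


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Hypothetical helper: simulate Bit-rot by flipping one bit in a copy."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)
```

For example, after `s.write("blk-1", b"hello")`, replacing `s.disk["blk-1"]` with `flip_bit(s.disk["blk-1"], 0)` makes the next `s.read("blk-1")` raise `ChecksumError` instead of silently returning `b"iello"`. CRC32 is used here only because it ships with Python and detects every single-bit error.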

Read more: Nutanix Cluster’s Components Explained

Replication Factor (RF 2 / RF 3)

Nutanix uses Replication Factor RF 2 or RF 3 to keep 2 or 3 replicas (copies) of VM data on the ADSF file system, maintaining data redundancy in case of a drive or host failure.

Use case: If any byte of the primary copy of the data is corrupted for any reason (Bit-rot, drive failure, or host failure), Stargate detects it, because it validates every byte of data against its checksum during reads and writes. When corruption is detected, Stargate copies the correct data from a healthy replica and repairs the corrupted copy.
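The repair flow in the use case above can be sketched as follows. This is a hypothetical model, not the ADSF data layout: one block is held as RF in-memory replicas, a single CRC32 recorded at write time serves as the reference checksum, and a read returns the first replica that still matches while overwriting any replica that does not.

```python
import zlib


class ReplicatedBlock:
    """Hypothetical toy model of one data block stored with RF copies."""

    def __init__(self, data: bytes, rf: int = 2):
        if rf not in (2, 3):
            raise ValueError("this sketch models RF 2 or RF 3 only")
        self.checksum = zlib.crc32(data)  # recorded once, at write time
        self.replicas = [bytearray(data) for _ in range(rf)]

    def _is_good(self, i: int) -> bool:
        # A replica is healthy if it still matches the recorded checksum.
        return zlib.crc32(self.replicas[i]) == self.checksum

    def read(self) -> bytes:
        # Find the first replica whose checksum still matches.
        good = next((i for i in range(len(self.replicas)) if self._is_good(i)), None)
        if good is None:
            raise IOError("all replicas corrupted; data unrecoverable")
        # Repair every corrupted replica by copying the healthy one.
        for i in range(len(self.replicas)):
            if not self._is_good(i):
                self.replicas[i] = bytearray(self.replicas[good])
        return bytes(self.replicas[good])
```

With `rf=2`, corrupting one replica (for example `b.replicas[0][0] ^= 0x01`) still lets `b.read()` return the original bytes and restores the damaged copy. Only when every replica is corrupted does the read fail, which is why keeping a second or third copy matters.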

MapReduce Algorithm

The Nutanix MapReduce framework is run by the cluster component called Curator, which drives background scrubbing that cleans up junk data (data no longer in use or deleted by guest VMs) to keep the disks healthy.

Disk scrubbing runs at low priority across all disks in the cluster. Any corrupted data causes that replica to be marked as bad, which triggers re-replication from a good replica.

So even if a disk sector goes bad after a successful I/O, Stargate’s scrubber operation will detect it and create new replicas as needed.
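A scrub pass like the one described above can be sketched in a MapReduce style. This is an illustration under assumed names and layout, not Curator's or Stargate's implementation: each disk is a dict of `block_id -> bytes`, checksums live in a shared `metadata` dict, the map phase checks every block on a disk, and the reduce phase groups results per block so bad replicas can be rewritten from a good one. Low-priority scheduling and throttling are not modeled.

```python
import zlib
from collections import defaultdict


def map_disk(disk_id, disk, metadata):
    """Map phase: scan one disk and emit (block_id, (disk_id, ok)) pairs."""
    for block_id, data in disk.items():
        yield block_id, (disk_id, zlib.crc32(data) == metadata[block_id])


def reduce_block(results):
    """Reduce phase: split one block's replica results into good and bad disks."""
    good = [d for d, ok in results if ok]
    bad = [d for d, ok in results if not ok]
    return good, bad


def scrub(disks, metadata):
    """One scrub pass over all disks. Repairs bad replicas from a good copy
    and returns the (block_id, disk_id) pairs that were re-replicated."""
    grouped = defaultdict(list)
    for disk_id, disk in disks.items():
        for block_id, result in map_disk(disk_id, disk, metadata):
            grouped[block_id].append(result)

    repaired = []
    for block_id, results in grouped.items():
        good, bad = reduce_block(results)
        if not good:
            continue  # no healthy copy left, so this pass cannot repair it
        source = disks[good[0]][block_id]
        for disk_id in bad:
            disks[disk_id][block_id] = source  # re-replicate from the good copy
            repaired.append((block_id, disk_id))
    return repaired
```

Splitting the work this way means each disk can be scanned independently, and only the small per-block results need to be gathered to decide which replicas to rebuild.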

Conclusion

Nutanix proactively scans storage for Bit-rot errors and, when Bit-rot is detected on a drive sector, mitigates it by creating a corrected copy of the data.

Data integrity work is done by both Curator and Stargate, each in a different way.


Thanks for reading, and stay tuned to the Hyper HCI blog.
You can also follow Hyper HCI Blog on social networks to get the latest and trending technology posts.
