Nutanix Cluster Most Critical Services

Nutanix HCI Cluster Critical Services

A Nutanix Acropolis cluster runs tons of services in the background on each Nutanix Controller VM (CVM). Together these services create and form the cluster, elect the master and slaves for each critical service, and act as the storage control plane that controls each node's storage and reads and writes data to and from the storage media.

The critical services of a Nutanix Acropolis cluster include: SSL Terminator, Secure File Sync, Medusa, Dynamic Ring Changer, Pithos, Mantle, Hera, Stargate, Insights DB, Insights Data Transfer, Ergon, Cerebro, Chronos, Curator, Athena, Prism, CIM, Alert Manager, Arithmos, Catalog, Acropolis, Uhura, Snmp, Sys Stat Collector, Tunnel, Janus, Nutanix Guest Tools, Minerva CVM, Cluster Config, Mercury, APLOS Engine, APLOS, Lazan, Delphi, XTrim, Cluster Health, etc.

Nutanix Cluster services List

A Nutanix Acropolis cluster is formed from dozens of critical services that keep the cluster running. To list them, run the cluster status command on a CVM. The output looks similar to the following:

cvm$ cluster status

CVM: 10.xx.xx.xx Up:     Service name   Status   Process ID
Zeus UP [17282, 17339, 17340, 17343]
Scavenger UP [19226, 19258, 19259, 19260]
SSLTerminator UP [19912, 19948, 19949, 27427]
SecureFileSync UP [19916, 19972, 19973, 19974]
Medusa UP [21332, 21397, 21398, 21410, 21643]
DynamicRingChanger UP [23925, 23969, 23970, 24038]
Pithos UP [23934, 23995, 23996, 24031]
Mantle UP [23942, 24027, 24028, 24098]
Hera UP [24009, 24079, 24080, 24081]
Stargate UP [24601, 24631, 24632, 24648, 24649]
InsightsDB UP [25207, 25243, 25244, 25319]
InsightsDataTransfer UP [25211, 25273, 25274, 25314, 25315, 25316, 25317]
Ergon UP [25215, 25309, 25310, 25312]
Cerebro UP [25257, 25356, 25357, 25472]
Chronos UP [25291, 25397, 25398, 25435]
Curator UP [25365, 25431, 25432, 25484]
Athena UP [25403, 25468, 25469, 25470]
Prism UP [25785, 25820, 25821, 25893]
CIM UP [25789, 25849, 25850, 25880]
AlertManager UP [25793, 25877, 25878, 25951]
Arithmos UP [25823, 25912, 25913, 26068]
Catalog UP [25860, 25944, 25945, 25947]
Acropolis UP [25910, 26003, 26004, 26006]
Uhura UP [25939, 26059, 26060, 27256]
Snmp UP [25961, 26117, 26119, 26121]
SysStatCollector UP [26010, 26134, 26135, 26136]
Tunnel UP [26066, 26166, 26167]
Janus UP [26113, 26194, 26195]
NutanixGuestTools UP [26198, 26244, 26245, 26247]
MinervaCVM UP [27051, 27093, 27094, 27095, 27474]
ClusterConfig UP [27059, 27125, 27126, 27128]
Mercury UP [27068, 27157, 27158, 27185]
APLOSEngine UP [27098, 27181, 27182, 27183]
APLOS UP [27489, 27613, 27614, 27617]
Lazan UP [27575, 27642, 27643, 27644]
Delphi UP [27586, 27671, 27672, 27673]
XTrim UP [27623, 27698, 27699, 27701]
ClusterHealth UP [27663, 27740, 27741]

As you can see, the Nutanix Controller VM (CVM) runs dozens of services.
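The status listing is regular enough to check with a script. The following is a minimal Python sketch, assuming each service line has the form `Name STATUS [pids]` as in the sample output above. The function names `parse_cluster_status` and `services_not_up`, and the assumption that a stopped service is reported as `DOWN`, are illustrative and not part of any Nutanix tool.

```python
import re

# Matches lines such as "Zeus UP [17282, 17339, 17340, 17343]".
# Assumption: the status word is either UP or DOWN.
SERVICE_LINE = re.compile(r"^\s*(\w+)\s+(UP|DOWN)\s*(?:\[(.*?)\])?\s*$")


def parse_cluster_status(text):
    """Return {service_name: (status, [pids])} for every service line found."""
    services = {}
    for line in text.splitlines():
        match = SERVICE_LINE.match(line)
        if not match:
            continue  # skip the "CVM: ..." header and blank lines
        name, status, pids = match.groups()
        if pids and pids.strip():
            pid_list = [int(p) for p in pids.split(",")]
        else:
            pid_list = []
        services[name] = (status, pid_list)
    return services


def services_not_up(text):
    """Return the sorted names of services whose status is not UP."""
    parsed = parse_cluster_status(text)
    return sorted(name for name, (status, _) in parsed.items() if status != "UP")


# Hypothetical sample: the Stargate line is invented to show a failing service.
sample = """CVM: 10.xx.xx.xx Up:
Zeus UP [17282, 17339, 17340, 17343]
Tunnel UP [26066, 26166, 26167]
Stargate DOWN []
"""

print(services_not_up(sample))
```

Feeding the saved output of cluster status into `services_not_up` gives a quick list of services to investigate.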

Explore Nutanix Acropolis Cluster Services

It is very important to know every Nutanix HCI Acropolis cluster service and its responsibilities to get a good understanding of how the cluster works. Let's explore the Nutanix services one by one in depth. Read also Nutanix Cluster’s Components and Acropolis Services Explained

The Nutanix Acropolis cluster services list is:
Medusa, SSLTerminator, SecureFileSync, DynamicRingChanger, Pithos, Mantle, Hera, Stargate, InsightsDB, InsightsDataTransfer, Ergon, Cerebro, Chronos, Curator, Athena, Prism, CIM, AlertManager, Arithmos, Catalog, Acropolis, Uhura, Snmp, SysStatCollector, Tunnel, Janus, NutanixGuestTools, MinervaCVM, ClusterConfig, Mercury, APLOSEngine, APLOS, Lazan, Delphi, XTrim, ClusterHealth, etc.

Zeus Service

Nutanix Zeus is the interface to Zookeeper. It stores all cluster information and gives other components access to it.
A key element of a distributed system is a method for all nodes to store and update the cluster’s configuration.

Zeus maintains the cluster configuration, which contains information about the physical components ( nodes, disks ) and logical components ( storage containers ) in the cluster. Zeus keeps track of node IP addresses, capacities, and data replication rules such as Nutanix RF-2 vs RF-3. Zeus is the Nutanix library that all other components use to access the cluster configuration.

Medusa Service

Distributed systems that store data for other systems (for example, a hypervisor that hosts virtual machines) must have a way to keep track of where that data is. In the case of a Nutanix cluster, it is also important to track where the replicas of that data are stored. This is the job of the Nutanix Medusa service.

Medusa is a Nutanix abstraction layer that sits in front of the database that holds this metadata. The database is distributed across all nodes in the cluster, using a modified form of Apache Cassandra.

Scavenger Service

Nutanix Scavenger is the file cleanup manager in the Nutanix cluster. It works with Medusa, Curator, and Stargate to clean up garbage data from the storage media.

SSL Terminator

Nutanix SSL Terminator uses a certificate for authentication before sending encrypted data from a client computer to the web server. SSL termination, a form of SSL offloading, shifts some of this responsibility from the web server to a different machine.

With SSL termination, the load balancer decrypts the traffic at the traffic manager and passes unencrypted traffic to the back-end node. Because of this, the customer’s back-end nodes don’t know what protocol the client requested. Therefore the X-Forwarded-Proto (XFP) header is added to identify the originating protocol of an HTTP request as “http” or “https”, depending on what protocol the client requested.

For HTTP to HTTPS redirection, secure-Port must be set to 443 and secure-Traffic-Only must be true.
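How a back-end node can use the X-Forwarded-Proto header to decide on a redirect can be sketched in a few lines of Python. This is generic HTTP logic under stated assumptions, not Nutanix code: the function names `original_scheme` and `redirect_target` and the `secure_port` parameter are hypothetical, and the header is assumed to be set by a trusted terminator.

```python
def original_scheme(headers):
    """Return the protocol the client used, read from X-Forwarded-Proto.

    Header names are case-insensitive. If the header is missing, the
    request is treated as plain "http".
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-forwarded-proto", "http")
    # A chain of proxies may produce "https, http"; the first entry is
    # the protocol seen by the outermost proxy.
    return value.split(",")[0].strip().lower()


def redirect_target(headers, host, path, secure_port=443):
    """Return the https URL to redirect to, or None if already on https."""
    if original_scheme(headers) == "https":
        return None
    port = "" if secure_port == 443 else f":{secure_port}"
    return f"https://{host}{port}{path}"


print(redirect_target({"X-Forwarded-Proto": "http"}, "cluster.example.com", "/console"))
```

A request that already arrived over HTTPS returns None, so the back end serves it directly instead of redirecting in a loop.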

Secure File Sync

Nutanix Secure file sync commonly encrypts data in flight using Secure Sockets Layer or Transport Layer Security encryption. Rather than simply storing synchronized files directly in a device’s file system as a consumer-grade service might, products in the EFSS market usually enforce file system encryption for synchronized data or store data in an encrypted “vault” on the device.

Dynamic Ring Changer

Nutanix metadata uses the Dynamic Ring Changer to perform metadata Replication Factor ( RF ) migration scans, for example to increase the RF from RF2 to RF3. The Dynamic Ring Changer builds the metadata ring according to the configured RF and maintains it across all nodes in the Nutanix cluster.

Pithos

Nutanix Pithos is the virtual disk ( vDisk ) configuration manager. It works as a storage service on NDFS ( Nutanix Distributed File System ) to manage vDisk configuration data. Pithos runs on every node / host and is built on top of the Cassandra distributed metadata store. Pithos stores data as objects, organized in containers, in the Nutanix HCI platform. Read more about Pithos manager

Mantle

Nutanix Mantle is the local key manager. It rotates the master key and syncs it to each node in the cluster, delivering enhanced data security through encryption of the data stored in the Nutanix Acropolis cluster.

Nutanix Mantle supports the following guarantees:

  • Data is encrypted at all times.
  • Data is inaccessible in the event of drive or node theft.
  • Data on a drive can be securely destroyed.
  • Re-key of the master encryption key at arbitrary times is supported.

Nutanix implements a data security configuration that uses AOS functionality along with the key management server. Nutanix uses open standards (the KMIP protocol) for interoperability, together with strong security on the Nutanix CVM network ports.

Hera

Hera stands for High Efficiency Reliable Access ( HERA ), a data store gateway that scales high-speed database access to terabytes of data in real time.

Advantages of Nutanix Hera:

  • Hera is a data access gateway for databases and a key enabler for scaling databases and improving their availability.
  • Protects the database from resource exhaustion by evicting poorly performing queries.
  • Intelligently routes read/write traffic for better load balancing.
  • Improves tolerance to database outages.
  • Provides high-performance secured connections between applications and Hera.
  • Provides domain-agnostic database sharding for horizontal database scaling.
  • Provides automatic transaction application fail-over between replica databases.

Read more about the Hera software: HERA is PayPal's open-source gateway for database access

Stargate

Nutanix Stargate is the data I/O manager. It is responsible for all data management and I/O operations and is the main interface from the hypervisor (via NFS, iSCSI, or SMB). The Stargate service runs on every node in the cluster in order to serve localized I/O.

A distributed system that presents storage to other systems (such as a hypervisor) needs a unified component for receiving and processing the data it receives.

All read and write requests are sent across the Nutanix vSwitch to the Stargate process running on that node. Stargate depends on Medusa to gather metadata and Zeus to gather cluster configuration data.

Read Also How Nutanix Protects from Bit-rot Data silent corruption

Insights DB Database

Nutanix InsightsDB / Insights DB / Insights Database is a fast, lightweight database with consistent performance. Cluster services such as Prism use it to store entity configuration and statistics.

Insights Data Transfer

Nutanix Insights Data Transfer Service automates data movement from SaaS applications to capacity storage on a scheduled, managed basis. This is much more efficient because the hypervisor doesn’t need to be the ‘man in the middle’ for operations such as full copy and zeroing.

However, contrary to VAAI, which has a ‘fast file’ clone operation (using writable snapshots), the ODX primitives do not have an equivalent and perform a full copy. Given this, it is more efficient to rely on the native DSF clones, which can currently be invoked via nCLI, REST, or PowerShell cmdlets.

Ergon

Nutanix Ergon is the task manager. It is responsible for starting tasks and for killing running or stuck tasks, automatically or manually if needed. There are two Ergon commands. The first is ergon_update_task, which is available prior to AOS version 5.5. The second is ecli ( Ergon Command Line Interface ), introduced with AOS version 5.5, which lets you manually kill tasks that are stuck.

Example:
(Option 1) nutanix@NTNX-A-CVM::~$ acli task.list
(Option 2) nutanix@NTNX-A-CVM::~$ ecli task.list include_completed=false
Output:
Task UUID                             Parent Task UUID  Component  Sequence-id  Type                    Status
3f814122-0bd8-4cc4-af3d-f17763c2e7f0                    lcm        1            kLcmInventoryOperation  kRunning

To mark the stuck task as succeeded (use --task_status=aborted instead to abort it):
nutanix@NTNX-A-CVM::~$ ergon_update_task --task_uuid='3f814122-0bd8-4cc4-af3d-f17763c2e7f0' --task_status=succeeded
WARNING: Using this command can cause database corruption and complete system failure, if used improperly.
Are you sure you want to continue? (y/n)

Cerebro

Nutanix Cerebro is the DR replication manager. It replicates snapshots from the DC site to the DR site and vice versa, if a remote site and protection domain are configured. Cerebro asks Stargate to perform the I/O operations.

Chronos

Nutanix Chronos is the job and task scheduler. It receives the jobs and tasks that result from a Curator scan and schedules the tasks on each node. A Chronos master is elected and runs on the same node as the Curator master.

Curator

Nutanix Curator is the cleanup manager. It runs MapReduce jobs that scan the cluster metadata and the VMs' master data on DSF storage. Curator depends on Zeus to learn which nodes are available, and on Medusa to gather metadata. Based on that analysis, it sends commands to Stargate.
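The MapReduce pattern behind such a scan can be illustrated with a toy Python sketch. It is not Curator's implementation: the record layout (pairs of extent ID and node name), the function names, and the idea of flagging extents with too few replicas are all assumptions chosen only to show a map phase followed by a reduce phase.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (extent_id, 1) for every replica record found in the metadata."""
    for extent_id, _node in records:
        yield extent_id, 1


def reduce_phase(pairs):
    """Sum the emitted values per key, giving a replica count per extent."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def under_replicated(records, replication_factor):
    """Return the sorted extent IDs that have fewer replicas than required."""
    counts = reduce_phase(map_phase(records))
    return sorted(k for k, count in counts.items() if count < replication_factor)


# Hypothetical metadata: extent "e2" has only one replica.
records = [("e1", "node-A"), ("e1", "node-B"), ("e2", "node-A")]
print(under_replicated(records, replication_factor=2))
```

In a real cluster the map phase would run on every node in parallel over that node's share of the metadata, and the result of the reduce phase would become tasks sent on to other services.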

Athena

Nutanix Athena ( Authentication and Authorization ) is the service that authenticates users and authorizes their requests against the cluster.

Prism

Nutanix Prism is the single pane of glass for managing the Nutanix HCI infrastructure from one web-based console. It works as a service that can manage distributed resources across the cluster, and it gives you the flexibility to manage and monitor the objects and services of the Nutanix cluster. Read more about Nutanix Prism Core Architecture Explained

Read Also Nutanix Prism web console is slow, not working, hanging issues troubleshooting

Common Information Model CIM

The Nutanix CIM ( Common Information Model ) server is a daemon that accepts connections using the Common Information Model. Vendors and OEMs typically use it so that their management and monitoring software and services can connect to their servers. Nutanix uses CIM to connect the Acropolis cluster with the IPMI hardware management interface and to retrieve hardware information and hardware health status for display in the Nutanix Prism / Prism Central console.

Alert Manager

Nutanix Alert Manager periodically receives hardware and Nutanix Acropolis cluster health status reports and raises alerts according to severity, as described below:

Displays the severity level of this condition. There are three levels:
  • Critical: A “critical” alert is one that requires immediate attention, such as a failed Controller VM.
  • Warning: A “warning” alert is one that might need attention soon, such as an issue that could lead to a performance problem.
  • Informational: An “informational” alert highlights a condition to be aware of, for example, a reminder that the support tunnel is enabled.
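When alerts are handled in a script, the three severity levels map naturally onto a sort order. The following Python sketch assumes alerts are plain dictionaries with `severity` and `title` keys. That layout and the names `SEVERITY_ORDER` and `sort_alerts` are hypothetical and not a Prism API.

```python
# Lower number means more urgent, following the three levels listed above.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "informational": 2}


def sort_alerts(alerts):
    """Return alerts ordered from most to least severe.

    sorted() is stable, so alerts of equal severity keep their input order.
    """
    return sorted(alerts, key=lambda alert: SEVERITY_ORDER[alert["severity"].lower()])


# Hypothetical alerts, using the examples from the severity descriptions.
alerts = [
    {"severity": "Informational", "title": "Support tunnel is enabled"},
    {"severity": "Critical", "title": "Controller VM failed"},
    {"severity": "Warning", "title": "Possible performance problem"},
]

for alert in sort_alerts(alerts):
    print(alert["severity"], "-", alert["title"])
```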

Nutanix Alert Manager provides options to manually acknowledge and resolve alerts.

If needed, you can restart the cluster health service with this command:
cvm$ allssh genesis stop cluster_health; cluster start

Arithmos

Nutanix Arithmos is the stats engine that provides the metrics shown in Prism for the Nutanix cluster, hosts, VMs, hardware, IOPS, latency, etc. Hypervisor-related metrics are retrieved by the Hyperint component running in the Nutanix CVM.

Arithmos stores the information it receives from Hyperint, and the Prism service uses the information stored in Arithmos to display it in the Prism GUI. The Hyperint network ports are listed here:

TCP / 2030    Hyperint OR Acropolis
TCP / 2031    Hyperint Monitor
TCP / 2032    Hyperint JMX
TCP / 2033    Acropolis Hyperint slave

To restart the Hyperint service on the Nutanix cluster, run the following command:
cvm$ allssh ~/cluster/bin/genesis stop hyperint && cluster start

Read Also Nutanix Acropolis AHV Core Architecture Explained

Catalog Service

The Nutanix Catalog service, available in Prism Central, stores VM snapshots and images. A Prism Central or self-service administrator creates this catalog of objects so that self-service users who have permissions to create a VM can use them.

Acropolis Service

The Nutanix Acropolis service runs on every Controller VM (CVM) in a master-slave fashion, with an elected Acropolis Master that is responsible for task scheduling, execution, IPAM, etc. As with other components that have a master, if the Acropolis Master fails, a new one is elected. Read more about Nutanix Acropolis Service

Read Also Nutanix Acropolis acli vs ncli Command Explained

SNMP Protocol

Nutanix provides built-in SNMP support so that the cluster can be integrated with an SNMP server to monitor Nutanix HCI cluster activity. This data can help IT professionals keep their finger on the pulse of all their managed devices and applications. Every device within the network can be queried in real time with SNMP, TCP, and other types of probes for its performance metrics. Read more how SNMP Protocol works

Sys Stat Collector

Nutanix Sys Stat Collector is a service that collects the system logs and generates system stats, which are stored in a local database for three months in Prism Element.

Secure Tunnel Service

Nutanix SSH Secure Tunnel Service provides the built-in Remote Support Tunnel feature in Prism. If the customer allows it, Nutanix support can access the Nutanix Acropolis cluster through the CVM's remote secure tunnel over SSH and troubleshoot directly on the cluster, without any third-party remote desktop software. Read more about Nutanix Secure tunnel Service and Network ports list

Janus

Nutanix Janus is the service behind the Cloud Connect configuration in the Prism web console, which replicates VMs to Amazon Web Services. It runs on network port 2022.

Read Also Nutanix HCI Infra Network Port List

Nutanix Guest Tools

Nutanix Guest Tools (NGT) is a software bundle that you can install in a guest virtual machine (Microsoft Windows or Linux) to enable the advanced functionality provided by Nutanix. The NGT bundle includes the Nutanix Guest Agent (NGA) service, which communicates with the Nutanix Controller VM. Read more Nutanix Guest Tool NGT Installations in Windows and Linux VM

Minerva CVM

Nutanix Minerva CVM is the CVM-side service for Nutanix Files. When a Files file server is deployed on a Nutanix cluster, it creates multiple (three or more) guest VMs on the cluster, called file server VMs (FSVMs). The Files cluster is composed of these FSVMs.

To start a file server, first log in with SSH to a CVM in the cluster that runs Files.

Get a list of file servers.
nutanix@cvm$ minerva get_fileservers

To start a file server, enter the following command from one of the CVMs in the Nutanix base cluster.
nutanix@cvm$ minerva -f file_server_uuid start

Note: Replace file_server_uuid with the UUID of the file server.

To start all file servers, enter the following command from one of the CVMs in the base Nutanix cluster:
nutanix@cvm$ minerva -a start

Cluster Config

The Nutanix cluster configuration is stored by Zookeeper, which centralizes cluster information such as host IP addresses, hosts, disks, containers, and the redundancy factor ( RF2 / RF3 ). Zookeeper runs on either three or five nodes, depending on the redundancy factor that is applied to the cluster. One Zookeeper node is elected as the leader in the cluster.
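The relationship between redundancy factor and Zookeeper node count can be written down as a small Python sketch. It assumes that RF2 corresponds to three Zookeeper nodes and RF3 to five, and that the ensemble needs a simple majority to stay available. The function names are illustrative only.

```python
def zookeeper_nodes(redundancy_factor):
    """Return the Zookeeper node count for a redundancy factor (2 or 3).

    Assumption: RF2 uses three Zookeeper nodes and RF3 uses five.
    """
    nodes = {2: 3, 3: 5}
    if redundancy_factor not in nodes:
        raise ValueError("redundancy factor must be 2 or 3")
    return nodes[redundancy_factor]


def quorum_size(node_count):
    """Smallest majority of the ensemble."""
    return node_count // 2 + 1


def tolerated_failures(node_count):
    """Number of Zookeeper nodes that can fail while a majority remains."""
    return node_count - quorum_size(node_count)


for rf in (2, 3):
    n = zookeeper_nodes(rf)
    print(f"RF{rf}: {n} nodes, quorum {quorum_size(n)}, tolerates {tolerated_failures(n)}")
```

With three nodes the ensemble survives one failure, and with five it survives two, which matches the number of failures each redundancy factor is meant to tolerate.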

Read Also Nutanix Prism Central : Pro Vs Starter Features

Mercury

Nutanix Mercury handles dynamic data structures by providing several abstract data types in the standard library that manage collections of items with different operations and trade offs. Programmers can also create their own abstract data types. Read more about Mercury project

APLOS Engine

Nutanix APLOS Engine creates and manages recovery plans. APLOS acts as a proxy for incoming requests and feeds them to the APLOS Engine.
Read more Nutanix CALM APLOS Engine

Lazan

Nutanix Lazan is the resource allocator running on Prism Central. It provides a target cluster for replication based on certain constraints. The local Cerebro talks to Lazan on Prism Central (PC) to pick the right cluster. Once it finds the right Prism Element, the remote site is created automatically.

Delphi

Nutanix Delphi method is a forecasting process framework based on the result. Delphi Technique is a method used to estimate the likelihood and outcome of future events. A group of experts exchange views, and each independently gives estimates and assumptions to a facilitator who reviews the data and issues a summary report. Read more Delphi Technique / method

XTrim

Nutanix XTrim / Xtreme Computing Platform includes a variety of features to enable you to administer your environment according to your current and future needs. You can use the default feature set of AOS, upgrade to a richer feature set, update / install your license ( Starter, Pro and Ultimate ) for a longer term, or reassign existing licenses to nodes or clusters as needed.

Cluster Health

Nutanix Cluster health dashboard displays dynamically updated health information about VMs, hosts, disks, storage pools, containers, cluster services in the cluster.

A set of health checks runs regularly and provides a range of cluster health indicators. You can specify which checks to run and configure the schedule and other parameters for each health check.

The cluster health checks cover a range of entities including Nutanix AOS, hypervisor, and hardware components.

Epsilon

Nutanix Epsilon is the Calm orchestration engine running on Prism Central (PC). To facilitate DR orchestration, Epsilon carries out VM Restore, Migrate / Failover, and Power On requests. Epsilon ensures that all task executions are handled, whether they succeed or fail. This service runs as a Docker container.

Read Also Nutanix Karbon Integrated with Kubernetes Platform

Conclusion

The Nutanix Acropolis services above are critical to forming the Nutanix cluster and running it with proper functionality. Nutanix Acropolis is built from many services. In other words, a Nutanix Acropolis cluster is a storm of services: everything is a service and runs as a service in the Nutanix cluster.

If you have a question or doubt, mention it in a comment to get more clarity.

Thanks for being with HyperHCI Tech Blog to learn something new every day!