In this 16th part of the VMware vSphere 8 Nested Home Lab series, we will learn to configure VMware vSphere HA.
VMware vSphere HA is a clustering feature that ensures automatic failover of virtual machines (VMs) in case of host failure. It monitors host and VM health, and if a failure is detected, it restarts affected VMs on healthy hosts within the cluster, minimizing disruption and preserving uptime.
How HA Works Behind the Scenes
- Master-Slave Architecture: One host is elected as the master to monitor the cluster and coordinate failovers. (Current VMware documentation calls these roles primary and secondary.)
- Heartbeat Monitoring: Hosts exchange heartbeats via management networks and datastores to detect failures.
- Isolation Response: If a host loses network connectivity but remains powered on, HA can shut down or restart VMs based on configured policies.
- Admission Control: Ensures enough resources are reserved to restart VMs in case of failure.
Core Features of vSphere HA
Automatic VM Restart
- Detects host failures and restarts affected VMs on healthy hosts.
- Ensures minimal disruption to services.
Host Monitoring
- Uses heartbeats and network pings to monitor ESXi host health.
- Declares a host failed if it stops responding for longer than a set interval.
VM Monitoring
- Tracks VM heartbeats via VMware Tools.
- Restarts VMs if the guest OS becomes unresponsive.
- Can be extended to Application Monitoring for deeper insight.
Admission Control
- Reserves resources to guarantee failover capacity.
- Policies include:
  - Slot-based
  - Dedicated failover hosts
  - Cluster resource percentage
Proactive HA
- Monitors hardware health (e.g., CPU, memory, power supply).
- Migrates VMs off degraded hosts before failure occurs.
- Integrates with hardware vendors for predictive alerts.
Isolation Response
- Handles cases where a host loses network connectivity but isn’t fully down.
- Options include:
  - Power off and restart VMs
  - Shut down and restart VMs
  - Leave VMs powered on
Heartbeat Datastores
- Adds redundancy to host monitoring using datastore heartbeats.
- Helps distinguish between host failure and network partition.
Failure Detection & Response
- Covers multiple failure types:
  - Host failure
  - Datastore APD (All Paths Down)
  - Datastore PDL (Permanent Device Loss)
- Configurable responses include issuing events or restarting VMs.
Restart Priorities & Dependencies
- Define which VMs restart first after a failure.
- Set dependencies so critical services boot before others.
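To make the idea of priorities and dependencies concrete, here is a small Python sketch of how a restart order could be derived. It is a simplified model, not how vSphere HA is implemented: the VM names are hypothetical, only the High/Medium/Low labels used in this post are modeled, and it needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Lower rank restarts earlier. Labels follow the priorities named in this post.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(priorities, depends_on):
    """Return VM names in an order that honours dependencies first, then priority.

    priorities: dict of VM name -> "High" | "Medium" | "Low"
    depends_on: dict of VM name -> set of VMs that must restart before it
    """
    graph = {vm: depends_on.get(vm, set()) for vm in priorities}
    sorter = TopologicalSorter(graph)
    sorter.prepare()

    order, ready = [], []
    while sorter.is_active():
        # Collect every VM whose dependencies are already restarted.
        ready.extend(sorter.get_ready())
        # Among those, pick the highest priority (name breaks ties).
        ready.sort(key=lambda vm: (PRIORITY_RANK[priorities[vm]], vm))
        vm = ready.pop(0)
        order.append(vm)
        sorter.done(vm)
    return order


# Hypothetical lab VMs: the web tier needs the app tier, which needs the database.
priorities = {"db01": "High", "app01": "Medium", "web01": "Medium", "test01": "Low"}
depends_on = {"app01": {"db01"}, "web01": {"app01"}}
print(restart_order(priorities, depends_on))
```

The low-priority test VM has no dependencies, yet it still ends up last because every higher-priority VM becomes eligible before it is picked.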
Advanced Options
- Fine-tune HA behavior using custom configuration strings.
- Useful for complex or non-standard environments.
Pro Tip: Use Proactive HA to detect hardware degradation before failure occurs, allowing for preemptive VM migration.
Best Practices for HA Configuration
| Practice | Benefit |
|---|---|
| Use redundant management networks | Improves failure detection reliability |
| Enable datastore heartbeats | Helps distinguish between host and network failures |
| Configure restart priorities | Ensures critical VMs recover first |
| Monitor VMware Tools heartbeats | Enables granular VM-level failover |
| Regularly test failover scenarios | Validates your HA setup under real conditions |
Step-by-Step: Enabling HA in vSphere 8
Let’s walk through the process of activating HA in a vSphere 8 cluster. Log in to vCenter Server at https://vcenter01.virtshinobi.local/ui using admin credentials.
In the Hosts and Clusters view, select Cluster-01. In the right-hand pane, go to the Configure tab and select vSphere Availability under Services. Click Edit.

Toggle vSphere HA to ON and make sure the Enable Host Monitoring toggle is also on.

Host Failure Response Options:
These settings determine what vSphere HA does when a host fails or becomes isolated from the cluster.
| Option | Behavior |
|---|---|
| Restart | • Default and most common setting. • When a host fails, HA restarts affected VMs on other healthy hosts in the cluster. • VMs are restarted based on their restart priority (High, Medium, Low). • Ensures minimal downtime and automatic recovery. |
| Disabled | • Turns off host monitoring. • No VMs are restarted if a host fails. • Useful during maintenance or when HA is not required. |
Select the default option to Restart VMs and move on to the next section.

Host Isolation Response Options:
This kicks in when a host loses management network connectivity but is still running.
| Option | Behavior |
|---|---|
| Disabled | • VMs continue running on the isolated host. • No restart occurs unless the host completely fails. • Reduces false positives but risks split-brain scenarios if the host is truly isolated. |
| Power Off and Restart VMs | • VMs are forcibly powered off on the isolated host. • HA restarts them on another host. • Fast recovery, but may risk data loss if VMs weren’t gracefully shut down. |
| Shut Down and Restart VMs | • VMs are shut down gracefully using VMware Tools. • If shutdown isn’t completed within a timeout (default: 300 seconds), they’re powered off. • Preserves VM state and reduces risk of corruption. |
Select the default option Disabled and move on to the next section.
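The three isolation responses in the table can be summed up in a few lines of Python. This is only an illustrative model of the behavior described above, not VMware code: the action strings are made up for readability, and the 300-second default comes from the table.

```python
from enum import Enum


class IsolationResponse(Enum):
    DISABLED = "Disabled"
    POWER_OFF = "Power Off and Restart VMs"
    SHUT_DOWN = "Shut Down and Restart VMs"


def isolation_actions(response, shutdown_seconds=0, timeout=300):
    """Return the ordered actions for one VM on an isolated host.

    shutdown_seconds: how long the guest takes to shut down (hypothetical input)
    timeout: grace period before a forced power-off, in seconds
    """
    if response is IsolationResponse.DISABLED:
        return ["leave powered on"]
    if response is IsolationResponse.POWER_OFF:
        return ["power off", "restart on another host"]
    # Shut down: try a graceful guest shutdown, fall back to power off on timeout.
    if shutdown_seconds <= timeout:
        return ["guest shutdown", "restart on another host"]
    return ["guest shutdown (timed out)", "power off", "restart on another host"]


print(isolation_actions(IsolationResponse.SHUT_DOWN, shutdown_seconds=420))
```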

Datastore with PDL Options:
Permanent Device Loss (PDL) occurs when a storage device (like a LUN) becomes permanently inaccessible to an ESXi host. This typically happens due to:
- Hardware failure
- Improper zoning or masking
- Manual removal of a device
The storage array sends SCSI sense codes to indicate the device is gone. Once received, the ESXi host:
- Stops all I/O to the device
- Marks the device as lost
- Closes VM I/O sessions
With VMCP (VM Component Protection) enabled, vSphere HA can detect PDL and take action based on the settings below:
| Option | Behavior |
|---|---|
| Disabled | No action taken. VMs may hang or crash. |
| Issue Events | Alerts are generated, but VMs are not restarted. |
| Power Off and Restart VMs | Affected VMs are powered off and restarted on healthy hosts with access to the datastore. |
Best Practice: Always set PDL response to Power Off and Restart VMs to ensure recovery and uptime continuity.

Datastore with APD Options:
All Paths Down (APD) occurs when an ESXi host loses all access paths to a storage device, but the device doesn’t report a permanent failure. It’s usually a transient issue, caused by things like network hiccups, SAN misconfigurations, or temporary outages.
Unlike Permanent Device Loss (PDL), APD might resolve itself. That’s why VMware gives you nuanced control over how HA responds.
To protect VMs during APD events, you enable VM Component Protection (VMCP) in your cluster settings.
APD Response Options:
| Option | Behavior |
|---|---|
| Disabled | No action taken. VMs may hang or become unresponsive. |
| Issue Events | Alerts are generated, but VMs are not restarted. |
| Power Off and Restart VMs – Conservative | VMs are restarted only if another host with datastore access is available. |
| Power Off and Restart VMs – Aggressive | VMs are restarted even if HA can’t confirm another host has access. Riskier, but faster recovery. |
Select the default option Power Off and Restart VMs – Conservative Restart Policy and move on to the next section.
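The difference between the Conservative and Aggressive policies is easiest to see side by side. The sketch below models the table above in Python under two assumptions of mine: when a restart is not attempted the VM is simply left running, and the Aggressive policy still holds back when HA knows for certain that no other host has access.

```python
from enum import Enum


class ApdResponse(Enum):
    DISABLED = 1
    ISSUE_EVENTS = 2
    CONSERVATIVE = 3
    AGGRESSIVE = 4


def apd_action(response, other_host_has_access):
    """Decide what happens to a VM hit by APD.

    other_host_has_access: True, False, or None when HA cannot tell.
    """
    if response is ApdResponse.DISABLED:
        return "no action"
    if response is ApdResponse.ISSUE_EVENTS:
        return "issue event"
    if response is ApdResponse.CONSERVATIVE:
        # Only act when another host is confirmed to reach the datastore.
        return "power off and restart" if other_host_has_access is True else "leave running"
    # Aggressive: act even when access is unknown (assumption: a known "no" still blocks).
    return "power off and restart" if other_host_has_access is not False else "leave running"


for policy in (ApdResponse.CONSERVATIVE, ApdResponse.AGGRESSIVE):
    print(policy.name, "->", apd_action(policy, other_host_has_access=None))
```

The only case where the two policies differ in this model is the unknown one, which is exactly the "riskier, but faster recovery" trade-off from the table.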

VM Monitoring Options:
In VMware vSphere HA, VM Monitoring is a powerful feature that goes beyond host-level protection by watching individual virtual machines for signs of failure.
How VM Monitoring Works
- Heartbeat Detection: Uses VMware Tools to detect if the guest OS is responsive.
- I/O Activity Check: If heartbeats fail, it checks for disk I/O to avoid false positives.
- Reset Logic: If both heartbeat and I/O are absent, the VM is restarted.
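The three steps above boil down to a simple "both signals must be missing" check. Here is a minimal Python sketch of that logic. The function and parameter names are hypothetical, and the 120-second I/O look-back window is an assumed value for illustration.

```python
def vm_has_failed(seconds_since_heartbeat, seconds_since_io,
                  failure_interval=30, io_window=120):
    """Return True only when both the heartbeat and disk I/O have gone quiet.

    failure_interval: seconds without a VMware Tools heartbeat before we worry
    io_window: seconds without disk I/O before the VM counts as idle (assumed value)
    """
    heartbeat_lost = seconds_since_heartbeat >= failure_interval
    io_idle = seconds_since_io >= io_window
    return heartbeat_lost and io_idle


print(vm_has_failed(45, 300))  # no heartbeat, no I/O
print(vm_has_failed(45, 5))    # no heartbeat, but the disk is still busy
```

The second call shows the false-positive protection: a guest that stopped sending heartbeats but is still writing to disk is not reset.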
Application Monitoring
- Requires integration via SDK or supported apps.
- Monitors app-specific heartbeats and restarts the VM if the app fails.
| Option | Behavior |
|---|---|
| Disabled | No VM or application monitoring. |
| VM Monitoring Only | Monitors VM heartbeats via VMware Tools. |
| VM and Application Monitoring | Adds application-level heartbeat checks (requires SDK or supported apps) |
You can configure this setting to your liking. We will keep the default value of Disabled and move on to the next section.

VM Monitoring Sensitivity:
In VMware vSphere HA, VM Monitoring sensitivity settings determine how aggressively the system responds to signs of VM failure. You can choose from preset levels or define custom thresholds to fine-tune behavior.
Preset Sensitivity Levels
These are quick options available via a slider in the vSphere Client:
| Sensitivity | Failure Interval | Minimum Uptime | Max Resets | Reset Time Window |
|---|---|---|---|---|
| Low | 120 sec | 480 sec | 3 | 7 days |
| Medium | 60 sec | 240 sec | 3 | 24 hours |
| High | 30 sec | 120 sec | 3 | 1 hour |
- Failure Interval: Time without heartbeat or I/O before VM is considered failed.
- Minimum Uptime: Delay after VM boots before monitoring starts.
- Max Resets: Limits how often a VM can be restarted.
- Reset Time Window: Timeframe for counting resets.
Custom Sensitivity Settings:
If presets don’t suit your environment, you can manually configure:
- Failure Interval (e.g., 45 seconds)
- Minimum Uptime (e.g., 180 seconds)
- Maximum per-VM resets (e.g., 5)
- Maximum resets time window (e.g., 12 hours)
This is ideal for workloads with unique responsiveness or recovery needs.
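To see how the four values work together, here is a short Python sketch that encodes the presets from the table and checks whether another reset is still allowed. It is a simplified model with hypothetical names, not the actual HA logic, and all timestamps are plain seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    failure_interval: int  # seconds without heartbeat or I/O
    min_uptime: int        # seconds after boot before monitoring starts
    max_resets: int        # resets allowed per window
    window: int            # reset time window in seconds


HOUR = 3600
PRESETS = {
    "Low": Sensitivity(120, 480, 3, 7 * 24 * HOUR),
    "Medium": Sensitivity(60, 240, 3, 24 * HOUR),
    "High": Sensitivity(30, 120, 3, 1 * HOUR),
}
# The custom example values from the list above.
CUSTOM = Sensitivity(45, 180, 5, 12 * HOUR)


def reset_allowed(cfg, now, booted_at, past_resets):
    """True if a reset at time `now` stays within the policy."""
    if now - booted_at < cfg.min_uptime:
        return False  # still inside the minimum uptime grace period
    recent = [t for t in past_resets if now - t < cfg.window]
    return len(recent) < cfg.max_resets


# Three resets in the last hour exhaust the High preset's budget.
print(reset_allowed(PRESETS["High"], now=10_000, booted_at=0,
                    past_resets=[7_000, 8_000, 9_000]))
```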
Leave the default settings and click the Admission Control tab to move on to the next section.

Admission Control:
HA Admission Control ensures that the cluster reserves enough resources to restart VMs in case of host failure. It’s a safeguard against overcommitting resources and losing availability guarantees.
Admission Control Policies
| Policy Name | Description |
|---|---|
| Cluster Resource Percentage (Default) | • Reserves a percentage of CPU and memory for failover. • Automatically adjusts based on the number of tolerated host failures. • You can override the calculated percentage if needed. |
| Slot Policy | • Calculates slot size based on VM reservations. • Determines how many VMs can be restarted based on available slots. • Best for clusters with uniform VM resource reservations. |
| Dedicated Failover Hosts | • Assigns specific hosts for failover only. • These hosts don’t run VMs during normal operation. • Rarely used due to inefficiency. |
| Disabled | • No resource reservation for failover. • VMs can power on even if availability constraints are violated. • Not recommended for production environments. |
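The Slot Policy is the least intuitive of the four, so here is a rough Python sketch of the arithmetic behind it. It is a simplification: memory overhead is ignored, the minimum slot values are assumptions, and the host and VM figures are hypothetical.

```python
def slot_size(cpu_reservations_mhz, mem_reservations_mb,
              min_cpu_mhz=32, min_mem_mb=1):
    """Slot size is driven by the largest reservation of each kind.

    The minimums are assumed floor values used when no VM has a reservation.
    """
    return (max([min_cpu_mhz, *cpu_reservations_mhz]),
            max([min_mem_mb, *mem_reservations_mb]))


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as its scarcer resource allows."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)


# Hypothetical reservations for three VMs and one nested ESXi host.
slot = slot_size([0, 500, 1000], [0, 1024, 2048])
print(slot, slots_per_host(10_000, 32_768, slot))
```

A single VM with a large reservation inflates the slot size for the whole cluster, which is why this policy works best when reservations are uniform.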
Additional Settings:
- Host Failures Cluster Tolerates: Defines how many host failures the cluster can recover from.
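- Performance Degradation Tolerance: Sets the percentage of performance degradation the VMs may tolerate during a failover. At 0%, vSphere raises a warning when failover capacity is too low to keep the same performance after restart; at 100%, the warning is disabled.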
- Override Calculated Failover Capacity: Lets you manually set CPU/memory reservation percentages.
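For the default Cluster Resource Percentage policy, the reserved share follows from the number of host failures to tolerate. The Python sketch below shows the arithmetic under the assumption that all hosts are the same size. The function names and capacity figures are hypothetical, and the power-on check is a simplified stand-in for what admission control does.

```python
def reserved_percentage(num_hosts, host_failures_tolerated):
    """Share of cluster capacity to hold back, assuming equally sized hosts."""
    if not 0 < host_failures_tolerated < num_hosts:
        raise ValueError("tolerated failures must be between 1 and num_hosts - 1")
    return 100 * host_failures_tolerated / num_hosts


def can_power_on(total, reserved_by_vms, requested, reserved_pct):
    """Allow a power-on only if it fits in the capacity not held back for failover."""
    usable = total * (1 - reserved_pct / 100)
    return reserved_by_vms + requested <= usable


pct = reserved_percentage(num_hosts=4, host_failures_tolerated=1)
print(pct)                                        # 25.0
print(can_power_on(40_000, 29_000, 2_000, pct))   # would exceed the usable 30,000 MHz
```

With admission control set to Disabled, the second check would simply be skipped, which is why that option is not recommended for production.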
Configure the settings as per the screenshots below and click on the Heartbeat Datastores tab to continue.


Heartbeat Datastores:
Heartbeat Datastores are a critical part of VMware HA’s ability to distinguish between a host that’s truly down and one that’s just network-isolated. When the management network fails, datastore heartbeats act as a backup signal to help HA make smarter decisions.
What Are Heartbeat Datastores?
- Used when network heartbeats are lost.
- Help the master host determine if a slave host is still alive.
- Prevent unnecessary VM restarts due to false host failure detection.
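The decision the master makes with these two signals can be sketched in a few lines of Python. This is a deliberately simplified model with hypothetical function names. The real agent uses more inputs than these two flags.

```python
def classify_host(network_heartbeat, datastore_heartbeat):
    """Classify a host from the master's point of view."""
    if network_heartbeat:
        return "live"
    if datastore_heartbeat:
        # Still writing to the heartbeat datastore, so the host itself is up.
        return "isolated or partitioned"
    return "failed"


def should_restart_vms(state):
    """Only a truly failed host triggers restarts in this model."""
    return state == "failed"


for net, ds in [(True, True), (False, True), (False, False)]:
    state = classify_host(net, ds)
    print(net, ds, state, should_restart_vms(state))
```

Without the datastore signal, the second and third cases would look identical to the master, which is the false detection the list above warns about.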
Configuration Option
| Option | Description |
|---|---|
| Automatically select datastores accessible from the host | vSphere HA picks shared datastores available to all hosts. |
| Use datastores only from the specified list | You manually select datastores. HA won’t use others even if these fail. |
| Use datastores from the specified list and complement automatically if needed | Preferred datastores are used, but HA can fall back to others if needed. |
Best Practices
- Ensure at least two shared datastores are accessible by all hosts.
- Avoid using only one datastore—this can trigger HA warnings or errors.
- If you must suppress warnings (not recommended), use the advanced option:
das.ignoreInsufficientHbDatastore = true
Select the option Use datastores from the specified list and complement automatically if needed, select the two datastores iSCSI_DS01 and iSCSI_DS02 and click OK. Monitor the progress in the Recent Tasks pane at the bottom.

We have not configured any Advanced Options. Feel free to explore the Advanced Isolation settings below and configure them to your liking.
Advanced Isolation Settings
You can fine-tune behavior using HA advanced options:
| Option | Description |
|---|---|
| das.isolationaddressX | Specifies alternate IPs to test for isolation (e.g., gateway, DNS). |
| das.isolationshutdowntimeout | Timeout for graceful shutdown before forced power-off. |
| das.config.fdm.isolationPolicyDelaySec | Delay before executing isolation response. |
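If you want to keep these options in a script or in your lab notes, a small Python helper can build the key/value pairs for you. This is just a sketch: the helper name and IP addresses are hypothetical, and it assumes the X in das.isolationaddressX is a number starting at 0. It only produces a dictionary and does not talk to vCenter.

```python
def isolation_options(addresses, shutdown_timeout=None, policy_delay=None):
    """Build HA advanced options as a dict of option name -> string value."""
    # Assumption: isolation addresses are numbered from 0.
    options = {f"das.isolationaddress{i}": ip for i, ip in enumerate(addresses)}
    if shutdown_timeout is not None:
        options["das.isolationshutdowntimeout"] = str(shutdown_timeout)
    if policy_delay is not None:
        options["das.config.fdm.isolationPolicyDelaySec"] = str(policy_delay)
    return options


# Hypothetical gateway and DNS server addresses for the lab network.
for key, value in isolation_options(["192.168.10.1", "192.168.10.2"],
                                    shutdown_timeout=300).items():
    print(f"{key} = {value}")
```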
Once the task is complete, you should see that vSphere HA is enabled, along with a summary of the configuration options we selected.


That’s it for VMware vSphere HA. In the next part of the series we will configure VMware Distributed Resource Scheduler or DRS. So stay tuned.