vSAN 6.6 RVC Guide
The "vSAN 6.6 RVC Guide" series explains how to manage your VMware Virtual SAN environment with the Ruby vSphere Console. RVC is an interactive command line tool to control and automate your platform. If you are new to RVC, make sure to read the Getting Started with Ruby vSphere Console Guide. All commands are from the latest vSAN 6.6 version.
In the fourth part, I'm working with commands related to the vSAN health plugin. These commands are only available when the Health Services are installed.
- vsan.health.cluster_status
- vsan.health.health_summary
- vsan.health.cluster_rebalance
- vsan.health.cluster_debug_multicast
- vsan.health.multicast_speed_test
- vsan.health.hcl_update_db
- vsan.health.cluster_repair_immediately
- vsan.health.cluster_attach_to_sr
- vsan.health.health_check_interval_status
- vsan.health.health_check_interval_configure
- vsan.health.cluster_proxy_status
- vsan.health.cluster_proxy_configure
- vsan.health.cluster_load_test_run
- vsan.health.cluster_load_test_prepare
- vsan.health.cluster_load_test_cleanup
- vsan.health.silent_health_check_configure
- vsan.health.silent_health_check_status
To shorten commands, I've created marks for the vSAN enabled cluster, a virtual machine, an ESXi host participating in the vSAN cluster, and the vCenter Server. This enables me to use ~cluster, ~vm, ~esx and ~vc in examples:
/localhost/DC> mark cluster /vc.virten.lab/Datacenter/computers/VSAN-Cluster/
/localhost/DC> mark vm /vc.virten.lab/Datacenter/vms/vma.virten.lab
/localhost/DC> mark esx /vc.virten.lab/Datacenter/computers/VSAN-Cluster/hosts/esx1.virten.lab/
/localhost/DC> mark vc /vc.virten.lab/
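If you are unsure whether a mark points at the right object, you can verify it with RVC's built-in ls command before running any health commands. A quick sketch (the listing depends on your inventory):

> ls ~cluster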
Health Checks
vsan.health.cluster_status ~cluster
Verifies that the vSAN health check plugin has been successfully installed in the cluster.
Example 1 – Show vSAN health plugin status:
> vsan.health.cluster_status ~cluster
Configuration of ESX vSAN Health Extension: installed (OK)
Host 'vesx3.virten.lab' has health system version '6.6.0' installed
Host 'vesx2.virten.lab' has health system version '6.6.0' installed
Host 'vesx1.virten.lab' has health system version '6.6.0' installed
vCenter Server has health system version '6.6.0' installed
vsan.health.health_summary [-c] ~cluster
Performs a basic health check on the vSAN cluster. This is one of the first commands to run during troubleshooting. With the -c option you can also verify that VMs can be created on all ESXi hosts in the vSAN cluster.
- -c, --create-vm-test: performs a proactive VM creation test.
Example 1 – Perform a basic health check:
> vsan.health.health_summary ~cluster
Overall health: yellow (Cluster health issue)
+------------------------------------------------------+---------+
| Health check                                         | Result  |
+------------------------------------------------------+---------+
| Cluster                                              | Warning |
| ESXi vSAN Health service installation                | Passed  |
| vSAN Health Service up-to-date                       | Passed  |
| Advanced vSAN configuration in sync                  | Passed  |
| vSAN CLOMD liveness                                  | Passed  |
| vSAN Disk Balance                                    | Warning |
| Resync operations throttling                         | Passed  |
| vSAN cluster configuration consistency               | Passed  |
| Time is synchronized across hosts and VC             | Passed  |
| vSphere cluster members match vSAN cluster members   | Passed  |
| Software version compatibility                       | Passed  |
| Disk format version                                  | Passed  |
+------------------------------------------------------+---------+
| Hardware compatibility                               | Warning |
| vSAN HCL DB up-to-date                               | Warning |
| vSAN HCL DB Auto Update                              | skipped |
| SCSI controller is VMware certified                  | skipped |
| Controller is VMware certified for ESXi release      | skipped |
| Controller driver is VMware certified                | skipped |
| Controller firmware is VMware certified              | skipped |
| Controller disk group mode is VMware certified       | skipped |
+------------------------------------------------------+---------+
| Performance service                                  | Warning |
| Performance service status                           | Warning |
+------------------------------------------------------+---------+
| Online health (Last check: 48 minute(s) ago)         | Warning |
| Customer experience improvement program (CEIP)       | Passed  |
| Online health connectivity                           | skipped |
| Disks usage on storage controller                    | Passed  |
| vSAN max component size                              | Warning |
+------------------------------------------------------+---------+
| Network                                              | Passed  |
| Hosts disconnected from VC                           | Passed  |
| Hosts with connectivity issues                       | Passed  |
| vSAN cluster partition                               | Passed  |
| All hosts have a vSAN vmknic configured              | Passed  |
| All hosts have matching subnets                      | Passed  |
| vSAN: Basic (unicast) connectivity check             | Passed  |
| vSAN: MTU check (ping with large packet size)        | Passed  |
| vMotion: Basic (unicast) connectivity check          | skipped |
| vMotion: MTU check (ping with large packet size)     | Passed  |
| Network latency check                                | Passed  |
+------------------------------------------------------+---------+
| Physical disk                                        | Passed  |
| Overall disks health                                 | Passed  |
| Metadata health                                      | Passed  |
| Disk capacity                                        | Passed  |
| Software state health                                | Passed  |
| Congestion                                           | Passed  |
| Component limit health                               | Passed  |
| Component metadata health                            | Passed  |
| Memory pools (heaps)                                 | Passed  |
| Memory pools (slabs)                                 | Passed  |
+------------------------------------------------------+---------+
| Data                                                 | Passed  |
| vSAN object health                                   | Passed  |
+------------------------------------------------------+---------+
| Limits                                               | Passed  |
| Current cluster situation                            | Passed  |
| After 1 additional host failure                      | Passed  |
| Host component limit                                 | Passed  |
+------------------------------------------------------+---------+

Details about any failed test below ...

Cluster - vSAN Disk Balance: yellow
+--------------------+-------+
| Metric             | Value |
+--------------------+-------+
| Average Disk Usage | 15 %  |
| Maximum Disk Usage | 35 %  |
| Maximum Variance   | 30 %  |
| LM Balance Index   | 10 %  |
+--------------------+-------+
+------------------+-----------------------------------------+-------------------------------+--------------+
| Host             | Device                                  | Rebalance State               | Data To Move |
+------------------+-----------------------------------------+-------------------------------+--------------+
| vesx3.virten.lab | Local VMware Disk (mpx.vmhba0:C0:T2:L0) | Proactive rebalance is needed | 1.9889 GB    |
+------------------+-----------------------------------------+-------------------------------+--------------+

Hardware compatibility - vSAN HCL DB up-to-date: yellow
+--------------------------------+---------------------+
| Entity                         | Time in UTC         |
+--------------------------------+---------------------+
| Current time                   | 2017-07-25 14:13:23 |
| Local HCL DB copy last updated | 2017-04-25 16:56:42 |
+--------------------------------+---------------------+

Hardware compatibility - SCSI controller is VMware certified: skipped
+------------------+--------+------------------------------------+--------------+---------------------+----------------------+
| Host             | Device | Current controller                 | Used by vSAN | PCI ID              | Controller certified |
+------------------+--------+------------------------------------+--------------+---------------------+----------------------+
| vesx2.virten.lab | vmhba0 | VMware Inc. PVSCSI SCSI Controller | Yes          | 15ad,07c0,15ad,07c0 | Warning              |
| vesx1.virten.lab | vmhba0 | VMware Inc. PVSCSI SCSI Controller | Yes          | 15ad,07c0,15ad,07c0 | Warning              |
| vesx3.virten.lab | vmhba0 | VMware Inc. PVSCSI SCSI Controller | Yes          | 15ad,07c0,15ad,07c0 | Warning              |
+------------------+--------+------------------------------------+--------------+---------------------+----------------------+

Hardware compatibility - Controller is VMware certified for ESXi release: skipped
+------------------+--------------------------------------------+----------------------+-------------------+-------------------------+
| Host             | Device                                     | Current ESXi release | Release supported | Certified ESXi releases |
+------------------+--------------------------------------------+----------------------+-------------------+-------------------------+
| vesx2.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | vSAN 6.6             | Warning           | N/A                     |
| vesx1.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | vSAN 6.6             | Warning           | N/A                     |
| vesx3.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | vSAN 6.6             | Warning           | N/A                     |
+------------------+--------------------------------------------+----------------------+-------------------+-------------------------+

Hardware compatibility - Controller driver is VMware certified: skipped
+------------------+--------------------------------------------+-----------------------------------+------------------+---------------------+
| Host             | Device                                     | Current driver                    | Driver certified | Recommended drivers |
+------------------+--------------------------------------------+-----------------------------------+------------------+---------------------+
| vesx2.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | pvscsi (0.1-1vmw.650.0.0.4564106) | Warning          | N/A                 |
| vesx1.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | pvscsi (0.1-1vmw.650.0.0.4564106) | Warning          | N/A                 |
| vesx3.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | pvscsi (0.1-1vmw.650.0.0.4564106) | Warning          | N/A                 |
+------------------+--------------------------------------------+-----------------------------------+------------------+---------------------+

Hardware compatibility - Controller firmware is VMware certified: skipped
+------------------+--------------------------------------------+------------------+--------------------+-----------------------+
| Host             | Device                                     | Current firmware | Firmware certified | Recommended firmwares |
+------------------+--------------------------------------------+------------------+--------------------+-----------------------+
| vesx2.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | N/A              | Warning            | N/A                   |
| vesx1.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | N/A              | Warning            | N/A                   |
| vesx3.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | N/A              | Warning            | N/A                   |
+------------------+--------------------------------------------+------------------+--------------------+-----------------------+

Hardware compatibility - Controller disk group mode is VMware certified: skipped
+------------------+--------------------------------------------+-------------------------+---------------------------+-----------------------------+
| Host             | Device                                     | Current disk group mode | Disk group mode certified | Recommended disk group mode |
+------------------+--------------------------------------------+-------------------------+---------------------------+-----------------------------+
| vesx2.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | All Flash               | Warning                   | N/A                         |
| vesx1.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | All Flash               | Warning                   | N/A                         |
| vesx3.virten.lab | vmhba0: VMware Inc. PVSCSI SCSI Controller | All Flash               | Warning                   | N/A                         |
+------------------+--------------------------------------------+-------------------------+---------------------------+-----------------------------+

Performance service - Performance service status: yellow
+---------+---------------------------------+
| Result  | Status                          |
+---------+---------------------------------+
| Warning | Performance service is disabled |
+---------+---------------------------------+

Online health (Last check: 48 minute(s) ago) - vSAN max component size: yellow
+------------------+--------------------------------+---------+--------------------------------------+----------------------------------+
| Host             | Cluster Smallest Disk Size(GB) | Status  | Recommended Max Component Size in GB | Current Max Component Size in GB |
+------------------+--------------------------------+---------+--------------------------------------+----------------------------------+
| vesx1.virten.lab | 10                             | Warning | 180                                  | 255                              |
| vesx3.virten.lab | 10                             | Warning | 180                                  | 255                              |
| vesx2.virten.lab | 10                             | Warning | 180                                  | 255                              |
+------------------+--------------------------------+---------+--------------------------------------+----------------------------------+

Network - vMotion: Basic (unicast) connectivity check: skipped
+-----------+---------+-----------+-------------+
| From Host | To Host | To Device | Ping result |
+-----------+---------+-----------+-------------+
+-----------+---------+-----------+-------------+
+------------------+------------------+-----------+-------------+
| From Host        | To Host          | To Device | Ping result |
+------------------+------------------+-----------+-------------+
| vesx2.virten.lab | vesx1.virten.lab | vmk1      | Passed      |
| vesx2.virten.lab | vesx3.virten.lab | vmk1      | Passed      |
| vesx1.virten.lab | vesx2.virten.lab | vmk1      | Passed      |
| vesx1.virten.lab | vesx3.virten.lab | vmk1      | Passed      |
| vesx3.virten.lab | vesx2.virten.lab | vmk1      | Passed      |
| vesx3.virten.lab | vesx1.virten.lab | vmk1      | Passed      |
+------------------+------------------+-----------+-------------+
[[0.070919772, "initial connect"], [2.234612291, "cluster-health"], [0.022485742, "table-render"]]
Example 2 – Perform a proactive VM creation test:
> vsan.health.health_summary ~cluster -c
[...]
Performing pro-active VM creation test ...
+------------------+---------+
| Check            | Result  |
+------------------+---------+
| vesx1.virten.lab | Success |
| vesx2.virten.lab | Success |
| vesx3.virten.lab | Success |
+------------------+---------+
vsan.health.cluster_rebalance ~cluster
This command triggers an immediate rebalance of the vSAN cluster. It is similar to vsan.proactive_rebalance. The "vSAN Disk Balance" health check, which can be viewed with the vsan.health.health_summary command, shows when rebalancing is required.
Example 1 – Verify and rebalance vSAN disks:
> vsan.health.health_summary ~cluster
[...]
| vSAN Disk Balance                                    | Warning |
[...]
Details about any failed test below ...

Cluster - vSAN Disk Balance: yellow
+--------------------+-------+
| Metric             | Value |
+--------------------+-------+
| Average Disk Usage | 15 %  |
| Maximum Disk Usage | 35 %  |
| Maximum Variance   | 30 %  |
| LM Balance Index   | 10 %  |
+--------------------+-------+
+------------------+-----------------------------------------+-------------------------------+--------------+
| Host             | Device                                  | Rebalance State               | Data To Move |
+------------------+-----------------------------------------+-------------------------------+--------------+
| vesx3.virten.lab | Local VMware Disk (mpx.vmhba0:C0:T2:L0) | Proactive rebalance is needed | 1.9889 GB    |
+------------------+-----------------------------------------+-------------------------------+--------------+
[...]

> vsan.health.cluster_rebalance ~cluster
This command will trigger the immediate rebalance of vSAN cluster.
It will rebalance the vSAN objects for the imbalance hosts based on the disk usage.
This process may take a moment ...
vSAN66: success

> vsan.health.health_summary ~cluster
[...]
| vSAN Disk Balance                                    | Passed  |
[...]
vsan.health.cluster_debug_multicast [-d] ~cluster
This command performs a multicast test to verify that all hosts can receive multicast packets. Each host is represented by a character (A, B, C, ...).
Please note that vSAN 6.6 no longer uses multicast, so this test is only useful for clusters running older versions.
- -d, --duration: Duration to watch for packets (default: 60)
Example 1 – Debug Multicast:
> vsan.health.cluster_debug_multicast ~cluster
2017-07-25 15:00:06 +0000: Gathering information about hosts and vSAN
2017-07-25 15:00:06 +0000: Watching packets for 60 seconds
2017-07-25 15:00:06 +0000: Got observed packets from all hosts, analysing
Automated system couldn't derive any issues. Either no problem exists
or manual inspection is required.

To further help the network admin, the following is a list of packets
with source and destination IPs. As all these packets are multicast,
they should have been received by all hosts in the cluster. To show
which hosts actually saw the packets, each host is represented by a
character (A-Z). If the character is listed in front of the packet,
the host received the packet. If the space is left empty, the host
didn't receive the packet.
A = Host vesx1.virten.lab
B = Host vesx2.virten.lab
C = Host vesx3.virten.lab
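By default the command watches for packets for 60 seconds. If multicast traffic in your environment is sporadic, the observation window can be extended with the -d option. A sketch (the 120-second duration is just an example value):

> vsan.health.cluster_debug_multicast -d 120 ~cluster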
vsan.health.multicast_speed_test ~cluster
Performs a multicast speed test to ensure that there is adequate network bandwidth between all hosts in the vSAN cluster. Please note that although multicast is no longer used in vSAN 6.6, the speed test still works.
Example 1 – Perform a multicast speed test:
> vsan.health.multicast_speed_test ~cluster
Performing a multicast speed test. One host is selected to send multicast
traffic, all other hosts will attempt to receive the packets. The test is
designed such that the sender sends more than most physical networks can
handle, i.e. it is expected that the physical network may drop packets
which then won't be received by the receivers. Assuming a TCP speed test
shows good performance, the most likely suspect for failing the multicast
speed test are multicast bottlenecks in physical switches.
The key question this test tries to answer is: What bandwidth is the
receiver able to get? For vSAN to work well, this number should be at
least 20MB/s. Typical enterprise environments should be able to do
50MB/s or more.
Now running test ...
Overall health: Passed
+------------------+---------------+---------------------------+--------------------------------------------+
| Host             | Health Status | Received Bandwidth (MB/s) | Maximum Achievable Bandwidth Result (MB/s) |
+------------------+---------------+---------------------------+--------------------------------------------+
| vesx1.virten.lab | Passed        | 63.68                     | 125.00                                     |
| vesx3.virten.lab | Passed        | 62.58                     | 125.00                                     |
+------------------+---------------+---------------------------+--------------------------------------------+
vsan.health.hcl_update_db [-l|-u|-f] ~vc
Updates the HCL database from a local file, a URL, or vmware.com. When vCenter has Internet connectivity, you can run the command without any options. The official source for the HCL file is: http://partnerweb.vmware.com/service/vsan/all.json
- -l, --local-file: Path to local file that contains DB
- -u, --url: Path to URL that contains DB
- -f, --force: Skip any questions, and proceed
Example 1 – Update the HCL database from vmware.com without asking:
> vsan.health.hcl_update_db ~vc -f
Updating DB from vmware.com. Note: vCenter needs to have access to vmware.com.
Done
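If vCenter has no direct Internet access, you can fetch the all.json file on another machine and update the database from a local copy with the -l option, or point the command at an internal web server with -u. A sketch, assuming the file was previously downloaded to the hypothetical path /tmp/all.json:

> vsan.health.hcl_update_db -l /tmp/all.json ~vc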
vsan.health.cluster_repair_immediately ~cluster
This command triggers an immediate repair of objects affected by components in the ABSENT state, caused by failed hosts or hot-unplugged drives. By default, vSAN waits 60 minutes before repairing such objects because in most cases the failed components come back on their own. A second category of objects could not be repaired earlier because the cluster conditions at the time did not allow it; vSAN periodically rechecks those objects. This command instructs both types of objects to attempt a repair immediately.
Example 1 – Start immediate repair after disk failure:
> vsan.health.cluster_repair_immediately ~cluster
vSAN66: success
vsan.health.cluster_attach_to_sr [-s] ~cluster
Performs an automated creation of vSAN-related log bundles and uploads them to a support request opened with VMware Global Support Services (GSS).
- -s, --sr: SR number
Example 1 – Create and upload log bundle:
> vsan.health.cluster_attach_to_sr -s 99999999 ~cluster
vSAN Support Assistant performs automated upload of support bundles, and so
does not allow you to review, obfuscate or otherwise edit the contents of
your support data prior to it being sent to VMware. If your support data may
contain regulated data, such as personal, health care data and/or financial
data, you should instead use the more manual workflow by clicking
vCenter -> Actions -> Export System Logs selecting 'Include vCenter Server'
as well as all ESX hosts in the cluster. Follow VMware KB 2072796
(http://kb.vmware.com/kb/2072796) for the manual workflow.
This process may take a moment ...
Attaching vSAN support bundle for the cluster 'vSAN66' ...
vSAN66: running
|------- |
vsan.health.health_check_interval_status ~cluster
Displays the current health check interval. The default interval is 60 minutes.
Example 1 – Display health check interval:
> vsan.health.health_check_interval_status ~cluster
+---------+-----------------------+
| Cluster | Health Check Interval |
+---------+-----------------------+
| vSAN66  | 60 mins               |
+---------+-----------------------+
vsan.health.health_check_interval_configure [-i] ~cluster
Configures the health check interval (in minutes) for the cluster. The default is 60 minutes. Set the interval to 0 to disable periodic health checks.
- -i, --interval: Health Check Interval in minutes
Example 1 – Disable periodical health checks:
> vsan.health.health_check_interval_configure -i 0 ~cluster
Disabled the periodical health check for vSAN66
Example 2 – Re-enable health checks with a non-default interval:
> vsan.health.health_check_interval_configure -i 120 ~cluster
Successfully set the health check interval for vSAN66 to 120 minutes!
vsan.health.cluster_proxy_status ~cluster
When vCenter does not have access to the Internet, you can configure a proxy for the vSAN health service. This is required for health check features such as CEIP and HCL database updates.
Example 1 – Display vSAN health service proxy configuration (No proxy used):
> vsan.health.cluster_proxy_status ~cluster
+---------+------------+------------+------------+
| Cluster | Proxy Host | Proxy Port | Proxy User |
+---------+------------+------------+------------+
| vSAN66  |            |            |            |
+---------+------------+------------+------------+
vsan.health.cluster_proxy_configure [-o|-p|-u] ~cluster
Configure a proxy that allows health check plugins such as CEIP, HCL database updates and Support Assistant to connect to the Internet.
- -o, --host: Proxy host
- -p, --port: Proxy port
- -u, --user: Proxy user
Example 1 – Configure a proxy:
> vsan.health.cluster_proxy_configure -o proxy.virten.lab -p 8080 -u vc ~cluster
Enter proxy password (empty for no password):
Enter proxy password again:
Configure the proxy for the cluster 'vSAN66' ...

> vsan.health.cluster_proxy_status ~cluster
+---------+------------------+------------+------------+
| Cluster | Proxy Host       | Proxy Port | Proxy User |
+---------+------------------+------------+------------+
| vSAN66  | proxy.virten.lab | 8080       | vc         |
+---------+------------------+------------+------------+
vsan.health.cluster_load_test_run [-r|-t|-d|-a] ~cluster
Runs a storage performance test on the vSAN cluster. There are two ways to run the test: either run everything in a single step, or run the test as three distinct steps: prepare, run, and cleanup.
- -r, --runname: Test name
- -t, --type: VMDK workload type
- -d, --duration-sec: Duration for running the load test in seconds
- -a, --action: Possible actions are 'prepare', 'run', 'cleanup' and 'fullrun'. (Default is fullrun)
Available testing methods:
- Low stress test
- Basic sanity test, focus on Flash cache layer
- Stress test
- Performance characterization - 100% Read, optimal RC usage
- Performance characterization - 100% Write, optimal WB usage
- Performance characterization - 100% read, optimal RC usage after warmup
- Performance characterization - 70/30 read/write mix, realistic, optimal flash cache usage
- Performance characterization - 70/30 read/write mix, high IO size, optimal flash cache usage
- Performance characterization - 100% read, Low RC hit rate / All-Flash demo
- Performance characterization - 100% Streaming reads
- Performance characterization - 100% Streaming writes
Refer to KB2147074 for detailed test specifications.
Example 1 – Perform a "Low stress test" for 60 seconds:
> vsan.health.cluster_load_test_run -r stresstest -t "Low stress test" -d 60 ~cluster
This command will run the VMDK load test for the given cluster
If the action is 'fullrun' or not specified, it will do all of steps to run the test including preparing, running and cleaning up.
And it will only run the test based on the VMDK which is created by cluster_load_test_prepare if action is 'run'. In this sitution, the VMDK cleanup step is required by calling cluster_load_test_cleanup
vSAN66: success
VMDK load test completed for the cluster vSAN66: green
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+
| Host             | Workload Type   | VMDK Disk Number | Duration (sec) | IOPS | Throughput MB/s | Average Latency (ms) | Maximum Latency (ms) |
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+
| vesx2.virten.lab | Low stress test | 0                | 60             | 4822 | 18.84           | 0.21                 | 14.79                |
| vesx1.virten.lab | Low stress test | 0                | 60             | 2905 | 11.35           | 0.34                 | 27.40                |
| vesx3.virten.lab | Low stress test | 0                | 60             | 2888 | 11.28           | 0.34                 | 32.50                |
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+
Example 2 – Perform a load test in interactive mode:
> vsan.health.cluster_load_test_run -r stresstest -d 60 ~cluster
This command will run the VMDK load test for the given cluster
If the action is 'fullrun' or not specified, it will do all of steps to run the test including preparing, running and cleaning up.
And it will only run the test based on the VMDK which is created by cluster_load_test_prepare if action is 'run'. In this sitution, the VMDK cleanup step is required by calling cluster_load_test_cleanup
0: Low stress test
1: Basic sanity test, focus on Flash cache layer
2: Stress test
3: Performance characterization - 100% Read, optimal RC usage
4: Performance characterization - 100% Write, optimal WB usage
5: Performance characterization - 100% read, optimal RC usage after warmup
6: Performance characterization - 70/30 read/write mix, realistic, optimal flash cache usage
7: Performance characterization - 70/30 read/write mix, high IO size, optimal flash cache usage
8: Performance characterization - 100% read, Low RC hit rate / All-Flash demo
9: Performance characterization - 100% Streaming reads
10: Performance characterization - 100% Streaming writes
Choose the storage workload type [0]: 1
vSAN66: success
VMDK load test completed for the cluster vSAN66: green
[...]
vsan.health.cluster_load_test_prepare [-r|-t] ~cluster
This is part of a set of commands to run a storage performance test on the vSAN cluster. I recommend using the vsan.health.cluster_load_test_run command in "fullrun" mode to perform the test with a single command. If you want to run the load test in separate steps, use this command to prepare the load test.
- -r, --runname: Test name
- -t, --type: VMDK workload type
Example 1 – Prepare a "low stress test":
> vsan.health.cluster_load_test_prepare -r stresstest -t "Low stress test" ~cluster
Preparing VMDK test on vSAN66
vSAN66: success
Preparing VMDK load test is completed for the cluster vSAN66 with status green
+------------------+--------+-------+
| Host             | Status | Error |
+------------------+--------+-------+
| vesx2.virten.lab | Passed |       |
| vesx1.virten.lab | Passed |       |
| vesx3.virten.lab | Passed |       |
+------------------+--------+-------+
vsan.health.cluster_load_test_cleanup [-r] ~cluster
This is part of a set of commands to run a storage performance test on the vSAN cluster. I recommend using the vsan.health.cluster_load_test_run command in "fullrun" mode to perform the test with a single command. If you want to run the load test in separate steps, use this command to clean up after the load test.
- -r, --runname: Test name
Example 1 – Perform a load test in 3 separate steps:
> vsan.health.cluster_load_test_prepare -r stresstest -t "Low stress test" ~cluster
Preparing VMDK test on vSAN66
vSAN66: success
Preparing VMDK load test is completed for the cluster vSAN66 with status green
+------------------+--------+-------+
| Host             | Status | Error |
+------------------+--------+-------+
| vesx2.virten.lab | Passed |       |
| vesx1.virten.lab | Passed |       |
| vesx3.virten.lab | Passed |       |
+------------------+--------+-------+

> vsan.health.cluster_load_test_run -r stresstest -d 60 -t "Low stress test" -a run ~cluster
This command will run the VMDK load test for the given cluster
If the action is 'fullrun' or not specified, it will do all of steps to run the test including preparing, running and cleaning up.
And it will only run the test based on the VMDK which is created by cluster_load_test_prepare if action is 'run'. In this sitution, the VMDK cleanup step is required by calling cluster_load_test_cleanup
vSAN66: success
VMDK load test completed for the cluster vSAN66: green
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+
| Host             | Workload Type   | VMDK Disk Number | Duration (sec) | IOPS | Throughput MB/s | Average Latency (ms) | Maximum Latency (ms) |
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+
| vesx2.virten.lab | Low stress test | 0                | 60             | 4492 | 17.55           | 0.22                 | 15.18                |
| vesx1.virten.lab | Low stress test | 0                | 60             | 4338 | 16.95           | 0.23                 | 16.37                |
| vesx3.virten.lab | Low stress test | 0                | 60             | 4167 | 16.28           | 0.24                 | 20.39                |
+------------------+-----------------+------------------+----------------+------+-----------------+----------------------+----------------------+

> vsan.health.cluster_load_test_cleanup -r stresstest ~cluster
Cleaning up VMDK test on cluster vSAN66
vSAN66: success
Cleanup VMDK load test is completed for the cluster vSAN66 with status green
+------------------+--------+-------+
| Host             | Status | Error |
+------------------+--------+-------+
| vesx2.virten.lab | Passed |       |
| vesx1.virten.lab | Passed |       |
| vesx3.virten.lab | Passed |       |
+------------------+--------+-------+
vsan.health.silent_health_check_configure [-a|-r|-i|-n] ~cluster
In home labs, for example, where you don't have supported hardware but still want a green health status, you can silence health checks. Silenced checks are displayed with a green checkmark and marked as "Skipped".
Use the vsan.health.silent_health_check_status command to identify the Health Check Id for specific checks, or use interactive mode if you don't know the Ids.
- -a, --add-checks: Add checks to silent list
- -r, --remove-checks: Remove checks from silent list
- -i, --interactive-add: Use interactive mode to add checks to the silent list
- -n, --interactive-remove: Use interactive mode to remove checks from the silent list
Example 1 – Disable HCL related health checks:
> vsan.health.silent_health_check_configure -a controllerdiskmode ~cluster
Successfully add check "Controller disk group mode is VMware certified" to silent health check list for vSAN66

> vsan.health.silent_health_check_configure -a controllerdriver ~cluster
Successfully add check "Controller driver is VMware certified" to silent health check list for vSAN66

> vsan.health.silent_health_check_configure -a controllerfirmware ~cluster
Successfully add check "Controller firmware is VMware certified" to silent health check list for vSAN66

> vsan.health.silent_health_check_configure -a controllerreleasesupport ~cluster
Successfully add check "Controller is VMware certified for ESXi release" to silent health check list for vSAN66

> vsan.health.silent_health_check_configure -a controlleronhcl ~cluster
Successfully add check "SCSI controller is VMware certified" to silent health check list for vSAN66
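To re-enable a silenced check, pass its Health Check Id to the -r option; the -i and -n options perform the same add/remove operations with an interactive picker. A sketch using one of the checks silenced above (output omitted):

> vsan.health.silent_health_check_configure -r controlleronhcl ~cluster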
vsan.health.silent_health_check_status ~cluster
Displays the current silent status for health checks.
Example 1 – Health check list with some disabled checks:
> vsan.health.silent_health_check_status ~cluster
Silent Status of Cluster vSAN66:
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Health Check                                                               | Health Check Id                     | Silent Status |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Cloud Health                                                               |                                     |               |
| Controller utility is installed on host                                    | vendortoolpresence                  | Normal        |
| Controller with pass-through and RAID disks                                | mixedmode                           | Normal        |
| Customer experience improvement program (CEIP)                             | vsancloudhealthceipexception        | Normal        |
| Disks usage on storage controller                                          | diskusage                           | Normal        |
| Online health connectivity                                                 | vsancloudhealthconnectionexception  | Silent        |
| vSAN and VMFS datastores on a Dell H730 controller with the lsi_mr3 driver | mixedmodeh730                       | Normal        |
| vSAN configuration for LSI-3108 based controller                           | h730                                | Normal        |
| vSAN max component size                                                    | smalldiskstest                      | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Cluster                                                                    |                                     |               |
| Advanced vSAN configuration in sync                                        | advcfgsync                          | Normal        |
| Deduplication and compression configuration consistency                    | physdiskdedupconfig                 | Normal        |
| Deduplication and compression usage health                                 | physdiskdedupusage                  | Normal        |
| Disk format version                                                        | upgradelowerhosts                   | Normal        |
| ESXi vSAN Health service installation                                      | healtheaminstall                    | Normal        |
| Resync operations throttling                                               | resynclimit                         | Normal        |
| Software version compatibility                                             | upgradesoftware                     | Normal        |
| Time is synchronized across hosts and VC                                   | timedrift                           | Normal        |
| vSAN CLOMD liveness                                                        | clomdliveness                       | Normal        |
| vSAN Disk Balance                                                          | diskbalance                         | Normal        |
| vSAN Health Service up-to-date                                             | healthversion                       | Normal        |
| vSAN cluster configuration consistency                                     | consistentconfig                    | Normal        |
| vSphere cluster members match vSAN cluster members                         | clustermembership                   | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Data                                                                       |                                     |               |
| vSAN VM health                                                             | vmhealth                            | Normal        |
| vSAN object health                                                         | objecthealth                        | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Encryption                                                                 |                                     |               |
| CPU AES-NI is enabled on hosts                                             | hostcpuaesni                        | Normal        |
| vCenter and all hosts are connected to Key Management Servers              | kmsconnection                       | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Hardware compatibility                                                     |                                     |               |
| Controller disk group mode is VMware certified                             | controllerdiskmode                  | Silent        |
| Controller driver is VMware certified                                      | controllerdriver                    | Silent        |
| Controller firmware is VMware certified                                    | controllerfirmware                  | Silent        |
| Controller is VMware certified for ESXi release                            | controllerreleasesupport            | Silent        |
| Host issues retrieving hardware info                                       | hclhostbadstate                     | Normal        |
| SCSI controller is VMware certified                                        | controlleronhcl                     | Silent        |
| vSAN HCL DB Auto Update                                                    | autohclupdate                       | Silent        |
| vSAN HCL DB up-to-date                                                     | hcldbuptodate                       | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Limits                                                                     |                                     |               |
| After 1 additional host failure                                            | limit1hf                            | Normal        |
| Current cluster situation                                                  | limit0hf                            | Normal        |
| Host component limit                                                       | nodecomponentlimit                  | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Network                                                                    |                                     |               |
| Active multicast connectivity check                                        | multicastdeepdive                   | Normal        |
| All hosts have a vSAN vmknic configured                                    | vsanvmknic                          | Normal        |
| All hosts have matching multicast settings                                 | multicastsettings                   | Normal        |
| All hosts have matching subnets                                            | matchingsubnet                      | Normal        |
| Hosts disconnected from VC                                                 | hostdisconnected                    | Normal        |
| Hosts with connectivity issues                                             | hostconnectivity                    | Normal        |
| Multicast assessment based on other checks                                 | multicastsuspected                  | Normal        |
| Network latency check                                                      | hostlatencycheck                    | Normal        |
| vMotion: Basic (unicast) connectivity check                                | vmotionpingsmall                    | Silent        |
| vMotion: MTU check (ping with large packet size)                           | vmotionpinglarge                    | Normal        |
| vSAN cluster partition                                                     | clusterpartition                    | Normal        |
| vSAN: Basic (unicast) connectivity check                                   | smallping                           | Normal        |
| vSAN: MTU check (ping with large packet size)                              | largeping                           | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Performance service                                                        |                                     |               |
| All hosts contributing stats                                               | hostsmissing                        | Normal        |
| Performance data collection                                                | collection                          | Normal        |
| Performance service status                                                 | perfsvcstatus                       | Normal        |
| Stats DB object                                                            | statsdb                             | Normal        |
| Stats DB object conflicts                                                  | renameddirs                         | Normal        |
| Stats master election                                                      | masterexist                         | Normal        |
| Verbose mode                                                               | verbosemode                         | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Physical disk                                                              |                                     |               |
| Component limit health                                                     | physdiskcomplimithealth             | Normal        |
| Component metadata health                                                  | componentmetadata                   | Normal        |
| Congestion                                                                 | physdiskcongestion                  | Normal        |
| Disk capacity                                                              | physdiskcapacity                    | Normal        |
| Memory pools (heaps)                                                       | lsomheap                            | Normal        |
| Memory pools (slabs)                                                       | lsomslab                            | Normal        |
| Metadata health                                                            | physdiskmetadata                    | Normal        |
| Overall disks health                                                       | physdiskoverall                     | Normal        |
| Physical disk health retrieval issues                                      | physdiskhostissues                  | Normal        |
| Software state health                                                      | physdisksoftware                    | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| Stretched cluster                                                          |                                     |               |
| Invalid preferred fault domain on witness host                             | witnesspreferredfaultdomaininvalid  | Normal        |
| Invalid unicast agent                                                      | hostwithinvalidunicastagent         | Normal        |
| No disk claimed on witness host                                            | witnesswithnodiskmapping            | Normal        |
| Preferred fault domain unset                                               | witnesspreferredfaultdomainnotexist | Normal        |
| Site latency health                                                        | siteconnectivity                    | Normal        |
| Unexpected number of fault domains                                         | clusterwithouttwodatafaultdomains   | Normal        |
| Unicast agent configuration inconsistent                                   | clusterwithmultipleunicastagents    | Normal        |
| Unicast agent not configured                                               | hostunicastagentunset               | Normal        |
| Unsupported host version                                                   | hostwithnostretchedclustersupport   | Normal        |
| Witness host fault domain misconfigured                                    | witnessfaultdomaininvalid           | Normal        |
| Witness host not found                                                     | clusterwithoutonewitnesshost        | Normal        |
| Witness host within vCenter cluster                                        | witnessinsidevccluster              | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+
| vSAN iSCSI target service                                                  |                                     |               |
| Home object                                                                | iscsihomeobjectstatustest           | Normal        |
| Network configuration                                                      | iscsiservicenetworktest             | Normal        |
| Service runtime status                                                     | iscsiservicerunningtest             | Normal        |
+----------------------------------------------------------------------------+-------------------------------------+---------------+