With vSphere 6.0, VMware has split vCenter Server into two components: vCenter Server and the Platform Services Controller (PSC). They also published a list of recommended deployment topologies. The topology recommended for high availability places an external load balancer in front of the PSCs, and the vCenter Servers point to the load balancer. An alternative is to run multiple PSCs and point the vCenters directly at them. There is no need to have one PSC for each vCenter. Each PSC can manage up to 4 vCenters, so with 2 PSCs you can manage 8 vCenters, and with 3 you can manage 10 vCenters, which is the configuration maximum at the moment.
The solution with a Load Balancer sounds nice, and I'm sure it's operable, but it has some drawbacks:
- It requires a third-party load balancer (compatible load balancers are NSX-v, Citrix NetScaler and F5 BIG-IP)
- Configuration is complex
- Troubleshooting is even more complex
- It does not scale (one PSC can handle 4 vCenters, but behind a load balancer, which is there for redundancy, 2 PSCs are required to handle the same 4 vCenters)
Why not take the load balancer, and its complexity, out of the design and repoint the vCenters to a working PSC in case of an unrecoverable PSC failure? PSC data is replicated, so you do not lose any information. Repointing takes about 5 minutes and is basically a one-liner.
Note: Repointing was introduced in vSphere 6.0 Update 1. This method does not work in the initial 6.0 release!
Setup is straightforward:
- Install first external PSC (Create Domain)
- Install second external PSC (Join Domain)
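- Install up to 4 external vCenters and join each to either of the two PSCs
For 8 vCenters, 3 PSCs are required; for 10 vCenters, 4 PSCs are required. That is one PSC more than capacity alone demands, which leaves room to repoint the vCenters of a failed PSC.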
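The PSC counts above can be reproduced with a little arithmetic. This is only a sketch under two assumptions: the limit of 4 vCenters per PSC quoted earlier, and one spare PSC as a repoint target. The function name pscs_needed and the variable MAX_VC_PER_PSC are made up for illustration.

```shell
#!/bin/sh
# Assumption: a PSC serves at most 4 vCenters (figure quoted in the article).
MAX_VC_PER_PSC=4

# pscs_needed <number-of-vcenters>
# Hypothetical helper: PSCs needed to carry the load, plus one spare
# so that the vCenters of a failed PSC can be repointed somewhere.
pscs_needed() {
    # Ceiling division for the load, then +1 for the spare.
    echo $(( ($1 + MAX_VC_PER_PSC - 1) / MAX_VC_PER_PSC + 1 ))
}

for n in 4 8 10; do
    echo "$n vCenters -> $(pscs_needed "$n") PSCs"
done
```

For 4, 8 and 10 vCenters this yields 2, 3 and 4 PSCs, which matches the setup described above.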
What happens when the ESXi host running a PSC crashes?
VMware HA restarts the PSC on another host. Logins are not possible for about 3 minutes while the PSC boots. No action is required.
What happens when the PSC Virtual Machine is broken?
No login is possible. The browser displays an error message (vCenter itself is reachable, but it redirects to the PSC for login).
If you can't fix the PSC, repoint the vCenter to another PSC and restart vCenter Services. Repointing is explained in KB2113917.
- SSH to the vCenter Server Appliance and log in as root
- Activate and start the bash shell
Command> shell.set --enabled True
Command> shell
- In case you are not sure to which PSC the vCenter is connected, verify the current PSC with the following command:
vcsa:~ # /usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost
psc01.virten.lab
- Repoint the vCenter to a working PSC
vcsa:~ # /usr/lib/vmware-vmafd/bin/vmafd-cli set-dc-name --server-name localhost --dc-name psc02.virten.lab
- Restart all services in the vCSA (takes about 5 minutes)
service-control --stop --all
service-control --start --all
Logins are now processed by the second PSC.
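The manual steps above can be wrapped in a small shell function. The following is a minimal sketch, not a tested failover tool: it uses only the vmafd-cli and service-control invocations shown in this post, while the function names (current_psc, repoint_psc) and the two override variables are assumptions added for illustration.

```shell
#!/bin/bash
# Tool locations as used in the steps above. The variables are an
# assumption of this sketch so the paths can be overridden if needed.
VMAFD_CLI="${VMAFD_CLI:-/usr/lib/vmware-vmafd/bin/vmafd-cli}"
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Print the PSC this vCenter currently points to.
current_psc() {
    "$VMAFD_CLI" get-dc-name --server-name localhost
}

# repoint_psc <psc-fqdn>
# Hypothetical helper: repoint to the given PSC and restart all services.
repoint_psc() {
    target=$1
    if [ -z "$target" ]; then
        echo "usage: repoint_psc <psc-fqdn>" >&2
        return 2
    fi
    # Skip the roughly 5 minute service restart if nothing would change.
    if [ "$(current_psc)" = "$target" ]; then
        echo "already pointing at $target, nothing to do"
        return 0
    fi
    "$VMAFD_CLI" set-dc-name --server-name localhost --dc-name "$target" || return 1
    "$SERVICE_CONTROL" --stop --all || return 1
    "$SERVICE_CONTROL" --start --all || return 1
    echo "now pointing at $(current_psc)"
}
```

After loading the functions in the vCSA bash shell as root, the call would be, for example, `repoint_psc psc02.virten.lab`. The sketch does not check whether the target PSC is healthy, so verify that first.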
Excellent. I will have to try this out.
Package that into a script and then use a monitoring tool to check for the HTTP error message and you could automate that to happen.
William Lam has done this: http://www.virtuallyghetto.com/2015/12/how-to-automatically-repoint-failover-vcsa-to-another-replicated-platform-services-controller-psc.html
VCP6-VDC certification training taught us this, but I was left wondering if/when an HA-like failover function for next-nearest-PSC-seek to temporarily connect to might come about. VMware's HA has evolved to be quite reliable and functions wonderfully, even in the absence of a vCenter as per its design.
VMware could take a clue from Microsoft's Active Directory which has this functionality built in and it works quite well despite a local DC failure; just go out on the WAN and find another DC to authenticate against. These PSCs could very well be trained to advertise next-nearest-neighbor alternatives (or even the entire PSC list) to each vCenter in the domain (not necessarily just the ones connected to that particular PSC) such that if the primary PSC is unavailable, the vCenter logs the fact, then connects to the next nearest PSC in the list (either round robin or negotiated based upon number of connected vCenters).
An improvement like that would take this topic to the next level for sure. I eagerly await that enhancement.
VCP-DCV rather ^^^ (poor tired eyes and fingers)