Resilient vSphere 6.0 PSC deployment without Load Balancer

With vSphere 6.0, VMware has split vCenter Server into two components: vCenter Server and the Platform Services Controller (PSC). VMware also published a list of recommended deployment topologies. The topology they recommend for high availability places an external load balancer in front of the PSCs, and the vCenter Servers point to that load balancer. An alternative is to deploy multiple Platform Services Controllers and point the vCenters directly at them. You do not need one PSC per vCenter. Each PSC can manage up to 4 vCenters, so in terms of raw capacity 2 PSCs can serve 8 vCenters and 3 PSCs can serve 10 vCenters, which is the configuration maximum at the moment.

Figure: Platform Services Controller with or without Load Balancer?

The solution with a Load Balancer sounds nice, and I'm sure it's operable, but it has some drawbacks:

- It requires a third-party load balancer (compatible load balancers are NSX-v, Citrix NetScaler and F5 BIG-IP)
- Configuration is complex
- Troubleshooting is even more complex
- It does not scale: 1 PSC can handle 4 vCenters, but because the load balancer is there for redundancy, 2 PSCs are required to handle those 4 vCenters

Figure: Repointing vCenter in case of PSC failure

Why not take the load balancer, and its complexity, out of the design and repoint vCenters to a working PSC in case of an unrecoverable PSC failure? PSC data is replicated, so you do not lose any information. Repointing takes about 5 minutes and is basically a one-liner.

Note: Repointing was introduced in vSphere 6.0 Update 1. This method does not work with the initial 6.0 release!

Setup is straightforward:

1. Install the first external PSC (create a new domain)
2. Install the second external PSC (join the existing domain)
3. Install up to 4 external vCenters and join each of them to either of the two PSCs

For 8 vCenters, 3 PSCs are required; for 10 vCenters, 4 PSCs are required. The extra PSC provides the spare capacity needed to take over the vCenters of a failed PSC.
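The PSC counts above are consistent with a simple N+1 rule: take enough PSCs to serve all vCenters at 4 vCenters per PSC, then add one spare so the surviving PSCs can absorb the vCenters of a failed one. The function below is a small sketch of that rule, assuming the vSphere 6.0 limits quoted in this article (4 vCenters per PSC, 10 vCenters in total). `psc_required` is a hypothetical helper, not a VMware tool, and the rule is a reading of the numbers above rather than an official formula.

```shell
#!/bin/bash
# Sketch: N+1 PSC sizing for the "repoint instead of load balancer" design.
# Limits as quoted in the article for vSphere 6.0 (assumption: unchanged in your build).
MAX_VC_PER_PSC=4      # vCenters one PSC can serve
MAX_VC_PER_DOMAIN=10  # vCenters allowed in total

# psc_required <number-of-vCenters>  (hypothetical helper, expects an integer)
psc_required() {
  local vcenters=$1
  if [ "$vcenters" -lt 1 ] || [ "$vcenters" -gt "$MAX_VC_PER_DOMAIN" ]; then
    echo "vCenter count must be between 1 and $MAX_VC_PER_DOMAIN" >&2
    return 1
  fi
  # ceil(vcenters / 4) with integer arithmetic, plus one spare PSC for failover
  echo $(( (vcenters + MAX_VC_PER_PSC - 1) / MAX_VC_PER_PSC + 1 ))
}

# Prints: "4 vCenters -> 2 PSCs", "8 vCenters -> 3 PSCs", "10 vCenters -> 4 PSCs"
for n in 4 8 10; do
  echo "$n vCenters -> $(psc_required "$n") PSCs"
done
```

With one spare, losing any single PSC still leaves at least 4 vCenter slots per required PSC, so every orphaned vCenter has somewhere to be repointed.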

What happens when the ESXi host running a PSC crashes?
VMware HA restarts the PSC on another host. No login is possible for about 3 minutes, but no action is required.

What happens when the PSC virtual machine is broken?
No login is possible. vCenter itself is still reachable, but it redirects to the PSC for login, so the browser displays the following error message:
Figure: Browser error message while the PSC is down
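Because vCenter stays reachable and only the redirect to the PSC fails, it can help to confirm from the vCSA that the PSC itself is unreachable before you repoint. The following is a minimal sketch, assuming the vCSA bash shell supports `/dev/tcp` redirections and provides the coreutils `timeout` command; `psc_reachable` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: test whether a PSC still accepts TCP connections on its HTTPS port.
# Assumptions: bash built with /dev/tcp support, coreutils `timeout` installed.
# psc_reachable <host> [port]  (hypothetical helper, port defaults to 443)
psc_reachable() {
  local host=$1 port=${2:-443}
  # The redirection opens a TCP connection. It fails if the port is closed,
  # the name does not resolve, or nothing answers within 5 seconds.
  timeout 5 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, `psc_reachable psc01.virten.lab && echo "PSC up" || echo "PSC down"`. An open port only shows that something answers on 443; it does not prove that the Single Sign-On service on the PSC is healthy.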

If you can't fix the PSC, repoint the vCenter to another PSC and restart the vCenter services. Repointing is explained in KB2113917.

SSH to the vCenter Server Appliance and log in as root.

Activate and start the bash shell

```
Command> shell.set --enabled True
Command> shell
```

If you are not sure which PSC the vCenter is connected to, check the current PSC with the following command:
```
vcsa:~ # /usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost
psc01.virten.lab
```

Repoint the vCenter to a working PSC

```
vcsa:~ # /usr/lib/vmware-vmafd/bin/vmafd-cli set-dc-name --server-name localhost --dc-name psc02.virten.lab
```

Restart all services in the vCSA (this takes about 5 minutes):

```
service-control --stop --all
service-control --start --all
```

Logins are now processed by the second PSC.
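The manual steps above can be wrapped into a single function, which is also a starting point if you want a monitoring tool to trigger the failover. This is a sketch, not a tested production script: it only chains the commands shown in this article, `repoint_vcenter` is a hypothetical name, and `VMAFD_CLI` and `SERVICE_CONTROL` are hypothetical override hooks that default to the real commands and exist only for dry runs.

```shell
#!/bin/bash
# Sketch: chain the repointing steps (vSphere 6.0 Update 1 or later, vCSA bash shell, root).
# VMAFD_CLI / SERVICE_CONTROL are hypothetical override hooks for dry runs;
# by default they point at the commands used in this article.
VMAFD_CLI=${VMAFD_CLI:-/usr/lib/vmware-vmafd/bin/vmafd-cli}
SERVICE_CONTROL=${SERVICE_CONTROL:-service-control}

repoint_vcenter() {
  local new_psc=$1 current
  if [ -z "$new_psc" ]; then
    echo "usage: repoint_vcenter <psc-fqdn>" >&2
    return 2
  fi

  # Show which PSC this vCenter currently uses
  current=$("$VMAFD_CLI" get-dc-name --server-name localhost) || return 1
  echo "Current PSC: $current"
  if [ "$current" = "$new_psc" ]; then
    echo "Already pointing to $new_psc, nothing to do."
    return 0
  fi

  # Repoint the vCenter to the working PSC; stop here if that fails
  "$VMAFD_CLI" set-dc-name --server-name localhost --dc-name "$new_psc" || return 1

  # Restart all vCSA services (about 5 minutes). The stop result is not
  # checked because some services may already have failed with the old PSC.
  "$SERVICE_CONTROL" --stop --all
  "$SERVICE_CONTROL" --start --all || return 1

  # Confirm the new setting
  echo "PSC is now: $("$VMAFD_CLI" get-dc-name --server-name localhost)"
}
```

After sourcing the file in the vCSA bash shell, run `repoint_vcenter psc02.virten.lab`. Calling it a second time is harmless: it sees that the vCenter already points to that PSC and returns without restarting services.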

5 thoughts on “Resilient vSphere 6.0 PSC deployment without Load Balancer”

Gary April 15, 2016 at 10:11 pm
Excellent. I will have to try this out.
Damon April 19, 2016 at 6:06 pm
Package that into a script and then use a monitoring tool to check for the HTTP error message and you could automate that to happen.
1. fgrehl April 19, 2016 at 6:19 pm
  William Lam has done this: http://www.virtuallyghetto.com/2015/12/how-to-automatically-repoint-failover-vcsa-to-another-replicated-platform-services-controller-psc.html
John Fleuchaus April 20, 2016 at 1:04 am
VCP6-VDC certification training taught us this, but I was left wondering if/when an HA-like failover function for next-nearest-PSC-seek to temporarily connect to might come about. VMware's HA has evolved to be quite reliable and functions wonderfully, even in the absence of a vCenter as per its design.
VMware could take a clue from Microsoft's Active Directory which has this functionality built in and it works quite well despite a local DC failure; just go out on the WAN and find another DC to authenticate against. These PSCs could very well be trained to advertise next-nearest-neighbor alternatives (or even the entire PSC list) to each vCenter in the domain (not necessarily just the ones connected to that particular PSC) such that if the primary PSC is unavailable, the vCenter logs the fact, then connects to the next nearest PSC in the list (either round robin or negotiated based upon number of connected vCenters).
An improvement like that would take this topic to the next level for sure. I eagerly await that enhancement.
1. John Fleuchaus April 20, 2016 at 1:06 am
  VCP-DCV rather ^^^ (poor tired eyes and fingers)
