【原创】修复vCenter的Web界面无法启动的问题

零 出现的问题

在使用过程中,发现vCenter无法登录,报错。
因此重新启动vCenter,打开Web,先是报“no healthy upstream”,等了二十分钟后错误依旧。
再次重启vCenter,打开Web,还是报同样的错误。

登录vCenter Server后台管理界后,发现有一些服务没启动起来,其中包括“VMware vCenter Server”也没启动起来。

根据错误信息“no healthy upstream”在网上搜答案,结果基本上没说到点上的。
无奈,请求VMware技术支持。

一 客服支持

接待我的时Dell的工程师,使用Zoom远程控制我的桌面,期间进行语音通话。

1、检查 dns 服务配置,包括正向解析和反向解析。

nslookup vcsa.my8421.com
nslookup 192.168.250.88

2、检查磁盘使用

df -h

3、检查证书

技术工程师传来一个python脚本,“checksts.py”
脚本内容如下:

#!/opt/vmware/bin/python


"""
Copyright 2020-2022 VMware, Inc.  All rights reserved. -- VMware Confidential
Author:  Keenan Matheny ([email protected])

"""
##### BEGIN IMPORTS #####

import os
import sys
import json
import subprocess
import re
import pprint
import ssl
from datetime import datetime, timedelta
import textwrap
from codecs import encode, decode
import subprocess
from time import sleep
try:
    # Python 3 hack.
    import urllib.request as urllib2
    import urllib.parse as urlparse
except ImportError:
    import urllib2
    import urlparse

sys.path.append(os.environ['VMWARE_PYTHON_PATH'])
from cis.defaults import def_by_os
sys.path.append(os.path.join(os.environ['VMWARE_CIS_HOME'],
                def_by_os('vmware-vmafd/lib64', 'vmafdd')))
import vmafd
from OpenSSL.crypto import (load_certificate, dump_privatekey, dump_certificate, X509, X509Name, PKey)
from OpenSSL.crypto import (TYPE_DSA, TYPE_RSA, FILETYPE_PEM, FILETYPE_ASN1 )

today = datetime.now()
today = today.strftime("%d-%m-%Y")

vcsa_kblink = "https://kb.vmware.com/s/article/76719"
win_kblink = "https://kb.vmware.com/s/article/79263"

##### END IMPORTS #####

class parseCert( object ):
    # Certificate parsing

    def format_subject_issuer(self, x509name): 
        items = []
        for item in x509name.get_components():
            items.append('%s=%s' %  (decode(item[0],'utf-8'), decode(item[1],'utf-8')))
        return ", ".join(items)

    def format_asn1_date(self, d):
        return datetime.strptime(decode(d,'utf-8'), '%Y%m%d%H%M%SZ').strftime("%Y-%m-%d %H:%M:%S GMT")

    def merge_cert(self, extensions, certificate):
        z = certificate.copy()
        z.update(extensions)
        return z

    def __init__(self, certdata):

        built_cert = certdata
        self.x509 = load_certificate(FILETYPE_PEM, built_cert)
        keytype = self.x509.get_pubkey().type()
        keytype_list = {TYPE_RSA:'rsaEncryption', TYPE_DSA:'dsaEncryption', 408:'id-ecPublicKey'}
        extension_list = ["extendedKeyUsage",
                        "keyUsage",
                        "subjectAltName",
                        "subjectKeyIdentifier",
                        "authorityKeyIdentifier"]
        key_type_str = keytype_list[keytype] if keytype in keytype_list else 'other'

        certificate = {}
        extension = {}
        for i in range(self.x509.get_extension_count()):
            critical = 'critical' if self.x509.get_extension(i).get_critical() else ''

            if decode(self.x509.get_extension(i).get_short_name(),'utf-8') in extension_list:
                extension[decode(self.x509.get_extension(i).get_short_name(),'utf-8')] = self.x509.get_extension(i).__str__()

        certificate = {'Thumbprint': decode(self.x509.digest('sha1'),'utf-8'), 'Version': self.x509.get_version(),
         'SignatureAlg' : decode(self.x509.get_signature_algorithm(),'utf-8'), 'Issuer' :self.format_subject_issuer(self.x509.get_issuer()), 
         'Valid From' : self.format_asn1_date(self.x509.get_notBefore()), 'Valid Until' : self.format_asn1_date(self.x509.get_notAfter()),
         'Subject' : self.format_subject_issuer(self.x509.get_subject())}
        
        combined = self.merge_cert(extension,certificate)
        cert_output = json.dumps(combined)

        self.subjectAltName = combined.get('subjectAltName')
        self.subject = combined.get('Subject')
        self.validfrom = combined.get('Valid From')
        self.validuntil = combined.get('Valid Until')
        self.thumbprint = combined.get('Thumbprint')
        self.subjectkey = combined.get('subjectKeyIdentifier')
        self.authkey = combined.get('authorityKeyIdentifier')
        self.combined = combined

class parseSts( object ):

    def __init__(self):
        self.processed = []
        self.results = {}
        self.results['expired'] = {}
        self.results['expired']['root'] = []
        self.results['expired']['leaf'] = []
        self.results['valid'] = {}
        self.results['valid']['root'] = []
        self.results['valid']['leaf'] = []

    def get_certs(self,force_refresh):
        urllib2.getproxies = lambda: {}
        vmafd_client = vmafd.client('localhost')
        domain_name = vmafd_client.GetDomainName()

        dc_name = vmafd_client.GetAffinitizedDC(domain_name, force_refresh)
        if vmafd_client.GetPNID() == dc_name:
            url = (
                'http://localhost:7080/idm/tenant/%s/certificates?scope=TENANT'
                % domain_name)
        else:
            url = (
                'https://%s/idm/tenant/%s/certificates?scope=TENANT'
                % (dc_name,domain_name))
        return json.loads(urllib2.urlopen(url).read().decode('utf-8'))

    def check_cert(self,certificate):
        cert = parseCert(certificate)
        certdetail = cert.combined

            #  Attempt to identify what type of certificate it is
        if cert.authkey:
            cert_type = "leaf"
        else:
            cert_type = "root"
        
        #  Try to only process a cert once
        if cert.thumbprint not in self.processed:
            # Date conversion
            self.processed.append(cert.thumbprint)
            exp = cert.validuntil.split()[0]
            conv_exp = datetime.strptime(exp, '%Y-%m-%d')
            exp = datetime.strftime(conv_exp, '%d-%m-%Y')
            now = datetime.strptime(today, '%d-%m-%Y')
            exp_date = datetime.strptime(exp, '%d-%m-%Y')
            
            # Get number of days until it expires
            diff = exp_date - now
            certdetail['daysUntil'] = diff.days

            # Sort expired certs into leafs and roots, put the rest in goodcerts.
            if exp_date <= now:
                self.results['expired'][cert_type].append(certdetail)
            else:
                self.results['valid'][cert_type].append(certdetail)
    
    def execute(self):

        json = self.get_certs(force_refresh=False)
        for item in json:
            for certificate in item['certificates']:
                self.check_cert(certificate['encoded'])
        return self.results

def main():

    warning = False
    warningmsg = '''
    WARNING! 
    You have expired STS certificates.  Please follow the KB corresponding to your OS:
    VCSA:  %s
    Windows:  %s
    ''' % (vcsa_kblink, win_kblink)
    parse_sts = parseSts()
    results = parse_sts.execute()
    valid_count = len(results['valid']['leaf']) + len(results['valid']['root'])
    expired_count = len(results['expired']['leaf']) + len(results['expired']['root'])
          
    
    #### Display Valid ####
    print("\n%s VALID CERTS\n================" % valid_count)
    print("\n\tLEAF CERTS:\n")
    if len(results['valid']['leaf']) > 0:
        for cert in results['valid']['leaf']:
            print("\t[] Certificate %s will expire in %s days (%s years)." % (cert['Thumbprint'], cert['daysUntil'], round(cert['daysUntil']/365)))
    else:
        print("\tNone")
    print("\n\tROOT CERTS:\n")
    if len(results['valid']['root']) > 0:
        for cert in results['valid']['root']:
            print("\t[] Certificate %s will expire in %s days (%s years)." % (cert['Thumbprint'], cert['daysUntil'], round(cert['daysUntil']/365)))
    else:
        print("\tNone")


    #### Display expired ####
    print("\n%s EXPIRED CERTS\n================" % expired_count)
    print("\n\tLEAF CERTS:\n")
    if len(results['expired']['leaf']) > 0:
        for cert in results['expired']['leaf']:
            print("\t[] Certificate: %s expired on %s!" % (cert.get('Thumbprint'),cert.get('Valid Until')))
            continue
    else:
        print("\tNone")

    print("\n\tROOT CERTS:\n")
    if len(results['expired']['root']) > 0:
        for cert in results['expired']['root']:
            print("\t[] Certificate: %s expired on %s!" % (cert.get('Thumbprint'),cert.get('Valid Until')))
            continue
    else:
        print("\tNone")

    if expired_count > 0:
        print(warningmsg)


if __name__ == '__main__':
    exit(main())


执行脚本

python checksts.py
for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

这里发现证书过期了。证书有效期两年,过期前可以通话Web界面续订。现在已经过期,只有重置证书了。

4、重置证书

/usr/lib/vmware-vmca/bin/certificate-manager

接下来,填写一堆东西,很多都是默认。特别注意登录账户、密码要填对。

5、重启vCenter

reboot

6、删除旧证书

这里使用了一个技术工程师传来的一个脚本,名称为“clean_backup_stores.sh”。
加上执行权限后,执行

#!/bin/bash

#Cesar Badilla Monday, November 16, 2020 10:41:17 PM 

echo "######################################################"
echo;echo "These are the current Certificate Stores:";echo
		
		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done; 
echo;echo "If there is any expired or expiring Certificates within the BACKUP_STORES please continue to run this script";echo "######################################################";echo 

	read -p "Have you taken powered off snapshots of all PSC's and VCSA's within the SSO domain(Y|y|N|n)" -n 1 -r

	if [[ ! $REPLY =~ ^[Yy]$ ]]
	then 
	exit 1
	fi
echo

		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store BACKUP_STORE |grep -i "alias" | cut -d ":" -f2);do echo BACKUP_STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store BACKUP_STORE --alias $i -y; done 
		
	for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done | grep -i 'BACKUP_STORE_H5C'&> /dev/null

	if [ $? == 0 ]; then 
		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store BACKUP_STORE_H5C |grep -i "alias" | cut -d ":" -f2); do echo BACKUP_STORE_H5C $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store BACKUP_STORE_H5C --alias $i -y; done
	
echo 
echo "--------------------------------------------------------";
fi

echo "######################################################";
echo;echo "The resulting BACKUP_STORES after the cleanups are: ";echo

		for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

echo "######################################################";echo "--------------------------------------------------------"; echo "--------------------------------------------------------";
echo "Results: ";
echo "--------------------------------------------------------"; echo "--------------------------------------------------------";
echo;echo "The Certificate BACKUP_STORES were successfully cleaned";echo;
echo "Please acknowlege and reset to green any certificate related alarm."
echo "Restart services on all PSC's and VCSA's in the SSO Domain with command.";echo;echo "service-control --stop --all && service-control --start --all(optional)."
echo "--------------------------------------------------------";
echo;echo "If you could not restart the services, please monitor
the VCSA for 24 hours and the alarm should not reappear 
after the acknowlegement."
echo;echo "######################################################"

7、验证

登录vCenter,一切恢复正常。

二、总结

证书有效期两年,两年内,续订;超过两年,重置。
另外,技术工程师操作的比较谨慎,在处理之前,先通过ESXi对 vCenter做了一个快照。配置完成后,检查没问题后,才把这个快照删了。

你可能感兴趣的:(VMware,虚拟机,vCenter,Esxi,虚拟化)