OpenShift Container Platform 4.3 Deployment Walkthrough

This walkthrough follows the official Red Hat documentation for installing OpenShift 4.3 on bare metal. With only a single PC with 64 GB of RAM available, the free edition of VMware vSphere 6.7 was installed for this test, so the install was attempted with memory halved from the minimum requirements stated in the official OCP documentation. The process is recorded below.

1. The OCP installation process

The installation flow as described in the official Red Hat documentation:

  1. The bootstrap node boots and prepares the resources the master nodes need
  2. The masters fetch the required resources from the bootstrap node and finish booting
  3. The masters build an etcd cluster through the bootstrap node
  4. The bootstrap node starts a temporary Kubernetes control plane backed by the newly built etcd cluster
  5. The temporary control plane starts the production control plane on the master nodes
  6. The temporary control plane shuts down and hands control over to the production control plane
  7. The bootstrap node injects the OCP components into the production control plane
  8. The installer shuts down the bootstrap node
  9. The control plane deploys the compute nodes
  10. The control plane installs the remaining services as operators

2. Prepare server resources

Server plan:

  • 3 control plane nodes, running etcd, the control plane components, and the infra components; because resources are tight, no dedicated DNS server is deployed and names are resolved via hosts files;
  • 2 compute nodes, running the actual workloads;
  • 1 bootstrap node, which drives the installation;
  • 1 misc/lb node, used to stage installation resources, boot the bootstrap node, and act as the load balancer.
Hostname   vCPU  RAM  HDD   IP              FQDN
misc/lb    4     8g   120g  192.168.128.30  misc.ocptest.ipincloud.com / lb.ocptest.ipincloud.com
bootstrap  4     8g   120g  192.168.128.31  bootstrap.ocptest.ipincloud.com
master1    4     8g   120g  192.168.128.32  master1.ocptest.ipincloud.com
master2    4     8g   120g  192.168.128.33  master2.ocptest.ipincloud.com
master3    4     8g   120g  192.168.128.34  master3.ocptest.ipincloud.com
worker1    2     4g   120g  192.168.128.35  worker1.ocptest.ipincloud.com
worker2    2     4g   120g  192.168.128.36  worker2.ocptest.ipincloud.com
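Since the plan resolves names via hosts files where possible, an /etc/hosts fragment matching the table might look like the sketch below (the registry and api entries are assumptions based on later sections; note that the wildcard *.apps record cannot be expressed in a hosts file, which is one reason the helper machine also runs named):

```
192.168.128.30  misc.ocptest.ipincloud.com lb.ocptest.ipincloud.com registry.ipincloud.com api.ocptest.ipincloud.com api-int.ocptest.ipincloud.com
192.168.128.31  bootstrap.ocptest.ipincloud.com
192.168.128.32  master1.ocptest.ipincloud.com
192.168.128.33  master2.ocptest.ipincloud.com
192.168.128.34  master3.ocptest.ipincloud.com
192.168.128.35  worker1.ocptest.ipincloud.com
192.168.128.36  worker2.ocptest.ipincloud.com
```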

3. Prepare network resources

The API server and ingress share one load balancer, the misc/lb node.
The DNS records are listed below; ocptest is the cluster name and ipincloud.com is the base domain. These settings are applied by editing the corresponding templates under tasks/ in the ansible playbook.
See:
https://github.com/scwang18/ocp4-upi-helpernode.git
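The playbook renders the LB configuration from its templates; as a rough hand-written sketch, an haproxy equivalent of the misc/lb role would pass through the standard UPI ports (6443 for the API, 22623 for machine config, 80/443 for ingress). The backend names and layout here are illustrative, not taken from the playbook:

```
frontend api
    bind *:6443
    mode tcp
    default_backend api
backend api
    mode tcp
    server bootstrap 192.168.128.31:6443 check
    server master1 192.168.128.32:6443 check
    server master2 192.168.128.33:6443 check
    server master3 192.168.128.34:6443 check

frontend machine-config
    bind *:22623
    mode tcp
    default_backend machine-config
backend machine-config
    mode tcp
    server bootstrap 192.168.128.31:22623 check
    server master1 192.168.128.32:22623 check
    server master2 192.168.128.33:22623 check
    server master3 192.168.128.34:22623 check

frontend ingress-https
    bind *:443
    mode tcp
    default_backend ingress-https
backend ingress-https
    mode tcp
    server worker1 192.168.128.35:443 check
    server worker2 192.168.128.36:443 check
```

An ingress-http block on port 80 follows the same pattern as the 443 block.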

  • DNS records

Component       DNS record                                    Description
Kubernetes API  api.ocptest.ipincloud.com                     Points to the load balancer for the control plane nodes. Must be resolvable from outside the cluster and from all nodes within it.
Kubernetes API  api-int.ocptest.ipincloud.com                 Points to the load balancer for the control plane nodes. Must be resolvable from all nodes within the cluster.
Routes          *.apps.ocptest.ipincloud.com                  Wildcard record pointing to the ingress load balancer. Must be resolvable from outside the cluster and from all nodes within it.
etcd            etcd-<index>.ocptest.ipincloud.com            Points to each etcd node; must be resolvable from all nodes in the cluster.
etcd            _etcd-server-ssl._tcp.ocptest.ipincloud.com   Because etcd serves peers on port 2380, an SRV record is required for each etcd node, with priority 0, weight 10, and port 2380, per the table below.
  • etcd SRV DNS records

#The following records are required; bootstrap uses them on the etcd servers it creates to auto-configure etcd service discovery

#_service._proto.name.                       TTL   class SRV priority weight port target.
_etcd-server-ssl._tcp.ocptest.ipincloud.com. 86400 IN SRV 0 10 2380 etcd-0.ocptest.ipincloud.com.
_etcd-server-ssl._tcp.ocptest.ipincloud.com. 86400 IN SRV 0 10 2380 etcd-1.ocptest.ipincloud.com.
_etcd-server-ssl._tcp.ocptest.ipincloud.com. 86400 IN SRV 0 10 2380 etcd-2.ocptest.ipincloud.com.
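As a sketch, the three SRV records can be generated from the cluster name and base domain used in this deployment rather than typed by hand:

```shell
# Generate the three etcd SRV records for the ocptest cluster zone file
# from the cluster name and base domain used in this deployment.
CLUSTER=ocptest
DOMAIN=ipincloud.com
records=""
for i in 0 1 2; do
  records="${records}_etcd-server-ssl._tcp.${CLUSTER}.${DOMAIN}. 86400 IN SRV 0 10 2380 etcd-${i}.${CLUSTER}.${DOMAIN}.
"
done
printf '%s' "$records"
```

The output can be pasted directly into the zone file that named serves.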
  • Create an SSH private key and add it to the ssh agent

With a passwordless SSH key, you can log in to the master nodes as the core user for installation debugging and disaster recovery.

(1) Run the following command on the misc node to create the SSH key

ssh-keygen -t rsa -b 4096 -N '' 

The command above creates the files id_rsa and id_rsa.pub under ~/.ssh/.

(2) Start the ssh-agent process and add the passwordless private key to it

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

In the next step, the SSH public key is supplied to the installer via its configuration file.

Because we prepare the machines by hand, the public key must also be distributed to every cluster node so this machine can log in to them without a password (ssh-copy-id core@<node> achieves the same in one step):

#Copy id_rsa.pub from the ~/.ssh directory to the ~/.ssh directory of the cluster node you want to log in to
scp ~/.ssh/id_rsa.pub [email protected]:~/.ssh/
#Then, on the cluster node, append the public key to ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

4. Fetch the installer

A Red Hat account is required to download the installer; the registration and download details are omitted here.
https://cloud.redhat.com/openshift/install/metal/user-provisioned

  • Download the installer and images
rm -rf /data/pkg
mkdir -p /data/pkg
cd /data/pkg

#OCP installer
#wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-install-linux-4.3.0.tar.gz

#OCP client
#wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux-4.3.0.tar.gz

#RHCOS installer ISO
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.0-x86_64-installer.iso

#RHCOS BIOS raw image
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.0-x86_64-metal.raw.gz

#If installing from the ISO, the two iPXE files below are not needed

#RHCOS installer kernel, used for iPXE installs
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.0-x86_64-installer-kernel

#RHCOS initramfs image, used for iPXE installs
wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.0-x86_64-installer-initramfs.img

5. Prepare the misc helper machine

This uses helper-machine provisioning tooling adapted from Wang Zheng's scripts; it conveniently brings up the LB, DHCP, PXE, DNS, and HTTP services on the helper machine.
(1) Install ansible and git

yum -y install ansible git

(2) Clone the playbook from GitHub

cd /data/pkg
git clone https://github.com/scwang18/ocp4-upi-helpernode.git

(3) Edit the playbook's variables file
Adjust the variables to match your own network plan.

[root@centos75 pkg]# cd /data/pkg/ocp4-upi-helpernode/
[root@centos75 ocp4-upi-helpernode]# cat vars-static.yaml
---
staticips: true
named: true
helper:
  name: "helper"
  ipaddr: "192.168.128.30"
  networkifacename: "ens192"
dns:
  domain: "ipincloud.com"
  clusterid: "ocptest"
  forwarder1: "192.168.128.30"
  forwarder2: "192.168.128.30"
  registry:
    name: "registry"
    ipaddr: "192.168.128.30"
  yum:
    name: "yum"
    ipaddr: "192.168.128.30"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.128.31"
masters:
  - name: "master1"
    ipaddr: "192.168.128.32"
  - name: "master2"
    ipaddr: "192.168.128.33"
  - name: "master3"
    ipaddr: "192.168.128.34"
workers:
  - name: "worker1"
    ipaddr: "192.168.128.35"
  - name: "worker2"
    ipaddr: "192.168.128.36"
force_ocp_download: false

ocp_bios: "file:///data/pkg/rhcos-4.3.0-x86_64-metal.raw.gz"
ocp_initramfs: "file:///data/pkg/rhcos-4.3.0-x86_64-installer-initramfs.img"
ocp_install_kernel: "file:///data/pkg/rhcos-4.3.0-x86_64-installer-kernel"
ocp_client: "file:///data/pkg/openshift-client-linux-4.3.0.tar.gz"
ocp_installer: "file:///data/pkg/openshift-install-linux-4.3.0.tar.gz"
ocp_filetranspiler: "file:///data/pkg/filetranspiler-master.zip"
registry_server: "registry.ipincloud.com:8443"
[root@misc pkg]#

(4) Run the ansible playbook

ansible-playbook -e @vars-static.yaml tasks/main.yml

6. Prepare the docker environment

# Package the required images on a machine with unrestricted Internet access

#rm -rf /data/ocp4
mkdir -p /data/ocp4
cd /data/ocp4

# The upstream script is awkward to use; don't download it. The version used below is my modified copy
# wget https://raw.githubusercontent.com/wangzheng422/docker_env/dev/redhat/ocp4/4.3/scripts/build.dist.sh

yum -y install podman docker-distribution pigz skopeo docker buildah jq python3-pip 

pip3 install yq

# https://blog.csdn.net/ffzhihua/article/details/85237411
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm
rpm2cpio python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm | cpio -iv --to-stdout ./etc/rhsm/ca/redhat-uep.pem | tee /etc/rhsm/ca/redhat-uep.pem

systemctl start docker

docker login -u wuliangye2019 -p Red@123! registry.redhat.io
docker login -u wuliangye2019 -p Red@123! registry.access.redhat.com
docker login -u wuliangye2019 -p Red@123! registry.connect.redhat.com

podman login -u wuliangye2019 -p Red@123! registry.redhat.io
podman login -u wuliangye2019 -p Red@123! registry.access.redhat.com
podman login -u wuliangye2019 -p Red@123! registry.connect.redhat.com

# To download pull-secret.json, open the following link:
# https://cloud.redhat.com/openshift/install/metal/user-provisioned
cat << 'EOF' > /data/pull-secret.json
{"auths":{"cloud.openshift.com":{"auth":"xxxxxxxxxxx"}}}
EOF

Create the build.dist.sh file:

#!/usr/bin/env bash

set -e
set -x

var_date=$(date '+%Y-%m-%d')
echo $var_date
#The following block does not need to run every time
#cat << EOF >>  /etc/hosts
#127.0.0.1 registry.ipincloud.com
#EOF


#mkdir -p /etc/crts/
#cd /etc/crts
#openssl req \
#   -newkey rsa:2048 -nodes -keyout ipincloud.com.key \
#   -x509 -days 3650 -out ipincloud.com.crt -subj \
#   "/C=CN/ST=GD/L=SZ/O=Global Security/OU=IT Department/CN=*.ipincloud.com"

#cp /etc/crts/ipincloud.com.crt /etc/pki/ca-trust/source/anchors/
#update-ca-trust extract

systemctl stop docker-distribution

rm -rf /data/registry
mkdir -p /data/registry
cat << EOF > /etc/docker-distribution/registry/config.yml
version: 0.1
log:
  fields:
    service: registry
storage:
    cache:
        layerinfo: inmemory
    filesystem:
        rootdirectory: /data/registry
    delete:
        enabled: true
http:
    addr: :8443
    tls:
       certificate: /etc/crts/ipincloud.com.crt
       key: /etc/crts/ipincloud.com.key
EOF
systemctl restart docker
systemctl enable docker-distribution

systemctl restart docker-distribution

build_number_list=$(cat << EOF
4.3.0
EOF
)
mkdir -p /data/ocp4
cd /data/ocp4

install_build() {
    BUILDNUMBER=$1
    echo ${BUILDNUMBER}
    
    mkdir -p /data/ocp4/${BUILDNUMBER}
    cd /data/ocp4/${BUILDNUMBER}

    #Download and install the openshift client and installer. Only needed the first time; the helper-machine ansible run has already done this
    #wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/release.txt

    #wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-client-linux-${BUILDNUMBER}.tar.gz
    #wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/${BUILDNUMBER}/openshift-install-linux-${BUILDNUMBER}.tar.gz

    #Extract the installer and client into the executable path. Only needed the first time
    #tar -xzf openshift-client-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
    #tar -xzf openshift-install-linux-${BUILDNUMBER}.tar.gz -C /usr/local/bin/
    
    export OCP_RELEASE=${BUILDNUMBER}
    export LOCAL_REG='registry.ipincloud.com:8443'
    export LOCAL_REPO='ocp4/openshift4'
    export UPSTREAM_REPO='openshift-release-dev'
    export LOCAL_SECRET_JSON="/data/pull-secret.json"
    export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=${LOCAL_REG}/${LOCAL_REPO}:${OCP_RELEASE}
    export RELEASE_NAME="ocp-release"

    oc adm release mirror -a ${LOCAL_SECRET_JSON} \
    --from=quay.io/${UPSTREAM_REPO}/${RELEASE_NAME}:${OCP_RELEASE}-x86_64 \
    --to-release-image=${LOCAL_REG}/${LOCAL_REPO}:${OCP_RELEASE} \
    --to=${LOCAL_REG}/${LOCAL_REPO}

}

while read -r line; do
    install_build $line
done <<< "$build_number_list"

cd /data/ocp4

#wget -O ocp4-upi-helpernode-master.zip https://github.com/wangzheng422/ocp4-upi-helpernode/archive/master.zip

#Commented out below: the registry in the quay.io/wangzheng422 repo is v1, which cannot coexist with v2
#podman pull quay.io/wangzheng422/filetranspiler
#podman save quay.io/wangzheng422/filetranspiler | pigz -c > filetranspiler.tgz

#podman pull docker.io/library/registry:2
#podman save docker.io/library/registry:2 | pigz -c > registry.tgz

systemctl start docker

docker login -u wuliangye2019 -p Red@123! registry.redhat.io
docker login -u wuliangye2019 -p Red@123! registry.access.redhat.com
docker login -u wuliangye2019 -p Red@123! registry.connect.redhat.com

podman login -u wuliangye2019 -p Red@123! registry.redhat.io
podman login -u wuliangye2019 -p Red@123! registry.access.redhat.com
podman login -u wuliangye2019 -p Red@123! registry.connect.redhat.com

# The following commands run for 2-3 hours; be patient...

# build operator catalog
podman login registry.ipincloud.com:8443 -u root -p Scwang18
oc adm catalog build \
    --appregistry-endpoint https://quay.io/cnr \
    --appregistry-org redhat-operators \
    --to=${LOCAL_REG}/ocp4-operator/redhat-operators:v1
    
oc adm catalog mirror \
    ${LOCAL_REG}/ocp4-operator/redhat-operators:v1 \
    ${LOCAL_REG}/operator

#cd /data
#tar cf - registry/ | pigz -c > registry.tgz

#cd /data
#tar cf - ocp4/ | pigz -c > ocp4.tgz

Run the build.dist.sh script.

A major pitfall here: pulling the release images from quay.io fetches more than 5 GB, and a single pull usually fails partway through. After every failure, re-running build.dist.sh deletes the previous registry and starts over from scratch, wasting a lot of time. The delete is actually unnecessary: oc adm release mirror automatically skips images that already exist. A lesson learned in blood and tears.

bash build.dist.sh

After oc adm release mirror finishes, it has built a local mirror of the official image registry. Record the information it prints, especially the imageContentSources section, which is configured into install-config.yaml later.


Success
Update image:  registry.ipincloud.com:8443/ocp4/openshift4:4.3.0
Mirror prefix: registry.ipincloud.com:8443/ocp4/openshift4

To use the new mirrored repository to install, add the following section to the install-config.yaml:

imageContentSources:
- mirrors:
  - registry.ipincloud.com:8443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ipincloud.com:8443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev


To use the new mirrored repository for upgrades, use the following to create an ImageContentSourcePolicy:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example
spec:
  repositoryDigestMirrors:
  - mirrors:
    - registry.ipincloud.com:8443/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - registry.ipincloud.com:8443/ocp4/openshift4
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

The following commands do not need to be run; build.dist.sh has already executed them.

oc adm release mirror -a /data/pull-secret.json --from=quay.io/openshift-release-dev/ocp-release:4.3.0-x86_64 --to-release-image=registry.ipincloud.com:8443/ocp4/openshift4:4.3.0 --to=registry.ipincloud.com:8443/ocp4/openshift4    

podman login registry.ipincloud.com:8443 -u root -p Scwang18
oc adm catalog build \
    --appregistry-endpoint https://quay.io/cnr \
    --appregistry-org redhat-operators \
    --to=registry.ipincloud.com:8443/ocp4-operator/redhat-operators:v1
    
oc adm catalog mirror \
    registry.ipincloud.com:8443/ocp4-operator/redhat-operators:v1 \
    registry.ipincloud.com:8443/operator

#If oc adm catalog mirror does not complete successfully, it generates a mapping.txt file; based on that file, remove the lines that keep failing and re-run the remainder as follows
oc image mirror -a /data/pull-secret.json -f /data/mapping-ok.txt


oc image mirror quay.io/external_storage/nfs-client-provisioner:latest registry.ipincloud.com:8443/ocp4/openshift4/nfs-client-provisioner:latest

oc image mirror quay.io/external_storage/nfs-client-provisioner:latest registry.ipincloud.com:8443/quay.io/external_storage/nfs-client-provisioner:latest

#Query an image's manifest digest (sha)
curl -v --silent -H "Accept: application/vnd.docker.distribution.manifest.v2+json" -X GET  https://registry.ipincloud.com:8443/v2/ocp4/openshift4/nfs-client-provisioner/manifests/latest 2>&1 | grep Docker-Content-Digest | awk '{print ($3)}'
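The grep/awk pipeline above just plucks the third field of the Docker-Content-Digest header; its behavior can be checked against a canned response header (the sample digest below is illustrative):

```shell
# Simulate verbose curl response headers from the registry and extract the
# digest, mirroring the grep/awk used in the pipeline above.
headers='< HTTP/1.1 200 OK
< Docker-Content-Digest: sha256:022ea0b0d69834b652a4c53655d78642ae23f0324309097be874fb58d09d2919
< Content-Type: application/vnd.docker.distribution.manifest.v2+json'
digest=$(printf '%s\n' "$headers" | grep Docker-Content-Digest | awk '{print ($3)}')
echo "$digest"
```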

#Delete an image by manifest digest
curl -v --silent -H "Accept: application/vnd.docker.distribution.manifest.v2+json" -X DELETE https://registry.ipincloud.com:8443/v2/ocp4/openshift4/nfs-client-provisioner/manifests/sha256:022ea0b0d69834b652a4c53655d78642ae23f0324309097be874fb58d09d2919

#Reclaim registry storage space
podman exec -it  mirror-registry /bin/registry garbage-collect  /etc/docker/registry/config.yml

7. Create the installer configuration files

(1) Create the install directory

rm -rf /data/install
mkdir -p /data/install
cd /data/install

(2) Customize the install-config.yaml file

  • Fill in pullSecret
[root@misc data]# cat /data/pull-secret.json
{"auths":{"cloud.openshift.com":{"auth":"omitted"}}}
  • Add sshKey (the contents of the public key file created in section 3)
cat ~/.ssh/id_rsa.pub
  • additionalTrustBundle (the certificate generated when the mirror registry was created)
[root@misc crts]# cat /etc/crts/ipincloud.com.crt
-----BEGIN CERTIFICATE-----
xxx omitted
-----END CERTIFICATE-----
  • Add a proxy

In production the cluster need not reach the Internet directly; a proxy can be configured for it in install-config.yaml.

For this test, to speed up downloads, I set up a v2ray server on AWS in advance, with the misc server acting as the v2ray client; that setup is described in a separate article.

  • When repeating the installation, if install-config.yaml lives in the install directory, you must rm -rf install, not rm -rf install/*: the latter leaves behind the hidden file .openshift_install_state.json, which can cause: x509: certificate has expired or is not yet valid.

  • In the docs and blog examples, the cidr in install-config.yaml is a 10.x network. Not reading the docs carefully, I took it to be the node network, which produced the most baffling error of the whole exercise: no matches for kind MachineConfig. (clusterNetwork.cidr is the pod network, not the machine network.)

  • The final file contents:

[root@centos75 install]# vi install-config.yaml
apiVersion: v1
baseDomain: ipincloud.com
proxy:
  httpProxy: http://192.168.128.30:8001
  httpsProxy: http://192.168.128.30:8001
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocptest
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '{"auths":{"省略'
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  省略,注意这里要前面空两格
  -----END CERTIFICATE-----
imageContentSources:
- mirrors:
  - registry.ipincloud.com:8443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.ipincloud.com:8443/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

(3) Back up the customized install-config.yaml so it can be reused later

cd /data/install
cp install-config.yaml  ../install-config.yaml.20200205

8. Create the Kubernetes manifests and Ignition config files

(1) Generate the Kubernetes manifests

openshift-install create manifests --dir=/data/install

Note: when specifying the directory containing install-config.yaml, use an absolute path.

(2) Edit manifests/cluster-scheduler-02-config.yml to keep pods off the control plane nodes

Per the official Red Hat installation docs, Kubernetes does not support having the ingress load balancer reach pods on the control plane nodes.

a. Open manifests/cluster-scheduler-02-config.yml
b. Locate the mastersSchedulable parameter and set it to false
c. Save and exit.

vi /data/install/manifests/cluster-scheduler-02-config.yml

(3) Create the Ignition config files

Note: generating the Ignition configs deletes install-config.yaml, so be sure to back that file up first.

openshift-install create ignition-configs --dir=/data/install

(4) Copy the Ignition config files to the HTTP server directory for use during installation

cd /data/install
\cp -f bootstrap.ign /var/www/html/ignition/bootstrap.ign
\cp -f master.ign /var/www/html/ignition/master1.ign
\cp -f master.ign /var/www/html/ignition/master2.ign
\cp -f master.ign /var/www/html/ignition/master3.ign
\cp -f worker.ign /var/www/html/ignition/worker1.ign
\cp -f worker.ign /var/www/html/ignition/worker2.ign

cd /var/www/html/ignition/
chmod 755 *.ign

This completes the required configuration files; the next step is creating the nodes.

9. Customize the RHCOS ISOs

The boot parameters have to be modified at install time and can only be entered by hand; doing that on every machine is tedious and error-prone, so we use genisoimage to build a customized installation image for each machine.

#Install the image-building tools
yum -y install genisoimage libguestfs-tools
systemctl start libvirtd

#Set environment variables
export NGINX_DIRECTORY=/data/pkg
export RHCOSVERSION=4.3.0
export VOLID=$(isoinfo -d -i ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso | awk '/Volume id/ { print $3 }')
#Create a temporary working directory for intermediate files
TEMPDIR=$(mktemp -d)
echo $VOLID
echo $TEMPDIR


cd ${TEMPDIR}
# Extract the ISO contents using guestfish (avoids needing sudo mount)
guestfish -a ${NGINX_DIRECTORY}/rhcos-${RHCOSVERSION}-x86_64-installer.iso \
  -m /dev/sda tar-out / - | tar xvf -

#Define the function that rewrites the boot config files
modify_cfg(){
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    # Add the appropriate image and ignition URLs
    sed -e '/coreos.inst=yes/s|$| coreos.inst.install_dev=sda coreos.inst.image_url='"${URL}"'\/install\/'"${BIOSMODE}"'.raw.gz coreos.inst.ignition_url='"${URL}"'\/ignition\/'"${NODE}"'.ign ip='"${IP}"'::'"${GATEWAY}"':'"${NETMASK}"':'"${FQDN}"':'"${NET_INTERFACE}"':none:'"${DNS}"' nameserver='"${DNS}"'|' ${file} > $(pwd)/${NODE}_${file##*/}
    # Shorten the boot menu timeout
    sed -i -e 's/default vesamenu.c32/default linux/g' -e 's/timeout 600/timeout 10/g' $(pwd)/${NODE}_${file##*/}
  done
}

#Common ISO boot parameters: URL, gateway, netmask, and DNS
URL="http://192.168.128.30:8080"
GATEWAY="192.168.128.254"
NETMASK="255.255.255.0"
DNS="192.168.128.30"

#Variables for the bootstrap node
NODE="bootstrap"
IP="192.168.128.31"
FQDN="bootstrap"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg

#Variables for the master1 node
NODE="master1"
IP="192.168.128.32"
FQDN="master1"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg

#Variables for the master2 node
NODE="master2"
IP="192.168.128.33"
FQDN="master2"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg

#Variables for the master3 node
NODE="master3"
IP="192.168.128.34"
FQDN="master3"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg

#Variables for the worker1 node
NODE="worker1"
IP="192.168.128.35"
FQDN="worker1"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg

#Variables for the worker2 node
NODE="worker2"
IP="192.168.128.36"
FQDN="worker2"
BIOSMODE="bios"
NET_INTERFACE="ens192"
modify_cfg
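For illustration, with the bootstrap variables above, the first sed in modify_cfg extends the coreos.inst=yes boot line into roughly the following kernel arguments (reconstructed from the variables, not captured from a live boot):

```
coreos.inst=yes coreos.inst.install_dev=sda
coreos.inst.image_url=http://192.168.128.30:8080/install/bios.raw.gz
coreos.inst.ignition_url=http://192.168.128.30:8080/ignition/bootstrap.ign
ip=192.168.128.31::192.168.128.254:255.255.255.0:bootstrap:ens192:none:192.168.128.30
nameserver=192.168.128.30
```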


# Build a distinct installation image for each node
# https://github.com/coreos/coreos-assembler/blob/master/src/cmd-buildextend-installer#L97-L103
for node in bootstrap master1 master2 master3 worker1 worker2; do
  # Install that node's customized grub.cfg and isolinux.cfg
  for file in "EFI/redhat/grub.cfg" "isolinux/isolinux.cfg"; do
    /bin/cp -f $(pwd)/${node}_${file##*/} ${file}
  done
  # Build the ISO image
  genisoimage -verbose -rock -J -joliet-long -volset ${VOLID} \
    -eltorito-boot isolinux/isolinux.bin -eltorito-catalog isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    -eltorito-alt-boot -efi-boot images/efiboot.img -no-emul-boot \
    -o ${NGINX_DIRECTORY}/${node}.iso .
done

# Clean up the intermediate files
cd
rm -Rf ${TEMPDIR}

cd ${NGINX_DIRECTORY}

10. Install RHCOS on the node machines

(1) Copy the customized ISO files to the VMware ESXi host, ready to install the nodes

[root@misc pkg]# scp bootstrap.iso [email protected]:/vmfs/volumes/hdd/iso
[root@misc pkg]# scp m*.iso [email protected]:/vmfs/volumes/hdd/iso
[root@misc pkg]# scp w*.iso [email protected]:/vmfs/volumes/hdd/iso

(2) Create the master VMs as planned and set them to boot from the ISO

  • At the boot menu, simply choose install; the system automatically downloads the BIOS image and config files and completes the installation
  • After installation, eject the ISO so the machine does not boot back into the installer
  • Install in the order bootstrap, master1, master2, master3; only after the masters are installed and running should the workers be installed
  • Progress can be watched through the proxy at http://registry.ipincloud.com:9000/
  • Detailed bootstrap progress can be followed on the misc node:
openshift-install --dir=/data/install wait-for bootstrap-complete --log-level debug

Caveats:

  • Make sure each ISO is matched with the correct ignition file
  • During my install, master1 reported etcdmain: member ab84b6a6e4a3cc9a has already been bootstrapped, which cost a lot of time to analyze and resolve. When master1 first finished installing, etcd was installed and registered as a member automatically; after I reinstalled master1 from the ISO, the automatic etcd registration detected that this member already existed in the cluster and could not re-register, so etcd on that node never started. The fix:

Manually edit the etcd yaml on the master1 node, appending --initial-cluster-state=existing to the end of the exec etcd command, then delete the problem pod; the system automatically reinstalls the etcd pod and recovers.
Once it is running normally, revert the change, otherwise machine-config can never complete.

#
[root@master1 /]# vi /etc/kubernetes/manifests/etcd-member.yaml

      exec etcd \
        --initial-advertise-peer-urls=https://${ETCD_IPV4_ADDRESS}:2380 \
        --cert-file=/etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.crt \
        --key-file=/etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.key \
        --trusted-ca-file=/etc/ssl/etcd/ca.crt \
        --client-cert-auth=true \
        --peer-cert-file=/etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.crt \
        --peer-key-file=/etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.key \
        --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
        --peer-client-cert-auth=true \
        --advertise-client-urls=https://${ETCD_IPV4_ADDRESS}:2379 \
        --listen-client-urls=https://0.0.0.0:2379 \
        --listen-peer-urls=https://0.0.0.0:2380 \
        --listen-metrics-urls=https://0.0.0.0:9978 \
        --initial-cluster-state=existing
        
[root@master1 /]# crictl pods
POD ID              CREATED             STATE               NAME                                                     NAMESPACE                                ATTEMPT
c4686dc3e5f4f       38 minutes ago      Ready               etcd-member-master1.ocptest.ipincloud.com                openshift-etcd                           5        
[root@master1 /]# crictl rmp xxx

  • Check whether the installation has completed
    When INFO It is now safe to remove the bootstrap resources appears, the master nodes are installed and the control plane has moved to the master cluster.

[root@misc install]# openshift-install --dir=/data/install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer v4.3.0
DEBUG Built from commit 2055609f95b19322ee6cfdd0bea73399297c4a3e
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocptest.ipincloud.com:6443...
INFO API v1.16.2 up
INFO Waiting up to 30m0s for bootstrapping to complete...
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources
[root@misc install]#

(3) Install the workers

  • At the boot menu, simply choose install; the system automatically downloads the BIOS image and config files and completes the installation
  • After installation, eject the ISO so the machine does not boot back into the installer
  • Progress can be watched through the proxy at http://registry.ipincloud.com:9000/
  • Detailed installation progress can also be followed on the misc node:
[root@misc redhat-operators-manifests]#  openshift-install --dir=/data/install wait-for install-complete --log-level debug
DEBUG OpenShift Installer v4.3.0
DEBUG Built from commit 2055609f95b19322ee6cfdd0bea73399297c4a3e
INFO Waiting up to 30m0s for the cluster at https://api.ocptest.ipincloud.com:6443 to initialize...
DEBUG Cluster is initialized
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Route found in openshift-console namespace: console
DEBUG Route found in openshift-console namespace: downloads
DEBUG OpenShift console route is created
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/data/install/auth/kubeconfig'
INFO Access the OpenShift web-console here:
https://console-openshift-console.apps.ocptest.ipincloud.com
INFO Login to the console with user: kubeadmin, password: pubmD-8Baaq-IX36r-WIWWf


  • The workers' join requests (CSRs) must be approved

View the pending CSRs:

[root@misc ~]# oc get csr
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-7lln5   70m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-d48xk   69m   system:node:master1.ocptest.ipincloud.com                                   Approved,Issued
csr-f2g7r   69m   system:node:master2.ocptest.ipincloud.com                                   Approved,Issued
csr-gbn2n   69m   system:node:master3.ocptest.ipincloud.com                                   Approved,Issued
csr-hwxwx   13m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-ppgxx   13m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-wg874   70m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-zkp79   70m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
[root@misc ~]#

Approve them:

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
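The jq filter above selects only CSRs whose status block is still empty (i.e. neither approved nor denied). Its logic can be demonstrated on a canned two-item sample of `oc get csr -ojson` output (the sample names are illustrative):

```shell
# Demonstrate the jq filter used above: only the CSR with an empty
# status object should be selected for approval.
sample=$(mktemp)
cat <<'EOF' > "$sample"
{"items":[
  {"metadata":{"name":"csr-pending"},"status":{}},
  {"metadata":{"name":"csr-approved"},"status":{"conditions":[{"type":"Approved"}]}}
]}
EOF
pending=$(jq -r '.items[] | select(.status == {}) | .metadata.name' "$sample")
echo "$pending"
rm -f "$sample"
```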

(4) Start NFS on misc

bash /data/pkg/ocp4-upi-helpernode/files/nfs-provisioner-setup.sh
#Check status
oc get pods -n nfs-provisioner

(5) Use NFS as storage for the internal OCP registry

oc patch configs.imageregistry.operator.openshift.io cluster -p '{"spec":{"storage":{"pvc":{"claim":""}}}}' --type=merge

oc get clusteroperator image-registry

11. Configure login

(1) Create a regular administrator account

#Create the admin password file on the misc machine
mkdir -p ~/auth
htpasswd -bBc ~/auth/admin-passwd admin scwang18
#Copy it to the local machine
mkdir -p ~/auth
scp -P 20030 [email protected]:/root/auth/admin-passwd  ~/auth/
#On the OAuth Details page, add an HTPasswd-type Identity Provider and upload the admin-passwd file
https://console-openshift-console.apps.ocptest.ipincloud.com
#Grant the new admin user cluster administrator rights
oc adm policy add-cluster-role-to-user cluster-admin admin
