[K8s Study Notes 1] Offline installation, deployment, and test use of k8s + Harbor on RHEL 7

1. Introduction

The rise and adoption of Kubernetes has not only accelerated the development of containers but also fueled the boom in cloud-native technology. The financial industry, likewise, is gradually bringing more and more systems onto the cloud. To make our bank's future move to the cloud easier and to get up to speed faster, I have started studying and experimenting with k8s and cloud-native technology. Hands-on practice is the fastest way to master new material, so my study of k8s starts from installation and deployment: build a Kubernetes cluster of my own, then dig into it question by question, and later work through books for a more systematic understanding. I hope I can keep it up. Below, a k8s cluster is built offline on RHEL 7.

Environment:

IP              Hostname          Role
172.16.131.83   k8s-master        master (management) node
172.16.131.84   k8s-node1         worker node 1
172.16.131.85   k8s-node2         worker node 2
172.16.131.86   k8s-node3         worker node 3
172.16.131.87   registry-harbor   image registry
172.16.131.88   k8s-zhongzhuan    Internet-connected staging host

2. Pre-deployment preparation

Run on the k8s cluster nodes:

1) Add the host entries on the k8s nodes:

cp /etc/hosts /etc/hosts_`date +%y%m%d`
echo "
172.16.131.83 k8s-master
172.16.131.84 k8s-node1
172.16.131.85 k8s-node2
172.16.131.86 k8s-node3
172.16.131.87 registry-harbor
" >> /etc/hosts

2) Configure kernel parameters:

echo "fs.file-max = 6815744
kernel.sem = 10000  10240000 10000 1024
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 751619276800
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.wmem_default = 16777216
fs.aio-max-nr = 6194304
vm.dirty_ratio=20
vm.dirty_background_ratio=3
vm.dirty_writeback_centisecs=100
vm.dirty_expire_centisecs=500
vm.min_free_kbytes=524288
net.core.netdev_max_backlog = 30000
net.core.netdev_budget = 600
#vm.nr_hugepages = 
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.ipfrag_time = 60
net.ipv4.ipfrag_low_thresh = 6291456
net.ipv4.ipfrag_high_thresh = 8388608
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0">> /etc/sysctl.conf && sysctl -p

3) Configure user resource limits:

cp /etc/security/limits.conf /etc/security/limits_`date +"%Y%m%d_%H%M%S"`.conf
echo "
* soft    nproc   655350
* hard    nproc   655350
* soft    nofile  655360
* hard    nofile  655360

* soft    stack  102400
* hard    stack  327680
* soft memlock -1
* hard memlock -1" >>/etc/security/limits.conf

4) Disable the firewall:

systemctl stop firewalld
systemctl disable firewalld

5) Disable SELinux:

setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

6) Disable transparent huge pages:

[ -f /sys/kernel/mm/transparent_hugepage/enabled ] &&  echo never > /sys/kernel/mm/transparent_hugepage/enabled
 [ -f /sys/kernel/mm/redhat_transparent_hugepage/enabled ] && echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
 grep transparent_hugepage /etc/rc.d/rc.local 1>/dev/null || echo '[ -f /sys/kernel/mm/transparent_hugepage/enabled ] &&  echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
 grep redhat_transparent_hugepage /etc/rc.d/rc.local 1>/dev/null || echo '[ -f /sys/kernel/mm/redhat_transparent_hugepage/enabled ] && echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled' >> /etc/rc.local
 [ -x /etc/rc.d/rc.local ] || chmod +x /etc/rc.d/rc.local

7) Disable swap:

swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab
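
A quick check that swap is really gone on each node (swapon -s should print nothing and the Swap line in free should show 0):

swapon -s
free -m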

8) Set up passwordless SSH between the nodes (the full sshUserSetup.sh is in the appendix):

sh sshUserSetup.sh -user root -hosts "k8s-master k8s-node1 k8s-node2 k8s-node3"

9) Synchronize the clocks (the other nodes sync to the master):

On the master:
vi /etc/ntp.conf

#server 0.rhel.pool.ntp.org iburst
#server 1.rhel.pool.ntp.org iburst
#server 2.rhel.pool.ntp.org iburst
#server 3.rhel.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10
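
After editing ntp.conf, restart and enable ntpd on the master and confirm it is serving time. This assumes the ntp package is already installed from the local media:

systemctl restart ntpd && systemctl enable ntpd
ntpq -p    # the local clock source 127.127.1.0 should appear in the list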


On the other nodes:
crontab -e
*/2 * * * * /usr/sbin/ntpdate 172.16.131.83
date && ssh k8s-node1 date && ssh k8s-node2 date && ssh k8s-node3 date

3. Installing Docker and cri-dockerd

Since Kubernetes 1.24 the built-in dockershim has been removed, so Docker on its own can no longer serve as the container runtime; during cluster initialization this shows up as failures to pull images from the private registry. There are two ways around it: option one, deploy cri-dockerd alongside Docker and keep Docker as the runtime; option two, use containerd as the runtime. Here we take option one.

On the Internet-connected staging host:

1) Install the required packages (the local repository is sufficient):

yum install -y yum-utils device-mapper-persistent-data lvm2 wget

2) Set up EPEL (requires a CentOS 7 repository)

Fetch the Aliyun CentOS 7 repo file:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

3) Edit CentOS-Base.repo and replace every occurrence of $releasever with 7 (run the substitution inside vi):

vi /etc/yum.repos.d/CentOS-Base.repo
%s/$releasever/7/g 

4) Clean and rebuild the yum cache:

yum clean all&& yum makecache fast

5) Install epel-release.noarch:

yum install -y epel-release.noarch

6) Add the Docker repository:

yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
or
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

7) Enable the repository:

yum-config-manager --enable docker-ce-nightly
(List the installable Docker versions: yum list docker-ce --showduplicates | sort -r)

Note: if listing the available Docker versions fails with an error similar to the following:

https://mirrors.aliyun.com/docker-ce/linux/centos/7Server/x86_64/stable/repodata/7cc100684a6630e5382cf07c92483acecdff60eb94243af9acb95654c2913d70-primary.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.

the cause is that $releasever in the repository configuration cannot be resolved; apply the same substitution (inside vi):

vi /etc/yum.repos.d/docker-ce.repo
%s/$releasever/7/g 

8) Clean and rebuild the yum cache again:

yum clean all&& yum makecache fast

9) Download the RPMs for the chosen Docker version:

mkdir -p /app/soft/docker
cd /app/soft/docker
yumdownloader --resolve docker-ce-23.0.1

10) Create the offline bundle:

cd /app/soft
tar -cvzf docker_v23.0.1_offline_pkg.tar.gz docker

11) Copy docker_v23.0.1_offline_pkg.tar.gz to the offline machines:

scp -rp docker_v23.0.1_offline_pkg.tar.gz 172.16.131.83:/app/soft/
scp -rp docker_v23.0.1_offline_pkg.tar.gz 172.16.131.84:/app/soft/
scp -rp docker_v23.0.1_offline_pkg.tar.gz 172.16.131.85:/app/soft/
scp -rp docker_v23.0.1_offline_pkg.tar.gz 172.16.131.86:/app/soft/
scp -rp docker_v23.0.1_offline_pkg.tar.gz 172.16.131.87:/app/soft/

12) Download the cri-dockerd binary package
Download page:

https://github.com/Mirantis/cri-dockerd/releases/

Choose the binary archive:

cri-dockerd-0.3.1.amd64.tgz

On the k8s cluster nodes:

1) Extract the offline packages:

tar -xvzf docker_v23.0.1_offline_pkg.tar.gz -C /app/soft/
tar -xvzf cri-dockerd-0.3.1.amd64.tgz -C /app/soft/

2) Install Docker:

cd /app/soft/docker
yum install *.rpm

3) Start and enable Docker:

systemctl start docker && systemctl enable docker
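
A quick sanity check that the daemon is running and reports the expected version:

systemctl is-active docker
docker version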

4) Install cri-dockerd: extract the archive (skip this if it was already extracted in step 1):

tar -xvzf cri-dockerd-0.3.1.amd64.tgz -C /app/soft

5) Copy the binary to /usr/bin and make it executable:

cd cri-dockerd
cp cri-dockerd /usr/bin/
chmod +x /usr/bin/cri-dockerd 

6) Create the cri-dockerd systemd service unit:

cat <<"EOF" > /usr/lib/systemd/system/cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify

ExecStart=/usr/bin/cri-dockerd --network-plugin=cni --pod-infra-container-image=172.16.131.87:1088/kubernetes-deploy/pause:3.7

ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

StartLimitBurst=3

StartLimitInterval=60s

LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

EOF

Note: the --pod-infra-container-image option must be added to the service unit. Without it, the pause image would later be pulled from the default k8s.gcr.io/pause:3.7 location during the Kubernetes installation, which is unreachable offline. With the option set, the image is pulled from our own registry instead: --pod-infra-container-image=172.16.131.87:1088/kubernetes-deploy/pause:3.7
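
A simple way to confirm the flag actually made it into the unit file before reloading systemd:

grep pod-infra-container-image /usr/lib/systemd/system/cri-docker.service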

7) Create the matching socket unit:

cat <<"EOF" > /usr/lib/systemd/system/cri-docker.socket
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target

EOF

8) Start the cri-dockerd runtime:

systemctl daemon-reload
systemctl start cri-docker
systemctl enable cri-docker
systemctl status cri-docker
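
If everything came up cleanly, the socket should exist and both units should report active; a quick check:

systemctl is-active cri-docker.socket cri-docker.service
ls -l /var/run/cri-dockerd.sock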

4. Installing the private registry Harbor

On the Internet-connected staging host:

1) Download and configure the EPEL repo file:

wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

2) Download docker-compose

List the available versions:
yum list docker-compose --showduplicates | sort -r
Create a working directory:
mkdir -p /app/soft/docker-compose
cd /app/soft/docker-compose
Download the chosen version:
yumdownloader --resolve docker-compose-1.18.0

3) Bundle the docker-compose packages:

cd /app/soft
tar -cvzf docker-compose_offline_pkg_v1.18.0.tar.gz docker-compose

4) Copy docker-compose_offline_pkg_v1.18.0.tar.gz to the offline machines:

scp -rp docker-compose_offline_pkg_v1.18.0.tar.gz 172.16.131.83:/app/soft/
scp -rp docker-compose_offline_pkg_v1.18.0.tar.gz 172.16.131.84:/app/soft/
scp -rp docker-compose_offline_pkg_v1.18.0.tar.gz 172.16.131.85:/app/soft/
scp -rp docker-compose_offline_pkg_v1.18.0.tar.gz 172.16.131.86:/app/soft/
scp -rp docker-compose_offline_pkg_v1.18.0.tar.gz 172.16.131.87:/app/soft/

5) On the offline machines, extract the bundle:

tar -xvzf docker-compose_offline_pkg_v1.18.0.tar.gz -C /app/soft/

6) On the offline machines, install docker-compose:

cd /app/soft/docker-compose
yum install *.rpm
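
Verify the installation; the Harbor installer calls docker-compose, so this must work before proceeding:

docker-compose --version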

7) Download the Harbor offline installer (on the staging host)

curl -LO https://github.com/goharbor/harbor/releases/download/v2.7.1/harbor-offline-installer-v2.7.1.tgz
or download it manually from GitHub and upload it.

8) Copy the offline installer to the registry-harbor host and extract it:

scp -rp /app/soft/harbor-offline-installer-v2.7.1.tgz 172.16.131.87:/app/soft/
tar -xvzf /app/soft/harbor-offline-installer-v2.7.1.tgz -C /app/

9) Edit the configuration file as needed:
cp harbor.yml.tmpl harbor.yml
vi harbor.yml
The main changes are:

  • hostname
  • port
  • comment out the https section and everything related to it
  • harbor_admin_password
  • the password under the database section
  • data_volume
  • the location under the log section
  • Adjust anything else to your environment. The resulting harbor.yml is shown below:
# Configuration file of Harbor

# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: 172.16.131.87

# http related config
http:
  # port for http, default is 80. If https enabled, this port will redirect to https port
  port: 1088

# https related config
#https:
#  # https port for harbor, default is 443
#  port: 443
#  # The path of cert and key files for nginx
#  certificate: /your/certificate/path
#  private_key: /your/private/key/path

# # Uncomment following will enable tls communication between all harbor components
# internal_tls:
#   # set enabled to true means internal tls is enabled
#   enabled: true
#   # put your cert and key files on dir
#   dir: /etc/harbor/tls/internal

# Uncomment external_url if you want to enable external proxy
# And when it enabled the hostname will no longer used
# external_url: https://reg.mydomain.com:8433

# The initial password of Harbor admin
# It only works in first time to install harbor
# Remember Change the admin password from UI after launching Harbor.
harbor_admin_password: Harbor@1234

# Harbor DB configuration
database:
  # The password for the root user of Harbor DB. Change this before any production use.
  password: Harbor@1234
  # The maximum number of connections in the idle connection pool. If it <=0, no idle connections are retained.
  max_idle_conns: 100
  # The maximum number of open connections to the database. If it <= 0, then there is no limit on the number of open connections.
  # Note: the default number of connections is 1024 for postgres of harbor.
  max_open_conns: 900
  # The maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's age.
  # The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
  conn_max_lifetime: 5m
  # The maximum amount of time a connection may be idle. Expired connections may be closed lazily before reuse. If it <= 0, connections are not closed due to a connection's idle time.
  # The value is a duration string. A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
  conn_max_idle_time: 0

# The default data volume
data_volume: /app/data

# Harbor Storage settings by default is using /data dir on local filesystem
# Uncomment storage_service setting If you want to using external storage
# storage_service:
#   # ca_bundle is the path to the custom root ca certificate, which will be injected into the truststore
#   # of registry's and chart repository's containers.  This is usually needed when the user hosts a internal storage with self signed certificate.
#   ca_bundle:

#   # storage backend, default is filesystem, options include filesystem, azure, gcs, s3, swift and oss
#   # for more info about this configuration please refer https://docs.docker.com/registry/configuration/
#   filesystem:
#     maxthreads: 100
#   # set disable to true when you want to disable registry redirect
#   redirect:
#     disabled: false

# Trivy configuration
#
# Trivy DB contains vulnerability information from NVD, Red Hat, and many other upstream vulnerability databases.
# It is downloaded by Trivy from the GitHub release page https://github.com/aquasecurity/trivy-db/releases and cached
# in the local file system. In addition, the database contains the update timestamp so Trivy can detect whether it
# should download a newer version from the Internet or use the cached one. Currently, the database is updated every
# 12 hours and published as a new release to GitHub.
trivy:
  # ignoreUnfixed The flag to display only fixed vulnerabilities
  ignore_unfixed: false
  # skipUpdate The flag to enable or disable Trivy DB downloads from GitHub
  #
  # You might want to enable this flag in test or CI/CD environments to avoid GitHub rate limiting issues.
  # If the flag is enabled you have to download the `trivy-offline.tar.gz` archive manually, extract `trivy.db` and
  # `metadata.json` files and mount them in the `/home/scanner/.cache/trivy/db` path.
  skip_update: false
  #
  # The offline_scan option prevents Trivy from sending API requests to identify dependencies.
  # Scanning JAR files and pom.xml may require Internet access for better detection, but this option tries to avoid it.
  # For example, the offline mode will not try to resolve transitive dependencies in pom.xml when the dependency doesn't
  # exist in the local repositories. It means a number of detected vulnerabilities might be fewer in offline mode.
  # It would work if all the dependencies are in local.
  # This option doesn’t affect DB download. You need to specify "skip-update" as well as "offline-scan" in an air-gapped environment.
  offline_scan: false
  #
  # Comma-separated list of what security issues to detect. Possible values are `vuln`, `config` and `secret`. Defaults to `vuln`.
  security_check: vuln
  #
  # insecure The flag to skip verifying registry certificate
  insecure: false
  # github_token The GitHub access token to download Trivy DB
  #
  # Anonymous downloads from GitHub are subject to the limit of 60 requests per hour. Normally such rate limit is enough
  # for production operations. If, for any reason, it's not enough, you could increase the rate limit to 5000
  # requests per hour by specifying the GitHub access token. For more details on GitHub rate limiting please consult
  # https://developer.github.com/v3/#rate-limiting
  #
  # You can create a GitHub token by following the instructions in
  # https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line
  #
  # github_token: xxx

jobservice:
  # Maximum number of job workers in job service
  max_job_workers: 10

notification:
  # Maximum retry count for webhook job
  webhook_job_max_retry: 10

chart:
  # Change the value of absolute_url to enabled can enable absolute url in chart
  absolute_url: disabled

# Log configurations
log:
  # options are debug, info, warning, error, fatal
  level: info
  # configs for logs in local storage
  local:
    # Log files are rotated log_rotate_count times before being removed. If count is 0, old versions are removed rather than rotated.
    rotate_count: 50
    # Log files are rotated only if they grow bigger than log_rotate_size bytes. If size is followed by k, the size is assumed to be in kilobytes.
    # If the M is used, the size is in megabytes, and if G is used, the size is in gigabytes. So size 100, size 100k, size 100M and size 100G
    # are all valid.
    rotate_size: 200M
    # The directory on your host that store log
    location: /app/harbor/log

  # Uncomment following lines to enable external syslog endpoint.
  # external_endpoint:
  #   # protocol used to transmit log to external endpoint, options is tcp or udp
  #   protocol: tcp
  #   # The host of external endpoint
  #   host: localhost
  #   # Port of external endpoint
  #   port: 5140

#This attribute is for migrator to detect the version of the .cfg file, DO NOT MODIFY!
_version: 2.7.0

# Uncomment external_database if using external database.
# external_database:
#   harbor:
#     host: harbor_db_host
#     port: harbor_db_port
#     db_name: harbor_db_name
#     username: harbor_db_username
#     password: harbor_db_password
#     ssl_mode: disable
#     max_idle_conns: 2
#     max_open_conns: 0
#   notary_signer:
#     host: notary_signer_db_host
#     port: notary_signer_db_port
#     db_name: notary_signer_db_name
#     username: notary_signer_db_username
#     password: notary_signer_db_password
#     ssl_mode: disable
#   notary_server:
#     host: notary_server_db_host
#     port: notary_server_db_port
#     db_name: notary_server_db_name
#     username: notary_server_db_username
#     password: notary_server_db_password
#     ssl_mode: disable

# Uncomment external_redis if using external Redis server
# external_redis:
#   # support redis, redis+sentinel
#   # host for redis: :
#   # host for redis+sentinel:
#   #  :,:,:
#   host: redis:6379
#   password: 
#   # sentinel_master_set must be set to support redis+sentinel
#   #sentinel_master_set:
#   # db_index 0 is for core, it's unchangeable
#   registry_db_index: 1
#   jobservice_db_index: 2
#   chartmuseum_db_index: 3
#   trivy_db_index: 5
#   idle_timeout_seconds: 30

# Uncomment uaa for trusting the certificate of uaa instance that is hosted via self-signed cert.
# uaa:
#   ca_file: /path/to/ca

# Global proxy
# Config http proxy for components, e.g. http://my.proxy.com:3128
# Components doesn't need to connect to each others via http proxy.
# Remove component from `components` array if want disable proxy
# for it. If you want use proxy for replication, MUST enable proxy
# for core and jobservice, and set `http_proxy` and `https_proxy`.
# Add domain to the `no_proxy` field, when you want disable proxy
# for some special registry.
proxy:
  http_proxy:
  https_proxy:
  no_proxy:
  components:
    - core
    - jobservice
    - trivy

# metric:
#   enabled: false
#   port: 9090
#   path: /metrics

# Trace related config
# only can enable one trace provider(jaeger or otel) at the same time,
# and when using jaeger as provider, can only enable it with agent mode or collector mode.
# if using jaeger collector mode, uncomment endpoint and uncomment username, password if needed
# if using jaeger agetn mode uncomment agent_host and agent_port
# trace:
#   enabled: true
#   # set sample_rate to 1 if you wanna sampling 100% of trace data; set 0.5 if you wanna sampling 50% of trace data, and so forth
#   sample_rate: 1
#   # # namespace used to differenciate different harbor services
#   # namespace:
#   # # attributes is a key value dict contains user defined attributes used to initialize trace provider
#   # attributes:
#   #   application: harbor
#   # # jaeger should be 1.26 or newer.
#   # jaeger:
#   #   endpoint: http://hostname:14268/api/traces
#   #   username:
#   #   password:
#   #   agent_host: hostname
#   #   # export trace data by jaeger.thrift in compact mode
#   #   agent_port: 6831
#   # otel:
#   #   endpoint: hostname:4318
#   #   url_path: /v1/traces
#   #   compression: false
#   #   insecure: true
#   #   timeout: 10s

# enable purge _upload directories
upload_purging:
  enabled: true
  # remove files in _upload directories which exist for a period of time, default is one week.
  age: 168h
  # the interval of the purge operations
  interval: 24h
  dryrun: false

# cache layer configurations
# If this feature enabled, harbor will cache the resource
# `project/project_metadata/repository/artifact/manifest` in the redis
# which can especially help to improve the performance of high concurrent
# manifest pulling.
# NOTICE
# If you are deploying Harbor in HA mode, make sure that all the harbor
# instances have the same behaviour, all with caching enabled or disabled,
# otherwise it can lead to potential data inconsistency.
cache:
  # not enabled by default
  enabled: false
  # keep cache for one day by default
  expire_hours: 24

10) Install Harbor:

cd /app/harbor
./install.sh
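
When install.sh finishes, Harbor runs as a set of containers managed by docker-compose. A couple of quick checks that the registry answers on the configured port (the path and port follow the harbor.yml above):

cd /app/harbor && docker-compose ps      # all services should be Up/healthy
curl -I http://172.16.131.87:1088        # the UI/registry endpoint should return HTTP 200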

11) On every server, point Docker at the internal Harbor registry and switch Docker's cgroup driver to systemd, then restart Docker:

cat > /etc/docker/daemon.json <<EOF
{
  "insecure-registries": ["172.16.131.87:1088"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload && systemctl restart docker

5. Installing and deploying Kubernetes with kubeadm

On the Internet-connected staging host:

1) Configure the Kubernetes yum repository:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

2) Rebuild the yum cache:

yum clean all && yum makecache

3) List the available kubelet, kubeadm, and kubectl versions:

yum list  kubelet --showduplicates | sort -r
yum list  kubeadm --showduplicates | sort -r
yum list  kubectl --showduplicates | sort -r

4) Download the kubeadm, kubelet, and kubectl RPMs:

yumdownloader kubelet-1.25.6 --resolve --destdir=/app/soft/kubernetes/kubelet
yumdownloader kubeadm-1.25.6 --resolve --destdir=/app/soft/kubernetes/kubeadm
yumdownloader kubectl-1.25.6 --resolve --destdir=/app/soft/kubernetes/kubectl

5) yumdownloader --resolve pulls the newest kubectl and kubelet (1.26.3 at the time) into the kubeadm folder as dependencies of kubeadm; move those RPMs out and package the remaining installation files:

cd /app/soft/kubernetes/kubeadm/
mv *kubectl-1.26.3*.rpm *kubelet-1.26*.rpm ../../

tar -cvzf kubeadm_1.25.6_offline_install_pkg.tar.gz /app/soft/kubernetes

6) Copy the bundle to all offline nodes:

scp -rp /app/soft/kubeadm_1.25.6_offline_install_pkg.tar.gz 172.16.131.83:/app/soft/
scp -rp /app/soft/kubeadm_1.25.6_offline_install_pkg.tar.gz 172.16.131.84:/app/soft/
scp -rp /app/soft/kubeadm_1.25.6_offline_install_pkg.tar.gz 172.16.131.85:/app/soft/
scp -rp /app/soft/kubeadm_1.25.6_offline_install_pkg.tar.gz 172.16.131.86:/app/soft/

On the Kubernetes cluster nodes:

1) On every node, extract and install kubelet, kubectl, and kubeadm:

tar -xvzf /app/soft/kubeadm_1.25.6_offline_install_pkg.tar.gz -C /app/
cd /app/kubernetes/kubelet/
yum install -y *.rpm
cd /app/kubernetes/kubectl/
yum install -y *.rpm
cd /app/kubernetes/kubeadm/
yum install -y *.rpm
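
Confirm that all three components are installed at the pinned version:

kubeadm version -o short
kubectl version --client
kubelet --version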

2) Start and enable the kubelet service (until kubeadm init supplies its configuration, kubelet will keep restarting; this is expected):

systemctl start kubelet && systemctl enable kubelet && systemctl status kubelet

3) List the images required for this Kubernetes version:

kubeadm config images list --kubernetes-version=v1.25.6 

registry.k8s.io/kube-apiserver:v1.25.6
registry.k8s.io/kube-controller-manager:v1.25.6
registry.k8s.io/kube-scheduler:v1.25.6
registry.k8s.io/kube-proxy:v1.25.6
registry.k8s.io/pause:3.7
registry.k8s.io/etcd:3.5.6-0
registry.k8s.io/coredns/coredns:v1.8.6

On the Internet-connected staging host:

1) Pull the k8s images (using the dyrnq mirrors of the registry.k8s.io images, plus the extra images needed later):

mkdir -p /app/soft/k8s_images
cd /app/soft/k8s_images


docker pull dyrnq/kube-apiserver:v1.25.6
docker pull dyrnq/kube-controller-manager:v1.25.6
docker pull dyrnq/kube-scheduler:v1.25.6
docker pull dyrnq/kube-proxy:v1.25.6
docker pull dyrnq/pause:3.7
docker pull dyrnq/etcd:3.5.6-0
docker pull dyrnq/coredns:v1.8.6
docker pull registry:latest
docker pull quay.io/coreos/flannel:v0.15.1
docker pull flannel/flannel-cni-plugin:v1.1.2
docker pull nginx:latest

2) Save the images to tar files:

docker save dyrnq/kube-apiserver:v1.25.6 -o kube-apiserver_v1.25.6.tar
docker save dyrnq/kube-controller-manager:v1.25.6 -o kube-controller-manager_v1.25.6.tar
docker save dyrnq/kube-scheduler:v1.25.6 -o kube-scheduler_v1.25.6.tar
docker save dyrnq/kube-proxy:v1.25.6 -o kube-proxy_v1.25.6.tar
docker save dyrnq/pause:3.7 -o pause_v1.25.6.tar
docker save dyrnq/etcd:3.5.6-0 -o etcd_v1.25.6.tar
docker save dyrnq/coredns:v1.8.6 -o coredns_v1.25.6.tar
docker save registry:latest -o registry_latest.tar
docker save quay.io/coreos/flannel:v0.15.1 -o flannel_v0.15.1.tar
docker save flannel/flannel-cni-plugin:v1.1.2 -o flannel-cni-plugin_v1.1.2.tar
docker save nginx:latest -o nginx_latest.tar

3) Compress the saved images and copy the archive to the k8s master node:

tar -cvzf /app/soft/k8s_images.tar.gz /app/soft/k8s_images
scp -rp /app/soft/k8s_images.tar.gz 172.16.131.83:/app/soft

On the k8s master node:

1) Extract the images:

tar -xvzf /app/soft/k8s_images.tar.gz -C /app/soft

2) Load the images:

cd k8s_images

for i in `ls`
do
  docker load -i $i
done

3) Retag the images for the private registry (the first command prints the docker tag commands to run):

docker images|awk '{print "docker tag " $1 ":" $2 " 172.16.131.87:1088/kubernetes-deploy/" $1 ":" $2}'|sed 1d

docker tag dyrnq/kube-apiserver:v1.25.6 172.16.131.87:1088/kubernetes-deploy/kube-apiserver:v1.25.6 
docker tag dyrnq/kube-controller-manager:v1.25.6 172.16.131.87:1088/kubernetes-deploy/kube-controller-manager:v1.25.6
docker tag dyrnq/kube-scheduler:v1.25.6 172.16.131.87:1088/kubernetes-deploy/kube-scheduler:v1.25.6
docker tag dyrnq/kube-proxy:v1.25.6 172.16.131.87:1088/kubernetes-deploy/kube-proxy:v1.25.6 
docker tag dyrnq/pause:3.7 172.16.131.87:1088/kubernetes-deploy/pause:3.7
docker tag dyrnq/etcd:3.5.6-0 172.16.131.87:1088/kubernetes-deploy/etcd:3.5.6-0 
docker tag dyrnq/coredns:v1.8.6 172.16.131.87:1088/kubernetes-deploy/coredns:v1.8.6
docker tag registry:latest 172.16.131.87:1088/kubernetes-deploy/registry:latest
docker tag quay.io/coreos/flannel:v0.15.1 172.16.131.87:1088/kubernetes-deploy/flannel:v0.15.1 
docker tag flannel/flannel-cni-plugin:v1.1.2 172.16.131.87:1088/kubernetes-deploy/flannel-cni-plugin:v1.1.2
docker tag nginx:latest 172.16.131.87:1088/kubernetes-deploy/nginx:latest

4) Log in to the private registry on every node, then on the master push the retagged images to the repository (the docker images | awk command prints the push commands):

docker login 172.16.131.87:1088
Username: admin
Password: 
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded


docker images|grep "172.16.131.87"|awk '{print "docker push " $1 ":" $2}'

docker push 172.16.131.87:1088/kubernetes-deploy/kube-apiserver:v1.25.6 
docker push 172.16.131.87:1088/kubernetes-deploy/kube-controller-manager:v1.25.6
docker push 172.16.131.87:1088/kubernetes-deploy/kube-scheduler:v1.25.6
docker push 172.16.131.87:1088/kubernetes-deploy/kube-proxy:v1.25.6 
docker push 172.16.131.87:1088/kubernetes-deploy/pause:3.7
docker push 172.16.131.87:1088/kubernetes-deploy/etcd:3.5.6-0 
docker push 172.16.131.87:1088/kubernetes-deploy/coredns:v1.8.6
docker push 172.16.131.87:1088/kubernetes-deploy/registry:latest
docker push 172.16.131.87:1088/kubernetes-deploy/flannel:v0.15.1 
docker push 172.16.131.87:1088/kubernetes-deploy/flannel-cni-plugin:v1.1.2
docker push 172.16.131.87:1088/kubernetes-deploy/nginx:latest
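
Optionally confirm the pushes through the Harbor API. A sketch, assuming the admin account and the kubernetes-deploy project created above; the same list is also visible in the Harbor web UI:

curl -u admin:Harbor@1234 "http://172.16.131.87:1088/api/v2.0/projects/kubernetes-deploy/repositories"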

5) Initialize the Kubernetes cluster on the master node

Generate the default init configuration on the master:

kubeadm config print init-defaults > kubeadm.yaml

Edit the configuration as follows:

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.16.131.83
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: 172.16.131.87:1088/kubernetes-deploy
kind: ClusterConfiguration
kubernetesVersion: 1.25.6
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
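
Because the CRI socket and the private imageRepository are both declared in kubeadm.yaml, the image pull can be tested on its own before running init (the init output below also mentions this command):

kubeadm config images pull --config kubeadm.yaml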

6) Adjust the containerd CRI configuration

vi /etc/containerd/config.toml

Remove "cri" from the disabled_plugins = ["cri"] line (or comment the line out) so that the CRI plugin is no longer disabled.

7) Initialize Kubernetes:

kubeadm init --config=kubeadm.yaml


[init] Using Kubernetes version: v1.25.6
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 23.0.1. Latest validated version: 18.09
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.1.0.1 172.16.131.83]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [172.16.131.83 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [172.16.131.83 127.0.0.1 ::1]
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 17.002750 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node k8s-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: f93xna.7kr79tn4z6fmzf23
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.16.131.83:6443 --token f93xna.7kr79tn4z6fmzf23 \
    --discovery-token-ca-cert-hash sha256:40dba9e45ffce1c08d415c44a962974f9081c1ae9e74e922a2410d4e0ebac590

8) Configure kubectl access as instructed by the init output:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Note:
To re-initialize the cluster, run a reset first. The --cri-socket flag is mandatory here, otherwise the command reports an error:

kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock

9) Configure the flannel (or Calico) network so that containers on different hosts can talk to each other.

On the Internet-connected staging host, download the flannel manifest (use the raw file rather than the GitHub HTML page):

wget https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

10) Edit kube-flannel.yml so that every image points at the private registry:

---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    k8s-app: flannel
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - networking.k8s.io
  resources:
  - clustercidrs
  verbs:
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: flannel
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    k8s-app: flannel
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
    k8s-app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: 172.16.131.87:1088/kubernetes-deploy/flannel-cni-plugin:v1.1.2
       #image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.2
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: 172.16.131.87:1088/kubernetes-deploy/flannel:v0.15.1
       #image: docker.io/rancher/mirrored-flannelcni-flannel:v0.21.4
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: 172.16.131.87:1088/kubernetes-deploy/flannel:v0.15.1
       #image: docker.io/rancher/mirrored-flannelcni-flannel:v0.21.4
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
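
Before applying the manifest, it is worth confirming that every active image reference now points at the internal registry:

grep "image:" kube-flannel.yml    # the uncommented image: lines should all start with 172.16.131.87:1088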

11) Apply the flannel network on the master:

kubectl apply -f /apps/flannel/kube-flannel.yml
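
The flannel DaemonSet should come up on every node within a minute or two; a quick check:

kubectl -n kube-flannel get pods -o wide
kubectl get nodes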

12) Run the join command printed by kubeadm init on each of the other nodes to add them to the cluster (append --cri-socket unix:///var/run/cri-dockerd.sock, otherwise the join fails):

kubeadm join 172.16.131.83:6443 --token f93xna.7kr79tn4z6fmzf23 --discovery-token-ca-cert-hash sha256:40dba9e45ffce1c08d415c44a962974f9081c1ae9e74e922a2410d4e0ebac590 --cri-socket unix:///var/run/cri-dockerd.sock

Alternatively, generate a fresh join command on the master and run it on the other nodes, again appending the --cri-socket flag:

kubeadm token create --print-join-command

kubeadm join 172.16.131.83:6443 --token r7oaex.qgqvdqvlyuubt5aw     --discovery-token-ca-cert-hash sha256:40dba9e45ffce1c08d415c44a962974f9081c1ae9e74e922a2410d4e0ebac590 --cri-socket unix:///var/run/cri-dockerd.sock

13) Output like the following on a node means it has joined the cluster successfully; the same procedure applies whenever a new node needs to join the Kubernetes cluster:

[preflight] Running pre-flight checks
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 23.0.1. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

14) Check the cluster status:

kubectl get nodes

NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   1h    v1.25.6
k8s-node1    Ready    <none>   2h    v1.25.6
k8s-node2    Ready    <none>   1h    v1.25.6
k8s-node3    Ready    <none>   1h    v1.25.6

Note:
After finishing the deployment, I found that the worker nodes stayed in the NotReady state for a long time:

NAME         STATUS     ROLES    AGE   VERSION
k8s-master   Ready      master   1h    v1.25.6
k8s-node1    NotReady   <none>   2h    v1.25.6
k8s-node2    NotReady   <none>   1h    v1.25.6
k8s-node3    NotReady   <none>   1h    v1.25.6

This means the Kubernetes cluster is not healthy and needs troubleshooting. The following command, run on a k8s node, tails the kubelet error log and makes the problem easier to track down:

journalctl -u kubelet -f

The log showed two errors:

  • Error 1:
k8s-node1 kubelet[27242]: I1014 11:17:29.409068 27242 cni.go:239] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
Oct 14 11:17:29 k8s-node1 kubelet[27242]: E1014 11:17:29.996079 27242 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
  • Error 2: images could not be pulled from the 172.16.131.87:1088/kubernetes-deploy registry because authentication failed.

Fix for error 1: the other nodes are missing the CNI configuration, so copying the network configuration from the master node to them is enough:

scp -rp /etc/cni k8s-node1:/etc/
scp -rp /etc/cni k8s-node2:/etc/
scp -rp /etc/cni k8s-node3:/etc/

After that, all nodes report Ready, which means the Kubernetes cluster is back in a correct state.

Error 2 occurred because the kubernetes-deploy project was created as private when the k8s images were uploaded, so the nodes could not pull from it. The simplest fix is to mark the project as public in the Harbor UI (pulling from a private project is a topic for later).

With that, the offline kubeadm-based deployment of k8s on RHEL 7 is complete. The next step is to deploy an nginx workload to verify that the whole cluster actually works.
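
As a preview of that validation step, here is a minimal sketch using the nginx image that was already pushed to Harbor; the deployment and service names are arbitrary:

kubectl create deployment nginx-test --image=172.16.131.87:1088/kubernetes-deploy/nginx:latest --replicas=2
kubectl expose deployment nginx-test --port=80 --type=NodePort
kubectl get pods -l app=nginx-test -o wide
kubectl get svc nginx-test    # note the NodePort, then curl http://<any-node-ip>:<NodePort> from a node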

Appendix: sshUserSetup.sh

#!/bin/sh
# Nitin Jerath - Aug 2005
#Usage sshUserSetup.sh -user <user name> [ -hosts \"<space separated hostlist>\" | -hostfile <absolute path of cluster configuration file> ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile <desired absolute path of logfile> ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]
#eg. sshUserSetup.sh -hosts "host1 host2" -user njerath -advanced
#This script is used to setup SSH connectivity from the host on which it is
# run to the specified remote hosts. After this script is run, the user can use # SSH to run commands on the remote hosts or copy files between the local host
# and the remote hosts without being prompted for passwords or confirmations.
# The list of remote hosts and the user name on the remote host is specified as 
# a command line parameter to the script. Note that in case the user on the 
# remote host has its home directory NFS mounted or shared across the remote 
# hosts, this script should be used with -shared option. 
#Specifying the -advanced option on the command line would result in SSH 
# connectivity being setup among the remote hosts which means that SSH can be 
# used to run commands on one remote host from the other remote host or copy 
# files between the remote hosts without being prompted for passwords or 
# confirmations.
#Please note that the script would remove write permissions on the remote hosts
#for the user home directory and ~/.ssh directory for "group" and "others". This
# is an SSH requirement. The user would be explicitly informed about this by teh script and prompted to continue. In case the user presses no, the script would exit. In case the user does not want to be prompted, he can use -confirm option.
# As a part of the setup, the script would use SSH to create files within ~/.ssh
# directory of the remote node and to setup the requisite permissions. The 
#script also uses SCP to copy the local host public key to the remote hosts so
# that the remote hosts trust the local host for SSH. At the time, the script 
#performs these steps, SSH connectivity has not been completely setup  hence
# the script would prompt the user for the remote host password. 
#For each remote host, for remote users with non-shared homes this would be 
# done once for SSH and  once for SCP. If the number of remote hosts are x, the 
# user would be prompted  2x times for passwords. For remote users with shared 
# homes, the user would be prompted only twice, once each for SCP and SSH.
#For security reasons, the script does not save passwords and reuse it. Also, 
# for security reasons, the script does not accept passwords redirected from a 
#file. The user has to key in the confirmations and passwords at the prompts.
#The -verify option means that the user just wants to verify whether SSH has 
#been set up. In this case, the script would not setup SSH but would only check
# whether SSH connectivity has been setup from the local host to the remote 
# hosts. The script would run the date command on each remote host using SSH. In
# case the user is prompted for a password or sees a warning message for a 
#particular host, it means SSH connectivity has not been setup correctly for
# that host.
#In case the -verify option is not specified, the script would setup SSH and 
#then do the verification as well.
#In case the user speciies the -exverify option, an exhaustive verification would be done. In that case, the following would be checked:
# 1. SSH connectivity from local host to all remote hosts.
# 2. SSH connectivity from each remote host to itself and other remote hosts.

#echo Parsing command line arguments
numargs=$#

ADVANCED=false
HOSTNAME=`hostname`
CONFIRM=no
SHARED=false
i=1
USR=$USER

if  test -z "$TEMP"
then
  TEMP=/tmp
fi

IDENTITY=id_rsa
LOGFILE=$TEMP/sshUserSetup_`date +%F-%H-%M-%S`.log
VERIFY=false
EXHAUSTIVE_VERIFY=false
HELP=false
PASSPHRASE=no
RERUN_SSHKEYGEN=no
NO_PROMPT_PASSPHRASE=no

while [ $i -le $numargs ]
do
  j=$1 
  if [ $j = "-hosts" ] 
  then
     HOSTS=$2
     shift 1
     i=`expr $i + 1`
  fi
  if [ $j = "-user" ] 
  then
     USR=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-logfile" ] 
  then
     LOGFILE=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-confirm" ] 
  then
     CONFIRM=yes
   fi
  if [ $j = "-hostfile" ] 
  then
     CLUSTER_CONFIGURATION_FILE=$2
     shift 1
     i=`expr $i + 1`
   fi
  if [ $j = "-usePassphrase" ] 
  then
     PASSPHRASE=yes
   fi
  if [ $j = "-noPromptPassphrase" ] 
  then
     NO_PROMPT_PASSPHRASE=yes
   fi
  if [ $j = "-shared" ] 
  then
     SHARED=true
   fi
  if [ $j = "-exverify" ] 
  then
     EXHAUSTIVE_VERIFY=true
   fi
  if [ $j = "-verify" ] 
  then
     VERIFY=true
   fi
  if [ $j = "-advanced" ] 
  then
     ADVANCED=true
   fi
  if [ $j = "-help" ] 
  then
     HELP=true
   fi
  i=`expr $i + 1`
  shift 1
done


if [ $HELP = "true" ]
then
  echo "Usage $0 -user  [ -hosts \"\" | -hostfile  ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile  ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]"
echo "This script is used to setup SSH connectivity from the host on which it is run to the specified remote hosts. After this script is run, the user can use  SSH to run commands on the remote hosts or copy files between the local host and the remote hosts without being prompted for passwords or confirmations.  The list of remote hosts and the user name on the remote host is specified as a command line parameter to the script. "
echo "-user : User on remote hosts. " 
echo "-hosts : Space separated remote hosts list. " 
echo "-hostfile : The user can specify the host names either through the -hosts option or by specifying the absolute path of a cluster configuration file. A sample host file contents are below: " 
echo
echo  "   stacg30 stacg30int 10.1.0.0 stacg30v  -"
echo  "   stacg34 stacg34int 10.1.0.1 stacg34v  -"
echo 
echo " The first column in each row of the host file will be used as the host name."
echo 
echo "-usePassphrase : The user wants to set up passphrase to encrypt the private key on the local host. " 
echo "-noPromptPassphrase : The user does not want to be prompted for passphrase related questions. This is for users who want the default behavior to be followed." 
echo "-shared : In case the user on the remote host has its home directory NFS mounted or shared across the remote hosts, this script should be used with -shared option. " 
echo "  It is possible for the user to determine whether a user's home directory is shared or non-shared. Let us say we want to determine that user user1's home directory is shared across hosts A, B and C."
echo " Follow the following steps:"
echo "    1. On host A, touch ~user1/checkSharedHome.tmp"
echo "    2. On hosts B and C, ls -al ~user1/checkSharedHome.tmp" 
echo "    3. If the file is present on hosts B and C in ~user1 directory and"
echo "       is identical on all hosts A, B, C, it means that the user's home "
echo "       directory is shared."
echo "    4. On host A, rm -f ~user1/checkSharedHome.tmp"
echo " In case the user accidentally passes -shared option for non-shared homes or viceversa,SSH connectivity would only be set up for a subset of the hosts. The user would have to re-run the setyp script with the correct option to rectify this problem."
echo "-advanced :  Specifying the -advanced option on the command line would result in SSH  connectivity being setup among the remote hosts which means that SSH can be used to run commands on one remote host from the other remote host or copy files between the remote hosts without being prompted for passwords or confirmations."
echo "-confirm: The script would remove write permissions on the remote hosts for the user home directory and ~/.ssh directory for "group" and "others". This is an SSH requirement. The user would be explicitly informed about this by the script and prompted to continue. In case the user presses no, the script would exit. In case the user does not want to be prompted, he can use -confirm option."
echo  "As a part of the setup, the script would use SSH to create files within ~/.ssh directory of the remote node and to setup the requisite permissions. The script also uses SCP to copy the local host public key to the remote hosts so that the remote hosts trust the local host for SSH. At the time, the script performs these steps, SSH connectivity has not been completely setup  hence the script would prompt the user for the remote host password.  "
echo "For each remote host, for remote users with non-shared homes this would be done once for SSH and  once for SCP. If the number of remote hosts are x, the user would be prompted  2x times for passwords. For remote users with shared homes, the user would be prompted only twice, once each for SCP and SSH.  For security reasons, the script does not save passwords and reuse it. Also, for security reasons, the script does not accept passwords redirected from a file. The user has to key in the confirmations and passwords at the prompts. "
echo "-verify : -verify option means that the user just wants to verify whether SSH has been set up. In this case, the script would not setup SSH but would only check whether SSH connectivity has been setup from the local host to the remote hosts. The script would run the date command on each remote host using SSH. In case the user is prompted for a password or sees a warning message for a particular host, it means SSH connectivity has not been setup correctly for that host.  In case the -verify option is not specified, the script would setup SSH and then do the verification as well. "
echo "-exverify : In case the user speciies the -exverify option, an exhaustive verification for all hosts would be done. In that case, the following would be checked: "
echo "   1. SSH connectivity from local host to all remote hosts. "
echo "   2. SSH connectivity from each remote host to itself and other remote hosts.  "
echo The -exverify option can be used in conjunction with the -verify option as well to do an exhaustive verification once the setup has been done.  
echo "Taking some examples: Let us say local host is Z, remote hosts are A,B and C. Local user is njerath. Remote users are racqa(non-shared), aime(shared)."
echo "$0 -user racqa -hosts "A B C" -advanced -exverify -confirm"
echo "Script would set up connectivity from Z -> A, Z -> B, Z -> C, A -> A, A -> B, A -> C, B -> A, B -> B, B -> C, C -> A, C -> B, C -> C."
echo "Since user has given -exverify option, all these scenario would be verified too."
echo
echo "Now the user runs : $0 -user racqa -hosts "A B C" -verify"
echo "Since -verify option is given, no SSH setup would be done, only verification of existing setup. Also, since -exverify or -advanced options are not given, script would only verify connectivity from Z -> A, Z -> B, Z -> C"

echo "Now the user runs : $0 -user racqa -hosts "A B C" -verify -advanced"
echo "Since -verify option is given, no SSH setup would be done, only verification of existing setup. Also, since  -advanced options is given, script would verify connectivity from Z -> A, Z -> B, Z -> C, A-> A, A->B, A->C, A->D"

echo "Now the user runs:"
echo "$0 -user aime -hosts "A B C" -confirm -shared"
echo "Script would set up connectivity between  Z->A, Z->B, Z->C only since advanced option is not given."
echo "All these scenarios would be verified too."

exit
fi

if test -z "$HOSTS"
then
   if test -n "$CLUSTER_CONFIGURATION_FILE" && test -f "$CLUSTER_CONFIGURATION_FILE"
   then
      HOSTS=`awk '$1 !~ /^#/ { str = str " " $1 } END { print str }' $CLUSTER_CONFIGURATION_FILE` 
   elif ! test -f "$CLUSTER_CONFIGURATION_FILE"
   then
     echo "Please specify a valid and existing cluster configuration file."
   fi
fi

if  test -z "$HOSTS" || test -z $USR
then
echo "Either user name or host information is missing"
echo "Usage $0 -user  [ -hosts \"\" | -hostfile  ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile  ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]" 
exit 1
fi

if [ -d $LOGFILE ]; then
    echo $LOGFILE is a directory, setting logfile to $LOGFILE/ssh.log
    LOGFILE=$LOGFILE/ssh.log
fi

echo The output of this script is also logged into $LOGFILE | tee -a $LOGFILE

if [ `echo $?` != 0 ]; then
    echo Error writing to the logfile $LOGFILE, Exiting
    exit 1
fi

echo Hosts are $HOSTS | tee -a $LOGFILE
echo user is  $USR | tee -a $LOGFILE
SSH="/usr/bin/ssh"
SCP="/usr/bin/scp"
SSH_KEYGEN="/usr/bin/ssh-keygen"
calculateOS()
{
    platform=`uname -s`
    case "$platform"
    in
       "SunOS")  os=solaris;;
       "Linux")  os=linux;;
       "HP-UX")  os=hpunix;;
         "AIX")  os=aix;;
             *)  echo "Sorry, $platform is not currently supported." | tee -a $LOGFILE
                 exit 1;;
    esac

    echo "Platform:- $platform " | tee -a $LOGFILE
}
calculateOS
BITS=1024
ENCR="rsa"

deadhosts=""
alivehosts=""
if [ $platform = "Linux" ]
then
    PING="/bin/ping"
else
    PING="/usr/sbin/ping"
fi
#bug 9044791
if [ -n "$SSH_PATH" ]; then
    SSH=$SSH_PATH
fi
if [ -n "$SCP_PATH" ]; then
    SCP=$SCP_PATH
fi
if [ -n "$SSH_KEYGEN_PATH" ]; then
    SSH_KEYGEN=$SSH_KEYGEN_PATH
fi
if [ -n "$PING_PATH" ]; then
    PING=$PING_PATH
fi
PATH_ERROR=0
if test ! -x $SSH ; then
    echo "ssh not found at $SSH. Please set the variable SSH_PATH to the correct location of ssh and retry."
    PATH_ERROR=1
fi 
if test ! -x $SCP ; then
    echo "scp not found at $SCP. Please set the variable SCP_PATH to the correct location of scp and retry."
    PATH_ERROR=1
fi 
if test ! -x $SSH_KEYGEN ; then
    echo "ssh-keygen not found at $SSH_KEYGEN. Please set the variable SSH_KEYGEN_PATH to the correct location of ssh-keygen and retry."
    PATH_ERROR=1
fi 
if test ! -x $PING ; then
    echo "ping not found at $PING. Please set the variable PING_PATH to the correct location of ping and retry."
    PATH_ERROR=1
fi 
if [ $PATH_ERROR = 1 ]; then
    echo "ERROR: one or more of the required binaries not found, exiting"
    exit 1
fi
#9044791 end
echo Checking if the remote hosts are reachable | tee -a $LOGFILE
for host in $HOSTS
do
   if [ $platform = "SunOS" ]; then
       $PING -s $host 5 5
   elif [ $platform = "HP-UX" ]; then
       $PING $host -n 5 -m 5
   else
       $PING -c 5 -w 5 $host
   fi
  exitcode=`echo $?`
  if [ $exitcode = 0 ]
  then
     alivehosts="$alivehosts $host"
  else
     deadhosts="$deadhosts $host"
  fi
done

if test -z "$deadhosts"
then
   echo Remote host reachability check succeeded.  | tee -a $LOGFILE
   echo The following hosts are reachable: $alivehosts.  | tee -a $LOGFILE
   echo The following hosts are not reachable: $deadhosts.  | tee -a $LOGFILE
   echo All hosts are reachable. Proceeding further...  | tee -a $LOGFILE
else
   echo Remote host reachability check failed.  | tee -a $LOGFILE
   echo The following hosts are reachable: $alivehosts.  | tee -a $LOGFILE
   echo The following hosts are not reachable: $deadhosts.  | tee -a $LOGFILE
   echo Please ensure that all the hosts are up and re-run the script.  | tee -a $LOGFILE
   echo Exiting now...  | tee -a $LOGFILE
   exit 1
fi

firsthost=`echo $HOSTS | awk '{print $1}; END { }'`
echo firsthost $firsthost
numhosts=`echo $HOSTS | awk '{ }; END {print NF}'`
echo numhosts $numhosts

if [ $VERIFY = "true" ]
then
   echo Since user has specified -verify option, SSH setup would not be done. Only, existing SSH setup would be verified. | tee -a $LOGFILE
   continue
else
echo The script will setup SSH connectivity from the host ''`hostname`'' to all  | tee -a $LOGFILE 
echo the remote hosts. After the script is executed, the user can use SSH to run  | tee -a $LOGFILE 
echo commands on the remote hosts or copy files between this host ''`hostname`'' | tee -a $LOGFILE 
echo and the remote hosts without being prompted for passwords or confirmations. | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo NOTE 1: | tee -a $LOGFILE 
echo As part of the setup procedure, this script will use 'ssh' and 'scp' to copy | tee -a $LOGFILE 
echo files between the local host and the remote hosts. Since the script does not  | tee -a $LOGFILE 
echo store passwords, you may be prompted for the passwords during the execution of  | tee -a $LOGFILE 
echo the script whenever 'ssh' or 'scp' is invoked. | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo NOTE 2: | tee -a $LOGFILE 
echo "AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY" | tee -a $LOGFILE 
echo AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEGES TO THESE  | tee -a $LOGFILE 
echo "directories." | tee -a $LOGFILE 
echo  | tee -a $LOGFILE 
echo "Do you want to continue and let the script make the above mentioned changes (yes/no)?" | tee -a $LOGFILE 

if [ "$CONFIRM" = "no" ] 
then 
  read CONFIRM 
else
  echo "Confirmation provided on the command line" | tee -a $LOGFILE
fi 
   
echo  | tee -a $LOGFILE 
echo The user chose ''$CONFIRM'' | tee -a $LOGFILE 

if [ -z "$CONFIRM" -o "$CONFIRM" != "yes" -a "$CONFIRM" != "no" ]
then
  echo "You haven't specified proper input. Please enter 'yes' or 'no'. Exiting...."
  exit 0
fi
if [ "$CONFIRM" = "no" ] 
then 
  echo "SSH setup is not done." | tee -a $LOGFILE 
  exit 1 
else 
  if [ $NO_PROMPT_PASSPHRASE = "yes" ]
  then
    echo "User chose to skip passphrase related questions."  | tee -a $LOGFILE
  else
    if [ $SHARED = "true" ]
    then
	  hostcount=`expr ${numhosts} + 1`
	  PASSPHRASE_PROMPT=`expr 2 \* $hostcount`
    else
	  PASSPHRASE_PROMPT=`expr 2 \* ${numhosts}`
    fi
    echo "Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type 'yes' or 'no' and then press enter. In case you press 'yes', you would need to enter the passphrase whenever the script executes ssh or scp. $PASSPHRASE " | tee -a $LOGFILE
    echo "The estimated number of times the user would be prompted for a passphrase is $PASSPHRASE_PROMPT. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion. " | tee -a $LOGFILE
    echo "Enter 'yes' or 'no'." | tee -a $LOGFILE
    if [ "$PASSPHRASE" = "no" ]
    then
      read PASSPHRASE
    else
      echo "Confirmation provided on the command line" | tee -a $LOGFILE
    fi 

    echo  | tee -a $LOGFILE 
    echo The user chose ''$PASSPHRASE'' | tee -a $LOGFILE 
    if [ -z "$PASSPHRASE"  -o "$PASSPHRASE" != "yes" -a "$PASSPHRASE" != "no" ]
    then
      echo "You haven't specified whether to use Passphrase or not. Please specify 'yes' or 'no'. Exiting..."
      exit 0
    fi

    if [ "$PASSPHRASE" = "yes" ] 
    then 
       RERUN_SSHKEYGEN="yes"
#Checking for existence of ${IDENTITY} file
       if test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
       then
	     echo "The files containing the client public and private keys already exist on the local host. The current private key may or may not have a passphrase associated with it. In case you remember the passphrase and do not want to re-run ssh-keygen, press 'no' and enter. If you press 'no', the script will not attempt to create any new public/private key pairs. If you press 'yes', the script will remove the old private/public key files existing and create new ones prompting the user to enter the passphrase. If you enter 'yes', any previous SSH user setups would be reset. If you press 'change', the script will associate a new passphrase with the old keys." | tee -a $LOGFILE
	     echo "Press 'yes', 'no' or 'change'" | tee -a $LOGFILE
             read RERUN_SSHKEYGEN 
             echo The user chose ''$RERUN_SSHKEYGEN'' | tee -a $LOGFILE 
	     if [ -z "$RERUN_SSHKEYGEN" -o "$RERUN_SSHKEYGEN" != "yes" -a "$RERUN_SSHKEYGEN" != "no" -a "$RERUN_SSHKEYGEN" != "change" ]
	     then
	       echo "You haven't specified whether to re-run 'ssh-keygen' or not. Please enter 'yes' , 'no' or 'change'. Exiting..."
	       exit 0;
	     fi
       fi 
     else
       if test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
       then
         echo "The files containing the client public and private keys already exist on the local host. The current private key may have a passphrase associated with it. In case you find using passphrase inconvenient(although it is more secure), you can change to it empty through this script. Press 'change' if you want the script to change the passphrase for you. Press 'no' if you want to use your old passphrase, if you had one."
         read RERUN_SSHKEYGEN 
         echo The user chose ''$RERUN_SSHKEYGEN'' | tee -a $LOGFILE 
	 if [ -z "$RERUN_SSHKEYGEN" -o "$RERUN_SSHKEYGEN" != "yes" -a "$RERUN_SSHKEYGEN" != "no" -a "$RERUN_SSHKEYGEN" != "change" ]
	 then
	   echo "You haven't specified whether to re-run 'ssh-keygen' or not. Please enter 'yes' , 'no' or 'change'. Exiting..."
	   exit 0
	 fi
       fi
     fi
  fi
  echo Creating .ssh directory on local host, if not present already | tee -a $LOGFILE
  mkdir -p $HOME/.ssh | tee -a $LOGFILE
echo Creating authorized_keys file on local host  | tee -a $LOGFILE
touch $HOME/.ssh/authorized_keys  | tee -a $LOGFILE
echo Changing permissions on authorized_keys to 644 on local host  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/authorized_keys  | tee -a $LOGFILE
mv -f $HOME/.ssh/authorized_keys  $HOME/.ssh/authorized_keys.tmp | tee -a $LOGFILE
echo Creating known_hosts file on local host  | tee -a $LOGFILE
touch $HOME/.ssh/known_hosts  | tee -a $LOGFILE
echo Changing permissions on known_hosts to 644 on local host  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/known_hosts  | tee -a $LOGFILE
mv -f $HOME/.ssh/known_hosts $HOME/.ssh/known_hosts.tmp | tee -a $LOGFILE


echo Creating config file on local host | tee -a $LOGFILE
echo If a config file exists already at $HOME/.ssh/config, it would be backed up to $HOME/.ssh/config.backup.
echo "Host *" > $HOME/.ssh/config.tmp | tee -a $LOGFILE
echo "ForwardX11 no" >> $HOME/.ssh/config.tmp | tee -a $LOGFILE

if test -f $HOME/.ssh/config 
then
  cp -f $HOME/.ssh/config $HOME/.ssh/config.backup
fi

mv -f $HOME/.ssh/config.tmp $HOME/.ssh/config  | tee -a $LOGFILE
chmod 644 $HOME/.ssh/config

if [ "$RERUN_SSHKEYGEN" = "yes" ]
then
  echo Removing old private/public keys on local host | tee -a $LOGFILE
  rm -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
  rm -f $HOME/.ssh/${IDENTITY}.pub | tee -a $LOGFILE
  echo Running SSH keygen on local host | tee -a $LOGFILE
  $SSH_KEYGEN -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY}   | tee -a $LOGFILE

elif [ "$RERUN_SSHKEYGEN" = "change" ]
then
    echo Running SSH Keygen on local host to change the passphrase associated with the existing private key | tee -a $LOGFILE
    $SSH_KEYGEN -p -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
elif test -f  $HOME/.ssh/${IDENTITY}.pub && test -f  $HOME/.ssh/${IDENTITY} 
then
    : # key pair already exists and is reused as-is
else
    echo Removing old private/public keys on local host | tee -a $LOGFILE
    rm -f $HOME/.ssh/${IDENTITY} | tee -a $LOGFILE
    rm -f $HOME/.ssh/${IDENTITY}.pub | tee -a $LOGFILE
    echo Running SSH keygen on local host with empty passphrase | tee -a $LOGFILE
    $SSH_KEYGEN -t $ENCR -b $BITS -f $HOME/.ssh/${IDENTITY} -N ''  | tee -a $LOGFILE
fi

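# Decide which hosts need remote updates:
#  - SHARED=true and the remote user equals the local user: the local home IS the shared home, so no remote work is needed;
#  - SHARED=true with a different remote user: updating the first host updates the shared home for every host;
#  - otherwise (non-shared homes): every host in the list is updated individually.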
if [ $SHARED = "true" ]
then
  if [ $USER = $USR ]
  then
#No remote operations required
    echo Remote user is same as local user | tee -a $LOGFILE
    REMOTEHOSTS=""
    chmod og-w $HOME $HOME/.ssh | tee -a $LOGFILE
  else    
    REMOTEHOSTS="${firsthost}"
  fi
else
  REMOTEHOSTS="$HOSTS"
fi

for host in $REMOTEHOSTS
do
     echo Creating .ssh directory and setting permissions on remote host $host | tee -a $LOGFILE
     echo "THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR "group" AND "others" ON THE HOME DIRECTORY FOR $USR. THIS IS AN SSH REQUIREMENT." | tee -a $LOGFILE
     echo The script would create ~$USR/.ssh/config file on remote host $host. If a config file exists already at ~$USR/.ssh/config, it would be backed up to ~$USR/.ssh/config.backup. | tee -a $LOGFILE
     echo The user may be prompted for a password here since the script would be running SSH on host $host. | tee -a $LOGFILE
     $SSH -o StrictHostKeyChecking=no -x -l $USR $host "/bin/sh -c \"  mkdir -p .ssh ; chmod og-w . .ssh;   touch .ssh/authorized_keys .ssh/known_hosts;  chmod 644 .ssh/authorized_keys  .ssh/known_hosts; cp  .ssh/authorized_keys .ssh/authorized_keys.tmp ;  cp .ssh/known_hosts .ssh/known_hosts.tmp; echo \\"Host *\\" > .ssh/config.tmp; echo \\"ForwardX11 no\\" >> .ssh/config.tmp; if test -f  .ssh/config ; then cp -f .ssh/config .ssh/config.backup; fi ; mv -f .ssh/config.tmp .ssh/config\""  | tee -a $LOGFILE
     echo Done with creating .ssh directory and setting permissions on remote host $host. | tee -a $LOGFILE
done

for host in $REMOTEHOSTS
do
  echo Copying local host public key to the remote host $host | tee -a $LOGFILE
  echo The user may be prompted for a password or passphrase here since the script would be using SCP for host $host. | tee -a $LOGFILE

  $SCP $HOME/.ssh/${IDENTITY}.pub  $USR@$host:.ssh/authorized_keys | tee -a $LOGFILE
  echo Done copying local host public key to the remote host $host | tee -a $LOGFILE
done

cat $HOME/.ssh/${IDENTITY}.pub >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE

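# In ADVANCED mode, make sure each remote host also has its own key pair (a per-host identity file
# when SHARED=true, whose public key is appended to the shared authorized_keys); otherwise, for the
# shared case, a plain ssh is run once per host just so the remote host keys get collected.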
for host in $HOSTS
do
  if [ "$ADVANCED" = "true" ] 
  then
    echo Creating keys on remote host $host if they do not exist already. This is required to setup SSH on host $host. | tee -a $LOGFILE
    if [ "$SHARED" = "true" ]
    then
      IDENTITY_FILE_NAME=${IDENTITY}_$host
      COALESCE_IDENTITY_FILES_COMMAND="cat .ssh/${IDENTITY_FILE_NAME}.pub >> .ssh/authorized_keys"
    else
      IDENTITY_FILE_NAME=${IDENTITY}
    fi

   $SSH  -o StrictHostKeyChecking=no -x -l $USR $host " /bin/sh -c \"if test -f  .ssh/${IDENTITY_FILE_NAME}.pub && test -f  .ssh/${IDENTITY_FILE_NAME}; then echo; else rm -f .ssh/${IDENTITY_FILE_NAME} ;  rm -f .ssh/${IDENTITY_FILE_NAME}.pub ;  $SSH_KEYGEN -t $ENCR -b $BITS -f .ssh/${IDENTITY_FILE_NAME} -N '' ; fi; ${COALESCE_IDENTITY_FILES_COMMAND} \"" | tee -a $LOGFILE
  else 
#At least get the host keys from all hosts for shared case - advanced option not set
    if test  $SHARED = "true" && test $ADVANCED = "false"
    then
      if [ "$PASSPHRASE" = "yes" ]
      then
	 echo "The script will fetch the host keys from all hosts. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
      fi
      $SSH  -o StrictHostKeyChecking=no -x -l $USR $host "/bin/sh -c true"
    fi
  fi
done

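# In ADVANCED mode without shared homes, pull each remote host's public key back and merge it into
# the local authorized_keys; the merged file is pushed out to every host in the next loop.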
for host in $REMOTEHOSTS
do
  if test $ADVANCED = "true" && test $SHARED = "false"  
  then
      $SCP $USR@$host:.ssh/${IDENTITY}.pub $HOME/.ssh/${IDENTITY}.pub.$host | tee -a $LOGFILE
      cat $HOME/.ssh/${IDENTITY}.pub.$host >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE
      rm -f $HOME/.ssh/${IDENTITY}.pub.$host | tee -a $LOGFILE
  fi
done

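# In ADVANCED mode, push the merged authorized_keys (non-shared case) and the local known_hosts to
# each remote host; in every case, append the entries saved earlier in the .tmp backups so that
# pre-existing authorized keys and known hosts are preserved, then delete the backups.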
for host in $REMOTEHOSTS
do
   if [ "$ADVANCED" = "true" ]
   then
      if [ "$SHARED" != "true" ]
      then
         echo Updating authorized_keys file on remote host $host | tee -a $LOGFILE
         $SCP $HOME/.ssh/authorized_keys  $USR@$host:.ssh/authorized_keys | tee -a $LOGFILE
      fi 
     echo Updating known_hosts file on remote host $host | tee -a $LOGFILE
     $SCP $HOME/.ssh/known_hosts $USR@$host:.ssh/known_hosts | tee -a $LOGFILE
   fi
   if [ "$PASSPHRASE" = "yes" ]
   then
	 echo "The script will run SSH on the remote machine $host. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
   fi
     $SSH -x -l $USR $host "/bin/sh -c \"cat .ssh/authorized_keys.tmp >> .ssh/authorized_keys; cat .ssh/known_hosts.tmp >> .ssh/known_hosts; rm -f  .ssh/known_hosts.tmp  .ssh/authorized_keys.tmp\"" | tee -a $LOGFILE
done

cat  $HOME/.ssh/known_hosts.tmp >> $HOME/.ssh/known_hosts | tee -a $LOGFILE
cat  $HOME/.ssh/authorized_keys.tmp >> $HOME/.ssh/authorized_keys | tee -a $LOGFILE
#Added chmod to fix BUG NO 5238814
chmod 644 $HOME/.ssh/authorized_keys
#Fix for BUG NO 5157782
chmod 644 $HOME/.ssh/config
rm -f  $HOME/.ssh/known_hosts.tmp $HOME/.ssh/authorized_keys.tmp | tee -a $LOGFILE
echo SSH setup is complete. | tee -a $LOGFILE
fi
fi

echo                                                                          | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
echo Verifying SSH setup | tee -a $LOGFILE
echo =================== | tee -a $LOGFILE
echo The script will now run the 'date' command on the remote nodes using ssh | tee -a $LOGFILE
echo to verify if ssh is set up correctly. IF SSH IS SET UP CORRECTLY, | tee -a $LOGFILE
echo THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR | tee -a $LOGFILE
echo PASSWORDS. If you see any output other than date or are prompted for the | tee -a $LOGFILE
echo password, ssh is not setup correctly and you will need to resolve the  | tee -a $LOGFILE
echo issue and set up ssh again. | tee -a $LOGFILE
echo The possible causes for failure could be:  | tee -a $LOGFILE
echo   1. The server settings in /etc/ssh/sshd_config file do not allow ssh | tee -a $LOGFILE
echo      for user $USR. | tee -a $LOGFILE
echo   2. The server may have disabled public key based authentication. | tee -a $LOGFILE
echo   3. The client public key on the server may be outdated. | tee -a $LOGFILE
echo   4. ~$USR or  ~$USR/.ssh on the remote host may not be owned by $USR.  | tee -a $LOGFILE
echo   5. User may not have passed -shared option for shared remote users or | tee -a $LOGFILE
echo     may be passing the -shared option for non-shared remote users.  | tee -a $LOGFILE
echo   6. If there is output in addition to the date, but no password is asked, | tee -a $LOGFILE
echo   it may be a security alert shown as part of company policy. Append the | tee -a $LOGFILE
echo   "additional text to the /sysman/prov/resources/ignoreMessages.txt file." | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
#read -t 30 dummy
  for host in $HOSTS
  do
    echo --$host:-- | tee -a $LOGFILE

     echo Running $SSH -x -l $USR $host date to verify SSH connectivity has been setup from local host to $host.  | tee -a $LOGFILE
     echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR." | tee -a $LOGFILE
     if [ "$PASSPHRASE" = "yes" ]
     then
       echo "The script will run SSH on the remote machine $host. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase." | tee -a $LOGFILE
     fi
     $SSH -l $USR $host "/bin/sh -c date"  | tee -a $LOGFILE
echo ------------------------------------------------------------------------ | tee -a $LOGFILE
  done


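# With exhaustive verification (EXHAUSTIVE_VERIFY=true), connectivity is checked between every pair
# of hosts; otherwise, when ADVANCED mode was used, only connectivity from the first host to every
# host is re-checked.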
if [ "$EXHAUSTIVE_VERIFY" = "true" ]
then
   for clienthost in $HOSTS
   do

      if [ "$SHARED" = "true" ]
      then
         REMOTESSH="$SSH -i .ssh/${IDENTITY}_${clienthost}"
      else
         REMOTESSH=$SSH
      fi

      for serverhost in  $HOSTS
      do
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
         echo Verifying SSH connectivity has been setup from $clienthost to $serverhost  | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
         echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL."  | tee -a $LOGFILE
         $SSH -l $USR $clienthost "$REMOTESSH $serverhost \"/bin/sh -c date\""  | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
      done  
       echo -Verification from $clienthost complete- | tee -a $LOGFILE
   done
else
   if [ "$ADVANCED" = "true" ]
   then
      if [ "$SHARED" = "true" ]
      then
         REMOTESSH="$SSH -i .ssh/${IDENTITY}_${firsthost}"
      else
         REMOTESSH=$SSH
      fi
      for host in $HOSTS
      do
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
         echo Verifying SSH connectivity has been setup from $firsthost to $host  | tee -a $LOGFILE
         echo "IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL." | tee -a $LOGFILE
         $SSH -l $USR $firsthost "$REMOTESSH $host \"/bin/sh -c date\"" | tee -a $LOGFILE
         echo ------------------------------------------------------------------------ | tee -a $LOGFILE
      done
      echo -Verification from $firsthost complete- | tee -a $LOGFILE
  fi
fi
echo "SSH verification complete." | tee -a $LOGFILE
