Hands-On Pitfall Notes: Setting Up a Kubernetes v1.10 Cluster

Preparation

  1. Prepare three machines running Ubuntu 16.04.
  2. Plan the addressing; all three machines are on the same subnet.
Hostname IP Notes
node01 192.168.175.96 master and etcd
node02 192.168.175.101 master and etcd
node03 192.168.175.57 master and etcd
VIP 192.168.175.120 not a host, just a virtual IP
  3. Prepare the corresponding software versions.
    docker 18.06.1-ce container runtime
    kubelet v1.10.3 In a Kubernetes cluster every node runs the kubelet process, which carries out the tasks the master hands down to that node and manages the Pods and containers on it. The kubelet registers the node with the API Server, periodically reports the node's resource usage to the master, and monitors containers and node resources through cAdvisor. You can think of the kubelet as the agent in a server-agent architecture: the Pod caretaker on each node.
    kubectl v1.10.3 The kubectl command shipped with Kubernetes is the most direct way to interact with the cluster.
    kubeadm v1.10.3 kubeadm, added in Kubernetes 1.4, is used to quickly bootstrap a Kubernetes cluster; two commands are enough to bring one up.
    etcd v3.3.5 A key-value store used for shared configuration and service discovery.
    Note: this setup was done in March 2019. As the software keeps evolving, following these steps verbatim may fail because of version changes; if versions have moved on, please also consult the official documentation of the affected components.

1. Environment Initialization

1.1 Set the hostnames (run on all machines)

hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03

1.2 Configure host mappings (run on all machines)

Append the host list with cat:
cat << EOF >> /etc/hosts
192.168.175.96 node01
192.168.175.101 node02
192.168.175.57 node03
EOF

1.3 Configure passwordless SSH login on node01

ssh-keygen  # just press Enter at every prompt
ssh-copy-id  node02
ssh-copy-id  node03

Error handling:

root@ubuntu:/home/wangdong# ssh-copy-id node02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'node02 (192.168.175.101)' can't be established.
ECDSA key fingerprint is SHA256:WpahJfY7TNgpVHPAlEbd+6ehVUVU7gwBrpx47tqTdq8.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node02's password: 
Permission denied, please try again.

The cause of this error: root login over SSH is disabled on node02, so the SSH server configuration has to be changed to allow root to log in remotely.

Edit the SSH server configuration with sudo vi /etc/ssh/sshd_config
and set the PermitRootLogin parameter to yes, as shown below:

# Authentication:
LoginGraceTime 120
PermitRootLogin yes
StrictModes yes

Restart the SSH service:

root@ubuntu:/etc/ssh# /etc/init.d/sshd  restart
bash: /etc/init.d/sshd: No such file or directory
root@ubuntu:/etc/ssh# /etc/init.d/ssh  restart
[ ok ] Restarting ssh (via systemctl): ssh.service.

Run ssh-copy-id node02 on node01 again:

root@ubuntu:/home/wangdong# ssh-copy-id node02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node02's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'node02'"
and check to make sure that only the key(s) you wanted were added.

The system reports that one key was added successfully. Repeat the same steps for node03; once that succeeds as well, move on to the next step.
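
As an optional sanity check (my own addition, not part of the original steps), confirm that passwordless login now works from node01 to both peers:
for h in node02 node03; do ssh -o BatchMode=yes "$h" hostname; done   # should print node02 and node03 without any password prompt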

1.4 On all three hosts: set kernel parameters, add the K8S apt source, disable swap and configure ntp (reboot once when done)

# I actually recommend the ntp service rather than ntpdate. The difference is that ntp slews the clock gradually,
# which will not confuse running programs, whereas ntpdate jumps straight to the new time and may confuse them.

swapoff -a 
sed -i 's/.*swap.*/#&/' /etc/fstab    # comment out the swap entry in fstab

# load the required kernel module
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
ls /proc/sys/net/bridge

 #https://opsx.alibaba.com/mirror  Alibaba Cloud mirror site
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main 
EOF
apt-get update  # this update now pulls from the Alibaba Cloud mirror
apt-cache madison kubelet   # list the available versions of a package
apt-cache madison kubelet | grep 1.10.3   # check that the required version exists; if it does, continue
If it exists, output like the following is shown:
 kubelet |  1.10.3-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
apt-get install -y kubelet=1.10.3-00
apt-get install -y kubeadm=1.10.3-00
apt-get install -y kubectl=1.10.3-00
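
Optionally, a quick check that the pinned versions really landed on each machine (expected strings shown as comments; exact output formatting differs slightly between releases):
kubelet --version          # expect: Kubernetes v1.10.3
kubeadm version            # expect GitVersion:"v1.10.3"
kubectl version --client   # expect GitVersion:"v1.10.3"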

systemctl enable ntpdate.service   # if the clocks are already close this can be skipped; time sync can also be handled by the ntp service instead
echo '*/30 * * * * /usr/sbin/ntpdate time7.aliyun.com >/dev/null 2>&1' > /tmp/crontab2.tmp
crontab /tmp/crontab2.tmp
systemctl start ntpdate.service
 
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536"  >> /etc/security/limits.conf
echo "* hard nproc 65536"  >> /etc/security/limits.conf
echo "* soft  memlock  unlimited"  >> /etc/security/limits.conf
echo "* hard memlock  unlimited"  >> /etc/security/limits.conf

2. Install and configure keepalived (master nodes)

keepalived is a piece of software that works like layer 3, 4 & 5 switching, i.e. what we usually call switching at layers 3, 4 and 5. keepalived does its job automatically, with no human intervention needed.
Why keepalived is configured with a virtual IP (VIP):
keepalived is built on VRRP, the Virtual Router Redundancy Protocol.
VRRP can be thought of as a protocol for router high availability: N routers providing the same function form a group with one master and several backups.
The master holds a VIP that serves external traffic (VIP = Virtual IP Address; the other machines on the LAN use this VIP as their default route), and the master sends out multicast announcements.
When the backups stop receiving VRRP packets they assume the master has died, and one backup is elected as the new master based on VRRP priority. This keeps the routers highly available.
You can understand it as all keepalived nodes sharing one IP: when one goes down, another node takes over as the master.

2.1 Install keepalived (this is mainly used for load balancing; it can be skipped)

apt install -y keepalived
systemctl enable keepalived
node01's keepalived.conf
cat <<EOF > /etc/keepalived/keepalived.conf
global_defs {
   router_id LVS_k8s
}

vrrp_script CheckK8sMaster {
    script "curl -k https://192.168.175.120:6443"
    interval 3
    weight 3
    timeout 9
}

vrrp_instance VI_1 {
    state MASTER
    interface ens160
    virtual_router_id 61
    priority 100
    advert_int 1
    mcast_src_ip 192.168.175.96
    nopreempt
    authentication {
        auth_type PASS
        auth_pass sqP05dQgMSlzrxHj
    }
    unicast_peer {
        192.168.175.101
        192.168.175.57
    }
    virtual_ipaddress {
        192.168.175.120/24
    }
    track_script {
        CheckK8sMaster
    }

}
EOF
node02's keepalived.conf
cat <<EOF > /etc/keepalived/keepalived.conf
global_defs {
   router_id LVS_k8s
}

vrrp_script CheckK8sMaster {
    script "curl -k https://192.168.175.120:6443"
    interval 3
    weight 3
    timeout 9
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    virtual_router_id 61
    priority 90
    advert_int 1
    mcast_src_ip 192.168.175.101
    nopreempt
    authentication {
        auth_type PASS
        auth_pass sqP05dQgMSlzrxHj
    }
    unicast_peer {
        192.168.175.96
        192.168.175.57
    }
    virtual_ipaddress {
        192.168.175.120/24
    }
    track_script {
        CheckK8sMaster
    }

}
EOF
node03's keepalived.conf
cat <<EOF > /etc/keepalived/keepalived.conf
global_defs {
   router_id LVS_k8s
}

vrrp_script CheckK8sMaster {
    script "curl -k https://192.168.175.120:6443"
    interval 3
    weight 3
    timeout 9
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens160
    virtual_router_id 61
    priority 80
    advert_int 1
    mcast_src_ip 192.168.175.57
    nopreempt
    authentication {
        auth_type PASS
        auth_pass sqP05dQgMSlzrxHj
    }
    unicast_peer {
        192.168.175.96
        192.168.175.101
    }
    virtual_ipaddress {
        192.168.175.120/24
    }
    track_script {
        CheckK8sMaster
    }

}
EOF

2.2 Start keepalived

systemctl restart keepalived
You can see that the VIP is now bound to node01, and the VIP address responds to ping:
root@node01:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:0b:06:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.175.96/24 brd 192.168.175.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet 192.168.175.120/24 scope global secondary ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe0b:6c2/64 scope link 
       valid_lft forever preferred_lft forever
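
Optionally, you can also verify that the VIP fails over as described: stop keepalived on node01 and the VIP should move to node02 within a few seconds (adjust the interface name if yours is not ens160).
# on node01
systemctl stop keepalived
# on node02, the VIP should now show up
ip addr show ens160 | grep 192.168.175.120
# bring node01 back afterwards
systemctl start keepalived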

3. Create the etcd certificates (only needed on node01)

3.1 Set up the cfssl environment. CFSSL is an open-source PKI/TLS toolkit from CloudFlare.

CFSSL includes a command-line tool and an HTTP API service for signing, verifying and bundling TLS certificates. It is written in Go.

wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x cfssl_linux-amd64
mv cfssl_linux-amd64 /usr/local/bin/cfssl
chmod +x cfssljson_linux-amd64
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
chmod +x cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
export PATH=/usr/local/bin:$PATH

3.2 Create the CA configuration files (the IPs used below are the etcd node IPs)

mkdir /root/ssl
cd /root/ssl
Create ca-config.json, ca-csr.json and etcd-csr.json with cat heredocs, then generate the CA and the etcd certificate with cfssl gencert and cfssljson; a typical set of file contents is sketched below.
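
The three heredocs and the generate commands are sketched here. This is a typical cfssl layout rather than the exact original: the profile name "etcd", the names fields and the 87600h expiry are placeholders of mine, but the hosts list must contain 127.0.0.1 and the three etcd node IPs, and the output files (ca.pem, ca-key.pem, etcd.pem, etcd-key.pem) must match what is copied in the next step.

cat > ca-config.json <<EOF
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "etcd": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "87600h"
      }
    }
  }
}
EOF

cat > ca-csr.json <<EOF
{
  "CN": "etcd-ca",
  "key": { "algo": "rsa", "size": 2048 },
  "names": [{ "C": "CN", "ST": "Shanghai", "L": "Shanghai", "O": "k8s", "OU": "System" }]
}
EOF

cat > etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.175.96",
    "192.168.175.101",
    "192.168.175.57"
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [{ "C": "CN", "ST": "Shanghai", "L": "Shanghai", "O": "k8s", "OU": "System" }]
}
EOF

# generate the CA, then sign the etcd server/peer certificate with it
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json \
  -profile=etcd etcd-csr.json | cfssljson -bare etcd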

3.3 Distribute the etcd certificates from node01 to node02 and node03

mkdir -p /etc/etcd/ssl  # run on all three machines
cp etcd.pem etcd-key.pem ca.pem /etc/etcd/ssl/
scp -r /etc/etcd/ssl/*.pem node02:/etc/etcd/ssl/
scp -r /etc/etcd/ssl/*.pem node03:/etc/etcd/ssl/
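
Before wiring etcd up, you can optionally confirm that the certificate really covers the three node IPs (openssl ships with Ubuntu 16.04):
openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -text | grep -A1 "Subject Alternative Name"
# expect 192.168.175.96, 192.168.175.101 and 192.168.175.57 in the output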

3.4 Install and configure etcd (all three master nodes). etcd is a distributed, consistent key-value store used for shared configuration and service discovery.

Install etcd:
apt install etcd -y
node01's etcd.service
cat <<EOF >/etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/bin/etcd   --name node01   --cert-file=/etc/etcd/ssl/etcd.pem   --key-file=/etc/etcd/ssl/etcd-key.pem   --peer-cert-file=/etc/etcd/ssl/etcd.pem   --peer-key-file=/etc/etcd/ssl/etcd-key.pem   --trusted-ca-file=/etc/etcd/ssl/ca.pem   --peer-trusted-ca-file=/etc/etcd/ssl/ca.pem   --initial-advertise-peer-urls https://192.168.175.96:2380   --listen-peer-urls https://192.168.175.96:2380   --listen-client-urls https://192.168.175.96:2379,http://127.0.0.1:2379   --advertise-client-urls https://192.168.175.96:2379   --initial-cluster-token etcd-cluster-0   --initial-cluster node01=https://192.168.175.96:2380,node02=https://192.168.175.101:2380,node03=https://192.168.175.57:2380   --initial-cluster-state new   --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
node02's etcd.service
cat <<EOF >/etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/bin/etcd   --name node02   --cert-file=/etc/etcd/ssl/etcd.pem   --key-file=/etc/etcd/ssl/etcd-key.pem   --peer-cert-file=/etc/etcd/ssl/etcd.pem   --peer-key-file=/etc/etcd/ssl/etcd-key.pem   --trusted-ca-file=/etc/etcd/ssl/ca.pem   --peer-trusted-ca-file=/etc/etcd/ssl/ca.pem   --initial-advertise-peer-urls https://192.168.175.101:2380   --listen-peer-urls https://192.168.175.101:2380   --listen-client-urls https://192.168.175.101:2379,http://127.0.0.1:2379   --advertise-client-urls https://192.168.175.101:2379   --initial-cluster-token etcd-cluster-0   --initial-cluster node01=https://192.168.175.96:2380,node02=https://192.168.175.101:2380,node03=https://192.168.175.57:2380   --initial-cluster-state new   --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
node03's etcd.service
cat <<EOF >/etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/bin/etcd   --name node03   --cert-file=/etc/etcd/ssl/etcd.pem   --key-file=/etc/etcd/ssl/etcd-key.pem   --peer-cert-file=/etc/etcd/ssl/etcd.pem   --peer-key-file=/etc/etcd/ssl/etcd-key.pem   --trusted-ca-file=/etc/etcd/ssl/ca.pem   --peer-trusted-ca-file=/etc/etcd/ssl/ca.pem   --initial-advertise-peer-urls https://192.168.175.57:2380   --listen-peer-urls https://192.168.175.57:2380   --listen-client-urls https://192.168.175.57:2379,http://127.0.0.1:2379   --advertise-client-urls https://192.168.175.57:2379   --initial-cluster-token etcd-cluster-0 --initial-cluster node01=https://192.168.175.96:2380,node02=https://192.168.175.101:2380,node03=https://192.168.175.57:2380   --initial-cluster-state new   --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
Enable it on boot (an etcd cluster needs at least 2 nodes before it can start; if startup fails, check the messages log):
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd

Note: running systemctl enable etcd may report an error like the following:
root@node01:~/ssl# systemctl enable etcd
Synchronizing state of etcd.service with SysV init with /lib/systemd/systemd-sysv-install…
Executing /lib/systemd/systemd-sysv-install enable etcd
Failed to execute operation: File exists
What the error means:
the etcd service was already enabled once under SysV, so the symlink already exists and cannot be created again. The fix is to disable the old etcd service first and then enable etcd.service again:

 systemctl disable etcd.service
 systemctl enable etcd.service
 systemctl enable etcd

After that the command completes normally:
root@node01:~/ssl# systemctl enable etcd
Synchronizing state of etcd.service with SysV init with /lib/systemd/systemd-sysv-install…
Executing /lib/systemd/systemd-sysv-install enable etcd

Run the following check on all three etcd nodes (do the etcd upgrade described below first, then come back to this step).

Accessing etcd requires the certificates. The notes below were collected from various sources; if the steps below fail with Error: context deadline exceeded, use this part to troubleshoot:

Kubernetes now uses etcd v3, so you must supply the CA, key and cert; otherwise you will get Error: context deadline exceeded.

Without the --endpoints flag, etcdctl talks to 127.0.0.1:2379 by default; as soon as you pass --endpoints you must also provide the CA, key and cert.

[root@k8s-test2 ~]# etcdctl endpoint health 
127.0.0.1:2379 is healthy: successfully committed proposal: took = 939.097µs
 
[root@k8s-test2 ~]# etcdctl --endpoints=https://10.0.26.152:2379 endpoint health 
https://10.0.26.152:2379 is unhealthy: failed to connect: context deadline exceeded
 
[root@k8s-test2 ~]# etcdctl --endpoints=https://10.0.26.152:2379 --cacert=/etc/k8s/ssl/etcd-root-ca.pem --key=/etc/k8s/ssl/etcd-key.pem  --cert=/etc/k8s/ssl/etcd.pem  endpoint health 
https://10.0.26.152:2379 is healthy: successfully committed proposal: took = 1.001505ms

Below are the test commands that use the certificates; note that the v2 and v3 command formats differ.

The v2 command (this will fail with Incorrect Usage; if it does, upgrade to v3 first and use the v3 format):
etcdctl --endpoints=https://192.168.175.96:2379,https://192.168.175.101:2379,https://192.168.175.57:2379 \
  --ca-file=/etc/etcd/ssl/ca.pem \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem  cluster-health

If you run the v2 command against the v3 etcdctl, you get an error such as unknown usage.

The v3 command: upgrade etcd to v3 and test with the certificates; run this test on all three etcd nodes.
etcdctl --endpoints=https://192.168.175.96:2379,https://192.168.175.101:2379,https://192.168.175.57:2379  --cacert=/etc/etcd/ssl/ca.pem --key=/etc/etcd/ssl/etcd-key.pem  --cert=/etc/etcd/ssl/etcd.pem  endpoint health 
A successful test prints:
https://192.168.175.101:2379 is healthy: successfully committed proposal: took = 1.912224ms
https://192.168.175.57:2379 is healthy: successfully committed proposal: took = 2.274874ms
https://192.168.175.96:2379 is healthy: successfully committed proposal: took = 2.293437ms
etcd upgrade (the apt-installed version is v2.2.5, while Kubernetes v1.10 requires at least 3.1)
Download the latest release from the official site:
wget https://github.com/coreos/etcd/releases/download/v3.3.5/etcd-v3.3.5-linux-amd64.tar.gz
tar zxf etcd-v3.3.5-linux-amd64.tar.gz
Before running the commands below, change into the target directory and back up the old binaries of the same name with mv:
mv /usr/bin/etcd /usr/bin/etcd_v2
mv /usr/bin/etcdctl /usr/bin/etcdctl_v2
cp etcd-v3.3.5-linux-amd64/etcd /usr/bin/etcd
cp etcd-v3.3.5-linux-amd64/etcdctl /usr/bin/etcdctl
Add the following line to /etc/profile, then reboot the server:
export ETCDCTL_API=3

source /etc/profile

systemctl restart etcd

Restart the etcd service and check the cluster status
(the cluster state can be checked on any of the machines):
root@k8s-n2:~/k8s# etcdctl member list
aa76456e260f7bd1, started, node02, https://192.168.175.101:2380, https://192.168.175.101:2379
d12950b45efa96da, started, node03, https://192.168.175.57:2380, https://192.168.175.57:2379
e598ba1c84356928, started, node01, https://192.168.175.96:2380, https://192.168.175.96:2379
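
As an optional smoke test (the key name /test/hello is arbitrary), write a value through one TLS endpoint and read it back through another to confirm the cluster is really serving requests:
etcdctl --endpoints=https://192.168.175.96:2379 \
  --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem \
  put /test/hello world
etcdctl --endpoints=https://192.168.175.101:2379 \
  --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem \
  get /test/hello
# expect the second command to print /test/hello followed by world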

4. Install docker (required on all three machines)

curl -fsSL "https://get.docker.com/" | sh

(Alternatively, apt-get install docker.io works too.)

4.1 Point docker at the Alibaba Cloud registry mirror by editing /lib/systemd/system/docker.service

ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --registry-mirror=https://ms3cfraz.mirror.aliyuncs.com
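
Editing the unit file is what this setup does; as a side note, a commonly used alternative is to put the mirror into /etc/docker/daemon.json (the mirror URL below is the same one assumed above), which survives package upgrades. Also be aware that -H tcp://0.0.0.0:2375 in the line above exposes the Docker API without authentication, so only keep it on a trusted network.
cat <<EOF > /etc/docker/daemon.json
{
  "registry-mirrors": ["https://ms3cfraz.mirror.aliyuncs.com"]
}
EOF
# then reload and restart docker as shown in 4.2 below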

4.2 Start docker and enable it on boot

systemctl daemon-reload
systemctl restart docker
systemctl enable docker
systemctl status docker

5. Configure kubeadm

Modify the kubelet configuration file on all nodes:
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# add this line
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
# add this line too
Environment="KUBELET_EXTRA_ARGS=--v=2 --fail-swap-on=false --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/k8sth/pause-amd64:3.0"
After editing the file, be sure to reload the configuration on every node:
systemctl daemon-reload
systemctl enable kubelet
systemctl restart kubelet
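
The --cgroup-driver value above has to match the driver docker actually uses, otherwise the kubelet will not start; a quick way to check:
docker info 2>/dev/null | grep -i "cgroup driver"
# if this prints "Cgroup Driver: systemd", either reconfigure docker to use cgroupfs or change
# KUBELET_CGROUP_ARGS to --cgroup-driver=systemd so both sides agree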

6. Initialize the cluster

6.1 Add the cluster init configuration file on node01, node02 and node03 (the file is identical on all three)

cat <<EOF > config.yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
etcd:
  endpoints:
  - https://192.168.175.96:2379
  - https://192.168.175.101:2379
  - https://192.168.175.57:2379
  caFile: /etc/etcd/ssl/ca.pem
  certFile: /etc/etcd/ssl/etcd.pem
  keyFile: /etc/etcd/ssl/etcd-key.pem
  dataDir: /var/lib/etcd
networking:
  podSubnet: 10.244.0.0/16
kubernetesVersion: 1.10.0
api:
  advertiseAddress: "192.168.175.120"
token: "b99a00.a144ef80536d4344"
tokenTTL: "0s"
apiServerCertSANs:
- node01
- node02
- node03
- 192.168.175.96
- 192.168.175.101
- 192.168.175.57
- 192.168.175.99
- 192.168.175.120
featureGates:
  CoreDNS: true
imageRepository: "registry.cn-hangzhou.aliyuncs.com/k8sth"
EOF

6.2 Initialize the cluster on node01 first

The config file defines the pod network as 10.244.0.0/16.
kubeadm init --help shows that the default service network is 10.96.0.0/12:
kubeadm init --help
Look for this line:
--service-cidr string                  Use alternative range of IP address for service VIPs. (default "10.96.0.0/12")
The default DNS address in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf is cluster-dns=10.96.0.10.
kubeadm init --config config.yaml

Note: if you installed kubectl, kubeadm and kubelet without pinning the version, running the command above fails with the following error:

kubeadm init --config config.yaml
your configuration file uses an old API spec: "kubeadm.k8s.io/v1alpha1". Please use kubeadm v1.11 instead and run 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.

The reason: the init config file in this write-up uses apiVersion: kubeadm.k8s.io/v1alpha1, while in newer releases the API has moved on to v1beta1.

Workaround one
Getting from v1alpha1 to v1beta1 takes several version hops. The upgrade path is:
v1alpha1 -> v1alpha2 -> v1alpha3 -> v1beta1
The corresponding official upgrade documentation is at:
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1
Migrating from older kubeadm config versions:
Use the "kubeadm config migrate" command of kubeadm v1.13.x to convert a v1alpha3 config file to v1beta1 (converting from older kubeadm config file versions to v1beta1 requires earlier kubeadm releases:

kubeadm v1.11 should be used to migrate v1alpha1 to v1alpha2; kubeadm v1.12 should be used to translate v1alpha2 to v1alpha3)

This means downloading several different kubeadm versions, which is quite tedious, but the approach works if you step through the versions one by one.

Workaround two
Remove the three Kubernetes packages and reinstall version 1.10.13.
The three required steps are:
1. Remove the packages:
apt-get --purge remove kubelet
apt-get --purge remove kubeadm
apt-get --purge remove kubectl
2. Look up the available historical versions of kubeadm:
apt-cache madison kubeadm
3. Install the specific historical version:
The format is: apt-get install <package>=<version>
For example: apt-get install kubeadm=1.10.13-00
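
Once the right versions are installed, an optional safeguard is to hold the packages so a routine apt-get upgrade cannot pull in a newer, incompatible release:
apt-mark hold kubelet kubeadm kubectl
# undo later with: apt-mark unhold kubelet kubeadm kubectl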

How to recover after a failed init:
kubeadm reset
# or
rm -rf /etc/kubernetes/*.conf
rm -rf /etc/kubernetes/manifests/*.yaml
docker ps -a |awk '{print $1}' |xargs docker rm -f
systemctl  stop kubelet
A successful init ends with output like this:
Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.175.120:6443 --token b99a00.a144ef80536d4344 --discovery-token-ca-cert-hash sha256:a2551d730098fe59c8f0f9d77e07ab9e1ceb2d205678e4780826e8b7cc32aacf
6.3 Run the following commands on node01
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
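
At this point kubectl on node01 should be able to reach the new control plane; a quick optional check (the exact output will differ):
kubectl cluster-info
kubectl get componentstatuses   # scheduler, controller-manager and the etcd members should report Healthy
kubectl get nodes               # node01 may stay NotReady until the pod network is deployed in 6.5
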
6.4 Distribute the certificates and key files generated by kubeadm to node02 and node03
scp -r /etc/kubernetes/pki  node03:/etc/kubernetes/
scp -r /etc/kubernetes/pki  node02:/etc/kubernetes/

Once this is done, the cluster init can be run on the remaining two machines:
kubeadm init --config config.yaml
Follow the init output shown above; once it succeeds on those nodes as well, continue.

6.5 Deploy the flannel network (only needs to be done on node01)
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# image version: quay.io/coreos/flannel:v0.10.0-amd64
kubectl create -f  kube-flannel.yml

Run kubectl get node; the nodes start out as NotReady, but after waiting a few minutes:

[root@node01 ~]# kubectl   get node
NAME      STATUS    ROLES     AGE       VERSION
node01    Ready     master    50m       v1.10.3
node02    Ready     master    44m       v1.10.3
node03    Ready     master    43m       v1.10.3
[root@node01 ~]# kubectl   get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE
kube-system   coredns-7997f8864c-4x7mg         1/1       Running   0          29m
kube-system   coredns-7997f8864c-zfcck         1/1       Running   0          29m
kube-system   kube-apiserver-node01            1/1       Running   0          29m
kube-system   kube-controller-manager-node01   1/1       Running   0          30m
kube-system   kube-flannel-ds-hw2xb            1/1       Running   0          1m
kube-system   kube-proxy-s265b                 1/1       Running   0          29m
kube-system   kube-scheduler-node01            1/1       Running   0          30m

Once all of the steps above are complete, the basic Kubernetes installation is essentially done.
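
As a final optional smoke test of the pod network and CoreDNS (busybox:1.28 is my choice of image; later busybox tags have a broken nslookup):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
# a healthy cluster resolves kubernetes.default to the service IP 10.96.0.1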

6.6 Deploy the dashboard (a graphical cluster management tool; installing it is optional)
The content of kubernetes-dashboard.yaml is as follows:
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Configuration to deploy release version of the Dashboard UI compatible with
# Kubernetes 1.8.
#
# Example usage: kubectl create -f <this_file>

# ------------------- Dashboard Secret ------------------- #

apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-certs
  namespace: kube-system
type: Opaque

---
# ------------------- Dashboard Service Account ------------------- #

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system

---
# ------------------- Dashboard Role & Role Binding ------------------- #

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubernetes-dashboard-minimal
  namespace: kube-system
rules:
  # Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret.
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create"]
  # Allow Dashboard to create 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
  # Allow Dashboard to get, update and delete Dashboard exclusive secrets.
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"]
  verbs: ["get", "update", "delete"]
  # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kubernetes-dashboard-settings"]
  verbs: ["get", "update"]
  # Allow Dashboard to get metrics from heapster.
- apiGroups: [""]
  resources: ["services"]
  resourceNames: ["heapster"]
  verbs: ["proxy"]
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["heapster", "http:heapster:", "https:heapster:"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubernetes-dashboard-minimal
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubernetes-dashboard-minimal
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

---
# ------------------- Dashboard Deployment ------------------- #

kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kubernetes-dashboard
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      containers:
      - name: kubernetes-dashboard
        image: registry.cn-hangzhou.aliyuncs.com/k8sth/kubernetes-dashboard-amd64:v1.8.3
        ports:
        - containerPort: 8443
          protocol: TCP
        args:
          - --auto-generate-certificates
          # Uncomment the following line to manually specify Kubernetes API server Host
          # If not specified, Dashboard will attempt to auto discover the API server and connect
          # to it. Uncomment only if the default does not work.
          # - --apiserver-host=http://my-address:port
        volumeMounts:
        - name: kubernetes-dashboard-certs
          mountPath: /certs
          # Create on-disk volume to store exec logs
        - mountPath: /tmp
          name: tmp-volume
        livenessProbe:
          httpGet:
            scheme: HTTPS
            path: /
            port: 8443
          initialDelaySeconds: 30
          timeoutSeconds: 30
      volumes:
      - name: kubernetes-dashboard-certs
        secret:
          secretName: kubernetes-dashboard-certs
      - name: tmp-volume
        emptyDir: {}
      serviceAccountName: kubernetes-dashboard
      # Comment the following tolerations if Dashboard must not be deployed on master
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

---
# ------------------- Dashboard Service ------------------- #

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30000
  selector:
    k8s-app: kubernetes-dashboard

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
Deploy it:
kubectl create -f kubernetes-dashboard.yaml
Get the token and log in with it:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
Open the dashboard in Firefox and enter the token to log in:
https://192.168.175.96:30000/#!/login

6.7 Run the init on node02 and node03 as well

kubeadm init --config config.yaml
# the init output is exactly the same as on node01
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

6.8 Check the node information

[root@node01 ~]# kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
node01    Ready     master    5h        v1.10.3
node02    Ready     master    2h        v1.10.3
node03    Ready     master    1h        v1.10.3
[root@node01 ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   coredns-7997f8864c-5bvlg                1/1       Running   0          6m        10.244.1.2       node02
kube-system   coredns-7997f8864c-xbq2j                1/1       Running   0          6m        10.244.2.2       node03
kube-system   kube-apiserver-node01                   1/1       Running   3          5m        192.168.175.96   node01
kube-system   kube-apiserver-node02                   1/1       Running   0          1h        192.168.175.101   node02
kube-system   kube-apiserver-node03                   1/1       Running   0          1h        192.168.175.57   node03
kube-system   kube-controller-manager-node01          1/1       Running   3          5m        192.168.175.96   node01
kube-system   kube-controller-manager-node02          1/1       Running   0          1h        192.168.175.101   node02
kube-system   kube-controller-manager-node03          1/1       Running   1          1h        192.168.175.57   node03
kube-system   kube-flannel-ds-gwql9                   1/1       Running   1          1h        192.168.175.96   node01
kube-system   kube-flannel-ds-l8bfs                   1/1       Running   1          1h        192.168.175.101   node02
kube-system   kube-flannel-ds-xw5bv                   1/1       Running   1          1h        192.168.175.57   node03
kube-system   kube-proxy-cwlhw                        1/1       Running   0          1h        192.168.175.57   node03
kube-system   kube-proxy-jz9mk                        1/1       Running   3          5h        192.168.175.96   node01
kube-system   kube-proxy-zdbtc                        1/1       Running   0          2h        192.168.175.101   node02
kube-system   kube-scheduler-node01                   1/1       Running   3          5m        192.168.175.96   node01
kube-system   kube-scheduler-node02                   1/1       Running   0          1h        192.168.175.101   node02
kube-system   kube-scheduler-node03                   1/1       Running   1          1h        192.168.175.57   node03
kube-system   kubernetes-dashboard-7b44ff9b77-chdjp   1/1       Running   0          6m        10.244.2.3       node03

6.9 Allow the masters to run pods too (by default the master does not run pods)

kubectl taint nodes --all node-role.kubernetes.io/master-
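
To confirm the taint is really gone (just an illustrative check):
kubectl describe nodes | grep -i taints
# every node should now show Taints: <none>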

7. Add node04 to the cluster

Run the following command on node04 to add it to the cluster (node04 is an extra worker machine, prepared with docker and the kubelet/kubeadm packages as in the earlier steps):
root@node04:~# kubeadm join 192.168.175.120:6443 --token b99a00.a144ef80536d4344 --discovery-token-ca-cert-hash sha256:a2551d730098fe59c8f0f9d77e07ab9e1ceb2d205678e4780826e8b7cc32aacf
[preflight] Running pre-flight checks.
	[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
	[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "192.168.175.120:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.175.120:6443"
[discovery] Requesting info from "https://192.168.175.120:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.175.120:6443"
[discovery] Successfully established connection with API Server "192.168.175.120:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.
[root@node01 ~]# kubectl get node
NAME      STATUS    ROLES     AGE       VERSION
node01    Ready     master    45m       v1.10.0
node02    Ready     master    15m       v1.10.0
node03    Ready     master    14m       v1.10.0
node04    Ready     <none>    13m       v1.10.0
