Kubernetes cluster installation notes

Preparation

Two CentOS 7 hosts

192.168.209.102  node102
192.168.209.103  node103  (master)

Setting the hostname

# Set the hostname
hostnamectl set-hostname node102     # run on 192.168.209.102
hostnamectl set-hostname node103     # run on 192.168.209.103
hostnamectl --static                 # check the result
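
Because the rest of these notes refer to the nodes by name, it can help to make both names resolvable on every host. The following is a minimal sketch, assuming name resolution goes through /etc/hosts; the helper name add_host_entry is hypothetical and not part of the original notes.

```shell
# add_host_entry IP NAME [FILE]
# Appends "IP  NAME" to FILE (default /etc/hosts) unless NAME is already listed.
add_host_entry() {
    ip="$1"
    name="$2"
    file="${3:-/etc/hosts}"
    # -w matches NAME as a whole word, -q keeps grep silent
    if grep -qw -- "$name" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s  %s\n' "$ip" "$name" >> "$file"
}

# Usage (as root, on every node):
#   add_host_entry 192.168.209.102 node102
#   add_host_entry 192.168.209.103 node103
```

Running the helper twice with the same name leaves the file unchanged, so it is safe to repeat on a host that is already set up.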

Software versions

docker 18.09.0
kubelet-1.15.4 kubeadm-1.15.4 kubectl-1.15.4 
helm 2.14.0

Unless stated otherwise, every step must be run on all nodes (both the k8s master and the k8s node)

Environment

Disable the firewall

systemctl stop firewalld.service
systemctl disable firewalld.service

If you would rather not turn the firewall off, the ports Kubernetes needs open are listed here: https://kubernetes.io/docs/setup/independent/install-kubeadm/#check-required-ports
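
If firewalld stays enabled, the required ports can be opened with firewall-cmd. The sketch below only prints the commands so they can be reviewed before running them. The TCP port list follows the kubeadm documentation for this Kubernetes generation; UDP 8472 is an assumption that flannel runs with its default VXLAN backend. The helper name print_firewall_cmds is hypothetical.

```shell
# print_firewall_cmds ROLE   (ROLE is "master" or "worker")
# Prints the firewall-cmd invocations instead of running them.
print_firewall_cmds() {
    case "$1" in
        master) ports="6443/tcp 2379-2380/tcp 10250/tcp 10251/tcp 10252/tcp" ;;
        worker) ports="10250/tcp 30000-32767/tcp" ;;
        *) echo "usage: print_firewall_cmds master|worker" >&2; return 1 ;;
    esac
    # Assumption: flannel uses its default VXLAN backend (UDP 8472)
    ports="$ports 8472/udp"
    for p in $ports; do
        echo "firewall-cmd --permanent --add-port=$p"
    done
    echo "firewall-cmd --reload"
}

# Usage: review the output first, then pipe it to a shell as root:
#   print_firewall_cmds master
#   print_firewall_cmds master | sh
```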

Disable swap

Since Kubernetes 1.8 the kubelet, by default, refuses to start while swap is enabled

Turn off the swap partition

swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab

In other words, this comments out the swap line in /etc/fstab, for example: /dev/mapper/centos-swap swap swap defaults
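
To see what the sed expression does before touching the real file, it can be run against a scratch copy. This is a small sketch; the two fstab entries are illustrative sample data, not taken from a real host, and -i is dropped so that sed prints the result instead of editing in place.

```shell
# Work on a scratch file so the real /etc/fstab is not modified.
# The sample entries below are made up for illustration.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# Same expression as above, without -i: prefix every line containing " swap " with '#'
sed '/ swap / s/^/#/' "$sample"
```

Only the swap entry gains a leading '#'; the root filesystem entry is printed unchanged.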

Disable SELinux

Set SELinux to permissive mode (which effectively disables it)

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

Install Docker

Configure the Docker yum repository

curl -o /etc/yum.repos.d/Docker-ce-Ali.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo


yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum makecache fast

List the available docker-ce versions
yum list docker-ce --showduplicates | sort -r


Install a specific Docker version
yum install docker-ce-18.09.0-3.el7 -y

Once the installation has finished, start Docker

systemctl start docker

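If you are not starting from a fresh environment, it is best to clean Docker up first, so that the following all hold

docker ps           # empty
docker ps -a        # empty
docker network ls   # only the three default networks
docker volume ls    # empty
docker service ls   # on a swarm manager node: empty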

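Adjust the iptables-related kernel settings

Some users on CentOS 7 have reported traffic being routed incorrectly because iptables was bypassed. Create /etc/sysctl.d/k8s.conf with the following content

cat <<EOF > /etc/sysctl.d/k8s.conf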
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

Apply the configuration

# Apply the configuration
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
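
Note that modprobe loads br_netfilter only until the next reboot. On systemd hosts such as CentOS 7, files under /etc/modules-load.d are read at boot, so the module can be made persistent as sketched below. The helper name persist_module is hypothetical and not part of the original notes.

```shell
# persist_module NAME [DIR]
# Writes DIR/NAME.conf (default DIR: /etc/modules-load.d) so that
# systemd-modules-load loads the module at boot.
persist_module() {
    mod="$1"
    dir="${2:-/etc/modules-load.d}"
    mkdir -p "$dir" && printf '%s\n' "$mod" > "$dir/$mod.conf"
}

# Usage (as root):
#   persist_module br_netfilter
```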

Load the IPVS modules

cat > /etc/sysconfig/modules/ipvs.modules <
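
The command above is cut off in the source, so the original body of ipvs.modules is unknown. Scripts of this kind usually just load the IPVS kernel modules. The sketch below is offered under that assumption: the module list is a common choice on CentOS 7's 3.10 kernel, not the author's file, and the helper name write_ipvs_modules is hypothetical.

```shell
# write_ipvs_modules [FILE]
# Writes a module-loading script to FILE (default /etc/sysconfig/modules/ipvs.modules).
# Assumption: this module list is a common choice for kube-proxy's IPVS mode;
# it is not recovered from the original notes.
write_ipvs_modules() {
    file="${1:-/etc/sysconfig/modules/ipvs.modules}"
    cat > "$file" <<'EOF'
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
    chmod 755 "$file"
}

# Usage (as root):
#   write_ipvs_modules && bash /etc/sysconfig/modules/ipvs.modules
#   lsmod | grep -e ip_vs -e nf_conntrack_ipv4
```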

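Getting started

Deploying Kubernetes with kubeadm

Install kubeadm and kubelet. Note: when running yum install, be sure to check which Kubernetes version gets installed, because it is needed later for kubeadm init

cat <<EOF > /etc/yum.repos.d/kubernetes.repo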
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF


# Install. Note: check the version here, because the version passed to kubeadm init must not be lower than the installed Kubernetes version
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# To install a specific version, use this command instead
yum install kubelet-1.15.4 kubeadm-1.15.4 kubectl-1.15.4 --disableexcludes=kubernetes

# Start kubelet
systemctl enable kubelet.service && systemctl start kubelet.service
# Check the kubelet status
[root@node103 ~]#  systemctl status kubelet.service
# Look at the error messages
[root@node103 ~]#  journalctl -xefu kubelet

After starting kubelet.service, its status shows that it is not running. The cause is that /var/lib/kubelet/config.yaml does not exist yet. This can be ignored for now, because kubeadm init creates that file

Master node

kubeadm init \
--apiserver-advertise-address=192.168.209.103 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.15.4 \
--pod-network-cidr=10.244.0.0/16 \
--token-ttl 0

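--apiserver-advertise-address is the IP address of the master node
--image-repository is set to the Alibaba Cloud mirror, in case some images cannot be pulled from the default registry
--kubernetes-version pins the version and skips version detection. The default is stable-1, which makes kubeadm download the latest version number from https://storage.googleapis.com/kubernetes-release/release/stable-1.txt; specifying the version avoids that network request. Once more: it must match the installed Kubernetes version
--pod-network-cidr=10.244.0.0/16 is the pod network range that flannel's default manifest expects
--token-ttl 0 makes the bootstrap token never expire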
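
Before running kubeadm init, the control-plane images can be listed and pulled ahead of time, which makes mirror or network problems show up early. This is a sketch using the kubeadm config images subcommands with the same version and repository as above; the wrapper name prepull_images is hypothetical.

```shell
# prepull_images VERSION REPO
# Lists, then pulls, the control-plane images for VERSION from REPO.
prepull_images() {
    version="$1"
    repo="$2"
    kubeadm config images list --kubernetes-version "$version" --image-repository "$repo" &&
    kubeadm config images pull --kubernetes-version "$version" --image-repository "$repo"
}

# Usage (on the master, as root):
#   prepull_images v1.15.4 registry.aliyuncs.com/google_containers
```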

[init] Using Kubernetes version: v1.15.4
[preflight] Running pre-flight checks
......
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.209.103:6443 --token 9yh5wi.15p63wyw19kkzxsl \
    --discovery-token-ca-cert-hash sha256:e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362 

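After a successful init, the output explains the configuration that is needed before the cluster can be used, and it also prints a bootstrap token together with the command for joining nodes

# A regular user needs to run the following to use k8s

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config


# root can simply run
export KUBECONFIG=/etc/kubernetes/admin.conf

# Either of the two options above is enough. I am working as root here, so I ran the export directly
export KUBECONFIG=/etc/kubernetes/admin.conf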
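
An exported variable only lasts for the current shell session. To keep it across logins, the export line can be appended to the shell profile once. A minimal sketch, assuming root uses bash with ~/.bash_profile; the helper name persist_kubeconfig is hypothetical.

```shell
# persist_kubeconfig [PROFILE]
# Appends the export line to PROFILE (default ~/.bash_profile) if it is not there yet.
persist_kubeconfig() {
    profile="${1:-$HOME/.bash_profile}"
    line='export KUBECONFIG=/etc/kubernetes/admin.conf'
    # -x: whole-line match, -F: fixed string
    grep -qxF -- "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Usage (as root):
#   persist_kubeconfig
```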

Checking kubelet again, it is now in the running state, so it started successfully

[root@node103 ~]#  systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sat 2019-10-12 14:24:49 CST; 47min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 2092 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─2092 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf -...

Oct 12 14:25:11 node103 kubelet[2092]: E1012 14:25:11.090966    2092 kuberuntime_manager.go:692] createPodSandbox for pod "coredns-bccdc95cf...
Oct 12 14:25:11 node103 kubelet[2092]: E1012 14:25:11.091062    2092 pod_workers.go:190] Error syncing pod a6dd2502-9bcf-426e-a5a8-5...a8-53caf
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.522357    2092 docker_sandbox.go:384] failed to read pod IP from plugin/docker: Networ...
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.536992    2092 pod_container_deletor.go:75] Container "40bb67673e4888b1b69b9f4...ntainers
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.547593    2092 cni.go:309] CNI failed to retrieve network namespace path: cann...25eb6a8"
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.548785    2092 docker_sandbox.go:384] failed to read pod IP from plugin/docker: Networ...
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.562485    2092 pod_container_deletor.go:75] Container "e9e61a73a517720c96508bd...ntainers
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.566360    2092 cni.go:309] CNI failed to retrieve network namespace path: cann...bb38bc4"
Oct 12 14:25:12 node103 kubelet[2092]: W1012 14:25:12.806694    2092 pod_container_deletor.go:75] Container "cfb51541812189c587dbdf8...ntainers
Oct 12 14:25:12 node103 kubelet[2092]: W1012 14:25:12.843299    2092 pod_container_deletor.go:75] Container "27626aad865161213e9e2e1...ntainers
Hint: Some lines were ellipsized, use -l to show in full.

Check the status

Confirm that every component is Healthy

[root@node103 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok                  
scheduler            Healthy   ok                  
etcd-0               Healthy   {"health":"true"} 

Check the node status

[root@node103 ~]# kubectl get node
NAME      STATUS    ROLES    AGE     VERSION
node103   NotReady  master   18h     v1.15.4

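Install the pod network (flannel)

A Kubernetes cluster needs a pod network add-on to work; without one, pods cannot communicate with each other. Kubernetes supports several options; flannel is used here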

[root@node103 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created

Check the pod status

[root@node103 ~]# kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE   IP                NODE      NOMINATED NODE   READINESS GATES
kube-system   coredns-bccdc95cf-4x6dk           1/1     Running   10         13h   10.244.0.4        node103   <none>           <none>
kube-system   coredns-bccdc95cf-tjvv9           1/1     Running   10         13h   10.244.0.5        node103   <none>           <none>
kube-system   etcd-node103                      1/1     Running   3          13h   192.168.209.103   node103   <none>           <none>
kube-system   kube-apiserver-node103            1/1     Running   3          13h   192.168.209.103   node103   <none>           <none>
kube-system   kube-controller-manager-node103   1/1     Running   3          13h   192.168.209.103   node103   <none>           <none>
kube-system   kube-flannel-ds-amd64-qxchh       1/1     Running   1          13h   192.168.209.103   node103   <none>           <none>
kube-system   kube-proxy-kpbkx                  1/1     Running   3          13h   192.168.209.103   node103   <none>           <none>
kube-system   kube-scheduler-node103            1/1     Running   3          13h   192.168.209.103   node103   <none>           <none>

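It was not all smooth sailing: kube-flannel-ds-amd64-qxchh, coredns-bccdc95cf-tjvv9 and coredns-bccdc95cf-4x6dk were unhealthy at first, and sorting them out cost me almost a full day. Here is a brief summary of the problems I ran into and how I solved them.
The best way to track down an error is to follow the logs. The command for inspecting a pod's state and events is

kubectl describe pod coredns-bccdc95cf-4x6dk   --namespace=kube-system
  1. An image cannot be downloaded
    Configure the Alibaba Cloud mirror and run docker pull by hand. The image that failed to pull for me was this one
docker pull quay.io/coreos/flannel:v0.11.0-amd64
  2. The error "plugin flannel does not support config version" is reported

Edit the flannel CNI configuration so that it declares a cniVersion. The notes do not name the file; the JSON below matches flannel's CNI config, which normally lives in /etc/cni/net.d/10-flannel.conflist on each node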


{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

systemctl daemon-reload

  3. kubectl commands fail with an error
[root@node103 ~]#  kubectl get pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?

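Cause: a cluster installed with kubeadm has to be accessed with the kubernetes-admin credentials. Without a kubeconfig, kubectl falls back to localhost:8080.

Fix: point kubectl at admin.conf, for example with export KUBECONFIG=/etc/kubernetes/admin.conf as shown earlier (if the current node has no admin.conf, copy it over from the master node first)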
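
A small sketch of that fix, assuming admin.conf is kept at the standard kubeadm location and that the master is reachable over SSH as root@node103. The helper name use_admin_conf is hypothetical.

```shell
# use_admin_conf [FILE]
# Exports KUBECONFIG if FILE exists (default /etc/kubernetes/admin.conf),
# otherwise prints a hint on how to fetch it from the master.
use_admin_conf() {
    conf="${1:-/etc/kubernetes/admin.conf}"
    if [ ! -f "$conf" ]; then
        echo "missing $conf; copy it from the master first, e.g.:" >&2
        echo "  scp root@node103:/etc/kubernetes/admin.conf $conf" >&2
        return 1
    fi
    export KUBECONFIG="$conf"
}

# Usage:
#   use_admin_conf && kubectl get pod
```

The function has to be called directly in the current shell (not in a subshell), otherwise the exported variable is lost.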

  4. dial tcp 10.96.0.1:443: getsockopt: no route to host --- the Kubernetes (k8s) DNS service keeps restarting
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -tnat --flush
systemctl start kubelet
systemctl start docker

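Adding nodes

On the worker node node102, run the following command:

kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>

<master-ip>:<master-port>: in this article this is 192.168.209.103:6443
<token>: by default a token expires after 24 hours (the token above was created with --token-ttl 0, so it never expires). If yours has expired, create a new one (kubeadm token list shows the tokens, kubeadm token create creates one), as follows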

[root@node103 ~]#  kubeadm token list
TOKEN                     TTL         EXPIRES   USAGES                   DESCRIPTION                                                EXTRA GROUPS
9yh5wi.15p63wyw19kkzxsl   <forever>   <never>   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:boot

[root@node103 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362
# The join command, run on the worker node, is:
[root@node102 ~]# kubeadm join 192.168.209.103:6443 --token 9yh5wi.15p63wyw19kkzxsl    --discovery-token-ca-cert-hash sha256:e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362

Adding a node with kubeadm when the token has been forgotten or has expired, and how to look up the --discovery-token-ca-cert-hash value

# List the existing tokens
[root@]# kubeadm token list

# Create a new token
[root@]# kubeadm token create

# List the tokens again; there is now one more
[root@]# kubeadm token list


# Look up the --discovery-token-ca-cert-hash value
[root@]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null |    openssl dgst -sha256 -hex | sed 's/^.* //'

# Delete a token
[root@]# kubeadm token delete <token>
bootstrap token with id "hpvhe4" deleted
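
The pieces above can be combined into a ready-to-paste join line. The sketch below wraps the openssl pipeline from these notes and assembles the command; the function names ca_cert_hash and join_command are hypothetical. kubeadm can also print the complete line itself with kubeadm token create --print-join-command.

```shell
# ca_cert_hash [CERT]
# Prints the sha256 hash of the CA public key, as expected by
# --discovery-token-ca-cert-hash (default CERT: /etc/kubernetes/pki/ca.crt).
ca_cert_hash() {
    openssl x509 -pubkey -in "${1:-/etc/kubernetes/pki/ca.crt}" |
        openssl rsa -pubin -outform der 2>/dev/null |
        openssl dgst -sha256 -hex | sed 's/^.* //'
}

# join_command ENDPOINT TOKEN HASH
# Prints the kubeadm join line for the given values.
join_command() {
    echo "kubeadm join $1 --token $2 --discovery-token-ca-cert-hash sha256:$3"
}

# Usage (on the master, as root):
#   join_command 192.168.209.103:6443 "$(kubeadm token create)" "$(ca_cert_hash)"
```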

Once the join has finished, check the nodes

[root@node103 linux-amd64]# kubectl get nodes
NAME      STATUS   ROLES    AGE    VERSION
node102   Ready    <none>   173m   v1.15.4
node103   Ready    master   17h    v1.15.4
[root@node103 linux-amd64]# kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-bccdc95cf-4x6dk           1/1     Running   10         17h
coredns-bccdc95cf-tjvv9           1/1     Running   10         17h
etcd-node103                      1/1     Running   3          17h
kube-apiserver-node103            1/1     Running   3          17h
kube-controller-manager-node103   1/1     Running   3          17h
kube-flannel-ds-amd64-qxchh       1/1     Running   1          16h
kube-flannel-ds-amd64-txjq8       1/1     Running   0          174m
kube-proxy-4fgbn                  1/1     Running   0          174m
kube-proxy-kpbkx                  1/1     Running   3          17h
kube-scheduler-node103            1/1     Running   3          17h
tiller-deploy-7bbf796b9c-f4sws    1/1     Running   0          2m51s

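Removing a node

After a node has been deleted, it must run kubeadm reset before it can rejoin the cluster with kubeadm join
[root@node103 ~]# kubectl delete node node102
--- node102 is the node name. This is of course not the only way to remove a node; I will not list them all here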
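
A gentler variant is to drain the node before deleting it, so that its workloads are evicted first. This is a sketch, not the procedure used in these notes; the wrapper name remove_node is hypothetical, and --delete-local-data is the flag name used by kubectl of this generation (1.15).

```shell
# remove_node NAME
# Evicts the pods from the node, then removes it from the cluster.
remove_node() {
    node="$1"
    # DaemonSet pods (flannel, kube-proxy) cannot be evicted and are skipped
    kubectl drain "$node" --ignore-daemonsets --delete-local-data &&
    kubectl delete node "$node"
}

# Usage (on the master):
#   remove_node node102
# Then, on node102 itself, before joining again:
#   kubeadm reset
```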

Install Helm

helm

Download the Helm release you want from Helm's GitHub page; this article uses helm-v2.14.0-linux-amd64.tar.gz

[root@node103 ~]# tar -zxvf helm-v2.14.0-linux-amd64.tar.gz 
[root@node103 ~]# mv linux-amd64/helm  /usr/bin

tiller

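Initialize and verify Helm; this installs the server-side component, Tiller, automatically. Note: because of network restrictions in mainland China, installing Tiller requires pulling the image gcr.io/kubernetes-helm/tiller:v2.14.0, which is very likely to fail. So the Alibaba Cloud mirror is used here to install Tiller.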

[root@node103 linux-amd64]# helm init --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts 
Adding local repo with URL: http://127.0.0.1:8879/charts 
$HELM_HOME has been configured at /root/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation

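Since Kubernetes 1.6 the API server has RBAC authorization enabled. The Tiller deployment does not define an authorized ServiceAccount, so its requests to the API server are denied. The following grants Tiller the permissions it needs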

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
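
As an alternative to patching the deployment afterwards, Helm 2 accepts a --service-account flag on helm init, so the ServiceAccount can be created first and passed in directly. This is a sketch under that assumption, reusing the image and chart repository from above; the wrapper name init_tiller_with_sa is hypothetical.

```shell
# init_tiller_with_sa
# Creates the tiller ServiceAccount and binding, then installs Tiller with it.
# Each step runs only if the previous one succeeded, so it stops if the
# ServiceAccount or binding already exists.
init_tiller_with_sa() {
    kubectl create serviceaccount --namespace kube-system tiller &&
    kubectl create clusterrolebinding tiller-cluster-rule \
        --clusterrole=cluster-admin --serviceaccount=kube-system:tiller &&
    helm init --service-account tiller --upgrade \
        -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 \
        --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
}

# Usage (on the master):
#   init_tiller_with_sa
```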

If the tiller pod cannot be found, run the init command again

helm init --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts

To remove the server side (Tiller), use one of the following commands

helm reset

helm reset -f

To also remove the directories and other data created by helm init, run helm reset --remove-helm-home

Problems

  1. Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 192.168.209.102:10250: connect: no route to host

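    Rebooting the virtual machine fixed it for me. It was probably an iptables problem; see the iptables flush fix for the "no route to host" problem above.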
