Deploying a K8s Cluster on Ubuntu on Alibaba Cloud

First, my environment and configuration: an Alibaba Cloud instance with 1 core and 2 GB RAM running Ubuntu 18.04 (2 cores are preferred, because the master has a CPU minimum; 1 core is also fine, since the check can be skipped later). The node is also 1 core / 2 GB.

Alright, let's get into it.

1. Update the system sources

If the mirror addresses that ship with the system point at servers abroad, downloads will be very slow. Open /etc/apt/sources.list, replace them with a domestic mirror, and refresh the package lists:

apt update

2. Upgrade the packages

Upgrade the system's software components to the latest stable versions:

apt upgrade
Reading package lists... Done
 Building dependency tree
 Reading state information... Done
 The following additional packages will be installed:
   libcurl4
 The following packages will be upgraded:
   curl libcurl4
 2 upgraded, 0 newly installed, 0 to remove and 46 not upgraded.
 Need to get 378 kB of archives.
 After this operation, 0 B of additional disk space will be used.
 Do you want to continue? [Y/n] y
 Ign:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
 Ign:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
 Err:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
   404  Not Found [IP: 100.100.2.148 80]
 Err:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
   404  Not Found [IP: 100.100.2.148 80]
 E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/curl_7.58.0-2ubuntu3.14_amd64.deb  404  Not Found [IP: 100.100.2.148 80]
 E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/libcurl4_7.58.0-2ubuntu3.14_amd64.deb  404  Not Found [IP: 100.100.2.148 80]
 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

If you skip the package-list update you'll hit the error above, so remember to run it. The message itself even suggests the fix: run apt-get update or try with --fix-missing.

3. Install Docker

(You can also install Docker following any other guide.)

 apt-get install docker.io

To have Docker start on boot, run:

systemctl enable docker
systemctl start docker

To configure a Docker registry mirror, open /etc/docker/daemon.json and add (or extend) the registry-mirrors key with https://registry.docker-cn.com; an Alibaba Cloud or Tencent Cloud mirror accelerator address works too.

Example

{
    "registry-mirrors": [
        "https://registry.docker-cn.com"
    ]
}
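Docker refuses to start at all if daemon.json is malformed, so it's worth validating the file before restarting the daemon. A minimal sketch, checked against a scratch copy (the /tmp path is only for illustration; point it at /etc/docker/daemon.json on the real host):

```shell
# Write the example config to a scratch file and check that it parses as JSON.
cat > /tmp/daemon.json.test <<'EOF'
{
    "registry-mirrors": [
        "https://registry.docker-cn.com"
    ]
}
EOF
# python3 -m json.tool exits non-zero on invalid JSON.
python3 -m json.tool /tmp/daemon.json.test > /dev/null && echo "valid JSON"
```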

Restart Docker for the configuration to take effect:

 sudo systemctl daemon-reload
 sudo systemctl restart docker

4. Install K8s

Disable swap

 # Temporarily turn off swap
 swapoff -a
 # Permanently disable swap (also comments out the swap entries in /etc/fstab)
 swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
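The sed expression above comments out any /etc/fstab line containing " swap ". A quick illustration of what it does, run against a scratch copy rather than the real fstab (the /tmp path and sample entries are made up for the demo):

```shell
# Build a sample fstab with one swap entry...
cat > /tmp/fstab.test <<'EOF'
UUID=abcd-1234 / ext4 defaults 0 1
/swapfile none swap sw 0 0
EOF
# ...and apply the same sed rule used above: only the swap line gets a '#'.
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.test
grep '^#' /tmp/fstab.test
```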

Let bridged IPv4 and IPv6 traffic pass through iptables:

cat >/etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

Run the following to install the https transport tools and the k8s packages.

apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated

#The latest version is installed by default; you can also pin a specific version
apt-get install -y kubelet=1.15.0-00 kubeadm=1.15.0-00 kubectl=1.15.0-00 --allow-unauthenticated
    
#Common commands
#Restart the kubelet service:
systemctl daemon-reload
systemctl restart kubelet
sudo systemctl restart kubelet.service
sudo systemctl daemon-reload
sudo systemctl stop kubelet
sudo systemctl enable kubelet
sudo systemctl start kubelet

Run the command below to check that everything works:

kubeadm version

#Sample output
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:37:34Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

If the install fails as shown below, the system's mirror sources don't carry the k8s packages.

E: Unable to locate package kubelet
E: Unable to locate package kubeadm
E: Unable to locate package kubectl

Open the /etc/apt/sources.list file and add this line:

deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main

Run the package-list update command (apt-get update) first, then re-run the K8s install command.

At this step you may run into this problem:

The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
Reading package lists... Done
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
E: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Just run the following (the key is the value after NO_PUBKEY; substitute your own):

 apt-key adv --keyserver keyserver.ubuntu.com --recv-keys  FEEA9169307EA071

The commands above installed kubelet, kubeadm, and kubectl: kubelet runs the k8s node services, kubectl is the k8s management client, and kubeadm is the deployment tool.

If this machine is only going to be a worker node, you can stop here.

To join another Alibaba Cloud instance to the cluster, just run the following on it (how this command is produced is explained below; read everything first, then come back to this step):

kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t         --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
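If this command has been lost, or the token has expired (bootstrap tokens are valid for 24 hours by default), a fresh join command can be printed from the master. A hedged sketch, to be run on the master node:

```shell
# Creates a new bootstrap token and prints the complete kubeadm join
# command for it, including the CA cert hash.
kubeadm token create --print-join-command
```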

An error you may run into:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

#Fix:
#Caused by docker and kubernetes using different cgroup drivers.
#Switch Docker to the systemd cgroup driver in its config file
#(note: this overwrites daemon.json, so merge in any existing keys):
cat > /etc/docker/daemon.json <<EOF
{
    "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker

5. Initialize the master

Run the command below to initialize; the Docker images needed are downloaded from the network automatically.

This command deploys the master (control-plane) node.

Run kubeadm version to check your version; the value in GitVersion, e.g. "v1.17.2", is the version number.

Run the following to initialize (remember to change the IP):

kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

--ignore-preflight-errors=NumCPU is for machines with only one CPU, e.g. a low-spec student server.

However, the images are hosted by Google, so the download will likely fail.

We can list the images to be pulled with kubeadm config images list, then pull them manually with Docker. This process is tedious and also requires renaming the images by hand afterwards.

Pull with docker pull {image name}.

Google isn't reachable, but DockerHub has the needed images mirrored.

The mirrorgooglecontainers repository mirrors the images, though sadly not always the latest ones. The google_containers namespace on Alibaba Cloud's registry should be mirroring the latest.
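As an aside, kubeadm can also be pointed straight at a mirror registry, which would avoid the manual pull-and-retag dance below entirely. A sketch, assuming the Alibaba Cloud mirror carries all the tags your kubeadm version wants (add back your own --apiserver-advertise-address as needed):

```shell
# --image-repository makes kubeadm pull every control-plane image from the
# given registry instead of k8s.gcr.io.
kubeadm init \
  --image-repository registry.aliyuncs.com/google_containers \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=NumCPU
```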

For example, if the following images are needed:

k8s.gcr.io/kube-apiserver:v1.22.2
k8s.gcr.io/kube-controller-manager:v1.22.2
k8s.gcr.io/kube-scheduler:v1.22.2
k8s.gcr.io/kube-proxy:v1.22.2
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns:1.8.4

pull the corresponding mirrors:

docker pull mirrorgooglecontainers/kube-apiserver:v1.22.2
docker pull mirrorgooglecontainers/kube-controller-manager:v1.22.2
docker pull mirrorgooglecontainers/kube-scheduler:v1.22.2
docker pull mirrorgooglecontainers/kube-proxy:v1.22.2
docker pull mirrorgooglecontainers/pause:3.5
docker pull mirrorgooglecontainers/etcd:3.5.0-0
docker pull coredns/coredns:1.8.4

Rename the images with docker tag {old-name:tag} {new-name:tag}.

Since various things can go wrong here, I'll give you a one-click script someone else wrote that completes this whole step in one go.

touch pullk8s.sh	# create the script file
nano pullk8s.sh		# edit it

Then paste in the following:

for  i  in  `kubeadm config images list`;  do
    imageName=${i#k8s.gcr.io/}
    docker pull registry.aliyuncs.com/google_containers/$imageName
    docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
    docker rmi registry.aliyuncs.com/google_containers/$imageName
done;
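The script relies on bash's `${i#k8s.gcr.io/}` expansion to strip the registry prefix from each image name so the Alibaba Cloud mirror path can be substituted. A standalone illustration:

```shell
# ${var#pattern} removes the shortest match of pattern from the front of var.
i="k8s.gcr.io/kube-apiserver:v1.22.2"
imageName=${i#k8s.gcr.io/}
echo "$imageName"                                          # kube-apiserver:v1.22.2
echo "registry.aliyuncs.com/google_containers/$imageName"  # the mirror pull path
```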

Save the file

Ctrl + O
Enter
Ctrl + X

Make the script executable

chmod +x pullk8s.sh

Run the script

sh pullk8s.sh

Then run docker images to check that all the required images are present.

root@ubuntu:~# docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.22.2             cba2a99699bd        2 weeks ago         116MB
k8s.gcr.io/kube-apiserver            v1.22.2             41ef50a5f06a        2 weeks ago         171MB
k8s.gcr.io/kube-controller-manager   v1.22.2             da5fd66c4068        2 weeks ago         161MB
k8s.gcr.io/kube-scheduler            v1.22.2             f52d4c527ef2        2 weeks ago         94.4MB
k8s.gcr.io/coredns                   1.8.4               70f311871ae1        3 months ago        41.6MB
k8s.gcr.io/etcd                      3.5.0-0             303ce5db0e90        3 months ago        288MB
k8s.gcr.io/pause                     3.5                 da86e6ba6ca1        2 years ago         742kB

The script may also fail; if so, pull manually:

Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/coredns/coredns, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Error response from daemon: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
Error: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
docker pull coredns/coredns:1.8.4

#Image rename command format:
docker tag old-image new-image
#here: docker tag coredns/coredns:1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4

Finally, run the initialization command from the beginning of this section.

kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

Because the public IP of an Alibaba Cloud ECS instance is not configured on a local interface, etcd cannot start, and kubeadm init fails with a "timeout" error.

Workaround:

1. Open two SSH sessions, i.e. two tabs in your SSH client: one runs the initialization, the other edits the config file while the initialization is running. It must be while it's running: every run of kubeadm init regenerates etcd's config file, so anything changed beforehand is simply overwritten and has no effect.

2. Run the "kubeadm init ..." command above; it will hang at:

Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed

3. While it hangs, the master is initializing, but etcd cannot start because its config file is wrong, so edit that file. The path is "/etc/kubernetes/manifests/etcd.yaml".

#Change these two lines
--listen-client-urls=https://127.0.0.1:2379,https://39.96.46.96:2379
--listen-peer-urls=https://39.96.46.96:2380
#After the change
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380

4. The lines to watch are --listen-client-urls and --listen-peer-urls: delete the public IP from --listen-client-urls, and change --listen-peer-urls to the local address.

After a short wait, the master node initialization completes.

A problem you may hit:

[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

#Run the following
swapoff -a && kubeadm reset  && systemctl daemon-reload && systemctl restart kubelet  && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
#then re-run the initialization and it works
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96

Another problem you may hit:

error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1

Just open IPAddress.com, search for https://k8s.gcr.io to get its IP, 142.250.113.82, and add it to the local hosts file. On Linux:

vim /etc/hosts (remember sudo if you're not root), and add the address and IP in the form below:
142.250.113.82  k8s.gcr.io
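The entry can also be appended without opening an editor. A sketch against a scratch copy of the hosts file (the /tmp path and stand-in contents are illustrative; on the real system target /etc/hosts, with sudo if needed):

```shell
HOSTS=/tmp/hosts.test
printf '127.0.0.1 localhost\n' > "$HOSTS"      # stand-in for the existing file
echo '142.250.113.82  k8s.gcr.io' >> "$HOSTS"  # append the mapping
grep k8s.gcr.io "$HOSTS"                        # confirm it was added
```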

If there's still a problem:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

This is caused by docker and kubernetes using different cgroup drivers.

Fix: switch Docker to the systemd cgroup driver in its config file (note this overwrites daemon.json, so merge in any existing keys such as registry-mirrors):

cat > /etc/docker/daemon.json <<EOF
{
    "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

Restart docker

systemctl restart docker

After that there may still be errors, but these are simple: just fix whatever gets reported. (Here are the ones I ran into.)

[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists

#Fix (a few similar ones omitted; the steps are the same)
cd /etc/kubernetes/manifests/
rm kube-apiserver.yaml

[ERROR Port-10250]: Port 10250 is in use

#Fix (a few similar ones omitted; the steps are the same)
lsof -i:10250
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet 22055 root   27u  IPv6 773301      0t0  TCP *:10250 (LISTEN)
kill -9 22055

[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

#Fix
cd /var/lib/etcd/
rm -r member/

Running the initialization command again will now succeed.

#Output on success
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t \
        --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060

On the master node, run the following:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
#check the master
kubectl get nodes
root@ubuntu:~# kubectl get nodes
NAME     STATUS      ROLES                  AGE   VERSION
master   NotReady    control-plane,master   26h   v1.22.2
node     Ready       <none>                 15s   v1.22.2

#add the network plugin (flannel)
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
#output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

kubectl get pods --all-namespaces
#If the output looks like this, with individual Pods stuck in Pending:
NAMESPACE     NAME                             READY   STATUS     RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         0/1     Pending    0             17m
kube-system   coredns-78fcd69978-qrx2c         0/1     Pending    0             17m
kube-system   etcd-ubuntu                      1/1     Running    0             17m
kube-system   kube-apiserver-ubuntu            1/1     Running    1 (19m ago)   17m
kube-system   kube-controller-manager-ubuntu   1/1     Running    2 (20m ago)   17m
kube-system   kube-flannel-ds-g97gm            0/1     Init:0/1   0             80s
kube-system   kube-proxy-f6ctf                 1/1     Running    0             17m
kube-system   kube-scheduler-ubuntu            1/1     Running    2 (19m ago)   17m

#Just add  185.199.111.133 raw.githubusercontent.com  to the hosts file and re-run the network plugin command; then:
NAMESPACE     NAME                             READY   STATUS    RESTARTS      AGE
kube-system   coredns-78fcd69978-fkkmh         1/1     Running   0             28m
kube-system   coredns-78fcd69978-qrx2c         1/1     Running   0             28m
kube-system   etcd-ubuntu                      1/1     Running   0             28m
kube-system   kube-apiserver-ubuntu            1/1     Running   1 (30m ago)   28m
kube-system   kube-controller-manager-ubuntu   1/1     Running   2 (31m ago)   28m
kube-system   kube-flannel-ds-g97gm            1/1     Running   0             11m
kube-system   kube-proxy-f6ctf                 1/1     Running   0             28m
kube-system   kube-scheduler-ubuntu            1/1     Running   2 (30m ago)   28m

kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   26h   v1.22.2
node     Ready    <none>                 15s   v1.22.2

At this point the node's status may still be NotReady:

kube-system   coredns-78fcd69978-fkkmh             0/1     Pending                 0              17m
kube-system   coredns-78fcd69978-qrx2c             0/1     Pending                 0              17m
kube-system   kube-flannel-ds-4ksws                0/1     Init:ErrImagePull       0              110m
kube-system   kube-flannel-ds-ls7qr                0/1     Init:ImagePullBackOff   0              4m35s

#coredns stays Pending; dig further into the kubelet.service logs
journalctl -f -u kubelet.service
#output
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.514075    8358 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="737fd9370b28e559d78073f038515969ec776e15a59a5688f6b230860d11184f"
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526    8358 cni.go:333] "CNI failed to retrieve network namespace path" err="cannot find network namespace for the terminated container \"737fd9370b28e559d78073f038515969ec776e15a59a5688f6b230860d11184f\""

#This looks like a network problem, presumably the flannel plugin, so re-apply it
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
#output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
#Check the node status: still NotReady. Check the logs
journalctl -f -u kubelet.service
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526    8358 cni.go:333] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526    8358 cni.go:333] Error validating CNI config &{cbr0  false [0xc0006a22a0 0xc0006a2360] [123 10 32 32 34 110 97 109 101 34 58 32 34 99 98 114 48 34 44 10 32 32 34 112 108 117 103 105 110 115 34 58 32 91 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 102 108 97 110 110 101 108 34 44 10 32 32 32 32 32 32 34 100 101 108 101 103 97 116 101 34 58 32 123 10 32 32 32 32 32 32 32 32 34 104 97 105 114 112 105 110 77 111 100 101 34 58 32 116 114 117 101 44 10 32 32 32 32 32 32 32 32 34 105 115 68 101 102 97 117 108 116 71 97 116 101 119 97 121 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 44 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 112 111 114 116 109 97 112 34 44 10 32 32 32 32 32 32 34 99 97 112 97 98 105 108 105 116 105 101 115 34 58 32 123 10 32 32 32 32 32 32 32 32 34 112 111 114 116 77 97 112 112 105 110 103 115 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 10 32 32 93 10 125 10]}: [plugin flannel does not support config version ""]
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526    8358 cni.go:333] Unable to update cni config: no valid networks found in /etc/cni/net.d
#The error is "plugin flannel does not support config version"; edit the config file:

vim /etc/cni/net.d/10-flannel.conflist
//add the cniVersion field
//the file should look like this
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
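Since kubelet rejects a CNI config it cannot parse ("no valid networks found in /etc/cni/net.d"), it's worth checking after editing that the file is valid JSON and actually carries the cniVersion field. A sketch against a scratch copy (the /tmp path is illustrative; on the node the real file is /etc/cni/net.d/10-flannel.conflist):

```shell
CONF=/tmp/10-flannel.conflist.test
cat > "$CONF" <<'EOF'
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    { "type": "flannel",
      "delegate": { "hairpinMode": true, "isDefaultGateway": true } },
    { "type": "portmap",
      "capabilities": { "portMappings": true } }
  ]
}
EOF
# Must parse as JSON and contain the version field flannel complained about.
python3 -m json.tool "$CONF" > /dev/null && grep -q '"cniVersion"' "$CONF" && echo "CNI config OK"
```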

#After the change, run systemctl daemon-reload
#Check the cluster status again: the master is now Ready
NAME         STATUS   ROLES                  AGE     VERSION
k8s-master   Ready    control-plane,master   24h     v1.22.2

That's it: the basic k8s cluster install is done. I don't currently need the dashboard, so I haven't installed it; I'll come back and update this once I do, haha.

-------------------------- Update

The dashboard part is written up; if you need it, see my other article.
