First, my environment and setup: an Alibaba Cloud instance with 1 CPU core and 2 GB RAM running Ubuntu 18.04 (2 cores are preferable because the master has a CPU requirement, but 1 core is fine too since the check can be ignored later), plus a node with the same 1-core/2 GB spec.
With that out of the way, let's get started.
If the system's default package mirrors are hosted overseas, downloads will be very slow. Edit /etc/apt/sources.list
and replace the entries with a domestic mirror.
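For reference, a typical Aliyun mirror list for Ubuntu 18.04 (bionic) looks roughly like the sketch below. The exact entries here are an assumption, not taken from this article — verify them against the mirror's own help page, and back up the original file first.

```shell
# Hypothetical Aliyun mirror entries for bionic; back up before overwriting
cp /etc/apt/sources.list /etc/apt/sources.list.bak
cat > /etc/apt/sources.list << 'EOF'
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
EOF
```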
Refresh the package index first, then upgrade the installed packages to their latest stable versions:
apt update
apt upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libcurl4
The following packages will be upgraded:
curl libcurl4
2 upgraded, 0 newly installed, 0 to remove and 46 not upgraded.
Need to get 378 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Ign:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
Ign:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
Err:1 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.14
404 Not Found [IP: 100.100.2.148 80]
Err:2 http://mirrors.cloud.aliyuncs.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.14
404 Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/curl_7.58.0-2ubuntu3.14_amd64.deb 404 Not Found [IP: 100.100.2.148 80]
E: Failed to fetch http://mirrors.cloud.aliyuncs.com/ubuntu/pool/main/c/curl/libcurl4_7.58.0-2ubuntu3.14_amd64.deb 404 Not Found [IP: 100.100.2.148 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
This is what happens when the package index is stale, so remember to run apt-get update first — the error output itself gives the hint: run apt-get update or try with --fix-missing.
Next, install Docker (other installation approaches work too):
apt-get install docker.io
To have Docker start on boot, run:
systemctl enable docker
systemctl start docker
To configure a Docker registry mirror, open the /etc/docker/daemon.json
file and add or edit the registry-mirrors key, adding https://registry.docker-cn.com
— an Alibaba Cloud or Tencent Cloud accelerator address works just as well.
Example:
{
"registry-mirrors": [
"https://registry.docker-cn.com"
]
}
Restart Docker for the change to take effect:
sudo systemctl daemon-reload
sudo systemctl restart docker
Disable swap
# Turn off swap for the current session
swapoff -a
# Turn it off permanently as well (comments out swap entries in /etc/fstab)
swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
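The sed rule above can be sanity-checked on a throwaway copy before touching the real /etc/fstab (the sample content below is made up for the demo):

```shell
# Build a fake fstab and apply the same sed expression to it
printf '%s\n' 'UUID=abcd / ext4 defaults 0 1' '/swapfile none swap sw 0 0' > /tmp/fstab.demo
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.demo
cat /tmp/fstab.demo
# The root line is untouched; the swap line is now commented out
```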
Allow bridged IPv4 and IPv6 traffic to be seen by iptables (load the br_netfilter module first so the settings take effect):
modprobe br_netfilter
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
Run the following to install the HTTPS transport tooling and the Kubernetes packages.
apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated
# The latest version is installed by default; a specific version can be pinned instead, e.g.:
apt-get install -y kubelet=1.15.0-00 kubeadm=1.15.0-00 kubectl=1.15.0-00 --allow-unauthenticated
# Handy commands
Restart the kubelet service:
systemctl daemon-reload
systemctl restart kubelet
# or step by step:
sudo systemctl stop kubelet
sudo systemctl enable kubelet
sudo systemctl start kubelet
Run the following to verify the install:
kubeadm version
# Sample output
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:37:34Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
If the install fails like this, the configured mirrors do not carry the Kubernetes packages:
E: Unable to locate package kubelet
E: Unable to locate package kubeadm
E: Unable to locate package kubectl
Open the /etc/apt/sources.list
file and add this line:
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
Then run the package-update command again before retrying the Kubernetes install.
At this step, apt-get update may fail because a GPG public key is missing:
Reading package lists... Done
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
E: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Just import the key (the key is the value after NO_PUBKEY; substitute your own):
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys FEEA9169307EA071
The commands above installed kubelet, kubeadm, and kubectl: kubelet is the node agent service, kubectl is the management client, and kubeadm is the deployment tool.
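One optional extra, not part of the original steps: pinning the three packages prevents a routine apt upgrade from moving the cluster to an unexpected version.

```shell
# Hold kubelet/kubeadm/kubectl at their current versions
apt-mark hold kubelet kubeadm kubectl
# Release the hold later when you deliberately upgrade:
# apt-mark unhold kubelet kubeadm kubectl
```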
If this machine is only going to be a worker node, you can stop here.
To join another Alibaba Cloud instance to the cluster, just run the join command (how to obtain it is shown further down; finish reading first, then come back):
kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t --discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
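For reference, if the join command is lost it can be regenerated on the master at any time (kubeadm tokens expire after 24 hours by default, so a fresh one is often needed anyway):

```shell
# Prints a complete 'kubeadm join <ip>:6443 --token ... --discovery-token-ca-cert-hash ...' line
kubeadm token create --print-join-command
```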
An error you may hit:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
# Fix:
# Docker and kubelet are using different cgroup drivers.
# Switch Docker to the systemd cgroup driver in its config file:
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
EOF
systemctl restart docker
Run the command below to initialize the cluster; it automatically downloads the required Docker images.
This is the step that deploys the master node.
Run kubeadm version
to check your installed version; the number in GitVersion:"v1.17.2"
is the version.
Run the init command (remember to replace the IP with your own):
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
--ignore-preflight-errors=NumCPU
is for machines with only one CPU, such as cheap student-plan servers.
However, the images live on Google's registry, so the downloads may fail.
Run kubeadm config images list
to list the images that need to be pulled, then pull them manually with Docker. This is tedious, and the images also have to be renamed afterwards.
The pull syntax is docker pull {image name}
.
Although k8s.gcr.io is unreachable, the needed images are mirrored on DockerHub:
the mirrorgooglecontainers repository keeps backups, though not always of the latest versions; the google_containers namespace on Alibaba Cloud's registry tends to be the most up to date.
For example, suppose these images are needed:
k8s.gcr.io/kube-apiserver:v1.22.2
k8s.gcr.io/kube-controller-manager:v1.22.2
k8s.gcr.io/kube-scheduler:v1.22.2
k8s.gcr.io/kube-proxy:v1.22.2
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns:1.8.4
Pull the corresponding mirrored images:
docker pull mirrorgooglecontainers/kube-apiserver:v1.22.2
docker pull mirrorgooglecontainers/kube-controller-manager:v1.22.2
docker pull mirrorgooglecontainers/kube-scheduler:v1.22.2
docker pull mirrorgooglecontainers/kube-proxy:v1.22.2
docker pull mirrorgooglecontainers/pause:3.5
docker pull mirrorgooglecontainers/etcd:3.5.0-0
docker pull coredns/coredns:1.8.4
Then rename each image with docker tag {old-name:tag} {new-name:tag}
.
Given everything that can go wrong here, below is a one-shot script (written by someone else) that does this whole step automatically.
touch pullk8s.sh # create the script file
nano pullk8s.sh # open it for editing
Then paste in the following:
for i in `kubeadm config images list`; do
imageName=${i#k8s.gcr.io/}
docker pull registry.aliyuncs.com/google_containers/$imageName
docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
docker rmi registry.aliyuncs.com/google_containers/$imageName
done;
Save the file in nano:
Ctrl + O
Enter
Ctrl + X
Make the script executable:
chmod +x pullk8s.sh
Run it:
sh pullk8s.sh
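The only non-obvious part of the script is the ${i#k8s.gcr.io/} expansion, which strips the registry prefix so the Aliyun path can be substituted. It can be checked in isolation:

```shell
# '#pattern' removes the shortest matching prefix from the variable's value
i="k8s.gcr.io/kube-apiserver:v1.22.2"
imageName=${i#k8s.gcr.io/}
echo "$imageName"   # kube-apiserver:v1.22.2
echo "registry.aliyuncs.com/google_containers/$imageName"
```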
Then run docker images
to confirm all the required images are in place.
root@ubuntu:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.22.2 cba2a99699bd 2 weeks ago 116MB
k8s.gcr.io/kube-apiserver v1.22.2 41ef50a5f06a 2 weeks ago 171MB
k8s.gcr.io/kube-controller-manager v1.22.2 da5fd66c4068 2 weeks ago 161MB
k8s.gcr.io/kube-scheduler v1.22.2 f52d4c527ef2 2 weeks ago 94.4MB
k8s.gcr.io/coredns 1.8.4 70f311871ae1 3 months ago 41.6MB
k8s.gcr.io/etcd 3.5.0-0 303ce5db0e90 3 months ago 288MB
k8s.gcr.io/pause 3.5 da86e6ba6ca1 2 years ago 742kB
The script may still fail for some images; if so, pull them manually:
Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/coredns/coredns, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Error response from daemon: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
Error: No such image: registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.4
docker pull coredns/coredns:1.8.4
# Rename syntax:
docker tag OLD_IMAGE NEW_IMAGE
Finally, re-run the init command from the start of this section.
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
Because an Alibaba Cloud ECS instance does not have its public IP configured on the machine itself, etcd cannot start, so kubeadm init fails with a "timeout" error.
Workaround:
1. Open two SSH sessions: one to run the init, the other to edit a config file while the init is in progress. It must happen during the init — every run of kubeadm init regenerates etcd's config file, so editing it in advance is useless because the changes get overwritten.
2. Run the "kubeadm init …" command above. It will hang at:
Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed
3. While kubeadm is initializing the master, etcd fails to start because its config file is wrong; edit that file in the second session. The path is "/etc/kubernetes/manifests/etcd.yaml".
# The two lines to change, as generated
--listen-client-urls=https://127.0.0.1:2379,https://39.96.46.96:2379
--listen-peer-urls=https://39.96.46.96:2380
# After the change
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
4. The flags to watch are --listen-client-urls and --listen-peer-urls: remove the public IP from --listen-client-urls, and point --listen-peer-urls at the local address.
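Since kubeadm rewrites etcd.yaml on every run, the window for hand-editing is short. A sed one-liner can make both changes at once; sketched here against a scratch copy (the real file is /etc/kubernetes/manifests/etcd.yaml, and 39.96.46.96 is just this article's example IP):

```shell
# Scratch copy containing the two problematic flags
printf '%s\n' \
  '    - --listen-client-urls=https://127.0.0.1:2379,https://39.96.46.96:2379' \
  '    - --listen-peer-urls=https://39.96.46.96:2380' > /tmp/etcd-demo.yaml
# Rewrite both flags to loopback-only addresses in one shot
sed -i \
  -e 's#--listen-client-urls=.*#--listen-client-urls=https://127.0.0.1:2379#' \
  -e 's#--listen-peer-urls=.*#--listen-peer-urls=https://127.0.0.1:2380#' \
  /tmp/etcd-demo.yaml
cat /tmp/etcd-demo.yaml
```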
After a short wait the master initialization will complete.
A problem you may hit:
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
# Run the following to reset
swapoff -a && kubeadm reset && systemctl daemon-reload && systemctl restart kubelet && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# then run the init again and it should work
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=NumCPU --apiserver-advertise-address=39.96.46.96
Another problem you may hit:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns/coredns:v1.8.4: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
Look up the current IP of k8s.gcr.io with an IP-lookup site such as IPAddress.com (for me it resolved to 142.250.113.82 at the time), then add it to the local hosts file — on Linux,
vim /etc/hosts (use sudo if you are not root) — in this form:
142.250.113.82 k8s.gcr.io
If there are still problems:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
This again means Docker and kubelet are using different cgroup drivers.
Fix it in Docker's config file by switching Docker to the systemd cgroup driver:
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
EOF
Restart Docker:
systemctl restart docker
More errors may follow, but those are straightforward: fix whatever each message reports. Here are the ones I ran into:
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
# Fix (a few similar manifests omitted; the steps are identical)
cd /etc/kubernetes/manifests/
rm kube-apiserver.yaml
[ERROR Port-10250]: Port 10250 is in use
# Fix (again, a few similar cases omitted; the steps are identical)
lsof -i:10250
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kubelet 22055 root 27u IPv6 773301 0t0 TCP *:10250 (LISTEN)
kill -9 22055
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
# Fix
cd /var/lib/etcd/
rm -r member/
Run the init command once more and it will succeed.
# Output on success
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 39.96.46.96:6443 --token 9vbzuf.vtzj1w5vefjlwi0t \
--discovery-token-ca-cert-hash sha256:b6e6fffb6b0e11d2db374ce21f6d86de3e09e1e13075e1bf01055130c2c5e060
On the master node, run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check the master
kubectl get nodes
root@ubuntu:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master NotReady control-plane,master 26h v1.22.2
node     Ready      &lt;none&gt;                 15s   v1.22.2
# Install the network plugin
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
kubectl get pods --all-namespaces
# If it looks like this, with some Pods stuck in Pending
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-78fcd69978-fkkmh 0/1 Pending 0 17m
kube-system coredns-78fcd69978-qrx2c 0/1 Pending 0 17m
kube-system etcd-ubuntu 1/1 Running 0 17m
kube-system kube-apiserver-ubuntu 1/1 Running 1 (19m ago) 17m
kube-system kube-controller-manager-ubuntu 1/1 Running 2 (20m ago) 17m
kube-system kube-flannel-ds-g97gm 0/1 Init:0/1 0 80s
kube-system kube-proxy-f6ctf 1/1 Running 0 17m
kube-system kube-scheduler-ubuntu 1/1 Running 2 (19m ago) 17m
# Just add '185.199.111.133 raw.githubusercontent.com' to the hosts file, then re-run the network plugin command; afterwards:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-78fcd69978-fkkmh 1/1 Running 0 28m
kube-system coredns-78fcd69978-qrx2c 1/1 Running 0 28m
kube-system etcd-ubuntu 1/1 Running 0 28m
kube-system kube-apiserver-ubuntu 1/1 Running 1 (30m ago) 28m
kube-system kube-controller-manager-ubuntu 1/1 Running 2 (31m ago) 28m
kube-system kube-flannel-ds-g97gm 1/1 Running 0 11m
kube-system kube-proxy-f6ctf 1/1 Running 0 28m
kube-system kube-scheduler-ubuntu 1/1 Running 2 (30m ago) 28m
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 26h v1.22.2
node     Ready    &lt;none&gt;                 15s   v1.22.2
At this point the node may still show NotReady:
kube-system coredns-78fcd69978-fkkmh 0/1 Pending 0 17m
kube-system coredns-78fcd69978-qrx2c 0/1 Pending 0 17m
kube-system kube-flannel-ds-4ksws 0/1 Init:ErrImagePull 0 110m
kube-system kube-flannel-ds-ls7qr 0/1 Init:ImagePullBackOff 0 4m35s
# coredns stays Pending; dig further into the kubelet service log
journalctl -f -u kubelet.service
# Output
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.514075 8358 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="737fd9370b28e559d78073f038515969ec776e15a59a5688f6b230860d11184f"
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526 8358 cni.go:333] "CNI failed to retrieve network namespace path" err="cannot find network namespace for the terminated container \"737fd9370b28e559d78073f038515969ec776e15a59a5688f6b230860d11184f\""
# This points to a network plugin problem, so apply the flannel manifest again
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Output
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
# The node is still NotReady; check the log again
journalctl -f -u kubelet.service
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526 8358 cni.go:333] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526 8358 cni.go:333] Error validating CNI config &{cbr0 false [0xc0006a22a0 0xc0006a2360] [123 10 32 32 34 110 97 109 101 34 58 32 34 99 98 114 48 34 44 10 32 32 34 112 108 117 103 105 110 115 34 58 32 91 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 102 108 97 110 110 101 108 34 44 10 32 32 32 32 32 32 34 100 101 108 101 103 97 116 101 34 58 32 123 10 32 32 32 32 32 32 32 32 34 104 97 105 114 112 105 110 77 111 100 101 34 58 32 116 114 117 101 44 10 32 32 32 32 32 32 32 32 34 105 115 68 101 102 97 117 108 116 71 97 116 101 119 97 121 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 44 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 112 111 114 116 109 97 112 34 44 10 32 32 32 32 32 32 34 99 97 112 97 98 105 108 105 116 105 101 115 34 58 32 123 10 32 32 32 32 32 32 32 32 34 112 111 114 116 77 97 112 112 105 110 103 115 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 10 32 32 93 10 125 10]}: [plugin flannel does not support config version ""]
Oct 16 15:52:20 k8s-master kubelet[8358]: I1016 15:52:20.534526 8358 cni.go:333] Unable to update cni config: no valid networks found in /etc/cni/net.d
# The error is "plugin flannel does not support config version"; edit the CNI config file
vim /etc/cni/net.d/10-flannel.conflist
// Add the cniVersion field
// The file should read as follows
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
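A stray comma or quote in this file will break CNI again, so it is worth validating the JSON after editing; python3 -m json.tool works for this (demonstrated on a scratch file here):

```shell
# Validate JSON syntax; json.tool exits non-zero and prints the error on bad input
printf '{"name": "cbr0", "cniVersion": "0.3.1", "plugins": []}' > /tmp/conflist-demo.json
python3 -m json.tool < /tmp/conflist-demo.json > /dev/null && echo "JSON OK"
# For the real file:
# python3 -m json.tool < /etc/cni/net.d/10-flannel.conflist > /dev/null && echo "JSON OK"
```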
# After the change, run systemctl daemon-reload
# Check the cluster again: the master is now normal and shows Ready
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 24h v1.22.2
That completes the basic Kubernetes cluster installation. I don't currently need the dashboard, so I haven't installed it; I'll come back and update this article when I do.
-------------------------- Update
The dashboard write-up is done; if you need it, see my other article.