This guide deploys a single-Master cluster. If you need a highly available multi-Master cluster, also refer to 《【K8S 五】使用kubeadm工具快速部署Kubernetes集群(Master高可用集群)》.
Table of Contents
Pre-installation configuration
Installing dependency packages
Prerequisites for enabling ipvs in kube-proxy (ALL NODES)
Disabling the firewall
Installing Docker (ALL NODES)
Installing kubeadm (ALL NODES)
Initializing the master node (MASTER NODE)
Installing the network add-on Flannel (MASTER NODE)
Joining worker nodes to the cluster (WORKER NODE)
Joining the cluster after the token has expired
Scheduling containers through Kubernetes
Removing a node
FAQ
The environment used in this guide:
Role        | IP address
Master node | 192.168.11.8
Worker node | 192.168.11.10
yum -y install iproute-tc conntrack ipset ipvsadm
modprobe -a br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4
Note: kernels 4.18.0-240 and later no longer ship nf_conntrack_ipv4; use modprobe nf_conntrack instead.
modprobe nf_conntrack_ipv4
modprobe br_netfilter
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
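The modprobe commands above only load the modules for the current boot. A minimal sketch (not part of the original steps, assuming a systemd-based host) for persisting the module list together with the commonly recommended bridge/forwarding sysctls:
cat > /etc/modules-load.d/k8s.conf <<'EOF'
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
cat > /etc/sysctl.d/k8s.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system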
If you do not disable the firewall, you have to add quite a few ports as exceptions, and you will still run into plenty of pitfalls to dig yourself out of. For example,
the warning during kubeadm init:
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
systemctl stop firewalld
systemctl disable firewalld
Edit /etc/hosts
Add the hostname-to-IP mapping of every node to /etc/hosts on all nodes; otherwise the preflight phase of kubeadm init warns:
[WARNING Hostname]: hostname "rhel-8-11.8" could not be reached
[WARNING Hostname]: hostname "rhel-8-11.8": lookup rhel-8-11.8 on 223.5.5.5:53: no such host
In addition: disable swap, disable SELinux, set the time zone and time synchronization, mount a dedicated partition for Docker, and apply the Kubernetes kernel tuning.
Configure these as appropriate for your environment; a hedged example for some of them follows below.
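For reference, a rough sketch of the swap, SELinux and time-zone steps (the time zone is only an example; adjust to your own environment):
swapoff -a && sed -ri 's/.*swap.*/#&/' /etc/fstab
setenforce 0 && sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
timedatectl set-timezone Asia/Shanghai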
yum -y install yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum clean all
yum makecache
yum list docker-ce --showduplicates | sort -r
or
yum list | grep docker-ce
yum -y install docker-ce
systemctl start docker.service # By default Docker uses cgroupfs as its cgroup driver, while Kubernetes recommends systemd instead. If you do not start Docker at this point, you have to create the /etc/docker directory by hand.
vi /etc/docker/daemon.json
{
"registry-mirrors": ["https://cn7-k8s-min-51.com"],
"insecure-registries": ["cn7-k8s-min-51.com"],
"ipv6": true,
"fixed-cidr-v6": "fd00:db8:1::/48",
"experimental": true,
"ip6tables": true,
"log-level":"warn",
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"default-address-pools" : [
{
"base" : "172.103.0.0/16",
"size" : 24
}
]
}
systemctl restart docker.service # a reload is not enough here: options such as storage-driver and default-address-pools only take effect after a full restart
systemctl enable docker.service
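To confirm that the daemon.json settings (in particular the systemd cgroup driver) took effect, a quick check is:
docker info -f '{{.CgroupDriver}}'    # should print: systemd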
Although this repo is built for EL7, it also works on EL8.
vi /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
yum makecache
yum -y install kubelet kubeadm kubectl --disableexcludes=kubernetes
## Because kubeadm depends on kubectl, kubectl ends up on the worker nodes as well; even the command below still installs kubectl onto the worker node.
yum -y install kubelet kubeadm --disableexcludes=kubernetes
systemctl enable kubelet
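If you need a specific release rather than whatever is currently latest in the repo (the version number below is only an example), the same pinning pattern used for docker-ce above also works here:
yum list kubeadm --showduplicates | sort -r
yum -y install kubelet-1.19.0 kubeadm-1.19.0 kubectl-1.19.0 --disableexcludes=kubernetes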
kubeadm config print init-defaults > kubeadm-init.default.yaml
# kubeadm config print init-defaults > kubeadm-init.default.yaml
# The configuration below applies to v1.19.0; for newer versions treat it only as a reference. The configuration of each component can be inspected with the following command, here using KubeProxyConfiguration as an example:
kubeadm config print init-defaults --component-configs KubeProxyConfiguration > kubeadm-config.yaml
# Edit the generated init configuration file:
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: liuyll.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.11.8
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master-01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.20.0
networking:
  dnsDomain: cluster.local
  podSubnet: 172.254.0.0/16
  serviceSubnet: 10.254.0.0/16
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
iptables:
  masqueradeAll: false
ipvs:
  minSyncPeriod: 0s
  scheduler: "rr"
kind: KubeProxyConfiguration
mode: "ipvs"
kubeadm config images pull --config=kubeadm-init.default.yaml
kubeadm init --config=kubeadm-init.default.yaml |tee kubeadm-init.log
# kubeadm config images pull --config=kubeadm-init.default.yaml (pulling the images in advance saves time when initializing the master node; you can also just run the init command below directly)
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.24.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.24.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.24.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.7
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.3-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.8.6
Note: after installing Docker or containerd, the CRI plugin is disabled by default in /etc/containerd/config.toml; comment out disabled_plugins = ["cri"] (and restart containerd), otherwise kubeadm fails during deployment with:
output: E0614 09:35:20.617482 19686 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.0"
time="2022-06-14T09:35:20+08:00" level=fatal msg="pulling image: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService"
# kubeadm init --config=kubeadm-init.default.yaml |tee kubeadm-init.log
W1110 16:12:18.114927 84873 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local rhel-8-11.8] and IPs [10.254.0.1 192.168.11.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost rhel-8-11.8] and IPs [192.168.11.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost rhel-8-11.8] and IPs [192.168.11.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 17.003026 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node rhel-8-11.8 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node rhel-8-11.8 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.11.8:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:c6f619461d7aaa470249e11520888cbc22110bca8ed8b2bb0f2b626164d685d4
At this point:
kubectl get cs shows scheduler and controller-manager as Unhealthy! (the insecure --port is disabled)
kubectl get node shows the node as NotReady! (no network add-on installed yet)
kubectl get po shows the coredns Pods as Pending! (no network add-on installed yet)
Fix: scheduler and controller-manager disable --port (the insecure port) by default. Comment the flag out in their manifests and kubectl get cs reports them as Healthy again; see the one-liner after the flag below. You can also leave it alone, since it does not affect normal operation.
# - --port=0
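A hedged one-liner for commenting the flag out in both static Pod manifests (assuming the default manifest paths; the kubelet recreates the Pods automatically once the files change):
sed -i 's/^\(\s*\)- --port=0/\1# - --port=0/' \
  /etc/kubernetes/manifests/kube-scheduler.yaml \
  /etc/kubernetes/manifests/kube-controller-manager.yaml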
kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
kubectl get node
NAME STATUS ROLES AGE VERSION
kubeadmin-node Ready 51m v1.19.3
master-01 Ready master 60m v1.19.3
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Edit:
net-conf.json: |
  {
    "Network": "172.254.0.0/16",
    "Backend": {
      "Type": "vxlan",
      "DirectRouting": true
    }
  }
kubectl apply -f kube-flannel.yml
kubectl get po -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6d56c8448f-7lsmz 1/1 Running 0 60m 172.254.0.3 rhel-8-11.8
coredns-6d56c8448f-fk2pz 1/1 Running 0 60m 172.254.0.2 rhel-8-11.8
etcd-rhel-8-11.8 1/1 Running 0 61m 192.168.11.8 rhel-8-11.8
kube-apiserver-rhel-8-11.8 1/1 Running 0 61m 192.168.11.8 rhel-8-11.8
kube-controller-manager-rhel-8-11.8 1/1 Running 1 56m 192.168.11.8 rhel-8-11.8
kube-flannel-ds-6zrvq 1/1 Running 0 52m 192.168.11.10 kubeadmin-node
kube-flannel-ds-jvskr 1/1 Running 0 54m 192.168.11.8 rhel-8-11.8
kube-proxy-f994g 1/1 Running 0 60m 192.168.11.8 rhel-8-11.8
kube-proxy-qwsgl 1/1 Running 0 52m 192.168.11.10 kubeadmin-node
kube-scheduler-rhel-8-11.8 1/1 Running 0 56m 192.168.11.8 rhel-8-11.8
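Since kube-proxy was configured with mode: "ipvs" earlier, it is worth confirming that the ipvs proxier is actually in use; one way to check (the exact log wording varies between versions) is:
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs
ipvsadm -Ln    # the Service addresses (e.g. 10.254.0.1:443) should show up as virtual servers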
On the worker node, follow the steps up through "Installing kubeadm (ALL NODES)", then perform the steps below to join the Kubernetes cluster.
# kubeadm config print join-defaults > kubeadm-join.default.yaml
Append kube-apiserver to the master's IP entry in /etc/hosts, or change this field to the master's IP address or hostname.
# cat kubeadm-join.default.yaml
apiVersion: kubeadm.k8s.io/v1beta2
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: kube-apiserver:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: abcdef.0123456789abcdef
kind: JoinConfiguration
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-02
  taints: null
# kubeadm join --config kubeadm-join.default.yaml |tee kubeadm-join.log
Or run it directly with command-line arguments:
# kubeadm join 192.168.35.8:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:2c5ea47a17346ee5265ccdf6a5c7d3df8916c0e2ca94df3836935ea17d8aef4a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Once the token has expired, kubeadm join can no longer join the node to the Kubernetes cluster. Run the following command to check whether the token has expired:
# kubectl get configmap cluster-info --namespace=kube-public -o yaml
apiVersion: v1
data:
jws-kubeconfig-gdtrlm: eyJhbGciOiJIUzI1NiIsImtpZCI6ImdkdHJsbSJ9..IK_YSH-y_oSd1jdW8kPOl1Pe-5AQeJ7aTptqDEjWNq0
kubeconfig: |
apiVersion: v1
If the token has expired, the jws-kubeconfig-* line shown above will be missing from the output, and # kubeadm token list will come back empty as well. Create a new token with: kubeadm token create --print-join-command --ttl 0
--print-join-command prints the complete kubeadm join command; running that command joins the cluster directly.
--ttl=0 makes the generated token never expire; if omitted, the ttl defaults to 24 hours.
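kubeadm token create --print-join-command already prints the --discovery-token-ca-cert-hash for you; if you ever need to recompute it by hand on the master (for example to assemble the join command yourself), the usual approach is:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'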
kubectl apply -f mysql-rc.yaml
kubectl apply -f mysql-svc.yaml
kubectl apply -f tomcat-deploy.yaml
kubectl apply -f tomcat-svc.yaml
kubectl get po
NAME READY STATUS RESTARTS AGE
mysql-zq2l5 1/1 Running 0 45m
myweb-594787496-jjp2l 1/1 Running 0 43m
myweb-594787496-xhr72 1/1 Running 0 43m
The YAML manifests:
# mysql-rc.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6.43
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "123456"
# mysql-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
# tomcat-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      containers:
      - name: myweb
        image: kubeguide/tomcat-app:v1
        ports:
        - containerPort: 8080
        env:
        - name: MYSQL_SERVICE_HOST
          value: 'mysql'
        volumeMounts:
        - name: shared
          mountPath: /mydata-data
      volumes:
      - name: shared
        hostPath:
          path: "/tmp/"
# tomcat-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: myweb
spec:
  type: NodePort
  ports:
  - port: 8080
    nodePort: 30080
  selector:
    app: myweb
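After applying the four manifests, a quick sanity check might look like this (the node IP comes from the table at the top; any node IP works for a NodePort Service):
kubectl get svc mysql myweb
curl -I http://192.168.11.10:30080/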
kubeadm reset && rm -rf /etc/cni/net.d && ipvsadm --clear && rm -rf $HOME/.kube && rm -rf /etc/kubernetes/*
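Before running kubeadm reset on the node itself, it is also common to evict and deregister the node from the master first (the node name is taken from the earlier kubectl get node output):
kubectl drain kubeadmin-node --ignore-daemonsets
kubectl delete node kubeadmin-node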
Issue: starting the api-server with --anonymous-auth=false makes it restart over and over:
Liveness: http-get https://192.168.11.8:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=8
Readiness: http-get https://192.168.11.8:6443/readyz delay=0s timeout=15s period=1s #success=1 #failure=3
Startup: http-get https://192.168.11.8:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=24
Fix: (not resolved yet)
Issue: the kubelet service log on the worker node keeps printing this error:
kubelet[623018]: E1231 18:05:30.083549 623018 file_linux.go:60] Unable to read config path "/etc/kubernetes/manifests": path does not exist, ignoring
Fix: mkdir -p /etc/kubernetes/manifests
Issue: when initializing a node or pulling images in advance (kubeadm config images pull --config=kubeadm-config.new.yaml), the command reports the warning: error unmarshaling JSON: while decoding JSON: json: unknown field SupportIPVSProxyMode
Fix: the pasted-in KubeProxyConfiguration block is the cause; the YAML validates, yet the error keeps coming back. Generate the initial kubeadm configuration with --component-configs KubeProxyConfiguration and then trim the definitions inside that file:
kubeadm config print init-defaults --component-configs KubeProxyConfiguration > kubeadm-kubeproxy.yaml
Issue: kubeadm join hangs at [preflight] Running pre-flight checks; after adding --v=5, the error appears: [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "liuyll", will try again.
Fix: the error above means the token has expired. Tokens are valid for 24 hours by default; if more nodes need to join the cluster after that, create a new token: # kubeadm token create --print-join-command
Issue: kubectl get cs shows the scheduler and controller-manager components as Unhealthy
$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
Fix: 1. Nothing needs to be done; it does not affect normal operation. 2. If you insist on clearing it, go to /etc/kubernetes/manifests/, edit the YAML files of the affected components, and comment out --port=0.
Issue: kubectl get node shows a node as NotReady.
On the affected node, the kubelet reports [failed to find plugin "flannel" in path [/opt/cni/bin] failed to find plugin "portmap" in path [/opt/cni/bin]], and /opt/ turned out to be completely empty. Everything under /opt/cni comes from the kubernetes-cni package (check with rpm -ql kubernetes-cni), so I queried the package with rpm -q kubernetes-cni. Strangely enough, the package was still installed but its files were gone.
How to fix it? You could try reinstalling kubernetes-cni, but that fails with:
Error:
Problem: problem with installed package kubelet-1.20.4-0.x86_64
- cannot install both kubelet-1.18.4-0.x86_64 and kubelet-1.20.4-0.x86_64
- cannot install both kubelet-1.20.4-0.x86_64 and kubelet-1.18.4-0.x86_64
Short of uninstalling kubelet and everything else and reinstalling, which is far too expensive.
Fix: simply copy /opt/cni/* from another node onto the broken one.
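For example (the source IP is the master from this setup; any node with an intact /opt/cni works):
scp -r 192.168.11.8:/opt/cni /opt/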
If Docker is not installed beforehand, running "kubeadm config print init-defaults" produces the warning:
W0511 10:54:30.386986 5061 kubelet.go:210] cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
If kubeadm config images pull --config=kubeadm-init-default.yaml produces the warning below, it is caused by a stray non-breaking space in the appended KubeProxyConfiguration block; delete the space and retype a normal one.
W0511 11:16:06.091946 5715 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "\u00a0 masqueradeAll"
Issue: running crictl images fails with
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
E0705 15:23:06.260611 13731 remote_image.go:121] "ListImages with filter from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" filter="&ImageFilter{Image:&ImageSpec{Image:,Annotations:map[string]string{},},}"
FATA[0000] listing images: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService
Fix:
Edit the configuration file for the crictl command; after saving it, run crictl again and the problem is gone.
vi /etc/crictl.yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
Issue: kubeadm config images pull --config=kubeadm-init.default.yaml reports that the IPv6 pod subnet mask is invalid
podSubnet: Invalid value: "fa00::/112": the size of pod subnet with mask 112 is smaller than the size of node subnet with mask 64
To see the stack trace of this error execute with --v=5 or higher
Fix:
Under kind: ClusterConfiguration, append a controllerManager section and set the IPv4/IPv6 node CIDR mask sizes yourself (the defaults are 24 and 64 respectively):
controllerManager:
  extraArgs:
    "node-cidr-mask-size-ipv4": "24"
    "node-cidr-mask-size-ipv6": "120"
Issue: kubeadm init --config=kubeadm-init.default.yaml gets stuck at the following stage
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Diagnosis:
The kubeadm init output itself reveals little; looking at the container runtime log with journalctl -f -u containerd shows:
failed, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.6\": failed to pull image \"k8s.gcr.io/pause:3.6\": failed to pull and unpack image \"k8s.gcr.io/pause:3.6\": failed to resolve reference \"k8s.gcr.io/pause:3.6\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.6\": dial tcp 108.177.125.82:443: i/o timeout"
Jul 05 19:08:30 k8s-testing01-190 containerd[13788]: time="2022-07-05T19:08:30.696324518+08:00" level=info msg="trying next host" error="failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.6\": dial tcp 108.177.125.82:443: i/o timeout" host=k8s.gcr.io
(This ignores the kubelet configuration: specifying the infra image with --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.7 has no effect, and kubeadm still brings up the control plane with k8s.gcr.io/pause:3.6.)
Fix:
ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/pause:3.7 k8s.gcr.io/pause:3.6
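The retag above disappears if the image is ever cleaned up. A more persistent alternative (a sketch, assuming containerd's CRI plugin configuration) is to point containerd itself at the mirrored pause image in /etc/containerd/config.toml and then restart containerd:
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.7"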
Issue: crictl pull from an insecure registry is refused on HTTPS port 443 (same as docker pull)
E0707 10:29:23.733163 3389 remote_image.go:238] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"192.168.11.101/library/cni:v3.22.2\": failed to resolve reference \"192.168.11.101/library/cni:v3.22.2\": failed to do request: Head \"https://192.168.11.101/v2/library/cni/manifests/v3.22.2\": dial tcp 192.168.11.101:443: connect: connection refused" image="192.168.11.101/library/cni:v3.22.2"
FATA[0000] pulling image: rpc error: code = Unknown desc = failed to pull and unpack image "192.168.11.101/library/cni:v3.22.2": failed to resolve reference "192.168.11.101/library/cni:v3.22.2": failed to do request: Head "https://192.168.11.101/v2/library/cni/manifests/v3.22.2": dial tcp 192.168.11.101:443: connect: connection refused
Diagnosis:
For docker pull from an insecure registry we all know that configuring daemon.json is enough, but what about crictl? For that you have to configure containerd instead, in /etc/containerd/config.toml.
Fix:
(Generate the default configuration with containerd config default > /etc/containerd/config.toml)
vi /etc/containerd/config.toml
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""
      [plugins."io.containerd.grpc.v1.cri".registry.auths]
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
      [plugins."io.containerd.grpc.v1.cri".registry.headers]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."192.168.11.101"]
          endpoint = ["http://192.168.11.101"]
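After editing config.toml, restart containerd and retry the pull, for example:
systemctl restart containerd
crictl pull 192.168.11.101/library/cni:v3.22.2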