Pitfalls of installing k8s and KubeSphere on cloud servers

The setup: three cloud servers, one each from Tencent Cloud, Baidu Cloud, and JD Cloud, with 4C4G, 2C4G, and 2C4G respectively. As a broke college student, these are the only specs I can afford.

Installing k8s and KubeSphere on Ubuntu 20.04 with KubeKey

Reference:
在 Ubuntu 22.04 上安装 KubeSphere 实战教程 (a hands-on guide to installing KubeSphere on Ubuntu 22.04)
That article installs k8s and KubeSphere with KubeKey, with all three servers acting as master nodes so the masters can hold elections and the cluster is harder to kill. I never got it fully working, though: I was stuck on the etcd part for quite a while, and once that was sorted I thought the install was almost done, but there was still a pile of components left to install and more pitfalls, and in the end I gave up.

Pitfall 1

00:02:04 CST [PullModule] Start to pull images on all nodes
00:02:04 CST message: [master]
downloading image: kubesphere/pause:3.7
00:02:04 CST message: [node2]
downloading image: kubesphere/pause:3.7
00:02:04 CST message: [node1]
downloading image: kubesphere/pause:3.7
00:02:04 CST message: [master]
pull image failed: Failed to exec command: sudo -E /bin/bash -c "env PATH=$PATH crictl pull kubesphere/pause:3.7"
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
E0819 00:02:04.611117    9644 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" image="kubesphere/pause:3.7"
FATA[0000] pulling image: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService: Process exited with status 1
00:02:04 CST retry: [master]
00:02:07 CST message: [node1]
downloading image: kubesphere/kube-proxy:v1.24.2
00:02:09 CST message: [node2]
downloading image: kubesphere/kube-proxy:v1.24.2
00:02:09 CST message: [master]
downloading image: kubesphere/pause:3.7
00:02:09 CST message: [master]
pull image failed: Failed to exec command: sudo -E /bin/bash -c "env PATH=$PATH crictl pull kubesphere/pause:3.7"
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
E0819 00:02:09.650287    9665 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" image="kubesphere/pause:3.7"
FATA[0000] pulling image: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService: Process exited with status 1
00:02:09 CST retry: [master]
00:02:14 CST message: [master]
downloading image: kubesphere/pause:3.7
00:02:14 CST message: [master]
pull image failed: Failed to exec command: sudo -E /bin/bash -c "env PATH=$PATH crictl pull kubesphere/pause:3.7"
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
E0819 00:02:14.690706    9697 remote_image.go:218] "PullImage from image service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService" image="kubesphere/pause:3.7"
FATA[0000] pulling image: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.ImageService: Process exited with status 1

Add the following to the containerd config file:

cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
systemd_cgroup = true
EOF

Then restart containerd:

systemctl restart containerd
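
To double-check that containerd came back up and the pull now goes through, something along these lines should work (a sketch; it assumes crictl already points at the containerd socket, which Pitfall 2 below deals with):

systemctl status containerd --no-pager | grep Active
crictl pull kubesphere/pause:3.7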

Pitfall 2

error: code = Unknown desc = failed to pull and unpack image "docker.io/kubesphere/k8s-dns-node-cache:1.15.12": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry-1.docker.io/v2/kubesphere/k8s-dns-node-cache/manifests/sha256:8e765f63b3a5b4832c484b4397f4932bd607713ec2bb3e639118bc164ab4a958": net/http: TLS handshake timeout: Process exited with status 1

Adjust the crictl configuration so it talks to containerd:

crictl config runtime-endpoint /run/containerd/containerd.sock
# Alternatively you can edit the config file directly (I did not verify this route)
# vi /etc/crictl.yaml
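
If you do go the config-file route, /etc/crictl.yaml would presumably look roughly like this (unverified, as noted above; the two endpoint keys are the standard crictl settings):

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false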

During cluster deployment, etcd fails to start with the error: request sent was ignored (cluster ID mismatch: peer[c39bdec535db1fd5]=cdf818194e3a8c

The article below contains the commands for clearing etcd's stale cached data:

k8s集群部署中etcd启动报错request sent was ignored (cluster ID mismatch: peer[c39bdec535db1fd5]=cdf818194e3a8c
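
The gist of that fix, as I understand it, is to stop etcd on the mismatched member and wipe its stale data directory so it rejoins with the correct cluster ID; roughly (a sketch, not the exact commands from that article, and note it destroys that member's local etcd data):

systemctl stop etcd
rm -rf /var/lib/etcd/*
systemctl start etcd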

On cloud servers the main thing is to get the etcd config file right;
some of the listen addresses need to be switched to the local (private) NIC.

kube@k8s-master-0:~/kubekey$ cat /etc/etcd.env
# Environment file for etcd v3.4.13
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://<public IP>:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://<public IP>:2380
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://<private IP>:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://<private IP>:2380
ETCD_NAME=etcd-k8s-master-0
ETCD_PROXY=off
ETCD_ENABLE_V2=true
ETCD_INITIAL_CLUSTER=etcd-k8s-master-0=https://<master-0 public IP>:2380,etcd-k8s-master-1=https://<master-1 public IP>:2380,etcd-k8s-master-2=https://<master-2 public IP>:2380
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000

# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0-key.pem
ETCD_CLIENT_CERT_AUTH=true

ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-k8s-master-0-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=True

# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY_FILE=/etc/ssl/etcd/ssl/admin-k8s-master-0-key.pem
ETCDCTL_CERT_FILE=/etc/ssl/etcd/ssl/admin-k8s-master-0.pem
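
With the file in place, restarting etcd and checking health from one of the masters should look roughly like this (a sketch; it reuses the admin cert paths from the file above and assumes etcdctl is on the PATH):

systemctl restart etcd
export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-k8s-master-0.pem \
  --key=/etc/ssl/etcd/ssl/admin-k8s-master-0-key.pem \
  endpoint health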

KubeKey configuration

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: k8s-master-0, address: <public IP>, internalAddress: <public IP>, user: kube, password: ""}
  - {name: k8s-master-1, address: <public IP>, internalAddress: <public IP>, user: kube, privateKeyPath: "~/.ssh/id_ed25519"}
  - {name: k8s-master-2, address: <public IP>, internalAddress: <public IP>, user: kube, privateKeyPath: "~/.ssh/id_ed25519"}
  roleGroups:
    etcd:
    - k8s-master-0
    - k8s-master-1
    - k8s-master-2
    control-plane:
    - k8s-master-0
    - k8s-master-1
    - k8s-master-2
    worker:
    - k8s-master-0
    - k8s-master-1
    - k8s-master-2
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.25.5
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []



---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.2
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #  resources: {}
    # controllerManager:
    #  resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 8Gi
    jenkinsMemoryReq: 4Gi
    jenkinsVolumeSize: 8Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    #   operator:
    #     resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600
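
For reference, this whole file (the Cluster object plus the ks-installer ClusterConfiguration) is what KubeKey consumes; assuming it is saved as config-sample.yaml next to the kk binary, cluster creation would be roughly:

export KKZONE=cn
./kk create cluster -f config-sample.yaml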

Those are the pitfalls I hit on Ubuntu. I never got the one-click KubeKey install working there, so the rest of this post switches to CentOS 7.9 with kubeadm.

Installing k8s on CentOS 7.9 with kubeadm

On CentOS I got a working install by following the article below. This approach pins a single master node, so if the master server dies the whole cluster dies with it, but the installation is simple, and for learning purposes it will do for now.

The k8s cluster below was set up by following this article.

Overview of the overall process

The main steps are:

  1. Prepare the cloud hosts and upgrade CentOS to 7.9
  2. Install Docker and kubeadm on all nodes and pull the required images
  3. Initialize the cluster on the master node, including configuring kubectl and deploying the CNI container network plugin
  4. Join the worker nodes to the k8s cluster

For the web dashboard and a private image registry, see other articles:
1. Deploy the Dashboard web UI to inspect Kubernetes resources visually; see my next article: k8s dashboard installation
2. Deploy a Harbor private registry to store images (not required, omitted here)

The detailed configuration steps for each stage follow.

Environment preparation
Cloud hosts

  Provider       Region           CentOS  Node      Spec  Public IP       Tools installed
  Tencent Cloud  Shanghai Zone 3  7.9     master01  2C4G  101.34.112.190  docker, kubeadm, kubelet, kubectl, flannel
  Tencent Cloud  Shanghai Zone 2  7.9     node01    1C2G  81.68.126.69    same as above
  Tencent Cloud  Shanghai Zone 2  7.9     node02    1C2G  81.68.92.49     same as above

CentOS upgrade
If you are below CentOS 7.9, upgrade first:

$ yum update -y
$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

PS: Do not skip this step. Installing kubeadm on older CentOS versions is very likely to fail; the original author only succeeded after failing and then upgrading CentOS!

CentOS setup on all nodes
Base settings
PS: this step assumes CentOS Linux release 7.9.2009 (Core); versions 7.2 - 7.6 seem to fail. If it does not work partway through, consider upgrading or switching your CentOS version!

You can create a k8s-pre-install-centos.sh script and apply everything in one go:

$ vim k8s-pre-install-centos.sh

#!/bin/sh

function set_base(){
  # Disable the firewall. PS: on a cloud server you also need to disable the firewall (or allow all ports) in the provider's console/security group.
  systemctl stop firewalld
  systemctl disable firewalld

  # Disable SELinux so containers can read the host filesystem.
  setenforce 0

  # Permanently disable swap; kubeadm requires it to be off.
  swapoff -a
  sed -ri 's/.*swap.*/#&/' /etc/fstab

  # iptables / bridge module settings
  cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

  cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

  # Apply the sysctl parameters
  sysctl --system
}

set_base

Run the script:

$ chmod 777 k8s-pre-install-centos.sh && ./k8s-pre-install-centos.sh
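
A quick sanity check that the script actually took effect (a sketch):

$ getenforce                      # expect Permissive or Disabled
$ free -m | grep -i swap          # the Swap line should be all zeros
$ sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward   # both should print 1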

Hostname settings
Set the hostnames

On the master:

hostnamectl set-hostname master01 

On node 1:

hostnamectl set-hostname node01

On node 2:

hostnamectl set-hostname node02

Edit the hosts file
Run this on every machine:

$ vim /etc/hosts 
101.34.112.190 master01 
81.68.126.69 node01 
81.68.92.49 node02

PS: replace these with your own public IPs (note: the public IPs, not the private ones)!

Install Docker on all nodes
k8s supports three container runtimes; here we stick with the familiar Docker. Make sure you are on CentOS 7 or later, and defer to the official docs for the latest requirements.

Add the yum repository

$ sudo yum install -y yum-utils
$ sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

Install Docker
This includes the CLI, the engine, docker compose, and so on.

$ sudo yum install docker-ce-20.10.14-3.el7 docker-ce-cli-20.10.14-3.el7 containerd.io docker-compose-plugin

PS: this tutorial uses Docker version 20.10.14.

Configure the Docker daemon
In particular, have systemd manage the containers' cgroups, and configure an Aliyun registry mirror to speed up image pulls.

$ sudo mkdir /etc/docker
$ cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": ["https://6ijb8ubo.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

registry-mirrors: registry mirror for faster pulls.
cgroupdriver: use systemd.
log-driver: JSON-file logs, capped at 100m.
Start Docker and enable it at boot

$ sudo systemctl enable docker
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ systemctl status docker # make sure it is in the running state

Confirm the Cgroup Driver is systemd

$ docker info | grep "Cgroup Driver"
 Cgroup Driver: systemd

PS: k8s uses systemd as the default cgroup driver; if Docker uses a different driver, the cluster may become unstable.

Install kubeadm on all nodes
To make sure nothing here has gone stale, defer to the official documentation:

Configure the yum repo (using Aliyun, since Google is unreachable, you know why)

$ cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

$ yum makecache # refresh the yum cache
Install kubeadm, kubelet, and kubectl
$ sudo yum install -y kubelet-1.23.6 kubeadm-1.23.6 kubectl-1.23.6 --disableexcludes=kubernetes

PS: k8s moves fast; to keep this tutorial reproducible, use the exact same versions.

Start kubelet and enable it at boot
$ sudo systemctl start kubelet
$ sudo systemctl enable kubelet
PS: kubeadm uses the kubelet service to deploy and start the main Kubernetes components as containers.

Pull the Docker images on all nodes
PS: version 1.23.6 needs Docker 20; newer Docker versions are not guaranteed to work.
Pull the Docker images

List the images needed for initialization

$ kubeadm config images list

k8s.gcr.io/kube-apiserver:v1.23.6
k8s.gcr.io/kube-controller-manager:v1.23.6
k8s.gcr.io/kube-scheduler:v1.23.6
k8s.gcr.io/kube-proxy:v1.23.6
k8s.gcr.io/pause:3.6
k8s.gcr.io/etcd:3.5.1-0
k8s.gcr.io/coredns/coredns:v1.8.6

Replace the k8s image registry
The default k8s image registry is k8s.gcr.io, which is unreachable for well-known reasons.

So create a kubeadm-config-image.yaml config that switches to the Aliyun registry:

$ vim kubeadm-config-image.yaml

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# defaults to k8s.gcr.io, which is unreachable, so replace it with the Aliyun mirror
imageRepository: registry.aliyuncs.com/google_containers 

Confirm the image registry changed
List the images again; only move on once you see the addresses have changed:

$ kubeadm config images list --config kubeadm-config-image.yaml

registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.6
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.6
registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.6
registry.aliyuncs.com/google_containers/kube-proxy:v1.23.6
registry.aliyuncs.com/google_containers/pause:3.6
registry.aliyuncs.com/google_containers/etcd:3.5.1-0
registry.aliyuncs.com/google_containers/coredns:v1.8.6

Pull the images

$ kubeadm config images pull --config kubeadm-config-image.yaml

Run this on all machines so the images are pulled ahead of time.
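
A quick way to confirm all seven images landed locally (a sketch):

$ docker images | grep registry.aliyuncs.com/google_containers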

Initialize the cluster on the master node
Generate the default kubeadm-config.yaml
and change the following items:

$ kubeadm config print init-defaults > kubeadm-config.yaml

kubernetes-version: the cluster version; it should match the kubeadm version installed above (see https://kubernetes.io/releases/).
pod-network-cidr: the pod network CIDR; it must match the pod network plugin's setting. Flannel's default is 10.244.0.0/16, Calico's default is 192.168.0.0/16.
api-server: the master acts as the API server, so this is the master machine's IP address.
image-repository: the registry to pull images from; the default is k8s.gcr.io.
nodeRegistration.name: change it to master01.
The final file looks like this:

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 101.34.112.190 # the master node's IP address (public)
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: master01  # change to the master's hostname
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers  # defaults to k8s.gcr.io, which is unreachable, so replace it with the Aliyun mirror
kind: ClusterConfiguration
kubernetesVersion: 1.23.6  # the kubernetes version; the value generated by kubeadm config print init-defaults is fine
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16  # the pod CIDR; 10.244.0.0/16 matches flannel's default
scheduler: {}

The configuration above is equivalent to:

$ kubeadm init \
--kubernetes-version=v1.23.6 \
--image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=101.34.112.190 --ignore-preflight-errors=Swap

Or, for a master with a single CPU core (on a 1-core ECS instance you must add --ignore-preflight-errors=NumCPU, otherwise it errors out because k8s requires at least 2 cores):

$ kubeadm init \
--kubernetes-version=v1.23.6 \
--apiserver-advertise-address=101.34.112.190 \
--ignore-preflight-errors=NumCPU \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--image-repository registry.aliyuncs.com/google_containers \
--v=6

PS: I recommend the config-file approach; the long command line is hard to read.

Check the environment

$ kubeadm init phase preflight --config=kubeadm-config.yaml 

This command checks whether the config file is valid and whether the system environment supports a kubeadm install.

Initialize the kubeadm cluster
Run the following command on the master only:

$ kubeadm init --config=kubeadm-config.yaml
PS: this is the hardest part. The original author was stuck here for an entire day and only solved it after digging through all kinds of material, so if you fail here too, that is normal. Compared with deploying k8s on a private network, this is the trickiest and most painful point of a public-network deployment; once this step succeeds, the rest is easy. The final fix came from https://blog.51cto.com/u_15152259/2690063.

At this point there are two possible outcomes:

If you are on a private network and the Docker and kubeadm versions above are correct, it will succeed; skip straight to step 4.
If you are on a cloud server (Tencent Cloud, Alibaba Cloud), it will definitely fail (the reason and the fix follow):
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
// …
[kubelet-check] Initial timeout of 40s passed.
Tip: be sure to run the initialization above first (it generates the k8s config files; otherwise you will not find etcd.yaml in the steps below), and only then, after it fails, carry out the steps below!!

Fix for the failed initialization on cloud servers
1) Edit the etcd config file
The file is at /etc/kubernetes/manifests/etcd.yaml

- --listen-client-urls=https://127.0.0.1:2379,https://101.34.112.190:2379
- --listen-peer-urls=https://101.34.112.190:2380

Change them to:

- --listen-client-urls=https://127.0.0.1:2379
- --listen-peer-urls=https://127.0.0.1:2380

Quoting from 在腾讯云安装K8S集群 (Installing a K8S cluster on Tencent Cloud):
Here "118.195.137.68" is the Tencent Cloud public IP; the flags to care about are "--listen-client-urls" and "--listen-peer-urls". Delete the public IP after --listen-client-urls, and change --listen-peer-urls to 127.0.0.1:2380.
The reason is that any Tencent Cloud instance on a VPC network has its public IP mapped to the private NIC via NAT (look up NAT if you are curious). This is exactly why so many people cannot get a k8s cluster installed on Tencent Cloud or Alibaba Cloud.
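
If you prefer to make that etcd.yaml change non-interactively, a pair of sed one-liners along these lines should do it (a sketch; it assumes the public IP is 101.34.112.190 as in this tutorial and that the manifest still matches the snippet above):

$ sed -i 's#--listen-client-urls=https://127.0.0.1:2379,https://101.34.112.190:2379#--listen-client-urls=https://127.0.0.1:2379#' /etc/kubernetes/manifests/etcd.yaml
$ sed -i 's#--listen-peer-urls=https://101.34.112.190:2380#--listen-peer-urls=https://127.0.0.1:2380#' /etc/kubernetes/manifests/etcd.yaml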

2) Manually stop the processes that already started

Stop kubelet first

$ systemctl stop kubelet

Kill all the kube processes

$ netstat -anp |grep kube
Note: do NOT run kubeadm reset. First systemctl stop kubelet, then find the PIDs manually with netstat -anp | grep kube and force-kill them with kill -9 <pid>. Otherwise the bad etcd config file will be regenerated. This is critical!!!
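
The manual cleanup can be scripted roughly like this (a sketch; it lists the PIDs first so you can sanity-check them before force-killing anything whose command line mentions kube):

$ systemctl stop kubelet
$ pgrep -af kube          # review what is still running
$ pgrep -f kube | xargs -r kill -9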

3) Re-initialize, but skip the checks for files that already exist:

Start kubelet again

$ systemctl start kubelet

Re-initialize, skipping the config-file generation phases so the etcd edits are not overwritten

$ kubeadm init --config=kubeadm-config.yaml --skip-phases=preflight,certs,kubeconfig,kubelet-start,control-plane,etcd
Successful initialization
If everything is configured correctly, the output below appears within seconds and means success; otherwise the init has most likely failed (network timeouts and the like):

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.200:6443 --token abcdef.0123456789abcdef
--discovery-token-ca-cert-hash sha256:af2a6e096cb404da729ef3802e77482f0a8a579fa602d7c071ef5c5415aac748

Save the token and sha256 value from the output above.
That is the snippet below; this is the command that joins worker nodes to the k8s cluster:

kubeadm join 101.34.112.190:6443 --token abcdef.0123456789abcdef
--discovery-token-ca-cert-hash sha256:af2a6e096cb404da729ef3802e77482f0a8a579fa602d7c071ef5c5415aac748
Common errors
Initial timeout of 40s passed
Possibility 1: check the image versions; the timeout can be caused by a version mismatch, a mistake when re-tagging images locally, or etcd failing to start because of the public IP. Run journalctl -xeu kubelet to see the exact error, or journalctl -f -u kubelet to watch the initialization output live; run kubeadm reset before the next initialization attempt.
Possibility 2: the CentOS version is too old; 7.8 or later is recommended. It failed for me on both 7.2 and 7.5 and only succeeded after yum update -y upgraded the system to 7.9.
If you forget the certificate hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2> /dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
If you forget the token
kubeadm token list
Configure kubectl (master)
Prepare the config file
kubectl must be authenticated and authorized by the API server before it can perform management operations. A kubeadm-deployed cluster generates an admin-privileged kubeconfig for it at /etc/kubernetes/admin.conf, which kubectl loads from its default path "$HOME/.kube/config".

Copy the config file to kubectl's default load path:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Use kubectl to inspect the cluster
Run this on the master node to print cluster information:

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 NotReady control-plane,master 15m v1.23.6

$ kubectl get cs
etcd-0 Healthy {"health":"true","reason":""}
controller-manager Healthy ok
scheduler Healthy ok
The STATUS is NotReady because the network has not been configured yet; that is covered next.

Install the CNI network (master)
$ curl https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml>>kube-flannel.yml
$ chmod 777 kube-flannel.yml
$ kubectl apply -f kube-flannel.yml
Wait a few minutes and check the master node again; its status changes from NotReady to Ready:

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready control-plane,master 15m v1.23.6
Allow pods to be scheduled on the master node
At this point the k8s master node is installed. To avoid wasting cloud-server resources, let the master node run pods too by executing the commands below.

Check the scheduling policy (taints)
$ kubectl describe node|grep -E "Name:|Taints:"

Name: master01
Taints: node-role.kubernetes.io/master:NoSchedule
NoSchedule: pods will never be scheduled here
PreferNoSchedule: avoid scheduling here if possible
NoExecute: new pods are not scheduled, and existing pods on the node are evicted
Make the master node schedulable
$ kubectl taint nodes --all node-role.kubernetes.io/master-
Verify it took effect
$ kubectl describe node|grep -E "Name:|Taints:"
Name: master01
Taints:
Join the worker nodes to the cluster
On each node, run the command from the kubeadm output above (note that your token and sha256 value will differ):

$ kubeadm join 192.168.1.200:6443 --token abcdef.0123456789abcdef
--discovery-token-ca-cert-hash sha256:af2a6e096cb404da729ef3802e77482f0a8a579fa602d7c071ef5c5415aac748

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster…
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap…

This node has joined the cluster:

  • Certificate signing request was sent to apiserver and a response was received.
  • The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Now run the following on the master and you can see the node has joined successfully:

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready control-plane,master 60m v1.23.6
node01 NotReady 54s v1.23.6
After about 5 minutes, node01's status changes to Ready.

Repeat the same steps on the other node machine to join it to the cluster!

Test the cluster
Create an nginx pod
Run the following on the master node:

$ kubectl run --image=nginx nginx-app --port=80
$ kubectl run --image=nginx nginx-app1 --port=81
Then run:

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-app 0/1 ContainerCreating 0 18s

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-app 1/1 Running 0 26s

$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-app 1/1 Running 0 57s 10.244.1.2 node01
The pods are in Running state, which proves the k8s cluster was installed successfully.

Dashboard web UI installation
See the next article: k8s dashboard installation.
————————————————
Copyright notice: the tutorial quoted above is an original article by the CSDN blogger 「Go和分布式IM」, licensed under CC 4.0 BY-SA; reproduction requires including the original source link and this notice.
Original link: https://blog.csdn.net/xmcy001122/article/details/127221661

Before going further, remember to set up the private-to-public address mapping; the k8s nodes talk to each other via their private addresses.

Map private IPs to public IPs

Run this on every node; every machine's private/public mapping must be configured on each node.

There is also the cluster-network issue: k8s nodes reach each other via private addresses, which are unreachable between different cloud providers. Use iptables to map each private IP to the corresponding public IP, and use kubectl annotations to associate each node name with its public IP.

iptables -t nat -A OUTPUT -d <private IP> -j DNAT --to-destination <public IP>
iptables -t nat -A OUTPUT -d <private IP> -j DNAT --to-destination <public IP>
iptables -t nat -A OUTPUT -d <private IP> -j DNAT --to-destination <public IP>

kubectl annotate node k8s-master-0 flannel.alpha.coreos.com/public-ip-overwrite=<node public IP> --overwrite
kubectl annotate node k8s-master-1 flannel.alpha.coreos.com/public-ip-overwrite=<node public IP> --overwrite
kubectl annotate node k8s-master-2 flannel.alpha.coreos.com/public-ip-overwrite=<node public IP> --overwrite
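
To confirm the mapping is in place, something like this should show the DNAT rules and each node's public-ip-overwrite annotation (a sketch):

iptables -t nat -L OUTPUT -n | grep DNAT
kubectl get nodes -o wide
kubectl describe node k8s-master-0 | grep public-ip-overwrite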

Then install KubeSphere

Before installing it, set up NFS first and mount it as the default storage class.
NFS setup
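
I will not repeat the NFS article here, but the server side boils down to roughly the following on CentOS (a sketch with an assumed export path of /data/nfs; the nfs-client provisioner / default StorageClass part is what the linked article covers):

yum install -y nfs-utils
mkdir -p /data/nfs
echo "/data/nfs *(rw,sync,no_root_squash)" >> /etc/exports
systemctl enable rpcbind nfs-server
systemctl start rpcbind nfs-server
exportfs -arv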

Then install KubeSphere following the official docs.
KubeSphere setup

When downloading KubeSphere, do NOT follow the online advice to flip every false in cluster-configuration.yaml to true. I got burned by this before I understood it: those flags are KubeSphere's pluggable components, and weak servers cannot handle them all, so enable them one at a time and see how it goes. I enabled just the logging component and it blew up my master.
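
If you do want to turn a component on later, the KubeSphere docs describe editing the ks-installer ClusterConfiguration in place rather than re-applying the whole file; roughly (a sketch, using metrics_server as the example and then tailing the installer logs):

kubectl -n kubesphere-system edit clusterconfiguration ks-installer
# set, e.g., spec.metrics_server.enabled from false to true, then save and exit
kubectl -n kubesphere-system logs deploy/ks-installer -f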

Then came the errors.

Cannot get pods from the worker nodes

k8s error: The connection to the server localhost:8080 was refused
When running kubectl on a k8s worker node, e.g. kubectl get pods --all-namespaces, the following error appears:

[root@k8s-node239 ~]# kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Solutions

I usually use solution 2.

Solution 1: log in as a non-root user, then run the following:

sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

Solution 2:

  The root cause is that kubectl needs to run as the kubernetes-admin identity; the "kubeadm init" step that bootstraps the cluster generated "/etc/kubernetes/admin.conf".

So the fix is to copy /etc/kubernetes/admin.conf from the master node to the same directory on each worker node:

# Copy admin.conf; run these commands on the master node
scp /etc/kubernetes/admin.conf 172.16.2.202:/etc/kubernetes/admin.conf
scp /etc/kubernetes/admin.conf 172.16.2.203:/etc/kubernetes/admin.conf
Then set the environment variable on each worker node:

# Set the kubeconfig file
export KUBECONFIG=/etc/kubernetes/admin.conf
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
