Quick notes on problems I ran into installing K8S and deploying prometheus on a campus network


    • Configuration
      • Common commands
      • Problems and the fixes I found

Configuration

192.168.129.160 k8s1
192.168.129.162 k8s2
192.168.129.164 k8s3
192.168.129.165 k8s4
192.168.129.166 k8s5
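Every VM runs CentOS 7.4 with 4 GB of RAM, 2 CPUs, and a 60 or 80 GB disk. The campus network is an 8 Mbps broadband link, and an unstable one at that. I did not want to go out to external networks, so I used a lot of mirror sources.
The main installation tutorials I followed: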

  1. https://blog.csdn.net/qq_40907977/article/details/103328864
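  2. Author: HoPGoldy
    Link: https://www.jianshu.com/p/f53650a85131
    Source: Jianshu
    Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, credit the source.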

Many thanks to the kind bloggers for sharing!

Common commands

systemctl stop firewalld && systemctl disable firewalld
docker info | grep Cgroup

Viewing logs

journalctl -xefu kubelet 
kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
kubectl create -f manifests/setup && \
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done && \
kubectl create -f manifests/
kubectl get pod --all-namespaces

kubectl replace --force -f kube-state-metrics-rbac.yaml
kubectl replace --force -f kube-state-metrics-deployment.yaml
kubectl replace --force -f kube-state-metrics-service.yaml

journalctl is very handy for viewing logs

journalctl -u kube-scheduler
journalctl -xefu kubelet
journalctl -u kube-apiserver
journalctl -u kubelet |tail
journalctl -xe

Viewing system logs

cat /var/log/messages
kubectl describe pod kubernetes-dashboard-849cd79b75-s2snt --namespace kube-system
kubectl logs -f pods/monitoring-influxdb-fc8f8d5cd-dbs7d -n kube-system
kubectl logs --tail 200 -f kube-apiserver -n kube-system |more
kubectl logs --tail 200 -f podname -n jenkins

Viewing logs with docker

docker logs c36c56e4cfa3  # container ID

journalctl -u kubelet -f  # follow the cluster log

kubectl replace --force -f prometheus-configmap.yaml

Problems and the fixes I found

  1. Disable swap: https://www.jianshu.com/p/6dae5c2c4dab
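A minimal sketch of making the swap change survive a reboot, assuming swap is declared in an fstab-style file. The function name `disable_swap_in_fstab` and its optional path argument are my own illustration and are not taken from the linked article.

```shell
# Sketch: comment out active swap entries in an fstab-style file.
# The function name and the default path are illustrative assumptions.
disable_swap_in_fstab() {
    local fstab="${1:-/etc/fstab}"
    # Prefix '#' to every non-comment line that has "swap" as a whole field.
    # A backup copy is left next to the file as <file>.bak.
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Typical use on a node (as root):
#   swapoff -a               # turn swap off for the running system
#   disable_swap_in_fstab    # keep it off after the next reboot
```

Passing a path makes it possible to try the function on a copy of /etc/fstab before touching the real file.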

  2. cat >> /etc/hosts << EOF
    192.168.129.160 k8s1
    192.168.129.162 k8s2
    192.168.129.164 k8s3
    192.168.129.165 k8s4
    192.168.129.166 k8s5
    EOF
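The heredoc above appends the same lines again every time it is rerun. A possible idempotent variant is sketched below; `add_host_entry` is a hypothetical helper name, and the third argument exists only so the function can be tried on a scratch file.

```shell
# Sketch: append "IP NAME" to a hosts file only if that exact line is missing.
# add_host_entry is a hypothetical helper, not part of any tool.
add_host_entry() {
    local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$ip $name" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Typical use (as root), once per machine in the cluster:
#   add_host_entry 192.168.129.160 k8s1
#   add_host_entry 192.168.129.162 k8s2
```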

  3. After installing docker, be sure to do the following:
    Edit or create /etc/docker/daemon.json and add:
    {
    "exec-opts": ["native.cgroupdriver=systemd"]
    }
    Restart docker: systemctl restart docker
    Otherwise the later K8S initialization prints a warning about the cgroup driver: [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
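The daemon.json step can be scripted roughly as follows. This is a sketch under one loud assumption: there are no other settings in daemon.json worth keeping, because the file is overwritten. `write_docker_daemon_json` is a made-up name.

```shell
# Sketch: write a daemon.json that switches docker to the systemd cgroup driver.
# WARNING (assumption): this overwrites any existing daemon.json in the directory.
write_docker_daemon_json() {
    local dir="${1:-/etc/docker}"
    mkdir -p "$dir"
    cat > "$dir/daemon.json" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}

# Typical use (as root):
#   write_docker_daemon_json
#   systemctl restart docker
#   docker info | grep Cgroup    # the driver line is expected to say systemd
```

Note the straight double quotes: JSON does not accept typographic quotes, which is an easy mistake when copying the snippet from a web page.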

  4. To re-initialize K8S after an error, use kubeadm. Run: kubeadm reset
    Then initialize again: kubeadm init ...

  5. Warnings appeared while deploying the Kubernetes Master:

[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.0-ce. Latest validated version: 18.09

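The installed docker version was too old and needed upgrading; see: https://zhuanlan.zhihu.com/p/28154147
6. Changing the hostname on CentOS requires edits in two places: /etc/sysconfig/network and /etc/hosts. Changing only one of them causes problems at boot.
7. Installing kubeadm and kubelet on the nodes:
There was a warning: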

warning: /var/cache/yum/x86_64/7/kubernetes/packages/697ad1a31f01e90f44ad3f0c8fe06f32d7bdd3b227fcf705275d5ad241d52eb6-kubeadm-1.16.0-0.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 3e1ba8d5: NOKEY
Public key for 697ad1a31f01e90f44ad3f0c8fe06f32d7bdd3b227fcf705275d5ad241d52eb6-kubeadm-1.16.0-0.x86_64.rpm is not installed

Related to this command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml

In my case the problem was in kubernetes-dashboard.yaml itself.
Download kubernetes-dashboard.yaml again: https://codeload.github.com/liangbirui/dashboard/zip/v1.10.1
Use vim to edit the default image address in the yaml file; here I replaced it with Li Zhenliang's image.
containers:
- name: kubernetes-dashboard
  #image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
  image: lizhenliang/kubernetes-dashboard-amd64:v1.10.1
See https://www.cnblogs.com/imstrive/p/11480424.html

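  8. error: unable to recognize "kubernetes-dashboard.yaml": no matches for kind "Deployment" in version "apps/v1beta2"
    Dropping the beta2 suffix, so that the apiVersion reads apps/v1, fixes it. This is presumably a version issue, since it is now 2020.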
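For the `no matches for kind "Deployment"` error, the manual edit can also be done with sed. A minimal sketch, assuming the only change needed is the apiVersion string; the helper name `fix_deployment_api_version` is hypothetical.

```shell
# Sketch: rewrite the removed apps/v1beta2 API version to apps/v1 in a manifest.
# fix_deployment_api_version is a hypothetical helper name.
fix_deployment_api_version() {
    local manifest="$1"
    # Keeps the untouched original as <manifest>.bak.
    sed -i.bak 's#apps/v1beta2#apps/v1#g' "$manifest"
}

# Typical use:
#   fix_deployment_api_version kubernetes-dashboard.yaml
#   kubectl apply -f kubernetes-dashboard.yaml
```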
  9. Opening the dashboard UI:
    swapoff -a  # temporary
    kubectl get pods,svc -n kube-system  # find the port the k8s UI is exposed on
    Open https://(IP):(port) in Firefox
    A serviceaccount for the dashboard has already been created; run kubectl get secret -n kube-system to find the name of its secret
    kubectl describe secret dashboard-admin-token-nzkww -n kube-system  # get the token
    Enter the token to log in to the k8s UI
  10. echo 'KUBELET_EXTRA_ARGS=--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice' > /etc/sysconfig/kubelet
    Error: Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
    Fix: sudo vim /etc/sysconfig/kubelet
    Append to the end of the DAEMON_ARGS string:
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

Also edit /usr/lib/systemd/system/kubelet.service and change: ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Then restart K8S and docker
11. Installing cadvisor:

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
  12. Every command such as kubectl get nodes printed the error: Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
    Fix:
    https://blog.csdn.net/woay2008/java/article/details/93250137
    https://blog.csdn.net/woay2008/article/details/93250137
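  13. This tutorial did not suit my environment: https://www.jianshu.com/p/ac8853927528
  14. Steps to restart the K8S cluster:
    master:
    Run kubeadm reset to clear all cluster configuration
    Run rm -rf $HOME/.kube to delete that directory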
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.16.0 --apiserver-advertise-address 192.168.129.160 --pod-network-cidr=10.244.0.0/16 --token-ttl 0
mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
 sudo chown $(id -u):$(id -g) $HOME/.kube/config

node:
Run kubeadm reset to clear all cluster configuration
kubeadm join…
master:

kubectl apply -f kube-flannel.yml
kubectl get nodes

After the restart, the dashboard can be brought up again:
kubectl apply -f kubernetes-dashboard.yaml
kubectl create serviceaccount dashboard-admin -n kube-system
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
Visit https://192.168.129.160:30001 and enter the token obtained from the command above
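The describe output contains several fields besides the token, so copying it by hand is error-prone. A small sketch that filters out just the token value; `extract_token` is a hypothetical helper that reads the `kubectl describe secret` text on stdin and assumes the usual `token:      <value>` line.

```shell
# Sketch: print only the token value from `kubectl describe secret` output.
# extract_token is a hypothetical helper; it assumes a line of the form
# "token:      <value>" in the input.
extract_token() {
    awk '$1 == "token:" { print $2 }'
}

# Typical use on the master:
#   name=$(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
#   kubectl -n kube-system describe secret "$name" | extract_token
```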

  15. A pod was stuck in Pending, with the error: Warning FailedScheduling default-scheduler pod has unbound immediate PersistentVolumeClaims
    Fix: https://www.kuboard.cn/learning/k8s-advanced/ts/application.html#pod%E4%B8%80%E7%9B%B4%E6%98%AFpending
  16. Following https://blog.csdn.net/qq_40907977/article/details/103328864, the deployment failed
  17. I deployed Kuboard on k8s-master, following https://kuboard.cn/learning/
    It needs the calico network: kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
    Problems came up during installation: https://www.cnblogs.com/cuishuai/p/9897006.html
  18. Problem: the prometheus-0 pod stayed in Pending. kubectl describe pod prometheus-0 -n kube-system showed:
    pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
    Fix: combine the NFS installation from https://www.jianshu.com/p/3de37b8dc416 with the three yaml files from https://www.cnblogs.com/xiangsikai/p/11424245.html
  19. kubectl replace --force -f xxxx.yaml forcibly replaces the Pod's API object, which has the effect of restarting it
  20. https://www.kubernetes.org.cn/7189.html Deploying kubernetes 1.18 on CentOS 8 with kubeadm
  21. Configuring the prometheus data source in grafana failed with: Failed to update datasource
    Fix: https://www.cnblogs.com/whych/p/10793709.html
    What I did was edit grafana.yaml, add a version number to it, and then run kubectl replace --force -f grafana.yaml
  22. The prometheus UI reported: Warning! Detected 28802.76 seconds time difference between your browser and the server. Prometheus relies on accurate time and time drift might cause unexpected query results.
    Fix: https://www.cnblogs.com/director/p/12821016.html
    Do not change the time zone! Keep the VMs and the server all on UTC, because prometheus uses UTC by default!
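A quick bit of arithmetic on the number in that warning. The sketch below converts the reported difference to hours; `skew_hours` is a hypothetical helper. The result is almost exactly 8 hours, which is the size of a UTC+8 offset. That makes a time-zone mix-up look more likely than gradual clock drift, though this is my reading of the number and not something the warning itself states.

```shell
# Sketch: convert a clock difference in seconds to hours (two decimals).
# skew_hours is a hypothetical helper name.
skew_hours() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 3600 }'
}

skew_hours 28802.76    # prints 8.00
```

Comparing `date -u` on the VM, the host server, and the machine running the browser is a simple way to confirm which clock is off.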
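  23. The prometheus UI reported: No datapoints found. Material found through Baidu suggested the system time was wrong; I tried changing the time and the time zone in various ways and the problem remained. Searching on Google showed the real cause: the corresponding exporter was not running, or was not integrated with Prometheus, so no data was being collected.
    Fix:
    Bring in the corresponding exporters (mysqld_exporter, node_exporter, and so on), run them, and have Prometheus scrape their data
    I added the following to prometheus-configmap.yaml:
    - job_name: 'node'
      static_configs:
      - targets:
        - 192.168.129.160:9100
        - 192.168.129.162:9100
        - 192.168.129.164:9100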
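Since the target list has to be edited every time a node is added, it can help to generate the job block from a list of IPs. A minimal sketch: `node_job_yaml` is a hypothetical helper, port 9100 is taken from the snippet above, and the output still has to be indented to fit under `scrape_configs:` in the actual ConfigMap.

```shell
# Sketch: print a static 'node' scrape job for the given IPs.
# node_job_yaml is a hypothetical helper; port 9100 follows the config above.
node_job_yaml() {
    printf '%s\n' "- job_name: 'node'" "  static_configs:" "  - targets:"
    local ip
    for ip in "$@"; do
        printf '    - %s:9100\n' "$ip"
    done
}

# Typical use:
#   node_job_yaml 192.168.129.160 192.168.129.162 192.168.129.164
```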
  24. After syncing the time on the server and the VMs, the nodes on the prometheus targets page went DOWN, and all prometheus pods had to be restarted
    That is:
kubectl replace --force -f prometheus-configmap.yaml
kubectl replace --force -f prometheus-statefulset.yaml
kubectl replace --force -f prometheus-service.yaml

After a few minutes everything was fine

  25. Viewing logs with kubectl
    Note: when using kubectl describe, always pass the namespace, otherwise you get the following error
[root@node2 ~]# kubectl describe pod coredns-6c65fc5cbb-8ntpv
Error from server (NotFound): pods "coredns-6c65fc5cbb-8ntpv" not found
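When the namespace of a pod is not known, it can be looked up from the all-namespaces listing first. A sketch under the assumption that the listing has the usual NAMESPACE and NAME columns in positions one and two; `pod_namespace` is a hypothetical helper that reads that listing on stdin.

```shell
# Sketch: find the namespace of a pod from `kubectl get pods --all-namespaces`.
# pod_namespace is a hypothetical helper; it assumes column 1 is NAMESPACE
# and column 2 is NAME, and skips the header row.
pod_namespace() {
    awk -v pod="$1" 'NR > 1 && $2 == pod { print $1 }'
}

# Typical use:
#   ns=$(kubectl get pods --all-namespaces | pod_namespace coredns-6c65fc5cbb-8ntpv)
#   kubectl describe pod coredns-6c65fc5cbb-8ntpv -n "$ns"
```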
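  26. Hot-reloading the configuration file only works if prometheus was started with the --web.enable-lifecycle flag
  27. Worker nodes need at least 2 GB of RAM and 2 CPUs
  28. The master node needs at least 4 GB of RAM and 4 CPUs
  29. The join command failed with: /proc/sys/net/ipv4/ip_forward contents are not set to 1
    Fix: set it to 1 as the message suggests
    echo "1" >/proc/sys/net/ipv4/ip_forward
    echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
  30. When adding a new node, remember to add its IP to prometheus-configmap.yaml and to update /etc/docker/daemon.json on it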
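Values written with echo into /proc/sys are lost on reboot, so the ip_forward error from the join step can come back. One way to make the two settings persistent is a sysctl drop-in file, sketched below. The file name k8s.conf and the helper name `write_k8s_sysctl_conf` are assumptions of mine.

```shell
# Sketch: persist the two kernel settings needed by kubeadm join.
# write_k8s_sysctl_conf and the file name k8s.conf are illustrative assumptions.
write_k8s_sysctl_conf() {
    local dir="${1:-/etc/sysctl.d}"
    mkdir -p "$dir"
    cat > "$dir/k8s.conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
}

# Typical use (as root):
#   modprobe br_netfilter    # the bridge-nf-call-* keys exist only with this module loaded
#   write_k8s_sysctl_conf
#   sysctl --system          # reload all sysctl configuration files
```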
