Tungsten Fabric in Practice: Pitfalls of a K8s-Based Deployment

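Tungsten Fabric (formerly OpenContrail) provides a controller that works together with an orchestrator (OpenStack/K8s/vCenter), plus a vRouter that is deployed on each compute node/node and managed by that controller. The vRouter replaces the original linux-bridge/OVS for traffic forwarding.
Author: Liu Jingyi (刘敬一)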

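Preface

The best way to study an open-source controller is to deploy one first, in whatever way is most convenient.

I first went to TF's GitHub, but run.sh hung in both tf-devstack and tf-dev-env.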

[setup contrail git sources]
INFO: source env from /root/contrail/.env/tf-developer-sandbox.env
INFO: current folder is
100  2584  100  2584    0     0    934      0  0:00:02  0:00:02 --:--:--   933
INFO: Download repo tool
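  1. Found the WeChat official account of the TF Chinese community (TF中文社区, CTFSDN), added them on WeChat, and was invited into the TF discussion group
  2. With guidance from Wu sir (吴sir) and Yang sir (杨sir) in the group, started deploying by following these articles

Part 1: Deployment preparation and initial state
Part 2: Creating a virtual network
Part 3: Creating a security policy
Part 4: Creating an isolated namespace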

Hands-on record

Initial preparation

  • Create three CentOS 7.7 virtual machines
deployer 192.168.122.160
master01 192.168.122.96  <--- at least 8 GB of memory
node01 192.168.122.250

#cat /etc/redhat-release 
CentOS Linux release 7.7.1908 (Core)

pip acceleration via aliyun

  • Configure the pip mirror on every node
mkdir -p ~/.pip && tee ~/.pip/pip.conf <<-'EOF'
[global]
trusted-host =  mirrors.aliyun.com
index-url = https://mirrors.aliyun.com/pypi/simple
EOF

Docker image acceleration via aliyun

  • There are plenty of tutorials online; the accelerator address below is masked with **
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://********.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Some source files

  • Many of the required installation files are hosted on the server http://35.220.208.0/; adapt the commands to the actual links
mkdir pkg_python/
cd pkg_python/
wget http://35.220.208.0/packages_python/pip-19.3.1.tar.gz
easy_install pip-19.3.1.tar.gz
easy_install --upgrade --dry-run pip
wget http://35.220.208.0/packages_python/docker_compose-1.24.1-py2.py3-none-any.whl
pip2 install docker_compose-1.24.1-py2.py3-none-any.whl

mkdir /root/pkg_k8s
cd /root/pkg_k8s
wget http://35.220.208.0/k8s_v1.12.9/packages/auto_download.sh
chmod +x auto_download.sh
./auto_download.sh
  • The following warnings appeared, but they seem to have no real impact
[root@localhost pkg_python]# easy_install --upgrade --dry-run pip
Searching for pip
Reading https://pypi.python.org/simple/pip/
Best match: pip 20.0.2
Downloading https://files.pythonhosted.org/packages/8e/76/66066b7bc71817238924c7e4b448abdb17eb0c92d645769c223f9ace478f/pip-20.0.2.tar.gz#sha256=7db0c8ea4c7ea51c8049640e8e6e7fde949de672bfa4949920675563a5a6967f
Processing pip-20.0.2.tar.gz
Writing /tmp/easy_install-bm8Ztx/pip-20.0.2/setup.cfg
Running pip-20.0.2/setup.py -n -q bdist_egg --dist-dir /tmp/easy_install-bm8Ztx/pip-20.0.2/egg-dist-tmp-32s9sn
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
  warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
warning: no files found matching 'docs/docutils.conf'
warning: no previously-included files found matching '.coveragerc'
warning: no previously-included files found matching '.mailmap'
warning: no previously-included files found matching '.appveyor.yml'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.readthedocs.yml'
warning: no previously-included files found matching '.pre-commit-config.yaml'
warning: no previously-included files found matching 'tox.ini'
warning: no previously-included files found matching 'noxfile.py'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no files found matching '*.bat' under directory 'docs'
warning: no previously-included files found matching 'src/pip/_vendor/six'
warning: no previously-included files found matching 'src/pip/_vendor/six/moves'
warning: no previously-included files matching '*.pyi' found under directory 'src/pip/_vendor'
no previously-included directories found matching '.github'
no previously-included directories found matching '.azure-pipelines'
no previously-included directories found matching 'docs/build'
no previously-included directories found matching 'news'
no previously-included directories found matching 'tasks'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'tools'
warning: install_lib: 'build/lib' does not exist -- no Python modules to install

[root@localhost pkg_python]# 

Local registry

  • Run the registry container locally, mapping host port 80 to container port 5000
[root@deployer ~]# docker run -d -p 80:5000 --restart=always --name registry registry:2
0c17a03ebdffe3cea98d7cec42c268c1117241f236f9f2443bbb1b77d34b0082
[root@deployer ~]# 
[root@deployer ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES
0c17a03ebdff        registry:2          "/entrypoint.sh /etc…"   About an hour ago   Up About an hour    0.0.0.0:80->5000/tcp   registry
[root@deployer ~]# 

Configure the YAML file

  • After obtaining contrail-ansible-deployer, enter its directory and edit instances.yaml
[root@deployer inventory]# vim  ../config/instances.yaml

provider_config:
  bms:
   ssh_pwd: Password
   ssh_user: root
   ssh_public_key: /root/.ssh/id_rsa.pub
   ssh_private_key: /root/.ssh/id_rsa
   domainsuffix: local
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics_database:
      analytics:
      webui:
      k8s_master:
      kubemanager:
    ip: 192.168.122.96
  bms2:
    provider: bms
    roles:
      vrouter:
      k8s_node:
    ip: 192.168.122.250
global_configuration:
  CONTAINER_REGISTRY: hub.juniper.net
contrail_configuration:
  CONTRAIL_VERSION: 1912-latest
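  • Replace CONTAINER_REGISTRY with the local registry, and set the contrail version to 1912-latest so that it matches the tag used when the images are pulled and retagged later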

Set up passwordless login

  • The deployer must be able to log in to itself/master01/node01 without typing a password
#ssh-keygen -t rsa

#ssh-copy-id -i ~/.ssh/id_rsa.pub root@master01
#ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
#ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
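
Before running ansible it can help to confirm that key-based login really works for every target. The following is a minimal sketch, not part of the original procedure: check_passwordless and SSH_BIN are hypothetical names, and it assumes the root user and that the host names resolve from the deployer.

```shell
#!/bin/bash
# Sketch: confirm key-based login works for every host before running ansible.
# check_passwordless and SSH_BIN are hypothetical names introduced for illustration.
SSH_BIN="${SSH_BIN:-ssh}"

check_passwordless() {
  local host rc=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true </dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}

# Example (run on the deployer): check_passwordless localhost master01 node01
```

Any host reported as FAIL still needs ssh-copy-id.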

ansible

  • Running ansible on the deployer prints a warning
/usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.3) or chardet (2.2.1) doesn't match a supported version!
  RequestsDependencyWarning)

The fix is

pip uninstall urllib3    
pip uninstall chardet
pip install requests 

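Pull images

  • The k8s images are fine, since aliyun acceleration is available
  • The contrail source, hub.juniper.net, requires a Juniper account, so it has to be replaced with opencontrailnightly
  • Yang sir (杨sir) provided scripts that pull the images and push them to the local registry, so that master/node can later pull directly from the deployer's registry
  • With the latest contrail-ansible-deployer code, one more image is needed: contrail-provisioner
  • Before running the scripts, however, the local registry IP must be configured as an insecure-registry, so that images can be transferred over http instead of https
  • One way to do this is to edit /etc/docker/daemon.json (create it if it does not exist)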
[root@node01 ~]# cat /etc/docker/daemon.json 
{
  "insecure-registries": [ "hub.juniper.net","k8s.gcr.io" ]
}
[root@node01 ~]# 

Then

[root@deployer ~]# systemctl daemon-reload
[root@deployer ~]# systemctl restart docker
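
Because /etc/docker/daemon.json was already written once for the aliyun mirror, rewriting it with only insecure-registries would drop the registry-mirrors entry. A minimal sketch that renders both keys into one file follows; render_daemon_json is a hypothetical helper, the mirror URL is the masked placeholder from earlier, and 192.168.122.160 is assumed to be the deployer registry published on port 80.

```shell
#!/bin/bash
# Hypothetical helper: print a daemon.json that keeps the mirror and adds insecure registries.
# $1 = registry mirror URL, remaining args = registries reachable over plain http.
render_daemon_json() {
  local mirror="$1"; shift
  local entries="" reg
  for reg in "$@"; do
    # Build a comma-separated list of quoted registry names.
    entries="${entries:+$entries, }\"$reg\""
  done
  printf '{\n  "registry-mirrors": ["%s"],\n  "insecure-registries": [%s]\n}\n' "$mirror" "$entries"
}

# Example: review the output, then redirect it to /etc/docker/daemon.json as root.
render_daemon_json "https://********.mirror.aliyuncs.com" "192.168.122.160"
```

After writing the file, restart docker as shown above for the change to take effect.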
  • The scripts are below, already changed to use the deployer's IP
# Prepare the Kubernetes offline images by running the following script

#!/bin/bash
#Author: Alex Yang 

set -e

REPOSITORIE="gcr.azk8s.cn/google_containers"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $REPOSITORIE/$img
  echo "===Retag image ["$img"]"
  docker tag $REPOSITORIE/$img $LOCAL_REPO/$img
  echo "===Pushing image: "$LOCAL_REPO/$img
  docker push $LOCAL_REPO/$img
  docker rmi $REPOSITORIE/$img
done

# Prepare the Tungsten Fabric offline images by running the following script

#!/bin/bash
#Author: Alex Yang 

set -e

REGISTRY_URL=opencontrailnightly
LOCAL_REGISTRY_URL=192.168.122.160
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"

IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES

for image in $IMAGES
do
  echo "===Pulling image: "$image
  docker pull $REGISTRY_URL/$image:$IMAGE_TAG
  echo "===Retag image ["$image"]"
  docker tag $REGISTRY_URL/$image:$IMAGE_TAG $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  echo "===Pushing image: "$LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  docker push $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
  docker rmi $REGISTRY_URL/$image:$IMAGE_TAG
done
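
Both scripts repeat the same pull, retag, push, remove sequence. As a sketch only (mirror_image, run and DRY_RUN are hypothetical names, not part of Yang sir's scripts), the sequence can be wrapped in one function with a dry-run switch, which makes it easy to preview what will be executed, for example when adding contrail-provisioner.

```shell
#!/bin/bash
# Sketch: pull, retag, push and clean up one image; DRY_RUN=1 only prints the commands.
# mirror_image, run and DRY_RUN are hypothetical names introduced for illustration.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

mirror_image() {
  local src_repo="$1" dst_repo="$2" image="$3"
  # Stop at the first failing step so a bad pull is never tagged or pushed.
  run docker pull "$src_repo/$image" &&
    run docker tag "$src_repo/$image" "$dst_repo/$image" &&
    run docker push "$dst_repo/$image" &&
    run docker rmi "$src_repo/$image"
}

# Example: preview the commands for one Tungsten Fabric image.
DRY_RUN=1 mirror_image opencontrailnightly 192.168.122.160 contrail-status:1912-latest
```

Without DRY_RUN=1 the same call runs the docker commands for real.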
  • Check the image list
[root@deployer ~]# docker image list
REPOSITORY                                              TAG                 IMAGE ID            CREATED             SIZE
ubuntu                                                  latest              72300a873c2c        3 weeks ago         64.2MB
registry                                                2                   708bc6af7e5e        7 weeks ago         25.8MB
registry                                                latest              708bc6af7e5e        7 weeks ago         25.8MB
192.168.122.160/contrail-vrouter-kernel-init            1912-latest         92e9cce315a5        3 months ago        581MB
192.168.122.160/contrail-vrouter-agent                  1912-latest         e8d9457d740e        3 months ago        729MB
192.168.122.160/contrail-status                         1912-latest         d2264c6741a5        3 months ago        513MB
192.168.122.160/contrail-nodemgr                        1912-latest         c3428aa7e9b7        3 months ago        523MB
192.168.122.160/contrail-node-init                      1912-latest         c846ff071cc8        3 months ago        506MB
192.168.122.160/contrail-kubernetes-kube-manager        1912-latest         983a6307731b        3 months ago        517MB
192.168.122.160/contrail-kubernetes-cni-init            1912-latest         45c88538c834        3 months ago        525MB
192.168.122.160/contrail-external-zookeeper             1912-latest         6937c72b866c        3 months ago        290MB
192.168.122.160/contrail-external-rsyslogd              1912-latest         812ba27a4e08        3 months ago        304MB
192.168.122.160/contrail-external-redis                 1912-latest         3dc79f0b6eb9        3 months ago        129MB
192.168.122.160/contrail-external-rabbitmq              1912-latest         a98ac91667b2        3 months ago        256MB
192.168.122.160/contrail-external-kafka                 1912-latest         7b5a2ce6a656        3 months ago        665MB
192.168.122.160/contrail-external-cassandra             1912-latest         20109c39696c        3 months ago        545MB
192.168.122.160/contrail-controller-webui-web           1912-latest         44054aa131c5        3 months ago        552MB
192.168.122.160/contrail-controller-webui-job           1912-latest         946e2bbd7451        3 months ago        552MB
192.168.122.160/contrail-controller-control-named       1912-latest         81ef8223a519        3 months ago        575MB
192.168.122.160/contrail-controller-control-dns         1912-latest         15c1ce0cf26e        3 months ago        575MB
192.168.122.160/contrail-controller-control-control     1912-latest         ec195cc75705        3 months ago        594MB
192.168.122.160/contrail-controller-config-svcmonitor   1912-latest         3d53781422be        3 months ago        673MB
192.168.122.160/contrail-controller-config-stats        1912-latest         46bc77cf1c87        3 months ago        506MB
192.168.122.160/contrail-controller-config-schema       1912-latest         75acb8ed961f        3 months ago        673MB
192.168.122.160/contrail-controller-config-dnsmasq      1912-latest         dc2980441d51        3 months ago        506MB
192.168.122.160/contrail-controller-config-devicemgr    1912-latest         c08868a27a0a        3 months ago        772MB
192.168.122.160/contrail-controller-config-api          1912-latest         f39ca251b475        3 months ago        706MB
192.168.122.160/contrail-analytics-snmp-topology        1912-latest         5ee37cbbd034        3 months ago        588MB
192.168.122.160/contrail-analytics-snmp-collector       1912-latest         29ae502fb74f        3 months ago        588MB
192.168.122.160/contrail-analytics-query-engine         1912-latest         b5f937d6b6e3        3 months ago        588MB
192.168.122.160/contrail-analytics-collector            1912-latest         ee1bdbcc460a        3 months ago        588MB
192.168.122.160/contrail-analytics-api                  1912-latest         ac5c8f7cef89        3 months ago        588MB
192.168.122.160/contrail-analytics-alarm-gen            1912-latest         e155b24a0735        3 months ago        588MB
192.168.10.10/kube-proxy                                v1.12.9             295526df163c        9 months ago        95.7MB
192.168.122.160/kube-proxy                              v1.12.9             295526df163c        9 months ago        95.7MB
192.168.122.160/kube-controller-manager                 v1.12.9             f473e8452c8e        9 months ago        164MB
192.168.122.160/kube-apiserver                          v1.12.9             8ea704c2d4a7        9 months ago        194MB
192.168.122.160/kube-scheduler                          v1.12.9             c79506ccc1bc        9 months ago        58.4MB
192.168.122.160/coredns                                 1.2.6               f59dcacceff4        16 months ago       40MB
192.168.122.160/etcd                                    3.2.24              3cab8e1b9802        18 months ago       220MB
192.168.122.160/coredns                                 1.2.2               367cdc8433a4        18 months ago       39.2MB
192.168.122.160/kubernetes-dashboard-amd64              v1.8.3              0c60bcf89900        2 years ago         102MB
192.168.122.160/pause                                   3.1                 da86e6ba6ca1        2 years ago         742kB
[root@deployer ~]# 
  • List the images in the local registry
[root@deployer ~]# curl -X GET http://localhost/v2/_catalog | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1080  100  1080    0     0  18298      0 --:--:-- --:--:-- --:--:-- 18620
{
    "repositories": [
        "contrail-analytics-alarm-gen",
        "contrail-analytics-api",
        "contrail-analytics-collector",
        "contrail-analytics-query-engine",
        "contrail-analytics-snmp-collector",
        "contrail-analytics-snmp-topology",
        "contrail-controller-config-api",
        "contrail-controller-config-devicemgr",
        "contrail-controller-config-dnsmasq",
        "contrail-controller-config-schema",
        "contrail-controller-config-stats",
        "contrail-controller-config-svcmonitor",
        "contrail-controller-control-control",
        "contrail-controller-control-dns",
        "contrail-controller-control-named",
        "contrail-controller-webui-job",
        "contrail-controller-webui-web",
        "contrail-external-cassandra",
        "contrail-external-kafka",
        "contrail-external-rabbitmq",
        "contrail-external-redis",
        "contrail-external-rsyslogd",
        "contrail-external-zookeeper",
        "contrail-kubernetes-cni-init",
        "contrail-kubernetes-kube-manager",
        "contrail-node-init",
        "contrail-nodemgr",
        "contrail-status",
        "contrail-vrouter-agent",
        "contrail-vrouter-kernel-init",
        "coredns",
        "etcd",
        "kube-apiserver",
        "kube-controller-manager",
        "kube-proxy",
        "kube-scheduler",
        "kubernetes-dashboard-amd64",
        "pause"
    ]
}
[root@deployer ~]# 
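
Comparing a long catalog against the image lists by eye is error-prone. A minimal sketch that reports which expected repositories are absent from a /v2/_catalog response follows; missing_from_catalog is a hypothetical helper, and it does a plain substring match on the quoted name rather than real JSON parsing.

```shell
#!/bin/bash
# Sketch: report which expected repositories are absent from a /v2/_catalog response.
# missing_from_catalog is a hypothetical helper; it reads the catalog JSON on stdin.
missing_from_catalog() {
  local catalog name
  catalog="$(cat)"
  for name in "$@"; do
    # Repository names appear as quoted strings in the JSON array.
    case "$catalog" in
      *"\"$name\""*) ;;
      *) echo "$name" ;;
    esac
  done
}

# Example usage on the deployer (prints nothing when everything is present):
# curl -s http://localhost/v2/_catalog | missing_from_catalog contrail-status contrail-provisioner
```

With the catalog shown above, contrail-provisioner would be reported as missing, since it was not part of the pushed image lists.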
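  • master01 and node01 can now pull the k8s/contrail images directly from the deployer, and it is really fast! (Don't forget --insecure-registry=192.168.122.160)
# Prepare the Kubernetes offline images by running the following script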
#!/bin/bash
#Author: Alex Yang 

set -e

REPOSITORIE="k8s.gcr.io"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $LOCAL_REPO/$img
  echo "===Retag image ["$img"]"
  docker tag $LOCAL_REPO/$img $REPOSITORIE/$img
  docker rmi $LOCAL_REPO/$img
done

# Prepare the Tungsten Fabric offline images by running the following script

#!/bin/bash
#Author: Alex Yang 

set -e

REPOSITORIE=hub.juniper.net
LOCAL_REPO="192.168.122.160"
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"

IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES

for img in $IMAGES
do
  echo "===Pulling image: "$img
  docker pull $LOCAL_REPO/$img:$IMAGE_TAG
  echo "===Retag image ["$img"]"
  docker tag $LOCAL_REPO/$img:$IMAGE_TAG $REPOSITORIE/$img:$IMAGE_TAG
  docker rmi $LOCAL_REPO/$img:$IMAGE_TAG
done

Open the web UI

  • After running the following on the deployer
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
  • Browse to port 8143 on master01; the Monitor page opens by default
    [Figure 1: Monitor page]
  • Username/password: admin/contrail123; the domain field can be left empty. The WebUI is finally visible
    [Figure 2: WebUI after login]
  • You can switch to the Config page
    [Figure 3: Config page]

k8s status

  • node
[root@master01 ~]# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
master01   Ready    master   6h4m   v1.12.9
node01     Ready    <none>   6h3m   v1.12.9
[root@master01 ~]# 
[root@master01 ~]# kubectl get namespaces
NAME          STATUS   AGE
contrail      Active   80m
default       Active   6h20m
kube-public   Active   6h20m
kube-system   Active   6h20m
[root@master01 ~]# 
  • pods
[root@master01 ~]# kubectl get pods -n kube-system 
NAME                                    READY   STATUS             RESTARTS   AGE
coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          6h2m
coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          6h2m
etcd-master01                           1/1     Running            5          28m
kube-apiserver-master01                 1/1     Running            4          28m
kube-controller-manager-master01        1/1     Running            5          28m
kube-proxy-dmmlh                        1/1     Running            5          6h2m
kube-proxy-ph9gx                        1/1     Running            1          6h2m
kube-scheduler-master01                 1/1     Running            5          28m
kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          6h2m

Further troubleshooting

node01 cannot use the kubectl command

  • The problem is as follows
[root@node01 ~]# kubectl get pods -n kube-system -o wide
The connection to the server localhost:8080 was refused - did you specify the right host or port?
  • For the fix, see here
[root@node01 ~]# scp root@master01:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
[root@node01 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@node01 ~]# source ~/.bash_profile
[root@node01 ~]# kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS             RESTARTS   AGE     IP                NODE       NOMINATED NODE
coredns-85c98899b4-4dzzx                0/1     ImagePullBackOff   0          5h45m   10.47.255.252     node01     <none>
coredns-85c98899b4-w4bcs                0/1     ImagePullBackOff   0          5h45m   10.47.255.251     node01     <none>
etcd-master01                           1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-apiserver-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-controller-manager-master01        1/1     Running            3          11m     192.168.122.96    master01   <none>
kube-proxy-dmmlh                        1/1     Running            3          5h45m   192.168.122.96    master01   <none>
kube-proxy-ph9gx                        1/1     Running            1          5h44m   192.168.122.250   node01     <none>
kube-scheduler-master01                 1/1     Running            3          11m     192.168.122.96    master01   <none>
kubernetes-dashboard-76456c6d4b-x5lz4   0/1     ImagePullBackOff   0          5h44m   192.168.122.250   node01     <none>
[root@node01 ~]# 

The ImagePullBackOff problem

  • First look at the description of the coredns pod
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Name:               coredns-85c98899b4-4dzzx
Namespace:          kube-system
...
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Warning  FailedScheduling        75m (x281 over 4h40m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
  Warning  FailedCreatePodSandBox  71m                    kubelet, node01    Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1af3fb24d906d5f82ad3bdcf6d65be328302d3c596e63fc79ed0c134390b4753" network for pod "coredns-85c98899b4-4dzzx": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-4dzzx_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Failed HTTP Get operation. Return code 404
  Normal   SandboxChanged          70m (x3 over 71m)      kubelet, node01    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 70m (x3 over 70m)      kubelet, node01    pulling image "k8s.gcr.io/coredns:1.2.6"
  Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Failed to pull image "k8s.gcr.io/coredns:1.2.6": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 192.168.122.160:443: getsockopt: no route to host
  Warning  Failed                  70m (x3 over 70m)      kubelet, node01    Error: ErrImagePull
  Warning  Failed                  6m52s (x282 over 70m)  kubelet, node01    Error: ImagePullBackOff
  Normal   BackOff                 103s (x305 over 70m)   kubelet, node01    Back-off pulling image "k8s.gcr.io/coredns:1.2.6"
[root@master01 ~]# 
  • It looks like insecure-registry had not yet been configured when the pod started, so force-restart the pod
[root@master01 ~]# kubectl get pod coredns-85c98899b4-4dzzx -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-4dzzx" deleted
pod/coredns-85c98899b4-4dzzx replaced
[root@master01 ~]# 
  • It is still not up, so keep investigating
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               6m29s                 default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-fnpd7 to master01
  Warning  FailedCreatePodSandBox  6m26s                 kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin], failed to clean up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to teardown pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin]]
  Normal   SandboxChanged          76s (x25 over 6m25s)  kubelet, master01  Pod sandbox changed, it will be killed and re-created.
  • contrail-k8s-cni is missing, so copy it over from node01
[root@master01 ~]# scp root@node01:/opt/cni/bin/contrail-k8s-cni /opt/cni/bin/
  • Recreate the pod again
[root@master01 ~]# kubectl get pod coredns-85c98899b4-fnpd7 -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-fnpd7" deleted
pod/coredns-85c98899b4-fnpd7 replaced
[root@master01 ~]# 
  • Unfortunately, there is still an error after the restart
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               18m                  default-scheduler  Successfully assigned kube-system/coredns-85c98899b4-8zq9h to master01
  Warning  FailedCreatePodSandBox  17m                  kubelet, master01  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ffe9745c42750850e44035ee6413bf573148759738fc6131ce970537e03a5d13" network for pod "coredns-85c98899b4-8zq9h": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-8zq9h_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Get http://127.0.0.1:9091/vm-cfg/9bf51269-675b-11ea-ac43-525400c1ec4f: dial tcp 127.0.0.1:9091: connect: connection refused

The next day, no kubectl command worked anymore

  • Neither on master01 nor on node01
[root@master01 ~]# kubectl get nodes
The connection to the server 192.168.122.96:6443 was refused - did you specify the right host or port?
[root@master01 ~]# 

Restarting kubelet several times did not help; it is running but keeps logging errors

[root@master01 ~]# journalctl -xe -u kubelet
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.336303   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.425393   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://192.168.122.96:6443/api/v1/nodes?fieldSelector=metadata.name%3Dma
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.426388   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://192.168.122.96:6443/api/v1/services?limit=500&resourceVersion=
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.436468   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.536632   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636848   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636961   28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.122.96:6443/api/v1/pods?fieldSelector=spec.nodeNam
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.737070   28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.837781   28722 kubelet.go:2236] node "master01" not found
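  • A search shows that many other people have hit this problem as well
  • Reportedly it may be caused by kube-apiserver not running, but in the current environment kube-apiserver cannot be started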
[root@master01 ~]# systemctl start kube-apiserver
Failed to start kube-apiserver.service: Unit not found.
[root@master01 ~]# 
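
The "Unit not found" message is expected if the cluster was bootstrapped kubeadm-style, which the kube-apiserver-master01 pod in the earlier listing suggests. In that layout kube-apiserver runs as a static pod that kubelet starts from /etc/kubernetes/manifests, not as a systemd service. The sketch below only illustrates where to look; apiserver_hint is a hypothetical helper and the manifest path assumes kubeadm defaults.

```shell
#!/bin/bash
# Sketch: on a kubeadm-style master, kube-apiserver is a static pod, not a systemd unit.
# apiserver_hint is a hypothetical helper; $1 is an optional root prefix (used for testing).
apiserver_hint() {
  local manifest="${1:-}/etc/kubernetes/manifests/kube-apiserver.yaml"
  if [ -f "$manifest" ]; then
    echo "static pod manifest found: $manifest"
    echo "inspect the container with:"
    echo "  docker ps -a --filter name=kube-apiserver"
  else
    echo "no manifest at $manifest"
    return 1
  fi
}

# Example (run on master01): apiserver_hint
```

If the container exists but has exited, its docker logs are the next place to check.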

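Calling the northbound API

  • The reference documentation is here
  • For example, the simplest call: fetch the list of virtual-networks (using the simplest username/password authentication)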
[root@master01 ~]# curl -X GET -u "admin:contrail123" -H "Content-Type: application/json; charset=UTF-8" http://192.168.122.96:8082/virtual-networks
{"virtual-networks": [{"href": "http://192.168.122.96:8082/virtual-network/99c4144d-a7b7-4fb1-833e-887f21144320", "fq_name": ["default-domain", "default-project", "default-virtual-network"], "uuid": "99c4144d-a7b7-4fb1-833e-887f21144320"}, {"href": "http://192.168.122.96:8082/virtual-network/6e90abe8-91b6-48ad-99d2-fba6c9e29de4", "fq_name": ["default-domain", "k8s-default", "k8s-default-service-network"], "uuid": "6e90abe8-91b6-48ad-99d2-fba6c9e29de4"}, {"href": "http://192.168.122.96:8082/virtual-network/ab12e6dc-be52-407d-8f1d-37e6d29df0b1", "fq_name": ["default-domain", "default-project", "ip-fabric"], "uuid": "ab12e6dc-be52-407d-8f1d-37e6d29df0b1"}, {"href": "http://192.168.122.96:8082/virtual-network/915156f1-cec3-44eb-b15e-742452084d67", "fq_name": ["default-domain", "k8s-default", "k8s-default-pod-network"], "uuid": "915156f1-cec3-44eb-b15e-742452084d67"}, {"href": "http://192.168.122.96:8082/virtual-network/64a648ee-3ba6-4348-a543-07de6f225486", "fq_name": ["default-domain", "default-project", "dci-network"], "uuid": "64a648ee-3ba6-4348-a543-07de6f225486"}, {"href": "http://192.168.122.96:8082/virtual-network/82890bf9-a8e5-4c85-a32c-e307d9447a0a", "fq_name": ["default-domain", "default-project", "__link_local__"], "uuid": "82890bf9-a8e5-4c85-a32c-e307d9447a0a"}]}[root@master01 ~]# 
[root@master01 ~]# 
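
The raw response is hard to read on one line. As a rough sketch, the fq_name of each network can be extracted with grep and sed; list_fq_names is a hypothetical helper, and it assumes the exact spacing seen in the output above ("fq_name": [ ... ] with ", " between elements) instead of parsing JSON properly.

```shell
#!/bin/bash
# Sketch: print one colon-joined fq_name per virtual network from the API response on stdin.
# list_fq_names is a hypothetical helper; it relies on the spacing shown in the sample output.
list_fq_names() {
  grep -o '"fq_name": \[[^]]*]' |
    sed -e 's/"fq_name": \[//' -e 's/]$//' -e 's/", "/:/g' -e 's/"//g'
}

# Example usage on master01:
# curl -s -u "admin:contrail123" http://192.168.122.96:8082/virtual-networks | list_fq_names
```

Joining the parts with ":" is only a readability choice made for this sketch.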

Redeployment

  • I made up my mind to redeploy a 1-master/2-node k8s scenario, still using the previous deployer
  • Log
[root@deployer contrail-ansible-deployer]# cat install_k8s_3node.log 
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116            : ok=31   changed=15   unreachable=0    failed=0   
192.168.122.146            : ok=23   changed=8    unreachable=0    failed=0   
192.168.122.204            : ok=23   changed=8    unreachable=0    failed=0   
localhost                  : ok=62   changed=4    unreachable=0    failed=0  

[root@deployer contrail-ansible-deployer]# cat install_contrail_3node.log
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116            : ok=76   changed=45   unreachable=0    failed=0   
192.168.122.146            : ok=37   changed=17   unreachable=0    failed=0   
192.168.122.204            : ok=37   changed=17   unreachable=0    failed=0   
localhost                  : ok=66   changed=4    unreachable=0    failed=0   

The new master turned out to be NotReady, so check the kubelet status

[root@master02 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 三 2020-03-18 16:04:35 +08; 32min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 18801 (kubelet)
    Tasks: 20
   Memory: 60.3M
   CGroup: /system.slice/kubelet.service
           └─18801 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni

3月 18 16:36:51 master02 kubelet[18801]: W0318 16:36:51.929447   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
3月 18 16:36:51 master02 kubelet[18801]: E0318 16:36:51.929572   18801 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready...fig uninitialized
3月 18 16:36:56 master02 kubelet[18801]: W0318 16:36:56.930736   18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
  • The master indeed has no /etc/cni/net.d directory, so copy the file over from node02
[root@master02 ~]# mkdir -p /etc/cni/net.d/
[root@master02 ~]# scp root@node02:/etc/cni/net.d/10-contrail.conf /etc/cni/net.d/10-contrail.conf

[root@master02 ~]# systemctl restart kubelet

Problem solved

[root@master02 ~]# kubectl get node
NAME                    STATUS   ROLES    AGE   VERSION
localhost.localdomain   Ready    <none>   35m   v1.12.9
master02                Ready    master   35m   v1.12.9
node03                  Ready    <none>   35m   v1.12.9
[root@master02 ~]# 
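
Both CNI failures in this write-up came down to a missing file on the master: the contrail-k8s-cni binary in /opt/cni/bin and the 10-contrail.conf config in /etc/cni/net.d. A minimal sketch that checks for both before restarting kubelet follows; check_contrail_cni is a hypothetical helper, and the optional root prefix argument exists only so that it can be tried against a scratch directory.

```shell
#!/bin/bash
# Sketch: verify the two Contrail CNI files that were missing on the master in this article.
# check_contrail_cni is a hypothetical helper; $1 is an optional root prefix (used for testing).
check_contrail_cni() {
  local root="${1:-}" missing=0 path
  for path in /etc/cni/net.d/10-contrail.conf /opt/cni/bin/contrail-k8s-cni; do
    if [ -e "$root$path" ]; then
      echo "found   $path"
    else
      echo "missing $path"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run on the master): check_contrail_cni || echo "copy the missing files from a node"
```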
  • If one deployer is used to deploy two environments, the web UI shows a prompt when it is opened
    [Figure 4: screenshot of the prompt]
    For the fix, see here
    [Figure 5: screenshot of the fix]
    The pod status is now normal
[root@master02 ~]# kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP                NODE                    NOMINATED NODE
coredns-85c98899b4-4vgk4                1/1     Running   0          69m   10.47.255.252     node03                  <none>
coredns-85c98899b4-thpz6                1/1     Running   0          69m   10.47.255.251     localhost.localdomain   <none>
etcd-master02                           1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-apiserver-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-controller-manager-master02        1/1     Running   0          55m   192.168.122.116   master02                <none>
kube-proxy-6sp2n                        1/1     Running   0          69m   192.168.122.116   master02                <none>
kube-proxy-8gpgd                        1/1     Running   0          69m   192.168.122.204   node03                  <none>
kube-proxy-wtvhd                        1/1     Running   0          69m   192.168.122.146   localhost.localdomain   <none>
kube-scheduler-master02                 1/1     Running   0          55m   192.168.122.116   master02                <none>
kubernetes-dashboard-76456c6d4b-9s6vc   1/1     Running   0          69m   192.168.122.204   node03                  <none>
[root@master02 ~]# 
