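Author: Liu Jingyi
Tungsten Fabric (formerly OpenContrail) provides a controller that works together with an orchestrator (OpenStack/k8s/vCenter), and a vRouter that is deployed on the compute nodes/nodes, is managed by the controller, and replaces the original linux-bridge/OVS for communication.
Preface
The best way to study an open-source controller is to deploy one first, in whatever way is most convenient.
I first went to TF's GitHub, but run.sh hung in both tf-devstack and tf-dev-env: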
[setup contrail git sources]
INFO: source env from /root/contrail/.env/tf-developer-sandbox.env
INFO: current folder is
100 2584 100 2584 0 0 934 0 0:00:02 0:00:02 --:--:-- 933
INFO: Download repo tool
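- I found the WeChat official account of the TF Chinese community (CTFSDN), added the contact on WeChat, and was invited into the TF discussion group
- With guidance from two experts in the group, Wu and Yang, I started deploying by following the articles below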
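Part 1: Deployment preparation and initial state
Part 2: Creating a virtual network
Part 3: Creating a security policy
Part 4: Creating an isolated namespace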
Hands-on notes
Initial preparation
- Create three CentOS 7.7 virtual machines
deployer 192.168.122.160
master01 192.168.122.96 <--- at least 8 GB of memory
node01 192.168.122.250
#cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
pip acceleration via Aliyun
- Configure the pip mirror on every node
mkdir -p ~/.pip && tee ~/.pip/pip.conf <<-'EOF'
[global]
trusted-host = mirrors.aliyun.com
index-url = https://mirrors.aliyun.com/pypi/simple
EOF
Docker image acceleration via Aliyun
- There are plenty of tutorials online; the accelerator address below is masked with asterisks
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://********.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
Some source files
- Many of the required installation files are hosted on the server http://35.220.208.0/ ; adjust the commands below to the actual links
mkdir pkg_python/
cd pkg_python/
wget http://35.220.208.0/packages_python/pip-19.3.1.tar.gz
easy_install pip-19.3.1.tar.gz
easy_install --upgrade --dry-run pip
wget http://35.220.208.0/packages_python/docker_compose-1.24.1-py2.py3-none-any.whl
pip2 install docker_compose-1.24.1-py2.py3-none-any.whl
mkdir /root/pkg_k8s
cd /root/pkg_k8s
wget http://35.220.208.0/k8s_v1.12.9/packages/auto_download.sh
chmod +x auto_download.sh
./auto_download.sh
- The following warnings were printed, but they do not appear to have any impact
[root@localhost pkg_python]# easy_install --upgrade --dry-run pip
Searching for pip
Reading https://pypi.python.org/simple/pip/
Best match: pip 20.0.2
Downloading https://files.pythonhosted.org/packages/8e/76/66066b7bc71817238924c7e4b448abdb17eb0c92d645769c223f9ace478f/pip-20.0.2.tar.gz#sha256=7db0c8ea4c7ea51c8049640e8e6e7fde949de672bfa4949920675563a5a6967f
Processing pip-20.0.2.tar.gz
Writing /tmp/easy_install-bm8Ztx/pip-20.0.2/setup.cfg
Running pip-20.0.2/setup.py -n -q bdist_egg --dist-dir /tmp/easy_install-bm8Ztx/pip-20.0.2/egg-dist-tmp-32s9sn
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
warnings.warn(msg)
warning: no files found matching 'docs/docutils.conf'
warning: no previously-included files found matching '.coveragerc'
warning: no previously-included files found matching '.mailmap'
warning: no previously-included files found matching '.appveyor.yml'
warning: no previously-included files found matching '.travis.yml'
warning: no previously-included files found matching '.readthedocs.yml'
warning: no previously-included files found matching '.pre-commit-config.yaml'
warning: no previously-included files found matching 'tox.ini'
warning: no previously-included files found matching 'noxfile.py'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no files found matching '*.bat' under directory 'docs'
warning: no previously-included files found matching 'src/pip/_vendor/six'
warning: no previously-included files found matching 'src/pip/_vendor/six/moves'
warning: no previously-included files matching '*.pyi' found under directory 'src/pip/_vendor'
no previously-included directories found matching '.github'
no previously-included directories found matching '.azure-pipelines'
no previously-included directories found matching 'docs/build'
no previously-included directories found matching 'news'
no previously-included directories found matching 'tasks'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'tools'
warning: install_lib: 'build/lib' does not exist -- no Python modules to install
[root@localhost pkg_python]#
Local registry
- Run a registry container locally, mapping port 80 on the host to port 5000 in the container
[root@deployer ~]# docker run -d -p 80:5000 --restart=always --name registry registry:2
0c17a03ebdffe3cea98d7cec42c268c1117241f236f9f2443bbb1b77d34b0082
[root@deployer ~]#
[root@deployer ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c17a03ebdff registry:2 "/entrypoint.sh /etc…" About an hour ago Up About an hour 0.0.0.0:80->5000/tcp registry
[root@deployer ~]#
Editing the YAML file
- After obtaining contrail-ansible-deployer, enter its directory and edit instances.yaml
[root@deployer inventory]# vim ../config/instances.yaml
provider_config:
  bms:
    ssh_pwd: Password
    ssh_user: root
    ssh_public_key: /root/.ssh/id_rsa.pub
    ssh_private_key: /root/.ssh/id_rsa
    domainsuffix: local
instances:
  bms1:
    provider: bms
    roles:
      config_database:
      config:
      control:
      analytics_database:
      analytics:
      webui:
      k8s_master:
      kubemanager:
    ip: 192.168.122.96
  bms2:
    provider: bms
    roles:
      vrouter:
      k8s_node:
    ip: 192.168.122.250
global_configuration:
  CONTAINER_REGISTRY: hub.juniper.net
contrail_configuration:
  CONTRAIL_VERSION: 1912-latest
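- Replace CONTAINER_REGISTRY with the local registry, and set the Contrail version to 1912-latest so that it matches the tag used when the images are pulled and retagged later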
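The registry replacement can also be scripted instead of edited by hand. The following is only a minimal sketch: `set_registry` is a hypothetical helper name, and it assumes the file contains a single `CONTAINER_REGISTRY:` key as in the listing above.

```shell
#!/bin/bash
# Hypothetical helper (not part of contrail-ansible-deployer):
# rewrite the CONTAINER_REGISTRY value in an instances.yaml-style file.
# Usage: set_registry <file> <registry>
set_registry() {
    local file="$1" registry="$2"
    # Keep the indentation and the key, replace only the value.
    # A temporary file is used instead of `sed -i` for portability.
    sed -E "s|^([[:space:]]*CONTAINER_REGISTRY:).*|\1 ${registry}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

For this setup the call would be `set_registry config/instances.yaml 192.168.122.160`, since the local registry listens on port 80 of the deployer.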
Passwordless login
- The deployer must be able to log in to itself, master01, and node01 without entering a password
#ssh-keygen -t rsa
#ssh-copy-id -i ~/.ssh/id_rsa.pub root@master01
#ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
#ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
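When there are more hosts, the key distribution can be wrapped in a loop. This is a sketch under assumptions: `copy_keys` and the `DRY_RUN` variable are names invented here, and the key is assumed to live at `~/.ssh/id_rsa.pub` as generated above.

```shell
#!/bin/bash
# Hypothetical helper: push the deployer's public key to each host given
# as an argument. With DRY_RUN=1 the commands are printed, not executed.
copy_keys() {
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@${host}"
        else
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@${host}"
        fi
    done
}
```

Running `DRY_RUN=1 copy_keys master01 node01` first shows what would be executed; dropping `DRY_RUN=1` performs the copy and prompts for each password once.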
ansible
- Running ansible on the deployer produces a warning
/usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.3) or chardet (2.2.1) doesn't match a supported version!
RequestsDependencyWarning)
The fix is:
pip uninstall urllib3
pip uninstall chardet
pip install requests
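Pulling images
- The k8s images are fine, since a mirror in China accelerates them
- The Contrail registry hub.juniper.net requires a Juniper account, so it has to be replaced with opencontrailnightly
- Yang provided scripts that pull the images and push them to the local registry, so that master/node can later pull directly from the registry on the deployer
- With the latest contrail-ansible-deployer code, one more image is needed: contrail-provisioner
- Before running the scripts, the local IP must first be configured as an insecure registry, so that images are transferred over HTTP rather than HTTPS
- One way to do this is to edit /etc/docker/daemon.json (create it if it does not exist)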
[root@node01 ~]# cat /etc/docker/daemon.json
{
"insecure-registries": [ "hub.juniper.net","k8s.gcr.io" ]
}
[root@node01 ~]#
Then:
[root@deployer ~]# systemctl daemon-reload
[root@deployer ~]# systemctl restart docker
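The `insecure-registries` fragment can be generated rather than typed on each host. A minimal sketch, assuming the registries are passed as arguments; `insecure_registries_json` is a hypothetical name, and the output contains only this one key.

```shell
#!/bin/bash
# Hypothetical helper: print a daemon.json body that marks the given
# registries as insecure, so that Docker may talk to them over plain HTTP.
insecure_registries_json() {
    local list="" r
    for r in "$@"; do
        # Append each registry as a quoted, comma-separated JSON string.
        list="${list:+$list, }\"$r\""
    done
    printf '{\n  "insecure-registries": [%s]\n}\n' "$list"
}
```

For example, `insecure_registries_json 192.168.122.160` prints a file body for the deployer's address. Redirecting the output straight into /etc/docker/daemon.json would overwrite any other keys already there (such as `registry-mirrors` from the acceleration step), so those need to be merged by hand.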
The scripts are below, already changed to use the deployer's IP
# Prepare the Kubernetes offline images by running the following script
#!/bin/bash
#Author: Alex Yang
set -e
REPOSITORIE="gcr.azk8s.cn/google_containers"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"
for img in $IMAGES
do
echo "===Pulling image: "$img
docker pull $REPOSITORIE/$img
echo "===Retag image ["$img"]"
docker tag $REPOSITORIE/$img $LOCAL_REPO/$img
echo "===Pushing image: "$LOCAL_REPO/$img
docker push $LOCAL_REPO/$img
docker rmi $REPOSITORIE/$img
done
# Prepare the Tungsten Fabric offline images by running the following script
#!/bin/bash
#Author: Alex Yang
set -e
REGISTRY_URL=opencontrailnightly
LOCAL_REGISTRY_URL=192.168.122.160
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"
IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES
for image in $IMAGES
do
echo "===Pulling image: "$image
docker pull $REGISTRY_URL/$image:$IMAGE_TAG
echo "===Retag image ["$image"]"
docker tag $REGISTRY_URL/$image:$IMAGE_TAG $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
echo "===Pushing image: "$LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
docker push $LOCAL_REGISTRY_URL/$image:$IMAGE_TAG
docker rmi $REGISTRY_URL/$image:$IMAGE_TAG
done
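Both scripts above repeat the same pull, tag, push, remove sequence, so it can be factored into one function. This is a sketch rather than Yang's script: `run`, `mirror_image`, and `DRY_RUN` are names introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original scripts.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one image from a source registry into a local one.
# Usage: mirror_image <source_repo> <local_repo> <name:tag>
mirror_image() {
    local src="$1/$3" dst="$2/$3"
    # Stop at the first failing step, like `set -e` in the scripts above.
    run docker pull "$src" &&
    run docker tag "$src" "$dst" &&
    run docker push "$dst" &&
    run docker rmi "$src"
}
```

A dry run such as `DRY_RUN=1 mirror_image opencontrailnightly 192.168.122.160 contrail-status:1912-latest` prints the four docker commands without touching the network, which is handy for checking names and tags before a long pull.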
- Check the image list
[root@deployer ~]# docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu latest 72300a873c2c 3 weeks ago 64.2MB
registry 2 708bc6af7e5e 7 weeks ago 25.8MB
registry latest 708bc6af7e5e 7 weeks ago 25.8MB
192.168.122.160/contrail-vrouter-kernel-init 1912-latest 92e9cce315a5 3 months ago 581MB
192.168.122.160/contrail-vrouter-agent 1912-latest e8d9457d740e 3 months ago 729MB
192.168.122.160/contrail-status 1912-latest d2264c6741a5 3 months ago 513MB
192.168.122.160/contrail-nodemgr 1912-latest c3428aa7e9b7 3 months ago 523MB
192.168.122.160/contrail-node-init 1912-latest c846ff071cc8 3 months ago 506MB
192.168.122.160/contrail-kubernetes-kube-manager 1912-latest 983a6307731b 3 months ago 517MB
192.168.122.160/contrail-kubernetes-cni-init 1912-latest 45c88538c834 3 months ago 525MB
192.168.122.160/contrail-external-zookeeper 1912-latest 6937c72b866c 3 months ago 290MB
192.168.122.160/contrail-external-rsyslogd 1912-latest 812ba27a4e08 3 months ago 304MB
192.168.122.160/contrail-external-redis 1912-latest 3dc79f0b6eb9 3 months ago 129MB
192.168.122.160/contrail-external-rabbitmq 1912-latest a98ac91667b2 3 months ago 256MB
192.168.122.160/contrail-external-kafka 1912-latest 7b5a2ce6a656 3 months ago 665MB
192.168.122.160/contrail-external-cassandra 1912-latest 20109c39696c 3 months ago 545MB
192.168.122.160/contrail-controller-webui-web 1912-latest 44054aa131c5 3 months ago 552MB
192.168.122.160/contrail-controller-webui-job 1912-latest 946e2bbd7451 3 months ago 552MB
192.168.122.160/contrail-controller-control-named 1912-latest 81ef8223a519 3 months ago 575MB
192.168.122.160/contrail-controller-control-dns 1912-latest 15c1ce0cf26e 3 months ago 575MB
192.168.122.160/contrail-controller-control-control 1912-latest ec195cc75705 3 months ago 594MB
192.168.122.160/contrail-controller-config-svcmonitor 1912-latest 3d53781422be 3 months ago 673MB
192.168.122.160/contrail-controller-config-stats 1912-latest 46bc77cf1c87 3 months ago 506MB
192.168.122.160/contrail-controller-config-schema 1912-latest 75acb8ed961f 3 months ago 673MB
192.168.122.160/contrail-controller-config-dnsmasq 1912-latest dc2980441d51 3 months ago 506MB
192.168.122.160/contrail-controller-config-devicemgr 1912-latest c08868a27a0a 3 months ago 772MB
192.168.122.160/contrail-controller-config-api 1912-latest f39ca251b475 3 months ago 706MB
192.168.122.160/contrail-analytics-snmp-topology 1912-latest 5ee37cbbd034 3 months ago 588MB
192.168.122.160/contrail-analytics-snmp-collector 1912-latest 29ae502fb74f 3 months ago 588MB
192.168.122.160/contrail-analytics-query-engine 1912-latest b5f937d6b6e3 3 months ago 588MB
192.168.122.160/contrail-analytics-collector 1912-latest ee1bdbcc460a 3 months ago 588MB
192.168.122.160/contrail-analytics-api 1912-latest ac5c8f7cef89 3 months ago 588MB
192.168.122.160/contrail-analytics-alarm-gen 1912-latest e155b24a0735 3 months ago 588MB
192.168.10.10/kube-proxy v1.12.9 295526df163c 9 months ago 95.7MB
192.168.122.160/kube-proxy v1.12.9 295526df163c 9 months ago 95.7MB
192.168.122.160/kube-controller-manager v1.12.9 f473e8452c8e 9 months ago 164MB
192.168.122.160/kube-apiserver v1.12.9 8ea704c2d4a7 9 months ago 194MB
192.168.122.160/kube-scheduler v1.12.9 c79506ccc1bc 9 months ago 58.4MB
192.168.122.160/coredns 1.2.6 f59dcacceff4 16 months ago 40MB
192.168.122.160/etcd 3.2.24 3cab8e1b9802 18 months ago 220MB
192.168.122.160/coredns 1.2.2 367cdc8433a4 18 months ago 39.2MB
192.168.122.160/kubernetes-dashboard-amd64 v1.8.3 0c60bcf89900 2 years ago 102MB
192.168.122.160/pause 3.1 da86e6ba6ca1 2 years ago 742kB
[root@deployer ~]#
- Check the images in the local registry
[root@deployer ~]# curl -X GET http://localhost/v2/_catalog | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1080 100 1080 0 0 18298 0 --:--:-- --:--:-- --:--:-- 18620
{
"repositories": [
"contrail-analytics-alarm-gen",
"contrail-analytics-api",
"contrail-analytics-collector",
"contrail-analytics-query-engine",
"contrail-analytics-snmp-collector",
"contrail-analytics-snmp-topology",
"contrail-controller-config-api",
"contrail-controller-config-devicemgr",
"contrail-controller-config-dnsmasq",
"contrail-controller-config-schema",
"contrail-controller-config-stats",
"contrail-controller-config-svcmonitor",
"contrail-controller-control-control",
"contrail-controller-control-dns",
"contrail-controller-control-named",
"contrail-controller-webui-job",
"contrail-controller-webui-web",
"contrail-external-cassandra",
"contrail-external-kafka",
"contrail-external-rabbitmq",
"contrail-external-redis",
"contrail-external-rsyslogd",
"contrail-external-zookeeper",
"contrail-kubernetes-cni-init",
"contrail-kubernetes-kube-manager",
"contrail-node-init",
"contrail-nodemgr",
"contrail-status",
"contrail-vrouter-agent",
"contrail-vrouter-kernel-init",
"coredns",
"etcd",
"kube-apiserver",
"kube-controller-manager",
"kube-proxy",
"kube-scheduler",
"kubernetes-dashboard-amd64",
"pause"
]
}
[root@deployer ~]#
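The catalog only lists repository names, not tags. To confirm that a particular tag such as 1912-latest was pushed, the registry's tag listing endpoint (`/v2/<name>/tags/list` in the Docker Registry HTTP API v2) can be queried. A small sketch; the two function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for querying a registry over plain HTTP.

# Build the tag-list URL for a repository.
# Usage: registry_tags_url <registry_host> <repository>
registry_tags_url() {
    echo "http://$1/v2/$2/tags/list"
}

# Fetch the tag list (requires curl and a reachable registry).
list_tags() {
    curl -s "$(registry_tags_url "$1" "$2")"
}
```

On the deployer, `list_tags localhost contrail-status` should return a small JSON document with a `tags` array for that repository.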
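- master01 and node01 can now pull the k8s/Contrail images directly from the deployer, which is very fast! (Do not forget --insecure-registry=192.168.122.160)
# Prepare the Kubernetes offline images by running the following script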
#!/bin/bash
#Author: Alex Yang
set -e
REPOSITORIE="k8s.gcr.io"
LOCAL_REPO="192.168.122.160"
IMAGES="kube-proxy:v1.12.9 kube-controller-manager:v1.12.9 kube-scheduler:v1.12.9 kube-apiserver:v1.12.9 coredns:1.2.2 coredns:1.2.6 pause:3.1 etcd:3.2.24 kubernetes-dashboard-amd64:v1.8.3"
for img in $IMAGES
do
echo "===Pulling image: "$img
docker pull $LOCAL_REPO/$img
echo "===Retag image ["$img"]"
docker tag $LOCAL_REPO/$img $REPOSITORIE/$img
docker rmi $LOCAL_REPO/$img
done
# Prepare the Tungsten Fabric offline images by running the following script
#!/bin/bash
#Author: Alex Yang
set -e
REPOSITORIE=hub.juniper.net
LOCAL_REPO="192.168.122.160"
IMAGE_TAG=1912-latest
COMMON_IMAGES="contrail-node-init contrail-status contrail-nodemgr contrail-external-cassandra contrail-external-zookeeper contrail-external-kafka contrail-external-redis contrail-external-rabbitmq contrail-external-rsyslogd"
ANALYTICS_IMAGES="contrail-analytics-query-engine contrail-analytics-api contrail-analytics-collector contrail-analytics-snmp-collector contrail-analytics-snmp-topology contrail-analytics-alarm-gen"
CONTROL_IMAGES="contrail-controller-control-control contrail-controller-control-dns contrail-controller-control-named contrail-controller-config-api contrail-controller-config-devicemgr contrail-controller-config-schema contrail-controller-config-svcmonitor contrail-controller-config-stats contrail-controller-config-dnsmasq"
WEBUI_IMAGES="contrail-controller-webui-job contrail-controller-webui-web"
K8S_IMAGES="contrail-kubernetes-kube-manager contrail-kubernetes-cni-init"
VROUTER_IMAGES="contrail-vrouter-kernel-init contrail-vrouter-agent"
IMAGES=$COMMON_IMAGES" "$ANALYTICS_IMAGES" "$CONTROL_IMAGES" "$WEBUI_IMAGES" "$K8S_IMAGES" "$VROUTER_IMAGES
for img in $IMAGES
do
echo "===Pulling image: "$img
docker pull $LOCAL_REPO/$img:$IMAGE_TAG
echo "===Retag image ["$img"]"
docker tag $LOCAL_REPO/$img:$IMAGE_TAG $REPOSITORIE/$img:$IMAGE_TAG
docker rmi $LOCAL_REPO/$img:$IMAGE_TAG
done
Opening the web UI
- On the deployer, the following had been run
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_k8s.yml
ansible-playbook -e orchestrator=kubernetes -i inventory/ playbooks/install_contrail.yml
- Open port 8143 of master01 in a browser; the default landing page is the monitor page
k8s status
- node
[root@master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready master 6h4m v1.12.9
node01 Ready <none> 6h3m v1.12.9
[root@master01 ~]#
[root@master01 ~]# kubectl get namespaces
NAME STATUS AGE
contrail Active 80m
default Active 6h20m
kube-public Active 6h20m
kube-system Active 6h20m
[root@master01 ~]#
- pods
[root@master01 ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-85c98899b4-4dzzx 0/1 ImagePullBackOff 0 6h2m
coredns-85c98899b4-w4bcs 0/1 ImagePullBackOff 0 6h2m
etcd-master01 1/1 Running 5 28m
kube-apiserver-master01 1/1 Running 4 28m
kube-controller-manager-master01 1/1 Running 5 28m
kube-proxy-dmmlh 1/1 Running 5 6h2m
kube-proxy-ph9gx 1/1 Running 1 6h2m
kube-scheduler-master01 1/1 Running 5 28m
kubernetes-dashboard-76456c6d4b-x5lz4 0/1 ImagePullBackOff 0 6h2m
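With more pods, the unhealthy ones are easier to spot when filtered out of the listing. A minimal sketch that assumes the default `kubectl get pods` column layout shown above (STATUS in the third column); `not_running` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the NAME and STATUS of every pod whose STATUS is not "Running".
not_running() {
    # NR > 1 skips the header row; $3 is the STATUS column.
    awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}
```

Usage would be `kubectl get pods -n kube-system | not_running`. Pods that finished normally (status `Completed`) would also be printed by this simple filter.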
Further troubleshooting
kubectl cannot be used on node01
- The problem:
[root@node01 ~]# kubectl get pods -n kube-system -o wide
The connection to the server localhost:8080 was refused - did you specify the right host or port?
- For the fix, see:
- https://blog.csdn.net/qq_24046745/article/details/94405188?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
[root@node01 ~]# scp [email protected]:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
[root@node01 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@node01 ~]# source ~/.bash_profile
[root@node01 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
coredns-85c98899b4-4dzzx 0/1 ImagePullBackOff 0 5h45m 10.47.255.252 node01
coredns-85c98899b4-w4bcs 0/1 ImagePullBackOff 0 5h45m 10.47.255.251 node01
etcd-master01 1/1 Running 3 11m 192.168.122.96 master01
kube-apiserver-master01 1/1 Running 3 11m 192.168.122.96 master01
kube-controller-manager-master01 1/1 Running 3 11m 192.168.122.96 master01
kube-proxy-dmmlh 1/1 Running 3 5h45m 192.168.122.96 master01
kube-proxy-ph9gx 1/1 Running 1 5h44m 192.168.122.250 node01
kube-scheduler-master01 1/1 Running 3 11m 192.168.122.96 master01
kubernetes-dashboard-76456c6d4b-x5lz4 0/1 ImagePullBackOff 0 5h44m 192.168.122.250 node01
[root@node01 ~]#
The ImagePullBackOff problem
- First look at the description of the coredns pod
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Name: coredns-85c98899b4-4dzzx
Namespace: kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 75m (x281 over 4h40m) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
Warning FailedCreatePodSandBox 71m kubelet, node01 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1af3fb24d906d5f82ad3bdcf6d65be328302d3c596e63fc79ed0c134390b4753" network for pod "coredns-85c98899b4-4dzzx": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-4dzzx_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Failed HTTP Get operation. Return code 404
Normal SandboxChanged 70m (x3 over 71m) kubelet, node01 Pod sandbox changed, it will be killed and re-created.
Normal Pulling 70m (x3 over 70m) kubelet, node01 pulling image "k8s.gcr.io/coredns:1.2.6"
Warning Failed 70m (x3 over 70m) kubelet, node01 Failed to pull image "k8s.gcr.io/coredns:1.2.6": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 192.168.122.160:443: getsockopt: no route to host
Warning Failed 70m (x3 over 70m) kubelet, node01 Error: ErrImagePull
Warning Failed 6m52s (x282 over 70m) kubelet, node01 Error: ImagePullBackOff
Normal BackOff 103s (x305 over 70m) kubelet, node01 Back-off pulling image "k8s.gcr.io/coredns:1.2.6"
[root@master01 ~]#
- It seems insecure-registry had not yet been set when the pod started, so force the pod to be recreated
[root@master01 ~]# kubectl get pod coredns-85c98899b4-4dzzx -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-4dzzx" deleted
pod/coredns-85c98899b4-4dzzx replaced
[root@master01 ~]#
- It still did not come up, so look again
[root@master01 ~]# kubectl describe pod coredns-85c98899b4-4dzzx -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m29s default-scheduler Successfully assigned kube-system/coredns-85c98899b4-fnpd7 to master01
Warning FailedCreatePodSandBox 6m26s kubelet, master01 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin], failed to clean up sandbox container "3074c719934789cef519eeae16d2eca4e272fb6bda1b157cee1dbdf2f597a59f" network for pod "coredns-85c98899b4-fnpd7": NetworkPlugin cni failed to teardown pod "coredns-85c98899b4-fnpd7_kube-system" network: failed to find plugin "contrail-k8s-cni" in path [/opt/cni/bin]]
Normal SandboxChanged 76s (x25 over 6m25s) kubelet, master01 Pod sandbox changed, it will be killed and re-created.
- contrail-k8s-cni is missing, so copy it over from node01
[root@master01 ~]# scp [email protected]:/opt/cni/bin/contrail-k8s-cni /opt/cni/bin/
- Recreate the pod again
[root@master01 ~]# kubectl get pod coredns-85c98899b4-fnpd7 -n kube-system -o yaml | kubectl replace --force -f -
pod "coredns-85c98899b4-fnpd7" deleted
pod/coredns-85c98899b4-fnpd7 replaced
[root@master01 ~]#
- Unfortunately, there is still an error after the restart
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned kube-system/coredns-85c98899b4-8zq9h to master01
Warning FailedCreatePodSandBox 17m kubelet, master01 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ffe9745c42750850e44035ee6413bf573148759738fc6131ce970537e03a5d13" network for pod "coredns-85c98899b4-8zq9h": NetworkPlugin cni failed to set up pod "coredns-85c98899b4-8zq9h_kube-system" network: Failed in Poll VM-CFG. Error : Failed in PollVM. Error : Get http://127.0.0.1:9091/vm-cfg/9bf51269-675b-11ea-ac43-525400c1ec4f: dial tcp 127.0.0.1:9091: connect: connection refused
The next day, no kubectl command worked
- Neither on master01 nor on node01
[root@master01 ~]# kubectl get nodes
The connection to the server 192.168.122.96:6443 was refused - did you specify the right host or port?
[root@master01 ~]#
- Restarting kubelet several times did not help; it is running, but logs errors
[root@master01 ~]# journalctl -xe -u kubelet
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.336303 28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.425393 28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://192.168.122.96:6443/api/v1/nodes?fieldSelector=metadata.name%3Dma
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.426388 28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://192.168.122.96:6443/api/v1/services?limit=500&resourceVersion=
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.436468 28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.536632 28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636848 28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.636961 28722 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.122.96:6443/api/v1/pods?fieldSelector=spec.nodeNam
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.737070 28722 kubelet.go:2236] node "master01" not found
3月 17 21:57:15 master01 kubelet[28722]: E0317 21:57:15.837781 28722 kubelet.go:2236] node "master01" not found
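- A search shows that many people have run into this problem
- Reportedly it can be caused by kube-apiserver not being started, but kube-apiserver cannot be started in the current environment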
[root@master01 ~]# systemctl start kube-apiserver
Failed to start kube-apiserver.service: Unit not found.
[root@master01 ~]#
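The "Unit not found" result is expected if the cluster was installed with kubeadm (an assumption here, based on the 10-kubeadm.conf drop-in that kubelet uses in this setup): in that case kube-apiserver is not a systemd service but a static pod that kubelet starts from a manifest, by default under /etc/kubernetes/manifests. A sketch of a first check, with `check_static_pod` as a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: report whether the static pod manifest for a
# control-plane component exists.
# Assumption: kubeadm layout, manifests in /etc/kubernetes/manifests.
# Usage: check_static_pod <component> [manifest_dir]
check_static_pod() {
    local dir="${2:-/etc/kubernetes/manifests}"
    if [ -f "$dir/$1.yaml" ]; then
        echo "manifest found: $dir/$1.yaml"
    else
        echo "manifest missing: $dir/$1.yaml"
        return 1
    fi
}
```

If `check_static_pod kube-apiserver` finds the manifest, the next place to look is the container itself, for example with `docker ps -a | grep kube-apiserver` followed by `docker logs` on the container ID. The root cause in this environment was not recorded, so this is only a direction for investigation.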
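Calling the northbound API
- See the reference documentation here
- http://www.opencontrail.org/documentation/api/r5.0/#
- For example, the simplest call, fetching the list of virtual-networks (using the simplest username/password authentication)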
[root@master01 ~]# curl -X GET -u "admin:contrail123" -H "Content-Type: application/json; charset=UTF-8" http://192.168.122.96:8082/virtual-networks
{"virtual-networks": [{"href": "http://192.168.122.96:8082/virtual-network/99c4144d-a7b7-4fb1-833e-887f21144320", "fq_name": ["default-domain", "default-project", "default-virtual-network"], "uuid": "99c4144d-a7b7-4fb1-833e-887f21144320"}, {"href": "http://192.168.122.96:8082/virtual-network/6e90abe8-91b6-48ad-99d2-fba6c9e29de4", "fq_name": ["default-domain", "k8s-default", "k8s-default-service-network"], "uuid": "6e90abe8-91b6-48ad-99d2-fba6c9e29de4"}, {"href": "http://192.168.122.96:8082/virtual-network/ab12e6dc-be52-407d-8f1d-37e6d29df0b1", "fq_name": ["default-domain", "default-project", "ip-fabric"], "uuid": "ab12e6dc-be52-407d-8f1d-37e6d29df0b1"}, {"href": "http://192.168.122.96:8082/virtual-network/915156f1-cec3-44eb-b15e-742452084d67", "fq_name": ["default-domain", "k8s-default", "k8s-default-pod-network"], "uuid": "915156f1-cec3-44eb-b15e-742452084d67"}, {"href": "http://192.168.122.96:8082/virtual-network/64a648ee-3ba6-4348-a543-07de6f225486", "fq_name": ["default-domain", "default-project", "dci-network"], "uuid": "64a648ee-3ba6-4348-a543-07de6f225486"}, {"href": "http://192.168.122.96:8082/virtual-network/82890bf9-a8e5-4c85-a32c-e307d9447a0a", "fq_name": ["default-domain", "default-project", "__link_local__"], "uuid": "82890bf9-a8e5-4c85-a32c-e307d9447a0a"}]}[root@master01 ~]#
[root@master01 ~]#
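The response is one long JSON line, so pulling out just the UUIDs makes it easier to follow up on a single network. A rough sketch using only grep and cut; it relies on the exact `"uuid": "..."` spacing seen in the output above, and `vn_uuids` is a hypothetical name. A real JSON parser would be more robust.

```shell
#!/bin/bash
# Hypothetical helper: read a /virtual-networks response on stdin and
# print one UUID per line.
vn_uuids() {
    # Match each "uuid": "<value>" pair, then keep the quoted value.
    grep -o '"uuid": "[^"]*"' | cut -d '"' -f 4
}
```

Piping the curl call above into `vn_uuids` yields the six UUIDs; each one can then be fetched individually at `http://192.168.122.96:8082/virtual-network/<uuid>`, which is the form of the `href` fields in the response.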
Redeployment
- I made up my mind to redeploy a 1-master/2-node k8s setup, still using the previous deployer
- Log:
[root@deployer contrail-ansible-deployer]# cat install_k8s_3node.log
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116 : ok=31 changed=15 unreachable=0 failed=0
192.168.122.146 : ok=23 changed=8 unreachable=0 failed=0
192.168.122.204 : ok=23 changed=8 unreachable=0 failed=0
localhost : ok=62 changed=4 unreachable=0 failed=0
[root@deployer contrail-ansible-deployer]# cat install_contrail_3node.log
...
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
192.168.122.116 : ok=76 changed=45 unreachable=0 failed=0
192.168.122.146 : ok=37 changed=17 unreachable=0 failed=0
192.168.122.204 : ok=37 changed=17 unreachable=0 failed=0
localhost : ok=66 changed=4 unreachable=0 failed=0
The new master turned out to be in the NotReady state, so check the kubelet status:
[root@master02 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 三 2020-03-18 16:04:35 +08; 32min ago
Docs: https://kubernetes.io/docs/
Main PID: 18801 (kubelet)
Tasks: 20
Memory: 60.3M
CGroup: /system.slice/kubelet.service
└─18801 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
3月 18 16:36:51 master02 kubelet[18801]: W0318 16:36:51.929447 18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
3月 18 16:36:51 master02 kubelet[18801]: E0318 16:36:51.929572 18801 kubelet.go:2167] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready...fig uninitialized
3月 18 16:36:56 master02 kubelet[18801]: W0318 16:36:56.930736 18801 cni.go:188] Unable to update cni config: No networks found in /etc/cni/net.d
The master indeed has no /etc/cni/net.d directory, so copy the configuration over from node02:
[root@master02 ~]# mkdir -p /etc/cni/net.d/
[root@master02 ~]# scp [email protected]:/etc/cni/net.d/10-contrail.conf /etc/cni/net.d/10-contrail.conf
[root@master02 ~]# systemctl restart kubelet
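The "No networks found in /etc/cni/net.d" message can be checked for directly on any node. A minimal sketch: `has_cni_config` is a hypothetical name, and the list of file extensions is an assumption about what the CNI loader accepts.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the directory holds at least one CNI
# network configuration file.
# Usage: has_cni_config [dir]   (defaults to /etc/cni/net.d)
has_cni_config() {
    local dir="${1:-/etc/cni/net.d}" f
    for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
        # An unmatched glob stays literal and fails the -f test.
        [ -f "$f" ] && return 0
    done
    return 1
}
```

For example, `has_cni_config || echo "no CNI config on $(hostname)"` flags a node that is in the same state as the master was here.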
Problem solved:
[root@master02 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready <none> 35m v1.12.9
master02 Ready master 35m v1.12.9
node03 Ready <none> 35m v1.12.9
[root@master02 ~]#
- For the fix, see:
- https://support.mozilla.org/en-US/kb/Certificate-contains-the-same-serial-number-as-another-certificate
- The pods are now in a normal state
[root@master02 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
coredns-85c98899b4-4vgk4 1/1 Running 0 69m 10.47.255.252 node03
coredns-85c98899b4-thpz6 1/1 Running 0 69m 10.47.255.251 localhost.localdomain
etcd-master02 1/1 Running 0 55m 192.168.122.116 master02
kube-apiserver-master02 1/1 Running 0 55m 192.168.122.116 master02
kube-controller-manager-master02 1/1 Running 0 55m 192.168.122.116 master02
kube-proxy-6sp2n 1/1 Running 0 69m 192.168.122.116 master02
kube-proxy-8gpgd 1/1 Running 0 69m 192.168.122.204 node03
kube-proxy-wtvhd 1/1 Running 0 69m 192.168.122.146 localhost.localdomain
kube-scheduler-master02 1/1 Running 0 55m 192.168.122.116 master02
kubernetes-dashboard-76456c6d4b-9s6vc 1/1 Running 0 69m 192.168.122.204 node03
[root@master02 ~]#