Reference for the first two steps: "Ubuntu cloud-native environment installation: docker + k8s + kubeedge (personally tested, works well)", blog of 爱吃关东煮 on CSDN
Configuring GPUs on the nodes:
Reference: "Guide to configuring GPU resources for K8S", blog of 思影影思 on CSDN
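A typical GPU setup installs the NVIDIA driver, configures the NVIDIA container runtime, and deploys the NVIDIA device plugin DaemonSet. Assuming that is done, a quick check is to read each node's allocatable `nvidia.com/gpu` count and run a one-shot pod that requests a GPU. This is only a sketch: the pod name `gpu-smoke-test` and the CUDA image tag are hypothetical choices, not taken from the guide.

```bash
#!/usr/bin/env bash
# Sketch: verify that nodes advertise GPUs through the NVIDIA device plugin.

# Print each node's allocatable nvidia.com/gpu count ("<none>" means no GPU is advertised).
gpu_allocatable() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Emit a one-shot pod manifest that requests one GPU and runs nvidia-smi.
# The pod name and image tag are hypothetical; choose a CUDA tag your driver supports.
gpu_smoke_test_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}
```

Usage: `gpu_allocatable`, then `gpu_smoke_test_manifest | kubectl apply -f -` and `kubectl logs gpu-smoke-test` once the pod has completed.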
Reset any previous installation (run on every node before init/join):
kubeadm reset
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
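The commands above still leave some state behind: `kubeadm reset` itself warns that it does not remove the CNI configuration in /etc/cni/net.d, iptables/IPVS rules, or kubeconfig files. A minimal sketch of a fuller reset, assuming it is run by a user with sudo rights; `DRY_RUN=1` (a convention of this sketch) prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: fully reset a kubeadm node before re-running init/join.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

reset_node() {
  run kubeadm reset -f
  # kubeadm reset does not clean CNI config, iptables rules or kubeconfig files.
  run rm -rf /etc/cni/net.d
  run iptables -F
  run iptables -t nat -F
  run iptables -t mangle -F
  run iptables -X
  run rm -f "$HOME/.kube/config"
}
```

If kube-proxy ran in IPVS mode, `ipvsadm --clear` is also needed.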
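Run only on the master node. Set --apiserver-advertise-address to the master node's IP; --control-plane-endpoint (node4212 here) must be a hostname that every node can resolve to the master, e.g. through /etc/hosts.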
sudo kubeadm init \
--apiserver-advertise-address=192.168.1.117 \
--control-plane-endpoint=node4212 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--kubernetes-version v1.21.10 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
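The service CIDR and pod CIDR passed to `kubeadm init` must not overlap each other or the hosts' own LAN (192.168.1.x here). This is also a reason to use 10.244.0.0/16 rather than Calico's default pool 192.168.0.0/16, which would overlap this LAN. A small bash sketch of the overlap check, assuming plain dotted-quad IPv4 CIDRs without leading zeros:

```bash
#!/usr/bin/env bash
# Sketch: check whether two IPv4 CIDRs overlap (dotted-quad input, no leading zeros).

# Convert a.b.c.d to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if the two CIDRs overlap, 1 otherwise.
# Two blocks overlap iff they agree on the first min(prefix1, prefix2) bits.
cidr_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  local n1 n2 len mask
  n1=$(ip_to_int "$ip1")
  n2=$(ip_to_int "$ip2")
  len=$(( len1 < len2 ? len1 : len2 ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (n1 & mask) == (n2 & mask) ))
}
```

For example, `cidr_overlap 10.96.0.0/12 10.244.0.0/16 && echo overlap || echo disjoint` prints `disjoint`, while `192.168.0.0/16` against `192.168.1.0/24` reports an overlap.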
Configure kubectl for the current user (kubeadm init prints these commands):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
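Once the master has finished, run on each worker node the kubeadm join command printed at the end of the master's kubeadm init output.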
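The token in that join command expires after 24 hours by default. If the command was lost or has expired, a fresh one can be printed on the master; a minimal sketch (the function name is illustrative):

```bash
#!/usr/bin/env bash
# Sketch: print a fresh join command on the control-plane node.
# The output is a complete "kubeadm join <endpoint> --token ... --discovery-token-ca-cert-hash sha256:..."
# line; run that line with sudo on each worker.
print_join_command() {
  sudo kubeadm token create --print-join-command
}
```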
Install the Calico network plugin (on the master):
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml
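Nodes report NotReady until the network plugin is running. The sketch below waits for the Calico DaemonSet (the manifest creates it as calico-node in kube-system) and then for all nodes to become Ready; the 300 s timeouts are arbitrary. Note that the unpinned calico.yaml URL serves the latest Calico release, which may no longer support Kubernetes v1.21, so downloading the manifest of a Calico version whose requirements list 1.21 is the safer choice.

```bash
#!/usr/bin/env bash
# Sketch: block until Calico and all nodes are up (timeouts are arbitrary).
wait_for_network() {
  kubectl -n kube-system rollout status daemonset/calico-node --timeout=300s &&
    kubectl wait --for=condition=Ready nodes --all --timeout=300s
}
```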
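Deploy the Kubernetes Dashboard from its manifest (saved here as /dashbord.yaml). Run kubectl without sudo: under sudo it looks for root's kubeconfig instead of the one copied to $HOME/.kube/config above.
kubectl apply -f /dashbord.yaml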
kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
In the editor, change type: ClusterIP to type: NodePort and save.
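Instead of editing the Service interactively, the same change can be made with a single `kubectl patch` call (a merge patch on `spec.type`), which is easier to script. A sketch:

```bash
#!/usr/bin/env bash
# Sketch: switch the dashboard Service from ClusterIP to NodePort without an editor.
expose_dashboard() {
  kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard \
    -p '{"spec":{"type":"NodePort"}}'
}
```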
# Find the assigned NodePort, then open it in the firewall
kubectl get svc -A | grep kubernetes-dashboard
Access the dashboard at https://<any node IP>:<NodePort>. Here the NodePort was 31678, e.g. https://192.168.1.109:31678/
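A sketch that reads the assigned NodePort directly and opens it, assuming Ubuntu's `ufw` is the firewall in use (adjust for firewalld or plain iptables):

```bash
#!/usr/bin/env bash
# Sketch: look up the dashboard NodePort and open it in ufw (ufw is an assumption).
dashboard_nodeport() {
  kubectl -n kubernetes-dashboard get svc kubernetes-dashboard \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

open_dashboard_port() {
  local port
  port="$(dashboard_nodeport)" || return 1
  if [[ ! "$port" =~ ^[0-9]+$ ]]; then
    echo "no NodePort found; is the Service type NodePort?" >&2
    return 1
  fi
  sudo ufw allow "${port}/tcp"
  echo "Dashboard URL: https://<any-node-ip>:${port}/"
}
```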
# List all pods with their node and IP
kubectl get pods --all-namespaces -o wide
# Inspect a pod's status and events
kubectl describe pod kubernetes-dashboard-57c9bfc8c8-lmb67 --namespace kubernetes-dashboard
# Print a pod's logs
kubectl logs nvidia-device-plugin-daemonset-xn7hx --namespace kube-system
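The pod names above carry random suffixes that change whenever the pods are recreated. `kubectl logs` also accepts a workload target such as `daemonset/<name>` and picks one of its pods, and `kubectl describe` can select pods by label. A sketch: the DaemonSet name is inferred from the pod name above, while the label `k8s-app=kubernetes-dashboard` is an assumption about the dashboard manifest.

```bash
#!/usr/bin/env bash
# Sketch: inspect pods without copying their generated names.

# Describe the dashboard pods selected by label (label assumed from the dashboard manifest).
describe_dashboard() {
  kubectl describe pods -n kubernetes-dashboard -l k8s-app=kubernetes-dashboard
}

# Logs from one pod of the NVIDIA device plugin DaemonSet.
device_plugin_logs() {
  kubectl logs -n kube-system daemonset/nvidia-device-plugin-daemonset
}

# Pods whose phase is neither Running nor Succeeded (e.g. Pending).
# Crash-looping pods stay in phase Running, so check the RESTARTS column for those.
unhealthy_pods() {
  kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
}
```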
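Create the dashboard login account, then print its token (paste it into the dashboard's Token login field):
kubectl apply -f /dashuser.yaml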
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
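The contents of /dashuser.yaml are not shown. The Kubernetes Dashboard project's sample admin user is a ServiceAccount `admin-user` bound to the `cluster-admin` ClusterRole, which matches the `sa/admin-user` name in the token command; the sketch below assumes the file contains that (note this account has full cluster rights). On Kubernetes 1.24 and later, token Secrets are no longer created automatically for ServiceAccounts, so `.secrets[0].name` comes back empty; there, `kubectl create token` prints a token instead.

```bash
#!/usr/bin/env bash
# Sketch: an assumed equivalent of /dashuser.yaml (the dashboard's sample admin user).
create_admin_user() {
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
}

# Kubernetes 1.24+: request a token directly instead of reading a Secret.
admin_token() {
  kubectl -n kubernetes-dashboard create token admin-user
}
```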
List local images:
docker images
Log in to your Docker Hub account:
docker login
Tag the image. Left: local name:tag; right: <Docker Hub username>/<repository>:tag
docker tag deeplabv3plus:1.0.0 chenzishu/deepmodel:labv3
Push to Docker Hub:
docker push chenzishu/deepmodel:labv3
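The tag-and-push steps can be wrapped in one small function. `push_image` is a hypothetical helper name, and the sketch assumes `docker login` has already succeeded:

```bash
#!/usr/bin/env bash
# Sketch: tag a local image with a Docker Hub name and push it.
# Usage: push_image <local-name:tag> <hub-user/repo:tag>
push_image() {
  local src="$1" dest="$2"
  if [[ -z "$src" || -z "$dest" ]]; then
    echo "usage: push_image <local-name:tag> <hub-user/repo:tag>" >&2
    return 2
  fi
  # Fail early if the local image does not exist.
  docker image inspect "$src" > /dev/null || return 1
  docker tag "$src" "$dest" && docker push "$dest"
}
```

For the image above: `push_image deeplabv3plus:1.0.0 chenzishu/deepmodel:labv3`.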
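Create the application in the dashboard's create form: any app name works; for the container image, enter the image's Docker Hub address (e.g. chenzishu/deepmodel:pytorch).
Wait for the container to start; this takes a while because the image has to be pulled first.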
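The same deployment can also be created from the command line. A sketch, assuming the image is public on Docker Hub and using `segnet` as the name (the pod name segnet-747b798bf5-4bjqk suggests a Deployment called segnet). `rollout status` blocks until the pod is available; the generous timeout allows for pulling a large PyTorch image. If the image's default command exits immediately, the rollout will not finish and the pod will keep restarting.

```bash
#!/usr/bin/env bash
# Sketch: create a Deployment from a Docker Hub image and wait for it.
# Usage: deploy_image <name> <image>, e.g. deploy_image segnet chenzishu/deepmodel:pytorch
deploy_image() {
  local name="$1" image="$2"
  kubectl create deployment "$name" --image="$image" &&
    kubectl rollout status "deployment/$name" --timeout=600s
}
```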
Find the pod:
kubectl get pods --all-namespaces -o wide
Open a shell in the container:
kubectl exec -it segnet-747b798bf5-4bjqk -- /bin/bash
List the files in the container:
ls
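Pod names change every time the Deployment recreates its pod. `kubectl exec` also accepts a `deployment/<name>` target and picks one of its pods. A sketch, assuming the Deployment is called `segnet` (inferred from the pod name) and lives in the current namespace; the helper names are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: open a shell, or run one command, in a pod of a Deployment
# without looking up the generated pod name first.
shell_into() {
  kubectl exec -it "deployment/$1" -- /bin/bash
}

run_in() {
  local deploy="$1"
  shift
  kubectl exec "deployment/$deploy" -- "$@"
}
```

Usage: `shell_into segnet` for an interactive shell, or `run_in segnet ls -la` for a single command.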
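Pod cannot start, insufficient resources (here: the node is under disk pressure and evicts pods)
# Adjust the eviction threshold that puts the disk-pressure taint on the node
# Check kubelet status; the Drop-In line shows which config files are loaded
systemctl status -l kubelet
# Config file location (kubeadm installs 10-kubeadm.conf here)
/etc/systemd/system/kubelet.service.d/
# Relax the threshold (the kubelet default is nodefs.available<10%)
# Edit the config file and append --eviction-hard=nodefs.available<3% to the kubelet arguments: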
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --eviction-hard=nodefs.available<3%"
systemctl daemon-reload
systemctl restart kubelet
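Before (or instead of) lowering the threshold, it helps to check how full the kubelet's filesystem actually is and which nodes carry the `node.kubernetes.io/disk-pressure` taint. A sketch: `/var/lib/kubelet` is the kubelet's default root directory, and the percentage is computed from `df -P` as available/total, which only approximates the kubelet's `nodefs.available` signal.

```bash
#!/usr/bin/env bash
# Sketch: inspect disk pressure on a node.

# Percentage of free space on the filesystem holding PATH (default: kubelet root dir).
nodefs_avail_pct() {
  df -P "${1:-/var/lib/kubelet}" | awk 'NR == 2 && $2 > 0 { printf "%d\n", $4 * 100 / $2 }'
}

# List nodes that carry the disk-pressure taint.
disk_pressure_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' |
    awk -F '\t' '$2 ~ /node.kubernetes.io\/disk-pressure/ { print $1 }'
}
```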
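Pod keeps restarting:
After starting, the pod restarts over and over and reports "Back-off restarting failed container". This usually means the container's main process exits right away (for example, the image has no long-running foreground command), so Kubernetes keeps restarting it.
Find the corresponding deployment and, in its pod template (spec.template.spec), give the container a command that never exits, e.g.: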
spec:
  containers:
  - name: test-file
    image: xxx:v1
    command: ["/bin/bash", "-ce", "tail -f /dev/null"]
    imagePullPolicy: IfNotPresent
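Before masking the problem, `kubectl logs --previous` shows the output of the last crashed container, which usually reveals why it exited. If the goal is only to keep the container alive for `kubectl exec`, the command can also be added without an editor through a JSON patch. A sketch; the Deployment name is passed as an argument, and the patch assumes the target is the first container in the pod template:

```bash
#!/usr/bin/env bash
# Sketch: diagnose and work around a container that keeps exiting.

# Show logs from the previous (crashed) instance of the deployment's container.
previous_logs() {
  kubectl logs "deployment/$1" --previous
}

# Override the first container's command so it idles instead of exiting.
keep_alive() {
  kubectl patch deployment "$1" --type=json -p \
    '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/bin/bash","-ce","tail -f /dev/null"]}]'
}
```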