Basic requirements and configuration:
OS: Ubuntu 16.04
Memory: 64 GB
Disk: 40 GB
Environment preparation
IP | OS | Software | Kubernetes role |
192.168.78.130 | Ubuntu 16.04.3 | Docker 18.09.5, private registry | node |
192.168.78.131 | Ubuntu 16.04.3 | Docker 18.09.5 | master |
192.168.78.132 | Ubuntu 16.04.3 | Docker 18.09.5 | node |
192.168.78.133 | Ubuntu 16.04.3 | Docker 18.09.5 | node |
192.168.78.134 | Ubuntu 16.04.3 | Docker 18.09.5 | node |
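Create the openpai user
Create the user as root on every machine in the table above, because the deployment logs in to each machine over SSH with this account: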
adduser openpai
passwd openpai
chmod u+w /etc/sudoers
Give the openpai user sudo privileges
Run vim /etc/sudoers and add this line for openpai:
openpai ALL=(ALL) NOPASSWD: ALL
chmod u-w /etc/sudoers
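Editing /etc/sudoers in place and toggling its write bit works, but a single typo in that file can break sudo on the machine. A common alternative is a drop-in file under /etc/sudoers.d that is checked with `visudo -cf` before it goes live. This sketch assumes /etc/sudoers includes /etc/sudoers.d (the Ubuntu default); `sudoers_rule` and `install_sudoers_dropin` are hypothetical helper names, not part of OpenPAI.

```sh
#!/bin/sh
# Sketch: grant passwordless sudo through a sudoers drop-in file.
# sudoers_rule and install_sudoers_dropin are hypothetical helpers.

# sudoers_rule USER: print the same rule as in the text above.
sudoers_rule() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# install_sudoers_dropin USER TARGET_FILE
# Writes the rule to a dot-prefixed temp file next to TARGET_FILE,
# validates it with visudo when visudo is available, then moves it
# into place with mode 0440.
install_sudoers_dropin() {
  local user="$1" target="$2" tmp
  tmp="$(mktemp "$(dirname "$target")/.sudoers.XXXXXX")" || return 1
  sudoers_rule "$user" > "$tmp"
  chmod 0440 "$tmp"
  if command -v visudo >/dev/null 2>&1; then
    # -c: check only; -f: check this file instead of /etc/sudoers
    if ! visudo -cf "$tmp" >/dev/null; then
      rm -f "$tmp"
      return 1
    fi
  fi
  mv "$tmp" "$target"
}
```

Run it as root on each machine, for example `install_sudoers_dropin openpai /etc/sudoers.d/openpai` after sourcing the functions. sudo skips files in /etc/sudoers.d whose names contain a dot, so the temporary file is never read while it is being written.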
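Install Docker
The servers have no Internet access, so I downloaded the installation packages and their dependencies on another machine and uploaded them to the servers. All required images were likewise pulled on a connected machine and then uploaded to the servers.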
Package an image (on the connected machine)
sudo docker save -o pylon.tar openpai/pylon:v0.14.0  # using the pylon image as an example
Load the image (on the server)
sudo docker load -i pylon.tar
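OpenPAI and Kubernetes need many images, so saving and loading them one at a time gets tedious. A sketch that saves every image named in a plain list file (one reference per line) into its own tarball, plus the matching loader for the offline servers. The list file and the helper names are hypothetical, and `DOCKER` can be set to `echo` to print the commands instead of running them.

```sh
#!/bin/sh
# Sketch: batch "docker save" / "docker load" for offline transfer.
# tar_name, save_images and load_images are hypothetical helpers.
DOCKER="${DOCKER:-sudo docker}"

# tar_name REF: turn an image reference into a tarball name,
# e.g. openpai/pylon:v0.14.0 -> openpai_pylon_v0.14.0.tar
tar_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# save_images LIST_FILE OUT_DIR: run on the machine with Internet access.
save_images() {
  local list="$1" out="$2" ref
  mkdir -p "$out"
  while IFS= read -r ref <&3; do
    case "$ref" in ''|\#*) continue ;; esac   # skip blank and comment lines
    $DOCKER save -o "$out/$(tar_name "$ref")" "$ref" || return 1
  done 3< "$list"
}

# load_images DIR: run on each offline server.
load_images() {
  local f
  for f in "$1"/*.tar; do
    [ -e "$f" ] || continue                  # no tarballs found
    $DOCKER load -i "$f" || return 1
  done
}
```

For example, `save_images images.txt ./images` on the connected machine, copy ./images to the servers, then `load_images ./images` on each of them.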
A private registry was set up here (on 192.168.78.130, port 5000); push all images to it:
sudo docker tag openpai/pylon:v0.14.0 192.168.78.130:5000/openpai/pylon:v0.14.0
sudo docker push 192.168.78.130:5000/openpai/pylon:v0.14.0
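The tag-and-push step has to be repeated for every image. A sketch that reads the same kind of image list and pushes each entry to the private registry. It only prepends the registry address, so it suits names like openpai/pylon:v0.14.0; images that already carry another registry prefix (for example gcr.io/...) would need their names adjusted first. `REGISTRY`, `registry_ref` and `push_images` are illustrative names.

```sh
#!/bin/sh
# Sketch: retag images for the private registry and push them.
# registry_ref and push_images are hypothetical helpers.
REGISTRY="${REGISTRY:-192.168.78.130:5000}"
DOCKER="${DOCKER:-sudo docker}"

# registry_ref REF: prefix REF with the registry address,
# e.g. openpai/pylon:v0.14.0 -> 192.168.78.130:5000/openpai/pylon:v0.14.0
registry_ref() {
  printf '%s/%s\n' "$REGISTRY" "$1"
}

# push_images LIST_FILE: tag and push every image listed (one per line).
push_images() {
  local ref target
  while IFS= read -r ref <&3; do
    case "$ref" in ''|\#*) continue ;; esac   # skip blank and comment lines
    target="$(registry_ref "$ref")"
    $DOCKER tag "$ref" "$target" || return 1
    $DOCKER push "$target" || return 1
  done 3< "$1"
}
```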
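Edit the Docker configuration. "data-root" moves Docker's storage to /data/docker and "insecure-registries" points to the private registry. JSON allows no comments, so keep notes out of the file:
sudo vim /etc/docker/daemon.json
{
  "data-root": "/data/docker",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "insecure-registries": ["192.168.78.130:5000"]
}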
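Docker will not start if daemon.json fails to parse, so it is worth checking the file before the restart. This sketch assumes python3 is installed on the host; its `json.tool` module parses the file and exits non-zero on a syntax error. `check_daemon_json` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: validate daemon.json before restarting Docker.
# check_daemon_json is a hypothetical helper built on python3's json.tool.

# check_daemon_json FILE: exit status 0 if FILE parses as JSON.
check_daemon_json() {
  python3 -m json.tool "$1" >/dev/null
}

# Usage on the host (not run automatically):
#   check_daemon_json /etc/docker/daemon.json && sudo systemctl restart docker
```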
Restart Docker
sudo systemctl restart docker
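After the restart, `docker info` shows whether the new settings took effect. A sketch that checks the storage directory and the insecure-registry entry; `check_docker_config` is a hypothetical helper, and `DOCKER` can be overridden for testing.

```sh
#!/bin/sh
# Sketch: confirm Docker picked up the new daemon.json.
# check_docker_config is a hypothetical helper.
DOCKER="${DOCKER:-sudo docker}"

# check_docker_config EXPECTED_ROOT REGISTRY
check_docker_config() {
  local root
  root="$($DOCKER info --format '{{.DockerRootDir}}')" || return 1
  if [ "$root" != "$1" ]; then
    echo "Docker root dir is $root, expected $1" >&2
    return 1
  fi
  # docker info lists insecure registries under "Insecure Registries:"
  if ! $DOCKER info 2>/dev/null | grep -A 5 'Insecure Registries:' | grep -qF "$2"; then
    echo "$2 is not listed as an insecure registry" >&2
    return 1
  fi
}
```

For this guide: `check_docker_config /data/docker 192.168.78.130:5000`.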
Run dev-box
Pull the dev-box image
sudo docker pull 192.168.78.130:5000/openpai/dev-box:v0.14.0
Run the dev-box container (replace /pathConfiguration with the host directory that holds the cluster configuration)
sudo docker run -itd \
  -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /pathConfiguration:/cluster-configuration \
  -v /hadoop-binary:/hadoop-binary \
  --pid=host --privileged=true --net=host \
  --name=dev-box \
  192.168.78.130:5000/openpai/dev-box:v0.14.0
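Because the container is started detached (`-d`), a failed start is easy to miss. A sketch that asks Docker whether dev-box is actually running before you exec into it; `container_running` is a hypothetical helper, and `DOCKER` can be overridden for testing.

```sh
#!/bin/sh
# Sketch: check that a container is running.
# container_running is a hypothetical helper.
DOCKER="${DOCKER:-sudo docker}"

# container_running NAME: succeed if Docker reports the container as running.
container_running() {
  [ "$($DOCKER inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

For example: `container_running dev-box || sudo docker logs dev-box`.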
Enter the container
sudo docker exec -it dev-box /bin/bash
Configure Kubernetes
Configure the quick-start.yaml file
cd /pai
cp deployment/quick-start/quick-start-example.yaml deployment/quick-start/quick-start.yaml
vim deployment/quick-start/quick-start.yaml
machines:
  - 192.168.78.131  # the first machine is the Kubernetes master
  - 192.168.78.130
  - 192.168.78.132
  - 192.168.78.133
  - 192.168.78.134
ssh-username: openpai
ssh-password: openpai-123
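With five machines the file is easy to edit by hand, but it can also be generated. A minimal sketch, assuming the only keys you need are the ones shown above (the example file may contain further optional settings that this does not reproduce). `gen_quick_start` is a hypothetical helper; the master's IP must be passed first, and a password containing YAML special characters would need quoting.

```sh
#!/bin/sh
# Sketch: write a minimal quick-start.yaml; the first IP becomes the k8s master.
# gen_quick_start is hypothetical and only emits the keys used in this guide.

# gen_quick_start USER PASSWORD MASTER_IP [NODE_IP...]
gen_quick_start() {
  local user="$1" password="$2" ip
  shift 2
  if [ "$#" -lt 1 ]; then
    echo "need at least one machine IP" >&2
    return 1
  fi
  printf 'machines:\n'
  for ip in "$@"; do
    printf '  - %s\n' "$ip"
  done
  printf '\nssh-username: %s\nssh-password: %s\n' "$user" "$password"
}
```

For this cluster: `gen_quick_start openpai openpai-123 192.168.78.131 192.168.78.130 192.168.78.132 192.168.78.133 192.168.78.134 > deployment/quick-start/quick-start.yaml`.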
Edit kubernetes-configuration.yaml.template
vim deployment/quick-start/kubernetes-configuration.yaml.template
kubernetes:
  # The servers must have DNS configured in /etc/resolv.conf, otherwise the installation fails
  cluster-dns: {{ env["dns"] }}
  load-balance-ip: {{ env["load-balance-ip"] }}
  service-cluster-ip-range: {{ env["service-cluster-ip-range"] }}
  storage-backend: etcd3
  # Use the private image registry
  docker-registry: 192.168.78.130:5000
  # http://gcr.io/google_containers/hyperkube. Or the tag in your registry.
  hyperkube-version: v1.9.9
  # http://gcr.io/google_containers/etcd. Or the tag in your registry.
  # If you are not familiar with etcd, please don't change it.
  etcd-version: 3.2.17
  # http://gcr.io/google_containers/kube-apiserver. Or the tag in your registry.
  apiserver-version: v1.9.9
  # http://gcr.io/google_containers/kube-scheduler. Or the tag in your registry.
  kube-scheduler-version: v1.9.9
  # http://gcr.io/google_containers/kube-controller-manager
  kube-controller-manager-version: v1.9.9
  # http://gcr.io/google_containers/kubernetes-dashboard-amd64
  dashboard-version: v1.8.3
  # Change the etcd data directory
  etcd-data-path: "/data/etcd"
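Two things must be in place before bootup: DNS in /etc/resolv.conf (per the comment in the template) and the images in the private registry under the versions listed. A preflight sketch for both: `has_nameserver` looks for a nameserver line, and `image_has_tag` asks the registry for an image's tags through the Docker Registry HTTP API v2 endpoint /v2/NAME/tags/list. The helper names are hypothetical, the tag match is a rough text search, and the exact repository paths OpenPAI expects for the Kubernetes images (for example hyperkube) are an assumption to verify against your registry.

```sh
#!/bin/sh
# Sketch: preflight checks before "paictl.py cluster k8s-bootup".
# Helper names are hypothetical; CURL can be overridden for testing.
CURL="${CURL:-curl}"

# has_nameserver [FILE]: succeed if FILE (default /etc/resolv.conf)
# contains at least one uncommented "nameserver ADDRESS" line.
has_nameserver() {
  grep -qE '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "${1:-/etc/resolv.conf}"
}

# tags_url REGISTRY NAME: Registry API v2 URL listing the tags of NAME.
tags_url() {
  printf 'http://%s/v2/%s/tags/list\n' "$1" "$2"
}

# image_has_tag REGISTRY NAME TAG: succeed if the tag list contains "TAG".
image_has_tag() {
  $CURL -sf "$(tags_url "$1" "$2")" | grep -qF "\"$3\""
}
```

For example, on each machine: `has_nameserver || echo "no nameserver in /etc/resolv.conf"`, and once: `image_has_tag 192.168.78.130:5000 openpai/pylon v0.14.0 || echo "pylon image missing"`.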
Edit services-configuration.yaml.template
vim deployment/quick-start/services-configuration.yaml.template
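cluster:
  docker-registry:
    namespace: openpai
    domain: 192.168.78.130:5000  # use the private registry
    # OpenPAI version. The dev-box image is a trap here: the image is tagged
    # v0.14.0, but the code inside it is v0.13.0, and on startup it checks the
    # version through git and updates the code. So for an offline install, first
    # run the dev-box container on a machine with Internet access, then package
    # the updated container and upload it to the servers.
    tag: v0.14.0
    secret-name: pai-secret
rest-server:
  # web UI login username
  default-pai-admin-username: admin
  # web UI login password
  default-pai-admin-password: admin-password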
Generate the configuration files
python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f
Start Kubernetes
python paictl.py cluster k8s-bootup -p ~/pai-config
When bootup finishes, open the Kubernetes dashboard at http://192.168.78.131:9090. If the page loads, Kubernetes is up.
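Bootup can take a while before the dashboard answers. A sketch that polls a URL with curl until it responds; `wait_for_url`, its default of 60 tries 10 seconds apart, and the `CURL` override are assumptions, not part of paictl.

```sh
#!/bin/sh
# Sketch: poll a URL until it answers or the retries run out.
# wait_for_url is a hypothetical helper; CURL can be overridden for testing.
CURL="${CURL:-curl}"

# wait_for_url URL [TRIES] [DELAY_SECONDS]
wait_for_url() {
  local url="$1" tries="${2:-60}" delay="${3:-10}" i=1
  while [ "$i" -le "$tries" ]; do
    # -s: quiet, -f: fail on HTTP errors, -o: discard the body
    if $CURL -sf -o /dev/null --max-time 5 "$url"; then
      echo "$url is up"
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "$url did not respond after $tries tries" >&2
  return 1
}
```

For example `wait_for_url http://192.168.78.131:9090`; the same call works later for the web UI on port 9286.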
Start OpenPAI
Create the cluster-id
python paictl.py config push -p ~/pai-config -c ~/.kube/config
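When prompted for the cluster-id, type pai and press Enter. You can then check in the Kubernetes dashboard whether it was created: http://192.168.78.131:9090/#!/configmap/default/pai-cluster-id?namespace=default. The result is shown in the figure below.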
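The same check can be made from the command line inside dev-box with kubectl, using the kubeconfig passed to paictl. `cluster_id_exists` is a hypothetical helper; it only tests that the ConfigMap pai-cluster-id exists in the default namespace.

```sh
#!/bin/sh
# Sketch: check that the pai-cluster-id ConfigMap exists.
# cluster_id_exists is hypothetical; KUBECTL can be overridden for testing.
KUBECTL="${KUBECTL:-kubectl --kubeconfig $HOME/.kube/config}"

cluster_id_exists() {
  $KUBECTL get configmap pai-cluster-id --namespace default >/dev/null 2>&1
}
```

For example: `cluster_id_exists && echo "cluster-id created"`.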
Deploy OpenPAI
python paictl.py service start -c ~/.kube/config
This step takes a long time. Messages such as "xxxxxxx is not ready yet. Please wait for a moment!" are normal; just keep waiting.
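Instead of waiting blindly, you can watch which pods are still coming up. A sketch built on `kubectl get pods --all-namespaces --no-headers`, whose columns are NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE; a pod counts as pending unless it is Completed, or Running with all of its containers ready. `pending_pods` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: list pods that are not up yet during "paictl.py service start".
# pending_pods is hypothetical; KUBECTL can be overridden for testing.
KUBECTL="${KUBECTL:-kubectl}"

# pending_pods: print "namespace/name ready status" for each pod not yet up.
pending_pods() {
  $KUBECTL get pods --all-namespaces --no-headers |
    awk '{
      split($3, ready, "/")   # READY column, e.g. 0/1
      if ($4 != "Completed" && ($4 != "Running" || ready[1] != ready[2]))
        print $1 "/" $2, $3, $4
    }'
}
```

Run it every few minutes; an empty result means every pod is up.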
Once everything has finished, open http://192.168.78.131:9286 and log in with the username and password set in services-configuration.yaml.template.
Management
OpenPAI management commands:
python paictl.py service start -c ~/.kube/config  # start OpenPAI
python paictl.py service stop -c ~/.kube/config  # stop OpenPAI
python paictl.py service delete -c ~/.kube/config  # delete OpenPAI
Kubernetes management:
python paictl.py cluster k8s-bootup -p ~/pai-config  # start Kubernetes
python paictl.py cluster k8s-clean -p ~/pai-config  # remove Kubernetes
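A small wrapper that maps short verbs to the five commands above, meant to be run inside dev-box from /pai. `pai_manage` and the `PAICTL`, `KUBE_CONFIG` and `PAI_CONFIG` variables are hypothetical; the defaults match the paths used in this guide.

```sh
#!/bin/sh
# Sketch: one entry point for the paictl.py management commands.
# pai_manage and its variables are hypothetical.
PAICTL="${PAICTL:-python paictl.py}"
KUBE_CONFIG="${KUBE_CONFIG:-$HOME/.kube/config}"
PAI_CONFIG="${PAI_CONFIG:-$HOME/pai-config}"

# pai_manage start|stop|delete|k8s-up|k8s-clean
pai_manage() {
  case "$1" in
    start|stop|delete) $PAICTL service "$1" -c "$KUBE_CONFIG" ;;
    k8s-up)            $PAICTL cluster k8s-bootup -p "$PAI_CONFIG" ;;
    k8s-clean)         $PAICTL cluster k8s-clean -p "$PAI_CONFIG" ;;
    *)
      echo "usage: pai_manage start|stop|delete|k8s-up|k8s-clean" >&2
      return 2
      ;;
  esac
}
```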
For questions, join the QQ group: 526855734