Tutorial: Deploying OpenPAI, Microsoft's Machine Learning Platform (dev-box)

Basic requirements and configuration:

     OS: Ubuntu 16.04

     Memory: 64 GB

     Disk: 40 GB

Environment preparation

IP              OS              Software                          Kubernetes role
192.168.78.130  Ubuntu 16.04.3  Docker 18.09.5, private registry  node
192.168.78.131  Ubuntu 16.04.3  Docker 18.09.5                    master
192.168.78.132  Ubuntu 16.04.3  Docker 18.09.5                    node
192.168.78.133  Ubuntu 16.04.3  Docker 18.09.5                    node
192.168.78.134  Ubuntu 16.04.3  Docker 18.09.5                    node

Create the openpai user

    Create the user

adduser openpai
passwd openpai
chmod u+w /etc/sudoers

    Grant the openpai user sudo privileges

vim /etc/sudoers   # add the following line for the openpai user
openpai    ALL=(ALL)       NOPASSWD: ALL
chmod u-w /etc/sudoers
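
A syntax error in /etc/sudoers can lock sudo out entirely, so it may be safer to put the rule in a drop-in file and validate it first. The following is only a sketch of that alternative, not part of the original procedure. It assumes /etc/sudoers includes the /etc/sudoers.d directory (the Ubuntu default); the function names and the drop-in file name are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: install the passwordless-sudo rule as a drop-in file instead of
# editing /etc/sudoers by hand. Function and file names are illustrative.

# Print the sudoers rule for a given user (same rule as in the text above).
sudoers_rule() {
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Validate the rule with visudo, then install it with mode 0440.
# Must be run by a user who can already use sudo (for example root).
install_sudoers_rule() {
    local user="$1" tmp
    tmp="$(mktemp)" || return 1
    sudoers_rule "$user" > "$tmp"
    # visudo -c -f checks the syntax of the given file without installing it.
    if sudo visudo -c -f "$tmp"; then
        sudo install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$user"
    else
        echo "sudoers rule for $user failed validation" >&2
        rm -f "$tmp"
        return 1
    fi
    rm -f "$tmp"
}

# Usage (on each machine): install_sudoers_rule openpai
```

With this approach the chmod u+w / chmod u-w steps on /etc/sudoers are not needed, because the main file is never modified.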

 

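Install Docker

       The servers have no Internet access, so I downloaded the Docker packages and their dependencies on a connected machine and uploaded them to the servers. All required images were pulled the same way and then uploaded as well.

       Save an image to an archive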

sudo docker save -o pylon.tar openpai/pylon:v0.14.0  # using the pylon image as an example

       Load the image

sudo docker load -i pylon.tar

       A private registry is set up here, so all images are pushed to it

sudo docker tag openpai/pylon:v0.14.0 192.168.78.130:5000/openpai/pylon:v0.14.0
sudo docker push 192.168.78.130:5000/openpai/pylon:v0.14.0
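
Since the same load/tag/push sequence has to be repeated for every image, it could be scripted. The sketch below only prints the commands so they can be reviewed before running them. The helper names are illustrative, and it assumes each archive is named after the image (pylon.tar for openpai/pylon), following the example above.

```shell
#!/usr/bin/env bash
# Sketch: derive the archive name and private-registry reference for an image.
# REGISTRY matches the address used in this tutorial; helper names are illustrative.

REGISTRY="192.168.78.130:5000"

# openpai/pylon:v0.14.0 -> pylon.tar
archive_name() {
    local name="${1##*/}"             # drop everything up to the last "/"
    printf '%s.tar\n' "${name%%:*}"   # drop the ":tag" suffix
}

# openpai/pylon:v0.14.0 -> 192.168.78.130:5000/openpai/pylon:v0.14.0
registry_ref() {
    printf '%s/%s\n' "$REGISTRY" "$1"
}

# Print the load/tag/push commands for each image given as an argument.
print_push_commands() {
    local image
    for image in "$@"; do
        echo "sudo docker load -i $(archive_name "$image")"
        echo "sudo docker tag $image $(registry_ref "$image")"
        echo "sudo docker push $(registry_ref "$image")"
    done
}

print_push_commands openpai/pylon:v0.14.0 openpai/dev-box:v0.14.0
```

Piping the output into a file gives a plain script that can be inspected and then executed on the registry host.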

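      Edit the Docker configuration. daemon.json must be strict JSON, so it cannot contain comments. "data-root" changes Docker's storage path, and "insecure-registries" lists the private image registry.

# vim /etc/docker/daemon.json
 
{
  "data-root": "/data/docker",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "log-driver": "json-file",
  "insecure-registries": ["192.168.78.130:5000"]
}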
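
Docker refuses to start when daemon.json is not valid JSON (a stray comment or comma is enough), so it can be worth checking the file before restarting the service. A minimal sketch, assuming python3 is installed on the server; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check that a file is strict JSON before restarting Docker.
# Assumes python3 is available; the function name is illustrative.

# Return 0 if the file parses as JSON, non-zero otherwise.
is_valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Usage:
#   is_valid_json /etc/docker/daemon.json && sudo systemctl restart docker
```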

       Restart Docker

sudo systemctl restart docker

Run dev-box

       Pull the dev-box image

sudo docker pull 192.168.78.130:5000/openpai/dev-box:v0.14.0

      Run the dev-box container

sudo docker run -itd -e COLUMNS=$COLUMNS -e LINES=$LINES -e TERM=$TERM -v /var/run/docker.sock:/var/run/docker.sock -v /pathConfiguration:/cluster-configuration  -v /hadoop-binary:/hadoop-binary  --pid=host --privileged=true --net=host --name=dev-box  192.168.78.130:5000/openpai/dev-box:v0.14.0

      Enter the container

sudo docker exec -it dev-box /bin/bash

Configure Kubernetes

      Edit the quick-start.yaml file

cd /pai  
cp deployment/quick-start/quick-start-example.yaml deployment/quick-start/quick-start.yaml
vim deployment/quick-start/quick-start.yaml


machines:
  - 192.168.78.131         # the first machine is the Kubernetes master
  - 192.168.78.130
  - 192.168.78.132
  - 192.168.78.133
  - 192.168.78.134


ssh-username: openpai
ssh-password: openpai-123
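
The deployment logs in to every listed machine over SSH, so a quick reachability check beforehand may save a failed run. The sketch below assumes SSH listens on the default port 22 and that bash and the coreutils timeout command are available; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check that the SSH port is reachable on every machine listed in
# quick-start.yaml. The MACHINES array mirrors the list above.

MACHINES=(192.168.78.131 192.168.78.130 192.168.78.132 192.168.78.133 192.168.78.134)

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Inside the inner bash, $0 is the host and $1 is the port.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report port 22 reachability for each host; return 1 if any host fails.
check_ssh_ports() {
    local host failed=0
    for host in "$@"; do
        if port_open "$host" 22; then
            echo "$host: port 22 reachable"
        else
            echo "$host: port 22 NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Usage: check_ssh_ports "${MACHINES[@]}"
```

This only confirms that the port answers. It does not verify the username or password.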

     Edit kubernetes-configuration.yaml.template

vim deployment/quick-start/kubernetes-configuration.yaml.template


kubernetes:
  # DNS must be configured on the servers (/etc/resolv.conf), otherwise the installation fails
  cluster-dns: {{ env["dns"] }}
  load-balance-ip: {{ env["load-balance-ip"] }}
  service-cluster-ip-range: {{ env["service-cluster-ip-range"] }}
  storage-backend: etcd3
  # private image registry
  docker-registry: 192.168.78.130:5000
  # http://gcr.io/google_containers/hyperkube. Or the tag in your registry.
  hyperkube-version: v1.9.9
  # http://gcr.io/google_containers/etcd. Or the tag in your registry.
  # If you are not familiar with etcd, please don't change it.
  etcd-version: 3.2.17
  # http://gcr.io/google_containers/kube-apiserver. Or the tag in your registry.
  apiserver-version: v1.9.9
  # http://gcr.io/google_containers/kube-scheduler. Or the tag in your registry.
  kube-scheduler-version: v1.9.9
  # http://gcr.io/google_containers/kube-controller-manager
  kube-controller-manager-version:  v1.9.9
  # http://gcr.io/google_containers/kubernetes-dashboard-amd64
  dashboard-version: v1.8.3
  # change the etcd data directory
  etcd-data-path: "/data/etcd"
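
In an offline setup, every Kubernetes image named in this file has to be present in the private registry. The sketch below lists the references that would be needed, under the assumption (not confirmed by this tutorial) that each component is pulled as <docker-registry>/<image>:<version>. The image names are taken from the comments in the template above.

```shell
#!/usr/bin/env bash
# Sketch: list the image references the private registry would need to serve,
# ASSUMING each component is pulled as <docker-registry>/<image>:<version>.
# Image names come from the comments in the template; versions from its values.

K8S_REGISTRY="192.168.78.130:5000"

k8s_image_refs() {
    local entry
    for entry in \
        "hyperkube:v1.9.9" \
        "etcd:3.2.17" \
        "kube-apiserver:v1.9.9" \
        "kube-scheduler:v1.9.9" \
        "kube-controller-manager:v1.9.9" \
        "kubernetes-dashboard-amd64:v1.8.3"
    do
        printf '%s/%s\n' "$K8S_REGISTRY" "$entry"
    done
}

k8s_image_refs
```

Each printed reference can be tried with sudo docker pull from one of the nodes to confirm that the registry actually serves it.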

      Edit services-configuration.yaml.template

vim deployment/quick-start/services-configuration.yaml.template


cluster:
  docker-registry:
    namespace: openpai
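    domain: 192.168.78.130:5000          # private registry
    # OpenPAI version. Beware: the dev-box image is tagged v0.14.0, but the code
    # inside it is v0.13.0. On startup the container checks the version with git
    # and updates the code, so for an offline environment first run the dev-box
    # container on a machine with network access, then package it and upload it
    # to the servers.
    tag: v0.14.0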
    secret-name: pai-secret
rest-server:
  # web UI login username
  default-pai-admin-username: admin
  # web UI login password
  default-pai-admin-password: admin-password

       Generate the configuration files

python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f

Start Kubernetes

      Bring up the cluster

python paictl.py cluster k8s-bootup -p ~/pai-config

     Once startup finishes, open http://192.168.78.131:9090. If the page loads normally, the Kubernetes cluster is up.
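
The dashboard may take a while to come up after k8s-bootup returns, so the check can be automated with a small polling loop. A minimal sketch, assuming curl is installed where the check runs; the function name and the default attempt count and delay are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it answers or the attempts run out.
# Assumes curl is installed; the function name and defaults are illustrative.

wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-10}" i
    for ((i = 1; i <= attempts; i++)); do
        # -s: silent, -f: fail on HTTP errors, -o: discard the body,
        # --max-time: bound each individual attempt to 5 seconds.
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            echo "reachable after $i attempt(s): $url"
            return 0
        fi
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "not reachable after $attempts attempt(s): $url" >&2
    return 1
}

# Usage: wait_for_url http://192.168.78.131:9090 30 10
```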

Start OpenPAI

     Create the cluster-id

python paictl.py config push -p ~/pai-config -c ~/.kube/config

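    When prompted on the command line, type pai and press Enter. You can then check in the Kubernetes dashboard whether the cluster-id was created: http://192.168.78.131:9090/#!/configmap/default/pai-cluster-id?namespace=default. The result is shown in the figure below.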

[Figure 1: screenshot of the pai-cluster-id ConfigMap in the Kubernetes dashboard]

    Deploy OpenPAI

python paictl.py service start -c ~/.kube/config

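    This step takes a long time. If messages such as "xxxxxxx  is not ready yet. Please wait for a moment!" appear along the way, just keep waiting.

Once everything has finished, open http://192.168.78.131:9286 and log in with the username and password configured earlier to start using OpenPAI.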

Management

   OpenPAI management commands:

python paictl.py service start -c ~/.kube/config   # start OpenPAI
python paictl.py service stop -c ~/.kube/config    # stop OpenPAI
python paictl.py service delete -c ~/.kube/config  # delete OpenPAI

    Kubernetes management:

python paictl.py cluster k8s-bootup -p ~/pai-config   # start Kubernetes
python paictl.py cluster k8s-clean -p ~/pai-config    # remove Kubernetes
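
For day-to-day use, the commands above could be wrapped in a small helper so the paths do not have to be retyped. The sketch below only maps a short action name to the corresponding command line and prints it. The helper name and action names are illustrative; the paths are the ones used in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: map short action names to the paictl.py commands listed above.
# The helper name and action names are illustrative; paths match this tutorial.

pai_command() {
    case "$1" in
        start|stop|delete)
            echo "python paictl.py service $1 -c ~/.kube/config" ;;
        k8s-bootup|k8s-clean)
            echo "python paictl.py cluster $1 -p ~/pai-config" ;;
        *)
            echo "usage: pai_command {start|stop|delete|k8s-bootup|k8s-clean}" >&2
            return 1 ;;
    esac
}

# Print the command for review. To run it, execute from /pai inside dev-box:
#   eval "$(pai_command stop)"
pai_command stop
```

Printing instead of executing keeps destructive actions such as delete and k8s-clean behind an explicit second step.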

If you have questions, join the QQ group: 526855734
