Kubeflow 0.6.2 Setup

Contents

 

Prerequisites:

Installing Kubeflow


Prerequisites:

         OS: CentOS 7.6

         Kubernetes: 1.14

  Resource requirements:

         Kubernetes > 1.11

         CPU > 4 cores

         Storage > 50 GB

         Memory > 12 GB

Installing Kubeflow

          Install kfctl

wget https://github.com/kubeflow/kubeflow/releases/download/v0.6.2/kfctl_v0.6.2_linux.tar.gz
tar -zxvf kfctl_v0.6.2_linux.tar.gz
mv kfctl /k8s/kubernetes/bin

          Install ksonnet

wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz
tar -xaf ks_0.13.1_linux_amd64.tar.gz 
mv ks_0.13.1_linux_amd64/ks /k8s/kubernetes/bin/ks

          Configure Kubeflow

export KFAPP=kfapp
export CONFIG="https://raw.githubusercontent.com/kubeflow/kubeflow/v0.6-branch/bootstrap/config/kfctl_k8s_istio.0.6.2.yaml"
kfctl init ${KFAPP} --config=${CONFIG} -V
cd ${KFAPP}
kfctl generate all -V

         Note: the CONFIG environment variable points to the manifest that installs Istio and the configuration connecting Istio to the Kubeflow components.

         The default image registry is gcr.io, whose servers are outside China and cannot be pulled from reliably. Microsoft's Azure China mirror (gcr.azk8s.cn) hosts the Kubeflow images, so I collected every image that would otherwise be pulled from gcr.io, downloaded each one from the mirror, and pushed it to my own registry.
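For a single image, the renaming is just a swap of the registry prefix; here is a quick sketch using the same sed substitution the batch script below relies on (192.168.2.10:5000 is the local registry set up in the next step):

```shell
# Rewrite the registry prefix of one image name: the mirror copy and the
# local-registry copy keep the same repository path and tag.
src="gcr.io/ml-pipeline/api-server:0.1.23"
mirror_image=$(echo "${src}" | sed "s#^gcr.io#gcr.azk8s.cn#")
local_image=$(echo "${src}" | sed "s#^gcr.io#192.168.2.10:5000#")
echo "${mirror_image}"   # gcr.azk8s.cn/ml-pipeline/api-server:0.1.23
echo "${local_image}"    # 192.168.2.10:5000/ml-pipeline/api-server:0.1.23
```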

        Set up a local image registry on the server 192.168.2.10:

sudo docker run -d -p 5000:5000 --restart always --name registry registry:2

        Modify the Docker configuration on every machine:

cat /etc/docker/daemon.json
{
    "insecure-registries": [
        "192.168.2.10:5000"
    ]
}

         Restart Docker:

sudo systemctl restart docker

        Script to download the images and push them to the local registry:

#! /bin/bash
images=(
gcr.io/kubeflow-images-public/admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c
gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta
gcr.io/ml-pipeline/api-server:0.1.23
gcr.io/kubeflow-images-public/ingress-setup:latest
gcr.io/kubeflow-images-public/centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59
gcr.io/kubeflow-images-public/jupyter-web-app:9419d4d
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager-rest:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.6.0-rc.0
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.6.0-rc.0
gcr.io/kubeflow-images-public/metadata:v0.1.8
gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8
gcr.io/ml-pipeline/persistenceagent:0.1.23
gcr.io/ml-pipeline/scheduledworkflow:0.1.23
gcr.io/ml-pipeline/frontend:0.1.23
gcr.io/ml-pipeline/viewer-crd-controller:0.1.23
gcr.io/kubeflow-images-public/notebook-controller:v20190603-v0-175-geeca4530-e3b0c4
gcr.io/kubeflow-images-public/profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e
gcr.io/kubeflow-images-public/kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4
gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-rc.0
gcr.io/google_containers/spartakus-amd64:v1.1.0
gcr.io/kubeflow-images-public/tf_operator:v0.6.0.rc0
)
# For each image: pull from the Azure China mirror, retag it for the
# local registry, push it there, then remove the mirror-tagged copy.
gcr="gcr.io"
mirror="gcr.azk8s.cn"
registry="192.168.2.10:5000"
for image in "${images[@]}"; do
    mirror_image=$(echo "${image}" | sed "s#${gcr}#${mirror}#")
    local_image=$(echo "${image}" | sed "s#${gcr}#${registry}#")
    sudo docker pull "${mirror_image}"
    sudo docker tag "${mirror_image}" "${local_image}"
    sudo docker push "${local_image}"
    sudo docker rmi "${mirror_image}"
done

         Once the images are ready, update the images referenced in the manifests: go through each component's deployment.yaml and replace every gcr.io image with 192.168.2.10:5000. Also pay attention to the tags; the following images differ in version from the ones I downloaded:

gcr.io/kubeflow-images-public/jupyter-web-app:v0.5.0
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/kubernetes-sigs/application
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.1.2-alpha-289-g14dad8b
gcr.io/kubeflow-images-public/notebook-controller:v20190614-v0-160-g386f2749-e3b0c4
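Rather than editing each deployment.yaml by hand, the registry substitution can be scripted; a sketch, assuming the generated manifests live under a kustomize/ directory inside ${KFAPP} (check your actual layout, and the mismatched tags listed above still need fixing manually):

```shell
# Replace every gcr.io registry prefix in the generated manifests with the
# local registry address (no-op if the directory has no matches).
grep -rl "gcr.io" kustomize/ 2>/dev/null | xargs -r sed -i "s#gcr\.io#192.168.2.10:5000#g"
```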

          At first I pulled from gcr.azk8s.cn and simply retagged the images as gcr.io, but several images still failed to come up, which is why I switched to the local-registry approach; in the end, the root cause turned out to be the version differences.

         After fixing the images for every component, create the PersistentVolumes (PVs) Kubeflow needs.

         vim katib-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: katib-pv2
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data3/katib"

       vim metadata-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: metadata-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data3/metadata"

      vim minmo-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: minmo-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data3/minmo"

      vim mysql-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data3/mysql"

         The PVs above are all backed by the local filesystem via hostPath; you could also set up NFS as the storage backend.
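For reference, an NFS-backed variant of one of the PVs might look like this; the server address and export path are placeholders for your own NFS setup:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 192.168.2.20      # placeholder: your NFS server
    path: /exports/mysql      # placeholder: the exported directory
```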

kubectl apply -f katib-pv.yaml
kubectl apply -f metadata-pv.yaml
kubectl apply -f minmo-pv.yaml
kubectl apply -f mysql-pv.yaml

         Create the namespace kubeflow-anonymous; the installation fails otherwise because this default Kubeflow namespace does not exist yet.

kubectl create ns kubeflow-anonymous

         Next, install Kubeflow:

kfctl apply all -V

        When it finishes, check how the pods came up:

kubectl -n kubeflow get pod
NAME                                                        READY   STATUS    RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0                  1/1     Running   0          24h
admission-webhook-deployment-75bb567b88-vm4f8               1/1     Running   0          24h
application-controller-stateful-set-0                       1/1     Running   0          24h
argo-ui-5dcf5d8b4f-482p8                                    1/1     Running   0          2d
centraldashboard-cf4874ddc-dkdqg                            1/1     Running   0          24h
jupyter-web-app-deployment-58ddb666cf-48v2h                 1/1     Running   0          24h
katib-controller-847bf97f4c-xtfw8                           1/1     Running   1          2d
katib-db-8598468fd8-vd2gh                                   1/1     Running   0          24h
katib-manager-84fb8cd98f-zwstg                              1/1     Running   1          2d
katib-manager-rest-778857c989-n9wkv                         1/1     Running   0          24h
katib-suggestion-bayesianoptimization-7797c8fc95-7kbzh      1/1     Running   0          2d
katib-suggestion-grid-b465d99bc-jzvnd                       1/1     Running   0          2d
katib-suggestion-hyperband-8d49fdc49-mwhd5                  1/1     Running   0          24h
katib-suggestion-nasrl-5b96f5db8f-pbwpb                     1/1     Running   0          2d
katib-suggestion-random-6fbd7d697f-lldhl                    1/1     Running   0          2d
katib-ui-594cb5f779-frl98                                   1/1     Running   0          2d
metacontroller-0                                            1/1     Running   0          2d
metadata-db-5dd459cc-r529k                                  1/1     Running   0          2d
metadata-deployment-7f698d8f8d-8m4sp                        1/1     Running   0          24h
metadata-deployment-7f698d8f8d-m2t4h                        1/1     Running   0          2d
metadata-deployment-7f698d8f8d-t5sw9                        1/1     Running   0          24h
metadata-ui-7b85b56578-rd9nj                                1/1     Running   0          2d
minio-758b769d67-55gbg                                      1/1     Running   0          2d
ml-pipeline-7bdb8985fc-jglwq                                1/1     Running   0          2d
ml-pipeline-persistenceagent-867856dfbb-8ltn2               1/1     Running   0          2d
ml-pipeline-scheduledworkflow-788465ccd8-ftx5c              1/1     Running   0          24h
ml-pipeline-ui-7c8875b796-f7ks6                             1/1     Running   0          24h
ml-pipeline-viewer-controller-deployment-7b664cb7d4-hpd4w   1/1     Running   0          2d
mysql-657f87857d-lfnkj                                      1/1     Running   0          2d
notebook-controller-deployment-5b7975d5bf-99x9d             1/1     Running   0          24h
profiles-deployment-7bbb9586d9-lrpnf                        2/2     Running   0          2d
pytorch-operator-7547865bd5-fcqzl                           1/1     Running   0          2d
seldon-operator-controller-manager-0                        1/1     Running   1          2d
spartakus-volunteer-7df8bfcc5c-d5xfl                        1/1     Running   0          24h
tensorboard-6544748d94-vwcfj                                1/1     Running   0          24h
tf-job-dashboard-cfd947d4b-8289s                            1/1     Running   0          2d
tf-job-operator-657cbb8d9c-wjvnk                            1/1     Running   0          2d
workflow-controller-db644d554-6s5jg                         1/1     Running   0          24h

         Check the Istio gateway deployment:

kubectl -n istio-system get pod
NAME                                     READY   STATUS      RESTARTS   AGE
grafana-67c69bb567-fhnp9                 1/1     Running     0          2d
istio-citadel-67697b6697-njcfw           1/1     Running     0          24h
istio-egressgateway-7dbbb87698-vn67l     1/1     Running     0          24h
istio-galley-7474d97954-7wmzk            1/1     Running     0          24h
istio-grafana-post-install-1.1.6-5425v   0/1     Completed   0          2d
istio-ingressgateway-565b894b5f-9m4c8    1/1     Running     0          2d
istio-pilot-6dd5b8f74c-2cl24             2/2     Running     0          24h
istio-policy-7f8bb87857-kpwxv            2/2     Running     2          2d
istio-sidecar-injector-fd5875568-cbb87   1/1     Running     0          2d
istio-telemetry-8759dc6b7-qr4p8          2/2     Running     1          2d
istio-tracing-5d8f57c8ff-m8tvc           1/1     Running     0          2d
kiali-d4d886dd7-6smvx                    1/1     Running     0          24h
prometheus-d8d46c5b5-6285p               1/1     Running     0          2d

        Access the Kubeflow UI via port-forward:

kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

       In a new terminal:

curl http://localhost:8080/
curl: (52) Empty reply from server

        The request fails, and the port-forward terminal shows this error:

E1014 12:10:30.503595   26696 portforward.go:400] an error occurred forwarding 8080 -> 80: error forwarding port 80 to pod 2620eb8aad30d22cd611496c628c04e934e3ad1b33ae8ab640d92e34c712b26f, uid : unable to do port forwarding: socat not found.

        The cause is that socat is not installed. Download socat:

wget http://www.dest-unreach.org/socat/download/socat-1.7.3.3.tar.gz
tar -zxvf socat-1.7.3.3.tar.gz
cd socat-1.7.3.3

         Building from source fails at the configure step:

./configure 
checking which defines needed for makedepend... 
checking for a BSD-compatible install... /usr/bin/install -c
checking for gcc... no
checking for cc... no
checking for cl.exe... no
configure: error: in `/home/kube/socat-1.7.3.3':
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details

         The reason is that gcc is not installed:

sudo yum install gcc

         Build and install again:

./configure
sudo make && sudo make install

         Rerun port-forward:

kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

         Access the UI again:

curl http://localhost:8080/
Kubeflow Central Dashboard

         The installation and deployment are now fully done; the remaining question is how to expose the UI through Istio.

         Deleting Kubeflow:

kfctl delete all -V
kubectl delete ns istio-system

        The Kubeflow 0.6.2 setup is complete, but the learning has only just begun. Anyone interested in Kubeflow is welcome to join group 526855734.
