Notes on setting up an sdfstudio container on an AIMAX cluster

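  • I. Log in
  • II. Testing
  • III. Transferring data with FileZilla
  • IV. Creating an environment directly from a third-party private image
    • Method 1: Download from Docker Hub
    • Method 2: Upload a Dockerfile from GitHub
    • Method 3: Upload a third-party image from Docker Hub
      • 1. Install Docker on Ubuntu
      • 2. Download the third-party image
      • 3. Edit hosts
      • 4. Download the certificate
      • 5. Retag the image
      • 6. Log in to the registry
      • 7. Push the image
      • 8. Create an interactive development environment (Terminal)
      • 9. Remote connection
  • V. Building up the environment step by step on an AIMAX public image
      • 1. Create an interactive development environment (Terminal) from a public image
      • 2. Create a virtual environment
      • 3. Install and configure oh-my-zsh (optional)
      • 4. Save the modified container as an image
      • 5. Run training tasks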

I. Log in

Open the home page and enter your credentials.
[Image 1]

Home page:
[Image 2]


II. Testing

Try creating an interactive development environment (Desktop).
[Image 3]

[Image 4]

Usability was poor, so I did not use this approach for development.


III. Transferring data with FileZilla

  1. Install FileZilla

On Ubuntu:

sudo apt-get install filezilla
 

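On Windows, download the installer from the official website.

  2. Connect to the host
    Enter the connection details and connect.
    [Image 5]

  3. Right-click a file and choose Upload.

[Image 6]

A notification appears when the upload finishes. The transferred files can be found under Private Data.
[Image 7]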

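IV. Creating an environment directly from a third-party private image

Methods 1 and 2 could not upload an image at all, and the environment created from the image uploaded with Method 3 could not be entered. I suspect the image itself is the cause. You can skip this section and go straight to the next one, which builds the environment from one of the platform's public images.

Method 1: Download from Docker Hub

The image can be found by searching Docker Hub, but unfortunately I saw no download option in the web UI.
[Image 8]

[Image 9]

Method 2: Upload a Dockerfile from GitHub

[Image 10]

The 4 KB Dockerfile stayed in the uploading state for a long time. Perhaps the image build already starts during this step?
[Image 11]

In the end it failed anyway.
[Image 12]

The manual describes this situation:
[Image 13]
Retrying did not help, so I moved on to Method 3.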

Method 3: Upload a third-party image from Docker Hub

1. Install Docker on Ubuntu

  • Install
  1. docker-ce
  2. nvidia-docker2
  • Add the current user to the docker group.
❯ sudo cat /etc/group | grep docker
docker:x:998:
❯ sudo usermod -aG docker wj
❯ sudo cat /etc/group | grep docker
❯ sudo chmod a+rw /var/run/docker.sock
❯ sudo systemctl restart docker
❯ docker ps -a
CONTAINER ID   IMAGE         COMMAND    CREATED             STATUS                         PORTS     NAMES
fcd0ba6349f8   hello-world   "/hello"   About an hour ago   Exited (0) About an hour ago             blissful_leavitt
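
A note on the commands above: usermod -aG updates the group database, but a session that is already open keeps its old group list until you log in again, which may be why the socket permissions were also opened up here. A minimal sketch for checking membership follows; user_in_group is a hypothetical helper name, not a Docker or system command.

```shell
# Sketch: report whether a user belongs to a group (helper name is illustrative).
user_in_group() {
  # $1 = user name, $2 = group name
  # `id -nG <user>` reads the group database, so it reflects /etc/group even
  # before the user logs in again; plain `id -nG` shows the current session.
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Example: check the current user against the docker group.
if user_in_group "$(id -un)" docker; then
  echo "$(id -un) is listed in the docker group"
else
  echo "$(id -un) is not listed in the docker group"
fi
```

If the first branch prints but docker ps still needs sudo, logging out and back in (or running newgrp docker) is the usual next step.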

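2. Download the third-party image

To load an image from a local archive instead:

docker load < dockerimages.tar

Pull the third-party image from Docker Hub (Docker Hub shows the command to use).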

docker pull dromni/sdfstudio:0.2.1
❯ sudo docker pull dromni/sdfstudio:0.2.1
0.2.1: Pulling from dromni/sdfstudio
677076032cca: Pulling fs layer 
bc572704fd22: Pulling fs layer 
82ca2dd0fe9d: Pulling fs layer 
335006729f70: Pulling fs layer 
1b9f8e302abf: Pulling fs layer 
120deaf0783e: Pulling fs layer 
f7b8d7bf559f: Pull complete 
e62d0dcce85d: Pull complete 
dd4b12c0cbdb: Pull complete 
96670d94e1e8: Pull complete 
bb10049f791d: Pull complete 
9e965195e9d1: Pull complete 
f1484bec286b: Pull complete 
f1196e20290a: Pull complete 
c541d97ea6d8: Pull complete 
7f511c789668: Pull complete 
737bd131d2c1: Pull complete 
270a40ad75d6: Pull complete 
f0c0226e364b: Pull complete 
6f9fdc754fdc: Pull complete 
4f4fb700ef54: Pull complete 
30485e8f47b6: Pull complete 
d1cb36d9c606: Pull complete 
db7430713eb7: Pull complete 
19a01bfd85d1: Pull complete 
63a1d18dba4d: Pull complete 
132d02095598: Pull complete 
9bc9681eb426: Pull complete 
94c3a9acdb3e: Pull complete 
Digest: sha256:1823de016219880ac14dae0bb2d3ba71636802683c24fc60f94bb08b484423e9
Status: Downloaded newer image for dromni/sdfstudio:0.2.1
docker.io/dromni/sdfstudio:0.2.1

The download succeeded; the image can be listed with docker images.

3. Edit hosts

Edit /etc/hosts on the local machine and add an entry for registry.cluster.local whose IP is the address of the AI Max head node, for example:

  ……  
 ……  
192.168.124.95 registry.cluster.local

[Image 14]
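
Editing /etc/hosts by hand works, but re-running a setup script can leave duplicate entries. The sketch below appends the entry only when the name is not already mapped. add_hosts_entry is a hypothetical helper, and the demo runs on a scratch file rather than the live /etc/hosts.

```shell
# Sketch: append "IP NAME" to a hosts file only if NAME is not already mapped.
# add_hosts_entry is an illustrative helper; pass /etc/hosts (as root) for real use.
add_hosts_entry() {
  # $1 = hosts file, $2 = IP address, $3 = host name
  # Scan non-comment lines and look for the name as a whole field after the IP.
  if awk -v n="$3" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                    END { exit !found }' "$1"; then
    echo "$3 already present in $1"
  else
    printf '%s %s\n' "$2" "$3" >> "$1"
    echo "added $2 $3 to $1"
  fi
}

# Try it on a scratch copy first.
tmp_hosts="$(mktemp)"
printf '127.0.0.1 localhost\n' > "$tmp_hosts"
add_hosts_entry "$tmp_hosts" 192.168.124.95 registry.cluster.local
add_hosts_entry "$tmp_hosts" 192.168.124.95 registry.cluster.local   # second call changes nothing
cat "$tmp_hosts"
rm -f "$tmp_hosts"
```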

4. Download the certificate

  • Create the directory
sudo mkdir -p /etc/docker/certs.d/registry.cluster.local
  • Download the certificate
sudo wget -O /etc/docker/certs.d/registry.cluster.local/ca.crt http://192.168.124.95:5680/ca.crt

[Image 15]
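
Because wget -O writes whatever the server returns, an error page could end up saved as ca.crt. A quick sanity check, assuming the head node serves the certificate in PEM form (the source does not say), might look like this. looks_like_pem_cert is a hypothetical helper that only checks the PEM markers and does not validate the certificate.

```shell
# Sketch: sanity-check that a file looks like a PEM certificate.
# Assumption: the downloaded ca.crt is PEM-encoded.
looks_like_pem_cert() {
  # $1 = path to the file; `--` keeps grep from reading the dashes as options.
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -- '-----END CERTIFICATE-----' "$1"
}

cert=/etc/docker/certs.d/registry.cluster.local/ca.crt
if [ -f "$cert" ] && looks_like_pem_cert "$cert"; then
  echo "$cert looks like a PEM certificate"
else
  echo "$cert is missing or does not look PEM-encoded"
fi
```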

5. Retag the image

sudo docker tag myimage:v1.0 registry.cluster.local/user_username/myimage:v1.0

In user_username, replace only username with the user name used to log in to the AI Max web UI. user_ is a required prefix and must not be removed.

For example:

sudo docker tag dromni/sdfstudio:0.2.1 registry.cluster.local/user_xxxx/sdfstudio:1.0

The retagged image:
[Image]
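
The naming rule is easy to get wrong by dropping the user_ prefix, so it can help to build the reference in one place. A small sketch follows; aimax_image_ref is a hypothetical helper, not part of Docker or the platform, and xxxx stands in for the real user name as in the example above.

```shell
# Sketch: build the registry reference from the naming rule described above.
aimax_image_ref() {
  # $1 = platform user name, $2 = image name, $3 = tag
  printf 'registry.cluster.local/user_%s/%s:%s\n' "$1" "$2" "$3"
}

target="$(aimax_image_ref xxxx sdfstudio 1.0)"
echo "$target"
# Print the commands instead of running them, so they can be reviewed first.
echo "sudo docker tag dromni/sdfstudio:0.2.1 $target"
echo "sudo docker push $target"
```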

6. Log in to the registry

  • Get the user name and password

On the "Private Images" page, click to download the Docker registry credentials file.
[Image 16]
The file contents look like this:

[Image 17]

  • Log in to registry.cluster.local

Use the user name and password from the pushImagesDoc.txt file above.

sudo docker login registry.cluster.local
Username: xxxx
Password: xxxxxxxx

The login failed with this error:
[Image]

Thankfully, another post online had the solution:

Add "insecure-registries": ["https://registry.cluster.local"] and "default-runtime": "nvidia" to /etc/docker/daemon.json, so that the final daemon.json becomes:

{
  "insecure-registries": ["https://registry.cluster.local"],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

If daemon.json does not exist in that directory, create it with the content above.

Then run

sudo systemctl daemon-reload
sudo systemctl restart docker

Logging in again succeeds.
[Image 18]

7. Push the image

$ sudo docker push registry.cluster.local/user_username/myimage:v1.0
sudo docker push registry.cluster.local/user_xxxx/sdfstudio:1.0
The push refers to repository [registry.cluster.local/user_xxxx/sdfstudio]
5f70bf18a086: Preparing 
75c9930f04e3: Preparing 
d607f5331dd0: Preparing 
775fd1ca67da: Preparing 
5c85fd87e7d2: Preparing 
c2cc2815d350: Preparing 
c261386e14e8: Waiting 
8f5a7461deb4: Waiting 
5668a06c4f00: Waiting 
6f9a406a17ed: Waiting 
32a3407bed0d: Waiting 
dc1bbc4db2ec: Waiting 
77b9d6e4b433: Waiting 
6be54aac0530: Waiting 
77f74632d268: Waiting 
4d6a42904634: Waiting 
d04569d95086: Waiting 
c2ecd79d5a18: Waiting 
bd889e83e652: Waiting 
d2e28f4121e3: Waiting 
3a12ac953428: Waiting 
11df89f48870: Waiting 
2106d7cd1026: Waiting 
f403f5c5948a: Waiting 
8f6106a133b8: Waiting 
af561c199f2f: Waiting 
ea83d1f80fca: Waiting 
65abf0edb23d: Waiting 
c5ff2d88f679: Waiting 
denied: requested access to the resource is denied

It failed!
Try the hello-world image instead.

❯ sudo docker push registry.cluster.local/user_wuji/hello:v1.0
The push refers to repository [registry.cluster.local/user_wuji/hello]
01bb4fce3eb1: Preparing 
denied: requested access to the resource is denied

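It still fails, which rules out the image as the cause: the sdfstudio image is 22 GB, while hello-world is only a few KB.

After some thought I found the cause: the login had been done without sudo, while the push was run with sudo. Logging in again with sudo fixed it.
[Image 19]
Pushing again, the transfer proceeds normally.
[Image 20]
It took 18 minutes to finish, and the image now appears under Private Images.

[Image 21]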
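
A plausible explanation for the sudo mismatch, not confirmed by anything in these notes: docker login normally records the registry in the invoking user's ~/.docker/config.json, so a login made as a regular user may not be visible to a push run as root. The sketch below checks which config files mention the registry. has_registry_entry is a hypothetical helper that does a plain text match, so it cannot tell a stored credential from a credential-helper entry.

```shell
# Sketch: does a Docker client config file mention a given registry?
has_registry_entry() {
  # $1 = path to config.json, $2 = registry host name
  [ -f "$1" ] && grep -qF -- "\"$2\"" "$1"
}

# Compare the current user's config with root's (the latter needs root to read).
for cfg in "$HOME/.docker/config.json" /root/.docker/config.json; do
  if has_registry_entry "$cfg" registry.cluster.local; then
    echo "$cfg: entry found"
  else
    echo "$cfg: no entry (or file not readable)"
  fi
done
```

Running the loop once as the normal user and once under sudo shows which account actually holds the login.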

8. Create an interactive development environment (Terminal)

[Image 22]

[Image 23]

Because the image is large, this preparation step also takes a long time.

[Image 24]

Success. Barring surprises, it should now be usable.

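9. Remote connection

Download and install MobaXterm, the free Portable edition. (Cracked versions circulate online, but there is no need for one, and avoiding them is safer.)

[Image 25]

[Image 26]
Either open a new Session, choose SSH and enter the connection details, or type the ssh command directly in the terminal.

After many attempts, this step failed.
[Image 27]

Trying one of AIMAX's own images works fine; the connection succeeds.

[Image 28]

[Image 29]

Not ready to give up, I tried another image from Docker Hub, with the same result. Most likely the environment has to be built up step by step on top of one of the platform's own images.
[Image 30]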


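V. Building up the environment step by step on an AIMAX public image

1. Create an interactive development environment (Terminal) from a public image

[Image 31]

[Image 32]

[Image 33]
Your uploaded private data is visible under /opt/data/private.
[Image 34]


2. Create a virtual environment

I am setting up the sdfstudio environment here, following the official tutorial directly; I also have notes from an earlier deployment.

Only the errors encountered are recorded below.

  • Running conda activate sdfstudio produces this error: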
  CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Run

conda init bash

Then close the shell and reconnect; the virtual environment now works normally.
[Image 35]
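
conda init works by writing a marked block into the shell startup file, which is why a fresh shell is needed afterwards. A rough way to check whether that block is already present is sketched below. conda_init_present is a hypothetical helper, and the check only looks for the opening marker line.

```shell
# Sketch: check whether a shell startup file already has the block that
# `conda init` writes (it starts with the marker line matched below).
conda_init_present() {
  # $1 = path to the startup file, e.g. ~/.bashrc
  grep -qF -- '# >>> conda initialize >>>' "$1" 2>/dev/null
}

if conda_init_present "$HOME/.bashrc"; then
  echo "conda block found in ~/.bashrc; open a new shell, then: conda activate sdfstudio"
else
  echo "no conda block in ~/.bashrc; run: conda init bash"
fi
```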

  • Running pip install -e . fails

[Image 36]
The cause was that I had not changed into the project directory; after switching to it, the command runs normally.
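
pip install -e . needs a setup.py or pyproject.toml in the current directory, so a small guard can catch this mistake before pip does. The sketch below is only an illustration: is_python_project is a hypothetical helper, and which of the two files the sdfstudio checkout uses is not stated in these notes.

```shell
# Sketch: check that a directory looks installable before an editable install.
is_python_project() {
  # $1 = directory to inspect
  [ -f "$1/setup.py" ] || [ -f "$1/pyproject.toml" ]
}

project_dir="$PWD"
if is_python_project "$project_dir"; then
  echo "ok: 'pip install -e .' can be run from $project_dir"
else
  echo "no setup.py or pyproject.toml in $project_dir; cd into the project checkout first"
fi
```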

  • Running ns-install-cli fails
(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-install-cli
[17:52:23]  .zshrc not found, skipping.                                                                 install.py:212
            Found .bashrc!                                                                              install.py:214
[17:52:24] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli.      install.py:124
           ❌ Completion script generation failed: ['ns-render-mesh', '--tyro-print-completion', 'bash']  install.py:109
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-render-mesh", line 5, in <module>                          
               from scripts.render_mesh import entrypoint                                                      
             File "/opt/data/private/sdfstudio/scripts/render_mesh.py", line 12, in <module>                   
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  
                                                                                                               
           ❌ Completion script generation failed: ['ns-eval', '--tyro-print-completion', 'bash']         install.py:109
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-eval", line 5, in <module>                                 
               from scripts.eval import entrypoint                                                             
             File "/opt/data/private/sdfstudio/scripts/eval.py", line 11, in <module>                          
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  
                                                                                                               
           ✔ Updated completion at /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test!     install.py:122
[17:52:25] ✔ Updated completion at /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data! install.py:122
[17:52:26] ❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash']        install.py:109
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-train", line 5, in <module>                                
               from scripts.train import entrypoint                                                            
             File "/opt/data/private/sdfstudio/scripts/train.py", line 48, in <module>                         
               from nerfstudio.configs import base_config as cfg                                               
             File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>       
               from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig                            
             File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in             
           <module>                                                                                            
               from nerfstudio.data.datamanagers.base_datamanager import (                                     
             File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line         
           35, in <module>                                                                                     
               from nerfstudio.cameras.cameras import CameraType                                               
             File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>            
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  
                                                                                                               
           ❌ Completion script generation failed: ['ns-download-data', '--tyro-print-completion',        install.py:109
           'bash']                                                                                             
           ❌ Completion script generation failed: ['ns-extract-mesh', '--tyro-print-completion', 'bash'] install.py:109
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-download-data", line 5, in <module>                        
               from scripts.downloads.download_data import entrypoint                                          
             File "/opt/data/private/sdfstudio/scripts/downloads/download_data.py", line 17, in <module>       
               from nerfstudio.configs.base_config import PrintableConfig                                      
             File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>       
               from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig                            
             File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in             
           <module>                                                                                            
               from nerfstudio.data.datamanagers.base_datamanager import (                                     
             File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line         
           35, in <module>                                                                                     
               from nerfstudio.cameras.cameras import CameraType                                               
             File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>            
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  
                                                                                                               
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-extract-mesh", line 5, in <module>                         
               from scripts.extract_mesh import entrypoint                                                     
             File "/opt/data/private/sdfstudio/scripts/extract_mesh.py", line 16, in <module>                  
               from nerfstudio.utils.eval_utils import eval_setup                                              
             File "/opt/data/private/sdfstudio/nerfstudio/utils/eval_utils.py", line 30, in <module>           
               from nerfstudio.configs import base_config as cfg                                               
             File "/opt/data/private/sdfstudio/nerfstudio/configs/base_config.py", line 197, in <module>       
               from nerfstudio.pipelines.base_pipeline import VanillaPipelineConfig                            
             File "/opt/data/private/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 41, in             
           <module>                                                                                            
               from nerfstudio.data.datamanagers.base_datamanager import (                                     
             File "/opt/data/private/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line         
           35, in <module>                                                                                     
               from nerfstudio.cameras.cameras import CameraType                                               
             File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>            
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  
                                                                                                               
Traceback (most recent call last):
  File "/opt/conda/envs/sdfstudio/bin/ns-install-cli", line 8, in <module>
    sys.exit(entrypoint())
  File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 284, in entrypoint
    tyro.cli(main, description=__doc__)
  File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_cli.py", line 177, in cli
    output = _cli_impl(
  File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_cli.py", line 430, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/tyro/_calling.py", line 204, in call_from_args
    return unwrapped_f(*positional_args, **kwargs), consumed_keywords  # type: ignore
  File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 253, in main
    completion_paths = list(
  File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/conda/envs/sdfstudio/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 255, in <lambda>
    lambda path_or_entrypoint_and_shell: _generate_completion(
  File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 114, in _generate_completion
    raise e
  File "/opt/data/private/sdfstudio/scripts/completions/install.py", line 101, in _generate_completion
    new = subprocess.run(
  File "/opt/conda/envs/sdfstudio/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-download-data', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.
           ❌ Completion script generation failed: ['ns-render', '--tyro-print-completion', 'bash']       install.py:109
           Traceback (most recent call last):                                                             install.py:113
             File "/opt/conda/envs/sdfstudio/bin/ns-render", line 5, in <module>                               
               from scripts.render import entrypoint                                                           
             File "/opt/data/private/sdfstudio/scripts/render.py", line 27, in <module>                        
               from nerfstudio.cameras.camera_paths import get_path_from_json, get_spiral_path                 
             File "/opt/data/private/sdfstudio/nerfstudio/cameras/camera_paths.py", line 27, in <module>       
               from nerfstudio.cameras.cameras import Cameras                                                  
             File "/opt/data/private/sdfstudio/nerfstudio/cameras/cameras.py", line 24, in <module>            
               import cv2                                                                                      
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 181, in        
           <module>                                                                                            
               bootstrap()                                                                                     
             File "/opt/conda/envs/sdfstudio/lib/python3.8/site-packages/cv2/__init__.py", line 153, in        
           bootstrap                                                                                           
               native_module = importlib.import_module("cv2")                                                  
             File "/opt/conda/envs/sdfstudio/lib/python3.8/importlib/__init__.py", line 127, in                
           import_module                                                                                       
               return _bootstrap._gcd_import(name, package, level)                                             
           ImportError: libGL.so.1: cannot open shared object file: No such file or directory                  

Fix:

apt-get update && apt-get install libgl1
  • In addition, because of other changes I had made, the error messages showed that icecream and cryptography also needed to be installed
pip install icecream
pip install cryptography
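
The traceback above is long, but every failure ends in the same ImportError. A small filter can pull the missing library names out of such a log. This is a sketch: missing_shared_libs is a hypothetical helper, and it only recognises the exact "cannot open shared object file" wording seen here.

```shell
# Sketch: list the shared libraries that a captured log reports as missing.
missing_shared_libs() {
  # Reads the log on stdin; prints each missing library name once.
  sed -n 's/.*ImportError: \([^: ]*\): cannot open shared object file.*/\1/p' | sort -u
}

# Example with one line taken from the traceback above.
printf '%s\n' 'ImportError: libGL.so.1: cannot open shared object file: No such file or directory' |
  missing_shared_libs
```

With a real run, the log can be piped in directly, for example ns-install-cli 2>&1 | missing_shared_libs. In this case the only name reported is libGL.so.1, which the libgl1 package installed above provides.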

Running ns-install-cli again succeeds.

(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-install-cli
[19:32:24]  .zshrc not found, skipping.                                                                 install.py:212
            Found .bashrc!                                                                              install.py:214
[19:32:25] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test.         install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli.      install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data.     install.py:124
[19:32:36] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-eval.             install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-download-data.    install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-extract-mesh.     install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render-mesh.      install.py:124
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render.           install.py:124
[19:32:38] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-train.            install.py:124
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-eval.                       install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-install-cli.                install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-train.                      install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-extract-mesh.               install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-process-data.               install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render.                     install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-download-data.              install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render-mesh.                install.py:270
            Deleted /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-dev-test.                   install.py:270
            Completions installed to /root/.bashrc. Exciting! Open a new shell to try them out.         install.py:186
All done!

A quick test first.

(sdfstudio) root@sdfstudio:/opt/data/private/sdfstudio# ns-train -h
usage: ns-train [-h]
                {testsdf,bakedangelo,neuralangelo,bakedsdf,bakedsdf-mlp,neus-facto-angelo,neus-facto,neus-facto-bigmlp,geo-volsdf,monosdf,volsdf,geo-neus,mono-neus,neus,unisurf,mono-unisurf,geo-unisurf,dto,neusW,neus-acc,nerfacto,instant-ngp,mipnerf,semantic-nerfw,vanilla-nerf,tensorf,dnerf,phototourism}

Train a radiance field with nerfstudio. For real captures, we recommend using the nerfacto model.

Nerfstudio allows for customizing your training and eval configs from the CLI in a powerful way, but there
are some things to understand.

The most demonstrative and helpful example of the CLI structure is the difference in output between the
following commands:

    ns-train -h
    ns-train nerfacto -h nerfstudio-data
    ns-train nerfacto nerfstudio-data -h

In each of these examples, the -h applies to the previous subcommand (ns-train, nerfacto, and
nerfstudio-data).

In the first example, we get the help menu for the ns-train script. In the second example, we get the help
menu for the nerfacto model. In the third example, we get the help menu for the nerfstudio-data dataparser.

With our scripts, your arguments will apply to the preceding subcommand in your command, and thus where you
put your arguments matters! Any optional arguments you discover from running

    ns-train nerfacto -h nerfstudio-data

need to come directly after the nerfacto subcommand, since these optional arguments only belong to the
nerfacto subcommand:

    ns-train nerfacto {nerfacto optional args} nerfstudio-data

╭─ arguments ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help        show this help message and exit                                                         │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ {testsdf,bakedangelo,neuralangelo,bakedsdf,bakedsdf-mlp,neus-facto-angelo,neus-facto,neus-facto-bigmlp,g… │
│     testsdf           Implementation of TestSDF                                                           │
│     bakedangelo       Implementation of Neuralangelo with BakedSDF                                        │
│     neuralangelo      Implementation of Neuralangelo                                                      │
│     bakedsdf          Implementation of BackedSDF with multi-res hash grids                               │
│     bakedsdf-mlp      Implementation of BackedSDF with large MLPs                                         │
│     neus-facto-angelo Implementation of Neuralangelo with neus-facto                                      │
│     neus-facto        Implementation of NeuS similar to nerfacto where proposal sampler is used.          │
│     neus-facto-bigmlp NeuS-facto with big MLP, it is used in training heritage data with 8 gpus           │
│     geo-volsdf        Implementation of patch warping from GeoNeuS with VolSDF.                           │
│     monosdf           Implementation of MonoSDF.                                                          │
│     volsdf            Implementation of VolSDF.                                                           │
│     geo-neus          Implementation of patch warping from GeoNeuS with NeuS.                             │
│     mono-neus         Implementation of MonoSDF with NeuS rendering formulation.                          │
│     neus              Implementation of NeuS.                                                             │
│     unisurf           Implementation of UniSurf.                                                          │
│     mono-unisurf      Implementation of MonoSDF with unisurf rendering formulation.                       │
│     geo-unisurf       Implementation of patch warping from GeoNeuS with UniSurf.                          │
│     dto               Occupancy field with density guided sampling                                        │
│     neusW             Implementation of Neural Reconstruction in the wild                                 │
│     neus-acc          Implementation of NeuS with empty space skipping.                                   │
│     nerfacto          Recommended real-time model tuned for real captures. This model will be continually │
│                       updated.                                                                            │
│     instant-ngp       Implementation of Instant-NGP. Recommended real-time model for bounded synthetic    │
│                       data.                                                                               │
│     mipnerf           High quality model for bounded scenes. (slow)                                       │
│     semantic-nerfw    Predicts semantic segmentations and filters out transient objects.                  │
│     vanilla-nerf      Original NeRF model. (slow)                                                         │
│     tensorf           tensorf                                                                             │
│     dnerf             Dynamic-NeRF model. (slow)                                                          │
│     phototourism      Uses the Phototourism data.                                                         │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Everything works as expected.


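3. Install and configure oh-my-zsh (optional)

bash keeps no record of how long each command took, so after some thought I decided to install zsh after all.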
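For a one-off measurement, bash alone can still report how long a command took. The following is only a minimal sketch; the helper name `timed` is hypothetical and not part of sdfstudio or the AIMAX platform.

```shell
# Hypothetical helper: run a command and print its wall-clock time on stderr.
timed() {
    local start end status=0
    start=$(date +%s)            # seconds since the epoch, before the command
    "$@" || status=$?            # run the command unchanged, keep its exit status
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}
```

Prefixing a long-running command with `timed` prints the elapsed seconds once the command finishes and passes the command's exit status through. The shell keyword `time` gives similar information without any helper.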

  • Install zsh
apt install zsh
  • Install oh-my-zsh
    Upload it via file transfer, then:
mv .oh-my-zsh ~/.oh-my-zsh
cp ~/.oh-my-zsh/templates/zshrc.zsh-template ~/.zshrc
chsh -s /bin/zsh
  • Install powerlevel10k
    Upload it via file transfer (including the fonts), then:
mv powerlevel10k ~/.oh-my-zsh/custom/themes
mkdir -p ~/.fonts
mv MesloLGS* ~/.fonts/
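The file moves above can be bundled into one function that first checks that every uploaded piece is present. This is a sketch under assumptions: the function name `install_zsh_files` and its two arguments (the upload directory and the target home directory) are hypothetical, and `chsh` is left out on purpose because it changes the login shell.

```shell
# Hypothetical helper: install the uploaded files from SRC_DIR into TARGET_HOME.
install_zsh_files() {
    local src_dir="$1" target_home="${2:-$HOME}"

    # Stop early if one of the uploaded directories is missing.
    [ -d "$src_dir/.oh-my-zsh" ] || { echo "missing $src_dir/.oh-my-zsh" >&2; return 1; }
    [ -d "$src_dir/powerlevel10k" ] || { echo "missing $src_dir/powerlevel10k" >&2; return 1; }

    # oh-my-zsh itself plus the default .zshrc template.
    mv "$src_dir/.oh-my-zsh" "$target_home/.oh-my-zsh"
    cp "$target_home/.oh-my-zsh/templates/zshrc.zsh-template" "$target_home/.zshrc"

    # The powerlevel10k theme goes under custom/themes.
    mkdir -p "$target_home/.oh-my-zsh/custom/themes"
    mv "$src_dir/powerlevel10k" "$target_home/.oh-my-zsh/custom/themes/"

    # Fonts: the destination is the directory itself, not a glob inside it.
    mkdir -p "$target_home/.fonts"
    mv "$src_dir"/MesloLGS* "$target_home/.fonts/"
}
```

Called as `install_zsh_files /path/to/upload`, it installs into the current user's home directory; the second argument is mainly useful for trying the function out in a scratch directory first.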

Open ~/.bashrc and look at the conda-related configuration:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

  • Configure oh-my-zsh
    Edit ~/.zshrc: set the theme and append the conda block copied from ~/.bashrc.
vim ~/.zshrc
...
ZSH_THEME="powerlevel10k/powerlevel10k"
...
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
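Copying the conda block by hand is easy to get wrong, so the copy can also be scripted. The sketch below assumes the block is delimited by the two `conda initialize` marker comments shown above; the function name `copy_conda_block` is hypothetical. Note that it copies the block verbatim, including the `shell.bash` hook.

```shell
# Hypothetical helper: append the conda init block from one rc file to another.
copy_conda_block() {
    local src="$1" dst="$2"

    # Do nothing if the destination already contains a conda block.
    if [ -f "$dst" ] && grep -q '^# >>> conda initialize >>>' "$dst"; then
        return 0
    fi

    # Print only the lines between the two marker comments (inclusive)
    # and append them to the destination file.
    sed -n '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</p' "$src" >> "$dst"
}
```

With the files from this section it would be called as `copy_conda_block ~/.bashrc ~/.zshrc`. As an alternative, `conda init zsh` lets conda write a zsh-specific block into ~/.zshrc itself.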

Reload ~/.zshrc:

source ~/.zshrc

After installing zsh, ns-install-cli has to be run again:

  /opt/da/p/sdfstudio ❯ ns-install-cli                                           sdfstudio root@sdfstudio  21:51:55
[21:51:58]  Found .zshrc!                                                                               install.py:214
            Found .bashrc!                                                                              install.py:214
[21:51:59] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-install-cli.      install.py:124
           ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-install-cli! install.py:119
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-dev-test.         install.py:124
           ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-dev-test!    install.py:119
[21:52:00] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-process-data.     install.py:124
           ✔ Wrote new completion to                                                                      install.py:119
           /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-process-data!                               
[21:52:55] ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-download-data.    install.py:124
           ✔ Wrote new completion to                                                                      install.py:119
           /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-download-data!                              
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-eval.             install.py:124
           ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-eval!        install.py:119
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render.           install.py:124
           ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render!      install.py:119
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-extract-mesh.     install.py:124
           ✔ Wrote new completion to                                                                      install.py:119
           /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-extract-mesh!                               
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-render-mesh.      install.py:124
           ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-render-mesh! install.py:119
[21:52:57] ✔ Wrote new completion to /opt/data/private/sdfstudio/scripts/completions/zsh/_ns-train!       install.py:119
           ✔ Nothing to do for /opt/data/private/sdfstudio/scripts/completions/bash/_ns-train.            install.py:124
            Completions installed to /root/.zshrc. Exciting! Open a new shell to try them out.          install.py:186
            Existing completions uninstalled from /root/.bashrc.                                        install.py:180
            Completions installed to /root/.bashrc. Ok! Open a new shell to try them out.               install.py:186
All done!

Done. The cluster hardware can now be used for training.


4. Save the modified container as an image

[Image 37: AIMAX cluster sdfstudio container setup notes]
The saved image then appears under the private images, and new environments can be created from it.


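5. Run a training task

With larger datasets the task gets killed. Increasing the memory allocation did not solve this; could GPU memory or the training parameters be the cause?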
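One thing worth checking here is the memory limit that actually applies inside the container, since it can be lower than what `free` reports for the host. The following is a minimal sketch: the function name `cgroup_mem_limit` is hypothetical, and it assumes the container exposes the standard cgroup v2 or v1 memory files, which I have not verified on AIMAX.

```shell
# Hypothetical helper: print the cgroup memory limit that applies to this container.
# The optional argument overrides the cgroup mount point (default: /sys/fs/cgroup).
cgroup_mem_limit() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"                      # cgroup v2: bytes, or "max" if unlimited
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"    # cgroup v1: bytes
    else
        echo "unknown"                              # neither layout found
    fi
}
```

A bare `Killed` message usually means the process received SIGKILL, which is what the Linux out-of-memory killer sends, so host RAM is the more likely suspect; running out of GPU memory in PyTorch normally shows up as a CUDA out-of-memory error instead. GPU memory can be watched separately with `nvidia-smi`.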

[Image 38: AIMAX cluster sdfstudio container setup notes]

After lowering the resolution, training runs normally.

[Image 39: AIMAX cluster sdfstudio container setup notes]


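Note that disconnecting the SSH session terminates the task. Going forward I plan to train through the platform's training-task mode instead; an interactive development environment apparently has to stay connected the whole time (another computer could be left connected to keep it alive).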
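A common workaround for jobs dying with the SSH session is to start them with `nohup` in the background, so they ignore the hangup signal sent when the terminal closes. The sketch below uses a hypothetical helper name, `run_detached`, and I have not verified that the AIMAX platform keeps the container itself alive after a disconnect.

```shell
# Hypothetical helper: start a command in the background so it survives a closed SSH session.
# Usage: run_detached LOG_FILE COMMAND [ARGS...]
run_detached() {
    local log_file="$1"
    shift
    # nohup makes the command ignore SIGHUP; stdout and stderr go to the log file,
    # and stdin is detached so the job never waits on the terminal.
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    echo "$!"    # print the PID of the background job
}
```

The training command would be passed after the log file name, for example `run_detached train.log ns-train` followed by the usual arguments, and progress can then be followed with `tail -f train.log`. Running the job inside `tmux` or `screen` is another option if either is installed in the image.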
