flownet2-pytorch issues

GitHub: https://github.com/NVIDIA/flownet2-pytorch
———————————————————————————————————————————————

pytorch>=0.4.1

1. bash ./install.sh fails with: nvcc fatal : Unsupported gpu architecture 'compute_70'

  • CUDA 9.0 is required
    1> Install the matching PyTorch version
    conda install -n py36t4 pytorch=0.4.1 torchvision cudatoolkit=9.0 -c pytorch
    2> Check the nvcc version
    nvcc --version
    3> Change the CUDA installation in use (on the server)
    • vim ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs

PATH=/usr/local/cuda2/bin:$HOME/.local/bin:$HOME/bin:$PATH
export PATH

LD_LIBRARY_PATH=/usr/local/cuda2/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

INCLUDE_PATH=/usr/local/cuda2/include:$INCLUDE_PATH
export INCLUDE_PATH

    After leaving the current environment, run source ~/.bash_profile,
    then log out and log back in.

  • vim ~/.bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/.../anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/.../anaconda3/etc/profile.d/conda.sh" ]; then
        . "/.../anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/.../anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

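    4> Edit nvcc_args in setup.py in each of the three folders, keeping only the architectures that your GPU and nvcc support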

nvcc_args = [
    '-gencode', 'arch=compute_35,code=sm_35',
    #'-gencode', 'arch=compute_50,code=sm_50',
    #'-gencode', 'arch=compute_52,code=sm_52',
    #'-gencode', 'arch=compute_60,code=sm_60',
    #'-gencode', 'arch=compute_61,code=sm_61',
    #'-gencode', 'arch=compute_70,code=sm_70',
    #'-gencode', 'arch=compute_70,code=compute_70'
]
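
The -gencode entries kept in step 4 depend on the toolkit version found in step 2. As a minimal sketch (parse_nvcc_release and build_nvcc_args are hypothetical helper names, not part of flownet2-pytorch), the snippet below reads the release number from the text that nvcc --version prints and adds the compute_70 entry only for CUDA 9.0 or newer, the version the note above says is required. The base architecture 35 is taken from the listing above and is an assumption about your GPU.

```python
import re


def parse_nvcc_release(version_output):
    """Return (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def build_nvcc_args(release, base_arch="35"):
    """Build a -gencode list; compute_70 is added only for CUDA >= 9.0."""
    # base_arch is an assumption: use the compute capability of your own GPU.
    args = ["-gencode", "arch=compute_{0},code=sm_{0}".format(base_arch)]
    if release >= (9, 0):
        args += ["-gencode", "arch=compute_70,code=sm_70"]
    return args


# Example with a typical last line of `nvcc --version` from a CUDA 8.0 install.
sample = "Cuda compilation tools, release 8.0, V8.0.61"
print(build_nvcc_args(parse_nvcc_release(sample)))
# ['-gencode', 'arch=compute_35,code=sm_35']
```

On a real machine the sample string could be replaced with the captured output of nvcc --version, for example via subprocess.check_output.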

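2. Error

correlation_cuda_kernel.cu(130): error: identifier "__syncwarp" is undefined
correlation_cuda_kernel.cu(19): error: identifier "__shfl_down_sync" is undefined
  • CUDA library mismatch: both intrinsics were introduced in CUDA 9.0, so the build is probably picking up an older toolkit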
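
One way to see which toolkit the build will pick up is to list every nvcc on PATH in search order. This is only a sketch under the assumption that the compiler is found through PATH; all_nvcc is a hypothetical helper name.

```python
import os


def all_nvcc(path_value):
    """List every executable named nvcc on a PATH string, in search order."""
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


# The first entry, if any, is the compiler that a plain `nvcc` call would run.
print(all_nvcc(os.environ.get("PATH", "")))
```

If the first entry is not the CUDA 9.0 installation, the PATH order in ~/.bash_profile is the likely place to look.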

3. Inference error
Segmentation fault (core dumped)

  • Version mismatch problem; unresolved

pytorch<=0.4.0

1. Inference error:

ImportError: .../.../lib/libstdc++.so.6: version `GLIBCXX_3.4.20' not found
  • Fix
    conda install libgcc=5.2.0
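
To check whether a given libstdc++.so.6 actually provides GLIBCXX_3.4.20, its version tags can be read straight from the file, much like running strings on it and filtering for GLIBCXX. The sketch below assumes the tags appear as plain ASCII in the library; glibcxx_versions is a hypothetical helper name, and the library path has to be filled in from your own error message.

```python
import re


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in a libstdc++ file, sorted."""
    with open(lib_path, "rb") as handle:
        data = handle.read()
    # Tags look like GLIBCXX_3.4.20; capture only the numeric part.
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in tag.decode().split("."))
                  for tag in tags)


def provides(lib_path, wanted=(3, 4, 20)):
    """True if the library advertises the wanted GLIBCXX version."""
    return wanted in glibcxx_versions(lib_path)
```

Calling provides() on the libstdc++.so.6 named in the ImportError before and after the conda install should show whether the fix took effect.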

 A further error follows:

ImportError: .../.../resample2d_cuda-0.0.0-py3.6-linux-x86_64.egg/resample2d_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
  • Cause: the git branch had not been switched
  • Fix
git branch -a
git pull origin python36-PyTorch0.4
git commit -a
git checkout -b flow40 origin/python36-PyTorch0.4
git branch -a

2. bash ./install.sh fails with:

  error: ‘for’ loop initial declarations are only allowed in C99 mode
  command 'gcc' failed with exit status 1

Fix: export CFLAGS="-std=c99"

3. Error during inference testing

The command is: python main.py --inference --model FlowNet2 --save_flow --inference_dataset ImagesFromFolder --inference_dataset_root /path/to/image/folder --resume ./FlowNet2_checkpoint.pth.tar
(Replace MpiSintelClean in the original command with ImagesFromFolder, and check that the iext parameter of the corresponding class in datasets.py matches your image format; otherwise you get IndexError: list index out of range.)
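
A rough illustration of why a wrong iext ends in IndexError: the sketch assumes the dataset class globs for files ending in "." + iext and then indexes into the resulting list, so a mismatched extension leaves the list empty. list_frame_pairs is a hypothetical stand-in written for this note, not the code in datasets.py.

```python
import os
from glob import glob


def list_frame_pairs(root, iext="png"):
    """Pair consecutive images in root that have the extension iext."""
    images = sorted(glob(os.path.join(root, "*." + iext)))
    return [(images[i], images[i + 1]) for i in range(len(images) - 1)]


# Placeholder folder from the command above; replace with your own path.
pairs = list_frame_pairs("/path/to/image/folder", iext="jpg")
if not pairs:
    # Indexing pairs[0] here would raise IndexError: list index out of range.
    print("no image pairs found; check that iext matches the file extension")
```

Running a check like this before inference makes the extension mismatch visible without loading the model.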

Error:

error in Correlation_forward_cuda_kernel: invalid device function

Fix:

Turns out in the 3 make.sh, the architecture is hardcoded to -arch=sm_52 and in my case it needs to be -arch=sm_35.
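
The value to put after -arch= follows from the GPU's compute capability. A minimal sketch, assuming the capability is available as a (major, minor) tuple; arch_flag is a hypothetical helper name.

```python
def arch_flag(capability):
    """Turn a (major, minor) compute capability into an nvcc -arch flag."""
    major, minor = capability
    return "-arch=sm_{}{}".format(major, minor)


# With PyTorch and a visible GPU, the tuple can be obtained from
# torch.cuda.get_device_capability(0), which returns (major, minor).
print(arch_flag((3, 5)))  # -arch=sm_35
```

The resulting string is what would replace the hardcoded -arch=sm_52 in each make.sh.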

//
//

command ‘gcc’ failed with exit status 1

Fix:
export CXXFLAGS="-std=c++11"
export CFLAGS="-std=c99"
