The painful story of installing PyTorch on the TX2

Setup: TX2, JetPack 4, CUDA 10, no MKL, Python 2

Switch the TX2 to maximum power and turn on the fan

sudo nvpmodel -m 0         # switch to the maximum-performance power mode
cd  /usr/bin/
sudo ./jetson_clocks       # pin the clocks to maximum and force the fan to full speed

0x00 Verifying torch

import torch
print(torch.__version__)
print('CUDA available: ' + str(torch.cuda.is_available()))
a = torch.cuda.FloatTensor(2).zero_()
print('Tensor a = ' + str(a))
b = torch.randn(2).cuda()
print('Tensor b = ' + str(b))
c = a + b
print('Tensor c = ' + str(c))

0x01 Installing from a .whl wheel

Reference: PyTorch for Jetson Nano
https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/

Environment: PyTorch 1.1, Python 2.7 and Python 3.6, Jetson TX2, JetPack 4.2.

Python 2.7
wget https://nvidia.box.com/shared/static/m6vy0c7rs8t1alrt9dqf7yt1z587d1jk.whl -O torch-1.1.0a0+b457266-cp27-cp27mu-linux_aarch64.whl
pip install torch-1.1.0a0+b457266-cp27-cp27mu-linux_aarch64.whl
Python 3.6
wget https://nvidia.box.com/shared/static/veo87trfaawj5pfwuqvhl6mzc5b55fbj.whl -O torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl
# Install torchvision. I can no longer find the original source for this step; if you know it, please tell me
sudo pip install  --no-deps torchvision==0.2.0
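The wheel file names above already say which interpreter each file is built for, which is why the cp27 wheel goes with pip and the cp36 wheel with pip3. A minimal sketch of reading those tags, assuming the standard wheel naming scheme (name-version-pythontag-abitag-platformtag.whl) without the optional build tag; the helper name parse_wheel_name is made up for illustration:

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its standard components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        # A sixth field would be the optional build tag, which this sketch ignores.
        raise ValueError("unexpected wheel name: " + filename)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_name("torch-1.1.0a0+b457266-cp27-cp27mu-linux_aarch64.whl")
print(info["python"], info["abi"], info["platform"])
```

If pip rejects a wheel as "not a supported wheel on this platform", comparing these tags with the interpreter you are actually running is usually the first thing to check.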

Then I ran my project's tests with this torch and hit the following error:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/Downloads/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:66

Fix:

  • TX2 cuda runtime error (7) : too many resources requested for launch ,运行错误解决方法 (how to fix this runtime error) https://blog.csdn.net/xxradon/article/details/87922618

    The fix is to set CUDA_NUM_THREADS = 256 (down from 1024) in the PyTorch source.
    Two files need changing:

    • pytorch/aten/src/THCUNN/common.h, line 12
    • pytorch/aten/src/ATen/cuda/detail/KernelUtils.h, line 15
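Editing the two files by hand is easy to get wrong after a fresh checkout, so the change can also be scripted. A minimal sketch, assuming the constant appears in both files as an assignment of the form `CUDA_NUM_THREADS = 1024;` (I have not verified the exact declaration, so check the replacement count); the function name set_cuda_num_threads is hypothetical:

```python
import re

# Assumption: the declaration looks like "... CUDA_NUM_THREADS = <number>;"
_PATTERN = re.compile(r"(CUDA_NUM_THREADS\s*=\s*)\d+")


def set_cuda_num_threads(source_text, value=256):
    """Return (patched_text, number_of_replacements)."""
    return _PATTERN.subn(lambda m: m.group(1) + str(value), source_text)


sample = "const int CUDA_NUM_THREADS = 1024;\n"
patched, count = set_cuda_num_threads(sample)
print(patched.strip(), count)
```

To apply it for real, read each of the two files, pass the text through the function, and write the result back only if the count is exactly 1.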

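So I switched to building from source. Time to roll up my sleeves.

0x02 Building from source

Reference: Jetson TX2安装pytorch(from source), on installing PyTorch on the Jetson TX2 from source https://www.jianshu.com/p/9e9c74834283

git clone was slow. I read online that DNS pollution was to blame, so in a fit of rage I changed my DNS server to 114.114.114.114 (or 8.8.8.8), and sure enough the download went from bicycle speed to motorbike speed.
I also edited /etc/hosts and added the following: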
151.101.72.249 github.global.ssl.fastly.net
192.30.253.112 github.com

1. Install dependencies

sudo apt install libopenblas-dev libatlas-dev liblapack-dev 
# Install whatever dependency turns out to be missing. Of the ones below only cmake felt necessary to me, but install them all anyway
sudo pip install scipy pyyaml scikit-build cffi
sudo apt-get -y install cmake

2. Add cuDNN

gedit ~/.bashrc

# Append at the end of the file
export CUDNN_LIB_DIR=/usr/lib/aarch64-linux-gnu
export CUDNN_INCLUDE_DIR=/usr/include
export CUDA_ROOT="/usr/local/cuda-10.0/"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64/:$LD_LIBRARY_PATH"

# To be safe, I also added the four lines above to /etc/profile
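A typo in one of these exports only shows up much later as a confusing build error, so it is worth checking them in a fresh shell before starting the build. A minimal sketch, assuming the three directory variables above are the ones that matter; the helper name check_build_env is made up:

```python
import os

REQUIRED_DIRS = ("CUDNN_LIB_DIR", "CUDNN_INCLUDE_DIR", "CUDA_ROOT")


def check_build_env(env=None):
    """Return a list of problems; an empty list means every variable points at a directory."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_DIRS:
        path = env.get(name)
        if not path:
            problems.append(name + " is not set")
        elif not os.path.isdir(path):
            problems.append(name + " points to a missing directory: " + path)
    return problems


for line in check_build_env() or ["all three variables point to existing directories"]:
    print(line)
```

This only checks that the directories exist, not that they contain the right cuDNN or CUDA version.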

3. Download the PyTorch source

git clone http://github.com/pytorch/pytorch
cd pytorch
sudo pip install -U setuptools
sudo pip install -r requirements.txt
git checkout tags/v1.1.0 -b build
git submodule update --init --recursive

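At this point the last command may appear to hang, with a progress bar that never moves no matter how long you wait, which is maddening. Changing the DNS server gave a big speed-up and brought the download back to normal.
The last command must run to completion, otherwise the build will fail later.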
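Because an interrupted submodule update is easy to miss, a quick check for submodule directories that are still empty can save a failed build. A rough sketch, assuming every submodule is listed in .gitmodules with a `path = ...` line; it is only a heuristic, since a partially fetched submodule is not empty. The demo runs on a throwaway directory with made-up submodule names rather than on the real checkout:

```python
import os
import re
import tempfile


def submodule_paths(gitmodules_text):
    """Extract the 'path = ...' entries from the text of a .gitmodules file."""
    return re.findall(r"^[ \t]*path[ \t]*=[ \t]*(\S+)[ \t]*$", gitmodules_text, flags=re.M)


def empty_submodules(repo_root):
    """List submodule directories that are missing or contain no files."""
    with open(os.path.join(repo_root, ".gitmodules")) as handle:
        paths = submodule_paths(handle.read())
    missing = []
    for rel in paths:
        full = os.path.join(repo_root, rel)
        if not os.path.isdir(full) or not os.listdir(full):
            missing.append(rel)
    return missing


# Demo: mimic a half-finished checkout (names and URLs are placeholders).
root = tempfile.mkdtemp()
with open(os.path.join(root, ".gitmodules"), "w") as handle:
    handle.write('[submodule "a"]\n\tpath = third_party/a\n\turl = https://example.invalid/a\n')
    handle.write('[submodule "b"]\n\tpath = third_party/b\n\turl = https://example.invalid/b\n')
os.makedirs(os.path.join(root, "third_party", "a"))
open(os.path.join(root, "third_party", "a", "README"), "w").close()
os.makedirs(os.path.join(root, "third_party", "b"))  # created but never populated
missing = empty_submodules(root)
print(missing)
```

On the real checkout, call empty_submodules on the pytorch directory. `git submodule status` gives the authoritative answer: entries prefixed with `-` have not been initialized.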

4. Build

P.S. Before building, change 1024 to 256 in the two files (the CUDA_NUM_THREADS setting from 0x01).

 sudo python setup.py build

This step failed with:

 Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-mkldnn --use-qnnpack caffe2'

Reference: ubuntu 16.04 Caffe2 / PyTorch - CMake Error at third_party/protobuf/cmake/cmake_install.cmake:64 https://blog.csdn.net/chengyq116/article/details/83817726

Fix:

sudo python setup.py install
# This failed as well. It turned out that `git submodule update --init --recursive` had missed
# a file. I downloaded the corresponding file from GitHub, and then the build succeeded

5. Verify the installation

python -c "import torch"

# This fails with:
ImportError: No module named _C 

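Reference: PyTorch源码安装小记 (notes on installing PyTorch from source) https://blog.csdn.net/Draco_mystack/article/details/71191924

I checked the issues on the PyTorch repo and, sure enough, plenty of people had hit this: https://github.com/pytorch/pytorch/issues/7
The author calmly replied: do not import torch from the root directory of the pytorch project...
After changing to another directory, the import worked.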
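The likely reason is that the current directory comes first on the import path, so the torch/ source directory in the checkout wins over the installed package, and that source directory presumably does not contain the compiled _C extension. A small sketch for checking whether a directory would shadow an installed module; the helper name is made up, and the demo uses a temporary directory standing in for the source checkout:

```python
import os
import tempfile


def shadowed_by_directory(module_name, directory):
    """True if `directory` holds a package or module that would be found before the installed one."""
    candidate = os.path.join(directory, module_name)
    return os.path.isdir(candidate) or os.path.isfile(candidate + ".py")


# Demo: a directory containing a "torch" folder, like the root of the source tree.
checkout = tempfile.mkdtemp()
os.mkdir(os.path.join(checkout, "torch"))
in_checkout = shadowed_by_directory("torch", checkout)

# An empty directory shadows nothing.
elsewhere = tempfile.mkdtemp()
in_elsewhere = shadowed_by_directory("torch", elsewhere)

print(in_checkout, in_elsewhere)
```

Calling shadowed_by_directory("torch", os.getcwd()) before importing tells you whether you are about to run into the same ImportError.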

6. On the verge of a breakdown

Running the project code produced the same error again:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at 

In other words, the source build with 1024 changed to 256 did not fix the problem. Back to Google, which turned up this:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at #8103 https://github.com/pytorch/pytorch/issues/8103

// Edit "aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu":

// Around line 62:
// comment out THCState_getCurrentDeviceProperties(state)->maxThreadsPerBlock;
// and use this instead:
const int num_threads = 512;

// Around line 97:
// comment out THCState_getCurrentDeviceProperties(state)->maxThreadsPerBlock;
// and use this instead:
const int num_threads = 512;
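Why a smaller thread count helps: this error typically means that one block asks for more of some per-block resource, usually registers, than the GPU can provide. A back-of-the-envelope sketch, assuming a limit of 32768 registers per block (the value the CUDA programming guide lists for compute capability 6.2, as far as I know; check `regsPerBlock` in the device properties on your board) and ignoring allocation granularity. The 40 registers per thread is an illustrative number, not a measurement of this kernel:

```python
def launch_fits(threads_per_block, regs_per_thread, regs_per_block=32768):
    """Rough check: does one block's register demand stay within the per-block limit?"""
    return threads_per_block * regs_per_thread <= regs_per_block


for threads in (1024, 512, 256):
    print(threads, launch_fits(threads, regs_per_thread=40))
```

Under these assumptions a block of 1024 threads does not fit while 512 and 256 do, which matches the observation that hard-coding 512 instead of maxThreadsPerBlock makes the launch succeed.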

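Running the project code now hits a different error. I am recording it here to fix later; this one looks like a problem in the project's own code: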

UnboundLocalError: local variable 'pred3' referenced before assignment
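This kind of error usually means a variable is assigned only on some code paths and then used unconditionally. A minimal reproduction with a made-up function (the name pred3 is borrowed from the message; the project's real code is not shown here and may differ):

```python
def pick_prediction(scale):
    # `pred3` is only assigned on one branch...
    if scale == 3:
        pred3 = "prediction at scale 3"
    # ...so any other input reaches this line with no value bound to it.
    return pred3


try:
    pick_prediction(1)
except UnboundLocalError as exc:
    print(type(exc).__name__)
```

The usual fix is to give the variable a default before the branch, or to make every branch assign it.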
