The TX2 is flashed with NVIDIA SDK Manager, which must be installed on Ubuntu 18.04 — only 18.04, no other version works — and requires logging in with an NVIDIA account. After flashing, run `sudo apt-get update` and upgrade the system. To speed this up, replace `/etc/apt/sources.list` with the Tsinghua `ubuntu-ports` mirror (the TX2 is arm64, so it needs `ubuntu-ports`, not the x86 repositories):

```
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main restricted universe multiverse
#deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main restricted universe multiverse
#deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main restricted universe multiverse
#deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main universe restricted
#deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main universe restricted
```
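Editing `/etc/apt/sources.list` by hand is easy to get wrong, so the swap can be staged and checked first. A minimal sketch — the `sudo` steps are commented out because they only make sense on the TX2 itself:

```shell
# Stage the Tsinghua ubuntu-ports mirror list, keep a backup, then swap it in.
cat > /tmp/sources.list <<'EOF'
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main universe restricted
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main restricted universe multiverse
EOF
grep -c '^deb ' /tmp/sources.list        # 4 active entries staged
# sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
# sudo cp /tmp/sources.list /etc/apt/sources.list
# sudo apt-get update && sudo apt-get upgrade
```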
Note: this part did not succeed in the end, but it is recorded here anyway.

YOLOv5 officially requires Python >= 3.7.0 and PyTorch >= 1.7. However, all the PyTorch builds listed in reference 1 are for Python 3.6, which does not satisfy YOLOv5's requirement, so the plan was to build PyTorch from source instead.

Until now, installing PyTorch had always meant installing conda first and PyTorch second, so that habit was carried over to the TX2 — and only after all the work was done did it become clear how much time had been wasted. The Miniconda3 Linux-aarch64 64-bit installer is only 100.9 MiB. BUT it simply would not install! Miniconda targets ARM aarch64 while the TX2 is aarch64 — what difference is that supposed to make? Archiconda3, on the other hand, can just be downloaded from its official page and installed directly.

Next, clone PyTorch v1.7.1. Because the clone tends to fail over a flaky network, a retry loop keeps trying until it succeeds:

```bash
#!/usr/bin/env bash
while [ 1 ]
do
    git clone --recursive --branch v1.7.1 https://github.com/pytorch/pytorch
    [ "$?" = "0" ] && break
done
echo "clone finished"
cd pytorch
while [ 1 ]
do
    echo "submodule"
    git submodule update --init --recursive
    [ "$?" = "0" ] && break
done
```
Add execute permission with `chmod +x filename`, then run the script with `./filename`. Before building, switch the TX2 into maximum-performance mode with `sudo nvpmodel -m 0 && sudo jetson_clocks`.

The PyTorch source tree needs the following patch before it will build and run on the TX2:

```diff
diff --git a/aten/src/ATen/cpu/vec256/vec256_float_neon.h b/aten/src/ATen/cpu/vec256/vec256_float_neon.h
index cfe6b0ea0f..d1e75ab9af 100644
--- a/aten/src/ATen/cpu/vec256/vec256_float_neon.h
+++ b/aten/src/ATen/cpu/vec256/vec256_float_neon.h
@@ -25,6 +25,8 @@ namespace {
 // https://bugs.llvm.org/show_bug.cgi?id=45824
 // Most likely we will do aarch32 support with inline asm.
 #if defined(__aarch64__)
+// See https://github.com/pytorch/pytorch/issues/47098
+#if defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3))
 #ifdef __BIG_ENDIAN__
 #error "Big endian is not supported."
@@ -665,6 +667,7 @@ Vec256<float> inline fmadd(const Vec256<float>& a, const Vec256<float>& b, const
   return Vec256<float>(r0, r1);
 }
-#endif
+#endif /* defined(__clang__) || (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3)) */
+#endif /* defined(aarch64) */
 }}}
diff --git a/aten/src/ATen/cuda/CUDAContext.cpp b/aten/src/ATen/cuda/CUDAContext.cpp
index fd51cc45e7..e3be2fd3bc 100644
--- a/aten/src/ATen/cuda/CUDAContext.cpp
+++ b/aten/src/ATen/cuda/CUDAContext.cpp
@@ -24,6 +24,8 @@ void initCUDAContextVectors() {
 void initDeviceProperty(DeviceIndex device_index) {
   cudaDeviceProp device_prop;
   AT_CUDA_CHECK(cudaGetDeviceProperties(&device_prop, device_index));
+  // patch for "too many resources requested for launch"
+  device_prop.maxThreadsPerBlock = device_prop.maxThreadsPerBlock / 2;
   device_properties[device_index] = device_prop;
 }
diff --git a/aten/src/ATen/cuda/detail/KernelUtils.h b/aten/src/ATen/cuda/detail/KernelUtils.h
index 45056ab996..81a0246ceb 100644
--- a/aten/src/ATen/cuda/detail/KernelUtils.h
+++ b/aten/src/ATen/cuda/detail/KernelUtils.h
@@ -22,7 +22,10 @@ namespace at { namespace cuda { namespace detail {
 // Use 1024 threads per block, which requires cuda sm_2x or above
-constexpr int CUDA_NUM_THREADS = 1024;
+//constexpr int CUDA_NUM_THREADS = 1024;
+
+// patch for "too many resources requested for launch"
+constexpr int CUDA_NUM_THREADS = 512;
 // CUDA: number of blocks for threads.
 inline int GET_BLOCKS(const int64_t N) {
diff --git a/aten/src/THCUNN/common.h b/aten/src/THCUNN/common.h
index 69b7f3a4d3..85b0b1305f 100644
--- a/aten/src/THCUNN/common.h
+++ b/aten/src/THCUNN/common.h
@@ -5,7 +5,10 @@
   "Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one.")
 // Use 1024 threads per block, which requires cuda sm_2x or above
-const int CUDA_NUM_THREADS = 1024;
+//constexpr int CUDA_NUM_THREADS = 1024;
+
+// patch for "too many resources requested for launch"
+constexpr int CUDA_NUM_THREADS = 512;
 // CUDA: number of blocks for threads.
 inline int GET_BLOCKS(const int64_t N)
```

Save the file with a `.patch` extension, put it in the same directory as the cloned pytorch folder, and apply it with `patch -d pytorch/ -p1 < xxx.patch`.
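Before touching the real tree, `patch` can rehearse the change with `--dry-run`. A self-contained sketch on a stand-in file (the names `xxx.patch` and `common.h` here are illustrative):

```shell
# Rehearse a patch with --dry-run, then apply it for real.
mkdir -p /tmp/patchdemo/pytorch && cd /tmp/patchdemo
printf 'const int CUDA_NUM_THREADS = 1024;\n' > pytorch/common.h
cat > xxx.patch <<'EOF'
--- a/common.h
+++ b/common.h
@@ -1 +1 @@
-const int CUDA_NUM_THREADS = 1024;
+constexpr int CUDA_NUM_THREADS = 512;
EOF
patch -d pytorch/ -p1 --dry-run < xxx.patch   # check: would it apply cleanly?
patch -d pytorch/ -p1 < xxx.patch             # apply for real
grep constexpr pytorch/common.h               # shows the patched line
```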
Install the build dependencies and start the build. The TX2's GPU has compute capability 6.2, which is what `TORCH_CUDA_ARCH_LIST` should name:

```bash
sudo apt-get install python3-pip cmake libopenblas-dev libopenmpi-dev
pip3 install -r requirements.txt
pip3 install scikit-build
pip3 install ninja
USE_NCCL=0 USE_DISTRIBUTED=0 USE_QNNPACK=0 USE_PYTORCH_QNNPACK=0 TORCH_CUDA_ARCH_LIST="6.2" PYTORCH_BUILD_VERSION=1.7.1 PYTORCH_BUILD_NUMBER=1 python3 setup.py install
```
Changing `install` at the end of that command to `bdist_wheel` produces an installable wheel file instead.

During the build, some components under `third_party` (for example `QNNPACK`) turned out to be missing; download them from the repositories referenced by the `v1.7.1` tag and place them in the corresponding directories. Once the missing `third_party` components are in place, re-run the command above.

The build still failed. Opening the `CMakeError.log` file, the last error reads `src.c:(.text+0x30): undefined reference to 'vld1q_f32_x2'`. `vld1q_f32_x2` is defined in GCC 10, so GCC 10 would have to be compiled first — far too involved a process, so this installation route was abandoned. Since the source-built PyTorch would not run, the only option left is to install the officially prebuilt 1.7 wheel and deal with problems as they appear.
Download the `torch-1.7.0-cp36-cp36m-linux_aarch64.whl` file. Since the wheel targets Python 3.6, use the Python that ships with the system: open `~/.bashrc` and comment out the last few lines added by the Archiconda3 installer, then install with `pip install --user torch-1.7.0-cp36-cp36m-linux_aarch64.whl`. After installation, a quick test shows that torch loads normally and CUDA works as well.

To speed up later pip installs, run `mkdir ~/.pip`, open the file `~/.pip/pip.conf`, and fill in:

```
[global]
index-url=https://pypi.tuna.tsinghua.edu.cn/simple
```
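The load test mentioned above can be made explicit. A minimal smoke test for the wheel install — run it on the TX2; the CUDA branch only executes when a GPU is visible:

```python
# Smoke test: the wheel imports, basic tensor math works, and CUDA (if present) computes.
import torch

print(torch.__version__)                 # the wheel above installs 1.7.0
x = torch.ones(2, 2) + torch.ones(2, 2)
assert x.sum().item() == 8.0             # CPU path works
if torch.cuda.is_available():            # True on the TX2 once the wheel is in place
    y = (x.cuda() * 2).cpu()
    assert y.sum().item() == 16.0        # GPU path works
```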
Then upgrade pip and install the Python dependencies:

```bash
pip3 install --upgrade pip
python -m pip install Cython scikit-build matplotlib numpy opencv-python PyYAML requests scipy tqdm tensorboard pandas seaborn
python -m pip install pillow --no-cache-dir
```
Install torchvision's system dependencies and clone the matching v0.8.1 branch, again wrapped in a retry loop:

```bash
#!/usr/bin/env bash
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libfreetype6-dev
while [ 1 ]
do
    git clone -b v0.8.1 https://github.com/pytorch/vision torchvision
    [ "$?" = "0" ] && break
done
```
Open the `setup.py` file, change the `version` variable from `0.8.0a0` to `0.8.1`, and install:

```bash
sudo python setup.py install
```
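The version edit can also be scripted with `sed`. A sketch on a stand-in `setup.py` — the exact quoting in the real file should be checked before relying on this pattern:

```shell
# Pin the reported torchvision version: 0.8.0a0 -> 0.8.1.
mkdir -p /tmp/tvdemo && cd /tmp/tvdemo
printf "version = '0.8.0a0'\n" > setup.py     # stand-in for the real version line
sed -i "s/0\.8\.0a0/0.8.1/" setup.py
grep version setup.py                         # version = '0.8.1'
```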
Test the installation with:

```bash
python detect.py --view-img --source person.mp4
```

The TensorRT version here is `'8.0.1.6'`, which can sometimes cause errors when generating the onnx file. In that case open the `export.py` file and, in the check `if trt.__version__[0] == '7':`, change the 7 to 8 so that the code takes the branch originally written for TensorRT 7. Then export the engine:

```bash
python export.py --weights yolov5s.pt --include engine --imgsz 512 --device 0
```
Reducing `imgsz` to 512 brings the per-frame processing time for 1080p video below 0.04 s, which meets the target of 25 frames per second. The export step produces a `yolov5s.engine` file; run detection with it, giving the `imgsz` parameter the same value:

```bash
python detect.py --weights yolov5s.engine --source person.mp4 --view-img --imgsz 512
```
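The 25 fps target translates directly into a per-frame time budget. A quick sanity check — the measured time below is illustrative, standing in for the sub-0.04 s figure above:

```python
# Real-time budget: 25 fps allows at most 1/25 s of processing per frame.
target_fps = 25
frame_budget = 1.0 / target_fps
print(frame_budget)              # 0.04 seconds per frame
measured = 0.038                 # illustrative per-frame time with imgsz=512
assert measured < frame_budget   # meets the 25 fps target
```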