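PyTorch (through torchvision) ships with video classification support. It provides three network architectures trained on the Kinetics datasets (Kinetics-400/600/700, https://deepmind.com/research/open-source/kinetics): ResNet3D, Mixed Convolution, and R(2+1)D. They can be used for video action recognition (Activity Recognition).
Related paper: https://arxiv.org/abs/1711.11248
For the model code, see: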
https://github.com/pytorch/vision/blob/master/torchvision/models/video/resnet.py
For the training code, see:
https://github.com/pytorch/vision/tree/master/references/video_classification
For deploying the model with TensorRT acceleration through the Python API, see:
https://github.com/kn1ghtf1re/Activity-Recognition-TensorRT
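I will leave the Kinetics dataset and model training aside for now and start with how to try out deploying the accelerated model with the Python version of TensorRT. A C++ TensorRT deployment can be modeled on the Python version; because that code is commercially confidential, I can't say more about it.
https://github.com/kn1ghtf1re/Activity-Recognition-TensorRT currently has two branches, trt7.2 and trt8.0. If you want to deploy the model on an NVIDIA Jetson series board, you should follow the trt7.2 branch, because at the time of writing the latest JetPack for Jetson still ships only TensorRT 7 and does not support TensorRT 8. If you do your work under TensorRT 8, you will have to redo the experiments and the implementation when you deploy on the Jetson board, which makes the earlier effort largely wasted: the API changes from TensorRT 7 to TensorRT 8 are fairly large.
As for installing the requirements of Activity-Recognition-TensorRT: if the basic supporting packages in your environment, such as numpy, are already fairly recent, installation and use will probably go smoothly. Run: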
pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
That should be enough. If you run into assorted conflicts or errors, install step by step as follows:
pip3 install --upgrade pip
pip3 install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com numpy==1.21.1
pip3 install -U -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com protobuf
pip3 install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com scikit-build appdirs Mako MarkupSafe netron pytools==2021.2.7 pycuda==2021.1 opencv-python
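If, after installing the requirements, the program crashes at run time with Illegal instruction (core dumped), and debugging shows that the crash happens at the line import pycuda.driver as cuda, the likely cause is that pycuda was installed too early, before some of its supporting packages were in place, which left a broken installation. Reinstalling usually fixes it: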
pip3 uninstall pycuda
pip3 install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com numpy pytools==2021.2.3 pycuda==2021.1
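Because an Illegal instruction crash kills the whole interpreter, it can help to test each import in a separate process and see which one fails. The following is a minimal sketch and not part of the Activity-Recognition-TensorRT repository; the helper name import_survives and the list of modules checked are my own assumptions.

```python
import subprocess
import sys


def import_survives(module_name, timeout=60):
    """Return True if `import module_name` succeeds in a fresh interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    # A non-zero return code covers both an ordinary ImportError and a
    # child process killed by a signal (for example an illegal instruction).
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical list: the packages this section installs before pycuda.
    for name in ("numpy", "pytools", "pycuda.driver"):
        status = "ok" if import_survives(name) else "FAILED"
        print(f"{name}: {status}")
```

If numpy or pytools already fails here, reinstalling pycuda alone is unlikely to help; fix the earlier package first.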
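If you get this error:
numpy/core/src/multiarray/numpyos.c:18:10: fatal error: xlocale.h: No such file or directory
#include <xlocale.h>
^~~~~~~~~~~
compilation terminated.
the cause, in my case, was that I was using numpy 1.19.5, whose numpy/core/ directory has no src directory at all, so the file naturally cannot be found, whereas the newer numpy 1.20.2 does have that directory. Install it with the following command: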
pip3 install cython numpy==1.20.2
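If you are installing on the Jetson platform, there is not yet an arm64 build of numpy 1.20.2, so you need to download the source and build and install it yourself: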
wget https://github.com/numpy/numpy/archive/refs/tags/v1.20.2.tar.gz
tar xf v1.20.2.tar.gz
cd numpy-1.20.2
python3 setup.py install
If the build and install step fails with ModuleNotFoundError: No module named 'Cython', install cython with:
pip3 install cython
If python3 setup.py install reports the error below, the python3 you are using is too old and you need to install Python 3.7 or later:
Traceback (most recent call last):
File "setup.py", line 30, in <module>
raise RuntimeError("Python version >= 3.7 required.")
RuntimeError: Python version >= 3.7 required.
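The interpreter version can also be checked up front, before starting a long source build. This is a minimal sketch; the (3, 7) minimum is taken from the error message above, and the function name meets_minimum is my own.

```python
import sys

# Minimum version taken from the numpy 1.20.2 error message above.
MIN_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough")
    else:
        print(f"Python {current} is too old, need >= 3.7")
```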
# First install the libraries needed for the build
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo apt-get install build-essential python3-dev python3-setuptools python3-pip python3-smbus libncursesw5-dev libgdbm-dev libc6-dev zlib1g-dev libsqlite3-dev tk-dev libssl-dev openssl libffi-dev
#sudo apt-get install libffi-dev
# Download the source, then build and install it
wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
tar xf Python-3.9.6.tgz
cd Python-3.9.6
./configure --with-ssl --prefix=/usr/local/python3
make
sudo rm -rf /usr/local/python3 # remove the old python3 install
sudo make install
If the following error occurs during the build and install above:
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 125, in linux_distribution
return _distro.linux_distribution(full_distribution_name)
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 681, in linux_distribution
self.version(),
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 741, in version
self.lsb_release_attr('release'),
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 903, in lsb_release_attr
return self._lsb_release_info.get(attribute, '')
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 556, in __get__
ret = obj.__dict__[self._fname] = self._f(obj)
File "/tmp/tmp908ibhic/pip-21.1.1-py3-none-any.whl/pip/_vendor/distro.py", line 1014, in _lsb_release_info
stdout = subprocess.check_output(cmd, stderr=devnull)
File "/home/jzyq/Python-3.9.6/Lib/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/jzyq/Python-3.9.6/Lib/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('lsb_release', '-a')' returned non-zero
The error is caused by the lsb_release that belongs to the existing, older python3. Deleting or renaming it fixes the problem:
sudo rm /usr/bin/lsb_release
Once the installation is complete, force the new python3.9.6 to be the default python3, and update pip and setuptools:
sudo rm /usr/bin/python3
sudo rm /usr/bin/pip3
sudo ln -s /usr/local/python3/bin/python3.9 /usr/bin/python3
sudo ln -s /usr/local/python3/bin/pip3.9 /usr/bin/pip3
sudo rm /usr/bin/pip
sudo ln -s /usr/bin/pip3 /usr/bin/pip
/usr/bin/python3 -m pip install --upgrade pip
pip install setuptools --upgrade
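If you are installing on an x86 platform, next download a TensorRT 7.1.3 or 7.2.3 tar package, unpack it, go into its python directory, and install the Python version of TensorRT from the whl file that matches python3.9: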
sudo pip3 install tensorrt-7.2.3-cp39-none-linux_x86_64.whl
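The whl file has to match the running interpreter, which is what the cp39 part of the file name encodes. A small sketch of picking the matching file from the python directory of the tar package follows; the helper names are my own, and any file names other than the cp39 one above are hypothetical examples.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Build the CPython wheel tag, e.g. 'cp39' for Python 3.9."""
    return f"cp{version_info[0]}{version_info[1]}"


def matching_wheels(filenames, tag=None):
    """Keep only the wheel file names that carry the given interpreter tag."""
    tag = tag or cpython_tag()
    return [name for name in filenames if f"-{tag}-" in name]


if __name__ == "__main__":
    import os

    # Run this inside the python directory of the unpacked tar package.
    wheels = [name for name in os.listdir(".") if name.endswith(".whl")]
    print(cpython_tag(), matching_wheels(wheels))
```

If the list comes back empty, the tar package has no wheel for the interpreter in use, and pip will refuse to install any of the others.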
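On the Jetson platform there is no ready-made Python whl package, so you have to find a way to build the whl file from the TensorRT Python sources yourself. However, I found that https://github.com/NVIDIA/TensorRT only provides the TensorRT/python/ directory and its code starting from the release/8.0 branch, and judging from the code in the scripts and the file names produced by the build, it is indeed all version 8.0. So it seems there is no way to build a TensorRT 7.x whl package yourself on Jetson, and I have not yet found where to get the Python sources for TensorRT 7.x.
In a host environment, you can now run the following command to start the program, open the camera, and perform action recognition: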
python3 action_recognition_tensorrt.py --stream webcam --model resnext-101-kinetics.onnx --fp16 --frameskip 3
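After the program starts, a Qt window appears. Perform various actions in front of the camera and the recognition result and the time taken are displayed; of course, the recognized action is often not very accurate.
If the environment above was installed inside docker, starting action_recognition_tensorrt.py there with the command above may report the following error: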
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.6/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
There can be two causes: a missing support library, and X Window authorization that still needs to be configured.
First run:
export QT_DEBUG_PLUGINS=1
ldd /usr/local/lib/python3.9/site-packages/cv2/qt/plugins/platforms/libqxcb.so
This shows ldd reporting that the libxcb-xinerama.so.0 library is missing. Install it:
sudo apt-get install libxcb-xinerama0
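When ldd prints many lines, the missing libraries are the ones marked "not found". The sketch below filters them out; it is only an illustration, the function names are my own, and the sample text in the usage note is a shortened, made-up ldd output.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibxcb-xinerama.so.0 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_for(path):
    """Run ldd on a shared object and list its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

For example, missing_libraries("\tlibxcb-xinerama.so.0 => not found\n") yields a list containing libxcb-xinerama.so.0, which is the library installed above.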
Run the command below again; it now reports many is not an ELF object errors and then aborts:
python3 action_recognition_tensorrt.py --stream webcam --model resnext-101-kinetics.onnx --fp16 --frameskip 3
QFactoryLoader::QFactoryLoader() checking directory path "/usr/local/lib/python3.6/site-packages/cv2/qt/plugins" ...
QFactoryLoader::QFactoryLoader() checking directory path "/usr/local/bin" ...
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/2to3"
QElfParser: '/usr/local/bin/2to3-3.6' is not an ELF object
"'/usr/local/bin/2to3-3.6' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/2to3-3.6"
QElfParser: '/usr/local/bin/2to3-3.6' is not an ELF object
"'/usr/local/bin/2to3-3.6' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/backend-test-tools"
QElfParser: '/usr/local/bin/backend-test-tools' is not an ELF object
"'/usr/local/bin/backend-test-tools' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/chardetect"
QElfParser: '/usr/local/bin/chardetect' is not an ELF object
"'/usr/local/bin/chardetect' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/check-model"
QElfParser: '/usr/local/bin/check-model' is not an ELF object
"'/usr/local/bin/check-model' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/check-node"
QElfParser: '/usr/local/bin/check-node' is not an ELF object
"'/usr/local/bin/check-node' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/cmake"
"Failed to extract plugin meta data from '/usr/local/bin/cmake'"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/convert-caffe2-to-onnx"
QElfParser: '/usr/local/bin/convert-caffe2-to-onnx' is not an ELF object
"'/usr/local/bin/convert-caffe2-to-onnx' is not an ELF object"
not a plugin
QFactoryLoader::QFactoryLoader() looking at "/usr/local/bin/convert-onnx-to-caffe2"
QElfParser: '/usr/local/bin/convert-onnx-to-caffe2' is not an ELF object
...
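The real cause has nothing to do with those files not being ELF objects; it is an xhost display permission problem. The fix is to authorize access on the host first (this must be done in the host's graphical session; a remote text terminal will not work).
Check the DISPLAY value of the terminal started in the host's graphical session: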
echo $DISPLAY
DISPLAY =:1
Then make sure x11-xserver-utils is installed on the host. If it is not, install it with:
sudo apt-get install x11-xserver-utils
Then run xhost:
xhost +
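This allows all clients to connect.
Third, make sure the DISPLAY environment variable is configured in the docker container. If the docker run command that created the container did not include -e DISPLAY=$DISPLAY, or it did but DISPLAY had no value in the host environment at that time, you can add the DISPLAY setting by editing the container's configuration file:
# Edit config.v2.json, the configuration file of the container that acts as the client. The docker service must be stopped before editing it; otherwise the change does not take effect and is lost again when the container restarts!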
systemctl stop docker
# Edit config.v2.json: add "DISPLAY=:1" to the Env entry, then save and exit
vi /var/lib/docker/containers/47284ff0fea7aabc5b82de8340c8b85592ba072cb7b6e615b9b61284eb1043b0/config.v2.json
"Env":["DISPLAY=:1",...]
# Restart the docker service and start the container; the configuration change then takes effect
systemctl start docker
docker start 47284ff0fea7
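Editing a long single-line JSON file by hand in vi is error-prone, so the same change can be made with a few lines of Python while the docker service is stopped. This is a sketch under one assumption that should be verified against your own file: that the Env list sits under the top-level "Config" key of config.v2.json. The function names are my own.

```python
import json


def set_env_var(env_list, name, value):
    """Return a copy of a NAME=value list with `name` set exactly once."""
    prefix = name + "="
    kept = [item for item in env_list if not item.startswith(prefix)]
    return kept + [f"{name}={value}"]


def add_display(config, display=":1"):
    """Set DISPLAY in the container config dictionary and return it."""
    # Assumption: Env is nested under "Config" in config.v2.json.
    container_cfg = config.setdefault("Config", {})
    container_cfg["Env"] = set_env_var(container_cfg.get("Env") or [], "DISPLAY", display)
    return config


def patch_file(path, display=":1"):
    """Load config.v2.json, set DISPLAY, and write the file back."""
    with open(path, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    add_display(config, display)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle)
```

Keep a backup copy of config.v2.json before calling patch_file on it, and use the DISPLAY value that echo $DISPLAY printed on the host.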
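Fourth, if the program uses a camera attached directly to the host, make sure host device resources such as /dev/video0 are also mapped into the container:
### Note: to use a camera attached to the host, the docker container also needs the camera device mapping, which means editing hostconfig.json: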
systemctl stop docker
vi /var/lib/docker/containers/47284ff0fea7aabc5b82de8340c8b85592ba072cb7b6e615b9b61284eb1043b0/hostconfig.json
# Under Devices, add the mapping "/dev/video0":"/dev/video0", then save hostconfig.json and exit
# If writing it as "Devices":["/dev/video0":"/dev/video0",...] does not work (it is not valid JSON), use the following form instead:
"Devices":[{"PathOnHost":"/dev/video0","PathInContainer":"/dev/video0","CgroupPermissions":"rwm"}]
# Restart the docker service and the docker container; the configuration change then takes effect
systemctl start docker
docker start 47284ff0fea7
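The device entry can be added programmatically in the same way. The sketch below builds the object form shown above (PathOnHost, PathInContainer, CgroupPermissions) and skips the entry if the host path is already mapped; the function name add_device is my own, and the code should only be run against hostconfig.json while the docker service is stopped.

```python
import json


def add_device(host_config, host_path, container_path=None, permissions="rwm"):
    """Add a device mapping to a hostconfig dictionary if it is not there yet."""
    container_path = container_path or host_path
    # "Devices" may be absent or null when no device was mapped at creation.
    devices = host_config.get("Devices") or []
    already_mapped = any(entry.get("PathOnHost") == host_path for entry in devices)
    if not already_mapped:
        devices.append(
            {
                "PathOnHost": host_path,
                "PathInContainer": container_path,
                "CgroupPermissions": permissions,
            }
        )
    host_config["Devices"] = devices
    return host_config


if __name__ == "__main__":
    # Print the fragment for /dev/video0 so it can be compared with the file.
    print(json.dumps(add_device({}, "/dev/video0")["Devices"]))
```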
Run one of the commands below again and the action_recognition_tensorrt program should now start:
###kinetics 400
python3 action_recognition_tensorrt.py --stream webcam --model resnet-101-kinetics.onnx --fp16 --frameskip 3
###kinetics 700+Moment
python3 action_recognition_tensorrt.py --stream webcam --model resnet-18-kinetcis-moments.onnx --fp16 --frameskip 3
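Before launching inside the container, the two container-side prerequisites described above (a DISPLAY value and the camera device node) can be checked quickly. This is a small sketch of my own, not part of the repository; the function name preflight and the default device path /dev/video0 are assumptions that follow the examples in this post.

```python
import os


def preflight(environ=None, device="/dev/video0"):
    """Return a list of human-readable problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    if not os.path.exists(device):
        problems.append(f"{device} is missing")
    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        for issue in issues:
            print("problem:", issue)
    else:
        print("DISPLAY and camera device look fine")
```

This does not check the xhost authorization on the host, which still has to be granted from the host's graphical session.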