Yolo-v3 Slim

  • Why slim the Yolo-v3 model
  • Pruning the Yolo-v3 model

Why slim the Yolo-v3 model

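I have recently been using Yolo-v3 as the detection network for object detection research on the HiSilicon 3519A chip, but even with the NNIE hardware acceleration unit handling the matrix operations, real-time performance is hard to reach. With 80 classes to detect, Yolo-v3 needs 140 ms (the three Yolo layers take almost as long as the 53 convolutional layers). Yolo-v3-Tiny and Mobile-SSD are faster (32 ms and 18 ms respectively), but their miss rate and false-alarm rate on small objects are still high. That leaves three ways to compress Yolo-v3 and cut detection time: layer pruning, channel pruning, and knowledge distillation. The papers and code I referred to are listed below: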

Papers

  1. SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications
  2. Learning Efficient Convolutional Networks Through Network Slimming
  3. Learning Efficient Object Detection Models with Knowledge Distillation
  4. Distilling the Knowledge in a Neural Network

GitHub

  1. https://github.com/PengyiZhang/SlimYOLOv3
  2. https://github.com/tanluren/yolov3-channel-and-layer-pruning

Pruning the Yolo-v3 model

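The three most common ways to compress a model are layer pruning, channel pruning, and knowledge distillation. Layer pruning and channel pruning both rely on the gamma coefficients of the BN layers. For an introduction to knowledge distillation, see https://www.cnblogs.com/shixiangwan/p/9015010.html. This post covers the first two: compressing the model with layer pruning and channel pruning. The rough workflow is as follows. First, train Yolo-v3 on your own dataset until it reaches the target accuracy. Next, run sparsity training (the most important step). Then, for the layers to be pruned, use the corresponding BN gamma coefficients to compress them heavily, pruning the unimportant channels or layers. Finally, if accuracy drops too much after pruning, fine-tune the model to recover it.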
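The channel-selection step of this workflow can be sketched in plain Python. This is a minimal illustration in the spirit of the Network Slimming paper listed above, not the code from either repository: sparsity training adds an L1 penalty on the BN gamma values, and pruning then drops the channels whose |gamma| falls below a global threshold. The function names, the `penalty` and `prune_ratio` parameters, and the toy gamma values are all assumptions made for this example.

```python
def sparsity_subgradient(gammas, penalty):
    """Subgradient of penalty * sum(|gamma|).

    During sparsity training this term is added to the gradient of each
    BN gamma, which pushes unimportant gammas toward zero.
    """
    return [penalty * ((g > 0) - (g < 0)) for g in gammas]


def global_threshold(layers, prune_ratio):
    """Pick the |gamma| value that separates pruned from kept channels.

    `layers` maps a (hypothetical) layer name to its list of BN gammas.
    `prune_ratio` must be in [0, 1).
    """
    magnitudes = sorted(abs(g) for gammas in layers.values() for g in gammas)
    cut = int(len(magnitudes) * prune_ratio)
    return magnitudes[cut]


def channel_masks(layers, prune_ratio):
    """Return a keep/prune mask per layer (True means keep the channel)."""
    threshold = global_threshold(layers, prune_ratio)
    masks = {}
    for name, gammas in layers.items():
        keep = [abs(g) >= threshold for g in gammas]
        if not any(keep):
            # Keep the strongest channel so the layer is never emptied.
            strongest = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            keep[strongest] = True
        masks[name] = keep
    return masks


# Toy gammas, as they might look after sparsity training (made-up values).
toy_layers = {
    "conv1": [0.9, 0.01, 0.5, 0.02],
    "conv2": [0.03, 0.7, 0.04, 0.8],
}
toy_masks = channel_masks(toy_layers, prune_ratio=0.5)
print(toy_masks)
```

In a real pipeline the masks would then be used to slice the convolution weights and the BN parameters of each layer. That step depends on the network definition and is left out here.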

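I will not describe training on your own dataset to the target accuracy in detail; see the DarkNetAB GitHub repository, whose author has already documented it thoroughly. Here I only outline the problems I ran into during sparsity training and how I solved them, so there is something to refer back to if they come up again.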

1. While running the Python training script, the following error appears:


UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/work/xxx/yolov3-channel-and-layer-pruning-master/utils/datasets.py", line 416, in __getitem__
    img, labels = load_mosaic(self, index)
  File "/work/xxx/yolov3-channel-and-layer-pruning-master/utils/datasets.py", line 592, in load_mosaic
    labels4.append(labels)
UnboundLocalError: local variable 'labels' referenced before assignment

Solution: re-indent the line labels4.append(labels) with 4 spaces so that it lines up with the surrounding code; do not use two TABs.
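The snippet below sketches one way an indentation slip can produce this exact exception. It is a simplified, hypothetical stand-in for load_mosaic, not the repository's code: the function names and the `loaded` dictionary are invented for the example. When the append sits outside the branch that assigns `labels`, any sample that skips the branch hits an unassigned variable.

```python
def collect_labels_buggy(label_paths, loaded):
    """Append sits one level too far out, so it runs even without labels."""
    labels4 = []
    for path in label_paths:
        if path in loaded:
            labels = loaded[path]
        labels4.append(labels)  # wrong level: `labels` may be unassigned
    return labels4


def collect_labels_fixed(label_paths, loaded):
    """Append is indented into the branch that assigns `labels`."""
    labels4 = []
    for path in label_paths:
        if path in loaded:
            labels = loaded[path]
            labels4.append(labels)  # only runs when `labels` exists
    return labels4


def raises_unbound(func, *args):
    """Return True if calling func(*args) raises UnboundLocalError."""
    try:
        func(*args)
    except UnboundLocalError:
        return True
    return False


# Hypothetical label store: one image has labels, the other has none.
demo_loaded = {"a.txt": [[0, 0.5, 0.5, 0.1, 0.1]]}
print(raises_unbound(collect_labels_buggy, ["missing.txt"], demo_loaded))
print(collect_labels_fixed(["a.txt", "missing.txt"], demo_loaded))
```

Mixing TABs and spaces makes this kind of slip hard to see in an editor, which is presumably why consistent 4-space indentation resolves it.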

2. With the problem above fixed, the program runs normally for one epoch and then reports the following error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by 
(1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; 
(2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).(prepare_for_backward at /opt/conda/conda-bld/pytorch_1573049306803/work/torch/csrc/distributed/c10d/reducer.cpp:518

frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f2600f45687 in /home/hujun/anaconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&) + 0x7b7 (0x7f263201a667 in /home/hujun/anaconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x7cfca1 (0x7f2632008ca1 in /home/hujun/anaconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x2065e6 (0x7f2631a3f5e6 in /home/hujun/anaconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch_python.so)<omitting python frames>
frame #29: __libc_start_main + 0xf0 (0x7f2641066830 in /lib/x86_64-linux-gnu/libc.so.6)

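Solution: At first I assumed the dataset was too large and that reading the images took too long, which made the second epoch fail. After a lot of trial and error, I finally found the cause in https://github.com/ultralytics/yolov3/issues/404. During sparsity training, the size of my dataset was not an integer multiple of the number of GPUs and the batch size. For example, with GPU = 8 and BatchSize = 32 on my GPU cluster, the number of training samples should be a multiple of the least common multiple of the two. That mismatch caused the error. In addition, the author stresses that the code is meant to run with Python >= 3.7 and PyTorch >= 1.3.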
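The divisibility rule above can be checked with a few lines of Python before training starts. This is a small sketch under the assumption that trimming the training list to the nearest valid size is acceptable. The helper names and the sample count of 10000 are hypothetical and do not come from the repository.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def usable_sample_count(num_samples, num_gpus, batch_size):
    """Largest count <= num_samples that is a multiple of lcm(num_gpus, batch_size)."""
    step = lcm(num_gpus, batch_size)
    return (num_samples // step) * step


def trim_to_multiple(samples, num_gpus, batch_size):
    """Drop trailing samples so the list length satisfies the rule."""
    return samples[:usable_sample_count(len(samples), num_gpus, batch_size)]


# Values from the text: 8 GPUs, batch size 32. The sample count is made up.
image_paths = ["img_%05d.jpg" % i for i in range(10000)]
trimmed = trim_to_multiple(image_paths, num_gpus=8, batch_size=32)
print(len(image_paths), "->", len(trimmed))
```

PyTorch's DataLoader also has a drop_last option that discards the final incomplete batch. Whether that alone avoids this error in the repository has not been verified here.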