TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captu

TPH-YOLOv5

Contents

      • TPH-YOLOv5
        • References
        • Introduction
        • Structure
          • CSPDarknet53
          • Transformer
          • CBAM
          • Ms-testing and model ensemble.
          • Self-trained classifier
        • Results

References

  • TPH-YOLOv5: Transformer-based improved YOLOv5 for drone object detection
  • YOLOv4: Optimal Speed and Accuracy of Object Detection
  • Supplementary notes on self-attention and Transformer
  • The CBAM attention model

Introduction

[Figure: overall architecture of TPH-YOLOv5]

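  • The overall architecture of TPH-YOLOv5 is shown above. I haven't looked closely at YOLOv5, but we can read its structure directly from the figure.

    • First, the backbone: the first 8 blocks differ little from earlier versions. The authors describe it as "CSPDarknet53 backbone with three transformer encoder blocks at the end", i.e. on top of CSPDarknet53 they add three Transformer layers after the SPP module.
    • Next, the neck: it is clearly modeled on PANet, except that it also uses CBAM and Transformer blocks.
    • Then the head: a modified TPH (transformer prediction head). According to the authors, everything from the Transformer block to the detection output is called the TPH. As I see it, they simply used a Transformer module and insisted on calling that part a TPH module, a bit like "giving Transformer a pat"? (just kidding)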

Structure

CSPDarknet53
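  • For this part, see the corresponding section of YOLOv4. Put simply, it adds one more residual edge alongside a stack of residual blocks (you can think of it as one large residual connection).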
    • YOLOv4: Optimal Speed and Accuracy of Object Detection
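The "stack of residual blocks plus one large residual edge" idea can be sketched in a few lines of plain Python. This is only a toy illustration, not the YOLOv4 implementation: features are flat lists of channel values, `block_fns` stands in for the convolutions inside each residual block, and the raw channel split is an assumption made for brevity (real implementations typically form the two branches with convolutions and add a transition layer after the concatenation).

```python
def residual_block(x, f):
    """Small residual: y = x + f(x), element-wise over channel values."""
    return [a + b for a, b in zip(x, f(x))]


def csp_stage(x, block_fns):
    """Toy cross-stage-partial stage.

    One half of the channels bypasses the whole stack (the 'large
    residual edge'); the other half runs through the residual blocks.
    The two halves are concatenated at the end.
    """
    half = len(x) // 2
    shortcut, main = x[:half], x[half:]
    for f in block_fns:
        main = residual_block(main, f)
    return shortcut + main  # list concatenation, i.e. channel concat


# Hypothetical stand-in for the conv layers inside a residual block.
halve = lambda v: [0.5 * a for a in v]
result = csp_stage([1.0, 2.0, 3.0, 4.0], [halve, halve])
```

The bypassed half reaches the output unchanged, which is what makes the outer edge behave like an oversized skip connection around the whole stack.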
Transformer
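  • For this part there is also a related write-up: my notes on Prof. Hung-yi Lee's lectures. I find his explanation very clear (the encoder part is all you need here).

    • Supplementary notes on self-attention and Transformer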
  • [Figure: Transformer encoder diagram]

  • Here we can compare the original Transformer diagram with the encoder given by the authors.

  • [Figure: Transformer encoder diagram, for comparison]

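  • You can see there are still a few small differences. For example, in the original the embedding goes straight into Multi-Head Attention, whereas the authors apply LayerNorm first (the paper doesn't go into detail; it should be a normalization similar to BN, see the figure below). The authors' version also has an extra Dropout step, which I understand as training many sub-networks to prevent overfitting while also improving detection performance.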

  • [Figure: normalization illustration]
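To make the two differences concrete, here is a minimal pure-Python sketch of LayerNorm versus BatchNorm, inverted dropout, and a "norm first" sub-layer. It is not the authors' code. The function names are made up for illustration, the learnable scale and shift of the norm layers are omitted, and placing the residual add after the dropout is an assumption that the notes above do not confirm.

```python
import math
import random


def layer_norm(token, eps=1e-5):
    """Normalize one token across its own features."""
    mean = sum(token) / len(token)
    var = sum((v - mean) ** 2 for v in token) / len(token)
    return [(v - mean) / math.sqrt(var + eps) for v in token]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature across the batch (training-mode statistics)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, means, variances)]
        for row in batch
    ]


def dropout(values, p, rng, training=True):
    """Inverted dropout: drop with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def pre_norm_sublayer(token, sublayer, p, rng, training=True):
    """x + Dropout(sublayer(LayerNorm(x))); the residual placement is assumed."""
    out = dropout(sublayer(layer_norm(token)), p, rng, training)
    return [x + o for x, o in zip(token, out)]


rng = random.Random(0)
# `sublayer` would be multi-head attention or the MLP; identity is used here.
y = pre_norm_sublayer([1.0, 2.0, 3.0], lambda v: v, p=0.1, rng=rng)
```

The only difference between the two norms is the axis the statistics are taken over: LayerNorm works within a single token, so it does not depend on batch size, while BatchNorm pools each feature over the whole batch.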

CBAM
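  • The full name is Convolutional Block Attention Module. I haven't read the paper yet (catching up on it); for now, see:

    • The CBAM attention model
  • My current understanding is that it makes the network concentrate its attention on certain channels and spatial locations in order to improve detection accuracy.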
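A rough pure-Python sketch of that "channel, then spatial" gating follows. It is a simplification rather than the reference CBAM: the weights `w1`, `w2`, `w_avg` and `w_max` are hypothetical parameters supplied by the caller, and the spatial branch mixes the pooled maps with two scalars instead of the convolution CBAM uses.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """fmap is C x H x W nested lists; returns one gate in (0, 1) per channel."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    mx = [max(ch) for ch in flat]             # global max pooling

    def mlp(vec):
        # Shared two-layer MLP with ReLU, applied to both pooled vectors.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(fmap, w_avg, w_max, bias=0.0):
    """Returns an H x W gate map built from channel-wise mean and max."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = []
    for i in range(height):
        row = []
        for j in range(width):
            vals = [fmap[c][i][j] for c in range(channels)]
            # Simplification: scalar mix instead of a conv over the 2 maps.
            row.append(sigmoid(w_avg * sum(vals) / channels + w_max * max(vals) + bias))
        gate.append(row)
    return gate


def cbam(fmap, w1, w2, w_avg=1.0, w_max=1.0):
    """Apply the channel gate first, then the spatial gate."""
    ch_gate = channel_attention(fmap, w1, w2)
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, ch_gate)]
    sp_gate = spatial_attention(scaled, w_avg, w_max)
    return [
        [[v * sp_gate[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
        for ch in scaled
    ]


fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1 = [[0.5, 0.5]]      # hidden x C
w2 = [[1.0], [-1.0]]   # C x hidden
refined = cbam(fmap, w1, w2)
```

Both gates are sigmoid outputs, so they can only rescale features between 0 and 1 times their original value; the output keeps the input's shape.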


Ms-testing and model ensemble.
  • This is simply multi-scale testing plus merging the results.
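As a rough illustration of "detect at several scales, then merge", the sketch below runs a detector at a few scales, maps the boxes back to the original image coordinates, and merges them with greedy NMS. The notes above do not say which merging method is used, so NMS is an assumption here; `detect(image, scale)` is a hypothetical callable and the scale values are arbitrary.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_thr=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples."""
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept


def multi_scale_detect(detect, image, scales=(0.5, 1.0, 2.0), iou_thr=0.5):
    """Run `detect` at each scale and merge in original-image coordinates.

    `detect(image, scale)` is assumed to resize the image by `scale` and
    return boxes in the resized image's coordinate frame.
    """
    merged = []
    for s in scales:
        for x1, y1, x2, y2, score in detect(image, s):
            merged.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    return nms(merged, iou_thr)


# Fake detector for demonstration: the same object, found at every scale.
def fake_detect(image, s):
    return [(10 * s, 10 * s, 50 * s, 50 * s, 0.5 + 0.1 * s)]


final = multi_scale_detect(fake_detect, image=None)
```

Dividing by the scale is the step that is easy to forget: without it, boxes from different scales live in different coordinate frames and cannot be compared by IoU.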
Self-trained classifier
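  • The so-called self-trained classifier means pulling the classification part out separately, training it once on the relevant data as an image classification task, and then merging it back in. Well...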
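One way to picture the "merge it back" step is sketched below. The notes above do not describe how the merge is actually done, so the rule used here (take the classifier's label only when it is more confident than the detector) is purely an assumption, and the detection format along with the `crop` and `classify` callables are hypothetical.

```python
def refine_with_classifier(detections, crop, classify):
    """Re-label detections using a separately trained patch classifier.

    detections: list of dicts with 'box', 'label', 'score' (assumed format).
    crop(box) -> image patch; classify(patch) -> (label, confidence).
    """
    refined = []
    for det in detections:
        label, conf = classify(crop(det["box"]))
        new = dict(det)  # copy so the input list is left untouched
        # Assumed merge rule: trust the classifier only if it is more confident.
        if conf > det["score"]:
            new["label"] = label
        refined.append(new)
    return refined


# Toy stand-ins: the "patch" is just the box, and the classifier is a lookup.
dets = [
    {"box": (0, 0, 10, 10), "label": "tricycle", "score": 0.4},
    {"box": (20, 20, 30, 30), "label": "car", "score": 0.95},
]
fake_crop = lambda box: box
fake_classify = lambda patch: ("awning-tricycle", 0.8)
relabeled = refine_with_classifier(dets, fake_crop, fake_classify)
```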

Results
