(Study Notes) [Object Detection] A Brief Summary of the YOLO Series

Table of Contents

  • Ramblings
  • I. History of YOLO
  • II. Paper-by-Paper Study
    • 1.You Only Look Once: Unified, Real-Time Object Detection
    • 2.YOLO9000
    • 3.YOLOv3: An Incremental Improvement
    • 4.YOLOv4: Optimal Speed and Accuracy of Object Detection
    • 5.TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios
    • 6.YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
    • 7.YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
      • Abstract
      • Introduction
      • Related work
      • Architecture
      • Trainable bag-of-freebies
  • Knowledge Points
    • SWINL Cascade-Mask R-CNN
    • ConvNeXt-XL Cascade-Mask R-CNN

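Ramblings

This post collects my study notes on YOLO. I am basically grinding through the papers with a weak background, so if you spot any mistakes, please point them out.

@2022/10/25 Created this post. For practical reasons, I am studying the YOLOv7 paper first.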

I. History of YOLO

2016: You Only Look Once: Unified, Real-Time Object Detection. Link: link
2017: YOLO9000. Link: link
2018: YOLOv3: An Incremental Improvement. Link: link
2020: YOLOv4: Optimal Speed and Accuracy of Object Detection. Link: link
2021: TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. Link: link
2022: YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. Link: link
2022: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Link: link. Source code: link

II. Paper-by-Paper Study

1.You Only Look Once: Unified, Real-Time Object Detection

To be added.

2.YOLO9000

To be added.

3.YOLOv3: An Incremental Improvement

To be added.

4.YOLOv4: Optimal Speed and Accuracy of Object Detection

To be added.

5.TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios

To be added.

6.YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications

To be added.

7.YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Abstract

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWINL Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy, as well as YOLOv7 outperforms: YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights.
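The percentages in the abstract can be checked with simple arithmetic. The sketch below recomputes them from the FPS and AP figures quoted above; the helper names (`relative_gain_percent`, `compare`) are made up for illustration. Note that, per the abstract, the YOLOv7-E6 speed was measured on a V100 and the two baselines on an A100, so this is only a check of the quoted numbers, not a like-for-like benchmark.

```python
def relative_gain_percent(new_value, baseline):
    """Return how much larger new_value is than baseline, in percent."""
    return (new_value - baseline) / baseline * 100.0

# FPS and AP figures exactly as quoted in the abstract.
yolov7_e6 = {"fps": 56.0, "ap": 55.9}
baselines = {
    "SWINL Cascade-Mask R-CNN": {"fps": 9.2, "ap": 53.9},
    "ConvNeXt-XL Cascade-Mask R-CNN": {"fps": 8.6, "ap": 55.2},
}

def compare(name):
    """Return (speed gain in percent, AP gain in absolute points) over a baseline."""
    base = baselines[name]
    speed_gain = relative_gain_percent(yolov7_e6["fps"], base["fps"])
    ap_gain = yolov7_e6["ap"] - base["ap"]  # absolute AP points, not relative
    return round(speed_gain), round(ap_gain, 1)

for name in baselines:
    speed_gain, ap_gain = compare(name)
    print(f"{name}: +{speed_gain}% speed, +{ap_gain} AP")
```

So "509% in speed" means (56 - 9.2) / 9.2, i.e. roughly 6.1 times the baseline FPS, while the accuracy gains (2 and 0.7) are plain differences in AP points.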

Introduction

  1. Real-time object detection is a necessary component in multi-object tracking, autonomous driving, robotics, medical image analysis, etc.
  2. Computing devices that run real-time object detection: CPU, GPU, NPU
    NPU: neural processing unit, such as the Neural Engine (Apple), the Neural Compute Stick (Intel), Jetson AI edge devices (Nvidia), the Edge TPU (Google), the Neural Processing Engine (Qualcomm), the AI Processing Unit (MediaTek), and the AI SoCs (Kneron)
    Tip: what is an edge device? link
  3. Two development directions for real-time object detection: different edge devices, and the design of efficient architectures
    Different edge devices:
    MCUNet and NanoDet focus on producing low-power single-chip devices and improving inference speed on edge CPUs.
    YOLOX and YOLOR focus on improving inference speed on various GPUs.
    The design of efficient architectures:
    CPU: mostly based on MobileNet, ShuffleNet, or GhostNet.
    GPU: mostly based on ResNet, DarkNet, or DLA, with the CSPNet strategy then used to optimize the architecture.
  4. This paper focuses on optimizing the training process (model re-parameterization and dynamic label assignment).
  5. The contributions of this paper
    (1) Design several trainable bag-of-freebies methods that greatly improve detection accuracy without increasing inference cost.
    (2) Identify two new issues: how a re-parameterized module replaces the original module, and how a dynamic label assignment strategy deals with assignment to different output layers.
    (3) Propose "extend" and "compound scaling" methods.
    (4) Reduce about 40% of the parameters and 50% of the computation of the state-of-the-art real-time object detector, with faster inference speed and higher detection accuracy.
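To make "model re-parameterization" more concrete, here is a minimal sketch of its simplest and most common form: folding a batch normalization layer into the preceding convolution for inference. This is not the paper's planned re-parameterized convolution, only an illustration of merging two modules into one equivalent module. For brevity the convolution is a 1x1 convolution evaluated at a single pixel (a dot product per output channel), and all function names and numbers are made up.

```python
import math

def linear(weights, bias, x):
    """A 1x1 convolution at one pixel: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN scale and shift into the convolution weights and bias."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)           # per-channel BN scale
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b

# Toy values (hypothetical): 3 input channels, 2 output channels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

# Training-style graph: convolution followed by a separate BN layer.
two_step = batch_norm(linear(weights, bias, x), gamma, beta, mean, var)

# Inference-style graph: a single convolution with fused parameters.
fused_w, fused_b = fuse_conv_bn(weights, bias, gamma, beta, mean, var)
fused = linear(fused_w, fused_b, x)

print(two_step)
print(fused)
```

Both paths give the same output up to floating-point error, which is the point of re-parameterization: the extra structure exists during training and costs nothing at inference time.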

Related work

  1. Real-time object detectors
    (1) State-of-the-art real-time object detectors are mainly based on YOLO and FCOS.
    (2) Becoming a state-of-the-art real-time object detector usually requires:
    a faster and stronger network architecture
    a more effective feature integration method
    a more accurate detection method
    a more robust loss function
    a more efficient label assignment method
    a more efficient training method
  2. Model scaling
    (1) Model scaling aims to achieve a good trade-off among the number of network parameters, computation, inference speed, and accuracy.
    (2) This paper proposes a new compound scaling method for concatenation-based models.
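One way to picture why concatenation-based models need compound scaling: in such a block, deepening the block changes how many feature maps are concatenated, so the block's output width changes too, and the transition layer after it has to be scaled along with it. The sketch below is a toy model under strong assumptions (every stacked layer contributes one equal-width branch to the concatenation, and the transition layer is scaled by the same ratio as the block output). It is not the paper's exact procedure, and the function name and channel counts are made up.

```python
def compound_scale(branch_channels, num_concatenated, transition_channels, depth_factor):
    """Scale depth, then scale the transition width by the resulting output change.

    Toy assumption: the block output is the concatenation of
    `num_concatenated` branches, each with `branch_channels` channels.
    """
    base_out = branch_channels * num_concatenated

    # Depth scaling changes how many branches feed the concatenation.
    scaled_num = round(num_concatenated * depth_factor)
    scaled_out = branch_channels * scaled_num

    # The transition layer is widened (or narrowed) by the same ratio.
    width_ratio = scaled_out / base_out
    scaled_transition = round(transition_channels * width_ratio)
    return scaled_num, scaled_out, scaled_transition

# Hypothetical block: 4 branches of 64 channels, followed by a 256-channel transition.
print(compound_scale(64, 4, 256, depth_factor=1.5))
```

With a depth factor of 1.5, the concatenated output grows from 256 to 384 channels, so the transition layer grows by the same ratio. Scaling depth alone would leave the transition layer mismatched with its input.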

Architecture

  1. Extended efficient layer aggregation networks

  2. Model scaling for concatenation-based models

Trainable bag-of-freebies

  1. Planned re-parameterized convolution

  2. Coarse for auxiliary and fine for lead loss

  3. Other trainable bag-of-freebies

Knowledge Points

SWINL Cascade-Mask R-CNN

ConvNeXt-XL Cascade-Mask R-CNN
