Hardcore Paper Walkthrough (1): YOLOR

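Contents

  • About the paper
  • Selected passages from the paper (original text with rough summaries)
    • Abstract
      • Original text:
      • Rough summary:
    • Part 1: Introduction
      • Original text:
      • Rough summary:
    • Part 2: Related work
      • Original text:
      • Rough summary:
    • Part 3: How implicit knowledge works?
      • Original text:
      • Rough summary:
    • Part 4: Implicit knowledge in our unified networks
      • Original text:
      • Rough summary:
    • Final part: experimental results and comparison with earlier YOLO models
  • Reproducing the paper
    • 1. Model overview
    • 2. Usage
      • 2.1. Inference
      • 2.2. Training
      • 2.3. Pretrained model path
      • 2.4. Test dataset
    • 3. Model source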


About the paper

Paper: You Only Learn One Representation: Unified Network for Multiple Tasks.
Project repository: https://github.com/WongKinYiu/yolor
For convenience, I have also uploaded a PDF of the paper to Baidu Cloud: https://pan.baidu.com/s/1qqrwk_-XuNzQ-4u3CyRAEw
Extraction code: pr18


Selected passages from the paper (original text with rough summaries)


Abstract

Original text:

People “understand” the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously will be encoded and stored in the brain. Using these abundant experience as a huge database, human beings can effectively process data, even they were unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just like the human brain can learn knowledge from normal learning as well as subconsciousness learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, and it shows great capability on catching the physical meaning of different tasks. The source code of this work is at: https://github.com/WongKinYiu/yolor

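Rough summary:

In short, the abstract says that humans learn differently from computers: people store knowledge in the brain through normal learning or subconsciously (the latter is what the authors call implicit knowledge) and draw on it to handle things they have never seen before. The authors propose a unified network, loosely modeled on how the human brain works, that encodes implicit knowledge and explicit knowledge together. Within a convolutional neural network they perform kernel space alignment, prediction refinement, and multi-task learning. They find that introducing implicit knowledge improves performance on all tasks, and that the learned implicit representation is good at capturing the physical meaning of the different tasks.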


Part 1: Introduction

Original text:

The way to construct the above unified networks is to combine compressive sensing and deep learning, and the main theoretical basis can be found in our previous work [16, 17, 18]. In [16], we prove the effectiveness of reconstructing residual error by extended dictionary. In [17, 18], we use sparse coding to reconstruct feature map of a CNN and make it more robust. The contribution of this work are summarized as follows:
1. We propose a unified network that can accomplish various tasks, it learns a general representation by integrating implicit knowledge and explicit knowledge, and one can complete various tasks through this general representation. The proposed network effectively improves the performance of the model with a very small amount of additional cost (less than one ten thousand of the amount of parameters and calculations.)
2. We introduced kernel space alignment, prediction refinement, and multi-task learning into the implicit knowledge learning process, and verified their effectiveness.
3. We respectively discussed the ways of using vector, neural network, or matrix factorization as a tool to model implicit knowledge, and at the same time verified its effectiveness.
4. We confirmed that the proposed implicit representation learned can accurately correspond to a specific physical characteristic, and we also present it in a visual way. We also confirmed that if operators that conform to the physical meaning of an objective, it can be used to integrate implicit knowledge and explicit knowledge, and it will have a multiplier effect.
5. Combined with state-of-the-art methods, our proposed unified network achieved comparable accuracy as Scaled-YOLOv4-P7 on object detection and the inference speed has been increased 88%.

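Rough summary:

Points 1 and 2 state that the unified network introduced in the abstract, which integrates implicit and explicit knowledge, improves model performance at very small extra cost (less than one ten-thousandth of the parameters and computation), and that kernel space alignment, prediction refinement, and multi-task learning were brought into the implicit knowledge learning process and verified to be effective. Point 3 says that three tools for modeling implicit knowledge (vector, neural network, and matrix factorization) were discussed and verified. Point 4 says that the learned implicit representation corresponds accurately to a specific physical characteristic. Point 5 compares the network with Scaled-YOLOv4-P7: accuracy is comparable, while inference speed increases by 88%.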


Part 2: Related work

Original text:

This literature review is mainly divided into three aspects: (1) explicit deep learning: it will cover some methods that can automatically adjust or select features based on input data, (2) implicit deep learning: it will cover the related literature of implicit deep knowledge learning and implicit differential derivative, and (3) knowledge modeling: it will list several methods that can be used to integrate implicit knowledge and explicit knowledge.
2.1. Explicit deep learning
Explicit deep learning can be carried out in the following ways. Among them, Transformer [14, 5, 20] is one way, and it mainly uses query, key, or value to obtain self-attention. Non-local networks [21, 4, 24] is another way to obtain attention, and it mainly extracts pair-wise attention in time and space. Another commonly used explicit deep learning method [7, 25] is to automatically select the appropriate kernel by input data.
2.2. Implicit deep learning
The methods that belong to the category of implicit deep learning are mainly implicit neural representations [11] and deep equilibrium models [2, 3, 19]. The former is mainly to obtain the parameterized continuous mapping representation of discrete inputs to perform different tasks, while the latter is to transform implicit learning into a residual form neural networks, and perform the equilibrium point calculation on it.
2.3. Knowledge modeling
As for the methods belonging to the category of knowledge modeling, sparse representation [1, 23] and memory networks [22, 12] are mainly included. The former uses exemplar, predefined over complete, or learned dictionary to perform modeling, while the latter relies on combining various forms of embedding to form memory, and enable memory to be dynamically added or changed.

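Rough summary:

This part reviews related work in three areas: explicit deep learning, implicit deep learning, and knowledge modeling. See the paper for the details of each; I am still studying this material myself.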


Part 3: How implicit knowledge works?

Original text:

The main purpose of this research is to conduct a unified network that can effectively train implicit knowledge, so first we will focus on how to train implicit knowledge and inference it quickly in the follow-up. Since implicit representation zi is irrelevant to observation, we can think of it as a set of constant tensor Z = {z1, z2, …, zk}. In this section we will introduce how implicit knowledge as constant tensor can be applied to various tasks.
3.1. Manifold space reduction
3.2. Kernel space alignment
3.3. More functions

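Rough summary:

The main goal of this research is to build a unified network that can effectively train implicit knowledge, so the section first focuses on how to train implicit knowledge and run inference with it quickly. Because the implicit representation zi is independent of the observation, it can be treated as a set of constant tensors Z = {z1, z2, …, zk}. The section then describes how implicit knowledge, in the form of constant tensors, can be applied to various tasks. See the paper for details.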
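
To make the idea of a constant tensor concrete, here is a minimal pure-Python sketch. It assumes the implicit vector z has one entry per channel and is combined with a feature vector either by addition or by element-wise multiplication. The function names (combine_add, combine_mul) and all numbers are made up for illustration and are not taken from the YOLOR code. The only point is that z does not depend on the input: the same z is applied to every observation, and during training it would be a learnable parameter.

```python
from typing import List

Vector = List[float]


def combine_add(features: Vector, z: Vector) -> Vector:
    """Shift each channel of the explicit features by the implicit vector z."""
    if len(features) != len(z):
        raise ValueError("features and z must have the same number of channels")
    return [f + zi for f, zi in zip(features, z)]


def combine_mul(features: Vector, z: Vector) -> Vector:
    """Scale each channel of the explicit features by the implicit vector z."""
    if len(features) != len(z):
        raise ValueError("features and z must have the same number of channels")
    return [f * zi for f, zi in zip(features, z)]


# Hypothetical implicit vectors: constants with respect to the input.
z_shift = [0.5, -1.0, 0.0]
z_scale = [1.0, 2.0, 0.5]

# Two different observations share the same implicit knowledge.
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]

shifted = [combine_add(x, z_shift) for x in (x1, x2)]
scaled = [combine_mul(x, z_scale) for x in (x1, x2)]
print(shifted)
print(scaled)
```

In a real network the features would be multi-dimensional feature maps and z would be broadcast over the spatial dimensions; flat lists are used here only to keep the sketch dependency-free.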


Part 4: Implicit knowledge in our unified networks

Original text:

In this section, we shall compare the objective function of conventional networks and the proposed unified networks, and to explain why introducing implicit knowledge is important for training a multi-purpose network. At the same time, we will also elaborate the details of the method proposed in this work.
4.1. Formulation of implicit knowledge
4.2. Modeling implicit knowledge
4.3. Training
4.4. Inference

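Rough summary:

This section compares the objective functions of conventional networks and the proposed unified network, and explains why introducing implicit knowledge matters for training a multi-purpose network. It also describes how implicit knowledge is modeled and walks through the training and inference procedures.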
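
The three modeling tools named in contribution 3 (vector, neural network, matrix factorization) can be sketched roughly as follows. The concrete forms used here, z, Wz, and Z^T c, are my reading of Section 4.2 and should be checked against the paper; every helper name and value is hypothetical and chosen only so the arithmetic is easy to follow.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def implicit_vector(z: Vector) -> Vector:
    """Option 1 (assumed form z): use the prior z directly."""
    return list(z)


def implicit_network(w: Matrix, z: Vector) -> Vector:
    """Option 2 (assumed form Wz): pass z through a single linear layer W."""
    return matvec(w, z)


def implicit_factorization(z_basis: Matrix, c: Vector) -> Vector:
    """Option 3 (assumed form Z^T c): mix the rows of Z with coefficients c."""
    return matvec(transpose(z_basis), c)


z = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]      # maps 2 dimensions to 3
z_basis = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # two basis vectors of length 3
c = [2.0, 3.0]

as_vector = implicit_vector(z)
as_network = implicit_network(w, z)
as_factorization = implicit_factorization(z_basis, c)
print(as_vector, as_network, as_factorization)
```

In all three cases the result is independent of the input image, which is what distinguishes the implicit branch from the explicit feature extractor.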


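Final part: experimental results and comparison with earlier YOLO models

See the paper for details; there are too many figures and tables to translate conveniently.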


Reproducing the paper

1. Model overview

YOLOR was proposed in 2021 by Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao in "You Only Learn One Representation: Unified Network for Multiple Tasks".

At the time of its release, YOLOR was presented as a state-of-the-art object detection model that is better and faster than YOLOv4, Scaled-YOLOv4, YOLOv5, and PP-YOLOv2.
In this tutorial I will show you how to install and run YOLOR object detection on images and video. You can run it on either a CPU or a CUDA-capable GPU (NVIDIA only). I achieved 3 FPS on CPU and 15 FPS on GPU (1080 Ti).

2. Usage

2.1. Inference

Use /root/yolor/detect.py for inference. In a terminal, from the /root/yolor directory, run:

# Example
python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0 
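
The inference command above can also be assembled from Python, which is handy when looping over several images. This is only a sketch: build_detect_command is a hypothetical helper of mine, not part of the YOLOR repository, and it uses only the flags that appear in the example command.

```python
import shlex
from typing import List


def build_detect_command(
    source: str,
    cfg: str = "cfg/yolor_p6.cfg",
    weights: str = "yolor_p6.pt",
    conf: float = 0.25,
    img_size: int = 1280,
    device: str = "0",
) -> List[str]:
    """Return the detect.py invocation as an argument list (hypothetical helper)."""
    return [
        "python", "detect.py",
        "--source", source,
        "--cfg", cfg,
        "--weights", weights,
        "--conf", str(conf),
        "--img-size", str(img_size),
        "--device", device,
    ]


cmd = build_detect_command("inference/images/horses.jpg")
print(shlex.join(cmd))

# To actually run it (assumes the repository lives in /root/yolor):
# import subprocess
# subprocess.run(cmd, cwd="/root/yolor", check=True)
```

Passing the command as a list avoids shell quoting problems when image paths contain spaces.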

For deployment instructions, click here!

2.2. Training

Prepare the COCO128 dataset:

Tip: you need to update the file paths in coco.yaml.

cd /root/yolor
python train.py --batch-size 8 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300 

Options accepted by train.py (output of python train.py --help):

optional arguments:
  -h, --help            show this help message and exit
  --weights WEIGHTS     initial weights path
  --cfg CFG             model.yaml path
  --data DATA           data.yaml path
  --hyp HYP             hyperparameters path
  --epochs EPOCHS
  --batch-size BATCH_SIZE
                        total batch size for all GPUs
  --img-size IMG_SIZE [IMG_SIZE ...]
                        [train, test] image sizes
  --rect                rectangular training
  --resume [RESUME]     resume most recent training
  --nosave              only save final checkpoint
  --notest              only test final epoch
  --noautoanchor        disable autoanchor check
  --evolve              evolve hyperparameters
  --bucket BUCKET       gsutil bucket
  --cache-images        cache images for faster training
  --image-weights       use weighted image selection for training
  --device DEVICE       cuda device, i.e. 0 or 0,1,2,3 or cpu
  --multi-scale         vary img-size +/- 50%
  --single-cls          train as single-class dataset
  --adam                use torch.optim.Adam() optimizer
  --sync-bn             use SyncBatchNorm, only available in DDP mode
  --local_rank LOCAL_RANK
                        DDP parameter, do not modify
  --log-imgs LOG_IMGS   number of images for W&B logging, max 100
  --workers WORKERS     maximum number of dataloader workers
  --project PROJECT     save to project/name
  --name NAME           save to project/name
  --exist-ok            existing project/name ok, do not increment
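
The tip about updating the paths in coco.yaml can be scripted. The sketch below rewrites top-level "key: value" lines in a YAML text without needing a YAML library. It assumes the file uses top-level train and val keys, and the sample content and the images/train2017 subfolder are my assumptions, not copied from the repository; check your own coco.yaml and dataset layout before using it.

```python
import re
from typing import Dict


def set_yaml_paths(yaml_text: str, new_paths: Dict[str, str]) -> str:
    """Replace the values of selected top-level 'key: value' lines.

    Lines whose key is not in new_paths, and any trailing comments,
    are left untouched.
    """
    lines = []
    for line in yaml_text.splitlines():
        match = re.match(r"^(\w+):\s*(\S+)(.*)$", line)
        if match and match.group(1) in new_paths:
            key = match.group(1)
            line = f"{key}: {new_paths[key]}{match.group(3)}"
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical stand-in for the contents of coco.yaml.
sample = (
    "train: ../coco/train2017.txt  # training images\n"
    "val: ../coco/val2017.txt\n"
    "nc: 80\n"
)

# Assumed dataset layout under /datasets/coco128.
updated = set_yaml_paths(
    sample,
    {
        "train": "/datasets/coco128/images/train2017",
        "val": "/datasets/coco128/images/train2017",
    },
)
print(updated)
```

To apply it to a real file, read the text with pathlib.Path.read_text, pass it through set_yaml_paths, and write the result back.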

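2.3. Pretrained model path

The yolor_p6.pt file is located under /root/yolor.

2.4. Test dataset

Dataset path: /datasets/coco128

COCO128 consists of the first 128 images of COCO 2017. COCO relied heavily on Amazon Mechanical Turk to collect its data. The COCO dataset currently has three annotation types: object instances, object keypoints, and image captions, all stored as JSON files. Each of the three types comes with a training file and a validation file, for six JSON files in total.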
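
As a quick way to get a feel for the object instances format, the sketch below counts images, annotations, and annotations per category. It relies on the top-level images, annotations, and categories keys of a COCO instances file; the tiny inline sample is hand-written for illustration and is not real COCO data, and summarize_instances is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize_instances(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Count images, annotations, and annotations per category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(
        id_to_name[a["category_id"]] for a in coco["annotations"]
    )
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "per_category": dict(per_category),
    }


# Hand-written stand-in for an object instances JSON file.
sample = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "b.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80]},
    {"id": 11, "image_id": 1, "category_id": 18, "bbox": [100, 120, 60, 40]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [30, 40, 70, 90]}
  ],
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 18, "name": "dog"}
  ]
}
""")

summary = summarize_instances(sample)
print(summary)
```

For a real file, replace the inline string with json.load on the opened annotation file.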

3. Model source

See: https://github.com/WongKinYiu/yolor