We call those plugin modules and post-processing methods that only increase the inference cost by a small amount but can significantly improve the accuracy of object detection "bag of specials". Generally speaking, these plugin modules are designed to enhance certain attributes of a model, such as enlarging the receptive field, introducing an attention mechanism, or strengthening feature integration capability, while post-processing is a method for screening model prediction results.
Common modules that can be used to enhance the receptive field are SPP [25], ASPP [5], and RFB [47]. The SPP module originated from Spatial Pyramid Matching (SPM) [39], whose original method was to split the feature map into several d×d equal blocks, where d can be {1, 2, 3, ...}, thus forming a spatial pyramid, and then extract bag-of-words features. SPP integrates SPM into CNNs and uses a max-pooling operation instead of the bag-of-words operation. Since the SPP module proposed by He et al. [25] outputs a one-dimensional feature vector, it cannot be applied in a Fully Convolutional Network (FCN). Thus, in the design of YOLOv3 [63], Redmon and Farhadi improved the SPP module to the concatenation of max-pooling outputs with kernel size k×k, where k = {1, 5, 9, 13}, and stride equal to 1. Under this design, a relatively large k×k max-pooling effectively increases the receptive field of the backbone feature. After adding the improved version of the SPP module, YOLOv3-608 upgrades AP50 by 2.7% on the MS COCO object detection task at the cost of 0.5% extra computation. The difference in operation between the ASPP [5] module and the improved SPP module is mainly that the original k×k kernel-size max-pooling with stride 1 is replaced by several 3×3 kernel-size dilated convolutions with dilation ratio equal to k and stride equal to 1. The RFB module uses several dilated convolutions of k×k kernels, with dilation ratio equal to k and stride equal to 1, to obtain a more comprehensive spatial coverage than ASPP. RFB [47] only costs 7% extra inference time to increase the AP50 of SSD on MS COCO by 5.7%.
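As a reference for the description above, the following is a minimal PyTorch-style sketch of such an improved SPP block (the class name SPPBlock and the default kernel sizes are ours for illustration; this is not the authors' reference implementation):

```python
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    """YOLOv3-style SPP: concatenate max-pooling outputs of several
    kernel sizes (stride 1, 'same' padding) with the input feature map."""

    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # k = 1 is the identity, so only the larger kernels need pooling layers.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        # Channel dimension grows by a factor of len(kernel_sizes) + 1.
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# Example: a 512-channel backbone feature map becomes 2048 channels.
feat = torch.randn(1, 512, 19, 19)
print(SPPBlock()(feat).shape)  # torch.Size([1, 2048, 19, 19])
```

Because the poolings use stride 1 with padding, the spatial resolution is preserved, which is what makes the block usable inside an FCN-style detector head.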
The attention modules that are often used in object detection are mainly divided into channel-wise attention and point-wise attention, and the representatives of these two attention models are Squeeze-and-Excitation (SE) [29] and Spatial Attention Module (SAM) [85], respectively. Although the SE module can improve the power of ResNet50 on the ImageNet image classification task by 1% top-1 accuracy at the cost of only 2% extra computational effort, on a GPU it usually increases the inference time by about 10%, so it is more appropriate for use on mobile devices. SAM, on the other hand, only needs 0.1% extra calculation to improve the top-1 accuracy of ResNet50-SE by 0.5% on the ImageNet image classification task. Best of all, it does not affect the speed of inference on the GPU at all.
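To make the point-wise re-weighting concrete, below is a minimal sketch of a spatial attention module, assuming the CBAM-style formulation of [85] (pooling along the channel axis, a 7×7 convolution, and a sigmoid gate); the class and argument names are ours for illustration:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool along the channel axis,
    predict a per-location weight map, and rescale the input."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)        # B x 1 x H x W
        max_map = x.max(dim=1, keepdim=True).values  # B x 1 x H x W
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                              # point-wise re-weighting
```

The extra cost is a single two-channel-input convolution per feature map, which is why the overhead is negligible compared with channel-wise SE.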
In terms of feature integration, the early practice was to use skip connections [51] or hyper-columns [22] to integrate low-level physical features with high-level semantic features. Since multi-scale prediction methods such as FPN have become popular, many lightweight modules that integrate different feature pyramids have been proposed. The modules of this sort include SFAM [98], ASFF [48], and BiFPN [77]. The main idea of SFAM is to use the SE module to execute channel-wise level re-weighting on multi-scale concatenated feature maps. As for ASFF, it uses softmax as point-wise level re-weighting and then adds feature maps of different scales. In BiFPN, multi-input weighted residual connections are proposed to execute scale-wise level re-weighting and then add feature maps of different scales.
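As a simplified illustration of the point-wise re-weighting idea in ASFF, the sketch below assumes the multi-scale feature maps have already been resized and projected to a common resolution and channel count (the real ASFF module also handles that alignment); all names are ours:

```python
import torch
import torch.nn as nn

class SoftmaxFusion(nn.Module):
    """ASFF-style point-wise fusion (simplified): predict one weight map per
    input scale, softmax across scales at every spatial location, then take
    the weighted sum of the (already aligned) feature maps."""

    def __init__(self, channels, num_inputs=3):
        super().__init__()
        # One 1x1 conv per input produces its spatial weight logits.
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_inputs)
        )

    def forward(self, feats):
        # feats: list of tensors, each B x C x H x W at the same resolution.
        logits = torch.cat(
            [conv(f) for conv, f in zip(self.weight_convs, feats)], dim=1
        )
        weights = torch.softmax(logits, dim=1)  # B x N x H x W, sums to 1 per pixel
        stacked = torch.stack(feats, dim=1)     # B x N x C x H x W
        return (weights.unsqueeze(2) * stacked).sum(dim=1)
```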
In the research of deep learning, some people put their focus on searching for good activation functions. A good activation function can make the gradient propagate more efficiently, while at the same time not causing too much extra computational cost. In 2010, Nair and Hinton [56] proposed ReLU to substantially solve the gradient vanishing problem that is frequently encountered with the traditional tanh and sigmoid activation functions. Subsequently, LReLU [54], PReLU [24], ReLU6 [28], Scaled Exponential Linear Unit (SELU) [35], Swish [59], hard-Swish [27], and Mish [55], etc., which are also used to solve the gradient vanishing problem, have been proposed. The main purpose of LReLU and PReLU is to solve the problem that the gradient of ReLU is zero when the output is less than zero. As for ReLU6 and hard-Swish, they are specially designed for quantization networks. For self-normalizing a neural network, the SELU activation function was proposed to satisfy this goal. One thing to be noted is that both Swish and Mish are continuously differentiable activation functions.
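For reference, the two continuously differentiable activations mentioned above can be written directly from their published definitions (Swish with beta = 1 is also known as SiLU); a short sketch:

```python
import torch
import torch.nn.functional as F

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); with beta = 1 this is also known as SiLU.
    return x * torch.sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x))).
    return x * torch.tanh(F.softplus(x))
```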
The post-processing method commonly used in deep learning-based object detection is NMS, which can be used to filter those BBoxes that badly predict the same object and retain only the candidate BBoxes with higher response. The way NMS tries to improve is consistent with the method of optimizing an objective function. The original method proposed by NMS does not consider the context information, so Girshick et al. [19] added the classification confidence score in R-CNN as a reference, and greedy NMS was performed in the order of confidence score, from high to low. As for soft NMS [1], it considers the problem that the occlusion of an object may cause the degradation of the confidence score in greedy NMS with IoU score. The way of thinking of the DIoU NMS [99] developers is to add the information of the center point distance to the BBox screening process on the basis of soft NMS. It is worth mentioning that, since none of the above post-processing methods directly refer to the captured image features, post-processing is no longer required in the subsequent development of anchor-free methods.
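A minimal sketch of the greedy NMS procedure described above, written in PyTorch for illustration only (production detectors typically rely on optimized library implementations such as torchvision.ops.nms):

```python
import torch

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard the
    remaining boxes whose IoU with it exceeds the threshold.
    boxes: N x 4 tensor in (x1, y1, x2, y2) format; scores: N tensor."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        best = order[0]
        keep.append(best.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        # Intersection of the best box with all remaining boxes.
        x1 = torch.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[best, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou <= iou_threshold]  # drop boxes that overlap the kept one too much
    return keep
```

Soft NMS decays the scores of overlapping boxes instead of discarding them, and DIoU NMS additionally factors the center-point distance into the suppression criterion.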