冯子材

YOLOv3原理及代码解析

博主完整翻译了YOLOV1和YOLOV3的论文；请移步查看：
YOLOV1：https://blog.csdn.net/taifengzikai/article/details/81988891
YOLOV3：https://blog.csdn.net/taifengzikai/article/details/100903687

YOLO v3原理及代码解析

YOLO是一种端到端的目标检测模型。YOLO算法的基本思想是：首先通过特征提取网络对输入特征提取特征，得到特定大小的特征图输出。输入图像分成13×13的grid cell，接着如果真实框中某个object的中心坐标落在某个grid cell中，那么就由该grid cell来预测该object。每个object有固定数量的bounding box，YOLO v3中有三个bounding box，使用逻辑回归确定用来预测的回归框。

一、YOLO结构

Yolo系列里，作者只在v1的论文里给出了结构图，而v2和v3的论文里没有给出结构图。但是清晰的结构图对于理解和学习Yolo十分重要。在“木盏”的CSDN博客上，找到了一份完整的Yolo v3结构图。经博主同意之后，我拿到了高清原图，如图1.1。Yolo v3整个结构，不包括池化层和全连接层。Yolo主干结构是Darknet-53网络，还有 Yolo预测支路采用的都是全卷积的结构。

图1.1 Yolo v3结构图

图1.1中的DBL是Yolo v3的基本组件。正如yolo3.model中的DarknetConv2D_BN_Leaky函数所定义的那样，Darknet的卷积层后接BatchNormalization（BN）和LeakyReLU。除最后一层卷积层外，在yolo v3中BN和LeakyReLU已经是卷积层不可分离的部分了，共同构成了最小组件。

主干网络中使用了5个resn结构。n代表数字，有res1，res2, … ,res8等等，表示这个res_block里含有n个res_unit，这是Yolo v3的大组件。从Yolo v2的darknet-19上升到Yolo v3的darknet-53，前者没有残差结构。Yolo v3开始借鉴了ResNet的残差结构，使用这种结构可以让网络结构更深。对于res_block的解释，可以在图1.1的右下角直观看到，其基本组件也是DBL。

在预测支路上有张量拼接（concat）操作。其实现方法是将darknet中间层和中间层后某一层的上采样进行拼接。值得注意的是，张量拼接和Res_unit结构的add的操作是不一样的，张量拼接会扩充张量的维度，而add只是直接相加不会导致张量维度的改变。

从代码层面来整体分析，Yolo_body一共有252层。23个Res_unit对应23个add层。BN层和LeakyReLU层数量都是72层，在网络结构中的表现为：每一层BN后面都会接一层LeakyReLU。上采样和张量拼接操作各2个，5个零填充对应5个res_block。卷积层一共有75层，其中有72层后面都会接BatchNormalization和LeakyReLU构成的DBL。三个不同尺度的输出对应三个卷积层，最后的卷积层的卷积核个数是255，是针对COCO数据集的80类：3×(80+4+1)=255，3表示一个grid cell包含3个bounding box，4表示框的4个坐标信息，1表示置信度。

二、Darknet-53特征提取网络（Backbone）

Yolo v3中使用了一个53层的卷积网络，这个网络由残差单元叠加而成。Joseph Redmon的实验表明，在分类准确度上与效率的平衡上，Darknet-53模型比ResNet-101、 ResNet-152和Darknet-19表现得更好。Yolo v3并没有那么追求速度，而是在保证实时性(fps>60)的基础上追求performance。

一方面，Darknet-53网络采用全卷积结构，Yolo v3前向传播过程中，张量的尺寸变换是通过改变卷积核的步长来实现的。卷积的步长为2，每次经过卷积之后，图像边长缩小一半。如图2.1中所示，Darknet-53中有5次卷积的步长为2。经过5次缩小，特征图缩小为原输入尺寸的1/32。所以网络输入图片的尺寸为32的倍数，取为416×416。Yolo v2中对于前向过程中张量尺寸变换，都是通过最大池化来进行，一共有5次。而v3是通过卷积核增大步长来进行，也是5次。

图2.1 Darknet-53骨干结构另一方面，Darknet-53网络引入了residual结构。Yolo v2中还是类似VGG那样直筒型的网络结构，层数太多训起来会有梯度问题，所以Darknet-19也就19层。得益于ResNet的residual结构，训练深层网络的难度大大减小。因此Darknet-53网络做到53层，精度提升比较明显。

Darknet-53网络只是特征提取层，源码中只使用了pooling层前面的卷积层来提取特征，因此multi-scale的特征融合和预测支路并没有在该网络结构中体现。

三、边界框的预测（Bounding Box Prediction）

Yolo v3关于bounding box的初始尺寸还是采用Yolo v2 v2中的k-means聚类的方式来做，这种先验知识对于bounding box的初始化帮助还是很大的，毕竟过多的bounding box虽然对于效果来说有保障，但是对于算法速度影响还是比较大的。

Yolo v2借鉴了faster R-CNN的RPN的anchor机制，不同的是，采用k-means聚类的方法来确定默认框的尺寸。Joseph Redmon修改了k-means算法中关于距离的定义，使用的是IOU距离。同样地，YOLO v3选择的默认框有9个。其尺寸可以通过k-means算法在数据集上聚类得到。在COCO数据集上，9个聚类是：（10×13）;（16×30）;（33×23）;（30×61）;（62×45）; （59×119）; （116×90）; （156×198）; （373×326）。默认框与不同尺寸特征图的对应关系是：13×13的feature map对应[（116×90），（156×198），（373×326）],26×26的feature map对应[（30×61），（62×45），（59×119）],52×52的feature map对应[（10×13），（16×30），（33×23）]。其原因是：特征图越大，感受野越小。对小目标越敏感，所以选用小的anchor box。特征图越小，感受野越大。对大目标越敏感，所以选用大的anchor box。

import numpy as np
class YOLO_Kmeans:
    def __init__(self, cluster_number, filename):
        self.cluster_number = cluster_number
        self.filename = "2012_train.txt"
    def iou(self, boxes, clusters):  # 1 box -> k clusters 
        n = boxes.shape[0]
        k = self.cluster_number
        box_area = boxes[:, 0] * boxes[:, 1]
        box_area = box_area.repeat(k)
        box_area = np.reshape(box_area, (n, k))
        cluster_area = clusters[:, 0] * clusters[:, 1]
        cluster_area = np.tile(cluster_area, [1, n])
        cluster_area = np.reshape(cluster_area, (n, k))
        box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k))
        cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k))
        min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)
        box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k))
        cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k))
        min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)
        inter_area = np.multiply(min_w_matrix, min_h_matrix)
# 计算IOU值
        result = inter_area / (box_area + cluster_area - inter_area)
        return result
    def avg_iou(self, boxes, clusters):
        accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)])
        return accuracy
def kmeans(self, boxes, k, dist=np.median):
	#聚类问题
        box_number = boxes.shape[0]
        distances = np.empty((box_number, k))
        last_nearest = np.zeros((box_number,))
        np.random.seed()
        clusters = boxes[np.random.choice(
            box_number, k, replace=False)]  # init k clusters
        while True:
#此处没有使用欧氏距离，较大的box会比较小的box产生更多的错误。自定义的距离度量公式为：
#d(box,centroid)=1-IOU(box,centroid)。到聚类中心的距离越小越好，但IOU值是越大越好，所以使用 #1 - IOU，这样就保证距离越小，IOU值越大。
            distances = 1 - self.iou(boxes, clusters)  
            current_nearest = np.argmin(distances, axis=1)
            if (last_nearest == current_nearest).all():
                break  # clusters won't change
            for cluster in range(k):
                clusters[cluster] = dist(  # update clusters
                    boxes[current_nearest == cluster], axis=0)
            last_nearest = current_nearest
        return clusters
    def result2txt(self, data):
        f = open("yolo_anchors.txt", 'w')
        row = np.shape(data)[0]
        for i in range(row):
            if i == 0:
                x_y = "%d,%d" % (data[i][0], data[i][1])
            else:
                x_y = ", %d,%d" % (data[i][0], data[i][1])
            f.write(x_y)
        f.close()
    def txt2boxes(self):
        f = open(self.filename, 'r')
        dataSet = []
        for line in f:
            infos = line.split(" ")
            length = len(infos)
            for i in range(1, length):
                width = int(infos[i].split(",")[2]) - \
                    int(infos[i].split(",")[0])
                height = int(infos[i].split(",")[3]) - \
                    int(infos[i].split(",")[1])
                dataSet.append([width, height])
        result = np.array(dataSet)
        f.close()
        return result
    def txt2clusters(self):
        all_boxes = self.txt2boxes()
        result = self.kmeans(all_boxes, k=self.cluster_number)
        result = result[np.lexsort(result.T[0, None])]
        self.result2txt(result)
        print("K anchors:\n {}".format(result))
        print("Accuracy: {:.2f}%".format(
            self.avg_iou(all_boxes, result) * 100))
if __name__ == "__main__":
    cluster_number = 9
    filename = "2012_train.txt"
    kmeans = YOLO_Kmeans(cluster_number, filename)
    kmeans.txt2clusters()

Yolo v3采用直接预测相对位置的方法。预测出b-box中心点相对于网格单元左上角的相对坐标。直接预测出（tx，ty，tw，th，t0），然后通过以下坐标偏移公式计算得到b-box的位置大小和confidence。

tx、ty、tw、th就是模型的预测输出。cx和cy表示grid cell的坐标，比如某层的feature map大小是13×13，那么grid cell就有13×13个，第0行第1列的grid cell的坐标cx就是0，cy就是1。pw和ph表示预测前bounding box的size。bx、by、bw和bh就是预测得到的bounding box的中心的坐标和size。在训练这几个坐标值的时候采用了sum of squared error loss（平方和距离误差损失），因为这种方式的误差可以很快的计算出来。

Yolo v3使用逻辑回归预测每个边界框的分数。如果边界框与真实框的重叠度比之前的任何其他边界框都要好，则该值应该为1。如果边界框不是最好的，但确实与真实对象的重叠超过某个阈值(Yolo v3中这里设定的阈值是0.5)，那么就忽略这次预测。Yolo v3只为每个真实对象分配一个边界框，如果边界框与真实对象不吻合，则不会产生坐标或类别预测损失，只会产生物体预测损失。

四、类别预测

类别预测方面主要是将原来的单标签分类改进为多标签分类，因此网络结构上就将原来用于单标签多分类的softmax层换成用于多标签多分类的Logistic分类器。Yolo v2网络中的Softmax分类器，认为一个目标只属于一个类别，通过输出Score大小，使得每个框分配到Score最大的一个类别。但在一些复杂场景下，一个目标可能属于多个类（有重叠的类别标签），因此Yolo v3用多个独立的Logistic分类器替代Softmax层解决多标签分类问题，且准确率不会下降。举例说明，原来分类网络中的softmax层都是假设一张图像或一个object只属于一个类别，但是在一些复杂场景下，一个object可能属于多个类，比如你的类别中有woman和person这两个类，那么如果一张图像中有一个woman，那么你检测的结果中类别标签就要同时有woman和person两个类，这就是多标签分类，需要用Logistic分类器来对每个类别做二分类。Logistic分类器主要用到sigmoid函数，该函数可以将输入约束在0到1的范围内，因此当一张图像经过特征提取后的某一类输出经过sigmoid函数约束后如果大于0.5，就表示该边界框负责的目标属于该类。

五、多尺度预测

Yolo v3采用多个scale融合的方式做预测。原来的Yolo v2有一个层叫：passthrough layer，该层作用是为了加强Yolo算法对小目标检测的精确度。这个思想在Yolo v3中得到了进一步加强，在Yolo v3中采用类似FPN(feature pyramid networks)的upsample和融合做法（最后融合了3个scale，其他两个scale的大小分别是26×26和52×52），在多个scale的feature map上做检测，越精细的grid cell就可以检测出越精细的物体。对于小目标的检测效果提升明显。

在结构图1.1中可以看出，Yolo v3设定的是每个网格单元预测3个box，所以每个box需要有(x, y, w, h, confidence)五个基本参数。Yolo v3输出了3个不同尺度的feature map，如图1.1所示的y1, y2, y3。y1,y2和y3的深度都是255，边长的规律是13:26:52。

每个预测任务得到的特征大小都为N ×N ×[3∗(4+1+80)] ，N为格子大小，3为每个格子得到的边界框数量， 4是边界框坐标数量，1是目标预测值，80是类别数量。对于COCO类别而言，有80个类别的概率，所以每个box应该对每个种类都输出一个概率。所以3×(5 + 80) = 255。这个255就是这么来的。

Yolo v3用上采样的方法来实现这种多尺度的feature map。在Darknet-53得到的特征图的基础上，经过六个DBL结构和最后一层卷积层得到第一个特征图谱，在这个特征图谱上做第一次预测。Y1支路上，从后向前的倒数第3个卷积层的输出，经过一个DBL结构和一次（2,2）上采样，将上采样特征与第2个Res8结构输出的卷积特征张量连接，经过六个DBL结构和最后一层卷积层得到第二个特征图谱，在这个特征图谱上做第二次预测。Y2支路上，从后向前倒数第3个卷积层的输出，经过一个DBL结构和一次（2,2）上采样，将上采样特征与第1个Res8结构输出的卷积特征张量连接，经过六个DBL结构和最后一层卷积层得到第三个特征图谱，在这个特征图谱上做第三次预测。

就整个网络而言，Yolo v3多尺度预测输出的feature map尺寸为y1：（13×13），y2：（26×26），y3：（52×52）。网络接收一张（416×416）的图，经过5个步长为2的卷积来进行降采样（416 / 2ˆ5 = 13，y1输出（13×13）。从y1的倒数第二层的卷积层上采样(x2，up sampling)再与最后一个26×26大小的特征图张量连接，y2输出（26×26）。从y2的倒数第二层的卷积层上采样(x2，up sampling)再与最后一个52×52大小的特征图张量连接，y3输出（52×52）

六、Loss Function

对掌握Yolo来讲，loss function不可谓不重要。在Yolo v3的论文里没有明确提出所用的损失函数，确切地说，Yolo系列论文里面只有Yolo v1明确提了损失函数的公式。在Yolo v1中使用了一种叫sum-square error的损失计算方法，只是简单的差方相加。我们知道，在目标检测任务里，有几个关键信息是需要确定的:(x,y),(w,h),class,confidence 。根据关键信息的特点可以分为上述四类，损失函数应该由各自特点确定。最后加到一起就可以组成最终的loss function了，也就是一个loss function搞定端到端的训练。

图6.1 损失函数可以从代码分析出v3的损失函数，同样也是对以上四类，不过相比于v1中简单的总方误差，还是有一些调整的。keras框架描述的Yolo v3 的loss function代码，在附录yolo3.model。忽略恒定系数不看，可以从上述代码看出：除了w, h的损失函数依然采用总方误差之外，其他部分的损失函数用的是二值交叉熵。最后加到一起。

xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[...,0:2], from_logits=True)
wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh-raw_pred[...,2:4])
# 置信度
confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
# 分类
class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[...,5:], from_logits=True)
xy_loss = K.sum(xy_loss) / mf
wh_loss = K.sum(wh_loss) / mf
confidence_loss = K.sum(confidence_loss) / mf
class_loss = K.sum(class_loss) / mf
loss += xy_loss + wh_loss + confidence_loss + class_loss

七、实验
本文使用了qwe的keras版本的yolo v3代码，代码相对来说比较容易理解，复现比较容易。目录结构如图7.1所示

图7.1 目录结构实验环境如下：

Python3.6	Cuda 9.0	CuDnn 7.0.3
Opencv 4.0.0	Tensorflow-gpu 1.8.0	Keras 2.2.4

整个仿真过程流程如图7.2：

图7.2 仿真流程

7.1使用官方模型进行检测

Yolo v3的作者训练的网络基于coco数据集。下载作者的权值文件，yolov3.weights。经convert.py转换为keras的网络结构和权值文件。执行以下命令，完成模型的转换。

python convert.py -w yolov3.cfg yolov3.weights model_data/yolo.h5

转换过程如图7.3。需要注意的是然后使用yolo_video.py检测图像或视频中的目标。yolov3.cfg是模型控制文件，yolov3.weights是模型权重文件，model_data/yolo.h5是输出的keras权重文件。

图7.3 转换权重文件列出yolov3.cfg中的部分参数进行注释，

[net]
# Testing            ### 测试模式                                          
batch=1
subdivisions=1
# Training           ### 训练模式，每次前向的图片数目 = batch/subdivisions 
# batch=64
# subdivisions=16
width=416            ### 网络的输入宽、高、通道数                          
height=416
channels=3
momentum=0.9         ### 动量                                              
decay=0.0005         ### 权重衰减                                          
angle=0
saturation = 1.5     ### 饱和度                                            
exposure = 1.5       ### 曝光度                                            
hue=.1               ### 色调                                              

learning_rate=0.001  ### 学习率                                            
burn_in=1000         ### 学习率控制的参数
max_batches = 50200  ### 迭代次数                                          
policy=steps         ### 学习率策略                                       
steps=40000,45000    ### 学习率变动步长                                   
scales=.1,.1         ### 学习率变动因子                                   

[convolutional]
batch_normalize=1    ### BN
filters=32           ### 卷积核数目
size=3               ### 卷积核尺寸
stride=1             ### 卷积核步长
pad=1                ### pad
activation=leaky     ### 激活函数
	……
	[convolutional]
size=1
stride=1
pad=1
filters=255              ### 3x(classes + 4coor + 1prob) = 3x(20+4+1) = 75              
activation=linear

[yolo]
mask = 0,1,2            ### mask序号                        
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326  
classes=80              ### 类比数目                        
num=9
jitter=.3               ### 数据扩充的抖动操作              
ignore_thresh = .5      ### 文章中的阈值1                   
truth_thresh = 1        ### 文章中的阈值2                   
random=1                ### 多尺度训练开关

检测使用的脚本的分析放在使用自己的数据集训练并检测的部分。这里先给出使用官方权重文件的检测结果。使用python yolo_video.py –image命令，输入类别为狗，鸟，人的图片各一张，得到图片的检测结果，图7.4：

图7.4 使用官方权重文件的图片检测结果使用以下命令，得到视频的检测结果，图7.5：

python yolo_video.py [video_path] [output_path (optional)]

图7.5 使用官方权重文件的视频检测结果 ### 7.2 自己训练数据集并检测用于Keras-yolo训练的数据集格式为VOC格式。本文使用的数据集是由监控摄像头拍摄得到的视频随机截取得到的帧图像。训练集包含 people（人），front（车前），side（车侧身）和back（车尾）四个类别。首先，构建图7.6所示的数据集目录结构。将数据集图片都复制到JPEGImages目录下。使用Labelimg工具，人工标注训练集，把标注工具输出的文件复制到Annotations目录下。

图7.6 数据集目录结构在VOC2007下新建名为test.py的python文件。该python文件，读入'Annotations'目录下的xml文件，将90%的图片划分为训练集，10%的图片用作训练-验证集。训练-验证集的10%用作验证集。

import os
import random
trainval_percent = 0.1 #验证集比例
train_percent = 0.9 #训练集比例
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets\Main'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
list = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(list, tv)
train = random.sample(trainval, tr)
ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')
for i in list:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)  #10%的图片用作训练-验证集
        if i in train:
            ftest.write(name)  #训练-验证集的90%用作测试集
        else:
            fval.write(name)  #训练-验证集的10%用作验证集
    else:
        ftrain.write(name)   #90%的图片作为训练集
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()

最后将划分好的图片名称分别保存到’ImageSets\Main’目录下，trainval.txt，test.txt,train.txt和val.txt四个文件中。
该工程中使用的数据格式是: image_file_path box1 box2 … boxN; 边界框格式是: x_min,y_min,x_max,y_max,class_id (no space)。对于VOC数据集，需要使用voc_annotation.py脚本进行转换。在主目录下生成test.txt,train.txt和val.txt，包含上一步生成的训练集、验证集和测试集的图片的路径和（x，y，w，h，class）真实值信息。

import xml.etree.ElementTree as ET
from os import getcwd
sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
classes = ["people","front","side","back"] #数据集中所标记的四个类别
def convert_annotation(year, image_id, list_file):
    in_file =open('/home/fengzicai/Documents/keras-yolo3/VOC%s/Annotations/%s.xml'%(year, image_id))
    tree=ET.parse(in_file)
    root = tree.getroot()
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), int(xmlbox.find('ymax').text))
        list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))
wd = getcwd()
for year, image_set in sets:
    image_ids = open('/home/fengzicai/Documents/keras-yolo3/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(wd, year, image_id))
        convert_annotation(year, image_id, list_file)
        list_file.write('\n')
    list_file.close()
至此，VOC格式的数据集就准备好了。然后将四类标签名写入model_data/coco_classes.txt和model/voc_classes.txt中。model_data/ yolo_anchors.txt填写通过K聚类得到的9个anchor。

下一步开始准备训练。训练过程函数调用关系如图7.7。

图7.7 函数调用关系训练脚本train.py：

"""
Retrain the YOLO model for your own dataset.
"""
import numpy as np
import keras.backend as K
from keras.layers import Input, Lambda
from keras.models import Model
from keras.callbacks import TensorBoard, ModelCheckpoint, EarlyStopping

from yolo3.model import preprocess_true_boxes, yolo_body, tiny_yolo_body, yolo_loss
from yolo3.utils import get_random_data


def _main():
    annotation_path = 'train.txt'
    log_dir = 'logs/000/'  #保存权重文件的路径
    classes_path = 'model_data/voc_classes.txt'  #保存分类信息文件的路径
    anchors_path = 'model_data/yolo_anchors.txt'  #保存默认框信息的路径
    class_names = get_classes(classes_path)
    anchors = get_anchors(anchors_path)
    input_shape = (416,416) # multiple of 32, hw
    model = create_model(input_shape, anchors, len(class_names) )
    train(model, annotation_path, input_shape, anchors, len(class_names), log_dir=log_dir)

def train(model, annotation_path, input_shape, anchors, num_classes, log_dir='logs/'):
    model.compile(optimizer='adam', loss={
        'yolo_loss': lambda y_true, y_pred: y_pred})
    logging = TensorBoard(log_dir=log_dir)
    checkpoint = ModelCheckpoint(log_dir + "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5",
        monitor='val_loss', save_weights_only=True, save_best_only=True, period=1)
    batch_size = 10
    val_split = 0.1
    with open(annotation_path) as f:
        lines = f.readlines()
    np.random.shuffle(lines)
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))

    model.fit_generator(data_generator_wrap(lines[:num_train], batch_size, input_shape, anchors, num_classes),
            steps_per_epoch=max(1, num_train//batch_size),
            validation_data=data_generator_wrap(lines[num_train:], batch_size, input_shape, anchors, num_classes),
            validation_steps=max(1, num_val//batch_size),
            epochs=500,
            initial_epoch=0)
    model.save_weights(log_dir + 'trained_weights.h5')

def get_classes(classes_path):
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
return np.array(anchors).reshape(-1, 2)
	#该函数用于创建模型
def create_model(input_shape, anchors, num_classes, load_pretrained=False, freeze_body=False,
            weights_path='model_data/yolo_weights.h5'):
    K.clear_session() # get a new session
    image_input = Input(shape=(None, None, 3))
    h, w = input_shape
    num_anchors = len(anchors)
    y_true = [Input(shape=(h//{0:32, 1:16, 2:8}[l], w//{0:32, 1:16, 2:8}[l], \
        num_anchors//3, num_classes+5)) for l in range(3)]
#预测每个尺度的3个框，所以对于4个边界框偏移量，1个目标性预测和4个类别预测，张量为#N×N×[3 *（4 + 1 + 4）]，默认参数下：y_true[l]的shape为（batch,H,W,3,num_classes+5)
model_body = yolo_body(image_input, num_anchors//3, num_classes)
	# yolo_body()函数从yolo3.model中引入
    print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))

    if load_pretrained:
        model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
        print('Load weights {}.'.format(weights_path))
        if freeze_body:
            # Do not freeze 3 output layers.
            num = len(model_body.layers)-7
            for i in range(num): model_body.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))
#生成模型损失
    model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
        arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
        [*model_body.output, *y_true])
    model = Model([model_body.input, *y_true], model_loss)
return model
	#通过train.py(data_generator)生成数据
def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    np.random.shuffle(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            i %= n
            image, box = get_random_data(annotation_lines[i], input_shape, random=True)
            image_data.append(image)
            box_data.append(box)
            i += 1
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)

def data_generator_wrap(annotation_lines, batch_size, input_shape, anchors, num_classes):
    n = len(annotation_lines)
    if n==0 or batch_size<=0: return None
    return data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)

if __name__ == '__main__':
_main()

使用自己的数据集进行检测过程与使用官方权重文件进行检测过程相同。使用python yolo_img.py –image执行检测脚本。检测脚本yolo_img.py：

import sys
import argparse
from yolo import YOLO, detect_video
from PIL import Image
import os
import glob
def detect_img(yolo):
    path = "/home/fengzicai/Documents/keras-yolo3/VOC2007/JPEGImages/*.jpg" #要读入的图片路径
    outdir = "/home/fengzicai/Documents/keras-yolo3/VOC2007/SegmentationClass" #将检测的结果全保#存到outdir路径
    for jpgfile in glob.glob(path):
        img = Image.open(jpgfile)
        img = yolo.detect_image(img)  #调用yolo类中的detect_image函数，对图片进行检测，见#yolo脚本
        img.save(os.path.join(outdir, os.path.basename(jpgfile)))
    yolo.close_session()
FLAGS = None
if __name__ == '__main__':
    # class YOLO defines the default value, so suppress any default here
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    '''
    Command line options
    '''
    parser.add_argument(
        '--model', type=str,
        help='path to model weight file, default ' + YOLO.get_defaults("model_path")
    )
    parser.add_argument(
        '--anchors', type=str,
        help='path to anchor definitions, default ' + YOLO.get_defaults("anchors_path")
    )

    parser.add_argument(
        '--classes', type=str,
        help='path to class definitions, default ' + YOLO.get_defaults("classes_path")
    )
    parser.add_argument(
        '--gpu_num', type=int,
        help='Number of GPU to use, default ' + str(YOLO.get_defaults("gpu_num"))
    )
    parser.add_argument(
        '--image', default=False, action="store_true",
        help='Image detection mode, will ignore all positional arguments'
    )
    '''
    Command line positional arguments -- for video detection mode
    '''
    parser.add_argument(
        "--input", nargs='?', type=str,required=False,default='./path2your_video',
        help = "Video input path"
    )
    parser.add_argument(
        "--output", nargs='?', type=str, default="",
        help = "[Optional] Video output path"
    )
    FLAGS = parser.parse_args()
    if FLAGS.image:
        """
        Image detection mode, disregard any remaining command line arguments
        """
        print("Image detection mode")
        if "input" in FLAGS:
            print(" Ignoring remaining command line arguments: " + FLAGS.input + "," + FLAGS.output)
        detect_img(YOLO(**vars(FLAGS)))
    elif "input" in FLAGS:
        detect_video(YOLO(**vars(FLAGS)), FLAGS.input, FLAGS.output)
    else:
        print("Must specify at least video_input_path.  See usage with --help.")

Yolo_img.py在执行时，导入了yolo.py脚本，包含图像和视频中YOLO v3模型检测的类定义。

"""
Class definition of YOLO_v3 style detection model on image and video
"""
import colorsys
import os
from timeit import default_timer as timer

import numpy as np
from keras import backend as K
from keras.models import load_model
from keras.layers import Input
from PIL import Image, ImageFont, ImageDraw

from yolo3.model import yolo_eval, yolo_body, tiny_yolo_body
from yolo3.utils import letterbox_image
import os
from keras.utils import multi_gpu_model
#YOLO类的初始化参数
class YOLO(object):
    _defaults = {
        #"model_path": 'model_data/yolo.h5',
        "model_path": 'logs/001/trained_weights.h5', #训练好的模型
        "anchors_path": 'model_data/yolo_anchors.txt', #有9个anchor box，从小到大排列
        "classes_path": 'model_data/coco_classes.txt', #类别数目
        "score" : 0.3, #score阈值
        "iou" : 0.45, #iou 阈值
        "model_image_size" : (416, 416), #输入图像尺寸
        "gpu_num" : 1, #gpu数量
    }

    @classmethod
    def get_defaults(cls, n):
        if n in cls._defaults:
            return cls._defaults[n]
        else:
            return "Unrecognized attribute name '" + n + "'"

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults) # set up default values
        self.__dict__.update(kwargs) # and update with user overrides
        self.class_names = self._get_class()
        self.anchors = self._get_anchors()
        self.sess = K.get_session()
        self.boxes, self.scores, self.classes = self.generate()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return np.array(anchors).reshape(-1, 2)

    def generate(self): #yolo_img.py中调用了该函数
        model_path = os.path.expanduser(self.model_path)  #获取model路径
        assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.' 
		#判断model是否以h5结尾
        # Load model, or construct model and load weights.
        num_anchors = len(self.anchors) #num_anchors = 9。yolov3有9个先验框
        num_classes = len(self.class_names) #num_cliasses = 4。一共有四个类别
        is_tiny_version = num_anchors==6 # default setting
        try:
            self.yolo_model = load_model(model_path, compile=False) #下载model
        except:
            self.yolo_model=tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, 
num_classes) \
            if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
            self.yolo_model.load_weights(self.model_path) # 确保model和anchor classes 对应
        else:
            assert self.yolo_model.layers[-1].output_shape[-1] == \
	# model.layer[-1]:网络最后一层输出。 output_shape[-1]:输出维度的最后一维。 -> (?,13,13,27)
                num_anchors/len(self.yolo_model.output) * (num_classes + 5), \
#27 = 9/3*(4+5). 9/3:每层网格对应3个anchor box 4：4个类别 5:4+1,框的4个值+1个置信度
                'Mismatch between model and given anchor and class sizes'

        print('{} model, anchors, and classes loaded.'.format(model_path))

        # 生成绘制边框的颜色
        hsv_tuples = [(x / len(self.class_names), 1., 1.)
	#h(色调）：x/len(self.class_names)  s(饱和度）：1.0  v(明亮）：1.0
                      for x in range(len(self.class_names))]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))  #hsv转换为rgb
        self.colors = list(
            map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
                self.colors))
	#hsv取值范围在[0,1]，而RBG取值范围在[0,255]，所以乘上255
        np.random.seed(10101)  # np.random.seed():产生随机种子。固定种子为一致的颜色
        np.random.shuffle(self.colors)  # 调整颜色来装饰相邻的类。
        np.random.seed(None)  # 重置种子为默认

        #为过滤的边界框生成输出张量目标。
        self.input_image_shape = K.placeholder(shape=(2, )) #K.placeholder:keras中的占位符
        if self.gpu_num>=2:
            self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num)
        boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                len(self.class_names), self.input_image_shape,
                score_threshold=self.score, iou_threshold=self.iou) #yolo_eval():yolo评估函数
        return boxes, scores, classes

    def detect_image(self, image): # yolo_img.py中调用了该函数
        start = timer()
        if self.model_image_size != (None, None): #判断图片是否存在
            assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
	#assert断言语句的语法格式 model_image_size[0][1]指图像的w和h，且必须是32的整数倍
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
	# letterbox_image()定义见附录中的yolo3.utils。输入参数（图像 ,(w=416,h=416)),
#输出一张使用填充来调整图像的纵横比不变的新图。
        else:
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            boxed_image = letterbox_image(image, new_image_size)
        image_data = np.array(boxed_image, dtype='float32')

        print(image_data.shape) #(416,416,3)
        image_data /= 255. #归一化
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.
	#添加批量维度为 (1,416,416,3)，使输入网络的张量满足(bitch, w, h, c)的格式
        out_boxes, out_scores, out_classes = self.sess.run(
            [self.boxes, self.scores, self.classes],
	#目的为了求boxes,scores,classes，具体计算方式定义在generate（）函数内。在yolo.py中
            feed_dict={ 
                self.yolo_model.input: image_data,  #图像数据
                self.input_image_shape: [image.size[1], image.size[0]], #图像尺寸
                K.learning_phase(): 0 #学习模式 0：测试模型。1：训练模式
            })
        print('Found {} boxes for {}'.format(len(out_boxes), 'img'))
	#绘制边框，自动设置边框宽度，绘制边框和类别文字，使用pillow绘图库。
        font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                    size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) #设置字体
        thickness = (image.size[0] + image.size[1]) // 300 #设置厚度

        for i, c in reversed(list(enumerate(out_classes))):
            predicted_class = self.class_names[c] #类别
            box = out_boxes[i] #框
            score = out_scores[i] #置信度

            label = '{} {:.2f}'.format(predicted_class, score) #标签
            draw = ImageDraw.Draw(image) #画图
            label_size = draw.textsize(label, font) #标签文字

            top, left, bottom, right = box
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom)) #边框

            if top - label_size[1] >= 0: #标签文字
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            # My kingdom for a good redistributable image drawing library.
            for i in range(thickness): #画边框
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[c])
            draw.rectangle( #文字背景
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw

        end = timer()
        print(end - start)
        return image

    def close_session(self):
        self.sess.close()

def detect_video(yolo, video_path, output_path=""):
    import cv2
    vid = cv2.VideoCapture(video_path)
    if not vid.isOpened():
        raise IOError("Couldn't open webcam or video")
    video_FourCC    = int(vid.get(cv2.CAP_PROP_FOURCC))
    video_fps       = vid.get(cv2.CAP_PROP_FPS)
    video_size      = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)),
                        int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    isOutput = True if output_path != "" else False
    if isOutput:
        print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size))
        out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size)
    accum_time = 0
    curr_fps = 0
    fps = "FPS: ??"
    prev_time = timer()
    while True:
        return_value, frame = vid.read()
        #frame_array = np.asarray(frame)
        image = Image.fromarray(frame)
        image = yolo.detect_image(image)
        result = np.asarray(image)
        curr_time = timer()
        exec_time = curr_time - prev_time
        prev_time = curr_time
        accum_time = accum_time + exec_time
        curr_fps = curr_fps + 1
        if accum_time > 1:
            accum_time = accum_time - 1
            fps = "FPS: " + str(curr_fps)
            curr_fps = 0
        cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=0.50, color=(255, 0, 0), thickness=2)
        cv2.namedWindow("result", cv2.WINDOW_NORMAL)
        cv2.imshow("result", result)
        if isOutput:
            out.write(result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
yolo.close_session()

检测图片时，命令行的执行过程如图7.8。函数的调用的关系如图7.9。图片检测结果如图7.10。

图7.8 函数调用关系

图7.9 命令行执行过程

图7.10（1）图片检测结果

图7.10（2）图片检测结果

图7.10（3）图片检测结果 ## 参考文献

[1] https://www.jianshu.com/p/3943be47fe84

[2] https://blog.csdn.net/leviopku/article/details/82660381

[3] https://github.com/qqwweee/keras-yolo3

[4] https://blog.csdn.net/Patrick_Lxc/article/details/80615433

[5] https://blog.csdn.net/lilai619/article/details/79695109

[6] https://blog.csdn.net/sum_nap/article/details/80568873#comments

[7] https://blog.csdn.net/u014380165/article/details/80202337

[8] https://blog.csdn.net/KKKSQJ/article/details/83587138

[9] https://blog.csdn.net/yangchengtest/article/details/80664415

[10] https://blog.csdn.net/Gentleman_Qin/article/details/84350496

附录A

训练和检测都导入了yolo3.model：
"""YOLO_v3 Model Defined in Keras."""

from functools import wraps

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2

from yolo3.utils import compose

# DarknetConv2D()，DarknetConv2D_BN_Leaky()，resblock_body()三个函数构成了darknet_body()卷积层框#架
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    """Wrapper to set Darknet parameters for Convolution2D."""
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)
#注意 DARKNET卷积这里激活函数是LEAKYRELU
def DarknetConv2D_BN_Leaky(*args, **kwargs):
    """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1))

def resblock_body(x, num_filters, num_blocks):
    '''A series of resblocks starting with a downsampling Convolution2D'''
# Darknet uses left and top padding instead of 'same' mode
	# Darknet使用向左和向上填充代替same模式。
	#DARKNET每块之间，使用了，（1，0，1，0）的PADDING层。
    x = ZeroPadding2D(((1,0),(1,0)))(x)
    x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x)
    for i in range(num_blocks):
        y = compose(
                DarknetConv2D_BN_Leaky(num_filters//2, (1,1)),
                DarknetConv2D_BN_Leaky(num_filters, (3,3)))(x)
        x = Add()([x,y])
return x
	#创建darknet网络结构，有52层卷积层。包含五个resblock
def darknet_body(x):
    '''Darknent body having 52 Convolution2D layers'''
    x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
    x = resblock_body(x, 64, 1)
    x = resblock_body(x, 128, 2)
    x = resblock_body(x, 256, 8)
    x = resblock_body(x, 512, 8)
    x = resblock_body(x, 1024, 4)
    return x
#Convs由make_last_layers函数来实现。
def make_last_layers(x, num_filters, out_filters):
    '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
    x = compose(
            DarknetConv2D_BN_Leaky(num_filters, (1,1)),
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D_BN_Leaky(num_filters, (1,1)),
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D_BN_Leaky(num_filters, (1,1)))(x)
    y = compose(
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D(out_filters, (1,1)))(x)
    return x, y


def yolo_body(inputs, num_anchors, num_classes):
    """Create YOLO_V3 model CNN body in Keras."""
darknet = Model(inputs, darknet_body(inputs)) # darknet_body(inputs)创建一个darknet网络
	#以下语句是特征金字塔（FPN）的具体实现。 
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))
	#compose函数，从左向右评估函数
    x = compose(
            DarknetConv2D_BN_Leaky(256, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[152].output])
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))

    return Model(inputs, [y1,y2,y3])

def tiny_yolo_body(inputs, num_anchors, num_classes):
    '''Create Tiny YOLO_v3 model CNN body in keras.'''
    x1 = compose(
            DarknetConv2D_BN_Leaky(16, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(32, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(64, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(128, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(256, (3,3)))(inputs)
    x2 = compose(
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(512, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(1,1), padding='same'),
            DarknetConv2D_BN_Leaky(1024, (3,3)),
            DarknetConv2D_BN_Leaky(256, (1,1)))(x1)
    y1 = compose(
            DarknetConv2D_BN_Leaky(512, (3,3)),
            DarknetConv2D(num_anchors*(num_classes+5), (1,1)))(x2)

    x2 = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x2)
    y2 = compose(
            Concatenate(),
            DarknetConv2D_BN_Leaky(256, (3,3)),
            DarknetConv2D(num_anchors*(num_classes+5), (1,1)))([x2,x1])

    return Model(inputs, [y1,y2])


def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)  #num_anchors = 3
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])  #reshape ->(1,1,1,3,2)
grid_shape = K.shape(feats)[1:3]   # height, width   (?,13,13,27) -> (13,13)
#grid_y和grid_x用于生成网格grid，通过arange、reshape、tile的组合， 创建y轴的0~12的组合#grid_y，再创建x轴的0~12的组合grid_x，将两者拼接concatenate，就是grid；
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))  #K.cast():把grid中值的类型变为和feats中值的类型一样

    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
#将feats的最后一维展开，将anchors与其他数据（类别数+4个框值+框置信度）分离

# Adjust preditions to each spatial grid point and anchor size.
	#xywh的计算公式，见边界框回归公式。
#tx、ty、tw和th是feats值，而bx、by、bw和bh是输出值
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))  #sigmoid:σ
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
box_class_probs = K.sigmoid(feats[..., 5:])
	# ...操作符，在Python中，“...”(ellipsis)操作符，表示其他维度不变，只操作最前或最后1维；
    if calc_loss == True:
        return grid, feats, box_xy, box_wh
# 将box_xy，box_xy 从OUTPUT的预测数据转为真实坐标。
    return box_xy, box_wh, box_confidence, box_class_probs


def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):  #得到正确的x,y,w,h
    '''Get corrected boxes'''
    box_yx = box_xy[..., ::-1]  #“::-1”是颠倒数组的值
    box_hw = box_wh[..., ::-1]   
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes


def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
# feats:输出的shape，->(?,13,13,27); anchors:每层对应的3个anchor box
# num_classes: 类别数（4）; input_shape:（416,416）; image_shape:图像尺寸
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
#yolo_head():box_xy是box的中心坐标，(0~1)相对位置；box_wh是box的宽高，(0~1)相对值；
#box_confidence是框中物体置信度；box_class_probs是类别置信度；
boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
#将box_xy和box_wh的(0~1)相对值，转换为真实坐标，输出boxes是(y_min,x_min,y_max,x_max)的#值
boxes = K.reshape(boxes, [-1, 4])
	#reshape,将不同网格的值转换为框的列表。即（?,13,13,3,4）->(?,4)  ？：框的数目
box_scores = box_confidence * box_class_probs
	#框的得分=框的置信度*类别置信度
    box_scores = K.reshape(box_scores, [-1, num_classes])
#reshape,将框的得分展平，变为(?,4); ?:框的数目
    return boxes, box_scores


def yolo_eval(yolo_outputs, 
#模型输出，格式如下[（?，13,13,27）（?，26,26,27）（?,52,52,27）] ?:bitch size; 13-26-52:多尺度预测； 27：预测值（3*（4+5））
              anchors,
	#[(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198),(373,326)]
              num_classes, # 类别个数，此数据集有4类 
              image_shape, #placeholder类型的TF参数，默认(416, 416)；
              max_boxes=20,
#每张图每类最多检测到20个框同类别框的IoU阈值，大于阈值的重叠框被删除，重叠物体较多，则调高阈值，重叠物体较少，则调低阈值
              score_threshold=.6,
#框置信度阈值，小于阈值的框被删除，需要的框较多，则调低阈值，需要的框较少，则调高阈值；
              iou_threshold=.5):
#同类别框的IoU阈值，大于阈值的重叠框被删除，重叠物体较多，则调高阈值，重叠物体较少，则调低阈值
    """Evaluate YOLO model on given input and return filtered boxes."""
num_layers = len(yolo_outputs) #yolo的输出层数；num_layers = 3  -> 13-26-52
	# 不同的欺骗对应不同的ANCHOR大小。
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
#每层分配3个anchor box.如13*13分配到[6,7,8]即[（116,90）（156,198）（373,326）]
input_shape = K.shape(yolo_outputs[0])[1:3] * 32
	#输入shape(?,13,13,255);即第一维和第二维分别乘32，输出的图片尺寸为（416,416）
    boxes = []
    box_scores = []
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 
	# yolo_boxes_and_scores()函数见附录yolo3.model
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0) #K.concatenate:将数据展平 ->(?,4)
    box_scores = K.concatenate(box_scores, axis=0) # ->(?,)

    mask = box_scores >= score_threshold 
#MASK掩码，过滤小于score阈值的值，只保留大于阈值的值
    max_boxes_tensor = K.constant(max_boxes, dtype='int32') #最大检测框数20
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        class_boxes = tf.boolean_mask(boxes, mask[:, c]) #通过掩码MASK和类别C筛选框boxes
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) 
#通过掩码MASK和类别C筛选scores
        nms_index = tf.image.non_max_suppression(  #运行非极大抑制
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        class_boxes = K.gather(class_boxes, nms_index)
	#K.gather:根据索引nms_index选择class_boxes
        class_box_scores = K.gather(class_box_scores, nms_index)
	#根据索引nms_index选择class_box_score)
        classes = K.ones_like(class_box_scores, 'int32') * c #计算类的框得分
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
boxes_ = K.concatenate(boxes_, axis=0)
#K.concatenate().将相同维度的数据连接在一起；把boxes_展平。  -> 变成格式:(?,4);  ?:框的个#数；4：（x,y,w,h）
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)
    return boxes_, scores_, classes_
	#图片缩放到固定大小之后就是生成对应的数据
	#通过model.py(preprocess_true_boxes实现box框的框定
def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are reletive value

    '''
    assert (true_boxes[..., 4]0
	#每个图片都需要单独处理。
    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh)==0: continue
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
# 9个设定的ANCHOR去框定每个输入的BOX。
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b,t, 4].astype('int32')
				 # 设定数据
				 # 将T个box的标的数据统一放置到3*B*W*H*3的维度上。
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1

    return y_true


def box_iou(b1, b2):
    '''Return iou tensor

    Parameters
    ----------
    b1: tensor, shape=(i1,...,iN, 4), xywh
    b2: tensor, shape=(j, 4), xywh

    Returns
    -------
    iou: tensor, shape=(i1,...,iN, j)

    '''

    # Expand dim to apply broadcasting.
    b1 = K.expand_dims(b1, -2)
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh/2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # Expand dim to apply broadcasting.
    b2 = K.expand_dims(b2, 0)
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh/2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    iou = intersect_area / (b1_area + b2_area - intersect_area)

    return iou


def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    '''Return yolo_loss tensor 

    Parameters
    ----------
    yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body
    y_true: list of array, the output of preprocess_true_boxes
    anchors: array, shape=(N, 2), wh
    num_classes: integer
    ignore_thresh: float, the iou threshold whether to ignore object confidence loss

    Returns
    -------
    loss: tensor, shape=(1,)

    '''
    num_layers = len(anchors)//3 # default setting
    yolo_outputs = args[:num_layers]
y_true = args[num_layers:]
	# 不同的欺骗对应不同的ANCHOR大小。
anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
	# 根据模型返回的OUTPUT计算输入图片SHAPE以及3个LAYER下，3个切片的大小。
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    loss = 0
m = K.shape(yolo_outputs[0])[0] # batch size, tensor #m表示采样batch_size
    mf = K.cast(m, K.dtype(yolo_outputs[0]))
# loss是需要三层分别计算的
for l in range(num_layers):
	   # 置信率
        object_mask = y_true[l][..., 4:5]
        # 分类
        true_class_probs = y_true[l][..., 5:]
	# raw_pred是yolo_outputs[l]，经过yolo_head函数后，raw_pred数据并没有改变。
        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
	# Darknet原始盒子来计算损失。
        raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid
        raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4]

        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')
# loop_body计算batch_size内最大的IOU
        def loop_body(b, ignore_mask):
# tf.boolean_mask Apply boolean mask to tensor. Numpy equivalent is tensor[mask]. 根据y_true的置信度标识，来框定y_true的坐标系参数是否有效。
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = K.max(iou, axis=-1)
	#当一张图片的最大IOU低于ignore_thresh，则认为图片内是没有目标
            ignore_mask = ignore_mask.write(b, K.cast(best_iou

 
  附录B
 训练和检测都导入了yolo3.utils： 
  #yolo3.utils中是其他使用函数，主要用于keras-yolo数据增强的一些方法。
"""Miscellaneous utility functions."""

from functools import reduce

from PIL import Image
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def compose(*funcs):
    """Compose arbitrarily many functions, evaluated left to right.

    Reference: https://mathieularose.com/function-composition-in-python/
    """
    # return lambda x: reduce(lambda v, f: f(v), funcs, x)
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')

def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a
#在utils.py(get_random_data)函数中实现数据处理
def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape
    box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
#not random的实现
if not random:
        # resize image
        #缩放大小
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        #中心点
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data=0
        if proc_img:
            image = image.resize((nw,nh), Image.BICUBIC)
		   #背景
            new_image = Image.new('RGB', (w,h), (128,128,128))
		   #黏贴图pain
            new_image.paste(image, (dx, dy))
		   #归一化
            image_data = np.array(new_image)/255.

        # correct boxes
        box_data = np.zeros((max_boxes,5))
        if len(box)>0:
            np.random.shuffle(box)
		   # 最大20个BOX。
            if len(box)>max_boxes: box = box[:max_boxes]
		   #根据缩放大小，生成新图中的BOX位置
            box[:, [0,2]] = box[:, [0,2]]*scale + dx
            box[:, [1,3]] = box[:, [1,3]]*scale + dy
            box_data[:len(box)] = box

        return image_data, box_data

# resize image
# 随机生成宽高比
new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
# 随机生成缩放比例。
scale = rand(.25, 2)
# 生成新的高宽数据，可能放大2倍。
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw,nh), Image.BICUBIC)

# place image
# 随机水平位移
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w,h), (128,128,128))
    new_image.paste(image, (dx, dy))
    image = new_image

# flip image or not
# 翻转
    flip = rand()<.5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

# distort image
# HSV抖动
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
val = rand(1, val) if rand()<.5 else 1/rand(1, val)
# 归一化处理
# 内部函数，通过公式转化。具体函数不介绍。
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0]>1] -= 1
    x[..., 0][x[..., 0]<0] += 1
    x[..., 1] *= sat
x[..., 2] *= val
# 避免S/V CHANNEL越界
    x[x>1] = 1
    x[x<0] = 0
    image_data = hsv_to_rgb(x) # numpy array, 0 to 1

    # correct boxes
    box_data = np.zeros((max_boxes,5))
    if len(box)>0:
        np.random.shuffle(box)
        box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
        box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
	        ### 左右翻转
        if flip: box[:, [0,2]] = w - box[:, [2,0]]
        ### 定义边界
        box[:, 0:2][box[:, 0:2]<0] = 0
        box[:, 2][box[:, 2]>w] = w
        box[:, 3][box[:, 3]>h] = h
        ### 计算新的长宽
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
        if len(box)>max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box
    return image_data, box_data
 
  请各位关注公众号，回复0115，获得下载链接。更多的文章可以关注公众号查看。

机器学习与深度学习间关系与区别 ℒℴѵℯ心·动ꦿ໊ོ꫞ 人工智能学习深度学习 python
一、机器学习概述定义机器学习（MachineLearning,ML）是一种通过数据驱动的方法，利用统计学和计算算法来训练模型，使计算机能够从数据中学习并自动进行预测或决策。机器学习通过分析大量数据样本，识别其中的模式和规律，从而对新的数据进行判断。其核心在于通过训练过程，让模型不断优化和提升其预测准确性。主要类型1.监督学习（SupervisedLearning）监督学习是指在训练数据集中包含输入
将cmd中命令输出保存为txt文本文件落难Coder Windows cmd window
最近深度学习本地的训练中我们常常要在命令行中运行自己的代码，无可厚非，我们有必要保存我们的炼丹结果，但是复制命令行输出到txt是非常麻烦的，其实Windows下的命令行为我们提供了相应的操作。其基本的调用格式就是：运行指令>输出到的文件名称或者具体保存路径测试下，我打开cmd并且ping一下百度：pingwww.baidu.com>./data.txt看下相同目录下data.txt的输出：如果你再
【目标检测数据集】卡车数据集1073张VOC+YOLO格式熬夜写代码的平头哥∰ 目标检测 YOLO 人工智能
数据集格式：PascalVOC格式+YOLO格式(不包含分割路径的txt文件，仅仅包含jpg图片以及对应的VOC格式xml文件和yolo格式txt文件)图片数量(jpg文件个数)：1073标注数量(xml文件个数)：1073标注数量(txt文件个数)：1073标注类别数：1标注类别名称:["truck"]每个类别标注的框数：truck框数=1120总框数：1120使用标注工具：labelImg标注
番茄西红柿叶子病害分类数据集12882张11类别 futureflsl 数据集分类数据挖掘人工智能
数据集类型：图像分类用，不可用于目标检测无标注文件数据集格式：仅仅包含jpg图片，每个类别文件夹下面存放着对应图片图片数量(jpg文件个数)：12882分类类别数：11类别名称:["Bacterial_Spot_Bacteria","Early_Blight_Fungus","Healthy","Late_Blight_Water_Mold","Leaf_Mold_Fungus","Powdery
推荐3家毕业AI论文可五分钟一键生成！文末附免费教程！小猪包333 写论文人工智能 AI写作深度学习计算机视觉
在当前的学术研究和写作领域，AI论文生成器已经成为许多研究人员和学生的重要工具。这些工具不仅能够帮助用户快速生成高质量的论文内容，还能进行内容优化、查重和排版等操作。以下是三款值得推荐的AI论文生成器：千笔-AIPassPaper、懒人论文以及AIPaperPass。千笔-AIPassPaper千笔-AIPassPaper是一款基于深度学习和自然语言处理技术的AI写作助手，旨在帮助用户快速生成高质
AI大模型的架构演进与最新发展季风泯灭的季节 AI大模型应用技术二人工智能架构
随着深度学习的发展，AI大模型（LargeLanguageModels,LLMs）在自然语言处理、计算机视觉等领域取得了革命性的进展。本文将详细探讨AI大模型的架构演进，包括从Transformer的提出到GPT、BERT、T5等模型的历史演变，并探讨这些模型的技术细节及其在现代人工智能中的核心作用。一、基础模型介绍：Transformer的核心原理Transformer架构的背景在Transfo
[实践应用] 深度学习之模型性能评估指标 YuanDaima2048 深度学习工具使用深度学习人工智能损失函数性能评估 pytorch python 机器学习
文章总览：YuanDaiMa2048博客文章总览深度学习之模型性能评估指标分类任务回归任务排序任务聚类任务生成任务其他介绍在机器学习和深度学习领域，评估模型性能是一项至关重要的任务。不同的学习任务需要不同的性能指标来衡量模型的有效性。以下是对一些常见任务及其相应的性能评估指标的详细解释和总结。分类任务分类任务是指模型需要将输入数据分配到预定义的类别或标签中。以下是分类任务中常用的性能指标：准确率(
[实践应用] 深度学习之优化器 YuanDaima2048 深度学习工具使用 pytorch 深度学习人工智能机器学习 python 优化器
文章总览：YuanDaiMa2048博客文章总览深度学习之优化器1.随机梯度下降（SGD）2.动量优化（Momentum）3.自适应梯度（Adagrad）4.自适应矩估计（Adam）5.RMSprop总结其他介绍在深度学习中，优化器用于更新模型的参数，以最小化损失函数。常见的优化函数有很多种，下面是几种主流的优化器及其特点、原理和PyTorch实现：1.随机梯度下降（SGD）原理:随机梯度下降通过
生成式地图制图 Bwywb_3 深度学习机器学习深度学习生成对抗网络
生成式地图制图（GenerativeCartography）是一种利用生成式算法和人工智能技术自动创建地图的技术。它结合了传统的地理信息系统（GIS）技术与现代生成模型（如深度学习、GANs等），能够根据输入的数据自动生成符合需求的地图。这种方法在城市规划、虚拟环境设计、游戏开发等多个领域具有应用前景。主要特点：自动化生成：通过算法和模型，系统能够根据输入的地理或空间数据自动生成地图，而无需人工逐
[数据集][目标检测]汽车头部尾部检测数据集VOC+YOLO格式5319张3类别 FL1623863129 数据集目标检测汽车 YOLO
数据集制作单位：未来自主研究中心(FIRC)版权单位：未来自主研究中心(FIRC)版权声明：数据集仅仅供个人使用，不得在未授权情况下挂淘宝、咸鱼等交易网站公开售卖,由此引发的法律责任需自行承担数据集格式：PascalVOC格式+YOLO格式(不包含分割路径的txt文件，仅仅包含jpg图片以及对应的VOC格式xml文件和yolo格式txt文件)图片数量(jpg文件个数)：5319标注数量(xml文件
吴恩达深度学习笔记(30)-正则化的解释极客Array
正则化（Regularization）深度学习可能存在过拟合问题——高方差，有两个解决方法，一个是正则化，另一个是准备更多的数据，这是非常可靠的方法，但你可能无法时时刻刻准备足够多的训练数据或者获取更多数据的成本很高，但正则化通常有助于避免过拟合或减少你的网络误差。如果你怀疑神经网络过度拟合了数据，即存在高方差问题，那么最先想到的方法可能是正则化，另一个解决高方差的方法就是准备更多数据，这也是非常
个人学习笔记7-6：动手学深度学习pytorch版-李沐浪子L 深度学习深度学习笔记计算机视觉 python 人工智能神经网络 pytorch
#人工智能##深度学习##语义分割##计算机视觉##神经网络#计算机视觉13.11全卷积网络全卷积网络（fullyconvolutionalnetwork，FCN）采用卷积神经网络实现了从图像像素到像素类别的变换。引入l转置卷积（transposedconvolution）实现的，输出的类别预测与输入图像在像素级别上具有一一对应关系：通道维的输出即该位置对应像素的类别预测。13.11.1构造模型下
深度学习-点击率预估-研究论文2024-09-14速读 sp_fyf_2024 深度学习人工智能
深度学习-点击率预估-研究论文2024-09-14速读1.DeepTargetSessionInterestNetworkforClick-ThroughRatePredictionHZhong,JMa,XDuan,SGu,JYao-2024InternationalJointConferenceonNeuralNetworks,2024深度目标会话兴趣网络用于点击率预测摘要：这篇文章提出了一种新
损失函数与反向传播 Star_. PyTorch pytorch 深度学习 python
损失函数定义与作用损失函数(lossfunction)在深度学习领域是用来计算搭建模型预测的输出值和真实值之间的误差。1.损失函数越小越好2.计算实际输出与目标之间的差距3.为更新输出提供依据（反向传播)常见的损失函数回归常见的损失函数有：均方差（MeanSquaredError，MSE）、平均绝对误差（MeanAbsoluteErrorLoss，MAE）、HuberLoss是一种将MSE与MAE
【深度学习】训练过程中一个OOM的问题，太难查了 weixin_40293999 深度学习深度学习人工智能
现象：各位大佬又遇到过ubuntu的这个问题么？现象是在训练过程中，ssh上不去了，能ping通，没死机，但是ubunutu的pc侧的显示器，鼠标啥都不好用了。只能重启。问题原因：OOM了95G，尼玛！！！！pytorch爆内存了，然后journald假死了，在journald被watchdog干掉之后，系统就崩溃了。这种规模的爆内存一般，即使被oomkill了，也要卡半天的，确实会这样，能不能配
CV、NLP、数据控掘推荐、量化海的那边- AI算法自然语言处理人工智能
下面是对CV（计算机视觉）、NLP（自然语言处理）、数据挖掘推荐和量化的简要概述及其应用领域的介绍：1.CV（计算机视觉，ComputerVision）定义：计算机视觉是一门让计算机能够从图像或视频中提取有用信息，并做出决策的学科。它通过模拟人类的视觉系统来识别、处理和理解视觉信息。主要任务：图像分类：识别图像中的物体并分类，比如猫、狗、车等。目标检测：在图像或视频中定位并识别多个对象，如人脸检测
云服务业界动态简报-20180128 Captain7
一、青云青云QingCloud推出深度学习平台DeepLearningonQingCloud，包含了主流的深度学习框架及数据科学工具包，通过QingCloudAppCenter一键部署交付，可以让算法工程师和数据科学家快速构建深度学习开发环境，将更多的精力放在模型和算法调优。二、腾讯云1.腾讯云正式发布腾讯专有云TCE(TencentCloudEnterprise)矩阵，涵盖企业版、大数据版、AI
机器学习VS深度学习 nfgo 机器学习
机器学习（MachineLearning,ML）和深度学习（DeepLearning,DL）是人工智能（AI）的两个子领域，它们有许多相似之处，但在技术实现和应用范围上也有显著区别。下面从几个方面对两者进行区分：1.概念层面机器学习：是让计算机通过算法从数据中自动学习和改进的技术。它依赖于手动设计的特征和数学模型来进行学习，常用的模型有决策树、支持向量机、线性回归等。深度学习：是机器学习的一个子领
大数据毕业设计hadoop+spark+hive知识图谱租房数据分析可视化大屏租房推荐系统 58同城租房爬虫房源推荐系统房价预测系统计算机毕业设计机器学习深度学习人工智能 2401_84572577 程序员大数据 hadoop 人工智能
做了那么多年开发，自学了很多门编程语言，我很明白学习资源对于学一门新语言的重要性，这些年也收藏了不少的Python干货，对我来说这些东西确实已经用不到了，但对于准备自学Python的人来说，或许它就是一个宝藏，可以给你省去很多的时间和精力。别在网上瞎学了，我最近也做了一些资源的更新，只要你是我的粉丝，这期福利你都可拿走。我先来介绍一下这些东西怎么用，文末抱走。（1）Python所有方向的学习路线（
深度学习-13-小语言模型之SmolLM的使用皮皮冰燃深度学习深度学习
文章附录1SmolLM概述1.1SmolLM简介1.2下载模型2运行2.1在CPU/GPU/多GPU上运行模型2.2使用torch.bfloat162.3通过位和字节的量化版本3应用示例4问题及解决4.1attention_mask和pad_token_id报错4.2max_new_tokens=205参考附录1SmolLM概述1.1SmolLM简介SmolLM是一系列尖端小型语言模型，提供三种规
基于深度学习的农作物病害检测 SEU-WYL 深度学习dnn 深度学习人工智能
基于深度学习的农作物病害检测利用卷积神经网络（CNN）、生成对抗网络（GAN）、Transformer等深度学习技术，自动识别和分类农作物的病害，帮助农业工作者提高作物管理效率、减少损失。1.农作物病害检测的挑战病害种类繁多：农作物病害的类型多样，不同病害在同一作物上的表现差异很大，同时同一种病害在不同生长阶段的症状也可能不同。环境影响：天气、光照、湿度等外部环境因素会影响农作物的表现，使得病害检
基于深度学习的文本引导的图像编辑 SEU-WYL 深度学习dnn 深度学习人工智能
基于深度学习的文本引导的图像编辑（Text-GuidedImageEditing）是一种通过自然语言文本指令对图像进行编辑或修改的技术。它结合了图像生成和自然语言处理（NLP）的最新进展，使用户能够通过描述性文本对图像内容进行精确的调整和操控。1.文本引导的图像编辑的挑战文本和图像之间的对齐：如何将文本中的语义信息准确地映射到图像中的特定区域或元素是一个关键挑战。这涉及到多模态数据的对齐和理解。编
深度学习--对抗生成网络（GAN, Generative Adversarial Network） Ambition_LAO 深度学习生成对抗网络
对抗生成网络（GAN,GenerativeAdversarialNetwork）是一种深度学习模型，由IanGoodfellow等人在2014年提出。GAN主要用于生成数据，通过两个神经网络相互对抗，来生成以假乱真的新数据。以下是对GAN的详细阐述，包括其概念、作用、核心要点、实现过程、代码实现和适用场景。1.概念GAN由两个神经网络组成：生成器（Generator）和判别器（Discrimina
深度学习：怎么看pth文件的参数奥利给少年深度学习人工智能
.pth文件是PyTorch模型的权重文件，它通常包含了训练好的模型的参数。要查看或使用这个文件，你可以按照以下步骤操作：1.确保你有模型的定义你需要有创建这个.pth文件时所用的模型的代码。这意味着你需要有模型的类定义和架构。2.加载模型权重使用PyTorch的load_state_dict方法来加载权重。这里是如何操作的：importtorchimporttorch.nnasnn#定义模型结构
chatgpt赋能python：如何在Python中安装Keras库？ turensu ChatGpt python chatgpt keras 计算机
如何在Python中安装Keras库？Keras是一个简单易用的神经网络库，由FrançoisChollet编写。它在Python编程语言中实现了深度学习的功能，可以使您更轻松地构建和试验不同类型的神经网络。如果您是一名Python开发人员，肯定会想知道如何在您的Python项目中安装Keras库。在本文中，我们将向您展示如何安装和配置Keras库。步骤1：安装Python要使用Keras库，您需
如何理解深度学习的训练过程奋斗的草莓熊深度学习人工智能 python scikit-learn virtualenv numpy pandas
文章目录1.训练是干什么？2.预训练模型进行训练，主要更改的是预训练模型的什么东西？1.训练是干什么？以yolov5为例子，训练的目的是把一组输入猫狗图像放到神经网络中，得到一个输出模型，这个模型下次可以直接用来识别哪个是猫，哪个是狗2.预训练模型进行训练，主要更改的是预训练模型的什么东西？超参数（Hyperparameters）：这是模型结构中定义的参数，比如：卷积核大小（kernel_size
Keras深度学习框架入门及实战指南司莹嫣Maude
Keras深度学习框架入门及实战指南keraskeras-team/keras:是一个基于Python的深度学习库，它没有使用数据库。适合用于深度学习任务的开发和实现，特别是对于需要使用Python深度学习库的场景。特点是深度学习库、Python、无数据库。项目地址:https://gitcode.com/gh_mirrors/ke/keras一、项目介绍Keras简介Keras是一款高级神经网络
深度学习驱动的车牌识别：技术演进与未来挑战逼子歌深度学习车牌识别神经网络字符识别 YOLO 卷积神经网络
一、引言1.1研究背景在当今社会，智能交通系统的发展日益重要，而车牌识别作为其关键组成部分，发挥着至关重要的作用。车牌识别技术广泛应用于交通管理、停车场管理、安防监控等领域。在交通管理中，它可以用于车辆识别、交通违法监控和车流统计等，提高交通管理的效率和准确性。在停车场管理中，实现车辆的自动识别和收费，提升管理和服务水平。在安防监控领域，可用于追踪嫌疑人及犯罪行为。深度学习的出现为车牌识别带来了重
每天五分钟玩转深度学习PyTorch：模型参数优化器torch.optim 幻风_huanfeng 深度学习框架pytorch 深度学习 pytorch 人工智能神经网络机器学习优化算法
本文重点在机器学习或者深度学习中，我们需要通过修改参数使得损失函数最小化(或最大化)，优化算法就是一种调整模型参数更新的策略。在pytorch中定义了优化器optim，我们可以使用它调用封装好的优化算法，然后传递给它神经网络模型参数，就可以对模型进行优化。本文是学习第6步(优化器)，参考链接pytorch的学习路线随机梯度下降算法在深度学习和机器学习中，梯度下降算法是最常用的参数更新方法，它的公式
什么是AIGC？有哪些免费工具？ chent_某位 AIGC
AIGC（AIGeneratedContent），即“人工智能生成内容”，是指通过人工智能技术自动生成各种类型的数字内容。AIGC让机器能够根据输入的信息或数据生成符合人类需求的文本、图像、音频、视频等内容，极大提高了内容创作的效率。AIGC的背景与起源随着深度学习和自然语言处理技术的快速发展，人工智能已经不再局限于简单的任务，如分类、预测和数据分析，而是具备了生成内容的能力。生成式AI模型，如O
对股票分析时要注意哪些主要因素？会飞的奇葩猪股票分析云掌股吧
　　众所周知，对散户投资者来说，股票技术分析是应战股市的核心武器，想学好股票的技术分析一定要知道哪些是重点学习的，其实非常简单，我们只要记住三个要素：成交量、价格趋势、振荡指标。一、成交量　　大盘的成交量状态。成交量大说明市场的获利机会较多，成交量小说明市场的获利机会较少。当沪市的成交量超过150亿时是强市市场状态，运用技术找综合买点较准；
【Scala十八】视图界定与上下文界定 bit1129 scala
Context Bound，上下文界定，是Scala为隐式参数引入的一种语法糖，使得隐式转换的编码更加简洁。隐式参数首先引入一个泛型函数max，用于取a和b的最大值 def max[T](a: T, b: T) = { if (a > b) a else b } 因为T是未知类型，只有运行时才会代入真正的类型，因此调用a >
C语言的分支——Object-C程序设计阅读有感 darkblue086 apple c 框架 cocoa
自从1972年贝尔实验室Dennis Ritchie开发了C语言，C语言已经有了很多版本和实现，从Borland到microsoft还是GNU、Apple都提供了不同时代的多种选择，我们知道C语言是基于Thompson开发的B语言的，Object-C是以SmallTalk-80为基础的。和C++不同的是，Object C并不是C的超集，因为有很多特性与C是不同的。 Object-C程序设计这本书
去除浏览器对表单值的记忆周凡杨 html 记忆 autocomplete form 浏览
&n
java的树形通讯录 g21121 java
最近用到企业通讯录，虽然以前也开发过，但是用的是jsf，拼成的树形，及其笨重和难维护。后来就想到直接生成json格式字符串，页面上也好展现。 // 首先取出每个部门的联系人 for (int i = 0; i < depList.size(); i++) { List<Contacts> list = getContactList(depList.get(i
Nginx安装部署 510888780 nginx linux
Nginx ("engine x") 是一个高性能的 HTTP 和反向代理服务器，也是一个 IMAP/POP3/SMTP 代理服务器。 Nginx 是由 Igor Sysoev 为俄罗斯访问量第二的 Rambler.ru 站点开发的，第一个公开版本0.1.0发布于2004年10月4日。其将源代码以类BSD许可证的形式发布，因它的稳定性、丰富的功能集、示例配置文件和低系统资源
java servelet异步处理请求墙头上一根草ｊａｖａ异步返回ｓｅｒｖｌｅｔ
servlet3.0以后支持异步处理请求，具体是使用AsyncContext ，包装httpservletRequest以及httpservletResponse具有异步的功能， final AsyncContext ac = request.startAsync(request, response); ac.s
我的spring学习笔记8-Spring中Bean的实例化 aijuans Spring 3
在Spring中要实例化一个Bean有几种方法： 1、最常用的（普通方法） <bean id="myBean" class="www.6e6.org.MyBean" /> 使用这样方法，按Spring就会使用Bean的默认构造方法，也就是把没有参数的构造方法来建立Bean实例。（有构造方法的下个文细说） 2、还
为Mysql创建最优的索引 annan211 mysql 索引
索引对于良好的性能非常关键，尤其是当数据规模越来越大的时候，索引的对性能的影响越发重要。索引经常会被误解甚至忽略，而且经常被糟糕的设计。索引优化应该是对查询性能优化最有效的手段了，索引能够轻易将查询性能提高几个数量级，最优的索引会比较好的索引性能要好2个数量级。 1 索引的类型 (1) B-Tree 不出意外，这里提到的索引都是指 B-
日期函数百合不是茶 oracle sql 日期函数查询
ORACLE日期时间函数大全 TO_DATE格式(以时间:2007-11-02 13:45:25为例) Year: yy two digits 两位年显示值:07 yyy three digits 三位年显示值:007
线程优先级 bijian1013 java thread 多线程 java多线程
多线程运行时需要定义线程运行的先后顺序。线程优先级是用数字表示，数字越大线程优先级越高，取值在1到10，默认优先级为5。实例： package com.bijian.study; /** * 因为在代码段当中把线程B的优先级设置高于线程A,所以运行结果先执行线程B的run()方法后再执行线程A的run()方法 * 但在实际中，JAVA的优先级不准，强烈不建议用此方法来控制执
适配器模式和代理模式的区别 bijian1013 java 设计模式
一.简介适配器模式：适配器模式（英语：adapter pattern）有时候也称包装样式或者包装。将一个类的接口转接成用户所期待的。一个适配使得因接口不兼容而不能在一起工作的类工作在一起，做法是将类别自己的接口包裹在一个已存在的类中。 &nbs
【持久化框架MyBatis3三】MyBatis3 SQL映射配置文件 bit1129 Mybatis3
SQL映射配置文件一方面类似于Hibernate的映射配置文件，通过定义实体与关系表的列之间的对应关系。另一方面使用<select>,<insert>,<delete>，<update>元素定义增删改查的SQL语句，这些元素包含三方面内容 1. 要执行的SQL语句 2. SQL语句的入参，比如查询条件 3. SQL语句的返回结果
oracle大数据表复制备份个人经验 bitcarter oracle 大表备份大表数据复制
前提：数据库仓库A（就拿oracle11g为例）中有两个用户user1和user2,现在有user1中有表ldm_table1,且表ldm_table1有数据5千万以上，ldm_table1中的数据是从其他库B（数据源）中抽取过来的，前期业务理解不够或者需求有变，数据有变动需要重新从B中抽取数据到A库表ldm_table1中。
HTTP加速器varnish安装小记 ronin47 http varnish 加速
上午共享的那个varnish安装手册，个人看了下，有点不知所云，好吧~看来还是先安装玩玩！苦逼公司服务器没法连外网，不能用什么wget或yum命令直接下载安装，每每看到别人博客贴出的在线安装代码时，总有一股羡慕嫉妒“恨”冒了出来。。。好吧，既然没法上外网，那只能麻烦点通过下载源码来编译安装了！ Varnish 3.0.4下载地址： http://repo.varnish-cache.org/
java-73-输入一个字符串，输出该字符串中对称的子字符串的最大长度 bylijinnan java
public class LongestSymmtricalLength { /* * Q75题目：输入一个字符串，输出该字符串中对称的子字符串的最大长度。 * 比如输入字符串“google”，由于该字符串里最长的对称子字符串是“goog”，因此输出4。 */ public static void main(String[] args) { Str
学习编程的一点感想 Cb123456 编程感想 Gis
写点感想，总结一些，也顺便激励一些自己.现在就是复习阶段，也做做项目. 本专业是GIS专业，当初觉得本专业太水，靠这个会活不下去的，所以就报了培训班。学习的时候，进入状态很慢，而且当初进去的时候，已经上到Java高级阶段了，所以.....，呵呵，之后有点感觉了，不过，还是不好好写代码，还眼高手低的，有
[能源与安全]美国与中国 comsci 能源
现在有一个局面：地球上的石油只剩下N桶，这些油只够让中国和美国这两个国家中的一个顺利过渡到宇宙时代，但是如果这两个国家为争夺这些石油而发生战争，其结果是两个国家都无法平稳过渡到宇宙时代。。。。而且在战争中，剩下的石油也会被快速消耗在战争中，结果是两败俱伤。。。在这个大
SEMI-JOIN执行计划突然变成HASH JOIN了的原因分析 cwqcwqmax9 oracle
甲说： A B两个表总数据量都很大，在百万以上。 idx1 idx2字段表示是索引字段 A B 两表上都有 col1字段表示普通字段 select xxx from A where A.idx1 between mmm and nnn and exists (select 1 from B where B.idx2 =
SpringMVC-ajax返回值乱码解决方案 dashuaifu Ajax springMVC response 中文乱码
SpringMVC-ajax返回值乱码解决方案一：（自己总结，测试过可行） ajax返回如果含有中文汉字，则使用：（如下例：） @RequestMapping(value="/xxx.do") public @ResponseBody void getPunishReasonB
Linux系统中查看日志的常用命令 dcj3sjt126com OS
因为在日常的工作中，出问题的时候查看日志是每个管理员的习惯，作为初学者，为了以后的需要，我今天将下面这些查看命令共享给各位 cat tail -f 日志文件说明 /var/log/message 系统启动后的信息和错误日志，是Red Hat Linux中最常用的日志之一 /var/log/secure 与安全相关的日志信息 /var/log/maillog 与邮件相关的日志信
[应用结构]应用 dcj3sjt126com PHP yii2
应用主体应用主体是管理 Yii 应用系统整体结构和生命周期的对象。每个Yii应用系统只能包含一个应用主体，应用主体在入口脚本中创建并能通过表达式 \Yii::$app 全局范围内访问。补充: 当我们说"一个应用"，它可能是一个应用主体对象，也可能是一个应用系统，是根据上下文来决定[译：中文为避免歧义，Application翻译为应
assertThat用法 eksliang JUnit assertThat
junit4.0 assertThat用法一般匹配符1、assertThat( testedNumber, allOf( greaterThan(8), lessThan(16) ) ); 注释： allOf匹配符表明如果接下来的所有条件必须都成立测试才通过，相当于“与”（&&） 2、assertThat( testedNumber, anyOf( g
android点滴2 gundumw100 应用服务器 android 网络应用 OS HTC
如何让Drawable绕着中心旋转？ Animation a = new RotateAnimation(0.0f, 360.0f, Animation.RELATIVE_TO_SELF, 0.5f, Animation.RELATIVE_TO_SELF,0.5f); a.setRepeatCount(-1); a.setDuration(1000); 如何控制Andro
超简洁的CSS下拉菜单 ini html Web 工作 html5 css
效果体验：http://hovertree.com/texiao/css/3.htmHTML文件： <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>简洁的HTML+CSS下拉菜单-HoverTree</title>
kafka consumer防止数据丢失 kane_xie kafka offset commit
kafka最初是被LinkedIn设计用来处理log的分布式消息系统，因此它的着眼点不在数据的安全性（log偶尔丢几条无所谓），换句话说kafka并不能完全保证数据不丢失。尽管kafka官网声称能够保证at-least-once，但如果consumer进程数小于partition_num，这个结论不一定成立。考虑这样一个case，partiton_num=2
@Repository、@Service、@Controller 和 @Component mhtbbx DAO spring bean prototype
@Repository、@Service、@Controller 和 @Component 将类标识为Bean Spring 自 2.0 版本开始，陆续引入了一些注解用于简化 Spring 的开发。@Repository注解便属于最先引入的一批，它用于将数据访问层 (DAO 层 ) 的类标识为 Spring Bean。具体只需将该注解标注在 DAO类上即可。同时，为了让 Spring 能够扫描类
java 多线程高并发读写控制误区 qifeifei java thread
先看一下下面的错误代码，对写加了synchronized控制，保证了写的安全，但是问题在哪里呢？ public class testTh7 { private String data; public String read(){ System.out.println(Thread.currentThread().getName() + "read data "
mongodb replica set(副本集)设置步骤 tcrct java mongodb
网上已经有一大堆的设置步骤的了，根据我遇到的问题，整理一下，如下：首先先去下载一个mongodb最新版，目前最新版应该是2.6 cd /usr/local/bin wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.6.0.tgz tar -zxvf mongodb-linux-x86_64-2.6.0.t
rust学习笔记 wudixiaotie 学习笔记
1.rust里绑定变量是let，默认绑定了的变量是不可更改的，所以如果想让变量可变就要加上mut。 let x = 1; let mut y = 2; 2.match 相当于erlang中的case，但是case的每一项后都是分号，但是rust的match却是逗号。 3.match 的每一项最后都要加逗号，但是最后一项不加也不会报错，所有结尾加逗号的用法都是类似。 4.每个语句结尾都要加分