zyw2002

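Object Detection: The YOLO Series

Overview of Classic Deep-Learning Detection Methods

What "stage" means in detection

One-stage: the YOLO series
Outputs the top-left and bottom-right corners of each detection box, (x1, y1) and (x2, y2).
A single CNN regresses the predictions directly.
Two-stage: the Faster R-CNN and Mask R-CNN series
Compared with one-stage methods, there is an extra step that extracts candidate boxes (pre-selection) first.

An analogy:
Suppose a company is recruiting talent. One-stage is a single open audition, while two-stage has two rounds of selection (one extra elimination round). One-stage is faster and suits real-time detection; two-stage gives better detection accuracy.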

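Metrics for evaluating detection

IoU
IoU (intersection over union) is the ratio of the intersection area to the union area of the ground-truth box and the predicted box.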
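
As a minimal sketch of that definition, the function below computes IoU for two boxes in (x1, y1, x2, y2) corner form. The name `iou` and the plain-tuple representation are illustrative choices, not part of the project discussed later.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # width/height are clamped at 0 so disjoint boxes give zero intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))  # no overlap
```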
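Precision and recall

TP: number of samples correctly (true) classified as positive (they are actually positive)
FP: number of samples incorrectly (false) classified as positive (they are actually negative)
FN: number of samples incorrectly (false) classified as negative (they are actually positive)
TN: number of samples correctly (true) classified as negative (they are actually negative)

Precision = TP / (TP + FP): how much of what was found is right.
Recall = TP / (TP + FN): how much of what should be found was found.

An analogy:
Suppose a multiple-choice question has six options A to F, the correct answer is ABCDE, and you pick ABF. Only A and B are right, so precision is 2/3. C, D and E were missed, so recall is 2/5.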
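
The multiple-choice analogy can be checked with a few lines of Python. This is only a sketch: the helper name `precision_recall` is made up for illustration, and options are treated as set elements.

```python
def precision_recall(predicted, actual):
    """Precision and recall when predictions and ground truth are sets of items."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # chosen and correct
    fp = len(predicted - actual)   # chosen but wrong
    fn = len(actual - predicted)   # correct but not chosen
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# correct answer ABCDE, chosen answer ABF
p, r = precision_recall("ABF", "ABCDE")
```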

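AP and mAP

In general, precision and recall cannot both be high at once: when one goes up, the other tends to go down.
A prediction counts as a positive when its IoU with a ground-truth box exceeds a threshold. Changing the threshold used to accept predictions gives different precision and recall values, which are then plotted as a precision-recall curve (usually called the PR curve).

Taking the upper envelope of the curve, the area of the shaded region between it and the x-axis is the AP. In the figure above it is the sum of the areas of rectangles A1, A2, A3 and A4.

Compute the AP for every class and average them to get the mAP.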

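YOLOv1: Overall Idea and Network Architecture

A classic one-stage method

You Only Look Once: the name says it all!

It turns detection into a regression problem, so a single CNN does the job!

It can run detection on video in real time, so it has a very wide range of applications!

Core idea:
Take an input image -> split it into a 7*7 grid -> each cell is responsible for two candidate boxes (in YOLOv1 these are predicted directly by the network rather than taken from preset shapes) -> compute the IoU between the ground truth and each candidate -> keep the candidate with the larger IoU -> fine-tune the width and height of the selected candidate -> the prediction for a box is its center (x, y), its width and height (w, h), and a confidence (the probability that it contains an object).

Network architecture
The input image is resized to 448*448*3.
Several convolution layers produce a 7*7*1024 feature map.
This is flattened into fully connected layers: the first gives 4096 features, the second gives 1470.
A reshape then gives 7*7*30. (Each image has a 7*7 grid and each cell has 30 values: the first 10 are x, y, w, h, c for the two candidate boxes, and the remaining 20 are the probabilities of the 20 classes.)

x, y, w, h are normalized values, i.e. relative position and size.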
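
To make the 1470 -> 7*7*30 layout concrete, here is a small NumPy sketch. The array `fc_out` is a stand-in filled with consecutive numbers, not a real network output, and the names `S`, `B`, `C` are borrowed from the usual YOLOv1 notation as an assumption.

```python
import numpy as np

S, B, C = 7, 2, 20                       # grid size, boxes per cell, classes
# stand-in for the output of the second fully connected layer
fc_out = np.arange(S * S * (B * 5 + C), dtype=np.float32)

grid = fc_out.reshape(S, S, B * 5 + C)   # 7*7*30
# first 10 values of each cell: (x, y, w, h, c) for each of the two boxes
boxes = grid[..., : B * 5].reshape(S, S, B, 5)
# remaining 20 values: one probability per class
class_probs = grid[..., B * 5 :]
```

With this layout, `boxes[i, j, k]` holds the five values of box `k` in cell `(i, j)`, and `class_probs[i, j]` is shared by both boxes of that cell.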

Loss function:
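
For reference, the YOLOv1 paper uses a sum-of-squared-errors loss over box coordinates, confidence and class probabilities. The notation below follows the paper rather than this post: S is the grid size (7), B the number of boxes per cell (2), the indicator is 1 when box j of cell i is responsible for an object, and the paper sets \lambda_coord = 5 and \lambda_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
   + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
   \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}}
   \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```

The square roots on w and h make the same absolute error count for more on small boxes than on large ones.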

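Non-maximum suppression (NMS)
When detection boxes overlap, keep the one with the highest confidence and discard the others whose IoU with it exceeds a threshold.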
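
A minimal greedy NMS can be sketched in plain Python as below. This is the textbook version, not the project's `non_max_suppression` (which additionally merges overlapping boxes by confidence); the names `nms` and `iou_thres` are illustrative.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.5):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining confidence
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with IoU 81/119 (about 0.68), so it is suppressed, while the third box is far away and survives.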

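Problems with YOLOv1
Problem 1: each cell predicts only one class, so overlapping objects cannot be handled.
Problem 2: small objects are detected poorly; box aspect ratios can vary, but the choices are limited.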

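YOLOv2: Improvements in Detail

Improvements

Batch Normalization
V2 drops Dropout (V1 used it in the fully connected layers to switch off some neurons and prevent overfitting; V2 has no fully connected layers) and adds Batch Normalization after every convolution.
The input of every layer is normalized, which makes convergence easier.
Adding Batch Normalization improves mAP by about 2%.
From today's point of view, Batch Normalization has become a standard part of networks.
Higher resolution
V1 trains at 224*224 but tests at 448*448.
This mismatch can hurt the model, so V2 adds 10 extra epochs of fine-tuning at 448*448 during training.
With the high-resolution classifier, YOLOv2's mAP improves by about 4%.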

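Network structure

DarkNet19, with an actual input of 416*416
No FC layers; 5 downsampling steps (13*13 output)
1*1 convolutions save a lot of parameters

Five downsampling steps shrink the width and height by 2^5 = 32.
416/32 = 13. The quotient should preferably be odd, which makes it easy to find the cell that contains the center point.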

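Priors from clustering
Five prior boxes are obtained by clustering.

Faster R-CNN uses priors with aspect ratios 1:1, 1:2 and 2:1, each at 3 different sizes, for 3*3 = 9 priors in total.

Anchor boxes
Introducing anchor boxes increases the number of predicted boxes (13*13*n).
Unlike the Faster R-CNN series, the priors are not set directly by fixed aspect ratios.
Direct Location Prediction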

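Receptive field
The receptive field is how large a region of the original image a point on the feature map can see.

The larger the receptive field, the better the network can perceive objects at a global scale.
Two stacked 3*3 convolution layers have a 5*5 receptive field.
Three stacked 3*3 convolution layers have a 7*7 receptive field.

Why stack small kernels rather than use one large kernel to enlarge the receptive field?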
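
One common answer is parameter count: for the same receptive field, stacked small kernels need fewer weights (and also insert more nonlinearities between layers). The sketch below checks the arithmetic under simplifying assumptions: stride 1, the same number of input and output channels, no bias terms. The channel count 256 is an arbitrary example.

```python
def conv_params(kernel, channels, layers=1):
    """Weights (no bias) of `layers` stacked kernel*kernel convs, `channels` in and out."""
    return layers * kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


C = 256                                     # hypothetical channel count
three_3x3 = conv_params(3, C, layers=3)     # 27 * C^2 weights, 7*7 receptive field
one_7x7 = conv_params(7, C)                 # 49 * C^2 weights, 7*7 receptive field
```

So three 3*3 layers cover the same 7*7 region with 27/49 of the weights of a single 7*7 layer.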

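Feature fusion
By the last layer the receptive field is very large and small objects may be lost, so earlier features need to be fused in.
The second-to-last layer's feature map is split up and concatenated with the last layer's.
Multi-scale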

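YOLOv3 Core Network Model

The biggest change in V3 is the network structure, which makes it better suited to small-object detection.
Features are finer-grained: feature maps from multiple scales are fused to predict objects of different sizes.
Priors are richer: 3 scales with 3 shapes each, 9 in total.
Softmax is replaced, so multi-label prediction is possible.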

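Improvements:

Multiple scales
Three scales (large, medium, small) are designed, each dedicated to objects of a different size.
The grids are 52*52, 26*26 and 13*13 (the larger the grid, the smaller the candidate boxes); each box has a different aspect ratio.

In the figure below, the left side builds an image pyramid and predicts on images of different sizes, which is slow.
The right side fuses feature maps of different sizes through upsampling.

Residual connections
Idea: keep what helps and discard what does not, so the result is no worse than before.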

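Core network architecture: darknet53

No pooling, because pooling compresses features.
No fully connected layers; they were already removed in V2.
Downsampling is done with stride = 2, which halves the width and height each time.
There are three grid sizes.

V1 grid: 7*7
V2 grid: 13*13
V3 grids: 13*13, 26*26, 52*52

Prior design
Nine priors are again obtained by clustering.
Large priors go to 13*13, medium ones to 26*26, and small ones to 52*52.
Replacing the softmax layer
For multi-label prediction, each class gets its own probability, and every class whose probability exceeds a threshold is assigned.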
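
The difference between softmax and independent per-class probabilities can be illustrated with NumPy. The logits, the class names in the comment and the 0.5 threshold below are made-up values for the sketch, not taken from the model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())      # subtract the max for numerical stability
    return e / e.sum()


# hypothetical raw scores for the classes ["person", "woman", "car"]
logits = np.array([3.0, 2.5, -4.0])

exclusive = softmax(logits)      # probabilities compete and sum to 1
independent = sigmoid(logits)    # each class is scored on its own
labels = independent > 0.5       # every class above the threshold is assigned
```

With softmax only one of the two overlapping labels could dominate; with independent sigmoids both "person" and "woman" pass the threshold.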

YOLOv3 Source Code Walkthrough

Downloads:

PyTorch-YOLOv3 source code
Pretrained weights

COCO dataset:
http://images.cocodataset.org/zips/val2014.zip
http://images.cocodataset.org/zips/train2014.zip

References:
YOLOv3 source code analysis
https://blog.csdn.net/qq_24739717/article/details/92399359

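config folder

coco.data

classes= 80  # number of classes
train=data/coco/trainvalno5k.txt # path to the list of training images
valid=data/coco/5k.txt # path to the list of validation images
names=data/coco.names # class names
backup=backup/ # where checkpoints are stored
eval=coco # which mAP evaluation method to use

create_custom_model.sh
Shell script for defining your own model: running it generates the custom model configuration file yolov3-custom.cfg, which can be compared with yolov3.cfg.
custom.data
Information about your own dataset, used to train your own detection task: number of classes, training set path, validation set path, and path to the class names.
yolov3.cfg
Configuration of the YOLOv3 network: convolution layers (number of filters, kernel size, stride, ...), yolo layers, and the other layers.
yolov3-custom.cfg
Configuration of the custom network, generated by create_custom_model.sh.
yolov3-tiny.cfg
Configuration of the tiny version of YOLOv3.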

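data folder

coco folder

The COCO training and validation data, produced by running the get_coco_dataset.sh script.

custom folder

Information about the custom dataset.
1) images folder: all training and validation images.
2) labels folder: label files produced by annotating the images in the images folder with a labeling tool. Each label file is a txt file, and each line is one ground-truth entry: class index, then bounding box coordinates. In the example shown, 0 is the class index and the remaining values are the box coordinates.
3) classes.names: the class names of the custom dataset.
4) train.txt: the paths of the training images, one image path per line.
5) valid.txt: the paths of the validation images, one image path per line.

samples folder

The folder with the images used to test the model and inspect its detection results.

coco.names

The class names of the COCO data, analogous to classes.names.

get_coco_dataset.sh

Shell script that downloads the COCO data and creates the coco folder and its contents.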

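utils folder

augmentations.py

Data augmentation. This project only uses horizontal flipping; when an image is flipped, its annotations are updated as well, and the function returns the flipped image together with the matching labels.

Imports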

import torch
import torch.nn.functional as F
import numpy as np

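horisontal_flip()
Input: images, targets: the original image and its labels.
Returns: images, targets: the flipped image and labels.
Purpose: augment the data so that the dataset is effectively enlarged. Only horizontal mirroring is used here.

def horisontal_flip(images, targets):  # mirror the image and its labels
    images = torch.flip(images, [-1])  # flip along the last (width) axis
    targets[:, 2] = 1 - targets[:, 2]
    # each row of targets is (img_id, class_id, x_c, y_c, w, h), normalized to [0, 1]
    # a horizontal flip only changes the x center, targets[:, 2]
    return images, targets

torch.flip(input, dims) -> Tensor
Purpose: reverse a tensor along the given axes.
Parameters: input, the tensor to flip; dims, the axes to flip.
Returns: the flipped tensor.
The image is stored as a (c, h, w) array whose three axes are the color channels, the vertical direction and the horizontal direction. In Python, index -1 means the last one, i.e. the horizontal axis.

Each row of targets is a label (img_id, class_id, x_c, y_c, w, h); the center and size are relative values in the range [0, 1].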
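
The effect of the flip on both the pixels and the labels can be seen on a tiny example. The sketch below uses NumPy instead of PyTorch so that it runs without a GPU stack; `flip_horizontal` is a hypothetical stand-in that mirrors what `horisontal_flip` does, and the label values are invented.

```python
import numpy as np


def flip_horizontal(image, targets):
    """NumPy analogue of horisontal_flip: image is (c, h, w), targets rows are
    (img_id, class_id, x_c, y_c, w, h) with normalized coordinates."""
    flipped = np.flip(image, axis=-1).copy()   # reverse the width axis
    out = targets.copy()
    out[:, 2] = 1.0 - out[:, 2]                # only the x center moves
    return flipped, out


image = np.arange(6, dtype=np.float32).reshape(1, 2, 3)   # rows [0,1,2] and [3,4,5]
targets = np.array([[0, 1, 0.25, 0.5, 0.2, 0.4]])
flipped, new_targets = flip_horizontal(image, targets)
```

A box centered a quarter of the way from the left edge ends up a quarter of the way from the right edge, while its y center, width and height are unchanged.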

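datasets.py

Dataset handling: image padding, image resizing, and the dataset classes used for detection and for evaluation. The file contains 3 functions and 2 classes, as follows.

Imports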

import glob
import random
import os
import sys
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F

from utils.augmentations import horisontal_flip
from torch.utils.data import Dataset
import torchvision.transforms as transforms

pad_to_square
Input: img, the original image; pad_value, the value used for padding.
Output: the padded image and the padding that was applied.
Purpose: pad the original image into a square whose side is max(h, w), using F.pad with a constant value.

'''Image padding:
pad the image with pad_value into a square; return the padded image and the padding tuple.'''
def pad_to_square(img, pad_value):  # pad_value: the value written into the padded area
    c, h, w = img.shape
    dim_diff = np.abs(h - w)
    # (upper / left) padding and (lower / right) padding
    pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
    # F.pad order is (left, right, top, bottom): pad top/bottom if h <= w, otherwise left/right
    pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
    # pad the image with a constant value
    img = F.pad(img, pad, "constant", value=pad_value)
    return img, pad

resize
Input: image, the original image; size, the target size.
Output: the resized image.
Purpose: up- or downsample the image.

'''Image resizing: interpolate a square image to a fixed size.'''
def resize(image, size):
    # add a batch dimension, interpolate with "nearest", then remove the batch dimension again
    image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
    return image

torch.nn.functional.interpolate performs interpolation and up/downsampling in PyTorch:
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)
Parameters:
input (Tensor): the input tensor
size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]): the output size
scale_factor (float or Tuple[float]): how many times larger the output is than the input; if given as a tuple, it must be a tuple for every spatial dimension
mode (str): the sampling algorithm, one of 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear' and 'area'. Default: 'nearest'

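random_resize
Input: images, the original images; min_size, max_size, the range the random size is drawn from.
Output: the resized images.
Purpose: resize the images to a random size.

"""Random resize (not a crop): interpolate the images to a size drawn from min_size..max_size in steps of 32."""
def random_resize(images, min_size=288, max_size=448):
    new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0]
    images = F.interpolate(images, size=new_size, mode="nearest")
    return images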

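ImageFolder
Purpose: shows the standard form of a dataset definition.
Reads images from a folder, pads each image to a square, resizes every input to 416*416, and returns the image path together with the processed image.

'''Dataset class 1: loads and preprocesses images; returns the image path and the processed image.'''
# Used for prediction: detect.py loads its data with this class
class ImageFolder(Dataset):  # the standard way to define a dataset
    def __init__(self, folder_path, img_size=416):  # folder with the test images; network input size
        # collect the image paths in the folder; files is a sorted list of paths
        self.files = sorted(glob.glob("%s/*.*" % folder_path))  # e.g. folder_path=data/samples in detect.py
        self.img_size = img_size  # network input size

    def __getitem__(self, index):  # fetch the image at the given index
        img_path = self.files[index % len(self.files)]
        # convert the image to a tensor
        img = transforms.ToTensor()(Image.open(img_path))
        # pad to a square with zeros
        img, _ = pad_to_square(img, 0)
        # resize to the requested size
        img = resize(img, self.img_size)
        return img_path, img  # path and processed image for this index

    def __len__(self):
        return len(self.files)  # number of images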

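ListDataset

The Dataset class:
PyTorch reads images mainly through the Dataset class. Dataset is the base class of all datasets, and every dataset must inherit from it.
__init__: initializes the parameters used to operate on the dataset.
__getitem__: defines how a sample is fetched (reading the data, transforming it, and so on). It supports indices from 0 to len(self) - 1; obj[index] is equivalent to obj.__getitem__(index).
__len__: returns the size of the dataset; len(obj) is equivalent to obj.__len__().

"""Dataset class 2: loads and preprocesses images and their labels; returns the image path, the processed image and the processed labels."""
# Used for evaluation: test.py loads its data with this class
class ListDataset(Dataset):
    # load the data
    def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True):
        # list_path: path of the txt file that lists the image paths; img_size: network input size;
        # augment: whether to augment; multiscale: whether to use multi-scale input;
        # normalized_labels: whether the labels are normalized
        # read the image paths into the list img_files
        with open(list_path, "r") as file:  # e.g. valid.txt, one image path per line such as data/custom/images/train.jpg
            self.img_files = file.readlines()
        # derive the label paths from the image paths; only the folder and the extension differ
        self.label_files = [
            path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt")
            for path in self.img_files
        ]
        # other settings
        self.img_size = img_size
        self.max_objects = 100  # maximum number of objects
        self.augment = augment  # bool: use augmentation
        self.multiscale = multiscale  # bool: multi-scale input, so the image size fed to the network changes between batches
        self.normalized_labels = normalized_labels  # bool: by default the boxes in the label txt files are normalized to 0-1
        self.min_size = self.img_size - 3 * 32
        # min_size and max_size bound the input sizes (in steps of 32) used for multi-scale input,
        # so that the network detects both small and large objects well
        self.max_size = self.img_size + 3 * 32
        self.batch_count = 0  # index of the batch currently being processed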
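    # Find the image for `index`, pad the image and labels to a square, and normalize the labels.
    # Returns the image path, the image and the labels.
    def __getitem__(self, index):  # read the data and the labels

        # ---------
        #  Image
        # ---------
        # image path for this index
        img_path = self.img_files[index % len(self.img_files)].rstrip()
        # the author's local dataset root; change it to match your own machine
        img_path = 'F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco' + img_path
        # print (img_path)
        # convert the image to a tensor
        img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))

        # make sure the image has three channels, then read its height and width
        if len(img.shape) != 3:
            img = img.unsqueeze(0)
            img = img.expand((3, *img.shape[1:]))  # unpack the spatial dims; a nested tuple would raise an error

        _, h, w = img.shape
        h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)  # if the boxes are not normalized, the labels already hold pixel positions
        # pad to a square; returns the padded image and pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
        img, pad = pad_to_square(img, 0)
        # height and width after padding
        _, padded_h, padded_w = img.shape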

        # ---------
        #  Label
        # ---------
        # label path for this index
        label_path = self.label_files[index % len(self.img_files)].rstrip()
        # the author's local label root; change it to match your own machine
        label_path='F:\\cv\\PyTorch-YOLOv3\\PyTorch-YOLOv3\\data\\coco\\labels'+ label_path
        #print (label_path)

        targets = None
        if os.path.exists(label_path):  # read the labels of this image
            # the txt file stores the normalized coordinates of the boxes in one image
            # columns: class_id, x_c, y_c, w, h (normalized, which helps the model converge faster)
            boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
            # np.loadtxt() turns the values in the label file into an array
            # scale the normalized coordinates back to the original image and
            # convert (x_c, y_c, w, h) to corner coordinates (top-left, bottom-right)
            x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
            y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
            x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
            y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
            # shift the corners so they match the square-padded image
            # the labels get the same adjustment as the image; pad is (0 left, 1 right, 2 top, 3 bottom)
            x1 += pad[0]
            y1 += pad[2]
            x2 += pad[1]
            y2 += pad[3]
            # convert back to (x, y, w, h) and normalize
            # padded_w and padded_h are the width and height after padding
            boxes[:, 1] = ((x1 + x2) / 2) / padded_w
            boxes[:, 2] = ((y1 + y2) / 2) / padded_h
            # w_factor and h_factor are the width and height of the original image
            boxes[:, 3] *= w_factor / padded_w
            boxes[:, 4] *= h_factor / padded_h

            # 6 columns: (0, class_id, x, y, w, h)
            targets = torch.zeros((len(boxes), 6))
            targets[:, 1:] = boxes

        # Apply augmentations
        if self.augment:
            if np.random.random() < 0.5:
                img, targets = horisontal_flip(img, targets)  # data augmentation

        return img_path, img, targets  # image path, padded image, and normalized labels (img_id, class_id, x_c, y_c, w, h)

    # collate_fn: custom batching. It decides how samples are combined into a batch and gives each target its image index.
    def collate_fn(self, batch):
        paths, imgs, targets = list(zip(*batch))  # image paths, images and labels of the batch
        # each element of targets holds all boxes of one image
        targets = [boxes for boxes in targets if boxes is not None]
        # give all boxes of the same image the same index
        for i, boxes in enumerate(targets):
            boxes[:, 0] = i  # image index within the batch
        targets = torch.cat(targets, 0)  # concatenate all boxes of the batch; the loss is computed per batch
        # Selects new image size every tenth batch
        if self.multiscale and self.batch_count % 10 == 0:
            self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
        # Resize images to input shape
        # (the size is re-drawn every 10 batches, see above)
        imgs = torch.stack([resize(img, self.img_size) for img in imgs])  # resize and stack into one batch tensor
        self.batch_count += 1
        return paths, imgs, targets  # targets rows are normalized (img_id, class_id, x_c, y_c, w, h)

    def __len__(self):
        return len(self.img_files)

logger.py

Writes monitoring data to the file system (logs) and saves information about training, such as the losses. The Logger class is used in train.py to save information to the log files during training.

import tensorflow as tf

class Logger(object):
    def __init__(self, log_dir):  # log_dir: directory for the logs
        """Create a summary writer logging to log_dir."""
        self.writer = tf.summary.create_file_writer(log_dir)  # create a summary writer
        # Because of version differences, tf.summary.FileWriter may raise an error;
        # alternatives are tf.compat.v1.summary.FileWriter or, as here, tf.summary.create_file_writer
    def scalar_summary(self, tag, value, step):  # log a scalar variable
        with self.writer.as_default():
            tf.summary.scalar(tag, value, step=step)
            self.writer.flush()
    def list_of_scalars_summary(self, tag_value_pairs, step):
        with self.writer.as_default():
            for tag, value in tag_value_pairs:
                tf.summary.scalar(tag, value, step=step)
            self.writer.flush()
        # summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value) for tag, value in tag_value_pairs])
        # self.writer.add_summary(summary, step)

parse_config.py

Contains two parsers:
1. Model config parser: returns a list module_defs; each element is a dict describing one layer (module) of the model.
2. Data config parser: returns a dict whose key-value pairs describe the dataset: names, paths and other information.

'''Model config parser: parses the yolov3 layer configuration file and returns the module definitions module_defs; path is the path to yolov3.cfg.'''
def parse_model_config(path):
    '''
    Read config/yolov3.cfg before reading this function. An excerpt of yolov3.cfg:
    [convolutional]
    batch_normalize=1
    filters=32
    size=3
    stride=1
    pad=1
    activation=leaky
    # Downsample
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=2
    pad=1
    activation=leaky
    ...
    :param path: path to the model configuration file (yolov3.cfg)
    :return: the model definition, a list of dicts; each dict holds the parameters of one module
    '''

    # read yolov3.cfg into a list with one element per line
    with open(path, 'r') as file:
        lines = file.read().split('\n')
    lines = [x for x in lines if x and not x.startswith('#')]  # skip blank lines and comments
    lines = [x.rstrip().lstrip() for x in lines]  # strip surrounding whitespace

    # the list of module definitions
    module_defs = []
    # For every line of the cfg:
    #   1. if it starts with [, a new block begins: append a new dict to module_defs,
    #      set its 'type' to the text inside []; for convolutional blocks also set 'batch_normalize': 0
    #   2. otherwise it is a setting of the current block:
    #      the text before = is the key, the text after = is the value
    for line in lines:

        # a line starting with [ opens a new module; the text inside [] is its name, e.g. [convolutional], [shortcut], ...
        if line.startswith('['): # This marks the start of a new block
            # append an empty dict to module_defs
            module_defs.append({})
            # set its type, e.g. {'type': 'convolutional'}
            module_defs[-1]['type'] = line[1:-1].rstrip()
            # convolutional modules default to {'type': 'convolutional', 'batch_normalize': 0}
            if module_defs[-1]['type'] == 'convolutional':
                module_defs[-1]['batch_normalize'] = 0

        # any other line is a setting inside the current module
        else:
            key, value = line.split("=")
            value = value.strip()  # strip() removes leading and trailing spaces, rstrip() only trailing ones
            # store the setting: key is the left side of the =, value the right side
            module_defs[-1][key.rstrip()] = value.strip()

    return module_defs  # list of dicts, one dict per module

'''Data config parser: path is the path to the data configuration file.'''
def parse_data_config(path):
    """
    The data configuration contains:
    classes= 80
    train=data/coco/trainvalno5k.txt
    valid=data/coco/5k.txt
    names=data/coco.names
    backup=backup/
    eval=coco
    """

    # create a dict
    options = dict()

    # default entries
    options['gpus'] = '0,1,2,3'
    options['num_workers'] = '10'

    # read every line of the data configuration and store it as a key-value pair
    with open(path, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.strip()
        if line == '' or line.startswith('#'):
            continue
        key, value = line.split('=')
        options[key.strip()] = value.strip()
        
    return options  # dict: keys are names (train, valid, names, ...), values are paths or other information
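
To see what the model parser produces, the sketch below applies the same block-parsing logic to an in-memory string instead of a file. `parse_blocks` and the sample text are illustrative stand-ins, not part of the project; note that every value stays a string, so callers have to convert with int() or float() themselves.

```python
def parse_blocks(text):
    """Parse Darknet-style cfg text into a list of dicts, one per [block]."""
    blocks = []
    for raw in text.split("\n"):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                  # skip blanks and comments
        if line.startswith("["):
            blocks.append({"type": line[1:-1].strip()})
            if blocks[-1]["type"] == "convolutional":
                blocks[-1]["batch_normalize"] = 0     # default, overwritten if the cfg sets it
        else:
            key, value = line.split("=", 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


sample_cfg = """
[convolutional]
batch_normalize=1
filters=32
size=3
# Downsample
[shortcut]
from=-3
"""
blocks = parse_blocks(sample_cfg)
```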

utils.py

from __future__ import division
import tqdm
import torch
import numpy as np
def to_cpu(tensor):
    return tensor.detach().cpu()
'''Load the class names of the dataset: returns a list of class names.'''
def load_classes(path):  # path of the class-name file, e.g. coco.names or classes.names
    with open(path, "r") as fp:
        names = fp.read().split("\n")[:-1]  # one class name per line; the empty string after the final newline is dropped
    return names  # list of class names
'''Weight initialization.'''
def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:  # initialization for convolution layers
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:  # initialization for batch-norm layers
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)
'''Rescale predicted boxes. Arguments: the boxes, the current image size (a scalar) and the original image shape.
The network predicts on the padded and resized image, so the predicted boxes must be mapped back onto the original image.'''
def rescale_boxes(boxes, current_dim, original_shape):
    # height and width of the original image
    orig_h, orig_w = original_shape

    # padding that was added to the original image, derived from the difference between its height and width
    # pad_x: pixels of padding along the width; pad_y: pixels of padding along the height
    pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))  # non-zero when height > width; current size / longest original side = scale ratio
    pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))

    # image size after removing the padding
    unpad_h = current_dim - pad_y
    unpad_w = current_dim - pad_x
    # rescale the predicted boxes so they fit the original image
    boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w  # top-left x
    boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h  # top-left y
    boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
    boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
    return boxes  # the rescaled predicted boxes
'''Convert boxes from (center x, center y, width, height) to top-left / bottom-right corners.'''
def xywh2xyxy(x):
    y = x.new(x.shape)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y
"""Metric computation. Arguments: tp (true-positive flags, 0 or 1), conf (predicted confidences), pred_cls (predicted classes), target_cls (ground-truth classes).
Returns: p, r, ap, f1, unique_classes.astype("int32")"""
def ap_per_class(tp, conf, pred_cls, target_cls):  # arguments: true_positives, pred_scores, pred_labels, ground-truth labels

    # sort tp, conf and pred_cls by descending confidence
    i = np.argsort(-conf)
    #print('number of predicted boxes', len(i))
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]  # tp (0/1), conf and pred_cls sorted by confidence
    #print('tp[i]',tp[i])

    # classes that appear in the ground truth (without duplicates)
    unique_classes = np.unique(target_cls)
    #print('unique_classes',unique_classes)

    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):  # compute the AP of every class

        # i: boolean mask with the same shape as pred_cls, True where the predicted class equals c
        i = pred_cls == c

        # number of ground-truth boxes of class c
        n_gt = (target_cls == c).sum()

        # number of predicted boxes of class c
        n_p = i.sum()

        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # cumulative FP and TP counts
            fpc = (1 - tp[i]).cumsum()  # i selects the boxes predicted as class c, tp says whether each one is a true positive
            tpc = (tp[i]).cumsum()
            # print('tp[i]',tp[i],len(tp[i]))  # tp flags of the class-c predictions (1 means IoU with a ground-truth box above the threshold)
            # print('fpc',fpc,len(fpc))  # fpc: running count of false positives among the class-c predictions
            # print('tpc', tpc,len(tpc))  # tpc: running count of true positives among the class-c predictions

            # recall
            recall_curve = tpc / (n_gt + 1e-16)
            #print('recall_curve',recall_curve)
            r.append(recall_curve[-1])
            #print('r',r)

            # precision
            precision_curve = tpc / (tpc + fpc)
            #print('precision_curve',precision_curve)
            p.append(precision_curve[-1])
            #print('p',p)

            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))

    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)
    return p, r, ap, f1, unique_classes.astype("int32")
"""Compute the AP."""
def compute_ap(recall, precision):  # arguments: the recall curve and the precision curve
    # correct AP calculation
    # add sentinel values at both ends of the precision-recall curve
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # compute the precision envelope
    # a backward running maximum: at recall = x, precision becomes the maximum precision over recall in [x, 1]
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    # i.e. all positions where recall[i] != recall[i+1]
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    # the area under the PR curve, i.e. the AP
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
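
A small worked example ties `ap_per_class` and `compute_ap` together. The sketch below repeats the envelope-and-area logic on a made-up case: one class, 2 ground-truth boxes, and three detections sorted by confidence that are TP, FP, TP. The function name `average_precision` and the data are assumptions for illustration only.

```python
import numpy as np


def average_precision(recall, precision):
    """Area under the precision envelope, same steps as compute_ap above."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # backward running maximum gives the upper envelope of the curve
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum rectangle areas where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


tp = np.array([1, 0, 1])            # detections sorted by confidence
n_gt = 2                            # ground-truth boxes of this class
tpc = tp.cumsum()                   # [1, 1, 2]
fpc = (1 - tp).cumsum()             # [0, 1, 1]
recall = tpc / n_gt                 # [0.5, 0.5, 1.0]
precision = tpc / (tpc + fpc)       # [1.0, 0.5, 2/3]
ap = average_precision(recall, precision)
```

The envelope is 1.0 up to recall 0.5 and 2/3 from 0.5 to 1.0, so the AP is 0.5 * 1.0 + 0.5 * 2/3, roughly 0.83.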
'''Batch statistics. Arguments: the model outputs (after NMS), the ground-truth labels (x, y, x, y fitted to the original image) and the IoU threshold.
Returns: true_positives (0/1; 1 if the predicted box overlaps a ground-truth box enough, else 0), predicted confidences, predicted classes.'''
def get_batch_statistics(outputs, targets, iou_threshold):
    # outputs are the NMS results (x, y, x, y, object_conf, class_conf, class_pred), 7 values per box
    batch_metrics = []
    for sample_i in range(len(outputs)):  # one output per image in the batch
        if outputs[sample_i] is None:
            continue
        '''Predictions for this image:'''
        output = outputs[sample_i]  # output of image sample_i; it contains many boxes
        pred_boxes = output[:, :4]  # predicted box coordinates
        pred_scores = output[:, 4]  # predicted box confidences
        pred_labels = output[:, -1]  # predicted box classes

        true_positives = np.zeros(pred_boxes.shape[0])  # one flag per predicted box

        '''Annotations (ground truth) for this image:'''
        # rows are (class, x, y, x, y)
        annotations = targets[targets[:, 0] == sample_i][:, 1:]  # select the targets of this image by ID; the ID is assigned in collate_fn of ListDataset in datasets.py
        # class labels
        target_labels = annotations[:, 0] if len(annotations) else []

        if len(annotations):
            detected_boxes = []  # ground-truth boxes that are already matched
            target_boxes = annotations[:, 1:]  # ground-truth box coordinates
            for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):  # iterate over predicted boxes and classes
                if len(detected_boxes) == len(annotations):
                    break
                # Ignore if label is not one of the target labels
                if pred_label not in target_labels:
                    continue

                # IoU between the predicted box and the ground-truth boxes
                iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)
                # if the IoU reaches the threshold and that ground-truth box is still unmatched,
                # the prediction counts as correct and its true_positives flag is set to 1
                if iou >= iou_threshold and box_index not in detected_boxes:
                    true_positives[pred_i] = 1
                    detected_boxes += [box_index]
        batch_metrics.append([true_positives, pred_scores, pred_labels])

    return batch_metrics  # true_positives, predicted confidences, predicted classes
"""IoU from width and height only (used by build_targets to match ground-truth boxes to anchors)"""
def bbox_wh_iou(wh1, wh2):
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area
"""Compute the IoU of two sets of boxes."""
def bbox_iou(box1, box2, x1y1x2y2=True):

    # get the top-left and bottom-right corners of the boxes
    if not x1y1x2y2:
        # boxes given as (center_x, center_y, width, height) are converted to (x, y, x, y)
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]  # corners of box1
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]  # corners of box2

    # corners of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # area of the intersection rectangle
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
        inter_rect_y2 - inter_rect_y1 + 1, min=0
    )
    # areas of the two boxes; the union is their sum minus the intersection
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
    return iou  # the IoU
'''Non-maximum suppression: returns boxes as [x1, y1, x2, y2, conf, class_conf, class_pred]. Arguments: the model predictions, the confidence threshold and the NMS threshold.'''
def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs Non-Maximum Suppression to further filter detections.
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class_score, class_pred)
    """

    """(1) Convert the predicted coordinates: (center x, center y, width, height) to (x1, y1, x2, y2)"""
    # The three yolo layers output grids of size 13, 26 and 52, so for one image
    # the model output has shape (10647, 85): (13*13+26*26+52*52)*3 = 10647, and 85 = x, y, w, h, one confidence and 80 classes.
    # prediction has shape [1, 10647, 85]; the first 4 of the 85 values are (center x, center y, width, height),
    # the 5th is the object confidence and values 6-85 are the confidences of the 80 classes
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])  # (center x, center y, width, height) -> (x1, y1, x2, y2)
    output = [None for _ in range(len(prediction))]

    # iterate over the images; image_pred holds the predictions of one image
    for image_i, image_pred in enumerate(prediction):
        """(2) Filtering: drop boxes whose object confidence is below the threshold"""
        image_pred = image_pred[image_pred[:, 4] >= conf_thres]  # keep boxes whose object confidence reaches the threshold
        # If none are remaining => process next image
        if not image_pred.size(0):  # no boxes left after the confidence filter: move on to the next image
            continue

        """(3) NMS: sort by score, take the best box, compute its IoU with the boxes predicted as the same class, merge them by confidence-weighted averaging into the final box (xyxy), and remove the boxes whose IoU exceeds the NMS threshold."""
        # score = object confidence * highest class score
        score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
        # sort the predicted boxes of the image by score
        image_pred = image_pred[(-score).argsort()]  # shape [boxes left after the confidence filter, 85]
        # highest class confidence and its position (the index is the predicted class)
        class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
        detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)  # (x, y, x, y, object_conf, class_conf, class_pred), 7 values

        keep_boxes = []
        while detections.size(0):
            # IoU between the first box (the current best score) and all remaining boxes;
            # True where the IoU is larger than nms_thres, False otherwise
            large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres

            # NMS is only applied between boxes of the same class: True where the class matches
            label_match = detections[0, -1] == detections[:, -1]

            # Indices of boxes with lower confidence scores, large IOUs and matching labels
            # invalid is True only where the IoU exceeds the threshold and the class is the same
            invalid = large_overlap & label_match
            # weights: the confidences of the invalid boxes, as one tensor
            weights = detections[invalid, 4:5]
            # Merge overlapping bboxes by order of confidence
            # the final box is the confidence-weighted average of all boxes with the same label
            # and IoU above the threshold, not simply the original best box
            detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
            keep_boxes += [detections[0]]
            # remove the invalid boxes, i.e. IoU above the threshold and same predicted class
            detections = detections[~invalid]
        if keep_boxes:
            output[image_i] = torch.stack(keep_boxes)
    return output  # boxes after NMS: (x, y, x, y, object_conf, class_conf, class_pred), 7 values

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):

    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    # Separate target values
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1
    noobj_mask[b, best_n, gj, gi] = 0

    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf

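detect.py

Runs detection with a trained model. The test images are in the data/samples folder, and the results are saved in the output folder, which this script creates automatically.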

from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
import os
import time
import datetime
import argparse
from PIL import Image
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.ticker import NullLocator

if __name__ == "__main__":
    ##########################################################################################################################
    '''(1) Argument parsing'''
    parser = argparse.ArgumentParser()
    # Folder holding the images to run detection on
    parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
    # YOLOv3 model definition (layers, number of filters per layer, kernel size, stride, ...)
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    # Path to the pretrained weights
    parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    # File listing the class names
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    # Objectness confidence threshold
    parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
    # IoU threshold used by NMS
    parser.add_argument("--nms_thres", type=float, default=0.4, help="iou threshold for non-maximum suppression")
    # Batch size
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    # Number of DataLoader workers
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    # Side length of the network input
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    # Checkpoint path (not used in this script)
    parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
    opt = parser.parse_args()
    print(opt)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    os.makedirs("output", exist_ok=True)  # folder where the annotated images are saved
    ##########################################################################################################################
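    '''(2) Model construction'''
    # Build the Darknet (YOLOv3) network from the model configuration file; Darknet is defined in models.py
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
    # Load the trained weights (model parameters) into the network
    if opt.weights_path.endswith(".weights"):
        # Load darknet weights
        model.load_darknet_weights(opt.weights_path)
    else:
        # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
        model.load_state_dict(torch.load(opt.weights_path, map_location=device))
    # Evaluation mode: BatchNorm uses its running statistics and Dropout is disabled.
    # It does not stop gradient tracking by itself; torch.no_grad() below does that.
    model.eval()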
    ##########################################################################################################################
    '''(3) Load the dataset and the class names'''
    # Load the test images.
    # A DataLoader is an iterable, not an iterator: call iter() on it first and next() on the result,
    # or simply loop with `for inputs, labels in dataloader`.
    # Usually a Dataset object is passed to the DataLoader, which then yields one batch per iteration.
    dataloader = DataLoader(
        ImageFolder(opt.image_folder, img_size=opt.img_size),  # defined in datasets.py; yields the image path and the padded, resized image
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )

    # Load the class names; classes is a list
    classes = load_classes(opt.class_path)  # Extracts class labels from file

    # Lists that collect the image paths and the detections for each image
    imgs = []
    img_detections = []
    ##########################################################################################################################
    """(3) Inference: store the image paths in imgs and the predictions in img_detections"""
    
    print("\nPerforming object detection:")
    prev_time = time.time()
    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
    
    # Run detection on the test images, keeping each image path together with its detections
    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):  # the DataLoader yields one batch at a time
        # Cast the batch to the right tensor type (Variable is a no-op wrapper since PyTorch 0.4)
        input_imgs = Variable(input_imgs.type(Tensor))
        # Run the model, then apply non-maximum suppression to its raw output
        with torch.no_grad():
            detections = model(input_imgs)
            # print(detections.shape)  # [batch, 10647, 85]
            # NMS drops low-confidence boxes, converts the rest to corner format and removes overlaps.
            # Result: one entry per image, each None or a tensor with rows
            # (x1, y1, x2, y2, object_conf, class_score, class_pred)
            detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)

        # Log the batch index and the time it took
        current_time = time.time()
        inference_time = datetime.timedelta(seconds=current_time - prev_time)
        prev_time = current_time
        print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))

        # Store the image paths and their post-NMS detections
        imgs.extend(img_paths)
        img_detections.extend(detections)  # one entry per image; each detection row holds 7 values

    ##########################################################################################################################
    """(4) Draw the detections on each image and save the result"""

    # Bounding-box colors
    cmap = plt.get_cmap("tab20b")
    colors = [cmap(i) for i in np.linspace(0, 1, 20)]

    # Iterate over the images
    for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
        print("(%d) Image: '%s'" % (img_i, path))

        # Read the image and show it on a fresh figure
        img = np.array(Image.open(path))
        fig, ax = plt.subplots(1)  # plt.subplots already creates the figure; an extra plt.figure() would never be closed
        ax.imshow(img)

        # Draw the predicted boxes and labels for this image
        if detections is not None:
            # Predictions refer to the padded, resized input; map them back to the original image size
            detections = rescale_boxes(detections, opt.img_size, img.shape[:2])

            # Collect the predicted class ids and give each class its own color
            unique_labels = detections[:, -1].cpu().unique()  # distinct class ids
            n_cls_preds = len(unique_labels)
            bbox_colors = random.sample(colors, n_cls_preds)

            # Iterate over the boxes predicted for this image
            for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:  # top-left and bottom-right corners
                print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))
                # Box width and height
                box_w = x2 - x1
                box_h = y2 - y1
                # Color assigned to this class
                color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0][0])]
                # Create the rectangle
                bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
                # Add it to the axes
                ax.add_patch(bbox)
                # Label the box with its class name
                plt.text(x1, y1, s=classes[int(cls_pred)], color="white", verticalalignment="top", bbox={"color": color, "pad": 0})

        # Save the annotated image
        plt.axis("off")
        plt.gca().xaxis.set_major_locator(NullLocator())
        plt.gca().yaxis.set_major_locator(NullLocator())
        filename = os.path.splitext(os.path.basename(path))[0]  # file name without folder or extension
        plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0)
        plt.close()
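
The detection script relies on `non_max_suppression` to reduce many overlapping candidates to one box per object. The following is a simplified sketch of greedy, per-class NMS in plain Python. The function names and the tuple layout are assumptions made for illustration, and the repository's own implementation may differ in detail.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-16)

def greedy_nms(dets, conf_thres=0.8, nms_thres=0.4):
    """dets: list of (x1, y1, x2, y2, score, cls). Returns the kept detections."""
    # 1) discard low-confidence boxes, 2) sort the rest by score, best first
    dets = sorted((d for d in dets if d[4] >= conf_thres), key=lambda d: d[4], reverse=True)
    keep = []
    while dets:
        best = dets.pop(0)
        keep.append(best)
        # 3) drop same-class boxes that overlap the winner by more than nms_thres
        dets = [d for d in dets if d[5] != best[5] or iou_xyxy(best, d) <= nms_thres]
    return keep

dets = [
    (10, 10, 110, 110, 0.95, 0),
    (12, 14, 108, 112, 0.90, 0),   # near-duplicate of the first box
    (200, 50, 260, 120, 0.85, 1),  # a different class elsewhere in the image
    (15, 15, 100, 100, 0.30, 0),   # below the confidence threshold
]
kept = greedy_nms(dets)
print(kept)
```

The default thresholds mirror the `--conf_thres` and `--nms_thres` defaults of the script.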

models.py

Defines the model structure: the network is built from the information in the model configuration file.
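
To make "built from the configuration file" concrete, here is a rough sketch of how a darknet-style cfg can be turned into the list of dicts that `create_modules` consumes. `parse_cfg_text` is a hypothetical stand-in, not the repository's `parse_model_config`, and the cfg text is a shortened example.

```python
def parse_cfg_text(text):
    """Parse darknet-style cfg text into a list of dicts, one per [section]."""
    module_defs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                                  # skip blanks and comments
        if line.startswith("["):
            module_defs.append({"type": line[1:-1].strip()})
        else:
            key, value = line.split("=", 1)
            module_defs[-1][key.strip()] = value.strip()  # values stay strings
    return module_defs

cfg = """
[net]
channels=3
height=416

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
activation=leaky

[yolo]
mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
"""

defs = parse_cfg_text(cfg)
hyperparams = defs.pop(0)          # the [net] section holds the hyperparameters

# Select the anchors named by the mask, in the same way the yolo branch does.
yolo = defs[-1]
flat = [int(v) for v in yolo["anchors"].split(",")]
pairs = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
anchors = [pairs[int(m)] for m in yolo["mask"].split(",")]
print(hyperparams["type"], [d["type"] for d in defs], anchors)
```

Because every value is kept as a string, the consumer has to convert with `int(...)` before using it as a number or a flag.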

from __future__ import division
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from utils.parse_config import *
from utils.utils import build_targets, to_cpu

'''Network builder: constructs the YOLOv3 layers as a module list from the block configuration in module_defs'''
def create_modules(module_defs):

    '''Build the model structure'''
    '''(1) Read the hyperparameters and get the number of input channels'''
    # module_defs is the list produced by parse_model_config; each element is a dict describing one layer/block.
    # Its first element holds the network hyperparameters, e.g. {'type': 'net', ...}
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams["channels"])]

    '''(2) Create the nn.ModuleList() that will hold the layers/blocks'''
    module_list = nn.ModuleList()

    '''(3) Walk the definition list, create the layer/block for each dict and append it to the nn.ModuleList()'''
    # The "type" field takes one of the following values: "convolutional", "maxpool",
    # "upsample", "route", "shortcut", "yolo"
    for module_i, module_def in enumerate(module_defs):
        # Each block is wrapped in its own nn.Sequential()
        modules = nn.Sequential()

        # Convolutional block
        if module_def["type"] == "convolutional":
            # Read the layer parameters
            bn = int(module_def["batch_normalize"])
            filters = int(module_def["filters"])
            kernel_size = int(module_def["size"])
            pad = (kernel_size - 1) // 2
            # Create the convolution and add it to the nn.Sequential()
            modules.add_module(f"conv_{module_i}",  # name of the layer inside the model
                nn.Conv2d(
                    in_channels=output_filters[-1],  # input channels
                    out_channels=filters,  # output channels
                    kernel_size=kernel_size,  # kernel size
                    stride=int(module_def["stride"]),  # stride
                    padding=pad,  # padding
                    bias=not bn,  # BatchNorm has its own shift term, so the conv bias is dropped when bn is on
                ),
            )
            if bn:
                # Add BatchNorm2d. Note that PyTorch's momentum is the weight given to the new batch
                # statistic (default 0.1), so 0.9 makes the running statistics follow recent batches closely.
                modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
            if module_def["activation"] == "leaky":
                # Add the LeakyReLU activation
                modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))

        # Max-pooling block
        elif module_def["type"] == "maxpool":
            # Read the layer parameters
            kernel_size = int(module_def["size"])
            stride = int(module_def["stride"])
            # With a 2x2 window and stride 1, pad right and bottom so the output keeps the input size
            if kernel_size == 2 and stride == 1:
                modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
            # Create the max-pooling layer
            modules.add_module(f"maxpool_{module_i}",
                               nn.MaxPool2d(
                                   kernel_size=kernel_size,  # pooling window size
                                   stride=stride,  # stride
                                   padding=int((kernel_size - 1) // 2))  # padding
                               )

        # Upsampling block
        # Upsample is a custom layer (defined below); an instance of it is added to the block
        elif module_def["type"] == "upsample":
            # Example configuration:
            # [upsample]
            # stride = 2

            # Build the upsample layer; the class overrides forward()
            upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
            # Add it to the block
            modules.add_module(f"upsample_{module_i}", upsample)


        elif module_def["type"] == "route":
            # Example route configuration:
            # [route]
            # layers = -1, 36

            # A route concatenates the listed layers along the channel axis, so its channel count is their sum
            layers = [int(x) for x in module_def["layers"].split(",")]
            filters = sum([output_filters[1:][i] for i in layers])
            modules.add_module(f"route_{module_i}", EmptyLayer())  # EmptyLayer() is a placeholder for route and shortcut layers

        elif module_def["type"] == "shortcut":
            filters = output_filters[1:][int(module_def["from"])]
            modules.add_module(f"shortcut_{module_i}", EmptyLayer())  # EmptyLayer() is a placeholder for route and shortcut layers

        elif module_def["type"] == "yolo":
            # Example yolo configuration:
            # [yolo]
            # mask = 3,4,5
            # anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
            # classes=80
            # num=9
            # jitter=.3
            # ignore_thresh = .7
            # truth_thresh = 1
            # random=1

            # Anchor indices used by this layer; 3,4,5 in the example above
            anchor_idxs = [int(x) for x in module_def["mask"].split(",")]

            # Parse the anchor sizes into (width, height) pairs and keep the ones selected by the mask
            anchors = [int(x) for x in module_def["anchors"].split(",")]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in anchor_idxs]
            num_classes = int(module_def["classes"])
            # print('anchors1:', anchors)  # for the example above: [(30, 61), (62, 45), (59, 119)]

            # Input image size
            img_size = int(hyperparams["height"])

            # Create the detection layer from the three anchor sizes, the number of classes and the image size
            yolo_layer = YOLOLayer(anchors, num_classes, img_size)

            # Add the YOLO layer to the block
            modules.add_module(f"yolo_{module_i}", yolo_layer)

        module_list.append(modules)  # add the finished block to the nn.ModuleList()
        output_filters.append(filters)  # record this block's output channels in output_filters; the next block reads them as its input channels

    return hyperparams, module_list  # the hyperparameters and the list of blocks

'''Upsampling layer'''
class Upsample(nn.Module):
    """ Reimplementation of nn.Upsample """
    def __init__(self, scale_factor, mode="nearest"):
        super(Upsample, self).__init__()
        self.scale_factor = scale_factor  # upsampling factor
        self.mode = mode
    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)  # upsample by interpolation
        return x

'''EmptyLayer definition'''
class EmptyLayer(nn.Module):
    """Placeholder for 'route' and 'shortcut' layers"""
    def __init__(self):
        super(EmptyLayer, self).__init__()

'''YOLO layer definition: the detection layer'''
class YOLOLayer(nn.Module):
    """Detection layer"""
    def __init__(self, anchors, num_classes, img_dim=416):  # the three anchor sizes, the number of classes, the image size
        super(YOLOLayer, self).__init__()
        # Basic settings
        self.anchors = anchors  # anchor sizes, e.g. [(30, 61), (62, 45), (59, 119)] for one yolo layer
        self.num_anchors = len(anchors)  # number of anchors
        self.num_classes = num_classes  # number of classes

        self.ignore_thres = 0.5
        self.mse_loss = nn.MSELoss()
        self.bce_loss = nn.BCELoss()
        self.obj_scale = 1
        self.noobj_scale = 100
        self.metrics = {}
        self.img_dim = img_dim
        self.grid_size = 0  # grid size
    # Compute the per-cell offsets
    def compute_grid_offsets(self, grid_size, cuda=True):

        # Grid size (cells per side)
        self.grid_size = grid_size
        g = self.grid_size
        # print('g', g)  # g is 13, 26 or 52, the feature-map size of the respective yolo layer
        FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor

        # Size of one grid cell in pixels
        self.stride = self.img_dim / self.grid_size

        # Calculate offsets for each grid. Taking g = 13:
        # torch.arange(g) is tensor([0,1,2,3,4,5,6,7,8,9,10,11,12])
        # torch.arange(g).repeat(g, 1) stacks that row 13 times, giving a 13x13 tensor
        # torch.arange(g).repeat(g, 1).view([1, 1, g, g]) reshapes it to [1, 1, 13, 13]
        self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
        self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)


        # Express the anchor widths and heights in units of grid cells
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])  # e.g. from [(30, 61), (62, 45), (59, 119)]
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))  # anchor widths
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))  # anchor heights

    def forward(self, x, targets=None, img_dim=None):
        # Forward pass of the yolo layer; x is the output of the preceding layer
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        # Image size
        self.img_dim = img_dim

        # Shape of x
        num_samples = x.size(0)
        grid_size = x.size(2)

        prediction = (
            x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)  # (num_samples, 3, 85, grid_size, grid_size)
            .permute(0, 1, 3, 4, 2)  # reorder the dimensions to (num_samples, 3, grid_size, grid_size, 85)
            .contiguous()  # permute returns a non-contiguous view; contiguous() copies it into a tensor with a standard memory layout
        )
        # Get outputs
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.

        # If grid size does not match current we compute new offsets
        if grid_size != self.grid_size:
            self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h

        output = torch.cat(
            (
                pred_boxes.view(num_samples, -1, 4) * self.stride,
                pred_conf.view(num_samples, -1, 1),
                pred_cls.view(num_samples, -1, self.num_classes),
            ),
            -1,
        )

        if targets is None:
            return output, 0
        else:
            iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

            # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])

            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj

            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

            # Metrics
            cls_acc = 100 * class_mask[obj_mask].mean()
            conf_obj = pred_conf[obj_mask].mean()
            conf_noobj = pred_conf[noobj_mask].mean()
            conf50 = (pred_conf > 0.5).float()
            iou50 = (iou_scores > 0.5).float()
            iou75 = (iou_scores > 0.75).float()
            detected_mask = conf50 * class_mask * tconf
            precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
            recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
            recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

            self.metrics = {
                "loss": to_cpu(total_loss).item(),
                "x": to_cpu(loss_x).item(),
                "y": to_cpu(loss_y).item(),
                "w": to_cpu(loss_w).item(),
                "h": to_cpu(loss_h).item(),
                "conf": to_cpu(loss_conf).item(),
                "cls": to_cpu(loss_cls).item(),
                "cls_acc": to_cpu(cls_acc).item(),
                "recall50": to_cpu(recall50).item(),
                "recall75": to_cpu(recall75).item(),
                "precision": to_cpu(precision).item(),
                "conf_obj": to_cpu(conf_obj).item(),
                "conf_noobj": to_cpu(conf_noobj).item(),
                "grid_size": grid_size,
            }

            return output, total_loss

"""Darknet class: the YOLOv3 model"""
class Darknet(nn.Module):
    """YOLOv3 object detection model"""
    def __init__(self, config_path, img_size=416):
        super(Darknet, self).__init__()

        # parse_model_config() parses the YOLOv3 layer configuration file (yolov3.cfg) and returns the module definitions:
        # a list in which each element is a dict describing one block/layer of the network
        self.module_defs = parse_model_config(config_path)

        # Build the YOLOv3 model from the module definitions
        self.hyperparams,self.module_list = create_modules(self.module_defs)  # hyperparameters and model structure
        self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
        self.img_size = img_size
        self.seen = 0
        self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)

    def forward(self, x, targets=None):
        img_dim = x.shape[2]
        loss = 0
        layer_outputs, yolo_outputs = [], []
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
                x = module(x)
            elif module_def["type"] == "route":
                x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
            elif module_def["type"] == "shortcut":
                layer_i = int(module_def["from"])
                x = layer_outputs[-1] + layer_outputs[layer_i]
            elif module_def["type"] == "yolo":
                x, layer_loss = module[0](x, targets, img_dim)
                loss += layer_loss
                yolo_outputs.append(x)
            layer_outputs.append(x)
        yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
        return yolo_outputs if targets is None else (loss, yolo_outputs)

    def load_darknet_weights(self, weights_path):
        """Parses and loads the weights stored in 'weights_path'"""

        # Open the weights file
        with open(weights_path, "rb") as f:
            header = np.fromfile(f, dtype=np.int32, count=5)  # First five are header values
            self.header_info = header  # Needed to write header when saving weights
            self.seen = header[3]  # number of images seen during training
            weights = np.fromfile(f, dtype=np.float32)  # The rest are weights

        # Establish cutoff for loading backbone weights
        cutoff = None
        if "darknet53.conv.74" in weights_path:
            cutoff = 75

        ptr = 0
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if i == cutoff:
                break
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                if int(module_def["batch_normalize"]):  # config values may be strings, and the string "0" would otherwise be truthy
                    # Load BN bias, weights, running mean and running variance
                    bn_layer = module[1]
                    num_b = bn_layer.bias.numel()  # Number of biases
                    # Bias
                    bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
                    bn_layer.bias.data.copy_(bn_b)
                    ptr += num_b
                    # Weight
                    bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
                    bn_layer.weight.data.copy_(bn_w)
                    ptr += num_b
                    # Running Mean
                    bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
                    bn_layer.running_mean.data.copy_(bn_rm)
                    ptr += num_b
                    # Running Var
                    bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
                    bn_layer.running_var.data.copy_(bn_rv)
                    ptr += num_b
                else:
                    # Load conv. bias
                    num_b = conv_layer.bias.numel()
                    conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
                    conv_layer.bias.data.copy_(conv_b)
                    ptr += num_b
                # Load conv. weights
                num_w = conv_layer.weight.numel()
                conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
                conv_layer.weight.data.copy_(conv_w)
                ptr += num_w

    def save_darknet_weights(self, path, cutoff=-1):
        """
            @:param path    - path of the new weights file
            @:param cutoff  - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
        """
        self.header_info[3] = self.seen
        # The with-block closes the file even if writing fails part-way
        with open(path, "wb") as fp:
            self.header_info.tofile(fp)

            # Iterate through layers
            for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
                if module_def["type"] == "convolutional":
                    conv_layer = module[0]
                    # If batch norm, save bn first
                    if int(module_def["batch_normalize"]):
                        bn_layer = module[1]
                        bn_layer.bias.data.cpu().numpy().tofile(fp)
                        bn_layer.weight.data.cpu().numpy().tofile(fp)
                        bn_layer.running_mean.data.cpu().numpy().tofile(fp)
                        bn_layer.running_var.data.cpu().numpy().tofile(fp)
                    # Save conv bias
                    else:
                        conv_layer.bias.data.cpu().numpy().tofile(fp)
                    # Save conv weights
                    conv_layer.weight.data.cpu().numpy().tofile(fp)
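
The load and save routines above imply a simple binary layout: five int32 header values, then float32 blocks per convolutional layer in the order BN bias, BN weight, running mean, running variance, convolution weights. The sketch below writes and re-reads one toy layer with the standard `struct` module. The little-endian byte order and all the numbers are assumptions chosen for illustration.

```python
import io
import struct

def write_layer(buf, bn_bias, bn_weight, bn_mean, bn_var, conv_w):
    """Append one conv+BN layer in the order the loader expects."""
    for block in (bn_bias, bn_weight, bn_mean, bn_var, conv_w):
        buf.write(struct.pack(f"<{len(block)}f", *block))

def read_floats(buf, n):
    """Read n float32 values and advance the read position."""
    return list(struct.unpack(f"<{n}f", buf.read(4 * n)))

filters, in_ch, k = 2, 1, 1                        # a toy 1x1 convolution with 2 filters
buf = io.BytesIO()
buf.write(struct.pack("<5i", 0, 2, 0, 1234, 0))    # header; index 3 holds `seen`
write_layer(buf, [0.1, 0.2], [1.0, 1.5], [0.0, 0.5], [1.0, 2.0], [0.25, -0.75])

buf.seek(0)
header = struct.unpack("<5i", buf.read(20))
seen = header[3]
bn_bias = read_floats(buf, filters)
bn_weight = read_floats(buf, filters)
bn_mean = read_floats(buf, filters)
bn_var = read_floats(buf, filters)
conv_w = read_floats(buf, filters * in_ch * k * k)
print(seen, bn_var, conv_w)
```

Since the file carries no per-layer markers, reading relies entirely on the network definition to know how many values each block contains.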

test.py

Evaluates the performance of the model.
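
Evaluation starts by deciding which detections count as true positives. The following is a simplified sketch of that matching step in plain Python. `true_positive_flags` is a hypothetical helper, and the repository's `get_batch_statistics` may match boxes somewhat differently.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-16)

def true_positive_flags(detections, targets, iou_threshold=0.5):
    """detections and targets are lists of (box, cls).

    A detection is a true positive (1) if it overlaps a not-yet-matched
    target of the same class with IoU >= iou_threshold, otherwise 0.
    """
    matched, flags = set(), []
    for box, cls in detections:
        best_iou, best_j = 0.0, None
        for j, (t_box, t_cls) in enumerate(targets):
            if t_cls != cls or j in matched:
                continue                    # wrong class or target already claimed
            iou = iou_xyxy(box, t_box)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            flags.append(1)
        else:
            flags.append(0)
    return flags

targets = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 1)]
detections = [((1, 1, 10, 10), 0), ((0, 0, 9, 9), 0), ((21, 21, 30, 30), 0)]
flags = true_positive_flags(detections, targets)
print(flags)
```

Each target can be claimed only once, so a second good box on the same object counts as a false positive.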

from __future__ import division
from models import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
import argparse
import tqdm
import torch
from torch.utils.data import DataLoader
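from torch.autograd import Variable

"""Model evaluation. Arguments: the model, the path to the validation set, the IoU threshold, the confidence threshold, the NMS threshold, the network input size and the batch size"""
def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size):
    # Switch to evaluation mode so BatchNorm uses its running statistics and Dropout is off;
    # in training mode a forward pass alone would still update the BatchNorm running statistics
    model.eval()

    '''(1) Load the evaluation set and batch it'''
    # dataset yields (image path, image, labels) for the validation set
    # dataloader groups them into batches of paths, images and labels
    dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False)
    dataloader = torch.utils.data.DataLoader(dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             num_workers=1,
                                             collate_fn=dataset.collate_fn)  # collate_fn assembles each batch in a custom way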
    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor


    labels = []
    sample_metrics = []  # List of tuples (TP, confs, pred)
    for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):  # tqdm shows a progress bar
        '''(2) Prepare the batch labels'''
        labels += targets[:, 1].tolist()  # collect the class ids of the targets
        # Rescale target
        targets[:, 2:] = xywh2xyxy(targets[:, 2:])  # convert to corner format (x1, y1, x2, y2), still normalized
        targets[:, 2:] *= img_size  # scale to pixel coordinates at the network input size

        '''(3) Run the model on the batch and apply NMS'''
        # Feed the images to the model and apply non-maximum suppression to its output
        imgs = Variable(imgs.type(Tensor), requires_grad=False)
        with torch.no_grad():
            outputs = model(imgs)
            outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)

        '''(4) Collect statistics: for each box kept by NMS, its true-positive flag (0 or 1), confidence and predicted class'''
        sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres)  # arguments: model output, ground-truth boxes in pixel (x1, y1, x2, y2) form, IoU threshold

    # Note: the code on GitHub has a bug here; this guard for an empty result has to be added for training to run
    if len(sample_metrics) == 0:
        return np.array([]), np.array([]), np.array([]), np.array([]), np.array([])

    # Unpack sample_metrics into separate arrays of true-positive flags (0 or 1), confidences and predicted classes
    true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]

    # Compute precision, recall, AP, f1 and ap_class with ap_per_class from utils.py
    precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)  # pred_labels and labels generally differ in length
    return precision, recall, AP, f1, ap_class
    return precision, recall, AP, f1, ap_class

if __name__ == "__main__":
    '''(1) Argument parsing'''
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
    parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")#"weights/yolov3.weights"
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected")
    parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou threshold for non-maximum suppression")
    parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    opt = parser.parse_args()
    #print(opt)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    """(2) Parse the data configuration"""
    # parse_data_config in parse_config.py returns a dict such as {class: 80, train: path, valid: path, ...}
    data_config = parse_data_config(opt.data_config)
    valid_path = data_config["valid"]  # validation set list, e.g. valid=data/custom/valid.txt
    class_names = load_classes(data_config["names"])  # class names read from the names file

    """(3) Build the model and load its parameters"""
    model = Darknet(opt.model_def).to(device)
    if opt.weights_path.endswith(".weights"):
        # Load darknet weights
        model.load_darknet_weights(opt.weights_path)
    else:
        # load_state_dict is the standard nn.Module method; map_location lets a GPU checkpoint load on a CPU-only machine
        model.load_state_dict(torch.load(opt.weights_path, map_location=device))

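    print("Compute mAP...")

    """(4) Evaluate the model"""
    precision, recall, AP, f1, ap_class = evaluate(
        model,  # the model
        path=valid_path,  # validation set path
        iou_thres=opt.iou_thres,
        conf_thres=opt.conf_thres,  # confidence threshold
        nms_thres=opt.nms_thres,  # NMS threshold
        img_size=opt.img_size,  # network input size
        batch_size=opt.batch_size,  # honor --batch_size instead of a hard-coded 8
    )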
    print(precision, recall, AP, f1, ap_class)
    print("Average Precisions:")
    for i, c in enumerate(ap_class):
        print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}")

    print(f"mAP: {AP.mean()}")
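
The AP reported above is the area under the precision-recall curve after precision has been replaced by its upper envelope. A minimal sketch in plain Python, under the assumption of the all-point interpolation commonly used for VOC-style AP (`pr_curve` and `average_precision` are illustrative names, not the repository's functions):

```python
def pr_curve(true_positives, scores, n_gt):
    """Cumulative precision/recall over detections sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp_cum, recall, precision = 0, [], []
    for rank, i in enumerate(order, start=1):
        tp_cum += true_positives[i]
        recall.append(tp_cum / n_gt)        # fraction of ground-truth objects found so far
        precision.append(tp_cum / rank)     # fraction of detections so far that are correct
    return recall, precision

def average_precision(recall, precision):
    """Area under the PR curve after taking the upper envelope of precision."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # make precision non-increasing as recall grows (sweep from the right)
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum the rectangles between consecutive recall values
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))

tp = [1, 0, 1, 1, 0]                 # true-positive flag of each detection
scores = [0.9, 0.8, 0.7, 0.6, 0.5]   # detection confidences
recall, precision = pr_curve(tp, scores, n_gt=4)
ap = average_precision(recall, precision)
print(recall, precision, ap)
```

Averaging this value over all classes gives the mAP printed by the script.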

train.py
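The model training script. Training creates:
(1) a checkpoints folder, which stores the model parameters saved after an epoch
(2) a logs folder, which stores the log files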
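
One detail of the training loop in this script is gradient accumulation: gradients from several batches are summed before a single optimizer step, which imitates a larger batch size. The toy below shows the stepping rule on one scalar weight with plain SGD. The function `run` and its numbers are hypothetical and only illustrate the control flow.

```python
def run(grads, accumulations, lr=0.1):
    """Apply SGD to one scalar weight, stepping every `accumulations` batches."""
    weight, grad_sum, step_at = 0.0, 0.0, []
    for batches_done, g in enumerate(grads):
        grad_sum += g                                  # what loss.backward() does to .grad
        if (batches_done + 1) % accumulations == 0:
            weight -= lr * grad_sum                    # optimizer.step()
            grad_sum = 0.0                             # optimizer.zero_grad()
            step_at.append(batches_done)
    return weight, step_at

weight, step_at = run([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], accumulations=4)
print(weight, step_at)
```

Testing `(batches_done + 1) % accumulations == 0` makes the step fire after exactly `accumulations` batches, whereas testing the truthiness of `batches_done % accumulations` would fire on every batch except multiples of `accumulations`.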

from __future__ import division
from models import *
from utils.logger import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
from terminaltables import AsciiTable
import os
from test import evaluate
import time
import datetime
import argparse
import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable

if __name__ == "__main__":
    '''(1) Argument parsing'''
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--batch_size", type=int, default=1, help="size of each image batch")
    # Number of batches whose gradients are accumulated before each optimizer step
    parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/custom.data", help="path to data config file")
    parser.add_argument("--pretrained_weights", type=str, help="if specified starts from checkpoint model")
    parser.add_argument("--n_cpu", type=int, default=1, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights")
    parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set")
    parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch")
    parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training")
    parser.add_argument("--weights_path", type=str, default="checkpoints/yolov3_ckpt_9.pth", help="path to weights file")
    opt = parser.parse_args()
    print(opt)
    '''(2) Create the logger'''
    logger = Logger("logs")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    '''(3) Create the output folders'''
    os.makedirs("output", exist_ok=True)
    os.makedirs("checkpoints", exist_ok=True)
    """(4) Initialize the model: build it and load its parameters"""
    model = Darknet(opt.model_def).to(device)
    model.apply(weights_init_normal)
    # If specified we start from checkpoint
    if opt.pretrained_weights:
        if opt.pretrained_weights.endswith(".pth"):
            # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
            model.load_state_dict(torch.load(opt.pretrained_weights, map_location=device))
        else:
            model.load_darknet_weights(opt.pretrained_weights)
    """(5) Load the dataset"""
    data_config = parse_data_config(opt.data_config)  # data-config parser from parse_config.py; returns a dict
    train_path = data_config["train"]  # training set path
    valid_path = data_config["valid"]  # validation set path
    class_names = load_classes(data_config["names"])  # load_classes from utils.py returns the class names of the dataset

    # dataset yields the image path, the image and its labels (normalized x, y, w, h)
    dataset = ListDataset(train_path, augment=True, multiscale=opt.multiscale_training)
    # dataloader groups the dataset into batches
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.n_cpu,
        pin_memory=True,
        collate_fn=dataset.collate_fn,
    )
    """(7) Optimizer"""
    optimizer = torch.optim.Adam(model.parameters())


    """(8) Training"""
    metrics = [
        "grid_size",
        "loss",
        "x",
        "y",
        "w",
        "h",
        "conf",
        "cls",
        "cls_acc",
        "recall50",
        "recall75",
        "precision",
        "conf_obj",
        "conf_noobj",
    ]
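    for epoch in range(opt.epochs):  # train for opt.epochs epochs

        model.train()  # put the model in training mode
        start_time = time.time()
        print('start_time', start_time)

        for batch_i, (_, imgs, targets) in enumerate(dataloader):  # iterate over the batches of this epoch

            # total number of batches processed so far
            batches_done = len(dataloader) * epoch + batch_i

            # move images and labels to the device
            imgs = Variable(imgs.to(device))  # images (Variable is a legacy wrapper around the tensor)
            targets = Variable(targets.to(device), requires_grad=False)  # labels; no gradient is tracked

            # forward pass to get the loss and outputs, then backpropagate the loss
            loss, outputs = model(imgs, targets)  # feed images and labels to the model
            loss.backward()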

            # update the weights once every opt.gradient_accumulations batches
            if (batches_done + 1) % opt.gradient_accumulations == 0:
                # gradients are accumulated over several batches before each step
                optimizer.step()
                optimizer.zero_grad()

            # epoch and batch information for this step
            log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch+1, opt.epochs, batch_i+1, len(dataloader))
            # print('log_str', log_str)  # e.g. ---- [Epoch 1/10, Batch 1/10] ----

            # header row of the metrics table
            metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]]
            # print(metric_table)  # [['Metrics', 'YOLO Layer 0', 'YOLO Layer 1', 'YOLO Layer 2']]

            # output format of each metric: six decimals unless overridden
            formats = {m: "%.6f" for m in metrics}
            formats["grid_size"] = "%2d"
            formats["cls_acc"] = "%.2f%%"

            # one table row per metric, one column per YOLO layer
            for metric in metrics:  # metrics is the list of metric names defined above
                # format this metric's value for every YOLO layer (0 if the layer did not record it)
                row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers]
                metric_table += [[metric, *row_metrics]]

            # TensorBoard log, built once per batch
            tensorboard_log = []
            for j, yolo in enumerate(model.yolo_layers):
                for name, value in yolo.metrics.items():
                    if name != "grid_size":
                        tensorboard_log += [(f"{name}_{j+1}", value)]  # log everything except grid_size
            tensorboard_log += [("loss", loss.item())]  # also log the total loss
            # write the list of scalars to the logger
            logger.list_of_scalars_summary(tensorboard_log, batches_done)

            # append the metrics table and the total loss to log_str
            log_str += AsciiTable(metric_table).table
            log_str += f"\nTotal loss {loss.item()}"

            # estimate the time remaining in this epoch
            epoch_batches_left = len(dataloader) - (batch_i + 1)
            time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1))
            log_str += f"\n---- ETA {time_left}"

            print(log_str)
            model.seen += imgs.size(0)
        # (9) Evaluate during training
        if epoch % opt.evaluation_interval == 0:
            print("\n---- Evaluating Model ----")
            # evaluate the current model on the validation set; see test.py for the details
            precision, recall, AP, f1, ap_class = evaluate(
                model,
                path=valid_path,
                iou_thres=0.5,
                conf_thres=0.5,
                nms_thres=0.5,
                img_size=opt.img_size,
                batch_size=8,
            )
            evaluation_metrics = [
                ("val_precision", precision.mean()),
                ("val_recall", recall.mean()),
                ("val_mAP", AP.mean()),
                ("val_f1", f1.mean()),
            ]
            logger.list_of_scalars_summary(evaluation_metrics, epoch)

            # Print class APs and mAP
            ap_table = [["Index", "Class name", "AP"]]
            for i, c in enumerate(ap_class):
                ap_table += [[c, class_names[c], "%.5f" % AP[i]]]
            print(AsciiTable(ap_table).table)
            print(f"---- mAP {AP.mean()}")

        # (10) Save a checkpoint
        if epoch % opt.checkpoint_interval == 0:
            torch.save(model.state_dict(), "checkpoints/yolov3_ckpt_%d.pth" % epoch)
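
The gradient-accumulation condition in the training loop can be checked in isolation. The sketch below is plain Python with no PyTorch; the helper names `step_batches`, `truthy_remainder`, and `every_nth` are made up for illustration. It lists the batch indices at which the optimizer would step under two ways of writing the condition.

```python
def step_batches(num_batches, accumulations, condition):
    """Return the batch indices at which the optimizer would step."""
    return [b for b in range(num_batches) if condition(b, accumulations)]


def truthy_remainder(batches_done, n):
    # True whenever the remainder is non-zero.
    return bool(batches_done % n)


def every_nth(batches_done, n):
    # True once every n batches, after n gradients have been accumulated.
    return (batches_done + 1) % n == 0


print(step_batches(8, 4, truthy_remainder))  # [1, 2, 3, 5, 6, 7]
print(step_batches(8, 4, every_nth))         # [3, 7]
```

With an accumulation count of 2 the two conditions pick the same batches, so the difference only shows up for larger counts.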

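YOLOv4 Explained

Improvements in YOLOv4

Mosaic data augmentation
Four training images are stitched into a single image, which indirectly increases the effective batch size.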
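
A minimal sketch of the idea, assuming four equally sized images and a fixed 2x2 layout. Real mosaic augmentation also picks a random center point and rescales and crops each tile, which is omitted here. The function name `mosaic` and the nested-list image format are illustrative only.

```python
def mosaic(images, boxes_per_image):
    """Tile four equally sized images into a 2x2 canvas and shift their boxes.

    images: four H x W grids (lists of rows).
    boxes_per_image: for each image, a list of (x1, y1, x2, y2) pixel boxes.
    """
    h, w = len(images[0]), len(images[0][0])
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # (dx, dy) of each tile
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    merged_boxes = []
    for img, boxes, (dx, dy) in zip(images, boxes_per_image, offsets):
        # Copy the tile into its quadrant of the canvas.
        for y in range(h):
            canvas[dy + y][dx:dx + w] = img[y]
        # Shift every box by the tile offset.
        for x1, y1, x2, y2 in boxes:
            merged_boxes.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return canvas, merged_boxes


tiles = [[[k] * 2 for _ in range(2)] for k in (1, 2, 3, 4)]
boxes = [[(0, 0, 1, 1)] for _ in range(4)]
canvas, merged = mosaic(tiles, boxes)
for row in canvas:
    print(row)
print(merged)
```

Each output sample now carries objects and background from four source images, which is why the effective batch size grows without loading more samples per step.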

Label smoothing
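
As a small illustration, the usual form of label smoothing mixes the one-hot target with a uniform distribution over the classes. The function name `smooth_labels` and the value `epsilon=0.1` are assumptions for this sketch, not settings taken from these notes.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution over the classes."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]


smoothed = smooth_labels([0, 1, 0, 0], epsilon=0.1)
print(smoothed)
```

The true class keeps most of the probability mass and every other class receives a small share, so the targets still sum to 1 while the model is discouraged from becoming overconfident.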

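IoU upgrades
A plain IoU loss suffers from vanishing gradients: when the predicted box and the ground-truth box do not overlap, the IoU is 0 and provides no gradient.

GIoU adds the area of the smallest enclosing box.

DIoU adds the distance between the box centers.

CIoU adds the aspect ratio.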
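
The four measures can be compared on a pair of boxes with the sketch below. It follows the commonly published definitions (each loss is 1 minus the corresponding value); the function name `iou_family` and the `(x1, y1, x2, y2)` box format are choices made for this example.

```python
import math


def iou_family(a, b):
    """Return (IoU, GIoU, DIoU, CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / union

    # GIoU: penalize the part of the smallest enclosing box not covered by the union.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    giou = iou - (cw * ch - union) / (cw * ch)

    # DIoU: squared center distance over the squared enclosing-box diagonal.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    diou = iou - rho2 / (cw ** 2 + ch ** 2)

    # CIoU: additional aspect-ratio consistency term.
    v = (4.0 / math.pi ** 2) * (math.atan(bw / bh) - math.atan(aw / ah)) ** 2
    alpha = v / (1.0 - iou + v) if v > 0 else 0.0
    ciou = diou - alpha * v
    return iou, giou, diou, ciou


# Two disjoint boxes: IoU is 0, but GIoU and DIoU still reflect the gap.
iou, giou, diou, ciou = iou_family((0, 0, 2, 2), (3, 0, 5, 2))
print(iou, giou, diou, ciou)
```

For the disjoint pair the IoU is exactly 0 no matter how far apart the boxes are, while GIoU and DIoU become more negative as the gap grows, so they still provide a training signal.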
NMS improvements

Soft-NMS
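
Instead of discarding every box that overlaps the current best one, Soft-NMS lowers its score according to the overlap. Below is a minimal sketch of the Gaussian variant; the parameter values `sigma=0.5` and `score_thresh=0.001` and the helper names are assumptions for illustration.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS. Returns (index, decayed score) pairs in selection order."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # everything left scores even lower
        keep.append((best, scores[best]))
        # Decay the scores of the remaining boxes by their overlap with the best box.
        for i in remaining:
            overlap = box_iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
keep = soft_nms(boxes, [0.9, 0.8, 0.7])
print(keep)
```

The second box overlaps the first heavily, so it is kept with a much lower score and drops below the third box in the ranking, rather than being removed outright as in standard NMS.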

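SPPNet
To cope with different input sizes, YOLOv3 has to change the size of the input data during training.
SPP simply uses max pooling so that the resulting features have a consistent size.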
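
The sketch below illustrates the pooling idea in the style of the original SPPNet: a single-channel feature map of any size is max-pooled into 1x1, 2x2, and 4x4 bins, so the output always has 21 values. The function names and the pyramid levels are assumptions for this example. The SPP block used inside YOLOv4 is a variant that applies stride-1 max pooling with several kernel sizes and concatenates the results along the channel axis.

```python
def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool an H x W map into n x n bins for each level and concatenate the results."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for by in range(n):
            # Bin edges: floor for the start, ceiling for the end, so no bin is empty.
            y0, y1 = (by * h) // n, -(-((by + 1) * h) // n)
            for bx in range(n):
                x0, x1 = (bx * w) // n, -(-((bx + 1) * w) // n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled


def ramp(h, w):
    """Toy feature map whose values increase row by row."""
    return [[y * w + x for x in range(w)] for y in range(h)]


small = spp(ramp(4, 4))
large = spp(ramp(7, 5))
print(len(small), len(large))
```

Both inputs produce a vector of the same length even though their spatial sizes differ.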

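CSPNet

Each block splits its feature map into two parts along the channel dimension: one part passes through the block as usual, and the other is concatenated directly onto the block's output.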
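
The split-and-concatenate pattern can be sketched in a few lines. Channels are represented as plain lists and the block's layers are stood in for by an arbitrary function; `csp_block` and `double_all` are hypothetical names, and the transition convolutions of a real CSP block are left out.

```python
def csp_block(channels, transform):
    """Split the channels in half, transform one half, and concatenate with the untouched half."""
    half = len(channels) // 2
    shortcut, main = channels[:half], channels[half:]
    return shortcut + transform(main)


def double_all(chs):
    # Stand-in for the block's convolution layers.
    return [[v * 2 for v in ch] for ch in chs]


out = csp_block([[1], [2], [3], [4]], double_all)
print(out)  # [[1], [2], [6], [8]]
```

Only half of the channels go through the expensive path, which is where the savings in computation come from.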

CBAM
Channel attention: increases the weight of some feature maps (channels).
Spatial attention: increases the weight of some spatial regions.

YOLOv4 uses only the spatial attention mechanism.
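
A simplified sketch of a spatial-attention gate, loosely following CBAM: each position gets a weight between 0 and 1 computed from the channel-wise average and maximum at that position, and every channel is multiplied by it. The learned convolution is replaced here by two scalar weights (`w_avg` and `w_max`, both assumptions), so this illustrates the mechanism rather than the layer used in YOLOv4.

```python
import math


def spatial_attention(feature, w_avg=1.0, w_max=1.0):
    """feature: C x H x W nested lists. Returns (re-weighted feature, H x W mask)."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    mask = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [feature[k][y][x] for k in range(c)]
            # Channel-wise average and max, mixed by two scalars
            # (a stand-in for the learned convolution), then a sigmoid.
            logit = w_avg * sum(values) / c + w_max * max(values)
            mask[y][x] = 1.0 / (1.0 + math.exp(-logit))
    out = [[[feature[k][y][x] * mask[y][x] for x in range(w)]
            for y in range(h)]
           for k in range(c)]
    return out, mask


feature = [[[0.0, 4.0]], [[0.0, 2.0]]]  # C=2, H=1, W=2
out, mask = spatial_attention(feature)
print(mask)
```

The position with strong activations receives a weight close to 1, while the empty position receives 0.5, so regions with stronger responses are emphasized across all channels.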
