PyTorch | YOLOv3: Principles and Code Walkthrough (Part 1)


Background reading on YOLO:

https://blog.csdn.net/leviopku/article/details/82660381

https://www.jianshu.com/p/d13ae1055302

https://blog.csdn.net/qq_34199326/article/details/84874409
https://blog.csdn.net/chandanyan8568/article/details/81089083

Code analyzed:

https://github.com/eriklindernoren/PyTorch-YOLOv3

Note: these are personal notes made while studying PyTorch and YOLOv3; corrections of any errors are welcome.

1. detect.py

The walkthrough of the code flow starts from detect.py.

1.1 Model initialization (detect.py, part 1)

 
```python
from __future__ import division

from models import *
from utils.utils import *
from utils.datasets import *

import os
import sys
import time
import datetime
import argparse

from PIL import Image

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.ticker import NullLocator

"""
(1) import argparse                     -- import the module
(2) parser = argparse.ArgumentParser()  -- create a parser object
(3) parser.add_argument()               -- register the command-line arguments and options you care about
(4) parser.parse_args()                 -- parse them
"""

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.4, help="iou threshold for non-maximum suppression")
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
    opt = parser.parse_args()
    print(opt)

    # Use the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Create the output directory (including parents) if it does not exist
    os.makedirs("output", exist_ok=True)

    # Set up model: instantiate the Darknet (YOLOv3) network
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)
```

1.1.1 Parsing the YOLOv3 model

The statement model = Darknet(opt.model_def, img_size=opt.img_size).to(device) loads the Darknet model, i.e. the YOLOv3 model. The Darknet class is defined in models.py. Its full definition is:

 
```python
class Darknet(nn.Module):
    """YOLOv3 object detection model"""

    def __init__(self, config_path, img_size=416):
        super(Darknet, self).__init__()
        # Parse the .cfg file into a list of block definitions
        self.module_defs = parse_model_config(config_path)
        # Build the network modules from those definitions
        self.hyperparams, self.module_list = create_modules(self.module_defs)
        # hasattr() checks whether an object has a given attribute;
        # only YOLO layers carry a "metrics" attribute
        self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
        self.img_size = img_size
        self.seen = 0
        self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)

    def forward(self, x, targets=None):
        img_dim = x.shape[2]
        loss = 0
        layer_outputs, yolo_outputs = [], []
        print("x.shape: ", x.shape)  # author's debug output
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
                x = module(x)
            elif module_def["type"] == "route":
                # Author's debug output: which layers get routed, and the current shape
                print("i: ", i, " x.shape: ", x.shape)
                for layer_i in module_def["layers"].split(","):
                    print("layer_i:\n", layer_i)
                x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
            elif module_def["type"] == "shortcut":
                layer_i = int(module_def["from"])
                x = layer_outputs[-1] + layer_outputs[layer_i]
            elif module_def["type"] == "yolo":
                x, layer_loss = module[0](x, targets, img_dim)
                loss += layer_loss
                yolo_outputs.append(x)
            layer_outputs.append(x)
        yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
        return yolo_outputs if targets is None else (loss, yolo_outputs)

    def load_darknet_weights(self, weights_path):
        """Parses and loads the weights stored in 'weights_path'"""
        # Open the weights file
        with open(weights_path, "rb") as f:
            header = np.fromfile(f, dtype=np.int32, count=5)  # First five are header values
            self.header_info = header  # Needed to write header when saving weights
            self.seen = header[3]  # number of images seen during training
            weights = np.fromfile(f, dtype=np.float32)  # The rest are weights

        # Establish cutoff for loading backbone weights
        cutoff = None
        if "darknet53.conv.74" in weights_path:
            cutoff = 75

        ptr = 0
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if i == cutoff:
                break
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                if module_def["batch_normalize"]:
                    # Load BN bias, weights, running mean and running variance
                    bn_layer = module[1]
                    num_b = bn_layer.bias.numel()  # Number of biases
                    # Bias
                    bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
                    bn_layer.bias.data.copy_(bn_b)
                    ptr += num_b
                    # Weight
                    bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
                    bn_layer.weight.data.copy_(bn_w)
                    ptr += num_b
                    # Running Mean
                    bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
                    bn_layer.running_mean.data.copy_(bn_rm)
                    ptr += num_b
                    # Running Var
                    bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
                    bn_layer.running_var.data.copy_(bn_rv)
                    ptr += num_b
                else:
                    # Load conv. bias
                    num_b = conv_layer.bias.numel()
                    conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
                    conv_layer.bias.data.copy_(conv_b)
                    ptr += num_b
                # Load conv. weights
                num_w = conv_layer.weight.numel()
                conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
                conv_layer.weight.data.copy_(conv_w)
                ptr += num_w

    def save_darknet_weights(self, path, cutoff=-1):
        """
        @:param path    - path of the new weights file
        @:param cutoff  - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
        """
        fp = open(path, "wb")
        self.header_info[3] = self.seen
        self.header_info.tofile(fp)
        # Iterate through layers
        for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                # If batch norm, save the bn parameters first
                if module_def["batch_normalize"]:
                    bn_layer = module[1]
                    bn_layer.bias.data.cpu().numpy().tofile(fp)
                    bn_layer.weight.data.cpu().numpy().tofile(fp)
                    bn_layer.running_mean.data.cpu().numpy().tofile(fp)
                    bn_layer.running_var.data.cpu().numpy().tofile(fp)
                # Otherwise save the conv bias
                else:
                    conv_layer.bias.data.cpu().numpy().tofile(fp)
                # Save the conv weights
                conv_layer.weight.data.cpu().numpy().tofile(fp)
        fp.close()
```

Starting with __init__(): the overall flow is to parse the .cfg file and then build the corresponding network structure from its contents.

Parsing produces a list that stores the attributes of each network block; iterating over this list yields the network structure. Part of the parsed list is shown below:

(Figure 1: part of the parsed module_defs list)
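For intuition, parse_model_config() simply turns each [block] of the cfg into a dict of its key-value pairs. The following is a hand-written illustration of the first few entries, not actual output; the keys come straight from yolov3.cfg:

```python
module_defs = [
    {"type": "net", "batch": "16", "channels": "3", "height": "416", "width": "416"},  # [net] hyperparameters
    {"type": "convolutional", "batch_normalize": "1", "filters": "32",
     "size": "3", "stride": "1", "pad": "1", "activation": "leaky"},
    {"type": "shortcut", "from": "-3", "activation": "linear"},
    # ... one dict per block, ending with the three yolo blocks
]
```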

The statement self.hyperparams, self.module_list = create_modules(self.module_defs) builds the network from that list. create_modules() is defined as follows:

 
```python
def create_modules(module_defs):
    """
    Constructs module list of layer blocks from module configuration in module_defs
    """
    # pop() removes an element from a list (the last one by default) and returns it;
    # here it removes and returns the first block, the [net] hyperparameters
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams["channels"])]
    module_list = nn.ModuleList()
    for module_i, module_def in enumerate(module_defs):
        modules = nn.Sequential()

        if module_def["type"] == "convolutional":
            bn = int(module_def["batch_normalize"])
            filters = int(module_def["filters"])
            kernel_size = int(module_def["size"])
            pad = (kernel_size - 1) // 2
            modules.add_module(
                f"conv_{module_i}",
                nn.Conv2d(
                    in_channels=output_filters[-1],
                    out_channels=filters,
                    kernel_size=kernel_size,
                    stride=int(module_def["stride"]),
                    padding=pad,
                    bias=not bn,
                ),
            )
            if bn:
                modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
            if module_def["activation"] == "leaky":
                modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))

        elif module_def["type"] == "maxpool":
            kernel_size = int(module_def["size"])
            stride = int(module_def["stride"])
            if kernel_size == 2 and stride == 1:
                # Pad so that the output size stays even
                modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
            maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2))
            modules.add_module(f"maxpool_{module_i}", maxpool)

        elif module_def["type"] == "upsample":
            upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
            modules.add_module(f"upsample_{module_i}", upsample)

        elif module_def["type"] == "route":
            layers = [int(x) for x in module_def["layers"].split(",")]
            filters = sum([output_filters[1:][i] for i in layers])
            modules.add_module(f"route_{module_i}", EmptyLayer())

        elif module_def["type"] == "shortcut":
            filters = output_filters[1:][int(module_def["from"])]
            modules.add_module(f"shortcut_{module_i}", EmptyLayer())

        elif module_def["type"] == "yolo":
            anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
            # Extract anchors
            anchors = [int(x) for x in module_def["anchors"].split(",")]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in anchor_idxs]
            num_classes = int(module_def["classes"])
            img_size = int(hyperparams["height"])
            # Define detection layer
            yolo_layer = YOLOLayer(anchors, num_classes, img_size)
            modules.add_module(f"yolo_{module_i}", yolo_layer)

        # Register module list and number of output filters
        module_list.append(modules)
        output_filters.append(filters)

    return hyperparams, module_list

From this list the function builds the corresponding convolutional, maxpool, upsample, route, shortcut, and yolo layers.

Convolutional layers are built conventionally: set the filter size and count, add a batch-normalization layer (when batch_normalize=1 in the .cfg), add padding, and use the LeakyReLU activation.

As for maxpool layers: YOLOv3 does not actually use max pooling for downsampling. It uses 3*3 convolutions with stride 2 instead (attentive readers will notice that yolov3.cfg contains no maxpool layers), five times in total, for a downsampling factor of 2^5 = 32.

(Figure 2: the stride-2 downsampling convolutions in yolov3.cfg)
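A quick sanity check of the stride-2 downsampling (a standalone sketch, not code from the repo):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 416, 416)
for _ in range(5):
    # Each 3x3, stride-2 convolution halves the spatial resolution
    x = nn.Conv2d(x.shape[1], 32, kernel_size=3, stride=2, padding=1)(x)
    print(x.shape[2:])
# 416 -> 208 -> 104 -> 52 -> 26 -> 13, a total factor of 2^5 = 32
```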

The upsample layer: since nn.Upsample is deprecated, a new class implements the operation.

 
```python
class Upsample(nn.Module):
    """ nn.Upsample is deprecated """

    def __init__(self, scale_factor, mode="nearest"):
        super(Upsample, self).__init__()
        self.scale_factor = scale_factor
        self.mode = mode

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
        return x
```
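A quick check of its behavior (a standalone sketch; assumes torch and torch.nn.functional as F are imported, as in models.py):

```python
import torch

up = Upsample(scale_factor=2)
print(up(torch.randn(1, 256, 13, 13)).shape)  # torch.Size([1, 256, 26, 26])
```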

Next is the route layer, which is very important: it fuses feature maps from earlier layers.

 
```
[route]
layers = -4        # one value: a single incoming path

[route]
layers = -1, 61    # two values: two paths whose feature maps are fused
```

The figure below comes from darknet-master (the Windows C implementation of YOLOv3); it is included only to aid understanding, and that code is not analyzed here.

(Figure 3: route layers in the darknet-master layer printout)

layers=-4 means counting back 4 from the current layer: for route layer 83, for example, 4 layers back is layer 79, so the route simply passes layer 79's feature map through (with a single value there is only one incoming path). The route layer's output can be viewed as the next layer's input, and the 13*13*512 map matches layer 79's feature map exactly. Likewise, layers=-1,61 fuses the feature maps of layer 85 (86-1) and layer 61: 26*26*512 + 26*26*256 = 26*26*768. As for why exactly these layers are fused, I am not sure (they happen to be the output layers of the downsampling blocks); pointers from anyone who knows are welcome.
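The channel arithmetic of that second route can be checked directly (a standalone sketch; a and b stand for the two feature maps being routed together):

```python
import torch

a = torch.randn(1, 512, 26, 26)
b = torch.randn(1, 256, 26, 26)
fused = torch.cat([a, b], 1)  # concatenate along the channel dimension
print(fused.shape)            # torch.Size([1, 768, 26, 26])
```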

The shortcut layer is a skip connection, borrowed from ResNet.

See https://cloud.tencent.com/developer/article/1148375; for more detail: https://blog.csdn.net/u014665013/article/details/81985082

ResNet was motivated by the degradation problem in deep models: the deeper the network, the more easily gradients disperse, the larger the error, and the harder training becomes. In theory a deeper model should have no higher error, because a deep model can always be constructed from a shallow one (copy the shallow model's weights into the corresponding layers and make the remaining layers identity mappings), so the deep model's error should be no larger than the shallow model's. In practice, however, deep plain networks show larger training and test error than shallow ones (as the CIFAR-10 curves in the linked posts illustrate), because deep models are hard to optimize and hard to converge to a good solution. The hypothesis is that, compared with directly optimizing the original plain-network mapping F(x) = y, the residual F(x) = y - x is easier to optimize. Note that the transform F can span multiple layers, i.e. a shortcut need not skip only one layer; in fact, since single-layer shortcuts showed no advantage, ResNet skips 2 or 3 layers.

The full YOLOv3 network has 100+ layers, so skip connections are used to make the network easier to train and faster to converge. Note that YOLOv3's shortcut layer adds feature maps element-wise without changing their size, so the input and output shapes before and after a shortcut layer are identical.

In this codebase, route and shortcut layers use EmptyLayer() as a placeholder; the actual concatenation and addition happen in Darknet.forward().
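For reference, the placeholder module really is empty; this is essentially how models.py defines it:

```python
class EmptyLayer(nn.Module):
    """Placeholder for 'route' and 'shortcut' layers"""

    def __init__(self):
        super(EmptyLayer, self).__init__()
```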

The key part: the yolo layer.

Look carefully at the five downsampling steps above: there are three scales, Scale 1 (downsampled 2^3 = 8×), Scale 2 (2^4 = 16×), and Scale 3 (2^5 = 32×). With the network's default input size of 416*416, the corresponding feature maps are 52*52, 26*26, and 13*13. The figure below (borrowed from the following post) illustrates this:

https://blog.csdn.net/leviopku/article/details/82660381

(Figure 4: YOLOv3 architecture diagram, from the post above)

Three scales are used to strengthen small-object detection, an idea presumably borrowed from SSD. The larger feature maps detect relatively small objects, while the small feature map is responsible for large objects.

With multiple scales in play, the authors used k-means clustering to obtain 9 prior box sizes (at 416*416). In the authors' own words:

We still use k-means clustering to determine our bounding box priors. We just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales. On the COCO dataset the 9 clusters were: (10×13), (16×30), (33×23), (30×61), (62×45), (59×119), (116×90), (156×198), (373×326).

The yolo-layer construction code:

 
```python
elif module_def["type"] == "yolo":
    anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
    # Extract anchors
    print("----------------------------------")
    print("anchor_idxs\n:", anchor_idxs)
    anchors = [int(x) for x in module_def["anchors"].split(",")]
    print("1. anchors \n:", anchors)
    anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
    print("2. anchors \n:", anchors)
    anchors = [anchors[i] for i in anchor_idxs]
    print("3. anchors \n:", anchors)
    num_classes = int(module_def["classes"])
    img_size = int(hyperparams["height"])
    # Define detection layer
    yolo_layer = YOLOLayer(anchors, num_classes, img_size)
    modules.add_module(f"yolo_{module_i}", yolo_layer)
```

The debug output:

(Figure 5: printed anchor_idxs and anchors for the three yolo layers)

The yolo layer is constructed three times; referring back to Figure 4, the first yolo layer comes after 2^5 = 32× downsampling, so its feature map is 13*13 (assuming the default 416*416 input, likewise below). This layer's mask selects IDs 6, 7, 8, i.e. anchor boxes (116, 90), (156, 198), (373, 326). This matches the point above: the small feature map detects large objects, so it uses the largest anchor boxes.

At this point the Darknet (YOLOv3) model is essentially built; next comes loading the .weights file and running inference.

1.2 Model inference (detect.py, part 2)

1.2.1 Obtaining detection boxes

 
```python
# If weights_path points at a .weights file, load darknet-format weights
if opt.weights_path.endswith(".weights"):
    # Load darknet weights
    model.load_darknet_weights(opt.weights_path)
else:
    # Load checkpoint weights
    model.load_state_dict(torch.load(opt.weights_path))

# model.eval() puts the model in evaluation mode; dropout and batch
# normalization behave differently in training and evaluation
model.eval()  # Set in evaluation mode

dataloader = DataLoader(
    ImageFolder(opt.image_folder, img_size=opt.img_size),
    batch_size=opt.batch_size,
    shuffle=False,
    num_workers=opt.n_cpu,
)

classes = load_classes(opt.class_path)  # Extracts class labels from file

Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

imgs = []  # Stores image paths
img_detections = []  # Stores detections for each image index

print("\nPerforming object detection:")
prev_time = time.time()  # timestamp of the current time
for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
    # Configure input
    input_imgs = Variable(input_imgs.type(Tensor))

    # Get detections
    with torch.no_grad():
        # (52*52 + 26*26 + 13*13) * 3 = 10647 boxes
        # 5 + 80 = 85 values per box
        # detections: 10647 x 85
        detections = model(input_imgs)
        # Non-maximum suppression
        detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)

    # Log progress
    current_time = time.time()
    # timedelta represents the difference between two datetimes
    inference_time = datetime.timedelta(seconds=current_time - prev_time)
    prev_time = current_time
    print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))

    # Save image paths and detections
    # extend() appends all items of another sequence to the end of the list
    imgs.extend(img_paths)
    img_detections.extend(detections)

# Bounding-box colors
cmap = plt.get_cmap("tab20b")
colors = [cmap(i) for i in np.linspace(0, 1, 20)]

print("\nSaving images:")
# Iterate through images and save plot of detections
for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
    print("(%d) Image: '%s'" % (img_i, path))

    # Create plot
    img = np.array(Image.open(path))
    plt.figure()
    fig, ax = plt.subplots(1)
    ax.imshow(img)

    # Draw bounding boxes and labels of detections
    if detections is not None:
        # Rescale boxes to original image
        detections = rescale_boxes(detections, opt.img_size, img.shape[:2])
        unique_labels = detections[:, -1].cpu().unique()
        n_cls_preds = len(unique_labels)
        bbox_colors = random.sample(colors, n_cls_preds)
        for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
            print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))
            box_w = x2 - x1
            box_h = y2 - y1
            color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
            # Create a Rectangle patch
            bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
            # Add the bbox to the plot
            ax.add_patch(bbox)
            # Add label
            plt.text(
                x1,
                y1,
                s=classes[int(cls_pred)],
                color="white",
                verticalalignment="top",
                bbox={"color": color, "pad": 0},
            )

    # Save generated image with detections
    plt.axis("off")
    plt.gca().xaxis.set_major_locator(NullLocator())
    plt.gca().yaxis.set_major_locator(NullLocator())
    filename = path.split("/")[-1].split(".")[0]
    plt.savefig(f"output/{filename}.jpg", bbox_inches="tight", pad_inches=0.0)
    plt.show()
    plt.close()
```

model.load_darknet_weights(opt.weights_path) loads yolov3.weights; the full implementation is the load_darknet_weights() method shown in the Darknet class above.

 

That method parses the .weights file, which is a flat binary dump: five int32 header values (version information; the code treats the fourth, header[3], as the number of images seen during training), followed by every float32 parameter concatenated in network order. For each convolutional block this means the BN bias, weight, running mean, and running variance (or the conv bias when there is no BN), followed by the conv weights, which is why the loop simply advances a pointer ptr through the array. A quick inspection sketch follows below. Once the weights are loaded, the test images are loaded.
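A minimal way to peek at the file (a standalone sketch; note that this repo reads the header as five int32 values, while newer darknet releases store the `seen` counter as a 64-bit field):

```python
import numpy as np

with open("weights/yolov3.weights", "rb") as f:
    header = np.fromfile(f, dtype=np.int32, count=5)  # version info; header[3] is images seen
    weights = np.fromfile(f, dtype=np.float32)        # every parameter, flattened

print("header:", header)
print("parameter count:", weights.size)
```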

 
```python
dataloader = DataLoader(
    ImageFolder(opt.image_folder, img_size=opt.img_size),
    batch_size=opt.batch_size,
    shuffle=False,
    num_workers=opt.n_cpu,
)
```

ImageFolder iterates over the test images in a folder; its full definition is below. Its __getitem__() pads each image to a square and resizes it to img_size (default 416).

 
```python
class ImageFolder(Dataset):
    def __init__(self, folder_path, img_size=416):
        # sorted() sorts any iterable; glob collects every file in the folder
        self.files = sorted(glob.glob("%s/*.*" % folder_path))
        self.img_size = img_size

    def __getitem__(self, index):
        img_path = self.files[index % len(self.files)]
        # Extract image as PyTorch tensor
        img = transforms.ToTensor()(Image.open(img_path))
        # Pad to square resolution
        img, _ = pad_to_square(img, 0)
        # Resize
        img = resize(img, self.img_size)
        return img_path, img

    def __len__(self):
        return len(self.files)
```
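pad_to_square() and resize() live in utils/datasets.py; they are essentially the following (reproduced from memory, so treat it as a sketch):

```python
import numpy as np
import torch.nn.functional as F

def pad_to_square(img, pad_value):
    c, h, w = img.shape
    dim_diff = np.abs(h - w)
    # Split the padding between the two sides of the shorter dimension
    pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
    img = F.pad(img, pad, "constant", value=pad_value)
    return img, pad

def resize(image, size):
    # interpolate needs a batch dimension; add it, resize, then remove it
    image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
    return image
```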

detections = model(input_imgs) feeds the batch through the model and produces the detections via Darknet's forward(), whose full code was shown in the Darknet class above.

 

The forward pass iterates over self.module_defs and self.module_list in lockstep.

For "convolutional", "upsample", and "maxpool" layers, the module is simply applied to x.

For route layers, torch.cat() performs the feature-map fusion (concatenation). Let's test with one image:

This test image is 3*768*576; let's see how its shape changes as it passes through the model. Per the cfg, the image is first resized to 416*416.

Here are the route layers' indices and shapes:

(Figure 6: route-layer indices and tensor shapes during the forward pass)

Each layer's output is appended to the layer_outputs list via layer_outputs.append(x). The printed structure matches the earlier discussion exactly: if layers has one value, the route output is simply that layer's feature map; if it has two, the route output is the fusion of the two layers' feature maps.

The shortcut layer is especially clear: it directly adds the two corresponding layers:

 
```python
elif module_def["type"] == "shortcut":
    layer_i = int(module_def["from"])
    x = layer_outputs[-1] + layer_outputs[layer_i]
```

There are three yolo layers, with feature maps of 13*13, 26*26, and 52*52. Each cell of each feature map predicts 3 bounding boxes, and each bounding box predicts three kinds of values: (1) the box position (4 values: center coordinates tx and ty, box height bh and width bw); (2) an objectness score, i.e. how likely this location contains an object, computed before the class prediction so that unnecessary anchors can be discarded, reducing computation; (3) N class scores: COCO has 80 classes, VOC has 20.

So for COCO, the 13*13 feature map holds 13*13*3 = 507 bounding boxes, each predicting 4 + 1 + 80 = 85 values, represented as a tensor of shape [1, 507, 85], where the 1 is the batch size. The shapes of the other tensors follow the same pattern.

(Figure 7: yolo-layer output shapes, e.g. [1, 507, 85] at 13*13)
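The box counts are easy to verify (a standalone sketch):

```python
sizes = [13, 26, 52]
boxes = [s * s * 3 for s in sizes]  # [507, 2028, 8112] boxes per scale
print(boxes, sum(boxes))            # total: 10647
```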

How this tensor is produced depends mainly on the yolo layer's forward() and compute_grid_offsets(); the full code is:

 
```python
class YOLOLayer(nn.Module):
    """Detection layer"""

    def __init__(self, anchors, num_classes, img_dim=416):
        super(YOLOLayer, self).__init__()
        self.anchors = anchors
        self.num_anchors = len(anchors)
        self.num_classes = num_classes
        self.ignore_thres = 0.5
        self.mse_loss = nn.MSELoss()
        self.bce_loss = nn.BCELoss()
        self.obj_scale = 1
        self.noobj_scale = 100
        self.metrics = {}
        self.img_dim = img_dim
        self.grid_size = 0  # grid size

    def compute_grid_offsets(self, grid_size, cuda=True):
        self.grid_size = grid_size
        g = self.grid_size
        FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
        self.stride = self.img_dim / self.grid_size
        # Calculate offsets for each grid cell.
        # repeat(*sizes) tiles the tensor along the given dimensions;
        # unlike expand(), it copies the underlying data.
        self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
        self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))

    def forward(self, x, targets=None, img_dim=None):
        # Tensors for cuda support
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor

        self.img_dim = img_dim
        num_samples = x.size(0)
        grid_size = x.size(2)

        # With a 416*416 input, the 32x-downsampled grid uses the anchor boxes
        # (116, 90), (156, 198), (373, 326); the 16x grid, suited to medium-sized
        # objects, uses (30, 61), (62, 45), (59, 119); and the 8x grid, with the
        # smallest receptive field and thus suited to small objects, uses
        # (10, 13), (16, 30), (33, 23). In total there are
        # (52*52 + 26*26 + 13*13) * 3 = 10647 proposal boxes.
        prediction = (
            x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
            .permute(0, 1, 3, 4, 2)
            .contiguous()
        )

        # Get outputs
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.

        # If grid size does not match current we compute new offsets
        if grid_size != self.grid_size:
            self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h

        # torch.cat: concatenate along the last dimension
        output = torch.cat(
            (
                pred_boxes.view(num_samples, -1, 4) * self.stride,
                pred_conf.view(num_samples, -1, 1),
                pred_cls.view(num_samples, -1, self.num_classes),
            ),
            -1,
        )

        if targets is None:
            return output, 0
        else:
            iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

            # Loss: mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

            # Metrics
            cls_acc = 100 * class_mask[obj_mask].mean()
            conf_obj = pred_conf[obj_mask].mean()
            conf_noobj = pred_conf[noobj_mask].mean()
            conf50 = (pred_conf > 0.5).float()
            iou50 = (iou_scores > 0.5).float()
            iou75 = (iou_scores > 0.75).float()
            detected_mask = conf50 * class_mask * tconf
            precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
            recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
            recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

            self.metrics = {
                "loss": to_cpu(total_loss).item(),
                "x": to_cpu(loss_x).item(),
                "y": to_cpu(loss_y).item(),
                "w": to_cpu(loss_w).item(),
                "h": to_cpu(loss_h).item(),
                "conf": to_cpu(loss_conf).item(),
                "cls": to_cpu(loss_cls).item(),
                "cls_acc": to_cpu(cls_acc).item(),
                "recall50": to_cpu(recall50).item(),
                "recall75": to_cpu(recall75).item(),
                "precision": to_cpu(precision).item(),
                "conf_obj": to_cpu(conf_obj).item(),
                "conf_noobj": to_cpu(conf_noobj).item(),
                "grid_size": grid_size,
            }

            return output, total_loss
```

num_samples is the number of images per batch; grid_size is the size of the feature map.

(Figure 8: printed num_samples and grid_size values)

Tensor.view() reshapes the tensor entering the yolo layer; subsequent prediction processing works on the tensor named prediction.

(Figure 9: the shape of the prediction tensor)

Next, the bounding boxes are predicted; for details see https://blog.csdn.net/qq_34199326/article/details/84109828. The x and y coordinates pass through a sigmoid, and the objectness confidence and class probabilities are handled the same way.

Bounding-box prediction in the paper:

(Figure 10: bounding-box prediction with dimension priors and location prediction, from the paper)

Bounding boxes with dimension priors and location prediction. We predict the width and height of the box as offsets from cluster centroids. We predict the center coordinates of the box relative to the location of filter application using a sigmoid function. This figure blatantly self-plagiarized from.

 
```python
x = torch.sigmoid(prediction[..., 0])  # Center x
y = torch.sigmoid(prediction[..., 1])  # Center y
w = prediction[..., 2]  # Width
h = prediction[..., 3]  # Height
pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.
```
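Together with the pred_boxes assignments shown further below, these lines implement the paper's box decoding:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$

where (c_x, c_y) is the cell's offset within the grid (grid_x, grid_y in the code) and (p_w, p_h) are the scaled anchor dimensions (anchor_w, anchor_h); multiplying by self.stride at the end maps the boxes back to input-image pixels.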

Coordinates, confidence, and class probabilities are predicted at each of the three scales, consistent with how the yolo layers were constructed (compare Figure 5).

(Figure 11: shapes of x, y, w, h, pred_conf, pred_cls at the three scales)

From the figure we can see that grid_size and self.grid_size differ (self.grid_size is initialized to 0), so the grid offsets must be computed via compute_grid_offsets(); the full code is in YOLOLayer above.

Take grid = 13 as an example. The feature map is 13*13 while the input image is 416*416, so the 416*416 image must be divided evenly into 13*13 cells, which requires the interval (stride) self.stride = 416/13 = 32. The anchors are scaled accordingly: 116/32 = 3.6250, 90/32 = 2.8125.

(Figure 12: stride and scaled anchors for the 13*13 grid)
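In code form (a standalone sketch of the same arithmetic):

```python
img_dim, grid_size = 416, 13
stride = img_dim / grid_size  # 32.0
anchors = [(116, 90), (156, 198), (373, 326)]
scaled_anchors = [(w / stride, h / stride) for w, h in anchors]
print(scaled_anchors)  # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```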

Per the paper and Figure 10, each grid cell predicts 3 bounding boxes; again take grid = 13 as the example. The first cell predicts 3 bounding boxes, each carrying coordinates + confidence + class probabilities. Hence x.shape = [1, 3, 13, 13] in the code below, and y, w, and h share the same shape.

 
```python
print("x.shape=", x.shape)
print("x.data=\n", x.data)
pred_boxes[..., 0] = x.data + self.grid_x
pred_boxes[..., 1] = y.data + self.grid_y
pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
# torch.cat then concatenates along the last dimension
```
The final concatenation produces the output; its sizes 507 = 13*13*3, 2028 = 26*26*3, and 8112 = 52*52*3 should now be easy to follow.

(Figure 13: output shapes 507, 2028, and 8112 at the three scales)

Since targets=None at inference time, the returned total_loss is 0.

1.2.2 Non-maximum suppression

 
```python
# detections: 10647 x 85
detections = model(input_imgs)
# Non-maximum suppression
detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)
```

After the detection boxes are obtained, non-maximum suppression filters them: detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres).

The full code:

 
```python
def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs
    Non-Maximum Suppression to further filter detections.
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class_score, class_pred)
    """
    # From (center x, center y, width, height) to (x1, y1, x2, y2)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])
    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # Filter out confidence scores below threshold
        image_pred = image_pred[image_pred[:, 4] >= conf_thres]
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # Object confidence times class confidence
        # .max(1) returns (values, indices) per row; [0] are the values, [1] the indices
        score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
        # Sort by it, in descending order
        image_pred = image_pred[(-score).argsort()]
        # With keepdim=True the reduced dimension is kept with size 1;
        # otherwise it is squeezed away and the result loses one dimension
        class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
        detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)
        # Perform non-maximum suppression
        keep_boxes = []
        while detections.size(0):
            # unsqueeze(0) adds a dimension so the top box is compared against all boxes
            large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
            label_match = detections[0, -1] == detections[:, -1]
            # Indices of boxes with lower confidence scores, large IOUs and matching labels
            invalid = large_overlap & label_match
            weights = detections[invalid, 4:5]  # object confidences
            # Merge overlapping bboxes by order of confidence
            detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
            keep_boxes += [detections[0]]
            detections = detections[~invalid]
        if keep_boxes:
            output[image_i] = torch.stack(keep_boxes)
    return output
```

For background on the NMS algorithm, see:

https://www.cnblogs.com/makefile/p/nms.html

https://www.jianshu.com/p/d452b5615850

The one thing done differently from standard NMS here is a box "merging" strategy: instead of keeping the top-scoring box unchanged, overlapping boxes of the same class are averaged, weighted by confidence:

 
```python
# Merge overlapping bboxes by order of confidence
detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
```
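Numerically, the merge is just a confidence-weighted average of the coordinates of the suppressed boxes (a standalone sketch with made-up numbers):

```python
import torch

# Two overlapping detections of the same class, with confidences 0.9 and 0.6
boxes = torch.tensor([[100.0, 100.0, 200.0, 200.0],
                      [104.0,  98.0, 208.0, 196.0]])
weights = torch.tensor([[0.9], [0.6]])

merged = (weights * boxes).sum(0) / weights.sum()
print(merged)  # tensor([101.6000,  99.2000, 203.2000, 198.4000])
```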

The detection results after non-maximum suppression:

(Figure 14: detection results after non-maximum suppression)

This concludes the analysis of the detection part; the training part is covered in the follow-up post.

Part 2 is now complete: PyTorch | YOLOv3 Code Walkthrough (Part 2)

https://blog.csdn.net/qq_24739717/article/details/96705055
