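I recently used YOLOv5 for an object detection project. This post records the workflow for my own reference, and I hope it also helps newcomers get started with YOLOv5 quickly.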
(1)https://blog.csdn.net/oJiWuXuan/article/details/107558286?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param
(2)https://blog.csdn.net/sihaiyinan/article/details/89417963?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522159149151419724846405391%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=159149151419724846405391&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2alltop_click~default-1-89417963.ecpm_v1_rank_ctr_v3&utm_term=voc_eval
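For convenience, you can download the code I have already debugged. It extends the original code with a few additions, such as mAP computation.
Baidu Netdisk:
Link: https://pan.baidu.com/s/1sbqoA5-xY3z5bZIItwwO5g
Extraction code: 0ved
Alternatively, download the original code from GitHub:
Code: https://github.com/ultralytics/yolov5
Download the weights:
I downloaded yolov5s.pt and saved it in yolov5\weights.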
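Under yolov5\data, create three new folders: Annotations, ImageSets, and labels.
First empty the images folder, then put the training images in images and the corresponding xml files in Annotations, as shown in the figure below.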
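Before going further, it can help to confirm that every image has a matching xml file. The following is a minimal sketch and not part of the original code: the function name find_unpaired is hypothetical, and the default '.png' extension is an assumption that matches the image type set in voc_label.py.

```python
import os


def find_unpaired(images_dir, annotations_dir, image_ext='.png'):
    """Return (image ids without xml, xml ids without image), compared by base name."""
    image_ids = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
                 if f.lower().endswith(image_ext)}
    xml_ids = {os.path.splitext(f)[0] for f in os.listdir(annotations_dir)
               if f.lower().endswith('.xml')}
    return sorted(image_ids - xml_ids), sorted(xml_ids - image_ids)


if __name__ == '__main__':
    # Paths follow the folder layout described above (run from the yolov5 root)
    if os.path.isdir('./data/images') and os.path.isdir('./data/Annotations'):
        missing_xml, missing_img = find_unpaired('./data/images', './data/Annotations')
        print('images without xml:', missing_xml)
        print('xml without image:', missing_img)
```

If both lists come back empty, the two folders are consistent and the later scripts should find an annotation for every image.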
Create makeTxt.py in the yolov5 root directory with the following code:
import os
import random

trainval_percent = 0.03  # fraction of the images that is also written to val.txt
train_percent = 1.0  # kept from the original script; not used in this version
xmlfilepath = './data/Annotations'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)  # renamed from "list" to avoid shadowing the built-in
tv = int(num * trainval_percent)
trainval = set(random.sample(indices, tv))  # indices sampled for validation

txt_train = './data/ImageSets/train.txt'
txt_val = './data/ImageSets/val.txt'

# Opening with 'w' creates the file, or truncates it if it already exists
with open(txt_train, 'w') as ftrain, open(txt_val, 'w') as fval:
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # strip the '.xml' extension
        ftrain.write(name)  # every image goes into the training list
        if i in trainval:
            fval.write(name)  # the sampled subset is also listed for validation
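makeTxt.py splits the dataset into a training set and a validation set. After it runs, two files appear in the ImageSets folder, recording the image names of the training and validation sets, as shown in the figure below. In this version, train.txt lists every image and val.txt lists a random 3% sample of them.
I modified makeTxt.py for my own research needs. For the original makeTxt.py, see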
https://blog.csdn.net/oJiWuXuan/article/details/107558286?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param
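For comparison, a more conventional split keeps the training and validation sets disjoint. The sketch below is only an illustration under stated assumptions, not the author's script: split_ids, val_fraction, and seed are hypothetical names.

```python
import random


def split_ids(ids, val_fraction=0.1, seed=0):
    """Shuffle ids reproducibly and return (train_ids, val_ids) with no overlap."""
    ids = sorted(ids)
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


# 100 made-up image names, 10% held out for validation
train_ids, val_ids = split_ids(['img%03d' % i for i in range(100)], val_fraction=0.1)
print(len(train_ids), len(val_ids))  # 90 10
```

Fixing the seed makes the split repeatable, which helps when comparing runs.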
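Next, create another file, voc_label.py. Make sure classes=[...] contains exactly the class names you used when labeling your dataset, one entry per labeled class. Objects whose names are not in classes are skipped, so a wrong name means the annotations in the xml files are not read. The code is as follows:
# -*- coding: utf-8 -*-
# XML parsing package
import xml.etree.ElementTree as ET
import os
from os import getcwd
import shutil

sets = ['train', 'val']
classes = ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool',
           'hpc_stage_5', 'mixer', 'nozzle', 'nozzle_cone', 'stand']
style = '.png'


# Normalize a bounding box
def convert(size, box):  # size: (image w, image h), box: (xmin, xmax, ymin, ymax)
    dw = 1. / size[0]  # 1/w
    dh = 1. / size[1]  # 1/h
    x = (box[0] + box[1]) / 2.0  # x coordinate of the object center in the image
    y = (box[2] + box[3]) / 2.0  # y coordinate of the object center in the image
    w = box[1] - box[0]  # object width in pixels
    h = box[3] - box[2]  # object height in pixels
    x = x * dw  # center x as a fraction of the image width (x / image w)
    w = w * dw  # object width as a fraction of the image width (w / image w)
    y = y * dh  # center y as a fraction of the image height (y / image h)
    h = h * dh  # object height as a fraction of the image height (h / image h)
    return (x, y, w, h)  # center x, center y, width, height relative to the image, each in [0, 1]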
# image_id: the image file name without its extension
def convert_annotation(image_id):
    '''
    Convert the xml file for the given file name into a label file. The xml file holds the
    bounding boxes and the image width and height. It is parsed, normalized, and written out,
    so each image corresponds to one xml file and to exactly one label file.
    Label file format: class x y w h. An image can contain several objects, so the label file
    can hold several bounding-box lines.
    '''
    # Open the xml annotation for image_id and the label file that will be written
    with open('./data/Annotations/%s.xml' % (image_id), encoding='utf-8') as in_file, \
            open('./data/labels/%s.txt' % (image_id), 'w', encoding='utf-8') as out_file:
        # Parse the xml file
        tree = ET.parse(in_file)
        root = tree.getroot()
        # Get the image size
        size = root.find('size')
        # Guard against xml files that have no size element
        if size is not None:
            w = int(size.find('width').text)  # image width
            h = int(size.find('height').text)  # image height
            # Iterate over the annotated objects
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text  # class name (string)
                # Skip objects whose class is not in classes, or that are marked difficult
                if cls not in classes or int(difficult) == 1:
                    continue
                # Look up the class id from the class name
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                # b = (xmin, xmax, ymin, ymax)
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
                     float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                print(image_id, cls, b)
                # bb is the normalized (x, y, w, h)
                bb = convert((w, h), b)
                # Write "class x y w h" to the label file
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
# Print the current working directory
wd = getcwd()
print(wd)
# Recreate the labels folder: delete it if it exists, then create it empty
labels = './data/labels'
if os.path.exists(labels):
    shutil.rmtree(labels)  # delete output folder
os.makedirs(labels)  # make new output folder
for image_set in sets:
    # For each set, do two things:
    # 1. write the path of every image to data/<set>.txt so the images can be located
    # 2. parse each image's xml file and write its boxes and classes to a label file
    # Read the image names listed in ./data/ImageSets/<set>.txt
    with open('./data/ImageSets/%s.txt' % (image_set)) as f:
        image_ids = f.read().strip().split()
    # Open data/<set>.txt for writing ('w' creates or truncates the file)
    txt_name = './data/%s.txt' % (image_set)
    with open(txt_name, 'w') as list_file:
        for image_id in image_ids:
            # Write the image path followed by a newline
            list_file.write('data/images/%s%s\n' % (image_id, style))
            # Convert the annotation for this image
            convert_annotation(image_id)
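voc_label.py reads the annotation information from the xml files and writes it to txt files. After it runs, the labels folder contains the annotation information for every image in the dataset, as shown below:
It also generates two txt files, train and val, in the data folder.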
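To make the normalization in voc_label.py concrete, here is a small worked example. It is a sketch that repeats the same arithmetic as convert under a hypothetical name, voc_to_yolo, and the image size and box values are made up for illustration.

```python
def voc_to_yolo(img_w, img_h, xmin, xmax, ymin, ymax):
    """Convert a VOC pixel box to YOLO (x_center, y_center, width, height), all in [0, 1]."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


# Made-up example: a 640x480 image with a box spanning x 100..300 and y 50..250
box = voc_to_yolo(640, 480, xmin=100, xmax=300, ymin=50, ymax=250)
# Class id 1 corresponds to 'fan' in the classes list
print('1 ' + ' '.join('%.4f' % v for v in box))  # 1 0.3125 0.3125 0.3125 0.4167
```

Each line of a label file has this shape: the class id followed by four values relative to the image size.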
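At this point, the dataset needed for this training run is fully prepared.
First, create object.yaml in the data directory and configure its parameters. train and val point to the txt files listing the training and validation image paths, nc is the number of classes in the dataset (10 in my case), and names should be replaced with your own class names.
The code is as follows: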
# COCO 2017 dataset http://cocodataset.org
# Download command: bash yolov5/data/get_coco2017.sh
# Train command: python train.py --data ./data/coco.yaml
# Dataset should be placed next to yolov5 folder:
# /parent_folder
# /coco
# /yolov5
# train and val datasets (image directory or *.txt file with image paths)
train: data/train.txt # training image paths written by voc_label.py
val: data/val.txt # validation image paths written by voc_label.py
#test: data/test.txt # 20k images for submission to https://competitions.codalab.org/competitions/20794
# number of classes
nc: 10
# class names
names: ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5', 'mixer', 'nozzle', 'nozzle_cone', 'stand']
# Print classes
# with open('data/coco.yaml') as f:
# d = yaml.load(f, Loader=yaml.FullLoader) # dict
# for i, x in enumerate(d['names']):
# print(i, x)
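A mismatch between nc and the length of names is easy to miss. The check below is a minimal sketch: the function name check_dataset_config is hypothetical, and it works on a plain dict holding the same keys as object.yaml, so it does not depend on a YAML parser.

```python
def check_dataset_config(cfg):
    """Return a list of problems found in a dataset config dict (empty if none)."""
    problems = []
    names = cfg.get('names', [])
    if cfg.get('nc') != len(names):
        problems.append('nc=%r but names has %d entries' % (cfg.get('nc'), len(names)))
    if len(set(names)) != len(names):
        problems.append('names contains duplicates')
    for key in ('train', 'val'):
        if not cfg.get(key):
            problems.append('missing %s path' % key)
    return problems


# Same values as object.yaml above
config = {
    'train': 'data/train.txt',
    'val': 'data/val.txt',
    'nc': 10,
    'names': ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool',
              'hpc_stage_5', 'mixer', 'nozzle', 'nozzle_cone', 'stand'],
}
print(check_dataset_config(config))  # an empty list means no problems were found
```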
In yolov5\models, choose a model. I used yolov5s.yaml; the only change needed in that file is nc, which is 10 for my dataset.
# parameters
nc: 10 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
Finally, modify some of the arguments in train.py in the root directory. Set batch-size and workers according to your machine's capabilities, as shown below:
parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='weights/yolov5s.pt', help='initial weights path')
parser.add_argument('--cfg', type=str, default='models/yolov5s.yaml', help='model.yaml path')
parser.add_argument('--data', type=str, default='data/object.yaml', help='data.yaml path')
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')
parser.add_argument('--epochs', type=int, default=300)
parser.add_argument('--batch-size', type=int, default=4, help='total batch size for all GPUs')
parser.add_argument('--workers', type=int, default=2, help='maximum number of dataloader workers')
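Because these are ordinary argparse options, the same values can probably be passed on the command line instead of editing the defaults. The sketch below only builds and prints such a command. The flag names and values are copied from the snippet above, and it assumes train.py is started from the yolov5 root.

```python
import shlex

# Flag names and values are copied from the argparse defaults shown above
args = {
    '--weights': 'weights/yolov5s.pt',
    '--cfg': 'models/yolov5s.yaml',
    '--data': 'data/object.yaml',
    '--hyp': 'data/hyp.scratch.yaml',
    '--epochs': 300,
    '--batch-size': 4,
    '--workers': 2,
}

cmd = ['python', 'train.py']
for flag, value in args.items():
    cmd += [flag, str(value)]

# Print a shell-safe command line; nothing is executed here
print(' '.join(shlex.quote(part) for part in cmd))
```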
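Once everything is configured, run train.py directly to start training. When training finishes, the yolov5\runs\train\exp folder contains the following files:
best.pt holds the best weights obtained across all training epochs, and last.pt holds the weights from the final epoch.
During training, you can run criterion_line.py in the yolov5\loss_line folder to plot the loss and other curves from the results logged so far:
You need to create the loss_line folder yourself.
Code for criterion_line.py: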
import matplotlib.pyplot as plt
import numpy as np
with open('../runs/train/exp/results.txt', 'r') as out_data:
    text = out_data.readlines()  # list of str, one line per epoch
loss = []
for ss in text:
    ss = ss.strip()
    ss = ss.split()
    strr = ss[2:6] + ss[8:12]  # the four loss columns plus P, R, and the two mAP columns
    numbers = list(map(float, strr))
    loss.append(numbers)
# 0-GIoU, 1-obj, 2-cls, 3-total, 4-P, 5-R, 6-mAP@0.5, 7-mAP@0.5:.95
loss = np.array(loss)
epoch_n = len(loss)
x = np.linspace(1, epoch_n, epoch_n)
GIoU = loss[:, 0]
obj = loss[:, 1]
cls = loss[:, 2]
total = loss[:, 3]
P = loss[:, 4]
R = loss[:, 5]
mAP_5 = loss[:, 6]
mAP_5_95 = loss[:, 7]
plt.figure(num=1, figsize=(16, 10), )
plt.subplot(4, 2, 1)
plt.plot(x, GIoU, color='red', linewidth=1.0, linestyle='--', label='GIoU')
plt.legend(loc='upper right')
plt.subplot(4, 2, 2)
plt.plot(x, obj, color='red', linewidth=1.0, linestyle='--', label='obj')
plt.legend(loc='upper right')
plt.subplot(4, 2, 3)
plt.plot(x, cls, color='red', linewidth=1.0, linestyle='--', label='cls')
plt.legend(loc='upper right')
plt.subplot(4, 2, 4)
plt.plot(x, total, color='red', linewidth=1.0, linestyle='--', label='total')
plt.legend(loc='upper right')
plt.subplot(4, 2, 5)
plt.plot(x, P, color='red', linewidth=1.0, linestyle='--', label='P')
plt.legend(loc='upper right')
plt.subplot(4, 2, 6)
plt.plot(x, R, color='red', linewidth=1.0, linestyle='--', label='R')
plt.legend(loc='upper right')
plt.subplot(4, 2, 7)
plt.plot(x, mAP_5, color='red', linewidth=1.0, linestyle='--', label='mAP_5')
plt.legend(loc='upper right')
plt.subplot(4, 2, 8)
plt.plot(x, mAP_5_95, color='red', linewidth=1.0, linestyle='--', label='mAP_5_95')
plt.legend(loc='upper right')
plt.show()
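The column selection in criterion_line.py can be easier to follow with named fields. The sketch below uses the same slices. The sample line is synthetic and only mimics a layout consistent with those slices (an assumption about results.txt, not real training output), and parse_results_line is a hypothetical helper name.

```python
METRIC_NAMES = ['GIoU', 'obj', 'cls', 'total', 'P', 'R', 'mAP_5', 'mAP_5_95']


def parse_results_line(line):
    """Map one whitespace-separated results.txt line to a dict of named floats."""
    tokens = line.split()
    values = tokens[2:6] + tokens[8:12]  # same slices as criterion_line.py
    return dict(zip(METRIC_NAMES, map(float, values)))


# Synthetic line for illustration only (not real training output)
sample = '0/299 1.5G 0.10 0.05 0.03 0.18 12 640 0.40 0.50 0.30 0.15 0.09 0.04 0.02'
print(parse_results_line(sample))
```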
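In the yolov5 folder, create a data_test folder, and inside it create five folders and one txt file, as shown below.
Put the test-set images in the JPEGImages_manual folder and the corresponding xml files in Annotations_manual.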
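The layout can also be created with a few lines of Python. This is a sketch under an assumption: the five folder names and the txt file name are taken from the paths that appear in cfg_mAP.py, and make_data_test_layout is a hypothetical helper name.

```python
import os


def make_data_test_layout(root):
    """Create the evaluation folders and the image-name list file under root."""
    folders = ['JPEGImages_manual', 'Annotations_manual', 'class_txt_manual',
               'cachedir_manual', 'predictions_manual']
    for name in folders:
        os.makedirs(os.path.join(root, name), exist_ok=True)
    # Append mode creates the txt file if it is missing without truncating an existing one
    open(os.path.join(root, 'imgs_name_manual.txt'), 'a').close()
```

Calling make_data_test_layout('./data_test') from the yolov5 root would produce the structure described above. The detection script later recreates the output folders itself, so only the image and annotation folders need to be filled by hand.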
In the yolov5 folder, create an mAP folder containing cfg_mAP.py, detect_eval_class_txt.py, compute_mAP.py, mAP_line.py, utils_mAP.py, and yolov5_eval.py.
The code for each file is as follows:
# cfg_mAP.py
# -*- coding: utf-8 -*-
import os
from easydict import EasyDict
Cfg = EasyDict()
Cfg.names = ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5',
             'mixer', 'nozzle', 'nozzle_cone', 'stand']
# The original object names are long and look cluttered when drawn on the image, so abbreviated names are used.
Cfg.textnames = ['combustion', 'fan', 'stator', 'core', 'spool', 'stage', 'mixer', 'nozzle', 'cone', 'stand']
Cfg.device = '0,1'
# manual
Cfg.origimgs_filepath = '../data_test/JPEGImages_manual'
Cfg.testimgs_filepath = '../data_test/JPEGImages_manual'
Cfg.eval_classtxt_path = '../data_test/class_txt_manual/'
Cfg.eval_Annotations_path = '../data_test/Annotations_manual'
Cfg.eval_imgs_name_txt = '../data_test/imgs_name_manual.txt'
Cfg.cachedir = '../data_test/cachedir_manual/'
Cfg.prediction_path = '../data_test/predictions_manual'
# mAP_line cachedir
Cfg.systhesis_valid_cachedir = '../data_test/cachedir_systhesis_valid/'
Cfg.manual_cachedir = '../data_test/cachedir_manual/'
# detect_eval_class_txt.py
import argparse
import os
import platform
import shutil
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import (
    check_img_size, non_max_suppression, apply_classifier, scale_coords,
    xyxy2xywh, plot_one_box, strip_optimizer, set_logging)
from utils.torch_utils import select_device, load_classifier, time_synchronized
from cfg_mAP import Cfg

cfg = Cfg


def detect(save_img=False):
    out, source, weights, view_img, save_txt, imgsz = \
        opt.output, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt')

    # Initialize
    set_logging()
    device = select_device(opt.device)
    if os.path.exists(out):
        shutil.rmtree(out)  # delete output folder
    os.makedirs(out)  # make new output folder
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model'])  # load weights
        modelc.to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = True
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz)
    else:
        save_img = True
        dataset = LoadImages(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
    test_time = []
    for path, img, im0s, vid_cap in dataset:
        # Inference
        t1 = time_synchronized()
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # # Inference
        # t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
            else:
                p, s, im0 = path, '', im0s

            img_name = Path(p).name
            txt = open(opt.eval_imgs_name_txt, 'a')
            txt.write(img_name[:-4])
            txt.write('\n')
            txt.close()

            save_path = str(Path(out) / Path(p).name)
            txt_path = str(Path(out) / Path(p).stem) + ('_%g' % dataset.frame if dataset.mode == 'video' else '')
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if det is not None and len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += '%g %ss, ' % (n, names[int(c)])  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    txt = open(opt.eval_classtxt_path + '/%s' % names[int(cls)], 'a')
                    obj_conf = conf.cpu().numpy()
                    xyxy = torch.tensor(xyxy).numpy()
                    x1 = xyxy[0]
                    y1 = xyxy[1]
                    x2 = xyxy[2]
                    y2 = xyxy[3]
                    new_box = [img_name[:-4], obj_conf, x1, y1, x2, y2]
                    txt.write(" ".join([str(a) for a in new_box]))
                    txt.write('\n')
                    txt.close()

                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * 5 + '\n') % (cls, *xywh))  # label format

                    if save_img or view_img:  # Add bbox to image
                        label = '%s %.2f' % (cfg.textnames[int(cls)], conf)
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            test_time.append(t2 - t1)
            # Print time (inference + NMS)
            print('%sDone. (%.3fs)' % (s, t2 - t1))

            # Stream results
            if view_img:
                cv2.imshow(p, im0)
                if cv2.waitKey(1) == ord('q'):  # q to quit
                    raise StopIteration

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'images':
                    cv2.imwrite(save_path, im0)
                else:
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer

                        fourcc = 'mp4v'  # output video codec
                        fps = vid_cap.get(cv2.CAP_PROP_FPS)
                        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        print('Results saved to %s' % Path(out))
        if platform.system() == 'Darwin' and not opt.update:  # MacOS
            os.system('open ' + save_path)

    print('Done. (%.3fs)' % (time.time() - t0))
    mean_time = sum(test_time) / len(test_time)
    print('mean time:', mean_time)
    print('frame: ', 1 / mean_time)


if __name__ == '__main__':
    dir = '../data_test/imgs_name_manual.txt'
    if os.path.exists(dir):
        os.remove(dir)
    else:
        open(dir, 'w')
    predictions_manual = '../data_test/predictions_manual'
    class_txt_manual = '../data_test/class_txt_manual'
    cachedir_manual = '../data_test/cachedir_manual'
    if os.path.exists(predictions_manual):
        shutil.rmtree(predictions_manual)  # delete output folder
    os.makedirs(predictions_manual)  # make new output folder
    if os.path.exists(class_txt_manual):
        shutil.rmtree(class_txt_manual)  # delete output folder
    os.makedirs(class_txt_manual)  # make new output folder
    if os.path.exists(cachedir_manual):
        shutil.rmtree(cachedir_manual)  # delete output folder
    os.makedirs(cachedir_manual)  # make new output folder

    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='../runs/train/exp/weights/last.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='../data_test/JPEGImages_manual',
                        help='source')  # file/folder, 0 for webcam
    parser.add_argument('--output', type=str, default='../data_test/predictions_manual',
                        help='output folder')  # output folder
    parser.add_argument('--eval_imgs_name_txt', type=str, default='../data_test/imgs_name_manual.txt',
                        help='output folder')  # output folder
    parser.add_argument('--eval_classtxt_path', type=str, default='../data_test/class_txt_manual',
                        help='output folder')  # output folder
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.4, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()
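The detection script writes one txt file per class, and each line has the form "image_name confidence x1 y1 x2 y2". The helper below is a minimal sketch for reading such a line back. The name parse_detection_line is hypothetical, the sample line is made up, and it assumes image names contain no spaces.

```python
def parse_detection_line(line):
    """Split 'image_id confidence x1 y1 x2 y2' into (image_id, confidence, [x1, y1, x2, y2])."""
    parts = line.split()
    image_id = parts[0]
    confidence = float(parts[1])
    box = [float(v) for v in parts[2:6]]
    return image_id, confidence, box


# Made-up line in the format written by the script above
print(parse_detection_line('img_0001 0.87 34.0 50.0 210.0 300.0'))
```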
In addition, the plot_one_box function needs to be placed in yolov5\utils\general.py.
def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    # Plots one bounding box on image img
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
# compute_mAP.py
# -*- coding: utf-8 -*-
import os
import numpy as np
from yolov5_eval import yolov5_eval  # keep yolov5_eval.py and compute_mAP.py in the same directory
from cfg_mAP import Cfg
import pickle
import shutil

cfg = Cfg
eval_classtxt_path = cfg.eval_classtxt_path  # path to the per-class txt files
eval_classtxt_files = os.listdir(eval_classtxt_path)
classes = cfg.names  # ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5','mixer', 'nozzle', 'nozzle_cone', 'stand']
aps = []  # AP of each class
cls_rec = {}  # recall of each class
cls_prec = {}  # precision of each class
cls_ap = {}
annopath = cfg.eval_Annotations_path + '/{:s}.xml'  # annotation path template; {:s} is filled with the image name to read the matching xml file
imagesetfile = cfg.eval_imgs_name_txt  # file listing the image names
cachedir = cfg.cachedir
if os.path.exists(cachedir):
    shutil.rmtree(cachedir)  # delete output folder
os.makedirs(cachedir)  # make new output folder

for cls in eval_classtxt_files:  # each file holds the detections for one class
    filename = eval_classtxt_path + cls
    rec, prec, ap = yolov5_eval(  # yolov5_eval computes recall, precision, and AP for class cls
        filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
        use_07_metric=False)
    aps += [ap]
    cls_ap[cls] = ap
    cls_rec[cls] = rec[-1]
    cls_prec[cls] = prec[-1]
    print('AP for {} = {:.4f}'.format(cls, ap))
    print('recall for {} = {:.4f}'.format(cls, rec[-1]))
    print('precision for {} = {:.4f}'.format(cls, prec[-1]))

with open(os.path.join(cfg.cachedir, 'cls_ap.pkl'), 'wb') as in_data:
    pickle.dump(cls_ap, in_data, pickle.HIGHEST_PROTOCOL)
with open(os.path.join(cfg.cachedir, 'cls_rec.pkl'), 'wb') as in_data:
    pickle.dump(cls_rec, in_data, pickle.HIGHEST_PROTOCOL)
with open(os.path.join(cfg.cachedir, 'cls_prec.pkl'), 'wb') as in_data:
    pickle.dump(cls_prec, in_data, pickle.HIGHEST_PROTOCOL)

print('Mean AP = {:.4f}'.format(np.mean(aps)))
print('~~~~~~~~')
print('Results:')
for ap in aps:
    print('{:.3f}'.format(ap))
print('~~~~~~~~')
print('{:.3f}'.format(np.mean(aps)))
print('~~~~~~~~')
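compute_mAP.py calls yolov5_eval with use_07_metric=False. Assuming yolov5_eval follows the standard PASCAL VOC evaluation procedure (an assumption, since that behavior is not confirmed here), the AP for a class would be the area under the interpolated precision-recall curve. The sketch below shows that calculation in plain Python, with made-up precision and recall values.

```python
def voc_ap(rec, prec):
    """Area under the precision-recall curve using all-point interpolation."""
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    # Make precision monotonically non-increasing, scanning from right to left
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum rectangle areas over each step in recall
    ap = 0.0
    for i in range(len(mrec) - 1):
        ap += (mrec[i + 1] - mrec[i]) * mpre[i + 1]
    return ap


# Made-up curve: precision 1.0 at recall 0.5, then precision 0.5 at recall 1.0
print(voc_ap([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

The mean of these per-class values is what the script prints as Mean AP.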
# mAP_line.py
import os
import matplotlib.pyplot as plt
import numpy as np
import pickle
from cfg_mAP import Cfg

cfg = Cfg
x = np.linspace(1, 10, 10)
ap_systhesis_valid = []
ap_manual = []
plt.figure(num=1, figsize=(8, 5), )
with open(os.path.join(cfg.manual_cachedir, 'cls_ap.pkl'), 'rb') as out_data:
    # Load the variables in the order they were saved
    manual_cls_ap = pickle.load(out_data)
    print(manual_cls_ap)  # dataList
    print(len(manual_cls_ap))  # dataList
for cls in cfg.names:
    if cls in manual_cls_ap.keys():
        ap_manual.append(manual_cls_ap[cls])
    else:
        ap_manual.append(0.0)  # classes with no detections count as AP 0
print('ap_manual: ', ap_manual)
manual_mAP = np.mean(ap_manual)
l2, = plt.plot(x, ap_manual, color='k', linewidth=1.0, linestyle='-.', label='manual_AP')
plt.scatter(x, ap_manual, s=10, color='k')
for x1, y1 in zip(x, ap_manual):
    plt.text(x1, y1, '%s' % str('{0:.3f}'.format(y1)), fontdict={'fontsize': 14}, verticalalignment="bottom",
             horizontalalignment="center")
plt.annotate(r'manual_mAP=%s' % str('{0:.3f}'.format(manual_mAP)), xy=(5, manual_mAP), xycoords='data',
             xytext=(0.0, 0.0),
             textcoords='offset points', fontsize=13, )
plt.xticks(np.linspace(1, 10, 10),
           [r'combustion_lining', r'fan', r'fan_support', r'hp_core_casing', r'hpc_spool',
            r'hpc_stage5', r'mixer', r'nozzle', r'nozzle_cone', r'stand'])
plt.legend(handles=[l2], loc='best')
plt.show()
# utils_mAP.py
import sys
import os
import time
import math
import torch
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from torch.autograd import Variable
import itertools
import struct  # get_image_size
import imghdr  # get_image_size


def sigmoid(x):
    return 1.0 / (np.exp(-x) + 1.)


def softmax(x):
    x = np.exp(x - np.expand_dims(np.max(x, axis=1), axis=1))
    x = x / np.expand_dims(x.sum(axis=1), axis=1)
    return x


def bbox_iou(box1, box2, x1y1x2y2=True):
    if x1y1x2y2:
        mx = min(box1[0], box2[0])
        Mx = max(box1[2], box2[2])
        my = min(box1[1], box2[1])
        My = max(box1[3], box2[3])
        w1 = box1[2] - box1[0]
        h1 = box1[3] - box1[1]
        w2 = box2[2] - box2[0]
        h2 = box2[3] - box2[1]
    else:
        mx = min(box1[0] - box1[2] / 2.0, box2[0] - box2[2] / 2.0)
        Mx = max(box1[0] + box1[2] / 2.0, box2[0] + box2[2] / 2.0)
        my = min(box1[1] - box1[3] / 2.0, box2[1] - box2[3] / 2.0)
        My = max(box1[1] + box1[3] / 2.0, box2[1] + box2[3] / 2.0)
        w1 = box1[2]
        h1 = box1[3]
        w2 = box2[2]
        h2 = box2[3]
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    carea = 0
    if cw <= 0 or ch <= 0:
        return 0.0

    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    uarea = area1 + area2 - carea
    return carea / uarea


def bbox_ious(boxes1, boxes2, x1y1x2y2=True):
    if x1y1x2y2:
        mx = torch.min(boxes1[0], boxes2[0])
        Mx = torch.max(boxes1[2], boxes2[2])
        my = torch.min(boxes1[1], boxes2[1])
        My = torch.max(boxes1[3], boxes2[3])
        w1 = boxes1[2] - boxes1[0]
        h1 = boxes1[3] - boxes1[1]
        w2 = boxes2[2] - boxes2[0]
        h2 = boxes2[3] - boxes2[1]
    else:
        mx = torch.min(boxes1[0] - boxes1[2] / 2.0, boxes2[0] - boxes2[2] / 2.0)
        Mx = torch.max(boxes1[0] + boxes1[2] / 2.0, boxes2[0] + boxes2[2] / 2.0)
        my = torch.min(boxes1[1] - boxes1[3] / 2.0, boxes2[1] - boxes2[3] / 2.0)
        My = torch.max(boxes1[1] + boxes1[3] / 2.0, boxes2[1] + boxes2[3] / 2.0)
        w1 = boxes1[2]
        h1 = boxes1[3]
        w2 = boxes2[2]
        h2 = boxes2[3]
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    mask = ((cw <= 0) + (ch <= 0) > 0)
    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    carea[mask] = 0
    uarea = area1 + area2 - carea
    return carea / uarea


def nms(_boxes, _nms_thresh):
    if len(_boxes) == 0:
        return _boxes

    det_confs = torch.zeros(len(_boxes))
    for i in range(len(_boxes)):
        det_confs[i] = 1 - _boxes[i][4]

    _, sortIds = torch.sort(det_confs)
    out_boxes = []
    for i in range(len(_boxes)):
        box_i = _boxes[sortIds[i]]
        if box_i[4] > 0:
            out_boxes.append(box_i)
            for j in range(i + 1, len(_boxes)):
                box_j = _boxes[sortIds[j]]
                if bbox_iou(box_i, box_j, x1y1x2y2=False) > _nms_thresh:
                    # print(box_i, box_j, bbox_iou(box_i, box_j, x1y1x2y2=False))
                    box_j[4] = 0
    return out_boxes


def convert2cpu(gpu_matrix):
    return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)


def convert2cpu_long(gpu_matrix):
    return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)


def get_region_boxes_in_model(output, conf_thresh, num_classes, anchors, num_anchors, only_objectness=1,
                              validation=False):
    anchor_step = len(anchors) // num_anchors
    if output.dim() == 3:
        output = output.unsqueeze(0)
    batch = output.size(0)
    assert (output.size(1) == (5 + num_classes) * num_anchors)
    h = output.size(2)
    w = output.size(3)

    t0 = time.time()
    all_boxes = []
    output = output.view(batch * num_anchors, 5 + num_classes, h * w).transpose(0, 1).contiguous().view(
        5 + num_classes, batch * num_anchors * h * w)

    grid_x = torch.linspace(0, w - 1, w).repeat(h, 1).repeat(batch * num_anchors, 1, 1).view(
        batch * num_anchors * h * w).type_as(output)  # cuda()
    grid_y = torch.linspace(0, h - 1, h).repeat(w, 1).t().repeat(batch * num_anchors, 1, 1).view(
        batch * num_anchors * h * w).type_as(output)  # cuda()
    xs = torch.sigmoid(output[0]) + grid_x
    ys = torch.sigmoid(output[1]) + grid_y

    anchor_w = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([0]))
    anchor_h = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([1]))
    anchor_w = anchor_w.repeat(batch, 1).repeat(1, 1, h * w).view(batch * num_anchors * h * w).type_as(output)  # cuda()
    anchor_h = anchor_h.repeat(batch, 1).repeat(1, 1, h * w).view(batch * num_anchors * h * w).type_as(output)  # cuda()
    ws = torch.exp(output[2]) * anchor_w
    hs = torch.exp(output[3]) * anchor_h

    det_confs = torch.sigmoid(output[4])
    cls_confs = torch.nn.Softmax()(Variable(output[5:5 + num_classes].transpose(0, 1))).data
    cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
    cls_max_confs = cls_max_confs.view(-1)
    cls_max_ids = cls_max_ids.view(-1)
    t1 = time.time()

    sz_hw = h * w
    sz_hwa = sz_hw * num_anchors
    det_confs = convert2cpu(det_confs)
    cls_max_confs = convert2cpu(cls_max_confs)
    cls_max_ids = convert2cpu_long(cls_max_ids)
    xs = convert2cpu(xs)
    ys = convert2cpu(ys)
    ws = convert2cpu(ws)
    hs = convert2cpu(hs)
    if validation:
        cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
    t2 = time.time()
    for b in range(batch):
        boxes = []
        for cy in range(h):
            for cx in range(w):
                for i in range(num_anchors):
                    ind = b * sz_hwa + i * sz_hw + cy * w + cx
                    det_conf = det_confs[ind]
                    if only_objectness:
                        conf = det_confs[ind]
                    else:
                        conf = det_confs[ind] * cls_max_confs[ind]

                    if conf > conf_thresh:
                        bcx = xs[ind]
                        bcy = ys[ind]
                        bw = ws[ind]
                        bh = hs[ind]
                        cls_max_conf = cls_max_confs[ind]
                        cls_max_id = cls_max_ids[ind]
                        box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
                        if (not only_objectness) and validation:
                            for c in range(num_classes):
                                tmp_conf = cls_confs[ind][c]
                                if c != cls_max_id and det_confs[ind] * tmp_conf > conf_thresh:
                                    box.append(tmp_conf)
                                    box.append(c)
                        boxes.append(box)
        all_boxes.append(boxes)
    t3 = time.time()
    if False:
        print('---------------------------------')
        print('matrix computation : %f' % (t1 - t0))
        print('        gpu to cpu : %f' % (t2 - t1))
        print('        tpz filter : %f' % (t3 - t2))
        print('---------------------------------')
    return all_boxes


def get_region_boxes_out_model(_output, _cfg, _anchors, _num_anchors, _only_objectness=1, _validation=False):
    anchor_step = len(_anchors) // _num_anchors
    if len(_output.shape) == 3:
        _output = np.expand_dims(_output, axis=0)
    batch = _output.shape[0]
    assert (_output.shape[1] == (5 + _cfg.classes) * _num_anchors)
    h = _output.shape[2]
    w = _output.shape[3]

    t0 = time.time()
    all_boxes = []
    _output = _output.reshape(batch * _num_anchors, 5 + _cfg.classes, h * w).transpose((1, 0, 2)).reshape(
        5 + _cfg.classes,
        batch * _num_anchors * h * w)

    grid_x = np.expand_dims(np.expand_dims(np.linspace(0, w - 1, w), axis=0).repeat(h, 0), axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)
    grid_y = np.expand_dims(np.expand_dims(np.linspace(0, h - 1, h), axis=0).repeat(w, 0).T, axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)

    xs = sigmoid(_output[0]) + grid_x
    ys = sigmoid(_output[1]) + grid_y

    anchor_w = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 0]
    anchor_h = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 1]
    anchor_w = np.expand_dims(np.expand_dims(anchor_w, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    anchor_h = np.expand_dims(np.expand_dims(anchor_h, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    ws = np.exp(_output[2]) * anchor_w
    hs = np.exp(_output[3]) * anchor_h

    det_confs = sigmoid(_output[4])
    cls_confs = softmax(_output[5:5 + _cfg.classes].transpose(1, 0))
    cls_max_confs = np.max(cls_confs, 1)
    cls_max_ids = np.argmax(cls_confs, 1)
    t1 = time.time()

    sz_hw = h * w
    sz_hwa = sz_hw * _num_anchors
    t2 = time.time()
    for b in range(batch):
        boxes = []
        for cy in range(h):
            for cx in range(w):
                for i in range(_num_anchors):
                    ind = b * sz_hwa + i * sz_hw + cy * w + cx
                    det_conf = det_confs[ind]
                    if _only_objectness:
                        conf = det_confs[ind]
                    else:
                        conf = det_confs[ind] * cls_max_confs[ind]

                    if conf > _cfg.conf_thresh:
                        bcx = xs[ind]
                        bcy = ys[ind]
                        bw = ws[ind]
                        bh = hs[ind]
                        cls_max_conf = cls_max_confs[ind]
                        cls_max_id = cls_max_ids[ind]
                        box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
                        if (not _only_objectness) and _validation:
                            for c in range(_cfg.classes):
                                tmp_conf = cls_confs[ind][c]
                                if c != cls_max_id and det_confs[ind] * tmp_conf > _cfg.conf_thresh:
                                    box.append(tmp_conf)
                                    box.append(c)
                        boxes.append(box)
        all_boxes.append(boxes)
    t3 = time.time()
    if False:
        print('---------------------------------')
        print('matrix computation : %f' % (t1 - t0))
        print('        gpu to cpu : %f' % (t2 - t1))
        print('        tpz filter : %f' % (t3 - t2))
        print('---------------------------------')
    return all_boxes
def get_classtxt_out_model(_output, _cfg, _anchors, _num_anchors, _only_objectness=1, _validation=False):
anchor_step = len(_anchors) // _num_anchors
if len(_output.shape) == 3:
_output = np.expand_dims(_output, axis=0)
batch = _output.shape[0]
assert (_output.shape[1] == (5 + _cfg.n_classes) * _num_anchors)
h = _output.shape[2]
w = _output.shape[3]
t0 = time.time()
all_boxes = []
_output = _output.reshape(batch * _num_anchors, 5 + _cfg.n_classes, h * w).transpose((1, 0, 2)).reshape(
5 + _cfg.n_classes,
batch * _num_anchors * h * w)
grid_x = np.expand_dims(np.expand_dims(np.linspace(0, w - 1, w), axis=0).repeat(h, 0), axis=0).repeat(
batch * _num_anchors, axis=0).reshape(
batch * _num_anchors * h * w)
grid_y = np.expand_dims(np.expand_dims(np.linspace(0, h - 1, h), axis=0).repeat(w, 0).T, axis=0).repeat(
batch * _num_anchors, axis=0).reshape(
batch * _num_anchors * h * w)
xs = sigmoid(_output[0]) + grid_x
ys = sigmoid(_output[1]) + grid_y
anchor_w = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 0]
anchor_h = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 1]
anchor_w = np.expand_dims(np.expand_dims(anchor_w, axis=1).repeat(batch, 1), axis=2) \
.repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
anchor_h = np.expand_dims(np.expand_dims(anchor_h, axis=1).repeat(batch, 1), axis=2) \
.repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
ws = np.exp(_output[2]) * anchor_w
hs = np.exp(_output[3]) * anchor_h
det_confs = sigmoid(_output[4])
cls_confs = softmax(_output[5:5 + _cfg.n_classes].transpose(1, 0))
cls_max_confs = np.max(cls_confs, 1)
cls_max_ids = np.argmax(cls_confs, 1)
t1 = time.time()
sz_hw = h * w
sz_hwa = sz_hw * _num_anchors
t2 = time.time()
for b in range(batch):
boxes = []
for cy in range(h):
for cx in range(w):
for i in range(_num_anchors):
ind = b * sz_hwa + i * sz_hw + cy * w + cx
det_conf = det_confs[ind]
if _only_objectness:
conf = det_confs[ind]
else:
conf = det_confs[ind] * cls_max_confs[ind]
if conf > _cfg.conf_thresh:
bcx = xs[ind]
bcy = ys[ind]
bw = ws[ind]
bh = hs[ind]
cls_max_conf = cls_max_confs[ind]
cls_max_id = cls_max_ids[ind]
box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
if (not _only_objectness) and _validation:
for c in range(_cfg.n_classes):
tmp_conf = cls_confs[ind][c]
if c != cls_max_id and det_confs[ind] * tmp_conf > _cfg.conf_thresh:
box.append(tmp_conf)
box.append(c)
boxes.append(box)
all_boxes.append(boxes)
t3 = time.time()
if False:
print('---------------------------------')
print('matrix computation : %f' % (t1 - t0))
print(' gpu to cpu : %f' % (t2 - t1))
print(' tpz filter : %f' % (t3 - t2))
print('---------------------------------')
return all_boxes
def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
import cv2
colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]])
def get_color(c, x, max_val):
ratio = float(x) / max_val * 5
i = int(math.floor(ratio))
j = int(math.ceil(ratio))
ratio = ratio - i
r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
return int(r * 255)
width = img.shape[1]
height = img.shape[0]
for i in range(len(boxes)):
box = boxes[i]
x1 = int((box[0] - box[2] / 2.0) * width)
y1 = int((box[1] - box[3] / 2.0) * height)
x2 = int((box[0] + box[2] / 2.0) * width)
y2 = int((box[1] + box[3] / 2.0) * height)
if color:
rgb = color
else:
rgb = (255, 0, 0)
if len(box) >= 7 and class_names:
cls_conf = box[5]
cls_id = box[6]
print('%s: %f' % (class_names[cls_id], cls_conf))
classes = len(class_names)
offset = cls_id * 123457 % classes
red = get_color(2, offset, classes)
green = get_color(1, offset, classes)
blue = get_color(0, offset, classes)
if color is None:
rgb = (red, green, blue)
img = cv2.putText(img, class_names[cls_id], (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 1)
img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 1)
if savename:
print("save plot results to %s" % savename)
cv2.imwrite(savename, img)
return img
def plot_boxes(_img, _boxes, _savename=None, _class_names=None):
font = ImageFont.truetype("consola.ttf", 40, encoding="unic") # set the label font
colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]])
def get_color(c, x, max_val):
ratio = float(x) / max_val * 5
i = int(math.floor(ratio))
j = int(math.ceil(ratio))
ratio = ratio - i
r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
return int(r * 255)
# width = _img.shape[1]
# height = _img.shape[0]
draw = ImageDraw.Draw(_img)
for i in range(len(_boxes)):
box = _boxes[i]
x1 = box[0]
y1 = box[1]
x2 = box[2]
y2 = box[3]
rgb = (255, 0, 0)
if len(box) >= 7 and _class_names:
cls_conf = box[5]
cls_id = box[6]
print('%s: %f' % (_class_names[cls_id], cls_conf))
classes = len(_class_names)
offset = cls_id * 123457 % classes
red = get_color(2, offset, classes)
green = get_color(1, offset, classes)
blue = get_color(0, offset, classes)
rgb = (red, green, blue)
# draw.text((x1, y1), _class_names[cls_id], fill=rgb, font=font)
draw.text((x1, y1), _class_names[cls_id], fill=rgb, font=font)
draw.rectangle([x1, y1, x2, y2], outline=rgb, width=5)
if _savename:
print("save plot results to %s" % _savename)
_img.save(_savename)
return _img
def read_truths(lab_path):
if not os.path.exists(lab_path):
return np.array([])
if os.path.getsize(lab_path):
truths = np.loadtxt(lab_path)
truths = truths.reshape(truths.size // 5, 5) # integer division (Python 3); keeps a single truth as a 1x5 array
return truths
else:
return np.array([])
def load_class_names(_namesfile):
class_names = []
with open(_namesfile, 'r') as fp:
lines = fp.readlines()
for line in lines:
line = line.rstrip()
class_names.append(line)
return class_names
def do_detect(_model, _img, _cfg, _use_cuda=1):
_model.eval()
t0 = time.time()
if isinstance(_img, Image.Image):
width = _img.width
height = _img.height
img = torch.ByteTensor(torch.ByteStorage.from_buffer(_img.tobytes()))
img = img.view(height, width, 3).transpose(0, 1).transpose(0, 2).contiguous()
img = img.view(1, 3, height, width)
img = img.float().div(255.0)
elif type(_img) == np.ndarray and len(_img.shape) == 3: # cv2 image
img = torch.from_numpy(_img.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
elif type(_img) == np.ndarray and len(_img.shape) == 4:
img = torch.from_numpy(_img.transpose(0, 3, 1, 2)).float().div(255.0)
else:
print("unknown image type")
exit(-1)
t1 = time.time()
if _use_cuda:
img = img.cuda()
img = torch.autograd.Variable(img)
t2 = time.time()
list_features = _model(img)
list_features_numpy = []
for feature in list_features:
list_features_numpy.append(feature.data.cpu().numpy())
return post_processing(_img=img, _cfg=_cfg, _list_features_numpy=list_features_numpy, _t0=t0, _t1=t1, _t2=t2)
def post_processing(_img, _cfg, _list_features_numpy, _t0, _t1, _t2):
anchor_step = len(_cfg.anchors) // _cfg.num_anchors
boxes = []
for i in range(3):
masked_anchors = []
for m in _cfg.anchor_masks[i]:
masked_anchors += _cfg.anchors[m * anchor_step:(m + 1) * anchor_step]
masked_anchors = [anchor / _cfg.strides[i] for anchor in masked_anchors]
boxes.append(get_region_boxes_out_model(_output=_list_features_numpy[i], _cfg=_cfg, _anchors=masked_anchors,
_num_anchors=len(_cfg.anchor_masks[i])))
if _img.shape[0] > 1:
bboxs_for_imgs = [
boxes[0][index] + boxes[1][index] + boxes[2][index]
for index in range(_img.shape[0])]
# run NMS separately on the results of each image
t3 = time.time()
boxes = [nms(_boxes=bboxs, _nms_thresh=_cfg.nms_thresh) for bboxs in bboxs_for_imgs]
else:
boxes = boxes[0][0] + boxes[1][0] + boxes[2][0]
t3 = time.time()
boxes = nms(boxes, _cfg.nms_thresh)
t4 = time.time()
if True:
print('-----------------------------------')
print(' image to tensor : %f' % (_t1 - _t0))
print(' tensor to cuda : %f' % (_t2 - _t1))
print(' predict : %f' % (t3 - _t2))
print(' nms : %f' % (t4 - t3))
print(' total : %f' % (t4 - _t0))
print('-----------------------------------')
return boxes
def classtxt_processing(_img, _cfg, _list_features_numpy, _t0, _t1, _t2):
anchor_step = len(_cfg.anchors) // _cfg.num_anchors
boxes = []
for i in range(3):
masked_anchors = []
for m in _cfg.anchor_masks[i]:
masked_anchors += _cfg.anchors[m * anchor_step:(m + 1) * anchor_step]
masked_anchors = [anchor / _cfg.strides[i] for anchor in masked_anchors]
boxes.append(get_classtxt_out_model(_output=_list_features_numpy[i], _cfg=_cfg, _anchors=masked_anchors,
_num_anchors=len(_cfg.anchor_masks[i])))
if _img.shape[0] > 1:
bboxs_for_imgs = [
boxes[0][index] + boxes[1][index] + boxes[2][index]
for index in range(_img.shape[0])]
# run NMS separately on the results of each image
t3 = time.time()
boxes = [nms(_boxes=bboxs, _nms_thresh=_cfg.nms_thresh) for bboxs in bboxs_for_imgs]
else:
boxes = boxes[0][0] + boxes[1][0] + boxes[2][0]
t3 = time.time()
boxes = nms(boxes, _cfg.nms_thresh)
t4 = time.time()
if True:
print('-----------------------------------')
print(' image to tensor : %f' % (_t1 - _t0))
print(' tensor to cuda : %f' % (_t2 - _t1))
print(' predict : %f' % (t3 - _t2))
print(' nms : %f' % (t4 - t3))
print(' total : %f' % (t4 - _t0))
print('-----------------------------------')
return boxes
def gen_cls_txt(_model, _img, _cfg, _use_cuda):
_model.eval()
t0 = time.time()
if isinstance(_img, Image.Image):
width = _img.width
height = _img.height
img = torch.ByteTensor(torch.ByteStorage.from_buffer(_img.tobytes()))
img = img.view(height, width, 3).transpose(0, 1).transpose(0, 2).contiguous()
img = img.view(1, 3, height, width)
img = img.float().div(255.0)
elif type(_img) == np.ndarray and len(_img.shape) == 3: # cv2 image
img = torch.from_numpy(_img.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
elif type(_img) == np.ndarray and len(_img.shape) == 4:
img = torch.from_numpy(_img.transpose(0, 3, 1, 2)).float().div(255.0)
else:
print("unknown image type")
exit(-1)
t1 = time.time()
if _use_cuda:
img = img.cuda()
img = torch.autograd.Variable(img)
t2 = time.time()
list_features = _model(img)
list_features_numpy = []
for feature in list_features:
list_features_numpy.append(feature.data.cpu().numpy())
return classtxt_processing(_img=img, _cfg=_cfg, _list_features_numpy=list_features_numpy, _t0=t0, _t1=t1, _t2=t2)
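The decoding step used by the two `get_..._out_model` functions above can be easier to follow for a single grid cell. The following is a minimal sketch, not the author's code: `decode_cell` and all numeric values are hypothetical, and the anchors are assumed to be already divided by the stride (as `post_processing` does), so they are expressed in grid units.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, as applied to the raw x/y/objectness outputs."""
    return 1.0 / (1.0 + np.exp(-x))


def decode_cell(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, grid_w, grid_h):
    """Decode one raw prediction into a normalized (center_x, center_y, w, h) box.

    tx, ty, tw, th     : raw network outputs for this cell and anchor
    cx, cy             : column and row index of the grid cell
    anchor_w, anchor_h : anchor size in grid units (anchor pixels / stride)
    grid_w, grid_h     : width and height of the feature map
    """
    bx = (sigmoid(tx) + cx) / grid_w      # center x, relative to image width
    by = (sigmoid(ty) + cy) / grid_h      # center y, relative to image height
    bw = np.exp(tw) * anchor_w / grid_w   # box width, relative to image width
    bh = np.exp(th) * anchor_h / grid_h   # box height, relative to image height
    return bx, by, bw, bh


# Hypothetical values: a 13x13 feature map, cell (3, 5), anchor of 2x4 cells.
box = decode_cell(0.0, 0.0, 0.0, 0.0, cx=3, cy=5,
                  anchor_w=2.0, anchor_h=4.0, grid_w=13, grid_h=13)
print(box)
```

With all raw outputs at zero, the sigmoid places the center in the middle of the cell and the box takes exactly the anchor size, which is a quick sanity check when debugging the vectorized version.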
# -*- coding: utf-8 -*-
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------
import xml.etree.ElementTree as ET
import os
import pickle
import numpy as np
def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = (obj.find('name').text).replace(" ", "")
        obj_struct['pose'] = obj.find('pose').text
        obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)
    return objects
# VOC2007 and VOC2012 compute AP differently; the second (all-point) method is the one generally used now
def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))
        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]
        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap
# Entry point
def yolov5_eval(detpath,  # path template of the detection files; each class's boxes are saved in its own file
                annopath,  # path template of the Annotations xml files
                imagesetfile,  # text file listing the test image names
                classname,  # class name
                cachedir,  # cache directory
                ovthresh=0.5,  # IoU threshold
                use_07_metric=False):  # AP computation method
    """rec, prec, ap = yolov5_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                cachedir,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    detpath: Path to detections
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load the ground-truth boxes
    # On the first run, the xml files under Annotations are parsed to get the
    # ground-truth boxes of every image, and the result is saved in the cache directory.
    # Later runs read the ground truth directly from the cache.
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]

    if not os.path.isfile(cachefile):
        # load annots
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(i + 1, len(imagenames)))
        # save (pickle needs binary mode under Python 3)
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            recs = pickle.load(f)

    # extract gt objects for this class
    class_recs = {}
    npos = 0  # total number of (non-difficult) ground-truth objects of this class
    for imagename in imagenames:
        # ground-truth objects of class classname in the image imagename
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])  # box coordinates
        # np.bool was removed in NumPy 1.24; the builtin bool is equivalent here
        difficult = np.array([x['difficult'] for x in R]).astype(bool)  # "difficult" flags
        det = [False] * len(R)  # det[i] records whether ground-truth box i has already been matched
        npos = npos + sum(~difficult)  # count the non-difficult objects
        class_recs[imagename] = {'bbox': bbox,  # store the boxes of every image in class_recs
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)  # detection file for class classname
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]  # image names
    confidence = np.array([float(x[1]) for x in splitlines])  # confidence scores
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])  # box coordinates

    # sort by confidence (descending)
    sorted_ind = np.argsort(-confidence)
    sorted_scores = np.sort(-confidence)
    BB = BB[sorted_ind, :]
    image_ids = [image_ids[x] for x in sorted_ind]

    # go down dets and mark TPs and FPs
    nd = len(image_ids)  # number of detected boxes
    tp = np.zeros(nd)  # TP flags, one per detection
    fp = np.zeros(nd)  # FP flags, one per detection
    for d in range(nd):
        R = class_recs[image_ids[d]]  # ground truth of the image this detection belongs to
        bb = BB[d, :].astype(float)  # coordinates of the detected box
        ovmax = -np.inf
        BBGT = R['bbox'].astype(float)  # coordinates of the ground-truth boxes in that image

        if BBGT.size > 0:
            # compute overlaps (IoU)
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            ih = np.maximum(iymax - iymin + 1., 0.)
            inters = iw * ih

            # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

            overlaps = inters / uni
            # a detection may overlap several ground-truth boxes; keep the one with the largest IoU
            ovmax = np.max(overlaps)
            jmax = np.argmax(overlaps)

        if ovmax > ovthresh:  # is the IoU above the threshold?
            if not R['difficult'][jmax]:  # skip ground-truth boxes marked as difficult
                if not R['det'][jmax]:  # has this ground-truth box been matched already?
                    tp[d] = 1.  # first match: count detection d as a TP
                    R['det'][jmax] = 1  # mark the ground-truth box as matched
                else:
                    fp[d] = 1.  # duplicate detection of the same box: FP
        else:
            fp[d] = 1.  # IoU too low: FP

    # compute precision recall
    fp = np.cumsum(fp)  # cumulative sum; the last value is the total number of FPs
    tp = np.cumsum(tp)  # cumulative sum; the last value is the total number of TPs
    rec = tp / float(npos)  # recall
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)  # precision
    ap = voc_ap(rec, prec, use_07_metric)  # AP
    return rec, prec, ap
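To see how the TP/FP flags turn into an AP value, it may help to trace the last few lines of the evaluation by hand. The sketch below re-implements only the all-point branch of `voc_ap` on made-up data; `average_precision`, the four toy detections, and `npos = 4` are assumptions for illustration, not values from the experiment.

```python
import numpy as np


def average_precision(recall, precision):
    """All-point interpolated AP (the branch used when use_07_metric is False)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum the rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


# Toy data: 4 detections already sorted by descending confidence (1 = TP, 0 = FP).
tp_flags = np.array([1.0, 0.0, 1.0, 1.0])
fp_flags = 1.0 - tp_flags
npos = 4  # assumed number of non-difficult ground-truth boxes

tp = np.cumsum(tp_flags)   # [1, 1, 2, 3]
fp = np.cumsum(fp_flags)   # [0, 1, 1, 1]
recall = tp / npos
precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = average_precision(recall, precision)
print(recall, precision, ap)
```

Because one of the four ground-truth boxes is never detected, recall stops at 0.75 and the last quarter of the recall axis contributes nothing to the area, which is why a missed object lowers AP even when every detection is correct.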
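First, run detect_eval_class_txt.py.
This saves the annotated test result images in the yolov5\data_test\predictions_manual folder, records the names of the test images in imgs_name_manual.txt, and writes the detection results of each class to the yolov5\data_test\class_txt_manual folder.
Then run compute_mAP.py.
It prints the AP, recall, and precision of each class, followed by the mAP.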
"D:\Program Files\Python38\python.exe" E:/tpz/yolov5/mAP/compute_mAP.py
Reading annotation for 1/712
Reading annotation for 101/712
Reading annotation for 201/712
Reading annotation for 301/712
Reading annotation for 401/712
Reading annotation for 501/712
Reading annotation for 601/712
Reading annotation for 701/712
Saving cached annotations to ../data_test/cachedir_manual/annots.pkl
AP for combustion_lining = 0.9992
recall for combustion_lining = 1.0000
precision for combustion_lining = 0.9951
AP for fan = 0.9968
recall for fan = 0.9968
precision for fan = 1.0000
AP for fan_stator_casing_and_support = 0.9995
recall for fan_stator_casing_and_support = 1.0000
precision for fan_stator_casing_and_support = 0.9918
AP for hpc_spool = 1.0000
recall for hpc_spool = 1.0000
precision for hpc_spool = 0.9950
AP for hpc_stage_5 = 0.9967
recall for hpc_stage_5 = 0.9967
precision for hpc_stage_5 = 0.9918
AP for hp_core_casing = 0.9951
recall for hp_core_casing = 0.9967
precision for hp_core_casing = 0.9870
AP for mixer = 0.9992
recall for mixer = 1.0000
precision for mixer = 0.9967
AP for nozzle = 0.9953
recall for nozzle = 0.9953
precision for nozzle = 0.9953
AP for nozzle_cone = 0.9984
recall for nozzle_cone = 0.9984
precision for nozzle_cone = 0.9967
AP for stand = 1.0000
recall for stand = 1.0000
precision for stand = 0.9985
Mean AP = 0.9980
~~~~~~~~
Results:
0.999
0.997
0.999
1.000
0.997
0.995
0.999
0.995
0.998
1.000
~~~~~~~~
0.998
~~~~~~~~
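The Mean AP line in the log is the arithmetic mean of the per-class AP values. As a quick check, the sketch below recomputes it from the ten AP values printed above; since those values are already rounded to four decimals, the result is only expected to agree to about that precision. The dictionary name `per_class_ap` is just for illustration.

```python
# Per-class AP values copied from the log above (rounded to 4 decimals there).
per_class_ap = {
    'combustion_lining': 0.9992,
    'fan': 0.9968,
    'fan_stator_casing_and_support': 0.9995,
    'hpc_spool': 1.0000,
    'hpc_stage_5': 0.9967,
    'hp_core_casing': 0.9951,
    'mixer': 0.9992,
    'nozzle': 0.9953,
    'nozzle_cone': 0.9984,
    'stand': 1.0000,
}

# mAP is the unweighted mean over classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print('Mean AP = {:.4f}'.format(mean_ap))
```

Note that this is an unweighted mean, so a class with few test images counts as much as a class with many.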