- 由于本人水平有限,难免出现错漏,敬请批评改正。
- 更多精彩内容,可点击进入Python日常小操作专栏、OpenCV-Python小应用专栏、YOLO系列专栏、自然语言处理专栏或我的个人主页查看
- YOLOv8 Ultralytics:使用Ultralytics框架训练RT-DETR实时目标检测模型
- 基于DETR的人脸伪装检测
- YOLOv7训练自己的数据集(口罩检测)
- YOLOv8训练自己的数据集(足球检测)
- YOLOv5:TensorRT加速YOLOv5模型推理
- YOLOv5:IoU、GIoU、DIoU、CIoU、EIoU
- 玩转Jetson Nano(五):TensorRT加速YOLOv5目标检测
- YOLOv5:添加SE、CBAM、CoordAtt、ECA注意力机制
- YOLOv5:yolov5s.yaml配置文件解读、增加小目标检测层
- Python将COCO格式实例分割数据集转换为YOLO格式实例分割数据集
- YOLOv5:使用7.0版本训练自己的实例分割模型(车辆、行人、路标、车道线等实例分割)
- 使用Kaggle GPU资源免费体验Stable Diffusion开源项目
- 熟悉Python
- Python是一种跨平台的计算机程序设计语言。是一个高层次的结合了解释性、编译性、互动性和面向对象的脚本语言。最初被设计用于编写自动化脚本(shell),随着版本的不断更新和语言新功能的添加,越多被用于独立的、大型项目的开发。
- PyTorch 是一个深度学习框架,封装好了很多网络和深度学习相关的工具方便我们调用,而不用我们一个个去单独写了。它分为 CPU 和 GPU 版本,其他框架还有 TensorFlow、Caffe 等。PyTorch 是由 Facebook 人工智能研究院(FAIR)基于 Torch 推出的,它是一个基于 Python 的可续计算包,提供两个高级功能:1、具有强大的 GPU 加速的张量计算(如 NumPy);2、构建深度神经网络时的自动微分机制。
- YOLOv5是一种单阶段目标检测算法,该算法在YOLOv4的基础上添加了一些新的改进思路,使其速度与精度都得到了极大的性能提升。它是一个在COCO数据集上预训练的物体检测架构和模型系列,代表了Ultralytics对未来视觉AI方法的开源研究,其中包含了经过数千小时的研究和开发而形成的经验教训和最佳实践。
- Labelme是一款图像标注工具,由麻省理工(MIT)的计算机科学和人工智能实验室(CSAIL)研发。它是用Python和PyQT编写的,开源且免费。Labelme支持Windows、Linux和Mac等操作系统。
- 这款工具提供了直观的图形界面,允许用户在图像上标注多种类型的目标,例如矩形框、多边形、线条等,甚至包括更复杂的形状。标注结果以JSON格式保存,便于后续处理和分析。这些标注信息可以用于目标检测、图像分割、图像分类等任务。
- 总的来说,Labelme是一款强大且易用的图像标注工具,可以满足不同的图像处理需求。
- Labelme标注json文件是一种用于存储标注信息的文件格式,它包含了以下几个主要的字段:
version
: Labelme的版本号,例如"4.5.6"。flags
: 一些全局的标志,例如是否是分割任务,是否有多边形,等等。shapes
: 一个列表,每个元素是一个字典,表示一个标注对象。每个字典包含了以下几个字段:
label
: 标注对象的类别名称,例如"dog"。points
: 一个列表,每个元素是一个坐标对,表示标注对象的边界点,例如[[10, 20], [30, 40]]。group_id
: 标注对象的分组编号,用于表示属于同一组的对象,例如1。shape_type
: 标注对象的形状类型,例如"polygon",“rectangle”,“circle”,等等。flags
: 一些针对该标注对象的标志,例如是否是难例,是否被遮挡,等等。lineColor
: 标注对象的边界线颜色,例如[0, 255, 0, 128]。fillColor
: 标注对象的填充颜色,例如[255, 0, 0, 128]。imagePath
: 图像文件的相对路径,例如"img_001.jpg"。imageData
: 图像文件的二进制数据,经过base64编码后的字符串,例如"iVBORw0KGgoAAAANSUhEUgAA…"。imageHeight
: 图像的高度,例如600。imageWidth
: 图像的宽度,例如800。
以下是一个Labelme标注json文件的示例:
{
"version": "4.5.6",
"flags": {},
"shapes": [
{
"label": "dog",
"points": [
[
121.0,
233.0
],
[
223.0,
232.0
],
[
246.0,
334.0
],
[
121.0,
337.0
]
],
"group_id": null,
"shape_type": "polygon",
"flags": {}
}
],
"lineColor": [
0,
255,
0,
128
],
"fillColor": [
255,
0,
0,
128
],
"imagePath": "img_001.jpg",
"imageData": "iVBORw0KGgoAAAANSUhEUgAA...",
"imageHeight": 600,
"imageWidth": 800
}
- Python 3.x (面向对象的高级语言)
- images:原始图片数据集所在的文件夹。
- jsons:原始Labelme标注文件所在的文件夹。
{
"version": "5.2.0.post4",
"flags": {},
"shapes": [
{
"label": "cat",
"points": [
[
161.0612244897959,
152.265306122449
],
[
610.0408163265306,
399.7142857142857
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "cat.png",
"imageData": null,
"imageHeight": 478,
"imageWidth": 766
}
{
"version": "5.2.0.post4",
"flags": {},
"shapes": [
{
"label": "flower",
"points": [
[
301.9230769230769,
19.52747252747254
],
[
452.4725274725275,
168.42857142857144
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
},
{
"label": "flower",
"points": [
[
378.2967032967033,
183.81318681318683
],
[
529.3956043956044,
364.032967032967
]
],
"group_id": null,
"description": null,
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "flower.png",
"imageData": null,
"imageHeight": 394,
"imageWidth": 850
}
{
"version": "5.2.0.post4",
"flags": {},
"shapes": [
{
"label": "swan",
"points": [
[
147.76178010471205,
212.01570680628274
],
[
294.88219895287955,
476.93717277486905
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
271.8455497382199,
243.42931937172776
],
[
342.0026178010471,
322.4869109947644
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
305.35340314136124,
215.6806282722513
],
[
394.3586387434555,
421.4397905759162
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
549.8560209424083,
202.59162303664922
],
[
655.0916230366491,
345.52356020942403
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "swan.png",
"imageData": null,
"imageHeight": 490,
"imageWidth": 795
}
import os
import cv2
import math
import json
import random
import numpy as np
def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
border=(0, 0)):
# torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
# targets = [cls, xyxy]
height = im.shape[0] + border[0] * 2 # shape(h,w,c)
width = im.shape[1] + border[1] * 2
# Center
C = np.eye(3)
C[0, 2] = -im.shape[1] / 2 # x translation (pixels)
C[1, 2] = -im.shape[0] / 2 # y translation (pixels)
# Perspective
P = np.eye(3)
P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y)
P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x)
# Rotation and Scale
R = np.eye(3)
a = random.uniform(-degrees, degrees)
# a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations
s = random.uniform(1 - scale, 1 + scale)
# s = 2 ** random.uniform(-scale, scale)
R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
# Shear
S = np.eye(3)
S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # x shear (deg)
S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # y shear (deg)
# Translation
T = np.eye(3)
T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation (pixels)
T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation (pixels)
# Combined rotation matrix
M = T @ S @ R @ P @ C # order of operations (right to left) is IMPORTANT
if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any(): # image changed
if perspective:
im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
else: # affine
im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
# Visualize
# import matplotlib.pyplot as plt
# ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()
# ax[0].imshow(im[:, :, ::-1]) # base
# ax[1].imshow(im2[:, :, ::-1]) # warped
# Transform label coordinates
n = len(targets)
if n:
use_segments = any(x.any() for x in segments)
new = np.zeros((n, 4))
# warp boxes
xy = np.ones((n * 4, 3))
xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
xy = xy @ M.T # transform
xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8) # perspective rescale or affine
# create new boxes
x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
# clip
new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
# filter candidates
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
targets = targets[i]
targets[:, 1:5] = new[i]
return im, targets
def box_candidates(box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16): # box1(4,n), box2(4,n)
# Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps)) # aspect ratio
return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr) # candidates
# 图像显示函数
def show(name, img):
cv2.namedWindow(name, 0) # 用来创建指定名称的窗口,0表示CV_WINDOW_NORMAL
# cv2.resizeWindow(name, img.shape[1], img.shape[0]); # 设置宽高大小为640*480
cv2.imshow(name, img)
cv2.waitKey(0)
cv2.destroyAllWindows()
def xyxy2xminyminxmaxymax(rect):
'''
(x1,y1,x2,y2) -> (xmin,ymin,xmax,ymax)
'''
xmin = min(rect[0],rect[2])
ymin = min(rect[1],rect[3])
xmax = max(rect[0],rect[2])
ymax = max(rect[1],rect[3])
return xmin,ymin,xmax,ymax
def read_img_json(in_img_path,in_json_path):
label_map = {'cat':0,'flower':1,'swan':2}
img = cv2.imread(in_img_path)
with open(in_json_path, "r", encoding='utf-8') as f:
# json.load数据到变量json_data
json_data = json.load(f)
labels = []
# print(json_data['shapes'])
# 读取原始jsons的 [[x1,y1],[x2,y2]]
for i in json_data['shapes']:
label = label_map[i['label']]
rect = int(i['points'][0][0]),int(i['points'][0][1]),int(i['points'][1][0]),int(i['points'][1][1]) # x1,y1,x2,y2
xmin,ymin,xmax,ymax = xyxy2xminyminxmaxymax(rect)
labels.append([label,xmin,ymin,xmax,ymax])
return img, np.array(labels)
def write_img_json(img_array,img_targets,out_img_name,out_img_path,out_json_path):
json_dict = {
"version": "4.5.6",
"flags": {},
"shapes": [],
}
label_map = {0:'cat',1:'flower',2:'swan'}
cv2.imwrite(out_img_path,img_array)
new_img_height,new_img_width = img_array.shape[0],img_array.shape[1]
for i in img_targets:
label = label_map[i[0]]
box = i[1:]
shapes_dict = {'label': '',
'points': [], # [[x1,y1],[x2,y2]]
'group_id': None,
'shape_type': 'rectangle',
'flags': {}}
shapes_dict['label'] = label
'''
将 numpy int32 对象传递给 json.dumps() 方法,但该方法默认不处理 numpy integers。
要解决该错误,请在序列化之前使用内置的 int()或 float()函数将 numpy int32 对象转换为Python intege
x1,y1,x2,y2 = box
'''
x1,y1,x2,y2 = int(box[0]),int(box[1]),int(box[2]),int(box[3])
shapes_dict['points'] = [[x1,y1],[x2,y2]]
json_dict['shapes'].append(shapes_dict)
'''
写新的json文件
'''
json_dict["imagePath"] = out_img_name
json_dict["imageData"] = None
json_dict["imageHeight"] = new_img_height
json_dict["imageWidth"] = new_img_width
# 创建一个写文件
with open(out_json_path, "w", encoding='utf-8') as f:
# 将修改后的数据写入文件
f.write(json.dumps(json_dict))
if __name__=="__main__":
# 输出图片所在文件夹
out_imgs_dir = 'out_images/'
# 输出jsons所在文件夹
out_jsons_dir = 'out_jsons/'
if not os.path.exists(out_imgs_dir):
os.mkdir(out_imgs_dir)
if not os.path.exists(out_jsons_dir):
os.mkdir(out_jsons_dir)
# 输入图片所在文件夹
in_imgs_dir = 'images/'
# 输入jsons所在文件夹
in_jsons_dir = 'jsons/'
# 输入图片名列表
file_name_list = os.listdir(in_imgs_dir)
img_name_list = [i for i in file_name_list if i.endswith('.png')]
# 输入jsons文件名列表
file_name_list = os.listdir(in_jsons_dir)
json_name_list = [i for i in file_name_list if i.endswith('.json')]
# print(img_name_list,json_name_list)
# 定义剪裁图片的左右填充数
pad = 0
for img_name,json_name in zip(img_name_list,json_name_list):
in_img_path = os.path.join(in_imgs_dir,img_name)
out_img_path = os.path.join(out_imgs_dir,img_name)
in_json_path = os.path.join(in_jsons_dir,json_name)
out_json_path = os.path.join(out_jsons_dir,json_name)
# 原始图片和标注信息labels = [[label,xmin,ymin,xmax,ymax]]
img,labels = read_img_json(in_img_path,in_json_path)
# print(img,labels)
# 随机仿射后的图片和标注信息targets = [[label,xmin,ymin,xmax,ymax]]
img_res,targets = random_perspective(img,labels)
# print(img_res,targets)
write_img_json(img_res,targets,img_name,out_img_path,out_json_path)
- out_images:随机仿射变换后的图片所在的文件夹。
- out_jsons:随机仿射变换后的Labelme标注文件所在的文件夹。
{
"version": "4.5.6",
"flags": {},
"shapes": [
{
"label": "cat",
"points": [
[
211,
94
],
[
700,
374
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "cat.png",
"imageData": null,
"imageHeight": 478,
"imageWidth": 766
}
{
"version": "4.5.6",
"flags": {},
"shapes": [
{
"label": "flower",
"points": [
[
266,
49
],
[
421,
193
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
},
{
"label": "flower",
"points": [
[
355,
205
],
[
515,
379
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "flower.png",
"imageData": null,
"imageHeight": 394,
"imageWidth": 850
}
{
"version": "4.5.6",
"flags": {},
"shapes": [
{
"label": "swan",
"points": [
[
202,
135
],
[
363,
417
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
329,
182
],
[
404,
269
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
362,
158
],
[
461,
374
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
},
{
"label": "swan",
"points": [
[
607,
175
],
[
721,
331
]
],
"group_id": null,
"shape_type": "rectangle",
"flags": {}
}
],
"imagePath": "swan.png",
"imageData": null,
"imageHeight": 490,
"imageWidth": 795
}
- 由于本人水平有限,难免出现错漏,敬请批评改正。
- 更多精彩内容,可点击进入Python日常小操作专栏、OpenCV-Python小应用专栏、YOLO系列专栏、自然语言处理专栏或我的个人主页查看
- YOLOv8 Ultralytics:使用Ultralytics框架训练RT-DETR实时目标检测模型
- 基于DETR的人脸伪装检测
- YOLOv7训练自己的数据集(口罩检测)
- YOLOv8训练自己的数据集(足球检测)
- YOLOv5:TensorRT加速YOLOv5模型推理
- YOLOv5:IoU、GIoU、DIoU、CIoU、EIoU
- 玩转Jetson Nano(五):TensorRT加速YOLOv5目标检测
- YOLOv5:添加SE、CBAM、CoordAtt、ECA注意力机制
- YOLOv5:yolov5s.yaml配置文件解读、增加小目标检测层
- Python将COCO格式实例分割数据集转换为YOLO格式实例分割数据集
- YOLOv5:使用7.0版本训练自己的实例分割模型(车辆、行人、路标、车道线等实例分割)
- 使用Kaggle GPU资源免费体验Stable Diffusion开源项目