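References:
- "PaddlePaddle Newcomer Competition: Steel Defect Detection Challenge, 1st-place solution" (FasterRCNN + Swin)
- "PaddlePaddle Newcomer Competition: Steel Defect Detection Challenge, 2nd-place solution" (FasterRCNN)
- "PaddlePaddle Newcomer Competition: Steel Defect Detection Challenge, 3rd-place solution"
Submission content and format: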
image_id bbox category_id confidence
1400 [0, 0, 0, 0] 0 1
The fields have the following meanings:
"PaddleDetection: Mask R-CNN related structures and optimizers"
%cd ~/work
#!git clone https://github.com/PaddlePaddle/PaddleDetection.git
# If cloning from GitHub is slow, try Gitee instead
#git clone https://gitee.com/paddlepaddle/PaddleDetection
# Install the other dependencies
%cd PaddleDetection
!pip install -r requirements.txt
# Build and install paddledet
!python setup.py install
# After installation, confirm that the tests pass:
!python ppdet/modeling/tests/test_architectures.py
W1001 15:08:57.768669 1185 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1001 15:08:57.773610 1185 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
.......
----------------------------------------------------------------------
Ran 7 tests in 2.142s
OK
# Unzip into the dataset folder under work
!mkdir dataset
!unzip ../data/test.zip -d dataset
!unzip ../data/train.zip -d dataset
# Rename the training folders to annotations and images
!mv dataset/train/IMAGES dataset/train/images
!mv dataset/train/ANNOTATIONS dataset/train/annotations
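All of the code for PaddleDetection's data-processing module lives in ppdet/data/. The module loads data and converts it into the format that object-detection models need for training, evaluation, and inference. Datasets are defined under the source directory, where dataset.py defines the dataset base class DetDataset. Every dataset inherits from this base class, and one of the methods it defines is parse_dataset.
Based on the dataset settings:
- dataset_dir
- image_dir
- anno_path
parse_dataset collects all samples and stores them in a list called roidbs. Each element of the list is one sample, xxx_rec (for example coco_rec or voc_rec), represented as a dict with fields such as image, gt_bbox, and gt_class. For the COCO and Pascal VOC datasets, xxx_rec is structured as follows: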
xxx_rec = {
    'im_file': im_fname,          # full path of the image
    'im_id': np.array([img_id]),  # ID of the image
    'h': im_h,                    # image height
    'w': im_w,                    # image width
    'is_crowd': is_crowd,         # whether the object is a crowd, default 0 (not present in VOC)
    'gt_class': gt_class,         # class ID of each ground-truth box
    'gt_bbox': gt_bbox,           # ground-truth box coordinates (xmin, ymin, xmax, ymax)
    'gt_poly': gt_poly,           # segmentation mask, only in coco_rec, default None
    'difficult': difficult        # whether the sample is difficult, only in voc_rec, default 0
}
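The contents of xxx_rec can also be controlled through the data_fields parameter of DetDataset, which filters out fields that are not needed. In most cases no change is required and the defaults under configs/dataset are fine.
In addition, parse_dataset keeps a dict named cname2cid that maps class names to ids. For COCO datasets, the class names are loaded from the annotation file through the COCO API and used to fill this dict. For VOC datasets, setting use_default_label=False reads the class list from label_list.txt; otherwise the default VOC class list is used.
Reference: "How to Prepare Training Data"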
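As a rough illustration of the name-to-id mapping described above, the sketch below builds a cname2cid-style dict from an ordered list of class names. It is not PaddleDetection's own code; the helper name build_cname2cid is hypothetical, and the class list simply reuses the six defect names that appear later in this post.

```python
def build_cname2cid(label_names):
    """Map each class name to a contiguous integer id, in list order."""
    return {name: idx for idx, name in enumerate(label_names)}


# Class names as they would appear, one per line, in a label_list.txt file.
labels = ['crazing', 'inclusion', 'pitted_surface',
          'scratches', 'patches', 'rolled-in_scale']

cname2cid = build_cname2cid(labels)
print(cname2cid['scratches'])
```

The order of the list matters: changing the order of the lines in the label file changes every id, so the same file should be used for training and inference.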
Trying a COCO-format dataset
dataset/xxx/
├── annotations
│ ├── train.json # COCO annotation file
│ ├── valid.json # COCO annotation file
├── images
│ ├── xxx1.jpg
│ ├── xxx2.jpg
│ ├── xxx3.jpg
│ | ...
...
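PaddleDetection provides x2coco.py under ./tools/, which converts VOC datasets, labelme-annotated datasets, or Cityscapes datasets into COCO data (generating the JSON annotation file). That felt like too much trouble; training directly on the VOC format is easier. I also considered a custom dataset (see "Data Processing Module", overriding parse_dataset), which looked like a lot of work as well.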
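Preparing a VOC dataset (the simplest option; the only slightly tedious part is generating the txt files)
Following the VOC directory layout, create a VOCdevkit folder and enter it, then create a VOC2007 folder and enter it. Inside, create the Annotations, JPEGImages, and ImageSets folders, and finally create a Main folder inside ImageSets. That completes the VOC directory structure.
Copy all xml annotation files under train/annotations/xmls and val/annotations/xmls (if there is a validation set) into VOCdevkit/VOC2007/Annotations.
Copy all images under train/images/ and val/images/ into VOCdevkit/VOC2007/JPEGImages.
Finally, write the trainval.txt and test.txt files in the dataset root (pandas can do this, as covered shortly).
Generating the VOC directory layout:
If moving files afterwards seems tedious, create the VOC directories first, unzip the dataset into VOC2007, and rename its image and annotation folders to JPEGImages and Annotations respectively.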
%cd work
!mkdir VOCdevkit
%cd VOCdevkit
!mkdir VOC2007
%cd VOC2007
!mkdir Annotations JPEGImages ImageSets
%cd ImageSets
!mkdir Main
%cd ../../
The dataset does not come with trainval.txt and val.txt, so they have to be generated. Doing this with pandas is more intuitive.
# Walk the image and annotation folders and collect every file with the expected extension
import os
import pandas as pd
ls_xml, ls_image = [], []
for xml in os.listdir('dataset/train/annotations'):
    if xml.endswith('.xml'):
        ls_xml.append(xml)
for image in os.listdir('dataset/train/images'):
    if image.endswith('.jpg'):
        ls_image.append(image)
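After reading the list of xml files and the list of image file names, sort both first so that each image lines up with its annotation.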
df=pd.DataFrame(ls_image,columns=['image'])
df.sort_values('image',inplace=True)
df=df.reset_index(drop=True)
s=pd.Series(ls_xml).sort_values().reset_index(drop=True)
df['xml']=s
df.head(3)
image xml
0 0.jpg 0.xml
1 1.jpg 1.xml
2 10.jpg 10.xml
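Sorting the two lists independently only lines them up when every image has exactly one annotation with the same stem. A slightly more defensive sketch, under that caveat, pairs files by stem explicitly. The helper name pair_by_stem and the sample file names are made up for illustration.

```python
import os


def pair_by_stem(image_names, xml_names):
    """Pair image and annotation file names that share the same stem."""
    xml_by_stem = {os.path.splitext(x)[0]: x for x in xml_names}
    pairs = []
    for img in sorted(image_names):
        stem = os.path.splitext(img)[0]
        if stem in xml_by_stem:  # skip images that have no annotation
            pairs.append((img, xml_by_stem[stem]))
    return pairs


# '7.jpg' has no matching xml file, so it is dropped instead of shifting rows.
pairs = pair_by_stem(['10.jpg', '0.jpg', '1.jpg', '7.jpg'],
                     ['1.xml', '0.xml', '10.xml'])
print(pairs)
```

The resulting list of tuples can be passed straight to pd.DataFrame(pairs, columns=['image', 'xml']).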
During training the file paths are relative, so the prefixes VOC2007/JPEGImages/ and VOC2007/Annotations/ have to be added.
%cd VOCdevkit
voc=df.sample(frac=1)
voc.image=voc.image.apply(lambda x : 'VOC2007/JPEGImages/'+str(x))
voc.xml=voc.xml.apply(lambda x : 'VOC2007/Annotations/'+str(x))
voc.to_csv('trainval.txt',sep=' ',index=0,header=0)
# Split into training and validation sets and save as txt files, with a space between the two columns
train_df=voc[:1200]
val_df=voc[1200:]
train_df.to_csv('train.txt',sep=' ',index=0,header=0)
val_df.to_csv('val.txt',sep=' ',index=0,header=0)
!cp -r train/annotations/* ../VOCdevkit/VOC2007/Annotations
!cp -r train/images/* ../VOCdevkit/VOC2007/JPEGImages
Inspect one image:
from PIL import Image
image = Image.open('dataset/train/images/0.jpg')
print('width: ', image.width)
print('height: ', image.height)
print('size: ', image.size)
print('mode: ', image.mode)
print('format: ', image.format)
print('category: ', getattr(image, 'category', None))  # attribute is missing in newer Pillow releases
print('readonly: ', image.readonly)
print('info: ', image.info)
image.show()
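Work on a copy of configs/ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml (and likewise for the other files), so that a bad edit can always be reverted.
configs/datasets/voc.yml does not need to be copied, which saves rewriting it next time. After modification it looks as follows. (It is best not to set the dataset_dir field in TestDataset; otherwise, running infer.py with save_results=True fails with "label_list label_list.txt not a file".)
metric: VOC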
map_type: 11point
num_classes: 6
TrainDataset:
  !VOCDataSet
    dataset_dir: ../VOCdevkit
    anno_path: train.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
EvalDataset:
  !VOCDataSet
    dataset_dir: ../VOCdevkit
    anno_path: val.txt
    label_list: label_list.txt
    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
TestDataset:
  !ImageFolder
    anno_path: ../VOCdevkit/label_list.txt
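The map_type: 11point setting refers to the 11-point interpolated average precision used by the VOC metric. The sketch below shows the idea on toy numbers. It is not PaddleDetection's implementation; the function name ap_11point and the precision/recall values are made up for illustration.

```python
def ap_11point(recalls, precisions):
    """11-point interpolated AP.

    For each recall threshold t in {0.0, 0.1, ..., 1.0}, take the best
    precision among points whose recall is at least t (0 if there is none),
    then average the 11 values.
    """
    total = 0.0
    for k in range(11):
        t = k / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


# Toy precision/recall points for a single class.
recalls = [0.2, 0.4, 0.6, 0.8]
precisions = [1.0, 0.8, 0.6, 0.5]
ap = ap_11point(recalls, precisions)
print(round(ap, 4))
```

mAP is then the mean of this per-class value over all classes, computed at the IoU threshold shown in the log (0.50).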
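--amp: enables mixed-precision training.
Multi-Scale Training: the YOLO authors argue that a fixed network input size limits robustness, so they train at multiple scales (the scheme was introduced with YOLOv2, whose Darknet-19 backbone downsamples the image by a factor of 32, and it carries over to YOLOv3). Concretely, every 10 batches during training a new input size is chosen at random from [320, 352, ..., 608], which is why the sizes are multiples of 32.
In configs/_base_/yolov3_reader.yml, the target_size of BatchRandomResize in TrainReader lists the allowed sizes. After training, for evaluation or prediction, set the target_size of Resize in EvalReader and TestReader to the matching size. If the model is to be exported (export_model), set image_shape in TestReader to the corresponding input size.
ppyoloe_plus_reader.yml is modified as follows (the images are all small, so the default input size was changed):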
worker_num: 4
eval_height: &eval_height 224
eval_width: &eval_width 224
eval_size: &eval_size [*eval_height, *eval_width]
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [96, 128, 160, 192, 224, 256, 288, 320, 352], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
    - PadGT: {}
  batch_size: 16
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true
EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 16
TestReader:
  inputs_def:
    image_shape: [3, *eval_height, *eval_width]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *eval_size, keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
    - Permute: {}
  batch_size: 1  # best kept at 1, explained later
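To make the multi-scale idea concrete, here is a minimal sketch of a size schedule that re-draws the input size every few batches from the target_size list in the reader config. It follows the "every 10 batches" description in the text rather than the actual BatchRandomResize code, which may draw a size for every batch; the function name size_schedule is hypothetical.

```python
import random


def size_schedule(num_batches, sizes, every, rng):
    """Return the input size used for each batch, re-drawn every `every` batches."""
    schedule = []
    current = None
    for batch_idx in range(num_batches):
        if batch_idx % every == 0:
            current = rng.choice(sizes)  # pick a new square input size
        schedule.append(current)
    return schedule


# Sizes taken from target_size in the TrainReader config.
sizes = [96, 128, 160, 192, 224, 256, 288, 320, 352]
rng = random.Random(0)  # fixed seed so the schedule is reproducible
schedule = size_schedule(30, sizes, every=10, rng=rng)
print(schedule[0], schedule[10], schedule[20])
```

Every candidate size is a multiple of 32, which keeps the feature maps aligned with the network's total stride.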
Training parameter list (view it with --help):
FLAG | Supported scripts | Purpose | Default | Notes |
---|---|---|---|---|
-c | ALL | Config file to use | None | Required, e.g. -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml |
-o | ALL | Set or override parameters in the config file | None | Takes priority over the file given with -c, e.g. -o use_gpu=False |
--eval | train | Evaluate while training | False | Pass --eval to enable |
-r/--resume_checkpoint | train | Checkpoint path to resume training from | None | e.g. -r output/faster_rcnn_r50_1x_coco/10000 |
--slim_config | ALL | Config file for the model-compression strategy | None | e.g. --slim_config configs/slim/prune/yolov3_prune_l1_norm.yml |
--use_vdl | train/infer | Log data with VisualDL so it can be shown in the VisualDL dashboard | False | VisualDL requires Python>=3.5 |
--vdl_log_dir | train/infer | Directory where VisualDL logs are stored | train: vdl_log_dir/scalar, infer: vdl_log_dir/image | VisualDL requires Python>=3.5 |
--output_eval | eval | Path for the JSON saved during evaluation | None | e.g. --output_eval=eval_output; defaults to the current directory |
--json_eval | eval | Evaluate from an existing bbox.json or mask.json | False | Pass --json_eval to enable; the JSON path is set with --output_eval |
--classwise | eval | Evaluate per-class AP and plot per-class PR curves | False | Pass --classwise to enable |
--output_dir | infer/export_model | Directory for prediction results or the exported model | ./output | e.g. --output_dir=output |
--draw_threshold | infer | Score threshold for visualization | 0.5 | e.g. --draw_threshold=0.7 |
--infer_dir | infer | Directory of images to predict on | None | At least one of --infer_img and --infer_dir must be set |
--infer_img | infer | Path of the image to predict on | None | At least one of --infer_img and --infer_dir must be set; infer_img takes priority |
--save_results | infer | Save each image's predictions to a file in the output folder | False | Optional |
# lr=0.0002,epoch=40,time=2572s
%cd ~/work/PaddleDetection/
!python -u tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco-Copy1.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval \
--amp
[10/02 02:51:09] ppdet.engine INFO: Epoch: [45] [ 0/75] learning_rate: 0.000085 loss: 1.671754 loss_cls: 0.781011 loss_iou: 0.174941 loss_dfl: 0.880325 loss_l1: 0.378740 eta: 0:10:17 batch_cost: 0.5481 data_cost: 0.0049 ips: 29.1918 images/s
[10/02 02:51:58] ppdet.engine INFO: Epoch: [46] [ 0/75] learning_rate: 0.000080 loss: 1.672976 loss_cls: 0.782366 loss_iou: 0.173936 loss_dfl: 0.885129 loss_l1: 0.378303 eta: 0:09:36 batch_cost: 0.5459 data_cost: 0.0049 ips: 29.3068 images/s
[10/02 02:52:48] ppdet.engine INFO: Epoch: [47] [ 0/75] learning_rate: 0.000075 loss: 1.679924 loss_cls: 0.791866 loss_iou: 0.173251 loss_dfl: 0.892923 loss_l1: 0.371434 eta: 0:08:55 batch_cost: 0.5490 data_cost: 0.0049 ips: 29.1445 images/s
[10/02 02:53:37] ppdet.engine INFO: Epoch: [48] [ 0/75] learning_rate: 0.000069 loss: 1.669277 loss_cls: 0.785255 loss_iou: 0.173793 loss_dfl: 0.879943 loss_l1: 0.384001 eta: 0:08:14 batch_cost: 0.5546 data_cost: 0.0072 ips: 28.8511 images/s
[10/02 02:54:25] ppdet.engine INFO: Epoch: [49] [ 0/75] learning_rate: 0.000064 loss: 1.653534 loss_cls: 0.783021 loss_iou: 0.173808 loss_dfl: 0.865887 loss_l1: 0.377161 eta: 0:07:32 batch_cost: 0.5445 data_cost: 0.0080 ips: 29.3847 images/s
[10/02 02:55:18] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:55:19] ppdet.engine INFO: Eval iter: 0
[10/02 02:55:24] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 02:55:24] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 77.23%
[10/02 02:55:24] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.900204541994455
[10/02 02:55:24] ppdet.engine INFO: Best test bbox ap is 0.772.
[10/02 02:55:30] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:55:31] ppdet.engine INFO: Epoch: [50] [ 0/75] learning_rate: 0.000059 loss: 1.639129 loss_cls: 0.777811 loss_iou: 0.172295 loss_dfl: 0.865005 loss_l1: 0.371313 eta: 0:06:51 batch_cost: 0.5446 data_cost: 0.0056 ips: 29.3792 images/s
[10/02 02:56:18] ppdet.engine INFO: Epoch: [51] [ 0/75] learning_rate: 0.000054 loss: 1.643418 loss_cls: 0.775152 loss_iou: 0.170642 loss_dfl: 0.866506 loss_l1: 0.374402 eta: 0:06:10 batch_cost: 0.5376 data_cost: 0.0058 ips: 29.7619 images/s
[10/02 02:57:07] ppdet.engine INFO: Epoch: [52] [ 0/75] learning_rate: 0.000050 loss: 1.652525 loss_cls: 0.774686 loss_iou: 0.170963 loss_dfl: 0.863157 loss_l1: 0.375742 eta: 0:05:28 batch_cost: 0.5396 data_cost: 0.0068 ips: 29.6510 images/s
[10/02 02:57:56] ppdet.engine INFO: Epoch: [53] [ 0/75] learning_rate: 0.000045 loss: 1.627508 loss_cls: 0.768282 loss_iou: 0.168563 loss_dfl: 0.865570 loss_l1: 0.368651 eta: 0:04:47 batch_cost: 0.5505 data_cost: 0.0093 ips: 29.0646 images/s
[10/02 02:58:45] ppdet.engine INFO: Epoch: [54] [ 0/75] learning_rate: 0.000041 loss: 1.630234 loss_cls: 0.768148 loss_iou: 0.168092 loss_dfl: 0.868954 loss_l1: 0.361416 eta: 0:04:06 batch_cost: 0.5521 data_cost: 0.0096 ips: 28.9806 images/s
[10/02 02:59:39] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 02:59:40] ppdet.engine INFO: Eval iter: 0
[10/02 02:59:45] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 02:59:45] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 76.88%
[10/02 02:59:45] ppdet.engine INFO: Total sample number: 200, averge FPS: 34.83328737953738
[10/02 02:59:45] ppdet.engine INFO: Best test bbox ap is 0.772.
[10/02 02:59:47] ppdet.engine INFO: Epoch: [55] [ 0/75] learning_rate: 0.000037 loss: 1.630969 loss_cls: 0.772440 loss_iou: 0.170143 loss_dfl: 0.868614 loss_l1: 0.365805 eta: 0:03:25 batch_cost: 0.5482 data_cost: 0.0057 ips: 29.1862 images/s
[10/02 03:00:35] ppdet.engine INFO: Epoch: [56] [ 0/75] learning_rate: 0.000033 loss: 1.637988 loss_cls: 0.769367 loss_iou: 0.170256 loss_dfl: 0.869540 loss_l1: 0.361416 eta: 0:02:44 batch_cost: 0.5446 data_cost: 0.0055 ips: 29.3816 images/s
[10/02 03:01:24] ppdet.engine INFO: Epoch: [57] [ 0/75] learning_rate: 0.000029 loss: 1.627233 loss_cls: 0.764908 loss_iou: 0.166364 loss_dfl: 0.872990 loss_l1: 0.351342 eta: 0:02:03 batch_cost: 0.5433 data_cost: 0.0054 ips: 29.4497 images/s
[10/02 03:02:12] ppdet.engine INFO: Epoch: [58] [ 0/75] learning_rate: 0.000025 loss: 1.621432 loss_cls: 0.766320 loss_iou: 0.165519 loss_dfl: 0.872478 loss_l1: 0.342992 eta: 0:01:22 batch_cost: 0.5474 data_cost: 0.0084 ips: 29.2273 images/s
[10/02 03:03:01] ppdet.engine INFO: Epoch: [59] [ 0/75] learning_rate: 0.000022 loss: 1.618331 loss_cls: 0.764125 loss_iou: 0.167583 loss_dfl: 0.870914 loss_l1: 0.356742 eta: 0:00:41 batch_cost: 0.5461 data_cost: 0.0093 ips: 29.2967 images/s
[10/02 03:03:50] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/02 03:03:50] ppdet.engine INFO: Eval iter: 0
[10/02 03:03:55] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/02 03:03:55] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 77.04%
[10/02 03:03:55] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.8503525942714
[10/02 03:03:55] ppdet.engine INFO: Best test bbox ap is 0.772.
60 epochs took 3571 s in total, roughly one minute per epoch.
# bs=16,lr=0.00025,epoch=60,time=3571s
%cd ~/work/PaddleDetection/
!python -u tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco-Copy1.yml \
--use_vdl=true \
--vdl_log_dir=vdl_dir/scalar \
--eval --amp \
-o output_dir=output/ppyoloe_l_plus_10e \
snapshot_epoch=2
[10/01 16:52:44] ppdet.engine INFO: Epoch: [50] [ 0/87] learning_rate: 0.000061 loss: 1.627664 loss_cls: 0.770285 loss_iou: 0.169481 loss_dfl: 0.865384 loss_l1: 0.369002 eta: 0:07:53 batch_cost: 0.5706 data_cost: 0.0075 ips: 28.0426 images/s
[10/01 16:53:44] ppdet.engine INFO: Epoch: [51] [ 0/87] learning_rate: 0.000056 loss: 1.632395 loss_cls: 0.771937 loss_iou: 0.170442 loss_dfl: 0.873482 loss_l1: 0.374619 eta: 0:07:07 batch_cost: 0.5764 data_cost: 0.0108 ips: 27.7600 images/s
[10/01 16:54:41] ppdet.engine INFO: Epoch: [52] [ 0/87] learning_rate: 0.000051 loss: 1.632395 loss_cls: 0.770529 loss_iou: 0.171579 loss_dfl: 0.871368 loss_l1: 0.374107 eta: 0:06:19 batch_cost: 0.5642 data_cost: 0.0099 ips: 28.3589 images/s
[10/01 16:55:38] ppdet.engine INFO: Epoch: [53] [ 0/87] learning_rate: 0.000046 loss: 1.612642 loss_cls: 0.753961 loss_iou: 0.171108 loss_dfl: 0.863146 loss_l1: 0.360852 eta: 0:05:32 batch_cost: 0.5599 data_cost: 0.0120 ips: 28.5753 images/s
[10/01 16:56:34] ppdet.engine INFO: Epoch: [54] [ 0/87] learning_rate: 0.000042 loss: 1.606246 loss_cls: 0.752236 loss_iou: 0.168287 loss_dfl: 0.849859 loss_l1: 0.353835 eta: 0:04:44 batch_cost: 0.5536 data_cost: 0.0086 ips: 28.9019 images/s
[10/01 16:57:30] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 16:57:31] ppdet.engine INFO: Eval iter: 0
[10/01 16:57:36] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/01 16:57:36] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 84.07%
[10/01 16:57:36] ppdet.engine INFO: Total sample number: 200, averge FPS: 35.00615985304822
[10/01 16:57:36] ppdet.engine INFO: Best test bbox ap is 0.841.
[10/01 16:57:42] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 16:57:43] ppdet.engine INFO: Epoch: [55] [ 0/87] learning_rate: 0.000038 loss: 1.600404 loss_cls: 0.750651 loss_iou: 0.167692 loss_dfl: 0.850050 loss_l1: 0.356826 eta: 0:03:57 batch_cost: 0.5343 data_cost: 0.0044 ips: 29.9444 images/s
[10/01 16:58:42] ppdet.engine INFO: Epoch: [56] [ 0/87] learning_rate: 0.000034 loss: 1.598746 loss_cls: 0.750651 loss_iou: 0.167707 loss_dfl: 0.855518 loss_l1: 0.357439 eta: 0:03:09 batch_cost: 0.5491 data_cost: 0.0042 ips: 29.1369 images/s
[10/01 16:59:38] ppdet.engine INFO: Epoch: [57] [ 0/87] learning_rate: 0.000030 loss: 1.617750 loss_cls: 0.758843 loss_iou: 0.169149 loss_dfl: 0.867757 loss_l1: 0.359384 eta: 0:02:22 batch_cost: 0.5589 data_cost: 0.0055 ips: 28.6267 images/s
[10/01 17:00:33] ppdet.engine INFO: Epoch: [58] [ 0/87] learning_rate: 0.000026 loss: 1.615083 loss_cls: 0.764672 loss_iou: 0.169149 loss_dfl: 0.866775 loss_l1: 0.358770 eta: 0:01:34 batch_cost: 0.5437 data_cost: 0.0114 ips: 29.4296 images/s
[10/01 17:01:29] ppdet.engine INFO: Epoch: [59] [ 0/87] learning_rate: 0.000023 loss: 1.600423 loss_cls: 0.762148 loss_iou: 0.168355 loss_dfl: 0.865254 loss_l1: 0.353518 eta: 0:00:47 batch_cost: 0.5460 data_cost: 0.0144 ips: 29.3045 images/s
[10/01 17:02:26] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
[10/01 17:02:27] ppdet.engine INFO: Eval iter: 0
[10/01 17:02:32] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/01 17:02:32] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 84.71%
[10/01 17:02:32] ppdet.engine INFO: Total sample number: 200, averge FPS: 36.10603852107679
[10/01 17:02:32] ppdet.engine INFO: Best test bbox ap is 0.847.
[10/01 17:02:38] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_plus_crn_l_80e_coco-Copy1
"How to Use VisualDL Visualization"
!visualdl --logdir PaddleDetection/vdl_dir/scalar
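The dashboard cannot be opened this way, because the notebook runs on someone else's server.
- --draw_threshold: score threshold for visualization; by default boxes scoring above 0.5 are drawn.
- keep_top_k: the maximum number of output objects, 100 by default; adjust it to suit your case.
- --save_txt=True used to write the bboxes to a txt file. Newer versions dropped --save_txt in favour of --save_results=True, which stores the bboxes in a json file.
Only the ppyoloes_plus_80e folder was kept; everything else was deleted.
!python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco-Copy1.yml \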
--infer_dir=../VOCdevkit/test/images \
--output_dir=infer_output/ \
-o weights=output/ppyoloe_l_plus_80e/best_model.pdparams \
--draw_threshold=0.3 \
--save_results=True
from PIL import Image
image_test='infer_output/1406.jpg'
image = Image.open(image_test)
image.show()
import glob
import os
import json
import pandas as pd
class Result(object):
    def __init__(self):
        self.imagesPath = '/home/aistudio/work/VOCdevkit/test/images'
        self.bboxPath = '/home/aistudio/work/PaddleDetection/infer_output3/bbox.json'
        self.submissionPath = '/home/aistudio/work/submission.csv'
    def run(self):
        images = self.get_image_ids()
        bbox = self.get_bbox()
        results = []
        for i in range(400):  # one entry per test image
            image_id = images[i]
            for j in range(len(bbox['bbox'][i])):
                # round each box coordinate to 4 decimals
                bbox_ = [round(v, 4) for v in bbox['bbox'][i][j]]
                item = [
                    image_id,
                    bbox_,
                    int(bbox['label'][i][j]),
                    round(bbox['score'][i][j], 2)
                ]
                results.append(item)
        submit = pd.DataFrame(results, columns=['image_id', 'bbox', 'category_id', 'confidence'])
        submit[['image_id', 'bbox', 'category_id', 'confidence']].to_csv(self.submissionPath, index=False)
    def get_image_ids(self):
        # collect the file stems of all jpg images, sorted as strings
        idx = []
        for image in os.listdir(self.imagesPath):
            if image.split('.')[1] == 'jpg':
                idx.append(image.split('.')[0])
        idx.sort()
        return idx
    def get_bbox(self):
        with open(self.bboxPath, 'r', encoding='utf-8') as f:
            return json.load(f)
resultObj = Result()
resultObj.run()
The generated csv file contains 300 detections per image. Keeping only those with score > 0.3 leaves 1302 boxes as the final result. The final score is 41.32.
import pandas as pd
df=pd.read_csv('../submission.csv')
df_demo=df.loc[df.confidence>0.3]
df_demo.to_csv('submission.csv',index=None)  # saved under the PaddleDetection folder
df_demo
image_id bbox category_id confidence
0 1400 [5.4677, 0.3653, 199.2925, 61.0883] 0 0.54
1 1400 [2.2173, 71.8166, 195.2088, 131.9529] 0 0.47
2 1400 [0.6983, 26.4009, 200.0532, 131.7431] 0 0.44
3 1400 [21.8238, 151.2348, 187.4655, 199.7138] 0 0.32
343 1401 [128.7988, 43.0498, 181.4566, 196.0749] 1 0.89
... ... ... ... ...
119029 1797 [10.5545, 124.7763, 121.6406, 187.9] 0 0.33
119030 1797 [136.0446, 89.9455, 199.4311, 198.8453] 0 0.32
119031 1797 [12.9682, 91.6822, 199.3519, 193.211] 0 0.31
119393 1798 [0.2173, 0.4157, 199.9586, 160.8067] 2 0.83
119626 1799 [5.0449, 107.328, 198.9616, 185.1402] 0 0.39
import numpy as np
from tqdm.notebook import tqdm
tqdm.pandas()
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import glob
import shutil
import sys
sys.path.append('../input/paddleirondetection')
from joblib import Parallel, delayed
from IPython.display import display
/kaggle/working
!mkdir dataset
!cp -r ../input/paddleirondetection/test/test dataset
!cp -r ../input/paddleirondetection/train/train/IMAGES dataset # 直接在
!cp -r ../input/paddleirondetection/train/train/ANNOTATIONS dataset
!mv ./dataset/ANNOTATIONS ./dataset/Annotations
!mv ./dataset/IMAGES ./dataset/images
!ls dataset/images
# Walk the image and annotation folders and collect every file with the expected extension
import os
import pandas as pd
ls_xml, ls_image = [], []
for xml in os.listdir('../input/paddleirondetection/train/train/ANNOTATIONS'):
    if xml.endswith('.xml'):
        ls_xml.append(xml)
for image in os.listdir('../input/paddleirondetection/train/train/IMAGES'):
    if image.endswith('.jpg'):
        ls_image.append(image)
df=pd.DataFrame(ls_image,columns=['image'])
df.sort_values('image',inplace=True)
df=df.reset_index(drop=True)
s=pd.Series(ls_xml).sort_values().reset_index(drop=True)
df['xml']=s
df.head(3)
image xml
0 0.jpg 0.xml
1 1.jpg 1.xml
2 10.jpg 10.xml
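Write the label_list.txt file. echo -e makes the shell interpret escape sequences such as '\n' as the corresponding special characters. (This file was used for the earlier VOC dataset and can be ignored here.)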
!echo -e "crazing\ninclusion\npitted_surface\nscratches\npatches\nrolled-in_scale" > dataset/label_list.txt
!cat dataset/label_list.txt
crazing
inclusion
pitted_surface
scratches
patches
rolled-in_scale
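VOC annotations store each box as [xmin, ymin, xmax, ymax]. In a YOLO label file, each line holds cls followed by [x_center, y_center, w, h], all normalized: x_center and the box width w are divided by the image width, and y_center and the box height h are divided by the image height, so x, y, w, and h all fall in [0, 1]. For example:
5 0.6075 0.14250000000000002 0.775 0.165
5 0.505 0.6825 0.79 0.525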
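The normalization described above can be worked through on a single box. This is a minimal sketch of the plain formula; the function name voc_to_yolo and the 200x200 image with box (40, 20, 160, 100) are made up for illustration. Note that the conversion script used in this post additionally subtracts 1 from the box centre before normalizing, so its numbers differ slightly from the plain formula.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a corner-format box to normalized (x_center, y_center, w, h)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return x_center, y_center, w, h


# A 200x200 image with one box spanning x in [40, 160] and y in [20, 100].
x, y, w, h = voc_to_yolo(200, 200, 40, 20, 160, 100)
print(x, y, w, h)
```

Multiplying the four values back by the image width and height recovers the centre and size in pixels, which is a quick way to sanity-check a label file.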
The conversion code below comes from the objectDetectionDatasets project on GitHub:
#!pip install mmcv
import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
classes = ['crazing','inclusion','pitted_surface','scratches','patches','rolled-in_scale']
def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax)
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    # keep the normalized width and height strictly below 1
    if w>=1:
        w=0.99
    if h>=1:
        h=0.99
    return (x,y,w,h)
def convert_annotation(rootpath,xmlname):
    xmlpath = rootpath + '/Annotations'
    xmlfile = os.path.join(xmlpath,xmlname)
    with open(xmlfile, "r", encoding='UTF-8') as in_file:
        txtname = xmlname[:-4]+'.txt'  # name of the matching txt file
        print(txtname)
        txtpath = rootpath + '/labels'  # generated txt files are saved under the labels directory
        if not os.path.exists(txtpath):
            os.makedirs(txtpath)
        txtfile = os.path.join(txtpath,txtname)
        with open(txtfile, "w+" ,encoding='UTF-8') as out_file:
            tree=ET.parse(in_file)
            root = tree.getroot()
            size = root.find('size')
            w = int(size.find('width').text)
            h = int(size.find('height').text)
            out_file.truncate()
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text
                # skip unknown classes and objects marked difficult
                if cls not in classes or int(difficult)==1:
                    continue
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                bb = convert((w,h), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
rootpath='dataset'
xmlpath=rootpath+'/Annotations'
xml_names=df.xml.values  # renamed from `list` to avoid shadowing the built-in
for i in range(0,len(xml_names)) :
    path = os.path.join(xmlpath,xml_names[i])  # check that the entry under Annotations is an xml or XML file
    if ('.xml' in path)or('.XML' in path):
        convert_annotation(rootpath,xml_names[i])
        print('done', i)
    else:
        print('not xml file',i)
!cat dataset/labels/0.txt
5 0.6075 0.14250000000000002 0.775 0.165
5 0.505 0.6825 0.79 0.525
!ls ../dataset
Annotations images label_list.txt labels test
After installation the path is working/yolov5.
!git clone https://github.com/ultralytics/yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt # install
from yolov5 import utils
display = utils.notebook_init() # check
YOLOv5 v6.2-181-g8a19437 Python-3.7.12 torch-1.11.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)
Setup complete ✅ (2 CPUs, 15.6 GB RAM, 3884.4/4030.6 GB disk)
gbr.yaml contains:
yaml:
names:
- crazing
- inclusion
- pitted_surface
- scratches
- patches
- rolled-in_scale
nc: 6
path: /kaggle/working/ # parent directory of dataset, as an absolute path
train: /kaggle/working/train.txt # absolute path of train.txt; a relative path seems to work too
val: /kaggle/working/val.txt
with open('dataset/label_list.txt','r') as file:
    labels=[x.split('\n')[0] for x in file.readlines()]
labels
['crazing',
'inclusion',
'pitted_surface',
'scratches',
'patches',
'rolled-in_scale']
import yaml
shuffle_df=df.sample(frac=1)
train_df=shuffle_df[:1200]
val_df=shuffle_df[1200:]
cwd='/kaggle/working/'  # parent directory of the dataset folder
with open(os.path.join(cwd, 'train.txt'), 'w') as f:
    for path in train_df.image.tolist():
        f.write('./dataset/images/'+path+'\n')  # each line is an image path relative to cwd
with open(os.path.join(cwd, 'val.txt'), 'w') as f:
    for path in val_df.image.tolist():
        f.write('./dataset/images/'+path+'\n')
with open(os.path.join(cwd, 'trainval.txt'), 'w') as f:
    for path in df.image.tolist():
        f.write('./dataset/images/'+path+'\n')  # each line is an image path relative to cwd
data = dict(
    path = '/kaggle/working/',
    train = os.path.join(cwd, 'train.txt'),
    val = os.path.join(cwd, 'val.txt'),
    nc = 6,
    names = labels,
)
with open(os.path.join(cwd, 'gbr.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)
with open(os.path.join(cwd, 'gbr.yaml'), 'r') as f:
    print('\nyaml:')
    print(f.read())
!head -n 3 ../train.txt
Output:
yaml:
names:
- crazing
- inclusion
- pitted_surface
- scratches
- patches
- rolled-in_scale
nc: 6
path: /kaggle/working/
train: /kaggle/working/train.txt
val: /kaggle/working/val.txt
./dataset/images/354.jpg
./dataset/images/13.jpg
./dataset/images/1395.jpg
import torch
def set_seeds(seed):
    torch.manual_seed(seed)  # seed the CPU random number generator
    if torch.cuda.is_available():  # seed the GPU generators
        torch.cuda.manual_seed(seed)  # current GPU
        torch.cuda.manual_seed_all(seed)  # all GPUs
    np.random.seed(seed)  # seed NumPy's random number generator
    torch.backends.cudnn.deterministic = True  # ask cuDNN to use deterministic algorithms
set_seeds(106)
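# These constants save copying PROJECT and NAME again when setting the wandb output folder later; the block is optional
DIM = 256 # img_size
EPOCHS = 20 # number of training epochs, used in NAME below
MODEL = 'yolov5s6'
PROJECT = 'paddle-iron-detection' # w&b in yolov5
NAME = f'{MODEL}-dim{DIM}-epoch{EPOCHS}' # w&b for yolov5
NAME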
'yolov5s6-dim224-epoch20'
With an API key at hand, wandb.login(key=api_key) starts wandb directly.
import wandb
try:
    # read the W&B key from Kaggle secrets
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("WANDB")
    wandb.login(key=api_key)
    anonymous = None
except Exception:
    # fall back to an anonymous W&B session
    wandb.login(anonymous='must')
    print('To use your W&B account,\nGo to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. \nGet your W&B access token from here: https://wandb.ai/authorize')
wandb: WARNING If you're specifying your api key in code, ensure this code is not shared publicly.
wandb: WARNING Consider setting the WANDB_API_KEY environment variable, or running `wandb login` from the command line.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
(Experiments showed that img=256 works better than the default 640.)
!python train.py --img 256 --batch 16 --epochs 20 --optimizer Adam \
--data ../gbr.yaml --hyp data/hyps/hyp.VOC.yaml \
--weights yolov5s.pt --project {PROJECT} --name {NAME}
Model summary: 157 layers, 7026307 parameters, 0 gradients, 15.8 GFLOPs
Class Images Instances P R mAP50 mAP50-95: 100% 7/7 [00:02<00:00, 3.08it/s]
all 200 420 0.644 0.672 0.689 0.321
crazing 200 83 0.515 0.325 0.361 0.112
inclusion 200 90 0.604 0.711 0.755 0.349
pitted_surface 200 48 0.829 0.792 0.8 0.415
scratches 200 59 0.828 0.831 0.9 0.398
patches 200 64 0.65 0.953 0.91 0.483
rolled-in_scale 200 76 0.436 0.421 0.408 0.17
Results saved to paddle-iron-detection/yolov5s6-dim224-epoch20
For what each of these training outputs means, see "Interpreting YOLOv5 Training Results".
import pandas as pd
result=pd.read_csv('paddle-iron-detection/yolov5s6-dim224-epoch20/results.csv')
result
OUTPUT_DIR = '{}/{}'.format(PROJECT, NAME)
!ls {OUTPUT_DIR}
F1_curve.png results.png
PR_curve.png train_batch0.jpg
P_curve.png train_batch1.jpg
R_curve.png train_batch2.jpg
confusion_matrix.png val_batch0_labels.jpg
events.out.tfevents.1664736500.2cd00906b272.888.0 val_batch0_pred.jpg
hyp.yaml val_batch1_labels.jpg
labels.jpg val_batch1_pred.jpg
labels_correlogram.jpg val_batch2_labels.jpg
opt.yaml val_batch2_pred.jpg
results.csv weights
# This figure comes from another competition and is shown for illustration only. The output of this cell was deleted and not regenerated
plt.figure(figsize = (10,10))
plt.axis('off')
plt.imshow(plt.imread(f'{OUTPUT_DIR}/labels_correlogram.jpg'));
# Mosaics from the W&B dashboard (yolov5s6-dim3000-fold1)
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch0.jpg'))
plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch1.jpg'))
plt.figure(figsize = (10, 10))
plt.imshow(plt.imread(f'{OUTPUT_DIR}/train_batch2.jpg'))
fig, ax = plt.subplots(3, 2, figsize = (2*9,3*5), constrained_layout = True)
for row in range(3):
    ax[row][0].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg'))
    ax[row][0].set_xticks([])
    ax[row][0].set_yticks([])
    ax[row][0].set_title(f'{OUTPUT_DIR}/val_batch{row}_labels.jpg', fontsize = 12)
    ax[row][1].imshow(plt.imread(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg'))
    ax[row][1].set_xticks([])
    ax[row][1].set_yticks([])
    ax[row][1].set_title(f'{OUTPUT_DIR}/val_batch{row}_pred.jpg', fontsize = 12)
plt.show()
As the images show, many defects are still missed, and some predicted boxes are offset.
Model summary: 157 layers, 7026307 parameters, 0 gradients, 15.8 GFLOPs
Class Images Instances P R mAP50 mAP50-95: 100% 7/7 [00:02<00:00, 2.80it/s]
all 200 420 0.694 0.728 0.745 0.381
crazing 200 83 0.498 0.359 0.391 0.125
inclusion 200 90 0.638 0.706 0.761 0.371
pitted_surface 200 48 0.881 0.792 0.829 0.468
scratches 200 59 0.854 0.894 0.95 0.511
patches 200 64 0.775 0.984 0.947 0.563
rolled-in_scale 200 76 0.518 0.632 0.592 0.247
The images in the dataset differ a lot in brightness, so histogram equalization is used to even out their brightness.
# Process the test images
test_path = '../dataset/test/IMAGES'
test_path1 = test_path+'_equ'
os.makedirs(test_path1,exist_ok=True)
for i in os.listdir(test_path):
    underexpose = cv2.imread(os.path.join(test_path,i))
    equalizeUnder = np.zeros(underexpose.shape, underexpose.dtype)
    # equalize each of the three colour channels independently
    equalizeUnder[:, :, 0] = cv2.equalizeHist(underexpose[:, :, 0])
    equalizeUnder[:, :, 1] = cv2.equalizeHist(underexpose[:, :, 1])
    equalizeUnder[:, :, 2] = cv2.equalizeHist(underexpose[:, :, 2])
    cv2.imwrite(os.path.join(test_path1,i),equalizeUnder)
# Process the training images
train_path = '../dataset/images'
train_path1 = train_path+'_equ'  # was test_path+'_equ', which wrote training images into the test output folder
os.makedirs(train_path1,exist_ok=True)
for i in os.listdir(train_path):
    underexpose = cv2.imread(os.path.join(train_path,i))
    equalizeUnder = np.zeros(underexpose.shape, underexpose.dtype)
    equalizeUnder[:, :, 0] = cv2.equalizeHist(underexpose[:, :, 0])
    equalizeUnder[:, :, 1] = cv2.equalizeHist(underexpose[:, :, 1])
    equalizeUnder[:, :, 2] = cv2.equalizeHist(underexpose[:, :, 2])
    cv2.imwrite(os.path.join(train_path1,i),equalizeUnder)
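To show roughly what histogram equalization does to one channel, here is a NumPy-only sketch that remaps intensities through the cumulative histogram. It illustrates the technique and is not guaranteed to match cv2.equalizeHist bit for bit; the function name equalize_hist and the toy 8x8 image are made up for illustration.

```python
import numpy as np


def equalize_hist(channel):
    """Histogram-equalize one uint8 channel using its cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest value present
    total = channel.size
    if total == cdf_min:  # constant image: nothing to stretch
        return channel.copy()
    # Build a lookup table that spreads the used intensities over 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]


# A dark toy image whose values only cover 0..63.
dark = np.arange(64, dtype=np.uint8).reshape(8, 8)
bright = equalize_hist(dark)
print(dark.max(), bright.max())
```

Equalizing the three colour channels separately, as the cells above do, can shift colours; for the grey-looking steel images here that is probably a minor concern.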
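# Move the processed training and test images, the annotation folder, and the labels folder into the new folder dataset_equ
!mkdir ../dataset_equ
# Move the equalized training and test images
!mv ../dataset/images_equ/ ../dataset_equ
!mv ../dataset/test/IMAGES_equ/ ../dataset_equ
# Rename the equalized training image folder to images
!mv ../dataset_equ/images_equ ../dataset_equ/images
# Move the annotation files (VOC-format annotations, no longer needed)
!mv ../dataset/Annotations ../dataset_equ
# Copy the labels
!cp -r ../dataset/labels ../dataset_equ
!ls ../dataset_equ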
After the move, the gbr.yaml file has to be rewritten.
import yaml
cwd='/kaggle/working/'  # parent directory of the dataset folder
with open(os.path.join(cwd,'train_equ.txt'), 'w') as f:
    for path in train_df.image.tolist():
        f.write('./dataset_equ/images/'+path+'\n')  # each line is an image path relative to cwd
with open(os.path.join(cwd ,'val_equ.txt'), 'w') as f:
    for path in val_df.image.tolist():
        f.write('./dataset_equ/images/'+path+'\n')
data = dict(
    path = '/kaggle/working/',
    train = os.path.join(cwd,'train_equ.txt'),
    val = os.path.join(cwd,'val_equ.txt'),
    nc = 6,
    names = labels,
)
with open(os.path.join(cwd, 'gbr_equ.yaml'), 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)
with open(os.path.join(cwd, 'gbr_equ.yaml'), 'r') as f:
    print('\nyaml:')
    print(f.read())
!head -n 3 ../train_equ.txt
!python train.py --img 256 --batch 16 --epochs 20 --optimizer Adam \
--data ../gbr_equ.yaml --hyp data/hyps/hyp.Objects365.yaml \
--weights yolov5s.pt --project {PROJECT} --name yolov5s-obj-adam20-equ
The results are not good:
Class Images Instances P R mAP50 mAP50-95: 100% 7/7 [00:02<00:00, 2.66it/s]
all 200 420 0.57 0.648 0.651 0.328
crazing 200 83 0.472 0.265 0.329 0.105
inclusion 200 90 0.543 0.711 0.705 0.328
pitted_surface 200 48 0.685 0.816 0.832 0.527
scratches 200 59 0.761 0.701 0.787 0.377
patches 200 64 0.644 0.922 0.898 0.522
rolled-in_scale 200 76 0.314 0.474 0.354 0.109
PROJECT = 'paddle-iron-detection'  # W&B project name used by YOLOv5
!python train.py --img 256 --data ../gbr.yaml --hyp data/hyps/hyp.Objects365.yaml \
    --weights yolov5x.pt --project {project} --name yolov5x-default \
    --patience 20 --epoch 100 --cache
--patience 20 stops training once the model has gone 20 epochs without improving. --cache loads the images into memory before training, which speeds training up.
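The patience rule can be sketched in a few lines. This is only an illustration of the idea, not YOLOv5's actual implementation; the function name `early_stop_epoch` and the fitness values are made up.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float('-inf')
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best:
            # New best result: remember it and reset the waiting window
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row
            return epoch
    return None


# Hypothetical validation fitness per epoch
history = [0.10, 0.20, 0.30, 0.25, 0.28, 0.29]
print(early_stop_epoch(history, patience=3))
```

With patience=3 the run stops at epoch 5, because the best value was reached at epoch 2 and nothing since has beaten it.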
Training took one hour. Epoch 98 gave the best result, a small improvement.
Class Images Instances P R mAP50 mAP50-95: 100% 7/7 [00:03<00:00, 2.24it/s]
all 200 420 0.765 0.753 0.794 0.445
crazing 200 83 0.509 0.449 0.455 0.163
inclusion 200 90 0.709 0.8 0.85 0.456
pitted_surface 200 48 0.923 0.833 0.883 0.541
scratches 200 59 0.945 0.873 0.975 0.536
patches 200 64 0.845 0.969 0.941 0.672
rolled-in_scale 200 76 0.659 0.592 0.662 0.301
!python train.py --img 256 --data ../gbr_all.yaml --hyp data/hyps/hyp.Objects365.yaml \
    --weights paddle-iron6/yolov5x-default/weights/best.pt --project {project} --name yolov5x-120 \
    --epoch 20 --save-period 1
Epoch 19 gave the best result. Running inference with it and submitting scored 37.74.
!python detect.py --weights paddle-iron6/yolov5x-1203/weights/best.pt --augment \
    --img 256 --conf 0.3 --source ../dataset/test/IMAGES --save-txt --save-conf
import os
import random
import xml.etree.ElementTree as ET
from pathlib import Path
import cv2
# Paths to the original images, the XML labels, and the cropped output
img_path = 'dataset/IMAGES'
xml_path = 'train/ANNOTATIONS'
obj_img_path = 'train/clip'
# Create the crop directory first; creating per-class subdirectories under it fails otherwise
if os.path.exists(obj_img_path):
    print(f'{obj_img_path} already exists')
else:
    os.mkdir(obj_img_path)
# Dictionary of class name -> number of crops saved so far
clip = {}
# Crop every labelled box out of its image and save it in a per-class folder, numbered in order
for img_file in os.listdir(img_path):
    if img_file[-4:] in ['.png', '.jpg']:  # only image files
        img_filename = os.path.join(img_path, img_file)  # full image path
        img_cv = cv2.imread(img_filename)  # read the image
        img_name = os.path.splitext(img_file)[0]  # name without extension, e.g. "000" for "000.png"
        xml_name = os.path.join(xml_path, '%s.xml' % img_name)  # full path of the matching label file
        if os.path.exists(xml_name):  # not every image is necessarily labelled
            root = ET.parse(xml_name).getroot()  # parse the XML file
            for obj in root.iter('object'):  # iterate over all labelled boxes
                name = obj.find('name').text  # class label of the box
                xmlbox = obj.find('bndbox')
                x0 = xmlbox.find('xmin').text  # corner coordinates of the box
                y0 = xmlbox.find('ymin').text
                x1 = xmlbox.find('xmax').text
                y1 = xmlbox.find('ymax').text
                obj_img = img_cv[int(y0):int(y1), int(x0):int(x1)]  # crop the box out of the image
                clip.setdefault(name, 0)  # start counting a class the first time it appears
                clip[name] += 1
                my_file = Path(obj_img_path + '/' + name)
                if not my_file.is_dir():  # create the class folder if it is missing
                    os.mkdir(str(my_file))
                # Alternative naming: 4-digit zero-padded sequence number only
                # cv2.imwrite(obj_img_path + '/' + name + '/' + '%04d' % (clip[name]) + '.jpg', obj_img)
                # File name is the original image name plus the sequence number
                cv2.imwrite(obj_img_path + '/' + name + '/' + img_name + '_' + '%04d' % (clip[name]) + '.jpg', obj_img)
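The XML-reading part of the cropping script can be tried on its own without any images. The sketch below parses a made-up VOC annotation held in a string; `SAMPLE_XML` and `read_boxes` are hypothetical names and the box values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation with two labelled boxes
SAMPLE_XML = """<annotation>
  <filename>0.jpg</filename>
  <size><width>200</width><height>200</height><depth>3</depth></size>
  <object>
    <name>crazing</name>
    <bndbox><xmin>5</xmin><ymin>10</ymin><xmax>60</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>patches</name>
    <bndbox><xmin>80</xmin><ymin>30</ymin><xmax>190</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        bnd = obj.find('bndbox')
        coords = tuple(int(bnd.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((name, coords))
    return boxes


for name, (x0, y0, x1, y1) in read_boxes(SAMPLE_XML):
    # With a real image array the crop would be img[y0:y1, x0:x1]
    print(name, 'crop size:', x1 - x0, 'x', y1 - y0)
```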
With --multi-scale the results got worse; with --image-weights they also got worse.
!python train.py --img 256 --batch 16 --epochs 50 --weights=None \
    --data /kaggle/working/gbr.yaml --hyp data/hyps/hyp.scratch-med.yaml \
    --project kaggle-iron --name yolov5l-scratch --cfg models/yolov5l.yaml
YOLOv5l summary: 267 layers, 46135203 parameters, 0 gradients, 107.7 GFLOPs
Class Images Instances P R mAP50 mAP50-95
all 200 473 0.573 0.672 0.67 0.323
crazing 200 66 0.433 0.227 0.324 0.0922
inclusion 200 127 0.647 0.748 0.741 0.325
pitted_surface 200 33 0.674 0.727 0.759 0.475
scratches 200 68 0.519 0.809 0.745 0.303
patches 200 120 0.738 0.925 0.924 0.535
rolled-in_scale 200 59 0.424 0.598 0.525 0.21
YOLOv5l summary: 267 layers, 46135203 parameters, 0 gradients, 107.7 GFLOPs
Class Images Instances P R mAP50 mAP50-95
all 200 473 0.703 0.679 0.732 0.358
crazing 200 66 0.793 0.303 0.511 0.162
inclusion 200 127 0.671 0.756 0.755 0.349
pitted_surface 200 33 0.737 0.697 0.741 0.446
scratches 200 68 0.626 0.824 0.84 0.395
patches 200 120 0.853 0.917 0.936 0.563
rolled-in_scale 200 59 0.537 0.576 0.606 0.229
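Reference: 《yolov5 --save-txt 生成的txt怎么设置为覆盖而不是追加到txt中》 (how to make the txt files written by yolov5 --save-txt overwrite instead of append)
--save-txt --save-conf: save the predictions as txt files, including the confidence scores.
!python detect.py --weights paddle-iron-detection/yolov5m-dim224-epoch50/weights/best.pt \
    --img 224 --conf 0.25 --source ../dataset/test/IMAGES --save-txt --save-conf
The results are saved under yolov5/runs/detect, and every run creates a new exp folder. I ran it three times, so the results are in runs/detect/exp3/ and the txt files are in runs/detect/exp3/labels.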
display.Image(filename='runs/detect/exp3/1401.jpg', width=300)
!cat runs/detect/exp3/labels/1401.txt
1 0.7825 0.5775 0.205 0.775 0.478262
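Each line holds the class id followed by the bbox as [x_center, y_center, w, h]. The bbox is normalized: x_center and w are divided by the image width, y_center and h by the image height, so all four values lie in [0, 1]. --save-conf appends the confidence after the bbox. The predictions still have to be converted to the format the competition requires.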
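As a quick sanity check of the format, the line printed above can be converted by hand to pixel corner coordinates. This is a minimal sketch: `yolo_to_corners` is a hypothetical helper, and the 200x200 image size is the size of the test images in this dataset.

```python
def yolo_to_corners(box, width, height):
    """Convert normalized [x_center, y_center, w, h] to pixel [x1, y1, x2, y2]."""
    xc, yc, w, h = box
    x1 = (xc - w / 2) * width
    y1 = (yc - h / 2) * height
    x2 = (xc + w / 2) * width
    y2 = (yc + h / 2) * height
    return [x1, y1, x2, y2]


# One line of a YOLOv5 label file written with --save-txt --save-conf
line = "1 0.7825 0.5775 0.205 0.775 0.478262"
parts = line.split()
cls_id = int(parts[0])
box = [float(v) for v in parts[1:5]]
conf = float(parts[5])

corners = [round(v, 2) for v in yolo_to_corners(box, width=200, height=200)]
print(cls_id, corners, conf)
```

The box comes out at roughly x from 136 to 177 and y from 38 to 193, a tall narrow defect on the right side of the image.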
At first I overlooked a data format problem: the commas in the bbox column disappeared when the CSV was saved, which cost me a whole day.
import os
import pandas as pd
import numpy as np
result_list = []
for name in os.listdir('dataset/test/IMAGES'):  # iterate over the test images
    idx = name.split('.')[0]  # image id
    txt_name = 'runs/detect/exp3/labels/' + idx + '.txt'
    try:  # the image has predictions: record one row per box
        with open(txt_name, 'r') as f:
            predicts = f.readlines()  # values are read as strings and converted below
        for predict in predicts:
            pred = predict.split(' ')
            cls = pred[0]
            bbox = [float(x) for x in pred[1:5]]
            score = pred[5].rstrip()  # strip the trailing newline
            result_list.append([idx, bbox, cls, score])
    except FileNotFoundError:  # no detections for this image: keep only the id
        result_list.append([idx])
df = pd.DataFrame(result_list, columns=['image_id', 'bbox', 'category_id', 'confidence'])
df.head()
image_id bbox category_id confidence
0 1400 None None None
1 1401 [0.7825, 0.5775, 0.205, 0.775] 1 0.478262
2 1402 [0.785, 0.5, 0.42, 1] 2 0.419653
3 1402 [0.445, 0.4875, 0.84, 0.975] 2 0.437668
4 1403 [0.3675, 0.5, 0.165, 1] 3 0.765889
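# pd.to_numeric converts the values of a Series that can be parsed as numbers; values that cannot be parsed are kept as-is, set to missing, or raise an error, depending on errors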
df.image_id=pd.to_numeric(df.image_id,errors='ignore')
df.category_id=df.category_id.astype('int')
df.confidence=df.confidence.astype('float')
df.info()
0 image_id 982 non-null int64
1 bbox 982 non-null object
2 category_id 982 non-null int32
3 confidence 982 non-null float64
dtypes: float64(1), int32(1), int64(1), object(1)
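The code comes from the bbox package on GitHub. For usage, see section 3.4, "generating annotation files", of the Kaggle starfish object detection competition post.
# Printing the sizes shows that every test image is 200x200
from PIL import Image
for i, name in enumerate(os.listdir('dataset/test/IMAGES')):
    img_name = 'dataset/test/IMAGES/' + name
    image = Image.open(img_name)
    print(image.size)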
def yolo2voc(bboxes, height=200, width=200):
    """
    yolo => [xmid, ymid, w, h] (normalized)
    voc  => [x1, y1, x2, y2]
    """
    # bboxes = bboxes.copy().astype(float)  # otherwise all values will be 0 as voc_pascal dtype is np.int
    bboxes[..., 0::2] *= width
    bboxes[..., 1::2] *= height
    bboxes[..., 0:2] -= bboxes[..., 2:4] / 2
    bboxes[..., 2:4] += bboxes[..., 0:2]
    return bboxes
# Convert the YOLO-format predicted boxes to VOC format
df.bbox = df.bbox.apply(lambda x: yolo2voc(np.array(x).astype(np.float32)))
"""
After the conversion each bbox is a NumPy array. Saved to CSV as-is, the bbox column has no commas,
e.g. [0.0 3.0 200.0 67.0], because NumPy prints arrays with spaces between the values.
Converting each bbox to a plain list first puts the commas back in the CSV.
"""
df.bbox = df.bbox.apply(lambda x: x.tolist())
# The submitted CSV needs no index but must have a header row, otherwise the submission is rejected
df.to_csv('submission.csv', index=None, header=True)
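This project uses PaddleX for object detection and produces an end-to-end model.
The code is based on the improved baseline for the newcomer practice competition 钢铁缺陷检测挑战赛 (steel defect detection challenge) and reuses parts of it.
The data is split 8:2 into a training set and a validation set. This part uses the reference code directly.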
# Unzip the training set
!unzip -oq /home/aistudio/data/data105746/train.zip -d /home/aistudio/work/
# Unzip the test set
!unzip -oq /home/aistudio/data/data105747/test.zip -d /home/aistudio/work/
# Remove the generated __MACOSX folder
!rm -rf /home/aistudio/work/__MACOSX
# Walk the training data and split it 8:2 into training and validation sets; skip this step if it has already been done
import os
name = [name for name in os.listdir('work/train/IMAGES') if name.endswith('.jpg')]
train_name_list = []
for i in name:
    tmp = os.path.splitext(i)
    train_name_list.append(tmp[0])
# Build ori_train.txt, which pairs each image with its XML file
with open("./work/train/ori_train.txt", "w") as f:
    for i in range(len(train_name_list)):
        if i != 0: f.write('\n')
        line = 'IMAGES/' + train_name_list[i] + '.jpg' + " " + "ANNOTATIONS/" + train_name_list[i] + '.xml'
        f.write(line)
# Build labels.txt
labels = ['crazing', 'inclusion', 'pitted_surface', 'scratches', 'patches', 'rolled-in_scale']
with open("./work/train/labels.txt", "w") as f:
    for i in range(len(labels)):
        line = labels[i] + '\n'
        f.write(line)
# Randomly split ori_train.txt into a validation list and a training list
# eval_percent: fraction of the data used for validation
import random
eval_percent = 0.2
data = []
with open("work/train/ori_train.txt", "r") as f:
    for line in f.readlines():
        line = line.strip('\n')
        data.append(line)
index = list(range(len(data)))
random.shuffle(index)
# Write the validation list
cut_point = int(eval_percent * len(data))
with open("./work/train/val_list.txt", "w") as f:
    for i in range(cut_point):
        if i != 0: f.write('\n')
        line = data[index[i]]
        f.write(line)
# Write the training list
with open("./work/train/train_list.txt", "w") as f:
    for i in range(cut_point, len(data)):
        if i != cut_point: f.write('\n')
        line = data[index[i]]
        f.write(line)
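One thing the split above does not do is fix the random seed, so every rerun produces a different validation set and scores are not comparable between runs. A minimal sketch of a reproducible variant follows; the helper `split_lines` and the seed value are my own additions, not part of the reference code.

```python
import random


def split_lines(lines, eval_percent=0.2, seed=0):
    """Shuffle with a fixed seed and return (train_lines, val_lines)."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    index = list(range(len(lines)))
    rng.shuffle(index)
    cut_point = int(eval_percent * len(lines))
    val = [lines[i] for i in index[:cut_point]]
    train = [lines[i] for i in index[cut_point:]]
    return train, val


# Stand-in for the contents of ori_train.txt (1400 image/xml pairs)
lines = ['IMAGES/%d.jpg ANNOTATIONS/%d.xml' % (i, i) for i in range(1400)]
train, val = split_lines(lines)
print(len(train), len(val))
```

With 1400 images this gives 1120 training and 280 validation samples, the same sizes that show up in the training log.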
# Install paddlex
# paddlex has version requirements on its dependencies, so it is best to install matching package versions
!pip install "numpy<=1.19.5" -i https://mirror.baidu.com/pypi/simple
!pip install paddlex==2.0.0
# Import the required libraries
import matplotlib
matplotlib.use('Agg')
import os
# os.environ['GPU_VISIBLE_DEVICES'] = '0'  # this statement does not seem to be needed
import paddlex as pdx
import numpy as np
from paddlex import transforms
train_transforms = transforms.Compose([
    # transforms.MixupImage(mixup_epoch=250),
    # transforms.RandomDistort(),
    # transforms.RandomExpand(),
    # transforms.RandomCrop(),
    transforms.RandomResizeByShort(short_sizes=[640, 672, 704, 736, 768, 800],
                                   max_size=1333,
                                   interp='RANDOM'),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
# Most augmentations did not improve accuracy, so only horizontal flipping is kept; later in training all augmentation is switched off for stability.
# The images are also resized and normalized for training.
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=800,
                             max_size=1333,
                             interp='CUBIC'),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
Early in training only the training split is used; at the end of training all images are used.
train_dataset = pdx.datasets.VOCDetection(
data_dir='work/train',
file_list='work/train/train_list.txt',
label_list='work/train/labels.txt',
transforms=train_transforms,
shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
data_dir='work/train',
file_list='work/train/val_list.txt',
label_list='work/train/labels.txt',
transforms=eval_transforms)
trainval_dataset = pdx.datasets.VOCDetection(
data_dir='work/train',
file_list='work/train/ori_train.txt',
label_list='work/train/labels.txt',
transforms=eval_transforms)
Because the ranking is based on model accuracy, the more accurate two-stage detector Faster R-CNN is chosen, with ResNet101_vd as the backbone.
#num_classes = len(train_dataset.labels)
model = pdx.det.FasterRCNN(num_classes=6,
backbone='ResNet101_vd')
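Because a pretrained model is used, training starts with a warm-up learning rate and switches to cosine annealing decay once the model is stable. (Cosine annealing did not work well, so it was dropped in the end.)
SGD with momentum is used as the optimizer, with an L2 regularization coefficient applied to all parameters.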
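To see what this schedule looks like in numbers, here is a framework-free sketch of linear warm-up followed by cosine annealing. It uses the closed-form cosine formula and is not Paddle's own code. The defaults mirror the values used in this notebook (1120 // 8 = 140 steps per epoch, one warm-up epoch, T_max = 140 * 12 // 3 = 560); `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, base_lr=0.06, start_lr=0.006, warmup_steps=140,
          t_max=560, eta_min=0.0):
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warm-up from start_lr to base_lr
        return start_lr + (base_lr - start_lr) * step / warmup_steps
    # Cosine annealing, counted from the end of the warm-up
    t = step - warmup_steps
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))


for step in (0, 70, 140, 420, 700):
    print(step, round(lr_at(step), 5))
```

The rate climbs from 0.006 to 0.06 over the first epoch and then falls to zero after another 560 steps, that is 4 epochs. Beyond that point the cosine rises again, which is worth keeping in mind when the run is much longer than T_max.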
import paddle
train_batch_size = 8
num_steps_each_epoch = 1120 // train_batch_size
num_epochs = 80
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
learning_rate=0.06,
T_max=num_steps_each_epoch * 12 // 3)
warmup_epoch = 1
warmup_steps = warmup_epoch * num_steps_each_epoch
scheduler = paddle.optimizer.lr.LinearWarmup(
learning_rate=scheduler,
warmup_steps=warmup_steps,
start_lr=0.006,
end_lr=0.06)
custom_optimizer = paddle.optimizer.Momentum(
scheduler,
momentum=.9,
weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
parameters=model.net.parameters())
model.train(num_epochs = num_epochs,
train_dataset = train_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
optimizer=custom_optimizer,
save_interval_epochs=1,
log_interval_steps=280,
save_dir='output/T001',
pretrain_weights='COCO',
early_stop=True,
early_stop_patience=5,
use_vdl=True,
metric='coco',
#pretrain_weights = None,
#resume_checkpoint = "output/T008_101_vdMCpie3*lr/epoch_38_78.376"
)
2022-10-09 02:35:08 [INFO] There are 556/560 variables loaded into FasterRCNN.
2022-10-09 03:12:04 [INFO] [TRAIN] Epoch=14/80, Step=70/560, loss_rpn_cls=0.001508, loss_rpn_reg=0.012554, loss_bbox_cls=0.080150, loss_bbox_reg=0.139579, loss=0.233791, lr=0.000018, time_each_step=0.26s, eta=3:12:33
2022-10-09 03:12:20 [INFO] [TRAIN] Epoch=14/80, Step=140/560, loss_rpn_cls=0.001076, loss_rpn_reg=0.011392, loss_bbox_cls=0.074022, loss_bbox_reg=0.163931, loss=0.250421, lr=0.000071, time_each_step=0.24s, eta=3:1:1
2022-10-09 03:12:38 [INFO] [TRAIN] Epoch=14/80, Step=210/560, loss_rpn_cls=0.006574, loss_rpn_reg=0.023940, loss_bbox_cls=0.170613, loss_bbox_reg=0.312816, loss=0.513943, lr=0.000160, time_each_step=0.26s, eta=3:11:41
2022-10-09 03:12:56 [INFO] [TRAIN] Epoch=14/80, Step=280/560, loss_rpn_cls=0.006941, loss_rpn_reg=0.005008, loss_bbox_cls=0.090919, loss_bbox_reg=0.162063, loss=0.264931, lr=0.000283, time_each_step=0.25s, eta=3:10:20
2022-10-09 03:13:13 [INFO] [TRAIN] Epoch=14/80, Step=350/560, loss_rpn_cls=0.019845, loss_rpn_reg=0.019525, loss_bbox_cls=0.047384, loss_bbox_reg=0.059670, loss=0.146424, lr=0.000440, time_each_step=0.24s, eta=3:2:0
2022-10-09 03:13:30 [INFO] [TRAIN] Epoch=14/80, Step=420/560, loss_rpn_cls=0.009172, loss_rpn_reg=0.023620, loss_bbox_cls=0.186319, loss_bbox_reg=0.234781, loss=0.453892, lr=0.000629, time_each_step=0.25s, eta=3:4:28
2022-10-09 03:13:48 [INFO] [TRAIN] Epoch=14/80, Step=490/560, loss_rpn_cls=0.002352, loss_rpn_reg=0.074411, loss_bbox_cls=0.052982, loss_bbox_reg=0.085663, loss=0.215407, lr=0.000848, time_each_step=0.25s, eta=3:7:50
2022-10-09 03:14:06 [INFO] [TRAIN] Epoch=14/80, Step=560/560, loss_rpn_cls=0.001969, loss_rpn_reg=0.014362, loss_bbox_cls=0.112086, loss_bbox_reg=0.211446, loss=0.339863, lr=0.001095, time_each_step=0.26s, eta=3:11:11
2022-10-09 03:14:06 [INFO] [TRAIN] Epoch 14 finished, loss_rpn_cls=0.00589173, loss_rpn_reg=0.02162566, loss_bbox_cls=0.089257866, loss_bbox_reg=0.14901184, loss=0.26578707 .
2022-10-09 03:14:06 [WARNING] Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
2022-10-09 03:14:06 [INFO] Start to evaluate(total_samples=280, total_steps=280)...
2022-10-09 03:14:29 [INFO] Start evaluate...
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.36s).
Accumulating evaluation results...
DONE (t=0.09s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.778
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.265
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.363
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.495
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.555
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.563
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.547
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
2022-10-09 03:14:30 [INFO] [EVAL] Finished, Epoch=14, bbox_mmap=0.425782 .
2022-10-09 03:14:34 [INFO] Model saved in output/T001/best_model.
2022-10-09 03:14:34 [INFO] Current evaluated best model on eval_dataset is epoch_14, bbox_mmap=0.42578244961501155
2022-10-09 03:14:35 [INFO] Model saved in output/T001/epoch_14.
custom_optimizer = paddle.optimizer.Momentum(
scheduler2,
momentum=.9,
weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
parameters=model.net.parameters())
import paddle
train_batch_size = 8
num_steps_each_epoch = 1400 // train_batch_size
num_epochs = 10
scheduler3 = paddle.optimizer.lr.CosineAnnealingDecay(
learning_rate=0.001,
T_max=num_steps_each_epoch * 12 // 3)
custom_optimizer = paddle.optimizer.Momentum(
scheduler3,
momentum=.9,
weight_decay=paddle.regularizer.L2Decay(coeff=1e-04),
parameters=model.net.parameters())
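By default PaddleX saves the model at every epoch during training. In the best_model folder, for example, model.pdparams presumably holds the model parameters and model.pdopt holds the optimizer state.
Setting pretrain_weights = None, resume_checkpoint = "output/best_model" resumes training from a checkpoint, provided the model and the optimizer are both unchanged.
# 10 epochs, 1991s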
model.train(num_epochs = num_epochs,
train_dataset = trainval_dataset,
train_batch_size=train_batch_size,
eval_dataset=eval_dataset,
optimizer=custom_optimizer,
save_interval_epochs=1,
log_interval_steps=90,
save_dir='output/T002',
early_stop=True,
early_stop_patience=5,
use_vdl=True,
metric='coco',
pretrain_weights = "output/T001/best_model/model.pdparams"
)
2022-10-09 04:31:14 [INFO] [TRAIN] Epoch 5 finished, loss_rpn_cls=0.005495656, loss_rpn_reg=0.020394348, loss_bbox_cls=0.08805889, loss_bbox_reg=0.1477338, loss=0.2616827 .
2022-10-09 04:31:14 [WARNING] Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
2022-10-09 04:31:15 [INFO] Start to evaluate(total_samples=280, total_steps=280)...
2022-10-09 04:31:38 [INFO] Start evaluate...
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.35s).
Accumulating evaluation results...
DONE (t=0.08s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.469
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.824
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.483
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.353
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.531
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.281
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.590
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.533
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.583
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
2022-10-09 04:31:39 [INFO] [EVAL] Finished, Epoch=5, bbox_mmap=0.468618 .
2022-10-09 04:31:43 [INFO] Model saved in output/T002/best_model.
2022-10-09 04:31:43 [INFO] Current evaluated best model on eval_dataset is epoch_5, bbox_mmap=0.46861803517092365
2022-10-09 04:31:44 [INFO] Model saved in output/T002/epoch_5.
2022-10-09 04:44:38 [INFO] [TRAIN] Epoch 9 finished, loss_rpn_cls=0.004496215, loss_rpn_reg=0.01914196, loss_bbox_cls=0.079838865, loss_bbox_reg=0.13827054, loss=0.24174762 .
2022-10-09 04:44:38 [WARNING] Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
2022-10-09 04:44:38 [INFO] Start to evaluate(total_samples=280, total_steps=280)...
2022-10-09 04:45:00 [INFO] Start evaluate...
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.29s).
Accumulating evaluation results...
DONE (t=0.08s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.517
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.869
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.545
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.399
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.573
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.308
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.626
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.558
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.624
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.657
2022-10-09 04:45:01 [INFO] [EVAL] Finished, Epoch=9, bbox_mmap=0.517191 .
2022-10-09 04:45:05 [INFO] Model saved in output/T002/best_model.
2022-10-09 04:45:05 [INFO] Current evaluated best model on eval_dataset is epoch_9, bbox_mmap=0.5171907953103836
2022-10-09 04:45:06 [INFO] Model saved in output/T002/epoch_9.
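The run above trains on all the data, including the validation set, so the best mAP cannot be used as the selection criterion; the training loss is used instead. In the VisualDL curves the training loss is lowest at epoch 5, so that model is loaded for inference.
The reference code is used directly for prediction and produces a CSV file that can be submitted as-is.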
import paddlex as pdx
import os
import numpy as np
import pandas as pd
# Load the model
model = pdx.load_model('output/T002/epoch_5')
# Collect the ids of the test images
name = [name for name in os.listdir('work/test/IMAGES') if name.endswith('.jpg')]
test_name_list = []
for i in name:
    tmp = os.path.splitext(i)
    test_name_list.append(tmp[0])
test_name_list.sort()
# Map class names to the category ids the competition requires
num2index = {'crazing': 0, 'inclusion': 1, 'pitted_surface': 2, 'scratches': 3, 'patches': 4, 'rolled-in_scale': 5}
result_list = []
# Keep the boxes whose confidence is high enough
for index in test_name_list:
    image_name = 'work/test/IMAGES/' + index + '.jpg'
    predicts = model.predict(image_name)
    for predict in predicts:
        if predict['score'] < 0.5: continue
        # Convert the bbox from [xmin, ymin, w, h] to the required [xmin, ymin, xmax, ymax]
        tmp = predict['bbox']
        tmp[2] += tmp[0]
        tmp[3] += tmp[1]
        line = [index, tmp, num2index[predict['category']], predict['score']]
        result_list.append(line)
# Build the DataFrame from the list directly, since each row mixes strings, lists and numbers
df = pd.DataFrame(result_list, columns=['image_id', 'bbox', 'category_id', 'confidence'])
df.to_csv('output/T002/submission.csv', index=None)
The best model from the first 14 epochs scored 40.005 on submission; after training 5 more epochs the score was 40.935.
The prediction images are saved in output/T002/visualize.
# 48s
for index in test_name_list:
    image_name = 'work/test/IMAGES/' + index + '.jpg'
    predicts = model.predict(image_name)
    pdx.det.visualize(image_name, predicts, threshold=0.5, save_dir='output/T002/visualize')
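- Reference: 《飞桨新人赛:钢铁缺陷检测挑战赛-第1名方案》
- Training 36 epochs took 7580s (bs=4, 1232 training samples), which works out to about 210s/epoch
What follows is largely the code of the first-place solution, but it uses PaddleDetection 2.3, so some details differ.
# Unzip the files and remove the extra directory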
! unzip /home/aistudio/data/data105746/train.zip -d /home/aistudio/data/steel
!rm -r /home/aistudio/data/steel/__MACOSX
! unzip /home/aistudio/data/data105747/test.zip -d /home/aistudio/data/steel
!rm -r /home/aistudio/data/steel/__MACOSX
# Rename the folders to JPEGImages and Annotations
!mv /home/aistudio/data/steel/train/ANNOTATIONS /home/aistudio/data/steel/train/Annotations
!mv /home/aistudio/data/steel/train/IMAGES /home/aistudio/data/steel/train/JPEGImages
PaddleX: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/data
PaddleDetection:
https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.3/docs/tutorials/PrepareDataSet.md
Generate train_list.txt, val_list.txt and labels.txt
# Install paddlex, which is used to split the dataset
# Upgrade pip
!pip install --upgrade pip -i https://mirror.baidu.com/pypi/simple
!pip install "paddlex>2.0.0" -i https://mirror.baidu.com/pypi/simple
!paddlex --split_dataset --format VOC --dataset_dir /home/aistudio/data/steel/train \
--val_value 0.001 --test_value 0.0
# Download PaddleDetection
%cd /home/aistudio/work
!git clone https://gitee.com/paddlepaddle/PaddleDetection.git -b release/2.3
# Enter the PaddleDetection directory
%cd /home/aistudio/work/PaddleDetection
# Install the other dependencies
!pip install -r /home/aistudio/work/PaddleDetection/requirements.txt
# Install into the temporary environment
!pip install pycocotools -i https://mirror.baidu.com/pypi/simple
!pip install lap -i https://mirror.baidu.com/pypi/simple
%cd /home/aistudio/work/PaddleDetection/
# Convert the training list
!python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir /home/aistudio/data/steel/train/ \
--voc_anno_list /home/aistudio/data/steel/train/train_list.txt \
--voc_label_list /home/aistudio/data/steel/train/labels.txt \
--voc_out_name /home/aistudio/data/steel/train/voc_train.json
# Convert the validation list
!python tools/x2coco.py \
--dataset_type voc \
--voc_anno_dir /home/aistudio/data/steel/train/ \
--voc_anno_list /home/aistudio/data/steel/train/val_list.txt \
--voc_label_list /home/aistudio/data/steel/train/labels.txt \
--voc_out_name /home/aistudio/data/steel/train/voc_val.json
!rm -r /home/aistudio/data/steel/train/Annotations/*
!mv /home/aistudio/data/steel/train/*.json /home/aistudio/data/steel/train/Annotations/
/home/aistudio/work/PaddleDetection
Start converting !
100%|████████████████████████████████████| 1399/1399 [00:00<00:00, 15936.76it/s]
Start converting !
100%|██████████████████████████████████████| 1/1 [00:00<00:00, 15987.00it/s]
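For orientation, the main difference between the two annotation formats is the box convention: VOC stores corner coordinates, while COCO stores the top-left corner plus width and height. The sketch below builds one COCO-style annotation entry from a VOC box. It is a simplified illustration, not the code of tools/x2coco.py, whose id numbering and offsets may differ; `voc_box_to_coco` and the sample values are hypothetical.

```python
def voc_box_to_coco(ann_id, image_id, category_id, xmin, ymin, xmax, ymax):
    """Build a COCO-style annotation dict from VOC corner coordinates."""
    w = xmax - xmin
    h = ymax - ymin
    return {
        'id': ann_id,
        'image_id': image_id,
        'category_id': category_id,
        'bbox': [xmin, ymin, w, h],  # COCO convention: [x, y, width, height]
        'area': w * h,
        'iscrowd': 0,
    }


# Hypothetical box: class 3 on image 12
print(voc_box_to_coco(ann_id=1, image_id=12, category_id=3,
                      xmin=5, ymin=10, xmax=60, ymax=120))
```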
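After trying several models, faster_rcnn_swin_tiny_fpn_3x_coco turned out to work best. Let's walk through the training workflow.
Copy faster_rcnn_swin_tiny_fpn_1x_coco.yml under work/PaddleDetection/configs/faster_rcnn.
Usually what needs changing is weights (the path where the model is saved), plus the number of epochs, the learning rate, and so on.
Parameters that need changing can be put in this file. That way the included files stay untouched and do not have to be edited again when another model uses them. The parameters in this file take priority over those in the base files.
# Copy of this file, named faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml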
_BASE_: [
  'faster_rcnn_swin_tiny_fpn_1x_coco.yml',
]
weights: output/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1/model_final
epoch: 42
LearningRate:
  base_lr: 0.0001
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [24, 33]
  - !LinearWarmup
    start_factor: 0.1
    steps: 1000
OptimizerBuilder:
  clip_grad_by_norm: 1.0
  optimizer:
    type: AdamW
    weight_decay: 0.05
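The idea that the top-level file wins over its base files can be pictured as a recursive dictionary merge. The sketch below is only a mental model under that assumption, not PaddleDetection's actual config loader; `merge_config` is a hypothetical helper and the base values are invented.

```python
def merge_config(base, override):
    """Return a new dict where values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical values standing in for a base file
base = {
    'epoch': 12,
    'weights': 'output/base/model_final',
    'LearningRate': {'base_lr': 0.001, 'schedulers': ['PiecewiseDecay', 'LinearWarmup']},
}
# Values set in the copied top-level file
override = {'epoch': 42, 'LearningRate': {'base_lr': 0.0001}}

print(merge_config(base, override))
```

Only the keys that appear in the copied file change; everything else is inherited from the base file.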
Then open the file that _BASE_ points to, faster_rcnn_swin_tiny_fpn_1x_coco.yml.
The entries that most need changing are the first one, the dataset config file, and the training-parameter config file.
_BASE_: [
  '../datasets/coco_detection-fastertrcnn-swin.yml',
  '../runtime.yml',
  '_base_/optimizer_swin_1x-Copy1.yml',
  '_base_/faster_rcnn_swin_tiny_fpn.yml',
  '_base_/faster_rcnn_swin_reader.yml',
]
Open coco_detection.yml under work/PaddleDetection/configs/datasets/.
metric: COCO
num_classes: 6
TrainDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: Annotations/voc_train.json
    dataset_dir: ../../data/steel/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
EvalDataset:
  !COCODataSet
    image_dir: JPEGImages
    anno_path: Annotations/voc_val.json
    dataset_dir: ../../data/steel/train
TestDataset:
  !ImageFolder
    anno_path: ../../data/steel/train/Annotations/voc_val.json
Nothing else really needs changing. Open faster_rcnn_swin_reader.yml under work/PaddleDetection/configs/faster_rcnn/_base_/, where batch_size can be changed, for example to batch_size=2.
# Trained 36 epochs in 7580s; epoch 26 gave the best result
!python tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml \
    --use_vdl=true --vdl_log_dir=vdl_dir/scalar --eval \
    -o log_iter=154
[10/09 17:16:55] ppdet.engine INFO: Best test bbox ap is 0.435.
# Resume training from a checkpoint on a single card
# !python tools/train.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_3x_coco.yml \
# -r /home/aistudio/work/output/faster_rcnn_swin_tiny_fpn_3x_coco/best \
# --eval \
# --use_vdl=true \
# --vdl_log_dir=vdl_dir/scalar \
# --eval
# Use the original author's best model: run inference on the images and generate the txt files
!python tools/infer.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml \
    -o weights=/home/aistudio/own_model/34.pdparams \
    --infer_dir=/home/aistudio/data/steel/test/IMAGES/ \
    --output_dir=/home/aistudio/data/steel/infer_output \
    --draw_threshold=0.3 --save_txt=True
# Run inference with my own model
!python tools/infer.py -c configs/faster_rcnn/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1.yml \
    -o weights=output/faster_rcnn_swin_tiny_fpn_1x_coco-Copy1/best_model.pdparams \
    --infer_dir=/home/aistudio/data/steel/test/IMAGES/ \
    --output_dir=/home/aistudio/data/steel/infer_output2 \
    --draw_threshold=0.3 --save_txt=True
Process the predictions:
import csv
import os
headers = ['image_id', 'bbox', 'category_id', 'confidence']
classList = ['crazing', 'inclusion', 'pitted_surface', 'scratches', 'patches', 'rolled-in_scale']
rows = []
rootdir = '/home/aistudio/data/steel/infer_output'
file_names = os.listdir(rootdir)  # every file and directory in the output folder
for file_name in file_names:
    path = os.path.join(rootdir, file_name)
    if os.path.isfile(path) and path.endswith('txt'):
        print(path)
        with open(path) as txtFile:
            result = txtFile.readlines()
        for r in result:
            ls = r.split(' ')
            Cls = ls[0]
            sco = float(ls[1])
            xmin = float(ls[2])
            ymin = float(ls[3])
            w = float(ls[4])
            h = float(ls[5])
            # Convert [xmin, ymin, w, h] to [xmin, ymin, xmax, ymax]
            xmax = xmin + w
            ymax = ymin + h
            clsID = classList.index(Cls)
            imgID = file_name[:-4]  # file name without the .txt extension
            row = [imgID, [xmin, ymin, xmax, ymax], clsID, sco]
            rows.append(row)
with open('submission.csv', 'w', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(headers)
    f_csv.writerows(rows)
import pandas as pd
datafile = pd.read_csv('/home/aistudio/work/PaddleDetection/submission.csv')
# Sort by image_id
data = datafile.sort_values(by="image_id", ascending=True)
# Write with mode='w' so that rerunning the cell does not append duplicate rows
data.to_csv('submission_final.csv', mode='w', index=False)
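The rows must be sorted before submitting, otherwise the score is very low.
myconfig holds the config files; own_model holds the models.
I also tried some other models from the PaddleDetection suite, but none did as well as faster_rcnn_swin_tiny_fpn_3x_coco. It is worth experimenting further and adding some tricks, for example TTA, or WBF for model ensembling.
Or try cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml or ppyolov2_r101vd_dcn_365e_coco.yml.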
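For anyone who wants to try WBF, the core idea fits in a short sketch: boxes from several models that overlap strongly are merged into one box whose coordinates are the confidence-weighted average. The version below is a simplified single-class illustration that I have not run on this competition. It leaves out parts of the published method, such as rescaling the confidence by the number of models, and the function names, the threshold of 0.55 and the sample boxes are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster):
    """Confidence-weighted average box and mean confidence of a cluster."""
    total = sum(score for _, score in cluster)
    box = [sum(b[i] * s for b, s in cluster) / total for i in range(4)]
    return box, total / len(cluster)


def weighted_boxes_fusion(detections, iou_thr=0.55):
    """detections: list of (box, score) for ONE class, pooled from all models."""
    clusters = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        for cluster in clusters:
            fused_box, _ = fuse_cluster(cluster)
            if iou(fused_box, det[0]) > iou_thr:
                cluster.append(det)  # same object: add to the existing cluster
                break
        else:
            clusters.append([det])  # no match: start a new cluster
    return [fuse_cluster(c) for c in clusters]


# Hypothetical detections of one class from two models
dets = [([10, 10, 50, 50], 0.9), ([12, 12, 52, 52], 0.6), ([100, 100, 150, 150], 0.8)]
for box, score in weighted_boxes_fusion(dets):
    print([round(v, 2) for v in box], round(score, 3))
```

Unlike NMS, which throws away the lower-scoring box, the fused box uses information from every overlapping prediction.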
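Looking at training alone:
- ppyoloe+: score 41.3. bs=16, 87 steps/epoch, 60s/epoch (training speed on the full training set).
- faster-RCNN: my best so far is 40.93. bs=8, 175 steps/epoch, 200s/epoch (training speed on the full training set).
- faster-RCNN+swin: the reference solution reaches 44, which I have not reproduced yet. bs=4, 308 steps/epoch, 210s/epoch (with the 168 validation samples removed).
- yolov5s: the fastest, with a score of around 37, though I may simply not have tuned it well. Inference time has not been measured yet.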
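The step counts above follow directly from the dataset size and the batch size, which also makes it easy to compare the cost per step. A small sketch follows; `steps_per_epoch` is a hypothetical helper, and the floor division assumes the last incomplete batch is dropped, which matches the numbers quoted here.

```python
def steps_per_epoch(num_samples, batch_size):
    """Number of full batches in one epoch."""
    return num_samples // batch_size


# (training samples, batch size, seconds per epoch) as reported above
runs = {
    'ppyoloe+': (1400, 16, 60),
    'faster-RCNN': (1400, 8, 200),
    'faster-RCNN+swin': (1400 - 168, 4, 210),
}
for name, (n, bs, sec) in runs.items():
    steps = steps_per_epoch(n, bs)
    print(name, steps, 'steps/epoch,', round(sec / steps, 2), 's/step')
```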