应用labelImg对图片进行标注,下载链接:https://github.com/tzutalin/labelImg
标注时,需要注意的点,(参考自https://blog.csdn.net/chenmaolin88/article/details/79357502)
标注什么 | 预定义的所有类别的所有对象实例(就是说,如果图片里面有3只浣熊,就要分别标注3只浣熊), 除非:你拿不准那玩意儿是不是。对象非常非常的小(尺度自己拿捏),只能看见对象的不到 10-20%的部分 , 因此你拿不准那个到底是哪一类的,比如你只能看见一个轮胎,你不确定是卡车还是小轿车,这种就可以不用标注.如果图片中的对象肉眼都难以识别,就丢掉这张图片 |
难以识别(difficult) | 若肉眼虽然可以大致识别,但确信度不是很高,则勾选difficult复选框,表示这个对象不是很好识别。 |
矩形框 | 用矩形框标注对象的可见区域, 不可见的区域不要标注. 非对象的区域不要标注,矩形框应该要且仅包括对象的所有可见的像素点, 除非为了包括很小一部分的对象部件,需要扩大很大一个矩形框面积,比如,小轿车的天线可以不用框进来,因为他太小了,且天线对于汽车来说无关紧要,并非主要特征。 |
截断Truncated | 如果对象超过 15-20% 的部分不在矩形框内,则将对象标记为Truncated. 这个标记意味着矩形框内没有包含完成的对象实例。这个属性在LabelImg中无法直接勾选,需要手工编辑XML文件里的对应标签。 |
遮挡Occlusion | 如果矩形框内,对象有超过 5% 的部分被遮挡, 标记为 Occluded. 这个标记指示矩形框内的图像存在被遮挡的情况。这个属性在LabelImg中无法直接勾选,需要手工编辑XML文件里的对应标签。 |
衣服、雪、泥etc | 如果遮挡物是跟对象强相关的,则不用标记为遮挡,比如 人身上的衣服,应视为人的一部分。 |
透明 | 透过玻璃看到的对象也应该被标记, 但是若玻璃是有点反光的,则玻璃上的映像,应被标记为遮挡 occlusion |
镜子 | 镜子里的对象也应该被标记。 |
透明 | 透过玻璃看到的对象也应该被标记, 但是若玻璃是有点反光的,则玻璃上的映像,应被标记为遮挡 occlusion |
海报 | 图片里面的海报、杂志等上面的对象也应该被标记,除非是一些很浮夸的卡通画 |
应用以下代码,自动生成无标注目标的xml文件
参考链接:https://www.jianshu.com/p/5b2254fdf8f8
#! /usr/bin/python
# -*- coding:UTF-8 -*-
import os, sys
import glob
from PIL import Image
# VEDAI 图像存储位置
src_img_dir = "/home/lehui/Desktop/负样本700"
# VEDAI 图像生成的xml文件存放位置
src_xml_dir = "/home/lehui/Desktop/xml"
img_Lists = glob.glob(src_img_dir + '/*.jpg')
img_basenames = [] # e.g. 100.jpg
for item in img_Lists:
img_basenames.append(os.path.basename(item))
img_names = [] # e.g. 100
for item in img_basenames:
temp1, temp2 = os.path.splitext(item)
img_names.append(temp1)
for img in img_names:
im = Image.open((src_img_dir + '/' + img + '.jpg'))
width, height = im.size
# write in xml file
#os.mknod(src_xml_dir + '/' + img + '.xml')
xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')
xml_file.write('\n')
xml_file.write(' VOC2007 \n')
xml_file.write(' ' + str(img) + '.jpg' + ' \n')
xml_file.write(' '+ src_xml_dir + '/' + str(img) + '.jpg' + ' \n')
xml_file.write(' \n')
xml_file.write(' \n')
xml_file.write(' ' + str(width) + ' \n')
xml_file.write(' ' + str(height) + ' \n')
xml_file.write(' 3 \n')
xml_file.write(' \n')
xml_file.write(' 0 \n')
xml_file.write(' ')
应用以下代码,分别得到train、val、test对应的xml集合。
参考链接:https://www.cnblogs.com/gezhuangzhuang/p/10613468.html
import os
import random
import time
import shutil
#xmlfilepath——所有xml的路径,saveBasePath——保存结果的路径,下面建立三个文件夹:train、val、test
xmlfilepath=r'./Annotations'
saveBasePath=r"./Annotations"
trainval_percent=0.8
train_percent=0.8
total_xml = os.listdir(xmlfilepath)
num=len(total_xml)
list=range(num)
tv=int(num*trainval_percent)
tr=int(tv*train_percent)
trainval= random.sample(list,tv)
train=random.sample(trainval,tr)
print("train and val size",tv)
print("train size",tr)
start = time.time()
test_num=0
val_num=0
train_num=0
for i in list:
name=total_xml[i]
if i in trainval: #train and val set
if i in train:
directory="train"
train_num += 1
xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))
if(not os.path.exists(xml_path)):
os.mkdir(xml_path)
filePath=os.path.join(xmlfilepath,name)
newfile=os.path.join(saveBasePath,os.path.join(directory,name))
shutil.copyfile(filePath, newfile)
else:
directory="validation"
xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))
if(not os.path.exists(xml_path)):
os.mkdir(xml_path)
val_num += 1
filePath=os.path.join(xmlfilepath,name)
newfile=os.path.join(saveBasePath,os.path.join(directory,name))
shutil.copyfile(filePath, newfile)
else:
directory="test"
xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))
if(not os.path.exists(xml_path)):
os.mkdir(xml_path)
test_num += 1
filePath=os.path.join(xmlfilepath,name)
newfile=os.path.join(saveBasePath,os.path.join(directory,name))
shutil.copyfile(filePath, newfile)
end = time.time()
seconds=end-start
print("train total : "+str(train_num))
print("validation total : "+str(val_num))
print("test total : "+str(test_num))
total_num=train_num+val_num+test_num
print("total number : "+str(total_num))
print( "Time taken : {0} seconds".format(seconds))
:所有的xml文件
:所有的图片
:2中生成的训练集、测试集、验证集对应的txt
标签分类的配置文件(label_map.txt)
item {
id: 1 # id 从1开始编号
name: 'red pedestrian'
}
item {
id: 2
name: 'green pedestrian'
}
依赖\models\research\object_detection\dataset_tools\create_pascal_tf_record.py代码,
修改对应参数,生成tfrecored格式数据。
根据xml文件生成对应的json
参考链接:https://blog.csdn.net/weixin_41765699/article/details/100124689
import xml.etree.ElementTree as ET
import os
import json
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []
category_set = dict()
image_set = set()
category_item_id = -1
image_id = 20180000000
annotation_id = 0
def addCatItem(name):
global category_item_id
category_item = dict()
category_item['supercategory'] = 'none'
category_item_id += 1
category_item['id'] = category_item_id
category_item['name'] = name
coco['categories'].append(category_item)
category_set[name] = category_item_id
return category_item_id
def addImgItem(file_name, size):
global image_id
if file_name is None:
raise Exception('Could not find filename tag in xml file.')
if size['width'] is None:
raise Exception('Could not find width tag in xml file.')
if size['height'] is None:
raise Exception('Could not find height tag in xml file.')
image_id += 1
image_item = dict()
image_item['id'] = image_id
image_item['file_name'] = file_name
image_item['width'] = size['width']
image_item['height'] = size['height']
coco['images'].append(image_item)
image_set.add(file_name)
return image_id
def addAnnoItem(object_name, image_id, category_id, bbox):
global annotation_id
annotation_item = dict()
annotation_item['segmentation'] = []
seg = []
# bbox[] is x,y,w,h
# left_top
seg.append(bbox[0])
seg.append(bbox[1])
# left_bottom
seg.append(bbox[0])
seg.append(bbox[1] + bbox[3])
# right_bottom
seg.append(bbox[0] + bbox[2])
seg.append(bbox[1] + bbox[3])
# right_top
seg.append(bbox[0] + bbox[2])
seg.append(bbox[1])
annotation_item['segmentation'].append(seg)
annotation_item['area'] = bbox[2] * bbox[3]
annotation_item['iscrowd'] = 0
annotation_item['ignore'] = 0
annotation_item['image_id'] = image_id
annotation_item['bbox'] = bbox
annotation_item['category_id'] = category_id
annotation_id += 1
annotation_item['id'] = annotation_id
coco['annotations'].append(annotation_item)
def parseXmlFiles(xml_path):
for f in os.listdir(xml_path):
if not f.endswith('.xml'):
continue
bndbox = dict()
size = dict()
current_image_id = None
current_category_id = None
file_name = None
size['width'] = None
size['height'] = None
size['depth'] = None
xml_file = os.path.join(xml_path, f)
print(xml_file)
tree = ET.parse(xml_file)
root = tree.getroot()
if root.tag != 'annotation':
raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
# elem is , , ,
更改D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py,中对应的输入参数,转换得到.pbtxt文件。
复制描述模型的四个文件,后缀分别为.data-00000-of-00001 .index .meta和checkpoint文件,应用以下命令
python object_detection/export_inference_graph.py --input_type=image_tensor
--pipeline_config_path=\models\ssd_mobilenet_v2_coco.config
--trained_checkpoint_prefix=\pedestrian_data\model\model.ckpt-1589
--output_directory\pedestrian_data\test
得到对应的pb模型。
ssd_mobilenet_v2 训练得到的pb模型转换为pbtxt,更改D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py,中对应的输入参数,转换得到.pbtxt文件。运用opencv中的dnn库读取网络,进行c++端的检测,代码如下:
参考链接:https://blog.csdn.net/atpalain_csdn/article/details/100098720
#include
#include
#include
#include
#include
#include
#include
using namespace std;
using namespace cv;
using namespace dnn;
float confThreshold, nmsThreshold;
std::vector classes;
void postprocess(Mat& frame, const std::vector& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
int main(int argc, char** argv)
{
// 根据选择的检测模型文件进行配置
confThreshold = 0.5;
nmsThreshold = 0.4;
float scale = 1.0;
Scalar mean = { 0, 0, 0 };
bool swapRB = true;
int inpWidth = 300;
int inpHeight = 300;
String modelPath = "frozen_inference_graph.pb";
String configPath = "frozen_inference_graph.pbtxt";
string image_file = "E:\\project\\data\\hands_data\\img1125\\";
String framework = "";
int backendId = cv::dnn::DNN_BACKEND_OPENCV;
int targetId = cv::dnn::DNN_TARGET_CPU;
//String classesFile = R"(object_detection_classes_coco.txt)";
// Open file with classes names.
//if (!classesFile.empty()) {
// const std::string& file = classesFile;
// std::ifstream ifs(file.c_str());
// if (!ifs.is_open())
// CV_Error(Error::StsError, "File " + file + " not found");
// std::string line;
// while (std::getline(ifs, line)) {
// classes.push_back(line);
// }
//}
classes.push_back("raiseHand");
// Load a model.
Net net = readNet(modelPath, configPath, framework);
net.setPreferableBackend(backendId);
net.setPreferableTarget(targetId);
std::vector outNames = net.getUnconnectedOutLayersNames();
// Create a window
static const std::string kWinName = "Deep learning object detection in OpenCV";
// Process frames.
Mat frame, blob;
namedWindow(kWinName, 0);
vector< cv::String > files;
cv::glob(image_file, files);
for (int i = 0; i < files.size(); i++)
{
frame = cv::imread(files[i]);
//cv::Mat image(90, 120, CV_8UC3, cv::Scalar::all(0));
if (frame.empty())
{
return 0;
}
//frame = imread("E:\\project\\data\\hands_data\\train_ssd\\hands_data_pos+neg\\raisehandData\\test\\0010.jpg");
// Create a 4D blob from a frame.
Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
inpHeight > 0 ? inpHeight : frame.rows);
blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);
// Run a model.
net.setInput(blob);
if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN
{
resize(frame, frame, inpSize);
Mat imInfo = (Mat_(1, 3) << inpSize.height, inpSize.width, 1.6f);
net.setInput(imInfo, "im_info");
}
std::vector outs;
net.forward(outs, outNames);
postprocess(frame, outs, net);
// Put efficiency information.
std::vector layersTimes;
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
std::string label = format("Inference time: %.2f ms", t);
cout << label << endl;
putText(frame, label, Point(0, 15), FONT_HERSHEY_PLAIN, 0.5, Scalar(0, 255, 0));
imshow(kWinName, frame);
waitKey(1);
}
return 0;
}
void postprocess(Mat& frame, const std::vector& outs, Net& net)
{
static std::vector outLayers = net.getUnconnectedOutLayers();
static std::string outLayerType = net.getLayer(outLayers[0])->type;
std::vector classIds;
std::vector confidences;
std::vector boxes;
if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN
{
// Network produces output blob with a shape 1x1xNx7 where N is a number of
// detections and an every detection is a vector of values
// [batchId, classId, confidence, left, top, right, bottom]
CV_Assert(outs.size() == 1);
float* data = (float*)outs[0].data;
for (size_t i = 0; i < outs[0].total(); i += 7) {
float confidence = data[i + 2];
if (confidence > confThreshold) {
int left = (int)data[i + 3];
int top = (int)data[i + 4];
int right = (int)data[i + 5];
int bottom = (int)data[i + 6];
int width = right - left + 1;
int height = bottom - top + 1;
classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id.
boxes.push_back(Rect(left, top, width, height));
confidences.push_back(confidence);
}
}
}
else if (outLayerType == "DetectionOutput") {
// Network produces output blob with a shape 1x1xNx7 where N is a number of
// detections and an every detection is a vector of values
// [batchId, classId, confidence, left, top, right, bottom]
CV_Assert(outs.size() == 1);
float* data = (float*)outs[0].data;
for (size_t i = 0; i < outs[0].total(); i += 7) {
float confidence = data[i + 2];
if (confidence > confThreshold) {
int left = (int)(data[i + 3] * frame.cols);
int top = (int)(data[i + 4] * frame.rows);
int right = (int)(data[i + 5] * frame.cols);
int bottom = (int)(data[i + 6] * frame.rows);
int width = right - left + 1;
int height = bottom - top + 1;
classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id.
boxes.push_back(Rect(left, top, width, height));
confidences.push_back(confidence);
}
}
}
else if (outLayerType == "Region") {
for (size_t i = 0; i < outs.size(); ++i) {
// Network produces output blob with a shape NxC where N is a number of
// detected objects and C is a number of classes + 4 where the first 4
// numbers are [center_x, center_y, width, height]
float* data = (float*)outs[i].data;
for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
Point classIdPoint;
double confidence;
minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
if (confidence > confThreshold) {
int centerX = (int)(data[0] * frame.cols);
int centerY = (int)(data[1] * frame.rows);
int width = (int)(data[2] * frame.cols);
int height = (int)(data[3] * frame.rows);
int left = centerX - width / 2;
int top = centerY - height / 2;
classIds.push_back(classIdPoint.x);
confidences.push_back((float)confidence);
boxes.push_back(Rect(left, top, width, height));
}
}
}
}
else
CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);
std::vector indices;
NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
for (size_t i = 0; i < indices.size(); ++i) {
int idx = indices[i];
Rect box = boxes[idx];
drawPred(classIds[idx], confidences[idx], box.x, box.y,
box.x + box.width, box.y + box.height, frame);
}
}
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));
std::string label = format("%.2f", conf);
if (!classes.empty()) {
CV_Assert(classId < (int)classes.size());
label = classes[classId] + ": " + label;
}
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.2, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - labelSize.height),
Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.2, Scalar());
}
pb转pbtxt时报错:
_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt
Scale: [0.200000-0.950000]
Aspect ratios: [1.0, 2.0, 0.5, 3.0, 0.3333]
Reduce boxes in the lowest layer: True
Number of classes: 1
Number of layers: 6
box predictor: convolutional
Input image size: 300x300
Traceback (most recent call last):
File "tf_text_graph_ssd.py", line 368, in
createSSDGraph(args.input, args.config, args.output)
File "tf_text_graph_ssd.py", line 232, in createSSDGraph
assert(graph_def.node[0].op == 'Placeholder')
AssertionError
在转换之前,先对pb文件进行如下操作:
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph
with tf.gfile.FastGFile('ssdlite.pb', 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
graph_def = TransformGraph(graph_def, ['image_tensor'], ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'], ['sort_by_execution_order'])
with tf.gfile.FastGFile('ssdlite_new.pb', 'wb') as f:
f.write(graph_def.SerializeToString())#保存新的模型
再应用models里的代码进行转换,得到pbtxt文件。