深度学习的数据格式转换(mobilenet+ssd,centernet)

深度学习的数据格式转换(mobilenet+ssd,centernet)

初步生成VOC2012数据集

1、数据标注

1.1有目标的图片标注

应用labelImg对图片进行标注,下载链接:https://github.com/tzutalin/labelImg
标注时,需要注意的点,(参考自https://blog.csdn.net/chenmaolin88/article/details/79357502)

标注什么 预定义的所有类别的所有对象实例(就是说,如果图片里面有3只浣熊,就要分别标注3只浣熊), 除非:你拿不准那玩意儿是不是。对象非常非常的小(尺度自己拿捏),只能看见对象的不到 10-20%的部分 , 因此你拿不准那个到底是哪一类的,比如你只能看见一个轮胎,你不确定是卡车还是小轿车,这种就可以不用标注.如果图片中的对象肉眼都难以识别,就丢掉这张图片
难以识别(difficult) 若肉眼虽然可以大致识别,但确信度不是很高,则勾选difficult复选框,表示这个对象不是很好识别。
矩形框 用矩形框标注对象的可见区域, 不可见的区域不要标注. 非对象的区域不要标注,矩形框应该要且仅包括对象的所有可见的像素点, 除非为了包括很小一部分的对象部件,需要扩大很大一个矩形框面积,比如,小轿车的天线可以不用框进来,因为他太小了,且天线对于汽车来说无关紧要,并非主要特征。
截断Truncated 如果对象超过 15-20% 的部分不在矩形框内,则将对象标记为Truncated. 这个标记意味着矩形框内没有包含完成的对象实例。这个属性在LabelImg中无法直接勾选,需要手工编辑XML文件里的对应标签。
遮挡Occlusion 如果矩形框内,对象有超过 5% 的部分被遮挡, 标记为 Occluded. 这个标记指示矩形框内的图像存在被遮挡的情况。这个属性在LabelImg中无法直接勾选,需要手工编辑XML文件里的对应标签。
衣服、雪、泥etc 如果遮挡物是跟对象强相关的,则不用标记为遮挡,比如 人身上的衣服,应视为人的一部分。
透明 透过玻璃看到的对象也应该被标记, 但是若玻璃是有点反光的,则玻璃上的映像,应被标记为遮挡 occlusion
镜子 镜子里的对象也应该被标记。
透明 透过玻璃看到的对象也应该被标记, 但是若玻璃是有点反光的,则玻璃上的映像,应被标记为遮挡 occlusion
海报 图片里面的海报、杂志等上面的对象也应该被标记,除非是一些很浮夸的卡通画

1.2无目标图片的标注

应用以下代码,自动生成无标注目标的xml文件
参考链接:https://www.jianshu.com/p/5b2254fdf8f8

#! /usr/bin/python
# -*- coding:UTF-8 -*-
import os, sys
import glob
from PIL import Image
 
# VEDAI 图像存储位置
src_img_dir = "/home/lehui/Desktop/负样本700"
# VEDAI 图像生成的xml文件存放位置
src_xml_dir = "/home/lehui/Desktop/xml"
 
img_Lists = glob.glob(src_img_dir + '/*.jpg')
 
img_basenames = [] # e.g. 100.jpg
for item in img_Lists:
    img_basenames.append(os.path.basename(item))
 
img_names = [] # e.g. 100
for item in img_basenames:
    temp1, temp2 = os.path.splitext(item)
    img_names.append(temp1)
 
for img in img_names:
    im = Image.open((src_img_dir + '/' + img + '.jpg'))
    width, height = im.size
    # write in xml file
    #os.mknod(src_xml_dir + '/' + img + '.xml')
    xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')
    xml_file.write('\n')
    xml_file.write('    VOC2007\n')
    xml_file.write('    ' + str(img) + '.jpg' + '\n')
    xml_file.write('    '+ src_xml_dir + '/' + str(img) + '.jpg' + '\n')
    xml_file.write('    \n')
    xml_file.write('        ' + "Unknow" + '\n')
    xml_file.write('    \n')
    xml_file.write('    \n')
    xml_file.write('        ' + str(width) + '\n')
    xml_file.write('        ' + str(height) + '\n')
    xml_file.write('        3\n')
    xml_file.write('    \n')
    xml_file.write('    0\n')
    xml_file.write('')

2、分配训练集、验证集、测试集

应用以下代码,分别得到train、val、test对应的xml集合。
参考链接:https://www.cnblogs.com/gezhuangzhuang/p/10613468.html

import os  
import random  
import time  
import shutil
#xmlfilepath——所有xml的路径,saveBasePath——保存结果的路径,下面建立三个文件夹:train、val、test
xmlfilepath=r'./Annotations'  
saveBasePath=r"./Annotations"

trainval_percent=0.8  
train_percent=0.8  
total_xml = os.listdir(xmlfilepath)  
num=len(total_xml)  
list=range(num)  
tv=int(num*trainval_percent)  
tr=int(tv*train_percent)  
trainval= random.sample(list,tv)  
train=random.sample(trainval,tr)  
print("train and val size",tv)  
print("train size",tr) 

start = time.time()

test_num=0  
val_num=0  
train_num=0  

for i in list:  
    name=total_xml[i]
    if i in trainval:  #train and val set 
        if i in train: 
            directory="train"  
            train_num += 1  
            xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
            if(not os.path.exists(xml_path)):  
                os.mkdir(xml_path)  
            filePath=os.path.join(xmlfilepath,name)  
            newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
            shutil.copyfile(filePath, newfile)
        else:
            directory="validation"  
            xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
            if(not os.path.exists(xml_path)):  
                os.mkdir(xml_path)  
            val_num += 1  
            filePath=os.path.join(xmlfilepath,name)   
            newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
            shutil.copyfile(filePath, newfile)

    else:
        directory="test"  
        xml_path = os.path.join(os.getcwd(), 'Annotations/{}'.format(directory))  
        if(not os.path.exists(xml_path)):  
                os.mkdir(xml_path)  
        test_num += 1  
        filePath=os.path.join(xmlfilepath,name)  
        newfile=os.path.join(saveBasePath,os.path.join(directory,name))  
        shutil.copyfile(filePath, newfile)

end = time.time()  
seconds=end-start  
print("train total : "+str(train_num))  
print("validation total : "+str(val_num))  
print("test total : "+str(test_num))  
total_num=train_num+val_num+test_num  
print("total number : "+str(total_num))  
print( "Time taken : {0} seconds".format(seconds))

3、文件格式组成

:所有的xml文件
:所有的图片
:2中生成的训练集、测试集、验证集对应的txt
标签分类的配置文件(label_map.txt)

item {
  id: 1    # id 从1开始编号
  name: 'red pedestrian'
}

item {
  id: 2
  name: 'green pedestrian'
}

tfrecored数据的生成

依赖\models\research\object_detection\dataset_tools\create_pascal_tf_record.py代码,
修改对应参数,生成tfrecored格式数据。

json数据的生成

根据xml文件生成对应的json
参考链接:https://blog.csdn.net/weixin_41765699/article/details/100124689

import xml.etree.ElementTree as ET
import os
import json
 
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []
 
category_set = dict()
image_set = set()
 
category_item_id = -1
image_id = 20180000000
annotation_id = 0
 
 
def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id
 
 
def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id
 
 
def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    # bbox[] is x,y,w,h
    # left_top
    seg.append(bbox[0])
    seg.append(bbox[1])
    # left_bottom
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    # right_bottom
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    # right_top
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
 
    annotation_item['segmentation'].append(seg)
 
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)
 
 
def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue
 
        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None
 
        xml_file = os.path.join(xml_path, f)
        print(xml_file)
 
        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
 
        # elem is , , , 
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None
 
            if elem.tag == 'folder':
                continue
 
            if elem.tag == 'filename':
                file_name = elem.text
                if file_name in category_set:
                    raise Exception('file_name duplicated')
 
            # add img item only after parse  tag
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                    print('add image with {} and {}'.format(file_name, size))
                else:
                    raise Exception('duplicated image: {}'.format(file_name))
                    # subelem is , , , , 
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None
 
                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]
 
                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)
 
                # option is , , , , when subelem is 
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(option.text)
 
                # only after parse the  tag
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    # x
                    bbox.append(bndbox['xmin'])
                    # y
                    bbox.append(bndbox['ymin'])
                    # w
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    # h
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    print('add annotation with {},{},{},{}'.format(object_name, current_image_id, current_category_id,
                                                                   bbox))
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)
 
 
if __name__ == '__main__':
    xml_path = 'Z:\pycharm_projects\ssd\VOCtest60\Annotations'    # 这是xml文件所在的地址
    json_file = './test.json'                                     # 这是你要生成的json文件                        
    parseXmlFiles(xml_path)                                       # 只需要改动这两个参数就行了
    json.dump(coco, open(json_file, 'w'))


3、模型转换

3.1 pb to pbtxt

更改D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py,中对应的输入参数,转换得到.pbtxt文件。

3.2 pt to pth

3.2 pth to onnx

3.2 onnx to ncnn

4、利用opencvdnn库调用mobilenet——ssd模型

复制描述模型的四个文件,后缀分别为.data-00000-of-00001 .index .meta和checkpoint文件,应用以下命令

python object_detection/export_inference_graph.py --input_type=image_tensor
 --pipeline_config_path=\models\ssd_mobilenet_v2_coco.config 
 --trained_checkpoint_prefix=\pedestrian_data\model\model.ckpt-1589 
--output_directory\pedestrian_data\test

得到对应的pb模型。
ssd_mobilenet_v2 训练得到的pb模型转换为pbtxt,更改D:\Program Files\opencv\sources\samples\dnn\tf_text_graph_ssd.py,中对应的输入参数,转换得到.pbtxt文件。运用opencv中的dnn库读取网络,进行c++端的检测,代码如下:
参考链接:https://blog.csdn.net/atpalain_csdn/article/details/100098720

#include 
#include 
#include 
#include 

#include 
#include 
#include 

using namespace std;
using namespace cv;
using namespace dnn;

float confThreshold, nmsThreshold;
std::vector classes;

void postprocess(Mat& frame, const std::vector& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);

int main(int argc, char** argv)
{
	// 根据选择的检测模型文件进行配置
	confThreshold = 0.5;
	nmsThreshold = 0.4;

	float scale = 1.0;
	Scalar mean = { 0, 0, 0 };
	bool swapRB = true;
	int inpWidth = 300;
	int inpHeight = 300;

	String modelPath = "frozen_inference_graph.pb";
	String configPath = "frozen_inference_graph.pbtxt";
	string image_file = "E:\\project\\data\\hands_data\\img1125\\";
	String framework = "";

	int backendId = cv::dnn::DNN_BACKEND_OPENCV;
	int targetId = cv::dnn::DNN_TARGET_CPU;

	//String classesFile = R"(object_detection_classes_coco.txt)";

	// Open file with classes names.
	//if (!classesFile.empty()) {
	//	const std::string& file = classesFile;
	//	std::ifstream ifs(file.c_str());
	//	if (!ifs.is_open())
	//		CV_Error(Error::StsError, "File " + file + " not found");
	//	std::string line;
	//	while (std::getline(ifs, line)) {
	//		classes.push_back(line);
	//	}
	//}
	classes.push_back("raiseHand");
	

	// Load a model.
	Net net = readNet(modelPath, configPath, framework);
	net.setPreferableBackend(backendId);
	net.setPreferableTarget(targetId);

	std::vector outNames = net.getUnconnectedOutLayersNames();

	// Create a window
	static const std::string kWinName = "Deep learning object detection in OpenCV";

	// Process frames.
	Mat frame, blob;
	namedWindow(kWinName, 0);
	vector< cv::String > files;
	cv::glob(image_file, files);
	for (int i = 0; i < files.size(); i++)
	{
		frame = cv::imread(files[i]);

		//cv::Mat image(90, 120, CV_8UC3, cv::Scalar::all(0));
		if (frame.empty())
		{
			return 0;
		}

		//frame = imread("E:\\project\\data\\hands_data\\train_ssd\\hands_data_pos+neg\\raisehandData\\test\\0010.jpg");

		// Create a 4D blob from a frame.
		Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
			inpHeight > 0 ? inpHeight : frame.rows);
		blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);

		// Run a model.
		net.setInput(blob);
		if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
		{
			resize(frame, frame, inpSize);
			Mat imInfo = (Mat_(1, 3) << inpSize.height, inpSize.width, 1.6f);
			net.setInput(imInfo, "im_info");
		}

		std::vector outs;
		net.forward(outs, outNames);

		postprocess(frame, outs, net);

		// Put efficiency information.
		std::vector layersTimes;
		double freq = getTickFrequency() / 1000;
		double t = net.getPerfProfile(layersTimes) / freq;
		std::string label = format("Inference time: %.2f ms", t);
		cout << label << endl;
		putText(frame, label, Point(0, 15), FONT_HERSHEY_PLAIN, 0.5, Scalar(0, 255, 0));
		
		imshow(kWinName, frame);
		waitKey(1);
	}
	return 0;
}

void postprocess(Mat& frame, const std::vector& outs, Net& net)
{
	static std::vector outLayers = net.getUnconnectedOutLayers();
	static std::string outLayerType = net.getLayer(outLayers[0])->type;

	std::vector classIds;
	std::vector confidences;
	std::vector boxes;
	if (net.getLayer(0)->outputNameToIndex("im_info") != -1)  // Faster-RCNN or R-FCN
	{
		// Network produces output blob with a shape 1x1xNx7 where N is a number of
		// detections and an every detection is a vector of values
		// [batchId, classId, confidence, left, top, right, bottom]
		CV_Assert(outs.size() == 1);
		float* data = (float*)outs[0].data;
		for (size_t i = 0; i < outs[0].total(); i += 7) {
			float confidence = data[i + 2];
			if (confidence > confThreshold) {
				int left = (int)data[i + 3];
				int top = (int)data[i + 4];
				int right = (int)data[i + 5];
				int bottom = (int)data[i + 6];
				int width = right - left + 1;
				int height = bottom - top + 1;
				classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
				boxes.push_back(Rect(left, top, width, height));
				confidences.push_back(confidence);
			}
		}
	}
	else if (outLayerType == "DetectionOutput") {
		// Network produces output blob with a shape 1x1xNx7 where N is a number of
		// detections and an every detection is a vector of values
		// [batchId, classId, confidence, left, top, right, bottom]
		CV_Assert(outs.size() == 1);
		float* data = (float*)outs[0].data;
		for (size_t i = 0; i < outs[0].total(); i += 7) {
			float confidence = data[i + 2];
			if (confidence > confThreshold) {
				int left = (int)(data[i + 3] * frame.cols);
				int top = (int)(data[i + 4] * frame.rows);
				int right = (int)(data[i + 5] * frame.cols);
				int bottom = (int)(data[i + 6] * frame.rows);
				int width = right - left + 1;
				int height = bottom - top + 1;
				classIds.push_back((int)(data[i + 1]) - 1);  // Skip 0th background class id.
				boxes.push_back(Rect(left, top, width, height));
				confidences.push_back(confidence);
			}
		}
	}
	else if (outLayerType == "Region") {
		for (size_t i = 0; i < outs.size(); ++i) {
			// Network produces output blob with a shape NxC where N is a number of
			// detected objects and C is a number of classes + 4 where the first 4
			// numbers are [center_x, center_y, width, height]
			float* data = (float*)outs[i].data;
			for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
				Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
				Point classIdPoint;
				double confidence;
				minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
				if (confidence > confThreshold) {
					int centerX = (int)(data[0] * frame.cols);
					int centerY = (int)(data[1] * frame.rows);
					int width = (int)(data[2] * frame.cols);
					int height = (int)(data[3] * frame.rows);
					int left = centerX - width / 2;
					int top = centerY - height / 2;

					classIds.push_back(classIdPoint.x);
					confidences.push_back((float)confidence);
					boxes.push_back(Rect(left, top, width, height));
				}
			}
		}
	}
	else
		CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);

	std::vector indices;
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	for (size_t i = 0; i < indices.size(); ++i) {
		int idx = indices[i];
		Rect box = boxes[idx];
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
	}
}

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
	rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));

	std::string label = format("%.2f", conf);
	if (!classes.empty()) {
		CV_Assert(classId < (int)classes.size());
		label = classes[classId] + ": " + label;
	}

	int baseLine;
	Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.2, 1, &baseLine);

	top = max(top, labelSize.height);
	rectangle(frame, Point(left, top - labelSize.height),
		Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
	putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.2, Scalar());
}

问题集锦

pb转pbtxt时报错:

_graph_ssd.py --input frozen_inference_graph.pb --config ssd_mobilenet_v2_coco.config --output graph.pbtxt
Scale: [0.200000-0.950000]
Aspect ratios: [1.0, 2.0, 0.5, 3.0, 0.3333]
Reduce boxes in the lowest layer: True
Number of classes: 1
Number of layers: 6
box predictor: convolutional
Input image size: 300x300
Traceback (most recent call last):
  File "tf_text_graph_ssd.py", line 368, in 
    createSSDGraph(args.input, args.config, args.output)
  File "tf_text_graph_ssd.py", line 232, in createSSDGraph
    assert(graph_def.node[0].op == 'Placeholder')
AssertionError

在转换之前,先对pb文件进行如下操作:

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph
 
with tf.gfile.FastGFile('ssdlite.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    graph_def = TransformGraph(graph_def, ['image_tensor'], ['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'], ['sort_by_execution_order'])
    with tf.gfile.FastGFile('ssdlite_new.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())#保存新的模型

再应用models里的代码进行转换,得到pbtxt文件。

你可能感兴趣的:(dl)