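First, some background. I originally set out to fine-tune SSD, and I only needed 6 of the classes. The VOC labels that SSD ships with cover 21 classes, though, and the classes overlap within images: a single image can contain both classes I want and classes I don't. I only wanted to pick out the .xml files that contain those 6 classes and train on them, so I ended up using the Linux shell tricks below. Later I found that building your own VOC-format dataset needs mostly the same knowledge, so I'm sharing it here. My own Linux shell skills are poor; I just hope this helps others who are Linux beginners like me.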
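# List every .xml file under the current directory that contains 'person'; the matching paths are written to out.txt
find . -name "*.xml" -exec grep -l 'person' {} \; -fprint out.txt
# The two commands below assume the list was saved as (or copied to) person-test.txt
# Strip the leading "./" from every line
sed -i 's/^..//g' person-test.txt
# Turn each annotation file name into the matching image file name
sed -i 's/\.xml/\.jpg/g' person-test.txt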
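For readers more comfortable with Python than with find and sed, the three commands above can be sketched roughly as follows. This is only an illustration: the function name list_matching_images is made up for this example, and like grep it does a plain substring match on the file contents rather than parsing the XML.

```python
import os
import tempfile


def list_matching_images(root, keyword):
    """Return .jpg names for every .xml file under root whose text contains keyword."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".xml"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                if keyword in handle.read():
                    # Path relative to root, so there is no leading "./" to strip
                    relative = os.path.relpath(path, root)
                    matches.append(relative[:-len(".xml")] + ".jpg")
    return sorted(matches)


# Small demonstration in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "a.xml"), "w") as f:
        f.write("<object><name>person</name></object>")
    with open(os.path.join(demo_root, "b.xml"), "w") as f:
        f.write("<object><name>dog</name></object>")
    print(list_matching_images(demo_root, "person"))
```

Writing the returned names to a text file, one per line, would give the same kind of list as person-test.txt.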
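import shutil

# Directories from my setup; adjust them to your own layout (PersonImages must already exist)
src_dir = "/home/fx/code/caffe-ssd-lw/data/VOCdevkit/VOC2012/JPEGImages/"
dst_dir = "/home/fx/code/caffe-ssd-lw/data/VOCdevkit/VOC2012/PersonImages"
# Move every image listed in person-test.txt into PersonImages
with open('person-test.txt') as listing:
    for line in listing:
        name = line.strip()
        print(name)
        shutil.move(src_dir + name, dst_dir)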
Append the entire contents of 2.txt after the entire contents of 1.txt:
cat 1.txt 2.txt > out.txt
Join 2.txt onto 1.txt line by line, so that lines with the same line number end up on one line:
paste -d ' ' 1.txt 2.txt > out.txt
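To make the difference between the two commands concrete, here is a small Python sketch that works on lists of lines instead of files. The function names cat_lines and paste_lines are made up for this illustration; the behavior for files of unequal length (the shorter side is treated as empty, but the delimiter is still written) is meant to mirror what paste does.

```python
from itertools import zip_longest


def cat_lines(first, second):
    """Like `cat 1.txt 2.txt`: all lines of first, followed by all lines of second."""
    return first + second


def paste_lines(first, second, delimiter=" "):
    """Like `paste -d ' ' 1.txt 2.txt`: join lines that share the same line number."""
    return [delimiter.join(pair)
            for pair in zip_longest(first, second, fillvalue="")]


images = ["a.jpg", "b.jpg"]
labels = ["a.xml", "b.xml"]
print(cat_lines(images, labels))    # four separate lines
print(paste_lines(images, labels))  # two lines, each "image label"
```

The paste form is the one that produces the "image annotation" pairs, one pair per line, that SSD-style list files use.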
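import xml.etree.ElementTree as ET

# Classes to keep
names = ['person', 'bicycle', 'car', 'bus', 'motorbike']
n = 0  # number of annotation files marked for deletion

# out.txt lists one annotation (.xml) path per line
with open('out.txt') as directory, open('delete.txt', 'w') as delete:
    for line in directory:
        tree = ET.parse(line.strip())
        # Check only <object>/<name>; iterating over every <name> tag would also
        # pick up the <part> names (head, hand, foot) nested inside some objects
        for obj in tree.iter('object'):
            if obj.findtext('name') in names:
                print("i get it")
            else:
                # Found a class outside the list: record this file and stop
                print("i didn't get it")
                n += 1
                delete.write(line)
                break
print(n)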
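The per-file check in the script above can also be tried in isolation on an in-memory annotation, which is handy for testing before touching the real dataset. The sketch below is a variant under my own assumptions: the function name has_unwanted_class and the two sample annotations are made up, and it looks only at the name that sits directly under each object, so nested part names do not count as classes.

```python
import xml.etree.ElementTree as ET

# Classes to keep (same list as in the script above)
WANTED = {"person", "bicycle", "car", "bus", "motorbike"}


def has_unwanted_class(xml_text, wanted=WANTED):
    """Return True if any <object> in the annotation has a class outside wanted."""
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        # findtext looks only at direct children, so <part><name> is ignored
        if obj.findtext("name") not in wanted:
            return True
    return False


# Hypothetical, heavily shortened VOC-style annotations
person_only = """<annotation>
  <object><name>person</name><part><name>head</name></part></object>
</annotation>"""
person_and_dog = """<annotation>
  <object><name>person</name></object>
  <object><name>dog</name></object>
</annotation>"""

print(has_unwanted_class(person_only))
print(has_unwanted_class(person_and_dog))
```

Files for which the function returns True are the ones that would be written to delete.txt.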
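Once we have the delete.txt we want, one more command produces the list with those files removed. Each line of delete.txt is used as a pattern, so its entries must appear in trainval.txt in the same form.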
grep -v -f delete.txt trainval.txt > trainvla.txt
For details, see: http://www.programgo.com/article/522152370/
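The effect of grep -v -f can be sketched in Python as below. Note two deliberate simplifications, both my own assumptions: patterns are treated as fixed substrings (what grep -F would do) rather than regular expressions, and empty pattern lines are skipped, whereas in grep an empty pattern matches every line. The function name drop_matching_lines and the sample ids are made up for the illustration.

```python
def drop_matching_lines(lines, patterns):
    """Keep only the lines that contain none of the patterns as a substring."""
    # Skip empty patterns; an empty string is a substring of every line
    patterns = [p for p in patterns if p]
    return [line for line in lines
            if not any(p in line for p in patterns)]


trainval = ["2008_000001", "2008_000002", "2008_000003"]
delete = ["2008_000002"]
print(drop_matching_lines(trainval, delete))
```

Because the match is a substring test, a short pattern can remove more lines than intended, which is the same caveat that applies to the grep command.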
Finally, here is the answer the SSD author gave on GitHub when asked how to fine-tune the SSD model on your own data:
You should create a labelmap file for your own data using : https://github.com/weiliu89/caffe/blob/ssd/tools/create_label_map.cpp
And currently it only support annotation whose format is same as VOC or COCO. If not, you should write your own function to read the annotation. You can refer to:https://github.com/weiliu89/caffe/blob/ssd/src/caffe/util/io.cpp#L251 (Parse VOC/ILSVRC detection annotation.)
I would also strongly suggest using this to debug before you train on your own data.:https://github.com/weiliu89/caffe/blob/ssd/examples/ssd.ipynb
Removing the lines that two files have in common from both of them: http://www.cnblogs.com/raceblog/archive/2011/03/24/shell-delete-comm.html
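As a rough idea of what that operation does, here is a Python sketch that removes the shared lines from two lists of lines. It is not the method from the linked page; the function name remove_common_lines and the sample data are made up, and lines are compared by exact equality.

```python
def remove_common_lines(first, second):
    """Return both lists with every line that appears in both of them removed."""
    common = set(first) & set(second)
    only_first = [line for line in first if line not in common]
    only_second = [line for line in second if line not in common]
    return only_first, only_second


left = ["2008_000001", "2008_000002", "2008_000003"]
right = ["2008_000002", "2008_000004"]
print(remove_common_lines(left, right))
```

Unlike the comm utility, this sketch does not require the inputs to be sorted, and it preserves the original order of the remaining lines.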