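The Caffe C++ method for batch feature extraction is described in [1], but using it raises two questions:
1. How do you convert the LevelDB output to libsvm format?
2. What does the mini-batch argument of ./build/tools/extract_features mean, and how does it relate to batch_size in imagenet_val.prototxt?
This post answers these two questions; the extract_features source code itself still needs further analysis.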
Take the second question first. Consider the command:
./build/tools/extract_features models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10
Here 10 is the number of mini-batches. If batch_size in imagenet_val.prototxt is 128, the program extracts features for 128 * 10 = 1280 images. If you have 100 images, you can set mini-batch = 1 and batch_size = 100.
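The relationship between the two settings is plain arithmetic: the number of images processed is batch_size times the number of mini-batches. As a minimal sketch (the helper name mini_batches_needed is hypothetical and not part of Caffe), the smallest mini-batch count that covers a given number of images can be computed like this:

```python
def mini_batches_needed(num_images, batch_size):
    """Smallest number of mini-batches such that
    batch_size * mini_batches >= num_images."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    # Integer ceiling division without floating point.
    return (num_images + batch_size - 1) // batch_size


if __name__ == "__main__":
    print(mini_batches_needed(100, 100))   # 100 images, batch_size 100
    print(mini_batches_needed(1280, 128))  # 1280 images, batch_size 128
    print(mini_batches_needed(129, 128))   # one image more than a full batch
```

When the image count is not a multiple of batch_size, the run produces more feature vectors than there are images, so the surplus entries should be dropped when the features are read back.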
The LevelDB output can be converted to libsvm format (question 1) with a Python script along these lines:
import sys

import caffe
import leveldb
from caffe.proto import caffe_pb2

# parse arguments: LevelDB directory and output file
dbName = sys.argv[1]
featureFile = sys.argv[2]
output = open(featureFile, 'w')

# open leveldb files
db = leveldb.LevelDB(dbName)
# get db iterator
it = db.RangeIter()
count = 0
for key, value in it:
    # convert string to datum
    datum = caffe_pb2.Datum.FromString(value)
    # convert datum to a flat numpy array, whatever shape the blob was stored in
    arr = caffe.io.datum_to_array(datum).flatten()
    # convert to svm format: 1-based index:value pairs separated by spaces
    tmpS = ' '.join(str(i + 1) + ':' + str(v) for i, v in enumerate(arr.tolist()))
    output.write(tmpS + "\n")
    count += 1
# number of feature vectors written
print(count)
output.close()
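However, this script has a serious bug: db.RangeIter() returns keys in alphabetical (lexicographic) order, which is not the order in which extract_features wrote the features. See [3] for details: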
The problem is most likely caused by re-ordering of training/test examples since the db.RangeIter() iterates over keys in alphabetical order while extract_features creates keys from index values without leading zeros (unlike convert_imageset). Hence, you get an order like 0, 1, 10, 100, ...
Parse the key value in python and put the extracted feature vector at that position.
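The ordering problem described in the quote is easy to reproduce without Caffe or LevelDB. The following sketch uses made-up keys "0" through "11" as stand-ins for the index keys, and compares alphabetical order with numeric order:

```python
# Stand-in keys: index values written as strings without leading zeros.
keys = [str(i) for i in range(12)]

# Alphabetical (lexicographic) order, as seen when iterating the database.
lexicographic = sorted(keys)

# Numeric order, obtained by parsing each key as an integer.
numeric = sorted(keys, key=int)

if __name__ == "__main__":
    print(lexicographic)
    print(numeric)
```

In the alphabetical listing "10" and "11" come directly after "1" and before "2", which is why feature vectors written out in iteration order no longer line up with the input images.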
Annoying, to say the least. The corrected code is as follows:
# get db iterator
it = db.RangeIter()
features = {}
for key, value in it:
    # convert string to datum
    datum = caffe_pb2.Datum.FromString(value)
    # convert datum to a flat numpy array
    arr = caffe.io.datum_to_array(datum).flatten()
    # store the vector under the numeric value of its key
    features[int(key)] = arr

# Write to file. The iterator yields keys in alphabetical order, while the
# keys are really numeric indices, so sort them numerically before writing.
# imageCount is the number of input images; it is not defined in this
# fragment and must be set beforehand.
for k in sorted(features):
    # stop once every real image has been written
    if k > imageCount - 1:
        break
    arr = features[k]
    line = ' '.join(str(i + 1) + ':' + str(v) for i, v in enumerate(arr.tolist()))
    output.write(line + "\n")
output.close()
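The line-formatting step can also be isolated into a small function that is testable without Caffe. This is only a sketch: the name to_libsvm_line and the optional label argument are hypothetical additions. The scripts in this post write no label, whereas libsvm data files normally start each line with one, so the label is left optional here.

```python
def to_libsvm_line(values, label=None):
    """Format a sequence of numbers as one libsvm-style line.

    Feature indices are 1-based. If label is given, it is written
    first, followed by the index:value pairs.
    """
    pairs = ["{}:{}".format(i, v) for i, v in enumerate(values, start=1)]
    if label is not None:
        pairs.insert(0, str(label))
    return " ".join(pairs)


if __name__ == "__main__":
    print(to_libsvm_line([0.5, 0.0, 2.0]))
    print(to_libsvm_line([0.5, 0.0, 2.0], label=1))
```

Keeping the formatting separate from the database loop makes it straightforward to add labels later, or to skip zero-valued entries if a sparse file is wanted.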
References:
1. http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html
2. http://bean.logdown.com/posts/211192-caffe-use-caffe-to-extract-features-of-each-layer
3. https://github.com/BVLC/caffe/issues/1158