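So far, most of the bugs I have hit come from create_imagenet.sh (taken from examples/imagenet) while it generates the LMDB data.
Bug 1: mkdir *_val_lmdb failed
This usually happens because the LMDB directory already exists at the target path, which causes a conflict. At first I deleted it by hand every time; eventually I realized how silly that was, since the deletion can simply be added to create_imagenet.sh: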
rm -rf $EXAMPLE/mytask_train_lmdb
rm -rf $EXAMPLE/mytask_val_lmdb
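If the list files and LMDBs are prepared from a Python driver script instead, the same cleanup might look like the sketch below. It assumes the LMDB directory names used above; the helper name remove_stale_lmdb is hypothetical and not part of Caffe.

```python
import os
import shutil


def remove_stale_lmdb(lmdb_dirs):
    """Delete every LMDB directory that already exists; return those removed."""
    removed = []
    for lmdb_dir in lmdb_dirs:
        # Only touch paths that exist, so a first run is a harmless no-op.
        if os.path.isdir(lmdb_dir):
            shutil.rmtree(lmdb_dir)
            removed.append(lmdb_dir)
    return removed
```

Calling remove_stale_lmdb(['examples/mytask/mytask_train_lmdb', 'examples/mytask/mytask_val_lmdb']) before convert_imageset runs would play the same role as the two rm -rf lines.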
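Bug 2: image not found at the given path (could not open or find file)
The first cause: I generated the txt label file from the Windows cmd prompt, so the paths in it used backslashes, which I had not noticed. The simplest fix is to open the txt file and replace every backslash with a forward slash. Alternatively, run make_list.py on Linux and the problem never appears.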
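For a large list file, doing the replacement by script is less error-prone than editing by hand. A minimal sketch, assuming the list file has one "path label" entry per line; both function names are hypothetical:

```python
def to_forward_slashes(line):
    """Replace Windows backslash separators with forward slashes."""
    return line.replace("\\", "/")


def fix_list_file(path_in, path_out):
    """Copy a list file, converting every backslash to a forward slash."""
    with open(path_in) as fin, open(path_out, "w") as fout:
        for line in fin:
            fout.write(to_forward_slashes(line))
```

Writing to a separate output file keeps the original list around in case something goes wrong.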
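The second cause puzzled me for a long time: the paths were clearly correct, so why did it still fail? It turned out that the Python escape character \t was to blame. The image name and the label were separated by a tab (\t); writing a plain space ' ' instead fixed it: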
#fout.write('%s\t%d\n'%(image_list[i][0], image_list[i][1]))
fout.write('%s%s%d\n'%(image_list[i][0], ' ',image_list[i][1]))#space not \t
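A list file that was already written with tabs can also be repaired without regenerating it. The sketch below assumes every line ends with an integer label preceded by some whitespace (tab or space); normalize_separator is a hypothetical helper name.

```python
def normalize_separator(line):
    """Return the entry as '<path> <label>' joined by a single space."""
    # rsplit(None, 1) splits once, from the right, on any whitespace run,
    # so spaces inside the image path itself are left untouched.
    path, label = line.rstrip("\r\n").rsplit(None, 1)
    return "%s %d\n" % (path, int(label))
```

Applying it to each line of the old list and writing the results to a new file gives the same format as the corrected fout.write call.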
With these fixes in place, LMDB generation starts. The dataset is fairly large (378,430 images), so it takes a while.
Code 1
make_list.py
import os
import random
import argparse
def list_image(root, recursive, exts):
    """Return (relative_path, label) pairs for the images under root."""
    image_list = []
    if recursive:
        cat = {}  # maps each image folder to an integer label
        for path, subdirs, files in os.walk(root, True):
            print(path)
            for fname in files:
                fpath = os.path.join(path, fname)
                suffix = os.path.splitext(fname)[1].lower()
                if os.path.isfile(fpath) and (suffix in exts):
                    if path not in cat:
                        cat[path] = len(cat)
                    image_list.append((os.path.relpath(fpath, root), cat[path]))
    else:
        # Non-recursive: every image in root gets label 0
        for fname in os.listdir(root):
            fpath = os.path.join(root, fname)
            suffix = os.path.splitext(fname)[1].lower()
            if os.path.isfile(fpath) and (suffix in exts):
                image_list.append((os.path.relpath(fpath, root), 0))
    return image_list

def write_list(path_out, image_list):
    """Write one 'path label' entry per line."""
    with open(path_out, 'w') as fout:
        for fpath, label in image_list:
            # Old version used a tab: fout.write('%s\t%d\n' % (fpath, label))
            fout.write('%s %d\n' % (fpath, label))  # space, not \t

def make_list(prefix_out, root, recursive, exts, num_chunks, train_ratio):
    image_list = list_image(root, recursive, exts)
    random.shuffle(image_list)
    N = len(image_list)
    # Ceiling division; // keeps the result an integer on Python 2 and 3
    chunk_size = (N + num_chunks - 1) // num_chunks
    for i in range(num_chunks):
        chunk = image_list[i * chunk_size:(i + 1) * chunk_size]
        if num_chunks > 1:
            str_chunk = '_%d' % i
        else:
            str_chunk = ''
        if train_ratio < 1:
            # The first part of each chunk is for training, the rest for validation
            sep = int(chunk_size * train_ratio)
            write_list(prefix_out + str_chunk + '_train.txt', chunk[:sep])
            write_list(prefix_out + str_chunk + '_val.txt', chunk[sep:])
        else:
            write_list(prefix_out + str_chunk + '.txt', chunk)

def main():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description='Make image list files that are required by im2rec')
    parser.add_argument('root', help='path to folder that contains images.')
    parser.add_argument('prefix', help='prefix of output list files.')
    # nargs='+' accepts e.g. "--exts .bmp .png"; type=list would split a
    # single argument into individual characters.
    parser.add_argument('--exts', nargs='+', default=['.bmp'],
                        help='list of acceptable image extensions.')
    parser.add_argument('--chunks', type=int, default=1, help='number of chunks.')
    parser.add_argument('--train_ratio', type=float, default=1.0,
                        help='Fraction of images to use for training.')
    # type=bool would treat any non-empty string, including "False", as True.
    parser.add_argument('--recursive',
                        type=lambda s: s.lower() in ('true', '1', 'yes'),
                        default=True,
                        help='If true recursively walk through subdirs and assign '
                             'a unique label to images in each folder. Otherwise '
                             'only include images in the root folder and give '
                             'them label 0.')
    args = parser.parse_args()
    make_list(args.prefix, args.root, args.recursive,
              args.exts, args.chunks, args.train_ratio)


if __name__ == '__main__':
    main()
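To check how many images land in each train/val list before running on the full dataset, the chunk arithmetic from make_list can be reproduced on its own. This is only a sketch that mirrors the slicing above; split_sizes is a hypothetical helper and the numbers in the usage note are illustrative.

```python
def split_sizes(n_images, num_chunks, train_ratio):
    """Return a (train_count, val_count) pair per chunk, mirroring make_list."""
    chunk_size = (n_images + num_chunks - 1) // num_chunks  # ceiling division
    sizes = []
    for i in range(num_chunks):
        # Slicing past the end of a list just truncates, so clamp the bounds.
        start = min(i * chunk_size, n_images)
        end = min((i + 1) * chunk_size, n_images)
        chunk_len = end - start
        if train_ratio < 1:
            sep = int(chunk_size * train_ratio)
            train = min(sep, chunk_len)
            sizes.append((train, chunk_len - train))
        else:
            sizes.append((chunk_len, 0))
    return sizes
```

For example, split_sizes(10, 3, 0.5) gives [(2, 2), (2, 2), (2, 0)]: the split point is computed from the full chunk size, so a short last chunk can end up with few or no validation images.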
Code 2
create_imagenet.sh
#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
EXAMPLE=examples/mytask
DATA=/mnt/hgfs/caffe
TOOLS=build/tools
TRAIN_DATA_ROOT=/mnt/hgfs/caffe/train/
VAL_DATA_ROOT=/mnt/hgfs/caffe/val/
# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi
if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi
if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi
echo "Creating train lmdb..."
rm -rf $EXAMPLE/mytask_train_lmdb
rm -rf $EXAMPLE/mytask_val_lmdb
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/mytask_train_lmdb
echo "Train lmdb done!"
echo "Creating val lmdb..."
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/mytask_val_lmdb
echo "val lmdb done!"
echo "Done."
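The convert_imageset call can also be assembled from Python, which makes it easy to print and inspect the exact arguments before starting a long conversion. A rough sketch, assuming the same flags and argument order as the script above; the function names and the side parameter are hypothetical.

```python
import os
import subprocess


def convert_imageset_cmd(tools, data_root, list_file, lmdb_dir,
                         resize=True, side=256):
    """Build the argument list for convert_imageset as used in the script."""
    size = side if resize else 0  # 0 means "do not resize", as in the script
    return [
        "%s/convert_imageset" % tools,
        "--resize_height=%d" % size,
        "--resize_width=%d" % size,
        "--shuffle",
        data_root,
        list_file,
        lmdb_dir,
    ]


def run_convert(cmd):
    """Run the command with GLOG_logtostderr=1, like the shell script does."""
    env = dict(os.environ, GLOG_logtostderr="1")
    subprocess.check_call(cmd, env=env)
```

For the training set this would be called as convert_imageset_cmd('build/tools', '/mnt/hgfs/caffe/train/', '/mnt/hgfs/caffe/train.txt', 'examples/mytask/mytask_train_lmdb'), and the result passed to run_convert from the Caffe root directory.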