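im = cv2.imread(im_file)  # returns None when the path contains Chinese characters
# Change to:
im = cv2.imdecode(np.fromfile(im_file, dtype=np.uint8), -1)  # -1 = cv2.IMREAD_UNCHANGED (cv2.imread defaults to cv2.IMREAD_COLOR)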
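The detection scores are too low, so no boxes pass the threshold and nothing is displayed. If you change CONF_THRESH in demo.py to 0.1, the images will be shown, although the results will be poor.
Set CONF_THRESH = 0.1 in the demo function of demo.py.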
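The effect of CONF_THRESH can be illustrated with a plain-Python sketch of the score filter. The detection tuples, the function name `keep_confident` and the comparison value 0.8 are hypothetical and are not taken from demo.py.

```python
from typing import List, Tuple

# Hypothetical detection format: (x1, y1, x2, y2, score)
Detection = Tuple[float, float, float, float, float]


def keep_confident(dets: List[Detection], conf_thresh: float) -> List[Detection]:
    """Return only the detections whose score (last element) is >= conf_thresh."""
    return [d for d in dets if d[-1] >= conf_thresh]


# Made-up scores from a weakly trained model.
dets = [(10, 10, 50, 60, 0.32), (40, 20, 90, 80, 0.15), (5, 5, 20, 20, 0.05)]

print(len(keep_confident(dets, 0.8)))  # 0 -> nothing would be drawn
print(len(keep_confident(dets, 0.1)))  # 2 -> boxes appear, but they are unreliable
```

Lowering the threshold does not improve the model. It only lets weaker detections through, which is why the displayed results look poor.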
Traceback (most recent call last):
File "D:\Program Files\JetBrains\PyCharm 2018.1.1\helpers\pydev\pydevd.py", line 1664, in <module>
main()
File "D:\Program Files\JetBrains\PyCharm 2018.1.1\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\Program Files\JetBrains\PyCharm 2018.1.1\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\Program Files\JetBrains\PyCharm 2018.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "F:/Faster-RCNN-TensorFlow-Python3/train.py", line 217, in <module>
train.train()
File "F:/Faster-RCNN-TensorFlow-Python3/train.py", line 149, in train
blobs = self.data_layer.forward()
File "F:\图像处理\Faster-RCNN-TensorFlow-Python3\lib\layer_utils\roi_data_layer.py", line 75, in forward
blobs = self._get_next_minibatch()
File "F:\Faster-RCNN-TensorFlow-Python3\lib\layer_utils\roi_data_layer.py", line 71, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "F:\Faster-RCNN-TensorFlow-Python3\lib\utils\minibatch.py", line 30, in get_minibatch
im_blob, im_scales = _get_image_blob(roidb, random_scale_inds)
File "F:\Faster-RCNN-TensorFlow-Python3\lib\utils\minibatch.py", line 64, in _get_image_blob
im = cv2.imdecode(np.fromfile(roidb[i]['image'], dtype=np.uint8), -1)
KeyError: 'image'
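Solution:
Delete the previously generated files in the Faster-RCNN-TensorFlow-Python3.5-master\data\cache folder.
The code automatically loads whatever is cached there, and the stale cache causes the training error.
(Source: https://www.cnblogs.com/pprp/p/9465065.html)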
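Clearing the cache can be scripted. The following is a minimal sketch that assumes the cached files are `.pkl` pickles directly under `data/cache`. The exact file names depend on the dataset and are not given here, and `clear_cache` is a hypothetical helper.

```python
import glob
import os


def clear_cache(cache_dir: str = os.path.join("data", "cache")) -> list:
    """Delete cached .pkl files (assumed format) so the roidb is rebuilt
    from the annotations on the next run. Returns the removed paths."""
    removed = []
    for path in glob.glob(os.path.join(cache_dir, "*.pkl")):
        os.remove(path)
        removed.append(path)
    return removed


if __name__ == "__main__":
    for p in clear_cache():
        print("removed", p)
```

Run it from the repository root, so that the relative `data/cache` path resolves, before restarting training.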
Training fails immediately, without a single successful iteration:
File "F:\Faster-RCNN-TensorFlow-Python3\lib\layer_utils\proposal_target_layer.py", line 135, in _sample_rois
raise Exception()
Solution: adjust roi_bg_threshold_high and roi_bg_threshold_low in config.py. Setting roi_bg_threshold_low to 0.0 usually makes this problem go away.
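Why lowering roi_bg_threshold_low helps can be seen in a plain-Python sketch of the foreground/background split by IoU. The thresholds (foreground at 0.5 or more, background in [0.1, 0.5)) are assumed typical defaults and are not read from this repository's config.py. `split_rois` is a hypothetical helper standing in for the `_sample_rois` logic that raises `Exception()`.

```python
from typing import List, Tuple

FG_THRESH = 0.5  # assumed foreground IoU threshold


def split_rois(max_overlaps: List[float], bg_high: float = 0.5,
               bg_low: float = 0.1) -> Tuple[List[int], List[int]]:
    """Return (fg_indices, bg_indices) for each RoI's max IoU with any ground-truth box."""
    fg = [i for i, o in enumerate(max_overlaps) if o >= FG_THRESH]
    bg = [i for i, o in enumerate(max_overlaps) if bg_low <= o < bg_high]
    if not fg and not bg:
        # The training code gives up at this point (raise Exception()).
        raise RuntimeError("no foreground or background RoIs sampled")
    return fg, bg


# Proposals that barely overlap any ground-truth box (IoU < 0.1).
overlaps = [0.0, 0.02, 0.05, 0.08]

try:
    split_rois(overlaps)                 # bg_low = 0.1 -> nothing qualifies
except RuntimeError as e:
    print("error:", e)

print(split_rois(overlaps, bg_low=0.0))  # ([], [0, 1, 2, 3])
```

With `bg_low = 0.0`, proposals that do not touch any ground-truth box still count as background, so the sampler always has something to return.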
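After I switched to another dataset, the same error as above appeared on every iteration from iteration 67 onward. Applying the fix above led to the next problem. After I modified pascal_voc.py as described below, training ran for more than 3,700 iterations and then failed again at inds = np.where(ovr <= thresh)[0].
To fix the previous problem, I changed roi_bg_threshold_low to 0.0, which produced the following warnings: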
F:\Faster-RCNN-TensorFlow-Python3\lib\utils\bbox_transform.py:28: RuntimeWarning: invalid value encountered in log
targets_dh = np.log(gt_heights / ex_heights)
F:\Faster-RCNN-TensorFlow-Python3\lib\utils\py_cpu_nms.py:35: RuntimeWarning: invalid value encountered in less_equal
inds = np.where(ovr <= thresh)[0]
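Following several blog posts, I tried editing pascal_voc.py to delete the trailing -1 on each of lines 167 to 171. Some boxes in my own XML files start at the top-left corner, i.e. (0, 0), and subtracting 1 would lead to log(-1). This did not help.
I used the following code to check whether any coordinates were out of range: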
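'''Check the XML annotations for out-of-range coordinates and similar problems.'''
import os
from xml.etree.ElementTree import ElementTree


def read_xml(in_path):
    '''Read and parse an XML file.
    in_path: path to the XML file
    return: ElementTree'''
    tree = ElementTree()
    tree.parse(in_path)
    return tree


def check(url="Annotations"):
    flag = 1
    for item in os.listdir(url):
        if not item.endswith(".xml"):
            continue
        tree = read_xml(os.path.join(url, item))
        root = tree.getroot()
        objects = root.findall("object")  # findall returns a list, never None
        size = root.find("size")
        width = int(size.find("width").text)
        height = int(size.find("height").text)
        if not objects:  # file contains no annotated objects
            print(item)
            flag = 0
            continue
        for it in objects:
            bndbox = it.find("bndbox")
            if bndbox is None:  # object without a bounding box
                print(item)
                flag = 0
                continue
            xmin = int(bndbox.find("xmin").text)
            xmax = int(bndbox.find("xmax").text)
            ymin = int(bndbox.find("ymin").text)
            ymax = int(bndbox.find("ymax").text)
            # Boxes touching the top-left corner (coordinate 0) are reported too,
            # because pascal_voc.py subtracts 1 from each coordinate.
            if xmin <= 0 or xmin >= xmax or ymin <= 0 or ymin >= ymax:
                print(item)
                flag = 0
            # Use `or`, not `|`: `xmax > width | ymax > height` parses as the
            # chained comparison `xmax > (width | ymax) > height`.
            if xmax > width or ymax > height:
                print(item)
                flag = 0
    if flag:
        print("Checked OK")


if __name__ == '__main__':
    check()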
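The check finished and found no problems in the XML files.
Then I remembered the fix for an earlier problem: delete the previously generated files in the Faster-RCNN-TensorFlow-Python3.5-master\data\cache folder.
I tried it, the error no longer appeared, and the problem was solved.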
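In the model definition, just before layer 6, change
reshaped = tf.reshape(relu3, [pool_shape[0], nodes]) # shape=(n, 120)
to:
reshaped = tf.layers.flatten(relu3)
tf.layers.flatten keeps the batch dimension dynamic, whereas pool_shape[0] presumably hard-codes the static batch size, which breaks when the batch size changes (for example, a single image at prediction time).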
allow_smaller_final_batch=True
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
(0) Out of range: FIFOQueue '_4_batch/fifo_queue' is closed and has insufficient elements (requested 4, current size 0)
[[{{node batch}}]]
(1) Out of range: FIFOQueue '_4_batch/fifo_queue' is closed and has insufficient elements (requested 4, current size 0)
[[{{node batch}}]]
[[batch/_61]]
Using try/except, I printed the labels of every sample in each batch that was read:
659 [33 33 33 33]
660 [33 33 34 34]
661 [34 34 34 34]
662 [34 34 34 34]
663 [34 34 34]
664 error:
665 error:
666 error:
...
810 error:
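The output above shows that 13 images of class 34 were read, so the problem probably starts at the 14th image.
I then printed the file names of class 34 in the order they were read when creating the tfrecords. The 14th image (debug_chineseMat38.jpg) does not seem to have any problem.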
val_data\zh_chuan
1 1.jpg
2 319_sun_chuan_1.jpg
3 320_sun_chuan_15.jpg
4 5.jpg
5 debug_char_auxRoi_121.jpg
6 debug_char_auxRoi_135.jpg
7 debug_char_auxRoi_142.jpg
8 debug_char_auxRoi_145.jpg
9 debug_char_auxRoi_89.jpg
10 debug_chineseMat0.jpg
11 debug_chineseMat12.jpg
12 debug_chineseMat13.jpg
13 debug_chineseMat35.jpg
14 debug_chineseMat38.jpg
15 debug_chineseMat39.jpg
16 debug_chineseMat42.jpg
17 debug_chineseMat47.jpg
18 debug_chineseMat48.jpg
19 debug_chineseMat56.jpg
20 debug_chineseMat57.jpg
21 debug_chineseMat60.jpg
22 debug_chineseMat62.jpg
23 debug_chineseMat7.jpg
24 debug_chineseMat9.jpg
25 gt_891_0.jpg
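Printing the content read from each of these files during conversion showed no empty values, so the images themselves do not seem to be the problem.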
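The try/except logging used in this investigation can be sketched without TensorFlow. `fetch_batch` stands in for the real `sess.run` call on the image/label batch tensors, and `make_fake_fetch` is a hypothetical stand-in that fails once its data runs out, much like a closed FIFOQueue.

```python
from typing import Callable, List, Sequence, Tuple


def log_batches(fetch_batch: Callable[[], Tuple[object, Sequence[int]]],
                num_steps: int) -> List[str]:
    """Fetch num_steps batches, printing "<step> <labels>" or "<step> error: ..."."""
    lines = []
    for step in range(num_steps):
        try:
            _, labels = fetch_batch()
            line = f"{step} {list(labels)}"
        except Exception as exc:  # in TF1 this would typically be tf.errors.OutOfRangeError
            line = f"{step} error: {exc}"
        print(line)
        lines.append(line)
    return lines


def make_fake_fetch(batches):
    """Hypothetical stand-in for sess.run([image_batch, label_batch])."""
    it = iter(batches)

    def fetch():
        try:
            return None, next(it)
        except StopIteration:
            raise RuntimeError("queue is closed and has insufficient elements") from None
    return fetch


log_batches(make_fake_fetch([[33, 33, 34, 34], [34, 34, 34, 34]]), 3)
```

Catching the exception per step keeps the loop running, so the log shows exactly where the reads stop succeeding instead of stopping at the first failure.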
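The main cause is that num_threads is too small; consider increasing it to, for example, 32, 64 or 128. Changing the capacity parameter also changes the result. It seems that writing data into the FIFO queue takes some time, and if there is not enough time the error occurs. Image size also plays a role. If the error says (requested 128, current size 0), no data was read at all, and you should step through the data-reading code in a debugger to find where it goes wrong.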
————————————————
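Copyright notice: this is an original article by CSDN blogger 「love萌萌loli」, licensed under CC 4.0 BY-SA. When reposting, include the original link and this notice.
Original article: https://blog.csdn.net/qwe2508/article/details/80448260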
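Based on this, I set num_threads to 4, 32, 64 and 128 (originally 1).
In every case the error appeared after Step [1900].
Reducing to 34 classes: reading a batch failed after 1956 training iterations.
Reducing to 33, 32 and 31 classes: reading a batch still failed after 1956 iterations.
Reducing to 20 classes: reading a batch failed after 1957 iterations.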
1956 [0 0 0 0]
1957 [0 0 0 0]
1958 error:
1959 error:
Reducing to 10 classes:
1390 [0 0 0 0]
1391 [9 0 0 0]
1392 [0 0 0 0]
......
1405 [0 0 9 0]
1406 [0 0 0 0]
1407 [0 0 0 0]
......
1956 [0 0 0 0]
1957 [0 0 0 0]
1958 error:
1959 error:
Switching to the five classes A, B, C, D and E (a model had been trained on these successfully before):
1954 [0 0 0 0]
1955 [0 0 0 0]
1956 [0 0 0 0]
1957 error:
1958 error:
1959 error:
Training runs normally with the previously generated tfrecords for these five classes, but with:
train_record_file = 'train.tfrecords'
val_record_file = 'train.tfrecords'
Restarted PyCharm.
Now, using the tfrecords generated from the split five-class data, again with:
train_record_file = 'train.tfrecords'
val_record_file = 'train.tfrecords'
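training runs normally.
Changing val_record_file = 'train.tfrecords' to val_record_file = 'val.tfrecords': training still runs normally.
Keeping val_record_file = 'val.tfrecords' and increasing the data to 10 classes: training still runs normally.
Keeping val_record_file = 'val.tfrecords' and increasing the data to 34 classes: training still runs normally.
Keeping val_record_file = 'val.tfrecords' and increasing the data to 65 classes: the error is back, exactly as at the start.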
0 [0 0 0 0]
2020-05-03 23:30:14.179043: Step [0] train Loss : 4.044424, training accuracy : 0
3246 811
658 [33 33 33 33]
659 [33 33 33 33]
660 [33 33 34 34]
661 [34 34 34 34]
662 [34 34 34 34]
663 error:
664 error:
.....
810 error:
2020-05-03 23:30:19.105869: Step [0] val Loss : 4.207195, val accuracy : 0.00554871
1 [0 0 0 0]
2 [0 0 0 0]
3 [0 0 0 0]
......
52 [0 0 0 0]
53 [0 0 0 0]
54 error:
55 error:
......
500 error:
2020-05-03 23:30:25.579560: Step [500] train Loss : 3.997459, training accuracy : 0
3246 811
0 error:
File "train.py", line 49, in net_evaluation
feed_dict={input_images: val_x,
UnboundLocalError: local variable 'val_x' referenced before assignment
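Keeping val_record_file = 'val.tfrecords' and going back to 34 classes: training runs normally.
Training on five classes of Chinese-character data fails almost immediately: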
0 [0 0 0 0]
2020-05-03 23:57:43.501129: Step [0] train Loss : 1.615997, training accuracy : 0.25
74 18
0 error:
File "train.py", line 49, in net_evaluation
feed_dict={input_images: val_x,
UnboundLocalError: local variable 'val_x' referenced before assignment
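The UnboundLocalError follows from the try/except logging. When the very first validation read fails, `val_x` is never assigned, yet the evaluation code still feeds it. The following plain-Python sketch shows the pattern and one way to guard it. `evaluate_unguarded`, `evaluate_guarded` and `failing_reader` are hypothetical and are not the actual train.py code.

```python
from typing import Callable, List, Tuple

Reader = Callable[[], Tuple[list, list]]


def evaluate_unguarded(read_val_batch: Reader, steps: int) -> List[int]:
    """Mirrors the failing pattern: the except branch only logs."""
    results = []
    for _ in range(steps):
        try:
            val_x, val_y = read_val_batch()
        except Exception:
            print("error:")
        # If the first read failed, val_x was never bound -> UnboundLocalError.
        # If a later read failed, the previous batch is silently reused.
        results.append(len(val_x))  # stands in for sess.run(..., feed_dict={input_images: val_x})
    return results


def evaluate_guarded(read_val_batch: Reader, steps: int) -> List[int]:
    """Skips any step whose batch could not be read."""
    results = []
    for _ in range(steps):
        try:
            val_x, val_y = read_val_batch()
        except Exception:
            print("error:")
            continue
        results.append(len(val_x))
    return results


def failing_reader():
    """Hypothetical reader whose queue is already exhausted."""
    raise RuntimeError("FIFOQueue is closed and has insufficient elements")
```

The guard only hides the symptom. The real question is still why the validation queue delivers nothing.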
Displaying the images stored in the tfrecords with show_img fails with:
Invalid argument: Input to reshape is a tensor with 2352 values, but the requested shape has 784
[[{{node Reshape}}]]
[[mul/_15]]
Original stack trace for 'Reshape':
File "data_reader.py", line 156, in <module>
train_images, train_labels = read_records('train.tfrecords', 28, 28, type='normalization')
File "data_reader.py", line 43, in read_records
tf_image = tf.reshape(tf_image, [resize_height, resize_width, 1]) # set the image dimensions
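2352 = 784 * 3, which means some of the images have 3 channels.
In the code that creates the tfrecords, add:
if len(img.split()) > 2:  # PIL: split() returns one band per channel
    img = img.convert('L')
This converts 3-channel images to grayscale. With this change the problem is solved and training runs normally.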
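The arithmetic behind the diagnosis can be turned into a small check. Given the number of values in a flat image buffer and the expected height and width, it infers the channel count. `infer_channels` is a hypothetical helper. Before writing the tfrecords, the value count could come from something like `len(img.tobytes())` on a resized PIL image.

```python
def infer_channels(num_values: int, height: int, width: int) -> int:
    """Infer the channel count of a flat buffer holding one height x width image."""
    pixels = height * width
    if num_values % pixels != 0:
        raise ValueError(f"{num_values} values do not fit a {height}x{width} image")
    return num_values // pixels


print(infer_channels(2352, 28, 28))  # 3 -> RGB image, reshape to [28, 28, 1] fails
print(infer_channels(784, 28, 28))   # 1 -> grayscale, as the reader expects
```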
When I both defined my own input_images and fetched input_x from the imported graph:
Invalid argument: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,28,28,1]
Original stack trace for 'input_1':
File "predict.py", line 22, in <module>
name='input')
The error shows that input_1 is the newly defined tensor, i.e. the input used when defining probs2:
input_images = tf.placeholder(dtype=tf.float32,shape=[None, 28, 28, 1],name='input')
......
logit2 = model_Lenet5.inference(input_images, 1, regularizer)
probs2 = tf.nn.softmax(logit2)
......
probs2, index = sess.run([probs2, pred_class_index], feed_dict={input_x: img})
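At run time, the input_x fed through feed_dict was the tensor fetched from the graph, not the input_images used to define probs2, hence the error. Replacing input_x with input_images makes it run correctly.
Lesson: the tensors in feed_dict must be the ones the fetched tensors are actually computed from.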
[[0.01945407 0.01398481 0.01117951 0.02447441 0.01526438 0.01761573
0.01606896 0.01489647 0.01260772 0.01419507 0.01247283 0.01348115
0.01689115 0.01669618 0.01441986 0.00921112 0.01199187 0.01113395
0.01562285 0.01571186 0.02013027 0.01576558 0.01522272 0.02342227
0.01568468 0.0109561 0.01539537 0.01482384 0.01129816 0.01843453
0.01634269 0.01774859 0.01920019 0.01192253 0.01813584 0.01293925
0.0141176 0.01715764 0.01921851 0.01852516 0.0132573 0.01316539
0.0192745 0.01007393 0.01571972 0.01291238 0.01254836 0.01106239
0.01295787 0.01617481 0.0198624 0.02073024 0.0133837 0.02250618
0.01655234 0.01463597 0.01757438 0.01516759 0.0125383 0.01451214
0.01695031 0.01575247 0.01240974 0.01190402 0.01455811]]
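When the tensors are rebuilt in code while the graph is also imported (i.e. with the statement below), the result is the output above: an almost flat distribution over all 65 classes.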
saver = tf.train.import_meta_graph('modelslenet/model.ckpt-1996000.meta')
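Replacing that statement with saver = tf.train.Saver() gives the expected classification results. Presumably this is because the saver returned by import_meta_graph restores the variables of the imported copy of the graph rather than those used by the rebuilt network, which then keep their initial values.
Lesson: do not mix rebuilding tensors yourself with importing the graph and fetching tensors from it.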
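The 65 probabilities printed above all lie near 1/65 ≈ 0.0154. A network that is untrained, or whose weights were not actually restored, tends to produce this kind of output. A quick stdlib check for this symptom compares the entropy of the output with its maximum, log(n). `normalized_entropy` is a hypothetical helper, and any cut-off such as 0.95 is an arbitrary choice.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of a probability vector divided by log(n); 1.0 means perfectly uniform."""
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


uniform_like = [1 / 65] * 65
confident = [0.9] + [0.1 / 64] * 64

print(round(normalized_entropy(uniform_like), 3))  # 1.0
print(normalized_entropy(confident) < 0.95)        # True
```

A value close to 1.0 on real inputs suggests checking which saver restored which variables before suspecting the model itself.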
When only tensors fetched from the graph are used, only input_x is fetched, and the graph import and tensor lookups come before the session is created, i.e.:
saver = tf.train.import_meta_graph('modelslenet/model.ckpt-1996000.meta')
graph = tf.get_default_graph()
input_x = graph.get_operation_by_name('input').outputs[0]
logit = graph.get_tensor_by_name('layer7-fc2/add:0')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    saver.restore(sess, 'modelslenet/model.ckpt-1996000')
    result = sess.run(logit, feed_dict={input_x: img})
the following two errors alternate (each run shows one of them):
Invalid argument: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,28,28,1]
Original stack trace for 'input_1':
File "predict1.py", line 33, in <module>
saver = tf.train.import_meta_graph('modelslenet/model.ckpt-1996000.meta') # import a saved graph with relatively high validation accuracy
Invalid argument: You must feed a value for placeholder tensor 'keep_prob_1' with dtype float
Original stack trace for 'keep_prob_1':
File "predict1.py", line 33, in <module>
saver = tf.train.import_meta_graph('modelslenet/model.ckpt-1996000.meta') # import a saved graph with relatively high validation accuracy
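The errors show that input_1 and keep_prob_1 come from the imported graph.
I tried adding more entries to feed_dict.
The final version was as follows (it did not help):
result = sess.run(logit, feed_dict={input_x: img, input_y: np.zeros((1, 65)), kp: 1.0})
Continuing the investigation:
Building on the above, I tried moving the code that loads the graph and fetches the tensors to a different position.
Cause of the error: the new variable name was the same as the old one. The result of sess.run was assigned back to probs, so the tensor was overwritten by its value: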
probs, index = sess.run([probs, pred_class_index], feed_dict={input_images: img})
Change it to:
chances, index = sess.run([probs, pred_class_index], feed_dict={input_images: img})
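Reusing the name breaks because sess.run returns plain values (NumPy arrays in TF1). After `probs, index = sess.run([probs, ...])`, `probs` no longer names the tensor, and the next call asks to fetch an array. TF1 typically rejects this with a TypeError about an invalid fetch type. The sketch below imitates this in plain Python with hypothetical `FakeTensor` and `FakeSession` classes. It is not TensorFlow.

```python
class FakeTensor:
    """Hypothetical stand-in for a graph tensor holding a precomputed value."""
    def __init__(self, value):
        self.value = value


class FakeSession:
    """Hypothetical stand-in for tf.Session: only tensors may be fetched."""
    def run(self, fetches):
        out = []
        for f in fetches:
            if not isinstance(f, FakeTensor):
                raise TypeError(f"Fetch argument {f!r} has invalid type {type(f).__name__}")
            out.append(f.value)
        return out


sess = FakeSession()
probs = FakeTensor([0.1, 0.9])
pred_class_index = FakeTensor(1)

# Buggy: the first call works, but afterwards `probs` names a list, not a tensor.
probs, index = sess.run([probs, pred_class_index])
try:
    probs, index = sess.run([probs, pred_class_index])
except TypeError as e:
    print("second call failed:", e)

# Fixed: store the result under a new name so `probs` stays a tensor.
probs = FakeTensor([0.1, 0.9])
for _ in range(2):
    chances, index = sess.run([probs, pred_class_index])
print(chances, index)  # [0.1, 0.9] 1
```

The same rebinding appears earlier in `probs2, index = sess.run([probs2, ...])`. It only fails once the call runs a second time, for example inside a loop over several images.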