The full error message is as follows:
Traceback (most recent call last):
File "deeplab_model.py", line 272, in <module>
image_batch,label_batch = get_Batch(train_list,label_list,batch_size)
File "deeplab_model.py", line 238, in get_Batch
input_queue = tf.train.slice_input_producer([data, label], num_epochs=None, shuffle=False)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 338, in slice_input_producer
tensor_list = ops.convert_n_to_tensor_or_indexed_slices(tensor_list)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1332, in convert_n_to_tensor_or_indexed_slices
values=values, dtype=dtype, name=name, as_ref=False)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1303, in internal_convert_n_to_tensor_or_indexed_slices
value, dtype=dtype, name=n, as_ref=as_ref))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1262, in internal_convert_to_tensor_or_indexed_slices
value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1104, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 496, in make_tensor_proto
"Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
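The message makes the cause clear: a single array that is converted into a tensor (a constant in the graph) cannot be 2 GB or larger. Many real datasets are bigger than that, so the data has to be split. The goal is to cut the oversized array into chunks that are each under 2 GB and then process the chunks one by one.
Take my data as an example:
I printed out the dimensions of my data. The original array is 420*384*576*16, that is, 420 images of 384*576 pixels with 16 channels each.
This dataset is far larger than 2 GB, so I split the 420 images into five chunks of 100, 100, 100, 100 and 20.
Each chunk is then batched separately, the resulting batches are stored in a list, and the training loop feeds from them in turn. That solves the problem.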
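To make the sizes concrete, here is a small sketch of the arithmetic. The dtype of the array is not stated above, so float32 is an assumption here (float64 is shown for comparison), and the helper name max_samples_per_chunk is made up for illustration. As far as I can tell from tensor_util.py, the check rejects any array whose size in bytes is 2**31 or more.

```python
import numpy as np

LIMIT = 2 ** 31  # bytes; arrays of this size or larger trigger the ValueError

def max_samples_per_chunk(sample_shape, dtype, limit=LIMIT):
    """Largest number of samples whose total size stays strictly below the limit."""
    bytes_per_sample = int(np.prod(sample_shape)) * np.dtype(dtype).itemsize
    return (limit - 1) // bytes_per_sample

sample_shape = (384, 576, 16)          # one image: height, width, channels
elements_per_sample = int(np.prod(sample_shape))

# Assumption: float32 data (4 bytes per element).
total_bytes_f32 = 420 * elements_per_sample * 4
chunk_bytes_f32 = 100 * elements_per_sample * 4

print("whole dataset, float32: %.2f GiB" % (total_bytes_f32 / 2 ** 30))
print("100-image chunk, float32: %.2f GiB" % (chunk_bytes_f32 / 2 ** 30))
print("max images per chunk, float32:", max_samples_per_chunk(sample_shape, np.float32))
print("max images per chunk, float64:", max_samples_per_chunk(sample_shape, np.float64))
```

Under the float32 assumption a chunk of 100 images fits comfortably. If the array were float64, 100 images would already be too large and the chunks would need to be smaller.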
Now for the code.
Part 1: splitting the 420 images into chunks of 100, 100, 100, 100 and 20:
max_steps = 3000
batch_size = 3
train_x1 = 384   # first dimension of each training image
train_x2 = 576   # second dimension of each training image
train_dim = 16   # number of channels in each training image
label_y1 = 384   # first dimension of each label image
label_y2 = 576   # second dimension of each label image
label_dim = 1    # number of channels in each label image
'''
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
Split the loaded image array so that no single piece exceeds 2 GB
'''
# train_new_list = train_list[0:3,0:train_x1,0:train_x2,None]
# print(train_new_list.shape)
temp_position = [0,100,200,300,400,420]   # chunk boundaries along the first axis
train_data = []
label_data = []
for i in range(len(temp_position)-1):
    # print(i)
    train_new_list = train_list[temp_position[i]:temp_position[i+1],0:train_x1,0:train_x2,0:train_dim]
    print(train_new_list.shape)
    label_new_list = label_list[temp_position[i]:temp_position[i+1],0:label_y1,0:label_y2,0:label_dim]
    train_data.append(train_new_list)
    label_data.append(label_new_list)
print(train_list.shape, label_list.shape)
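The parameters at the top of the code all describe the images. At the end, train_data is a list whose elements are the chunk arrays; for example, train_data[0] is the 100*384*576*16 array.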
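The boundary list does not have to be typed by hand. Below is a minimal sketch that builds it from the number of samples and a chunk size, then slices an array with it. The helper names make_boundaries and split_by_boundaries are hypothetical, and a tiny array stands in for the real data.

```python
import numpy as np

def make_boundaries(num_samples, chunk_size):
    """Return chunk boundaries along the first axis, e.g. [0, 100, ..., num_samples]."""
    positions = list(range(0, num_samples, chunk_size))
    positions.append(num_samples)
    return positions

def split_by_boundaries(array, positions):
    """Slice the array along its first axis; basic slices are views, not copies."""
    return [array[positions[i]:positions[i + 1]] for i in range(len(positions) - 1)]

# Same boundaries as the hand-written temp_position list.
boundaries = make_boundaries(420, 100)
print(boundaries)

# Tiny stand-in for the real image array: 7 samples split into chunks of 3.
small = np.zeros((7, 4, 4, 2), dtype=np.float32)
chunks = split_by_boundaries(small, make_boundaries(len(small), 3))
print([c.shape for c in chunks])
```

There is always one fewer chunk than there are boundaries, which matters later when looping over the chunks.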
Part 2: building the batches:
import tensorflow as tf  # TensorFlow 1.x (queue-based input API)

def get_Batch(data, label, batch_size):
    # data and label are lists of chunk arrays, e.g. train_data and label_data
    with tf.device('/cpu:0'):
        X_batch = []
        Y_batch = []
        for i in range(len(data)):
            #print(data.shape, label.shape)
            data_in = data[i]
            label_in = label[i]
            # One input queue and one batch op per chunk
            input_queue = tf.train.slice_input_producer([data_in, label_in], num_epochs=None, shuffle=False)
            x_batch, y_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=1, capacity=128, allow_smaller_final_batch=False)
            print('x_batch',x_batch.shape)
            print('y_batch',y_batch.shape)
            X_batch.append(x_batch)
            Y_batch.append(y_batch)
        return X_batch, Y_batch
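tf.device is set to the CPU here because I did not have enough GPU memory and had to use main memory instead; if your GPU memory is sufficient, just ignore that part.
The two values returned at the end are also lists, one batch tensor per chunk, so reading from them only takes a loop.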
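To show what each per-chunk batch yields, here is a NumPy-only sketch. It is not the TensorFlow implementation, only my understanding of the behavior with shuffle=False, num_epochs=None and a single thread: consecutive samples are taken in order, and the stream wraps around to the start of the chunk when it reaches the end. The generator name cycling_batches and the toy chunks are made up for illustration.

```python
import numpy as np

def cycling_batches(data, label, batch_size):
    """Yield consecutive (data, label) batches forever, wrapping at the end of the chunk."""
    n = len(data)
    start = 0
    while True:
        idx = [(start + k) % n for k in range(batch_size)]
        yield data[idx], label[idx]
        start = (start + batch_size) % n

# Toy chunks: 5 samples and 2 samples, labels are 10x the data for easy checking.
chunk_sizes = [5, 2]
data_chunks = [np.arange(s).reshape(s, 1) for s in chunk_sizes]
label_chunks = [10 * c for c in data_chunks]

# One batch stream per chunk, mirroring the lists returned by get_Batch.
batchers = [cycling_batches(d, l, batch_size=3) for d, l in zip(data_chunks, label_chunks)]

first = next(batchers[0])
second = next(batchers[0])
print(first[0].ravel(), second[0].ravel())
```

Note that the second batch mixes the end of the chunk with its beginning, so a batch is not guaranteed to stay inside one pass over the data.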
Finally, the training loop:
import time

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
for step in range(max_steps):
    start_time = time.time()
    # image_batch, label_batch = sess.run([images_train, labels_train])
    # image_batch and label_batch are the lists returned by get_Batch,
    # one entry per chunk (len(temp_position) - 1 entries in total)
    for i in range(len(image_batch)):
        data, label = sess.run([image_batch[i], label_batch[i]])
        _, loss_value = sess.run([train_op, loss], feed_dict={image_holder: data, label_holder: label})
    duration = time.time() - start_time
    if step % 10 == 0:
        # Each step runs one batch per chunk
        examples_per_sec = batch_size * len(image_batch) / duration
        sec_per_batch = float(duration) / len(image_batch)
        format_str = ('step %d,loss=%.2f (%.1f examples/sec;%.3f sec/batch)')
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))
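The key part here is the inner loop over the chunks:
for i in range(len(image_batch)):
It runs one batch from every chunk in each step. Note that temp_position holds six boundaries for only five chunks, so the loop must cover len(temp_position) - 1 entries; iterating over range(len(temp_position)) would index past the end of the list. With this loop in place, the problem is solved!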
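The structure of that loop can be checked without TensorFlow. The sketch below uses plain strings as stand-ins for the batch tensors returned by get_Batch and counts how many steps a full pass over each chunk takes; the chunk sizes and batch_size are the ones used above, everything else is illustrative.

```python
import math

temp_position = [0, 100, 200, 300, 400, 420]
batch_size = 3

# Five chunks from six boundaries.
num_chunks = len(temp_position) - 1
chunk_sizes = [temp_position[i + 1] - temp_position[i] for i in range(num_chunks)]

# Stand-ins for the per-chunk batch tensors returned by get_Batch.
image_batch = ["batch_op_%d" % i for i in range(num_chunks)]

visited = []
for step in range(2):
    for i in range(len(image_batch)):   # one batch from every chunk per step
        visited.append((step, image_batch[i]))

# Steps needed before each chunk has been read through once.
steps_per_pass = [math.ceil(size / batch_size) for size in chunk_sizes]
print(num_chunks, len(visited), steps_per_pass)
```

One consequence of drawing one batch per chunk in every step is that the 20-image chunk is cycled through about five times as often as the 100-image chunks, so its images are seen more frequently during training.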
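I searched online a lot about this problem, and this is the solution I came up with myself. The reasoning seems sound to me, so I am writing it down; if your problem is similar to mine, I think this should solve it. Many people online solve it with the functions in tf.data. I have not tried that, so I do not know how tf.data differs from my approach. Comments and discussion on this problem are welcome.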