TensorFlow: ValueError: Cannot create a tensor proto whose content is larger than 2GB

The full error message is as follows:

Traceback (most recent call last):
  File "deeplab_model.py", line 272, in <module>
    image_batch,label_batch = get_Batch(train_list,label_list,batch_size)
  File "deeplab_model.py", line 238, in get_Batch
    input_queue = tf.train.slice_input_producer([data, label], num_epochs=None, shuffle=False)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 338, in slice_input_producer
    tensor_list = ops.convert_n_to_tensor_or_indexed_slices(tensor_list)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1332, in convert_n_to_tensor_or_indexed_slices
    values=values, dtype=dtype, name=name, as_ref=False)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1303, in internal_convert_n_to_tensor_or_indexed_slices
    value, dtype=dtype, name=n, as_ref=as_ref))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1262, in internal_convert_to_tensor_or_indexed_slices
    value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1104, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 214, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 496, in make_tensor_proto
    "Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.

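The message makes the cause clear: a single tensor proto cannot hold more than 2 GB. As the traceback shows, tf.train.slice_input_producer converts the NumPy arrays it receives into constants, so each array has to fit into one tensor proto. In practice many datasets are larger than 2 GB, so the data has to be split into chunks that each stay under 2 GB, and the chunks are then processed one at a time.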

Take my data as an example:

I printed the shapes of all my arrays. The original data has shape 420×384×576×16: 420 images of 384×576 pixels, each with 16 channels.

This dataset is far larger than 2 GB, so I split it: the 420 images become chunks of 100, 100, 100, 100, and 20.
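As a rough check of those sizes, the arithmetic can be sketched as follows. The element type of the arrays is not stated here, so the 4-byte (float32) and 8-byte (float64) element sizes are assumptions, and so is the exact threshold of 2**31 bytes.

```python
# Rough size check for the arrays described above.
# Assumption: element sizes of 4 bytes (float32) and 8 bytes (float64);
# the actual dtype of the data is not stated.
LIMIT_BYTES = 2 ** 31  # assumed threshold behind the "larger than 2GB" error

def nbytes(shape, itemsize):
    """Return the number of bytes needed for an array of this shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize

full_shape = (420, 384, 576, 16)
chunk_shape = (100, 384, 576, 16)

for name, itemsize in (("float32", 4), ("float64", 8)):
    full = nbytes(full_shape, itemsize)
    chunk = nbytes(chunk_shape, itemsize)
    print(name,
          "full: %.2f GB" % (full / 1e9), "fits:", full < LIMIT_BYTES,
          "| 100-image chunk: %.2f GB" % (chunk / 1e9), "fits:", chunk < LIMIT_BYTES)
```

Under the float32 assumption a 100-image chunk is about 1.4 GB and fits, while with 8-byte elements it would not, so the chunk size has to be chosen with the actual dtype in mind.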

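Each chunk is then batched separately, the resulting batch tensors are collected in a list, and the training loop pulls data from them in turn. That solves the problem.

Now for the code:

Part 1: splitting the 420 images into chunks of 100, 100, 100, 100, and 20: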

max_steps = 3000
batch_size = 3
train_x1 = 384  # first spatial dimension of the training images
train_x2 = 576  # second spatial dimension of the training images
train_dim = 16  # number of channels in the training images
label_y1 = 384  # first spatial dimension of the labels
label_y2 = 576  # second spatial dimension of the labels
label_dim = 1   # number of channels in the labels

# ValueError: Cannot create a tensor proto whose content is larger than 2GB.
# The images read in one go must not exceed 2 GB, so split them into chunks.
# train_list and label_list are the full arrays loaded earlier in the script.
temp_position = [0, 100, 200, 300, 400, 420]  # chunk boundaries along axis 0
train_data = []  # list of image chunks
label_data = []  # list of label chunks
for i in range(len(temp_position) - 1):
    start, stop = temp_position[i], temp_position[i + 1]
    train_new_list = train_list[start:stop, 0:train_x1, 0:train_x2, 0:train_dim]
    print(train_new_list.shape)
    label_new_list = label_list[start:stop, 0:label_y1, 0:label_y2, 0:label_dim]
    train_data.append(train_new_list)
    label_data.append(label_new_list)
print(train_list.shape, label_list.shape)

The parameters at the top of the code are all image dimensions. At the end, train_data is a list whose elements are the chunk arrays; for example, train_data[0] has shape 100×384×576×16.
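The boundary list does not have to be written by hand. A minimal sketch, assuming NumPy arrays and two hypothetical helpers named chunk_positions and split_by_positions (neither is part of my script), derives the boundaries from the number of images and a chunk size. It is shown here on small stand-in arrays:

```python
import numpy as np

def chunk_positions(n_items, chunk_size):
    """Hypothetical helper: boundaries [0, chunk_size, ..., n_items]."""
    positions = list(range(0, n_items, chunk_size))
    positions.append(n_items)
    return positions

def split_by_positions(array, positions):
    """Slice array along axis 0 between consecutive boundaries."""
    return [array[positions[i]:positions[i + 1]] for i in range(len(positions) - 1)]

# Small stand-in arrays; the real ones are 420 x 384 x 576 x 16 and 420 x 384 x 576 x 1.
train_small = np.zeros((420, 4, 6, 16), dtype=np.float32)
label_small = np.zeros((420, 4, 6, 1), dtype=np.float32)

positions = chunk_positions(len(train_small), 100)
train_chunks = split_by_positions(train_small, positions)
label_chunks = split_by_positions(label_small, positions)

print(positions)
print([chunk.shape[0] for chunk in train_chunks])
```

Basic slicing like this returns views in NumPy, so building the chunk list does not copy the image data.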

Part 2: building the batches:

def get_Batch(data, label, batch_size):
    # data and label are lists of chunk arrays (train_data and label_data above).
    with tf.device('/cpu:0'):
        X_batch = []
        Y_batch = []
        for i in range(len(data)):
            data_in = data[i]
            label_in = label[i]
            # Each chunk is kept below 2 GB, so converting it to a tensor succeeds.
            input_queue = tf.train.slice_input_producer([data_in, label_in], num_epochs=None, shuffle=False)
            x_batch, y_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=1, capacity=128, allow_smaller_final_batch=False)
            print('x_batch', x_batch.shape)
            print('y_batch', y_batch.shape)
            X_batch.append(x_batch)
            Y_batch.append(y_batch)
        return X_batch, Y_batch

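I use tf.device at the start to pin these ops to the CPU because I did not have enough GPU memory and had to fall back on host memory. If you have enough GPU memory, ignore this.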

The two values returned at the end are also lists, one batch tensor per chunk, so reading from them only takes a loop.
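To make clear what each entry of these lists yields, here is a pure-Python emulation, not TensorFlow code. It assumes that with shuffle=False, num_epochs=None, and a single thread, the queue hands out items in order and wraps around indefinitely, so a batch can straddle the end of a chunk. The function name emulate_batches is hypothetical.

```python
from itertools import cycle, islice

def emulate_batches(n_items, batch_size, n_batches):
    """Emulate in-order, endlessly repeating batching over item indices."""
    stream = cycle(range(n_items))  # num_epochs=None: repeat forever
    return [list(islice(stream, batch_size)) for _ in range(n_batches)]

# The last chunk has 20 images; with batch_size = 3 the 7th batch wraps around.
for batch in emulate_batches(n_items=20, batch_size=3, n_batches=8):
    print(batch)
```

Under that assumption every batch has exactly batch_size items. Since the training loop takes one batch from every chunk per step, the 20-image chunk cycles about five times as fast as each 100-image chunk.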

 

Finally, the training loop:

import time

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
for step in range(max_steps):
    # image_batch and label_batch hold one batch tensor per chunk; visit each chunk once per step.
    for i in range(len(image_batch)):
        start_time = time.time()
        data, label = sess.run([image_batch[i], label_batch[i]])
        _, loss_value = sess.run([train_op, loss], feed_dict={image_holder: data, label_holder: label})
        duration = time.time() - start_time
        if step % 10 == 0:
            examples_per_sec = batch_size / duration
            sec_per_batch = float(duration)

            format_str = ('step %d,loss=%.2f (%.1f examples/sec;%.3f sec/batch)')
            print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

The key part here is the inner loop

for i in range(len(image_batch)):

which iterates over the per-chunk batch tensors. Note that there is one entry per chunk, that is len(temp_position) - 1 of them, not len(temp_position). With this loop in place, the problem is solved.
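Because each batch ends up in feed_dict anyway, a variant of this loop could slice the NumPy arrays directly and skip the queues, so the data never becomes part of the graph. This is a different technique from the one above, sketched under the assumption that the arrays fit in host memory. iterate_batches is a hypothetical helper, and the small arrays stand in for the real data.

```python
import numpy as np

def iterate_batches(images, labels, batch_size):
    """Yield consecutive (images, labels) batches forever, dropping any short tail."""
    n_full = (len(images) // batch_size) * batch_size
    if n_full == 0:
        raise ValueError("need at least batch_size items")
    while True:
        for start in range(0, n_full, batch_size):
            yield images[start:start + batch_size], labels[start:start + batch_size]

# Small stand-in arrays with the same layout as the real data.
images = np.arange(10 * 2 * 2 * 1, dtype=np.float32).reshape(10, 2, 2, 1)
labels = np.arange(10, dtype=np.float32).reshape(10, 1, 1, 1)

batches = iterate_batches(images, labels, batch_size=3)
for step in range(4):
    x, y = next(batches)
    # In the training script this would become:
    # sess.run([train_op, loss], feed_dict={image_holder: x, label_holder: y})
    print(step, x.shape, y.ravel())
```

With this variant no chunking is needed, because placeholders receive one small batch at a time.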

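I searched online a lot about this problem. This is my own solution, and I think the reasoning behind it holds up, so I am writing it down. If your problem looks similar to mine, it should work for you too. Many people online solve this with the functions in tf.data; I have not tried that, so I do not know how tf.data differs from my method. Feel free to leave a comment to discuss.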
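For comparison, the tf.data route usually looks something like the following. This is a sketch, not something tested against the data in this post. It assumes a TensorFlow version that provides the tf.compat.v1 module, uses small random stand-in arrays, and feeds the arrays through placeholders when the iterator is initialized so that they are not stored as constants in the graph.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph mode, as in the rest of this post

# Small stand-in arrays; the real ones are 420 x 384 x 576 x 16 and 420 x 384 x 576 x 1.
images = np.random.rand(10, 4, 6, 16).astype(np.float32)
labels = np.random.rand(10, 4, 6, 1).astype(np.float32)
batch_size = 3

# Placeholders keep the arrays out of the graph definition.
images_ph = tf1.placeholder(images.dtype, images.shape)
labels_ph = tf1.placeholder(labels.dtype, labels.shape)

dataset = tf1.data.Dataset.from_tensor_slices((images_ph, labels_ph))
dataset = dataset.repeat().batch(batch_size)
iterator = tf1.data.make_initializable_iterator(dataset)
next_images, next_labels = iterator.get_next()

with tf1.Session() as sess:
    # The arrays are passed once, when the iterator is initialized.
    sess.run(iterator.initializer, feed_dict={images_ph: images, labels_ph: labels})
    x, y = sess.run([next_images, next_labels])
    print(x.shape, y.shape)
```

In a training script, next_images and next_labels would take the place of the tensors returned by get_Batch.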
