【Source Code】
class CausalSelfAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(**kwargs)
        self.add = tf.keras.layers.Add()
        self.layer_norm = tf.keras.layers.LayerNormalization()

    def call(self, x):
        attn_output = self.mha(query=x, value=x, key=x, use_causal_mask=True)
        x = self.add([x, attn_output])
        x = self.layer_norm(x)
        return x
【Solution】
Reference: TypeError: call() got an unexpected keyword argument 'use_causal_mask' ---> this error appeared on the flickr8k/flickr30k datasets
The environment I initially set up was Tensorflow-gpu 2.6.0, but the use_causal_mask argument of tf.keras.layers.MultiHeadAttention was only introduced in Tensorflow 2.10.0... so I installed Tensorflow-gpu 2.10.0 (only because I didn't want to write the mask by hand again...)
(For cuda 11.2-11.4 and cudnn 8.1, see: how to determine the dependency library versions required to install tensorflow 2.10.0)
conda create -n tensorflow python=3.8
conda activate tensorflow
pip install tensorflow-gpu==2.10.0 -i https://pypi.douban.com/simple/
Verify the installation in the Anaconda Prompt:
python
import tensorflow as tf
tf.test.is_gpu_available()
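Incidentally, for anyone who has to stay on TF < 2.10, the causal mask can be built by hand and passed through the attention_mask argument instead. A minimal sketch of that alternative (my reconstruction, not the original code; mask broadcasting should be verified on the exact version):
import tensorflow as tf

class CausalSelfAttention(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(**kwargs)
        self.add = tf.keras.layers.Add()
        self.layer_norm = tf.keras.layers.LayerNormalization()

    def call(self, x):
        seq_len = tf.shape(x)[1]
        # Lower-triangular (T, T) mask: position i may attend to positions <= i.
        causal_mask = tf.cast(
            tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0), tf.bool)
        attn_output = self.mha(query=x, value=x, key=x, attention_mask=causal_mask)
        x = self.add([x, attn_output])
        return self.layer_norm(x)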
【Source Code】
def call(x):  # x is a Tensor
    random_data = tf.random.normal(x.shape, mean=0.0, stddev=1.0)
    data_score = tf.nn.softmax(x + random_data, axis=-1)
    return data_score
【Solution】
tf.random.normal(shape, mean=0.0, stddev=1.0, dtype=tf.dtypes.float32)
This random generator accepts a 1-D integer Tensor or a Python list as shape, but what was passed here is x.shape, a TensorShape; inside a graph the batch dimension is often unknown (None), and a partially known TensorShape cannot be converted to a Tensor.
Therefore, changing x.shape to tf.shape(x) fixes it:
random_data = tf.random.normal(tf.shape(x), mean=0.0, stddev=1.0)
Reference: Cannot convert a partially known TensorShape to a Tensor
The problem above touches on the various ways to get a Tensor's shape: tf.shape() returns a Tensor (the dynamic shape at runtime),
while Tensor.shape and Tensor.get_shape() return a TensorShape (a tuple-like static shape).
(tf.Tensor.get_shape() and tf.Tensor.shape are equivalent.)
feature = tf.constant([[3.7, 3.8, 2.5], [3.6, 3.7, 3.8]])
# tf.Tensor(
# [[3.7 3.8 2.5]
# [3.6 3.7 3.8]], shape=(2, 3), dtype=float32)
shape1 = tf.shape(feature)
print(shape1) # tf.Tensor([2 3], shape=(2,), dtype=int32)
shape2 = feature.shape
print(shape2)  # (2, 3)  (a TensorShape)
shape3 = feature.get_shape()
print(shape3)  # (2, 3)  (a TensorShape)
For comparison, the difference is even clearer when extracting the value of a single dimension from the shape:
shape1 = tf.shape(feature)[-1]
print(shape1) # tf.Tensor(3, shape=(), dtype=int32)
shape2 = feature.shape[-1]
print(shape2)  # 3  (a Python int)
shape3 = feature.get_shape()[-1]
print(shape3)  # 3  (a Python int)
Hence, when iterating in a loop:
for i in range(tensor.shape[0]):         # i is a Python int
for i in tf.range(tf.shape(tensor)[0]):  # i is a tf.Tensor
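To see why the static/dynamic distinction matters, a tiny sketch of my own (not from the original post): under tf.function with an unknown batch dimension, x.shape is only partially known, while tf.shape(x) is resolved at runtime.
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def add_noise(x):
    # Here x.shape == (None, 3): a partially known TensorShape that
    # tf.random.normal cannot convert to a Tensor. tf.shape(x) instead
    # yields the concrete [batch, 3] when the function runs.
    return x + tf.random.normal(tf.shape(x), mean=0.0, stddev=1.0)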
【Source Code】
features = features.numpy()  # features is a Tensor
【Solution】
This seems related to eager execution (switching to Tensor.eval() doesn't work either).
For Tensor.eval(), the official docs (tensorflow.google.cn/versions/r2.10/api_docs/python/tf/Tensor#eval) say that if you are not using the compat.v1 library, there is no need to call eval under eager execution (or within tf.function).
Note: Before invoking Tensor.eval(), its graph must have been launched in a session, and either a default session must be available, or session must be specified explicitly.
However, printing numpy() in the console works fine (the problem seems to occur inside call_fn, though I don't remember exactly):
print(features.numpy())  # ndarray
1) Reference: Tensorflow 2.3: AttributeError: 'Tensor' object has no attribute 'numpy' with eager mode enabled
Adding the argument experimental_run_tf_function=False to model.compile: tried it, but the same error was still raised;
2) Reference: AttributeError: 'Tensor' object has no attribute 'numpy' in custom loss function (Tensorflow 2.1.0)
Setting run_eagerly=True in model.compile: tried that too; it failed with the error below (still unsolved...)
W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
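The general rule, as far as I understand it (a note of mine, not a fix for the still-unsolved error above): .numpy() exists only on eager tensors, so it works in the console but not on the symbolic tensors flowing through a tf.function-compiled call().
import tensorflow as tf

x = tf.constant([1.0, 2.0])
print(x.numpy())  # fine: x is an eager tensor

@tf.function
def f(t):
    # t is a symbolic graph Tensor here; t.numpy() at trace time would
    # raise the AttributeError above.
    return tf.reduce_sum(t)

print(f(x).numpy())  # the returned result is eager again, so this works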
【Source Code】
def _batch_encoder(self, x):
    item_representations = []  # Python list
    for i in tf.range(tf.shape(x)[0]):
        item_embed = self.item_encoder(x[i])
        item_representations.append(tf.expand_dims(item_embed, axis=0))
    batch_item_representations = tf.concat(item_representations, axis=0)
    return batch_item_representations
【Solution】
Reference: Tensorflow: cannot be accessed here: it is defined in another function or code block
The error is caused by using a Python list to temporarily hold tensor objects. What the list holds is deleted once tracing of the function finishes, so the tensors can no longer be accessed. To keep such temporary tensors, tf.TensorArray must be used instead.
The modified code is as follows:
def _batch_item_encoder(self, x):
    item_representations = tf.TensorArray(tf.float32, size=0, dynamic_size=True, clear_after_read=False)
    for i in tf.range(tf.shape(x)[0]):
        item_embed = self.item_encoder(x[i])
        item_representations = item_representations.write(i, item_embed)
    batch_item_representations = item_representations.stack()
    return batch_item_representations
Object was never used (type <class 'tensorflow.python.ops.tensor_array_ops.TensorArray'>)
If you want to mark it as used call its "mark_used()" method.
【Source Code】
item_representations = tf.TensorArray(tf.float32, size=0, dynamic_size=True, clear_after_read=False)
for i in tf.range(tf.shape(x)[0]):
    item_embed = self.item_encoder(x[i])
    item_representations.write(i, item_embed)  # wrong
【Solution】
Reference: some pitfalls of looping over a tensor array in tensorflow
When writing a value to a given index of a tf.TensorArray, the TensorArray returned by the write method must be assigned back! Compare:
ta = tf.TensorArray(tf.float32, size=0, dynamic_size=True, clear_after_read=False)
ta.write(ids, tensor)       # wrong
ta = ta.write(ids, tensor)  # right
Therefore the erroneous code above should be modified as follows:
item_representations = tf.TensorArray(tf.float32, size=0, dynamic_size=True, clear_after_read=False)
for i in tf.range(tf.shape(x)[0]):
    item_embed = self.item_encoder(x[i])
    item_representations = item_representations.write(i, item_embed)  # right
【Source Code】
feature = tf.constant([[3.7, 3.8, 2.5], [3.6, 2.6, 3.8]])
indices = tf.constant([[0, 2], [0, 2]])
updates = tf.constant([[0.12, 0.13], [0.26, 0.23]])
for index, value in enumerate(feature):  # value is a 1-D Tensor (one row)
    token_idx = indices[index]  # 1-D Tensor of indices
    token_val = updates[index]  # 1-D Tensor of values
    for idx, val in enumerate(token_idx):
        value[val] = token_val[idx]  # this style of item assignment is wrong
Note: the last line of the source code above cannot assign to a Tensor this way; it raises
NotImplementedError: Cannot convert a symbolic tf.Tensor to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.
【Solution】
Reference: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function
My goal was to update the Tensor's values at the positions given in indices, which is why I wanted to traverse the whole Tensor with both index and value.
But clearly a Tensor cannot be iterated over like that, so tf.range plus tf.tensor_scatter_nd_update (or tf.TensorArray) is used instead. The modified code is as follows:
def function(x):  # x is a 2-D tensor
    token_top_mask = tf.zeros_like(x)
    for index in tf.range(tf.shape(x)[0]):  # index is a 0-D Tensor
        indices = tf.constant([[0], [2]])   # example positions within row `index`
        updates = tf.constant([0.13, 0.12])
        token_mask = tf.tensor_scatter_nd_update(x[index], indices, updates)  # update one row
        indices = tf.expand_dims(tf.expand_dims(index, axis=-1), axis=-1)     # shape (1, 1)
        token_mask = tf.expand_dims(token_mask, axis=0)                       # shape (1, D)
        token_top_mask = tf.tensor_scatter_nd_update(token_top_mask, indices, token_mask)  # write the row back
    return token_top_mask
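For reference, the semantics of tf.tensor_scatter_nd_update that the fix above relies on, shown on a 1-D tensor (a minimal illustration of mine):
t = tf.constant([3.7, 3.8, 2.5])
out = tf.tensor_scatter_nd_update(t, indices=[[0], [2]], updates=[0.12, 0.13])
print(out)  # tf.Tensor([0.12 3.8  0.13], shape=(3,), dtype=float32)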
Traceback (most recent call last):
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
AttributeError: in user code:
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\training.py", line 1160, in train_function *
return step_function(self, iterator)
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\training.py", line 1146, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\training.py", line 1135, in run_step **
outputs = model.train_step(data)
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\training.py", line 998, in train_step
return self.compute_metrics(x, y, y_pred, sample_weight)
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\training.py", line 1092, in compute_metrics
self.compiled_metrics.update_state(y, y_pred, sample_weight)
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\compile_utils.py", line 577, in update_state
self.build(y_pred, y_true)
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\compile_utils.py", line 483, in build
self._metrics = tf.__internal__.nest.map_structure_up_to(
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\compile_utils.py", line 631, in _get_metric_objects
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\compile_utils.py", line 631, in
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
File "D:\Python\Anaconda3\envs\tensorflow2\lib\site-packages\keras\engine\compile_utils.py", line 652, in _get_metric_object
y_t_rank = len(y_t.shape.as_list())
AttributeError: 'tuple' object has no attribute 'shape'
【Source Code】
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
optimizer=tf.keras.optimizers.Adam(lr=0.00005),
metrics=['acc'])
【Solution】
Reference: as_list() is not defined on an unknown TensorShape on y_t_rank = len(y_t.shape.as_list()) and related to metrics
The error is raised while building the metrics: for the string shortcut 'acc', Keras inspects y_true's shape to decide which accuracy variant to instantiate, and here y_true arrives as a tuple with no shape attribute, so y_t.shape.as_list() fails. I don't know the root cause on the data side, but passing an explicit metric object skips that inference and works:
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.00005),
              metrics=[tf.keras.metrics.Accuracy()])
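One caution of mine, not from the original fix: with one-hot labels and softmax outputs, tf.keras.metrics.Accuracy() checks element-wise equality of y_true and y_pred, which is rarely what is wanted; the string 'acc' would normally resolve to CategoricalAccuracy in this setting, so the explicit equivalent would be:
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.00005),
              metrics=[tf.keras.metrics.CategoricalAccuracy()])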
Allocator GPU_0_bfc ran out of memory while trying to allocate the 10.0KiB requested by some op.
If memory fragmentation is the cause, the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' may improve the situation.
The current allocation summary follows.
......
2023-03-08 17:45:50.687264: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 6039a5f00 of size 3514624 next 18446744073709551615
2023-03-08 17:45:50.687542: I tensorflow/core/common_runtime/bfc_allocator.cc:1065] Summary of in-use Chunks by size:
2023-03-08 17:45:50.687791: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25248 Chunks of size 256 totalling 6.16MiB
2023-03-08 17:45:50.688040: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 58 Chunks of size 768 totalling 43.5KiB
......
2023-03-08 17:45:50.763248: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 3.88GiB
2023-03-08 17:45:50.763478: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 4163895296 memory_limit_: 4163895296 available bytes: 0 curr_region_allocation_bytes_: 8327790592
2023-03-08 17:45:50.763885: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 4163895296
InUse: 4163893504
MaxInUse: 4163895296
NumAllocs: 1745563
MaxAllocSize: 65280000
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-03-08 17:45:50.765619: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2023-03-08 17:45:50.765978: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:681 : Resource exhausted: OOM when allocating tensor with shape[150,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2023-03-08 17:46:00.774886: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 10.0KiB (rounded to 10240)requested by op mo_e_news_rec/while/body/_1/mo_c_item_rec/while/item_encoder/naive_layer/dense/Tensordot_1/MatMul
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2023-03-08 17:46:00.776133: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2023-03-08 17:46:00.776371: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256): Total Chunks: 25255, Chunks in use: 25248. 6.17MiB allocated for chunks. 6.16MiB in use in bin. 330.2KiB client-requested in use in bin.
......
2023-03-08 17:46:00.794044: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 10.0KiB was 8.0KiB, Chunk State:
2023-03-08 17:46:00.794290: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 4163895296
2023-03-08 17:46:00.794513: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 50ba00000 of size 256 next 1
2023-03-08 17:46:00.794750: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 50ba00100 of size 1280 next 2
......
【Solution】
I found several reference threads with solution ideas; in short, the GPU has run out of memory.
So: either reduce batch_size, reduce the number of model parameters, or switch to a machine with more GPUs and more GPU memory.
(Reducing batch_size is basically useless when the model parameters are far too large; memory still runs out, and only shrinking the model helps...)
Tensorflow not working with gpu - using too much memory. How to fix?
About python: Tensorflow out of memory and CPU/GPU usage
TensorFlow ran out of GPU memory: Allocator (GPU_0_BFC) ran out of memory trying to allocate
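Since the log itself suggests it, the async allocator is also cheap to try (a sketch of mine; the variable must be set before TensorFlow initializes the GPU):
import os
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'  # the setting suggested by the log above
import tensorflow as tf  # import only after the variable is set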
2023-03-08 22:56:44.493180: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2023-03-08 22:56:44.493261: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at matmul_op_impl.h:442 : Internal: Attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/training.py", line 1975, in fit_generator
return self.fit(
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/training.py", line 1134, in fit
data_handler = data_adapter.get_data_handler(
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 1383, in get_data_handler
return DataHandler(*args, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 1138, in __init__
self._adapter = adapter_cls(
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 917, in __init__
super(KerasSequenceAdapter, self).__init__(
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 801, in __init__
model.distribute_strategy.run(
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1286, in run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2849, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3632, in _call_for_each_replica
return fn(*args, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 597, in wrapper
return func(*args, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 802, in
lambda x: model(x, training=False), args=(concrete_x,))
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1037, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/keras/layers/einsum_dense.py", line 197, in call
ret = tf.einsum(self.equation, inputs, self.kernel)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/ops/special_math_ops.py", line 751, in einsum
return _einsum_v2(equation, *inputs, **kwargs)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/ops/special_math_ops.py", line 1180, in _einsum_v2
return gen_linalg_ops.einsum(inputs, resolved_equation)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 1076, in einsum
_ops.raise_from_not_ok_status(e, name)
File "/home/xxx/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support [Op:Einsum]
【Solution】
Reference: InternalError: Attempting to perform BLAS operation using StreamExecutor without BLAS support
[tensorflow error log] InternalError: Attempting to perform BLAS operation using StreamExecutor without BL
"Attempting to perform BLAS operation using StreamExecutor without BLAS support" error occurs
The following articles may also be worth reading:
Study notes on masking and padding in Tensorflow Keras
Notes on masking and padding (padding & mask) and dynamic_rnn in tensorflow2
A code walkthrough of masks in Tensorflow 2.0
Reference: TensorFlow high GPU memory usage but low GPU utilization
[Solution] high GPU memory share but low GPU utilization in tensorflow
First, check GPU usage:
nvidia-smi -l 5  # refresh every 5 seconds
The following summarizes the methods I have seen online:
【1】Increase batch_size (applies to: low memory usage + low GPU utilization)
【2】Decrease batch_size (applies to: high memory usage + low GPU utilization)
【3】A problem in the code:
During data preprocessing, the CPU reads data from disk into memory and the GPU reads that data from memory for training; when the CPU cannot read fast enough to keep up with the GPU's read-and-train speed, the GPU sits idle for long stretches waiting for data from the CPU (see the pipeline sketch after the references below);
Reference: TensorFlow learning - why GPU memory is full while utilization (util) stays low & how to improve it
Low training efficiency? GPU utilization won't go up? Come take a look at other people's tricks~
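For case 【3】, the usual cure is a tf.data input pipeline that overlaps CPU preprocessing with GPU training. A sketch (file_paths and parse_fn are hypothetical placeholders, not names from the original post):
import tensorflow as tf

dataset = (tf.data.Dataset.from_tensor_slices(file_paths)       # hypothetical file list
           .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)  # decode/preprocess on CPU in parallel
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))                         # prepare batches ahead of the GPU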