While recently porting PyTorch code to TensorFlow, I found that many functions either have a direct counterpart or need small modifications. This post records the functions I ran into and how I mapped them, along with the bugs I hit and their fixes. I will keep updating it as new problems come up.
| PyTorch | Description | TensorFlow | Description |
|---|---|---|---|
| masked_fill_ | Mask operation: fills the elements of the tensor with `value` wherever the mask is 1 (True); the mask must have the same shape as the tensor being filled | tf.where | Pass the mask as `condition`, the fill value as `x`, and the original tensor as `y`: `x = tf.where(mask, x=mask_value, y=x)` (see the example after this table) |
| nn.functional.pad | | tf.pad | |
| torch.finfo | Returns the numerical limits of the floating-point type given in parentheses | tf.experimental.numpy.finfo() | |
| torch.einsum | | tf.einsum | |
| torch.chunk() | Splits a tensor into chunks | tf.split() | |
| nn.Parameter() | | tf.Variable() | |
| torch.no_grad() | | tf.stop_gradient() | |
| torch.cumsum | | tf.math.cumsum | Use tf.cast together with tf.math.cumsum |
| torch.arange | | tf.range | |
| torch.stack | | tf.stack | |
| torch.cat | | tf.concat | |
| torch.permute | | tf.transpose | |
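To make the masked_fill_ → tf.where row concrete, here is a minimal side-by-side sketch (the tensors and the fill value `-1e9` are made up for the example):

```python
import tensorflow as tf
import torch

# PyTorch: in-place fill of the positions where the mask is True
x_pt = torch.zeros(2, 3)
mask_pt = torch.tensor([[True, False, True], [False, True, False]])
x_pt.masked_fill_(mask_pt, -1e9)

# TensorFlow: tf.where(condition, x, y) takes x where condition is True, else y
x_tf = tf.zeros([2, 3])
mask_tf = tf.constant([[True, False, True], [False, True, False]])
x_tf = tf.where(mask_tf, -1e9 * tf.ones_like(x_tf), x_tf)
```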
| PyTorch | TensorFlow | Notes |
|---|---|---|
| nn.Linear | tf.keras.layers.Dense | TF does not need the input dimension, only the output dimension |
| nn.Sequential | tf.keras.Sequential | In TF, remember to put [ ] inside the ( ): the layers go in as a list |
| nn.ModuleList | Create a plain empty Python list and use list.append() | See the sketch after this table |
| nn.Identity() | Build your own Module that returns the input unchanged | See the sketch after this table |
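A minimal sketch of the last two rows (the class names `Identity` and `Block` are my own, not a library API):

```python
import tensorflow as tf


class Identity(tf.keras.layers.Layer):
    """TF stand-in for nn.Identity(): returns the input unchanged."""
    def call(self, x):
        return x


class Block(tf.keras.Model):
    """TF stand-in for nn.ModuleList: keep sublayers in a plain Python list.
    Keras tracks layers stored in list attributes, so their weights are found."""
    def __init__(self, depth, dim):
        super().__init__()
        self.layers_list = []          # plays the role of nn.ModuleList
        for _ in range(depth):
            self.layers_list.append(tf.keras.layers.Dense(dim))

    def call(self, x):
        for layer in self.layers_list:
            x = layer(x)
        return x
```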
Description: `trunc_normal_`. In PyTorch, this function draws values from a truncated normal distribution, i.e. it restricts the range the sampled values can take.
The PyTorch implementation is as follows:
```python
import math
import warnings

import torch
from torch import Tensor


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    with torch.no_grad():
        # Values are generated by using a truncated uniform distribution and
        # then using the inverse CDF for the normal distribution.
        # Get upper and lower cdf values
        l = norm_cdf((a - mean) / std)
        u = norm_cdf((b - mean) / std)

        # Uniformly fill tensor with values from [l, u], then translate to
        # [2l-1, 2u-1].
        tensor.uniform_(2 * l - 1, 2 * u - 1)

        # Use inverse cdf transform for normal distribution to get truncated
        # standard normal
        tensor.erfinv_()

        # Transform to proper mean, std
        tensor.mul_(std * math.sqrt(2.))
        tensor.add_(mean)

        # Clamp to ensure it's in the proper range
        tensor.clamp_(min=a, max=b)
        return tensor


def trunc_normal_(tensor: Tensor, mean: float = 0., std: float = 1.,
                  a: float = -2., b: float = 2.) -> Tensor:
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.trunc_normal_(w)
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)
```
Ported to TensorFlow:
```python
import math
import warnings

import tensorflow as tf


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    # There is no `with torch.no_grad():` in TF; wrap each op in
    # tf.stop_gradient() instead.
    # Values are generated by using a truncated uniform distribution and
    # then using the inverse CDF for the normal distribution.
    # Get upper and lower cdf values (plain Python floats, no gradient to stop)
    l = norm_cdf((a - mean) / std)
    u = norm_cdf((b - mean) / std)

    # Uniformly fill tensor with values from [l, u], then translate to
    # [2l-1, 2u-1].  Replaces tensor.uniform_(2 * l - 1, 2 * u - 1).
    tensor = tf.stop_gradient(tf.random.uniform(shape=tf.shape(tensor),
                                                minval=2 * l - 1,
                                                maxval=2 * u - 1))

    # Use inverse cdf transform for normal distribution to get truncated
    # standard normal.  Replaces tensor.erfinv_().
    tensor = tf.stop_gradient(tf.math.erfinv(x=tensor))

    # Transform to proper mean, std.
    # Replaces tensor.mul_(std * math.sqrt(2.)) and tensor.add_(mean).
    tensor = tf.stop_gradient(tf.math.multiply(x=tensor, y=std * math.sqrt(2.)))
    tensor = tf.stop_gradient(tf.math.add(x=tensor, y=mean))

    # Clamp to ensure it's in the proper range.
    # Replaces tensor.clamp_(min=a, max=b).
    tensor = tf.stop_gradient(tf.clip_by_value(t=tensor, clip_value_min=a,
                                               clip_value_max=b))
    return tensor


def trunc_normal_(tensor, mean: float = 0., std: float = 1.,
                  a: float = -2., b: float = 2.):
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `tf.Tensor` (only its shape is used)
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)
```
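A quick usage check of the ported function (eager mode; the shape is arbitrary):

```python
w = trunc_normal_(tf.zeros([3, 5]), std=.02)
print(w.shape)                                            # (3, 5)
print(float(tf.reduce_min(w)), float(tf.reduce_max(w)))   # both within [a, b] = [-2., 2.]
```

Note that TensorFlow also ships tf.random.truncated_normal and tf.keras.initializers.TruncatedNormal, but those always resample values more than two standard deviations from the mean and do not accept arbitrary [a, b] cutoffs, which is why this port is still useful.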
torch.nn.Flatten
torch.flatten(tensor, start_dim, end_dim) (nn.Flatten defaults to start_dim=1, end_dim=-1): in PyTorch, this flattens any contiguous range of dimensions into one. A TensorFlow implementation:
```python
import tensorflow as tf


def flatten(tensor, start_dim=0, end_dim=-1):
    """TensorFlow counterpart of torch.flatten: the sizes of the dimensions
    between start_dim and end_dim (inclusive) are multiplied together and the
    remaining dimensions are left unchanged. Because start_dim=0 and
    end_dim=-1 by default, flatten(t) returns a 1-D tensor.
    """
    input_dim = tf.shape(tensor).numpy().tolist()
    # Make sure the indices do not exceed the tensor's rank
    assert start_dim <= len(input_dim) - 1 and end_dim <= len(input_dim) - 1
    li = []
    if end_dim == -1:
        # Flatten everything from start_dim to the last dimension
        li.extend(input_dim[:start_dim])
        li.append(-1)
    else:
        # Flatten start_dim..end_dim, keep the trailing dimensions
        li.extend(input_dim[:start_dim])
        li.append(-1)
        li.extend(input_dim[end_dim + 1:])
    return tf.reshape(tensor, shape=li)
```
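A quick sanity check against torch.flatten semantics (must run eagerly, since the helper calls .numpy()):

```python
t = tf.zeros([2, 3, 4, 5])
print(flatten(t).shape)                           # (120,)
print(flatten(t, start_dim=1).shape)              # (2, 60)
print(flatten(t, start_dim=1, end_dim=2).shape)   # (2, 12, 5)
```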
apply (presumably nn.Module.apply, as in `self.apply(_init_weights)`): in PyTorch, this method traverses all sublayers and is typically used to initialize the parameters of the whole model in one pass. In TensorFlow, I have not yet figured out how to implement this inside a subclassed Module; calling each layer one by one is too slow. Perhaps a global variable is one way around it; a possible sketch follows below.
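One workaround I would try (a sketch, not a verified equivalent of `self.apply`): both tf.Module and tf.keras.Model expose a recursive `submodules` property, so once the model has been built and its weights exist, you can walk it and re-initialize in a single pass. `init_weights` is a hypothetical helper name, and `trunc_normal_` is the ported function above:

```python
def init_weights(model):
    # Rough analogue of PyTorch's `self.apply(_init_weights)`:
    # walk every submodule recursively and re-initialize Dense layers.
    for module in model.submodules:
        if isinstance(module, tf.keras.layers.Dense):
            module.kernel.assign(trunc_normal_(module.kernel, std=.02))
            if module.bias is not None:
                module.bias.assign(tf.zeros_like(module.bias))
```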
Problems encountered when using TensorFlow:

1. Tensor type unknown to einops

If the model is built with the functional API, especially when it internally calls other network modules, einops may raise `Tensor type unknown to einops`. The fix is to rewrite the model by subclassing tf.keras.Model, instantiate the other modules in `__init__`, and invoke them in `call`. The original PyTorch code chains the modules like this:
```python
Residual(PreNorm(dim, Attention(dim, heads=heads, dropout=dropout,
                                num_keypoints=num_keypoints,
                                scale_with_head=scale_with_head))),
```
Here Residual, PreNorm, and Attention are each separate Modules; in TensorFlow, my approach is to create another class that subclasses Module and executes them in the original calling order.
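A sketch of that wrapper class (the name `ResidualPreNormAttention` is my own; Residual, PreNorm, and Attention stand for the user-defined modules above, whose exact call signatures may differ):

```python
class ResidualPreNormAttention(tf.keras.Model):
    # Instantiate the submodules in __init__ and chain them in call(),
    # mirroring Residual(PreNorm(dim, Attention(...))) from the PyTorch code.
    def __init__(self, dim, heads, dropout, num_keypoints, scale_with_head):
        super().__init__()
        self.attn = Attention(dim, heads=heads, dropout=dropout,
                              num_keypoints=num_keypoints,
                              scale_with_head=scale_with_head)
        self.prenorm = PreNorm(dim, self.attn)
        self.residual = Residual(self.prenorm)

    def call(self, x, **kwargs):
        return self.residual(x, **kwargs)
```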