Notes and Pitfalls for Porting PyTorch Code to TensorFlow (continuously updated)

While porting PyTorch code recently, I found that many functions have direct TensorFlow counterparts, while others need to be adapted. This post records the functions I encountered and how I handled them, along with the bugs I ran into and their fixes. I will keep updating it as new problems come up.

1. Function correspondence

PyTorch & TensorFlow function correspondence

| PyTorch | Description | TensorFlow | Notes |
| --- | --- | --- | --- |
| masked_fill_ | Masking: fills the elements of a tensor with value at the positions where mask is 1; the mask must be broadcastable to the tensor being filled | tf.where | Use the mask as the condition, x as the fill value, y as the original tensor: `x = tf.where(mask, x=mask_value, y=x)` (see the sketch below the table) |
| torch.nn.functional.pad |  | tf.pad |  |
| torch.finfo | Returns the machine limits of the floating-point type given in the parentheses | tf.experimental.numpy.finfo() |  |
| torch.einsum |  | tf.einsum |  |
| torch.chunk() | Splits a tensor into chunks | tf.split() |  |
| nn.Parameter() |  | tf.Variable() |  |
| torch.no_grad() |  | tf.stop_gradient() |  |
| torch.cumsum |  | Combine tf.cast and tf.math.cumsum |  |
| torch.arange |  | tf.range |  |
| torch.stack |  | tf.stack |  |
| torch.cat |  | tf.concat |  |
| torch.permute |  | tf.transpose |  |
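For example, a minimal sketch of the masked_fill_ row (the tensor names and fill value are illustrative): in PyTorch one would write `x.masked_fill_(mask, -1e9)`; the TensorFlow equivalent is

import tensorflow as tf

x = tf.zeros([2, 3])
mask = tf.constant([[True, False, True], [False, False, True]])
# where mask is True take the fill value, elsewhere keep the original tensor
x = tf.where(mask, x=tf.fill(tf.shape(x), -1e9), y=x)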
PyTorch & TensorFlow layer correspondence

| PyTorch | TensorFlow | Notes |
| --- | --- | --- |
| nn.Linear | tf.keras.layers.Dense | TF does not need the input dimension, only the output dimension |
| nn.Sequential | tf.keras.Sequential | In TF, remember to put the layers in a list: the [ ] inside the ( ) |
| nn.ModuleList | — | Just create an empty Python list and list.append() the sublayers (see the sketch below the table) |
| nn.Identity() | — | Build a small Module yourself that returns its input unchanged |
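A minimal sketch of the last two rows (class names, depth and dim are illustrative):

import tensorflow as tf

class Identity(tf.keras.layers.Layer):
    """Stand-in for nn.Identity(): returns the input unchanged."""
    def call(self, x):
        return x

class Blocks(tf.keras.layers.Layer):
    """Stand-in for nn.ModuleList: keep sublayers in a plain Python list."""
    def __init__(self, depth, dim):
        super().__init__()
        self.blocks = []  # Keras tracks layers stored in list attributes
        for _ in range(depth):
            self.blocks.append(tf.keras.layers.Dense(dim, activation='relu'))

    def call(self, x):
        for block in self.blocks:
            x = block(x)
        return x

y = Blocks(depth=2, dim=8)(tf.zeros([1, 8]))  # (1, 8)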

2. Function porting

1. trunc_normal_

Description: in PyTorch, trunc_normal_ fills a tensor with values drawn from a truncated normal distribution, i.e. it restricts the range the sampled values can take.

The PyTorch implementation is as follows:

import math
import warnings

import torch
from torch import Tensor


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    with torch.no_grad():
        # Values are generated by using a truncated uniform distribution and
        # then using the inverse CDF for the normal distribution.
        # Get upper and lower cdf values
        l = norm_cdf((a - mean) / std)
        u = norm_cdf((b - mean) / std)

        # Uniformly fill tensor with values from [l, u], then translate to
        # [2l-1, 2u-1].
        tensor.uniform_(2 * l - 1, 2 * u - 1)

        # Use inverse cdf transform for normal distribution to get truncated
        # standard normal
        tensor.erfinv_()

        # Transform to proper mean, std
        tensor.mul_(std * math.sqrt(2.))
        tensor.add_(mean)

        # Clamp to ensure it's in the proper range
        tensor.clamp_(min=a, max=b)
        return tensor

def trunc_normal_(tensor: Tensor, mean: float = 0., std: float = 1., a: float = -2., b: float = 2.) -> Tensor:
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.trunc_normal_(w)
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)

Ported to TensorFlow, it becomes:

import tensorflow as tf
import math
import warnings


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    # with torch.no_grad():
    # Values are generated by using a truncated uniform distribution and
    # then using the inverse CDF for the normal distribution.
    # Get upper and lower cdf values
    l = tf.stop_gradient(norm_cdf((a - mean) / std))
    u = tf.stop_gradient(norm_cdf((b - mean) / std))

    # Uniformly fill tensor with values from [l, u], then translate to
    # [2l-1, 2u-1].
    tensor = tf.stop_gradient(tf.random.uniform(shape=tf.shape(tensor), minval=2 * l - 1, maxval=2 * u - 1))
    # tensor.uniform_(2 * l - 1, 2 * u - 1)
    # Use inverse cdf transform for normal distribution to get truncated
    # standard normal
    tensor = tf.stop_gradient(tf.math.erfinv(x=tensor))
    # tensor.erfinv_()

    # Transform to proper mean, std
    tensor = tf.stop_gradient(tf.math.multiply(x=tensor, y=std * math.sqrt(2.)))
    # tensor.mul_(std * math.sqrt(2.))
    tensor = tf.stop_gradient(tf.math.add(x=tensor, y=mean))
    # tensor.add_(mean)

    # Clamp to ensure it's in the proper range
    tensor = tf.stop_gradient(tf.clip_by_value(t=tensor, clip_value_min=a, clip_value_max=b))
    # tensor.clamp_(min=a, max=b)
    return tensor

def trunc_normal_(tensor, mean: float = 0., std: float = 1., a: float = -2., b: float = 2.):
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `tf.Tensor` (only its shape is used; a new tensor is returned)
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value

    Examples:
        >>> w = trunc_normal_(tf.zeros([3, 5]))
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)
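Note that unlike the in-place PyTorch original, this port only uses tensor for its shape and returns a new tensor, so the result must be assigned back. A usage sketch:

w = tf.Variable(tf.zeros([3, 5]))
w.assign(trunc_normal_(w))               # re-sample and write the values back
print(w.numpy().min(), w.numpy().max())  # all values lie inside [-2, 2]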

2. The flatten() operation

torch.flatten(input, start_dim=0, end_dim=-1) (layer form: torch.nn.Flatten(start_dim=1, end_dim=-1)): in PyTorch this flattens any contiguous range of dimensions into a single one.

A TensorFlow implementation:

def flatten(tensor, start_dim=0, end_dim=-1):
    """
    Mirrors torch.flatten: multiplies together the sizes of the dimensions
    between start_dim and end_dim (inclusive) and leaves the remaining
    dimensions unchanged. Because the defaults are start_dim=0, end_dim=-1,
    flatten(t) returns a 1-D tensor.
    """
    # .numpy() requires eager execution; the shape is read at call time
    input_dim = tf.shape(tensor).numpy().tolist()
    # make sure the indices do not exceed the tensor's rank
    assert start_dim <= len(input_dim) - 1 and end_dim <= len(input_dim) - 1
    li = []
    li.extend(input_dim[:start_dim])  # dims before start_dim stay as they are
    li.append(-1)                     # the flattened range is inferred by reshape
    if end_dim != -1:
        li.extend(input_dim[end_dim + 1:])  # dims after end_dim stay as they are
    return tf.reshape(tensor, shape=li)
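A quick sanity check of the port (the shapes are illustrative):

x = tf.zeros([2, 3, 4, 5])
print(flatten(x).shape)                          # (120,)
print(flatten(x, start_dim=1).shape)             # (2, 60)
print(flatten(x, start_dim=1, end_dim=2).shape)  # (2, 12, 5)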

3. nn.Module.apply (unsolved)

In PyTorch this method recursively visits every submodule and is usually used to initialize all of the parameters in one pass. I do not yet know how to reproduce it inside a subclassed Module in TensorFlow; calling the initializers one by one is too slow. Using a global variable might be one way out.
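One possible direction (a sketch only, not verified against my original code): Keras layers and models inherit from tf.Module, whose submodules property walks all nested modules recursively, so an apply-style helper can loop over it:

import tensorflow as tf

def apply_to_submodules(model, fn):
    """Rough stand-in for nn.Module.apply: call fn on every submodule."""
    fn(model)
    for m in model.submodules:  # recursive walk provided by tf.Module
        fn(m)

def init_weights(m):
    # example initializer: re-sample Dense kernels from a normal distribution
    if isinstance(m, tf.keras.layers.Dense) and m.built:
        m.kernel.assign(tf.random.normal(m.kernel.shape, stddev=0.02))

model = tf.keras.Sequential([tf.keras.layers.Dense(4), tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 8))  # create the variables before re-initializing
apply_to_submodules(model, init_weights)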

3. Function calls from other libraries

4. Bugs

1. einops

Problems when using it with TensorFlow:

1. Tensor type unknown to einops

If the model is built with the functional API, especially when it internally calls other network modules, you may run into

Tensor type unknown to einops

The fix is to rewrite the model by subclassing tf.keras.Model, instantiate the other module classes in __init__, and call them in call; the error then goes away.
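A minimal sketch of that subclassing pattern (the layer size and the rearrange pattern are illustrative, not the original model):

import tensorflow as tf
from einops import rearrange

class PatchModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # submodules are created once in __init__, not in functional-API graph code
        self.proj = tf.keras.layers.Dense(16)

    def call(self, x):
        # einops sees a concrete tf.Tensor here, so rearrange works
        x = rearrange(x, 'b h w c -> b (h w) c')
        return self.proj(x)

model = PatchModel()
out = model(tf.zeros([2, 4, 4, 3]))  # (2, 16, 16)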

2. Single-line nested calls

 Residual(PreNorm(dim, Attention(dim, heads=heads, dropout=dropout, num_keypoints=num_keypoints, scale_with_head=scale_with_head))),

Here Residual, PreNorm and Attention are each standalone Modules. In TensorFlow I handled this by creating one new class inheriting from Module that executes them in the same order as the nested call (see the sketch below).
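A sketch of such a wrapper, with fn standing in for the Attention module and a LayerNormalization standing in for PreNorm (the names and details are illustrative, not the original code):

import tensorflow as tf

class ResidualPreNorm(tf.keras.layers.Layer):
    """Executes Residual(PreNorm(dim, fn)) as one class: x + fn(norm(x))."""
    def __init__(self, fn):
        super().__init__()
        self.norm = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.fn = fn

    def call(self, x):
        return x + self.fn(self.norm(x))

# usage: wrap any sublayer, e.g. a Dense standing in for Attention
block = ResidualPreNorm(tf.keras.layers.Dense(8))
y = block(tf.zeros([2, 5, 8]))  # (2, 5, 8)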

3. Building the network by subclassing Module

1. With this approach the batch dimension is None, so broadcast-style operations need a concrete batch size; I fix it directly in __init__ by passing it in explicitly (see the sketch below).
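A sketch of that workaround (the class-token example is illustrative; alternatively, tf.shape(x)[0] reads the batch size dynamically inside call):

import tensorflow as tf

class WithClsToken(tf.keras.layers.Layer):
    def __init__(self, batch_size, dim):
        super().__init__()
        self.batch_size = batch_size  # fixed here because the batch dim is None
        self.cls_token = self.add_weight(
            name='cls_token', shape=(1, 1, dim), initializer='zeros')

    def call(self, x):
        # broadcast the learned token across the (fixed) batch dimension
        tokens = tf.broadcast_to(self.cls_token, (self.batch_size, 1, x.shape[-1]))
        return tf.concat([tokens, x], axis=1)

layer = WithClsToken(batch_size=2, dim=8)
y = layer(tf.zeros([2, 5, 8]))  # (2, 6, 8)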
