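Using 1D convolution: Conv1D in TensorFlow 2 and nn.Conv1d in PyTorch - Zhihu
Quick summary:
1.1 In torch, the 1D convolution kernel slides along the last dimension; in tf, it slides along the second-to-last dimension. Correspondingly, in_channels is the second-to-last dimension in torch and the last dimension in tf.
1.2 torch requires both in_channels and out_channels: the former is the size of the second-to-last dimension of the input, and the latter is simply the number of filters, which corresponds to the output dimension of an MLP layer.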
torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
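in_channels: in text applications, the word-vector dimension (the dimension of the text after embedding)
out_channels: the number of channels produced by the convolution; in effect, it changes the word-vector dimension from in_channels to out_channels
kernel_size: the size of the convolution kernel along the sliding dimension; the kernel's other dimension is fixed by in_channels, so each kernel actually has size kernel_size * in_channels
padding: the number of zeros added to each side of the input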
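The parameter descriptions above can be checked with a small shape experiment. This is a minimal sketch; the batch size, sequence length, and embedding dimension are made-up toy values, not taken from the article.

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 2 sentences, 10 tokens each, 8-dim embeddings.
batch, seq_len, embed_dim = 2, 10, 8
x = torch.randn(batch, seq_len, embed_dim)   # typical embedding output: (N, L, C)

conv = nn.Conv1d(in_channels=embed_dim, out_channels=16, kernel_size=3, padding=0)

# nn.Conv1d expects (N, C_in, L), so the channel axis is moved to position 1.
y = conv(x.transpose(1, 2))

# Each of the 16 filters has size in_channels * kernel_size.
conv_weight_shape = tuple(conv.weight.shape)   # (out_channels, in_channels, kernel_size)
# Without padding, the length shrinks to L - kernel_size + 1.
out_shape = tuple(y.shape)                     # (N, out_channels, L_out)
print(conv_weight_shape, out_shape)
```

The transpose is the step most often forgotten: an embedding layer produces channels last, while nn.Conv1d reads channels from the second-to-last dimension.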
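1.3 tf does not need in_channels to be specified: it is taken from the last dimension of the input (the C in the usual H*W*C layout), so only out_channels (filters) and kernel_size are required. In this respect the tf design seems better, since in_channels is already determined by a fixed dimension of the input and does not change; torch could learn from tf here.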
conv1D = tf.keras.layers.Conv1D(1, 3, padding='valid')  # 1 is the number of output channels, 3 is the kernel size, no boundary padding
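To see how the layer infers in_channels from the input, here is a minimal sketch that applies the same Conv1D configuration to a channels-last tensor. The input sizes are assumed toy values.

```python
import tensorflow as tf

# Assumed toy input: 2 sequences, 10 steps each, 8 channels, laid out as (N, L, C).
x = tf.random.normal((2, 10, 8))

conv1d = tf.keras.layers.Conv1D(filters=1, kernel_size=3, padding='valid')
y = conv1d(x)   # the layer is built on first call and reads in_channels from x

# The kernel is stored as (kernel_size, in_channels, filters).
tf_kernel_shape = tuple(conv1d.kernel.shape)
# With 'valid' padding, the length shrinks to L - kernel_size + 1.
tf_out_shape = tuple(y.shape)
print(tf_kernel_shape, tf_out_shape)
```

No transpose is needed here, because the kernel slides along the second-to-last dimension and the channels sit in the last one.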
2.1 torch version:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
The input format for torch convolution is (N, Cin, H, W), and the output is (N, Cout, Hout, Wout).
The four parameters kernel_size, stride, padding, and dilation can each be set in two ways:
a single int, in which case the same value is used for the height and width dimensions
a tuple of two ints, in which case the first int is used for the height dimension and the second int for the width dimension
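The two ways of setting these parameters can be compared side by side. This is a minimal sketch; the input size and the specific kernel, stride, and padding values are illustrative assumptions.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # (N, C_in, H, W), assumed toy input

# Single ints: the same value applies to both height and width.
square = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)

# Tuples: first entry is for height, second entry is for width.
rect = nn.Conv2d(3, 8, kernel_size=(3, 5), stride=(1, 2), padding=(1, 2))

# Output size per axis: floor((size + 2*padding - kernel) / stride) + 1
square_shape = tuple(square(x).shape)   # height and width both stay 32
rect_shape = tuple(rect(x).shape)       # height stays 32, width is halved to 16
print(square_shape, rect_shape)
```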
2.2 tf.layers version
The input format for tf convolution is (N, H, W, Cin), and the output is (N, Hout, Wout, Cout).
tf.layers.conv2d(inputs, filters, kernel_size, strides=(1, 1), padding='valid')
kernel_size and strides must each be an integer or a list/tuple; a single integer applies the same value to all spatial dimensions.
| Arguments | Description |
|---|---|
| inputs | Tensor input. |
| filters | Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
| kernel_size | An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions. |
| strides | An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1. |
| padding | One of "valid" or "same" (case-insensitive). |
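The tf.layers API belongs to TensorFlow 1.x. As a rough TensorFlow 2 counterpart, the sketch below uses tf.keras.layers.Conv2D, which takes the same filters, kernel_size, strides, and padding arguments but is called on the input instead of receiving it as an argument. Using the Keras layer as a stand-in is a choice made for this example, not something the article states, and the input sizes are toy values.

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 3))   # (N, H, W, C_in), assumed toy input

# Single ints with 'same' padding: spatial size is preserved at stride 1.
same = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=1, padding='same')

# Tuples with 'valid' padding: (height, width) values are set separately.
valid = tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 5), strides=(1, 2), padding='valid')

same_shape = tuple(same(x).shape)     # channels last: (N, H_out, W_out, filters)
# 'valid' output per axis: floor((size - kernel) / stride) + 1
valid_shape = tuple(valid(x).shape)   # height 30, width 14
print(same_shape, valid_shape)
```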
2.3 tf.nn version
tf.nn.conv2d(
input, filter=None, strides=None, padding=None, use_cudnn_on_gpu=True,
data_format='NHWC', dilations=[1, 1, 1, 1], name=None, filters=None
)
| Args | Description |
|---|---|
| input | A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor. The dimension order is interpreted according to the value of data_format, see below for details. |
| filter | A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels] |
| strides | An int or list of ints that has length 1, 2 or 4. The stride of the sliding window for each dimension of input. If a single value is given it is replicated in the H and W dimension. By default the N and C dimensions are set to 1. The dimension order is determined by the value of data_format, see below for details. |
| padding | Either the string "SAME" or "VALID" indicating the type of padding algorithm to use, or a list indicating the explicit paddings at the start and end of each dimension. When explicit padding is used and data_format is "NHWC", this should be in the form [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]. When explicit padding is used and data_format is "NCHW", this should be in the form [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]]. |
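Unlike the layer APIs, tf.nn.conv2d takes the filter tensor itself, so the caller supplies the weights. The following is a minimal sketch with made-up sizes; the filter is passed positionally so the call does not depend on whether the keyword is spelled filter or filters.

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 3))   # NHWC input, assumed toy sizes
w = tf.random.normal((3, 3, 3, 8))     # [filter_height, filter_width, in_channels, out_channels]

# A single int stride is replicated over H and W; "SAME" keeps 32x32 at stride 1.
y_same = tf.nn.conv2d(x, w, strides=1, padding='SAME')

# Explicit padding in NHWC form: no padding on N and C, one row/column on each side of H and W.
y_explicit = tf.nn.conv2d(
    x, w,
    strides=[1, 2, 2, 1],
    padding=[[0, 0], [1, 1], [1, 1], [0, 0]],
)

nn_same_shape = tuple(y_same.shape)
# Padded size is 34, so each axis gives floor((34 - 3) / 2) + 1 = 16.
nn_explicit_shape = tuple(y_explicit.shape)
print(nn_same_shape, nn_explicit_shape)
```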
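(1) In torch, for 4D data of shape [N, C, H, W], Conv2d can implement an MLP layer: set kernel_size and stride to (1, 1) and set the number of filters (out_channels) to the MLP's output dimension. This maps the second dimension from C to out_channels. The same applies in tf to data of shape [N, H, W, C], where the last dimension is mapped.
(2) In torch, for 3D data of shape [N, C, H], Conv1d can implement an MLP layer: set kernel_size and stride to 1 and set the number of filters (out_channels) to the MLP's output dimension. This maps the second dimension from C to out_channels. The same applies in tf to data of shape [N, H, C], where the last dimension is mapped.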
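The equivalence between a kernel-size-1 convolution and a linear layer can be checked numerically. This is a minimal sketch for the Conv1d case with assumed toy sizes; the weights are copied from the convolution into nn.Linear so both compute the same map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 8, 10)   # (N, C, H) with C = 8, assumed toy sizes

conv = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=1, stride=1)
linear = nn.Linear(8, 4)

# conv.weight has shape (out, in, 1); linear.weight has shape (out, in).
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

y_conv = conv(x)   # (N, 4, H): the channel dimension goes from 8 to 4

# nn.Linear acts on the last dimension, so channels are moved there and back.
y_linear = linear(x.transpose(1, 2)).transpose(1, 2)

outputs_match = torch.allclose(y_conv, y_linear, atol=1e-5)
print(tuple(y_conv.shape), outputs_match)
```

The same check should carry over to Conv2d with kernel_size=(1, 1), with the channel axis moved to the last position before calling the linear layer.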