RuntimeError: Expected hidden[0] size (x, x, x), got(x, x, x)

First, the screenshot:

(error screenshot omitted: the RuntimeError traceback reporting the expected hidden[0] size)

The screenshot above shows an error that appeared while training a BiLSTM network.

Problem description: the initial hidden states h0 and c0 for the BiLSTM were defined and passed into the network as its initial state, via the following code:

output, (hn, cn) = self.bilstm(input, (h0, c0))

The network is defined as follows:

self.bilstm = nn.LSTM(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=self.num_layers,
            bidirectional=True,
            bias=True,
            dropout=config.drop_out
        )

Here h0 and c0 were initialized with the shapes given in the official documentation:

**h_0** of shape `(num_layers * num_directions, batch, hidden_size)`
**c_0** of shape `(num_layers * num_directions, batch, hidden_size)`

The parameters of the BiLSTM network are as follows:

num_layers: 2

num_directions: 2

batch: 4

seq_len: 10

input_size: 300

hidden_size: 100 

So according to the shapes defined in the official documentation, h0 and c0 should have shape (2 * 2, 4, 100) = (4, 4, 100).
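As a quick sanity check, the documented shape can be constructed directly (a minimal sketch using the parameters listed above):

```python
import torch

# Parameters from this article
num_layers, num_directions = 2, 2
batch, hidden_size = 4, 100

# Documented shape: (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
print(h0.shape)  # torch.Size([4, 4, 100])
```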

But the error screenshot at the top of the article shows that the expected initial hidden state shape is (4, 10, 100), which made me wonder whether the shape given in the official documentation is actually correct.

Obviously the official documentation could not be wrong, and in my past use of BiLSTM, RNN, and BiGRU the hidden state shapes always matched the documented ones, so for a while I had no idea where to start.
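The mismatch can be reproduced with a minimal sketch (shapes taken from this article; the exact wording of the error may vary by PyTorch version):

```python
import torch
import torch.nn as nn

num_layers, batch, seq_len, input_size, hidden_size = 2, 4, 10, 300, 100

# batch_first is left at its default (False) -- the bug in question
bilstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                 num_layers=num_layers, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)            # batch-first data
h0 = torch.zeros(num_layers * 2, batch, hidden_size)   # (4, 4, 100)
c0 = torch.zeros(num_layers * 2, batch, hidden_size)

try:
    bilstm(x, (h0, c0))
except RuntimeError as e:
    # With batch_first=False, x is read as (seq_len=4, batch=10, ...),
    # so a (4, 10, 100) hidden state is expected instead of (4, 4, 100)
    print(e)
```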

Re-reading the network definition, I found that one important parameter was missing: batch_first. Here are all the arguments nn.LSTM accepts:

Args:
        input_size: The number of expected features in the input `x`
        hidden_size: The number of features in the hidden state `h`
        num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
            would mean stacking two LSTMs together to form a `stacked LSTM`,
            with the second LSTM taking in outputs of the first LSTM and
            computing the final results. Default: 1
        bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
            Default: ``True``
        batch_first: If ``True``, then the input and output tensors are provided
            as (batch, seq, feature). Default: ``False``
        dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
            LSTM layer except the last layer, with dropout probability equal to
            :attr:`dropout`. Default: 0
        bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``

The batch_first argument puts the batch dimension first during training, i.e., the input has shape

(batch_size, seq_len, embedding_dim); without batch_first=True, the expected input shape is

(seq_len, batch_size, embedding_dim)
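As an illustration (a small sketch with the shapes used in this article): with batch_first=True the input and output are batch-first, while the hidden states keep the documented layout:

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 4, 10, 300, 100

lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
               bidirectional=True, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
out, (hn, cn) = lstm(x)

print(out.shape)  # torch.Size([4, 10, 200]) -> (batch, seq_len, num_directions * hidden_size)
print(hn.shape)   # torch.Size([4, 4, 100])  -> (num_layers * num_directions, batch, hidden_size)
```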

Having skipped my midday rest, I groggily forgot to add this important argument, which caused the error about the initial hidden state shape being wrong: with batch_first left at its default of False, the input of shape (4, 10, 300) was interpreted as (seq_len=4, batch=10, input_size=300), so the network expected hidden states of shape (4, 10, 100). After adding batch_first=True, everything ran fine.

The corrected network definition:

self.bilstm = nn.LSTM(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True,
            bias=True,
            dropout=config.drop_out
        )

 

Takeaway: when supplying initial hidden states to an RNN or one of its variants, their shape must always be the documented

(num_layers * num_directions, batch, hidden_size)

At the same time, make sure batch_first matches your input layout: h0, c0, hn, and cn always keep the shape (num_layers * num_directions, batch, hidden_size) regardless of batch_first, since batch_first only affects the input and output tensors. The documentation does not call this out loudly, so if your data is batch-first but batch_first is left as False, the network infers the wrong batch size and the hidden state check fails. Be careful!

Likewise, when the shapes of hn and cn look wrong, also check whether batch_first is set; this method applies to RNN and all its variants!
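One way to convince yourself (a minimal sketch): hn has the same shape whether or not batch_first is set; only the input/output layout changes:

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_layers = 4, 10, 300, 100, 2

hn_shapes = []
for batch_first in (True, False):
    lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                   bidirectional=True, batch_first=batch_first)
    # Input layout depends on batch_first; hidden state layout does not
    x = (torch.randn(batch, seq_len, input_size) if batch_first
         else torch.randn(seq_len, batch, input_size))
    out, (hn, cn) = lstm(x)
    hn_shapes.append(tuple(hn.shape))

print(hn_shapes)  # [(4, 4, 100), (4, 4, 100)]
```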
