First, the screenshot:
The screenshot above shows an error that came up while training a BiLSTM network.
Problem description: the BiLSTM's initial hidden states h0 and c0 (note: these are initial states, not weights) are defined and passed to the network via the following call:
output, (hn, cn) = self.bilstm(input, (h0, c0))
The network is defined as follows:
self.bilstm = nn.LSTM(
    input_size=self.input_size,
    hidden_size=self.hidden_size,
    num_layers=self.num_layers,
    bidirectional=True,
    bias=True,
    dropout=config.drop_out
)
Following the official documentation, I initialized h0 and c0 with the documented shapes:
**h_0** of shape `(num_layers * num_directions, batch, hidden_size)`
**c_0** of shape `(num_layers * num_directions, batch, hidden_size)`
The BiLSTM's parameters are set as follows:
num_layers: 2
num_directions: 2
batch: 4
seq_len: 10
input_size: 300
hidden_size: 100
So, per the official documentation, h0 and c0 should have shape (2*2, 4, 100) = (4, 4, 100).
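This expectation can be checked with a short standalone sketch (plain constants stand in for the post's self.* attributes):

```python
import torch

# Hyperparameters from the post
num_layers, num_directions = 2, 2
batch, seq_len = 4, 10
input_size, hidden_size = 300, 100

# Documented shape: (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
print(tuple(h0.shape))  # (4, 4, 100)
```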
But the error screenshot at the top of this post shows the expected initial hidden-state shape as (4, 10, 100), which made me wonder whether the shape specified in the official documentation is actually correct.
Obviously the official documentation is very unlikely to be wrong, and in previous work with BiLSTM, RNN, and BiGRU the hidden-state shapes always matched the documented ones, so for a moment I had no idea where to start.
So I went back over the network definition and found that I had left out an important parameter: batch_first. Here are all the parameters nn.LSTM accepts:
Args:
input_size: The number of expected features in the input `x`
hidden_size: The number of features in the hidden state `h`
num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
would mean stacking two LSTMs together to form a `stacked LSTM`,
with the second LSTM taking in outputs of the first LSTM and
computing the final results. Default: 1
bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
Default: ``True``
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False``
dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
LSTM layer except the last layer, with dropout probability equal to
:attr:`dropout`. Default: 0
bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
The batch_first parameter makes the batch dimension come first, i.e. the input tensor is expected to have shape
(batch_size, seq_len, embedding_dim); without batch_first=True, the expected shape is
(seq_len, batch_size, embedding_dim).
Having skipped my midday rest, I groggily forgot to add this important parameter. As a result, the input of shape (4, 10, 300) was read as (seq_len=4, batch=10, feature=300), so the LSTM expected initial hidden states of shape (4, 10, 100) and reported the shape mismatch. After adding batch_first=True, training ran without a hitch.
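For reference, the error can be reproduced with a minimal standalone sketch (plain constants replace the post's self.* attributes, and config.drop_out is omitted since config is not defined here):

```python
import torch
import torch.nn as nn

# batch_first is left at its default of False, reproducing the mistake
lstm = nn.LSTM(input_size=300, hidden_size=100, num_layers=2,
               bidirectional=True)

x = torch.randn(4, 10, 300)   # meant as (batch, seq_len, feature)
h0 = torch.zeros(4, 4, 100)   # (num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(4, 4, 100)

raised = False
try:
    lstm(x, (h0, c0))
except RuntimeError as err:
    # x was read as (seq_len=4, batch=10, feature=300), so the LSTM
    # expected hidden states of shape (4, 10, 100) and rejected ours
    raised = True
    print(err)
```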
The corrected network definition:
self.bilstm = nn.LSTM(
    input_size=self.input_size,
    hidden_size=self.hidden_size,
    num_layers=self.num_layers,
    batch_first=True,
    bidirectional=True,
    bias=True,
    dropout=config.drop_out
)
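With batch_first=True set, the same h0/c0 shapes are accepted; a minimal standalone sketch (again with plain constants and config.drop_out omitted):

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=300, hidden_size=100, num_layers=2,
                 batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 300)      # (batch, seq_len, feature) thanks to batch_first=True
h0 = torch.zeros(2 * 2, 4, 100)  # hidden-state shape is unaffected by batch_first
c0 = torch.zeros(2 * 2, 4, 100)

output, (hn, cn) = bilstm(x, (h0, c0))
print(tuple(output.shape))  # (4, 10, 200): (batch, seq_len, num_directions * hidden_size)
print(tuple(hn.shape))      # (4, 4, 100)
```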
Takeaway: when passing initial hidden states to an RNN or any of its variants, their shape is always the documented
(num_layers * num_directions, batch, hidden_size)
and this shape does not depend on batch_first, which only controls the layout of the input and output tensors. The catch is that the network infers the batch size from the input tensor: if your data is batch-first but you forget batch_first=True, the wrong dimension is taken as the batch, and the hidden states you built no longer match what the network expects.
So whenever the shapes of h0/c0 (or hn/cn) come out wrong, also check whether batch_first is set correctly; this check applies to RNN, GRU, and LSTM alike!