PyTorch: the simple RNN APIs I can never remember

A not-so-simple flowchart

[Flowchart: data → encoder → vector → RNN / LSTM / GRU → decoder → output]

Layer API

As the flowchart shows, we need a few layers. For the encoder we use nn.Embedding, followed by a recurrent layer.

nn.Embedding

  • torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)
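    • num_embeddings: the size of the vocabulary. If there are 5000 distinct words (tokens) and you pass 100, you are making life hard for the layer: every input index must be smaller than num_embeddings.
    • embedding_dim: how many numbers are used to represent each vector.
    • padding_idx: the index of the padding token. If it is set, the vector at that index is initialized to zeros and receives no gradient updates. Default: None, meaning no index is treated as padding.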
    • Output: (*, H), where * is the input shape and H=embedding_dim
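
To make the three parameters above concrete, here is a minimal sketch. The vocabulary size, the embedding dimension, and the choice of index 0 as padding are illustrative assumptions, not values from any real dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: a 5000-token vocabulary, 16 numbers per token.
vocab_size, embedding_dim = 5000, 16
embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)

# A batch of 2 sequences of length 4; index 0 is used as padding here.
token_ids = torch.tensor([[12, 7, 4999, 0],
                          [3, 0, 0, 0]])
vectors = embedding(token_ids)

# The output shape is the input shape (2, 4) plus embedding_dim.
print(vectors.shape)
# The vector looked up for the padding index is all zeros.
print(vectors[0, 3].abs().sum())

# An index >= num_embeddings is out of range and raises an error on CPU.
try:
    embedding(torch.tensor([vocab_size]))
    out_of_range_failed = False
except (IndexError, RuntimeError):
    out_of_range_failed = True
print(out_of_range_failed)
```

The largest valid index is num_embeddings - 1, which is why passing too small a vocabulary size fails as soon as a rarer token shows up.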

Recurrent layer

First, a quick review of the formula

$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$$

  • where $h_t$ is the hidden state at time t,
  • $x_t$ is the input at time t,
  • $h_{t-1}$ is the hidden state at the previous time step t-1, or the initial hidden state at time 0.
  • If nonlinearity is 'relu', then ReLU is used instead of tanh.
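
One way to convince yourself that the formula is what nn.RNN computes is to redo the recurrence by hand with the layer's own weights. This is only a sketch for a single-layer, unidirectional RNN; the sizes are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 3, 5          # hypothetical sizes
rnn = nn.RNN(input_size, hidden_size)   # one layer, tanh by default

x = torch.randn(4, 1, input_size)       # (seq_len, batch, input_size)
h0 = torch.zeros(1, 1, hidden_size)     # (num_layers, batch, hidden_size)
output, h_n = rnn(x, h0)

# Recompute h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh) step by step.
h = h0[0]
manual = []
with torch.no_grad():
    for x_t in x:
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        manual.append(h)
manual = torch.stack(manual)

# The hand-rolled loop should agree with the layer up to float rounding.
matches = torch.allclose(manual, output, atol=1e-5)
print(matches)
```
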
API parameters
  • input_size – The number of expected features in the input x. With one-hot input this would be the number of tokens, but since the input has already passed through the embedding layer, it is the embedding dimension (embedding_dim).

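  • hidden_size – The number of features in the hidden state h. This one is a little harder to picture. Look back at the formula: the previous hidden state and the new input are added together, and in matrix terms vectors of different sizes cannot be added. The layer takes care of this: $W_{ih}$ has shape (hidden_size, input_size) and $W_{hh}$ has shape (hidden_size, hidden_size), so both terms end up with hidden_size entries. In practice you just pick a value and tune it.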

  • num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

  • nonlinearity – The non-linearity to use. Can be either ‘tanh’ or ‘relu’. Default: ‘tanh’

  • bias – If False, then the layer does not use bias weights $b_{ih}$ and $b_{hh}$. Default: True

  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False

  • dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

  • bidirectional – If True, becomes a bidirectional RNN. Default: False
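
The parameters above mostly show up in the shapes of what the layer returns. The following sketch sets several of them at once; all the sizes are hypothetical and chosen only so the shapes are easy to tell apart.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
batch, seq_len, input_size, hidden_size, num_layers = 2, 7, 16, 32, 2

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
             nonlinearity='tanh', batch_first=True,
             dropout=0.1, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)   # (batch, seq, feature)
output, h_n = rnn(x)                          # h_0 defaults to zeros

# output holds the last layer's hidden state at every time step,
# with both directions concatenated on the feature axis:
# (batch, seq_len, 2 * hidden_size).
print(output.shape)

# h_n holds the final hidden state of every layer and direction:
# (num_layers * 2, batch, hidden_size). batch_first does not apply to it.
print(h_n.shape)
```

The thing that tends to trip people up is that batch_first only changes the layout of the input and output tensors, not of the hidden state.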

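Of course, the plain RNN is rarely used these days and has been almost entirely replaced by LSTM and GRU. The idea is much the same, though: once you can use one, you can use the others, so I won't go into them here.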
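
As a rough illustration of how similar the interfaces are, the sketch below calls nn.GRU and nn.LSTM the same way as nn.RNN. The sizes are hypothetical. The one visible difference is that the LSTM returns its state as a pair (h, c).

```python
import torch
import torch.nn as nn

input_size, hidden_size = 16, 32        # hypothetical sizes
x = torch.randn(7, 2, input_size)       # (seq_len, batch, input_size)

gru = nn.GRU(input_size, hidden_size)
gru_out, gru_h = gru(x)                 # same call pattern as nn.RNN

lstm = nn.LSTM(input_size, hidden_size)
lstm_out, (lstm_h, lstm_c) = lstm(x)    # the state is an (h, c) tuple

print(gru_out.shape, gru_h.shape)
print(lstm_out.shape, lstm_h.shape, lstm_c.shape)
```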

decoder

This part is simple: it uses nn.Linear(). It is simple enough that I won't say much about it.
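
One detail that may still be worth a quick sketch: nn.Linear acts on the last dimension of its input, so it can take the 3-D output of the recurrent layer directly. Flattening to 2-D first and reshaping afterwards, as the forward pass in the code below does, should give the same result. The sizes here are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

nhid, ntoken = 32, 100                  # hypothetical sizes
decoder = nn.Linear(nhid, ntoken)

rnn_output = torch.randn(7, 2, nhid)    # (seq_len, batch, nhid)

# Apply the linear layer directly to the 3-D tensor.
logits = decoder(rnn_output)

# Flatten to (seq_len * batch, nhid), decode, then reshape back.
flat = decoder(rnn_output.view(-1, nhid)).view(7, 2, ntoken)

same = torch.allclose(logits, flat, atol=1e-5)
print(logits.shape, same)
```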

The code

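import torch.nn as nn


class RNNModel(nn.Module):
    """A simple recurrent neural network."""
    def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5):
        ''' The model consists of the following layers:
            - a word embedding layer
            - a recurrent layer (RNN, LSTM, GRU)
            - a linear layer mapping the hidden state to the output vocabulary
            - a dropout layer, used for regularization
        '''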
        super(RNNModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        if rnn_type in ['LSTM', 'GRU']:
            self.rnn = getattr(nn, rnn_type)(ninp, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError( """An invalid option for `--model` was supplied,
                                 options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)
        self.init_weights()
        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def init_hidden(self, bsz, requires_grad=True):
        weight = next(self.parameters())
        if self.rnn_type == 'LSTM':
            return (weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad),
                    weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad))
        else:
            return weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad)
    
    def forward(self, input, hidden):
        ''' Forward pass:
            - word embedding
            - feed the embeddings into the recurrent layer
            - a linear layer maps the hidden state to the output vocabulary
        '''
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden
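
To see how the pieces fit together at run time, here is a standalone sketch that wires the same three stages by hand (embedding, recurrent layer, linear decoder) and runs one forward and backward pass. It is not the class above: dropout is left out, the sizes are hypothetical, the targets are random stand-ins, and the use of cross-entropy as the loss is my own assumption for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

ntoken, ninp, nhid, nlayers = 100, 16, 32, 2   # hypothetical sizes
seq_len, bsz = 7, 4

encoder = nn.Embedding(ntoken, ninp)
rnn = nn.LSTM(ninp, nhid, nlayers)
decoder = nn.Linear(nhid, ntoken)

tokens = torch.randint(0, ntoken, (seq_len, bsz))    # (seq_len, batch)
targets = torch.randint(0, ntoken, (seq_len, bsz))   # random stand-in targets

# Initial state for an LSTM: a (h_0, c_0) pair of zeros.
hidden = (torch.zeros(nlayers, bsz, nhid), torch.zeros(nlayers, bsz, nhid))

emb = encoder(tokens)                 # (seq_len, bsz, ninp)
output, hidden = rnn(emb, hidden)     # (seq_len, bsz, nhid)
logits = decoder(output)              # (seq_len, bsz, ntoken)

# Cross-entropy expects (N, classes) logits and (N,) class indices.
loss = nn.functional.cross_entropy(logits.view(-1, ntoken), targets.view(-1))
loss.backward()
print(logits.shape, loss.item())
```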
