NNDL Assignment 8: RNN - Simple Recurrent Network

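    • SRN
    • 1. Implementing an SRN with NumPy
    • 2. Adding the tanh activation function to part 1
    • 3. Implementing the SRN with nn.RNNCell and nn.RNN
      • nn.RNNCell
      • nn.RNN
      • Summary:
    • 5. Implementing the "Character-Level Language Models" source code (required)
    • 7. A simple "encoder-decoder" implementation (required)
      • How it works
    • Summary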

SRN

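An SRN (simple recurrent network) is a recurrent neural network with only one hidden layer, as shown in the figure.
[Figure]
The weights W, U and b are shared across all time steps.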
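
For reference, the SRN update is usually written as follows. W, U and b are the shared parameters mentioned above; the activation f and the output matrix V are notation assumed here, not taken from the figure.

```latex
\begin{aligned}
h_t &= f\left(U h_{t-1} + W x_t + b\right) \\
y_t &= V h_t
\end{aligned}
```

In parts 1 and 2 below, b is zero, f is the identity (part 1) or tanh (part 2), and every entry of W, U and V is 1.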

1. Implementing an SRN with NumPy

Simple recurrent network: a recurrent neural network with only one layer of hidden neurons.


import numpy as np

inputs = np.array([[1., 1.],
                   [1., 1.],
                   [2., 2.]])  # input sequence: 3 time steps, 2 features each
print('inputs is ', inputs)

state_t = np.zeros(2, )  # initial hidden state (the "memory")
print('state_t is ', state_t)

w1, w2, w3, w4, w5, w6, w7, w8 = 1., 1., 1., 1., 1., 1., 1., 1.
U1, U2, U3, U4 = 1., 1., 1., 1.
print('--------------------------------------')
for input_t in inputs:
    print('inputs is ', input_t)
    print('state_t is ', state_t)
    in_h1 = np.dot([w1, w3], input_t) + np.dot([U2, U4], state_t)
    in_h2 = np.dot([w2, w4], input_t) + np.dot([U1, U3], state_t)
    state_t = in_h1, in_h2
    output_y1 = np.dot([w5, w7], [in_h1, in_h2])
    output_y2 = np.dot([w6, w8], [in_h1, in_h2])
    print('output_y is ', output_y1, output_y2)
    print('---------------')
'''
np.dot(x, y): for 1-D inputs, the inner product of x and y, i.e. a weighted sum of the elements of y with weights x
'''
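
The same computation can be written more compactly with weight matrices. This is a minimal sketch, assuming the eight scalar weights above are grouped into 2x2 matrices named W (input to hidden), U (hidden to hidden) and V (hidden to output); the function name `srn_linear` is made up for illustration.

```python
import numpy as np

def srn_linear(inputs, W, U, V):
    """Run a linear SRN (no activation) and return the output of every time step."""
    h = np.zeros(U.shape[0])          # initial hidden state h_0 = 0
    outputs = []
    for x in inputs:
        h = W @ x + U @ h             # h_t = W x_t + U h_{t-1}
        outputs.append(V @ h)         # y_t = V h_t
    return np.array(outputs)

inputs = np.array([[1., 1.], [1., 1.], [2., 2.]])
W = np.ones((2, 2))   # stands in for w1..w4
U = np.ones((2, 2))   # stands in for U1..U4
V = np.ones((2, 2))   # stands in for w5..w8

outputs = srn_linear(inputs, W, U, V)
print(outputs)        # rows: [4, 4], [12, 12], [32, 32]
```

Because every weight is 1, each hidden unit is the sum of the current inputs plus the sum of the previous hidden state, which is why the outputs grow so quickly from step to step.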

2. Adding the tanh activation function to part 1

# -*- coding: utf-8 -*-
# @Time : 2022-11-12 17:46
# @Author : Mr.Liu
# @File : 1.py
# @ProjectName: python

import numpy as np

inputs = np.array([[1., 1.],
                   [1., 1.],
                   [2., 2.]])  # input sequence: 3 time steps, 2 features each
print('inputs is ', inputs)

state_t = np.zeros(2, )  # initial hidden state (the "memory")
print('state_t is ', state_t)

w1, w2, w3, w4, w5, w6, w7, w8 = 1., 1., 1., 1., 1., 1., 1., 1.
U1, U2, U3, U4 = 1., 1., 1., 1.
print('--------------------------------------')
for input_t in inputs:
    print('inputs is ', input_t)
    print('state_t is ', state_t)
    in_h1 = np.tanh(np.dot([w1, w3], input_t) + np.dot([U2, U4], state_t))
    in_h2 = np.tanh(np.dot([w2, w4], input_t) + np.dot([U1, U3], state_t))
    state_t = in_h1, in_h2
    output_y1 = np.dot([w5, w7], [in_h1, in_h2])
    output_y2 = np.dot([w6, w8], [in_h1, in_h2])
    print('output_y is ', output_y1, output_y2)
    print('---------------')
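
To see what tanh changes, the two versions can be run side by side. This is a small sketch under the same all-ones weights; the helper `run_srn` and the matrix names are illustrative, not from the assignment.

```python
import numpy as np

def run_srn(inputs, activation):
    """Run the SRN with all weights equal to 1 and return the first output y1 of each step."""
    W = U = V = np.ones((2, 2))
    h = np.zeros(2)
    ys = []
    for x in inputs:
        h = activation(W @ x + U @ h)   # hidden state update
        ys.append(float((V @ h)[0]))    # first component of the output
    return ys

inputs = np.array([[1., 1.], [1., 1.], [2., 2.]])
linear_ys = run_srn(inputs, lambda z: z)   # identity: no activation, as in part 1
tanh_ys = run_srn(inputs, np.tanh)         # tanh, as in part 2

print('linear:', linear_ys)
print('tanh  :', tanh_ys)
```

Without an activation the outputs are 4, 12 and 32 and keep growing with the sequence. With tanh each hidden unit stays inside (-1, 1), so with these weights every output stays below 2.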

3. Implementing the SRN with nn.RNNCell and nn.RNN

nn.RNNCell

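input_size is the length of each one-hot vector x_i.

seq_len is the length of the sentence (during processing, sentences are padded to the length of the longest one, and that padded length is what is used).

batch_size is how many sentences are fed through the network at a time.

hidden_size is the length of the hidden vector h, which is also the length of the vector we expect as the result (i.e. the length of the output vector).

Concretely: the sentences are taken batch_size at a time. For each character, its one-hot vector x_i (of length input_size, e.g. 128) and the previous hidden state h_{i-1} are combined to compute h_i. This is repeated seq_len times, and the final h_i, of length hidden_size, is what the RNNCell produces for the sentence.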
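
The shapes described above can be checked with a short sketch. The sizes (3 sentences, 5 characters each, one-hot length 128, hidden length 16) and the random character indices are assumptions chosen only for illustration.

```python
import torch

batch_size, seq_len = 3, 5          # 3 sentences, each padded to 5 characters
input_size, hidden_size = 128, 16   # one-hot length and hidden vector length

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# Random character indices standing in for real sentences: (seq_len, batch_size)
char_ids = torch.randint(0, input_size, (seq_len, batch_size))
# One-hot encode them: (seq_len, batch_size, input_size)
x = torch.nn.functional.one_hot(char_ids, num_classes=input_size).float()

h = torch.zeros(batch_size, hidden_size)   # h_0
for x_t in x:                              # runs seq_len times
    h = cell(x_t, h)                       # x_t: (batch_size, input_size)

print(x.shape)   # torch.Size([5, 3, 128])
print(h.shape)   # torch.Size([3, 16])
```

The loop runs once per time step, and the batch dimension is carried along inside every call to the cell.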

import torch

batch_size = 1
seq_len = 3  # sequence length
input_size = 2  # dimension of each input vector
hidden_size = 2  # dimension of the hidden layer
output_size = 2  # dimension of the output layer

# RNNCell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
# initialize parameters: weights to 1, biases to 0 (see https://zhuanlan.zhihu.com/p/342012463)
for name, param in cell.named_parameters():
    if name.startswith("weight"):
        torch.nn.init.ones_(param)
    else:
        torch.nn.init.zeros_(param)
# linear output layer
liner = torch.nn.Linear(hidden_size, output_size)
liner.weight.data = torch.Tensor([[1, 1], [1, 1]])
liner.bias.data = torch.zeros(output_size)  # one zero bias per output unit

seq = torch.Tensor([[[1, 1]],
                    [[1, 1]],
                    [[2, 2]]])
hidden = torch.zeros(batch_size, hidden_size)
output = torch.zeros(batch_size, output_size)

for idx, input in enumerate(seq):
    print('=' * 20, idx, '=' * 20)

    print('Input :', input)
    print('hidden :', hidden)

    hidden = cell(input, hidden)
    output = liner(hidden)
    print('output :', output)
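
What `cell(input, hidden)` computes can be written out by hand from the cell's own parameters. The sketch below assumes the default tanh nonlinearity and uses the `weight_ih`, `weight_hh`, `bias_ih` and `bias_hh` attributes of `torch.nn.RNNCell`; the input values are arbitrary.

```python
import torch

torch.manual_seed(0)
cell = torch.nn.RNNCell(input_size=2, hidden_size=2)

x = torch.tensor([[1.0, 1.0]])     # one time step, batch of 1
h_prev = torch.zeros(1, 2)         # previous hidden state

with torch.no_grad():
    # h' = tanh(x W_ih^T + b_ih + h W_hh^T + b_hh)
    manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                        + h_prev @ cell.weight_hh.T + cell.bias_hh)
    builtin = cell(x, h_prev)

print(torch.allclose(manual, builtin, atol=1e-6))
```

With all weights set to 1 and all biases set to 0, as in the code above, this reduces to the tanh version from part 2.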


nn.RNN

The parameters are explained below (quoted text is from the PyTorch documentation):

input_size: "The number of expected features in the input x", i.e. the dimension of the input features. The input to an RNN is usually a word vector, so input_size equals the dimension of one word vector.

hidden_size: "The number of features in the hidden state h", i.e. the number of hidden neurons, also called the output dimension (because the RNN's output is the hidden state at each time step).

num_layers: "Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1", i.e. the number of layers in the network.

nonlinearity: "The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'", i.e. the activation function.

bias: "If False, then the layer does not use bias weights b_ih and b_hh. Default: True", i.e. whether to use bias terms.

batch_first: "If True, then the input and output tensors are provided as (batch, seq, feature). Default: False", i.e. the layout of the input data. With the default False, the layout is (seq(num_step), batch, input_dim), with the sequence length first and the batch second. Setting it to True puts the batch dimension first.

dropout: "If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0", i.e. whether to apply dropout. It is off by default; to use it, set it to a number between 0 and 1.

bidirectional: "If True, becomes a bidirectional RNN. Default: False", i.e. whether to use a bidirectional RNN.
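
The effect of batch_first is easiest to see from the tensor shapes. This is a small sketch with arbitrary sizes (sequence length 3, batch of 4, 2 input features, 5 hidden units) chosen for illustration.

```python
import torch

seq_len, batch_size, input_size, hidden_size = 3, 4, 2, 5

rnn_default = torch.nn.RNN(input_size, hidden_size)                        # batch_first=False
rnn_batch_first = torch.nn.RNN(input_size, hidden_size, batch_first=True)

x_seq_first = torch.randn(seq_len, batch_size, input_size)     # (seq, batch, feature)
x_batch_first = x_seq_first.transpose(0, 1).contiguous()       # (batch, seq, feature)

out1, h1 = rnn_default(x_seq_first)
out2, h2 = rnn_batch_first(x_batch_first)

print(out1.shape)   # torch.Size([3, 4, 5])
print(out2.shape)   # torch.Size([4, 3, 5])
print(h1.shape, h2.shape)   # both torch.Size([1, 4, 5])
```

Note that batch_first only changes the layout of the input and output tensors. The final hidden state keeps the shape (num_layers, batch, hidden_size) in both cases.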

import torch

batch_size = 1
seq_len = 3
input_size = 2
hidden_size = 2
num_layers = 1
output_size = 2

cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
for name, param in cell.named_parameters():  # initialize parameters: weights to 1, biases to 0
    if name.startswith("weight"):
        torch.nn.init.ones_(param)
    else:
        torch.nn.init.zeros_(param)

# linear output layer
liner = torch.nn.Linear(hidden_size, output_size)
liner.weight.data = torch.Tensor([[1, 1], [1, 1]])
liner.bias.data = torch.zeros(output_size)  # one zero bias per output unit

inputs = torch.Tensor([[[1, 1]],
                       [[1, 1]],
                       [[2, 2]]])
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)

print('Input :', inputs[0])
print('hidden:', 0, 0)
print('Output:', liner(out[0]))
print('--------------------------------------')
print('Input :', inputs[1])
print('hidden:', out[0])
print('Output:', liner(out[1]))
print('--------------------------------------')
print('Input :', inputs[2])
print('hidden:', out[1])
print('Output:', liner(out[2]))

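Summary:

1. nn.RNNCell is the single-layer, single-time-step recurrent unit underlying nn.RNN. Wrapping nn.RNNCell in a for loop over the sequence gives the same final result as nn.RNN.

The parameters also show the difference: nn.RNN takes the whole sequence at once, with input shape (seq_len, batch_size, input_size), whereas nn.RNNCell takes one time step at a time, with input shape (batch_size, input_size). The number of loop iterations needed with nn.RNNCell is therefore seq_len, the sequence length, not batch_size.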
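
The relationship between the two modules can be checked numerically. The sketch below copies the layer-0 parameters of an nn.RNN into an nn.RNNCell (via the `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0` attributes) and compares a manual loop with a single nn.RNN call; the seed and sizes are arbitrary.

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, batch_size = 2, 2, 1

rnn = torch.nn.RNN(input_size, hidden_size, num_layers=1)
cell = torch.nn.RNNCell(input_size, hidden_size)

with torch.no_grad():
    # Give the cell the same parameters as layer 0 of the RNN
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

    inputs = torch.tensor([[[1., 1.]], [[1., 1.]], [[2., 2.]]])  # (seq_len, batch, input)
    out, h_n = rnn(inputs, torch.zeros(1, batch_size, hidden_size))

    h = torch.zeros(batch_size, hidden_size)
    steps = []
    for x_t in inputs:            # one iteration per time step
        h = cell(x_t, h)
        steps.append(h)
    looped = torch.stack(steps)   # (seq_len, batch, hidden)

print(torch.allclose(out, looped, atol=1e-6))
```

The loop over the cell reproduces every row of `out`, and its last hidden state matches `h_n[0]`.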

5. Implementing the "Character-Level Language Models" source code (required)

import torch

input_size = 4
hidden_size = 4  # one score per character: the hidden state is used directly as the class logits
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
# input "hell" encoded as indices: 1 0 2 2
x_data = [1, 0, 2, 2]
# target "ello" (the input shifted by one character): 0 2 2 3
y_data = [0, 2, 2, 3]
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)  # (seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)  # (seq_len, 1)


class RNN_Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(RNN_Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.run = torch.nn.RNNCell(input_size=self.input_size, hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.run(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)


net = RNN_Model(input_size=input_size, hidden_size=hidden_size, batch_size=batch_size)

Loss = torch.nn.CrossEntropyLoss()
optim = torch.optim.Adam(net.parameters(), lr=0.1)
for epoch in range(100):
    loss = 0
    optim.zero_grad()
    hidden = net.init_hidden()
    for input, label in zip(inputs, labels):
        _, idx = input.max(dim=1)
        print('previous', idx2char[idx.item()], end=' ')
        hidden = net(input, hidden)
        loss += Loss(hidden, label)  # accumulate the loss over all time steps
        _, idx = hidden.max(dim=1)
        print('after', idx2char[idx.item()])
    print('\n')
    loss.backward()
    optim.step()
Original text, with my reading of each paragraph:
Okay, so we have an idea about what RNNs are, why they are super exciting, and how they work. We’ll now ground this in a fun application: We’ll train RNN character-level language models. That is, we’ll give the RNN a huge chunk of text and ask it to model the probability distribution of the next character in the sequence given a sequence of previous characters. This will then allow us to generate new text one character at a time.
In short: the RNN is trained on a large body of text to predict the probability distribution of the next character given the preceding characters, so that new text can then be generated one character at a time.
As a working example, suppose we only had a vocabulary of four possible letters “helo”, and wanted to train an RNN on the training sequence “hello”. This training sequence is in fact a source of 4 separate training examples: 1. The probability of “e” should be likely given the context of “h”, 2. “l” should be likely in the context of “he”, 3. “l” should also be likely given the context of “hel”, and finally 4. “o” should be likely given the context of “hell”.
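In short: with the four-letter vocabulary "helo" and the training sequence "hello", we get four separate training examples: "h" should predict "e", "he" should predict "l", "hel" should predict "l", and "hell" should predict "o".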
Concretely, we will encode each character into a vector using 1-of-k encoding (i.e. all zero except for a single one at the index of the character in the vocabulary), and feed them into the RNN one at a time with the step function. We will then observe a sequence of 4-dimensional output vectors (one dimension per character), which we interpret as the confidence the RNN currently assigns to each character coming next in the sequence.
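In short: each character is one-hot encoded (all zeros except a single one) and fed to the RNN one step at a time. At each step the RNN outputs a 4-dimensional vector, one dimension per character, which is read as its confidence in each candidate for the next character, as shown in the figure below.
[Figure]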
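
The four training examples and the 1-of-k encoding described in the passage can be built in a few lines. This is a minimal sketch that assumes the vocabulary order h, e, l, o (which differs from the `idx2char` order used in the code above).

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']                  # the 4-letter vocabulary "helo"
char2idx = {c: i for i, c in enumerate(vocab)}
text = 'hello'

# (context, next character) pairs: the 4 training examples
pairs = [(text[:i], text[i]) for i in range(1, len(text))]
print(pairs)   # [('h', 'e'), ('he', 'l'), ('hel', 'l'), ('hell', 'o')]

# 1-of-k (one-hot) encoding of the input characters "hell"
input_ids = [char2idx[c] for c in text[:-1]]
one_hot = np.eye(len(vocab))[input_ids]       # shape (4, 4), one row per time step

# Target index at each step: the next character "ello"
target_ids = [char2idx[c] for c in text[1:]]
print(one_hot)
print(target_ids)   # [1, 2, 2, 3]
```

The RNN sees only the current one-hot row at each step. The earlier characters of the context reach it through the hidden state.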

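7. A simple "encoder-decoder" implementation (required)

How it works

Given the data to encode, we train an RNN-based model so that it infers the decoded data from the encoded data. This is a sequential task, in which each output depends on what came before, so it can be trained with an RNN.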
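
Before the full program, it may help to see what the three sequences look like for a single word pair. The sketch below mirrors the padding and the S/E markers used in the code that follows; the helper name `pad` is made up for illustration, and feeding the decoder the target shifted by one position is commonly called teacher forcing.

```python
letters = list('SE?abcdefghijklmnopqrstuvwxyz')
letter2idx = {c: i for i, c in enumerate(letters)}
n_step = 5                                    # length of the longest word in the data

def pad(word):
    """Right-pad a word with '?' up to n_step characters."""
    return word + '?' * (n_step - len(word))

src, tgt = pad('man'), pad('women')
enc_input = src + 'E'      # what the encoder reads
dec_input = 'S' + tgt      # what the decoder is fed (target shifted right by one)
dec_output = tgt + 'E'     # what the decoder should predict at each step

print(enc_input, dec_input, dec_output)        # man??E Swomen womenE
print([letter2idx[c] for c in enc_input])      # [15, 3, 16, 2, 2, 1]
```

At step i the decoder receives character i of `dec_input` and is trained to output character i of `dec_output`, starting from the encoder's final hidden state.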

import torch
import numpy as np
import torch.nn as nn
import torch.utils.data as Data

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# S: Symbol that marks the start of the decoder input
# E: Symbol that marks the end of the encoder input and of the decoder output
# ?: Symbol used to pad sequences that are shorter than n_step

letter = [c for c in 'SE?abcdefghijklmnopqrstuvwxyz']
letter2idx = {n: i for i, n in enumerate(letter)}

seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]

# Seq2Seq Parameter
n_step = max([max(len(i), len(j)) for i, j in seq_data]) # max_len(=5)
n_hidden = 128
n_class = len(letter2idx) # classification problem
batch_size = 3

def make_data(seq_data):
    enc_input_all, dec_input_all, dec_output_all = [], [], []

    for seq in seq_data:
        for i in range(2):
            seq[i] = seq[i] + '?' * (n_step - len(seq[i])) # 'man??', 'women'

        enc_input = [letter2idx[n] for n in (seq[0] + 'E')] # ['m', 'a', 'n', '?', '?', 'E']
        dec_input = [letter2idx[n] for n in ('S' + seq[1])] # ['S', 'w', 'o', 'm', 'e', 'n']
        dec_output = [letter2idx[n] for n in (seq[1] + 'E')] # ['w', 'o', 'm', 'e', 'n', 'E']

        enc_input_all.append(np.eye(n_class)[enc_input])
        dec_input_all.append(np.eye(n_class)[dec_input])
        dec_output_all.append(dec_output) # not one-hot

    # make tensor
    return torch.Tensor(enc_input_all), torch.Tensor(dec_input_all), torch.LongTensor(dec_output_all)

'''
enc_input_all: [6, n_step+1 (because of 'E'), n_class]
dec_input_all: [6, n_step+1 (because of 'S'), n_class]
dec_output_all: [6, n_step+1 (because of 'E')]
'''
enc_input_all, dec_input_all, dec_output_all = make_data(seq_data)

class TranslateDataSet(Data.Dataset):
    def __init__(self, enc_input_all, dec_input_all, dec_output_all):
        self.enc_input_all = enc_input_all
        self.dec_input_all = dec_input_all
        self.dec_output_all = dec_output_all
    
    def __len__(self): # return dataset size
        return len(self.enc_input_all)
    
    def __getitem__(self, idx):
        return self.enc_input_all[idx], self.dec_input_all[idx], self.dec_output_all[idx]

loader = Data.DataLoader(TranslateDataSet(enc_input_all, dec_input_all, dec_output_all), batch_size, True)

# Model
class Seq2Seq(nn.Module):
    def __init__(self):
        super(Seq2Seq, self).__init__()
        self.encoder = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5) # encoder
        self.decoder = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5) # decoder
        self.fc = nn.Linear(n_hidden, n_class)

    def forward(self, enc_input, enc_hidden, dec_input):
        # enc_input(=input_batch): [batch_size, n_step+1, n_class]
        # dec_input(=output_batch): [batch_size, n_step+1, n_class]
        enc_input = enc_input.transpose(0, 1) # enc_input: [n_step+1, batch_size, n_class]
        dec_input = dec_input.transpose(0, 1) # dec_input: [n_step+1, batch_size, n_class]

        # h_t : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        _, h_t = self.encoder(enc_input, enc_hidden)
        # outputs : [n_step+1, batch_size, num_directions(=1) * n_hidden(=128)]
        outputs, _ = self.decoder(dec_input, h_t)

        model = self.fc(outputs) # model : [n_step+1, batch_size, n_class]
        return model

model = Seq2Seq().to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5000):
  for enc_input_batch, dec_input_batch, dec_output_batch in loader:
      # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
      h_0 = torch.zeros(1, batch_size, n_hidden).to(device)

      (enc_input_batch, dec_input_batch, dec_output_batch) = (enc_input_batch.to(device), dec_input_batch.to(device), dec_output_batch.to(device))
      # enc_input_batch : [batch_size, n_step+1, n_class]
      # dec_input_batch : [batch_size, n_step+1, n_class]
      # dec_output_batch : [batch_size, n_step+1], not one-hot
      pred = model(enc_input_batch, h_0, dec_input_batch)
      # pred : [n_step+1, batch_size, n_class]
      pred = pred.transpose(0, 1) # [batch_size, n_step+1(=6), n_class]
      loss = 0
      for i in range(len(dec_output_batch)):
          # pred[i] : [n_step+1, n_class]
          # dec_output_batch[i] : [n_step+1]
          loss += criterion(pred[i], dec_output_batch[i])
      if (epoch + 1) % 1000 == 0:
          print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))
          
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
    
# Test
def translate(word):
    enc_input, dec_input, _ = make_data([[word, '?' * n_step]])
    enc_input, dec_input = enc_input.to(device), dec_input.to(device)
    # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
    hidden = torch.zeros(1, 1, n_hidden).to(device)
    output = model(enc_input, hidden, dec_input)
    # output : [n_step+1, batch_size, n_class]

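    predict = output.data.max(2, keepdim=True)[1] # select n_class dimension
    decoded = [letter[i] for i in predict]
    # cut at the first 'E'; if the model never emits 'E', keep the whole output
    end = decoded.index('E') if 'E' in decoded else len(decoded)
    translated = ''.join(decoded[:end])

    return translated.replace('?', '')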

print('test')
print('man ->', translate('man'))
print('mans ->', translate('mans'))
print('king ->', translate('king'))
print('black ->', translate('black'))
print('up ->', translate('up'))

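Summary

This assignment required reading quite a lot of English material, which was good practice for my English; I searched and worked at the same time. I will add the optional parts later and hope the teacher will take another look; for today I am submitting only the required parts. In practice, the assignment consolidated my understanding of recurrent neural networks and let me see how an RNN handles sequential data: it infers what comes next from what came before. Human language works in a similar way: the earlier information is needed to understand what follows, and a conversation joined in the middle is hard to follow. That is probably why RNNs are used for NLP. The assignment mainly consisted of applying the RNN modules in torch.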
