PyTorch Deep Learning Practice Lecture 12: Basic RNN


Author :Horizon Max

Programming tips: a summary of handy operations

Computer vision: the magic of OpenCV

Deep learning: a gentle introduction to PyTorch

Neural networks: classic network models

Algorithms: no matter how busy, don't forget LeetCode


Video link: Lecture 12 Basic_RNN
Course materials:

Slides: https://pan.baidu.com/s/1vZ27gKp8Pl-qICn_p2PaSw
Extraction code: cxe4

Table of Contents

  • Basic_RNN
    • Overview
      • RNN (Recurrent Neural Network)
      • RNN Cell
      • Building an RNN Cell with PyTorch
      • Building an RNN with PyTorch
      • RNN Cell Code
      • RNN Cell Results
      • RNN Code
      • RNN Results
      • Summary
      • Using Embedding
      • Embedding Result
    • Appendix: Related Documentation

Basic_RNN

Overview

RNN (Recurrent Neural Network)

An RNN is designed specifically for sequential data, in which earlier inputs are correlated with later ones,

for example weather forecasting, stock-market prediction, and natural language processing (NLP).

e.g. a sequence with temporal order: 我 | 爱 | 中国 ("I | love | China")

In the figure below, the left side is the compact representation of an RNN and the right side is its unrolled form: the same RNN Cell is reused at every time step.

In short, the output of one step is fed in as input to the next step; xt is the input at step t and ht is the corresponding hidden output:

h1 = RNNCell(x1, h0); h2 = RNNCell(x2, h1); h3 = RNNCell(x3, h2) …

If prior knowledge is available, h0 is initialized from it; otherwise h0 is an all-zero tensor with the same dimensions as the other hidden states.
[Figure 1: an RNN in compact form (left) and in unrolled form (right)]
An RNN consists of an input layer, a hidden layer, and an output layer; the hidden layer is the RNN Cell described below.

RNN Cell

At its core, an RNN Cell is a linear transformation (i.e. a Linear layer).
[Figure 2: the RNN Cell viewed as a linear layer]
xt and the previous hidden state ht-1 are fed into the RNN Cell; each goes through a linear transform (w·x + b), the results are summed, and an activation function (usually tanh) produces the new hidden state ht (pay attention to the dimensions of each parameter).

tanh keeps its output in the range (-1, +1); the function curve is shown below:

[Figure 3: the tanh activation function]
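As a quick sanity check, the following minimal sketch (sizes are illustrative) verifies that torch.nn.RNNCell computes exactly this linear transform followed by tanh, using the cell's own weight_ih, weight_hh, bias_ih and bias_hh parameters:

import torch

# A minimal sketch (illustrative sizes): RNNCell = linear transform + tanh
cell = torch.nn.RNNCell(input_size=4, hidden_size=2)

x = torch.randn(1, 4)      # x_t:     (batch_size, input_size)
h = torch.zeros(1, 2)      # h_{t-1}: (batch_size, hidden_size)

# h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh)
manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                    + h @ cell.weight_hh.T + cell.bias_hh)

print(torch.allclose(cell(x, h), manual))    # True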

Building an RNN Cell with PyTorch

The main parameters to pay attention to:

Input dimension: input_size
Hidden dimension: hidden_size
Batch size: batch_size
Sequence length: seq_len

[Figure 4: shapes of the RNNCell inputs and outputs]

Code :

# Here is the code :

import torch

batch_size = 1     # batch size
seq_len = 3        # sequence length
input_size = 4     # input dimension
hidden_size = 2    # hidden dimension

# Construction of RNNCell
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# (seq, batch, features)
dataset = torch.randn(seq_len, batch_size, input_size)    # Wrapping the sequence
hidden = torch.zeros(batch_size, hidden_size)    # Initializing the hidden state to zeros (this is h0)

# Loop over the sequence and show each time step
for idx, data in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('Input size:', data.shape, data)

    hidden = cell(data, hidden)

    print('hidden size:', hidden.shape, hidden)
    print(hidden)

Result:

==================== 0 ====================
Input size: torch.Size([1, 4]) tensor([[ 1.2552, -2.0234, -1.4022,  0.3870]])
hidden size: torch.Size([1, 2]) tensor([[-0.9857, -0.6875]], grad_fn=<TanhBackward>)
tensor([[-0.9857, -0.6875]], grad_fn=<TanhBackward>)
==================== 1 ====================
Input size: torch.Size([1, 4]) tensor([[-0.3590,  1.2850, -0.3742,  0.4842]])
hidden size: torch.Size([1, 2]) tensor([[0.0437, 0.3355]], grad_fn=<TanhBackward>)
tensor([[0.0437, 0.3355]], grad_fn=<TanhBackward>)
==================== 2 ====================
Input size: torch.Size([1, 4]) tensor([[-1.2730, -0.5386, -2.0886, -0.2790]])
hidden size: torch.Size([1, 2]) tensor([[-0.0229,  0.0303]], grad_fn=<TanhBackward>)
tensor([[-0.0229,  0.0303]], grad_fn=<TanhBackward>)

Building an RNN with PyTorch

The main parameters to pay attention to:

Input dimension: input_size
Hidden dimension: hidden_size
Batch size: batch_size
Sequence length: seq_len
Number of layers: num_layers

[Figures 5-6: shapes of the nn.RNN inputs and outputs]
What num_layers means (the figure below shows a stack of 3 layers):
[Figure 7: an RNN with several stacked layers]
This also explains why the shape of the hidden input contains a num_layers dimension.
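A quick shape check (a minimal sketch with illustrative sizes; num_layers=3 as in the figure) makes this concrete: out contains the top layer's hidden state at every time step, while hidden contains the last time step's hidden state of every layer:

import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=3)

inputs = torch.randn(5, 1, 4)     # (seq_len, batch_size, input_size)
h0 = torch.zeros(3, 1, 2)         # (num_layers, batch_size, hidden_size)

out, hidden = rnn(inputs, h0)
print(out.shape)       # torch.Size([5, 1, 2])  -- top layer, every time step
print(hidden.shape)    # torch.Size([3, 1, 2])  -- every layer, last time step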

Code :

# Here is the code :

import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

# (seqLen, batchSize, inputSize)
inputs = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)

out, hidden = cell(inputs, hidden)

print('Output size:', out.shape)        # (seq_len, batch_size, hidden_size)
print('Output:', out)
print('Hidden size:', hidden.shape)     # (num_layers, batch_size, hidden_size)
print('Hidden:', hidden)

Result:

Output size: torch.Size([3, 1, 2])
Output: tensor([[[ 0.9840,  0.1540]],

        [[ 0.1060, -0.9769]],

        [[-0.3047, -0.6314]]], grad_fn=<StackBackward>)
Hidden size: torch.Size([1, 1, 2])
Hidden: tensor([[[-0.3047, -0.6314]]], grad_fn=<StackBackward>)
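Notice that Hidden equals the last time step of Output (both are [-0.3047, -0.6314] here): out collects the hidden state at every step, while hidden is only the final one.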

RNN Cell Code

The task is a sequence-to-sequence conversion: "hello" -> "ohlol"
[Figure 8: the "hello" -> "ohlol" training task]

Characters are not vectors and cannot be fed into the model directly, so the first step is to vectorize them (here with one-hot vectors):

[Figure 9: one-hot encoding of the characters]
The model then takes the following form; cross-entropy loss is used here:
[Figure 10: the RNNCell model trained with cross-entropy loss]

# Here is the code :

import torch

input_size = 4
hidden_size = 4
batch_size = 1


# Prepare Data

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]        # indices of the characters in 'hello'
y_data = [3, 1, 2, 3, 2]        # indices of the characters in 'ohlol'
# x_one_hot below uses these indices to pick the corresponding rows of one_hot_lookup

one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]    # the input x now has shape (seqLen, inputSize)

inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)   # (seqLen, batchSize, inputSize)
labels = torch.LongTensor(y_data).view(-1, 1)    # (seqLen, 1)
# torch.Tensor defaults to torch.FloatTensor (32-bit floats); torch.LongTensor holds 64-bit integers

print('- ' * 20)
print(inputs.shape, labels.shape)
print('- ' * 20)


# Design Model

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size, hidden_size=self.hidden_size)

    def forward(self, inputs, hidden):
        hidden = self.rnncell(inputs, hidden)   # (batch_size, hidden_size)
        return hidden

    def init_hidden(self):     # create the initial hidden state h0
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)


# Loss and Optimizer

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)


# Training Cycle

epochs = 15

for epoch in range(epochs):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string:', end='')
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        loss += criterion(hidden, label)  # accumulate each step's loss into one computational graph; do not call item() here
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))

RNN Cell Results

- - - - - - - - - - - - - - - - - - - - 
torch.Size([5, 1, 4]) torch.Size([5, 1])
- - - - - - - - - - - - - - - - - - - - 

Predicted string : lllll, Epoch [1/15] loss=6.9243
Predicted string : lllll, Epoch [2/15] loss=5.8079
Predicted string : lllll, Epoch [3/15] loss=4.9950
Predicted string : ohlll, Epoch [4/15] loss=4.3414
Predicted string : ohlol, Epoch [5/15] loss=3.8480
Predicted string : ohlol, Epoch [6/15] loss=3.4899
Predicted string : ohlol, Epoch [7/15] loss=3.2105
Predicted string : ohlol, Epoch [8/15] loss=2.9713
Predicted string : ohlol, Epoch [9/15] loss=2.7506
Predicted string : ohlol, Epoch [10/15] loss=2.5562
Predicted string : ohlol, Epoch [11/15] loss=2.4172
Predicted string : ohlol, Epoch [12/15] loss=2.2963
Predicted string : ohlol, Epoch [13/15] loss=2.1816
Predicted string : ohlol, Epoch [14/15] loss=2.0963
Predicted string : ohlol, Epoch [15/15] loss=2.0449

RNN Code

# Here is the code :

import torch

input_size = 4
hidden_size = 4
batch_size = 1
seq_len = 5
num_layers = 1


# Prepare Data

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]        # indices of the characters in 'hello'
y_data = [3, 1, 2, 3, 2]        # indices of the characters in 'ohlol'
# x_one_hot below uses these indices to pick the corresponding rows of one_hot_lookup

one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]    # the input x now has shape (seqLen, inputSize)

inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)   # (seqLen, batchSize, inputSize)
labels = torch.LongTensor(y_data)    # (seqLen * batchSize)

print('- ' * 20)
print(inputs.shape, labels.shape)
print('- ' * 20)


# Design Model

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=self.input_size, hidden_size=self.hidden_size, num_layers=self.num_layers)

    def forward(self, inputs):
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)
        out, _ = self.rnn(inputs, hidden)     # (seqLen, batch_size, hidden_size)
        return out.view(-1, self.hidden_size) # reshape to (seqLen * batchSize, hiddenSize) so cross-entropy can be applied

net = Model(input_size, hidden_size, batch_size)


# Loss and Optimizer

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)


# Training Cycle

epochs = 15

for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted : ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))

RNN Results

- - - - - - - - - - - - - - - - - - - - 
torch.Size([5, 1, 4]) torch.Size([5])
- - - - - - - - - - - - - - - - - - - - 

Predicted :  oolll, Epoch [1/15] loss = 1.261
Predicted :  oolll, Epoch [2/15] loss = 1.060
Predicted :  oooll, Epoch [3/15] loss = 0.947
Predicted :  oholl, Epoch [4/15] loss = 0.873
Predicted :  oholl, Epoch [5/15] loss = 0.800
Predicted :  ohlll, Epoch [6/15] loss = 0.727
Predicted :  ohlll, Epoch [7/15] loss = 0.673
Predicted :  ohlll, Epoch [8/15] loss = 0.622
Predicted :  ohlol, Epoch [9/15] loss = 0.575
Predicted :  ohlol, Epoch [10/15] loss = 0.544
Predicted :  ohlol, Epoch [11/15] loss = 0.531
Predicted :  ohlol, Epoch [12/15] loss = 0.523
Predicted :  ohlol, Epoch [13/15] loss = 0.502
Predicted :  ohlol, Epoch [14/15] loss = 0.481
Predicted :  ohlol, Epoch [15/15] loss = 0.471

Summary

So far all of the inputs have been one-hot vectors, which suffer from several problems:

1. The dimensionality explodes as the number of distinct inputs grows;
2. The vectors are extremely sparse;
3. They are hard-coded rather than learned.

The remedy:

embedding: map the high-dimensional, sparse inputs into a lower-dimensional dense space while preserving semantic relationships.

[Figures 11-12: one-hot vectors mapped into a dense, low-dimensional embedding space]
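A minimal sketch (illustrative sizes) of what torch.nn.Embedding does: it is a learnable lookup table that maps each character index directly to a dense vector, so no one-hot encoding is needed:

import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)   # 4 characters -> 10-dim vectors

x = torch.LongTensor([[1, 0, 2, 2, 3]])    # character indices, shape (batchSize, seqLen)
print(emb(x).shape)                        # torch.Size([1, 5, 10])
print(emb.weight.shape)                    # torch.Size([4, 10]) -- the learnable lookup table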

Using Embedding

[Figure 13: the model with an embedding layer, an RNN, and a linear output layer]
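Note that, unlike the earlier examples, this model passes batch_first=True to nn.RNN, so the input and output tensors are laid out as (batchSize, seqLen, …) instead of (seqLen, batchSize, …); the input indices are therefore prepared with shape (batchSize, seqLen).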

# Here is the code :

import torch

# parameters
num_class = 4 
input_size = 4 
hidden_size = 8 
embedding_size = 10 
num_layers = 2 
batch_size = 1 
seq_len = 5

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)               # (batch, seqLen, embeddingSize)
        x, _ = self.rnn(x, hidden)    # (batchSize, seqLen, hiddenSize)
        x = self.fc(x)                # (batchSize, seqLen, numClass)
        return x.view(-1, num_class)   # reshape to (batchSize * seqLen, numClass) so cross-entropy can be applied
        
net = Model()

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)


idx2char = ['e', 'h', 'l', 'o'] 
x_data = [[1, 0, 2, 2, 3]]  # (batch, seq_len) 
y_data = [3, 1, 2, 3, 2]    # (batch * seq_len)

inputs = torch.LongTensor(x_data)   # Input should be LongTensor: (batchSize, seqLen)
labels = torch.LongTensor(y_data)   # Target should be LongTensor: (batchSize * seqLen)

epochs = 15

for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = net(inputs) 
    loss = criterion(outputs, labels) 
    loss.backward() 
    optimizer.step()

    _, idx = outputs.max(dim=1) 
    idx = idx.data.numpy() 
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='') 
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))

Embedding Result

Predicted:  ooohh, Epoch [1/15] loss = 1.337
Predicted:  ohlol, Epoch [2/15] loss = 1.078
Predicted:  ohlol, Epoch [3/15] loss = 0.819
Predicted:  ohlol, Epoch [4/15] loss = 0.591
Predicted:  ohlol, Epoch [5/15] loss = 0.419
Predicted:  ohlol, Epoch [6/15] loss = 0.289
Predicted:  ohlol, Epoch [7/15] loss = 0.196
Predicted:  ohlol, Epoch [8/15] loss = 0.132
Predicted:  ohlol, Epoch [9/15] loss = 0.089
Predicted:  ohlol, Epoch [10/15] loss = 0.060
Predicted:  ohlol, Epoch [11/15] loss = 0.042
Predicted:  ohlol, Epoch [12/15] loss = 0.029
Predicted:  ohlol, Epoch [13/15] loss = 0.021
Predicted:  ohlol, Epoch [14/15] loss = 0.016
Predicted:  ohlol, Epoch [15/15] loss = 0.012

Notice that with the embedding layer the target string "ohlol" appears much earlier in training (already at epoch 2) and the loss drops considerably faster.



Appendix: Related Documentation

PyTorch official documentation: PyTorch Documentation
PyTorch Chinese handbook: PyTorch Handbook


Links to the other posts in the 《PyTorch深度学习实践》 (PyTorch Deep Learning Practice) series:

  Lecture01 Overview
  Lecture02 Linear_Model
  Lecture03 Gradient_Descent
  Lecture04 Back_Propagation
  Lecture05 Linear_Regression_with_PyTorch
  Lecture06 Logistic_Regression
  Lecture07 Multiple_Dimension_Input
  Lecture08 Dataset_and_Dataloader
  Lecture09 Softmax_Classifier
  Lecture10 Basic_CNN
  Lecture11 Advanced_CNN
  Lecture12 Basic_RNN
  Lecture13 RNN_Classifier
