[NLP] Teach an RNN to output 'ihello' when 'hihell' is entered

0. Statement

This post summarizes what I learned about RNNs, embeddings, and PyTorch while working through a small RNN experiment. If you spot any mistakes, corrections are welcome.

1. Start

  1. The following figure gives an overview of the entire neural network; some details are omitted.
    [Figure 1]
  2. The following figure shows the problem to be solved in this experiment: given an input sequence, the network should output the correct target sequence.
    [Figure 2]

2. Dataset

  1. Build a character-frequency dictionary (char_freq).
  2. Build an index-to-character list (idx2char).
  3. Build a character-to-index dictionary (char2idx).
from collections import Counter

# Raw text
input_text = "hihell"
output_text = "ihello"

# Vocabulary
char_freq = Counter(input_text + output_text)
print("char_freq:", char_freq)  # Counter({'l': 4, 'h': 3, 'i': 2, 'e': 2, 'o': 1})
print("char_freq.keys():", char_freq.keys())  # dict_keys(['h', 'i', 'e', 'l', 'o'])
idx2char = list(char_freq.keys())
print("idx2char:", idx2char)  # ['h', 'i', 'e', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}

# Index sequences for the input and the target
input_idx = [char2idx[c] for c in input_text]
print("input_idx:", input_idx)  # [0, 1, 0, 2, 3, 3]
output_idx = [char2idx[c] for c in output_text]
print("output_idx:", output_idx)  # [1, 0, 2, 3, 3, 4]
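As a quick sanity check on the two mappings, the sketch below encodes a string to indices and decodes it back. The helper names `encode` and `decode` are hypothetical and not part of the original code; the vocabulary is rebuilt the same way as above.

```python
from collections import Counter

input_text = "hihell"
output_text = "ihello"

# Rebuild the vocabulary in first-seen order, as in the snippet above.
idx2char = list(Counter(input_text + output_text).keys())
char2idx = {c: i for i, c in enumerate(idx2char)}

def encode(text):
    """Hypothetical helper: map each character to its index."""
    return [char2idx[c] for c in text]

def decode(indices):
    """Hypothetical helper: map indices back to a string."""
    return "".join(idx2char[i] for i in indices)

print(encode("hihell"))            # [0, 1, 0, 2, 3, 3]
print(decode([1, 0, 2, 3, 3, 4]))  # ihello
```

Any character outside the five-letter vocabulary would raise a KeyError in `encode`, which is acceptable for this toy dataset.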

3. Overall Architecture

Visualization of the RNN:
[Figure 3]
From here

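import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering sub-modules
        self.model0 = nn.Sequential(
            nn.Embedding(num_embeddings=6, embedding_dim=8),
            nn.RNN(input_size=8, hidden_size=5),
        )
        self.model1 = nn.Sequential(
            # Flatten layer from section 4.2: (6, 1, 5) -> (6, 5).
            # Its exact position is an assumption; the listing does not show it.
            nn.Flatten(),
            nn.Linear(in_features=5, out_features=5),
        )

    def forward(self, x):
        x = self.model0(x)     # nn.RNN returns a tuple (output, h_n)
        x = self.model1(x[0])  # keep only the per-step output
        return x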

[Figure 4]
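nn.Sequential hides the fact that nn.RNN returns a tuple (output, h_n), which is why forward indexes x[0]. The sketch below spells the same three layers out by name to make that visible. The class name `ExplicitNet` is hypothetical, and the sketch deliberately leaves out any flattening, so the result keeps its batch dimension (the issue section 4.2 deals with).

```python
import torch
from torch import nn

class ExplicitNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=6, embedding_dim=8)
        self.rnn = nn.RNN(input_size=8, hidden_size=5)
        self.fc = nn.Linear(in_features=5, out_features=5)

    def forward(self, x):
        x = self.embedding(x)      # (seq_len, batch) -> (seq_len, batch, 8)
        output, h_n = self.rnn(x)  # output: (seq_len, batch, 5), h_n: (1, batch, 5)
        return self.fc(output)     # (seq_len, batch, 5)

net = ExplicitNet()
inputs = torch.tensor([0, 1, 0, 2, 3, 3]).reshape(6, 1)
logits = net(inputs)
print(logits.shape)  # torch.Size([6, 1, 5])
```

Naming the layers costs a few more lines than nn.Sequential but makes the tuple unpacking explicit.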

4. Details

4.1 Input shape

[Figure 5]

Forward function of the RNN:

[Figure 6]

⚠️: The RNN expects input of shape (seq_len, batch, input_size). The input here is 'hihell', so seq_len is 6. There is only one sequence, so batch is 1. The last dimension, input_size (the feature length), is determined by the embedding layer.

inputs = torch.tensor(input_idx).reshape(6, 1)
print(inputs.shape) # torch.Size([6, 1])
labels = torch.tensor(output_idx) # torch.Size([6])

[Figure 7]

import torch
import torch.nn as nn

# An embedding table with 6 entries; each entry is an 8-dimensional vector
embeds = nn.Embedding(num_embeddings=6, embedding_dim=8)
# The index of every character in the sequence
a = torch.tensor([0, 1, 0, 2, 3, 3]).reshape(6, 1)
print(embeds(a).size())  # torch.Size([6, 1, 8])
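To make the embedding step less opaque, the sketch below checks that an embedding lookup is just row selection from a weight matrix of shape (num_embeddings, embedding_dim). The seed and the variable names `looked_up` and `indexed` are illustrative choices, not part of the original code.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the random weights repeatable
embeds = nn.Embedding(num_embeddings=6, embedding_dim=8)
a = torch.tensor([0, 1, 0, 2, 3, 3]).reshape(6, 1)

looked_up = embeds(a)       # shape (6, 1, 8)
indexed = embeds.weight[a]  # the same rows, selected by plain indexing

print(embeds.weight.shape)              # torch.Size([6, 8])
print(torch.equal(looked_up, indexed))  # True
# Positions 0 and 2 both hold index 0 ('h'), so they receive the same vector.
print(torch.equal(looked_up[0], looked_up[2]))  # True
```

This is also why the last dimension of the RNN input is 8: it is the embedding_dim chosen here.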

4.2 Flatten Layer

Without a Flatten layer, the output has shape (6 * 1 * 5).

[Figure 8]

With a Flatten layer, the output has shape (6 * 5).

[Figure 9]

⚠️: What is the difference between (6 * 1 * 5) and (6 * 5)?

If you pass the (6 * 1 * 5) output to the loss function, you get an error.

So let's check the loss function (nn.CrossEntropyLoss) in the PyTorch documentation.

[Figure 10]
[Figure 11]
If the target is 1-dimensional, with shape (N), the input must be 2-dimensional, with shape (N, C). Here N = 6 (sequence positions) and C = 5 (characters), so the output must have shape (6 * 5).
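The shape rule can be checked directly. The sketch below uses random stand-in logits rather than the real network output, and catches both RuntimeError and ValueError because the exception type raised for the mismatch may differ between PyTorch versions (an assumption on my part).

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()
labels = torch.tensor([1, 0, 2, 3, 3, 4])  # shape (6,)

logits_3d = torch.randn(6, 1, 5)     # stand-in for the unflattened output
logits_2d = logits_3d.reshape(6, 5)  # the shape a Flatten layer would produce

loss = loss_fn(logits_2d, labels)    # works: input (N, C), target (N)
print(loss.shape)                    # torch.Size([])

try:
    # A 3-D input is read as (N, C, d1), which would need a target of shape (N, d1).
    loss_fn(logits_3d, labels)
    raised = False
except (RuntimeError, ValueError) as err:
    raised = True
    print("shape mismatch:", err)
```

With the 3-D input, PyTorch treats the middle dimension (size 1) as the class dimension, which is not what this experiment intends.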

5. Complete code and conclusion


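# MyNet.py
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering sub-modules
        self.model0 = nn.Sequential(
            nn.Embedding(num_embeddings=6, embedding_dim=8),
            nn.RNN(input_size=8, hidden_size=5),
        )
        self.model1 = nn.Sequential(
            # Flatten layer from section 4.2: (6, 1, 5) -> (6, 5).
            # Its exact position is an assumption; the listing does not show it.
            nn.Flatten(),
            nn.Linear(in_features=5, out_features=5),
        )

    def forward(self, x):
        x = self.model0(x)     # nn.RNN returns a tuple (output, h_n)
        x = self.model1(x[0])  # keep only the per-step output
        return x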


from collections import Counter
import torch
from torch import nn
from MyNet import Net

# Raw text
input_text = "hihell"
output_text = "ihello"

# Vocabulary
char_freq = Counter(input_text + output_text)
print("char_freq:", char_freq)  # Counter({'l': 4, 'h': 3, 'i': 2, 'e': 2, 'o': 1})
print("char_freq.keys():", char_freq.keys())  # dict_keys(['h', 'i', 'e', 'l', 'o'])
idx2char = list(char_freq.keys())
print("idx2char:", idx2char)  # ['h', 'i', 'e', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}

# Index sequences for the input and the target
input_idx = [char2idx[c] for c in input_text]
print("input_idx:", input_idx)  # [0, 1, 0, 2, 3, 3]
output_idx = [char2idx[c] for c in output_text]
print("output_idx:", output_idx)  # [1, 0, 2, 3, 3, 4]

# Training data
inputs = torch.tensor(input_idx).reshape(6, 1)
print(inputs.shape)  # torch.Size([6, 1])
labels = torch.tensor(output_idx)  # torch.Size([6])

# Create the network
net = Net()
# output = net(inputs)
# print(output)
# print(output.shape)

# Create the loss function
loss_fn = nn.CrossEntropyLoss()

# Optimizer
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

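pre_idx = []

for epoch in range(1000):

    # Start of the training step
    outputs = net(inputs)  # shape (6, 5)
    # print(outputs.shape)
    # print(output_idx)
    loss = loss_fn(outputs, labels)

    # The optimizer updates the model.
    # Assumed: the standard PyTorch update; these lines are missing from the listing.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Assumed: record the predicted index for each of the 6 positions.
    pre_idx.extend(outputs.argmax(dim=1).tolist())

# Print the characters predicted in the last iteration
for i in pre_idx[-6:]:
    print(idx2char[i], end="")
print()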

[Figure 12]
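Turning the network's output back into text comes down to an argmax over the class dimension followed by an idx2char lookup. The sketch below shows that step in isolation on hand-made logits, so it does not depend on a trained model; the helper name `logits_to_text` and the tensor `fake_logits` are hypothetical.

```python
import torch

idx2char = ['h', 'i', 'e', 'l', 'o']

def logits_to_text(logits):
    """Hypothetical helper: pick the highest-scoring character per position."""
    return "".join(idx2char[i] for i in logits.argmax(dim=1).tolist())

# Hand-made logits of shape (6, 5): one large score at the desired index of each row.
target = torch.tensor([1, 0, 2, 3, 3, 4])
fake_logits = torch.zeros(6, 5)
fake_logits[torch.arange(6), target] = 10.0

print(logits_to_text(fake_logits))  # ihello
```

After training, passing the real output of net(inputs) to the same helper should print the network's prediction for 'hihell'.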
