Seq2Seq (Sequence to Sequence) refers to a family of methods that take a given sequence and generate another sequence from it. It was first proposed in work such as "Sequence to Sequence Learning with Neural Networks" to handle the mismatch between input and output lengths in machine translation (for example, the Chinese and English versions of the same sentence usually differ in length). As NLP has developed, Seq2Seq has been widely applied to chatbots, automatic text summarization, image captioning, poetry generation, code completion, and many other tasks, and has become one of the most important ideas in NLP. Its basic structure is shown below:
As the figure shows, the most basic Seq2Seq model consists of three parts: an Encoder, a Decoder, and the intermediate state vector C that connects them. The Encoder learns from the input and encodes it into a fixed-size state vector c, which is then passed to the Decoder; the Decoder in turn learns from c and produces the output.
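To make this flow concrete before the full example below, here is a minimal sketch of the Encoder -> c -> Decoder idea in PyTorch (the sizes are toy values chosen only for illustration; the actual model used in this article is defined further down):

import torch
import torch.nn as nn
encoder = nn.RNN(input_size=8, hidden_size=16)   # toy sizes, illustration only
decoder = nn.RNN(input_size=8, hidden_size=16)
src = torch.randn(5, 1, 8)   # (seq_len, batch, input_size): the source sequence
tgt = torch.randn(6, 1, 8)   # the target-side inputs fed to the decoder
_, c = encoder(src)          # c: the encoder's final hidden state, i.e. the context vector
out, _ = decoder(tgt, c)     # the decoder starts from c and produces its outputs
print(out.shape)             # torch.Size([6, 1, 16])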
seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]
This article only demonstrates the basic principle of seq2seq, so the data is deliberately simple. The goal is that, after training, when the first word of a pair is fed in, the network computes and outputs the second word (in the spirit of translation).
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]
# S: prepended to a target word (decoder start symbol)
# E: appended to a target word (decoder end symbol)
# P: padding appended when a word is shorter than n_step
char_arr = [c for c in 'SEPabcdefghijklmnopqrstuvwxyz']
num_dict = {n:i for i, n in enumerate(char_arr)}
# n_class = 29, batch_size = 6
n_step = 5
n_hidden = 128
n_class = len(num_dict)
batch_size = len(seq_data)
def make_batch(seq_data):
    input_batch, output_batch, target_batch = [], [], []
    for seq in seq_data:
        for i in range(2):
            # Pad words shorter than n_step with 'P' so that every word has length 5
            seq[i] = seq[i] + 'P' * (n_step - len(seq[i]))
        input = [num_dict[n] for n in seq[0]]
        output = [num_dict[n] for n in ('S' + seq[1])]
        target = [num_dict[n] for n in (seq[1] + 'E')]
        # Convert input and output to one-hot vectors
        input_batch.append(np.eye(n_class)[input])
        output_batch.append(np.eye(n_class)[output])
        target_batch.append(target)
    # Wrap input_batch, output_batch and target_batch in Variables
    return Variable(torch.Tensor(input_batch)), Variable(torch.Tensor(output_batch)), Variable(torch.LongTensor(target_batch))
# input_batch:  (batch_size, seq_len, n_class) = (6, 5, 29)
# output_batch: (batch_size, seq_len, n_class) = (6, 6, 29), with 'S' prepended to each word
# target_batch: (batch_size, seq_len) = (6, 6), with 'E' appended to each word
input_batch, output_batch, target_batch = make_batch(seq_data)
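A quick sanity check of the shapes described above (assuming the code so far has been run):

print(input_batch.shape)    # torch.Size([6, 5, 29])
print(output_batch.shape)   # torch.Size([6, 6, 29])
print(target_batch.shape)   # torch.Size([6, 6])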
class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        # Note: RNN dropout only takes effect when num_layers > 1; with a single layer it merely triggers a warning.
        self.enc = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5)
        self.dec = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5)
        self.fc = nn.Linear(n_hidden, n_class)

    def forward(self, enc_input, enc_hidden, dec_input):
        # nn.RNN expects inputs of shape (seq_len, batch_size, n_class), so swap
        # axes 0 and 1 of input_batch and output_batch with transpose.
        enc_input = enc_input.transpose(0, 1)
        dec_input = dec_input.transpose(0, 1)
        _, enc_states = self.enc(enc_input, enc_hidden)
        outputs, _ = self.dec(dec_input, enc_states)
        pred = self.fc(outputs)
        return pred
Key point: pay attention to the tensor shapes flowing through the network (a shape trace follows). Then instantiate the model, define the loss and optimizer, and train:
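Roughly, the tensors in one forward pass over this data have the following shapes (a trace based on the sizes above, not part of the original code):

# enc_input:  (6, 5, 29) -> transpose(0, 1) -> (5, 6, 29)   (seq_len, batch_size, n_class)
# dec_input:  (6, 6, 29) -> transpose(0, 1) -> (6, 6, 29)   here seq_len happens to equal batch_size
# enc_states: (1, 6, 128)  the encoder's final hidden state, i.e. the context vector c
# outputs:    (6, 6, 128)  decoder outputs at every time step
# pred:       (6, 6, 29)   per-step scores over the 29 characters, shape (seq_len, batch_size, n_class)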
model = Seq2Seq()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(5000):
    # Initialize hidden with shape (num_layers * num_directions, batch_size, n_hidden);
    # a single-layer, unidirectional RNN is used, so the first dimension is 1.
    hidden = Variable(torch.zeros(1, batch_size, n_hidden))
    optimizer.zero_grad()
    # input_batch and hidden feed the encoder; output_batch feeds the decoder
    pred = model(input_batch, hidden, output_batch)
    # The output has shape (seq_len, batch_size, n_class); swap axes 0 and 1 to get
    # (batch_size, seq_len, n_class) before computing the loss.
    pred = pred.transpose(0, 1)
    loss = 0
    for i in range(len(seq_data)):
        loss += loss_fn(pred[i], target_batch[i])
    if (epoch + 1) % 1000 == 0:
        print('Epoch: %d Cost: %f' % (epoch + 1, loss))
    loss.backward()
    optimizer.step()
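As a side note, the Python loop that accumulates the loss sample by sample can be replaced with a single call by flattening the batch and time dimensions. The value differs by a constant factor (the loop sums per-sample means, while the flattened version averages over all positions), but the optimization behaves essentially the same. A sketch of the alternative:

# Alternative loss computation without the per-sample loop.
# pred has shape (batch_size, seq_len, n_class) after the transpose above.
loss = loss_fn(pred.reshape(-1, n_class), target_batch.reshape(-1))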
def translate(word):
    input_batch, output_batch, _ = make_batch([[word, 'P' * len(word)]])
    # hidden shape: (num_layers * num_directions, batch_size, n_hidden);
    # here batch_size is 1 because we translate one word at a time.
    hidden = Variable(torch.zeros(1, 1, n_hidden))
    # output shape: (seq_len + 1 (= 6), batch_size (= 1), n_class)
    output = model(input_batch, hidden, output_batch)
    predict = output.data.max(2, keepdim=True)[1]  # pick the best class along the n_class dimension
    decoded = [char_arr[i] for i in predict]
    end = decoded.index('E')
    translated = ''.join(decoded[:end])
    return translated.replace('P', '')
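Note that translate feeds the decoder the whole 'S' + padding sequence in one shot, so the decoder never sees its own previous prediction. A genuinely autoregressive greedy decoder would feed each predicted character back in step by step; below is a sketch built on the same trained model (translate_greedy is a hypothetical helper, not part of the original code):

def translate_greedy(word, max_len=n_step + 1):
    # Encode the padded input word once to obtain the context vector.
    word = word + 'P' * (n_step - len(word))
    enc_input = Variable(torch.Tensor([np.eye(n_class)[[num_dict[c] for c in word]]]))
    enc_input = enc_input.transpose(0, 1)                     # (seq_len, 1, n_class)
    hidden = Variable(torch.zeros(1, 1, n_hidden))
    _, state = model.enc(enc_input, hidden)                   # context vector c
    # Decode one character at a time, feeding the prediction back in.
    char, result = 'S', ''
    for _ in range(max_len):
        dec_input = Variable(torch.Tensor([[np.eye(n_class)[num_dict[char]]]]))  # (1, 1, n_class)
        output, state = model.dec(dec_input, state)
        char = char_arr[model.fc(output).argmax(2).item()]
        if char == 'E':
            break
        result += char
    return result.replace('P', '')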
print('man ->', translate('man'))
print('men ->', translate('men'))
print('king ->', translate('king'))
print('higher ->', translate('higher'))
Test results:
As you can see, even when men and higher are used as inputs, the model still produces the correct "translation".
Reference links
https://blog.csdn.net/qq_32241189/article/details/81591456
https://github.com/graykode/nlp-tutorial