PyTorch Tutorial
The PyTorch tutorial includes an exercise on training word embeddings, described as follows:
The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typically, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as pretraining embeddings. It almost always helps performance by a couple of percent.
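The CBOW model is as follows. Given a target word $w_i$ and a context window of $N$ words on each side, $w_{i-1}, \dots, w_{i-N}$ and $w_{i+1}, \dots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize
$$-\log p(w_i \mid C) = -\log \text{Softmax}\left(A \left(\sum_{w \in C} q_w\right) + b\right)$$
where $q_w$ is the embedding for word $w$, and $A$ and $b$ are the weight and bias of a linear layer.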
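As a rough illustration of this objective, the sketch below evaluates it once on made-up values. The names `q`, `A`, `b`, `context_ids`, and `target_id`, along with the vocabulary size of 10 and the embedding dimension of 4, are assumptions chosen for the example and are not part of the exercise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, embedding_dim = 10, 4            # hypothetical toy sizes
q = torch.randn(vocab_size, embedding_dim)   # one embedding row q_w per word
A = torch.randn(vocab_size, embedding_dim)   # linear-layer weight
b = torch.randn(vocab_size)                  # linear-layer bias

context_ids = torch.tensor([1, 2, 4, 5])     # indices of the words in C
target_id = 3                                # index of the target word w_i

summed = q[context_ids].sum(dim=0)           # sum of q_w over w in C
scores = A @ summed + b                      # A (sum q_w) + b, one score per word
loss = -F.log_softmax(scores, dim=0)[target_id]   # -log p(w_i | C)

# The same quantity via the built-in cross-entropy (log-softmax followed by NLL).
reference = F.cross_entropy(scores.unsqueeze(0), torch.tensor([target_id]))
print(loss.item(), reference.item())
```

The two printed values should agree up to floating-point error, which is why the exercise can pair `F.log_softmax` with `nn.NLLLoss`.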
Implement this model in PyTorch by filling in the class below.
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)
CONTEXT_SIZE = 2 # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()
# By deriving a set from `raw_text`, we deduplicate the array
vocab = set(raw_text)
vocab_size = len(vocab)
word_to_ix = {word: i for i, word in enumerate(vocab)}
data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])
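The loop above hard-codes a window of two words on each side. A minimal sketch of the same construction for an arbitrary window size follows; `build_cbow_pairs` is a hypothetical helper introduced here for illustration, not part of the tutorial.

```python
def build_cbow_pairs(tokens, context_size):
    """Return (context, target) pairs with context_size words on each side."""
    pairs = []
    for i in range(context_size, len(tokens) - context_size):
        left = tokens[i - context_size:i]            # words before the target
        right = tokens[i + 1:i + 1 + context_size]   # words after the target
        pairs.append((left + right, tokens[i]))
    return pairs


tokens = "We are about to study the idea".split()
pairs = build_cbow_pairs(tokens, context_size=2)
print(pairs[0])  # (['We', 'are', 'to', 'study'], 'about')
```

Words closer than `context_size` positions to either end of the text are skipped as targets, matching the `range(2, len(raw_text) - 2)` bounds used above.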
class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(CBOW, self).__init__()
        # Embedding table; its rows are the trainable word vectors q_w
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        # Linear layer; its trainable parameters are A (weight) and b (bias)
        self.linear1 = nn.Linear(embedding_dim, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs)  # (context_len, embedding_dim)
        # Sum the context embeddings, then reshape to (1, embedding_dim)
        add_embeds = torch.sum(embeds, dim=0).view(1, -1)
        out = self.linear1(add_embeds)  # (1, vocab_size)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs
# create your model and train. here are some functions to help you make
# the data ready for use by your module
def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    tensor = torch.LongTensor(idxs)
    return Variable(tensor)
make_context_vector(data[0][0], word_to_ix) # example
# Declare the loss function, model, and optimizer
losses = []
loss_function = nn.NLLLoss()
model = CBOW(vocab_size, embedding_dim=20)
optimizer = optim.SGD(model.parameters(), lr=0.001)
# Train for 10 epochs
for epoch in range(10):
    total_loss = torch.FloatTensor([0])
    for context, target in data:
        # Convert the context words and the target word to index tensors
        context_idxs = [word_to_ix[w] for w in context]
        target_idx = word_to_ix[target]
        context_var = Variable(torch.LongTensor(context_idxs))
        target_var = Variable(torch.LongTensor([target_idx]))
        # Clear accumulated gradients, then run the forward and backward passes
        model.zero_grad()
        log_probs = model(context_var)
        loss = loss_function(log_probs, target_var)
        loss.backward()
        optimizer.step()
        total_loss += loss.data
    losses.append(total_loss)
print(losses)
Output:
[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]
[
260.2805
[torch.FloatTensor of size 1]
,
255.0300
[torch.FloatTensor of size 1]
,
249.8967
[torch.FloatTensor of size 1]
,
244.8781
[torch.FloatTensor of size 1]
,
239.9720
[torch.FloatTensor of size 1]
,
235.1766
[torch.FloatTensor of size 1]
,
230.4900
[torch.FloatTensor of size 1]
,
225.9105
[torch.FloatTensor of size 1]
,
221.4367
[torch.FloatTensor of size 1]
,
217.0672
[torch.FloatTensor of size 1]
]
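The description at the top notes that CBOW embeddings are typically used to initialize the embeddings of a more complicated model. A minimal sketch of that step follows. It is not part of the exercise: a freshly initialized `nn.Embedding` stands in for the trained `model.embeddings`, and `DownstreamModel`, its averaging-plus-linear architecture, and all sizes are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embedding_dim = 50, 20   # assumed sizes for this sketch

# Stand-in for the trained `model.embeddings` layer from the exercise.
trained_embeddings = nn.Embedding(vocab_size, embedding_dim)


class DownstreamModel(nn.Module):
    """Hypothetical larger model whose embeddings start from pretrained weights."""

    def __init__(self, pretrained_weight, num_classes):
        super().__init__()
        # freeze=False keeps the copied embeddings trainable (fine-tuning).
        self.embeddings = nn.Embedding.from_pretrained(pretrained_weight, freeze=False)
        self.classifier = nn.Linear(pretrained_weight.size(1), num_classes)

    def forward(self, word_ids):
        # Average the word embeddings, then classify the pooled vector.
        pooled = self.embeddings(word_ids).mean(dim=0, keepdim=True)
        return self.classifier(pooled)


weights = trained_embeddings.weight.detach().clone()
downstream = DownstreamModel(weights, num_classes=3)
logits = downstream(torch.tensor([0, 5, 7]))
print(logits.shape)  # torch.Size([1, 3])
```

Passing `freeze=True` instead would keep the pretrained vectors fixed while the rest of the model trains.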