The transformer model is much larger than any model covered in the previous notebooks, so instead of training one from scratch we will use the transformers library to get a pre-trained transformer and use it as our embedding layer. We will freeze the transformer rather than train it, and only train the rest of the model, which learns from the representations the transformer produces. In this case we will use a multi-layer bi-directional GRU, but any model can learn from these representations.
First, as always, let's set the random seeds for deterministic results.
import torch
import random
import numpy as np
SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
The transformer has already been trained with a specific vocabulary, which means we need to use exactly the same vocabulary and tokenize our data the same way the transformer was originally trained with.
Luckily, the transformers library provides a tokenizer for each of the transformer models it ships. In this case we are using the uncased BERT model, which ignores casing (i.e. every word is lower-cased). We get the matching tokenizer by loading the pre-trained bert-base-uncased tokenizer.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
The tokenizer has a vocab attribute which contains the actual vocabulary we will be using. We can check how many tokens it holds by checking its length.
print(len(tokenizer.vocab))
30522
Using the tokenizer is as simple as calling tokenizer.tokenize on a string. This tokenizes and lower-cases the data in a way that is consistent with the pre-trained transformer model.
tokens = tokenizer.tokenize('Hello WORLD how ARE yoU?')
print(tokens)
['hello', 'world', 'how', 'are', 'you', '?']
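Note that BERT uses WordPiece tokenization, so a word outside its vocabulary is split into sub-word pieces prefixed with ## rather than mapped to a single unknown token. For example, 'paycheck' is not a single vocabulary entry, so it gets split (we will see these exact pieces again in the dataset later):
print(tokenizer.tokenize('paycheck'))
['pay', '##che', '##ck']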
We can numericalize the tokens using our vocabulary with tokenizer.convert_tokens_to_ids.
indexes = tokenizer.convert_tokens_to_ids(tokens)
print(indexes)
[7592, 2088, 2129, 2024, 2017, 1029]
The transformer was also trained with special tokens that mark the beginning and end of a sentence, as well as a standard padding token and an unknown token. We can get these from the tokenizer as well.
Note: the tokenizer does have beginning-of-sequence and end-of-sequence attributes (bos_token and eos_token), but these are not set for this model and should not be used here.
init_token = tokenizer.cls_token
eos_token = tokenizer.sep_token
pad_token = tokenizer.pad_token
unk_token = tokenizer.unk_token
print(init_token, eos_token, pad_token, unk_token)
[CLS] [SEP] [PAD] [UNK]
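The tokenizer also exposes all of its special tokens in one place, which can be a handy sanity check; in the transformers versions this notebook targets, this is the special_tokens_map attribute:
print(tokenizer.special_tokens_map)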
We can get the indexes of these special tokens by converting them through the vocabulary...
init_token_idx = tokenizer.convert_tokens_to_ids(init_token)
eos_token_idx = tokenizer.convert_tokens_to_ids(eos_token)
pad_token_idx = tokenizer.convert_tokens_to_ids(pad_token)
unk_token_idx = tokenizer.convert_tokens_to_ids(unk_token)
print(init_token_idx, eos_token_idx, pad_token_idx, unk_token_idx)
101 102 0 100
...or we can get them explicitly from the tokenizer.
init_token_idx = tokenizer.cls_token_id
eos_token_idx = tokenizer.sep_token_id
pad_token_idx = tokenizer.pad_token_id
unk_token_idx = tokenizer.unk_token_id
print(init_token_idx, eos_token_idx, pad_token_idx, unk_token_idx)
101 102 0 100
Another thing we need to handle is that the model was trained on sequences with a defined maximum length: it does not know how to handle sequences longer than the ones it was trained on. We can get the maximum length of these input sequences by checking max_model_input_sizes for the version of the transformer we want to use. In this case, it is 512 tokens.
max_input_length = tokenizer.max_model_input_sizes['bert-base-uncased']
print(max_input_length)
512
Previously we used the spaCy tokenizer to tokenize our examples. Now, however, we need to define a function that we will pass to our TEXT field, and it will handle all of the tokenization for us. It will also cut the number of tokens down to the maximum length. Note that our maximum length is 2 less than the actual maximum length, because we need to append two tokens to each sequence, one at the start and one at the end.
def tokenize_and_cut(sentence):
tokens = tokenizer.tokenize(sentence)
tokens = tokens[:max_input_length-2]
return tokens
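As a quick check, running the function on the example string from earlier gives the same tokens as before, since short sequences are unaffected by the truncation; only sequences longer than 510 tokens get cut:
print(tokenize_and_cut('Hello WORLD how ARE yoU?'))
['hello', 'world', 'how', 'are', 'you', '?']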
Now we define our fields. The transformer expects the batch dimension to be first, so we set batch_first = True. As the transformer already provides the vocabulary for our text, we set use_vocab = False to tell torchtext that we will handle the vocabulary side of things ourselves. We pass our tokenize_and_cut function as the tokenizer. The preprocessing argument is a function applied to each example after it has been tokenized, and this is where we convert the tokens to their indexes. Finally, we define the special tokens, noting that we define them by their index values rather than their string values, i.e. 100 instead of [UNK], because the sequences will already have been converted into indexes.
We define the label field as before.
from torchtext import data
TEXT = data.Field(batch_first = True,
use_vocab = False,
tokenize = tokenize_and_cut,
preprocessing = tokenizer.convert_tokens_to_ids,
init_token = init_token_idx,
eos_token = eos_token_idx,
pad_token = pad_token_idx,
unk_token = unk_token_idx)
LABEL = data.LabelField(dtype = torch.float)
We load the data and create the validation split as before.
from torchtext import datasets
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
train_data, valid_data = train_data.split(random_state = random.seed(SEED))
print(f"Number of training examples: {len(train_data)}")
print(f"Number of validation examples: {len(valid_data)}")
print(f"Number of testing examples: {len(test_data)}")
Number of training examples: 17500
Number of validation examples: 7500
Number of testing examples: 25000
We can check an example to make sure its text has already been numericalized.
print(vars(train_data.examples[6]))
{'text': [5949, 1997, 2026, 2166, 1010, 1012, 1012, 1012, 1012, 1996, 2472, 2323, 2022, 10339, 1012, 2339, 2111, 2514, 2027, 2342, 2000, 2191, 22692, 5691, 2097, 2196, 2191, 3168, 2000, 2033, 1012, 2043, 2016, 2351, 2012, 1996, 2203, 1010, 2009, 2081, 2033, 4756, 1012, 1045, 2018, 2000, 2689, 1996, 3149, 2116, 2335, 2802, 1996, 2143, 2138, 1045, 2001, 2893, 10339, 3666, 2107, 3532, 3772, 1012, 11504, 1996, 3124, 2040, 2209, 9895, 2196, 4152, 2147, 2153, 1012, 2006, 2327, 1997, 2008, 1045, 3246, 1996, 2472, 2196, 4152, 2000, 2191, 2178, 2143, 1010, 1998, 2038, 2010, 3477, 5403, 3600, 2579, 2067, 2005, 2023, 10231, 1012, 1063, 1012, 6185, 2041, 1997, 2184, 1065], 'label': 'neg'}
We can use convert_ids_to_tokens to transform these indexes back into readable tokens.
tokens = tokenizer.convert_ids_to_tokens(vars(train_data.examples[6])['text'])
print(tokens)
['waste', 'of', 'my', 'life', ',', '.', '.', '.', '.', 'the', 'director', 'should', 'be', 'embarrassed', '.', 'why', 'people', 'feel', 'they', 'need', 'to', 'make', 'worthless', 'movies', 'will', 'never', 'make', 'sense', 'to', 'me', '.', 'when', 'she', 'died', 'at', 'the', 'end', ',', 'it', 'made', 'me', 'laugh', '.', 'i', 'had', 'to', 'change', 'the', 'channel', 'many', 'times', 'throughout', 'the', 'film', 'because', 'i', 'was', 'getting', 'embarrassed', 'watching', 'such', 'poor', 'acting', '.', 'hopefully', 'the', 'guy', 'who', 'played', 'heath', 'never', 'gets', 'work', 'again', '.', 'on', 'top', 'of', 'that', 'i', 'hope', 'the', 'director', 'never', 'gets', 'to', 'make', 'another', 'film', ',', 'and', 'has', 'his', 'pay', '##che', '##ck', 'taken', 'back', 'for', 'this', 'crap', '.', '{', '.', '02', 'out', 'of', '10', '}']
Although we have handled the vocabulary for the text, we still need to build the vocabulary for the labels.
LABEL.build_vocab(train_data)
print(LABEL.vocab.stoi)
defaultdict(None, {'neg': 0, 'pos': 1})
As before, we create the iterators. Ideally we want to use the largest batch size we can, as I have found this gives the best results for transformers.
BATCH_SIZE = 128
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
(train_data, valid_data, test_data),
batch_size = BATCH_SIZE,
device = device)
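If you want to confirm the shapes, an optional peek at a single batch should show the text tensor as [batch size, sequence length], since we set batch_first = True (the second dimension varies from batch to batch):
batch = next(iter(train_iterator))
print(batch.text.shape)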
Next, we load the pre-trained model, making sure to load the same model as we did for the tokenizer.
from transformers import BertTokenizer, BertModel
bert = BertModel.from_pretrained('bert-base-uncased')
Next, we define our actual model.
Instead of using an embedding layer to get embeddings for our text, we use the pre-trained transformer model. These embeddings are then fed into a GRU that produces a prediction for the sentiment of the input sentence. We get the embedding dimension size (called hidden_size) from the transformer via its config attribute. The rest of the initialization is standard.
In the forward pass, we wrap the transformer in no_grad to ensure no gradients are calculated over this part of the model. The transformer actually returns the embeddings for the whole sequence as well as a pooled output. The documentation states that the pooled output is "usually not a good summary of the semantic content of the input, you're often better with averaging or pooling the sequence of hidden-states for the whole input sequence", so we will not be using it. The rest of the forward pass is the standard implementation of a recurrent model: we take the hidden state over the final time-step and pass it through a linear layer to get our predictions.
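To make the point about the two outputs concrete, here is a small illustration using the indexes from earlier; it assumes the same transformers version as the rest of this notebook, where BertModel returns a tuple whose first element holds one 768-dimensional embedding per token and whose second element is the pooled output we are ignoring:
with torch.no_grad():
    outputs = bert(torch.LongTensor([indexes]))
print(outputs[0].shape) # per-token embeddings, here [1, 6, 768]
print(outputs[1].shape) # pooled output, here [1, 768]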
import torch.nn as nn
class BERTGRUSentiment(nn.Module):
def __init__(self,
bert,
hidden_dim,
output_dim,
n_layers,
bidirectional,
dropout):
super().__init__()
self.bert = bert
embedding_dim = bert.config.to_dict()['hidden_size']
self.rnn = nn.GRU(embedding_dim,
hidden_dim,
num_layers = n_layers,
bidirectional = bidirectional,
batch_first = True,
dropout = 0 if n_layers < 2 else dropout)
self.out = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
self.dropout = nn.Dropout(dropout)
def forward(self, text):
#text = [batch size, sent len]
with torch.no_grad():
embedded = self.bert(text)[0]
#embedded = [batch size, sent len, emb dim]
_, hidden = self.rnn(embedded)
#hidden = [n layers * n directions, batch size, hid dim]
if self.rnn.bidirectional:
hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
else:
hidden = self.dropout(hidden[-1,:,:])
#hidden = [batch size, hid dim]
output = self.out(hidden)
#output = [batch size, out dim]
return output
Next, we create an instance of our model using standard hyperparameters.
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.25
model = BERTGRUSentiment(bert,
HIDDEN_DIM,
OUTPUT_DIM,
N_LAYERS,
BIDIRECTIONAL,
DROPOUT)
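Before going any further, an optional sanity check: pass a single made-up sequence (built from init_token_idx, the indexes list from earlier and eos_token_idx) through the untrained model and confirm we get one logit per example:
dummy = torch.LongTensor([[init_token_idx] + indexes + [eos_token_idx]])
print(model(dummy).shape) # expected: torch.Size([1, 1])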
We can check how many parameters the model has. Our standard models have under 5M parameters, but this one has 112M! Luckily, 110M of those parameters come from the transformer, and we will not be training them.
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {count_parameters(model):,} trainable parameters')
The model has 112,241,409 trainable parameters
In order to freeze the parameters (rather than train them), we need to set their requires_grad attribute to False. To do this, we simply loop through all of the named_parameters in our model and, if they are part of the bert transformer model, set requires_grad = False.
for name, param in model.named_parameters():
if name.startswith('bert'):
param.requires_grad = False
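An equivalent way to do this, since the transformer lives in the bert attribute of our model, is to freeze that submodule's parameters directly; which form you use is purely a matter of style:
for param in model.bert.parameters():
    param.requires_grad = False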
We can now see that our model has under 3M trainable parameters, making it almost comparable to the FastText model. However, the text still has to be propagated through the transformer, which causes training to take considerably longer.
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {count_parameters(model):,} trainable parameters')
The model has 2,759,169 trainable parameters
We can double-check the names of the trainable parameters to make sure they make sense. As we can see, they are all the parameters of the GRU (rnn) and the linear layer (out).
for name, param in model.named_parameters():
if param.requires_grad:
print(name)
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
rnn.weight_ih_l0_reverse
rnn.weight_hh_l0_reverse
rnn.bias_ih_l0_reverse
rnn.bias_hh_l0_reverse
rnn.weight_ih_l1
rnn.weight_hh_l1
rnn.bias_ih_l1
rnn.bias_hh_l1
rnn.weight_ih_l1_reverse
rnn.weight_hh_l1_reverse
rnn.bias_ih_l1_reverse
rnn.bias_hh_l1_reverse
out.weight
out.bias
As is standard, we define our optimizer and criterion (loss function).
import torch.optim as optim
optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
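Because the frozen parameters have requires_grad = False they receive no gradients, so Adam will simply never update them and passing model.parameters() works as-is. If you prefer to be explicit, you can hand the optimizer only the trainable parameters:
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))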
We place the model and criterion onto the GPU, if one is available.
model = model.to(device)
criterion = criterion.to(device)
Next, we define functions for: calculating accuracy, performing a training epoch, performing an evaluation epoch, and telling us how long a training/evaluation epoch takes.
def binary_accuracy(preds, y):
"""
Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
"""
#round predictions to the closest integer
rounded_preds = torch.round(torch.sigmoid(preds))
correct = (rounded_preds == y).float() #convert into float for division
acc = correct.sum() / len(correct)
return acc
def train(model, iterator, optimizer, criterion):
epoch_loss = 0
epoch_acc = 0
model.train()
for batch in iterator:
optimizer.zero_grad()
predictions = model(batch.text).squeeze(1)
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
loss.backward()
optimizer.step()
epoch_loss += loss.item()
epoch_acc += acc.item()
return epoch_loss / len(iterator), epoch_acc / len(iterator)
def evaluate(model, iterator, criterion):
epoch_loss = 0
epoch_acc = 0
model.eval()
with torch.no_grad():
for batch in iterator:
predictions = model(batch.text).squeeze(1)
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
epoch_loss += loss.item()
epoch_acc += acc.item()
return epoch_loss / len(iterator), epoch_acc / len(iterator)
import time
def epoch_time(start_time, end_time):
elapsed_time = end_time - start_time
elapsed_mins = int(elapsed_time / 60)
elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
return elapsed_mins, elapsed_secs
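As a small worked example of binary_accuracy: for logits [2.0, -1.0, 0.5] the sigmoid gives roughly [0.88, 0.27, 0.62], which rounds to predictions [1, 0, 1]; against labels [1, 0, 0] that is 2 out of 3 correct, i.e. an accuracy of roughly 0.667:
print(binary_accuracy(torch.tensor([2.0, -1.0, 0.5]), torch.tensor([1., 0., 0.])))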
Finally, we train our model. This takes considerably longer than any of the previous models because of the size of the transformer. Even though we are not training any of the transformer's parameters, we still need to pass the data through it, which takes a considerable amount of time on a standard GPU.
N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
start_time = time.time()
train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
end_time = time.time()
epoch_mins, epoch_secs = epoch_time(start_time, end_time)
if valid_loss < best_valid_loss:
best_valid_loss = valid_loss
torch.save(model.state_dict(), 'tut6-model.pt')
print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')
Epoch: 01 | Epoch Time: 7m 27s
Train Loss: 0.286 | Train Acc: 88.16%
Val. Loss: 0.247 | Val. Acc: 90.26%
Epoch: 02 | Epoch Time: 7m 27s
Train Loss: 0.234 | Train Acc: 90.77%
Val. Loss: 0.229 | Val. Acc: 91.00%
Epoch: 03 | Epoch Time: 7m 27s
Train Loss: 0.209 | Train Acc: 91.83%
Val. Loss: 0.225 | Val. Acc: 91.10%
Epoch: 04 | Epoch Time: 7m 27s
Train Loss: 0.182 | Train Acc: 92.97%
Val. Loss: 0.217 | Val. Acc: 91.98%
Epoch: 05 | Epoch Time: 7m 27s
Train Loss: 0.156 | Train Acc: 94.17%
Val. Loss: 0.230 | Val. Acc: 91.76%
We load up the parameters that gave us the best validation loss and try them out on the test set, which gives us the best results we have seen so far!
model.load_state_dict(torch.load('tut6-model.pt'))
test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')
Test Loss: 0.198 | Test Acc: 92.31%
We then use the model to test the sentiment of some sequences. We tokenize the input sequence, trim it down to the maximum length, add the special tokens to either side, convert it to a tensor, add a fake batch dimension, and then pass it through our model.
def predict_sentiment(model, tokenizer, sentence):
model.eval()
tokens = tokenizer.tokenize(sentence)
tokens = tokens[:max_input_length-2]
indexed = [init_token_idx] + tokenizer.convert_tokens_to_ids(tokens) + [eos_token_idx]
tensor = torch.LongTensor(indexed).to(device)
tensor = tensor.unsqueeze(0)
prediction = torch.sigmoid(model(tensor))
return prediction.item()
res = predict_sentiment(model, tokenizer, "This film is terrible")
print(res)
0.02264496125280857
res = predict_sentiment(model, tokenizer, "This film is great")
print(res)
0.9411056041717529