python - Is it possible to freeze only certain embedding weights in the embedding layer in PyTorch?

1. Split the embedding into two separate objects

One approach would be to use two separate embeddings: one for the pretrained vectors and one for the vectors still to be trained.

The GloVe embedding should be frozen, while tokens that have no pretrained representation are looked up in the trainable layer.

This works if you format your data so that the pretrained token indices fall in a smaller range than the indices of tokens without a GloVe representation. Say the pretrained indices are in the range [0, 300], while the indices without a representation are in [301, 500]. I would go along these lines:

import numpy as np
import torch


class YourNetwork(torch.nn.Module):
    def __init__(self, glove_embeddings: np.ndarray, how_many_tokens_not_present: int):
        super().__init__()
        # Frozen layer holding the pretrained GloVe vectors
        # (from_pretrained freezes the weights by default)
        self.pretrained_embedding = torch.nn.Embedding.from_pretrained(
            torch.from_numpy(glove_embeddings).float()
        )
        # Trainable layer for the tokens without a GloVe representation
        self.trainable_embedding = torch.nn.Embedding(
            how_many_tokens_not_present, glove_embeddings.shape[1]
        )
        # Rest of your network setup

    def forward(self, batch):
        # Tokens in batch which have no pretrained representation must have indices
        # BIGGER than the pretrained ones; adjust your data-creating function accordingly
        mask = batch >= self.pretrained_embedding.num_embeddings

        # You may want to optimize this; you could probably get away without the
        # clone, though I'm not currently sure how
        pretrained_batch = batch.clone()
        pretrained_batch[mask] = 0
        embedded_batch = self.pretrained_embedding(pretrained_batch)

        # Every token without a representation has to be brought into the
        # index range of the trainable embedding
        batch = batch - self.pretrained_embedding.num_embeddings
        # Zero out the ones which already have a pretrained embedding
        batch[~mask] = 0
        non_pretrained_embedded_batch = self.trainable_embedding(batch)

        # And finally overwrite the placeholder rows produced by the pretrained
        # embedding with the trainable embeddings
        embedded_batch[mask] = non_pretrained_embedded_batch[mask]

        # Rest of your code
        ...
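The forward pass above relies on tokens without a GloVe vector getting the higher indices. As a minimal sketch of how the data-creating function could arrange that (the helper build_vocab and its arguments are hypothetical, not part of the original answer):

def build_vocab(corpus_tokens, glove_vocab):
    # Tokens with a GloVe vector receive the low indices [0, num_pretrained - 1];
    # all remaining tokens are appended after them, matching the mask in forward()
    unique = sorted(set(corpus_tokens))
    pretrained = [t for t in unique if t in glove_vocab]
    trainable = [t for t in unique if t not in glove_vocab]
    return {token: index for index, token in enumerate(pretrained + trainable)}

With such a mapping, how_many_tokens_not_present is simply len(trainable), and any batch index at or above the GloVe vocabulary size is routed to the trainable embedding.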


2. Zero out the gradients of the specified tokens

This one is a bit tricky, but I think it is quite concise and easy to implement. So, if you obtain the indices of the tokens that do have a GloVe representation, you can explicitly zero out their gradient after backpropagation, so those rows never get updated and stay frozen.

import torch

embedding = torch.nn.Embedding(10, 3)
X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])

values = embedding(X)
loss = values.mean()
# Use whatever loss you want
loss.backward()

# Let's say those indices in your embedding are pretrained (have GloVe representation)
indices = torch.LongTensor([2, 5])

print("Before zeroing out gradient")
print(embedding.weight.grad)

print("After zeroing out gradient")
embedding.weight.grad[indices] = 0
print(embedding.weight.grad)

And the output of the second approach:

Before zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
After zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
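If you do not want to repeat the zeroing after every backward pass, the same effect can be wired in once with a gradient hook. A minimal sketch: Tensor.register_hook is standard PyTorch, while the variable and function names below are just illustrative, not from the original answer:

import torch

embedding = torch.nn.Embedding(10, 3)
# Rows holding pretrained vectors that must stay fixed (illustrative indices)
frozen_indices = torch.LongTensor([2, 5])

def zero_frozen_rows(grad):
    # Runs on every backward(); returns the gradient with the frozen rows zeroed
    grad = grad.clone()
    grad[frozen_indices] = 0
    return grad

embedding.weight.register_hook(zero_frozen_rows)

loss = embedding(torch.LongTensor([[1, 2, 4, 5]])).mean()
loss.backward()
# embedding.weight.grad is now zero at rows 2 and 5

Note that optimizer-side terms such as weight decay can still move zero-gradient rows, so set weight_decay to zero for the embedding parameters if exact freezing matters.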
