Create Your Own Mini Word Embedding from Scratch Using PyTorch

Machine Learning

Introduction:

Put simply, the embedding of a particular word (in a higher dimension) is nothing but a dense vector representation of that word (in a lower dimension). Words with similar meanings, e.g. “Joyful” and “Cheerful”, and other closely related words, e.g. “Money” and “Bank”, get closer vector representations when projected into the lower dimension.

The transformation from words to vectors is called word embedding.
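
For a bit of intuition, here is a tiny sketch of what “closer” means, using made-up (not learned) 2-D vectors and cosine similarity:

import numpy as np

# Hypothetical 2-D embeddings, invented purely for illustration
joyful   = np.array([0.90, 0.80])
cheerful = np.array([0.85, 0.75])
table    = np.array([-0.70, 0.20])

def cosine(a, b):
  return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(joyful, cheerful))  # close to 1.0 -> similar words
print(cosine(joyful, table))     # much lower -> unrelated words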

So the underlying concept in creating a mini word embedding boils down to training a simple auto-encoder on some text data.

Some Basics:

Before we proceed to create our mini word embedding, it is worth brushing up on the basic word-embedding concepts the deep learning community has produced so far.

The popular, state-of-the-art word embedding models are as follows:

  1. Word2Vec (Google)
  2. GloVe (Stanford University)

They are trained on huge text corpora, such as Wikipedia or a scrape of the entire web of up to 6 billion words (in the higher dimension), and project the words into dense embeddings of as few as 100, 200, or 300 dimensions (in the lower dimension).

Here, in our model, we will project the words into 2-dimensional dense embeddings.

Techniques Used:

The above state-of-the-art models use one of two primary techniques to accomplish the task.

  1. Continuous Bag-of-Words (CBOW)
  2. Skip-Gram

1. CBOW:

CBOW attempts to guess the output (target word) from its neighboring words (context words). The window size is a hyper-parameter here.

Example:

Sentence: cats and mice are buddies

Target Word(Output): mice (let’s say)

Context Word(Inputs): cats and _ are buddies
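
As a toy sketch (not part of the article’s pipeline), here is how (context → target) pairs could be generated from this sentence with a window size of 2:

sentence = "cats and mice are buddies".split()
window = 2

# CBOW: the surrounding (context) words predict the centre (target) word
cbow_pairs = []
for i, target in enumerate(sentence):
  context = [sentence[j] for j in range(i - window, i + window + 1)
             if j != i and 0 <= j < len(sentence)]
  cbow_pairs.append((context, target))

print(cbow_pairs[2])
# (['cats', 'and', 'are', 'buddies'], 'mice')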

2. Skip-Gram:

Skip-Gram guesses the context words from a target word. We will be implementing this in this post.

Sentence: cats and mice are buddies

Target Word (Input): cats, cats, …

Context Words (Outputs): and, mice, …
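
Mirroring the CBOW sketch above (again just a toy illustration; the full batching code appears later in this post), skip-gram flips the direction: the target (centre) word predicts each of its context words separately:

sentence = "cats and mice are buddies".split()
window = 2

# Skip-Gram: the centre (target) word predicts each surrounding (context) word
skipgram_pairs = []
for i, target in enumerate(sentence):
  for j in range(i - window, i + window + 1):
    if j != i and 0 <= j < len(sentence):
      skipgram_pairs.append((target, sentence[j]))

print(skipgram_pairs[:2])
# [('cats', 'and'), ('cats', 'mice')]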

More on the techniques later.

CBOW vs Skip-Gram

Mini Word Embedding Process:

1. Data Preparation and Data Preprocessing

2. Hyper-parameter Selection and Model Building

3. Model Inference

1. Data Preparation and Data Preprocessing

Here comes the fun part. As stated above, the state-of-the-art models are trained on large amounts of text data; since we are interested in a mini version, let’s choose a small dataset.

And to make things exciting, I have chosen a Tom and Jerry cartoon play as our data corpus.

Tom and Jerry — Play

Our mini dataset looks like this,

docs = ['cat chases mice',
        'cat catches mice',
        'cat eats mice',
        'mice runs into hole',
        'cat says bad words',
        'cat and mice are pals',
        'cat and mice are chums',
        'mice stores food in hole',
        'cat stores food in house',
        'mice sleeps in hole',
        'cat sleeps in house',
        'cat and mice are buddies',
        'mice lives in hole',
        'cat lives in house']

So we will be using the above data; now we shall start the pre-processing steps.

  1. First, we need to map each unique word to an integer and later map the integer to a one-hot encoding.

Data — Preprocess

2. Then, once we have the integer and one-hot mapping for every word, we shall create batches for training.

Since we have limited data and are implementing a mini word embedding, we shall use the skip-gram model with a window size of 2 (the 2 adjacent words on each side are taken as context) and predict the context words, given the target word (the input).

Refer to the picture below to understand our skip-gram model.

Our training batch
Sample Data Format

The code implementation for the above batch preparation is shown below.
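
The snippets that follow assume imports along these lines. The original notebook’s import cell is not shown, so the exact sources of pad_sequences and summary in particular are my assumption (judging by the printed summary table, it looks like torchsummaryX):

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn

from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.preprocessing.sequence import pad_sequences  # assumed source
from torchsummaryX import summary                                  # assumed source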

# Build word <-> index mappings (index 0 is left unused so it can act as padding)
idx_2_word = {}
word_2_idx = {}
temp = []
i = 1
for doc in docs:
  for word in doc.split():
    if word not in temp:
      temp.append(word)
      idx_2_word[i] = word
      word_2_idx[word] = i
      i += 1


print("idx_2_word")
print(idx_2_word)


print("word_2_idx")
print(word_2_idx)


############################ OUTPUT ##############################


idx_2_word
{1: 'cat', 2: 'and', 3: 'mice', 4: 'are', 5: 'buddies', 6: 'lives',
 7: 'in', 8: 'hole', 9: 'house', 10: 'chases', 11: 'catches', 12: 'eats', 
 13: 'runs', 14: 'into', 15: 'says', 16: 'bad', 17: 'words', 18: 'pals', 
 19: 'chums', 20: 'stores', 21: 'food', 22: 'sleeps'}


word_2_idx
{'cat': 1, 'and': 2, 'mice': 3, 'are': 4, 'buddies': 5, 'lives': 6, 
 'in': 7, 'hole': 8, 'house': 9, 'chases': 10, 'catches': 11, 'eats': 12, 
 'runs': 13, 'into': 14, 'says': 15, 'bad': 16, 'words': 17, 'pals': 18, 
 'chums': 19, 'stores': 20, 'food': 21, 'sleeps': 22}
vocab_size = 25


def one_hot_map(doc):
  # Convert a sentence to its list of word indices (integer encoding)
  x = []
  for word in doc.split():
    x.append(word_2_idx[word])
  return x
  
encoded_docs = [one_hot_map(d) for d in docs]
print(encoded_docs)


############################ OUTPUT ##############################


[[1, 2, 3, 4, 5],
 [3, 6, 7, 8],
 [1, 6, 7, 9],
 [1, 10, 3],
 [1, 11, 3],
 [1, 12, 3],
 [3, 13, 14, 8],
 [1, 15, 16, 17],
 [1, 2, 3, 4, 18],
 [1, 2, 3, 4, 19],
 [3, 20, 21, 7, 8],
 [1, 20, 21, 7, 9],
 [3, 22, 7, 8],
 [1, 22, 7, 9]]
# Padding for consistency, max size of 10


max_len = 10
padded_docs = pad_sequences(encoded_docs, maxlen=max_len, padding='post')
padded_docs


############################ OUTPUT ##############################


array([[ 1,  2,  3,  4,  5,  0,  0,  0,  0,  0],
       [ 3,  6,  7,  8,  0,  0,  0,  0,  0,  0],
       [ 1,  6,  7,  9,  0,  0,  0,  0,  0,  0],
       [ 1, 10,  3,  0,  0,  0,  0,  0,  0,  0],
       [ 1, 11,  3,  0,  0,  0,  0,  0,  0,  0],
       [ 1, 12,  3,  0,  0,  0,  0,  0,  0,  0],
       [ 3, 13, 14,  8,  0,  0,  0,  0,  0,  0],
       [ 1, 15, 16, 17,  0,  0,  0,  0,  0,  0],
       [ 1,  2,  3,  4, 18,  0,  0,  0,  0,  0],
       [ 1,  2,  3,  4, 19,  0,  0,  0,  0,  0],
       [ 3, 20, 21,  7,  8,  0,  0,  0,  0,  0],
       [ 1, 20, 21,  7,  9,  0,  0,  0,  0,  0],
       [ 3, 22,  7,  8,  0,  0,  0,  0,  0,  0],
       [ 1, 22,  7,  9,  0,  0,  0,  0,  0,  0]], dtype=int32)
# Creating dataset tuples for training


training_data = np.empty((0,2))


window = 2
# For every non-padding word (target), pair it with each non-padding neighbour (context) within the window
for sentence in padded_docs:
  sent_len = len(sentence)
  for i, word in enumerate(sentence):
    w_context = []
    if sentence[i] != 0:
      w_target = sentence[i]
      for j in range(i-window, i + window + 1):
        if j != i and j <= sent_len -1 and j >=0 and sentence[j]!=0:
          w_context = sentence[j]
          training_data = np.append(training_data, [[w_target, w_context]], axis=0)
          #training_data.append([w_target, w_context])


print(len(training_data))
print(training_data.shape)
print(training_data[0:21])


############################ OUTPUT ##############################


148
(148, 2)
array([[ 1.,  2.],
       [ 1.,  3.],
       [ 2.,  1.],
       [ 2.,  3.],
       [ 2.,  4.],
       [ 3.,  1.],
       [ 3.,  2.],
       [ 3.,  4.],
       [ 3.,  5.],
       [ 4.,  2.],
       [ 4.,  3.],
       [ 4.,  5.],
       [ 5.,  3.],
       [ 5.,  4.],
       [ 3.,  6.],
       [ 3.,  7.],
       [ 6.,  3.],
       [ 6.,  7.],
       [ 6.,  8.],
       [ 7.,  3.],
       [ 7.,  6.]])
# Final Step, Input Variables and Output Variables


# One-hot encode both columns; 30 slots comfortably cover the 22-word vocabulary (indices 1-22)
enc = OneHotEncoder()
enc.fit(np.array(range(30)).reshape(-1,1))
onehot_label_x = enc.transform(training_data[:,0].reshape(-1,1)).toarray()


print("onehot_label_x")
print(onehot_label_x)


enc = OneHotEncoder()
enc.fit(np.array(range(30)).reshape(-1,1))
onehot_label_y = enc.transform(training_data[:,1].reshape(-1,1)).toarray()


print("onehot_label_y")
print(onehot_label_y)


############################ OUTPUT ##############################


onehot_label_x
array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])


onehot_label_y
array([[0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

2. Hyper-parameter Selection and Model Building

Now that we are done creating our batches, let’s build a simple auto-encoder-style model for training. In simple words, it is a neural network that compresses a higher-dimensional input into a lower dimension and later decompresses it back to the higher dimension.

So the lower-dimensional bottleneck captures the important features of the input, which in our case is the word embedding of the target word.

Auto-Encoder Design

Following the design above, I have modified the neural network’s forward function to return both the output of the final layer (30-D) and the output of the middle layer (2-D) [our word embedding].

To design the neural network I will be using the PyTorch framework.

Hyper-Parameter Selection:

  1. input_size = 30 (Input as well as Output Dimension)
  2. hidden_size = 2 (Hidden Layer dimension)
  3. learning_rate = 0.01 (lr for weight optimization)
  4. num_epochs = 7000 (How many times to train the model on the entire data)

For the above specifications, I have designed the model in PyTorch.

See the code implementation below.

# Hyperparameters


input_size = 30
hidden_size = 2
learning_rate = 0.01
num_epochs = 7000


class WEMB(nn.Module):
  def __init__(self, input_size, hidden_size):
    super(WEMB, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.softmax = nn.Softmax(dim=1)
    
    self.l1 = nn.Linear(self.input_size, self.hidden_size, bias=False)
    self.l2 = nn.Linear(self.hidden_size, self.input_size, bias=False)
   
  def forward(self, x):
    out_bn = self.l1(x) # bn - bottle_neck
    out = self.l2(out_bn)
    out = self.softmax(out)
    return out, out_bn


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


model = WEMB(input_size, hidden_size).to(device)
model.train(True)
print(model)


# Loss and optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, \
                            momentum=0, weight_decay=0, nesterov=False)
summary(model, torch.ones((1,30)))


############################ OUTPUT ##############################


WEMB(
  (softmax): Softmax(dim=1)
  (l1): Linear(in_features=30, out_features=2, bias=False)
  (l2): Linear(in_features=2, out_features=30, bias=False)
)


======================================================


          Kernel Shape Output Shape Params Mult-Adds
Layer                                               
0_l1           [30, 2]       [1, 2]   60.0      60.0
1_l2           [2, 30]      [1, 30]   60.0      60.0
2_softmax            -      [1, 30]      -         -
------------------------------------------------------
                      Totals
Total params           120.0
Trainable params       120.0
Non-trainable params     0.0
Mult-Adds              120.0
======================================================



Now let’s start the training process.

loss_val = []
# Convert the one-hot numpy arrays to tensors and move them to the training device
onehot_label_x = torch.from_numpy(onehot_label_x).to(device)
onehot_label_y = torch.from_numpy(onehot_label_y).to(device)


for epoch in range(num_epochs):
  for i in range(onehot_label_y.shape[0]):
      inputs = onehot_label_x[i].float()
      labels = onehot_label_y[i].float()
      inputs = inputs.unsqueeze(0)
      labels = labels.unsqueeze(0)


      # Forward pass
      output, wemb = model(inputs)
      loss = criterion(output, labels)


      # Backward and optimize
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
  loss_val.append(loss.item())


  if (epoch+1) % 100 == 0:
    print (f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


plt.plot(loss_val)


############################ OUTPUT ##############################


Epoch [100/7000], Loss: 0.1468
Epoch [200/7000], Loss: 0.1457
Epoch [300/7000], Loss: 0.1445
Epoch [400/7000], Loss: 0.1429
Epoch [500/7000], Loss: 0.1408
Epoch [600/7000], Loss: 0.1379
Epoch [700/7000], Loss: 0.1338
Epoch [800/7000], Loss: 0.1280
Epoch [900/7000], Loss: 0.1202
Epoch [1000/7000], Loss: 0.1099
Epoch [1100/7000], Loss: 0.0980
Epoch [1200/7000], Loss: 0.0865
Epoch [1300/7000], Loss: 0.0776
Epoch [1400/7000], Loss: 0.0716
Epoch [1500/7000], Loss: 0.0679
Epoch [1600/7000], Loss: 0.0657
Epoch [1700/7000], Loss: 0.0647
Epoch [1800/7000], Loss: 0.0643
Epoch [1900/7000], Loss: 0.0644
Epoch [2000/7000], Loss: 0.0646
Epoch [2100/7000], Loss: 0.0650
Epoch [2200/7000], Loss: 0.0653
Epoch [2300/7000], Loss: 0.0656
Epoch [2400/7000], Loss: 0.0657
Epoch [2500/7000], Loss: 0.0659
Epoch [2600/7000], Loss: 0.0659
Epoch [2700/7000], Loss: 0.0659
Epoch [2800/7000], Loss: 0.0658
Epoch [2900/7000], Loss: 0.0657
Epoch [3000/7000], Loss: 0.0656
Epoch [3100/7000], Loss: 0.0655
Epoch [3200/7000], Loss: 0.0653
Epoch [3300/7000], Loss: 0.0651
Epoch [3400/7000], Loss: 0.0650
Epoch [3500/7000], Loss: 0.0648
Epoch [3600/7000], Loss: 0.0646
Epoch [3700/7000], Loss: 0.0644
Epoch [3800/7000], Loss: 0.0642
Epoch [3900/7000], Loss: 0.0640
Epoch [4000/7000], Loss: 0.0638
Epoch [4100/7000], Loss: 0.0636
Epoch [4200/7000], Loss: 0.0634
Epoch [4300/7000], Loss: 0.0632
Epoch [4400/7000], Loss: 0.0630
Epoch [4500/7000], Loss: 0.0628
Epoch [4600/7000], Loss: 0.0626
Epoch [4700/7000], Loss: 0.0624
Epoch [4800/7000], Loss: 0.0622
Epoch [4900/7000], Loss: 0.0620
Epoch [5000/7000], Loss: 0.0618
Epoch [5100/7000], Loss: 0.0617
Epoch [5200/7000], Loss: 0.0615
Epoch [5300/7000], Loss: 0.0613
Epoch [5400/7000], Loss: 0.0611
Epoch [5500/7000], Loss: 0.0610
Epoch [5600/7000], Loss: 0.0608
Epoch [5700/7000], Loss: 0.0607
Epoch [5800/7000], Loss: 0.0605
Epoch [5900/7000], Loss: 0.0603
Epoch [6000/7000], Loss: 0.0602
Epoch [6100/7000], Loss: 0.0600
Epoch [6200/7000], Loss: 0.0599
Epoch [6300/7000], Loss: 0.0598
Epoch [6400/7000], Loss: 0.0596
Epoch [6500/7000], Loss: 0.0595
Epoch [6600/7000], Loss: 0.0593
Epoch [6700/7000], Loss: 0.0592
Epoch [6800/7000], Loss: 0.0591
Epoch [6900/7000], Loss: 0.0590
Epoch [7000/7000], Loss: 0.0588
Training Loss

The loss graph looks good: our model neither overfits nor underfits. Now let’s pass all of our inputs through the model, get the 2-D word embedding (lower-dimensional representation) for each input word, and plot the embeddings to see whether our model has learned the semantic relationships in our data corpus. You will have to train for more epochs if you have larger training data.

Choice of Loss Functions (Optional Read):

All our vectors are one-hot encoded, which means each output is a vector of length 30 with a “1” at the position of the indexed word and “0” everywhere else.

Example:

word — Cat

Integer Encoding : 1

One-hot encoding : 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
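
As a quick sanity check (a small sketch reusing the word_2_idx mapping built earlier and the vector size of 30 used by the one-hot encoders), the same vector can be produced programmatically:

import numpy as np

vector_size = 30              # matches input_size and the OneHotEncoder fitted on range(30)
idx = word_2_idx['cat']       # 1
one_hot = np.zeros(vector_size)
one_hot[idx] = 1.0
print(one_hot)                # 1.0 at position 1, zeros everywhere else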

Since this is not a multi-class or multi-label classification, we will be using BCELoss (PyTorch) or BinaryCrossentropy (Keras/TensorFlow) as our loss function.

And in the final layer, before the outputs, I have used the softmax activation function, since we want to compare the actual output (the one-hot target) with the predicted probabilities.

Softmax — Soft (lower values are softly mapped to lower probabilities rather than zeroed out), Max (higher values are mapped to higher probabilities, still below 1).

I often get confused about whether to use softmax or sigmoid: softmax scores sum to 1 across the outputs, while sigmoid squashes each output independently into the range 0–1. So I have provided code snippets below for you to compare the resulting losses; I went with softmax here as it worked for me.

Using Softmax:

PREDICTED → (0.9, 0.0, 0.0)

PREDICTED + SOFTMAX → (0.5515, 0.2242, 0.2242)

ACTUAL TARGET → (1.0, 0.0, 0.0)

LOSS VALUE → 0.36

m = nn.Softmax(dim=0)
loss = nn.BCELoss()

input = torch.tensor([0.9, 0.0, 0.0])
target = torch.tensor([1.0, 0.0, 0.0])
output = loss(m(input), target)

print(input, m(input), target)
print("Loss", output)

### DISPLAYED OUTPUTS ###

tensor([0.9000, 0.0000, 0.0000])
tensor([0.5515, 0.2242, 0.2242])
tensor([1., 0., 0.])
Loss([0.36])

Using Sigmoid:

PREDICTED → (0.9, 0.0, 0.0)

PREDICTED + SIGMOID → (0.7109, 0.5000, 0.5000)

ACTUAL TARGET → (1.0, 0.0, 0.0)

LOSS VALUE → 0.57

m = nn.Sigmoid()
loss = nn.BCELoss()

input = torch.tensor([0.9, 0.0, 0.0])
target = torch.tensor([1.0, 0.0, 0.0])
output = loss(m(input), target)

print(input, m(input), target)
print("Loss", output)

### DISPLAYED OUTPUTS ###

tensor([0.9000, 0.0000, 0.0000])
tensor([0.7109, 0.5000, 0.5000])
tensor([1., 0., 0.])
Loss([0.5758])

3. Model Inference

Here we shall pass every word in our corpus through the model and extract the 2-D latent representation it has learned (the word embedding).

# Pass all our vocabulary words through the model and get the 2-D output
docs = ['cat and mice are buddies hole lives in house chases catches runs into says bad words pals chums stores sleeps']
encoded_docs = [one_hot_map(d) for d in docs]


# Word indices to embed (one index per word, in the order of the docs list below)
test_arr = np.array([[ 1.,  2., 3., 4., 5., 8., 6., 7., 9., 10., 11., 13., 14., 15., 16., 17., 18., 19., 20., 22.]])
test = enc.transform(test_arr.reshape(-1,1)).toarray()


output = []
for i in range(test.shape[0]):
  # Keep only the bottleneck (2-D) activation; discard the reconstruction
  _, wemb2 = model(torch.from_numpy(test[i]).unsqueeze(0).float().to(device))
  wemb2 = wemb2[0].detach().cpu().numpy()
  output.append(wemb2)
print(len(output))


docs = ['cat', 'and', 'mice', 'are', 'buddies', 'hole', 'lives', 'in',\
        'house', 'chases', 'catches', 'runs', 'into', 'says', 'bad', \
        'words', 'pals', 'chums', 'stores', 'sleeps']


for i in range(0, len(docs)):
  # Print both embedding dimensions for each word
  print("Word - {} - It's Word Embedding {:.3} & {:.3}".format(docs[i], output[i][0], output[i][1]))
  
############################ OUTPUT ##############################


20


Word - cat - It's Word Embeddings : -0.0428 & -0.0428 
Word - and - It's Word Embeddings : -0.152 & -0.152 
Word - mice - It's Word Embeddings : 0.113 & 0.113 
Word - are - It's Word Embeddings : -0.151 & -0.151 
Word - buddies - It's Word Embeddings : -0.121 & -0.121 
Word - hole - It's Word Embeddings : -0.0583 & -0.0583 
Word - lives - It's Word Embeddings : 0.116 & 0.116 
Word - in - It's Word Embeddings : 0.0778 & 0.0778 
Word - house - It's Word Embeddings : 0.0933 & 0.0933 
Word - chases - It's Word Embeddings : 0.00153 & 0.00153 
Word - catches - It's Word Embeddings : -0.119 & -0.119 
Word - runs - It's Word Embeddings : -0.0713 & -0.0713 
Word - into - It's Word Embeddings : 0.0398 & 0.0398 
Word - says - It's Word Embeddings : 0.104 & 0.104 
Word - bad - It's Word Embeddings : 0.151 & 0.151 
Word - words - It's Word Embeddings : 0.169 & 0.169 
Word - pals - It's Word Embeddings : -0.0761 & -0.0761 
Word - chums - It's Word Embeddings : -0.124 & -0.124 
Word - stores - It's Word Embeddings : -0.0167 & -0.0167 
Word - sleeps - It's Word Embeddings : 0.128 & 0.128

Comparing the Plot — Trained vs Untrained Model:

Trained Model’s vs Untrained Model’s Output.

And we did it. Just as we expected, the words “mice” and “cat” are very close in the embedding space; this is learned from the data corpus, as they occur very frequently next to each other.

The words “buddies, pals, and chums”, “lives, sleeps & house”, and “catches, chases” are also close in the embedding space.

The reason is that those words share some semantic meaning. For instance, “buddies, pals, and chums” generally refer to the same thing (friends/partners), and our model captured it.
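
To put a number on “close”, here is a small sketch (reusing the docs word list and the output embeddings from the inference step above) that compares Euclidean distances between the learned 2-D vectors; related words should come out with the smallest distances:

import numpy as np

# Map each word to its learned 2-D embedding
emb = {word: vec for word, vec in zip(docs, output)}

def dist(w1, w2):
  return np.linalg.norm(emb[w1] - emb[w2])

print(dist('buddies', 'pals'))   # expected to be relatively small
print(dist('buddies', 'hole'))   # expected to be larger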

Similarly, we know the words tom (cat) and jerry (mice) occur together frequently, so the model infers a relationship between them and projects them near each other in the latent space.

This is exactly what happens inside the word2vec model on a larger scale, except that it uses a different architecture (CBOW or skip-gram with different window sizes and multiple target words) and is trained on a high volume of data.

Room for Improvement:

But this model cannot capture features from a high-volume data corpus, so we would need to change its architecture to do so.

In this implementation, we predicted a single output word per input word, but this can be extended, as in the figure below, so that the same neural network predicts multiple surrounding words for a given input word, which helps the model capture more of the nuances in the dataset; a minimal sketch of one way to do this follows the figure.

Our Old Architecture vs New Architecture, Source — Author
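
Below is a minimal, hypothetical sketch of that extension; it is not the article’s code, and it simply assumes we sum the BCE loss over each of a word’s context words while reusing the model, criterion, optimizer, enc and word_2_idx objects defined earlier:

# Hypothetical multi-context training step: one input word, several context words
def multi_context_step(target_word, context_words):
  x = enc.transform([[word_2_idx[target_word]]]).toarray()
  inputs = torch.from_numpy(x).float().to(device)

  pred, _ = model(inputs)
  loss = 0.0
  for ctx in context_words:                        # accumulate the loss over every context word
    y = enc.transform([[word_2_idx[ctx]]]).toarray()
    labels = torch.from_numpy(y).float().to(device)
    loss = loss + criterion(pred, labels)

  optimizer.zero_grad()
  loss.backward()
  optimizer.step()
  return loss.item()

# Hypothetical usage: multi_context_step('cat', ['and', 'mice'])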

Since I want this post to stay simple and stand apart from the other word-embedding articles available on Medium, I used a single-output-word predictor model, but with a firm grasp of the underlying concept you can now easily follow those other articles.

I hope I was able to provide some visual understanding of our mini word embedding; let me know your thoughts in the comment section.

Check out the notebooks that contain the entire code implementation, and feel free to break it.

See in GitHub,

Run in Google Colab,

Or if you prefer Kaggle,

Until then, see you next time.

Article By:

BALAKRISHNAKUMAR V

Co-Founder — DeepScopy (An AI-Based Medical Imaging Startup)

Connect with me → LinkedIn, GitHub, Twitter, Medium

https://deepscopy.com/

Visit us → DeepScopy

Connect with us → Twitter, LinkedIn, Medium

Translated from: https://medium.com/towards-artificial-intelligence/create-your-own-mini-word-embedding-from-scratch-c7b32bd84f8e
