Working Through the Math: Word2Vec (2)

Abstract:

In the previous article, we walked in detail through the hierarchical softmax computation for Word2Vec's Skip-Gram model. In this article, we will cover the Negative Sampling computation for Skip-Gram.

Training Parameters

As in the previous article, we use the same parameters and training corpus:
[Figure 1: training parameters]

The training corpus is as follows:
[Figure 2: training corpus]

Skip-Gram & Negative Sampling Training Procedure

Preliminaries:

As with hierarchical softmax, we first need to initialize the word vectors:
[Figure 3: initialized word vectors]
Next, we construct the same training samples as in the previous article:
[Figure 4: training samples]

On negative sampling:

Before discussing negative sampling, consider a question. For the training sample {cat, say}, when the input is cat the target output is say; that is, the output say counts as the positive sample, while any other word, such as dog, woof, ele, pig, or monkey, counts as a negative sample.
When the vocabulary is very large, computing over every negative sample is extremely expensive. The core idea of negative sampling is: each time, instead of using all of them, draw only a few words (a configurable number) as negative samples. One of the simplest sampling schemes works as follows:

First, count the frequency of each word:
[Figure 5: word frequencies]

Then, accumulate the frequencies and lay the words out as segments on a line of length 1:
[Figure 6: cumulative frequency segments]

With this in place, we can generate a random number in [0, 1] to pick the sampled word. Clearly, words with higher frequency have a higher probability of being drawn.
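The segment-based scheme above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not word2vec's actual table-based sampler, and the word frequencies below are made up for this article's toy vocabulary.

```python
import bisect
import random

def build_cumulative(freqs):
    """Lay the words out on a line of length 1 by cumulative frequency."""
    total = sum(freqs.values())
    words, bounds, acc = [], [], 0.0
    for word, freq in freqs.items():
        acc += freq / total
        words.append(word)
        bounds.append(acc)
    return words, bounds

def sample_word(words, bounds):
    """Draw a uniform number in [0, 1] and return the word whose segment it hits."""
    r = random.random()
    i = bisect.bisect_left(bounds, r)
    return words[min(i, len(words) - 1)]  # clamp against float rounding near 1.0

# Made-up frequencies for the toy vocabulary of this article.
freqs = {"cat": 2, "say": 2, "dog": 2, "woof": 1, "ele": 1, "pig": 1, "monkey": 1}
words, bounds = build_cumulative(freqs)
draws = [sample_word(words, bounds) for _ in range(10000)]
print(draws.count("cat"), draws.count("woof"))  # cat should be drawn roughly twice as often
```

For reference, the released word2vec implementation raises each frequency to the power 0.75 before normalizing, which dampens the advantage of very frequent words; the plain-frequency version above matches the simpler scheme described in this article.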

Finally, following the reference article, we can derive the algorithm below:

Algorithm 2: Skip-Gram with Negative Sampling
_____________________________________________
Input: w, the center word
Input: Negative, the number of negative samples
Input: Context(w), the set of context words of w
Input: V(w), the word vector of the center word w
Input: θ_index, the auxiliary vectors
For u in Context(w) do:
    e = 0
    For d in {u} ∪ Neg(u) do:
        index = get_index(d)
        f = σ(V(w)^T θ_index)
        g = η (Label(d) − f)
        e := e + g θ_index
        θ_index := θ_index + g V(w)
    V(w) := V(w) + e
End


Let us first briefly explain the parameters in the flow chart (some were already explained in earlier articles):

1. Negative is the number of negative samples to draw. For the example in this article, with a vocabulary of 7 words, setting Negative = 2 means 2 words are drawn as negative samples each time.


2. Neg(u) denotes drawing Negative words other than u as negative samples.
For example, for the training sample {cat, say}, Neg(say) with Negative = 2 draws 2 words from dog, woof, ele, pig, monkey as negative samples.


3. get_index(d) simply looks up the index of word d; this is illustrated in the computation below.


4. Label(d) is the label of word d.
For example, for the training sample {cat, say}, Label(say) = 1. If the drawn negative samples are Neg(say) = {dog, woof}, then
Label(dog) = 0 and Label(woof) = 0.
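Putting the pieces together, Algorithm 2 can be sketched in Python. This is a hand-rolled illustration under this article's conventions (learning rate η = 0.95, labels 1/0), not the reference word2vec C code; a uniform negative sampler stands in for the frequency-based one described above, and dictionary lookup by word plays the role of get_index.

```python
import math
import random

def sigmoid(x):
    """The logistic function σ(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def train_pair(w, context, V, theta, vocab, eta=0.95, negative=2):
    """One Skip-Gram / Negative-Sampling pass for center word w.

    V and theta map each word to a list of floats; vocab is the word list.
    """
    for u in context:
        e = [0.0] * len(V[w])
        # Draw `negative` words other than the (w, u) pair as negative samples.
        candidates = [d for d in vocab if d not in (w, u)]
        negatives = random.sample(candidates, negative)
        for d, label in [(u, 1)] + [(d, 0) for d in negatives]:
            f = sigmoid(sum(a * b for a, b in zip(V[w], theta[d])))
            g = eta * (label - f)
            e = [ei + g * ti for ei, ti in zip(e, theta[d])]            # e := e + g·θ
            theta[d] = [ti + g * vi for ti, vi in zip(theta[d], V[w])]  # θ := θ + g·V(w)
        V[w] = [vi + ei for vi, ei in zip(V[w], e)]                     # V(w) := V(w) + e
```

With zero-initialized θ, calling `train_pair("cat", ["say"], V, theta, vocab)` moves θ for say to 0.475·V(cat) while leaving V(cat) unchanged, since e only ever accumulates the still-zero θ vectors; this matches the worked computation below.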


Computation

Even after all this setup, some points may still feel vague; the calculation below makes them concrete.
We first initialize the matrix of auxiliary vectors θ_index:

word     θ_index   1   2   3   4   5
cat      θ0        0   0   0   0   0
say      θ1        0   0   0   0   0
dog      θ2        0   0   0   0   0
woof     θ3        0   0   0   0   0
ele      θ4        0   0   0   0   0
pig      θ5        0   0   0   0   0
monkey   θ6        0   0   0   0   0

First training sample:

Step 0: at this point,
w = cat
Context(w) = {say}
V(cat) = [0.054, -0.090, -0.038, 0.063, -0.015]

Step 1:
Initialize the vector e = [0, 0, 0, 0, 0]

Step 2:
First, the positive sample say:
index = get_index(say) = 1
Label(say) = 1
f = σ(V(w)^T θ_index) ==>
σ(V(cat)^T θ1) = σ(0.054·0 + (-0.090)·0 + (-0.038)·0 + 0.063·0 + (-0.015)·0) = σ(0) = 0.5

Step 3:
g = η(Label(say) - f) = 0.95 · (1 - 0.5) = 0.475

Step 4:
e = e + g·θ_index ==>
e = e + g·θ1 = [0, 0, 0, 0, 0] + 0.475 · [0, 0, 0, 0, 0] = [0, 0, 0, 0, 0]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ1 = θ1 + g·V(cat) = [0, 0, 0, 0, 0] + 0.475 · [0.054, -0.090, -0.038, 0.063, -0.015] = [0.026, -0.043, -0.018, 0.030, -0.007]

Step 6:
With the positive sample done, the setting Negative = 2 means we need to draw two words as negative samples; suppose the draw is Neg(say) = {dog, woof}.

Step 2: (negative sample dog)
index = get_index(dog) = 2
Label(dog) = 0
f = σ(V(w)^T θ_index) ==>
σ(V(cat)^T θ2) = σ(0.054·0 + (-0.090)·0 + (-0.038)·0 + 0.063·0 + (-0.015)·0) = σ(0) = 0.5

Step 3:
g = η(Label(dog) - f) = 0.95 · (0 - 0.5) = -0.475

Step 4:
e = e + g·θ_index ==>
e = e + g·θ2 = [0, 0, 0, 0, 0] - 0.475 · [0, 0, 0, 0, 0] = [0, 0, 0, 0, 0]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ2 = θ2 + g·V(cat) = [0, 0, 0, 0, 0] - 0.475 · [0.054, -0.090, -0.038, 0.063, -0.015] = [-0.026, 0.043, 0.018, -0.030, 0.007]

One negative sample done, one to go:

Step 2: (negative sample woof)
index = get_index(woof) = 3
Label(woof) = 0
f = σ(V(w)^T θ_index) ==>
σ(V(cat)^T θ3) = σ(0.054·0 + (-0.090)·0 + (-0.038)·0 + 0.063·0 + (-0.015)·0) = σ(0) = 0.5

Step 3:
g = η(Label(woof) - f) = 0.95 · (0 - 0.5) = -0.475

Step 4:
e = e + g·θ_index ==>
e = e + g·θ3 = [0, 0, 0, 0, 0] - 0.475 · [0, 0, 0, 0, 0] = [0, 0, 0, 0, 0]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ3 = θ3 + g·V(cat) = [0, 0, 0, 0, 0] - 0.475 · [0.054, -0.090, -0.038, 0.063, -0.015] = [-0.026, 0.043, 0.018, -0.030, 0.007]

Step 7: update the word vector V(w)
V(w) = V(w) + e ==>
V(w) = [0.054, -0.090, -0.038, 0.063, -0.015] + [0, 0, 0, 0, 0] = [0.054, -0.090, -0.038, 0.063, -0.015]

That completes training on the sample {cat, say}.
Here is a quick summary of the results:

The auxiliary vectors θ_index:

word     θ_index   1        2        3        4        5
cat      θ0        0        0        0        0        0
say      θ1        0.026    -0.043   -0.018   0.030    -0.007
dog      θ2        -0.026   0.043    0.018    -0.030   0.007
woof     θ3        -0.026   0.043    0.018    -0.030   0.007
ele      θ4        0        0        0        0        0
pig      θ5        0        0        0        0        0
monkey   θ6        0        0        0        0        0
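The numbers in this table are easy to check: every θ starts at zero, so f = σ(0) = 0.5 for each sample, and each updated θ is just g times V(cat). A minimal arithmetic sketch, using the same η = 0.95 and initial V(cat) as above:

```python
import math

eta = 0.95
V_cat = [0.054, -0.090, -0.038, 0.063, -0.015]

# Every theta starts at zero, so f = sigmoid(0) = 0.5 for each sample.
f = 1.0 / (1.0 + math.exp(0.0))
g_pos = eta * (1 - f)   # gradient scale for the positive sample say
g_neg = eta * (0 - f)   # gradient scale for the negatives dog and woof

theta_say = [g_pos * v for v in V_cat]
theta_neg = [g_neg * v for v in V_cat]  # dog and woof receive the same update

# Compare against the table (its values are rounded to 3 decimals).
assert all(abs(a - b) < 5e-4
           for a, b in zip(theta_say, [0.026, -0.043, -0.018, 0.030, -0.007]))
assert all(abs(a - b) < 5e-4
           for a, b in zip(theta_neg, [-0.026, 0.043, 0.018, -0.030, 0.007]))
print("table values check out")
```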

Next, we train on the next sample: {say, cat}

Step 0: at this point,
w = say
Context(w) = {cat}
V(say) = [-0.010, -0.073, 0.065, -0.042, -0.044]

Step 1:
Initialize the vector e = [0, 0, 0, 0, 0]

Step 2:
First, the positive sample cat:
index = get_index(cat) = 0
Label(cat) = 1
f = σ(V(w)^T θ_index) ==>
σ(V(say)^T θ0) = σ((-0.010)·0 + (-0.073)·0 + 0.065·0 + (-0.042)·0 + (-0.044)·0) = σ(0) = 0.5

Step 3:
g = η(Label(cat) - f) = 0.95 · (1 - 0.5) = 0.475

Step 4:
e = e + g·θ_index ==>
e = e + g·θ0 = [0, 0, 0, 0, 0] + 0.475 · [0, 0, 0, 0, 0] = [0, 0, 0, 0, 0]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ0 = θ0 + g·V(say) = [0, 0, 0, 0, 0] + 0.475 · [-0.010, -0.073, 0.065, -0.042, -0.044] = [-0.005, -0.035, 0.031, -0.020, -0.021]

Step 6:
With the positive sample done, the setting Negative = 2 means we need to draw two words as negative samples; suppose the draw is Neg(cat) = {dog, ele}.

Step 2: (negative sample dog)
index = get_index(dog) = 2
Label(dog) = 0
f = σ(V(w)^T θ_index) ==>
σ(V(say)^T θ2) = σ((-0.010)·(-0.026) + (-0.073)·0.043 + 0.065·0.018 + (-0.042)·(-0.030) + (-0.044)·0.007) ≈ 0.5

Step 3:
g = η(Label(dog) - f) = 0.95 · (0 - 0.5) = -0.475

Step 4:
e = e + g·θ_index ==>
e = e + g·θ2 = [0, 0, 0, 0, 0] - 0.475 · [-0.026, 0.043, 0.018, -0.030, 0.007] = [0.012, -0.020, -0.009, 0.014, -0.003]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ2 = θ2 + g·V(say) = [-0.026, 0.043, 0.018, -0.030, 0.007] - 0.475 · [-0.010, -0.073, 0.065, -0.042, -0.044] = [-0.021, 0.078, -0.013, -0.010, 0.028]

One negative sample done, one to go:

Step 2: (negative sample ele)
index = get_index(ele) = 4
Label(ele) = 0
f = σ(V(w)^T θ_index) ==>
σ(V(say)^T θ4) = σ((-0.010)·0 + (-0.073)·0 + 0.065·0 + (-0.042)·0 + (-0.044)·0) = σ(0) = 0.5

Step 3:
g = η(Label(ele) - f) = 0.95 · (0 - 0.5) = -0.475

Step 4:
e = e + g·θ_index ==> (since θ4 is still zero, e keeps its value from the dog step)
e = e + g·θ4 = [0.012, -0.020, -0.009, 0.014, -0.003] - 0.475 · [0, 0, 0, 0, 0] = [0.012, -0.020, -0.009, 0.014, -0.003]

Step 5:
θ_index = θ_index + g·V(w) ==>
θ4 = θ4 + g·V(say) = [0, 0, 0, 0, 0] - 0.475 · [-0.010, -0.073, 0.065, -0.042, -0.044] = [0.005, 0.035, -0.031, 0.020, 0.021]

Step 7: update the word vector V(w)
V(w) = V(w) + e ==>
V(w) = [-0.010, -0.073, 0.065, -0.042, -0.044] + [0.012, -0.020, -0.009, 0.014, -0.003] = [0.002, -0.093, 0.056, -0.028, -0.047]

That completes the second training sample; the remaining samples follow the same procedure.
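The updates in this second pass can be double-checked with a short arithmetic sketch, using the rounded values from the worked steps above:

```python
# Arithmetic check for the second sample {say, cat}.
V_say = [-0.010, -0.073, 0.065, -0.042, -0.044]
theta_dog = [-0.026, 0.043, 0.018, -0.030, 0.007]  # theta_2 after the first sample
g = -0.475  # eta * (0 - f), with f still approximately 0.5

# Step 4 for negative sample dog: e accumulates g * theta_dog.
e = [g * t for t in theta_dog]
assert all(abs(a - b) < 5e-4
           for a, b in zip(e, [0.012, -0.020, -0.009, 0.014, -0.003]))

# Step 7: V(say) moves by e (the second negative, ele, contributed nothing).
V_new = [v + ei for v, ei in zip(V_say, e)]
print([round(x, 3) for x in V_new])
```

Note in particular that the third component ends up positive: 0.065 + (-0.009) = 0.056.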

This concludes the walkthrough of the Negative Sampling computation.
