Machine Learning Notes: Restricted Boltzmann Machines (3) Inference Task: Posterior Probability


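  • Introduction
    • Review: Model Representation of the Restricted Boltzmann Machine
    • Solving the Inference Task: Posterior Probability
      • Posterior of the Hidden Variables
      • Posterior of the Observed Variables
    • The Connection Between Restricted Boltzmann Machines and Neural Networks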

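Introduction

The previous section introduced the model representation (Representation) of the restricted Boltzmann machine (RBM). This section covers the inference task (Inference).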

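Review: Model Representation of the Restricted Boltzmann Machine

The probabilistic graph of a general Boltzmann machine is too complex and too expensive to compute with. The restricted Boltzmann machine addresses this by constraining the edges between nodes: edges exist only between hidden variables and observed variables, and there are no edges among the hidden variables or among the observed variables.
Consider a restricted Boltzmann machine of the following form:
[Figure 1: graph structure of the restricted Boltzmann machine]
As the figure shows, the restricted Boltzmann machine splits the set of random variables $\mathcal X$ into two parts:
$$\mathcal X = (x_1,x_2,\cdots,x_p)^T = \begin{pmatrix} h \\ v\end{pmatrix}$$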

  • The blue nodes are the random variables that make up the observed variables (written here as a vector): $v = (v_1,v_2,\cdots,v_n)^T$
  • The white nodes are the random variables that make up the hidden variables: $h = (h_1,h_2,\cdots,h_m)^T$
  • The two sets together contain all the variables: $m + n = p$

Under this model, the joint probability distribution $\mathcal P(\mathcal X)$ of the random variables $\mathcal X$ is:
$$\begin{aligned} \mathcal P(\mathcal X) = \mathcal P(h,v) & = \frac{1}{\mathcal Z} \exp \{- \mathbb E(h,v)\} \\ & = \frac{1}{\mathcal Z} \exp (v^T \mathcal W h + b^T v + c^T h) \\ & = \frac{1}{\mathcal Z} \left\{\prod_{j=1}^m \prod_{i=1}^n \exp (v_i \cdot w_{ij} \cdot h_j)\prod_{i=1}^n \exp (b_i v_i) \prod_{j=1}^m \exp (c_j h_j)\right\} \end{aligned}$$
Here $\mathcal W,b,c$ are the weights attached to the edges and nodes: $\mathcal W$ holds the edge weights, $b$ the weights of the observed nodes, and $c$ the weights of the hidden nodes.
$$\mathcal W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{pmatrix}_{n \times m} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}_{n \times 1} \quad c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}_{m \times 1}$$
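The joint distribution above can be evaluated directly for a very small model. The following is a minimal sketch using only the Python standard library; the function names and the toy parameter values (3 observed units, 2 hidden units) are illustrative assumptions, and the brute-force normalizer is only feasible for tiny models.

```python
import itertools
import math

def energy(v, h, W, b, c):
    """E(h, v) = -(v^T W h + b^T v + c^T h) for binary sequences v (length n) and h (length m)."""
    n, m = len(v), len(h)
    interaction = sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
    visible_term = sum(b[i] * v[i] for i in range(n))
    hidden_term = sum(c[j] * h[j] for j in range(m))
    return -(interaction + visible_term + hidden_term)

def partition_function(W, b, c):
    """Z: sum of exp(-E) over all 2^(n+m) binary states (brute force, tiny models only)."""
    n, m = len(b), len(c)
    return sum(
        math.exp(-energy(v, h, W, b, c))
        for v in itertools.product((0, 1), repeat=n)
        for h in itertools.product((0, 1), repeat=m)
    )

def joint_probability(v, h, W, b, c):
    """P(h, v) = exp(-E(h, v)) / Z."""
    return math.exp(-energy(v, h, W, b, c)) / partition_function(W, b, c)

# Hypothetical toy parameters: W is n x m, b has length n, c has length m.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, -0.1, 0.2]
c = [0.0, 0.5]

print(joint_probability((1, 0, 1), (0, 1), W, b, c))
```

Summing `joint_probability` over every state should give 1, which is a quick sanity check on the sign convention of the energy function.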

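Solving the Inference Task: Posterior Probability

The inference task for a restricted Boltzmann machine assumes that the model parameters $\mathcal W,b,c$ are all given (the model is known) and asks for probability distributions over the random variables $v,h$. Two kinds of results are of interest:

  • Posterior probabilities: the posterior of the observed variables $\mathcal P(v \mid h)$ and the posterior of the hidden variables $\mathcal P(h \mid v)$
  • Marginal probability: mainly the marginal distribution of the observed variables $\mathcal P(v)$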

Posterior of the Hidden Variables

We start with the hidden-variable posterior $\mathcal P(h \mid v)$. It is the joint posterior distribution over the whole set of hidden variables:
$$\mathcal P(h \mid v) = \mathcal P(h_1,h_2,\cdots,h_m \mid v)$$

To simplify the computation, assume that every random variable in $\mathcal X$ follows a Bernoulli distribution, so each observed or hidden variable takes one of two values: $\{0,1\}$.

By the structural constraint of the restricted Boltzmann machine, once $v$ is observed, any two hidden variables $h_i,h_j \in h$ with $i \neq j$ are conditionally independent:
$$h_i \perp h_j \mid v$$
See the global Markov property (Global Markov Property) in the structural representation of Markov random fields: there is no edge between $h_i$ and $h_j$, so every path between them passes through an observed variable. Once the observed variables are given, every such path is blocked, and the two hidden variables are conditionally independent.
Therefore $\mathcal P(h \mid v)$ simplifies to:
$$\mathcal P(h \mid v) = \prod_{l=1}^m \mathcal P(h_l \mid v)$$
It suffices to solve $\mathcal P(h_l \mid v)$.
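The factorization can be checked numerically on a tiny model by enumerating every hidden state. This is a sketch under assumed toy parameters; `unnormalized_joint`, `hidden_conditional` and `hidden_marginal` are hypothetical helper names, not part of any library.

```python
import itertools
import math

def unnormalized_joint(v, h, W, b, c):
    """exp(v^T W h + b^T v + c^T h); the 1/Z factor cancels in any conditional."""
    n, m = len(v), len(h)
    s = sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
    s += sum(b[i] * v[i] for i in range(n))
    s += sum(c[j] * h[j] for j in range(m))
    return math.exp(s)

def hidden_conditional(v, W, b, c):
    """Return {h: P(h | v)} by enumerating all 2^m hidden states."""
    states = list(itertools.product((0, 1), repeat=len(c)))
    weights = [unnormalized_joint(v, h, W, b, c) for h in states]
    total = sum(weights)
    return {h: w / total for h, w in zip(states, weights)}

def hidden_marginal(cond, l, value):
    """P(h_l = value | v), obtained by summing out the other hidden units."""
    return sum(p for h, p in cond.items() if h[l] == value)

# Hypothetical toy parameters (n = 3 observed units, m = 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, -0.1, 0.2]
c = [0.0, 0.5]
v = (1, 0, 1)

cond = hidden_conditional(v, W, b, c)
# Largest difference between P(h | v) and the product of its per-unit marginals.
max_gap = max(
    abs(p - math.prod(hidden_marginal(cond, l, h[l]) for l in range(len(h))))
    for h, p in cond.items()
)
print(max_gap)  # should be at floating-point rounding level
```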

  • First solve $\mathcal P(h_l = 1 \mid v)$. Recall what is known: since the model is given, the joint distribution $\mathcal P(\mathcal X)$ and the distributions $\mathcal P(h)$, $\mathcal P(v)$ derived from it are all determined. We therefore bring in the hidden variables other than $h_l$, written $h_{-l} = \{h_j\}_{j \neq l}$:
    Why can $h_{-l}$ be added to the condition? Given $v$, every hidden variable in $h_{-l}$ is conditionally independent of $h_l$, so $h_{-l}$ is an irrelevant condition and does not change the posterior of $h_l$.
    $$\mathcal P(h_l = 1\mid v) = \mathcal P(h_l =1\mid h_{-l},v)$$
    Expand this with the definition of conditional probability, marginalizing over $h_l$ in the denominator.
    In the rest of the derivation the expression above is denoted by $\Delta$.
    $$\begin{aligned} \Delta & = \frac{\mathcal P(h_l=1,h_{-l},v)}{\mathcal P(h_{-l},v)} \\ & = \frac{\mathcal P(h_l=1,h_{-l},v)}{\sum_{h_l = 0,1} \mathcal P(h_l,h_{-l},v)}\\ & = \frac{\mathcal P(h_l = 1,h_{-l},v)}{\mathcal P(h_l = 1,h_{-l},v) + \mathcal P(h_l = 0,h_{-l},v)} \end{aligned}$$

  • How is $\mathcal P(h_l = 1,h_{-l},v)$ computed? The joint distribution $\mathcal P(h,v)$ is known: it is exactly $\mathcal P(\mathcal X)$. So $\mathcal P(h_l = 1,h_{-l},v)$ is evaluated from the joint distribution, by splitting the energy function that defines it into two parts:
    $$\mathbb E(h,v) = -\left(\sum_{j=1}^m \sum_{i=1}^n v_i \cdot w_{ij} \cdot h_j + \sum_{i=1}^n b_i v_i + \sum_{j=1}^m c_j h_j\right)$$

    • The part that involves $h_l$:
      The terms attached to the hidden node $h_l$ are as follows.
      [Figure 2: subgraph containing the hidden node $h_l$ and its edges]
      Once $h_l$ is fixed, the function $\mathcal A_{h_l}(v)$ has no dependence on the other hidden nodes. $\mathcal H_l(v)$ denotes the function associated with $h_l$ whose only variable is $v$ (the model is known, so the parameters $w_{il},c_l$ are known).
      $$\begin{aligned} \mathcal A_{h_l}(v) & = h_l \sum_{i=1}^n w_{il} \cdot v_i + c_l \cdot h_l \\ & = h_l \left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right) \\ & = h_l \cdot \mathcal H_l(v) \end{aligned}$$
    • The remaining part that does not involve $h_l$:
      Everything outside the subgraph described above is unrelated to $h_l$ and is written $\mathcal H_{-l}(h_{-l},v)$. It involves every node except $h_l$.
      $$\begin{aligned} \mathcal H_{-l}(h_{-l},v) & = \sum_{j \neq l} \sum_{i=1}^n v_i \cdot w_{ij} \cdot h_j + \sum_{i=1}^n b_i v_i + \sum_{j \neq l} c_j h_j \\ \mathbb E(h,v) & = - (\mathcal A_{h_l}(v) + \mathcal H_{-l}(h_{-l},v)) \\ & = - \left[ h_l \cdot \mathcal H_l(v) + \mathcal H_{-l}(h_{-l},v)\right] \end{aligned}$$
  • Now return to the expression $\Delta$.

    • The numerator can be written as follows.
      Substitute $h_l=1$.
      $$\begin{aligned} \mathcal P(h_l=1,h_{-l},v) & = \frac{1}{\mathcal Z} \exp [ - \mathbb E(h,v)] \\ & = \frac{1}{\mathcal Z} \exp \left[\mathcal H_l(v) + \mathcal H_{-l}(h_{-l},v)\right] \end{aligned}$$
    • The denominator can be written as follows.
      Substitute $h_l=0$ for the second term.
      $$\mathcal P(h_l = 1,h_{-l},v) + \mathcal P(h_l = 0,h_{-l},v) = \frac{1}{\mathcal Z} \exp \left[\mathcal H_l(v) + \mathcal H_{-l}(h_{-l},v)\right] + \frac{1}{\mathcal Z} \exp [\mathcal H_{-l}(h_{-l},v)]$$

    Now divide both the numerator and the denominator by the numerator.
    The factors $\frac{1}{\mathcal Z}$ and $\exp[\mathcal H_{-l}(h_{-l},v)]$ cancel.
    $$\begin{aligned} \Delta & = \frac{\mathcal P(h_l = 1,h_{-l},v)}{\mathcal P(h_l = 1,h_{-l},v) + \mathcal P(h_l = 0,h_{-l},v)} \\ & = \frac{1}{1 + \frac{\mathcal P(h_l = 0,h_{-l},v)}{\mathcal P(h_l = 1,h_{-l},v)}} \\ & = \frac{1}{1 + \frac{\frac{1}{\mathcal Z} \exp [\mathcal H_{-l}(h_{-l},v)]}{\frac{1}{\mathcal Z} \exp \left[\mathcal H_l(v) + \mathcal H_{-l}(h_{-l},v)\right]}} \\ & = \frac{1}{1 + \exp \{-\mathcal H_l(v)\}} \end{aligned}$$
    This is exactly the form of the $\text{Sigmoid}$ function:
    $$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

  • Therefore, for discrete random variables following a Bernoulli distribution, and given that the observed variables $v$ of the restricted Boltzmann machine have been observed, the posterior $\mathcal P(h_l = 1 \mid v)$ of a hidden variable $h_l$ can be expressed with the $\text{Sigmoid}$ function.
    (Yes, that is quite a stack of conditions.)
    Every quantity in the expression is now known: $w_{il},c_l$ are model parameters and $v_i\ (i=1,2,\cdots,n)$ are the observed values.
    $$\begin{aligned} \mathcal P(h_l = 1 \mid v) & = \sigma [\mathcal H_l(v)] \\ & = \frac{1}{1 + \exp [- \mathcal H_l(v)]} \\ & = \frac{1}{1 + \exp \left[-\left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right)\right]} \end{aligned}$$

This gives $\mathcal P(h_l = 1 \mid v)$. Likewise, $\mathcal P(h_l = 0 \mid v) = 1 - \mathcal P(h_l = 1 \mid v)$, which completes the conditional distribution $\mathcal P(h_l \mid v)$:
$$\mathcal P(h_l \mid v) = \begin{cases} \dfrac{1}{1 + \exp \left[-\left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right)\right]} & h_l = 1 \\ \\ \dfrac{\exp \left[-\left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right)\right]}{1 + \exp \left[-\left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right)\right]} & h_l = 0 \end{cases}$$
The posterior distribution $\mathcal P(h \mid v)$ over all hidden nodes then follows:
$$\mathcal P(h \mid v) = \prod_{j=1}^m \mathcal P(h_j \mid v)$$
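As a sanity check on the closed form, the sketch below computes $\mathcal P(h \mid v)$ as a product of per-unit Sigmoid posteriors and compares it with a brute-force enumeration of the joint distribution. The function names and toy parameters are illustrative assumptions, with $\mathcal W$ stored as an $n \times m$ nested list so that `W[i][l]` is $w_{il}$.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_unit_posterior(v, W, c):
    """[P(h_l = 1 | v) for each l], using sigmoid(sum_i w_il * v_i + c_l)."""
    return [
        sigmoid(sum(W[i][l] * v[i] for i in range(len(v))) + c[l])
        for l in range(len(c))
    ]

def hidden_posterior(h, v, W, c):
    """P(h | v) as the product of the per-unit Bernoulli posteriors."""
    p_on = hidden_unit_posterior(v, W, c)
    return math.prod(p if h_l == 1 else 1.0 - p for h_l, p in zip(h, p_on))

def hidden_posterior_brute_force(h, v, W, b, c):
    """P(h | v) = P(h, v) / sum over h' of P(h', v); 1/Z cancels."""
    def score(hh):
        s = sum(v[i] * W[i][j] * hh[j] for i in range(len(v)) for j in range(len(hh)))
        s += sum(b[i] * v[i] for i in range(len(v)))
        s += sum(c[j] * hh[j] for j in range(len(hh)))
        return math.exp(s)
    total = sum(score(hh) for hh in itertools.product((0, 1), repeat=len(c)))
    return score(h) / total

# Hypothetical toy parameters (n = 3 observed units, m = 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, -0.1, 0.2]
c = [0.0, 0.5]
v = (1, 0, 1)

for h in itertools.product((0, 1), repeat=len(c)):
    print(h, hidden_posterior(h, v, W, c), hidden_posterior_brute_force(h, v, W, b, c))
```

The closed form never touches $b$ or $\mathcal Z$, which matches the cancellation in the derivation: only the terms involving $h_l$ survive.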

Posterior of the Observed Variables

The posterior $\mathcal P(v \mid h)$ is solved in exactly the same way as $\mathcal P(h \mid v)$:
$$\mathcal P(v \mid h) = \prod_{i=1}^n \mathcal P(v_i \mid h)$$
Because the random variables in $v$ are conditionally independent given $h$, we again pick a single variable $v_k$ from $v$. Since $v_k$ also follows a Bernoulli distribution, $\mathcal P(v_k = 1 \mid h)$ can be written as follows.
The denominator is expanded by summing over the two values of $v_k$ (the variables are discrete, so this is a sum rather than an integral).
This time the numerator and denominator are divided by $\mathcal P(v_k = 1,v_{-k},h)$ right away:
$$\begin{aligned} \mathcal P(v_k = 1 \mid h) & = \mathcal P(v_k = 1 \mid v_{-k},h) \\ & = \frac{\mathcal P(v_k = 1,v_{-k},h)}{\mathcal P(v_{-k},h)}\\ & = \frac{\mathcal P(v_k = 1,v_{-k},h)}{\mathcal P(v_k = 1,v_{-k},h) + \mathcal P(v_k = 0,v_{-k},h)} \\ & = \frac{1}{1 + \frac{\mathcal P(v_k = 0,v_{-k},h)}{\mathcal P(v_k = 1,v_{-k},h)}} \end{aligned}$$
We now need $\mathcal P(v_k = 0,v_{-k},h)$ and $\mathcal P(v_k = 1,v_{-k},h)$. The random variables related to $v_k$ (those connected to it by an edge) are:
[Figure 3: subgraph containing the observed node $v_k$ and its edges]
Again split the nodes into the part related to $v_k$ and the part unrelated to it. The energy function becomes:
$$\begin{aligned} \mathbb E(h,v) & = -[v_k \cdot \mathcal V_k(h) + \mathcal V_{-k}(v_{-k},h)] \\ \mathcal V_k(h) & = \sum_{j=1}^m w_{kj} \cdot h_j + b_k \\ \mathcal V_{-k}(v_{-k},h) & = \sum_{j=1}^m \sum_{i \neq k} v_i \cdot w_{ij} \cdot h_j + \sum_{i\neq k} b_i v_i + \sum_{j=1}^m c_j h_j \end{aligned}$$
Substituting $v_k= 1$ and $v_k= 0$ gives:
$$\begin{cases} \mathcal P(v_k = 0,v_{-k},h) = \frac{1}{\mathcal Z} \exp \{\mathcal V_{-k}(v_{-k},h)\} \\ \mathcal P(v_k = 1,v_{-k},h) = \frac{1}{\mathcal Z} \exp \{\mathcal V_k(h) + \mathcal V_{-k}(v_{-k},h)\} \end{cases}$$
Finally, $\mathcal P(v_k = 1 \mid h)$ is:
$$\begin{aligned} \mathcal P(v_k = 1 \mid h) & = \frac{1}{1 + \exp [-\mathcal V_k(h)]} \\ & = \frac{1}{1 + \exp \left[-\left(\sum_{j=1}^m w_{kj} \cdot h_j + b_k\right)\right]} \end{aligned}$$
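The observed-variable posterior can be checked the same way. The sketch below evaluates $\mathcal P(v_k = 1 \mid h)$ with the Sigmoid form and, separately, by enumerating all observed states; names and parameter values are assumptions chosen for illustration, with `W[k][j]` standing for $w_{kj}$.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def visible_unit_posterior(h, W, b):
    """[P(v_k = 1 | h) for each k], using sigmoid(sum_j w_kj * h_j + b_k)."""
    return [
        sigmoid(sum(W[k][j] * h[j] for j in range(len(h))) + b[k])
        for k in range(len(b))
    ]

def visible_unit_posterior_brute_force(h, W, b, c):
    """Same quantities, obtained by enumerating all 2^n observed states."""
    n = len(b)
    states = list(itertools.product((0, 1), repeat=n))

    def score(v):
        s = sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(len(h)))
        s += sum(b[i] * v[i] for i in range(n))
        s += sum(c[j] * h[j] for j in range(len(h)))
        return math.exp(s)

    weights = [score(v) for v in states]
    total = sum(weights)
    return [
        sum(w for v, w in zip(states, weights) if v[k] == 1) / total
        for k in range(n)
    ]

# Hypothetical toy parameters (n = 3 observed units, m = 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.1, -0.1, 0.2]
c = [0.0, 0.5]
h = (1, 1)

print(visible_unit_posterior(h, W, b))
print(visible_unit_posterior_brute_force(h, W, b, c))
```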

The Connection Between Restricted Boltzmann Machines and Neural Networks

Look again at the $\text{Sigmoid}$ function that expresses $\mathcal P(h_l = 1 \mid v)$:
$$\text{Sigmoid} \left(\sum_{i=1}^n w_{il} \cdot v_i + c_l\right)$$
The argument of the $\text{Sigmoid}$ function is clearly a linear computation.
The observed vector $v = (v_1,v_2,\cdots,v_n)^T$ is the input; $\mathcal W_l = (w_{1l},w_{2l},\cdots,w_{nl})^T$, the $l$-th column of $\mathcal W$, holds the weights; $c_l$ is the bias.
$$\begin{aligned} \sum_{i=1}^n w_{il} \cdot v_i + c_l & = (w_{1l},w_{2l},\cdots,w_{nl})\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} + c_l \\ & = \mathcal W_l^T \cdot v + c_l \end{aligned}$$
This links the restricted Boltzmann machine to neural networks. Treat each observed variable $v_i\ (i=1,2,\cdots,n)$ as an input neuron; the hidden variables of the restricted Boltzmann machine can then be viewed as the hidden layer of a neural network whose activation function is the $\text{Sigmoid}$ function.
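Read this way, computing every $\mathcal P(h_l = 1 \mid v)$ is the forward pass of one fully connected layer. The following sketch makes that reading concrete; `column`, `hidden_layer` and the toy values are hypothetical names and numbers chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def column(W, l):
    """W_l = (w_1l, ..., w_nl)^T, the l-th column of the n x m matrix W."""
    return [row[l] for row in W]

def hidden_layer(v, W, c):
    """Sigmoid layer: unit l outputs sigmoid(W_l^T v + c_l) = P(h_l = 1 | v)."""
    return [
        sigmoid(sum(w * x for w, x in zip(column(W, l), v)) + c[l])
        for l in range(len(c))
    ]

# Hypothetical toy parameters (n = 3 inputs, m = 2 hidden units).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
c = [0.0, 0.5]

# A small batch of observed vectors, each passed through the same layer.
batch = [(1, 0, 1), (0, 1, 1), (0, 0, 0)]
activations = [hidden_layer(v, W, c) for v in batch]
for v, a in zip(batch, activations):
    print(v, a)
```

The difference from an ordinary feed-forward layer is one of interpretation: here each activation is the probability that a binary hidden unit equals 1, not a deterministic output.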

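The next section covers the other inference task for the restricted Boltzmann machine: solving the marginal probability.

Related reference:
机器学习-受限玻尔兹曼机(5)-模型推断(Inference)-后验概率 (Machine Learning, Restricted Boltzmann Machine (5), Model Inference, Posterior Probability)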
