Derivation of the non-binary cases of the Restricted Boltzmann Machine (RBM) in machine learning
http://blog.51cto.com/13345387/1971665
Restricted Boltzmann Machine
http://www.cnblogs.com/neopenx/p/4399336.html
The two articles above explain the non-binary cases of the RBM in considerable detail.
(1)Binary-Binary
$$E\left(v,h\right) = -\sum_{j=1}^{n}\sum_{i=1}^{m}w_{ij}h_{j}v_{i} - \sum_{i=1}^{m}a_{i}v_{i} - \sum_{j=1}^{n}b_{j}h_{j}$$
(2)Gaussian-Binary
$$E\left(v,h\right) = -\sum_{j=1}^{n}\sum_{i=1}^{m}w_{ij}h_{j}\frac{v_{i}}{\sigma_{i}} + \sum_{i=1}^{m}\frac{\left(v_{i}-a_{i}\right)^2}{2\sigma_{i}^2} - \sum_{j=1}^{n}b_{j}h_{j}$$
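Under this energy, the conditional distributions take the standard Gaussian-Binary form: each visible unit is Gaussian given the hidden units, while the hidden units remain sigmoid-activated:

$$p\left(v_{i}\mid h\right) = \mathcal{N}\left(a_{i}+\sigma_{i}\sum_{j=1}^{n}w_{ij}h_{j},\;\sigma_{i}^{2}\right), \qquad p\left(H_{j}=1\mid v\right) = \sigma\left(\sum_{i=1}^{m}w_{ij}\frac{v_{i}}{\sigma_{i}}+b_{j}\right)$$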
(3)Gaussian-Gaussian
It is quite unstable, so I have not looked into it.
We know that the weight update is:
$$\Delta w_{ij} = \epsilon\left(\left\langle v_{i}h_{j}\right\rangle_{data} - \left\langle v_{i}h_{j}\right\rangle_{model}\right)$$
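The visible and hidden biases are updated in the same way:

$$\Delta a_{i} = \epsilon\left(\left\langle v_{i}\right\rangle_{data} - \left\langle v_{i}\right\rangle_{model}\right), \qquad \Delta b_{j} = \epsilon\left(\left\langle h_{j}\right\rangle_{data} - \left\langle h_{j}\right\rangle_{model}\right)$$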
The first term, the expectation under the data distribution, can be computed directly from the input:
$$p\left(H_{j}=1\mid v\right) = \sigma\left(\sum_{i=1}^{m}w_{ij}v_{i}+b_{j}\right)$$

$$p\left(V_{i}=1\mid h\right) = \sigma\left(\sum_{j=1}^{n}w_{ij}h_{j}+a_{i}\right)$$
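Both conditionals drop out of the energy in a couple of lines: given $v$, the energy is a sum of independent terms, one per hidden unit, so

$$p\left(H_{j}=1\mid v\right) = \frac{e^{\,b_{j}+\sum_{i}w_{ij}v_{i}}}{1+e^{\,b_{j}+\sum_{i}w_{ij}v_{i}}} = \sigma\left(\sum_{i=1}^{m}w_{ij}v_{i}+b_{j}\right),$$

and symmetrically for $p\left(V_{i}=1\mid h\right)$.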
However, the second term, the expectation under the model distribution, is hard to compute. Hinton proposed Contrastive Divergence (CD), which simplifies the weight update; it relies on Gibbs sampling.
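Concretely, CD-k approximates the model expectation with the statistics collected after k steps of Gibbs sampling started from the data (k = 1 is common):

$$\Delta w_{ij} \approx \epsilon\left(\left\langle v_{i}h_{j}\right\rangle_{data} - \left\langle v_{i}h_{j}\right\rangle_{k}\right)$$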
The only difference between a binary unit and a Gaussian unit is the way it is activated.
Both binary and Gaussian units decide whether to activate by comparing against a randomly generated value. A sigmoid function gives a unit's activation probability. For a binary unit, a uniform 0-to-1 random number is drawn: if the activation probability exceeds the random number, the unit activates and outputs 1. For a Gaussian unit, the random value is generated differently: it is drawn from a Gaussian distribution whose mean is the unit's activation and whose standard deviation (self.stddev in the code below) must be set in advance, typically a small value such as 0.01. Both rules are sketched next.
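A minimal sketch of the two sampling rules in plain NumPy (independent of the repo's API; sample_binary and sample_gaussian are names made up here):

```python
import numpy as np

def sample_binary(probs, rng):
    # Binary unit: output 1 when the activation probability exceeds
    # a uniform random number in [0, 1), else output 0.
    return (probs > rng.uniform(size=probs.shape)).astype(np.float32)

def sample_gaussian(activation, rng, stddev=0.01):
    # Gaussian unit: draw from a normal distribution centered at the
    # unit's activation, with a small, pre-set standard deviation.
    return rng.normal(loc=activation, scale=stddev)

rng = np.random.default_rng(0)
probs = 1.0 / (1.0 + np.exp(-np.array([0.5, -1.0, 2.0])))  # sigmoid
print(sample_binary(probs, rng))
print(sample_gaussian(np.array([0.62, 0.27, 0.88]), rng))
```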
Let's look at the code:
```python
def gibbs_sampling_step(self, visible, n_features):
    """Perform one step of gibbs sampling.

    :param visible: activations of the visible units
    :param n_features: number of features
    :return: tuple(hidden probs, hidden states, visible probs,
                   new hidden probs, new hidden states)
    """
    hprobs, hstates = self.sample_hidden_from_visible(visible)
    vprobs = self.sample_visible_from_hidden(hprobs, n_features)
    hprobs1, hstates1 = self.sample_hidden_from_visible(vprobs)

    return hprobs, hstates, vprobs, hprobs1, hstates1

def sample_visible_from_hidden(self, hidden, n_features):
    """Sample the visible units from the hidden units.

    This is the Negative phase of the Contrastive Divergence algorithm.

    :param hidden: activations of the hidden units
    :param n_features: number of features
    :return: visible probabilities
    """
    visible_activation = tf.add(
        tf.matmul(hidden, tf.transpose(self.W)),
        self.bv_
    )

    if self.visible_unit_type == 'bin':
        vprobs = tf.nn.sigmoid(visible_activation)

    elif self.visible_unit_type == 'gauss':
        # Gaussian visible units: sample around the activation with a
        # fixed, pre-set standard deviation (cf. the discussion above).
        vprobs = tf.truncated_normal(
            (1, n_features), mean=visible_activation, stddev=self.stddev)

    else:
        vprobs = None

    return vprobs

def sample_hidden_from_visible(self, visible):
    """Sample the hidden units from the visible units.

    This is the Positive phase of the Contrastive Divergence algorithm.

    :param visible: activations of the visible units
    :return: tuple(hidden probabilities, hidden binary states)
    """
    hprobs = tf.nn.sigmoid(tf.add(tf.matmul(visible, self.W), self.bh_))
    hstates = utilities.sample_prob(hprobs, self.hrand)

    return hprobs, hstates

def compute_positive_association(self, visible,
                                 hidden_probs, hidden_states):
    """Compute positive associations between visible and hidden units.

    :param visible: visible units
    :param hidden_probs: hidden units probabilities
    :param hidden_states: hidden units states
    :return: positive association = dot(visible.T, hidden)
    """
    if self.visible_unit_type == 'bin':
        # Binary visible units pair with the sampled hidden states ...
        positive = tf.matmul(tf.transpose(visible), hidden_states)

    elif self.visible_unit_type == 'gauss':
        # ... while Gaussian visible units pair with the hidden probabilities.
        positive = tf.matmul(tf.transpose(visible), hidden_probs)

    else:
        positive = None

    return positive
```
The source code above is from: Deep-Learning-TensorFlow
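Putting these methods together, one CD-1 weight update would look roughly like the sketch below. This is an assumption about how the pieces are wired up, not the repo's actual training step; learning_rate and batch_size are placeholder names.

```python
# One CD-1 update assembled from the methods above (a sketch;
# `learning_rate` and `batch_size` are hypothetical names).
hprobs0, hstates0, vprobs, hprobs1, _ = \
    self.gibbs_sampling_step(visible, n_features)

# Positive phase: <v h> measured on the data.
positive = self.compute_positive_association(visible, hprobs0, hstates0)

# Negative phase: <v h> measured on the one-step reconstruction.
negative = tf.matmul(tf.transpose(vprobs), hprobs1)

# dW = epsilon * (<v h>_data - <v h>_recon), cf. the update rule above.
w_update = self.W.assign_add(
    learning_rate * (positive - negative) / batch_size)
```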