Training a deep autoencoder or a classifier on MNIST digits: RBM training (Python)
1. RBM reading material
http://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
http://deeplearning.net/tutorial/rbm.html
2. Basic principles of RBM training
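As a brief recap to make the later references self-contained, these are the standard RBM formulas that the code below implements; the two conditionals are the ones the code analysis cites as Eqs. (7)-(8). Here W is the n_visible x n_hidden weight matrix, b the visible bias (vbias) and c the hidden bias (hbias):

    E(v, h)        = -b^T v - c^T h - v^T W h
    P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
    P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)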
3. RBM code analysis
We build an RBM class. The parameters of the network can either be initialized by the constructor or passed in as arguments. Passing them in is useful when an RBM is used as a building block of a deep network: in that case, the weight matrix and the hidden layer bias are shared with the corresponding sigmoid layer of a multilayer perceptron.
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""

    def __init__(self, input=None, n_visible=784, n_hidden=500,
                 W=None, hbias=None, vbias=None, numpy_rng=None,
                 theano_rng=None):
        """
        RBM constructor. Defines the parameters of the model along with
        basic operations for inferring hidden from visible (and vice-versa),
        as well as for performing CD updates.

        :param input: None for standalone RBMs or symbolic variable if RBM is
        part of a larger graph.

        :param n_visible: number of visible units

        :param n_hidden: number of hidden units

        :param W: None for standalone RBMs or symbolic variable pointing to a
        shared weight matrix in case RBM is part of a DBN network; in a DBN,
        the weights are shared between RBMs and layers of a MLP

        :param hbias: None for standalone RBMs or symbolic variable pointing
        to a shared hidden units bias vector in case RBM is part of a
        different network

        :param vbias: None for standalone RBMs or a symbolic variable
        pointing to a shared visible units bias
        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W`, which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) to
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so that
            # the code is runnable on GPU
            initial_W = numpy.asarray(numpy_rng.uniform(
                low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                size=(n_visible, n_hidden)),
                dtype=theano.config.floatX)
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W')

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(value=numpy.zeros(n_hidden,
                                  dtype=theano.config.floatX), name='hbias')

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(value=numpy.zeros(n_visible,
                                  dtype=theano.config.floatX), name='vbias')

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input if input is not None else T.dmatrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]
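As a quick illustration of the two ways the constructor is meant to be used, here is a minimal sketch (the names x, shared_W and shared_hbias are hypothetical; it assumes the imports at the top of the class definition):

x = T.matrix('x')  # symbolic minibatch of flattened 28x28 MNIST digits

# standalone RBM: W, hbias and vbias are created internally
rbm = RBM(input=x, n_visible=28 * 28, n_hidden=500)

# RBM used as a building block of a DBN: reuse the shared weight matrix and
# hidden bias of the corresponding sigmoid layer, so that pre-training
# updates those shared parameters in place
shared_W = theano.shared(numpy.asarray(
    numpy.random.uniform(low=-0.1, high=0.1, size=(28 * 28, 500)),
    dtype=theano.config.floatX), name='W')
shared_hbias = theano.shared(numpy.zeros(500, dtype=theano.config.floatX),
                             name='hbias')
rbm_layer0 = RBM(input=x, n_visible=28 * 28, n_hidden=500,
                 W=shared_W, hbias=shared_hbias)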
The next step is to define functions that construct the symbolic graph associated with Eqs. (7)-(8). The code is as follows:
    def propup(self, vis):
        '''This function propagates the visible units activation upwards to
        the hidden units

        Note that we return also the pre_sigmoid_activation of the layer. As
        it will turn out later, due to how Theano deals with optimization and
        stability this symbolic variable will be needed to write down a more
        stable graph (see details in the reconstruction cost function)
        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        '''This function infers state of hidden units given visible units'''
        # compute the activation of the hidden units given a sample of the
        # visibles
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # get a sample of the hiddens given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape, n=1,
                                             p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def propdown(self, hid):
        '''This function propagates the hidden units activation downwards to
        the visible units

        Note that we return also the pre_sigmoid_activation of the layer. As
        it will turn out later, due to how Theano deals with optimization and
        stability this symbolic variable will be needed to write down a more
        stable graph (see details in the reconstruction cost function)
        '''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_v_given_h(self, h0_sample):
        '''This function infers state of visible units given hidden units'''
        # compute the activation of the visible given the hidden sample
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        # get a sample of the visible given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape, n=1,
                                             p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]
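To check these methods in isolation, one can compile one of them into a callable with theano.function and feed it a small batch. This is a minimal sketch; rbm is the instance created in the earlier sketch, and the state updates registered by theano_rng are applied automatically when the function is compiled:

v = T.matrix('v')
pre_h, h_mean, h_sample = rbm.sample_h_given_v(v)
sample_hidden = theano.function([v], [h_mean, h_sample])

# feed a small batch of random binary "digits"
batch = numpy.random.binomial(n=1, p=0.5, size=(10, 28 * 28)).astype(
    theano.config.floatX)
h_mean_val, h_sample_val = sample_hidden(batch)
# h_mean_val holds P(h=1|v); h_sample_val is a binary sample drawn from it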
We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:
• gibbs_vhv, which performs a step of Gibbs sampling starting from the visible units. As we shall see, this will be useful for sampling from the RBM.
• gibbs_hvh, which performs a step of Gibbs sampling starting from the hidden units. This function will be useful for performing CD and PCD updates.
The code is as follows:
    def gibbs_hvh(self, h0_sample):
        '''This function implements one step of Gibbs sampling,
        starting from the hidden state'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        '''This function implements one step of Gibbs sampling,
        starting from the visible state'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]
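These one-step functions are the pieces from which the CD-k and PCD updates mentioned above are built. Below is a minimal sketch of how gibbs_hvh can be chained for k steps with theano.scan; the value k = 15 and the variable names are illustrative, not part of the code above:

# start the chain from the hidden sample inferred from the training data
pre_h0, h0_mean, h0_sample = rbm.sample_h_given_v(rbm.input)

# scan feeds the last output of gibbs_hvh (the new hidden sample) back in;
# only the final entry of outputs_info is recurrent, the rest are None
k = 15
outputs, updates = theano.scan(
    rbm.gibbs_hvh,
    outputs_info=[None, None, None, None, None, h0_sample],
    n_steps=k)
chain_end = outputs[2][-1]  # visible sample at the end of the chain

Note that the updates dictionary returned by scan contains the state updates of theano_rng and must be forwarded to the theano.function that performs the training updates; otherwise the chain would reuse the same random numbers at every call.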