Tutorial: http://www.deeplearning.net/tutorial/rnnrbm.html#rnnrbm
The code, dataset, and paper are all linked from the tutorial.
The RNN-RBM is also an energy-based model, used for density estimation of temporal sequences in which the feature vector $v^{(t)}$ at time step $t$ may be high-dimensional. It can describe multimodal conditional distributions $P(v^{(t)} \mid \mathcal{A}^{(t)})$, where $\mathcal{A}^{(t)} \equiv \{v^{(\tau)} \mid \tau < t\}$ denotes the sequence history at time $t$ (all frames before $t$).
Each time step is one RBM, and the RBM's parameters are in turn determined by an RNN (with hidden units $u^{(t)}$): the biases of the conditional RBM at time $t$ depend on $u^{(t-1)}$,

$$b_v^{(t)} = b_v + W_{uv}\, u^{(t-1)} \tag{1}$$

$$b_h^{(t)} = b_h + W_{uh}\, u^{(t-1)} \tag{2}$$

and the single-layer RNN recurrence relation is

$$u^{(t)} = \tanh\left(b_u + W_{uu}\, u^{(t-1)} + W_{vu}\, v^{(t)}\right) \tag{3}$$

The overall log-likelihood of a given sequence is the sum over its $T$ time steps:

$$\log P\left(\{v^{(t)}\}\right) = \sum_{t=1}^{T} \log P\left(v^{(t)} \mid \mathcal{A}^{(t)}\right) \tag{4}$$

where each term on the right-hand side is the marginalized probability of the $t^{\mathrm{th}}$ RBM (its hidden units summed out).
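To make equations (1)-(3) concrete, here is a minimal NumPy sketch of a single generation time step; the toy dimensions, the seed, and the helper names are hypothetical stand-ins for the real Theano code shown later:

```python
import numpy as np

# Hypothetical toy dimensions matching the tutorial's defaults.
n_v, n_h, n_u = 88, 150, 100
rng = np.random.default_rng(0)
W   = rng.normal(scale=0.01,   size=(n_v, n_h))  # RBM weights
Wuv = rng.normal(scale=0.0001, size=(n_u, n_v))  # u -> visible bias
Wuh = rng.normal(scale=0.0001, size=(n_u, n_h))  # u -> hidden bias
Wvu = rng.normal(scale=0.0001, size=(n_v, n_u))  # v -> u
Wuu = rng.normal(scale=0.0001, size=(n_u, n_u))  # u -> u
bv, bh, bu = np.zeros(n_v), np.zeros(n_h), np.zeros(n_u)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_generation_step(u_prev, k=25):
    """Sample v_t from the conditional RBM, then update the RNN state."""
    bv_t = bv + u_prev @ Wuv                    # equation (1)
    bh_t = bh + u_prev @ Wuh                    # equation (2)
    v = np.zeros(n_v)                           # start the Gibbs chain at 0
    for _ in range(k):                          # k steps of blocked Gibbs
        h = (rng.random(n_h) < sigmoid(v @ W + bh_t)).astype(float)
        v = (rng.random(n_v) < sigmoid(h @ W.T + bv_t)).astype(float)
    u_t = np.tanh(bu + v @ Wvu + u_prev @ Wuu)  # equation (3)
    return v, u_t
```

Iterating `one_generation_step` starting from $u^{(0)} = 0$ yields a sampled sequence, which is exactly what the symbolic generation loop below does with Theano's scan.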
The tutorial implements two functions: one trains the RNN-RBM, the other generates sample sequences.
During training, the sequence $\{v^{(t)}\}$ is given, so the RNN hidden states $u^{(t)}$ and the RBM parameters $b_v^{(t)}, b_h^{(t)}$ can all be computed deterministically. Parameters are updated by stochastic gradient descent (SGD); as in ordinary RBM training, the gradient is approximated with the contrastive divergence (CD) algorithm.
Sequence generation is similar to that of a plain RNN, except that at each time step $v^{(t)}$ has to be obtained by Gibbs sampling from that step's RBM.
The function build_rbm builds the Gibbs chain of the RBM part (the upper portion of the tutorial's figure). Its input can be a mini-batch (a binary matrix) or a single example, i.e. a mini-batch of size 1 (a binary vector).
```python
import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

rng = RandomStreams(seed=numpy.random.randint(1 << 30))


def build_rbm(v, W, bv, bh, k):
    '''Construct a k-step Gibbs chain starting at v for an RBM.

    v : Theano vector or matrix
        If a matrix, multiple chains will be run in parallel (batch).
    W : Theano matrix
        Weight matrix of the RBM.
    bv : Theano vector
        Visible bias vector of the RBM.
    bh : Theano vector
        Hidden bias vector of the RBM.
    k : scalar or Theano scalar
        Length of the Gibbs chain.

    Return a (v_sample, cost, monitor, updates) tuple:

    v_sample : Theano vector or matrix with the same shape as `v`
        Corresponds to the generated sample(s).
    cost : Theano scalar
        Expression whose gradient with respect to W, bv, bh is the CD-k
        approximation to the log-likelihood of `v` (training example) under
        the RBM. The cost is averaged in the batch case.
    monitor : Theano scalar
        Pseudo log-likelihood (also averaged in the batch case).
    updates : dictionary of Theano variable -> Theano variable
        The `updates` object returned by scan.'''

    def gibbs_step(v):
        # One step of blocked Gibbs sampling: sample the hidden units
        # given the visibles, then the visibles given the hiddens.
        mean_h = T.nnet.sigmoid(T.dot(v, W) + bh)
        h = rng.binomial(size=mean_h.shape, n=1, p=mean_h,
                         dtype=theano.config.floatX)
        mean_v = T.nnet.sigmoid(T.dot(h, W.T) + bv)
        v = rng.binomial(size=mean_v.shape, n=1, p=mean_v,
                         dtype=theano.config.floatX)
        return mean_v, v

    chain, updates = theano.scan(lambda v: gibbs_step(v)[1], outputs_info=[v],
                                 n_steps=k)
    v_sample = chain[-1]

    mean_v = gibbs_step(v_sample)[0]
    # Pseudo log-likelihood, used only to monitor training progress.
    monitor = T.xlogx.xlogy0(v, mean_v) + T.xlogx.xlogy0(1 - v, 1 - mean_v)
    monitor = monitor.sum() / v.shape[0]

    def free_energy(v):
        return -(v * bv).sum() - T.log(1 + T.exp(T.dot(v, W) + bh)).sum()
    # CD-k cost: free-energy difference between the data and the samples
    # at the end of the Gibbs chain.
    cost = (free_energy(v) - free_energy(v_sample)) / v.shape[0]

    return v_sample, cost, monitor, updates
```
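As a quick sanity check on how build_rbm is meant to be used, the sketch below wires it into a standalone CD-15 trainer for a plain RBM; the dimensions and learning rate here are illustrative assumptions, not part of the tutorial:

```python
# Hypothetical standalone usage of build_rbm (88 visible, 150 hidden units).
v = T.matrix()
W = theano.shared(numpy.random.normal(
    scale=0.01, size=(88, 150)).astype(theano.config.floatX))
bv = theano.shared(numpy.zeros(88, dtype=theano.config.floatX))
bh = theano.shared(numpy.zeros(150, dtype=theano.config.floatX))

v_sample, cost, monitor, updates = build_rbm(v, W, bv, bh, k=15)
# The negative particles are held constant when taking the CD gradient.
gradient = T.grad(cost, [W, bv, bh], consider_constant=[v_sample])
updates.update((p, p - 0.001 * g) for p, g in zip([W, bv, bh], gradient))
train_rbm = theano.function([v], monitor, updates=updates)
```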
The function build_rnnrbm ties the RNN and the RBM together, wired as in the figure referenced above.
During training, the RNN part (the lower portion of the figure) does not depend on the RBM samples: since $\{v^{(t)}\}$ is known, the hidden states $u^{(0)}$ through $u^{(T)}$ are first computed with equation (3), and the $T$ conditional RBMs can then all be built in a single batch.
Once the model is trained, generation makes the RNN and the RBM interact: at each time step $t$, $v^{(t)}$ has to be obtained by Gibbs sampling from the RBM at time $t$; it then determines the RNN hidden state $u^{(t)}$ and, through equations (1)-(3), the RBMs and RNN states of the subsequent time steps.
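The initialization helpers used below, shared_normal and shared_zeros, come from the tutorial's rnnrbm.py; they are reproduced here so the code is self-contained:

```python
def shared_normal(num_rows, num_cols, scale=1):
    '''Initialize a matrix shared variable with normally distributed
    elements.'''
    return theano.shared(numpy.random.normal(
        scale=scale, size=(num_rows, num_cols)).astype(theano.config.floatX))


def shared_zeros(*shape):
    '''Initialize a vector shared variable with zero elements.'''
    return theano.shared(numpy.zeros(shape, dtype=theano.config.floatX))
```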
```python
def build_rnnrbm(n_visible, n_hidden, n_hidden_recurrent):
    '''Construct a symbolic RNN-RBM and initialize parameters.

    n_visible : integer
        Number of visible units.
    n_hidden : integer
        Number of hidden units of the conditional RBMs.
    n_hidden_recurrent : integer
        Number of hidden units of the RNN.

    Return a (v, v_sample, cost, monitor, params, updates_train, v_t,
    updates_generate) tuple:

    v : Theano matrix
        Symbolic variable holding an input sequence (used during training)
    v_sample : Theano matrix
        Symbolic variable holding the negative particles for CD
        log-likelihood gradient estimation (used during training)
    cost : Theano scalar
        Expression whose gradient (considering v_sample constant) corresponds
        to the LL gradient of the RNN-RBM (used during training)
    monitor : Theano scalar
        Frame-level pseudo-likelihood (useful for monitoring during training)
    params : tuple of Theano shared variables
        The parameters of the model to be optimized during training.
    updates_train : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the training function.
    v_t : Theano matrix
        Symbolic variable holding a generated sequence (used during sampling)
    updates_generate : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the generation function.'''

    W = shared_normal(n_visible, n_hidden, 0.01)
    bv = shared_zeros(n_visible)
    bh = shared_zeros(n_hidden)
    Wuh = shared_normal(n_hidden_recurrent, n_hidden, 0.0001)
    Wuv = shared_normal(n_hidden_recurrent, n_visible, 0.0001)
    Wvu = shared_normal(n_visible, n_hidden_recurrent, 0.0001)
    Wuu = shared_normal(n_hidden_recurrent, n_hidden_recurrent, 0.0001)
    bu = shared_zeros(n_hidden_recurrent)

    params = W, bv, bh, Wuh, Wuv, Wvu, Wuu, bu  # learned parameters as
                                                # shared variables

    v = T.matrix()  # a training sequence
    u0 = T.zeros((n_hidden_recurrent,))  # initial value for the RNN
                                         # hidden units

    # If `v_t` is given, deterministic recurrence to compute the variable
    # biases bv_t, bh_t at each time step. If `v_t` is None, same recurrence
    # but with a separate Gibbs chain at each time step to sample (generate)
    # from the RNN-RBM. The resulting sample v_t is returned in order to be
    # passed down to the sequence history.
    def recurrence(v_t, u_tm1):
        bv_t = bv + T.dot(u_tm1, Wuv)  # equation (1)
        bh_t = bh + T.dot(u_tm1, Wuh)  # equation (2)
        generate = v_t is None
        if generate:
            v_t, _, _, updates = build_rbm(T.zeros((n_visible,)), W, bv_t,
                                           bh_t, k=25)
        u_t = T.tanh(bu + T.dot(v_t, Wvu) + T.dot(u_tm1, Wuu))  # equation (3)
        return ([v_t, u_t], updates) if generate else [u_t, bv_t, bh_t]

    # For training, the deterministic recurrence is used to compute all the
    # {bv_t, bh_t, 1 <= t <= T} given v. Conditional RBMs can then be trained
    # in batches using those parameters.
    (u_t, bv_t, bh_t), updates_train = theano.scan(
        lambda v_t, u_tm1, *_: recurrence(v_t, u_tm1),
        sequences=v, outputs_info=[u0, None, None], non_sequences=params)
    v_sample, cost, monitor, updates_rbm = build_rbm(v, W, bv_t[:], bh_t[:],
                                                     k=15)
    updates_train.update(updates_rbm)

    # symbolic loop for sequence generation
    (v_t, u_t), updates_generate = theano.scan(
        lambda u_tm1, *_: recurrence(None, u_tm1),
        outputs_info=[None, u0], non_sequences=params, n_steps=200)

    return (v, v_sample, cost, monitor, params, updates_train, v_t,
            updates_generate)
```
```python
import sys

import pylab

from midi.utils import midiread, midiwrite  # MIDI helpers shipped with
                                            # the tutorial


class RnnRbm:
    '''Simple class to train an RNN-RBM from MIDI files and to generate
    sample sequences.'''

    def __init__(self, n_hidden=150, n_hidden_recurrent=100, lr=0.001,
                 r=(21, 109), dt=0.3):
        '''Constructs and compiles Theano functions for training and sequence
        generation.

        n_hidden : integer
            Number of hidden units of the conditional RBMs.
        n_hidden_recurrent : integer
            Number of hidden units of the RNN.
        lr : float
            Learning rate
        r : (integer, integer) tuple
            Specifies the pitch range of the piano-roll in MIDI note numbers,
            including r[0] but not r[1], such that r[1]-r[0] is the number of
            visible units of the RBM at a given time step. The default (21,
            109) corresponds to the full range of piano (88 notes).
        dt : float
            Sampling period when converting the MIDI files into piano-rolls,
            or equivalently the time difference between consecutive time
            steps.'''
        self.r = r
        self.dt = dt
        (v, v_sample, cost, monitor, params, updates_train, v_t,
         updates_generate) = build_rnnrbm(r[1] - r[0], n_hidden,
                                          n_hidden_recurrent)

        # The negative particles v_sample are held constant in the gradient.
        gradient = T.grad(cost, params, consider_constant=[v_sample])
        updates_train.update(
            ((p, p - lr * g) for p, g in zip(params, gradient))
        )
        self.train_function = theano.function([v], monitor,
                                              updates=updates_train)
        self.generate_function = theano.function([], v_t,
                                                 updates=updates_generate)

    def train(self, files, batch_size=100, num_epochs=200):
        '''Train the RNN-RBM via stochastic gradient descent (SGD) using MIDI
        files converted to piano-rolls.

        files : list of strings
            List of MIDI files that will be loaded as piano-rolls for
            training.
        batch_size : integer
            Training sequences will be split into subsequences of at most
            this size before applying the SGD updates.
        num_epochs : integer
            Number of epochs (pass over the training set) performed. The user
            can safely interrupt training with Ctrl+C at any time.'''

        assert len(files) > 0, 'Training set is empty!' \
                               ' (did you download the data files?)'
        dataset = [midiread(f, self.r,
                            self.dt).piano_roll.astype(theano.config.floatX)
                   for f in files]

        try:
            for epoch in range(num_epochs):
                numpy.random.shuffle(dataset)
                costs = []

                for s, sequence in enumerate(dataset):
                    for i in range(0, len(sequence), batch_size):
                        cost = self.train_function(sequence[i:i + batch_size])
                        costs.append(cost)

                print('Epoch %i/%i' % (epoch + 1, num_epochs))
                print(numpy.mean(costs))
                sys.stdout.flush()

        except KeyboardInterrupt:
            print('Interrupted by user.')

    def generate(self, filename, show=True):
        '''Generate a sample sequence, plot the resulting piano-roll and save
        it as a MIDI file.

        filename : string
            A MIDI file will be created at this location.
        show : boolean
            If True, a piano-roll of the generated sequence will be shown.'''

        piano_roll = self.generate_function()
        midiwrite(filename, piano_roll, self.r, self.dt)
        if show:
            extent = (0, self.dt * len(piano_roll)) + self.r
            pylab.figure()
            pylab.imshow(piano_roll.T, origin='lower', aspect='auto',
                         interpolation='nearest', cmap=pylab.cm.gray_r,
                         extent=extent)
            pylab.xlabel('time (s)')
            pylab.ylabel('MIDI note number')
            pylab.title('generated piano-roll')
```
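End to end, the class is driven essentially like the tutorial's test_rnnrbm entry point; the data path below is an assumption about where the unpacked Nottingham MIDI files live:

```python
import glob

if __name__ == '__main__':
    model = RnnRbm()
    # Adjust this glob to wherever the Nottingham training MIDIs were
    # unpacked.
    files = glob.glob('data/Nottingham/train/*.mid')
    model.train(files, batch_size=100, num_epochs=200)
    model.generate('sample1.mid')
    model.generate('sample2.mid')
    pylab.show()  # display the generated piano-rolls
```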
Running 200 epochs on the Nottingham dataset takes roughly 24 hours of training.
The tutorial shows the piano-rolls of two sample sequences and provides the corresponding MIDI files.
I found the tutorial's description of the dataset a bit thin; it is not obvious what the inputs and outputs actually are, so here is a short explanation.
It is not hard to see that the piano-rolls are the input: each sample is a sequence lasting about 60 s (one frame per time step), where black cells are 1 and white cells are 0, matching the binary input mentioned above. The tutorial's figures make it quite clear what a piano-roll actually is.
Each sample in the Nottingham dataset is a matrix with roughly 150 rows (150 time steps) and 88 columns (MIDI note numbers 21-109, i.e. the full piano range from A0 to C8), representing the piano-roll of one tune; every matrix element is 0 or 1.
The goal is to predict the subsequent piano-roll frames from the piano-roll sequence seen so far.
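An easy way to verify all of this is to load a single file with the midiread helper shipped with the tutorial and inspect the resulting matrix; the filename below is a hypothetical placeholder:

```python
import numpy
from midi.utils import midiread  # MIDI helpers shipped with the tutorial

# 'some_tune.mid' stands in for any Nottingham MIDI file.
roll = midiread('some_tune.mid', (21, 109), 0.3).piano_roll
print(roll.shape)          # roughly (150, 88): time steps x MIDI pitches
print(numpy.unique(roll))  # [0. 1.]: the piano-roll is binary
```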