To build a neural network system, you must prepare linguistic features as the system input and acoustic features as the system output. Please prepare your data as described in this section.
Neural networks take vectors as input, so the alphabetic representation of the linguistic features needs to be vectorised.
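As a minimal illustration (not the toolkit's actual front-end code; the phone inventory and helper name are hypothetical), a categorical linguistic feature such as a phone identity can be vectorised with one-hot encoding:

    import numpy as np

    # Hypothetical phone inventory; in practice this comes from your question set.
    PHONE_SET = ['sil', 'p', 'l', 'i', 'a']

    def one_hot_phone(phone):
        """Vectorise a phone identity as a binary (one-hot) vector."""
        vec = np.zeros(len(PHONE_SET), dtype=np.float32)
        vec[PHONE_SET.index(phone)] = 1.0
        return vec

    print(one_hot_phone('p'))  # [0. 1. 0. 0. 0.]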
The "composed" style:
dmgc refers to the dimensionality of the MCC features together with their delta and delta-delta features. If dmgc is set to 60, only the static features are used. Please set a file extension for each feature type, for example:
- [Extensions] mgc_ext : .mgc
- [Extensions] bap_ext : .bap
- [Extensions] lf0_ext : .lf0
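Since the acoustic feature files are headerless float32 binaries (see the ListDataProvider notes below), they can be inspected directly with numpy. A sketch, in which the file name and the dimensionality are assumptions:

    import numpy as np

    dmgc = 60  # static-only MGC dimensionality, matching the setting above

    # 'example.mgc' is a hypothetical feature file produced by your vocoder.
    data = np.fromfile('example.mgc', dtype=np.float32)
    frames = data.reshape(-1, dmgc)  # one row per frame
    print('%d frames of dimension %d' % (frames.shape[0], dmgc))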
The open-source WORLD vocoder is also supported; a version modified for SPSS (statistical parametric speech synthesis) can be found in the repository.
If you prefer your own vocoder, try mapping each of its features onto one of the supported feature types by giving it a matching nickname.
Several example recipes for standard neural network architectures are provided with the system; they are described below:
You can define your own architecture by choosing the hidden units for each hidden layer, as in the sketch below. For the supported hidden layer types, please see the Models section.
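For instance, a six-layer hybrid architecture could be described by two parallel lists, one entry per hidden layer. This is an illustrative sketch: the option names mirror the DeepRecurrentNetwork parameters documented below, and the layer-type strings are assumptions.

    # Five feed-forward TANH layers at the bottom with one LSTM layer on top.
    hidden_layer_size = [1024, 1024, 1024, 1024, 1024, 512]
    hidden_layer_type = ['TANH', 'TANH', 'TANH', 'TANH', 'TANH', 'LSTM']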
A sample configuration file can be found in the './recipes/dnn' directory. Use 'submit.sh ./run_lstm.py ./recipes/dnn/feed_foward_dnn.conf' to build a feed-forward neural network. Please modify the configuration file to suit your own working environment (e.g., the data paths).
A sample configuration file is provided at './recipes/dnn/hybrid_lstm.conf'. Follow the same steps as in the deep feed-forward neural network example.
Sample configuration files are provided in the './recipes/blstm' directory: 'blstm.conf' builds multiple bidirectional LSTM layers, while 'hybrid_blstm.conf' builds a hybrid architecture, i.e. several feed-forward layers at the bottom with one BLSTM layer on top.
This example recipe accompanies the paper by Wu & King (ICASSP 2016). Several variants of the LSTM are provided; please use the corresponding configuration file for each experiment.
The following details are not included in the docstrings in the source code.
models.deep_rnn.DeepRecurrentNetwork(n_in, hidden_layer_size, n_out, L1_reg, L2_reg, hidden_layer_type, output_type='LINEAR')
This class is used to assemble various neural network architectures, from a basic feed-forward network to bidirectional gated recurrent networks and hybrid architectures. "Hybrid" refers to a combined feed-forward and recurrent architecture.
__init__(n_in, hidden_layer_size, n_out, L1_reg, L2_reg, hidden_layer_type, output_type='LINEAR')
This function initialises a neural network.
Parameters:
build_finetune_functions(train_shared_xy, valid_shared_xy)
This function builds the fine-tuning functions and updates the gradients.
Parameters:
Returns: the fine-tuning functions for training and development
parameter_prediction(test_set_x)
This function performs prediction.
Parameters: test_set_x (python array variable) – the input features of one test sentence
Returns: the predicted features
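Putting the three methods together, training and prediction might look like the sketch below. Only the class and method names come from this page; the dimensionalities, the layer-type strings, and the assumption that build_finetune_functions returns a (train, validation) pair are illustrative.

    import numpy as np
    import theano
    from models.deep_rnn import DeepRecurrentNetwork

    net = DeepRecurrentNetwork(n_in=425, hidden_layer_size=[512, 512], n_out=187,
                               L1_reg=0.0, L2_reg=1e-5,
                               hidden_layer_type=['TANH', 'LSTM'],
                               output_type='LINEAR')

    # Theano shared variables holding input/output features (dummy data here).
    train_xy = (theano.shared(np.zeros((1000, 425), dtype=np.float32)),
                theano.shared(np.zeros((1000, 187), dtype=np.float32)))
    valid_xy = (theano.shared(np.zeros((200, 425), dtype=np.float32)),
                theano.shared(np.zeros((200, 187), dtype=np.float32)))

    # Assumed to return the fine-tuning functions for training and development.
    train_fn, valid_fn = net.build_finetune_functions(train_xy, valid_xy)

    # Predict acoustic features for one (dummy) test sentence of 100 frames.
    predicted = net.parameter_prediction(np.zeros((100, 425), dtype=np.float32))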
The following details are not included in the docstrings in the source code.
layers.gating.VanillaRNN(rng, x, n_in, n_h)
This class implements a standard recurrent neural network: h_t = f(W^{hx} x_t + W^{hh} h_{t-1} + b_h)
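Written out in numpy, one step of this recurrence looks as follows (an illustrative re-implementation of the formula, not the Theano code in layers.gating):

    import numpy as np

    def vanilla_rnn_step(x_t, h_tm1, W_hx, W_hh, b_h):
        """One step of h_t = f(W^{hx} x_t + W^{hh} h_{t-1} + b_h), with f = tanh."""
        return np.tanh(np.dot(W_hx, x_t) + np.dot(W_hh, h_tm1) + b_h)

    n_in, n_h = 4, 3
    rng = np.random.RandomState(0)
    W_hx, W_hh, b_h = rng.randn(n_h, n_in), rng.randn(n_h, n_h), np.zeros(n_h)

    h_t = np.zeros(n_h)
    for x_t in rng.randn(5, n_in):  # a five-frame input sequence
        h_t = vanilla_rnn_step(x_t, h_t, W_hx, W_hh, b_h)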
__init__(rng, x, n_in, n_h)
Initialise a standard RNN hidden unit.
Parameters:
recurrent_as_activation_function(Wix, h_tm1)
Implement the recurrent unit as an activation function. This function is called by self.__init__().
Parameters:
Returns: h_t, the hidden activation of the current time step
layers.gating.LstmBase(rng, x, n_in, n_h)
This class provides the base class for all long short-term memory (LSTM) related classes. Several variants of the LSTM are investigated in (Wu & King, ICASSP 2016): Zhizheng Wu, Simon King, "Investigating gated recurrent neural networks for speech synthesis", ICASSP 2016.
__init__(rng, x, n_in, n_h)
Initialise all the components in an LSTM block, including the input gate, output gate, forget gate, and peephole connections
Parameters:
lstm_as_activation_function()
A generic recurrent activation function for variants of the LSTM architecture. The function is called by self.recurrent_fn().
recurrent_fn(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1=None)
This implements a generic recurrent function, called by self.__init__().
Parameters:
Returns: h_t, the hidden activation of the current time step, and c_t, the cell-memory activation of the current time step
layers.gating.VanillaLstm(rng, x, n_in, n_h)
This class implements the standard (vanilla) LSTM block, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise a vanilla LSTM block
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the standard LSTM activation function. The meaning of each input and output parameter can be found in layers.gating.LstmBase.recurrent_fn()
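For reference, the standard LSTM activation can be sketched in numpy as below. The names Wix..Wox and h_tm1/c_tm1 follow the signature above (the precomputed input projections W*x_t and the previous hidden/cell states); the recurrent weights U_* are local to this sketch, and the peephole terms are omitted for brevity.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1, U_i, U_f, U_c, U_o):
        """One standard LSTM step (peephole connections omitted)."""
        i_t = sigmoid(Wix + np.dot(U_i, h_tm1))                      # input gate
        f_t = sigmoid(Wfx + np.dot(U_f, h_tm1))                      # forget gate
        c_t = f_t * c_tm1 + i_t * np.tanh(Wcx + np.dot(U_c, h_tm1))  # cell memory
        o_t = sigmoid(Wox + np.dot(U_o, h_tm1))                      # output gate
        h_t = o_t * np.tanh(c_t)                                     # hidden activation
        return h_t, c_t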
layers.gating.LstmNFG(rng, x, n_in, n_h)
This class implements an LSTM block without the forget gate, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise an LSTM without the forget gate
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the LSTM (without the forget gate) activation function. The meaning of each input and output parameter can be found in layers.gating.LstmBase.recurrent_fn()
layers.gating.LstmNIG(rng, x, n_in, n_h)
This class implements an LSTM block without the input gate, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise an LSTM without the input gate
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the LSTM (without the input gate) activation function. The meaning of each input and output parameter can be found in layers.gating.LstmBase.recurrent_fn()
layers.gating.LstmNOG(rng, x, n_in, n_h)
This class implements an LSTM block without the output gate, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise an LSTM without the output gate
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the LSTM (without the output gate) activation function.The meaning of each input and output parameters can be found inlayers.gating.LstmBase.recurrent_fn()
layers.gating.LstmNoPeepholes(rng, x, n_in, n_h)
This class implements an LSTM block without peephole connections, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise an LSTM without peephole connections
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the LSTM (without peephole connections) activation function. The meaning of each input and output parameter can be found in layers.gating.LstmBase.recurrent_fn()
layers.gating.SimplifiedLstm(rng, x, n_in, n_h)
This class implements a simplified LSTM block that keeps only the forget gate, inheriting from layers.gating.LstmBase.
__init__(rng, x, n_in, n_h)
Initialise a simplified LSTM with only the forget gate
Parameters:
lstm_as_activation_function(Wix, Wfx, Wcx, Wox, h_tm1, c_tm1)
This function treats the LSTM block as an activation function, and implements the simplified LSTM activation function. The meaning of each input and output parameter can be found in layers.gating.LstmBase.recurrent_fn()
layers.gating.GatedRecurrentUnit(rng, x, n_in, n_h)
This class implements a gated recurrent unit (GRU), as proposed in Cho et al. 2014 (http://arxiv.org/pdf/1406.1078.pdf).
__init__(rng, x, n_in, n_h)
Initialise a gated recurrent unit
Parameters:
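The GRU update of Cho et al. (2014) can be sketched in numpy as follows (an illustration with local weight names, not the class's Theano code):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_tm1, W_r, U_r, W_z, U_z, W_h, U_h):
        """One GRU step following Cho et al. (2014)."""
        r_t = sigmoid(np.dot(W_r, x_t) + np.dot(U_r, h_tm1))            # reset gate
        z_t = sigmoid(np.dot(W_z, x_t) + np.dot(U_z, h_tm1))            # update gate
        h_tilde = np.tanh(np.dot(W_h, x_t) + np.dot(U_h, r_t * h_tm1))  # candidate
        return z_t * h_tm1 + (1.0 - z_t) * h_tilde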
io_funcs.binary_io.BinaryIOCollection
utils.providers.ListDataProvider(x_file_list, y_file_list, n_ins=0, n_outs=0, buffer_size=500000, sequential=False, shuffle=False)
This class provides an interface to load data into CPU/GPU memory, either utterance by utterance or block by block.
In speech synthesis, we usually cannot load all the training/evaluation data into RAM, so we proceed in three steps: load one block of data into a buffer, train on the buffered data, and repeat until the whole file list has been consumed.
Utterance-by-utterance loading is used for sequential training; block-by-block loading is used when the order of frames does not matter.
This class assumes binary data in float32 precision without any header (e.g., an HTK header).
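Because the files are headerless float32, the byte size of each file must be a whole multiple of 4 * dimension, which allows a quick sanity check over a file list. A sketch (the helper name is hypothetical):

    import os

    def check_frame_count(file_name, dim):
        """Verify a headerless float32 file holds whole dim-dimensional frames."""
        n_bytes = os.path.getsize(file_name)
        assert n_bytes % (4 * dim) == 0, '%s: not %d-dimensional float32' % (file_name, dim)
        return n_bytes // (4 * dim)  # number of frames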
__init__(x_file_list, y_file_list, n_ins=0, n_outs=0, buffer_size=500000, sequential=False, shuffle=False)
Initialise a data provider.
Parameters:
load_next_partition()
Load one block of data; the number of frames equals the buffer size set at initialisation.
load_next_utterance()
Load one utterance of data. This method is called when data is loaded utterance by utterance (e.g., sequential training).
Make the data shared for the Theano implementation. To understand why the data is made shared, please refer to the Theano documentation: http://deeplearning.net/software/theano/library/compile/shared.html
Parameters:
Returns: the shared dataset – data_set
reset()
When all the files in the file list have been used for DNN training, reset the data provider to start a new epoch.
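A typical epoch loop over the provider might then look like this sketch. The constructor and the methods used are the ones documented above; the file lists, the dimensionalities, and the end-of-list test is_finish() are assumptions that may differ in the actual implementation.

    from utils.providers import ListDataProvider

    # Hypothetical file lists of equal length, one entry per utterance.
    x_file_list = ['train_0001.lab.bin', 'train_0002.lab.bin']
    y_file_list = ['train_0001.cmp', 'train_0002.cmp']

    provider = ListDataProvider(x_file_list, y_file_list, n_ins=425, n_outs=187,
                                buffer_size=500000, sequential=False, shuffle=True)

    for epoch in range(10):
        while not provider.is_finish():                  # assumed end-of-list test
            shared_xy = provider.load_next_partition()   # return value assumed
            # ... run the fine-tuning function on this partition ...
        provider.reset()                                 # start the next epoch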
frontend.label_normalisation.HTSLabelNormalisation(question_file_name=None, subphone_feats='full', continuous_flag=True)
This class converts HTS-format labels into continuous or binary values, stored in binary format with float32 precision.
QS: the same as the questions used in HTS.
CQS: a new question type defined in this system. Here is an example question: CQS C-Syl-Tone {_(\d+)+}. The regular expression is used to extract continuous values.
The HTS labels should contain time-alignment information. Here is an example of an HTS label:
3050000 3100000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[2]
3100000 3150000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[3]
3150000 3250000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[4]
3250000 3350000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[5]
3350000 3900000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[6]
3050000 3100000 are the start and end times; [2], [3], [4], [5], [6] are the HMM state indices.
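The timing fields and the state index can be recovered from such a line with ordinary string handling, as in this minimal sketch (the line is the first label line of the example above):

    import re

    line = ('3050000 3100000 xx~#-p+l=i:1_4/A/0_0_0/B/1-1-4:1-1&1-4#1-3$1-4>0-1'
            '<0-1|i/C/1+1+3/D/0_0/E/content+1:1+3&1+2#0+1/F/content_1/G/0_0'
            '/H/4=3:1=1&L-L%/I/0_0/J/4+3-1[2]')

    start_time, end_time, label = line.split(None, 2)
    state_index = int(re.search(r'\[(\d+)\]$', label).group(1))
    print(start_time, end_time, state_index)  # 3050000 3100000 2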
wildcards2regex(question, convert_number_pattern=False)
Convert an HTK-style question into a regular expression for searching the labels. If convert_number_pattern is True, the following sequences are kept unescaped to extract continuous values:
(\d+) – handles digits without a decimal point; ([\d\.]+) – handles digits with and without a decimal point
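To illustrate the difference between the two patterns, here is a small sketch with hand-written regular expressions and a hypothetical label fragment (not a call to wildcards2regex itself):

    import re

    label = '...C-Syl-Tone_3/D/...'   # hypothetical fragment, tone value 3

    # (\d+) captures an integer value after '_':
    print(re.search(r'_(\d+)', label).group(1))               # '3'

    # ([\d\.]+) also captures values with a decimal point, e.g. '_3.5':
    print(re.search(r'_([\d\.]+)', '..._3.5/...').group(1))   # '3.5'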