Versions used: EST 2.1, Festival 2.1, Festvox 2.4 (this version is compatible with Festival 2.1; the officially released Festvox 2.1 may have compatibility problems).
The process is fairly long: recording and labelling take a few hours each, and after that there
may still be extensive hand correction. It also requires a fair amount of disk space, so make sure
you have at least 500 MB free. The basic steps are as follows.
Create templates.
Generate prompts.
Record nonsense words.
Autolabel recorded words.
Generate diphone index.
Generate pitchmarks and LPC coefficients.
Test.
Package for distribution.
0. Before continuing, make sure that you have Speech Tools, Festival, FestVox and EMU Label
correctly installed. Also make sure the following environment variables are set:
export ESTDIR=/home/...../speech_tools
export FESTVOXDIR=/home/...../FestVox
export PATH=$PATH:/home/...../Festival/bin
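Before going further, it can help to confirm that the step 0 setup actually took effect. The following is a minimal sketch for a POSIX shell. check_festvox_env is a hypothetical helper and is not part of Festvox. It checks only ESTDIR, FESTVOXDIR and whether festival can be found on PATH.

```shell
# Hypothetical helper (not part of Festvox): check_festvox_env reports whether
# ESTDIR and FESTVOXDIR point to existing directories and whether the
# festival binary is reachable via PATH. Returns 0 if all checks pass, 1 otherwise;
# problems are printed to stderr.
check_festvox_env() {
    rc=0
    for var in ESTDIR FESTVOXDIR; do
        # Read the value of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set" >&2
            rc=1
        elif [ ! -d "$val" ]; then
            echo "$var=$val is not a directory" >&2
            rc=1
        fi
    done
    # After the PATH export, festival should be found by the shell.
    if ! command -v festival >/dev/null 2>&1; then
        echo "festival was not found on PATH" >&2
        rc=1
    fi
    return "$rc"
}
```

After the three exports, run check_festvox_env. A non-zero exit status together with its messages points to the variable that still needs fixing.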
1. Make a directory and change to it. By convention the directory is named institution_language_name_type, e.g.:
mkdir ~/ru_us_matt_diphone
cd ~/ru_us_matt_diphone
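You can capture the naming convention in a small helper so the directory name is built the same way every time. This is only a sketch: voice_dir_name is a hypothetical function and is not provided by Festvox.

```shell
# Hypothetical helper: voice_dir_name INSTITUTION LANGUAGE NAME TYPE
# Joins the four parts with underscores, following the convention above.
voice_dir_name() {
    if [ "$#" -ne 4 ]; then
        echo "usage: voice_dir_name INSTITUTION LANGUAGE NAME TYPE" >&2
        return 1
    fi
    printf '%s_%s_%s_%s\n' "$1" "$2" "$3" "$4"
}

# Example:
#   d=$(voice_dir_name ru us matt diphone)
#   mkdir -p "$HOME/$d" && cd "$HOME/$d"
```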
2. FestVox provides a tool to build the basic directory structure. It takes the institution, language and
speaker name as arguments, e.g.:
$FESTVOXDIR/src/diphones/setup_diphone ru us matt
The setup script also needs to copy in some language-specific files. For US English, the following
packages must be installed as part of Festival:
festvox_kallpc16k
festlex_POSLEX
festlex_CMU
(See my other post on compiling Festival.)
3. The nonsense word list must be generated.
festival -b festvox/diphlist.scm festvox/us_schema.scm '(diphone-gen-schema "us" "etc/usdiph.list")'
4. The prompts must be synthesised so that Festival can prompt the user before recording the diphones.
festival -b festvox/diphlist.scm festvox/us_schema.scm '(diphone-gen-waves "prompt-wav" "prompt-lab" "etc/usdiph.list")'
Wait patiently while it prints a long stream of output.
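To confirm that step 4 produced a prompt waveform for every entry in the list, you can use a check like the following. It rests on assumptions I have not verified against the Festvox sources. First, each entry of etc/usdiph.list is on its own line in the form ( id "phones" ("diphones") ). Second, the prompt files are named <id>.wav. missing_files is a hypothetical helper.

```shell
# Hypothetical helper: missing_files LIST DIR EXT
# Prints the id of every LIST entry that has no DIR/<id>.EXT file.
# ASSUMPTION: one entry per line, shaped like ( id "phones" (...) ),
# so the id is the first token after the opening parenthesis.
missing_files() {
    sed -n 's/^[[:space:]]*([[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" |
    while read -r id; do
        [ -e "$2/$id.$3" ] || echo "$id"
    done
}
```

For example, missing_files etc/usdiph.list prompt-wav wav should print nothing once step 4 has finished. The same call with lab lab can be run after step 6.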
5. Now you can record the diphone database:
bin/prompt_them etc/usdiph.list
Note: if you have to start Festival with the command padsp festival, this command must likewise be written as:
padsp bin/prompt_them etc/usdiph.list
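Before pressing Enter, make sure you can concentrate fully and are in a quiet environment. You will be staring at the screen and recording without a break for about two hours, producing 1369 recordings.
For each item, the terminal shows the phone string to be recorded, e.g. pau t aa t ae t aa pau, and then plays the synthesized prompt. It then tells you recording has started, and you have two seconds to say the whole string into the microphone.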
6. Once all the recordings are done, run the autolabelling script on them:
bin/make_labs prompt-wav/*.wav
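7. If you need to correct the labels by hand, note that emulabel, the tool recommended in the official documentation, is outdated. You can use WaveSurfer instead; on Ubuntu it can be installed directly from the repositories (sudo apt-get install wavesurfer). To edit pitchmarks you also need the Wavesurfer Pitchmark Plugin,
currently available at http://mh21.de/pmedit/index.html (if opening the link directly gives a 404, open it from a Google search results page). Download it into ~/.wavesurfer/1.8/plugins. Following a description by someone on the mailing list, copy all files from wav/, lab/, mcep/ and pm/ into a single directory (at this point my mcep/ and pm/ directories were empty; see below). Then open a recording in WaveSurfer. When asked which configuration to use, choose transcription. Right-click -> Create pane -> Pitchmarks.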
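At this point the pane may be completely blank because the pitchmark files (under pm/) are missing. Festvox provides two scripts for converting between pitchmark files and label files: make_pm_pmlab and make_pmlab_pm. Use make_pmlab_pm to convert the label files under lab/ into pm files, then copy them into the single directory mentioned above: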
bin/make_pmlab_pm lab/*.lab
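You can script the copy-into-one-directory step described above roughly as follows. collect_for_wavesurfer is a hypothetical helper. Run it from the voice directory. It skips any of wav/, lab/, mcep/ and pm/ that does not exist, and it overwrites same-named files already in the destination.

```shell
# Hypothetical helper: collect_for_wavesurfer DEST
# Copies every regular file from wav/, lab/, mcep/ and pm/ (relative to the
# current voice directory) into the single directory DEST, creating DEST if needed.
collect_for_wavesurfer() {
    dest=$1
    mkdir -p "$dest" || return 1
    for d in wav lab mcep pm; do
        [ -d "$d" ] || continue          # skip directories that do not exist
        for f in "$d"/*; do
            [ -f "$f" ] && cp "$f" "$dest"/
        done
    done
    return 0
}
```

Run it again after make_pmlab_pm so the new pm files also end up in the same directory.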
8. Now build the diphone index:
mkdir dic
bin/make_diph_index etc/usdiph.list dic/mattdiph.est
The script does not create the dic directory itself, so it fails if the voice directory has no dic/. Create dic before running it.
9. The next step is extracting the pitchmarks and then moving each one to the nearest peak.
First, though, copy etc/usdiph.list to etc/txt.done.data. This works around a small bug in the two scripts that follow.
cp etc/usdiph.list etc/txt.done.data
bin/make_pm_wave wav/*.wav
bin/make_pm_fix pm/*.pm
If this command fails, check whether pm/ contains any .pm files. If it does not, use the make_pmlab_pm script as described in step 7.
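You can automate the "are there any .pm files?" check from the previous paragraph. count_pm is a hypothetical helper. The fallback in the comment simply reuses the step 7 command.

```shell
# Hypothetical helper: count_pm DIR prints how many *.pm files DIR contains.
count_pm() {
    n=0
    for f in "$1"/*.pm; do
        # An unmatched glob stays literal, so test that the file really exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Example (run from the voice directory), falling back to step 7's script:
#   if [ "$(count_pm pm)" -eq 0 ]; then bin/make_pmlab_pm lab/*.lab; fi
```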
10. Optionally, you can match the power. First the files must be analysed and a mean factor extracted:
bin/find_powerfactors lab/*.lab
(Finally, a command that ran without any trouble...)
Then use the result to build the pitch-synchronous LPC coefficients:
bin/make_lpc wav/*.wav
11. Now you can test the voice (Festival apparently cannot output through ALSA, so prefix the command with padsp):
padsp festival festvox/ru_us_matt_diphone.scm '(voice_ru_us_matt_diphone)'
12. A group file must be built that contains only the bits needed from the larger wave files.
festival festvox/ru_us_matt_diphone.scm '(voice_ru_us_matt_diphone)'
festival> (us_make_group_file "group/mattlpc.group" nil)
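13. Change to Festival's English voices directory. For another language, you can create a new directory under voices/. Note that the us folder under voices refers to unit selection and has nothing to do with US English.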
cd /your/festival/directory/lib/voices/english/
Add a symbolic link to the new voice:
ln -s /path/to/your/voice/ru_us_matt_diphone
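Here is a slightly more careful version of the link step. It is a sketch: link_voice is a hypothetical helper. It refuses to overwrite an existing entry. It also warns if festvox/<name>.scm, the file loaded in step 11, is missing from the voice directory.

```shell
# Hypothetical helper: link_voice VOICE_DIR VOICES_LANG_DIR
# Creates VOICES_LANG_DIR/<basename of VOICE_DIR> as a symlink to VOICE_DIR.
# VOICE_DIR should be an absolute path so the link resolves from anywhere.
link_voice() {
    src=${1%/}                      # drop a trailing slash, if any
    dest_dir=$2
    name=$(basename "$src")
    # Sanity check: the voice definition file used in step 11 should exist.
    if [ ! -f "$src/festvox/$name.scm" ]; then
        echo "warning: $src/festvox/$name.scm not found" >&2
    fi
    if [ -e "$dest_dir/$name" ] || [ -L "$dest_dir/$name" ]; then
        echo "$dest_dir/$name already exists; not overwriting" >&2
        return 1
    fi
    ln -s "$src" "$dest_dir/$name"
}
```

For example: link_voice /path/to/your/voice/ru_us_matt_diphone /your/festival/directory/lib/voices/english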
14. Start Festival again and enter the following command:
(voice_ru_us_matt_diphone)
The new voice should now be usable.
References
Creating a Voice for Festival Speech Synthesis System
Mailing list of the EMU Speech Database System
Build Synthesis Voice 2.1