After running the timit example, I moved on to the Chinese corpus thchs30. The first error I ran into was the following:
decode.sh: feature type is lda
steps/align_fmllr.sh: doing final alignment.
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstdeterminizestar[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstdeterminizestar() [0x626fe2]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string, std::allocator >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmsymbols[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstrmsymbols() [0x54d89c]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string, std::allocator >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmepslocal[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstrmepslocal() [0x5739d4]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string, std::allocator >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstminimizeencoded[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstminimizeencoded() [0x5c3b92]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string, std::allocator >)
main
__libc_start_main
_start
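After asking the expert @wbglearn, I learned that the script was failing during parallel execution. The fix is to remove the parallel-execution operator & that ends each local/thchs-30_decode.sh line in the code below: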
#monophone
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1;
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &
#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;
#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &
#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &
#lda_mllt_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;
#sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
#test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &
#sat_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri3b exp/tri3b_ali || exit 1;
#quick
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
#test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &
#quick_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri4b exp/tri4b_ali || exit 1;
#quick_ali_cv
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/dev data/lang exp/tri4b exp/tri4b_ali_cv || exit 1;
#train dnn model
local/nnet/run_dnn.sh --stage 0 --nj $n exp/tri4b exp/tri4b_ali exp/tri4b_ali_cv || exit 1;
After that, the error above went away.
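Removing each & by hand works, but the same edit can be sketched with sed. This is only a minimal sketch: strip_decode_background is a made-up helper name, and it assumes the recipe script is called run.sh and that every decode call sits on a single line ending in &.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): print the given script with the
# trailing "&" removed from every line that calls thchs-30_decode.sh, so
# each decode runs in the foreground before the next step starts.
strip_decode_background() {
  # $1: path to the recipe script (assumed here to be run.sh)
  # Only lines mentioning thchs-30_decode.sh are touched; the "&" and any
  # whitespace around it at the end of the line are deleted.
  sed -E '/thchs-30_decode\.sh/ s/[[:space:]]*&[[:space:]]*$//' "$1"
}

# Example (left commented out so the sketch has no side effects):
# strip_decode_background run.sh > run_serial.sh
```

Writing the result to a new file rather than editing in place keeps the original run.sh available for comparison. The trade-off is the obvious one: with the decodes in the foreground, the whole recipe takes longer.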
However, when I ran the noisy-speech DAE stage, another error appeared:
num_fea = 40
run.pl: job failed, log is in exp/tri4b_dnn_dae/log/train_nnet.log
The job failed, and the error log is in the file at the path above. Opening that file shows the following error:
steps/nnet/train_scheduler.sh: line 86: 21609 Segmentation fault (core dumped)
$train_tool --cross-validate=true --randomize=false --verbose=$verbose
$train_tool_opts ${feature_transform:+ --feature-transform=$feature_transform}
${frame_weights:+ "--frame-weights=$frame_weights"}
${utt_weights:+ "--utt-weights=$utt_weights"} "$feats_cv" "$labels_cv"
$mlp_best 2>> $log
There is another log file in the same folder, which contains the following error:
LOG (nnet-train-frmshuff[5.1]:Init():nnet-randomizer.cc:32) Seeding by srand with : 777
LOG (nnet-train-frmshuff[5.1]:main():nnet-train-frmshuff.cc:157) CROSS-VALIDATION STARTED
apply-cmvn --norm-vars=false scp:exp/tri4b_dnn_dae/tgt_cmvn.scp ark:- ark:-
copy-feats scp:exp/tri4b_dnn_dae/tgt_feats.scp ark:-
WARNING (apply-cmvn[5.1]:Open():util/kaldi-table-inl.h:1650) Script file exp/tri4b_dnn_dae/tgt_cmvn.scp contains duplicate key: A02
ERROR (apply-cmvn[5.1]:RandomAccessTableReader():util/kaldi-table-inl.h:2528) Error opening RandomAccessTableReader object (rspecifier is: scp:exp/tri4b_dnn_dae/tgt_cmvn.scp)
[ Stack-Trace: ]
apply-cmvn() [0x5413ae]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::RandomAccessTableReader > >::RandomAccessTableReader(std::__cxx11::basic_string, std::allocator > const&)
kaldi::RandomAccessTableReaderMapped > >::RandomAccessTableReaderMapped(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)
main
__libc_start_main
_start
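I really don't know how to solve this one, and I blame my own inexperience. Since the denoising part of my ASR work does not use Kaldi's DNN, this problem has little impact on my research direction, so I have set it aside for now.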
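The log does give one concrete lead: apply-cmvn warns that tgt_cmvn.scp contains the duplicate key A02. In a Kaldi scp file the key is the first whitespace-separated field of each line, so the repeated keys can be listed with standard tools. The sketch below is a diagnostic only, not a fix, and list_duplicate_keys is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): print every key that appears
# more than once in an scp file, one key per line.
list_duplicate_keys() {
  # $1: path to the scp file; the key is the first field of each line.
  # sort groups identical keys together, and uniq -d keeps one copy of
  # each key that occurs at least twice.
  awk '{print $1}' "$1" | sort | uniq -d
}

# Example (left commented out; the path is the one named in the log above):
# list_duplicate_keys exp/tri4b_dnn_dae/tgt_cmvn.scp
```

If this prints A02 along with other keys, the file was probably assembled from sources that share those keys. Where that happens in the DAE scripts is something I have not traced.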
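For clean speech, the decoding results, recognition rates, and related information are saved under /home/wang/download/KALDI_ROOT/egs/thchs30/s5/exp. The recognition results are in the corresponding tri1, tri2b, tri3b, tri4b, and tri4b_dnn folders there.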
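Although this problem has little practical impact, it still bothers me like a thorn in the flesh. If anyone has run into the same problem, feel free to get in touch.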