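Whatever goes wrong with TIMIT is a problem on our own side. Can't run the DNN? No such thing.
Here are the results of the run. The detailed output follows below.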
sv@HP:~/lkaldi/egs/timit/s5$ cat RESULTS
# Use caution when comparing these results with other published results.
Training Set : 3696 sentences 4620 sentences
Dev Set : 400 sentences
Test Set : 192 sentences Core Test Set (different from Full 1680 sent. set)
Language Model : Bigram phoneme language model which is extracted from training set
Phone mapping : Training with 48 phonemes, for testing mapped to 39 phonemes
# monophone, deltas.
---------------------------------Dev Set------------------------------------------
%WER 31.7 | 400 15057 | 71.8 19.5 8.7 3.5 31.7 100.0 | -0.457 | exp/mono/decode_dev/score_5/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 32.7 | 192 7215 | 70.5 19.8 9.6 3.2 32.7 100.0 | -0.482 | exp/mono/decode_test/score_5/ctm_39phn.filt.sys
# tri1 : first triphone system (delta+delta-delta features)
---------------------------------Dev Set------------------------------------------
%WER 25.1 | 400 15057 | 78.9 15.9 5.2 4.0 25.1 99.8 | -0.178 | exp/tri1/decode_dev/score_10/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 25.6 | 192 7215 | 78.3 15.9 5.8 3.9 25.6 100.0 | -0.129 | exp/tri1/decode_test/score_10/ctm_39phn.filt.sys
# tri2 : an LDA+MLLT system
---------------------------------Dev Set------------------------------------------
%WER 23.0 | 400 15057 | 80.7 14.6 4.7 3.7 23.0 99.5 | -0.230 | exp/tri2/decode_dev/score_10/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 23.7 | 192 7215 | 80.0 14.8 5.2 3.7 23.7 99.5 | -0.284 | exp/tri2/decode_test/score_10/ctm_39phn.filt.sys
# tri3 : Speaker Adaptive Training (SAT) system
---------------------------------Dev Set------------------------------------------
%WER 20.3 | 400 15057 | 82.7 12.8 4.5 3.1 20.3 99.8 | -0.556 | exp/tri3/decode_dev/score_10/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 21.6 | 192 7215 | 81.6 13.6 4.9 3.2 21.6 99.5 | -0.560 | exp/tri3/decode_test/score_10/ctm_39phn.filt.sys
# SGMM2 Training :
---------------------------------Dev Set------------------------------------------
%WER 17.8 | 400 15057 | 85.1 11.0 3.9 2.9 17.8 99.3 | -0.451 | exp/sgmm2_4/decode_dev/score_7/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 19.7 | 192 7215 | 83.2 12.2 4.6 3.0 19.7 99.0 | -0.291 | exp/sgmm2_4/decode_test/score_8/ctm_39phn.filt.sys
# SGMM2 + MMI Training :
---------------------------------Dev Set------------------------------------------
%WER 18.0 | 400 15057 | 85.6 11.2 3.3 3.6 18.0 98.8 | -0.599 | exp/sgmm2_4_mmi_b0.1/decode_dev_it1/score_6/ctm_39phn.filt.sys
%WER 18.0 | 400 15057 | 85.7 11.2 3.1 3.6 18.0 99.0 | -0.619 | exp/sgmm2_4_mmi_b0.1/decode_dev_it2/score_6/ctm_39phn.filt.sys
%WER 18.1 | 400 15057 | 85.6 11.3 3.1 3.7 18.1 98.8 | -0.646 | exp/sgmm2_4_mmi_b0.1/decode_dev_it3/score_6/ctm_39phn.filt.sys
%WER 18.1 | 400 15057 | 85.3 11.3 3.4 3.4 18.1 99.0 | -0.463 | exp/sgmm2_4_mmi_b0.1/decode_dev_it4/score_7/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 19.9 | 192 7215 | 83.4 12.3 4.3 3.4 19.9 99.5 | -0.300 | exp/sgmm2_4_mmi_b0.1/decode_test_it1/score_8/ctm_39phn.filt.sys
%WER 20.2 | 192 7215 | 83.0 12.3 4.6 3.2 20.2 99.0 | -0.208 | exp/sgmm2_4_mmi_b0.1/decode_test_it2/score_9/ctm_39phn.filt.sys
%WER 20.2 | 192 7215 | 83.4 12.4 4.2 3.7 20.2 99.5 | -0.333 | exp/sgmm2_4_mmi_b0.1/decode_test_it3/score_8/ctm_39phn.filt.sys
%WER 20.3 | 192 7215 | 83.0 12.6 4.5 3.3 20.3 99.0 | -0.235 | exp/sgmm2_4_mmi_b0.1/decode_test_it4/score_9/ctm_39phn.filt.sys
# bMMI not helpful here...
# Hybrid System (Dans DNN):
---------------------------------Dev Set------------------------------------------
%WER 21.1 | 400 15057 | 81.9 12.6 5.6 3.0 21.1 99.5 | -0.485 | exp/tri4_nnet/decode_dev/score_5/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 23.0 | 192 7215 | 79.4 13.5 7.1 2.4 23.0 100.0 | -0.138 | exp/tri4_nnet/decode_test/score_7/ctm_39phn.filt.sys
# Hybrid System (Karel's DNN)
---------------------------------Dev Set------------------------------------------
%WER 17.5 | 400 15057 | 84.6 10.5 4.8 2.2 17.5 98.5 | -0.471 | exp/dnn4_pretrain-dbn_dnn/decode_dev/score_6/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 18.5 | 192 7215 | 84.2 11.0 4.8 2.7 18.5 100.0 | -1.151 | exp/dnn4_pretrain-dbn_dnn/decode_test/score_4/ctm_39phn.filt.sys
# Hybrid System (Karel's DNN), sMBR training
---------------------------------Dev Set------------------------------------------
%WER 17.3 | 400 15057 | 85.5 10.6 4.0 2.7 17.3 98.5 | -0.696 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_dev_it1/score_5/ctm_39phn.filt.sys
%WER 17.3 | 400 15057 | 85.4 10.7 3.9 2.7 17.3 98.5 | -0.380 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_dev_it6/score_7/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 18.6 | 192 7215 | 84.2 11.1 4.7 2.8 18.6 100.0 | -0.816 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_test_it1/score_5/ctm_39phn.filt.sys
%WER 18.8 | 192 7215 | 84.7 11.4 3.9 3.5 18.8 100.0 | -0.819 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_test_it6/score_5/ctm_39phn.filt.sys
# sMBR not helpful here...
# Combination SGMM + Dans DNN:
---------------------------------Dev Set------------------------------------------
%WER 16.7 | 400 15057 | 86.0 10.9 3.1 2.7 16.7 99.5 | -0.102 | exp/combine_2/decode_dev_it1/score_6/ctm_39phn.filt.sys
%WER 16.7 | 400 15057 | 86.4 10.8 2.8 3.1 16.7 99.5 | -0.248 | exp/combine_2/decode_dev_it2/score_5/ctm_39phn.filt.sys
%WER 16.8 | 400 15057 | 85.8 10.9 3.3 2.6 16.8 99.3 | -0.013 | exp/combine_2/decode_dev_it3/score_7/ctm_39phn.filt.sys
%WER 16.9 | 400 15057 | 86.2 11.0 2.8 3.1 16.9 99.8 | -0.240 | exp/combine_2/decode_dev_it4/score_5/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 18.4 | 192 7215 | 84.6 12.0 3.5 3.0 18.4 99.0 | -0.223 | exp/combine_2/decode_test_it1/score_5/ctm_39phn.filt.sys
%WER 18.5 | 192 7215 | 84.5 12.1 3.4 3.0 18.5 99.0 | -0.215 | exp/combine_2/decode_test_it2/score_5/ctm_39phn.filt.sys
%WER 18.5 | 192 7215 | 84.4 12.0 3.7 2.9 18.5 99.0 | -0.074 | exp/combine_2/decode_test_it3/score_6/ctm_39phn.filt.sys
%WER 18.6 | 192 7215 | 84.9 12.0 3.1 3.6 18.6 99.0 | -0.451 | exp/combine_2/decode_test_it4/score_4/ctm_39phn.filt.sys
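Each `%WER` line in the RESULTS file above has the same pipe-separated layout: the headline error rate, the sentence and token counts, six percentages (correct, substituted, deleted, inserted, total error, sentence error), one score column, and the path of the scoring file. Since the header says the language model is a phoneme bigram, the tokens here are phones. The Python sketch below is one way to read such a line and check that the total error equals substitutions plus deletions plus insertions, up to rounding of the printed one-decimal figures. The helper names (`parse_wer_line`, `system_name`, `relative_reduction`) and the field names are my own and are not part of Kaldi; the two sample lines are copied from the output above.

```python
from typing import NamedTuple


class WerLine(NamedTuple):
    wer: float       # headline error rate after "%WER"
    n_sent: int      # number of sentences scored
    n_tokens: int    # number of reference tokens (phones here)
    corr: float      # % correct
    sub: float       # % substitutions
    dele: float      # % deletions
    ins: float       # % insertions
    err: float       # % total error
    sent_err: float  # % sentences with at least one error
    score: float     # last numeric column, kept as-is
    path: str        # scoring file the line came from


def parse_wer_line(line: str) -> WerLine:
    """Split one '%WER ... | counts | rates | score | path' line into fields."""
    head, counts, rates, score, path = [p.strip() for p in line.split("|")]
    wer = float(head.split()[1])
    n_sent, n_tokens = (int(x) for x in counts.split())
    corr, sub, dele, ins, err, sent_err = (float(x) for x in rates.split())
    return WerLine(wer, n_sent, n_tokens, corr, sub, dele, ins,
                   err, sent_err, float(score), path)


def system_name(path: str) -> str:
    """exp/<system>/decode_... -> <system>"""
    return path.split("/")[1]


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction in percent, going from baseline to new."""
    return 100.0 * (baseline - new) / baseline


mono_test = ("%WER 32.7 | 192 7215 | 70.5 19.8 9.6 3.2 32.7 100.0 | -0.482 | "
             "exp/mono/decode_test/score_5/ctm_39phn.filt.sys")
dnn_test = ("%WER 18.5 | 192 7215 | 84.2 11.0 4.8 2.7 18.5 100.0 | -1.151 | "
            "exp/dnn4_pretrain-dbn_dnn/decode_test/score_4/ctm_39phn.filt.sys")

for raw in (mono_test, dnn_test):
    r = parse_wer_line(raw)
    # Each printed value is rounded to one decimal, so allow a small slack.
    consistent = abs(r.sub + r.dele + r.ins - r.err) < 0.2
    print(system_name(r.path), r.err, "sub+del+ins consistent:", consistent)

mono, dnn = parse_wer_line(mono_test), parse_wer_line(dnn_test)
print(f"relative reduction mono -> dnn: {relative_reduction(mono.err, dnn.err):.1f}%")
```

The same parser can be run over every `%WER` line of the file to rank systems on the dev set before looking at the test set.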
sv@HP:~$ sudo lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
sv@HP:~$ cat /proc/cpuinfo | grep model\ name
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
sv@HP:~$ cat /proc/meminfo | grep MemTotal
MemTotal: 16321360 kB
sv@HP:~$ lspci | grep 'VGA'
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
And here is the complete output, all in one go.
sv@HP:~/lkaldi/egs/timit/s5$ ./run.sh
===================================================================
Data & Lexicon & Language Preparation
===================================================================
wav-to-duration --read-entire-file=true scp:train_wav.scp ark,t:train_dur.ark
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:92) Printed duration for 3696 audio files.
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:94) Mean duration was 3.06336, min and max durations were 0.91525, 7.78881
wav-to-duration --read-entire-file=true scp:dev_wav.scp ark,t:dev_dur.ark
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:92) Printed duration for 400 audio files.
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:94) Mean duration was 3.08212, min and max durations were 1.09444, 7.43681
wav-to-duration --read-entire-file=true scp:test_wav.scp ark,t:test_dur.ark
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:92) Printed duration for 192 audio files.
LOG (wav-to-duration[5.5.164~1-9698]:main():wav-to-duration.cc:94) Mean duration was 3.03646, min and max durations were 1.30562, 6.21444
Data preparation succeeded
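The three `wav-to-duration` reports above give a file count and a mean duration for each set, which is enough to estimate how much audio each set holds. Below is a small Python sketch of that arithmetic; the numbers are copied from the log lines, and the `total_seconds` helper and the `SETS` table are my own naming.

```python
# (file count, mean duration in seconds), copied from the wav-to-duration log
SETS = {
    "train": (3696, 3.06336),
    "dev": (400, 3.08212),
    "test": (192, 3.03646),
}


def total_seconds(n_files: int, mean_sec: float) -> float:
    """Total audio length implied by a file count and a mean duration."""
    return n_files * mean_sec


for name, (n_files, mean_sec) in SETS.items():
    sec = total_seconds(n_files, mean_sec)
    print(f"{name}: {sec:.0f} s = {sec / 3600:.2f} h")
```

On these figures the training set comes to a little over three hours of audio, and the dev and test sets to roughly twenty and ten minutes.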
LOGFILE:/dev/null
$bin/ngt -i="$inpfile" -n=$order -gooout=y -o="$gzip -c > $tmpdir/ngram.${sdict}.gz" -fd="$tmpdir/$sdict" $dictionary $additional_parameters >> $logfile 2>&1
$bin/ngt -i="$inpfile" -n=$order -gooout=y -o="$gzip -c > $tmpdir/ngram.${sdict}.gz" -fd="$tmpdir/$sdict" $dictionary $additional_parameters >> $logfile 2>&1
$scr/build-sublm.pl $verbose $prune $prune_thr_str $smoothing "$additional_smoothing_parameters" --size $order --ngrams "$gunzip -c $tmpdir/ngram.${sdict}.gz" -sublm $tmpdir/lm.$sdict $additional_parameters >> $logfile 2>&1
inpfile: data/local/lm_tmp/lm_phone_bg.ilm.gz
outfile: /dev/stdout
loading up to the LM level 1000 (if any)
dub: 10000000
OOV code is 50
OOV code is 50
Saving in txt format to /dev/stdout
Dictionary & language model preparation succeeded
utils/prepare_lang.sh --sil-prob 0.0 --position-dependent-phones false --num-sil-states 3 data/local/dict sil data/local/lang_tmp data/lang
Checking data/local/dict/silence_phones.txt ...
--> reading data/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/silence_phones.txt is OK
Checking data/local/dict/optional_silence.txt ...
--> reading data/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/optional_silence.txt is OK
Checking data/local/dict/nonsilence_phones.txt ...
--> reading data/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/nonsilence_phones.txt is OK
Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.
Checking data/local/dict/lexicon.txt
--> reading data/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/lexicon.txt is OK
Checking data/local/dict/lexiconp.txt
--> reading data/local/dict/lexiconp.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/lexiconp.txt is OK
Checking lexicon pair data/local/dict/lexicon.txt and data/local/dict/lexiconp.txt
--> lexicon pair data/local/dict/lexicon.txt and data/local/dict/lexiconp.txt match
Checking data/local/dict/extra_questions.txt ...
--> reading data/local/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict]
fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang
Checking data/lang/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/phones.txt is OK
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
Checking data/lang/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
--> data/lang/phones/context_indep.{txt, int, csl} are OK
Checking data/lang/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 47 entry/entries in data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
--> data/lang/phones/nonsilence.{txt, int, csl} are OK
Checking data/lang/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/silence.txt
--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
--> data/lang/phones/silence.{txt, int, csl} are OK
Checking data/lang/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.{txt, int, csl} are OK
Checking data/lang/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 2 entry/entries in data/lang/phones/disambig.txt
--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
--> data/lang/phones/disambig.{txt, int, csl} are OK
Checking data/lang/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 48 entry/entries in data/lang/phones/roots.txt
--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
--> data/lang/phones/roots.{txt, int} are OK
Checking data/lang/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 48 entry/entries in data/lang/phones/sets.txt
--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
--> data/lang/phones/sets.{txt, int} are OK
Checking data/lang/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 2 entry/entries in data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
--> data/lang/phones/extra_questions.{txt, int} are OK
Checking optional_silence.txt ...
--> reading data/lang/phones/optional_silence.txt
--> data/lang/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> data/lang/phones/disambig.txt has "#0" and "#1"
--> data/lang/phones/disambig.txt is OK
Checking topo ...
Checking word-level disambiguation symbols...
--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking data/lang/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang/oov.txt
--> data/lang/oov.int corresponds to data/lang/oov.txt
--> data/lang/oov.{txt, int} are OK
--> data/lang/L.fst is olabel sorted
--> data/lang/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang]
Preparing train, dev and test data
utils/validate_data_dir.sh: Successfully validated data-directory data/train
utils/validate_data_dir.sh: Successfully validated data-directory data/dev
utils/validate_data_dir.sh: Successfully validated data-directory data/test
Preparing language models for test
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test_bg/words.txt - data/lang_test_bg/G.fst
LOG (arpa2fst[5.5.164~1-9698]:Read():arpa-file-parser.cc:94) Reading \data\ section.
LOG (arpa2fst[5.5.164~1-9698]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
LOG (arpa2fst[5.5.164~1-9698]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.
WARNING (arpa2fst[5.5.164~1-9698]:ConsumeNGram():arpa-lm-compiler.cc:313) line 60 [-3.26717 <s> <s>] skipped: n-gram has invalid BOS/EOS placement
LOG (arpa2fst[5.5.164~1-9698]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 50 to 50
fstisstochastic data/lang_test_bg/G.fst
0.000510126 -0.0763018
utils/validate_lang.pl data/lang_test_bg
Checking data/lang_test_bg/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_test_bg/phones.txt is OK
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_test_bg/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
Checking data/lang_test_bg/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.int corresponds to data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.csl corresponds to data/lang_test_bg/phones/context_indep.txt
--> data/lang_test_bg/phones/context_indep.{txt, int, csl} are OK
Checking data/lang_test_bg/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 47 entry/entries in data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.int corresponds to data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.csl corresponds to data/lang_test_bg/phones/nonsilence.txt
--> data/lang_test_bg/phones/nonsilence.{txt, int, csl} are OK
Checking data/lang_test_bg/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_test_bg/phones/silence.txt
--> data/lang_test_bg/phones/silence.int