Running Kaldi's run.sh

Notes kept for research work at my company.

#prepare language stuff
#build a large lexicon that involves words in both the training and decoding.
(
  echo "make word graph ..."
  cd $H; mkdir -p data/{dict,lang,graph} && \
  cp $thchs/resource/dict/{extra_questions.txt,nonsilence_phones.txt,optional_silence.txt,silence_phones.txt} data/dict && \
  cat $thchs/resource/dict/lexicon.txt $thchs/data_thchs30/lm_word/lexicon.txt | \
        grep -v '<s>' | grep -v '</s>' | sort -u > data/dict/lexicon.txt || exit 1;
  utils/prepare_lang.sh --position_dependent_phones false data/dict "<SPN>" data/local/lang data/lang || exit 1;
  gzip -c $thchs/data_thchs30/lm_word/word.3gram.lm > data/graph/word.3gram.lm.gz || exit 1;
  utils/format_lm.sh data/lang data/graph/word.3gram.lm.gz $thchs/data_thchs30/lm_word/lexicon.txt data/graph/lang || exit 1;
)


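The topic of this block is: prepare language stuff.

It creates the directories data/dict, data/lang and data/graph.

The statements to focus on are cp and gzip.

The cp copies (it does not move) extra_questions.txt, nonsilence_phones.txt, optional_silence.txt and silence_phones.txt into data/dict.

The two lexicons, resource/dict/lexicon.txt and lm_word/lexicon.txt, are merged into data/dict/lexicon.txt, and gzip -c writes a compressed copy of word.3gram.lm to data/graph/word.3gram.lm.gz while leaving the original in place. lm_word/lexicon.txt is then passed to format_lm.sh along with the compressed language model.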
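The merge and compression steps can be tried on toy data. This is only a sketch: the temporary file names and the lexicon entries are made up for illustration, and it assumes the two filtered patterns are the sentence markers `<s>` and `</s>`, as in the upstream thchs30 recipe.

```shell
# Toy stand-ins for the two lexicons; the real files live under $thchs.
work=$(mktemp -d)
printf '%s\n' '<s> sil' 'ni3 n i3' 'hao3 h ao3' > "$work/lex_a.txt"
printf '%s\n' '</s> sil' 'hao3 h ao3' 'a1 a1' > "$work/lex_b.txt"

# Same pipeline shape as run.sh: concatenate, drop sentence markers, de-duplicate.
cat "$work/lex_a.txt" "$work/lex_b.txt" | \
  grep -v '<s>' | grep -v '</s>' | sort -u > "$work/lexicon.txt"

# gzip -c writes the compressed data to stdout, so the source file is untouched.
printf 'toy language model\n' > "$work/toy.lm"
gzip -c "$work/toy.lm" > "$work/toy.lm.gz"

cat "$work/lexicon.txt"
```

The duplicate entry appears once in the merged file, both marker lines are gone, and toy.lm still exists next to toy.lm.gz.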





#make_phone_graph
(
  echo "make phone graph ..."
  cd $H; mkdir -p data/{dict_phone,graph_phone,lang_phone} && \
  cp $thchs/resource/dict/{extra_questions.txt,nonsilence_phones.txt,optional_silence.txt,silence_phones.txt} data/dict_phone  && \
  cat $thchs/data_thchs30/lm_phone/lexicon.txt | grep -v '<eps>' | sort -u > data/dict_phone/lexicon.txt  && \
  echo "<SPN> sil " >> data/dict_phone/lexicon.txt  || exit 1;
  utils/prepare_lang.sh --position_dependent_phones false data/dict_phone "<SPN>" data/local/lang_phone data/lang_phone || exit 1;
  gzip -c $thchs/data_thchs30/lm_phone/phone.3gram.lm > data/graph_phone/phone.3gram.lm.gz  || exit 1;
  utils/format_lm.sh data/lang_phone data/graph_phone/phone.3gram.lm.gz $thchs/data_thchs30/lm_phone/lexicon.txt \
    data/graph_phone/lang  || exit 1;
)


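The topic of this block is make_phone_graph; it should be read together with the block above.

It creates the directories data/dict_phone, data/graph_phone and data/lang_phone.

Again, the statements to focus on are cp and gzip.

lm_phone/lexicon.txt is not moved: it is filtered, de-duplicated and written to data/dict_phone/lexicon.txt, with one extra entry appended at the end.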
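A small sketch of how the phone lexicon is built, again on toy data. The input entries are invented, and the sketch assumes the filtered pattern is `<eps>` and the appended entry is `<SPN> sil`, as in the upstream thchs30 recipe.

```shell
# Toy stand-in for lm_phone/lexicon.txt, with a duplicate and an <eps> line.
work_phone=$(mktemp -d)
printf '%s\n' '<eps> 0' 'a1 a1' 'b b' 'a1 a1' > "$work_phone/lm_phone_lexicon.txt"

# Drop the <eps> line and de-duplicate, as the recipe does.
grep -v '<eps>' "$work_phone/lm_phone_lexicon.txt" | sort -u > "$work_phone/lexicon.txt"

# Append the spoken-noise entry; >> adds to the end, so it is not re-sorted.
echo "<SPN> sil " >> "$work_phone/lexicon.txt"

cat "$work_phone/lexicon.txt"
```

Because the entry is appended after sort -u, it always ends up as the last line of the lexicon.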



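[Image 1]

This is some of the output after make_phone_graph has run.

Point of interest: the third line from the end, data/graph_phone/lang/G.fst.

So the language model and lexicon end up as .fst files; .fst is the file format used by OpenFst.

G.fst is the language model itself: format_lm.sh compiles the 3-gram ARPA file into a finite-state transducer.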
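To look inside a generated FST, OpenFst's fstinfo tool prints a summary (number of states, arcs and so on). The helper below is a hypothetical wrapper, not part of Kaldi; it only guards against the file or the tool being absent before calling fstinfo.

```shell
# Hypothetical helper: summarize an FST if both the file and fstinfo are available.
inspect_fst() {
  fst_path=$1
  if [ ! -f "$fst_path" ]; then
    echo "missing: $fst_path" >&2
    return 1
  fi
  if ! command -v fstinfo >/dev/null 2>&1; then
    echo "fstinfo not on PATH (it ships with OpenFst)" >&2
    return 2
  fi
  fstinfo "$fst_path"
}

# Path taken from the output discussed above; "|| true" keeps going if it is absent.
inspect_fst data/graph_phone/lang/G.fst || true
```

In a Kaldi checkout, fstinfo is normally put on the PATH by sourcing path.sh in the recipe directory.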

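[Image 2]

[Image 3]

The image above shows the result after the monophone stage has run.

The file of interest is data/graph/lang/tmp/LG.fst, the lexicon FST (L) composed with the language model FST (G).

Other locations involved: data/lang and exp/mono/log/.


The decision tree is stored in exp/mono/tree. It is a binary file, so it cannot be read in vim; the tree-info tool prints a summary of it.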
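A quick way to tell whether a Kaldi file is in binary mode is to look at its first two bytes: binary-mode files start with a NUL byte followed by `B`. The function name below is made up for this sketch, and the two toy files stand in for real Kaldi output.

```shell
# Hypothetical helper: succeeds if the file starts with the Kaldi binary header.
is_kaldi_binary() {
  # Read the first two bytes as hex; "0042" means NUL followed by 'B'.
  header=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  [ "$header" = "0042" ]
}

# Toy files for the demonstration.
work_tree=$(mktemp -d)
printf '\000Bpayload' > "$work_tree/binary_like"
printf 'plain text\n' > "$work_tree/text_like"

for f in "$work_tree/binary_like" "$work_tree/text_like"; do
  if is_kaldi_binary "$f"; then
    echo "$f: binary"
  else
    echo "$f: text"
  fi
done

# With Kaldi on the PATH, the real tree can be inspected like this:
#   tree-info exp/mono/tree
#   copy-tree --binary=false exp/mono/tree -   # text form on stdout
```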

