Kaldi Step by Step on AIShell v1 S5, Part 5: DNN (chain)


  • Acknowledgements
  • Machine Configuration
    • Problem: the graphics card is old and there is only one GPU; how can a TDNN model still be trained?
  • Kaldi Step by Step on AIShell v1 S5, Part 5: DNN (chain)
    • Part 14: DNN Chain Model
    • Part 12: Chain Training, Decoding, and Scoring
    • Part 15: Iterations

Acknowledgements

Thanks to AIShell for its exploration on the road to commercialization. Looking forward to the arrival of v3.

Machine Configuration

sv@HP:~$ sudo lsb_release -a
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic

sv@HP:~$ cat /proc/cpuinfo | grep model\ name
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
model name	: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
sv@HP:~$ cat /proc/meminfo | grep MemTotal
MemTotal:       16321360 kB
sv@HP:~$ lspci | grep 'VGA'
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)

Problem: the graphics card is old and there is only one GPU; how can a TDNN model still be trained?

**Answer:**
Set both num-jobs-initial and num-jobs-final to 1, reduce epochs to 2 or 3, and set the GPU to exclusive compute mode:
sv@HP:~/lkaldi/egs/aishell/s5$ sudo nvidia-smi -c 3
[sudo] password for sv: 
Set compute mode to EXCLUSIVE_PROCESS for GPU 00000000:01:00.0.
All done.
sv@HP:~/lkaldi/egs/aishell/s5$ sudo nvidia-smi
Wed Jan 16 10:31:58 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   31C    P8     7W / 151W |    225MiB /  8116MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1432      G   /usr/lib/xorg/Xorg                           125MiB |
|    0      1645      G   /usr/bin/gnome-shell                          94MiB |
|    0      2622      G   /opt/firefox/firefox-bin                       3MiB |
+-----------------------------------------------------------------------------+

Kaldi Step by Step on AIShell v1 S5, Part 5: DNN (chain)

The final installment. A chain model can serve results online and in real time, which is what gives it independent commercial value.

Part 14: DNN Chain Model

Results first. The nnet3 DNN brings the error rate down to 8.68%, and the chain model brings it down to 7.72%. (The grep below reads the cer_* files, so these are character error rates even though the lines are labeled %WER.)

sv@HP:~/lkaldi/egs/aishell/s5$ for x in exp/*/decode_test exp/*/*/decode_test; do [ -d $x ] && grep WER $x/cer_* | utils/best_wer.sh; done 2>/dev/null

%WER 36.59 [ 38335 / 104765, 849 ins, 3183 del, 34303 sub ] exp/mono/decode_test/cer_10_0.0
%WER 18.83 [ 19727 / 104765, 971 ins, 1161 del, 17595 sub ] exp/tri1/decode_test/cer_13_0.5
%WER 18.79 [ 19684 / 104765, 957 ins, 1142 del, 17585 sub ] exp/tri2/decode_test/cer_14_0.5
%WER 16.84 [ 17643 / 104765, 791 ins, 991 del, 15861 sub ] exp/tri3a/decode_test/cer_14_0.5
%WER 13.63 [ 14277 / 104765, 762 ins, 639 del, 12876 sub ] exp/tri4a/decode_test/cer_13_0.5
%WER 8.68 [ 9097 / 104765, 355 ins, 464 del, 8278 sub ] exp/nnet3/tdnn_sp/decode_test/cer_14_1.0
%WER 7.72 [ 8087 / 104765, 364 ins, 552 del, 7171 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_11_0.5
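Each line above reports the error rate as (insertions + deletions + substitutions) / total reference tokens. A quick sketch recomputing the two DNN numbers from the bracketed counts:

```python
# Recompute the error rate from the counts Kaldi prints, e.g.:
# %WER 7.72 [ 8087 / 104765, 364 ins, 552 del, 7171 sub ]
def error_rate(ins, dele, sub, total_ref):
    """Error rate in percent: (insertions + deletions + substitutions) / reference length."""
    return 100.0 * (ins + dele + sub) / total_ref

chain = error_rate(364, 552, 7171, 104765)
nnet3 = error_rate(355, 464, 8278, 104765)
print(f"chain: {chain:.2f}%")  # chain: 7.72%
print(f"nnet3: {nnet3:.2f}%")  # nnet3: 8.68%
```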

Part 12: Chain Training, Decoding, and Scoring

sv@HP:~/lkaldi/egs/aishell/s5$ local/chain/run_tdnn.sh
local/chain/run_tdnn.sh 
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: data/train_sp/feats.scp already exists: refusing to run this (please delete data/train_sp/feats.scp if you want this to run)

(A stale data/train_sp/feats.scp from an earlier attempt stops this first run; delete it as the message says, then rerun.)

sv@HP:~/lkaldi/egs/aishell/s5$ local/chain/run_tdnn.sh
local/chain/run_tdnn.sh 
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: making sure the utt2dur and the reco2dur files are present
... in data/train, because obtaining it after speed-perturbing
... would be very slow, and you might need them.
utils/data/get_utt2dur.sh: data/train/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/get_reco2dur.sh: data/train/reco2dur already exists with the expected length.  We won't recompute it.
utils/data/perturb_data_dir_speed.sh: generated speed-perturbed version of data in data/train, in data/train_sp_speed0.9
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_speed0.9
utils/data/perturb_data_dir_speed.sh: generated speed-perturbed version of data in data/train, in data/train_sp_speed1.1
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_speed1.1
utils/data/combine_data.sh data/train_sp data/train data/train_sp_speed0.9 data/train_sp_speed1.1
utils/data/combine_data.sh: combined utt2uniq
utils/data/combine_data.sh [info]: not combining segments as it does not exist
utils/data/combine_data.sh: combined utt2spk
utils/data/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/data/combine_data.sh: combined utt2dur
utils/data/combine_data.sh: combined reco2dur
utils/data/combine_data.sh [info]: **not combining feats.scp as it does not exist everywhere**
utils/data/combine_data.sh: combined text
utils/data/combine_data.sh [info]: **not combining cmvn.scp as it does not exist everywhere**
utils/data/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/data/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/data/combine_data.sh: combined wav.scp
utils/data/combine_data.sh [info]: not combining spk2gender as it does not exist
fix_data_dir.sh: kept all 360294 utterances.
fix_data_dir.sh: old files are kept in data/train_sp/.backup
utils/data/perturb_data_dir_speed_3way.sh: generated 3-way speed-perturbed version of data in data/train, in data/train_sp
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp
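The 360294 figure above is the original training set tripled: perturb_data_dir_speed_3way.sh keeps the 1.0x copy and adds 0.9x and 1.1x versions, and a speed factor s rescales each utterance's duration by 1/s. A sketch of the bookkeeping (the 4.5 s utterance is a made-up example; 120098 is the AIShell v1 train count, consistent with 360294 / 3):

```python
# 3-way speed perturbation: each utterance appears at speeds 0.9, 1.0, 1.1.
orig_utts = 120098            # AIShell v1 training utterances
speeds = [0.9, 1.0, 1.1]
total_utts = orig_utts * len(speeds)
print(total_utts)             # 360294, matching "kept all 360294 utterances"

# A speed factor s rescales duration by 1/s (s > 1 plays faster, so shorter).
dur = 4.5                     # hypothetical 4.5 s utterance
perturbed = {s: round(dur / s, 2) for s in speeds}
print(perturbed)              # {0.9: 5.0, 1.0: 4.5, 1.1: 4.09}
```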
local/nnet3/run_ivector_common.sh: making MFCC features for low-resolution speed-perturbed data
steps/make_mfcc_pitch.sh --cmd run.pl --mem 8G --nj 70 data/train_sp exp/make_mfcc/train_sp mfcc_perturbed
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC & Pitch features for train_sp
steps/compute_cmvn_stats.sh data/train_sp exp/make_mfcc/train_sp mfcc_perturbed
Succeeded creating CMVN stats for train_sp
fix_data_dir.sh: kept all 360294 utterances.
fix_data_dir.sh: old files are kept in data/train_sp/.backup
local/nnet3/run_ivector_common.sh: aligning with the perturbed low-resolution data
steps/align_fmllr.sh --nj 30 --cmd run.pl --mem 8G data/train_sp data/lang exp/tri5a exp/tri5a_sp_ali
steps/align_fmllr.sh: feature type is lda
steps/align_fmllr.sh: compiling training graphs
steps/align_fmllr.sh: aligning data in data/train_sp using exp/tri5a/final.alimdl and speaker-independent features.
steps/align_fmllr.sh: computing fMLLR transforms
steps/align_fmllr.sh: doing final alignment.
steps/align_fmllr.sh: done aligning data.
steps/diagnostic/analyze_alignments.sh --cmd run.pl --mem 8G data/lang exp/tri5a_sp_ali
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri5a_sp_ali/log/analyze_alignments.log
404 warnings in exp/tri5a_sp_ali/log/align_pass2.*.log
2 warnings in exp/tri5a_sp_ali/log/fmllr.*.log
387 warnings in exp/tri5a_sp_ali/log/align_pass1.*.log
local/nnet3/run_ivector_common.sh: creating high-resolution MFCC features
utils/copy_data_dir.sh: copied data from data/train_sp to data/train_sp_hires
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_hires
utils/copy_data_dir.sh: copied data from data/dev to data/dev_hires
utils/validate_data_dir.sh: Successfully validated data-directory data/dev_hires
utils/copy_data_dir.sh: copied data from data/test to data/test_hires
utils/validate_data_dir.sh: Successfully validated data-directory data/test_hires
utils/data/perturb_data_dir_volume.sh: data/train_sp_hires/feats.scp exists; moving it to data/train_sp_hires/.backup/ as it wouldn't be valid any more.
utils/data/perturb_data_dir_volume.sh: added volume perturbation to the data in data/train_sp_hires
steps/make_mfcc_pitch.sh --nj 10 --mfcc-config conf/mfcc_hires.conf --cmd run.pl --mem 8G data/train_sp_hires exp/make_hires/train_sp mfcc_perturbed_hires
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_hires
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC & Pitch features for train_sp_hires
steps/compute_cmvn_stats.sh data/train_sp_hires exp/make_hires/train_sp mfcc_perturbed_hires
Succeeded creating CMVN stats for train_sp_hires
fix_data_dir.sh: kept all 360294 utterances.
fix_data_dir.sh: old files are kept in data/train_sp_hires/.backup
utils/copy_data_dir.sh: copied data from data/train_sp_hires to data/train_sp_hires_nopitch
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_hires_nopitch
utils/data/limit_feature_dim.sh: warning: removing data/train_sp_hires_nopitch/cmvn.cp, you will have to regenerate it from the features.
utils/validate_data_dir.sh: Successfully validated data-directory data/train_sp_hires_nopitch
steps/compute_cmvn_stats.sh data/train_sp_hires_nopitch exp/make_hires/train_sp mfcc_perturbed_hires
Succeeded creating CMVN stats for train_sp_hires_nopitch
steps/make_mfcc_pitch.sh --nj 10 --mfcc-config conf/mfcc_hires.conf --cmd run.pl --mem 8G data/dev_hires exp/make_hires/dev mfcc_perturbed_hires
steps/make_mfcc_pitch.sh: moving data/dev_hires/feats.scp to data/dev_hires/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/dev_hires
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC & Pitch features for dev_hires
steps/compute_cmvn_stats.sh data/dev_hires exp/make_hires/dev mfcc_perturbed_hires
Succeeded creating CMVN stats for dev_hires
fix_data_dir.sh: kept all 14326 utterances.
fix_data_dir.sh: old files are kept in data/dev_hires/.backup
utils/copy_data_dir.sh: copied data from data/dev_hires to data/dev_hires_nopitch
utils/validate_data_dir.sh: Successfully validated data-directory data/dev_hires_nopitch
utils/data/limit_feature_dim.sh: warning: removing data/dev_hires_nopitch/cmvn.cp, you will have to regenerate it from the features.
utils/validate_data_dir.sh: Successfully validated data-directory data/dev_hires_nopitch
steps/compute_cmvn_stats.sh data/dev_hires_nopitch exp/make_hires/dev mfcc_perturbed_hires
Succeeded creating CMVN stats for dev_hires_nopitch
steps/make_mfcc_pitch.sh --nj 10 --mfcc-config conf/mfcc_hires.conf --cmd run.pl --mem 8G data/test_hires exp/make_hires/test mfcc_perturbed_hires
steps/make_mfcc_pitch.sh: moving data/test_hires/feats.scp to data/test_hires/.backup
utils/validate_data_dir.sh: Successfully validated data-directory data/test_hires
steps/make_mfcc_pitch.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC & Pitch features for test_hires
steps/compute_cmvn_stats.sh data/test_hires exp/make_hires/test mfcc_perturbed_hires
Succeeded creating CMVN stats for test_hires
fix_data_dir.sh: kept all 7176 utterances.
fix_data_dir.sh: old files are kept in data/test_hires/.backup
utils/copy_data_dir.sh: copied data from data/test_hires to data/test_hires_nopitch
utils/validate_data_dir.sh: Successfully validated data-directory data/test_hires_nopitch
utils/data/limit_feature_dim.sh: warning: removing data/test_hires_nopitch/cmvn.cp, you will have to regenerate it from the features.
utils/validate_data_dir.sh: Successfully validated data-directory data/test_hires_nopitch
steps/compute_cmvn_stats.sh data/test_hires_nopitch exp/make_hires/test mfcc_perturbed_hires
Succeeded creating CMVN stats for test_hires_nopitch
local/nnet3/run_ivector_common.sh: computing a subset of data to train the diagonal UBM.
utils/data/subset_data_dir.sh: reducing #utt from 360294 to 90073
local/nnet3/run_ivector_common.sh: computing a PCA transform from the hires data.
steps/online/nnet2/get_pca_transform.sh --cmd run.pl --mem 8G --splice-opts --left-context=3 --right-context=3 --max-utts 10000 --subsample 2 exp/nnet3/diag_ubm/train_sp_hires_nopitch_subset exp/nnet3/pca_transform
Done estimating PCA transform in exp/nnet3/pca_transform
local/nnet3/run_ivector_common.sh: training the diagonal UBM.
steps/online/nnet2/train_diag_ubm.sh --cmd run.pl --mem 8G --nj 30 --num-frames 700000 --num-threads 8 exp/nnet3/diag_ubm/train_sp_hires_nopitch_subset 512 exp/nnet3/pca_transform exp/nnet3/diag_ubm
steps/online/nnet2/train_diag_ubm.sh: Directory exp/nnet3/diag_ubm already exists. Backing up diagonal UBM in exp/nnet3/diag_ubm/backup.wLX
steps/online/nnet2/train_diag_ubm.sh: initializing model from E-M in memory, 
steps/online/nnet2/train_diag_ubm.sh: starting from 256 Gaussians, reaching 512;
steps/online/nnet2/train_diag_ubm.sh: for 20 iterations, using at most 700000 frames of data
Getting Gaussian-selection info
steps/online/nnet2/train_diag_ubm.sh: will train for 4 iterations, in parallel over
steps/online/nnet2/train_diag_ubm.sh: 30 machines, parallelized with 'run.pl --mem 8G'
steps/online/nnet2/train_diag_ubm.sh: Training pass 0
steps/online/nnet2/train_diag_ubm.sh: Training pass 1
steps/online/nnet2/train_diag_ubm.sh: Training pass 2
steps/online/nnet2/train_diag_ubm.sh: Training pass 3
local/nnet3/run_ivector_common.sh: training the iVector extractor
steps/online/nnet2/train_ivector_extractor.sh --cmd run.pl --mem 8G --nj 10 data/train_sp_hires_nopitch exp/nnet3/diag_ubm exp/nnet3/extractor
steps/online/nnet2/train_ivector_extractor.sh: Directory exp/nnet3/extractor already exists. Backing up iVector extractor in exp/nnet3/extractor/backup.FP5
steps/online/nnet2/train_ivector_extractor.sh: doing Gaussian selection and posterior computation
Accumulating stats (pass 0)
Summing accs (pass 0)
Updating model (pass 0)
Accumulating stats (pass 1)
Summing accs (pass 1)
Updating model (pass 1)
Accumulating stats (pass 2)
Summing accs (pass 2)
Updating model (pass 2)
Accumulating stats (pass 3)
Summing accs (pass 3)
Updating model (pass 3)
Accumulating stats (pass 4)
Summing accs (pass 4)
Updating model (pass 4)
Accumulating stats (pass 5)
Summing accs (pass 5)
Updating model (pass 5)
Accumulating stats (pass 6)
Summing accs (pass 6)
Updating model (pass 6)
Accumulating stats (pass 7)
Summing accs (pass 7)
Updating model (pass 7)
Accumulating stats (pass 8)
Summing accs (pass 8)
Updating model (pass 8)
Accumulating stats (pass 9)
Summing accs (pass 9)
Updating model (pass 9)
utils/data/modify_speaker_info.sh: copied data from data/train_sp_hires_nopitch to exp/nnet3/ivectors_train_sp/train_sp_sp_hires_nopitch_max2, number of speakers changed from 1020 to 180399
utils/validate_data_dir.sh: Successfully validated data-directory exp/nnet3/ivectors_train_sp/train_sp_sp_hires_nopitch_max2
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --mem 8G --nj 30 exp/nnet3/ivectors_train_sp/train_sp_sp_hires_nopitch_max2 exp/nnet3/extractor exp/nnet3/ivectors_train_sp
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to exp/nnet3/ivectors_train_sp using the extractor in exp/nnet3/extractor.
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --mem 8G --nj 8 data/dev_hires_nopitch exp/nnet3/extractor exp/nnet3/ivectors_dev
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to exp/nnet3/ivectors_dev using the extractor in exp/nnet3/extractor.
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --mem 8G --nj 8 data/test_hires_nopitch exp/nnet3/extractor exp/nnet3/ivectors_test
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to exp/nnet3/ivectors_test using the extractor in exp/nnet3/extractor.
steps/align_fmllr_lats.sh --nj 30 --cmd run.pl --mem 8G data/train_sp data/lang exp/tri5a exp/tri5a_sp_lats
steps/align_fmllr_lats.sh: feature type is lda
steps/align_fmllr_lats.sh: compiling training graphs
steps/align_fmllr_lats.sh: aligning data in data/train_sp using exp/tri5a/final.alimdl and speaker-independent features.
steps/align_fmllr_lats.sh: computing fMLLR transforms
steps/align_fmllr_lats.sh: generating lattices containing alternate pronunciations.
steps/align_fmllr_lats.sh: done generating lattices from training transcripts.
1 warnings in exp/tri5a_sp_lats/log/generate_lattices.*.log
2 warnings in exp/tri5a_sp_lats/log/fmllr.*.log
399 warnings in exp/tri5a_sp_lats/log/align_pass1.*.log
steps/nnet3/chain/build_tree.sh --frame-subsampling-factor 3 --context-opts --context-width=2 --central-position=1 --cmd run.pl --mem 8G 5000 data/train_sp data/lang_chain exp/tri5a_sp_ali exp/chain/tri6_7d_tree_sp
steps/nnet3/chain/build_tree.sh: feature type is lda
steps/nnet3/chain/build_tree.sh: Using transforms from exp/tri5a_sp_ali
steps/nnet3/chain/build_tree.sh: Initializing monophone model (for alignment conversion, in case topology changed)
steps/nnet3/chain/build_tree.sh: Accumulating tree stats
steps/nnet3/chain/build_tree.sh: Getting questions for tree clustering.
steps/nnet3/chain/build_tree.sh: Building the tree
steps/nnet3/chain/build_tree.sh: Initializing the model
steps/nnet3/chain/build_tree.sh: Converting alignments from exp/tri5a_sp_ali to use current tree
steps/nnet3/chain/build_tree.sh: Done building tree
local/chain/run_tdnn.sh: creating neural net configs using the xconfig parser
tree-info exp/chain/tri6_7d_tree_sp/tree 
steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain/tdnn_1a_sp/configs/network.xconfig --config-dir exp/chain/tdnn_1a_sp/configs/
nnet3-init exp/chain/tdnn_1a_sp/configs//init.config exp/chain/tdnn_1a_sp/configs//init.raw 
LOG (nnet3-init[5.5.164~1-9698]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//init.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//init.raw 
nnet3-init exp/chain/tdnn_1a_sp/configs//ref.config exp/chain/tdnn_1a_sp/configs//ref.raw 
LOG (nnet3-init[5.5.164~1-9698]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//ref.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//ref.raw 
nnet3-init exp/chain/tdnn_1a_sp/configs//ref.config exp/chain/tdnn_1a_sp/configs//ref.raw 
LOG (nnet3-init[5.5.164~1-9698]:main():nnet3-init.cc:80) Initialized raw neural net and wrote it to exp/chain/tdnn_1a_sp/configs//ref.raw
nnet3-info exp/chain/tdnn_1a_sp/configs//ref.raw 
2019-01-16 20:02:01,589 [steps/nnet3/chain/train.py:35 - <module> - INFO ] Starting chain model trainer (train.py)
steps/nnet3/chain/train.py --stage -10 --cmd run.pl --mem 8G --feat.online-ivector-dir exp/nnet3/ivectors_train_sp --feat.cmvn-opts --norm-means=false --norm-vars=false --chain.xent-regularize 0.1 --chain.leaky-hmm-coefficient 0.1 --chain.l2-regularize 0.00005 --chain.apply-deriv-weights false --chain.lm-opts=--num-extra-lm-states=2000 --egs.dir  --egs.stage -10 --egs.opts --frames-overlap-per-eg 0 --egs.chunk-width 150,110,90 --trainer.num-chunk-per-minibatch 128 --trainer.frames-per-iter 1500000 --trainer.num-epochs 2 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.001 --trainer.optimization.final-effective-lrate 0.0001 --trainer.max-param-change 2.0 --cleanup.remove-egs true --feat-dir data/train_sp_hires --tree-dir exp/chain/tri6_7d_tree_sp --lat-dir exp/tri5a_sp_lats --dir exp/chain/tdnn_1a_sp
['steps/nnet3/chain/train.py', '--stage', '-10', '--cmd', 'run.pl --mem 8G', '--feat.online-ivector-dir', 'exp/nnet3/ivectors_train_sp', '--feat.cmvn-opts', '--norm-means=false --norm-vars=false', '--chain.xent-regularize', '0.1', '--chain.leaky-hmm-coefficient', '0.1', '--chain.l2-regularize', '0.00005', '--chain.apply-deriv-weights', 'false', '--chain.lm-opts=--num-extra-lm-states=2000', '--egs.dir', '', '--egs.stage', '-10', '--egs.opts', '--frames-overlap-per-eg 0', '--egs.chunk-width', '150,110,90', '--trainer.num-chunk-per-minibatch', '128', '--trainer.frames-per-iter', '1500000', '--trainer.num-epochs', '2', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.001', '--trainer.optimization.final-effective-lrate', '0.0001', '--trainer.max-param-change', '2.0', '--cleanup.remove-egs', 'true', '--feat-dir', 'data/train_sp_hires', '--tree-dir', 'exp/chain/tri6_7d_tree_sp', '--lat-dir', 'exp/tri5a_sp_lats', '--dir', 'exp/chain/tdnn_1a_sp']
2019-01-16 20:02:01,649 [steps/nnet3/chain/train.py:273 - train - INFO ] Arguments for the experiment
{'alignment_subsampling_factor': 3,
 'apply_deriv_weights': False,
 'backstitch_training_interval': 1,
 'backstitch_training_scale': 0.0,
 'chunk_left_context': 0,
 'chunk_left_context_initial': -1,
 'chunk_right_context': 0,
 'chunk_right_context_final': -1,
 'chunk_width': '150,110,90',
 'cleanup': True,
 'cmvn_opts': '--norm-means=false --norm-vars=false',
 'combine_sum_to_one_penalty': 0.0,
 'command': 'run.pl --mem 8G',
 'compute_per_dim_accuracy': False,
 'deriv_truncate_margin': None,
 'dir': 'exp/chain/tdnn_1a_sp',
 'do_final_combination': True,
 'dropout_schedule': None,
 'egs_command': None,
 'egs_dir': None,
 'egs_opts': '--frames-overlap-per-eg 0',
 'egs_stage': -10,
 'email': None,
 'exit_stage': None,
 'feat_dir': 'data/train_sp_hires',
 'final_effective_lrate': 0.0001,
 'frame_subsampling_factor': 3,
 'frames_per_iter': 1500000,
 'initial_effective_lrate': 0.001,
 'input_model': None,
 'l2_regularize': 5e-05,
 'lat_dir': 'exp/tri5a_sp_lats',
 'leaky_hmm_coefficient': 0.1,
 'left_deriv_truncate': None,
 'left_tolerance': 5,
 'lm_opts': '--num-extra-lm-states=2000',
 'max_lda_jobs': 10,
 'max_models_combine': 20,
 'max_objective_evaluations': 30,
 'max_param_change': 2.0,
 'momentum': 0.0,
 'num_chunk_per_minibatch': '128',
 'num_epochs': 2.0,
 'num_jobs_final': 1,
 'num_jobs_initial': 1,
 'online_ivector_dir': 'exp/nnet3/ivectors_train_sp',
 'preserve_model_interval': 100,
 'presoftmax_prior_scale_power': -0.25,
 'proportional_shrink': 0.0,
 'rand_prune': 4.0,
 'remove_egs': True,
 'reporting_interval': 0.1,
 'right_tolerance': 5,
 'samples_per_iter': 400000,
 'shrink_saturation_threshold': 0.4,
 'shrink_value': 1.0,
 'shuffle_buffer_size': 5000,
 'srand': 0,
 'stage': -10,
 'train_opts': [],
 'tree_dir': 'exp/chain/tri6_7d_tree_sp',
 'use_gpu': 'yes',
 'xent_regularize': 0.1}
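The dotted flag names on the command line (e.g. --trainer.optimization.num-jobs-initial) correspond to the underscored keys in the dump above (num_jobs_initial): train.py drops the section prefixes and turns hyphens into underscores. A rough sketch of that mapping; the helper here is ours, not code from Kaldi:

```python
# Illustrative re-creation of how train.py's dotted CLI flags map to the
# config-dict keys printed above (hypothetical helper, not Kaldi source).
def flag_to_key(flag):
    name = flag.lstrip('-')
    name = name.split('.')[-1]          # drop prefixes like "trainer.optimization."
    return name.replace('-', '_')       # hyphens become underscores

print(flag_to_key('--trainer.optimization.num-jobs-initial'))  # num_jobs_initial
print(flag_to_key('--chain.xent-regularize'))                  # xent_regularize
print(flag_to_key('--egs.chunk-width'))                        # chunk_width
```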
2019-01-16 20:02:07,967 [steps/nnet3/chain/train.py:327 - train - INFO ] Creating phone language-model
2019-01-16 20:02:14,455 [steps/nnet3/chain/train.py:332 - train - INFO ] Creating denominator FST
copy-transition-model exp/chain/tri6_7d_tree_sp/final.mdl exp/chain/tdnn_1a_sp/0.trans_mdl 
LOG (copy-transition-model[5.5.164~1-9698]:main():copy-transition-model.cc:62) Copied transition model.
2019-01-16 20:02:15,517 [steps/nnet3/chain/train.py:339 - train - INFO ] Initializing a basic network for estimating preconditioning matrix
2019-01-16 20:02:15,553 [steps/nnet3/chain/train.py:361 - train - INFO ] Generating egs
steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --cmd run.pl --mem 8G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir exp/nnet3/ivectors_train_sp --left-context 13 --right-context 13 --left-context-initial -1 --right-context-final -1 --left-tolerance 5 --right-tolerance 5 --frame-subsampling-factor 3 --alignment-subsampling-factor 3 --stage -10 --frames-per-iter 1500000 --frames-per-eg 150,110,90 --srand 0 data/train_sp_hires exp/chain/tdnn_1a_sp exp/tri5a_sp_lats exp/chain/tdnn_1a_sp/egs
File data/train_sp_hires/utt2uniq exists, so augmenting valid_uttlist to
include all perturbed versions of the same 'real' utterances.
steps/nnet3/chain/get_egs.sh: creating egs.  To ensure they are not deleted later you can do:  touch exp/chain/tdnn_1a_sp/egs/.nodelete
steps/nnet3/chain/get_egs.sh: feature type is raw
tree-info exp/chain/tdnn_1a_sp/tree 
feat-to-dim scp:exp/nnet3/ivectors_train_sp/ivector_online.scp - 
steps/nnet3/chain/get_egs.sh: working out number of frames of training data
steps/nnet3/chain/get_egs.sh: working out feature dim
steps/nnet3/chain/get_egs.sh: creating 110 archives, each with 16567 egs, with
steps/nnet3/chain/get_egs.sh:   150,110,90 labels per example, and (left,right) context = (13,13)
steps/nnet3/chain/get_egs.sh: Getting validation and training subset examples in background.
steps/nnet3/chain/get_egs.sh: Generating training examples on disk
... Getting subsets of validation examples for diagnostics and combination.
steps/nnet3/chain/get_egs.sh: recombining and shuffling order of archives on disk
steps/nnet3/chain/get_egs.sh: removing temporary archives
steps/nnet3/chain/get_egs.sh: removing temporary alignments, lattices and transforms
steps/nnet3/chain/get_egs.sh: Finished preparing training examples
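The archive sizing above is frame bookkeeping: get_egs.sh targets --frames-per-iter 1500000 frames per archive, so 110 archives implies roughly 165M training frames, and 16567 egs per archive implies an average chunk of about 90 frames, i.e. most examples land on the shortest of the 150,110,90 widths. A back-of-the-envelope check, assuming the usual 10 ms frame shift:

```python
# Back-of-the-envelope check of get_egs.sh's archive numbers (estimates only).
frames_per_iter = 1_500_000   # --frames-per-iter
num_archives = 110
egs_per_archive = 16_567
num_utts = 360_294            # speed-perturbed training set

total_frames = frames_per_iter * num_archives        # ~165M frames
frames_per_utt = total_frames / num_utts             # ~458 frames per utterance
print(round(frames_per_utt * 0.01, 2), "s/utt")      # ~4.58 s at a 10 ms frame shift

avg_chunk = frames_per_iter / egs_per_archive
print(round(avg_chunk, 1), "frames/eg")              # ~90.5, near the 90-frame width
```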

Part 15: Iterations

2019-01-16 20:15:29,645 [steps/nnet3/chain/train.py:410 - train - INFO ] Copying the properties from exp/chain/tdnn_1a_sp/egs to exp/chain/tdnn_1a_sp
2019-01-16 20:15:29,671 [steps/nnet3/chain/train.py:424 - train - INFO ] Computing the preconditioning matrix for input features
2019-01-16 20:16:04,298 [steps/nnet3/chain/train.py:433 - train - INFO ] Preparing the initial acoustic model.
2019-01-16 20:16:05,196 [steps/nnet3/chain/train.py:467 - train - INFO ] Training will run for 2.0 epochs = 660 iterations
2019-01-16 20:16:05,196 [steps/nnet3/chain/train.py:509 - train - INFO ] Iter: 0/659    Epoch: 0.00/2.0 (0.0% complete)    lr: 0.001000    
2019-01-16 20:16:34,711 [steps/nnet3/chain/train.py:509 - train - INFO ] Iter: 1/659    Epoch: 0.00/2.0 (0.2% complete)    lr: 0.000997    
2019-01-16 20:16:57,582 [steps/nnet3/chain/train.py:509 - train - INFO ] Iter: 2/659    Epoch: 0.01/2.0 (0.3% complete)    lr: 0.000993    
(Roughly 280,000 characters of per-iteration log lines omitted here .... fill them in with your imagination ...)
2019-01-17 00:29:47,901 [steps/nnet3/chain/train.py:509 - train - INFO ] Iter: 658/659    Epoch: 1.99/2.0 (99.7% complete)    lr: 0.000101    
2019-01-17 00:30:11,185 [steps/nnet3/chain/train.py:509 - train - INFO ] Iter: 659/659    Epoch: 2.00/2.0 (99.8% complete)    lr: 0.000100    
2019-01-17 00:30:34,175 [steps/nnet3/chain/train.py:565 - train - INFO ] Doing final combination to produce final.mdl
2019-01-17 00:30:34,175 [steps/libs/nnet3/train/chain_objf/acoustic_model.py:571 - combine_models - INFO ] Combining set([519, 527, 660, 535, 495, 543, 551, 647, 559, 567, 575, 583, 503, 591, 599, 655, 607, 615, 623, 631, 511, 639]) models.
2019-01-17 00:30:49,737 [steps/nnet3/chain/train.py:594 - train - INFO ] Cleaning up the experiment directory exp/chain/tdnn_1a_sp
steps/nnet2/remove_egs.sh: Finished deleting examples in exp/chain/tdnn_1a_sp/egs
exp/chain/tdnn_1a_sp: num-iters=660 nj=1..1 num-params=12.2M dim=43+100->4320 combine=-0.054->-0.054 (over 4) xent:train/valid[438,659]=(-0.897,-0.855/-1.06,-1.03) logprob:train/valid[438,659]=(-0.053,-0.049/-0.072,-0.071)
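The 660-iteration count and the decaying learning rates in the log follow directly from the options: train.py processes num_epochs × num_archives × frame_subsampling_factor archive passes (2.0 × 110 × 3 = 660) over an average of 1 job, and decays the effective learning rate exponentially from 0.001 to 0.0001. A sketch that reproduces the logged values (simplified to a constant job count, which holds here since num-jobs-initial = num-jobs-final = 1):

```python
# Reproduce the iteration count and learning-rate schedule seen in the log.
num_epochs = 2.0
num_archives = 110            # from get_egs.sh
frame_subsampling_factor = 3  # chain training revisits each archive 3 times
num_jobs = 1                  # num-jobs-initial = num-jobs-final = 1

num_iters = int(num_epochs * num_archives * frame_subsampling_factor / num_jobs)
print(num_iters)  # 660, matching "Training will run for 2.0 epochs = 660 iterations"

initial_lr, final_lr = 0.001, 0.0001
def lr(i):
    # Exponential decay from the initial to the final effective learning rate.
    return initial_lr * (final_lr / initial_lr) ** (i / num_iters)

for i in (0, 1, 2, 658, 659):
    print(i, f"{lr(i):.6f}")  # 0.001000, 0.000997, 0.000993, 0.000101, 0.000100
```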
tree-info exp/chain/tdnn_1a_sp/tree 
tree-info exp/chain/tdnn_1a_sp/tree 
fstcomposecontext --context-size=2 --central-position=1 --read-disambig-syms=data/lang_test/phones/disambig.int --write-disambig-syms=data/lang_test/tmp/disambig_ilabels_2_1.int data/lang_test/tmp/ilabels_2_1.4603 data/lang_test/tmp/LG.fst 
fstisstochastic data/lang_test/tmp/CLG_2_1.fst 
-0.0663446 -0.0666824
[info]: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/chain/tdnn_1a_sp/graph/disambig_tid.int --transition-scale=1.0 data/lang_test/tmp/ilabels_2_1 exp/chain/tdnn_1a_sp/tree exp/chain/tdnn_1a_sp/final.mdl 
fsttablecompose exp/chain/tdnn_1a_sp/graph/Ha.fst data/lang_test/tmp/CLG_2_1.fst 
fstdeterminizestar --use-log=true 
fstrmsymbols exp/chain/tdnn_1a_sp/graph/disambig_tid.int 
fstrmepslocal 
fstminimizeencoded 
fstisstochastic exp/chain/tdnn_1a_sp/graph/HCLGa.fst 
0.393711 -0.237036
HCLGa is not stochastic
add-self-loops --self-loop-scale=1.0 --reorder=true exp/chain/tdnn_1a_sp/final.mdl exp/chain/tdnn_1a_sp/graph/HCLGa.fst 
fstisstochastic exp/chain/tdnn_1a_sp/graph/HCLG.fst 
0.177603 -0.184153
[info]: final HCLG is not stochastic.
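The pairs of numbers printed by fstisstochastic are the minimum and maximum deviation, in log space, of any state's outgoing arc-weight sum from 1; a perfectly stochastic WFST would print values near 0, and graph building only warns when it is not. A tiny illustration of the per-state quantity on hand-made arc weights (a simplification; the real tool sweeps every state of the FST):

```python
import math

# Arc weights in -log probability space, as in Kaldi/OpenFst graphs.
# A state is stochastic if its outgoing arc probabilities sum to 1.
def log_deviation(neg_log_weights):
    total_prob = sum(math.exp(-w) for w in neg_log_weights)
    return -math.log(total_prob)        # 0.0 when the state is stochastic

two_halves = [math.log(2), math.log(2)]      # two arcs of prob 0.5 each
print(round(log_deviation(two_halves), 6))   # 0.0: stochastic

lopsided = [math.log(2), math.log(4)]        # probs 0.5 + 0.25 = 0.75
print(round(log_deviation(lopsided), 4))     # 0.2877: not stochastic
```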
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --nj 10 --cmd run.pl --mem 8G --online-ivector-dir exp/nnet3/ivectors_dev exp/chain/tdnn_1a_sp/graph data/dev_hires exp/chain/tdnn_1a_sp/decode_dev
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 8G --iter final exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_dev
steps/diagnostic/analyze_lats.sh: see stats in exp/chain/tdnn_1a_sp/decode_dev/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,26) and mean=12.2
steps/diagnostic/analyze_lats.sh: see stats in exp/chain/tdnn_1a_sp/decode_dev/log/analyze_lattice_depth_stats.log
score best paths
+ steps/score_kaldi.sh --cmd 'run.pl --mem 8G' data/dev_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_dev
steps/score_kaldi.sh --cmd run.pl --mem 8G data/dev_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_dev
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 8G' data/dev_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_dev
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 8G data/dev_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_dev
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
score confidence and timing with sclite
Decoding done.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --nj 10 --cmd run.pl --mem 8G --online-ivector-dir exp/nnet3/ivectors_test exp/chain/tdnn_1a_sp/graph data/test_hires exp/chain/tdnn_1a_sp/decode_test
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --mem 8G --iter final exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_test
steps/diagnostic/analyze_lats.sh: see stats in exp/chain/tdnn_1a_sp/decode_test/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,4,39) and mean=18.6
steps/diagnostic/analyze_lats.sh: see stats in exp/chain/tdnn_1a_sp/decode_test/log/analyze_lattice_depth_stats.log
score best paths
+ steps/score_kaldi.sh --cmd 'run.pl --mem 8G' data/test_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_test
steps/score_kaldi.sh --cmd run.pl --mem 8G data/test_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_test
steps/score_kaldi.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ steps/scoring/score_kaldi_cer.sh --stage 2 --cmd 'run.pl --mem 8G' data/test_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_test
steps/scoring/score_kaldi_cer.sh --stage 2 --cmd run.pl --mem 8G data/test_hires exp/chain/tdnn_1a_sp/graph exp/chain/tdnn_1a_sp/decode_test
steps/scoring/score_kaldi_cer.sh: scoring with word insertion penalty=0.0,0.5,1.0
+ echo 'local/score.sh: Done'
local/score.sh: Done
score confidence and timing with sclite
Decoding done.

Next: Kaldi Step by Step on AIShell v1 S5, Part 5: chain DNN
Next: Kaldi Step by Step on AIShell v1 S5, Part 4: nnet3 DNN
Back: Kaldi Step by Step on AIShell v1 S5, Part 3: Triphone
Back: Kaldi Step by Step on AIShell v1 S5, Part 2: Monophone
Back: Kaldi Step by Step on AIShell v1 S5, Part 1: Before MONO

Also see: Kaldi running TIMIT end to end, complete results (including DNN)
