PyTorch-Kaldi is a recently released speech-recognition toolkit. As the name suggests, it is a hybrid of PyTorch and Kaldi. Because the DNN part of Kaldi is hard to extend (adding a new network component requires hand-writing its propagate and backpropagate functions), the authors built the PyTorch-Kaldi toolkit, whose framework is shown in the figure below.
The toolkit still builds the acoustic model as a hybrid DNN-HMM, but the DNN part is implemented in PyTorch, while feature extraction, label/alignment computation and decoding are still handled by Kaldi. This greatly simplifies constructing the DNN part of the acoustic model.
The project is hosted on GitHub: project link
The paper is available on arXiv: paper link
The core logic of PyTorch-Kaldi is shown in the figure below. Each dashed box in the figure represents a Python file, and a dashed arrow indicates that a step calls another Python file.
To give a fuller picture of the PyTorch-Kaldi code and make it easier to modify the framework, some of the most important code in PyTorch-Kaldi is annotated below. The annotated code can also be downloaded from the Baidu Cloud link.
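In practice an experiment is launched by passing the global config file to run_exp.py on the command line, e.g. python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc.cfg (this cfg path follows the upstream README and depends on your setup). run_exp.py then splits the experiment into per-chunk config files and repeatedly calls run_nn (defined in core.py) on them, which is exactly what the annotated listings below walk through.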
run_exp.py
# Reading global cfg file (first argument-mandatory file)
cfg_file=sys.argv[1]
if not(os.path.exists(cfg_file)):
sys.stderr.write('ERROR: The config file %s does not exist!\n'%(cfg_file))
sys.exit(0)
else:
config = configparser.ConfigParser()
config.read(cfg_file)
# Reading and parsing optional arguments from command line (e.g.,--optimization,lr=0.002)
[section_args,field_args,value_args]=read_args_command_line(sys.argv,config)
# Output folder creation
out_folder=config['exp']['out_folder']
if not os.path.exists(out_folder):
os.makedirs(out_folder+'/exp_files')
# Log file path
log_file=config['exp']['out_folder']+'/log.log'
# Read, parse, and check the config file
cfg_file_proto=config['cfg_proto']['cfg_proto']
[config,name_data,name_arch]=check_cfg(cfg_file,config,cfg_file_proto)
# Read cfg file options
is_production=strtobool(config['exp']['production']) # "production" mode: skip training and only run forward propagation and decoding with a previously trained model
cfg_file_proto_chunk=config['cfg_proto']['cfg_proto_chunk']
cmd=config['exp']['cmd']
N_ep=int(config['exp']['N_epochs_tr'])
N_ep_str_format='0'+str(max(math.ceil(np.log10(N_ep)),1))+'d'
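# e.g. N_ep=24 -> math.ceil(log10(24))=2 -> N_ep_str_format='02d', so epochs appear as ep00, ep01, ... in the generated file names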
tr_data_lst=config['data_use']['train_with'].split(',')
valid_data_lst=config['data_use']['valid_with'].split(',')
forward_data_lst=config['data_use']['forward_with'].split(',')
max_seq_length_train=config['batches']['max_seq_length_train']
forward_save_files=list(map(strtobool,config['forward']['save_out_file'].split(',')))
print("- Reading config file......OK!")
# Copy the global cfg file into the output folder
cfg_file=out_folder+'/conf.cfg'
with open(cfg_file, 'w') as configfile:
config.write(configfile)
# Load the run_nn function from core libriary
# run_nn is the function that processes a single chunk of data
run_nn_script=config['exp']['run_nn_script'].split('.py')[0]
module = importlib.import_module('core')
run_nn=getattr(module, run_nn_script)
# Splitting data into chunks (see out_folder/additional_files)
create_lists(config)
# Writing the config files
create_configs(config)
print("- Chunk creation......OK!\n")
# create res_file
res_file_path=out_folder+'/res.res' # res.res summarizes training and validation performance for every epoch
res_file = open(res_file_path, "w")
res_file.close()
# Learning rates and architecture-specific optimization parameters
arch_lst=get_all_archs(config) # get the cfg sections of all architectures
lr={}
auto_lr_annealing={}
improvement_threshold={}
halving_factor={}
pt_files={}
for arch in arch_lst:
lr[arch]=expand_str_ep(config[arch]['arch_lr'],'float',N_ep,'|','*') # per-epoch learning rates (see the sketch after this loop)
if len(config[arch]['arch_lr'].split('|'))>1:
auto_lr_annealing[arch]=False
else:
auto_lr_annealing[arch]=True
improvement_threshold[arch]=float(config[arch]['arch_improvement_threshold'])
halving_factor[arch]=float(config[arch]['arch_halving_factor']) # lr halving factor used by newbob annealing
pt_files[arch]=config[arch]['arch_pretrain_file'] # pre-trained model file
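# --- Illustration (not part of run_exp.py) ---------------------------------
# A minimal sketch of the per-epoch expansion the loop above relies on, assuming
# arch_lr uses "value*n_epochs" segments separated by '|' (e.g. "0.08*10|0.04*14");
# expand_str_ep_sketch is a hypothetical stand-in for utils.expand_str_ep.
def expand_str_ep_sketch(schedule, n_ep):
    out = []
    for seg in schedule.split('|'):
        val, reps = (seg.split('*') + [str(n_ep)])[:2]   # a plain "0.0004" repeats for all epochs
        out.extend([float(val)] * int(reps))
    return out[:n_ep]
# expand_str_ep_sketch("0.08*10|0.04*14", 24) -> [0.08]*10 + [0.04]*14
# ---------------------------------------------------------------------------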
# If production, skip training and forward directly from last saved models
if is_production:
ep = N_ep-1 # skip the TRAINING LOOP
N_ep = 0
model_files = {}
for arch in pt_files.keys():
model_files[arch] = out_folder+'/exp_files/final_'+arch+'.pkl' # the final .pkl models used for decoding
op_counter=1 # used to detect the next configuration file in list_chunks.txt
# Reading the ordered list of config file to process
cfg_file_list = [line.rstrip('\n') for line in open(out_folder+'/exp_files/list_chunks.txt')]
cfg_file_list.append(cfg_file_list[-1])
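# the last chunk cfg is duplicated so that the "next chunk" prefetched inside run_nn still exists on the final iteration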
# A variable that tells if the current chunk is the first one that is being processed:
processed_first=True
data_name=[]
data_set=[]
data_end_index=[]
fea_dict=[]
lab_dict=[]
arch_dict=[]
# --------TRAINING LOOP--------#
for ep in range(N_ep):
tr_loss_tot=0
tr_error_tot=0
tr_time_tot=0
print('------------------------------ Epoch %s / %s ------------------------------'%(format(ep, N_ep_str_format),format(N_ep-1, N_ep_str_format)))
for tr_data in tr_data_lst:
# Compute the total number of chunks for each training epoch
N_ck_tr=compute_n_chunks(out_folder,tr_data,ep,N_ep_str_format,'train')
N_ck_str_format='0'+str(max(math.ceil(np.log10(N_ck_tr)),1))+'d'
# ***Epoch training***
for ck in range(N_ck_tr): # train on each chunk
# paths of the output files (info,model,chunk_specific cfg file)
info_file=out_folder+'/exp_files/train_'+tr_data+'_ep'+format(ep, N_ep_str_format)+'_ck'+format(ck, N_ck_str_format)+'.info' # each .info file reports the loss and error of one training chunk
if ep+ck==0:
model_files_past={}
else:
model_files_past=model_files
model_files={}
for arch in pt_files.keys():
model_files[arch]=info_file.replace('.info','_'+arch+'.pkl')
config_chunk_file=out_folder+'/exp_files/train_'+tr_data+'_ep'+format(ep, N_ep_str_format)+'_ck'+format(ck, N_ck_str_format)+'.cfg'
# update learning rate in the cfg file (if needed)
change_lr_cfg(config_chunk_file,lr,ep)
# if this chunk has not already been processed, do training...
if not(os.path.exists(info_file)):
print('Training %s chunk = %i / %i' %(tr_data,ck+1, N_ck_tr))
# getting the next chunk
next_config_file=cfg_file_list[op_counter]
# run chunk processing (train on this chunk)
[data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict]=run_nn(data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict,config_chunk_file,processed_first,next_config_file)
# update the first_processed variable
processed_first=False
if not(os.path.exists(info_file)):
sys.stderr.write("ERROR: training epoch %i, chunk %i not done! File %s does not exist.\nSee %s \n" % (ep,ck,info_file,log_file))
sys.exit(0)
# update the operation counter
op_counter+=1
# update pt_file (used to initialized the DNN for the next chunk)
for pt_arch in pt_files.keys():
pt_files[pt_arch]=out_folder+'/exp_files/train_'+tr_data+'_ep'+format(ep, N_ep_str_format)+'_ck'+format(ck, N_ck_str_format)+'_'+pt_arch+'.pkl'
# remove previous pkl files
if len(model_files_past.keys())>0:
for pt_arch in pt_files.keys():
if os.path.exists(model_files_past[pt_arch]):
os.remove(model_files_past[pt_arch])
# Training Loss and Error
tr_info_lst=sorted(glob.glob(out_folder+'/exp_files/train_'+tr_data+'_ep'+format(ep, N_ep_str_format)+'*.info'))
[tr_loss,tr_error,tr_time]=compute_avg_performance(tr_info_lst)
tr_loss_tot=tr_loss_tot+tr_loss
tr_error_tot=tr_error_tot+tr_error
tr_time_tot=tr_time_tot+tr_time
# ***Epoch validation***
if ep>0:
# store previous-epoch results (useful for learning rate annealing)
valid_peformance_dict_prev=valid_peformance_dict
valid_peformance_dict={}
tot_time=tr_time
for valid_data in valid_data_lst: # validation datasets
# Compute the number of chunks for each validation dataset
N_ck_valid=compute_n_chunks(out_folder,valid_data,ep,N_ep_str_format,'valid')
N_ck_str_format='0'+str(max(math.ceil(np.log10(N_ck_valid)),1))+'d'
for ck in range(N_ck_valid):
# paths of the output files
info_file=out_folder+'/exp_files/valid_'+valid_data+'_ep'+format(ep, N_ep_str_format)+'_ck'+format(ck, N_ck_str_format)+'.info'
config_chunk_file=out_folder+'/exp_files/valid_'+valid_data+'_ep'+format(ep, N_ep_str_format)+'_ck'+format(ck, N_ck_str_format)+'.cfg'
# Do validation if the chunk was not already processed
if not(os.path.exists(info_file)):
print('Validating %s chunk = %i / %i' %(valid_data,ck+1,N_ck_valid))
# Doing eval
# getting the next chunk
next_config_file=cfg_file_list[op_counter]
# run chunk processing
[data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict]=run_nn(data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict,config_chunk_file,processed_first,next_config_file)
# update the first_processed variable
processed_first=False
if not(os.path.exists(info_file)):
sys.stderr.write("ERROR: validation on epoch %i, chunk %i of dataset %s not done! File %s does not exist.\nSee %s \n" % (ep,ck,valid_data,info_file,log_file))
sys.exit(0)
# update the operation counter
op_counter+=1
# Compute validation performance
valid_info_lst=sorted(glob.glob(out_folder+'/exp_files/valid_'+valid_data+'_ep'+format(ep, N_ep_str_format)+'*.info'))
[valid_loss,valid_error,valid_time]=compute_avg_performance(valid_info_lst)
valid_peformance_dict[valid_data]=[valid_loss,valid_error,valid_time]
tot_time=tot_time+valid_time
# Print results in both res_file and stdout
dump_epoch_results(res_file_path, ep, tr_data_lst, tr_loss_tot, tr_error_tot, tot_time, valid_data_lst, valid_peformance_dict, lr, N_ep)
# Check for learning rate annealing
if ep>0:
# computing average validation error (on all the dataset specified)
err_valid_mean=np.mean(np.asarray(list(valid_peformance_dict.values()))[:,1])
err_valid_mean_prev=np.mean(np.asarray(list(valid_peformance_dict_prev.values()))[:,1])
for lr_arch in lr.keys():
# If an external lr schedule is not set, use newbob learning rate annealing:
# halve the lr whenever the relative improvement of the validation error falls below the threshold
if ep<N_ep-1 and auto_lr_annealing[lr_arch]:
if ((err_valid_mean_prev-err_valid_mean)/err_valid_mean)<improvement_threshold[lr_arch]:
new_lr_value=float(lr[lr_arch][ep])*halving_factor[lr_arch]
for i in range(ep+1,N_ep):
lr[lr_arch][i]=str(new_lr_value)
# --------DECODING--------#
# (the forward phase, which dumps the network outputs of the forward datasets into .ark files, runs between training and decoding; it is not annotated in this post)
forward_outs=config['forward']['forward_out'].split(',')
forward_dec_outs=list(map(strtobool,config['forward']['require_decoding'].split(','))) # str to bool
for data in forward_data_lst:
for k in range(len(forward_outs)): # several forward outputs are supported
if forward_dec_outs[k]: # if this output has to be decoded
print('Decoding %s output %s' %(data,forward_outs[k]))
info_file=out_folder+'/exp_files/decoding_'+data+'_'+forward_outs[k]+'.info'
# create decode config file
config_dec_file=out_folder+'/decoding_'+data+'_'+forward_outs[k]+'.conf'
config_dec = configparser.ConfigParser()
config_dec.add_section('decoding') # add a 'decoding' section
for dec_key in config['decoding'].keys(): # copy the [decoding] section of the global cfg into the decoding cfg
config_dec.set('decoding',dec_key,config['decoding'][dec_key])
# add graph_dir, datadir, alidir
lab_field=config[cfg_item2sec(config,'data_name',data)]['lab']
# Production case: we don't have labels
if not is_production:
pattern='lab_folder=(.*)\nlab_opts=(.*)\nlab_count_file=(.*)\nlab_data_folder=(.*)\nlab_graph=(.*)'
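# (Assumed) layout of the lab field matched by this pattern, e.g.:
#   lab_folder=/path/exp/tri3_ali       -> group 0 (alignment dir)
#   lab_opts=ali-to-pdf                 -> group 1
#   lab_count_file=auto                 -> group 2
#   lab_data_folder=/path/data/train/   -> group 3 (data dir)
#   lab_graph=/path/exp/tri3/graph      -> group 4 (graph dir)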
alidir=re.findall(pattern,lab_field)[0][0] # match group 0: lab_folder (alignment dir)
config_dec.set('decoding','alidir',os.path.abspath(alidir))
datadir=re.findall(pattern,lab_field)[0][3] # match group 3: lab_data_folder
config_dec.set('decoding','data',os.path.abspath(datadir))
graphdir=re.findall(pattern,lab_field)[0][4] # match group 4: lab_graph
config_dec.set('decoding','graphdir',os.path.abspath(graphdir))
else: # production case: no alignment labels are available
pattern='lab_data_folder=(.*)\nlab_graph=(.*)'
datadir=re.findall(pattern,lab_field)[0][0]
config_dec.set('decoding','data',os.path.abspath(datadir))
graphdir=re.findall(pattern,lab_field)[0][1]
config_dec.set('decoding','graphdir',os.path.abspath(graphdir))
# The ali dir is supposed to be in exp/model/ which is one level ahead of graphdir
alidir = graphdir.split('/')[0:len(graphdir.split('/'))-1]
alidir = "/".join(alidir)
config_dec.set('decoding','alidir',os.path.abspath(alidir))
with open(config_dec_file, 'w') as configfile:
config_dec.write(configfile)
out_folder=os.path.abspath(out_folder)
files_dec=out_folder+'/exp_files/forward_'+data+'_ep*_ck*_'+forward_outs[k]+'_to_decode.ark' # the .ark files passed as the third argument to decode_dnn.sh; they may be removed after decoding (see below)
out_dec_folder=out_folder+'/decode_'+data+'_'+forward_outs[k] # output folder of the decoding step
if not(os.path.exists(info_file)):
# Run the decoder: call decode_dnn.sh from the kaldi_decoding_scripts folder
cmd_decode=cmd+config['decoding']['decoding_script_folder'] +'/'+ config['decoding']['decoding_script']+ ' '+os.path.abspath(config_dec_file)+' '+ out_dec_folder + ' \"'+ files_dec + '\"'
run_shell(cmd_decode,log_file)
# remove ark files if needed
if not forward_save_files[k]:
list_rem=glob.glob(files_dec)
for rem_ark in list_rem:
os.remove(rem_ark)
# Print WER results and write info file
cmd_res='./check_res_dec.sh '+out_dec_folder # then call check_res_dec.sh from the local folder
wers=run_shell(cmd_res,log_file).decode('utf-8')
res_file = open(res_file_path, "a")
res_file.write('%s\n'%wers)
print(wers)
# Saving Loss and Err as .txt and plotting curves
if not is_production:
create_curves(out_folder, N_ep, valid_data_lst)
core.py
def run_nn(data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict,cfg_file,processed_first,next_config_file):
# This function processes the current chunk using the information in cfg_file. In parallel, the next chunk is loaded into CPU memory
# Reading chunk-specific cfg file (first argument-mandatory file)
if not(os.path.exists(cfg_file)):
sys.stderr.write('ERROR: The config file %s does not exist!\n'%(cfg_file))
sys.exit(0)
else:
config = configparser.ConfigParser()
config.read(cfg_file)
# Setting torch seed
seed=int(config['exp']['seed'])
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
# Reading config parameters
output_folder=config['exp']['out_folder']
use_cuda=strtobool(config['exp']['use_cuda'])
multi_gpu=strtobool(config['exp']['multi_gpu'])
to_do=config['exp']['to_do']
info_file=config['exp']['out_info']
model=config['model']['model'].split('\n') # model description lines
forward_outs=config['forward']['forward_out'].split(',')
forward_normalize_post=list(map(strtobool,config['forward']['normalize_posteriors'].split(',')))
forward_count_files=config['forward']['normalize_with_counts_from'].split(',')
require_decodings=list(map(strtobool,config['forward']['require_decoding'].split(',')))
use_cuda=strtobool(config['exp']['use_cuda'])
save_gpumem=strtobool(config['exp']['save_gpumem'])
is_production=strtobool(config['exp']['production'])
if to_do=='train':
batch_size=int(config['batches']['batch_size_train'])
if to_do=='valid':
batch_size=int(config['batches']['batch_size_valid'])
if to_do=='forward':
batch_size=1
# ***** Reading the Data********
if processed_first:
# Reading all the features and labels for this chunk
shared_list=[]
p=threading.Thread(target=read_lab_fea, args=(cfg_file,is_production,shared_list,output_folder,)) # read the features/labels referenced by cfg_file in a separate thread and store them in shared_list; output_folder is used for logging
p.start()
p.join()
data_name=shared_list[0]
data_end_index=shared_list[1]
fea_dict=shared_list[2]
lab_dict=shared_list[3]
arch_dict=shared_list[4]
data_set=shared_list[5]
# converting numpy tensors into pytorch tensors and put them on GPUs if specified
if not(save_gpumem) and use_cuda:
data_set=torch.from_numpy(data_set).float().cuda() # move the data to the GPU
else:
data_set=torch.from_numpy(data_set).float()
# Reading all the features and labels for the next chunk (in a background thread)
shared_list=[]
p=threading.Thread(target=read_lab_fea, args=(next_config_file,is_production,shared_list,output_folder,))
p.start()
# Reading the model description and initializing the networks
inp_out_dict=fea_dict
[nns,costs]=model_init(inp_out_dict,model,config,arch_dict,use_cuda,multi_gpu,to_do) # model_init (utils.py) builds the networks defined in neural_networks.py; nns holds the networks, costs the loss functions
# optimizers initialization
optimizers=optimizer_init(nns,config,arch_dict) # initialize the optimizers (defined in utils.py)
# pre-training: if a previous step (train/valid/forward) already produced a model, load it as the starting point
for net in nns.keys():
pt_file_arch=config[arch_dict[net][0]]['arch_pretrain_file'] # read arch_pretrain_file from the cfg
if pt_file_arch!='none':
checkpoint_load = torch.load(pt_file_arch)
nns[net].load_state_dict(checkpoint_load['model_par'])
optimizers[net].load_state_dict(checkpoint_load['optimizer_par'])
optimizers[net].param_groups[0]['lr']=float(config[arch_dict[net][0]]['arch_lr']) # loading lr of the cfg file for pt
if to_do=='forward': # only the forward step produces .ark files
post_file={}
for out_id in range(len(forward_outs)): # loop over all outputs (the network may have several)
if require_decodings[out_id]:
out_file=info_file.replace('.info','_'+forward_outs[out_id]+'_to_decode.ark') # path of the output .ark file
else:
out_file=info_file.replace('.info','_'+forward_outs[out_id]+'.ark')
post_file[forward_outs[out_id]]=open_or_fd(out_file,output_folder,'wb') # open a file, gzipped file or pipe and keep the returned handle
# automatically check whether the model is sequential (reads arch_seq_model from the cfg)
seq_model=is_sequential_dict(config,arch_dict) # True for order-dependent architectures (RNN/LSTM/GRU), False for order-independent ones (CNN/MLP); when False the frames can be shuffled
# ***** Minibatch Processing loop********
if seq_model or to_do=='forward':
N_snt=len(data_name)
N_batches=int(N_snt/batch_size)
else:
N_ex_tr=data_set.shape[0]
N_batches=int(N_ex_tr/batch_size)
beg_batch=0
end_batch=batch_size
snt_index=0
beg_snt=0
start_time = time.time()
# array of sentence lengths
arr_snt_len=shift(shift(data_end_index, -1,0)-data_end_index,1,0)
arr_snt_len[0]=data_end_index[0]
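# e.g. data_end_index=[120,250,400] (cumulative end frames) -> arr_snt_len=[120,130,150] (frames per sentence)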
loss_sum=0
err_sum=0
inp_dim=data_set.shape[1]
for i in range(N_batches): # loop over the minibatches
max_len=0
if seq_model: # sequential architectures must preserve the frame order within each sentence
max_len=int(max(arr_snt_len[snt_index:snt_index+batch_size]))
inp= torch.zeros(max_len,batch_size,inp_dim).contiguous() # inp.shape[0]=longest sequence length, inp.shape[1]=batch size, inp.shape[2]=feature dimension
for k in range(batch_size): # loop over the sentences of this minibatch
snt_len=data_end_index[snt_index]-beg_snt # sentence length = end index - start index
N_zeros=max_len-snt_len # number of zero frames needed to pad this sentence
# Append a random number of initial zeros; the remaining zeros go at the end (the sentence is placed at a random offset and then padded up to max_len)
N_zeros_left=random.randint(0,N_zeros) # random offset at which the sentence starts
# randomizing the position of the sentence inside the padded tensor may have a regularization effect
inp[N_zeros_left:N_zeros_left+snt_len,k,:]=data_set[beg_snt:beg_snt+snt_len,:] # inp is a 3-D tensor [time, batch, features]
beg_snt=data_end_index[snt_index]
snt_index=snt_index+1
else:
# features and labels for batch i
if to_do!='forward': # train/valid: slice a frame-level minibatch
inp= data_set[beg_batch:end_batch,:].contiguous()
else: # forward: batch size is 1, so take one whole sentence at a time (no zero padding)
snt_len=data_end_index[snt_index]-beg_snt
inp= data_set[beg_snt:beg_snt+snt_len,:].contiguous() # here inp is only a 2-D tensor (no batch dimension)
beg_snt=data_end_index[snt_index]
snt_index=snt_index+1
# use cuda
if use_cuda:
inp=inp.cuda()
if to_do=='train':
# Forward input, with autograd graph active (calls forward_model in utils.py)
outs_dict=forward_model(fea_dict,lab_dict,arch_dict,model,nns,costs,inp,inp_out_dict,max_len,batch_size,to_do,forward_outs)
for opt in optimizers.keys():
optimizers[opt].zero_grad()
outs_dict['loss_final'].backward() # backpropagation
# Gradient Clipping (th 0.1)
#for net in nns.keys():
# torch.nn.utils.clip_grad_norm_(nns[net].parameters(), 0.1)
for opt in optimizers.keys():
if not(strtobool(config[arch_dict[opt][0]]['arch_freeze'])):
optimizers[opt].step()
else: # forward or valid: no backpropagation is needed, so the autograd graph is disabled to save memory
with torch.no_grad(): # Forward input without autograd graph (save memory)
outs_dict=forward_model(fea_dict,lab_dict,arch_dict,model,nns,costs,inp,inp_out_dict,max_len,batch_size,to_do,forward_outs)
if to_do=='forward': # save the .ark files (they contain the log-likelihoods used for decoding)
for out_id in range(len(forward_outs)):
out_save=outs_dict[forward_outs[out_id]].data.cpu().numpy()
if forward_normalize_post[out_id]:
# read the config file
counts = load_counts(forward_count_files[out_id])
out_save=out_save-np.log(counts/np.sum(counts))
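# subtracting the log priors (normalized counts) converts the log posteriors into scaled log-likelihoods, which is what the Kaldi HMM decoder expects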
# save the output to the .ark file (the key product of the forward step)
write_mat(output_folder,post_file[forward_outs[out_id]], out_save, data_name[i])
else:
loss_sum=loss_sum+outs_dict['loss_final'].detach()
err_sum=err_sum+outs_dict['err_final'].detach()
# update it to the next batch
beg_batch=end_batch
end_batch=beg_batch+batch_size
# Progress bar
if to_do == 'train':
status_string="Training | (Batch "+str(i+1)+"/"+str(N_batches)+")"+" | L:" +str(round(outs_dict['loss_final'].detach().item(),3))
if i==N_batches-1:
status_string="Training | (Batch "+str(i+1)+"/"+str(N_batches)+")"
if to_do == 'valid':
status_string="Validating | (Batch "+str(i+1)+"/"+str(N_batches)+")"
if to_do == 'forward':
status_string="Forwarding | (Batch "+str(i+1)+"/"+str(N_batches)+")"
progress(i, N_batches, status=status_string)
elapsed_time_chunk=time.time() - start_time
loss_tot=loss_sum/N_batches
err_tot=err_sum/N_batches
# clearing memory
del inp, outs_dict, data_set
# save the model
if to_do=='train':
for net in nns.keys():
checkpoint={}
checkpoint['model_par']=nns[net].state_dict()
checkpoint['optimizer_par']=optimizers[net].state_dict()
out_file=info_file.replace('.info','_'+arch_dict[net][0]+'.pkl')
torch.save(checkpoint, out_file)#保存模型文件
if to_do=='forward': # close the handles of all output .ark files (only the forward step creates them)
for out_name in forward_outs:
post_file[out_name].close()
# Write the .info file
with open(info_file, "w") as text_file:
text_file.write("[results]\n")
if to_do!='forward':
text_file.write("loss=%s\n" % loss_tot.cpu().numpy())
text_file.write("err=%s\n" % err_tot.cpu().numpy())
text_file.write("elapsed_time_chunk=%f\n" % elapsed_time_chunk)
text_file.close()
# Getting the data for the next chunk (read in parallel)
p.join()
data_name=shared_list[0]
data_end_index=shared_list[1]
fea_dict=shared_list[2]
lab_dict=shared_list[3]
arch_dict=shared_list[4]
data_set=shared_list[5]
# converting numpy tensors into pytorch tensors and put them on GPUs if specified
if not(save_gpumem) and use_cuda:
data_set=torch.from_numpy(data_set).float().cuda()
else:
data_set=torch.from_numpy(data_set).float()
return [data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict]
utils.py (only some of the more important utilities are annotated here)
def run_shell(cmd,log_file): # run cmd in a shell and return its raw (undecoded) stdout
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,shell=True)
(output, err) = p.communicate()
p.wait()
with open(log_file, 'a+') as logfile:
logfile.write(output.decode("utf-8")+'\n')
logfile.write(err.decode("utf-8")+'\n')
#print(output.decode("utf-8"))
return output
def read_args_command_line(args,config): # parse the optional arguments passed on the command line
sections=[]
fields=[]
values=[]
for i in range(2,len(args)):
# check if the option is valid for second level
r2=re.compile('--.*,.*=.*')
# check if the option is valid for 4 level
r4=re.compile('--.*,.*,.*,.*=".*"')
if r2.match(args[i]) is None and r4.match(args[i]) is None:
sys.stderr.write('ERROR: option \"%s\" from command line is not valid! (the format must be \"--section,field=value\")\n' %(args[i]))
sys.exit(0)
sections.append(re.search('--(.*),', args[i]).group(1))
fields.append(re.search(',(.*)', args[i].split('=')[0]).group(1))
values.append(re.search('=(.*)', args[i]).group(1))
# parsing command line arguments
for i in range(len(sections)):
# Remove multi level if level >= 2
sections[i] = sections[i].split(',')[0]
if sections[i] in config.sections():
# Case of args level > than 2 like --sec,fields,0,field="value"
if len(fields[i].split(',')) >= 2:
splitted = fields[i].split(',')
#Get the actual fields
field = splitted[0]
number = int(splitted[1])
f_name = splitted[2]
if field in list(config[sections[i]]):
# Get the current string of the corresponding field
current_config_field = config[sections[i]][field]
# Count the number of occurrences of the required field
matching = re.findall(f_name+'.', current_config_field)
if number >= len(matching):
sys.stderr.write('ERROR: the field number \"%s\" provided from command line is not valid, we found \"%s\" \"%s\" field(s) in section \"%s\"!\n' %(number, len(matching), f_name, field ))
sys.exit(0)
else:
# Now replace
str_to_be_replaced = re.findall(f_name+'.*', current_config_field)[number]
new_str = str(f_name+'='+values[i])
replaced = nth_replace_string(current_config_field, str_to_be_replaced, new_str, number+1)
config[sections[i]][field] = replaced
else:
sys.stderr.write('ERROR: field \"%s\" of section \"%s\" from command line is not valid!")\n' %(field,sections[i]))
sys.exit(0)
else:
if fields[i] in list(config[sections[i]]):
config[sections[i]][fields[i]]=values[i]
else:
sys.stderr.write('ERROR: field \"%s\" of section \"%s\" from command line is not valid!")\n' %(fields[i],sections[i]))
sys.exit(0)
else:
sys.stderr.write('ERROR: section \"%s\" from command line is not valid!")\n' %(sections[i]))
sys.exit(0)
return [sections,fields,values]
def compute_avg_performance(info_lst):
losses=[]
errors=[]
times=[]
for tr_info_file in info_lst:
config_res = configparser.ConfigParser()
config_res.read(tr_info_file)
losses.append(float(config_res['results']['loss']))
errors.append(float(config_res['results']['err']))
times.append(float(config_res['results']['elapsed_time_chunk']))
loss=np.mean(losses)
error=np.mean(errors)
time=np.sum(times)
return [loss,error,time]
def check_cfg(cfg_file,config,cfg_file_proto): # check the cfg parameters and convert some special ones
# Check consistency between cfg_file and cfg_file_proto
[config_proto,name_data,name_arch]=check_consistency_with_proto(cfg_file,cfg_file_proto)
# Reload data_name because it might have been altered by command-line arguments; name_data lists the data_name of every [dataset*] section
name_data=[]
for sec in config.sections():
if 'dataset' in sec:
name_data.append(config[sec]['data_name'])
# check consistency between [data_use] vs [data*]
sec_parse=True
data_use_with=[]
for data in list(dict(config.items('data_use')).values()):
data_use_with.append(data.split(','))
data_use_with=sum(data_use_with, [])
if not(set(data_use_with).issubset(name_data)):
sys.stderr.write("ERROR: in [data_use] you are using a dataset not specified in [dataset*] %s \n" % (cfg_file))
sec_parse=False
# Set the first layer-norm layer to False if the architecture is sequential (to avoid numerical instabilities)
seq_model=False
for sec in config.sections():
if "architecture" in sec:
if strtobool(config[sec]['arch_seq_model']):
seq_model=True
break
if seq_model:
for item in list(config['architecture1'].items()):
if 'use_laynorm' in item[0] and '_inp' not in item[0]:
ln_list=item[1].split(',')
if ln_list[0]=='True':
ln_list[0]='False'
config['architecture1'][item[0]]=','.join(ln_list)
# Parse fea and lab fields in datasets*
cnt=0
fea_names_lst=[]
lab_names_lst=[]
for data in name_data:
# Check for production case 'none' lab name
[lab_names,_,_]=parse_lab_field(config[cfg_item2sec(config,'data_name',data)]['lab'])
config['exp']['production']=str('False')
if lab_names== ["none"] and data == config['data_use']['forward_with']: #必须要在验证的时候才可能会改为True
config['exp']['production']=str('True')
continue
elif lab_names == ["none"] and data != config['data_use']['forward_with']:
continue
[fea_names,fea_lsts,fea_opts,cws_left,cws_right]=parse_fea_field(config[cfg_item2sec(config,'data_name',data)]['fea'])
[lab_names,lab_folders,lab_opts]=parse_lab_field(config[cfg_item2sec(config,'data_name',data)]['lab']) # read lab_names, lab_folders and lab_opts from the [dataset*] section
fea_names_lst.append(sorted(fea_names)) # (inside the loop) collect the feature names of this dataset
lab_names_lst.append(sorted(lab_names)) # (inside the loop) collect the label names of this dataset
# Check that fea_name doesn't contain special characters
for name_features in fea_names_lst[cnt]:
if not(re.match("^[a-zA-Z0-9]*$", name_features)):
sys.stderr.write("ERROR: features names (fea_name=) must contain only letters or numbers (no special characters as \"_,$,..\") \n" )
sec_parse=False
sys.exit(0)
if cnt>0:
if fea_names_lst[cnt-1]!=fea_names_lst[cnt]: # all datasets must use the same feature names
sys.stderr.write("ERROR: features name (fea_name) must be the same of all the datasets! \n" )
sec_parse=False
sys.exit(0)
if lab_names_lst[cnt-1]!=lab_names_lst[cnt]: # all datasets must use the same label names
sys.stderr.write("ERROR: labels name (lab_name) must be the same of all the datasets! \n" )
sec_parse=False
sys.exit(0)
cnt=cnt+1
# Create the output folder
out_folder=config['exp']['out_folder']
if not os.path.exists(out_folder) or not(os.path.exists(out_folder+'/exp_files')) :
os.makedirs(out_folder+'/exp_files')
# Parsing forward field
model=config['model']['model']
possible_outs=list(re.findall('(.*)=',model.replace(' ','')))
forward_out_lst=config['forward']['forward_out'].split(',')
forward_norm_lst=config['forward']['normalize_with_counts_from'].split(',')
forward_norm_bool_lst=config['forward']['normalize_posteriors'].split(',')
lab_lst=list(re.findall('lab_name=(.*)\n',config['dataset1']['lab'].replace(' ',''))) # lab_lst holds the lab_name values defined in [dataset1]
lab_folders=list(re.findall('lab_folder=(.*)\n',config['dataset1']['lab'].replace(' ','')))
N_out_lab=['none'] * len(lab_lst)
for i in range(len(lab_opts)):
# Compute the number of monophones if needed ('ali' stands for alignment)
if "ali-to-phones" in lab_opts[i]:
log_file=config['exp']['out_folder']+'/log.log'
folder_lab_count=lab_folders[i]
cmd="hmm-info "+folder_lab_count+"/final.mdl | awk '/phones/{print $4}'"
output=run_shell(cmd,log_file)
if output.decode().rstrip()=='':
sys.stderr.write("ERROR: hmm-info command doesn't exist. Make sure your .bashrc contains the Kaldi paths and correctly exports it.\n")
sys.exit(0)
N_out=int(output.decode().rstrip())
N_out_lab[i]=N_out
for i in range(len(forward_out_lst)):
if forward_out_lst[i] not in possible_outs:
sys.stderr.write('ERROR: the output \"%s\" in the section \"forward_out\" is not defined in section model)\n' %(forward_out_lst[i]))
sys.exit(0)
if strtobool(forward_norm_bool_lst[i]):
if forward_norm_lst[i] not in lab_lst:
if not os.path.exists(forward_norm_lst[i]):
sys.stderr.write('ERROR: the count_file \"%s\" in the section \"forward_out\" is does not exist)\n' %(forward_norm_lst[i]))
sys.exit(0)
else:
# Check if the specified file is in the right format
f = open(forward_norm_lst[i],"r")
cnts = f.read()
if not(bool(re.match("(.*)\[(.*)\]", cnts))):
sys.stderr.write('ERROR: the count_file \"%s\" in the section \"forward_out\" is not in the right format)\n' %(forward_norm_lst[i]))
else:
# Try to automatically retrieve the count file from the config file 尝试从配置文件自动检索计数文件
# Compute the number of context-dependent phone states 计算上下文相关的phone状态数
if "ali-to-pdf" in lab_opts[lab_lst.index(forward_norm_lst[i])]:
log_file=config['exp']['out_folder']+'/log.log'
folder_lab_count=lab_folders[lab_lst.index(forward_norm_lst[i])]
cmd="hmm-info "+folder_lab_count+"/final.mdl | awk '/pdfs/{print $4}'" #number of pdfs
output=run_shell(cmd,log_file)
if output.decode().rstrip()=='':
sys.stderr.write("ERROR: hmm-info command doesn't exist. Make sure your .bashrc contains the Kaldi paths and correctly exports it.\n")
sys.exit(0)
N_out=int(output.decode().rstrip()) # rstrip removes trailing whitespace/newline characters
N_out_lab[lab_lst.index(forward_norm_lst[i])]=N_out # number of context-dependent states (pdfs)
count_file_path=out_folder+'/exp_files/forward_'+forward_out_lst[i]+'_'+forward_norm_lst[i]+'.count'
cmd="analyze-counts --print-args=False --verbose=0 --binary=false --counts-dim="+str(N_out)+" \"ark:ali-to-pdf "+folder_lab_count+"/final.mdl \\\"ark:gunzip -c "+folder_lab_count+"/ali.*.gz |\\\" ark:- |\" "+ count_file_path
run_shell(cmd,log_file)
forward_norm_lst[i]=count_file_path
else:
sys.stderr.write('ERROR: Not able to automatically retrieve count file for the label \"%s\". Please add a valid count file path in \"normalize_with_counts_from\" or set normalize_posteriors=False \n' %(forward_norm_lst[i]))
sys.exit(0)
# Update the config file with the count_file paths
config['forward']['normalize_with_counts_from']=",".join(forward_norm_lst)
# When possible, replace the pattern "N_out_<lab_name>" with the detected number of outputs
for sec in config.sections():
for field in list(config[sec]):
for i in range(len(lab_lst)):
pattern='N_out_'+lab_lst[i]
if pattern in config[sec][field]:
if N_out_lab[i]!='none':
config[sec][field]=config[sec][field].replace(pattern,str(N_out_lab[i])) # replace N_out_<lab_name> with the detected number of outputs
else:
sys.stderr.write('ERROR: Cannot automatically retrieve the number of output in %s. Please, add manually the number of outputs \n' %(pattern))
sys.exit(0)
# Check the model field
parse_model_field(cfg_file)
# Create block diagram picture of the model
create_block_diagram(cfg_file)
if sec_parse==False:
sys.exit(0)
return [config,name_data,name_arch]
#
def cfg_item2sec(config,field,value): # return the first cfg section containing field=value, e.g. cfg_item2sec(config,'data_name',data)
for sec in config.sections(): # iterate over all sections
if field in list(dict(config.items(sec)).keys()): # if this section has the given field
if value in list(dict(config.items(sec)).values()): # and the requested value appears among its values, e.g. data_name=data
return sec # return that section
sys.stderr.write("ERROR: %s=%s not found in config file \n" % (field,value))
sys.exit(0)
return -1
def compute_n_chunks(out_folder,data_list,ep,N_ep_str_format,step): # count how many chunks exist in exp_files for this step (train/valid/forward) and epoch
list_ck=sorted(glob.glob(out_folder+'/exp_files/'+step+'_'+data_list+'_ep'+format(ep, N_ep_str_format)+'*.lst'))
last_ck=list_ck[-1] # take the last chunk in the sorted list
N_ck=int(re.findall('_ck(.+)_', last_ck)[-1].split('_')[0])+1 # +1 turns the 0-based chunk index into a count
return N_ck
def dict_fea_lab_arch(config): # build the feature/label/architecture dictionaries from the chunk cfg
model=config['model']['model'].split('\n') # model description lines
fea_lst=list(re.findall('fea_name=(.*)\n',config['data_chunk']['fea'].replace(' ','')))# fea_name = mfcc
lab_lst=list(re.findall('lab_name=(.*)\n',config['data_chunk']['lab'].replace(' ','')))# lab_name = lab_cd
fea_lst_used=[]
lab_lst_used=[]
arch_lst_used=[]
fea_dict_used={}
lab_dict_used={}
arch_dict_used={}
fea_lst_used_name=[]
lab_lst_used_name=[]
arch_lst_used_name=[]
fea_field=config['data_chunk']['fea'] # read the fea field
lab_field=config['data_chunk']['lab'] # read the lab field
pattern='(.*)=(.*)\((.*),(.*)\)'
for line in model:
[out_name,operation,inp1,inp2]=list(re.findall(pattern,line)[0])
if inp1 in fea_lst and inp1 not in fea_lst_used_name : # e.g. inp1=GRU_layers: not a feature, so this branch is skipped
pattern_fea="fea_name="+inp1+"\nfea_lst=(.*)\nfea_opts=(.*)\ncw_left=(.*)\ncw_right=(.*)"
if sys.version_info[0]==2:#python2
fea_lst_used.append((inp1+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).encode('utf8').split(','))
fea_dict_used[inp1]=(inp1+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).encode('utf8').split(',')
else:#python3
fea_lst_used.append((inp1+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).split(','))
fea_dict_used[inp1]=(inp1+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).split(',')
fea_lst_used_name.append(inp1) #it has mfcc
if inp2 in fea_lst and inp2 not in fea_lst_used_name: # e.g. inp2=mfcc: a feature, so this branch is entered
pattern_fea="fea_name="+inp2+"\nfea_lst=(.*)\nfea_opts=(.*)\ncw_left=(.*)\ncw_right=(.*)"
if sys.version_info[0]==2:
fea_lst_used.append((inp2+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).encode('utf8').split(',')) # append the feature description to the list
fea_dict_used[inp2]=(inp2+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).encode('utf8').split(',')
else:
fea_lst_used.append((inp2+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).split(','))
fea_dict_used[inp2]=(inp2+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).split(',')
fea_lst_used_name.append(inp2)
if inp1 in lab_lst and inp1 not in lab_lst_used_name: # e.g. inp1=GRU_layers: not a label, so this branch is skipped
pattern_lab="lab_name="+inp1+"\nlab_folder=(.*)\nlab_opts=(.*)"
if sys.version_info[0]==2:
lab_lst_used.append((inp1+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).encode('utf8').split(','))
lab_dict_used[inp1]=(inp1+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).encode('utf8').split(',')
else:
lab_lst_used.append((inp1+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).split(','))
lab_dict_used[inp1]=(inp1+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).split(',')
lab_lst_used_name.append(inp1)
if inp2 in lab_lst and inp2 not in lab_lst_used_name: # e.g. inp2=lab_cd: a label, so this branch is entered
pattern_lab="lab_name="+inp2+"\nlab_folder=(.*)\nlab_opts=(.*)"
if sys.version_info[0]==2:
lab_lst_used.append((inp2+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).encode('utf8').split(',')) # append the label description to the list
lab_dict_used[inp2]=(inp2+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).encode('utf8').split(',')
else:
lab_lst_used.append((inp2+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).split(','))
lab_dict_used[inp2]=(inp2+","+",".join(list(re.findall(pattern_lab,lab_field)[0]))).split(',')
lab_lst_used_name.append(inp2) # it has lab_cd
if operation=='compute' and inp1 not in arch_lst_used_name:
arch_id=cfg_item2sec(config,'arch_name',inp1)
arch_seq_model=strtobool(config[arch_id]['arch_seq_model'])
arch_lst_used.append([arch_id,inp1,arch_seq_model])
arch_dict_used[inp1]=[arch_id,inp1,arch_seq_model]
arch_lst_used_name.append(inp1) # e.g. GRU_layers, MLP_layers, ...
# convert to unicode (for python 2)
for i in range(len(fea_lst_used)):
fea_lst_used[i]=list(map(str, fea_lst_used[i]))
for i in range(len(lab_lst_used)):
lab_lst_used[i]=list(map(str, lab_lst_used[i]))
for i in range(len(arch_lst_used)):
arch_lst_used[i]=list(map(str, arch_lst_used[i]))
return [fea_dict_used,lab_dict_used,arch_dict_used] # returns dictionaries: fea_dict_used describes the input features (e.g. mfcc), lab_dict_used the labels (e.g. lab_cd), arch_dict_used the architectures ([section name, arch name, is_sequential])
def is_sequential(config,arch_lst): # To cancel
seq_model=False
for [arch_id,arch_name,arch_seq] in arch_lst:
if strtobool(config[arch_id]['arch_seq_model']):
seq_model=True
break
return seq_model
def is_sequential_dict(config,arch_dict):
seq_model=False
for arch in arch_dict.keys():
arch_id=arch_dict[arch][0]
if strtobool(config[arch_id]['arch_seq_model']):
seq_model=True
break
return seq_model
def compute_cw_max(fea_dict): # compute the maximum left/right context windows over all features
cw_left_arr=[]
cw_right_arr=[]
for fea in fea_dict.keys():
cw_left_arr.append(int(fea_dict[fea][3]))
cw_right_arr.append(int(fea_dict[fea][4]))
cw_left_max=max(cw_left_arr)
cw_right_max=max(cw_right_arr)
return [cw_left_max,cw_right_max]
def model_init(inp_out_dict,model,config,arch_dict,use_cuda,multi_gpu,to_do): # build the networks from each line of the [model] section of the cfg
pattern='(.*)=(.*)\((.*),(.*)\)'
nns={}
costs={}
for line in model: # parse each model line
[out_name,operation,inp1,inp2]=list(re.findall(pattern,line)[0]) # out_name=output, operation=operation name, inp1=architecture (or first input), inp2=input
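# e.g. in the TIMIT MLP baseline cfg the [model] lines look like: out_dnn1=compute(MLP_layers1,mfcc), loss_final=cost_nll(out_dnn1,lab_cd), err_final=cost_err(out_dnn1,lab_cd)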
if operation=='compute':
# computing input dim
inp_dim=inp_out_dict[inp2][-1] # output dimensionality of the previous layer (or of the input feature)
# import the class
module = importlib.import_module(config[arch_dict[inp1][0]]['arch_library'])
nn_class=getattr(module, config[arch_dict[inp1][0]]['arch_class']) # import the class defined in neural_networks.py
# add use cuda and todo options
config.set(arch_dict[inp1][0],'use_cuda',config['exp']['use_cuda'])
config.set(arch_dict[inp1][0],'to_do',config['exp']['to_do'])
arch_freeze_flag=strtobool(config[arch_dict[inp1][0]]['arch_freeze'])
# initialize the neural network
net=nn_class(config[arch_dict[inp1][0]],inp_dim) # instantiate the network of this layer
if use_cuda:
net.cuda()
if multi_gpu:
net = nn.DataParallel(net)
if to_do=='train':
if not(arch_freeze_flag):
net.train()
else:
# Switch to eval modality if architecture is frozen (mainly for batch_norm/dropout functions)
net.eval()
else:
net.eval()
# adding the nn into the nns dict
nns[arch_dict[inp1][1]]=net
if multi_gpu:
out_dim=net.module.out_dim
else:
out_dim=net.out_dim
# updating output dim
inp_out_dict[out_name]=[out_dim]
if operation=='concatenate':
inp_dim1=inp_out_dict[inp1][-1]
inp_dim2=inp_out_dict[inp2][-1]
inp_out_dict[out_name]=[inp_dim1+inp_dim2]
if operation=='cost_nll':
costs[out_name] = nn.NLLLoss() # negative log-likelihood loss
inp_out_dict[out_name]=[1]
if operation=='cost_err':
inp_out_dict[out_name]=[1]
if operation=='mult' or operation=='sum' or operation=='mult_constant' or operation=='sum_constant' or operation=='avg' or operation=='mse':
inp_out_dict[out_name]=inp_out_dict[inp1]
return [nns,costs]
def forward_model(fea_dict,lab_dict,arch_dict,model,nns,costs,inp,inp_out_dict,max_len,batch_size,to_do,forward_outs):
# Forward Step
outs_dict={} # dictionary holding every output produced so far
pattern='(.*)=(.*)\((.*),(.*)\)'
# adding input features to out_dict:
for fea in fea_dict.keys(): # multiple input features are supported
if len(inp.shape)==3 and len(fea_dict[fea])>1: # 3-D inp: sequential networks (arch_seq_model=True)
outs_dict[fea]=inp[:,:,fea_dict[fea][5]:fea_dict[fea][6]]
if len(inp.shape)==2 and len(fea_dict[fea])>1: # 2-D inp: non-sequential networks (arch_seq_model=False)
outs_dict[fea]=inp[:,fea_dict[fea][5]:fea_dict[fea][6]]
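# indices 5 and 6 of each fea_dict entry hold the first/last column of that feature inside the concatenated input tensor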
for line in model: # model holds the lines of the [model] section of the cfg
[out_name,operation,inp1,inp2]=list(re.findall(pattern,line)[0]) # parse one model line
if operation=='compute': # 'compute' runs a network
if len(inp_out_dict[inp2])>1: # if it is an input feature (e.g. mfcc)
# Selection of the right feature columns in the inp tensor
if len(inp.shape)==3:
inp_dnn=inp[:,:,inp_out_dict[inp2][-3]:inp_out_dict[inp2][-2]]
if not(bool(arch_dict[inp1][2])):
inp_dnn=inp_dnn.view(max_len*batch_size,-1)
if len(inp.shape)==2:
inp_dnn=inp[:,inp_out_dict[inp2][-3]:inp_out_dict[inp2][-2]]
if bool(arch_dict[inp1][2]):
inp_dnn=inp_dnn.view(max_len,batch_size,-1)
outs_dict[out_name]=nns[inp1](inp_dnn) # run the network
else: # the input of this compute is the output of a previous layer, not a raw feature
if not(bool(arch_dict[inp1][2])) and len(outs_dict[inp2].shape)==3:
outs_dict[inp2]=outs_dict[inp2].view(max_len*batch_size,-1)
if bool(arch_dict[inp1][2]) and len(outs_dict[inp2].shape)==2:
outs_dict[inp2]=outs_dict[inp2].view(max_len,batch_size,-1)
outs_dict[out_name]=nns[inp1](outs_dict[inp2])
if to_do=='forward' and out_name==forward_outs[-1]: # in forward mode, stop once the last requested output (forward_out in the cfg, e.g. out_dnn2) has been computed
break
if operation=='cost_nll': # negative log-likelihood cost
# Put labels in the right format
if len(inp.shape)==3:
lab_dnn=inp[:,:,lab_dict[inp2][3]]
if len(inp.shape)==2:
lab_dnn=inp[:,lab_dict[inp2][3]]
lab_dnn=lab_dnn.view(-1).long()
# put output in the right format
out=outs_dict[inp1]
if len(out.shape)==3:
out=out.view(max_len*batch_size,-1)
if to_do!='forward':
outs_dict[out_name]=costs[out_name](out, lab_dnn)
if operation=='cost_err': # classification error
if len(inp.shape)==3:
lab_dnn=inp[:,:,lab_dict[inp2][3]]
if len(inp.shape)==2:
lab_dnn=inp[:,lab_dict[inp2][3]]
lab_dnn=lab_dnn.view(-1).long()
# put output in the right format
out=outs_dict[inp1]
if len(out.shape)==3:
out=out.view(max_len*batch_size,-1)
if to_do!='forward':
pred=torch.max(out,dim=1)[1]
err = torch.mean((pred!=lab_dnn).float())
outs_dict[out_name]=err
#print(err)
if operation=='concatenate': # concatenation
dim_conc=len(outs_dict[inp1].shape)-1
outs_dict[out_name]=torch.cat((outs_dict[inp1],outs_dict[inp2]),dim_conc) # torch.cat joins the two outputs along the last dimension
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='mult': # element-wise product
outs_dict[out_name]=outs_dict[inp1]*outs_dict[inp2]
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='sum': # element-wise sum
outs_dict[out_name]=outs_dict[inp1]+outs_dict[inp2]
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='mult_constant': # multiply by a constant
outs_dict[out_name]=outs_dict[inp1]*float(inp2)
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='sum_constant': # add a constant
outs_dict[out_name]=outs_dict[inp1]+float(inp2)
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='avg': # average of the two inputs
outs_dict[out_name]=(outs_dict[inp1]+outs_dict[inp2])/2
if to_do=='forward' and out_name==forward_outs[-1]:
break
if operation=='mse': # mean squared error
outs_dict[out_name]=torch.mean((outs_dict[inp1] - outs_dict[inp2]) ** 2)
if to_do=='forward' and out_name==forward_outs[-1]:
break
return outs_dict
neural_networks.py (only the LSTM class is annotated here)
class LSTM(nn.Module):
def __init__(self, options,inp_dim):
super(LSTM, self).__init__()
# Reading parameters
self.input_dim=inp_dim # input dimensionality
self.lstm_lay=list(map(int, options['lstm_lay'].split(','))) # number of units in each layer
self.lstm_drop=list(map(float, options['lstm_drop'].split(','))) # dropout probability of each layer
self.lstm_use_batchnorm=list(map(strtobool, options['lstm_use_batchnorm'].split(','))) # use batch norm (one bool per layer)
self.lstm_use_laynorm=list(map(strtobool, options['lstm_use_laynorm'].split(','))) # use layer norm (one bool per layer)
self.lstm_use_laynorm_inp=strtobool(options['lstm_use_laynorm_inp']) # layer norm on the input (bool)
self.lstm_use_batchnorm_inp=strtobool(options['lstm_use_batchnorm_inp']) # batch norm on the input (bool)
self.lstm_act=options['lstm_act'].split(',') # activation function of each layer
self.lstm_orthinit=strtobool(options['lstm_orthinit']) # orthogonal initialization of the recurrent weights
self.bidir=strtobool(options['lstm_bidir']) # bidirectional LSTM
self.use_cuda=strtobool(options['use_cuda']) # use CUDA
self.to_do=options['to_do']
if self.to_do=='train':
self.test_flag=False
else:
self.test_flag=True
# List initialization
self.wfx = nn.ModuleList([]) # Forget gate (input weights)
self.ufh = nn.ModuleList([]) # Forget gate (recurrent weights)
self.wix = nn.ModuleList([]) # Input
self.uih = nn.ModuleList([]) # Input
self.wox = nn.ModuleList([]) # Output
self.uoh = nn.ModuleList([]) # Output
self.wcx = nn.ModuleList([]) # Cell state
self.uch = nn.ModuleList([]) # Cell state
self.ln = nn.ModuleList([]) # Layer Norm
self.bn_wfx = nn.ModuleList([]) # Batch Norm
self.bn_wix = nn.ModuleList([]) # Batch Norm
self.bn_wox = nn.ModuleList([]) # Batch Norm
self.bn_wcx = nn.ModuleList([]) # Batch Norm
self.act = nn.ModuleList([]) # Activations
# Input layer normalization
if self.lstm_use_laynorm_inp:
self.ln0=LayerNorm(self.input_dim) # layer normalization on the input
# Input batch normalization
if self.lstm_use_batchnorm_inp:
self.bn0=nn.BatchNorm1d(self.input_dim,momentum=0.05)
self.N_lstm_lay=len(self.lstm_lay) # number of layers
current_input=self.input_dim # current input dimensionality
# Initialization of hidden layers
for i in range(self.N_lstm_lay):
# Activations
self.act.append(act_fun(self.lstm_act[i])) # activation of this layer
add_bias=True # whether to add a bias
if self.lstm_use_laynorm[i] or self.lstm_use_batchnorm[i]: # with layer/batch norm the bias is redundant, because normalization recenters the activations anyway
add_bias=False
# Feed-forward connections
self.wfx.append(nn.Linear(current_input, self.lstm_lay[i],bias=add_bias))
self.wix.append(nn.Linear(current_input, self.lstm_lay[i],bias=add_bias))
self.wox.append(nn.Linear(current_input, self.lstm_lay[i],bias=add_bias))
self.wcx.append(nn.Linear(current_input, self.lstm_lay[i],bias=add_bias))
# Recurrent connections
self.ufh.append(nn.Linear(self.lstm_lay[i], self.lstm_lay[i],bias=False))
self.uih.append(nn.Linear(self.lstm_lay[i], self.lstm_lay[i],bias=False))
self.uoh.append(nn.Linear(self.lstm_lay[i], self.lstm_lay[i],bias=False))
self.uch.append(nn.Linear(self.lstm_lay[i], self.lstm_lay[i],bias=False))
if self.lstm_orthinit: # orthogonal initialization
nn.init.orthogonal_(self.ufh[i].weight) # orthogonally initialize the recurrent weights
nn.init.orthogonal_(self.uih[i].weight)
nn.init.orthogonal_(self.uoh[i].weight)
nn.init.orthogonal_(self.uch[i].weight)
# batch norm initialization
self.bn_wfx.append(nn.BatchNorm1d(self.lstm_lay[i],momentum=0.05)) #batch normalization
self.bn_wix.append(nn.BatchNorm1d(self.lstm_lay[i],momentum=0.05))
self.bn_wox.append(nn.BatchNorm1d(self.lstm_lay[i],momentum=0.05))
self.bn_wcx.append(nn.BatchNorm1d(self.lstm_lay[i],momentum=0.05))
self.ln.append(LayerNorm(self.lstm_lay[i]))
if self.bidir: # a bidirectional LSTM doubles the input size of the next layer
current_input=2*self.lstm_lay[i]
else:
current_input=self.lstm_lay[i]
self.out_dim=self.lstm_lay[i]+self.bidir*self.lstm_lay[i] # output dimensionality (self.bidir is a bool, so it is doubled when bidirectional)
def forward(self, x): # forward pass
# Applying Layer/Batch Norm
if bool(self.lstm_use_laynorm_inp):
x=self.ln0((x))
if bool(self.lstm_use_batchnorm_inp):
x_bn=self.bn0(x.view(x.shape[0]*x.shape[1],x.shape[2])) # flatten x to 2-D and apply batch normalization
x=x_bn.view(x.shape[0],x.shape[1],x.shape[2]) # reshape back to the original shape
for i in range(self.N_lstm_lay): # for every layer
# Initial state and concatenation
if self.bidir:
h_init = torch.zeros(2*x.shape[1], self.lstm_lay[i])
x=torch.cat([x,flip(x,0)],1) # concatenate x with its time-reversed copy along the batch dimension (dim=1)
else:
h_init = torch.zeros(x.shape[1],self.lstm_lay[i])
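# bidirectional trick: the time-reversed copy is stacked along the batch dimension so both directions are processed in a single pass; the h_f/h_b code further below splits them again and concatenates along the feature axis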
# Drop mask initialization (same mask for all time steps)
if self.test_flag==False:
drop_mask=torch.bernoulli(torch.Tensor(h_init.shape[0],h_init.shape[1]).fill_(1-self.lstm_drop[i])) # start from a matrix filled with the keep probability (e.g. 0.8) and sample a 0/1 Bernoulli mask from it
else:
drop_mask=torch.FloatTensor([1-self.lstm_drop[i]])
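# at test time nothing is dropped; the cell update is simply scaled by the keep probability (standard, non-inverted dropout)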
if self.use_cuda:
h_init=h_init.cuda()
drop_mask=drop_mask.cuda()
# Feed-forward affine transformations y = Wx + b (all time steps in parallel)
wfx_out=self.wfx[i](x) # compute the W*x terms
wix_out=self.wix[i](x)
wox_out=self.wox[i](x)
wcx_out=self.wcx[i](x)
# Apply batch norm if needed (all steps in parallel)
if self.lstm_use_batchnorm[i]:
wfx_out_bn=self.bn_wfx[i](wfx_out.view(wfx_out.shape[0]*wfx_out.shape[1],wfx_out.shape[2]))
wfx_out=wfx_out_bn.view(wfx_out.shape[0],wfx_out.shape[1],wfx_out.shape[2])
wix_out_bn=self.bn_wix[i](wix_out.view(wix_out.shape[0]*wix_out.shape[1],wix_out.shape[2]))
wix_out=wix_out_bn.view(wix_out.shape[0],wix_out.shape[1],wix_out.shape[2])
wox_out_bn=self.bn_wox[i](wox_out.view(wox_out.shape[0]*wox_out.shape[1],wox_out.shape[2]))
wox_out=wox_out_bn.view(wox_out.shape[0],wox_out.shape[1],wox_out.shape[2])
wcx_out_bn=self.bn_wcx[i](wcx_out.view(wcx_out.shape[0]*wcx_out.shape[1],wcx_out.shape[2]))
wcx_out=wcx_out_bn.view(wcx_out.shape[0],wcx_out.shape[1],wcx_out.shape[2])
# Processing time steps
hiddens = []
ct=h_init
ht=h_init
for k in range(x.shape[0]):
# LSTM equations
ft=torch.sigmoid(wfx_out[k]+self.ufh[i](ht)) # the W*x terms were precomputed above; the recurrent U*h terms are computed step by step
it=torch.sigmoid(wix_out[k]+self.uih[i](ht))
ot=torch.sigmoid(wox_out[k]+self.uoh[i](ht))
ct=it*self.act[i](wcx_out[k]+self.uch[i](ht))*drop_mask+ft*ct
ht=ot*self.act[i](ct)
if self.lstm_use_laynorm[i]:
ht=self.ln[i](ht)
hiddens.append(ht)
# Stack the hidden states of all time steps into a single tensor (along the time axis)
h=torch.stack(hiddens)
# Bidirectional concatenation
if self.bidir:
h_f=h[:,0:int(x.shape[1]/2)]
h_b=flip(h[:,int(x.shape[1]/2):x.shape[1]].contiguous(),0)
h=torch.cat([h_f,h_b],2)
# Setup x for the next hidden layer
x=h
return x
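As a quick sanity check, the class above can be instantiated directly with an options dictionary that mirrors the cfg fields read in __init__. The following is a minimal sketch, not an official example: the option values are illustrative, and it assumes it is run inside neural_networks.py so that LayerNorm and act_fun are in scope.
opts = {'lstm_lay': '550,550', 'lstm_drop': '0.2,0.2',
        'lstm_use_batchnorm': 'True,True', 'lstm_use_laynorm': 'False,False',
        'lstm_use_laynorm_inp': 'False', 'lstm_use_batchnorm_inp': 'False',
        'lstm_act': 'tanh,tanh', 'lstm_orthinit': 'True',
        'lstm_bidir': 'False', 'use_cuda': 'False', 'to_do': 'forward'}
net = LSTM(opts, inp_dim=39)   # e.g. a 39-dimensional MFCC(+deltas) input
x = torch.randn(200, 8, 39)    # [time_steps, batch, features]
y = net(x)                     # -> torch.Size([200, 8, 550])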