心学-知行合一

语音识别系列4--语音识别CTC之模型训练源码解析

一、介绍

上一节我们简单介绍了CTC及数据准备过程，做好了数据准备，本节我们介绍CTC模型训练及源码解析。

CTC（Connectionist Temporal Classification）连接时间分类，直观上理解，循环神经网络（RNN）更适合于CTC训练，关于CTC的原理上的介绍，大家已经写的很多了，本节我们主要从代码着手，帮助大家从零搭建CTC-ASR训练系统。既然是系统，我们就让代码的扩展性更强一些，我们现在支持LSTM网络结构。

二、训练源码及解析

2.1配置文件（config-lstm.yml）：

param: #配置参数
  num_classes: 219 #我们使用声韵母建模，音素总个数为219个
  encoder_type: lstm #网络结构使用LSTM结构
  input_size: 40 #输入我们使用40维的MFCC
  left_context: 10 #输入左边拼帧10帧
  right_context: 10 #输入右边拼帧10帧
  num_units: 512 #隐层单元个数
  num_layers: 4 #隐层数
  lstm_impl: BasicLSTMCell #LSTM结构类型
  use_peephole: True #LSTM结构是否使用PEEPHOLE
  weight_init: 0.1 #初始化参数
  clip_grad_norm: 5.0 #梯度更新参数
  clip_activation: 50 #激活函数截断参数
  num_proj: 256 #映射层维数
  weight_decay: 0 #正则化系数
  train_data_size: 3000 #训练数据量
  label_type: monophone #建模单元类型
  optimizer: adam #使用的优化器
  learning_rate: 0.0001 #初始学习率
  dropout: 0.8 #参数更新比例
  bottleneck_dim: 0 #瓶颈层维数
  train_data_file: ./data/th30h.tfrecords #训练数据及标签
  label_file: ./data/dict.txt #音素对应的字典
  beam_width: 1 #解码beam宽度
  batch_size: 32 #更新一次参数batch大小
  print_step: 50 #保存模型的频率，50次迭代保存一次模型
  num_epoch: 6 #数据迭代轮数

2.2网络结构文件（lstm.py）：

# -*- coding: utf-8 -*-

"""Unidirectional LSTM encoder."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

#LSTM编码器，支持BasicLSTM,LSTM,BlockLSTM
#之所以叫编码器，相当于把语音特征编码到分类标签
class LSTMEncoder(object):
  """Unidirectional LSTM encoder.
  Args:
    num_units (int): 每一层的结点数
    num_proj (int): 映射层的结点数
    num_layers (int): 网络层数
    lstm_impl (string, optional): LSTM结构的不同实现
      - BasicLSTMCell: tf.contrib.rnn.BasicLSTMCell 基本LSTM (no peephole)
      - LSTMCell: tf.contrib.rnn.LSTMCell 标准LSTM
      - LSTMBlockCell: tf.contrib.rnn.LSTMBlockCell BLOCK LSTM
    use_peephole (bool): 是否使用peephole
    parameter_init (float): 初始化网络参数
    clip_activation (float): 通过激活函数后的裁剪范围 (> 0)
    time_major (bool, optional): 计算时是否使用时间为主序
    name (string, optional): 设置网络结构名称
  """
  def __init__(self,
               num_units,
               num_proj,
               num_layers,
               lstm_impl,
               use_peephole,
               parameter_init,
               clip_activation,
               time_major=False,
               name='lstm_encoder'):
    self.num_units = num_units
    if lstm_impl != 'LSTMCell':
      self.num_proj = None
    else:
      self.num_proj = num_proj
    self.num_layers = num_layers
    self.lstm_impl = lstm_impl
    self.use_peephole = use_peephole
    self.parameter_init = parameter_init
    self.clip_activation = clip_activation
    self.time_major = time_major
    self.name = name
  #可调用对象
  def __call__(self, inputs, inputs_seq_len, keep_prob, is_training):
    """Construct model graph.
    Args:
      inputs (placeholder): A tensor of size`[B, T, input_size]`
      inputs_seq_len (placeholder): A tensor of size` [B]`
      keep_prob (placeholder, float): A probability to keep nodes
                in the hidden-hidden connection
      is_training (bool):
    Returns:
      outputs: Encoder states.
        if time_major is True, a tensor of size
                    `[T, B, num_units (num_proj)]`
        otherwise, `[B, T, num_units (num_proj)]`
      final_state: A final hidden state of the encoder
    """
    initializer = tf.random_uniform_initializer(
      minval=-self.parameter_init, maxval=self.parameter_init)
    if self.lstm_impl == 'BasicLSTMCell':
      outputs, final_state = basiclstmcell(
                             self.num_units, self.num_layers,
                             inputs, inputs_seq_len, keep_prob, initializer,
                             self.time_major)
    elif self.lstm_impl == 'LSTMCell':
      outputs, final_state = lstmcell(
                             self.num_units, self.num_proj, self.num_layers,
                             self.use_peephole, self.clip_activation,
                             inputs, inputs_seq_len, keep_prob, initializer,
                             self.time_major)
    elif self.lstm_impl == 'LSTMBlockCell':
      outputs, final_state = lstmblockcell(
                             self.num_units, self.num_layers,
                             self.use_peephole,
                             inputs, inputs_seq_len, keep_prob, initializer,
                             self.time_major)
    else:
      raise IndexError( 'lstm_impl is "BasicLSTMCell" or "LSTMCell" or ' +
                '"LSTMBlockCell" or "LSTMBlockFusedCell" or ' +
                '"CudnnLSTM".')
    return outputs, final_state

#basic lstm网络结构
def basiclstmcell(num_units, num_layers, inputs, inputs_seq_len,
                  keep_prob, initializer, time_major):
  if time_major:
    # Convert from batch-major to time-major
    inputs = tf.transpose(inputs, [1, 0, 2])
  lstm_list = []
  with tf.variable_scope('multi_lstm', initializer=initializer) as scope:
    for i_layer in range(1, num_layers + 1, 1):
      lstm = tf.contrib.rnn.BasicLSTMCell(
                           num_units,
                           forget_bias=1.0,
                           state_is_tuple=True,
                           activation=tf.tanh)
      # Dropout for the hidden-hidden connections
      lstm = tf.contrib.rnn.DropoutWrapper(
                lstm, output_keep_prob=keep_prob)
      lstm_list.append(lstm)
    # Stack multiple cells
    stacked_lstm = tf.contrib.rnn.MultiRNNCell(
                     lstm_list, state_is_tuple=True)
    # Ignore 2nd return (the last state)
    outputs, final_state = tf.nn.dynamic_rnn(
                     cell=stacked_lstm,
                     inputs=inputs,
                     sequence_length=inputs_seq_len,
                     dtype=tf.float32,
                     time_major=time_major,
                     scope=scope)
  return outputs, final_state

#标准lstm网络结构
def lstmcell(num_units, num_proj, num_layers, use_peephole, clip_activation,
             inputs, inputs_seq_len, keep_prob, initializer, time_major):
  if time_major:
    # Convert form batch-major to time-major
    inputs = tf.transpose(inputs, [1, 0, 2])
  lstm_list = []
  with tf.variable_scope('multi_lstm', initializer=initializer) as scope:
    for i_layer in range(1, num_layers + 1, 1): 
      lstm = tf.contrib.rnn.LSTMCell(
                num_units,
                use_peepholes=use_peephole,
                cell_clip=clip_activation,
                num_proj=num_proj,
                forget_bias=1.0,
                state_is_tuple=True)
      # Dropout for the hidden-hidden connections
      lstm = tf.contrib.rnn.DropoutWrapper(
                lstm, output_keep_prob=keep_prob)
      lstm_list.append(lstm)
    # Stack multiple cells
    stacked_lstm = tf.contrib.rnn.MultiRNNCell(
            lstm_list, state_is_tuple=True)
    # Ignore 2nd return (the last state)
    outputs, final_state = tf.nn.dynamic_rnn(
            cell=stacked_lstm,
            inputs=inputs,
            sequence_length=inputs_seq_len,
            dtype=tf.float32,
            time_major=time_major,
            scope=scope)
  return outputs, final_state

#block lstm网络结构
def lstmblockcell(num_units, num_layers, use_peephole, inputs,
                  inputs_seq_len, keep_prob, initializer, time_major):
  if time_major:
    inputs = tf.transpose(inputs, [1, 0, 2])
  lstm_list = []
  with tf.variable_scope('multi_lstm', initializer=initializer) as scope:
    for i_layer in range(1, num_layers + 1, 1):
      lstm = tf.contrib.rnn.LSTMBlockCell(
                num_units, forget_bias=1.0, 
                use_peephole=use_peephole)
      lstm = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
      lstm_list.append(lstm)
    stacked_lstm = tf.contrib.rnn.MultiRNNCell(lstm_list, state_is_tuple=True)
    outputs, final_state = tf.nn.dynamic_rnn(cell=stacked_lstm,
      inputs=inputs,sequence_length=inputs_seq_len,
      dtype=tf.float32,time_major=time_major,scope=scope)
  return outputs, final_state

2.3选择网络结构（choose_encoder.py）：

# -*- coding: utf-8 -*-

"""Select & load encoder."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from lstm import LSTMEncoder

ENCODERS = { 
  "lstm": LSTMEncoder,
}

#选择模型结构，这里仅支持LSTM结构
def load(encoder_type):
  """Select & load encoder.
  Args:
    encoder_type (string): name of the ctc model in the key of ENCODERS
  Returns:
    An instance of the encoder
  """
  if encoder_type not in ENCODERS.keys():
    raise ValueError(
          "encoder_type should be one of [%s], you provided %s." %
          (", ".join(ENCODERS), encoder_type))
  return ENCODERS[encoder_type]

2.4基本的工具函数（basic_util.py）：

# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from os.path import join, isdir
#这里是一些工具函数

#创建目录
def mkdir(path_to_dir):
  if path_to_dir is not None and (not isdir(path_to_dir)):
    os.makedirs(path_to_dir)
  return path_to_dir

#创建子目录
def mkdir_join(path_to_dir, *dir_name):
  if path_to_dir is None:
    return path_to_dir
  for i in range(len(dir_name)):
    if '.' not in dir_name[i]:
      path_to_dir = mkdir(join(path_to_dir, dir_name[i]))
    else:
      path_to_dir = join(path_to_dir, dir_name[i])
  return path_to_dir

#统计总的参数量
def count_total_parameters(variables):
  total_parameters = 0 
  parameters_dict = {}
  for variable in variables:
    shape = variable.get_shape()
    variable_parameters = 1 
    for dim in shape:
      variable_parameters *= dim.value
    total_parameters += variable_parameters
    parameters_dict[variable.name] = variable_parameters
  return parameters_dict, total_parameters

2.5 模型基类（model_base.py）：

# -*- coding: utf-8 -*-

"""Base class for all models."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

#支持的优化器
OPTIMIZER_CLS_NAMES = { 
  "adagrad": tf.train.AdagradOptimizer,
  "adadelta": tf.train.AdadeltaOptimizer,
  "adam": tf.train.AdamOptimizer,
  "rmsprop": tf.train.RMSPropOptimizer,
  "sgd": tf.train.GradientDescentOptimizer,
  "momentum": tf.train.MomentumOptimizer,
  "nestrov": tf.train.MomentumOptimizer
}

#模型基类
class ModelBase(object):
  def __init__(self, *args, **kwargs):
    pass

  def _build(self, *args, **kwargs):
    """Construct model graph."""
    raise NotADirectoryError

  def create_placeholders(self):
    """Create placeholders and append them to list."""
    raise NotImplementedError

  def compute_loss(self, *args, **kwargs):
    """Operation for computing loss."""
    raise NotImplementedError

  def _add_noise_to_inputs(self, inputs, stddev=0.075):
    """Add gaussian noise to the inputs.
    Args:
      inputs: the noise free input-features.
      stddev (float, optional): The standart deviation of the noise.
      Default is 0.075.
    Returns:
      inputs: Input features plus noise.
    """
    raise NotImplementedError

  def _add_noise_to_gradients(grads_and_vars, gradient_noise_scale,
                                stddev=0.075):
    """Adds scaled noise from a 0-mean normal distribution to gradients.
    Args:
      grads_and_vars:
      gradient_noise_scale:
      stddev (float):
    Returns:
    """
    raise NotImplementedError

  #设置优化器
  def _set_optimizer(self, optimizer, learning_rate):
    """Set optimizer.
    Args:
      optimizer (string): the name of the optimizer in
        OPTIMIZER_CLS_NAMES
      learning_rate (float): A learning rate
    Returns:
      optimizer:
    """
    optimizer = optimizer.lower()
    if optimizer not in OPTIMIZER_CLS_NAMES:
      raise ValueError(
        "Optimizer name should be one of [%s], you provided %s." %
        (", ".join(OPTIMIZER_CLS_NAMES), optimizer))

    # Select optimizer
    if optimizer == 'momentum':
      return OPTIMIZER_CLS_NAMES[optimizer](
             learning_rate=learning_rate,
             momentum=0.9)
    elif optimizer == 'nestrov':
      return OPTIMIZER_CLS_NAMES[optimizer](
             learning_rate=learning_rate,
             momentum=0.9,
             use_nesterov=True)
    else:
      return OPTIMIZER_CLS_NAMES[optimizer](
             learning_rate=learning_rate)

  def train(self, loss, optimizer, learning_rate):
    """Operation for training. Only the sigle GPU training is supported.
    Args:
      loss: An operation for computing loss
      optimizer (string): name of the optimizer in OPTIMIZER_CLS_NAMES
      learning_rate (placeholder): A learning rate
    Returns:
      train_op: operation for training
    """
    # Create a variable to track the global step
    global_step = tf.Variable(0, name='global_step', trainable=False)

    # Set optimizer
    self.optimizer = self._set_optimizer(optimizer, learning_rate)

    if self.clip_grad_norm is not None:
      # Compute gradients
      grads_and_vars = self.optimizer.compute_gradients(loss)

      # Clip gradients
      clipped_grads_and_vars = self._clip_gradients(grads_and_vars)

      # Create operation for gradient update
      with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        train_op = self.optimizer.apply_gradients(
                    clipped_grads_and_vars,
                    global_step=global_step)
    else:
      # Use the optimizer to apply the gradients that minimize the loss
      # and also increment the global step counter as a single training
      # step
      with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        train_op = self.optimizer.minimize(
          loss, global_step=global_step)
    return train_op

  def _clip_gradients(self, grads_and_vars):
    """Clip gradients.
    Args:
      grads_and_vars (list): list of tuples of `(grads, vars)`
    Returns:
      clipped_grads_and_vars (list): list of tuple of
                `(clipped grads, vars)`
    """
    clipped_grads_and_vars = []

    # Clip gradient norm
    for grad, var in grads_and_vars:
      if grad is not None:
        clipped_grads_and_vars.append(
          (tf.clip_by_norm(grad, clip_norm=self.clip_grad_norm),
          var))
    return clipped_grads_and_vars

2.6 CTC模型（model_ctc.py）：

# -*- coding: utf-8 -*-
"""CTC model."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from model_base import ModelBase
from choose_encoder import load

class CTC(ModelBase):
  """Connectionist Temporal Classification (CTC) network.
  Args:
    encoder_type (string): The type of an encoder
      lstm: Unidirectional LSTM
    input_size (int): the dimensions of input vectors
    num_units (int): the number of units in each layer
    num_layers (int): the number of layers
    num_classes (int): the number of classes of target labels
      (except for a blank label)
    lstm_impl (string, optional): a base implementation of LSTM. This is
        not used for GRU models.
      - BasicLSTMCell: tf.contrib.rnn.BasicLSTMCell (no peephole)
      - LSTMCell: tf.contrib.rnn.LSTMCell
      - LSTMBlockCell: tf.contrib.rnn.LSTMBlockCell
      Choose the background implementation of tensorflow.
      Default is LSTMBlockCell.
    use_peephole (bool, optional): if True, use peephole connection. This
      is not used for GRU models.
    left_context (int, optional): the number of left context to slice
    right_context (int, optional): the number of right context to slice
    parameter_init (float, optional): the range of uniform distribution to
      initialize weight parameters (>= 0)
    clip_grad_norm (float, optional): the range of clipping of gradient
      norm (> 0)
    clip_activation (float, optional): the range of clipping of cell
      activation (> 0). This is not used for GRU models.
    num_proj (int, optional): the number of nodes in the projection layer.
      This is not used for GRU models.
    weight_decay (float, optional): a parameter for weight decay
    bottleneck_dim (int, optional): the dimensions of the bottleneck layer
    time_major (bool, optional): if True, time-major computation will be
      performed
  """
  def __init__(self,
         encoder_type,
         input_size,
         num_units,
         num_layers,
         num_classes,
         lstm_impl='LSTMBlockCell',
         use_peephole=True,
         left_context=10,
         right_context=10,
         parameter_init=0.1,
         clip_grad_norm=None,
         clip_activation=None,
         num_proj=None,
         weight_decay=0.0,
         bottleneck_dim=None,
         time_major=True):
    super(CTC, self).__init__()
    if clip_grad_norm is not None:
      assert float(clip_grad_norm) > 0, 'clip_grad_norm must be larger than 0.'
    assert float(weight_decay) >= 0, 'weight_decay must not be a negative value.'

    self.encoder_type = encoder_type
    self.input_size = input_size
    self.left_context = left_context
    self.right_context = right_context
    self.num_units = num_units
    if int(num_proj) == 0:
      self.num_proj = None
    elif num_proj is not None:
      self.num_proj = int(num_proj)
    else:
      self.num_proj = None
    self.num_layers = num_layers
    self.bottleneck_dim = bottleneck_dim
    #add blank state
    self.num_classes = num_classes + 1
    self.lstm_impl = lstm_impl
    self.use_peephole = use_peephole

    # Regularization
    self.parameter_init = parameter_init
    self.clip_grad_norm = clip_grad_norm
    self.clip_activation = clip_activation
    self.weight_decay = weight_decay

    # Summaries for TensorBoard
    self.summaries_train = []
    self.summaries_dev = []

    # Placeholders
    self.inputs_pl_list = []
    self.labels_pl_list = []
    self.inputs_seq_len_pl_list = []
    self.keep_prob_pl_list = []

    self.time_major = time_major
    self.name = encoder_type + '_ctc'

    if encoder_type in ['lstm']:
      self.encoder = load(encoder_type)(
        num_units=num_units,
        num_proj=self.num_proj,
        num_layers=num_layers,
        lstm_impl=lstm_impl,
        use_peephole=use_peephole,
        parameter_init=parameter_init,
        clip_activation=clip_activation,
        time_major=time_major)
    else:
      raise NotImplementedError

  def _build(self, inputs, inputs_seq_len, keep_prob, is_training):
    """Construct model graph.
    Args:
      inputs: A tensor of size `[B, T, input_size]`
      inputs_seq_len (placeholder): A tensor of size` [B]`
      keep_prob (placeholder, float): A probability to keep nodes
        in the hidden-hidden connection
      is_training (bool):
    Returns:
      logits: A tensor of size `[T, B, num_classes]`
    """
    # inputs: `[B, T, input_size]`
    batch_size = tf.shape(inputs)[0]
    max_time = tf.shape(inputs)[1]
    encoder_outputs, final_state = self.encoder(
      inputs, inputs_seq_len, keep_prob, is_training)
    self.encoder_outputs = encoder_outputs
    # Reshape to apply the same weights over the timesteps
    output_dim = encoder_outputs.shape.as_list()[-1]
    outputs_2d = tf.reshape(encoder_outputs, shape=[batch_size * max_time, output_dim])
    if self.bottleneck_dim is not None and self.bottleneck_dim != 0:
      with tf.variable_scope('bottleneck') as scope:
        outputs_2d = tf.contrib.layers.fully_connected(
          outputs_2d,
          num_outputs=self.bottleneck_dim,
          activation_fn=tf.nn.relu,
          weights_initializer=tf.truncated_normal_initializer(stddev=self.parameter_init),
          biases_initializer=tf.zeros_initializer(),
          scope=scope)
      # Dropout for the hidden-output connections
      outputs_2d = tf.nn.dropout(
          outputs_2d, keep_prob, name='dropout_bottleneck')
    with tf.variable_scope('output') as scope:
      logits_2d = tf.contrib.layers.fully_connected(outputs_2d,
          num_outputs=self.num_classes,
          activation_fn=None,
          weights_initializer=tf.truncated_normal_initializer(
            stddev=self.parameter_init),
          biases_initializer=tf.zeros_initializer(),
          scope=scope)
      if self.time_major:
        # Reshape back to the original shape
        logits = tf.reshape(logits_2d, shape=[max_time, batch_size, self.num_classes])
      else:
        # Reshape back to the original shape
        logits = tf.reshape(logits_2d, shape=[batch_size, max_time, self.num_classes])
        # Convert to time-major: `[T, B, num_classes]'
        logits = tf.transpose(logits, [1, 0, 2])
    return logits

  def create_placeholders(self):
    """Create placeholders and append them to list."""
    self.inputs_pl_list.append(
      tf.placeholder(tf.float32, shape=[None, None, self.input_size * (self.left_context + self.right_context + 1)],
                     name='input'))
    self.labels_pl_list.append(
      tf.SparseTensor(tf.placeholder(tf.int64, name='indices'),
                      tf.placeholder(tf.int32, name='values'),
                      tf.placeholder(tf.int64, name='shape')))
    self.inputs_seq_len_pl_list.append(
      tf.placeholder(tf.int32, shape=[None], name='inputs_seq_len'))
    self.keep_prob_pl_list.append(
      tf.placeholder(tf.float32, name='keep_prob'))

  def compute_loss(self, inputs, labels, inputs_seq_len,
     keep_prob, scope=None, softmax_temperature=1,
                     is_training=True):
    """Operation for computing CTC loss.
    Args:
      inputs: A tensor of size `[B, T, input_size]`
      labels: A SparseTensor of target labels
      inputs_seq_len: A tensor of size `[B]`
      keep_prob (placeholder, float): A probability to keep nodes
                in the hidden-hidden connection
      scope (optional): A scope in the model tower
      softmax_temperature (int, optional): temperature parameter for
                ths softmax layer
      is_training (bool, optional):
    Returns:
      total_loss: operation for computing total ctc loss (ctc loss + L2).
                 This is a single scalar tensor to minimize.
      logits: A tensor of size `[T, B, num_classes]`
    """
    # Build model graph
    logits = self._build(inputs, inputs_seq_len, keep_prob,
                             is_training=is_training)
    # Weight decay
    if self.weight_decay > 0:
      with tf.name_scope("weight_decay_loss"):
        weight_sum = 0
        for var in tf.trainable_variables():
          if 'bias' not in var.name.lower():
            weight_sum += tf.nn.l2_loss(var)
        tf.add_to_collection('losses', weight_sum * self.weight_decay)
    with tf.name_scope("ctc_loss"):
      ctc_losses = tf.nn.ctc_loss(labels,
        logits / softmax_temperature,
        tf.cast(inputs_seq_len, tf.int32),
        preprocess_collapse_repeated=False,
        ctc_merge_repeated=True,
        ignore_longer_outputs_than_inputs=True,
        time_major=True)
      ctc_loss = tf.reduce_mean(ctc_losses, name='ctc_loss_mean')
      tf.add_to_collection('losses', ctc_loss)
    # Compute total loss
    total_loss = tf.add_n(tf.get_collection('losses', scope),
                              name='total_loss')
    # Add a scalar summary for the snapshot of loss
    if self.weight_decay > 0:
      self.summaries_train.append(
        tf.summary.scalar('weight_loss_train',
                          weight_sum * self.weight_decay))
      self.summaries_dev.append(
        tf.summary.scalar('weight_loss_dev',
                          weight_sum * self.weight_decay))

      self.summaries_train.append(
        tf.summary.scalar('total_loss_train', total_loss))
      self.summaries_dev.append(
        tf.summary.scalar('total_loss_dev', total_loss))

    self.summaries_train.append(
      tf.summary.scalar('ctc_loss_train', ctc_loss))
    self.summaries_dev.append(
      tf.summary.scalar('ctc_loss_dev', ctc_loss))
    return total_loss, logits

  def decoder(self, logits, inputs_seq_len, beam_width=1):
    """Operation for decoding.
    Args:
      logits: A tensor of size `[T, B, num_classes]`
      inputs_seq_len: A tensor of size `[B]`
      beam_width (int, optional): beam width for beam search.
                1 disables beam search, which mean greedy decoding.
    Return:
      decode_op: A SparseTensor
    """
    assert isinstance(beam_width, int), "beam_width must be integer."
    assert beam_width >= 1, "beam_width must be >= 1"

    if beam_width == 1:
      decoded, _ = tf.nn.ctc_greedy_decoder(logits, inputs_seq_len)
    else:
      decoded, _ = tf.nn.ctc_beam_search_decoder(logits, inputs_seq_len,
                beam_width=beam_width)
    decode_op = tf.to_int32(decoded[0])
    return decode_op

  def posteriors(self, logits, blank_prior=1):
    """Operation for computing posteriors of each time steps.
    Args:
      logits: A tensor of size `[T, B, num_classes]`
      blank_prior (float): A prior for blank classes. posteriors are
                divided by this prior.
    Return:
      posteriors_op: operation for computing posteriors for each class
    """
    # Convert to batch-major: `[B, T, num_classes]'
    logits = tf.transpose(logits, (1, 0, 2))
    logits_2d = tf.reshape(logits, [-1, self.num_classes])
    posteriors_op = tf.nn.softmax(logits_2d)
    return posteriors_op

  def compute_ler(self, decode_op, labels):
    """Operation for computing LER (Label Error Rate).
    Args:
      decode_op: operation for decoding
      labels: A SparseTensor of target labels
    Return:
      ler_op: operation for computing LER
    """
    # Compute LER (normalize by label length)
    ler_op = tf.reduce_mean(tf.edit_distance(
            decode_op, labels, normalize=True))
    # Add a scalar summary for the snapshot of LER
    self.summaries_train.append(tf.summary.scalar('ler_train', ler_op))
    self.summaries_dev.append(tf.summary.scalar('ler_dev', ler_op))
    return ler_op

2.7 训练主程序（train_ctc.py）：

# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from os.path import join, isfile, abspath
import sys 
import time
from setproctitle import setproctitle
import shutil
import yaml
import os

from model_ctc import CTC
from basic_util import mkdir_join, mkdir
from basic_util import count_total_parameters
import numpy as np
import math

from tensorflow.python.framework import graph_util

left_context = 10
right_context = 10
skip = 4

def general_frame(feature, seq_len):
  max_len, mfcc_len = feature.shape
  feat_num = left_context + right_context + 1
  frame_list = [np.concatenate([feature[0 if m<0 else seq_len-1 
               if m>seq_len-1 else m] for m in range(n-left_context, 
               n+right_context+1)]) for n in range(0, seq_len, skip)]
  new_seq_len = math.ceil(seq_len / skip)
  new_feature = np.asarray(frame_list).astype(np.float32)
  return new_feature, new_seq_len

def parse_function(example_proto):
  features = {'feature': tf.VarLenFeature(tf.string),
              'label'  : tf.VarLenFeature(tf.string),
              'seq_len': tf.FixedLenFeature([], tf.int64)}
  parsed_features = tf.parse_single_example(example_proto, features)
  feature = parsed_features['feature']
  feature = tf.sparse_tensor_to_dense(parsed_features['feature'], default_value=b'0.0')
  feature = tf.decode_raw(feature[0], tf.float32)
  feature = tf.reshape(feature, [-1, 40])
  label = parsed_features['label']
  label = tf.sparse_tensor_to_dense(parsed_features['label'], default_value=b'0')
  label = tf.decode_raw(label[0], tf.int64)
  seq_len = parsed_features['seq_len']
  feature, seq_len = tf.py_func(general_frame, [feature, seq_len], [tf.float32, tf.int64])
  seq_len = tf.cast(seq_len, tf.int32)
  return feature, label, seq_len

def dense_to_sparse(dense):
  indices = []
  values = []
  for n, seq in enumerate(dense):
    seq = np.append(seq, -1)
    seq = seq[:np.argmin(seq)]
    indices.extend(zip([n] * len(seq), range(len(seq))))
    values.extend(seq)
  indices = np.asarray(indices, dtype=np.int64)
  values = np.asarray(values, dtype=np.int32)
  shape = np.asarray(dense.shape, dtype=np.int64)
  return indices, values, shape

def average_gradients(tower_grads):
  average_grads=[]
  for grad_and_vars in zip(*tower_grads):
    grads=[]
    for g,_ in grad_and_vars:
      expend_g=tf.expand_dims(g,0)
      grads.append(expend_g)
    grad=tf.concat(grads,0)
    grad=tf.reduce_mean(grad,0)
    v=grad_and_vars[0][1]
    grad_and_var=(grad,v)
    average_grads.append(grad_and_var)
  return average_grads

def do_train(model, params, gpu_indices):
  # Tell TensorFlow that the model will be built into the default graph
  with tf.Graph().as_default(), tf.device('/cpu:0'):
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Set optimizer
    #设置优化器
    learning_rate_pl = tf.placeholder(tf.float32, name='learning_rate')
    optimizer = model._set_optimizer(params['optimizer'], learning_rate_pl)
    # Calculate the gradients for each model tower
    #定义梯度和损失
    total_grads_and_vars, total_losses = [], []
    decode_ops, ler_ops = [], []
    #定义所有的设备名称
    all_devices = ['/gpu:%d' % i_gpu for i_gpu in range(len(gpu_indices))]
    with tf.variable_scope(tf.get_variable_scope()):
      for i_gpu in range(len(all_devices)):
        with tf.device(all_devices[i_gpu]):
          with tf.name_scope('tower_gpu%d' % i_gpu) as scope:
            # Define placeholders in each tower
            model.create_placeholders()
            tower_loss, tower_logits = model.compute_loss(
                   model.inputs_pl_list[i_gpu],
                   model.labels_pl_list[i_gpu],
                   model.inputs_seq_len_pl_list[i_gpu],
                   model.keep_prob_pl_list[i_gpu],
                   scope)
            tower_loss = tf.expand_dims(tower_loss, axis=0)
            total_losses.append(tower_loss)
            #仅在一个卡上更新参数
            tf.get_variable_scope().reuse_variables()
            tower_grads_and_vars = optimizer.compute_gradients(
              tower_loss)
            tower_grads_and_vars = model._clip_gradients(tower_grads_and_vars)
            total_grads_and_vars.append(tower_grads_and_vars)
            decode_op_tower = model.decoder(tower_logits, model.inputs_seq_len_pl_list[i_gpu], 
                              beam_width=params['beam_width'])
            decode_ops.append(decode_op_tower)
            ler_op_tower = model.compute_ler(decode_op_tower, model.labels_pl_list[i_gpu])
            ler_op_tower = tf.expand_dims(ler_op_tower, axis=0)
            ler_ops.append(ler_op_tower)
    #loss平均
    total_losses = tf.concat(axis=0, values=total_losses)
    loss_op = tf.reduce_mean(total_losses, axis=0)
    #ler平均
    ler_ops = tf.concat(axis=0, values=ler_ops)
    ler_op = tf.reduce_mean(ler_ops, axis=0)
    #梯度平均
    average_grads_and_vars = average_gradients(total_grads_and_vars)
    train_op = optimizer.apply_gradients(average_grads_and_vars,global_step=global_step)

    summary_train = tf.summary.merge(model.summaries_train)
    summary_dev = tf.summary.merge(model.summaries_dev)
    #一次初始化所有参数
    init_op = tf.global_variables_initializer()
    saver = tf.train.Saver(max_to_keep=None)
    #统计所有的参数
    parameters_dict, total_parameters = count_total_parameters(tf.trainable_variables())

    for parameter_name in sorted(parameters_dict.keys()):
      print("%s %d" % (parameter_name, parameters_dict[parameter_name]))
    print("Total %d variables, %s M parameters" %  (len(parameters_dict.keys()),
           "{:,}".format(total_parameters / 1000000)))
    #获取数据
    train_dataset = tf.data.TFRecordDataset(params['train_data_file'])
    train_dataset = train_dataset.map(parse_function)
    train_dataset = train_dataset.shuffle(1000)
    train_dataset = train_dataset.padded_batch(params['batch_size'], padded_shapes=([None, None],
               [None], []), padding_values=(0.0, tf.cast(-1, tf.int64), tf.cast(0, tf.int32)))
    train_dataset = train_dataset.repeat(1)

    iterator = train_dataset.make_initializable_iterator()
    #获取一个batch
    batch_feat, batch_label, batch_seq_len = iterator.get_next()
    s_indices, s_value, s_shape = tf.py_func(dense_to_sparse, [batch_label],  [tf.int64, tf.int32, tf.int64])
    batch_label = tf.SparseTensor(s_indices, s_value, s_shape)
 
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=False)) as sess:
      summary_writer = tf.summary.FileWriter(model.save_path, sess.graph)
      sess.run(init_op)
      start_time_train = time.time()
      learning_rate = float(params['learning_rate'])
      step_id = 0
      last_id = 0
      #找到上一次训练的模型并接着训练
      ckpt = tf.train.latest_checkpoint(model.save_path)
      print("===========================")
      print("ckpt:", ckpt)
      if ckpt != None:
        saver.restore(sess, ckpt)
        ind = ckpt.rfind("-")
        last_id = int(ckpt[ind + 1:])
        print("++++++++++++++++++++++++++++")
        print("last_id: ", last_id)
      print("===========================")
      for epoch in range(params['num_epoch']):
        start_time_epoch = time.time()
        sess.run(iterator.initializer)
        print("global_step: ", sess.run(global_step))
        try:
          while (True):
            start_time_step = time.time()
            if step_id < last_id:
              step_id += 1
              print("===skip: ", step_id)
              sys.stdout.flush()
              continue
            feed_dict_train = {}
            #获取数据
            for i_gpu in range(len(gpu_indices)):
              new_feat, new_label, new_seq_len = sess.run([batch_feat, batch_label, batch_seq_len])
              feed_dict_train[model.inputs_pl_list[i_gpu]] = new_feat
              feed_dict_train[model.labels_pl_list[i_gpu]] = new_label
              feed_dict_train[model.inputs_seq_len_pl_list[i_gpu]] = new_seq_len
              feed_dict_train[model.keep_prob_pl_list[i_gpu]] = float(params['dropout'])
            feed_dict_train[learning_rate_pl] = learning_rate
            #一次训练
            step_loss, step_ler, _, _ = sess.run([loss_op, 
                            ler_op, global_step, train_op], feed_dict=feed_dict_train)
            step_id += 1
            end_time_step = time.time()
            step_time = end_time_step - start_time_step
            #打印一次训练结果
            print("batch: ", step_id, " loss: ", step_loss, " ler: ", step_ler, " time: ", step_time)
            sys.stdout.flush()
            #每隔一段时间保存一次模型
            if step_id % params['print_step'] == 0:
              summary_str_train = sess.run(summary_train, feed_dict=feed_dict_train)
              summary_writer.add_summary(summary_str_train, step_id)
              summary_writer.flush()
              checkpoint_file = join(model.save_path, 'model.ckpt')
              save_path = saver.save(sess, checkpoint_file, global_step=step_id)
              print("Model saved in file: %s" % save_path)
              sys.stdout.flush()
        #出错之后进行下一轮训练
        except tf.errors.OutOfRangeError:
          end_time_epoch = time.time()
          epoch_time = end_time_epoch - start_time_epoch
          print("epoch: ", epoch, " end, use time: ", epoch_time)
          sys.stdout.flush()
      end_time_train = time.time()
      train_time = end_time_train - start_time_train
      print("train end, total time: ", train_time)
      sys.stdout.flush()
      summary_writer.close()

def main(config_path, model_save_path, gpu_indices):

  #加载配置文件
  with open(config_path, "r") as f:
    config = yaml.load(f)
    params = config['param']

  # Model setting
  model = CTC(encoder_type=params['encoder_type'],
              input_size=params['input_size'],
              left_context=params['left_context'],
              right_context=params['right_context'],
              num_units=params['num_units'],
              num_layers=params['num_layers'],
              num_classes=params['num_classes'],
              lstm_impl=params['lstm_impl'],
              use_peephole=params['use_peephole'],
              parameter_init=params['weight_init'],
              clip_grad_norm=params['clip_grad_norm'],
              clip_activation=params['clip_activation'],
              num_proj=params['num_proj'],
              weight_decay=params['weight_decay'])
  # Set process name
  setproctitle('tf' + model.name + '_' + str(params['train_data_size']) + '_' + params['label_type'])
  #设置模型的名称
  model.name += '_' + str(params['num_units'])
  model.name += '_' + str(params['num_layers'])
  model.name += '_' + params['optimizer']
  model.name += '_lr' + str(params['learning_rate'])
  if params['num_proj'] != 0:
    model.name += '_proj' + str(params['num_proj'])
  if params['dropout'] != 0:
    model.name += '_drop' + str(params['dropout'])
  if params['weight_decay'] != 0:
    model.name += '_wd' + str(params['weight_decay'])
  if params['bottleneck_dim'] != 0:
    model.name += '_bottle' + str(params['bottleneck_dim'])
  if len(gpu_indices) >= 2:
    model.name += '_gpu' + str(len(gpu_indices))

  # Set save path
  #创建模型保存路径
  model.save_path = mkdir_join(
        model_save_path, 'ctc', params['label_type'],
        str(params['train_data_size']), model.name)

  # Reset model directory
  model_index = 0
  new_model_path = model.save_path
  while True:
    #如果模型保存路径中有COMPLETE.TXT，则修改目录名
    if isfile(join(new_model_path, 'complete.txt')):
      # Training of the first model have been finished
      model_index += 1
      new_model_path = model.save_path + '_' + str(model_index)
    elif False and isfile(join(new_model_path, 'config.yml')):
      # Training of the first model have not been finished yet
      model_index += 1
      new_model_path = model.save_path + '_' + str(model_index)
    else:
      break
  #创建新的模型目录
  model.save_path = mkdir(new_model_path)

  # Save config file
  #配置文件拷贝一份到模型目录中
  shutil.copyfile(config_path, join(model.save_path, 'config.yml'))

  #sys.stdout = open(join(model.save_path, 'train.log'), 'w')
  #开始训练
  do_train(model=model, params=params, gpu_indices=gpu_indices)

if __name__ == '__main__':
  args = sys.argv
  if len(args) != 3 and len(args) != 4:
    print("input config file, model store path, [0,1,2,3]")
    exit(-1)
  import tensorflow as tf
  #选择哪些GPU可见
  os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2'
  main(config_path=args[1], model_save_path=args[2],
    gpu_indices=list(map(int, args[3].split(','))))

三、结论

以上就是CTC整个训练脚本，在3000小时数据上，字准确率达到了97%，句准确率达到了91%的效果。

你可能感兴趣的:(tensorflow,asr,ctc)

深入解析如何进行TensorFlow框架下的算子开发与适配插件开发：基于昇腾AI的完整流程快撑死的鱼华为昇腾 Ascend C的算子开发系统学习人工智能 tensorflow python
深入解析如何进行TensorFlow框架下的算子开发与适配插件开发：基于昇腾AI的完整流程在人工智能领域中，算子（Operator）作为深度学习模型的基础执行单元，决定了整个模型的计算性能和结果准确性。随着硬件平台的多样化，如何将第三方深度学习框架中的算子适配到特定的硬件平台变得至关重要。本文将深入探讨如何在TensorFlow框架下开发适配昇腾AI处理器的算子插件，通过解析算子属性映射、数据排布
深入解析框架适配开发：基于CANN平台的自定义算子开发与第三方框架适配全流程详解快撑死的鱼华为昇腾 Ascend C的算子开发系统学习人工智能
深入解析框架适配开发：基于CANN平台的自定义算子开发与第三方框架适配全流程详解随着深度学习的发展，不同的深度学习框架如TensorFlow、PyTorch、ONNX等在AI开发者社区中占据了重要地位。然而，针对某些硬件平台（如华为昇腾AI处理器），算子库中的算子并非都已经适配了所有主流框架。为了解决这一问题，框架适配开发应运而生，它允许开发者将已存在于算子库中的算子适配到其他未支持的第三方框架上
有趣的python代码实例_Python之路：200个Python有趣的小例子一网打尽 weixin_39845406 有趣的python代码实例
概述博主最近在学习python，看完了一整套学习视频，然后呃呃呃，还是用不太流畅。碰巧在全球最大的同性交友论坛GayHub(呸！是开源代码托管平台Github)上面发现了一个项目，该项目列举了200多个Python小例子，Python基础、Python坑点、Python字符串和正则、Python绘图、Python日期和文件、Web开发、数据科学、机器学习、深度学习、TensorFlow、Pytor
matlab程序代编程写做代码图像处理BP神经网络机器深度学习python matlabgoodboy 深度学习 matlab 图像处理
1.安装必要的库首先，确保你已经安装了必要的Python库。如果没有安装，请运行以下命令：bash复制代码pipinstallnumpymatplotlibtensorflowopencv-python2.图像预处理我们将使用OpenCV来加载和预处理图像数据。假设你有一个图像数据集，每个类别的图像存放在单独的文件夹中。python复制代码importosimportcv2importnumpya
用 aiofiles 模块的 asyncio.to_thread() 方法将同步文件操作转换为异步操作 PyAIGCMaster python
importaiofilesimportreasyncdefdownload_text_to_file(url,name):asyncwithaiohttp.ClientSession()assession:asyncwithsession.get(url)asresponse:text=awaitresponse.text()obj_content=re.compile(r"content:'(
深度解析：Python与TensorFlow在日平均气温预测中的应用——LSTM神经网络实战 AI_DL_CODE python 神经网络 tensorflow LSTM 气温预测 RNN
文章目录1.引言1.1研究背景与意义1.2研究目标与问题定义2.概念解析2.1Python语言简介2.2TensorFlow框架概述2.3LSTM神经网络原理3.原理详解3.1时间序列分析基础3.1.1时间序列的组成3.1.2时间序列分析方法3.2LSTM在时间序列分析中的应用3.2.1LSTM的优势3.2.2LSTM的结构3.3日平均气温预测的数学模型3.3.1ARIMA模型3.3.2LSTM模
基于深度学习的推荐系统构建：Movielens 数据集 fresh的转码之路深度学习人工智能机器学习推荐算法
基于深度学习的推荐系统构建：Movielens数据集依赖环境代码语言：python3.11.5开发平台：pycharmtensorflow版本：2.18.0MovieLen1M数据及简介MovieLens1M数据集包含包含6000个用户在近4000部电影上的100万条评分，也包括电影元数据信息和用户属性信息。下载地址为：http://files.grouplens.org/datasets/mov
百度指数+selenium+request+比特指纹浏览器+pywebview+pandas+flask过程性万山y python selenium 爬虫 flask pandas
1.cookies和headrs问题使用selenium获得的cookies测试没有问题，但是获得的heards头不可以使用，经过测试比较需要添加或者修改几项重点的heards为{'Cipher-Text':'1704885072633_1704970047346_SlMkwPX0ZnotTaSrpOEx50xhLlPT5iMH867nxTtYuapcdPhsh2d2ooVE2F+RSm+yhIF
tf.function-＞ AttributeError: ‘double‘ object has no attribute ‘shape‘ 乔宇同学学习tensorflow
跑tensorflow时出现的bug,不使用tf.function没问题，一旦挂上装饰符，就报错，报错内容如下：Traceback(mostrecentcalllast):File"D:\Anaconda3\envs\tensorflow2\lib\site-packages\tensorflow_core\python\eager\function.py",line111,in_make_inp
用TensorFlow.NET搭建一个全连接神经网络 chiyong7717 人工智能 c#python
在本文中，我们将学习如何在C＃中构建神经网络模型计算图。与线性分类器相比，神经网络的关键优势在于它可以分离不可线性分离的数据。我们将实现此模型来对MNIST数据集的手写数字图像进行分类。我们要构建的神经网络的结构如下。MNIST数据的手写数字图像有10个类（从0到9）。该网络具有2个隐藏层：第一层具有200个隐藏单元（神经元），第二层具有10个神经元（称为分类器层）。让我们一步一步地用代码来实现：
C#遇见TensorFlow.NET：开启机器学习的全新时代墨夶 C#学习资料1 机器学习 c#tensorflow
在当今快速发展的科技世界里，机器学习（MachineLearning,ML）已经成为推动创新的重要力量。从个性化推荐系统到自动驾驶汽车，ML的应用无处不在。对于那些习惯于使用C#进行开发的程序员来说，将机器学习集成到他们的项目中似乎是一项具有挑战性的任务。但随着TensorFlow.NET的出现，这一切变得不再困难。今天，我们将一起探索如何利用这一强大的工具，在熟悉的.NET环境中轻松构建、训练和
python中tensorflow_python机器学习TensorFlow框架弦歌缓缓
TensorFlow框架关注公众号“轻松学编程”了解更多。一、简介TensorFlow是谷歌基于DistBelief进行研发的第二代人工智能学习系统，其命名来源于本身的运行原理。Tensor(张量)意味着N维数组，Flow(流)意味着基于数据流图的计算，TensorFlow为张量从流图的一端流动到另一端的计算过程。TensorFlow是将复杂的数据结构传输至人工智能神经网中进行分析和处理过程的系统
基于深度学习CNN网络 mini-xception网络实现构建一个完整的人脸表情检测_识别分类系统，包括训练、评估、前端和服务端代码计算机c9硕士算法工程师卷积神经网络深度学习 cnn 分类
人脸表情检测该项目已训练好网络模型，配置好环境即可运行使用，效果见图像，实现图像识别、摄像头识别、摄像头识别/识别分类项目-说明文档-UI界面-cnn网络项目基本介绍：【网络】深度学习CNN网络mini-xception网络【环境】python>=3.5tensorflow2opencvpyqt5【文件】训练预测全部源代码、训练好的模型、fer2013数据集、程序算法讲解文档【类别】对7种表情检测
推荐3D UNet实现：深度学习3D体素数据语义分割的利器！滑辰煦Marc
推荐3DUNet实现：深度学习3D体素数据语义分割的利器！去发现同类优质开源项目:https://gitcode.com/在这个快速发展的深度学习时代，3DUNet已经成为3D图像处理领域中不可或缺的工具，尤其在医疗影像分析和3D物体识别等任务上展现出强大的潜力。这个开源项目为我们提供了一个高效、灵活的3DUNet实现，支持Tensorflow、PyTorch和Chainer三种主流深度学习框架。
深度学习驱动的极端天气预测：时空数据异常检测与应用全解析（基于Python + TensorFlow） AI_DL_CODE 深度学习 python tensorflow 人工智能天气预测
摘要：时空数据异常检测在气象领域识别偏离正常模式的数据点，对极端天气预测至关重要。深度学习，尤其是LSTM网络，因其强大的特征学习能力在该领域显示出巨大潜力。通过整合多源气象数据，深度学习模型能够自动挖掘复杂模式和非线性关系，提高预测准确性。然而，挑战依然存在，包括数据质量问题、模型可解释性不足以及极端天气的内在复杂性和不确定性。未来，通过模型架构创新、训练算法优化以及探索深度学习在气候预测、气象
python 代码实现了一个条件生成对抗网络（Conditional Generative Adversarial Network，CGAN），用于生成与给定的理化值相关的光谱数据 max500600 算法开发语言 python 生成对抗网络开发语言
importtensorflowastfimportnumpyasnpimportpandasaspdimportosimportmatplotlib.pyplotaspltfromsklearn.model_selectionimporttrain_test_splitfromtensorflow.keras.layersimportAdd,BatchNormalizationos.enviro
MindIE+MindFormers推理方案指导人工智能pytorch
组件介绍CANNCANN是什么异构计算架构CANN（ComputeArchitectureforNeuralNetworks）是昇腾针对AI场景推出的异构计算架构，向上支持多种AI框架，包括MindSpore、PyTorch、TensorFlow等，向下服务AI处理器与编程，发挥承上启下的关键作用，是提升昇腾AI处理器计算效率的关键平台。同时针对多样化应用场景，提供多层次编程接口，支持用户快速构建
【YashanDB知识库】YashanDB备份恢复的两种渠道数据库
本文内容来自YashanDB官网，原文内容请见https://www.yashandb.com/newsinfo/7106884.html?templateId=171...背景：通过备份恢复完成yasdb到yasdb的数据迁移，环境分为A环境（ip:127.0.0.1）与B环境（ip:127.0.0.2），数据从A环境迁移到B环境。方式有两种，一是yasql命令行，二是通过yasrman工具ya
whisper.cpp 学习笔记法号：行颠机器学习 whisper 学习笔记
whisper.cppwhisper.cpp学习笔记whisper介绍源码下载源码编译支持的模型优化/加速生成库文件使用whispe.cpp的demo参考文献whisper.cpp学习笔记whisper介绍whisper是基于OpenAI的自动语音识别（ASR）模型。他可以识别包括英语、普通话等在内多国语言。whisper分为whisper（python版本）和whisper.cpp（C/C++版
开源人工智能模型框架：探索与实践 CodeJourney. 人工智能能源
摘要本文深入探讨了开源人工智能模型框架，旨在为研究人员、开发者及相关从业者提供全面的理解与参考。通过对多个主流开源框架，如TensorFlow、PyTorch、Keras、Detectron2、OpenCV、HuggingFaceTransformers、AllenNLP、MindSpore和Fastai的详细分析，阐述其特点、应用场景、优势与不足，并结合具体示例说明其使用方法，同时配以相关架构图
【机器学习：十五、神经网络的编译和训练】 KeyPan 机器学习机器学习神经网络人工智能深度学习 pytorch ubuntu linux
1.TensorFlow实现代码TensorFlow是深度学习中最为广泛使用的框架之一，提供了灵活的接口来构建、编译和训练神经网络。以下是实现神经网络的一个完整代码示例，以“手写数字识别”为例：importtensorflowastffromtensorflow.kerasimportlayers,models#加载MNIST数据集(x_train,y_train),(x_test,y_test)
应急救援路径规划中的蚁群算法与路径评价研究【附代码】拉勾科研工作室算法
数据科学与大数据专业|数据分析与模型构建|数据驱动决策✨专业领域：数据挖掘与清洗大数据处理与存储技术机器学习与深度学习模型数据可视化与报告生成分布式计算与云计算数据安全与隐私保护擅长工具：Python/R/Matlab数据分析与建模Hadoop/Spark大数据处理平台SQL数据库管理与优化Tableau/PowerBI数据可视化工具TensorFlow/PyTorch深度学习框架✅具体问题可以私
Python开发常用的三方模块如下：换个网名有点难 python 开发语言
Python是一门功能强大的编程语言，拥有丰富的第三方库，这些库为开发者提供了极大的便利。以下是100个常用的Python库，涵盖了多个领域：1、NumPy，用于科学计算的基础库。2、Pandas，提供数据结构和数据分析工具。3、Matplotlib，一个绘图库。4、Scikit-learn，机器学习库。5、SciPy，用于数学、科学和工程的库。6、TensorFlow，由Google开发的开源机
mysql查询统计聚合函数三小皮 mysql 数据库
业务中用户统计报表使用，查询字段使用聚合函数+条件，快速实现报表统计。SELECTMIN(s.org_name)ASorgName,s.way_nameASwayName,COUNT(s.id)ASwaybillTotal,SUM(s.take_weight)AStakeWeightTotal,SUM(s.revert_weight)ASrevertWeightTotal,SUM(s.settle
英语日积月累2023-06-08 抽刀断水2
StratifiedStratifiedStratified分层此外，欧洲社会相对来说是分阶层的；职业和社会地位是通过继承得到的。Moreover,Europeansocietywasrelativelystratified;occupationandsocialstatuswereinherited.straightforwardstraightforwardstraightforward直爽的
关于python版本与TensorFlow安装的版本问题 iiimharrygGc. python tensorflow 开发语言
实测在conda环境下，python3.12的版本无法安装TensorFlow2.14.0（截至2024.5.21）最新版本在python3.7版本下正常安装ps：上述安装均在anacondanavigator软件内安装
Vue + Django的人脸识别系统 DXSsssss python DRF tensorflow 人脸识别
最近在研究机器学习，刚好最近看了vue+Djangodrf的一些课程，学以致用，做了一个人脸识别系统。项目前端使用Vue框架，用到了elementui组件，写起来真是方便。比之前传统的dtl方便了太多。后端使用了drf，识别知识刚开始打算使用opencv+tensorflow,但是发现吧识别以后的结果返回到浏览器当中时使用opencv比较麻烦（主要是我太菜，想不到比较好的方法），因此最终使用了tf
Awesome TensorFlow weixin_30594001 人工智能移动开发大数据
AwesomeTensorFlowAcuratedlistofawesomeTensorFlowexperiments,libraries,andprojects.Inspiredbyawesome-machine-learning.WhatisTensorFlow?TensorFlowisanopensourcesoftwarelibraryfornumericalcomputationusin
【ShuQiHere】小白也能懂的 TensorFlow 和 PyTorch GPU 配置教程 ShuQiHere tensorflow pytorch 人工智能
【ShuQiHere】在深度学习中，GPU的使用对于加速模型训练至关重要。然而，对于许多刚刚入门的小白来说，如何在TensorFlow和PyTorch中指定使用GPU进行训练可能会感到困惑。在本文中，我将详细介绍如何在这两个主流的深度学习框架中指定使用GPU进行训练，并确保每一个步骤都简单易懂，跟着我的步骤来，你也能轻松上手！1.安装所需库首先，确保你已经安装了TensorFlow或PyTorch
TensorFlow的基本概念以及使用场景张柏慈决策树
TensorFlow是一个机器学习平台，用于构建和训练机器学习模型。它使用图形表示计算任务，其中节点表示数学操作，边表示计算之间的数据流动。TensorFlow的主要特点包括：1.多平台支持：TensorFlow可以运行在多种硬件和操作系统上，包括CPU、GPU和移动设备。2.自动求导：TensorFlow可以自动计算模型参数的梯度，通过优化算法更新参数，以提高模型的准确性。3.分布式计算：Ten
Maven Array_06 eclipse jdk maven
Maven Maven是基于项目对象模型(POM)，信息来管理项目的构建，报告和文档的软件项目管理工具。 Maven 除了以程序构建能力为特色之外，还提供高级项目管理工具。由于 Maven 的缺省构建规则有较高的可重用性，所以常常用两三行 Maven 构建脚本就可以构建简单的项目。由于 Maven 的面向项目的方法，许多 Apache Jakarta 项目发文时使用 Maven，而且公司
ibatis的queyrForList和queryForMap区别 bijian1013 java ibatis
一.说明 iBatis的返回值参数类型也有种：resultMap与resultClass，这两种类型的选择可以用两句话说明之： 1.当结果集列名和类的属性名完全相对应的时候，则可直接用resultClass直接指定查询结果类
LeetCode[位运算] - #191 计算汉明权重 Cwind java 位运算 LeetCode Algorithm 题解
原题链接：#191 Number of 1 Bits 要求：写一个函数，以一个无符号整数为参数，返回其汉明权重。例如，‘11’的二进制表示为'00000000000000000000000000001011', 故函数应当返回3。汉明权重：指一个字符串中非零字符的个数；对于二进制串，即其中‘1’的个数。难度：简单分析：将十进制参数转换为二进制，然后计算其中1的个数即可。 “
浅谈java类与对象 15700786134 java
java是一门面向对象的编程语言，类与对象是其最基本的概念。所谓对象，就是一个个具体的物体，一个人，一台电脑，都是对象。而类，就是对象的一种抽象，是多个对象具有的共性的一种集合，其中包含了属性与方法，就是属于该类的对象所具有的共性。当一个类创建了对象，这个对象就拥有了该类全部的属性，方法。相比于结构化的编程思路，面向对象更适用于人的思维
linux下双网卡同一个IP 被触发 linux
转自： http://q2482696735.blog.163.com/blog/static/250606077201569029441/ 由于需要一台机器有两个网卡，开始时设置在同一个网段的IP，发现数据总是从一个网卡发出，而另一个网卡上没有数据流动。网上找了下，发现相同的问题不少：一、关于双网卡设置同一网段IP然后连接交换机的时候出现的奇怪现象。当时没有怎么思考、以为是生成树
安卓按主页键隐藏程序之后无法再次打开肆无忌惮_ 安卓
遇到一个奇怪的问题，当SplashActivity跳转到MainActivity之后，按主页键，再去打开程序，程序没法再打开（闪一下），结束任务再开也是这样，只能卸载了再重装。而且每次在Log里都打印了这句话"进入主程序"。后来发现是必须跳转之后再finish掉SplashActivity 本来代码： // 销毁这个Activity fin
通过cookie保存并读取用户登录信息实例知了ing JavaScript html
通过cookie的getCookies()方法可获取所有cookie对象的集合；通过getName()方法可以获取指定的名称的cookie；通过getValue()方法获取到cookie对象的值。另外，将一个cookie对象发送到客户端，使用response对象的addCookie()方法。下面通过cookie保存并读取用户登录信息的例子加深一下理解。（1）创建index.jsp文件。在改
JAVA 对象池矮蛋蛋 java ObjectPool
原文地址： http://www.blogjava.net/baoyaer/articles/218460.html Jakarta对象池 ☆为什么使用对象池恰当地使用对象池化技术，可以有效地减少对象生成和初始化时的消耗，提高系统的运行效率。Jakarta Commons Pool组件提供了一整套用于实现对象池化
ArrayList根据条件+for循环批量删除的方法 alleni123 java
场景如下： ArrayList<Obj> list Obj-> createTime, sid. 现在要根据obj的createTime来进行定期清理。（释放内存） ------------------------- 首先想到的方法就是 for(Obj o:list){ if(o.createTime-currentT>xxx){
阿里巴巴“耕地宝”大战各种宝百合不是茶平台战略
“耕地保”平台是阿里巴巴和安徽农民共同推出的一个 “首个互联网定制私人农场”，“耕地宝”由阿里巴巴投入一亿，主要是用来进行农业方面，将农民手中的散地集中起来不仅加大农民集体在土地上面的话语权，还增加了土地的流通与利用率，提高了土地的产量，有利于大规模的产业化的高科技农业的发展，阿里在农业上的探索将会引起新一轮的产业调整，但是集体化之后农民的个体的话语权将更少，国家应出台相应的法律法规保护
Spring注入有继承关系的类（1） bijian1013 java spring
一个类一个类的注入 1.AClass类 package com.bijian.spring.test2; public class AClass { String a; String b; public String getA() { return a; } public void setA(Strin
30岁转型期你能否成为成功人士 bijian1013 成功
很多人由于年轻时走了弯路，到了30岁一事无成，这样的例子大有人在。但同样也有一些人，整个职业生涯都发展得很优秀，到了30岁已经成为职场的精英阶层。由于做猎头的原因，我们接触很多30岁左右的经理人，发现他们在职业发展道路上往往有很多致命的问题。在30岁之前，他们的职业生涯表现很优秀，但从30岁到40岁这一段，很多人
[Velocity三]基于Servlet+Velocity的web应用 bit1129 velocity
什么是VelocityViewServlet 使用org.apache.velocity.tools.view.VelocityViewServlet可以将Velocity集成到基于Servlet的web应用中，以Servlet+Velocity的方式实现web应用 Servlet + Velocity的一般步骤 1.自定义Servlet，实现VelocityViewServl
【Kafka十二】关于Kafka是一个Commit Log Service bit1129 service
Kafka is a distributed, partitioned, replicated commit log service.这里的commit log如何理解？ A message is considered "committed" when all in sync replicas for that partition have applied i
NGINX + LUA实现复杂的控制 ronin47 lua nginx 控制
安装lua_nginx_module 模块 lua_nginx_module 可以一步步的安装，也可以直接用淘宝的OpenResty Centos和debian的安装就简单了。。这里说下freebsd的安装： fetch http://www.lua.org/ftp/lua-5.1.4.tar.gz tar zxvf lua-5.1.4.tar.gz cd lua-5.1.4 ma
java-14.输入一个已经按升序排序过的数组和一个数字，在数组中查找两个数，使得它们的和正好是输入的那个数字 bylijinnan java
public class TwoElementEqualSum { /** * 第 14 题：题目：输入一个已经按升序排序过的数组和一个数字，在数组中查找两个数，使得它们的和正好是输入的那个数字。要求时间复杂度是 O(n) 。如果有多对数字的和等于输入的数字，输出任意一对即可。例如输入数组 1 、 2 、 4 、 7 、 11 、 15 和数字 15 。由于
Netty源码学习-HttpChunkAggregator-HttpRequestEncoder-HttpResponseDecoder bylijinnan java netty
今天看Netty如何实现一个Http Server org.jboss.netty.example.http.file.HttpStaticFileServerPipelineFactory： pipeline.addLast("decoder", new HttpRequestDecoder()); pipeline.addLast(&quo
java敏感词过虑-基于多叉树原理 cngolon 违禁词过虑替换违禁词敏感词过虑多叉树
基于多叉树的敏感词、关键词过滤的工具包，用于java中的敏感词过滤 1、工具包自带敏感词词库，第一次调用时读入词库，故第一次调用时间可能较长，在类加载后普通pc机上html过滤5000字在80毫秒左右，纯文本35毫秒左右。 2、如需自定义词库，将jar包考入WEB-INF工程的lib目录，在WEB-INF/classes目录下建一个 utf-8的words.dict文本文件，
多线程知识 cuishikuan 多线程
T1，T2，T3三个线程工作顺序，按照T1，T2，T3依次进行 public class T1 implements Runnable{ @Override
spring整合activemq dalan_123 java spring jms
整合spring和activemq需要搞清楚如下的东东1、ConnectionFactory分： a、spring管理连接到activemq服务器的管理ConnectionFactory也即是所谓产生到jms服务器的链接 b、真正产生到JMS服务器链接的ConnectionFactory还得
MySQL时间字段究竟使用INT还是DateTime？ dcj3sjt126com mysql
环境：Windows XPPHP Version 5.2.9MySQL Server 5.1 第一步、创建一个表date_test（非定长、int时间） CREATE TABLE `test`.`date_test` (`id` INT NOT NULL AUTO_INCREMENT ,`start_time` INT NOT NULL ,`some_content`
Parcel: unable to marshal value dcj3sjt126com marshal
在两个activity直接传递List<xxInfo>时，出现Parcel: unable to marshal value异常。在MainActivity页面（MainActivity页面向NextActivity页面传递一个List<xxInfo>）： Intent intent = new Intent(this, Next
linux进程的查看上（ps） eksliang linux ps linux ps -l linux ps aux
ps:将某个时间点的进程运行情况选取下来转载请出自出处：http://eksliang.iteye.com/admin/blogs/2119469 http://eksliang.iteye.com ps 这个命令的man page 不是很好查阅，因为很多不同的Unix都使用这儿ps来查阅进程的状态，为了要符合不同版本的需求，所以这个
为什么第三方应用能早于System的app启动 gqdy365 System
Android应用的启动顺序网上有一大堆资料可以查阅了，这里就不细述了，这里不阐述ROM启动还有bootloader，软件启动的大致流程应该是启动kernel -> 运行servicemanager 把一些native的服务用命令启动起来（包括wifi, power, rild, surfaceflinger, mediaserver等等）-> 启动Dalivk中的第一个进程Zygot
App Framework发送JSONP请求(3) hw1287789687 jsonp 跨域请求发送jsonp ajax请求越狱请求
App Framework 中如何发送JSONP请求呢? 使用jsonp,详情请参考:http://json-p.org/ 如何发送Ajax请求呢? (1)登录 /*** * 会员登录 * @param username * @param password */ var user_login=function(username,password){ // aler
发福利，整理了一份关于“资源汇总”的汇总 justjavac 资源
觉得有用的话，可以去github关注：https://github.com/justjavac/awesome-awesomeness-zh_CN 通用 free-programming-books-zh_CN 免费的计算机编程类中文书籍精彩博客集合 hacke2/hacke2.github.io#2 ResumeSample 程序员简历
用 Java 技术创建 RESTful Web 服务 macroli java 编程 Web REST
转载：http://www.ibm.com/developerworks/cn/web/wa-jaxrs/ JAX-RS (JSR-311) 【 Java API for RESTful Web Services 】是一种 Java™ API，可使 Java Restful 服务的开发变得迅速而轻松。这个 API 提供了一种基于注释的模型来描述分布式资源。注释被用来提供资源的位
CentOS6.5-x86_64位下oracle11g的安装详细步骤及注意事项超声波 oracle linux
前言：这两天项目要上线了，由我负责往服务器部署整个项目，因此首先要往服务器安装oracle，服务器本身是CentOS6.5的64位系统，安装的数据库版本是11g，在整个的安装过程中碰到很多的坑，不过最后还是通过各种途径解决并成功装上了。转别写篇博客来记录完整的安装过程以及在整个过程中的注意事项。希望对以后那些刚刚接触的菜鸟们能起到一定的帮助作用。安装过程中可能遇到的问题（注
HttpClient 4.3 设置keeplive 和 timeout 的方法 supben httpclient
ConnectionKeepAliveStrategy kaStrategy = new DefaultConnectionKeepAliveStrategy() { @Override public long getKeepAliveDuration(HttpResponse response, HttpContext context) { long keepAlive
Spring 4.2新特性-@Import注解的升级 wiselyman spring 4
3.1 @Import @Import注解在4.2之前只支持导入配置类在4.2,@Import注解支持导入普通的java类,并将其声明成一个bean 3.2 示例演示java类 package com.wisely.spring4_2.imp; public class DemoService { public void doSomethin