FFN(mlpack)

Feedforward neural network

  • FFN
    • Constructor
    • Train
    • Evaluate
      • Reset
      • Forward
      • Loss
    • Gradient
    • EvaluateWithGradient
    • Predict
  • Layer
    • Linear
      • Constructor
      • Forward
      • Backward
      • Gradient
    • Convolution
      • Constructor
      • Forward
      • Backward
      • Gradient
  • Test
  • Reference

FFN

Constructor

Main constructor header:

/**
 * Implementation of a standard feed forward network.
 *
 * @tparam OutputLayerType The output layer type used to evaluate the network.
 * @tparam InitializationRuleType Rule used to initialize the weight matrix.
 * @tparam CustomLayers Any set of custom layers that could be a part of the
 *         feed forward network.
 */
template<
  typename OutputLayerType = NegativeLogLikelihood<>,
  typename InitializationRuleType = RandomInitialization,
  typename... CustomLayers
>
class FFN
{
 public:
  //! Convenience typedef for the internal model construction.
  using NetworkType = FFN<OutputLayerType, InitializationRuleType>;

  /**
   * Create the FFN object.
   *
   * Optionally, specify which initialize rule and performance function should
   * be used.
   *
   * If you want to pass in a parameter and discard the original parameter
   * object, be sure to use std::move to avoid unnecessary copy.
   *
   * @param outputLayer Output layer used to evaluate the network.
   * @param initializeRule Optional instantiated InitializationRule object
   *        for initializing the network parameter.
   */
  FFN(OutputLayerType outputLayer = OutputLayerType(),
      InitializationRuleType initializeRule = InitializationRuleType());

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::FFN(
    OutputLayerType outputLayer, InitializationRuleType initializeRule) :
    outputLayer(std::move(outputLayer)),
    initializeRule(std::move(initializeRule)),
    width(0),
    height(0),
    reset(false),
    numFunctions(0),
    deterministic(false)
{
  /* Nothing to do here. */
}

The constructor has two main template parameters, OutputLayerType and InitializationRuleType; let's look at their default implementations.

NegativeLogLikelihood header:

/**
 * Implementation of the negative log likelihood layer. The negative log
 * likelihood layer expectes that the input contains log-probabilities for each
 * class. The layer also expects a class index, in the range between 1 and the
 * number of classes, as target when calling the Forward function.
 *
 * @tparam InputDataType Type of the input data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 * @tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 */
template <
    typename InputDataType = arma::mat,
    typename OutputDataType = arma::mat
>
class NegativeLogLikelihood
{
 public:
  /**
   * Create the NegativeLogLikelihoodLayer object.
   */
  NegativeLogLikelihood();

  /**
   * Computes the Negative log likelihood.
   *
   * @param input Input data used for evaluating the specified function.
   * @param target The target vector, that contains the class index in the range
   *        between 1 and the number of classes.
   */
  template<typename InputType, typename TargetType>
  typename InputType::elem_type Forward(const InputType& input,
                                        const TargetType& target);

  /**
   * Ordinary feed backward pass of a neural network. The negative log
   * likelihood layer expects that the input contains log-probabilities for
   * each class. The layer also expects a class index, in the range between 1
   * and the number of classes, as target when calling the Forward function.
   *
   * @param input The propagated input activation.
   * @param target The target vector, that contains the class index in the range
   *        between 1 and the number of classes.
   * @param output The calculated error.
   */
  template<typename InputType, typename TargetType, typename OutputType>
  void Backward(const InputType& input,
                const TargetType& target,
                OutputType& output);

  //! Get the input parameter.
  InputDataType& InputParameter() const { return inputParameter; }
  //! Modify the input parameter.
  InputDataType& InputParameter() { return inputParameter; }

  //! Get the output parameter.
  OutputDataType& OutputParameter() const { return outputParameter; }
  //! Modify the output parameter.
  OutputDataType& OutputParameter() { return outputParameter; }

  //! Get the delta.
  OutputDataType& Delta() const { return delta; }
  //! Modify the delta.
  OutputDataType& Delta() { return delta; }

  /**
   * Serialize the layer
   */
  template<typename Archive>
  void serialize(Archive& /* ar */, const unsigned int /* version */);

 private:
  //! Locally-stored delta object.
  OutputDataType delta;

  //! Locally-stored input parameter object.
  InputDataType inputParameter;

  //! Locally-stored output parameter object.
  OutputDataType outputParameter;
}; // class NegativeLogLikelihood

Implementation:

template<typename InputDataType, typename OutputDataType>
NegativeLogLikelihood<InputDataType, OutputDataType>::NegativeLogLikelihood()
{
  // Nothing to do here.
}

template<typename InputDataType, typename OutputDataType>
template<typename InputType, typename TargetType>
typename InputType::elem_type
NegativeLogLikelihood<InputDataType, OutputDataType>::Forward(
    const InputType& input,
    const TargetType& target)
{
  typedef typename InputType::elem_type ElemType;
  ElemType output = 0;
  for (size_t i = 0; i < input.n_cols; ++i)
  {
    size_t currentTarget = target(i) - 1;
    Log::Assert(currentTarget < input.n_rows,
        "Target class out of range.");

    output -= input(currentTarget, i);
  }

  return output;
}

template<typename InputDataType, typename OutputDataType>
template<typename InputType, typename TargetType, typename OutputType>
void NegativeLogLikelihood<InputDataType, OutputDataType>::Backward(
      const InputType& input,
      const TargetType& target,
      OutputType& output)
{
  output = arma::zeros<OutputType>(input.n_rows, input.n_cols);
  for (size_t i = 0; i < input.n_cols; ++i)
  {
    size_t currentTarget = target(i) - 1;
    Log::Assert(currentTarget < input.n_rows,
        "Target class out of range.");

    output(currentTarget, i) = -1;
  }
}

template<typename InputDataType, typename OutputDataType>
template<typename Archive>
void NegativeLogLikelihood<InputDataType, OutputDataType>::serialize(
    Archive& /* ar */,
    const unsigned int /* version */)
{
  // Nothing to do here.
}

The important parts of the negative log likelihood loss are the two methods Forward and Backward. Let's introduce some notation:

$$
input: (X_1, \cdots, X_N), \quad X_i \in \mathbb{R}^n \;\; \forall\, i \in [1, N]
\Rightarrow
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
& \vdots & \\
x_{n1} & x_{n2} & \cdots & x_{nN}
\end{bmatrix}
$$

$$
target: (y_1, \cdots, y_N), \quad y_i \in [1, m]
$$

Therefore:
Forward:

$$
output = -\sum_{i=1}^N x_{(y_i, i)}, \quad y_i \leqslant n
$$

Backward:

$$
(n \times N): \quad output_{(j, i)} =
\begin{cases}
-1, & j = y_i \;\; (y_i \leqslant n) \\
0, & otherwise
\end{cases}
$$
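
To make the two formulas concrete, here is a minimal sketch; the numbers are invented for illustration (n = 3 classes, N = 2 data points):

// Hypothetical values: inputs are log-probabilities, targets are 1-based
// class indices.
arma::mat input = { { -0.2, -1.5 },
                    { -1.7, -0.3 },
                    { -2.3, -2.0 } };
arma::rowvec target("1 2");

mlpack::ann::NegativeLogLikelihood<> nll;
// Forward: -input(0, 0) - input(1, 1) = 0.2 + 0.3 = 0.5.
double loss = nll.Forward(input, target);

arma::mat grad;
// Backward: all zeros except grad(0, 0) = grad(1, 1) = -1.
nll.Backward(input, target, grad);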

RandomInitialization:

/**
 * This class is used to initialize randomly the weight matrix.
 */
class RandomInitialization
{
 public:
  /**
   * Initialize the random initialization rule with the given lower bound and
   * upper bound.
   *
   * @param lowerBound The number used as lower bound.
   * @param upperBound The number used as upper bound.
   */
  RandomInitialization(const double lowerBound = -1,
                       const double upperBound = 1) :
      lowerBound(lowerBound), upperBound(upperBound) { }

  /**
   * Initialize the random initialization rule with the given bound.
   * Using the negative of the bound as lower bound and the positive bound as
   * upper bound.
   *
   * @param bound The number used as lower bound
   */
  RandomInitialization(const double bound) :
      lowerBound(-std::abs(bound)), upperBound(std::abs(bound)) { }

  /**
   * Initialize randomly the elements of the specified weight matrix.
   *
   * @param W Weight matrix to initialize.
   * @param rows Number of rows.
   * @param cols Number of columns.
   */
  template<typename eT>
  void Initialize(arma::Mat<eT>& W, const size_t rows, const size_t cols)
  {
    if (W.is_empty())
      W.set_size(rows, cols);

    W.randu();
    W *= (upperBound - lowerBound);
    W += lowerBound;
  }

  /**
   * Initialize randomly the elements of the specified weight matrix.
   *
   * @param W Weight matrix to initialize.
   */
  template<typename eT>
  void Initialize(arma::Mat<eT>& W)
  {
    if (W.is_empty())
      Log::Fatal << "Cannot initialize an empty matrix." << std::endl;

    W.randu();
    W *= (upperBound - lowerBound);
    W += lowerBound;
  }

  /**
   * Initialize randomly the elements of the specified weight 3rd order tensor.
   *
   * @param W Weight matrix to initialize.
   * @param rows Number of rows.
   * @param cols Number of columns.
   * @param slices Number of slices.
   */
  template<typename eT>
  void Initialize(arma::Cube<eT>& W,
                  const size_t rows,
                  const size_t cols,
                  const size_t slices)
  {
    if (W.is_empty())
      W.set_size(rows, cols, slices);

    for (size_t i = 0; i < slices; ++i)
      Initialize(W.slice(i), rows, cols);
  }

  /**
   * Initialize randomly the elements of the specified weight 3rd order tensor.
   *
   * @param W Weight matrix to initialize.
   */
  template<typename eT>
  void Initialize(arma::Cube<eT>& W)
  {
    if (W.is_empty())
      Log::Fatal << "Cannot initialize an empty cube." << std::endl;

    for (size_t i = 0; i < W.n_slices; ++i)
      Initialize(W.slice(i));
  }

 private:
  //! The number used as lower bound.
  double lowerBound;

  //! The number used as upper bound.
  double upperBound;
}; // class RandomInitialization

The official documentation describes .randu() as:

.randu() uses a uniform distribution in the [0,1] interval

So this initialization rule first draws values from $U(0, 1)$, multiplies them by $(upperBound - lowerBound)$, and adds $lowerBound$.
Hence:

$$
E(W) = \dfrac{upperBound + lowerBound}{2}, \qquad
D(W) = \dfrac{(upperBound - lowerBound)^2}{12}
$$
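
A minimal usage sketch (the bounds and shape are illustrative):

arma::mat W;
mlpack::ann::RandomInitialization init(-0.5, 0.5);
init.Initialize(W, 3, 2);  // W is 3 x 2, entries uniform on [-0.5, 0.5]
// Per the formulas above: E(W) = 0, D(W) = 1/12.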

Train

Train header:

  /**
   * Train the feedforward network on the given input data using the given
   * optimizer.
   *
   * This will use the existing model parameters as a starting point for the
   * optimization. If this is not what you want, then you should access the
   * parameters vector directly with Parameters() and modify it as desired.
   *
   * If you want to pass in a parameter and discard the original parameter
   * object, be sure to use std::move to avoid unnecessary copy.
   *
   * @tparam OptimizerType Type of optimizer to use to train the model.
   * @tparam CallbackTypes Types of Callback Functions.
   * @param predictors Input training variables.
   * @param responses Outputs results from input training variables.
   * @param optimizer Instantiated optimizer used to train the model.
   * @param callbacks Callback function for ensmallen optimizer `OptimizerType`.
   *      See https://www.ensmallen.org/docs.html#callback-documentation.
   * @return The final objective of the trained model (NaN or Inf on error).
   */
  template<typename OptimizerType, typename... CallbackTypes>
  double Train(arma::mat predictors,
               arma::mat responses,
               OptimizerType& optimizer,
               CallbackTypes&&... callbacks);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
template<typename OptimizerType, typename... CallbackTypes>
double FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Train(
      arma::mat predictors,
      arma::mat responses,
      OptimizerType& optimizer,
      CallbackTypes&&... callbacks)
{
  ResetData(std::move(predictors), std::move(responses));

  WarnMessageMaxIterations<OptimizerType>(optimizer, this->predictors.n_cols);

  // Train the model.
  Timer::Start("ffn_optimization");
  const double out = optimizer.Optimize(*this, parameter, callbacks...);
  Timer::Stop("ffn_optimization");

  Log::Info << "FFN::FFN(): final objective of trained model is " << out
      << "." << std::endl;
  return out;
}

After the model is constructed, it is trained on the given dataset and labels. The implementation is easy to follow:
it hands itself to an optimizer from ensmallen as the objective function to optimize, along with the parameter matrix parameter.
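
For context, a hedged usage sketch of this workflow; the layer sizes, optimizer settings, and the trainX/trainY matrices are illustrative, not from the source:

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <ensmallen.hpp>

using namespace mlpack::ann;

// trainX: 10 x N predictors; trainY: 1 x N labels in 1..3 (assumed given).
FFN<NegativeLogLikelihood<>, RandomInitialization> model;
model.Add<Linear<>>(10, 5);
model.Add<ReLULayer<>>();
model.Add<Linear<>>(5, 3);
model.Add<LogSoftMax<>>();

ens::Adam optimizer(0.01, 32);           // step size, batch size
model.Train(trainX, trainY, optimizer);  // returns the final objective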

Recalling the Adam optimization algorithm introduced earlier, we can guess that this model must implement Evaluate and Gradient functions.

Sure enough:

Evaluate

Evaluate header:

  /**
   * Evaluate the feedforward network with the given parameters. This function
   * is usually called by the optimizer to train the model.
   *
   * @param parameters Matrix model parameters.
   */
  double Evaluate(const arma::mat& parameters);

  /**
   * Evaluate the feedforward network with the given parameters, but using only
   * a number of data points. This is useful for optimizers such as SGD, which
   * require a separable objective function.
   *
   * @param parameters Matrix model parameters.
   * @param begin Index of the starting point to use for objective function
   *        evaluation.
   * @param batchSize Number of points to be passed at a time to use for
   *        objective function evaluation.
   * @param deterministic Whether or not to train or test the model. Note some
   *        layer act differently in training or testing mode.
   */
  double Evaluate(const arma::mat& parameters,
                  const size_t begin,
                  const size_t batchSize,
                  const bool deterministic);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
double FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Evaluate(
    const arma::mat& parameters)
{
  double res = 0;
  for (size_t i = 0; i < predictors.n_cols; ++i)
    res += Evaluate(parameters, i, 1, true);

  return res;
}

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
double FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Evaluate(
    const arma::mat& /* parameters */,
    const size_t begin,
    const size_t batchSize,
    const bool deterministic)
{
  if (parameter.is_empty())
    ResetParameters();

  if (deterministic != this->deterministic)
  {
    this->deterministic = deterministic;
    ResetDeterministic();
  }

  Forward(predictors.cols(begin, begin + batchSize - 1));
  double res = outputLayer.Forward(
      boost::apply_visitor(outputParameterVisitor, network.back()),
      responses.cols(begin, begin + batchSize - 1));

  for (size_t i = 0; i < network.size(); ++i)
  {
    res += boost::apply_visitor(lossVisitor, network[i]);
  }

  return res;
}

First, let's look at the two Reset methods:

Reset

ResetDeterministic

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType,
         CustomLayers...>::ResetDeterministic()
{
  DeterministicSetVisitor deterministicSetVisitor(deterministic);
  std::for_each(network.begin(), network.end(),
      boost::apply_visitor(deterministicSetVisitor));
}

Two library facilities are used here. The first is std::for_each, whose prototype is:

UnaryProc for_each ( InputIterator beg, InputIterator end, UnaryProc op)

From this we can guess that boost::apply_visitor must act as a function object here:

boost::apply_visitor — Allows compile-time checked type-safe application of the given visitor to the content of the given variant, ensuring that all types are handled by the visitor.

apply_visitor has several overloads; here it is used as a unary function object, i.e. deterministicSetVisitor is applied to each element of network in turn, as the standalone sketch below illustrates.
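
A minimal standalone sketch of the pattern, independent of mlpack; the one-argument overload of boost::apply_visitor returns a unary function object suitable for std::for_each:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <boost/variant.hpp>

// Print whichever type the variant currently holds.
struct PrintVisitor : public boost::static_visitor<void>
{
  void operator()(int v) const { std::cout << v << '\n'; }
  void operator()(const std::string& v) const { std::cout << v << '\n'; }
};

int main()
{
  std::vector<boost::variant<int, std::string>> items{ 1, std::string("two") };

  // apply_visitor with only a visitor argument yields a unary function
  // object; for_each applies it to every variant in turn, just as
  // ResetDeterministic does with deterministicSetVisitor over `network`.
  std::for_each(items.begin(), items.end(),
      boost::apply_visitor(PrintVisitor()));
}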

Digging further, here is the DeterministicSetVisitor header:

/**
 * DeterministicSetVisitor set the deterministic parameter given the
 * deterministic value.
 */
class DeterministicSetVisitor : public boost::static_visitor<void>
{
 public:
  //! Set the deterministic parameter given the current deterministic value.
  DeterministicSetVisitor(const bool deterministic = true);

  //! Set the deterministic parameter.
  template<typename LayerType>
  void operator()(LayerType* layer) const;

  void operator()(MoreTypes layer) const;

 private:
  //! The deterministic parameter.
  const bool deterministic;

  //! Set the deterministic parameter if the module implements the
  //! Deterministic() and Model() function.
  template<typename T>
  typename std::enable_if<
      HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
      HasModelCheck<T>::value, void>::type
  LayerDeterministic(T* layer) const;

  //! Set the deterministic parameter if the module implements the
  //! Model() function.
  template<typename T>
  typename std::enable_if<
      !HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
      HasModelCheck<T>::value, void>::type
  LayerDeterministic(T* layer) const;

  //! Set the deterministic parameter if the module implements the
  //! Deterministic() function.
  template<typename T>
  typename std::enable_if<
      HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
      !HasModelCheck<T>::value, void>::type
  LayerDeterministic(T* layer) const;

  //! Do not set the deterministic parameter if the module doesn't implement the
  //! Deterministic() or Model() function.
  template<typename T>
  typename std::enable_if<
      !HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
      !HasModelCheck<T>::value, void>::type
  LayerDeterministic(T* layer) const;
};

Implementation:

//! DeterministicSetVisitor visitor class.
inline DeterministicSetVisitor::DeterministicSetVisitor(
    const bool deterministic) : deterministic(deterministic)
{
  /* Nothing to do here. */
}

template<typename LayerType>
inline void DeterministicSetVisitor::operator()(LayerType* layer) const
{
  LayerDeterministic(layer);
}

inline void DeterministicSetVisitor::operator()(MoreTypes layer) const
{
  layer.apply_visitor(*this);
}

template<typename T>
inline typename std::enable_if<
    HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
    HasModelCheck<T>::value, void>::type
DeterministicSetVisitor::LayerDeterministic(T* layer) const
{
  layer->Deterministic() = deterministic;

  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    boost::apply_visitor(DeterministicSetVisitor(deterministic),
        layer->Model()[i]);
  }
}

template<typename T>
inline typename std::enable_if<
    !HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
    HasModelCheck<T>::value, void>::type
DeterministicSetVisitor::LayerDeterministic(T* layer) const
{
  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    boost::apply_visitor(DeterministicSetVisitor(deterministic),
        layer->Model()[i]);
  }
}

template<typename T>
inline typename std::enable_if<
    HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
    !HasModelCheck<T>::value, void>::type
DeterministicSetVisitor::LayerDeterministic(T* layer) const
{
  layer->Deterministic() = deterministic;
}

template<typename T>
inline typename std::enable_if<
    !HasDeterministicCheck<T, bool&(T::*)(void)>::value &&
    !HasModelCheck<T>::value, void>::type
DeterministicSetVisitor::LayerDeterministic(T* /* input */) const
{
  /* Nothing to do here. */
}

Overall, it keeps applying DeterministicSetVisitor to every element of layer->Model(), with the exact behavior depending on the layer type (whether it implements Deterministic() and/or Model()).

ResetParameters

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType,
         CustomLayers...>::ResetParameters()
{
  ResetDeterministic();

  // Reset the network parameter with the given initialization rule.
  NetworkInitialization<InitializationRuleType,
                        CustomLayers...> networkInit(initializeRule);
  networkInit.Initialize(network, parameter);
}

The RandomInitialization rule introduced earlier comes into play here; but first, let's look at NetworkInitialization.

network_init

/**
 * This class is used to initialize the network with the given initialization
 * rule.
 */
template<typename InitializationRuleType, typename... CustomLayers>
class NetworkInitialization
{
 public:
  /**
   * Use the given initialization rule to initialize the specified network.
   *
   * @param initializeRule Rule to initialize the given network.
   */
  NetworkInitialization(
      const InitializationRuleType& initializeRule = InitializationRuleType()) :
      initializeRule(initializeRule)
  {
    // Nothing to do here.
  }

  /**
   * Initialize the specified network and store the results in the given
   * parameter.
   *
   * @param network Network that should be initialized.
   * @param parameter The network parameter.
   * @param parameterOffset Offset for network paramater, default 0.
   */
  template <typename eT>
  void Initialize(const std::vector<LayerTypes<CustomLayers...> >& network,
                  arma::Mat<eT>& parameter, size_t parameterOffset = 0)
  {
    // Determine the number of parameter/weights of the given network.
    if (parameter.is_empty())
    {
      size_t weights = 0;
      for (size_t i = 0; i < network.size(); ++i)
        weights += boost::apply_visitor(weightSizeVisitor, network[i]);
      parameter.set_size(weights, 1);
    }

    // Initialize the network layer by layer or the complete network.
    if (ann::InitTraits<InitializationRuleType>::UseLayer)
    {
      for (size_t i = 0, offset = parameterOffset; i < network.size(); ++i)
      {
        // Initialize the layer with the specified parameter/weight
        // initialization rule.
        const size_t weight = boost::apply_visitor(weightSizeVisitor,
            network[i]);
        arma::Mat<eT> tmp = arma::mat(parameter.memptr() + offset,
            weight, 1, false, false);
        initializeRule.Initialize(tmp, tmp.n_elem, 1);

        // Increase the parameter/weight offset for the next layer.
        offset += weight;
      }
    }
    else
    {
      initializeRule.Initialize(parameter, parameter.n_elem, 1);
    }

    // Note: We can't merge the for loop into the for loop above because
    // WeightSetVisitor also sets the parameter/weights of the inner modules.
    // Inner Modules are held by the parent module e.g. the concat module can
    // hold various other modules.
    for (size_t i = 0, offset = parameterOffset; i < network.size(); ++i)
    {
      offset += boost::apply_visitor(WeightSetVisitor(parameter, offset),
          network[i]);

      boost::apply_visitor(resetVisitor, network[i]);
    }
  }

 private:
  //! Instantiated InitializationRule object for initializing the network
  //! parameter.
  InitializationRuleType initializeRule;

  //! Locally-stored reset visitor.
  ResetVisitor resetVisitor;

  //! Locally-stored weight size visitor.
  WeightSizeVisitor weightSizeVisitor;
}; // class NetworkInitialization

First, weightSizeVisitor is applied to every element of network to determine the shape of parameter.

WeightSizeVisitor header:

/**
 * WeightSizeVisitor returns the number of weights of the given module.
 */
class WeightSizeVisitor : public boost::static_visitor<size_t>
{
 public:
  //! Return the number of weights.
  template<typename LayerType>
  size_t operator()(LayerType* layer) const;

  size_t operator()(MoreTypes layer) const;

 private:
  //! If the module doesn't implement the Parameters() or Model() function
  //! return 0.
  template<typename T, typename P>
  typename std::enable_if<
      !HasParametersCheck<T, P&(T::*)()>::value &&
      !HasModelCheck<T>::value, size_t>::type
  LayerSize(T* layer, P& output) const;

  //! Return the number of parameters if the module implements the Model()
  //! function.
  template<typename T, typename P>
  typename std::enable_if<
      !HasParametersCheck<T, P&(T::*)()>::value &&
      HasModelCheck<T>::value, size_t>::type
  LayerSize(T* layer, P& output) const;

  //! Return the number of parameters if the module implements the Parameters()
  //! function.
  template<typename T, typename P>
  typename std::enable_if<
      HasParametersCheck<T, P&(T::*)()>::value &&
      !HasModelCheck<T>::value, size_t>::type
  LayerSize(T* layer, P& output) const;

  //! Return the accumulated number of parameters if the module implements the
  //! Parameters() and Model() function.
  template<typename T, typename P>
  typename std::enable_if<
      HasParametersCheck<T, P&(T::*)()>::value &&
      HasModelCheck<T>::value, size_t>::type
  LayerSize(T* layer, P& output) const;
};

Implementation:

//! WeightSizeVisitor visitor class.
template<typename LayerType>
inline size_t WeightSizeVisitor::operator()(LayerType* layer) const
{
  return LayerSize(layer, layer->OutputParameter());
}

inline size_t WeightSizeVisitor::operator()(MoreTypes layer) const
{
  return layer.apply_visitor(*this);
}

template<typename T, typename P>
inline typename std::enable_if<
    !HasParametersCheck<T, P&(T::*)()>::value &&
    !HasModelCheck<T>::value, size_t>::type
WeightSizeVisitor::LayerSize(T* /* layer */, P& /* output */) const
{
  return 0;
}

template<typename T, typename P>
inline typename std::enable_if<
    !HasParametersCheck<T, P&(T::*)()>::value &&
    HasModelCheck<T>::value, size_t>::type
WeightSizeVisitor::LayerSize(T* layer, P& /* output */) const
{
  size_t weights = 0;
  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    weights += boost::apply_visitor(WeightSizeVisitor(), layer->Model()[i]);
  }

  return weights;
}

template<typename T, typename P>
inline typename std::enable_if<
    HasParametersCheck<T, P&(T::*)()>::value &&
    !HasModelCheck<T>::value, size_t>::type
WeightSizeVisitor::LayerSize(T* layer, P& /* output */) const
{
  return layer->Parameters().n_elem;
}

template<typename T, typename P>
inline typename std::enable_if<
    HasParametersCheck<T, P&(T::*)()>::value &&
    HasModelCheck<T>::value, size_t>::type
WeightSizeVisitor::LayerSize(T* layer, P& /* output */) const
{
  size_t weights = layer->Parameters().n_elem;
  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    weights += boost::apply_visitor(WeightSizeVisitor(), layer->Model()[i]);
  }

  return weights;
}

Roughly speaking, it returns the number of elements of the layer's Parameters() (if it has one), plus the result of recursively applying WeightSizeVisitor to every element of layer->Model() (if it has one).

Summing these counts over all layers gives the length of the column vector parameter.

InitTraits

/**
 * This is a template class that can provide information about various
 * initialization methods. By default, this class will provide the weakest
 * possible assumptions on the initialization method, and each initialization
 * method should override values as necessary. If a initialization method
 * doesn't need to override a value, then there's no need to write a InitTraits
 * specialization for that class.
 */
template<typename InitRuleType>
class InitTraits
{
 public:
  /**
   * This is true if the initialization method is used for a single layer.
   */
  static const bool UseLayer = true;
};

UseLayer decides whether the network is initialized layer by layer or all at once; the actual initialization work is done by the rule passed as the template parameter introduced earlier.
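
For example, a hypothetical rule that must see the whole parameter vector at once (WholeNetworkInitialization is a made-up name) would specialize the trait like this:

template<>
class InitTraits<WholeNetworkInitialization>
{
 public:
  //! Initialize the whole network at once rather than layer by layer.
  static const bool UseLayer = false;
};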

Beyond that, a resetVisitor is also applied to every layer of the network.

ResetVisitor header:

/**
 * ResetVisitor executes the Reset() function.
 */
class ResetVisitor : public boost::static_visitor<void>
{
 public:
  //! Execute the Reset() function.
  template<typename LayerType>
  void operator()(LayerType* layer) const;

  void operator()(MoreTypes layer) const;

 private:
  //! Execute the Reset() function for a module which implements the Reset()
  //! function.
  template<typename T>
  typename std::enable_if<
      HasResetCheck<T, void(T::*)()>::value &&
      !HasModelCheck<T>::value, void>::type
  ResetParameter(T* layer) const;

  //! Execute the Reset() function for a module which implements the Model()
  //! function.
  template<typename T>
  typename std::enable_if<
      !HasResetCheck<T, void(T::*)()>::value &&
      HasModelCheck<T>::value, void>::type
  ResetParameter(T* layer) const;

  //! Execute the Reset() function for a module which implements the Reset()
  //! and Model() function.
  template<typename T>
  typename std::enable_if<
      HasResetCheck<T, void(T::*)()>::value &&
      HasModelCheck<T>::value, void>::type
  ResetParameter(T* layer) const;

  //! Do not execute the Reset() function for a module which doesn't implement
  // the Reset() or Model() function.
  template<typename T>
  typename std::enable_if<
      !HasResetCheck<T, void(T::*)()>::value &&
      !HasModelCheck<T>::value, void>::type
  ResetParameter(T* layer) const;
};

Implementation:

//! ResetVisitor visitor class.
template<typename LayerType>
inline void ResetVisitor::operator()(LayerType* layer) const
{
  ResetParameter(layer);
}

inline void ResetVisitor::operator()(MoreTypes layer) const
{
  layer.apply_visitor(*this);
}

template<typename T>
inline typename std::enable_if<
    HasResetCheck<T, void(T::*)()>::value &&
    !HasModelCheck<T>::value, void>::type
ResetVisitor::ResetParameter(T* layer) const
{
  layer->Reset();
}

template<typename T>
inline typename std::enable_if<
    !HasResetCheck<T, void(T::*)()>::value &&
    HasModelCheck<T>::value, void>::type
ResetVisitor::ResetParameter(T* layer) const
{
  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    boost::apply_visitor(ResetVisitor(), layer->Model()[i]);
  }
}

template<typename T>
inline typename std::enable_if<
    HasResetCheck<T, void(T::*)()>::value &&
    HasModelCheck<T>::value, void>::type
ResetVisitor::ResetParameter(T* layer) const
{
  for (size_t i = 0; i < layer->Model().size(); ++i)
  {
    boost::apply_visitor(ResetVisitor(), layer->Model()[i]);
  }

  layer->Reset();
}

template<typename T>
inline typename std::enable_if<
    !HasResetCheck<T, void(T::*)()>::value &&
    !HasModelCheck<T>::value, void>::type
ResetVisitor::ResetParameter(T* /* layer */) const
{
  /* Nothing to do here. */
}

Much like the visitor classes we have seen before, it applies ResetVisitor to the elements of layer->Model() and calls the layer's Reset() function, depending on what the layer implements.

With that, both Reset methods are covered; next comes the Forward function:

Forward

In the common case batchSize is 1 (as in the single-parameter Evaluate above), so Forward is called on each column of predictors, i.e. on each data point.

Forward header:

  // Helper functions.
  /**
   * The Forward algorithm (part of the Forward-Backward algorithm).  Computes
   * forward probabilities for each module.
   *
   * @param input Data sequence to compute probabilities for.
   */
  template<typename InputType>
  void Forward(const InputType& input);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
template<typename InputType>
void FFN<OutputLayerType, InitializationRuleType,
         CustomLayers...>::Forward(const InputType& input)
{
  boost::apply_visitor(ForwardVisitor(input,
      boost::apply_visitor(outputParameterVisitor, network.front())),
      network.front());

  if (!reset)
  {
    if (boost::apply_visitor(outputWidthVisitor, network.front()) != 0)
    {
      width = boost::apply_visitor(outputWidthVisitor, network.front());
    }

    if (boost::apply_visitor(outputHeightVisitor, network.front()) != 0)
    {
      height = boost::apply_visitor(outputHeightVisitor, network.front());
    }
  }

  for (size_t i = 1; i < network.size(); ++i)
  {
    if (!reset)
    {
      // Set the input width.
      boost::apply_visitor(SetInputWidthVisitor(width), network[i]);

      // Set the input height.
      boost::apply_visitor(SetInputHeightVisitor(height), network[i]);
    }

    boost::apply_visitor(ForwardVisitor(boost::apply_visitor(
        outputParameterVisitor, network[i - 1]),
        boost::apply_visitor(outputParameterVisitor, network[i])), network[i]);

    if (!reset)
    {
      // Get the output width.
      if (boost::apply_visitor(outputWidthVisitor, network[i]) != 0)
      {
        width = boost::apply_visitor(outputWidthVisitor, network[i]);
      }

      // Get the output height.
      if (boost::apply_visitor(outputHeightVisitor, network[i]) != 0)
      {
        height = boost::apply_visitor(outputHeightVisitor, network[i]);
      }
    }
  }

  if (!reset)
    reset = true;
}

I won't show these visitor classes, since each one simply calls the corresponding function on a layer.

For example, the first statement calls the Forward function of the first layer of network, passing input and the first layer's outputParameter as arguments.

Next, if this is the first call (reset is false after initialization), the width and height of the first layer's output are recorded (when nonzero). For each subsequent layer, the input shape is first set to match the previous layer's output; then the visitor call invokes that layer's Forward function, with the previous layer's outputParameter and this layer's outputParameter as arguments. This process repeats through the whole network.

Finally, reset is set to true.
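
Stripped of the visitor machinery and the shape bookkeeping, the forward pass amounts to this conceptual sketch (layer[i] and outputParameter[i] are shorthand, not real members):

// Conceptual sketch only.
layer[0].Forward(input, outputParameter[0]);
for (size_t i = 1; i < network.size(); ++i)
  layer[i].Forward(outputParameter[i - 1], outputParameter[i]);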

Loss

Back in Evaluate, the Forward function of outputLayer is then called (by default, the negative log likelihood loss introduced earlier), with the last layer's outputParameter and the corresponding columns of responses as arguments.

Adding every layer's own loss value (gathered via lossVisitor) to the returned result gives the final result of Evaluate.

Gradient

Gradient header:

  /**
   * Evaluate the gradient of the feedforward network with the given parameters,
   * and with respect to only a number of points in the dataset. This is useful
   * for optimizers such as SGD, which require a separable objective function.
   *
   * @param parameters Matrix of the model parameters to be optimized.
   * @param begin Index of the starting point to use for objective function
   *        gradient evaluation.
   * @param gradient Matrix to output gradient into.
   * @param batchSize Number of points to be processed as a batch for objective
   *        function gradient evaluation.
   */
  void Gradient(const arma::mat& parameters,
                const size_t begin,
                arma::mat& gradient,
                const size_t batchSize);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Gradient(
    const arma::mat& parameters,
    const size_t begin,
    arma::mat& gradient,
    const size_t batchSize)
{
  this->EvaluateWithGradient(parameters, begin, gradient, batchSize);
}

Nothing much to say here; on to the next one.

EvaluateWithGradient

EvaluateWithGradient header:

  /**
   * Evaluate the feedforward network with the given parameters.
   * This function is usually called by the optimizer to train the model.
   * This just calls the overload of EvaluateWithGradient() with batchSize = 1.
   *
   * @param parameters Matrix model parameters.
   * @param gradient Matrix to output gradient into.
   */
  template<typename GradType>
  double EvaluateWithGradient(const arma::mat& parameters, GradType& gradient);

  /**
   * Evaluate the feedforward network with the given parameters, but using only
   * a number of data points. This is useful for optimizers such as SGD, which
   * require a separable objective function.
   *
   * @param parameters Matrix model parameters.
   * @param begin Index of the starting point to use for objective function
   *        evaluation.
   * @param gradient Matrix to output gradient into.
   * @param batchSize Number of points to be passed at a time to use for
   *        objective function evaluation.
   */
  template<typename GradType>
  double EvaluateWithGradient(const arma::mat& parameters,
                              const size_t begin,
                              GradType& gradient,
                              const size_t batchSize);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
template<typename GradType>
double FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::
EvaluateWithGradient(const arma::mat& parameters, GradType& gradient)
{
  double res = 0;
  for (size_t i = 0; i < predictors.n_cols; ++i)
    res += EvaluateWithGradient(parameters, i, gradient, 1);

  return res;
}

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
template<typename GradType>
double FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::
EvaluateWithGradient(const arma::mat& /* parameters */,
                     const size_t begin,
                     GradType& gradient,
                     const size_t batchSize)
{
  if (gradient.is_empty())
  {
    if (parameter.is_empty())
      ResetParameters();

    gradient = arma::zeros<arma::mat>(parameter.n_rows, parameter.n_cols);
  }
  else
  {
    gradient.zeros();
  }

  if (this->deterministic)
  {
    this->deterministic = false;
    ResetDeterministic();
  }

  Forward(predictors.cols(begin, begin + batchSize - 1));
  double res = outputLayer.Forward(
      boost::apply_visitor(outputParameterVisitor, network.back()),
      responses.cols(begin, begin + batchSize - 1));

  for (size_t i = 0; i < network.size(); ++i)
  {
    res += boost::apply_visitor(lossVisitor, network[i]);
  }

  outputLayer.Backward(
      boost::apply_visitor(outputParameterVisitor, network.back()),
      responses.cols(begin, begin + batchSize - 1),
      error);

  Backward();
  ResetGradients(gradient);
  Gradient(predictors.cols(begin, begin + batchSize - 1));

  return res;
}

It starts with some variable initialization, which was covered earlier.

Then comes the Forward call and the loss computation, exactly as in Evaluate.

The difference comes afterwards: the Backward function of outputLayer (by default, the negative log likelihood loss introduced earlier) is called with the last layer's outputParameter, the corresponding columns of responses, and error as arguments.

Then the parameterless Backward function is called:

Backward

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Backward()
{
  boost::apply_visitor(BackwardVisitor(boost::apply_visitor(
      outputParameterVisitor, network.back()), error,
      boost::apply_visitor(deltaVisitor, network.back())), network.back());

  for (size_t i = 2; i < network.size(); ++i)
  {
    boost::apply_visitor(BackwardVisitor(boost::apply_visitor(
        outputParameterVisitor, network[network.size() - i]),
        boost::apply_visitor(deltaVisitor, network[network.size() - i + 1]),
        boost::apply_visitor(deltaVisitor, network[network.size() - i])),
        network[network.size() - i]);
  }
}

Backward is the mirror image of Forward: it calls the i-th layer's Backward function with the i-th layer's outputParameter, the (i+1)-th layer's delta, and the i-th layer's delta as arguments, walking from the back of the network to the front, with the last layer handled separately, as the sketch below shows.
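
Substituting k = network.size() - i makes the indexing plain (a conceptual sketch with the same shorthand as before):

const size_t last = network.size() - 1;
// Last layer: backpropagate the network-level error.
layer[last].Backward(outputParameter[last], error, delta[last]);
// Hidden layers, from second-to-last down to layer 1; layer 0's delta is
// never needed (Gradient uses the raw input there), so the loop skips it.
for (size_t k = last - 1; k >= 1; --k)
  layer[k].Backward(outputParameter[k], delta[k + 1], delta[k]);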

The next step is the ResetGradients function:

ResetGradients

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType,
         CustomLayers...>::ResetGradients(arma::mat& gradient)
{
  size_t offset = 0;
  for (size_t i = 0; i < network.size(); ++i)
  {
    offset += boost::apply_visitor(GradientSetVisitor(gradient, offset),
        network[i]);
  }
}

This simply applies GradientSetVisitor to every layer in the network, wiring each layer's gradient into the shared gradient matrix at the appropriate offset.

The last step of Gradient is a call to the Gradient overload that takes a single matrix:

Gradient header:

  /**
   * Iterate through all layer modules and update the the gradient using the
   * layer defined optimizer.
   */
  template<typename InputType>
  void Gradient(const InputType& input);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
template<typename InputType>
void FFN<OutputLayerType, InitializationRuleType,
         CustomLayers...>::Gradient(const InputType& input)
{
  boost::apply_visitor(GradientVisitor(input,
      boost::apply_visitor(deltaVisitor, network[1])), network.front());

  for (size_t i = 1; i < network.size() - 1; ++i)
  {
    boost::apply_visitor(GradientVisitor(boost::apply_visitor(
        outputParameterVisitor, network[i - 1]),
        boost::apply_visitor(deltaVisitor, network[i + 1])), network[i]);
  }

  boost::apply_visitor(GradientVisitor(boost::apply_visitor(
      outputParameterVisitor, network[network.size() - 2]), error),
      network[network.size() - 1]);
}

With the earlier groundwork, this code is easy to follow:

First, the first layer's Gradient function is called, with input and the second layer's delta as arguments.
Then the loop calls the i-th layer's Gradient function, with the (i-1)-th layer's outputParameter and the (i+1)-th layer's delta as arguments.
Finally, the last layer's Gradient function is called, with the second-to-last layer's outputParameter and error as arguments.

Predict

Predict header:

  /**
   * Predict the responses to a given set of predictors. The responses will
   * reflect the output of the given output layer as returned by the
   * output layer function.
   *
   * If you want to pass in a parameter and discard the original parameter
   * object, be sure to use std::move to avoid unnecessary copy.
   *
   * @param predictors Input predictors.
   * @param results Matrix to put output predictions of responses into.
   */
  void Predict(arma::mat predictors, arma::mat& results);

Implementation:

template<typename OutputLayerType, typename InitializationRuleType,
         typename... CustomLayers>
void FFN<OutputLayerType, InitializationRuleType, CustomLayers...>::Predict(
    arma::mat predictors, arma::mat& results)
{
  if (parameter.is_empty())
    ResetParameters();

  if (!deterministic)
  {
    deterministic = true;
    ResetDeterministic();
  }

  arma::mat resultsTemp;
  Forward(arma::mat(predictors.colptr(0), predictors.n_rows, 1, false, true));
  resultsTemp = boost::apply_visitor(outputParameterVisitor,
      network.back()).col(0);

  results = arma::mat(resultsTemp.n_elem, predictors.n_cols);
  results.col(0) = resultsTemp.col(0);

  for (size_t i = 1; i < predictors.n_cols; ++i)
  {
    Forward(arma::mat(predictors.colptr(i), predictors.n_rows, 1, false, true));

    resultsTemp = boost::apply_visitor(outputParameterVisitor,
        network.back());
    results.col(i) = resultsTemp.col(0);
  }
}

The whole process simply runs Forward on each column of predictors in turn and collects the network's final output into the results matrix; the first column is handled separately so that the shape of the results matrix can be determined.
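
A hedged usage sketch, continuing the classifier example above: with LogSoftMax as the last layer, the predicted class of each test point is the row with the largest output (testX is assumed given; the +1 matches NegativeLogLikelihood's 1-based class indices):

arma::mat predictions;
model.Predict(testX, predictions);  // one column of outputs per test point

arma::Row<size_t> labels(predictions.n_cols);
for (size_t i = 0; i < predictions.n_cols; ++i)
  labels(i) = arma::index_max(predictions.col(i)) + 1;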

Layer

Linear

Constructor

Header:

/**
 * Implementation of the Linear layer class. The Linear class represents a
 * single layer of a neural network.
 *
 * @tparam InputDataType Type of the input data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 * @tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 */
template <
    typename InputDataType = arma::mat,
    typename OutputDataType = arma::mat,
    typename RegularizerType = NoRegularizer
>
class Linear
{
 public:
  //! Create the Linear object.
  Linear();

  /**
   * Create the Linear layer object using the specified number of units.
   *
   * @param inSize The number of input units.
   * @param outSize The number of output units.
   * @param regularizer The regularizer to use, optional.
   */
  Linear(const size_t inSize,
         const size_t outSize,
         RegularizerType regularizer = RegularizerType());

Implementation:

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
Linear<InputDataType, OutputDataType, RegularizerType>::Linear() :
    inSize(0),
    outSize(0)
{
  // Nothing to do here.
}

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
Linear<InputDataType, OutputDataType, RegularizerType>::Linear(
    const size_t inSize,
    const size_t outSize,
    RegularizerType regularizer) :
    inSize(inSize),
    outSize(outSize),
    regularizer(regularizer)
{
  weights.set_size(outSize * inSize + outSize, 1);
}

NoRegularizer, as the name suggests, is a regularizer that does nothing.

The constructor here mainly sets the shape of the weights vector:

$$
outSize \times inSize = weight.Size \\[5pt]
outSize \times 1 = bias.Size
$$

This can also be seen from the Reset function:

Reset

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
void Linear<InputDataType, OutputDataType, RegularizerType>::Reset()
{
  weight = arma::mat(weights.memptr(), outSize, inSize, false, false);
  bias = arma::mat(weights.memptr() + weight.n_elem,
      outSize, 1, false, false);
}

Forward

Header:

  /**
   * Ordinary feed forward pass of a neural network, evaluating the function
   * f(x) by propagating the activity forward through f.
   *
   * @param input Input data used for evaluating the specified function.
   * @param output Resulting output activation.
   */
  template<typename eT>
  void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);

Implementation:

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
template<typename eT>
void Linear<InputDataType, OutputDataType, RegularizerType>::Forward(
    const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
  output = weight * input;
  output.each_col() += bias;
}

As the comments in the linear header say, this layer serves as a fully-connected layer, i.e. an affine transformation:

$$
output = weight \cdot input + bias
$$

Recalling the network-level Forward function: input is either an input data point (for the first layer) or the previous layer's output (for every other layer), while output is always this layer's outputParameter.

Also, all arguments are passed by reference.
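
A minimal standalone sketch of this affine map (shapes are illustrative):

arma::mat weight(4, 3, arma::fill::randu);  // outSize x inSize
arma::mat bias(4, 1, arma::fill::randu);    // outSize x 1
arma::mat input(3, 2, arma::fill::randu);   // inSize x batchSize

arma::mat output = weight * input;  // 4 x 2
output.each_col() += bias;          // add the bias to every data point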

Backward

Header:

  /**
   * Ordinary feed backward pass of a neural network, calculating the function
   * f(x) by propagating x backwards trough f. Using the results from the feed
   * forward pass.
   *
   * @param * (input) The propagated input activation.
   * @param gy The backpropagated error.
   * @param g The calculated gradient.
   */
  template<typename eT>
  void Backward(const arma::Mat<eT>& /* input */,
                const arma::Mat<eT>& gy,
                arma::Mat<eT>& g);

Implementation:

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
template<typename eT>
void Linear<InputDataType, OutputDataType, RegularizerType>::Backward(
    const arma::Mat<eT>& /* input */, const arma::Mat<eT>& gy, arma::Mat<eT>& g)
{
  g = weight.t() * gy;
}

Recalling the network-level Backward function: gy (the backpropagated error) is either error (for the last layer) or the next layer's delta (for every other layer), while g (the calculated gradient) is always this layer's delta.

Again, all arguments are passed by reference.
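
This is just the chain rule applied to the affine map from Forward:

$$
output = weight \cdot input + bias
\;\Rightarrow\;
\frac{\partial output_j}{\partial input_k} = weight_{jk}
\;\Rightarrow\;
g = \frac{\partial L}{\partial input} = weight^{\mathsf{T}} \cdot gy
$$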

Gradient

Header:

  /*
   * Calculate the gradient using the output delta and the input activation.
   *
   * @param input The input parameter used for calculating the gradient.
   * @param error The calculated error.
   * @param gradient The calculated gradient.
   */
  template<typename eT>
  void Gradient(const arma::Mat<eT>& input,
                const arma::Mat<eT>& error,
                arma::Mat<eT>& gradient);

Implementation:

template<typename InputDataType, typename OutputDataType,
    typename RegularizerType>
template<typename eT>
void Linear<InputDataType, OutputDataType, RegularizerType>::Gradient(
    const arma::Mat<eT>& input,
    const arma::Mat<eT>& error,
    arma::Mat<eT>& gradient)
{
  gradient.submat(0, 0, weight.n_elem - 1, 0) = arma::vectorise(
      error * input.t());
  gradient.submat(weight.n_elem, 0, gradient.n_elem - 1, 0) =
      arma::sum(error, 1);
  regularizer.Evaluate(weights, gradient);
}

By the same reasoning, input here is either the network input (for the first layer) or the previous layer's outputParameter; error is either the next layer's delta or error (for the last layer); and gradient is this layer's gradient matrix.

Again, all arguments are passed by reference.

The .submat prototype:

X.submat ( first_row, first_col, last_row, last_col )

The official description of vectorise(X, dim):

Generate a flattened version of matrix X or cube Q
The argument dim is optional; by default dim=0 is used

So the gradient matrix is updated in two parts, a weight part and a bias part: the weight part becomes $error \cdot input^{\mathsf{T}}$ (vectorised into a column), and the bias part becomes the row-wise sum of error.

Finally, the regularizer's Evaluate function is called; with NoRegularizer this does nothing, so we won't discuss it.
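
For reference, the chain rule summed over the batch yields exactly these two pieces:

$$
\frac{\partial L}{\partial weight} = error \cdot input^{\mathsf{T}}, \qquad
\frac{\partial L}{\partial bias} = \sum_{i=1}^{N} error_{(:, i)}
$$

which is what vectorise(error * input.t()) and sum(error, 1) compute for the weight and bias segments of the gradient column.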

Convolution

Constructor

Header:

/**
 * Implementation of the Convolution class. The Convolution class represents a
 * single layer of a neural network.
 *
 * @tparam ForwardConvolutionRule Convolution to perform forward process.
 * @tparam BackwardConvolutionRule Convolution to perform backward process.
 * @tparam GradientConvolutionRule Convolution to calculate gradient.
 * @tparam InputDataType Type of the input data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 * @tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 */
template <
    typename ForwardConvolutionRule = NaiveConvolution<ValidConvolution>,
    typename BackwardConvolutionRule = NaiveConvolution<FullConvolution>,
    typename GradientConvolutionRule = NaiveConvolution<ValidConvolution>,
    typename InputDataType = arma::mat,
    typename OutputDataType = arma::mat
>
class Convolution
{
 public:
  //! Create the Convolution object.
  Convolution();

  /**
   * Create the Convolution object using the specified number of input maps,
   * output maps, filter size, stride and padding parameter.
   *
   * @param inSize The number of input maps.
   * @param outSize The number of output maps.
   * @param kernelWidth Width of the filter/kernel.
   * @param kernelHeight Height of the filter/kernel.
   * @param strideWidth Stride of filter application in the x direction.
   * @param strideHeight Stride of filter application in the y direction.
   * @param padW Padding width of the input.
   * @param padH Padding height of the input.
   * @param inputWidth The width of the input data.
   * @param inputHeight The height of the input data.
   * @param paddingType The type of padding (Valid or Same). Defaults to None.
   */
  Convolution(const size_t inSize,
              const size_t outSize,
              const size_t kernelWidth,
              const size_t kernelHeight,
              const size_t strideWidth = 1,
              const size_t strideHeight = 1,
              const size_t padW = 0,
              const size_t padH = 0,
              const size_t inputWidth = 0,
              const size_t inputHeight = 0,
              const std::string& paddingType = "None");

  /**
   * Create the Convolution object using the specified number of input maps,
   * output maps, filter size, stride and padding parameter.
   *
   * @param inSize The number of input maps.
   * @param outSize The number of output maps.
   * @param kernelWidth Width of the filter/kernel.
   * @param kernelHeight Height of the filter/kernel.
   * @param strideWidth Stride of filter application in the x direction.
   * @param strideHeight Stride of filter application in the y direction.
   * @param padW A two-value tuple indicating padding widths of the input.
   *             First value is padding at left side. Second value is padding on
   *             right side.
   * @param padH A two-value tuple indicating padding heights of the input.
   *             First value is padding at top. Second value is padding on
   *             bottom.
   * @param inputWidth The width of the input data.
   * @param inputHeight The height of the input data.
   * @param paddingType The type of padding (Valid or Same). Defaults to None.
   */
  Convolution(const size_t inSize,
              const size_t outSize,
              const size_t kernelWidth,
              const size_t kernelHeight,
              const size_t strideWidth,
              const size_t strideHeight,
              const std::tuple<size_t, size_t>& padW,
              const std::tuple<size_t, size_t>& padH,
              const size_t inputWidth = 0,
              const size_t inputHeight = 0,
              const std::string& paddingType = "None");

Implementation:

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Convolution()
{
  // Nothing to do here.
}

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Convolution(
    const size_t inSize,
    const size_t outSize,
    const size_t kernelWidth,
    const size_t kernelHeight,
    const size_t strideWidth,
    const size_t strideHeight,
    const size_t padW,
    const size_t padH,
    const size_t inputWidth,
    const size_t inputHeight,
    const std::string& paddingType) :
    Convolution(
      inSize,
      outSize,
      kernelWidth,
      kernelHeight,
      strideWidth,
      strideHeight,
      std::tuple<size_t, size_t>(padW, padW),
      std::tuple<size_t, size_t>(padH, padH),
      inputWidth,
      inputHeight,
      paddingType)
{
  // Nothing to do here.
}

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Convolution(
    const size_t inSize,
    const size_t outSize,
    const size_t kernelWidth,
    const size_t kernelHeight,
    const size_t strideWidth,
    const size_t strideHeight,
    const std::tuple<size_t, size_t>& padW,
    const std::tuple<size_t, size_t>& padH,
    const size_t inputWidth,
    const size_t inputHeight,
    const std::string& paddingType) :
    inSize(inSize),
    outSize(outSize),
    kernelWidth(kernelWidth),
    kernelHeight(kernelHeight),
    strideWidth(strideWidth),
    strideHeight(strideHeight),
    padWLeft(std::get<0>(padW)),
    padWRight(std::get<1>(padW)),
    padHBottom(std::get<1>(padH)),
    padHTop(std::get<0>(padH)),
    inputWidth(inputWidth),
    inputHeight(inputHeight),
    outputWidth(0),
    outputHeight(0)
{
  weights.set_size(WeightSize(), 1);

  // Transform paddingType to lowercase.
  std::string paddingTypeLow = paddingType;
  util::ToLower(paddingType, paddingTypeLow);

  if (paddingTypeLow == "valid")
  {
    padWLeft = 0;
    padWRight = 0;
    padHTop = 0;
    padHBottom = 0;
  }
  else if (paddingTypeLow == "same")
  {
    InitializeSamePadding();
  }

  padding = ann::Padding<>(padWLeft, padWRight, padHTop, padHBottom);
}

Let's focus on the third constructor (the one taking tuples).

The number of rows of the weights matrix is given by:

  //! Get size of weights for the layer.
  size_t WeightSize() const
  {
    return (outSize * inSize * kernelWidth * kernelHeight) + outSize;
  }

Take a look at the Reset function:

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
void Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Reset()
{
    weight = arma::cube(weights.memptr(), kernelWidth, kernelHeight,
        outSize * inSize, false, false);
    bias = arma::mat(weights.memptr() + weight.n_elem,
        outSize, 1, false, false);
}

weight is a Cube with $outSize \times inSize$ slices, each slice holding a $kernelWidth \times kernelHeight$ matrix,
while bias is a column matrix with outSize rows.

This explains the number of rows of weights:

$$
outSize \times inSize \times kernelWidth \times kernelHeight + outSize
$$

Next comes padding; let's follow the default, "None".

Padding header:

/**
 * Implementation of the Padding module class. The Padding module applies a bias term
 * to the incoming data.
 *
 * @tparam InputDataType Type of the input data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 * @tparam OutputDataType Type of the output data (arma::colvec, arma::mat,
 *         arma::sp_mat or arma::cube).
 */
template <
    typename InputDataType = arma::mat,
    typename OutputDataType = arma::mat
>
class Padding
{
 public:
  /**
   * Create the Padding object using the specified number of output units.
   *
   * @param padWLeft Left padding width of the input.
   * @param padWRight Right padding width of the input.
   * @param padHTop Top padding height of the input.
   * @param padHBottom Bottom padding height of the input.
   */
  Padding(const size_t padWLeft = 0,
          const size_t padWRight = 0,
          const size_t padHTop = 0,
          const size_t padHBottom = 0);

  /**
   * Ordinary feed forward pass of a neural network, evaluating the function
   * f(x) by propagating the activity forward through f.
   *
   * @param input Input data used for evaluating the specified function.
   * @param output Resulting output activation.
   */
  template<typename eT>
  void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);

  /**
   * Ordinary feed backward pass of a neural network, calculating the function
   * f(x) by propagating x backwards trough f. Using the results from the feed
   * forward pass.
   *
   * @param * (input) The propagated input activation.
   * @param gy The backpropagated error.
   * @param g The calculated gradient.
   */
  template<typename eT>
  void Backward(const arma::Mat<eT>& /* input */,
                const arma::Mat<eT>& gy,
                arma::Mat<eT>& g);

  //! Get the output parameter.
  OutputDataType const& OutputParameter() const { return outputParameter; }
  //! Modify the output parameter.
  OutputDataType& OutputParameter() { return outputParameter; }

  //! Get the delta.
  OutputDataType const& Delta() const { return delta; }
  //! Modify the delta.
  OutputDataType& Delta() { return delta; }

  //! Get the left padding width.
  size_t PadWLeft() const { return padWLeft; }
  //! Modify the left padding width.
  size_t& PadWLeft() { return padWLeft; }

  //! Get the right padding width.
  size_t PadWRight() const { return padWRight; }
  //! Modify the right padding width.
  size_t& PadWRight() { return padWRight; }

  //! Get the top padding width.
  size_t PadHTop() const { return padHTop; }
  //! Modify the top padding width.
  size_t& PadHTop() { return padHTop; }

  //! Get the bottom padding width.
  size_t PadHBottom() const { return padHBottom; }
  //! Modify the bottom padding width.
  size_t& PadHBottom() { return padHBottom; }

  /**
   * Serialize the layer.
   */
  template<typename Archive>
  void serialize(Archive& ar, const unsigned int /* version */);

 private:
  //! Locally-stored left padding width.
  size_t padWLeft;

  //! Locally-stored right padding width.
  size_t padWRight;

  //! Locally-stored top padding height.
  size_t padHTop;

  //! Locally-stored bottom padding height.
  size_t padHBottom;

  //! Locally-stored number of rows and columns of input.
  size_t nRows, nCols;

  //! Locally-stored delta object.
  OutputDataType delta;

  //! Locally-stored output parameter object.
  OutputDataType outputParameter;
}; // class Padding

Implementation:

template<typename InputDataType, typename OutputDataType>
Padding<InputDataType, OutputDataType>::Padding(
    const size_t padWLeft,
    const size_t padWRight,
    const size_t padHTop,
    const size_t padHBottom) :
    padWLeft(padWLeft),
    padWRight(padWRight),
    padHTop(padHTop),
    padHBottom(padHBottom),
    nRows(0),
    nCols(0)
{
  // Nothing to do here.
}

template<typename InputDataType, typename OutputDataType>
template<typename eT>
void Padding<InputDataType, OutputDataType>::Forward(
    const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
  nRows = input.n_rows;
  nCols = input.n_cols;
  output = arma::zeros(nRows + padWLeft + padWRight,
      nCols + padHTop + padHBottom);
  output.submat(padWLeft, padHTop, padWLeft + nRows - 1,
      padHTop + nCols - 1) = input;
}

template<typename InputDataType, typename OutputDataType>
template<typename eT>
void Padding<InputDataType, OutputDataType>::Backward(
    const arma::Mat<eT>& /* input */,
    const arma::Mat<eT>& gy,
    arma::Mat<eT>& g)
{
  g = gy.submat(padWLeft, padHTop, padWLeft + nRows - 1,
      padHTop + nCols - 1);
}

template<typename InputDataType, typename OutputDataType>
template<typename Archive>
void Padding<InputDataType, OutputDataType>::serialize(
    Archive& ar, const unsigned int /* version */)
{
  ar & BOOST_SERIALIZATION_NVP(padWLeft);
  ar & BOOST_SERIALIZATION_NVP(padWRight);
  ar & BOOST_SERIALIZATION_NVP(padHTop);
  ar & BOOST_SERIALIZATION_NVP(padHBottom);
}

As you can see, the padding layer's Forward simply embeds the input into a larger zero-filled matrix, and Backward extracts the matching submatrix of the backpropagated error; the fill value is always zero.
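
A minimal sketch of the layer used standalone (assuming mlpack 3.x's layer API as quoted above):

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/padding.hpp>

int main()
{
  arma::mat input(3, 3, arma::fill::ones);
  arma::mat padded, g;

  // One zero on every side: rows grow by padWLeft + padWRight, columns by
  // padHTop + padHBottom (note the row/column convention of Forward above).
  mlpack::ann::Padding<> pad(1, 1, 1, 1);
  pad.Forward(input, padded);      // padded is 5x5, ones surrounded by zeros
  pad.Backward(input, padded, g);  // g recovers the original 3x3 block
  g.print("g:");
}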

Forward

Header:

  /**
   * Ordinary feed forward pass of a neural network, evaluating the function
   * f(x) by propagating the activity forward through f.
   *
   * @param input Input data used for evaluating the specified function.
   * @param output Resulting output activation.
   */
  template<typename eT>
  void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);

Implementation:

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
template<typename eT>
void Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
  batchSize = input.n_cols;
  arma::cube inputTemp(const_cast<arma::Mat<eT>&>(input).memptr(),
      inputWidth, inputHeight, inSize * batchSize, false, false);

  if (padWLeft != 0 || padWRight != 0 || padHTop != 0 || padHBottom != 0)
  {
    inputPaddedTemp.set_size(inputTemp.n_rows + padWLeft + padWRight,
        inputTemp.n_cols + padHTop + padHBottom, inputTemp.n_slices);

    for (size_t i = 0; i < inputTemp.n_slices; ++i)
    {
      padding.Forward(inputTemp.slice(i), inputPaddedTemp.slice(i));
    }
  }

  size_t wConv = ConvOutSize(inputWidth, kernelWidth, strideWidth, padWLeft,
      padWRight);
  size_t hConv = ConvOutSize(inputHeight, kernelHeight, strideHeight, padHTop,
      padHBottom);

  output.set_size(wConv * hConv * outSize, batchSize);
  outputTemp = arma::Cube<eT>(output.memptr(), wConv, hConv,
      outSize * batchSize, false, false);
  outputTemp.zeros();

  for (size_t outMap = 0, outMapIdx = 0, batchCount = 0; outMap <
      outSize * batchSize; outMap++)
  {
    if (outMap != 0 && outMap % outSize == 0)
    {
      batchCount++;
      outMapIdx = 0;
    }

    for (size_t inMap = 0; inMap < inSize; inMap++, outMapIdx++)
    {
      arma::Mat<eT> convOutput;

      if (padWLeft != 0 || padWRight != 0 || padHTop != 0 || padHBottom != 0)
      {
        ForwardConvolutionRule::Convolution(inputPaddedTemp.slice(inMap +
            batchCount * inSize), weight.slice(outMapIdx), convOutput,
            strideWidth, strideHeight);
      }
      else
      {
        ForwardConvolutionRule::Convolution(inputTemp.slice(inMap +
            batchCount * inSize), weight.slice(outMapIdx), convOutput,
            strideWidth, strideHeight);
      }

      outputTemp.slice(outMap) += convOutput;
    }

    outputTemp.slice(outMap) += bias(outMap % outSize);
  }

  outputWidth = outputTemp.n_rows;
  outputHeight = outputTemp.n_cols;
}

First a cube is constructed over the input, with inSize * input.n_cols slices, each slice an inputWidth × inputHeight matrix.

Then, if any padding is requested, each slice of inputTemp is expanded to the padded size and written into the corresponding slice of inputPaddedTemp.

Then the output dimensions are computed:

$$wConv = \left\lfloor \frac{inputWidth + padWLeft + padWRight - kernelWidth}{strideWidth} \right\rfloor + 1$$

$$hConv = \left\lfloor \frac{inputHeight + padHTop + padHBottom - kernelHeight}{strideHeight} \right\rfloor + 1$$
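
A sketch of the helper implementing this formula (integer division supplies the floor; parameter names follow the source, but treat the exact signature as an assumption):

#include <cstddef>

// Output size along one dimension of the convolution.
std::size_t ConvOutSize(const std::size_t size,      // input width or height
                        const std::size_t k,         // kernel size
                        const std::size_t s,         // stride
                        const std::size_t pSideOne,  // padding before
                        const std::size_t pSideTwo)  // padding after
{
  return (size + pSideOne + pSideTwo - k) / s + 1;
}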

output and outputTemp are then sized to these dimensions.

The double loop that follows uses ForwardConvolutionRule, which defaults to NaiveConvolution.

Its implementation:

/**
 * Computes the two-dimensional convolution. This class allows specification of
 * the type of the border type. The convolution can be compute with the valid
 * border type of the full border type (default).
 *
 * FullConvolution: returns the full two-dimensional convolution.
 * ValidConvolution: returns only those parts of the convolution that are
 * computed without the zero-padded edges.
 *
 * @tparam BorderMode Type of the border mode (FullConvolution or
 * ValidConvolution).
 */
template<typename BorderMode = FullConvolution>
class NaiveConvolution
{
 public:
  /*
   * Perform a convolution (valid mode).
   *
   * @param input Input used to perform the convolution.
   * @param filter Filter used to perform the convolution.
   * @param output Output data that contains the results of the convolution.
   * @param dW Stride of filter application in the x direction.
   * @param dH Stride of filter application in the y direction.
   * @param dilationW The dilation factor in x direction.
   * @param dilationH The dilation factor in y direction.
   */
  template<typename eT, typename Border = BorderMode>
  static typename std::enable_if<
      std::is_same<Border, ValidConvolution>::value, void>::type
  Convolution(const arma::Mat<eT>& input,
              const arma::Mat<eT>& filter,
              arma::Mat<eT>& output,
              const size_t dW = 1,
              const size_t dH = 1,
              const size_t dilationW = 1,
              const size_t dilationH = 1)
  {
    output = arma::zeros<arma::Mat<eT> >(
        (input.n_rows - (filter.n_rows - 1) * dilationW - 1) / dW + 1,
        (input.n_cols - (filter.n_cols - 1) * dilationH -  1) / dH + 1);

    // It seems to be about 3.5 times faster to use pointers instead of
    // filter(ki, kj) * input(leftInput + ki, topInput + kj) and output(i, j).
    eT* outputPtr = output.memptr();

    for (size_t j = 0; j < output.n_cols; ++j)
    {
      for (size_t i = 0; i < output.n_rows; ++i, outputPtr++)
      {
        const eT* kernelPtr = filter.memptr();
        for (size_t kj = 0; kj < filter.n_cols; ++kj)
        {
          const eT* inputPtr = input.colptr(kj * dilationW + j * dW) + i * dH;
          for (size_t ki = 0; ki < filter.n_rows; ++ki, ++kernelPtr,
              inputPtr += dilationH)
            *outputPtr += *kernelPtr * (*inputPtr);
        }
      }
    }
  }

  /*
   * Perform a convolution (full mode).
   *
   * @param input Input used to perform the convolution.
   * @param filter Filter used to perform the convolution.
   * @param output Output data that contains the results of the convolution.
   * @param dW Stride of filter application in the x direction.
   * @param dH Stride of filter application in the y direction.
   * @param dilationW The dilation factor in x direction.
   * @param dilationH The dilation factor in y direction.
   */
  template<typename eT, typename Border = BorderMode>
  static typename std::enable_if<
      std::is_same<Border, FullConvolution>::value, void>::type
  Convolution(const arma::Mat<eT>& input,
              const arma::Mat<eT>& filter,
              arma::Mat<eT>& output,
              const size_t dW = 1,
              const size_t dH = 1,
              const size_t dilationW = 1,
              const size_t dilationH = 1)
  {
    size_t outputRows = (input.n_rows - 1) * dW + 2 * (filter.n_rows - 1)
        * dilationW + 1;
    size_t outputCols = (input.n_cols - 1) * dH + 2 * (filter.n_cols - 1)
        * dilationH + 1;

    for (size_t i = 0; i < dW; ++i)
    {
      if (((((i + outputRows - 2 * (filter.n_rows - 1) * dilationW - 1) % dW)
          + dW) % dW) == i){
        outputRows += i;
        break;
      }
    }
    for (size_t i = 0; i < dH; ++i)
    {
      if (((((i + outputCols - 2 * (filter.n_cols - 1) * dilationH - 1) % dH)
          + dH) % dH) == i){
        outputCols += i;
        break;
      }
    }

    // Pad filter and input to the working output shape.
    arma::Mat<eT> inputPadded = arma::zeros<arma::Mat<eT> >(outputRows,
        outputCols);
    inputPadded.submat((filter.n_rows - 1) * dilationW, (filter.n_cols - 1)
        * dilationH, (filter.n_rows - 1) * dilationW + input.n_rows - 1,
        (filter.n_cols - 1) * dilationH + input.n_cols - 1) = input;

    NaiveConvolution<ValidConvolution>::Convolution(inputPadded, filter,
        output, 1, 1, dilationW, dilationH);
  }

  /*
   * Perform a convolution using 3rd order tensors.
   *
   * @param input Input used to perform the convolution.
   * @param filter Filter used to perform the convolution.
   * @param output Output data that contains the results of the convolution.
   * @param dW Stride of filter application in the x direction.
   * @param dH Stride of filter application in the y direction.
   * @param dilationW The dilation factor in x direction.
   * @param dilationH The dilation factor in y direction.
   */
  template<typename eT>
  static void Convolution(const arma::Cube<eT>& input,
                          const arma::Cube<eT>& filter,
                          arma::Cube<eT>& output,
                          const size_t dW = 1,
                          const size_t dH = 1,
                          const size_t dilationW = 1,
                          const size_t dilationH = 1)
  {
    arma::Mat<eT> convOutput;
    NaiveConvolution<BorderMode>::Convolution(input.slice(0), filter.slice(0),
        convOutput, dW, dH, dilationW, dilationH);

    output = arma::Cube<eT>(convOutput.n_rows, convOutput.n_cols,
        input.n_slices);
    output.slice(0) = convOutput;

    for (size_t i = 1; i < input.n_slices; ++i)
    {
      NaiveConvolution<BorderMode>::Convolution(input.slice(i), filter.slice(i),
          output.slice(i), dW, dH, dilationW, dilationH);
    }
  }

  /*
   * Perform a convolution using dense matrix as input and a 3rd order tensors
   * as filter and output.
   *
   * @param input Input used to perform the convolution.
   * @param filter Filter used to perform the convolution.
   * @param output Output data that contains the results of the convolution.
   * @param dW Stride of filter application in the x direction.
   * @param dH Stride of filter application in the y direction.
   * @param dilationW The dilation factor in x direction.
   * @param dilationH The dilation factor in y direction.
   */
  template<typename eT>
  static void Convolution(const arma::Mat<eT>& input,
                          const arma::Cube<eT>& filter,
                          arma::Cube<eT>& output,
                          const size_t dW = 1,
                          const size_t dH = 1,
                          const size_t dilationW = 1,
                          const size_t dilationH = 1)
  {
    arma::Mat<eT> convOutput;
    NaiveConvolution<BorderMode>::Convolution(input, filter.slice(0),
        convOutput, dW, dH, dilationW, dilationH);

    output = arma::Cube<eT>(convOutput.n_rows, convOutput.n_cols,
        filter.n_slices);
    output.slice(0) = convOutput;

    for (size_t i = 1; i < filter.n_slices; ++i)
    {
      NaiveConvolution<BorderMode>::Convolution(input, filter.slice(i),
          output.slice(i), dW, dH, dilationW, dilationH);
    }
  }

  /*
   * Perform a convolution using a 3rd order tensors as input and output and a
   * dense matrix as filter.
   *
   * @param input Input used to perform the convolution.
   * @param filter Filter used to perform the convolution.
   * @param output Output data that contains the results of the convolution.
   * @param dW Stride of filter application in the x direction.
   * @param dH Stride of filter application in the y direction.
   * @param dilationW The dilation factor in x direction.
   * @param dilationH The dilation factor in y direction.
   */
  template<typename eT>
  static void Convolution(const arma::Cube<eT>& input,
                          const arma::Mat<eT>& filter,
                          arma::Cube<eT>& output,
                          const size_t dW = 1,
                          const size_t dH = 1,
                          const size_t dilationW = 1,
                          const size_t dilationH = 1)
  {
    arma::Mat<eT> convOutput;
    NaiveConvolution<BorderMode>::Convolution(input.slice(0), filter,
        convOutput, dW, dH, dilationW, dilationH);

    output = arma::Cube<eT>(convOutput.n_rows, convOutput.n_cols,
        input.n_slices);
    output.slice(0) = convOutput;

    for (size_t i = 1; i < input.n_slices; ++i)
    {
      NaiveConvolution<BorderMode>::Convolution(input.slice(i), filter,
          output.slice(i), dW, dH, dilationW, dilationH);
    }
  }
};  // class NaiveConvolution

Forward uses valid mode by default; with dilationW = dilationH = 1 it works as follows:

$$input: (m \times n), \quad filter: (p \times q)$$

$$output: (a \times b) = \left( \frac{m - p}{dW} + 1,\ \frac{n - q}{dH} + 1 \right)$$

$$\Rightarrow output_{(i, j)} = \sum_{k_j = 0}^{q - 1} \sum_{k_i = 0}^{p - 1} input_{(k_i + i \cdot dH,\ k_j + j \cdot dW)} \times kernel_{(k_i,\ k_j)}$$

(Note that matrices in Armadillo are stored in column-major order.)
From the official documentation:

.memptr()
Data for matrices is stored in a column-by-column order
Data for cubes is stored in a slice-by-slice (matrix-by-matrix) order
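
A minimal sketch exercising the valid-mode overload directly:

#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/convolution_rules/naive_convolution.hpp>

using namespace mlpack::ann;

int main()
{
  arma::mat input(4, 4, arma::fill::randu);   // m = n = 4
  arma::mat filter(3, 3, arma::fill::randu);  // p = q = 3
  arma::mat output;

  // (4 - 3)/1 + 1 = 2 along each dimension, so the result is 2x2.
  NaiveConvolution<ValidConvolution>::Convolution(input, filter, output);
  std::cout << output.n_rows << " x " << output.n_cols << std::endl;  // 2 x 2
}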

The outer loop runs outMap over every slice of outputTemp; each time outMap reaches a nonzero multiple of outSize, batchCount is incremented and outMapIdx is reset to zero.

outMapIdx walks the slices of weight (of which there are $outSize \times inSize$).

The inner loop runs inMap over inSize, incrementing outMapIdx on every iteration, so across one batch item outMapIdx sweeps all slices of weight. Inside the loop, the convolution described above is applied between the appropriate slice of inputTemp (or of its padded counterpart) and the weight slice, and the result is accumulated into outputTemp.

batchCount keeps the batch items aligned: inputTemp has $inSize \times batchSize$ slices, so $batchCount \times inSize$ serves as the base offset and inMap as the per-map offset.

Once all input maps have been accumulated into an output map, the corresponding bias is added.

Finally, outputWidth and outputHeight are updated.
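
The index bookkeeping is easier to see with the loop unrolled over explicit batch/map indices. A runnable restatement (a sketch under simplified assumptions: stride 1, no padding; this is not the mlpack code itself):

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/convolution_rules/naive_convolution.hpp>

using namespace mlpack::ann;

int main()
{
  const size_t inSize = 2, outSize = 3, batchSize = 2;
  const size_t kW = 3, kH = 3;

  arma::cube inputTemp(6, 6, inSize * batchSize, arma::fill::randu);
  arma::cube weight(kW, kH, outSize * inSize, arma::fill::randu);
  arma::vec bias(outSize, arma::fill::randu);
  arma::cube outputTemp(4, 4, outSize * batchSize, arma::fill::zeros);

  for (size_t batch = 0; batch < batchSize; ++batch)
  {
    for (size_t out = 0; out < outSize; ++out)
    {
      for (size_t in = 0; in < inSize; ++in)
      {
        // In the original loop: outMap    == batch * outSize + out,
        //                       outMapIdx == out * inSize + in,
        //                       input slice index == batch * inSize + in.
        arma::mat convOutput;
        NaiveConvolution<ValidConvolution>::Convolution(
            inputTemp.slice(batch * inSize + in),
            weight.slice(out * inSize + in), convOutput);
        outputTemp.slice(batch * outSize + out) += convOutput;
      }
      // Bias is added once per output map.
      outputTemp.slice(batch * outSize + out) += bias(out);
    }
  }
}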

Backward

Header:

  /**
   * Ordinary feed backward pass of a neural network, calculating the function
   * f(x) by propagating x backwards through f. Using the results from the feed
   * forward pass.
   *
   * @param * (input) The propagated input activation.
   * @param gy The backpropagated error.
   * @param g The calculated gradient.
   */
  template<typename eT>
  void Backward(const arma::Mat<eT>& /* input */,
                const arma::Mat<eT>& gy,
                arma::Mat<eT>& g);

Implementation:

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
template<typename eT>
void Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Backward(
    const arma::Mat<eT>& /* input */, const arma::Mat<eT>& gy, arma::Mat<eT>& g)
{
  arma::cube mappedError(((arma::Mat<eT>&) gy).memptr(), outputWidth,
      outputHeight, outSize * batchSize, false, false);

  g.set_size(inputWidth * inputHeight * inSize, batchSize);
  gTemp = arma::Cube<eT>(g.memptr(), inputWidth, inputHeight,
      inSize * batchSize, false, false);
  gTemp.zeros();

  for (size_t outMap = 0, outMapIdx = 0, batchCount = 0; outMap <
      outSize * batchSize; outMap++)
  {
    if (outMap != 0 && outMap % outSize == 0)
    {
      batchCount++;
      outMapIdx = 0;
    }

    for (size_t inMap = 0; inMap < inSize; inMap++, outMapIdx++)
    {
      arma::Mat<eT> output, rotatedFilter;
      Rotate180(weight.slice(outMapIdx), rotatedFilter);

      BackwardConvolutionRule::Convolution(mappedError.slice(outMap),
          rotatedFilter, output, strideWidth, strideHeight);

      if (padWLeft != 0 || padWRight != 0 || padHTop != 0 || padHBottom != 0)
      {
        gTemp.slice(inMap + batchCount * inSize) += output.submat(padWLeft,
            padHTop, padWLeft + gTemp.n_rows - 1, padHTop + gTemp.n_cols - 1);
      }
      else
      {
        gTemp.slice(inMap + batchCount * inSize) += output;
      }
    }
  }
}

The double loop uses BackwardConvolutionRule::Convolution, which again defaults to NaiveConvolution; this time the full-mode overload, already listed in full above, is the one that runs.


FullConvolution simply computes suitable outputRows and outputCols, builds a zero-padded copy of the input at that size, and then delegates to the ValidConvolution overload.
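
A minimal sketch of the full-mode overload (with stride 1, a full convolution of an $m \times n$ input with a $p \times q$ filter yields $(m + p - 1) \times (n + q - 1)$):

#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/convolution_rules/naive_convolution.hpp>

using namespace mlpack::ann;

int main()
{
  arma::mat input(2, 2, arma::fill::randu);
  arma::mat filter(3, 3, arma::fill::randu);
  arma::mat output;

  // 2 + 3 - 1 = 4, so the result is 4x4.
  NaiveConvolution<FullConvolution>::Convolution(input, filter, output);
  std::cout << output.n_rows << " x " << output.n_cols << std::endl;  // 4 x 4
}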

Back in Backward, the overall flow mirrors Forward: temporary cubes are mapped over the error (gy) and the gradient (g), each weight slice is rotated by $180^{\circ}$, convolved (full mode) with the error, and the result is accumulated into the gradient cube, cropping away the padding region when padding was used.
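
The Rotate180 helper used here is, in essence, a flip along both axes; an equivalent sketch in plain Armadillo:

#include <armadillo>

template<typename eT>
void Rotate180(const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
  // Flipping up-down and then left-right rotates the matrix by 180 degrees.
  output = arma::fliplr(arma::flipud(input));
}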

Gradient

Header:

  /*
   * Calculate the gradient using the output delta and the input activation.
   *
   * @param input The input parameter used for calculating the gradient.
   * @param error The calculated error.
   * @param gradient The calculated gradient.
   */
  template<typename eT>
  void Gradient(const arma::Mat<eT>& /* input */,
                const arma::Mat<eT>& error,
                arma::Mat<eT>& gradient);

Implementation:

template<
    typename ForwardConvolutionRule,
    typename BackwardConvolutionRule,
    typename GradientConvolutionRule,
    typename InputDataType,
    typename OutputDataType
>
template<typename eT>
void Convolution<
    ForwardConvolutionRule,
    BackwardConvolutionRule,
    GradientConvolutionRule,
    InputDataType,
    OutputDataType
>::Gradient(
    const arma::Mat<eT>& input,
    const arma::Mat<eT>& error,
    arma::Mat<eT>& gradient)
{
  arma::cube mappedError(((arma::Mat<eT>&) error).memptr(), outputWidth,
      outputHeight, outSize * batchSize, false, false);
  arma::cube inputTemp(((arma::Mat<eT>&) input).memptr(), inputWidth,
      inputHeight, inSize * batchSize, false, false);

  gradient.set_size(weights.n_elem, 1);
  gradientTemp = arma::Cube<eT>(gradient.memptr(), weight.n_rows,
      weight.n_cols, weight.n_slices, false, false);
  gradientTemp.zeros();

  for (size_t outMap = 0, outMapIdx = 0, batchCount = 0; outMap <
      outSize * batchSize; outMap++)
  {
    if (outMap != 0 && outMap % outSize == 0)
    {
      batchCount++;
      outMapIdx = 0;
    }

    for (size_t inMap = 0; inMap < inSize; inMap++, outMapIdx++)
    {
      arma::Mat<eT> inputSlice;
      if (padWLeft != 0 || padWRight != 0 || padHTop != 0 || padHBottom != 0)
      {
        inputSlice = inputPaddedTemp.slice(inMap + batchCount * inSize);
      }
      else
      {
        inputSlice = inputTemp.slice(inMap + batchCount * inSize);
      }

      arma::Mat<eT> deltaSlice = mappedError.slice(outMap);

      arma::Mat<eT> output;
      GradientConvolutionRule::Convolution(inputSlice, deltaSlice,
          output, strideWidth, strideHeight);

      if (gradientTemp.n_rows < output.n_rows ||
          gradientTemp.n_cols < output.n_cols)
      {
        gradientTemp.slice(outMapIdx) += output.submat(0, 0,
            gradientTemp.n_rows - 1, gradientTemp.n_cols - 1);
      }
      else if (gradientTemp.n_rows > output.n_rows ||
          gradientTemp.n_cols > output.n_cols)
      {
        gradientTemp.slice(outMapIdx).submat(0, 0, output.n_rows - 1,
            output.n_cols - 1) += output;
      }
      else
      {
        gradientTemp.slice(outMapIdx) += output;
      }
    }

    gradient.submat(weight.n_elem + (outMap % outSize), 0, weight.n_elem +
        (outMap % outSize), 0) = arma::accu(mappedError.slice(outMap));
  }
}

First the input slice and the error slice are convolved.
The result is then accumulated into gradientTemp, offset or cropped as needed so that no index goes out of bounds.
After each output map, the bias entry of gradient is set (note the assignment, not accumulation) to the sum of the elements of the corresponding mappedError slice.
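
Putting the three passes together on a standalone layer (a minimal sketch; Parameters() and Reset() are the accessors quoted earlier, and the final check reflects the weights-then-bias layout of the flat gradient):

#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/convolution.hpp>

using namespace mlpack::ann;

int main()
{
  // 1 input map, 2 output maps, 3x3 kernels, stride 1, no padding, 8x8 input.
  Convolution<> conv(1, 2, 3, 3, 1, 1, 0, 0, 8, 8);
  conv.Parameters().randu();
  conv.Reset();  // alias weight and bias onto the flat weights blob

  arma::mat input(8 * 8, 1, arma::fill::randu), output;
  conv.Forward(input, output);  // (8 - 3)/1 + 1 = 6, so output is (6*6*2) x 1

  arma::mat error(output.n_rows, 1, arma::fill::randu), gradient;
  conv.Gradient(input, error, gradient);

  // First 1*2*3*3 = 18 rows: kernel gradients; last 2 rows: bias gradients.
  std::cout << gradient.n_rows << std::endl;  // prints 20
}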

Test

iris

#include <iostream>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>

using namespace std;
using namespace arma;
using namespace mlpack;
using namespace mlpack::ann;

void ffn_test()
{
    // load data
    mat train_data;
    mat train_labels;
    mat test_data;
    mat test_labels;

    mlpack::data::Load("/home/aurainting/下载/mlpack-3.4.2/build/iris_train.csv", train_data);
    mlpack::data::Load("/home/aurainting/下载/mlpack-3.4.2/build/iris_train_labels.csv", train_labels);
    mlpack::data::Load("/home/aurainting/下载/mlpack-3.4.2/build/iris_test.csv", test_data);
    mlpack::data::Load("/home/aurainting/下载/mlpack-3.4.2/build/iris_test_labels.csv", test_labels);

    // build model
    FFN<> model;
    model.Add<Linear<>>(train_data.n_rows, 6);
    model.Add<ReLULayer<>>();
    model.Add<Linear<>>(6, 4);
    model.Add<ReLULayer<>>();
    model.Add<Linear<>>(4, 3);
    model.Add<LogSoftMax<>>();

    // train
    model.Train<ens::Adam>(train_data, train_labels + 1, ens::ProgressBar());

    // predict
    mat res;
    model.Predict(test_data, res);
    mat pred(1, test_labels.n_cols);
    for (size_t i = 0; i < res.n_cols; ++i)
        pred(0, i) = arma::index_max(res.col(i));
    cout << "accuracy: "
         << static_cast<double>(arma::accu(pred == test_labels)) / test_labels.n_cols << endl;
}


int main()
{
    ffn_test();
}

Result:
[Screenshot: console output with the final iris test accuracy.]

mnist

#include <iostream>
#include <fstream>
#include <string>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>

using namespace std;
using namespace arma;
using namespace mlpack;
using namespace mlpack::ann;

// MNIST stores its header integers big-endian; reverseInt swaps the bytes so
// they read correctly on a little-endian machine.
int reverseInt(int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}

void read_mnist_labels(const string& filepath, mat& labels)
{
    ifstream file(filepath, ios::binary);
    if (file.is_open()) {
        int magic_number = 0;
        int number_of_items = 0;

        file.read((char*)&magic_number, sizeof (magic_number));
        file.read((char*)&number_of_items, sizeof (number_of_items));

        magic_number = reverseInt(magic_number);
        number_of_items = reverseInt(number_of_items);

        labels.resize(1, number_of_items);
        for (int i = 0; i < number_of_items; ++i) {
            unsigned char label = 0;
            file.read((char*)&label, sizeof (label));
            labels(0, i) = label;
        }
    }
}

void read_mnist_images(const string& filepath, mat& images)
{
    ifstream file(filepath, ios::binary);
    if (file.is_open()) {
        int magic_number = 0;
        int number_of_images = 0;
        int n_rows = 0;
        int n_cols = 0;

        file.read((char*)&magic_number, sizeof (magic_number));
        file.read((char*)&number_of_images, sizeof (number_of_images));
        file.read((char*)&n_rows, sizeof (n_rows));
        file.read((char*)&n_cols, sizeof (n_cols));

        magic_number = reverseInt(magic_number);
        number_of_images = reverseInt(number_of_images);
        n_rows = reverseInt(n_rows);
        n_cols = reverseInt(n_cols);

        images.reshape(n_rows * n_cols, number_of_images);
        for (int i = 0; i < number_of_images; ++i)
            for (int j = 0; j < n_rows * n_cols; ++j) {
                unsigned char pixel = 0;
                file.read((char*)&pixel, sizeof (pixel));
                images(j, i) = pixel;
            }
    }
}

void ffn_test()
{
    // load data
    string train_labels_path = "/home/aurainting/文档/data/mnist/train-labels-idx1-ubyte";
    string train_images_path = "/home/aurainting/文档/data/mnist/train-images-idx3-ubyte";
    string test_labels_path = "/home/aurainting/文档/data/mnist/t10k-labels-idx1-ubyte";
    string test_images_path = "/home/aurainting/文档/data/mnist/t10k-images-idx3-ubyte";

    mat train_labels;
    mat test_labels;
    mat train_images;
    mat test_images;

    read_mnist_labels(train_labels_path, train_labels);
    read_mnist_labels(test_labels_path, test_labels);
    read_mnist_images(train_images_path, train_images);
    read_mnist_images(test_images_path, test_images);

    // normalize
    uword nPoints = train_images.n_cols;
    for (uword i = 0; i < nPoints; ++i)
        train_images.col(i) /= norm(train_images.col(i), 2);
    nPoints = test_images.n_cols;
    for (uword i = 0; i < nPoints; ++i)
        test_images.col(i) /= norm(test_images.col(i), 2);

    // build model
    FFN<> model;
    model.Add<Convolution<>>(1, 8, 5, 5, 1, 1, 0, 0, 28, 28);
    model.Add<ReLULayer<>>();
    model.Add<MaxPooling<>>(8, 8, 2, 2);
    model.Add<Convolution<>>(8, 12, 2, 2);
    model.Add<ReLULayer<>>();
    model.Add<MaxPooling<>>(2, 2, 2, 2);
    model.Add<Linear<>>(192, 32);
    model.Add<ReLULayer<>>();
    model.Add<Linear<>>(32, 10);
    model.Add<LogSoftMax<>>();

    // train
    ens::Adam opt(0.001, 8, 0.9, 0.999, 1e-8, 8 * train_images.n_cols);
    model.Train<ens::Adam>(train_images, train_labels + 1, opt, ens::ProgressBar());

    // predict
    mat results;
    model.Predict(test_images, results);
    mat pred(1, results.n_cols);
    for (size_t i = 0; i < results.n_cols; ++i)
        pred(0, i) = arma::index_max(results.col(i));
    cout << "accuracy: "
         << static_cast<double>(arma::accu(pred == test_labels)) / test_labels.n_cols << endl;
}


int main()
{
    ffn_test();
}

Result:
[Screenshot: console output with the final MNIST test accuracy.]

Reference

Artificial Neural Network
Armadillo
