vision_layers.hpp:
the ConvolutionLayer and CuDNNConvolutionLayer classes;
the Im2colLayer class;
the LRNLayer class;
the PoolingLayer and CuDNNPoolingLayer classes.
This file covers the four kinds of layers above. Because Caffe uses a little trick, im2col, to speed up convolution, let's look at Im2colLayer first:
Two figures illustrate the idea (not reproduced here):
as the first figure shows, im2col flattens many small matrices into rows and merges them into one large matrix.
The second figure demonstrates the convolution process itself. It may be a bit hard to follow, so here is a quick MATLAB demonstration:
clear,clc
% Three 3x3 input channels.
im = {[1,2,0;1,1,3;0,2,2;],[0,2,1;0,3,2;1,1,0;],[1,2,1;0,1,3;3,3,2]};
% Two filters, each with three 2x2 channels: kel{f,k} is filter f, channel k.
kel = {[1,1;2,2;],[1,1;1,1;],[0,1;1,0;];[1,0;0,1;],[2,1;2,1;],[1,2;2,0;]};
out1 = zeros(2,2);
out2 = zeros(2,2);
% Slide the 2x2 window over each channel (stride 1, no padding) and
% accumulate the per-channel products into the two output maps.
for i = 1:2
    for j = 1:2
        for k = 1:3
            out1(i,j) = out1(i,j) + sum(sum(kel{1,k}.*im{k}(i:i+1,j:j+1)));
            out2(i,j) = out2(i,j) + sum(sum(kel{2,k}.*im{k}(i:i+1,j:j+1)));
        end
    end
end
out1
out2
The convolution process in the lower half of the figure should then be easy to understand. Now, one can see that both approaches perform the same number of multiplications and additions, so why does the lower version speed things up, especially with the extra conversion step added? My tentative understanding is that it relates to the multi-level memory hierarchy: with everything packed into one large matrix, the number of cache/page misses drops dramatically, which improves speed. (I later realized another benefit: when we get to the fully-connected layer's code, it looks much like the convolution layer's. A convolution layer can be viewed as a kind of local fully-connected layer, and after the im2col transformation its computation becomes identical to a fully-connected layer's.)
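To make this concrete, here is a standalone C++ sketch of my own (an illustration, not Caffe code) that redoes the MATLAB example the im2col way: the 3-channel 3x3 input is unrolled into a 12x4 column matrix, the two filters are flattened into a 2x12 matrix, and the whole convolution collapses into a single matrix product:

#include <cstdio>

int main() {
  const int C = 3, H = 3, W = 3, K = 2;      // channels, height, width, kernel
  const int HO = H - K + 1, WO = W - K + 1;  // output size (no pad, stride 1)
  // The same input as the MATLAB `im`, stored channel-major.
  const float im[C][H][W] = {
      {{1, 2, 0}, {1, 1, 3}, {0, 2, 2}},
      {{0, 2, 1}, {0, 3, 2}, {1, 1, 0}},
      {{1, 2, 1}, {0, 1, 3}, {3, 3, 2}}};
  // The same two filters as the MATLAB `kel`, flattened to 2 x (C*K*K).
  const float kel[2][C * K * K] = {
      {1, 1, 2, 2, 1, 1, 1, 1, 0, 1, 1, 0},
      {1, 0, 0, 1, 2, 1, 2, 1, 1, 2, 2, 0}};
  // im2col: every output position becomes one column of height C*K*K.
  float col[C * K * K][HO * WO];
  for (int c = 0; c < C * K * K; ++c) {
    const int w_off = c % K, h_off = (c / K) % K, c_im = c / K / K;
    for (int h = 0; h < HO; ++h)
      for (int w = 0; w < WO; ++w)
        col[c][h * WO + w] = im[c_im][h + h_off][w + w_off];
  }
  // GEMM: out = kel * col, one 2x12 by 12x4 product gives both output maps.
  for (int m = 0; m < 2; ++m)
    for (int n = 0; n < HO * WO; ++n) {
      float acc = 0;
      for (int k = 0; k < C * K * K; ++k) acc += kel[m][k] * col[k][n];
      printf("out%d(%d,%d) = %g\n", m + 1, n / WO + 1, n % WO + 1, acc);
    }
  return 0;
}

Running it prints the same out1/out2 values as the MATLAB script; the final triple loop is precisely the GEMM that Caffe hands off to BLAS.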
A helper for image operations that rearranges image regions into column vectors. Used by ConvolutionLayer to perform convolution by matrix multiplication.
int kernel_h_, kernel_w_;
int stride_h_, stride_w_;
int channels_;
int height_, width_;
int pad_h_, pad_w_;
explicit Im2colLayer(const LayerParameter& param)
    : Layer<Dtype>(param) {}
virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top);
virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top);
The corresponding implementations are in im2col_layer.cpp.
Im2colLayer() itself just reuses the constructor of the Layer base class.
LayerSetUp() is implemented as follows:
template <typename Dtype>
void Im2colLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
ConvolutionParameter conv_param = this->layer_param_.convolution_param();
CHECK(!conv_param.has_kernel_size() !=
!(conv_param.has_kernel_h() && conv_param.has_kernel_w()))
<< "Filter size is kernel_size OR kernel_h and kernel_w; not both";
CHECK(conv_param.has_kernel_size() ||
(conv_param.has_kernel_h() && conv_param.has_kernel_w()))
<< "For non-square filters both kernel_h and kernel_w are required.";
CHECK((!conv_param.has_pad() && conv_param.has_pad_h()
&& conv_param.has_pad_w())
|| (!conv_param.has_pad_h() && !conv_param.has_pad_w()))
<< "pad is pad OR pad_h and pad_w are required.";
CHECK((!conv_param.has_stride() && conv_param.has_stride_h()
&& conv_param.has_stride_w())
|| (!conv_param.has_stride_h() && !conv_param.has_stride_w()))
<< "Stride is stride OR stride_h and stride_w are required.";
if (conv_param.has_kernel_size()) {
kernel_h_ = kernel_w_ = conv_param.kernel_size();
} else {
kernel_h_ = conv_param.kernel_h();
kernel_w_ = conv_param.kernel_w();
}
CHECK_GT(kernel_h_, 0) << "Filter dimensions cannot be zero.";
CHECK_GT(kernel_w_, 0) << "Filter dimensions cannot be zero.";
if (!conv_param.has_pad_h()) {
pad_h_ = pad_w_ = conv_param.pad();
} else {
pad_h_ = conv_param.pad_h();
pad_w_ = conv_param.pad_w();
}
if (!conv_param.has_stride_h()) {
stride_h_ = stride_w_ = conv_param.stride();
} else {
stride_h_ = conv_param.stride_h();
stride_w_ = conv_param.stride_w();
}
}
Remember layer_param_, a member of the Layer base class? It is declared as:
LayerParameter layer_param_;
So what is this convolution_param() method?
Judging from the return type in the source above, it returns a ConvolutionParameter:
// Message that stores parameters used by ConvolutionLayer
message ConvolutionParameter {
optional uint32 num_output = 1; // The number of outputs for the layer
optional bool bias_term = 2 [default = true]; // whether to have bias terms
// Pad, kernel size, and stride are all given as a single value for equal
// dimensions in height and width or as Y, X pairs.
optional uint32 pad = 3 [default = 0]; // The padding size (equal in Y, X)
optional uint32 pad_h = 9 [default = 0]; // The padding height
optional uint32 pad_w = 10 [default = 0]; // The padding width
optional uint32 kernel_size = 4; // The kernel size (square)
optional uint32 kernel_h = 11; // The kernel height
optional uint32 kernel_w = 12; // The kernel width
optional uint32 group = 5 [default = 1]; // The group size for group conv
optional uint32 stride = 6 [default = 1]; // The stride (equal in Y, X)
optional uint32 stride_h = 13; // The stride height
optional uint32 stride_w = 14; // The stride width
optional FillerParameter weight_filler = 7; // The filler for the weight
optional FillerParameter bias_filler = 8; // The filler for the bias
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 15 [default = DEFAULT];
}
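For reference, a prototxt fragment of my own (not taken from the text) setting these fields might look like:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}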
LayerSetUp() then checks that the layer has exactly the attributes it should: for example, when kernel_size is given, kernel_h and kernel_w must not be, and vice versa.
After that it assigns the member variables. Somewhat oddly, the LayerSetUp() code never uses its const vector<Blob<Dtype>*>& bottom and vector<Blob<Dtype>*>* top parameters at all.
Next, the Reshape() function:
template <typename Dtype>
void Im2colLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
channels_ = bottom[0]->channels();
height_ = bottom[0]->height();
width_ = bottom[0]->width();
(*top)[0]->Reshape(
bottom[0]->num(), channels_ * kernel_h_ * kernel_w_,
(height_ + 2 * pad_h_ - kernel_h_) / stride_h_ + 1,
(width_ + 2 * pad_w_ - kernel_w_) / stride_w_ + 1);
}
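As a quick sanity check on these formulas (numbers of my own choosing): a num x 3 x 227 x 227 bottom with an 11x11 kernel, pad 0, stride 4 reshapes the top to num x (3*11*11) x 55 x 55 = num x 363 x 55 x 55, since (227 + 2*0 - 11)/4 + 1 = 55.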
The declarations for these in vision_layers.hpp are:
protected:
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
template <typename Dtype>
void Im2colLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
const Dtype* bottom_data = bottom[0]->cpu_data();
Dtype* top_data = (*top)[0]->mutable_cpu_data();
for (int n = 0; n < bottom[0]->num(); ++n) {
im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_,
width_, kernel_h_, kernel_w_, pad_h_, pad_w_,
stride_h_, stride_w_, top_data + (*top)[0]->offset(n));
}
}
template <typename Dtype>
void Im2colLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
const Dtype* top_diff = top[0]->cpu_diff();
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
for (int n = 0; n < top[0]->num(); ++n) {
col2im_cpu(top_diff + top[0]->offset(n), channels_, height_, width_,
kernel_h_, kernel_w_, pad_h_, pad_w_,
stride_h_, stride_w_, bottom_diff + (*bottom)[0]->offset(n));
}
}
1. Why is there only a CPU implementation here? Shouldn't there be a GPU one too? Doesn't this computation need the GPU? It does: im2col_layer.cu implements the GPU side, and it looks essentially the same as the CPU code, except that it calls im2col_gpu() and col2im_gpu().
2. Why operate only on bottom[0] and top[0]? Couldn't there be multiple bottoms and tops? Er, this puzzled me too; let's set it aside and keep reading.
As for the remaining functions in im2col_layer, they don't appear to do anything special for now.
im2col.hpp and im2col.cpp together contain four functions:
#ifndef _CAFFE_UTIL_IM2COL_HPP_
#define _CAFFE_UTIL_IM2COL_HPP_
namespace caffe {
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
const int height, const int width, const int kernel_h, const int kernel_w,
const int pad_h, const int pad_w, const int stride_h,
const int stride_w, Dtype* data_col);
template <typename Dtype>
void col2im_cpu(const Dtype* data_col, const int channels,
const int height, const int width, const int patch_h, const int patch_w,
const int pad_h, const int pad_w, const int stride_h,
const int stride_w, Dtype* data_im);
template <typename Dtype>
void im2col_gpu(const Dtype* data_im, const int channels,
const int height, const int width, const int kernel_h, const int kernel_w,
const int pad_h, const int pad_w, const int stride_h,
const int stride_w, Dtype* data_col);
template <typename Dtype>
void col2im_gpu(const Dtype* data_col, const int channels,
const int height, const int width, const int patch_h, const int patch_w,
const int pad_h, const int pad_w, const int stride_h,
const int stride_w, Dtype* data_im);
} // namespace caffe
#endif // CAFFE_UTIL_IM2COL_HPP_
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
const int height, const int width, const int kernel_h, const int kernel_w,
const int pad_h, const int pad_w,
const int stride_h, const int stride_w,
Dtype* data_col) {
int height_col = (height + 2 * pad_h - kernel_h) / stride_h + 1;
int width_col = (width + 2 * pad_w - kernel_w) / stride_w + 1;
int channels_col = channels * kernel_h * kernel_w;
for (int c = 0; c < channels_col; ++c) {
int w_offset = c % kernel_w;
int h_offset = (c / kernel_w) % kernel_h;
int c_im = c / kernel_h / kernel_w;
for (int h = 0; h < height_col; ++h) {
for (int w = 0; w < width_col; ++w) {
int h_pad = h * stride_h - pad_h + h_offset;
int w_pad = w * stride_w - pad_w + w_offset;
if (h_pad >= 0 && h_pad < height && w_pad >= 0 && w_pad < width)
data_col[(c * height_col + h) * width_col + w] =
data_im[(c_im * height + h_pad) * width + w_pad];
else
data_col[(c * height_col + h) * width_col + w] = 0;
}
}
}
}
Here the input data_im is transformed by im2col and the result is written into data_col. As for how data_col's channel count, height, and width are computed, it is worth deriving them yourself.
Note that both pad_h and pad_w are multiplied by 2: pad_h means adding pad_h units of padding on each side of the input in the height direction, hence the factor of 2, and likewise for pad_w.
A figure makes this easier to grasp (adapted from https://www.zhihu.com/question/28385679; not reproduced here).
There is one more small detail in the source:
// Explicit instantiation
template void im2col_cpu<float>(const float* data_im, const int channels,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w, const int stride_h,
    const int stride_w, float* data_col);
template void im2col_cpu<double>(const double* data_im, const int channels,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w, const int stride_h,
    const int stride_w, double* data_col);
In the first instantiation data_im and data_col are float*, and in the second they are double*; this also shows that integer types are not supported.
template <typename Dtype>
void col2im_cpu(const Dtype* data_col, const int channels,
const int height, const int width, const int patch_h, const int patch_w,
const int pad_h, const int pad_w,
const int stride_h, const int stride_w,
Dtype* data_im) {
caffe_set(height * width * channels, Dtype(0), data_im);
int height_col = (height + 2 * pad_h - patch_h) / stride_h + 1;
int width_col = (width + 2 * pad_w - patch_w) / stride_w + 1;
int channels_col = channels * patch_h * patch_w;
for (int c = 0; c < channels_col; ++c) {
int w_offset = c % patch_w;
int h_offset = (c / patch_w) % patch_h;
int c_im = c / patch_h / patch_w;
for (int h = 0; h < height_col; ++h) {
for (int w = 0; w < width_col; ++w) {
int h_pad = h * stride_h - pad_h + h_offset;
int w_pad = w * stride_w - pad_w + w_offset;
if (h_pad >= 0 && h_pad < height && w_pad >= 0 && w_pad < width)
data_im[(c_im * height + h_pad) * width + w_pad] +=
data_col[(c * height_col + h) * width_col + w];
}
}
}
}
param provides ConvolutionParameter convolution_param, with ConvolutionLayer options:
- num_output. The number of filters.
- kernel_size / kernel_h / kernel_w. The filter dimensions, given by kernel_size for square filters or kernel_h and kernel_w for rectangular filters.
- stride / stride_h / stride_w (\b optional, default 1). The filter stride, given by stride_size for equal dimensions or stride_h and stride_w for different strides. By default the convolution is dense with stride 1.
- pad / pad_h / pad_w (\b optional, default 0). The zero-padding for convolution, given by pad for equal dimensions or pad_h and pad_w for different padding. Input padding is computed implicitly instead of actually padding.
- group (\b optional, default 1). The number of filter groups. Group convolution is a method for reducing parameterization by selectively connecting input and output channels. The input and output channel dimensions must be divisible by the number of groups. For group @f$ \geq 1 @f$, the convolutional filters' input and output channels are separated s.t. each group takes 1 / group of the input channels and makes 1 / group of the output channels. Concretely 4 input channels, 8 output channels, and 2 groups separate input channels 1-2 and output channels 1-4 into the first group and input channels 3-4 and output channels 5-8 into the second group.
- bias_term (\b optional, default true). Whether to have a bias.
- engine: convolution has CAFFE (matrix multiplication) and CUDNN (library kernels + stream parallelism) engines.
The above is the source comment for the ConvolutionLayer class; it mainly documents the parameters. One parameter deserves attention: group. If group is set to 2 and the input has 4 channels while the output has 8, then the first 4 output channels are computed only from the first 2 input channels, and the last 4 output channels only from the last 2 input channels.
For further reading:
caffe源码阅读(2)卷积层 (a Chinese post on the convolution layer)
cnn_tutorial
Feed-forward: y = W*x + b
Back-propagation: the error propagates down as delta(l) = W^T * delta(l+1), and the weight gradient is dE/dW = delta(l+1) * x^T.
Solving for the convolution layer's parameter gradients looks involved, but im2col turns it into the fully-connected computation (the parameter-gradient computation of a convolution layer is essentially the same as a fully-connected layer's):
int kernel_h_, kernel_w_;
int stride_h_, stride_w_;
int num_;
int channels_;
int pad_h_, pad_w_;
int height_, width_;
int group_;
int num_output_;
int height_out_, width_out_;
bool bias_term_;
/// M_ is the channel dimension of the output for a single group, which is the
/// leading dimension of the filter matrix.
int M_;
/// K_ is the dimension of an unrolled input for a single group, which is the
/// leading dimension of the data matrix.
int K_;
/// N_ is the spatial dimension of the output, the H x W, which are the last
/// dimensions of the data and filter matrices.
int N_;
Blob<Dtype> col_buffer_;
Blob<Dtype> bias_multiplier_;
M_: the number of filters per group (num_output_ / group_);
K_: (channels_ / group_) * kernel_h * kernel_w, the number of elements in a single filter, i.e. the length of one unrolled input column (not merely kernel_h * kernel_w);
N_: ((height + 2*pad_h - kernel_h)/stride_h + 1) * ((width + 2*pad_w - kernel_w)/stride_w + 1), i.e. the output's spatial size, not simply height * width;
col_buffer_: stores the result of the im2col transformation introduced earlier.
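Plugging in concrete numbers (my own example: AlexNet's conv1 takes a 3x227x227 input through 96 filters of 11x11 with stride 4, pad 0, group 1): M_ = 96, K_ = 3*11*11 = 363, N_ = ((227 + 0 - 11)/4 + 1)^2 = 55*55 = 3025, so each image costs one 96x363 by 363x3025 matrix product.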
The constructor, SetUp(), and Reshape() need no further explanation here. Let's go straight to the forward and backward passes.
First, the forward function:
template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
for (int i = 0; i < bottom.size(); ++i) {
const Dtype* bottom_data = bottom[i]->cpu_data();
Dtype* top_data = (*top)[i]->mutable_cpu_data();
Dtype* col_data = col_buffer_.mutable_cpu_data();
const Dtype* weight = this->blobs_[0]->cpu_data();
int weight_offset = M_ * K_; // number of filter parameters in a group
int col_offset = K_ * N_; // number of values in an input region / column
int top_offset = M_ * N_; // number of values in an output region / column
for (int n = 0; n < num_; ++n) {
// im2col transformation: unroll input regions for filtering
// into column matrix for multiplication.
im2col_cpu(bottom_data + bottom[i]->offset(n), channels_, height_,
width_, kernel_h_, kernel_w_, pad_h_, pad_w_, stride_h_, stride_w_,
col_data);
int offsetN = (*top)[i]->offset(n);
// Take inner products for groups.
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
(Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
(Dtype)0., top_data + (*top)[i]->offset(n) + top_offset * g);
}
// Add bias.
if (bias_term_) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_,
N_, 1, (Dtype)1., this->blobs_[1]->cpu_data(),
bias_multiplier_.cpu_data(),
(Dtype)1., top_data + (*top)[i]->offset(n));
}
}
}
}
It calls caffe_cpu_gemm() for the matrix algebra (implemented in math_functions.cpp):
template<>
void caffe_cpu_gemm<float>(const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
const float alpha, const float* A, const float* B, const float beta,
float* C) {
int lda = (TransA == CblasNoTrans) ? K : M;
int ldb = (TransB == CblasNoTrans) ? N : K;
cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
ldb, beta, C, N);
}
template<>
void caffe_cpu_gemm<double>(const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
const double alpha, const double* A, const double* B, const double beta,
double* C) {
int lda = (TransA == CblasNoTrans) ? K : M;
int ldb = (TransB == CblasNoTrans) ? N : K;
cblas_dgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
ldb, beta, C, N);
}
The official API:
http://www.netlib.org/lapack/explore-html/db/d66/cblas__sgemm_8c_a584e7569aee83c27b2b2bf22ea4f9f23.html#a584e7569aee83c27b2b2bf22ea4f9f23
An introduction on cnblogs (it covers MKL, but the parameters and semantics are identical, and the implementations are presumably similar):
http://www.cnblogs.com/darkknightzh/p/5553336.html
A fairly detailed CSDN post:
http://blog.csdn.net/cleverysm/article/details/1925549
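To see the row-major call pattern that caffe_cpu_gemm<float> wraps, here is a minimal sketch of my own (assuming a CBLAS implementation such as OpenBLAS is installed and linked):

#include <cblas.h>
#include <cstdio>

int main() {
  // C (2x2) = 1.0 * A (2x3) * B (3x2) + 0.0 * C
  float A[6] = {1, 2, 3, 4, 5, 6};
  float B[6] = {7, 8, 9, 10, 11, 12};
  float C[4] = {0, 0, 0, 0};
  const int M = 2, N = 2, K = 3;
  // lda = K and ldb = N because neither matrix is transposed (row-major).
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K,
              1.0f, A, K, B, N, 0.0f, C, N);
  printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  // 58 64 / 139 154
  return 0;
}

Note how the lda/ldb choices match the source above: with no transposition, lda = K and ldb = N.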
Now for the backward part:
template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
const Dtype* weight = NULL;
Dtype* weight_diff = NULL;
if (this->param_propagate_down_[0]) {
weight = this->blobs_[0]->cpu_data();
weight_diff = this->blobs_[0]->mutable_cpu_diff();
caffe_set(this->blobs_[0]->count(), Dtype(0), weight_diff);
}
Dtype* bias_diff = NULL;
if (bias_term_ && this->param_propagate_down_[1]) {
bias_diff = this->blobs_[1]->mutable_cpu_diff();
caffe_set(this->blobs_[1]->count(), Dtype(0), bias_diff);
}
const int weight_offset = M_ * K_;
const int col_offset = K_ * N_;
const int top_offset = M_ * N_;
for (int i = 0; i < top.size(); ++i) {
const Dtype* top_diff = NULL;
// Bias gradient, if necessary.
if (bias_term_ && this->param_propagate_down_[1]) {
top_diff = top[i]->cpu_diff();
for (int n = 0; n < num_; ++n) {
caffe_cpu_gemv<Dtype>(CblasNoTrans, num_output_, N_,
1., top_diff + top[0]->offset(n),
bias_multiplier_.cpu_data(), 1.,
bias_diff);
}
}
if (this->param_propagate_down_[0] || propagate_down[i]) {
if (!top_diff) {
top_diff = top[i]->cpu_diff();
}
Dtype* col_data = col_buffer_.mutable_cpu_data();
Dtype* col_diff = col_buffer_.mutable_cpu_diff();
const Dtype* bottom_data = (*bottom)[i]->cpu_data();
Dtype* bottom_diff = (*bottom)[i]->mutable_cpu_diff();
for (int n = 0; n < num_; ++n) {
// Since we saved memory in the forward pass by not storing all col
// data, we will need to recompute them.
im2col_cpu(bottom_data + (*bottom)[i]->offset(n), channels_, height_,
width_, kernel_h_, kernel_w_, pad_h_, pad_w_,
stride_h_, stride_w_, col_data);
// gradient w.r.t. weight. Note that we will accumulate diffs.
if (this->param_propagate_down_[0]) {
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, K_, N_,
(Dtype)1., top_diff + top[i]->offset(n) + top_offset * g,
col_data + col_offset * g, (Dtype)1.,
weight_diff + weight_offset * g);
}
}
// gradient w.r.t. bottom data, if necessary.
if (propagate_down[i]) {
if (weight == NULL) {
weight = this->blobs_[0]->cpu_data();
}
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, K_, N_, M_,
(Dtype)1., weight + weight_offset * g,
top_diff + top[i]->offset(n) + top_offset * g,
(Dtype)0., col_diff + col_offset * g);
}
// col2im back to the data
col2im_cpu(col_diff, channels_, height_, width_,
kernel_h_, kernel_w_, pad_h_, pad_w_,
stride_h_, stride_w_, bottom_diff + (*bottom)[i]->offset(n));
}
}
}
}
}
The beginning of the function initializes the pointers for the parameters that need gradients (weights, bias), pointing them at where the results will be stored.
The for loop is where the real computation happens.
It uses caffe_cpu_gemv(), which, per the BLAS documentation, computes y = alpha*A*x + beta*y, where x and y are vectors and A is a matrix.
The loop begins with the bias: by the math, top_diff + top[0]->offset(n) just needs to be accumulated into bias_diff. What the code computes is:
bias_diff = 1.0 * [top_diff + top[0]->offset(n)] * bias_multiplier_.cpu_data() + 1.0 * bias_diff
Since bias_multiplier_ is set to all ones by default in Caffe, this is exactly an accumulation.
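Here is a tiny sketch of my own (hypothetical values, assuming a CBLAS library is linked) showing why multiplying by the all-ones vector amounts to summation: with x = ones, y = 1.0*A*x + 1.0*y adds each row-sum of A into y, which is the bias gradient with each output channel's diffs summed over the N_ spatial positions:

#include <cblas.h>
#include <cstdio>

int main() {
  const int M = 2, N = 3;                      // num_output_ x N_ (spatial size)
  float top_diff[M * N] = {1, 2, 3, 4, 5, 6};  // stand-in for one image's top_diff
  float ones[N] = {1, 1, 1};                   // bias_multiplier_
  float bias_diff[M] = {0, 0};
  // y = 1.0 * A * x + 1.0 * y, i.e. accumulate per-row sums into bias_diff.
  cblas_sgemv(CblasRowMajor, CblasNoTrans, M, N, 1.f, top_diff, N,
              ones, 1, 1.f, bias_diff, 1);
  printf("bias_diff = {%g, %g}\n", bias_diff[0], bias_diff[1]);  // {6, 15}
  return 0;
}

Because beta = 1.0, calling this once per image keeps accumulating, which is how the loop over n sums the bias gradient across the whole batch.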
After that the loop handles the weights; since a batch may contain many images, another for loop over them is involved.
Following the math, the code computes (note the CblasTrans on the second operand):
weight_diff + weight_offset*g = 1.0 * [top_diff + top[i]->offset(n) + top_offset*g] * [col_data + col_offset*g]^T + 1.0 * [weight_diff + weight_offset*g]
where g indexes the group.
Finally the error is propagated to the lower layer (equivalent to delta(l) = W^T * delta(l+1); note the CblasTrans on the weight matrix).
In the code: col_diff + col_offset*g = 1.0 * [weight + weight_offset*g]^T * [top_diff + top[i]->offset(n) + top_offset*g] + 0.0 * [col_diff + col_offset*g], followed by col2im to scatter col_diff back into bottom_diff.
Quoting a line from caffe源码阅读(2)卷积层:
true convolution rotates the kernel by 180 degrees before sliding it over the input. Caffe's source has no such step (it computes cross-correlation), so the learned "kernel" must be rotated back to obtain the mathematically correct convolution kernel.
CuDNNConvolutionLayer is skipped here; its implementation is broadly the same as ConvolutionLayer's. Now on to LRNLayer:
Normalize the input in a local region across or within feature maps.
Local response normalization (LRN): every input value is divided by
(1 + (alpha/n) * sum_i x_i^2)^beta
where alpha and beta are defined both in the source and in the configuration file, and n is the size of each local region. In the source n is expressed through size_ (local_size in the configuration file), and the mapping between n and size_ is:
for across-channel normalization, n = size_;
for within-channel normalization, n = size_ * size_.
Starting from the configuration file: the LRN layer in a network prototxt usually looks like this (excerpted from CaffeNet):
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
The local_size field (size_ in the source) means:
1) for across-channel normalization, the number of channels to sum over;
2) for within-channel normalization, the side length of the square summation region;
its default value is 5.
alpha and beta are the two parameters in the formula above.
There is actually one more optional parameter when writing this configuration:
norm_region: chooses between normalizing across adjacent channels and normalizing within a channel's spatial region. The default is ACROSS_CHANNELS, i.e. across channels; in this mode the local region spans adjacent channels but has no spatial extent (its shape is local_size x 1 x 1).
In within-channel mode the local region extends spatially but stays within a single channel (its shape is 1 x local_size x local_size).
int size_;
int pre_pad_;
Dtype alpha_;
Dtype beta_;
int num_;
int channels_;
int height_;
int width_;
Now the remaining member variables:
// Fields used for normalization ACROSS_CHANNELS
// scale_ stores the intermediate summing results
Blob<Dtype> scale_;
// Fields used for normalization WITHIN_CHANNEL
shared_ptr<SplitLayer<Dtype> > split_layer_;
vector<Blob<Dtype>*> split_top_vec_;
shared_ptr<PowerLayer<Dtype> > square_layer_;
Blob<Dtype> square_input_;
Blob<Dtype> square_output_;
vector<Blob<Dtype>*> square_bottom_vec_;
vector<Blob<Dtype>*> square_top_vec_;
shared_ptr<PoolingLayer<Dtype> > pool_layer_;
Blob<Dtype> pool_output_;
vector<Blob<Dtype>*> pool_top_vec_;
shared_ptr<PowerLayer<Dtype> > power_layer_;
Blob<Dtype> power_output_;
vector<Blob<Dtype>*> power_top_vec_;
shared_ptr<EltwiseLayer<Dtype> > product_layer_;
Blob<Dtype> product_input_;
vector<Blob<Dtype>*> product_bottom_vec_;
template <typename Dtype>
void LRNLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
size_ = this->layer_param_.lrn_param().local_size();
CHECK_EQ(size_ % 2, 1) << "LRN only supports odd values for local_size";
pre_pad_ = (size_ - 1) / 2;
alpha_ = this->layer_param_.lrn_param().alpha();
beta_ = this->layer_param_.lrn_param().beta();
if (this->layer_param_.lrn_param().norm_region() ==
LRNParameter_NormRegion_WITHIN_CHANNEL) {
// Set up split_layer_ to use inputs in the numerator and denominator.
split_top_vec_.clear();
split_top_vec_.push_back(&product_input_);
split_top_vec_.push_back(&square_input_);
LayerParameter split_param;
split_layer_.reset(new SplitLayer<Dtype>(split_param));
split_layer_->SetUp(bottom, &split_top_vec_);
// Set up square_layer_ to square the inputs.
square_bottom_vec_.clear();
square_top_vec_.clear();
square_bottom_vec_.push_back(&square_input_);
square_top_vec_.push_back(&square_output_);
LayerParameter square_param;
square_param.mutable_power_param()->set_power(Dtype(2));
square_layer_.reset(new PowerLayer<Dtype>(square_param));
square_layer_->SetUp(square_bottom_vec_, &square_top_vec_);
// Set up pool_layer_ to sum over square neighborhoods of the input.
pool_top_vec_.clear();
pool_top_vec_.push_back(&pool_output_);
LayerParameter pool_param;
pool_param.mutable_pooling_param()->set_pool(
PoolingParameter_PoolMethod_AVE);
pool_param.mutable_pooling_param()->set_pad(pre_pad_);
pool_param.mutable_pooling_param()->set_kernel_size(size_);
pool_layer_.reset(new PoolingLayer<Dtype>(pool_param));
pool_layer_->SetUp(square_top_vec_, &pool_top_vec_);
// Set up power_layer_ to compute (1 + alpha_/N^2 s)^-beta_, where s is
// the sum of a squared neighborhood (the output of pool_layer_).
power_top_vec_.clear();
power_top_vec_.push_back(&power_output_);
LayerParameter power_param;
power_param.mutable_power_param()->set_power(-beta_);
power_param.mutable_power_param()->set_scale(alpha_);
power_param.mutable_power_param()->set_shift(Dtype(1));
power_layer_.reset(new PowerLayer<Dtype>(power_param));
power_layer_->SetUp(pool_top_vec_, &power_top_vec_);
// Set up a product_layer_ to compute outputs by multiplying inputs by the
// inverse denominator computed by the power layer.
product_bottom_vec_.clear();
product_bottom_vec_.push_back(&product_input_);
product_bottom_vec_.push_back(&power_output_);
LayerParameter product_param;
EltwiseParameter* eltwise_param = product_param.mutable_eltwise_param();
eltwise_param->set_operation(EltwiseParameter_EltwiseOp_PROD);
product_layer_.reset(new EltwiseLayer<Dtype>(product_param));
product_layer_->SetUp(product_bottom_vec_, top);
}
}
Note that at the very beginning size_ is checked for parity: only odd values are accepted!
Reshape() we'll skip for now; it is mostly routine.
template <typename Dtype>
void LRNLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
switch (this->layer_param_.lrn_param().norm_region()) {
case LRNParameter_NormRegion_ACROSS_CHANNELS:
CrossChannelForward_cpu(bottom, top);
break;
case LRNParameter_NormRegion_WITHIN_CHANNEL:
WithinChannelForward(bottom, top);
break;
default:
LOG(FATAL) << "Unknown normalization region.";
}
}
The across-channel forward function:
template <typename Dtype>
void LRNLayer<Dtype>::CrossChannelForward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
const Dtype* bottom_data = bottom[0]->cpu_data();
Dtype* top_data = (*top)[0]->mutable_cpu_data();
Dtype* scale_data = scale_.mutable_cpu_data();
// start with the constant value
for (int i = 0; i < scale_.count(); ++i) {
scale_data[i] = 1.;
}
Blob<Dtype> padded_square(1, channels_ + size_ - 1, height_, width_);
Dtype* padded_square_data = padded_square.mutable_cpu_data();
caffe_set(padded_square.count(), Dtype(0), padded_square_data);
Dtype alpha_over_size = alpha_ / size_;
// go through the images
for (int n = 0; n < num_; ++n) {
// compute the padded square
caffe_sqr(channels_ * height_ * width_,
bottom_data + bottom[0]->offset(n),
padded_square_data + padded_square.offset(0, pre_pad_));
// Create the first channel scale
for (int c = 0; c < size_; ++c) {
caffe_axpy<Dtype>(height_ * width_, alpha_over_size,
padded_square_data + padded_square.offset(0, c),
scale_data + scale_.offset(n, 0));
}
for (int c = 1; c < channels_; ++c) {
// copy previous scale
caffe_copy(height_ * width_,
scale_data + scale_.offset(n, c - 1),
scale_data + scale_.offset(n, c));
// add head
caffe_axpy<Dtype>(height_ * width_, alpha_over_size,
padded_square_data + padded_square.offset(0, c + size_ - 1),
scale_data + scale_.offset(n, c));
// subtract tail
caffe_axpy<Dtype>(height_ * width_, -alpha_over_size,
padded_square_data + padded_square.offset(0, c - 1),
scale_data + scale_.offset(n, c));
}
}
// In the end, compute output
caffe_powx(scale_.count(), scale_data, -beta_, top_data);
caffe_mul(scale_.count(), top_data, bottom_data, top_data);
}
scale_data is then initialized to all ones; note why the initial value is 1!! It is the leading 1 inside the parentheses of the formula above, as should become clear by the end.
Next an object padded_square is created, the same shape as bottom except that its channel dimension is size_ - 1 larger. Why the extra size_ - 1?
Recall from the introduction that in local normalization every point is divided by a value whose computation involves a sum over a region centered on that point (call it x_centre) with side length size_. For across-channel normalization this region is effectively one-dimensional; for within-channel normalization it is two-dimensional.
Since this code handles the across-channel case, take the 1-D view to see why size_ - 1 extra channels are needed:
think of the channels as a segment of length channels_, padded with (size_ - 1)/2 zeros at each end so the summation window never runs off either end; the padded segment then has length channels_ + size_ - 1. (In the original figure, not reproduced here, the blue line is channels_ and the red line is channels_ + size_ - 1.)
The source then zero-initializes padded_square's data.
A few math helpers are involved here; all of them can be found in math_functions.cpp:
template <typename Dtype>
void caffe_set(const int N, const Dtype alpha, Dtype* Y) {
  if (alpha == 0) {
    memset(Y, 0, sizeof(Dtype) * N);  // NOLINT(caffe/alt_fn)
    return;
  }
  for (int i = 0; i < N; ++i) {
    Y[i] = alpha;
  }
}
This one needs no explanation.
template <>
void caffe_sqr<float>(const int n, const float* a, float* y) {
  vsSqr(n, a, y);
}
template <>
void caffe_sqr<double>(const int n, const double* a, double* y) {
  vdSqr(n, a, y);
}
Looking at "sqr" alone you can't tell whether it squares or takes a square root, and vsSqr() is hard to look up online (it is an Intel MKL vector-math routine; Caffe supplies equivalents in mkl_alternate.hpp when MKL is absent). The post Caffe源码(一):math_functions分析 walks through essentially all of these functions; I can't verify every claim there, so let's take it as correct. Per that post, this computes the element-wise square: y = a^2.
template <>
void caffe_axpy<float>(const int N, const float alpha, const float* X,
    float* Y) {
  cblas_saxpy(N, alpha, X, 1, Y, 1);
}
template <>
void caffe_axpy<double>(const int N, const double alpha, const double* X,
    double* Y) {
  cblas_daxpy(N, alpha, X, 1, Y, 1);
}
As the BLAS API documents, this computes Y = alpha*X + Y (X and Y are vectors, alpha a scalar).
template <typename Dtype>
void caffe_copy(const int N, const Dtype* X, Dtype* Y) {
  if (X != Y) {
    if (Caffe::mode() == Caffe::GPU) {
#ifndef CPU_ONLY
      // NOLINT_NEXT_LINE(caffe/alt_fn)
      CUDA_CHECK(cudaMemcpy(Y, X, sizeof(Dtype) * N, cudaMemcpyDefault));
#else
      NO_GPU;
#endif
    } else {
      memcpy(Y, X, sizeof(Dtype) * N);  // NOLINT(caffe/alt_fn)
    }
  }
}
template void caffe_copy<int>(const int N, const int* X, int* Y);
template void caffe_copy<unsigned int>(const int N, const unsigned int* X,
    unsigned int* Y);
template void caffe_copy<float>(const int N, const float* X, float* Y);
template void caffe_copy<double>(const int N, const double* X, double* Y);
These are the copy function and its instantiations; they copy X into Y.
template <>
void caffe_powx<float>(const int n, const float* a, const float b, float* y) {
  vsPowx(n, a, b, y);
}
template <>
void caffe_powx<double>(const int n, const double* a, const double b,
    double* y) {
  vdPowx(n, a, b, y);
}
This computes the element-wise power y = a^b, where y and a are vectors and b is a scalar.
template <>
void caffe_mul<float>(const int n, const float* a, const float* b, float* y) {
  vsMul(n, a, b, y);
}
template <>
void caffe_mul<double>(const int n, const double* a, const double* b,
    double* y) {
  vdMul(n, a, b, y);
}
This is element-wise vector multiplication: y[i] = a[i]*b[i].
With those helpers covered, back to the forward function CrossChannelForward_cpu():
computing alpha_over_size is easy to understand; it is the alpha/n from the formula.
Then comes the big for loop:
the data in bottom is squared and stored into padded_square_data, corresponding to x_i^2 in the formula.
The next two for loops are hard to capture in a drawing, so let me describe them in words.
The first for's comment says it computes the scale of the first channel.
Only after reading the whole outer loop did I realize that scale is the parenthesized part of the formula, i.e. (1 + alpha/n * sum(x_i^2)).
scale has already been initialized to 1, so what remains is to accumulate the part after the plus sign.
The outer for iterates over the images; the first inner for iterates over size_, the summation range from the formula.
Note that these calls operate on a channel's entire plane (height_ * width_) at once, accumulating into scale_data; scale_data has the same shape as bottom, so, to avoid ambiguity: although each call touches a whole channel plane, every value is accumulated into its own corresponding position in scale_data.
The second for is a running window: it copies the previous channel's sum, adds the new head element, and subtracts the old tail, so each subsequent channel costs a constant amount of work instead of size_ passes.
After this big loop, the parenthesized part of the formula is fully computed.
The forward function's last two lines: the first computes a power, and note that the exponent is -beta_. Why negative? Recall that the introduction says every input value is divided by this expression, so a negative exponent turns the division into a multiplication.
The final line performs that multiplication: the original data times scale_data raised to -beta.
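To condense the whole forward pass, here is a standalone sketch of my own (one scalar per channel instead of a full height_ x width_ plane; alpha = 0.0001, beta = 0.75, local_size = 3 are values I picked): padded squares, the first channel's window, the add-head/subtract-tail running sum, then the power and multiply:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int channels = 6, size = 3, pre_pad = (size - 1) / 2;
  const float alpha = 1e-4f, beta = 0.75f, alpha_over_size = alpha / size;
  std::vector<float> x = {1, 2, 3, 4, 5, 6};        // one value per channel
  std::vector<float> sq(channels + size - 1, 0.f);  // padded squares
  for (int c = 0; c < channels; ++c) sq[pre_pad + c] = x[c] * x[c];
  std::vector<float> scale(channels, 1.f);          // the leading 1
  for (int c = 0; c < size; ++c)                    // first channel's window
    scale[0] += alpha_over_size * sq[c];
  for (int c = 1; c < channels; ++c)                // running window
    scale[c] = scale[c - 1]
             + alpha_over_size * sq[c + size - 1]   // add head
             - alpha_over_size * sq[c - 1];         // subtract tail
  for (int c = 0; c < channels; ++c)                // top = x * scale^(-beta)
    printf("top[%d] = %f\n", c, x[c] * std::pow(scale[c], -beta));
  return 0;
}

Each scale[c] ends up holding 1 + (alpha/n) * (sum of the squared neighbors of channel c), matching the parenthesized expression above.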
Whew, the forward pass is finally done; next, its backward function.
template <typename Dtype>
void LRNLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
switch (this->layer_param_.lrn_param().norm_region()) {
case LRNParameter_NormRegion_ACROSS_CHANNELS:
CrossChannelBackward_cpu(top, propagate_down, bottom);
break;
case LRNParameter_NormRegion_WITHIN_CHANNEL:
WithinChannelBackward(top, propagate_down, bottom);
break;
default:
LOG(FATAL) << "Unknown normalization region.";
}
}
template <typename Dtype>
void LRNLayer<Dtype>::CrossChannelBackward_cpu(
    const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
    vector<Blob<Dtype>*>* bottom) {
const Dtype* top_diff = top[0]->cpu_diff();
const Dtype* top_data = top[0]->cpu_data();
const Dtype* bottom_data = (*bottom)[0]->cpu_data();
const Dtype* scale_data = scale_.cpu_data();
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
Blob<Dtype> padded_ratio(1, channels_ + size_ - 1, height_, width_);
Blob<Dtype> accum_ratio(1, 1, height_, width_);
Dtype* padded_ratio_data = padded_ratio.mutable_cpu_data();
Dtype* accum_ratio_data = accum_ratio.mutable_cpu_data();
// We hack a little bit by using the diff() to store an additional result
Dtype* accum_ratio_times_bottom = accum_ratio.mutable_cpu_diff();
caffe_set(padded_ratio.count(), Dtype(0), padded_ratio_data);
Dtype cache_ratio_value = 2. * alpha_ * beta_ / size_;
caffe_powx(scale_.count(), scale_data, -beta_, bottom_diff);
caffe_mul(scale_.count(), top_diff, bottom_diff, bottom_diff);
// go through individual data
int inverse_pre_pad = size_ - (size_ + 1) / 2;
for (int n = 0; n < num_; ++n) {
int block_offset = scale_.offset(n);
// first, compute diff_i * y_i / s_i
caffe_mul(channels_ * height_ * width_,
top_diff + block_offset, top_data + block_offset,
padded_ratio_data + padded_ratio.offset(0, inverse_pre_pad));
caffe_div(channels_ * height_ * width_,
padded_ratio_data + padded_ratio.offset(0, inverse_pre_pad),
scale_data + block_offset,
padded_ratio_data + padded_ratio.offset(0, inverse_pre_pad));
// Now, compute the accumulated ratios and the bottom diff
caffe_set(accum_ratio.count(), Dtype(0), accum_ratio_data);
for (int c = 0; c < size_ - 1; ++c) {
caffe_axpy<Dtype>(height_ * width_, 1.,
padded_ratio_data + padded_ratio.offset(0, c), accum_ratio_data);
}
for (int c = 0; c < channels_; ++c) {
caffe_axpy<Dtype>(height_ * width_, 1.,
padded_ratio_data + padded_ratio.offset(0, c + size_ - 1),
accum_ratio_data);
// compute bottom diff
caffe_mul(height_ * width_,
bottom_data + top[0]->offset(n, c),
accum_ratio_data, accum_ratio_times_bottom);
caffe_axpy<Dtype>(height_ * width_, -cache_ratio_value,
accum_ratio_times_bottom, bottom_diff + top[0]->offset(n, c));
caffe_axpy<Dtype>(height_ * width_, -1.,
padded_ratio_data + padded_ratio.offset(0, c), accum_ratio_data);
}
}
}
This amounts to computing the gradient of the forward mapping; work through the details yourself~
Actually, the wording above is probably off. Error propagation normally follows the chain rule: dE/dx_i = sum_j (dE/dy_j) * (dy_j/dx_i), where the sum runs over every output y_j whose normalization window contains x_i.
Pools the input image by taking the max, average, etc. within regions.
The official documentation says three modes are provided: MAX, AVE, and STOCHASTIC. In the CPU source, however, only the first two appear; the stochastic one is marked NOT_IMPLEMENTED.
The forward pass is straightforward: MAX passes down the largest value under the window covered by the "kernel" (filter); AVE passes down the window's average.
The backward pass (a small sketch of the MAX case follows this list):
MAX: the forward pass must record the coordinate of the value it passed down, so the backward pass routes the error back only to that coordinate; every other position gets an error of 0.
AVE: the top error is divided into N shares and propagated back, where N usually equals the filter area (kernel_h * kernel_w). Near the borders N can be smaller, because the filter may extend past the boundary.
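Here is that sketch (my own, hypothetical values, a single 2x2 window): the forward pass records the argmax into the mask, and the backward pass scatters the entire incoming error to that one position:

#include <cstdio>

int main() {
  const int count = 4;                  // one 2x2 pooling window, one output
  float bottom[count] = {1, 5, 3, 2};
  // Forward: take the max and remember where it came from (the mask).
  int mask = 0;
  for (int i = 1; i < count; ++i)
    if (bottom[i] > bottom[mask]) mask = i;
  float top = bottom[mask];             // 5
  // Backward: the whole top error goes to the argmax position only.
  float top_diff = 0.7f;
  float bottom_diff[count] = {0, 0, 0, 0};
  bottom_diff[mask] += top_diff;
  printf("top = %g, bottom_diff = {%g %g %g %g}\n", top,
         bottom_diff[0], bottom_diff[1], bottom_diff[2], bottom_diff[3]);
  return 0;
}

This is exactly what the mask/max_idx_ bookkeeping in the real code does, just without the loops over images, channels, and output positions.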
LayerSetUp() and Reshape() are much the same as before.
int kernel_h_, kernel_w_;
int stride_h_, stride_w_;
int pad_h_, pad_w_;
int channels_;
int height_, width_;
int pooled_height_, pooled_width_;
Blob<Dtype> rand_idx_;
Blob<int> max_idx_;
rand_idx_ should be for stochastic downsampling, though, again, no such downsampling is implemented in the CPU source.
So max_idx_ must be what max pooling uses?
The forward function:
// TODO(Yangqing): Is there a faster way to do pooling in the channel-first
// case?
template <typename Dtype>
void PoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
const Dtype* bottom_data = bottom[0]->cpu_data();
Dtype* top_data = (*top)[0]->mutable_cpu_data();
const int top_count = (*top)[0]->count();
// We'll output the mask to top[1] if it's of size >1.
const bool use_top_mask = top->size() > 1;
int* mask = NULL;  // suppress warnings about uninitialized variables
Dtype* top_mask = NULL;
// Different pooling methods. We explicitly do the switch outside the for
// loop to save time, although this results in more code.
switch (this->layer_param_.pooling_param().pool()) {
case PoolingParameter_PoolMethod_MAX:
// Initialize
if (use_top_mask) {
top_mask = (*top)[1]->mutable_cpu_data();
caffe_set(top_count, Dtype(-1), top_mask);
} else {
mask = max_idx_.mutable_cpu_data();
caffe_set(top_count, -1, mask);
}
caffe_set(top_count, Dtype(-FLT_MAX), top_data);
// The main loop
for (int n = 0; n < bottom[0]->num(); ++n) {
for (int c = 0; c < channels_; ++c) {
for (int ph = 0; ph < pooled_height_; ++ph) {
for (int pw = 0; pw < pooled_width_; ++pw) {
int hstart = ph * stride_h_ - pad_h_;
int wstart = pw * stride_w_ - pad_w_;
int hend = min(hstart + kernel_h_, height_);
int wend = min(wstart + kernel_w_, width_);
hstart = max(hstart, 0);
wstart = max(wstart, 0);
const int pool_index = ph * pooled_width_ + pw;
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
const int index = h * width_ + w;
if (bottom_data[index] > top_data[pool_index]) {
top_data[pool_index] = bottom_data[index];
if (use_top_mask) {
top_mask[pool_index] = static_cast<Dtype>(index);
} else {
mask[pool_index] = index;
}
}
}
}
}
}
// compute offset
bottom_data += bottom[0]->offset(0, 1);
top_data += (*top)[0]->offset(0, 1);
if (use_top_mask) {
top_mask += (*top)[0]->offset(0, 1);
} else {
mask += (*top)[0]->offset(0, 1);
}
}
}
break;
case PoolingParameter_PoolMethod_AVE:
for (int i = 0; i < top_count; ++i) {
top_data[i] = 0;
}
// The main loop
for (int n = 0; n < bottom[0]->num(); ++n) {
for (int c = 0; c < channels_; ++c) {
for (int ph = 0; ph < pooled_height_; ++ph) {
for (int pw = 0; pw < pooled_width_; ++pw) {
int hstart = ph * stride_h_ - pad_h_;
int wstart = pw * stride_w_ - pad_w_;
int hend = min(hstart + kernel_h_, height_ + pad_h_);
int wend = min(wstart + kernel_w_, width_ + pad_w_);
int pool_size = (hend - hstart) * (wend - wstart);
hstart = max(hstart, 0);
wstart = max(wstart, 0);
hend = min(hend, height_);
wend = min(wend, width_);
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
top_data[ph * pooled_width_ + pw] +=
bottom_data[h * width_ + w];
}
}
top_data[ph * pooled_width_ + pw] /= pool_size;
}
}
// compute offset
bottom_data += bottom[0]->offset(0, 1);
top_data += (*top)[0]->offset(0, 1);
}
}
break;
case PoolingParameter_PoolMethod_STOCHASTIC:
NOT_IMPLEMENTED;
break;
default:
LOG(FATAL) << "Unknown pooling method.";
}
}
The backward function:
template <typename Dtype>
void PoolingLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
if (!propagate_down[0]) {
return;
}
const Dtype* top_diff = top[0]->cpu_diff();
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
// Different pooling methods. We explicitly do the switch outside the for
// loop to save time, although this results in more codes.
caffe_set((*bottom)[0]->count(), Dtype(0), bottom_diff);
// We'll output the mask to top[1] if it's of size >1.
const bool use_top_mask = top.size() > 1;
const int* mask = NULL; // suppress warnings about uninitialized variables
const Dtype* top_mask = NULL;
switch (this->layer_param_.pooling_param().pool()) {
case PoolingParameter_PoolMethod_MAX:
// The main loop
if (use_top_mask) {
top_mask = top[1]->cpu_data();
} else {
mask = max_idx_.cpu_data();
}
for (int n = 0; n < top[0]->num(); ++n) {
for (int c = 0; c < channels_; ++c) {
for (int ph = 0; ph < pooled_height_; ++ph) {
for (int pw = 0; pw < pooled_width_; ++pw) {
const int index = ph * pooled_width_ + pw;
const int bottom_index =
use_top_mask ? top_mask[index] : mask[index];
bottom_diff[bottom_index] += top_diff[index];
}
}
bottom_diff += (*bottom)[0]->offset(0, 1);
top_diff += top[0]->offset(0, 1);
if (use_top_mask) {
top_mask += top[0]->offset(0, 1);
} else {
mask += top[0]->offset(0, 1);
}
}
}
break;
case PoolingParameter_PoolMethod_AVE:
// The main loop
for (int n = 0; n < top[0]->num(); ++n) {
for (int c = 0; c < channels_; ++c) {
for (int ph = 0; ph < pooled_height_; ++ph) {
for (int pw = 0; pw < pooled_width_; ++pw) {
int hstart = ph * stride_h_ - pad_h_;
int wstart = pw * stride_w_ - pad_w_;
int hend = min(hstart + kernel_h_, height_ + pad_h_);
int wend = min(wstart + kernel_w_, width_ + pad_w_);
int pool_size = (hend - hstart) * (wend - wstart);
hstart = max(hstart, 0);
wstart = max(wstart, 0);
hend = min(hend, height_);
wend = min(wend, width_);
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
bottom_diff[h * width_ + w] +=
top_diff[ph * pooled_width_ + pw] / pool_size;
}
}
}
}
// offset
bottom_diff += (*bottom)[0]->offset(0, 1);
top_diff += top[0]->offset(0, 1);
}
}
break;
case PoolingParameter_PoolMethod_STOCHASTIC:
NOT_IMPLEMENTED;
break;
default:
LOG(FATAL) << "Unknown pooling method.";
}
}