jsk_learner

Implementing MobileNet DepthwiseConvolution, ShuffleNet Shuffle Channel, and CenterLoss in Caffe

Implementing special operations in Caffe

Implementing MobileNet DepthwiseConvolution in Caffe

Steps
Source code:

depthwise_conv_layer.hpp
depthwise_conv_layer.cpp
depthwise_conv_layer.cu

Implementing the ShuffleNet channel shuffle operation (shuffle channel) in Caffe

Steps
Source code

shuffle_channel_layer.hpp
shuffle_channel_layer.cpp
shuffle_channel_layer.cu

Implementing the CenterLoss loss function in Caffe

Steps
Source code

center_loss_layer.hpp
center_loss_layer.cpp
center_loss_layer.cu

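This post explains how to implement several special operations in Caffe: the depthwise convolution used by MobileNet's depthwise separable convolution, the channel shuffle operation of ShuffleNet, and the CenterLoss loss function.

System: Linux (Ubuntu)

Implementing MobileNet DepthwiseConvolution in Caffe

I use shicai's source code from GitHub, which can be downloaded from the following link: DepthwiseConvolution implementation source on GitHub

Implementing the depthwise convolution (DepthwiseConvolution) does not require any change to /src/caffe/proto/caffe.proto in the Caffe directory, because the layer reuses the existing convolution_param.

After downloading the code from the link, the caffe directory contains two folders: include and src.

Each folder contains the source files we need:

include: depthwise_conv_layer.hpp
src: depthwise_conv_layer.cpp, depthwise_conv_layer.cu

File name	Purpose
depthwise_conv_layer.hpp	Header file
depthwise_conv_layer.cpp	CPU implementation of DepthwiseConvolution
depthwise_conv_layer.cu	GPU implementation of DepthwiseConvolution

Steps

What we need to do:

Copy depthwise_conv_layer.hpp from include into /caffeMS/include/caffe/layers/ (here /caffeMS stands for the Caffe root directory).

Copy depthwise_conv_layer.cpp and depthwise_conv_layer.cu from src into /caffeMS/src/caffe/layers/.

Then rebuild Caffe: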

    make all -j8
    make test -j8
    make runtest -j8

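Usage:
For each dw layer, i.e. a layer whose group parameter is greater than 1, change its type from "Convolution" to "DepthwiseConvolution". The GPU kernels index the weights as one kernel_h x kernel_w filter per channel, so this only makes sense when group equals both the number of input channels and num_output, as in the layer below.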

layer {
  name: "conv2_1/dw"
  type: "DepthwiseConvolution"
  bottom: "conv1"
  top: "conv2_1/dw"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 32
    bias_term: false
    pad: 1
    kernel_size: 3
    group: 32
    stride: 1
    weight_filler {
      type: "msra"
    }
    engine: CAFFE
  }
}

Alternatively, the script "transferTypeToDepthwiseConvolution.py" included in the download performs this type change automatically:

python2 transferTypeToDepthwiseConvolution.py mobilenet_train.prototxt mobilenet_train_dw.prototxt

import caffe.proto.caffe_pb2 as caffe_pb2
from google.protobuf.text_format import Merge
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('source_prototxt')
    parser.add_argument('target_prototxt')

    args = parser.parse_args()
    net = caffe_pb2.NetParameter()
    Merge(open(args.source_prototxt, 'r').read(), net)
    for layer in net.layer:
        if layer.type == "Convolution":
            if layer.convolution_param.group !=1:
                layer.type = "DepthwiseConvolution"
    with open(args.target_prototxt, 'w') as tf:
        tf.write(str(net))

Source code:

depthwise_conv_layer.hpp

/*
 * depthwise_conv_layer.hpp
 *
 *  Created on: May 23, 2017
 *      Author: liuhao
 */

#ifndef CAFFE_DEPTHWISE_CONV_LAYER_HPP_
#define CAFFE_DEPTHWISE_CONV_LAYER_HPP_



#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#include "caffe/layers/base_conv_layer.hpp"

namespace caffe {

/**
 * @brief Convolves the input image with a bank of learned filters,
 *        and (optionally) adds biases.
 *
 *   Caffe convolves by reduction to matrix multiplication. This achieves
 *   high-throughput and generality of input and filter dimensions but comes at
 *   the cost of memory for matrices. This makes use of efficiency in BLAS.
 *
 *   The input is "im2col" transformed to a channel K' x H x W data matrix
 *   for multiplication with the N x K' x H x W filter matrix to yield a
 *   N' x H x W output matrix that is then "col2im" restored. K' is the
 *   input channel * kernel height * kernel width dimension of the unrolled
 *   inputs so that the im2col matrix has a column for each input region to
 *   be filtered. col2im restores the output spatial structure by rolling up
 *   the output channel N' columns of the output matrix.
 */
template <typename Dtype>
class DepthwiseConvolutionLayer : public BaseConvolutionLayer<Dtype> {
 public:
  /**
   * @param param provides ConvolutionParameter convolution_param,
   *    with ConvolutionLayer options:
   *  - num_output. The number of filters.
   *  - kernel_size / kernel_h / kernel_w. The filter dimensions, given by
   *  kernel_size for square filters or kernel_h and kernel_w for rectangular
   *  filters.
   *  - stride / stride_h / stride_w (\b optional, default 1). The filter
   *  stride, given by stride_size for equal dimensions or stride_h and stride_w
   *  for different strides. By default the convolution is dense with stride 1.
   *  - pad / pad_h / pad_w (\b optional, default 0). The zero-padding for
   *  convolution, given by pad for equal dimensions or pad_h and pad_w for
   *  different padding. Input padding is computed implicitly instead of
   *  actually padding.
   *  - dilation (\b optional, default 1). The filter
   *  dilation, given by dilation_size for equal dimensions for different
   *  dilation. By default the convolution has dilation 1.
   *  - group (\b optional, default 1). The number of filter groups. Group
   *  convolution is a method for reducing parameterization by selectively
   *  connecting input and output channels. The input and output channel dimensions must be divisible
   *  by the number of groups. For group @f$ \geq 1 @f$, the
   *  convolutional filters' input and output channels are separated s.t. each
   *  group takes 1 / group of the input channels and makes 1 / group of the
   *  output channels. Concretely 4 input channels, 8 output channels, and
   *  2 groups separate input channels 1-2 and output channels 1-4 into the
   *  first group and input channels 3-4 and output channels 5-8 into the second
   *  group.
   *  - bias_term (\b optional, default true). Whether to have a bias.
   *  - engine: convolution has CAFFE (matrix multiplication) and CUDNN (library
   *    kernels + stream parallelism) engines.
   */
  explicit DepthwiseConvolutionLayer(const LayerParameter& param)
      : BaseConvolutionLayer<Dtype>(param) {}

  virtual inline const char* type() const { return "DepthwiseConvolution"; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual inline bool reverse_dimensions() { return false; }
  virtual void compute_output_shape();
};

}  // namespace caffe



#endif /* CAFFE_DEPTHWISE_CONV_LAYER_HPP_ */

depthwise_conv_layer.cpp

#include <vector>
#include "caffe/layers/depthwise_conv_layer.hpp"

namespace caffe {

template <typename Dtype>
void DepthwiseConvolutionLayer<Dtype>::compute_output_shape() {
  const int* kernel_shape_data = this->kernel_shape_.cpu_data();
  const int* stride_data = this->stride_.cpu_data();
  const int* pad_data = this->pad_.cpu_data();
  const int* dilation_data = this->dilation_.cpu_data();
  this->output_shape_.clear();
  for (int i = 0; i < this->num_spatial_axes_; ++i) {
    // i + 1 to skip channel axis
    const int input_dim = this->input_shape(i + 1);
    const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
    const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent)
        / stride_data[i] + 1;
    this->output_shape_.push_back(output_dim);
  }
}

template <typename Dtype>
void DepthwiseConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
	const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

template <typename Dtype>
void DepthwiseConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
              top_diff + n * this->top_dim_, weight_diff);
        }
        // gradient w.r.t. bottom data, if necessary.
        if (propagate_down[i]) {
          this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_);
        }
      }
    }
  }
}

#ifdef CPU_ONLY
STUB_GPU(DepthwiseConvolutionLayer);
#endif

INSTANTIATE_CLASS(DepthwiseConvolutionLayer);
REGISTER_LAYER_CLASS(DepthwiseConvolution);
}  // namespace caffe
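
The output size computed in compute_output_shape above follows the usual convolution formula. As a quick sanity check outside Caffe, the same arithmetic can be sketched in Python. The function name conv_output_dim is made up for this illustration, and Python's // only matches C++ integer division here because the operands are assumed to be non-negative:

```python
def conv_output_dim(input_dim, kernel, pad=0, stride=1, dilation=1):
    """Output size along one spatial axis, mirroring compute_output_shape."""
    # Effective kernel size once dilation is taken into account.
    kernel_extent = dilation * (kernel - 1) + 1
    return (input_dim + 2 * pad - kernel_extent) // stride + 1


# A 3x3 kernel with pad 1 keeps the size at stride 1 and halves it at stride 2.
print(conv_output_dim(112, kernel=3, pad=1, stride=1))  # 112
print(conv_output_dim(112, kernel=3, pad=1, stride=2))  # 56
```

This is handy for checking that the top blob of a dw layer has the shape you expect before training.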

depthwise_conv_layer.cu

#include <vector>
#include <algorithm>
#include <cfloat>
#include "caffe/layers/depthwise_conv_layer.hpp"
#include "caffe/util/math_functions.hpp"


/*
 * The depthwise layer for mobilenet.   only for stride 1
 */

namespace caffe {

template <typename Dtype>
__global__ void ConvForward(const int nthreads,
		const Dtype* const bottom_data, const int num, const int channels,
		const int height, const int width,const int conved_height,
		const int conved_width,const int kernel_h, const int kernel_w,
		const int stride_h, const int stride_w, const int pad_h, const int pad_w,
		Dtype* const top_data,const Dtype* const weight,const Dtype* const bias,const bool bias_term_) {
	CUDA_KERNEL_LOOP(index, nthreads) {

		const int pw = index % conved_width;
		const int ph = (index / conved_width) % conved_height;
		const int c = (index / conved_width / conved_height) % channels;
		const int n = index / conved_width / conved_height / channels;
		int hstart = ph * stride_h - pad_h;
		int wstart = pw * stride_w - pad_w;
		int hend = min(hstart + kernel_h, height + pad_h);
		int wend = min(wstart + kernel_w, width + pad_w);
//		const int pool_size = (hend - hstart) * (wend - wstart);
		hstart = max(hstart, 0);
		wstart = max(wstart, 0);
		hend = min(hend, height);
		wend = min(wend, width);
		Dtype aveval = 0;
		const Dtype* const bottom_slice =
		bottom_data + (n * channels + c) * height * width;
		const Dtype* const weight_slice =
		weight + c * kernel_h * kernel_w;
//		if (index==1) {
//			printf("pw%d ph%d c%d n%d \n",pw,ph,c,n);
//			printf("hstart%d wstart%d hend%d wend%d \n",hstart,wstart,hend,wend);
//		}

		int khstart=hend<kernel_h?kernel_h-hend:0;
		int kwstart=wend<kernel_w?kernel_w-wend:0;
		for (int h = hstart; h < hend; ++h) {
			for (int w = wstart; w < wend; ++w) {

				aveval += bottom_slice[h * width + w]*weight_slice[(khstart+h-hstart) * kernel_w + (kwstart+w-wstart)];
//				if (index==1) {
//					printf("pos:h%d w%d\n",h,w);
//					printf("cal:bottom%f weight%f\n",bottom_slice[h * width + w],weight_slice[(h-hstart) * kernel_w + (w-wstart)]);
//				}
			}
		}
		if(bias_term_) {
			aveval+=bias[c];
		}
		top_data[index] = aveval;
	}
}

template<typename Dtype>
void DepthwiseConvolutionLayer<Dtype>::Forward_gpu(
		const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
//	std::cout << "fp" << std::endl;
	const Dtype* weight = this->blobs_[0]->gpu_data();
	int* kernel_shape_data = this->kernel_shape_.mutable_cpu_data();
	int* stride_data = this->stride_.mutable_cpu_data();
	int* pad_data = this->pad_.mutable_cpu_data();

	for (int i = 0; i < bottom.size(); ++i) {
		const Dtype* bottom_data = bottom[i]->gpu_data();
		Dtype* top_data = top[i]->mutable_gpu_data();
		const int count = top[i]->count();
		vector<int> shape_ = bottom[i]->shape();
		const int channels_ = shape_[1];
		const int height_ = shape_[2];
		const int width_ = shape_[3];

		const int kernel_h_ = kernel_shape_data[0];
		const int kernel_w_ = kernel_shape_data[1];
		const int stride_h_ = stride_data[0];
		const int stride_w_ = stride_data[1];
		const int pad_h_ = pad_data[0];
		const int pad_w_ = pad_data[1];

		const int conved_height = this->output_shape_[0];
		const int conved_weight = this->output_shape_[1];

		const bool bias_term_ = this->bias_term_;

		if (bias_term_) {
			const Dtype* const bias = this->blobs_[1]->gpu_data();
			ConvForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
					count, bottom_data, bottom[i]->num(), channels_,
					height_, width_,conved_height,conved_weight,kernel_h_,
					kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data,weight,bias,bias_term_);
		} else {
			ConvForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
					count, bottom_data, bottom[i]->num(), channels_,
					height_, width_,conved_height,conved_weight,kernel_h_,
					kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data,weight,0,bias_term_);
		}
	}
}

template <typename Dtype>
__global__ void ConvBackward(const int nthreads,
const Dtype* const top_diff,
const int num, const int channels, const int height,
const int width, const int conved_height, const int conved_width,
const int kernel_h, const int kernel_w, const int stride_h,
const int stride_w, const int pad_h, const int pad_w,
Dtype* const bottom_diff,
const Dtype* const weight) {

	CUDA_KERNEL_LOOP(index, nthreads) {
		const int w = index % width + pad_w;
		const int h = (index / width) % height + pad_h;
		const int c = (index / width / height) % channels;
		const int n = index / width / height / channels;
		
		const int phstart = (h < kernel_h) ? 0 : (h - kernel_h) / stride_h + 1;
		const int phend = min(h / stride_h + 1, conved_height);
		const int pwstart = (w < kernel_w) ? 0 : (w - kernel_w) / stride_w + 1;
		const int pwend = min(w / stride_w + 1, conved_width);
		
		const int khstart=(h >= kernel_h) ? ((h-kernel_h)%stride_h)+(kernel_h-stride_h): h;
		const int kwstart=(w >= kernel_w) ? ((w-kernel_w)%stride_w)+(kernel_w-stride_w) : w;
		
		Dtype gradient = 0;
		const Dtype* const top_diff_slice =
		top_diff + (n * channels + c) * conved_height * conved_width;
		
		const Dtype* const weight_slice =weight + c * kernel_h * kernel_w;
		
//		if (index==2) {
//			printf("w%d h%d c%d n%d \n",w,h,c,n);
//			printf("phstart%d phend%d pwstart%d pwend%d \n",phstart,phend,pwstart,pwend);
//		}
		
		for (int ph = phstart; ph < phend; ++ph) {
			for (int pw = pwstart; pw < pwend; ++pw) {
				int kh=khstart-(ph-phstart)*stride_h;
				int kw=kwstart-(pw-pwstart)*stride_w;
				gradient += top_diff_slice[ph * conved_width + pw] *weight_slice[kh*kernel_w+kw];
				
//						if (index==2) {
//							printf("pos:ph%d pw%d kh%d kw%d\n",ph,pw,kh,kw);
//							printf("cal:top_diff%f weight%f\n",top_diff_slice[ph * conved_width + pw],weight_slice[kh*kernel_w+kw]);
//				//			printf("cal:top_diff%f weight%f\n",top_diff_slice[ph * conved_width + pw],weight_slice[kh*kernel_w+kw]);
//						}
			}
		}
		bottom_diff[index] = gradient;
	}
}

__device__ float atomicAddme(float* address, float val)
{
    return atomicAdd(address,val);
}

__device__ double atomicAddme(double* address, double val)
{
    unsigned long long int* address_as_ull =
                                          (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed, 
                        __double_as_longlong(val + 
                        __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}



#define DIVIDE_CEIL(a,b) ((a)/(b)+(((a)/(b)*(b))<(a)))


template <typename Dtype>
__global__ void ConvBackwardWeight(const int nthreads,
const Dtype* const top_diff,
const int num, const int channels, const int height,
const int width, const int conved_height, const int conved_width,
const int kernel_h, const int kernel_w, const int stride_h,
const int stride_w, const int pad_h, const int pad_w,
Dtype* const weight_diff,
const Dtype* const bottom_data) {

	CUDA_KERNEL_LOOP(index, nthreads) {
		const int kw=index % kernel_w;
		const int kh= (index /kernel_w)%kernel_h;
		const int c=index /kernel_w/kernel_h;
		
//		if (index==5) {
//			printf("kh%d kw%d kc%d\n",kh,kw,c);
//		}
		Dtype gradient = 0;
		for( int n=0;n<num;n++) {
			
			const Dtype* const top_diff_slice = top_diff + (n * channels + c) * conved_height * conved_width;
			const Dtype* const bottom_data_slice = bottom_data + (n * channels + c) * height * width;
		
			
			const int phstart=max(DIVIDE_CEIL((pad_h-kh),stride_h),0);
			const int phend=min(DIVIDE_CEIL((height+pad_h-kh),stride_h),conved_height);
		
			const int pwstart=max(DIVIDE_CEIL((pad_w-kw),stride_w),0);
			
			const int pwend=min(DIVIDE_CEIL((width+pad_w-kw),stride_w),conved_width);
//			if (index==5) {
//				printf("phstart%d phend%d pwstart%d pwend%d \n",phstart,phend,pwstart,pwend);
//			}
//			
			for(int ph=phstart;ph<phend;ph++){
				for (int pw=pwstart;pw<pwend;pw++){
					const int h=ph*stride_h+kh-pad_h;
					const int w=pw*stride_w+kw-pad_w;
					gradient+=top_diff_slice[ph * conved_width + pw]*bottom_data_slice[h*width+w];
//					if (index==5) {
//						printf("n%d h%d w%d ph%d pw%d topdiff%f bottomdata%f\n",n,h,w,ph,pw,top_diff_slice[ph * conved_width + pw],bottom_data_slice[h*width+w]);
//			//			printf("phstart%d phend%d pwstart%d pwend%d \n",phstart,phend,pwstart,pwend);
//					}
				}
			}
		}
		weight_diff[c * kernel_h * kernel_w+kh*kernel_w+kw]+=gradient;
	}
}

template <typename Dtype>
__global__ void ConvBackwardBias(const int nthreads,
const Dtype* const top_diff,
const int num, const int channels, const int height,
const int width, const int conved_height, const int conved_width,
const int kernel_h, const int kernel_w, const int stride_h,
const int stride_w, const int pad_h, const int pad_w,
Dtype* const bias_diff) {
	CUDA_KERNEL_LOOP(index, nthreads) {
		const int c = index;
		Dtype gradient=0;
		for( int n=0;n<num;n++) {
			const Dtype* const top_diff_slice =
			top_diff + (n * channels + c) * conved_height * conved_width;
			for(int ph=0;ph<conved_height;ph++) {
				for (int pw=0;pw<conved_width;pw++) {
					gradient+=top_diff_slice[ph * conved_width + pw];
				}
			}
		}
		bias_diff[c]+=gradient;
	}
}
template<typename Dtype>
void DepthwiseConvolutionLayer<Dtype>::Backward_gpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {


	int* kernel_shape_data = this->kernel_shape_.mutable_cpu_data();
	int* stride_data = this->stride_.mutable_cpu_data();
	int* pad_data = this->pad_.mutable_cpu_data();

	const Dtype* weight = this->blobs_[0]->gpu_data();
	Dtype* weight_diff = this->blobs_[0]->mutable_gpu_diff();

	const bool bias_term_ = this->bias_term_;
	Dtype* bias_diff = bias_term_ ? this->blobs_[1]->mutable_gpu_diff() : 0;
	const bool bias_propagate_down_ = this->param_propagate_down_[1];
	const bool weight_propagate_down_ = this->param_propagate_down_[0];


	const int kernel_h_ = kernel_shape_data[0];
	const int kernel_w_ = kernel_shape_data[1];
	const int stride_h_ = stride_data[0];
	const int stride_w_ = stride_data[1];
	const int pad_h_ = pad_data[0];
	const int pad_w_ = pad_data[1];

	const int conved_height = this->output_shape_[0];
	const int conved_weight = this->output_shape_[1];

//	CHECK_EQ(stride_h_, 1)
//	        << "The backward of the net whose stride is bigger than 1 is not implemented now. ";
//	CHECK_EQ(stride_w_, 1)
//	        << "The backward of the net whose stride is bigger than 1 is not implemented now. ";


	for (int i = 0; i < top.size(); ++i) {

		const Dtype* top_diff = top[i]->gpu_diff();
		const Dtype* bottom_data = bottom[i]->gpu_data();
		Dtype* bottom_diff = bottom[i]->mutable_gpu_diff();

		vector<int> shape_ = bottom[i]->shape();
		const int channels_ = shape_[1];
		const int height_ = shape_[2];
		const int width_ = shape_[3];

		// Bias gradient, if necessary.
		if (bias_term_ && bias_propagate_down_) {
			const int count_bias = channels_;
			ConvBackwardBias<Dtype><<<CAFFE_GET_BLOCKS(count_bias), CAFFE_CUDA_NUM_THREADS>>>(
				count_bias, top_diff, bottom[i]->num(), channels_,
				height_, width_,conved_height,conved_weight,kernel_h_,
				kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_,
				bias_diff);
		}
		// gradient w.r.t. weight. Note that we will accumulate diffs.
		if (weight_propagate_down_) {
			const int count_weight = channels_ * kernel_h_ * kernel_w_;
			ConvBackwardWeight<Dtype><<<CAFFE_GET_BLOCKS(count_weight), CAFFE_CUDA_NUM_THREADS>>>(
					count_weight, top_diff, bottom[i]->num(), channels_,
				height_, width_,conved_height,conved_weight,kernel_h_,
				kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_,
				weight_diff,
				bottom_data);
		}
		// gradient w.r.t. bottom data, if necessary.
		if (propagate_down[i]) {
			const int count_bottom=bottom[i]->count();
			ConvBackward<Dtype><<<CAFFE_GET_BLOCKS(count_bottom), CAFFE_CUDA_NUM_THREADS>>>(
				count_bottom, top_diff, bottom[i]->num(), channels_,
				height_, width_,conved_height,conved_weight,kernel_h_,
				kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, 
				bottom_diff,
				weight);
		}
	}

}

INSTANTIATE_LAYER_GPU_FUNCS (DepthwiseConvolutionLayer);

}  // namespace caffe
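
To make clear what the forward kernel above is meant to compute, here is a minimal pure-Python sketch of a depthwise convolution on a single image. It assumes one kernel per channel (group equal to the channel count), the same stride and padding in both directions, and no dilation. The function name depthwise_conv2d and the nested-list layout are assumptions of this sketch, not part of the Caffe code:

```python
def depthwise_conv2d(image, kernels, pad=0, stride=1, bias=None):
    """image: [C][H][W], kernels: [C][KH][KW]; returns [C][OH][OW]."""
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h = (height + 2 * pad - kh) // stride + 1
    out_w = (width + 2 * pad - kw) // stride + 1
    output = []
    for c in range(channels):  # each channel only sees its own kernel
        plane = []
        for ph in range(out_h):
            row = []
            for pw in range(out_w):
                acc = 0.0
                for i in range(kh):
                    for j in range(kw):
                        h = ph * stride - pad + i
                        w = pw * stride - pad + j
                        # Positions in the zero padding contribute nothing.
                        if 0 <= h < height and 0 <= w < width:
                            acc += image[c][h][w] * kernels[c][i][j]
                if bias is not None:
                    acc += bias[c]
                row.append(acc)
            plane.append(row)
        output.append(plane)
    return output


# Two 3x3 channels of ones; channel 0 uses an all-ones kernel, channel 1 all-twos.
image = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
kernels = [[[1.0] * 3 for _ in range(3)], [[2.0] * 3 for _ in range(3)]]
out = depthwise_conv2d(image, kernels, pad=1)
print(out[0][1][1], out[0][0][0], out[1][1][1])  # 9.0 4.0 18.0
```

Unlike an ordinary convolution, no sum over input channels appears, which is why a dedicated kernel can be much faster than running group convolution through im2col and GEMM.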

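Implementing the ShuffleNet channel shuffle operation (shuffle channel) in Caffe

I use farmingyard's source code from GitHub, which can be downloaded from the following link: shuffle channel implementation source on GitHub

The procedure is much the same as for DepthwiseConvolution. The only difference is that /src/caffe/proto/caffe.proto in the Caffe directory must be modified, because the layer introduces its own parameter message.

Again there are three files:

File name	Purpose
shuffle_channel_layer.hpp	Header file
shuffle_channel_layer.cpp	CPU implementation of shuffle channel
shuffle_channel_layer.cu	GPU implementation of shuffle channel

Steps

What we need to do:

Copy shuffle_channel_layer.hpp into /caffeMS/include/caffe/layers/.

Copy shuffle_channel_layer.cpp and shuffle_channel_layer.cu into /caffeMS/src/caffe/layers/.

Modify /caffeMS/src/caffe/proto/caffe.proto:

At around line 420, add the following line inside message LayerParameter. The field number 164 is arbitrary, but it must not duplicate a number already used by another field of LayerParameter.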

message LayerParameter {
...
optional ShuffleChannelParameter shuffle_channel_param = 164;
...
}

At the end of the file, add:

message ShuffleChannelParameter {
  optional uint32 group = 1[default = 1]; // The number of group
}
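
Since the field number must be unused, it may be worth checking before rebuilding. The following is a rough sketch, not a full proto parser: it strips // comments, ignores nested blocks, and collects the field numbers declared directly inside one message. The helper name used_field_numbers and the SAMPLE text are made up for this illustration; in practice you would pass in the contents of your own caffe.proto:

```python
import re


def used_field_numbers(proto_text, message_name):
    """Return the set of field numbers declared directly in one message."""
    text = re.sub(r"//[^\n]*", "", proto_text)  # drop line comments
    start = re.search(r"\bmessage\s+%s\s*\{" % re.escape(message_name), text)
    if start is None:
        raise ValueError("message %s not found" % message_name)
    depth, top_level = 1, []
    for ch in text[start.end():]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # closing brace of the message itself
                break
        elif depth == 1:  # keep only text that is not inside a nested block
            top_level.append(ch)
    body = "".join(top_level)
    # Match "= N;" or "= N [options];" at the end of a field declaration.
    return {int(n) for n in re.findall(r"=\s*(\d+)\s*(?:\[[^\]]*\])?\s*;", body)}


# Hypothetical excerpt standing in for a real caffe.proto.
SAMPLE = """
message LayerParameter {
  optional string name = 1;  // the layer name
  optional ConvolutionParameter convolution_param = 106;
  optional CenterLossParameter center_loss_param = 147;
}
message ShuffleChannelParameter {
  optional uint32 group = 1 [default = 1];
}
"""

used = used_field_numbers(SAMPLE, "LayerParameter")
print(sorted(used))  # [1, 106, 147]
print(164 in used)   # False, so 164 would be free in this sample
```

If the chosen number is already present in your Caffe version, pick any other unused value; the number itself has no meaning beyond being unique within the message.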

Then rebuild Caffe:

make all -j8
make test -j8
make runtest -j8
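
Once the build succeeds, the layer can be used from a prototxt. A minimal sketch follows; the type string and the shuffle_channel_param field come from the source files in this section, while the layer and blob names and the group value are placeholders chosen for illustration:

```
layer {
  name: "shuffle1"            # placeholder name
  type: "ShuffleChannel"
  bottom: "conv1"             # placeholder input blob
  top: "shuffle1"
  shuffle_channel_param {
    group: 3                  # must divide the number of input channels
  }
}
```

The layer checks at run time that the channel count is an exact multiple of group and aborts with "Wrong group size." otherwise.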

Source code

shuffle_channel_layer.hpp

#ifndef CAFFE_SHUFFLE_CHANNEL_LAYER_HPP_
#define CAFFE_SHUFFLE_CHANNEL_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

template <typename Dtype>
class ShuffleChannelLayer : public Layer<Dtype> {
public:
    explicit ShuffleChannelLayer(const LayerParameter& param)
        : Layer<Dtype>(param) {}
    virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top);
    virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top);
    virtual inline const char* type() const { return "ShuffleChannel"; }

protected:
    virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                             const vector<Blob<Dtype>*>& top);
    virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                             const vector<Blob<Dtype>*>& top);

    virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                              const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
    virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
                              const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

private:
    void Resize_cpu(Dtype *output, const Dtype *input, int group_row, int group_column, int len);
    void Resize_gpu(Dtype *output, const Dtype *input, int group_row, int group_column, int len);

    //Blob temp_blob_;
    int group_;
};

}  // namespace caffe

#endif  // CAFFE_SHUFFLE_CHANNEL_LAYER_HPP_

shuffle_channel_layer.cpp

#include <algorithm>
#include <vector>

#include "caffe/layers/shuffle_channel_layer.hpp"

namespace caffe {

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype> *> &bottom, const vector<Blob<Dtype> *> &top)
{
    group_ = this->layer_param_.shuffle_channel_param().group();
    CHECK_GT(group_, 0) << "group must be greater than 0";
    //temp_blob_.ReshapeLike(*bottom[0]);
	top[0]->ReshapeLike(*bottom[0]);
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Resize_cpu(Dtype *output, const Dtype *input, int group_row, int group_column, int len)
{
    for (int i = 0; i < group_row; ++i) // 2
    {
        for(int j = 0; j < group_column ; ++j) // 3
        {
            const Dtype* p_i = input + (i * group_column + j ) * len;
            Dtype* p_o = output + (j * group_row + i ) * len;

            caffe_copy(len, p_i, p_o);
        }
    }
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Reshape(const vector<Blob<Dtype> *> &bottom, const vector<Blob<Dtype> *> &top)
{
  int channels_ = bottom[0]->channels();
  int height_ = bottom[0]->height();
  int width_ = bottom[0]->width();

  top[0]->Reshape(bottom[0]->num(), channels_, height_, width_);

}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                             const vector<Blob<Dtype>*>& top) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* top_data = top[0]->mutable_cpu_data();

    const int num = bottom[0]->shape(0);
    const int feature_map_size = bottom[0]->count(1);
    const int sp_sz = bottom[0]->count(2);
    const int chs = bottom[0]->shape(1);

    int group_row = group_;
    int group_column = int(chs / group_row);
    CHECK_EQ(chs, (group_column * group_row)) << "Wrong group size.";

    //Dtype* temp_data = temp_blob_.mutable_cpu_data();
    for(int n = 0; n < num; ++n)
    {
		Resize_cpu(top_data + n*feature_map_size, bottom_data + n*feature_map_size, group_row, group_column, sp_sz);
    }
    //caffe_copy(bottom[0]->count(), temp_blob_.cpu_data(), top_data);
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
                                              const vector<bool>& propagate_down,
                                              const vector<Blob<Dtype>*>& bottom) {
    if (propagate_down[0]) {
        const Dtype* top_diff = top[0]->cpu_diff();
        Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();

        const int num = bottom[0]->shape(0);
        const int feature_map_size = bottom[0]->count(1);
        const int sp_sz = bottom[0]->count(2);
        const int chs = bottom[0]->shape(1);

        int group_row = int(chs / group_);
        int group_column = group_;

        //Dtype* temp_diff = temp_blob_.mutable_cpu_diff();
        for(int n = 0; n < num; ++n)
        {
			Resize_cpu(bottom_diff + n * feature_map_size, top_diff + n*feature_map_size, group_row, group_column, sp_sz);
        }
        //caffe_copy(top[0]->count(), temp_blob_.cpu_diff(), bottom_diff);
    }
}


#ifdef CPU_ONLY
STUB_GPU(ShuffleChannelLayer);
#endif

INSTANTIATE_CLASS(ShuffleChannelLayer);
REGISTER_LAYER_CLASS(ShuffleChannel);
}  // namespace caffe
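
The index arithmetic in Resize_cpu above amounts to a matrix transpose over channels: the channels are viewed as a group_row x group_column grid and written back as group_column x group_row. A small Python sketch of that mapping, operating on a list of channel labels instead of real feature maps (the function name shuffle_channels is made up for this illustration):

```python
def shuffle_channels(channels, group_row, group_column):
    """Reorder per-channel items the way Resize_cpu reorders feature maps."""
    if len(channels) != group_row * group_column:
        raise ValueError("Wrong group size.")
    output = [None] * len(channels)
    for i in range(group_row):
        for j in range(group_column):
            # Element (i, j) of the input grid becomes element (j, i).
            output[j * group_row + i] = channels[i * group_column + j]
    return output


group = 2
chs = list(range(6))
# Forward pass: group_row = group, group_column = channels / group.
forward = shuffle_channels(chs, group, len(chs) // group)
# Backward pass swaps the two roles, which undoes the permutation.
backward = shuffle_channels(forward, len(chs) // group, group)
print(forward)   # [0, 3, 1, 4, 2, 5]
print(backward)  # [0, 1, 2, 3, 4, 5]
```

This also shows why Backward_cpu can reuse Resize_cpu with group_row and group_column exchanged: the gradient simply has to travel back through the inverse permutation.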

shuffle_channel_layer.cu

#include <algorithm>
#include <vector>

#include "caffe/layers/shuffle_channel_layer.hpp"

namespace caffe {

template <typename Dtype>
__global__ void ShuffleChannelKernel(const int nthreads, const int feature_map_size,
	Dtype *output, const Dtype *input, int group_row, int group_column, int len) {
	CUDA_KERNEL_LOOP(index, nthreads) {
		const int n = index / group_row / group_column / len;
		const int i = (index / group_column / len) % group_row;
		const int j = index / len % group_column;
		const int k = index - (n * feature_map_size + (i * group_column + j) * len);
		Dtype* p_o = output + n * feature_map_size + (j * group_row + i) * len;
		p_o[k] = input[index];
	}
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Resize_gpu(Dtype *output, const Dtype *input, int group_row, int group_column, int len)
{
    for (int i = 0; i < group_row; ++i) // 2
    {
        for(int j = 0; j < group_column ; ++j) // 3
        {
            const Dtype* p_i = input + (i * group_column + j ) * len;
            Dtype* p_o = output + (j * group_row + i ) * len;

            caffe_copy(len, p_i, p_o);
        }
    }
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
    const Dtype* bottom_data = bottom[0]->gpu_data();
    Dtype* top_data = top[0]->mutable_gpu_data();

    const int num = bottom[0]->num();
    const int feature_map_size = bottom[0]->count(1);
    const int sp_sz = bottom[0]->count(2);
    const int chs = bottom[0]->channels();

    int group_row = group_;
    int group_column = int(chs / group_row);
    CHECK_EQ(chs, (group_column * group_row)) << "Wrong group size.";
	int count = num * group_column * group_row * sp_sz;
	ShuffleChannelKernel<Dtype> << <CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS >> >(
		count, feature_map_size, top_data, bottom_data, group_row, group_column, sp_sz);
    //Dtype* temp_data = temp_blob_.mutable_gpu_data();
    //for(int n = 0; n < num; ++n)
    //{
    //    Resize_gpu(top_data + n*feature_map_size, bottom_data + n*feature_map_size, group_row, group_column, sp_sz);
    //}
    //caffe_copy(bottom[0]->count(), temp_blob_.gpu_data(), top_data);
}

template <typename Dtype>
void ShuffleChannelLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
      const Dtype* top_diff = top[0]->gpu_diff();
      Dtype* bottom_diff = bottom[0]->mutable_gpu_diff();

      const int num = bottom[0]->num();
      const int feature_map_size = bottom[0]->count(1);
      const int sp_sz = bottom[0]->count(2);
      const int chs = bottom[0]->channels();

      int group_row = int(chs / group_);
      int group_column = group_;
	  int count = num * group_column * group_row * sp_sz;
	  ShuffleChannelKernel<Dtype> << <CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS >> >(
		  count, feature_map_size, bottom_diff, top_diff, group_row, group_column, sp_sz);
      //Dtype* temp_diff = temp_blob_.mutable_gpu_diff();
    //  for(int n = 0; n < num; ++n)
    //  {
		  //Resize_gpu(bottom_diff + n * feature_map_size, top_diff + n*feature_map_size, group_row, group_column, sp_sz);
    //  }
      //caffe_copy(top[0]->count(), temp_blob_.gpu_diff(), bottom_diff);
  }
}

INSTANTIATE_LAYER_GPU_FUNCS(ShuffleChannelLayer);

}  // namespace caffe
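
The GPU kernel above does the same permutation, but one element at a time from a flat index instead of one channel at a time. The following Python sketch replays that per-element arithmetic on a flat list so it can be checked by hand; the function name shuffle_flat is an assumption of this sketch:

```python
def shuffle_flat(data, num, group_row, group_column, length):
    """Apply the index arithmetic of ShuffleChannelKernel to a flat list."""
    feature_map_size = group_row * group_column * length
    output = [None] * len(data)
    for index in range(num * feature_map_size):
        n = index // group_row // group_column // length      # image in batch
        i = (index // group_column // length) % group_row     # source grid row
        j = (index // length) % group_column                  # source grid column
        # Offset of this element inside its own channel.
        k = index - (n * feature_map_size + (i * group_column + j) * length)
        output[n * feature_map_size + (j * group_row + i) * length + k] = data[index]
    return output


# Batch of 2 images, 6 channels (2 groups of 3), 2 values per channel.
result = shuffle_flat(list(range(24)), num=2, group_row=2, group_column=3, length=2)
print(result[:12])  # [0, 1, 6, 7, 2, 3, 8, 9, 4, 5, 10, 11]
```

Each channel's values stay contiguous and in order; only the channel positions change, matching the CPU version.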

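Implementing the CenterLoss loss function in Caffe

CenterLoss is mainly used in face recognition. A network I was working with needed it, so I learned how to use it. I use ydwen's official source code from GitHub, which can be downloaded from the following link: CenterLoss implementation source on GitHub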
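
For orientation, center loss penalizes the squared distance between each feature vector and the center of its class. Below is a minimal pure-Python sketch of the forward value only. It assumes the batch-averaged form 1/(2M) * sum ||x_i - c_{y_i}||^2, which is my reading of the usual definition rather than something verified against the layer code here, and it leaves out the center update entirely. The function name center_loss is made up for this illustration:

```python
def center_loss(features, labels, centers):
    """Return 1/(2M) * sum_i ||x_i - c_{y_i}||^2 over a batch of M features."""
    batch_size = len(features)
    total = 0.0
    for x, y in zip(features, labels):
        center = centers[y]  # the center belonging to this sample's class
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return total / (2.0 * batch_size)


features = [[1.0, 2.0], [3.0, 4.0]]
labels = [0, 1]
centers = [[0.0, 0.0], [3.0, 3.0]]
print(center_loss(features, labels, centers))  # 1.5
```

In training, this term is normally added to a softmax loss with a small weight, so that features of the same class are pulled together while the classes remain separable.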

As with shuffle channel, /caffeMS/src/caffe/proto/caffe.proto must also be modified.

Three files:

File name	Purpose
center_loss_layer.hpp	Header file
center_loss_layer.cpp	CPU implementation of CenterLoss
center_loss_layer.cu	GPU implementation of CenterLoss

Steps

What we need to do:

Copy center_loss_layer.hpp into /caffeMS/include/caffe/layers/.

Copy center_loss_layer.cpp and center_loss_layer.cu into /caffeMS/src/caffe/layers/.

Modify /caffeMS/src/caffe/proto/caffe.proto:

At around line 420, add the following line inside message LayerParameter. The field number 147 is arbitrary, but it must not duplicate a number already used by another field of LayerParameter.

message LayerParameter {
...
optional CenterLossParameter center_loss_param = 147; 
...
}

At the end of the file, add:

message CenterLossParameter {  
  optional uint32 num_output = 1; // The number of outputs for the layer  
  optional FillerParameter center_filler = 2; // The filler for the centers  
  // The first axis to be lumped into a single inner product computation;  
  // all preceding axes are retained in the output.  
  // May be negative to index from the end (e.g., -1 for the last axis).  
  optional int32 axis = 3 [default = 1];  
}

Then rebuild Caffe:

make all -j8
make test -j8
make runtest -j8
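After rebuilding, the layer can be referenced from a network definition. The fragment below is only a sketch. The blob names `fc5` and `label`, the value of `num_output`, the `xavier` filler and the `loss_weight` value are placeholder assumptions to adapt to your own network. Only `type: "CenterLoss"` and the `center_loss_param` fields come from the code and proto definition in this post.

```
layer {
  name: "center_loss"
  type: "CenterLoss"
  bottom: "fc5"      # feature blob (assumed name)
  bottom: "label"    # class labels, one per sample
  top: "center_loss"
  center_loss_param {
    num_output: 10   # number of classes, i.e. number of centers (assumed)
    center_filler {
      type: "xavier"
    }
  }
  loss_weight: 0.01  # assumed; balances this loss against the other losses
}
```

The layer expects exactly two bottoms, and the Reshape method in the source code checks that the label blob has channels, height and width all equal to 1.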

Source code

center_loss_layer.hpp

#ifndef CAFFE_CENTER_LOSS_LAYER_HPP_
#define CAFFE_CENTER_LOSS_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#include "caffe/layers/loss_layer.hpp"

namespace caffe {

template <typename Dtype>
class CenterLossLayer : public LossLayer<Dtype> {
 public:
  explicit CenterLossLayer(const LayerParameter& param)
      : LossLayer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "CenterLoss"; }
  virtual inline int ExactNumBottomBlobs() const { return 2; }
  virtual inline int ExactNumTopBlobs() const { return -1; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

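  int M_;  // batch size (bottom[0]->num())
  int K_;  // feature dimension per sample (bottom[0]->count(axis))
  int N_;  // number of classes, i.e. number of centers (num_output)

  Blob<Dtype> distance_;       // M_ x K_: x_i - c_{y_i}, cached by the forward pass
  Blob<Dtype> variation_sum_;  // N_ x K_: per-class accumulator for the center update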
};

}  // namespace caffe

#endif  // CAFFE_CENTER_LOSS_LAYER_HPP_

center_loss_layer.cpp

#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layers/center_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

template <typename Dtype>
void CenterLossLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const int num_output = this->layer_param_.center_loss_param().num_output();  
  N_ = num_output;
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.center_loss_param().axis());
  // Dimensions starting from "axis" are flattened into a single
  // length K_ feature vector. For example, if bottom[0]'s shape is (N, C, H, W)
  // and axis == 1, each of the N samples is a feature vector of dimension C*H*W.
  K_ = bottom[0]->count(axis);
  // Check if we need to set up the weights
  if (this->blobs_.size() > 0) {
    LOG(INFO) << "Skipping parameter initialization";
  } else {
    this->blobs_.resize(1);
    // Initialize the centers: one K_-dimensional center per class (N_ x K_)
    vector<int> center_shape(2);
    center_shape[0] = N_;
    center_shape[1] = K_;
    this->blobs_[0].reset(new Blob<Dtype>(center_shape));
    // fill the weights
    shared_ptr<Filler<Dtype> > center_filler(GetFiller<Dtype>(
        this->layer_param_.center_loss_param().center_filler()));
    center_filler->Fill(this->blobs_[0].get());

  }  // parameter initialization
  this->param_propagate_down_.resize(this->blobs_.size(), true);
}

template <typename Dtype>
void CenterLossLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(bottom[1]->channels(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  M_ = bottom[0]->num();
  // LossLayer::Reshape checks that data and label have the same batch size
  // and shapes top[0] as a scalar holding the loss.
  LossLayer<Dtype>::Reshape(bottom, top);
  distance_.ReshapeLike(*bottom[0]);
  variation_sum_.ReshapeLike(*this->blobs_[0]);
}

template <typename Dtype>
void CenterLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* label = bottom[1]->cpu_data();
  const Dtype* center = this->blobs_[0]->cpu_data();
  Dtype* distance_data = distance_.mutable_cpu_data();
  
  // the i-th distance_data
  for (int i = 0; i < M_; i++) {
    const int label_value = static_cast<int>(label[i]);
    // D(i,:) = X(i,:) - C(y(i),:)
    caffe_sub(K_, bottom_data + i * K_, center + label_value * K_, distance_data + i * K_);
  }
  Dtype dot = caffe_cpu_dot(M_ * K_, distance_.cpu_data(), distance_.cpu_data());
  Dtype loss = dot / M_ / Dtype(2);
  top[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void CenterLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  // Gradient with respect to centers
  if (this->param_propagate_down_[0]) {
    const Dtype* label = bottom[1]->cpu_data();
    Dtype* center_diff = this->blobs_[0]->mutable_cpu_diff();
    Dtype* variation_sum_data = variation_sum_.mutable_cpu_data();
    const Dtype* distance_data = distance_.cpu_data();

    // variation_sum(n,:) = -\sum_{m: y(m)==n} D(m,:), starting from zero
    caffe_set(N_ * K_, (Dtype)0., variation_sum_data);
    for (int n = 0; n < N_; n++) {
      int count = 0;
      for (int m = 0; m < M_; m++) {
        const int label_value = static_cast<int>(label[m]);
        if (label_value == n) {
          count++;
          caffe_sub(K_, variation_sum_data + n * K_, distance_data + m * K_, variation_sum_data + n * K_);
        }
      }
      // center_diff(n,:) += variation_sum(n,:) / (1 + count)
      caffe_axpy(K_, (Dtype)1./(count + (Dtype)1.), variation_sum_data + n * K_, center_diff + n * K_);
    }
  }
  // Gradient with respect to bottom data 
  if (propagate_down[0]) {
    caffe_copy(M_ * K_, distance_.cpu_data(), bottom[0]->mutable_cpu_diff());
    caffe_scal(M_ * K_, top[0]->cpu_diff()[0] / M_, bottom[0]->mutable_cpu_diff());
  }
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
}

#ifdef CPU_ONLY
STUB_GPU(CenterLossLayer);
#endif

INSTANTIATE_CLASS(CenterLossLayer);
REGISTER_LAYER_CLASS(CenterLoss);

}  // namespace caffe
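To check the arithmetic of Forward_cpu and Backward_cpu by hand, the following pure-Python sketch mirrors the same formulas on nested lists. It is a reading aid rather than the layer itself. The function names and the toy data are made up for illustration, and the sketch returns the center gradient directly, whereas the C++ code accumulates it into the existing diff with caffe_axpy.

```python
def center_loss_forward(features, labels, centers):
    """Return (loss, distance) with distance[i] = features[i] - centers[labels[i]]."""
    m = len(features)
    distance = [[x - c for x, c in zip(features[i], centers[labels[i]])]
                for i in range(m)]
    dot = sum(d * d for row in distance for d in row)
    return dot / m / 2.0, distance


def center_loss_backward(distance, labels, num_classes, top_diff=1.0):
    """Return (center_diff, bottom_diff) using the same formulas as Backward_cpu."""
    m, k = len(distance), len(distance[0])
    center_diff = []
    for n in range(num_classes):
        rows = [distance[i] for i in range(m) if labels[i] == n]
        # Negated sum of the distances of class n, divided by (count + 1).
        center_diff.append([-sum(r[j] for r in rows) / (len(rows) + 1.0)
                            for j in range(k)])
    # Gradient w.r.t. the features: distance scaled by top_diff / M.
    bottom_diff = [[top_diff / m * d for d in row] for row in distance]
    return center_diff, bottom_diff


features = [[1.0, 2.0], [3.0, 4.0]]  # M = 2 samples, K = 2 features
labels = [0, 0]                      # both samples belong to class 0
centers = [[0.0, 0.0], [1.0, 1.0]]   # N = 2 classes

loss, distance = center_loss_forward(features, labels, centers)
center_diff, bottom_diff = center_loss_backward(distance, labels, num_classes=2)
print(loss)         # 7.5
print(bottom_diff)  # [[0.5, 1.0], [1.5, 2.0]]
```

The `count + 1` in the denominator keeps the update finite for classes that do not appear in the current batch: their center gradient is simply zero.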

center_loss_layer.cu

#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layers/center_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

template <typename Dtype>
__global__ void Compute_distance_data_gpu(int nthreads, const int K, const Dtype* bottom,
	      const Dtype* label, const Dtype* center, Dtype* distance) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    int m = index / K;
    int k = index % K;
    const int label_value = static_cast<int>(label[m]);
    // distance(i) = x(i) - c_{y(i)}
    distance[index] = bottom[index] - center[label_value * K + k];
  }
}

template <typename Dtype>
__global__ void Compute_center_diff_gpu(int nthreads, const int M, const int K, 
        const Dtype* label, const Dtype* distance, Dtype* variation_sum, 
        Dtype* center_diff) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    int count = 0;
    for (int m = 0; m < M; m++) {
      const int label_value = static_cast<int>(label[m]);
      if (label_value == index) {
        count++;
        for (int k = 0; k < K; k++) {
          variation_sum[index * K + k] -= distance[m * K + k];
        }
      }
    }
    for (int k = 0; k < K; k++) {
      center_diff[index * K + k] = variation_sum[index * K + k] /(count + (Dtype)1.);
    }
  }
}


template <typename Dtype>
void CenterLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  int nthreads = M_ * K_;
  Compute_distance_data_gpu<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
      CAFFE_CUDA_NUM_THREADS>>>(nthreads, K_, bottom[0]->gpu_data(), bottom[1]->gpu_data(),
                                this->blobs_[0]->gpu_data(), distance_.mutable_gpu_data());
  Dtype dot;
  caffe_gpu_dot(M_ * K_, distance_.gpu_data(), distance_.gpu_data(), &dot);
  Dtype loss = dot / M_ / Dtype(2);
  top[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void CenterLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  int nthreads = N_;
  // variation_sum_ is read and written on the device, so use its GPU pointer.
  caffe_gpu_set(N_ * K_, (Dtype)0., variation_sum_.mutable_gpu_data());
  Compute_center_diff_gpu<Dtype><<<CAFFE_GET_BLOCKS(nthreads),
      CAFFE_CUDA_NUM_THREADS>>>(nthreads, M_, K_, bottom[1]->gpu_data(), distance_.gpu_data(),
                                variation_sum_.mutable_gpu_data(), this->blobs_[0]->mutable_gpu_diff());

  if (propagate_down[0]) {
    caffe_gpu_scale(M_ * K_, top[0]->cpu_diff()[0] / M_, 
                             distance_.gpu_data(), bottom[0]->mutable_gpu_diff());
  }
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
}

INSTANTIATE_LAYER_GPU_FUNCS(CenterLossLayer);

}  // namespace caffe
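The GPU kernels work on flat, row-major buffers, which can make the indexing harder to read than the CPU loops. As a rough illustration, the sketch below emulates Compute_distance_data_gpu in plain Python, with one loop iteration standing in for one CUDA thread. The function name and the toy buffers are illustrative only.

```python
def compute_distance_flat(bottom, label, center, K):
    """Emulate Compute_distance_data_gpu: one loop iteration per thread index."""
    distance = [0.0] * len(bottom)
    for index in range(len(bottom)):
        m, k = divmod(index, K)  # sample row m and feature column k
        label_value = int(label[m])
        distance[index] = bottom[index] - center[label_value * K + k]
    return distance


bottom = [1.0, 2.0, 3.0, 4.0]  # M = 2 samples, K = 2 features, row-major
label = [1.0, 0.0]             # labels are stored as floats in Caffe blobs
center = [0.0, 0.0, 1.0, 1.0]  # N = 2 classes, row-major
distance = compute_distance_flat(bottom, label, center, 2)
print(distance)  # [0.0, 1.0, 3.0, 4.0]
```

Compute_center_diff_gpu follows a different scheme: it launches one thread per class (nthreads = N_), and each thread loops over all M samples. That avoids write conflicts between threads, at the cost of serial work inside each thread.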

This concludes the walkthrough of implementing DepthwiseConvolution, shuffle channel and CenterLoss in Caffe.

I hope it helps. Thanks.
2019.7.12
