Caffe + cuDNN v5

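At this year's GTC, NVIDIA announced the NVIDIA Deep Learning SDK, and cuDNN features prominently in it, alongside DIGITS, cuBLAS, cuSPARSE, NCCL and others.

Since its first release in 2014, cuDNN has now gone through five versions.

The new features in cuDNN v5 include: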


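(1) Support for recurrent neural networks (LSTM / GRU / RNN);

(2) cudnnConvolutionForward() and cudnnConvolutionBackwardData() now support a 3D FFT tiling algorithm;

(3) cudnnConvolutionForward() and cudnnConvolutionBackwardData() now support new algorithms, CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD and CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD respectively. The Winograd algorithm reduces the number of multiplications needed for 3x3 convolutions;

(4) Two new layer types: Bilinear Spatial Transformer and Dropout.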
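
To make the Winograd claim concrete, here is a minimal one-dimensional sketch of the F(2,3) case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. This only illustrates the idea. cuDNN's kernels work on 2D tiles and their internals are not shown in this post, and the function names direct_f23 and winograd_f23 are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Direct 1D correlation: 2 outputs x 3 taps = 6 multiplications.
inline std::array<float, 2> direct_f23(const std::array<float, 4>& d,
                                       const std::array<float, 3>& g) {
  return {d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
          d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
}

// Winograd F(2,3): the same 2 outputs with 4 data-by-filter multiplications.
inline std::array<float, 2> winograd_f23(const std::array<float, 4>& d,
                                         const std::array<float, 3>& g) {
  // Filter transform. In a convolution layer this depends only on the
  // weights, so it can be computed once per filter and reused.
  const float u0 = g[0];
  const float u1 = 0.5f * (g[0] + g[1] + g[2]);
  const float u2 = 0.5f * (g[0] - g[1] + g[2]);
  const float u3 = g[2];

  // The four multiplications between transformed data and transformed filter.
  const float m1 = (d[0] - d[2]) * u0;
  const float m2 = (d[1] + d[2]) * u1;
  const float m3 = (d[2] - d[1]) * u2;
  const float m4 = (d[1] - d[3]) * u3;

  // Output transform: additions only.
  return {m1 + m2 + m3, m2 - m3 - m4};
}
```

Both functions should agree up to floating-point rounding for any input. Nesting the same trick over rows and columns gives the 2D F(2x2, 3x3) variant, which needs 16 multiplications per 2x2 output tile instead of 36.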





The official Caffe currently supports only cuDNN v4. Using v5 requires modifying the code by hand.


First, look at this function in v5:

/* Function to perform backward activation  */
cudnnStatus_t CUDNNWINAPI cudnnActivationBackward(
                                cudnnHandle_t                       handle,
                                cudnnActivationDescriptor_t         activationDesc,
                                const void                         *alpha,
                                const cudnnTensorDescriptor_t       yDesc,
                                const void                         *y,
                                const cudnnTensorDescriptor_t       dyDesc,
                                const void                         *dy,
                                const cudnnTensorDescriptor_t       xDesc,
                                const void                         *x,
                                const void                         *beta,
                                const cudnnTensorDescriptor_t       dxDesc,
                                void                               *dx );

The calling code in Caffe, however, is:

  CUDNN_CHECK(cudnnActivationBackward(this->handle_,
        CUDNN_ACTIVATION_RELU,
        cudnn::dataType<Dtype>::one,
        this->top_desc_, top_data, this->top_desc_, top_diff,
        this->bottom_desc_, bottom_data,
        cudnn::dataType<Dtype>::zero,
        this->bottom_desc_, bottom_diff));

The main difference is the second argument.

/*
 * activation mode
 */
typedef enum
{
    CUDNN_ACTIVATION_SIGMOID      = 0,
    CUDNN_ACTIVATION_RELU         = 1,
    CUDNN_ACTIVATION_TANH         = 2,
    CUDNN_ACTIVATION_CLIPPED_RELU = 3
} cudnnActivationMode_t;


v4 passed a cudnnActivationMode_t directly; v5 replaces it with the new cudnnActivationDescriptor_t, which is created and configured with:

cudnnStatus_t CUDNNWINAPI cudnnCreateActivationDescriptor(
                                cudnnActivationDescriptor_t        *activationDesc);

cudnnStatus_t CUDNNWINAPI cudnnSetActivationDescriptor(
                                cudnnActivationDescriptor_t         activationDesc,
                                cudnnActivationMode_t               mode,
                                cudnnNanPropagation_t               reluNanOpt,
                                double                              reluCeiling );

To adapt, this part of the Caffe code has to be rewritten in the v5 form. The files involved are:

include/caffe/layers/cudnn_relu_layer.hpp, src/caffe/layers/cudnn_relu_layer.cpp, src/caffe/layers/cudnn_relu_layer.cu

include/caffe/layers/cudnn_sigmoid_layer.hpp, src/caffe/layers/cudnn_sigmoid_layer.cpp, src/caffe/layers/cudnn_sigmoid_layer.cu

include/caffe/layers/cudnn_tanh_layer.hpp, src/caffe/layers/cudnn_tanh_layer.cpp, src/caffe/layers/cudnn_tanh_layer.cu
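
As a rough sketch of what the v5 form could look like, the snippet below wraps the descriptor in a small RAII class. ActivationDesc and uses_activation_descriptor are hypothetical names for illustration, not Caffe classes. The choice of CUDNN_PROPAGATE_NAN and a ceiling of 0.0 is an assumption, since the ceiling should only matter for CUDNN_ACTIVATION_CLIPPED_RELU. The cuDNN part is compiled only when cudnn.h is found, so the file still builds on a machine without cuDNN.

```cpp
#include <cassert>
#include <stdexcept>

// Per the text above: v4 takes cudnnActivationMode_t directly,
// v5 expects a cudnnActivationDescriptor_t instead.
constexpr bool uses_activation_descriptor(int cudnn_major) {
  return cudnn_major >= 5;
}

#if defined(__has_include)
#if __has_include(<cudnn.h>)
#include <cudnn.h>

static_assert(uses_activation_descriptor(CUDNN_MAJOR),
              "this sketch targets cuDNN v5 or newer");

// Hypothetical RAII wrapper (not part of Caffe): owns one activation
// descriptor for the lifetime of a layer.
class ActivationDesc {
 public:
  explicit ActivationDesc(cudnnActivationMode_t mode) {
    check(cudnnCreateActivationDescriptor(&desc_));
    // Assumption: propagate NaNs; 0.0 is a placeholder ceiling that is
    // only meaningful for CUDNN_ACTIVATION_CLIPPED_RELU.
    const cudnnStatus_t status =
        cudnnSetActivationDescriptor(desc_, mode, CUDNN_PROPAGATE_NAN, 0.0);
    if (status != CUDNN_STATUS_SUCCESS) {
      cudnnDestroyActivationDescriptor(desc_);
      check(status);
    }
  }
  ~ActivationDesc() { cudnnDestroyActivationDescriptor(desc_); }

  // A descriptor is a unique resource, so copying is disabled.
  ActivationDesc(const ActivationDesc&) = delete;
  ActivationDesc& operator=(const ActivationDesc&) = delete;

  cudnnActivationDescriptor_t get() const { return desc_; }

 private:
  static void check(cudnnStatus_t status) {
    if (status != CUDNN_STATUS_SUCCESS) {
      throw std::runtime_error(cudnnGetErrorString(status));
    }
  }
  cudnnActivationDescriptor_t desc_ = nullptr;
};

#endif  // __has_include(<cudnn.h>)
#endif  // defined(__has_include)
```

In each of the layers listed above, one would presumably keep such a descriptor as a member (built with CUDNN_ACTIVATION_RELU, CUDNN_ACTIVATION_SIGMOID or CUDNN_ACTIVATION_TANH) and pass its get() value as the second argument of cudnnActivationForward() and cudnnActivationBackward(), where the v4 code passed the enum value.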



After Caffe + cuDNN v5 compiled successfully on Alibaba Cloud HPC, ldd reports:

caffe]# ldd ./build/tools/caffe.bin  | grep cudnn
	libcudnn.so.5 => /disk1/deeplearning/local_install/lib/libcudnn.so.5 (0x00007f14a2339000)


Running the timing benchmark produces the following output:

caffe]# ./build/tools/caffe.bin time -model models/bvlc_reference_caffenet/deploy.prototxt  -gpu 0
I0415 17:57:29.857424 45092 caffe.cpp:308] Use GPU with device ID 0
I0415 17:57:30.240999 45092 net.cpp:49] Initializing net from parameters:
name: "CaffeNet"
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 10
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
I0415 17:57:30.242352 45092 layer_factory.hpp:77] Creating layer data
I0415 17:57:30.242380 45092 net.cpp:91] Creating Layer data
I0415 17:57:30.242396 45092 net.cpp:399] data -> data
I0415 17:57:30.253039 45092 net.cpp:141] Setting up data
I0415 17:57:30.253085 45092 net.cpp:148] Top shape: 10 3 227 227 (1545870)
I0415 17:57:30.253098 45092 net.cpp:156] Memory required for data: 6183480
I0415 17:57:30.253118 45092 layer_factory.hpp:77] Creating layer conv1
I0415 17:57:30.253144 45092 net.cpp:91] Creating Layer conv1
I0415 17:57:30.253157 45092 net.cpp:425] conv1 <- data
I0415 17:57:30.253180 45092 net.cpp:399] conv1 -> conv1
I0415 17:57:30.489358 45092 net.cpp:141] Setting up conv1
I0415 17:57:30.489420 45092 net.cpp:148] Top shape: 10 96 55 55 (2904000)
I0415 17:57:30.489434 45092 net.cpp:156] Memory required for data: 17799480
I0415 17:57:30.489470 45092 layer_factory.hpp:77] Creating layer relu1
I0415 17:57:30.489495 45092 net.cpp:91] Creating Layer relu1
I0415 17:57:30.489507 45092 net.cpp:425] relu1 <- conv1
I0415 17:57:30.489526 45092 net.cpp:386] relu1 -> conv1 (in-place)
I0415 17:57:30.489871 45092 net.cpp:141] Setting up relu1
I0415 17:57:30.489892 45092 net.cpp:148] Top shape: 10 96 55 55 (2904000)
I0415 17:57:30.489904 45092 net.cpp:156] Memory required for data: 29415480
I0415 17:57:30.489917 45092 layer_factory.hpp:77] Creating layer pool1
I0415 17:57:30.489933 45092 net.cpp:91] Creating Layer pool1
I0415 17:57:30.489943 45092 net.cpp:425] pool1 <- conv1
I0415 17:57:30.489967 45092 net.cpp:399] pool1 -> pool1
I0415 17:57:30.490033 45092 net.cpp:141] Setting up pool1
I0415 17:57:30.490048 45092 net.cpp:148] Top shape: 10 96 27 27 (699840)
I0415 17:57:30.490059 45092 net.cpp:156] Memory required for data: 32214840
I0415 17:57:30.490070 45092 layer_factory.hpp:77] Creating layer norm1
I0415 17:57:30.490090 45092 net.cpp:91] Creating Layer norm1
I0415 17:57:30.490102 45092 net.cpp:425] norm1 <- pool1
I0415 17:57:30.490113 45092 net.cpp:399] norm1 -> norm1
I0415 17:57:30.490322 45092 net.cpp:141] Setting up norm1
I0415 17:57:30.490339 45092 net.cpp:148] Top shape: 10 96 27 27 (699840)
I0415 17:57:30.490350 45092 net.cpp:156] Memory required for data: 35014200
I0415 17:57:30.490362 45092 layer_factory.hpp:77] Creating layer conv2
I0415 17:57:30.490383 45092 net.cpp:91] Creating Layer conv2
I0415 17:57:30.490393 45092 net.cpp:425] conv2 <- norm1
I0415 17:57:30.490406 45092 net.cpp:399] conv2 -> conv2
I0415 17:57:30.493661 45092 net.cpp:141] Setting up conv2
I0415 17:57:30.493688 45092 net.cpp:148] Top shape: 10 256 27 27 (1866240)
I0415 17:57:30.493701 45092 net.cpp:156] Memory required for data: 42479160
I0415 17:57:30.493717 45092 layer_factory.hpp:77] Creating layer relu2
I0415 17:57:30.493734 45092 net.cpp:91] Creating Layer relu2
I0415 17:57:30.493744 45092 net.cpp:425] relu2 <- conv2
I0415 17:57:30.493757 45092 net.cpp:386] relu2 -> conv2 (in-place)
I0415 17:57:30.493957 45092 net.cpp:141] Setting up relu2
I0415 17:57:30.493981 45092 net.cpp:148] Top shape: 10 256 27 27 (1866240)
I0415 17:57:30.493993 45092 net.cpp:156] Memory required for data: 49944120
I0415 17:57:30.494004 45092 layer_factory.hpp:77] Creating layer pool2
I0415 17:57:30.494019 45092 net.cpp:91] Creating Layer pool2
I0415 17:57:30.494029 45092 net.cpp:425] pool2 <- conv2
I0415 17:57:30.494045 45092 net.cpp:399] pool2 -> pool2
I0415 17:57:30.494102 45092 net.cpp:141] Setting up pool2
I0415 17:57:30.494115 45092 net.cpp:148] Top shape: 10 256 13 13 (432640)
I0415 17:57:30.494128 45092 net.cpp:156] Memory required for data: 51674680
I0415 17:57:30.494139 45092 layer_factory.hpp:77] Creating layer norm2
I0415 17:57:30.494160 45092 net.cpp:91] Creating Layer norm2
I0415 17:57:30.494170 45092 net.cpp:425] norm2 <- pool2
I0415 17:57:30.494184 45092 net.cpp:399] norm2 -> norm2
I0415 17:57:30.494521 45092 net.cpp:141] Setting up norm2
I0415 17:57:30.494545 45092 net.cpp:148] Top shape: 10 256 13 13 (432640)
I0415 17:57:30.494557 45092 net.cpp:156] Memory required for data: 53405240
I0415 17:57:30.494568 45092 layer_factory.hpp:77] Creating layer conv3
I0415 17:57:30.494582 45092 net.cpp:91] Creating Layer conv3
I0415 17:57:30.494592 45092 net.cpp:425] conv3 <- norm2
I0415 17:57:30.494607 45092 net.cpp:399] conv3 -> conv3
I0415 17:57:30.498520 45092 net.cpp:141] Setting up conv3
I0415 17:57:30.498548 45092 net.cpp:148] Top shape: 10 384 13 13 (648960)
I0415 17:57:30.498564 45092 net.cpp:156] Memory required for data: 56001080
I0415 17:57:30.498584 45092 layer_factory.hpp:77] Creating layer relu3
I0415 17:57:30.498600 45092 net.cpp:91] Creating Layer relu3
I0415 17:57:30.498611 45092 net.cpp:425] relu3 <- conv3
I0415 17:57:30.498625 45092 net.cpp:386] relu3 -> conv3 (in-place)
I0415 17:57:30.498832 45092 net.cpp:141] Setting up relu3
I0415 17:57:30.498850 45092 net.cpp:148] Top shape: 10 384 13 13 (648960)
I0415 17:57:30.498863 45092 net.cpp:156] Memory required for data: 58596920
I0415 17:57:30.498878 45092 layer_factory.hpp:77] Creating layer conv4
I0415 17:57:30.498896 45092 net.cpp:91] Creating Layer conv4
I0415 17:57:30.498906 45092 net.cpp:425] conv4 <- conv3
I0415 17:57:30.498920 45092 net.cpp:399] conv4 -> conv4
I0415 17:57:30.502863 45092 net.cpp:141] Setting up conv4
I0415 17:57:30.502892 45092 net.cpp:148] Top shape: 10 384 13 13 (648960)
I0415 17:57:30.502905 45092 net.cpp:156] Memory required for data: 61192760
I0415 17:57:30.502924 45092 layer_factory.hpp:77] Creating layer relu4
I0415 17:57:30.502939 45092 net.cpp:91] Creating Layer relu4
I0415 17:57:30.502954 45092 net.cpp:425] relu4 <- conv4
I0415 17:57:30.502974 45092 net.cpp:386] relu4 -> conv4 (in-place)
I0415 17:57:30.503175 45092 net.cpp:141] Setting up relu4
I0415 17:57:30.503192 45092 net.cpp:148] Top shape: 10 384 13 13 (648960)
I0415 17:57:30.503201 45092 net.cpp:156] Memory required for data: 63788600
I0415 17:57:30.503211 45092 layer_factory.hpp:77] Creating layer conv5
I0415 17:57:30.503227 45092 net.cpp:91] Creating Layer conv5
I0415 17:57:30.503237 45092 net.cpp:425] conv5 <- conv4
I0415 17:57:30.503250 45092 net.cpp:399] conv5 -> conv5
I0415 17:57:30.506659 45092 net.cpp:141] Setting up conv5
I0415 17:57:30.506685 45092 net.cpp:148] Top shape: 10 256 13 13 (432640)
I0415 17:57:30.506706 45092 net.cpp:156] Memory required for data: 65519160
I0415 17:57:30.506731 45092 layer_factory.hpp:77] Creating layer relu5
I0415 17:57:30.506744 45092 net.cpp:91] Creating Layer relu5
I0415 17:57:30.506757 45092 net.cpp:425] relu5 <- conv5
I0415 17:57:30.506770 45092 net.cpp:386] relu5 -> conv5 (in-place)
I0415 17:57:30.506974 45092 net.cpp:141] Setting up relu5
I0415 17:57:30.506992 45092 net.cpp:148] Top shape: 10 256 13 13 (432640)
I0415 17:57:30.507005 45092 net.cpp:156] Memory required for data: 67249720
I0415 17:57:30.507015 45092 layer_factory.hpp:77] Creating layer pool5
I0415 17:57:30.507032 45092 net.cpp:91] Creating Layer pool5
I0415 17:57:30.507043 45092 net.cpp:425] pool5 <- conv5
I0415 17:57:30.507057 45092 net.cpp:399] pool5 -> pool5
I0415 17:57:30.507122 45092 net.cpp:141] Setting up pool5
I0415 17:57:30.507135 45092 net.cpp:148] Top shape: 10 256 6 6 (92160)
I0415 17:57:30.507148 45092 net.cpp:156] Memory required for data: 67618360
I0415 17:57:30.507159 45092 layer_factory.hpp:77] Creating layer fc6
I0415 17:57:30.507181 45092 net.cpp:91] Creating Layer fc6
I0415 17:57:30.507192 45092 net.cpp:425] fc6 <- pool5
I0415 17:57:30.507206 45092 net.cpp:399] fc6 -> fc6
I0415 17:57:30.612267 45092 net.cpp:141] Setting up fc6
I0415 17:57:30.612323 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.612334 45092 net.cpp:156] Memory required for data: 67782200
I0415 17:57:30.612360 45092 layer_factory.hpp:77] Creating layer relu6
I0415 17:57:30.612383 45092 net.cpp:91] Creating Layer relu6
I0415 17:57:30.612396 45092 net.cpp:425] relu6 <- fc6
I0415 17:57:30.612413 45092 net.cpp:386] relu6 -> fc6 (in-place)
I0415 17:57:30.612900 45092 net.cpp:141] Setting up relu6
I0415 17:57:30.612920 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.612936 45092 net.cpp:156] Memory required for data: 67946040
I0415 17:57:30.612949 45092 layer_factory.hpp:77] Creating layer drop6
I0415 17:57:30.612979 45092 net.cpp:91] Creating Layer drop6
I0415 17:57:30.612992 45092 net.cpp:425] drop6 <- fc6
I0415 17:57:30.613004 45092 net.cpp:386] drop6 -> fc6 (in-place)
I0415 17:57:30.613055 45092 net.cpp:141] Setting up drop6
I0415 17:57:30.613068 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.613080 45092 net.cpp:156] Memory required for data: 68109880
I0415 17:57:30.613091 45092 layer_factory.hpp:77] Creating layer fc7
I0415 17:57:30.613108 45092 net.cpp:91] Creating Layer fc7
I0415 17:57:30.613118 45092 net.cpp:425] fc7 <- fc6
I0415 17:57:30.613138 45092 net.cpp:399] fc7 -> fc7
I0415 17:57:30.660694 45092 net.cpp:141] Setting up fc7
I0415 17:57:30.660760 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.660773 45092 net.cpp:156] Memory required for data: 68273720
I0415 17:57:30.660797 45092 layer_factory.hpp:77] Creating layer relu7
I0415 17:57:30.660817 45092 net.cpp:91] Creating Layer relu7
I0415 17:57:30.660830 45092 net.cpp:425] relu7 <- fc7
I0415 17:57:30.660851 45092 net.cpp:386] relu7 -> fc7 (in-place)
I0415 17:57:30.661133 45092 net.cpp:141] Setting up relu7
I0415 17:57:30.661188 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.661201 45092 net.cpp:156] Memory required for data: 68437560
I0415 17:57:30.661212 45092 layer_factory.hpp:77] Creating layer drop7
I0415 17:57:30.661227 45092 net.cpp:91] Creating Layer drop7
I0415 17:57:30.661238 45092 net.cpp:425] drop7 <- fc7
I0415 17:57:30.661250 45092 net.cpp:386] drop7 -> fc7 (in-place)
I0415 17:57:30.661293 45092 net.cpp:141] Setting up drop7
I0415 17:57:30.661305 45092 net.cpp:148] Top shape: 10 4096 (40960)
I0415 17:57:30.661320 45092 net.cpp:156] Memory required for data: 68601400
I0415 17:57:30.661331 45092 layer_factory.hpp:77] Creating layer fc8
I0415 17:57:30.661346 45092 net.cpp:91] Creating Layer fc8
I0415 17:57:30.661358 45092 net.cpp:425] fc8 <- fc7
I0415 17:57:30.661375 45092 net.cpp:399] fc8 -> fc8
I0415 17:57:30.673393 45092 net.cpp:141] Setting up fc8
I0415 17:57:30.673418 45092 net.cpp:148] Top shape: 10 1000 (10000)
I0415 17:57:30.673435 45092 net.cpp:156] Memory required for data: 68641400
I0415 17:57:30.673454 45092 layer_factory.hpp:77] Creating layer prob
I0415 17:57:30.673476 45092 net.cpp:91] Creating Layer prob
I0415 17:57:30.673487 45092 net.cpp:425] prob <- fc8
I0415 17:57:30.673506 45092 net.cpp:399] prob -> prob
I0415 17:57:30.673952 45092 net.cpp:141] Setting up prob
I0415 17:57:30.673980 45092 net.cpp:148] Top shape: 10 1000 (10000)
I0415 17:57:30.673995 45092 net.cpp:156] Memory required for data: 68681400
I0415 17:57:30.674005 45092 net.cpp:219] prob does not need backward computation.
I0415 17:57:30.674015 45092 net.cpp:219] fc8 does not need backward computation.
I0415 17:57:30.674026 45092 net.cpp:219] drop7 does not need backward computation.
I0415 17:57:30.674036 45092 net.cpp:219] relu7 does not need backward computation.
I0415 17:57:30.674049 45092 net.cpp:219] fc7 does not need backward computation.
I0415 17:57:30.674062 45092 net.cpp:219] drop6 does not need backward computation.
I0415 17:57:30.674072 45092 net.cpp:219] relu6 does not need backward computation.
I0415 17:57:30.674084 45092 net.cpp:219] fc6 does not need backward computation.
I0415 17:57:30.674093 45092 net.cpp:219] pool5 does not need backward computation.
I0415 17:57:30.674103 45092 net.cpp:219] relu5 does not need backward computation.
I0415 17:57:30.674114 45092 net.cpp:219] conv5 does not need backward computation.
I0415 17:57:30.674129 45092 net.cpp:219] relu4 does not need backward computation.
I0415 17:57:30.674140 45092 net.cpp:219] conv4 does not need backward computation.
I0415 17:57:30.674154 45092 net.cpp:219] relu3 does not need backward computation.
I0415 17:57:30.674168 45092 net.cpp:219] conv3 does not need backward computation.
I0415 17:57:30.674180 45092 net.cpp:219] norm2 does not need backward computation.
I0415 17:57:30.674190 45092 net.cpp:219] pool2 does not need backward computation.
I0415 17:57:30.674201 45092 net.cpp:219] relu2 does not need backward computation.
I0415 17:57:30.674218 45092 net.cpp:219] conv2 does not need backward computation.
I0415 17:57:30.674228 45092 net.cpp:219] norm1 does not need backward computation.
I0415 17:57:30.674244 45092 net.cpp:219] pool1 does not need backward computation.
I0415 17:57:30.674258 45092 net.cpp:219] relu1 does not need backward computation.
I0415 17:57:30.674270 45092 net.cpp:219] conv1 does not need backward computation.
I0415 17:57:30.674281 45092 net.cpp:219] data does not need backward computation.
I0415 17:57:30.674288 45092 net.cpp:261] This network produces output prob
I0415 17:57:30.674320 45092 net.cpp:274] Network initialization done.
I0415 17:57:30.674417 45092 caffe.cpp:320] Performing Forward
I0415 17:57:30.710207 45092 caffe.cpp:325] Initial loss: 0
I0415 17:57:30.710258 45092 caffe.cpp:326] Performing Backward
I0415 17:57:30.710271 45092 caffe.cpp:334] *** Benchmark begins ***
I0415 17:57:30.710284 45092 caffe.cpp:335] Testing for 50 iterations.
I0415 17:57:30.728257 45092 caffe.cpp:363] Iteration: 1 forward-backward time: 17.9167 ms.
I0415 17:57:30.742831 45092 caffe.cpp:363] Iteration: 2 forward-backward time: 14.5289 ms.
I0415 17:57:30.757328 45092 caffe.cpp:363] Iteration: 3 forward-backward time: 14.4394 ms.
I0415 17:57:30.771814 45092 caffe.cpp:363] Iteration: 4 forward-backward time: 14.4486 ms.
I0415 17:57:30.786324 45092 caffe.cpp:363] Iteration: 5 forward-backward time: 14.4736 ms.
I0415 17:57:30.800889 45092 caffe.cpp:363] Iteration: 6 forward-backward time: 14.5299 ms.
I0415 17:57:30.815471 45092 caffe.cpp:363] Iteration: 7 forward-backward time: 14.5472 ms.
I0415 17:57:30.830006 45092 caffe.cpp:363] Iteration: 8 forward-backward time: 14.5004 ms.
I0415 17:57:30.844482 45092 caffe.cpp:363] Iteration: 9 forward-backward time: 14.4412 ms.
I0415 17:57:30.858999 45092 caffe.cpp:363] Iteration: 10 forward-backward time: 14.4823 ms.
I0415 17:57:30.873497 45092 caffe.cpp:363] Iteration: 11 forward-backward time: 14.463 ms.
I0415 17:57:30.887987 45092 caffe.cpp:363] Iteration: 12 forward-backward time: 14.4556 ms.
I0415 17:57:30.902302 45092 caffe.cpp:363] Iteration: 13 forward-backward time: 14.2781 ms.
I0415 17:57:30.915726 45092 caffe.cpp:363] Iteration: 14 forward-backward time: 13.3916 ms.
I0415 17:57:30.929179 45092 caffe.cpp:363] Iteration: 15 forward-backward time: 13.4184 ms.
I0415 17:57:30.942584 45092 caffe.cpp:363] Iteration: 16 forward-backward time: 13.3694 ms.
I0415 17:57:30.956038 45092 caffe.cpp:363] Iteration: 17 forward-backward time: 13.4208 ms.
I0415 17:57:30.969511 45092 caffe.cpp:363] Iteration: 18 forward-backward time: 13.4377 ms.
I0415 17:57:30.982931 45092 caffe.cpp:363] Iteration: 19 forward-backward time: 13.3858 ms.
I0415 17:57:30.996414 45092 caffe.cpp:363] Iteration: 20 forward-backward time: 13.4489 ms.
I0415 17:57:31.009891 45092 caffe.cpp:363] Iteration: 21 forward-backward time: 13.4417 ms.
I0415 17:57:31.023388 45092 caffe.cpp:363] Iteration: 22 forward-backward time: 13.4623 ms.
I0415 17:57:31.036834 45092 caffe.cpp:363] Iteration: 23 forward-backward time: 13.4093 ms.
I0415 17:57:31.050341 45092 caffe.cpp:363] Iteration: 24 forward-backward time: 13.4735 ms.
I0415 17:57:31.064051 45092 caffe.cpp:363] Iteration: 25 forward-backward time: 13.6753 ms.
I0415 17:57:31.077127 45092 caffe.cpp:363] Iteration: 26 forward-backward time: 13.0419 ms.
I0415 17:57:31.090260 45092 caffe.cpp:363] Iteration: 27 forward-backward time: 13.0997 ms.
I0415 17:57:31.103308 45092 caffe.cpp:363] Iteration: 28 forward-backward time: 13.0143 ms.
I0415 17:57:31.116411 45092 caffe.cpp:363] Iteration: 29 forward-backward time: 13.0691 ms.
I0415 17:57:31.131355 45092 caffe.cpp:363] Iteration: 30 forward-backward time: 14.9067 ms.
I0415 17:57:31.144529 45092 caffe.cpp:363] Iteration: 31 forward-backward time: 13.1344 ms.
I0415 17:57:31.157615 45092 caffe.cpp:363] Iteration: 32 forward-backward time: 13.0509 ms.
I0415 17:57:31.170734 45092 caffe.cpp:363] Iteration: 33 forward-backward time: 13.0845 ms.
I0415 17:57:31.183795 45092 caffe.cpp:363] Iteration: 34 forward-backward time: 13.0256 ms.
I0415 17:57:31.196889 45092 caffe.cpp:363] Iteration: 35 forward-backward time: 13.0594 ms.
I0415 17:57:31.210055 45092 caffe.cpp:363] Iteration: 36 forward-backward time: 13.1316 ms.
I0415 17:57:31.223122 45092 caffe.cpp:363] Iteration: 37 forward-backward time: 13.0324 ms.
I0415 17:57:31.236204 45092 caffe.cpp:363] Iteration: 38 forward-backward time: 13.0479 ms.
I0415 17:57:31.249371 45092 caffe.cpp:363] Iteration: 39 forward-backward time: 13.1312 ms.
I0415 17:57:31.262531 45092 caffe.cpp:363] Iteration: 40 forward-backward time: 13.1264 ms.
I0415 17:57:31.275607 45092 caffe.cpp:363] Iteration: 41 forward-backward time: 13.0423 ms.
I0415 17:57:31.288681 45092 caffe.cpp:363] Iteration: 42 forward-backward time: 13.0402 ms.
I0415 17:57:31.301772 45092 caffe.cpp:363] Iteration: 43 forward-backward time: 13.0568 ms.
I0415 17:57:31.314856 45092 caffe.cpp:363] Iteration: 44 forward-backward time: 13.048 ms.
I0415 17:57:31.327955 45092 caffe.cpp:363] Iteration: 45 forward-backward time: 13.0655 ms.
I0415 17:57:31.341047 45092 caffe.cpp:363] Iteration: 46 forward-backward time: 13.0522 ms.
I0415 17:57:31.354151 45092 caffe.cpp:363] Iteration: 47 forward-backward time: 13.0704 ms.
I0415 17:57:31.367300 45092 caffe.cpp:363] Iteration: 48 forward-backward time: 13.0992 ms.
I0415 17:57:31.380395 45092 caffe.cpp:363] Iteration: 49 forward-backward time: 13.0612 ms.
I0415 17:57:31.393465 45092 caffe.cpp:363] Iteration: 50 forward-backward time: 13.0353 ms.
I0415 17:57:31.393489 45092 caffe.cpp:366] Average time per layer:
I0415 17:57:31.393506 45092 caffe.cpp:369]       data	forward: 0.00180352 ms.
I0415 17:57:31.393535 45092 caffe.cpp:372]       data	backward: 0.0016416 ms.
I0415 17:57:31.393554 45092 caffe.cpp:369]      conv1	forward: 0.70582 ms.
I0415 17:57:31.393573 45092 caffe.cpp:372]      conv1	backward: 0.706209 ms.
I0415 17:57:31.393594 45092 caffe.cpp:369]      relu1	forward: 0.114414 ms.
I0415 17:57:31.393611 45092 caffe.cpp:372]      relu1	backward: 0.0016096 ms.
I0415 17:57:31.393631 45092 caffe.cpp:369]      pool1	forward: 0.12752 ms.
I0415 17:57:31.393651 45092 caffe.cpp:372]      pool1	backward: 0.00160192 ms.
I0415 17:57:31.393666 45092 caffe.cpp:369]      norm1	forward: 0.0773696 ms.
I0415 17:57:31.393679 45092 caffe.cpp:372]      norm1	backward: 0.206209 ms.
I0415 17:57:31.393697 45092 caffe.cpp:369]      conv2	forward: 1.12211 ms.
I0415 17:57:31.393710 45092 caffe.cpp:372]      conv2	backward: 1.12579 ms.
I0415 17:57:31.393728 45092 caffe.cpp:369]      relu2	forward: 0.0755533 ms.
I0415 17:57:31.393745 45092 caffe.cpp:372]      relu2	backward: 0.00160576 ms.
I0415 17:57:31.393760 45092 caffe.cpp:369]      pool2	forward: 0.0891341 ms.
I0415 17:57:31.393779 45092 caffe.cpp:372]      pool2	backward: 0.00160128 ms.
I0415 17:57:31.393795 45092 caffe.cpp:369]      norm2	forward: 0.161537 ms.
I0415 17:57:31.393815 45092 caffe.cpp:372]      norm2	backward: 0.488591 ms.
I0415 17:57:31.393831 45092 caffe.cpp:369]      conv3	forward: 0.571384 ms.
I0415 17:57:31.393851 45092 caffe.cpp:372]      conv3	backward: 0.856935 ms.
I0415 17:57:31.393868 45092 caffe.cpp:369]      relu3	forward: 0.0197875 ms.
I0415 17:57:31.393884 45092 caffe.cpp:372]      relu3	backward: 0.0016064 ms.
I0415 17:57:31.393905 45092 caffe.cpp:369]      conv4	forward: 0.436511 ms.
I0415 17:57:31.393929 45092 caffe.cpp:372]      conv4	backward: 0.696554 ms.
I0415 17:57:31.393942 45092 caffe.cpp:369]      relu4	forward: 0.0239539 ms.
I0415 17:57:31.393965 45092 caffe.cpp:372]      relu4	backward: 0.0015872 ms.
I0415 17:57:31.393985 45092 caffe.cpp:369]      conv5	forward: 0.30266 ms.
I0415 17:57:31.394006 45092 caffe.cpp:372]      conv5	backward: 0.519668 ms.
I0415 17:57:31.394028 45092 caffe.cpp:369]      relu5	forward: 0.0189939 ms.
I0415 17:57:31.394052 45092 caffe.cpp:372]      relu5	backward: 0.00227264 ms.
I0415 17:57:31.394070 45092 caffe.cpp:369]      pool5	forward: 0.0326426 ms.
I0415 17:57:31.394089 45092 caffe.cpp:372]      pool5	backward: 0.00213184 ms.
I0415 17:57:31.394106 45092 caffe.cpp:369]        fc6	forward: 1.60053 ms.
I0415 17:57:31.394127 45092 caffe.cpp:372]        fc6	backward: 1.36114 ms.
I0415 17:57:31.394145 45092 caffe.cpp:369]      relu6	forward: 0.0175546 ms.
I0415 17:57:31.394165 45092 caffe.cpp:372]      relu6	backward: 0.00160768 ms.
I0415 17:57:31.394186 45092 caffe.cpp:369]      drop6	forward: 0.0205677 ms.
I0415 17:57:31.394201 45092 caffe.cpp:372]      drop6	backward: 0.00160832 ms.
I0415 17:57:31.394217 45092 caffe.cpp:369]        fc7	forward: 0.724168 ms.
I0415 17:57:31.394232 45092 caffe.cpp:372]        fc7	backward: 0.605802 ms.
I0415 17:57:31.394246 45092 caffe.cpp:369]      relu7	forward: 0.0171334 ms.
I0415 17:57:31.394264 45092 caffe.cpp:372]      relu7	backward: 0.0016 ms.
I0415 17:57:31.394284 45092 caffe.cpp:369]      drop7	forward: 0.018528 ms.
I0415 17:57:31.394301 45092 caffe.cpp:372]      drop7	backward: 0.002224 ms.
I0415 17:57:31.394318 45092 caffe.cpp:369]        fc8	forward: 0.219565 ms.
I0415 17:57:31.394333 45092 caffe.cpp:372]        fc8	backward: 0.185244 ms.
I0415 17:57:31.394348 45092 caffe.cpp:369]       prob	forward: 0.0198214 ms.
I0415 17:57:31.394366 45092 caffe.cpp:372]       prob	backward: 0.00157376 ms.
I0415 17:57:31.394407 45092 caffe.cpp:377] Average Forward pass: 6.67847 ms.
I0415 17:57:31.394419 45092 caffe.cpp:379] Average Backward pass: 6.93449 ms.
I0415 17:57:31.394433 45092 caffe.cpp:381] Average Forward-Backward: 13.6816 ms.
I0415 17:57:31.394448 45092 caffe.cpp:383] Total Time: 684.081 ms.
I0415 17:57:31.394464 45092 caffe.cpp:384] *** Benchmark ends ***
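
The numbers in the log above can be cross-checked with a little arithmetic. The helpers below are an illustrative sketch with made-up names. They assume float blobs (4 bytes per element), floor division for convolution output sizes, rounding up for pooling output sizes, and zero pooling padding, which matches the layers in this network.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Convolution output extent along one spatial axis (floor division).
inline int conv_out(int in, int kernel, int stride = 1, int pad = 0) {
  assert(stride > 0 && in + 2 * pad >= kernel);
  return (in + 2 * pad - kernel) / stride + 1;
}

// Pooling output extent along one spatial axis. Rounds up; assumes pad == 0.
inline int pool_out(int in, int kernel, int stride) {
  assert(stride > 0 && in >= kernel);
  return (in - kernel + stride - 1) / stride + 1;
}

// Bytes occupied by an N x C x H x W blob of floats.
inline std::size_t blob_bytes(int n, int c, int h, int w) {
  return static_cast<std::size_t>(n) * c * h * w * sizeof(float);
}

// Throughput implied by one forward-backward iteration over a batch.
inline double images_per_second(int batch_size, double ms_per_iteration) {
  assert(batch_size > 0 && ms_per_iteration > 0.0);
  return batch_size * 1000.0 / ms_per_iteration;
}
```

For example, conv_out(227, 11, 4) gives the 55 seen in the conv1 top shape, and blob_bytes(10, 3, 227, 227) gives the 6183480 bytes reported after the data layer. The "Memory required for data" figure is the running sum over all top blobs, with in-place layers counted again. With a batch of 10 and 13.6816 ms per forward-backward pass, images_per_second comes out at roughly 730 images per second.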

