ImageNet Evolution Paper Notes (1)

ImageNet Classification with Deep Convolutional Neural Networks

1. The Dataset

ImageNet: we down-sampled the images to a fixed resolution of 256 × 256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256 × 256 patch from the resulting image. The only other preprocessing was subtracting the mean activity over the training set from each pixel.
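
The preprocessing steps above can be sketched in NumPy as follows. This is only an illustration: the function names are hypothetical, the nearest-neighbour resize stands in for whatever interpolation the authors used (the notes do not say), and `mean_image` is assumed to be a per-pixel mean computed beforehand over the training set.

```python
import numpy as np

def resize_shorter_side(image, target=256):
    """Nearest-neighbour resize so that the shorter side has length `target`.

    The interpolation method is an assumption made for this sketch.
    """
    h, w = image.shape[:2]
    scale = target / min(h, w)
    new_h = max(target, int(round(h * scale)))
    new_w = max(target, int(round(w * scale)))
    # Map every output row/column back to its nearest source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def center_crop(image, size=256):
    """Cut the central size x size patch out of the image."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]

def preprocess(image, mean_image):
    """Rescale, center-crop, then subtract the training-set mean."""
    resized = resize_shorter_side(image, 256)
    cropped = center_crop(resized, 256)
    return cropped.astype(np.float32) - mean_image
```

For example, a 300 × 400 image is rescaled to 256 × 341 and then cropped to 256 × 256 before the mean is subtracted.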

2. The Architecture

The network has five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Local Response Normalization
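
A minimal NumPy sketch of local response normalization as defined in the paper: each activation is divided by (k + α · Σ a²)^β, where the sum runs over the n neighbouring channels at the same spatial position. The defaults k = 2, n = 5, α = 1e-4, β = 0.75 are the values reported in the paper; the function name and the channels-last layout are assumptions made for this illustration.

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize activations across neighbouring channels.

    a: array of shape [height, width, channels] (channels-last is an assumption).
    """
    a = a.astype(np.float64)
    num_channels = a.shape[-1]
    squared = a ** 2
    b = np.empty_like(a)
    half = n // 2
    for i in range(num_channels):
        # Channel window [i - n/2, i + n/2], clipped at the borders.
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        b[..., i] = a[..., i] / denom
    return b
```

Note that the slim implementation further down drops these layers entirely, as its docstring says.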

3. Reducing Overfitting

Data Augmentation
1. Extract random 224 × 224 patches (and their horizontal reflections) from the 256 × 256 images and train the network on these patches. This enlarges the training set by a factor of (256 − 224)^2 × 2 = 2048.
2. At test time, the network makes a prediction by extracting five 224 × 224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence ten patches in all), and averaging the predictions made by the network's softmax layer on the ten patches.
3. Perform PCA on the set of RGB pixel values throughout the ImageNet training set, then perturb each image along the principal components with Gaussian noise of standard deviation 0.1, which adds some noise to the colors.
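
The two cropping schemes (items 1 and 2) can be sketched roughly as below. The helper names are hypothetical, and `softmax_fn` is a stand-in for the trained network's softmax output on a single patch; it is an assumption, not something defined in these notes.

```python
import numpy as np

def random_crop_and_flip(image, rng, size=224):
    """Training-time augmentation: random patch plus a random horizontal flip."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = image[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def ten_crop(image, size=224):
    """Test-time patches: four corners and the center, each with its reflection."""
    h, w = image.shape[:2]
    offsets = [
        (0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
        ((h - size) // 2, (w - size) // 2),
    ]
    patches = []
    for top, left in offsets:
        patch = image[top:top + size, left:left + size]
        patches.append(patch)
        patches.append(patch[:, ::-1])
    return np.stack(patches)

def predict_ten_crop(image, softmax_fn, size=224):
    """Average the softmax outputs over the ten patches."""
    probs = np.stack([softmax_fn(p) for p in ten_crop(image, size)])
    return probs.mean(axis=0)
```

A call such as `random_crop_and_flip(image, np.random.default_rng(0))` returns one 224 × 224 training patch; `predict_ten_crop` returns a single averaged class distribution per image.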
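
The PCA color perturbation (item 3) might look like the sketch below. It follows the paper's description: each image is shifted by a sum of the RGB principal components, each scaled by its eigenvalue times a Gaussian draw with standard deviation 0.1. The function names and the flattened `[num_pixels, 3]` input are assumptions for illustration.

```python
import numpy as np

def rgb_pca(pixels):
    """Eigen-decomposition of the 3x3 covariance of RGB values.

    pixels: array of shape [num_pixels, 3] sampled from the training set.
    Returns (eigenvalues, eigenvectors); eigenvectors are stored as columns.
    """
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals, eigvecs

def pca_color_jitter(image, eigvals, eigvecs, rng, std=0.1):
    """Add the same random RGB shift to every pixel of one image."""
    alphas = rng.normal(0.0, std, size=3)   # drawn once per image
    shift = eigvecs @ (alphas * eigvals)    # a single 3-vector
    return image + shift
```

Because the shift is one RGB vector per image, the perturbation changes overall color and intensity without altering the image's spatial structure.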
Dropout
1. At test time, we use all the neurons but multiply their outputs by 0.5.
2. Dropout is used in the first two fully-connected layers.
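
A small NumPy sketch of the dropout scheme described above, with a drop probability of 0.5: units are zeroed at random during training, and at test time every unit is kept but scaled by 0.5. The function names are hypothetical.

```python
import numpy as np

def dropout_train(x, rng, drop_prob=0.5):
    """Training: zero each activation independently with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def dropout_test(x, drop_prob=0.5):
    """Test: keep every activation but scale it by the keep probability."""
    return x * (1.0 - drop_prob)
```

The slim code further down behaves slightly differently: `slim.dropout` builds on TensorFlow's "inverted" dropout, which scales the kept activations by 1 / keep_prob during training, so no scaling is needed at test time.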

4. Details of Learning

1. Training used stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005.
2. We initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01. We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1. This initialization accelerates the early stages of learning by providing the ReLUs with positive inputs. We initialized the neuron biases in the remaining layers with the constant 0.
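
The two points above can be sketched as follows. The update follows the rule given in the paper, v ← 0.9·v − 0.0005·ε·w − ε·grad and w ← w + v, where ε is the learning rate and grad is the gradient averaged over the batch. The function names and argument layout are assumptions for illustration, and the learning rate is left as a required argument because these notes do not give one.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One update of the weights w and the momentum variable v."""
    v_next = momentum * v - weight_decay * lr * w - lr * grad
    w_next = w + v_next
    return w_next, v_next

def init_layer(weight_shape, num_units, bias_one, rng, std=0.01):
    """Zero-mean Gaussian weights; biases set to 1 or 0 depending on the layer."""
    weights = rng.normal(0.0, std, size=weight_shape)
    biases = np.full(num_units, 1.0 if bias_one else 0.0)
    return weights, biases
```

In this form the weight decay term is scaled by the learning rate and folded into the momentum variable, rather than being added to the loss as a separate penalty.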

Architecture code:


Implementation using the TensorFlow slim library

import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in 2.x

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)  # truncated normal distribution
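def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2',
               global_pool=False):
    """AlexNet version 2.

    Note: All the fully-connected layers have been converted to conv2d layers.
          To use the network in classification mode, resize the input to 224x224
          or set global_pool=True. To use it in fully convolutional mode, set
          spatial_squeeze to False.
          The LRN layers have been removed, and the initializers were changed
          from random_normal_initializer to xavier_initializer.

    Args:
        inputs: a tensor of size [batch_size, height, width, channels].
        num_classes: number of predicted classes. If 0 or None, the logits layer
            is omitted and the input to the logits layer is returned instead.
        is_training: True during training, False at test time.
        dropout_keep_prob: probability of keeping an activation in the dropout
            layers during training (< 1); dropout is a no-op when is_training is False.
        spatial_squeeze: if True, squeeze the unnecessary spatial dimensions of
            the logits, giving a 2-D output.
        scope: optional name for the variable scope.
        global_pool: Optional boolean flag. If True, the input to the classification layer is avgpooled to size 1x1, for any input size. (This is not part of the original AlexNet.)

    Returns:
        net: the output of the logits layer (if num_classes is a non-zero integer),
             or the non-dropped-out input to the logits layer (if num_classes is 0 or None).
        end_points: a dict of tensors with intermediate activations.
    """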
    with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'  # 'alexnet_v2/_end_points'
        # Collect outputs for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                            outputs_collections=[end_points_collection]):
            net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',scope='conv1')  # name:u'alexnet_v2/conv1/Relu:0'
            net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')  # u'alexnet_v2/pool1/MaxPool:0'
            net = slim.conv2d(net, 192, [5, 5], scope='conv2')
            net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
            net = slim.conv2d(net, 384, [3, 3], scope='conv3')
            net = slim.conv2d(net, 384, [3, 3], scope='conv4')
            net = slim.conv2d(net, 256, [3, 3], scope='conv5')
            net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

            # Use conv2d instead of fully_connected layers.
            with slim.arg_scope([slim.conv2d], weights_initializer=trunc_normal(0.005),biases_initializer=tf.constant_initializer(0.1)):
                net = slim.conv2d(net, 4096, [5, 5], padding='VALID', scope='fc6')  # u'alexnet_v2/fc6/Relu:0'
                net = slim.dropout(net, dropout_keep_prob, is_training=is_training,scope='dropout6')  # u'alexnet_v2/dropout6/dropout/mul:0'
                net = slim.conv2d(net, 4096, [1, 1], scope='fc7')  # (5, 1, 1, 4096)
                # Convert end_points_collection into a end_point dict.
                end_points = slim.utils.convert_collection_to_dict(end_points_collection)
                if global_pool:
                    net = tf.reduce_mean(net, [1, 2], keep_dims=True,name='global_pool')  # (1, 4, 7, 4096) -- > (1, 1, 1, 4096)
                    end_points['global_pool'] = net
                if num_classes:
                    net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout7')
                    net = slim.conv2d(net, num_classes, [1, 1],  # (5, 1, 1, 1000)
                                      activation_fn=None,
                                      normalizer_fn=None,
                                      biases_initializer=tf.zeros_initializer(),
                                      scope='fc8')
                    if spatial_squeeze:
                        net = tf.squeeze(net, [1, 2], name='fc8/squeezed')  # name: 'alexnet_v2/fc8/squeezed:0', shape (5, 1000)
                    end_points[sc.name + '/fc8'] = net
            return net, end_points
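
To see why a 224 × 224 input gives a 1 × 1 map at fc6 (so that `spatial_squeeze` works), the spatial size can be traced layer by layer in plain Python. This sketch assumes the slim defaults, namely `padding='SAME'` with stride 1 for `slim.conv2d` and `padding='VALID'` for `slim.max_pool2d`, and uses TensorFlow's output-size rules; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding="SAME"):
    """Output length along one spatial axis under TensorFlow's padding rules."""
    if padding == "SAME":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # VALID: no padding

def alexnet_v2_spatial_sizes(input_size=224):
    """Spatial size after each layer of alexnet_v2, up to fc6."""
    layers = [
        ("conv1", 11, 4, "VALID"),
        ("pool1", 3, 2, "VALID"),
        ("conv2", 5, 1, "SAME"),
        ("pool2", 3, 2, "VALID"),
        ("conv3", 3, 1, "SAME"),
        ("conv4", 3, 1, "SAME"),
        ("conv5", 3, 1, "SAME"),
        ("pool5", 3, 2, "VALID"),
        ("fc6", 5, 1, "VALID"),
    ]
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes[name] = size
    return sizes

if __name__ == "__main__":
    for name, size in alexnet_v2_spatial_sizes(224).items():
        print(name, size)
```

With a 224 input, pool5 produces a 5 × 5 map, which the 5 × 5 VALID convolution in fc6 reduces to 1 × 1. A larger input, such as 300 × 400, leaves a 4 × 7 map at fc6, which matches the shape noted in the `global_pool` comment in the code.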
