ML_BOY

目标检测 -- SSD (tensorflow 版) 逐行逐句解读

这篇博客，主要是讲解SSD,tensorflow版的实现，代码地址是：SSD-tensorflow，大神写的代码，也是github上tensorflow版的SSD star 最多的代码，所以就用它来讲解，同时附上论文地址：SSD 论文下载

对照论文和代码讲解，代码中提供了SSD300和SSD512,代码一样，只是图像输入大小不一致，这个地方我主要讲解SSD512。

1 网络结构

我们先来看论文上的网络结构图：

网络结构比较简单，就是在VGG的基础上改得，前面和VGG一样，但是SSD把VGG的全连接层换成了几个卷积层，把droupout层去除了，同时使用了atrous algorithm，其实就是扩展卷积或带孔卷积（Dilation Conv），具体这个卷积方式可以看这个链接 atrous algorithm。

我们从图上也可以看出，SSD和YOLO不同的地方是，YOLO只是对最后一层特征图用来预测回归框，而SSD则是多层，不同大小的特征图都用来做预测和回归。YOLO的缺点是定位不准，对小物体检测效果差，而SSD一定长度上克服了这些难点，因为使用了不同特征图进行预测，SSD的多尺度，用的多层的特征图，是stride=2,不断缩小特征图的长和宽，越靠后的卷积特征图，他的感受野越大，越靠前感受野越小，同时越靠前检测小物体效果更好。但是SSD对小物体检测也并不好，因为前面VGG的已经把特征图下降了16倍。

我们看下网络结构的代码：

    end_points = {}
    with tf.variable_scope(scope, 'ssd_512_vgg', [inputs], reuse=reuse):
        # Original VGG-16 blocks.
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')

        # Additional SSD blocks.
        # Block 6: let's dilate the hell out of it!
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        # Block 7: 1x1 conv. Because the fuck.
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        end_points['block7'] = net

        # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block9'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block10'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block11'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block12'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [4, 4], scope='conv4x4', padding='VALID')
            # Fix padding to match Caffe version (pad=1).
            # pad_shape = [(i-j) for i, j in zip(layer_shape(net), [0, 1, 1, 0])]
            # net = tf.slice(net, [0, 0, 0, 0], pad_shape, name='caffe_pad')
        end_points[end_point] = net

        # Prediction and localisations layers.
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                ## p = cls_pred,l =  loc_pred ,表示每一层的预测结果
                p, l = ssd_vgg_300.ssd_multibox_layer(end_points[layer],
                                                      num_classes,
                                                      anchor_sizes[i],
                                                      anchor_ratios[i],
                                                      normalizations[i])
            ## 对于类别在进行tf.softmax
            predictions.append(prediction_fn(p))
            logits.append(p)
            localisations.append(l)

上面的代码就是在构建网络，网络也就是和VGG差不多，endpoints这个字典，里面包含的是不同特征图的输出，就是SSD不是只利用一层特征，而是多层，所以这个地方存放多层的输出。需要注意，原本程序我把每一层的输出特征图大小都计算了，结果没保存，就是如果仔细取计算每一层的输出特征图的大小，会发现，后面有8×8，4×4大小的特征图，最后一层是1×1,这个是作者设计的，所以如果想换成其他的网络，有时我们自己也是需要设计这样的，就是代码中为啥有时候要加一个padd，就是为了保证最后输出结果为1×1,以及有类似8×8和4×4大小的特征图具体怎么算，可以看这个博客：点击打开链接

我们看几个参数：

        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11', 'block12'],
        feat_shapes=[(64, 64), (32, 32), (16, 16), (8, 8), (4, 4), (2, 2), (1, 1)],

anchor_steps=[8, 16, 32, 64, 128, 256, 512],

这个几个feature -map的大小就是根据网络结构算出来的，64×64之类的，大家可以去计算，发现是对应的，block4的特征图大小就是64,所以大家想换网络，这个地方需要计算好自己改。anchor_steps也是对应的，就是特征图的缩放倍数，也是对对应的，比如：8×64=512,16×32=512等等。不能随便设置

然后上面代码中有ssd_vgg_300.ssd_multibox_layer这个函数，我们看一下：

def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    """Construct a multibox layer, return a class and localization predictions.
    """
    net = inputs
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)
    # Number of anchors.
    num_anchors = len(sizes) + len(ratios)

    # Location.
    num_loc_pred = num_anchors * 4
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)
    loc_pred = tf.reshape(loc_pred,
                          tensor_shape(loc_pred, 4)[:-1]+[num_anchors, 4])
    # Class prediction.
    num_cls_pred = num_anchors * num_classes
    cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
                           scope='conv_cls')
    cls_pred = custom_layers.channel_to_last(cls_pred)
    cls_pred = tf.reshape(cls_pred,
                          tensor_shape(cls_pred, 4)[:-1]+[num_anchors, num_classes])
    return cls_pred, loc_pred

上面代码中，我们对于输出特征图，直接经过3×3的卷积层输出框和类别，custom_layers.channel_to_last，这个函数其实只是把通道数放在最后，但是tensorlfow里面本来就是，所以有点多余，num_anchors表示该层框的个数。

tensor_shape(cls_pred, 4)[:-1]+[num_anchors, num_classes]

tensor_shape(loc_pred, 4)[:-1]+[num_anchors, 4]

tensor_shape 就是将tensor的形状拿到，然后把最后一层拆分出来，变为5维的相当于，后两两个维度代表那个框，的那个类或者框，然后返回类和框的预测，注意这个地方这两个输出都没激活函数。

然后在回到网络结构后面的代码：

        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                ## p = cls_pred,l =  loc_pred ,表示每一层的预测结果
                p, l = ssd_vgg_300.ssd_multibox_layer(end_points[layer],
                                                      num_classes,
                                                      anchor_sizes[i],
                                                      anchor_ratios[i],
                                                      normalizations[i])
            ## 对于类别在进行tf.softmax
            predictions.append(prediction_fn(p))
            logits.append(p)
            localisations.append(l)

        return predictions, localisations, logits, end_points

这个地方就是循环，然后把结果保存在一个list中，prediction_fn就是softmax，因为前面是没有激活函数的，所以prediction是保存了经过激活函数的，logits是没有激活函数的，localisations是保存预测的框，end_poins是每一层的输出。

以上就是整个网络的架构，就是利用VGG模型，把后面的全连接层改了，全部变为卷积层，然后不是只用最后一层预测框，中间不同特征图大小都有用来预测。

2 SSD 框的生成

    def anchors(self, img_shape, dtype=np.float32):
        """Compute the default anchor boxes, given an image shape.
        """
        return ssd_anchors_all_layers(img_shape,
                                      self.params.feat_shapes,
                                      self.params.anchor_sizes,
                                      self.params.anchor_ratios,
                                      self.params.anchor_steps,
                                      self.params.anchor_offset,
                                      dtype)

代码里面框的生成实现了两连跳，这个地方入口，调用ssd_anchors_all_layers，image_shape就是图像输入大小，我们再看这个函数：

def ssd_anchors_all_layers(img_shape,
                           layers_shape,
                           anchor_sizes,
                           anchor_ratios,
                           anchor_steps,
                           offset=0.5,
                           dtype=np.float32):
    """Compute anchor boxes for all feature layers.
    """
    layers_anchors = []
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset, dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors

上面这个函数，是一个for循环，就是提取出来的需要预测框和类的特征图一层一层，layer_shape就是特征图的大小，就是前面我说的计算得到的。然后又调用ssd_anchor_one_layer，我们来看下：

def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    """Computer SSD default anchor boxes for one feature layer.

    Determine the relative position grid of the centers, and the relative
    width and height.

    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the
        former;
      offset: Grid offset.

    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]

    # Expand dims to support easy broadcasting.
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    num_anchors = len(sizes) + len(ratios)
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w

首先下面这一句是生成网格，这样实际就代表了特征图每个点的坐标：

y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]

下面的是将我们的特征图坐标在原图中归一化，同时加上一个偏移offset=0.5,因为是框的中心，每个框里面相当于每

个点间隔是1,所以框终点需要加上0.5,对应论文上这个公式：

y = (y.astype(dtype) + offset) * step / img_shape[0]

x坐标也是一样，然后只是增加一个维度，

num_anchors 是计算每一层框的个数，

h = np.zeros((num_anchors, ), dtype=dtype)
w = np.zeros((num_anchors, ), dtype=dtype)
# Add first anchor boxes with ratio=1.
h[0] = sizes[0] / img_shape[0]
w[0] = sizes[0] / img_shape[1]
di = 1
if len(sizes) > 1:
    h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
    w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
    di += 1
for i, r in enumerate(ratios):
    h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
    w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)

这个地方是求框，我们看到，其实有一个框默认就是正方型的，就是第一个，也就是1：1的时候，为了适应不同长宽比列的物体

后面的计算就是根据上面将的公式来计算的，里面size表示该层框的大小，ratio是该层的框长宽比，这个地方需要注意，论文上是这样生成框的：

论文给定Smin=0.2,Smax=0.9,然后根据上面公式计算k表示第一个特征图，计算得到每层的sk，然后计算长和宽，计算公式如下：

当长宽比为1的时候，

多加一个上面的，但是代码不是这样实现的，他是直接给了长宽，我们看看，

 anchor_sizes=[(20.48, 51.2),
                      (51.2, 133.12),
                      (133.12, 215.04),
                      (215.04, 296.96),
                      (296.96, 378.88),
                      (378.88, 460.8),
                      (460.8, 542.72)],
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],

我们看到他的Sk是大小，不是比例，论文上是0.2-0.9,而且你用512×0.2计算得到的也不是代码给的，所以这个地方其实框的大小是可以自己给的，可以根据经验给定。根据上面的计算得到框的大小。

最后是返回一个改层每个中心点坐标和框。存在layers_anchors，并返回，这个地方其实和Faster-Rcnn是一样的。也是anchor机制。

3 对anchor和GT的预处理

我们看代码：

    def bboxes_encode(self, labels, bboxes, anchors,
                      scope=None):
        """Encode labels and bounding boxes.
        """
        return ssd_common.tf_ssd_bboxes_encode(
            labels, bboxes, anchors,
            self.params.num_classes,
            self.params.no_annotation_label,
            ignore_threshold=0.5,
            prior_scaling=self.params.prior_scaling,
            scope=scope)

一看就知道是调用了ssd_common.tf_ssd_bboxes_encode这个函数，我们看看：

def tf_ssd_bboxes_encode(labels,
                         bboxes,
                         anchors,
                         num_classes,
                         no_annotation_label,
                         ignore_threshold=0.5,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    """Encode groundtruth labels and bounding boxes using SSD net anchors.
    Encoding boxes for all feature layers.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors: List of Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.

    Return:
      (target_labels, target_localizations, target_scores):
        Each element is a list of target Tensors.
    """
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               num_classes, no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        ## t_labels 表示返回每个anchor对应的类别，t_loc返回的是一种变换，
        ## t_scores 每个anchor与gt对应的最大的交并比
        ## target_labels是一个list，包含每层的每个anchor对应的gt类别，
        ## target_localizations对应的是包含每一层所有anchor对应的变换
        ### target_scores 返回的是每个anchor与gt对应的最大的交并比
        return target_labels, target_localizations, target_scores

看上面的函数就知道，他们有是调用了tf.ssd_bboxes_encode_layer这个函数，有一个循环，是对需要预测的特征图一层一层的循环，然后我们看调用的函数

def tf_ssd_bboxes_encode_layer(labels,
                               bboxes,
                               anchors_layer,
                               num_classes,
                               no_annotation_label,
                               ignore_threshold=0.5,
                               prior_scaling=[0.1, 0.1, 0.2, 0.2],
                               dtype=tf.float32):
    """Encode groundtruth labels and bounding boxes using SSD anchors from
    one layer.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.

    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)

    # Initialize tensors...
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = tf.zeros(shape, dtype=tf.int64)
    feat_scores = tf.zeros(shape, dtype=dtype)

    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    def jaccard_with_anchors(bbox):
        """Compute jaccard score between a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        jaccard = tf.div(inter_vol, union_vol)
        return jaccard

    def intersection_with_anchors(bbox):
        """Compute intersection between score a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        inter_vol = h * w
        scores = tf.div(inter_vol, vol_anchors)
        return scores

    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index.
        """
        ### 逐元素比较大小,其实就是遍历label，因为i在body返回的时候加1了，直到遍历完
        r = tf.less(i, tf.shape(labels))
        return r[0]

    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]
        bbox = bboxes[i]
        ### 返回的是交并比,算某一层上所有的框和图像中第一个框的交并比
        jaccard = jaccard_with_anchors(bbox)
        # Mask: check threshold + scores + no annotations + num_classes.
        ### 这个地方是帅选掉交并比小于0的
        mask = tf.greater(jaccard, feat_scores)
        # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))

        mask = tf.logical_and(mask, feat_scores > -0.5)
        mask = tf.logical_and(mask, label < num_classes)
        imask = tf.cast(mask, tf.int64)
        fmask = tf.cast(mask, dtype)
        # Update values using mask.
        feat_labels = imask * label + (1 - imask) * feat_labels
        ## tf.where表示如果mask为镇则jaccard，否则为feat_scores
        feat_scores = tf.where(mask, jaccard, feat_scores)
        ###
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax

        # Check no annotation label: ignore these anchors...
        # interscts = intersection_with_anchors(bbox)
        # mask = tf.logical_and(interscts > ignore_threshold,
        #                       label == no_annotation_label)
        # # Replace scores by -1.
        # feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)

        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    # Main loop definition.
    i = 0
    [i, feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    ### prior_scaling=[0.1, 0.1, 0.2, 0.2]
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    ## feat_labels 表示返回每个anchor对应的类别，feat_localizations返回的是一种变换，
    ## feat_scores 每个anchor与gt对应的最大的交并比
    return feat_labels, feat_localizations, feat_scores

看这部分写的有点不太好读，因为他是函数里面写函数，调用自己的函数，关键是他把自己写的函数放在中间，使得代码前面一半后面一半，中间是一些函数，不仔细往后看还以为结束了。

yref, xref, href, wref = anchors_layer
ymin = yref - href / 2.
xmin = xref - wref / 2.
ymax = yref + href / 2.
xmax = xref + wref / 2.
vol_anchors = (xmax - xmin) * (ymax - ymin)

# Initialize tensors...
shape = (yref.shape[0], yref.shape[1], href.size)
feat_labels = tf.zeros(shape, dtype=tf.int64)
feat_scores = tf.zeros(shape, dtype=dtype)

feat_ymin = tf.zeros(shape, dtype=dtype)
feat_xmin = tf.zeros(shape, dtype=dtype)
feat_ymax = tf.ones(shape, dtype=dtype)
feat_xmax = tf.ones(shape, dtype=dtype)

开头是这样的，ymin，xmin，ymax，xmax之类的是把之前的坐标换成了左上角和右上角的坐标，方便求交并比，注意这个地方像y_ref之类的都是一个numpy数组，是整个特征图所以的中心点，所以这个地方相当于是numpy的广播性质，可不是一个框的操作，而是整个层的操作，shape是tensor的形状，feat_labels，feat_scores，feat_ymin这些是为了保存结果的，形状应该和我们框坐标之类的一样。

接下来，应该跳过那些函数，看后面的

# Main loop definition.
i = 0
[i, feat_labels, feat_scores,
 feat_ymin, feat_xmin,
 feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                       [i, feat_labels, feat_scores,
                                        feat_ymin, feat_xmin,
                                        feat_ymax, feat_xmax])
# Transform to center / size.
feat_cy = (feat_ymax + feat_ymin) / 2.
feat_cx = (feat_xmax + feat_xmin) / 2.
feat_h = feat_ymax - feat_ymin
feat_w = feat_xmax - feat_xmin
# Encode features.
### prior_scaling=[0.1, 0.1, 0.2, 0.2]
feat_cy = (feat_cy - yref) / href / prior_scaling[0]
feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
feat_h = tf.log(feat_h / href) / prior_scaling[2]
feat_w = tf.log(feat_w / wref) / prior_scaling[3]
# Use SSD ordering: x / y / w / h instead of ours.
feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)

tf.while_loop()这个函数是如果满足condition，则执行body，当然传递的参数就是后面的list，那我们看condition函数，

def condition(i, feat_labels, feat_scores,
              feat_ymin, feat_xmin, feat_ymax, feat_xmax):

    ### 逐元素比较大小,其实就是遍历label，因为i在body返回的时候加1了，直到遍历完
    r = tf.less(i, tf.shape(labels))
    return r[0]

我上面解释的很清楚，tf.less表示逐元素比较大小，就是如果i遍历，相当于就是遍历所有真实的框，然后我们看body这个函数

def body(i, feat_labels, feat_scores,
         feat_ymin, feat_xmin, feat_ymax, feat_xmax):

    label = labels[i]
    bbox = bboxes[i]
    ### 返回的是交并比,算某一层上所有的框和图像中第一个框的交并比
    jaccard = jaccard_with_anchors(bbox)
    # Mask: check threshold + scores + no annotations + num_classes.
    ### 这个地方是帅选掉交并比小于0的
    mask = tf.greater(jaccard, feat_scores)
    # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
    mask = tf.logical_and(mask, feat_scores > -0.5)
    mask = tf.logical_and(mask, label < num_classes)
    imask = tf.cast(mask, tf.int64)
    fmask = tf.cast(mask, dtype)
    # Update values using mask.
    feat_labels = imask * label + (1 - imask) * feat_labels
    ## tf.where表示如果mask为镇则jaccard，否则为feat_scores
    feat_scores = tf.where(mask, jaccard, feat_scores)

    feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
    feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
    feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
    feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax


    return [i+1, feat_labels, feat_scores,
            feat_ymin, feat_xmin, feat_ymax, feat_xmax]

jaccard_with_anchors 这个函数其实就是返回交并比，

def jaccard_with_anchors(bbox):
    """Compute jaccard score between a box and the anchors.
    """
    int_ymin = tf.maximum(ymin, bbox[0])
    int_xmin = tf.maximum(xmin, bbox[1])
    int_ymax = tf.minimum(ymax, bbox[2])
    int_xmax = tf.minimum(xmax, bbox[3])
    h = tf.maximum(int_ymax - int_ymin, 0.)
    w = tf.maximum(int_xmax - int_xmin, 0.)
    # Volumes.
    inter_vol = h * w
    union_vol = vol_anchors - inter_vol \
        + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    jaccard = tf.div(inter_vol, union_vol)
    return jaccard

先求相交的坐标，然后求相交的面积，然后求交并比，比较简单。

这个地方是帅选掉交并比小于0的

mask = tf.greater(jaccard, feat_scores)

tf.greater就是比较大小，如果jaccard>feat_scores则为真，否则为假。tf.logical_and表示两个同时为真才是真，

feat_labels = imask * label + (1 - imask) * feat_labels

上面这一句，当imask为1,那么就是label，否则label就是0,也就是背景，那imask什么时候为1,imask = tf.cast(mask, tf.int64)，而mask又是大于feat_score的，所以这个地方因为是循环，遍历所有的目标，那么选择框的方式就是，选择交比比最大的，也就是某一个目标他对应的框里面，交并比最大的，这是一种策略，但是论文中还提到，高于0.5的我们也有对应的目标，但是代码没有这中策略，它只是选择了交并比最大的。feat_scores = tf.where(mask, jaccard, feat_scores)，这个地方就是更新feat_scores，也就是体现是选择交并比最大的。

后面的feat_ymin之类的，也是跟着更新，如果该框的交并比大，那么就是保存为GT的bbox，然后返回，进行下一个循环。循环完了我们看后面的代码：

# Transform to center / size.
feat_cy = (feat_ymax + feat_ymin) / 2.
feat_cx = (feat_xmax + feat_xmin) / 2.
feat_h = feat_ymax - feat_ymin
feat_w = feat_xmax - feat_xmin
# Encode features.
### prior_scaling=[0.1, 0.1, 0.2, 0.2]
feat_cy = (feat_cy - yref) / href / prior_scaling[0]
feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
feat_h = tf.log(feat_h / href) / prior_scaling[2]
feat_w = tf.log(feat_w / wref) / prior_scaling[3]
# Use SSD ordering: x / y / w / h instead of ours.
feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
## feat_labels 表示返回每个anchor对应的类别，feat_localizations返回的是一种变换，
## feat_scores 每个anchor与gt对应的最大的交并比
return feat_labels, feat_localizations, feat_scores

feat_cy之类的是框的左上角和右下角坐标变为中心左边和场合宽，还是一样的，是numpy的广播，这个特征层一起变，

prior_scaling这个其实我也不知道到为啥需要缩放，貌似论文没说要缩放，终点看这一快

feat_cy = (feat_cy - yref) / href / prior_scaling[0]
feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
feat_h = tf.log(feat_h / href) / prior_scaling[2]
feat_w = tf.log(feat_w / wref) / prior_scaling[3]

这个其实就是论文的这个一块，

其实和我们的Faster rcnn是一样的，是求真实框与anchor之间的变换，你把上面随便一个移项，就会得anchor经过伸缩变换得到真实的框，所以这个地方回归的是一种变换，因为实际我们的框是存在的，然后经过我们回归得到的变换，经过变换得到真实框，所以这个地方损失函数其实是我们预测的是变换，我们实际的框和anchor之间的变换和我们预测的变换之间的loss。我们回归的是一种变换。并不是直接预测框，这个和YOLO是不一样的。和Faster RCNN是一样的。然后返回每一层的结果，放在

target_labels.append(t_labels)
target_localizations.append(t_loc)
target_scores.append(t_scores)
## t_labels 表示返回每个anchor对应的类别，t_loc返回的是一种变换，
## t_scores 每个anchor与gt对应的最大的交并比
## target_labels是一个list，包含每层的每个anchor对应的gt类别，
## target_localizations对应的是包含每一层所有anchor对应的变换
### target_scores 返回的是每个anchor与gt对应的最大的交并比

接下来我们看损失函数：

4 SSD 损失函数

我们先看代码：

    def losses(self, logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               scope='ssd_losses'):
        """Define the SSD network losses.
        """
        return ssd_losses(logits, localisations,
                          gclasses, glocalisations, gscores,
                          match_threshold=match_threshold,
                          negative_ratio=negative_ratio,
                          alpha=alpha,
                          label_smoothing=label_smoothing,
                          scope=scope)

这个地方也是调用其他函数，所以这个代码读起来挺费劲的，都是这个调那个，解释以下参数的含义，logits是每一层特征图输出，是没有经过softmax的，localistions是我们的预测框，带g的表示真实的，negative是正反例之比，是1：3,也就是负例是3,这个地方和论文是一样的。label_smoothing这个地方设置为0,并没有做平滑，记得在GAN的loss里面会用到。

然后我们看ssd_losses这个函数：

def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               scope=None):
    """Loss functions for training the SSD 300 VGG network.

    This function defines the different loss components of the SSD, and
    adds them to the TF loss collection.

    Arguments:
      logits: (list of) predictions logits Tensors;
      localisations: (list of) localisations Tensors;
      gclasses: (list of) groundtruth labels Tensors;
      glocalisations: (list of) groundtruth localisations Tensors;
      gscores: (list of) groundtruth score Tensors;
    """
    with tf.name_scope(scope, 'ssd_losses'):
        l_cross_pos = []
        l_cross_neg = []
        l_loc = []
        for i in range(len(logits)):
            dtype = logits[i].dtype
            with tf.name_scope('block_%i' % i):
                # Determine weights Tensor.
                pmask = gscores[i] > match_threshold
                fpmask = tf.cast(pmask, dtype)
                n_positives = tf.reduce_sum(fpmask)

                # Select some random negative entries.
                # n_entries = np.prod(gclasses[i].get_shape().as_list())
                # r_positive = n_positives / n_entries
                # r_negative = negative_ratio * n_positives / (n_entries - n_positives)

                # Negative mask.
                no_classes = tf.cast(pmask, tf.int32)
                predictions = slim.softmax(logits[i])
                nmask = tf.logical_and(tf.logical_not(pmask),
                                       gscores[i] > -0.5)
                fnmask = tf.cast(nmask, dtype)
                nvalues = tf.where(nmask,
                                   predictions[:, :, :, :, 0],
                                   1. - fnmask)
                nvalues_flat = tf.reshape(nvalues, [-1])
                # Number of negative entries to select.
                n_neg = tf.cast(negative_ratio * n_positives, tf.int32)
                n_neg = tf.maximum(n_neg, tf.size(nvalues_flat) // 8)
                n_neg = tf.maximum(n_neg, tf.shape(nvalues)[0] * 4)
                max_neg_entries = 1 + tf.cast(tf.reduce_sum(fnmask), tf.int32)
                n_neg = tf.minimum(n_neg, max_neg_entries)

                val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
                minval = val[-1]
                # Final negative mask.
                nmask = tf.logical_and(nmask, -nvalues > minval)
                fnmask = tf.cast(nmask, dtype)

                # Add cross-entropy loss.
                with tf.name_scope('cross_entropy_pos'):
                    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
                                                                          labels=gclasses[i])
                    loss = tf.losses.compute_weighted_loss(loss, fpmask)
                    l_cross_pos.append(loss)

                with tf.name_scope('cross_entropy_neg'):
                    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
                                                                          labels=no_classes)
                    loss = tf.losses.compute_weighted_loss(loss, fnmask)
                    l_cross_neg.append(loss)

                # Add localization loss: smooth L1, L2, ...
                with tf.name_scope('localization'):
                    # Weights Tensor: positive mask + random negative.
                    weights = tf.expand_dims(alpha * fpmask, axis=-1)
                    loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
                    loss = tf.losses.compute_weighted_loss(loss, weights)
                    l_loc.append(loss)

        # Additional total losses...
        with tf.name_scope('total'):
            total_cross_pos = tf.add_n(l_cross_pos, 'cross_entropy_pos')
            total_cross_neg = tf.add_n(l_cross_neg, 'cross_entropy_neg')
            total_cross = tf.add(total_cross_pos, total_cross_neg, 'cross_entropy')
            total_loc = tf.add_n(l_loc, 'localization')

            # Add to EXTRA LOSSES TF.collection
            tf.add_to_collection('EXTRA_LOSSES', total_cross_pos)
            tf.add_to_collection('EXTRA_LOSSES', total_cross_neg)
            tf.add_to_collection('EXTRA_LOSSES', total_cross)
            tf.add_to_collection('EXTRA_LOSSES', total_loc)

这个地方，

pmask = gscores[i] > match_threshold
fpmask = tf.cast(pmask, dtype)
n_positives = tf.reduce_sum(fpmask)

这个代码，这个地方又做一次帅选，如果交并比大于0.5,那么我们认为是正例，fpmask 记录正例和负例，n_positives这个是

正例的个数，

no_classes = tf.cast(pmask, tf.int32)
predictions = slim.softmax(logits[i])
nmask = tf.logical_and(tf.logical_not(pmask),
                       gscores[i] > -0.5)
fnmask = tf.cast(nmask, dtype)
nvalues = tf.where(nmask,
                   predictions[:, :, :, :, 0],
                   1. - fnmask)
nvalues_flat = tf.reshape(nvalues, [-1])

no_classes把布尔型变量变为整形，那么就是要么是0,要么是1,前景就是1,背景就是0,predictions是记录预测每个类的概率

nmask，就是负例，你看，tf.logical_not(pmask)就是取反，这个地方我觉得gscores[i] > -0.5，之前已经帅选了，就

是交并比不合适的，小于0的，这个地方应该。

nvalues就是把我们的类别提取出来，否则就是0,表示背景。后面就是做了一个拉伸。tf.where(cond,x,y)表示如果cond为真，就是x，否则就是y。

n_neg = tf.cast(negative_ratio * n_positives, tf.int32)
n_neg = tf.maximum(n_neg, tf.size(nvalues_flat) // 8)
n_neg = tf.maximum(n_neg, tf.shape(nvalues)[0] * 4)
max_neg_entries = 1 + tf.cast(tf.reduce_sum(fnmask), tf.int32)
n_neg = tf.minimum(n_neg, max_neg_entries)

val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
minval = val[-1]
# Final negative mask.
nmask = tf.logical_and(nmask, -nvalues > minval)
fnmask = tf.cast(nmask, dtype)

n_neg就是负样本的数量， negative_ratio正负样本比列，默认就是3,后面的第一个取最大，我觉得是保证至少有负样本，

max_neg_entries这个就是负样本的数量，n_neg = tf.minimum(n_neg, max_neg_entries)，这个比较很好理解，万一

你总样本比你三倍正样本少，所以需要选择小的，所以这个地方保证足够的负样本，nmask表示我们所选取的负样本，

tf.nn.top_k，这个是选取前k=neg个负例，因为取了负号，表示选择的交并比最小的k个，minval就是选择负例里面交并比

最大的，nmask就是把我们选择的负样例设为整数，就是提取出我们选择的，tf.logical_and就是同时为真，首先。需要是

负例，其次值需要大于minval，因为取了负数，所以nmask就是我们所选择的负例，fnmask就是就是我们选取的负样本只是

数据类型变了，由bool变为了浮点型，(dtype默认是浮点型)，接着看损失函数：

# Add cross-entropy loss.
with tf.name_scope('cross_entropy_pos'):
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
                                                          labels=gclasses[i])
    loss = tf.losses.compute_weighted_loss(loss, fpmask)
    l_cross_pos.append(loss)

这个是正例的损失，其实就是交叉熵损失，tf.losses.compute_weighted_loss其实就是相当于loss×fpmask，

这个地方之所以需要乘以fpmask是为了过滤掉负样本，因为负样本的label就是0,其他得是1.而fpmask刚好就是这样。

with tf.name_scope('cross_entropy_neg'):
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
                                                          labels=no_classes)
    loss = tf.losses.compute_weighted_loss(loss, fnmask)
    l_cross_neg.append(loss)

这个是负例的损失函数，也是交叉熵损失，同时fnmask也是过滤掉正例，no_classes里面负例就是0,

                with tf.name_scope('localization'):
                    # Weights Tensor: positive mask + random negative.
                    weights = tf.expand_dims(alpha * fpmask, axis=-1)
                    loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
                    loss = tf.losses.compute_weighted_loss(loss, weights)
                    l_loc.append(loss)

这个地方就是回归框损失，我们先看看论文回归框损失用的损失函数，就是smoothL1损失，这个样子：

然后我们再看这个函数代码：

def abs_smooth(x):
    """Smoothed absolute function. Useful to compute an L1 smooth error.

    Define as:
        x^2 / 2         if abs(x) < 1
        abs(x) - 0.5    if abs(x) > 1
    We use here a differentiable definition using min(x) and abs(x). Clearly
    not optimal, but good enough for our purpose!
    """
    absx = tf.abs(x)
    minx = tf.minimum(absx, 1)
    r = 0.5 * ((absx - 1) * minx + absx)
    return r

其实就是上面的损失函数，然后后面的weight也是过滤框没有目标的，之所有alpha是因为论文也有，但是默认就是1,现在我们看看论文的损失函数

其实和论文的损失函数是一样的。

关于代码，代码的训练部分还有很多其他内容，涉及多gpu，预处理等，但是核心思想就是这些，有机会在将其他的代码。

你可能感兴趣的:(tensorflow,深度学习,深度学习)

机器学习与深度学习间关系与区别 ℒℴѵℯ心·动ꦿ໊ོ꫞ 人工智能学习深度学习 python
一、机器学习概述定义机器学习（MachineLearning,ML）是一种通过数据驱动的方法，利用统计学和计算算法来训练模型，使计算机能够从数据中学习并自动进行预测或决策。机器学习通过分析大量数据样本，识别其中的模式和规律，从而对新的数据进行判断。其核心在于通过训练过程，让模型不断优化和提升其预测准确性。主要类型1.监督学习（SupervisedLearning）监督学习是指在训练数据集中包含输入
将cmd中命令输出保存为txt文本文件落难Coder Windows cmd window
最近深度学习本地的训练中我们常常要在命令行中运行自己的代码，无可厚非，我们有必要保存我们的炼丹结果，但是复制命令行输出到txt是非常麻烦的，其实Windows下的命令行为我们提供了相应的操作。其基本的调用格式就是：运行指令>输出到的文件名称或者具体保存路径测试下，我打开cmd并且ping一下百度：pingwww.baidu.com>./data.txt看下相同目录下data.txt的输出：如果你再
Python开发常用的三方模块如下：换个网名有点难 python 开发语言
Python是一门功能强大的编程语言，拥有丰富的第三方库，这些库为开发者提供了极大的便利。以下是100个常用的Python库，涵盖了多个领域：1、NumPy，用于科学计算的基础库。2、Pandas，提供数据结构和数据分析工具。3、Matplotlib，一个绘图库。4、Scikit-learn，机器学习库。5、SciPy，用于数学、科学和工程的库。6、TensorFlow，由Google开发的开源机
推荐3家毕业AI论文可五分钟一键生成！文末附免费教程！小猪包333 写论文人工智能 AI写作深度学习计算机视觉
在当前的学术研究和写作领域，AI论文生成器已经成为许多研究人员和学生的重要工具。这些工具不仅能够帮助用户快速生成高质量的论文内容，还能进行内容优化、查重和排版等操作。以下是三款值得推荐的AI论文生成器：千笔-AIPassPaper、懒人论文以及AIPaperPass。千笔-AIPassPaper千笔-AIPassPaper是一款基于深度学习和自然语言处理技术的AI写作助手，旨在帮助用户快速生成高质
AI大模型的架构演进与最新发展季风泯灭的季节 AI大模型应用技术二人工智能架构
随着深度学习的发展，AI大模型（LargeLanguageModels,LLMs）在自然语言处理、计算机视觉等领域取得了革命性的进展。本文将详细探讨AI大模型的架构演进，包括从Transformer的提出到GPT、BERT、T5等模型的历史演变，并探讨这些模型的技术细节及其在现代人工智能中的核心作用。一、基础模型介绍：Transformer的核心原理Transformer架构的背景在Transfo
[实践应用] 深度学习之模型性能评估指标 YuanDaima2048 深度学习工具使用深度学习人工智能损失函数性能评估 pytorch python 机器学习
文章总览：YuanDaiMa2048博客文章总览深度学习之模型性能评估指标分类任务回归任务排序任务聚类任务生成任务其他介绍在机器学习和深度学习领域，评估模型性能是一项至关重要的任务。不同的学习任务需要不同的性能指标来衡量模型的有效性。以下是对一些常见任务及其相应的性能评估指标的详细解释和总结。分类任务分类任务是指模型需要将输入数据分配到预定义的类别或标签中。以下是分类任务中常用的性能指标：准确率(
[实践应用] 深度学习之优化器 YuanDaima2048 深度学习工具使用 pytorch 深度学习人工智能机器学习 python 优化器
文章总览：YuanDaiMa2048博客文章总览深度学习之优化器1.随机梯度下降（SGD）2.动量优化（Momentum）3.自适应梯度（Adagrad）4.自适应矩估计（Adam）5.RMSprop总结其他介绍在深度学习中，优化器用于更新模型的参数，以最小化损失函数。常见的优化函数有很多种，下面是几种主流的优化器及其特点、原理和PyTorch实现：1.随机梯度下降（SGD）原理:随机梯度下降通过
生成式地图制图 Bwywb_3 深度学习机器学习深度学习生成对抗网络
生成式地图制图（GenerativeCartography）是一种利用生成式算法和人工智能技术自动创建地图的技术。它结合了传统的地理信息系统（GIS）技术与现代生成模型（如深度学习、GANs等），能够根据输入的数据自动生成符合需求的地图。这种方法在城市规划、虚拟环境设计、游戏开发等多个领域具有应用前景。主要特点：自动化生成：通过算法和模型，系统能够根据输入的地理或空间数据自动生成地图，而无需人工逐
吴恩达深度学习笔记(30)-正则化的解释极客Array
正则化（Regularization）深度学习可能存在过拟合问题——高方差，有两个解决方法，一个是正则化，另一个是准备更多的数据，这是非常可靠的方法，但你可能无法时时刻刻准备足够多的训练数据或者获取更多数据的成本很高，但正则化通常有助于避免过拟合或减少你的网络误差。如果你怀疑神经网络过度拟合了数据，即存在高方差问题，那么最先想到的方法可能是正则化，另一个解决高方差的方法就是准备更多数据，这也是非常
个人学习笔记7-6：动手学深度学习pytorch版-李沐浪子L 深度学习深度学习笔记计算机视觉 python 人工智能神经网络 pytorch
#人工智能##深度学习##语义分割##计算机视觉##神经网络#计算机视觉13.11全卷积网络全卷积网络（fullyconvolutionalnetwork，FCN）采用卷积神经网络实现了从图像像素到像素类别的变换。引入l转置卷积（transposedconvolution）实现的，输出的类别预测与输入图像在像素级别上具有一一对应关系：通道维的输出即该位置对应像素的类别预测。13.11.1构造模型下
深度学习-点击率预估-研究论文2024-09-14速读 sp_fyf_2024 深度学习人工智能
深度学习-点击率预估-研究论文2024-09-14速读1.DeepTargetSessionInterestNetworkforClick-ThroughRatePredictionHZhong,JMa,XDuan,SGu,JYao-2024InternationalJointConferenceonNeuralNetworks,2024深度目标会话兴趣网络用于点击率预测摘要：这篇文章提出了一种新
损失函数与反向传播 Star_. PyTorch pytorch 深度学习 python
损失函数定义与作用损失函数(lossfunction)在深度学习领域是用来计算搭建模型预测的输出值和真实值之间的误差。1.损失函数越小越好2.计算实际输出与目标之间的差距3.为更新输出提供依据（反向传播)常见的损失函数回归常见的损失函数有：均方差（MeanSquaredError，MSE）、平均绝对误差（MeanAbsoluteErrorLoss，MAE）、HuberLoss是一种将MSE与MAE
【深度学习】训练过程中一个OOM的问题，太难查了 weixin_40293999 深度学习深度学习人工智能
现象：各位大佬又遇到过ubuntu的这个问题么？现象是在训练过程中，ssh上不去了，能ping通，没死机，但是ubunutu的pc侧的显示器，鼠标啥都不好用了。只能重启。问题原因：OOM了95G，尼玛！！！！pytorch爆内存了，然后journald假死了，在journald被watchdog干掉之后，系统就崩溃了。这种规模的爆内存一般，即使被oomkill了，也要卡半天的，确实会这样，能不能配
云服务业界动态简报-20180128 Captain7
一、青云青云QingCloud推出深度学习平台DeepLearningonQingCloud，包含了主流的深度学习框架及数据科学工具包，通过QingCloudAppCenter一键部署交付，可以让算法工程师和数据科学家快速构建深度学习开发环境，将更多的精力放在模型和算法调优。二、腾讯云1.腾讯云正式发布腾讯专有云TCE(TencentCloudEnterprise)矩阵，涵盖企业版、大数据版、AI
机器学习VS深度学习 nfgo 机器学习
机器学习（MachineLearning,ML）和深度学习（DeepLearning,DL）是人工智能（AI）的两个子领域，它们有许多相似之处，但在技术实现和应用范围上也有显著区别。下面从几个方面对两者进行区分：1.概念层面机器学习：是让计算机通过算法从数据中自动学习和改进的技术。它依赖于手动设计的特征和数学模型来进行学习，常用的模型有决策树、支持向量机、线性回归等。深度学习：是机器学习的一个子领
大数据毕业设计hadoop+spark+hive知识图谱租房数据分析可视化大屏租房推荐系统 58同城租房爬虫房源推荐系统房价预测系统计算机毕业设计机器学习深度学习人工智能 2401_84572577 程序员大数据 hadoop 人工智能
做了那么多年开发，自学了很多门编程语言，我很明白学习资源对于学一门新语言的重要性，这些年也收藏了不少的Python干货，对我来说这些东西确实已经用不到了，但对于准备自学Python的人来说，或许它就是一个宝藏，可以给你省去很多的时间和精力。别在网上瞎学了，我最近也做了一些资源的更新，只要你是我的粉丝，这期福利你都可拿走。我先来介绍一下这些东西怎么用，文末抱走。（1）Python所有方向的学习路线（
深度学习-13-小语言模型之SmolLM的使用皮皮冰燃深度学习深度学习
文章附录1SmolLM概述1.1SmolLM简介1.2下载模型2运行2.1在CPU/GPU/多GPU上运行模型2.2使用torch.bfloat162.3通过位和字节的量化版本3应用示例4问题及解决4.1attention_mask和pad_token_id报错4.2max_new_tokens=205参考附录1SmolLM概述1.1SmolLM简介SmolLM是一系列尖端小型语言模型，提供三种规
基于深度学习的农作物病害检测 SEU-WYL 深度学习dnn 深度学习人工智能
基于深度学习的农作物病害检测利用卷积神经网络（CNN）、生成对抗网络（GAN）、Transformer等深度学习技术，自动识别和分类农作物的病害，帮助农业工作者提高作物管理效率、减少损失。1.农作物病害检测的挑战病害种类繁多：农作物病害的类型多样，不同病害在同一作物上的表现差异很大，同时同一种病害在不同生长阶段的症状也可能不同。环境影响：天气、光照、湿度等外部环境因素会影响农作物的表现，使得病害检
基于深度学习的文本引导的图像编辑 SEU-WYL 深度学习dnn 深度学习人工智能
基于深度学习的文本引导的图像编辑（Text-GuidedImageEditing）是一种通过自然语言文本指令对图像进行编辑或修改的技术。它结合了图像生成和自然语言处理（NLP）的最新进展，使用户能够通过描述性文本对图像内容进行精确的调整和操控。1.文本引导的图像编辑的挑战文本和图像之间的对齐：如何将文本中的语义信息准确地映射到图像中的特定区域或元素是一个关键挑战。这涉及到多模态数据的对齐和理解。编
深度学习--对抗生成网络（GAN, Generative Adversarial Network） Ambition_LAO 深度学习生成对抗网络
对抗生成网络（GAN,GenerativeAdversarialNetwork）是一种深度学习模型，由IanGoodfellow等人在2014年提出。GAN主要用于生成数据，通过两个神经网络相互对抗，来生成以假乱真的新数据。以下是对GAN的详细阐述，包括其概念、作用、核心要点、实现过程、代码实现和适用场景。1.概念GAN由两个神经网络组成：生成器（Generator）和判别器（Discrimina
深度学习：怎么看pth文件的参数奥利给少年深度学习人工智能
.pth文件是PyTorch模型的权重文件，它通常包含了训练好的模型的参数。要查看或使用这个文件，你可以按照以下步骤操作：1.确保你有模型的定义你需要有创建这个.pth文件时所用的模型的代码。这意味着你需要有模型的类定义和架构。2.加载模型权重使用PyTorch的load_state_dict方法来加载权重。这里是如何操作的：importtorchimporttorch.nnasnn#定义模型结构
chatgpt赋能python：如何在Python中安装Keras库？ turensu ChatGpt python chatgpt keras 计算机
如何在Python中安装Keras库？Keras是一个简单易用的神经网络库，由FrançoisChollet编写。它在Python编程语言中实现了深度学习的功能，可以使您更轻松地构建和试验不同类型的神经网络。如果您是一名Python开发人员，肯定会想知道如何在您的Python项目中安装Keras库。在本文中，我们将向您展示如何安装和配置Keras库。步骤1：安装Python要使用Keras库，您需
如何理解深度学习的训练过程奋斗的草莓熊深度学习人工智能 python scikit-learn virtualenv numpy pandas
文章目录1.训练是干什么？2.预训练模型进行训练，主要更改的是预训练模型的什么东西？1.训练是干什么？以yolov5为例子，训练的目的是把一组输入猫狗图像放到神经网络中，得到一个输出模型，这个模型下次可以直接用来识别哪个是猫，哪个是狗2.预训练模型进行训练，主要更改的是预训练模型的什么东西？超参数（Hyperparameters）：这是模型结构中定义的参数，比如：卷积核大小（kernel_size
Keras深度学习框架入门及实战指南司莹嫣Maude
Keras深度学习框架入门及实战指南keraskeras-team/keras:是一个基于Python的深度学习库，它没有使用数据库。适合用于深度学习任务的开发和实现，特别是对于需要使用Python深度学习库的场景。特点是深度学习库、Python、无数据库。项目地址:https://gitcode.com/gh_mirrors/ke/keras一、项目介绍Keras简介Keras是一款高级神经网络
深度学习驱动的车牌识别：技术演进与未来挑战逼子歌深度学习车牌识别神经网络字符识别 YOLO 卷积神经网络
一、引言1.1研究背景在当今社会，智能交通系统的发展日益重要，而车牌识别作为其关键组成部分，发挥着至关重要的作用。车牌识别技术广泛应用于交通管理、停车场管理、安防监控等领域。在交通管理中，它可以用于车辆识别、交通违法监控和车流统计等，提高交通管理的效率和准确性。在停车场管理中，实现车辆的自动识别和收费，提升管理和服务水平。在安防监控领域，可用于追踪嫌疑人及犯罪行为。深度学习的出现为车牌识别带来了重
每天五分钟玩转深度学习PyTorch：模型参数优化器torch.optim 幻风_huanfeng 深度学习框架pytorch 深度学习 pytorch 人工智能神经网络机器学习优化算法
本文重点在机器学习或者深度学习中，我们需要通过修改参数使得损失函数最小化(或最大化)，优化算法就是一种调整模型参数更新的策略。在pytorch中定义了优化器optim，我们可以使用它调用封装好的优化算法，然后传递给它神经网络模型参数，就可以对模型进行优化。本文是学习第6步(优化器)，参考链接pytorch的学习路线随机梯度下降算法在深度学习和机器学习中，梯度下降算法是最常用的参数更新方法，它的公式
什么是AIGC？有哪些免费工具？ chent_某位 AIGC
AIGC（AIGeneratedContent），即“人工智能生成内容”，是指通过人工智能技术自动生成各种类型的数字内容。AIGC让机器能够根据输入的信息或数据生成符合人类需求的文本、图像、音频、视频等内容，极大提高了内容创作的效率。AIGC的背景与起源随着深度学习和自然语言处理技术的快速发展，人工智能已经不再局限于简单的任务，如分类、预测和数据分析，而是具备了生成内容的能力。生成式AI模型，如O
transformer架构(Transformer Architecture)原理与代码实战案例讲解 AI架构设计之禅大数据AI人工智能 Python入门实战计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
transformer架构(TransformerArchitecture)原理与代码实战案例讲解关键词：Transformer,自注意力机制,编码器-解码器,预训练,微调,NLP,机器翻译作者：禅与计算机程序设计艺术/ZenandtheArtofComputerProgramming1.背景介绍1.1问题的由来自然语言处理（NLP）领域的发展经历了从规则驱动到统计驱动再到深度学习驱动的三个阶段。
如何有效的学习AI大模型？ Python程序员罗宾学习人工智能语言模型自然语言处理架构
学习AI大模型是一个系统性的过程，涉及到多个学科的知识。以下是一些建议，帮助你更有效地学习AI大模型：基础知识储备：数学基础：学习线性代数、概率论、统计学和微积分等，这些是理解机器学习算法的数学基础。编程技能：掌握至少一种编程语言，如Python，因为大多数AI模型都是用Python实现的。理论学习：机器学习基础：了解监督学习、非监督学习、强化学习等基本概念。深度学习：学习神经网络的基本结构，如卷
【深度学习】【OnnxRuntime】【Python】模型转化、环境搭建以及模型部署的详细教程牙牙要健康深度学习 onnx onnxruntime 深度学习 python 人工智能
【深度学习】【OnnxRuntime】【Python】模型转化、环境搭建以及模型部署的详细教程提示:博主取舍了很多大佬的博文并亲测有效,分享笔记邀大家共同学习讨论文章目录【深度学习】【OnnxRuntime】【Python】模型转化、环境搭建以及模型部署的详细教程前言模型转换--pytorch转onnxWindows平台搭建依赖环境onnxruntime调用onnx模型ONNXRuntime推理核
Spring4.1新特性——Spring MVC增强 jinnianshilongnian spring 4.1
目录 Spring4.1新特性——综述 Spring4.1新特性——Spring核心部分及其他 Spring4.1新特性——Spring缓存框架增强 Spring4.1新特性——异步调用和事件机制的异常处理 Spring4.1新特性——数据库集成测试脚本初始化 Spring4.1新特性——Spring MVC增强 Spring4.1新特性——页面自动化测试框架Spring MVC T
mysql 性能查询优化 annan211 java sql 优化 mysql 应用服务器
1 时间到底花在哪了？ mysql在执行查询的时候需要执行一系列的子任务，这些子任务包含了整个查询周期最重要的阶段，这其中包含了大量为了检索数据列到存储引擎的调用以及调用后的数据处理，包括排序、分组等。在完成这些任务的时候，查询需要在不同的地方花费时间，包括网络、cpu计算、生成统计信息和执行计划、锁等待等。尤其是向底层存储引擎检索数据的调用操作。这些调用需要在内存操
windows系统配置 cherishLC windows
删除Hiberfil.sys ：使用命令powercfg -h off 关闭休眠功能即可： http://jingyan.baidu.com/article/f3ad7d0fc0992e09c2345b51.html 类似的还有pagefile.sys msconfig 配置启动项 shutdown 定时关机 ipconfig 查看网络配置 ipconfig /flushdns
人体的排毒时间 Array_06 工作
======================== || 人体的排毒时间是什么时候？|| ======================== 转载于： http://zhidao.baidu.com/link?url=ibaGlicVslAQhVdWWVevU4TMjhiKaNBWCpZ1NS6igCQ78EkNJZFsEjCjl3T5EdXU9SaPg04bh8MbY1bR
ZooKeeper cugfy zookeeper
Zookeeper是一个高性能，分布式的，开源分布式应用协调服务。它提供了简单原始的功能，分布式应用可以基于它实现更高级的服务，比如同步，配置管理，集群管理，名空间。它被设计为易于编程，使用文件系统目录树作为数据模型。服务端跑在java上，提供java和C的客户端API。 Zookeeper是Google的Chubby一个开源的实现，是高有效和可靠的协同工作系统，Zookeeper能够用来lea
网络爬虫的乱码处理随意而生爬虫网络
下边简单总结下关于网络爬虫的乱码处理。注意，这里不仅是中文乱码，还包括一些如日文、韩文、俄文、藏文之类的乱码处理，因为他们的解决方式是一致的，故在此统一说明。网络爬虫，有两种选择，一是选择nutch、hetriex，二是自写爬虫，两者在处理乱码时，原理是一致的，但前者处理乱码时，要看懂源码后进行修改才可以，所以要废劲一些；而后者更自由方便，可以在编码处理
Xcode常用快捷键张亚雄 xcode
一、总结的常用命令：隐藏xcode command+h 退出xcode command+q 关闭窗口 command+w 关闭所有窗口 command+option+w 关闭当前
mongoDB索引操作 adminjun mongodb 索引
一、索引基础： MongoDB的索引几乎与传统的关系型数据库一模一样，这其中也包括一些基本的优化技巧。下面是创建索引的命令： > db.test.ensureIndex({"username":1}) 可以通过下面的名称查看索引是否已经成功建立： &nbs
成都软件园实习那些话 aijuans 成都软件园实习
无聊之中，翻了一下日志，发现上一篇经历是很久以前的事了，悔过~~ 　　断断续续离开了学校快一年了，习惯了那里一天天的幼稚、成长的环境，到这里有点与世隔绝的感觉。不过还好，那是刚到这里时的想法，现在感觉在这挺好，不管怎么样，最要感谢的还是老师能给这么好的一次催化成长的机会，在这里确实看到了好多好多能想到或想不到的东西。　　都说在外面和学校相比最明显的差距就是与人相处比较困难，因为在外面每个人都
Linux下FTP服务器安装及配置 ayaoxinchao linux FTP服务器 vsftp
检测是否安装了FTP [root@localhost ~]# rpm -q vsftpd 如果未安装：package vsftpd is not installed 安装了则显示：vsftpd-2.0.5-28.el5累死的版本信息安装FTP 运行yum install vsftpd命令，如[root@localhost ~]# yum install vsf
使用mongo-java-driver获取文档id和查找文档 BigBird2012 driver
注：本文所有代码都使用的mongo-java-driver实现。在MongoDB中，一个集合（collection）在概念上就类似我们SQL数据库中的表（Table），这个集合包含了一系列文档（document）。一个DBObject对象表示我们想添加到集合（collection）中的一个文档（document），MongoDB会自动为我们创建的每个文档添加一个id，这个id在
JSONObject以及json串 bijian1013 json JSONObject
一.JAR包简介要使程序可以运行必须引入JSON-lib包，JSON-lib包同时依赖于以下的JAR包： 1.commons-lang-2.0.jar 2.commons-beanutils-1.7.0.jar 3.commons-collections-3.1.jar &n
[Zookeeper学习笔记之三]Zookeeper实例创建和会话建立的异步特性 bit1129 zookeeper
为了说明问题，看个简单的代码， import org.apache.zookeeper.*; import java.io.IOException; import java.util.concurrent.CountDownLatch; import java.util.concurrent.ThreadLocal
【Scala十二】Scala核心六：Trait bit1129 scala
Traits are a fundamental unit of code reuse in Scala. A trait encapsulates method and field definitions, which can then be reused by mixing them into classes. Unlike class inheritance, in which each c
weblogic version 10.3破解 ronin47 weblogic
版本：WebLogic Server 10.3 说明：%DOMAIN_HOME%：指WebLogic Server 域(Domain）目录例如我的做测试的域的根目录 DOMAIN_HOME=D:/Weblogic/Middleware/user_projects/domains/base_domain 1.为了保证操作安全，备份%DOMAIN_HOME%/security/Defa
求第n个斐波那契数 BrokenDreams
今天看到群友发的一个问题：写一个小程序打印第n个斐波那契数。自己试了下，搞了好久。。。基础要加强了。 &nbs
读《研磨设计模式》-代码笔记-访问者模式-Visitor bylijinnan java 设计模式
声明：本文只为方便我个人查阅和理解，详细的分析以及源代码请移步原作者的博客http://chjavach.iteye.com/ import java.util.ArrayList; import java.util.List; interface IVisitor { //第二次分派，Visitor调用Element void visitConcret
MatConvNet的excise 3改为网络配置文件形式 cherishLC matlab
MatConvNet为vlFeat作者写的matlab下的卷积神经网络工具包，可以使用GPU。主页： http://www.vlfeat.org/matconvnet/ 教程： http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html 注意：需要下载新版的MatConvNet替换掉教程中工具包中的matconvnet： http
ZK Timeout再讨论 chenchao051 zookeeper timeout hbase
http://crazyjvm.iteye.com/blog/1693757 文中提到相关超时问题，但是又出现了一个问题，我把min和max都设置成了180000，但是仍然出现了以下的异常信息： Client session timed out, have not heard from server in 154339ms for sessionid 0x13a3f7732340003
CASE WHEN 用法介绍 daizj sql group by case when
CASE WHEN 用法介绍 1. CASE WHEN 表达式有两种形式 --简单Case函数 CASE sex WHEN '1' THEN '男' WHEN '2' THEN '女' ELSE '其他' END --Case搜索函数 CASE WHEN sex = '1' THEN
PHP技巧汇总:提高PHP性能的53个技巧 dcj3sjt126com PHP
PHP技巧汇总:提高PHP性能的53个技巧　　用单引号代替双引号来包含字符串，这样做会更快一些。因为PHP会在双引号包围的字符串中搜寻变量，　　单引号则不会，注意：只有echo能这么做，它是一种可以把多个字符串当作参数的函数译注：　　PHP手册中说echo是语言结构，不是真正的函数，故把函数加上了双引号)。　　1、如果能将类的方法定义成static，就尽量定义成static，它的速度会提升将近4倍
Yii框架中CGridView的使用方法以及详细示例 dcj3sjt126com yii
CGridView显示一个数据项的列表中的一个表。表中的每一行代表一个数据项的数据,和一个列通常代表一个属性的物品(一些列可能对应于复杂的表达式的属性或静态文本)。　　CGridView既支持排序和分页的数据项。排序和分页可以在AJAX模式或正常的页面请求。使用CGridView的一个好处是,当用户浏览器禁用JavaScript,排序和分页自动退化普通页面请求和仍然正常运行。实例代码如下：
Maven项目打包成可执行Jar文件 dyy_gusi assembly
Maven项目打包成可执行Jar文件在使用Maven完成项目以后，如果是需要打包成可执行的Jar文件，我们通过eclipse的导出很麻烦，还得指定入口文件的位置，还得说明依赖的jar包，既然都使用Maven了，很重要的一个目的就是让这些繁琐的操作简单。我们可以通过插件完成这项工作，使用assembly插件。具体使用方式如下： 1、在项目中加入插件的依赖： <plugin>
php常见错误 geeksun PHP
1. kevent() reported that connect() failed (61: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "fastc
修改linux的用户名 hongtoushizi linux change password
Change Linux Username 更改Linux用户名，需要修改4个系统的文件： /etc/passwd /etc/shadow /etc/group /etc/gshadow 古老/传统的方法是使用vi去直接修改，但是这有安全隐患（具体可自己搜一下），所以后来改成使用这些命令去代替： vipw vipw -s vigr vigr -s 具体的操作顺
第五章常用Lua开发库1-redis、mysql、http客户端 jinnianshilongnian nginx lua
对于开发来说需要有好的生态开发库来辅助我们快速开发，而Lua中也有大多数我们需要的第三方开发库如Redis、Memcached、Mysql、Http客户端、JSON、模板引擎等。一些常见的Lua库可以在github上搜索，https://github.com/search?utf8=%E2%9C%93&q=lua+resty。 Redis客户端 lua-resty-r
zkClient 监控机制实现 liyonghui160com zkClient 监控机制实现
直接使用zk的api实现业务功能比较繁琐。因为要处理session loss，session expire等异常，在发生这些异常后进行重连。又因为ZK的watcher是一次性的，如果要基于wather实现发布/订阅模式，还要自己包装一下，将一次性订阅包装成持久订阅。另外如果要使用抽象级别更高的功能，比如分布式锁，leader选举
在Mysql 众多表中查找一个表名或者字段名的 SQL 语句 pda158 mysql
在Mysql 众多表中查找一个表名或者字段名的 SQL 语句：　　方法一：SELECT table_name, column_name from information_schema.columns WHERE column_name LIKE 'Name'; 　　方法二：SELECT column_name from information_schema.colum
程序员对英语的依赖 Smile.zeng 英语程序猿
1、程序员最基本的技能，至少要能写得出代码，当我们还在为建立类的时候思考用什么单词发牢骚的时候，英语与别人的差距就直接表现出来咯。 2、程序员最起码能认识开发工具里的英语单词，不然怎么知道使用这些开发工具。 3、进阶一点，就是能读懂别人的代码，有利于我们学习人家的思路和技术。 4、写的程序至少能有一定的可读性，至少要人别人能懂吧... 以上一些问题，充分说明了英语对程序猿的重要性。骚年
Oracle学习笔记(8) 使用PLSQL编写触发器 vipbooks oracle sql 编程活动 Access
时间过得真快啊，转眼就到了Oracle学习笔记的最后个章节了，通过前面七章的学习大家应该对Oracle编程有了一定了了解了吧，这东东如果一段时间不用很快就会忘记了，所以我会把自己学习过的东西做好详细的笔记，用到的时候可以随时查找，马上上手！希望这些笔记能对大家有些帮助！这是第八章的学习笔记，学习完第七章的子程序和包之后