山居秋暝LS

FasterRCNN详解

1.2.2 FasterRCNN
- 1 模型
- - 1.1 主干网络VGG16 or ResNet50.
  - 1.2 RPN生成建议框
  - 1.3 RCNN进行分类和回归
- 2 预测
- - 2.1 预测流程
- 3 训练
- - 3.1 训练流程
  - 3.2 生成标签
  - 3.3 损失函数

1.2.2 FasterRCNN

FasterRCNN在FastRCNN的基础上，实现端到端的训练。算法分为3个部分。主干网络提取特征、RPN生成建议框、RCNN进行分类和回归。

FasterRCNN优点：

检测精度高。
RPN网络生成先验框。
通用型、鲁棒性强。
可优化空间充足。

FasterRCNN缺点：

主干网络只提取一个特征。
NMS对遮挡物不友好，可能造成漏检。
RoI Pooling两次取证带精度损失。
网络最后使用全连接导致网络参数量大。
检测速度相对慢。

1 模型

1.1 主干网络VGG16 or ResNet50.

input[800,600,3] --> VGGNet(4次下采样) / ResNet50–> Output[37,50,512]

(1) VGG16代理流程
VGG16以卷积和池化作为基本结构，输入[600,600,3]进行5次下采样得到base_layers[37,37,512]。

'''
base_layers = VGG16(inputs)
x --> Conv2D*2 + MaxPoolong2D --> Conv2D*2 + MaxPoolong2D  --> Conv2D*3 + MaxPoolong2D --> Conv2D*3 + MaxPoolong2D --> Conv2D*3  
'''
def VGG16(inputs):
    x = Conv2D(64,(3,3),activation = 'relu',padding = 'same',name = 'block1_conv1')(inputs)
    x = Conv2D(64,(3,3),activation = 'relu',padding = 'same', name = 'block1_conv2')(x)
    x = MaxPooling2D((2,2), strides = (2,2), name = 'block1_pool')(x)

    x = Conv2D(128,(3,3),activation = 'relu',padding = 'same',name = 'block2_conv1')(x)
    x = Conv2D(128,(3,3),activation = 'relu',padding = 'same',name = 'block2_conv2')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block2_pool')(x)

    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv1')(x)
    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv2')(x)
    x = Conv2D(256,(3,3),activation = 'relu',padding = 'same',name = 'block3_conv3')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block3_pool')(x)

    # 第四个卷积部分
    # 14,14,512
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv1')(x)
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv2')(x)
    x = Conv2D(512,(3,3),activation = 'relu',padding = 'same', name = 'block4_conv3')(x)
    x = MaxPooling2D((2,2),strides = (2,2), name = 'block4_pool')(x)

    # 第五个卷积部分
    # 7,7,512
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv1')(x)
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv2')(x)
    x = Conv2D(512,(3,3),activation = 'relu', padding = 'same', name = 'block5_conv3')(x)    

    return x

(2) ResNet50代码流程
ResNet50以卷积和池化作为基本结构，输入[600,600,3]进行4次下采样得到base_layers[38,38,1024]。ResNet50有conv_block 、identity_block两个基础结构。

'''
base_layers = ResNet50(inputs)
conv_block ：BottleNeck + ResNet(下采样)
identity_block ：BottleNeck + ResNet(无下采样)
ResNet50 ：ZCBAM --> conv_block + identity_block *2 --> conv_block + identity_block *3 --> conv_block + identity_block *5   
'''
def identity_block(input_tensor, kernel_size, filters, stage, block):

    filters1, filters2, filters3 = filters

    conv_name_base  = 'res' + str(stage) + block + '_branch'
    bn_name_base    = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):

    filters1, filters2, filters3 = filters

    conv_name_base  = 'res' + str(stage) + block + '_branch'
    bn_name_base    = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    shortcut = Conv2D(filters3, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x

def ResNet50(inputs):
    #-----------------------------------#
    #   假设输入进来的图片是600,600,3
    #-----------------------------------#
    # 600,600,3 -> 300,300,64
    x = ZeroPadding2D((3, 3))(inputs)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)

    # 300,300,64 -> 150,150,64
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # 150,150,64 -> 150,150,256
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    # 150,150,256 -> 75,75,512
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    # 75,75,512 -> 38,38,1024
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    # 最终获得一个38,38,1024的共享特征层
    return x

1.2 RPN生成建议框

在FeatureMap上每个点生成9个Anchors。
FeatureMap分别进行卷积得到置信度和预测偏移。
Anchors与标签匹配得到正负样本，对预测偏移和置信度进行Loss。
根据预测得分、预测偏移抠图得到Proposals。

'''
rpn = get_rpn(base_layers, num_anchors)
base_layers[38,38,1024] -->  Conv2D((3, 3),512) -->  Conv2D((1, 1),num_anchors) +  Conv2D((3, 3),num_anchors * 4) --> Reshape --> x_class, x_regr]
'''
def get_rpn(base_layers, num_anchors):
    #----------------------------------------------------#
    #   利用一个512通道的3x3卷积进行特征整合
    #----------------------------------------------------#
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer=random_normal(stddev=0.02), name='rpn_conv1')(base_layers)

    #----------------------------------------------------#
    #   利用一个1x1卷积调整通道数，获得预测结果
    #----------------------------------------------------#
    x_class = Conv2D(num_anchors, (1, 1), activation = 'sigmoid', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_class')(x)
    x_regr  = Conv2D(num_anchors * 4, (1, 1), activation = 'linear', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_regress')(x)
    
    x_class = Reshape((-1, 1),name="classification")(x_class)
    x_regr  = Reshape((-1, 4),name="regression")(x_regr)
    return [x_class, x_regr]

1.3 RCNN进行分类和回归

对上一步的到Proposals进行卷积得到分类结果和预测框的偏移。
根据到Proposals和groudtruth 的IoU划分正负样本。
根据正负样本，计算Loss。

'''
classifier  = get_vgg_classifier(base_layers, roi_input, 7, num_classes)

base_layers, input_rois --> RoiPoolingConv(roi_size) -->  vgg_classifier_layers --> TimeDistributed(Dense) +  TimeDistributed(Dense)  --> out_class, out_regr 
'''
def get_vgg_classifier(base_layers, input_rois, roi_size=7, num_classes=21):  
    # batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
    out_roi_pool = RoiPoolingConv(roi_size)([base_layers, input_rois])

    # batch_size, num_rois, 7, 7, 512 -> batch_size, num_rois, 4096
    out = vgg_classifier_layers(out_roi_pool)

    # batch_size, num_rois, 4096 -> batch_size, num_rois, num_classes
    out_class   = TimeDistributed(Dense(num_classes, activation='softmax', kernel_initializer=random_normal(stddev=0.02)), name='dense_class_{}'.format(num_classes))(out)
    # batch_size, num_rois, 4096 -> batch_size, num_rois, 4 * (num_classes-1)
    out_regr    = TimeDistributed(Dense(4 * (num_classes-1), activation='linear', kernel_initializer=random_normal(stddev=0.02)), name='dense_regress_{}'.format(num_classes))(out)
    return [out_class, out_regr]

RoIPooling : batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512

'''
# batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
# roi_input   = Input(shape=(None, 4))
# 用RPN得到的建议框在FeatureMap上截取下来，并Pooling。
'''

out = vgg_classifier_layers(out_roi_pool)

'''
def vgg_classifier_layers(x):
    # num_rois, 14, 14, 1024 -> num_rois, 7, 7, 2048
    x = TimeDistributed(Flatten(name='flatten'))(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc1')(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc2')(x)
    return x
'''

2 预测

2.1 预测流程

r_image = frcnn.detect_image(image)

对输入原图像处理。
获得rpn网络预测结果和base_layer。
生成先验框并解码。
利用建议框获得classifier网络预测结果。
利用classifier的预测结果对建议框进行解码，获得预测框。
图像绘制。

(1) rpn网络进行预测得到置信度和预测偏移值

'''
preds = self.model_rpn.predict(photo)  
1. model_rpn.predict: photo[1,600,600,3] --> model_rpn --> preds[x_class, x_regr, base_layers]

2. model_rpn: input[1,600,600,3]  -->  ResNet50(inputs) --> base_layers + num_anchors=9  -->  get_rpn(base_layers, num_anchors) --> rpn

3. ResNet50: (inputs[1,600,600,3]) --> ZCBAM(None, 150, 150, 64) --> conv_block + identity_block*2   (None, 150, 150, 256) --> conv_block + identity_block*3   (None, 75, 75, 512)--> conv_block + identity_block*5  --> base_layers(None, 38, 38, 1024)
 
4. get_rpn:  base_layers(None, 38, 38, 1024)-->  Conv2D(512) (None, 38, 38, 512)--> Conv2D(num_anchors)(None, 38, 38, 9),Conv2D(num_anchors * 4)(None, 38, 38, 36) --> x_class(None, 12996, 1) , x_regr(None, 12996, 4) , base_layers(None, 38, 38, 1024)
'''

(1.1) ResNet50

'''
base_layers = ResNet50(inputs) # [1,600,600,3] --> (None, 38, 38, 1024)
'''
def ResNet50(inputs):
    # 输入 600*600*3
    img_input = inputs

    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)                  # 300*300*64
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)              # 150*150*64

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))  # 150*150*256
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')                 # 75*75*512
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')                 # 38*38*1024
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    return x   # (None, 38, 38, 1024)

(1.2) get_rpn

'''
rpn = get_rpn(base_layers, num_anchors) # (None, 38, 38, 1024) --> x_class(None, 12996, 1) , x_regr(None, 12996, 4) 
'''
def get_rpn(base_layers, num_anchors):
    # 1 base_layers进行3*3卷积
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)
    # 2 得到每个框的置信度和框坐标
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)
    
    x_class = Reshape((-1,1),name="classification")(x_class)  # 如果包含物体，x_class的值接近1。
    x_regr = Reshape((-1,4),name="regression")(x_regr)
    return [x_class, x_regr, base_layers] #  x_class(None, 12996, 1) , x_regr(None, 12996, 4) , base_layers(None, 38, 38, 1024)

(2) 生成先验框

生成框边长。
边长和中心点结合，每个像素点生成框
缩放框在0，1之间

'''
anchors = get_anchors(self.get_img_output_length(width,height),width,height)
self.get_img_output_length(width,height): 根据输入特征层的大小得到输出特征的大小。600 --> [300,150,75,38]
[height:600,width:600]
'''
# 生成先验框
def get_anchors(shape,width,height):
    ''' 获得先验框
    shape： featuremap.shape ,img.width,img.height '''
    # 1 生成框边长
    anchors = generate_anchors()
    # 2 边长和中心点结合，每个像素点生成框
    network_anchors = shift(shape,anchors)
    # 3 缩放框在0，1之间
    network_anchors[:,0] = network_anchors[:,0]/width
    network_anchors[:,1] = network_anchors[:,1]/height
    network_anchors[:,2] = network_anchors[:,2]/width
    network_anchors[:,3] = network_anchors[:,3]/height
    network_anchors = np.clip(network_anchors,0,1)  # 框的坐标限制在0～1范围内。
    return network_anchors

(2.1) 生成框边长

根据sizes、ratios确定宽高array([[ 128., 256., 512., 128., 256., 512., 256., 512., 1024.],[ 128., 256., 512., 256., 512., 1024., 128., 256., 512.]])
把宽高变为原来的1/2，方便后面根据框的中心点和宽高转化为框的左上角和右下角坐标。

'''
anchors = generate_anchors()
'''
def generate_anchors(sizes=None, ratios=None):
    ''' 生成大小不同的先验框的边长
    一共九个
    [[ -64.,  -64.,   64.,   64.],
       [-128., -128.,  128.,  128.],
       [-256., -256.,  256.,  256.],
       [ -64., -128.,   64.,  128.],
       [-128., -256.,  128.,  256.],
       [-256., -512.,  256.,  512.],
       [-128.,  -64.,  128.,   64.],
       [-256., -128.,  256.,  128.],
       [-512., -256.,  512.,  256.]]
    '''
    if sizes is None:
        sizes = config.anchor_box_scales  # [128, 256, 512]

    if ratios is None:
        ratios =  config.anchor_box_ratios #  [[1, 1], [1, 2], [2, 1]]
    # 框的数目
    num_anchors = len(sizes) * len(ratios) # 3*3
    # 放置框
    anchors = np.zeros((num_anchors, 4))  #[9,4]

    anchors[:, 2:] = np.tile(sizes, (2, len(ratios))).T  # 把size复制成[2,3]
    
    for i in range(len(ratios)):
        anchors[3*i:3*i+3, 2] = anchors[3*i:3*i+3, 2]*ratios[i][0]
        anchors[3*i:3*i+3, 3] = anchors[3*i:3*i+3, 3]*ratios[i][1]
    

    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors

(2.2) 边长和中心点结合，每个像素点生成框

把原图根据特征层次的宽高把img划分为38*38的小方格。
确定每个方格中心点的坐标。
根据中心点坐标和宽高计算先验框的左上角和右下角坐标。

'''
network_anchors = shift(shape,anchors)
'''
def shift(shape, anchors, stride=config.rpn_stride):
    ''' 生成网格中心点 self.rpn_stride = 16 ，根据边长和中心点生成框 What\How \Why
    base_layers(None, 38, 38, 1024)
    self.rpn_stride = 16
    [0,1,2,3,....37]
    [0.5,1.5,2.5,....,37.5]
    [0.5,1.5,2.5,....,37.5]*stride
    '''

    shift_x = (np.arange(0, shape[0], dtype=keras.backend.floatx()) + 0.5) * stride
    shift_y = (np.arange(0, shape[1], dtype=keras.backend.floatx()) + 0.5) * stride

    shift_x, shift_y = np.meshgrid(shift_x, shift_y)

    shift_x = np.reshape(shift_x, [-1])  # (1444,)
    shift_y = np.reshape(shift_y, [-1])  # (1444,)

    shifts = np.stack([
        shift_x,
        shift_y,
        shift_x,
        shift_y
    ], axis=0)  # (4, 1444)

    shifts            = np.transpose(shifts)  # (1444, 4)
    number_of_anchors = np.shape(anchors)[0]  # 9

    k = np.shape(shifts)[0]
    # 下面两步操作得到框的左上角和右下角坐标 [1,9,4] + [1444,1,4]
    shifted_anchors = np.reshape(anchors, [1, number_of_anchors, 4]) + np.array(np.reshape(shifts, [k, 1, 4]), keras.backend.floatx())
    shifted_anchors = np.reshape(shifted_anchors, [k * number_of_anchors, 4])
    return shifted_anchors  # (12996, 4)

(3) 将预测结果进行解码 + nms筛选得到建议框

解码。
对解码后对框处理：取出得分高于confidence_threshold的框、进行iou的非极大抑制。
按照置信度进行排序、选出置信度最大的keep_top_k个。

'''
rpn_results = self.bbox_util.detection_out(preds,anchors,1,confidence_threshold=0.8)

predspreds : x_class(None, 12996, 1) , x_regr(None, 12996, 4) 

anchors(12996, 4)
'''
    def detection_out(self, predictions, mbox_priorbox, num_classes, keep_top_k=300,
                        confidence_threshold=0.5):
        ''' 对PRN网络框预测结果用先验框解码、NMS '''
        mbox_conf = predictions[0] # 类别
        mbox_loc = predictions[1]  # 网络预测的结果
        # 先验框数量
        mbox_priorbox = mbox_priorbox
        results = []
        # 对每一个图片进行处理
        for i in range(len(mbox_loc)):
            results.append([])
            # 1 解码 **************************得到预测框的左上角和右下角
            decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
            # 2 对解码后对框处理
            for c in range(num_classes):
                c_confs = mbox_conf[i, :, c]
                c_confs_m = c_confs > confidence_threshold
                if len(c_confs[c_confs_m]) > 0:
                    # 取出得分高于confidence_threshold的框
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]
                    # 进行iou的非极大抑制
                    feed_dict = {self.boxes: boxes_to_process,
                                    self.scores: confs_to_process}
                    # 把标签、置信度、框取出来
                    idx = self.sess.run(self.nms, feed_dict=feed_dict)
                    # 取出在非极大抑制中效果较好的内容
                    good_boxes = boxes_to_process[idx]
                    confs = confs_to_process[idx][:, None]
                    # 将label、置信度、框的位置进行堆叠。
                    labels = c * np.ones((len(idx), 1))
                    c_pred = np.concatenate((labels, confs, good_boxes),
                                            axis=1)
                    # 添加进result里
                    results[-1].extend(c_pred)

            if len(results[-1]) > 0:
                # 按照置信度进行排序
                results[-1] = np.array(results[-1])
                argsort = np.argsort(results[-1][:, 1])[::-1]
                results[-1] = results[-1][argsort]
                # 选出置信度最大的keep_top_k个
                results[-1] = results[-1][:keep_top_k]
        # 获得，在所有预测结果里面，置信度比较高的框
        # 还有，利用先验框和RPN网络的预测结果，处理获得了真实框（预测框）的位置
        return results

(3.1) 解码得到预测框的左上角和右下角

获取先验框的宽高和中心点。
根据上一步结果和预测偏移带入解码公式得到预测框的中心和宽高值。
预测框的值限制在0～1之间。

'''
decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
'''
def decode_boxes(self, mbox_loc, mbox_priorbox):
        #  1.1 获得先验框的宽与高
        prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
        prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]

        # 1.2 获得先验框的中心点
        prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
        prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])

        # 2 预测的真实框距离先验框中心的xy轴偏移情况
        decode_bbox_center_x = mbox_loc[:, 0] * prior_width / 4
        decode_bbox_center_x += prior_center_x
        decode_bbox_center_y = mbox_loc[:, 1] * prior_height / 4
        decode_bbox_center_y += prior_center_y
        
        # 预测的真实框的宽与高的求取
        decode_bbox_width = np.exp(mbox_loc[:, 2] / 4)
        decode_bbox_width *= prior_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] /4)
        decode_bbox_height *= prior_height

        # 获取预测的真实框的左上角与右下角
        decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
        decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
        decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
        decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

        # 真实框的左上角与右下角进行堆叠
        decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                      decode_bbox_ymin[:, None],
                                      decode_bbox_xmax[:, None],
                                      decode_bbox_ymax[:, None]), axis=-1)
        # 防止超出0与1
        decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
        return decode_bbox

(3.2) NMS
上一步解码后得到预测框，然后遍历预测框的每一个类别，选出置信度大于阈值的预测框，再根据上一步的结果进行NMS删除掉重复的框。

按照置信度对同一类别的框从大到小排列。
计算所有框与第一个框的IoU，如果IoU小于阈值，则保留框，反之删除。
对上一步得到的框重复以上操作，直到待检测的框数量为0.

'''
self.nms = tf.image.non_max_suppression(self.boxes, self.scores,
                                                self._top_k,
                                                iou_threshold=self._nms_thresh)
'''

(4) P_cls种类，置信度, P_regr

抠图。遍历建议框，在特征图上截取框对应的位置，然后把截取的图统一到[1,32,14,14,1024]

'''
 [P_cls, P_regr] = self.model_classifier.predict([base_layer,ROIs]) 
'''
def get_classifier(base_layers, input_rois, num_rois, nb_classes=21, trainable=False):
    ''' roi --> cls+reg'''
    pooling_regions = 14
    input_shape = (num_rois, 14, 14, 1024)
    # base_layers[38,38,1024], input_rois[num_prior,4]  num_prior=32，out_roi_pool.shape[1,32,14,14,1024]
    # 1 roiPooling，base_layers[38,38,1024], input_rois[-1,4] ,pooling_regions=14, num_rois=32
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois]) # input_rois是建议框
    # 2
    out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
    out = TimeDistributed(Flatten())(out)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
    return [out_class, out_regr]  # 21, 20*4

(4.1) 抠图

'''
 out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
'''
   def call(self, x, mask=None):

        assert(len(x) == 2)

        img = x[0]   # featureMap
        rois = x[1]  # 建议框

        outputs = []
        # 遍历建议框
        for roi_idx in range(self.num_rois):
            # x,y左上角，wh宽高
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]

            x = K.cast(x, 'int32')
            y = K.cast(y, 'int32')
            w = K.cast(w, 'int32')
            h = K.cast(h, 'int32')
            # 在特征图上截取
            rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
            outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))

        return final_output  # [1,32,14,14,1024]

(4.3) 对抠图卷积提取特征

'''
out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)

[1,32,14,14,1024] --> (None, 32, 1, 1, 204)
'''
def classifier_layers(x, input_shape=(32, 14, 14, 1024), trainable=False):
    x = conv_block_td(x, 3, [512, 512, 2048], stage=5, block='a', input_shape=input_shape, strides=(2, 2), trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='b', trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='c', trainable=trainable)
    x = TimeDistributed(AveragePooling2D((7, 7)), name='avg_pool')(x)  # 对第二维度处理

    return x   # (None, 32, 1, 1, 204)

(4.4) 分类、回归

'''
 (None, 32, 1, 1, 204) --> (None, 1, 6528)  --> (None, 1, 21) + (None, 1, 80) 
'''
out = TimeDistributed(Flatten())(out)  # (None, 1, 6528)
out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)   # (None, 1, 21) 
out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)  #(None, 1, 80)

3 训练

3.1 训练流程

加载模型
生成数据集
设置参数
训练模型

3.2 生成标签

数据增强。
获取先验框。
真实框与先验框配对，生成标签。
确定正负样本。

'''
gen = Generator(bbox_util, lines, NUM_CLASSES, solid=True)
rpn_train = gen.generate()
'''
    def generate(self):
        while True:
            shuffle(self.train_lines)
            lines = self.train_lines
            for annotation_line in lines:  
                # 数据增强
                img,y=self.get_random_data(annotation_line)
                height, width, _ = np.shape(img)
                
                if len(y)==0:
                    continue
                boxes = np.array(y[:,:4],dtype=np.float32)
                boxes[:,0] = boxes[:,0]/width
                boxes[:,1] = boxes[:,1]/height
                boxes[:,2] = boxes[:,2]/width
                boxes[:,3] = boxes[:,3]/height
                
                box_heights = boxes[:,3] - boxes[:,1]
                box_widths = boxes[:,2] - boxes[:,0]
                if (box_heights<=0).any() or (box_widths<=0).any():
                    continue

                y[:,:4] = boxes[:,:4]
                
                # 获取先验框
                anchors = get_anchors(get_img_output_length(width,height),width,height)
                
                # 计算真实框对应的先验框，与这个先验框应当有的预测结果
                assignment = self.bbox_util.assign_boxes(y,anchors)

                num_regions = 256
                
                classification = assignment[: , 4]
                regression = assignment[:,:]
                
                mask_pos = classification[:] > 0
                num_pos = len(classification[mask_pos])
                if num_pos > num_regions/2:
                    val_locs = random.sample(range(num_pos), int(num_pos - num_regions/2))
                    classification[mask_pos][val_locs] = -1
                    regression[mask_pos][val_locs,-1] = -1
                
                mask_neg = classification[:]==0
                num_neg = len(classification[mask_neg])
                if len(classification[mask_neg]) + num_pos > num_regions:
                    val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
                    classification[mask_neg][val_locs] = -1
                    
                classification = np.reshape(classification,[-1,1])
                regression = np.reshape(regression,[-1,5])

                tmp_inp = np.array(img)
                tmp_targets = [np.expand_dims(np.array(classification,dtype=np.float32),0),np.expand_dims(np.array(regression,dtype=np.float32),0)]

                yield preprocess_input(np.expand_dims(tmp_inp,0)), tmp_targets, np.expand_dims(y,0)

(1) 数据增强

重新设置图片宽高(800, 600)
反转
调整色调、亮度、饱和度
修正框

'''
img,y=self.get_random_data(annotation_line)
'''
def get_random_data(self, annotation_line, random=True, jitter=.1, hue=.1, sat=1.1, val=1.1, proc_img=True):
        '''r实时数据增强随机预处理'''
        line = annotation_line.split()
        image = Image.open(line[0])
        iw, ih = image.size
        if self.solid:
            w,h = self.solid_shape
        else:
            w, h = get_new_img_size(iw, ih)
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])

        # resize image
        new_ar = w/h * rand(1-jitter,1+jitter)/rand(1-jitter,1+jitter)
        scale = rand(.25, 2)
        if new_ar < 1:
            nh = int(scale*h)
            nw = int(nh*new_ar)
        else:
            nw = int(scale*w)
            nh = int(nw/new_ar)
        image = image.resize((nw,nh), Image.BICUBIC)

        # place image
        dx = int(rand(0, w-nw))
        dy = int(rand(0, h-nh))
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_image.paste(image, (dx, dy))
        image = new_image

        # flip image or not
        flip = rand()<.5
        if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

        # distort image
        hue = rand(-hue, hue)
        sat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
        val = rand(1, val) if rand()<.5 else 1/rand(1, val)
        x = rgb_to_hsv(np.array(image)/255.)
        x[..., 0] += hue
        x[..., 0][x[..., 0]>1] -= 1
        x[..., 0][x[..., 0]<0] += 1
        x[..., 1] *= sat
        x[..., 2] *= val
        x[x>1] = 1
        x[x<0] = 0
        image_data = hsv_to_rgb(x)*255 # numpy array, 0 to 1

        # correct boxes
        box_data = np.zeros((len(box),5))
        if len(box)>0:
            np.random.shuffle(box)
            box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
            box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
            # flip
            if flip: box[:, [0,2]] = w - box[:, [2,0]]
            # 过滤
            box[:, 0:2][box[:, 0:2]<0] = 0
            box[:, 2][box[:, 2]>w] = w
            box[:, 3][box[:, 3]>h] = h
            box_w = box[:, 2] - box[:, 0]
            box_h = box[:, 3] - box[:, 1]
            box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
            box_data = np.zeros((len(box),5))
            box_data[:len(box)] = box
        if len(box) == 0:
            return image_data, []

        if (box_data[:,:4]>0).any():
            return image_data, box_data
        else:
            return image_data, []

(2) 获取先验框

生成框边长。
边长和中心点结合，每个像素点生成框。
缩放框在0，1之间。

'''
anchors = get_anchors(get_img_output_length(width,height),width,height)
'''
def get_anchors(shape,width,height):
    ''' 获得先验框
    shape： featuremap.shape ,img.width,img.height '''
    # 1 生成框边长
    anchors = generate_anchors()
    # 2 边长和中心点结合，每个像素点生成框
    network_anchors = shift(shape,anchors)
    # 3 缩放框在0，1之间
    network_anchors[:,0] = network_anchors[:,0]/width
    network_anchors[:,1] = network_anchors[:,1]/height
    network_anchors[:,2] = network_anchors[:,2]/width
    network_anchors[:,3] = network_anchors[:,3]/height
    network_anchors = np.clip(network_anchors,0,1)
    return network_anchors

(3) 真实框与先验框配对，生成标签

确忽略的框。
找出正样本，并使符合要求每一个先验框只负责一个真实框。

'''
assignment = self.bbox_util.assign_boxes(y,anchors)
'''
    def assign_boxes(self, boxes, anchors):
        self.num_priors = len(anchors)
        self.priors = anchors
        assignment = np.zeros((self.num_priors, 4 + 1))

        assignment[:, 4] = 0.0
        if len(boxes) == 0:
            return assignment
        
        # 1. 确忽略的框  
        # 对每一个真实框都进行iou计算
        ingored_boxes = np.apply_along_axis(self.ignore_box, 1, boxes[:, :4])
        # 取重合程度最大的先验框，并且获取这个先验框的index
        ingored_boxes = ingored_boxes.reshape(-1, self.num_priors, 1)
        # (num_priors)
        ignore_iou = ingored_boxes[:, :, 0].max(axis=0)
        # (num_priors)
        ignore_iou_mask = ignore_iou > 0

        assignment[:, 4][ignore_iou_mask] = -1

        # 2. 找出正样本，并使符合要求每一个先验框只负责一个真实框。
        # (n, num_priors, 5)
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        # 每一个真实框的编码后的值，和iou
        # (n, num_priors)
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)

        # 取重合程度最大的先验框，并且获取这个先验框的index
        # (num_priors)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)
        # (num_priors)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        # (num_priors)
        best_iou_mask = best_iou > 0
        # 某个先验框它属于哪个真实框
        best_iou_idx = best_iou_idx[best_iou_mask]

        assign_num = len(best_iou_idx)
        # 保留重合程度最大的先验框的应该有的预测结果
        # 哪些先验框存在真实框
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]

        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,np.arange(assign_num),:4]
        # 4代表为背景的概率，为0
        assignment[:, 4][best_iou_mask] = 1
        # 通过assign_boxes我们就获得了，输入进来的这张图片，应该有的预测结果是什么样子的
        return assignment

(4) 确定正负样本

正负样本共256个。
如果正样本数量超过128个，从所有正样本中抽取(num_pos - 128/2)。
如果所有负样本量与正样本之和大于256，从所有负样本中抽取(num_neg - num_pos)个样本作为负样本。

num_regions = 256
                
classification = assignment[: , 4]
regression = assignment[:,:]

mask_pos = classification[:] > 0
num_pos = len(classification[mask_pos])
if num_pos > num_regions/2:
    val_locs = random.sample(range(num_pos), int(num_pos - num_regions/2))
    classification[mask_pos][val_locs] = -1
    regression[mask_pos][val_locs,-1] = -1

mask_neg = classification[:] == 0
num_neg = len(classification[mask_neg])
if len(classification[mask_neg]) + num_pos > num_regions:
    val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
    classification[mask_neg][val_locs] = -1
    
classification = np.reshape(classification,[-1,1])
regression = np.reshape(regression,[-1,5])

tmp_inp = np.array(img)
tmp_targets = [np.expand_dims(np.array(classification,dtype=np.float32),0),np.expand_dims(np.array(regression,dtype=np.float32),0)]

3.3 损失函数

(1) smooth_l1损失函数

找到正样本
代入公式

'''
smooth_l1()
'''
def smooth_l1(sigma=1.0):
    sigma_squared = sigma ** 2

    def _smooth_l1(y_true, y_pred):
        # y_true [batch_size, num_anchor, 4+1]
        # y_pred [batch_size, num_anchor, 4]
        regression        = y_pred
        regression_target = y_true[:, :, :-1]
        anchor_state      = y_true[:, :, -1]

        # 找到正样本
        indices           = tf.where(keras.backend.equal(anchor_state, 1))
        regression        = tf.gather_nd(regression, indices)
        regression_target = tf.gather_nd(regression_target, indices)

        # 计算 smooth L1 loss
        # f(x) = 0.5 * (sigma * x)^2          if |x| < 1 / sigma / sigma
        #        |x| - 0.5 / sigma / sigma    otherwise
        regression_diff = regression - regression_target
        regression_diff = keras.backend.abs(regression_diff)
        regression_loss = tf.where(
            keras.backend.less(regression_diff, 1.0 / sigma_squared),
            0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
            regression_diff - 0.5 / sigma_squared
        )

        normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
        normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
        loss = keras.backend.sum(regression_loss) / normalizer

        return loss

    return _smooth_l1

(2) cls_loss()损失函数

找出存在目标的先验框。
找出实际上为背景的先验框。
分别计算正负样本的交叉熵。

'''
def cls_loss(ratio=3):
    def _cls_loss(y_true, y_pred):
        # y_true [batch_size, num_anchor, num_classes+1]
        # y_pred [batch_size, num_anchor, num_classes]
        labels         = y_true
        anchor_state   = y_true[:,:,-1] # -1 是需要忽略的, 0 是背景, 1 是存在目标
        classification = y_pred

        # 找出存在目标的先验框
        indices_for_object        = tf.where(keras.backend.equal(anchor_state, 1))
        labels_for_object         = tf.gather_nd(labels, indices_for_object)
        classification_for_object = tf.gather_nd(classification, indices_for_object)

        cls_loss_for_object = keras.backend.binary_crossentropy(labels_for_object, classification_for_object)

        # 找出实际上为背景的先验框
        indices_for_back        = tf.where(keras.backend.equal(anchor_state, 0))
        labels_for_back         = tf.gather_nd(labels, indices_for_back)
        classification_for_back = tf.gather_nd(classification, indices_for_back)

        # 计算每一个先验框应该有的权重
        cls_loss_for_back = keras.backend.binary_crossentropy(labels_for_back, classification_for_back)

        # 标准化，实际上是正样本的数量
        normalizer_pos = tf.where(keras.backend.equal(anchor_state, 1))
        normalizer_pos = keras.backend.cast(keras.backend.shape(normalizer_pos)[0], keras.backend.floatx())
        normalizer_pos = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_pos)

        normalizer_neg = tf.where(keras.backend.equal(anchor_state, 0))
        normalizer_neg = keras.backend.cast(keras.backend.shape(normalizer_neg)[0], keras.backend.floatx())
        normalizer_neg = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_neg)
        
        # 将所获得的loss除上正样本的数量
        cls_loss_for_object = keras.backend.sum(cls_loss_for_object)/normalizer_pos
        cls_loss_for_back = ratio*keras.backend.sum(cls_loss_for_back)/normalizer_neg

        # 总的loss
        loss = cls_loss_for_object + cls_loss_for_back

        return loss
    return _cls_loss
'''

你可能感兴趣的:(计算机视觉,深度学习,机器学习,人工智能)

Java 与 AI 携手，掀起多领域智能变革浪潮 WangRK_ 人工智能 java 开发语言
在数字化转型的时代浪潮下，技术更新迭代速度超乎想象。当Java这门历经二十余年沉淀的编程语言，遇上风头正劲的人工智能（AI），一场席卷多领域的智能变革正悄然发生。尤其是在金融与零售两大行业，这场技术融合带来的改变，正重塑着整个行业的生态。一、Java在金融与零售行业的“前世今生”（一）曾经的行业基石在金融领域，Java堪称“代码钢铁侠”，是金融基础设施的坚实支柱。全球顶级交易所依靠Java强大的性
用户实体行为分析与数据异常访问联防方案 KKKlucifer 时序数据库
一、用户实体行为分析（UEBA）技术概述1.1定义与概念用户实体行为分析（UEBA）是一种高级网络安全方法，它利用机器学习和行为分析技术，对用户、设备、应用程序等实体在网络环境中的行为进行深入分析，以检测出异常行为和潜在的安全威胁。UEBA的核心在于通过建立行为基线，识别出偏离正常行为模式的活动，从而发现那些传统安全工具难以检测到的高级、隐藏和内部威胁。1.2工作原理UEBA系统通过收集来自多个数
从零开始理解Transformer模型：架构与应用淮橘√ transformer 深度学习人工智能
引言近年来，Transformer模型席卷了自然语言处理（NLP）领域，成为了深度学习中的明星架构。从Google提出的《AttentionisAllYouNeed》论文到ChatGPT、BERT等模型的广泛应用，Transformer以其强大的性能和灵活性改变了我们对序列建模的认知。本文将从零开始，深入浅出地解析Transformer的架构原理、核心组件以及实际应用场景，并提供一个简单的代码示例
java opencv 数字识别算法_[机器学习]基于OpenCV实现最简单的数字识别后期小雨 java opencv 数字识别算法
本文将基于OpenCV实现简单的数字识别。这里以游戏AngryBirds为例，通过以下几个主要步骤对其中右上角的分数部分进行自动识别。1.学习分类器根据训练样本，选取模型训练产生数字分类器。这里的样本可以是通用的数字样本库(如NIST等)，也可以是针对应用场景而制作的专门训练样本。前者优在泛化性，后者强在准确率，当然常用做法是将这两者结合，即在通用数字库基础上做修改。另外这里由于模式并不复杂，计算
筑牢医疗AI安全防线：四重防护体系全解析 Allen_Lyb 数智化教程（第二期）人工智能安全
一、引言：医疗AI发展中的安全困境在数字化浪潮席卷下，医疗领域正经历着一场由人工智能（AI）驱动的深刻变革。医疗AI凭借其强大的数据分析与处理能力，在疾病诊断、药物研发、健康管理等诸多环节展现出巨大潜力，成为推动医疗行业进步的关键力量。而这一切的背后，医疗数据作为AI发展的“燃料”，以及AI算力作为运行的“引擎”，起着不可或缺的核心作用。医疗数据涵盖了患者从基本信息、病史、症状描述到各种检查检验报
Python 爬虫实战：从图片网站抓取图片并进行特征提取（2025 最新版） Python爬虫项目 2025年爬虫实战项目 python 爬虫开发语言 github chrome 数据库
一、引言在当今的数字时代，图像数据在各个领域中扮演着至关重要的角色。无论是计算机视觉、机器学习，还是数据分析，图像数据的获取和处理都是基础。然而，获取大量高质量的图像数据并非易事。幸运的是，互联网上充斥着丰富的图像资源，只需借助合适的工具和技术，我们就能高效地从中获取所需的图像数据。本文将详细介绍如何使用Python构建一个完整的爬虫系统，从图片网站抓取图像，并对其进行特征提取。我们将涵盖从网页分
Open AI在AI人工智能领域的技术安全防护体系 AI智能探索者 AI Agent 智能体开发实战人工智能安全网络 ai
OpenAI在AI人工智能领域的技术安全防护体系关键词：OpenAI、AI安全、技术防护、伦理框架、模型对齐、数据隐私、对抗攻击摘要：本文将深入探讨OpenAI在人工智能领域构建的多层次技术安全防护体系。我们将从基础概念出发，逐步解析OpenAI如何通过技术创新和系统设计来确保AI系统的安全性、可靠性和可控性。文章将涵盖从数据安全到模型对齐，从伦理框架到实际防护技术的全方位内容，帮助读者全面理解现
揭秘自然语言处理在AI人工智能领域的奥秘 AI智能探索者 AI Agent 智能体开发实战人工智能自然语言处理 easyui ai
揭秘自然语言处理在AI人工智能领域的奥秘关键词：自然语言处理、AI人工智能、语言理解、语言生成、语义分析摘要：本文深入探讨了自然语言处理（NLP）在AI人工智能领域的奥秘。首先介绍了自然语言处理的背景，包括目的、预期读者、文档结构和相关术语。接着阐述了自然语言处理的核心概念与联系，通过文本示意图和Mermaid流程图进行展示。详细讲解了核心算法原理和具体操作步骤，并用Python源代码进行阐述。分
【LangChain编程：从入门到实践】AI 大模型检索增强生成 RAG 实践 AI智能应用 Python入门实战计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
LangChain编程：从入门到实践-AI大模型检索增强生成RAG实践关键词：LangChain,RAG,大语言模型,检索增强生成,向量数据库,嵌入模型,提示工程1.背景介绍在人工智能和自然语言处理领域,大语言模型(LargeLanguageModels,LLMs)的出现无疑是一个重大突破。像GPT-3、GPT-4这样的模型展现出了惊人的语言理解和生成能力,为各种应用场景带来了无限可能。然而,这些
基于深度学习的线上问诊系统设计与实现（Python+Django+MySQL）神经网络15044 深度学习算法神经网络 python 深度学习 django 机器学习人工智能算法目标检测
基于深度学习的线上问诊系统设计与实现（Python+Django+MySQL）一、系统概述本系统结合YOLOv8目标检测和ResNet50图像分类算法，构建了一个智能线上问诊平台。系统支持用户上传医学影像（皮肤照片/X光片），自动分析并生成诊断报告，同时提供医生审核功能。二、技术栈后端框架：Django4.2数据库：MySQL8.0深度学习：YOLOv8：皮肤病变区域检测ResNet50：肺炎X光
深度学习中常见激活函数总结向左转,　向右走ˉ 深度学习人工智能 pytorch python
以下是一份深度学习激活函数的系统总结，涵盖定义、类型、作用、应用及选择影响，便于你快速掌握核心知识：一、激活函数的定义在神经网络中，激活函数（ActivationFunction）是神经元计算输出的非线性变换函数，作用于加权输入和偏置之和：输出=f(加权和+偏置)核心价值：引入非线性，使神经网络能够拟合任意复杂函数（无激活函数的深度网络等价于单层线性模型）。二、常见激活函数类型1.线性函数（Lin
AI离全社会普及，只差一个计算中心？ a13163944010 人工智能
过去十年，人工智能（AI）大爆炸，并第一次走进普通人的生活。但蓬勃发展的AI却碰到一个空前棘手的问题：自2012年以来，AI算力需求6年增长30万倍，远超摩尔定律！人类现有的基础设施，已跟不上AI算力需求的增长。未来，该怎么办？【1】一百多年前，人类也曾面临同样的难题。1866年，德国西门子发明自激发电机，开启了人类的电力时代。此后十几年，虽然很多企业纷纷采用电能这种新的动力，但一台电机只能供应一
首次使用“非英伟达”芯片！OpenAI租用谷歌TPU，降低推理计算成本加百力科技知识财经研究人工智能 chatgpt
OpenAI近期开始租用谷歌TPU芯片，这是该公司首次大规模使用非英伟达芯片。除了OpenAI外、苹果、SafeSuperintelligence和Cohere等公司也一直租用谷歌云的TPU。英伟达的芯片主导地位正被侵蚀，OpenAI租用谷歌TPU，为首次大规模使用“非英伟达”芯片。周六，据媒体报道，作为全球最大的人工智能芯片客户之一，OpenAI近期开始租用谷歌的TPU芯片为ChatGPT等产品
AI人工智能神经网络马里亚纳海沟网人工智能神经网络深度学习笔记运维全文检索搜索引擎
**AI人工智能神经网络概述**神经网络是并行计算设备，它们试图构建大脑的计算机模型。背后的主要目标是开发一个系统来执行各种计算任务比传统系统更快。这些任务包括模式识别和分类，近似，优化和数据聚类什么是人工神经网络(ANN)人工神经网络(ANN)是一个高效的计算系统，其核心主题是借用生物神经网络的类比。人工神经网络也被称为人工神经系统，并行分布式处理系统和连接系统。ANN获取了大量以某种模式相互连
机器学习-- 聚类 SunsPlanter 机器学习机器学习聚类人工智能
什么是聚类？Clustering可以简单地说，对有标注的数据分类，就是逻辑回归（属于有监督分类），对无标注的数据分类，就是聚类（属于无监督分类）聚类是一种无监督学习技术，其目标是根据样本之间的相似性将未标记的数据分组。比如，在一个假设的患者研究中，研究人员正在评估一项新的治疗方案。在试验期间，患者每周会报告自身症状的频率以及严重程度。研究人员可以使用聚类分析将对治疗反应相似的患者归为同一类。图1展
FP16、BF16、INT8、INT4精度模型加载所需显存以及硬件适配的分析 herosunly 大模型精度 BF16 硬件适配
大家好，我是herosunly。985院校硕士毕业，现担任算法研究员一职，热衷于机器学习算法研究与应用。曾获得阿里云天池比赛第一名，CCF比赛第二名，科大讯飞比赛第三名。拥有多项发明专利。对机器学习和深度学习拥有自己独到的见解。曾经辅导过若干个非计算机专业的学生进入到算法行业就业。希望和大家一起成长进步。本文主要介绍了FP16、INT8、INT4精度模型加载占用显存大小的分析，希望对学习大
educoder机器学习 --- 神经网络木右加木 educoder 机器学习神经网络
第1关：神经网络基本概念１、Ｃ第2关：激活函数#encoding=utf8defrelu(x):'''x:负无穷到正无穷的实数'''#*********Begin*********#ifx<=0:return0else:returnx#*********End*********#第3关：反向传播算法#encoding=utf8importosimportpandasaspdfromsklearn.
智能办公与科研革命：ChatGPT+DeepSeek大模型在论文撰写、数据分析与AI建模中的实践指南 jwwkyjspt 机器学习 SCI论文人工智能 chatgpt 语言模型机器学习
随着人工智能技术的快速发展，大语言模型如ChatGPT和DeepSeek在科研领域的应用正在为科研人员提供强大的支持。这些模型通过深度学习和大规模语料库训练，能够帮助科研人员高效地筛选文献、生成论文内容、进行数据分析和优化机器学习模型。ChatGPT和DeepSeek能够快速理解和生成复杂的语言，帮助研究人员在撰写论文时提高效率，不仅生成高质量的文章内容，还能优化论文结构和语言表达。在数据分析方面
初学Spring AI 笔记笑衬人心。大模型学习 spring 人工智能笔记
目录SpringAI简介依赖与环境配置基础概念集成OpenAI（或其他LLM提供商）Prompt模板引擎Embedding与向量数据库SpringAIChatClient使用SpringAI和LangChain对比常见问题与建议SpringAI简介SpringAI是Spring团队推出的人工智能集成框架，旨在简化AI模型（如OpenAI、HuggingFace、Mistral、AzureOpenA
AI新高度——DEEPSEEK 数字隐士·赛博智者 ai
DeepSeek是由中国人工智能公司「深度求索」开发的一系列高性能大语言模型产品及相关技术体系，其定位为通用人工智能（AGI）探索者，目前已发展成为全球增长最快、性能领先的开源模型之一。下面是关于DeepSeek的详细介绍：一、DeepSeek的开发者与背景‌公司名称‌：杭州深度求索人工智能基础技术研究有限公司（成立于2023年）‌核心支持‌：由中国知名对冲基金「高毅资产」创立并提供资金与技术资源
【机器学习&深度学习】适合微调的模型选型指南一叶千舟深度学习【应用必备常识】深度学习人工智能
目录一、不同规模模型微调适用性二、微调技术类型对显存的影响三、选择建议（根据你的硬件）四、实际模型推荐五、不同模型适合人群六、推荐几个“非常适合微调”的模型七、推荐使用的微调技术八、场景选择示例场景1：智能客服（中文）场景2：法律问答（中文RAG）场景3：医学问答/健康咨询场景4：AI写作助手（中英文）场景5：代码补全/AI编程助手对比总结表九、不同参数模型特点9.1参数规模vs能力9.2微型模型
【机器学习&深度学习】本地部署 vs API调用：关键看显存！一叶千舟深度学习【应用必备常识】深度学习人工智能
目录一、本地部署VSAPI调用1.模型运行方式2.性能与速度3.成本4.隐私与安全5.何时选择哪种方式？二、为什么推荐本地部署？1️⃣零依赖网络和外部服务，更可靠稳定2️⃣无调用次数限制，更适合高频或批量推理3️⃣避免长期API费用，节省成本4️⃣保护用户隐私和数据安全5️⃣可自定义、深度优化6️⃣加载一次即可复用，低延迟高性能7️⃣离线可用（重要！）三、适合本地部署的情况四、本地部署条件4.1模
深度学习 vs 传统机器学习：哪个更适合你的项目？ AI大模型应用之禅深度学习机器学习人工智能 ai
深度学习vs传统机器学习：哪个更适合你的项目？关键词：深度学习、传统机器学习、特征工程、数据量、计算资源、项目选择、算法对比摘要：本文将用"炒菜"和"拼图"等生活案例，从核心原理、适用场景、资源需求等维度对比深度学习与传统机器学习。通过具体代码示例和真实项目场景分析，帮助开发者和企业决策者快速判断：你的项目该选深度学习还是传统机器学习？背景介绍目的和范围随着AI技术普及，"该用深度学习还是传统机器
Python 机器学习实战：泰坦尼克号生还者预测 (从数据探索到模型构建) 程序员阿超的博客 Python python 机器学习开发语言泰坦尼克号 Kaggle Scikit-learn 实战教程
引言：挑战介绍泰坦尼克号的沉没是历史上最著名的海难之一。除了其悲剧色彩，它还为数据科学提供了一个经典且引人入胜的入门项目。Kaggle平台上的“Titanic:MachineLearningfromDisaster”竞赛，要求我们利用乘客数据来预测哪些人更有可能在这场灾难中幸存。这是一个典型的二元分类问题：目标变量Survived只有两个值，0（遇难）或1（生还）。这个项目之所以经典，是因为它涵盖
LLM大语言模型学习笔记（1） Arixs666 大语言模型语言模型笔记人工智能
1.概念大语言模型（LLM，LargeLanguageModel），也称大型语言模型，是一种旨在理解和生成人类语言的人工智能模型。LLM通常指包含数百亿（或更多）参数的语言模型，它们在海量的文本数据上进行训练，从而获得对语言深层次的理解。2.能力2.1涌现能力区分大语言模型（LLM）与以前的预训练语言模型（PLM）最显著的特征之一是它们的涌现能力。涌现能力是一种令人惊讶的能力，它在小型模型中不明显
【python数据分析】数据建模之Kmeans聚类斑点鱼 SpotFish python 数据建模聚类 python 数据分析
K-means聚类：最常用的机器学习聚类算法，且为典型的基于距离的聚类算法。K均值：基于原型的、划分的距离技术，它试图发现用户指定个数(K)的簇以欧式距离作为相似度测度Kmeans聚类案例分析：make_blobs聚类数据生成器#导入模块from sklearn.cluster import KMeansfromsklearn.datasetsimportmake_blobs#创建数据x,y_tr
Milvus向量数据库入门指南 longfei.li milvus 数据库人工智能
一、Milvus简介Milvus是一个开源的向量数据库，专为AI应用和向量相似度搜索而设计，以加速非结构化数据的检索。自2019年创建以来，Milvus专注于存储、索引和管理由深度神经网络和其他机器学习模型生成的海量嵌入向量。其能够处理万亿级别的向量索引任务。Milvus的核心优势在于其高效的索引机制，它支持多种索引类型，包括FLAT、IVF_FLAT、IVF_SQ8、IVF_PQ和HNSW等。这
常见机器学习算法与应用场景计算机软件程序设计知识科普机器学习算法人工智能
当然可以。下面是对常见机器学习算法的全面详细阐述，包括每种算法的基本原理、特点以及典型应用场景。1.监督学习（SupervisedLearning）1.1线性回归（LinearRegression）原理：通过拟合一条直线来表示输入和输出之间的关系，适用于预测连续值输出。特点：简单易懂，计算速度快，但只能捕捉线性关系。应用场景：房价预测股票价格预测销售额预测1.2逻辑回归（LogisticRegre
[论文阅读] 人工智能 + 软件工程 | 揭秘ChatGPT在软件开发问题解决中的有效性：一项实证研究张较瘦_ 前沿技术论文阅读人工智能软件工程
揭秘ChatGPT在软件开发问题解决中的有效性：一项实证研究论文：WhatMakesChatGPTEffectiveforSoftwareIssueResolution?AnEmpiricalStudyofDeveloper-ChatGPTConversationsinGitHubarXiv:2506.22390WhatMakesChatGPTEffectiveforSoftwareIssueRe
[论文阅读] 人工智能 + 软件工程 | 代码注释不一致问题研究：从数据革新到端到端解决方案张较瘦_ 前沿技术论文阅读人工智能软件工程
代码注释不一致问题研究：从数据革新到端到端解决方案原文：CCISOLVER:End-to-EndDetectionandRepairofMethod-LevelCode-CommentInconsistencyarXiv:2506.20558CCISolver:End-to-EndDetectionandRepairofMethod-LevelCode-CommentInconsistencyRe
ASM系列五利用TreeApi 解析生成Class lijingyao8206 ASM 字节码动态生成 ClassNode TreeAPI
前面CoreApi的介绍部分基本涵盖了ASMCore包下面的主要API及功能，其中还有一部分关于MetaData的解析和生成就不再赘述。这篇开始介绍ASM另一部分主要的Api。TreeApi。这一部分源码是关联的asm-tree-5.0.4的版本。在介绍前，先要知道一点， Tree工程的接口基本可以完
链表树——复合数据结构应用实例 bardo 数据结构树型结构表结构设计链表菜单排序
我们清楚：数据库设计中，表结构设计的好坏，直接影响程序的复杂度。所以，本文就无限级分类（目录）树与链表的复合在表设计中的应用进行探讨。当然，什么是树，什么是链表，这里不作介绍。有兴趣可以去看相关的教材。需求简介：经常遇到这样的需求，我们希望能将保存在数据库中的树结构能够按确定的顺序读出来。比如，多级菜单、组织结构、商品分类。更具体的，我们希望某个二级菜单在这一级别中就是第一个。虽然它是最后
为啥要用位运算代替取模呢 chenchao051 位运算哈希汇编
在hash中查找key的时候，经常会发现用&取代%，先看两段代码吧， JDK6中的HashMap中的indexFor方法： /** * Returns index for hash code h. */ static int indexFor(int h, int length) {
最近的情况麦田的设计者生活感悟计划软考想
今天是2015年4月27号整理一下最近的思绪以及要完成的任务 1、最近在驾校科目二练车，每周四天，练三周。其实做什么都要用心，追求合理的途径解决。为
PHP去掉字符串中最后一个字符的方法 IT独行者 PHP 字符串
今天在PHP项目开发中遇到一个需求，去掉字符串中的最后一个字符原字符串1,2,3,4,5,6, 去掉最后一个字符","，最终结果为1,2,3,4,5,6 代码如下： $str = "1,2,3,4,5,6,"; $newstr = substr($str,0,strlen($str)-1); echo $newstr;
hadoop在linux上单机安装过程 _wy_ linux hadoop
1、安装JDK jdk版本最好是1.6以上，可以使用执行命令java -version查看当前JAVA版本号，如果报命令不存在或版本比较低，则需要安装一个高版本的JDK，并在/etc/profile的文件末尾，根据本机JDK实际的安装位置加上以下几行： export JAVA_HOME=/usr/java/jdk1.7.0_25
JAVA进阶----分布式事务的一种简单处理方法无量多系统交互分布式事务
每个方法都是原子操作：提供第三方服务的系统，要同时提供执行方法和对应的回滚方法 A系统调用B,C,D系统完成分布式事务 =========执行开始======== A.aa(); try { B.bb(); } catch(Exception e) { A.rollbackAa(); } try { C.cc(); } catch(Excep
安墨移动广告：移动DSP厚积薄发引领未来广告业发展命脉矮蛋蛋 hadoop 互联网
　　“谁掌握了强大的DSP技术，谁将引领未来的广告行业发展命脉。”2014年，移动广告行业的热点非移动DSP莫属。各个圈子都在纷纷谈论，认为移动DSP是行业突破点，一时间许多移动广告联盟风起云涌，竞相推出专属移动DSP产品。　　到底什么是移动DSP呢? 　　DSP(Demand-SidePlatform)，就是需求方平台，为解决广告主投放的各种需求，真正实现人群定位的精准广
myelipse设置 alafqq IP
在一个项目的完整的生命周期中，其维护费用，往往是其开发费用的数倍。因此项目的可维护性、可复用性是衡量一个项目好坏的关键。而注释则是可维护性中必不可少的一环。注释模板导入步骤安装方法：打开eclipse/myeclipse 选择 window-->Preferences-->JAVA-->Code-->Code
java数组百合不是茶 java数组
java数组的声明创建初始化； java支持C语言数组中的每个数都有唯一的一个下标一维数组的定义声明： int[] a = new int[3];声明数组中有三个数int[3] int[] a 中有三个数，下标从0开始，可以同过for来遍历数组中的数
javascript读取表单数据 bijian1013 JavaScript
利用javascript读取表单数据，可以利用以下三种方法获取： 1、通过表单ID属性：var a = document.getElementByIdx_x_x("id"); 2、通过表单名称属性：var b = document.getElementsByName("name"); 3、直接通过表单名字获取：var c = form.content.
探索JUnit4扩展：使用Theory bijian1013 java JUnit Theory
理论机制（Theory）一.为什么要引用理论机制（Theory）当今软件开发中，测试驱动开发（TDD — Test-driven development）越发流行。为什么 TDD 会如此流行呢？因为它确实拥有很多优点，它允许开发人员通过简单的例子来指定和表明他们代码的行为意图。 TDD 的优点： &nb
[Spring Data Mongo一]Spring Mongo Template操作MongoDB bit1129 template
什么是Spring Data Mongo Spring Data MongoDB项目对访问MongoDB的Java客户端API进行了封装，这种封装类似于Spring封装Hibernate和JDBC而提供的HibernateTemplate和JDBCTemplate，主要能力包括 1. 封装客户端跟MongoDB的链接管理 2. 文档-对象映射，通过注解:@Document(collectio
【Kafka八】Zookeeper上关于Kafka的配置信息 bit1129 zookeeper
问题： 1. Kafka的哪些信息记录在Zookeeper中 2. Consumer Group消费的每个Partition的Offset信息存放在什么位置 3. Topic的每个Partition存放在哪个Broker上的信息存放在哪里 4. Producer跟Zookeeper究竟有没有关系？没有关系！！！ //consumers、config、brokers、cont
java OOM内存异常的四种类型及异常与解决方案 ronin47 java OOM 内存异常
　OOM异常的四种类型：　　　　　一：　StackOverflowError ：通常因为递归函数引起（死递归，递归太深）。-Xss 128k 一般够用。　二：　out Of memory: PermGen Space：通常是动态类大多，比如web 服务器自动更新部署时引起。-Xmx
java-实现链表反转-递归和非递归实现 bylijinnan java
20120422更新：对链表中部分节点进行反转操作，这些节点相隔k个： 0->1->2->3->4->5->6->7->8->9 k=2 8->1->6->3->4->5->2->7->0->9 注意1 3 5 7 9 位置是不变的。解法：将链表拆成两部分： a.0-&
Netty源码学习-DelimiterBasedFrameDecoder bylijinnan java netty
看DelimiterBasedFrameDecoder的API，有举例：接收到的ChannelBuffer如下： +--------------+ | ABC\nDEF\r\n | +--------------+ 经过DelimiterBasedFrameDecoder(Delimiters.lineDelimiter())之后，得到： +-----+----
linux的一些命令 -查看cc攻击-网口ip统计等 hotsunshine linux
Linux判断CC攻击命令详解 2011年12月23日 ⁄ 安全 ⁄ 暂无评论查看所有80端口的连接数 netstat -nat|grep -i '80'|wc -l 对连接的IP按连接数量进行排序 netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n 查看TCP连接状态 n
Spring获取SessionFactory ctrain sessionFactory
String sql = "select sysdate from dual"; WebApplicationContext wac = ContextLoader.getCurrentWebApplicationContext(); String[] names = wac.getBeanDefinitionNames(); for(int i=0; i&
Hive几种导出数据方式 daizj hive 数据导出
Hive几种导出数据方式 1.拷贝文件如果数据文件恰好是用户需要的格式，那么只需要拷贝文件或文件夹就可以。 hadoop fs –cp source_path target_path 2.导出到本地文件系统 --不能使用insert into local directory来导出数据，会报错 --只能使用
编程之美 dcj3sjt126com 编程 PHP 重构
我个人的 PHP 编程经验中，递归调用常常与静态变量使用。静态变量的含义可以参考 PHP 手册。希望下面的代码，会更有利于对递归以及静态变量的理解 header("Content-type: text/plain"); function static_function () { static $i = 0; if ($i++ < 1
Android保存用户名和密码 dcj3sjt126com android
转自：http://www.2cto.com/kf/201401/272336.html 我们不管在开发一个项目或者使用别人的项目，都有用户登录功能，为了让用户的体验效果更好，我们通常会做一个功能，叫做保存用户，这样做的目地就是为了让用户下一次再使用该程序不会重新输入用户名和密码，这里我使用3种方式来存储用户名和密码 1、通过普通的txt文本存储 2、通过properties属性文件进行存
Oracle 复习笔记之同义词 eksliang Oracle 同义词 Oracle synonym
转载请出自出处：http://eksliang.iteye.com/blog/2098861 1.什么是同义词同义词是现有模式对象的一个别名。概念性的东西，什么是模式呢？创建一个用户，就相应的创建了一个模式。模式是指数据库对象，是对用户所创建的数据对象的总称。模式对象包括表、视图、索引、同义词、序列、过
Ajax案例 gongmeitao Ajax jsp
数据库采用Sql Server2005 项目名称为:Ajax_Demo 1.com.demo.conn包 package com.demo.conn; import java.sql.Connection;import java.sql.DriverManager;import java.sql.SQLException; //获取数据库连接的类public class DBConnec
ASP.NET中Request.RawUrl、Request.Url的区别 hvt .net Web C#asp.net hovertree
如果访问的地址是：http://h.keleyi.com/guestbook/addmessage.aspx?key=hovertree%3C&n=myslider#zonemenu那么Request.Url.ToString() 的值是：http://h.keleyi.com/guestbook/addmessage.aspx?key=hovertree<&
SVG 教程（七）SVG 实例，SVG 参考手册天梯梦 svg
SVG 实例在线实例下面的例子是把SVG代码直接嵌入到HTML代码中。谷歌Chrome，火狐，Internet Explorer9，和Safari都支持。注意：下面的例子将不会在Opera运行，即使Opera支持SVG - 它也不支持SVG在HTML代码中直接使用。 SVG 实例 SVG基本形状一个圆矩形不透明矩形一个矩形不透明2 一个带圆角矩
事务管理 luyulong java spring 编程事务
事物管理 spring事物的好处为不同的事物API提供了一致的编程模型支持声明式事务管理提供比大多数事务API更简单更易于使用的编程式事务管理API 整合spring的各种数据访问抽象 TransactionDefinition 定义了事务策略 int getIsolationLevel()得到当前事务的隔离级别 READ_COMMITTED
基础数据结构和算法十一：Red-black binary search tree sunwinner Algorithm Red-black
The insertion algorithm for 2-3 trees just described is not difficult to understand; now, we will see that it is also not difficult to implement. We will consider a simple representation known
centos同步时间 stunizhengjia linux 集群同步时间
做了集群，时间的同步就显得非常必要了。以下是查到的如何做时间同步。在CentOS 5不再区分客户端和服务器，只要配置了NTP，它就会提供NTP服务。 1)确认已经ntp程序包： # yum install ntp 2)配置时间源（默认就行，不需要修改） # vi /etc/ntp.conf server pool.ntp.o
ITeye 9月技术图书有奖试读获奖名单公布 ITeye管理员 ITeye
ITeye携手博文视点举办的9月技术图书有奖试读活动已圆满结束，非常感谢广大用户对本次活动的关注与参与。 9月试读活动回顾：http://webmaster.iteye.com/blog/2118112本次技术图书试读活动的优秀奖获奖名单及相应作品如下（优秀文章有很多，但名额有限，没获奖并不代表不优秀）：《NFC：Arduino、Andro