A neat CNN solution to pyramid sliding windows: slide the CNN over the image and output class predictions as it goes

The result looks like this:

[Figure 1]

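This is the result of running the CNN over a 305*471 image. It is equivalent to 52*93 = 4836 sliding-window classifications, yet it took only 0.2672951465708593 s in total, or roughly 0.000055 s per window.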
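The 52*93 grid follows from the layer sizes of the network listed in this post. Here is a minimal sketch of that arithmetic, assuming the Keras defaults used there (the 3x3 `'same'` convolutions keep the size, each `MaxPooling2D((2,2))` halves it with floor division, and the final 25x25 `'valid'` convolution removes 24 positions). The helper names `conv_out` and `grid_size` are made up for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' conv/pool layer along one axis."""
    return (size - kernel) // stride + 1

def grid_size(size):
    """Output length of the network in this post along one axis."""
    size = conv_out(size, 2, stride=2)   # first MaxPooling2D((2,2))
    size = conv_out(size, 2, stride=2)   # second MaxPooling2D((2,2))
    return conv_out(size, 25)            # final 25x25 'valid' convolution

rows, cols = grid_size(305), grid_size(471)
print(rows, cols, rows * cols)               # grid height, width, window count
print(0.2672951465708593 / (rows * cols))    # seconds per window
```

A 100x100 training crop gives `grid_size(100) == 1`, which is why the same network outputs a single prediction during training.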

The network and the prediction code are as follows:

import numpy as np
import cv2
import time
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,Dropout,Activation,Reshape
from keras.models import Sequential,Model
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras import optimizers 
sgd = optimizers.SGD(lr=0.001, decay=1e-5, momentum=0.99, nesterov=True)
model = Sequential()
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu',input_shape=(None, None,3)))
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(filters=32,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
# 32 * 50 * 50
#model.add(Conv2D(filters=64,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
# 32 * 25 *25
#model.add(Flatten())
model.add(Conv2D(filters=3,strides=1,kernel_size=(25,25),padding='valid',activation='softmax',name='mutilCLS'))
#model.add(Conv2D(filters=3,kernel_size=(1,1),padding='valid',activation='relu'))
#model.add(Reshape([-1,-1,3]))
model.add(Reshape([3]))
#model.add(Flatten())
model.add(Dropout(0.5))
#model.add(Dense(3))
model.add(Activation('softmax'))
#model.add(Dropout(0.5))
#model.add(Dense(3,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.load_weights('best_cnn_cla_depthwise.h5')

Mutil_layer_model = Model(inputs=model.input,
                                     outputs=model.get_layer('mutilCLS').output)
a = cv2.imread('./train/0/2118.jpg')
#a = cv2.resize(a,(102,102))
b = np.array(a)
b = b.reshape((1,)+b.shape)
sa = time.perf_counter()  # time.clock() was removed in Python 3.8
cp = Mutil_layer_model.predict(b)
cp = cp.reshape((cp.shape[1],cp.shape[2],3))

print(cp)
db = time.perf_counter()-sa
print(db)

for i in range(cp.shape[0]):
    for j in range(cp.shape[1]):
        if cp[i][j][1]>0.99:
#            print(i,j)
            
            cv2.rectangle(a,(j*4,i*4),(j*4+100,i*4+100),(255,0,0),1)

cv2.imshow('aaa',a)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()
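The `j*4` and `+100` in the drawing loop above come from the network geometry: two 2x2 poolings give an overall stride of 4 pixels, and the 25x25 final kernel on the pooled feature map covers a nominal 25*4 = 100 pixel window. Here is a small sketch of that mapping. The function name `cell_to_box` is hypothetical, and the slight widening of the true receptive field caused by the 3x3 `'same'` convolutions is ignored.

```python
def cell_to_box(i, j, stride=4, window=100):
    """Map output-grid cell (row i, col j) to (x1, y1, x2, y2) in input pixels."""
    x1, y1 = j * stride, i * stride
    return x1, y1, x1 + window, y1 + window

print(cell_to_box(0, 0))     # top-left window
print(cell_to_box(51, 92))   # bottom-right cell of the 52*93 grid
```

For the 305*471 image, the last cell maps to a box that still lies inside the image, as expected for a `'valid'` final convolution.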

The training script is as follows:

# -*- coding: utf-8 -*-
"""
Created on Thu Sep 13 12:29:51 2018

@author: Lenovo
"""

from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,Dropout,Activation,Reshape
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras import optimizers
sgd = optimizers.SGD(lr=0.001, decay=1e-5, momentum=0.99, nesterov=True)
model = Sequential()
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu',input_shape=(None, None,3)))
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(filters=32,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
# 32 * 50 * 50
#model.add(Conv2D(filters=64,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
# 32 * 25 *25
#model.add(Flatten())
model.add(Conv2D(filters=3,strides=1,kernel_size=(25,25),padding='valid',activation='softmax',name='mutilCLS'))
#model.add(Conv2D(filters=3,kernel_size=(1,1),padding='valid',activation='relu'))
#model.add(Reshape([-1,-1,3]))
model.add(Reshape([3]))
#model.add(Flatten())
model.add(Dropout(0.5))
#model.add(Dense(3))
model.add(Activation('softmax'))
#model.add(Dropout(0.5))
#model.add(Dense(3,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())

gen = ImageDataGenerator(rescale=1. / 255)
train_gen_data = gen.flow_from_directory('./train',batch_size=300, shuffle=True,target_size=(100, 100),class_mode='categorical')
test_gen_data = gen.flow_from_directory('./test',batch_size=43, shuffle=True,target_size=(100, 100),class_mode='categorical')
save_best = ModelCheckpoint('best_cnn_cla_depthwise.h5', monitor='val_acc', verbose=1,save_best_only=True)
callbacks=[save_best]
model.fit_generator(train_gen_data,
                      steps_per_epoch=8,
                      epochs=45,
                      verbose=1,
                      callbacks=callbacks,
                      validation_data=test_gen_data,
                      validation_steps=1,
                      shuffle=True,
                      initial_epoch=0)
    

It still works well on a different image:

[Figure 2]

With this many windows firing, the next step is non-maximum suppression (NMS) to prune the overlapping windows.
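To see why so many windows overlap, consider two neighbouring 100x100 windows that are one grid step (4 pixels) apart. Here is a minimal sketch of the intersection-over-union (IoU) measure that NMS thresholds on. It treats boxes as continuous `(x1, y1, x2, y2)` rectangles, which is a simplifying assumption; pixel-based implementations often add 1 to widths and heights.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two adjacent windows, shifted by one stride of 4 pixels: IoU is 9600 / 10400.
print(iou((0, 0, 100, 100), (4, 0, 104, 100)))
```

Adjacent windows therefore overlap by more than 90%, so a single object typically triggers a whole cluster of boxes, and NMS keeps only the highest-scoring one per cluster.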


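How it works:

1. Object detection

The goal of object detection is to "recognize objects and give their exact location in the image". It can be broken down into three parts:

  1. Recognizing an object (Classification);
  2. Giving the object's position in the image (Localization);
  3. Recognizing all targets in the image and their positions (Detection).

As shown in the figure below, from left to right: recognition of a single object (P(object)=1, class=car), localization of the object in the image (giving the bounding box–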

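2. Sliding windows + CNN

The sliding-window method is a mainstream approach to object detection. Because the scale and shape of the objects in an input image are not known in advance, applying a pretrained model directly to the whole image works poorly. Sliding a window across the image and running detection on the local patch under each window position copes with the variation in scale, position, and deformation across inputs and improves detection. The figure below shows a window of one size sliding over the image to be detected:

[Figure 3]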

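The figure below sketches the whole process of detecting objects in a 10×10 image with a sliding window (size=8×8, stride=2). The output is a 2×2 grid; each cell holds one output label vector with the detection result (confidence, bounding-box position, per-class probabilities, and so on) for the corresponding window region of the original image.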
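The 2×2 grid follows from (10 − 8) / 2 + 1 = 2 window positions along each axis. Here is a short sketch that enumerates them; the generator name `sliding_windows` is made up for illustration.

```python
def sliding_windows(height, width, size, stride):
    """Yield (row, col, top, left) for every window that fits inside the image."""
    for r, top in enumerate(range(0, height - size + 1, stride)):
        for c, left in enumerate(range(0, width - size + 1, stride)):
            yield r, c, top, left

windows = list(sliding_windows(10, 10, size=8, stride=2))
for r, c, top, left in windows:
    print(f"grid cell ({r}, {c}) <- image[{top}:{top + 8}, {left}:{left + 8}]")
```

Each grid cell `(r, c)` corresponds to the 8×8 patch whose top-left corner is at `(top, left)` in the original image.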

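[Figure 4]

Object detection needs a recognition model (the Classifier in the figure above), and convolutional neural networks (CNNs) are one of the mainstream choices. However, running the CNN on every window separately, as in the figure above, repeats a great deal of computation (the convolutions, for example). To make detection more efficient, the CNN can be designed so that a single forward pass yields the sliding-window results for the whole image. The figure below shows how such a model is designed:

[Figure 5]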

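The three steps in the figure above describe the design of a CNN that slides a 14×14 window convolutionally. Running the CNN designed in (2) on the input image in (3) produces the final grid of results in one pass. Each cell of the grid corresponds to the region covered by one sliding-window position in the original image (for example, the top-left vector of the 2×2 output grid is the CNN's result for the first window, and the shading in the figure marks how that window's information flows through the CNN).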
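The key step is replacing the fully connected classifier with a convolution whose kernel is as large as the window. Here is a toy NumPy sketch of that equivalence. It is not the network in the figure: it assumes a single-channel 16×16 input, a bare linear classifier with random weights, and stride 1 (so a 3×3 grid), and it leaves out the shared convolution and pooling layers where the real savings come from.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))         # toy single-channel input
weights = rng.standard_normal((14 * 14, 3))   # dense layer: 196 inputs -> 3 classes

# (a) Crop every 14x14 window and run the dense layer on each crop separately.
per_window = np.empty((3, 3, 3))
for i in range(3):
    for j in range(3):
        crop = image[i:i + 14, j:j + 14].reshape(-1)
        per_window[i, j] = crop @ weights

# (b) Reshape the same weights into a 14x14 kernel with 3 output channels and
#     apply it as one 'valid' convolution over the whole image.
kernel = weights.reshape(14, 14, 3)
patches = sliding_window_view(image, (14, 14))   # shape (3, 3, 14, 14)
as_conv = np.einsum("ijhw,hwc->ijc", patches, kernel)

print(np.allclose(per_window, as_conv))
```

Both routes give the same 3×3×3 score grid, which is why a network trained on fixed-size crops can be run unchanged on a larger image once its dense layer is expressed as a convolution.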


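Update (2018-09-16)

I modified the network a little and added NMS.

The results are quite good now:

[Figure 6]

It is just a bit slow, probably because the network is still too large. I also realized at this point that building the dataset is a skill in itself: the dataset determines how far the network can be trained. Next I will keep working on the dataset.

Below is the detection code together with the network code: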

import numpy as np
import cv2
import time
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,Dropout,Activation,Reshape
from keras.models import Sequential,Model
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras import optimizers 
sgd = optimizers.SGD(lr=0.001, decay=1e-5, momentum=0.99, nesterov=True)
model = Sequential()
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu',input_shape=(None, None,3)))
model.add(Conv2D(filters=16,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(filters=32,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
# 32 * 100 * 150 (assuming a 200x300 input, the window size implied by the final kernel)
model.add(Conv2D(filters=32,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((2,2)))
model.add(Conv2D(filters=64,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
model.add(Conv2D(filters=128,strides=1,kernel_size=(3,3),padding='same',activation='relu'))
# 128 * 50 * 75 (assuming a 200x300 input, the window size implied by the final kernel)
#model.add(Conv2D(filters=64,kernel_size=(3,3),padding='same',activation='relu'))
model.add(MaxPooling2D((5,5)))
# 128 * 10 * 15
#model.add(Flatten())
model.add(Conv2D(filters=3,strides=1,kernel_size=(10,15),padding='valid',activation='softmax',name='mutilCLS'))
#model.add(Conv2D(filters=3,kernel_size=(1,1),padding='valid',activation='relu'))
#model.add(Reshape([-1,-1,3]))
model.add(Reshape([3]))
#model.add(Flatten())
model.add(Dropout(0.6))
#model.add(Dense(3))
model.add(Activation('softmax'))
#model.add(Dropout(0.5))
#model.add(Dense(3,activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
print(model.summary())
model.load_weights('best_cnn_cla_depthwise.h5')

Mutil_layer_model = Model(inputs=model.input,
                                     outputs=model.get_layer('mutilCLS').output)
a = cv2.imread('testbi.png')
#a = cv2.resize(a,(102,102))
b = np.array(a)
b = b.reshape((1,)+b.shape)
sa = time.perf_counter()  # time.clock() was removed in Python 3.8
cp = Mutil_layer_model.predict(b)
cp = cp.reshape((cp.shape[1],cp.shape[2],3))
def py_nms(dets, thresh, mode="Union"):
    # dets: array with rows [x1, y1, x2, y2, score]; returns the rows kept after NMS
    if len(dets) == 0:
        return []
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        if mode == "Union":
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
        elif mode == "Minimum":
            ovr = inter / np.minimum(areas[i], areas[order[1:]])

        # keep only the boxes whose overlap with box i is at most thresh
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return dets[keep]

print(cp)
db = time.perf_counter()-sa
print(db)
dets=[]
for i in range(cp.shape[0]):
    for j in range(cp.shape[1]):
        if cp[i][j][1]>0.99:
            # stride 20 = 2*2*5 pooling; 300x200 window = (15,10) kernel * 20
            # the stored score is the class-1 probability that was thresholded
            dets.append([j*20,i*20,j*20+300,i*20+200,cp[i][j][1]])
# keep the scores as floats so that NMS can rank the boxes by score
dets = np.array(dets, dtype=np.float32)
arec = py_nms(dets,0.1)

for i in arec:
    # cv2.rectangle needs integer pixel coordinates
    cv2.rectangle(a,(int(i[0]),int(i[1])),(int(i[2]),int(i[3])),(255,0,0),1)
cv2.imshow('aaa',a)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()

 
