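This post uses Keras's ResNet50, InceptionV3, and Xception models. First load each model's pretrained weights, then use the pretrained networks to compute feature vectors for the cat/dog training and test images.
Pretrained model download: http://pan.baidu.com/s/1geHmOpH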
from keras.models import *
from keras.layers import *
from keras.applications import *
from keras.preprocessing.image import *
import h5py
def write_gap(MODEL, image_size, lambda_func=None):
    width = image_size[0]
    height = image_size[1]
    input_tensor = Input((height, width, 3))
    x = input_tensor
    if lambda_func:
        x = Lambda(lambda_func)(x)
    base_model = MODEL(input_tensor=x, weights='imagenet', include_top=False)
    # This step downloads the ImageNet weights from the Keras site. If the connection
    # is slow or unavailable, download them from the link above and place them in
    # C:\Users\<username>\.keras\models
    # (those files may not be the latest version, in which case this step can still fail),
    # or download them directly from the GitHub link shown in the error message.
    model = Model(base_model.input, GlobalAveragePooling2D()(base_model.output))
    gen = ImageDataGenerator()
    train_generator = gen.flow_from_directory("train", image_size, shuffle=False,
                                              batch_size=1)
    test_generator = gen.flow_from_directory("test", image_size, shuffle=False,
                                             batch_size=1, class_mode=None)
    # batch_size=1, so the number of steps equals the number of images
    train = model.predict_generator(train_generator, train_generator.samples)
    test = model.predict_generator(test_generator, test_generator.samples)
    # Open in write mode explicitly; recent h5py versions default to read-only
    with h5py.File("gap_%s.h5" % MODEL.__name__, "w") as h:
        h.create_dataset("train", data=train)
        h.create_dataset("test", data=test)
        h.create_dataset("label", data=train_generator.classes)
Next, run the three models and store the features each one extracts from the training and test sets in an h5 file.
write_gap(ResNet50, (224, 224))
write_gap(InceptionV3, (299, 299), inception_v3.preprocess_input)
write_gap(Xception, (299, 299), xception.preprocess_input)
Each image's feature vector is 2048-dimensional, so every feature file contains:
train: (25000, 2048)
label: (25000,)
test: (12500, 2048)
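The 2048 comes from the global average pooling step: with include_top=False each network ends in a convolutional feature map with 2048 channels, and GlobalAveragePooling2D averages every channel over its spatial grid. A minimal NumPy sketch of that reduction, assuming a 7x7 spatial grid and random values in place of real activations (both are illustrative choices, not taken from the post):

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over height and width: (N, H, W, C) -> (N, C)."""
    return feature_maps.mean(axis=(1, 2))

# Random stand-in for the convolutional output of 4 images (illustrative sizes).
rng = np.random.default_rng(0)
feature_maps = rng.random((4, 7, 7, 2048))

features = global_average_pool(feature_maps)
print(features.shape)  # (4, 2048)
```

The spatial size differs between models and input resolutions, but the pooled vector length depends only on the channel count.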
Next, concatenate the features from the three models, so that each image has 2048*3 feature values.
import h5py
import numpy as np
from sklearn.utils import shuffle
from keras.layers import Dense,Input,Dropout
from keras.models import Model
import get_csv
np.random.seed(2017)
X_train = []
X_test = []
for filename in ["gap_ResNet50.h5", "gap_Xception.h5", "gap_InceptionV3.h5"]:
    with h5py.File(filename, 'r') as h:
        X_train.append(np.array(h['train']))
        X_test.append(np.array(h['test']))
        y_train = np.array(h['label'])  # the labels are identical in every file
X_train = np.concatenate(X_train, axis=1)
X_test = np.concatenate(X_test, axis=1)
# Shuffle so that validation_split, which takes the last rows, sees both classes
X_train, y_train = shuffle(X_train, y_train)
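To make the concatenation concrete, here is a small sketch with dummy arrays. The batch of 5 images and the constant fill values are hypothetical. It shows that joining along axis=1 keeps one row per image and places the three 2048-wide blocks side by side. This only lines up because every file stores the images in the same order, which holds here since the generators were created with shuffle=False.

```python
import numpy as np

n_images = 5  # hypothetical small batch, for illustration only

# One (n_images, 2048) block per model, filled with 0, 1, 2 so they can be told apart.
per_model = [np.full((n_images, 2048), i, dtype=np.float32) for i in range(3)]

combined = np.concatenate(per_model, axis=1)
print(combined.shape)  # (5, 6144)

# Columns 0..2047 come from the first model, 2048..4095 from the second, and so on.
print(combined[0, 0], combined[0, 2048], combined[0, 4096])  # 0.0 1.0 2.0
```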
Then we build a single fully connected layer on top of these features.
inputs = Input(X_train.shape[1:])  # shape=(2048*3,)
x = Dropout(0.5)(inputs)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs, x)
model.compile(optimizer='adadelta',
              loss='binary_crossentropy',
              metrics=['accuracy'])
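At prediction time dropout is inactive, so this model reduces to logistic regression over the 6144 features: one weight per feature plus a bias, passed through a sigmoid. A rough NumPy sketch of that forward pass, with randomly generated weights and inputs as stand-ins (the function name predict_proba and all values are illustrative, not taken from the trained model):

```python
import numpy as np

def predict_proba(X, w, b):
    """Inference-time forward pass of Dense(1, activation='sigmoid')."""
    z = X @ w + b                    # linear combination, shape (N, 1)
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes z into (0, 1)

n_features = 2048 * 3
rng = np.random.default_rng(0)
X = rng.random((3, n_features))                   # stand-in feature vectors
w = rng.normal(scale=0.01, size=(n_features, 1))  # hypothetical weights
b = np.zeros(1)                                   # hypothetical bias

probs = predict_proba(X, w, b)
print(probs.shape)      # (3, 1)
print(w.size + b.size)  # 6145 trainable parameters
```

With so few parameters on top of precomputed features, training takes seconds rather than hours.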
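Start training. Accuracy already reaches 99% after the first epoch, and the whole run finishes in under half a minute. Once training is done, use the trained weights to predict the test set directly.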
model.fit(X_train, y_train, batch_size=128, epochs=8, validation_split=0.2, verbose=2)
y_pred = model.predict(X_test, verbose=1)
y_pred = y_pred.clip(min=0.005, max=0.995)
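The clipping step matters because the competition is scored by log loss, which punishes a confident wrong answer far more than it rewards a confident right one. A small worked example, assuming a single image whose true label is 1 (the helper log_loss_single is written here for illustration):

```python
import math

def log_loss_single(y_true, p):
    """Binary cross-entropy for one prediction p of the positive class."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# True label is 1, but the model is almost certain it is 0.
unclipped_wrong = log_loss_single(1, 1e-15)
clipped_wrong = log_loss_single(1, 0.005)

# True label is 1 and the model is right; clipping caps the prediction at 0.995.
clipped_right = log_loss_single(1, 0.995)

print(round(unclipped_wrong, 2))  # about 34.54
print(round(clipped_wrong, 2))    # about 5.3
print(round(clipped_right, 4))    # about 0.005
```

Clipping to [0.005, 0.995] costs about 0.005 per correct prediction but caps the penalty for each mistake at about 5.3 instead of 30 or more.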
Write the test-set predictions to a CSV file.
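import os
import pandas as pd
from keras.preprocessing.image import *
# Raw strings keep the backslashes in Windows paths from being read as escapes
df = pd.read_csv(r"D:\C_V_D\data\Keras_sever\sample_submission.csv")
gen = ImageDataGenerator()
test_generator = gen.flow_from_directory(r"D:\C_V_D\data\Keras_test", (224, 224), shuffle=False,
                                         batch_size=1, class_mode=None)
# Recover each image id from its file name, then write the prediction into that row
for i, fname in enumerate(test_generator.filenames):
    index = int(os.path.splitext(os.path.basename(fname))[0])
    # DataFrame.set_value was deprecated in pandas 0.21 and removed in 1.0; use .at
    df.at[index - 1, 'label'] = y_pred[i, 0]
df.to_csv('pred.csv', index=None)
df.head(10)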
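The id has to be parsed from each file name because the generator does not necessarily list files in numeric order (a plain string sort puts 10.jpg before 2.jpg), so row i of y_pred is not image i+1. A standalone sketch of the parsing step that accepts either path separator; the helper name image_id and the sample paths are made up for illustration:

```python
import re

def image_id(fname):
    """Return the numeric id from a path such as 'test\\12.jpg' or 'test/12.jpg'."""
    base = re.split(r"[\\/]", fname)[-1]  # drop directories, either separator
    return int(base.rsplit(".", 1)[0])    # drop the extension

print(image_id("test\\12.jpg"))  # 12
print(image_id("test/7.jpg"))    # 7

# String-sorted names are not in numeric order:
names = sorted(["test/2.jpg", "test/10.jpg", "test/1.jpg"])
print([image_id(n) for n in names])  # [1, 10, 2]
```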
Submitted to Kaggle, this scores a log loss of only 0.038.
The training and test sets can be downloaded here: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition