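The Kaggle Human Protein Atlas Image Classification competition has wrapped up, so I finally have some free time to write about the pitfalls I ran into along the way.
The Keras version is 2.2.4.
Has anyone else run into this? When doing transfer learning in Keras with models that contain BN (batch normalization) layers, such as InceptionV3 or ResNet50, the training and validation results differ wildly. For example: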
Epoch 1/20
1500/1500 [==============================] - 24s 16ms/step - loss: 2.1168 - binary_accuracy: 0.9169 - f1_keras: 0.0617 - val_loss: 2.2727 - val_binary_accuracy: 0.9258 - val_f1_keras: 0.0377
Epoch 2/20
1500/1500 [==============================] - 19s 13ms/step - loss: 1.1976 - binary_accuracy: 0.9480 - f1_keras: 0.1084 - val_loss: 2.4163 - val_binary_accuracy: 0.9218 - val_f1_keras: 0.0356
Epoch 3/20
1500/1500 [==============================] - 19s 13ms/step - loss: 0.9935 - binary_accuracy: 0.9540 - f1_keras: 0.1608 - val_loss: 2.7485 - val_binary_accuracy: 0.9114 - val_f1_keras: 0.0359
Epoch 4/20
1500/1500 [==============================] - 19s 13ms/step - loss: 0.8294 - binary_accuracy: 0.9572 - f1_keras: 0.1902 - val_loss: 2.9039 - val_binary_accuracy: 0.9166 - val_f1_keras: 0.0402
Epoch 5/20
1500/1500 [==============================] - 19s 13ms/step - loss: 0.7250 - binary_accuracy: 0.9606 - f1_keras: 0.2482 - val_loss: 3.1574 - val_binary_accuracy: 0.9057 - val_f1_keras: 0.0485
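As the log shows, the training loss keeps decreasing while the validation loss keeps increasing, and the validation accuracy and F1 score are far from the training results. You might suspect overfitting, and so did I. So I ran another experiment with the validation set replaced by the training set; that is, training and validation used exactly the same samples, so the two sets of metrics should have come out the same. Instead I got almost the same results as above.
I then repeated the experiment with VGG-19 in place of InceptionV3. VGG-19 and other models without BN layers did not show the problem. So I suspected the BN layers were to blame, and after some digging I found that the problem lay in the model-building code. Here is the flawed model-building code first (this is only my own opinion; if I have got something wrong, I hope the experts will point it out). The code below is the example given in the official Keras documentation, and the results above came from code with this same structure (the structure is the same, the details differ slightly).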
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
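To make the suspicion about BN layers concrete, here is a minimal pure-Python sketch of how one BN layer can produce different outputs for the very same samples depending on the phase. It assumes the commonly reported behaviour that, in this Keras version, a BN layer with `trainable = False` still normalizes with mini-batch statistics during training but with its stored moving statistics during validation; that is my understanding rather than something shown above. The helper names (`batch_norm`, `batch_stats`) and all numbers are hypothetical and are not Keras APIs.

```python
import math


def batch_stats(xs):
    """Mean and (biased) variance of one mini-batch of scalar activations."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def batch_norm(xs, mean, var, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize activations with the given statistics, then scale and shift."""
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


# Hypothetical moving statistics stored in a pre-trained BN layer.
moving_mean, moving_var = 0.0, 1.0

# Hypothetical activations from new-domain images; their distribution is
# shifted relative to the data the stored statistics were computed on.
batch = [4.0, 5.0, 6.0, 7.0]

# Training phase (assumed): normalize with the statistics of this batch.
train_out = batch_norm(batch, *batch_stats(batch))

# Validation phase: normalize with the stored moving statistics.
infer_out = batch_norm(batch, moving_mean, moving_var)

print("training-phase output:  ", [round(v, 3) for v in train_out])
print("validation-phase output:", [round(v, 3) for v in infer_out])
```

The training-phase output is centred on zero, while the validation-phase output stays far from zero, so the layers stacked on top see two different input distributions. If the assumption holds, this would be consistent with the gap persisting even when the validation set is the training set.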
Run model.summary() to take a look at the model structure:
activation_20 (Activation) (None, None, None, 6 0 batch_normalization_20[0][0]
__________________________________________________________________________________________________
activation_22 (Activation) (None, None, None, 6 0 batch_normalization_22[0][0]
__________________________________________________________________________________________________
activation_25 (Activation) (None, None, None, 9 0 batch_normalization_25[0][0]
__________________________________________________________________________________________________
activation_26 (Activation) (None, None, None, 6 0 batch_normalization_26[0][0]
__________________________________________________________________________________________________
mixed2 (Concatenate) (None, None, None, 2 0 activation_20[0][0]
activation_22[0][0]
activation_25[0][0]
activation_26[0][0]
__________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, None, None, 6 18432 mixed2[0][0]
__________________________________________________________________________________________________
batch_normalization_28 (BatchNo (None, None, None, 6 192 conv2d_28[0][0]
__________________________________________________________________________________________________
activation_28 (Activation) (None, None, None, 6 0 batch_normalization_28[0][0]
__________________________________________________________________________________________________
conv2d_29 (Conv2D) (None, None, None, 9 55296 activation_28[0][0]
__________________________________________________________________________________________________
batch_normalization_29 (BatchNo (None, None, None, 9 288 conv2d_29[0][0]
__________________________________________________________________________________________________
activation_29 (Activation) (None, None, None, 9 0 batch_normalization_29[0][0]
__________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, None, None, 3 995328 mixed2[0][0]
__________________________________________________________________________________________________
conv2d_30 (Conv2D) (None, None, None, 9 82944 activation_29[0][0]
__________________________________________________________________________________________________
batch_normalization_27 (BatchNo (None, None, None, 3 1152 conv2d_27[0][0]
__________________________________________________________________________________________________
batch_normalization_30 (BatchNo (None, None, None, 9 288 conv2d_30[0][0]
__________________________________________________________________________________________________
activation_27 (Activation) (None, None, None, 3 0 batch_normalization_27[0][0]
__________________________________________________________________________________________________
activation_30 (Activation) (None, None, None, 9 0 batch_normalization_30[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, None, None, 2 0 mixed2[0][0]
__________________________________________________________________________________________________
mixed3 (Concatenate) (None, None, None, 7 0 activation_27[0][0]
activation_30[0][0]
max_pooling2d_3[0][0]
__________________________________________________________________________________________________
conv2d_35 (Conv2D) (None, None, None, 1 98304 mixed3[0][0]
__________________________________________________________________________________________________
batch_normalization_35 (BatchNo (None, None, None, 1 384 conv2d_35[0][0]
__________________________________________________________________________________________________
activation_35 (Activation) (None, None, None, 1 0 batch_normalization_35[0][0]
__________________________________________________________________________________________________
conv2d_36 (Conv2D) (None, None, None, 1 114688 activation_35[0][0]
__________________________________________________________________________________________________
batch_normalization_36 (BatchNo (None, None, None, 1 384 conv2d_36[0][0]
__________________________________________________________________________________________________
activation_36 (Activation) (None, None, None, 1 0 batch_normalization_36[0][0]
__________________________________________________________________________________________________
conv2d_32 (Conv2D) (None, None, None, 1 98304 mixed3[0][0]
__________________________________________________________________________________________________
conv2d_37 (Conv2D) (None, None, None, 1 114688 activation_36[0][0]
__________________________________________________________________________________________________
batch_normalization_32 (BatchNo (None, None, None, 1 384 conv2d_32[0][0]
__________________________________________________________________________________________________
batch_normalization_37 (BatchNo (None, None, None, 1 384 conv2d_37[0][0]
__________________________________________________________________________________________________
activation_32 (Activation) (None, None, None, 1 0 batch_normalization_32[0][0]
__________________________________________________________________________________________________
activation_37 (Activation) (None, None, None, 1 0 batch_normalization_37[0][0]
__________________________________________________________________________________________________
conv2d_33 (Conv2D) (None, None, None, 1 114688 activation_32[0][0]
__________________________________________________________________________________________________
conv2d_38 (Conv2D) (None, None, None, 1 114688 activation_37[0][0]
__________________________________________________________________________________________________
batch_normalization_33 (BatchNo (None, None, None, 1 384 conv2d_33[0][0]
__________________________________________________________________________________________________
batch_normalization_38 (BatchNo (None, None, None, 1 384 conv2d_38[0][0]
__________________________________________________________________________________________________
activation_33 (Activation) (None, None, None, 1 0 batch_normalization_33[0][0]
__________________________________________________________________________________________________
activation_38 (Activation) (None, None, None, 1 0 batch_normalization_38[0][0]
__________________________________________________________________________________________________
average_pooling2d_4 (AveragePoo (None, None, None, 7 0 mixed3[0][0]
__________________________________________________________________________________________________
conv2d_31 (Conv2D) (None, None, None, 1 147456 mixed3[0][0]
__________________________________________________________________________________________________
conv2d_34 (Conv2D) (None, None, None, 1 172032 activation_33[0][0]
__________________________________________________________________________________________________
conv2d_39 (Conv2D) (None, None, None, 1 172032 activation_38[0][0]
__________________________________________________________________________________________________
conv2d_40 (Conv2D) (None, None, None, 1 147456 average_pooling2d_4[0][0]
__________________________________________________________________________________________________
batch_normalization_31 (BatchNo (None, None, None, 1 576 conv2d_31[0][0]
__________________________________________________________________________________________________
batch_normalization_34 (BatchNo (None, None, None, 1 576 conv2d_34[0][0]
__________________________________________________________________________________________________
batch_normalization_39 (BatchNo (None, None, None, 1 576 conv2d_39[0][0]
__________________________________________________________________________________________________
batch_normalization_40 (BatchNo (None, None, None, 1 576 conv2d_40[0][0]
__________________________________________________________________________________________________
activation_31 (Activation) (None, None, None, 1 0 batch_normalization_31[0][0]
__________________________________________________________________________________________________
activation_34 (Activation) (None, None, None, 1 0 batch_normalization_34[0][0]
__________________________________________________________________________________________________
activation_39 (Activation) (None, None, None, 1 0 batch_normalization_39[0][0]
__________________________________________________________________________________________________
activation_40 (Activation) (None, None, None, 1 0 batch_normalization_40[0][0]
__________________________________________________________________________________________________
mixed4 (Concatenate) (None, None, None, 7 0 activation_31[0][0]
activation_34[0][0]
activation_39[0][0]
activation_40[0][0]
__________________________________________________________________________________________________
conv2d_45 (Conv2D) (None, None, None, 1 122880 mixed4[0][0]
__________________________________________________________________________________________________
batch_normalization_45 (BatchNo (None, None, None, 1 480 conv2d_45[0][0]
__________________________________________________________________________________________________
activation_45 (Activation) (None, None, None, 1 0 batch_normalization_45[0][0]
__________________________________________________________________________________________________
conv2d_46 (Conv2D) (None, None, None, 1 179200 activation_45[0][0]
__________________________________________________________________________________________________
batch_normalization_46 (BatchNo (None, None, None, 1 480 conv2d_46[0][0]
__________________________________________________________________________________________________
activation_46 (Activation) (None, None, None, 1 0 batch_normalization_46[0][0]
__________________________________________________________________________________________________
conv2d_42 (Conv2D) (None, None, None, 1 122880 mixed4[0][0]
__________________________________________________________________________________________________
conv2d_47 (Conv2D) (None, None, None, 1 179200 activation_46[0][0]
__________________________________________________________________________________________________
batch_normalization_42 (BatchNo (None, None, None, 1 480 conv2d_42[0][0]
__________________________________________________________________________________________________
batch_normalization_47 (BatchNo (None, None, None, 1 480 conv2d_47[0][0]
__________________________________________________________________________________________________
activation_42 (Activation) (None, None, None, 1 0 batch_normalization_42[0][0]
__________________________________________________________________________________________________
activation_47 (Activation) (None, None, None, 1 0 batch_normalization_47[0][0]
__________________________________________________________________________________________________
conv2d_43 (Conv2D) (None, None, None, 1 179200 activation_42[0][0]
__________________________________________________________________________________________________
conv2d_48 (Conv2D) (None, None, None, 1 179200 activation_47[0][0]
__________________________________________________________________________________________________
batch_normalization_43 (BatchNo (None, None, None, 1 480 conv2d_43[0][0]
__________________________________________________________________________________________________
batch_normalization_48 (BatchNo (None, None, None, 1 480 conv2d_48[0][0]
__________________________________________________________________________________________________
activation_43 (Activation) (None, None, None, 1 0 batch_normalization_43[0][0]
__________________________________________________________________________________________________
activation_48 (Activation) (None, None, None, 1 0 batch_normalization_48[0][0]
__________________________________________________________________________________________________
average_pooling2d_5 (AveragePoo (None, None, None, 7 0 mixed4[0][0]
__________________________________________________________________________________________________
conv2d_41 (Conv2D) (None, None, None, 1 147456 mixed4[0][0]
__________________________________________________________________________________________________
conv2d_44 (Conv2D) (None, None, None, 1 215040 activation_43[0][0]
__________________________________________________________________________________________________
conv2d_49 (Conv2D) (None, None, None, 1 215040 activation_48[0][0]
__________________________________________________________________________________________________
conv2d_50 (Conv2D) (None, None, None, 1 147456 average_pooling2d_5[0][0]
__________________________________________________________________________________________________
batch_normalization_41 (BatchNo (None, None, None, 1 576 conv2d_41[0][0]
__________________________________________________________________________________________________
batch_normalization_44 (BatchNo (None, None, None, 1 576 conv2d_44[0][0]
__________________________________________________________________________________________________
batch_normalization_49 (BatchNo (None, None, None, 1 576 conv2d_49[0][0]
__________________________________________________________________________________________________
batch_normalization_50 (BatchNo (None, None, None, 1 576 conv2d_50[0][0]
__________________________________________________________________________________________________
activation_41 (Activation) (None, None, None, 1 0 batch_normalization_41[0][0]
__________________________________________________________________________________________________
activation_44 (Activation) (None, None, None, 1 0 batch_normalization_44[0][0]
__________________________________________________________________________________________________
activation_49 (Activation) (None, None, None, 1 0 batch_normalization_49[0][0]
__________________________________________________________________________________________________
activation_50 (Activation) (None, None, None, 1 0 batch_normalization_50[0][0]
__________________________________________________________________________________________________
mixed5 (Concatenate) (None, None, None, 7 0 activation_41[0][0]
activation_44[0][0]
activation_49[0][0]
activation_50[0][0]
__________________________________________________________________________________________________
conv2d_55 (Conv2D) (None, None, None, 1 122880 mixed5[0][0]
__________________________________________________________________________________________________
batch_normalization_55 (BatchNo (None, None, None, 1 480 conv2d_55[0][0]
__________________________________________________________________________________________________
activation_55 (Activation) (None, None, None, 1 0 batch_normalization_55[0][0]
__________________________________________________________________________________________________
conv2d_56 (Conv2D) (None, None, None, 1 179200 activation_55[0][0]
__________________________________________________________________________________________________
batch_normalization_56 (BatchNo (None, None, None, 1 480 conv2d_56[0][0]
__________________________________________________________________________________________________
activation_56 (Activation) (None, None, None, 1 0 batch_normalization_56[0][0]
__________________________________________________________________________________________________
conv2d_52 (Conv2D) (None, None, None, 1 122880 mixed5[0][0]
__________________________________________________________________________________________________
conv2d_57 (Conv2D) (None, None, None, 1 179200 activation_56[0][0]
__________________________________________________________________________________________________
batch_normalization_52 (BatchNo (None, None, None, 1 480 conv2d_52[0][0]
__________________________________________________________________________________________________
batch_normalization_57 (BatchNo (None, None, None, 1 480 conv2d_57[0][0]
__________________________________________________________________________________________________
activation_52 (Activation) (None, None, None, 1 0 batch_normalization_52[0][0]
__________________________________________________________________________________________________
activation_57 (Activation) (None, None, None, 1 0 batch_normalization_57[0][0]
__________________________________________________________________________________________________
conv2d_53 (Conv2D) (None, None, None, 1 179200 activation_52[0][0]
__________________________________________________________________________________________________
conv2d_58 (Conv2D) (None, None, None, 1 179200 activation_57[0][0]
__________________________________________________________________________________________________
batch_normalization_53 (BatchNo (None, None, None, 1 480 conv2d_53[0][0]
__________________________________________________________________________________________________
batch_normalization_58 (BatchNo (None, None, None, 1 480 conv2d_58[0][0]
__________________________________________________________________________________________________
activation_53 (Activation) (None, None, None, 1 0 batch_normalization_53[0][0]
__________________________________________________________________________________________________
activation_58 (Activation) (None, None, None, 1 0 batch_normalization_58[0][0]
__________________________________________________________________________________________________
average_pooling2d_6 (AveragePoo (None, None, None, 7 0 mixed5[0][0]
__________________________________________________________________________________________________
conv2d_51 (Conv2D) (None, None, None, 1 147456 mixed5[0][0]
__________________________________________________________________________________________________
conv2d_54 (Conv2D) (None, None, None, 1 215040 activation_53[0][0]
__________________________________________________________________________________________________
conv2d_59 (Conv2D) (None, None, None, 1 215040 activation_58[0][0]
__________________________________________________________________________________________________
conv2d_60 (Conv2D) (None, None, None, 1 147456 average_pooling2d_6[0][0]
__________________________________________________________________________________________________
batch_normalization_51 (BatchNo (None, None, None, 1 576 conv2d_51[0][0]
__________________________________________________________________________________________________
batch_normalization_54 (BatchNo (None, None, None, 1 576 conv2d_54[0][0]
__________________________________________________________________________________________________
batch_normalization_59 (BatchNo (None, None, None, 1 576 conv2d_59[0][0]
__________________________________________________________________________________________________
batch_normalization_60 (BatchNo (None, None, None, 1 576 conv2d_60[0][0]
__________________________________________________________________________________________________
activation_51 (Activation) (None, None, None, 1 0 batch_normalization_51[0][0]
__________________________________________________________________________________________________
activation_54 (Activation) (None, None, None, 1 0 batch_normalization_54[0][0]
__________________________________________________________________________________________________
activation_59 (Activation) (None, None, None, 1 0 batch_normalization_59[0][0]
__________________________________________________________________________________________________
activation_60 (Activation) (None, None, None, 1 0 batch_normalization_60[0][0]
__________________________________________________________________________________________________
mixed6 (Concatenate) (None, None, None, 7 0 activation_51[0][0]
activation_54[0][0]
activation_59[0][0]
activation_60[0][0]
__________________________________________________________________________________________________
conv2d_65 (Conv2D) (None, None, None, 1 147456 mixed6[0][0]
__________________________________________________________________________________________________
batch_normalization_65 (BatchNo (None, None, None, 1 576 conv2d_65[0][0]
__________________________________________________________________________________________________
activation_65 (Activation) (None, None, None, 1 0 batch_normalization_65[0][0]
__________________________________________________________________________________________________
conv2d_66 (Conv2D) (None, None, None, 1 258048 activation_65[0][0]
__________________________________________________________________________________________________
batch_normalization_66 (BatchNo (None, None, None, 1 576 conv2d_66[0][0]
__________________________________________________________________________________________________
activation_66 (Activation) (None, None, None, 1 0 batch_normalization_66[0][0]
__________________________________________________________________________________________________
conv2d_62 (Conv2D) (None, None, None, 1 147456 mixed6[0][0]
__________________________________________________________________________________________________
conv2d_67 (Conv2D) (None, None, None, 1 258048 activation_66[0][0]
__________________________________________________________________________________________________
batch_normalization_62 (BatchNo (None, None, None, 1 576 conv2d_62[0][0]
__________________________________________________________________________________________________
batch_normalization_67 (BatchNo (None, None, None, 1 576 conv2d_67[0][0]
__________________________________________________________________________________________________
activation_62 (Activation) (None, None, None, 1 0 batch_normalization_62[0][0]
__________________________________________________________________________________________________
activation_67 (Activation) (None, None, None, 1 0 batch_normalization_67[0][0]
__________________________________________________________________________________________________
conv2d_63 (Conv2D) (None, None, None, 1 258048 activation_62[0][0]
__________________________________________________________________________________________________
conv2d_68 (Conv2D) (None, None, None, 1 258048 activation_67[0][0]
__________________________________________________________________________________________________
batch_normalization_63 (BatchNo (None, None, None, 1 576 conv2d_63[0][0]
__________________________________________________________________________________________________
batch_normalization_68 (BatchNo (None, None, None, 1 576 conv2d_68[0][0]
__________________________________________________________________________________________________
activation_63 (Activation) (None, None, None, 1 0 batch_normalization_63[0][0]
__________________________________________________________________________________________________
activation_68 (Activation) (None, None, None, 1 0 batch_normalization_68[0][0]
__________________________________________________________________________________________________
average_pooling2d_7 (AveragePoo (None, None, None, 7 0 mixed6[0][0]
__________________________________________________________________________________________________
conv2d_61 (Conv2D) (None, None, None, 1 147456 mixed6[0][0]
__________________________________________________________________________________________________
conv2d_64 (Conv2D) (None, None, None, 1 258048 activation_63[0][0]
__________________________________________________________________________________________________
conv2d_69 (Conv2D) (None, None, None, 1 258048 activation_68[0][0]
__________________________________________________________________________________________________
conv2d_70 (Conv2D) (None, None, None, 1 147456 average_pooling2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_61 (BatchNo (None, None, None, 1 576 conv2d_61[0][0]
__________________________________________________________________________________________________
batch_normalization_64 (BatchNo (None, None, None, 1 576 conv2d_64[0][0]
__________________________________________________________________________________________________
batch_normalization_69 (BatchNo (None, None, None, 1 576 conv2d_69[0][0]
__________________________________________________________________________________________________
batch_normalization_70 (BatchNo (None, None, None, 1 576 conv2d_70[0][0]
__________________________________________________________________________________________________
activation_61 (Activation) (None, None, None, 1 0 batch_normalization_61[0][0]
__________________________________________________________________________________________________
activation_64 (Activation) (None, None, None, 1 0 batch_normalization_64[0][0]
__________________________________________________________________________________________________
activation_69 (Activation) (None, None, None, 1 0 batch_normalization_69[0][0]
__________________________________________________________________________________________________
activation_70 (Activation) (None, None, None, 1 0 batch_normalization_70[0][0]
__________________________________________________________________________________________________
mixed7 (Concatenate) (None, None, None, 7 0 activation_61[0][0]
activation_64[0][0]
activation_69[0][0]
activation_70[0][0]
__________________________________________________________________________________________________
conv2d_73 (Conv2D) (None, None, None, 1 147456 mixed7[0][0]
__________________________________________________________________________________________________
batch_normalization_73 (BatchNo (None, None, None, 1 576 conv2d_73[0][0]
__________________________________________________________________________________________________
activation_73 (Activation) (None, None, None, 1 0 batch_normalization_73[0][0]
__________________________________________________________________________________________________
conv2d_74 (Conv2D) (None, None, None, 1 258048 activation_73[0][0]
__________________________________________________________________________________________________
batch_normalization_74 (BatchNo (None, None, None, 1 576 conv2d_74[0][0]
__________________________________________________________________________________________________
activation_74 (Activation) (None, None, None, 1 0 batch_normalization_74[0][0]
__________________________________________________________________________________________________
conv2d_71 (Conv2D) (None, None, None, 1 147456 mixed7[0][0]
__________________________________________________________________________________________________
conv2d_75 (Conv2D) (None, None, None, 1 258048 activation_74[0][0]
__________________________________________________________________________________________________
batch_normalization_71 (BatchNo (None, None, None, 1 576 conv2d_71[0][0]
__________________________________________________________________________________________________
batch_normalization_75 (BatchNo (None, None, None, 1 576 conv2d_75[0][0]
__________________________________________________________________________________________________
activation_71 (Activation) (None, None, None, 1 0 batch_normalization_71[0][0]
__________________________________________________________________________________________________
activation_75 (Activation) (None, None, None, 1 0 batch_normalization_75[0][0]
__________________________________________________________________________________________________
conv2d_72 (Conv2D) (None, None, None, 3 552960 activation_71[0][0]
__________________________________________________________________________________________________
conv2d_76 (Conv2D) (None, None, None, 1 331776 activation_75[0][0]
__________________________________________________________________________________________________
batch_normalization_72 (BatchNo (None, None, None, 3 960 conv2d_72[0][0]
__________________________________________________________________________________________________
batch_normalization_76 (BatchNo (None, None, None, 1 576 conv2d_76[0][0]
__________________________________________________________________________________________________
activation_72 (Activation) (None, None, None, 3 0 batch_normalization_72[0][0]
__________________________________________________________________________________________________
activation_76 (Activation) (None, None, None, 1 0 batch_normalization_76[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, None, None, 7 0 mixed7[0][0]
__________________________________________________________________________________________________
mixed8 (Concatenate) (None, None, None, 1 0 activation_72[0][0]
activation_76[0][0]
max_pooling2d_4[0][0]
__________________________________________________________________________________________________
conv2d_81 (Conv2D) (None, None, None, 4 573440 mixed8[0][0]
__________________________________________________________________________________________________
batch_normalization_81 (BatchNo (None, None, None, 4 1344 conv2d_81[0][0]
__________________________________________________________________________________________________
activation_81 (Activation) (None, None, None, 4 0 batch_normalization_81[0][0]
__________________________________________________________________________________________________
conv2d_78 (Conv2D) (None, None, None, 3 491520 mixed8[0][0]
__________________________________________________________________________________________________
conv2d_82 (Conv2D) (None, None, None, 3 1548288 activation_81[0][0]
__________________________________________________________________________________________________
batch_normalization_78 (BatchNo (None, None, None, 3 1152 conv2d_78[0][0]
__________________________________________________________________________________________________
batch_normalization_82 (BatchNo (None, None, None, 3 1152 conv2d_82[0][0]
__________________________________________________________________________________________________
activation_78 (Activation) (None, None, None, 3 0 batch_normalization_78[0][0]
__________________________________________________________________________________________________
activation_82 (Activation) (None, None, None, 3 0 batch_normalization_82[0][0]
__________________________________________________________________________________________________
conv2d_79 (Conv2D) (None, None, None, 3 442368 activation_78[0][0]
__________________________________________________________________________________________________
conv2d_80 (Conv2D) (None, None, None, 3 442368 activation_78[0][0]
__________________________________________________________________________________________________
conv2d_83 (Conv2D) (None, None, None, 3 442368 activation_82[0][0]
__________________________________________________________________________________________________
conv2d_84 (Conv2D) (None, None, None, 3 442368 activation_82[0][0]
__________________________________________________________________________________________________
average_pooling2d_8 (AveragePoo (None, None, None, 1 0 mixed8[0][0]
__________________________________________________________________________________________________
conv2d_77 (Conv2D) (None, None, None, 3 409600 mixed8[0][0]
__________________________________________________________________________________________________
batch_normalization_79 (BatchNo (None, None, None, 3 1152 conv2d_79[0][0]
__________________________________________________________________________________________________
batch_normalization_80 (BatchNo (None, None, None, 3 1152 conv2d_80[0][0]
__________________________________________________________________________________________________
batch_normalization_83 (BatchNo (None, None, None, 3 1152 conv2d_83[0][0]
__________________________________________________________________________________________________
batch_normalization_84 (BatchNo (None, None, None, 3 1152 conv2d_84[0][0]
__________________________________________________________________________________________________
conv2d_85 (Conv2D) (None, None, None, 1 245760 average_pooling2d_8[0][0]
__________________________________________________________________________________________________
batch_normalization_77 (BatchNo (None, None, None, 3 960 conv2d_77[0][0]
__________________________________________________________________________________________________
activation_79 (Activation) (None, None, None, 3 0 batch_normalization_79[0][0]
__________________________________________________________________________________________________
activation_80 (Activation) (None, None, None, 3 0 batch_normalization_80[0][0]
__________________________________________________________________________________________________
activation_83 (Activation) (None, None, None, 3 0 batch_normalization_83[0][0]
__________________________________________________________________________________________________
activation_84 (Activation) (None, None, None, 3 0 batch_normalization_84[0][0]
__________________________________________________________________________________________________
batch_normalization_85 (BatchNo (None, None, None, 1 576 conv2d_85[0][0]
__________________________________________________________________________________________________
activation_77 (Activation) (None, None, None, 3 0 batch_normalization_77[0][0]
__________________________________________________________________________________________________
mixed9_0 (Concatenate) (None, None, None, 7 0 activation_79[0][0]
activation_80[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, None, None, 7 0 activation_83[0][0]
activation_84[0][0]
__________________________________________________________________________________________________
activation_85 (Activation) (None, None, None, 1 0 batch_normalization_85[0][0]
__________________________________________________________________________________________________
mixed9 (Concatenate) (None, None, None, 2 0 activation_77[0][0]
mixed9_0[0][0]
concatenate_1[0][0]
activation_85[0][0]
__________________________________________________________________________________________________
conv2d_90 (Conv2D) (None, None, None, 4 917504 mixed9[0][0]
__________________________________________________________________________________________________
batch_normalization_90 (BatchNo (None, None, None, 4 1344 conv2d_90[0][0]
__________________________________________________________________________________________________
activation_90 (Activation) (None, None, None, 4 0 batch_normalization_90[0][0]
__________________________________________________________________________________________________
conv2d_87 (Conv2D) (None, None, None, 3 786432 mixed9[0][0]
__________________________________________________________________________________________________
conv2d_91 (Conv2D) (None, None, None, 3 1548288 activation_90[0][0]
__________________________________________________________________________________________________
batch_normalization_87 (BatchNo (None, None, None, 3 1152 conv2d_87[0][0]
__________________________________________________________________________________________________
batch_normalization_91 (BatchNo (None, None, None, 3 1152 conv2d_91[0][0]
__________________________________________________________________________________________________
activation_87 (Activation) (None, None, None, 3 0 batch_normalization_87[0][0]
__________________________________________________________________________________________________
activation_91 (Activation) (None, None, None, 3 0 batch_normalization_91[0][0]
__________________________________________________________________________________________________
conv2d_88 (Conv2D) (None, None, None, 3 442368 activation_87[0][0]
__________________________________________________________________________________________________
conv2d_89 (Conv2D) (None, None, None, 3 442368 activation_87[0][0]
__________________________________________________________________________________________________
conv2d_92 (Conv2D) (None, None, None, 3 442368 activation_91[0][0]
__________________________________________________________________________________________________
conv2d_93 (Conv2D) (None, None, None, 3 442368 activation_91[0][0]
__________________________________________________________________________________________________
average_pooling2d_9 (AveragePoo (None, None, None, 2 0 mixed9[0][0]
__________________________________________________________________________________________________
conv2d_86 (Conv2D) (None, None, None, 3 655360 mixed9[0][0]
__________________________________________________________________________________________________
batch_normalization_88 (BatchNo (None, None, None, 3 1152 conv2d_88[0][0]
__________________________________________________________________________________________________
batch_normalization_89 (BatchNo (None, None, None, 3 1152 conv2d_89[0][0]
__________________________________________________________________________________________________
batch_normalization_92 (BatchNo (None, None, None, 3 1152 conv2d_92[0][0]
__________________________________________________________________________________________________
batch_normalization_93 (BatchNo (None, None, None, 3 1152 conv2d_93[0][0]
__________________________________________________________________________________________________
conv2d_94 (Conv2D) (None, None, None, 1 393216 average_pooling2d_9[0][0]
__________________________________________________________________________________________________
batch_normalization_86 (BatchNo (None, None, None, 3 960 conv2d_86[0][0]
__________________________________________________________________________________________________
activation_88 (Activation) (None, None, None, 3 0 batch_normalization_88[0][0]
__________________________________________________________________________________________________
activation_89 (Activation) (None, None, None, 3 0 batch_normalization_89[0][0]
__________________________________________________________________________________________________
activation_92 (Activation) (None, None, None, 3 0 batch_normalization_92[0][0]
__________________________________________________________________________________________________
activation_93 (Activation) (None, None, None, 3 0 batch_normalization_93[0][0]
__________________________________________________________________________________________________
batch_normalization_94 (BatchNo (None, None, None, 1 576 conv2d_94[0][0]
__________________________________________________________________________________________________
activation_86 (Activation) (None, None, None, 3 0 batch_normalization_86[0][0]
__________________________________________________________________________________________________
mixed9_1 (Concatenate) (None, None, None, 7 0 activation_88[0][0]
activation_89[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, None, None, 7 0 activation_92[0][0]
activation_93[0][0]
__________________________________________________________________________________________________
activation_94 (Activation) (None, None, None, 1 0 batch_normalization_94[0][0]
__________________________________________________________________________________________________
mixed10 (Concatenate) (None, None, None, 2 0 activation_86[0][0]
mixed9_1[0][0]
concatenate_2[0][0]
activation_94[0][0]
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 2048) 0 mixed10[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1024) 2098176 global_average_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 200) 205000 dense_1[0][0]
==================================================================================================
Total params: 24,105,960
Trainable params: 2,303,176
Non-trainable params: 21,802,784
__________________________________________________________________________________________________
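If the summary of your transfer-learning model looks like this, with every InceptionV3 layer listed directly in the top-level model, it has the problem described above.
Changing the code above as follows fixes it: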
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras import backend as K
# create the base pre-trained model
Inp = Input((224, 224, 3))
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224,224,3))
x = base_model(Inp)
# add a global spatial average pooling layer
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=Inp, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
Run model.summary() and look at the model structure again:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
inception_v3 (Model) (None, 5, 5, 2048) 21802784
_________________________________________________________________
global_average_pooling2d_2 ( (None, 2048) 0
_________________________________________________________________
dense_3 (Dense) (None, 1024) 2098176
_________________________________________________________________
dense_4 (Dense) (None, 200) 205000
=================================================================
Total params: 24,105,960
Trainable params: 2,303,176
Non-trainable params: 21,802,784
_________________________________________________________________
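As a quick sanity check, the parameter counts in this summary can be reproduced by hand. The sketch below only redoes the arithmetic from the layer shapes shown above (2048 pooled features, a 1024-unit Dense layer, 200 output classes); the variable names are my own and are not part of Keras.

```python
# Parameter counts for the new head, derived from the shapes in the summary.
features = 2048   # channels coming out of global average pooling
hidden = 1024     # units in the first Dense layer
classes = 200     # units in the output Dense layer

dense_hidden = features * hidden + hidden   # weights + biases
dense_output = hidden * classes + classes   # weights + biases
frozen_base = 21802784                      # the inception_v3 row in the summary

trainable = dense_hidden + dense_output
total = frozen_base + trainable
print(dense_hidden, dense_output, trainable, total)
```

The trainable count covers only the two Dense layers, so the whole InceptionV3 base is reported as non-trainable, exactly as in the summary.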
Here is the correct result:
Epoch 1/20
1500/1500 [==============================] - 27s 18ms/step - loss: 2.4664 - binary_accuracy: 0.9125 - f1_keras: 0.0521 - val_loss: 1.4697 - val_binary_accuracy: 0.9456 - val_f1_keras: 0.0619
Epoch 2/20
1500/1500 [==============================] - 19s 13ms/step - loss: 1.2806 - binary_accuracy: 0.9467 - f1_keras: 0.0795 - val_loss: 1.2819 - val_binary_accuracy: 0.9466 - val_f1_keras: 0.0839
Epoch 3/20
1500/1500 [==============================] - 19s 13ms/step - loss: 1.0431 - binary_accuracy: 0.9526 - f1_keras: 0.1203 - val_loss: 1.3012 - val_binary_accuracy: 0.9468 - val_f1_keras: 0.0908
Epoch 4/20
1500/1500 [==============================] - 19s 13ms/step - loss: 0.9168 - binary_accuracy: 0.9555 - f1_keras: 0.1493 - val_loss: 1.3257 - val_binary_accuracy: 0.9445 - val_f1_keras: 0.0922
Epoch 5/20
1500/1500 [==============================] - 19s 13ms/step - loss: 0.8281 - binary_accuracy: 0.9577 - f1_keras: 0.1959 - val_loss: 1.3123 - val_binary_accuracy: 0.9468 - val_f1_keras: 0.0969
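As you can see, the validation accuracy now looks normal. Careful readers will notice that the validation F1 score still lags behind the training F1 score. That is because I trained on only 1,500 samples to test the model, so some overfitting is to be expected.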
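If you want to unfreeze the last N layers of base_model, first run the code below to see how many layers there are and what they are:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
Then unfreeze the last N layers as needed:
# index into base_model.layers: the outer model only holds the wrapped InceptionV3 as a single layer
for layer in base_model.layers[:-N]:
    layer.trainable = False
for layer in base_model.layers[-N:]:
    layer.trainable = True
# recompile the model afterwards so the change takes effect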
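One pitfall with this slicing: when N is 0, `layers[:-0]` is empty and `layers[-0:]` is the whole list, so everything ends up trainable. Below is a minimal sketch of a guard against that. The helper name `unfreeze_last_n` is hypothetical, and the stand-in objects exist only so the sketch runs without Keras; in real code you would pass `base_model.layers`.

```python
from types import SimpleNamespace

def unfreeze_last_n(layers, n):
    """Freeze every layer, then mark only the last n as trainable (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Compute an explicit split index so that n == 0 freezes everything.
    cut = len(layers) - min(n, len(layers))
    for layer in layers[:cut]:
        layer.trainable = False
    for layer in layers[cut:]:
        layer.trainable = True
    return cut

# Stand-ins for Keras layers: any object with a `trainable` attribute works here.
fake_layers = [SimpleNamespace(name="layer_%d" % i, trainable=True) for i in range(5)]
unfreeze_last_n(fake_layers, 2)
flags = [layer.trainable for layer in fake_layers]
print(flags)
```

With a real Keras model, call `model.compile(...)` again after changing the `trainable` flags.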
If this solved your problem, please leave a like before you go!
Reference: https://github.com/keras-team/keras/pull/9965#discussion_r187806860
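For context, the pull request linked above, as I understand it, is about how a frozen BN layer in this Keras version still normalizes with the current mini-batch statistics during training, while validation uses the stored moving statistics. The NumPy sketch below is only an illustration of that mismatch, not Keras code. The function `batch_norm` and all of the numbers are assumptions chosen for the demonstration.

```python
import numpy as np

def batch_norm(x, gamma, beta, mean, var, eps=1e-3):
    """Plain batch-normalization formula applied with the given statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.RandomState(0)

# Moving statistics stored with the pretrained weights (illustrative values).
moving_mean, moving_var = 0.0, 1.0
gamma, beta = 1.0, 0.0

# Activations from a new dataset whose distribution differs from the pretrained one.
x = rng.normal(loc=3.0, scale=2.0, size=(64, 1))

# Training mode: normalize with the statistics of the current mini-batch.
train_out = batch_norm(x, gamma, beta, x.mean(axis=0), x.var(axis=0))
# Inference mode: normalize with the stored moving statistics.
infer_out = batch_norm(x, gamma, beta, moving_mean, moving_var)

print(train_out.mean(), infer_out.mean())
```

The same input yields differently scaled and shifted outputs in the two modes, so layers trained on top of the training-mode outputs see unfamiliar inputs at validation time. That is consistent with the train/validation gap described in this post, even when both sets contain the same samples.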