Deep Learning Notes: 3D Image Classification with 3D Convolutional Neural Networks

Introduction

As the "Hello world" of machine learning, the MNIST handwritten-digit dataset is one that nearly everyone encounters when first learning machine learning. Recently, in order to study deep-learning applications on spatio-temporal data, I wanted to understand 3D convolutional neural networks. As a starting point, I came across the 3D version of the MNIST dataset and worked through example code published by other researchers to understand the basic structure of a 3D CNN.

Dataset: 3D MNIST

2D vs 3D MNIST

The 3D MNIST dataset is hosted on Kaggle: 3D MNIST.
The data is stored in .h5 format and split into the following arrays:

X_train (10000, 4096)
y_train (10000)
X_test (2000, 4096)
y_test (2000)

The training set holds 10,000 samples and the test set 2,000; each sample is flattened into a 4096-dimensional vector (length 16 × width 16 × height 16 = 4096).

Example code for reading the dataset:

import h5py

# full_dataset_vectors.h5 holds the flattened voxel arrays;
# train_point_clouds.h5 on the same Kaggle page holds raw point clouds
with h5py.File("../input/full_dataset_vectors.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]
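As a sanity check on this layout, a flattened 4096-vector can be restored to a 16×16×16 voxel grid with a single reshape. The snippet below uses synthetic data in place of the real arrays; the dataset itself uses the same C-order flattening:

```python
import numpy as np

# Fake a batch of 5 flattened samples, shaped like a slice of X_train
flat = np.arange(5 * 4096, dtype=np.float32).reshape(5, 4096)

# Restore each 4096-vector to a 16x16x16 voxel grid
voxels = flat.reshape(-1, 16, 16, 16)

print(voxels.shape)  # (5, 16, 16, 16)
# Round-tripping back to the flat layout loses nothing
assert np.array_equal(voxels.reshape(5, 4096), flat)
```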

Given that the data is three-dimensional, does a 3D convolutional network outperform a 2D one at classifying the digits? Let's run an experiment.

2D Convolutional Neural Network

This experiment uses the Keras framework. First, load the required modules.

from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

Set the hyperparameters

# Set up hyperparameters
batch_size = 64
epochs = 20

# TensorBoard callback, referenced by train() below
tensorboard = TensorBoard(batch_size=batch_size)

Load the dataset from the local copy

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5","r") as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Convert the training labels to one-hot arrays:

y_train = to_categorical(y_train, num_classes=10)
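to_categorical simply builds a one-hot matrix; an equivalent pure-NumPy sketch, using hypothetical digit labels:

```python
import numpy as np

labels = np.array([3, 0, 9])   # hypothetical digit labels
one_hot = np.eye(10)[labels]   # one row per label, a 1 in the label's column

print(one_hot.shape)           # (3, 10)
print(one_hot[0].argmax())     # 3
```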

Since this run uses a 2D convolutional network, each sample must be a 3D array: the 4096-vector is reshaped to 16×16×16, with the last axis serving as the channel dimension, so no RGB channels are added.

X_train = X_train.reshape(-1, 16, 16, 16)
X_test = X_test.reshape(-1, 16, 16, 16)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=0.25,
                                                  random_state=42)

Define the 2D convolution layer

# Conv2D layer factory
def Conv(filters=16, kernel_size=(3,3), activation='relu', input_shape=None):
    if input_shape:
        return Conv2D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation, input_shape=input_shape)
    else:
        return Conv2D(filters=filters, kernel_size=kernel_size, padding='same',
                      activation=activation)

Define the model architecture

# Define model
def CNN(input_dim, num_classes):
    model = Sequential()
    
    model.add(Conv(8, (3,3), input_shape=input_dim))
    model.add(Conv(16, (3,3)))
    # model.add(BatchNormalization())
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Dropout(0.25))
    
    model.add(Conv(32,(3,3)))
    model.add(Conv(64, (3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool2D())
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(num_classes, activation='softmax'))
    
    return model
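A quick arithmetic check of the feature-map shapes in this architecture, assuming 'same' padding (convolutions preserve spatial size) and 2×2 pooling (halves each axis):

```python
# Input: 16x16 spatial grid, 16 channels
h = w = 16

# Block 1: two 'same' convolutions keep 16x16, then 2x2 max-pool -> 8x8
h, w = h // 2, w // 2

# Block 2: two more 'same' convolutions keep 8x8, then 2x2 max-pool -> 4x4
h, w = h // 2, w // 2

# The last conv block has 64 filters, so Flatten() yields:
flat_units = h * w * 64
print(flat_units)  # 1024 inputs into the Dense(4096) layer
```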

Define the training routine, evaluation, and model saving and loading

# Train Model

def train(optimizer, scheduler, gen):
    global model
    
    print("Training...Please wait")
    # Use the optimizer passed in, rather than a hard-coded one
    model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
    
    model.fit_generator(gen.flow(X_train, y_train, batch_size=batch_size),
                        epochs=epochs, validation_data=(X_val, y_val),
                        verbose=2, steps_per_epoch=X_train.shape[0]//batch_size,
                        callbacks=[scheduler, tensorboard])

def evaluate():
    global model
    
    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)
    
    print(accuracy_score(pred, y_test))
    
    # Heat map
    
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model
    
    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_2D.json','w') as f:
        f.write(model_json)
        
    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')
    
    print("Model Saved")

def load_model():
    f = open("/Users/apple/pydata/3d_mnist/model/model_2D.json","r")
    model_json = f.read()
    f.close()
    
    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')
    
    print("Model Loaded.")
    
    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16), 10)

    gen = ImageDataGenerator(rotation_range=10, zoom_range = 0.1, width_shift_range=0.1, height_shift_range=0.1)
    gen.fit(X_train)

    train(optimizer, scheduler, gen)
    evaluate()
    save_model()

2D CNN result: best accuracy 68.5%

Training...Please wait
Epoch 1/20
 - 40s - loss: 2.2051 - acc: 0.2574 - val_loss: 1.4624 - val_acc: 0.4936
Epoch 2/20
 - 42s - loss: 1.4804 - acc: 0.4842 - val_loss: 1.2500 - val_acc: 0.5528
Epoch 3/20
 - 33s - loss: 1.3187 - acc: 0.5341 - val_loss: 1.2400 - val_acc: 0.5648
Epoch 4/20
 - 31s - loss: 1.2488 - acc: 0.5604 - val_loss: 1.0896 - val_acc: 0.6132
Epoch 5/20
 - 31s - loss: 1.2123 - acc: 0.5740 - val_loss: 1.1378 - val_acc: 0.5868
Epoch 6/20
 - 31s - loss: 1.1782 - acc: 0.5833 - val_loss: 1.0483 - val_acc: 0.6284
Epoch 7/20
 - 31s - loss: 1.1431 - acc: 0.5967 - val_loss: 1.0335 - val_acc: 0.6328
Epoch 8/20
 - 31s - loss: 1.1129 - acc: 0.6054 - val_loss: 1.0082 - val_acc: 0.6412
Epoch 9/20
 - 30s - loss: 1.1071 - acc: 0.6059 - val_loss: 1.0608 - val_acc: 0.6224
Epoch 10/20
 - 31s - loss: 1.0878 - acc: 0.6127 - val_loss: 0.9602 - val_acc: 0.6580
Epoch 11/20
 - 31s - loss: 1.0756 - acc: 0.6169 - val_loss: 1.0182 - val_acc: 0.6424
Epoch 12/20
 - 31s - loss: 1.0649 - acc: 0.6221 - val_loss: 0.9905 - val_acc: 0.6560
Epoch 13/20
 - 30s - loss: 1.0508 - acc: 0.6321 - val_loss: 0.9642 - val_acc: 0.6628
Epoch 14/20
 - 32s - loss: 1.0567 - acc: 0.6289 - val_loss: 0.9452 - val_acc: 0.6696
Epoch 15/20
 - 35s - loss: 1.0271 - acc: 0.6346 - val_loss: 0.9287 - val_acc: 0.6748
Epoch 16/20
 - 36s - loss: 1.0169 - acc: 0.6386 - val_loss: 0.9542 - val_acc: 0.6668
Epoch 17/20
 - 38s - loss: 0.9975 - acc: 0.6456 - val_loss: 0.9509 - val_acc: 0.6656
Epoch 18/20
 - 35s - loss: 1.0139 - acc: 0.6456 - val_loss: 0.9452 - val_acc: 0.6716

Epoch 00018: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 19/20
 - 36s - loss: 0.9616 - acc: 0.6586 - val_loss: 0.9114 - val_acc: 0.6856
Epoch 20/20
 - 31s - loss: 0.9359 - acc: 0.6652 - val_loss: 0.9137 - val_acc: 0.6832
0.6845

Confusion Matrix

(Figure: confusion matrix heatmap for the 2D CNN)

3D Convolutional Neural Networks in Keras

Compared with the ubiquitous 2D convolution, material on 3D convolution is relatively scarce. Below is an illustration of a 3D convolution:

(Figure: 3D CNN)

A 3D convolution applies a three-dimensional filter that slides along all three axes (x, y, z) to compute low-level feature representations; its output is a three-dimensional volume of feature maps. This makes it very useful for tasks such as event detection in video and 3D medical imaging. Its use is not limited to volumetric data, either: it can also be applied to stacks of 2D inputs, such as sequences of images.
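To make the shapes concrete, here is a minimal NumPy sketch of the sliding operation: a 3×3×3 filter moved over a 16×16×16 volume with no padding yields a 14×14×14 output (with 'same' padding, as used in the model below, the output would stay 16×16×16):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation), no padding, stride 1."""
    D, H, W = volume.shape
    d, h, w = kernel.shape
    out = np.zeros((D - d + 1, H - h + 1, W - w + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output voxel sums an elementwise product over a 3D window
                out[z, y, x] = np.sum(volume[z:z+d, y:y+h, x:x+w] * kernel)
    return out

volume = np.ones((16, 16, 16))
kernel = np.ones((3, 3, 3))
out = conv3d_valid(volume, kernel)
print(out.shape)     # (14, 14, 14)
print(out[0, 0, 0])  # 27.0 -- the sum over one full 3x3x3 neighborhood
```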

The implementation follows below.

First, load the required modules


from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv3D, MaxPool3D, BatchNormalization, Input
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score
# Hyper Parameter
batch_size = 86
epochs = 20
# Set up TensorBoard
tensorboard = TensorBoard(batch_size=batch_size)

Load the data

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5", 'r') as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Add an RGB channel dimension to the samples (based on the first function in the plot3D.py file provided on the Kaggle dataset page):

# Translate data to color
def array_to_color(array, cmap="Oranges"):
    s_m = plt.cm.ScalarMappable(cmap=cmap)
    return s_m.to_rgba(array)[:,:-1]

def translate(x):
    xx = np.ndarray((x.shape[0], 4096, 3))
    for i in range(x.shape[0]):
        xx[i] = array_to_color(x[i])
        if i % 1000 == 0:
            print(i)
    # Free Memory
    del x

    return xx
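array_to_color relies on matplotlib's "Oranges" colormap via ScalarMappable. As an illustration of the shape it produces, here is a hand-rolled stand-in that maps each scalar voxel value to an RGB triple with a simple linear ramp; this ramp is an assumption for demonstration only, not the actual "Oranges" colormap:

```python
import numpy as np

def array_to_color_sketch(array):
    """Map scalars in [0, 1] to RGB triples: a crude white-to-orange ramp."""
    a = np.clip(array, 0.0, 1.0)
    r = np.ones_like(a)   # red channel stays saturated
    g = 1.0 - 0.5 * a     # green falls off with intensity
    b = 1.0 - a           # blue falls off fastest
    return np.stack([r, g, b], axis=-1)

x = np.random.rand(2, 4096)  # two fake flattened samples
xx = np.stack([array_to_color_sketch(s) for s in x])
print(xx.shape)              # (2, 4096, 3) -- same shape translate() returns
```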

Convert the labels to one-hot form and reshape the data:

y_train = to_categorical(y_train, num_classes=10)
# y_test = to_categorical(y_test, num_classes=10)

X_train = translate(X_train).reshape(-1, 16, 16, 16, 3)
X_test  = translate(X_test).reshape(-1, 16, 16, 16, 3)

定义模型结构

# Conv3D layer
def Conv(filters=16, kernel_size=(3,3,3), activation='relu', input_shape=None):
    if input_shape:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='Same', activation=activation, input_shape=input_shape)
    else:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='Same', activation=activation)

# Define Model
def CNN(input_dim, num_classes):
    model = Sequential()

    model.add(Conv(8, (3,3,3), input_shape=input_dim))
    model.add(Conv(16, (3,3,3)))
    # model.add(BatchNormalization())
    model.add(MaxPool3D())
    # model.add(Dropout(0.25))

    model.add(Conv(32, (3,3,3)))
    model.add(Conv(64, (3,3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool3D())
    model.add(Dropout(0.25))

    model.add(Flatten())

    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(num_classes, activation='softmax'))

    return model
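The same shape arithmetic as in the 2D case applies here: 'same' convolutions preserve the 16×16×16 volume, and each MaxPool3D (default pool size 2×2×2) halves every axis, so Flatten() feeds exactly 4096 units into the first Dense layer:

```python
# Input volume: 16x16x16 with 3 RGB channels
side = 16

side //= 2  # after first MaxPool3D -> 8x8x8
side //= 2  # after second MaxPool3D -> 4x4x4

flat_units = side ** 3 * 64  # 64 filters in the last conv block
print(flat_units)  # 4096 -- happens to match the Dense(4096) layer width
```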

Define the training routine, evaluation, and model saving and loading

# Train Model
def train(optimizer, scheduler):
    global model

    print("Training...")
    # Use the optimizer passed in, rather than a hard-coded one
    model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])

    model.fit(x=X_train, y=y_train, batch_size=batch_size, epochs=epochs, validation_split=0.15,
              verbose=2, callbacks=[scheduler, tensorboard])

def evaluate():
    global model

    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)

    print(accuracy_score(pred,y_test))
    # Heat Map
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model

    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'w') as f:
        f.write(model_json)

    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print('Model Saved.')

def load_model():
    f = open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'r')
    model_json = f.read()
    f.close()

    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print("Model Loaded.")
    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16,3), 10)

    train(optimizer, scheduler)
    evaluate()
    save_model()

3D CNN result: best accuracy ~75%

Training...
Train on 8500 samples, validate on 1500 samples
Epoch 1/20
 - 696s - loss: 3.1408 - acc: 0.1760 - val_loss: 7.5856 - val_acc: 0.1973
Epoch 2/20
 - 703s - loss: 1.6178 - acc: 0.4213 - val_loss: 7.9127 - val_acc: 0.2127
Epoch 3/20
 - 798s - loss: 1.2917 - acc: 0.5452 - val_loss: 6.1975 - val_acc: 0.2987
Epoch 4/20
 - 757s - loss: 1.1254 - acc: 0.6035 - val_loss: 1.0294 - val_acc: 0.6527
Epoch 5/20
 - 691s - loss: 1.0346 - acc: 0.6421 - val_loss: 1.0982 - val_acc: 0.6247
Epoch 6/20
 - 707s - loss: 0.9758 - acc: 0.6581 - val_loss: 0.9593 - val_acc: 0.6673
Epoch 7/20
 - 791s - loss: 0.9062 - acc: 0.6854 - val_loss: 0.9851 - val_acc: 0.6520
Epoch 8/20
 - 776s - loss: 0.8520 - acc: 0.7064 - val_loss: 1.1886 - val_acc: 0.6320
Epoch 9/20
 - 771s - loss: 0.7860 - acc: 0.7273 - val_loss: 3.0187 - val_acc: 0.5213

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/20
 - 767s - loss: 0.6525 - acc: 0.7728 - val_loss: 1.0288 - val_acc: 0.6793
Epoch 11/20
 - 728s - loss: 0.5816 - acc: 0.7995 - val_loss: 1.0606 - val_acc: 0.6760
Epoch 12/20
 - 688s - loss: 0.5443 - acc: 0.8114 - val_loss: 0.8698 - val_acc: 0.7247
Epoch 13/20
 - 696s - loss: 0.4823 - acc: 0.8326 - val_loss: 0.9301 - val_acc: 0.7007
Epoch 14/20
 - 740s - loss: 0.4209 - acc: 0.8561 - val_loss: 0.9847 - val_acc: 0.7100
Epoch 15/20
 - 730s - loss: 0.3656 - acc: 0.8746 - val_loss: 0.9250 - val_acc: 0.7260
Epoch 16/20
 - 804s - loss: 0.3150 - acc: 0.8928 - val_loss: 0.9000 - val_acc: 0.7387
Epoch 17/20
 - 759s - loss: 0.2949 - acc: 0.8999 - val_loss: 0.8230 - val_acc: 0.7387
Epoch 18/20
 - 778s - loss: 0.2401 - acc: 0.9180 - val_loss: 0.9853 - val_acc: 0.7460
Epoch 19/20
 - 759s - loss: 0.1829 - acc: 0.9365 - val_loss: 1.0410 - val_acc: 0.7493
Epoch 20/20
 - 695s - loss: 0.1827 - acc: 0.9392 - val_loss: 0.9528 - val_acc: 0.7507
0.753

Confusion Matrix

(Figure: confusion matrix heatmap for the 3D CNN)

Discussion

  • Conclusion: judging from the locally reproduced results, the 3D CNN's prediction accuracy on the 3D MNIST dataset is markedly higher than the 2D CNN's, an improvement of roughly 7 percentage points (68.5% vs. 75.3% on the test set).
  • Limitations: this experiment merely reuses open-source code with a modified batch_size and epoch count; the accuracy is still not particularly high.

To-do

  • Tune the hyperparameters and adjust the model architecture to push accuracy higher
    • More layers, deeper architectures
    • Learning-rate schedules, alternative optimizers, different batch sizes, etc.
  • Try 3D CNNs on other 3D datasets

References

3D-MNIST Image Classification
3D Convolutions : Understanding and Implementation
