Fully Connected Neural Network, RBF Neural Network and Convolutional Neural Network Classifiers in Python (TensorFlow) (Complete Code)

I. Overview

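This was the third small assignment for my machine learning course: implement a fully connected neural network, an RBF neural network and a CNN classifier. It took me three days, but I finally understand all of them more or less. I implemented them in Python with TensorFlow and am posting the code here as a record, in case I lose it and forget. I ran into many problems along the way, from forward propagation to backpropagation and from network design to hyperparameter tuning. There is a lot to learn, and different networks behave differently. Time is limited, so for now I am only writing down an outline of the issues. I will see whether I can fill in the details when I go home for the winter break.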

II. Outline of Issues

1. Fully Connected Neural Network

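(1) Forward propagation: the input sample is multiplied by a weight matrix and passed through the hidden-layer neurons (an activation function). The hidden-layer output is then multiplied by the next weight matrix and passed through neurons again. How many hidden layers you need depends on the problem;
(2) Backpropagation: this can be left entirely to one of TensorFlow's optimizers. You only need to set the learning rate and define the loss function;
(3) Loss function: for an ordinary classification problem, I think it is best to convert the labels to one-hot form. For example, with 5 classes in total, label 3 becomes [0, 0, 1, 0, 0]. With labels in this form, the output dimension must equal the number of classes, and the loss should be defined as the cross-entropy. You can also keep the original labels. In that case the network outputs a single value, and with mini-batches the loss can be defined as the mean squared error;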
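Below is a minimal sketch of points (1) and (3). It uses plain NumPy instead of TensorFlow and a hypothetical network with a single ReLU hidden layer. The function names `one_hot`, `forward` and `softmax_cross_entropy` are made up for illustration. The one-hot conversion reuses the `np.arange(...) + 1 == labels[:, None]` trick from FullyConnectedNN.py, so it assumes labels run from 1 to the number of classes.

```python
import numpy as np

def one_hot(labels, num_label):
    # labels are assumed to be 1..num_label (same trick as in FullyConnectedNN.py)
    return (np.arange(num_label) + 1 == labels[:, None]).astype(np.float32)

def forward(X, w1, b1, w_out, b_out):
    # input x weight matrix -> ReLU hidden layer -> x weight matrix -> logits
    hidden = np.maximum(0.0, X @ w1 + b1)
    return hidden @ w_out + b_out

def softmax_cross_entropy(logits, y_onehot):
    # batch mean of -sum_k y_k * log softmax(logits)_k;
    # subtracting the row maximum first keeps exp() from overflowing
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(y_onehot * log_probs, axis=1))

print(one_hot(np.array([3]), 5)[0])  # [0. 0. 1. 0. 0.]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))           # 4 samples with 3 features each
Y = one_hot(np.array([3, 1, 2, 3]), 3)
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w_out, b_out = rng.normal(size=(8, 3)), np.zeros(3)
logits = forward(X, w1, b1, w_out, b_out)
print(softmax_cross_entropy(logits, Y))
```

FullyConnectedNN.py uses `tf.nn.softmax_cross_entropy_with_logits`, which computes the same per-sample quantity directly from the logits. That is why the code passes `net_out` to it rather than the softmax output.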

2. RBF Neural Network

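(1) Forward propagation: unlike a fully connected network, an RBF network does not use a weight matrix between the input layer and the hidden layer. The data go directly into the hidden layer, where radial basis functions map them. The hidden layer is connected to the output layer by a weight matrix, though, and these two layers work exactly as in a fully connected network;
(2) Training parameters: in an RBF network, the only parameters that actually need to be trained are the weights between the hidden layer and the output layer. There are two other parameters, the hidden-layer centers $c$ and widths $\sigma$. I think the centers and widths could also be put into the network and trained together with the weights. For a classification problem, I give the hidden layer one node per class, so there are as many centers $c$ as classes. There are roughly three ways to compute the centers, and I find that KMeans clustering works best. The sources I read disagree on how to compute $\sigma$. For $\sigma_i$, some say to use the maximum distance between the center $c_i$ of hidden node $i$ and the centers of all other nodes. Others say to use the minimum distance multiplied by a coefficient $\lambda$. I don't think it makes much difference. For the Gaussian radial basis function $e^{-\|x-c_i\|^2/\sigma^2}$, my guess is that $\sigma$ essentially normalizes the data. Without it, the squared distance $d=\|x-c_i\|^2$ may be too large. It is not huge in absolute terms, but it is large enough that $e^{-d}$ is practically 0. To summarize: $c$ is an $N\times M$ matrix ($N$ = number of classes, $M$ = sample dimension), and $\sigma$ is an $N$-dimensional vector.
(3) Accuracy: for reasons I still don't understand, the RBF network reaches very low accuracy, under 20%, on MNIST, Yale, lung, gisette, COIL20 and other datasets. I tried many approaches and none of them solved the problem. Yet on a dataset I generated randomly myself, the training accuracy reaches 70%. I will have to dig deeper to find the cause.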
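Below is a NumPy sketch of the hidden layer described in (2). It rests on a few assumptions. The centers are simply given here; in RBFNN.py they come from KMeans. `sigma_max_distance` mirrors `compute_sigma`, and `sigma_min_distance` with factor `lam` is the alternative rule mentioned above. `fit_output_weights` solves the hidden-to-output weights by linear least squares. That is a swapped-in alternative to the gradient-descent training in the code. All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dist(A, B):
    # squared Euclidean distance between every row of A and every row of B
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)

def sigma_max_distance(centers):
    # sigma_i = largest squared distance from center i to any other center,
    # the same rule as compute_sigma in RBFNN.py
    return pairwise_sq_dist(centers, centers).max(axis=1)

def sigma_min_distance(centers, lam=1.0):
    # alternative rule: lam times the smallest squared distance to another center
    d = pairwise_sq_dist(centers, centers)
    np.fill_diagonal(d, np.inf)  # ignore each center's zero distance to itself
    return lam * d.min(axis=1)

def rbf_hidden(X, centers, sigma):
    # phi[n, i] = exp(-||x_n - c_i||^2 / sigma_i); as in RBFNN.py, sigma already holds
    # a squared-distance scale, so it plays the role of sigma_i^2 in the formula
    return np.exp(-pairwise_sq_dist(X, centers) / sigma[None, :])

def fit_output_weights(phi, targets):
    # hidden -> output weights by linear least squares (not gradient descent)
    return np.linalg.lstsq(phi, targets, rcond=None)[0]

centers = np.array([[0.0, 0.0], [3.0, 4.0]])  # N = 2 centers, M = 2 features
sigma = sigma_max_distance(centers)
phi = rbf_hidden(centers, centers, sigma)
W = fit_output_weights(phi, np.eye(2))
print(sigma)           # [25. 25.]
print(np.exp(-800.0))  # 0.0: exp underflows to exactly zero in float64
```

The last line illustrates the remark about $e^{-d}$. Once the scaled distance is large enough, the activation is exactly 0 in floating point. In float32, TensorFlow's default, this already happens for exponents below about $-104$. At that point every hidden node outputs 0, and the output layer receives no signal.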

3. Convolutional Neural Network

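(1) Convolutional layers: in this experiment, the convolutional layers first reshape each input sample into a 2-D matrix and then define the convolution kernels. You can choose any suitable number and size of kernels. The data determine the depth of each kernel, i.e. the number of channels: 1 for a grayscale image, 3 for an RGB image. Note that with two convolutional layers, the number of channels of the second layer's kernels equals the number of kernels in the first layer. This is easy to see if you work through it once. In this assignment I used two convolutional layers;
(2) Pooling layers: placed after a convolutional layer, they shrink the data. This greatly reduces the number of parameters in the layers that follow while preserving the data's features reasonably well. For example, if convolution with some kernel produces a $32\times32$ feature map, $2\times2$ pooling turns it into a $16\times16$ map;
(3) Fully connected layer: this is the weight matrix connecting the last pooling/convolutional layer to the output layer. It plays the same role as the final weight matrix in the fully connected and RBF networks above. A CNN needs a fairly large amount of training data to reach high accuracy.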
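The shape bookkeeping above can be checked with a little arithmetic. This sketch assumes the architecture in CNN.py: two convolutional layers with 5×5 kernels (32 and then 64 kernels), stride 1 and `padding='SAME'`, each followed by a 2×2, stride-2 max-pool with `padding='SAME'`. The helper names are hypothetical, and the pooling function is a plain NumPy stand-in, not TensorFlow's implementation.

```python
import math
import numpy as np

def same_pool_size(n):
    # a 2x2, stride-2 max-pool with padding='SAME' maps a side of length n to ceil(n / 2)
    return math.ceil(n / 2)

def flatten_size(height, width, channels=64):
    # SAME convs with stride 1 keep the size; only the two poolings shrink it
    h = same_pool_size(same_pool_size(height))
    w = same_pool_size(same_pool_size(width))
    return h * w * channels

def conv_params(k, in_channels, out_channels):
    # k*k*in_channels weights per kernel, plus one bias per kernel
    return k * k * in_channels * out_channels + out_channels

def max_pool_2x2(img):
    # non-overlapping 2x2 max-pool of an (H, W) array with even H and W
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(max_pool_2x2(np.zeros((32, 32))).shape)                            # (16, 16)
print(conv_params(5, 1, 32), conv_params(5, 32, 64))                     # 832 51264
print(flatten_size(28, 28), flatten_size(32, 32), flatten_size(50, 100))  # 3136 4096 20800
```

The second convolutional layer has 51264 parameters because each of its 64 kernels spans all 32 input channels. For gisette reshaped to 50×100, the flattened size is 20800, because SAME pooling rounds 50 → 25 → 13. Simply dividing the 5000 pixels by 16 would give 312.5, which is not even an integer.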

III. Code

datadvi.py

from scipy.io import loadmat
import numpy as np

DATA_DIR = 'C:/Users/ALIENWARE/Documents/作业/机器学习/datasets/'


def divdata(train_ratio=0.9):
    """Load a .mat dataset and optionally split it, class by class, into training and test sets."""
    name = input("input name of data file: ")
    data = loadmat(DATA_DIR + name)

    # COIL20 stores features/labels as 'fea'/'gnd'; the other datasets use 'X'/'Y'.
    # np.ravel turns an (n, 1) or (1, n) label matrix into a flat vector.
    if name == 'COIL20.mat':
        dataX = data['fea']
        dataY = np.ravel(data['gnd'])
    else:
        dataX = data['X']
        dataY = np.ravel(data['Y'])

    divideornot = input("divide data or not?(Yes/No): ")
    if divideornot != 'Yes':
        # no split: train and evaluate on the same data
        return dataX, dataX, dataY, dataY

    train_idx, predict_idx = [], []
    for label in np.unique(dataY):
        idx = np.flatnonzero(dataY == label)          # samples of this class, in file order
        n_train = int(round(train_ratio * len(idx)))  # first 90% of each class go to training
        train_idx.extend(idx[:n_train])
        predict_idx.extend(idx[n_train:])

    # note: the returned sets are grouped by class, not shuffled
    return dataX[train_idx], dataX[predict_idx], dataY[train_idx], dataY[predict_idx]

FullyConnectedNN.py

import tensorflow as tf
import datadvi
import numpy as np

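def FCNN(batch_size=50,learning_rate=0.0001,iteration_times=5000):
    dataX_train,dataX_predict,dataY_train,dataY_predict = datadvi.divdata()
    # the training samples may be ordered by class (datadvi's split groups them that way),
    # so shuffle once before slicing mini-batches
    perm = np.random.permutation(len(dataX_train))
    X = dataX_train[perm]
    Y1 = dataY_train[perm]
    num_label = len(np.unique(Y1))
    # one-hot labels; assumes the class labels are 1..num_label
    Y = (np.arange(num_label)+1 == Y1[:, None]).astype(np.float32)
    dataY_predict1 = (np.arange(num_label)+1 == dataY_predict[:, None]).astype(np.float32)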
    x = tf.placeholder(tf.float32, shape=[None, len(X[0])])
    y = tf.placeholder(tf.float32, shape=[None, num_label])

    w1 = tf.Variable(tf.random_normal([len(X[0]), 256], stddev=1, seed=1))
    w2 = tf.Variable(tf.random_normal([256, 256], stddev=1, seed=1))
    w_out = tf.Variable(tf.random_normal([256, num_label], stddev=1, seed=1))

    b1 = tf.Variable(tf.random_normal([256]))
    b2 = tf.Variable(tf.random_normal([256]))
    b_out = tf.Variable(tf.random_normal([num_label]))

    def Fully_neural_network(X):
        layer_1 = tf.nn.relu(tf.add(tf.matmul(X, w1), b1))
        layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, w2), b2))
        layer_out = tf.matmul(layer_2, w_out) + b_out

        return layer_out

    net_out = Fully_neural_network(x)

    pre = tf.nn.softmax(net_out)
    pre1 = tf.argmax(pre, 1)

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=net_out, labels=y))
    # loss = tf.reduce_mean(tf.abs(net_out-y))

    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss)

    correct_pre = tf.equal(tf.argmax(pre, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pre, tf.float32))

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(1, iteration_times + 1):
            start = (i * batch_size) % len(X)
            end = min(start + batch_size, len(X))
            batch_x = X[start:end]
            batch_y = Y[start:end]
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
            if i % 100 == 0 or i == 1:
                # evaluated on the held-out (predict) set, not on the training batch
                l, acc = sess.run([loss, accuracy], feed_dict={x: dataX_predict, y: dataY_predict1})
                print("Step " + str(i) + ", Test Loss= " + "{:.4f}".format(
                    l) + ", Test Accuracy= " + "{:.3f}".format(acc))

                print(pre1.eval(feed_dict={x: batch_x}))

RBFNN.py

import tensorflow as tf
import datadvi
import numpy as np
import random
from sklearn.cluster import KMeans
from numpy.random import RandomState

from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

def compute_sigma(c):
    # sigma[i] = largest squared distance between center i and any other center
    sigma = np.zeros([len(c)])
    for i in range(len(c)):
        for j in range(len(c)):
            temp_dist = np.sum(np.square(c[i]-c[j]))
            if sigma[i] < temp_dist:
                sigma[i] = temp_dist
    print(sigma)
    return sigma

def init_C(dataX,num_label,label):
    # alternative initialization: one random sample of each class (labels 1..num_label) as its center
    c=np.empty(shape=(num_label,dataX.shape[-1]),dtype=np.float32)
    for i in range(num_label):
        c[i] = random.choice(dataX[label[:]==i+1])
    print(c)
    return c

def cluster_center(dataX, num_label):
    # centers from KMeans, one cluster per class
    KM = KMeans(n_clusters=num_label,random_state=0).fit(dataX)
    return KM.cluster_centers_

def gen_data():
    # synthetic 2-D data: label 1 if x1 + x2 < 1, otherwise 0
    rdm = RandomState(1)
    dataset_size = 1280
    X = rdm.rand(dataset_size, 2)
    Y1 = [[int(x1 + x2 < 1)] for (x1, x2) in X]
    Y = np.array(Y1).T[0]

    return X,X,Y,Y

def RBFNN(batch_size=50,learning_rate=0.01,iteration_times=10000):
    dataX_train, dataX_predict, dataY_train, dataY_predict = datadvi.divdata() # split into training and test sets
    #dataX_train, dataX_predict, dataY_train, dataY_predict = gen_data()

    # data standardization (experiments, currently disabled)
    #dataX_train =  StandardScaler().fit_transform(dataX_train)
    #dataX_predict = StandardScaler().fit_transform(dataX_predict)
    #print(len(dataX_predict[0]))
    #dataX_train = (dataX_train-np.mean(dataX_train,axis=0))/np.std(dataX_train,axis=0)
    #dataX_predict = (dataX_predict - np.mean(dataX_predict, axis=0)) / np.std(dataX_predict, axis=0)
    #print(dataX_predict)

    num_label = len(np.unique(dataY_train)) # number of classes
    num_feature = len(dataX_train[0]) # number of features per sample
    print(num_label)

    # the training samples may be ordered by class; shuffle once so each mini-batch mixes classes
    perm = np.random.permutation(dataX_train.shape[0])
    dataX_train, dataY_train = dataX_train[perm], dataY_train[perm]

    # convert the labels to one-hot form
    dataY_train_onehot = (np.arange(num_label) + 1 == dataY_train[:, None]).astype(np.float32)
    dataY_predict_onehot = (np.arange(num_label) + 1 == dataY_predict[:,None]).astype(np.float32)

    # raw labels as column vectors
    dataY_train_origin = np.array([dataY_train]).T
    dataY_predict_origin = np.array([dataY_predict]).T

    # placeholders
    X = tf.placeholder(tf.float32,shape=[None,num_feature])
    #Y = tf.placeholder(tf.float32,shape=[None,num_label])
    Y = tf.placeholder(tf.float32,shape=[None,1])

    # initialize the centers c and sigma (one hidden node per class)
    #c = tf.Variable(tf.random_normal([num_label,num_feature]))
    #c = tf.Variable(c1)
    c = cluster_center(dataX_train,num_label).astype(np.float32)
    #c = tf.Variable(tf.cast(c,tf.float32))

    #sigma = tf.Variable(tf.ones([num_label]))
    sigma1 = compute_sigma(c)
    sigma = tf.Variable(tf.cast(sigma1,tf.float32))
    #sigma = tf.Variable(tf.cast(c[:,0],tf.float32))
    #print(sigma)

    # initialize the weights W (with one-hot labels, the number of output nodes equals the number of classes)
    #W = tf.Variable(tf.random_normal([num_label,num_label]))

    W = tf.Variable(tf.random_normal([num_label,1]))

    # hidden-layer output K1: K1[n, i] = exp(-||x_n - c_i||^2 / sigma_i)
    #sigma2 = tf.square(sigma)
    K = []
    for i in range(num_label):
        K.append(tf.reduce_sum(tf.square((X - c[i])), 1)/sigma[i])
    K_tensor = tf.convert_to_tensor(K)
    K1 = tf.exp(-tf.transpose(K_tensor))

    # output layer
    output = tf.matmul(K1,W)

    # round the single output to get the predicted label
    pred = tf.round(output)
    #pred = tf.argmax(output,1)

    # loss: softmax cross-entropy for one-hot outputs (commented out),
    # or the sum of absolute errors against the raw label (active)
    #loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=Y))
    loss = tf.reduce_sum(tf.abs(tf.subtract(output,Y)))

    # choose the optimizer
    #optimization = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    optimization = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(1,1+iteration_times):
            start = (i * batch_size) % dataX_train.shape[0]
            end = min(start + batch_size, dataX_train.shape[0])
            batch_x = dataX_train[start:end]
            batch_y = dataY_train_origin[start:end]
            #print(K1.eval(feed_dict={X:batch_x}))
            #print(b.eval())
            sess.run(optimization,feed_dict={X:batch_x,Y:batch_y})
            if i%10 == 0 or i==1:
                print("loss of step {} is {}".format(i,loss.eval(feed_dict={X:dataX_train,Y:dataY_train_origin})))
                # training-set accuracy: fraction of rounded outputs equal to the raw label
                train_pred = pred.eval(feed_dict={X:dataX_train})[:, 0]
                print("ACC is: ",np.mean(train_pred == dataY_train))

CNN.py

import tensorflow as tf
import datadvi
import numpy as np
import math

sess = tf.InteractiveSession()


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)  # truncated normal distribution with standard deviation 0.1
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)  # biases initialized to 0.1
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def cnn(batch_size=50,learning_rate=0.0001,iteration_times=5000):
    dataX_train, dataX_predict, dataY_train, dataY_predict = datadvi.divdata()
    num_label = len(np.unique(dataY_train))

    # the training samples may be ordered by class; shuffle once so each mini-batch mixes classes
    perm = np.random.permutation(dataX_train.shape[0])
    dataX_train, dataY_train = dataX_train[perm], dataY_train[perm]

    # one-hot labels; assumes the class labels are 1..num_label
    dataY_train_onehot = (np.arange(num_label) + 1 == dataY_train[:, None]).astype(np.float32)
    dataY_predict_onehot = (np.arange(num_label) + 1 == dataY_predict[:,None]).astype(np.float32)

    x = tf.placeholder(tf.float32, [None, len(dataX_train[0])])
    y_ = tf.placeholder(tf.float32, [None, num_label])

    mat_XY = int(math.sqrt(len(dataX_train[0])))
    if mat_XY*mat_XY == len(dataX_train[0]):
        img_h, img_w = mat_XY, mat_XY  # datasets whose dimension is a perfect square, e.g. MNIST and COIL20
    else:
        img_h, img_w = 50, 100  # image size used for gisette (5000 features)
    # -1 leaves the number of samples unspecified; the final 1 is the number of channels
    x_image = tf.reshape(x, [-1, img_h, img_w, 1])

    keep_prob = tf.placeholder(tf.float32)

    # convolutional layer 1
    W_conv1 = weight_variable([5, 5, 1, 32])  # 5x5 kernels, 1 channel, 32 kernels -> 32 feature maps
    b_conv1 = bias_variable([32])  # one bias per feature map
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # ReLU nonlinearity
    h_pool1 = max_pool_2x2(h_conv1)  # max pooling

    # convolutional layer 2
    W_conv2 = weight_variable([5, 5, 32, 64])  # note: 32 input channels = number of kernels in layer 1
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    # fully connected layer 1
    # each SAME-padded 2x2 pooling maps a side n to ceil(n / 2); two of them give ceil(n / 4)
    flat_dim = math.ceil(img_h / 4) * math.ceil(img_w / 4) * 64
    W_fc1 = weight_variable([flat_dim, 1024])
    b_fc1 = bias_variable([1024])
    h_pool3 = tf.reshape(h_pool2, [-1, flat_dim])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool3, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # fully connected layer 2 (output layer)
    W_fc2 = weight_variable([1024, num_label])
    b_fc2 = bias_variable([num_label])
    logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    y_conv = tf.nn.softmax(logits)

    # cross-entropy computed from the logits avoids log(0) = -inf when a softmax output underflows
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    tf.global_variables_initializer().run()

    for i in range(iteration_times):
        start = (i * batch_size) % dataX_train.shape[0]
        end = min(start + batch_size, dataX_train.shape[0])
        batch_x = dataX_train[start:end]
        batch_y = dataY_train_onehot[start:end]
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch_x, y_: batch_y, keep_prob: 1.0})
            print("step %d, training accuracy %g" % (i, train_accuracy))
            print("loss is: ", cross_entropy.eval(feed_dict={x:batch_x,y_:batch_y,keep_prob:1.0}))
        train_step.run(feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})
    print("test accuracy %g" % accuracy.eval(feed_dict={x: dataX_predict,
                                                        y_: dataY_predict_onehot, keep_prob: 1.0}))

network.py

import FullyConnectedNN
import RBFNN
import CNN

batch_size = int(input("input batch_size: "))
learning_rate = float(input("input learning_rate: "))
iteration_times = int(input("input iteration_times: "))

method = int(input("choose a NN (1/FCNN  2/RBFNN 3/CNN): "))

if method == 1:
    FullyConnectedNN.FCNN(batch_size,learning_rate,iteration_times)

if method == 2:
    RBFNN.RBFNN(batch_size,learning_rate,iteration_times)

if method == 3:
    CNN.cnn(batch_size,learning_rate,iteration_times)

Just run network.py. Phew, I finally finished this assignment; every night when I go to sleep, my head is full of matrices.
