This article demonstrates what a Dense (fully connected) neural network can learn, working through a hypothesis, a theoretical explanation, and experimental results. The examples are written with Keras.
Suppose a feature vector has the form [a, b, c]: the feature dimension is 3, and the feature values a, b, c are discrete integers. For example, the input feature vector might be [1, 2, 4], and the output label is 0 or 1.
Our goal is to find out which logical relations can be learned. For instance, we label every feature vector satisfying a + b = 3 with output 1. If a neural network Q can, after training, predict output 1 for the input vector [2, 1, 4], then we say the relation a + b = 3 can be learned by Q.
So, what can a Dense network learn?
Throughout, every neuron below uses the sigmoid activation function.
1 Fitting addition, subtraction, and division relations
Relations such as a + b > 1, a - b < 3, or a / b > 5 can all be rewritten in the form x*a + y*b > z (for a / b > 5, multiply both sides by b, assuming b > 0). This is exactly what a Dense layer's matrix multiplication computes, so a single layer suffices; no further explanation is needed.
Q definition: output layer: Dense(1, input_dim=2)
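As a minimal, dependency-free sketch of this claim, the following trains a single sigmoid unit, the same function Dense(1, activation='sigmoid') computes, on the relation a + b > 1, using plain NumPy gradient descent instead of Keras (the learning rate and iteration count are arbitrary choices for illustration):

```python
import numpy as np

# A single sigmoid unit y = sigmoid(w.x + b) is what Dense(1,
# activation='sigmoid') computes; fit it to "a + b > 1" by hand.
rng = np.random.default_rng(7)
X = rng.random((2000, 2))                 # features a, b in [0, 1)
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad = p - y                            # d(cross-entropy)/dz
    w -= lr * (X.T @ grad) / len(X)
    b -= lr * grad.mean()

acc = ((p > 0.5) == (y == 1)).mean()
print("accuracy:", acc)   # a single unit separates a + b > 1 almost perfectly
```

The learned weights converge toward the direction (1, 1) with a negative bias, i.e. the boundary a + b = 1, which is the point of the claim above.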
2 Fitting and/or logical relations
Relations such as a > 1 and b > 1 (and likewise for or).
The first hidden layer needs two neurons, one fitting a > 1 (neuron 1) and one fitting b > 1 (neuron 2); the output neuron then fits (neuron 1) and (neuron 2).
Q definition: hidden layer: Dense(2, input_dim=2)
output layer: Dense(1)
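To see why two hidden neurons suffice, we can skip training and write plausible weights down by hand: a steep sigmoid acts as a soft step function. The sketch below uses the threshold 0.5 instead of 1 (matching the [0, 1) inputs used in the experiment code later), and the weight magnitudes are arbitrary hand-picked values, not learned ones:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hand-built weights for "a > 0.5 and b > 0.5".  Hidden neuron 1 fires
# when a > 0.5, hidden neuron 2 when b > 0.5, and the output neuron
# fires only when both do (h1 + h2 > 1.5, i.e. a soft AND gate).
def net(a, b):
    h1 = sigmoid(50 * (a - 0.5))        # ~ [a > 0.5]
    h2 = sigmoid(50 * (b - 0.5))        # ~ [b > 0.5]
    return sigmoid(20 * (h1 + h2 - 1.5))

print(round(net(0.9, 0.9)))  # 1: both conditions hold
print(round(net(0.9, 0.1)))  # 0: only one condition holds
print(round(net(0.1, 0.1)))  # 0: neither holds
```

An or gate only changes the output threshold: sigmoid(20 * (h1 + h2 - 0.5)) fires when either hidden neuron does.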
3 Fitting multiplication relations
In general a and b are continuous; here, to simplify the analysis, assume a and b each take values in (1, 2, 3, 4, 5), and the relation to fit is a * b > 12. Multiplication looks like the most complex relation, yet it turns out to need only two layers of connections. I went through an experimental phase to get there, which I describe below.
Experimental hypothesis:
The problem a * b > 12 reduces to recognizing the 6 qualifying pairs (3,5), (4,4), (4,5), (5,3), (5,4), (5,5). Each pair has a linear representation: within this grid, (3,5) is exactly the solution of (1/3 * a - 1/5 * b) = 0. Each such representation needs one neuron, so if some hidden layer has 6 neurons and each learns one linear representation, the output neuron can reach the goal by learning (hidden neuron 1 + hidden neuron 2 + ... + hidden neuron 6) > 0.
But then, how can (1/3 * a - 1/5 * b) = 0 be learned? A single linear transformation cannot learn it, because the sigmoid function is monotone; one hidden layer can only learn the relations of section 1. However, (1/3 * a - 1/5 * b) = 0 can be converted into ((1/3 * a - 1/5 * b) + ε > 0) and ((1/3 * a - 1/5 * b) - ε < 0), for a tiny ε. That is, the first hidden layer fits the two inequalities, and the second hidden layer fits the and between them; testing confirmed this works. By this reasoning, multiplication needs three layers of connections: the first hidden layer expresses the inequalities (1/3 * a - 1/5 * b) ± ε compared with 0; the second hidden layer expresses (1/3 * a - 1/5 * b) = 0; and the output layer expresses (3,5) or (4,4) or (4,5) or (5,3) or (5,4) or (5,5).
Q definition: hidden layer: Dense(2 * 6 pairs, input_dim=2)
hidden layer: Dense(6)
output layer: Dense(1)
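The "equality as two inequalities" trick can itself be written out by hand. The sketch below uses a + b = 1 as the target (the same band the experiment code tests as modelData7) and hand-picked, not learned, weights: a thin band |a + b - 1| < ε is the and of two steep sigmoid inequality detectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# |a + b - 1| < eps  ==  (a + b - 1 > -eps) and (a + b - 1 < eps).
# Two steep hidden sigmoids detect the two inequalities; the output
# neuron ANDs them, so two layers of connections carve out a band.
def band(a, b, eps=0.05):
    s = a + b - 1.0
    h1 = sigmoid(200 * (s + eps))       # ~ [s > -eps]
    h2 = sigmoid(-200 * (s - eps))      # ~ [s <  eps]
    return sigmoid(20 * (h1 + h2 - 1.5))

print(round(band(0.5, 0.5)))  # 1: a + b = 1 lies inside the band
print(round(band(0.8, 0.5)))  # 0: a + b = 1.3 lies outside
print(round(band(0.2, 0.5)))  # 0: a + b = 0.7 lies outside
```

This is exactly why a single Dense(1) layer cannot learn the band (it is monotone on each side of one hyperplane), while one hidden layer plus an output layer can.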
Experimental result:
When I actually ran the training, however, a single hidden layer plus the output layer was already enough: precision and recall both exceeded 99%. I then increased the number of features, from just a, b, c all the way up to z, and precision and recall stayed high.
Q definition: hidden layer: Dense(2)
output layer: Dense(1)
This shows that a nonlinear relation of the form a * b > 12 can indeed be fitted by a function of the form sigmoid(c3 * sigmoid(w1·[a,b] + b1) + c4 * sigmoid(w2·[a,b] + b2) + b3). As for why this works, it relates to homeomorphic transformations; I will add that later.
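One way to see why Dense(2) plus the output layer is enough: the region a * b > c in the positive quadrant is convex, so the intersection of two half-planes tangent to the hyperbola a * b = c already approximates it closely, and the output neuron only has to and the two half-plane detectors. The sketch below hand-picks such tangents for the a * b > 0.25 variant used in the experiment code; these weights are illustrative assumptions, not what the network actually learned:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two half-planes tangent to the hyperbola a*b = 0.25, ANDed together,
# approximate the convex region a*b > 0.25 on the unit square.
def net(a, b):
    h1 = sigmoid(100 * (0.35 * a + 0.714 * b - 0.5))   # tangent near (0.714, 0.35)
    h2 = sigmoid(100 * (0.714 * a + 0.35 * b - 0.5))   # tangent near (0.35, 0.714)
    return sigmoid(20 * (h1 + h2 - 1.5))               # soft AND

rng = np.random.default_rng(7)
pts = rng.random((10000, 2))
pred = net(pts[:, 0], pts[:, 1]) > 0.5
truth = pts[:, 0] * pts[:, 1] > 0.25
agreement = (pred == truth).mean()
print("agreement with a*b > 0.25:", agreement)
```

The agreement is well above 95%: almost all of the residual error sits in the thin slivers between the tangent lines and the hyperbola, which a trained Dense(2) network shrinks further by placing its own hyperplanes.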
Full experiment code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

samples = 20000
features = 3


def makeData(labelfn):
    """Draw `samples` random vectors in [0,1)^features, label them with
    labelfn, and split them in half into train/test sets."""
    np.random.seed(7)
    X, Y = [], []
    for _ in range(samples):
        sample = [np.random.random() for _ in range(features)]
        Y.append(1 if labelfn(sample) else 0)
        X.append(sample)
    X, Y = np.array(X), np.array(Y)
    split = int(np.round(samples / 2))
    return X[:split], Y[:split], X[split:], Y[split:]


def modelData1():  # a > b
    return makeData(lambda s: s[0] > s[1])

def modelData2():  # a + b > 1
    return makeData(lambda s: s[0] + s[1] > 1)

def modelData3():  # a + b + c > 1.5
    return makeData(lambda s: s[0] + s[1] + s[2] > 1.5)

def modelData4():  # a > 0.5 and b > 0.5
    return makeData(lambda s: (s[0] > 0.5) and (s[1] > 0.5))

def modelData5():  # a / b > 0.5
    return makeData(lambda s: s[0] / s[1] > 0.5)

def modelData6():  # a * b > 0.25
    return makeData(lambda s: s[0] * s[1] > 0.25)

def modelData7():  # 0.95 < a + b < 1.05
    return makeData(lambda s: 0.95 < s[0] + s[1] < 1.05)


def buildModel(data, hidden=None):
    """A sigmoid binary classifier; hidden=None means no hidden layer."""
    x_train, y_train, x_test, y_test = data
    model = Sequential()
    if hidden is None:
        model.add(Dense(1, input_shape=(x_train.shape[1],), activation='sigmoid'))
    else:
        model.add(Dense(hidden, input_shape=(x_train.shape[1],), activation='sigmoid'))
        model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model


def model1(data):  # output layer only
    return buildModel(data)

def model2(data):  # one hidden layer, 32 units
    return buildModel(data, 32)

def model3(data):  # one hidden layer, 128 units
    return buildModel(data, 128)

def model4(data):  # one hidden layer, 2 units
    return buildModel(data, 2)


def fitmode(model, data, weight):
    x_train, y_train, x_test, y_test = data
    model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=2)
    model.save_weights(weight)
    return model


def evaluatemode(model, data, weight):
    x_train, y_train, x_test, y_test = data
    model.load_weights(weight)
    for a, b in zip(model.predict(x_test[:20]), y_test[:20]):
        print(a, np.round(a), b)
    result = model.predict(x_test)
    TP = FP = FN = TN = 0
    for a, b in zip(result, y_test):
        predict, real = int(np.round(a)), int(b)
        if predict == 1 and real == 1: TP += 1
        if predict == 1 and real == 0: FP += 1
        if predict == 0 and real == 0: TN += 1
        if predict == 0 and real == 1: FN += 1
    print("positives:", TP + FN)
    print("negatives:", TN + FP)
    print("precision (of the predicted positives, how many are true):")
    # add 1 everywhere to avoid division by zero
    print((TP + 1) / (TP + FP + 1))
    print("recall (of the true positives, how many were found):")
    print((TP + 1) / (TP + FN + 1))


def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))


if __name__ == '__main__':
    pass
    # weight1 = "weights/testdense1.h5"
    # data = modelData1()
    # model = fitmode(model1(data), data, weight1)
    # evaluatemode(model1(data), data, weight1)

    # weight2 = "weights/testdense2.h5"
    # data = modelData2()
    # model = fitmode(model1(data), data, weight2)
    # evaluatemode(model1(data), data, weight2)

    # weight3 = "weights/testdense3.h5"
    # data = modelData3()
    # model = fitmode(model1(data), data, weight3)
    # evaluatemode(model1(data), data, weight3)

    # learns poorly
    # weight4 = "weights/testdense4.h5"
    # data = modelData4()
    # model = fitmode(model1(data), data, weight4)
    # evaluatemode(model1(data), data, weight4)

    # with more capacity it reaches a decent level
    # weight5 = "weights/testdense5.h5"
    # data = modelData4()
    # model = fitmode(model2(data), data, weight5)
    # evaluatemode(model2(data), data, weight5)

    # weight6 = "weights/testdense6.h5"
    # data = modelData3()
    # model = fitmode(model1(data), data, weight6)
    # evaluatemode(model1(data), data, weight6)

    # learns poorly
    # weight7 = "weights/testdense7.h5"
    # data = modelData6()
    # model = fitmode(model1(data), data, weight7)
    # evaluatemode(model1(data), data, weight7)

    # learns well: precision 0.99, recall 0.99
    # weight8 = "weights/testdense8.h5"
    # data = modelData6()
    # model = fitmode(model3(data), data, weight8)
    # evaluatemode(model3(data), data, weight8)
    # print(model3(data).get_weights())

    # fails to learn: recall is 0
    # weight9 = "weights/testdense9.h5"
    # data = modelData7()
    # model = fitmode(model1(data), data, weight9)
    # evaluatemode(model1(data), data, weight9)

    # 992 positives, 9008 negatives; precision 0.92, recall 0.83
    # weight10 = "weights/testdense10.h5"
    # data = modelData7()
    # model = fitmode(model2(data), data, weight10)
    # evaluatemode(model2(data), data, weight10)

    # precision 0.99, recall 0.99
    # weight9 = "weights/testdense9.h5"
    # data = modelData6()
    # model = fitmode(model4(data), data, weight9)
    # evaluatemode(model4(data), data, weight9)