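Inspired by nature: it makes sense to look at the architecture of the brain for inspiration on how to build intelligent machines. This is the fundamental origin of artificial neural networks (ANNs, Artificial Neural Networks). ANNs are at the very core of deep learning. They are versatile, powerful, and scalable, which makes them ideal for tackling complex machine learning tasks. Examples include classifying hundreds of millions of images, or AlphaGo beating the world champion at Go.

Artificial neuron: it has one or more binary (on/off) inputs and one output. It activates its output when more than a certain number of its inputs are active.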
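Here is a minimal sketch of such a McCulloch-Pitts-style neuron computing logical AND, OR, and NOT. The firing threshold and the convention that any active inhibitory input silences the neuron are assumptions of this sketch. The function names are hypothetical.

```python
def mp_neuron(excitatory, threshold, inhibitory=()):
    """Fire (return 1) when no inhibitory input is on and at least
    `threshold` excitatory inputs are on; otherwise return 0."""
    if any(inhibitory):
        return 0
    return int(sum(excitatory) >= threshold)


def and_neuron(a, b):
    return mp_neuron([a, b], threshold=2)  # both inputs must be on


def or_neuron(a, b):
    return mp_neuron([a, b], threshold=1)  # one active input is enough


def not_neuron(a):
    # A constant "always on" input keeps the neuron firing unless `a` inhibits it
    return mp_neuron([1], threshold=1, inhibitory=[a])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_neuron(a, b), or_neuron(a, b), not_neuron(a))
```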
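A use case for the logical NOT (an input that switches the neuron off): dropout, for example.

The Perceptron is a single-layer, fully connected NN made up of several LTUs (linear threshold units). Each LTU computes a weighted sum of its inputs and then applies a step function to it. Note: x1 and x2 are the features, and the 1 is the bias feature, which is always equal to 1.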
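A minimal NumPy sketch of the forward pass of such a perceptron follows, with two features and three LTU outputs. The bias feature x0 = 1 is prepended to every instance, and each LTU applies a step function to its weighted sum. The weight values are hypothetical, chosen only so the outputs are easy to verify by hand.

```python
import numpy as np


def heaviside(z):
    # LTU step function: 0 if z < 0, else 1
    return (z >= 0).astype(int)


# Hypothetical weights: one column per output LTU, rows = (bias, x1, x2)
W = np.array([[-1.0, -1.0, -1.5],
              [ 1.0,  0.0,  1.0],
              [ 0.0,  1.0,  1.0]])

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # 4 instances, 2 features each
X_b = np.c_[np.ones(len(X)), X]                  # prepend the bias feature x0 = 1
outputs = heaviside(X_b @ W)                     # shape (4, 3): three binary outputs per instance
print(outputs)
```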
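What does the perceptron above do?
It classifies an instance (x1 and x2 are the two features of a single instance) into three different binary classes at once, so it is a multioutput classifier. It can also be turned into a single-output classifier by adding one more layer containing a single LTU on top. A perceptron with two layers like this is called a multi-layer perceptron (MLP, Multi-Layer Perceptron).

Hebb's rule (also called Hebbian learning): when two neurons have the same output, the connection weight between them is increased. The Perceptron is trained with a variant of this rule. Training algorithm (weight update):

$$w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\,x_i$$

Here $w_{i,j}$ is the connection weight between the $i$-th input neuron and the $j$-th output neuron, and $x_i$ is the $i$-th input value of the current training instance. $\hat{y}_j$ is the output of the $j$-th output neuron, $y_j$ is its target output, and $\eta$ is the learning rate.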
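The update rule can be sketched in NumPy for a single output neuron trained on the AND function, a linearly separable toy problem chosen for illustration. `train_perceptron` and its defaults are hypothetical. Setting η = 1 keeps the arithmetic exact. With zero initial weights, the learning rate only rescales the weights (in exact arithmetic), so it does not change the predictions.

```python
import numpy as np


def step(z):
    return (z >= 0).astype(int)


def train_perceptron(X, y, eta=1.0, n_epochs=20):
    X_b = np.c_[np.ones(len(X)), X]  # bias feature x0 = 1
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X_b, y):
            y_hat = step(x_i @ w)
            w += eta * (y_i - y_hat) * x_i  # w <- w + eta * (y - y_hat) * x
    return w


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # logical AND
w = train_perceptron(X, y)
pred = step(np.c_[np.ones(len(X)), X] @ w)
print(w, pred)
```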
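Note:
The decision boundary of each output neuron of a perceptron is linear, so perceptrons cannot learn complex patterns (just like logistic regression).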
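The classic example is XOR. No single straight line separates its two classes, so a linear model like scikit-learn's Perceptron cannot fit all four points, however long it trains. A minimal sketch:

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

clf = Perceptron(max_iter=1000, tol=None, random_state=42).fit(X, y)
print(clf.score(X, y))  # below 1.0: a linear boundary misclassifies at least one point
```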
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 = Iris-Setosa (np.int no longer exists in recent NumPy)

per_clf = Perceptron(max_iter=100, tol=None, random_state=42)  # tol=None: always run max_iter epochs
per_clf.fit(X, y)
y_pred = per_clf.predict([[2, 0.5]])

# Boundary: w1*length + w2*width + c = 0  =>  width = -(w1/w2)*length - c/w2
a = -per_clf.coef_[0][0] / per_clf.coef_[0][1]  # slope: -w1 / w2
b = -per_clf.intercept_[0] / per_clf.coef_[0][1]  # intercept: -c / w2

axes = [0, 5, 0, 2]
x0, x1 = np.meshgrid(
    np.linspace(axes[0], axes[1], 500).reshape(-1, 1),  # 500 evenly spaced values in [0, 5]
    np.linspace(axes[2], axes[3], 200).reshape(-1, 1),  # 200 evenly spaced values in [0, 2]
)
# Grid of test instances
X_new = np.c_[x0.ravel(), x1.ravel()]  # stack the two coordinates as columns
y_predict = per_clf.predict(X_new)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], "bs", label="Not Iris-Setosa")
plt.plot(X[y==1, 0], X[y==1, 1], "yo", label="Iris-Setosa")
# Draw the decision boundary
plt.plot([axes[0], axes[1]], [a * axes[0] + b, a * axes[1] + b], "k-", linewidth=3)
custom_cmap = ListedColormap(['#9898ff', '#fafab0'])
plt.contourf(x0, x1, zz, cmap=custom_cmap)  # color the positive and negative regions differently
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="lower right", fontsize=14)
plt.axis(axes)
# save_fig("perceptron_iris_plot")
plt.show()
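For comparison, here is a minimal sketch of the difference on the same two petal features. The Perceptron only returns a hard 0/1 label, while `LogisticRegression.predict_proba` returns a probability for each class. The query point [2, 0.5] is the one used above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 = Iris-Setosa

per_clf = Perceptron(max_iter=100, tol=None, random_state=42).fit(X, y)
log_reg = LogisticRegression(random_state=42).fit(X, y)

X_query = np.array([[2, 0.5]])
print(per_clf.predict(X_query))  # hard label only
proba = log_reg.predict_proba(X_query)
print(proba)                     # [P(not Setosa), P(Setosa)]
```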
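Note: the Perceptron makes predictions based on a hard threshold and does not output a class probability the way logistic regression (LR) does. For flexibility, you should therefore prefer LR over the Perceptron.

Backpropagation rests on the chain rule for differentiating composite functions. The training process of backpropagation works as follows. For each training instance (or mini-batch), a forward pass computes the output of every neuron, layer by layer, and the network's output error is measured. A reverse pass then uses the chain rule to compute how much each connection weight contributed to that error, going backward from the output layer to the input layer. Finally, a gradient descent step adjusts the weights.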
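To make the chain rule concrete, here is a minimal NumPy sketch of backpropagation for a tiny two-layer network. It assumes sigmoid activations and a mean squared error loss, both chosen for illustration. The reverse pass computes every gradient by applying the chain rule layer by layer, and the result is checked against finite differences. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)      # hidden layer activations
    y_hat = sigmoid(a1 @ W2 + b2)  # output layer activations
    return a1, y_hat


def loss(params, X, y):
    _, y_hat = forward(params, X)
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))


def backprop(params, X, y):
    W1, b1, W2, b2 = params
    a1, y_hat = forward(params, X)       # forward pass
    d_yhat = (y_hat - y) / len(X)        # dLoss / dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)  # chain rule through sigmoid: s'(z) = s(z)(1 - s(z))
    d_W2, d_b2 = a1.T @ d_z2, d_z2.sum(axis=0)
    d_a1 = d_z2 @ W2.T                   # reverse pass: send the error back to the hidden layer
    d_z1 = d_a1 * a1 * (1 - a1)
    d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0)
    return [d_W1, d_b1, d_W2, d_b2]


def numerical_grads(params, X, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus = loss(params, X, y)
            p[idx] = old - eps
            loss_minus = loss(params, X, y)
            p[idx] = old  # restore the original value
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads


X = rng.normal(size=(5, 3))  # 5 instances, 3 features
y = rng.random((5, 2))       # 2 outputs per instance
params = [rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=(4, 2)), np.zeros(2)]

analytic = backprop(params, X, y)
numeric = numerical_grads(params, X, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("max |backprop - finite difference|:", max_diff)

# One gradient descent step: move every weight against its gradient
learning_rate = 0.1
params = [p - learning_rate * g for p, g in zip(params, analytic)]
```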
Activation functions: besides the logistic function (sigmoid), the two most popular are the hyperbolic tangent (tanh) and ReLU. The code below plots them, together with the step function, and their derivatives:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def derivative(f, z, eps=0.000001):
    # numerical derivative via central differences
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.linspace(-5, 5, 200)

plt.figure(figsize=(11, 4))

plt.subplot(121)
plt.plot(z, np.sign(z), "r-", linewidth=1, label="Step")
plt.plot(z, sigmoid(z), "g--", linewidth=2, label="Sigmoid")
plt.plot(z, np.tanh(z), "b-", linewidth=2, label="Tanh")
plt.plot(z, relu(z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
plt.legend(loc="center right", fontsize=14)
plt.title("Activation functions", fontsize=14)
plt.axis([-5, 5, -1.2, 1.2])

plt.subplot(122)
plt.plot(z, derivative(np.sign, z), "r-", linewidth=1, label="Step")
plt.plot(0, 0, "ro", markersize=5)
plt.plot(0, 0, "rx", markersize=10)
plt.plot(z, derivative(sigmoid, z), "g--", linewidth=2, label="Sigmoid")
plt.plot(z, derivative(np.tanh, z), "b-", linewidth=2, label="Tanh")
plt.plot(z, derivative(relu, z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
#plt.legend(loc="center right", fontsize=14)
plt.title("Derivatives", fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])

# save_fig("activation_functions_plot")
plt.show()
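Why replace the step function? Its derivative is zero everywhere except at 0, so gradient descent gets no signal from it. Sigmoid, tanh, and ReLU have useful gradients. This small sketch compares their closed-form derivatives with a central-difference estimate. The helper name `numeric_derivative` is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def relu(z):
    return np.maximum(0, z)


def numeric_derivative(f, z, eps=1e-6):
    return (f(z + eps) - f(z - eps)) / (2 * eps)


z = np.linspace(-5, 5, 200)  # an even number of points, so z = 0 itself is never sampled

# Closed-form derivatives
d_sigmoid = sigmoid(z) * (1 - sigmoid(z))
d_tanh = 1 - np.tanh(z) ** 2
d_relu = (z > 0).astype(float)  # ReLU has no derivative at 0; this sketch would use 0 there
d_step = np.zeros_like(z)       # flat everywhere except the jump at 0

for name, f, d in [("sigmoid", sigmoid, d_sigmoid), ("tanh", np.tanh, d_tanh),
                   ("relu", relu, d_relu), ("step", np.sign, d_step)]:
    print(name, np.allclose(numeric_derivative(f, z), d, atol=1e-6))
```

The sigmoid derivative never exceeds 0.25, which is one reason ReLU, with a derivative of 1 for positive inputs, became popular.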
import numpy as np

# Shuffle the data and split it into mini-batches
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch  # generator: yields one batch at a time to save memory
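Here is a quick check of how this generator splits the data, with the function repeated so the snippet runs on its own. It asks `np.array_split` for `len(X) // batch_size` parts, so batches can be slightly larger than `batch_size`. For example, 105 instances with `batch_size=50` give batches of 53 and 52. It also fails when `len(X) < batch_size`, because zero sections are requested. The toy arrays are made up for illustration.

```python
import numpy as np


def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]


X = np.arange(105).reshape(-1, 1)  # 105 toy instances with one feature each
y = np.arange(105)                 # label = row index, so the pairing is easy to check
batches = list(shuffle_batch(X, y, batch_size=50))
print([len(y_batch) for _, y_batch in batches])  # [53, 52]
```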
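import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API (placeholder / layers / Session)

def reset_graph(seed=42):
    # start from a fresh default graph and fix the random seeds
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Load MNIST, flatten each 28x28 image into 784 features and scale them to [0, 1]
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]  # hold out 5,000 images for validation
y_valid, y_train = y_train[:5000], y_train[5000:]

n_inputs = 28*28  # MNIST
n_hidden1 = 300   # neurons in hidden layer 1
n_hidden2 = 100   # neurons in hidden layer 2
n_outputs = 10    # output neurons: one per digit class, 0 to 9

reset_graph()

# ------------------ Construction phase ------------------
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")  # placeholder for the input features; values are fed at run time
y = tf.placeholder(tf.int32, shape=(None), name="y")              # placeholder for the class labels

# Network structure
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
    y_proba = tf.nn.softmax(logits)

# Loss function
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

# Optimizer and the op that minimizes the loss
learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

# Model evaluation
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# ------------------ Execution phase ------------------
init = tf.global_variables_initializer()  # initializer for all global variables
saver = tf.train.Saver()                  # used to save the model

n_epochs = 20    # number of passes over the training set
batch_size = 50  # instances per mini-batch

with tf.Session() as sess:
    init.run()  # initialize the variables
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})  # one training step
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})  # accuracy on the last training batch
        acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})  # validation accuracy after this epoch
        print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_model_final.ckpt")  # save the model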
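What the loss and evaluation ops compute can be sketched in plain NumPy. `sparse_softmax_cross_entropy_with_logits` takes integer class labels (not one-hot vectors) and returns, for each instance, minus the log of the softmax probability of the true class. `in_top_k(logits, y, 1)` checks whether the true class has the highest logit. The functions below are hypothetical stand-ins, not TensorFlow APIs.

```python
import numpy as np


def sparse_softmax_xentropy(logits, labels):
    # Subtracting the row max leaves softmax unchanged but avoids overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]  # -log p(true class)


def in_top_1(logits, labels):
    # Ties go to the lowest index here; tf.nn.in_top_k counts all tied classes as "in the top k"
    return np.argmax(logits, axis=1) == labels


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 2])  # first prediction is right, second is wrong
loss = sparse_softmax_xentropy(logits, labels).mean()
accuracy = in_top_1(logits, labels).astype(np.float32).mean()
print(loss, accuracy)

# Equal logits over 10 classes give p = 1/10 for every class, so the loss is log(10)
uniform_loss = sparse_softmax_xentropy(np.zeros((1, 10)), np.array([3]))[0]
print(uniform_loss)
```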
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)  # scale the initial weights by the number of inputs
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="kernel")                  # weight matrix
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")   # one bias per neuron
        Z = tf.matmul(X, W) + b
        if activation is not None:
            return activation(Z)
        else:
            return Z

# The only difference from before: this "dnn" block uses our custom neuron_layer instead of tf.layers.dense
# (build it in place of the earlier "dnn" block, right after the placeholders)
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, name="hidden1",
                           activation=tf.nn.relu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2",
                           activation=tf.nn.relu)
    logits = neuron_layer(hidden2, n_outputs, name="outputs")
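A NumPy sketch makes it clearer what `neuron_layer` builds. It creates a weight matrix drawn from a truncated normal distribution with standard deviation 2/√n_inputs and a zero bias vector. It then computes Z = XW + b, followed by the optional activation. The resampling loop imitates `tf.truncated_normal`, which redraws values more than two standard deviations from the mean. The helper names and the random input batch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)


def truncated_normal(shape, stddev, rng):
    # Redraw any sample more than 2 standard deviations from the mean (0 here)
    w = rng.normal(0.0, stddev, size=shape)
    mask = np.abs(w) > 2 * stddev
    while mask.any():
        w[mask] = rng.normal(0.0, stddev, size=mask.sum())
        mask = np.abs(w) > 2 * stddev
    return w


def relu(z):
    return np.maximum(0, z)


def neuron_layer_np(X, n_neurons, rng, activation=None):
    n_inputs = X.shape[1]
    stddev = 2 / np.sqrt(n_inputs)
    W = truncated_normal((n_inputs, n_neurons), stddev, rng)
    b = np.zeros(n_neurons)
    Z = X @ W + b
    return activation(Z) if activation is not None else Z


X = rng.random((4, 28 * 28))  # a fake batch of 4 scaled MNIST-sized inputs
h = neuron_layer_np(X, 300, rng, activation=relu)
W_check = truncated_normal((28 * 28, 300), 2 / np.sqrt(28 * 28), rng)
print(h.shape, np.abs(W_check).max())
```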
with tf.Session() as sess:
    saver.restore(sess, "./my_model_final.ckpt")  # or better, use save_path
    X_new_scaled = X_test[:20]  # new inputs must be scaled to [0, 1], just like the training data
    Z = logits.eval(feed_dict={X: X_new_scaled})  # logits is the network's final output node
    y_pred = np.argmax(Z, axis=1)  # index of the largest logit = predicted digit

print(Z)
print(y_pred)
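Using more hidden layers with fewer neurons each makes the network faster to train. A deep network can model complex functions with far fewer neurons than a shallow one. In practice, however, few people build such a network from scratch. It is much more common to reuse a network that someone else has already trained on a similar task.

Rule of thumb (important): size the hidden layers like a funnel, with fewer neurons in each successive layer. The reason is that many low-level features can be combined into far fewer high-level features. A simpler approach is to use more layers and neurons than you need, then (1) use early stopping to avoid overfitting, or (2) apply regularization techniques such as dropout. This is called the "stretch pants" approach. For regression tasks, you can simply use no activation function at all in the output layer.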
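To give a sense of scale, this short sketch counts the trainable parameters of the funnel-shaped MNIST network used in this post (784, 300, 100, and 10 neurons per layer). Each dense layer has n_in × n_out connection weights plus n_out biases. The helper name is hypothetical.

```python
def count_params(layer_sizes):
    # Each dense layer: n_in * n_out connection weights + n_out bias terms
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


funnel = [28 * 28, 300, 100, 10]
print(count_params(funnel))          # 266610
print(count_params([28 * 28, 300]))  # 235500: most parameters sit in the first layer
```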
Extra features: saving checkpoints, resuming from the last checkpoint after an interruption, adding summaries, and plotting learning curves with TensorBoard.

import os
from datetime import datetime

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph API

# Build a unique log directory for each run
def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)

def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# Assumes X_train, y_train, X_valid, y_valid and shuffle_batch() from the MNIST example above
n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    loss_summary = tf.summary.scalar('log_loss', loss)

learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    accuracy_summary = tf.summary.scalar('accuracy', accuracy)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

# Binary event-log writer for TensorBoard
logdir = log_dir("mnist_dnn")  # the prefix is just an illustrative run name
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

m, n = X_train.shape

# -------------- Execute the graph --------------------
n_epochs = 10001
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

checkpoint_path = "./tmp/my_deep_mnist_model.ckpt"
# Create ./tmp first: without it, the first training run fails because the directory does not exist
os.makedirs(os.path.dirname(checkpoint_path), exist_ok=True)
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_deep_mnist_model"

best_loss = np.inf
epochs_without_progress = 0
max_epochs_without_progress = 50

with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(start_epoch, n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val, loss_val, accuracy_summary_str, loss_summary_str = sess.run(
            [accuracy, loss, accuracy_summary, loss_summary],
            feed_dict={X: X_valid, y: y_valid})
        file_writer.add_summary(accuracy_summary_str, epoch)
        file_writer.add_summary(loss_summary_str, epoch)
        if epoch % 5 == 0:
            print("Epoch:", epoch,
                  "\tValidation accuracy: {:.3f}%".format(accuracy_val * 100),
                  "\tLoss: {:.5f}".format(loss_val))
            # save the current model
            saver.save(sess, checkpoint_path)
            # save the next epoch number to the ".epoch" file
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))
            if loss_val < best_loss:
                saver.save(sess, final_model_path)
                best_loss = loss_val
                epochs_without_progress = 0  # reset the counter whenever the loss improves
            else:
                epochs_without_progress += 5
                if epochs_without_progress > max_epochs_without_progress:
                    print("Early stopping")
                    break

file_writer.close()
# Training is finished: delete the epoch checkpoint file
os.remove(checkpoint_epoch_path)
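Without TensorFlow, the early-stopping bookkeeping amounts to tracking the best validation loss and counting evaluations that bring no improvement. Here is a minimal pure-Python sketch, assuming one validation loss per evaluation. The function name, the patience value, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch): stop once `patience` consecutive
    evaluations bring no improvement over the best loss so far."""
    best_loss = float("inf")
    best_epoch = None
    without_progress = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            without_progress = 0  # reset the counter on every improvement
        else:
            without_progress += 1
            if without_progress >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.66]
print(early_stopping_epoch(losses, patience=3))  # (2, 5)
```

Keeping the model saved at `best_epoch`, as the TensorFlow loop does with `final_model_path`, means the extra epochs after the best one cost only time, not accuracy.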