In this hands-on exercise, we continue with the iris classification task from Chapter 3, replacing the Softmax classifier with the feedforward neural network introduced in this chapter. The loss function is cross-entropy, the optimizer is stochastic gradient descent, and the evaluation metric is accuracy.
We first draw a scatter plot of the first two features of the 150 samples in the dataset. The code is implemented as follows:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Iris.csv', usecols=[1, 2, 3, 4, 5])

"""Draw a basic scatter plot of the training data for visual inspection of its linear separability."""
# Set the figure size to 8 x 5
plt.figure(figsize=(8, 5))
# x coordinates, y coordinates and label of each scatter series
plt.scatter(df[:50]['SepalLength'], df[:50]['SepalWidth'], label='Iris-setosa')
plt.scatter(df[50:100]['SepalLength'], df[50:100]['SepalWidth'], label='Iris-versicolor')
plt.scatter(df[100:150]['SepalLength'], df[100:150]['SepalWidth'], label='Iris-virginica')
plt.xlabel('SepalLength')
plt.ylabel('SepalWidth')
# Add the title: scatter distribution of iris sepal length and width
plt.title('Scattered distribution of length and width of iris sepals.')
# Show the legend
plt.legend()
plt.show()
In gradient descent, when the objective function is the risk function over the entire training set, the method is called Batch Gradient Descent (BGD). BGD must compute and sum the gradient of the loss on every sample at each iteration. When the number of training samples $N$ is large, the space complexity is high and each iteration is computationally expensive.
To reduce the per-iteration cost, we can sample only a small subset at each iteration, compute the gradient of the loss on that subset, and update the parameters. This optimization scheme is called Mini-Batch Gradient Descent (Mini-Batch GD).
At the $t$-th iteration, we randomly select a subset $\mathcal{B}_t$ containing $K$ samples, average the gradients of the per-sample losses over this subset, and then update the parameters:

$$\theta_{t+1} \leftarrow \theta_t - \alpha \frac{1}{K} \sum_{(\boldsymbol{x},y)\in \mathcal{B}_t} \frac{\partial \mathcal{L}\big(y, f(\boldsymbol{x};\theta)\big)}{\partial \theta}$$

where $K$ is the batch size. $K$ is usually not set very large, typically between 1 and 100; in practice it is often chosen as a power of two, $2^n$, for computational efficiency.
In practice, mini-batch stochastic gradient descent converges quickly and is cheap per iteration, so it has gradually become the dominant optimization algorithm in large-scale machine learning.
Moreover, stochastic gradient descent effectively injects random noise into the batch gradient. In non-convex optimization problems, this makes it easier to escape local optima.
To use mini-batch gradient descent, we need to randomly partition the data. The common practice in machine learning is to build a data iterator that, on each iteration, fetches a batch of the specified size from the full dataset.
The implementation principle of the data iterator is illustrated below.
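A minimal sketch of this principle, assuming plain NumPy and hypothetical names (batch_iterator is not part of the original code): shuffle an index list once per epoch, then slice it into consecutive batches.

import numpy as np

def batch_iterator(X, y, batch_size, shuffle=True):
    """Yield mini-batches of (X, y): a minimal sketch of the data-iterator idea."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.shuffle(idx)  # shuffle the index list, not the data itself
    for start in range(0, len(X), batch_size):
        batch_idx = idx[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]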
In practice, parameters are usually optimized with mini-batch gradient descent. Here we use torch.utils.data.DataLoader to load the mini-batches: DataLoader wraps a dataset into an iterator, where the batch_size argument sets the mini-batch size, and setting shuffle to True shuffles the index order when the mini-batch index lists are generated.
We construct an IrisDataset class for data loading, inheriting from torch.utils.data.Dataset. Dataset is an abstract class that encapsulates a dataset's access logic: it fetches the sample at a given index and applies any per-sample processing. When subclassing Dataset to define a data-loading class, implement the following methods:
__getitem__: fetch the sample at the given index and apply per-sample processing;
__len__: return the number of samples in the dataset.
The code is implemented as follows:
import torch
import torch.utils.data as io
import numpy as np
from sklearn.datasets import load_iris

# load_data function
def load_data(shuffle=True):
    """
    Load the iris data.
    Input:
        - shuffle: whether to shuffle the data, bool
    Output:
        - X: feature data, shape=[150,4]
        - y: label data, shape=[150]
    """
    # Load the raw data
    X = np.array(load_iris().data, dtype=np.float32)
    y = np.array(load_iris().target, dtype=np.int64)
    X = torch.tensor(X)
    y = torch.tensor(y)
    # Min-max normalization
    X_min = torch.min(X, dim=0)[0]
    X_max = torch.max(X, dim=0)[0]
    X = (X - X_min) / (X_max - X_min)
    # If shuffle is True, randomly permute the data
    if shuffle:
        idx = torch.randperm(X.shape[0])
        X = X[idx]
        y = y[idx]
    return X, y
# IrisDataset class
class IrisDataset(io.Dataset):
    def __init__(self, mode='train', num_train=120, num_dev=15):
        super(IrisDataset, self).__init__()
        # Reuse the data-loading function from Chapter 3; the labels do not need one-hot encoding here
        X, y = load_data(shuffle=True)
        if mode == 'train':
            self.X, self.y = X[:num_train], y[:num_train]
        elif mode == 'dev':
            self.X, self.y = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
        else:
            self.X, self.y = X[num_train + num_dev:], y[num_train + num_dev:]

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

    def __len__(self):
        return len(self.y)

torch.random.manual_seed(12)
train_dataset = IrisDataset(mode='train')
dev_dataset = IrisDataset(mode='dev')
test_dataset = IrisDataset(mode='test')
# Print the length of the training set
print("length of train set: ", len(train_dataset))
Output:
length of train set: 120
# Batch size
batch_size = 16
# Build the data loaders
train_loader = io.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = io.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = io.DataLoader(test_dataset, batch_size=batch_size)
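As an illustrative check (not part of the original experiment), we can fetch one mini-batch from the loader and inspect its shapes:

# Illustrative check: fetch one mini-batch and print its shapes
X_batch, y_batch = next(iter(train_loader))
print(X_batch.shape, y_batch.shape)  # expected: torch.Size([16, 4]) torch.Size([16])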
We build a simple feedforward neural network for the iris classification experiment, with 4 input neurons, 3 output neurons, and 6 hidden neurons.
The code is implemented as follows:
from torch import nn
from torch.nn.init import constant_, normal_

# Define the feedforward neural network
class Model_MLP_L2_V3(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(Model_MLP_L2_V3, self).__init__()
        # First fully connected layer
        self.fc1 = nn.Linear(input_size, hidden_size)
        normal_(self.fc1.weight, mean=0.0, std=0.01)
        constant_(self.fc1.bias, val=1.0)
        # Second fully connected layer
        self.fc2 = nn.Linear(hidden_size, output_size)
        normal_(self.fc2.weight, mean=0.0, std=0.01)
        constant_(self.fc2.bias, val=1.0)
        # Activation function used by the network
        self.act = nn.Sigmoid()

    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        return outputs
fnn_model = Model_MLP_L2_V3(input_size=4, output_size=3, hidden_size=6)
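As a quick, illustrative sanity check (not part of the original experiment), a dummy batch can be pushed through the network to confirm that the output has one logit per class:

# Illustrative: forward a dummy batch of 2 samples through the network
dummy = torch.randn(2, 4)
print(fnn_model(dummy).shape)  # expected: torch.Size([2, 3])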
Building on the RunnerV2 class, we implement an improved RunnerV3 class. Training uses automatic gradient computation, loads batched data with DataLoader, and optimizes parameters with mini-batch stochastic gradient descent; when saving the model, the state_dict method is used to get the model parameters, and when loading, load_state_dict restores them.
Because the parameters are optimized with mini-batch stochastic gradient descent, the data is fed into the model in batches, so the evaluation metric is also computed per batch. To obtain the overall result for a whole epoch, the per-batch results must be accumulated. We define an Accuracy class to implement this.
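Note that naively averaging per-batch accuracies is only correct when all batches have the same size; accumulating counts gives the exact epoch-level value. A small sketch with made-up numbers:

# Hypothetical numbers: batch 1 has 9/10 correct, batch 2 has 0/2 correct
num_correct, num_count = 0, 0
for batch_correct, batch_size in [(9, 10), (0, 2)]:
    num_correct += batch_correct
    num_count += batch_size
print(num_correct / num_count)  # 0.75, whereas the naive mean of 0.9 and 0.0 is 0.45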
First, the Metric base class from which Accuracy inherits:
import abc
import numpy as np

class Metric(metaclass=abc.ABCMeta):
    r"""
    Base class for metrics; encapsulates metric logic and APIs.
    Usage:
        .. code-block:: text
            m = SomeMetric()
            for prediction, label in ...:
                m.update(prediction, label)
            m.accumulate()
    Advanced usage for :code:`compute`:
    Metric calculation can be accelerated by computing metric states from
    model outputs and labels with built-in operators instead of Python/NumPy
    in :code:`compute`; the metric states are then fetched as NumPy arrays,
    and :code:`update` is called with the states in NumPy format.
    The metric is calculated as follows (operations in Model and Metric are
    indicated with curly brackets, while data nodes are not):
        .. code-block:: text
                 inputs & labels        || ------------------
                       |                ||
                    {model}             ||
                       |                ||
                outputs & labels        ||
                       |                ||    tensor data
                {Metric.compute}        ||
                       |                ||
              metric states(tensor)     ||
                       |                ||
                {fetch as numpy}        || ------------------
                       |                ||
              metric states(numpy)      ||    numpy data
                       |                ||
                {Metric.update}         \/ ------------------
    Examples:
        For the :code:`Accuracy` metric, which takes :code:`pred` and :code:`label`
        as inputs, we can calculate the correct prediction matrix between
        :code:`pred` and :code:`label` in :code:`compute`.
        For example, if the prediction results contain 10 classes, :code:`pred`
        shape is [N, 10] and :code:`label` shape is [N, 1], where N is the
        mini-batch size, and we only need the top-1 and top-5 accuracy, we can
        calculate the correct prediction matrix of the top-5 scores of each
        sample as follows, where the correct prediction matrix shape is [N, 5]:
        .. code-block:: text
            def compute(pred, label):
                # sort predictions and slice the top-5 scores
                pred = torch.argsort(pred, descending=True)[:, :5]
                # calculate whether the predictions are correct
                correct = pred == label
                return correct.to(torch.float32)
        With :code:`compute`, we offload some calculations to operators (which
        may run on GPU devices and will be faster), and only fetch one tensor of
        shape [N, 5] instead of two tensors of shapes [N, 10] and [N, 1].
        :code:`update` can be defined as follows:
        .. code-block:: text
            def update(self, correct):
                accs = []
                for i, k in enumerate(self.topk):
                    num_corrects = correct[:, :k].sum()
                    num_samples = len(correct)
                    accs.append(float(num_corrects) / num_samples)
                    self.total[i] += num_corrects
                    self.count[i] += num_samples
                return accs
    """

    def __init__(self):
        pass

    @abc.abstractmethod
    def reset(self):
        """
        Reset states and result.
        """
        raise NotImplementedError("function 'reset' not implemented in {}.".
                                  format(self.__class__.__name__))

    @abc.abstractmethod
    def update(self, *args):
        """
        Update states for the metric.
        Inputs of :code:`update` are the outputs of :code:`Metric.compute`;
        if :code:`compute` is not defined, the inputs of :code:`update`
        are the flattened **outputs** of the model and **labels** from the data:
        :code:`update(output1, output2, ..., label1, label2, ...)`
        See :code:`Metric.compute`.
        """
        raise NotImplementedError("function 'update' not implemented in {}.".
                                  format(self.__class__.__name__))

    @abc.abstractmethod
    def accumulate(self):
        """
        Accumulate statistics, compute and return the metric value.
        """
        raise NotImplementedError(
            "function 'accumulate' not implemented in {}.".format(
                self.__class__.__name__))

    @abc.abstractmethod
    def name(self):
        """
        Return the metric name.
        """
        raise NotImplementedError("function 'name' not implemented in {}.".
                                  format(self.__class__.__name__))

    def compute(self, *args):
        """
        This API is advanced usage for accelerating metric calculation;
        the calculations from model outputs to the states that the metric
        should update can be defined here, where torch operators are also
        supported. Outputs of this API will be the inputs of :code:`Metric.update`.
        If :code:`compute` is defined, it will be called with the **outputs**
        of the model and **labels** from the data as arguments; all outputs
        and labels will be concatenated and flattened, with each field passed
        as a separate argument:
        :code:`compute(output1, output2, ..., label1, label2, ...)`
        If :code:`compute` is not defined, the default behaviour is to pass
        the input through, so the output format is:
        :code:`return output1, output2, ..., label1, label2, ...`
        See :code:`Metric.update`.
        """
        return args
The Accuracy class:
class Accuracy(Metric):
    def __init__(self, is_logist=True):
        """
        Input:
            - is_logist: whether outputs are logits or post-activation values
        """
        # Number of correctly predicted samples
        self.num_correct = 0
        # Total number of samples
        self.num_count = 0
        self.is_logist = is_logist

    def update(self, outputs, labels):
        """
        Input:
            - outputs: predictions, shape=[N, class_num]
            - labels: ground-truth labels, shape=[N, 1]
        """
        # Binary task if shape[1] == 1, multi-class task if shape[1] > 1
        if outputs.shape[1] == 1:  # binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # For logits, check whether the value is >= 0
                preds = (outputs >= 0).float()
            else:
                # Otherwise check whether each probability is >= 0.5: class 1 if so, class 0 otherwise
                preds = (outputs >= 0.5).float()
        else:
            # Multi-class: use 'torch.argmax' to take the index of the largest element as the class
            preds = torch.argmax(outputs, dim=1)
        # Count the correctly predicted samples in this batch
        labels = torch.squeeze(labels, dim=-1)
        batch_correct = torch.sum((preds == labels).float()).item()
        batch_count = len(labels)
        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated counts
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the correct count and the total count
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"
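A minimal usage sketch of the Accuracy class, with made-up logits and labels (illustration only):

# Illustrative: one batch of 3-class logits
metric = Accuracy(is_logist=True)
logits = torch.tensor([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]])  # argmax: class 0, class 1
labels = torch.tensor([[0], [2]])                          # first correct, second wrong
metric.update(logits, labels)
print(metric.accumulate())  # 0.5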
The RunnerV3 class is implemented as follows:
class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # only used to compute the evaluation metric

        # Evaluation-metric history during training
        self.dev_scores = []
        # Loss history during training
        self.train_epoch_losses = []  # one loss record per epoch
        self.train_step_losses = []   # one loss record per step
        self.dev_losses = []
        # Best score so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency
        eval_steps = kwargs.get("eval_steps", 0)
        # Model save path; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")
        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0

        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulated training loss
            total_loss = 0
            for step, data in enumerate(train_loader):
                X, y = data
                # Get model predictions
                logits = self.model(X)
                loss = self.loss_fn(logits, y)  # reduces with mean by default
                total_loss += loss.item()

                # Record the loss at every step during training
                self.train_step_losses.append((global_step, loss.item()))

                if log_steps and global_step % log_steps == 0:
                    print(
                        f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                # Backpropagate to compute the gradient of every parameter
                loss.backward()

                if custom_print_log:
                    custom_print_log(self)

                # Update the parameters with mini-batch gradient descent
                self.optimizer.step()
                # Zero the gradients
                self.optimizer.zero_grad()

                # Check whether evaluation is due
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):
                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate] dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # If the current score is the best so far, save the model
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(
                            f"[Evaluate] best accuracy performance has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Average training loss of the current epoch
            trn_loss = total_loss / len(train_loader)
            # Record the epoch-level training loss
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")

    # Evaluation phase: use 'torch.no_grad()' to avoid computing and storing gradients
    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None

        # Switch the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)

        # Accumulated loss on the dev set
        total_loss = 0

        # Reset the metric
        self.metric.reset()

        # Iterate over every batch of the dev set
        for batch_id, data in enumerate(dev_loader):
            X, y = data
            # Compute model outputs
            logits = self.model(X)
            # Compute the loss
            loss = self.loss_fn(logits, y).item()
            # Accumulate the loss
            total_loss += loss
            # Accumulate the metric
            self.metric.update(logits, y)

        dev_loss = (total_loss / len(dev_loader))
        dev_score = self.metric.accumulate()

        # Record the dev-set loss
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)

        return dev_score, dev_loss

    # Prediction phase: use 'torch.no_grad()' to avoid computing and storing gradients
    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Switch the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        model_state_dict = torch.load(model_path)
        self.model.load_state_dict(model_state_dict)
We instantiate the RunnerV3 class and pass in the training configuration. The code is implemented as follows:
import torch.optim as opt
import torch.nn.functional as F

lr = 0.2
# Define the network
model = fnn_model
# Define the optimizer
optimizer = opt.SGD(model.parameters(), lr=lr)
# Define the loss function: softmax + cross-entropy
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
runner = RunnerV3(model, optimizer, loss_fn, metric)
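F.cross_entropy combines log-softmax and the negative log-likelihood loss in a single call, which is why the model can output raw logits without a softmax layer. A quick illustrative check of this equivalence (not part of the original experiment):

# Illustrative: cross_entropy(logits, y) equals nll_loss(log_softmax(logits), y)
_logits = torch.randn(4, 3)
_y = torch.tensor([0, 2, 1, 1])
print(torch.allclose(F.cross_entropy(_logits, _y),
                     F.nll_loss(F.log_softmax(_logits, dim=1), _y)))  # True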
We train the model with the training and development sets for 150 epochs, saving the model with the highest accuracy as the best model. The code is implemented as follows:
# Start training
log_steps = 100
eval_steps = 50
runner.train(train_loader, dev_loader,
             num_epochs=150, log_steps=log_steps, eval_steps=eval_steps,
             save_path="best_model.pdparams")
Output:
[Train] epoch: 0/150, step: 0/1200, loss: 1.09898
[Evaluate] dev score: 0.33333, dev loss: 1.09582
[Evaluate] best accuracy performance has been updated: 0.00000 --> 0.33333
[Train] epoch: 12/150, step: 100/1200, loss: 1.13891
[Evaluate] dev score: 0.46667, dev loss: 1.10749
[Evaluate] best accuracy performance has been updated: 0.33333 --> 0.46667
[Evaluate] dev score: 0.20000, dev loss: 1.10089
[Train] epoch: 25/150, step: 200/1200, loss: 1.10158
[Evaluate] dev score: 0.20000, dev loss: 1.12477
[Evaluate] dev score: 0.46667, dev loss: 1.09090
[Train] epoch: 37/150, step: 300/1200, loss: 1.09982
[Evaluate] dev score: 0.46667, dev loss: 1.07537
[Evaluate] dev score: 0.53333, dev loss: 1.04453
[Evaluate] best accuracy performance has been updated: 0.46667 --> 0.53333
[Train] epoch: 50/150, step: 400/1200, loss: 1.01054
[Evaluate] dev score: 1.00000, dev loss: 1.00635
[Evaluate] best accuracy performance has been updated: 0.53333 --> 1.00000
[Evaluate] dev score: 0.86667, dev loss: 0.86850
[Train] epoch: 62/150, step: 500/1200, loss: 0.63702
[Evaluate] dev score: 0.80000, dev loss: 0.66986
[Evaluate] dev score: 0.86667, dev loss: 0.57089
[Train] epoch: 75/150, step: 600/1200, loss: 0.56490
[Evaluate] dev score: 0.93333, dev loss: 0.52392
[Evaluate] dev score: 0.86667, dev loss: 0.45410
[Train] epoch: 87/150, step: 700/1200, loss: 0.41929
[Evaluate] dev score: 0.86667, dev loss: 0.46156
[Evaluate] dev score: 0.93333, dev loss: 0.41593
[Train] epoch: 100/150, step: 800/1200, loss: 0.41047
[Evaluate] dev score: 0.93333, dev loss: 0.40600
[Evaluate] dev score: 0.93333, dev loss: 0.37672
[Train] epoch: 112/150, step: 900/1200, loss: 0.42777
[Evaluate] dev score: 0.93333, dev loss: 0.34534
[Evaluate] dev score: 0.93333, dev loss: 0.33552
[Train] epoch: 125/150, step: 1000/1200, loss: 0.30734
[Evaluate] dev score: 0.93333, dev loss: 0.31958
[Evaluate] dev score: 0.93333, dev loss: 0.32091
[Train] epoch: 137/150, step: 1100/1200, loss: 0.28321
[Evaluate] dev score: 0.93333, dev loss: 0.28383
[Evaluate] dev score: 0.93333, dev loss: 0.27171
[Evaluate] dev score: 0.93333, dev loss: 0.25447
[Train] Training done!
We visualize how the training-set and development-set losses change, along with the development-set accuracy.
import matplotlib.pyplot as plt

# Plot the training/dev loss curves and the dev accuracy curve
def plot_training_loss_acc(runner, fig_name,
                           fig_size=(16, 6),
                           sample_step=20,
                           loss_legend_loc="upper right",
                           acc_legend_loc="lower right",
                           train_color="#e4007f",
                           dev_color='#f19ec2',
                           fontsize='large',
                           train_linestyle="-",
                           dev_linestyle='--'):
    plt.figure(figsize=fig_size)

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::sample_step]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color=train_color, linestyle=train_linestyle, label="Train loss")
    if len(runner.dev_losses) > 0:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color=dev_color, linestyle=dev_linestyle, label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize=fontsize)
    plt.xlabel("step", fontsize=fontsize)
    plt.legend(loc=loss_legend_loc, fontsize='x-large')

    # Plot the dev accuracy curve
    if len(runner.dev_scores) > 0:
        plt.subplot(1, 2, 2)
        plt.plot(dev_steps, runner.dev_scores,
                 color=dev_color, linestyle=dev_linestyle, label="Dev accuracy")
        # Axis labels and legend
        plt.ylabel("score", fontsize=fontsize)
        plt.xlabel("step", fontsize=fontsize)
        plt.legend(loc=acc_legend_loc, fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()

plot_training_loss_acc(runner, 'fw-loss.pdf')
We evaluate the best model saved during training on the test set, observing its accuracy and loss. The code is implemented as follows:
# Load the best model
runner.load_model('best_model.pdparams')
# Evaluate the model
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))
Output:
[Test] accuracy/loss: 1.0000/1.0183
Similarly, we can use the saved model to predict a single sample from the test set and inspect the result. The code is implemented as follows:
# Fetch the first batch of the test set and take its first sample
X, label = next(iter(test_loader))
logits = runner.predict(X)
pred_class = torch.argmax(logits[0]).item()
label = label[0].item()
# Print the true class and the predicted class
print("The true category is {} and the predicted category is {}".format(label, pred_class))
Output:
The true category is 2 and the predicted category is 2
We now compare Softmax classification with feedforward-network classification.
In the hands-on part of Experiment 4 we already obtained the training and development loss curves of the Softmax classifier, as shown in the figure below.
The loss results of the feedforward network after 150 training epochs are as follows:
[Train] epoch: 0/150, step: 0/1200, loss: 1.09898
[Evaluate] dev score: 0.33333, dev loss: 1.09582
[Evaluate] best accuracy performance has been updated: 0.00000 --> 0.33333
[Train] epoch: 12/150, step: 100/1200, loss: 1.13891
···
[Train] epoch: 137/150, step: 1100/1200, loss: 0.28321
[Evaluate] dev score: 0.93333, dev loss: 0.28383
[Evaluate] dev score: 0.93333, dev loss: 0.27171
[Evaluate] dev score: 0.93333, dev loss: 0.25447
[Train] Training done!
The results show that the feedforward neural network trains better: its loss decreases faster.
Next, we compare the classification performance of SVM and FNN and give our own view.
The SVM code is implemented as follows:
from math import exp        # math utilities
from random import shuffle  # random shuffling
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

def load_data(filename):
    data_row = []
    with open(filename, 'r') as f:
        for line in f.readlines():
            line = line.split()
            current_line = []
            for i in range(len(line)):
                if i != len(line) - 1:
                    current_line.append(float(line[i]))
                else:
                    # Encode each class label as a pair of binary labels
                    if line[i] == '1':
                        current_line.append([1, 1])
                    elif line[i] == '2':
                        current_line.append([-1, 1])
                    else:
                        current_line.append([-1, -1])
            data_row.append(current_line)
    data_colomn = []
    for i in range(len(data_row[0])):
        line = [data_row[j][i] for j in range(len(data_row))]
        data_colomn.append(line)
    return data_colomn

data = load_data('Iris.txt')
def W(zhichi, xy, a):  # compute the updated w
    w = [0, 0]
    if len(zhichi) == 0:  # initial zero
        return w
    for i in zhichi:
        w[0] += a[i] * xy[0][i] * xy[2][i]  # update w
        w[1] += a[i] * xy[1][i] * xy[2][i]
    return w

def B(zhichi, xy, a):  # compute the updated b
    b = 0
    if len(zhichi) == 0:  # initial zero
        return 0
    for s in zhichi:  # every support vector satisfies ys*f(xs)=1; average over all support vectors
        sum = 0
        for i in zhichi:
            sum += a[i] * xy[2][i] * (xy[0][i] * xy[0][s] + xy[1][i] * xy[1][s])
        b += 1 / xy[2][s] - sum
    return b / len(zhichi)
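The B function implements the usual recovery of the bias from the support vectors: each support vector $x_s$ satisfies $y_s f(x_s) = 1$, so each gives an estimate of $b$, and the estimates are averaged for stability. In formula form (matching the code, where $1/y_s = y_s$ because $y_s \in \{-1,+1\}$):

$$b = \frac{1}{|S|} \sum_{s \in S} \left( \frac{1}{y_s} - \sum_{i \in S} a_i y_i \langle \boldsymbol{x}_i, \boldsymbol{x}_s \rangle \right)$$

Note that both W and B use the plain inner product, so for the polynomial and Gaussian kernels below they only give a linear approximation of the true decision function.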
def SMO(xy, m):
    a = [0.0] * len(xy[0])  # Lagrange multipliers
    zhichi = set()          # indices of the support vectors
    loop = 1                # loop flag (KKT satisfied)
    w = [0, 0]              # initialize w
    b = 0                   # initialize b
    while loop:
        loop += 1
        if loop == 150:
            print("Early-stopping criterion reached")
            print("Looped", loop, "times")
            loop = 0
            break
        # Initialization =========================================
        fx = []   # store all fx values
        yfx = []  # store all yfx-1 values
        Ek = []   # Ek, record fx-y for the heuristic search
        E_ = -1   # store the largest deviation, to save computation
        a1 = 0    # SMO a1
        a2 = 0    # SMO a2
        # End of initialization ==================================
        # Search for a1, a2 ======================================
        for i in range(len(xy[0])):  # compute all fx, yfx-1 and Ek
            fx.append(w[0] * xy[0][i] + w[1] * xy[1][i] + b)  # fx = wx + b
            yfx.append(xy[2][i] * fx[i] - 1)                  # yfx - 1
            Ek.append(fx[i] - xy[2][i])                       # fx - y
            if i in zhichi:  # skip indices already selected, to avoid picking the same a again
                continue
            if yfx[i] <= yfx[a1]:
                a1 = i  # index with the largest violation (smallest value)
        if yfx[a1] >= 0:  # even the smallest one satisfies KKT
            print("Looped a total of", loop, "times")
            loop = 0  # clear the loop flag (KKT satisfied)
            break
        for i in range(len(xy[0])):  # search for the a2 with the largest gap
            if i == a1:  # skip a1 itself
                continue
            Ei = abs(Ek[i] - Ek[a1])  # |Eki - Eka1|
            if Ei > E_:  # keep the largest deviation
                E_ = Ei  # store the deviation value
                a2 = i   # store the deviation index
        # End of search for a1, a2 ===============================
        zhichi.add(a1)  # add a1 to the support vectors
        zhichi.add(a2)  # add a2 to the support vectors
        # Analyze the constraints ================================
        # c = a1*y1 + a2*y2
        c = a[a1] * xy[2][a1] + a[a2] * xy[2][a2]  # compute c
        # n = K11 + K22 - 2*K12
        if m == 1:    # linear kernel
            n = xy[0][a1] ** 2 + xy[1][a1] ** 2 + xy[0][a2] ** 2 + xy[1][a2] ** 2 - 2 * (
                    xy[0][a1] * xy[0][a2] + xy[1][a1] * xy[1][a2])
        elif m == 2:  # polynomial kernel (quadratic here)
            n = (xy[0][a1] ** 2 + xy[1][a1] ** 2) ** 2 + (xy[0][a2] ** 2 + xy[1][a2] ** 2) ** 2 - 2 * (
                    xy[0][a1] * xy[0][a2] + xy[1][a1] * xy[1][a2]) ** 2
        elif m == 3:  # Gaussian kernel, taking 2*sigma^2 = 1
            n = 2 * exp(-1) - 2 * exp(-((xy[0][a1] - xy[0][a2]) ** 2 + (xy[1][a1] - xy[1][a2]) ** 2))
        # Determine the feasible interval of a1 ==================
        if xy[2][a1] == xy[2][a2]:
            L = max(0.0, a[a1] + a[a2] - 0.5)  # lower bound
            H = min(0.5, a[a1] + a[a2])        # upper bound
        else:
            L = max(0.0, a[a1] - a[a2])        # lower bound
            H = min(0.5, 0.5 + a[a1] - a[a2])  # upper bound
        if n > 0:
            a1_New = a[a1] - xy[2][a1] * (Ek[a1] - Ek[a2]) / n  # a1_New = a1_old - y1*(e1-e2)/n
            # print("x1=",xy[0][a1],"y1=",xy[1][a1],"z1=",xy[2][a1],"x2=",xy[0][a2],"y2=",xy[1][a2],"z2=",xy[2][a2],"a1_New=",a1_New)
            # Clip to the feasible interval ======================================================
            if a1_New >= H:
                a1_New = H
            elif a1_New <= L:
                a1_New = L
        else:
            a1_New = min(H, L)
        # Parameter updates ======================================
        a[a2] = a[a2] + xy[2][a1] * xy[2][a2] * (a[a1] - a1_New)  # update a2
        a[a1] = a1_New                                            # update a1
        w = W(zhichi, xy, a)  # update w
        b = B(zhichi, xy, a)  # update b
        # print("W=", w, "b=", b, "zhichi=", zhichi, "a1=", a[a1], "a2=", a[a2])
    # Mark the support vectors ===================================
    for i in zhichi:
        if a[i] == 0:  # selected, but its multiplier stayed zero
            loop = loop + 1
            e = 'silver'
        else:
            if xy[2][i] == 1:
                e = 'b'
            else:
                e = 'r'
        plt.scatter(x1[0][i], x1[1][i], c='none', s=100, linewidths=1, edgecolor=e)
    print("Number of support vectors:", len(zhichi), "\nSupport vectors with a equal to zero:", loop)
    print("Number of effective vectors:", len(zhichi) - loop)
    # Return w and b =============================================
    return [w, b]
def Def(xyz, w_b1, w_b2):
    c = 0
    for i in range(len(xyz[0])):
        # A sample is wrong if either hyperplane gives the wrong sign
        if (xyz[0][i] * w_b1[0][0] + xyz[1][i] * w_b1[0][1] + w_b1[1]) * xyz[2][i][0] < 0:
            c = c + 1
            continue
        if (xyz[0][i] * w_b2[0][0] + xyz[1][i] * w_b2[0][1] + w_b2[1]) * xyz[2][i][1] < 0:
            c = c + 1
            continue
    return (1 - c / len(xyz[0])) * 100
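The pair encoding [1,1] / [-1,1] / [-1,-1] turns the 3-class problem into two binary SVMs: the first hyperplane separates class 1 from the rest, the second separates classes 1 and 2 from class 3; Def counts a sample as correct only when both signs match its encoding. A sketch of the corresponding decoding rule (decode is a hypothetical helper, not in the original code):

def decode(sign1, sign2):
    """Map the two hyperplane signs back to an iris class (illustrative only)."""
    if sign1 > 0:
        return 1  # [ 1,  1] -> class 1
    return 2 if sign2 > 0 else 3  # [-1,  1] -> class 2; [-1, -1] -> class 3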
# Choose the attributes ===================================================
Attribute1 = int(input("Choose the first attribute (enter an integer from 0 to 4): "))
Attribute2 = int(input("Choose the second attribute (enter an integer from 0 to 4): "))
# Build the datasets ======================================================
lt = list(range(150))  # an ordered index sequence
shuffle(lt)            # shuffle the sequence
x1 = [[], [], []]      # initialize x1 (training set)
x2 = [[], [], []]      # initialize x2 (test set)
for i in lt[0:100]:    # take part of the data as the training set
    x1[0].append(data[Attribute1][i])  # x attribute
    x1[1].append(data[Attribute2][i])  # y attribute
    x1[2].append(data[4][i])           # class label
for i in lt[100:150]:  # take the rest as the test set
    x2[0].append(data[Attribute1][i])  # x attribute
    x2[1].append(data[Attribute2][i])  # y attribute
    x2[2].append(data[4][i])           # class label

print('\nTraining started')
def Plot(x1, x2, wb1, wb2, m):
    x = [x1[0][:], x1[1][:], x1[2][:]]
    for i in range(len(x[2])):  # 'color' the training set
        if x[2][i] == [1, 1]:
            x[2][i] = 'r'       # training set [ 1,  1]: red
        elif x[2][i] == [-1, 1]:
            x[2][i] = 'g'       # training set [-1,  1]: green
        else:
            x[2][i] = 'b'       # training set [-1, -1]: blue
    plt.scatter(x[0], x[1], c=x[2], alpha=0.8)  # plot the training set
    x = [x2[0][:], x2[1][:], x2[2][:]]
    for i in range(len(x[2])):  # 'color' the test set
        if x[2][i] == [1, 1]:
            x[2][i] = 'orange'  # test set [ 1,  1]: orange
        elif x[2][i] == [-1, 1]:
            x[2][i] = 'y'       # test set [-1,  1]: yellow
        else:
            x[2][i] = 'm'       # test set [-1, -1]: magenta
    plt.scatter(x[0], x[1], c=x[2], alpha=0.8)  # plot the test set
    plt.xlabel('x')  # x-axis label
    plt.ylabel('y')  # y-axis label
    # The titles are rendered in Chinese, so load a Chinese font
    font = FontProperties(fname=r"C:\windows\fonts\simsun.ttc", size=16)
    if m == 1:
        plt.title('线性核', fontproperties=font)    # title: linear kernel
    elif m == 2:
        plt.title('多项式核', fontproperties=font)  # title: polynomial kernel
    elif m == 3:
        plt.title('高斯核', fontproperties=font)    # title: Gaussian kernel
    xl = np.arange(min(x[0]), max(x[0]), 0.1)  # draw the first separating line
    yl = (-wb1[0][0] * xl - wb1[1]) / wb1[0][1]
    plt.plot(xl, yl, 'r')
    xl = np.arange(min(x[0]), max(x[0]), 0.1)  # draw the second separating line
    yl = (-wb2[0][0] * xl - wb2[1]) / wb2[0][1]
    plt.plot(xl, yl, 'b')
for m in range(1, 4):
    if m == 1:
        print('\nTraining with the linear kernel')
    elif m == 2:
        print('\nTraining with the polynomial kernel')
    elif m == 3:
        print('\nTraining with the Gaussian kernel')
    # Compute w and b ====================================
    plt.figure(m)  # the m-th figure
    x = [x1[0][:], x1[1][:], []]  # first classification
    for i in x1[2]:
        x[2].append(i[0])  # attach the first binary label
    wb1 = SMO(x, m)
    x = [x1[0][:], x1[1][:], []]  # second classification
    for i in x1[2]:
        x[2].append(i[1])  # attach the second binary label
    wb2 = SMO(x, m)
    print("w1 =", wb1[0], "\nb1 =", wb1[1])
    print("w2 =", wb2[0], "\nb2 =", wb2[1])
    # Compute accuracy ===================================
    print("Accuracy on the training set:", Def(x1, wb1, wb2), "%")
    print("Accuracy on the test set:", Def(x2, wb1, wb2), "%")
    # Plot ===============================================
    # Circled points were selected at some time; silver ones were selected but their multiplier stayed zero
    Plot(x1, x2, wb1, wb2, m)

plt.show()  # show all figures
Output:
Choose the first attribute (enter an integer from 0 to 4):
(enter 1)
Choose the second attribute (enter an integer from 0 to 4):
(enter 3)
Training started
Training with the linear kernel
Looped a total of 29 times
Number of support vectors: 28
Support vectors with a equal to zero: 17
Number of effective vectors: 11
Looped a total of 65 times
Number of support vectors: 64
Support vectors with a equal to zero: 40
Number of effective vectors: 24
w1 = [0.6435118046798376, -1.561166670558747]
b1 = -0.7437199069208986
w2 = [0.010546446427642797, -1.0720509834243026]
b2 = 1.8074665437926425
Accuracy on the training set: 97.0 %
Accuracy on the test set: 94.0 %
Training with the polynomial kernel
Early-stopping criterion reached
Looped 150 times
Number of support vectors: 67
Support vectors with a equal to zero: 5
Number of effective vectors: 62
Early-stopping criterion reached
Looped 150 times
Number of support vectors: 86
Support vectors with a equal to zero: 22
Number of effective vectors: 64
w1 = [-1.086168298395296, -1.1564902479126118]
b1 = 4.365290406790221
w2 = [0.003945943056392308, -0.4353856147398717]
b2 = 0.844606561270203
Accuracy on the training set: 74.0 %
Accuracy on the test set: 54.0 %
Training with the Gaussian kernel
Early-stopping criterion reached
Looped 150 times
Number of support vectors: 49
Support vectors with a equal to zero: 45
Number of effective vectors: 4
Looped a total of 65 times
Number of support vectors: 64
Support vectors with a equal to zero: 46
Number of effective vectors: 18
w1 = [0.34999999999999987, -0.95]
b1 = -0.3311224489795913
w2 = [0.05000000000000093, -1.1759635443845813]
b2 = 1.8659604131108798
Accuracy on the training set: 98.0 %
Accuracy on the test set: 94.0 %
Running the code produces the figures shown below.
The biggest selling points of support vector machines are that they need little training data, require relatively little hyperparameter tuning, generalize well, and easily reach a global optimum.
Neural networks, in contrast, generally need far more training data and may get stuck in local optima.
This is also why neural networks entered their second winter: support vector machines were simply more economical.
Whatever method we train, we keep coming back to classifying the Iris dataset.
**Training methods differ in what they can do.** When learning this material, the convenience and speed of SVMs are genuinely attractive. But the weak performance of SVMs on tasks beyond binary classification also made me take another look at neural networks. To my mind, an SVM is like a feature phone and a neural network like a smartphone: even if you have to open a phone app just to make a call, more complete functionality is what we are ultimately after.
What a feedforward neural network cannot do without are its neurons and their activation functions (the Logistic/Sigmoid function, the ReLU function and its variants, and so on); its pipeline also relies on the backpropagation algorithm and automatic gradient computation. On nonlinear problems the advantages of feedforward networks are clear: they are not restricted to "black-or-white" linear decisions.