sklearn Machine Learning: Learning Curves

[The core of this post is drawn from the 'Mofan Python' (莫烦Python) tutorial series on Bilibili]

[Applies to Python 3; upgrading the relevant third-party libraries to their latest versions is recommended]

Two ways of plotting a learning curve are shown below:

'''learning_curve from sklearn.model_selection shows at a glance
how the model's performance changes as the training set grows, and whether it is overfitting.
We can then tune the model to reduce the overfitting.'''
%matplotlib inline 
from sklearn.model_selection import learning_curve
from sklearn.datasets import load_digits # built-in digits dataset
from sklearn.svm import SVC # support vector classifier
import matplotlib.pyplot as plt
import numpy as np
digits = load_digits()
X = digits.data
y = digits.target
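# Observe how the learning curve changes as the training set grows; the metric is mean squared error.
# Note: MSE treats the digit class labels as numbers, so it serves here only as a loss-style measure.
# Use 10-fold cross-validation; the training subsets are 0.1, 0.25, 0.5, 0.75 and 1 of the available training data.
train_sizes,train_loss,test_loss = learning_curve(SVC(gamma=0.001),X,y,
cv=10,scoring='neg_mean_squared_error',train_sizes=[0.1,0.25,0.5,0.75,1])# try tuning gamma
print(train_loss)# raw training scores: negative values, one row per entry in train_sizes, one column per fold
train_loss_mean = -np.mean(train_loss,axis=1)# average each row over the folds, then negate to get a positive loss
# Example: a = np.array([[1,2,3],[4,5,6]]); np.mean(a,axis=1) returns array([2., 5.])
print(train_loss_mean)# mean training loss per training size
test_loss_mean = -np.mean(test_loss,axis=1)# mean validation loss, negated so it is positive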
plt.figure()
plt.plot(train_sizes,train_loss_mean,'o-',color='r',label='Training')
plt.plot(train_sizes,test_loss_mean,'o-',color='g',label='Cross-validation')
plt.xlabel("Training example quantities")
plt.ylabel('Loss')
plt.legend(loc='best')# loc sets the legend location; 'best' lets matplotlib pick the spot that overlaps the plotted data least


# The second method: plot the learning curve with learning_curve's default score (accuracy for a classifier such as SVC)
train_sizes,train_scores,test_scores = learning_curve(SVC(gamma=0.001),X,y,
cv=10,train_sizes=[0.1,0.25,0.5,0.75,1])# try tuning gamma
print(train_scores)
train_scores_mean = np.mean(train_scores,axis=1)# these scores are already positive, so no negation is needed
print(train_scores_mean)
test_scores_mean = np.mean(test_scores,axis=1)
plt.figure()
plt.plot(train_sizes,train_scores_mean,'o-',color='r',label='Training')
plt.plot(train_sizes,test_scores_mean,'o-',color='g',label='Cross-validation')
plt.xlabel("Training example quantities")
plt.ylabel('Scores')
plt.legend(loc='best')

plt.show()
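
Both methods above reduce the per-fold scores to one mean per training size. The following is a minimal sketch of that step as a reusable helper, which also returns the standard deviation across folds. The function name `summarize_folds`, its `negate` flag, and the toy array are assumptions made for illustration; they are not part of the original script or of scikit-learn.

```python
import numpy as np

def summarize_folds(scores, negate=False):
    """Return (mean, std) across CV folds for each training size.

    scores: array of shape (n_train_sizes, n_folds), the layout that
            learning_curve uses for its train and test scores.
    negate: set True for 'neg_*' scorers to turn scores into positive losses.
    """
    scores = np.asarray(scores, dtype=float)
    if negate:
        scores = -scores
    return scores.mean(axis=1), scores.std(axis=1)

# Toy stand-in for learning_curve output (hypothetical numbers):
# 2 training sizes, 3 folds, scored with a 'neg_*' metric.
toy_neg_mse = np.array([[-0.0, -0.3, -0.3],
                        [-0.1, -0.1, -0.1]])
loss_mean, loss_std = summarize_folds(toy_neg_mse, negate=True)
print(loss_mean)  # one mean loss per training size
print(loss_std)   # spread across folds per training size
```

With the standard deviation available, the spread across folds could be drawn as a band around each curve, for example with `plt.fill_between(train_sizes, loss_mean - loss_std, loss_mean + loss_std, alpha=0.1)`.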

[Printed output:]

Output for neg_mean_squared_error (raw per-fold training scores, then the mean training loss per training size):
[[-0.         -0.09937888 -0.09937888 -0.09937888 -0.09937888 -0.09937888
  -0.09937888 -0.09937888 -0.09937888 -0.09937888]
 [-0.         -0.03970223 -0.03970223 -0.03970223 -0.03970223 -0.03970223
  -0.03970223 -0.03970223 -0.03970223 -0.03970223]
 [-0.         -0.01985112 -0.01985112 -0.01985112 -0.01985112 -0.01985112
  -0.01985112 -0.01985112 -0.01985112 -0.01985112]
 [-0.         -0.0165426  -0.01323408 -0.01323408 -0.01323408 -0.01323408
  -0.01323408 -0.01323408 -0.01323408 -0.01323408]
 [-0.02233251 -0.03225806 -0.01054591 -0.03225806 -0.03225806 -0.03225806
  -0.03225806 -0.03225806 -0.03225806 -0.00992556]]
[0.08944099 0.03573201 0.017866   0.01224152 0.02686104]

Output for the default score, accuracy (raw per-fold training scores, then the mean per training size):
[[1.         0.99378882 0.99378882 0.99378882 0.99378882 0.99378882
  0.99378882 0.99378882 0.99378882 0.99378882]
 [1.         0.99751861 0.99751861 0.99751861 0.99751861 0.99751861
  0.99751861 0.99751861 0.99751861 0.99751861]
 [1.         0.99875931 0.99875931 0.99875931 0.99875931 0.99875931
  0.99875931 0.99875931 0.99875931 0.99875931]
 [1.         0.99834574 0.99917287 0.99917287 0.99917287 0.99917287
  0.99917287 0.99917287 0.99917287 0.99917287]
 [0.99937965 0.99875931 0.99875931 0.99875931 0.99875931 0.99875931
  0.99875931 0.99875931 0.99875931 0.99937965]]
[0.99440994 0.99776675 0.99888337 0.99917287 0.99888337]
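
As a quick check on how the printed means follow from the per-fold rows, the sketch below recomputes the first entry of each mean vector from the first printed row. The last part is an inference from the numbers, not something the original states: the values 0.09937888 and 0.99378882 are consistent with one misclassified sample out of 161 whose label is off by 4, and `n_train = 161` is an assumed subset size.

```python
from statistics import fmean

# First printed row of each output: one fold scored 0 / 1, the other nine are identical.
neg_mse_row = [-0.0] + [-0.09937888] * 9
acc_row = [1.0] + [0.99378882] * 9

mean_loss = -fmean(neg_mse_row)  # negate, as the script does for neg_mean_squared_error
mean_acc = fmean(acc_row)        # accuracy needs no sign change
print(round(mean_loss, 8), round(mean_acc, 8))

# Assumption: the smallest training subset holds 161 samples and exactly one
# of them is misclassified, with a label difference of 4.
n_train = 161
mse_one_error = 4 ** 2 / n_train  # squared label difference averaged over the subset
acc_one_error = 1 - 1 / n_train   # one wrong prediction out of n_train
print(round(mse_one_error, 8), round(acc_one_error, 8))
```

If that reading is right, it illustrates why the loss curve depends on how far apart the confused digit labels are, while the accuracy curve only counts the errors.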

[Plots produced by the two methods:]

[Figure 1: learning curve plotted with the neg_mean_squared_error loss]

[Figure 2: learning curve plotted with the default score]
