[Machine Learning] Bias and Variance

http://scott.fortmann-roe.com/docs/BiasVariance.html

bias-variance tradeoff.
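
For reference, the tradeoff is commonly formalized as a decomposition of the expected squared prediction error at a point x. The notation below is an assumption for illustration, not taken from this post: the target is y = f(x) + ε with zero-mean noise of variance σ^2, \hat{f} is the fitted model, and the expectation is over training sets and noise.

```latex
\mathrm{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathrm{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{E}\big[(\hat{f}(x) - \mathrm{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Making the model more complex typically lowers the first term and raises the second, which is why the two have to be traded off against each other.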

  • High bias means the model is too simple to capture the relationship between the features and the target, so its estimates are inaccurate. A high-bias model usually has low complexity and scores poorly on both the training and the validation/test set (the score here is R^2, the coefficient of determination).
  • High variance means that models of the same complexity, trained on similar data points, differ significantly from one another; the model is too complex to be consistent. A high-variance model usually has 1) very low training error but high test error, and 2) a large gap between the training and validation curves, as shown in the figure below.
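  • With a decision tree model:
  • At max-depth=1, the Training Score and the Validation Score are both low. This is high bias: the model is too simple, so it performs poorly on the training set and on the validation set alike.
  • At max-depth=10, the gap between the Training Score and the Validation Score is large and the Training Score is very high: the model does very well on the training set but poorly on the test set. This is high variance.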
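  • The Score on the vertical axis is the coefficient of determination, R^2.
  • R^2 normally lies in [0, 1]. It can be read as the squared correlation between the predicted and actual values of the target variable, that is, the fraction of the variation in the target that the model's features explain.
  • R^2 can also be negative; in that case the predictions are worse than simply predicting the mean of the target variable.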
  • For the exact formula, see this other blog post: http://blog.csdn.net/duxinyuhi/article/details/52233993
  [Figure 1: Training Score and Validation Score curves (vertical axis: R^2) for decision trees with max-depth=1 and max-depth=10]
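
The max-depth comparison and the R^2 score described above can be tried out with a small sketch. This is not the setup behind the figure: it assumes a toy one-dimensional regression tree written in plain Python and a synthetic dataset (a sine wave plus deterministic pseudo-noise). The names `r2_score`, `sse`, `fit_tree`, `predict` and `scores` are hypothetical helpers introduced only for this illustration.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def sse(ys):
    """Sum of squared deviations of ys from their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(xs, ys, max_depth):
    """Greedy 1-D regression tree.

    Returns a leaf (the mean of ys) or a tuple (threshold, left, right).
    """
    if max_depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: every distinct x except the largest, so that
    # both sides of the split are always non-empty.
    for thr in values[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, thr)
    thr = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return (
        thr,
        fit_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    )

def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree

# Synthetic data (an assumption for illustration): a sine wave plus a
# deterministic pseudo-noise term, so the run is reproducible.
N = 60
xs = [i / N for i in range(N)]
ys = [math.sin(4 * math.pi * x) + 0.4 * ((i * 37) % 11 - 5) / 5
      for i, x in enumerate(xs)]

# Even indices are used for training, odd indices for validation.
x_train, y_train = xs[0::2], ys[0::2]
x_val, y_val = xs[1::2], ys[1::2]

def scores(max_depth):
    """Return (training R^2, validation R^2) for a tree of the given depth."""
    tree = fit_tree(x_train, y_train, max_depth)
    train = r2_score(y_train, [predict(tree, x) for x in x_train])
    val = r2_score(y_val, [predict(tree, x) for x in x_val])
    return train, val

train_1, val_1 = scores(1)
train_10, val_10 = scores(10)
print("max_depth=1 : train R^2 = %.3f, validation R^2 = %.3f" % (train_1, val_1))
print("max_depth=10: train R^2 = %.3f, validation R^2 = %.3f" % (train_10, val_10))

# R^2 is negative when predictions are worse than predicting the mean.
print("reversed predictions:", r2_score([1, 2, 3], [3, 2, 1]))
```

On data like this, one would typically expect the depth-1 tree to score low on both sets (high bias) and the depth-10 tree to score close to 1 on the training set with a noticeably lower validation score (high variance), though the exact numbers depend on the data. In practice a library implementation would replace the toy tree.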

