Evaluating Regression Models

Mean squared error (MSE): lower is better.

MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2
R²: the closer to 1, the better.

# load libraries
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# generate a features matrix and target vector
features, target = make_regression(n_samples=100,
                                   n_features=3,
                                   n_informative=3,
                                   n_targets=1,
                                   noise=50,
                                   coef=False,
                                   random_state=1)

# create a linear regression object
ols = LinearRegression()

# cross-validate the linear regression using (negative) MSE
cross_val_score(ols, features, target, scoring='neg_mean_squared_error')
array([-1718.22817783, -3103.4124284 , -1377.17858823])
Another common regression metric is the coefficient of determination, R²:
cross_val_score(ols, features, target, scoring='r2')
array([0.87804558, 0.76395862, 0.89154377])
Discussion
MSE is one of the most common evaluation metrics for regression models. Formally, MSE is:

MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2

where n is the number of observations, y_i is the true value of the target we are trying to predict for observation i, and ŷ_i is the model's predicted value for observation i.

MSE is the average of the squared distances between the predicted and the true values.
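To make the formula concrete, here is a minimal sketch (not part of the original recipe) that computes MSE by hand with NumPy and checks it against scikit-learn's mean_squared_error; the toy arrays are purely illustrative:

import numpy as np
from sklearn.metrics import mean_squared_error

# toy true and predicted values (illustrative numbers)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE by the formula: the mean of the squared residuals
mse_manual = np.mean((y_pred - y_true) ** 2)

# scikit-learn's implementation agrees
print(mse_manual, mean_squared_error(y_true, y_pred))  # 0.375 0.375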

The higher the value of MSE, the greater the total squared error and thus the worse the model. There are a number of mathematical benefits to squaring the error term, including that it forces all error values to be positive, but one often unrealized implication is that squaring penalizes a few large errors more than many small errors, even if the absolute values of the errors are the same.

For example, imagine two models, A and B, each with two observations:

Model A has errors of 0 and 10, so its summed squared error is 0² + 10² = 100 and its MSE is 100/2 = 50.
Model B has two errors of 5 each, so its summed squared error is 5² + 5² = 50 and its MSE is 50/2 = 25.
Both models have the same total absolute error, 10; however, MSE considers Model A (MSE = 50) worse than Model B (MSE = 25). In practice this implication is rarely an issue (and indeed can be theoretically beneficial), and MSE works perfectly well as an evaluation metric.
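A quick NumPy check of the toy comparison above (a sketch; the error values come straight from the example):

import numpy as np

# errors from the two toy models above
errors_a = np.array([0.0, 10.0])  # Model A: one perfect prediction, one large miss
errors_b = np.array([5.0, 5.0])   # Model B: two moderate misses

print(np.abs(errors_a).sum(), np.abs(errors_b).sum())  # total absolute error: 10.0 10.0
print((errors_a ** 2).mean(), (errors_b ** 2).mean())  # MSE: 50.0 25.0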

One important note: by default, scikit-learn's scoring parameter assumes that higher values are better than lower values. This is not the case for MSE, where higher values mean a worse model. For this reason, scikit-learn reports the negated MSE via the neg_mean_squared_error argument.
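If you prefer to read the scores on the usual positive MSE scale, you can simply negate what cross_val_score returns. A small self-contained sketch, reusing the same data-generation settings as the recipe above:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# same synthetic data as in the recipe above
features, target = make_regression(n_samples=100, n_features=3,
                                   n_informative=3, noise=50, random_state=1)
ols = LinearRegression()

# scoring='neg_mean_squared_error' returns negated MSE, so flip the sign
mse_scores = -cross_val_score(ols, features, target,
                              scoring='neg_mean_squared_error')
print(mse_scores)         # per-fold MSE, now positive
print(mse_scores.mean())  # average MSE across folds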

A common alternative regression evaluation metric is R², which measures the amount of variance in the target vector that is explained by the model:
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
where y_i is the true target value of the ith observation, ŷ_i is the predicted value for the ith observation, and ȳ is the mean value of the target vector.

The closer R² is to 1.0, the better the model.
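As with MSE, the R² formula can be verified by hand; a minimal sketch comparing a manual computation against scikit-learn's r2_score (the toy numbers are illustrative only):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# residual sum of squares divided by total sum of squares
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both approximately 0.9486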
