r² or R²: When to Use What

Picture this: you are a stock analyst responsible for predicting Walmart’s stock price ahead of its quarterly earnings report. You are hard at work when your data scientist walks in and says they have discovered a little-known data stream of daily Walmart parking lot occupancy that seems well correlated with Walmart’s historical revenues. You are understandably excited. You ask them to use the parking lot data alongside other standard metrics in a machine learning model to forecast Walmart’s stock price.

So far so good.

The data scientist returns a few hours later, claiming that, after careful validation, the model’s predictions are strongly correlated with the true stock price. Do you accept the model without any further investigation?

I hope not.

Correlations are good for identifying patterns in data, but almost meaningless for quantifying a model’s performance, especially for complex models (like machine learning models). This is because a correlation only tells you whether two things move together (e.g., parking lot occupancy and Walmart’s stock price), not how closely they match each other (e.g., predicted and actual stock price). For that, model performance metrics like the coefficient of determination (R²) can help.
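
To make the difference concrete, here is a minimal sketch in plain Python. The prices are made up for illustration, the helper names `pearson_r` and `r_squared` are hypothetical, and R² is assumed to follow the common definition 1 - SS_res / SS_tot. The hypothetical model tracks every move in the price but is always 20 too high, so its correlation with the true price is perfect while its R² is far below zero.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up stock prices, for illustration only.
actual = [100.0, 102.0, 104.0, 106.0, 108.0]
# Hypothetical model: follows the trend exactly but is biased 20 too high.
predicted = [a + 20.0 for a in actual]

r = pearson_r(actual, predicted)
R2 = r_squared(actual, predicted)

print(f"r  = {r}")       # r  = 1.0
print(f"r² = {r ** 2}")  # r² = 1.0
print(f"R² = {R2}")      # R² = -49.0
```

A correlation of 1.0 says the predictions move with the price; the R² of -49 says that, as actual price forecasts, they are much worse than simply predicting the average price every day.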

In this article, we will learn:

  1. What are the correlation coefficient (r) and its square (r²)?

  2. What is the coefficient of determination (R²)?

  3. When to use each of the above?

1. Correlation coefficient: “How good is this predictor?”
