Linear Regression Implementation in C++
Linear regression models the relation between an explanatory (independent) variable and a scalar response (dependent) variable by fitting a linear equation.
For example, one could model individuals' weights as a function of their heights using a linear equation.
Before modeling the relationship in the observed data, you should first determine whether the variables are linearly related at all. A scatter plot is usually a helpful tool for viewing the relationship between them.
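A scatter plot is a visual check. The original article does not use a numeric complement, but the Pearson correlation coefficient r is one option. It summarizes how strongly two variables are linearly related: values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. Below is a minimal sketch; the function name `pearson_correlation` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient r between x and y (hypothetical helper,
// not part of the original article). Requires equal-length inputs with n >= 2.
double pearson_correlation(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    // Means of both variables.
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Sums of mean-centered products and squares.
    double ss_xy = 0.0, ss_xx = 0.0, ss_yy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        ss_xy += dx * dy;
        ss_xx += dx * dx;
        ss_yy += dy * dy;
    }
    // Undefined (division by zero) if either variable is constant.
    return ss_xy / std::sqrt(ss_xx * ss_yy);
}
```

For perfectly linear data such as x = {1, 2, 3, 4} and y = {3, 5, 7, 9}, it returns 1.0. An r near zero only rules out a linear relationship, not every relationship, so the scatter plot is still worth looking at.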
[Image: linear regression example. Source: https://commons.wikimedia.org/w/index.php?curid=11967659]
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
In this article, we will implement the simple linear regression model. Simple linear regression deals with two-dimensional sample points, with one independent variable and one dependent variable. It finds a linear function that predicts the dependent variable from the independent variable.
When you perform a simple linear regression (or any other type of regression analysis), you get a line of best fit. The data points usually do not lie exactly on this regression line; they are scattered around it.
A residual is the signed vertical distance between a data point and the regression line (the observed value minus the value predicted by the line). Each data point has one residual. It is positive if the point is above the regression line and negative if it is below. If the regression line passes through the point, the residual at that point is zero.
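The residual definition can be made concrete with a short sketch. It assumes a line with intercept a and slope b has already been fitted, so each residual is e_i = y_i - (a + b * x_i). The helper name `residuals` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Residuals e_i = y_i - (a + b * x_i) for an already-fitted line Y = a + bX
// (hypothetical helper). Positive: point above the line; negative: below;
// zero: the line passes through the point.
std::vector<double> residuals(const std::vector<double>& x, const std::vector<double>& y,
                              double a, double b) {
    assert(x.size() == y.size());
    std::vector<double> r;
    r.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        r.push_back(y[i] - (a + b * x[i]));
    }
    return r;
}
```

Take the line Y = 1 + 2X and the points (0, 2), (1, 3) and (2, 4). The function returns {1, 0, -1}: the first point lies above the line, the second on it, and the third below it.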
The main problem is to minimize the total residual error (for ordinary least squares, the sum of the squared residuals) to find the line of best fit. If you need more explanation of the theory behind the following equations, I recommend reading this article:
Without going into details, the equations we should use are as follows; for simplicity, we can split them into the following parts:
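For reference, the standard ordinary least squares formulas for simple linear regression are shown below. They use the names from the step-by-step list in section 1, with a = B_0 and b = B_1 in Y = a + bX:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i

SS_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}

SS_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2

b = B_1 = \frac{SS_{XY}}{SS_{XX}}, \qquad a = B_0 = \bar{y} - B_1\,\bar{x}
```

These values of a and b minimize the sum of squared residuals over all possible lines.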
Now we can walk through the implementation of linear regression.
1- Calculate the coefficients:
The first step is to implement the function that calculates the coefficients.
As the expected form of the equation is Y = a + bX, we need to calculate a and b (called B_0 and B_1 in the steps below) according to the relations mentioned above.
1- Calculate the mean of the dependent variable and the mean of the independent variable.
2- Calculate SS_XY, the sum of the element-wise products of the mean-centered dependent variable vector and the mean-centered independent variable vector, Σ(x_i - x̄)(y_i - ȳ). Equivalently, this is Σx_i·y_i - n·x̄·ȳ.
3- Calculate SS_XX, the sum of the element-wise products of the mean-centered independent variable vector with itself, Σ(x_i - x̄)². Equivalently, this is Σx_i² - n·x̄².
4- Calculate the B_1 coefficient (the slope b) by dividing SS_XY by SS_XX.
5- Calculate the B_0 coefficient (the intercept a) as B_0 = ȳ - B_1·x̄, using the means from step 1.
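The five steps can be sketched in C++ as follows. The sketch assumes the data arrive as two std::vector<T> of equal length. The function name `estimate_coefficients` and its return type (a std::pair holding B_0, then B_1) are assumptions, not the author's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Estimate (B_0, B_1) for Y = B_0 + B_1 * X by ordinary least squares.
// Hypothetical name and signature; returns {intercept, slope}.
template <typename T>
std::pair<T, T> estimate_coefficients(const std::vector<T>& x, const std::vector<T>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const T n = static_cast<T>(x.size());

    // Step 1: means of the independent (x) and dependent (y) variables.
    T mean_x = 0, mean_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Steps 2 and 3: SS_XY and SS_XX from mean-centered values.
    T ss_xy = 0, ss_xx = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        ss_xy += (x[i] - mean_x) * (y[i] - mean_y);
        ss_xx += (x[i] - mean_x) * (x[i] - mean_x);
    }

    // Step 4: slope (undefined if all x values are equal, i.e. ss_xx == 0).
    const T b_1 = ss_xy / ss_xx;
    // Step 5: intercept.
    const T b_0 = mean_y - b_1 * mean_x;
    return {b_0, b_1};
}
```

For example, `estimate_coefficients<double>({1.0, 2.0, 3.0, 4.0}, {3.0, 5.0, 7.0, 9.0})` returns {1.0, 2.0}, i.e. Y = 1 + 2X. Summing mean-centered products is also numerically safer than subtracting n·x̄·ȳ from the raw sum of products when the values are large relative to their spread.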
[Image: Estimate coefficient API]
2- Implementing the Class:
We only need to learn two private member variables: the coefficients.
The Fit API takes the dataset as vectors of the independent and dependent variables. It estimates the coefficients from these vectors and stores the learned coefficients in the private variables.
The remaining part is the Predict API. It takes a value of the independent variable and returns the estimate obtained by applying the linear regression equation.
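A minimal sketch of such a class, assuming a template parameter T for the numeric type. The names `LinearRegression`, `fit`, `predict`, `intercept` and `slope` are illustrative assumptions, not necessarily the author's. The coefficients are private, as described above; the read-only accessors are an addition of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple linear regression model Y = b_0 + b_1 * X (illustrative names).
template <typename T>
class LinearRegression {
public:
    // Fit: estimate the coefficients from the independent (x) and dependent (y)
    // vectors by ordinary least squares and store them in the private members.
    void fit(const std::vector<T>& x, const std::vector<T>& y) {
        assert(x.size() == y.size() && x.size() >= 2);
        const T n = static_cast<T>(x.size());
        T mean_x = 0, mean_y = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            mean_x += x[i];
            mean_y += y[i];
        }
        mean_x /= n;
        mean_y /= n;

        T ss_xy = 0, ss_xx = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            ss_xy += (x[i] - mean_x) * (y[i] - mean_y);
            ss_xx += (x[i] - mean_x) * (x[i] - mean_x);
        }
        b_1_ = ss_xy / ss_xx;  // slope
        b_0_ = mean_y - b_1_ * mean_x;  // intercept
    }

    // Predict: apply the fitted line to one value of the independent variable.
    T predict(T x) const { return b_0_ + b_1_ * x; }

    // Read-only accessors for inspecting the learned coefficients.
    T intercept() const { return b_0_; }
    T slope() const { return b_1_; }

private:
    T b_0_ = 0;  // intercept
    T b_1_ = 0;  // slope
};
```

Usage looks like `LinearRegression<float> model; model.fit(x, y); float y_hat = model.predict(5.0f);`. Const accessors let you inspect the coefficients without letting outside code modify them. This is an alternative to making b_0 and b_1 public.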
[Image: Linear regression class]
3- Example:
Here is an example of using the linear model we just implemented. We instantiate the class with type float and fit the model to the independent and dependent variable vectors.
Then we test the model by predicting values and displaying the results after fitting.
Please note that, for debugging purposes, I made b_0 and b_1 public.
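The example can be sketched as follows. It assumes float data and, as in the author's debugging setup, public b_0 and b_1 members. `DebugLinearRegression` and `run_example` are hypothetical names. `run_example` returns one prediction per input so the predicted values can be compared with the original data, for instance before plotting them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Debugging variant: coefficients are public so they can be printed directly.
// Hypothetical struct; not the author's code.
template <typename T>
struct DebugLinearRegression {
    T b_0 = 0;  // intercept (public for debugging)
    T b_1 = 0;  // slope (public for debugging)

    // Ordinary least squares fit, as in the steps of section 1.
    void fit(const std::vector<T>& x, const std::vector<T>& y) {
        assert(x.size() == y.size() && x.size() >= 2);
        const T n = static_cast<T>(x.size());
        T mean_x = 0, mean_y = 0;
        for (std::size_t i = 0; i < x.size(); ++i) { mean_x += x[i]; mean_y += y[i]; }
        mean_x /= n;
        mean_y /= n;
        T ss_xy = 0, ss_xx = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            ss_xy += (x[i] - mean_x) * (y[i] - mean_y);
            ss_xx += (x[i] - mean_x) * (x[i] - mean_x);
        }
        b_1 = ss_xy / ss_xx;
        b_0 = mean_y - b_1 * mean_x;
    }

    T predict(T x) const { return b_0 + b_1 * x; }
};

// Fit a float model, print its coefficients, and return one prediction per
// input so predictions can be compared against the original y values.
std::vector<float> run_example(const std::vector<float>& x, const std::vector<float>& y) {
    DebugLinearRegression<float> model;
    model.fit(x, y);
    std::printf("b_0 = %f, b_1 = %f\n", model.b_0, model.b_1);
    std::vector<float> predicted;
    predicted.reserve(x.size());
    for (float xi : x) predicted.push_back(model.predict(xi));
    return predicted;
}
```

Calling `run_example({1.0f, 2.0f, 3.0f, 4.0f}, {3.0f, 5.0f, 7.0f, 9.0f})` prints `b_0 = 1.000000, b_1 = 2.000000` and returns {3, 5, 7, 9}. The predictions match the inputs exactly because these points lie on Y = 1 + 2X.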
[Image: the printed coefficients]
I also used matplotlibcpp to plot the output and compare the predicted values against the original data.
[Image: the orange line is the predicted value after applying linear regression]
You can find an introduction to using matplotlibcpp in the following article.
Implementing linear regression is simple, yet linear regression is a powerful statistical technique. It can generate insights into consumer behavior, help in understanding a business, and reveal factors that influence profitability. Businesses also use linear regression to evaluate trends and make estimates or forecasts.
This article is part of a series that addresses the implementation of machine learning algorithms in C++. Throughout this series, we will be using the Iris data set available here.
When Should You Learn Machine Learning using C++?
The 8 Books Each C++ Developer Must Read.
Data Preprocessing And Visualization In C++.
Machine Learning Data Manipulation Using C++.
Naive Bayes From Scratch using C++
I hope you find this article useful. Please follow to get notified when a new article in this series is released.
Translated from: https://medium.com/swlh/linear-regression-implementation-in-c-acdfb621e56