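Case scenario: every sales-driven company spends some money on promotion, and promotional spending can significantly boost sales volume. Given a particular promotion budget, how much sales volume should we expect?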
Data source: https://pan.baidu.com/s/1VE7zmWiToI5A4zuc8zmZaQ
# Import libraries
import re
import numpy
from sklearn import linear_model
from matplotlib import pyplot as plt
# Load the data: each line holds two tab-separated numbers
with open('data.txt', 'r') as fn:
    all_data = fn.readlines()
In [10]: all_data[-5:]
Out[10]:
['21511.0\t59960.0\n',
'28166.0\t85622.0\n',
'34130.0\t82463.0\n',
'17789.0\t64759.0\n',
'21382.0\t54315.0\n']
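Each element of all_data is one raw line such as '21511.0\t59960.0\n'. The preprocessing step splits it with re.split('\t|\n', ...). A quick check on one previewed line shows why only the first two fields are used: the trailing newline leaves an empty third element. Reading field 0 as promotion cost and field 1 as sales volume is inferred from how the script assigns x and y.

```python
import re

# One raw line as returned by readlines() (taken from the preview above)
line = '21511.0\t59960.0\n'

# Split on either a tab or a newline; the trailing newline leaves an empty string
parts = re.split('\t|\n', line)
print(parts)  # ['21511.0', '59960.0', '']

# Only the first two fields carry data: promotion cost and sales volume
spend, sales = float(parts[0]), float(parts[1])
print(spend, sales)  # 21511.0 59960.0
```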
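# Data preprocessing
x = []
y = []
for single_data in all_data:
    # Split on tab/newline: field 0 is the promotion cost (x), field 1 the sales volume (y)
    tmp_data = re.split('\t|\n', single_data)
    x.append(float(tmp_data[0]))
    y.append(float(tmp_data[1]))
# scikit-learn expects 2-D arrays of shape (n_samples, n_features)
x = numpy.array(x).reshape([-1, 1])
y = numpy.array(y).reshape([-1, 1])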
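Because every line has the same two-column, tab-separated layout, the read-split-convert loop can also be collapsed into a single numpy.loadtxt call. A minimal sketch, assuming that layout holds for the whole file; io.StringIO stands in for data.txt and contains only the five rows shown in the preview.

```python
import io
import numpy

# Stand-in for data.txt: the five rows shown in the preview above
sample = io.StringIO(
    '21511.0\t59960.0\n'
    '28166.0\t85622.0\n'
    '34130.0\t82463.0\n'
    '17789.0\t64759.0\n'
    '21382.0\t54315.0\n'
)

# Parse both tab-separated columns at once into a (n_samples, 2) float array
data = numpy.loadtxt(sample, delimiter='\t')

# Same column reshape as the loop version: one sample per row, one feature
x = data[:, 0].reshape([-1, 1])
y = data[:, 1].reshape([-1, 1])
print(x.shape, y.shape)  # (5, 1) (5, 1)
```

Keeping y as a column vector mirrors the script above, which is why coef_ later prints as a 2-D array.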
# Data exploration: scatter plot of promotion cost (x) against sales volume (y)
plt.scatter(x, y)
plt.show()
# Modeling: fit an ordinary least squares linear regression
model = linear_model.LinearRegression()
model.fit(x, y)
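LinearRegression fits the line by ordinary least squares. With a single feature the solution has a closed form, which makes coef_ and intercept_ easy to interpret: a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − a·x̄. The sketch below applies this to the five previewed rows only, so its numbers will not match the full-data fit, and cross-checks against numpy.polyfit instead of LinearRegression to keep it numpy-only.

```python
import numpy

# The five previewed rows (x = promotion cost, y = sales volume)
x = numpy.array([21511.0, 28166.0, 34130.0, 17789.0, 21382.0])
y = numpy.array([59960.0, 85622.0, 82463.0, 64759.0, 54315.0])

# Closed-form ordinary least squares for y = a*x + b
x_mean, y_mean = x.mean(), y.mean()
a = numpy.sum((x - x_mean) * (y - y_mean)) / numpy.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

# Cross-check against numpy's degree-1 least-squares fit (slope first, then intercept)
a_ref, b_ref = numpy.polyfit(x, y, 1)
print(a, b)
print(numpy.isclose(a, a_ref), numpy.isclose(b, b_ref))  # True True
```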
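# Model evaluation
# The linear model y = ax + b has two parameters: the coefficient a and the intercept b
model_coef = model.coef_  # coefficient a
model_intercept = model.intercept_  # intercept b
r2 = model.score(x, y)  # coefficient of determination R^2 on the training data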
In [23]: r2 = model.score(x,y)
In [24]: model_coef
Out[24]: array([[ 2.09463661]])
In [25]: model_intercept
Out[25]: array([ 13175.36904199])
In [26]: r2
Out[26]: 0.78764146847589545
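model.score returns the coefficient of determination R² = 1 − SS_res / SS_tot for the data passed in, here the training data. A value of about 0.788 means the fitted line explains roughly 79% of the variance in sales volume within this dataset. A hand computation on the five previewed rows (not the full file) shows the definition. For simple linear regression with an intercept it also equals the squared Pearson correlation of x and y.

```python
import numpy

x = numpy.array([21511.0, 28166.0, 34130.0, 17789.0, 21382.0])
y = numpy.array([59960.0, 85622.0, 82463.0, 64759.0, 54315.0])

# Least-squares line y = a*x + b
a, b = numpy.polyfit(x, y, 1)
y_pred = a * x + b

# R^2 = 1 - SS_res / SS_tot: share of the variance in y explained by the line
ss_res = numpy.sum((y - y_pred) ** 2)
ss_tot = numpy.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# With one feature and an intercept, R^2 equals the squared correlation coefficient
r = numpy.corrcoef(x, y)[0, 1]
print(r2, r ** 2)
```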
# Sales forecast
In [27]: new_x = numpy.array([[84610]])  # predict() expects a 2-D array of shape (n_samples, n_features)
In [28]: pre_y = model.predict(new_x)
In [29]: pre_y
Out[29]: array([[ 190402.57234225]])
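The forecast is simply the fitted line evaluated at the new budget: ŷ = a·x + b. Plugging in the coefficient and intercept printed above reproduces the model.predict result up to the rounding of the displayed coefficients. A minimal sketch using only those printed values:

```python
import numpy

# Coefficients as printed above (rounded to 8 decimal places)
a = 2.09463661       # model.coef_[0][0]
b = 13175.36904199   # model.intercept_[0]

# predict() evaluates y = a*x + b for each row of a 2-D input
budgets = numpy.array([[84610.0]])   # shape (1, 1): one sample, one feature
pre_y = budgets * a + b
print(pre_y)  # close to the array([[190402.57...]]) returned by model.predict
```

Several budgets can be forecast at once by adding rows to budgets, for example numpy.array([[50000.0], [84610.0]]).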