Cursor is an editor made for programming with AI. It’s early days, but right now Cursor can help you with a few things…
https://twitter.com/amanrsanger
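The code in the last image above runs in the IDE without raising an error, but one line,
vif["VIF"] = [variance_inflation_factor(df.values, i) for i in range(df.shape[1]-1)]
is inconsistent with the mathematics of multicollinearity analysis, that is, of VIF. VIF measures the linear relationships among the independent variables only. Each auxiliary regression calls OLS directly, which fits a linear model; replacing the objective function inside OLS with a nonlinear equation would express a nonlinear relationship instead. The line above passes all of df.values to variance_inflation_factor, so the design matrix contains the dependent variable as well as the independent variables, which violates the principle of multicollinearity analysis.
So it should be changed to: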
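import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy data: x2 and x3 are exact multiples of x1, so the reported VIFs are
# infinite or extremely large (perfect collinearity).
data = {'x1': [1, 2, 3, 4, 5],
        'x2': [2, 4, 6, 8, 10],
        'x3': [3, 6, 9, 12, 15],
        'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Get the VIF for each feature (every column except the target 'y')
vif = pd.DataFrame()
vif["feature"] = df.columns[:-1]
# Before: the target column was passed in along with the features
# vif["VIF"] = [variance_inflation_factor(df.values, i) for i in range(df.shape[1]-1)]
# After: pass only the feature columns
vif["VIF"] = [variance_inflation_factor(df.values[:, :-1], i) for i in range(df.shape[1]-1)]

# Print the results
print(vif)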
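To see how much the target column can distort the result, here is a minimal sketch on made-up data, using NumPy only rather than statsmodels. The helper `vif_with_intercept` and the arrays `x1`, `x2`, `noise`, and `y` are hypothetical, and unlike the function used above this helper adds an intercept to each auxiliary regression.

```python
import numpy as np

def vif_with_intercept(exog, idx):
    """VIF of column `idx`: regress it on the other columns plus an intercept."""
    exog = np.asarray(exog, dtype=float)
    target = exog[:, idx]
    others = np.delete(exog, idx, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    # Centered R-squared of the auxiliary regression
    r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    with np.errstate(divide="ignore"):
        # A perfect fit gives inf instead of raising
        return float(np.float64(1.0) / (1.0 - r_squared))

# Hypothetical data: two weakly correlated features and a target that is
# almost an exact linear combination of them.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
noise = np.array([0.1, -0.2, 0.1, 0.1, -0.2, 0.1])
y = 2.0 * x1 + 3.0 * x2 + noise

features_only = np.column_stack([x1, x2])
features_and_target = np.column_stack([x1, x2, y])

vif_correct = [vif_with_intercept(features_only, i) for i in range(2)]
vif_with_target = [vif_with_intercept(features_and_target, i) for i in range(2)]

print("features only:      ", vif_correct)
print("features and target:", vif_with_target)
```

With the features alone, both VIFs are about 1.09, so there is essentially no collinearity. Once `y` is included, each feature can be reconstructed almost exactly from `y` and the other feature, and both VIFs jump above 100 even though the features themselves have not changed.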
Explanation of the principle:
def variance_inflation_factor(exog, exog_idx):
    """
    Variance inflation factor, VIF, for one exogenous variable

    The variance inflation factor is a measure for the increase of the
    variance of the parameter estimates if an additional variable, given by
    exog_idx is added to the linear regression. It is a measure for
    multicollinearity of the design matrix, exog.

    One recommendation is that if VIF is greater than 5, then the explanatory
    variable given by exog_idx is highly collinear with the other explanatory
    variables, and the parameter estimates will have large standard errors
    because of this.

    Parameters
    ----------
    exog : {ndarray, DataFrame}
        design matrix with all explanatory variables, as for example used in
        regression
    exog_idx : int
        index of the exogenous variable in the columns of exog

    Returns
    -------
    float
        variance inflation factor

    Notes
    -----
    This function does not save the auxiliary regression.

    See Also
    --------
    xxx : class for regression diagnostics  TODO: does not exist yet

    References
    ----------
    https://en.wikipedia.org/wiki/Variance_inflation_factor
    """
    k_vars = exog.shape[1]
    exog = np.asarray(exog)
    x_i = exog[:, exog_idx]
    mask = np.arange(k_vars) != exog_idx
    x_noti = exog[:, mask]
    r_squared_i = OLS(x_i, x_noti).fit().rsquared
    vif = 1. / (1. - r_squared_i)
    return vif
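In formula form, the last lines of the function compute the following. The symbols are notation introduced here, not taken from the statsmodels source: $x_i$ is column exog_idx of exog, $X_{-i}$ is the matrix of all remaining columns, and $R_i^2$ is the R-squared of the auxiliary regression of $x_i$ on $X_{-i}$.

```latex
x_i = X_{-i}\,\beta + \varepsilon,
\qquad
\mathrm{VIF}_i = \frac{1}{1 - R_i^2}
```

Because exog is used exactly as given, every column in it takes part in the auxiliary regression, including a dependent variable passed in by mistake.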
https://github.com/features/copilot
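Copilot currently comes in three plans: Individual, Business, and Education. Under the Education plan, student accounts get more free features than teacher accounts. Verifying a student or teacher account only takes a school email address and a campus ID card, which is quite convenient.
In the code below, everything except the first comment line was auto-completed by Copilot: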
# Write a class for fetching data from MySQL
import pymysql

class Mysql:
    def __init__(self,host,user,password,database,port=3306):
        self.host = host
        self.user = user
        self.password = password
        self.database = database
        self.port = port
        self.conn = pymysql.connect(host=self.host,user=self.user,password=self.password,database=self.database,port=self.port)
        self.cursor = self.conn.cursor()

    def fetchone(self,sql):
        self.cursor.execute(sql)
        return self.cursor.fetchone()

    def fetchall(self,sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()

    def close(self):
        self.cursor.close()
        self.conn.close()

    def commit(self):
        self.conn.commit()

    def rollback(self):
        self.conn.rollback()

    def execute(self,sql):
        self.cursor.execute(sql)
        self.commit()

    def executemany(self,sql,values):
        self.cursor.executemany(sql,values)
        self.commit()
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

class Multicollinearity:
    def __init__(self,df):
        self.df = df

    def get_vif(self):
        vif = pd.DataFrame()
        vif["VIF Factor"] = [self.get_vif_score(i) for i in range(self.df.shape[1])]
        vif["features"] = self.df.columns
        return vif

    def get_vif_score(self,i):
        y = self.df.iloc[:,i]
        x = self.df.drop(self.df.columns[i],axis=1)
        lr = LinearRegression()
        lr.fit(x,y)
        r2 = r2_score(y,lr.predict(x))
        return 1/(1-r2)
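When you let a tool write code for you, check the result against a real example or real data. Even when nothing raises an error, numerical code in particular deserves a run through the debugger to confirm that what was generated matches your description and the underlying mathematics. For the specific reason, see the note on VIF above.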
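One way to carry out such a check for VIF is to compare the generated code against an independent formula on data small enough to verify by hand. The following is a minimal sketch, assuming the auxiliary regressions include an intercept (as scikit-learn's LinearRegression does by default): in that case the VIFs equal the diagonal of the inverse of the feature correlation matrix. The names `vif_from_correlation` and `matches_reference` and the data are hypothetical. A function that fits no intercept, such as variance_inflation_factor called on a matrix without a constant column, will in general not match this reference.

```python
import numpy as np

def vif_from_correlation(features):
    """Reference VIFs: diagonal of the inverse correlation matrix of the features."""
    corr = np.corrcoef(np.asarray(features, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

def matches_reference(candidate_vifs, features, rtol=1e-6):
    """True if VIFs from the code under test agree with the reference values."""
    reference = vif_from_correlation(features)
    return bool(np.allclose(candidate_vifs, reference, rtol=rtol))

# Hypothetical feature matrix (no target column), small enough to check by hand.
features = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, 1.0],
                     [4.0, -1.0], [5.0, 1.0], [6.0, -1.0]])

reference = vif_from_correlation(features)
print(reference)

# Replace this list with the output of the generated code being checked.
# Hand-computed here: 1 / (1 - r**2) with r**2 = 9/105 for these two columns.
candidate = [105 / 96, 105 / 96]
print(matches_reference(candidate, features))
```

If the generated code had been fed the target column as well, or had indexed the wrong columns, its output would disagree with the reference and the check would return False, even though nothing raised an error.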