Data Mining

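  • I. Built-in Functions
  • II. Reading and Writing Data
  • III. Data Cleaning
  • IV. Machine Learning
    • 1) Splitting the Training Set
    • 2) Model Selection
    • 3) Model Evaluation
      • Classification models
      • Regression models
    • 4) Underfitting and Overfitting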

I. Built-in Functions

II. Reading and Writing Data

III. Data Cleaning

IV. Machine Learning

1) Splitting the Training Set

# Random hold-out split: 80% for training, 20% for testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
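
The split above is different on every run. A minimal sketch of a reproducible, class-balanced split follows; the synthetic data from make_classification is only a stand-in for the real X and y, and the random_state value is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature matrix X and label vector y (assumption).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# random_state fixes the shuffle so the split can be reproduced;
# stratify=y keeps the class ratio roughly equal in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```

Stratification only makes sense for classification targets; for a regression target, drop the stratify argument.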

# K-fold cross-validation: print the row indices of each train/test fold
from sklearn.model_selection import KFold
for train_index, test_index in KFold(n_splits=5).split(X):
    print("Train:", train_index, "test:", test_index)
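
The loop above only prints fold indices. As a sketch of how the folds are typically used, cross_val_score can fit and score a model on each fold. The LogisticRegression model and the synthetic data here are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the real X and y (assumption).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# shuffle=True mixes the rows before splitting, which matters if the data is ordered.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy value per fold; the mean is the usual summary.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores, scores.mean())
```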

2) Model Selection

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

Model = \
{0: LinearRegression(),
 1: Lasso(),
 2: Ridge(),
 3: ElasticNet(),
 4: LogisticRegression(),
 5: GaussianNB(),
 6: KNeighborsClassifier(),
 7: DecisionTreeClassifier(),
 8: ExtraTreeClassifier(),
 9: KMeans(),
 10: AdaBoostClassifier(),
 11: ExtraTreesClassifier(),
 12: GradientBoostingClassifier(),
 13: RandomForestClassifier(),
 14: SVC(),
 15: XGBClassifier(),
 16: LGBMClassifier()}
 
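import pandas as pd

name, score, predict = {}, {}, {}
for key, model in Model.items():
    name[key] = type(model).__name__    # class name, e.g. "Ridge"
    try:
        model.fit(X_train, y_train)    # train the model
        score[key] = model.score(X_test, y_test)    # default score of the estimator
        predict[key] = model.predict(X_test)    # store predictions on the test set
    except Exception as err:
        print('%s: %s failed: %s' % (key, name[key], err))    # report the failure instead of hiding it
# Sort by score, best first. Scores are not comparable across model types:
# classifiers report accuracy, regressors report R^2, and KMeans reports the negative inertia.
score = pd.Series(score, dtype=float).sort_values(ascending=False)
for index in score.index:    # show the results
    print('%s: %s, score: %.2f' % (index, name[index], score[index]))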
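
Because the dictionary mixes regressors, classifiers and a clustering model, the value returned by score() means something different for each group. One way to make that explicit is sketched below; the score_label helper is hypothetical, while is_classifier and is_regressor come from sklearn.base.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression, Ridge


def score_label(model):
    """Hypothetical helper: name the metric that model.score() returns by default."""
    if is_classifier(model):
        return "accuracy"
    if is_regressor(model):
        return "R^2"
    return "estimator-specific score"


for model in (Ridge(), LogisticRegression(), KMeans()):
    print(type(model).__name__, "->", score_label(model))
```

In practice it is usually clearer to keep one dictionary per task and rank models only within it.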

3) Model Evaluation

Classification models
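
As an illustrative sketch (not taken from the original post), a few common classification metrics from sklearn.metrics are computed below. The two label vectors are made up for the example; in practice y_pred would come from model.predict(X_test).

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up true labels and predictions (assumption, for illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)      # share of correct predictions
prec = precision_score(y_true, y_pred)    # TP / (TP + FP)
rec = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)             # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)     # rows: true class, columns: predicted class

print(acc, prec, rec, f1)
print(cm)
```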

Regression models
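
As an illustrative sketch (not taken from the original post), the usual regression metrics can be computed with sklearn.metrics. The target values and predictions are made up for the example.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (assumption, for illustration only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

mae = mean_absolute_error(y_true, y_pred)    # mean of |error|
mse = mean_squared_error(y_true, y_pred)     # mean of error^2
rmse = mse ** 0.5                            # same unit as the target
r2 = r2_score(y_true, y_pred)                # 1 - SS_res / SS_tot

print(mae, mse, rmse, r2)
```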

4) Underfitting and Overfitting
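
One common way to spot both problems is to compare the training score with the test score. The sketch below (an illustration, not the original author's method) fits decision trees of different depths on noisy synthetic data. A low score on both sets suggests underfitting, while a near-perfect training score with a clearly lower test score suggests overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (assumption): flip_y randomly reassigns about 20% of the labels.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
for depth in (1, 3, None):    # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this depth.
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(depth, results[depth])
```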
