[Machine Learning]: Xgboost/LightGBM Usage and Tuning Tips

Among machine learning models, xgboost and lightgbm are currently the most advanced tree models. So how should we tune their parameters? Which parameters matter most and need tuning, which are less important, and how do we call these two models in code? Below are some practical notes on tuning xgboost, lightgbm, and catboost, covering the key parameters, their typical values, and what each one means, for your reference:
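The code snippets below use the variable names `train`, `test`, `y_train`, and `y_test` without defining them. A minimal sketch of how such a split could be set up (with synthetic data standing in for a real dataset):

```python
# Synthetic stand-in for the train/test variables assumed by the snippets below.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 1000 samples, 20 features, binary labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
train, test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                random_state=42)
print(train.shape, test.shape)  # (750, 20) (250, 20)
```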

1. Xgboost with grid search for parameter tuning

The implementation code is as follows:

import xgboost as xgb
from sklearn import metrics
from sklearn.model_selection import GridSearchCV

# note: relies on the global y_train / y_test defined outside this function
def auc(m, train, test):
    return (metrics.roc_auc_score(y_train, m.predict_proba(train)[:, 1]),
            metrics.roc_auc_score(y_test, m.predict_proba(test)[:, 1]))

# Parameter Tuning
model = xgb.XGBClassifier()
param_dist = {"max_depth": [10,30,50],
              "min_child_weight" : [1,3,6],
              "n_estimators": [200],
              "learning_rate": [0.05, 0.1,0.16],}
grid_search = GridSearchCV(model, param_grid=param_dist, cv = 3, 
                                   verbose=10, n_jobs=-1)
grid_search.fit(train, y_train)

grid_search.best_estimator_

model = xgb.XGBClassifier(max_depth=3, min_child_weight=1, n_estimators=20,
                          n_jobs=-1, verbosity=1, learning_rate=0.16)
model.fit(train,y_train)

print(auc(model, train, test))

A custom auc function is used here as the model's evaluation metric; the output is as follows:

Fitting 3 folds for each of 27 candidates, totalling 81 fits
(0.7479275227922775, 0.7430946047035487)
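After fitting, `grid_search.best_params_` and `grid_search.best_score_` report the winning parameter combination and its mean cross-validation score. A sketch of that pattern, using a lightweight sklearn estimator in place of `xgb.XGBClassifier` so it runs without xgboost installed (the parameter names differ, but the GridSearchCV usage is identical):

```python
# Same GridSearchCV pattern as above, with LogisticRegression as a stand-in.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
param_dist = {"C": [0.1, 1.0, 10.0]}  # regularization strengths to try
grid_search = GridSearchCV(LogisticRegression(max_iter=1000),
                           param_grid=param_dist, cv=3, scoring="roc_auc")
grid_search.fit(X, y)

# best_params_: the winning combination; best_score_: its mean CV AUC
print(grid_search.best_params_)
print(grid_search.best_score_)
```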

2. LightGBM with grid search for parameter tuning

The code is as follows:

import lightgbm as lgb
from sklearn import metrics
from sklearn.model_selection import GridSearchCV

# note: lgb.train returns a Booster whose predict() outputs probabilities
# directly; as above, y_train / y_test are globals
def auc2(m, train, test):
    return (metrics.roc_auc_score(y_train, m.predict(train)),
            metrics.roc_auc_score(y_test, m.predict(test)))

lg = lgb.LGBMClassifier()  # note: the silent parameter was removed in lightgbm 4.x
param_dist = {"max_depth": [25,50, 75],
              "learning_rate" : [0.01,0.05,0.1],
              "num_leaves": [300,900,1200],
              "n_estimators": [200]
             }
grid_search = GridSearchCV(lg, n_jobs=-1, param_grid=param_dist, cv = 3, 
                           scoring="roc_auc", verbose=5)
grid_search.fit(train,y_train)
grid_search.best_estimator_

d_train = lgb.Dataset(train, label=y_train, free_raw_data=False)
params = {"max_depth": 3, "learning_rate" : 0.1, "num_leaves": 900,  "n_estimators": 20}

# Without Categorical Features
model2 = lgb.train(params, d_train)
print(auc2(model2, train, test))

# With Categorical Features
cate_features_name = ["MONTH","DAY","DAY_OF_WEEK","AIRLINE","DESTINATION_AIRPORT",
                 "ORIGIN_AIRPORT"]
model2 = lgb.train(params, d_train, categorical_feature = cate_features_name)
print(auc2(model2, train, test))
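Passing column names via `categorical_feature` requires `train` to be a pandas DataFrame. An alternative is to cast those columns to pandas `category` dtype, which LightGBM picks up automatically. A sketch of that preparation step (the tiny DataFrame and its column names are illustrative, echoing the flight-delay columns above):

```python
# Cast illustrative columns to pandas 'category' dtype so LightGBM can
# treat them as categorical without an explicit categorical_feature list.
import pandas as pd

df = pd.DataFrame({"MONTH": [1, 2, 1],
                   "AIRLINE": ["AA", "UA", "AA"]})
for col in ["MONTH", "AIRLINE"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```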

And that's the basic code usage of Xgboost and LightGBM!
