Notes on Hyperparameter Tuning

1. GridSearchCV

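Watch out for one pitfall here: when cv is an integer (or left at its default) and the estimator is a classifier with binary or multiclass targets, GridSearchCV splits the samples with StratifiedKFold, not KFold.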
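A minimal sketch of how to check and override the default splitter, using only scikit-learn. The synthetic data set, the LogisticRegression estimator and the grid over C are assumptions made for illustration; they are not from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold, check_cv

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# check_cv shows what an integer cv resolves to for a classifier.
default_cv = check_cv(5, y, classifier=True)
print(type(default_cv).__name__)

# Passing an explicit KFold object overrides the stratified default.
plain_kfold = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=plain_kfold,
)
search.fit(X, y)
print(search.best_params_)
```

Passing a splitter object rather than an integer makes the splitting strategy explicit, so it no longer depends on the type of the estimator or the target.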

 

A friend of mine wrote a sample generator to work around this:

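import xgboost as xgb
from sklearn.model_selection import KFold

# Plain KFold over the installment == 1 rows; indices are positions in that subset.
myCV = []
for train_index, test_index in KFold(5, shuffle=True).split(train[train['installment'] == 1]):
    myCV.append((train_index, test_index))

# train, xgb_param, xgtrain, xgb1 and KS are defined elsewhere.
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=xgb1.get_params()['n_estimators'],
                  folds=myCV, feval=KS, early_stopping_rounds=50, show_stdv=False)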

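Strictly speaking, myCV is a list of (train_index, test_index) tuples, not a generator; KFold.split itself returns the generator.

Then just pass folds=myCV in the arguments.

This is excerpted from her chat log. I have not verified it myself yet.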
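One detail worth checking before relying on the snippet above: KFold.split returns positional indices into whatever it was given, not the original row labels. A small sketch of that behaviour follows; the toy DataFrame and the names subset and my_cv are hypothetical stand-ins, not the original data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame standing in for `train` (illustrative only).
train = pd.DataFrame({
    "installment": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
    "x": np.arange(12),
})
subset = train[train["installment"] == 1]

# Materialize the folds as a list of (train_index, test_index) tuples.
my_cv = list(KFold(n_splits=3, shuffle=True, random_state=0).split(subset))

for train_idx, test_idx in my_cv:
    # Indices are positions within `subset`, so select rows with iloc, not loc.
    fold_test = subset.iloc[test_idx]
    print(len(train_idx), len(test_idx), fold_test.index.tolist())
```

Because the indices are positional, the data object handed to the cross-validation routine has to contain the same filtered rows in the same order as the frame that was split.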

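The scoring argument of GridSearchCV also accepts a custom callable with the signature (estimator, X, y). If the goal is to maximize KS, the KS function can be written like this: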

import numpy as np
from scipy.stats import ks_2samp

# KS statistic between the scores of the positive class and the scores of everything else.
get_ks = lambda y_pred, y_true: ks_2samp(y_pred[np.asarray(y_true) == 1], y_pred[np.asarray(y_true) != 1]).statistic
# Scorer with the (estimator, X, y) signature; column 0 is the class-0 probability,
# and in the binary case the KS statistic is the same for either column.
get_ks_for_grid = lambda estimators, X, y: get_ks(estimators.predict_proba(X)[:, 0], y)
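As a usage sketch, a KS scorer of this shape can be passed straight to GridSearchCV through scoring. The function name ks_scorer, the synthetic data, the LogisticRegression estimator and the grid over C are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def ks_scorer(estimator, X, y):
    """Two-sample KS statistic between predicted scores of positives and negatives."""
    y = np.asarray(y)
    proba = estimator.predict_proba(X)[:, 1]  # probability of the positive class
    return ks_2samp(proba[y == 1], proba[y != 1]).statistic


# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    scoring=ks_scorer,  # a callable scorer: higher is treated as better
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

GridSearchCV treats a callable scorer as "greater is better", which matches the aim of maximizing KS, so no sign flip is needed.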

 

2. RandomizedSearchCV

TBC.
