CatBoost Parameters Explained, with Hands-On Examples

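According to its developers, CatBoost is yet another powerful tool that outperforms LightGBM and XGBoost. How well it actually performs, though, remains to be seen in competitions.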

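This post summarizes the simple tutorial and parameter descriptions from the documentation. Many parameters matter little, so only the important ones, the ones worth focusing on during training, are explained.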

Quick start

CatBoostClassifier

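import numpy as np
import catboost as cb

# Random toy data: 100 training rows and 50 test rows, each with 10 integer features
train_data = np.random.randint(0, 100, size=(100, 10))
train_label = np.random.randint(0, 2, size=100)  # binary labels
test_data = np.random.randint(0, 100, size=(50, 10))

# A tiny model: 2 trees of depth 2, optimizing Logloss, with verbose logging
model = cb.CatBoostClassifier(iterations=2, depth=2, learning_rate=0.5, loss_function='Logloss',
                              logging_level='Verbose')
# Columns 0, 2 and 5 are treated as categorical features
model.fit(train_data, train_label, cat_features=[0, 2, 5])
preds_class = model.predict(test_data)        # predicted class labels
preds_probs = model.predict_proba(test_data)  # predicted class probabilities
print('class = ', preds_class)
print('proba = ', preds_probs)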

Parameters

CatBoostClassifier/CatBoostRegressor

General parameters

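  • learning_rate(eta)=automatically: the learning rate; selected automatically if not specified

  • depth(max_depth)=6: the depth of each tree

  • l2_leaf_reg(reg_lambda)=3: the L2 regularization coefficient

  • n_estimators(iterations, num_boost_round, num_trees)=1000: the maximum number of trees built when solving the ML problem

  • one_hot_max_size=2: categorical features with at most this many distinct values are one-hot encoded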

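  • loss_function='Logloss': the loss function to optimize, e.g. RMSE, Logloss, MAE, CrossEntropy

  • custom_metric=None: extra metrics that are computed and printed during training but not optimized, e.g. RMSE, Logloss, MAE, CrossEntropy, Recall, Precision, F1, Accuracy, AUC, R2

  • eval_metric=Optimized objective: the metric used for overfitting detection and best-model selection; defaults to the optimized objective. Options include RMSE, Logloss, MAE, CrossEntropy, Recall, Precision, F1, Accuracy, AUC, R2

  • nan_mode=None: how missing values (NaN) are handled: Forbidden (missing values raise an error), Min (NaN is treated as smaller than all other values of the feature), Max (NaN is treated as larger than all other values)

  • leaf_estimation_method=None: the iterative method used to compute leaf values: Newton or Gradient

  • random_seed=None: the random seed used during training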

Performance parameters

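  • thread_count=-1: the number of threads used during training; -1 means use all available CPU cores
  • used_ram_limit=None: a memory limit that applies to the CTR computation
  • gpu_ram_part=None: the fraction of GPU memory to use for training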

Processing unit settings

  • task_type=CPU: the device used for training, CPU or GPU

  • devices=None: the IDs of the GPU devices to train on

Other parameters (not explained in this post):

  • counter_calc_method=None

  • leaf_estimation_iterations=None

  • use_best_model=None

  • verbose=None

  • model_size_reg=None

  • rsm=None

  • logging_level=None

  • metric_period=None

  • ctr_leaf_count_limit=None

  • store_all_simple_ctr=None

  • max_ctr_complexity=None

  • has_time=None

  • classes_count=None

  • class_weights=None

  • random_strength=None

  • name=None

  • ignored_features=None

  • train_dir=None

  • custom_loss=None

  • bagging_temperature=None

  • border_count=None

  • feature_border_type=None

  • save_snapshot=None

  • snapshot_file=None

  • fold_len_multiplier=None

  • allow_writing_files=None

  • final_ctr_computation_mode=None

  • approx_on_full_history=None

  • boosting_type=None

  • simple_ctr=None

  • combinations_ctr=None

  • per_feature_ctr=None

  • device_config=None

  • bootstrap_type=None

  • subsample=None

  • colsample_bylevel=None

  • random_state=None

  • objective=None

  • max_bin=None

  • scale_pos_weight=None

  • gpu_cat_features_storage=None

  • data_partition=None

CatBoostClassifier

Attributes:

  • is_fitted_
  • tree_count_
  • feature_importances_
  • random_seed_

Methods:

fit

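  • X: the input data; accepted types include list, numpy.ndarray, pandas.DataFrame, pandas.Series and catboost.Pool

  • y=None: the target labels

  • cat_features=None: the categorical features to process, given as column indices (or column names for a DataFrame)

  • sample_weight=None: per-sample weights of the input data

  • logging_level=None: controls whether log information is printed, and which kind

  • plot=False: plot metric values, elapsed time, etc. during training

  • eval_set=None: the validation data, given as an (X, y) tuple or a list of (X, y) tuples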

  • baseline=None

  • use_best_model=None

  • verbose=None

predict

Returns the predicted class of each input sample, as an np.array

predict_proba

Returns the predicted probability of each class for each input sample, as an np.array

get_feature_importance

eval_metrics

save_model

load_model

get_params

score

Tutorial
