The LightGBM Algorithm

LightGBM

  • Developed by Microsoft

  • Advantages: builds on and optimizes XGBoost

    • Very fast training
    • Very low memory consumption
    • High accuracy
    • Supports parallelism and GPU acceleration
    • Handles missing values directly
    • Scales to very large datasets
  • fit parameters

    • eval_set: data on which the validation score is computed at each iteration
    • early_stopping_rounds=50: stop training if the validation score has not improved within the last 50 iterations
    • verbose=30: print the score every 30 iterations
  • Important attributes

    • best_iteration_: the best iteration number found during training
    • feature_importances_: returns the feature importances
    • feature_names: the feature names (an attribute of the dataset; the fitted model exposes feature_name_)
    • num_iteration: the number of iterations to use (a parameter of predict, not a model attribute)
  • Important model parameters

    • subsample: fraction of samples drawn at each iteration
    • learning_rate: the learning rate
    • boosting_type:
      • 'gbdt': traditional Gradient Boosting Decision Tree
      • 'dart': Dropouts meet Multiple Additive Regression Trees
      • 'rf': Random Forest
    • n_estimators: the number of boosting iterations
  • Other parameters


Example

import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
feature = cancer.data
target = cancer.target
x_train, x_test, y_train, y_test = train_test_split(feature, target, random_state=2020)

lgb_model = lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train, y_train)
y_pred = lgb_model.predict_proba(x_test)[:, 1]  # probability of the positive class
roc_auc_score(y_test, y_pred)  # 0.9940023990403839

# eval_set: data on which the validation score is computed at each iteration
# early_stopping_rounds=50: stop if the validation score has not improved within 50 iterations
# verbose=30: print the score every 30 iterations
lgb_model = lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train, y_train, eval_set=[(x_test, y_test)], eval_metric='auc')
# [1]	valid_0's auc: 0.97451	valid_0's binary_logloss: 0.610631
# [2]	valid_0's auc: 0.977109	valid_0's binary_logloss: 0.545753
#...

# early_stopping_rounds=50: stop if the validation score has not improved within 50 iterations
lgb_model = lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train, y_train, eval_set=[(x_test, y_test)], eval_metric='auc', early_stopping_rounds=50)
# [1]	valid_0's auc: 0.97451	valid_0's binary_logloss: 0.610631
# Training until validation scores don't improve for 50 rounds
# [2]	valid_0's auc: 0.977109	valid_0's binary_logloss: 0.545753
# [3]	valid_0's auc: 0.978709	valid_0's binary_logloss: 0.488691
#...

# verbose=30: print the score every 30 iterations
lgb_model.fit(x_train, y_train, eval_set=[(x_test, y_test)], eval_metric='auc', early_stopping_rounds=50, verbose=30)
# Training until validation scores don't improve for 50 rounds
# [30]	valid_0's auc: 0.992003	valid_0's binary_logloss: 0.116233
# [60]	valid_0's auc: 0.989804	valid_0's binary_logloss: 0.0967597
# Early stopping, best iteration is:
# [37]	valid_0's auc: 0.992603	valid_0's binary_logloss: 0.0995741

# best_iteration_: the best iteration found during training (37 in the run above)
lgb_model.best_iteration_  # 37
# Returns the feature importances
lgb_model.feature_importances_
# array([ 24, 121,  29,  15,  25,  20,  26,  67,  18,  50,  16,  17,  10,
#         33,  31,  42,  24,  41,  41,  31,  51, 104,  64,  49,  35,  24,
#         43,  69,  42,  15], dtype=int32)
# Feature names (an attribute of the dataset)
cancer.feature_names
# array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
#        'mean smoothness', 'mean compactness', 'mean concavity',
#        'mean concave points', 'mean symmetry', 'mean fractal dimension',
#        'radius error', 'texture error', 'perimeter error', 'area error',
#        'smoothness error', 'compactness error', 'concavity error',
#        'concave points error', 'symmetry error',
#        'fractal dimension error', 'worst radius', 'worst texture',
#        'worst perimeter', 'worst area', 'worst smoothness',
#        'worst compactness', 'worst concavity', 'worst concave points',
#        'worst symmetry', 'worst fractal dimension'], dtype='<U23')

# Predict using the best iteration found during training
lgb_model.predict(x_test,num_iteration=lgb_model.best_iteration_)
