Xgboost全名叫极端梯度提升树
1.最优模型的构建方法
构建最优模型的一般方法就是最小化训练数据的损失函数
2.目标函数
3.CART树
4.树的复杂度
5.树的分裂(回归树构建方法)
6.Xgboost与GDBT的区别
1.参数介绍
1.1通用参数
1.2Booster参数
1.3学习目标参数
# @XST1520203418
# 要天天开心呀
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
"""
步骤:
1.获取数据
2.数据基本处理
确定特征值,目标值
缺失值处理
数据集划分
3.特征工程(字典特征抽取)
4.机器学习(xgboost)
5.模型评估
"""
# 获取数据
data = pd.read_csv("D:\ProgramData\机器学习\数据\Titanictrain.csv")
print(data.head(1).T)
# 数据基本处理
x = data[["Pclass","Age","Sex"]]
y = data["Survived"]
x["Age"].fillna(x["Age"].mean(),inplace=True)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=22)
# 特征工程
dict = DictVectorizer(sparse=False)
x_train = dict.fit_transform(x_train.to_dict(orient="records"))
x_test = dict.fit_transform(x_test.to_dict(orient="records"))
# xgboost模型训练和模型评估
from xgboost import XGBClassifier
# 模型初步训练
xg = XGBClassifier()
xg.fit(x_train,y_train)
xg.score(x_test,y_test)
# 针对max_depth进行模型调优
depth_range = range(10)
score = []
for i in depth_range:
xg = XGBClassifier(eta=1,gamma=0,max_depth=i)
xg.fit(x_train,y_train)
s = xg.score(x_test,y_test)
print(s)
score.append(s)
# 结果可视化
import matplotlib.pyplot as p
p.plot(depth_range,score)
p.show()
1.基于直方图的决策树算法
2.lightgbm的直方图做差加速
3.带深度限制的Leaf-wise的叶子生长策略
4.直接支持类别特征
5.直接支持高效并行
1.参数介绍:
1.1 Control Parameters
1.2 Core Parameters
1.3 IO parameter
2.调参建议
# @XST1520203418
# 要天天开心呀
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
import lightgbm as lgb
# 加载数据,对数据进行基本处理
iris = load_iris()
data = iris.data # 特征值
target = iris.target # 目标值
x_train,x_test,y_train,y_test = train_test_split(data,target,test_size=0.2)
# 模型训练
gbm = lgb.LGBMRegressor(objective='regression',learning_rate=0.05,n_estimators=20)
gbm.fit(x_train,y_train,eval_set=(x_test,y_test),eval_metric='l1',early_stopping_rounds=10)
print(gbm.score(x_test,y_test))
# 网格搜索,优化参数
estimator = lgb.LGBMRegressor(num_leaves=31)
parm_grid = {"learning_rate":[0.01,0.1,1],"n_estimators":[20,30,40]}
gbm = GridSearchCV(estimator,parm_grid,cv=4)
gbm.fit(x_train,y_train)
print("best:",gbm.best_params_) # best: {'learning_rate': 0.1, 'n_estimators': 40}
# 模型调优训练
gbm = lgb.LGBMRegressor(num_leaves=31,learning_rate=0.1,n_estimators=40)
gbm.fit(x_train,y_train)
print("准确值:",gbm.score(x_test,y_test))