(3) XGBoost Data Interface

import numpy as np
import xgboost as xgb
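# Supported data interfaces:
# (1) Comma-separated values (CSV) files
# (2) NumPy 2D arrays
# (3) XGBoost binary buffer files
# 1. Load a CSV file into a DMatrix (train.csv is the file name; column 0 holds the label)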
# label_column specifies the index of the column containing the true label
dtrain = xgb.DMatrix('train.csv?format=csv&label_column=0')
# Output: [19:35:57] 11x2 matrix with 22 entries loaded from train.csv?format=csv&label_column=0
# 2. Load a NumPy array into a DMatrix
data = np.random.rand(5, 10)  # 5 entities, each contains 10 features
label = np.random.randint(2, size=5)  # binary target
dtrain = xgb.DMatrix(data, label=label)
# Saving the DMatrix to an XGBoost binary file makes later loading faster
dtrain.save_binary('train.buffer')
# The `missing` argument of the DMatrix constructor names the value that marks a missing entry in the data
dtrain = xgb.DMatrix(data, label=label, missing=-999.0)
# Weights can be set when needed (one weight per row, i.e. per training instance, so the length must match the 5 rows of data)
w = np.random.rand(5)
dtrain = xgb.DMatrix(data, label=label, missing=-999.0, weight=w)
