Python: Building a Custom Ensemble Model on the AdaBoost Framework [Custom Base Classifier]

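     scikit-learn (sklearn) ships mature implementations of several common ensemble frameworks, AdaBoost among them. In most of my past use cases I simply kept the default base classifier without adjusting it, and only tuned the other main parameters, such as the number of base classifiers, with a grid search. Below is the parameter definition of the AdaBoost model from the official sklearn documentation: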

class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)

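     As the signature shows, base_estimator is what we call the base classifier. Different base classifiers give different results, since the models are built on different principles. When base_estimator is None, sklearn uses a decision stump, DecisionTreeClassifier(max_depth=1). Note also that scikit-learn 1.2 renamed base_estimator to estimator, and the old name was removed in 1.4.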
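To get a feel for how much the base classifier matters, here is a small sketch that compares three decision trees of different depth as AdaBoost base learners using 5-fold cross-validation. The synthetic dataset, the chosen depths, and the helper make_adaboost are illustrative assumptions of mine, not part of the original post; the helper only exists to cope with the base_estimator/estimator rename across sklearn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def make_adaboost(base, **kwargs):
    """Hypothetical helper: build AdaBoost with `base` as the base classifier."""
    try:
        # scikit-learn >= 1.2 names the parameter `estimator`
        return AdaBoostClassifier(estimator=base, **kwargs)
    except TypeError:
        # older releases only accept `base_estimator`
        return AdaBoostClassifier(base_estimator=base, **kwargs)


# Synthetic data with the same shape settings as the post's flag=False branch
X, y = make_classification(n_samples=500, n_features=18, n_informative=2,
                           n_redundant=0, random_state=0)

candidates = {
    "stump (max_depth=1)": DecisionTreeClassifier(max_depth=1),
    "depth-3 tree": DecisionTreeClassifier(max_depth=3),
    "unrestricted tree": DecisionTreeClassifier(),
}

scores = {}
for name, base in candidates.items():
    clf = make_adaboost(base, n_estimators=50, random_state=0)
    # mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

An unrestricted tree often fits the training fold perfectly, in which case AdaBoost stops after the first round, so deeper is not automatically better here.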

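     The goal today is to define a base classifier ourselves [here, the decision tree model provided by sklearn] and pass it into the AdaBoost framework for training and evaluation. The implementation is as follows: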

#!/usr/bin/env python
# encoding:utf-8
from __future__ import division

'''
__Author__:沂水寒城
Purpose: design a model on the AdaBoost framework [custom base classifier]
'''
 
 
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


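# Custom base classifier. With default parameters this is a fully grown tree
# (max_depth=None), unlike AdaBoost's default base learner, a depth-1 stump.
model=DecisionTreeClassifier()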


def loadData(flag=True):
    '''
    Load the dataset.
    flag=True:  read data.txt (comma-separated integers, last column is the label)
    flag=False: generate a synthetic dataset with make_classification
    '''
    X,y=[],[]
    if flag:
        with open('data.txt') as f:
            # skip blank lines (a bare '\n' is truthy, so test the stripped line)
            data_list=[one.strip().split(',') for one in f if one.strip()]
        for one in data_list:
            y.append(int(one.pop(-1)))
            X.append([int(O) for O in one])
    else:
        X, y = make_classification(n_samples=1000, n_features=18, n_informative=2, n_redundant=0,
                                   random_state=0, shuffle=False)
    return X,y


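def buildModel(X, y, model):
    '''
    Build the custom model on the AdaBoost framework.
    '''
    try:
        # scikit-learn >= 1.2 calls the parameter `estimator`
        clf = AdaBoostClassifier(estimator=model, n_estimators=100, random_state=0)
    except TypeError:
        # older releases only accept `base_estimator`
        clf = AdaBoostClassifier(base_estimator=model, n_estimators=100, random_state=0)
    clf.fit(X, y)
    print(clf.feature_importances_)
    # predict a single hand-written 18-feature sample
    print(clf.predict([[23,0,16,0,0,0,0,1,0,0,0,0,0,2,24,0,0,0]]))
    # accuracy on the training data itself (an optimistic estimate)
    print(clf.score(X, y))
    return clf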


if __name__=='__main__':
    X,y=loadData(flag=True)
    buildModel(X, y, model)

    X,y=loadData(flag=False)
    buildModel(X, y, model)
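The intro mentions tuning parameters such as the number of base classifiers with a grid search, and buildModel reports accuracy on the same data it was trained on. Below is a minimal sketch that does both with GridSearchCV, which scores each parameter combination by cross-validation. The synthetic data and the grid values are my own illustrative assumptions, not settings from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=18, n_informative=2,
                           n_redundant=0, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)
try:
    # scikit-learn >= 1.2
    ada = AdaBoostClassifier(estimator=stump, random_state=0)
except TypeError:
    # scikit-learn < 1.2
    ada = AdaBoostClassifier(base_estimator=stump, random_state=0)

# Illustrative grid over the number of base classifiers and the learning rate
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.1, 0.5, 1.0],
}
search = GridSearchCV(ada, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
# mean cross-validated accuracy of the best combination
print(round(search.best_score_, 3))
```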

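      The contents of data.txt used in the code above are as follows (each row holds 18 integer features followed by the class label):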

31,0,24,0,0,0,0,1,0,0,0,0,0,2,32,0,0,0,0
21,0,14,0,0,0,0,5,2,3,0,0,0,2,26,0,0,0,0
23,0,16,1,0,0,0,24,3,10,0,1,0,3,47,0,0,0,0
22,0,15,0,0,0,0,1,0,0,0,0,0,2,23,0,0,0,0
29,0,22,0,0,0,0,1,0,0,0,0,0,2,30,0,0,0,0
24,0,17,0,0,0,0,18,3,9,0,0,0,2,42,1,1,0,0
26,0,19,0,0,0,0,1,0,0,0,0,0,2,27,0,0,0,0
24,0,17,0,0,0,0,32,3,12,0,1,0,3,56,0,0,0,0
26,0,19,1,0,0,0,12,1,11,0,0,1,3,38,0,0,0,0
24,0,17,0,0,0,0,23,3,14,0,0,0,2,47,0,0,0,0
35,0,28,1,0,0,0,12,2,10,0,0,0,2,47,0,0,0,0
23,0,16,0,0,0,0,1,0,0,0,0,0,2,24,0,0,0,0
22,0,15,0,0,0,0,1,0,0,0,0,0,2,23,0,0,0,0
25,0,18,0,0,1,0,1,0,0,0,0,0,2,26,1,0,0,1
24,0,17,0,0,0,0,1,0,0,0,0,0,2,25,0,0,0,1
23,0,16,0,0,0,0,1,0,0,0,0,0,2,24,0,0,0,1
29,0,22,0,0,0,0,1,0,0,0,0,0,2,30,0,0,0,1
27,0,20,0,0,0,0,1,0,0,0,0,0,2,28,0,0,0,1
32,0,25,0,0,0,0,1,0,0,0,0,0,2,33,0,0,0,1
24,0,17,0,0,0,0,1,0,0,0,0,0,2,25,0,0,0,1
26,0,19,0,0,0,0,1,0,0,0,0,0,2,27,0,0,0,1

      Feel free to grab it and experiment if you're interested.

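      Two ways of loading data are provided: with flag=True the local dataset is loaded, and with flag=False a dataset is generated randomly. The output is shown below. The first feature-importance array, prediction, and score come from data.txt, and the second set comes from the generated data; both scores are accuracy on the training data: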
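As an alternative to the hand-written parser in loadData, NumPy can read data.txt directly. This is a minimal sketch assuming the layout shown above (comma-separated integers, label in the last column); the function name load_data_txt is hypothetical, and the demo writes two rows copied from data.txt to a temporary file so it runs without the original file.

```python
import os
import tempfile

import numpy as np


def load_data_txt(path="data.txt"):
    """Read comma-separated integer rows; the last column is the class label."""
    # loadtxt skips blank lines; ndmin=2 keeps a single-row file two-dimensional
    data = np.loadtxt(path, delimiter=",", dtype=int, ndmin=2)
    return data[:, :-1], data[:, -1]


# Demo: two rows copied from data.txt above, written to a temporary file
rows = ("31,0,24,0,0,0,0,1,0,0,0,0,0,2,32,0,0,0,0\n"
        "25,0,18,0,0,1,0,1,0,0,0,0,0,2,26,1,0,0,1\n")
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(rows)
X, y = load_data_txt(f.name)
os.remove(f.name)

print(X.shape, y)
```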


[1.57632184e-001 0.00000000e+000 1.86172414e-001 0.00000000e+000
 0.00000000e+000 6.66666730e-004 0.00000000e+000 4.41379310e-003
 4.66666663e-003 2.36364253e-128 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000             nan 0.00000000e+000
 0.00000000e+000 0.00000000e+000]
[1]
0.8571428571428571
[0.04933692 0.86155671 0.00374627 0.00760245 0.0039491  0.00200001
 0.00576473 0.00580435 0.00363638 0.00875166 0.         0.00962266
 0.00395545 0.00640003 0.0095751  0.01061542 0.00768277 0.        ]
[1]
1.0
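The feature-importance arrays above are, in scikit-learn's implementation, the average of each fitted tree's feature_importances_ weighted by the boosting weights in estimator_weights_. The sketch below recomputes that average by hand and compares it with the library's value. The dataset and the 30 stumps are illustrative assumptions; it does not attempt to explain the nan in the first output, whose cause isn't established here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=18, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
stump = DecisionTreeClassifier(max_depth=1)
try:
    clf = AdaBoostClassifier(estimator=stump, n_estimators=30, random_state=0)
except TypeError:  # scikit-learn < 1.2
    clf = AdaBoostClassifier(base_estimator=stump, n_estimators=30, random_state=0)
clf.fit(X, y)

# Boosting weight of each fitted tree (unused slots after early stopping are 0)
weights = clf.estimator_weights_[:len(clf.estimators_)]
# One row of per-feature importances per fitted tree
per_tree = np.array([est.feature_importances_ for est in clf.estimators_])
# Weighted average across trees
manual = weights @ per_tree / weights.sum()

print(np.allclose(manual, clf.feature_importances_))
```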

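      That's a simple exercise. If you want to use other base classifiers, such as a naive Bayes or support vector machine model, feel free to try them. However, when I tried a naive Bayes model, the AdaBoost framework raised an error saying an attribute was missing. I'm not sure whether that was a problem with how I called it or because the corresponding sklearn model doesn't provide the required method. Two requirements are worth checking: AdaBoost reweights the samples in every round, so the base classifier's fit method must accept sample_weight, and the SAMME.R algorithm (the default in the signature above) additionally calls predict_proba. I'll keep looking into it when I have time!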
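To check those two requirements before handing a model to AdaBoost, sklearn's has_fit_parameter utility and a plain hasattr test are enough. This is only a diagnostic sketch: the post doesn't say which naive Bayes variant or sklearn version produced the error, so the candidate list below is my own illustrative choice.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter

candidates = {
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=1),
    "GaussianNB": GaussianNB(),
    "SVC": SVC(),
    "SVC(probability=True)": SVC(probability=True),
    "KNeighborsClassifier": KNeighborsClassifier(),
}

report = {}
for name, est in candidates.items():
    report[name] = {
        # AdaBoost reweights samples each round, so fit() must take sample_weight
        "sample_weight": has_fit_parameter(est, "sample_weight"),
        # SAMME.R works with class probabilities, so predict_proba must exist
        "predict_proba": hasattr(est, "predict_proba"),
    }
    print(name, report[name])
```

For example, SVC only exposes predict_proba when constructed with probability=True, and KNeighborsClassifier.fit takes no sample_weight at all, so it cannot be boosted this way.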
