There are already plenty of articles about AdaBoost, so I won't go into much detail here:
https://www.cnblogs.com/ScorpioLu/p/8295990.html
In short, the idea is to build weak classifiers from different splits of the dataset and then add them up to form a strong classifier.
I have only recently started working with Python and numpy, so I implemented this algorithm in a simple way with Python and the iris dataset, and I'm recording it here as a study note for future reference.
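Before diving into the implementation, here is a minimal sketch of that "sum of weak classifiers" idea (strong_classify, the stumps and the weights are all hypothetical, just for illustration):

def strong_classify(x, weak_classifiers):
    # weak_classifiers: a list of (alpha, G) pairs, where each G(x) returns +1 or -1
    total = sum(alpha * G(x) for alpha, G in weak_classifiers)
    return 1 if total > 0 else -1

# e.g. two hypothetical decision stumps on a one-dimensional input
stumps = [(0.7, lambda x: 1 if x >= 2 else -1),
          (0.3, lambda x: 1 if x >= 5 else -1)]
print(strong_classify(3, stumps))   # 0.7*1 + 0.3*(-1) = 0.4 > 0  ->  1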
a. Data processing
I simply merge the last two classes into one, so only two classes remain: 1 and -1.
iris = load_iris()
iris.target[iris.target > 0] = 1    # classes 1 and 2 become 1
iris.target[iris.target == 0] = -1  # class 0 becomes -1
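A quick sanity check of the relabelling (just for illustration; assumes the two lines above have run):

import numpy as np
print(np.unique(iris.target, return_counts=True))   # (array([-1, 1]), array([ 50, 100]))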
b. Looping to generate the weak classifiers
def adaBoost(dataset, target, weekClassifierNo=4):
    shape = np.shape(dataset)
    recordNo = shape[0]
    weights = np.ones(recordNo) / recordNo   # start with uniform sample weights
    featureNo = shape[1]
    weekClassifiers = {}
    for classifierSeq in range(weekClassifierNo):
        # search for a good split in each round
        featureValue, alpha, newWeights = findBestCut(dataset, target, weights)
        if len(newWeights) == 0:
            break
        # note: featureNo (the number of features, always 4 for iris) is stored here,
        # not the index of the feature that was actually chosen
        weekClassifiers["WClassifier" + str(classifierSeq)] = [featureNo, featureValue, alpha]
        weights = newWeights
    return weekClassifiers
c. The actual splitting procedure
The parameters are the whole feature set, the target labels, and the corresponding sample weights:
findBestCut(dataset, target, weights)
The function loops over the features and, within each feature, over every value v that occurs in it: samples whose feature value is >= v are predicted as 1 and the rest as -1. The resulting resultRow is compared with the original target (where they differ the split misclassified that sample), and the comparison is weighted by the sample weights to give error. If error is smaller than 0.5, this is taken to be an acceptable weak classifier G, and the corresponding classifier weight (alpha in the code), the split value, and the feature the split value came from are recorded and returned. A small numeric example of the error and alpha computation follows the snippet below.

for featureSeq in range(featureNo):
    featureRow = dataset[:, featureSeq]
    for featureIndex in range(len(featureRow)):
        featureValue = featureRow[featureIndex]
        resultRow = featureRow.copy()
        # label samples by the candidate threshold: >= featureValue -> 1, else -1
        for index in range(len(resultRow)):
            if resultRow[index] >= featureValue:
                resultRow[index] = 1
            else:
                resultRow[index] = -1
        # weighted sum of prediction * label, used here as the error measure
        error = np.dot(np.multiply(resultRow, target), weights.T)
        if error < 0.5:
            alpha = calculateAlpha(error)
            print('alpha is :' + str(alpha))
            # update every sample weight and normalise
            for weightNo in range(len(weights)):
                g = learn(featureRow[weightNo], alpha, featureValue)
                newWeights.append(calculateWeight(alpha, weights[weightNo], target[weightNo], g))
            newWeights = newWeights / sum(newWeights)
            # print('newWeights is:')
            # print(newWeights)
            return featureValue, alpha, newWeights
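To make the error and alpha computation in the snippet above concrete, here is a tiny made-up example (the four samples, labels and weights are hypothetical, not taken from the iris run):

import numpy as np

# 4 samples with (already updated) weights; a candidate threshold predicts [1, 1, -1, -1]
# while the true labels are [1, 1, -1, 1], so only the last (heaviest) sample is wrong
pred    = np.array([1, 1, -1, -1])
target  = np.array([1, 1, -1,  1])
weights = np.array([0.1, 0.2, 0.3, 0.4])

error = np.dot(np.multiply(pred, target), weights.T)   # 0.1 + 0.2 + 0.3 - 0.4 = 0.2
alpha = (1/2) * np.log((1 - error) / error)            # 0.5 * ln(0.8 / 0.2) ≈ 0.693
print(error, alpha)                                     # error < 0.5, so this split would be accepted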
d. Combining the results
The final result is simply the different weak classifiers multiplied by their weights and added together:
weekClassifiers = adaBoost(iris.data, iris.target, 50)
For example, the first entry in the result below means
G0(x) = if the 4th feature value of x is >= 4.7, take 1; if it is below 4.7, take -1; then multiply the result by the weight 0.09360577104407318.
The outputs of the remaining classifiers are added up in the same way; if the final sum is greater than 0 the predicted class is 1, otherwise it is -1.
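As a sketch of that combination step (predictOne is my own helper, not part of the post's code; it reads each entry the way the post does, i.e. it applies the threshold to the 4th feature, index 3, even though the code actually stores featureNo there):

def predictOne(x, classifiers):
    # weighted vote: sum alpha * G(x) over all weak classifiers, then take the sign
    total = 0.0
    for _, (_, featureValue, alpha) in classifiers.items():
        g = 1 if x[3] >= featureValue else -1   # 4th feature (index 3), as interpreted above
        total += alpha * g
    return 1 if total > 0 else -1

print(predictOne(iris.data[0], weekClassifiers))   # -> -1, matching iris.target[0]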
Computed result:
{'WClassifier0': [4, 4.7, 0.09360577104407318], 'WClassifier1': [4, 4.9, 0.03509886861946353], 'WClassifier2': [4, 4.9, 0.09009038757777413], 'WClassifier3': [4, 4.9, 0.24307972442062353], 'WClassifier4': [4, 5.1, 0.1719016528852874], 'WClassifier5': [4, 5.1, 0.5145727335107229]}
Full code:
import numpy as np
# AdaBoost classification
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
def calculateAlpha(error):
    # classifier weight: alpha = 1/2 * ln((1 - error) / error)
    if error == 0:
        return 0
    else:
        return (1/2) * (np.log((1-error)/error))

def calculateWeight(alpha, originalWeight, y, g):
    # sample weight update: w * exp(-alpha * y * g)
    weight = originalWeight * (np.e ** (-alpha * y * g))
    return weight

def sign(value):
    if value > 0:
        return 1
    else:
        return -1

def learn(x, alpha, featureValue):
    # weak classifier output for a single value x and threshold featureValue
    if x >= featureValue:
        return sign(alpha * 1)
    else:
        return sign(alpha * -1)
def findBestCut(dataset, target, weights):
    shape = np.shape(dataset)
    featureNo = shape[1]
    newWeights = []
    for featureSeq in range(featureNo):
        featureRow = dataset[:, featureSeq]
        for featureIndex in range(len(featureRow)):
            featureValue = featureRow[featureIndex]
            resultRow = featureRow.copy()
            # label samples by the candidate threshold: >= featureValue -> 1, else -1
            for index in range(len(resultRow)):
                if resultRow[index] >= featureValue:
                    resultRow[index] = 1
                else:
                    resultRow[index] = -1
            # weighted sum of prediction * label, used here as the error measure
            error = np.dot(np.multiply(resultRow, target), weights.T)
            if error < 0.5:
                alpha = calculateAlpha(error)
                print('alpha is :' + str(alpha))
                # update every sample weight and normalise
                for weightNo in range(len(weights)):
                    g = learn(featureRow[weightNo], alpha, featureValue)
                    newWeights.append(calculateWeight(alpha, weights[weightNo], target[weightNo], g))
                newWeights = newWeights / sum(newWeights)
                return featureValue, alpha, newWeights
    return None, None, newWeights
    # newWeights = newWeights/sum(newWeights)
    # return featureValue, alpha, newWeights
def adaBoost(dataset, target, weekClassifierNo=4):
    shape = np.shape(dataset)
    recordNo = shape[0]
    weights = np.ones(recordNo) / recordNo   # start with uniform sample weights
    featureNo = shape[1]
    weekClassifiers = {}
    for classifierSeq in range(weekClassifierNo):
        # search for a good split in each round
        featureValue, alpha, newWeights = findBestCut(dataset, target, weights)
        if len(newWeights) == 0:
            break
        # note: featureNo (the number of features, always 4 for iris) is stored here,
        # not the index of the feature that was actually chosen
        weekClassifiers["WClassifier" + str(classifierSeq)] = [featureNo, featureValue, alpha]
        weights = newWeights
    return weekClassifiers
iris = load_iris()
iris.target[iris.target > 0] = 1    # classes 1 and 2 become 1
iris.target[iris.target == 0] = -1  # class 0 becomes -1
weekClassifiers = adaBoost(iris.data, iris.target, 50)
print(weekClassifiers)
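The full code imports cross_val_score and AdaBoostClassifier without using them; presumably they were intended for a comparison with scikit-learn's built-in implementation. A minimal sketch of such a comparison (my assumption, not part of the original post):

# cross-validated accuracy of sklearn's AdaBoostClassifier on the same binarised labels
clf = AdaBoostClassifier(n_estimators=50)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores.mean())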