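This post builds an ensemble classifier in which, as the algorithm requires, each sub-classifier is trained on a different dataset, and the sub-classifiers are then combined by weighted soft voting.
sklearn's VotingClassifier fits all of its estimators on the same dataset, so I wrote my own version; if anything is wrong, corrections are welcome.
Each sub-classifier must be created inside the loop. Otherwise every stored entry refers to the same estimator object, which is simply refitted on each pass.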
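To illustrate why the estimator has to be created per iteration, here is a minimal sketch. The two toy datasets and the names `H_shared`, `base`, and `datasets` are assumptions made for this example only; `sklearn.base.clone` is used as one possible way to get a fresh, unfitted copy, and is not part of the original code.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Two small toy datasets of different sizes (hypothetical, for illustration only).
Xa, ya = make_classification(n_samples=100, random_state=0)
Xb, yb = make_classification(n_samples=200, random_state=1)
datasets = [(Xa, ya), (Xb, yb)]

# Pitfall: a single estimator created outside the loop is refitted in place,
# so every dictionary entry points at the same object.
shared = KNeighborsClassifier(n_neighbors=5)
H_shared = {}
for i, (X_i, y_i) in enumerate(datasets):
    shared.fit(X_i, y_i)
    H_shared[i] = shared
print(H_shared[0] is H_shared[1])  # True

# Fix: build a fresh, unfitted copy for each dataset.
base = KNeighborsClassifier(n_neighbors=5)
H = {}
for i, (X_i, y_i) in enumerate(datasets):
    H[i] = clone(base).fit(X_i, y_i)
print(H[0] is H[1])  # False
print(H[0].n_samples_fit_, H[1].n_samples_fit_)  # 100 200
```

Calling the constructor inside the loop, as the code in this post does, has the same effect as `clone` here.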
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Five independent synthetic datasets with increasingly balanced classes.
# xx[0..3] train the sub-classifiers; xx[4] is used for evaluation.
xx = []
yy = []
for w in ([0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]):
    X_i, y_i = make_classification(n_classes=2, class_sep=2, weights=w,
                                   n_informative=3, n_redundant=1, flip_y=0,
                                   n_features=20, n_clusters_per_class=1, n_samples=1000)
    xx.append(X_i)
    yy.append(y_i)
H = {}  # fitted sub-classifiers, keyed by index
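# Create a new classifier inside the loop so each dataset gets its own model.
for i in range(4):
    clf1 = KNeighborsClassifier(n_neighbors=5)
    clf1.fit(xx[i], yy[i])
    H[i] = clf1
W = [0.1, 0.2, 0.3, 0.4]  # voting weight of each sub-classifier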
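# Accumulate the weighted class probabilities predicted for the evaluation set xx[4].
P = np.zeros((len(xx[4]), 2))
for i in range(len(H)):
    p = H[i].predict_proba(xx[4])
    print(p)
    t = H[i].predict(xx[4])
    print('single classifier acc--------', accuracy_score(yy[4], t))
    p = np.array(p)
    p = p * W[i]
    P = P + p
P = P / len(H)  # constant rescaling; it does not change which class scores higher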
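# Pick the class with the larger weighted probability (ties go to class 0).
x = []
for i in range(len(P)):
    if P[i, 1] > P[i, 0]:
        x.append(1)  # 1 is the majority class
    else:
        x.append(0)
print('ensemble result', accuracy_score(yy[4], x))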
Results:
[[0. 1. ]
[0. 1. ]
[0.6 0.4]
...
[0. 1. ]
[0.6 0.4]
[0.8 0.2]]
single classifier acc-------- 0.441
[[0. 1. ]
[0. 1. ]
[0.6 0.4]
...
[0. 1. ]
[0.2 0.8]
[0.8 0.2]]
single classifier acc-------- 0.49
[[0. 1. ]
[0. 1. ]
[0.4 0.6]
...
[0.2 0.8]
[1. 0. ]
[0.6 0.4]]
single classifier acc-------- 0.342
[[0.4 0.6]
[1. 0. ]
[0.8 0.2]
...
[0.2 0.8]
[0.2 0.8]
[0. 1. ]]
single classifier acc-------- 0.53
ensemble result 0.442
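The voting step can also be wrapped in a small reusable function. The sketch below is one possible way to do it, not a definitive implementation: the helper name `weighted_soft_vote` is hypothetical, and it assumes every model was fitted on the same set of class labels. Note also that each `make_classification` call without a shared `random_state` generates an independent random problem, which likely explains why the accuracies above sit near 0.5. For that reason this sketch draws the training subsets and the test set from one dataset with a fixed seed, which is an assumption that differs from the original setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier


def weighted_soft_vote(models, weights, X):
    """Return class labels from a weighted average of predict_proba outputs."""
    weights = np.asarray(weights, dtype=float)
    # Assumption: all models share the same classes_, so the columns of
    # predict_proba line up across models.
    classes = models[0].classes_
    for m in models[1:]:
        if not np.array_equal(m.classes_, classes):
            raise ValueError("all models must share the same classes_")
    # Stack to shape (n_models, n_samples, n_classes), then average over models.
    probas = np.stack([m.predict_proba(X) for m in models])
    avg = np.average(probas, axis=0, weights=weights)
    # argmax returns the first maximum, so ties go to the lower class index.
    return classes[np.argmax(avg, axis=1)]


# One dataset, split into four training subsets plus a held-out test set.
X, y = make_classification(n_samples=2500, n_features=20, n_informative=3,
                           n_redundant=1, n_clusters_per_class=1,
                           class_sep=2, flip_y=0, random_state=0)
X_test, y_test = X[2000:], y[2000:]

models = []
for i in range(4):
    part = slice(i * 500, (i + 1) * 500)
    models.append(KNeighborsClassifier(n_neighbors=5).fit(X[part], y[part]))

pred = weighted_soft_vote(models, [0.1, 0.2, 0.3, 0.4], X_test)
print("ensemble accuracy:", np.mean(pred == y_test))
```

`np.average` normalizes by the sum of the weights, so the weights do not need to sum to 1 and no extra division by the number of models is needed.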