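I've recently been working on prediction, something I had never touched before. Even Python is a language I picked up on the fly from a simple tutorial.
I started on sklearn yesterday and googled my way through a lot of blogs. The sklearn homepage also has very detailed documentation, but I don't have much time to study it. So here is the process that got me using sklearn quickly.
First you need train_data (the training data), train_target (the true labels of the training data), test_data (the test data), and test_target (the true labels of the test data, used to check how accurate the predictions are).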
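# Predict whether a person is overweight (0 = no, 1 = yes) from height, weight, and sex (male = 1, female = 2)
# I made this example up, though the values are realistic
# train_data and test_data are 2-D lists [[height, weight, sex], ...]; train_target is a 1-D list of overweight labels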
train_data = [[160, 60, 1], [155, 80, 1], [178, 53, 2], [158, 53, 2], [166, 45, 2], [170, 50, 2], [156, 56, 2],
[166, 50, 1], [175, 55, 1], [188, 68, 1], [159, 41, 2], [166, 70, 1], [175, 85, 1], [188, 98, 1],
[159, 61, 2]]
train_target = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
test_data = [[166, 45, 2], [172, 52, 1], [156, 60, 1], [150, 70, 2]]
test_target = [0, 0, 1, 1]
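Prediction mainly relies on two methods: fit(), which trains the model on the training data, and predict(), which labels new data.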
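The fit()/predict() pattern is the same for every classifier used in this post, so it can be wrapped once. The following is only a minimal sketch: the helper name fit_and_predict is hypothetical (it is not part of sklearn), and GaussianNB is chosen arbitrarily as the model. score() is the classifier's built-in accuracy check.

```python
from sklearn.naive_bayes import GaussianNB

# Same toy data as above: [height, weight, sex] -> overweight (0 or 1)
train_data = [[160, 60, 1], [155, 80, 1], [178, 53, 2], [158, 53, 2], [166, 45, 2],
              [170, 50, 2], [156, 56, 2], [166, 50, 1], [175, 55, 1], [188, 68, 1],
              [159, 41, 2], [166, 70, 1], [175, 85, 1], [188, 98, 1], [159, 61, 2]]
train_target = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
test_data = [[166, 45, 2], [172, 52, 1], [156, 60, 1], [150, 70, 2]]
test_target = [0, 0, 1, 1]


def fit_and_predict(model, x_train, y_train, x_test):
    """Hypothetical helper: train any sklearn-style classifier, then predict."""
    model.fit(x_train, y_train)   # learn from the labelled training data
    return model.predict(x_test)  # NumPy array with one predicted label per row


model = GaussianNB()
predicted = fit_and_predict(model, train_data, train_target, test_data)

# score() returns the fraction of test samples that were labelled correctly
accuracy = model.score(test_data, test_target)
print(predicted.tolist(), accuracy)
```

Swapping GaussianNB() for another classifier object should be the only change needed to try a different method.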
Here are a few commonly used classifiers:
SVM
from sklearn import svm
clf = svm.SVC()
clf.fit(train_data, train_target)
result = clf.predict(test_data)
print(type(result))  # a NumPy array; use result.tolist() to convert it to a list
print(result)  # [0 1 1 1]
Logistic Regression
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(train_data, train_target)
result = clf.predict_proba(test_data)  # class probabilities, one row per test sample
print(result)
# [[ 0.95138903 0.04861097]
# [ 0.85670921 0.14329079]
# [ 0.18763392 0.81236608]
# [ 0.01270012 0.98729988]]
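predict_proba() returns probabilities rather than labels: each row has one column per class, in the order of clf.classes_ (here 0, then 1). As a rough sketch, the numbers printed above can be turned into labels by picking the more probable class in each row. The variable names below are only illustrative, and calling clf.predict(test_data) would give the labels directly.

```python
# Probabilities copied from the predict_proba output above
probabilities = [
    [0.95138903, 0.04861097],
    [0.85670921, 0.14329079],
    [0.18763392, 0.81236608],
    [0.01270012, 0.98729988],
]
classes = [0, 1]  # assumed column order, matching clf.classes_ for labels 0 and 1

# For each row, take the class whose probability is largest
labels = [classes[row.index(max(row))] for row in probabilities]
print(labels)  # [0, 0, 1, 1]
```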
LinearSVC
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(train_data, train_target)
result = clf.predict(test_data)
print(result)  # [0 0 0 1]
Naive Bayes
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB().fit(train_data, train_target)
result = gnb.predict(test_data)
print(result)  # [0 0 1 1]
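In summary, Naive Bayes and Logistic Regression (taking the more probable class in each row) labelled all four test samples correctly in this example, while SVC and LinearSVC each got one wrong. The dataset is tiny, though, so treat this comparison with caution.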
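To make the comparison concrete, here is a small sketch that scores the predictions reported above against test_target. The predictions are copied from the printed outputs in this post (for Logistic Regression, the more probable class per row), not recomputed, so other sklearn versions may give different results. The accuracy helper is written by hand for illustration.

```python
test_target = [0, 0, 1, 1]

# Predictions as reported in the outputs above
reported = {
    "SVC": [0, 1, 1, 1],
    "LogisticRegression": [0, 0, 1, 1],  # most probable class per row
    "LinearSVC": [0, 0, 0, 1],
    "GaussianNB": [0, 0, 1, 1],
}


def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


scores = {name: accuracy(pred, test_target) for name, pred in reported.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
# SVC: 0.75
# LogisticRegression: 1.00
# LinearSVC: 0.75
# GaussianNB: 1.00
```

With only four test samples, a single wrong label moves the accuracy by 25 points, which is why the ranking should not be read too seriously.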