Sklearn Examples

Generalized Linear Models

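Generalized linear models include ordinary least squares regression, the perceptron, logistic regression, ridge regression, Bayesian regression, and others; all are imported from the sklearn.linear_model module. A generalized linear model fits a linear function (figure below) to the data and uses it for sample classification or regression prediction.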
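As a reference point, the linear function that these models fit is commonly written as below. The notation ($\hat{y}$ for the prediction, $w_0$ for the intercept, $w_1, \dots, w_p$ for the coefficients, $x_1, \dots, x_p$ for the features) is assumed here and does not come from the original text.

```latex
\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p
```

In scikit-learn, the fitted $w_1, \dots, w_p$ are exposed as coef_ and $w_0$ as intercept_.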

  • Linear regression
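# -*- coding: utf-8 -*-
from sklearn import linear_model

model = linear_model.LinearRegression()  # ordinary least squares regression
model.fit([[0, 0], [1, 1], [2, 2]], [1, 2, 3])  # fit the model
print(model.coef_)  # fitted coefficients
print(model.intercept_)  # fitted intercept
print(model.predict([[3, 3]]))  # predict expects a 2-D array: one row per sample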
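The other estimators in sklearn.linear_model follow the same fit/predict pattern. The sketch below, offered as an illustration rather than part of the original example, fits ridge regression next to ordinary least squares on the same toy data; the regularization strength alpha=1.0 and the variable names are assumptions chosen for the demonstration.

```python
from sklearn import linear_model

# Same toy data as the linear regression example
X = [[0, 0], [1, 1], [2, 2]]
y = [1, 2, 3]

# Ordinary least squares: no regularization
ols = linear_model.LinearRegression()
ols.fit(X, y)

# Ridge regression: least squares plus an L2 penalty on the coefficients
# (alpha=1.0 is an assumed value for illustration)
ridge = linear_model.Ridge(alpha=1.0)
ridge.fit(X, y)

ols_pred = ols.predict([[3, 3]])[0]
ridge_pred = ridge.predict([[3, 3]])[0]
print("OLS prediction:  ", ols_pred)
print("Ridge prediction:", ridge_pred)
```

The penalty shrinks the ridge coefficients toward zero, so on this data its prediction for [3, 3] comes out lower than the least squares one.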
  • Decision tree hands-on project: iris classification
# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import model_selection  # replaces the removed cross_validation module
from sklearn import tree
from sklearn import metrics

iris = datasets.load_iris()
iris_X = iris.data
iris_Y = iris.target

# Hold out 33% of the samples as the test set
feature_train, feature_test, target_train, target_test = model_selection.train_test_split(
    iris_X, iris_Y, test_size=0.33, random_state=42)

# All parameters are left at their defaults
dt_model = tree.DecisionTreeClassifier()
dt_model.fit(feature_train, target_train)
predict_result = dt_model.predict(feature_test)

print(predict_result)
# Check the accuracy of the predictions with a metrics function
print(metrics.accuracy_score(target_test, predict_result))

# Use the model's built-in scoring method
scores = dt_model.score(feature_test, target_test)
print(scores)
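Both evaluation calls above report the same quantity: the fraction of test samples whose predicted label matches the true label. The sketch below recomputes that fraction by hand with NumPy to make the relationship explicit. Fixing random_state=0 on the classifier is an assumption added here so that repeated runs give the same tree; the variable names are likewise illustrative.

```python
import numpy as np
from sklearn import datasets, metrics, model_selection, tree

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42)

# random_state=0 is an assumption added for reproducibility
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy by hand: share of predictions equal to the true labels
manual_accuracy = np.mean(y_pred == y_test)
metric_accuracy = metrics.accuracy_score(y_test, y_pred)
model_accuracy = clf.score(X_test, y_test)

print(manual_accuracy, metric_accuracy, model_accuracy)
```

All three printed values should agree, since score on a classifier is defined as mean accuracy.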
  • K-means

$ wget http://labfile.oss.aliyuncs.com/courses/880/cluster_data.csv  # download the dataset

import pandas as pd  # data handling
import matplotlib.pyplot as plt  # plotting
from sklearn import cluster

# Load the data file
data = pd.read_csv("cluster_data.csv", header=0)
print(data)
X = data['x']  # x coordinates
y = data['y']  # y coordinates

# plt.scatter(X, y)  # draw the scatter plot
# plt.show()  # show the figure

# k_means returns a tuple: (cluster centers, labels, inertia)
kn_model = cluster.k_means(data, n_clusters=3)
cluster_centers = kn_model[0]  # array of cluster centers
cluster_labels = kn_model[1]  # array of cluster labels
plt.scatter(X, y, c=cluster_labels)  # draw the samples, colored by cluster label
# Draw the cluster centers as pentagon markers with red edges
for center in cluster_centers:
    plt.scatter(center[0], center[1], marker="p", edgecolors="red")
plt.show()  # show the figure
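If cluster_data.csv is not available, the same call can be tried on synthetic data. The following is a minimal sketch under that assumption: make_blobs generates three 2-D groups of points, and the sample count, n_init=10, and random_state=0 are illustrative choices, not values from the original.

```python
from sklearn import cluster
from sklearn.datasets import make_blobs

# Synthetic stand-in for cluster_data.csv (an assumption for this sketch):
# 150 two-dimensional points drawn around 3 centers
points, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)

# k_means returns (cluster centers, per-sample labels, inertia)
centers, labels, inertia = cluster.k_means(
    points, n_clusters=3, n_init=10, random_state=0)

print(centers.shape)  # one row per cluster center
print(labels[:10])    # cluster index assigned to the first ten samples
print(inertia)        # total within-cluster squared distance
```

The third element of the returned tuple, the inertia, is the sum of squared distances from each sample to its nearest cluster center.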

