Python ML: Importing the Various scikit-learn Modules (with Code)

What will you learn from this article? It is intended for study only; if you have questions, please leave a comment.

 

# -*- coding: utf-8 -*-
# Author       :   szy
# Create Date  :   2019/10/15


# An in-depth but accessible look at model selection with sklearn in Python
notebook = """
I. Main capabilities:
    1. Classification
    2. Regression
    3. Clustering
    4. Dimensionality reduction
    5. Model selection
    6. Preprocessing

II. Main modules:
    1. sklearn.base: Base classes and utility functions
    2. sklearn.cluster: Clustering
    3. sklearn.cluster.bicluster: Biclustering (in recent releases, import the biclustering classes from sklearn.cluster)
    4. sklearn.covariance: Covariance estimators
    5. sklearn.model_selection: Model selection
    6. sklearn.datasets: Datasets
    7. sklearn.decomposition: Matrix decomposition
    8. sklearn.dummy: Dummy estimators
    9. sklearn.ensemble: Ensemble methods
    10. sklearn.exceptions: Exceptions and warnings
    11. sklearn.feature_extraction: Feature extraction
    12. sklearn.feature_selection: Feature selection
    13. sklearn.gaussian_process: Gaussian processes
    14. sklearn.isotonic: Isotonic regression
    15. sklearn.kernel_approximation: Kernel approximation
    16. sklearn.kernel_ridge: Kernel ridge regression
    17. sklearn.discriminant_analysis: Discriminant analysis
    18. sklearn.linear_model: Generalized linear models
    19. sklearn.manifold: Manifold learning
    20. sklearn.metrics: Metrics
    21. sklearn.mixture: Gaussian mixture models
    22. sklearn.multiclass: Multiclass and multilabel classification
    23. sklearn.multioutput: Multioutput regression and classification
    24. sklearn.naive_bayes: Naive Bayes
    25. sklearn.neighbors: Nearest neighbors
    26. sklearn.neural_network: Neural network models
    27. sklearn.calibration: Probability calibration
    28. sklearn.cross_decomposition: Cross decomposition
    29. sklearn.pipeline: Pipeline
    30. sklearn.preprocessing: Preprocessing and normalization
    31. sklearn.random_projection: Random projection
    32. sklearn.semi_supervised: Semi-supervised learning
    33. sklearn.svm: Support vector machines
    34. sklearn.tree: Decision trees
    35. sklearn.utils: Utilities
"""
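
To make the module list above concrete, here is a minimal sketch that pulls one piece from each of five listed modules (datasets, preprocessing, linear_model, pipeline, model_selection) and chains them together. The bundled iris dataset and the `demo_` variable names are assumptions chosen for illustration; they are not part of the original script.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# load_iris ships with scikit-learn, so nothing is downloaded
demo_X, demo_y = load_iris(return_X_y=True)

# hold out a quarter of the samples for evaluation
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.25, random_state=0, stratify=demo_y
)

# a pipeline applies scaling and the classifier as a single estimator
demo_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
demo_pipe.fit(demo_X_train, demo_y_train)

demo_accuracy = demo_pipe.score(demo_X_test, demo_y_test)
print(f"held-out accuracy: {demo_accuracy:.3f}")
```

Wrapping the scaler and the model in one pipeline means the scaling statistics are learned from the training split only.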
# Data loading
import io
import numpy as np
import requests
# URL of the dataset (the file may no longer be hosted at this address)
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# download the file and fail early on an HTTP error
raw_data = requests.get(url)
raw_data.raise_for_status()
# load the CSV text as a numpy array (np.loadtxt needs a file-like object, not a Response)
dataset = np.loadtxt(io.StringIO(raw_data.text), delimiter=",")
# separate the features (columns 0-7) from the target (column 8)
X = dataset[:, 0:8]
y = dataset[:, 8]
"----------------------------------------------------------------------------------------"
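
If the download is unavailable, the parsing step can still be tried offline. The sketch below feeds a few comma-separated rows to `np.loadtxt` through `io.StringIO`. The rows are illustrative placeholders in the same nine-column layout (eight features followed by a 0/1 label), and the `sample_` names are hypothetical.

```python
import io

import numpy as np

# placeholder rows: 8 feature columns, then the class label in the last column
sample_csv = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

# np.loadtxt accepts any file-like object, so wrap the text in StringIO
sample = np.loadtxt(io.StringIO(sample_csv), delimiter=",")

sample_X = sample[:, :8]  # feature matrix
sample_y = sample[:, 8]   # target vector
print(sample_X.shape, sample_y)
```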


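# Data normalization
"""
The gradient-based methods behind most machine learning algorithms are sensitive to the scale of the data,
so the features should be normalized or standardized before an algorithm is run.
scikit-learn provides both: normalize() rescales each sample to unit norm,
and scale() standardizes each feature to zero mean and unit variance
(to map features into the 0-1 range, use MinMaxScaler instead).
"""
from sklearn import preprocessing
normalized_X = preprocessing.normalize(X)
standardized_X = preprocessing.scale(X)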
"------------------------------------------------------------------------------------"
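
The difference between these rescaling options is easier to see on a tiny array. The following is a small sketch on made-up numbers (the `toy` array is an assumption, not data from the article) comparing `normalize`, `scale`, and `MinMaxScaler`.

```python
import numpy as np
from sklearn import preprocessing

# two features on very different scales (made-up values)
toy = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0]])

# normalize works row by row: every sample ends up with unit L2 norm
row_normalized = preprocessing.normalize(toy)

# scale works column by column: every feature gets zero mean and unit variance
standardized = preprocessing.scale(toy)

# MinMaxScaler maps every feature into the [0, 1] range
min_max = preprocessing.MinMaxScaler().fit_transform(toy)

print(np.linalg.norm(row_normalized, axis=1))
print(standardized.mean(axis=0), standardized.std(axis=0))
print(min_max.min(axis=0), min_max.max(axis=0))
```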

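# Feature selection
"""
When solving a real problem, the ability to choose suitable features, or to construct them, matters a great deal.
This is called feature selection or feature engineering.
Feature selection is a creative process that leans heavily on intuition and domain knowledge,
but many ready-made algorithms can also help with it.
The tree-based algorithm below computes the importance of each feature:
"""
from sklearn import metrics
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(X,y)
print(model.feature_importances_)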
"-----------------------------------------------------------------------------"
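
Printing the importances is only half of feature selection; the scores still have to be turned into a reduced feature matrix. One possible way, sketched here on synthetic data, is `SelectFromModel` with a mean-importance threshold. The synthetic dataset, the threshold, and the `fs_` names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# synthetic data: 8 features, of which only 3 carry signal
fs_X, fs_y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(fs_X, fs_y)

# keep the features whose importance is at least the mean importance
selector = SelectFromModel(forest, prefit=True, threshold="mean")
fs_X_reduced = selector.transform(fs_X)

print(forest.feature_importances_.round(3))
print(selector.get_support())
print(fs_X.shape, "->", fs_X_reduced.shape)
```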

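# Algorithm summary
# 01 Logistic regression
# Many problems can be reduced to binary classification. An advantage of this algorithm is that it outputs the probability of each class for a sample.

from sklearn import metrics
from sklearn.linear_model import LogisticRegression
# raise the iteration cap so the solver has room to converge on unscaled features
model = LogisticRegression(max_iter=1000)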
model.fit(X,y)
print(model)
# make predictions
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected,predicted))
print(metrics.confusion_matrix(expected,predicted))
"------------------------------------------------------------------------------"
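
The class probabilities mentioned for logistic regression are exposed through `predict_proba`. The sketch below uses synthetic data and a held-out split, both assumptions made so the example runs without the download. The script above scores each model on its own training data, which tends to overstate accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary classification problem with 8 features
lr_X, lr_y = make_classification(n_samples=400, n_features=8, random_state=0)
lr_X_train, lr_X_test, lr_y_train, lr_y_test = train_test_split(
    lr_X, lr_y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(lr_X_train, lr_y_train)

# one row per test sample, one column per class, in the order of clf.classes_
proba = clf.predict_proba(lr_X_test)
print(clf.classes_)
print(proba[:3].round(3))
print("held-out accuracy:", round(clf.score(lr_X_test, lr_y_test), 3))
```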

# 02 Naive Bayes
# This method estimates the distribution density of the training data; it works well for multiclass classification.
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X,y)
print(model)
# make predictions
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected,predicted))
print(metrics.confusion_matrix(expected,predicted))
"------------------------------------------------------------------------------"

# 03 k-nearest neighbors
from sklearn.neighbors import KNeighborsClassifier
# fit a k-nearest neighbor model to the data
model = KNeighborsClassifier()
model.fit(X,y)
print(model)
# make predictions
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected,predicted))
print(metrics.confusion_matrix(expected,predicted))
"------------------------------------------------------------------------------"

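# 04 Decision trees
# The Classification and Regression Trees (CART) algorithm
# is often used for classification or regression problems with categorical features, and it suits multiclass problems well.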
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X,y)
print(model)
# make predictions
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected,predicted))
print(metrics.confusion_matrix(expected,predicted))
"------------------------------------------------------------------------------"

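# 05 Support vector machines
# SVM is a very popular machine learning algorithm, used mainly for classification.
# Like logistic regression, it can handle multiclass problems with a one-vs-rest scheme
# (scikit-learn's SVC itself trains one-vs-one classifiers internally).
# SVC is for classification, SVR for regression
from sklearn.svm import SVC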
model = SVC()
model.fit(X,y)
print(model)
# make predictions
expected = y
predicted = model.predict(X)
# summarize the fit of the model
print(metrics.classification_report(expected,predicted))
print(metrics.confusion_matrix(expected,predicted))
"------------------------------------------------------------------------------"
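
All five classifiers above are scored on the data they were trained on. A fairer comparison uses cross-validation, as in this minimal sketch. The synthetic dataset, the five-fold setting, and the `cv_` names are assumptions, and the resulting numbers say nothing about the diabetes data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

cv_X, cv_y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "k-nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

cv_means = {}
for name, estimator in candidates.items():
    # 5-fold cross-validation: each fold is scored by a model that never saw it
    scores = cross_val_score(estimator, cv_X, cv_y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```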

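# How to tune algorithm parameters
"""
A harder task is building an effective procedure for choosing the right parameters; they have to be found by search.
scikit-learn provides functions for this. The example below selects a regularization parameter.
"""
import numpy as np
from sklearn.linear_model import Ridge
# GridSearchCV lives in sklearn.model_selection (sklearn.grid_search was removed)
from sklearn.model_selection import GridSearchCV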
# prepare a range of alpha values to test
alphas = np.array([1,0.1,0.01,0.001,0.0001,0])
# create and fit a ridge regression model, testing each alpha
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=dict(alpha=alphas))
grid.fit(X, y)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)
# Sampling parameters at random from a given range is sometimes very effective: evaluate the algorithm with each sample and keep the best one.
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
# prepare a uniform distribution to sample for the alpha parameter
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(X, y)
print(rsearch)
# summarize the results of the random parameter search
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)
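
A uniform distribution on [0, 1] rarely samples very small alpha values. As a hedged variant of the random search above, the sketch below samples alpha on a log scale with `scipy.stats.loguniform` (available in recent SciPy releases) and fixes `random_state` for repeatability. The synthetic regression data, the alpha range, and the `rs_` names are assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# synthetic regression problem so the example runs without a download
rs_X, rs_y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

search = RandomizedSearchCV(
    estimator=Ridge(),
    # sample alpha so that every order of magnitude is equally likely
    param_distributions={"alpha": loguniform(1e-4, 1e1)},
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(rs_X, rs_y)

# best_score_ is the mean cross-validated R^2 of the best candidate
print(search.best_params_, round(search.best_score_, 3))
```

The full table of sampled candidates and their scores is available in `search.cv_results_`.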








 

 
