Combining Different Models for Ensemble Learning

Python Machine Learning 2nd Edition by Sebastian Raschka, Packt Publishing Ltd. 2017

Code Repository: https://github.com/rasbt/python-machine-learning-book-2nd-edition

Code License: MIT License

Table of Contents

  • Learning with ensembles
  • Combining classifiers via majority vote
    • Implementing a simple majority vote classifier
    • Using the majority voting principle to make predictions
  • Evaluating and tuning the ensemble classifier
  • Bagging -- Building an ensemble of classifiers from bootstrap samples
    • Bagging in a nutshell
    • Applying bagging to classify samples in the Wine dataset
  • Leveraging weak learners via adaptive boosting
    • How boosting works
    • Applying AdaBoost using scikit-learn
  • Summary

Note that the optional watermark extension is a small IPython notebook plugin that I developed to make the code reproducible. You can just skip the following line(s).

%load_ext watermark
%watermark -a "Sebastian Raschka" -u -d -v -p numpy,pandas,matplotlib,scipy,sklearn
Sebastian Raschka 
last updated: 2017-07-22 

CPython 3.6.1
IPython 6.0.0

numpy 1.13.1
pandas 0.20.2
matplotlib 2.0.2
scipy 0.19.1
sklearn 0.19b2

The use of watermark is optional. You can install this IPython extension via “pip install watermark”. For more information, please see: https://github.com/rasbt/watermark.





Chapter 7. Combining Different Models for Ensemble Learning

In the previous chapter, we focused on best practices for tuning and evaluating different models for classification. In this chapter, we will build on those techniques and explore different methods for constructing a set of classifiers that can often have a better predictive performance than any of its individual members. We will learn how to do the following:

  • Make predictions based on majority voting
  • Use bagging to reduce overfitting by drawing random combinations of the training set with repetition
  • Apply boosting to build powerful models from weak learners that learn from their mistakes

Learning with ensembles

The goal of ensemble methods is to combine different classifiers into a meta-classifier that has better generalization performance than each individual classifier alone. For example, assuming that we collected predictions from 10 experts, ensemble methods allow us to strategically combine those predictions to arrive at a prediction that is more accurate and robust than the prediction of each individual expert. As we will see later in this chapter, there are several different approaches for creating an ensemble of classifiers. In this section, we will build a basic intuition for how ensembles work and why they are typically recognized for yielding good generalization performance. In this chapter, we will focus on the most popular ensemble methods, which use the majority voting principle. Majority voting simply means that we select the class label that has been predicted by the majority of classifiers, that is, received more than 50 percent of the votes. Strictly speaking, the term majority vote refers to binary class settings only. However, it is easy to generalize the majority voting principle to multi-class settings, which is known as plurality voting. Here, we select the class label that received the most votes (the mode). The following figure illustrates the concept of an ensemble of 10 classifiers, where each unique symbol (triangle, square, and circle) represents a unique class label.

Using the training set, we start by training m different classifiers ($C_1, \dots, C_m$). Depending on the technique, the ensemble can be built from different algorithms, for example, decision trees, support vector machines, logistic regression classifiers, and so on. Alternatively, we can also use the same base classification algorithm, fitting different subsets of the training set. One prominent example of this approach is the random forest algorithm, which combines different decision tree classifiers. The following figure illustrates the concept of a general ensemble approach using majority voting.
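To build an intuition for why an ensemble can outperform its individual members, consider n base classifiers that all have the same error rate ε and make independent errors (an idealized assumption). The majority vote is wrong only when at least k = ⌈n/2⌉ of the base classifiers err at the same time, so the ensemble error follows a binomial probability:

$\varepsilon_{\mathrm{ensemble}} = P(y \ge k) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} \varepsilon^{k} (1-\varepsilon)^{n-k}$

The ensemble_error function implemented below computes exactly this sum.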

from IPython.display import Image
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline


Image(filename='images/07_01.png', width=500) 


Image(filename='images/07_02.png', width=500) 


# scipy.misc.comb was deprecated and later removed;
# the function now lives in scipy.special
from scipy.special import comb
import math

def ensemble_error(n_classifier, error):
    # probability that at least half of the base classifiers
    # err simultaneously, assuming each errs independently
    # with probability `error`
    k_start = int(math.ceil(n_classifier / 2.))
    probs = [comb(n_classifier, k) * error**k * (1-error)**(n_classifier - k)
             for k in range(k_start, n_classifier + 1)]
    return sum(probs)
ensemble_error(n_classifier=11, error=0.25)
0.03432750701904297
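With 11 base classifiers that each err 25 percent of the time, the probability that the ensemble is wrong is only about 3.4 percent. The plot below shows that the ensemble error stays below the base error as long as the individual classifiers do better than random guessing (ε < 0.5); once ε exceeds 0.5, the ensemble performs worse than its members.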
import numpy as np

error_range = np.arange(0.0, 1.01, 0.01)
ens_errors = [ensemble_error(n_classifier=11, error=error)
              for error in error_range]
import matplotlib.pyplot as plt

plt.plot(error_range, 
         ens_errors, 
         label='Ensemble error', 
         linewidth=2)

plt.plot(error_range, 
         error_range, 
         linestyle='--',
         label='Base error',
         linewidth=2)

plt.xlabel('Base error')
plt.ylabel('Base/Ensemble error')
plt.legend(loc='upper left')
plt.grid(alpha=0.5)
#plt.savefig('images/07_03.png', dpi=300)
plt.show()




Combining classifiers via majority vote

Implementing a simple majority vote classifier

import numpy as np

np.argmax(np.bincount([0, 0, 1], 
                      weights=[0.2, 0.2, 0.6]))
1
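Here, class 0 collects a total weight of 0.2 + 0.2 = 0.4 while class 1 collects 0.6, so the weighted vote predicts class 1 even though two of the three classifiers voted for class 0.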
ex = np.array([[0.9, 0.1],
               [0.8, 0.2],
               [0.4, 0.6]])

p = np.average(ex, 
               axis=0, 
               weights=[0.2, 0.2, 0.6])
p
array([0.58, 0.42])
np.argmax(p)
0
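Based on the weighted average of the predicted probabilities, class 0 receives a score of 0.58 versus 0.42 for class 1, so the probability-based vote predicts class 0, in contrast to the weighted class-label vote above, which predicted class 1.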
from sklearn.base import BaseEstimator
from sklearn.base import ClassifierMixin
from sklearn.preprocessing import LabelEncoder
from sklearn.externals import six  # removed in scikit-learn 0.23; use the standalone `six` package on newer versions
from sklearn.base import clone
from sklearn.pipeline import _name_estimators  # private helper that names estimators the way Pipeline does
import numpy as np
import operator


class MajorityVoteClassifier(BaseEstimator, 
                             ClassifierMixin):
    """ A majority vote ensemble classifier

    Parameters
    ----------
    classifiers : array-like, shape = [n_classifiers]
      Different classifiers for the ensemble

    vote : str, {'classlabel', 'probability'} (default='classlabel')
      If 'classlabel' the prediction is based on the argmax of
        class labels. Else if 'probability', the argmax of
        the sum of probabilities is used to predict the class label
        (recommended for calibrated classifiers).

    weights : array-like, shape = [n_classifiers], optional (default=None)
      If a list of `int` or `float` values are provided, the classifiers
      are weighted by importance; Uses uniform weights if `weights=None`.

    """
    def __init__(self, classifiers, vote='classlabel', weights=None):

        self.classifiers = classifiers
        self.named_classifiers = {key: value for key, value
                                  in _name_estimators(classifiers)}
        self.vote = vote
        self.weights = weights

    def fit(self, X, y):
        """ Fit classifiers.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
            Matrix of training samples.

        y : array-like, shape = [n_samples]
            Vector of target class labels.

        Returns
        -------
        self : object

        """
        if self.vote not in ('probability', 'classlabel'):
            raise ValueError("vote must be 'probability' or 'classlabel'"
                             "; got (vote=%r)"
                             % self.vote)

        if self.weights and len(self.weights) != len(self.classifiers):
            raise ValueError('Number of classifiers and weights must be equal'
                             '; got %d weights, %d classifiers'
                             % (len(self.weights), len(self.classifiers)))

        # Use LabelEncoder to ensure class labels start with 0, which
        # is important for np.argmax call in self.predict
        self.lablenc_ = LabelEncoder()
        self.lablenc_.fit(y)
        self.classes_ = self.lablenc_.classes_
        self.classifiers_ = []
        for clf in self.classifiers:
            fitted_clf = clone(clf).fit(X, self.lablenc_.transform(y))
            self.classifiers_.append(fitted_clf)
        return self

    def predict(self, X):
        """ Predict class labels for X.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
            Matrix of training samples.

        Returns
        ----------
        maj_vote : array-like, shape = [n_samples]
            Predicted class labels.
            
        """
        if self.vote == 'probability':
            maj_vote = np.argmax(self.predict_proba(X), axis=1)
        else:  # 'classlabel' vote

            #  Collect results from clf.predict calls
            predictions = np.asarray([clf.predict(X)
                                      for clf in self.classifiers_]).T

            maj_vote = np.apply_along_axis(
                                      lambda x:
                                      np.argmax(np.bincount(x,
                                                weights=self.weights)),
                                      axis=1,
                                      arr=predictions)
        maj_vote = self.lablenc_.inverse_transform(maj_vote)
        return maj_vote

    def predict_proba(self, X):
        """ Predict class probabilities for X.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.

        Returns
        ----------
        avg_proba : array-like, shape = [n_samples, n_classes]
            Weighted average probability for each class per sample.

        """
        probas = np.asarray([clf.predict_proba(X)
                             for clf in self.classifiers_])
        avg_proba = np.average(probas, axis=0, weights=self.weights)
        return avg_proba

    def get_params(self, deep=True):
        """ Get classifier parameter names for GridSearch"""
        if not deep:
            return super(MajorityVoteClassifier, self).get_params(deep=False)
        else:
            out = self.named_classifiers.copy()
            for name, step in six.iteritems(self.named_classifiers):
                for key, value in six.iteritems(step.get_params(deep=True)):
                    out['%s__%s' % (name, key)] = value
            return out
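Note that we implemented our own get_params method so that GridSearchCV can access the parameters of the individual classifiers in the ensemble via the usual name__parameter convention; we will make use of this in the grid search example later in this chapter.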


Using the majority voting principle to make predictions

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[50:, [1, 2]], iris.target[50:]
le = LabelEncoder()
y = le.fit_transform(y)

X_train, X_test, y_train, y_test =\
       train_test_split(X, y, 
                        test_size=0.5, 
                        random_state=1,
                        stratify=y)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

clf1 = LogisticRegression(penalty='l2', 
                          C=0.001,
                          random_state=1)

clf2 = DecisionTreeClassifier(max_depth=1,
                              criterion='entropy',
                              random_state=0)

clf3 = KNeighborsClassifier(n_neighbors=1,
                            p=2,
                            metric='minkowski')

pipe1 = Pipeline([['sc', StandardScaler()],
                  ['clf', clf1]])
pipe3 = Pipeline([['sc', StandardScaler()],
                  ['clf', clf3]])

clf_labels = ['Logistic regression', 'Decision tree', 'KNN']

print('10-fold cross validation:\n')
for clf, label in zip([pipe1, clf2, pipe3], clf_labels):
    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='roc_auc')
    print("ROC AUC: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))
10-fold cross validation:

ROC AUC: 0.87 (+/- 0.17) [Logistic regression]
ROC AUC: 0.89 (+/- 0.16) [Decision tree]
ROC AUC: 0.88 (+/- 0.15) [KNN]
# Majority Rule (hard) Voting

mv_clf = MajorityVoteClassifier(classifiers=[pipe1, clf2, pipe3])

clf_labels += ['Majority voting']
all_clf = [pipe1, clf2, pipe3, mv_clf]

for clf, label in zip(all_clf, clf_labels):
    scores = cross_val_score(estimator=clf,
                             X=X_train,
                             y=y_train,
                             cv=10,
                             scoring='roc_auc')
    print("ROC AUC: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))
ROC AUC: 0.87 (+/- 0.17) [Logistic regression]
ROC AUC: 0.89 (+/- 0.16) [Decision tree]
ROC AUC: 0.88 (+/- 0.15) [KNN]
ROC AUC: 0.94 (+/- 0.13) [Majority voting]
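As we can see, the performance of the MajorityVoteClassifier improves over the individual classifiers in the 10-fold cross-validation evaluation, with an average ROC AUC of 0.94 versus 0.87 to 0.89 for its members.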


Evaluating and tuning the ensemble classifier

from sklearn.metrics import roc_curve
from sklearn.metrics import auc

colors = ['black', 'orange', 'blue', 'green']
linestyles = [':', '--', '-.', '-']
for clf, label, clr, ls \
        in zip(all_clf,
               clf_labels, colors, linestyles):

    # assuming the label of the positive class is 1
    y_pred = clf.fit(X_train,
                     y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_true=y_test,
                                     y_score=y_pred)
    roc_auc = auc(x=fpr, y=tpr)
    plt.plot(fpr, tpr,
             color=clr,
             linestyle=ls,
             label='%s (auc = %0.2f)' % (label, roc_auc))

plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1],
         linestyle='--',
         color='gray',
         linewidth=2)

plt.xlim([-0.1, 1.1])
plt.ylim([-0.1, 1.1])
plt.grid(alpha=0.5)
plt.xlabel('False positive rate (FPR)')
plt.ylabel('True positive rate (TPR)')


#plt.savefig('images/07_04', dpi=300)
plt.show()


sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
from itertools import product

all_clf = [pipe1, clf2, pipe3, mv_clf]

x_min = X_train_std[:, 0].min() - 1
x_max = X_train_std[:, 0].max() + 1
y_min = X_train_std[:, 1].min() - 1
y_max = X_train_std[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(nrows=2, ncols=2, 
                        sharex='col', 
                        sharey='row', 
                        figsize=(7, 5))

for idx, clf, tt in zip(product([0, 1], [0, 1]),
                        all_clf, clf_labels):
    clf.fit(X_train_std, y_train)
    
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.3)
    
    axarr[idx[0], idx[1]].scatter(X_train_std[y_train==0, 0], 
                                  X_train_std[y_train==0, 1], 
                                  c='blue', 
                                  marker='^',
                                  s=50)
    
    axarr[idx[0], idx[1]].scatter(X_train_std[y_train==1, 0], 
                                  X_train_std[y_train==1, 1], 
                                  c='green', 
                                  marker='o',
                                  s=50)
    
    axarr[idx[0], idx[1]].set_title(tt)

plt.text(-3.5, -5., 
         s='Sepal width [standardized]', 
         ha='center', va='center', fontsize=12)
plt.text(-12.5, 4.5, 
         s='Petal length [standardized]', 
         ha='center', va='center', 
         fontsize=12, rotation=90)

#plt.savefig('images/07_05', dpi=300)
plt.show()


mv_clf.get_params()
{'pipeline-1': Pipeline(memory=None,
          steps=[('sc',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ['clf',
                  LogisticRegression(C=0.001, class_weight=None, dual=False,
                                     fit_intercept=True, intercept_scaling=1,
                                     l1_ratio=None, max_iter=100,
                                     multi_class='warn', n_jobs=None,
                                     penalty='l2', random_state=1, solver='warn',
                                     tol=0.0001, verbose=0, warm_start=False)]],
          verbose=False),
 'decisiontreeclassifier': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=1,
                        max_features=None, max_leaf_nodes=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, presort=False,
                        random_state=0, splitter='best'),
 'pipeline-2': Pipeline(memory=None,
          steps=[('sc',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ['clf',
                  KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                       metric='minkowski', metric_params=None,
                                       n_jobs=None, n_neighbors=1, p=2,
                                       weights='uniform')]],
          verbose=False),
 'pipeline-1__memory': None,
 'pipeline-1__steps': [('sc',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ['clf',
   LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,
                      intercept_scaling=1, l1_ratio=None, max_iter=100,
                      multi_class='warn', n_jobs=None, penalty='l2',
                      random_state=1, solver='warn', tol=0.0001, verbose=0,
                      warm_start=False)]],
 'pipeline-1__verbose': False,
 'pipeline-1__sc': StandardScaler(copy=True, with_mean=True, with_std=True),
 'pipeline-1__clf': LogisticRegression(C=0.001, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='warn', n_jobs=None, penalty='l2',
                    random_state=1, solver='warn', tol=0.0001, verbose=0,
                    warm_start=False),
 'pipeline-1__sc__copy': True,
 'pipeline-1__sc__with_mean': True,
 'pipeline-1__sc__with_std': True,
 'pipeline-1__clf__C': 0.001,
 'pipeline-1__clf__class_weight': None,
 'pipeline-1__clf__dual': False,
 'pipeline-1__clf__fit_intercept': True,
 'pipeline-1__clf__intercept_scaling': 1,
 'pipeline-1__clf__l1_ratio': None,
 'pipeline-1__clf__max_iter': 100,
 'pipeline-1__clf__multi_class': 'warn',
 'pipeline-1__clf__n_jobs': None,
 'pipeline-1__clf__penalty': 'l2',
 'pipeline-1__clf__random_state': 1,
 'pipeline-1__clf__solver': 'warn',
 'pipeline-1__clf__tol': 0.0001,
 'pipeline-1__clf__verbose': 0,
 'pipeline-1__clf__warm_start': False,
 'decisiontreeclassifier__class_weight': None,
 'decisiontreeclassifier__criterion': 'entropy',
 'decisiontreeclassifier__max_depth': 1,
 'decisiontreeclassifier__max_features': None,
 'decisiontreeclassifier__max_leaf_nodes': None,
 'decisiontreeclassifier__min_impurity_decrease': 0.0,
 'decisiontreeclassifier__min_impurity_split': None,
 'decisiontreeclassifier__min_samples_leaf': 1,
 'decisiontreeclassifier__min_samples_split': 2,
 'decisiontreeclassifier__min_weight_fraction_leaf': 0.0,
 'decisiontreeclassifier__presort': False,
 'decisiontreeclassifier__random_state': 0,
 'decisiontreeclassifier__splitter': 'best',
 'pipeline-2__memory': None,
 'pipeline-2__steps': [('sc',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ['clf',
   KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                        metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                        weights='uniform')]],
 'pipeline-2__verbose': False,
 'pipeline-2__sc': StandardScaler(copy=True, with_mean=True, with_std=True),
 'pipeline-2__clf': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                      metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                      weights='uniform'),
 'pipeline-2__sc__copy': True,
 'pipeline-2__sc__with_mean': True,
 'pipeline-2__sc__with_std': True,
 'pipeline-2__clf__algorithm': 'auto',
 'pipeline-2__clf__leaf_size': 30,
 'pipeline-2__clf__metric': 'minkowski',
 'pipeline-2__clf__metric_params': None,
 'pipeline-2__clf__n_jobs': None,
 'pipeline-2__clf__n_neighbors': 1,
 'pipeline-2__clf__p': 2,
 'pipeline-2__clf__weights': 'uniform'}
from sklearn.model_selection import GridSearchCV

params = {'decisiontreeclassifier__max_depth': [1, 2],
          'pipeline-1__clf__C': [0.001, 0.1, 100.0]}

grid = GridSearchCV(estimator=mv_clf,
                    param_grid=params,
                    cv=10,
                    scoring='roc_auc')
grid.fit(X_train, y_train)

for r, _ in enumerate(grid.cv_results_['mean_test_score']):
    print("%0.3f +/- %0.2f %r"
          % (grid.cv_results_['mean_test_score'][r], 
             grid.cv_results_['std_test_score'][r] / 2.0, 
             grid.cv_results_['params'][r]))
0.933 +/- 0.07 {'decisiontreeclassifier__max_depth': 1, 'pipeline-1__clf__C': 0.001}
0.947 +/- 0.07 {'decisiontreeclassifier__max_depth': 1, 'pipeline-1__clf__C': 0.1}
0.973 +/- 0.04 {'decisiontreeclassifier__max_depth': 1, 'pipeline-1__clf__C': 100.0}
0.947 +/- 0.07 {'decisiontreeclassifier__max_depth': 2, 'pipeline-1__clf__C': 0.001}
0.947 +/- 0.07 {'decisiontreeclassifier__max_depth': 2, 'pipeline-1__clf__C': 0.1}
0.973 +/- 0.04 {'decisiontreeclassifier__max_depth': 2, 'pipeline-1__clf__C': 100.0}
print('Best parameters: %s' % grid.best_params_)
print('ROC AUC: %.2f' % grid.best_score_)
Best parameters: {'decisiontreeclassifier__max_depth': 1, 'pipeline-1__clf__C': 100.0}
ROC AUC: 0.97

Note
The refit parameter of GridSearchCV defaults to True (i.e., GridSearchCV(..., refit=True)), which means that we can use the fitted GridSearchCV estimator to make predictions via the predict method, for example:

grid = GridSearchCV(estimator=mv_clf, 
                    param_grid=params, 
                    cv=10, 
                    scoring='roc_auc')
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

In addition, the “best” estimator can directly be accessed via the best_estimator_ attribute.

grid.best_estimator_.classifiers
[Pipeline(memory=None,
          steps=[('sc',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ['clf',
                  LogisticRegression(C=100.0, class_weight=None, dual=False,
                                     fit_intercept=True, intercept_scaling=1,
                                     l1_ratio=None, max_iter=100,
                                     multi_class='warn', n_jobs=None,
                                     penalty='l2', random_state=1, solver='warn',
                                     tol=0.0001, verbose=0, warm_start=False)]],
          verbose=False),
 DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=1,
                        max_features=None, max_leaf_nodes=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, presort=False,
                        random_state=0, splitter='best'),
 Pipeline(memory=None,
          steps=[('sc',
                  StandardScaler(copy=True, with_mean=True, with_std=True)),
                 ['clf',
                  KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                       metric='minkowski', metric_params=None,
                                       n_jobs=None, n_neighbors=1, p=2,
                                       weights='uniform')]],
          verbose=False)]
mv_clf = grid.best_estimator_
mv_clf.set_params(**grid.best_estimator_.get_params())
MajorityVoteClassifier(classifiers=[Pipeline(memory=None,
                                             steps=[('sc',
                                                     StandardScaler(copy=True,
                                                                    with_mean=True,
                                                                    with_std=True)),
                                                    ('clf',
                                                     LogisticRegression(C=100.0,
                                                                        class_weight=None,
                                                                        dual=False,
                                                                        fit_intercept=True,
                                                                        intercept_scaling=1,
                                                                        l1_ratio=None,
                                                                        max_iter=100,
                                                                        multi_class='warn',
                                                                        n_jobs=None,
                                                                        penalty='l2',
                                                                        random_state=1,
                                                                        solver='warn',
                                                                        tol=0.0001,
                                                                        verbose=0,
                                                                        w...
                                                           min_weight_fraction_leaf=0.0,
                                                           presort=False,
                                                           random_state=0,
                                                           splitter='best'),
                                    Pipeline(memory=None,
                                             steps=[('sc',
                                                     StandardScaler(copy=True,
                                                                    with_mean=True,
                                                                    with_std=True)),
                                                    ('clf',
                                                     KNeighborsClassifier(algorithm='auto',
                                                                          leaf_size=30,
                                                                          metric='minkowski',
                                                                          metric_params=None,
                                                                          n_jobs=None,
                                                                          n_neighbors=1,
                                                                          p=2,
                                                                          weights='uniform'))],
                                             verbose=False)],
                       vote='classlabel', weights=None)
mv_clf
MajorityVoteClassifier(classifiers=[Pipeline(memory=None,
                                             steps=[('sc',
                                                     StandardScaler(copy=True,
                                                                    with_mean=True,
                                                                    with_std=True)),
                                                    ('clf',
                                                     LogisticRegression(C=100.0,
                                                                        class_weight=None,
                                                                        dual=False,
                                                                        fit_intercept=True,
                                                                        intercept_scaling=1,
                                                                        l1_ratio=None,
                                                                        max_iter=100,
                                                                        multi_class='warn',
                                                                        n_jobs=None,
                                                                        penalty='l2',
                                                                        random_state=1,
                                                                        solver='warn',
                                                                        tol=0.0001,
                                                                        verbose=0,
                                                                        w...
                                                           min_weight_fraction_leaf=0.0,
                                                           presort=False,
                                                           random_state=0,
                                                           splitter='best'),
                                    Pipeline(memory=None,
                                             steps=[('sc',
                                                     StandardScaler(copy=True,
                                                                    with_mean=True,
                                                                    with_std=True)),
                                                    ('clf',
                                                     KNeighborsClassifier(algorithm='auto',
                                                                          leaf_size=30,
                                                                          metric='minkowski',
                                                                          metric_params=None,
                                                                          n_jobs=None,
                                                                          n_neighbors=1,
                                                                          p=2,
                                                                          weights='uniform'))],
                                             verbose=False)],
                       vote='classlabel', weights=None)


Bagging – Building an ensemble of classifiers from bootstrap samples

Image(filename='./images/07_06.png', width=500) 


Bagging in a nutshell

Image(filename='./images/07_07.png', width=400) 

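Since the figure above may not render, here is a minimal sketch of the bootstrap sampling idea behind bagging (the toy array size and the RandomState seed are arbitrary choices for illustration): each bagging round draws n samples from the training set with replacement, so some samples appear multiple times within a round while others are left out entirely, and one base classifier is then fit to each bootstrap sample.

import numpy as np

rng = np.random.RandomState(1)
n_samples = 8  # toy training-set size

for round_idx in range(3):
    # draw a bootstrap sample: n_samples indices, with replacement
    boot_idx = rng.choice(np.arange(n_samples), size=n_samples, replace=True)
    print('Bagging round %d: %s' % (round_idx + 1, boot_idx))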

Applying bagging to classify samples in the Wine dataset

import pandas as pd

df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
                      'machine-learning-databases/wine/wine.data',
                      header=None)

df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                   'Alcalinity of ash', 'Magnesium', 'Total phenols',
                   'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
                   'Color intensity', 'Hue', 'OD280/OD315 of diluted wines',
                   'Proline']

# if the Wine dataset is temporarily unavailable from the
# UCI machine learning repository, un-comment the following line
# of code to load the dataset from a local path:

# df_wine = pd.read_csv('wine.data', header=None)

# drop class 1 to turn this into a binary classification problem
df_wine = df_wine[df_wine['Class label'] != 1]

y = df_wine['Class label'].values
X = df_wine[['Alcohol', 'OD280/OD315 of diluted wines']].values
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split


le = LabelEncoder()
y = le.fit_transform(y)

X_train, X_test, y_train, y_test =\
            train_test_split(X, y, 
                             test_size=0.2, 
                             random_state=1,
                             stratify=y)
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion='entropy', 
                              max_depth=None,
                              random_state=1)

bag = BaggingClassifier(base_estimator=tree,
                        n_estimators=500, 
                        max_samples=1.0, 
                        max_features=1.0, 
                        bootstrap=True, 
                        bootstrap_features=False, 
                        n_jobs=1, 
                        random_state=1)
from sklearn.metrics import accuracy_score

tree = tree.fit(X_train, y_train)
y_train_pred = tree.predict(X_train)
y_test_pred = tree.predict(X_test)

tree_train = accuracy_score(y_train, y_train_pred)
tree_test = accuracy_score(y_test, y_test_pred)
print('Decision tree train/test accuracies %.3f/%.3f'
      % (tree_train, tree_test))

bag = bag.fit(X_train, y_train)
y_train_pred = bag.predict(X_train)
y_test_pred = bag.predict(X_test)

bag_train = accuracy_score(y_train, y_train_pred) 
bag_test = accuracy_score(y_test, y_test_pred) 
print('Bagging train/test accuracies %.3f/%.3f'
      % (bag_train, bag_test))
Decision tree train/test accuracies 1.000/0.833
Bagging train/test accuracies 1.000/0.917
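Both models fit the training data perfectly, but the bagging classifier generalizes better to the test set (0.917 versus 0.833 accuracy), which suggests that bagging reduced the variance of the unpruned decision tree.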
import numpy as np
import matplotlib.pyplot as plt

x_min = X_train[:, 0].min() - 1
x_max = X_train[:, 0].max() + 1
y_min = X_train[:, 1].min() - 1
y_max = X_train[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(nrows=1, ncols=2, 
                        sharex='col', 
                        sharey='row', 
                        figsize=(8, 3))


for idx, clf, tt in zip([0, 1],
                        [tree, bag],
                        ['Decision tree', 'Bagging']):
    clf.fit(X_train, y_train)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train == 0, 0],
                       X_train[y_train == 0, 1],
                       c='blue', marker='^')

    axarr[idx].scatter(X_train[y_train == 1, 0],
                       X_train[y_train == 1, 1],
                       c='green', marker='o')

    axarr[idx].set_title(tt)

axarr[0].set_ylabel('Alcohol', fontsize=12)
plt.text(10.2, -0.5,
         s='OD280/OD315 of diluted wines',
         ha='center', va='center', fontsize=12)

plt.tight_layout()
#plt.savefig('images/07_08.png', dpi=300, bbox_inches='tight')
plt.show()




Leveraging weak learners via adaptive boosting

How boosting works

Image(filename='images/07_09.png', width=400) 


Image(filename='images/07_10.png', width=500) 

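Since the figures above may not render, here is a minimal numpy sketch of a single boosting round on a toy dataset (the labels, predictions, and uniform initial weights are hypothetical values chosen for illustration; the update rule is the standard discrete AdaBoost scheme): compute the weighted error rate of the weak learner, derive its coefficient, then increase the weights of misclassified samples and decrease the weights of correctly classified ones before the next round.

import numpy as np

y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])          # true labels (toy)
y_pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])  # weak learner's predictions
w = np.full(10, 0.1)                                       # uniform initial sample weights

eps = w[y != y_pred].sum()               # weighted error rate (0.3 here)
alpha = 0.5 * np.log((1 - eps) / eps)    # coefficient of this weak learner

w = w * np.exp(-alpha * y * y_pred)      # up-weight mistakes, down-weight correct ones
w = w / w.sum()                          # renormalize to sum to 1
print('epsilon=%.3f, alpha=%.3f' % (eps, alpha))
print('updated weights:', np.round(w, 3))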

Applying AdaBoost using scikit-learn

from sklearn.ensemble import AdaBoostClassifier

tree = DecisionTreeClassifier(criterion='entropy', 
                              max_depth=1,
                              random_state=1)

ada = AdaBoostClassifier(base_estimator=tree,
                         n_estimators=500, 
                         learning_rate=0.1,
                         random_state=1)
tree = tree.fit(X_train, y_train)
y_train_pred = tree.predict(X_train)
y_test_pred = tree.predict(X_test)

tree_train = accuracy_score(y_train, y_train_pred)
tree_test = accuracy_score(y_test, y_test_pred)
print('Decision tree train/test accuracies %.3f/%.3f'
      % (tree_train, tree_test))

ada = ada.fit(X_train, y_train)
y_train_pred = ada.predict(X_train)
y_test_pred = ada.predict(X_test)

ada_train = accuracy_score(y_train, y_train_pred) 
ada_test = accuracy_score(y_test, y_test_pred) 
print('AdaBoost train/test accuracies %.3f/%.3f'
      % (ada_train, ada_test))
Decision tree train/test accuracies 0.916/0.875
AdaBoost train/test accuracies 1.000/0.917
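The AdaBoost model predicts all class labels of the training set correctly and improves upon the decision stump's test performance (0.917 versus 0.875). However, the gap between its training and test accuracy also shows that we introduced additional variance compared to the stump.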
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(1, 2, sharex='col', sharey='row', figsize=(8, 3))


for idx, clf, tt in zip([0, 1],
                        [tree, ada],
                        ['Decision tree', 'AdaBoost']):
    clf.fit(X_train, y_train)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train == 0, 0],
                       X_train[y_train == 0, 1],
                       c='blue', marker='^')
    axarr[idx].scatter(X_train[y_train == 1, 0],
                       X_train[y_train == 1, 1],
                       c='green', marker='o')
    axarr[idx].set_title(tt)

axarr[0].set_ylabel('Alcohol', fontsize=12)
plt.text(10.2, -0.5,
         s='OD280/OD315 of diluted wines',
         ha='center', va='center', fontsize=12)

plt.tight_layout()
#plt.savefig('images/07_11.png', dpi=300, bbox_inches='tight')
plt.show()




Summary


Readers may ignore the next cell.

! python ../.convert_notebook_to_script.py --input ch07.ipynb --output ch07.py
python: can't open file '../.convert_notebook_to_script.py': [Errno 2] No such file or directory
