example datasets

nonlinear example datasets

  • half_moon

产生非线性数据集,比如用以测试核机制的性能;
核方法最终的使命是:unfold the half-moons

from sklearn.datasets import make_moons
X, y = make_moons(n_samples=200, shuffle=True, random_state=123)
plt.scatter(X[y==0, 0], X[y==0, 1], color='r', marker='^', alpha=.4)
plt.scatter(X[y==1, 0], X[y==1, 1], color='r', marker='o', alpha=.4)
plt.show()



  • concentric circles
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=1000, noise=.1, factor=.2, random_state=123)
plt.scatter(X[y==0, 0], X[y==0, 1], color='r', marker='^', alpha=.4)
plt.scatter(X[y==1, 0], X[y==1, 1], color='b', marker='o', alpha=.4)
plt.show()


example datasets_第1张图片

datasets in sklearn

from sklearn import datasets
  • iris
>>> iris = datasets.load_iris()
>>> dir(iris)



>>> iris.data.shape
(150, 4)                    # 训练样本 
>>> iris.target.shape
(150,)                      # 一维的训练样本
>>> print(iris.DESCR)

# DESCR成员:数据集的介绍 

Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm         - sepal width in cm         - petal length in cm         - petal width in cm         - class:                 - Iris-Setosa                 - Iris-Versicolour                 - Iris-Virginica     :Summary Statistics:

UCI

  • Breast Cancer Wisconsin dataset

which contains 569 samples of malignant(恶性的) and benign(良性的) tumor cells.

The first two columns in the dataset store the unique ID numbers of the samples and the corresponding diagnoisi (M=malignant, B=benign), respectively.

The columns 3-32 contains 30 real-value features that have been computed from digitized images of the cell nuclei, which can be used to build a model to predict whether a tumor is benign or malignant.

import pandas as pd
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/'
                 'breast-cancer-wisconsin/wdbc.data', header=None)
X, y = df.values[:, 2:], df.values[:, 1]

你可能感兴趣的:(example datasets)