sklearn Study Notes — Nearest Neighbors

Table of Contents

  • Intro
  • Unsupervised Nearest Neighbors
  • Nearest Neighbors Classification

Intro

sklearn provides the sklearn.neighbors module, which implements unsupervised and supervised neighbors-based learning methods.

Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering.

Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels.

The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning) or vary based on the local density of points (radius-based neighbor learning).
The distance can, in general, be any metric measure: standard Euclidean distance is the most common choice.
Neighbors-based methods are known as non-generalizing machine learning methods, since they simply "remember" all of their training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree).
Despite their simplicity, nearest neighbors methods have been applied successfully to many classification and regression problems.
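To make the k-based vs. radius-based distinction above concrete, here is a minimal sketch on made-up one-dimensional toy data (the points, labels, `n_neighbors=3`, and `radius=2.5` are all illustrative choices, not from the original notes):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

# Two well-separated toy clusters with discrete labels
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# k-nearest neighbors: always consult a fixed number of neighbors (k = 3)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# radius-based neighbors: consult every training point within radius 2.5,
# so the effective neighbor count varies with local point density
rnn = RadiusNeighborsClassifier(radius=2.5).fit(X, y)

# Both predict from the labels of the neighbors they find
pred_knn = knn.predict([[1.5]])  # neighbors: 1.0, 2.0, 0.0 -> label 0
pred_rnn = rnn.predict([[1.5]])  # neighbors within 2.5: 0.0, 1.0, 2.0 -> label 0
```

Both estimators use the standard Euclidean distance by default, as the text notes; the `metric` parameter lets you swap in other measures.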

The classes in sklearn.neighbors accept either NumPy arrays or scipy.sparse matrices as input.
For dense matrices, a large number of possible distance metrics are supported. For sparse matrices, arbitrary Minkowski metrics are supported for searches.
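As a small sketch of the sparse-input support mentioned above (the toy matrix and query point are invented for illustration), a `NearestNeighbors` search can be fitted and queried directly on a scipy.sparse matrix with a Minkowski metric such as Euclidean distance:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# A tiny dense matrix, converted to sparse CSR format
X_dense = np.array([[0, 0], [0, 1], [1, 0], [5, 5]], dtype=float)
X_sparse = csr_matrix(X_dense)

# Euclidean distance is the Minkowski metric with p=2; sparse input
# is searched with the brute-force backend
nn = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(X_sparse)
dist, idx = nn.kneighbors(csr_matrix([[0.1, 0.1]]))
# idx[0][0] is row 0 ([0, 0]), the closest training point to the query
```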

Unsupervised Nearest Neighbors

The NearestNeighbors class implements unsupervised nearest neighbors learning. It acts as a uniform interface to three different nearest neighbors algorithms: BallTree, KDTree, and a brute-force algorithm based on routines in sklearn.metrics.pairwise.
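A minimal sketch of that uniform interface, on random data made up here: the `algorithm` parameter selects the backend, and all three backends return the same neighbors for the same query.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.rand(20, 3)  # 20 random points in 3 dimensions

results = {}
for algo in ("ball_tree", "kd_tree", "brute"):
    # Same API regardless of which backend does the search
    nn = NearestNeighbors(n_neighbors=3, algorithm=algo).fit(X)
    dist, idx = nn.kneighbors(X[:1])  # query the first point itself
    results[algo] = idx

# Every backend agrees; the nearest neighbor of a training point is itself
same = (results["ball_tree"] == results["kd_tree"]).all() and \
       (results["kd_tree"] == results["brute"]).all()
```

There is also `algorithm="auto"` (the default), which lets sklearn pick a suitable backend from the data.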

Nearest Neighbors Classification

Neighbors-based classification is a type of instance-based learning or non-generalizing learning.
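A short end-to-end sketch of neighbors-based classification (the dataset choice, split, and `n_neighbors=5` are illustrative, not from the original notes), using the iris data bundled with sklearn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)
# "Training" is instance-based: fit() essentially stores the data
# (possibly in a tree index) rather than building a general model
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)  # held-out accuracy
```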
