http://scikit-learn.org/stable/modules/multiclass.html
In real projects we actually use the simple models such as LR, kNN, or NB quite rarely; they are classics, but in engineering practice they are often not that practical.
Today we focus on multiclass and multilabel algorithms, which see comparatively heavy use in engineering work.
Warning: every scikit-learn classifier can do multiclass classification out-of-the-box, so you normally do not need the sklearn.multiclass module covered in this section; it is introduced here just to explain the concepts.
Below is a summary of the classifiers supported by scikit-learn, grouped by strategy; you don't need the meta-estimators from sklearn.multiclass if you're using one of these, unless you want custom multiclass behavior:
- Inherently multiclass: Naive Bayes, sklearn.lda.LDA, Decision Trees, Random Forests, Nearest Neighbors, and sklearn.linear_model.LogisticRegression with multi_class="multinomial".
- One-Vs-One: sklearn.svm.SVC.
- One-Vs-All: all linear models except sklearn.svm.SVC.
Some estimators also support multioutput-multiclass classification tasks: Decision Trees, Random Forests, Nearest Neighbors.
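As a quick illustration of the two wrapper strategies above, here is a minimal sketch (the iris data and the LinearSVC base estimator are just convenient assumptions) using the One-Vs-All and One-Vs-One meta-estimators from sklearn.multiclass:

```python
# Minimal sketch: the two strategy wrappers from sklearn.multiclass applied to
# a 3-class problem. Dataset and base estimator are arbitrary choices here.
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target  # 3 classes

# One-Vs-All: one binary classifier per class
ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)

# One-Vs-One: one binary classifier per pair of classes
ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)

print(ovr.predict(X[:5]))
print(ovo.predict(X[:5]))
```

One-Vs-All trains n_classes binary classifiers, while One-Vs-One trains n_classes * (n_classes - 1) / 2 of them, which is one reason One-Vs-All is the usual default strategy.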
The three kinds of problems:
Multiclass classification means a classification task with more than two classes, but each sample can belong to only one of those classes (essentially a single multi-way classification).
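A minimal sketch of such a task (the iris data is just a convenient assumption), using an inherently multiclass estimator, here LogisticRegression with multi_class="multinomial" as mentioned above:

```python
# Minimal sketch of a plain multiclass task: 3 classes, one label per sample.
# multi_class="multinomial" needs a compatible solver such as "lbfgs".
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(multi_class="multinomial", solver="lbfgs", max_iter=200)
clf.fit(iris.data, iris.target)

print(clf.predict(iris.data[:3]))        # exactly one class label per sample
print(clf.predict_proba(iris.data[:3]))  # probabilities over the 3 classes
```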
Multilabel classification assigns to each sample a set of target labels; a sample can belong to several classes at once (essentially many binary classifications, one per label).
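A minimal multilabel sketch, assuming some toy feature vectors and made-up tag sets: the label sets are turned into a binary indicator matrix with MultiLabelBinarizer, and OneVsRestClassifier then fits one binary classifier per label.

```python
# Minimal multilabel sketch: each sample carries a *set* of labels,
# encoded as a 0/1 indicator matrix of shape (n_samples, n_labels).
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_sets = [{"sports"}, {"sports", "news"}, {"news"}, {"finance", "news"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_sets)        # binary indicator matrix

clf = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, Y)
pred = clf.predict(X)
print(mlb.inverse_transform(pred))   # back to a label set per sample
```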
Multioutput-multiclass classification and multi-task classification mean that a single estimator has to handle several joint classification tasks (essentially several multi-way classifications at once: the set of labels can be different for each output variable. For instance, a sample could be assigned "pear" for a first output variable that takes values in a finite set of species ("pear", "apple", "orange"), and "green" for a second output variable that takes values in a finite set of colors ("green", "red", "orange", "yellow", ...)).
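A minimal multioutput-multiclass sketch with made-up fruit data (the feature values and label columns are purely illustrative): Y has one column per output variable, each with its own label set, and decision trees, one of the estimators listed above as supporting this, accept such a 2-D Y directly.

```python
# Minimal multioutput-multiclass sketch: two joint outputs per sample,
# column 0 = species, column 1 = color, each with its own label set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[140.0, 7.2], [160.0, 7.8], [150.0, 6.9], [170.0, 8.1]])
Y = np.array([["pear",   "green"],
              ["apple",  "red"],
              ["orange", "orange"],
              ["apple",  "green"]])

clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
print(clf.predict(X[:2]))  # one (species, color) pair per sample
```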