Multi-class, Multi-label 以及 Multi-task 问题

一直很纠结Multi-class, Multi-label 以及 Multi-task 各自的区别和联系,最近找到了以下的说明资料:

  • Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.

  • Multilabel classification assigns to each sample a set of target labels. This can be thought as predicting properties of a data-point that are not mutually exclusive, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.

  • Multioutput-multiclass classification and multi-task classification means that a single estimator has to handle several joint classification tasks. This is a generalization of the multi-label classification task, where the set of classification problem is restricted to binary classification, and of the multi-class classification task. The output format is a 2d numpy array or sparse matrix.

    The set of labels can be different for each output variable. For instance a sample could be assigned “pear” for an output variable that takes possible values in a finite set of species such as “pear”, “apple”, “orange” and “green” for a second output variable that takes possible values in a finite set of colors such as “green”, “red”, “orange”, “yellow”…

    This means that any classifiers handling multi-output multiclass or multi-task classification task supports the multi-label classification task as a special case. Multi-task classification is similar to the multi-output classification task with different model formulations. For more information, see the relevant estimator documentation.

可以看出:

  • Multiclass classification 就是多分类问题,比如年龄预测中把人分为小孩,年轻人,青年人和老年人这四个类别。Multiclass classificationbinary classification相对应,性别预测只有男、女两个值,就属于后者。
  • Multilabel classification 是多标签分类,比如一个新闻稿A可以与{政治,体育,自然}有关,就可以打上这三个标签。而新闻稿B可能只与其中的{体育,自然}相关,就只能打上这两个标签。
  • Multioutput-multiclass classificationmulti-task classification 指的是同一个东西。仍然举前边的新闻稿的例子,定义一个三个元素的向量,该向量第1、2和3个元素分别对应是否(分别取值1或0)与政治、体育和自然相关。那么新闻稿A可以表示为[1,1,1],而新闻稿B可以表示为[0,1,1],这就可以看成是multi-task classification 问题了。 从这个例子也可以看出,Multilabel classification 是一种特殊的multi-task classification 问题。之所以说它特殊,是因为一般情况下,向量的元素可能会取多于两个值,比如同时要求预测年龄和性别,其中年龄有四个取值,而性别有两个取值。

Multi-class, Multi-label 以及 Multi-task 问题_第1张图片

你可能感兴趣的:(Math&Simulation,SomeConcepts)