【cv】Decision Stumps || Piecewise Linear Functions ||

1. The Concept of a Decision Stump

http://www.answers.com/topic/decision-stump

 

A decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes. A decision stump makes a prediction based on the value of just a single input feature. Sometimes they are also called 1-rules.[2]

Depending on the type of the input feature, several variations are possible. For nominal features, one may build a stump which contains a leaf for each possible feature value,[3][4] or a stump with two leaves, one of which corresponds to some chosen category and the other to all the other categories.[5] For binary features these two schemes are identical. A missing value may be treated as yet another category.[5]

For continuous features, usually some threshold value is selected, and the stump contains two leaves: one for values below the threshold and one for values above it. More rarely, multiple thresholds may be chosen, in which case the stump contains three or more leaves.
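As a concrete illustration, here is a minimal Python sketch (our own code, with made-up function and parameter names) of a two-leaf stump on a continuous feature:

    # A two-leaf threshold stump: one leaf for values below the
    # threshold and one for values at or above it.
    def stump_predict(x, threshold, label_below, label_above):
        """Predict a class label from a single continuous feature value."""
        return label_below if x < threshold else label_above

    # Hypothetical usage: classify by height with a 170 cm threshold.
    print(stump_predict(165.0, 170.0, "short", "tall"))  # -> short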

Decision stumps are often[6] used as components (called "weak learners" or "base learners") in machine learning ensemble techniques such as bagging and boosting. For example, the state-of-the-art Viola-Jones face detection algorithm employs AdaBoost with decision stumps as weak learners.[7]
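For instance, a hedged sketch of boosting with stumps in scikit-learn (a depth-1 tree is the usual way to express a stump there; note that in scikit-learn versions before 1.2 the keyword is base_estimator rather than estimator):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    # A small synthetic dataset, just to make the example runnable.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    stump = DecisionTreeClassifier(max_depth=1)  # a one-level tree, i.e. a stump
    model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
    model.fit(X, y)
    print(model.score(X, y))  # training accuracy of the boosted stumps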

The term "decision stump" was coined in a 1992 ICML paper by Wayne Iba and Pat Langley.[1][8]

References

  1. ^ a b Wayne Iba and Pat Langley (1992). Induction of One-Level Decision Trees. Proceedings of the Ninth International Conference on Machine Learning.
  2. ^ Robert C. Holte (1993). "Very Simple Classification Rules Perform Well on Most Commonly Used Datasets". http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.2711&rep=rep1&type=pdf
  3. ^ Loper, Edward L.; Bird, Steven; Klein, Ewan (2009). Natural Language Processing with Python. Sebastopol, CA: O'Reilly. ISBN 0-596-51649-5. http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html
  4. ^ This classifier is implemented in Weka under the name OneR (for "1-rule").
  5. ^ a b This is what is implemented in Weka's DecisionStump classifier.
  6. ^ Lev Reyzin and Robert E. Schapire (2006). How Boosting the Margin Can Also Boost Classifier Complexity. ICML 2006. Page 7.
  7. ^ Paul Viola and Michael J. Jones (2004). Robust Real-Time Face Detection. International Journal of Computer Vision, 2004.
  8. ^ Jonathan Oliver and David Hand (1994). Averaging Over Decision Stumps. ECML 1994. doi:10.1007/3-540-57868-4_61.
     Quote: "These simple rules are in effect severely pruned decision trees and have been termed decision stumps [cites Iba and Langley]."

/////////////////////////////////////

Machine Learning Class Project  http://gogoshen.org/ml/Lists/General%20Discussion/DispForm.aspx?ID=42
Class Project Discussion: What is a decision stump?

 

 

What is a decision stump? A decision stump is a decision tree with only one internal node, and a single split coming out of that node.
 
This is best illustrated using the Weka Explorer on the nominal weather dataset. Attached are two Weka Explorer result files:
(1) OneR (i.e., one-rule) is essentially a single-node decision tree. The node has as many output branches as there are values for the attribute assigned to the node.
(2) DecisionStump is a more limited single-node decision tree. In contrast to OneR, the single node splits one chosen attribute value against all the others, giving just two branches.
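To make the contrast concrete, here is a rough Python sketch (our own code, not Weka's) of the two schemes on a nominal attribute from the weather dataset:

    from collections import Counter, defaultdict

    def one_r(values, labels):
        """OneR-style: one leaf per attribute value, labelled with its majority class."""
        by_value = defaultdict(Counter)
        for v, y in zip(values, labels):
            by_value[v][y] += 1
        return {v: c.most_common(1)[0][0] for v, c in by_value.items()}

    def binary_stump(values, labels, chosen):
        """DecisionStump-style: the chosen value versus all other values."""
        inside = Counter(y for v, y in zip(values, labels) if v == chosen)
        outside = Counter(y for v, y in zip(values, labels) if v != chosen)
        return {True: inside.most_common(1)[0][0],
                False: outside.most_common(1)[0][0]}

    outlook = ["sunny", "sunny", "overcast", "rainy", "rainy"]
    play    = ["no",    "no",    "yes",      "yes",   "no"]
    print(one_r(outlook, play))                     # one leaf per outlook value
    print(binary_stump(outlook, play, "overcast"))  # {True: 'yes', False: 'no'}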
 

 

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

2. Decision Tree
 http://superangevil.wordpress.com/2009/11/12/nltk6_4/

 

(1) Decision tree
A decision tree is simply a flowchart: it starts at the root node and works top-down, with each decision node choosing a branch according to the current input, repeating until a leaf node is reached.
In general, decision tree learning aims to infer classification rules in decision-tree form from a set of unordered, unstructured examples: a path from the root node to a leaf node corresponds to one conjunctive rule, and the whole decision tree corresponds to a set of disjunctive rules.
Here is the decision tree for our gender classifier:

 

A decision stump is a single-node decision tree: it classifies the input based on a single feature, and it contains one leaf for each possible value of that feature, indicating the class label to assign.
To build a decision stump, we must first decide which feature to use:

The simplest approach is to build a decision stump for each possible feature and see which one achieves the highest accuracy on the training set (there are of course many other ways to choose); once we have picked the feature, we label each leaf with the most frequent label among the training examples that reach it.
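A minimal sketch of that selection rule (our own code; featuresets are plain dicts mapping feature names to values, paired with labels):

    from collections import Counter, defaultdict

    def build_stump(feature, data):
        """One leaf per value of `feature`, labelled with the majority class."""
        leaves = defaultdict(Counter)
        for featureset, label in data:
            leaves[featureset[feature]][label] += 1
        return {value: counts.most_common(1)[0][0]
                for value, counts in leaves.items()}

    def stump_accuracy(feature, stump, data):
        """Fraction of training examples the stump labels correctly."""
        hits = sum(1 for fs, label in data if stump.get(fs[feature]) == label)
        return hits / len(data)

    def best_stump(features, data):
        """Build one stump per candidate feature; keep the most accurate one."""
        scored = [(stump_accuracy(f, build_stump(f, data), data), f)
                  for f in features]
        return max(scored)[1]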

The algorithm for choosing decision stumps is as follows (a code sketch follows the list):

    First, select the globally best decision stump for the classification task.
    Then check the accuracy of each leaf node on the training set.
    For leaf nodes with poor accuracy, train a new decision stump on the subset of the training corpus that reaches that leaf, and replace the original decision stump's leaf with it.
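A compact sketch of this growth procedure (our own simplified code, loosely following NLTK's DecisionTreeClassifier and reusing build_stump and best_stump from the sketch above):

    def grow_tree(features, data, min_accuracy=0.9, depth=3):
        """Grow a tree by replacing inaccurate leaves with new stumps."""
        feature = best_stump(features, data)
        stump = build_stump(feature, data)
        tree = {"feature": feature, "leaves": dict(stump)}
        remaining = [f for f in features if f != feature]
        if depth == 0 or not remaining:
            return tree
        for value, label in stump.items():
            subset = [(fs, y) for fs, y in data if fs[feature] == value]
            accuracy = sum(1 for _, y in subset if y == label) / len(subset)
            if accuracy < min_accuracy:
                # Retrain on just the examples that reach this leaf and
                # replace the weak leaf with the resulting subtree.
                tree["leaves"][value] = grow_tree(remaining, subset,
                                                  min_accuracy, depth - 1)
        return tree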

 

3. Piecewise linear function

http://en.wikipedia.org/wiki/Piecewise_linear_function
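In short, a piecewise linear function is a function whose domain is divided into intervals, with the function given by a straight line segment on each interval. As a quick illustration (our own example), NumPy's interp evaluates the piecewise linear function that connects given breakpoints:

    import numpy as np

    xp = [0.0, 1.0, 3.0]  # breakpoints on the x-axis
    fp = [0.0, 2.0, 1.0]  # function values at the breakpoints

    print(np.interp(0.5, xp, fp))  # 1.0, on the segment from (0, 0) to (1, 2)
    print(np.interp(2.0, xp, fp))  # 1.5, on the segment from (1, 2) to (3, 1)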

 

 
