1. The concept of decision stumps
A decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. Decision stumps are sometimes also called 1-rules.[2]
Depending on the type of the input feature, several variations are possible. For nominal features, one may build a stump which contains a leaf for each possible feature value[3][4] or a stump with two leaves, one of which corresponds to some chosen category and the other to all the other categories.[5] For binary features these two schemes are identical. A missing value may be treated as yet another category.[5]
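The two schemes for nominal features can be sketched as follows. This is a minimal illustration and not taken from any cited source: the function names, the majority-vote leaf labels and the toy "outlook" data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def fit_multiway_stump(values, labels):
    """Scheme 1: one leaf per observed feature value.

    A missing value (None) is simply treated as one more category.
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    leaves = {v: majority(ys) for v, ys in groups.items()}
    default = majority(labels)  # fallback for values never seen in training
    return lambda v: leaves.get(v, default)

def fit_one_vs_rest_stump(values, labels, category):
    """Scheme 2: two leaves, one for `category` and one for everything else."""
    inside = [y for v, y in zip(values, labels) if v == category]
    outside = [y for v, y in zip(values, labels) if v != category]
    overall = majority(labels)
    leaf_in = majority(inside) if inside else overall
    leaf_out = majority(outside) if outside else overall
    return lambda v: leaf_in if v == category else leaf_out

# Hypothetical toy data: a nominal feature with one missing value.
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", None, "overcast"]
play    = ["no",    "no",    "yes",      "yes",   "no",    "yes", "yes"]

multiway = fit_multiway_stump(outlook, play)
one_vs_rest = fit_one_vs_rest_stump(outlook, play, "sunny")
print(multiway("sunny"), multiway(None), one_vs_rest("rainy"))  # no yes yes
```

For a binary feature both functions produce the same two leaves, which is why the two schemes coincide in that case.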
For continuous features, a threshold value is usually selected, and the stump contains two leaves: one for values below the threshold and one for values above it. Occasionally, multiple thresholds are chosen, and the stump then contains three or more leaves.
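A single-threshold stump for a continuous feature might be fitted as in the sketch below. The choice of candidate thresholds (midpoints between neighbouring sorted values), the error-count criterion and the temperature data are assumptions for illustration; other selection criteria are equally possible.

```python
def fit_threshold_stump(xs, ys):
    """Pick the threshold with the fewest training errors.

    xs are numeric feature values, ys are 0/1 labels.
    Returns (threshold, label_below, label_above).
    """
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in candidates:
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if x <= t else above) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(stump, x):
    threshold, below, above = stump
    return below if x <= threshold else above

# Hypothetical data: a temperature feature and a binary label.
temps = [15.0, 18.0, 21.0, 24.0, 27.0, 30.0]
labels = [0, 0, 0, 1, 1, 1]

stump = fit_threshold_stump(temps, labels)
print(stump)                 # (22.5, 0, 1)
print(predict(stump, 20.0))  # 0
```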
Decision stumps are often[6] used as components (called "weak learners" or "base learners") in machine learning ensemble techniques such as bagging and boosting. For example, the Viola-Jones face detection algorithm employs AdaBoost with decision stumps as weak learners.[7]
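To make the role of stumps as weak learners concrete, here is a minimal sketch of discrete AdaBoost over threshold stumps on a single feature. It is not the Viola-Jones implementation; the function names and the toy data are assumptions, and labels are taken to be -1 or +1.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Return (weighted_error, threshold, sign).

    The stump predicts `sign` when x < threshold and `-sign` otherwise.
    """
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x < t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; returns a list of (alpha, threshold, sign)."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * y * (sign if x < t else -sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * (sign if x < t else -sign)
                for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_stump = adaboost(xs, ys, rounds=1)
three_stumps = adaboost(xs, ys, rounds=3)
print([predict(one_stump, x) for x in xs] == ys)     # False
print([predict(three_stumps, x) for x in xs] == ys)  # True
```

A single stump misclassifies three points here, while the weighted vote of three stumps fits the training data, which is the sense in which boosting combines weak learners into a stronger one.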
The term "decision stump" was coined in a 1992 ICML paper by Wayne Iba and Pat Langley.[1][8]
Machine Learning Class Project discussion: http://gogoshen.org/ml/Lists/General%20Discussion/DispForm.aspx?ID=42
What is a decision stump? A decision stump is a decision tree with only one node and only one branch coming out of that node.
This is best illustrated using the Weka Explorer on the nominal weather dataset. Attached are two "Weka Explorer Result" files:
(1) OneR (i.e., One-Rule) is essentially a single-node decision tree. The node has as many output branches as there are values of the attribute assigned to it.
(2) DecisionStump is a more limited single-node decision tree. In contrast to OneR, its single node can have only one output branch.
2. Decision Tree
(1) Decision Tree
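A decision tree is simply a flowchart. Starting at the root node and working top-down, it follows a different branch at each decision node depending on the current input, and repeats this until it reaches a leaf node.
Overall, decision tree learning aims to infer classification rules, in the form of a decision tree, from a set of unordered, unstructured examples. Each path from the root to a leaf corresponds to one conjunctive rule, and the whole tree corresponds to a disjunction of these rules.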
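Both views of a tree, the flowchart and the rule set, can be illustrated with a small sketch. The tree below is hypothetical (the feature names `ends_in_vowel` and `is_long` and the labels are invented for the example), and the nested-tuple representation is just one convenient assumption.

```python
# Hypothetical tree: an internal node is (feature, {value: subtree}),
# and a leaf is simply a label string.
tree = ("ends_in_vowel", {
    "yes": "female",
    "no": ("is_long", {"no": "male", "yes": "female"}),
})

def classify(node, features):
    """Follow the flowchart from the root down to a leaf."""
    while not isinstance(node, str):
        feature, branches = node
        node = branches[features[feature]]
    return node

def rules(node, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf path."""
    if isinstance(node, str):
        yield conditions, node
        return
    feature, branches = node
    for value, child in branches.items():
        yield from rules(child, conditions + ((feature, value),))

print(classify(tree, {"ends_in_vowel": "no", "is_long": "no"}))  # male

for conditions, label in rules(tree):
    clause = " AND ".join(f"{f} = {v}" for f, v in conditions)
    print(f"IF {clause} THEN {label}")
```

Each printed line is one conjunctive rule, and the tree as a whole behaves like the disjunction of all of them.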
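Take the decision tree for our gender classifier as an example.
A decision stump is a decision tree with a single node that classifies inputs based on a single feature. It contains one leaf for each possible value of that feature, and each leaf specifies the class label to assign to inputs with that value.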
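To build a decision stump, we must first decide which feature to use.
The simplest method is to build a decision stump for each possible feature and keep the one that achieves the highest accuracy on the training set (many other methods exist). Once a feature has been chosen, each leaf is labeled with the most frequent label among the training examples that reach it.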
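The selection procedure just described can be sketched as below. The data format (a list of feature-dictionary and label pairs), the function names and the six training examples are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_stump(examples, feature):
    """Label each leaf (feature value) with its most frequent training label."""
    by_value = defaultdict(Counter)
    for feats, label in examples:
        by_value[feats[feature]][label] += 1
    leaves = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    return leaves, default

def accuracy(stump, feature, examples):
    leaves, default = stump
    hits = sum(leaves.get(feats[feature], default) == label
               for feats, label in examples)
    return hits / len(examples)

def best_stump(examples):
    """Build one stump per feature and keep the most accurate one."""
    features = sorted(examples[0][0])
    scored = [(accuracy(fit_stump(examples, f), f, examples), f)
              for f in features]
    acc, feature = max(scored, key=lambda s: s[0])
    return feature, fit_stump(examples, feature), acc

# Hypothetical training set for a name-gender task.
train = [
    ({"last_letter": "a", "first_letter": "a"}, "female"),
    ({"last_letter": "a", "first_letter": "m"}, "female"),
    ({"last_letter": "k", "first_letter": "m"}, "male"),
    ({"last_letter": "n", "first_letter": "j"}, "male"),
    ({"last_letter": "e", "first_letter": "j"}, "female"),
    ({"last_letter": "n", "first_letter": "a"}, "male"),
]

feature, stump, acc = best_stump(train)
print(feature, acc)  # last_letter 1.0
```

Scoring on the training set is the simplest criterion; it tends to favor features with many distinct values, which is one reason other selection measures are often used instead.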
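Given this way of choosing a decision stump, a larger decision tree is grown as follows:
First, select the overall best decision stump for the classification task.
Then, check the accuracy of each leaf node on the training set.
Finally, replace each leaf node whose accuracy is insufficient with a new decision stump, trained on the subset of the training corpus selected by the path to that leaf.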
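The steps above can be sketched as a short recursive procedure. This is an illustrative sketch under stated assumptions, not a reference implementation: the `min_accuracy` parameter, the tuple representation of nodes and the toy data are all invented for the example, and a leaf is refined only while unused features remain.

```python
from collections import Counter, defaultdict

def majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def split(examples, feature):
    groups = defaultdict(list)
    for feats, label in examples:
        groups[feats[feature]].append((feats, label))
    return groups

def best_feature(examples, features):
    """Feature whose stump classifies the most training examples correctly."""
    def stump_hits(feature):
        return sum(Counter(label for _, label in group).most_common(1)[0][1]
                   for group in split(examples, feature).values())
    return max(features, key=stump_hits)

def grow(examples, features, min_accuracy=1.0):
    """Return a label (leaf) or (feature, {value: subtree}, default_label)."""
    label = majority(examples)
    leaf_accuracy = sum(y == label for _, y in examples) / len(examples)
    if leaf_accuracy >= min_accuracy or not features:
        return label  # this leaf is accurate enough, or cannot be refined
    feature = best_feature(examples, features)
    remaining = [f for f in features if f != feature]
    # Replace the leaf by a stump trained on the examples that reach it.
    branches = {value: grow(subset, remaining, min_accuracy)
                for value, subset in split(examples, feature).items()}
    return (feature, branches, label)

def classify(tree, feats):
    while not isinstance(tree, str):
        feature, branches, default = tree
        tree = branches.get(feats[feature], default)
    return tree

# Hypothetical training set with two nominal features.
train = [
    ({"ends_in_vowel": "yes", "is_long": "no"}, "female"),
    ({"ends_in_vowel": "yes", "is_long": "yes"}, "female"),
    ({"ends_in_vowel": "no", "is_long": "no"}, "male"),
    ({"ends_in_vowel": "no", "is_long": "no"}, "male"),
    ({"ends_in_vowel": "no", "is_long": "yes"}, "female"),
    ({"ends_in_vowel": "yes", "is_long": "no"}, "female"),
]

tree = grow(train, ["ends_in_vowel", "is_long"])
print(tree)
print(classify(tree, {"ends_in_vowel": "no", "is_long": "yes"}))  # female
```

Here the first stump splits on `ends_in_vowel`; its "no" leaf is only two-thirds accurate, so it is replaced by a second stump on `is_long`, trained on just the three examples that reach that leaf.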
3. Piecewise linear function