[Machine Learning] Rule-Based Algorithms

Table of Contents

  • One Rule (1R)
  • PRISM


One Rule (1R)

  • Generate one rule (a decision stump) for each attribute: for every value of the attribute, predict the most frequent class among the training examples with that value
  • Evaluate each rule on the training data and count its errors
  • Choose the rule with the smallest number of errors (a minimal code sketch follows below)

  • Numerical datasets require discretization
    • 1R has an in-built procedure to do this
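
The 1R procedure above fits in a few lines. Below is a minimal sketch, assuming categorical attributes stored as dicts; the function name one_rule and the toy data are illustrations, not code from the original post:

```python
from collections import Counter, defaultdict

def one_rule(examples, labels, attributes):
    """1R: build one decision stump per attribute, keep the one with fewest errors."""
    best = None  # (attribute, value -> predicted class, error count)
    for attr in attributes:
        # Count class frequencies for every value of this attribute.
        by_value = defaultdict(Counter)
        for ex, y in zip(examples, labels):
            by_value[ex[attr]][y] += 1
        # The stump predicts the majority class for each value.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Errors = examples whose label differs from the majority prediction.
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Toy weather-style data, assumed purely for illustration.
X = [{"outlook": "sunny", "windy": "false"},
     {"outlook": "sunny", "windy": "true"},
     {"outlook": "rainy", "windy": "true"},
     {"outlook": "overcast", "windy": "false"}]
y = ["no", "no", "no", "yes"]
print(one_rule(X, y, ["outlook", "windy"]))
# ('outlook', {'sunny': 'no', 'rainy': 'no', 'overcast': 'yes'}, 0)
```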

PRISM

PRISM is a covering rule-based algorithm. Unlike the 1R algorithm above, it can generate multiple rules per class and therefore handle more complex problems.

Idea: Generate a rule by adding tests that maximize the rule’s accuracy

  • For each class:
    • Start with an empty rule and gradually add conditions
    • Each condition tests the value of 1 attribute
    • Adding a new test reduces the rule's coverage → the rule becomes more specific

  • Finish when p/t = 1 → the rule covers only examples of the class under consideration; it is "perfect"

  • Which test to add at each step? The one that maximizes the accuracy p/t:
    • t: total number of examples (from all classes) covered by the rule (t comes from total)
    • p: number of examples from the class under consideration covered by the rule (p comes from positive)
    • t-p: number of errors made by the rule
    • For example, a test whose rule covers t = 5 examples, p = 4 of them from the target class, has accuracy p/t = 4/5 = 0.8

Since each rule is grown until p/t = 1, the accuracy on the training data is always 100%.
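
As a concrete illustration of this loop, here is a minimal sketch of PRISM-style rule growing for one class. The function name and data layout are assumptions (same dict-based format as the 1R sketch above), attributes are categorical, and ties in p/t are broken by first-found here, whereas the original PRISM prefers the test covering more positives (larger p):

```python
def prism_rules_for_class(examples, labels, attributes, target):
    """Greedily grow 'perfect' rules (p/t = 1) for one class, PRISM-style."""
    rules = []
    remaining = list(zip(examples, labels))
    # Keep generating rules until every example of the target class is covered.
    while any(y == target for _, y in remaining):
        conditions = {}       # attribute -> required value (the rule so far)
        covered = remaining   # examples the current rule still covers
        while True:
            p = sum(1 for _, y in covered if y == target)
            if p == len(covered):
                break  # p/t = 1: the rule is "perfect" on the training data
            # Try every remaining attribute=value test; keep the best p/t.
            best = None
            for attr in attributes:
                if attr in conditions:
                    continue
                for value in {ex[attr] for ex, _ in covered}:
                    subset = [(ex, y) for ex, y in covered if ex[attr] == value]
                    acc = sum(1 for _, y in subset if y == target) / len(subset)
                    if best is None or acc > best[0]:
                        best = (acc, attr, value, subset)
            if best is None:
                break  # no attributes left to test
            _, attr, value, covered = best
            conditions[attr] = value
        rules.append(conditions)
        # Remove the covered examples and repeat on the rest.
        remaining = [(ex, y) for ex, y in remaining
                     if not all(ex[a] == v for a, v in conditions.items())]
    return rules
```

Running this once for each class value yields the full rule set.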

  • In PRISM, the order in which the rules for a given class are applied doesn't matter: they are order-independent.
  • Some test examples might not be covered by any PRISM rule and hence would not receive a classification. To overcome this, a default rule is needed, which assigns them to the class with the most training examples (a small prediction sketch follows this list).
  • For numeric attributes, discretization is required.
  • PRISM is an example of a covering approach for constructing classification rules. It takes each class in turn and generates rules that together cover all the examples in it while excluding the examples not in that class.
  • Both PRISM and Decision Trees select the best test to add to the current rule or tree, but the criterion for selection is different. PRISM maximizes the accuracy p/t while DT maximizes the separation between classes (the information gain).
  • Decision trees are a top-down, divide-and-conquer method: they start with all the examples, find at each stage the attribute that best separates the classes, and then recursively process the sub-problems that result from the split.
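
Putting the order-independence and default-rule points together, a hypothetical prediction function might look as follows (a sketch under the same data layout assumed above; rules_by_class maps each class label to the list of condition dicts grown by the earlier sketch):

```python
from collections import Counter

def predict(example, rules_by_class, train_labels):
    """Classify one example with PRISM-style rules plus a default rule."""
    for cls, rules in rules_by_class.items():
        # Within a class the rules are order-independent: any match suffices.
        if any(all(example.get(a) == v for a, v in rule.items())
               for rule in rules):
            return cls
    # Default rule: the class with the most training examples.
    return Counter(train_labels).most_common(1)[0][0]
```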
