Constraint-Based Pattern Mining

This post discusses how to mine patterns under constraints in data mining, that is, how to constrain and filter the data being mined.

Why use constraint-based pattern mining? Because constraints let us apply different pruning methods to restrict the pattern mining process.
The main reasons are:

  • Finding all the patterns in a dataset autonomously? Unrealistic!
    • There are too many patterns, and most are not necessarily of interest to the user
  • Pattern mining should be an interactive process
    • The user directs what is to be mined using a data mining query language (or a graphical user interface)
  • Constraint-based mining
    • User flexibility: the user provides constraints on what is to be mined
    • Optimization: the system exploits such constraints for efficient mining
      • Constraint pushing: similar to pushing selections down first in database query processing

Constraints in General Data Mining

A data mining query can be in the form of a meta-rule or use the following language primitives:

  • Knowledge type constraint
    • Ex.: classification, association, clustering, outlier finding, ...
  • Data constraint: using SQL-like queries
    • Ex.: find products sold together in NY stores this year
  • Dimension/level constraint
    • Ex.: in relevance to region, price, brand, customer category
  • Rule (or pattern) constraint
    • Ex.: small sales (price < $10) trigger big sales (sum > $200)
  • Interestingness constraint
    • Ex.: strong rules: min_sup ≥ 0.02, min_conf ≥ 0.6, min_correlation ≥ 0.7
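
As a rough illustration of an interestingness constraint, the sketch below keeps only rules whose support and confidence reach given thresholds. The toy transactions, the helper names `support` and `strong_rules`, and the restriction to single-item rules are all assumptions made for this example, not part of the original notes.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def strong_rules(transactions, min_sup, min_conf):
    """Return single-item rules (lhs -> rhs) that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            sup = support({lhs, rhs}, transactions)
            if sup < min_sup:
                continue  # fails the support constraint
            conf = sup / support({lhs}, transactions)
            if conf >= min_conf:
                rules.append((lhs, rhs, sup, conf))
    return rules

for lhs, rhs, sup, conf in strong_rules(transactions, min_sup=0.5, min_conf=0.7):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

A correlation threshold such as min_correlation could be added as a third filter in the same way.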

Different Kinds of Constraints: Different Pruning Methods

  • Constraints can be categorized as
    • Pattern space pruning constraints vs. data space pruning constraints
  • Pattern space pruning constraints
    • Anti-monotonic: If an itemset S violates constraint c, so does every superset of S, so further mining along S can be terminated
    • Monotonic: If S satisfies c, so does every superset of S, so there is no need to check c again
    • Succinct: The constraint c can be enforced by directly manipulating the data
    • Convertible: c can be converted to monotonic or anti-monotonic if items can be properly ordered in processing
  • Data space pruning constraints
    • Data succinct: The data space can be pruned at the start of the pattern mining process
    • Data anti-monotonic: If a transaction t does not satisfy c, then t can be pruned to reduce data processing effort
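
To make data space pruning concrete, here is a minimal sketch for a constraint of the form sum(S.profit) ≥ v. A transaction is dropped when even its most favorable subset of items cannot reach v. The profit table, the transaction IDs, and the function names are hypothetical values chosen for illustration.

```python
# Hypothetical item profits and transactions (illustrative only).
profit = {"a": 40, "b": 0, "c": -20, "d": -15, "e": -30, "f": -10, "g": 20, "h": -5}

transactions = {
    "T10": ["a", "b", "c", "d", "f", "h"],
    "T20": ["b", "c", "d", "f", "g", "h"],
    "T30": ["b", "c", "d", "f", "h"],
    "T40": ["a", "c", "e", "f", "g"],
}

def best_possible_sum(items, profit):
    """Largest profit sum any subset of items can reach (positive items only)."""
    return sum(profit[i] for i in items if profit[i] > 0)

def prune_transactions(transactions, profit, min_sum):
    """Keep only transactions that could still support a pattern with sum >= min_sum."""
    return {
        tid: items
        for tid, items in transactions.items()
        if best_possible_sum(items, profit) >= min_sum
    }

kept = prune_transactions(transactions, profit, min_sum=25)
print(sorted(kept))  # transactions that survive the pruning step
```

With these assumed values, T20 and T30 are removed because no combination of their items reaches a profit sum of 25.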

Pattern Anti-monotonicity

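Here range(S.profit) denotes max(S.profit) - min(S.profit).
Because the support of an itemset S can only stay the same or decrease as more items are added, the answer to Ex. 4 is yes.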
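
The following sketch shows how an anti-monotonic constraint such as range(S.profit) ≤ v can cut the search: once an itemset violates it, none of its extensions are generated. The profit values and the function names are assumptions for illustration, and support counting is left out to keep the example short.

```python
# Hypothetical item profits (illustrative only).
profit = {"a": 40, "b": 0, "g": 20, "h": 5}

def profit_range(itemset, profit):
    """range(S.profit) = max(S.profit) - min(S.profit)."""
    values = [profit[i] for i in itemset]
    return max(values) - min(values)

def mine_with_range_constraint(profit, max_range):
    """Enumerate itemsets with range(S.profit) <= max_range, level by level."""
    items = sorted(profit)
    results = []
    frontier = [(i,) for i in items]  # single items always have range 0
    while frontier:
        next_frontier = []
        for s in frontier:
            if profit_range(s, profit) > max_range:
                continue  # anti-monotonic: no superset of s can satisfy c
            results.append(s)
            # Extend only with later items so each itemset is generated once.
            for j in items[items.index(s[-1]) + 1:]:
                next_frontier.append(s + (j,))
        frontier = next_frontier
    return results

print(mine_with_range_constraint(profit, max_range=20))
```

Adding an item can only widen the range, which is why a violating itemset can be discarded together with all of its supersets.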

Pattern Monotonicity

Data Anti-monotonicity

Succinct Constraints

Convertible Constraints


Here, we sort the items in each transaction in descending or ascending order; the constraint can then be converted into a monotone or anti-monotone one.


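From the explanation above we can conclude that we pick one or more items from T, and that these items follow a fixed order.
Note that pattern generation always proceeds in the right order.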
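
As a sketch of this idea, take the constraint avg(S.profit) ≥ v, which is neither monotone nor anti-monotone by itself. If itemsets are grown in value-descending order, each prefix holds the largest items of the final set, so a prefix that fails the constraint cannot be repaired by appending later (smaller) items. The profit values and the function name are assumptions for illustration, and support counting is omitted.

```python
# Hypothetical item profits (illustrative only).
profit = {"a": 40, "b": 0, "c": -20, "g": 20}

def mine_avg_at_least(profit, min_avg):
    """Enumerate itemsets with avg(S.profit) >= min_avg by prefix growth."""
    # Value-descending order; ties broken by name to keep it deterministic.
    order = sorted(profit, key=lambda i: (-profit[i], i))
    results = []

    def grow(prefix, total, start):
        for k in range(start, len(order)):
            item = order[k]
            new_total = total + profit[item]
            new_prefix = prefix + (item,)
            # Compare sums instead of dividing to avoid float rounding.
            if new_total < min_avg * len(new_prefix):
                # Later items are no larger, so neither this prefix, its
                # extensions, nor the remaining siblings can satisfy c.
                break
            results.append(new_prefix)
            grow(new_prefix, new_total, k + 1)

    grow((), 0, 0)
    return results

print(mine_avg_at_least(profit, min_avg=20))
```

Under the opposite (ascending) order the same constraint would behave like a monotone one on prefixes, which is why the choice of order matters.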
