When discussing Class Extension, there is one very important definition:
Let P be a prefix class with encoding P, and let (x, i) and (y, j) denote any two elements in the class. Let $[P_x]$ denote the class representing extensions of element (x, i).
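The definition above leads into the join operator $(x,i) \otimes (y,j)$, which decides which extensions of (y, j) go into $[P_x]$. The sketch below encodes the case analysis as I understand it from the TreeMiner paper. These notes do not spell it out, so treat it as an assumption. An element is a pair (label, pos), where pos is the depth-first position of the rightmost-path node the new node attaches to. `prefix_size` is a hypothetical parameter giving the number of nodes in P, so the node added for x receives position `prefix_size`.

```python
from typing import List, Tuple

# Hypothetical representation: an element (label, pos) adds a node with
# `label` as a child of the rightmost-path node at depth-first position `pos`.
Element = Tuple[str, int]


def join(x: Element, y: Element, prefix_size: int) -> List[Element]:
    """Elements that (y, j) contributes to [P_x] when joined with (x, i).

    prefix_size is the number of nodes in P; the node added for x gets
    depth-first position prefix_size, so a child of x attaches there.
    """
    _, i = x
    y_label, j = y
    if i == j:
        # y may become a right sibling of x (same parent) or a child of x.
        return [(y_label, j), (y_label, prefix_size)]
    if i > j:
        # y hangs off an ancestor of x's parent, which is still on the rightmost path.
        return [(y_label, j)]
    # i < j: once x is added, node j is no longer on the rightmost path.
    return []


# Prefix P = single node "a"; the class [P] holds (b, 0) and (c, 0).
print(join(("b", 0), ("c", 0), prefix_size=1))  # [('c', 0), ('c', 1)]
```

Case i < j yields nothing. This is why each class only ever grows along the rightmost path.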
Tree Mining Problem
Let D be a database of trees (i.e., a forest),
and let subtree $S \preceq T$ for some $T \in D$.
Each occurrence of S can be identified by its match label, which is given as the set of matching positions (in T) for nodes in S.
What is the match label?
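Let $\{t_1, t_2, \dots, t_n\}$ be the nodes in T, so $|T| = n$.
Let $\{s_1, s_2, \dots, s_m\}$ be the nodes in S, so $|S| = m$.
Then S has a match label $\{t_{i_1}, t_{i_2}, \dots, t_{i_m}\}$
1: $l(s_k) = l(t_{i_k})$ for all $k = 1, \dots, m$;
2: branch $b(s_j, s_k)$ is in S iff $t_{i_j}$ is an ancestor of $t_{i_k}$ in T.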
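There are two conditions here, and I don't fully understand them.
Note: $l(\cdot)$ denotes the label of a node. The letters t and s play the same role as n: they simply name nodes and carry no other meaning.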
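In my view, the statement
then S has a match label $\{t_{i_1}, \dots, t_{i_m}\}$
should be changed to:
then S has a match label $\{t_{i_1}, \dots, t_{i_m}\}$ such that conditions 1 and 2 hold.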
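To make the two conditions concrete, here is a minimal checker. It assumes each tree is given as a preorder label list plus a parent array (my own representation; the root's parent is -1).

Read literally with direct branches, condition 2 would reject even S = T for a three-node path a(b(c)): a and c have no branch in S, yet a is an ancestor of c in T. The sketch therefore reads "branch" as ancestry in S. It also assumes ordered trees, so a match label must be strictly increasing in preorder.

```python
from typing import Sequence


def is_ancestor(parent: Sequence[int], a: int, b: int) -> bool:
    """True if node a is a proper ancestor of node b (parent[root] == -1)."""
    p = parent[b]
    while p != -1:
        if p == a:
            return True
        p = parent[p]
    return False


def is_match_label(t_labels: Sequence[str], t_parent: Sequence[int],
                   s_labels: Sequence[str], s_parent: Sequence[int],
                   match: Sequence[int]) -> bool:
    """Check whether match[k] (the position in T of s_k) is a match label of S in T."""
    m = len(s_labels)
    if len(match) != m:
        return False
    # Ordered trees (assumption): matched positions must increase in preorder.
    if any(match[k] >= match[k + 1] for k in range(m - 1)):
        return False
    # Condition 1: l(s_k) = l(t_{i_k}) for every k.
    if any(s_labels[k] != t_labels[match[k]] for k in range(m)):
        return False
    # Condition 2, read as ancestry: s_j is an ancestor of s_k in S
    # exactly when t_{i_j} is an ancestor of t_{i_k} in T.
    return all(
        is_ancestor(s_parent, j, k) == is_ancestor(t_parent, match[j], match[k])
        for j in range(m) for k in range(m) if j != k
    )


# T = a(b(c), d) in preorder; S = a(c, d) matches at positions 0, 2, 3.
t_labels, t_parent = ["a", "b", "c", "d"], [-1, 0, 1, 0]
print(is_match_label(t_labels, t_parent, ["a", "c", "d"], [-1, 0, 0], [0, 2, 3]))  # True
```

By contrast, S = a(b(d)) with match [0, 1, 3] is rejected: b is the parent of d in S, but position 1 is not an ancestor of position 3 in T.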
Section 4: Using Scope-Lists to speed up computing subtree support.
Scope-List Representation:
Concepts:
X is a k-subtree of a tree T.
Let $x_k$ refer to the last node of X.
We use the notation $L(X)$ to refer to the scope-list of X.
Each element of the scope-list is a triple (t, m, s),
where t is the id (tid) of a tree in which X occurs,
m is a match label of the (k-1)-length prefix of X (in T; this qualifier is my own addition),
(recall that the prefix match label gives the positions of nodes in T that match the prefix;
since a given prefix can occur multiple times in a tree, X can be associated with multiple match labels as well as multiple scopes),
and s is the scope of the last item $x_k$.
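The following sketch shows how the scope-list of a 1-subtree could be built. It assumes integer labels in the horizontal string encoding, with -1 marking each backtrack. It takes the scope of the node at preorder position l to be [l, r], where r is the position of its rightmost descendant; this is the paper's notion of scope, which these notes do not restate. A 1-subtree has an empty (length-0) prefix, so m is the empty tuple.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Scope = Tuple[int, int]
Entry = Tuple[int, Tuple[int, ...], Scope]  # (tid, prefix match label m, scope s)


def scopes_from_encoding(tokens: Sequence[int]) -> Tuple[List[int], List[Scope]]:
    """Parse a horizontal string encoding; return node labels and scopes [l, r]."""
    labels: List[int] = []
    right: List[int] = []
    stack: List[int] = []
    for tok in tokens:
        if tok == -1:
            # Backtrack: the node being closed has seen all of its descendants.
            node = stack.pop()
            right[node] = len(labels) - 1
        else:
            labels.append(tok)
            right.append(-1)
            stack.append(len(labels) - 1)
    for node in stack:
        # Nodes never closed by a trailing -1 (e.g. the root) end at the last node.
        right[node] = len(labels) - 1
    return labels, list(enumerate(right))


def one_subtree_scope_lists(db: Sequence[Sequence[int]]) -> Dict[int, List[Entry]]:
    """Scope-list L(x) of every 1-subtree x; the prefix is empty, so m = ()."""
    lists: Dict[int, List[Entry]] = defaultdict(list)
    for tid, tokens in enumerate(db):
        labels, scopes = scopes_from_encoding(tokens)
        for label, scope in zip(labels, scopes):
            lists[label].append((tid, (), scope))
    return dict(lists)


# Tree 0 = 0(1(2), 3), encoded in preorder with -1 for each backtrack.
print(one_subtree_scope_lists([[0, 1, 2, -1, -1, 3, -1]])[1])  # [(0, (), (1, 2))]
```

Because [l, r] covers exactly a node's subtree in preorder, "y is a descendant of x" becomes the interval test $l_x < l_y$ and $r_y \le r_x$. No tree traversal is needed.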
With the above concepts in hand:
4.1:Frequent Subtree Enumeration
Computing $F_1$ and $F_2$:
Suppose that the initial database is in the horizontal string encoding format.
So each tree T in D is stored as a string.
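Computing $F_1$ and $F_2$ directly from the string encodings could look like the sketch below, which is my own version and not the paper's code. Support counts each tree at most once. A 2-subtree (x, y) means that a node labelled y is a descendant of a node labelled x, since the subtrees are embedded. A `Counter` stands in for the count arrays one would use with a dense label range.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def parse_encoding(tokens: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Turn a horizontal string encoding into preorder labels and parent indices."""
    labels: List[int] = []
    parent: List[int] = []
    stack: List[int] = []
    for tok in tokens:
        if tok == -1:
            stack.pop()  # backtrack to the parent
        else:
            parent.append(stack[-1] if stack else -1)
            labels.append(tok)
            stack.append(len(labels) - 1)
    return labels, parent


def frequent_1_and_2(db: Sequence[Sequence[int]], minsup: int):
    """Return F1 {label: support} and F2 {(ancestor label, descendant label): support}."""
    c1: Counter = Counter()
    c2: Counter = Counter()
    for tokens in db:
        labels, parent = parse_encoding(tokens)
        c1.update(set(labels))  # each tree counts at most once per label
        pairs = set()
        for v in range(len(labels)):
            p = parent[v]
            while p != -1:  # every ancestor of v yields an embedded 2-subtree
                pairs.add((labels[p], labels[v]))
                p = parent[p]
        c2.update(pairs)  # each tree counts at most once per pair
    f1: Dict[int, int] = {x: n for x, n in c1.items() if n >= minsup}
    f2: Dict[Tuple[int, int], int] = {xy: n for xy, n in c2.items() if n >= minsup}
    return f1, f2


db = [[0, 1, 2, -1, -1, 3, -1],  # 0(1(2), 3)
      [0, 2, -1, 1, -1]]         # 0(2, 1)
print(frequent_1_and_2(db, minsup=2))
```

On this two-tree database with minsup = 2, label 3 and the pair (1, 2) each appear in only one tree and are dropped.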
Understanding the notation used in the pseudocode:
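TreeMiner(D, minsup):
    F1 = { frequent 1-subtrees };
    F2 = { classes [P]_1 of frequent 2-subtrees };
    for all [P]_1 ∈ F2 do Enumerate-Frequent-Subtrees([P]_1);
// Note the following function:
Enumerate-Frequent-Subtrees([P]):
    for each element (x, i) ∈ [P] do
        [P_x] = ∅;
        for each element (y, j) ∈ [P] do
            R = {(x, i) ⊗ (y, j)};
            L(R) = {L(x) ∩⊗ L(y)};   // scope-list join
            if for any r ∈ R, r is frequent then
                [P_x] = [P_x] ∪ {r};
        Enumerate-Frequent-Subtrees([P_x]);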
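Note that each call to this function receives a class $[P_x]$: the class whose prefix is P extended with element x, i.e., whose most recently added element is x.
$[P_x]$ is in fact a subset of the extensions of the prefix class [P], namely those whose last added element is x.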
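Putting the pieces together, a compact sketch of the recursion might look like the following. It is not the paper's code. Classes are plain lists of (label, pos, scope_list), and patterns are recorded as tuples of (label, parent position) in preorder. Support is the number of distinct tids in a scope-list.

The scope-list join `scope_join` uses two tests, and both require the same tid and the same prefix match label:

- An in-scope test for a candidate that becomes a child of x: y's scope lies inside x's.
- An out-scope test otherwise: x's scope ends before y's begins.

These two tests are my reading of the paper's $\cap_\otimes$, so check them against the paper before relying on them.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[int, Tuple[int, ...], Tuple[int, int]]  # (tid, prefix match m, scope s)
Pattern = Tuple[Tuple[int, int], ...]                 # preorder (label, parent position)


def scopes_from_encoding(tokens: Sequence[int]):
    """Preorder labels and scopes [l, r] from a horizontal string encoding."""
    labels, right, stack = [], [], []
    for tok in tokens:
        if tok == -1:
            right[stack.pop()] = len(labels) - 1
        else:
            labels.append(tok)
            right.append(-1)
            stack.append(len(labels) - 1)
    for node in stack:
        right[node] = len(labels) - 1
    return labels, list(enumerate(right))


def support(entries: List[Entry]) -> int:
    return len({tid for tid, _, _ in entries})  # distinct trees, not occurrences


def scope_join(lx: List[Entry], ly: List[Entry], child: bool) -> List[Entry]:
    """In-scope test if child (y under x), otherwise out-scope test (y after x)."""
    out = []
    for tx, mx, (lo_x, hi_x) in lx:
        for ty, my, (lo_y, hi_y) in ly:
            if tx != ty or mx != my:
                continue
            inside = lo_x < lo_y and hi_y <= hi_x
            if (child and inside) or (not child and hi_x < lo_y):
                out.append((ty, my + (lo_x,), (lo_y, hi_y)))  # extend match with x
    return out


def enumerate_class(prefix: Pattern, elements, minsup: int, found: Dict[Pattern, int]):
    """Enumerate-Frequent-Subtrees([P]); elements are (label, pos, scope_list)."""
    for x_label, i, lx in elements:
        new_prefix = prefix + ((x_label, i),)
        new_class = []  # this is [P_x]
        for y_label, j, ly in elements:
            if i == j:
                cands = [(j, False), (len(prefix), True)]  # sibling of x, child of x
            elif i > j:
                cands = [(j, False)]
            else:
                cands = []
            for pos, child in cands:
                lr = scope_join(lx, ly, child)
                if support(lr) >= minsup:
                    new_class.append((y_label, pos, lr))
                    found[new_prefix + ((y_label, pos),)] = support(lr)
        if new_class:
            enumerate_class(new_prefix, new_class, minsup, found)


def tree_miner(db: Sequence[Sequence[int]], minsup: int) -> Dict[Pattern, int]:
    l1: Dict[int, List[Entry]] = defaultdict(list)
    for tid, tokens in enumerate(db):
        labels, scopes = scopes_from_encoding(tokens)
        for label, scope in zip(labels, scopes):
            l1[label].append((tid, (), scope))
    f1 = {a: lst for a, lst in l1.items() if support(lst) >= minsup}
    found: Dict[Pattern, int] = {((a, -1),): support(lst) for a, lst in f1.items()}
    for a, la in f1.items():
        prefix: Pattern = ((a, -1),)
        elements = []  # class [a]_1 of frequent 2-subtrees a -> b
        for b, lb in f1.items():
            lab = scope_join(la, lb, child=True)
            if support(lab) >= minsup:
                elements.append((b, 0, lab))
                found[prefix + ((b, 0),)] = support(lab)
        enumerate_class(prefix, elements, minsup, found)
    return found


db = [[0, 1, -1, 2, -1], [0, 1, -1, 2, -1], [0, 1, 2, -1, -1]]  # 0(1,2) twice, 0(1(2))
for pattern, sup in tree_miner(db, minsup=2).items():
    print(pattern, sup)
```

On this three-tree database with minsup = 2, the only 3-node pattern found is 0(1, 2), i.e. ((0, -1), (1, 0), (2, 0)) with support 2. The pattern 1 → 2 occurs only in the third tree and is pruned. Each recursive call receives exactly the `new_class` built for one x, which mirrors the remark above that $[P_x]$ holds the extensions whose last added element is x.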