MDL-based Tree Cut Model

A straightforward way to determine a cut of a tree is to collapse low-frequency nodes into their parent node. However, this method is heuristic, because it relies heavily on a manually tuned frequency threshold. In practice, we instead use a theoretically well-motivated method based on the MDL (Minimum Description Length) principle. MDL is a principle of data compression and statistical estimation from information theory.

 

Table 3

Calculating the description length for the model of Figure 5.

C       BIRD    bug     bee     insect
f(C)    8       0       2       0
|C|     4       1       1       1
P(C)    0.8     0.0     0.2     0.0
P(n)    0.2     0.0     0.2     0.0

T            [BIRD, bug, bee, insect]
L(α|T)       (4-1)/2 × log 10 = 4.98
L(S|T, α)    -(2+4+2+2) × log 0.2 = 23.22
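
The arithmetic in Table 3 can be checked with a short script. The following is a minimal sketch, not a reference implementation: it assumes the logarithms are base 2 (which is what reproduces 4.98 and 23.22) and that P(n) = P(C) / |C|, as the table rows suggest. Figure 5 is not reproduced here, so the per-leaf counts in `LEAF_COUNTS` are hypothetical values chosen to match the table's totals (f(BIRD) = 8, f(bee) = 2, ten observations overall). The function and variable names are illustrative only.

```python
import math

# Hypothetical per-leaf counts, chosen to match the totals in Table 3
# (f(BIRD) = 8, f(bee) = 2, |S| = 10). Figure 5 itself is not shown here.
LEAF_COUNTS = {
    "swallow": 0, "crow": 2, "eagle": 2, "bird": 4,
    "bug": 0, "bee": 2, "insect": 0,
}

# Leaves covered by each class in the cut [BIRD, bug, bee, insect].
CUT = {
    "BIRD": ["swallow", "crow", "eagle", "bird"],
    "bug": ["bug"],
    "bee": ["bee"],
    "insect": ["insect"],
}


def parameter_length(cut, sample_size):
    """L(α|T) = (k - 1) / 2 * log2 |S|, where k is the number of classes."""
    return (len(cut) - 1) / 2 * math.log2(sample_size)


def data_length(cut, leaf_counts):
    """L(S|T, α) = -sum of log2 P(n) over all observations."""
    sample_size = sum(leaf_counts.values())
    total = 0.0
    for leaves in cut.values():
        class_count = sum(leaf_counts[leaf] for leaf in leaves)  # f(C)
        if class_count == 0:
            continue  # no observations in this class, nothing to encode
        p_leaf = class_count / sample_size / len(leaves)  # P(n) = P(C) / |C|
        total -= class_count * math.log2(p_leaf)
    return total


sample_size = sum(LEAF_COUNTS.values())
l_param = parameter_length(CUT, sample_size)
l_data = data_length(CUT, LEAF_COUNTS)
print(f"L(α|T)    = {l_param:.2f}")           # 4.98
print(f"L(S|T, α) = {l_data:.2f}")            # 23.22
print(f"L'(T)     = {l_param + l_data:.2f}")  # 28.20
```

Classes with zero observations are skipped in the data term because they contribute no symbols to encode, which also avoids taking the logarithm of zero.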

 

Table 4

Description length of the five tree cut models.

T                                                L(α|T)   L(S|T, α)   L'(T)
[ANIMAL]                                         0        28.07       28.07
[BIRD, INSECT]                                   1.66     26.39       28.05
[BIRD, bug, bee, insect]                         4.98     23.22       28.20
[swallow, crow, eagle, bird, INSECT]             6.64     22.39       29.03
[swallow, crow, eagle, bird, bug, bee, insect]   9.97     19.22       29.19
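
Model selection under MDL then amounts to picking the cut with the smallest total L'(T). The sketch below recomputes the rows of Table 4 and selects the minimum. It rests on the same assumptions as the table's arithmetic appears to: base-2 logarithms and P(n) = P(C) / |C|. The per-leaf counts are hypothetical, chosen so that the results agree with the table, and the grouping of leaves under BIRD and INSECT is inferred from the cut names.

```python
import math

# Hypothetical per-leaf counts consistent with Tables 3 and 4 (|S| = 10).
LEAF_COUNTS = {
    "swallow": 0, "crow": 2, "eagle": 2, "bird": 4,
    "bug": 0, "bee": 2, "insect": 0,
}

# Assumed tree structure: ANIMAL -> BIRD, INSECT -> leaves.
BIRD = ["swallow", "crow", "eagle", "bird"]
INSECT = ["bug", "bee", "insect"]

# Each cut is a list of classes; each class is the list of leaves it covers.
CUTS = {
    "[ANIMAL]": [BIRD + INSECT],
    "[BIRD, INSECT]": [BIRD, INSECT],
    "[BIRD, bug, bee, insect]": [BIRD] + [[leaf] for leaf in INSECT],
    "[swallow, crow, eagle, bird, INSECT]": [[leaf] for leaf in BIRD] + [INSECT],
    "[swallow, crow, eagle, bird, bug, bee, insect]":
        [[leaf] for leaf in BIRD + INSECT],
}


def description_length(classes, leaf_counts):
    """Return (L(α|T), L(S|T, α)) for one cut."""
    sample_size = sum(leaf_counts.values())
    l_param = (len(classes) - 1) / 2 * math.log2(sample_size)
    l_data = 0.0
    for leaves in classes:
        f_c = sum(leaf_counts[leaf] for leaf in leaves)
        if f_c > 0:  # empty classes contribute nothing to the data term
            l_data -= f_c * math.log2(f_c / sample_size / len(leaves))
    return l_param, l_data


totals = {}
for name, classes in CUTS.items():
    l_param, l_data = description_length(classes, LEAF_COUNTS)
    totals[name] = l_param + l_data
    print(f"{name}: {l_param:.2f} + {l_data:.2f} = {totals[name]:.2f}")

# The MDL choice is the cut with the smallest total description length.
best_cut = min(totals, key=totals.get)
print("Selected cut:", best_cut)  # [BIRD, INSECT]
```

With these counts, [BIRD, INSECT] has the smallest total (28.05), narrowly ahead of [ANIMAL] (28.07): the finer cuts fit the data better but pay more for their extra parameters.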

 
