Ref: Chapter 8 of "Data Mining: Concepts and Techniques", 3rd ed., by Han et al.
Information Gain
strong: easy to implement
weak: it is biased toward attributes with a large number of distinct values. Splitting on such an attribute produces a large number of small partitions; each may be nearly pure, so the gain looks high, but the split is of little use for classification. For example, in traffic classification, using packet size as the splitting attribute can yield one partition (branch) per observed size, such as 20, 50, 100, 1200, 1500, etc.
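A minimal sketch of this bias, assuming a made-up six-flow traffic data set (the attribute names packet_size and protocol, the labels, and all values are hypothetical and chosen only for illustration):

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Expected information (in bits) needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Entropy of the whole set minus the weighted entropy of the
    partitions induced by one attribute (one partition per value)."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: six flows, two classes.
labels = ["web", "web", "voip", "voip", "web", "voip"]
packet_size = [20, 50, 100, 1200, 1500, 64]               # every value unique
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "tcp"]     # only two values

gain_size = info_gain(packet_size, labels)    # one pure partition per flow
gain_proto = info_gain(protocol, labels)      # two partitions, one impure
print(gain_size, gain_proto)
```

Here packet_size gets the maximum possible gain only because every flow ends up alone in its own partition, so information gain would pick it over protocol even though it says nothing about unseen packet sizes.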
Information Gain+Gain Ratio
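strong: corrects the bias of information gain by normalizing the gain with the split information, which penalizes attributes that split the data into many partitions. It is a trade-off between the information gained with respect to the classes and the information generated by the partitioning itself.
weak: if the split information approaches 0, the ratio becomes unstable. To avoid this, a constraint is added: the information gain of the selected attribute must be large, at least as great as the average gain over all attributes examined.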
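A minimal sketch of the gain ratio, assuming the same kind of made-up six-flow data set (packet_size, protocol, constant, and all values are hypothetical). Split information is computed as the entropy of the attribute's own value distribution, and returning 0 when it is zero is a guard chosen for this sketch, not part of the measure's definition:

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def info_gain(values, labels):
    """Information gain of splitting labels by the attribute values."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by split information."""
    split_info = entropy(values)   # information generated by the split itself
    if split_info == 0:            # attribute has a single value: no split
        return 0.0                 # guard chosen for this sketch
    return info_gain(values, labels) / split_info


# Hypothetical toy data: six flows, two classes.
labels = ["web", "web", "voip", "voip", "web", "voip"]
packet_size = [20, 50, 100, 1200, 1500, 64]               # every value unique
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "tcp"]     # only two values
constant = ["x"] * 6                                      # split info is 0

ratio_size = gain_ratio(packet_size, labels)     # gain 1.0 / log2(6)
ratio_proto = gain_ratio(protocol, labels)       # smaller gain, smaller split info
print(ratio_size, ratio_proto, gain_ratio(constant, labels))
```

On this data the ranking flips compared with plain information gain: packet_size is penalized for producing six partitions, so protocol gets the higher ratio. The average-gain constraint is not implemented here.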
Gini Index
measures the impurity of a training data set or a partition. Each attribute is split in two (binary split); for a given attribute, the subset of its values that gives the minimum Gini index is selected as its splitting subset.
weak: has difficulty when the number of classes is large
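A minimal sketch of choosing a splitting subset by Gini index for a discrete attribute, assuming a made-up six-flow data set (the attribute app_proto, its values, and the labels are hypothetical). It simply enumerates every proper, non-empty subset of the attribute's values, which is only practical for attributes with few distinct values:

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(values, labels, subset):
    """Weighted Gini index of the binary split 'value in subset' vs. the rest."""
    left = [y for v, y in zip(values, labels) if v in subset]
    right = [y for v, y in zip(values, labels) if v not in subset]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_binary_split(values, labels):
    """Return (subset, gini index) of the best binary split,
    or None if the attribute has fewer than two distinct values."""
    distinct = sorted(set(values))
    best = None
    for k in range(1, len(distinct)):            # proper, non-empty subsets
        for subset in combinations(distinct, k):
            score = gini_split(values, labels, set(subset))
            if best is None or score < best[1]:  # keep the first minimum found
                best = (set(subset), score)
    return best


# Hypothetical toy data: six flows, two classes.
labels = ["web", "web", "voip", "voip", "web", "voip"]
app_proto = ["http", "http", "rtp", "sip", "http", "rtp"]

print(gini(labels))                          # impurity before splitting
print(best_binary_split(app_proto, labels))  # best subset and its Gini index
```

Each split is visited twice (once as a subset, once as its complement); that redundancy is left in to keep the sketch short.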