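I have been reading the material on the KALDI website lately, adding annotations as I go to help my own understanding.
These study notes are mostly reproduced from the official KALDI site, http://kaldi.sourceforge.net, with my own annotations added; I state this here for the record.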
Clustering mechanisms in Kaldi
Note: this part mainly introduces the clustering mechanisms in Kaldi.
See "Classes and functions related to clustering" for a list of the classes and functions involved. This page does not cover phonetic decision-tree clustering (see "Decision tree internals" and "How decision trees are used in Kaldi"), although the classes and functions introduced on this page are used at lower levels of the phonetic clustering code.
Note: this part does not cover phonetic decision-tree clustering; it covers the lower-level clustering code that the phonetic clustering builds on.
The Clusterable interface
The Clusterable class is a pure virtual class from which the class GaussClusterable inherits (GaussClusterable represents Gaussian statistics). In the future we will add other types of clusterable object that inherit from Clusterable. The reason for the Clusterable class is to allow us to use generic clustering algorithms.
Note: the main classes are Clusterable and GaussClusterable.
The central notion of the Clusterable interface is that of adding statistics together and measuring the objective function. The notion of distance between two Clusterable objects is derived from measuring the objective function of the two objects separately, then adding them together and measuring the objective function of the result; the negative of the resulting change in objective function (that is, the amount by which the objective function decreases) gives the notion of distance.
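The add-then-measure idea can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: ToyGaussStats is a hypothetical one-dimensional class written for this note, not Kaldi's GaussClusterable, and its objective is assumed to be the log-likelihood of the data under a single Gaussian with a floored variance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Toy one-dimensional Gaussian statistics (hypothetical; not Kaldi's
// GaussClusterable). Stores the count, the sum of x and the sum of x^2.
class ToyGaussStats {
 public:
  ToyGaussStats(double count, double x_sum, double x2_sum, double var_floor)
      : count_(count), x_sum_(x_sum), x2_sum_(x2_sum), var_floor_(var_floor) {
    assert(count >= 0.0 && var_floor > 0.0);
  }

  // Log-likelihood of the accumulated data under a single Gaussian whose
  // mean is the sample mean and whose variance is the floored sample variance.
  double Objf() const {
    if (count_ <= 0.0) return 0.0;
    const double kTwoPi = 6.283185307179586;
    double mean = x_sum_ / count_;
    double var = x2_sum_ / count_ - mean * mean;
    double floored_var = std::max(var, var_floor_);
    return -0.5 * count_ * (std::log(kTwoPi * floored_var) + var / floored_var);
  }

  // Adding statistics together corresponds to merging two clusters.
  void Add(const ToyGaussStats &other) {
    count_ += other.count_;
    x_sum_ += other.x_sum_;
    x2_sum_ += other.x2_sum_;
  }

  // Objective of the two parts taken separately minus the objective of the
  // merged statistics, i.e. how much objective is lost by merging.
  double Distance(const ToyGaussStats &other) const {
    ToyGaussStats merged(*this);
    merged.Add(other);
    return Objf() + other.Objf() - merged.Objf();
  }

 private:
  double count_, x_sum_, x2_sum_, var_floor_;
};
```

With this sketch, two sets of statistics that have the same mean and variance are at distance (approximately) zero, while sets with well-separated means are far apart, because a single Gaussian fits the pooled data much worse than two separate ones.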
Examples of Clusterable classes that we intend to add at some point include mixture-of-Gaussian statistics derived from posteriors of a fixed, shared, mixture-of-Gaussians model, and also collections of counts of discrete observations (the objective function would be equivalent to the negated entropy of the distribution, times the number of counts).
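For the discrete-count case mentioned above, the objective (negated entropy times the number of counts) can be written as the sum over symbols of c_i * log(c_i / N). The following is a rough standalone sketch; CountsObjf and CountsDistance are hypothetical helper names chosen for this note and do not correspond to any existing Kaldi class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Objective for counts of discrete observations (hypothetical helper):
// sum_i c_i * log(c_i / N), which equals -N times the entropy of the
// empirical distribution, where N is the total count.
double CountsObjf(const std::vector<double> &counts) {
  double total = 0.0;
  for (double c : counts) {
    assert(c >= 0.0);
    total += c;
  }
  if (total <= 0.0) return 0.0;
  double objf = 0.0;
  for (double c : counts) {
    if (c > 0.0) objf += c * std::log(c / total);  // 0 * log(0) is taken as 0
  }
  return objf;
}

// Distance between two count vectors over the same symbol set: the
// objective lost by pooling the counts into a single distribution.
double CountsDistance(const std::vector<double> &a,
                      const std::vector<double> &b) {
  assert(a.size() == b.size());
  std::vector<double> merged(a.size());
  for (std::size_t i = 0; i < a.size(); i++) merged[i] = a[i] + b[i];
  return CountsObjf(a) + CountsObjf(b) - CountsObjf(merged);
}
```

Two count vectors with the same proportions pool at no cost, whereas two vectors concentrated on different symbols lose a lot of objective when merged, which is what makes the quantity usable as a distance.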
An example of getting a pointer of type Clusterable* (which actually points to a GaussClusterable) is as follows:
    Vector<BaseFloat> x_stats(10), x2_stats(10);
    BaseFloat count = 100.0, var_floor = 0.01;
    // initialize x_stats and x2_stats e.g. as
    // x_stats = 100 * mu_i, x2_stats = 100 * (mu_i*mu_i + sigma^2_i)
    Clusterable *cl = new GaussClusterable(x_stats, x2_stats, var_floor, count);
Clustering algorithms
We have implemented a number of generic clustering algorithms. These are listed in "Algorithms for clustering". A data structure that is used heavily in these algorithms is a vector of pointers to the Clusterable interface class:
    std::vector<Clusterable*> to_be_clustered;
The index into the vector is the index of the "point" to be clustered.
K-means and algorithms with similar interfaces
A typical example of calling clustering code is as follows:
    std::vector<Clusterable*> to_be_clustered;
    // initialize "to_be_clustered" somehow ...
    std::vector<Clusterable*> clusters;
    int32 num_clust = 10; // requesting 10 clusters
    ClusterKMeansOptions opts; // all default.
    std::vector<int32> assignments;
    ClusterKMeans(to_be_clustered, num_clust, &clusters, &assignments, opts);
After the clustering code is called, "assignments" will tell you, for each item in "to_be_clustered", which cluster it is assigned to. The ClusterKMeans() algorithm is fairly efficient even for a large number of points; see that function's documentation for more details.
There are two more algorithms that have a similar interface to ClusterKMeans(), namely ClusterBottomUp() and ClusterTopDown(). Probably the more useful one is ClusterTopDown(), which should be more efficient than ClusterKMeans() if the number of clusters is large (it does a binary split, then does a binary split on the leaves, and so on). Internally it calls TreeCluster(); see below.
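To make the meaning of the "assignments" output concrete, here is a minimal standalone sketch of k-means on one-dimensional points. It is not Kaldi's ClusterKMeans(): ToyKMeans1D is a hypothetical function written for this note, it uses plain Lloyd iterations with a deterministic initialization, and it works on raw doubles rather than Clusterable objects.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy 1-D k-means (Lloyd's algorithm). Hypothetical helper, not Kaldi code.
// Returns a vector where element i is the index of the cluster that
// points[i] was assigned to in the final assignment step.
std::vector<int> ToyKMeans1D(const std::vector<double> &points, int num_clust,
                             std::vector<double> *centers,
                             int num_iters = 20) {
  assert(centers != NULL && num_clust > 0);
  assert(static_cast<std::size_t>(num_clust) <= points.size());
  // Deterministic initialization: take evenly spaced input points as centers.
  centers->resize(num_clust);
  for (int c = 0; c < num_clust; c++)
    (*centers)[c] = points[c * points.size() / num_clust];

  std::vector<int> assignments(points.size(), 0);
  for (int iter = 0; iter < num_iters; iter++) {
    // Assignment step: each point goes to its nearest center.
    for (std::size_t i = 0; i < points.size(); i++) {
      int best = 0;
      for (int c = 1; c < num_clust; c++) {
        if (std::fabs(points[i] - (*centers)[c]) <
            std::fabs(points[i] - (*centers)[best]))
          best = c;
      }
      assignments[i] = best;
    }
    // Update step: each center moves to the mean of its assigned points.
    std::vector<double> sums(num_clust, 0.0);
    std::vector<int> counts(num_clust, 0);
    for (std::size_t i = 0; i < points.size(); i++) {
      sums[assignments[i]] += points[i];
      counts[assignments[i]]++;
    }
    for (int c = 0; c < num_clust; c++)
      if (counts[c] > 0) (*centers)[c] = sums[c] / counts[c];
  }
  return assignments;
}
```

As in the Kaldi call above, the returned vector is indexed by point: element i holds the index of the cluster that point i ended up in, so it has one entry per input point rather than one per cluster.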
Tree clustering algorithm
The function TreeCluster() clusters points into a binary tree (the leaves won't necessarily have just one point each; you can specify a maximum number of leaves). This function is useful, for instance, when building regression trees for adaptation. See that function's documentation for a detailed explanation of its output format. The quick overview is that it numbers leaf and non-leaf nodes in topological order, with the leaves first and the root last, and outputs a vector that tells you, for each node, what its parent is.
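The parent-vector output format described above can be explored with a small standalone sketch. Both helper functions are hypothetical names written for this note, and the code assumes that the root (the last node) lists itself as its own parent; check the TreeCluster() documentation before relying on that detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks the numbering convention described above: every node other than
// the last one (the root) has a parent with a strictly larger index.
bool ParentsAreTopological(const std::vector<int> &parents) {
  if (parents.empty()) return false;
  int root = static_cast<int>(parents.size()) - 1;
  if (parents[root] != root) return false;  // assumption: root is its own parent
  for (int n = 0; n < root; n++)
    if (parents[n] <= n || parents[n] > root) return false;
  return true;
}

// Returns the nodes visited when walking from `node` up to the root,
// starting with `node` itself and ending with the root.
std::vector<int> PathToRoot(const std::vector<int> &parents, int node) {
  assert(ParentsAreTopological(parents));  // also guarantees termination
  assert(node >= 0 && static_cast<std::size_t>(node) < parents.size());
  std::vector<int> path(1, node);
  while (parents[node] != node) {
    node = parents[node];
    path.push_back(node);
  }
  return path;
}
```

For example, a tree with three leaves (nodes 0, 1 and 2), one internal node 3 joining leaves 0 and 1, and root 4 would be described by the parent vector {3, 3, 4, 4, 4}; walking up from leaf 0 then visits nodes 0, 3 and 4.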
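Note:
If you have any questions or comments about Kaldi, or would like to be added to the Kaldi users' mailing list, you can send email to this address:
or visit: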
http://sourceforge.net/p/kaldi/discussion/