Updated: 2014-2-28, LinJM @HQU 『 libsvm column: http://blog.csdn.net/column/details/libsvm.html 』
The latest version of libsvm is currently 3.17. The main change is a few lines of code added to the svm_group_classes function. The official release note reads:
Version 3.17 released on April Fools' day, 2013. We slightly adjust the way class labels are handled internally. By default labels are ordered by their first occurrence in the training set. Hence for a set with -1/+1 labels, if -1 appears first, then internally -1 becomes +1. This has caused confusion. Now for data with -1/+1 labels, we specifically ensure that internally the binary SVM has positive data corresponding to the +1 instances. For developers, see changes in the subrouting svm_group_classes of svm.cpp.
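This post analyzes that function.
The purpose of svm_group_classes is to group training data of the same class.
Important: this function is a good reference for how to sort a pile of data into classes so that samples of the same class are stored contiguously.
The function prototype is:
void svm_group_classes(const svm_problem *prob, int *nr_class_ret, int **label_ret, int **start_ret, int **count_ret, int *perm)
The main input is the pointer prob, which points to the sample data set that svm_group_classes will process. The remaining parameters are pointers and effectively act as outputs, as described in the comments at the top of the function: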
// label: label name, start: begin of each class, count: #data of classes, perm: indices to the original data
// perm, length l, must be allocated before calling this subroutine
static void svm_group_classes(const svm_problem *prob, int *nr_class_ret, int **label_ret, int **start_ret, int **count_ret, int *perm)
{
    int l = prob->l; // total number of samples
    int max_nr_class = 16; // doubled automatically when it is not enough (see below)
    int nr_class = 0;
    int *label = Malloc(int,max_nr_class); // Malloc(type,n) is (type *)malloc((n)*sizeof(type))
    int *count = Malloc(int,max_nr_class);
    int *data_label = Malloc(int,l);
    int i;

    for(i=0;i<l;i++)
    {
        int this_label = (int)prob->y[i]; // class label of sample i
        int j;
        for(j=0;j<nr_class;j++)
        {
            // label holds no values at first, but nr_class is 0 then,
            // so this inner loop does not run on the first sample
            if(this_label == label[j])
            {
                ++count[j];
                break;
            }
        }
        data_label[i] = j;
        if(j == nr_class)
        {
            if(nr_class == max_nr_class)
            {
                max_nr_class *= 2; // double the class capacity
                label = (int *)realloc(label,max_nr_class*sizeof(int));
                count = (int *)realloc(count,max_nr_class*sizeof(int));
            }
            label[nr_class] = this_label;
            count[nr_class] = 1; // a new class starts with a count of 1
            ++nr_class;
        }
    }
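The first pass above can be tried out in isolation. The following is a minimal sketch, not libsvm's code: it assumes std::vector in place of Malloc/realloc (so the doubling is handled by the container), and the names LabelScan and scan_labels are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; libsvm returns these through raw pointers instead.
struct LabelScan {
    std::vector<int> label;       // distinct labels, in order of first occurrence
    std::vector<int> count;       // count[k] = number of samples with label[k]
    std::vector<int> data_label;  // data_label[i] = class index of sample i
};

// Single pass over the targets y, mirroring the first loop of svm_group_classes.
LabelScan scan_labels(const std::vector<double>& y) {
    LabelScan r;
    r.data_label.reserve(y.size());
    for (double yi : y) {
        int this_label = static_cast<int>(yi);
        std::size_t j = 0;
        for (; j < r.label.size(); ++j) {
            if (r.label[j] == this_label) {  // label already known
                ++r.count[j];
                break;
            }
        }
        if (j == r.label.size()) {           // label not seen before: open a new class
            r.label.push_back(this_label);
            r.count.push_back(1);
        }
        r.data_label.push_back(static_cast<int>(j));
    }
    assert(r.label.size() == r.count.size());
    return r;
}
```

For y = {-1, 1, 1, -1, 3} this sketch gives label = {-1, 1, 3}, count = {2, 2, 1} and data_label = {0, 1, 1, 0, 2}. The linear search over label makes the pass O(l * nr_class), which is cheap as long as the number of classes is small.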
What changed in this version: this part handles two-class classification. When the label that appears first is -1, it swaps the -1 and +1 data.
    //
    // Labels are ordered by their first occurrence in the training set.
    // However, for two-class sets with -1/+1 labels and -1 appears first,
    // we swap labels to ensure that internally the binary SVM has positive data corresponding to the +1 instances.
    //
    if (nr_class == 2 && label[0] == -1 && label[1] == 1)
    {
        swap(label[0],label[1]);
        swap(count[0],count[1]);
        for(i=0;i<l;i++)
        {
            if(data_label[i] == 0)
                data_label[i] = 1;
            else
                data_label[i] = 0;
        }
    }
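The effect of this swap can be checked on a small input. The sketch below is an illustration under assumptions, not libsvm's code: put_positive_first is a hypothetical helper that works on std::vector copies of label, count and data_label.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not a libsvm function): applies the 3.17 reordering
// to the results of the first pass.
void put_positive_first(std::vector<int>& label, std::vector<int>& count,
                        std::vector<int>& data_label) {
    assert(label.size() == count.size());
    // Only the exact case "two classes, -1 seen first, +1 seen second" is touched.
    if (label.size() == 2 && label[0] == -1 && label[1] == 1) {
        std::swap(label[0], label[1]);
        std::swap(count[0], count[1]);
        for (int& d : data_label)
            d = 1 - d;  // class indices are only 0 or 1 here, so this flips them
    }
}
```

With label = {-1, 1}, count = {3, 1} and data_label = {0, 0, 1, 0}, the call leaves label = {1, -1}, count = {1, 3} and data_label = {1, 1, 0, 1}. Any other input, such as label = {1, -1} or a three-class set, passes through unchanged.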
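The next part of the code computes start, the starting position of each class, and the perm array, which records where each grouped sample sits in the original data. perm[i] = j means that i is the position in the grouped (class-contiguous) order and j is the position in the original data.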
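    int *start = Malloc(int,nr_class);
    // start[k] = number of samples in all classes before k
    start[0] = 0;
    for(i=1;i<nr_class;i++)
        start[i] = start[i-1]+count[i-1];
    // place each sample in the next free slot of its class;
    // start is used as the write cursor and is advanced along the way
    for(i=0;i<l;i++)
    {
        perm[start[data_label[i]]] = i;
        ++start[data_label[i]];
    }
    // start was modified by the loop above, so recompute it
    start[0] = 0;
    for(i=1;i<nr_class;i++)
        start[i] = start[i-1]+count[i-1];

    *nr_class_ret = nr_class;
    *label_ret = label;
    *start_ret = start;
    *count_ret = count;
    free(data_label);
}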
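The start/perm step is essentially the placement phase of a counting sort. The sketch below illustrates it under assumptions and is not libsvm's code: Grouping, build_perm and indices_of_class are hypothetical names, and a separate cursor array next is used so that start does not have to be recomputed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration.
struct Grouping {
    std::vector<int> start;  // start[k] = first grouped position of class k
    std::vector<int> perm;   // perm[p] = original index of the sample at grouped position p
};

// Builds start and perm from per-sample class indices and per-class counts.
Grouping build_perm(const std::vector<int>& data_label, const std::vector<int>& count) {
    Grouping g;
    g.start.assign(count.size(), 0);
    for (std::size_t k = 1; k < count.size(); ++k)
        g.start[k] = g.start[k - 1] + count[k - 1];  // exclusive prefix sum of count

    std::vector<int> next = g.start;  // per-class write cursor (libsvm advances start itself)
    g.perm.assign(data_label.size(), 0);
    for (std::size_t i = 0; i < data_label.size(); ++i) {
        int k = data_label[i];
        assert(k >= 0 && static_cast<std::size_t>(k) < count.size());
        g.perm[next[k]++] = static_cast<int>(i);
    }
    return g;
}

// Original indices of all samples of class k, read from the contiguous slice of perm.
std::vector<int> indices_of_class(const Grouping& g, const std::vector<int>& count, int k) {
    return std::vector<int>(g.perm.begin() + g.start[k],
                            g.perm.begin() + g.start[k] + count[k]);
}
```

With data_label = {0, 1, 1, 0, 2} and count = {2, 2, 1}, this gives start = {0, 2, 4} and perm = {0, 3, 1, 2, 4}, so class 0 owns original samples 0 and 3. Because samples are placed in input order, the original order is preserved within each class.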
Diagram of the program analysis:
Article URL: http://blog.csdn.net/linj_m/article/details/20130469
Weibo: 林建民-机器视觉