The greatness of the Viola-Jones face detection algorithm lies not only in its real-time performance, but more importantly in the general approach it proposed for this whole class of object detection problems. The algorithm has two highlights: the integral image technique and the cascade training model. It attracted enormous attention as soon as it was published, and traces of these two ideas appear in many excellent papers, for example the Detector module of the TLD algorithm and the two-stage SVM model used when training BING objectness; it is hard to argue that these were not influenced by Viola-Jones. Below we introduce one of the basic building blocks of the cascade model: AdaBoost.
AdaBoost is not an original invention of Viola-Jones; it comes from machine learning, where it belongs to the boosting branch of ensemble learning. Ensemble methods fall into two categories, bagging and boosting, both built on the idea of constructing a strong classifier from weak classifiers; the representative bagging algorithm is Random Forests, and the representative boosting algorithm is AdaBoost. For the theory behind AdaBoost, the paper "A Brief Introduction to Boosting" by Schapire is recommended.
In the spirit of sharing and exchange, the content below covers the theory of AdaBoost and gives an implementation in standard C++ (the class in the code keeps the name AdaptBoost). For those who prefer not to depend on any particular library, this standard C++ version is a good choice. If anything here is incorrect, corrections are welcome.
1. AdaBoost Principles
For an image window of a given size, the dimensionality of its Haar feature vector is very high (a 24x24 window already yields over 160,000 Haar features), so training a classifier directly on all Haar features computed from the training samples is impractical; we must select a subset of the high-dimensional Haar features to train on. AdaBoost fits this need exactly. Its basic idea is to construct a strong classifier out of weak classifiers, taking the combined vote of the weak classifiers as the strong classifier's output. An AdaBoost weak classifier can be a stump, that is, a depth-one binary decision tree. Feature selection among the many Haar dimensions then works as follows: choose one feature and one binary classification threshold on that feature; if that feature/threshold pair achieves the smallest weighted classification error on the training samples, it becomes a trained weak classifier (see the bestStump() interface in the implementation). After each weak classifier is selected, the distribution over the training samples (that is, the per-sample weights, usually initialized uniformly to 1/N, where N is the number of samples) is updated, and each update is determined by the previous weak classifier's results: samples the previous weak classifier misclassified gain weight, and correctly classified samples lose weight. One difference between AdaBoost and Random Forests is that, when computing the strong classifier's output, AdaBoost weights its weak classifiers unequally, whereas Random Forests gives its weak classifiers equal weight.
The AdaBoost training procedure is described by the following pseudocode.
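This is a reconstruction of the standard discrete-AdaBoost statement; the implementation below follows it, except that it stores each committee weight as ln(1/e_t - 1) = 2*a_t and performs the normalization as an explicit renormalization pass, neither of which changes the final sign:

    Input: training examples (x_1, y_1), ..., (x_N, y_N) with labels y_i in {-1, +1},
           and a number of boosting rounds T.
    Initialize the weights: w_1(i) = 1/N for i = 1, ..., N
           (the code allows an asymmetric start: initialPositiveWeight is spread
           evenly over the positives and the remaining weight over the negatives).
    For t = 1, ..., T:
        1. Fit a decision stump h_t minimizing the weighted error
           e_t = sum_i w_t(i) * [h_t(x_i) != y_i]        (cf. bestStump()).
        2. Set the committee weight a_t = (1/2) * ln((1 - e_t) / e_t).
        3. Reweight the samples:
           w_{t+1}(i) = w_t(i) * exp(-a_t * y_i * h_t(x_i)) / Z_t,
           where Z_t normalizes the weights so they sum to 1; misclassified
           samples gain weight and correctly classified samples lose weight.
    Output: the strong classifier H(x) = sign(sum_{t=1}^T a_t * h_t(x)).

Step 3 is equivalent to multiplying the weight of every misclassified sample by 1/e_t - 1 and then renormalizing, which is exactly what oneRoundOfAdaboostTraining() below does; for example, with e_t = 0.25 each misclassified sample's weight triples before renormalization.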
2. Standard C++ Implementation
The interface below contains only the training side, with no test interface; you can add the test part on top of it (a minimal sketch of such a predictor, together with a usage example, follows the implementation).
#ifndef _ADAPTBOOST_H_
#define _ADAPTBOOST_H_

#include <vector>
#include <utility>
#include <cmath>

using namespace std;

/**
 * @brief decision stump declaration
 *
 * @param featureIndex index of the feature this stump tests
 * @param weightedError achieved weighted error
 * @param threshold decision threshold on the feature value
 * @param margin achieved margin
 * @param toggle +1 or -1
 */
struct StumpRule{
    int featureIndex;
    long double weightedError;
    double threshold;
    float margin;
    int toggle;
};

/**
 * @brief what's inside AdaptBoost
 *
 * @param nPositives number of positive examples
 * @param nNegatives number of negative examples
 * @param initialPositiveWeight how much weight we give to positives at the outset
 * @param ascendingFeatures for each feature, we have (float feature value, int exampleIndex)
 * @param sampleCount nPositives + nNegatives
 * @param exponentialRisk exponential risk for the training set
 * @param positiveTotalWeight total weight currently held by the positive examples
 * @param negativeTotalWeight total weight currently held by the negative examples
 * @param minWeight minimum weight among all weights currently
 * @param maxWeight maximum weight among all weights currently
 * @param weights weight vector for all examples involved
 * @param labels are they positive or negative examples
 * @param featureCount how many features there are
 * @param committee the learned committee
 */
class AdaptBoost{
private:
    int nPositives;
    int nNegatives;
    long double initialPositiveWeight;
    vector< vector<pair<float, int>> > ascendingFeatures;
    int sampleCount;
    int featureCount;
    long double positiveTotalWeight;
    long double negativeTotalWeight;
    long double minWeight;
    long double maxWeight;
    long double exponentialRisk;
    vector<double> weights;
    vector<int> labels;
    vector<StumpRule> committee;

    /**
     * @brief prevent copy and assignment
     */
    AdaptBoost(const AdaptBoost&);
    AdaptBoost operator=(const AdaptBoost&);

protected:
    /**
     * @brief return, for the element pointed to by iterator and featureIndex, its exampleIndex
     */
    int getTrainingExampleIndex(int featureIndex, int iterator);

    /**
     * @brief return, for the element pointed to by iterator and featureIndex, its feature value
     */
    float getTrainingExampleFeature(int featureIndex, int iterator);

    /**
     * @brief sort each feature across the different samples
     */
    void sortFeatures(vector< vector<pair<float, int>> >& features);

    /**
     * @brief best stump given a feature
     */
    void decisionStump(int featureIndex, StumpRule & best);

    /**
     * @brief best stump among all features
     */
    StumpRule bestStump();

public:
    /**
     * @brief constructor
     * @param nPositives number of positive training examples
     * @param nNegatives number of negative training examples
     * @param initialPositiveWeight initial total weight of the positives
     * @param data training examples, positives in front and negatives in the back
     */
    AdaptBoost(
          int nPositives
        , int nNegatives
        , long double initialPositiveWeight
        , const vector< vector<float> >& data);

    /**
     * @brief destructor
     */
    ~AdaptBoost();

    /**
     * @brief perform one round of adaboost
     */
    void oneRoundOfAdaboostTraining();

    /**
     * @brief get the committee adaptboost trained
     */
    vector<StumpRule> getCommittee() { return committee; }

    /**
     * @brief get the committee size
     */
    int getCommitteeSize() { return (int)committee.size(); }

    /**
     * @brief train a committee with the given number of weak classifiers
     * @param numOfWeakClassifier number of weak classifiers to boost
     */
    void adaptBoostTraining(int numOfWeakClassifier);

    /**
     * @brief evaluate how the committee fares on the training dataset
     *
     * @param tweak threshold tweak passed to predictLabelOfTrainingExamples
     * @return falsePositive the achieved false positive rate
     * @return detectionRate the achieved detection rate
     * @return a blackList: if an element is 0 the corresponding sample can be
     *         used again, otherwise it is no longer usable
     */
    vector<int> calcEmpiricalErrorInAdaBoostTraining(
          float tweak
        , float & falsePositive
        , float & detectionRate);

    /**
     * @brief given a tweak and a committee, what prediction do you make as to the training examples
     *
     * @param tweakThreshold the tweak
     * @return prediction
     * @param onlyMostRecent use the whole committee or only its most recent member (a weak learner)
     */
    void predictLabelOfTrainingExamples(
          float tweakThreshold
        , vector<int> & prediction
        , bool onlyMostRecent = false);
};

#endif
#include <cassert>
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <iomanip>
#include "VJAdaptBoost.h"

using namespace std;

#define VERBOSE true

//fail and messaging
static void fail(const char* message){
    cerr << "Error: " << message << endl;
    exit(EXIT_FAILURE);
}

//order definition for this type of pairs
//compare only the feature values
static bool myPairOrder(
      const pair<float, int>& one
    , const pair<float, int>& other){
    return one.first < other.first;
}

//why is one stump better than the other
static bool myStumpOrder(
      const StumpRule & one
    , const StumpRule & other){
    if(one.weightedError < other.weightedError)
        return true;
    if(one.weightedError == other.weightedError && one.margin > other.margin)
        return true;
    return false;
}

int AdaptBoost::getTrainingExampleIndex(int featureIndex, int iterator){
    assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() > 0);
    return ascendingFeatures[featureIndex][iterator].second;
}

float AdaptBoost::getTrainingExampleFeature(int featureIndex, int iterator){
    assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() > 0);
    //std::isnan replaces the MSVC-specific _isnan for portability
    if(std::isnan(ascendingFeatures[featureIndex][iterator].first)){
        cerr << "ERROR: nan feature " << featureIndex << " detected for example "
             << getTrainingExampleIndex(featureIndex, iterator) << endl;
        exit(EXIT_FAILURE);
    }
    return ascendingFeatures[featureIndex][iterator].first;
}

//constructor
AdaptBoost::AdaptBoost(
      int positives
    , int negatives
    , long double positiveWeight
    , const vector< vector<float> >& data)
{
    assert(positives > 0 && negatives > 0);
    assert(positiveWeight > 0 && positiveWeight < 1);
    assert(data.size() > 0 && data[0].size() > 0);
    assert((int)data.size() == positives + negatives);
    //pair each feature value with the index of the example it came from
    vector< vector<pair<float, int>> > features(
          data.size()
        , vector<pair<float, int>>(data[0].size(), pair<float, int>(0, 0)));
    for(int i = 0; i < (int)features.size(); i++){
        for(int j = 0; j < (int)features[0].size(); j++){
            features[i][j] = pair<float, int>(data[i][j], i);
        }
    }
    //initialize the class attributes for the training set
    nPositives = positives;
    nNegatives = negatives;
    initialPositiveWeight = positiveWeight;
    sortFeatures(features); //initialize ascendingFeatures
    sampleCount = positives + negatives;
    featureCount = (int)ascendingFeatures.size();
    positiveTotalWeight = positiveWeight;
    negativeTotalWeight = 1 - positiveWeight;
    long double posAverageWeight = positiveTotalWeight/(long double)nPositives;
    long double negAverageWeight = negativeTotalWeight/(long double)nNegatives;
    maxWeight = max(posAverageWeight, negAverageWeight);
    minWeight = min(posAverageWeight, negAverageWeight);
    exponentialRisk = 1;
    //set weights and labels for each example: positives come first, then negatives
    for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
        weights.push_back(exampleIndex < nPositives ? posAverageWeight : negAverageWeight);
        labels.push_back(exampleIndex < nPositives ? 1 : -1);
    }
}

//destructor
AdaptBoost::~AdaptBoost()
{
}

//adaptBoost interface for training
void AdaptBoost::adaptBoostTraining(int numOfWeakClassifier)
{
    assert(numOfWeakClassifier > 0);
    for(int i = 0; i < numOfWeakClassifier; i++){
        oneRoundOfAdaboostTraining();
    }
}

//validation procedure using training examples
vector<int> AdaptBoost::calcEmpiricalErrorInAdaBoostTraining(
      float tweak
    , float & falsePositive
    , float & detectionRate){
    //positives start out usable (0), negatives start out unusable (1)
    vector<int> blackList;
    blackList.resize(nPositives, 0);
    blackList.resize(nPositives + nNegatives, 1);
    int nFalsePositive = 0;
    int nFalseNegative = 0;
    //predictions start out neutral
    vector<int> prediction;
    prediction.resize(sampleCount, 0);
    predictLabelOfTrainingExamples(tweak, prediction, false);
    //evaluate prediction errors
    vector<int> agree(sampleCount);
    for(int i = 0; i < sampleCount; i++){
        agree[i] = labels[i]*prediction[i];
    }
    for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
        if(agree[exampleIndex] < 0){
            if(exampleIndex < nPositives){
                nFalseNegative += 1;
                blackList[exampleIndex] = 1;
            }else{
                nFalsePositive += 1;
                blackList[exampleIndex] = 0;
            }
        }
    }
    //set the returned values
    falsePositive = nFalsePositive/(float)nNegatives;
    detectionRate = 1 - nFalseNegative/(float)nPositives;
    return blackList;
}

//given a tweak and a committee, what prediction does it make as to the training examples
void AdaptBoost::predictLabelOfTrainingExamples(
      float tweakThreshold
    , vector<int> & prediction
    , bool onlyMostRecent){
    int committeeSize = (int)committee.size();
    //no need to weigh a single member's decision
    onlyMostRecent = committeeSize == 1 ? true : onlyMostRecent;
    int start = onlyMostRecent ? committeeSize - 1 : 0;
    //double to be more precise
    vector< vector<double> > memberVerdict;
    for(int i = 0; i < committeeSize; i++){
        //initialize memberVerdict
        vector<double> row(sampleCount);
        memberVerdict.push_back(row);
    }
    vector<double> memberWeight(committeeSize);
    //members, go ahead
    for(int member = start; member < committeeSize; member++){
        //sanity check (0.5 does not count here)
        if(committee[member].weightedError == 0 && member != 0)
            fail("Boosting Error Occurred!");
        //if a member's weightedError is zero, its weight is inf, but it won't be used anyway
        memberWeight[member] = log(1./committee[member].weightedError - 1);
        int feature = committee[member].featureIndex;
#pragma omp parallel for schedule(static)
        for(int iterator = 0; iterator < sampleCount; iterator++){
            int exampleIndex = getTrainingExampleIndex(feature, iterator);
            memberVerdict[member][exampleIndex] =
                (getTrainingExampleFeature(feature, iterator) > committee[member].threshold ? 1 : -1)
                * committee[member].toggle + tweakThreshold;
        }
    }
    //joint session
    if(!onlyMostRecent){
        vector<double> finalVerdict(sampleCount);
        for(int i = 0; i < sampleCount; i++){
            double predict = 0;
            for(int j = 0; j < committeeSize; j++){
                predict += memberWeight[j] * memberVerdict[j][i];
            }
            finalVerdict[i] = predict;
        }
        for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++)
            prediction[exampleIndex] = finalVerdict[exampleIndex] > 0 ? 1 : -1;
    }else{
        for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++)
            prediction[exampleIndex] = memberVerdict[start][exampleIndex] > 0 ? 1 : -1;
    }
}

void AdaptBoost::oneRoundOfAdaboostTraining(){
    //try to be friendly here
    static int trainPhase = 0;
    if(VERBOSE && trainPhase == 0){
        cout << "\n#############################ADABOOST MESSAGE EXPLAINED####################################################\n\n";
        cout << "INFO: Adaboost starts. Exponential Risk is expected to go down steadily and strictly," << endl;
        cout << "INFO: and Exponential Risk should bound the (weighted) Empirical Error from above." << endl;
        cout << "INFO: Train Phase is the current boosting iteration." << endl;
        cout << "INFO: Best Feature is the most discriminative feature selected by decision stump at this iteration." << endl;
        cout << "INFO: Threshold and Toggle are two parameters that define a real valued decision stump.\n" << endl;
    }
    trainPhase++;
    //get and store the rule
    StumpRule rule = bestStump();
    committee.push_back(rule);
    //how it fares
    vector<int> prediction(sampleCount);
    predictLabelOfTrainingExamples(0, prediction, /*onlyMostRecent*/ true);
    vector<bool> agree(sampleCount);
    for(int i = 0; i < sampleCount; i++){
        agree[i] = (prediction[i] == labels[i]);
    }
    //update weights
    vector<double> weightUpdate;
    weightUpdate.resize(sampleCount, 1);
    bool errorFlag = false;
    for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
        //more weight for a difficult example
        if(!agree[exampleIndex]){
            weightUpdate[exampleIndex] = 1/rule.weightedError - 1;
            errorFlag = true;
        }
    }
    //update weights only if there is an error
    if(errorFlag){
        double weightSum = 0;
        for(int i = 0; i < sampleCount; i++){
            weights[i] *= weightUpdate[i];
            weightSum += weights[i];
        }
        for(int i = 0; i < sampleCount; i++){
            weights[i] /= weightSum;
        }
        double posTotalWeight = 0;
        for(int i = 0; i < nPositives; i++){
            posTotalWeight += weights[i];
        }
        positiveTotalWeight = posTotalWeight;
        negativeTotalWeight = 1 - positiveTotalWeight;
        double minW, maxW;
        minW = maxW = weights[0];
        for(int i = 0; i < sampleCount; i++){
            if(weights[i] < minW){
                minW = weights[i];
            }else if(weights[i] > maxW){
                maxW = weights[i];
            }
        }
        minWeight = minW;
        maxWeight = maxW;
    }
    //exponentialRisk can be zero at the first boosting round
    exponentialRisk *= 2*sqrt((1 - rule.weightedError)*rule.weightedError);
    //print some statistics
    if(VERBOSE){
        float tweak = 0;
        float falsePositive = 0;
        float detectionRate = 0;
        calcEmpiricalErrorInAdaBoostTraining(tweak, falsePositive, detectionRate);
        float empError = static_cast<float>(falsePositive*(1 - initialPositiveWeight)
                                          + initialPositiveWeight*(1 - detectionRate));
        cout << "Training Performance Explanation (before threshold tweaking): falsePositive "
             << falsePositive << " detectionRate " << detectionRate << endl;
        cout << "###########################################################################################################\n";
        cout << "\nTrain Phase " << trainPhase << endl << endl;
        // whatFeature(rule.featureIndex);
        cout << "\tExponential Risk " << setw(12) << exponentialRisk << setw(19) << "Weighted Error "
             << setw(11) << rule.weightedError << setw(14) << "Threshold " << setw(10) << rule.threshold
             << setw(13) << "Toggle " << setw(12) << rule.toggle << endl;
        cout << "\tPositive Weight" << setw(14) << positiveTotalWeight << setw(14) << "MinWeight "
             << setw(16) << minWeight << setw(14) << "MaxWeight " << setw(10) << maxWeight
             << setw(22) << "Empirical Error " << setw(10) << empError << endl << endl;
    }
}

//take one feature across all samples, put its values in ascending order,
//and record at the same time the permuted example order
void AdaptBoost::sortFeatures(vector< vector<pair<float, int>> >& features)
{
    assert(features.size() != 0 && features[0].size() != 0);
    for(unsigned int i = 0; i < features[0].size(); i++){
        vector<pair<float, int>> temp;
        for(unsigned int j = 0; j < features.size(); j++){
            temp.push_back(features[j][i]);
        }
        //sort by feature value
        sort(temp.begin(), temp.end(), myPairOrder);
        ascendingFeatures.push_back(temp);
    }
}

//base learner is a stump, a decision tree of depth 1
//decisionStump has to look at the feature and return a rule
void AdaptBoost::decisionStump(
      int featureIndex
    , StumpRule & best){
    //a stump is determined by threshold and toggle; the other two attributes measure its performance
    //initialize with some crazy values
    best.featureIndex = featureIndex;
    best.weightedError = 2;
    best.threshold = getTrainingExampleFeature(featureIndex, 0) - 1;
    best.margin = -1;
    best.toggle = 0;
    StumpRule current = best;
    //error_p and error_n allow us to set the best toggle
    long double error_p, error_n;
    //initialize: r denotes the right hand side and l the left hand side
    //convention: among the training examples, nPositives positive samples are followed by the negative samples
    long double rPositiveWeight = positiveTotalWeight;
    long double rNegativeWeight = negativeTotalWeight;
    //yes, nothing to the left of the sample with the smallest feature
    long double lPositiveWeight = 0;
    long double lNegativeWeight = 0;
    //go through all these observations one after another
    int iterator = -1;
    //to build a decision stump, you need a toggle and an admissible threshold
    //which doesn't coincide with any of the observations
    while(true){
        //We've got a threshold. So determine the best toggle based on two types of error.
        //toggle = 1: positive prediction if and only if the observed feature > the threshold
        //toggle = -1: positive prediction if and only if the observed feature < the threshold
        //error_p denotes the error introduced by toggle = 1, error_n the error by toggle = -1
        error_p = rNegativeWeight + lPositiveWeight;
        error_n = rPositiveWeight + lNegativeWeight;
        current.toggle = error_p < error_n ? 1 : -1;
        //sometimes shit happens, prevent the error from being negative
        long double smallerError = min(error_p, error_n);
        //this prevents some spurious nonzero: the current error must be at least equal to minWeight
        current.weightedError = smallerError < minWeight * 0.9 ? 0 : smallerError;
        //update if necessary
        if(myStumpOrder(current, best))
            best = current;
        //move on
        iterator++;
        //we don't actually need to look at the sample with the largest feature,
        //because its rule is exactly equivalent to the one produced
        //by the sample with the smallest feature on the training observations,
        //but it won't do any harm anyway
        if(iterator == sampleCount)
            break;
        //handle duplicates, update left/right weights and find a new threshold
        while(true){
            //take this guy's attributes
            int exampleIndex = getTrainingExampleIndex(featureIndex, iterator);
            int label = labels[exampleIndex];
            long double weight = weights[exampleIndex];
            //update weights
            if(label < 0){
                lNegativeWeight += weight;
                rNegativeWeight -= weight;
            }else{
                lPositiveWeight += weight;
                rPositiveWeight -= weight;
            }
            //if a new threshold can be found, break
            //two cases are possible: either it is the last observation
            if(iterator == sampleCount - 1)
                break;
            //or there is no duplicate; if there is a duplicate, repeat
            if(getTrainingExampleFeature(featureIndex, iterator) != getTrainingExampleFeature(featureIndex, iterator + 1)){
                double test = ((double)getTrainingExampleFeature(featureIndex, iterator)
                             + (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
                //that's a bit frustrating: I want to keep float because of memory constraints, but apparently
                //features are sometimes so close that numerical precision arises as an unexpected problem,
                //so I decided to use a double threshold so as to separate float features
                if(getTrainingExampleFeature(featureIndex, iterator) < test
                && test < getTrainingExampleFeature(featureIndex, iterator + 1))
                    break;
                else{
#pragma omp critical
                    {
                        cout << "ERROR: numerical precision breached: problem feature values "
                             << getTrainingExampleFeature(featureIndex, iterator) << " : "
                             << getTrainingExampleFeature(featureIndex, iterator + 1)
                             << ". Problem feature " << featureIndex << " and problem examples "
                             << getTrainingExampleIndex(featureIndex, iterator) << " : "
                             << getTrainingExampleIndex(featureIndex, iterator + 1) << endl;
                    }
                    fail("failed to find a suitable threshold.");
                }
            }
            iterator++;
        }
        //update the threshold
        if(iterator < sampleCount - 1){
            current.threshold = ((double)getTrainingExampleFeature(featureIndex, iterator)
                               + (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
            current.margin = getTrainingExampleFeature(featureIndex, iterator + 1)
                           - getTrainingExampleFeature(featureIndex, iterator);
        }else{
            //slightly to the right of the biggest observation
            current.threshold = getTrainingExampleFeature(featureIndex, iterator) + 1;
            current.margin = 0;
        }
    }
}

//implement the feature selection's outer loop
//return the most discriminative feature and its rule
StumpRule AdaptBoost::bestStump(){
    vector<StumpRule> candidates;
    candidates.resize(featureCount);
    //loop over all the features
#pragma omp parallel for schedule(static)
    for(int featureIndex = 0; featureIndex < featureCount; featureIndex++)
        decisionStump(featureIndex, candidates[featureIndex]);
    //the best rule has the smallest weighted error and the largest margin
    StumpRule best = candidates[0];
    for(int featureIndex = 1; featureIndex < featureCount; featureIndex++){
        if(myStumpOrder(candidates[featureIndex], best))
            best = candidates[featureIndex];
    }
    //if shit happens, tell me
    if(best.weightedError >= 0.5)
        fail("Decision Stump failed: base error >= 0.5");
    return best;
}
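To round the post off, here is a minimal sketch of the missing test side together with a usage example. It is a sketch under assumptions, not part of the original code: predictWithCommittee, the toy data, and the file names in the build comment are all illustrative, and the per-member weight simply mirrors the ln(1/weightedError - 1) weighting that predictLabelOfTrainingExamples already uses internally.

//build sketch (assuming these file names): g++ -std=c++11 -fopenmp main.cpp VJAdaptBoost.cpp -o demo
#include <cmath>
#include <iostream>
#include <vector>
#include "VJAdaptBoost.h"
using namespace std;

//hypothetical test-time helper: classify one feature vector with a trained committee
static int predictWithCommittee(
      const vector<StumpRule>& committee
    , const vector<float>& featureVec
    , float tweak = 0){
    double verdict = 0;
    for(size_t m = 0; m < committee.size(); m++){
        const StumpRule& rule = committee[m];
        //same per-member weighting as during training: ln(1/error - 1)
        double alpha = log(1./rule.weightedError - 1);
        //a stump votes +1 or -1 depending on the threshold test and its toggle
        int vote = (featureVec[rule.featureIndex] > rule.threshold ? 1 : -1)*rule.toggle;
        verdict += alpha*(vote + tweak);
    }
    return verdict > 0 ? 1 : -1;
}

int main(){
    //toy data, chosen so that no single stump is perfect: 4 positives, then 4 negatives
    vector< vector<float> > data = {
        {5, 1}, {4, 2}, {3, 3}, {1, 4},  //positives
        {2, 1}, {1, 2}, {4, 0}, {0, 3}   //negatives
    };
    //positives and negatives each start with half of the total weight
    AdaptBoost booster(4, 4, 0.5L, data);
    booster.adaptBoostTraining(3); //train a 3-member committee
    vector<StumpRule> committee = booster.getCommittee();
    vector<float> probe = {4.9f, 1.0f};
    cout << "prediction: " << predictWithCommittee(committee, probe) << endl; //expect 1
    return 0;
}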
Reference:
Yi-Qing Wang, "An Analysis of the Viola-Jones Face Detection Algorithm", Image Processing On Line (IPOL), 2014.