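The greatness of the Viola-Jones face detection algorithm lies not only in its real-time performance but, more importantly, in the general approach it proposed for object detection problems. The algorithm has two highlights: the integral image technique and the cascade training model. Both drew great attention as soon as they were proposed and show up in many excellent papers, for example in the detector part of the TLD algorithm and in the two-stage SVM model used to train BING objectness; it is hard to say these were not influenced by Viola-Jones. Below I introduce AdaBoost (spelled AdaptBoost in the code of this post), one of the basic building blocks of the cascade model.
AdaBoost was not invented by Viola and Jones. It comes from machine learning and belongs to the boosting family of ensemble learning. Ensemble learning algorithms fall into two families, bagging and boosting, and both build a strong classifier out of weak classifiers. The representative bagging algorithm is Random Forests; the representative boosting algorithm is AdaBoost. For the theory behind AdaBoost I recommend the paper "A Brief Introduction to Boosting".
In the spirit of sharing, the rest of this post covers the theory of AdaBoost and gives an implementation in standard C++. For anyone who does not want to depend on a particular library, a standard C++ version is a good option. If you spot anything wrong, corrections are welcome.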
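1. AdaBoost principle
For an image window of a given size, the dimensionality of its Haar features is very high, so training a classifier directly on all Haar features computed from the training samples is impractical. We need to select a subset of the high-dimensional Haar features and train the classifier on those. AdaBoost fits this idea well: it builds a strong classifier from weak classifiers and uses the combined result of the weak classifiers as the result of the strong classifier. A weak classifier in AdaBoost can be a stump, that is, a binary decision tree with a single split. Feature selection among the many Haar feature dimensions works as follows: choose a feature together with a threshold for binary classification on that feature; the feature and threshold that give the smallest weighted classification error on the training samples form one trained weak classifier. See the bestStump() interface in the implementation section for the details. After each weak classifier has been selected, the distribution over the training samples is updated (the distribution is the set of per-sample weights, usually initialized equally to 1/N, where N is the number of samples). Each update is determined by the result of the previous weak classifier: samples it misclassified get larger weights, and samples it classified correctly get smaller weights. One difference between AdaBoost and Random Forest is that, when the strong classifier's result is computed, AdaBoost's weak classifiers carry different weights, whereas Random Forest's weak classifiers carry equal weights.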
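The threshold search for a single feature can be sketched as follows. This is a minimal, self-contained illustration and not the listing from this post: the names `Stump` and `bestStumpForFeature` are hypothetical, plain `double` weights are assumed, and ties are broken only by weighted error (the listing in section 2 also compares margins).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stump: predict +1 when toggle * (value - threshold) > 0.
struct Stump {
    double threshold;
    int toggle;            // +1 or -1
    double weightedError;  // total weight of the misclassified samples
};

// Scans all candidate thresholds of one feature in ascending order.
// values[i] is the feature value of sample i, labels[i] is +1 or -1,
// weights[i] is the current (non-negative) weight of sample i.
Stump bestStumpForFeature(const std::vector<double>& values,
                          const std::vector<int>& labels,
                          const std::vector<double>& weights) {
    const std::size_t n = values.size();
    assert(n > 0 && labels.size() == n && weights.size() == n);

    // Visit the samples in ascending order of feature value.
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });

    // Weight of positives/negatives to the right and left of the threshold.
    // The first threshold lies left of every sample, so everything starts on the right.
    double rPos = 0, rNeg = 0, lPos = 0, lNeg = 0;
    for (std::size_t i = 0; i < n; ++i) (labels[i] > 0 ? rPos : rNeg) += weights[i];

    double threshold = values[order[0]] - 1.0;
    Stump best{threshold, +1, 2.0};  // 2.0 is larger than any possible weighted error
    std::size_t k = 0;               // number of samples already moved to the left
    while (true) {
        // toggle +1 predicts positive right of the threshold, so its errors are
        // the negatives on the right plus the positives on the left.
        const double errP = rNeg + lPos;
        const double errN = rPos + lNeg;
        const double err = std::min(errP, errN);
        if (err < best.weightedError) best = {threshold, errP < errN ? +1 : -1, err};
        if (k == n) break;

        // Move every sample that shares the next feature value to the left side.
        const double v = values[order[k]];
        while (k < n && values[order[k]] == v) {
            const std::size_t i = order[k];
            if (labels[i] > 0) { lPos += weights[i]; rPos -= weights[i]; }
            else               { lNeg += weights[i]; rNeg -= weights[i]; }
            ++k;
        }
        // The next threshold is the midpoint to the following value, or just past the largest value.
        threshold = (k < n) ? (v + values[order[k]]) / 2 : v + 1.0;
    }
    return best;
}
```

Running this once per feature and keeping the stump with the smallest weighted error gives the weak classifier for one boosting round. Because the left and right weight sums are updated incrementally, each feature costs one sort plus one linear scan.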
The AdaBoost algorithm in pseudocode: [figure missing from the source]
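In equation form, the boosting loop described above can be summarized roughly as follows. This is a sketch of discrete AdaBoost with decision stumps, written to match the quantities visible in the listing of section 2 (the member weight `log(1/weightedError - 1)` and the update factor `1/weightedError - 1` for misclassified samples). The symbols $w_i^{(t)}$, $\varepsilon_t$, $\alpha_t$, $Z_t$, $s$, $\theta$ are notation introduced here, and the renormalization by $Z_t$ is an assumption, since that part of the listing did not survive. Note also that the listing does not start from $1/N$ but splits `initialPositiveWeight` evenly among the positives and the remainder among the negatives.

```latex
\begin{aligned}
&\textbf{Input: } (x_i, y_i),\quad y_i \in \{-1, +1\},\quad i = 1, \dots, N, \qquad w_i^{(1)} = \tfrac{1}{N} \\[4pt]
&\textbf{Weak classifiers (stumps): } h(x) = s \cdot \operatorname{sign}(x_j - \theta), \quad s \in \{-1, +1\} \\[4pt]
&\textbf{For } t = 1, \dots, T: \\
&\qquad h_t = \arg\min_{h} \sum_{i=1}^{N} w_i^{(t)} \, \mathbf{1}\!\left[ h(x_i) \neq y_i \right],
  \qquad \varepsilon_t = \sum_{i=1}^{N} w_i^{(t)} \, \mathbf{1}\!\left[ h_t(x_i) \neq y_i \right] \\
&\qquad \alpha_t = \ln \frac{1 - \varepsilon_t}{\varepsilon_t} \\
&\qquad w_i^{(t+1)} = \frac{1}{Z_t} \, w_i^{(t)} \left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right)^{\mathbf{1}\left[ h_t(x_i) \neq y_i \right]},
  \qquad Z_t = \sum_{k=1}^{N} w_k^{(t)} \left( \frac{1 - \varepsilon_t}{\varepsilon_t} \right)^{\mathbf{1}\left[ h_t(x_k) \neq y_k \right]} \\[4pt]
&\textbf{Output: } H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)
\end{aligned}
```

Since $\varepsilon_t < 1/2$ for a useful stump, the factor $(1 - \varepsilon_t)/\varepsilon_t$ is greater than 1, so misclassified samples gain weight and, after normalization, correctly classified samples lose weight. The quantity tracked as `exponentialRisk` in the listing is the running product $\prod_t 2\sqrt{\varepsilon_t (1 - \varepsilon_t)}$.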
2. Standard C++ implementation
The interface below covers training only and has no test part; you can add a test interface on top of it.
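One way such a test interface could look is sketched below. The free function `predictSample` is hypothetical and not part of the original class; it mirrors what `predictLabelOfTrainingExamples` in the listing appears to do (each member votes with weight `log(1/weightedError - 1)`, and a single-member committee is not weighted), but applies it to the feature vector of one unseen sample. The `StumpRule` struct is repeated only so that the sketch is self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Same fields as the StumpRule declared in the header of this post.
struct StumpRule {
    int featureIndex;
    long double weightedError;
    double threshold;
    float margin;
    int toggle;
};

// Hypothetical test-time helper (an assumption, not the author's API).
// features[j] is the value of feature j for the sample to classify.
// Returns +1 or -1. tweak shifts every member's vote, as in the training code.
int predictSample(const std::vector<StumpRule>& committee,
                  const std::vector<float>& features,
                  float tweak = 0.0f) {
    assert(!committee.empty());
    const bool single = committee.size() == 1;  // a lone member needs no weight
    double verdict = 0.0;
    for (const StumpRule& rule : committee) {
        assert(rule.featureIndex >= 0 &&
               static_cast<std::size_t>(rule.featureIndex) < features.size());
        double weight = 1.0;
        if (!single) {
            // A zero error would give an infinite weight, so it is rejected here.
            assert(rule.weightedError > 0 && rule.weightedError < 0.5);
            weight = std::log(1.0 / static_cast<double>(rule.weightedError) - 1.0);
        }
        const int vote = (features[rule.featureIndex] > rule.threshold ? 1 : -1) * rule.toggle;
        verdict += weight * (vote + tweak);
    }
    return verdict > 0 ? 1 : -1;
}
```

A caller would obtain the committee from `getCommittee()` after training and compute the test sample's features in the same order as the training data.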
#ifndef _ADAPTBOOST_H_
#define _ADAPTBOOST_H_

#include <utility>
#include <vector>
using namespace std;

/**
 * @brief decision stump declaration
 *
 * @param featureIndex
 * @param weightedError achieved weighted error
 * @param threshold
 * @param margin achieved margin
 * @param toggle +1 or -1
 */
struct StumpRule{
    int featureIndex;
    long double weightedError;
    double threshold;
    float margin;
    int toggle;
};

/**
 * @brief what's inside AdaptBoost
 *
 * @param nPositives number of positive examples
 * @param nNegatives number of negative examples
 * @param initialPositiveWeight how much weight we give to positives at the outset
 * @param ascendingFeatures for each feature, we have (float feature value, int exampleIndex)
 *
 * @param sampleCount nPositives + nNegatives
 * @param inTrain is this a training set or a validation set
 * @param exponentialRisk exponential risk for training set
 * @param positiveTotalWeight total weight received by positive examples currently
 * @param negativeTotalWeight total weight received by negative examples currently
 * @param minWeight minimum weight among all weights currently
 * @param maxWeight maximum weight among all weights currently
 * @param weights weight vector for all examples involved
 * @param labels are they positive or negative examples
 * @param featureCount how many features are there
 * @param committee what's the learned committee
 */
class AdaptBoost{
private:
    int nPositives;
    int nNegatives;
    long double initialPositiveWeight;
    vector< vector< pair<float, int> > > ascendingFeatures;
    int sampleCount;
    int featureCount;
    long double positiveTotalWeight;
    long double negativeTotalWeight;
    long double minWeight;
    long double maxWeight;
    long double exponentialRisk;
    //element types below were lost in the source listing and are inferred from how the members are used
    vector<long double> weights;
    vector<int> labels;
    vector<StumpRule> committee;

    /**
     * @brief prevent copy and assignment
     */
    AdaptBoost(const AdaptBoost&);
    AdaptBoost operator=(const AdaptBoost&);

protected:
    /**
     * @brief return for an element pointed by iterator and featureIndex its exampleIndex
     */
    int getTrainingExampleIndex(int featureIndex, int iterator);

    /**
     * @brief return for an element pointed by iterator and featureIndex its example value
     */
    float getTrainingExampleFeature(int featureIndex, int iterator);

    /**
     * @brief sort each feature from different samples
     */
    void sortFeatures(
        vector< vector< pair<float, int> > >& features
    );

    /**
     * @brief best stump given a feature
     */
    void decisionStump(
        int featureIndex
    ,   StumpRule & best
    );

    /**
     * @brief best stump among all features
     */
    StumpRule bestStump();

public:
    /**
     * @brief constructor
     * @param nPositives number of positives for training examples
     * @param nNegatives number of negatives for training examples
     * @param initialPositiveWeight initial weight of positives
     * @param data for training examples, positives front and negatives back
     */
    AdaptBoost(
        int nPositives
    ,   int nNegatives
    ,   long double initialPositiveWeight
    ,   const vector< vector<float> >& data
    );

    /**
     * @brief destructor
     */
    ~AdaptBoost();

    /**
     * @brief perform one round of adaboost
     */
    void oneRoundOfAdaboostTraining();

    /**
     * @brief get committee adaptboost trained
     */
    vector<StumpRule> getCommittee() { return committee; }

    /**
     * @brief get committee size
     */
    int getCommitteeSize() { return committee.size(); }

    /**
     * @brief given the number of weak classifiers train for a committee
     * @param numOfWeakClassifier for number of weak classifiers of adapt boost
     */
    void adaptBoostTraining(int numOfWeakClassifier);

    /**
     * @brief evaluate how the committee fares on a training dataset
     *
     * @param tweak for predictLabelOfTrainingExamples
     * @return falsePositive
     * @return detectionRate
     * @return a blackList; if an element of blackList is 0, this sample
     *         could be used again, otherwise it is not usable
     *         (the element type was lost in the source listing; int is assumed)
     */
    vector<int> calcEmpiricalErrorInAdaBoostTraining(
        float tweak
    ,   float & falsePositive
    ,   float & detectionRate
    );

    /**
     * @brief given a tweak and a committee, what prediction do you make as to the training examples
     *
     * @param tweakThreshold tweak
     * @return prediction
     * @param onlyMostRecent use all the committee or its most recent member (a weak learner)
     */
    void predictLabelOfTrainingExamples(
        float tweakThreshold
    ,   vector<int> & prediction
    ,   bool onlyMostRecent=false
    );
};

#endif
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <float.h>
#include <iomanip>
#include <iostream>
#include "VJAdaptBoost.h"
using namespace std;

#define VERBOSE true

//NOTE: several stretches of this listing did not survive in the source.
//They are marked with [...] comments; template arguments, loop bounds and
//signatures were restored only where the surrounding code determines them.

//fail and messaging
static void fail(const char* message){
    cerr << "Error:" << message << endl;
    exit(EXIT_FAILURE);
}

//order definition for this type of pairs
//compare only the feature values
static bool myPairOrder(
    const pair<float, int> & one
,   const pair<float, int> & other
){
    return one.first < other.first;
}

//why is one stump better than the other
static bool myStumpOrder(
    const StumpRule & one
,   const StumpRule & other
){
    if(one.weightedError < other.weightedError)
        return true;
    if(one.weightedError == other.weightedError && one.margin > other.margin)
        return true;
    return false;
}

int AdaptBoost::getTrainingExampleIndex(int featureIndex, int iterator){
    assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() >0);
    return ascendingFeatures[featureIndex][iterator].second;
}

float AdaptBoost::getTrainingExampleFeature(int featureIndex, int iterator){
    assert(ascendingFeatures.size() > 0 && ascendingFeatures[0].size() >0);
    if(_isnan(ascendingFeatures[featureIndex][iterator].first)){
        cerr<<"ERROR: nan feature ";
        //[...] the rest of this message and of the error handling is missing from the source
    }
    //the return statement is restored from the header comment and from getTrainingExampleIndex
    return ascendingFeatures[featureIndex][iterator].first;
}

//constructor
//[...] the first three parameters are restored from the header declaration and from the names used in the body
AdaptBoost::AdaptBoost(
    int positives
,   int negatives
,   long double positiveWeight
,   const vector< vector<float> >& data)
{
    assert(positives > 0 && negatives > 0);
    assert(positiveWeight > 0 && positiveWeight < 1);
    assert(data.size() > 0 && data[0].size() > 0 );
    assert(data.size() == (positives + negatives));

    //add number of data info to features
    vector< vector< pair<float, int> > > features(data.size(), vector< pair<float, int> >(data[0].size(), pair<float, int>(0,0)));
    //loop bounds restored: lost in the source
    for(int i=0; i<data.size(); i++){
        for(int j=0; j<data[0].size(); j++){
            features[i][j] = pair<float, int>(data[i][j], i);
        }
    }

    //initialize the class attributes for the training set
    nPositives = positives;
    nNegatives = negatives;
    initialPositiveWeight = positiveWeight;
    sortFeatures(features);//initialize ascendingFeatures
    sampleCount = positives + negatives;
    featureCount = ascendingFeatures.size();
    positiveTotalWeight = positiveWeight;
    negativeTotalWeight = 1 - positiveWeight;
    long double posAverageWeight = positiveTotalWeight/(long double)nPositives;
    long double negAverageWeight = negativeTotalWeight/(long double)nNegatives;
    maxWeight = max(posAverageWeight, negAverageWeight);
    minWeight = min(posAverageWeight, negAverageWeight);
    exponentialRisk = 1;

    //set weights for each example
    for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
        weights.push_back(exampleIndex < nPositives ? posAverageWeight : negAverageWeight);
        labels.push_back(exampleIndex < nPositives ? 1 : -1);
    }
}

//destructor
AdaptBoost::~AdaptBoost()
{
}

//adaptBoost interface for training
void AdaptBoost::adaptBoostTraining(int numOfWeakClassifier)
{
    assert(numOfWeakClassifier > 0);
    //loop bound and body restored: lost in the source
    for(int i=0; i<numOfWeakClassifier; i++){
        oneRoundOfAdaboostTraining();
    }
}

//[...] any comment that preceded this function is missing from the source
//the element type of the returned vector was lost as well; int is assumed
vector<int> AdaptBoost::calcEmpiricalErrorInAdaBoostTraining(
    float tweak
,   float & falsePositive
,   float & detectionRate
){
    vector<int> blackList;
    blackList.resize(nPositives, 0);
    blackList.resize(nPositives+nNegatives, 1);
    int nFalsePositive = 0;
    int nFalseNegative = 0;

    //initially let all be positive
    vector<int> prediction;
    prediction.resize(sampleCount,0);
    predictLabelOfTrainingExamples(tweak, prediction, false);

    //evaluate prediction errors
    vector<int> agree(sampleCount); //element type lost in the source; int is assumed
    //[...] the rest of this function is missing from the source. It starts with a loop
    //"for(int i=0; i" and, judging from the signature and the variables above, counts
    //the false positives and false negatives, sets falsePositive and detectionRate,
    //and returns blackList.
}

//[...] the start of this signature is restored from the header declaration
void AdaptBoost::predictLabelOfTrainingExamples(
    float tweakThreshold
,   vector<int> & prediction
,   bool onlyMostRecent
){
    int committeeSize = committee.size();
    //no need to weigh a single member's decision
    onlyMostRecent = committeeSize == 1 ? true : onlyMostRecent;
    int start = onlyMostRecent ? committeeSize - 1 : 0;

    //double to be more precise
    vector< vector<double> > memberVerdict;
    for(int i=0; i<committeeSize; i++){
        vector<double> row(sampleCount);
        memberVerdict.push_back(row);
    }
    vector<double> memberWeight(committeeSize);

    //members, go ahead
    for(int member = start; member < committeeSize; member++){
        //sanity check
        if(committee[member].weightedError == 0 && member != 0)
            fail("Boosting Error Occured!");

        //0.5 does not count here
        //if member's weightedError is zero, member weight is nan, but it won't be used anyway
        memberWeight[member] = log(1./committee[member].weightedError -1);
        int feature = committee[member].featureIndex;
        #pragma omp parallel for schedule(static)
        for(int iterator = 0; iterator < sampleCount; iterator++){
            int exampleIndex = getTrainingExampleIndex(feature, iterator);
            memberVerdict[member][exampleIndex] = (getTrainingExampleFeature(feature, iterator) > committee[member].threshold ? 1 : -1)*committee[member].toggle + tweakThreshold;
        }
    }

    //joint session
    if(!onlyMostRecent){
        vector<double> finalVerdict(sampleCount);
        //[...] the body of this branch is missing from the source. It starts with a loop
        //"for(int i=0; i" and its last surviving fragment is "... 0 ? 1 : -1;", which
        //suggests that the verdicts in memberVerdict are combined with memberWeight into
        //finalVerdict and that prediction is set to the sign of the result.
    }else{
        for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++)
            prediction[exampleIndex] = memberVerdict[start][exampleIndex] > 0 ? 1 : -1;
    }
}

void AdaptBoost::oneRoundOfAdaboostTraining(){
    //try to be friendly here
    static int trainPhase = 0;
    if(VERBOSE && trainPhase == 0){
        cout << "\n#############################ADABOOST MESSAGE EXPLAINED####################################################\n\n";
        cout << "INFO: Adaboost starts. Exponential Risk is expected to go down steadily and strictly," << endl;
        cout << "INFO: and Exponential Risk should bound the (weighted) Empirical Error from above." << endl;
        cout << "INFO: Train Phase is the current boosting iteration." << endl;
        cout << "INFO: Best Feature is the most discriminative feature selected by decision stump at this iteration." << endl;
        cout << "INFO: Threshold and Toggle are two parameters that define a real valued decision stump.\n" << endl;
    }
    trainPhase++;

    //get and store the rule
    StumpRule rule = bestStump();
    committee.push_back(rule);

    //how it fares
    vector<int> prediction(sampleCount);
    predictLabelOfTrainingExamples(
        0
    ,   prediction
    ,   /*onlyMostRecent*/ true);
    vector<int> agree(sampleCount); //element type lost in the source; int is assumed
    //[...] missing from the source: a loop starting with "for(int i=0; i" that fills agree,
    //presumably by comparing labels with prediction (agree is tested with ! below)

    //element type lost in the source; long double is assumed to match weights
    vector<long double> weightUpdate;
    weightUpdate.resize(sampleCount,1);
    bool errorFlag = false;
    for(int exampleIndex = 0; exampleIndex < sampleCount; exampleIndex++){
        //more weight for a difficult example
        if(!agree[exampleIndex]){
            weightUpdate[exampleIndex] = 1/rule.weightedError - 1;
            errorFlag = true;
        }
    }

    //update weights only if there is an error
    if(errorFlag){
        double weightSum = 0;
        //[...] missing from the source: everything from a loop starting with "for(int i=0; i"
        //up to the comparison below, including the declarations of min and max. Judging from
        //weightUpdate and weightSum, the weights are rescaled and renormalized here.
            if(weights[i] > max) {
                max = weights[i];
            }
        }
        minWeight = min;
        maxWeight = max;
    }

    //exponentialRisk can be zero at the first boosting
    exponentialRisk *= 2*sqrt((1-rule.weightedError)*rule.weightedError);

    //print some statistics
    if(VERBOSE){
        float tweak = 0;
        float falsePositive = 0;
        float detectionRate = 0;
        calcEmpiricalErrorInAdaBoostTraining(tweak, falsePositive, detectionRate);
        float empError = static_cast<float>(falsePositive*(1-initialPositiveWeight)+initialPositiveWeight*(1-detectionRate));
        cout << "Training Performance Explanation (before threshold tweaking): falsePositive " << falsePositive << " detectionRate " << detectionRate << endl;
        cout <<"###########################################################################################################\n";
        cout << "\nTrain Phase " << trainPhase << endl << endl;
        // whatFeature(rule.featureIndex);
        cout << "\tExponential Risk " << setw(12) << exponentialRisk << setw(19) << "Weighted Error " << setw(11) << rule.weightedError << setw(14) << "Threshold " << setw(10) << rule.threshold << setw(13) <<"Toggle " << setw(12) << rule.toggle << endl;
        cout << "\tPositive Weight" << setw(14) << positiveTotalWeight << setw(14) << "MinWeight " << setw(16) << minWeight << setw(14) << "MaxWeight " << setw(10) << maxWeight << setw(22) << "Empirical Error " << setw(10) << empError << endl << endl;
    }
}

//get a feature from features and put them in ascending order
//and record at the same time the permuted example order
void AdaptBoost::sortFeatures(vector< vector< pair<float, int> > >& features)
{
    assert(features.size()!=0 && features[0].size() !=0 );
    //[...] the rest of sortFeatures is missing from the source. The surviving fragments are
    //an outer loop "for(unsigned int i=0; i", a temporary vector of (value, index) pairs
    //named temp, and an inner loop "for(unsigned int j=0; j".
}

//[...] the signature (restored here from the header declaration) and the first part of
//decisionStump are missing from the source: the initialization of best, current, iterator,
//error_p, error_n and the left/right weight sums lPositiveWeight, lNegativeWeight,
//rPositiveWeight and rNegativeWeight
void AdaptBoost::decisionStump(
    int featureIndex
,   StumpRule & best
){
    //[...] lost initialization, see above
    while(true){ //loop header lost in the source; a loop left through break is assumed
        //toggle = 1, positive prediction if and only if the observed feature > the threshold
        //toggle = -1, positive prediction if and only if the observed feature < the threshold
        //error_p denotes the error introduced by toggle = 1, error_n the error by toggle = -1
        error_p = rNegativeWeight + lPositiveWeight;
        error_n = rPositiveWeight + lNegativeWeight;
        current.toggle = error_p < error_n ? 1 : -1;

        //sometimes shit happens, prevent error from being negative
        long double smallerError = min(error_p, error_n);
        //this prevents some spurious nonzero: for currentError must be at least equal to minWeight
        current.weightedError = smallerError < minWeight * 0.9 ? 0 : smallerError;

        //update if necessary
        if(myStumpOrder(current, best))
            best = current;

        //move on
        iterator++;

        //we don't actually need to look at the sample with the largest feature
        //because its rule is exactly equivalent to those produced
        //by the sample with the smallest feature on training observations
        //but it won't do any harm anyway
        if(iterator == sampleCount)
            break;

        //handle duplicates, update lr weights and find a new threshold
        while(true){
            //take this guy's attributes
            int exampleIndex = getTrainingExampleIndex(featureIndex, iterator);
            int label = labels[exampleIndex];
            long double weight = weights[exampleIndex];

            //update weights
            if(label < 0){
                lNegativeWeight += weight;
                rNegativeWeight -= weight;
            }else{
                lPositiveWeight += weight;
                rPositiveWeight -= weight;
            }

            //if a new threshold can be found, break
            //two cases are possible: either it is the last observation
            if(iterator == sampleCount - 1)
                break;
            //or no duplicate. If there is a duplicate, repeat
            if(getTrainingExampleFeature(featureIndex, iterator) != getTrainingExampleFeature(featureIndex, iterator + 1)){
                double test = ((double)getTrainingExampleFeature(featureIndex, iterator) + (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
                //well that's a bit frustrating: I want to keep float because of memory constraint, but apparently
                //features are so close, sometimes, numerical precision arises as an unexpected problem, so I decide
                //to use a double threshold so as to separate float features
                if(getTrainingExampleFeature(featureIndex, iterator) < test && test < getTrainingExampleFeature(featureIndex, iterator + 1))
                    break;
                else{
                    #pragma omp critical
                    {
                        cout << "ERROR: numerical precision breached: problem feature values " << getTrainingExampleFeature(featureIndex, iterator) << " : " << getTrainingExampleFeature(featureIndex, iterator+1) << ". Problem feature " << featureIndex << " and problem example " << getTrainingExampleIndex(featureIndex, iterator) << " : " << getTrainingExampleIndex(featureIndex, iterator+1) << endl;
                    }
                    fail("fail to find a suitable threshold.");
                }
            }
            iterator++;
        }

        //update threshold
        if(iterator < sampleCount - 1){
            current.threshold = ((double)getTrainingExampleFeature(featureIndex, iterator) + (double)getTrainingExampleFeature(featureIndex, iterator + 1))/2;
            current.margin = getTrainingExampleFeature(featureIndex, iterator + 1) - getTrainingExampleFeature(featureIndex, iterator);
        }else{
            //slightly to the right of the biggest observation
            current.threshold = getTrainingExampleFeature(featureIndex, iterator) + 1;
            current.margin = 0;
        }
    }
}

//implement the feature selection's outer loop
//return the most discriminative feature and its rule
StumpRule AdaptBoost::bestStump(
){
    vector<StumpRule> candidates;
    candidates.resize(featureCount);
    #pragma omp parallel for schedule(static)
    for(int featureIndex = 0; featureIndex < featureCount; featureIndex++)
        decisionStump(featureIndex, candidates[featureIndex]);

    //loop over all the features
    //the best rule has the smallest weighted error and the largest margin
    StumpRule best = candidates[0];
    for(int featureIndex = 1; featureIndex < featureCount; featureIndex++){
        if(myStumpOrder(candidates[featureIndex], best))
            best = candidates[featureIndex];
    }

    //if shit happens, tell me
    if( best.weightedError >= 0.5 )
        fail("Decision Stump failed: base error >= 0.5");

    //return
    return best;
}
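The `ascendingFeatures` member that the listing relies on stores, for every feature, the (feature value, example index) pairs of all samples in ascending order of value. Since most of `sortFeatures` did not survive in the listing above, here is a minimal self-contained sketch of how such a structure could be built. The names `FeatureEntry` and `sortByFeature` are hypothetical, and the row layout (`data[i][j]` is feature j of sample i) is taken from the constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (feature value, example index), the same pair layout the listing uses.
typedef std::pair<float, int> FeatureEntry;

// data[i][j] is feature j of sample i.
// result[j] holds the entries of feature j sorted by ascending feature value.
std::vector<std::vector<FeatureEntry> > sortByFeature(
        const std::vector<std::vector<float> >& data) {
    assert(!data.empty() && !data[0].empty());
    const std::size_t featureCount = data[0].size();
    std::vector<std::vector<FeatureEntry> > ascending(featureCount);
    for (std::size_t j = 0; j < featureCount; ++j) {
        ascending[j].reserve(data.size());
        for (std::size_t i = 0; i < data.size(); ++i) {
            assert(data[i].size() == featureCount);
            ascending[j].push_back(FeatureEntry(data[i][j], static_cast<int>(i)));
        }
        // Compare feature values only, as myPairOrder does; stable_sort keeps
        // samples with equal values in their original order.
        std::stable_sort(ascending[j].begin(), ascending[j].end(),
                         [](const FeatureEntry& a, const FeatureEntry& b) {
                             return a.first < b.first;
                         });
    }
    return ascending;
}
```

Sorting once up front is what makes the per-round stump search cheap: the sample weights change after every boosting round, but the order of the feature values does not, so each round only needs a linear scan per feature.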
reference:
Yi-Qing Wang, An Analysis of the Viola-Jones Face Detection Algorithm, IPOL.