BoF-SIFT Features with OpenCV

Introduction

Content-based image retrieval (CBIR) is still an active research field. There are a number of approaches available to retrieve visual data from large databases, but almost all of them require an image digestion step first. Image digestion means describing an image using low-level features such as color, shape, and texture while discarding unimportant details. Color histograms, color moments, dominant color, scalable color, shape contour, shape region, homogeneous texture, texture browsing, and edge histogram are some of the popular descriptors used in CBIR applications. Bag-of-Features (BoF) is another kind of visual feature descriptor that can be used in CBIR applications. In order to obtain a BoF descriptor, we first need to extract a feature from the image. This feature can be anything, such as SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), or LBP (Local Binary Patterns).

This article gives a brief description of BoF and SIFT and shows how to obtain BoF from SIFT features (BoF-SIFT), with source code. BoF-SIFT has been implemented using OpenCV 2.4 and Visual C++ (VS2008), but you can easily modify the code to work with any flavor of C++. You could write the same code yourself after going through a few OpenCV tutorials.

If you are a developer of CBIR applications or a researcher in visual content analysis, you may use this code in your application or to compare against your own visual descriptor. Furthermore, you can modify this code to obtain other BoF descriptors such as BoF-SURF or BoF-LBP.

Background 

BoF and SIFT are totally independent algorithms. The next sections describe SIFT first and then BoF.

SIFT - Scale Invariant Feature Transform  

Point-like features are very popular in many fields, including 3D reconstruction and image registration. A good point feature should be invariant to geometric transformations and illumination changes. A point feature can be a blob or a corner. SIFT is one of the most popular feature extraction and description algorithms. It extracts blob-like feature points and describes them with a descriptor that is invariant to scale, illumination, and rotation.

[Image: a SIFT keypoint described by a histogram of gradient magnitudes and orientations around the feature point]

The above image shows how a SIFT point is described using a histogram of gradient magnitudes and orientations around the feature point. I'm not going to explain the whole SIFT algorithm in this article, but you can find the theoretical background of SIFT on Wikipedia or read David Lowe's original article on SIFT. For those less interested in the mathematics, I recommend reading this blog post.

Unlike a color histogram or LBP-like descriptors, the SIFT algorithm does not give an overall impression of the image. Instead, it detects blob-like features in the image and describes each and every point with a descriptor that contains 128 numbers. Its output is an array of point descriptors.
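As a minimal sketch of that output (my illustration, assuming OpenCV 2.4 with the nonfree module and a placeholder image file test.jpg), the following program shows that SIFT yields one 128-dimensional row per keypoint:

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <opencv2/nonfree/features2d.hpp>
#include <cstdio>
#include <vector>

using namespace cv;
using namespace std;

int main()
{
	//load the image as grayscale (test.jpg is a placeholder name)
	Mat img = imread("test.jpg", CV_LOAD_IMAGE_GRAYSCALE);
	//detect SIFT feature points
	SiftFeatureDetector detector;
	vector<KeyPoint> keypoints;
	detector.detect(img, keypoints);
	//compute a 128-dimensional descriptor for each feature point
	SiftDescriptorExtractor extractor;
	Mat descriptors;
	extractor.compute(img, keypoints, descriptors);
	//one row per keypoint, 128 columns per row
	printf("%d keypoints, %d values per descriptor\n", descriptors.rows, descriptors.cols);
	return 0;
}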

CBIR needs a global descriptor in order to match against visual data in a database or to retrieve the semantic concept of a visual content. We can use the array of point descriptors that the SIFT algorithm yields to obtain a global descriptor giving an overall impression of the visual data for CBIR applications. Several methods are available to obtain such a global descriptor from SIFT feature point descriptors, and BoF is one general method that can be used for the task.

Bag-of-Features (BoF) Descriptor  

BoF is one of the popular visual descriptors used for visual data classification. It is inspired by the bag-of-words concept used in document classification. A bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is likewise a sparse vector of occurrence counts over a vocabulary of local image features.

BoF typically involves two main steps. The first step is obtaining the set of bags of features; this is actually an offline process. We can obtain the set of bags for a particular feature once and then reuse them for creating BoF descriptors. In the second step, we assign the features of a given image to the bags we created in the first step and build a histogram taking the bags as the bins. For example, with a vocabulary of 200 bags, an image's BoF descriptor is a 200-bin histogram counting how many of its local features fall closest to each bag. This histogram can be used to classify the image or video frame.

Bag-of-Features with SIFT

Let's see how we can build BoF with SIFT features.

1. Obtain the set of bags of features.
    i. Select a large set of images
   ii. Extract SIFT feature points of all the images in the set.
  iii. Obtain SIFT descriptor for each feature point that is extracted from each image.
  iv. Define the number of bags.
   v. Cluster the set of feature descriptors into the number of bags we defined and train the bags with the clustered feature descriptors (we can use the K-Means algorithm; see the sketch after this list).
  vi. Obtain the visual vocabulary.
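As a minimal sketch of step 1.v (my illustration; the article's code below uses OpenCV's BOWKMeansTrainer, which wraps this same call), clustering with cv::kmeans would look roughly like this, assuming featuresUnclustered is an N x 128 CV_32F Mat of SIFT descriptors:

	//number of bags (step 1.iv); 200 is an illustrative choice
	int numBags = 200;
	//cluster index assigned to each descriptor
	Mat labels;
	//numBags x 128 matrix of cluster centers; each row is one visual word
	Mat vocabulary;
	kmeans(featuresUnclustered, numBags, labels,
		TermCriteria(CV_TERMCRIT_ITER, 100, 0.001),
		1, KMEANS_PP_CENTERS, vocabulary);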

2. Obtain the BoF descriptor for a given image/video frame.  
    i.  Extract SIFT feature points of the given image.
   ii.  Obtain a SIFT descriptor for each feature point.
  iii.  Match the feature descriptors against the vocabulary we created in the first step and build the histogram (see the sketch below).
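The matching in step 2.iii is what OpenCV's BOWImgDescriptorExtractor (used in the code below) performs internally. As a minimal hand-rolled sketch of the idea (my illustration, not the article's code), each descriptor simply votes for its nearest visual word:

	//build a BoF histogram by nearest-neighbor assignment;
	//assumes 'descriptors' (N x 128) and 'vocabulary' (K x 128) are CV_32F
	Mat bofHistogram(const Mat& descriptors, const Mat& vocabulary)
	{
		Mat hist = Mat::zeros(1, vocabulary.rows, CV_32F);
		for(int i = 0; i < descriptors.rows; i++){
			//find the closest cluster center for this descriptor
			int best = 0;
			double bestDist = norm(descriptors.row(i), vocabulary.row(0), NORM_L2);
			for(int k = 1; k < vocabulary.rows; k++){
				double d = norm(descriptors.row(i), vocabulary.row(k), NORM_L2);
				if(d < bestDist){ bestDist = d; best = k; }
			}
			//vote for that visual word (bin)
			hist.at<float>(0, best) += 1.0f;
		}
		//normalize so the histogram does not depend on the keypoint count
		if(descriptors.rows > 0) hist /= (float)descriptors.rows;
		return hist;
	}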

The following image shows the above two steps clearly. (The image is from http://www.sccs.swarthmore.edu/users/09/btomasi1/tagging-products.html) 

[Image: the two-step pipeline of building a visual vocabulary and extracting a BoF descriptor]

Using the code  

With OpenCV we can implement BoF-SIFT with a few lines of code. Make sure that you have installed OpenCV 2.3 or a higher version along with Visual Studio 2008 or higher. The OpenCV version requirement is a must, but you may use other flavors of C++ without any problem.
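One build note (an assumption about your setup, not covered in the original article): SIFT lives in OpenCV's nonfree module, so besides the core, features2d, flann, and highgui libraries you must also add the nonfree library (for example opencv_nonfree249.lib for version 2.4.9) to your linker inputs, otherwise the SIFT classes will not link.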

The code has two separate regions that are compiled and run independently. The first region is for obtaining the set of bags of features, and the other is for obtaining the BoF descriptor for a given image/video frame. You need to run the first region of the code only once. After creating the vocabulary, you can use it with the second region of code at any time. Modifying the code line below switches between the two regions.

#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD to 1 for Step 1. 0 for step 2
Setting the DICTIONARY_BUILD constant to 1 will activate the following code region.
#if DICTIONARY_BUILD == 1
 
	//Step 1 - Obtain the set of bags of features.

	//to store the input file names
	char * filename = new char[100];		
	//to store the current input image
	Mat input;	
 
	//To store the keypoints that will be extracted by SIFT
	vector<KeyPoint> keypoints;
	//To store the SIFT descriptor of current image
	Mat descriptor;
	//To store all the descriptors that are extracted from all the images.
	Mat featuresUnclustered;
	//The SIFT feature extractor and descriptor
	SiftDescriptorExtractor detector;	
	
	//I select 20 (1000/50) images from 1000 images to extract
	//feature descriptors and build the vocabulary
	for(int f=0;f<999;f+=50){		
		//create the file name of an image
		sprintf(filename,"G:\\testimages\\image\\%i.jpg",f);
		//open the file
		input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); //Load as grayscale				
		//detect feature points
		detector.detect(input, keypoints);
		//compute the descriptors for each keypoint
		detector.compute(input, keypoints,descriptor);		
		//put the all feature descriptors in a single Mat object 
		featuresUnclustered.push_back(descriptor);		
		//print the percentage
		printf("%i percent done\n",f/10);
	}	
 

	//Construct BOWKMeansTrainer
	//the number of bags
	int dictionarySize=200;
	//define Term Criteria
	TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
	//retries number
	int retries=1;
	//necessary flags
	int flags=KMEANS_PP_CENTERS;
	//Create the BoW (or BoF) trainer
	BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
	//cluster the feature vectors
	Mat dictionary=bowTrainer.cluster(featuresUnclustered);	
	//store the vocabulary
	FileStorage fs("dictionary.yml", FileStorage::WRITE);
	fs << "vocabulary" << dictionary;
	fs.release();

You can find out what each line of code does by going through the comment above it. In summary, this part of the code simply reads a set of images from my hard disk, extracts SIFT features and descriptors, concatenates them, clusters them into a number of bags (dictionarySize), and then produces a vocabulary by training the bags with the clustered feature descriptors. You can modify the path to the images and supply your own set of images to build the vocabulary.

After running this code you will see a file called dictionary.yml in your project directory. I suggest opening it with a text editor to see how the vocabulary appears. It may not make much sense to you, but it gives an idea of the structure of the file, which will be useful if you work with OpenCV in the future.
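For orientation, the file follows OpenCV's FileStorage YAML format, and with the dictionarySize of 200 used above it should look roughly like this (the data values are elided here):

%YAML:1.0
vocabulary: !!opencv-matrix
   rows: 200
   cols: 128
   dt: f
   data: [ ... ]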

If you run this code successfully, you can activate the next section by setting DICTIONARY_BUILD to 0. From here onward we don't need the first part of the code, since we have already obtained a vocabulary and saved it in a file.

The following code section achieves the second step.

#else
	//Step 2 - Obtain the BoF descriptor for given image/video frame. 

    //prepare BOW descriptor extractor from the dictionary    
	Mat dictionary; 
	FileStorage fs("dictionary.yml", FileStorage::READ);
	fs["vocabulary"] >> dictionary;
	fs.release();	
    
	//create a nearest neighbor matcher
	Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
	//create Sift feature point extracter
	Ptr<FeatureDetector> detector(new SiftFeatureDetector());
	//create Sift descriptor extractor
	Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);	
	//create BoF (or BoW) descriptor extractor
	BOWImgDescriptorExtractor bowDE(extractor,matcher);
	//Set the dictionary with the vocabulary we created in the first step
	bowDE.setVocabulary(dictionary);
 
	//To store the image file name
	char * filename = new char[100];
	//To store the image tag name - only for save the descriptor in a file
	char * imageTag = new char[10];
 
	//open the file to write the resultant descriptor
	FileStorage fs1("descriptor.yml", FileStorage::WRITE);	
	
	//the image file with the location. change it according to your image file location
	sprintf(filename,"G:\\testimages\\image\\1.jpg");		
	//read the image
	Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);		
	//To store the keypoints that will be extracted by SIFT
	vector<KeyPoint> keypoints;		
	//Detect SIFT keypoints (or feature points)
	detector->detect(img,keypoints);
	//To store the BoW (or BoF) representation of the image
	Mat bowDescriptor;		
	//extract BoW (or BoF) descriptor from given image
	bowDE.compute(img,keypoints,bowDescriptor);
 
	//prepare the yml (some what similar to xml) file
	sprintf(imageTag,"img1");			
	//write the new BoF descriptor to the file
	fs1 << imageTag << bowDescriptor;		
 
	//You may use this descriptor for classifying the image.
			
	//release the file storage
	fs1.release();
#endif

In this section, SIFT features and descriptors are calculated for a particular image, and each feature descriptor is matched against the vocabulary we created before.
Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);

This line of code creates a matcher that matches descriptors using the Fast Library for Approximate Nearest Neighbors (FLANN). There are some other types of matchers available, which you can explore yourself. In general, approximate nearest neighbor matching works well.
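For example (a drop-in alternative, not what the article's code uses), an exact brute-force matcher can be substituted, which is a reasonable choice when the vocabulary is small:

//exact L2 nearest-neighbor matching instead of the approximate FLANN search
Ptr<DescriptorMatcher> matcher(new BFMatcher(NORM_L2));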

Finally, the code outputs the Bag-of-Features descriptor and saves it in a file with the following code line.

fs1 << imageTag << bowDescriptor;

This descriptor can be used to classify the image into one of several classes. You may use an SVM or any other classifier to check the discriminative power and robustness of this descriptor. On the other hand, you can directly match the BoF descriptors of two different images in order to measure their similarity.
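As a minimal sketch of that direct matching (my illustration, assuming bowA and bowB are BoF descriptors computed as above):

//Euclidean distance between the two histograms; smaller means more similar
double dist = norm(bowA, bowB, NORM_L2);
//or a correlation score in [-1, 1]; larger means more similar
double corr = compareHist(bowA, bowB, CV_COMP_CORREL);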

BoF-SIFT Code

// BoFSIFT.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <opencv2/nonfree/features2d.hpp>

using namespace cv;
using namespace std;

#define DICTIONARY_BUILD 1 // set DICTIONARY_BUILD to 1 for Step 1, otherwise it goes to Step 2

int _tmain(int argc, _TCHAR* argv[])
{	
#if DICTIONARY_BUILD == 1

	//Step 1 - Obtain the set of bags of features.

	//to store the input file names
	char * filename = new char[100];		
	//to store the current input image
	Mat input;	

	//To store the keypoints that will be extracted by SIFT
	vector<KeyPoint> keypoints;
	//To store the SIFT descriptor of current image
	Mat descriptor;
	//To store all the descriptors that are extracted from all the images.
	Mat featuresUnclustered;
	//The SIFT feature extractor and descriptor
	SiftDescriptorExtractor detector;	
	
	//I select 20 (1000/50) images from 1000 images to extract
	//feature descriptors and build the vocabulary
	for(int f=0;f<999;f+=50){		
		//create the file name of an image
		sprintf(filename,"G:\\testimages\\image\\%i.jpg",f);
		//open the file
		input = imread(filename, CV_LOAD_IMAGE_GRAYSCALE); //Load as grayscale				
		//detect feature points
		detector.detect(input, keypoints);
		//compute the descriptors for each keypoint
		detector.compute(input, keypoints,descriptor);		
		//put the all feature descriptors in a single Mat object 
		featuresUnclustered.push_back(descriptor);		
		//print the percentage
		printf("%i percent done\n",f/10);
	}	


	//Construct BOWKMeansTrainer
	//the number of bags
	int dictionarySize=200;
	//define Term Criteria
	TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
	//retries number
	int retries=1;
	//necessary flags
	int flags=KMEANS_PP_CENTERS;
	//Create the BoW (or BoF) trainer
	BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
	//cluster the feature vectors
	Mat dictionary=bowTrainer.cluster(featuresUnclustered);	
	//store the vocabulary
	FileStorage fs("dictionary.yml", FileStorage::WRITE);
	fs << "vocabulary" << dictionary;
	fs.release();
	
#else
	//Step 2 - Obtain the BoF descriptor for given image/video frame. 

    //prepare BOW descriptor extractor from the dictionary    
	Mat dictionary; 
	FileStorage fs("dictionary.yml", FileStorage::READ);
	fs["vocabulary"] >> dictionary;
	fs.release();	
    
	//create a nearest neighbor matcher
	Ptr<DescriptorMatcher> matcher(new FlannBasedMatcher);
	//create Sift feature point extracter
	Ptr<FeatureDetector> detector(new SiftFeatureDetector());
	//create Sift descriptor extractor
	Ptr<DescriptorExtractor> extractor(new SiftDescriptorExtractor);	
	//create BoF (or BoW) descriptor extractor
	BOWImgDescriptorExtractor bowDE(extractor,matcher);
	//Set the dictionary with the vocabulary we created in the first step
	bowDE.setVocabulary(dictionary);

	//To store the image file name
	char * filename = new char[100];
	//To store the image tag name - only for save the descriptor in a file
	char * imageTag = new char[10];

	//open the file to write the resultant descriptor
	FileStorage fs1("descriptor.yml", FileStorage::WRITE);	
	
	//the image file with the location. change it according to your image file location
	sprintf(filename,"G:\\testimages\\image\\1.jpg");		
	//read the image
	Mat img=imread(filename,CV_LOAD_IMAGE_GRAYSCALE);		
	//To store the keypoints that will be extracted by SIFT
	vector<KeyPoint> keypoints;		
	//Detect SIFT keypoints (or feature points)
	detector->detect(img,keypoints);
	//To store the BoW (or BoF) representation of the image
	Mat bowDescriptor;		
	//extract BoW (or BoF) descriptor from given image
	bowDE.compute(img,keypoints,bowDescriptor);

	//prepare the yml (some what similar to xml) file
	sprintf(imageTag,"img1");			
	//write the new BoF descriptor to the file
	fs1 << imageTag << bowDescriptor;		

	//You may use this descriptor for classifying the image.
			
	//release the file storage
	fs1.release();
#endif
	printf("\ndone\n");	
    return 0;
}

(Source: http://www.codeproject.com/Articles/619039/Bag-of-Features-Descriptor-on-SIFT-Features-with-O)


For more discussion of Image Engineering & Computer Vision, please follow this blog and the Sina Weibo account songzi_tea.

