Fastest HOG Feature Extraction implementation

Reposted from http://stackoverflow.com/questions/18474897/fastest-hog-feature-extraction-implementation


Question
What's the fastest open-source HOG extraction code for multicore CPUs?

Motivation
I'm working on a real-time object detection application. Specifically, I've developed a variant of Deformable Parts Model cascades, targeting 30fps object detection. I've reached a point where extracting HOG features is more expensive than the rest of my pipeline combined. I'm using the [Felzenszwalb, Girshick, et al] parameters for HOG extraction: a multiresolution pyramid of HOG descriptors, where each descriptor has 32 dimensions in total, covering orientation bins and a few other cues.
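For readers unfamiliar with the FGMR-style features, here is a rough sketch of the orientation-binning step they start from. It snaps each pixel gradient to the closest of 9 unit vectors spanning 0 to 180 degrees and uses the sign of the dot product to pick one of 18 contrast-sensitive bins, which avoids an atan2 per pixel. The helper names (`orientation_bin`, `cell_histogram`) are hypothetical, and the sketch simplifies on purpose: it uses one grayscale channel and hard-assigns each pixel to its own cell, whereas the voc-release5 code, as I understand it, takes the strongest gradient over the color channels and interpolates bilinearly into neighboring cells. Normalization and the remaining cues are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Number of orientation directions over [0, pi); signed bins are twice this.
constexpr int kHalfBins = 9;

// Returns a contrast-sensitive orientation bin in [0, 18) for gradient (dx, dy).
// The gradient is compared against 9 unit vectors spaced 20 degrees apart; a
// negative best dot product means opposite polarity, mapped to bins 9..17.
int orientation_bin(float dx, float dy) {
    static const std::array<std::array<float, 2>, kHalfBins> uv = [] {
        std::array<std::array<float, 2>, kHalfBins> t{};
        const double pi = std::acos(-1.0);
        for (int i = 0; i < kHalfBins; ++i) {
            const double a = i * pi / kHalfBins;
            t[i] = {static_cast<float>(std::cos(a)), static_cast<float>(std::sin(a))};
        }
        return t;
    }();
    float best = 0.0f;
    int best_bin = 0;
    for (int i = 0; i < kHalfBins; ++i) {
        const float dot = uv[i][0] * dx + uv[i][1] * dy;
        if (dot > best) {
            best = dot;
            best_bin = i;
        } else if (-dot > best) {
            best = -dot;
            best_bin = i + kHalfBins;
        }
    }
    return best_bin;
}

// Hard-assigns each pixel's gradient magnitude to one of 18 bins for a single
// cell with top-left corner (x0, y0) in a row-major grayscale image.
// Needs a 1-pixel margin around the cell for the centered differences.
std::array<float, 2 * kHalfBins> cell_histogram(const float* img, int width, int height,
                                                int x0, int y0, int cell) {
    assert(x0 >= 1 && y0 >= 1 && x0 + cell < width && y0 + cell < height);
    std::array<float, 2 * kHalfBins> hist{};
    for (int y = y0; y < y0 + cell; ++y) {
        for (int x = x0; x < x0 + cell; ++x) {
            const float dx = img[y * width + x + 1] - img[y * width + x - 1];
            const float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];
            hist[orientation_bin(dx, dy)] += std::sqrt(dx * dx + dy * dy);
        }
    }
    return hist;
}
```

Contrast-insensitive bins can be derived by folding: bin b and bin b + 9 describe the same edge direction with opposite polarity.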

Goals
I'd like to do multiscale HOG feature extraction at 60fps (16ms) for 640x480 images on a multicore CPU.

Related Work
I've benchmarked a few off-the-shelf multiscale HOG implementations on a 6-core Intel 3930k CPU. For a 640x480 image, I observe the following performance numbers:

  • HOG in Dubout's FFLD DPM code: 19fps (52ms) -- C++ with OpenMP, but no vectorization
  • HOG in voc-release5 DPM code: 2.4fps (410ms) -- single-threaded C++, plus a Matlab wrapper
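To make the gap to the 60fps goal concrete, converting between per-frame latency and frame rate is just 1000 / ms. A tiny helper (the names are illustrative) shows how much faster each measured implementation would have to become:

```cpp
#include <cassert>

// Frame budget in milliseconds for a target frame rate.
double budget_ms(double target_fps) { return 1000.0 / target_fps; }

// Frames per second implied by a per-frame latency in milliseconds.
double fps_from_ms(double latency_ms) { return 1000.0 / latency_ms; }

// Factor by which an implementation must speed up to fit the frame budget.
double speedup_needed(double latency_ms, double target_fps) {
    return latency_ms / budget_ms(target_fps);
}
```

With the numbers above, the budget is about 16.7 ms, so FFLD's HOG would need roughly a 3.1x speedup and the voc-release5 code roughly 25x.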

I've also experimented with the OpenCV HOG extraction code. The OpenCV version works, but it seems to be hard-coded for the Dalal-Triggs HOG setup, and OpenCV doesn't seem to let me use the same HOG parameters (normalization scheme, binary position features, etc.) as [Felzenszwalb, Girshick, et al]. The OpenCV version also doesn't natively support multiscale HOG, though you could do the downsampling yourself and call OpenCV HOG for each scale. I don't remember what the OpenCV HOG performance looked like.
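Doing the downsampling yourself can be as simple as repeatedly halving the image and running the single-scale extractor on each level. The sketch below assumes a single-channel float image and uses plain 2x2 box averaging with one level per octave; FGMR-style pyramids use several levels per octave, so a real port would resize by a finer factor (for example with cv::resize) instead. `Image`, `downsample2x` and `octave_pyramid` are hypothetical names, and the per-level HOG call is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal grayscale image (row-major floats). Hypothetical type for this sketch.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> pixels;
};

// Halves each dimension by averaging 2x2 blocks (an odd trailing row/column is dropped).
Image downsample2x(const Image& src) {
    Image dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const float* r0 = &src.pixels[static_cast<std::size_t>(2 * y) * src.width + 2 * x];
            const float* r1 = r0 + src.width;  // the row directly below
            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                0.25f * (r0[0] + r0[1] + r1[0] + r1[1]);
        }
    }
    return dst;
}

// One level per octave, stopping before either side would drop below min_side.
// The caller would run its single-scale HOG extractor on each returned level.
std::vector<Image> octave_pyramid(const Image& base, int min_side) {
    std::vector<Image> levels{base};
    while (levels.back().width / 2 >= min_side && levels.back().height / 2 >= min_side) {
        levels.push_back(downsample2x(levels.back()));
    }
    return levels;
}
```

For a 640x480 input and a minimum side of 64 pixels this yields three levels: 640x480, 320x240 and 160x120.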

Final Thoughts

  1. The fastest HOG implementation -- FFLD -- seems to leave a lot of performance on the table. I haven't done a GFLOP/s estimate, but I do notice that FFLD's HOG code doesn't use any SSE/AVX vectorization. There isn't that much control flow, so vectorization seems like a cheap speedup opportunity here.
  2. I haven't mentioned GPU HOG implementations here. I've experimented with groundHOG/CUHOG and fasthog. The CUHOG authors claim 20fps (50ms) HOG extraction on an NVIDIA GTX560. But Intel CPUs are the target platform for my application, and copying a full HOG pyramid from the GPU to the CPU is prohibitively expensive.
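To put a rough number on what would have to be copied from the GPU back to the CPU, the feature pyramid's size can be estimated from its level sizes. This assumes 8x8-pixel cells, 32 float features per cell (matching the descriptor described above) and no border padding; the level list and the `pyramid_bytes` helper are illustrative, not taken from any of the libraries mentioned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Image size in pixels of one pyramid level. Hypothetical type for this sketch.
struct LevelSize {
    int width;
    int height;
};

// Approximate bytes of a HOG pyramid: cells per level * dims floats per cell.
// Assumes square cells of `cell` pixels and ignores any border padding.
std::size_t pyramid_bytes(const std::vector<LevelSize>& levels, int cell, int dims) {
    std::size_t total = 0;
    for (const LevelSize& l : levels) {
        const std::size_t cells_x = static_cast<std::size_t>(l.width / cell);
        const std::size_t cells_y = static_cast<std::size_t>(l.height / cell);
        total += cells_x * cells_y * static_cast<std::size_t>(dims) * sizeof(float);
    }
    return total;
}
```

A single 640x480 level comes to 80 x 60 cells x 32 floats x 4 bytes = 614,400 bytes; every additional pyramid level adds to that total.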
 
OpenCV includes Dalal's implementation of HOG in both CPU and GPU versions. They work pretty well in my opinion, and they can easily be used for object detection with OpenCV's CvSVM. –  marcos.nieto Oct 25 '13 at 17:10

The filter convolution is the most expensive part of DPM, so how do you handle that part? –  Mickey Shine Jun 11 '14 at 14:35

@MickeyShine the usual stuff... massively quantizing the features, and doing cascades. I'm doing more deep learning and fewer HOG-based DPMs these days. But I reached a point where I could do the convolutions for a HOG-based 3-component, 8-part-per-component model in well under 50ms. –  solvingPuzzles Jun 11 '14 at 16:15

@3yanlis1bos Thanks! I've fixed the FFLD link. –  solvingPuzzles Feb 19 '15 at 19:17

Just adding a couple of updated links: ffld and ffld2. It seems to have moved again. –  Jon Mar 9 '15 at 8:29
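The comments mention handling the filter-convolution cost by "massively quantizing the features". The exact scheme isn't stated; one simple possibility, shown here purely as an assumption, is symmetric 8-bit scalar quantization of both the feature map and the filter, with each filter response accumulated in 32-bit integers. Vector quantization is another common option. `quantize` and `filter_response_q` are hypothetical names, and the memory layout (cells row-major, channels innermost) is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Maps a float to a signed 8-bit code with one per-tensor scale, saturating at +/-127.
std::int8_t quantize(float v, float scale) {
    long q = std::lround(v * scale);
    if (q > 127) q = 127;
    if (q < -127) q = -127;
    return static_cast<std::int8_t>(q);
}

// Cross-correlation response of a w x h x d filter placed with its top-left corner at
// cell (x, y) of an fw x fh x d feature map. Index = (row * width + col) * d + channel.
// The int32 result times 1 / (feature_scale * filter_scale) approximates the float score.
std::int32_t filter_response_q(const std::int8_t* feat, int fw, int fh, int d,
                               const std::int8_t* filt, int w, int h, int x, int y) {
    assert(x >= 0 && y >= 0 && x + w <= fw && y + h <= fh);
    std::int32_t acc = 0;
    for (int j = 0; j < h; ++j) {
        // Cells x..x+w-1 of one feature row are contiguous, as is one filter row.
        const std::int8_t* frow = feat + (static_cast<std::ptrdiff_t>(y + j) * fw + x) * d;
        const std::int8_t* wrow = filt + static_cast<std::ptrdiff_t>(j) * w * d;
        for (int k = 0; k < w * d; ++k) {
            acc += static_cast<std::int32_t>(frow[k]) * wrow[k];
        }
    }
    return acc;
}
```

Evaluating the response at every valid (x, y) gives the full correlation map for one filter.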

1 Answer

Have a look at the following implementation: HoG SSE

It does fit your time requirements. It is written in C and uses 128-bit SIMD instructions.
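For anyone unfamiliar with what 128-bit SIMD buys here: one SSE register holds four 32-bit floats, so element-wise HOG steps such as the squared gradient magnitude can process four pixels per instruction. The sketch below is not taken from the HoG SSE code; `squared_magnitude` and the `HOG_SKETCH_HAVE_SSE` macro are hypothetical, and the function falls back to a scalar loop for leftover pixels and on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HOG_SKETCH_HAVE_SSE 1
#endif

// mag2[i] = dx[i]^2 + dy[i]^2 for n pixels.
void squared_magnitude(const float* dx, const float* dy, float* mag2, std::size_t n) {
    std::size_t i = 0;
#ifdef HOG_SKETCH_HAVE_SSE
    // Each 128-bit register holds 4 floats, so one iteration covers 4 pixels.
    for (; i + 4 <= n; i += 4) {
        const __m128 gx = _mm_loadu_ps(dx + i);
        const __m128 gy = _mm_loadu_ps(dy + i);
        const __m128 m = _mm_add_ps(_mm_mul_ps(gx, gx), _mm_mul_ps(gy, gy));
        _mm_storeu_ps(mag2 + i, m);
    }
#endif
    // Scalar tail, or the whole range when SSE is not available.
    for (; i < n; ++i) {
        mag2[i] = dx[i] * dx[i] + dy[i] * dy[i];
    }
}
```

Unaligned loads and stores (`_mm_loadu_ps`, `_mm_storeu_ps`) are used so the caller does not need 16-byte-aligned buffers.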

The code can also be customized further depending on the normalization strategy and output type you need.
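As an illustration of the kind of normalization choice involved, here are two common strategies: plain L2 normalization and Dalal-Triggs-style L2-Hys (normalize, clip each entry, renormalize). The FGMR features instead, roughly speaking, normalize each cell by the energy of four surrounding 2x2-cell blocks and truncate at 0.2 without renormalizing. These functions are a sketch under those assumptions, not the HoG SSE code's actual options; the `eps` regularizer value is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// L2 normalization with a small regularizer: v <- v / sqrt(||v||^2 + eps^2).
void l2_normalize(std::vector<float>& v, float eps) {
    float sum_sq = 0.0f;
    for (float x : v) sum_sq += x * x;
    const float inv = 1.0f / std::sqrt(sum_sq + eps * eps);
    for (float& x : v) x *= inv;
}

// L2-Hys: L2-normalize, clip each entry at `clip`, then L2-normalize again.
void l2_hys(std::vector<float>& v, float clip, float eps) {
    l2_normalize(v, eps);
    for (float& x : v) x = std::min(x, clip);
    l2_normalize(v, eps);
}
```

With `clip = 0.2f` on the vector {3, 4}, L2 normalization gives about {0.6, 0.8}, clipping gives {0.2, 0.2}, and renormalizing gives about {0.707, 0.707}.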

I would be glad to hear your feedback and be able to improve this code.

 
Interesting! I'll give this a try. Does it do multiscale extraction (a "HOG pyramid," as some people call it)? –  solvingPuzzles Nov 26 '13 at 6:28

@solvingPuzzles, did HoG SSE fit your time requirements? Which solution did you find? –  Tin Feb 24 '14 at 11:32

@ivan_a, could you please explain how to use this code? I see that it uses only 16 bins, and it says you can't change this. What does that mean? –  Parag S. Chandakkar Aug 17 '14 at 9:28