Repost: "The Few Image Features and Methods That Computer Vision Basically Relies On Today"

Reposted from: http://www.zhizhihu.com/html/y2010/2431.html

 

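I have been following Action Classification for a while. After the VOC2010 results were released I skimmed through them: the entries mostly rely on the same image features (dense SIFT + Spatial Pyramid), followed by all kinds of fusion schemes, which in the end boil down to Multiple Kernel Learning and a few approximations of it.

Below are the VOC2010 results for the Action Classification task: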

Average Precision (AP %)

Method                           phoning  playing instrument  reading  riding bike  riding horse  running  taking photo  using computer  walking
BONN_ACTION                      47.5     51.1                31.9     64.5         69.1          78.5     32.4          53.9            61.1
CVC_BASE                         56.2     56.5                34.7     75.1         83.6          86.5     25.4          60.0            69.2
CVC_SEL                          49.8     52.8                34.3     74.2         85.5          85.1     24.9          64.1            72.5
INRIA_SPM_HT                     53.2     53.6                30.2     78.2         88.4          84.6     30.4          60.9            61.8
NUDT_SVM_WHGO_SIFT_CENTRIST_LLM  47.2     47.9                24.5     74.2         81.0          79.5     24.9          58.6            71.5
SURREY_MK_KDA                    52.6     53.5                35.9     81.0         89.3          86.5     32.8          59.2            68.6
UCLEAR_SVM_DOSP_MULTFEATS        47.0     57.8                26.9     78.8         89.7          87.3     32.5          60.0            70.1
UMCO_DHOG_KSVM                   53.5     43.0                32.0     67.9         68.8          83.0     34.1          45.9            60.4
WILLOW_A_SVMSIFT_1-A_LSVM        49.2     37.7                22.2     73.2         77.1          81.7     24.3          53.7            56.9
WILLOW_LSVM                      40.4     29.9                32.2     53.5         62.2          73.6     17.6          45.8            41.5
WILLOW_SVMSIFT                   47.9     29.1                21.7     53.5         76.7          78.3     26.0          42.9            56.4

Descriptions of the individual methods follow the results as well.

First, the UCLEAR_SVM_DOSP_MULTFEATS method:

Multiple chi squared kernels are computed: spatial pyramid (SP) w/ dense SIFT, dense overlapping SP w/ HOG, texture filter, LAB values (bag-of-words w/ the above features) and edge dir hists. They are computed on full images, person bounding boxes (BB) and BB of the lower part (simple stretch-scale of person BB) expected to contain horse, bike etc. They are combined with class specific binary weights based on their perf on val set. Finally, class specific SVMs trained on train+val.
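To make the kernel step concrete, here is a minimal sketch of computing chi-squared kernels over histograms and combining them with binary weights. The description above does not give the exact kernel form, so the exponential chi-squared kernel, the `gamma` parameter, the function names, and the toy histograms are all assumptions for illustration rather than the UCLEAR implementation.

```python
import math


def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-negative histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def chi2_kernel_matrix(hists, gamma=1.0):
    """Assumed kernel form: K[i][j] = exp(-gamma * chi2(h_i, h_j))."""
    n = len(hists)
    return [
        [math.exp(-gamma * chi2_distance(hists[i], hists[j])) for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; with 0/1 weights this keeps or drops a kernel."""
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy bag-of-words histograms for three images and two hypothetical feature channels.
sift_hists = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
hog_hists = [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]]

k_sift = chi2_kernel_matrix(sift_hists)
k_hog = chi2_kernel_matrix(hog_hists)

# Binary weights as they might be chosen per class on a validation set (made up here).
k_combined = combine_kernels([k_sift, k_hog], weights=[1, 1])
```

The combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.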

Doesn't the method seem quite simple?

Next, the SURREY_MK_KDA method:

Kernel-level fusion with Spatial Pyramid Grids, Soft Assignment and Kernel Discriminant Analysis using spectral regression. 18 kernels have been generated from 18 variants of SIFT. In short: fusion again.

The CVC_SEL method:
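Soft assignment is the least self-explanatory term in that description, so here is a rough sketch of one common variant: each descriptor spreads its vote over all codebook words with Gaussian weights instead of voting only for the nearest word. The Gaussian weighting, the `sigma` parameter, and the function names are assumptions; the SURREY description does not say which soft-assignment scheme was used.

```python
import math


def soft_assign(descriptor, codebook, sigma=1.0):
    """Return weights over codebook words that sum to 1 for a single descriptor."""
    sq_dists = [
        sum((d - c) ** 2 for d, c in zip(descriptor, word)) for word in codebook
    ]
    nearest = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(s - nearest) / (2.0 * sigma ** 2)) for s in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def soft_bow_histogram(descriptors, codebook, sigma=1.0):
    """Average the soft-assignment weights of all descriptors into one histogram."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        for k, weight in enumerate(soft_assign(descriptor, codebook, sigma)):
            hist[k] += weight
    count = len(descriptors)
    return [h / count for h in hist] if count else hist


# Toy 2-D "descriptors" and a two-word codebook (real SIFT descriptors are 128-D).
codebook = [[0.0, 0.0], [2.0, 0.0]]
descriptors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
histogram = soft_bow_histogram(descriptors, codebook)
```

A descriptor lying halfway between two words contributes equally to both, which is exactly the ambiguity that hard assignment throws away.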

Enhanced CVC submission built upon CVC-BASE for action recognition. Standard BoW model over multiple features from CVC-BASE plus contextual object descriptors. Cross-validation procedure for action-specific feature and kernel selection. Foreground/background/neighborhood modeled separately, spatial pyramid over several features for foreground representation. Object detection based on deformable part-based detector incorporated. Late fusion of feature-specific SVM outputs for final action score.
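The last step, late fusion, can be sketched in a few lines: each feature gets its own SVM, and the per-feature decision scores are merged into one action score per image. The description does not state the fusion rule, so the weighted average below, the feature names, and the scores are assumptions made only for illustration.

```python
def late_fusion(scores_per_feature, weights=None):
    """Weighted average of per-feature classifier scores, one fused score per image."""
    names = sorted(scores_per_feature)
    if weights is None:
        weights = {name: 1.0 for name in names}  # uniform weights by default
    total = sum(weights[name] for name in names)
    n_images = len(scores_per_feature[names[0]])
    return [
        sum(weights[name] * scores_per_feature[name][i] for name in names) / total
        for i in range(n_images)
    ]


# Hypothetical SVM decision values for two images from two feature-specific SVMs.
svm_scores = {
    "sift": [1.0, -1.0],
    "hog": [0.0, 1.0],
}
fused = late_fusion(svm_scores)
```

In practice the weights would be chosen per action class by cross-validation, in the spirit of the feature and kernel selection the description mentions.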

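To sum up: Spatial Pyramid w/ (dense SIFT | overlapping HOG) is the most effective way to build the descriptor. To use several of them together, fuse them with Multiple Kernel methods and learn the fusion parameters. The results really are very good, no kidding.

So for problems like this, unless you truly have to invent descriptors of your own, these are enough to reach your experimental goals, and using them in practical applications is not out of the question either.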
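For readers who have not built one, a spatial pyramid is just a stack of bag-of-words histograms computed over progressively finer grids and concatenated. The sketch below assumes the local descriptors have already been quantized to visual word indices with normalized image coordinates; the function name, the input layout, and the omission of per-level weights are simplifications of my own.

```python
def spatial_pyramid_histogram(points, vocab_size, levels=2):
    """Concatenate per-cell word histograms over grids of 1x1, 2x2, ... cells.

    points: list of (x, y, word) with x, y in [0, 1] and word in range(vocab_size).
    Per-level weights, as used in spatial pyramid matching, are omitted for brevity.
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        hist = [0.0] * (cells * cells * vocab_size)
        for x, y, word in points:
            cx = min(int(x * cells), cells - 1)  # clamp so x == 1.0 stays in range
            cy = min(int(y * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1.0
        feature.extend(hist)
    return feature


# Two quantized descriptors: word 0 near the top-left, word 1 near the bottom-right.
points = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
pyramid = spatial_pyramid_histogram(points, vocab_size=2, levels=1)
```

With a vocabulary of size V and levels 0 to 2, the feature has 21 × V dimensions (1 + 4 + 16 cells), which is the vector a chi-squared kernel would then be computed on.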
