Video representation based on mid-level discriminative patches
Mine these patches from training videos
Then use these patches as a discriminative vocabulary for action classification
Represent videos in terms of discriminative spatio-temporal patches rather than global feature vectors
Video representation methods fall roughly into three categories:
1. Global spatio-temporal templates
2. Bag-of-features models (spatio-temporal interest points, dense interest points, ...): well suited to classification, less suited to action detection
3. Decomposing the video into patches
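To contrast category 3 with a global feature vector, here is a minimal sketch of representing one video as the max-pooled responses of a bank of linear patch detectors. The feature dimensions, detector weights, and function name are all hypothetical stand-ins, not the paper's actual pipeline.

```python
import numpy as np

def video_to_patch_response_vector(patch_feats, detectors):
    """Represent a video by the max response of each patch detector.

    patch_feats : (n_patches, d) descriptors of candidate
                  spatio-temporal patches from one video
    detectors   : (n_detectors, d) linear detector weights
                  (hypothetical, e.g. mined discriminative patches)
    """
    scores = patch_feats @ detectors.T   # (n_patches, n_detectors)
    return scores.max(axis=0)            # one pooled response per detector

# Toy data: 50 candidate patches with 16-D features, 8 detectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 16))
dets = rng.normal(size=(8, 16))
vec = video_to_patch_response_vector(feats, dets)
print(vec.shape)  # (8,) -- a fixed-length video descriptor
```

The resulting fixed-length vector can then feed any standard classifier, which is what makes patch-based representations a drop-in replacement for global descriptors.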
Mining discriminative patches
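A minimal sketch of the mining idea: cluster candidate patch descriptors, then rank clusters by how strongly they belong to a single action class. Here cluster purity is used as a simple stand-in for the per-cluster discriminative classifiers the paper trains; the synthetic features and all names are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm), enough for a sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def mine_discriminative_clusters(X, actions, k=6, top=3):
    """Rank patch clusters by class purity; keep the top ones.

    Purity stands in for training a discriminative classifier per
    cluster: a cluster dominated by one action class is a candidate
    discriminative patch."""
    labels, centers = kmeans(X, k)
    purity = np.zeros(k)
    for j in range(k):
        members = actions[labels == j]
        if members.size:
            _, counts = np.unique(members, return_counts=True)
            purity[j] = counts.max() / members.size
    order = purity.argsort()[::-1][:top]
    return centers[order], purity[order]

# Synthetic "patch features" from two action classes (a toy stand-in
# for real spatio-temporal descriptors).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (60, 8)), rng.normal(2, 1, (60, 8))])
actions = np.repeat([0, 1], 60)
cents, pur = mine_discriminative_clusters(X, actions)
print(pur)  # purities of the top-ranked clusters, descending
```

The retained cluster centers would then serve as the "vocabulary" of discriminative patches against which new videos are scored.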