Introduction to Video Analysis

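These notes summarize several high-quality talks and articles shared online, with the aim of building a rough understanding of the common problems and methods in video analysis.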

The five tasks of ActivityNet 2017:
Task 1: Untrimmed Video Classification (ActivityNet)
Videos can contain more than one activity, and typically large portions of the video are unrelated to any activity of interest.

Task 2: Trimmed Action Recognition (Kinetics) [New]
Videos contain a single activity, and all the clips have a standard duration of ten seconds.

Task 3: Temporal Action Proposals (ActivityNet) [New]
The goal is to produce a set of candidate temporal segments that are likely to contain a human action.

Task 4: Temporal Action Localization (ActivityNet)
This task is intended to evaluate the ability of algorithms to temporally localize activities in untrimmed video sequences. Here, videos can contain more than one activity instance, and multiple activity categories can appear in the video.

Task 5: Dense-Captioning Events in Videos (ActivityNet Captions) [New]
This task involves both detecting and describing events in a video.
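
For the proposal and localization tasks above (Tasks 3 and 4), a predicted temporal segment is commonly compared with a ground-truth segment using temporal intersection-over-union (tIoU). The snippet below is a minimal sketch of that measure; the function name `temporal_iou` and the example segments are illustrative assumptions, not part of the challenge toolkit.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments given in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a 2-8 s proposal against a 4-10 s ground-truth action.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # overlap 4 s, union 8 s -> 0.5
```

A proposal is then usually counted as correct when its tIoU with some ground-truth segment exceeds a chosen threshold.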

Action recognition datasets
Lab: KTH, Weizmann
TV, movies: UCF Sports, Hollywood
Web: HMDB, UCF101, THUMOS, ActivityNet

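Video understanding tasks:
Action classification, temporal detection (temporal localization), spatial detection, and spatio-temporal detection.
Challenges:
The temporal dimension (motion information); heavy compute and storage requirements (Kinetics > 10 TB, YouTube-8M > 350 TB); (temporal) annotation is difficult and noisy.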

Action classification:
Two-stream CNN (2014), C3D (2015), TDD (2015), P3D (to be read later)
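
To make the two-stream idea concrete: one network scores RGB frames (appearance), another scores stacked optical flow (motion), and their class scores are combined at the end. The sketch below shows only that late-fusion step as a weighted average of softmax scores, which is one simple fusion option. The logits, the three-class setup, and the `temporal_weight` parameter are made-up assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of per-class softmax scores from the two streams."""
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1 - w) * s + w * t for s, t in zip(spatial, temporal)]


# Made-up logits for three classes; real values would come from the two CNNs.
spatial_logits = [2.0, 1.0, 0.1]    # RGB (appearance) stream
temporal_logits = [0.5, 2.5, 0.2]   # optical-flow (motion) stream
fused = fuse_two_stream(spatial_logits, temporal_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)
```

Here the appearance stream alone would pick class 0, but the motion stream is confident enough in class 1 that the fused prediction changes.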

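Short-term modeling of video data, focusing on Appearance-and-Relation Networks (ARTNet)
1 s or 0.5 s, e.g. run, jump, land (the sub-actions of a high jump).
Mid-term modeling of video data, focusing on Temporal Segment Networks (TSN)
5 s, e.g. a high jump. Used for classifying trimmed videos.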
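
TSN covers a whole trimmed clip cheaply by splitting it into a few equal segments, sampling one short snippet from each, and aggregating the per-snippet class scores. The following is a rough sketch of those two steps, assuming frame-level indexing, three segments, and averaging as the consensus function. The helper names and the 150-frame clip are hypothetical.

```python
import random


def sample_snippet_indices(num_frames, num_segments=3, rng=None):
    """Pick one random frame index from each of num_segments equal segments."""
    rng = rng or random.Random()
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        # Guard against empty segments when the video is very short.
        end = max(start + 1, int((k + 1) * seg_len))
        indices.append(rng.randrange(start, end))
    return indices


def segmental_consensus(snippet_scores):
    """Average per-class scores over snippets (one possible consensus)."""
    n = len(snippet_scores)
    return [sum(col) / n for col in zip(*snippet_scores)]


# Hypothetical 150-frame clip (about 5 s at 30 fps), one snippet per segment.
indices = sample_snippet_indices(150, num_segments=3, rng=random.Random(0))
print(indices)

# Made-up per-snippet scores for two classes, averaged into a video-level score.
video_scores = segmental_consensus([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
print(video_scores)
```

Because the number of sampled snippets is fixed, the cost per video does not grow with its length, which is what makes this sampling scheme suitable for modeling a whole clip.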

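TSN extensions
Structured Segment Network (SSN), a method for temporal action detection
Long-term modeling of video data, focusing on UntrimmedNets
~1 h. Used for learning action understanding models from untrimmed long videos.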

ST-GCN, a skeleton-based action recognition method built on graph convolutional networks.
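
As a rough illustration of what a graph convolution over a skeleton does, the sketch below applies one spatial aggregation step: each joint's features are mixed with those of its connected joints through a symmetrically normalized adjacency matrix, then projected by a weight matrix. This is a generic simplification and an assumption on my part. ST-GCN itself also uses neighbor partitioning strategies and temporal convolutions, which are omitted here, and the 3-joint chain is a toy skeleton.

```python
def normalized_adjacency(edges, num_joints):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    # Start from the identity so every joint keeps its own features.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def graph_conv(features, adj_norm, weight):
    """One spatial graph convolution: aggregate neighbors, then project."""
    return matmul(matmul(adj_norm, features), weight)


# Toy 3-joint chain (shoulder - elbow - wrist) with 2-D joint coordinates.
edges = [(0, 1), (1, 2)]
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
adj = normalized_adjacency(edges, 3)
out = graph_conv(features, adj, weight)
print(out)
```

In a trained model the weight matrix is learned and the same operation is applied to every frame of the skeleton sequence.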

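ref:
Recap | Limin Wang, postdoc at ETH Zurich: Video-based temporal modeling and action recognition
Yuanjun Xiong, Senior Applied Scientist at Amazon: Progress in human action understanding research | Live transcript, PhD Talk
Deep Frontier: Deep-learning-based intelligent video analysis, an interpretation of the ACM MM 2017 tutorial by Dr. Tao Mei of Microsoft Research Asia
Video analysis: a survey of the action recognition frontier (Part 1)
Classic paper guide | Video action recognition based on two-stream convolutional neural networks (with slides and open-source links)
An informal discussion of mainstream video action recognition techniques (with video)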

