Introduction to Video Analysis


The five tasks of the ActivityNet Challenge 2017:
Task 1: Untrimmed Video Classification (ActivityNet)
Videos can contain more than one activity, and typically large portions of a video are unrelated to any activity of interest.

Task 2: Trimmed Action Recognition (Kinetics) [New]
Videos contain a single activity, and all clips have a standard duration of ten seconds.

Task 3: Temporal Action Proposals (ActivityNet) [New]
The goal is to produce a set of candidate temporal segments that are likely to contain a human action.

Task 4: Temporal Action Localization (ActivityNet)
This task is intended to evaluate the ability of algorithms to temporally localize activities in untrimmed video sequences. Here, videos can contain more than one activity instance, and multiple activity categories can appear in the video.
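
Temporal localization results are commonly scored by the temporal IoU (tIoU) between a predicted segment and a ground-truth segment. The following is a minimal sketch of that idea, not the official evaluation code: the function names, the (start, end) tuple layout, and the 0.5 threshold are assumptions made for illustration.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU of two (start, end) segments, both in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(pred, ground_truth, threshold=0.5):
    """Assumed rule: labels must match and tIoU must reach the threshold."""
    pred_segment, pred_label = pred
    gt_segment, gt_label = ground_truth
    return (pred_label == gt_label
            and temporal_iou(pred_segment, gt_segment) >= threshold)


# Hypothetical example: ground truth "high jump" from 10 s to 20 s,
# prediction from 12 s to 22 s with the same label.
gt = ((10.0, 20.0), "high jump")
pred = ((12.0, 22.0), "high jump")
print(temporal_iou(pred[0], gt[0]))
print(is_correct_detection(pred, gt))
```

The same overlap measure can be used to judge the candidate segments of Task 3, where only the segment matters and no label is compared.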

Task 5: Dense-Captioning Events in Videos (ActivityNet Captions) [New]
This task involves both detecting and describing events in a video.

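Lab: KTH, Weizmann
TV, Movies: UCF Sports, Hollywood
Web: HMDB, UCF101, THUMOS, ActivityNet
Action classification, temporal detection (temporal localization), spatial detection, spatio-temporal detection.
Temporal dimension (motion information); heavy compute and storage requirements (Kinetics > 10 TB, YouTube-8M > 350 TB); (temporal) annotation is difficult and noisy.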

Two-stream CNN (2014), C3D (2015), TDD (2015), P3D? (to read later)

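Introduction to short-term modeling of video data, with a detailed look at Appearance-and-Relation Networks (ARTNet)
1 s or 0.5 s, e.g. run, jump, land (the sub-actions of a high jump)
Introduction to mid-term modeling of video data, with a detailed look at Temporal Segment Networks (TSN)
5 s, e.g. high jump. Used for trimmed video classification.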
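
TSN covers a mid-term action by splitting the video into a few equal segments, sampling one snippet from each, and combining the per-snippet class scores with a consensus function. The sketch below shows only that sampling and averaging logic. The function names, the 125-frame clip (5 s at an assumed 25 fps), and the made-up scores are illustrative, and the per-snippet network is left out entirely.

```python
import random


def sample_snippet_indices(num_frames, num_segments=3, rng=None):
    """Pick one random frame index from each of num_segments equal segments."""
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        # Guard against empty segments when the clip is very short.
        end = max(end, start + 1)
        indices.append(min(rng.randrange(start, end), num_frames - 1))
    return indices


def average_consensus(snippet_scores):
    """Average per-class scores over all snippets (one simple consensus)."""
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(scores[c] for scores in snippet_scores) / num_snippets
            for c in range(num_classes)]


# Hypothetical 5 s clip at 25 fps: one snippet each from the start, middle, end.
indices = sample_snippet_indices(125, num_segments=3, rng=random.Random(0))
print(indices)

# Made-up per-snippet scores for two classes, standing in for network outputs.
video_scores = average_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
print(video_scores)
```

Sampling one snippet per segment keeps the cost independent of video length while still seeing the whole action, for example the run, the jump, and the landing of a high jump.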

TSN extensions
Structured Segment Network (SSN) method for temporal action detection

ST-GCN: a skeleton-based action recognition method built on graph convolutional networks.
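
To make the graph-convolution idea concrete, here is a minimal single-frame sketch: joints are graph nodes, bones are edges, and each joint's output feature is a normalized sum over itself and its neighbors, followed by a channel-mixing weight. This is an illustration only. The three-joint chain, the function names, and the identity weight are assumptions, and the temporal convolution across frames and the neighbor partitioning used in the full ST-GCN model are omitted.

```python
import math


def normalized_adjacency(num_joints, bones):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    # Start from the identity so every joint keeps its own feature.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    degree = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(num_joints)] for i in range(num_joints)]


def spatial_graph_conv(features, adjacency, weight):
    """features: [joints][in_ch], weight: [in_ch][out_ch] -> [joints][out_ch]."""
    num_joints, in_ch = len(features), len(features[0])
    # Step 1: aggregate features over each joint's neighborhood.
    aggregated = [[sum(adjacency[i][k] * features[k][c]
                       for k in range(num_joints))
                   for c in range(in_ch)] for i in range(num_joints)]
    # Step 2: mix channels with the weight matrix.
    return [[sum(row[c] * weight[c][o] for c in range(in_ch))
             for o in range(len(weight[0]))] for row in aggregated]


# Hypothetical 3-joint chain (e.g. shoulder - elbow - wrist) with 2D coordinates.
bones = [(0, 1), (1, 2)]
adjacency = normalized_adjacency(3, bones)
joints_xy = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight
print(spatial_graph_conv(joints_xy, adjacency, identity))
```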

Video Analysis paper notes
activity-net 2017
