NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

The paper introduces the NTU RGB+D Action Dataset and proposes the Part-Aware LSTM Network. The main points are as follows:

I: Dataset

    The dataset contains 3D skeletons (body joints), masked depth maps, full depth maps, RGB videos, and IR videos. It covers three camera views at different horizontal angles (-45°, 0°, +45°).

[Figure 1]

II: Part-Aware LSTM Network

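        This is the more novel part of the paper. Instead of keeping the long-term memory of the whole body in a single cell, the paper takes a part-based approach: the memory of each body part is stored independently, and the per-part memories are then concatenated to form one large cell.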
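The part-based cell can be illustrated with a minimal NumPy sketch of one time step. It is only a sketch under stated assumptions, not the authors' implementation: as I read the paper's formulation, each part has its own input gate, forget gate, and candidate values, while the output gate is shared across parts. The names `init_params` and `plstm_step`, the weight layout, and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(part_dims, cell_dim, rng):
    """Random weights. part_dims[p] is the input size of part p (hypothetical sizes)."""
    hidden_dim = cell_dim * len(part_dims)
    params = {"cell_dim": cell_dim, "parts": []}
    for d in part_dims:
        # One matrix per part, producing its input gate, forget gate and candidate.
        params["parts"].append({
            "W": rng.standard_normal((3 * cell_dim, d + hidden_dim)) * 0.1,
            "b": np.zeros(3 * cell_dim),
        })
    # Assumption: the output gate is shared and sees every part plus h_{t-1}.
    params["W_o"] = rng.standard_normal((hidden_dim, sum(part_dims) + hidden_dim)) * 0.1
    params["b_o"] = np.zeros(hidden_dim)
    return params

def plstm_step(params, x_parts, h_prev, c_prev_parts):
    """One time step: update each part's cell on its own, then concatenate."""
    n = params["cell_dim"]
    c_parts = []
    for p, x_p, c_prev in zip(params["parts"], x_parts, c_prev_parts):
        z = p["W"] @ np.concatenate([x_p, h_prev]) + p["b"]
        i = sigmoid(z[:n])           # input gate of this part
        f = sigmoid(z[n:2 * n])      # forget gate of this part
        g = np.tanh(z[2 * n:])       # candidate values of this part
        c_parts.append(f * c_prev + i * g)
    full_input = np.concatenate(list(x_parts) + [h_prev])
    o = sigmoid(params["W_o"] @ full_input + params["b_o"])
    # The per-part cells are joined into one large cell before the output gate.
    h = o * np.tanh(np.concatenate(c_parts))
    return h, c_parts

# Illustrative setup: 5 parts of 5 joints each (15 coordinates per part).
rng = np.random.default_rng(0)
part_dims = [15] * 5
params = init_params(part_dims, cell_dim=8, rng=rng)
h0 = np.zeros(8 * len(part_dims))
c0 = [np.zeros(8) for _ in part_dims]
x_t = [rng.standard_normal(d) for d in part_dims]
h1, c1 = plstm_step(params, x_t, h0, c0)
```

Within a single step, the cell of one part depends only on that part's input and the shared previous hidden state, which is what keeps the per-part memories separate.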


[Figure 2] [Figure 3]

    This idea is in fact similar to that of many other papers. Two examples:

    (1):Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

    (2):A Hierarchical Deep Temporal Model for Group Activity Recognition

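        First, a summary of the figure above. It splits the body information at time t into parts, which amounts to focusing on details: the information of each part is learned separately and then aggregated into one large cell. Compared with feeding the input directly into a plain LSTM, the network structure is slightly more complex and pays more attention to detail. It has a somewhat hierarchical feel, except that the hierarchical idea is placed inside the LSTM itself. The two papers listed above both use a hierarchical design: the first is body-part based and uses multi-layer BRNNs for action recognition; the second works on video, handles each person separately, and finally recognizes the group activity. For details on those papers, see my earlier blog posts.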


III: Experimental Setup

(1):How important is the skeleton normalization step, described in the experimental setup section?
      In the extension of our experiments, we found out the normalization is not vital. You can skip the normalization step and it should work fine. Actually the network is supposed to learn how to normalize the data by itself.

(2):How did you choose the main actor in the preprocessing step?
      We used a heuristic. It's very simple (but not necessarily correct for all the samples). Consider the variance of the X, Y, and Z values of all the joints and add them up. We took the body with the higher value as the main subject.
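A minimal sketch of this heuristic, under one reading of the answer: for each body, take the variance over time of every joint's X, Y, and Z coordinate and add everything up, then pick the body with the largest total. The array layout `(frames, joints, 3)` and the names `motion_score` and `pick_main_actor` are hypothetical, not taken from the authors' code.

```python
import numpy as np

def motion_score(skeleton):
    """skeleton: array of shape (frames, joints, 3).

    Variance over time of each joint coordinate, summed over joints and axes.
    """
    return float(skeleton.var(axis=0).sum())

def pick_main_actor(bodies):
    """Return the index of the body with the highest summed variance."""
    scores = [motion_score(b) for b in bodies]
    return int(np.argmax(scores))

# Toy data: one body stands still, the other drifts along X.
t = np.linspace(0.0, 1.0, 20)
still = np.ones((20, 25, 3))
moving = np.ones((20, 25, 3))
moving[:, :, 0] += t[:, None]
main_idx = pick_main_actor([still, moving])
```

As the answer itself notes, this is not guaranteed to be correct for every sample: a noisy or jittery detection can outscore the real subject.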

(3):How did you handle the variable subject numbers (one or two) in the input of the network?
     Our inputs initially include two sets of joints (for two skeletons). When we observed just one, the second set was filled with zeros. When we observed two or more, we decided which one was the main subject and which one was the second by measuring the amount of motion of their joints. Also, some of the detected skeletons are noise, like tables and seats! You can eliminate them by filtering out the skeletons that do not have a reasonable ratio of Y spread to X spread over all of their joints.
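The procedure in this answer could be sketched as follows. This is only an illustration with assumed details: "spread" is taken here as the max-minus-min range of a coordinate over all joints and frames, the threshold `min_ratio` is a made-up value (the answer gives none), and `spread_ratio` and `build_input` are hypothetical names.

```python
import numpy as np

def spread_ratio(skeleton):
    """Y range divided by X range over all joints and frames of one body."""
    xs, ys = skeleton[..., 0], skeleton[..., 1]
    x_spread = xs.max() - xs.min()
    y_spread = ys.max() - ys.min()
    return y_spread / max(x_spread, 1e-8)

def build_input(bodies, min_ratio=1.0):
    """Return an array of shape (2, frames, joints, 3).

    Slot 0 holds the main subject and slot 1 the second one;
    a missing skeleton is left as zeros.
    min_ratio is an assumed threshold, not a value from the paper.
    """
    frames, joints, _ = bodies[0].shape
    # Drop detections that are implausibly flat and wide (tables, seats).
    kept = [b for b in bodies if spread_ratio(b) >= min_ratio]
    # Order the remaining bodies by amount of motion, most active first.
    kept.sort(key=lambda b: float(b.var(axis=0).sum()), reverse=True)
    out = np.zeros((2, frames, joints, 3))
    for slot, body in enumerate(kept[:2]):
        out[slot] = body
    return out

# Toy data: a tall, narrow "person" and a wide, flat "table".
person = np.zeros((10, 25, 3))
person[..., 0] = np.linspace(0.0, 0.5, 25)
person[..., 1] = np.linspace(0.0, 1.7, 25)
table = np.zeros((10, 25, 3))
table[..., 0] = np.linspace(0.0, 2.0, 25)
table[..., 1] = np.linspace(0.0, 0.1, 25)
net_input = build_input([table, person])
```

A fixed ratio threshold would also reject real subjects who are lying down, so in practice the cutoff would need tuning against the data.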

       

IV: Summary

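       On reflection, the authors' idea of modifying the LSTM itself is a very good one: compared with the hierarchical approaches, it reduces the amount of computation and simplifies the network. These are only my personal impressions; if you spot any mistakes, please point them out. Thank you very much!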









