Paper Reading: An end-to-end spatio-temporal attention model for human action recognition from skeleton data

Contents

Main Contributions

Proposed Method

Spatial Attention

Temporal Attention

Joint Training of the Networks

Regularized Objective Function

 


 

Paper title: An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017)

Download: https://arxiv.org/pdf/1611.06067v1.pdf

 


 

Main Contributions

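The authors propose a framework that uses attention mechanisms to learn spatio-temporal features from skeleton joint data for action recognition.

The framework consists of three parts: a main LSTM network, a spatial attention subnetwork, and a temporal attention subnetwork.

In the spatial attention subnetwork, an LSTM learns the relationship between the joints of the current frame and those of previous frames and produces an attention map over the joints of the current input frame, automatically discovering which skeleton joints in the current frame matter most for recognizing the action.

In the temporal attention subnetwork, an LSTM learns the relationship between the current frame and previous frames and produces an attention weight for the current input frame, automatically learning which frames contribute most to recognition.

In addition, the authors train the networks with an alternating joint-training strategy and design a regularized loss function to prevent overfitting.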

 


 

Proposed Method

[Figure 1]

[Figure 2]

 

 

Spatial Attention

[Figure 3: spatial attention]

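At each time step t, the input consists of the K joints:

$$x_t = (x_{t,1}, \ldots, x_{t,K})^T, \quad x_{t,k} \in \mathbb{R}^3$$

where $x_{t,k}$ holds the 3D coordinates of the k-th joint.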

 

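The scores $s_t = (s_{t,1}, \ldots, s_{t,K})^T$, which indicate the importance of the K joints, are jointly obtained as

$$s_t = U_s \tanh(W_{xs} x_t + W_{hs} h^s_{t-1} + b_s) + b_{us}$$

where $U_s$, $W_{xs}$, and $W_{hs}$ are learnable parameter matrices, $b_s$ and $b_{us}$ are bias vectors, and $h^s_{t-1}$ is the hidden state of the LSTM layer in the spatial attention subnetwork.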
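A minimal pure-Python sketch of the score equation $s_t = U_s \tanh(W_{xs} x_t + W_{hs} h^s_{t-1} + b_s) + b_{us}$. The toy sizes, the flattening of the K joints' 3D coordinates into one input vector, and the helper names `matvec` and `spatial_scores` are hypothetical; the spatial attention LSTM itself is not modeled, so $h^s_{t-1}$ is simply passed in as a vector.

```python
import math


def matvec(m, v):
    """Multiply a matrix m (list of rows) by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def spatial_scores(x_t, h_prev, W_xs, W_hs, b_s, U_s, b_us):
    """s_t = U_s tanh(W_xs x_t + W_hs h_prev + b_s) + b_us; returns one score per joint."""
    pre = [a + b + c for a, b, c in zip(matvec(W_xs, x_t), matvec(W_hs, h_prev), b_s)]
    hidden = [math.tanh(v) for v in pre]
    return [a + b for a, b in zip(matvec(U_s, hidden), b_us)]


# Toy sizes (hypothetical): K = 2 joints with 3D coordinates flattened into a
# 6-dim x_t, and a 2-dim tanh layer inside the score function.
x_t = [0.1, 0.2, 0.3, 0.0, -0.1, 0.5]
h_prev = [0.0, 0.0]              # attention-LSTM hidden state, zero at t = 1
W_xs = [[0.1] * 6, [-0.1] * 6]
W_hs = [[0.0, 0.0], [0.0, 0.0]]
b_s = [0.0, 0.0]
U_s = [[1.0, 0.0], [0.0, 1.0]]   # maps the tanh layer to K = 2 scores
b_us = [0.0, 0.0]

s_t = spatial_scores(x_t, h_prev, W_xs, W_hs, b_s, U_s, b_us)
print(s_t)
```

The scores are unnormalized; turning them into joint weights is the job of the joint-selection gate.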

 

For the k-th joint, the activation that serves as the joint-selection gate is a softmax over the K scores:

$$\alpha_{t,k} = \frac{\exp(s_{t,k})}{\sum_{i=1}^{K} \exp(s_{t,i})}$$
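A small sketch of the joint-selection gate $\alpha_{t,k} = \exp(s_{t,k}) / \sum_i \exp(s_{t,i})$, written as a numerically stable softmax. The function name `joint_selection_gate` and the toy scores are hypothetical.

```python
import math


def joint_selection_gate(scores):
    """alpha_{t,k} = exp(s_{t,k}) / sum_i exp(s_{t,i}), a softmax over the K joints."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


alpha_t = joint_selection_gate([2.0, 0.5, -1.0])  # toy scores for K = 3 joints
print(alpha_t)  # positive weights that sum to 1; the highest score gets the largest weight
```

Because the weights of one frame sum to 1, the joints compete with each other for attention within that frame.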

 

Instead of assigning equal importance to all joints in $x_t$, the input to the main LSTM network is modulated to

$$x'_t = (x'_{t,1}, \ldots, x'_{t,K})^T = (\alpha_{t,1} x_{t,1}, \ldots, \alpha_{t,K} x_{t,K})^T$$
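A sketch of the modulation step $x'_{t,k} = \alpha_{t,k} x_{t,k}$, assuming each joint is stored as an (x, y, z) triple. The function name `modulate_joints` and the toy values are hypothetical.

```python
def modulate_joints(joints, alpha):
    """x'_{t,k} = alpha_{t,k} * x_{t,k}: scale each joint's 3D coordinates by its weight."""
    return [[a * coord for coord in joint] for joint, a in zip(joints, alpha)]


x_t = [[0.2, 1.0, 0.4], [0.5, -0.3, 0.8]]  # toy frame: K = 2 joints, (x, y, z) each
alpha_t = [0.75, 0.25]                      # joint weights, e.g. joint-selection gate values
x_mod = modulate_joints(x_t, alpha_t)
print(x_mod)  # the first joint keeps 75% of its magnitude, the second 25%
```

The modulated frame keeps the same shape as the input, so the main LSTM can consume it unchanged.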

 

 

Temporal Attention

[Figure 4: temporal attention]

 

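The activation that serves as the frame-selection gate is computed as

$$\beta_t = \mathrm{ReLU}(\tilde{w} x_t + \tilde{w}_h \tilde{h}_{t-1} + \tilde{b})$$

where $\tilde{h}_{t-1}$ is the hidden state of the LSTM layer in the temporal attention subnetwork. Because ReLU is used instead of softmax, $\beta_t$ is non-negative but not normalized across frames.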
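A pure-Python sketch of the frame-selection gate $\beta_t = \mathrm{ReLU}(\tilde{w} x_t + \tilde{w}_h \tilde{h}_{t-1} + \tilde{b})$, which yields one scalar per frame. The function name, the flattened-frame input, and the toy weights are hypothetical; the temporal attention LSTM is not modeled, and its hidden state is passed in directly.

```python
def frame_selection_gate(x_t, h_prev, w_x, w_h, b):
    """beta_t = ReLU(w_x . x_t + w_h . h_prev + b): one non-negative weight per frame."""
    pre = sum(w * x for w, x in zip(w_x, x_t)) + sum(w * h for w, h in zip(w_h, h_prev)) + b
    return max(0.0, pre)


w_x = [0.5, -0.5, 1.0]  # toy weights for a 3-dim flattened frame (hypothetical)
w_h = [0.2, 0.2]        # toy weights for a 2-dim attention-LSTM hidden state
beta_a = frame_selection_gate([1.0, 0.0, 0.5], [0.0, 0.0], w_x, w_h, 0.0)
beta_b = frame_selection_gate([0.0, 1.0, 0.0], [0.0, 0.0], w_x, w_h, -0.1)
print(beta_a, beta_b)  # 1.0 0.0
```

The second frame's pre-activation is negative, so ReLU switches it off entirely, while the first frame's weight is not capped at 1.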

 

For sequence-level classification, the main LSTM network outputs a class-score vector $z_t = (z_{t,1}, \ldots, z_{t,C})^T$ at each time step t. The scores for the C classes are the sum of these per-step scores, weighted by the temporal attention values $\beta_t$:

$$o = \sum_{t=1}^{T} \beta_t \cdot z_t$$

where $o = (o_1, \ldots, o_C)^T$.

 

The predicted probability that the sequence X belongs to the i-th class is

$$p(C_i \mid X) = \frac{e^{o_i}}{\sum_{j=1}^{C} e^{o_j}}$$
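The two classification steps can be sketched together: weight the per-step class scores by $\beta_t$ to get $o = \sum_t \beta_t z_t$, then apply a softmax to get $p(C_i \mid X)$. The function names and toy numbers are hypothetical.

```python
import math


def sequence_scores(z, beta):
    """o = sum_t beta_t * z_t, where z[t] holds the C class scores at time step t."""
    num_classes = len(z[0])
    return [sum(b * z_t[c] for z_t, b in zip(z, beta)) for c in range(num_classes)]


def class_probabilities(o):
    """p(C_i | X) = exp(o_i) / sum_j exp(o_j), a softmax over the C classes."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


z = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]  # main-LSTM outputs: T = 3 steps, C = 2 classes
beta = [0.0, 0.5, 1.0]                     # temporal attention: the first frame is ignored
o = sequence_scores(z, beta)
probs = class_probabilities(o)
print(o)  # [3.0, 2.0]
```

Since $\beta_1 = 0$ here, the first frame's strong vote for class 0 contributes nothing; the result is driven by the last two frames.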

 

 

Joint Training of the Networks

[Figure 5: joint training of the networks]

 

 

Regularized Objective Function

[Figure 6: regularized objective function]

The scalars λ1, λ2, and λ3 balance the contribution of the three regularization terms.
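The objective itself appears only in the figure above. As a hedged sketch, assume it has the form $L = -\sum_i y_i \log \hat{y}_i + \lambda_1 \sum_k (1 - \sum_t \alpha_{t,k}/T)^2 + \frac{\lambda_2}{T} \sum_t \beta_t^2 + \lambda_3 \lVert W \rVert_1$. This is our reading of the paper, and squaring the $\beta_t$ term in particular is an assumption. The function name `regularized_loss` and its argument layout are hypothetical.

```python
import math


def regularized_loss(y, y_hat, alpha, beta, weights, lam1, lam2, lam3):
    """Cross-entropy plus three regularizers (a sketch; the lam2 term's form is assumed).

    y, y_hat: one-hot label and predicted probabilities over the C classes
    alpha:    alpha[t][k], spatial attention of joint k at step t (T x K)
    beta:     beta[t], temporal attention at step t (length T)
    weights:  flat list of network weights for the L1 term
    """
    T, K = len(alpha), len(alpha[0])
    # Cross-entropy; terms with y_i = 0 are skipped so log(0) is never evaluated.
    ce = -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat) if yi > 0)
    # lam1 term: squared gap between 1 and each joint's attention averaged over time.
    spatial = sum((1.0 - sum(alpha[t][k] for t in range(T)) / T) ** 2 for k in range(K))
    # lam2 term (ASSUMED squared here): discourages large, unbounded ReLU outputs.
    temporal = sum(b ** 2 for b in beta) / T
    # lam3 term: L1 norm of the weights.
    l1 = sum(abs(w) for w in weights)
    return ce + lam1 * spatial + lam2 * temporal + lam3 * l1


y, y_hat = [1, 0], [0.5, 0.5]
alpha = [[0.5, 0.5], [0.5, 0.5]]  # T = 2 steps, K = 2 joints
beta = [1.0, 1.0]
weights = [0.5, -0.5]
print(regularized_loss(y, y_hat, alpha, beta, weights, 0.0, 0.0, 0.0))  # cross-entropy only: ln 2
print(regularized_loss(y, y_hat, alpha, beta, weights, 1.0, 1.0, 1.0))  # ln 2 + 0.5 + 1.0 + 1.0
```

Setting all three scalars to 0 recovers plain cross-entropy, which makes it easy to see how much each regularizer adds.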

 
