An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep RL

This is a paper on natural language object retrieval via reinforcement learning (RL). Paper: https://arxiv.org/abs/1703.07579. I could not find the authors' homepage, but the code has been released: https://github.com/jxwufan/NLOR_A3C.
What the paper does:
Input: a text query + an image     Output: the referred object (a bounding box)
[Figure 1: task illustration]

Method

The paper's schematic of natural language object retrieval via context-aware deep reinforcement learning is shown below.
[Figure 2: method overview]
The context-aware policy and value network framework is shown below.
[Figure 3: context-aware policy and value network]
The training pipeline is shown below.
[Figure 4: training pipeline]
The key point of this paper is that the bounding box is produced end-to-end via reinforcement learning, rather than extracted by a pre-trained object-proposal network (proposal-based methods rely heavily on training data for object proposals and are restricted to predefined object categories).
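To make this concrete, here is a minimal Python sketch of how a 9-way action space (matching the 9-dim action vectors used below) could reshape the box step by step; the exact action set and the step ratio `ALPHA` are assumptions for illustration, see the released NLOR_A3C code for the real definitions:

```python
# Hedged sketch of an action-conditioned bbox update.
# The concrete action set and step ratio ALPHA are illustrative assumptions.
ALPHA = 0.2  # fraction of current width/height moved or scaled per step

def apply_action(box, action):
    """box = (x1, y1, x2, y2); action in 0..8, where 8 = terminate."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = ALPHA * w, ALPHA * h
    if   action == 0: x1, x2 = x1 - dx, x2 - dx   # move left
    elif action == 1: x1, x2 = x1 + dx, x2 + dx   # move right
    elif action == 2: y1, y2 = y1 - dy, y2 - dy   # move up
    elif action == 3: y1, y2 = y1 + dy, y2 + dy   # move down
    elif action == 4: x1, x2 = x1 - dx, x2 + dx   # widen
    elif action == 5: x1, x2 = x1 + dx, x2 - dx   # narrow
    elif action == 6: y1, y2 = y1 - dy, y2 + dy   # grow taller
    elif action == 7: y1, y2 = y1 + dy, y2 - dy   # shrink shorter
    # action == 8: terminate, box unchanged
    return (x1, y1, x2, y2)
```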
Image features: concatenation of a global feature (ResNet-152 global average pooling) and a local feature (ResNet-152 RoI pooling + global average pooling), 2048 + 2048 = 4096-dim.
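A minimal PyTorch sketch of this 4096-dim image feature; the 7×7 RoI size, stride-32 spatial scale, and pooling details are assumptions for illustration, not the released implementation:

```python
import torch
import torchvision
from torchvision.ops import roi_align

# Sketch: 2048-d global + 2048-d local ResNet-152 features -> 4096-d image feature.
backbone = torchvision.models.resnet152(weights="IMAGENET1K_V1")
backbone = torch.nn.Sequential(*list(backbone.children())[:-2])  # 2048-channel conv map
backbone.eval()

@torch.no_grad()
def image_feature(img, box):
    """img: (1, 3, H, W) normalized tensor; box: (x1, y1, x2, y2) in image coords."""
    fmap = backbone(img)                                  # (1, 2048, H/32, W/32)
    global_feat = fmap.mean(dim=(2, 3))                   # global average pooling -> (1, 2048)
    rois = torch.tensor([[0, *box]], dtype=torch.float32) # (batch_idx, x1, y1, x2, y2)
    local = roi_align(fmap, rois, output_size=(7, 7), spatial_scale=1 / 32)
    local_feat = local.mean(dim=(2, 3))                   # RoI pooling + GAP -> (1, 2048)
    return torch.cat([global_feat, local_feat], dim=1)    # (1, 4096)
```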
Sentence features: skip-thought vectors [http://papers.nips.cc/paper/5950-skip-thought-vectors] trained on the BookCorpus dataset, 4096-dim.
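For reference, encoding a query with the original skip-thoughts release (https://github.com/ryankiros/skip-thoughts, Python 2 / Theano) looks like the sketch below; note that the stock combine-skip encoder outputs 4800-dim vectors, so the 4096-dim figure quoted above presumably involves a projection or a differently sized encoder.

```python
# Sketch using the reference skip-thoughts implementation.
import skipthoughts

model = skipthoughts.load_model()
encoder = skipthoughts.Encoder(model)
vectors = encoder.encode(["the man in the red shirt on the left"])
# vectors: (1, 4800) for the combine-skip model; mapping to the 4096-dim
# sentence feature above is assumed to need an extra projection.
```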
The image feature and the sentence feature are then fused with an L2 normalization and a dot-product (element-wise) operation, which keeps the result at 4096-dim. This fused feature is concatenated with the previous 50 action vectors (50 × 9 = 450-dim) and a bbox vector (5-dim), giving a 4096 + 450 + 5 = 4551-dim vector. Two FC layers reduce this to a 1024-dim feature, which feeds an LSTM with Layer Normalization (so that subsequent decision making can exploit temporal context); the network finally outputs a policy (which decides the action to take) and a value (which estimates the expected reward).
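A hedged PyTorch sketch of this policy/value head using the sizes quoted above; a plain `nn.LSTM` stands in for the paper's layer-normalized LSTM, and all module names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Sketch of the context-aware policy/value network described above.
    Sizes follow the blog's numbers: fused 4096 + action history 450 + bbox 5 = 4551."""
    def __init__(self, n_actions=9):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(4551, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        # The paper uses an LSTM with Layer Normalization; plain LSTM here for brevity.
        self.lstm = nn.LSTM(1024, 1024, batch_first=True)
        self.policy = nn.Linear(1024, n_actions)  # action distribution
        self.value = nn.Linear(1024, 1)           # scalar value estimate

    def forward(self, img_feat, txt_feat, action_hist, bbox, hidden=None):
        # L2-normalize, then element-wise product keeps the 4096-dim size.
        fused = F.normalize(img_feat, dim=-1) * F.normalize(txt_feat, dim=-1)
        x = torch.cat([fused, action_hist, bbox], dim=-1)  # (B, 4551)
        x = self.fc(x).unsqueeze(1)                         # (B, 1, 1024)
        x, hidden = self.lstm(x, hidden)
        x = x.squeeze(1)
        return F.softmax(self.policy(x), dim=-1), self.value(x), hidden
```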
Training: multiple agents, each paired with its own environment, collect data in parallel, and the policy is updated asynchronously with the asynchronous advantage actor-critic (A3C) method [https://arxiv.org/abs/1602.01783].
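For intuition, here is a simplified single-worker A3C update in the style of Mnih et al., assuming a hypothetical `env` whose `reset`/`step` return the feature tuple consumed by the sketch network above; the actual released training loop will differ:

```python
import torch

def a3c_worker(global_net, local_net, env, optimizer, gamma=0.99, beta=0.01, t_max=20):
    """One asynchronous worker step: collect a rollout, compute n-step advantages,
    and push gradients to the shared model. `env` is a hypothetical interface."""
    local_net.load_state_dict(global_net.state_dict())  # sync with shared weights
    state, hidden, done = env.reset(), None, False
    log_probs, values, rewards, entropies = [], [], [], []
    for _ in range(t_max):
        probs, value, hidden = local_net(*state, hidden=hidden)
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        state, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        entropies.append(dist.entropy())
        values.append(value)
        rewards.append(reward)
        if done:
            break
    # Bootstrap the return from the value estimate if the episode did not end.
    R = torch.zeros(1) if done else local_net(*state, hidden=hidden)[1].detach()
    policy_loss, value_loss = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        advantage = R - values[t]
        value_loss = value_loss + advantage.pow(2)
        policy_loss = policy_loss - log_probs[t] * advantage.detach() - beta * entropies[t]
    local_net.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    # Hand local gradients to the shared parameters, then apply the update
    # (optimizer is built over global_net.parameters()).
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp._grad = lp.grad
    optimizer.step()
```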
