Week 62 Study Notes

Paper Reading Overview

  • Hierarchy Parsing for Image Captioning: This paper introduces a hierarchy encoder for image captioning that combines object, sub-object, and semantic segmentation information into a tree structure, then applies a Tree-LSTM to obtain features with richer semantic and hierarchical information.
  • Reflective Decoding Network for Image Captioning: This paper introduces an image captioning decoder that attends to all previous hidden states, strengthening the LSTM's ability to model long-term dependencies.
  • Attention on Attention for Image Captioning: This paper introduces the Attention on Attention (AoA) mechanism, which adaptively decides whether to use the attention information via a gate variable interpreted as the similarity between the query and the attention vector, achieving an impressive 129.8 CIDEr without any external labels.
  • Human Attention in Image Captioning: Dataset and Analysis: This paper studies the difference between human and machine attention and finds that consistency between the two does not by itself lead to higher performance, whereas explicitly using human attention to guide the machine does improve it.
  • Unpaired Image Captioning via Scene Graph Alignments: This paper introduces an unsupervised training strategy for image captioning whose key idea is to align the scene graph features of sentences and images with a CycleGAN, achieving a state-of-the-art CIDEr of 69.5.
  • What do different evaluation metrics tell us about saliency models?: This paper analyzes saliency detection metrics and concludes that NSS and CC are the fairest and most correlated metrics, and therefore the recommended ones.
  • Aesthetic Image Captioning From Weakly-Labelled Photographs: This paper introduces a new aesthetic image captioning dataset and an impressive weakly supervised strategy that trains the captioning encoder with labels extracted from the captions.
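
The gating idea in the AoA paper above can be sketched roughly as follows: standard attention produces an attended vector, then a sigmoid gate, conditioned on both the query and the attended vector, decides how much of the derived information to keep. The weight matrices `Wq_i`, `Wv_i`, `Wq_g`, `Wv_g` are hypothetical stand-ins for the paper's learned parameters, so this is only an illustrative numpy sketch, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_on_attention(q, keys, values, Wq_i, Wv_i, Wq_g, Wv_g):
    """AoA sketch: run standard attention, then gate the attended information.
    All weight matrices are hypothetical stand-ins for learned parameters."""
    scores = keys @ q / np.sqrt(q.size)      # scaled dot-product scores
    alpha = softmax(scores)                  # attention distribution
    v_hat = alpha @ values                   # attended vector
    i = Wq_i @ q + Wv_i @ v_hat              # candidate "information" vector
    g = 1.0 / (1.0 + np.exp(-(Wq_g @ q + Wv_g @ v_hat)))  # sigmoid gate
    return g * i                             # gated output
```

Since each gate value lies in (0, 1), the output can shrink toward zero when query and attended vector disagree, which is how the mechanism can "choose not to use" the attention result.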

Bottom-Up Attention Visualization

Results

Red, blue, and green mark the top-3 attended regions; each image title shows the word generated at the current time step and the cumulative weight of these three regions.
[Figure 1]
[Figure 2]
[Figure 3]
[Figure 4]
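
The top-3 region selection and the cumulative weight shown in the figure titles can be sketched as follows; `alpha` is assumed to be the softmax-normalized attention weights over the ROI regions at one decoding step (a minimal numpy sketch, not the actual visualization code):

```python
import numpy as np

def top3_regions(alpha):
    """Return the indices of the 3 most-attended ROI regions and their
    cumulative weight, as displayed in the figure titles above."""
    order = np.argsort(alpha)[::-1][:3]      # region ids sorted by weight, desc
    return order, float(alpha[order].sum())  # top-3 ids, cumulative weight
```

For example, with five region weights `[0.1, 0.4, 0.05, 0.3, 0.15]` the top-3 regions are 1, 3, and 4 with a cumulative weight of 0.85; the low cumulative probabilities noted below mean this sum often stays small in practice.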

Observations

  • Repeated attention regions, as in Figure 2
  • Attended regions that make no sense (especially for non-visual words), in all images
  • Low cumulative probability, in all images
  • A word that is correct but absent from the ground truth ("surfboard"), with correct attention, as in Figure 3
  • An incorrect caption despite seemingly correct attention ("elephant"), as in Figure 4

Weekly Summary

Last Week's Tasks

  • Finished ROI attention visualization √
  • Read >5 papers √

Next Week's Goals

  • Finish the inverted-weight attention model
  • Following the CVPR 2019 paper where object detection leverages image captioning, build a saliency-map version
  • Read >5 papers, focusing on attention and cross-level vision tasks

Appendix (Diary)

October 8

  • Read two papers, one on unsupervised captioning and one on attention mechanisms
  • Finished the bottom-up attention visualization

October 9 TODO

  • Refactor the code
  • Produce richer visualization results based on the earlier observations

October 9 Summary

  • Finished the refactor, very happy; the old code was terrible
  • Visualized the topdown model by visual words
  • Read one paper

October 10 TODO

  • Finish the slides
  • Paper reading

October 10 Summary

  • Finished the slides for the MICCAI challenge
  • Held the group meeting and reported my work; next steps are inverted attention and image-captioning-guided saliency prediction following that CVPR 2019 paper
  • Discussed papers with the group in the afternoon; the notable points were:
    • LSTM dropout that masks out some of the inputs
    • AoA is essentially adaptive attention
    • Whether an extra feature-mapping layer can be added before the language model so that the image features adapt better to the captioning task
  • Plan to build a simple research-assistant web page on Sunday
  • Read one paper
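
On the first discussion point, one common reading of "dropout that masks LSTM inputs" is inverted dropout applied to the input vector: randomly zero some units and rescale the survivors so the expected input is unchanged. A minimal numpy sketch, where the helper name and signature are my own, purely illustrative:

```python
import numpy as np

def input_dropout(x, p, rng, train=True):
    """Inverted dropout on an LSTM input vector: mask each unit with
    probability p and rescale survivors by 1/(1-p) so the expected value
    is unchanged. At test time the input passes through untouched."""
    if not train or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p).astype(x.dtype)  # 1 = keep, 0 = drop
    return x * mask / (1.0 - p)
```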

October 11 TODO

  • Apply inverted attention directly on topdown and compare with previous training results (spatial, bu)
  • Review the CVPR 2019 papers and find the one that uses image captioning features
  • [stack] research-assistant web page

October 11 Summary

Caught up on homework; got nothing else done

October 12 TODO

  • Keep catching up on homework
  • Finish the tasks from the 11th
  • [stack] research-assistant web page

October 12 Summary

  • Too much homework; spent the whole day catching up
