CVPR 2020 Text Image Detection and Recognition: Papers and Code


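CVPR 2020 accepted 1,470 papers. The main areas covered are image and video processing; image classification, detection and segmentation; visual object tracking; video content analysis; human pose estimation; model acceleration; neural architecture search (NAS); generative adversarial networks (GAN); optical character recognition (OCR); face recognition; and 3D reconstruction.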

# Image Processing

1. Deep Image Harmonization via Domain Verification

Paper: Deep Image Harmonization via Domain Verification

Code: bcmi/Image_Harmonization_Datasets

2. Learning to Shade Hand-drawn Sketches

Paper: Learning to Shade Hand-drawn Sketches

3. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

Paper: Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

4. Single Image Reflection Removal through Cascaded Refinement

Paper: arxiv.org/abs/1911.0663

5. RoutedFusion: Learning Real-time Depth Map Fusion

Paper: arxiv.org/pdf/2001.0438

# Image Classification

1. Towards Robust Image Classification Using Sequential Attention Models

Paper: Towards Robust Image Classification Using Sequential Attention Models

2. Self-training with Noisy Student improves ImageNet classification

Paper: Self-training with Noisy Student improves ImageNet classification

3. Image Matching across Wide Baselines: From Paper to Practice

Paper: Image Matching across Wide Baselines: From Paper to Practice

4. Improved Few-Shot Visual Classification

Paper: arxiv.org/pdf/1912.0343

5. A General and Adaptive Robust Loss Function

Paper: A General and Adaptive Robust Loss Function

6. Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks

Paper: Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks

# Object Detection and Segmentation

1. Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Paper: Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

2. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Paper: arxiv.org/abs/1912.0242

Code: sfzhang15/ATSS

3. Semi-Supervised Semantic Image Segmentation with Self-correcting Networks

Paper: Semi-Supervised Semantic Image Segmentation with Self-correcting Networks

4. Deep Snake for Real-Time Instance Segmentation

Paper: Deep Snake for Real-Time Instance Segmentation

5. SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks

Paper: SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks

6. xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Paper: xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

7. CenterMask : Real-Time Anchor-Free Instance Segmentation

Paper: CenterMask : Real-Time Anchor-Free Instance Segmentation

Code: youngwanLEE/CenterMask

8. PolarMask: Single Shot Instance Segmentation with Polar Representation

Paper: PolarMask: Single Shot Instance Segmentation with Polar Representation

Code: xieenze/PolarMask

9. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Paper: BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

# Visual Object Tracking

1. ROAM: Recurrently Optimizing Tracking Model

Paper: ROAM: Recurrently Optimizing Tracking Model

# Video Content Analysis (Understanding)

1. Hierarchical Conditional Relation Networks for Video Question Answering

Paper: Hierarchical Conditional Relation Networks for Video Question Answering

2. Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Paper: Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Code: bbrattoli/ZeroShotVideoClassification

3. Action Modifiers: Learning from Adverbs in Instructional Videos

Paper: Action Modifiers: Learning from Adverbs in Instructional Videos

4. Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

Paper: Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

5. Blurry Video Frame Interpolation

Paper: Blurry Video Frame Interpolation

6. Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Paper: Object Relational Graph with Teacher-Recommended Learning for Video Captioning

7. Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Paper: Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

8. Learning Representations by Predicting Bags of Visual Words

Paper: Learning Representations by Predicting Bags of Visual Words

9. Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Paper: Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

# Human Keypoint Detection and Pose Estimation

1. Distribution-Aware Coordinate Representation for Human Pose Estimation

Paper: Distribution-Aware Coordinate Representation for Human Pose Estimation

Code: ilovepose/DarkPose

2. VIBE: Video Inference for Human Body Pose and Shape Estimation

Paper: VIBE: Video Inference for Human Body Pose and Shape Estimation

Code: mkocabas/VIBE

3. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Paper: The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

4. Optimal least-squares solution to the hand-eye calibration problem

Paper: Optimal least-squares solution to the hand-eye calibration problem

5. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

Paper: D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

6. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Paper: Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

7. PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Paper: arxiv.org/abs/1911.0423

8. 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

Paper: 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

# Lightweight Models and Acceleration

1. GPU-Accelerated Mobile Multi-view Style Transfer

Paper: GPU-Accelerated Mobile Multi-view Style Transfer

# Neural Network Architecture Design and Search (NAS)

1. GhostNet: More Features from Cheap Operations

Paper: GhostNet: More Features from Cheap Operations

Code: huawei-noah/ghostnet

2. CARS: Continuous Evolution for Efficient Neural Architecture Search

Paper: arxiv.org/pdf/1909.0497

Code: huawei-noah/CARS

3. Visual Commonsense R-CNN

Paper: arxiv.org/abs/2002.1220

4. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

Paper: Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

5. AdderNet: Do We Really Need Multiplications in Deep Learning?

Paper: arxiv.org/pdf/1912.1320

6. Filter Grafting for Deep Neural Networks

Paper: arxiv.org/pdf/2001.0586

# Generative Adversarial Networks (GAN)

1. Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Paper: Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Code: giannisdaras/ylg

2. MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis

Paper: MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis

3. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

Paper: Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

# 3D Point Clouds & 3D Reconstruction

1. PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

Paper: PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

Code: liruihui/PointAugment

2. PF-Net: Point Fractal Network for 3D Point Cloud Completion

Paper: PF-Net: Point Fractal Network for 3D Point Cloud Completion

3. Learning multiview 3D point cloud registration

Paper: Learning multiview 3D point cloud registration

4. Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Paper: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

5. In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction from 2D Landmarks

Paper: arxiv.org/pdf/1911.1192

6. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Paper: RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

7. C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

Paper: C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

8. Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs

Paper: Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs

9. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

Paper: Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

# Optical Character Recognition (OCR)

1. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Paper: ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Code: github.com/Yuliang-Liu/

# Transfer Learning

1. Meta-Transfer Learning for Zero-Shot Super-Resolution

Paper: Meta-Transfer Learning for Zero-Shot Super-Resolution

2. Transferring Dense Pose to Proximal Animal Classes

Paper: Transferring Dense Pose to Proximal Animal Classes

# Weakly Supervised & Unsupervised Learning

1. Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

Paper: Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

2. Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction

Paper: Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction

3. Rethinking the Route Towards Weakly Supervised Object Localization

Paper: Rethinking the Route Towards Weakly Supervised Object Localization

4. NestedVAE: Isolating Common Factors via Weak Supervision

Paper: NestedVAE: Isolating Common Factors via Weak Supervision

# Face Recognition

1. Towards Universal Representation Learning for Deep Face Recognition

Paper: Towards Universal Representation Learning for Deep Face Recognition

2. Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Paper: Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Code: kaiwang960112/Self-Cure-Network

3. Face X-ray for More General Face Forgery Detection

Paper: arxiv.org/pdf/1912.1345

# Graph Neural Networks (GNN)

1. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

Paper: Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

2. Bundle Adjustment on a Graph Processor

Paper: Bundle Adjustment on a Graph Processor

Code: joeaortiz/gbp

# Joint Vision & Language Tasks

1. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Paper: Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Code: weituo12321/PREVALENT

2. 12-in-1: Multi-Task Vision and Language Representation Learning

Paper: 12-in-1: Multi-Task Vision and Language Representation Learning

3. Hierarchical Conditional Relation Networks for Video Question Answering

Paper: Hierarchical Conditional Relation Networks for Video Question Answering

# Other Topics

1. What it Thinks is Important is Important: Robustness Transfers through Input Gradients

Paper: arxiv.org/abs/1912.0569

2. Holistically-Attracted Wireframe Parsing

Paper: Holistically-Attracted Wireframe Parsing

3. Attentive Context Normalization for Robust Permutation-Equivariant Learning

Paper: Attentive Context Normalization for Robust Permutation-Equivariant Learning

5. ClusterFit: Improving Generalization of Visual Representations

Paper: ClusterFit: Improving Generalization of Visual Representations

6. Learning in the Frequency Domain

Paper: Learning in the Frequency Domain

7. A Characteristic Function Approach to Deep Implicit Generative Modeling

Paper: A Characteristic Function Approach to Deep Implicit Generative Modeling

8. Auto-Encoding Twin-Bottleneck Hashing

Paper: Auto-Encoding Twin-Bottleneck Hashing

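The text-image papers at CVPR 2020 fall into two broad directions, handwritten text and scene text, and total 16. They are grouped by topic below; most of them study recognition.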

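The topics are:

1) Scene Text Detection: locating text in scene images such as street views; both of the 2 papers address irregular, arbitrarily shaped text;

2) Scene Text Recognition: recognizing the text in the regions produced by scene text detection, 4 papers;

3) Handwritten Text Recognition, 2 papers;

4) Scene Text Spotting (end-to-end scene text recognition), 1 paper: the real-time ABCNet algorithm proposed by researchers at South China University of Technology and the University of Adelaide, an appealing method that is already open source;

5) Handwritten Text Generation, to add training samples of handwritten text (it could probably also be used to "do homework"), 1 paper;

6) Scene Text Synthesis, to add training samples of scene text, 1 paper from Megvii: UnrealText uses a rendering engine to generate realistic scene text;

7) Data augmentation for text images, used when training handwritten and scene text recognition algorithms, 1 paper;

8) Scene Text Editing: replacing the text in scene text images;

9) Shredded document reconstruction, for forensic investigations in which a document has been destroyed into fragments, 1 paper;

10) Text style transfer, 1 paper;

11) Adversarial attacks on scene text recognition, 1 paper;

12) Signature verification, 1 paper.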

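Notably, 10 of the 16 papers are already open source or about to be. Thanks to these developers!

Code links are included for the papers that are open source or will be soon.

The papers can be downloaded by title from:

http://openaccess.thecvf.com/CVPR2020.py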

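Scene Text Detection

A deep relational reasoning graph network for arbitrary-shape text detection

[1]. Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

Authors | Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin

Affiliations | University of Science and Technology Beijing; Joint Laboratory of Artificial Intelligence, University of Science and Technology of China; Tencent Technology (Shenzhen)

Code | https://github.com/GXYM/DRRG

Explainer | https://blog.csdn.net/SpicyCoder/article/details/105072570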

[2]. ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Authors | Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Affiliations | University of Science and Technology of China

Code | https://github.com/wangyuxin87/ContourNet

Explainer | https://zhuanlan.zhihu.com/p/135399747

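Scene Text Recognition

On vocabulary reliance in scene text recognition

[3]. On Vocabulary Reliance in Scene Text Recognition

Authors | Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Affiliations | Megvii; China University of Mining and Technology; University of Rochester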

[4]. SCATTER: Selective Context Attentional Scene Text Recognizer

Authors | Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

Affiliations | Amazon Web Services

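A semantic reasoning network for accurate scene text recognition

[5]. Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Authors | Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, Errui Ding

Affiliations | University of Chinese Academy of Sciences; Baidu; Chinese Academy of Sciences

Code | https://github.com/chenjun2hao/SRN.pytorch

A semantics-enhanced encoder-decoder framework for recognizing scene text in low-quality images (blur, uneven illumination, incomplete characters, and so on)

[6]. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Authors | Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang

Affiliations | Chinese Academy of Sciences; University of Chinese Academy of Sciences

Code | https://github.com/Pay20Y/SEED (coming soon)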

Handwritten Text Recognition

[7]. OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Authors | Mohamed Yousef, Tom E. Bishop

Affiliations | Intuition Machines, Inc

Code | https://github.com/IntuitionMachines/OrigamiNet

Scene Text Spotting

Real-time end-to-end scene text recognition

[8]. ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Authors | Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Affiliations | South China University of Technology; University of Adelaide

Code | https://github.com/Yuliang-Liu/bezier_curve_text_spotting

Note | CVPR 2020 Oral

Explainer | https://zhuanlan.zhihu.com/p/146276834

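Handwritten Text Generation

Semi-supervised generation of varying-length handwritten text, used to enlarge text datasets and improve the accuracy of recognition algorithms

[9]. ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

Authors | Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman

Affiliations | Amazon Rekognition, Israel; Cornell University

Code | https://github.com/amzn/convolutional-handwriting-gan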

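Scene Text Synthesis

Synthesizing scene text with a rendering engine to add training samples and improve the accuracy of recognition algorithms

[10]. UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

Authors | Shangbang Long, Cong Yao

Affiliations | Carnegie Mellon University; Megvii

Code | https://jyouhou.github.io/UnrealText/

Explainer | https://zhuanlan.zhihu.com/p/137406773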

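Data Augmentation + Text Recognition

Image augmentation for handwritten and scene text recognition

[11]. Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Authors | Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Affiliations | South China University of Technology; Alibaba

Code | https://github.com/Canjie-Luo/Text-Image-Augmentation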

Scene Text Editing

[12]. STEFANN: Scene Text Editor Using Font Adaptive Neural Network

Authors | Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

Affiliations | Indian Statistical Institute; Indian Institute of Technology

Code | https://github.com/prasunroy/stefann

Website | https://prasunroy.github.io/stefann/

Shredded Document Reconstruction

Reconstructing documents from shredded paper, for forensic and criminal investigation

[13]. Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning

Authors | Thiago M. Paixao, Rodrigo F. Berriel, Maria C. S. Boeres, Alessandro L. Koerich, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos

Affiliations | IFES, Brazil; UFES, Brazil; ETS, Canada

Text Style Transfer

[14]. SwapText: Image Based Texts Transfer in Scenes

Authors | Qiangpeng Yang, Jun Huang, Wei Lin

Affiliations | Alibaba

Scene Text Recognition + Adversarial Attacks

[15]. What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

Authors | Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen

Affiliations | University of Electronic Science and Technology of China

Signature Verification

[16]. Sequential Motif Profiles and Topological Plots for Offline Signature Verification

Authors | Elias N. Zois, Evangelos Zervas, Dimitrios Tsourounis, George Economou

Affiliations | University of West Attica; University of Patras
