【Image Processing - OCR】A Summary of Papers on Scene Text Recognition

[1] Scene Text Localization

1. Traditional scene text detection methods

(1) Sliding-window and connected-components (CCs) methods
Related papers:

  • L. Neumann and J. Matas. Scene text localization and recognition with oriented stroke detection. In Proc. of ICCV, 2013.
  • Y. Pan, X. Hou, and C. Liu. A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. on Image Processing, 20(3):800–813, 2011.
  • X. C. Yin, X. Yin, K. Huang, and H. Hao. Robust text detection in natural scene images. IEEE Trans. on PAMI, 36(5):970–983, 2014.
  • W. Huang, Z. Lin, J. Yang, and J. Wang. Text localization in natural images using stroke feature transform and text covariance descriptors. ICCV 2013.
  • TextFlow
    S. Tian, Y. Pan, C. Huang, S. Lu, K. Yu, and C. L. Tan. Text flow: A unified text detection system in natural scene images. CVPR 2015.
  • L. Sun, Q. Huo, and W. Jia. A robust approach for text detection from natural scene images. Pattern Recognition, 48(9):2906–2920, 2015.
  • X. C. Yin, W. Y. Pei, J. Zhang, and H. W. Hao. Multi-orientation scene text detection with adaptive clustering. IEEE Trans. on PAMI, 37(9), 2015.
  • L. Kang, Y. Li, and D. Doermann. Orientation robust text line detection in natural images. CVPR 2014, 4034–4041.
  • C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu. Detecting texts of arbitrary orientations in natural images. CVPR 2012, 1083–1090.
  • H. Cho, M. Sung, and B. Jun. Canny Text Detector: Fast and Robust Scene Text Localization Algorithm. CVPR 2016, 3566-3573.

(2) Other traditional methods include MSER (Maximally Stable Extremal Regions); a minimal detection sketch follows the references below.

  • D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In Proc. of ICDAR, 2015.
  • D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras. ICDAR 2013 robust reading competition. In Proc. of ICDAR, 2013.
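
As a rough illustration of how MSER-based detection works in practice, the sketch below extracts extremal regions with OpenCV and keeps character-like candidates with crude geometric filtering. The thresholds and file names are illustrative assumptions, not values from the papers above.

```python
# A minimal MSER text-candidate sketch using OpenCV (opencv-python).
import cv2

img = cv2.imread("scene.jpg")                  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()                       # default MSER parameters
regions, boxes = mser.detectRegions(gray)      # extremal regions + bounding boxes

for (x, y, w, h) in boxes:
    # Crude geometric filtering: discard regions unlikely to be characters.
    aspect = w / float(h)
    if 0.1 < aspect < 10 and h > 8:            # thresholds are assumptions
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

cv2.imwrite("mser_candidates.jpg", img)
```

Real pipelines follow this with stroke-width or classifier-based filtering and candidate grouping into text lines; the geometric test above only stands in for that stage.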

(3) Deep-learning-based methods for scene text detection
Related papers:

  • DeepText
    Z. Zhong, L. Jin, S. Zhang, and Z. Feng. DeepText: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016.
    A unified framework that couples text-specific region proposals with a text detection network.
  • TextBoxes
    M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI 2017.
    TextBoxes is a fast, end-to-end scene text detector built on a single deep neural network.
  • CTPN
    Z. Tian, W. Huang, T. He, P. He, and Y. Qiao. Detecting text in natural image with connectionist text proposal network. ECCV 2016.
    CTPN (Connectionist Text Proposal Network) detects fixed-width vertical proposals, captures their sequential context with a BLSTM (bidirectional LSTM), and then connects the vertical proposals into the final text boxes.
  • A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. CVPR 2016.
    FCRN (Fully-Convolutional Regression Network) trains a scene text detection model on synthetic images.
  • RRPN
    J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. arXiv preprint arXiv:1703.01086, 2017.
    RRPN (Rotation Region Proposal Network) detects scene text of arbitrary orientations by generating rotated region proposals.
  • MCLAB_FCN
    Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai. Multi-oriented text detection with fully convolutional networks. CVPR 2016.
    The method works in three steps: detect text blocks with a text-block FCN (Fully Convolutional Network), generate multi-oriented text-line candidates with MSER inside those blocks, and classify the candidates. It handles horizontal text well but is less suited to highly inclined text.
  • SegLink
    B. Shi, X. Bai, and S. Belongie. Detecting oriented text in natural images by linking segments. arXiv:1703.06520.
    The paper detects oriented text by detecting text segments and the links between them, and works well on text lines of arbitrary length.
  • EAST
    X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang. EAST: An efficient and accurate scene text detector. arXiv:1704.03155.
    EAST is a fast and accurate text detection pipeline for natural scenes.
  • DMPNet
    Y. Liu and L. Jin. Deep matching prior network: toward tighter multi-oriented text detection. arXiv:1703.01425.
    DMPNet detects text with tighter, quadrilateral bounding regions.
  • Deep Direct Regression
    W. He, X. Y. Zhang, F. Yin, and C. L. Liu. Deep direct regression for multi-oriented scene text detection. arXiv:1703.08289.
    Deep direct regression is proposed for multi-oriented scene text detection.
  • R2CNN
    R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
    The method builds on the Faster R-CNN framework to detect scene text of arbitrary orientations. An RPN generates axis-aligned boxes enclosing texts of different orientations; each boxed text is then pooled at several pooled sizes (7×7, 11×3, 3×11), and the concatenated features are used to predict the text score, the axis-aligned box, and the inclined minimum-area box (see the pooling sketch after this list). The datasets used are ICDAR2013 and ICDAR2015. Related concepts: RPN (Region Proposal Network), NMS, ROI Pooling, and anchors.
    github: R2CNN: Rotational Region CNN Based on FPN (Tensorflow)
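
As referenced under R2CNN above, here is a minimal sketch of its multi-size RoI pooling, assuming PyTorch/torchvision: each axis-aligned proposal is pooled at 7×7, 11×3, and 3×11, and the flattened features are concatenated before the prediction heads. The feature-map shape and the toy proposal are illustrative assumptions; only the three pooled sizes come from the paper summary.

```python
# R2CNN-style multi-size RoI pooling, sketched with torchvision ops.
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 256, 50, 50)            # backbone feature map (1/16 scale)
# One proposal per row: (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0, 32.0, 48.0, 160.0, 96.0]])

pooled = [
    roi_pool(feat, rois, output_size=size, spatial_scale=1.0 / 16)
    for size in [(7, 7), (11, 3), (3, 11)]
]
fused = torch.cat([p.flatten(start_dim=1) for p in pooled], dim=1)

# `fused` would feed fully connected heads predicting the text score,
# the axis-aligned box, and the inclined minimum-area box.
print(fused.shape)                            # (1, 256*(49 + 33 + 33)) = (1, 29440)
```

The elongated 11×3 and 3×11 windows keep more horizontal or vertical detail than a square 7×7 pool, which is what makes the fused feature sensitive to long, thin text regions.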

(Supplement) Deep-learning-based methods for generic object detection

(1) Region-proposal-based object detection methods:

  • R-CNN
    R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
  • SPPNet
    K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV 2014, 346–361.
  • Fast R-CNN
    R. Girshick. Fast R-CNN. ICCV 2015, 1440–1448.
    github:https://github.com/rbgirshick/fast-rcnn
    Chinese translation of the paper: https://zhuanlan.zhihu.com/p/27582096
  • Faster R-CNN
    S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
    The paper generates convolutional feature maps with a CNN and uses an RPN (Region Proposal Network) to produce high-quality object proposals, which Fast R-CNN then localizes and classifies. In other words, Faster R-CNN consists of two parts: the RPN and Fast R-CNN. The commonly used datasets are VOC and COCO.
    From an implementation perspective, Faster R-CNN has four main parts:
    (1) Dataset: supplies data in the required format (commonly VOC and COCO);
    (2) Extractor: extracts image features with a CNN (the original paper used ZFNet and VGG16; later work commonly uses ResNet-101);
    (3) RPN (Region Proposal Network): supplies candidate regions (RoIs), roughly 2,000 proposals per image selected from about 20,000 anchors (see the anchor-generation sketch after this list);
    (4) RoIHead: classifies and fine-tunes the RoIs produced by the RPN, deciding whether each contains an object and refining the box coordinates.
    github: simple-faster-rcnn-pytorch
  • R-FCN
    J. Dai, Y. Li, K. He, and J. Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. NIPS 2016.
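
As referenced in the Faster R-CNN item above, the sketch below generates RPN-style anchors: 3 scales × 3 aspect ratios at every feature-map cell, which for a typical ~600×800 input at stride 16 yields roughly the 20,000 anchors mentioned. The exact scales and ratios follow common public implementations and are assumptions, not a normative spec.

```python
# RPN-style anchor generation in NumPy.
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2)."""
    # Anchor widths/heights for each (scale, ratio) pair, keeping area = scale^2.
    ws, hs = [], []
    for s in scales:
        for r in ratios:                       # r is the height/width ratio
            ws.append(s * np.sqrt(1.0 / r))
            hs.append(s * np.sqrt(r))
    ws, hs = np.array(ws), np.array(hs)

    # Centers of every feature-map cell, mapped back to input-image coordinates.
    cx, cy = np.meshgrid((np.arange(feat_w) + 0.5) * stride,
                         (np.arange(feat_h) + 0.5) * stride)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)     # (H*W, 2)

    # Broadcast: every center gets all 9 anchor shapes.
    x1 = centers[:, None, 0] - ws / 2
    y1 = centers[:, None, 1] - hs / 2
    x2 = centers[:, None, 0] + ws / 2
    y2 = centers[:, None, 1] + hs / 2
    return np.stack([x1, y1, x2, y2], axis=2).reshape(-1, 4)

anchors = generate_anchors(38, 50)     # e.g. a 600x800 image at stride 16
print(anchors.shape)                   # (17100, 4) -- roughly the "20000 anchors"
```

The RPN scores each anchor as object/background and regresses box offsets; the top proposals after NMS become the ~2,000 RoIs handed to the RoIHead.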

(2) Region-proposal-free object detection methods:

  • SSD
    W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
  • YOLO
    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CVPR 2016.
    Global regression over the whole image; no anchors are used.
    Zhihu: YOLO principles and implementation
  • YOLOv2
    J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. CVPR 2017.
    Anchors are used.
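
All of the detectors above, region-based and single-shot alike, prune overlapping predictions with non-maximum suppression (NMS), one of the concepts noted under R2CNN. A minimal single-class sketch, assuming boxes given as (x1, y1, x2, y2) NumPy arrays:

```python
# Greedy single-class non-maximum suppression.
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # descending by confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current best box against the remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes overlapping too much
    return keep

boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))                     # [0, 2] -- the second box is suppressed
```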

[2] Scene Text Character Recognition

  • CRNN
    B. Shi, X. Bai, and C. Yao. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Trans. on PAMI, 2017.
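
The paper above (commonly known as CRNN) combines a convolutional feature extractor, a bidirectional LSTM over the horizontal axis, and a CTC loss for alignment-free sequence training. A minimal sketch, assuming PyTorch; the layer sizes and the toy labels are illustrative assumptions, not the paper's exact architecture.

```python
# A tiny CRNN-style model: CNN -> BLSTM -> per-column class scores -> CTC.
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    def __init__(self, num_classes):                 # num_classes includes CTC blank
        super().__init__()
        self.cnn = nn.Sequential(                    # (N, 1, 32, W) -> (N, 64, 8, W/4)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        f = self.cnn(x)                              # (N, C, H, W')
        f = f.permute(0, 3, 1, 2).flatten(2)         # (N, W', C*H): one step per column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(2)           # (N, W', num_classes)

model = TinyCRNN(num_classes=37)                     # e.g. 26 letters + 10 digits + blank
x = torch.randn(2, 1, 32, 100)                       # two grayscale 32x100 crops
log_probs = model(x).permute(1, 0, 2)                # CTCLoss expects (T, N, C)
targets = torch.randint(1, 37, (2, 5))               # dummy 5-char labels (0 = blank)
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 25, dtype=torch.long),
                           target_lengths=torch.full((2,), 5, dtype=torch.long))
print(loss.item())
```

At inference, greedy or beam-search CTC decoding collapses the per-column predictions (removing repeats and blanks) into the output string.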

[3] Image-Processing Datasets

  • ICDAR2019

[4] Image Object Classification

  • CNN+SVM: use a CNN as a feature extractor and train an SVM classifier on the extracted features (a minimal sketch follows).
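
A minimal sketch of the CNN+SVM idea, assuming PyTorch/torchvision and scikit-learn: freeze a pretrained CNN as a feature extractor and fit a linear SVM on the pooled features. The backbone choice and the random stand-in data are illustrative assumptions.

```python
# Frozen CNN features + linear SVM classifier.
import torch
import torchvision.models as models
from sklearn.svm import LinearSVC

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # drop the classification head
backbone.eval()

@torch.no_grad()
def extract(images):                         # images: (N, 3, 224, 224)
    return backbone(images).numpy()          # (N, 512) pooled feature vectors

# Dummy tensors standing in for a real labeled, normalized dataset.
train_x = extract(torch.randn(20, 3, 224, 224))
train_y = [0] * 10 + [1] * 10

clf = LinearSVC().fit(train_x, train_y)      # SVM on frozen CNN features
print(clf.predict(extract(torch.randn(2, 3, 224, 224))))
```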
