
[1] Scene text localization


(1) Sliding-window and connected-component (CC) methods

  • L. Neumann and J. Matas. Scene text localization and recognition with oriented stroke detection. In Proc. of ICCV, 2013.
  • Y. Pan, X. Hou, and C. Liu. A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. on Image Processing, 20(3):800–813, 2011.
  • X. C. Yin, X. Yin, K. Huang, and H. Hao. Robust text detection in natural scene images. IEEE Trans. on PAMI, 36(5):970–983, 2014.
  • W. Huang, Z. Lin, J. Yang, and J. Wang. Text localization in natural images using stroke feature transform and text covariance descriptors. ICCV 2013.
  • TextFlow
    S. Tian, Y. Pan, C. Huang, S. Lu, K. Yu, and C. Lim Tan. Text flow: A unified text detection system in natural scene images. CVPR 2015.
  • L. Sun, Q. Huo, and W. Jia. A robust approach for text detection from natural scene images. Pattern Recognition, 48(9):2906–2920, 2015.
  • X. C. Yin, W. Y. Pei, J. Zhang, and H. W. Hao. Multi-orientation scene text detection with adaptive clustering. IEEE Trans. on PAMI, 37(9), 2015.
  • L. Kang, Y. Li, and D. Doermann. Orientation robust text line detection in natural images. CVPR 2014, 4034–4041.
  • C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu. Detecting texts of arbitrary orientations in natural images. CVPR 2012, 1083–1090.
  • H. Cho, M. Sung, and B. Jun. Canny Text Detector: Fast and Robust Scene Text Localization Algorithm. CVPR 2016, 3566-3573.

(2) Other traditional methods include MSER (Maximally Stable Extremal Regions)

  • D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In Proc. of ICDAR, 2015.
  • D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras. ICDAR 2013 robust reading competition. In Proc. of ICDAR, 2013.


  • Deeptext
    Z. Zhong, L. Jin, S. Zhang, and Z. Feng. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016.
  • TextBoxes
    M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu. TextBoxes: A fast text detector with a single deep neural network. AAAI 2017.
    TextBoxes is a fast, end-to-end trainable scene text detector built on a single deep neural network.
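  • CTPN
    Z. Tian, W. Huang, T. He, P. He, and Y. Qiao. Detecting text in natural image with connectionist text proposal network. ECCV 2016.
    The Connectionist Text Proposal Network (CTPN) detects fixed-width vertical boxes, uses a BLSTM (bidirectional LSTM) to capture sequential information, and then links the vertical boxes into the final detection boxes. It works well for horizontal text but is not suited to highly inclined text.
  • A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. CVPR 2016.
    FCRN (Fully-Convolutional Regression Network) is a model trained on synthetic images for scene text localisation.
  • RRPN
    J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. arXiv preprint arXiv:1703.01086, 2017.
    RRPN (Rotation Region Proposal Network) is proposed for detecting arbitrarily oriented scene text.
  • Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai. Multi-oriented text detection with fully convolutional networks. CVPR 2016.
    The method is based on an FCN (Fully Convolutional Network) and is designed to detect multi-oriented scene text. It needs three steps: detecting text blocks with a text-block FCN, generating multi-oriented text line candidates based on MSER, and classifying the text line candidates.
  • SegLink
    B. Shi, X. Bai, and S. Belongie. Detecting oriented text in natural images by linking segments. arXiv:1703.06520.
    The paper detects oriented text by detecting text segments and the links between them. It works well on text lines of arbitrary length.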
  • EAST
    X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang. EAST: An efficient and accurate scene text detector. arXiv:1704.03155.
  • DMPNet
    Y. Liu, and L. Jin. Deep matching prior network: toward tighter multi-oriented text detection. arXiv:1703.01425.
  • Deep Direct Regression
    W. He, X.Y. Zhang, F. Yin, and C.L. Liu. Deep direct regression for multi-oriented scene text detection. arXiv:1703.08289.
    Deep direct regression is proposed for multi-oriented scene text detection.
  • R2CNN
    R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
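    The method builds on the Faster R-CNN architecture to detect arbitrarily oriented scene text. The RPN generates axis-aligned boxes that enclose text of different orientations. For each proposal, features are pooled with several pooled sizes (7×7, 11×3, 3×11). The concatenated features are then used to predict the text score, the axis-aligned box, and the inclined minimum-area box. The datasets used are ICDAR 2013 and ICDAR 2015. Related concepts: RPN (Region Proposal Network), NMS, RoI pooling, and anchors.
    GitHub: R2CNN: Rotational Region CNN Based on FPN (Tensorflow)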
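
The multi-size pooling step described for R2CNN can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python toy that assumes a single-channel RoI crop stored as a list of rows, reads each pooled size as (height, width), and uses the common adaptive max-pooling bin rule. The function names `adaptive_max_pool` and `multi_size_roi_features` are made up for this example.

```python
def adaptive_max_pool(feature, out_h, out_w):
    """Max-pool a 2D feature map (list of rows) down to out_h x out_w bins."""
    in_h, in_w = len(feature), len(feature[0])
    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so that every
        # bin covers at least one cell of the input.
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def multi_size_roi_features(roi_feature,
                            pooled_sizes=((7, 7), (11, 3), (3, 11))):
    """Pool one RoI at several sizes and concatenate the flattened results."""
    flat = []
    for out_h, out_w in pooled_sizes:
        for row in adaptive_max_pool(roi_feature, out_h, out_w):
            flat.extend(row)
    return flat


# Hypothetical single-channel RoI crop: 22 rows x 33 columns of toy values.
roi = [[r * 33 + c for c in range(33)] for r in range(22)]
features = multi_size_roi_features(roi)
print(len(features))  # 7*7 + 11*3 + 3*11 = 115 values
```

The elongated 11×3 and 3×11 grids keep more resolution along one axis than the square 7×7 grid, which is the motivation for combining them when text boxes are far from square.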



  • R-CNN
    R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
  • SPPnet
    K. He, X. Zhang, S. Ren, and J. Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV 2014, 346–361.
  • Fast R-CNN
    R. Girshick. Fast R-CNN. ICCV 2015, 1440–1448.
  • Faster R-CNN
    S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
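    The paper generates a convolutional feature map with a CNN and uses an RPN (Region Proposal Network) to produce high-quality object proposals. The proposals from the RPN are then refined and classified by Fast R-CNN. In other words, Faster R-CNN consists of two parts: the RPN and Fast R-CNN.
    From a programming perspective, Faster R-CNN consists of four main parts (the four green boxes in the figure below):
    (1) Dataset: supplies the data; the commonly used datasets are VOC and COCO;
    (2) Extractor: uses a CNN to extract image features (the original paper used ZFNet and VGG16; ResNet101 was adopted later);
    (3) RPN (Region Proposal Network): provides the candidate regions, or RoIs, roughly 2000 per image. Note: these are selected from around 20000 anchors;
    (4) RoIHead: classifies and refines the RoIs. For each RoI found by the RPN, it decides whether the RoI contains an object and corrects the position and coordinates of its box.
    GitHub: simple-faster-rcnn-pytorch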
  • R-FCN
    J. Dai, Y. Li, K. He, and J. Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. NIPS 2016.
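
The figure of roughly 20000 anchors quoted for Faster R-CNN's RPN can be checked with a short sketch. The settings here are assumptions, not taken from the notes above: a 600×800 input, a backbone stride of 16, and 9 anchors per location (three scales times three aspect ratios, as in the commonly used configuration). The helper names are hypothetical.

```python
def make_base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return 9 anchors (x1, y1, x2, y2) centred on the origin."""
    anchors = []
    for ratio in ratios:              # ratio = height / width
        for scale in scales:
            side = base_size * scale  # side of the square with the same area
            h = side * ratio ** 0.5
            w = side / ratio ** 0.5
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def make_all_anchors(feat_h, feat_w, stride=16):
    """Shift the base anchors to every cell of a feat_h x feat_w feature map."""
    base = make_base_anchors(base_size=stride)
    all_anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in base:
                all_anchors.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return all_anchors


# Assumed input: a 600 x 800 image, stride-16 backbone -> 37 x 50 feature map.
anchors = make_all_anchors(feat_h=600 // 16, feat_w=800 // 16)
print(len(anchors))  # 37 * 50 * 9 = 16650
```

Under these assumptions the count lands in the same range as the figure in the notes; the RPN then scores these anchors and keeps only about 2000 of them as RoIs.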


  • SSD
    W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
  • YOLO
    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CVPR 2016.
  • YOLOv2
    J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. CVPR 2017.

[2] Scene text character recognition

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition

[3] Image processing datasets

  • ICDAR2019

[4] Image object classification

