[Deep Learning] Scene Text Detection and Recognition

Contents

Background

Why does text matter?

Problem definition

What are the challenges?

Recent advances and representative algorithms

Holistic, Multi-Channel Prediction

TextBoxes

Rotation Proposals

Corner Localization and Region Segmentation (A Megvii work in CVPR 2018)

Simpler Pipelines

EAST (A Megvii work in CVPR 2017)

Arbitrary-shaped text detection

TextSnake (A Megvii work in ECCV 2018)

Mask TextSpotter (A Megvii work in ECCV 2018)

Text recognition

CRNN

ASTER

FAN

Recommended resources


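Background

Why does text matter?

Text is a human invention, and it has two properties that make it valuable:

  1. it carries rich and precise high-level semantic information;
  2. it conveys human thoughts and emotions.

In natural scenes, text can also serve as a visual cue that complements other cues such as edges and textures.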

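Problem definition

Text detection is the process of algorithmically determining where text is located in an image and detecting the characters it contains. Text recognition then transcribes the detected text into character strings.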

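What are the challenges?

Compared with traditional OCR:

  • natural scenes are far more cluttered, whereas traditional OCR deals with well-structured documents;
  • scene text varies enormously in type, layout, color, and so on.

The specific challenges fall into three categories:

  1. diversity: text differs in size, language, layout, and so on;
  2. background interference: structures such as symbols and traffic lights can look locally similar to text;
  3. imaging conditions: noise, blur, occlusion, shadows, and similar degradations.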

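Recent advances and representative algorithms

Several recent algorithms draw inspiration from object detection and semantic segmentation: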

Holistic, Multi-Channel Prediction


Yao et al. Scene Text Detection via Holistic, Multi-Channel Prediction. arXiv preprint arXiv:1606.09002, 2016.

  • holistic (whole-image) rather than local prediction
  • text detection is cast as a semantic segmentation problem
  • conceptually and functionally different from previous sliding-window or connected-component based approaches
  • holistic, pixel-wise predictions: a text region map, a character map, and a linking orientation map
  • detections are formed from these three maps
  • can simultaneously handle horizontal, multi-oriented, and curved text in real-world natural images
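The step from pixel-wise maps to detections can be illustrated with a much-simplified sketch in Python with NumPy. It thresholds a text region map, groups the surviving pixels into 4-connected components, and gives each component a dominant orientation taken from a per-pixel orientation map. The threshold, the function names `connected_components` and `maps_to_detections`, and the output format are hypothetical. The paper's actual procedure also uses the character map and partitions regions into individual text lines, and this sketch omits both.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """Label 4-connected foreground regions of a boolean mask with a BFS."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


def maps_to_detections(region_map, orientation_map, threshold=0.5):
    """Hypothetical helper: turn a text-region probability map and a per-pixel
    orientation map (radians) into coarse detections, each an axis-aligned
    bounding box (x_min, y_min, x_max, y_max) plus a dominant angle."""
    labels, count = connected_components(region_map > threshold)
    detections = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        theta = orientation_map[ys, xs]
        # Orientations are axial (theta and theta + pi describe the same line),
        # so average the doubled angle and halve the result.
        mean_theta = 0.5 * np.arctan2(np.sin(2 * theta).mean(), np.cos(2 * theta).mean())
        detections.append({
            "box": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
            "angle": float(mean_theta),
        })
    return detections
```

The doubled-angle average matters for near-vertical text. Naively averaging orientations close to +π/2 and -π/2 would give an angle near 0, which points the wrong way.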

TextBoxes


Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017.

  • a text detection method inspired by SSD
  • achieves both high accuracy and high efficiency
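The SSD-style design can be illustrated with default-box generation. As we understand the paper, TextBoxes uses long default boxes with aspect ratios 1, 2, 3, 5, 7 and 10. It also adds a vertical offset so that densely stacked text lines are still covered. The sketch below assumes those ratios and a half-cell vertical offset. The function name and the normalized (cx, cy, w, h) format are our own choices. Box matching, the paper's 1×5 text-box filters, and NMS are left out.

```python
import numpy as np

# Aspect ratios (width / height) assumed from the TextBoxes paper; text lines
# are long and thin, so these are much larger than generic SSD ratios.
TEXT_ASPECT_RATIOS = (1, 2, 3, 5, 7, 10)


def textbox_default_boxes(fmap_h, fmap_w, scale, ratios=TEXT_ASPECT_RATIOS, vertical_offset=True):
    """Hypothetical helper: default boxes (cx, cy, w, h) in normalized [0, 1]
    coordinates for one feature map. With vertical_offset=True every cell gets a
    second set of boxes shifted down by half a cell, densifying vertical coverage."""
    y_shifts = (0.5, 1.0) if vertical_offset else (0.5,)
    boxes = []
    for i in range(fmap_h):
        for j in range(fmap_w):
            cx = (j + 0.5) / fmap_w
            for dy in y_shifts:
                cy = (i + dy) / fmap_h
                for r in ratios:
                    # Keep the box area at scale**2 while stretching it horizontally.
                    boxes.append((cx, cy, scale * np.sqrt(r), scale / np.sqrt(r)))
    return np.array(boxes)
```

For example, `textbox_default_boxes(2, 3, scale=0.2)` yields 2 × 3 cells × 2 vertical shifts × 6 ratios = 72 boxes, all with area 0.04.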

Rotation Proposals


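Ma et al. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. arXiv, 2017.

  • a multi-oriented text detection method based on Faster R-CNN
  • proposes several modifications to better detect scene text, notably a Rotation Region Proposal Network (RRPN) that generates inclined proposals carrying angle information, and a Rotation Region-of-Interest (RRoI) pooling layer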
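The key change from Faster R-CNN is that anchors and proposals carry an angle. The sketch below shows two building blocks under our own assumptions. The first converts a rotated box (cx, cy, w, h, θ) to its four corners. The second enumerates rotated anchors at one location. The sizes and width/height ratios are illustrative. The six orientations, from -π/6 to 2π/3 in steps of π/6, follow our reading of the RRPN paper. All names are hypothetical.

```python
import itertools
import math

import numpy as np

# Illustrative anchor settings (assumptions): sizes in pixels, width/height ratios.
SIZES = (8, 16, 32)
RATIOS = (2, 5, 8)
# Six orientations covering half a turn: -pi/6, 0, pi/6, pi/3, pi/2, 2*pi/3.
ANGLES = tuple(-math.pi / 6 + k * math.pi / 6 for k in range(6))


def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a w x h box centered at (cx, cy), rotated counter-clockwise by
    theta radians in a standard Cartesian frame. Returns a (4, 2) array whose
    first corner starts at (-w/2, -h/2) relative to the center before rotation."""
    c, s = math.cos(theta), math.sin(theta)
    offsets = np.array([[-w / 2, -h / 2], [w / 2, -h / 2], [w / 2, h / 2], [-w / 2, h / 2]])
    rotation = np.array([[c, -s], [s, c]])
    return offsets @ rotation.T + np.array([cx, cy])


def rotated_anchors(cx, cy, sizes=SIZES, ratios=RATIOS, angles=ANGLES):
    """All (cx, cy, w, h, theta) anchors at one location, with w * h == size ** 2."""
    anchors = []
    for size, ratio, theta in itertools.product(sizes, ratios, angles):
        w, h = size * math.sqrt(ratio), size / math.sqrt(ratio)
        anchors.append((cx, cy, w, h, theta))
    return anchors
```

With these settings each location gets 3 × 3 × 6 = 54 anchors. The elongated ratios reflect the shape of text lines rather than generic objects.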

Corner Localization and Region Segmentation (A Megvii work in CVPR 2018)


Lyu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. CVPR, 2018.

  • a compound text detection method that combines corner localization with region segmentation


  • corner localization: corner detection with SSD
  • region segmentation: position-sensitive segmentation with R-FCN
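Position-sensitive segmentation can be illustrated through box scoring. With a 2 × 2 grid, one map is responsible for the top-left part of a text instance, another for the top-right, and so on. A candidate box scores high only if each of its sub-regions lights up the matching map. The sketch assumes axis-aligned boxes and a plain mean. The paper applies the idea to rotated candidate boxes formed from the detected corners, which is omitted here. The function name is hypothetical.

```python
import numpy as np


def position_sensitive_score(ps_maps, box, grid=2):
    """Score an axis-aligned box (x0, y0, x1, y1), end-exclusive, against
    grid * grid position-sensitive maps of shape (grid * grid, H, W).

    The box is split into grid x grid bins; bin (i, j) reads only map
    i * grid + j (e.g. the top-left bin reads the top-left map). The score is
    the mean over bins of the average map value inside each bin."""
    x0, y0, x1, y1 = box
    ys = np.linspace(y0, y1, grid + 1).round().astype(int)
    xs = np.linspace(x0, x1, grid + 1).round().astype(int)
    bin_scores = []
    for i in range(grid):
        for j in range(grid):
            patch = ps_maps[i * grid + j, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            bin_scores.append(patch.mean() if patch.size else 0.0)
    return float(np.mean(bin_scores))
```

A box that covers the whole text instance matches all four maps. A box that covers only its top-left quarter matches just one bin, so its score drops to about a quarter.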

Simpler Pipelines

EAST (A Megvii work in CVPR 2017)


Zhou et al. EAST: An Efficient and Accurate Scene Text Detector. CVPR, 2017.

Main idea: predict the location, scale, and orientation of text with a single model trained with multiple loss functions (multi-task training).

Advantages:

  (a) accuracy: the whole pipeline can be trained and optimized end-to-end;

  (b) efficiency: redundant stages and processing steps are removed.
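Part of the multi-task objective can be written down directly. For the rotated-box (RBOX) geometry, each text pixel predicts its distances to the top, right, bottom and left edges of its box, plus an angle. The loss combines -log(IoU) on the axis-aligned part with 1 - cos(θ_pred - θ_gt) on the angle. The sketch below assumes λ_θ = 10, which we believe matches the paper. The +1 smoothing in the IoU ratio is a common implementation choice, not something the paper specifies. The score-map loss is omitted, and the function name is hypothetical.

```python
import numpy as np


def east_geometry_loss(pred_d, gt_d, pred_theta, gt_theta, lambda_theta=10.0):
    """Per-pixel RBOX geometry loss in the style of EAST.

    pred_d, gt_d: arrays (..., 4) of distances from the pixel to the top,
    right, bottom and left edges of its box; pred_theta, gt_theta: angles (rad).
    """
    p_t, p_r, p_b, p_l = np.moveaxis(pred_d, -1, 0)
    g_t, g_r, g_b, g_l = np.moveaxis(gt_d, -1, 0)
    area_pred = (p_t + p_b) * (p_r + p_l)
    area_gt = (g_t + g_b) * (g_r + g_l)
    # Both boxes contain the pixel, so the overlap along each axis is the sum of
    # the smaller distances on either side of it.
    h_inter = np.minimum(p_t, g_t) + np.minimum(p_b, g_b)
    w_inter = np.minimum(p_r, g_r) + np.minimum(p_l, g_l)
    inter = h_inter * w_inter
    union = area_pred + area_gt - inter
    loss_aabb = -np.log((inter + 1.0) / (union + 1.0))  # +1 smoothing is an assumption
    loss_theta = 1.0 - np.cos(pred_theta - gt_theta)
    return loss_aabb + lambda_theta * loss_theta
```

Because the intersection is computed from per-pixel distances alone, no box decoding or polygon clipping is needed during training. This is part of what keeps the pipeline simple.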

Arbitrary-shaped text detection

TextSnake (A Megvii work in ECCV 2018)


Long et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. ECCV, 2018.

  • a novel and flexible representation
  • effectively and precisely describes geometric properties such as the location, scale, and bending of curved text, which other representations (axis-aligned rectangles, rotated rectangles, or quadrangles) struggle to capture


A text instance is described as a sequence of ordered, overlapping disks centered on its symmetric axis; each disk has a potentially variable radius and orientation.
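Reconstructing a text region from this representation amounts to painting the union of the disks. The sketch assumes disk centers are given as (x, y) pixel coordinates, each with its own radius. The orientation attribute is not needed to paint the mask. The network that predicts the center line, radius and orientation maps, and the striding along the center line, are omitted. The function name is hypothetical.

```python
import numpy as np


def disks_to_mask(centers, radii, shape):
    """Rasterize a TextSnake-style text instance as the union of disks centered
    at (x, y) points along its center line, each with its own radius."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=bool)
    for (cx, cy), r in zip(centers, radii):
        mask |= (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    return mask
```

A curved instance can be built by placing centers along a curve. For example, `centers = [(x, 20 + 8 * np.sin(x / 10)) for x in range(5, 60, 4)]` with a constant radius of 5 in a 40 × 64 image gives a wavy band. An axis-aligned or rotated rectangle could not fit this shape tightly.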

Mask TextSpotter (A Megvii work in ECCV 2018)


Lyu et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV, 2018.

  • an end-to-end system for both text detection and recognition
  • inspired by Mask R-CNN


  • an RPN (Region Proposal Network) for text proposal generation
  • Fast R-CNN for proposal classification and regression
  • a mask branch for character segmentation and recognition
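Turning character segmentation maps into a string needs a decoding step. The sketch below follows the spirit of the pixel-voting idea described for Mask TextSpotter. First, threshold a background map to find character regions. Next, average each character-class map over every region and take the best class. Finally, read the regions left to right. The channel layout (channel 0 = background), the 0.5 threshold and the function name are assumptions for illustration, not the paper's exact settings.

```python
from collections import deque

import numpy as np


def pixel_voting(char_maps, alphabet, fg_threshold=0.5):
    """Decode a word from per-class character maps.

    char_maps: array (1 + len(alphabet), H, W); channel 0 is the background
    probability and channel k >= 1 the probability of alphabet[k - 1].
    """
    fg = char_maps[0] < fg_threshold
    h, w = fg.shape
    seen = np.zeros_like(fg)
    chars = []
    for y in range(h):
        for x in range(w):
            if not fg[y, x] or seen[y, x]:
                continue
            # Collect one 4-connected character region with a BFS.
            region, queue = [], deque([(y, x)])
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*region)
            # Vote: average every character channel over the region, keep the best.
            scores = char_maps[1:, list(ys), list(xs)].mean(axis=1)
            chars.append((float(np.mean(xs)), alphabet[int(np.argmax(scores))]))
    # Read characters left to right by region centroid.
    return "".join(c for _, c in sorted(chars))
```

Voting over a whole region, rather than trusting single pixels, makes the decoded label robust to a few misclassified pixels inside a character.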

Text recognition

CRNN


Shi et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. TPAMI, 2017.
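CRNN stacks convolutional layers, a recurrent sequence model and a CTC transcription layer. Its lexicon-free transcription can be approximated by best-path decoding. Take the most likely label at each frame, merge consecutive repeats, then drop blanks. The sketch assumes class 0 is the CTC blank and class k ≥ 1 maps to `alphabet[k - 1]`. The function name is hypothetical.

```python
import numpy as np


def ctc_greedy_decode(probs, alphabet):
    """Best-path CTC decoding.

    probs: array (T, C) of per-frame class probabilities; class 0 is the blank
    and class k >= 1 corresponds to alphabet[k - 1].
    """
    best = np.argmax(probs, axis=1)
    out, prev = [], 0
    for k in best:
        k = int(k)
        # Emit a label only if it is not blank and differs from the previous frame;
        # a blank between two identical labels keeps them as separate characters.
        if k != 0 and k != prev:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)
```

For example, with `alphabet = "ehlo"` the frame labels `h h - e l l - l o` (where `-` is the blank) decode to `hello`. The blank between the two `l` runs is what preserves the double letter.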

ASTER


Shi et al. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. TPAMI, 2018.
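As we understand ASTER's rectification network, it predicts K control points on the input image and pairs them with K fixed points along the top and bottom edges of the rectified image. A thin-plate spline (TPS) then maps every output pixel back to a sampling location in the input. The sketch below covers only the TPS solve and the grid mapping, in NumPy. The localization network, the bilinear sampler and the choice of K are omitted, and the function names are hypothetical.

```python
import numpy as np


def _tps_kernel(r2):
    """TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0."""
    return np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-12)), 0.0)


def tps_grid(target_ctrl, source_ctrl, points):
    """Map `points` in the rectified (output) image to sampling locations in the
    input image, using the thin-plate spline that sends target_ctrl[k] to
    source_ctrl[k]. target_ctrl, source_ctrl: (K, 2); points: (N, 2)."""
    k = target_ctrl.shape[0]
    d2 = ((target_ctrl[:, None, :] - target_ctrl[None, :, :]) ** 2).sum(-1)
    p = np.hstack([np.ones((k, 1)), target_ctrl])  # (K, 3) affine terms
    system = np.zeros((k + 3, k + 3))
    system[:k, :k] = _tps_kernel(d2)
    system[:k, k:] = p
    system[k:, :k] = p.T
    rhs = np.vstack([source_ctrl, np.zeros((3, 2))])
    params = np.linalg.solve(system, rhs)  # (K + 3, 2): radial weights, then affine
    q2 = ((points[:, None, :] - target_ctrl[None, :, :]) ** 2).sum(-1)
    basis = np.hstack([_tps_kernel(q2), np.ones((len(points), 1)), points])
    return basis @ params
```

For a 32 × 100 output in normalized coordinates, the query points could be built as `np.stack(np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 32)), -1).reshape(-1, 2)`. The input image would then be sampled bilinearly at the returned locations. When the source points are an affine transform of the target points, the TPS reduces to that affine map.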


FAN


Recommended resources

Survey

Scene Text Detection and Recognition: The Deep Learning Era

arXiv: https://arxiv.org/abs/1811.04256 (draft version)

Github: https://github.com/Jyouhou/SceneTextPapers (compiled papers, datasets & codes)

 

Laboratories and Papers

https://github.com/chongyangtao/Awesome-Scene-Text-Recognition

 

Datasets and Codes

https://github.com/seungwooYoo/Curated-scene-text-recognition-analysis

 

Projects and Products

https://github.com/wanghaisheng/awesome-ocr

 
