Reposted from the SCUT-DLVC Lab at South China University of Technology. Original link: https://github.com/HCIILAB/Scene-Text-Detection
Author: Chongyu Liu
ICDAR 2003(IC03):
ICDAR 2011(IC11):
ICDAR 2013(IC13):
USTB-SV1K:
SVT:
SVT-P:
ICDAR 2015(IC15):
COCO-Text:
MSRA-TD500:
MLT 2017:
MLT 2019:
CTW:
RCTW-17:
ReCTS:
CUTE80:
Total-Text:
SCUT-CTW1500:
LSVT:
ArT:
Synth80k:
SynthText:
Comparison of Datasets
Datasets | Language | Images (Total) | Images (Train) | Images (Test) | Text instances (Total) | Text instances (Train) | Text instances (Test) | Shape: Horizontal | Shape: Arbitrary-Quadrilateral | Shape: Multi-oriented | Annotation: Char | Annotation: Word | Annotation: Text-Line |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕ |
IC11 | English | 484 | 229 | 255 | 1564 | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
USTB-SV1K | English | 1000 | 500 | 500 | 2955 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
SVT-P | English | 238 | ~ | ~ | 639 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
COCO-Text | English | 63686 | 43686 | 20000 | 145859 | 118309 | 27550 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
MSRA-TD500 | English/Chinese | 500 | 300 | 200 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
MLT 2017 | Multi-lingual | 18000 | 7200 | 10800 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
MLT 2019 | Multi-lingual | 20000 | 10000 | 10000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
CTW | Chinese | 32285 | 25887 | 6398 | 1018402 | 812872 | 205530 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
RCTW-17 | English/Chinese | 12514 | 11514 | 1000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
ReCTS | Chinese | 20000 | ~ | ~ | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
CUTE80 | English | 80 | ~ | ~ | ~ | ~ | ~ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓ |
Total-Text | English | 1525 | 1225 | 300 | 9330 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
CTW-1500 | English/Chinese | 1500 | 1000 | 500 | 10751 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
LSVT | English/Chinese | 450000 | 430000 | 20000 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
ArT | English/Chinese | 10166 | 5603 | 4563 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✕ |
Synth80k | English | 80k | ~ | ~ | 8m | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
SynthText | English | 800k | ~ | ~ | 6m | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
Scene text detection methods can be divided into four categories:
(a) Traditional methods;
(b) Segmentation-based methods;
(c) Regression-based methods;
(d) Hybrid methods.
Note that: (1) “Hori” stands for horizontal scene text datasets. (2) “Quad” stands for arbitrary-quadrilateral text datasets. (3) “Irreg” stands for irregular scene text datasets. (4) “Traditional method” stands for methods that do not rely on deep learning.
Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
---|---|---|---|---|---|---|---|---|
Yao et al. [1] | TD-Mixture | ✕ | ✓ | ✓ | ✕ | CVPR | 2012 | 1) A new dataset, MSRA-TD500, and an evaluation protocol. 2) A two-level classification scheme and two sets of feature extractors. |
Yin et al. [2] | | ✕ | ✓ | ✕ | ✕ | TPAMI | 2013 | Extracts Maximally Stable Extremal Regions (MSERs) as character candidates and groups them together. |
Kang et al. [5] | HOCC | ✕ | ✓ | ✓ | ✕ | CVPR | 2014 | HOCC + MSERs |
Yin et al. [7] | | ✕ | ✓ | ✓ | ✕ | TPAMI | 2015 | Presenting a unified distance metric learning framework for adaptive hierarchical clustering. |
Wu et al. [9] | | ✕ | ✓ | ✓ | ✕ | TMM | 2015 | Exploring gradient directional symmetry at component level for smoothing edge components before text detection. |
Tian et al. [17] | | ✕ | ✓ | ✕ | ✕ | IJCAI | 2016 | Scene text is first detected locally in individual frames and finally linked by an optimal tracking trajectory. |
Yang et al. [33] | | ✕ | ✓ | ✓ | ✕ | TIP | 2017 | A text detector locates character candidates and extracts text regions, which are then linked by an optimal tracking trajectory. |
Liang et al. [8] | | ✕ | ✓ | ✓ | ✓ | TIP | 2015 | Exploring maximally stable extremal regions (MSERs) along with the stroke width transform for detecting candidate text regions. |
Michal et al. [12] | FASText | ✕ | ✓ | ✓ | ✕ | ICCV | 2015 | Stroke keypoints are efficiently detected and then exploited to obtain stroke segmentations. |
Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
---|---|---|---|---|---|---|---|---|
Li et al. [3] | | ✕ | ✓ | ✓ | ✕ | TIP | 2014 | (1) Develops three novel cues tailored for character detection and a Bayesian method for their integration; (2) designs a Markov random field model to exploit the inherent dependencies between characters. |
Zhang et al. [14] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Utilizing an FCN for salient map detection and for predicting the centroid of each character. |
Zhu et al. [16] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2016 | Performs a graph-based segmentation of connected components into words (Word-Graph). |
He et al. [18] | Text-CNN | ✕ | ✓ | ✓ | ✕ | TIP | 2016 | Developing a new learning mechanism to train the Text-CNN with multi-level and rich supervised information. |
Yao et al. [21] | | ✕ | ✓ | ✓ | ✕ | arXiv | 2016 | Proposing to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. |
Hu et al. [27] | WordSup | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing a weakly supervised framework that can utilize word annotations. The detected characters are then fed to a text structure analysis module. |
Wu et al. [28] | | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Introducing the border class to the text detection problem for the first time, and validating that the decoding process is largely simplified with the help of the text border. |
Tang et al. [32] | | ✕ | ✓ | ✕ | ✕ | TIP | 2017 | A text-aware candidate text region (CTR) extraction model + CTR refinement model. |
Dai et al. [35] | FTSN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Detecting and segmenting the text instance jointly and simultaneously, leveraging merits from both the semantic segmentation task and the region proposal based object detection task. |
Wang et al. [38] | | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | Proposes a novel character candidate extraction method based on super-pixel segmentation and hierarchical clustering. |
Deng et al. [40] | PixelLink | ✓ | ✓ | ✓ | ✕ | AAAI | 2018 | Text instances are first segmented out by linking pixels within the same instance together. |
Liu et al. [42] | MCN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Stochastic Flow Graph (SFG) + Markov Clustering. |
Lyu et al. [43] | | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | Detects scene text by localizing corner points of text bounding boxes and segmenting text regions in relative positions. |
Chu et al. [45] | Border | ✕ | ✓ | ✓ | ✕ | ECCV | 2018 | Presents a novel scene text detection technique that makes use of semantics-aware text borders and bootstrapping based text segment augmentation. |
Long et al. [46] | TextSnake | ✕ | ✓ | ✓ | ✓ | ECCV | 2018 | Proposes TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms based on the symmetry axis. |
Yang et al. [47] | IncepText | ✕ | ✓ | ✓ | ✕ | IJCAI | 2018 | Designing a novel Inception-Text module and introducing deformable PSROI pooling to deal with multi-oriented text detection. |
Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
Wang et al. [54] | PSENet | ✓ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. |
Xu et al. [57] | TextField | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | Presenting a novel direction field which can represent scene texts of arbitrary shapes. |
Tian et al. [58] | FTDN | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | FTDN is able to segment text regions and simultaneously regress text boxes at pixel level. |
Tian et al. [83] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Constraining embedding features of pixels inside the same text region to share similar properties. |
Huang et al. [4] | MSERs-CNN | ✕ | ✓ | ✕ | ✕ | ECCV | 2014 | Combining MSERs with CNN. |
Sun et al. [6] | | ✕ | ✓ | ✕ | ✕ | PR | 2015 | Presenting a robust text detection approach based on color-enhanced CER and neural networks. |
Baek et al. [62] | CRAFT | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Proposing CRAFT, which effectively detects text areas by exploring each character and the affinity between characters. |
Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
---|---|---|---|---|---|---|---|---|
Gupta et al. [15] | FCRN | ✓ | ✓ | ✕ | ✕ | CVPR | 2016 | (a) Proposing a fast and scalable engine to generate synthetic images of text in clutter; (b) FCRN. |
Zhong et al. [20] | DeepText | ✕ | ✓ | ✕ | ✕ | arXiv | 2016 | (a) Inception-RPN; (b) Utilizes ambiguous text category (ATC) information and multilevel region-of-interest pooling (MLRP). |
Liao et al. [22] | TextBoxes | ✓ | ✓ | ✕ | ✕ | AAAI | 2017 | Mainly based on the SSD object detection framework. |
Liu et al. [25] | DMPNet | ✕ | ✓ | ✓ | ✕ | CVPR | 2017 | Quadrilateral sliding windows + shared Monte-Carlo method for fast and accurate computing of the polygonal areas + a sequential protocol for relative regression. |
He et al. [26] | DDR | ✕ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an FCN that has bi-task outputs where one is pixel-wise classification between text and non-text, and the other is direct regression to determine the vertex coordinates of quadrilateral text boundaries. |
Jiang et al. [36] | R2CNN | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Using the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. |
Xing et al. [37] | ArbiText | ✕ | ✓ | ✓ | ✕ | arXiv | 2017 | Adopting the circle anchors and incorporating a pyramid pooling module into the Single Shot MultiBox Detector framework. |
Zhang et al. [39] | FEN | ✕ | ✓ | ✕ | ✕ | AAAI | 2018 | Proposing a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement. |
Wang et al. [41] | ITN | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | ITN is presented to learn the geometry-aware representation encoding the unique geometric configurations of scene text instances with in-network transformation embedding. |
Liao et al. [44] | RRD | ✕ | ✓ | ✓ | ✕ | CVPR | 2018 | The regression branch extracts rotation-sensitive features, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. |
Liao et al. [49] | TextBoxes++ | ✓ | ✓ | ✓ | ✕ | TIP | 2018 | Mainly based on the SSD object detection framework; it replaces the rectangular box representation of conventional object detectors with a quadrilateral or oriented rectangle representation. |
He et al. [50] | | ✕ | ✓ | ✓ | ✕ | TIP | 2018 | Proposing a scene text detection framework based on a fully convolutional network with a bi-task prediction module. |
Ma et al. [51] | RRPN | ✓ | ✓ | ✓ | ✕ | TMM | 2018 | RRPN + RRoI Pooling. |
Zhu et al. [55] | SLPR | ✕ | ✓ | ✓ | ✓ | arXiv | 2018 | SLPR regresses multiple points on the edge of a text line and then utilizes these points to sketch the outlines of the text. |
Deng et al. [56] | | ✓ | ✓ | ✓ | ✕ | arXiv | 2018 | CRPN employs corners to estimate the possible locations of text instances. It also designs an embedded data augmentation module inside the region-wise subnetwork. |
Cai et al. [59] | FFN | ✕ | ✓ | ✕ | ✕ | ICIP | 2018 | Proposing a Feature Fusion Network to deal with text regions that differ enormously in size. |
Sabyasachi et al. [60] | RGC | ✕ | ✓ | ✓ | ✕ | ICIP | 2018 | Proposing a novel recurrent architecture to improve the learning of a feature map at a given time. |
Liu et al. [63] | CTD | ✓ | ✓ | ✓ | ✓ | PR | 2019 | CTD + TLOC + PNMS |
Xie et al. [79] | DeRPN | ✓ | ✓ | ✕ | ✕ | AAAI | 2019 | DeRPN utilizes an anchor string mechanism instead of anchor boxes in the RPN. |
Wang et al. [82] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | Text-RPN + RNN |
Liu et al. [84] | | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | CSE mechanism |
He et al. [29] | SSTD | ✓ | ✓ | ✓ | ✕ | ICCV | 2017 | Proposing an attention mechanism, then developing a hierarchical inception module which efficiently aggregates multi-scale inception features. |
Tian et al. [11] | | ✕ | ✓ | ✕ | ✕ | ICCV | 2015 | Cascade boosting detects character candidates, and the min-cost flow network model gets the final result. |
Tian et al. [13] | CTPN | ✓ | ✓ | ✕ | ✕ | ECCV | 2016 | 1) RPN + LSTM. 2) The RPN incorporates a new vertical anchor mechanism and the LSTM connects the regions to get the final result. |
He et al. [19] | | ✕ | ✓ | ✓ | ✕ | ACCV | 2016 | An ER detector detects regions to get a coarse prediction of text regions. Then the local context is aggregated to classify the remaining regions to obtain a final prediction. |
Shi et al. [23] | SegLink | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | Decomposing text into segments and links. A link connects two adjacent segments. |
Tian et al. [30] | WeText | ✕ | ✓ | ✕ | ✕ | ICCV | 2017 | Proposing a weakly supervised scene text detection method (WeText). |
Zhu et al. [31] | RTN | ✕ | ✓ | ✕ | ✕ | ICDAR | 2017 | Mainly based on the CTPN vertical proposal mechanism. |
Ren et al. [34] | | ✕ | ✓ | ✕ | ✕ | TMM | 2017 | Proposing a CNN-based detector. It contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). |
Zhang et al. [10] | | ✕ | ✓ | ✕ | ✕ | CVPR | 2015 | The proposed algorithm exploits the symmetry property of character groups and allows for direct extraction of text lines from natural images. |
Method | Model | Code | Hori | Quad | Irreg | Source | Time | Highlight |
---|---|---|---|---|---|---|---|---|
Tang et al. [52] | SSFT | ✕ | ✓ | ✕ | ✕ | TMM | 2018 | Proposing a novel scene text detection method that involves superpixel-based stroke feature transform (SSFT) and deep learning based region classification (DLRC). |
Xie et al. [61] | SPCNet | ✕ | ✓ | ✓ | ✓ | AAAI | 2019 | Text Context module + Re-Score mechanism. |
Liu et al. [64] | PMTD | ✓ | ✓ | ✓ | ✕ | arXiv | 2019 | Performs “soft” semantic segmentation. It assigns a soft pyramid label (i.e., a real value between 0 and 1) to each pixel within a text instance. |
Liu et al. [80] | BDN | ✓ | ✓ | ✓ | ✕ | IJCAI | 2019 | Discretizing bounding boxes into key edges to address label confusion for text detection. |
Zhang et al. [81] | LOMO | ✕ | ✓ | ✓ | ✓ | CVPR | 2019 | DR + IRM + SEM |
Zhou et al. [24] | EAST | ✓ | ✓ | ✓ | ✕ | CVPR | 2017 | The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with instance segmentation. |
Yue et al. [48] | | ✕ | ✓ | ✓ | ✕ | BMVC | 2018 | Proposing a general framework for text detection called Guided CNN to achieve the two goals simultaneously. |
Zhong et al. [53] | AF-RPN | ✕ | ✓ | ✓ | ✕ | arXiv | 2018 | Presenting AF-RPN as an anchor-free and scale-friendly region proposal network for the Faster R-CNN framework. |
Method | Model | Source | Time | Method Category | IC11 [68] P | IC11 [68] R | IC11 [68] F | IC13 [69] P | IC13 [69] R | IC13 [69] F | IC05 [67] P | IC05 [67] R | IC05 [67] F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Yao et al. [1] | TD-Mixture | CVPR | 2012 | Traditional | ~ | ~ | ~ | 0.69 | 0.66 | 0.67 | ~ | ~ | ~ |
Yin et al. [2] | | TPAMI | 2013 | Traditional | 0.86 | 0.68 | 0.76 | ~ | ~ | ~ | ~ | ~ | ~ |
Yin et al. [7] | | TPAMI | 2015 | Traditional | 0.838 | 0.66 | 0.738 | ~ | ~ | ~ | ~ | ~ | ~ |
Wu et al. [9] | | TMM | 2015 | Traditional | ~ | ~ | ~ | 0.76 | 0.70 | 0.73 | ~ | ~ | ~ |
Liang et al. [8] | | TIP | 2015 | Traditional | 0.77 | 0.68 | 0.71 | 0.76 | 0.68 | 0.72 | ~ | ~ | ~ |
Michal et al. [12] | FASText | ICCV | 2015 | Traditional | ~ | ~ | ~ | 0.84 | 0.69 | 0.77 | ~ | ~ | ~ |
Li et al. [3] | | TIP | 2014 | Segmentation | 0.80 | 0.62 | 0.70 | ~ | ~ | ~ | ~ | ~ | ~ |
Zhang et al. [14] | | CVPR | 2016 | Segmentation | ~ | ~ | ~ | 0.88 | 0.78 | 0.83 | ~ | ~ | ~ |
He et al. [18] | Text-CNN | TIP | 2016 | Segmentation | 0.91 | 0.74 | 0.82 | 0.93 | 0.73 | 0.82 | 0.87 | 0.73 | 0.79 |
Yao et al. [21] | | arXiv | 2016 | Segmentation | ~ | ~ | ~ | 0.889 | 0.802 | 0.843 | ~ | ~ | ~ |
Hu et al. [27] | WordSup | ICCV | 2017 | Segmentation | ~ | ~ | ~ | 0.933 | 0.875 | 0.903 | ~ | ~ | ~ |
Tang et al. [32] | | TIP | 2017 | Segmentation | 0.90 | 0.86 | 0.88 | 0.92 | 0.87 | 0.89 | ~ | ~ | ~ |
Wang et al. [38] | | ICDAR | 2017 | Segmentation | 0.87 | 0.78 | 0.82 | 0.87 | 0.82 | 0.84 | ~ | ~ | ~ |
Deng et al. [40] | PixelLink | AAAI | 2018 | Segmentation | ~ | ~ | ~ | 0.886 | 0.875 | 0.881 | ~ | ~ | ~ |
Liu et al. [42] | MCN | CVPR | 2018 | Segmentation | ~ | ~ | ~ | 0.88 | 0.87 | 0.88 | ~ | ~ | ~ |
Lyu et al. [43] | | CVPR | 2018 | Segmentation | ~ | ~ | ~ | 0.92 | 0.844 | 0.880 | ~ | ~ | ~ |
Chu et al. [45] | Border | ECCV | 2018 | Segmentation | ~ | ~ | ~ | 0.915 | 0.871 | 0.892 | ~ | ~ | ~ |
Wang et al. [54] | PSENet | CVPR | 2019 | Segmentation | ~ | ~ | ~ | 0.94 | 0.90 | 0.92 | ~ | ~ | ~ |
Huang et al. [4] | MSERs-CNN | ECCV | 2014 | Segmentation | 0.88 | 0.71 | 0.78 | ~ | ~ | ~ | 0.84 | 0.67 | 0.75 |
Sun et al. [6] | | PR | 2015 | Segmentation | 0.92 | 0.91 | 0.91 | 0.94 | 0.92 | 0.93 | ~ | ~ | ~ |
Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | 0.94 | 0.77 | 0.85 | 0.938 | 0.764 | 0.842 | ~ | ~ | ~ |
Zhong et al. [20] | DeepText | arXiv | 2016 | Regression | 0.87 | 0.83 | 0.85 | 0.85 | 0.81 | 0.83 | ~ | ~ | ~ |
Liao et al. [22] | TextBoxes | AAAI | 2017 | Regression | 0.89 | 0.82 | 0.86 | 0.89 | 0.83 | 0.86 | ~ | ~ | ~ |
Liu et al. [25] | DMPNet | CVPR | 2017 | Regression | ~ | ~ | ~ | 0.93 | 0.83 | 0.870 | ~ | ~ | ~ |
Jiang et al. [36] | R2CNN | arXiv | 2017 | Regression | ~ | ~ | ~ | 0.92 | 0.81 | 0.86 | ~ | ~ | ~ |
Xing et al. [37] | ArbiText | arXiv | 2017 | Regression | ~ | ~ | ~ | 0.826 | 0.936 | 0.877 | ~ | ~ | ~ |
Wang et al. [41] | ITN | CVPR | 2018 | Regression | 0.896 | 0.889 | 0.892 | 0.941 | 0.893 | 0.916 | ~ | ~ | ~ |
Liao et al. [49] | TextBoxes++ | TIP | 2018 | Regression | ~ | ~ | ~ | 0.92 | 0.86 | 0.89 | ~ | ~ | ~ |
He et al. [50] | | TIP | 2018 | Regression | ~ | ~ | ~ | 0.91 | 0.84 | 0.88 | ~ | ~ | ~ |
Ma et al. [51] | RRPN | TMM | 2018 | Regression | ~ | ~ | ~ | 0.95 | 0.89 | 0.91 | ~ | ~ | ~ |
Zhu et al. [55] | SLPR | arXiv | 2018 | Regression | ~ | ~ | ~ | 0.90 | 0.72 | 0.80 | ~ | ~ | ~ |
Cai et al. [59] | FFN | ICIP | 2018 | Regression | ~ | ~ | ~ | 0.92 | 0.84 | 0.876 | ~ | ~ | ~ |
Sabyasachi et al. [60] | RGC | ICIP | 2018 | Regression | ~ | ~ | ~ | 0.89 | 0.77 | 0.83 | ~ | ~ | ~ |
Wang et al. [82] | | CVPR | 2019 | Regression | ~ | ~ | ~ | 0.937 | 0.878 | 0.907 | ~ | ~ | ~ |
Liu et al. [84] | | CVPR | 2019 | Regression | ~ | ~ | ~ | 0.937 | 0.897 | 0.917 | ~ | ~ | ~ |
He et al. [29] | SSTD | ICCV | 2017 | Regression | ~ | ~ | ~ | 0.89 | 0.86 | 0.88 | ~ | ~ | ~ |
Tian et al. [11] | | ICCV | 2015 | Regression | 0.86 | 0.76 | 0.81 | 0.852 | 0.759 | 0.802 | ~ | ~ | ~ |
Tian et al. [13] | CTPN | ECCV | 2016 | Regression | ~ | ~ | ~ | 0.93 | 0.83 | 0.88 | ~ | ~ | ~ |
He et al. [19] | | ACCV | 2016 | Regression | ~ | ~ | ~ | 0.90 | 0.75 | 0.81 | ~ | ~ | ~ |
Shi et al. [23] | SegLink | CVPR | 2017 | Regression | ~ | ~ | ~ | 0.877 | 0.83 | 0.853 | ~ | ~ | ~ |
Tian et al. [30] | WeText | ICCV | 2017 | Regression | ~ | ~ | ~ | 0.911 | 0.831 | 0.869 | ~ | ~ | ~ |
Zhu et al. [31] | RTN | ICDAR | 2017 | Regression | ~ | ~ | ~ | 0.94 | 0.89 | 0.91 | ~ | ~ | ~ |
Ren et al. [34] | | TMM | 2017 | Regression | 0.78 | 0.67 | 0.72 | 0.81 | 0.67 | 0.73 | ~ | ~ | ~ |
Zhang et al. [10] | | CVPR | 2015 | Regression | 0.84 | 0.76 | 0.80 | 0.88 | 0.74 | 0.80 | ~ | ~ | ~ |
Tang et al. [52] | SSFT | TMM | 2018 | Hybrid | 0.906 | 0.847 | 0.876 | 0.911 | 0.861 | 0.885 | ~ | ~ | ~ |
Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | ~ | ~ | ~ | 0.94 | 0.91 | 0.92 | ~ | ~ | ~ |
Liu et al. [80] | BDN | IJCAI | 2019 | Hybrid | ~ | ~ | ~ | 0.887 | 0.894 | 0.89 | ~ | ~ | ~ |
Zhou et al. [24] | EAST | CVPR | 2017 | Hybrid | ~ | ~ | ~ | 0.93 | 0.83 | 0.870 | ~ | ~ | ~ |
Yue et al. [48] | | BMVC | 2018 | Hybrid | ~ | ~ | ~ | 0.885 | 0.846 | 0.870 | ~ | ~ | ~ |
Zhong et al. [53] | AF-RPN | arXiv | 2018 | Hybrid | ~ | ~ | ~ | 0.94 | 0.90 | 0.92 | ~ | ~ | ~ |
Method | Model | Source | Time | Method Category | IC15 [70] P | IC15 [70] R | IC15 [70] F | MSRA-TD500 [71] P | MSRA-TD500 [71] R | MSRA-TD500 [71] F | USTB-SV1K [65] P | USTB-SV1K [65] R | USTB-SV1K [65] F | SVT [66] P | SVT [66] R | SVT [66] F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kang et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | 0.71 | 0.62 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
Yin et al. [7] | | TPAMI | 2015 | Traditional | ~ | ~ | ~ | 0.81 | 0.63 | 0.71 | 0.499 | 0.454 | 0.475 | ~ | ~ | ~ |
Wu et al. [9] | | TMM | 2015 | Traditional | ~ | ~ | ~ | 0.63 | 0.70 | 0.66 | ~ | ~ | ~ | ~ | ~ | ~ |
Tian et al. [17] | | IJCAI | 2016 | Traditional | ~ | ~ | ~ | 0.95 | 0.58 | 0.721 | 0.537 | 0.488 | 0.51 | ~ | ~ | ~ |
Yang et al. [33] | | TIP | 2017 | Traditional | ~ | ~ | ~ | 0.95 | 0.58 | 0.72 | 0.54 | 0.49 | 0.51 | ~ | ~ | ~ |
Liang et al. [8] | | TIP | 2015 | Traditional | ~ | ~ | ~ | 0.74 | 0.66 | 0.70 | ~ | ~ | ~ | ~ | ~ | ~ |
Zhang et al. [14] | | CVPR | 2016 | Segmentation | 0.71 | 0.43 | 0.54 | 0.83 | 0.67 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
Zhu et al. [16] | | CVPR | 2016 | Segmentation | 0.81 | 0.91 | 0.85 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
He et al. [18] | Text-CNN | TIP | 2016 | Segmentation | ~ | ~ | ~ | 0.76 | 0.61 | 0.69 | ~ | ~ | ~ | ~ | ~ | ~ |
Yao et al. [21] | | arXiv | 2016 | Segmentation | 0.723 | 0.587 | 0.648 | 0.765 | 0.753 | 0.759 | ~ | ~ | ~ | ~ | ~ | ~ |
Hu et al. [27] | WordSup | ICCV | 2017 | Segmentation | 0.793 | 0.77 | 0.782 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Wu et al. [28] | | ICCV | 2017 | Segmentation | 0.91 | 0.78 | 0.84 | 0.77 | 0.78 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
Dai et al. [35] | FTSN | arXiv | 2017 | Segmentation | 0.886 | 0.80 | 0.841 | 0.876 | 0.771 | 0.82 | ~ | ~ | ~ | ~ | ~ | ~ |
Deng et al. [40] | PixelLink | AAAI | 2018 | Segmentation | 0.855 | 0.820 | 0.837 | 0.830 | 0.732 | 0.778 | ~ | ~ | ~ | ~ | ~ | ~ |
Liu et al. [42] | MCN | CVPR | 2018 | Segmentation | 0.72 | 0.80 | 0.76 | 0.88 | 0.79 | 0.83 | ~ | ~ | ~ | ~ | ~ | ~ |
Lyu et al. [43] | | CVPR | 2018 | Segmentation | 0.895 | 0.797 | 0.843 | 0.876 | 0.762 | 0.815 | ~ | ~ | ~ | ~ | ~ | ~ |
Chu et al. [45] | Border | ECCV | 2018 | Segmentation | ~ | ~ | ~ | 0.830 | 0.774 | 0.801 | ~ | ~ | ~ | ~ | ~ | ~ |
Long et al. [46] | TextSnake | ECCV | 2018 | Segmentation | 0.849 | 0.804 | 0.826 | 0.832 | 0.739 | 0.783 | ~ | ~ | ~ | ~ | ~ | ~ |
Yang et al. [47] | IncepText | IJCAI | 2018 | Segmentation | 0.938 | 0.873 | 0.905 | 0.875 | 0.790 | 0.830 | ~ | ~ | ~ | ~ | ~ | ~ |
Wang et al. [54] | PSENet | CVPR | 2019 | Segmentation | 0.8692 | 0.845 | 0.8569 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Xu et al. [57] | TextField | arXiv | 2018 | Segmentation | 0.843 | 0.805 | 0.824 | 0.874 | 0.759 | 0.813 | ~ | ~ | ~ | ~ | ~ | ~ |
Tian et al. [58] | FTDN | ICIP | 2018 | Segmentation | 0.847 | 0.773 | 0.809 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Tian et al. [83] | | CVPR | 2019 | Segmentation | 0.883 | 0.850 | 0.866 | 0.842 | 0.817 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
Baek et al. [62] | CRAFT | CVPR | 2019 | Segmentation | 0.898 | 0.843 | 0.869 | 0.882 | 0.782 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ |
Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.651 | 0.599 | 0.624 |
Liu et al. [25] | DMPNet | CVPR | 2017 | Regression | 0.732 | 0.682 | 0.706 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
He et al. [26] | DDR | ICCV | 2017 | Regression | 0.82 | 0.80 | 0.81 | 0.77 | 0.70 | 0.74 | ~ | ~ | ~ | ~ | ~ | ~ |
Jiang et al. [36] | R2CNN | arXiv | 2017 | Regression | 0.856 | 0.797 | 0.825 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Xing et al. [37] | ArbiText | arXiv | 2017 | Regression | 0.792 | 0.735 | 0.759 | 0.78 | 0.72 | 0.75 | ~ | ~ | ~ | ~ | ~ | ~ |
Wang et al. [41] | ITN | CVPR | 2018 | Regression | 0.857 | 0.741 | 0.795 | 0.903 | 0.723 | 0.803 | ~ | ~ | ~ | ~ | ~ | ~ |
Liao et al. [44] | RRD | CVPR | 2018 | Regression | 0.88 | 0.8 | 0.838 | 0.876 | 0.73 | 0.79 | ~ | ~ | ~ | ~ | ~ | ~ |
Liao et al. [49] | TextBoxes++ | TIP | 2018 | Regression | 0.878 | 0.785 | 0.829 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
He et al. [50] | | TIP | 2018 | Regression | 0.85 | 0.80 | 0.82 | 0.91 | 0.81 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ |
Ma et al. [51] | RRPN | TMM | 2018 | Regression | 0.822 | 0.732 | 0.774 | 0.821 | 0.677 | 0.742 | ~ | ~ | ~ | ~ | ~ | ~ |
Zhu et al. [55] | SLPR | arXiv | 2018 | Regression | 0.855 | 0.836 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Deng et al. [56] | | arXiv | 2018 | Regression | 0.89 | 0.81 | 0.845 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Sabyasachi et al. [60] | RGC | ICIP | 2018 | Regression | 0.83 | 0.81 | 0.82 | 0.85 | 0.76 | 0.80 | ~ | ~ | ~ | ~ | ~ | ~ |
Wang et al. [82] | | CVPR | 2019 | Regression | 0.892 | 0.86 | 0.876 | 0.852 | 0.821 | 0.836 | ~ | ~ | ~ | ~ | ~ | ~ |
He et al. [29] | SSTD | ICCV | 2017 | Regression | 0.80 | 0.73 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Tian et al. [13] | CTPN | ECCV | 2016 | Regression | 0.74 | 0.52 | 0.61 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
He et al. [19] | | ACCV | 2016 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.87 | 0.73 | 0.79 |
Shi et al. [23] | SegLink | CVPR | 2017 | Regression | 0.731 | 0.768 | 0.75 | 0.86 | 0.70 | 0.77 | ~ | ~ | ~ | ~ | ~ | ~ |
Tang et al. [52] | SSFT | TMM | 2018 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.541 | 0.758 | 0.631 |
Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | 0.89 | 0.86 | 0.87 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Liu et al. [64] | PMTD | arXiv | 2019 | Hybrid | 0.913 | 0.874 | 0.893 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Liu et al. [80] | BDN | IJCAI | 2019 | Hybrid | 0.881 | 0.846 | 0.863 | 0.87 | 0.815 | 0.842 | ~ | ~ | ~ | ~ | ~ | ~ |
Zhang et al. [81] | LOMO | CVPR | 2019 | Hybrid | 0.878 | 0.876 | 0.877 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Zhou et al. [24] | EAST | CVPR | 2017 | Hybrid | 0.833 | 0.783 | 0.807 | 0.873 | 0.674 | 0.761 | ~ | ~ | ~ | ~ | ~ | ~ |
Yue et al. [48] | | BMVC | 2018 | Hybrid | 0.866 | 0.789 | 0.823 | ~ | ~ | ~ | ~ | ~ | ~ | 0.691 | 0.660 | 0.675 |
Zhong et al. [53] | AF-RPN | arXiv | 2018 | Hybrid | 0.89 | 0.83 | 0.86 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Method | Model | Source | Time | Method Category | COCO-Text [72] P | COCO-Text [72] R | COCO-Text [72] F | RCTW-17 [73] P | RCTW-17 [73] R | RCTW-17 [73] F | MLT [76] P | MLT [76] R | MLT [76] F | OSTD [77] P | OSTD [77] R | OSTD [77] F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kang et al. [5] | HOCC | CVPR | 2014 | Traditional | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | 0.80 | 0.73 | 0.76 |
Yao et al. [21] | | arXiv | 2016 | Segmentation | 0.432 | 0.27 | 0.333 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Hu et al. [27] | WordSup | ICCV | 2017 | Segmentation | 0.452 | 0.309 | 0.368 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Lyu et al. [43] | | CVPR | 2018 | Segmentation | 0.351 | 0.348 | 0.349 | ~ | ~ | ~ | 0.743 | 0.706 | 0.724 | ~ | ~ | ~ |
Chu et al. [45] | Border | ECCV | 2018 | Segmentation | ~ | ~ | ~ | 0.782 | 0.588 | 0.671 | 0.777 | 0.621 | 0.690 | ~ | ~ | ~ |
Yang et al. [47] | IncepText | IJCAI | 2018 | Segmentation | ~ | ~ | ~ | 0.785 | 0.569 | 0.660 | ~ | ~ | ~ | ~ | ~ | ~ |
Wang et al. [54] | PSENet | CVPR | 2019 | Segmentation | ~ | ~ | ~ | ~ | ~ | ~ | 0.7535 | 0.6918 | 0.7213 | ~ | ~ | ~ |
Baek et al. [62] | CRAFT | CVPR | 2019 | Segmentation | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.682 | 0.739 | ~ | ~ | ~ |
He et al. [29] | SSTD | ICCV | 2017 | Regression | 0.46 | 0.31 | 0.37 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Gupta et al. [15] | FCRN | CVPR | 2016 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
Liao et al. [49] | TextBoxes++ | TIP | 2018 | Regression | 0.61 | 0.57 | 0.59 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Ma et al. [51] | RRPN | TMM | 2018 | Regression | ~ | ~ | ~ | ~ | ~ | ~ | 0.7669 | 0.5794 | 0.6601 | ~ | ~ | ~ |
Deng et al. [56] | | arXiv | 2018 | Regression | 0.555 | 0.633 | 0.591 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Cai et al. [59] | FFN | ICIP | 2018 | Regression | 0.43 | 0.35 | 0.39 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Xie et al. [79] | DeRPN | AAAI | 2019 | Regression | 0.586 | 0.557 | 0.571 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.806 | 0.686 | 0.741 | ~ | ~ | ~ |
Liu et al. [64] | PMTD | arXiv | 2019 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.844 | 0.763 | 0.801 | ~ | ~ | ~ |
Liu et al. [80] | BDN | IJCAI | 2019 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.791 | 0.698 | 0.742 | ~ | ~ | ~ |
Zhang et al. [81] | LOMO | CVPR | 2019 | Hybrid | ~ | ~ | ~ | 0.791 | 0.602 | 0.684 | 0.802 | 0.672 | 0.731 | ~ | ~ | ~ |
Zhou et al. [24] | EAST | CVPR | 2017 | Hybrid | 0.504 | 0.324 | 0.395 | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ | ~ |
Zhong et al. [53] | AF-RPN | arXiv | 2018 | Hybrid | ~ | ~ | ~ | ~ | ~ | ~ | 0.75 | 0.66 | 0.70 | ~ | ~ | ~ |
In this section, we only select those methods suitable for irregular text detection.
Method | Model | Source | Time | Method Category | Total-text [74] P | Total-text [74] R | Total-text [74] F | SCUT-CTW1500 [75] P | SCUT-CTW1500 [75] R | SCUT-CTW1500 [75] F |
---|---|---|---|---|---|---|---|---|---|---|
Baek et al. [62] | CRAFT | CVPR | 2019 | Segmentation | 0.876 | 0.799 | 0.836 | 0.860 | 0.811 | 0.835 |
Long et al. [46] | TextSnake | ECCV | 2018 | Segmentation | 0.827 | 0.745 | 0.784 | 0.679 | 0.853 | 0.756 |
Tian et al. [83] | | CVPR | 2019 | Segmentation | ~ | ~ | ~ | 0.817 | 0.842 | 0.801 |
Wang et al. [54] | PSENet | CVPR | 2019 | Segmentation | 0.840 | 0.779 | 0.809 | 0.848 | 0.797 | 0.822 |
Zhu et al. [55] | SLPR | arXiv | 2018 | Regression | ~ | ~ | ~ | 0.801 | 0.701 | 0.748 |
Liu et al. [63] | CTD+TLOC | PR | 2019 | Regression | ~ | ~ | ~ | 0.774 | 0.698 | 0.734 |
Wang et al. [82] | | CVPR | 2019 | Regression | ~ | ~ | ~ | 0.801 | 0.802 | 0.801 |
Liu et al. [84] | | CVPR | 2019 | Regression | 0.814 | 0.791 | 0.802 | 0.787 | 0.761 | 0.774 |
Zhang et al. [81] | LOMO | CVPR | 2019 | Hybrid | 0.876 | 0.793 | 0.833 | 0.857 | 0.765 | 0.808 |
Xie et al. [61] | SPCNet | AAAI | 2019 | Hybrid | 0.83 | 0.83 | 0.83 | ~ | ~ | ~ |
[A] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper
[B] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper
[C] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper
If you are interested in developing better scene text detection metrics, the references below may be useful.
[A] Wolf, Christian, and Jean-Michel Jolion. “Object count/area graphs for the evaluation of object detection and segmentation algorithms.” International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. paper
[B] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D.Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper
[C] Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. “What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions.” Image and Vision Computing 46 (2016): 1-17. paper
[D] Shi, Baoguang, et al. “ICDAR2017 competition on reading chinese text in the wild (RCTW-17).” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. paper
[E] Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. paper
[F] Dangla, Aliona, et al. “A first step toward a fair comparison of evaluation protocols for text detection algorithms.” 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. paper
[G] He,Mengchao and Liu, Yuliang, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web images. ICPR 2018. paper
[H] Liu, Yuliang and Jin, Lianwen, et al. “Tightness-aware Evaluation Protocol for Scene Text Detection” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. paper code
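The P, R and F columns in the result tables above are precision, recall and F-measure. As a rough illustration of how such numbers are typically obtained, here is a minimal sketch that assumes axis-aligned boxes, a fixed IoU threshold of 0.5, and greedy one-to-one matching. The names `iou` and `evaluate` are hypothetical helpers written for this example. The protocols in the references above differ in important details (polygon overlap, "do not care" regions, one-to-many matches), so this is not a reimplementation of any of them.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(gt: List[Box], det: List[Box], thresh: float = 0.5):
    """Return (precision, recall, f_measure) for one image.

    Each detection is greedily matched to the unmatched ground-truth
    box with the highest IoU; it counts as a true positive when that
    IoU reaches the threshold.
    """
    matched = set()
    true_pos = 0
    for d in det:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched:
                continue
            overlap = iou(d, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(det) if det else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    det_boxes = [(0, 0, 10, 9), (50, 50, 60, 60)]
    print(evaluate(gt_boxes, det_boxes))
```

In this toy input one of the two detections overlaps a ground-truth box well enough to count, so precision and recall are both one half. Dataset-level scores are normally computed by accumulating matches over all images before dividing, rather than by averaging per-image values.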
OCR | API | Free |
---|---|---|
Tesseract OCR Engine | × | √ |
Azure | √ | √ |
ABBYY | √ | √ |
OCR Space | √ | √ |
SODA PDF OCR | √ | √ |
Free Online OCR | √ | √ |
Online OCR | √ | √ |
Super Tools | √ | √ |
Online Chinese Recognition | √ | √ |
Calamari OCR | × | √ |
Tencent OCR | √ | × |
[1] Yao C, Bai X, Liu W, et al. Detecting texts of arbitrary orientations in natural images. 2012 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2012: 1083-1090. Paper |
[2] Yin X C, Yin X, Huang K, et al. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 36(5): 970-83. Paper |
[3] Li Y, Jia W, Shen C, et al. Characterness: An indicator of text in the wild. IEEE transactions on image processing, 2014, 23(4): 1666-1677. Paper |
[4] Huang W, Qiao Y, Tang X. Robust scene text detection with convolution neural network induced mser trees. European Conference on Computer Vision(ECCV), 2014: 497-511. Paper |
[5] Kang L, Li Y, Doermann D. Orientation robust text line detection in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 4034-4041. Paper |
[6] Sun L, Huo Q, Jia W, et al. A robust approach for text detection from natural scene images. Pattern Recognition, 2015, 48(9): 2906-2920. Paper |
[7] Yin X C, Pei W Y, Zhang J, et al. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015 (9): 1930-1937. Paper |
[8] Liang G, Shivakumara P, Lu T, et al. Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images. IEEE Transactions on Image Processing, 2015, 24(11): 4488-4501. Paper |
[9] Wu L, Shivakumara P, Lu T, et al. A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video. IEEE Trans. Multimedia, 2015, 17(8): 1137-1152. Paper |
[10] Zheng Z, Wei S, et al. Symmetry-based text line detection in natural scenes. IEEE Conference on Computer Vision & Pattern Recognition(CVPR), 2015. Paper |
[11] Tian S, Pan Y, Huang C, et al. Text flow: A unified text detection system in natural scene images. Proceedings of the IEEE international conference on computer vision(ICCV). 2015: 4651-4659. Paper |
[12] Buta M, et al. FASText: Efficient unconstrained scene text detector. 2015 IEEE International Conference on Computer Vision (ICCV). 2015: 1206-1214. Paper |
[13] Tian Z, Huang W, He T, et al. Detecting text in natural image with connectionist text proposal network. European conference on computer vision(ECCV), 2016: 56-72. Paper Code |
[14] Zhang Z, Zhang C, Shen W, et al. Multi-oriented text detection with fully convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 4159-4167. Paper |
[15] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016: 2315-2324. Paper Code |
[16] S. Zhu and R. Zanibbi, A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 625-632. Paper |
[17] Tian S, Pei W Y, Zuo Z Y, et al. Scene Text Detection in Video by Learning Locally and Globally. IJCAI. 2016: 2647-2653. Paper |
[18] He T, Huang W, Qiao Y, et al. Text-attentional convolutional neural network for scene text detection. IEEE transactions on image processing, 2016, 25(6): 2529-2541. Paper |
[19] He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. Aggregating local context for accurate scene text detection. ACCV, 2016. Paper |
[20] Zhong Z, Jin L, Zhang S, et al. Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314, 2016. Paper |
[21] Yao C, Bai X, Sang N, et al. Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002, 2016. Paper |
[22] Liao M, Shi B, Bai X, et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI. 2017: 4161-4167. Paper Code |
[23] Shi B, Bai X, Belongie S. Detecting Oriented Text in Natural Images by Linking Segments. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 3482-3490. Paper Code |
[24] Zhou X, Yao C, Wen H, et al. EAST: an efficient and accurate scene text detector. CVPR, 2017: 2642-2651. Paper Code |
[25] Liu Y, Jin L. Deep matching prior network: Toward tighter multi-oriented text detection. CVPR, 2017: 3454-3461. Paper |
[26] He W, Zhang X Y, Yin F, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017: 745-753. Paper |
[27] Hu H, Zhang C, Luo Y, et al. Wordsup: Exploiting word annotations for character based text detection. ICCV, 2017. Paper |
[28] Wu Y, Natarajan P. Self-organized text detection with minimal post-processing via border learning. ICCV, 2017. Paper |
[29] He P, Huang W, He T, et al. Single shot text detector with regional attention. The IEEE International Conference on Computer Vision (ICCV). 2017, 6(7). Paper Code |
[30] Tian S, Lu S, Li C. Wetext: Scene text detection under weak supervision. ICCV, 2017. Paper |
[31] Zhu, Xiangyu and Jiang, Yingying et al. Deep Residual Text Detection Network for Scene Text. ICDAR, 2017. Paper |
[32] Tang Y, Wu X. Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks. IEEE Transactions on Image Processing, 2017, 26(3):1509-1520. Paper |
[33] Yang C, Yin X C, Pei W Y, et al. Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework with Dynamic Programming. IEEE Transactions on Image Processing, 2017. Paper |
[34] X. Ren, Y. Zhou, J. He, K. Chen, X. Yang and J. Sun, A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling. in IEEE Transactions on Multimedia, vol. 19, no. 3, pp. 506-518, March 2017. Paper |
[35] Dai Y, Huang Z, Gao Y, et al. Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:1709.03272, 2017. Paper |
[36] Jiang Y, Zhu X, Wang X, et al. R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:1706.09579, 2017. Paper |
[37] Xing D, Li Z, Chen X, et al. ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene. arXiv preprint arXiv:1711.11249, 2017. Paper |
[38] C. Wang, F. Yin and C. Liu, Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 929-934. Paper |
[39] Sheng Zhang, Yuliang Liu, Lianwen Jin et al. Feature Enhancement Network: A Refined Scene Text Detector. In AAAI 2018. Paper |
[40] Dan Deng et al. PixelLink: Detecting Scene Text via Instance Segmentation. In AAAI 2018. Paper Code |
[41] Fangfang Wang, Liming Zhao, Xi L et al. Geometry-Aware Scene Text Detection with Instance Transformation Network. In CVPR 2018. Paper |
[42] Zichuan Liu, Guosheng Lin, Sheng Yang et al. Learning Markov Clustering Networks for Scene Text Detection. In CVPR 2018. Paper |
[43] Pengyuan Lyu, Cong Yao, Wenhao Wu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. In CVPR 2018. Paper |
[44] Minghui L, Zhen Z, Baoguang S. Rotation-Sensitive Regression for Oriented Scene Text Detection. In CVPR 2018. Paper |
[45] Chuhui Xue et al. Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping. In ECCV 2018. Paper |
[46] Long, Shangbang and Ruan, Jiaqiang, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. In ECCV, 2018. Paper |
[47] Qiangpeng Yang, Mengli Cheng et al. IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection. In IJCAI 2018. Paper |
[48] Xiaoyu Yue et al. Boosting up Scene Text Detectors with Guided CNN. In BMVC 2018. Paper |
[49] Liao M, Shi B, Bai X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, 2018, 27(8):3676-3690. Paper Code |
[50] W. He, X. Zhang, F. Yin and C. Liu, Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, in IEEE Transactions on Image Processing, vol. 27, no. 11, pp.5406-5419, 2018. Paper |
[51] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018. Paper Code |
[52] Youbao Tang and Xiangqian Wu. Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification. In TMM, 2018. Paper |
[53] Zhuoyao Zhong, Lei Sun and Qiang Huo. An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches. arXiv preprint arXiv:1804.09003. 2018. Paper |
[54] Wenhai W, Enze X, et al. Shape Robust Text Detection with Progressive Scale Expansion Network. In CVPR 2019. Paper Code |
[55] Zhu Y, Du J. Sliding Line Point Regression for Shape Robust Scene Text Detection. arXiv preprint arXiv:1801.09969, 2018. Paper |
[56] Linjie D, Yanxiang Gong, et al. Detecting Multi-Oriented Text with Corner-based Region Proposals. arXiv preprint arXiv:1804.02690, 2018. Paper Code |
[57] Yongchao Xu, Yukang Wang, Wei Zhou, et al. TextField: Learning A Deep Direction Field for Irregular Scene Text Detection. arXiv preprint arXiv:1812.01393, 2018. Paper |
[58] Xiaowei Tian, Dao Wu, Rui Wang, Xiaochun Cao. Focal Text: an Accurate Text Detection with Focal Loss. In ICIP 2018. Paper |
[59] Chenqin C, Pin L, Bing S. Feature Fusion Network for Scene Text Detection. In ICIP, 2018. Paper |
[60] Sabyasachi Mohanty et al. Recurrent Global Convolutional Network for Scene Text Detection. In ICIP 2018. Paper |
[61] Enze Xie, et al. Scene Text Detection with Supervised Pyramid Context Network. In AAAI 2019. Paper |
[62] Youngmin Baek, Bado Lee, et al. Character Region Awareness for Text Detection. In CVPR 2019. Paper |
[63] Yuliang L, Lianwen J, Shuaitao Z, et al. Curved Scene Text Detection via Transverse and Longitudinal Sequence Connection. Pattern Recognition, 2019. Paper Code |
[64] Jingchao Liu, Xuebo Liu, et al. Pyramid Mask Text Detector. arXiv preprint arXiv:1903.11800, 2019. Paper Code |
[79] Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie. DeRPN: Taking a further step toward more general object detection. In AAAI, 2019. Paper Code |
[80] Yuliang Liu, Lianwen Jin, et al. Omnidirectional Scene Text Detection with Sequential-free Box Discretization. In IJ