Text Detection Dataset Summary

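Character recognition and text detection matter a great deal in practice: everything from simple license plate detection to complex scene text recognition relies on them. The best-known conference in this field is the International Conference on Document Analysis and Recognition (ICDAR).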

1. Main datasets for text detection and recognition


Total-Text

paper
github


COCO-Text, COCO-Text V2
[Image: https://vision.cornell.edu/se3/wp-content/uploads/2016/01/cocotext-705x708.jpg]
paper


MSRA-TD500
ref paper


ICDAR2017: the competitions cover datasets from many different domains.

Category: Handwritten Historical Document Layout Recognition

cBAD: ICDAR2017 Competition on Baseline Detection
ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts
ICDAR2017 Competition on Historical Book Analysis
Category: Historical Handwritten Script Analysis

ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script
ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)
Competition on Multi-script Writer Identification Using LAMIS-MSHD and CERUG Databases
Category: Character/Word Spotting

Competition on Query-by-Example Glyph Spotting of Southeast Asian Palm Leaf Manuscript Images
Handwritten Keyword Spotting Competition
Category: Handwriting Recognition

ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset
ICDAR2017 Competition on Information Extraction in Historical Handwritten Records
Category: Document Image Binarization

ICDAR2017 Competition on Document Image Binarization (DIBCO 2017)
Category: Document Recognition (Layout analysis & Text Recognition)

ICDAR2017 Competition on Recognition of Documents with Complex Layouts – RDCL2017
ICDAR2017 Competition on Recognition of Early Indian Printed Documents – REID2017
ICDAR2017 Competition on Page Object Detection
Category: Document Reconstruction

Smartphone-captured Document Image Reconstruction from Multiple Views
Category: Post OCR Correction

ICDAR2017 Competition on Post-OCR Text Correction
Category: Robust Reading Competitions

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17), paper: https://arxiv.org/pdf/1708.09585.pdf
ICDAR2017 Robust Reading Challenge on COCO-Text
Category: Text in Video

ICDAR2017 Competition on Arabic Text Detection and Recognition in Multi-resolution Video Frames
Competition on Video Script Identification
Category: Forensics
Competition on File Type Identification
Category: Miscellaneous Competitions

ICDAR2017 Competition on Multi-font and Multi-Size Digitally Represented Arabic Text
ref: http://mac.xmu.edu.cn/valse2017/ppt/Invited/VALSE2017_bx.pdf

ICDAR2015
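Scene text recognition
Text recognition in born-digital (computer-generated) images
Also includes a text super-resolution dataset
An interface in OpenCV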
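For readers who want to load the ICDAR2015 scene text annotations, here is a minimal sketch of a ground-truth parser. It assumes the layout commonly used for the incidental scene text localization files: one region per line, eight comma-separated integer coordinates for the four corners of a quadrilateral, then the transcription, with `###` marking a region to be ignored. This post does not describe that layout, so check it against the files you actually download. The names `TextRegion`, `parse_gt_line`, and `parse_gt` are illustrative and not part of any official toolkit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    points: List[Tuple[int, int]]  # four (x, y) corners of the quadrilateral
    text: str                      # transcription of the region
    ignore: bool                   # True when the transcription is "###"


def parse_gt_line(line: str) -> TextRegion:
    """Parse one annotation line (assumed format: x1,y1,...,x4,y4,text)."""
    # Some ground-truth files start with a UTF-8 byte order mark; drop it.
    line = line.lstrip("\ufeff").strip()
    # Split on the first eight commas only, because the transcription
    # itself may contain commas.
    parts = line.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    text = parts[8]
    return TextRegion(points=points, text=text, ignore=(text == "###"))


def parse_gt(content: str) -> List[TextRegion]:
    """Parse the full text of one ground-truth file, skipping blank lines."""
    return [parse_gt_line(line) for line in content.splitlines() if line.strip()]
```

A typical use would be to read one ground-truth file as text, pass its contents to `parse_gt`, and drop the regions where `ignore` is true before training or evaluating a detector.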


ICDAR2013
Robust Reading: http://refbase.cvc.uab.es/files/KSU2013.pdf
Chinese handwriting dataset, download
ref: https://www.computer.org/csdl/proceedings/icdar/2013/4999/00/06628568.pdf
Digital document researcher: https://roundtrippdf.com/en/

2. Some recently published work (from Total-Text)

Detection

MSR, FTSN, TextSnake, TextField, Mask TextSpotter, TextNet, Textboxes, EAST, Baseline, SegLink

End-to-end Recognition

TextNet, Mask TextSpotter, Textboxes


In addition, the following datasets are related to digit and character recognition:
Handwritten digit recognition: MNIST


Street View House Numbers dataset: SVHN
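MNIST is usually distributed in the IDX binary format, so a short reader can be useful when no deep learning framework is at hand. The sketch below assumes the standard IDX layout: two zero bytes, one data-type byte (`0x08` for unsigned bytes), one byte giving the number of dimensions, then each dimension as a big-endian 32-bit integer, followed by the raw data. The function name `parse_idx` is illustrative only.

```python
import struct
from typing import Tuple


def parse_idx(data: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Return (dimensions, raw payload) for an unsigned-byte IDX buffer."""
    # Header: two zero bytes, a data-type byte, and the number of dimensions.
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension is stored as a big-endian unsigned 32-bit integer.
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    # The payload must hold exactly one byte per element.
    expected = 1
    for size in dims:
        expected *= size
    if len(payload) != expected:
        raise ValueError(f"expected {expected} bytes of data, got {len(payload)}")
    return dims, payload
```

The MNIST files are normally gzip-compressed, so the bytes would typically come from something like `gzip.open(path, "rb").read()`. For an image file the dimensions are (number of images, rows, columns), and for a label file there is a single dimension.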


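Some related websites where more datasets can be found:
International Association for Pattern Recognition, Technical Committee 11 (OCR): IAPR-TC11, TC11 datasets
Image recognition working group TC10, TC10 datasets
ICDAR 2017 summary: https://github.com/cs-chan/Total-Text-Dataset
Summary site for recent Robust Reading competitions: http://rrc.cvc.uab.es/
Research guide: http://www.guide2research.com/conference/icdar-2019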
