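Character recognition and text detection matter a great deal in practice: applications range from simple license plate detection to recognizing text in complex environments. The best-known conference in this field is currently the International Conference on Document Analysis and Recognition (ICDAR).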
Total-Text
paper
github
COCO-Text, COCO-Text V2
Example image: https://vision.cornell.edu/se3/wp-content/uploads/2016/01/cocotext-705x708.jpg
paper
ICDAR2017: the competitions cover datasets from multiple domains.
cBAD: ICDAR2017 Competition on Baseline Detection
ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts
ICDAR2017 Competition on Historical Book Analysis
ICDAR 2017 Competition on the Classification of Medieval Handwritings in Latin Script
ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)
Competition on Multi-script Writer Identification Using LAMIS-MSHD and CERUG Databases
Competition on Query-by-Example Glyph Spotting of Southeast Asian Palm Leaf Manuscript Images
Handwritten Keyword Spotting Competition
ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset
ICDAR2017 Competition on Information Extraction in Historical Handwritten Records
ICDAR2017 Competition on Document Image Binarization (DIBCO 2017)
ICDAR2017 Competition on Recognition of Documents with Complex Layouts – RDCL2017
ICDAR2017 Competition on Recognition of Early Indian Printed Documents – REID2017
ICDAR2017 Competition on Page Object Detection
Smartphone-captured Document Image Reconstruction from Multiple Views
ICDAR2017 Competition on Post-OCR Text Correction
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17), paper: https://arxiv.org/pdf/1708.09585.pdf
ICDAR2017 Robust Reading Challenge on COCO-Text
ICDAR2017 Competition on Arabic Text Detection and Recognition in Multi-resolution Video Frames
Competition on Video Script Identification
ICDAR2017 Competition on Multi-font and Multi-Size Digitally Represented Arabic Text
ref: http://mac.xmu.edu.cn/valse2017/ppt/Invited/VALSE2017_bx.pdf
ICDAR2015
Scene text recognition
Text recognition in born-digital images
Also includes a text super-resolution dataset
An interface in OpenCV
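To work with the ICDAR2015 scene text data, the ground-truth files have to be parsed first. The following is a minimal sketch, assuming the commonly seen layout of one text instance per line, written as eight comma-separated corner coordinates followed by the transcription, with `###` marking regions to ignore. The names `TextBox`, `parse_gt_line`, and `parse_gt` are hypothetical helpers for illustration and are not part of any official toolkit. Check the format against the files you actually download.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBox:
    points: List[Tuple[int, int]]  # four (x, y) corners of the quadrilateral
    text: str                      # transcription as written in the file
    ignore: bool                   # True when the transcription is "###"


def parse_gt_line(line: str) -> TextBox:
    """Parse one line of the assumed form x1,y1,x2,y2,x3,y3,x4,y4,text."""
    # Drop a leading UTF-8 byte-order mark and trailing line endings.
    line = line.lstrip("\ufeff").rstrip("\r\n")
    # Split at most 8 times so commas inside the transcription survive.
    parts = line.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    text = parts[8]
    return TextBox(points=points, text=text, ignore=(text == "###"))


def parse_gt(content: str) -> List[TextBox]:
    """Parse the full text of one ground-truth file, skipping blank lines."""
    return [parse_gt_line(line) for line in content.splitlines() if line.strip()]
```

Splitting with a maximum of eight separators is a deliberate choice here: a transcription may itself contain commas, and a plain `split(",")` would cut it into pieces.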
ICDAR2013
Robust Reading:http://refbase.cvc.uab.es/files/KSU2013.pdf
Chinese handwriting dataset, download
ref:https://www.computer.org/csdl/proceedings/icdar/2013/4999/00/06628568.pdf
Digital document researcher: https://roundtrippdf.com/en/
MSR, FTSN, TextSnake, TextField, Mask TextSpotter, TextNet, Textboxes, EAST, Baseline, SegLink
TextNet, Mask TextSpotter, Textboxes
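In addition, the following datasets relate to digit and character recognition:
Handwritten digit recognition: MNIST
Street View House Numbers dataset: SVHN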
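MNIST is commonly distributed as IDX files (often gzip-compressed), so a small reader is a handy first step. The sketch below assumes the usual IDX layout: two zero bytes, a one-byte data type code (`0x08` for unsigned bytes), a one-byte dimension count, then one big-endian 32-bit size per dimension, followed by the raw data. The function name `read_idx` is a hypothetical helper, and only the unsigned-byte type is handled.

```python
import struct
from typing import Tuple


def read_idx(data: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Return (shape, payload) for an IDX buffer holding unsigned bytes."""
    if len(data) < 4:
        raise ValueError("buffer too short for an IDX header")
    # Header: 2 zero bytes, data type code, number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    # The payload length must match the product of the dimensions.
    expected = 1
    for size in shape:
        expected *= size
    if len(payload) != expected:
        raise ValueError(f"expected {expected} data bytes, found {len(payload)}")
    return shape, payload
```

For a gzip-compressed file, the buffer can be obtained with `gzip.open(path, "rb").read()` before calling `read_idx`. An image file yields a three-dimensional shape (count, rows, columns), and a label file yields a one-dimensional shape.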
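Some related websites where more datasets can be found:
International Association for Pattern Recognition, Technical Committee 11 (OCR): IAPR-TC11, TC11 datasets
Graphics recognition working group TC10 (IAPR-TC10), TC10 datasets
ICDAR 2017 summary: https://github.com/cs-chan/Total-Text-Dataset
Summary site for recent Robust Reading competitions: http://rrc.cvc.uab.es/
Research guide: http://www.guide2research.com/conference/icdar-2019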