
1. Finetune



Other tasks are pretrained directly on the corresponding dataset, e.g. detection on COCO.

2. End-to-end tasks, e.g. OCR (detection + recognition)


One option is to train only the detection part first, then train the whole model end-to-end on top of that.


To jointly train the text localization and recognition networks, the overall training procedure is divided into two stages. To compensate for the small size of the available datasets, we utilize the VGG synthetic data [12], i.e., VGG-Synth 800k, to train the base network using an ImageNet-pretrained model. Specifically, one training strategy is to train the detection branch until it almost converges to a steady point, and then train the detection and recognition branches jointly. Another training strategy is to train the recognition branch first instead, and then jointly train the whole network. In our attempts, these two training procedures reached equivalent final convergence. To evaluate the performance on benchmarks, we finetune the VGG-Synth trained model on ICDAR-13, ICDAR-15, and Total-Text, respectively.
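
The two strategies above differ only in which branch is warmed up before joint training. A minimal sketch of that schedule in plain Python follows. Everything named here is hypothetical and not taken from the paper: `Stage`, `two_stage_schedule`, `total_loss`, and `has_converged` are illustrative names, the weighted-sum loss with `rec_weight` is an assumption, and the plateau check is only one crude way to read "almost converges to a steady point".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """Which branches receive gradient updates in one training stage."""
    name: str
    train_detection: bool
    train_recognition: bool


def two_stage_schedule(first_branch: str = "detection") -> List[Stage]:
    """Return [warm-up stage, joint stage] for the chosen warm-up branch."""
    if first_branch not in ("detection", "recognition"):
        raise ValueError("first_branch must be 'detection' or 'recognition'")
    warmup = Stage(
        name=first_branch + "_only",
        train_detection=(first_branch == "detection"),
        train_recognition=(first_branch == "recognition"),
    )
    joint = Stage(name="joint", train_detection=True, train_recognition=True)
    return [warmup, joint]


def total_loss(stage: Stage, det_loss: float, rec_loss: float,
               rec_weight: float = 1.0) -> float:
    """Sum the losses of the active branches only (weighting is an assumption)."""
    loss = 0.0
    if stage.train_detection:
        loss += det_loss
    if stage.train_recognition:
        loss += rec_weight * rec_loss
    return loss


def has_converged(losses: List[float], window: int = 3, tol: float = 1e-2) -> bool:
    """Crude plateau check: the last `window` losses differ by less than `tol`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tol
```

In a real training loop, the warm-up stage would run until something like `has_converged` fires on the active branch's loss, after which the loop switches to the `joint` stage. Finetuning on a benchmark dataset would then start from the jointly trained weights.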

