Rare Chinese Character Recognition by Radical Extraction Network: Notes


Note: only parts of the paper are translated here; apologies if this makes for rough reading.

Abstract:

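First, the basic graphical components are extracted and recognized.
This paper proposes a new Radical Extraction Network (REN), which uses a CNN to extract and recognize radicals.
REN first learns to recognize different radicals from commonly used Chinese characters, and then transfers the learned deep appearance models to rare Chinese characters.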

1 Introduction

Optical Character Recognition (OCR)
Chinese OCR is difficult because there are a great many Chinese characters and many of them look alike.

Chinese characters are formed by a combination of radicals.

The network takes the feature maps as input.

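Unlike traditional methods, which often need aligned radical-level training images in order to recognize different radicals, we learn to localize radicals in a weakly supervised fashion: only character-level images are used during training.

This setting is known as weakly supervised object detection (WSD).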

REN has three data streams: (1) a radical-level classification stream that classifies different radicals, (2) a radical-level detection stream that selects the positive candidate bounding boxes that tightly contain a particular radical, and (3) a character-level classification stream that classifies different Chinese characters based on the radical-level recognition results.

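The whole network is trained end to end, and training requires only character-level images: REN is trained to automatically extract and detect different radicals from character-level annotations.

REN recognizes radicals with fairly high accuracy and improves the recognition accuracy of Chinese characters.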

2 Method

Architecture of Radical Extraction Network

WSDDN is a state-of-the-art weakly supervised object detection method. REN has one more stream than WSDDN, which performs classification at the character level.

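ROI pooling layer:
Input: the convolutional feature map and the region set $\mathcal{R}$.
Output: a matrix of pooled representations with one row per bounding box, where the number of columns is the dimension of the pooled representation of each bounding box.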

a Radical-level classification data stream

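The pooled representation matrix is processed by several fully connected layers, so that each region is mapped to a $C_{rad}$-dimensional vector. These layers output a $B \times C_{rad}$ matrix, to which a row-wise softmax operator is then applied. The result of that softmax is the final output of this data stream.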
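To make the row-wise softmax concrete, here is a minimal sketch in plain Python. It assumes the score matrix has one row per region and one column per radical; the names `row_wise_softmax` and `cls_scores` and the toy numbers are illustrative, not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax of a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def row_wise_softmax(scores):
    """Normalize each row (one region) over the radical classes."""
    return [softmax(row) for row in scores]


# Hypothetical scores for B = 2 regions and C_rad = 3 radicals.
cls_scores = [[2.0, 0.5, -1.0],
              [0.0, 0.0, 0.0]]
cls_probs = row_wise_softmax(cls_scores)
```

Each row of `cls_probs` sums to 1, so a row can be read as "which radical does this region look like".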

b Radical-level detection data stream

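The aim of this data stream is to select the best bounding box for every radical.
This data stream also starts from the pooled representation matrix. Several fully connected layers map each region to a $C_{rad}$-dimensional vector and output a score matrix, to which a column-wise softmax operator is then applied. The weights of these layers are not shared with those of the first data stream. The result of the column-wise softmax is the final output of this data stream.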
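The column-wise softmax differs from the row-wise one only in the axis it normalizes over: here the regions compete with each other for each radical. A minimal sketch, again assuming one row per region and one column per radical, with illustrative names and toy values:

```python
import math


def column_wise_softmax(scores):
    """Normalize each column (one radical) over the regions."""
    n_rows, n_cols = len(scores), len(scores[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [scores[r][j] for r in range(n_rows)]
        peak = max(column)
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for r in range(n_rows):
            out[r][j] = exps[r] / total
    return out


# Hypothetical scores for B = 3 regions and C_rad = 2 radicals.
det_scores = [[3.0, 0.0],
              [0.0, 0.0],
              [0.0, 2.0]]
det_probs = column_wise_softmax(det_scores)

# Index of the highest-scoring region for each radical.
best_box = [max(range(len(det_probs)), key=lambda r: det_probs[r][j])
            for j in range(len(det_probs[0]))]
```

Each column of `det_probs` sums to 1, and `best_box` picks region 0 for the first radical and region 2 for the second.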

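The radical score is obtained by combining the outputs of the radical-level classification stream and the radical-level detection stream with an element-wise product operator ($\odot$). Since every element of the radical score takes a value in (0, 1), we treat its $j$-th element, $[\phi^{rad}(x, \mathcal{R}; \theta)]_j$, as the confidence that the character contains the $j$-th radical.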
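A small sketch of how the two stream outputs might be combined. It assumes, as in WSDDN, that the element-wise product is summed over regions to give one score per radical; that summation is an assumption here, not something stated in these notes, and the function name and numbers are illustrative.

```python
def radical_scores(cls_probs, det_probs):
    """Element-wise product of the two stream outputs, summed over regions.

    Both inputs are lists of rows (one row per region, one column per
    radical). Summing over regions is assumed here, following WSDDN.
    """
    n_regions = len(cls_probs)
    n_radicals = len(cls_probs[0])
    return [sum(cls_probs[r][j] * det_probs[r][j] for r in range(n_regions))
            for j in range(n_radicals)]


# Hypothetical stream outputs for 2 regions and 2 radicals.
cls_probs = [[0.9, 0.1], [0.2, 0.8]]  # each row sums to 1
det_probs = [[0.7, 0.4], [0.3, 0.6]]  # each column sums to 1
scores = radical_scores(cls_probs, det_probs)
```

Under this assumption each score is a weighted average of classification probabilities, with the detection probabilities of one column as weights, which is why it stays inside (0, 1).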

c Character-level classification data stream

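The aim of this stream is to obtain the final character-level classification score. A Chinese character is classified based on two sources of information: (1) the character image itself, and (2) the radicals recognized in the image. The image itself provides the necessary global context, while the recognized radicals capture the internal structure of the character. This data stream fuses the two.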

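This data stream starts from the convolutional feature map, which several fully connected layers map to a global context vector. A linear map is then applied on top of the fused information (the global context vector together with the radical scores), followed by a softmax operator. The output of the softmax is the final character-level classification score, and the weights of the linear map are learned.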
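The fusion step can be sketched as follows. This is only one plausible reading: it assumes the global context vector and the radical scores are concatenated before the linear map, which these notes do not spell out. `character_scores`, `weights`, `bias` and all numbers are hypothetical.

```python
import math


def character_scores(context, radical_scores, weights, bias):
    """Linear map plus softmax over the fused features.

    Fusion by concatenation is an assumption made for this sketch.
    """
    fused = context + radical_scores  # list concatenation
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-d context vector, 2 radical scores, 2 character classes.
context = [0.5, -0.2]
rad = [0.9, 0.1]
weights = [[0.1, 0.0, 2.0, 0.0],
           [0.0, 0.1, 0.0, 2.0]]
bias = [0.0, 0.0]
char_probs = character_scores(context, rad, weights, bias)
```

With these toy weights the first character class depends mostly on the first radical, so a high score for that radical pushes `char_probs[0]` above `char_probs[1]`.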

Training REN

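Training data: $\{x_i\}_{i=1}^{N}$
Character-level labels: one label per training image $x_i$, each indexing one of the character classes.
We use Edge Boxes to extract about $B$ bounding boxes from $x_i$, and denote the resulting set by $\mathcal{R}_i$. Furthermore, we can construct a character-radical correspondence matrix that indicates whether a character contains a particular radical. Note that this matrix does not depend on the size of the training set, so it is easy to obtain. Based on it, we can construct a radical-level label $y_i^{rad}$ for $x_i$, which indicates whether a particular radical is present in $x_i$.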
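Deriving radical-level labels from character-level labels is a simple table lookup. The sketch below uses a tiny, made-up radical inventory and correspondence table purely for illustration; the paper's actual radical set and decomposition are not reproduced here.

```python
# Hypothetical radical inventory (column order of the correspondence rows).
RADICALS = ["氵", "木", "口"]

# Hypothetical character-radical correspondence: one 0/1 row per character.
CORRESPONDENCE = {
    "河": [1, 0, 1],
    "林": [0, 1, 0],
    "杏": [0, 1, 1],
}


def radical_labels(character_labels, correspondence):
    """Look up the 0/1 radical vector for each character-level label."""
    return [list(correspondence[ch]) for ch in character_labels]


# Three training images labelled only at the character level.
labels = radical_labels(["林", "河", "林"], CORRESPONDENCE)
```

The table has one row per character class, so its size is fixed no matter how many training images there are.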

The radical-level loss is:

$$
J_{rad}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C_{rad}} \mathbf{1}\left\{[y_i^{rad}]_j = 1\right\} \log [\phi^{rad}(x_i, \mathcal{R}_i; \theta)]_j - \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C_{rad}} \mathbf{1}\left\{[y_i^{rad}]_j = 0\right\} \log\left(1 - [\phi^{rad}(x_i, \mathcal{R}_i; \theta)]_j\right)
$$
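Read term by term, this loss is a binary cross-entropy summed over radicals and averaged over images. A minimal sketch, assuming the radical scores are already available as one list per image and the labels are 0/1 lists of the same shape; the function name and numbers are illustrative.

```python
import math


def radical_loss(scores, labels):
    """Binary cross-entropy summed over radicals, averaged over images."""
    total = 0.0
    for image_scores, image_labels in zip(scores, labels):
        for s, y in zip(image_scores, image_labels):
            # Present radicals contribute -log(s), absent ones -log(1 - s).
            total -= math.log(s) if y == 1 else math.log(1.0 - s)
    return total / len(scores)


# Hypothetical radical scores and labels for N = 2 images, C_rad = 2.
scores = [[0.9, 0.2], [0.5, 0.5]]
labels = [[1, 0], [0, 1]]
loss = radical_loss(scores, labels)
```

The loss shrinks as the scores of present radicals approach 1 and those of absent radicals approach 0.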

TBC
