Adversarial Multi-Criteria Learning for Chinese Word Segmentation

paper: https://arxiv.org/pdf/1704.07556.pdf

code:  https://github.com/FudanNLP


Abstract

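Chinese word segmentation (CWS) has many different segmentation criteria. This paper uses adversarial learning to extract the knowledge shared across these criteria.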

In this paper, we propose adversarial multi-criteria learning for CWS by integrating shared knowledge from multiple heterogeneous segmentation criteria.

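Earlier work also exploited multiple corpora, but mostly with linear classifiers over discrete features. This paper is essentially a multi-task setup: each segmentation criterion is treated as one task, and three different shared-private models are proposed. In each of them, a shared layer extracts criterion-independent features and a private layer extracts criterion-specific features. Adversarial training is used to make sure the shared layer extracts common underlying, criterion-invariant features.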

The contributions of this paper could be summarized as follows.

• Multi-criteria learning is first introduced for CWS, in which we propose three shared-private models to integrate multiple segmentation criteria.

• An adversarial strategy is used to force the shared layer to learn criteria-invariant features, in which a new objective function is also proposed instead of the original cross-entropy loss.

• We conduct extensive experiments on eight CWS corpora with different segmentation criteria, which is by far the largest number of datasets used simultaneously.


Methods

Each character is tagged with one of {B, M, E, S} (begin, middle, end, single). The general architecture: character embedding layer -> feature layers (BLSTM) -> tag inference layer (CRF).
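To make the {B, M, E, S} scheme concrete, here is a minimal sketch that converts a segmented sentence into per-character tags and back. The function names `words_to_bmes` and `bmes_to_words` and the example sentence are my own illustration, not taken from the paper or its code.

```python
def words_to_bmes(words):
    """Map a list of segmented words to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:            # skip empty strings defensively
            continue
        if len(word) == 1:
            tags.append("S")    # single-character word
        else:
            # first char is B, last is E, everything in between is M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):   # a word ends at E or S
            words.append(current)
            current = ""
    if current:                 # tolerate a sequence that ends without E/S
        words.append(current)
    return words


segmented = ["中文", "分词", "很", "有意思"]
tags = words_to_bmes(segmented)
print(tags)
print(bmes_to_words("".join(segmented), tags))
```

Different criteria segment the same sentence into different word lists, so the same characters receive different tag sequences under each criterion. That difference is what the private layers have to model.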

Figure: Three shared-private models for multi-criteria learning. The yellow blocks are the shared BLSTM and the gray blocks are the private BLSTMs. The yellow circles are the shared embeddings. The red information flow marks the difference between the three models.

Model 1: Parallel Shared-Private Model

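The private and shared layers run in parallel, and their hidden states are computed independently of each other. Both hidden states then go into the CRF layer together.

The score function in the CRF layer therefore takes both hidden states as input.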

Model 2: Stacked Shared-Private Model

The output of the shared layer is also fed into the private layer as part of its input, and only the hidden states of the private layer go into the CRF layer.

Figure: the hidden states of the shared layer and the private layer (for the m-th criterion).

Model 3: Skip-Layer Shared-Private Model

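Combines the two previous models (Eq. 14, 15 and 16 in the paper): the hidden states of the shared and private layers are computed as in Model 2, and the score function in the CRF layer is computed as in Model 1, so the shared features also skip directly into the CRF layer.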
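The difference between the three models comes down to what the private BLSTM and the CRF layer receive as input. The sketch below only tracks vector sizes to make that wiring explicit. It assumes the inputs are combined by concatenation, and the function name `layer_input_sizes` and the dimensions are hypothetical choices for illustration.

```python
def layer_input_sizes(model, emb_dim, hidden_dim):
    """Return (private_blstm_input_dim, crf_input_dim) for one criterion.

    hidden_dim is the size of one BLSTM output vector (both directions
    already concatenated). Combining inputs by concatenation is an assumption.
    """
    if model == 1:
        # Parallel: private layer sees only embeddings; CRF sees shared + private.
        return emb_dim, 2 * hidden_dim
    if model == 2:
        # Stacked: private layer sees embeddings + shared output; CRF sees private only.
        return emb_dim + hidden_dim, hidden_dim
    if model == 3:
        # Skip-layer: stacked input, and the shared output also goes to the CRF.
        return emb_dim + hidden_dim, 2 * hidden_dim
    raise ValueError("model must be 1, 2 or 3")


for m in (1, 2, 3):
    print(m, layer_input_sizes(m, emb_dim=100, hidden_dim=200))
```

With these example sizes the three models give (100, 400), (300, 200) and (300, 400) respectively.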

Adversarial Training for Shared Layer

Figure: The architecture of Model-III with the adversarial training strategy for the shared layer.

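The goal is to make the features extracted by the shared layer criterion-invariant. To this end, a criterion discriminator takes the shared features of a sentence and tries to predict which criterion the sentence was annotated with.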
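A criterion discriminator of this kind can be sketched as a sentence-level classifier on top of the shared hidden states. The version below assumes average pooling over characters followed by a linear layer and a softmax. The pooling choice, the function name `criterion_probs`, and all numbers are assumptions for illustration, not details confirmed by these notes.

```python
import math


def criterion_probs(shared_states, weights, bias):
    """Predict a distribution over criteria from shared hidden states.

    shared_states: list of per-character feature vectors from the shared layer
    weights: one weight vector per criterion
    bias: one scalar per criterion
    """
    n = len(shared_states)
    dim = len(shared_states[0])
    # Average pooling over the sentence (assumed pooling choice).
    pooled = [sum(h[d] for h in shared_states) / n for d in range(dim)]
    # One logit per criterion.
    logits = [sum(w_d * p_d for w_d, p_d in zip(w, pooled)) + b
              for w, b in zip(weights, bias)]
    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


states = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]]   # 3 characters, 2-dim features
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]        # 3 hypothetical criteria
b = [0.0, 0.0, 0.0]
probs = criterion_probs(states, W, b)
print(probs)
```

If the shared features really are criterion-invariant, a discriminator like this should not be able to do much better than a uniform guess.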

Training


The training objective has three parts:

1. Multi-task segmentation objective: maximize the log conditional likelihood of the true labels on all the corpora.

2. Criterion discriminator: maximize the log-likelihood of the true criterion under the predicted criterion distribution p(.|X), which is the same as minimizing the cross-entropy between the two.

3. Shared layer: maximize the entropy of the predicted criterion distribution, so that the discriminator cannot tell the criteria apart.

The overall objective function combines these three parts.
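The two adversarial terms pull in opposite directions, which a small numeric sketch makes visible. Below, `discriminator_log_likelihood` is what the discriminator wants to be large, and `entropy` is what the shared layer wants to be large. Both function names and the example distributions are hypothetical.

```python
import math


def discriminator_log_likelihood(probs, true_criterion):
    """Log-probability the discriminator assigns to the true criterion."""
    return math.log(probs[true_criterion])


def entropy(probs):
    """Shannon entropy (in nats) of the predicted criterion distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


confident = [0.97, 0.01, 0.01, 0.01]   # discriminator recognizes criterion 0
uniform = [0.25, 0.25, 0.25, 0.25]     # discriminator cannot tell criteria apart

for name, dist in (("confident", confident), ("uniform", uniform)):
    print(name,
          round(discriminator_log_likelihood(dist, 0), 3),
          round(entropy(dist), 3))
```

The confident distribution scores better for the discriminator but has low entropy. The uniform distribution has the maximum entropy, log 4 for four criteria, which is the outcome the shared layer is trained towards.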

Experiments

CWS

Datasets: MSRA, AS, PKU, CTB, CKIP, CITYU, NCC, SXU

Knowledge Transfer

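1. Simplified Chinese to traditional Chinese: first train on the simplified Chinese datasets, then train on the traditional Chinese datasets with the shared-layer parameters frozen. Test on the traditional Chinese datasets: AS, CKIP, CITYU.

2. Formal texts to informal texts: train on NLPCC2016 and test on Weibo data.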
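Freezing the shared layer during the second training stage (item 1 above) means its parameters receive no updates while the private parameters keep learning. A minimal sketch with a plain gradient step follows. The parameter names, the `"shared."` prefix convention, and the scalar values are assumptions made for illustration.

```python
def sgd_step(params, grads, lr, frozen_prefixes=()):
    """Return updated parameters, leaving frozen ones untouched.

    frozen_prefixes must be a tuple of name prefixes, e.g. ("shared.",).
    """
    updated = {}
    for name, value in params.items():
        if name.startswith(tuple(frozen_prefixes)):
            updated[name] = value                     # frozen: copied unchanged
        else:
            updated[name] = value - lr * grads[name]  # ordinary gradient step
    return updated


# Toy scalar "parameters" standing in for weight tensors.
params = {"shared.blstm": 1.0, "private_as.blstm": 1.0, "private_as.crf": 0.5}
grads = {"shared.blstm": 0.3, "private_as.blstm": 0.2, "private_as.crf": -0.4}

new_params = sgd_step(params, grads, lr=0.1, frozen_prefixes=("shared.",))
print(new_params)
```

Only the private parameters for the target corpus change, so whatever the shared layer learned from the simplified Chinese corpora is carried over as is.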
