Contributions of this paper:
Model architecture
MLM as correction
Sentence Order Prediction (SOP)
Neural Architecture
Obtaining the contextual representation of the text
$$
\begin{gathered}
X = [\text{CLS}]\, A_{1} \ldots A_{n}\, [\text{SEP}]\, B_{1} \ldots B_{m}\, [\text{SEP}] \\
\boldsymbol{H}^{(0)} = \operatorname{Embedding}(X) \\
\boldsymbol{H}^{(i)} = \operatorname{Transformer}\left(\boldsymbol{H}^{(i-1)}\right), \quad i \in \{1, \ldots, L\}
\end{gathered}
$$
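The construction above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: `build_input` and `encode` are hypothetical helper names, the segment ids follow the usual BERT convention (an assumption, since the formula does not mention them), and the embedding and Transformer layers are passed in as plain callables.

```python
from typing import Callable, List, Sequence, Tuple


def build_input(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Build X = [CLS] A_1..A_n [SEP] B_1..B_m [SEP] plus segment ids."""
    tokens = ["[CLS]", *tokens_a, "[SEP]", *tokens_b, "[SEP]"]
    # Assumption: segment 0 covers [CLS], A and the first [SEP];
    # segment 1 covers B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


def encode(x, embedding: Callable, layers: Sequence[Callable]):
    """Apply H^(0) = Embedding(X), then H^(i) = Transformer(H^(i-1)) for each layer."""
    h = embedding(x)          # H^(0)
    for layer in layers:      # i = 1, ..., L
        h = layer(h)          # H^(i)
    return h                  # H^(L), the final contextual representation
```

For example, `build_input(["a1", "a2"], ["b1"])` yields the six-token sequence `[CLS] a1 a2 [SEP] b1 [SEP]`; `encode` then simply threads the embedded sequence through each of the `L` layers in order.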
Loss definition for the MLM as correction task
$$
\begin{gathered}
\boldsymbol{p}_i = \boldsymbol{H}_i^m {\boldsymbol{W}^{e}}^{\top} + \boldsymbol{b} \\
\mathcal{L} = -\frac{1}{M}\sum_{i=1}^{M} \boldsymbol{y}_i \log \boldsymbol{p}_i
\end{gathered}
$$
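As a rough numerical sketch of this loss, the NumPy function below projects the hidden states of the M masked positions onto the vocabulary with the embedding matrix and averages the cross-entropy. The function name `mac_loss`, the argument names, and the shapes are assumptions made for illustration. The softmax normalization is also an assumption: the formula as written leaves it implicit, but cross-entropy needs a probability distribution.

```python
import numpy as np


def mac_loss(h_masked: np.ndarray, w_embed: np.ndarray, bias: np.ndarray, targets) -> float:
    """Cross-entropy over M masked positions.

    h_masked: (M, d) hidden states H^m at the masked positions
    w_embed:  (V, d) word embedding matrix W^e
    bias:     (V,)   output bias b
    targets:  (M,)   vocabulary ids of the original tokens
    """
    targets = np.asarray(targets)
    logits = h_masked @ w_embed.T + bias                  # p_i = H_i^m W^e^T + b, shape (M, V)
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(targets)), targets]  # log-probability of each true token
    return float(-picked.mean())                          # L = -(1/M) * sum_i y_i log p_i
```

With all-zero inputs every token is equally likely, so the loss equals `log(V)`, which makes a convenient sanity check.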
SOP output definition; the SOP loss also uses the cross-entropy loss function.
$$
\boldsymbol{p} = \operatorname{softmax}\left(\boldsymbol{h}_0 \boldsymbol{W}^{s} + \boldsymbol{b}^{s}\right)
$$
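A minimal NumPy sketch of this output layer follows, assuming `h0` is the final hidden state at the `[CLS]` position and that SOP is a two-class problem. The function name `sop_probs` and the shapes are illustrative assumptions.

```python
import numpy as np


def sop_probs(h0: np.ndarray, w_s: np.ndarray, b_s: np.ndarray) -> np.ndarray:
    """p = softmax(h_0 W^s + b^s).

    h0:  (d,)   hidden state at the [CLS] position (assumed)
    w_s: (d, 2) SOP classifier weights W^s
    b_s: (2,)   SOP classifier bias b^s
    """
    logits = h0 @ w_s + b_s
    logits = logits - logits.max()   # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()
```

The result is a length-2 probability vector that sums to 1. Which index stands for "original order" and which for "swapped order" is a labeling convention the formula does not fix.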
Overall network loss function
$$
\mathcal{L} = \mathcal{L}_{\text{mac}} + \mathcal{L}_{\text{sop}}
$$
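To make the combination concrete, here is a small worked sketch in plain Python. `sop_loss` and `total_loss` are hypothetical helper names. The sketch assumes the two terms are added with equal weight, as the formula states, and that `label` is the index of the true SOP class.

```python
import math
from typing import Sequence


def sop_loss(probs: Sequence[float], label: int) -> float:
    """Cross-entropy for one SOP example: minus the log-probability of the true class."""
    return -math.log(probs[label])


def total_loss(loss_mac: float, loss_sop: float) -> float:
    """L = L_mac + L_sop, an unweighted sum of the two pre-training losses."""
    return loss_mac + loss_sop
```

For instance, an SOP prediction of `[0.5, 0.5]` contributes `log 2` regardless of the label, so a MAC loss of 1.0 gives a total of `1.0 + log 2`.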
Machine Reading Comprehension
Machine Reading Comprehension (MRC) is a representative document-level modeling task in which the model must answer questions based on a given passage.
Single Sentence Classification
Sentence Pair Classification
Results on small models