Reading notes: "Dependency-based Convolutional Neural Networks for Sentence Embedding"

Abstract: CNNs have achieved state-of-the-art results on sentence modeling and classification, but they process word vectors sequentially and ignore long-distance dependencies. To combine deep learning with sentence structure, this paper proposes a dependency-based convolution approach that operates on tree-based n-grams rather than surface ones, thus capturing non-local interactions between words.

CNNs have been applied to NLP problems such as sequence labeling (Collobert et al., 2011), semantic parsing (Yih et al., 2014), and search query retrieval (Shen et al., 2014). More recently they have been used for sentence modeling (Kalchbrenner et al., 2014; Kim, 2014) on many classification problems, such as sentiment, subjectivity, and question-type classification. The problem is that CNNs, inherited from pixel-matrix processing, consider only consecutive sequential n-grams and ignore long-distance dependencies such as negation, subordination, and wh-extraction.

In sentiment analysis, researchers have incorporated long-distance information from syntactic parse trees; some report small improvements, while others report none.

The authors suspect the culprit is data sparsity: in their experiments, tree n-grams are far sparser than surface n-grams. Word embeddings, however, alleviate this problem.


Dependency-based Convolution:

The sequential baseline concatenates the i-th word through the (i+j)-th word, i.e., an n-gram model that feeds local information into the convolution operation.

However, this operation cannot capture long-distance relationships unless the window size is enlarged, which in turn causes data sparsity.
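As a reminder of this sequential baseline, a reconstruction in Kim (2014)-style notation (my transcription, not copied verbatim from the paper), where ⊕ denotes concatenation, **w** is a filter spanning n consecutive words, and f is a nonlinearity:

$$
\mathbf{x}_{i:i+j} = \mathbf{x}_i \oplus \mathbf{x}_{i+1} \oplus \cdots \oplus \mathbf{x}_{i+j},
\qquad
c_i = f\!\left(\mathbf{w} \cdot \mathbf{x}_{i:i+n-1} + b\right)
$$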

Convolution on Ancestor Paths:

[Figure: convolution on ancestor paths]

Generating the feature map of a sentence:

[Figures: the concatenation and convolution equations that generate a sentence's feature map]
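Reconstructed from the paper's idea (a sketch in my own notation, not a verbatim copy of the pictured equations): with p^k(i) denoting the k-th ancestor of word i on the dependency tree, the filter slides over each word's ancestor path instead of a surface window, giving one activation per word and hence a feature map over the whole sentence:

$$
c_i = f\!\left(\mathbf{w} \cdot \left(\mathbf{x}_i \oplus \mathbf{x}_{p(i)} \oplus \cdots \oplus \mathbf{x}_{p^{n-1}(i)}\right) + b\right),
\qquad
\mathbf{c} = [c_1, \ldots, c_L]
$$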

Max-Over-Tree Pooling and Dropout:

Equation (4) can be viewed as pattern detection: only the most similar pattern between the words and the filter returns the maximum activation.

In sequential CNNs, max-over-time pooling (Collobert et al., 2011; Kim, 2014) operates on the feature map and keeps the maximum activation as the representative of the whole feature map.

This paper's DCNNs likewise pool the maximum activation from the feature map, i.e., max-over-tree pooling.
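The pooling step itself, written out in standard notation (my reconstruction, with L the sentence length):

$$
\hat{c} = \max_{1 \le i \le L} c_i
$$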

To capture enough variation, the filters are randomly initialized so that they detect different structural patterns.

Each filter's height is the number of words it spans, and its width is the dimension d of the word representation.

After max-over-tree pooling, each filter is represented by a single feature; after a series of convolutions with filters of different heights, these multiple features, each carrying different structural information, become the final representation of the input sentence.

This sentence representation is then passed to a fully connected soft-max layer, which outputs a distribution over the labels.
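To make the pipeline concrete, here is a minimal NumPy sketch of one ancestor-path convolution channel with max-over-tree pooling. This is my illustration under stated assumptions (a self-looping root, tanh nonlinearity; names like `ancestor_ngram` and `dcnn_channel` are hypothetical), not the authors' code; dropout and the soft-max layer are omitted.

```python
import numpy as np

def ancestor_ngram(X, parent, i, n):
    """Concatenate word i with its n-1 ancestors on the dependency tree.

    X:      (L, d) matrix of word vectors
    parent: parent[i] is the head of word i (the root points to itself here)
    """
    path = [i]
    for _ in range(n - 1):
        path.append(parent[path[-1]])
    return np.concatenate([X[j] for j in path])  # shape: (n*d,)

def dcnn_channel(X, parent, filters, bias, n):
    """One convolution channel + max-over-tree pooling.

    filters: (F, n*d) -- F filters, each spanning an n-word ancestor path
    Returns one pooled feature per filter, shape (F,).
    """
    L = X.shape[0]
    # c[f, i] = activation of filter f anchored at word i (the feature map)
    c = np.stack([np.tanh(filters @ ancestor_ngram(X, parent, i, n) + bias)
                  for i in range(L)], axis=1)
    return c.max(axis=1)  # max-over-tree pooling: one feature per filter

# Toy usage: 5 words, d=4, tri-gram ancestor paths, 8 filters
rng = np.random.default_rng(0)
L, d, n, F = 5, 4, 3, 8
X = rng.normal(size=(L, d))
parent = np.array([1, 4, 1, 1, 4])   # word 4 is the root (self-loop)
W = rng.normal(size=(F, n * d))
b = np.zeros(F)
z = dcnn_channel(X, parent, W, b, n)  # sentence features from this channel
print(z.shape)                        # (8,)
```

In the full model, pooled features from filters of several heights (and from the sibling and sequential channels below) would be concatenated, passed through dropout, and fed to the soft-max layer.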


Convolution on Siblings:

Ancestor paths alone cannot capture enough linguistic phenomena, conjunction for example. Inspired by higher-order dependency parsing (McDonald and Pereira, 2006; Koo and Collins, 2010), the authors therefore also convolve each word together with its siblings in the dependency tree.
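If memory serves, the sibling-based concatenations pair a word with a sibling s(i), optionally together with the head; treat the exact forms below as a hedged sketch rather than the paper's definitive list:

$$
\mathbf{x}_{s(i)} \oplus \mathbf{x}_i,
\qquad
\mathbf{x}_{s(i)} \oplus \mathbf{x}_i \oplus \mathbf{x}_{p(i)}
$$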


Combined Model:

Structural information cannot fully cover sequential information. Moreover, parsing errors directly hurt DCNN performance, whereas sequential n-grams are always correctly observed.

The simplest way to combine the two is to concatenate their representations and feed the result into the fully connected soft-max layer.
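A minimal sketch of this combination (the subscripts anc/sib/seq are my labels for the pooled features of the ancestor, sibling, and sequential channels, not the paper's notation):

$$
\mathbf{z} = \mathbf{z}_{\text{anc}} \oplus \mathbf{z}_{\text{sib}} \oplus \mathbf{z}_{\text{seq}},
\qquad
\hat{\mathbf{y}} = \mathrm{softmax}(W\mathbf{z} + \mathbf{b})
$$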

[Figure: the final sentence representation]

Experimental results:

[Figure: experimental results]
